Pre-trained Models

To facilitate the use of knowledge graph representations in semantic tasks, we provide a collection of pre-trained embedding models for common datasets.

Wikidata5m

Wikidata5m is a large-scale knowledge graph dataset constructed from Wikidata and Wikipedia. It covers a large number of general-domain entities, such as celebrities, events, and concepts.

We trained 6 standard knowledge graph embedding models on Wikidata5m. The performance benchmarks of these models can be found here.

Model      Dimension   Size      Download link
TransE     512         9.33 GB   transe_wikidata5m.pkl
DistMult   512         9.33 GB   distmult_wikidata5m.pkl
ComplEx    512         9.33 GB   complex_wikidata5m.pkl
SimplE     512         9.33 GB   simple_wikidata5m.pkl
RotatE     512         9.33 GB   rotate_wikidata5m.pkl
QuatE      512         9.36 GB   quate_wikidata5m.pkl
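
Since each file is roughly 9 GB, fetching it programmatically can be more convenient than a browser download. Below is a minimal sketch using the Python standard library; the URL is a placeholder, so substitute the actual download link from the table above.

import urllib.request

# Placeholder URL -- replace it with the real download link from the table.
# Note: these files are around 9 GB, so the download may take a while.
url = "https://example.com/graphvite/transe_wikidata5m.pkl"
urllib.request.urlretrieve(url, "transe_wikidata5m.pkl")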

Load pre-trained models

The pre-trained models can be loaded through pickle.

import pickle

# Load the whole model object (graph + solver) from the pickle file.
with open("transe_wikidata5m.pkl", "rb") as fin:
    model = pickle.load(fin)

# Mappings from canonical IDs (e.g. Wikidata Q/P IDs) to internal indices.
entity2id = model.graph.entity2id
relation2id = model.graph.relation2id
# Embedding matrices, indexed by the internal indices above.
entity_embeddings = model.solver.entity_embeddings
relation_embeddings = model.solver.relation_embeddings
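
As a quick sanity check after loading, the shapes of the embedding matrices should line up with the ID mappings and the dimension listed in the table above. This sketch assumes the embeddings are exposed as numpy-style arrays.

# One embedding row per entity / relation, 512 dimensions each.
print(entity_embeddings.shape)    # expected: (len(entity2id), 512)
print(relation_embeddings.shape)  # expected: (len(relation2id), 512)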

Next, load the alias mappings from the dataset. These map natural-language names to entity and relation IDs, so we can access the embeddings by name.

import graphvite as gv

# Alias maps resolve natural-language names to canonical entity / relation IDs.
alias2entity = gv.dataset.wikidata5m.alias2entity
alias2relation = gv.dataset.wikidata5m.alias2relation

# Look up an embedding by name: alias -> canonical ID -> internal index -> vector.
print(entity_embeddings[entity2id[alias2entity["machine learning"]]])
print(relation_embeddings[relation2id[alias2relation["field of work"]]])
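
Once aliases are resolved to IDs, the embeddings can be used directly. For TransE, a triple (h, r, t) is scored so that h + r ≈ t, meaning a smaller distance ||h + r - t|| indicates a more plausible triple. The sketch below ranks candidate tail entities for a query head and relation, reusing the aliases from above; it uses L2 distance, though some TransE setups use L1, and assumes the embeddings are plain numpy arrays.

import numpy as np

# TransE scores a triple (h, r, t) by the distance ||h + r - t||:
# the closer an entity is to h + r, the more plausible it is as a tail.
h = entity_embeddings[entity2id[alias2entity["machine learning"]]]
r = relation_embeddings[relation2id[alias2relation["field of work"]]]

# Distance from every entity to the predicted tail position h + r.
# Note: this materializes a (num_entities, 512) temporary, which is
# memory-hungry on the full 5M-entity graph.
distances = np.linalg.norm(entity_embeddings - (h + r), axis=-1)

# Invert the mapping to recover canonical IDs, then print the 10 closest tails.
id2entity = {v: k for k, v in entity2id.items()}
for idx in np.argsort(distances)[:10]:
    print(id2entity[idx], distances[idx])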