Pre-trained Models¶
To facilitate the use of knowledge graph representations in semantic tasks, we provide pre-trained embeddings for several common datasets.
Wikidata5m¶
Wikidata5m is a large-scale knowledge graph dataset constructed from Wikidata and Wikipedia. It contains a large number of general-domain entities, such as celebrities, events, concepts and things.
We trained the following standard knowledge graph embedding models on Wikidata5m. The performance benchmark of these models can be found here.
Model | Dimension | Size | Download link
---|---|---|---
TransE | 512 | 9.33 GB |
DistMult | 512 | 9.33 GB |
ComplEx | 512 | 9.33 GB |
SimplE | 512 | 9.33 GB |
RotatE | 512 | 9.33 GB |
QuatE | 512 | 9.36 GB |
Load pre-trained models¶
The pre-trained models can be loaded through pickle.
import pickle

# Load a pre-trained model (here the TransE model) from its pickle file
with open("transe_wikidata5m.pkl", "rb") as fin:
    model = pickle.load(fin)

# Mappings from entity / relation names to internal ids
entity2id = model.graph.entity2id
relation2id = model.graph.relation2id
# Embedding matrices, indexed by those internal ids
entity_embeddings = model.solver.entity_embeddings
relation_embeddings = model.solver.relation_embeddings
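As a quick sanity check, the embeddings behave like ordinary NumPy arrays (an assumption about the stored format), so we can confirm that each entity and relation id indexes a 512-dimensional row:

# Assumption: entity_embeddings / relation_embeddings are NumPy arrays of shape
# (num_entities, 512) and (num_relations, 512) respectively
print(entity_embeddings.shape, len(entity2id))        # one 512-d row per entity
print(relation_embeddings.shape, len(relation2id))    # one 512-d row per relation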
Load the alias mappings from the dataset. We can then access the embeddings through natural-language names.
import graphvite as gv

# Mappings from natural-language aliases to Wikidata entity / relation ids
alias2entity = gv.dataset.wikidata5m.alias2entity
alias2relation = gv.dataset.wikidata5m.alias2relation

# Look up embeddings by name
print(entity_embeddings[entity2id[alias2entity["machine learning"]]])
print(relation_embeddings[relation2id[alias2relation["field of work"]]])
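Beyond direct lookups, the vectors can be combined with ordinary NumPy operations. The helper below is not part of GraphVite; it is a small sketch, assuming the embeddings are NumPy arrays, that ranks entities by cosine similarity to a query alias:

import numpy as np

# Reverse mapping from internal ids back to Wikidata entity ids (e.g. "Q2539")
id2entity = {index: entity for entity, index in entity2id.items()}

def most_similar(alias, k=5):
    # Hypothetical helper: rank entities by cosine similarity to the query alias
    query = entity_embeddings[entity2id[alias2entity[alias]]]
    scores = entity_embeddings @ query / (
        np.linalg.norm(entity_embeddings, axis=1) * np.linalg.norm(query)
    )
    top = np.argsort(-scores)[:k]
    return [(id2entity[i], float(scores[i])) for i in top]

print(most_similar("machine learning"))

The top results should be entities related to machine learning, which makes this a handy smoke test that the alias mapping and the embedding matrix line up.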