Pre-trained Models¶
To facilitate the use of knowledge graph representations in semantic tasks, we provide pre-trained embeddings for several common datasets.
Wikidata5m¶
Wikidata5m is a large-scale knowledge graph dataset constructed from Wikidata and Wikipedia. It contains a large number of general-domain entities, such as celebrities, events, concepts and things.
We trained the following standard knowledge graph embedding models on Wikidata5m. The performance benchmark of these models can be found here.
Model | Dimension | Size | Download link
---|---|---|---
TransE | 512 | 9.33 GB |
DistMult | 512 | 9.33 GB |
ComplEx | 512 | 9.33 GB |
SimplE | 512 | 9.33 GB |
RotatE | 512 | 9.33 GB |
QuatE | 512 | 9.36 GB |
Load pre-trained models¶
The pre-trained models can be loaded through pickle.
import pickle

# Load a pre-trained model (here the TransE model) from its pickle file
with open("transe_wikidata5m.pkl", "rb") as fin:
    model = pickle.load(fin)

# Mappings from entity / relation names to internal ids
entity2id = model.graph.entity2id
relation2id = model.graph.relation2id
# Embedding matrices, indexed by those internal ids
entity_embeddings = model.solver.entity_embeddings
relation_embeddings = model.solver.relation_embeddings
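As a quick sanity check, the embeddings behave like ordinary NumPy arrays (an assumption about the stored format), so we can confirm that each entity and relation id indexes a 512-dimensional row:

# Assumption: entity_embeddings / relation_embeddings are NumPy arrays of shape
# (num_entities, 512) and (num_relations, 512) respectively
print(entity_embeddings.shape, len(entity2id))        # one 512-d row per entity
print(relation_embeddings.shape, len(relation2id))    # one 512-d row per relation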
Load the alias mappings from the dataset. We can then access the embeddings through natural-language names.
import graphvite as gv

# Mappings from natural-language aliases to Wikidata entity / relation ids
alias2entity = gv.dataset.wikidata5m.alias2entity
alias2relation = gv.dataset.wikidata5m.alias2relation

# Look up embeddings by name
print(entity_embeddings[entity2id[alias2entity["machine learning"]]])
print(relation_embeddings[relation2id[alias2relation["field of work"]]])
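Beyond direct lookups, the vectors can be combined with ordinary NumPy operations. The helper below is not part of GraphVite; it is a small sketch, assuming the embeddings are NumPy arrays, that ranks entities by cosine similarity to a query alias:

import numpy as np

# Reverse mapping from internal ids back to Wikidata entity ids (e.g. "Q2539")
id2entity = {index: entity for entity, index in entity2id.items()}

def most_similar(alias, k=5):
    # Hypothetical helper: rank entities by cosine similarity to the query alias
    query = entity_embeddings[entity2id[alias2entity[alias]]]
    scores = entity_embeddings @ query / (
        np.linalg.norm(entity_embeddings, axis=1) * np.linalg.norm(query)
    )
    top = np.argsort(-scores)[:k]
    return [(id2entity[i], float(scores[i])) for i in top]

print(most_similar("machine learning"))

The top results should be entities related to machine learning, which makes this a handy smoke test that the alias mapping and the embedding matrix line up.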