Quick Start¶

Here is a quick-start example that illustrate the pipeline in GraphVite. If pytorch is not installed, we can simply add --no-eval to skip the evaluation stage.

graphvite baseline quick start

The example will automatically download a social network dataset called BlogCatalog, where nodes correspond to blog users. For each node, we learn an embedding vector that preserves its neighborhood structure, which is done by minimizing a reconstruction loss. GraphVite will display the progress and the loss during training.

Once the training is done, the learned embeddings are evaluated on link prediction and node classification tasks. For link prediction, we try to predict unseen edges with the embeddings. For node classification, we use the embeddings as inputs for multi-label classification of nodes.

Typically, this example takes no more than 1 minute. We will obtain some output like

Batch id: 6000
loss = 0.371041

------------- link prediction --------------
AUC: 0.899933

----------- node classification ------------
macro-F1@20%: 0.242114
micro-F1@20%: 0.391342

Note that the F1 scores may vary across different trials, as only one random split is evaluated for quick demonstration here.

The learned embeddings are saved into a pickle dump. We can load them for further use.

>>> import pickle
>>> with open("line_blogcatalog.pkl", "rb") as fin:
>>>     blogcatalog = pickle.load(fin)
>>> names = blogcatalog.id2name
>>> embeddings = blogcatalog.vertex_embeddings
>>> print(names[1024], embeddings[1024])

Another interesting example is a synthetic math dataset of arithmetic operations. By treating the operations as relations of a knowledge graph, we can learn embeddings that generalize to unseen triplets (i.e. computation formulas). Check out this example with

graphvite baseline math

For a more in-depth tutorial about GraphVite, take a look at