GraphVite - graph embedding at high speed and large scale

GraphVite is a general graph embedding engine, dedicated to high-speed and large-scale embedding learning in various applications. By using CPUs and GPUs cooperatively for learning, it scales to million-scale or even billion-scale graphs. With its Python interface, you can easily run advanced graph embedding algorithms and get results in a very short time.

Try GraphVite if you have any of the following needs.

You want to reproduce graph learning algorithms on a uniform platform.
You need fast visualization for graphs or high-dimensional data.
You are tired of waiting a long time for prototyping or tuning models.
You need to learn representations of large graphs or knowledge graphs.

GraphVite provides complete training and evaluation pipelines for 3 applications: node embedding, knowledge graph embedding, and graph & high-dimensional data visualization. It also includes 9 popular models, along with their benchmarks on a range of standard datasets.

Node Embedding

Knowledge Graph Embedding

Graph & High-dimensional Data Visualization

How fast is GraphVite?

To give a rough idea of GraphVite’s speed, we compare its training time with that of the best open-source implementations. All times are measured on a server with 24 CPU threads and 4 V100 GPUs.

Training time of node embedding on Youtube dataset.

Model	Existing Implementation	GraphVite	Speedup
DeepWalk	1.64 hrs (CPU parallel)	1.19 mins	82.9x
LINE	1.39 hrs (CPU parallel)	1.17 mins	71.4x
node2vec	24.4 hrs (CPU parallel)	4.39 mins	334x

Training / evaluation time of knowledge graph embedding on FB15k dataset.

Model	Existing Implementation	GraphVite	Speedup
TransE	1.31 hrs / 1.75 mins (1 GPU)	13.5 mins / 54.3 s	5.82x / 1.93x
RotatE	3.69 hrs / 4.19 mins (1 GPU)	28.1 mins / 55.8 s	7.88x / 4.50x

Training time of high-dimensional data visualization on MNIST dataset.

Model	Existing Implementation	GraphVite	Speedup
LargeVis	15.3 mins (CPU parallel)	13.9 s	66.8x
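
The speedup column in the tables above is the ratio of the existing implementation's time to GraphVite's time, once both are expressed in the same unit. The sketch below recomputes that ratio from the rounded table entries. The helper names (`to_seconds`, `speedup`) are hypothetical and not part of GraphVite. Recomputed values can differ slightly from the reported ones, presumably because the reported figures were derived from unrounded timings.

```python
# Illustrative helper (not part of GraphVite): recompute speedups from the
# rounded times listed in the tables above.

UNIT_SECONDS = {"s": 1.0, "mins": 60.0, "hrs": 3600.0}


def to_seconds(value, unit):
    """Convert a (value, unit) pair such as (1.64, "hrs") to seconds."""
    return value * UNIT_SECONDS[unit]


def speedup(baseline, graphvite):
    """Ratio of baseline time to GraphVite time; both are (value, unit) pairs."""
    return to_seconds(*baseline) / to_seconds(*graphvite)


# (model, existing implementation, GraphVite, reported speedup), copied from
# the training-time columns of the tables.
ROWS = [
    ("DeepWalk", (1.64, "hrs"), (1.19, "mins"), 82.9),
    ("LINE", (1.39, "hrs"), (1.17, "mins"), 71.4),
    ("node2vec", (24.4, "hrs"), (4.39, "mins"), 334.0),
    ("TransE (train)", (1.31, "hrs"), (13.5, "mins"), 5.82),
    ("RotatE (train)", (3.69, "hrs"), (28.1, "mins"), 7.88),
    ("LargeVis", (15.3, "mins"), (13.9, "s"), 66.8),
]

for model, baseline, ours, reported in ROWS:
    recomputed = speedup(baseline, ours)
    print(f"{model:15s} recomputed {recomputed:6.2f}x  reported {reported}x")
```

Evaluation-time speedups for knowledge graph embedding follow the same arithmetic, for example `speedup((1.75, "mins"), (54.3, "s"))` for TransE.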

Comparison to concurrent work

PyTorch-BigGraph, a work concurrent to GraphVite, aims to accelerate knowledge graph embedding on large-scale data. Here is an apples-to-apples comparison of the models implemented in both libraries on FB15k, using the same hyperparameter settings.

Model	PyTorch-BigGraph	GraphVite	Speedup
TransE	1.21 hrs	8.37 mins	8.70x
DistMult	2.48 hrs	20.3 mins	7.33x
ComplEx	3.13 hrs	18.5 mins	10.1x

GraphVite trains 7.33x to 10.1x faster than its counterpart on these models. In addition, the GraphVite framework supports two more applications and provides many benchmarks for easy research and development.

About the name

GraphVite (/ɡɹæfvit/) combines the English word “graph” and the French word “vite”, which means “rapid”. The name reflects the traits of this library, as well as the bilingual environment of Mila, where the library was developed.