GraphVite - graph embedding at high speed and large scale

GraphVite is a general graph embedding engine, dedicated to high-speed and large-scale embedding learning in various applications. By combining CPUs and GPUs during training, it scales to million-scale and even billion-scale graphs. With its Python interface, you can easily experiment with advanced graph embedding algorithms and get results in an incredibly short time.
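The CPU/GPU cooperation mentioned above can be pictured as a producer-consumer pipeline. The sketch below is a conceptual illustration only, not GraphVite's implementation. It assumes that CPU threads turn random walks into (node, context) training pairs and push them into a bounded queue. A single consumer, standing in for the GPU trainer, drains that queue. The toy graph, the function names (`random_walk`, `producer`, `train`) and all parameters are hypothetical.

```python
import queue
import random
import threading
from typing import Dict, List, Tuple

# Hypothetical toy undirected graph (adjacency list); every node has a neighbor.
GRAPH: Dict[int, List[int]] = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}


def random_walk(rng: random.Random, start: int, length: int) -> List[int]:
    """Uniform random walk of a fixed length starting at `start`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(GRAPH[walk[-1]]))
    return walk


def producer(worker_id: int, num_walks: int, length: int, window: int,
             out: "queue.Queue[List[Tuple[int, int]]]") -> None:
    """CPU-side worker: converts each walk into DeepWalk-style (node, context) pairs.

    For brevity this toy only emits forward pairs within `window` steps.
    """
    rng = random.Random(worker_id)  # per-thread RNG, seeded for reproducibility
    for _ in range(num_walks):
        walk = random_walk(rng, rng.choice(list(GRAPH)), length)
        batch = [(walk[i], walk[j])
                 for i in range(length)
                 for j in range(i + 1, min(i + window + 1, length))]
        out.put(batch)  # blocks if the consumer falls behind (bounded queue)
    out.put([])  # an empty batch signals that this worker is finished


def train(num_workers: int = 2, num_walks: int = 10,
          length: int = 5, window: int = 2) -> int:
    """Consumer: drains batches as they arrive and returns the number of pairs seen."""
    samples: "queue.Queue[List[Tuple[int, int]]]" = queue.Queue(maxsize=8)
    workers = [threading.Thread(target=producer,
                                args=(w, num_walks, length, window, samples))
               for w in range(num_workers)]
    for t in workers:
        t.start()
    finished, consumed = 0, 0
    while finished < num_workers:
        batch = samples.get()
        if not batch:
            finished += 1
            continue
        consumed += len(batch)  # placeholder for a GPU training step on this batch
    for t in workers:
        t.join()
    return consumed


print(train())  # 2 workers * 10 walks * 7 pairs per walk = 140
```

The bounded queue is the key design choice in this sketch. Sample generation and training overlap, and producers pause when the consumer cannot keep up, so memory use stays fixed.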

Try GraphVite if you have any of the following needs.

  • You want to reproduce graph learning algorithms on a uniform platform.

  • You need fast visualization for graphs or high-dimensional data.

  • You are tired of waiting a long time for prototyping or tuning models.

  • You need to learn representations of large graphs or knowledge graphs.

In total, GraphVite provides complete training and evaluation pipelines for three applications: node embedding, knowledge graph embedding, and graph & high-dimensional data visualization. It also includes 9 popular models, along with their benchmarks on a number of standard datasets.

[Figure: Node Embedding]

[Figure: Knowledge Graph Embedding]

[Figure: Graph & High-dimensional Data Visualization]

How fast is GraphVite?

To give a brief idea of GraphVite’s speed, we compare its training time with that of the best open-source implementations. All times are measured on a server with 24 CPU threads and 4 V100 GPUs.

Training time of node embedding on the Youtube dataset.

Model      Existing Implementation    GraphVite    Speedup
DeepWalk   1.64 hrs (CPU parallel)    1.19 mins    82.9x
LINE       1.39 hrs (CPU parallel)    1.17 mins    71.4x
node2vec   24.4 hrs (CPU parallel)    4.39 mins    334x
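The Speedup column is the ratio of the existing implementation's time to GraphVite's time, once both are expressed in the same unit. The small Python sketch below reproduces that arithmetic from the table entries. The helper names (`to_seconds`, `speedup`) are illustrative and are not part of GraphVite.

```python
from typing import Tuple

# Seconds per unit, for the unit abbreviations used in the tables.
SECONDS_PER_UNIT = {"s": 1.0, "mins": 60.0, "hrs": 3600.0}

Time = Tuple[float, str]  # e.g. (1.64, "hrs")


def to_seconds(t: Time) -> float:
    """Convert a (value, unit) pair to seconds."""
    value, unit = t
    return value * SECONDS_PER_UNIT[unit]


def speedup(existing: Time, graphvite: Time) -> float:
    """Speedup = existing implementation's time / GraphVite's time."""
    return to_seconds(existing) / to_seconds(graphvite)


# Entries copied from the Youtube node embedding table above.
youtube = {
    "DeepWalk": ((1.64, "hrs"), (1.19, "mins")),
    "LINE": ((1.39, "hrs"), (1.17, "mins")),
    "node2vec": ((24.4, "hrs"), (4.39, "mins")),
}

for model, (existing, gv) in youtube.items():
    print(f"{model}: {speedup(existing, gv):.1f}x")
```

The table entries are rounded, so recomputing from them gives about 82.7x, 71.3x and 333.5x. These values are within 1% of the reported speedups.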

Training / evaluation time of knowledge graph embedding on the FB15k dataset.

Model    Existing Implementation         GraphVite             Speedup
TransE   1.31 hrs / 1.75 mins (1 GPU)    13.5 mins / 54.3 s    5.82x / 1.93x
RotatE   3.69 hrs / 4.19 mins (1 GPU)    28.1 mins / 55.8 s    7.88x / 4.50x

Training time of high-dimensional data visualization on the MNIST dataset.

Model      Existing Implementation     GraphVite    Speedup
LargeVis   15.3 mins (CPU parallel)    13.9 s       66.8x

Comparison to concurrent work

PyTorch-BigGraph, developed concurrently with GraphVite, aims to accelerate knowledge graph embedding on large-scale data. Below is an apples-to-apples comparison on FB15k of the models implemented in both libraries, using the same hyperparameter settings.

Model      PyTorch-BigGraph    GraphVite    Speedup
TransE     1.21 hrs            8.37 mins    8.70x
DistMult   2.48 hrs            20.3 mins    7.33x
ComplEx    3.13 hrs            18.5 mins    10.1x

GraphVite is 7.33x to 10.1x faster than its counterpart in these experiments. In addition, GraphVite supports two more applications (node embedding and graph & high-dimensional data visualization) and provides many benchmarks for easy research and development.

About the name

GraphVite (/ɡɹæfvit/) combines the English word “graph” and the French word “vite”, which means “rapid”. The name reflects the main trait of this library, as well as the bilingual environment of Mila, where the library was developed.