GraphVite - graph embedding at high speed and large scale ========================================================= .. include:: link.rst GraphVite is a general graph embedding engine, dedicated to high-speed and large-scale embedding learning in various applications. By cooperating CPUs and GPUs for learning, it scales to million-scale or even billion-scale graphs. With its Python interface, you can easily practice advanced graph embedding algorithms, and get results in incredibly short time. Try GraphVite if you have any of the following demands. - You want to reproduce graph learning algorithms on a uniform platform. - You need fast visualization for graphs or high-dimensional data. - You are tired of waiting a long time for prototyping or tuning models. - You need to learn representations of large graphs or knowledge graphs. Generally, GraphVite provides complete training and evaluation pipelines for 3 applications: **node embedding**, **knowledge graph embedding** and **graph & high-dimensional data visualization**. Besides, it also includes 9 popular models, along with their benchmarks on a bunch of standard datasets. .. figure:: ../../asset/graph.png :align: left :height: 180px :target: overview.html#node-embedding :figclass: align-center Node Embedding .. figure:: ../../asset/knowledge_graph.png :align: left :height: 180px :target: overview.html#knowledge-graph-embedding :figclass: align-center Knowledge Graph |br| Embedding .. figure:: ../../asset/visualization.png :align: left :height: 180px :target: overview.html#graph-high-dimensional-data-visualization :figclass: align-center Graph & |br| High-dimensional |br| Data Visualization .. |br| raw:: html
.. raw:: html
How fast is GraphVite? ---------------------- To give a brief idea of GraphVite's speed, we summarize the training time of GraphVite along with the best open-source implementations. All the time is reported based on a server with 24 CPU threads and 4 V100 GPUs. Training time of node embedding on `Youtube`_ dataset. +-------------+----------------------------+-----------+---------+ | Model | Existing Implementation | GraphVite | Speedup | +=============+============================+===========+=========+ | `DeepWalk`_ | `1.64 hrs (CPU parallel)`_ | 1.19 mins | 82.9x | +-------------+----------------------------+-----------+---------+ | `LINE`_ | `1.39 hrs (CPU parallel)`_ | 1.17 mins | 71.4x | +-------------+----------------------------+-----------+---------+ | `node2vec`_ | `24.4 hrs (CPU parallel)`_ | 4.39 mins | 334x | +-------------+----------------------------+-----------+---------+ .. _1.64 hrs (CPU parallel): https://github.com/phanein/deepwalk .. _1.39 hrs (CPU parallel): https://github.com/tangjianpku/LINE .. _24.4 hrs (CPU parallel): https://github.com/aditya-grover/node2vec Training / evaluation time of knowledge graph embedding on `FB15k`_ dataset. +-----------+---------------------------------+--------------------+---------------+ | Model | Existing Implementation | GraphVite | Speedup | +===========+=================================+====================+===============+ | `TransE`_ | `1.31 hrs / 1.75 mins (1 GPU)`_ | 13.5 mins / 54.3 s | 5.82x / 1.93x | +-----------+---------------------------------+--------------------+---------------+ | `RotatE`_ | `3.69 hrs / 4.19 mins (1 GPU)`_ | 28.1 mins / 55.8 s | 7.88x / 4.50x | +-----------+---------------------------------+--------------------+---------------+ .. _1.31 hrs / 1.75 mins (1 GPU): https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding .. _3.69 hrs / 4.19 mins (1 GPU): https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding Training time of high-dimensional data visualization on `MNIST`_ dataset. +-------------+-----------------------------+-----------+---------+ | Model | Existing Implementation | GraphVite | Speedup | +=============+=============================+===========+=========+ | `LargeVis`_ | `15.3 mins (CPU parallel)`_ | 13.9 s | 66.8x | +-------------+-----------------------------+-----------+---------+ .. _15.3 mins (CPU parallel): https://github.com/lferry007/LargeVis Comparison to concurrent work ----------------------------- A work concurrent to GraphVite is `PyTorch-BigGraph`_, which aims at accelerating knowledge graph embedding on large-scale data. Here is an apple-to-apple comparison of models implemented in both libraries on `FB15k`_, under the same setting of hyperparameters. .. _PyTorch-BigGraph: https://torchbiggraph.readthedocs.io +-------------+------------------+-----------+---------+ | Model | PyTorch-BigGraph | GraphVite | Speedup | +=============+==================+===========+=========+ | `TransE`_ | 1.21 hrs | 8.37 mins | 8.70x | +-------------+------------------+-----------+---------+ | `DistMult`_ | 2.48 hrs | 20.3 mins | 7.33x | +-------------+------------------+-----------+---------+ | `ComplEx`_ | 3.13 hrs | 18.5 mins | 10.1x | +-------------+------------------+-----------+---------+ GraphVite surpasses its counterpart by a signficant margin. Besides, the framework of GraphVite also supports two more applications, and provides many benchmarks for easy research and development. About the name -------------- GraphVite(/ɡɹæfvit/) is a combination of English word "graph" and French word "vite", which means "rapid". GraphVite represents the traits of this library, as well as the bilingual environment of `Mila`_ where the library was developed. .. _Mila: https://mila.quebec