graphvite.application

Application module of GraphVite

class graphvite.application.Application(type, *args, **kwargs)

Create an application instance of any type.

Parameters: type (str) – application type, can be ‘graph’, ‘word graph’, ‘knowledge graph’ or ‘visualization’

class graphvite.application.GraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)

Node embedding application.

Given a graph, it embeds each node into a continuous vector representation. The learned embeddings can be used for many downstream tasks, e.g. node classification, link prediction and node analogy. The similarity between node embeddings can be measured by cosine distance.
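As a rough illustration of comparing two learned node embeddings, the sketch below computes cosine similarity in plain Python. The vectors and the names `alice` and `bob` are made up for the example; this is not GraphVite code, and cosine distance is taken here as one minus the similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional embeddings, for illustration only.
alice = [0.5, 0.1, -0.3, 0.8]
bob = [0.4, 0.2, -0.1, 0.9]

similarity = cosine_similarity(alice, bob)
distance = 1.0 - similarity  # cosine distance
```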

Supported Models:

DeepWalk (DeepWalk: Online Learning of Social Representations)
LINE (LINE: Large-scale Information Network Embedding)
node2vec (node2vec: Scalable Feature Learning for Networks)

Parameters

dim (int) – dimension of embeddings
gpus (list of int, optional) – GPU ids, default is all GPUs
cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs
float_type (dtype, optional) – type of parameters
index_type (dtype, optional) – type of graph indexes

See also

Graph, GraphSolver

build(**kwargs)

Build the solver from the graph. Arguments depend on the underlying solver type.

evaluate(task, **kwargs)

Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.

Parameters: task (str) – name of task
Returns: metrics and their values
Return type: dict

link_prediction(H=None, T=None, Y=None, file_name=None, filter_H=None, filter_T=None, filter_file=None)

Evaluate node embeddings on link prediction task.

Parameters

H (list of str, optional) – names of head nodes
T (list of str, optional) – names of tail nodes
Y (list of int, optional) – labels of edges
file_name (str, optional) – file of edges and labels (e.g. validation set)
filter_H (list of str, optional) – names of head nodes to filter out
filter_T (list of str, optional) – names of tail nodes to filter out
filter_file (str, optional) – file of edges to filter out (e.g. training set)

Returns

AUC of link prediction

Return type

dict
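To make the reported metric concrete, here is a minimal sketch of AUC as the fraction of (positive, negative) edge pairs in which the positive edge receives the higher score. The function name `auc` and the sample scores are assumptions for illustration; GraphVite's own computation may differ in detail.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative edge")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical edge scores and labels (1 = true edge, 0 = negative edge).
scores = [0.9, 0.4, 0.3, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
result = auc(scores, labels)
```

The quadratic pair loop keeps the definition visible; a rank-based formulation is the usual choice for large evaluation sets.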

load(**kwargs)

Load a graph from file or Python object. Arguments depend on the underlying graph type.

node_classification(X=None, Y=None, file_name=None, portions=(0.02, ), normalization=False, times=1, patience=100)

Evaluate node embeddings on node classification task.

Parameters

X (list of str, optional) – names of nodes
Y (list, optional) – labels of nodes
file_name (str, optional) – file of nodes & labels
portions (tuple of float, optional) – portions of the data used for training
normalization (bool, optional) – normalize the embeddings or not
times (int, optional) – number of trials
patience (int, optional) – patience on loss convergence

Returns

macro-F1 & micro-F1 averaged over all trials

Return type

dict
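The difference between the two reported scores can be seen in a small sketch: macro-F1 averages the per-class F1 values, while micro-F1 pools the counts over all classes first. The example assumes a single label per node for simplicity, and the dictionary keys are hypothetical rather than the exact keys GraphVite returns.

```python
def f1_scores(y_true, y_pred):
    """Return macro-F1 and micro-F1 for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    total_tp = total_fp = total_fn = 0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class_f1.append(2.0 * tp / denom if denom else 0.0)
        total_tp += tp
        total_fp += fp
        total_fn += fn
    macro = sum(per_class_f1) / len(per_class_f1)
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2.0 * total_tp / micro_denom if micro_denom else 0.0
    return {"macro-F1": macro, "micro-F1": micro}


# Hypothetical labels for five nodes.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
metrics = f1_scores(y_true, y_pred)
```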

save(file_name)

Save embeddings and name mappings in numpy format.

Parameters: file_name (str) – file name

set_format(delimiters=' \t\r\n', comment='#')

Set the format for parsing input data.

Parameters

delimiters (str, optional) – string of delimiter characters
comment (str, optional) – prefix of comment strings
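One plausible reading of these two parameters is sketched below: everything from the comment prefix onward is dropped, and the rest is split on any delimiter character. The helper `tokenize` is hypothetical, and the exact behavior of GraphVite's parser (for example, whether comments may appear mid-line) is an assumption here.

```python
def tokenize(line, delimiters=" \t\r\n", comment="#"):
    """Drop a trailing comment, then split on any delimiter character."""
    if comment:
        position = line.find(comment)
        if position != -1:
            line = line[:position]
    tokens = []
    current = []
    for ch in line:
        if ch in delimiters:
            # A delimiter closes the current token; runs of delimiters are skipped.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# A hypothetical weighted edge line with a trailing comment.
edge = tokenize("alice\tbob 1.5 # weight\n")
```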

train(**kwargs)

Train embeddings with the solver. Arguments depend on the underlying solver type.

class graphvite.application.WordGraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)

Word node embedding application.

Given a corpus, it embeds each word into a continuous vector representation. The learned embeddings can be used for natural language processing tasks. This can be viewed as a variant of the word2vec algorithm, with random walk augmentation support. The similarity between node embeddings can be measured by cosine distance.

Supported Models:

LINE (LINE: Large-scale Information Network Embedding)

Parameters

dim (int) – dimension of embeddings
gpus (list of int, optional) – GPU ids, default is all GPUs
cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs
float_type (dtype, optional) – type of parameters
index_type (dtype, optional) – type of graph indexes

See also

WordGraph, GraphSolver

build(**kwargs)

Build the solver from the graph. Arguments depend on the underlying solver type.

evaluate(task, **kwargs)

Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.

Parameters: task (str) – name of task
Returns: metrics and their values
Return type: dict

load(**kwargs)

Load a graph from file or Python object. Arguments depend on the underlying graph type.

save(file_name)

Save embeddings and name mappings in numpy format.

Parameters: file_name (str) – file name

set_format(delimiters=' \t\r\n', comment='#')

Set the format for parsing input data.

Parameters

delimiters (str, optional) – string of delimiter characters
comment (str, optional) – prefix of comment strings

train(**kwargs)

Train embeddings with the solver. Arguments depend on the underlying solver type.

class graphvite.application.KnowledgeGraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)

Knowledge graph embedding application.

Given a knowledge graph, it embeds each entity and each relation into a continuous vector representation. The learned embeddings can be used for analysis of knowledge graphs, e.g. entity prediction and link prediction. The likelihood of an edge can be predicted by computing the score function over the embeddings of its triplet.
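To illustrate what a score function over a triplet looks like, the sketch below writes out the textbook forms of TransE (negative distance between head + relation and tail) and DistMult (trilinear product). These follow the original papers with real-valued vectors and an L2 norm chosen for the example; they are not GraphVite's implementation, and the vectors are made up.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail (higher is more plausible)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def distmult_score(head, relation, tail):
    """Sum over dimensions of head * relation * tail (higher is more plausible)."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


# Hypothetical 2-dimensional embeddings.
head = [1.0, 0.0]
relation = [0.0, 1.0]
true_tail = [1.0, 1.0]       # head + relation lands exactly on this tail
corrupted_tail = [3.0, 1.0]  # a tail that does not fit the translation

true_score = transe_score(head, relation, true_tail)
corrupted_score = transe_score(head, relation, corrupted_tail)
```

Under either function, a link prediction evaluation ranks the true entity against candidate replacements by score.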

Supported Models:

TransE (Translating Embeddings for Modeling Multi-relational Data)
DistMult (Embedding Entities and Relations for Learning and Inference in Knowledge Bases)
ComplEx (Complex Embeddings for Simple Link Prediction)
SimplE (SimplE Embedding for Link Prediction in Knowledge Graphs)
RotatE (RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space)

Parameters

dim (int) – dimension of embeddings
gpus (list of int, optional) – GPU ids, default is all GPUs
cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs
float_type (dtype, optional) – type of parameters
index_type (dtype, optional) – type of graph indexes

Note

The implementations of TransE, DistMult, ComplEx and SimplE differ slightly from their original papers. The loss function and the regularization term generally follow this repo. As in RotatE, self-adversarial negative sampling is also adopted in these models.

build(**kwargs)

Build the solver from the graph. Arguments depend on the underlying solver type.

entity_prediction(H=None, R=None, T=None, file_name=None, save_file=None, target='tail', k=10, backend='graphvite')

Predict the distribution of the missing entity or relation for triplets.

Parameters

H (list of str, optional) – names of head entities
R (list of str, optional) – names of relations
T (list of str, optional) – names of tail entities
file_name (str, optional) – file of triplets (e.g. validation set)
save_file (str, optional) – txt or pkl file to save predictions
k (int, optional) – top-k recalls will be returned
target (str, optional) – ‘head’ or ‘tail’
backend (str, optional) – ‘graphvite’ or ‘torch’

Returns

top-k recalls for each triplet, if save_file is not provided

Return type

list of list of tuple

evaluate(task, **kwargs)

Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.

Parameters: task (str) – name of task
Returns: metrics and their values
Return type: dict

link_prediction(H=None, R=None, T=None, filter_H=None, filter_R=None, filter_T=None, file_name=None, filter_files=None, target='both', fast_mode=None, backend='graphvite')

Evaluate knowledge graph embeddings on link prediction task.

Parameters

H (list of str, optional) – names of head entities
R (list of str, optional) – names of relations
T (list of str, optional) – names of tail entities
file_name (str, optional) – file of triplets (e.g. validation set)
filter_H (list of str, optional) – names of head entities to filter out
filter_R (list of str, optional) – names of relations to filter out
filter_T (list of str, optional) – names of tail entities to filter out
filter_files (str, optional) – files of triplets to filter out (e.g. training / validation / test set)
target (str, optional) – ‘head’, ‘tail’ or ‘both’
fast_mode (int, optional) – if specified, only that number of samples will be evaluated
backend (str, optional) – ‘graphvite’ or ‘torch’

Returns

MR, MRR, HITS@1, HITS@3 & HITS@10 of link prediction

Return type

dict
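The returned metrics are all summaries of the rank of the true entity among the candidates. A minimal sketch, assuming 1-based ranks have already been computed (and filtered, if filters were given); the function name `ranking_metrics` and the sample ranks are hypothetical.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Summarize 1-based ranks of the true entities."""
    if not ranks:
        raise ValueError("at least one rank is required")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                      # mean rank, lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,    # mean reciprocal rank, higher is better
    }
    for k in ks:
        # HITS@k is the share of triplets whose true entity is ranked in the top k.
        metrics["HITS@%d" % k] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Hypothetical ranks of the true entity for four test triplets.
ranks = [1, 2, 4, 20]
summary = ranking_metrics(ranks)
```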

load(**kwargs)

Load a graph from file or Python object. Arguments depend on the underlying graph type.

save(file_name)

Save embeddings and name mappings in numpy format.

Parameters: file_name (str) – file name

set_format(delimiters=' \t\r\n', comment='#')

Set the format for parsing input data.

Parameters

delimiters (str, optional) – string of delimiter characters
comment (str, optional) – prefix of comment strings

train(**kwargs)

Train embeddings with the solver. Arguments depend on the underlying solver type.

class graphvite.application.VisualizationApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)

Graph & high-dimensional data visualization.

Given a graph or high-dimensional vectors, it maps each node to 2D or 3D coordinates to facilitate visualization. The learned coordinates preserve most local similarity information of the original input, and may shed some light on the structure of the graph or the high-dimensional space.

Supported Models:

LargeVis (Visualizing Large-scale and High-dimensional Data)

Parameters

dim (int) – dimension of embeddings
gpus (list of int, optional) – GPU ids, default is all GPUs
cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs
float_type (dtype, optional) – type of parameters
index_type (dtype, optional) – type of graph indexes

animation(Y=None, file_name=None, save_file=None, figure_size=5, scale=1, elevation=30, num_frame=700)

Rotate learned 3D coordinates as an animation.

Parameters

Y (list of str, optional) – labels of vectors
file_name (str, optional) – file of labels
save_file (str) – gif file to save visualization
figure_size (int, optional) – size of figure
scale (int, optional) – size of points
elevation (float, optional) – elevation angle
num_frame (int, optional) – number of frames

build(**kwargs)

Build the solver from the graph. Arguments depend on the underlying solver type.

evaluate(task, **kwargs)

Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.

Parameters: task (str) – name of task
Returns: metrics and their values
Return type: dict

hierarchy(HY=None, file_name=None, target=None, save_file=None, figure_size=10, scale=2, duration=3)

Visualize learned 2D coordinates with hierarchical labels.

Parameters

HY (list of list of str, optional) – hierarchical labels of vectors
file_name (str, optional) – file of hierarchical labels
target (str) – target class
save_file (str) – gif file to save visualization
figure_size (int, optional) – size of figure
scale (int, optional) – size of points
duration (float, optional) – duration of each frame in seconds

load(**kwargs)

Load a graph from file or Python object. Arguments depend on the underlying graph type.

save(file_name)

Save embeddings and name mappings in numpy format.

Parameters: file_name (str) – file name

set_format(delimiters=' \t\r\n', comment='#')

Set the format for parsing input data.

Parameters

delimiters (str, optional) – string of delimiter characters
comment (str, optional) – prefix of comment strings

train(**kwargs)

Train embeddings with the solver. Arguments depend on the underlying solver type.

visualization(Y=None, file_name=None, save_file=None, figure_size=10, scale=2)

Visualize learned 2D or 3D coordinates.

Parameters

Y (list of str, optional) – labels of vectors
file_name (str, optional) – file of labels
save_file (str, optional) – png or pdf file to save the visualization; if not provided, the figure is shown in a window
figure_size (int, optional) – size of figure
scale (int, optional) – size of points