graphvite.application
Application module of GraphVite
-
class graphvite.application.Application(type, *args, **kwargs)
Create an application instance of the given type.
- Parameters
type (str) – application type, can be ‘graph’, ‘word graph’, ‘knowledge graph’ or ‘visualization’
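The factory behaviour can be pictured as a lookup from the documented type strings to concrete application classes. The sketch below is a hypothetical, self-contained illustration: `make_application`, `APPLICATIONS` and the stub classes are made up for this example and are not GraphVite's implementation.

```python
# Hypothetical sketch of a type-string dispatcher; the stub classes below stand
# in for the real GraphVite classes and are NOT the library's implementation.

class GraphApplication:
    def __init__(self, dim, **kwargs):
        self.dim = dim          # dimension of embeddings
        self.options = kwargs   # gpus, cpu_per_gpu, ... forwarded unchanged


class WordGraphApplication(GraphApplication):
    pass


class KnowledgeGraphApplication(GraphApplication):
    pass


class VisualizationApplication(GraphApplication):
    pass


# One entry per documented value of the `type` parameter.
APPLICATIONS = {
    "graph": GraphApplication,
    "word graph": WordGraphApplication,
    "knowledge graph": KnowledgeGraphApplication,
    "visualization": VisualizationApplication,
}


def make_application(app_type, *args, **kwargs):
    """Look up the class for `app_type` and forward all other arguments to it."""
    if app_type not in APPLICATIONS:
        raise ValueError("unknown application type: %r" % app_type)
    return APPLICATIONS[app_type](*args, **kwargs)


app = make_application("knowledge graph", 512, gpus=[0])
print(type(app).__name__)  # KnowledgeGraphApplication
```

Forwarding `*args` and `**kwargs` untouched keeps the factory independent of each class's constructor, which matches the `(type, *args, **kwargs)` signature above.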
-
class graphvite.application.GraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)
Node embedding application.
Given a graph, it embeds each node into a continuous vector representation. The learned embeddings can be used for many downstream tasks, e.g. node classification, link prediction and node analogy. The similarity between node embeddings can be measured by cosine distance.
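Cosine distance is one minus cosine similarity, so the nearest neighbours of a node are the nodes with the highest cosine similarity to it. A minimal pure-Python sketch, assuming the embeddings are available as a hypothetical name-to-vector dictionary (the toy vectors below are invented, not GraphVite output):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance in [0, 2]; smaller means more similar."""
    return 1.0 - cosine_similarity(u, v)


def most_similar(query, embeddings, k=2):
    """Return the k (name, similarity) pairs closest to `query`, best first."""
    scores = [(name, cosine_similarity(embeddings[query], vec))
              for name, vec in embeddings.items() if name != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy embeddings standing in for vectors learned by a GraphApplication.
embeddings = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [-1.0, 0.0, 0.0],
}
print(most_similar("a", embeddings))
```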
- Supported Models:
- Parameters
dim (int) – dimension of embeddings
gpus (list of int, optional) – GPU ids, default is all GPUs
cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs
float_type (dtype, optional) – type of parameters
index_type (dtype, optional) – type of graph indexes
See also
-
build(**kwargs)
Build the solver from the graph. Arguments depend on the underlying solver type.
-
evaluate(task, **kwargs)
Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.
- Parameters
task (str) – name of task
- Returns
metrics and their values
- Return type
dict
-
link_prediction(H=None, T=None, Y=None, file_name=None, filter_H=None, filter_T=None, filter_file=None)
Evaluate node embeddings on the link prediction task.
- Parameters
H (list of str, optional) – names of head nodes
T (list of str, optional) – names of tail nodes
Y (list of int, optional) – labels of edges
file_name (str, optional) – file of edges and labels (e.g. validation set)
filter_H (list of str, optional) – names of head nodes to filter out
filter_T (list of str, optional) – names of tail nodes to filter out
filter_file (str, optional) – file of edges to filter out (e.g. training set)
- Returns
AUC of link prediction
- Return type
dict
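AUC can be read as the probability that a randomly chosen positive edge is scored above a randomly chosen negative edge. The sketch below computes it that way on toy data; scoring an edge by the dot product of its two node embeddings is an assumption made for illustration, not necessarily how GraphVite scores edges.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This is O(P * N), fine for a small illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Toy embeddings and labelled edges in the H / T / Y layout described above.
emb = {"a": [1.0, 0.0], "b": [0.8, 0.2], "c": [0.0, 1.0], "d": [0.1, 0.9]}
H = ["a", "c", "a", "b"]
T = ["b", "d", "c", "d"]
Y = [1, 1, 0, 0]
scores = [dot(emb[h], emb[t]) for h, t in zip(H, T)]
print(auc(scores, Y))  # 1.0
```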
-
load(**kwargs)
Load a graph from file or Python object. Arguments depend on the underlying graph type.
-
node_classification(X=None, Y=None, file_name=None, portions=(0.02, ), normalization=False, times=1, patience=100)
Evaluate node embeddings on the node classification task.
- Parameters
X (list of str, optional) – names of nodes
Y (list, optional) – labels of nodes
file_name (str, optional) – file of nodes & labels
portions (tuple of float, optional) – fractions of the labeled nodes used for training
normalization (bool, optional) – normalize the embeddings or not
times (int, optional) – number of trials
patience (int, optional) – patience on loss convergence
- Returns
macro-F1 & micro-F1 averaged over all trials
- Return type
dict
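Macro-F1 averages the per-class F1 scores, while micro-F1 pools true positives, false positives and false negatives over all classes before computing a single F1. A minimal sketch for multi-label predictions, where each node carries a set of labels; the labels and predictions below are invented, and the classifier that would produce the predictions is not shown.

```python
def f1_scores(true_sets, pred_sets):
    """Return (macro_f1, micro_f1); each element is the label set of one node."""
    classes = sorted(set().union(*true_sets, *pred_sets))
    per_class = []
    tp_all = fp_all = fn_all = 0
    for c in classes:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if c in t and c in p)
        fp = sum(1 for t, p in pairs if c not in t and c in p)
        fn = sum(1 for t, p in pairs if c in t and c not in p)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
        denom = 2 * tp + fp + fn           # F1 = 2TP / (2TP + FP + FN)
        per_class.append(2 * tp / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)   # unweighted mean over classes
    micro_denom = 2 * tp_all + fp_all + fn_all
    micro = 2 * tp_all / micro_denom if micro_denom else 0.0
    return macro, micro


true_labels = [{"x"}, {"y"}, {"x", "y"}, {"z"}]
predicted_labels = [{"x"}, {"x"}, {"x", "y"}, {"z"}]
macro, micro = f1_scores(true_labels, predicted_labels)
print(macro, micro)
```

With `times > 1`, the reported numbers would be these two scores averaged over the repeated trials.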
-
save(file_name)
Save embeddings and name mappings in numpy format.
- Parameters
file_name (str) – file name
-
set_format(delimiters=' \t\r\n', comment='#')
Set the format for parsing input data.
- Parameters
delimiters (str, optional) – string of delimiter characters
comment (str, optional) – prefix of comment strings
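One plausible reading of these two options is that every character in `delimiters` separates tokens and that text from the `comment` prefix to the end of the line is ignored. The hypothetical `parse_lines` helper below sketches that reading; it is an assumption about the semantics, not GraphVite's parser.

```python
def parse_lines(text, delimiters=" \t\r\n", comment="#"):
    """Split text into token tuples, one per non-empty, non-comment line (sketch)."""
    rows = []
    for line in text.splitlines():
        # Assumption: everything from the comment prefix to end of line is ignored.
        if comment and comment in line:
            line = line[:line.index(comment)]
        tokens, current = [], []
        for ch in line:
            if ch in delimiters:        # any single delimiter character ends a token
                if current:
                    tokens.append("".join(current))
                    current = []
            else:
                current.append(ch)
        if current:
            tokens.append("".join(current))
        if tokens:                      # skip blank and comment-only lines
            rows.append(tuple(tokens))
    return rows


edge_list = "# head tail\na b\nb\tc  # trailing comment\n\nc,d\n"
print(parse_lines(edge_list))                     # [('a', 'b'), ('b', 'c'), ('c,d',)]
print(parse_lines(edge_list, delimiters=" \t,"))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

The second call shows why the delimiter set matters: with the default whitespace-only set, `c,d` stays a single token.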
-
train(**kwargs)
Train embeddings with the solver. Arguments depend on the underlying solver type.
-
class graphvite.application.WordGraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)
Word node embedding application.
Given a corpus, it embeds each word into a continuous vector representation. The learned embeddings can be used for natural language processing tasks. It can be viewed as a variant of the word2vec algorithm with support for random walk augmentation. The similarity between word embeddings can be measured by cosine distance.
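One common way to see word2vec as graph embedding is to treat words as nodes and connect words that co-occur within a context window, weighting each edge by its count. The hypothetical `corpus_to_edges` helper below sketches that construction; it is an illustration of the idea, not a description of how WordGraphApplication builds its graph internally.

```python
from collections import Counter


def corpus_to_edges(sentences, window=2):
    """Count directed (word, context) pairs within `window` tokens (sketch)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:                      # a word is not its own context
                    counts[(word, tokens[j])] += 1
    return counts


corpus = ["the cat sat", "the dog sat"]
edges = corpus_to_edges(corpus, window=1)
print(edges[("cat", "sat")])  # 1
```

A larger window links more distant words, e.g. "the" and "sat" only become neighbours once `window >= 2`.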
- Supported Models:
- Parameters
dim (int) – dimension of embeddings
gpus (list of int, optional) – GPU ids, default is all GPUs
cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs
float_type (dtype, optional) – type of parameters
index_type (dtype, optional) – type of graph indexes
See also
-
build(**kwargs)
Build the solver from the graph. Arguments depend on the underlying solver type.
-
evaluate(task, **kwargs)
Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.
- Parameters
task (str) – name of task
- Returns
metrics and their values
- Return type
dict
-
load(**kwargs)
Load a graph from file or Python object. Arguments depend on the underlying graph type.
-
save(file_name)
Save embeddings and name mappings in numpy format.
- Parameters
file_name (str) – file name
-
set_format(delimiters=' \t\r\n', comment='#')
Set the format for parsing input data.
- Parameters
delimiters (str, optional) – string of delimiter characters
comment (str, optional) – prefix of comment strings
-
train(**kwargs)
Train embeddings with the solver. Arguments depend on the underlying solver type.
-
class graphvite.application.KnowledgeGraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)
Knowledge graph embedding application.
Given a knowledge graph, it embeds each entity and each relation into a continuous vector representation. The learned embeddings can be used to analyze knowledge graphs, e.g. for entity prediction and link prediction. The likelihood of an edge can be predicted by computing the score function over the embeddings of its triplet.
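As an illustration of "a score function over embeddings of triplets", the sketch below writes out the textbook TransE and DistMult scores for a (head, relation, tail) triplet, with higher meaning more plausible. The margin value, the L1 norm and the toy vectors are assumptions chosen for the example; as the note on this class says, GraphVite's implementations differ slightly from the original papers.

```python
def transe_score(h, r, t, margin=12.0):
    """TransE: plausible triplets satisfy h + r ~ t; score = margin - ||h + r - t||_1."""
    distance = sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))
    return margin - distance


def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i (symmetric in h and t)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy 2-D embeddings, invented for the example.
paris, france, japan = [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]
capital_of = [0.0, 1.0]

print(transe_score(paris, capital_of, france))  # 12.0
print(transe_score(paris, capital_of, japan))   # 11.0
```

The symmetry of DistMult (swapping head and tail leaves the score unchanged) is one reason models such as RotatE and SimplE were introduced for asymmetric relations.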
- Supported Models:
TransE (Translating Embeddings for Modeling Multi-relational Data)
DistMult (Embedding Entities and Relations for Learning and Inference in Knowledge Bases)
ComplEx (Complex Embeddings for Simple Link Prediction)
SimplE (SimplE Embedding for Link Prediction in Knowledge Graphs)
RotatE (RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space)
- Parameters
dim (int) – dimension of embeddings
gpus (list of int, optional) – GPU ids, default is all GPUs
cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs
float_type (dtype, optional) – type of parameters
index_type (dtype, optional) – type of graph indexes
Note
The implementations of TransE, DistMult, ComplEx and SimplE differ slightly from their original papers. The loss function and the regularization term generally follow this repo. As in RotatE, self-adversarial negative sampling is also adopted in these models.
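Self-adversarial negative sampling, as described in the RotatE paper, weights each negative triplet by a softmax over the current scores of the negatives, so that harder negatives contribute more to the loss. A minimal single-example sketch, assuming scores where higher means more plausible; the `temperature` name follows the paper's sampling temperature and is not a GraphVite parameter name.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def self_adversarial_loss(pos_score, neg_scores, temperature=1.0):
    """Loss for one positive triplet and its negatives, RotatE-style.

    Negative weights are softmax(temperature * score); in the paper they are
    treated as constants (no gradient flows through them).
    """
    logits = [temperature * s for s in neg_scores]
    top = max(logits)                        # subtract max for stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    pos_term = -log_sigmoid(pos_score)       # push the positive score up
    neg_term = -sum(w * log_sigmoid(-s) for w, s in zip(weights, neg_scores))
    return pos_term + neg_term, weights


loss, weights = self_adversarial_loss(2.0, [-1.0, 0.5, 1.5])
print(loss, weights)
```

Setting `temperature=0` makes the weights uniform, which recovers plain (unweighted) negative sampling.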
See also
-
build(**kwargs)
Build the solver from the graph. Arguments depend on the underlying solver type.
-
entity_prediction(H=None, R=None, T=None, file_name=None, save_file=None, target='tail', k=10, backend='graphvite')
Predict the distribution of the missing entity or relation for triplets.
- Parameters
H (list of str, optional) – names of head entities
R (list of str, optional) – names of relations
T (list of str, optional) – names of tail entities
file_name (str, optional) – file of triplets (e.g. validation set)
save_file (str, optional) – txt or pkl file to save predictions
k (int, optional) – top-k recalls will be returned
target (str, optional) – ‘head’ or ‘tail’
backend (str, optional) – ‘graphvite’ or ‘torch’
- Returns
top-k recalls for each triplet, if save_file is not provided
- Return type
list of list of tuple
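The "list of list of tuple" return type can be pictured as one ranked list of (entity, probability) pairs per query. The sketch below predicts the tail for each (head, relation) query by scoring every candidate with a TransE-style function and normalizing with a softmax; the hypothetical `predict_tails` helper, the softmax normalization and the toy embeddings are all assumptions for illustration, not GraphVite's code.

```python
import math


def transe_score(h, r, t, margin=12.0):
    """Higher is more plausible: margin minus the L1 distance between h + r and t."""
    return margin - sum(abs(a + b - c) for a, b, c in zip(h, r, t))


def predict_tails(queries, entities, relations, k=2):
    """For each (head, relation) query, return the top-k (tail, probability) tuples."""
    results = []
    names = list(entities)
    for head, relation in queries:
        h, r = entities[head], relations[relation]
        scores = [transe_score(h, r, entities[name]) for name in names]
        top = max(scores)                        # subtract max for stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        ranked = sorted(zip(names, [e / total for e in exps]),
                        key=lambda pair: pair[1], reverse=True)
        results.append(ranked[:k])
    return results  # list of list of (name, probability) tuples


entities = {"paris": [1.0, 0.0], "france": [1.0, 1.0],
            "tokyo": [2.0, 0.0], "japan": [2.0, 1.0]}
relations = {"capital_of": [0.0, 1.0]}
predictions = predict_tails([("paris", "capital_of"), ("tokyo", "capital_of")],
                            entities, relations)
print(predictions)
```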
-
evaluate(task, **kwargs)
Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.
- Parameters
task (str) – name of task
- Returns
metrics and their values
- Return type
dict
-
link_prediction(H=None, R=None, T=None, filter_H=None, filter_R=None, filter_T=None, file_name=None, filter_files=None, target='both', fast_mode=None, backend='graphvite')
Evaluate knowledge graph embeddings on the link prediction task.
- Parameters
H (list of str, optional) – names of head entities
R (list of str, optional) – names of relations
T (list of str, optional) – names of tail entities
file_name (str, optional) – file of triplets (e.g. validation set)
filter_H (list of str, optional) – names of head entities to filter out
filter_R (list of str, optional) – names of relations to filter out
filter_T (list of str, optional) – names of tail entities to filter out
filter_files (str, optional) – files of triplets to filter out (e.g. training / validation / test set)
target (str, optional) – ‘head’, ‘tail’ or ‘both’
fast_mode (int, optional) – if specified, only that number of samples will be evaluated
backend (str, optional) – ‘graphvite’ or ‘torch’
- Returns
MR, MRR, HITS@1, HITS@3 & HITS@10 of link prediction
- Return type
dict
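These metrics come from the rank of the true entity among all candidates. In the usual "filtered" protocol, other candidates that form known true triplets (the role of filter_H / filter_R / filter_T and filter_files) are skipped when counting how many candidates beat the true one. A minimal sketch; resolving ties in favour of the true entity is an assumption, and GraphVite may break ties differently.

```python
def filtered_rank(true_entity, scores, known_true):
    """1-based rank of `true_entity`, ignoring other known true candidates."""
    target = scores[true_entity]
    better = sum(1 for e, s in scores.items()
                 if e != true_entity and e not in known_true and s > target)
    return better + 1


def ranking_metrics(ranks):
    """MR, MRR and HITS@{1,3,10} from a list of 1-based ranks."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in (1, 3, 10):
        metrics["HITS@%d" % k] = sum(1 for r in ranks if r <= k) / n
    return metrics


scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
print(filtered_rank("c", scores, known_true={"a"}))  # 2
print(ranking_metrics([1, 2, 5, 20]))
```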
-
load(**kwargs)
Load a graph from file or Python object. Arguments depend on the underlying graph type.
-
save(file_name)
Save embeddings and name mappings in numpy format.
- Parameters
file_name (str) – file name
-
set_format(delimiters=' \t\r\n', comment='#')
Set the format for parsing input data.
- Parameters
delimiters (str, optional) – string of delimiter characters
comment (str, optional) – prefix of comment strings
-
train(**kwargs)
Train embeddings with the solver. Arguments depend on the underlying solver type.
-
class graphvite.application.VisualizationApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)
Graph & high-dimensional data visualization.
Given a graph or high-dimensional vectors, it maps each node to 2D or 3D coordinates to facilitate visualization. The learned coordinates preserve most of the local similarity information of the original input, and may shed some light on the structure of the graph or the high-dimensional space.
- Supported Models:
- Parameters
dim (int) – dimension of embeddings
gpus (list of int, optional) – GPU ids, default is all GPUs
cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs
float_type (dtype, optional) – type of parameters
index_type (dtype, optional) – type of graph indexes
See also
-
animation(Y=None, file_name=None, save_file=None, figure_size=5, scale=1, elevation=30, num_frame=700)
Rotate the learned 3D coordinates as an animation.
- Parameters
Y (list of str, optional) – labels of vectors
file_name (str, optional) – file of labels
save_file (str) – gif file to save visualization
figure_size (int, optional) – size of figure
scale (int, optional) – size of points
elevation (float, optional) – elevation angle
num_frame (int, optional) – number of frames
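A rotating 3D animation needs one viewing angle per frame. The sketch below assumes one full turn spread evenly over `num_frame` frames at a fixed `elevation`, and shows the equivalent rotation of a point about the vertical axis; the helper names are hypothetical, and how GraphVite actually spaces its frames is not stated here.

```python
import math


def frame_azimuths(num_frame=700):
    """Evenly spaced azimuth angles in degrees covering one full turn."""
    return [360.0 * i / num_frame for i in range(num_frame)]


def rotate_z(point, degrees):
    """Rotate a 3D point about the vertical (z) axis; z is unchanged."""
    x, y, z = point
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


print(frame_azimuths(4))  # [0.0, 90.0, 180.0, 270.0]
print(rotate_z((1.0, 0.0, 0.5), 90.0))
```

With matplotlib, `Axes3D.view_init(elev, azim)` accepts exactly such a pair, so a renderer could call it once per frame with the fixed elevation and each azimuth.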
-
build(**kwargs)
Build the solver from the graph. Arguments depend on the underlying solver type.
-
evaluate(task, **kwargs)
Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.
- Parameters
task (str) – name of task
- Returns
metrics and their values
- Return type
dict
-
hierarchy(HY=None, file_name=None, target=None, save_file=None, figure_size=10, scale=2, duration=3)
Visualize learned 2D coordinates with hierarchical labels.
- Parameters
HY (list of list of str, optional) – hierarchical labels of vectors
file_name (str, optional) – file of hierarchical labels
target (str) – target class
save_file (str) – gif file to save visualization
figure_size (int, optional) – size of figure
scale (int, optional) – size of points
duration (float, optional) – duration of each frame in seconds
-
load(**kwargs)
Load a graph from file or Python object. Arguments depend on the underlying graph type.
-
save(file_name)
Save embeddings and name mappings in numpy format.
- Parameters
file_name (str) – file name
-
set_format(delimiters=' \t\r\n', comment='#')
Set the format for parsing input data.
- Parameters
delimiters (str, optional) – string of delimiter characters
comment (str, optional) – prefix of comment strings
-
train(**kwargs)
Train embeddings with the solver. Arguments depend on the underlying solver type.
-
visualization(Y=None, file_name=None, save_file=None, figure_size=10, scale=2)
Visualize learned 2D or 3D coordinates.
- Parameters
Y (list of str, optional) – labels of vectors
file_name (str, optional) – file of labels
save_file (str, optional) – png or pdf file to save visualization; if not provided, the figure is shown in a window
figure_size (int, optional) – size of figure
scale (int, optional) – size of points