graphvite.application

Application module of GraphVite

class graphvite.application.Application(type, *args, **kwargs)

Create an application instance of any type.

Parameters

type (str) – application type, can be ‘graph’, ‘word graph’, ‘knowledge graph’ or ‘visualization’

class graphvite.application.GraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)

Node embedding application.

Given a graph, it embeds each node into a continuous vector representation. The learned embeddings can be used for many downstream tasks, e.g. node classification, link prediction and node analogy. The similarity between node embeddings can be measured by cosine distance.
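As an illustration of the similarity measure mentioned above, here is a minimal sketch in plain Python. The embedding values and node names are hypothetical, and the helper `cosine_similarity` is not part of GraphVite.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional node embeddings, for illustration only.
embeddings = {
    "a": [1.0, 0.0, 0.0],
    "b": [2.0, 0.0, 0.0],
    "c": [0.0, 1.0, 0.0],
}

sim_ab = cosine_similarity(embeddings["a"], embeddings["b"])  # same direction
sim_ac = cosine_similarity(embeddings["a"], embeddings["c"])  # orthogonal
dist_ab = 1.0 - sim_ab  # cosine distance is one minus cosine similarity
```

Cosine distance ignores vector length, so nodes "a" and "b" above are treated as identical even though their norms differ.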

Supported Models:
Parameters
  • dim (int) – dimension of embeddings

  • gpus (list of int, optional) – GPU ids, default is all GPUs

  • cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs

  • float_type (dtype, optional) – type of parameters

  • index_type (dtype, optional) – type of graph indexes

See also

Graph, GraphSolver

build(**kwargs)

Build the solver from the graph. Arguments depend on the underlying solver type.

evaluate(task, **kwargs)

Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.

Parameters

task (str) – name of task

Returns

metrics and their values

Return type

dict
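The following sketch shows one plausible way a task name can be routed to a task-specific method that returns a dict of metrics. The class `ToyApplication`, the name-to-method mapping and the placeholder metric value are all assumptions made for illustration; they are not taken from the GraphVite source.

```python
class ToyApplication:
    """Hypothetical stand-in for an application class; not GraphVite code."""

    def evaluate(self, task, **kwargs):
        # Assumption: a task name such as "link prediction" maps to a
        # method called link_prediction.
        method_name = task.replace(" ", "_")
        method = getattr(self, method_name, None)
        if method_name in ("evaluate", "") or method_name.startswith("_") \
                or not callable(method):
            raise ValueError("unknown task: %r" % task)
        return method(**kwargs)

    def link_prediction(self, H=None, T=None, Y=None):
        # Placeholder value so the sketch returns a dict of metrics.
        return {"AUC": 0.5}


app = ToyApplication()
metrics = app.evaluate("link prediction", H=["a"], T=["b"], Y=[1])
```

Keyword arguments are forwarded untouched, which matches the statement that the accepted arguments depend on the task.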

Evaluate node embeddings on the link prediction task.

Parameters
  • H (list of str, optional) – names of head nodes

  • T (list of str, optional) – names of tail nodes

  • Y (list of int, optional) – labels of edges

  • file_name (str, optional) – file of edges and labels (e.g. validation set)

  • filter_H (list of str, optional) – names of head nodes to filter out

  • filter_T (list of str, optional) – names of tail nodes to filter out

  • filter_file (str, optional) – file of edges to filter out (e.g. training set)

Returns

AUC of link prediction

Return type

dict
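For readers unfamiliar with the metric, AUC can be computed from edge scores and binary labels as the fraction of positive/negative pairs that are ordered correctly. The sketch below is a generic pure-Python version with made-up scores; how GraphVite scores edges internally is not shown here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative edge")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four edges and their labels Y (1 = true edge).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
result = {"AUC": auc(scores, labels)}
```

The pairwise form is quadratic in the number of edges; it is used here only because it makes the definition explicit.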

load(**kwargs)

Load a graph from file or Python object. Arguments depend on the underlying graph type.

node_classification(X=None, Y=None, file_name=None, portions=(0.02, ), normalization=False, times=1, patience=100)

Evaluate node embeddings on the node classification task.

Parameters
  • X (list of str, optional) – names of nodes

  • Y (list, optional) – labels of nodes

  • file_name (str, optional) – file of nodes & labels

  • portions (tuple of float, optional) – portions of the data used for training

  • normalization (bool, optional) – normalize the embeddings or not

  • times (int, optional) – number of trials

  • patience (int, optional) – patience on loss convergence

Returns

macro-F1 & micro-F1 averaged over all trials

Return type

dict
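The two returned metrics differ in how they aggregate over classes: macro-F1 averages the per-class F1 scores, while micro-F1 pools the counts over all classes first. A minimal single-label sketch follows; the dict keys and the sample labels are assumptions for illustration, not GraphVite output.

```python
def f1_scores(y_true, y_pred):
    """Macro- and micro-averaged F1 for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_count, fp_count, fn_count):
        denominator = 2 * tp_count + fp_count + fn_count
        return 2.0 * tp_count / denominator if denominator else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"macro-F1": macro, "micro-F1": micro}


# Hypothetical labels for four nodes.
y_true = ["x", "x", "y", "y"]
y_pred = ["x", "y", "y", "y"]
scores = f1_scores(y_true, y_pred)
```

Macro-F1 weights rare and frequent classes equally, so it tends to be lower than micro-F1 when rare classes are predicted poorly.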

save(file_name)

Save embeddings and name mappings in numpy format.

Parameters

file_name (str) – file name

set_format(delimiters=' \t\r\n', comment='#')

Set the format for parsing input data.

Parameters
  • delimiters (str, optional) – string of delimiter characters

  • comment (str, optional) – prefix of comment strings
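To make the two options concrete, the sketch below splits one input line using a set of delimiter characters and a comment prefix. It is a plain-Python approximation: the function `tokenize` is hypothetical, and treating a comment prefix in the middle of a line as the start of a comment is an assumption, not documented GraphVite behaviour.

```python
def tokenize(line, delimiters=" \t\r\n", comment="#"):
    """Split a line on any delimiter character, ignoring comments."""
    # Assumption: everything from the first comment prefix onward is dropped.
    if comment:
        position = line.find(comment)
        if position != -1:
            line = line[:position]
    tokens = []
    current = []
    for ch in line:
        if ch in delimiters:
            # Consecutive delimiters do not produce empty tokens.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


edge = tokenize("a\tb 1.0 # weight\n")
csv_edge = tokenize("a,b", delimiters=",")
```

Each character of `delimiters` is treated as a separator on its own, which is why the default string covers spaces, tabs and line endings at once.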

train(**kwargs)

Train embeddings with the solver. Arguments depend on the underlying solver type.

class graphvite.application.WordGraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)

Word node embedding application.

Given a corpus, it embeds each word into a continuous vector representation. The learned embeddings can be used for natural language processing tasks. This can be viewed as a variant of the word2vec algorithm, with random walk augmentation support. The similarity between node embeddings can be measured by cosine distance.

Supported Models:
Parameters
  • dim (int) – dimension of embeddings

  • gpus (list of int, optional) – GPU ids, default is all GPUs

  • cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs

  • float_type (dtype, optional) – type of parameters

  • index_type (dtype, optional) – type of graph indexes

build(**kwargs)

Build the solver from the graph. Arguments depend on the underlying solver type.

evaluate(task, **kwargs)

Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.

Parameters

task (str) – name of task

Returns

metrics and their values

Return type

dict

load(**kwargs)

Load a graph from file or Python object. Arguments depend on the underlying graph type.

save(file_name)

Save embeddings and name mappings in numpy format.

Parameters

file_name (str) – file name

set_format(delimiters=' \t\r\n', comment='#')

Set the format for parsing input data.

Parameters
  • delimiters (str, optional) – string of delimiter characters

  • comment (str, optional) – prefix of comment strings

train(**kwargs)

Train embeddings with the solver. Arguments depend on the underlying solver type.

class graphvite.application.KnowledgeGraphApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)

Knowledge graph embedding application.

Given a knowledge graph, it embeds each entity and each relation into a continuous vector representation. The learned embeddings can be used for analysis of knowledge graphs, e.g. entity prediction and link prediction. The likelihood of an edge can be predicted by computing the score function over the embeddings of its triplet.

Supported Models:
Parameters
  • dim (int) – dimension of embeddings

  • gpus (list of int, optional) – GPU ids, default is all GPUs

  • cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs

  • float_type (dtype, optional) – type of parameters

  • index_type (dtype, optional) – type of graph indexes

Note

The implementations of TransE, DistMult, ComplEx and SimplE are slightly different from their original papers. The loss function and the regularization term generally follow this repo. Self-adversarial negative sampling is also adopted in these models, as in RotatE.
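As an example of a score function over a triplet, the sketch below uses the textbook TransE form, the negative L1 distance between head + relation and tail. As the note says, the library's variant differs slightly from the paper, so this is only the basic idea with made-up 2-dimensional embeddings.

```python
def transe_score(head, relation, tail):
    """Negative L1 distance of head + relation from tail.

    Higher scores mean the triplet is considered more plausible.
    """
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


# Hypothetical embeddings, for illustration only.
head = [1.0, 0.0]
relation = [0.0, 1.0]
likely_tail = [1.0, 1.0]    # equals head + relation
unlikely_tail = [0.0, 0.0]

good = transe_score(head, relation, likely_tail)
bad = transe_score(head, relation, unlikely_tail)
```

Ranking all candidate tails by this score is what turns a score function into an entity or link prediction.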

build(**kwargs)

Build the solver from the graph. Arguments depend on the underlying solver type.

entity_prediction(H=None, R=None, T=None, file_name=None, save_file=None, target='tail', k=10, backend='graphvite')

Predict the distribution of the missing entity or relation for triplets.

Parameters
  • H (list of str, optional) – names of head entities

  • R (list of str, optional) – names of relations

  • T (list of str, optional) – names of tail entities

  • file_name (str, optional) – file of triplets (e.g. validation set)

  • save_file (str, optional) – txt or pkl file to save predictions

  • k (int, optional) – top-k recalls will be returned

  • target (str, optional) – ‘head’ or ‘tail’

  • backend (str, optional) – ‘graphvite’ or ‘torch’

Returns

top-k recalls for each triplet, if save file is not provided

Return type

list of list of tuple
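A top-k selection over candidate scores can be sketched as below. The assumption that each returned tuple is an (entity, score) pair is ours; the reference above only states that the result is a list of lists of tuples. Entity names and scores are made up.

```python
import heapq


def top_k(candidate_scores, k=10):
    """Return the k highest-scoring (entity, score) pairs, best first."""
    return heapq.nlargest(k, candidate_scores.items(), key=lambda item: item[1])


# Hypothetical scores of candidate tail entities for two queries.
queries = [
    {"paris": 0.9, "rome": 0.5, "oslo": 0.1},
    {"tokyo": 0.2, "kyoto": 0.7},
]
predictions = [top_k(scores, k=2) for scores in queries]
```

One inner list is produced per query triplet, which gives the nested list-of-lists shape.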

evaluate(task, **kwargs)

Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.

Parameters

task (str) – name of task

Returns

metrics and their values

Return type

dict

Evaluate knowledge graph embeddings on the link prediction task.

Parameters
  • H (list of str, optional) – names of head entities

  • R (list of str, optional) – names of relations

  • T (list of str, optional) – names of tail entities

  • file_name (str, optional) – file of triplets (e.g. validation set)

  • filter_H (list of str, optional) – names of head entities to filter out

  • filter_R (list of str, optional) – names of relations to filter out

  • filter_T (list of str, optional) – names of tail entities to filter out

  • filter_files (str, optional) – files of triplets to filter out (e.g. training / validation / test set)

  • target (str, optional) – ‘head’, ‘tail’ or ‘both’

  • fast_mode (int, optional) – if specified, only that number of samples will be evaluated

  • backend (str, optional) – ‘graphvite’ or ‘torch’

Returns

MR, MRR, HITS@1, HITS@3 & HITS@10 of link prediction

Return type

dict
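The returned metrics are all functions of the rank of the true entity among the candidates. The sketch below shows a generic computation, including the usual filtered setting in which other known-correct candidates are skipped. The function names, dict keys and sample values are assumptions for illustration.

```python
def filtered_rank(scores, true_entity, known_entities=()):
    """1-based rank of true_entity, skipping other known-correct candidates."""
    true_score = scores[true_entity]
    rank = 1
    for entity, score in scores.items():
        if entity == true_entity or entity in known_entities:
            continue
        if score > true_score:
            rank += 1
    return rank


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Mean rank, mean reciprocal rank and HITS@k from 1-based ranks."""
    count = len(ranks)
    metrics = {
        "MR": sum(ranks) / count,
        "MRR": sum(1.0 / r for r in ranks) / count,
    }
    for k in ks:
        metrics["HITS@%d" % k] = sum(1 for r in ranks if r <= k) / count
    return metrics


# Hypothetical candidate scores for one query whose true answer is "c".
scores = {"a": 0.9, "b": 0.8, "c": 0.7}
raw = filtered_rank(scores, "c")
filtered = filtered_rank(scores, "c", known_entities={"a"})

# Hypothetical ranks collected over four test triplets.
metrics = ranking_metrics([1, 2, 4, 20])
```

Filtering matters because a candidate that already appears in the training set is a correct answer and should not push the true entity down the ranking.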

load(**kwargs)

Load a graph from file or Python object. Arguments depend on the underlying graph type.

save(file_name)

Save embeddings and name mappings in numpy format.

Parameters

file_name (str) – file name

set_format(delimiters=' \t\r\n', comment='#')

Set the format for parsing input data.

Parameters
  • delimiters (str, optional) – string of delimiter characters

  • comment (str, optional) – prefix of comment strings

train(**kwargs)

Train embeddings with the solver. Arguments depend on the underlying solver type.

class graphvite.application.VisualizationApplication(dim, gpus=[], cpu_per_gpu=0, gpu_memory_limit=0, float_type=dtype.float32, index_type=dtype.uint32)

Graph & high-dimensional data visualization.

Given a graph or high-dimensional vectors, it maps each node to 2D or 3D coordinates to facilitate visualization. The learned coordinates preserve most local similarity information of the original input, and may shed some light on the structure of the graph or the high-dimensional space.

Supported Models:
Parameters
  • dim (int) – dimension of embeddings

  • gpus (list of int, optional) – GPU ids, default is all GPUs

  • cpu_per_gpu (int, optional) – number of CPU threads per GPU, default is all CPUs

  • float_type (dtype, optional) – type of parameters

  • index_type (dtype, optional) – type of graph indexes

animation(Y=None, file_name=None, save_file=None, figure_size=5, scale=1, elevation=30, num_frame=700)

Rotate learned 3D coordinates as an animation.

Parameters
  • Y (list of str, optional) – labels of vectors

  • file_name (str, optional) – file of labels

  • save_file (str) – gif file to save visualization

  • figure_size (int, optional) – size of figure

  • scale (int, optional) – size of points

  • elevation (float, optional) – elevation angle

  • num_frame (int, optional) – number of frames
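One way to read the `elevation` and `num_frame` parameters is as a camera path: a fixed elevation angle with the azimuth stepping evenly through the frames. The sketch below assumes a single full revolution in degrees; the actual rotation schedule used by GraphVite is not stated here, and `camera_angles` is a hypothetical helper.

```python
def camera_angles(num_frame=700, elevation=30.0):
    """(elevation, azimuth) in degrees for each frame of one full turn."""
    if num_frame <= 0:
        raise ValueError("num_frame must be positive")
    # Assumption: the azimuth advances evenly from 0 towards 360 degrees.
    return [(elevation, 360.0 * i / num_frame) for i in range(num_frame)]


angles = camera_angles()
```

Each pair could be passed to a 3D plotting backend to set the view before rendering one frame of the gif.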

build(**kwargs)

Build the solver from the graph. Arguments depend on the underlying solver type.

evaluate(task, **kwargs)

Evaluate the learned embeddings on a downstream task. Arguments depend on the underlying graph type and the task.

Parameters

task (str) – name of task

Returns

metrics and their values

Return type

dict

hierarchy(HY=None, file_name=None, target=None, save_file=None, figure_size=10, scale=2, duration=3)

Visualize learned 2D coordinates with hierarchical labels.

Parameters
  • HY (list of list of str, optional) – hierarchical labels of vectors

  • file_name (str, optional) – file of hierarchical labels

  • target (str) – target class

  • save_file (str) – gif file to save visualization

  • figure_size (int, optional) – size of figure

  • scale (int, optional) – size of points

  • duration (float, optional) – duration of each frame in seconds

load(**kwargs)

Load a graph from file or Python object. Arguments depend on the underlying graph type.

save(file_name)

Save embeddings and name mappings in numpy format.

Parameters

file_name (str) – file name

set_format(delimiters=' \t\r\n', comment='#')

Set the format for parsing input data.

Parameters
  • delimiters (str, optional) – string of delimiter characters

  • comment (str, optional) – prefix of comment strings

train(**kwargs)

Train embeddings with the solver. Arguments depend on the underlying solver type.

visualization(Y=None, file_name=None, save_file=None, figure_size=10, scale=2)

Visualize learned 2D or 3D coordinates.

Parameters
  • Y (list of str, optional) – labels of vectors

  • file_name (str, optional) – file of labels

  • save_file (str, optional) – png or pdf file to save the visualization; if not provided, the figure is shown in a window

  • figure_size (int, optional) – size of figure

  • scale (int, optional) – size of points