graphvite.solver

Solver module of GraphVite

class graphvite.solver.GraphSolver(dim, float_type=dtype.float32, index_type=dtype.uint32, device_ids=[], num_sampler_per_worker=auto, gpu_memory_limit=auto)

Graph embedding solver.

Parameters
  • dim (int) – dimension of embeddings

  • float_type (dtype) – type of parameters

  • index_type (dtype) – type of node indexes

  • device_ids (list of int, optional) – GPU ids, [] for auto

  • num_sampler_per_worker (int, optional) – number of sampler threads per GPU

  • gpu_memory_limit (int, optional) – memory limit for each GPU in bytes

Instantiations:
  • dim: 32, 64, 96, 128, 256, 512

  • float_type: dtype.float32

  • index_type: dtype.uint32

build(graph, optimizer=auto, num_partition=auto, num_negative=1, batch_size=1e5, episode_size=auto)

Determine and allocate all resources for the solver.

Parameters
  • graph (Graph) – graph

  • optimizer (Optimizer or float, optional) – optimizer or learning rate

  • num_partition (int, optional) – number of partitions

  • num_negative (int, optional) – number of negative samples per positive sample

  • batch_size (int, optional) – batch size of samples in CPU-GPU transfer

  • episode_size (int, optional) – number of batches in a partition block
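
The relationship between batch_size and episode_size can be sketched with a little arithmetic. The snippet below assumes a partition block holds exactly episode_size batches of batch_size samples, as the parameter descriptions suggest. The helper samples_per_block is hypothetical and not part of GraphVite, and the episode_size value of 100 is only an illustrative stand-in for the auto default.

```python
def samples_per_block(batch_size=1e5, episode_size=100):
    """Hypothetical helper: number of samples in one partition block.

    Assumes a block is exactly `episode_size` batches of `batch_size` samples.
    """
    batch_size = int(batch_size)  # the documented default 1e5 is a float literal
    episode_size = int(episode_size)
    if batch_size <= 0 or episode_size <= 0:
        raise ValueError("batch_size and episode_size must be positive")
    return batch_size * episode_size


if __name__ == "__main__":
    print(samples_per_block(1e5, 200))
```

Because the signature writes the default as 1e5, passing an explicit int such as 100000 may be the safer choice in user code.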

clear()

Free CPU and GPU memory, except the embeddings on CPU.

predict(samples)

Predict logits for samples.

Parameters
  • samples (ndarray) – node pairs with shape (?, 2), each pair ordered as (v, c)
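
As a sketch of the input layout described above, the following builds a (?, 2) array ordered as (v, c) with NumPy. The helper make_pair_samples is hypothetical, and casting to uint32 is an assumption made to match the documented index_type; the library may accept other integer dtypes.

```python
import numpy as np


def make_pair_samples(vertices, contexts):
    """Hypothetical helper: stack indexes into an (n, 2) array ordered as (v, c)."""
    v = np.asarray(vertices, dtype=np.uint32)
    c = np.asarray(contexts, dtype=np.uint32)
    if v.ndim != 1 or v.shape != c.shape:
        raise ValueError("vertices and contexts must be 1D and of equal length")
    # Column 0 holds vertex indexes, column 1 holds context indexes.
    return np.stack([v, c], axis=1)


if __name__ == "__main__":
    samples = make_pair_samples([0, 1, 2], [5, 6, 7])
    print(samples.shape, samples.dtype)
```

An array built this way has the shape that predict(samples) is documented to expect.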

train(model='LINE', num_epoch=2000, resume=False, augmentation_step=auto, random_walk_length=40, random_walk_batch_size=100, shuffle_base=auto, p=1, q=1, positive_reuse=1, negative_sample_exponent=0.75, negative_weight=5, log_frequency=1000)

Train node embeddings.

Parameters
  • model (str, optional) – ‘DeepWalk’, ‘LINE’ or ‘node2vec’

  • num_epoch (int, optional) – number of epochs, i.e. #positive edges / |E|

  • resume (bool, optional) – resume training from learned embeddings or not

  • augmentation_step (int, optional) – node pairs with distance <= augmentation_step are considered as positive samples

  • random_walk_length (int, optional) – length of each random walk

  • random_walk_batch_size (int, optional) – batch size of random walks in samplers

  • shuffle_base (int, optional) – base for pseudo shuffle

  • p (float, optional) – return parameter (for node2vec)

  • q (float, optional) – in-out parameter (for node2vec)

  • positive_reuse (int, optional) – number of times each positive sample is reused

  • negative_sample_exponent (float, optional) – exponent of degrees in negative sampling

  • negative_weight (float, optional) – weight for each negative sample

  • log_frequency (int, optional) – log every log_frequency batches
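
To illustrate what negative_sample_exponent controls, the sketch below draws negative nodes with probability proportional to degree raised to that exponent. This is the usual word2vec-style scheme and is assumed here to be what the parameter refers to; the function names are hypothetical, and the code is plain NumPy rather than GraphVite's own sampler.

```python
import numpy as np


def negative_sampling_distribution(degrees, exponent=0.75):
    """Probability of each node being drawn as a negative: degree ** exponent, normalized."""
    d = np.asarray(degrees, dtype=np.float64)
    weights = d ** exponent
    return weights / weights.sum()


def draw_negatives(degrees, num_negative, exponent=0.75, seed=0):
    """Draw `num_negative` node indexes from the degree-based distribution."""
    rng = np.random.default_rng(seed)
    p = negative_sampling_distribution(degrees, exponent)
    return rng.choice(len(p), size=num_negative, p=p)


if __name__ == "__main__":
    print(negative_sampling_distribution([1, 16]))
    print(draw_negatives([1, 16, 81], num_negative=5))
```

With an exponent of 0 every node is equally likely, while an exponent of 1 samples strictly in proportion to degree; 0.75 sits between the two and dampens the dominance of high-degree nodes.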

property context_embeddings

Context node embeddings (2D numpy view).

property vertex_embeddings

Vertex node embeddings (2D numpy view).
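
Since the embeddings are exposed as 2D NumPy arrays, ordinary array code applies to them. The following sketch ranks nodes by cosine similarity to a query node; it uses a small stand-in array instead of a real solver, and nearest_neighbors is a hypothetical helper rather than a GraphVite function.

```python
import numpy as np


def nearest_neighbors(embeddings, query, k=5):
    """Return indexes of the k rows most cosine-similar to row `query`."""
    x = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)  # guard against zero-length rows
    scores = unit @ unit[query]
    scores[query] = -np.inf  # exclude the query node itself
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    # Stand-in for an array such as solver.vertex_embeddings.
    demo = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
    print(nearest_neighbors(demo, query=0, k=2))
```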

class graphvite.solver.KnowledgeGraphSolver(dim, float_type=dtype.float32, index_type=dtype.uint32, device_ids=[], num_sampler_per_worker=auto, gpu_memory_limit=auto)

Knowledge graph embedding solver.

Parameters
  • dim (int) – dimension of embeddings

  • float_type (dtype) – type of parameters

  • index_type (dtype) – type of node indexes

  • device_ids (list of int, optional) – GPU ids, [] for auto

  • num_sampler_per_worker (int, optional) – number of sampler threads per GPU

  • gpu_memory_limit (int, optional) – memory limit for each GPU in bytes

Instantiations:
  • dim: 32, 64, 96, 128, 256, 512, 1024, 2048

  • float_type: dtype.float32

  • index_type: dtype.uint32

build(graph, optimizer=auto, num_partition=auto, num_negative=64, batch_size=1e5, episode_size=auto)

Determine and allocate all resources for the solver.

Parameters
  • graph (KnowledgeGraph) – knowledge graph

  • optimizer (Optimizer or float, optional) – optimizer or learning rate

  • num_partition (int, optional) – number of partitions

  • num_negative (int, optional) – number of negative samples per positive sample

  • batch_size (int, optional) – batch size of samples in CPU-GPU transfer

  • episode_size (int, optional) – number of batches in a partition block

clear()

Free CPU and GPU memory, except the embeddings on CPU.

predict(samples)

Predict logits for samples.

Parameters
  • samples (ndarray) – triplets with shape (?, 3), each ordered as (h, t, r)
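
The (h, t, r) column order differs from the (h, r, t) order in which many knowledge graph datasets store triplets. A minimal sketch of the reordering, assuming the input is in (h, r, t) order; hrt_to_htr is a hypothetical helper, and the uint32 cast is an assumption made to match the documented index_type.

```python
import numpy as np


def hrt_to_htr(triplets):
    """Hypothetical helper: reorder (h, r, t) rows into (h, t, r) rows."""
    a = np.asarray(triplets, dtype=np.uint32)
    if a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected an array with shape (n, 3)")
    # Swap the relation and tail columns.
    return a[:, [0, 2, 1]]


if __name__ == "__main__":
    print(hrt_to_htr([[1, 7, 2], [3, 8, 4]]))
```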

train(model='RotatE', num_epoch=2000, resume=False, margin=12, l3_regularization=2e-3, sample_batch_size=2000, positive_reuse=1, adversarial_temperature=2, log_frequency=100)

Train knowledge graph embeddings.

Parameters
  • model (str, optional) – ‘TransE’, ‘DistMult’, ‘ComplEx’, ‘SimplE’ or ‘RotatE’

  • num_epoch (int, optional) – number of epochs, i.e. #positive edges / |E|

  • resume (bool, optional) – resume training from learned embeddings or not

  • margin (float, optional) – logit margin (for TransE & RotatE)

  • l3_regularization (float, optional) – L3 regularization (for DistMult, ComplEx & SimplE)

  • sample_batch_size (int, optional) – batch size of samples in samplers

  • positive_reuse (int, optional) – number of times each positive sample is reused

  • adversarial_temperature (float, optional) – temperature of self-adversarial negative sampling, disabled when set to non-positive value

  • log_frequency (int, optional) – log every log_frequency batches
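
The roles of margin and adversarial_temperature can be sketched in NumPy. The code below follows the formulation in the RotatE paper, where the logit is the margin minus the distance between the rotated head and the tail, and negatives are weighted by a softmax over their logits scaled by the temperature. It is an illustration under those assumptions, not GraphVite's kernel; the function names are hypothetical, and treating a non-positive temperature as uniform weights is one reading of "disabled".

```python
import numpy as np


def rotate_logit(head, relation_phase, tail, margin=12.0):
    """RotatE-style logit: margin - sum_i |h_i * exp(i * phase_i) - t_i|."""
    rotation = np.exp(1j * np.asarray(relation_phase, dtype=np.float64))
    diff = np.asarray(head) * rotation - np.asarray(tail)
    return margin - np.abs(diff).sum(axis=-1)


def self_adversarial_weights(negative_logits, temperature=2.0):
    """Softmax over temperature-scaled negative logits; uniform if temperature <= 0."""
    logits = np.asarray(negative_logits, dtype=np.float64)
    if temperature <= 0:
        return np.full_like(logits, 1.0 / logits.shape[-1])
    z = temperature * logits
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    h = np.array([1 + 0j, 0 + 1j])
    t = np.array([0 + 1j, 0 + 1j])
    print(rotate_logit(h, [np.pi / 2, 0.0], t))
    print(self_adversarial_weights([1.0, 0.0]))
```

Under this weighting, negatives that the model currently scores highly contribute more to the loss, and a larger temperature concentrates the weight on the hardest negatives.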

property entity_embeddings

Entity embeddings (2D numpy view).

property relation_embeddings

Relation embeddings (2D numpy view).

class graphvite.solver.VisualizationSolver(dim, float_type=dtype.float32, index_type=dtype.uint32, device_ids=[], num_sampler_per_worker=auto, gpu_memory_limit=auto)

Visualization solver.

Parameters
  • dim (int) – dimension of embeddings

  • float_type (dtype) – type of parameters

  • index_type (dtype) – type of node indexes

  • device_ids (list of int, optional) – GPU ids, [] for auto

  • num_sampler_per_worker (int, optional) – number of sampler threads per GPU

  • gpu_memory_limit (int, optional) – memory limit for each GPU in bytes

Instantiations:
  • dim: 2, 3

  • float_type: dtype.float32

  • index_type: dtype.uint32
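
Because each solver is only instantiated for the dimensions listed under its Instantiations, it can help to validate dim before constructing one. The sketch below copies the lists from this page into a lookup table. SUPPORTED_DIMS and check_dim are hypothetical names, not part of GraphVite, and a custom build of the library could support a different set.

```python
# Dimensions listed under "Instantiations" for each solver on this page.
SUPPORTED_DIMS = {
    "GraphSolver": (32, 64, 96, 128, 256, 512),
    "KnowledgeGraphSolver": (32, 64, 96, 128, 256, 512, 1024, 2048),
    "VisualizationSolver": (2, 3),
}


def check_dim(solver_name, dim):
    """Hypothetical helper: raise ValueError if `dim` is not listed for the solver."""
    if solver_name not in SUPPORTED_DIMS:
        raise ValueError("unknown solver: %s" % solver_name)
    if dim not in SUPPORTED_DIMS[solver_name]:
        raise ValueError(
            "%s is not instantiated for dim=%d; listed values are %s"
            % (solver_name, dim, SUPPORTED_DIMS[solver_name])
        )
    return dim


if __name__ == "__main__":
    print(check_dim("VisualizationSolver", 2))
```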

build(graph, optimizer=auto, num_partition=auto, num_negative=5, batch_size=1e5, episode_size=auto)

Determine and allocate all resources for the solver.

Parameters
  • graph (KNNGraph) – KNN graph

  • optimizer (Optimizer or float, optional) – optimizer or learning rate

  • num_partition (int, optional) – number of partitions

  • num_negative (int, optional) – number of negative samples per positive sample

  • batch_size (int, optional) – batch size of samples in CPU-GPU transfer

  • episode_size (int, optional) – number of batches in a partition block

clear()

Free CPU and GPU memory, except the embeddings on CPU.

train(model='LargeVis', num_epoch=100, resume=False, sample_batch_size=2000, positive_reuse=1, negative_sample_exponent=0.75, negative_weight=3, log_frequency=1000)

Train visualization.

Parameters
  • model (str, optional) – ‘LargeVis’

  • num_epoch (int, optional) – number of epochs, i.e. #positive edges / |E|

  • resume (bool, optional) – resume training from learned embeddings or not

  • sample_batch_size (int, optional) – batch size of samples in samplers

  • positive_reuse (int, optional) – number of times each positive sample is reused

  • negative_sample_exponent (float, optional) – exponent of degrees in negative sampling

  • negative_weight (float, optional) – weight for each negative sample

  • log_frequency (int, optional) – log every log_frequency batches
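
The definition of num_epoch as #positive edges / |E| means the total number of positive samples is num_epoch times the number of edges. The sketch below works through that arithmetic. Both helpers are hypothetical, and the batch count is only an estimate that assumes every batch is filled with positive samples at the given batch size.

```python
def total_positive_samples(num_epoch, num_edge):
    """Positive samples implied by num_epoch = #positive edges / |E|."""
    return int(num_epoch) * int(num_edge)


def estimated_num_batches(num_epoch, num_edge, batch_size=1e5):
    """Rough batch count, assuming full batches of `batch_size` positive samples."""
    total = total_positive_samples(num_epoch, num_edge)
    batch_size = int(batch_size)
    return -(-total // batch_size)  # ceiling division


if __name__ == "__main__":
    print(total_positive_samples(100, 5000))
    print(estimated_num_batches(100, 5000))
```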

property coordinates

Low-dimensional coordinates (2D numpy view).