graphvite.solver
Solver module of GraphVite
class graphvite.solver.GraphSolver(dim, float_type=dtype.float32, index_type=dtype.uint32, device_ids=[], num_sampler_per_worker=auto, gpu_memory_limit=auto)

Graph embedding solver.
- Parameters
dim (int) – dimension of embeddings
float_type (dtype) – type of parameters
index_type (dtype) – type of node indexes
device_ids (list of int, optional) – GPU ids, [] for auto
num_sampler_per_worker (int, optional) – number of sampler threads per GPU
gpu_memory_limit (int, optional) – memory limit for each GPU in bytes
- Instantiations:
dim: 32, 64, 96, 128, 256, 512
float_type: dtype.float32
index_type: dtype.uint32
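A minimal construction sketch (assuming graphvite is importable as gv; only the instantiated dimensions listed above are available):

    import graphvite as gv

    # dim must be one of the instantiated values (32, 64, 96, 128, 256, 512);
    # device_ids=[] lets the solver pick up all visible GPUs automatically
    solver = gv.solver.GraphSolver(dim=128, device_ids=[])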
build(graph, optimizer=auto, num_partition=auto, num_negative=1, batch_size=1e5, episode_size=auto)

Determine and allocate all resources for the solver.
- Parameters
graph (Graph) – graph to embed
optimizer (Optimizer or float, optional) – optimizer or learning rate
num_partition (int, optional) – number of partitions
num_negative (int, optional) – number of negative samples per positive sample
batch_size (int, optional) – batch size of samples in CPU-GPU transfer
episode_size (int, optional) – number of batches in a partition block
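For example, a sketch of building on a loaded graph (the Graph class and its load call are assumed from the companion graphvite.graph module; the file name is hypothetical):

    import graphvite as gv

    graph = gv.graph.Graph()          # assumed: graph class from graphvite.graph
    graph.load("edge_list.txt")       # hypothetical edge list file
    solver = gv.solver.GraphSolver(dim=128)
    # a plain float is interpreted as a learning rate; auto fills in the rest
    solver.build(graph, optimizer=0.025, num_negative=1)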
clear()

Free CPU and GPU memory, except the embeddings on CPU.
predict(samples)

Predict logits for samples.
- Parameters
samples (ndarray) – node pairs with shape (?, 2), each pair is ordered as (v, c)
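A usage sketch with hypothetical indices (the array dtype should match the solver's index_type, uint32 by default):

    import numpy as np

    # each row is a (v, c) pair: vertex index and context index
    samples = np.array([[0, 1],
                        [2, 5]], dtype=np.uint32)
    logits = solver.predict(samples)  # one logit per pair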
train(model='LINE', num_epoch=2000, resume=False, augmentation_step=auto, random_walk_length=40, random_walk_batch_size=100, shuffle_base=auto, p=1, q=1, positive_reuse=1, negative_sample_exponent=0.75, negative_weight=5, log_frequency=1000)

Train node embeddings.
- Parameters
model (str, optional) – ‘DeepWalk’, ‘LINE’ or ‘node2vec’
num_epoch (int, optional) – number of epochs; one epoch corresponds to |E| positive edges
resume (bool, optional) – resume training from learned embeddings or not
augmentation_step (int, optional) – node pairs with distance <= augmentation_step are considered as positive samples
random_walk_length (int, optional) – length of each random walk
random_walk_batch_size (int, optional) – batch size of random walks in samplers
shuffle_base (int, optional) – base for pseudo shuffle
p (float, optional) – return parameter (for node2vec)
q (float, optional) – in-out parameter (for node2vec)
positive_reuse (int, optional) – times of reusing positive samples
negative_sample_exponent (float, optional) – exponent of degrees in negative sampling
negative_weight (float, optional) – weight for each negative sample
log_frequency (int, optional) – log every log_frequency batches
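For example, p and q only take effect with model='node2vec' (the values below are illustrative):

    # biased random walks via node2vec's return (p) and in-out (q) parameters
    solver.train(model="node2vec", num_epoch=2000, p=0.25, q=4)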
property context_embeddings

Context node embeddings (2D numpy view).
property vertex_embeddings

Vertex node embeddings (2D numpy view).
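Both properties are numpy views over the embeddings kept in CPU memory (clear() preserves them); copy them if an independent array is needed, e.g.:

    import numpy as np

    vertex = solver.vertex_embeddings                     # (num_vertex, dim) view, no copy
    np.save("vertex_embeddings.npy", np.array(vertex))    # materialize a copy and save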
class graphvite.solver.KnowledgeGraphSolver(dim, float_type=dtype.float32, index_type=dtype.uint32, device_ids=[], num_sampler_per_worker=auto, gpu_memory_limit=auto)

Knowledge graph embedding solver.
- Parameters
dim (int) – dimension of embeddings
float_type (dtype) – type of parameters
index_type (dtype) – type of node indexes
device_ids (list of int, optional) – GPU ids, [] for auto
num_sampler_per_worker (int, optional) – number of sampler threads per GPU
gpu_memory_limit (int, optional) – memory limit for each GPU in bytes
- Instantiations:
dim: 32, 64, 96, 128, 256, 512, 1024, 2048
float_type: dtype.float32
index_type: dtype.uint32
build(graph, optimizer=auto, num_partition=auto, num_negative=64, batch_size=1e5, episode_size=auto)

Determine and allocate all resources for the solver.
- Parameters
graph (KnowledgeGraph) – knowledge graph
optimizer (Optimizer or float, optional) – optimizer or learning rate
num_partition (int, optional) – number of partitions
num_negative (int, optional) – number of negative samples per positive sample
batch_size (int, optional) – batch size of samples in CPU-GPU transfer
episode_size (int, optional) – number of batches in a partition block
clear()

Free CPU and GPU memory, except the embeddings on CPU.
predict(samples)

Predict logits for samples.
- Parameters
samples (ndarray) – triplets with shape (?, 3), each triplet is ordered as (h, t, r)
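A sketch with hypothetical triplets (note the (h, t, r) ordering, with the relation index last; the dtype should match index_type, uint32 by default):

    import numpy as np

    # rows are (head, tail, relation) index triplets
    triplets = np.array([[0, 4, 1],
                         [3, 7, 0]], dtype=np.uint32)
    logits = solver.predict(triplets)   # one logit per triplet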
train(model='RotatE', num_epoch=2000, resume=False, margin=12, l3_regularization=2e-3, sample_batch_size=2000, positive_reuse=1, adversarial_temperature=2, log_frequency=100)

Train knowledge graph embeddings.
- Parameters
model (str, optional) – ‘TransE’, ‘DistMult’, ‘ComplEx’, ‘SimplE’ or ‘RotatE’
num_epoch (int, optional) – number of epochs; one epoch corresponds to |E| positive edges
resume (bool, optional) – resume training from learned embeddings or not
margin (float, optional) – logit margin (for TransE & RotatE)
l3_regularization (float, optional) – L3 regularization (for DistMult, ComplEx & SimplE)
sample_batch_size (int, optional) – batch size of samples in samplers
positive_reuse (int, optional) – times of reusing positive samples
adversarial_temperature (float, optional) – temperature of self-adversarial negative sampling, disabled when set to a non-positive value
log_frequency (int, optional) – log every log_frequency batches
property entity_embeddings

Entity embeddings (2D numpy view).
property relation_embeddings

Relation embeddings (2D numpy view).
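Putting the pieces together, a hedged end-to-end sketch (the KnowledgeGraph class and its load call are assumed from the companion graphvite.graph module; the file name and hyperparameter values are illustrative):

    import graphvite as gv

    graph = gv.graph.KnowledgeGraph()   # assumed: from graphvite.graph
    graph.load("triplets.txt")          # hypothetical triplet file
    solver = gv.solver.KnowledgeGraphSolver(dim=512)
    solver.build(graph, optimizer=2e-5, num_negative=64)
    solver.train(model="RotatE", num_epoch=2000, margin=12, adversarial_temperature=2)
    entities = solver.entity_embeddings       # (num_entity, dim) view
    relations = solver.relation_embeddings    # (num_relation, dim) view
    solver.clear()                            # frees solver memory; CPU embeddings are kept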
class graphvite.solver.VisualizationSolver(dim, float_type=dtype.float32, index_type=dtype.uint32, device_ids=[], num_sampler_per_worker=auto, gpu_memory_limit=auto)

Visualization solver.
- Parameters
dim (int) – dimension of embeddings
float_type (dtype) – type of parameters
index_type (dtype) – type of node indexes
device_ids (list of int, optional) – GPU ids, [] for auto
num_sampler_per_worker (int, optional) – number of sampler threads per GPU
gpu_memory_limit (int, optional) – memory limit for each GPU in bytes
- Instantiations:
dim: 2, 3
float_type: dtype.float32
index_type: dtype.uint32
build(graph, optimizer=auto, num_partition=auto, num_negative=5, batch_size=1e5, episode_size=auto)

Determine and allocate all resources for the solver.
- Parameters
graph (KNNGraph) – k-nearest neighbor graph
optimizer (Optimizer or float, optional) – optimizer or learning rate
num_partition (int, optional) – number of partitions
num_negative (int, optional) – number of negative samples per positive sample
batch_size (int, optional) – batch size of samples in CPU-GPU transfer
episode_size (int, optional) – number of batches in a partition block
clear()

Free CPU and GPU memory, except the embeddings on CPU.
train(model='LargeVis', num_epoch=100, resume=False, sample_batch_size=2000, positive_reuse=1, negative_sample_exponent=0.75, negative_weight=3, log_frequency=1000)

Train visualization.
- Parameters
model (str, optional) – ‘LargeVis’
num_epoch (int, optional) – number of epochs; one epoch corresponds to |E| positive edges
resume (bool, optional) – resume training from learned embeddings or not
sample_batch_size (int, optional) – batch size of samples in samplers
positive_reuse (int, optional) – times of reusing positive samples
negative_sample_exponent (float, optional) – exponent of degrees in negative sampling
negative_weight (float, optional) – weight for each negative sample
log_frequency (int, optional) – log every log_frequency batches
property coordinates

Low-dimensional coordinates (2D numpy view).
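A hedged end-to-end sketch (the KNNGraph class and its build call are assumed from the companion graphvite.graph module; the input vectors are synthetic):

    import graphvite as gv
    import numpy as np

    vectors = np.random.randn(10000, 50).astype(np.float32)  # synthetic input features
    graph = gv.graph.KNNGraph()          # assumed: from graphvite.graph
    graph.build(vectors)                 # assumed k-nearest-neighbor construction
    solver = gv.solver.VisualizationSolver(dim=2)
    solver.build(graph, num_negative=5)
    solver.train(model="LargeVis", num_epoch=100)
    coordinates = solver.coordinates     # (num_point, 2) view, ready for plotting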