graphvite.dataset

Dataset module of GraphVite

Graph

BlogCatalog
Youtube
Flickr
Hyperlink2012
Friendster
Wikipedia

Knowledge Graph

Math
FB15k
FB15k237
WN18
WN18RR
Freebase

Visualization

MNIST
CIFAR10
ImageNet

class graphvite.dataset.Dataset(name, urls=None, members=None)

Graph dataset.

Parameters

name (str) – name of dataset
urls (dict, optional) – URL(s) for each split; each value can be either a str or a list of str
members (dict, optional) – zip member(s) for each split, leave empty for default

Datasets contain several splits, such as train, valid and test. For each split, one or more URLs specify the files to download. You may also specify the zip member to extract. When a split is accessed, it is automatically downloaded and decompressed if it is not already present.

You can assign a preprocessing step to each split by defining a method named [split]_preprocess:

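from graphvite.dataset import Dataset

class MyDataset(Dataset):
    def __init__(self):
        super(MyDataset, self).__init__(
            "my_dataset",
            # urls maps each split name to its download URL(s)
            urls={
                "train": "url/to/train/split",
                "test": "url/to/test/split",
            },
        )

    def train_preprocess(self, input_file, output_file):
        # Copy the downloaded file unchanged; nothing is returned,
        # so the split resolves to the output file name
        with open(input_file, "r") as fin, open(output_file, "w") as fout:
            fout.write(fin.read())

# Accessing the split triggers download and preprocessing on first use
f = open(MyDataset().train)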

If the preprocess returns a non-trivial value, that value is assigned to the split; otherwise the file name is assigned. By convention, only splits ending with _data have a non-trivial return value.
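The split resolution described above can be imitated without GraphVite. The following is a minimal sketch under stated assumptions, not GraphVite's actual implementation: `LazySplits`, `ToySplits` and their methods are hypothetical names, downloading is replaced by local temporary files, and "non-trivial" is taken to mean any return value other than `None`.

```python
import os
import tempfile


class LazySplits:
    """Hypothetical stand-in that resolves a split on first attribute access."""

    def __init__(self, raw_files):
        # raw_files maps a split name to the path of an already fetched file
        self.raw_files = raw_files

    def __getattr__(self, split):
        # __getattr__ only runs when the attribute has not been set yet
        if split.startswith("_") or split not in self.raw_files:
            raise AttributeError(split)
        raw_file = self.raw_files[split]
        preprocess = getattr(type(self), split + "_preprocess", None)
        if preprocess is None:
            value = raw_file
        else:
            save_file = raw_file + ".processed"
            result = preprocess(self, raw_file, save_file)
            # Assumption: None counts as a trivial return value
            value = save_file if result is None else result
        setattr(self, split, value)  # cache, so later accesses are plain lookups
        return value


class ToySplits(LazySplits):
    def train_preprocess(self, raw_file, save_file):
        # Writes a file and returns nothing: the split becomes a file name
        with open(raw_file) as fin, open(save_file, "w") as fout:
            fout.write(fin.read().upper())

    def label_data_preprocess(self, raw_file, save_file):
        # Returns parsed content: the split becomes the data itself
        with open(raw_file) as fin:
            return [int(line) for line in fin]


tmp_dir = tempfile.mkdtemp()
paths = {}
for name, text in [("train", "a b\n"), ("label_data", "1\n2\n")]:
    paths[name] = os.path.join(tmp_dir, name + ".txt")
    with open(paths[name], "w") as f:
        f.write(text)

toy = ToySplits(paths)
```

In this sketch `toy.train` evaluates to a file path, while `toy.label_data` evaluates to a list, which mirrors the `_data` naming convention.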

See also

Pre-defined preprocess functions: csv2txt(), top_k_label(), induced_graph(), link_prediction_split(), image_feature_data()

csv2txt(csv_file, txt_file)

Convert a CSV file to a plain-text file.

Parameters

csv_file – input CSV file
txt_file – output text file
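As an illustration of this kind of conversion, here is a minimal standalone sketch. It is not GraphVite's code: the function name `csv_to_txt` is hypothetical, and the choice of a tab as the output separator is an assumption.

```python
import csv
import os
import tempfile


def csv_to_txt(csv_file, txt_file, delimiter="\t"):
    """Rewrite a comma-separated file with one delimiter-joined row per line."""
    with open(csv_file, "r", newline="") as fin, open(txt_file, "w") as fout:
        for row in csv.reader(fin):
            fout.write(delimiter.join(row) + "\n")


# Small demonstration on a temporary edge list
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "edges.csv")
dst = os.path.join(tmp_dir, "edges.txt")
with open(src, "w") as f:
    f.write("1,2\n3,4\n")

csv_to_txt(src, dst)
with open(dst) as f:
    converted = f.read()
```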

image_feature_data(dataset, model='resnet50', batch_size=128)

Compute feature vectors for an image dataset using a neural network.

Parameters

dataset (torch.utils.data.Dataset) – dataset
model (str or torch.nn.Module, optional) – pretrained model. If it is a str, use the last hidden layer of that model.
batch_size (int, optional) – batch size

induced_graph(graph_file, label_file, save_file)

Induce a subgraph from labeled nodes. Every edge in the induced graph has at least one labeled endpoint.

Parameters

graph_file (str) – graph file
label_file (str) – label file
save_file (str) – save file
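The filtering rule stated above, that an edge is kept when at least one of its endpoints is labeled, can be sketched on in-memory data. The helper name `induce_edges` is hypothetical, and the sketch skips the file reading and writing that the real function performs.

```python
def induce_edges(edges, labeled_nodes):
    """Keep the edges that touch at least one labeled node."""
    labeled = set(labeled_nodes)
    return [(u, v) for u, v in edges if u in labeled or v in labeled]


edges = [("a", "b"), ("b", "c"), ("c", "d")]
kept = induce_edges(edges, labeled_nodes=["a", "b"])
```

Here the edge between `c` and `d` is dropped because neither endpoint carries a label.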

link_prediction_split(graph_file, files, portions)

Divide a normal graph into a train split and several test splits for link prediction. Each test split contains half true edges and half false edges.

Parameters

graph_file (str) – graph file
files (list of str) – file names, the first file is treated as train file
portions (list of float) – split portions
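One way to produce such splits is sketched below. This is an illustration under assumptions, not GraphVite's algorithm: the function name `split_for_link_prediction` is hypothetical, the graph is treated as undirected, false edges are drawn by rejection sampling of node pairs, and the graph is assumed to be sparse enough that enough non-edges exist.

```python
import random


def split_for_link_prediction(edges, portions, seed=0):
    """Return (train_edges, test_splits); each test split is half true, half false."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    nodes = sorted({node for edge in edges for node in edge})
    existing = {tuple(sorted(edge)) for edge in edges}

    # Cut the shuffled edge list according to the portions
    total = sum(portions)
    splits, start = [], 0
    for i, portion in enumerate(portions):
        if i == len(portions) - 1:
            end = len(edges)  # the last split takes the remainder
        else:
            end = start + round(len(edges) * portion / total)
        splits.append(edges[start:end])
        start = end

    train, tests = splits[0], []
    for positives in splits[1:]:
        # Sample as many non-edges as there are true edges in this split
        negatives = set()
        while len(negatives) < len(positives):
            pair = tuple(sorted(rng.sample(nodes, 2)))
            if pair not in existing:
                negatives.add(pair)
        labeled = [(u, v, 1) for u, v in positives]
        labeled += [(u, v, 0) for u, v in sorted(negatives)]
        tests.append(labeled)
    return train, tests


# A ring of 20 nodes: 20 true edges and plenty of non-edges
ring = [(i, (i + 1) % 20) for i in range(20)]
train, tests = split_for_link_prediction(ring, portions=[0.8, 0.1, 0.1])
```

As in the description of `files`, the first portion corresponds to the train split, which contains only true edges and therefore needs no labels.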

top_k_label(label_file, save_file, k, format='node-label')

Extract top-k labels.

Parameters

label_file (str) – label file
save_file (str) – save file
k (int) – top-k labels will be extracted
format (str, optional) – format of label file, can be 'node-label' or '(label)-nodes':
- node-label: each line is [node] [label]
- (label)-nodes: each line is [node]…, no explicit label
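For the node-label format, top-k extraction can be sketched as follows. The sketch assumes that "top-k" means the k labels attached to the most nodes, which the text above does not state explicitly. The helper name `top_k_node_labels` is hypothetical, and the data is kept in memory instead of in files.

```python
from collections import Counter


def top_k_node_labels(pairs, k):
    """Keep (node, label) pairs whose label is among the k most frequent."""
    counts = Counter(label for _, label in pairs)
    top = {label for label, _ in counts.most_common(k)}
    return [(node, label) for node, label in pairs if label in top]


pairs = [
    ("n1", "x"), ("n2", "x"),
    ("n3", "y"), ("n4", "y"), ("n5", "y"),
    ("n6", "z"),
]
kept = top_k_node_labels(pairs, k=2)
```

With `k=2`, the labels `y` (three nodes) and `x` (two nodes) survive, and the single pair labeled `z` is removed.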

class graphvite.dataset.BlogCatalog

BlogCatalog social network dataset.

Splits: graph, label, train, test

The train and test splits are used for link prediction.

class graphvite.dataset.Youtube

YouTube social network dataset.

Splits: graph, label

class graphvite.dataset.Flickr

Flickr social network dataset.

Splits: graph, label

class graphvite.dataset.Hyperlink2012

Hyperlink 2012 graph dataset.

Splits: pld_train, pld_test

class graphvite.dataset.Friendster

Friendster social network dataset.

Splits: graph, small_graph, label

class graphvite.dataset.Wikipedia

Wikipedia dump for word embedding.

Splits: graph

class graphvite.dataset.Math

Synthetic math knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.FB15k

FB15k knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.FB15k237

FB15k-237 knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.WN18

WN18 knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.WN18RR

WN18RR knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.Freebase

Freebase knowledge graph dataset.

Splits: train

class graphvite.dataset.MNIST

MNIST dataset for visualization.

Splits: train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data

class graphvite.dataset.CIFAR10

CIFAR10 dataset for visualization.

Splits: train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data

class graphvite.dataset.ImageNet

ImageNet dataset for visualization.

Splits: train_image, train_feature_data, train_label, train_hierarchical_label, valid_image, valid_feature_data, valid_label, valid_hierarchical_label