graphvite.dataset

Dataset module of GraphVite

Graph

BlogCatalog
Youtube
Flickr
Hyperlink2012
Friendster
Wikipedia

Knowledge Graph

FB15k
FB15k237
WN18
WN18RR
Freebase

Visualization

MNIST
CIFAR10
ImageNet

class graphvite.dataset.Dataset(name, urls=None, members=None)

Graph dataset.

Parameters

name (str) – name of dataset
urls (dict, optional) – URL(s) for each split; each value can be either a str or a list of str
members (dict, optional) – zip member(s) to extract for each split; leave empty to use the default

Datasets contain several splits, such as train, valid and test. For each split, there are one or more URLs, specifying the file to download. You may also specify the zip member to extract. When a split is accessed, it will be automatically downloaded and decompressed if it is not present.
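
The download-on-first-access behavior can be pictured with a small stand-in. This is only a sketch of the idea, not GraphVite's implementation: the `LazySplits` class, its `fetch` callback, and the file naming scheme are hypothetical, and the "download" writes a placeholder file instead of touching the network.

```python
import os
import tempfile


class LazySplits:
    """Hypothetical stand-in for a dataset whose splits are fetched on first access."""

    def __init__(self, name, urls, fetch, root=None):
        self.name = name
        self.urls = urls      # split name -> URL
        self.fetch = fetch    # callable(url, path), stands in for the download step
        self.root = root or tempfile.mkdtemp()

    def __getattr__(self, split):
        # Called only when normal attribute lookup fails, i.e. for split names.
        urls = self.__dict__.get("urls", {})
        if split not in urls:
            raise AttributeError(split)
        path = os.path.join(self.root, "%s_%s.txt" % (self.name, split))
        if not os.path.exists(path):
            self.fetch(urls[split], path)  # fetch only when the file is missing
        return path


calls = []


def fake_fetch(url, path):
    # Assumption: a placeholder download that records the URL and writes one edge.
    calls.append(url)
    with open(path, "w") as fout:
        fout.write("0\t1\n")


data = LazySplits("my_dataset", {"train": "url/to/train/split"}, fake_fetch)
first = data.train   # file is missing, so the fetch runs
second = data.train  # file is present, so nothing is fetched again
```

Both accesses return the same path, and `calls` records a single fetch.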

You can assign a preprocessing step to each split by defining a method named [split]_preprocess:

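from graphvite.dataset import Dataset

class MyDataset(Dataset):
    def __init__(self):
        # urls maps each split name to its download URL(s)
        super(MyDataset, self).__init__(
            "my_dataset",
            urls={
                "train": "url/to/train/split",
                "test": "url/to/test/split",
            },
        )

    def train_preprocess(self, input_file, output_file):
        # Identity preprocess: copy the downloaded file unchanged
        with open(input_file, "r") as fin, open(output_file, "w") as fout:
            fout.write(fin.read())

# Accessing the split downloads and preprocesses it on demand
f = open(MyDataset().train)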

If the preprocess returns a non-trivial value, that value is assigned to the split; otherwise, the file name is assigned. By convention, only splits ending with _data have a non-trivial return value.
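
The return-value rule can be sketched as a small dispatcher. The helper `resolve_split` and the `Toy` class are hypothetical, and the sketch assumes that "non-trivial" means "anything other than None"; the library may decide this differently.

```python
def resolve_split(dataset, split, raw_file, save_file):
    """Hypothetical helper: run `[split]_preprocess` if defined and apply the return rule."""
    preprocess = getattr(dataset, "%s_preprocess" % split, None)
    if preprocess is None:
        return raw_file  # no preprocess defined: the raw file is the split
    result = preprocess(raw_file, save_file)
    # Assumption: a return value other than None becomes the split itself;
    # otherwise the split is the name of the preprocessed file.
    return save_file if result is None else result


class Toy:
    def train_preprocess(self, input_file, output_file):
        return None  # would write output_file in practice

    def label_data_preprocess(self, input_file, output_file):
        return [0, 1, 1]  # in-memory data, following the _data naming convention


toy = Toy()
train = resolve_split(toy, "train", "train.raw", "train.txt")
labels = resolve_split(toy, "label_data", "label.raw", "label.txt")
test = resolve_split(toy, "test", "test.raw", "test.txt")
```

Here `train` resolves to a file name, while `labels` resolves to the returned list.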

csv2txt(csv_file, txt_file)

Convert csv to txt.

Parameters

csv_file – csv file
txt_file – txt file
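
The exact output format of csv2txt is not stated here. A minimal sketch, assuming the txt format is simply the same rows separated by tabs instead of commas, could look like the following; `csv_to_txt` is a hypothetical stand-in, not the library function.

```python
import csv
import os
import tempfile


def csv_to_txt(csv_file, txt_file):
    """Hypothetical stand-in: rewrite comma-separated rows as tab-separated rows."""
    with open(csv_file, "r", newline="") as fin, open(txt_file, "w") as fout:
        for row in csv.reader(fin):
            fout.write("\t".join(row) + "\n")


# Demo on a throwaway file
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "edges.csv")
txt_path = os.path.join(workdir, "edges.txt")
with open(csv_path, "w") as fout:
    fout.write("0,1\n1,2\n")
csv_to_txt(csv_path, txt_path)
with open(txt_path) as fin:
    converted = fin.read()
```

Using the csv module rather than a plain string replace keeps quoted fields that contain commas intact.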

image_feature_data(dataset, model='resnet50', batch_size=128)

Infer feature vectors on a dataset using a neural network.

Parameters

dataset (torch.utils.data.Dataset) – dataset
model (str or torch.nn.Module, optional) – pretrained model. If it is a str, use the last hidden layer of that model.
batch_size (int, optional) – batch size

induced_graph(graph_file, label_file, save_file)

Induce a subgraph from labeled nodes. All edges in the induced graph have at least one labeled node.

Parameters

graph_file (str) – graph file
label_file (str) – label file
save_file (str) – save file
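
The filtering rule (keep an edge if at least one endpoint is labeled) can be sketched as follows. The file formats are assumptions: one whitespace-separated edge per line in the graph file, and the node as the first token of each line in the label file. `induce_subgraph` is a hypothetical name.

```python
import os
import tempfile


def induce_subgraph(graph_file, label_file, save_file):
    """Hypothetical sketch: keep edges that touch at least one labeled node."""
    with open(label_file) as fin:
        labeled = {line.split()[0] for line in fin if line.strip()}
    with open(graph_file) as fin, open(save_file, "w") as fout:
        for line in fin:
            tokens = line.split()
            if len(tokens) < 2:
                continue  # skip blank or malformed lines
            u, v = tokens[:2]
            if u in labeled or v in labeled:
                fout.write(line)


# Demo: nodes a and b are labeled, so the edge "c d" is dropped
workdir = tempfile.mkdtemp()
graph_path = os.path.join(workdir, "graph.txt")
label_path = os.path.join(workdir, "label.txt")
save_path = os.path.join(workdir, "induced.txt")
with open(graph_path, "w") as fout:
    fout.write("a b\nb c\nc d\n")
with open(label_path, "w") as fout:
    fout.write("a 1\nb 2\n")
induce_subgraph(graph_path, label_path, save_path)
with open(save_path) as fin:
    induced = fin.read()
```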

link_prediction_split(graph_file, train_file, test_file, portion)

Split a graph for link prediction use. The test split will contain half true and half false edges.

Parameters

graph_file (str) – graph file
train_file (str) – train file
test_file (str) – test file
portion (float) – portion of test edges
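
A minimal in-memory sketch of such a split is shown below. It works on an edge list rather than files, treats the graph as undirected, and draws false edges uniformly from node pairs that are not connected; all of these are assumptions, and `split_for_link_prediction` is a hypothetical name.

```python
import random


def split_for_link_prediction(edges, portion, seed=0):
    """Hypothetical sketch: hold out a portion of edges and add as many false edges."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    num_test = int(len(edges) * portion)
    test_true, train = edges[:num_test], edges[num_test:]

    nodes = sorted({n for edge in edges for n in edge})
    existing = set(edges) | {(v, u) for u, v in edges}
    test_false = set()
    # Assumes the graph is sparse enough that num_test non-edges exist.
    while len(test_false) < num_test:
        u, v = rng.sample(nodes, 2)
        if (u, v) not in existing and (v, u) not in test_false:
            test_false.add((u, v))

    # Label true edges 1 and false edges 0.
    test = [(u, v, 1) for u, v in test_true]
    test += [(u, v, 0) for u, v in sorted(test_false)]
    return train, test


# Demo: a path graph 0-1-...-8 has 8 edges; a portion of 0.25 holds out 2 of them
path_edges = [(i, i + 1) for i in range(8)]
train_edges, test_edges = split_for_link_prediction(path_edges, 0.25)
```

The test list then contains equally many edges labeled 1 and 0, matching the "half true and half false" description.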

top_k_label(label_file, save_file, k, format='node-label')

Extract top-k labels.

Parameters

label_file (str) – label file
save_file (str) – save file
k (int) – top-k labels will be extracted
format (str, optional) – format of label file, can be 'node-label' or '(label)-nodes':
- node-label: each line is [node] [label]
- (label)-nodes: each line is [node]…, no explicit label
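
For the node-label format, the extraction can be sketched in memory as below. The sketch assumes that "top-k" means the k labels with the most nodes; `top_k_labels` is a hypothetical helper that works on (node, label) pairs instead of files.

```python
from collections import Counter


def top_k_labels(pairs, k):
    """Hypothetical sketch: keep only pairs whose label is among the k most frequent."""
    counts = Counter(label for _, label in pairs)
    keep = {label for label, _ in counts.most_common(k)}
    return [(node, label) for node, label in pairs if label in keep]


pairs = [("a", "x"), ("b", "x"), ("c", "y"), ("d", "z"), ("e", "y"), ("f", "x")]
filtered = top_k_labels(pairs, 2)  # x has 3 nodes, y has 2, z has 1
```

With k=2, the single node labeled z is dropped and the original order of the remaining pairs is preserved.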

class graphvite.dataset.BlogCatalog

BlogCatalog social network dataset.

Splits: train, label

class graphvite.dataset.Youtube

YouTube social network dataset.

Splits: train, label

class graphvite.dataset.Flickr

Flickr social network dataset.

Splits: train, label

class graphvite.dataset.Hyperlink2012

Hyperlink 2012 graph dataset.

Splits: pld_train, pld_test

class graphvite.dataset.Friendster

Friendster social network dataset.

Splits: train, small_train, label

class graphvite.dataset.Wikipedia

Wikipedia dump for word embedding.

Splits: train

class graphvite.dataset.FB15k

FB15k knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.FB15k237

FB15k-237 knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.WN18

WN18 knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.WN18RR

WN18RR knowledge graph dataset.

Splits: train, valid, test

class graphvite.dataset.Freebase

Freebase knowledge graph dataset.

Splits: train

class graphvite.dataset.MNIST

MNIST dataset for visualization.

Splits: train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data

class graphvite.dataset.CIFAR10

CIFAR10 dataset for visualization.

Splits: train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data

class graphvite.dataset.ImageNet

ImageNet dataset for visualization.

Splits: train_image, train_feature_data, train_label, train_hierarchical_label, valid_image, valid_feature_data, valid_label, valid_hierarchical_label