graphvite.dataset

Dataset module of GraphVite

Graph

Knowledge Graph

Visualization

class graphvite.dataset.Dataset(name, urls=None, members=None)[source]

Graph dataset.

Parameters
  • name (str) – name of dataset

  • urls (dict, optional) – URL(s) for each split; each value can be a str or a list of str

  • members (dict, optional) – zip member(s) for each split; leave empty for the default

A dataset contains several splits, such as train, valid and test. Each split has one or more URLs that specify the files to download. You may also specify the zip member to extract. When a split is accessed, it is automatically downloaded and decompressed if it is not already present.

You can assign a preprocessing step to each split by defining a method named [split]_preprocess:

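from graphvite.dataset import Dataset

class MyDataset(Dataset):
    def __init__(self):
        # urls maps each split name to its URL (a str or a list of str)
        super(MyDataset, self).__init__(
            "my_dataset",
            urls={
                "train": "url/to/train/split",
                "test": "url/to/test/split"
            }
        )

    def train_preprocess(self, input_file, output_file):
        # Copy the downloaded file unchanged; nothing is returned,
        # so the train split resolves to the output file name
        with open(input_file, "r") as fin, open(output_file, "w") as fout:
            fout.write(fin.read())

# Accessing the split downloads and preprocesses it if necessary
f = open(MyDataset().train)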

If the preprocess method returns a non-trivial value, that value is assigned to the split; otherwise the file name is assigned. By convention, only splits whose names end with _data return a non-trivial value.
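The rule above can be illustrated with a small, self-contained sketch that does not use GraphVite. The class `LazySplits`, its `_resolve` helper and the output file naming are hypothetical, downloading and extraction are replaced by local files, and "non-trivial" is assumed to mean "not None".

```python
import os
import tempfile


class LazySplits:
    """Hypothetical stand-in for a dataset whose splits resolve on first access."""

    def __init__(self, name, sources, path):
        self.name = name
        self.sources = sources  # split name -> local file (stands in for a URL)
        self.path = path

    def __getattr__(self, key):
        # Only reached when normal lookup fails, i.e. the split is not cached yet.
        if key not in self.__dict__.get("sources", {}):
            raise AttributeError(key)
        value = self._resolve(key)
        setattr(self, key, value)  # cache so later accesses bypass __getattr__
        return value

    def _resolve(self, key):
        source = self.sources[key]
        preprocess = getattr(self, key + "_preprocess", None)
        if preprocess is None:
            return source
        output_file = os.path.join(self.path, "%s_%s.txt" % (self.name, key))
        result = preprocess(source, output_file)
        # Assumption: "non-trivial" means "not None".
        return output_file if result is None else result


class Demo(LazySplits):
    def train_preprocess(self, input_file, output_file):
        # Returns None, so the split resolves to the output file name.
        with open(input_file) as fin, open(output_file, "w") as fout:
            fout.write(fin.read().upper())

    def label_data_preprocess(self, input_file, output_file):
        # Returns a list, so the split resolves to that list instead of a file name.
        with open(input_file) as fin:
            return [line.strip() for line in fin]


workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "raw.txt")
with open(raw, "w") as f:
    f.write("a b\nb c\n")

demo = Demo("demo", {"train": raw, "label_data": raw}, workdir)
print(demo.train)       # path of the preprocessed file
print(demo.label_data)  # value returned by the preprocess method
```

Caching the resolved value on the instance means the preprocessing runs at most once per split and per object, which matches the "if it is not present" behaviour only loosely; the real class presumably checks the file system as well.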

csv2txt(csv_file, txt_file)[source]

Convert csv to txt.

Parameters
  • csv_file – csv file

  • txt_file – txt file
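This section does not specify the layout of the txt file. A minimal sketch, assuming each CSV row becomes one line of tab-separated fields, could look as follows; the function name `csv_to_txt` and the `delimiter` parameter are hypothetical and not part of GraphVite.

```python
import csv
import os
import tempfile


def csv_to_txt(csv_file, txt_file, delimiter="\t"):
    """Rewrite each CSV row as one line of delimiter-separated fields."""
    with open(csv_file, newline="") as fin, open(txt_file, "w") as fout:
        for row in csv.reader(fin):
            fout.write(delimiter.join(row) + "\n")


# Small demonstration on a temporary edge list.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "edges.csv")
dst = os.path.join(workdir, "edges.txt")
with open(src, "w") as f:
    f.write("1,2\n2,3\n")

csv_to_txt(src, dst)
with open(dst) as f:
    converted = f.read()
```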

image_feature_data(dataset, model='resnet50', batch_size=128)[source]

Compute feature vectors for an image dataset using a neural network.

Parameters
  • dataset (torch.utils.data.Dataset) – dataset

  • model (str or torch.nn.Module, optional) – pretrained model. If it is a str, use the last hidden layer of that model.

  • batch_size (int, optional) – batch size

induced_graph(graph_file, label_file, save_file)[source]

Induce a subgraph from labeled nodes. All edges in the induced graph have at least one labeled node.

Parameters
  • graph_file (str) – graph file

  • label_file (str) – label file

  • save_file (str) – save file
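The selection rule stated above (keep an edge if at least one of its endpoints is labeled) can be sketched on in-memory data. The helper `induce_edges` is hypothetical; the real function works on files whose formats are not described in this section.

```python
def induce_edges(edges, labeled_nodes):
    """Keep only the edges that have at least one labeled endpoint."""
    labeled = set(labeled_nodes)
    return [(u, v) for u, v in edges if u in labeled or v in labeled]


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
labeled_nodes = ["a", "c"]

induced = induce_edges(edges, labeled_nodes)
print(induced)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

The edge ("d", "e") is dropped because neither endpoint carries a label.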

link_prediction_split(graph_file, files, portions)[source]

Divide a normal graph into a train split and several test splits for link prediction. Each test split contains half true and half false edges.

Parameters
  • graph_file (str) – graph file

  • files (list of str) – file names, the first file is treated as train file

  • portions (list of float) – split portions
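The split described above can be sketched on an in-memory edge list. Everything here is an assumption made for illustration: the function name `split_for_link_prediction`, the uniform sampling of false edges, and the 0/1 labels attached to test edges are not taken from GraphVite.

```python
import random


def split_for_link_prediction(edges, portions, seed=0):
    """Return (train_edges, test_splits); each test split is half true, half false edges."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    nodes = sorted({n for e in edges for n in e})
    existing = set(edges) | {(v, u) for u, v in edges}

    # Cut the shuffled edge list into consecutive chunks, one per portion.
    total = float(sum(portions))
    chunks, start = [], 0
    for i, portion in enumerate(portions):
        if i == len(portions) - 1:
            end = len(edges)  # last chunk takes the remainder
        else:
            end = start + int(len(edges) * portion / total)
        chunks.append(edges[start:end])
        start = end

    train, tests = chunks[0], []
    for positives in chunks[1:]:
        # Sample as many non-edges as there are true edges.
        # Note: this loop would not terminate on a complete graph.
        negatives = []
        while len(negatives) < len(positives):
            u, v = rng.choice(nodes), rng.choice(nodes)
            if u != v and (u, v) not in existing:
                negatives.append((u, v))
        tests.append([(u, v, 1) for u, v in positives] +
                     [(u, v, 0) for u, v in negatives])
    return train, tests


# A ring of 20 nodes: the first portion is the train split, the rest are test splits.
edges = [(i, (i + 1) % 20) for i in range(20)]
train, tests = split_for_link_prediction(edges, [0.8, 0.1, 0.1])
print(len(train), [len(t) for t in tests])
```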

top_k_label(label_file, save_file, k, format='node-label')[source]

Extract top-k labels.

Parameters
  • label_file (str) – label file

  • save_file (str) – save file

  • k (int) – top-k labels will be extracted

  • format (str, optional) – format of label file, can be 'node-label' or '(label)-nodes':

    • node-label: each line is [node] [label]

    • (label)-nodes: each line is [node]…, no explicit label
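For the node-label format, the extraction can be sketched on in-memory pairs. This assumes that "top-k" means the k labels attached to the most nodes, which the section does not state explicitly; the helper `top_k_node_labels` is hypothetical.

```python
from collections import Counter


def top_k_node_labels(pairs, k):
    """Keep only (node, label) pairs whose label is among the k most frequent."""
    counts = Counter(label for _, label in pairs)
    keep = {label for label, _ in counts.most_common(k)}
    return [(node, label) for node, label in pairs if label in keep]


pairs = [("n1", "x"), ("n2", "x"), ("n3", "y"),
         ("n4", "x"), ("n5", "y"), ("n6", "z")]

top = top_k_node_labels(pairs, 2)
print(top)  # the pair with the rare label "z" is dropped
```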

class graphvite.dataset.BlogCatalog[source]

BlogCatalog social network dataset.

Splits:

graph, label, train, test

The train and test splits are used for link prediction.

class graphvite.dataset.Youtube[source]

YouTube social network dataset.

Splits:

graph, label

class graphvite.dataset.Flickr[source]

Flickr social network dataset.

Splits:

graph, label

class graphvite.dataset.Hyperlink2012[source]

Hyperlink 2012 graph dataset.

Splits:

pld_train, pld_test

class graphvite.dataset.Friendster[source]

Friendster social network dataset.

Splits:

graph, small_graph, label

class graphvite.dataset.Wikipedia[source]

Wikipedia dump for word embedding.

Splits:

graph

class graphvite.dataset.Math[source]

Synthetic math knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.FB15k[source]

FB15k knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.FB15k237[source]

FB15k-237 knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.WN18[source]

WN18 knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.WN18RR[source]

WN18RR knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.Freebase[source]

Freebase knowledge graph dataset.

Splits:

train

class graphvite.dataset.MNIST[source]

MNIST dataset for visualization.

Splits:

train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data

class graphvite.dataset.CIFAR10[source]

CIFAR10 dataset for visualization.

Splits:

train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data

class graphvite.dataset.ImageNet[source]

ImageNet dataset for visualization.

Splits:

train_image, train_feature_data, train_label, train_hierarchical_label, valid_image, valid_feature_data, valid_label, valid_hierarchical_label