graphvite.dataset

Dataset module of GraphVite

Graph

Knowledge Graph

Visualization

class graphvite.dataset.Dataset(name, urls=None, members=None)

Graph dataset.

Parameters
  • name (str) – name of the dataset

  • urls (dict, optional) – URL(s) for each split; each value can be either a str or a list of str

  • members (dict, optional) – zip member(s) to extract for each split; leave empty to use the default
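Each value in urls (and members) may be a single str or a list of str, so code that consumes these dictionaries usually normalizes them first. The helper below is a hypothetical sketch of that normalization. normalize_splits is not part of GraphVite, and the URLs are placeholders.

```python
def normalize_splits(mapping):
    """Return a copy of `mapping` in which every value is a list of str.

    Hypothetical helper (not GraphVite API): mirrors the documented rule that
    each split maps to either a single str or a list of str.
    """
    normalized = {}
    for split, value in (mapping or {}).items():
        if isinstance(value, str):
            normalized[split] = [value]  # single URL / member -> one-element list
        else:
            normalized[split] = list(value)
    return normalized


urls = normalize_splits({
    "train": "https://example.com/data.zip",  # placeholder URL
    "label": ["https://example.com/a.txt", "https://example.com/b.txt"],
})
# urls == {"train": ["https://example.com/data.zip"],
#          "label": ["https://example.com/a.txt", "https://example.com/b.txt"]}
```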

Datasets contain several splits, such as train, valid and test. Each split has one or more URLs specifying the files to download, and you may also specify the zip member(s) to extract. When a split is accessed, it is automatically downloaded and decompressed if it is not already present.
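The "extract once, then reuse" behavior can be sketched with the standard zipfile module. This is a hypothetical illustration, not GraphVite's implementation. ensure_split is an invented name, and a local zip file stands in for a downloaded URL so that the sketch runs offline.

```python
import os
import shutil
import tempfile
import zipfile


def ensure_split(archive, member, cache_dir):
    """Return a local path to `member` inside `archive`, extracting it only once.

    Hypothetical helper: `archive` is a local zip file standing in for a
    downloaded URL, and `member` is the zip member to extract.
    """
    target = os.path.join(cache_dir, os.path.basename(member))
    if os.path.exists(target):
        return target  # already present: skip download and decompression
    os.makedirs(cache_dir, exist_ok=True)
    with zipfile.ZipFile(archive) as zf, zf.open(member) as src, \
            open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Demo: build a small archive, then resolve the same split twice.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "dataset.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data/edges.csv", "1,2\n2,3\n")

cache_dir = os.path.join(workdir, "cache")
first = ensure_split(archive, "data/edges.csv", cache_dir)
os.remove(archive)  # the second call must not need the archive any more
second = ensure_split(archive, "data/edges.csv", cache_dir)
```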

You can assign a preprocessing function to each split by defining a method named [split]_preprocess:

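from graphvite.dataset import Dataset


class MyDataset(Dataset):
    def __init__(self):
        # Per the constructor signature, split URLs are passed in the `urls` dict.
        super(MyDataset, self).__init__(
            "my_dataset",
            urls={
                "train": "url/to/train/split",
                "test": "url/to/test/split",
            },
        )

    # Preprocess hook for the "train" split. It returns None,
    # so accessing `train` yields a file name rather than data.
    def train_preprocess(self, input_file, output_file):
        with open(input_file, "r") as fin, open(output_file, "w") as fout:
            fout.write(fin.read())


# Accessing the split downloads and preprocesses it if necessary.
with open(MyDataset().train) as f:
    content = f.read()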

If the preprocessing function returns a non-trivial value, that value is assigned to the split; otherwise, the file name is assigned. By convention, only splits whose names end with _data return a non-trivial value.
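The sketch below models this resolution rule with a hypothetical MiniDataset class. It is not GraphVite's implementation, and downloading is deliberately left out. On first access, a split is passed through its [split]_preprocess hook if one exists. A non-None return value becomes the split's value; otherwise the output file name is used. The result is then cached on the instance, so the work happens only once.

```python
import os
import tempfile


class MiniDataset(object):
    """Hypothetical, simplified model of lazy split resolution.

    `raw` maps each split name to an already-downloaded local file;
    downloading and decompression are deliberately left out.
    """

    def __init__(self, raw):
        self.raw = dict(raw)
        self.workdir = tempfile.mkdtemp()

    def __getattr__(self, split):
        # Python only calls __getattr__ when normal lookup fails,
        # so each split is resolved once and then served from the instance.
        raw = self.__dict__.get("raw", {})
        if split not in raw:
            raise AttributeError(split)
        preprocess = getattr(type(self), split + "_preprocess", None)
        if preprocess is None:
            result = raw[split]  # no hook: expose the raw file name
        else:
            output_file = os.path.join(self.workdir, split)
            value = preprocess(self, raw[split], output_file)
            # A non-trivial (here: non-None) return value replaces the file name.
            result = output_file if value is None else value
        setattr(self, split, result)
        return result


class ToyDataset(MiniDataset):
    def train_preprocess(self, input_file, output_file):
        # Returns None, so `train` resolves to output_file.
        with open(input_file) as fin, open(output_file, "w") as fout:
            fout.write(fin.read().upper())

    def label_data_preprocess(self, input_file, output_file):
        # Name ends with _data: returns the parsed content itself.
        with open(input_file) as fin:
            return [line.split() for line in fin]


# Demo with two small input files.
src = tempfile.mkdtemp()
train_in = os.path.join(src, "train.txt")
label_in = os.path.join(src, "label.txt")
with open(train_in, "w") as f:
    f.write("a b\n")
with open(label_in, "w") as f:
    f.write("1 x\n2 y\n")

ds = ToyDataset({"train": train_in, "label_data": label_in})
print(ds.train)       # a file name inside ds.workdir
print(ds.label_data)  # [['1', 'x'], ['2', 'y']]
```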

csv2txt(csv_file, txt_file)

Convert csv to txt.

Parameters
  • csv_file – csv file

  • txt_file – txt file
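Below is a minimal stand-alone sketch of such a conversion using Python's csv module. The function name csv_to_txt and the tab separator are assumptions; this page does not document GraphVite's actual output separator.

```python
import csv
import os
import tempfile


def csv_to_txt(csv_file, txt_file, delimiter="\t"):
    """Hypothetical sketch of a CSV-to-text conversion.

    Assumption: the text format separates fields with `delimiter`
    (a tab by default); GraphVite's actual output separator may differ.
    """
    with open(csv_file, newline="") as fin, open(txt_file, "w") as fout:
        for row in csv.reader(fin):
            if row:  # skip blank lines
                fout.write(delimiter.join(row) + "\n")


# Demo: quoted fields keep their embedded commas.
src = tempfile.mkdtemp()
csv_path = os.path.join(src, "edges.csv")
txt_path = os.path.join(src, "edges.txt")
with open(csv_path, "w", newline="") as f:
    f.write('1,2\n2,3\n"a,b",c\n')
csv_to_txt(csv_path, txt_path)
with open(txt_path) as f:
    converted = f.read()
# converted == '1\t2\n2\t3\na,b\tc\n'
```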

image_feature_data(dataset, model='resnet50', batch_size=128)

Infer feature vectors on a dataset using a neural network.

Parameters
  • dataset (torch.utils.data.Dataset) – dataset

  • model (str or torch.nn.Module, optional) – pretrained model. If a str is given, features are taken from the last hidden layer of the model with that name.

  • batch_size (int, optional) – batch size
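The role of batch_size can be shown without a deep-learning framework. The sketch below is hypothetical: infer_features and the toy model are not part of GraphVite, and a real pipeline would typically use a torch DataLoader with a pretrained network. It feeds the dataset through a model callable batch_size samples at a time and concatenates the per-sample features. This is the pattern a batched feature extractor follows.

```python
def infer_features(dataset, model, batch_size=128):
    """Apply `model` to `dataset` in batches and concatenate the outputs.

    Hypothetical sketch: `dataset` is any sequence and `model` is any callable
    mapping a list of samples to a list of feature vectors.
    """
    features = []
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start:start + batch_size]
        features.extend(model(list(batch)))
    return features


calls = []


def toy_model(batch):
    # Stand-in for a network: records the batch size and returns 2-d features.
    calls.append(len(batch))
    return [[x, x * x] for x in batch]


features = infer_features(list(range(5)), toy_model, batch_size=2)
# features == [[0, 0], [1, 1], [2, 4], [3, 9], [4, 16]]; calls == [2, 2, 1]
```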

induced_graph(graph_file, label_file, save_file)

Induce a subgraph from labeled nodes. Every edge in the induced graph has at least one labeled endpoint.

Parameters
  • graph_file (str) – graph file

  • label_file (str) – label file

  • save_file (str) – save file
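In memory, the filtering rule is a single pass over the edge list. The minimal sketch below assumes the edges are (u, v) pairs and the labeled nodes are already loaded. induced_edges is a hypothetical name, and reading and writing the files is omitted.

```python
def induced_edges(edges, labeled_nodes):
    """Keep only edges with at least one endpoint in `labeled_nodes`.

    Hypothetical in-memory sketch of the rule stated above; reading the
    graph and label files and writing the save file are omitted.
    """
    labeled = set(labeled_nodes)
    return [(u, v) for u, v in edges if u in labeled or v in labeled]


edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
kept = induced_edges(edges, labeled_nodes=[1, 4])
# kept == [(1, 2), (3, 4)]
```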

Split a graph for use in link prediction. The test split contains half true edges and half false edges.

Parameters
  • graph_file (str) – graph file

  • train_file (str) – train file

  • test_file (str) – test file

  • portion (float) – portion of edges assigned to the test split
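One common way to build such a split is sketched below under stated assumptions: edges are undirected, negatives are sampled uniformly, and portion is a fraction in [0, 1]. The function name split_for_link_prediction is hypothetical. The sketch shuffles the edges and holds out portion of them as true test edges. It then samples the same number of node pairs that are not edges to serve as false test edges.

```python
import random


def split_for_link_prediction(edges, portion, seed=0):
    """Hold out `portion` of the edges and pair them with as many non-edges.

    Hypothetical sketch: returns (train, test), where `test` holds
    (u, v, label) triples, label 1 for held-out true edges and 0 for sampled
    false edges. Assumes undirected edges and enough non-edges to sample.
    """
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    num_test = int(len(edges) * portion)
    test_pos, train = edges[:num_test], edges[num_test:]

    nodes = sorted({n for edge in edges for n in edge})
    existing = {frozenset(edge) for edge in edges}
    test_neg = set()
    while len(test_neg) < num_test:
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        pair = frozenset((u, v))
        if pair not in existing:
            test_neg.add(pair)  # a set, so duplicates are ignored
    test = [(u, v, 1) for u, v in test_pos]
    test += [tuple(sorted(pair)) + (0,) for pair in test_neg]
    return train, test


ring = [(i, (i + 1) % 10) for i in range(10)]
train, test = split_for_link_prediction(ring, portion=0.2)
# len(train) == 8; test holds 2 true edges (label 1) and 2 false edges (label 0)
```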

top_k_label(label_file, save_file, k, format='node-label')

Extract top-k labels.

Parameters
  • label_file (str) – label file

  • save_file (str) – save file

  • k (int) – number of top labels to extract

  • format (str, optional) – format of the label file; can be 'node-label' or '(label)-nodes':

    • node-label: each line is [node] [label]

    • (label)-nodes: each line is [node]…, with no explicit label
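The sketch below shows top-k extraction for the node-label format. It rests on an assumption that this page does not state: "top-k" means the k labels attached to the most nodes, with ties broken by label. top_k_labels is a hypothetical name, and file I/O is omitted.

```python
from collections import defaultdict


def top_k_labels(pairs, k):
    """Keep the (node, label) pairs whose label is among the k largest labels.

    Hypothetical sketch for the node-label format; "largest" is assumed to
    mean attached to the most nodes, with ties broken by label.
    """
    label2nodes = defaultdict(list)
    for node, label in pairs:
        label2nodes[label].append(node)
    ranked = sorted(label2nodes, key=lambda label: (-len(label2nodes[label]), label))
    keep = set(ranked[:k])
    return [(node, label) for node, label in pairs if label in keep]


pairs = [("1", "a"), ("2", "a"), ("3", "b"), ("4", "a"), ("5", "c"), ("6", "b")]
kept = top_k_labels(pairs, k=2)
# kept == [("1", "a"), ("2", "a"), ("3", "b"), ("4", "a"), ("6", "b")]
```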

class graphvite.dataset.BlogCatalog

BlogCatalog social network dataset.

Splits:

train, label

class graphvite.dataset.Youtube

YouTube social network dataset.

Splits:

train, label

class graphvite.dataset.Flickr

Flickr social network dataset.

Splits:

train, label

class graphvite.dataset.Hyperlink2012

Hyperlink 2012 graph dataset.

Splits:

pld_train, pld_test

class graphvite.dataset.Friendster

Friendster social network dataset.

Splits:

train, small_train, label

class graphvite.dataset.Wikipedia

Wikipedia dump for training word embeddings.

Splits:

train

class graphvite.dataset.FB15k

FB15k knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.FB15k237

FB15k-237 knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.WN18

WN18 knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.WN18RR

WN18RR knowledge graph dataset.

Splits:

train, valid, test

class graphvite.dataset.Freebase

Freebase knowledge graph dataset.

Splits:

train

class graphvite.dataset.MNIST

MNIST dataset for visualization.

Splits:

train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data

class graphvite.dataset.CIFAR10

CIFAR-10 dataset for visualization.

Splits:

train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data

class graphvite.dataset.ImageNet

ImageNet dataset for visualization.

Splits:

train_image, train_feature_data, train_label, train_hierarchical_label, valid_image, valid_feature_data, valid_label, valid_hierarchical_label