graphvite.dataset
Dataset module of GraphVite
Graph
Knowledge Graph
Visualization
class graphvite.dataset.Dataset(name, urls=None, members=None)
Graph dataset.
- Parameters
name (str) – name of dataset
urls (dict, optional) – url(s) for each split, can be either str or list of str
members (dict, optional) – zip member(s) for each split, leave empty for default
Datasets contain several splits, such as train, valid and test. For each split, there are one or more URLs, specifying the file to download. You may also specify the zip member to extract. When a split is accessed, it will be automatically downloaded and decompressed if it is not present.
You can assign a preprocessing step to each split by defining a method named [split]_preprocess:
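    from graphvite.dataset import Dataset

    class MyDataset(Dataset):
        def __init__(self):
            # URLs are passed per split through the urls dict
            super(MyDataset, self).__init__(
                "my_dataset",
                urls={
                    "train": "url/to/train/split",
                    "test": "url/to/test/split",
                },
            )

        def train_preprocess(self, input_file, output_file):
            # Copy the downloaded file unchanged; nothing is returned,
            # so the split resolves to the file name
            with open(input_file, "r") as fin, open(output_file, "w") as fout:
                fout.write(fin.read())

    # Accessing the split downloads and preprocesses it if needed
    f = open(MyDataset().train)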
If the preprocess function returns a non-trivial value, that value is assigned to the split; otherwise the file name is assigned. By convention, only splits ending with _data have a non-trivial return value.
See also
Pre-defined preprocess functions: csv2txt(), top_k_label(), induced_graph(), link_prediction_split(), image_feature_data()
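The lazy split lookup and the [split]_preprocess convention described above can be sketched without GraphVite. This is only an illustration under stated assumptions: LazySplits and ToyDataset are hypothetical names, the download step is replaced by files that already exist locally, and a None return is treated as the "trivial" value.

```python
import os
import tempfile


class LazySplits:
    """Hypothetical stand-in for lazy split resolution (not GraphVite code)."""

    def __init__(self, name, sources):
        self.name = name
        self.sources = sources  # split name -> path of a local raw file
        self.root = tempfile.mkdtemp(prefix=name + "_")
        self._cache = {}

    def __getattr__(self, split):
        # Called only when normal attribute lookup fails, i.e. for split names.
        sources = self.__dict__.get("sources", {})
        if split not in sources:
            raise AttributeError(split)
        cache = self.__dict__["_cache"]
        if split not in cache:
            cache[split] = self._prepare(split)
        return cache[split]

    def _prepare(self, split):
        raw_file = self.sources[split]
        # Look for a method named [split]_preprocess on the class.
        preprocess = getattr(type(self), split + "_preprocess", None)
        if preprocess is None:
            return raw_file
        output_file = os.path.join(self.root, split + ".txt")
        result = preprocess(self, raw_file, output_file)
        # Assumption: None counts as trivial, so fall back to the file name.
        return output_file if result is None else result


class ToyDataset(LazySplits):
    def train_preprocess(self, input_file, output_file):
        # Returns nothing, so the split resolves to output_file.
        with open(input_file) as fin, open(output_file, "w") as fout:
            fout.write(fin.read().upper())

    def label_data_preprocess(self, input_file, output_file):
        # Returns a value, so the split resolves to that value.
        with open(input_file) as fin:
            return [line.strip() for line in fin]
```

With this sketch, a split without a preprocess method resolves to its raw file, train resolves to the preprocessed file name, and label_data resolves to the returned list.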
csv2txt(csv_file, txt_file)
Convert csv to txt.
- Parameters
csv_file – csv file
txt_file – txt file
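As a rough illustration of this kind of conversion, the sketch below rewrites each CSV row as one line of delimited text. The name csv_to_txt and the tab delimiter are assumptions made for the example; the text above does not say which separator csv2txt() writes.

```python
import csv
import os
import tempfile


def csv_to_txt(csv_file, txt_file, delimiter="\t"):
    """Rewrite a CSV file as delimiter-separated text, one row per line."""
    with open(csv_file, newline="") as fin, open(txt_file, "w") as fout:
        for row in csv.reader(fin):
            fout.write(delimiter.join(row) + "\n")


# Usage on a throwaway file.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "edges.csv")
txt_path = os.path.join(workdir, "edges.txt")
with open(csv_path, "w") as f:
    f.write("a,b\nb,c\n")
csv_to_txt(csv_path, txt_path)
with open(txt_path) as f:
    converted = f.read()
```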
image_feature_data(dataset, model='resnet50', batch_size=128)
Compute feature vectors for an image dataset using a neural network.
- Parameters
dataset (torch.utils.data.Dataset) – dataset
model (str or torch.nn.Module, optional) – pretrained model. If it is a str, use the last hidden layer of that model.
batch_size (int, optional) – batch size
induced_graph(graph_file, label_file, save_file)
Induce a subgraph from labeled nodes. All edges in the induced graph have at least one labeled node.
- Parameters
graph_file (str) – graph file
label_file (str) – label file
save_file (str) – save file
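The edge filter described here can be sketched in memory. This is not the library's implementation: induce_edges is a hypothetical helper, and it works on Python lists instead of files because the file formats are not specified above.

```python
def induce_edges(edges, labeled_nodes):
    """Keep the edges that touch at least one labeled node."""
    labeled = set(labeled_nodes)
    return [(u, v) for u, v in edges if u in labeled or v in labeled]


# Usage: only the edge between two unlabeled nodes is dropped.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
induced = induce_edges(edges, labeled_nodes=["a", "d"])
```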
link_prediction_split(graph_file, files, portions)
Divide a normal graph into a train split and several test splits for link prediction. Each test split contains half true and half false edges.
- Parameters
graph_file (str) – graph file
files (list of str) – file names, the first file is treated as train file
portions (list of float) – split portions
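One way to picture such a split is the in-memory sketch below. It rests on several assumptions that the text above does not confirm: the graph is treated as undirected, portions are fractions of the edge count, false edges are sampled uniformly from unconnected node pairs, and the name split_for_link_prediction and the (u, v, label) output layout are invented for the example. Enumerating all non-edges is only practical for small graphs.

```python
import itertools
import random


def split_for_link_prediction(edges, portions, seed=0):
    """Return (train_edges, tests); each test is a list of (u, v, is_true)."""
    rng = random.Random(seed)
    edges = [tuple(edge) for edge in edges]
    rng.shuffle(edges)

    # Cut the shuffled edge list into consecutive chunks, one per portion.
    chunks, start = [], 0
    for portion in portions:
        end = start + int(round(portion * len(edges)))
        chunks.append(edges[start:end])
        start = end

    # Candidate false edges: node pairs that are not connected.
    nodes = sorted({node for edge in edges for node in edge})
    existing = {frozenset(edge) for edge in edges}
    non_edges = [
        pair for pair in itertools.combinations(nodes, 2)
        if frozenset(pair) not in existing
    ]

    train, tests = chunks[0], []
    for true_edges in chunks[1:]:
        # Same number of false edges as true edges in every test split.
        false_edges = rng.sample(non_edges, len(true_edges))
        tests.append(
            [(u, v, 1) for u, v in true_edges]
            + [(u, v, 0) for u, v in false_edges]
        )
    return train, tests


# Usage: a 10-node ring, split 80% / 10% / 10%.
ring = [(i, (i + 1) % 10) for i in range(10)]
train_edges, test_splits = split_for_link_prediction(ring, [0.8, 0.1, 0.1])
```

The first portion plays the role of the train file, mirroring the rule that the first entry of files is treated as the train file.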
top_k_label(label_file, save_file, k, format='node-label')
Extract top-k labels.
- Parameters
label_file (str) – label file
save_file (str) – save file
k (int) – top-k labels will be extracted
format (str, optional) – format of label file, can be 'node-label' or '(label)-nodes':
  node-label: each line is [node] [label]
  (label)-nodes: each line is [node]…, no explicit label
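For the node-label format, the extraction might look like the sketch below. It assumes that "top-k" means the k labels attached to the most nodes, which the text above does not state; top_k_labels is a hypothetical in-memory helper, not the library function.

```python
from collections import Counter


def top_k_labels(node_labels, k):
    """Keep the (node, label) pairs whose label is among the k most frequent."""
    counts = Counter(label for _, label in node_labels)
    keep = {label for label, _ in counts.most_common(k)}
    return [(node, label) for node, label in node_labels if label in keep]


# Usage: label "z" is the least frequent and is dropped for k=2.
pairs = [
    ("n1", "x"), ("n2", "x"), ("n3", "y"),
    ("n4", "z"), ("n5", "y"), ("n6", "x"),
]
filtered = top_k_labels(pairs, k=2)
```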
class graphvite.dataset.BlogCatalog
BlogCatalog social network dataset.
- Splits:
graph, label, train, test
The train and test splits are used for link prediction.
class graphvite.dataset.Hyperlink2012
Hyperlink 2012 graph dataset.
- Splits:
pld_train, pld_test
class graphvite.dataset.Friendster
Friendster social network dataset.
- Splits:
graph, small_graph, label
class graphvite.dataset.Math
Synthetic math knowledge graph dataset.
- Splits:
train, valid, test
class graphvite.dataset.FB15k237
FB15k-237 knowledge graph dataset.
- Splits:
train, valid, test
class graphvite.dataset.MNIST
MNIST dataset for visualization.
- Splits:
train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data