graphvite.dataset
Dataset module of GraphVite
Graph
Knowledge Graph
Visualization
-
class graphvite.dataset.Dataset(name, urls=None, members=None)
Graph dataset.
- Parameters
name (str) – name of dataset
urls (dict, optional) – url(s) for each split, can be either str or list of str
members (dict, optional) – zip member(s) for each split, leave empty for default
Datasets contain several splits, such as train, valid and test. For each split, there are one or more URLs, specifying the file to download. You may also specify the zip member to extract. When a split is accessed, it will be automatically downloaded and decompressed if it is not present.
You can assign a preprocessing step to each split by defining a method named [split]_preprocess:

    class MyDataset(Dataset):
        def __init__(self):
            super(MyDataset, self).__init__(
                "my_dataset",
                urls={
                    "train": "url/to/train/split",
                    "test": "url/to/test/split",
                },
            )

        def train_preprocess(self, input_file, output_file):
            # Copy the downloaded file unchanged
            with open(input_file, "r") as fin, open(output_file, "w") as fout:
                fout.write(fin.read())

    # Accessing a split downloads and preprocesses it on demand
    f = open(MyDataset().train)
If the preprocess returns a non-trivial value, then it is assigned to the split; otherwise the file name is assigned. By convention, only splits ending with _data have non-trivial return values.
See also
Pre-defined preprocess functions: csv2txt(), top_k_label(), induced_graph(), link_prediction_split(), image_feature_data()
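The split and preprocess convention described above can be illustrated with a small stand-alone sketch. This is not GraphVite's implementation: the class `SketchDataset`, its `files` argument, the `.processed` suffix and the helper `demo` are all hypothetical, and downloading is replaced by local temporary files.

```python
import os
import tempfile


class SketchDataset:
    """Toy stand-in for the [split]_preprocess convention (hypothetical)."""

    def __init__(self, name, files):
        self.name = name
        self.files = files  # split name -> path of the raw file

    def __getattr__(self, split):
        # Only reached when normal lookup fails, i.e. for split names.
        files = self.__dict__.get("files", {})
        if split not in files:
            raise AttributeError(split)
        raw_file = files[split]
        preprocess = getattr(type(self), split + "_preprocess", None)
        if preprocess is None:
            return raw_file  # no preprocess: the split is the raw file name
        output_file = raw_file + ".processed"
        result = preprocess(self, raw_file, output_file)
        # A non-trivial return value replaces the file name.
        return output_file if result is None else result


class ToyDataset(SketchDataset):
    def train_preprocess(self, input_file, output_file):
        # Returns None, so the split resolves to the output file name.
        with open(input_file) as fin, open(output_file, "w") as fout:
            fout.write(fin.read().upper())

    def label_data_preprocess(self, input_file, output_file):
        # Returns parsed data, so the split resolves to this list.
        with open(input_file) as fin:
            return [line.split() for line in fin]


def demo():
    """Create three small raw files and wrap them in a ToyDataset."""
    tmp = tempfile.mkdtemp()
    contents = {"train": "a b\n", "label_data": "a 1\nb 2\n", "test": "c d\n"}
    paths = {}
    for split, text in contents.items():
        paths[split] = os.path.join(tmp, split + ".txt")
        with open(paths[split], "w") as f:
            f.write(text)
    return ToyDataset("toy", paths), paths
```

In this sketch, `dataset.test` yields the raw file name, `dataset.train` yields the name of the processed file, and `dataset.label_data` yields the parsed list, mirroring the convention that only `_data` splits return values.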
-
csv2txt(csv_file, txt_file)
Convert csv to txt.
- Parameters
csv_file – csv file
txt_file – txt file
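As a rough illustration of this kind of conversion, the sketch below rewrites comma-separated rows as delimiter-separated text lines. The function name `csv_to_txt_sketch` and the tab delimiter are assumptions made for the example; the exact output format written by `csv2txt()` is not specified here.

```python
import csv


def csv_to_txt_sketch(csv_file, txt_file, delimiter="\t"):
    """Rewrite each CSV row as one delimiter-separated text line."""
    with open(csv_file, newline="") as fin, open(txt_file, "w") as fout:
        for row in csv.reader(fin):
            # The tab delimiter is an assumption, not GraphVite's documented format.
            fout.write(delimiter.join(row) + "\n")
```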
-
image_feature_data(dataset, model='resnet50', batch_size=128)
Infer feature vectors on a dataset using a neural network.
- Parameters
dataset (torch.utils.data.Dataset) – dataset
model (str or torch.nn.Module, optional) – pretrained model. If it is a str, use the last hidden layer of that model.
batch_size (int, optional) – batch size
-
induced_graph(graph_file, label_file, save_file)
Induce a subgraph from labeled nodes. Every edge in the induced graph has at least one labeled node.
- Parameters
graph_file (str) – graph file
label_file (str) – label file
save_file (str) – save file
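The edge filter described above (keep an edge when at least one endpoint is labeled) can be sketched in memory as follows. The function `induced_edges` and its list-based inputs are hypothetical; the real function works on files whose formats are not described here.

```python
def induced_edges(edges, labeled_nodes):
    """Keep the edges that touch at least one labeled node."""
    labeled = set(labeled_nodes)
    return [(u, v) for u, v in edges if u in labeled or v in labeled]


# Example: a path graph a - b - c - d in which only "b" is labeled.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
kept_edges = induced_edges(path_edges, ["b"])
```

Note that under this rule an edge between one labeled and one unlabeled node is kept, so the result may contain unlabeled nodes.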
-
link_prediction_split(graph_file, train_file, test_file, portion)
Split a graph for link prediction. The test split contains half true edges and half false edges.
- Parameters
graph_file (str) – graph file
train_file (str) – train file
test_file (str) – test file
portion (float) – portion of test edges
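A minimal in-memory sketch of such a split is shown below. It assumes an undirected graph, treats `portion` as a fraction between 0 and 1, and draws false edges uniformly from node pairs that are not connected; the function name, the `seed` argument and the `(u, v, label)` output format are hypothetical and do not describe GraphVite's file formats.

```python
import random


def split_for_link_prediction(edges, portion, seed=0):
    """Return (train_edges, test_edges); test edges are (u, v, label) triples."""
    rng = random.Random(seed)
    edges = list(edges)
    # Record both orientations so sampled false edges never match a true edge.
    seen = set(edges) | {(v, u) for u, v in edges}
    nodes = sorted({node for edge in edges for node in edge})

    rng.shuffle(edges)
    num_test = int(len(edges) * portion)
    true_test, train = edges[:num_test], edges[num_test:]

    # Sample as many false edges as true test edges.
    # Assumes the graph is sparse enough that enough non-edges exist.
    false_test = []
    while len(false_test) < len(true_test):
        u, v = rng.sample(nodes, 2)
        if (u, v) not in seen:
            false_test.append((u, v))
            seen.update([(u, v), (v, u)])  # avoid duplicate false edges

    test = [(u, v, 1) for u, v in true_test] + [(u, v, 0) for u, v in false_test]
    return train, test
```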
-
top_k_label(label_file, save_file, k, format='node-label')
Extract top-k labels.
- Parameters
label_file (str) – label file
save_file (str) – save file
k (int) – top-k labels will be extracted
format (str, optional) – format of label file, can be 'node-label' or '(label)-nodes'
node-label: each line is [node] [label]
(label)-nodes: each line is [node]…, no explicit label
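One plausible reading of "top-k labels" is the k labels that occur most often. Under that assumption, the selection can be sketched for in-memory (node, label) pairs as below; the function `top_k_pairs` is hypothetical and file parsing is left out.

```python
from collections import Counter


def top_k_pairs(pairs, k):
    """Keep only the (node, label) pairs whose label is among the k most frequent."""
    counts = Counter(label for _, label in pairs)
    keep = {label for label, _ in counts.most_common(k)}
    return [(node, label) for node, label in pairs if label in keep]


# Example: label "y" occurs 3 times, "x" twice, "z" once.
sample_pairs = [("a", "x"), ("b", "x"), ("c", "y"), ("d", "y"), ("e", "y"), ("f", "z")]
top_two = top_k_pairs(sample_pairs, 2)
```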
-
class graphvite.dataset.BlogCatalog
BlogCatalog social network dataset.
- Splits:
train, label
-
class graphvite.dataset.Hyperlink2012
Hyperlink 2012 graph dataset.
- Splits:
pld_train, pld_test
-
class graphvite.dataset.Friendster
Friendster social network dataset.
- Splits:
train, small_train, label
-
class graphvite.dataset.FB15k237
FB15k-237 knowledge graph dataset.
- Splits:
train, valid, test
-
class graphvite.dataset.MNIST
MNIST dataset for visualization.
- Splits:
train_image_data, train_label_data, test_image_data, test_label_data, image_data, label_data