metapath-generator's Introduction

Metapath guided random walk generator

$ python3 gene_walk.py [dataset] [full_graph] [length_of_walk] [coverage] [multiprocessing]

or using the other version that implemented by dictionary instead of networkx.

$ python3 gene_walk_dict.py [dataset] [full_graph] [length_of_walk] [coverage] [multiprocessing]

The latter runs faster.

The parameters are the following:

dataset: the name of the dataset
full_graph: when 1, using full graph, otherwise use [dataset].edges.lp.train for link prediction
length_of_walk: length of the walk, metapath2vec set it to 100
coverage: the numwalks parameters, the number of times each node was covered
multiprocessing: the number of processes to run the generation.

[dataset].edges: id pairs of edges. [plain txt]
[dataset].type: a pickle dumpped file in this form - [id1_type, id2_type, ...]. Type is represented by char, such as A. [binary]
[dataset].edge.lp.train: subgraph for training of link prediction [plain text]
[dataset].metapath: a set of metapath separated by \n. [plain text]

[file name]: useage [if it's the input/output of some model or neither]

embeddings: generated embeddings from metapath2vec++ [output]
hin_data: reformatted data for hin2vec [input]
hin_embeddings: generated embeddings from hin2vec [output]
metapath: generated walks composed by id's only [neither]
metapath_100_1000: huge length of walks (metapath2vec's authors' suggestion) [neigher]
metapath2vec: metapath2vec source code [src]
pte: [NOT USED]
typed_walk: adding type before each id's , input of metapath2vec [input]

Recommend Projects