NeuralSubgraphCounting

This repository is an official implementation of the paper Neural Subgraph Isomorphism Counting.

Introduction

We propose a learning framework that augments different representation learning architectures and iteratively attends to pattern and target data graphs to memorize subgraph isomorphisms for global counting.

Overview

Representation

We can use the minimum code (the DFS code with the minimum lexicographic order) defined by Xifeng Yan to convert a graph into a sequence and then apply sequence models, e.g., CNN, LSTM, and Transformer-XL. A more direct approach is to use graph convolutional networks to learn representations, e.g., RGCN and RGIN.
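
As a rough illustration (an assumption about the general gSpan-style encoding; dfs_code is a hypothetical helper, not the repository's exact input format), each edge in DFS order becomes a (u, v, vertex label, edge label, vertex label) tuple, and the lexicographically smallest such sequence is the minimum code:

def dfs_code(edges, vlabels, elabels):
    # Hypothetical helper: serialize edges (already listed in DFS order)
    # into (u, v, l_u, l_uv, l_v) tuples for a sequence model.
    return [(u, v, vlabels[u], elabels[(u, v)], vlabels[v]) for u, v in edges]

# A labeled triangle:
code = dfs_code(
    [(0, 1), (1, 2), (2, 0)],
    vlabels={0: "A", 1: "B", 2: "C"},
    elabels={(0, 1): "x", (1, 2): "y", (2, 0): "x"},
)
# -> [(0, 1, 'A', 'x', 'B'), (1, 2, 'B', 'y', 'C'), (2, 0, 'C', 'x', 'A')]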

As for the interaction module, simple pooling is obviously not enough. We design the Memory Attention Predict Network (MemAttnPredictNet) and the Dynamic Intermedium Attention Memory Network (DIAMNet); you can try them in the reproduction part below.

(Figure: the DIAMNet architecture)
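
To convey the idea, here is a minimal sketch of a dynamic attention memory (a simplification under stated assumptions, not the repository's DIAMNet implementation): a fixed-length memory, initialized by pooling, alternately attends to the pattern and graph encodings for a few recurrent steps and is then decoded into a count.

import torch
import torch.nn as nn

class DIAMSketch(nn.Module):
    # Simplified sketch; assumes a recent PyTorch with batch_first
    # MultiheadAttention. Shapes: p_enc (B, P, D), g_enc (B, G, D).
    def __init__(self, dim, mem_len=4, steps=3, heads=1):
        super().__init__()
        self.mem_len, self.steps = mem_len, steps
        self.attn_p = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_g = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, p_enc, g_enc):
        # Initialize the memory by mean pooling (cf. --predict_net_mem_init mean).
        mem = g_enc.mean(dim=1, keepdim=True).repeat(1, self.mem_len, 1)
        for _ in range(self.steps):
            mem, _ = self.attn_p(mem, p_enc, p_enc)  # read the pattern
            mem, _ = self.attn_g(mem, g_enc, g_enc)  # read the data graph
        return self.out(mem.mean(dim=1))             # regress the count

counts = DIAMSketch(dim=64)(torch.randn(2, 8, 64), torch.randn(2, 100, 64))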

Reproduction

Package Dependencies

  • tqdm
  • numpy
  • pandas
  • scipy
  • tensorboardX
  • python-igraph == 0.9.11
  • torch >= 1.3.0
  • dgl == 0.4.3post2

Data Generation

The data used in the KDD paper is available on OneDrive.

You can also generate data by modifying run.py to set CONFIG and running:

cd generator
python run.py

For the MUTAG data, you can use mutag_convertor.py to generate the raw graphs:

cd convertor
python mutag_convertor.py

You can use generator/mutag_generator.py to generate patterns, but be careful of duplicates.

Model Training/Finetuning

For the small dataset, just run:

cd src
python train.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 512 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/small/patterns \
    --graph_dir ../data/small/graphs \
    --metadata_dir ../data/small/metadata \
    --save_data_dir ../data/small \
    --save_model_dir ../dumps/small/RGIN-SumPredictNet
cd src
python train.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 3 \
    --gpu_id 0 --batch_size 512 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/small/patterns \
    --graph_dir ../data/small/graphs \
    --metadata_dir ../data/small/metadata \
    --save_data_dir ../data/small \
    --save_model_dir ../dumps/small/RGIN-DIAMNet

We find that loading the encoder module from RGIN-SumPredictNet results in faster convergence of RGIN-DIAMNet:

cd src
python finetune.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 3 \
    --gpu_id 0 --batch_size 512 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/small/patterns \
    --graph_dir ../data/small/graphs \
    --metadata_dir ../data/small/metadata \
    --save_data_dir ../data/small \
    --save_model_dir ../dumps/small/RGIN-DIAMNet \
    --load_model_dir ../dumps/small/RGIN-SumPredictNet

For the large dataset, just run:

cd src
python finetune.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 128 --update_every 4 \
    --max_npv 16 --max_npe 16 --max_npvl 16 --max_npel 16 \
    --max_ngv 512 --max_nge 2048 --max_ngvl 64 --max_ngel 64 \
    --pattern_dir ../data/large/patterns \
    --graph_dir ../data/large/graphs \
    --metadata_dir ../data/large/metadata \
    --save_data_dir ../data/large \
    --save_model_dir ../dumps/large/RGIN-SumPredictNet \
    --load_model_dir ../dumps/small/RGIN-SumPredictNet
cd src
python finetune.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 3 \
    --gpu_id 0 --batch_size 128 --update_every 4 \
    --max_npv 16 --max_npe 16 --max_npvl 16 --max_npel 16 \
    --max_ngv 512 --max_nge 2048 --max_ngvl 64 --max_ngel 64 \
    --pattern_dir ../data/large/patterns \
    --graph_dir ../data/large/graphs \
    --metadata_dir ../data/large/metadata \
    --save_data_dir ../data/large \
    --save_model_dir ../dumps/large/RGIN-DIAMNet \
    --load_model_dir ../dumps/small/RGIN-DIAMNet

For the MUTAG dataset, you need to set train_ratio manually:

cd src
python train_mutag.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 64 \
    --max_npv 4 --max_npe 3 --max_npvl 2 --max_npel 2 \
    --max_ngv 28 --max_nge 66 --max_ngvl 7 --max_ngel 4 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG/RGIN-SumPredictNet-0.4 \
    --save_model_dir ../dumps/MUTAG \
    --train_ratio 0.4

Transfer learning can improve performance when training data are limited:

cd src
python finetune_mutag.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 64 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG \
    --save_model_dir ../dumps/MUTAG/Transfer-RGIN-SumPredictNet-0.4 \
    --train_ratio 0.4 \
    --load_model_dir ../dumps/small/RGIN-SumPredictNet

RGIN-DIAMNet is difficult to converge on MUTAG, so we load RGIN-SumPredictNet and replace the interaction module, for both MeanMemAttnPredictNet and DIAMNet:

cd src
python finetune_mutag.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 1 \
    --gpu_id 0 --batch_size 64 \
    --max_npv 4 --max_npe 3 --max_npvl 2 --max_npel 2 \
    --max_ngv 28 --max_nge 66 --max_ngvl 7 --max_ngel 4 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG \
    --save_model_dir ../dumps/MUTAG/RGIN-DIAMNet-0.4 \
    --train_ratio 0.4 \
    --load_model_dir ../dumps/MUTAG/RGIN-SumPredictNet-0.4
cd src
python finetune_mutag.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 1 \
    --gpu_id 0 --batch_size 64 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG \
    --save_model_dir ../dumps/MUTAG/Transfer-RGIN-DIAMNet-0.4 \
    --train_ratio 0.4 \
    --load_model_dir ../dumps/MUTAG/Transfer-RGIN-SumPredictNet-0.4

Model Evaluation

cd src
python evaluate.py ../dumps/small/RGIN-DIAMNet

Citation

The details of this pipeline are described in the following paper. If you use this code in your work, please cite it:

@inproceedings{liu2020neuralsubgraphcounting,
  author    = {Xin Liu and Haojie Pan and Mutian He and Yangqiu Song and Xin Jiang and Lifeng Shang},
  title     = {Neural Subgraph Isomorphism Counting},
  booktitle = {ACM SIGKDD Conference on Knowledge Discovery and Data Mining, {KDD} 2020, August 23-27, 2020, San Diego, United States},
  year      = {2020}
}

Miscellaneous

Please send any questions about the code and/or the algorithm to [email protected].


Issues

Data generation fails with "ValueError: sequence must not be empty"

I created a conda environment with Python 3.7 and installed all the required packages in requirements.txt. Then I tried to generate a dataset with the following commands:

cd generator
python run.py

However, it first raised "ModuleNotFoundError: No module named 'igraph'".

Then I installed the igraph package with "pip install python-igraph", but another error was raised:

python run.py
patterns_id P_N3_E2_NL2_EL2
patterns_id P_N3_E4_NL2_EL2
patterns_id P_N3_E4_NL2_EL4
patterns_id P_N4_E4_NL2_EL2
patterns_id P_N4_E4_NL2_EL4
patterns_id P_N4_E4_NL4_EL2
patterns_id P_N4_E4_NL4_EL4
7 patterns generation finished!
  0%|                                                                                                                                                              | 0/112 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/jiaang/.conda/envs/subgraph/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "run.py", line 97, in generate_graphs
    alpha, max_pattern_counts=max_pattern_counts, max_subgraph=max_subgraph, return_subisomorphisms=True)
  File "/data2/jiaang/github/NeuralSubgraphCounting/generator/graph_generator.py", line 484, in generate
    self.update_subgraphs(subgraphs, graph_edge_label_mapping)
  File "/data2/jiaang/github/NeuralSubgraphCounting/generator/graph_generator.py", line 331, in update_subgraphs
    subgraph.es["label"] = new_edge_labels_in_subgraphs[sg]
ValueError: sequence must not be empty
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 204, in <module>
    x.get()
  File "/home/jiaang/.conda/envs/subgraph/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: sequence must not be empty

Could you help me solve this issue?

Which torch and dgl versions were used to save the small dataset?

I downloaded your small dataset, but I could not manage to load it. I tried different versions of torch and dgl.
I suspect this is a version problem. Could you tell us which versions of torch and dgl you used when saving the dataset?
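
For reference, the dependency list above pins python-igraph == 0.9.11, torch >= 1.3.0, and dgl == 0.4.3post2, which are presumably the versions the dataset was saved with. A quick way to compare your environment:

import torch
import dgl
# Print the locally installed versions to compare against the pinned
# requirements (torch >= 1.3.0, dgl == 0.4.3post2).
print(torch.__version__, dgl.__version__)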

The meaning of "int2onehot"

Hi Xin:

May I know the meaning of "int2onehot" in your code? Or can you give an example of how it is calculated?

Thanks.
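
For what it's worth, a minimal sketch of what a function named int2onehot typically computes (an assumption, not necessarily this repository's implementation): it maps integer label ids to one-hot vectors.

import numpy as np

def int2onehot(x, num_classes):
    # Hypothetical reconstruction: row i is the one-hot encoding of x[i].
    x = np.asarray(x)
    onehot = np.zeros((len(x), num_classes), dtype=np.float32)
    onehot[np.arange(len(x)), x] = 1.0
    return onehot

# int2onehot([0, 2, 1], 3)
# -> [[1, 0, 0],
#     [0, 0, 1],
#     [0, 1, 0]]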

MaxGatedFilterNet

Hi Xin Liu:

import torch
import torch.nn as nn

class MaxGatedFilterNet(nn.Module):
    def __init__(self):
        super(MaxGatedFilterNet, self).__init__()

    def forward(self, p_x, g_x):
        max_x = torch.max(p_x, dim=1, keepdim=True)[0]
        if max_x.dim() == 2:
            return g_x <= max_x
        else:
            return (g_x <= max_x).all(keepdim=True, dim=2)

May I know the meaning of MaxGatedFilterNet? It looks different from the filter network in the paper. Can you give an example of how it works?
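
A worked example of the 3-D branch (my reading of the code above, not an authoritative answer): with one-hot vertex labels, the max over the pattern dimension yields a multi-hot vector of all labels present in the pattern, and the mask keeps exactly the graph vertices whose labels appear in the pattern.

import torch

p_x = torch.tensor([[[1., 0., 0.],    # pattern vertex with label 0
                     [0., 1., 0.]]])  # pattern vertex with label 1
g_x = torch.tensor([[[1., 0., 0.],    # graph vertex with label 0
                     [0., 0., 1.]]])  # graph vertex with label 2

max_x = torch.max(p_x, dim=1, keepdim=True)[0]  # [[[1., 1., 0.]]]
mask = (g_x <= max_x).all(keepdim=True, dim=2)  # [[[True], [False]]]
# The label-2 vertex cannot participate in any match, so it is filtered out.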

KeyError: 'return_weights'

Hi Xin,

When I try to finetune the model, I get the error KeyError: 'return_weights'.
I read the code of finetune.py, but I couldn't find where return_weights is defined in finetune_config.
Could you tell me how I can debug this?

Thanks a lot.
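
One possible workaround (an assumption on my part: the flag-style configs above suggest finetune_config behaves like a plain dict; this is not an official fix) is to supply a default before the model is built:

# Hypothetical workaround: give the missing key a default value
# before the predict network is constructed.
finetune_config.setdefault("return_weights", False)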

AttributeError: 'numpy.ndarray' object has no attribute 'device'

python train_mutag.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 64 \
    --max_npv 4 --max_npe 3 --max_npvl 2 --max_npel 2 \
    --max_ngv 28 --max_nge 66 --max_ngvl 7 --max_ngel 4 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG/RGIN-SumPredictNet-0.4 \
    --save_model_dir ../dumps/MUTAG \
    --train_ratio 0.4

The following error is raised:

Traceback (most recent call last):
  File "/home/faker/Desktop/code/NeuralSubgraphCounting-master/src/train_mutag.py", line 249, in <module>
    dataset = GraphAdjDataset(x)
  File "/home/faker/Desktop/code/NeuralSubgraphCounting-master/src/dataset.py", line 218, in __init__
    self.data = GraphAdjDataset.preprocess_batch(data, use_tqdm=True)
  File "/home/faker/Desktop/code/NeuralSubgraphCounting-master/src/dataset.py", line 319, in preprocess_batch
    d.append(GraphAdjDataset.preprocess(x))
  File "/home/faker/Desktop/code/NeuralSubgraphCounting-master/src/dataset.py", line 291, in preprocess
    pattern_dglgraph.ndata["indeg"] = np.array(pattern.indegree(), dtype=np.float32)
  File "/home/faker/.conda/envs/ISONET/lib/python3.9/site-packages/dgl/view.py", line 81, in __setitem__
    self._graph._set_n_repr(self._ntid, self._nodes, {key : val})
  File "/home/faker/.conda/envs/ISONET/lib/python3.9/site-packages/dgl/heterograph.py", line 4112, in _set_n_repr
    if F.context(val) != self.device:
  File "/home/faker/.conda/envs/ISONET/lib/python3.9/site-packages/dgl/backend/pytorch/tensor.py", line 82, in context
    return input.device
AttributeError: 'numpy.ndarray' object has no attribute 'device'

Process finished with exit code 1
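
A plausible remedy (my reading of the traceback, not an official fix: newer DGL releases expect framework tensors rather than numpy arrays in ndata) is either to pin dgl == 0.4.3post2 as listed in the dependencies, or to convert the array first:

import torch
import numpy as np
# In src/dataset.py, GraphAdjDataset.preprocess: convert the numpy
# array to a torch tensor before assigning it to ndata.
pattern_dglgraph.ndata["indeg"] = torch.from_numpy(
    np.array(pattern.indegree(), dtype=np.float32))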
