NeuralSubgraphCounting

This repository is an official implementation of the paper Neural Subgraph Isomorphism Counting.

Introduction

We propose a learning framework that augments different representation learning architectures and iteratively attends to pattern and target data graphs to memorize subgraph isomorphisms for global counting.

Overview

Representation

We can use the minimum code (the DFS code with the minimum lexicographic order) defined by Xifeng Yan to convert a graph into a sequence and then apply sequence models, e.g., CNN, LSTM, and Transformer-XL. A more direct approach is to use graph convolutional networks to learn representations, e.g., RGCN and RGIN.
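
As a rough illustration (an assumption about the general gSpan-style encoding; dfs_code is a hypothetical helper, not the repository's exact input format), each edge in DFS order becomes a (u, v, vertex label, edge label, vertex label) tuple, and the lexicographically smallest such sequence is the minimum code:

def dfs_code(edges, vlabels, elabels):
    # Hypothetical helper: serialize edges (already listed in DFS order)
    # into (u, v, l_u, l_uv, l_v) tuples for a sequence model.
    return [(u, v, vlabels[u], elabels[(u, v)], vlabels[v]) for u, v in edges]

# A labeled triangle:
code = dfs_code(
    [(0, 1), (1, 2), (2, 0)],
    vlabels={0: "A", 1: "B", 2: "C"},
    elabels={(0, 1): "x", (1, 2): "y", (2, 0): "x"},
)
# -> [(0, 1, 'A', 'x', 'B'), (1, 2, 'B', 'y', 'C'), (2, 0, 'C', 'x', 'A')]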

As for the interaction module, simple pooling is obviously not enough. We design the Memory Attention Predict Network (MemAttnPredictNet) and the Dynamic Intermedium Attention Memory Network (DIAMNet); you can try them in the reproduction part below.

(Figure: the DIAMNet architecture)
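
To convey the idea, here is a minimal sketch of a dynamic attention memory (a simplification under stated assumptions, not the repository's DIAMNet implementation): a fixed-length memory, initialized by pooling, alternately attends to the pattern and graph encodings for a few recurrent steps and is then decoded into a count.

import torch
import torch.nn as nn

class DIAMSketch(nn.Module):
    # Simplified sketch; assumes a recent PyTorch with batch_first
    # MultiheadAttention. Shapes: p_enc (B, P, D), g_enc (B, G, D).
    def __init__(self, dim, mem_len=4, steps=3, heads=1):
        super().__init__()
        self.mem_len, self.steps = mem_len, steps
        self.attn_p = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_g = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, p_enc, g_enc):
        # Initialize the memory by mean pooling (cf. --predict_net_mem_init mean).
        mem = g_enc.mean(dim=1, keepdim=True).repeat(1, self.mem_len, 1)
        for _ in range(self.steps):
            mem, _ = self.attn_p(mem, p_enc, p_enc)  # read the pattern
            mem, _ = self.attn_g(mem, g_enc, g_enc)  # read the data graph
        return self.out(mem.mean(dim=1))             # regress the count

counts = DIAMSketch(dim=64)(torch.randn(2, 8, 64), torch.randn(2, 100, 64))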

Reproduction

Package Dependencies

  • tqdm
  • numpy
  • pandas
  • scipy
  • tensorboardX
  • python-igraph == 0.9.11
  • torch >= 1.3.0
  • dgl == 0.4.3post2

Data Generation

The data used in the KDD paper is available on OneDrive.

You can also generate data by modifying run.py to set CONFIG and running:

cd generator
python run.py

For the MUTAG data, you can use mutag_convertor.py to generate the raw graphs:

cd convertor
python mutag_convertor.py

You can use generator/mutag_generator.py to generate patterns, but be careful of duplicates.

Model Training/Finetuning

For the small dataset, just run:

cd src
python train.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 512 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/small/patterns \
    --graph_dir ../data/small/graphs \
    --metadata_dir ../data/small/metadata \
    --save_data_dir ../data/small \
    --save_model_dir ../dumps/small/RGIN-SumPredictNet
cd src
python train.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 3 \
    --gpu_id 0 --batch_size 512 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/small/patterns \
    --graph_dir ../data/small/graphs \
    --metadata_dir ../data/small/metadata \
    --save_data_dir ../data/small \
    --save_model_dir ../dumps/small/RGIN-DIAMNet

We find that loading the encoder module from RGIN-SumPredictNet results in faster convergence of RGIN-DIAMNet:

cd src
python finetune.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 3 \
    --gpu_id 0 --batch_size 512 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/small/patterns \
    --graph_dir ../data/small/graphs \
    --metadata_dir ../data/small/metadata \
    --save_data_dir ../data/small \
    --save_model_dir ../dumps/small/RGIN-DIAMNet \
    --load_model_dir ../dumps/small/RGIN-SumPredictNet

For the large dataset, just run:

cd src
python finetune.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 128 --update_every 4 \
    --max_npv 16 --max_npe 16 --max_npvl 16 --max_npel 16 \
    --max_ngv 512 --max_nge 2048 --max_ngvl 64 --max_ngel 64 \
    --pattern_dir ../data/large/patterns \
    --graph_dir ../data/large/graphs \
    --metadata_dir ../data/large/metadata \
    --save_data_dir ../data/large \
    --save_model_dir ../dumps/large/RGIN-SumPredictNet \
    --load_model_dir ../dumps/small/RGIN-SumPredictNet
cd src
python finetune.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 3 \
    --gpu_id 0 --batch_size 128 --update_every 4 \
    --max_npv 16 --max_npe 16 --max_npvl 16 --max_npel 16 \
    --max_ngv 512 --max_nge 2048 --max_ngvl 64 --max_ngel 64 \
    --pattern_dir ../data/large/patterns \
    --graph_dir ../data/large/graphs \
    --metadata_dir ../data/large/metadata \
    --save_data_dir ../data/large \
    --save_model_dir ../dumps/large/RGIN-DIAMNet \
    --load_model_dir ../dumps/small/RGIN-DIAMNet

For the MUTAG dataset, you need to set train_ratio manually:

cd src
python train_mutag.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 64 \
    --max_npv 4 --max_npe 3 --max_npvl 2 --max_npel 2 \
    --max_ngv 28 --max_nge 66 --max_ngvl 7 --max_ngel 4 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG/RGIN-SumPredictNet-0.4 \
    --save_model_dir ../dumps/MUTAG \
    --train_ratio 0.4

Transfer learning can improve performance when training data are limited:

cd src
python finetune_mutag.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 64 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG \
    --save_model_dir ../dumps/MUTAG/Transfer-RGIN-SumPredictNet-0.4 \
    --train_ratio 0.4 \
    --load_model_dir ../dumps/small/RGIN-SumPredictNet

RGIN-DIAMNet is difficult to converge on MUTAG, so we load RGIN-SumPredictNet and replace the interaction module, for both MeanMemAttnPredictNet and DIAMNet:

cd src
python finetune_mutag.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 1 \
    --gpu_id 0 --batch_size 64 \
    --max_npv 4 --max_npe 3 --max_npvl 2 --max_npel 2 \
    --max_ngv 28 --max_nge 66 --max_ngvl 7 --max_ngel 4 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG \
    --save_model_dir ../dumps/MUTAG/RGIN-DIAMNet-0.4 \
    --train_ratio 0.4 \
    --load_model_dir ../dumps/MUTAG/RGIN-SumPredictNet-0.4
cd src
python finetune_mutag.py --model RGIN --predict_net DIAMNet \
    --predict_net_mem_init mean --predict_net_mem_len 4 --predict_net_recurrent_steps 1 \
    --gpu_id 0 --batch_size 64 \
    --max_npv 8 --max_npe 8 --max_npvl 8 --max_npel 8 \
    --max_ngv 64 --max_nge 256 --max_ngvl 16 --max_ngel 16 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG \
    --save_model_dir ../dumps/MUTAG/Transfer-RGIN-DIAMNet-0.4 \
    --train_ratio 0.4 \
    --load_model_dir ../dumps/MUTAG/Transfer-RGIN-SumPredictNet-0.4

Model Evaluation

cd src
python evaluate.py ../dumps/small/RGIN-DIAMNet

Citation

The details of this pipeline are described in the following paper. If you use this code in your work, please cite it:

@inproceedings{liu2020neuralsubgraphcounting,
  author    = {Xin Liu and Haojie Pan and Mutian He and Yangqiu Song and Xin Jiang and Lifeng Shang},
  title     = {Neural Subgraph Isomorphism Counting},
  booktitle = {ACM SIGKDD Conference on Knowledge Discovery and Data Mining, {KDD} 2020, August 23-27, 2020, San Diego, United States},
  year      = {2020}
}

Miscellaneous

Please send any questions about the code and/or the algorithm to [email protected].


Issues

Data generation fails with "ValueError: sequence must not be empty"

I created a conda environment with Python 3.7 and installed all the required packages in requirements.txt. Then I tried to generate a dataset with the following commands:

cd generator
python run.py

However, it first raised "ModuleNotFoundError: No module named 'igraph'".

Then I installed the igraph package with "pip install python-igraph", but another error was raised:

python run.py
patterns_id P_N3_E2_NL2_EL2
patterns_id P_N3_E4_NL2_EL2
patterns_id P_N3_E4_NL2_EL4
patterns_id P_N4_E4_NL2_EL2
patterns_id P_N4_E4_NL2_EL4
patterns_id P_N4_E4_NL4_EL2
patterns_id P_N4_E4_NL4_EL4
7 patterns generation finished!
  0%|                                                                                                                                                              | 0/112 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/jiaang/.conda/envs/subgraph/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "run.py", line 97, in generate_graphs
    alpha, max_pattern_counts=max_pattern_counts, max_subgraph=max_subgraph, return_subisomorphisms=True)
  File "/data2/jiaang/github/NeuralSubgraphCounting/generator/graph_generator.py", line 484, in generate
    self.update_subgraphs(subgraphs, graph_edge_label_mapping)
  File "/data2/jiaang/github/NeuralSubgraphCounting/generator/graph_generator.py", line 331, in update_subgraphs
    subgraph.es["label"] = new_edge_labels_in_subgraphs[sg]
ValueError: sequence must not be empty
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 204, in <module>
    x.get()
  File "/home/jiaang/.conda/envs/subgraph/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: sequence must not be empty

Could you help me solve this issue?

Which torch and dgl versions were used to save the small dataset?

I downloaded your small dataset, but I could not manage to load it. I tried different versions of torch and dgl.
I suspect this is a version problem. Could you tell us which versions of torch and dgl you used when saving the dataset?
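
For reference, the dependency list above pins python-igraph == 0.9.11, torch >= 1.3.0, and dgl == 0.4.3post2, which are presumably the versions the dataset was saved with. A quick way to compare your environment:

import torch
import dgl
# Print the locally installed versions to compare against the pinned
# requirements (torch >= 1.3.0, dgl == 0.4.3post2).
print(torch.__version__, dgl.__version__)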

The meaning of "int2onehot"

Hi Xin:

May I know the meaning of "int2onehot" in your code? Or can you give an example of how it is calculated?

Thanks.
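
For what it's worth, a minimal sketch of what a function named int2onehot typically computes (an assumption, not necessarily this repository's implementation): it maps integer label ids to one-hot vectors.

import numpy as np

def int2onehot(x, num_classes):
    # Hypothetical reconstruction: row i is the one-hot encoding of x[i].
    x = np.asarray(x)
    onehot = np.zeros((len(x), num_classes), dtype=np.float32)
    onehot[np.arange(len(x)), x] = 1.0
    return onehot

# int2onehot([0, 2, 1], 3)
# -> [[1, 0, 0],
#     [0, 0, 1],
#     [0, 1, 0]]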

MaxGatedFilterNet

Hi Xin Liu:

import torch
import torch.nn as nn

class MaxGatedFilterNet(nn.Module):
    def __init__(self):
        super(MaxGatedFilterNet, self).__init__()

    def forward(self, p_x, g_x):
        max_x = torch.max(p_x, dim=1, keepdim=True)[0]
        if max_x.dim() == 2:
            return g_x <= max_x
        else:
            return (g_x <= max_x).all(keepdim=True, dim=2)

May I know the meaning of MaxGatedFilterNet? It looks different from the filter network in the paper. Can you give an example of how it works?
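
A worked example of the 3-D branch (my reading of the code above, not an authoritative answer): with one-hot vertex labels, the max over the pattern dimension yields a multi-hot vector of all labels present in the pattern, and the mask keeps exactly the graph vertices whose labels appear in the pattern.

import torch

p_x = torch.tensor([[[1., 0., 0.],    # pattern vertex with label 0
                     [0., 1., 0.]]])  # pattern vertex with label 1
g_x = torch.tensor([[[1., 0., 0.],    # graph vertex with label 0
                     [0., 0., 1.]]])  # graph vertex with label 2

max_x = torch.max(p_x, dim=1, keepdim=True)[0]  # [[[1., 1., 0.]]]
mask = (g_x <= max_x).all(keepdim=True, dim=2)  # [[[True], [False]]]
# The label-2 vertex cannot participate in any match, so it is filtered out.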

KeyError: 'return_weights'

Hi Xin,

When I try to finetune the model, I get the error KeyError: 'return_weights'.
I read the code of finetune.py, but I couldn't find where return_weights is defined in finetune_config.
Could you tell me how I can debug this?

Thanks a lot.
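
One possible workaround (an assumption on my part: the flag-style configs above suggest finetune_config behaves like a plain dict; this is not an official fix) is to supply a default before the model is built:

# Hypothetical workaround: give the missing key a default value
# before the predict network is constructed.
finetune_config.setdefault("return_weights", False)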

AttributeError: 'numpy.ndarray' object has no attribute 'device'

python train_mutag.py --model RGIN --predict_net SumPredictNet \
    --gpu_id 0 --batch_size 64 \
    --max_npv 4 --max_npe 3 --max_npvl 2 --max_npel 2 \
    --max_ngv 28 --max_nge 66 --max_ngvl 7 --max_ngel 4 \
    --pattern_dir ../data/MUTAG/patterns \
    --graph_dir ../data/MUTAG/raw \
    --metadata_dir ../data/MUTAG/metadata \
    --save_data_dir ../data/MUTAG/RGIN-SumPredictNet-0.4 \
    --save_model_dir ../dumps/MUTAG \
    --train_ratio 0.4

The following error is raised:

Traceback (most recent call last):
  File "/home/faker/Desktop/code/NeuralSubgraphCounting-master/src/train_mutag.py", line 249, in <module>
    dataset = GraphAdjDataset(x)
  File "/home/faker/Desktop/code/NeuralSubgraphCounting-master/src/dataset.py", line 218, in __init__
    self.data = GraphAdjDataset.preprocess_batch(data, use_tqdm=True)
  File "/home/faker/Desktop/code/NeuralSubgraphCounting-master/src/dataset.py", line 319, in preprocess_batch
    d.append(GraphAdjDataset.preprocess(x))
  File "/home/faker/Desktop/code/NeuralSubgraphCounting-master/src/dataset.py", line 291, in preprocess
    pattern_dglgraph.ndata["indeg"] = np.array(pattern.indegree(), dtype=np.float32)
  File "/home/faker/.conda/envs/ISONET/lib/python3.9/site-packages/dgl/view.py", line 81, in __setitem__
    self._graph._set_n_repr(self._ntid, self._nodes, {key : val})
  File "/home/faker/.conda/envs/ISONET/lib/python3.9/site-packages/dgl/heterograph.py", line 4112, in _set_n_repr
    if F.context(val) != self.device:
  File "/home/faker/.conda/envs/ISONET/lib/python3.9/site-packages/dgl/backend/pytorch/tensor.py", line 82, in context
    return input.device
AttributeError: 'numpy.ndarray' object has no attribute 'device'

Process finished with exit code 1
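
A plausible remedy (my reading of the traceback, not an official fix: newer DGL releases expect framework tensors rather than numpy arrays in ndata) is either to pin dgl == 0.4.3post2 as listed in the dependencies, or to convert the array first:

import torch
import numpy as np
# In src/dataset.py, GraphAdjDataset.preprocess: convert the numpy
# array to a torch tensor before assigning it to ndata.
pattern_dglgraph.ndata["indeg"] = torch.from_numpy(
    np.array(pattern.indegree(), dtype=np.float32))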
