minjiyoon / cgt

Scalable and privacy-enhanced graph generative models for benchmark graph neural networks

Home Page: https://arxiv.org/abs/2207.04396

Python 99.36% Shell 0.64%
graph-neural-networks graph-transformer graph-generative-models

cgt's Introduction

Graph Generative Model for Benchmarking Graph Neural Networks

We propose a novel, modern graph generation problem to enable generating privacy-controlled, synthetic substitutes of large-scale real-world graphs that can be effectively used to evaluate GNN models. Our proposed graph generative model, Computation Graph Transformer (CGT), 1) operates on minibatches rather than the whole graph, avoiding scalability issues, and 2) reduces the task of learning graph distributions to learning feature vector sequence distributions, which we approach with a novel Transformer architecture.

You can see our ICML 2023 paper for more details.

Setup

Create a new conda environment, then install the requirements and PyTorch:

conda create python==3.7 -n cgt
conda activate cgt
pip install -r requirement.txt
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

The code is implemented with PyTorch DataParallel.
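As a quick sanity check of the installation, the sketch below wraps a model in torch.nn.DataParallel and runs one forward pass. The nn.Linear model and the tensor sizes are hypothetical stand-ins, not the repository's own models.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; the repository's own models live in generator/ and task/.
model = nn.Linear(8, 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# nn.DataParallel splits each input batch across the visible GPUs.
# With no GPU available it simply calls the wrapped module.
parallel_model = nn.DataParallel(model).to(device)

batch = torch.randn(16, 8, device=device)
out = parallel_model(batch)

print("torch version:", torch.__version__)
print("visible GPUs:", torch.cuda.device_count())
print("output shape:", tuple(out.shape))
```

If the output shape prints as (16, 2), PyTorch is installed and the DataParallel wrapper runs on the available hardware.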

Dataset

You can download public graph datasets in the npz format from GNN-Benchmark. Place the datasets in the data/ directory. For your convenience, cora.npz and citeseer.npz are already saved in data/. We also support the ogbn-arxiv and ogbn-products datasets from the Open Graph Benchmark (OGB).
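Before running experiments, it can help to check what a downloaded npz file contains. The sketch below lists every array in an archive without assuming any key names. The helper summarize_npz and the demo archive are hypothetical and not part of this repository.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    summary = {}
    # allow_pickle=True is only needed if the archive holds object arrays;
    # use it only on files you trust.
    with np.load(path, allow_pickle=True) as archive:
        for name in archive.files:
            arr = archive[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


# Demo on a tiny made-up archive; these key names are illustrative only.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(
    demo_path,
    labels=np.array([0, 1, 1]),
    features=np.zeros((3, 4), dtype=np.float32),
)

summary = summarize_npz(demo_path)
for name, (shape, dtype) in sorted(summary.items()):
    print(name, shape, dtype)
```

To inspect a real dataset, call summarize_npz("data/cora.npz") from the repository root.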

Usage

In run.sh, list the graph datasets whose distributions you want to learn in DATASETS. NOISES sets how many noisy neighbors are added to augment the original graphs. Executing run.sh learns three different distributions for each dataset, one per noise size in NOISES=(0 2 4). For each dataset, we train three different GNN models (GCN, GIN, SGC) on a pair of original and synthetic graphs, and then compare their performance. The details of other hyperparameters can be found in args.py.
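To see how many runs a given configuration implies, the following sketch enumerates the dataset, noise size, and GNN model combinations described above. The list values and the helper experiment_grid are illustrative assumptions; the actual lists are defined in run.sh, and the sketch assumes every model is evaluated for every noise size.

```python
from itertools import product

# Illustrative values mirroring the description above; the real lists live in run.sh.
DATASETS = ["cora", "citeseer"]
NOISES = (0, 2, 4)
GNN_MODELS = ("GCN", "GIN", "SGC")


def experiment_grid(datasets, noises, models):
    """Enumerate every (dataset, noise, model) combination as a dict."""
    return [
        {"dataset": d, "noise": n, "model": m}
        for d, n, m in product(datasets, noises, models)
    ]


grid = experiment_grid(DATASETS, NOISES, GNN_MODELS)
print(len(grid))  # 2 datasets * 3 noise sizes * 3 models = 18
print(grid[0])
```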

Differential Privacy module

As described in the main paper, DP-SGD performs poorly on Transformers, so this repository provides only the DP-k-means module. To run DP-k-means, download the open-source library from https://github.com/google/differential-privacy/tree/main/learning/clustering. Then uncomment lines 11-12 in generator/cluster.py and set dp_feature in args.py to True.

File description

We provide brief descriptions for each file as follows:

Directory/File              Description
run.sh                      script to run experiments
args.py                     set hyperparameters
test.py                     main file: prepare models, read datasets, graph generation, GNN evaluation
data/                       downloaded datasets
generator/                  code related to the graph transformer
generator/cluster.py        k-means or DP k-means clustering
generator/gpt               CGT main directory
generator/gpt/gpt.py        prepare models, prepare datasets, train/generation loops
generator/gpt/dataset.py    dataset for flattened computation graphs
generator/gpt/model.py      XLNet model
generator/gpt/trainer.py    training loop
generator/gpt/utils.py      generation loop
task/                       GNN models
task/aggregation            GNN models with different aggregation strategies (GCN, GAT, SGC, GIN)
task/utils/dataset.py       Computation Graph Dataset for PyTorch DataParallel
task/utils/utils.py         ogbn/npz format datasets loading, utility functions

Citation

Please consider citing the following paper when using our code for your application.

@article{yoon2022scalable,
  title={Scalable Privacy-enhanced Benchmark Graph Generative Model for Graph Convolutional Networks},
  author={Yoon, Minji and Wu, Yue and Palowitch, John and Perozzi, Bryan and Salakhutdinov, Ruslan},
  journal={arXiv preprint arXiv:2207.04396},
  year={2022}
}

cgt's Issues

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Steps to reproduce the error from a Google Colab:

!python --version
>>> Python 3.7.15
!git clone https://github.com/minjiyoon/CGT.git
import os
os.chdir('CGT')
!python -m pip install k-means-constrained networkx jinja2
!python -m pip install --force-reinstall ortools==9.3.10497
!python test.py

produced the output:

original sampling time: 1.051
original evaluation time: 39.548, acc: [0.83785439 0.83987465 0.83943878 0.81844264]
Clustering time: 12.330
tcmalloc: large alloc 2934784000 bytes == 0x10646000 @  0x7fb5c10cf1e7 0x7fb5bea200ce 0x7fb5bea76cf5 0x7fb5bea76e08 0x7fb5beb360f4 0x7fb5beb3930c 0x7fb5becc03ac 0x7fb5becc0e10 0x5917ee 0x591ac9 0x7fb5beb402a6 0x4e50c9 0x50d124 0x58fd37 0x50c4fc 0x5b4ee6 0x58ff2e 0x50d482 0x58fd37 0x50c4fc 0x5b4ee6 0x6005a3 0x607796 0x60785c 0x60a436 0x64db82 0x64dd2e 0x7fb5c0cccc87 0x5b636a
tcmalloc: large alloc 2934784000 bytes == 0x10646000 @  0x7fb5c10cf1e7 0x7fb5bea200ce 0x7fb5bea76cf5 0x7fb5bea76e08 0x7fb5beb360f4 0x7fb5beb3930c 0x7fb5becc03ac 0x7fb5becc0e10 0x5917ee 0x591ac9 0x7fb5beb402a6 0x4e50c9 0x50d124 0x58fd37 0x50c4fc 0x5b4ee6 0x58ff2e 0x50d482 0x58fd37 0x50c4fc 0x5b4ee6 0x6005a3 0x607796 0x60785c 0x60a436 0x64db82 0x64dd2e 0x7fb5c0cccc87 0x5b636a
tcmalloc: large alloc 2077827072 bytes == 0x104ae000 @  0x7fb5c10cf1e7 0x7fb5bea200ce 0x7fb5bea76cf5 0x7fb5bea76e08 0x7fb5beb360f4 0x7fb5beb3930c 0x7fb5becc03ac 0x7fb5becc0e10 0x5917ee 0x591ac9 0x7fb5beb402a6 0x4e50c9 0x50d124 0x58fd37 0x50c4fc 0x5b4ee6 0x58ff2e 0x50d482 0x58fd37 0x50c4fc 0x5b4ee6 0x6005a3 0x607796 0x60785c 0x60a436 0x64db82 0x64dd2e 0x7fb5c0cccc87 0x5b636a
Clustered graph generation time: 4.611
[GPT] data preparation time: 0.924
Traceback (most recent call last):
  File "test.py", line 110, in <module>
    main()
  File "test.py", line 71, in main
    generated_list, cluster_center = gpt.run(args, adj, feat, label, ids, train_name=args.gpt_train_name)
  File "/content/CGT/generator/gpt/gpt.py", line 103, in run
    model_name, model, comp_ids = train(args, graphs, cluster_ids, labels, split_ids, train_name, split_name=split)
  File "/content/CGT/generator/gpt/gpt.py", line 52, in train
    trainer.train(split_name)
  File "/content/CGT/generator/gpt/trainer.py", line 116, in train
    run_epoch('train')
  File "/content/CGT/generator/gpt/trainer.py", line 79, in run_epoch
    logits, loss = model(x, lbl, y)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/CGT/generator/gpt/model.py", line 248, in forward
    targets = targets.view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Could you provide more accurate instructions to reproduce your code?
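The traceback ends at targets.view(-1) in generator/gpt/model.py, and the error message itself suggests .reshape(...). The sketch below reproduces the same error on a small tensor, assuming targets reaches that line as a non-contiguous tensor. The tensor here is a hypothetical stand-in, and the source does not confirm whether this is the fix the authors intend.

```python
import torch

# Hypothetical stand-in for `targets`: transposing makes the tensor non-contiguous.
targets = torch.arange(6).reshape(2, 3).t()
print("contiguous:", targets.is_contiguous())

try:
    targets.view(-1)  # same call as line 248 of model.py in the traceback
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", str(err)[:60], "...")

# Two common workarounds that flatten a non-contiguous tensor.
flat_a = targets.reshape(-1)            # copies the data only when a view is impossible
flat_b = targets.contiguous().view(-1)  # makes an explicit contiguous copy, then views it

print(flat_a.tolist())
```

Under that assumption, replacing targets.view(-1) with targets.reshape(-1) would avoid the error. The Colab steps above also do not pin the PyTorch version given in the Setup section, which may be worth checking.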
