
powerful-gnns's Introduction

How Powerful are Graph Neural Networks?

This repository is the official PyTorch implementation of the experiments in the following paper:

Keyulu Xu*, Weihua Hu*, Jure Leskovec, Stefanie Jegelka. How Powerful are Graph Neural Networks? ICLR 2019.

arXiv OpenReview

If you make use of the code, experiments, or the GIN algorithm in your work, please cite our paper (BibTeX below).

@inproceedings{xu2018how,
  title={How Powerful are Graph Neural Networks?},
  author={Keyulu Xu and Weihua Hu and Jure Leskovec and Stefanie Jegelka},
  booktitle={International Conference on Learning Representations},
  year={2019},
  url={https://openreview.net/forum?id=ryGs6iA5Km},
}

Installation

Install PyTorch following the instructions on the [official website](https://pytorch.org/). The code has been tested with PyTorch versions 0.4.1 and 1.0.0.

Then install the other dependencies.

pip install -r requirements.txt

Test run

Unzip the dataset file

unzip dataset.zip

and run

python main.py

The default parameters are not the best-performing hyper-parameters used to reproduce our results in the paper. Hyper-parameters need to be specified through the command-line arguments. Please refer to our paper for the details of how we set the hyper-parameters. For instance, for the COLLAB and IMDB datasets, you need to add --degree_as_tag so that the node degrees are used as input node features.

To see which hyper-parameters can be specified, type

python main.py --help
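For example, a hypothetical invocation with the degree-as-tag option: only `--degree_as_tag` is named in this README, so the other flag names shown here are assumptions and should be checked against the `--help` output.

```shell
# Hypothetical example run; verify flag names with `python main.py --help`.
# --dataset is an assumed flag name; --degree_as_tag is confirmed above.
python main.py --dataset IMDBBINARY --degree_as_tag
```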

Cross-validation strategy in the paper

The cross-validation in our paper uses only training and validation sets (no test set) due to the small dataset sizes. Specifically, after obtaining the 10 validation curves corresponding to the 10 folds, we first averaged the validation curves across the folds (obtaining a single averaged validation curve), and then selected the single epoch that achieved the maximum averaged validation accuracy. Finally, the standard deviation over the 10 folds was computed at the selected epoch.
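The protocol above can be sketched with synthetic numbers (this is an illustration, not the repository's actual code): average the 10 per-fold validation curves, pick the epoch that maximizes the averaged accuracy, then report the standard deviation across folds at that epoch.

```python
# Illustrative sketch of the cross-validation reporting protocol,
# using synthetic accuracies in place of real validation curves.
import numpy as np

rng = np.random.default_rng(0)
curves = rng.uniform(0.6, 0.9, size=(10, 50))  # 10 folds x 50 epochs

mean_curve = curves.mean(axis=0)            # averaged validation curve
best_epoch = int(mean_curve.argmax())       # epoch with max averaged accuracy
std_at_best = curves[:, best_epoch].std()   # std over the 10 folds there
```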

powerful-gnns's People

Contributors

keyulu, weihua916


powerful-gnns's Issues

About node attributes

Hello, your program only uses the node label as input. If I want to add node attributes as input as well, how should the program handle the labels?

Dropout in last layer

Thank you for your work.
You have used dropout prior to computing the output from each layer. What is the role of this dropout?
See:

score_over_layer += F.dropout(self.linears_prediction[layer](pooled_h), self.final_dropout, training = self.training)

Data preprocessing

Hi,
I want to use your GIN implementation for my own dataset, but I don't understand how to prepare the initial .txt file for it. Can you explain it, please?
Thanks

Inconsistent dataset description and actual data

Hi,
I'm looking at Table 1 in the paper, and the number of classes associated with the datasets does not match the description of the data in the appendix. For example, the MUTAG dataset has 2 classes according to the table (and the actual data labels that I checked, which are either 1 or -1), versus in the appendix it says that the dataset has 7 discrete labels. Was wondering if you could clarify the disagreement.

Thank you!

Custom dataset creation

Hi, thank you for such impressive work. I would like to apply your algorithm to my task, so I need to create a dataset that fits your code. I have, say, 150 graphs of 200 nodes each, where all the nodes are identical.


I'm trying to understand your txt files, but have some issues with that.
For example:

10 0
0 3 1 2 9
0 3 0 2 9
0 4 0 1 3 9
0 3 2 4 5
0 3 3 5 6
0 5 3 4 6 7 8
0 4 4 5 7 8
0 3 5 6 8
0 3 5 6 7
1 3 0 1 2

My assumption is that each block corresponds to a graph: 10 is the number of nodes and 0 is the graph class label. Each row corresponds to a node, where the first value is (the node label, I guess?), the second is the number of links, and the rest are the connections. Is that right? Does the row number correspond to the node index?
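Assuming that reading of the format is correct, one graph block could be parsed as follows. This is a sketch based on the interpretation in this issue, not the repository's `util.py`; `parse_graph_block` is a hypothetical helper name.

```python
# Sketch of a parser for one graph block, under the interpretation above:
# line 0 is "<num_nodes> <graph_label>"; each following line is
# "<node_tag> <num_neighbors> <neighbor indices...>", with the row number
# serving as the node index.
def parse_graph_block(lines):
    n_nodes, graph_label = map(int, lines[0].split())
    node_tags, edges = [], []
    for i, line in enumerate(lines[1:1 + n_nodes]):
        vals = list(map(int, line.split()))
        tag, n_neighbors = vals[0], vals[1]
        node_tags.append(tag)
        edges.extend((i, j) for j in vals[2:2 + n_neighbors])
    return n_nodes, graph_label, node_tags, edges

# The sample block from this issue:
block = """\
10 0
0 3 1 2 9
0 3 0 2 9
0 4 0 1 3 9
0 3 2 4 5
0 3 3 5 6
0 5 3 4 6 7 8
0 4 4 5 7 8
0 3 5 6 8
0 3 5 6 7
1 3 0 1 2""".splitlines()

n, label, tags, edges = parse_graph_block(block)
```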

About node smoothing

Hello, excellent job!

I have a little question about GIN.
Does GIN make nodes smoother and smoother like GCN?
If not, does node feature represent the structure around this node?

Thanks.

dataset

Hello Weihua, thanks for sharing this code. I have just started learning about graph neural networks, so some things are not very clear to me. I would like to ask what the MUTAG dataset is, e.g. what its nodes and edges represent. Thanks ^_^

Reproduce Issues

Hi, I used the same code and datasets and tuned the parameters as provided in the paper. The random seed is set to 0. The following are the results:

(screenshot: results table, where the first line shows the results from the paper and the second line shows the experimental results I obtained)

As you can see, I cannot reproduce the results of the paper on many datasets. Could you tell me how to reproduce your results?

Possible bug in `load_data()`

In lines 52 & 71 of `util.load_data()`, the current node is assumed to be `j` instead of `row[0]`; isn't this a bug? When reading NCI1.txt, wrong data will be assigned once line 25 is reached.

Think about graph spectral

Hi! I am interested in your GIN work, but can you provide a spectral view of GIN's operation, as exists for GCN?

Low accuracy

Hi:
I use the code you published for this paper, but I can't reproduce the results. For example, on MUTAG the test accuracy is very low, about 70 percent, while the training accuracy is near 1. I think overfitting occurs. Have you encountered this before, and how can it be fixed?

Ask for information of discrete labels

Hello authors,

Thank you for the great work!

I would like to know about the information of discrete labels for all datasets.

In each dataset, I can only see the .txt file, which includes only discrete numbers. Therefore, it's hard for me to guess which category corresponds to which chemical element.

Thank you,

Incorrect reference to updated row length when meant to compare old row length

In line 66 of util.py, `if tmp > len(row):`, I think the intention is to compare `tmp` against the length of the row just after reading a line of the data file.

However, in line 60, `row` is updated to be a subarray of length `tmp`. This happens in the else branch of the `tmp == len(row)` check.

This means line 66 will never be satisfied.

result of paper

I am a beginner. I want to ask how you obtained the results in your paper. Is the result of each validation the maximum, and then the maximum over the 10 folds?

COLLAB

Dear Author,

I could reproduce the results for MUTAG, PTC, NCI1, PROTEINS, IMDB-B, IMDB-M, and RDT-M5K using the parameters from your paper; however, I could not reproduce the results for RDT-B and COLLAB using those parameters.

Might I get your advice or suggestions on reproducing the results for COLLAB and RDT-B?

Regards

Question about input data

Hi. Thanks for great work.
I have a question about input data, specifically for node id numbering.
Is a node id unique across all graphs? or unique in a graph?

I mean, suppose graph 1 contains nodes A, B, C, and graph 2 contains nodes A, D, E.
Then, in the inductive setting, the node ids for graphs 1 and 2 would both be 1, 2, 3.
In the transductive setting, the node ids would be 1, 2, 3 for graph 1 and 1, 4, 5 for graph 2.

Is that correct?

Apply GIN to node classification

Hi and thanks for sharing your code.
When applying GIN to a node classification task, for example on the Cora dataset, the accuracy is low.
You said in the paper that with mean aggregation and a linear function, GIN is GCN. I use the DGL implementation of GIN for node classification, but I can't get accuracy near GCN's.
Is there a need for some preprocessing when applying GIN to node classification?
