gasteigerjo / ppnp

PPNP & APPNP models from "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019)

Home Page: https://www.daml.in.tum.de/ppnp

License: MIT License

Languages: Python 31.74%, Jupyter Notebook 68.26%

Topics: deep-learning, gcn, gnn, graph-algorithms, graph-classification, graph-neural-networks, machine-learning, pagerank, pytorch, tensorflow

Contributors: gasteigerjo

ppnp's Issues

Data split on ms_academic

Hi, I tried to run your code on ms_academic but ran into a problem. As described in the paper, you use 20 labeled nodes per class as training data. However, one class in the ms_academic dataset has fewer than 20 nodes. How do you deal with this?
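
One possible workaround, sketched under assumptions and not necessarily what the paper did, is to cap the per-class sample at the number of nodes that class actually has. The helper name `sample_train_per_class` and the toy labels are hypothetical:

```python
import numpy as np

def sample_train_per_class(idx, labels, ntrain_per_class, seed=0):
    """Pick up to `ntrain_per_class` training nodes from every class.

    Hypothetical helper, not part of the repository. `labels` holds the
    integer class of every node in `idx` (same length as `idx`).
    """
    rnd_state = np.random.RandomState(seed)
    picked = []
    for cls in np.unique(labels):
        candidates = idx[labels == cls]
        # Cap the sample size so a tiny class does not make choice() fail.
        n = min(ntrain_per_class, len(candidates))
        picked.append(rnd_state.choice(candidates, n, replace=False))
    return np.concatenate(picked)

# Toy data: class 2 has only 5 nodes, fewer than the requested 20.
idx = np.arange(50)
labels = np.array([0] * 30 + [1] * 15 + [2] * 5)
train_idx = sample_train_per_class(idx, labels, ntrain_per_class=20)
```

With this approach small classes contribute all of their nodes to training, so none of them are left for validation or testing; dropping such classes entirely is the other obvious option.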

Gap between the results of the PyTorch and TensorFlow implementations

Hi,

Thanks for releasing the code, it's very interesting work! However, I found a gap between the results of the PyTorch implementation (e.g. 82.5% accuracy on cora_ml) and the TensorFlow implementation (e.g. 85.2% accuracy on cora_ml). What causes this gap between PyTorch and TensorFlow?

I would really appreciate it if you could answer my question!

PageRank vs your formulation of PageRank

Hi,

Thank you for this interesting work! I have a few clarification questions about the formulation of the (original) PageRank that you present in the paper. You mention that PageRank can be calculated via r = Ar, but as I understand it, this isn't the complete formulation of PageRank. PageRank also includes terms that account for what its authors call "rank sinks." I'm wondering why these are missing from your equations.

Thank you!
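
For reference, a common textbook form of the damped PageRank update is r = d·A·r + (1 − d)/N, where the teleport term is what prevents rank sinks from absorbing all the mass. The sketch below is a minimal power iteration under that assumption; the function name, the damping value 0.85, and the uniform handling of dangling nodes are illustrative choices, not taken from the paper or the repository:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank with uniform teleportation.

    adj[i, j] = 1 means an edge from node j to node i (columns are sources).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=0)
    dangling = out_deg == 0
    # Column-normalize; dangling columns stay zero and are handled below.
    A = adj / np.where(dangling, 1.0, out_deg)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Mass sitting on dangling nodes is spread uniformly over all nodes.
        dangling_mass = r[dangling].sum()
        r_next = damping * (A @ r + dangling_mass / n) + (1.0 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Chain 0 -> 1 -> 2, where node 2 has no outgoing edges.
adj = np.array([[0, 0, 0],
                [1, 0, 0],
                [0, 1, 0]])
r = pagerank(adj)
```

Setting `damping=1.0` on a graph without dangling nodes reduces the update to the plain r = Ar iteration mentioned in the question.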

How do you tune the hyperparameters?

I was wondering if my understanding is correct:

You actually create 4 sets: train, early-stopping, val, and test. You tune the hyperparameters on [train, early-stopping, val], choosing the best-performing configuration on the val set. That configuration is then applied to the test set (the 4th set) to report the accuracy.

Thanks!

Running on a graph with missing labels

Judging from this code and the fact that the labels need to be an array of ints (so no NaNs), it looks like this repo assumes a graph with a ground truth label for every node (and then hides some of those labels from training passes). I'm interested in running this on a graph for which I don't have ground truth for every node. So I have a couple questions:

  1. Is the above understanding correct?
  2. If so, before I really get into the code, any pointers on what I might need to change to make this work?

Thanks for all the work you've done making this open!
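
A minimal sketch of one way this could be handled, assuming unlabeled nodes are marked with a sentinel value and only labeled nodes are ever drawn into the training and evaluation splits. The sentinel `UNLABELED = -1` and the toy label array are hypothetical and not part of the repository:

```python
import numpy as np

UNLABELED = -1  # hypothetical sentinel for nodes without ground truth

# Toy label vector: nodes 2 and 4 have no ground-truth label.
labels = np.array([0, 2, UNLABELED, 1, UNLABELED, 2, 0, 1])

# Only labeled nodes are candidates for any split.
labeled_idx = np.flatnonzero(labels != UNLABELED)

rnd_state = np.random.RandomState(0)
shuffled = rnd_state.permutation(labeled_idx)
train_idx, val_idx = shuffled[:4], shuffled[4:]

# Unlabeled nodes stay in the graph, so they still take part in propagation,
# but they never appear in the loss or in the reported metrics.
```

Any split function that samples per class would then need to receive `labeled_idx` and `labels[labeled_idx]` instead of the full arrays.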

Two bugs in the networkx_to_sparsegraph function

I'm using torch for training. When I tried to convert a dataset to the SparseGraph defined in this project, NumPy raised an error.

/ppnp/pytorch/propagation.py:12, in calc_A_hat(adj_matrix)
     10 nnodes = adj_matrix.shape[0]
     11 A = adj_matrix + sp.eye(nnodes)
---> 12 D_vec = np.sum(A, axis=1).A1
     13 D_vec_invsqrt_corr = 1 / np.sqrt(D_vec)
     14 D_invsqrt_corr = sp.diags(D_vec_invsqrt_corr)

AttributeError: 'numpy.ndarray' object has no attribute 'A1'

I eventually traced the bug to the networkx_to_sparsegraph function.

# Extract adjacency matrix
adj_matrix = nx.adjacency_matrix(nx_graph)

The adj_matrix produced here is a scipy.sparse._arrays.csr_array, not a sparse matrix!

Thus, np.sum(A, axis=1) produces a numpy.ndarray, which has no attribute 'A1'; only numpy.matrix has it. It's easy to fix with a try...except block. I modified the calc_A_hat(adj_matrix) function in propagation.py as below. 😊

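import numpy as np
import scipy.sparse as sp

def calc_A_hat(adj_matrix: sp.spmatrix) -> sp.spmatrix:
    nnodes = adj_matrix.shape[0]
    # Add self-loops before normalizing.
    A = adj_matrix + sp.eye(nnodes)
    D_vec = np.sum(A, axis=1)
    try:
        # Sparse matrices return an np.matrix here, which needs flattening.
        D_vec = D_vec.A1
    except AttributeError:
        # Sparse arrays already return a 1-D np.ndarray.
        pass
    D_vec_invsqrt_corr = 1 / np.sqrt(D_vec)
    D_invsqrt_corr = sp.diags(D_vec_invsqrt_corr)
    return D_invsqrt_corr @ A @ D_invsqrt_corr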
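
An alternative that avoids the try...except entirely is to flatten the row sums with np.asarray(...).ravel(), which gives a 1-D array for both return types. This is only a sketch: the name `calc_A_hat_any` is hypothetical, and the up-front conversion to csr_matrix is an assumption made here to keep a single code path.

```python
import numpy as np
import scipy.sparse as sp

def calc_A_hat_any(adj_matrix):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    # Converting first gives the same code path for sparse matrices and arrays.
    A = sp.csr_matrix(adj_matrix) + sp.eye(adj_matrix.shape[0], format="csr")
    # np.asarray(...).ravel() yields a 1-D array whether sum() returned
    # an np.matrix of shape (n, 1) or a plain ndarray.
    D_vec = np.asarray(A.sum(axis=1)).ravel()
    D_invsqrt = sp.diags(1.0 / np.sqrt(D_vec))
    return D_invsqrt @ A @ D_invsqrt

# Path graph on three nodes; the same call also accepts a csr_array.
dense = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)
A_hat = calc_A_hat_any(sp.csr_matrix(dense))
```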

The other bug concerns the labels. In the function train_stopping_split, the statement for i in range(max(labels) + 1): requires integer labels.

/ppnp/preprocessing.py:32, in train_stopping_split(idx, labels, ntrain_per_class, nstopping, seed)
     30 rnd_state = np.random.RandomState(seed)
     31 train_idx_split = []
---> 32 for i in range(max(labels) + 1):
     33     train_idx_split.append(rnd_state.choice(
     34             idx[labels == i], ntrain_per_class, replace=False))
     35 train_idx = np.concatenate(train_idx_split)

TypeError: 'numpy.float64' object cannot be interpreted as an integer

But the labels produced by the networkx_to_sparsegraph function are actually float32!

# Convert labels to integers
if labels is None:
    class_names = None
else:
    try:
        labels = np.array(labels, dtype=np.float32)
        class_names = None
    except ValueError:
        class_names = np.unique(labels)
        class_mapping = {k: i for i, k in enumerate(class_names)}
        labels_int = np.empty(nx_graph.number_of_nodes(), dtype=np.float32)
        for inode, label in enumerate(labels):
            labels_int[inode] = class_mapping[label]
        labels = labels_int

Changing the dtype to an integer type fixes this problem. 😊
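
As a sketch of what an integer conversion could look like, np.unique with return_inverse=True maps any label type to class indices in one step. The helper `labels_to_int` is hypothetical, and note that it remaps labels to a contiguous range 0..nclasses-1, which is not the same as simply casting the existing values to int:

```python
import numpy as np

def labels_to_int(labels):
    """Map arbitrary labels (strings, floats, ints) to integers 0..nclasses-1."""
    # class_names is sorted; labels_int[i] is the position of labels[i] in it.
    class_names, labels_int = np.unique(np.asarray(labels), return_inverse=True)
    return labels_int.astype(np.int64), class_names

labels_int, class_names = labels_to_int([2.0, 0.0, 2.0, 5.0])

# An integer maximum can now be passed to range() without a TypeError.
class_ids = list(range(labels_int.max() + 1))
```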

Full version of MS academic dataset?

Hi, thank you for making your code available on GitHub.

I've noticed that the MS academic dataset in this repo has only 5,116 nodes. On the other hand, the dataset mentioned in the paper has 18,333 nodes. If possible, may I know where I can find the full version of this dataset? Thank you!

Why normalize features?

I found the function normalize_attributes() in preprocessing.py, and I think it is the function that normalizes the features.
I have read about why the adjacency matrix is normalized, but I haven't found anything about normalizing features. So why normalize the features? Is there a theoretical basis or explanation for this operation?
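
For context, one common form of feature normalization is to rescale every node's feature vector to unit L1 norm, so that nodes with many nonzero features (e.g. long documents in a bag-of-words matrix) do not dominate by scale alone. The sketch below illustrates that idea only; whether normalize_attributes() does exactly this is not stated here, and the name `l1_normalize_rows` is hypothetical:

```python
import numpy as np
import scipy.sparse as sp

def l1_normalize_rows(attr_matrix):
    """Scale each row (one node's features) so its absolute values sum to 1."""
    attr_matrix = sp.csr_matrix(attr_matrix, dtype=float)
    row_sums = np.asarray(abs(attr_matrix).sum(axis=1)).ravel()
    # Leave all-zero rows untouched instead of dividing by zero.
    row_sums[row_sums == 0] = 1.0
    return sp.diags(1.0 / row_sums) @ attr_matrix

X = np.array([[1, 3],
              [0, 0],
              [2, 2]])
X_norm = l1_normalize_rows(X)
```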

Dataset

Hi! Are these datasets directed graphs before normalization?
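
One way to check this empirically is to test whether the adjacency matrix equals its transpose. A minimal sketch, with `is_undirected` as a hypothetical helper name and toy matrices in place of the real datasets:

```python
import numpy as np
import scipy.sparse as sp

def is_undirected(adj_matrix):
    """Return True if the adjacency matrix is symmetric (every edge has a reverse edge)."""
    adj = sp.csr_matrix(adj_matrix)
    # The comparison yields a sparse boolean matrix of the mismatching entries.
    return (adj != adj.T).nnz == 0

directed = np.array([[0, 1], [0, 0]])    # single edge 0 -> 1
undirected = np.array([[0, 1], [1, 0]])  # edge in both directions

directed_result = is_undirected(directed)
undirected_result = is_undirected(undirected)
```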

Comparison to Ying et al. 2018

Hi,

Thanks for the nice work! Could you give a short description of the difference between this work's message passing and that of Ying et al. 2018, Graph Convolutional Neural Networks for Web-Scale Recommender Systems?
Both use (approximate) personalized PageRank for message passing. Is there any procedural difference?

Thanks!
D

Issue with reproducing the results: variance in accuracy

Hi there,
I tried to run your code and model, but I have not been able to reproduce the results shown in simple_example_tensorflow.ipynb, even with the same seed value used there.
Does the seed set in idx_split_args fix the dataset split for training, early stopping, and validation? With the same seed value in idx_split_args, are there any other components that might cause variance in accuracy (apart from the randomized components in training, such as dropout and weight initialization)?
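
As a general illustration of what a split seed does and does not control, the sketch below draws a split from a seeded np.random.RandomState. The function `split_indices` and the seed values are hypothetical; this is not the repository's split code:

```python
import numpy as np

def split_indices(nnodes, ntrain, seed):
    """Deterministically split node indices into a training part and the rest."""
    rnd_state = np.random.RandomState(seed)
    perm = rnd_state.permutation(nnodes)
    return perm[:ntrain], perm[ntrain:]

train_a, rest_a = split_indices(100, 20, seed=42)
train_b, rest_b = split_indices(100, 20, seed=42)
train_c, rest_c = split_indices(100, 20, seed=43)

# The same seed reproduces the split exactly; a different seed changes it.
same_split = np.array_equal(train_a, train_b)
different_split = not np.array_equal(train_a, train_c)
```

A seed used this way only fixes which nodes land in which set. Weight initialization, dropout masks, and nondeterministic GPU kernels draw from separate random sources and need their own seeds.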

Thanks!

How to use your project?

Thank you so much for your sharing.
This is a pretty good job.
Currently, I want to follow your work.
But I am confused about how to use your project on my own server.
Could you please give me a concrete example?
Thank you so much for your help.

Problem about significance test.

Hi, this is wonderful work and the open-source code is clear!
However, I am a bit confused about some of the experimental procedures due to my lack of knowledge of significance testing. Specifically, how did you calculate the p-values of the paired t-test for your main claims in Sect. 5?
Would you be able to give a detailed explanation or share the code for the calculation? I would appreciate it!
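
For readers unfamiliar with the test, a paired t-test compares two models on the same runs (for example the same splits and seeds) by testing whether the mean of the per-run differences is zero. The sketch below uses scipy.stats.ttest_rel on synthetic accuracies that are generated purely for illustration; it is not the authors' evaluation code and the numbers are not results from the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)

# Synthetic per-run accuracies, for illustration only. Entry i of both
# arrays is assumed to come from the same split and seed (hence "paired").
acc_model_a = 0.84 + 0.01 * rng.randn(20)
acc_model_b = acc_model_a - 0.01 + 0.005 * rng.randn(20)

# Paired t-test on the per-run differences.
t_stat, p_value = stats.ttest_rel(acc_model_a, acc_model_b)

# The same statistic computed by hand: mean difference over its standard error.
diff = acc_model_a - acc_model_b
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
```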

How to use this project for my own data

Besides, could you please tell me how to use this project for my own data?
Specifically, what is the format of the data provided by you?
Thank you so much.

ValueError in graph.standardize

[Screenshot of the error, 2019-10-27]

So I encountered this "missing match of shape" error. And I failed to find fixes for such bugs online. Could you tell me how to solve it? Thank you so much!
