gasteigerjo / ppnp

PPNP & APPNP models from "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019)

Home Page: https://www.daml.in.tum.de/ppnp

License: MIT License

Python 31.74% Jupyter Notebook 68.26%

deep-learning gcn gnn graph-algorithms graph-classification graph-neural-networks machine-learning pagerank pytorch tensorflow

ppnp's Issues

Data split on ms_academic

Hi, I tried to run your code on ms_academic and ran into a problem. As stated in the paper, you use 20 labeled nodes per class as training data, but one class in the ms_academic dataset has fewer than 20 nodes. How do you deal with this?
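
One possible workaround, sketched here and not confirmed by the authors, is to cap the per-class sample size at the number of nodes available in that class. The helper name `sample_train_idx_capped` is hypothetical and is not part of the repo; the toy labels are made up for illustration.

```python
import numpy as np

def sample_train_idx_capped(idx, labels, ntrain_per_class=20, seed=0):
    """Hypothetical helper: sample at most ntrain_per_class nodes per class."""
    rnd_state = np.random.RandomState(seed)
    train_idx_split = []
    for i in range(int(labels.max()) + 1):
        # Candidate nodes of class i among the allowed indices
        candidates = idx[labels[idx] == i]
        # Assumption: small classes simply contribute fewer training nodes
        n_take = min(ntrain_per_class, len(candidates))
        train_idx_split.append(
            rnd_state.choice(candidates, n_take, replace=False))
    return np.concatenate(train_idx_split)

# Toy example: class 0 has 30 nodes, class 1 has only 5
labels = np.array([0] * 30 + [1] * 5)
idx = np.arange(len(labels))
train_idx = sample_train_idx_capped(idx, labels, ntrain_per_class=20, seed=0)
```

With this choice the small class contributes all of its nodes to training, so none of them remain for early stopping, validation, or testing. Dropping such classes entirely would be another option.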

Gap between the results of the PyTorch and TensorFlow implementations

Hi,

Thanks for releasing the code; it's very interesting work! However, I found a gap between the results of the PyTorch implementation (e.g. 82.5% accuracy on cora_ml) and the TensorFlow implementation (e.g. 85.2% accuracy on cora_ml). What causes this gap between PyTorch and TensorFlow?

I would very much appreciate it if you could answer my question!

PageRank vs your formulation of PageRank

Hi,

Thank you for this interesting work! I just have a few clarification questions about the formulation of the (original) PageRank that you present in the paper. You mention that PageRank can be calculated via r = Ar, but from what I understand of PageRank, this isn't the complete formulation. PageRank also includes terms that account for what the original authors call "rank sinks." I'm wondering why this is missing from your equations.

Thank you!
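
To make the two formulations in the question concrete, here is a minimal sketch of power iteration with a teleport term. It assumes a column-stochastic matrix `A`, a teleport probability `alpha`, and a teleport distribution `teleport`; these names and the toy graph are illustrative and not taken from the repo. Setting `alpha=0` recovers the plain fixed point r = Ar, while a one-hot `teleport` vector gives personalized PageRank for that node.

```python
import numpy as np

def pagerank(A, alpha=0.15, teleport=None, n_iter=200):
    """Power iteration for r = (1 - alpha) * A r + alpha * teleport."""
    n = A.shape[0]
    if teleport is None:
        # Uniform teleport distribution (standard, non-personalized case)
        teleport = np.full(n, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r = (1 - alpha) * (A @ r) + alpha * teleport
    return r

# Toy graph with edges 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
# Column i holds the out-edge probabilities of node i.
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

r_global = pagerank(A)
r_personal = pagerank(A, teleport=np.array([1.0, 0.0, 0.0]))
```

The teleport term is what keeps probability mass from getting trapped in a closed group of nodes, which is the role the "rank sinks" correction plays in the original formulation.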

How do you tune the hyper-parameters?

I was wondering if my understanding is correct:

You actually create 4 sets: train, early-stopping, val, and test. You tune hyper-parameters on [train, early-stopping, val], choose the best-performing setting on the val set, and then apply that setting to the test set (the 4th set) to report the accuracy.

Thanks!

Running on a graph with missing labels

Judging from this code and the fact that the labels need to be an array of ints (so no NaNs), it looks like this repo assumes a graph with a ground truth label for every node (and then hides some of those labels from training passes). I'm interested in running this on a graph for which I don't have ground truth for every node. So I have a couple questions:

Is the above understanding correct?
If so, before I really get into the code, any pointers on what I might need to change to make this work?

Thanks for all the work you've done making this open!
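
A rough sketch of one way to approach this, assuming (not verified against the rest of the repo) that it is enough to keep the labels as an integer array and to draw all split indices from the labeled nodes only. The sentinel `UNLABELED` and the toy `raw_labels` are hypothetical.

```python
import numpy as np

UNLABELED = -1  # assumed sentinel value, not something the repo defines

# Toy labels where NaN marks a node without ground truth
raw_labels = np.array([2.0, np.nan, 0.0, 1.0, np.nan, 2.0])

# Replace NaNs before casting so the result is a clean integer array
labels = np.where(np.isnan(raw_labels), UNLABELED, raw_labels).astype(np.int64)

# Only these nodes should be used for train / early-stopping / val / test
labeled_idx = np.flatnonzero(labels != UNLABELED)
```

Unlabeled nodes would still take part in propagation through the adjacency matrix; they would just never appear in any index set used for the loss or for evaluation.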

There are 2 bugs in the networkx_to_sparsegraph function

I'm using torch for training. When I try to convert a dataset to the SparseGraph defined in this project, it raises an error from numpy.

/ppnp/pytorch/propagation.py:12, in calc_A_hat(adj_matrix)
     10 nnodes = adj_matrix.shape[0]
     11 A = adj_matrix + sp.eye(nnodes)
---> 12 D_vec = np.sum(A, axis=1).A1
     13 D_vec_invsqrt_corr = 1 / np.sqrt(D_vec)
     14 D_invsqrt_corr = sp.diags(D_vec_invsqrt_corr)

AttributeError: 'numpy.ndarray' object has no attribute 'A1'

I eventually traced the bug to the networkx_to_sparsegraph function.

# Extract adjacency matrix
adj_matrix = nx.adjacency_matrix(nx_graph)

The type of adj_matrix produced here is scipy.sparse._arrays.csr_array, not a sparse matrix!

Thus, np.sum(A, axis=1) produces a numpy.ndarray, which has no attribute 'A1'; only numpy.matrix has it. It's easy to fix with a try...except block. I modified the calc_A_hat(adj_matrix) function in propagation.py as below. 😊

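import numpy as np
import scipy.sparse as sp

def calc_A_hat(adj_matrix: sp.spmatrix) -> sp.spmatrix:
    nnodes = adj_matrix.shape[0]
    # Add self-loops
    A = adj_matrix + sp.eye(nnodes)
    D_vec = np.sum(A, axis=1)
    try:
        # A sparse matrix yields an np.matrix, which .A1 flattens to 1-D
        D_vec = D_vec.A1
    except AttributeError:
        # A sparse array already yields a 1-D np.ndarray
        pass
    D_vec_invsqrt_corr = 1 / np.sqrt(D_vec)
    D_invsqrt_corr = sp.diags(D_vec_invsqrt_corr)
    # Symmetric normalization: D^{-1/2} A D^{-1/2}
    return D_invsqrt_corr @ A @ D_invsqrt_corr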
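
An alternative sketch that avoids the exception handling: np.asarray(...).ravel() flattens the row sums whether they come back as a numpy.matrix or as a 1-D numpy.ndarray. The function name `calc_A_hat_flat` is hypothetical, chosen only to distinguish it from the repo's function.

```python
import numpy as np
import scipy.sparse as sp

def calc_A_hat_flat(adj_matrix):
    """Symmetrically normalized adjacency with self-loops."""
    nnodes = adj_matrix.shape[0]
    A = adj_matrix + sp.eye(nnodes)
    # Works for both sparse matrices (np.matrix result) and
    # sparse arrays (1-D np.ndarray result)
    D_vec = np.asarray(A.sum(axis=1)).ravel()
    D_invsqrt = sp.diags(1.0 / np.sqrt(D_vec))
    return D_invsqrt @ A @ D_invsqrt

# Two nodes joined by one edge: every entry of the result is 0.5
adj = sp.csr_matrix(np.array([[0.0, 1.0], [1.0, 0.0]]))
A_hat = calc_A_hat_flat(adj)
```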

The other bug concerns the labels. In the function train_stopping_split, the statement for i in range(max(labels) + 1): requires integer labels.

/ppnp/preprocessing.py:32, in train_stopping_split(idx, labels, ntrain_per_class, nstopping, seed)
     30 rnd_state = np.random.RandomState(seed)
     31 train_idx_split = []
---> 32 for i in range(max(labels) + 1):
     33     train_idx_split.append(rnd_state.choice(
     34             idx[labels == i], ntrain_per_class, replace=False))
     35 train_idx = np.concatenate(train_idx_split)

TypeError: 'numpy.float64' object cannot be interpreted as an integer

But the labels produced by the networkx_to_sparsegraph function actually have dtype float32!

# Convert labels to integers
if labels is None:
    class_names = None
else:
    try:
        labels = np.array(labels, dtype=np.float32)
        class_names = None
    except ValueError:
        class_names = np.unique(labels)
        class_mapping = {k: i for i, k in enumerate(class_names)}
        labels_int = np.empty(nx_graph.number_of_nodes(), dtype=np.float32)
        for inode, label in enumerate(labels):
            labels_int[inode] = class_mapping[label]
        labels = labels_int

Changing the dtype to an integer type fixes this problem. 😊
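
As a sketch of what an integer-valued version could look like, np.unique with return_inverse=True maps arbitrary labels to contiguous integer ids in one call. The helper name `encode_labels` is hypothetical. Note that, unlike a plain dtype change, it also renumbers numeric labels to 0..k-1.

```python
import numpy as np

def encode_labels(labels):
    """Hypothetical helper: map labels to integer ids 0..k-1."""
    class_names, labels_int = np.unique(np.asarray(labels), return_inverse=True)
    return labels_int.astype(np.int64), class_names

labels_int, class_names = encode_labels(["b", "a", "b"])

# Integer labels make the loop bound in train_stopping_split valid
class_ids = list(range(max(labels_int) + 1))
```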

Full version of MS academic dataset?

Hi, thank you for making your code available on GitHub.

I've noticed that the MS academic dataset in this repo has only 5,116 nodes. On the other hand, the dataset mentioned in the paper has 18,333 nodes. If possible, may I know where I can find the full version of this dataset? Thank you!

Why normalize features?

I found the function normalize_attributes() in preprocessing.py, and I think it is the function that normalizes the features.
I have read about why the adjacency matrix is normalized, but I haven't come across anything about normalizing features. Why normalize the features? Is there a theoretical basis or explanation for this operation?
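
For illustration only, here is a sketch of one common form of feature normalization: scaling each node's feature row to unit L1 norm, so that nodes with many nonzero features (e.g. long documents in a bag-of-words setting) do not dominate by magnitude alone. This is an assumption about what such a step typically does; the repo's normalize_attributes() may differ, and `row_normalize_l1` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp

def row_normalize_l1(attr_matrix):
    """Hypothetical helper: scale each row to sum to 1 in absolute value."""
    attr = sp.csr_matrix(attr_matrix, dtype=np.float64)
    row_sums = np.asarray(abs(attr).sum(axis=1)).ravel()
    # Leave all-zero rows unchanged instead of dividing by zero
    row_sums[row_sums == 0] = 1.0
    return sp.diags(1.0 / row_sums) @ attr

attr = sp.csr_matrix(np.array([[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]))
attr_norm = row_normalize_l1(attr)
```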

Dataset

Hi! Are these datasets directed graphs before normalization?
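
One way to check this directly on a loaded adjacency matrix is to compare it with its transpose. The sketch below assumes a SciPy sparse adjacency matrix; the helper name `is_undirected` and the toy graph are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def is_undirected(adj_matrix):
    """True if the adjacency matrix equals its transpose."""
    return (adj_matrix != adj_matrix.T).nnz == 0

# Toy directed graph with a single edge 0 -> 1
directed = sp.csr_matrix(np.array([[0.0, 1.0], [0.0, 0.0]]))

# Symmetrize by keeping an edge if it exists in either direction
symmetrized = directed.maximum(directed.T.tocsr())

directed_is_undirected = is_undirected(directed)
symmetrized_is_undirected = is_undirected(symmetrized)
```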

About comparing to Ying et al. 2018

Hi,

Thanks for the nice work! I wonder if you could give a short description of the difference between this work's message passing and that of Ying et al. 2018, Graph Convolutional Neural Networks for Web-Scale Recommender Systems?
Both use (approximate) personalized PageRank for message passing. Is there any procedural difference?

Thanks!
D

Issue with reproducing the results: variance in accuracy

Hi there,
I tried to run your code and model, but I have not been able to reproduce the results shown in simple_example_tensorflow.ipynb using the same seed value used there.
Does the seed set in idx_split_args fix the dataset split for training, early stopping, and validation? With the same seed value fixed in idx_split_args, are there any other components that might cause variance in accuracy (apart from the randomized components in training, such as dropout and weight initialization)?

Thanks!
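
To illustrate the first part of the question: a split drawn from a seeded np.random.RandomState is fully determined by the seed, so any remaining variance has to come from other random sources. The function `split_indices` below is a hypothetical stand-in and is not the repo's split code.

```python
import numpy as np

def split_indices(nnodes, ntrain, nstopping, seed):
    """Hypothetical seeded split into train / early-stopping / rest."""
    rnd_state = np.random.RandomState(seed)
    perm = rnd_state.permutation(nnodes)
    train_idx = perm[:ntrain]
    stopping_idx = perm[ntrain:ntrain + nstopping]
    rest_idx = perm[ntrain + nstopping:]
    return train_idx, stopping_idx, rest_idx

split_a = split_indices(nnodes=100, ntrain=20, nstopping=30, seed=42)
split_b = split_indices(nnodes=100, ntrain=20, nstopping=30, seed=42)

# The same seed reproduces the same split exactly
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
```

Weight initialization, dropout, and nondeterministic GPU kernels each draw from their own random state, so they need to be seeded or controlled separately.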

How to use your project?

Thank you so much for sharing.
This is very good work.
I would like to follow up on it.
However, I am confused about how to use your project on my own server.
Could you please give me a concrete example?
Thank you so much for your help.

Error about the confusion matrix when training with TensorFlow

Dear klicperajo,
When I train with TensorFlow and enable save_result, I get the error: name 'conf_mat' is not defined. Could you help me with this problem? Thank you very much.
The problem is in ppnp/tensorflow/training.py, line 129.

Problem about significance test.

Hi, this is wonderful work and the open-source code is clear!
However, I am a bit confused about some of the experimental procedures due to my lack of knowledge of significance testing. Specifically, how do you calculate the p-values of the paired t-test for your main claims in Sect. 5?
Would you be able to give a detailed explanation or share the code for the calculation? I would appreciate it!
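
As a general illustration of a paired t-test (not the authors' evaluation code), scipy.stats.ttest_rel compares two models evaluated on the same splits and seeds. The accuracy values below are placeholder numbers invented for the example, not results from the paper.

```python
import numpy as np
from scipy import stats

# Placeholder accuracies: entry k of both arrays comes from the same split/seed
acc_model_a = np.array([0.850, 0.842, 0.861, 0.848, 0.855])
acc_model_b = np.array([0.831, 0.829, 0.850, 0.833, 0.846])

# Paired (dependent-samples) t-test on the per-run differences
result = stats.ttest_rel(acc_model_a, acc_model_b)

# The same t-statistic computed by hand from the differences
diff = acc_model_a - acc_model_b
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
```

The pairing matters: because both models see identical splits, the test works on the per-run differences and is not affected by variance shared across splits.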

How to use this project for my own data

Besides, could you please tell me how to use this project for my own data?
Specifically, what is the format of the data provided by you?
Thank you so much.

ValueError in graph.standardize

I encountered this "missing match of shape" error and could not find a fix for it online. Could you tell me how to solve it? Thank you so much!