gasteigerjo / ppnp
PPNP & APPNP models from "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019)
Home Page: https://www.daml.in.tum.de/ppnp
License: MIT License
Hi, I tried to run your code on ms_academic but ran into a problem. As you describe in the paper, the training set has 20 labeled nodes per class. However, one class in the ms_academic dataset has fewer than 20 nodes. How do you deal with this?
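One possible workaround for the situation described above, not confirmed by the authors, is to cap the per-class sample size at the number of nodes actually available in that class. The sketch below is only an illustration: `sample_train_idx` is a hypothetical helper and not part of the repository.

```python
import numpy as np

def sample_train_idx(labels, ntrain_per_class=20, seed=0):
    """Sample up to ntrain_per_class node indices per class (hypothetical helper)."""
    labels = np.asarray(labels)
    rnd_state = np.random.RandomState(seed)
    train_idx = []
    for c in np.unique(labels):
        class_idx = np.flatnonzero(labels == c)
        # Cap the request so that small classes do not raise a ValueError
        # when sampling without replacement.
        n = min(ntrain_per_class, len(class_idx))
        train_idx.append(rnd_state.choice(class_idx, n, replace=False))
    return np.concatenate(train_idx)
```

With this approach a small class simply contributes all of its nodes to the training set, which leaves none of them for early stopping or evaluation; whether that is acceptable depends on the experiment.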
Hi,
Thanks for releasing the code; it's very interesting work! However, I found a gap between the results of the PyTorch implementation (e.g. 82.5% accuracy on cora_ml) and the TensorFlow implementation (e.g. 85.2% accuracy on cora_ml). What causes this gap between PyTorch and TensorFlow?
I would really appreciate it if you could answer my question!
Hi,
Thank you for this interesting work! I have a few clarification questions about the formulation of the (original) PageRank presented in the paper. You mention that PageRank can be calculated via r = Ar, but as far as I understand PageRank, this isn't the complete formulation. PageRank also includes terms that account for what are called "rank sinks." I'm wondering why these are missing from your equations.
Thank you!
I was wondering if my understanding is correct:
You actually create 4 sets: train, early-stopping, val, and test. You then tune hyperparameters on [train, early-stopping, val], and once the best parameters have been chosen by their performance on the val set, they are applied to the test set (the 4th set) to report the accuracy.
Thanks!
Judging from this code and the fact that the labels need to be an array of ints (so no NaNs), it looks like this repo assumes a graph with a ground truth label for every node (and then hides some of those labels from training passes). I'm interested in running this on a graph for which I don't have ground truth for every node. So I have a couple questions:
Thanks for all the work you've done making this open!
I'm using torch for training. When I try to convert a dataset to the sparsegraph format defined in the project, it raises a numpy error.
/ppnp/pytorch/propagation.py:12, in calc_A_hat(adj_matrix)
10 nnodes = adj_matrix.shape[0]
11 A = adj_matrix + sp.eye(nnodes)
---> 12 D_vec = np.sum(A, axis=1).A1
13 D_vec_invsqrt_corr = 1 / np.sqrt(D_vec)
14 D_invsqrt_corr = sp.diags(D_vec_invsqrt_corr)
AttributeError: 'numpy.ndarray' object has no attribute 'A1'
Eventually I found the bug in the networkx_to_sparsegraph function:
# Extract adjacency matrix
adj_matrix = nx.adjacency_matrix(nx_graph)
The adj_matrix returned here is a scipy.sparse._arrays.csr_array, not a sparse matrix! Thus np.sum(A, axis=1) produces a numpy.ndarray, which has no attribute 'A1'; only numpy.matrix has it. It is easy to fix with a try...except block. I modified the calc_A_hat(adj_matrix) function in propagation.py as below.
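import numpy as np
import scipy.sparse as sp

def calc_A_hat(adj_matrix: sp.spmatrix) -> sp.spmatrix:
    nnodes = adj_matrix.shape[0]
    A = adj_matrix + sp.eye(nnodes)
    D_vec = np.sum(A, axis=1)
    try:
        # A sparse matrix sums to a numpy.matrix; .A1 flattens it to 1-D
        D_vec = D_vec.A1
    except AttributeError:
        # A sparse array already sums to a plain 1-D numpy.ndarray
        pass
    D_vec_invsqrt_corr = 1 / np.sqrt(D_vec)
    D_invsqrt_corr = sp.diags(D_vec_invsqrt_corr)
    return D_invsqrt_corr @ A @ D_invsqrt_corr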
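As an alternative to the try...except approach, the degree vector can be flattened in a way that does not depend on the input type. The following is a minimal sketch, assuming a reasonably recent SciPy; `calc_A_hat_any` is a hypothetical name and this is not the repository's implementation.

```python
import numpy as np
import scipy.sparse as sp

def calc_A_hat_any(adj_matrix):
    """Symmetrically normalized adjacency with self-loops (hypothetical variant)."""
    # Coerce the input (sparse array, sparse matrix, or dense array) to a
    # CSR sparse matrix so the rest of the function sees a single type.
    A = sp.csr_matrix(adj_matrix) + sp.eye(adj_matrix.shape[0], format="csr")
    # np.asarray(...).ravel() yields a flat ndarray whether the row sum
    # comes back as a numpy.matrix or as an ndarray.
    D_vec = np.asarray(A.sum(axis=1)).ravel()
    D_invsqrt = sp.diags(1.0 / np.sqrt(D_vec))
    return D_invsqrt @ A @ D_invsqrt
```

For a two-node graph with a single edge, adding self-loops gives an all-ones 2x2 matrix with degrees of 2, so every entry of the result is 0.5.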
Another bug concerns the labels. In the function train_stopping_split, the statement for i in range(max(labels) + 1): requires integer labels.
/ppnp/preprocessing.py:32, in train_stopping_split(idx, labels, ntrain_per_class, nstopping, seed)
30 rnd_state = np.random.RandomState(seed)
31 train_idx_split = []
---> 32 for i in range(max(labels) + 1):
33 train_idx_split.append(rnd_state.choice(
34 idx[labels == i], ntrain_per_class, replace=False))
35 train_idx = np.concatenate(train_idx_split)
TypeError: 'numpy.float64' object cannot be interpreted as an integer
But the labels produced by the networkx_to_sparsegraph function are actually of type float32!
# Convert labels to integers
if labels is None:
    class_names = None
else:
    try:
        labels = np.array(labels, dtype=np.float32)
        class_names = None
    except ValueError:
        class_names = np.unique(labels)
        class_mapping = {k: i for i, k in enumerate(class_names)}
        labels_int = np.empty(nx_graph.number_of_nodes(), dtype=np.float32)
        for inode, label in enumerate(labels):
            labels_int[inode] = class_mapping[label]
        labels = labels_int
Changing the dtype to an integer type fixes this problem.
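The integer conversion suggested above could look roughly like this. It is a sketch only: `labels_to_int` is a hypothetical standalone helper rather than the code in networkx_to_sparsegraph, and the choice of np.int64 is an assumption.

```python
import numpy as np

def labels_to_int(labels):
    """Return (integer labels, class_names or None); hypothetical helper."""
    arr = np.asarray(labels)
    if np.issubdtype(arr.dtype, np.number):
        # Numeric labels: cast to int, refusing values such as 1.5 or NaN.
        as_int = arr.astype(np.int64)
        if not np.array_equal(as_int, arr):
            raise ValueError("numeric labels must be integer-valued")
        return as_int, None
    # Non-numeric labels: map each distinct class name to 0..k-1.
    class_names, inverse = np.unique(arr, return_inverse=True)
    return inverse.astype(np.int64), class_names
```

Because the returned labels are integers, an expression such as range(max(labels) + 1) in train_stopping_split no longer raises a TypeError.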
Hi, thank you for making your code available on GitHub.
I've noticed that the MS academic dataset in this repo has only 5,116 nodes. On the other hand, the dataset mentioned in the paper has 18,333 nodes. If possible, may I know where I can find the full version of this dataset? Thank you!
I found the function normalize_attributes() in preprocessing.py, and I think it is the function that normalizes the features.
I have read about why the adjacency matrix is normalized, but I haven't found anything about normalizing features. So why normalize the features? Is there a theoretical basis or explanation for this operation?
Hi! Are these datasets directed graphs before the normalization?
Hi,
Thanks for the nice work! I wonder if you could give a short description of the difference between this work's message passing and that of Ying et al. (2018), "Graph Convolutional Neural Networks for Web-Scale Recommender Systems"?
Both use (approximate) personalized PageRank for message passing. Is there any procedural difference?
Thanks!
D
Hi there,
I tried to run your code and model, but I have not been able to reproduce the results shown in simple_example_tensorflow.ipynb, even with the same seed value used there.
Does the seed set in idx_split_args fix the dataset split for training, early stopping, and validation? With the same seed value in idx_split_args, are there any other components that might cause variance in accuracy (apart from the randomized parts of training, such as dropout and weight initialization)?
Thanks!
Thank you so much for your sharing.
This is a pretty good job.
Currently, I want to follow your work.
But, I am confused about how to use your project in my own server.
Could you please give me a concrete example?
Thank you so much for your help.
Dear klicperajo,
When I train with TensorFlow and enable save_result, I get the error "name 'conf_mat' is not defined". Could you help me resolve this problem? Thank you very much.
The problem is in ppnp/tensorflow/training.py, line 129.
Hi, it's a wonderful job and the opensource code is clear!
However, I am a bit confused about some of the experimental procedures because I know little about significance testing. Specifically, in Sect. 5, how are the p-values of the paired t-test for your main claims calculated?
Would you be able to give a detailed explanation or share the code for this calculation? I would appreciate it!
Besides, could you please tell me how to use this project for my own data?
Specifically, what is the format of the data provided by you?
Thank you so much.