Comments (11)

yule-BUAA commented on September 27, 2024

Hello,

If I understand correctly, a simple solution for the mentioned case is to add a new column (e.g., named dst_label) to the ml_network.csv file, which stores the labels of the target nodes.

Then, when loading the dataset for node classification, you can add a line below here to read the target node labels, e.g., dst_labels = graph_df.dst_label.values. Also, remember to add a new property to the Data class here and below here, for example, self.dst_labels = dst_labels. Similarly, fetch the target node labels here when computing the loss.
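A minimal sketch of those changes (the dst_label column name follows the suggestion above, but the file name and the simplified Data class here are assumptions, not DyGLib's exact code):

```python
import pandas as pd

# Read the preprocessed interaction file that now carries a dst_label column.
graph_df = pd.read_csv('ml_network.csv')

labels = graph_df.label.values          # existing source-node labels
dst_labels = graph_df.dst_label.values  # newly added target-node labels


class Data:
    # Hypothetical, simplified stand-in for DyGLib's Data container.
    def __init__(self, labels, dst_labels):
        self.labels = labels
        self.dst_labels = dst_labels    # new property for target-node labels


data = Data(labels=labels, dst_labels=dst_labels)
```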

mgao97 commented on September 27, 2024

Thank you very much! I think it is clear to me now.

I have another question regarding dynamic node classification. I have an edgelist CSV file with three columns: source_id, target_id, and timestamp (a list of timestamps). The third column being a list means there can be multiple interactions between a source node and a target node.
[screenshot: sample rows of the edgelist CSV]

My question is how to construct the dataset and modify model training and testing to support the above scenario and data.

yule-BUAA commented on September 27, 2024

Firstly, for each line, you can split the interaction list into multiple lines, where each line stands for a single interaction. For example, split the first line u18839785, u266463103, "['2022-01-14 xxxx', '2021-11-30 xxxx']" into u18839785, u266463103, '2022-01-14 xxxx' and u18839785, u266463103, '2021-11-30 xxxx'.
Then, after splitting all the lines, sort the interactions in increasing order of timestamp.
The resulting data can be processed by our code directly.
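As a concrete sketch, the splitting and sorting could be done with pandas (the column names follow the description above; the file names, and the assumption that the redacted timestamps are real parseable dates, are mine):

```python
import ast

import pandas as pd

# Each row's `timestamp` cell is a stringified list such as
# "['2022-01-14 xxxx', '2021-11-30 xxxx']".
df = pd.read_csv('edgelist.csv')

# Parse the string into a real Python list, then expand to one row per timestamp,
# so each row stands for a single interaction.
df['timestamp'] = df['timestamp'].apply(ast.literal_eval)
df = df.explode('timestamp', ignore_index=True)

# Sort all interactions in increasing chronological order.
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.sort_values('timestamp', ignore_index=True)

df.to_csv('edgelist_split.csv', index=False)
```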

mgao97 commented on September 27, 2024

Thanks a lot! I will check it ASAP.

yule-BUAA commented on September 27, 2024

Closing this issue now. Feel free to reopen it when needed.

mgao97 commented on September 27, 2024

Hi,

I have a question regarding the dataset used in your code. Could you please specify whether the dataset used for dynamic node classification is a homogeneous graph?

yule-BUAA commented on September 27, 2024

Hi,

Our work uses two datasets for dynamic node classification: Wikipedia and Reddit. Rather than homogeneous graphs, these two datasets are bipartite, with two types of nodes. You can refer to Section B.1 and Table 6 in our paper for more details.

mgao97 commented on September 27, 2024

Yes, I see.

However, I am uncertain as to how these models can be adapted to classify both source and target nodes in a single homogeneous graph.

In Wikipedia and Reddit, only the source nodes are labelled, but in a homogeneous graph, both source and target nodes must be labelled.

Could you please advise me on how to modify the code to accommodate this scenario? I followed your previous suggestion in #13 (comment), but it still does not work.

Thank you so much!

yule-BUAA commented on September 27, 2024

In my opinion, adding a column that additionally stores the target node labels and loading it with slight code modifications should handle your case. Could you explain further why the previous suggestion does not work?

mgao97 commented on September 27, 2024

Sure!

The most important question for me is how to construct the original CSV file, like wikipedia.csv.

Specifically, I have 100 million edges, each with its own timestamp, and I am not clear on how to place the node features for both source and target nodes in the original CSV file. The node features are also about 1,000-dimensional.

yule-BUAA commented on September 27, 2024

We obtained the original files such as wikipedia.csv from the previous work EdgeBank.

For the mentioned case, you can place the edge features directly, as the original wikipedia.csv does. Then, for node features: if they change over time, append the features of the source and target nodes after the edge features in each line; you should also record the edge and node feature dimensions, so that you can split them apart like this. If the node features are static regardless of time, you can instead save them in a separate file with shape (num_nodes, node_feat_dim), and then load the node features by the node index in each line.
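A rough sketch of both options (all file names and dimensions here are assumptions for illustration, not DyGLib's actual conventions):

```python
import numpy as np

EDGE_FEAT_DIM = 172   # assumed edge feature dimension
NODE_FEAT_DIM = 1000  # assumed node feature dimension

# Option 1: time-varying node features. Each CSV line stores
# [edge_feat | src_node_feat | dst_node_feat] concatenated; the recorded
# dimensions let you split the concatenation back apart:
line_feats = np.random.rand(EDGE_FEAT_DIM + 2 * NODE_FEAT_DIM)  # one CSV line
edge_feat = line_feats[:EDGE_FEAT_DIM]
src_feat = line_feats[EDGE_FEAT_DIM:EDGE_FEAT_DIM + NODE_FEAT_DIM]
dst_feat = line_feats[EDGE_FEAT_DIM + NODE_FEAT_DIM:]

# Option 2: static node features. Save a single array of shape
# (num_nodes, node_feat_dim) once, then index it by node id at load time:
num_nodes = 10_000
node_features = np.random.rand(num_nodes, NODE_FEAT_DIM).astype(np.float32)
np.save('node_features.npy', node_features)

node_features = np.load('node_features.npy')
src_node_ids = np.array([3, 7, 42])      # ids read from the CSV lines
src_feats = node_features[src_node_ids]  # shape: (3, NODE_FEAT_DIM)
```

Option 2 is usually preferable at this scale: with 100 million edges and 1,000-dimensional features, duplicating node features on every line would blow up the CSV, while a single (num_nodes, node_feat_dim) array is stored once and indexed cheaply.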
