thudm / cogdl

CogDL: A Comprehensive Library for Graph Deep Learning (WWW 2023)

License: MIT License

Python 93.24% C++ 2.46% Cuda 4.12% Shell 0.05% Makefile 0.02% C 0.11%

gnn-model graph-classification graph-embedding graph-neural-networks leaderboard link-prediction node-classification pytorch

cogdl's Issues

python scripts/train.py --task unsupervised_node_classification --dataset wikipedia --model gcn gat deepwalk node2vec hope netmf netsmf prone

python scripts/train.py --task unsupervised_node_classification --dataset wikipedia --model gcn gat deepwalk node2vec hope netmf netsmf prone
I ran the command above and got the following error. The environment is fully installed: macOS, torch 1.4.0, no CUDA, running locally on a laptop.

Traceback (most recent call last):
  File "scripts/train.py", line 75, in <module>
    results = [main(args) for args in variant_args_generator(args, variants)]
  File "scripts/train.py", line 75, in <listcomp>
    results = [main(args) for args in variant_args_generator(args, variants)]
  File "scripts/train.py", line 26, in main
    task = build_task(args)
  File "/Users/XXX/cogdl-master/cogdl/tasks/__init__.py", line 48, in build_task
    return TASK_REGISTRY[args.task](args)
  File "/Users/XXX/cogdl-master/cogdl/tasks/unsupervised_node_classification.py", line 56, in __init__
    self.model = build_model(args)
  File "/Users/XXX/cogdl-master/cogdl/models/__init__.py", line 108, in build_model
    return MODEL_REGISTRY[args.model].build_model_from_args(args)
  File "/Users/XXX/cogdl-master/cogdl/models/nn/gcn.py", line 71, in build_model_from_args
    return cls(args.num_features, args.hidden_size, args.num_classes, args.dropout)
  File "/Users/XXX/cogdl-master/cogdl/models/nn/gcn.py", line 76, in __init__
    self.gc1 = GraphConvolution(nfeat, nhid)
  File "/Users/zengyujian/cogdl-master/cogdl/models/nn/gcn.py", line 20, in __init__
    self.weight = Parameter(torch.FloatTensor(in_features, out_features))
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:

(torch.device device)
(torch.Storage storage)
(Tensor other)
(tuple of ints size, torch.device device)
didn't match because some of the arguments have invalid types: (NoneType, int)
(object data, torch.device device)
didn't match because some of the arguments have invalid types: (NoneType, int)

Some question about link prediction

Because of how the HomoLinkPrediction task is designed, it seems difficult to implement other link prediction models on top of cogdl, such as https://github.com/rusty1s/pytorch_geometric/blob/master/examples/link_pred.py. Can you give me some ideas? Thanks.

wrong negative sampling in LINE? and some other suggestions

There may be a minor error in the negative sampling part of the LINE code (if I understand it correctly).
https://github.com/THUDM/cogdl/blob/a69a969020b8aa41cfcd8ac54511984bc5b32d62/cogdl/models/emb/line.py#L133-L137

If the index j for negative samples starts at 1, then the number of negative samples actually drawn is self.negative-1. For example, with self.negative=5, j=0 is not a negative sample (it is skipped in the for loop), and only j=1,2,3,4 are negative samples drawn by the alias algorithm. I also checked the original implementation by Jian Tang, where the loop range is set to negative+1 (see below).
https://github.com/tangjianpku/LINE/blob/d5f840941e0f4026090d1b1feeaf15da38e2b24b/linux/line.cpp#L332-L348
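
The off-by-one described above can be illustrated with a small sketch. It does not reproduce CogDL's code: `sample_negative` is a hypothetical stand-in for the alias-table sampler, and the two loop shapes simply follow the description in this issue and the structure of the reference C++ loop.

```python
import random

def sample_negative(num_nodes, rng):
    """Hypothetical stand-in for the alias-table sampler."""
    return rng.randrange(num_nodes)

def targets_starting_at_one(positive, negative, num_nodes, rng):
    """Loop as described in the issue: j runs over 1..negative-1."""
    targets = [(positive, 1)]  # the positive sample, label 1
    for j in range(1, negative):
        targets.append((sample_negative(num_nodes, rng), 0))
    return targets

def targets_like_reference(positive, negative, num_nodes, rng):
    """Loop shaped like the reference code: j runs over 0..negative."""
    targets = []
    for j in range(negative + 1):
        if j == 0:
            targets.append((positive, 1))  # j == 0 is the positive sample
        else:
            targets.append((sample_negative(num_nodes, rng), 0))
    return targets

rng = random.Random(0)
short = targets_starting_at_one(positive=7, negative=5, num_nodes=100, rng=rng)
full = targets_like_reference(positive=7, negative=5, num_nodes=100, rng=rng)
print(len(short) - 1, len(full) - 1)  # negatives drawn: 4 5
```

With `negative=5`, the first loop yields four negative samples and the second yields five, which is the discrepancy the issue points out.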

Some other suggestions:

It seems that cora, citeseer, and pubmed are not supported in the current version. When I tried to run on these datasets, an error said the datasets are not supported. It would help to document how to run tasks on these datasets, or how to add new datasets myself (e.g., data formats, paths, naming conventions).
It would help to have more details on dataset statistics (e.g., labeled or not) and on which datasets are supported for which tasks.
It would help to add docstrings or comments in the source code explaining the meaning of key variables (e.g., inputs, outputs).

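Intrusion detection

I would like to apply CogDL to intrusion detection. How should the network datasets used in intrusion detection be processed?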
KDDCup '99': http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
NSL-KDD: http://www.unb.ca/cic/datasets/nsl.html
UNSW-NB15: https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/
IDS2017: https://www.unb.ca/cic/datasets/ids-2017.html

Problem with testing graph classification models

Hello,
I have been trying out different graph classification models on the available datasets. Running some models on a number of datasets for this task raises 'RuntimeError: There were no tensor arguments to this function' at the beginning or in the middle of training. In addition, I have not been able to reproduce the accuracy reported in the README when training the given models. I'm wondering whether this is a bug or a dependency problem, since I'm also trying to get results for my newly added graph classification model. Thank you

Computer configuration

What are the requirements for the configuration of the computer and the pyTorch version of the library?

Running experiment(task="node_classification", dataset="cora", model="gcn", hidden_size=32, max_epoch=200) reports that the cogdl\\match.yml file is missing

📚 Installation

Environment

OS:
Python version:
PyTorch version:
CUDA/cuDNN version:
How did you try to install CogDL (wheel, source):
Any other relevant information:

Checklist

I followed the installation guide.
I set up CUDA correctly.
I do have multiple CUDA versions on my machine.

Additional context

When I run the tutorial example that combines a model, dataset, and task, PyCharm prints "Using backend: pytorch" and shows nothing about the result (ret).

Error:
Using backend: pytorch

Some Advice

1. Add docs for preparing data

Like what Euler does: Preparing-Data

2. Support exporting embeddings for nodes

3. Describe the supported graph structures

Are bipartite graphs supported?
Supervised or unsupervised?

results on shapenet part segmentation using dgcnn

Is there any result for point cloud segmentation using dgcnn? I hope you can add it to your leaderboard. Thanks.

documents

@neozhangthe1
Hi, I'm very curious about your project. I would like to ask how it differs from PyTorch Geometric.
I also found that the documentation is incomplete and lacks an introduction on how to use the library. I hope the documentation will be updated.
Thank you!

Possible label leakage problem with the link prediction task

This happens for undirected networks like PPI. The link prediction task class

https://github.com/THUDM/cogdl/blob/4ed7838018400377dae9da30017399f56585208f/cogdl/tasks/link_prediction.py#L116

reads the adjacency matrix directly without removing duplicates, which means the edge list contains both (x, y) and (y, x) for every edge.

(x, y) and (y, x) refer to the same edge and produce the same cosine similarity for x and y in the later evaluation. However, the training-test split treats them as independent edges. As a result, a large portion of the generated test edges are also in the training set, just with the two endpoints in reversed order.

I have tried fixing this label leakage issue, and I can only get about 0.8 ROC-AUC on PPI instead of the over 0.9 reported on your leaderboard.
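
One way to avoid this kind of leakage is to collapse each undirected edge to a single canonical pair before splitting. The following is a minimal sketch under that assumption, not the fix used in CogDL; the function names are hypothetical and the edge list is a toy example.

```python
import random

def canonical_undirected_edges(edges):
    """Collapse (x, y) and (y, x) into one edge, stored as (min, max)."""
    return sorted({(min(u, v), max(u, v)) for u, v in edges})

def split_edges(edges, test_ratio, seed=0):
    """Shuffle the unique undirected edges, then split into train and test."""
    unique = canonical_undirected_edges(edges)
    random.Random(seed).shuffle(unique)
    num_test = int(len(unique) * test_ratio)
    return unique[num_test:], unique[:num_test]

# A symmetric adjacency matrix lists every edge in both directions.
directed = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (0, 3), (3, 0)]
train, test = split_edges(directed, test_ratio=0.25)

# Check that no test edge appears in the training set in either orientation.
train_set = set(train)
leaked = [e for e in test if e in train_set or (e[1], e[0]) in train_set]
print(len(train), len(test), leaked)  # 3 1 []
```

If the model needs both directions during training, the reversed pairs can be added back to the training edges after the split.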

how to run unsupervised_node_classification task on graphsage model

Dear author, when I tried to run the unsupervised_node_classification task with the graphsage model, the following error was shown:

Traceback (most recent call last):
  File "train.py", line 76, in <module>
    results = pool.map(main, variant_args_generator())
  File "/home/mli/.localpython/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/mli/.localpython/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:

(torch.device device)
(torch.Storage storage)
(Tensor other)
(tuple of ints size, torch.device device)
didn't match because some of the arguments have invalid types: (NoneType, int)
(object data, torch.device device)
didn't match because some of the arguments have invalid types: (NoneType, int)

ModuleNotFoundError

Traceback (most recent call last):
  File "train.py", line 14, in <module>
    from cogdl import options
ModuleNotFoundError: No module named 'cogdl'

What is the problem here, please?

When I try to install CogDL via git clone git@github.com:THUDM/cogdl.git, I run into a problem

❓ Questions & Help

Cloning into 'cogdl'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

cogdl on linux tesla K40c

Hi, my environment is Linux, Tesla K40c, PyTorch 1.4, CUDA 10.1, Python 3.7.
When I run python gcn.py, I get the following error:

Traceback (most recent call last):
  File "gcn.py", line 29, in <module>
    task = build_task(args, dataset=dataset, model=model)
  File "/home/hsc/Desktop/wyt/cogdl-master/cogdl/tasks/__init__.py", line 54, in build_task
    return TASK_REGISTRY[args.task](args, dataset=dataset, model=model)
  File "/home/hsc/Desktop/wyt/cogdl-master/cogdl/tasks/node_classification.py", line 35, in __init__
    args.num_features = dataset.num_features
  File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 117, in num_features
    return self.num_node_features
  File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 112, in num_node_features
    return self[0].num_node_features
  File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 189, in __getitem__
    data = data if self.transform is None else self.transform(data)
  File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/transforms/target_indegree.py", line 27, in __call__
    deg = degree(col, data.num_nodes)
  File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/utils/degree.py", line 20, in degree
    out = torch.zeros((num_nodes), dtype=dtype, device=index.device)
RuntimeError: CUDA error: no kernel image is available for execution on the device

The K40c has a compute capability of 3.5. Does PyTorch 1.4 not support it?
Thanks!

oagbert model Killed

🐛 Bug

I'm using oagbert for sentence embedding.

To Reproduce

I want to obtain the embeddings of a list of sentences (len(corpus)=124), so I use the following code, as in the example:

tokenizer, bert_model = oagbert()
tokens = tokenizer(corpus, return_tensors="pt", padding=True)
batch_embeddings = bert_model(**tokens)
# Take the second output of the model call (was `embeddings[1]`, an undefined name).
embeddings = batch_embeddings[1]

Expected behavior

embeddings should have shape torch.Size([124, 768]), but instead the bert_model(**tokens) call prints:

Killed
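
A bare "Killed" with no Python traceback usually means the operating system terminated the process, commonly because it ran out of memory. That diagnosis is an assumption here; the report does not confirm it. If memory is the cause, encoding the corpus in smaller batches may help. The sketch below shows only the batching logic: `encode_batch` is a hypothetical callable standing in for the tokenizer and model call, and a stand-in encoder is used so the sketch runs without a model.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(corpus, encode_batch, batch_size=8):
    """Apply `encode_batch` (hypothetical: returns one vector per sentence)
    to each batch and collect the results in order."""
    embeddings = []
    for batch in iter_batches(corpus, batch_size):
        embeddings.extend(encode_batch(batch))
    return embeddings

def fake_encoder(batch):
    # Stand-in encoder: one single-element "vector" per sentence.
    return [[float(len(sentence))] for sentence in batch]

corpus = ["sentence %d" % i for i in range(124)]
vectors = embed_in_batches(corpus, fake_encoder, batch_size=8)
print(len(vectors))  # 124
```

With a real PyTorch model, wrapping the forward pass in `torch.no_grad()` also avoids storing gradients, which typically lowers memory use during inference.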

Advice: provide a script to download all datasets for offline usage

Currently, CogDL downloads missing datasets on the fly. However, some servers run in environments isolated from the Internet, and it is inconvenient to use CogDL in such environments.

A script to download all the needed datasets into a local directory will be very helpful. Users can then upload the local directory to the remote server only once.

Thanks for considering the advice.

Refactor Data, Dataset

Hi,

Our current implementations of data preprocessing and data loading are borrowed from pyg. This part needs refactoring before release.

Embedding vectors

Can I output the embedding vectors (context vectors) to a file?

Add Adaptive Sampling GCN

Add AS-GCN for comparison with Dr-GCN

How to set up cogdl on a cluster!!

Hi,
I am trying to set up cogdl in a virtual environment on a cluster. Can you provide setup instructions for this?
Thanks,
Ajay Madhavan

Add GATNE

kuogeniubi!

unsupervised_node_classification evaluation

Hi. When evaluating node classification performance, why do LINE, NetMF, and ProNE give the same result every time? For example, with the Wikipedia dataset and NetMF, the result is always:
| ('wikipedia', 'netmf') | 0.4373±0.0000 | 0.4747±0.0000 | 0.4883±0.0000 | 0.4953±0.0000 | 0.5022±0.0000 |

Looking forward to your reply, Thanks!

Issue about unsupervised_graph_classification

❓ Questions & Help

Hi, I just installed cogdl and tried to run a demo, but it seems that the models and datasets for unsupervised_graph_classification, for example gin and infograph, are missing.
So maybe something is wrong with my code?
My code is:
experiment(task="unsupervised_graph_classification", dataset="proteins", model="infograph")

AttributeError: 'NoneType' object has no attribute 'dim'

python train.py --task node_classification --dataset wikipedia --model gcn

AttributeError: 'NoneType' object has no attribute 'dim'

[Question] Which node classification to use for which model?

Hi,
Which of the 4 node classification tasks should be used for the models 'deepwalk', 'node2vec', 'NetMF', and 'NetSMF'?

python scripts/train.py --task node_classification --dataset cora --model gcn

Using backend: pytorch
Namespace(cpu=False, dataset=['cora'], device_id=[0], dropout=0.5, enhance=None, hidden_size=64, lr=0.01, max_epoch=500, model=['gcn'], num_classes=None, num_features=None, patience=100, save_dir='.', seed=[1], task='node_classification', weight_decay=0.0005)
Traceback (most recent call last):
  File "scripts/train.py", line 101, in <module>
    results = [main(args) for args in variant_args_generator(args, variants)]
  File "scripts/train.py", line 101, in <listcomp>
    results = [main(args) for args in variant_args_generator(args, variants)]
  File "scripts/train.py", line 29, in main
    task = build_task(args)
  File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/__init__.py", line 49, in build_task
    return TASK_REGISTRY[args.task](args)
  File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/node_classification.py", line 66, in __init__
    self.data.apply(lambda x: x.to(self.device))
  File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch_geometric/data/data.py", line 308, in apply
    self[key] = self.apply(item, func)
  File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch_geometric/data/data.py", line 287, in apply
    return func(item)
  File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/node_classification.py", line 66, in <lambda>
    self.data.apply(lambda x: x.to(self.device))
  File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 186, in _lazy_init
    _check_driver()
  File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 61, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

My environment uses the CPU build of PyTorch 1.6. Can CogDL only run with CUDA configured?

metapath2vec --schema

Hi,
When I choose the schema "0-1-0,0-1-2-1-0", not all nodes are included in the walks.
It throws this error:
KeyError: "word '6931' not in vocabulary"
How can I solve this problem? Thanks!
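
The error message suggests that node 6931 never appeared in any walk, so the embedding model has no entry for it. That reading is an assumption based on the message alone. The sketch below shows one way to find such nodes before lookup and to fall back to a zero vector; all names are hypothetical and the data is a toy example, not CogDL's internals.

```python
def nodes_missing_from_walks(all_nodes, walks):
    """Return the nodes that never appear in any walk, in sorted order."""
    seen = {node for walk in walks for node in walk}
    return sorted(set(all_nodes) - seen)

def lookup_with_fallback(vectors, node, dimension):
    """Return the learned vector, or a zero vector for an unseen node."""
    return vectors.get(node, [0.0] * dimension)

all_nodes = ["0", "1", "2", "6931"]
walks = [["0", "1", "0"], ["0", "1", "2", "1", "0"]]
missing = nodes_missing_from_walks(all_nodes, walks)
print(missing)  # ['6931']

vectors = {"0": [0.1, 0.2], "1": [0.3, 0.4], "2": [0.5, 0.6]}
print(lookup_with_fallback(vectors, "6931", dimension=2))  # [0.0, 0.0]
```

Whether a zero vector is an acceptable fallback depends on the downstream task; extending the schema so that every node type is reachable is the other option.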

Learn and save embedding files for customized dataset WITHOUT running evaluation tasks.

Thanks for open-sourcing this wonderful repo. Is it possible to directly learn and save embedding files for a customized dataset WITHOUT running evaluation tasks? Can I do this from the command line? Thanks

Possible bug in cogdl/tasks/node_classification.py

Line 93 of cogdl/tasks/node_classification.py may be wrong: missing_rate=0 is hard-coded in the call to preprocess_data_sgcpn, so args.missing_rate does not seem to be used.

if args.missing_rate >= 0:
    if args.model == "sgcpn":
        assert args.dataset in ["cora", "citeseer", "pubmed"]
        dataset.data = preprocess_data_sgcpn(dataset.data, normalize_feature=True, missing_rate=0)
        adj_slice = torch.tensor(dataset.data.adj.size())
        adj_slice[0] = 0
        dataset.slices["adj"] = adj_slice

Where is the DBLP dataset (51,264 nodes)?

Hi, does anyone know where the DBLP dataset with 51264 nodes and 127,968 edges is? (mentioned in this page: https://keg.cs.tsinghua.edu.cn/cogdl/datasets.html)

Running on new dataset

Hi,
I am wondering if this is limited to the datasets you have made available. Is there any documentation for how to format a new graph dataset to test?

I apologize as this is likely not a code base issue, but the slack invite link is broken.

Thanks,
Kayla

Split package requirements

Not all packages in cogdl's setup.py are actually needed by end users, and setuptools supports splitting requirements into install_requires, setup_requires, and tests_require.

It would be great to move packages like pytest and sphinx out of install_requires. The former belongs in tests_require, and the latter should be removed because the doc folder has its own requirements.txt.
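
A minimal sketch of such a split is shown below. The package names and metadata are illustrative placeholders, not CogDL's real dependency list. `install_requires`, `tests_require`, and `extras_require` are setuptools keywords; `extras_require` is included as an assumption, since it lets users opt in with `pip install package[test]`.

```python
# Package names below are illustrative placeholders, not CogDL's real dependencies.
INSTALL_REQUIRES = ["numpy", "networkx"]  # needed at runtime by end users
TEST_REQUIRES = ["pytest"]                # only needed to run the test suite
# sphinx is left out entirely, assuming the docs folder keeps its own requirements.txt.

def build_setup_kwargs():
    """Collect the keyword arguments that would be passed to setuptools.setup()."""
    return {
        "name": "example-package",  # hypothetical package name
        "version": "0.0.1",
        "install_requires": INSTALL_REQUIRES,
        "tests_require": TEST_REQUIRES,
        "extras_require": {"test": TEST_REQUIRES},
    }

kwargs = build_setup_kwargs()
print(sorted(kwargs["extras_require"]))  # ['test']
# In a real setup.py: from setuptools import setup; setup(**build_setup_kwargs())
```

Keeping the lists in named constants makes it easy to see at a glance which dependencies an end user actually installs.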

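How can I get the model output instead of the accuracy?

❓ Questions & Help

Hi,
CogDL is a great toolkit. How can I get the node classification model's prediction for every node? For example, when running semi-supervised node classification with GAT on Cora, I would like to obtain a prediction matrix of shape (2708, 7) rather than just the test accuracy.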

"missing 1 required positional argument: num_nodes" when running graphsage model

Sorry to bother you, but could you help me? When I run the command
python scripts/train.py -dt cora --model graphsage -t node_classification, I get this error:

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/xxx/cogdl/scripts/parallel_train.py", line 34, in main
    result = task.train()
  File "/home/xxx/cogdl/cogdl/tasks/node_classification.py", line 52, in train
    self._train_step()
  File "/home/xxx/cogdl/cogdl/tasks/node_classification.py", line 80, in _train_step
    self.model(self.data.x, self.data.edge_index)[self.data.train_mask],
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/cogdl/cogdl/models/nn/graphsage.py", line 86, in forward
    x = self.convs[i](x, edge_index_sp)
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'num_nodes'

random.seed

I want to use the script display_data.py, but I get this error:

Traceback (most recent call last):
  File "display_data.py", line 75, in <module>
    random.seed(args.seed)
  File "/Users/wangzhikai/.conda/envs/cogdl/lib/python3.7/random.py", line 126, in seed
    super().seed(a)
TypeError: unhashable type: 'list'
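
The traceback shows random.seed receiving a list. Elsewhere on this page the seed argument is printed as a list (seed=[1]) and passed as several values (--seed 0 1 2 3 4), so a list is the likely cause. One workaround, sketched below with a hypothetical helper name, is to reduce the value to a single integer before seeding.

```python
import random

def pick_seed(seed):
    """Return a single int seed from either an int or a non-empty list of ints."""
    if isinstance(seed, (list, tuple)):
        if not seed:
            raise ValueError("seed list is empty")
        return int(seed[0])
    return int(seed)

seed_from_args = [1]  # shape assumed from `--seed` accepting several values
random.seed(pick_seed(seed_from_args))
first = random.random()

random.seed(1)
print(first == random.random())  # True
```

Using only the first seed is a design choice for a display script; a training script would more likely loop over all seeds.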

Add GraphSAGE (sample) on Reddit

The current GraphSAGE implementation samples neighbors for every node in one batch. This only works for toy datasets (e.g., Cora), not for larger datasets (e.g., Reddit).

Optimize Exception Message

Importing CogDL prints "Ninja is required to load C++ extensions", which can be traced back to https://github.com/THUDM/cogdl/blob/94b8bc9bf8215f63eb1f0471457ca831817a9bf3/cogdl/operators/sample.py#L14

Similarly, the following code also has the same problem.
https://github.com/THUDM/cogdl/blob/94b8bc9bf8215f63eb1f0471457ca831817a9bf3/cogdl/operators/spmm.py#L26

It might be better to add an extra message to the print(e) call so that users can tell the "Ninja" message is raised here. Since the first line is reached as soon as cogdl is imported, it may confuse users who do not use this module.
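
A sketch of the kind of message this suggests is shown below. It is not CogDL's code: `load_optional_operator` and the loader functions are hypothetical, and the failing loader only simulates the reported error.

```python
def load_optional_operator(name, loader):
    """Call `loader`; on failure, print which operator failed and return None."""
    try:
        return loader()
    except Exception as e:
        # Name the operator so the user can tell where the message comes from.
        print(f"Failed to load optional C++ operator '{name}': {e}")
        return None

def failing_loader():
    # Simulates the build error reported when Ninja is not installed.
    raise RuntimeError("Ninja is required to load C++ extensions")

sample_op = load_optional_operator("sample", failing_loader)
print(sample_op is None)  # True
```

Returning None keeps the import from failing for users who never call the operator, while the prefixed message says which module produced the warning.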

how to use cogdl in pycharm?

unsupervised node classification evaluation

Hi,

Internal Error in formatting

Does anyone know what this error is about?

About the performance

Hi, thanks for the great work!
I simplified this project, removed the torch dependency, and plan to reimplement it in TensorFlow.
So far I have finished the algorithms in the emb folder as pure Python implementations, but my test results are poor. I found that the default parameters in the code differ from those in the README. Could you please clarify which parameters were used for the results displayed on your website?
Thanks a lot.

How to run the unsupervised graphsage by using command?

It seems the command "--task unsupervised_node_classification --dataset cora --model graphsage --seed 0 1 2 3 4" cannot be used to run unsupervised graphsage.

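How can I get the model's accuracy, recall, F1, or a classification report?

Thank you very much for developing and open-sourcing cogdl. How can I get a classification report for a node classification model rather than only the accuracy?

Many thanks!!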
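
Per-class precision, recall, and F1 can be computed from the true labels and the predicted labels once both are available. The sketch below is a plain-Python illustration with a hypothetical function name and toy labels; it does not show how to extract predictions from CogDL. In practice, `sklearn.metrics.classification_report` provides the same kind of summary.

```python
def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed per class."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report

y_true = [0, 0, 1, 1, 2, 2]  # toy ground-truth labels
y_pred = [0, 1, 1, 1, 2, 0]  # toy predictions, e.g. argmax over model outputs
report = per_class_report(y_true, y_pred)
for label, (p, r, f1) in report.items():
    print(label, round(p, 3), round(r, 3), round(f1, 3))
```

Classes with no predicted or no true samples get 0.0 here instead of raising a division error; that convention is a choice of this sketch.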

Having trouble using youtube dataset to do unsupervised node classification

When I use the youtube dataset for unsupervised node classification, it throws this error:

"...cogdl\cogdl\tasks\unsupervised_node_classification.py", line 54, in __init__
    self.num_nodes, self.num_classes = self.data.y.shape
AttributeError: 'NoneType' object has no attribute 'shape'

Issues loading TU dataset

❓ Questions & Help

@THINK2TRY

You were the last person to make significant edits to the TU dataloader. I am getting an error when loading what I believe to be correctly formatted data. I've been trying to debug for a while now. Any idea what is happening? I've attached the error along with my dataset. No worries if you don't have time to investigate, figured I'd ask in case I am missing something straightforward :)

tu-format-gh.zip