thudm / cogdl
CogDL: A Comprehensive Library for Graph Deep Learning (WWW 2023)
Home Page: https://cogdl.ai
License: MIT License
python scripts/train.py --task unsupervised_node_classification --dataset wikipedia --model gcn gat deepwalk node2vec hope netmf netsmf prone
I ran the command above and got the following error. The environment is fully set up: macOS + torch 1.4.0, no CUDA, running locally on a laptop.
Traceback (most recent call last):
File "scripts/train.py", line 75, in <module>
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 75, in <listcomp>
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 26, in main
task = build_task(args)
File "/Users/XXX/cogdl-master/cogdl/tasks/__init__.py", line 48, in build_task
return TASK_REGISTRY[args.task](args)
File "/Users/XXX/cogdl-master/cogdl/tasks/unsupervised_node_classification.py", line 56, in __init__
self.model = build_model(args)
File "/Users/XXX/cogdl-master/cogdl/models/__init__.py", line 108, in build_model
return MODEL_REGISTRY[args.model].build_model_from_args(args)
File "/Users/XXX/cogdl-master/cogdl/models/nn/gcn.py", line 71, in build_model_from_args
return cls(args.num_features, args.hidden_size, args.num_classes, args.dropout)
File "/Users/XXX/cogdl-master/cogdl/models/nn/gcn.py", line 76, in __init__
self.gc1 = GraphConvolution(nfeat, nhid)
File "/Users/zengyujian/cogdl-master/cogdl/models/nn/gcn.py", line 20, in __init__
self.weight = Parameter(torch.FloatTensor(in_features, out_features))
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:
It seems complicated to implement other link prediction models on top of CogDL because of the design of the HomoLinkPrediction task, for example https://github.com/rusty1s/pytorch_geometric/blob/master/examples/link_pred.py. Could you give me some ideas? Thanks.
In LINE's code, there might be a minor error in the negative sampling part (if I understand it correctly).
https://github.com/THUDM/cogdl/blob/a69a969020b8aa41cfcd8ac54511984bc5b32d62/cogdl/models/emb/line.py#L133-L137
If the index j for negative samples starts at 1, then the number of negative samples is actually self.negative - 1. For example, if you set self.negative=5, index 0 is not a negative sample (it is skipped in the for loop), and only 1, 2, 3, 4 are negative samples drawn by the alias method. I also checked the original implementation by Jian Tang, where the range of the negative sampling loop is set to negative + 1 (see below).
https://github.com/tangjianpku/LINE/blob/d5f840941e0f4026090d1b1feeaf15da38e2b24b/linux/line.cpp#L332-L348
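A minimal sketch of the loop convention described above, assuming the reference line.cpp structure: the loop runs negative + 1 times, index 0 is the positive target, and indices 1..negative are negatives, so exactly `negative` negatives are drawn per edge. `random.choices` stands in for the alias-table draw, and `sample_targets` is a hypothetical name, not CogDL's code.

```python
import random


def sample_targets(v, nodes, weights, negative, rng):
    """Return (target, label) pairs for one edge ending at v.

    Mirrors the d = 0 .. negative loop of the reference implementation:
    d == 0 yields the positive target, d = 1..negative yield negatives.
    """
    pairs = []
    for d in range(negative + 1):
        if d == 0:
            pairs.append((v, 1))  # positive sample
        else:
            # Stand-in for the alias-method draw from the noise distribution
            target = rng.choices(nodes, weights=weights, k=1)[0]
            pairs.append((target, 0))  # negative sample
    return pairs
```

With negative=5 this yields one positive and five negatives; a loop over range(1, negative) would yield only four negatives, which is the discrepancy the issue points out.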
Some other suggestions:
I want to apply this to intrusion detection. How should the network datasets used in intrusion detection be processed?
KDDCup'99: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
NSL-KDD: http://www.unb.ca/cic/datasets/nsl.html
UNSW-NB15: https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/
IDS2017: https://www.unb.ca/cic/datasets/ids-2017.html
Hello,
I have been trying out different graph classification models on the available datasets. Running some models on several datasets for this task raises 'RuntimeError: There were no tensor arguments to this function' at the beginning or in the middle of training. In addition, I have not been able to reproduce the accuracies reported in the README with the given models. So I am wondering what the problem could be, whether it is a bug or a dependency problem, since I am also trying to get results for my newly added graph classification model. Thank you
What are the hardware requirements and the required PyTorch version for this library?
When I run the tutorial example that combines a model, dataset, and task, PyCharm only prints "Using backend: pytorch" and shows nothing for ret.
Error:
Using backend: pytorch
Like what Euler does: Preparing-Data
Are there any results for DGCNN on point cloud segmentation? I hope you can add them to your leaderboard. Thanks.
@neozhangthe1
Hi, I'm very curious about your project. I would like to ask how it differs from PyTorch Geometric.
I also found that the documentation is incomplete and lacks an introduction on how to use it. I hope the documentation will be updated.
Thank you!
This happens for undirected networks like PPI. The link prediction task class reads the adjacency matrix directly without removing duplicates, which means the edge list contains both (x, y) and (y, x) for every edge.
(x, y) and (y, x) refer to the same edge and produce the same cosine similarity for x and y in the later evaluation. However, the train-test splitting process treats them as independent edges. As a result, a large portion of the generated test edges also appear in the training set, just with the two endpoints reversed.
I have tried resolving this label leakage issue, and I can only get about 0.8 ROC-AUC on PPI instead of the over 0.9 reported in your leaderboard.
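A minimal sketch of a leakage-free split, assuming edges arrive as a plain list of (x, y) pairs with both directions present. Each undirected edge is canonicalized to (min, max) before shuffling, so it lands in exactly one split. The function name is hypothetical and this is not CogDL's HomoLinkPrediction code; dropping self-loops is an extra choice made here.

```python
import random


def undirected_edge_split(edges, test_ratio, seed=0):
    """Split an undirected edge list into train/test without leakage."""
    # (x, y) and (y, x) collapse to one key; self-loops are dropped here
    unique = sorted({(min(x, y), max(x, y)) for x, y in edges if x != y})
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = int(len(unique) * test_ratio)
    test = unique[:n_test]
    train = unique[n_test:]
    # Restore both directions for message passing on the training graph only
    train_directed = train + [(y, x) for x, y in train]
    return train_directed, test
```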
Dear author, when I tried to run the unsupervised_node_classification task with the graphsage model, the following error was shown:
Traceback (most recent call last):
File "train.py", line 76, in <module>
results = pool.map(main, variant_args_generator())
File "/home/mli/.localpython/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/mli/.localpython/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:
Traceback (most recent call last):
File "train.py", line 14, in <module>
from cogdl import options
ModuleNotFoundError: No module named 'cogdl'
What is the problem here?
Cloning into 'cogdl'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Hi, my environment is Linux, Tesla K40c, PyTorch 1.4, CUDA 10.1, Python 3.7.
When I run python gcn.py, the error is:
Traceback (most recent call last):
File "gcn.py", line 29, in <module>
task = build_task(args, dataset=dataset, model=model)
File "/home/hsc/Desktop/wyt/cogdl-master/cogdl/tasks/__init__.py", line 54, in build_task
return TASK_REGISTRY[args.task](args, dataset=dataset, model=model)
File "/home/hsc/Desktop/wyt/cogdl-master/cogdl/tasks/node_classification.py", line 35, in __init__
args.num_features = dataset.num_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 117, in num_features
return self.num_node_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 112, in num_node_features
return self[0].num_node_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 189, in __getitem__
data = data if self.transform is None else self.transform(data)
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/transforms/target_indegree.py", line 27, in __call__
deg = degree(col, data.num_nodes)
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/utils/degree.py", line 20, in degree
out = torch.zeros((num_nodes), dtype=dtype, device=index.device)
RuntimeError: CUDA error: no kernel image is available for execution on the device
K40c has a compute capability of 3.5. Does it not support PyTorch 1.4?
Thanks!
I'm using oagbert for sentence embedding.
I want to obtain the embeddings of a list of sentences (len(corpus)=124), so I use the following code, based on the example:
from cogdl.oag import oagbert
tokenizer, bert_model = oagbert()
tokens = tokenizer(corpus, return_tensors="pt", padding=True)
batch_embeddings = bert_model(**tokens)
embeddings = batch_embeddings[1]
The embeddings should have shape torch.Size([124, 768]), but instead the call bert_model(**tokens) prints:
Killed
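On Linux, a bare "Killed" most likely means the kernel's out-of-memory killer stopped the process: all 124 sentences go through the model in one padded batch, and without torch.no_grad() the activations are kept for backpropagation. A minimal sketch of batched inference, assuming `bert_model` is a torch.nn.Module that returns a tuple whose element [1] is the pooled output (as in the code above); `iter_batches` and `embed_corpus` are hypothetical helpers.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_corpus(tokenizer, bert_model, corpus, batch_size=16):
    """Embed sentences in small batches without building autograd graphs."""
    import torch  # imported lazily so iter_batches works without torch

    pieces = []
    bert_model.eval()  # assumes a torch.nn.Module; disables dropout
    with torch.no_grad():  # inference only: no activations stored for backprop
        for batch in iter_batches(corpus, batch_size):
            tokens = tokenizer(batch, return_tensors="pt", padding=True)
            outputs = bert_model(**tokens)
            pieces.append(outputs[1])  # pooled output, one row per sentence
    return torch.cat(pieces, dim=0)
```

Usage would be `tokenizer, bert_model = oagbert()` followed by `embeddings = embed_corpus(tokenizer, bert_model, corpus)`; a smaller batch_size trades speed for lower peak memory.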
Currently, CogDL downloads missing datasets on the fly. However, some servers run in environments isolated from the Internet, which makes CogDL inconvenient to use there.
A script to download all the needed datasets into a local directory will be very helpful. Users can then upload the local directory to the remote server only once.
Thanks for considering the advice.
Hi,
Our current implementations of data preprocessing and data loading are borrowed from pyg. This part needs refactor before release.
Can I output the embedding vectors (context vectors) to a file?
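One library-independent way to do this, sketched under the assumption that the embeddings are available as a 2-D array or list of rows indexed by node id: write them in the plain-text word2vec format (header "num_nodes dim", then one "node_id v1 ... vd" line per node), which gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` can read back. The function name is hypothetical, not a CogDL API.

```python
def save_embeddings_txt(path, embeddings, node_ids=None):
    """Write embeddings in word2vec text format."""
    if node_ids is None:
        node_ids = range(len(embeddings))
    rows = list(zip(node_ids, embeddings))
    dim = len(rows[0][1]) if rows else 0
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(rows)} {dim}\n")  # header: count and dimension
        for node_id, vec in rows:
            values = " ".join(f"{x:.6f}" for x in vec)
            f.write(f"{node_id} {values}\n")
```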
Add AS-GCN for comparison with Dr-GCN
Hi,
I am trying to set up cogdl in a virtual environment on a cluster. Could you provide setup instructions for doing so?
Thanks,
Ajay Madhavan
kuogeniubi!
Hi. When evaluating node classification performance, why do LINE, NetMF, and ProNE give the same result every time? For example, NetMF on the Wikipedia dataset always gives:
| ('wikipedia', 'netmf') | 0.4373±0.0000 | 0.4747±0.0000 | 0.4883±0.0000 | 0.4953±0.0000 | 0.5022±0.0000 |
Looking forward to your reply, Thanks!
Hi, I just installed cogdl and tried to run a demo, but it seems that the unsupervised_graph_classification models and datasets are missing, for example gin and infograph.
Maybe something is wrong with my code?
My code is:
from cogdl import experiment
experiment(task="unsupervised_graph_classification", dataset="proteins", model="infograph")
python train.py --task node_classification --dataset wikipedia --model gcn
AttributeError: 'NoneType' object has no attribute 'dim'
Hi,
Which of the 4 node classification tasks should be used for the models 'deep walk', 'node2vec', 'NetMF', and 'NetSMF'?
python scripts/train.py --task node_classification --dataset cora --model gcn
Using backend: pytorch
Namespace(cpu=False, dataset=['cora'], device_id=[0], dropout=0.5, enhance=None, hidden_size=64, lr=0.01, max_epoch=500, model=['gcn'], num_classes=None, num_features=None, patience=100, save_dir='.', seed=[1], task='node_classification', weight_decay=0.0005)
Traceback (most recent call last):
File "scripts/train.py", line 101, in <module>
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 101, in <listcomp>
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 29, in main
task = build_task(args)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/__init__.py", line 49, in build_task
return TASK_REGISTRY[args.task](args)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/node_classification.py", line 66, in __init__
self.data.apply(lambda x: x.to(self.device))
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch_geometric/data/data.py", line 308, in apply
self[key] = self.apply(item, func)
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch_geometric/data/data.py", line 287, in apply
return func(item)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/node_classification.py", line 66, in <lambda>
self.data.apply(lambda x: x.to(self.device))
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 186, in _lazy_init
_check_driver()
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 61, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
My environment uses the CPU-only build of PyTorch 1.6. Can it only run with CUDA configured?
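The Namespace printed above shows cpu=False and device_id=[0], which suggests the script chooses CUDA unless a CPU option is set; check `python scripts/train.py --help` for the exact flag, since it is inferred here only from the Namespace. A minimal sketch of a fallback rule that also covers a CPU-only PyTorch build; `select_device` is a hypothetical helper.

```python
def select_device(cpu, cuda_available, device_id=0):
    """Return a torch-style device string.

    Falls back to CPU when the user asks for it or when the installed
    PyTorch build has no usable CUDA (e.g. a CPU-only wheel).
    """
    if cpu or not cuda_available:
        return "cpu"
    return f"cuda:{device_id}"
```

In a script it would be called as `torch.device(select_device(args.cpu, torch.cuda.is_available(), args.device_id[0]))`, so the `.to(self.device)` calls in the traceback never touch CUDA on such a machine.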
Hi,
When I choose the schema "0-1-0,0-1-2-1-0", not all nodes are included in the walks, and it throws the error:
KeyError: "word '6931' not in vocabulary"
How can I solve this problem? Thanks!
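The message "word '6931' not in vocabulary" looks like a Word2Vec vocabulary lookup failing for a node that no walk under the chosen schema ever visited. A minimal workaround sketch, assuming the walks are fed to Word2Vec as string node ids and that `vectors` supports `in` and `[]` (as a dict or gensim KeyedVectors does): give unvisited nodes a zero vector and report them. The helper name is hypothetical; a schema that reaches every node type is the cleaner fix.

```python
def build_embedding_matrix(vectors, num_nodes, dim):
    """Build a num_nodes x dim matrix from a token -> vector mapping.

    Nodes missing from the vocabulary get a zero vector instead of
    raising KeyError; their ids are returned so they can be inspected.
    """
    matrix = []
    missing = []
    for node in range(num_nodes):
        key = str(node)  # walks are assumed to use string node ids
        if key in vectors:
            matrix.append(list(vectors[key]))
        else:
            matrix.append([0.0] * dim)
            missing.append(node)
    return matrix, missing
```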
Thanks for open-sourcing this wonderful repo. Is it possible to directly learn and save embedding files for a customized dataset WITHOUT running evaluation tasks? Can I do this via the command line? Thanks
Line 93 of cogdl/tasks/node_classification.py may be wrong: it seems that missing_rate is not used, because missing_rate=0 is passed to preprocess_data_sgcpn instead of args.missing_rate.
if args.missing_rate >= 0:
    if args.model == "sgcpn":
        assert args.dataset in ["cora", "citeseer", "pubmed"]
        dataset.data = preprocess_data_sgcpn(dataset.data, normalize_feature=True, missing_rate=0)
        adj_slice = torch.tensor(dataset.data.adj.size())
        adj_slice[0] = 0
        dataset.slices["adj"] = adj_slice
Hi, does anyone know where the DBLP dataset with 51264 nodes and 127,968 edges is? (mentioned in this page: https://keg.cs.tsinghua.edu.cn/cogdl/datasets.html)
Hi,
I am wondering if this is limited to the datasets you have made available. Is there any documentation for how to format a new graph dataset to test?
I apologize, as this is likely not a code base issue, but the Slack invite link is broken.
Thanks,
Kayla
Not all packages in the setup.py of cogdl are actually needed by end users, and setuptools supports splitting requirements into install_requires, setup_requires, and tests_require. It would be great to move packages like pytest and sphinx out of install_requires: the former should go in tests_require, and the latter should be removed because the doc folder has its own requirements.txt.
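A minimal sketch of the split the issue proposes, written as the keyword arguments one would pass to `setuptools.setup(**setup_kwargs)` in setup.py. The package lists are illustrative placeholders, not CogDL's actual requirements. It uses extras_require (installable as `pip install cogdl[test]`) because recent setuptools versions have deprecated tests_require.

```python
# Illustrative dependency split; package lists are placeholders,
# not CogDL's real requirements.
install_requires = ["numpy", "scipy", "networkx", "scikit-learn", "tqdm"]
test_requires = ["pytest", "coverage"]

setup_kwargs = dict(
    name="cogdl",
    install_requires=install_requires,  # needed at runtime by end users
    extras_require={"test": test_requires},  # opt-in: pip install cogdl[test]
    # sphinx is left out: the doc folder has its own requirements.txt
)
```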
Hi,
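CogDL is a great toolkit. How can I obtain the node classification model's prediction for each node? For example, when running semi-supervised node classification with GAT on Cora, I would like a prediction matrix of shape (2708, 7) rather than just the test accuracy.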
Cloning into 'cogdl'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
I am sorry to bother you; could you help me? When I run the command:
python scripts/train.py -dt cora --model graphsage -t node_classification
I get this error:
Traceback (most recent call last):
File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/xxx/cogdl/scripts/parallel_train.py", line 34, in main
result = task.train()
File "/home/xxx/cogdl/cogdl/tasks/node_classification.py", line 52, in train
self._train_step()
File "/home/xxx/cogdl/cogdl/tasks/node_classification.py", line 80, in _train_step
self.model(self.data.x, self.data.edge_index)[self.data.train_mask],
File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/xxx/cogdl/cogdl/models/nn/graphsage.py", line 86, in forward
x = self.convs[i](x, edge_index_sp)
File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'num_nodes'
I want to use the script display_data.py, but I get the following error:
Traceback (most recent call last):
File "display_data.py", line 75, in <module>
random.seed(args.seed)
File "/Users/wangzhikai/.conda/envs/cogdl/lib/python3.7/random.py", line 126, in seed
super().seed(a)
TypeError: unhashable type: 'list'
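The traceback suggests args.seed is a list: the Namespace printed in another issue here shows seed=[1], which is what argparse produces with nargs="+". random.seed() rejects lists with a TypeError. A minimal sketch, where `parse_args` only models that Namespace (a hypothetical parser, not display_data.py's real one) and `seed_everything` takes the first seed.

```python
import argparse
import random


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # nargs="+" makes args.seed a list, e.g. [1] or [0, 1, 2]
    parser.add_argument("--seed", type=int, nargs="+", default=[1])
    return parser.parse_args(argv)


def seed_everything(seeds):
    """Seed the stdlib RNG with one integer taken from a list of seeds."""
    seed = seeds[0] if isinstance(seeds, (list, tuple)) else seeds
    random.seed(seed)  # passing the list itself would raise TypeError
    return seed
```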
The current GraphSAGE implementation samples neighbors for every node in one batch. This only works for toy datasets (e.g., Cora), not for larger datasets (e.g., Reddit).
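A minimal sketch of mini-batch sampling in the fixed-fanout style associated with GraphSAGE: starting from a batch of target nodes, each hop samples at most `fanout` neighbors per frontier node, so only nodes reachable from the batch are touched. This assumes an adjacency dict of neighbor lists; the function and its return layout are hypothetical, not CogDL's API.

```python
import random


def sample_neighbors(adj, batch_nodes, fanouts, rng):
    """Layer-wise neighbor sampling restricted to a mini-batch.

    adj: dict node -> list of neighbors.
    fanouts: max neighbors sampled per node at each hop, e.g. [10, 5].
    Returns one dict per hop mapping each frontier node to its sample.
    """
    frontier = list(batch_nodes)
    blocks = []
    for fanout in fanouts:
        block = {}
        for node in frontier:
            neighbors = adj.get(node, [])
            if len(neighbors) > fanout:
                neighbors = rng.sample(neighbors, fanout)  # no replacement
            block[node] = list(neighbors)
        blocks.append(block)
        # Next hop needs the current nodes plus their sampled neighbors
        sampled = [n for ns in block.values() for n in ns]
        frontier = list(dict.fromkeys(frontier + sampled))
    return blocks
```

Memory then scales with batch size times the product of the fanouts rather than with the full graph.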
Loading CogDL reports "Ninja is required to load C++ extensions", which can be traced back to https://github.com/THUDM/cogdl/blob/94b8bc9bf8215f63eb1f0471457ca831817a9bf3/cogdl/operators/sample.py#L14
The following code has the same problem:
https://github.com/THUDM/cogdl/blob/94b8bc9bf8215f63eb1f0471457ca831817a9bf3/cogdl/operators/spmm.py#L26
It might be better to add an extra message to the print(e) call to indicate that the "Ninja" message is thrown here. Since the first line is reached as soon as cogdl is imported, it might confuse users who do not use this module.
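A minimal sketch of the suggested message, assuming the extension is built with a JIT loader such as `torch.utils.cpp_extension.load` (whose Ninja check produces that text). `try_load_extension`, `loader`, and `module_name` are hypothetical; the point is to name the optional component in the warning and return None so callers can degrade gracefully.

```python
import warnings


def try_load_extension(loader, module_name):
    """Call loader() to build an optional C++ extension.

    On failure, warn with the module name so users can tell the error
    (e.g. "Ninja is required to load C++ extensions") comes from an
    optional component, and return None instead of raising.
    """
    try:
        return loader()
    except Exception as e:  # build failures surface as various exception types
        warnings.warn(
            f"[{module_name}] optional C++ extension could not be loaded; "
            f"features that depend on it are unavailable. Reason: {e}"
        )
        return None
```

A call site might wrap the existing loader call in a lambda, e.g. `try_load_extension(lambda: load(name="sampler", sources=["sample.cpp"]), "cogdl.operators.sample")`, where the name and source file are placeholders.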
Hi,
Hi, thanks for the great job!
I simplified this project, removed the torch dependency, and plan to implement it in TensorFlow.
I have now finished the algorithms in the emb folder as pure Python implementations, but my test results are poor. I found that the default parameters in the code differ from those in the README. Could you please clarify the parameters used for the results displayed on your website?
Thanks a lot.
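It seems the command "--task unsupervised_node_classification --dataset cora --model graphsage --seed 0 1 2 3 4" cannot be used to run unsupervised GraphSAGE.
Thank you very much for developing and open-sourcing CogDL. How can I get a classification report for the node classification model instead of only the accuracy?
Many thanks!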
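Assuming you can obtain predicted and true labels for the test nodes (for example, the argmax of the model's output rows under the test mask; how to extract them depends on the CogDL version and is not shown here), a per-class report can be computed as below. This is a stdlib sketch of what `sklearn.metrics.classification_report` provides; the function is hypothetical.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Per-class precision, recall, F1 and support, plus overall accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    report = {}
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": true_count[c]}
    report["accuracy"] = sum(tp.values()) / len(y_true) if y_true else 0.0
    return report
```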
When using the youtube dataset for unsupervised node classification, the following error is thrown:
"...cogdl\cogdl\tasks\unsupervised_node_classification.py", line 54, in __init__
self.num_nodes, self.num_classes = self.data.y.shape
AttributeError: 'NoneType' object has no attribute 'shape'
You were the last person to make significant edits to the TU dataloader. I am getting an error when loading what I believe to be correctly formatted data. I've been trying to debug for a while now. Any idea what is happening? I've attached the error along with my dataset. No worries if you don't have time to investigate, figured I'd ask in case I am missing something straightforward :)
Thanks a lot.