divelab / good

GOOD: A Graph Out-of-Distribution Benchmark [NeurIPS 2022 Datasets and Benchmarks]

Home Page: https://good.readthedocs.io/

License: GNU General Public License v3.0

Python 100.00%
deep-learning graph-neural-networks out-of-distribution-generalization pytorch graph-ood distribution-shift invariant-learning pytorch-geometric

good's Introduction

Logo by Zhao Xu

The Data Integration, Visualization, and Exploration (DIVE) Laboratory at Texas A&M University is led by Dr. Shuiwang Ji and conducts foundational research in machine learning and deep learning and applies machine learning methods to solve challenging real-world problems in biology, chemistry, neuroscience and medicine.

good's People

Contributors

cm-bf, divelab, hyanan16

good's Issues

Questions about the comparison graph

Excuse me, how can I draw a figure comparing the visualized extracted causal graph with the actual label, as in Section F.2 ("Interpretability visualization results") of the paper "Joint Learning of Label and Environment Causal Independence for Graph Out-of-Distribution Generalization"? The paper states:
"We provide interpretability [38, 98] visualization results on GOOD-Motif and CFP-Motif. As illustrated in Fig. 8..."
An example is also shown in the attached screenshot (Snipaste_2024-01-21_10-43-21).

Best regards

Question about applying split method on new datasets

Hi,

I am trying to apply the split method used for the HIV dataset to other molecule datasets such as BACE.
However, the resulting splits are not promising.
Performance on the size covariate OOD splits is higher than on the Open Graph Benchmark (OGB) data split.
Is there anything I am missing?
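
For reference when checking a custom split, here is a minimal sketch of a size-based covariate split. It is not GOOD's implementation: the function name size_covariate_split, the 0.8/0.1/0.1 ratios, and the use of node count as the domain are assumptions made for illustration. Graphs are ordered by size so that training sees the smallest graphs and testing the largest.

```python
from typing import Dict, List, Sequence


def size_covariate_split(
    num_nodes: Sequence[int],
    ratios: Sequence[float] = (0.8, 0.1, 0.1),
) -> Dict[str, List[int]]:
    """Split graph indices by size: small graphs train, large graphs test."""
    if abs(sum(ratios) - 1.0) > 1e-8:
        raise ValueError("ratios must sum to 1")
    # Sort indices by graph size (sorted() is stable, so ties keep their order).
    order = sorted(range(len(num_nodes)), key=lambda i: num_nodes[i])
    n = len(order)
    train_end = int(ratios[0] * n)
    val_end = int((ratios[0] + ratios[1]) * n)
    return {
        "train": order[:train_end],
        "val": order[train_end:val_end],
        "test": order[val_end:],
    }


# Hypothetical node counts for ten graphs.
sizes = [12, 30, 7, 45, 18, 22, 9, 50, 15, 27]
split = size_covariate_split(sizes)
print({name: [sizes[i] for i in idx] for name, idx in split.items()})
```

A quick sanity check is to compare the size ranges of the three subsets: if they barely differ, the split induces little distribution shift, which could be one reason OOD performance looks unexpectedly high.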

Thank you!

Question about dataset

Hi GOOD team,
I see that when the mode is test, the code reads the .pt file directly, and that file already contains the test results. This differs from the usual workflow I have seen, which loads the model, loads the data, and then makes predictions, so that the .pt file contains only the model and not the results. How should I adjust the code to work that way? Thanks!

Best regards,

run CIGA on GOODPCBA dataset got error

$ goodtg --config_path final_configs/GOODPCBA/scaffold/covariate/CIGAv2.yaml
This logger will substitute general print function

INFO: -----------------------------------
Task: train
Thu Oct 5 23:19:49 2023
INFO: Load Dataset GOODPCBA
DEBUG: 10/05/2023 11:19:51 PM : Dataset: {'train': GOODPCBA(262764), 'id_val': GOODPCBA(43792), 'id_test': GOODPCBA(43792), 'val': GOODPCBA(44019), 'test': GOODPCBA(43562), 'task': 'Binary classification', 'metric': 'Average Precision'}
DEBUG: 10/05/2023 11:19:51 PM : Data(x=[21, 9], edge_index=[2, 46], edge_attr=[46, 3], y=[1, 128], smiles='CC1CCN(C(=O)CN2CC(C)Sc3ccccc32)CC1', idx=[1], scaffold='O=C(CN1CCSc2ccccc21)N1CCCCC1', domain_id=[1], env_id=[1])
INFO: Loading model...
DEBUG: 10/05/2023 11:19:51 PM : Config model
DEBUG: 10/05/2023 11:19:53 PM : Load training utils
INFO: Epoch 0:
0%|░░░░░░░░░░░░░░░░░░░░| 0/8212 [00:00<?, ?it/s]tensor(2550, device='cuda:0') torch.Size([32, 300]) torch.Size([32, 128]) torch.Size([32, 128])
0%|░░░░░░░░░░░░░░░░░░░░| 0/8212 [00:01<?, ?it/s]
ERROR: 10/05/2023 11:19:54 PM - utils.py - line 87 : Traceback (most recent call last):
File "/home/cz/miniconda3/envs/py38/bin/goodtg", line 33, in <module>
sys.exit(load_entry_point('graph-ood', 'console_scripts', 'goodtg')())
File "/home/cz/code/GOOD-GOODv1/GOOD/kernel/main.py", line 69, in goodtg
main()
File "/home/cz/code/GOOD-GOODv1/GOOD/kernel/main.py", line 60, in main
pipeline.load_task()
File "/home/cz/code/GOOD-GOODv1/GOOD/kernel/pipelines/basic_pipeline.py", line 231, in load_task
self.train()
File "/home/cz/code/GOOD-GOODv1/GOOD/kernel/pipelines/basic_pipeline.py", line 113, in train
train_stat = self.train_batch(data, pbar) # train_stat is a dictionary that contains the loss
File "/home/cz/code/GOOD-GOODv1/GOOD/kernel/pipelines/basic_pipeline.py", line 74, in train_batch
loss = self.ood_algorithm.loss_calculate(raw_pred, targets, mask, node_norm, self.config)
File "/home/cz/code/GOOD-GOODv1/GOOD/ood_algorithms/algorithms/CIGA.py", line 81, in loss_calculate
assert self.rep_out.size(0)==targets[mask].size(0), print(mask.sum(),self.rep_out.size(),targets.size(),mask.size())
AssertionError: None
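
A note on reading this traceback: the message "AssertionError: None" comes from the assert statement itself, because print(...) returns None and that return value becomes the assertion message. The diagnostic values are the line printed just after the first progress bar. The sketch below reproduces the effect in plain Python with two numbers from the log (32 rows in the representation, 2550 selected targets); the function names are hypothetical and not part of GOOD.

```python
def check_with_print(rep_rows: int, masked_targets: int) -> None:
    # print() runs first and returns None, so None becomes the assertion message.
    assert rep_rows == masked_targets, print(rep_rows, masked_targets)


def check_with_message(rep_rows: int, masked_targets: int) -> None:
    # An f-string keeps the diagnostic values inside the exception itself.
    assert rep_rows == masked_targets, (
        f"representation rows ({rep_rows}) != masked targets ({masked_targets})"
    )


def message_of(check, *args) -> str:
    """Return the text of the AssertionError raised by check, or '' if none."""
    try:
        check(*args)
    except AssertionError as err:
        return str(err)
    return ""


print(repr(message_of(check_with_print, 32, 2550)))
print(repr(message_of(check_with_message, 32, 2550)))
```

The logged shapes suggest that indexing the [32, 128] targets with the boolean mask yields 2550 entries, one per labeled task entry, while the representation has one row per graph (32), so the two sizes cannot match on this multi-task dataset.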

LBAPcore-Configs

Hello, where can I find the configs for running the LBAPcore data?

Circular Import error

Hello @divelab,
In my opinion, this repository's code is amazing and very clean, and it is very helpful to others. I ran into a circular import error, which I fixed by changing two files in this repository:

  1. In GOOD/data/__init__.py, I changed
from .good_datasets import *
from .good_loaders import *

to

from GOOD.data.good_datasets import good_arxiv, good_cbas, good_cmnist, good_cora, good_hiv, good_motif, good_pcba, good_sst2, good_twitch, good_webkb, good_zinc
from GOOD.data.good_loaders import BaseLoader
  2. In GOOD/data/good_datasets/__init__.py, I changed
from . import *

to

import good_arxiv, good_cbas, good_cmnist, good_cora, good_hiv, good_motif, good_pcba, good_sst2, good_twitch, good_webkb, good_zinc

Please update the repository if you believe this is a global issue.
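
One detail worth checking in the second change: inside a package's __init__.py, Python 3 resolves a bare "import good_arxiv" against sys.path rather than the package directory, so an explicit relative import is usually needed there. The sketch below builds a throwaway package in a temporary directory to show the relative form working; demo_pkg, mod_a, and mod_b are made-up names, not part of GOOD.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny package whose __init__ uses an explicit relative import."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "mod_a.py").write_text("VALUE_A = 1\n")
    (pkg / "mod_b.py").write_text("VALUE_B = 2\n")
    # "from . import x" names the submodules relative to the package itself.
    (pkg / "__init__.py").write_text("from . import mod_a, mod_b\n")


def load_demo_package():
    """Import the demo package and return the two values it exposes."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")
            return pkg.mod_a.VALUE_A, pkg.mod_b.VALUE_B
        finally:
            sys.path.remove(tmp)
            # Forget the demo modules so repeated calls start clean.
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]


print(load_demo_package())
```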

ERROR: Cannot install graph-ood and graph-ood==1.1.1 because these package versions have conflicting dependencies.

Hi GOOD Team @CM-BF ,
Thanks for the great dataset. I had some problems like #9 with the installation on Mac.

I followed the solution you suggested in #9, but it didn't work. I also tried updating typed-argument-parser==1.7.2 and that didn't work either.
Could you help me to solve this problem? Thanks.

This is my pip list:

Package Version
absl-py 1.4.0
alabaster 0.7.13
antlr4-python3-runtime 4.9.3
appnope 0.1.3
asttokens 2.2.1
attrs 22.2.0
Babel 2.12.1
backcall 0.2.0
beautifulsoup4 4.12.0
cachetools 5.3.0
captum 0.2.0
certifi 2022.12.7
charset-normalizer 3.1.0
cilog 1.2.3
cloudpickle 2.2.1
cvxopt 1.3.0
cycler 0.11.0
decorator 5.1.1
dive-into-graphs 0.1.2
docutils 0.17.1
et-xmlfile 1.1.0
executing 1.2.0
filelock 3.10.7
fonttools 4.39.3
gdown 4.7.1
google-auth 2.17.1
google-auth-oauthlib 0.4.6
grpcio 1.53.0
hydra-core 1.3.2
idna 3.4
imagesize 1.4.1
importlib-metadata 6.1.0
importlib-resources 5.12.0
iniconfig 2.0.0
ipython 8.12.0
jedi 0.18.2
Jinja2 3.1.2
joblib 1.2.0
kiwisolver 1.4.4
littleutils 0.2.2
llvmlite 0.39.1
Markdown 3.4.3
MarkupSafe 2.1.2
matplotlib 3.5.2
matplotlib-inline 0.1.6
mpmath 1.3.0
munch 2.5.0
mypy-extensions 1.0.0
networkx 2.8
numba 0.56.4
numpy 1.23.5
oauthlib 3.2.2
ogb 1.3.5
omegaconf 2.3.0
openpyxl 3.1.2
outdated 0.2.2
packaging 23.0
pandas 2.0.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.0.1
pluggy 1.0.0
prompt-toolkit 3.0.38
protobuf 3.20.1
psutil 5.9.4
ptyprocess 0.7.0
pure-eval 0.2.2
py 1.11.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
Pygments 2.14.0
pynvml 11.4.0
pyparsing 3.0.9
PySocks 1.7.1
pytest 7.1.2
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
rdkit 2022.9.5
rdkit-pypi 2022.9.5
requests 2.28.2
requests-oauthlib 1.3.1
rsa 4.9
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.7
scikit-learn 1.2.2
scipy 1.10.1
setuptools 59.5.0
shap 0.41.0
six 1.16.0
slicer 0.0.7
snowballstemmer 2.2.0
soupsieve 2.4
Sphinx 5.3.0
sphinx-rtd-theme 1.0.0
sphinxcontrib-applehelp 1.0.4
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.1
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
stack-data 0.6.2
sympy 1.11.1
tabulate 0.9.0
tensorboard 2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
threadpoolctl 3.1.0
tomli 2.0.1
torch 1.10.1
torch-geometric 2.0.4
torch-scatter 2.0.9
torch-sparse 0.6.13
torchaudio 0.10.1
torchvision 0.11.2
tqdm 4.64.0
traitlets 5.9.0
typed-argument-parser 1.5.4
typing_extensions 4.5.0
typing-inspect 0.8.0
tzdata 2023.3
urllib3 1.26.15
wcwidth 0.2.6
Werkzeug 2.2.3
wheel 0.38.4
zipp 3.15.0

An issue about GOOD-HIV dataset

Hi there,
I wonder whether there is any difference between GOOD-HIV and ogbg-molhiv other than the split setting. I found that GOOD-HIV and ogbg-molhiv have the same dataset size (41127) and both are adapted from MoleculeNet, so at first I assumed the full sets are the same.
But if I pre-train a model on the ogbg-molhiv dataset and then finetune it on the GOOD-HIV dataset, an error will be reported. Is this because the node features of GOOD-HIV and ogbg-molhiv are different?

Leaderboard results of GOODTwitter

Hi,

Thanks for the code; this is great work.

I saw that you have included a Twitter dataset in addition to the datasets in the paper. Do you happen to have leaderboard results for this dataset as well?

How can one add a new algorithm and benchmark it with GOOD?

Hi GOOD authors,

Thanks for your impressive amount of work on developing the GOOD benchmark. As someone who also works on OOD algorithms for graph data, I believe this benchmark could provide more insights for future developments in this field. Recently, I have been trying to add my graph OOD algorithm (which happens to have the same name as GOOD 🤣) and benchmark it with GOOD.

However, after reading through the code and the documentation, I could not find an explicit description of the pipeline for adding a new algorithm and benchmarking it with GOOD. For example,

  • How can one add OOD algorithm-specific parameters into the pipeline and use them in the code?
  • After adding the new algorithm, how can one evaluate it with all datasets in the GOOD benchmark?
  • How can one sweep the hyperparameters based on the GOOD benchmark?
  • How can one compare different algorithms in a fair way with the GOOD benchmark?
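
To make the sweep question concrete, here is a minimal sketch of a grid over hyperparameters. It is a generic pattern, not a GOOD feature: the function name grid and the keys lr and ood_param are placeholders and do not correspond to known GOOD config fields. Each yielded dict would then have to be turned into a config and run by whatever launcher is in use.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def grid(search_space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of hyperparameter values."""
    keys = list(search_space)
    for values in product(*(search_space[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical search space; the keys are placeholders, not GOOD config fields.
search_space = {
    "lr": [1e-3, 1e-4],
    "ood_param": [0.1, 1.0, 10.0],
}
configs = list(grid(search_space))
for cfg in configs:
    print(cfg)
```

For a fair comparison, a common convention (used by DomainBed, for instance) is to give every algorithm the same number of sweep trials and to select hyperparameters on validation data only.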

It would facilitate follow-up work with the GOOD benchmark if the authors could provide a convenient way, along with an explicit description, to add and benchmark new algorithms. Going through the OOD literature, I find that DomainBed is a good example: it provides both a rigorous evaluation and a convenient pipeline for integrating new algorithms. I believe the GOOD benchmark would have much more impact on the community if you could add the corresponding features to the existing code : )

Best Regards,
Andrew

How to obtain results of multiple runs

Hi GOOD Team,

Thanks for the great library. I have successfully run goodtg, but it only runs once and reports the best epoch on validation. What's the best way to reproduce the paper's results over 10 random runs?
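
One way to frame this, independent of how goodtg is launched, is a loop over seeds followed by a mean and standard deviation. The sketch below assumes a hypothetical run_once(seed) callable that returns one test score; here it is replaced by a fake function so the example runs on its own. Whether the paper reports a sample or a population standard deviation is not stated in this thread, so the use of the sample version is an assumption.

```python
from statistics import mean, stdev
from typing import Callable, Dict, Iterable


def summarize_runs(
    run_once: Callable[[int], float], seeds: Iterable[int]
) -> Dict[str, float]:
    """Run one experiment per seed and report mean and sample std."""
    scores = [run_once(seed) for seed in seeds]
    return {
        "mean": mean(scores),
        "std": stdev(scores) if len(scores) > 1 else 0.0,
        "runs": len(scores),
    }


def fake_run(seed: int) -> float:
    # Placeholder: returns a deterministic number instead of training a model.
    return 70.0 + (seed % 3)


summary = summarize_runs(fake_run, range(1, 11))
print(summary)
```

In practice, run_once would launch one training run with the given seed and read back the test metric at the epoch selected on validation.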

INFO: pip is looking at multiple versions of wheel to determine which version is compatible with other requirements

Hi GOOD authors,
Thank you for your contribution to research on out-of-distribution generalization on graphs. I ran into an issue when installing GOOD with 'pip install -e .': it shows 'INFO: pip is looking at multiple versions of wheel to determine which version is compatible with other requirements. This could take a while'. I can't find a good solution. Is there a way to solve this problem?

Best Regards,
xuyike

How to fix random seeds?

Every time I run it, the results are different. I tried setting random seeds in both the args and the YAML files, to no avail...
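
As a point of comparison, this is a minimal sketch of the seeding calls commonly used in PyTorch projects. It is not taken from GOOD; seed_everything is a hypothetical helper, and NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the Python RNG, and NumPy / PyTorch when they are installed."""
    # PYTHONHASHSEED only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


def sample(seed: int, n: int = 3):
    """Draw n numbers after seeding, to check that runs repeat."""
    seed_everything(seed)
    return [random.random() for _ in range(n)]


print(sample(0) == sample(0))
```

Even with all seeds fixed, some GPU operations (for example scatter-style reductions built on atomic adds) may be non-deterministic, so small run-to-run differences can remain.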

run CIGA algorithm error

When I run !goodtg --config_path GOOD_configs/GOODCMNIST/color/concept/CIGA.yaml in Colab:


2023-05-07 13:42:38.808831: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-07 13:42:40.848157: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
This logger will substitute general print function

INFO: -----------------------------------
    Task: train
Sun May  7 13:42:43 2023
INFO: Load Dataset GOODCMNIST
Downloading...
From: https://drive.google.com/uc?id=1F2r2kVmA0X07AXyap9Y_rOM6LipDzwhq
To: /content/GOOD/storage/datasets/GOODCMNIST.zip
100%|##########| 719M/719M [00:03<00:00, 225MB/s]
Extracting /content/GOOD/storage/datasets/GOODCMNIST.zip
DEBUG: 05/07/2023 01:43:41 PM : Dataset: {'train': GOODCMNIST(29400), 'id_val': GOODCMNIST(6300), 'id_test': GOODCMNIST(6300), 'val': GOODCMNIST(14000), 'test': GOODCMNIST(14000), 'task': 'Multi-label classification', 'metric': 'Accuracy'}
DEBUG: 05/07/2023 01:43:41 PM :  Data(x=[75, 3], edge_index=[2, 1367], y=[1], pos=[75, 2], color=[1], env_id=[1])
INFO: Loading model...
DEBUG: 05/07/2023 01:43:41 PM : Config model
DEBUG: 05/07/2023 01:43:45 PM : Load training utils
INFO: Epoch 0:
/usr/local/lib/python3.10/dist-packages/torch_geometric/data/in_memory_dataset.py:182: UserWarning: It is not recommended to directly access the internal storage format `data` of an 'InMemoryDataset'. The data of the dataset is already cached, so any modifications to `data` will not be reflected when accessing its elements. Clearing the cache now by removing all elements in `dataset._data_list`. If you are absolutely certain what you are doing, access the internal storage via `InMemoryDataset._data` instead to suppress this warning. Alternatively, you can access stacked individual attributes of every graph via `dataset.{attr_name}`.
  warnings.warn(msg)
/usr/local/lib/python3.10/dist-packages/torch_geometric/data/in_memory_dataset.py:182: UserWarning: It is not recommended to directly access the internal storage format `data` of an 'InMemoryDataset'. If you are absolutely certain what you are doing, access the internal storage via `InMemoryDataset._data` instead to suppress this warning. Alternatively, you can access stacked individual attributes of every graph via `dataset.{attr_name}`.
  warnings.warn(msg)
  0%|░░░░░░░░░░░░░░░░░░░░| 0/230 [00:01<?, ?it/s]
ERROR: 05/07/2023 01:43:47 PM - utils.py - line 87 : Traceback (most recent call last):
  File "/usr/local/bin/goodtg", line 33, in <module>
    sys.exit(load_entry_point('graph-ood', 'console_scripts', 'goodtg')())
  File "/content/GOOD/GOOD/kernel/main.py", line 69, in goodtg
    main()
  File "/content/GOOD/GOOD/kernel/main.py", line 60, in main
    pipeline.load_task()
  File "/content/GOOD/GOOD/kernel/pipelines/basic_pipeline.py", line 231, in load_task
    self.train()
  File "/content/GOOD/GOOD/kernel/pipelines/basic_pipeline.py", line 113, in train
    train_stat = self.train_batch(data, pbar)
  File "/content/GOOD/GOOD/kernel/pipelines/basic_pipeline.py", line 71, in train_batch
    model_output = self.model(data=data, edge_weight=edge_weight, ood_algorithm=self.ood_algorithm)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/GOOD/GOOD/networks/models/CIGAGNN.py", line 73, in forward
    causal_rep = self.get_graph_rep(
  File "/content/GOOD/GOOD/networks/models/CIGAGNN.py", line 108, in get_graph_rep
    return self.feat_encoder(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/GOOD/GOOD/networks/models/GINs.py", line 94, in forward
    out_readout = self.encoder(x, edge_index, batch, batch_size, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/GOOD/GOOD/networks/models/GINvirtualnode.py", line 118, in forward
    post_conv = self.dropout1(self.relu1(self.batch_norm1(self.conv1(x, edge_index))))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_geometric/nn/conv/gin_conv.py", line 80, in forward
    out = self.propagate(edge_index, x=x, size=size)
  File "/usr/local/lib/python3.10/dist-packages/torch_geometric/nn/conv/message_passing.py", line 476, in propagate
    explain_msg_kwargs = self.inspector.distribute(
  File "/usr/local/lib/python3.10/dist-packages/torch_geometric/nn/conv/utils/inspector.py", line 54, in distribute
    for key, param in self.params[func_name].items():
KeyError: 'explain_message'
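
A KeyError raised inside torch_geometric's own message-passing code may point to a mismatch between the installed library version and the version the calling code was written against; that is an assumption here, not a confirmed diagnosis. One user's pip list on this page shows torch 1.10.1 with torch-geometric 2.0.4, whereas this traceback comes from a Python 3.10 Colab environment. The sketch below only reports which versions are installed, using the standard library; installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


report = installed_versions(
    ["torch", "torch-geometric", "torch-scatter", "torch-sparse"]
)
for name, found in report.items():
    print(f"{name}: {found or 'not installed'}")
```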

run final_configs yaml got error

Running
!goodtg --config_path /content/GOOD/configs/final_configs/GOODMotif/basis/concept/GSAT.yaml
produces the following error:

ERROR: 06/08/2023 12:27:43 PM - utils.py - line 87 : Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/munch/__init__.py", line 103, in __getattr__
return object.__getattribute__(self, k)
AttributeError: 'Munch' object has no attribute 'clean_save'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/munch/__init__.py", line 106, in __getattr__
return self[k]
KeyError: 'clean_save'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/goodtg", line 33, in <module>
sys.exit(load_entry_point('graph-ood', 'console_scripts', 'goodtg')())
File "/content/GOOD/GOOD/kernel/main.py", line 69, in goodtg
main()
File "/content/GOOD/GOOD/kernel/main.py", line 60, in main
pipeline.load_task()
File "/content/GOOD/GOOD/kernel/pipelines/basic_pipeline.py", line 231, in load_task
self.train()
File "/content/GOOD/GOOD/kernel/pipelines/basic_pipeline.py", line 155, in train
self.save_epoch(epoch, epoch_train_stat, id_val_stat, id_test_stat, val_stat, test_stat, self.config)
File "/content/GOOD/GOOD/kernel/pipelines/basic_pipeline.py", line 412, in save_epoch
if config.clean_save:
File "/usr/local/lib/python3.10/dist-packages/munch/__init__.py", line 108, in __getattr__
raise AttributeError(k)
AttributeError: clean_save
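
The traceback shows config.clean_save being read from a Munch object that has no such key. One possible workaround, assuming the key is simply absent from the YAML file and that False is an acceptable default (both are assumptions), is to read the flag with a default. The sketch uses a small dict subclass as a stand-in for Munch so that it runs without extra dependencies; AttrDict and read_flag are hypothetical names.

```python
class AttrDict(dict):
    """Stand-in for Munch: dict keys are also readable as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


def read_flag(config, name: str, default: bool = False) -> bool:
    # getattr with a default absorbs the AttributeError for a missing key.
    return bool(getattr(config, name, default))


config = AttrDict({"epochs": 100})
print(read_flag(config, "clean_save"))  # key is missing, so the default is used

config["clean_save"] = True
print(read_flag(config, "clean_save"))
```

Adding the missing key to the config file is the other obvious route; the issue does not say where it belongs or which value is intended.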
