Comments (10)
Hi Zhimeng,
Thank you for your question! There are three reasons as follows.
- DIR does not theoretically solve the OOD generalization problem; the lack of a guarantee leads to relatively random results with large variances.
- The GOOD-Motif dataset is designed as a sanity check that exaggerates the OOD problem under structural shifts.
- Leaderboard 1.1.0 uses the latest datasets with larger hyperparameter spaces and more runs per hyperparameter sweep, which leads to new but more statistically significant results. However, this does not guarantee better numbers; e.g., you can notice that DIR's performance on the basis-covariate split differs (39.99 on the leaderboard vs. 61.50 in the paper), which also reflects my first point.
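To illustrate the third point, here is a minimal sketch (not the actual GOOD sweep code) of why more runs per configuration make a leaderboard number more statistically meaningful: the entry becomes a mean with a spread rather than a single draw. The DIR-like accuracies below are hypothetical.

```python
# Hedged sketch: aggregating repeated runs of one hyperparameter config
# into mean +/- std, instead of reporting a single run.
import statistics

def summarize_runs(accuracies):
    """Return (mean, std) over repeated runs of one configuration."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std

# Hypothetical high-variance accuracies: the large spread is exactly why
# a fresh sweep can land far from a single number reported earlier.
runs = [0.61, 0.40, 0.35, 0.58, 0.42]
mean, std = summarize_runs(runs)
```

With such a spread, a single lucky run (e.g., 0.61) and the sweep mean (~0.47) can both be "correct" observations, which is consistent with the leaderboard-vs-paper gap above.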
Best,
Shurui Gui
from good.
Hi,
Hello, thank you for creating GOOD, which has been incredibly helpful. I have a similar question. Is the strong performance of DIR on the leaderboard attributed to your tuning it across a broader range of hyperparameters?
The tuning process is fully automatic, without my interference. The broader range is only part of the reason and is not the most important factor. The most significant problem is that the DIR strategy cannot guarantee successful subgraph discovery, which leaves its results on this sanity check unspecified, i.e., it has high hyperparameter sensitivity in this scenario. If one runs the hyperparameter sweep, one may notice that the gap between its best and second-best results can be huge.
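A minimal sketch of the sensitivity check described above: after a sweep, compare the best and second-best configurations. The config names and scores below are hypothetical, not taken from the GOOD sweep itself.

```python
# Hedged sketch: hyperparameter sensitivity measured as the gap between
# the best and second-best configurations after a sweep.

def best_vs_second_gap(results):
    """results: {config_name: accuracy}. Return best minus second-best."""
    ranked = sorted(results.values(), reverse=True)
    return ranked[0] - ranked[1]

# Hypothetical sweep results for a DIR-like method.
sweep = {"lr1e-3_w0.1": 0.62, "lr1e-3_w1.0": 0.41, "lr1e-4_w0.1": 0.39}
gap = best_vs_second_gap(sweep)  # a large gap signals high sensitivity
```

A stable method would show a small gap here; a method whose subgraph discovery only succeeds for a narrow band of hyperparameters shows a large one.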
Best,
Shurui Gui
Hi Zhimeng,
The discrepancy in performance, specifically for DIR, is primarily due to its unstable performance across runs.
Yes, partially. It is not just across runs, but also across different hyperparameters (high sensitivity).
You have run DIR many times. The results presented in Table 13 are based on an earlier version, while the leaderboard displays the most recent outcomes.
Yes. The leaderboard results are the latest results. We haven't updated the paper to reflect them.
There have been no modifications or updates to the datasets between these two sets of results.
Could you please confirm my understanding of these points?
Yes. Both GOOD-Motif datasets are the same.
Best,
Shurui
Hi,
Hi, thank you! Do you have any insights into why DIR is less stable than other methods on the leaderboard?
Thank you for your question! Since you are interested in this insight, I'd like to redirect you to our work LECI. Specifically, you may find Figure 4 and Table 8 useful. In brief, the training of subgraph discovery networks adds one more degree of freedom (structure disentanglement), so without guarantees, the generalization results are unspecified.
In addition, it is critical to note that these synthetic datasets are sanity checks that exaggerate the OOD problems. You may test your initial theory and implementation on them; if your theory is right, you can obtain much better results. The easiest way to validate is to use test-domain validation, as shown in Table 10 of LECI. Generally, without appropriate theoretical guarantees, a method cannot pass the sanity check even with test-domain validation, as we observed.
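The sanity check above can be sketched as follows: select a checkpoint once by OOD-validation accuracy and once by (oracle) test-domain validation accuracy, and see whether even the oracle selection passes. The per-epoch numbers below are hypothetical, not from LECI's Table 10.

```python
# Hedged sketch: standard OOD-validation selection vs. oracle test-domain
# validation selection. If a method is theoretically sound, even the
# oracle pick should clear the synthetic sanity check.

def select_checkpoint(history, key):
    """history: per-epoch metric dicts; pick the epoch maximizing `key`."""
    return max(history, key=lambda epoch: epoch[key])

history = [
    {"epoch": 1, "ood_val_acc": 0.55, "test_val_acc": 0.48, "test_acc": 0.47},
    {"epoch": 2, "ood_val_acc": 0.60, "test_val_acc": 0.52, "test_acc": 0.50},
    {"epoch": 3, "ood_val_acc": 0.58, "test_val_acc": 0.66, "test_acc": 0.65},
]
std_pick = select_checkpoint(history, "ood_val_acc")      # picks epoch 2
oracle_pick = select_checkpoint(history, "test_val_acc")  # picks epoch 3
```

If even `oracle_pick` stays near chance on the sanity check, the failure is not a model-selection artifact but a property of the method itself.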
Best,
Shurui
Hi Zhimeng,
You are most welcome!
What was the rationale behind setting different spurious ratios and then combining the three sets? Why not employ a single spurious ratio for the entire training set?
The original purpose is to simulate a real-world scenario in which one can collect data from several environments. Although these environments contain data distributions with similar biases, the degrees of bias differ. This information contributes to judging whether a strong correlation is spurious, under the assumption that data-collection noise from different environments has the same intensity.
I noticed that val_spurious_ratio is set to 0.3, as opposed to 0. Was this choice made to emulate a more realistic scenario?
This design also simulates real-world scenarios in which it is more practical to collect data similar to the test domain than to obtain data with exactly the same distribution as the test domain. The validation set is a bridge between the training and test sets. Inspired by DomainBed, where oracle-domain validation can produce better results, we modify this principle by making the validation set more practical to obtain.
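The overall split design can be sketched as below. The exact ratios live in get_basis_concept_shift_list in the GOOD codebase; the 0.9/0.7/0.5 training ratios here are hypothetical placeholders, while validation uses 0.3 as the bridge and test uses 0.0 (correlation fully removed).

```python
# Hedged sketch (not the GOOD implementation): three training environments
# with different spurious ratios, a val set bridging train and test, and
# an unbiased test set.
import random

def make_environment(n, spurious_ratio, rng):
    """Each sample records whether its spurious base aligns with its label."""
    return [{"env_ratio": spurious_ratio,
             "spurious_aligned": rng.random() < spurious_ratio}
            for _ in range(n)]

rng = random.Random(0)
train = []
for ratio in (0.9, 0.7, 0.5):           # hypothetical training environments
    train += make_environment(100, ratio, rng)
val = make_environment(100, 0.3, rng)   # bridge between train and test
test = make_environment(100, 0.0, rng)  # spurious correlation removed
```

Because the three training environments share the same spurious feature at different strengths, an invariance-based method can, in principle, identify the correlation as spurious; the 0.3 validation ratio then selects models under a shift milder than, but in the same direction as, the test shift.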
Please let me know if you have any questions. 😄
Best,
Shurui
Hi Shurui,
Thank you for shedding light on the differences between the leaderboard results and the paper. My current understanding is:
- The discrepancy in performance, specifically for DIR, is primarily due to its unstable performance across runs.
- You have run DIR many times. The results presented in Table 13 are based on an earlier version, while the leaderboard displays the most recent outcomes.
- There have been no modifications or updates to the datasets between these two sets of results.
Could you please confirm my understanding of these points?
Thank you!
Zhimeng
Hi Shurui,
I really appreciate your timely reply.
Thank you for providing clarity on my previous queries. I have a few more questions, particularly related to the design choices of the GOOD-motif dataset. In the get_basis_concept_shift_list function:
- What was the rationale behind setting different spurious ratios and then combining the three sets? Why not employ a single spurious ratio for the entire training set?
- I noticed that val_spurious_ratio is set to 0.3, as opposed to 0. Was this choice made to emulate a more realistic scenario?
Best,
Zhimeng
Thank you for your timely and patient response. It's quite helpful!
Best,
Zhimeng