Comments (4)
Hello, Jaume! Thanks for your impressive work.
You mentioned that the results with early stopping would not be fair. I want to ask if there is evidence for this statement. I am looking for fair and universal ways to evaluate the survival models, since I find that the final observed (or reported) performance is sensitive to how we evaluate.
Concretely, when doing 5-fold cross-validation with a fixed number of epochs (T), just one more training epoch can lead to a significantly different value of the final observed performance. In other words, the result of performance evaluation with 5-fold cross-validation is often sensitive to T. To avoid setting a fixed T, one could adopt early stopping in the training of each fold. But, as you mentioned, this would still not be fair. One possible reason, from my humble understanding, is that the validation set is too small to support reasonable early stopping. So, in the face of limited samples in computational pathology, what would be a fairer way to evaluate predictive models?
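To make the concern concrete, here is a minimal sketch (not SurvPath's actual code) of per-fold early stopping that carves the stopping-validation set out of the training fold only, so the outer test fold is never seen during stopping; `fit_with_early_stopping` is a hypothetical stand-in for any trainer that monitors a validation score:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # toy features
y = rng.exponential(size=100)   # toy survival times

def fit_with_early_stopping(X_tr, y_tr, X_val, y_val, max_epochs=50, patience=5):
    """Dummy trainer: stops when the validation score stalls for `patience` epochs."""
    best, best_epoch, wait = -np.inf, 0, 0
    for epoch in range(max_epochs):
        val_score = -abs(rng.normal())  # placeholder for a real validation C-index
        if val_score > best:
            best, best_epoch, wait = val_score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch

outer = KFold(n_splits=5, shuffle=True, random_state=0)
stopped_epochs = []
for train_idx, test_idx in outer.split(X):
    # Carve an inner validation set out of the training fold only;
    # the test fold is never consulted while choosing when to stop.
    tr_idx, val_idx = train_test_split(train_idx, test_size=0.2, random_state=0)
    stopped_epochs.append(
        fit_with_early_stopping(X[tr_idx], y[tr_idx], X[val_idx], y[val_idx])
    )
print(stopped_epochs)
```

The remaining difficulty is exactly the one raised above: with roughly 1,000 patients, the inner validation split may be too small for the stopping signal to be stable.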
Looking forward to hearing from your side.
from survpath.
I'm also very interested in this issue. What would be the fairest way to handle it? It seems that neither a fixed number of epochs nor an early stopping mechanism with a validation set may yield the most reliable results.
For those who are also seeking fair ways to configure and evaluate survival analysis models, here are some facts that could be helpful.
- Early stopping seems to be frequently adopted in the survival analysis community, e.g., in the representative MTLR [1] and DeepHit [2]. In these works, the scale of some datasets is similar to that of WSI-based survival analysis datasets (~1,000 patients).
- Discrete survival models are common, like those used in SurvPath, Patch-GCN, and MCAT. In the survival analysis community, setting the number of discrete times to the square root of the number of uncensored patients is often suggested, as stated in the JMLR paper [3] and the ICML paper [4]. In addition, although the prediction is discrete, the survival time label remains continuous in performance evaluation, e.g., in C-Index calculation.
- SurvPath, Patch-GCN, and MCAT set the number of discrete times to 4 by default. Moreover, their performance metric, C-Index, is calculated using the survival time label after quantile discretization.
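The two conventions above can be sketched with numpy alone (toy data; actual implementations may compute the quantile cuts from uncensored patients only):

```python
import numpy as np

rng = np.random.default_rng(0)
times = rng.exponential(scale=24.0, size=300)   # toy survival times (months)
censored = rng.random(300) < 0.4                # toy censorship indicator

# JMLR/ICML rule of thumb: number of bins = sqrt(#uncensored patients)
n_uncensored = int((~censored).sum())
n_bins_suggested = int(np.sqrt(n_uncensored))

# SurvPath/Patch-GCN/MCAT default: 4 bins via quantiles of the time label
quantiles = np.quantile(times, [0.25, 0.5, 0.75])
discrete_label = np.digitize(times, quantiles)  # bin index in {0, 1, 2, 3}

print(n_bins_suggested, sorted(set(discrete_label.tolist())))
```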
[1] Yu, C.-N., Greiner, R., Lin, H.-C., and Baracos, V. Learning patient-specific cancer survival distributions as a sequence of dependent regressors. Advances in Neural Information Processing Systems, 24:1845–1853, 2011.
[2] Lee, C., Zame, W. R., Yoon, J., and van der Schaar, M. DeepHit: A deep learning approach to survival analysis with competing risks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[3] Haider, H., Hoehn, B., Davis, S., and Greiner, R. Effective ways to build and evaluate individual survival distributions. Journal of Machine Learning Research, 21(85):1–63, 2020.
[4] Qi, S. A., Kumar, N., Farrokh, M., Sun, W., Kuan, L. H., Ranganath, R., and Greiner, R. An effective meaningful way to evaluate survival models. In International Conference on Machine Learning, pp. 28244–28276. PMLR, 2023.
Hi, Pei!
I have a question. If I am not mistaken, it seems that in SurvPath, during 5-fold cross-validation, the validation set is the same as the test set. This differs from DeepHit, which splits the dataset into training and test sets in an 8:2 ratio and then performs 5-fold cross-validation on the training set. DeepHit's approach explicitly separates the test set from the validation set, making its use of early stopping understandable. However, with SurvPath's data splitting method, how can we ensure that the test set remains unseen and is not leaked when using early stopping?
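The DeepHit-style protocol described above can be sketched as follows (toy patient indices; not the actual DeepHit or SurvPath code), making it easy to check that the held-out test set never appears in any early-stopping validation fold:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

idx = np.arange(200)  # toy patient indices

# 8:2 split: the test set is fixed before any cross-validation happens
dev_idx, test_idx = train_test_split(idx, test_size=0.2, random_state=0)

# 5-fold CV runs only inside the 80% development portion;
# each validation fold can drive early stopping without touching the test set
inner_folds = []
for tr, val in KFold(n_splits=5, shuffle=True, random_state=0).split(dev_idx):
    inner_folds.append((dev_idx[tr], dev_idx[val]))

# Sanity check: no overlap between the test set and any validation fold
leak = set(test_idx) & set(np.concatenate([val for _, val in inner_folds]).tolist())
print(len(test_idx), len(leak))
```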
I believe the current data splitting method has issues. Another drawback is that some datasets have very small sample sizes, and perhaps a few-shot approach might be more appropriate.