
Issue (survpath, open, 4 comments): Each fold is trained for a fixed number of epochs. There is no early stopping based on validation c-index (results wouldn't be fair).

liupei101 commented on June 12, 2024
Each fold is trained for a fixed number of epochs. There is no early stopping based on validation c-index (results wouldn't be fair).

from survpath.

Comments (4)

liupei101 commented on June 12, 2024

Hello, Jaume! Thanks for your impressive work.

You mentioned that the results with early stopping would not be fair. May I ask whether there is evidence for this statement? I am looking for fair and universal ways to evaluate survival models, since I find that the final reported performance is sensitive to how we evaluate.

Concretely, when doing 5-fold cross-validation with a fixed number of epochs (T), just one more training epoch can lead to a significantly different final performance; in other words, the 5-fold cross-validation result is often sensitive to T. To avoid fixing T, one could adopt early stopping when training each fold, but, as you mentioned, this would still not be fair. One possible reason, to my humble understanding, is that the validation set is too small to support reliable early stopping. So, given the limited samples in computational pathology, what would be a fairer way to evaluate predictive models?
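To make the concern concrete, here is a minimal simulated sketch of the two protocols. The learning curve and c-index are stand-ins (nothing here is SurvPath code), but they show why picking the best epoch on the same held-out fold is optimistically biased compared with a fixed budget T:

```python
import numpy as np

# Hypothetical sketch: the "model" is a simulated validation curve that
# rises and plateaus with fold- and epoch-dependent noise.
T = 20           # fixed epoch budget, chosen before training
N_FOLDS = 5

def val_c_index(epoch, seed):
    """Stand-in held-out c-index after `epoch` epochs (simulated)."""
    rng = np.random.default_rng(seed)
    return 0.5 + 0.2 * (1 - np.exp(-epoch / 10)) + 0.02 * rng.standard_normal()

# Protocol 1 (fixed T): each held-out fold is scored exactly once, after T epochs.
fixed_scores = [val_c_index(T, seed=1000 * k + T) for k in range(N_FOLDS)]

# Protocol 2 (early stopping on the SAME held-out fold): each fold reports its
# best epoch, which inflates the estimate whenever the "validation" fold
# doubles as the test fold.
es_scores = [max(val_c_index(t, seed=1000 * k + t) for t in range(1, T + 1))
             for k in range(N_FOLDS)]

print(f"fixed-T mean c-index:        {np.mean(fixed_scores):.3f}")
print(f"early-stopping mean c-index: {np.mean(es_scores):.3f}")
```

Since the early-stopping maximum always includes the epoch-T score, its reported mean can never be lower than the fixed-T mean in this simulation, which is the bias in miniature.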

Looking forward to hearing from your side.

HuahuiYi commented on June 12, 2024

I'm also very interested in this issue. What would be the fairest way to handle it? It seems that neither a fixed epoch count nor an early stopping mechanism with a validation set may yield the most reliable results.

liupei101 commented on June 12, 2024

For anyone else seeking fair ways to configure and evaluate survival analysis models, some facts that may be helpful:

  • Early stopping seems to be frequently adopted in the survival analysis community, e.g., by the representative MTLR [1] and DeepHit [2]. In these works, some datasets are similar in scale to WSI-based survival analysis datasets (~1,000 patients).
  • Discrete survival models are common, including those used in SurvPath, Patch-GCN, and MCAT. In the survival analysis community, setting the number of discrete times to the square root of the number of uncensored patients is often suggested, as stated in the JMLR paper [3] and the ICML paper [4]. In addition, although the prediction is discrete, the survival time label remains continuous in performance evaluation, e.g., in C-Index calculation.
  • SurvPath, Patch-GCN, and MCAT set the number of discrete times to 4 by default. Moreover, their performance metric, the C-Index, is calculated using the survival time label after quantile discretization.
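As a rough illustration of the two binning rules above, on synthetic data (variable names are illustrative and not taken from the SurvPath codebase):

```python
import numpy as np

# Synthetic cohort: continuous survival times plus a censoring indicator.
rng = np.random.default_rng(42)
surv_time = rng.exponential(scale=24.0, size=400)   # continuous times (e.g., months)
event = rng.integers(0, 2, size=400).astype(bool)   # True = uncensored

# Rule from [3], [4]: number of discrete times ~ sqrt(#uncensored patients)
n_bins_sqrt = int(np.sqrt(event.sum()))

# SurvPath/Patch-GCN/MCAT-style default: 4 bins from quantiles of the
# uncensored survival times, with open-ended outer edges so censored
# patients outside the quantile range still receive a label.
n_bins = 4
edges = np.quantile(surv_time[event], np.linspace(0, 1, n_bins + 1))
edges[0], edges[-1] = -np.inf, np.inf
label = np.digitize(surv_time, edges[1:-1])         # discrete label in {0, ..., 3}

print(f"sqrt rule suggests {n_bins_sqrt} bins; quantile label counts: {np.bincount(label)}")
```

By construction the quantile bins hold roughly equal numbers of uncensored patients, which is the point of discretizing on quantiles rather than on fixed time intervals.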

[1] Yu, C.-N., Greiner, R., Lin, H.-C., and Baracos, V. Learning patient-specific cancer survival distributions as a sequence of dependent regressors. Advances in Neural Information Processing Systems, 24:1845–1853, 2011.
[2] Lee, C., Zame, W. R., Yoon, J., and van der Schaar, M. DeepHit: A deep learning approach to survival analysis with competing risks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[3] Haider, H., Hoehn, B., Davis, S., and Greiner, R. Effective ways to build and evaluate individual survival distributions. Journal of Machine Learning Research, 21(85):1–63, 2020.
[4] Qi, S. A., Kumar, N., Farrokh, M., Sun, W., Kuan, L. H., Ranganath, R., ... and Greiner, R. An effective meaningful way to evaluate survival models. In International Conference on Machine Learning, pp. 28244–28276. PMLR, 2023.

HuahuiYi commented on June 12, 2024

Hi, Pei!
I have a question. If I am not mistaken, in SurvPath the validation set and the test set are the same during 5-fold cross-validation. This differs from DeepHit, which first splits the dataset into training and test sets at an 8:2 ratio and then performs 5-fold cross-validation on the training set. DeepHit's approach explicitly separates the test and validation sets, which makes its use of early stopping understandable. With SurvPath's data splitting method, however, how can we ensure that the test set remains unseen and is not leaked when early stopping is used?
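An index-level sketch of the DeepHit-style protocol described above (the sizes are hypothetical and no model is trained; this only shows which indices each stage may touch):

```python
import numpy as np

# 8:2 holdout first, then 5-fold CV on the 80% for training and early stopping.
rng = np.random.default_rng(0)
n = 500
idx = rng.permutation(n)

test_idx = idx[: n // 5]          # 20% held out, touched once at the very end
dev_idx = idx[n // 5:]            # 80% used for CV and early stopping

folds = np.array_split(dev_idx, 5)
for k in range(5):
    val_idx = folds[k]            # monitors early stopping in this fold
    train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    # training-with-early-stopping would run here; test_idx is never consulted
    assert not (set(val_idx) & set(test_idx))

print(f"sizes per fold: train={len(train_idx)}, val={len(val_idx)}, test={len(test_idx)}")
```

The key property is the disjointness check: the early-stopping monitor only ever sees development indices, so the final test score cannot be inflated by the stopping rule.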

I believe the current data splitting method has issues. Another drawback is that some datasets have very small sample sizes, so perhaps a few-shot approach would be more appropriate.
