Comments (4)
- Even for regions with large blank spaces / a high background ratio, two-stage HIPT should still learn relevant region-level embeddings. From my own experimentation, running inference + evaluation on regions with empty patches is fine. However, if you are averaging the region-level embeddings as a proxy for your slide-level embeddings (for large WSIs), you may be blurring the relevant signal in the WSI.
- Multiple WSIs per patient are used for survival prediction (risk is predicted at the patient level). CLAM-SB is a variation of Attention MIL with an additional instance subtyping loss that works for discriminative classification tasks. The instance subtyping loss does not translate immediately to regression tasks.
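To make the averaging caveat above concrete, here is a minimal sketch (the function name, the background-ratio threshold, and the shapes are my own illustration, not repo code) of mean-pooling region-level embeddings into a slide-level embedding while skipping background-heavy regions:

```python
import numpy as np

def slide_embedding(region_embs: np.ndarray, bg_ratios: np.ndarray,
                    max_bg: float = 0.5) -> np.ndarray:
    """Average region-level embeddings [M, D] into one slide-level embedding [D],
    dropping regions whose background ratio exceeds `max_bg` so that mostly-empty
    regions do not dilute the signal. Threshold value is a hypothetical choice."""
    keep = bg_ratios <= max_bg
    if not keep.any():  # fall back to all regions if everything is background-heavy
        keep = np.ones_like(keep)
    return region_embs[keep].mean(axis=0)

# toy example: 4 regions with 192-dim embeddings, two of them mostly background
embs = np.random.rand(4, 192)
ratios = np.array([0.1, 0.9, 0.2, 0.95])
print(slide_embedding(embs, ratios).shape)  # (192,)
```

Without the background filter this reduces to the plain mean, which is where the signal blurring comes from.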
from hipt.
Thanks for clarifying!
Hi @Richarizardd, I had a quick follow-up question regarding point 2.
How did you work at patient-level with HIPT when there are multiple slides per patient?
Let's say patient A is mapped to 2 slides, slide1 and slide2:
- in slide1, there are M1 [4096, 4096] regions
- in slide2, there are M2 [4096, 4096] regions
- did you simply extract region-level embeddings for both slides, then feed the last Transformer block the concatenated sequence of embeddings (of length M1 + M2)?
- what's the reason behind only using IDCs for survival prediction?
- are you still planning to upload the survival code to this repo? (#9)
- in MCAT, you discretize survival times into 4 bins (using uncensored patients only), then based on the censorship status (either 0 or 1) you go from 4 to 8 discrete labels (done in these few lines). Yet, you only use the "initial" 4 discrete labels when training: the model outputs 4 logits and the survival dataset returns `disc_label` (between 0 and 3) and not `label` (between 0 and 7) (see this line). Why not use the full 8 discrete labels? (or why bother creating 8 labels in the first place?)
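For reference, my reading of the label construction described in the last question can be sketched as follows (this is an illustration of the 4-bin / 8-label scheme, not the exact MCAT code; the mapping from bin index and censorship bit to `label` is an assumption):

```python
import numpy as np

def discretize_survival(times, censorships, n_bins=4):
    """Quantile-bin survival times into n_bins discrete labels, with bin edges
    computed from UNCENSORED patients only (censorship == 0). Pairing the bin
    index with the censorship bit yields 2 * n_bins (= 8) labels, but only the
    n_bins-way `disc_label` is used for training."""
    times = np.asarray(times, dtype=float)
    censorships = np.asarray(censorships)
    uncensored = times[censorships == 0]
    edges = np.quantile(uncensored, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf      # open-ended outer bins
    disc_label = np.digitize(times, edges[1:-1])  # in [0, n_bins)
    label = disc_label * 2 + censorships          # in [0, 2 * n_bins), assumed pairing
    return disc_label, label

d, l = discretize_survival([1, 2, 3, 4, 5, 6, 7, 8], [0] * 8)
print(list(d))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(list(l))  # [0, 0, 2, 2, 4, 4, 6, 6]
```

This makes the question concrete: `label` carries strictly more information than `disc_label` (it also encodes censorship), yet only `disc_label` reaches the loss.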
After having dived into the MCAT code, I found out that -- at least in MCAT -- you did concatenate the sequences of embeddings when multiple slides were available. I assume you did the same for survival prediction with HIPT.
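The multi-slide concatenation described above can be sketched like this (shapes and variable names are hypothetical; the point is only that the slide-level Transformer accepts a variable-length sequence, so per-slide region sequences can simply be stacked):

```python
import numpy as np

# Hypothetical shapes: patient A has two slides with M1 and M2 [4096, 4096] regions,
# each region already encoded into a D-dim embedding by the lower HIPT stages.
M1, M2, D = 5, 3, 192
slide1_regions = np.random.rand(M1, D)
slide2_regions = np.random.rand(M2, D)

# Concatenate along the sequence axis; the resulting (M1 + M2)-long sequence is
# what would be fed to the final slide-level Transformer for a patient-level risk.
patient_sequence = np.concatenate([slide1_regions, slide2_regions], axis=0)
print(patient_sequence.shape)  # (8, 192)
```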
I'm trying to reproduce HIPT Table 2 results for IDC, hence using the pre-extracted region-level features you kindly provided under `3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings/`.
However, features seem to be missing for 61 IDC slides (below are a few slide_ids with missing features):
TCGA-A2-A0T2-01Z-00-DX1.29A5C4C8-6AE8-44EE-98C2-ACBCBFBE9D60
TCGA-A7-A0CD-01Z-00-DX2.609CED8D-5947-4753-A75B-73A8343B47EC
TCGA-A7-A6VX-01Z-00-DX2.9EE94B59-6A2C-4507-AA4F-DC6402F2B74F
TCGA-A8-A06O-01Z-00-DX1.FA4495B2-5B13-4448-ADCB-EF5316E0955B
TCGA-A8-A06P-01Z-00-DX1.37660D0F-1595-43C5-9D30-58D6CB93B52C
TCGA-A8-A06R-01Z-00-DX1.41476D0D-BA72-4FB8-B143-9EB679F26D28
...
Any idea why these features are missing?
Based on `2-Weakly-Supervised-Survival/splits/5foldcv/tcga_brca/splits_0.csv`, these slides should be used for training / validation.
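A quick way to enumerate such gaps is to cross-check the splits CSV against the feature directory. This is a sketch under assumptions (the `train` column name and the `.pt` feature extension are guesses about the repo layout, not confirmed):

```python
import csv
from pathlib import Path

def find_missing_features(split_csv: str, feat_dir: str,
                          column: str = "train") -> list[str]:
    """Return slide_ids listed in `column` of a splits CSV that have no
    corresponding .pt feature file in feat_dir. Column name and extension
    are assumptions about the repo layout."""
    feat_stems = {p.stem for p in Path(feat_dir).glob("*.pt")}
    missing = []
    with open(split_csv, newline="") as f:
        for row in csv.DictReader(f):
            slide_id = (row.get(column) or "").strip()
            if slide_id and slide_id not in feat_stems:
                missing.append(slide_id)
    return missing
```

Running it over `splits_0.csv` and the embeddings folder should reproduce the 61-slide count if the gap is real rather than a naming mismatch.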