
Some questions about hipt (closed)

mahmoodlab commented on May 26, 2024
Some questions


Comments (8)

Richarizardd commented on May 26, 2024

Hi @Khoa-NT - Thank you for your interest in this repository.

  1. We uploaded the training scaffold code to ./2-Weakly-Supervised-Subtyping/. Most of the training scaffold code is heavily borrowed from CLAM. For the survival code (MCAT) and future works that we release that continue to build off of CLAM, we will just point to the original scaffold code w/ updated models that we have added.

Model walkthrough code is found here for Three-Stage HIPT.

The eval_linear was from the original DINO repository, which we did not touch.

  2. As hopefully made transparent in the model walkthrough code, the Three-Stage HIPT only trains the last vision transformer stage (for aggregating $x_{4096}$ features). The input into this HIPT model uses pre-extracted $x_{256}$ features, but unlike the [M_256 x D] tensor bag-like inputs used in CLAM and most other models (where $M_{256}$ is the number of 256-sized patches in the WSI and $D=384$ for ViT-16), the input shape in this setting is [M_4K x 256 x D], where $M_{4K}$ is the number of 4K-sized patches and 256 is the number of 256-sized patches in a single 4K-sized patch. To create [M_4K x 256 x D], for the $M_{4K}$ $x_{4096}$ images in your WSI, you can extract [1 x 256 x D] features for each $x_{4096}$ image using Two-Stage HIPT with loaded weights, then torch.vstack these tensors along the zeroth dimension (a short sketch follows below).
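For concreteness, a minimal sketch of assembling that [M_4K x 256 x D] input; the extractor below is a hypothetical stand-in for Two-Stage HIPT with loaded weights, returning dummy values purely for illustration:

import torch

def extract_x256_features(region):
    # Stand-in for Two-Stage HIPT (ViT-16 + ViT-256 with loaded weights): for one
    # x_4096 region it should return the [1, 256, 384] tensor of its x_256 features.
    return torch.randn(1, 256, 384)  # dummy output, for illustration only

regions = [torch.randn(3, 4096, 4096) for _ in range(2)]  # M_4K = 2 x_4096 regions of one WSI
features = [extract_x256_features(r) for r in regions]    # list of [1, 256, 384] tensors
bag = torch.vstack(features)                              # [M_4K, 256, 384] input to the last HIPT stage
print(bag.shape)                                          # torch.Size([2, 256, 384])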

On MIL and using HIPT features - since the main HIPT method we evaluate uses pretrained + pre-extracted features at the lower levels, we can think of Three-Stage HIPT as just doing MIL with pre-extracted Two-Stage HIPT features (of 4K resolution), meaning that you can just do MIL on [M_4K x D] tensors for each WSI (where $D=192$ at this stage, the output dimension of HIPT_4K); see the sketch after this paragraph. In the CLAM scaffold code that we experimented with, we still used pre-extracted $x_{256}$ features as input, which was important for doing ablation experiments of Three-Stage HIPT without pretraining. Lots of fun variations to try!
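As a hedged illustration of that simpler route (not the repository's model, just the idea of MIL over [M_4K x 192] HIPT_4K features):

import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    # Illustrative attention-based MIL head over pre-extracted HIPT_4K region features.
    def __init__(self, in_dim=192, hidden_dim=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1))
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, x):                         # x: [M_4K, 192]
        a = torch.softmax(self.attn(x), dim=0)    # [M_4K, 1] attention over 4K regions
        slide_feat = (a * x).sum(dim=0)           # [192] attention-pooled slide embedding
        return self.classifier(slide_feat)        # [n_classes] slide-level logits

bag = torch.randn(37, 192)                        # M_4K = 37 HIPT_4K features for one WSI
logits = AttentionMIL()(bag)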

  3. Will standardize the naming convention of the scripts.

  4. This may be a typo - it should say ViT-4096. Will update the arXiv soon.


Richarizardd commented on May 26, 2024

Hi @invoker-LL - given the breadth of the experiments already in the paper, we did not run ablation experiments with other feature embedding types. There are many permutations of experiments going down that path (not only ResNet-50 evaluation for every model and cancer type, but also doing Hierarchical Pretraining of ResNet-50 features), some of which are partially explored already in our previous NeurIPS work. The exact splits in the NeurIPS work are not the same as in this work due to missing data w/ patching all slides at 4K resolution, but you can see the overall trend of ViT-16 doing better than ResNet-50.

Overall, the focus of this paper is primarily on architecture + hierarchical pretraining. The experiments you propose sound reasonable and could be performed in follow-up work.

You can see the modification here for CLAM using ViT-16 features (of 384 dim); we use a size of 384->384->256.
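For readers adapting this themselves, a rough sketch of the kind of change implied; the dictionary keys and layer layout below are assumptions, not CLAM's exact code:

import torch.nn as nn

# CLAM's first embedding layer and attention network take their widths from a size list;
# for 384-dim ViT-16 features the default 1024 -> 512 -> 256 becomes 384 -> 384 -> 256.
size_dict = {
    "resnet_small": [1024, 512, 256],   # original ResNet-50 (layer 3) features
    "vit_small":    [384, 384, 256],    # ViT-16 features used by HIPT
}
size = size_dict["vit_small"]
fc = nn.Sequential(nn.Linear(size[0], size[1]), nn.ReLU())    # 384 -> 384 embedding
attention_net = nn.Sequential(nn.Linear(size[1], size[2]),    # 384 -> 256
                              nn.Tanh(),
                              nn.Linear(size[2], 1))          # 256 -> 1 attention score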


Richarizardd commented on May 26, 2024

See these files, which contain the full list of WSIs evaluated. Though there are more patients included in the 10foldcv_subtype/tcga_brca/split_{i}.csv splits, patients with insufficient tissue content for patching at the 4K level (and thus missing from the pt_files-type folder) are excluded / masked out when slicing the dataframe.

I should update the split csv files to clarify this confusion.
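In case it helps others reproduce the filtering, a small sketch of that masking (paths and column names here are assumptions):

import os
import pandas as pd

split = pd.read_csv("10foldcv_subtype/tcga_brca/split_0.csv")            # one fold's split sheet
available = {os.path.splitext(f)[0] for f in os.listdir("pt_files")}     # slides with 4K-level features
split = split[split["slide_id"].isin(available)].reset_index(drop=True)  # mask out missing slides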


invoker-LL commented on May 26, 2024

(quoting @Richarizardd's reply above)

Can you share more details about the subtyping training, or how to modify CLAM to take the HIPT features instead of ResNet-50 (layer 3's pooling, 1024 dim)? Especially the initial linear layers (1024->512->256 dim) in CLAM, adapted for the 384-dim HIPT features? Also, I find that there is no train/evaluation result with a traditional ResNet-50 + CLAM baseline.


invoker-LL commented on May 26, 2024

Another question: you collected about 1,040 WSIs of TCGA-BRCA, but in the 10-fold train/val/test split sheets the total is only about 880. Why, and how was the 1,040 -> 880 subset made?


Khoa-NT commented on May 26, 2024

Hi, I would like to ask some more questions about training classification.

Regarding the code you provided for training the classification task, e.g., the tcga_brca_subtype task:

Full List of Training Classification Commands
GPU=0
DATAROOT=/path/to/TCGA_ROOT_DIR/
TASK=tcga_brca_subtype
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 0.25
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 1.0
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 0.25 --pretrain_4k vit4k_xs_dino
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 1.0 --pretrain_4k vit4k_xs_dino
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 0.25 --pretrain_4k vit4k_xs_dino --freeze_4k
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 1.0 --pretrain_4k vit4k_xs_dino --freeze_4k

1/ Following this code, the commands above correspond to the experiments below:

  • $ViT-16_{PF}, ViT-256, ViT-4096$ with 25% and 100% training
  • $ViT-16_{PF}, ViT-256_{P}, ViT-4096$ with 25% and 100% training
  • $ViT-16_{PF}, ViT-256_{PF}, ViT-4096$ with 25% and 100% training

Is it right?

2/ Then the code HIPT_None_FC is used for the case $ViT-16_{PF}, AP-256, AP-4096$, right?
And the commands to run are:

CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_n --task $TASK --prop 0.25
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_n --task $TASK --prop 1.0

Am I correct?

3/ I can't find the definition of pretrain_WSI in this code.
And also there is no definition of self.global_vit in this code.
If I understand correctly, self.global_vit is the $ViT_{WSI}-4096$, right?

4/ Following the Model Walkthrough.ipynb, I can assume that pretrain_WSI='None'.
Then it means the stage-3 $ViT_{WSI}-4096$ is not a vision transformer; it's simply a stack of TransformerEncoderLayer.
Is that correct, or are you still updating the code for finetuning the $ViT_{WSI}-4096$?
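To illustrate what I mean by a plain stack of TransformerEncoderLayer (dimensions here are just for the sketch):

import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=192, nhead=3, dim_feedforward=192, batch_first=True)
stage3 = nn.TransformerEncoder(layer, num_layers=2)   # no patch embedding / class token as in a full ViT

regions = torch.randn(1, 37, 192)     # [batch, M_4K, 192] pre-extracted 4K-region features
tokens = stage3(regions)              # contextualized region tokens, same shape
slide_embedding = tokens.mean(dim=1)  # e.g. mean-pool to a [1, 192] slide embedding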

5/ Regarding point 2 of your reply above (on pre-extracted $x_{256}$ features):

Do you have the code for extracting the pre-extracted $x_{256}$ features to the folder extracted_mag20x_patch256_fp?
I only saw the training code and the create_slide_embeddings script.
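(Roughly, what I mean by such extraction code is a loop like the following; vit256 and iter_patches are hypothetical placeholders, not names from this repository.)

import torch

def extract_slide_features(slide_id, vit256, iter_patches, out_dir="extracted_mag20x_patch256_fp"):
    # vit256: a loaded ViT-16 encoder; iter_patches: yields [N, 3, 256, 256] batches of
    # 20x / 256-sized patches for one slide. Both are hypothetical placeholders.
    feats = []
    with torch.no_grad():
        for batch in iter_patches(slide_id):
            feats.append(vit256(batch))              # [N, 384] ViT-16 features per batch
    feats = torch.cat(feats, dim=0)                  # [M_256, 384] for the whole slide
    torch.save(feats, f"{out_dir}/{slide_id}.pt")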

6/ Are this Normalize and this one typos? I saw that you train DINO with the ImageNet normalization.
Or is it that if you train HIPT from scratch you use mean = (0.5, 0.5, 0.5) & std = (0.5, 0.5, 0.5),
and if you train from the pretrained weights you use mean = (0.485, 0.456, 0.406) & std = (0.229, 0.224, 0.225)?

@Richarizardd, can you help me?
Thank you for your support.


juanigp commented on May 26, 2024

Hi @Khoa-NT @Richarizardd , I would also like to know about the normalization transform.


Richarizardd commented on May 26, 2024

Hi @juanigp @Khoa-NT - The normalization transform is mean = (0.5, 0.5, 0.5) & std = (0.5, 0.5, 0.5). I may have accidentally pushed some code that I was playing around with a few months ago (since reverted). My apologies.
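For completeness, a minimal eval-time transform with these values (just a sketch using torchvision):

from torchvision import transforms

eval_transform = transforms.Compose([
    transforms.ToTensor(),                                            # HWC [0, 255] -> CHW [0, 1]
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),  # per-channel normalization
])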
