Comments (8)
Hi @Khoa-NT - Thank you for your interest in this repository
- We uploaded the training scaffold code to ./2-Weakly-Supervised-Subtyping/. Most of the training scaffold code is heavily borrowed from CLAM. For the survival code (MCAT) and future works that we release that continuing to build off of CLAM, we will just point to the original scaffold code w/ updated models that we have added.
Model walkthrough code is found here for Three-Stage HIPT.
The eval_linear was from the original DINO repository, which we did not touch.
- As hopefully made transparent in the model walkthrough code, the Three-Stage HIPT only trains the last vision transformer stage (for aggregating $x_{4096}$ features). The input into this HIPT model uses pre-extracted $x_{256}$ features, but unlike the [M_256 x D] tensor bag-like inputs used in CLAM and most other models (where $M_{256}$ is the number of 256-sized patches in the WSI and $D=384$ for ViT-16), the input shape in this setting is [M_4K x 256 x D], where $M_{4K}$ is the number of 4K-sized patches and 256 is the number of 256-sized patches in a single 4K-sized patch. To create [M_4K x 256 x D] for the $M_{4K}$ $x_{4096}$ images in your WSI, you can extract [1 x 256 x D] features for each $x_{4096}$ image using Two-Stage HIPT with loaded weights, then torch.vstack these tensors along the zeroth dimension.
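The stacking step described above can be sketched as follows (a minimal illustration: the shapes are from the discussion, but the random tensors stand in for real Two-Stage HIPT outputs):

```python
import torch

# Illustrative values: D = 384 (ViT-16 embedding dim), M_4K 4K-sized patches.
M_4K, D = 10, 384

# Suppose each x_4096 image yields a [1 x 256 x D] tensor of x_256 features
# (in practice, from Two-Stage HIPT with loaded weights).
region_features = [torch.randn(1, 256, D) for _ in range(M_4K)]

# Stack along the zeroth dimension to form the [M_4K x 256 x D] model input.
bag = torch.vstack(region_features)
print(bag.shape)  # torch.Size([10, 256, 384])
```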
On MIL and using HIPT features - since the main HIPT method we evaluate uses pretrained + pre-extracted features at the lower levels, we can think of Three-Stage HIPT as just doing MIL with pre-extracted Two-Stage HIPT features (of 4K resolution), meaning that you can just do MIL on [M_4K x D] tensors for each WSI (where $M_{4K}$ is the number of 4K-sized patches).
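To illustrate this view, here is a minimal (non-gated) attention-MIL sketch over an [M_4K x D] bag. The architecture and the dimension value are illustrative assumptions, not the paper's exact model:

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Toy attention-based MIL head over a bag of region features."""
    def __init__(self, dim, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, 128), nn.Tanh(), nn.Linear(128, 1))
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, bag):                                # bag: [M_4K x D]
        weights = torch.softmax(self.attn(bag), dim=0)     # [M_4K x 1], sums to 1
        slide_embedding = (weights * bag).sum(dim=0)       # [D]
        return self.classifier(slide_embedding)            # [n_classes]

# Example: a WSI with 7 pre-extracted 4K-region features of (assumed) dim 192.
logits = AttentionMIL(dim=192)(torch.randn(7, 192))
```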
- Will standardize the naming convention of the scripts.
- This may be a typo - it should say ViT-4096. Will update the arXiv soon.
from hipt.
Hi @invoker-LL - Given the breadth of the experiments already in the paper, we did not run ablation experiments with other feature embedding types. There are many permutations of experiments going down that path (not only ResNet-50 evaluation for every model and cancer type, but also doing hierarchical pretraining of ResNet-50 features), some of which were partially explored already in our previous NeurIPS work. The exact splits in the NeurIPS work are not the same as in this work, due to missing data when patching all slides at 4K resolution, but you can see the overall trend of ViT-16 doing better than ResNet-50.
Overall, the focus of this paper is primarily architecture + hierarchical pretraining. The experiments you propose sound reasonable and could be performed in follow-up work.
You can see the modification here for CLAM using ViT-16 features (of 384-dim); we use a size of 384->384->256.
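For instance, a hedged sketch of what that size change amounts to (a simplified, non-gated stand-in, not CLAM's full model: CLAM's default ResNet-50 setting uses sizes [1024, 512, 256], which becomes [384, 384, 256] for ViT-16 features):

```python
import torch
import torch.nn as nn

size = [384, 384, 256]  # 384 -> 384 -> 256 for 384-dim ViT-16 features

# First projection layer of the bag features.
fc = nn.Sequential(nn.Linear(size[0], size[1]), nn.ReLU())
# Attention branch (simplified here; CLAM itself uses a gated variant).
attn = nn.Sequential(nn.Linear(size[1], size[2]), nn.Tanh(), nn.Linear(size[2], 1))

h = torch.randn(100, 384)   # bag of M_256 = 100 ViT-16 patch features
a = attn(fc(h))             # [100 x 1] per-patch attention scores
```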
See these files, which contain the full list of WSIs evaluated. Though more patients are included in the 10foldcv_subtype/tcga_brca/split_{i}.csv splits, patients with insufficient tissue content for patching at the 4K level, and thus missing from the pt_files-type folder, get excluded / masked out when slicing the dataframe.
I should update the split csv files to clarify this confusion.
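A minimal sketch of that masking step (the slide_id column name and the .pt naming are assumptions for illustration, not necessarily the repo's exact schema):

```python
import os
import pandas as pd

def mask_missing_slides(split_df: pd.DataFrame, pt_dir: str) -> pd.DataFrame:
    """Drop split rows whose slide has no extracted .pt feature file
    (e.g. insufficient tissue for patching at the 4K level)."""
    available = {f[:-len(".pt")] for f in os.listdir(pt_dir) if f.endswith(".pt")}
    return split_df[split_df["slide_id"].isin(available)].reset_index(drop=True)
```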
To point something else out: since the main HIPT method we evaluate uses pretrained + pre-extracted features at the lower levels, we can think of Three-Stage HIPT as just doing MIL with pretrained Two-Stage HIPT features (of 4K resolution). However, still using pre-extracted $x_{256}$ features was important for doing ablation experiments on Three-Stage HIPT without pretraining. Lots of fun variations to try!
Can you share more details about the subtyping training, or how to modify CLAM to adapt the HIPT features instead of ResNet-50 (layer 3 pooling, 1024-dim)? Especially the beginning linear layers (1024->512->256) in CLAM, for the 384-dim HIPT features? Also, I find there is no train/evaluation result for a traditional ResNet-50 + CLAM baseline.
Another question: you collected about 1040 WSIs of TCGA-BRCA, but in the 10-fold train/val/test split sheets, the total is only about 880. Why, and how was the 1040 -> 880 subset created?
Hi, I would like to ask some more questions about training classification.
Regarding the code you provided for training the classification task, e.g., the tcga_brca_subtype task:
Full List of Training Classification Commands
GPU=0
DATAROOT=/path/to/TCGA_ROOT_DIR/
TASK=tcga_brca_subtype
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 0.25
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 1.0
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 0.25 --pretrain_4k vit4k_xs_dino
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 1.0 --pretrain_4k vit4k_xs_dino
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 0.25 --pretrain_4k vit4k_xs_dino --freeze_4k
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 1.0 --pretrain_4k vit4k_xs_dino --freeze_4k
1/ Following this code, the commands above correspond to the experiments below:
- $ViT-16_{PF}, ViT-256, ViT-4096$ with 25% and 100% training
- $ViT-16_{PF}, ViT-256_{P}, ViT-4096$ with 25% and 100% training
- $ViT-16_{PF}, ViT-256_{PF}, ViT-4096$ with 25% and 100% training

Is that right?
2/ Then the HIPT_None_FC code is used for that case, and the commands to run are:
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_n --task $TASK --prop 0.25
CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_n --task $TASK --prop 1.0
Am I correct?
3/ I can't find the definition of pretrain_WSI in this code, and there is also no definition of self.global_vit in this code. If I understand correctly, self.global_vit is the stage-3 transformer?
4/ Following the Model Walkthrough.ipynb, I can assume that pretrain_WSI='None'. Then, it means the stage-3 TransformerEncoderLayer is trained from scratch. Is that correct? Or are you still updating the code for finetuning the stage-3 transformer?
5/ Regarding your note that, to create [M_4K x 256 x D], one extracts [1 x 256 x D] features for each $x_{4096}$ image using Two-Stage HIPT with loaded weights and then torch.vstacks these tensors along the zeroth dimension: do you have the code for extracting the pre-extracted extracted_mag20x_patch256_fp features? I only saw the training code and create_slide_embeddings.
6/ Are this Normalize and this a typo? I saw you train DINO with the ImageNet norm. Or is it that if you train HIPT from scratch you use mean = (0.5, 0.5, 0.5) & std = (0.5, 0.5, 0.5), and if you train from the pretrained weights you use mean = (0.485, 0.456, 0.406) & std = (0.229, 0.224, 0.225)?
@Richarizardd, can you help me? Thank you for your support.
Hi @Khoa-NT @Richarizardd , I would also like to know about the normalization transform.
Hi @juanigp @Khoa-NT - The normalization transform is mean = (0.5, 0.5, 0.5) & std = (0.5, 0.5, 0.5). I may have accidentally pushed some code that I was playing around with a few months ago (since reverted). My apologies.