mahmoodlab / hipt

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

License: Other

Languages: Jupyter Notebook 98.97%, Python 1.03%
Topics: computational-pathology, cvpr, cvpr2022, deep-learning, hierarchical-attention-networks, high-resolution, histopathology, pretrained-weights, pytorch, self-supervised-learning

hipt's Issues

Unable to find *.pt files for region_4096_pretraining

Am I right in expecting the patch-level feature *.pt files (433779 files, each containing a 256 x 384 tensor) used for pretraining the second stage of HIPT to be present in the HIPT/3-Self-Supervised-Eval/embeddings_patch_lib/ directory?

Currently, I only see the following pickle files in that directory (I include the small snippet I use to inspect them after the list).

25M     bcss_train_resnet50_trunc.pkl
9.3M    bcss_train_vits_tcga_brca_dino.pkl
4.5M    bcss_val_resnet50_tcga_brca_simclr.pkl
2.3M    bcss_val_resnet50_trunc.pkl
868K    bcss_val_vits_tcga_brca_dino.pkl
19M     breastpathq_train_resnet50_tcga_brca_simclr.pkl
9.4M    breastpathq_train_resnet50_trunc.pkl
3.6M    breastpathq_train_vits_tcga_brca_dino.pkl
1.5M    breastpathq_val_resnet50_tcga_brca_simclr.pkl
744K    breastpathq_val_resnet50_trunc.pkl
280K    breastpathq_val_vits_tcga_brca_dino.pkl
783M    crc100knonorm_train_resnet50_tcga_brca_simclr.pkl
393M    crc100knonorm_train_resnet50_trunc.pkl
149M    crc100knonorm_train_vits_tcga_brca_dino.pkl
57M     crc100knonorm_val_resnet50_tcga_brca_simclr.pkl
29M     crc100knonorm_val_resnet50_trunc.pkl
11M     crc100knonorm_val_vits_tcga_brca_dino.pkl
783M    crc100k_train_resnet50_tcga_brca_simclr.pkl
393M    crc100k_train_resnet50_trunc.pkl
149M    crc100k_train_vits_tcga_brca_dino.pkl
57M     crc100k_val_resnet50_tcga_brca_simclr.pkl
29M     crc100k_val_resnet50_trunc.pkl
11M     crc100k_val_vits_tcga_brca_dino.pkl
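For reference, this is how I have been inspecting the provided pickles (a generic peek only; I do not know their exact layout, so the key handling below is an assumption):

import pickle

# Inspect one of the provided embedding pickles (file name and key layout are assumptions on my part).
with open("bcss_val_vits_tcga_brca_dino.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
if isinstance(data, dict):
    for key, value in data.items():
        shape = getattr(value, "shape", None)
        print(key, shape if shape is not None else type(value))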

Thanks in advance.

Datasets

Thank you for the great work and for sharing the code.

I have noticed that the datasets used in this article (NSCLC and RCC) have also been used in the authors' other papers. My question is whether there is any overlap between the pretraining data and these datasets. Also, if possible, could you provide the manifest files for the respective TCGA cohorts? (I download TCGA data through manifest files and record the ID of each WSI slide, and I would like to use these two datasets as tasks, so I want to match the exact slides you used.) If it is easier, I can email you and you can reply by email. Apologies for my limited English.

tar_patch_4096 with webdataset API

Hello,
Thank you for your great work.

I am trying to recreate the folder structure for weakly supervised learning.

I am trying to create the tar_patch_4096 folder.

The description is:

Directory of saved [4096 × 4096] image regions for each WSI, stored in a *.tar format using WebDataset API.

I am not sure how to proceed here. Do I take the *.h5 patches and convert them into *.tar files, e.g. something like the sketch below?
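For concreteness, this is roughly what I have in mind (a sketch only: the folder layout, file names and sample keys are my guesses, and I would be reading the saved region images from disk rather than from the *.h5 files):

import glob
import os
from PIL import Image
import webdataset as wds

# Pack the saved [4096 x 4096] region images of one slide into a *.tar using the WebDataset TarWriter.
os.makedirs("tar_patch_4096", exist_ok=True)
region_paths = sorted(glob.glob("regions/slide_1/*.png"))       # saved region images (path is hypothetical)

with wds.TarWriter("tar_patch_4096/slide_1.tar") as sink:
    for idx, path in enumerate(region_paths):
        sink.write({
            "__key__": f"slide_1_region_{idx:04d}",             # unique key per sample
            "png": Image.open(path).convert("RGB"),             # TarWriter encodes PIL images under image keys
        })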

Thank you,

Juan

Pretrained ViTWSI-4096 model

Thanks for sharing this excellent work! The method is both amazing and elegant.

I wonder if there is a pretrained ViTWSI-4096 (n = 2, h = 3, d = 192) which aggregates the [CLS]4096 tokens and generates a slide-level representation.

Creating patches and extracting features for [4096 x 4096]

@Richarizardd @faisalml - I appreciate your intuitive work. I have been using CLAM for quite some time, but I have encountered an obstacle as follows:

[Preface] - I use an in-house dataset, and CLAM works fine. I recently read your paper and was curious to generate the hierarchical attention maps for the custom dataset. I have the splits and features for [256 x 256] patches, but how do I connect the existing [256 x 256] features to the newly extracted [4096 x 4096] features? I have read the open and closed issues but have not found a lucid explanation.

Consider a WSI with ~20,000 [256 x 256] patches for which I already have ResNet-50 features extracted and stored on disk using CLAM's scripts. @Richarizardd has mentioned that I have to change [256 x 256] to [4096 x 4096] when creating patches and extracting the features. In doing this, is the hierarchy still preserved? For example, if I extract a [4096 x 4096] region hp1, how do I correlate it with the existing [256 x 256] patches in my directory? Is it via the [x, y] coordinates (as I sketch just below)? Is my understanding of the pre-processing on the right track, or am I missing something?
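Concretely, the coordinate-based matching I have in mind looks like this (a sketch only; I am assuming both CLAM runs store level-0 top-left coordinates in the 'coords' dataset of their *.h5 files and were patched at the same magnification, and the file paths are placeholders for my own layout):

import h5py

# Match existing [256 x 256] patches to a newly extracted [4096 x 4096] region via coordinates.
with h5py.File("patches_256/slide_1.h5", "r") as f:
    coords_256 = f["coords"][:]        # (N, 2) top-left (x, y) of each 256-px patch
with h5py.File("patches_4096/slide_1.h5", "r") as f:
    coords_4096 = f["coords"][:]       # (M, 2) top-left (x, y) of each 4096-px region

for rx, ry in coords_4096:
    inside = ((coords_256[:, 0] >= rx) & (coords_256[:, 0] < rx + 4096) &
              (coords_256[:, 1] >= ry) & (coords_256[:, 1] < ry + 4096))
    print(f"region ({rx}, {ry}) contains {inside.sum()} of the 256-px patches")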

In addition to this, where do I find the ViT-16 features pretrained on TCGA (ref)? Is it from

from vision_transformer import vit_small

Do I use this instead of resnet_custom in the feature extraction?

Or is it from

features_cls256 = []

Please correct me if I am wrong @Richarizardd @faisalml. Thank you.

git lfs pull

Thanks for your excellent work!
When I run git lfs pull, the following error is reported:
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
(the same batch response is repeated for every LFS object)
error: failed to fetch some objects from 'https://github.com/mahmoodlab/HIPT.git/info/lfs'

Could you upload the large files (images, checkpoints) to Google Drive or another cloud storage service?

.csv metadata file for colorectal dataset

Hi, thanks for the interesting paper! I am trying to replicate the survival prediction results but cannot find the .csv file (not the splits, the dataset csv itself) for colon and rectal cancer in either the HIPT codebase or the MCAT codebase. Could you please let me know where I can find it?

subtyping: training loss not really decreasing

Hi, I'm trying to replicate the subtyping results you report on TCGA BRCA as a sanity check before applying HIPT to a different dataset. Hence, I'm using the same slides, splits & labels as given in the repo. For now, I've stuck to training & evaluating on fold_0.

I'm having trouble training the model: the training loss barely goes down (see the chart below: the loss plateaus after epoch 6, with the training AUC hovering around 0.50).

(W&B training-loss chart omitted: "W B Chart 09_11_2022, 20_44_57")

After diving deep into the code, there are a few things I'd love your help understanding:

In HIPT_LGP_FC you set self.local_vit = vit4k_xs(); based on the following lines, this means self.local_vit is an instance of VisionTransformer4K with patch_size = 16:

def vit4k_xs(patch_size=16, **kwargs):
    model = VisionTransformer4K(
        patch_size=patch_size, input_embed_dim=384, output_embed_dim=192,
        depth=6, num_heads=6, mlp_ratio=4,
        qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    return model

Then, looking at the VisionTransformer4K class, the default img_size argument is [224].
Combined with patch_size = 16, this means that num_patches = 196 (line 170), which is used on line 174 to instantiate self.pos_embed:
class VisionTransformer4K(nn.Module):
    """ Vision Transformer 4K """
    def __init__(self, num_classes=0, img_size=[224], input_embed_dim=384, output_embed_dim=192,
                 depth=12, num_heads=12, mlp_ratio=4., qkv_bias=False, qk_scale=None,
                 drop_rate=0., attn_drop_rate=0., drop_path_rate=0., norm_layer=nn.LayerNorm,
                 num_prototypes=64, **kwargs):
        super().__init__()
        embed_dim = output_embed_dim
        self.num_features = self.embed_dim = embed_dim
        self.phi = nn.Sequential(*[nn.Linear(input_embed_dim, output_embed_dim), nn.GELU(), nn.Dropout(p=drop_rate)])
        num_patches = int(img_size[0] // 16)**2
        print("# of Patches:", num_patches)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

Hence, if we feed HIPT_LGP_FC a tensor of shape [M, 256, 384] as done in the model walkthrough notebook, at some point during the forward pass the interpolate_pos_encoding method gets called. Given x.shape = [M, 257, 384] and pos_embed.shape = [1, 197, 192], npatch = 256 and N = 196: the condition npatch == N on line 204 is False, so we need to interpolate the positional embedding:
def interpolate_pos_encoding(self, x, w, h):
    npatch = x.shape[1] - 1
    N = self.pos_embed.shape[1] - 1
    if npatch == N and w == h:
        return self.pos_embed

  1. Why is the patch_size argument passed when instantiating VisionTransformer4K not actually used in VisionTransformer4K.__init__()? Instead, a hard-coded value of 16 is used (line 170, see below):

    num_patches = int(img_size[0] // 16)**2

  2. Why is the img_size argument left at its default (i.e. img_size = [224]) when instantiating VisionTransformer4K, rather than set to [256]? I understand that during self-supervised pre-training you use crops of size [224, 224], but during subtyping we're using the full [256, 256] patch, so I would expect img_size = [256]. Doing so, the previously discussed condition npatch == N would become True, so the positional embedding would no longer need to be interpolated (see the small shape check after this list).

  3. Given we pass a tensor of shape [M, 256, 384] to HIPT_LGP_FC, which gets reshaped to [M, 384, 16, 16] before being passed to HIPT_LGP_FC.local_vit, the following line gives B = M.

    B, embed_dim, w, h = x.shape

    Then, in the following lines, cls_tokens is defined as a tensor of shape [M, 1, 192]. Isn't there a confusion between B (supposed to be the batch size) and M (the number of [4096, 4096] regions per slide)? Shouldn't the cls_tokens tensor be of shape [batch_size, 1, 192]?
    # add the [CLS] token to the embed patch tokens
    cls_tokens = self.cls_token.expand(B, -1, -1)

  4. I've also tried training only the global aggregation layers by directly feeding the pre-extracted region-level features (of shape [M, 192]), without success (the training loss does not really decrease either). Could you confirm that this should work just as well as training the intermediate transformer + the global aggregation layers on the [M, 256, 384] features?
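To make question 2 concrete, here is the shape arithmetic I am relying on (plain Python mirroring num_patches = int(img_size[0] // 16)**2 and the 192-dim output embedding):

# Positional-embedding sizes for the two img_size settings discussed in question 2.
for img_size in (224, 256):
    num_patches = int(img_size // 16) ** 2
    print(f"img_size={img_size}: num_patches={num_patches}, pos_embed shape=(1, {num_patches + 1}, 192)")
# img_size=224 -> 196 patches (197 tokens): a [M, 256, 384] input has npatch=256 != 196,
#                 so interpolate_pos_encoding is triggered.
# img_size=256 -> 256 patches (257 tokens): npatch == N, no interpolation needed.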

Thanks!

VIT-4096-WSI

What would y'all recommend using for VIT4096-WSI? I ended up just using an adjusted wrapper around the VisionTransformer4K class, but I feel this isn't the best solution.

img_size argument

Thank you for your work and for sharing the code.
I've dived into the repo as I'm trying to run HIPT on a custom WSI dataset to see how it compares to other methods.

Following what's described in the paper, ViT-4096 is supposed to encode a [4096,4096] region into an embedding.
It does so by leveraging ViT-256 to encode each [256,256] patch in that region, then operating on the resulting sequence of [CLS] tokens.

When looking at the HIPT_4K class, you use get_vit4k, which returns:

model4k = vits4k.__dict__[arch](num_classes=0)

with arch='vit4k_xs', this is basically a call to:

def vit4k_xs(patch_size=16, **kwargs):
    model = VisionTransformer4K(
        patch_size=patch_size, input_embed_dim=384, output_embed_dim=192,
        depth=6, num_heads=6, mlp_ratio=4, 
        qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    return model

I was wondering if you could shed light on the following:

  1. The img_size argument of VisionTransformer & PatchEmbed defaults to [224]: not sure why it doesn't default to [256]?
  2. The img_size argument of VisionTransformer4K defaults to [224]: not sure why it doesn't default to [4096]?

point 1. also impacts the shape of some of the self-supervised pre-trained weights (the output of a Conv2d with kernel_size=16 & stride=16 on a 224 × 224 image is a 14 × 14 map, while on a 256 × 256 image it is a 16 × 16 map), as the snippet below illustrates.
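A quick way to see the effect of point 1 with plain PyTorch (nothing repo-specific, just a convolution with the same kernel/stride as the patchify layer):

import torch
import torch.nn as nn

# Patchify convolution with kernel_size=16, stride=16, as in the ViT-256 PatchEmbed.
proj = nn.Conv2d(3, 384, kernel_size=16, stride=16)
print(proj(torch.zeros(1, 3, 224, 224)).shape)   # torch.Size([1, 384, 14, 14])
print(proj(torch.zeros(1, 3, 256, 256)).shape)   # torch.Size([1, 384, 16, 16])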

Thanks in advance.

Using the Features in CLAM

Hi @Richarizardd ,
Thanks for the great work.
I am trying the subtyping part of the repository. I want to ask whether the slide features of dimension N x 192 (where N is the number of patches) provided here https://github.com/mahmoodlab/HIPT/tree/master/3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings

Q1: can be used directly in the CLAM training pipeline provided in the subtyping part?

Q2: If we want to use the hipt_lgp or hipt_n models from the repo, would we need to feed them N x 384 features instead? I am assuming these are not provided in the repository?

Q3: Is it correct that extracting [4096 x 4096] regions gives 192-dim features, while extracting [256 x 256] patches gives 384-dim features?
When I extracted features for [4096 x 4096] regions with ViT-4K and for [256 x 256] patches with ViT-256, the ViT-256 features still came out as N x 192. Does anything need to be changed when extracting the ViT-256 features?

Thanks!

ModuleNotFoundError: No module named 'nn_encoder_arch'

Thanks for your excellent work!
When I ran the code in ./3-Self-Supervised-Eval/patch_extraction.py, it raised the error ModuleNotFoundError: No module named 'nn_encoder_arch'. I cannot find nn_encoder_arch anywhere; what is nn_encoder_arch? Thank you.

Model Architectures

from nn_encoder_arch.vision_transformer import vit_small
from nn_encoder_arch.resnet_trunc import resnet50_trunc_baseline

average number of [4096,4096] regions per slide

Hi,

I'm trying to reproduce your results for the TCGA BRCA dataset. I've thus narrowed down the dataset to the 875 breast slides you use to train & evaluate your models (based on the .csv files you provided here).

As a first step, I ran CLAM's segmentation & patching pipeline on this dataset. I used the preset parameters that the group provided for TCGA BRCA (can be found here). When computing the average number of regions per slide, I get avg_M ~ 212.

The paper states that the average number of [4096,4096] regions per slide (avg_M) is around 38 when computed over the 10,678 FFPE slides from 33 cancer types in TCGA. I thought "maybe TCGA BRCA slides are much bigger than those of the other cancer types, hence the big difference when computing avg_M".

To assess whether that assumption was true, I downloaded the pre-extracted "region-level" feature embeddings you kindly provide under 3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings. From these I can easily tell how many [4096,4096] regions you found for each of the 875 TCGA BRCA slides. Averaging over these slides, I get avg_M ~ 30.

I was thus wondering if you had an idea of where such a big difference (212 vs. 30) may come from. Do you remember using tcga preset parameters when running CLAM or did you use custom parameters?

Thank you!

knn_classifier

The knn_classifier function, def knn_classifier(train_features, train_labels, test_features, test_labels, k, T, num_classes=2), is called with num_classes=1000. When I change this value to 2 to run a binary classification problem, an error is reported: RuntimeError: start (0) + length (5) exceeds dimension size (2). How should I use the kNN evaluation for a binary classification problem?
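For context, this is my reading of where the error comes from (the shapes below are assumptions based on the DINO-style knn_classifier, where class probabilities are sorted and then narrowed for top-1/top-5 accuracy):

import torch

# After sorting class probabilities, `correct` has shape [batch_size, num_classes].
batch_size, num_classes = 4, 2
correct = torch.zeros(batch_size, num_classes, dtype=torch.bool)

top1 = correct.narrow(1, 0, 1).sum().item()                    # fine for any num_classes
# A plain correct.narrow(1, 0, 5) raises "start (0) + length (5) exceeds dimension size (2)" here,
# so the top-5 term needs a guard (top-5 accuracy is meaningless for 2 classes anyway):
top5 = correct.narrow(1, 0, min(5, num_classes)).sum().item()
print(top1, top5)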

ModuleNotFoundError: No module named 'vision_transformer'

Hello, even after installing the dependencies (pip install -r requirements.txt), I get "ModuleNotFoundError: No module named 'vision_transformer'" while executing "from HIPT_4K.hipt_4k import HIPT_4K". If you can tell me where I went wrong or how to import the vision_transformer module, I would appreciate it. Thanks.
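For reference, this is the workaround I am currently trying (the clone path is mine, and the diagnosis is an assumption: the modules inside HIPT_4K use bare imports such as import vision_transformer, so the HIPT_4K folder itself needs to be on sys.path):

import sys

# Make the bare `import vision_transformer` inside the HIPT_4K modules resolvable
# (adjust both paths to wherever the repository was cloned).
sys.path.append("/path/to/HIPT")           # repository root, so HIPT_4K.hipt_4k is importable
sys.path.append("/path/to/HIPT/HIPT_4K")   # folder containing vision_transformer.py

from HIPT_4K.hipt_4k import HIPT_4K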

Labels for cancer subtyping

Hi, first of all thank you for the amazing work and codebase. I see that the data folds for the subtype classification task are provided in HIPT/2-Weakly-Supervised-Subtyping/splits/10foldcv_subtype.

However, the labels are not provided in the .csv files. I tried retrieving the diagnoses for all the .svs files in the BRCA cohort using the GDC API, and these are the different diagnoses:

'Adenoid cystic carcinoma', 'Apocrine adenocarcinoma', 'Basal cell carcinoma, NOS', 'Carcinoma, NOS', 'Cribriform carcinoma, NOS', 'Infiltrating duct and lobular carcinoma', 'Infiltrating duct carcinoma, NOS', 'Infiltrating duct mixed with other types of carcinoma', 'Infiltrating lobular mixed with other types of carcinoma', 'Intraductal micropapillary carcinoma', 'Intraductal papillary adenocarcinoma with invasion', 'Large cell neuroendocrine carcinoma', 'Lobular carcinoma, NOS', 'Medullary carcinoma, NOS', 'Metaplastic carcinoma, NOS', 'Mucinous adenocarcinoma', 'Paget disease and infiltrating duct carcinoma of breast', 'Papillary carcinoma, NOS', 'Phyllodes tumor, malignant', 'Pleomorphic carcinoma', 'Secretory carcinoma of breast', 'Tubular adenocarcinoma'

I would like to ask how you derived the ILC and IDC labels that you use, or whether you could provide the labels for the .svs files.

I am asking specifically about the BRCA cohort since it is the one I started inspecting, but a similar question probably applies to the other cohorts, and their labels would be useful as well (:

Which mean "n" and "h" ?

ViT4096-256(n = 4, h = 3, d = 192)
ViTWSI-4096(n = 2, h = 3, d = 192)

Hi, what do "n" and "h" mean here?

Reshaping Vit256-16 output to be consistent with ViT4096-256 and following ViT-WSI

Hello,

I have two questions regarding your implementation for using it with MIL.

1.) My first question is about connecting Vit256-16, Vit4096-256 and ViT-WSI to build a complete network

As you state in the model walkthrough notebook for weakly supervised subtyping, input shape with pre-extracted ViT256-16 tokens must be

Input: $[M \times L \times D]$ Tensor, where:

  • M: Number of (non-overlapping) $[4096 \times 4096]$ Image regions in a WSI (On Average: 38)
  • L: Number of (non-overlapping) $[256 \times 256]$ Image Patches in a $[4096 \times 4096]$ Image Region (Default: 256)
  • D: Embedding Dimension (Default: 384)

Now I am a little bit confused about the output shape of ViT256-16. With the given implementation, the output shape of ViT256-16 is $[(M*L) \times D]$. My first thought for matching the input dimension expected by ViT4096-256 is simply x = x.reshape(M, L, D), or equivalently x = einops.rearrange(x, '(M L) D -> M L D', M=4, L=16) for a toy tensor of shape [M*L, D], but I am not sure whether this correctly maintains the spatial dimensions and axes (small check below)?
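For what it's worth, this is the small self-check I did (toy sizes; it only holds if the (M*L) rows were produced region by region, with patches in order):

import torch

# Toy check that reshape undoes a row-major flatten when patches are stored region-by-region.
M, L, D = 3, 4, 5                                  # toy sizes, not the real M, 256, 384
x = torch.arange(M * L * D, dtype=torch.float32).reshape(M, L, D)
flat = x.reshape(M * L, D)                         # what a per-patch encoder would emit in order
assert torch.equal(flat.reshape(M, L, D), x)       # ordering of regions and patches is preserved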

2.) How do you handle multiple WSIs per patient in your framework?

Do you simply use the patient label as the label for every WSI? I have not found anything about merging multiple WSIs for one patient in your code (e.g. the train loop in core utils simply iterates over WSIs with given labels).

Confusion regarding the number of epochs the HIPT model was pre-trained for

Thank you for the great work and for sharing the code.

The paper mentions that the model was trained for 400K iterations with a batch size of 256, which amounts to 102,400,000 patches, roughly the size of the dataset used for pretraining. So it seems the model was trained for just 1 epoch, but the training script in the README

python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch vit_small --data_path /path/to/TCGA_PRETRAINING_DIR/patch_256_pretraining/ --output_dir /path/to/TCGA_PRETRAINING_DIR/ckpts/pretrain/ --epochs 100

seems to suggest that it was pretrained for 100 epochs. Could you please clarify this detail? Thanks in advance.

Process of training subtyping task on custom WSIs dataset using provided pretrained model

Hi, thank you for your wonderful work and sharing the codes.

We hope to use the provided feature extractor on our own dataset (i.e., skip the self-supervised learning process).

However, we are not clear on how to organize the folders for weakly-supervised training on our own dataset.

We'd like to ask how to construct a folder like the one below by modifying CLAM's create_patches_fp.py and extract_features_fp.py.

Custom_Dataset_Features_Folder/
    ├── h5_files
            ├── slide_1.h5
            ├── slide_2.h5
            └── ...
    └── pt_files
            ├── slide_1.pt
            ├── slide_2.pt
            └── ...

or, if we are on the wrong track, could you provide the code or details for the weakly-supervised training setup?

Thank you!

Some questions

Thank you for open-sourcing the code, but I have a question (screenshot omitted):
where is the get_patch_attention_scores function? I can't find it.

All categories were classified into the same category

I was using a TCGA dataset for grading. However, all samples were classified into the same category in stage 3, i.e., the final WSI fine-tuning stage, even though the training losses had converged in the 256 and 4096 stages.

Have you ever encountered this problem, and how did you solve it?
If not, could you suggest what I should try next?

Thanks!

Inquiry about Dataset

Hi Richard,

Could you please let me know how to download the same datasets you used? Additionally, I tried git-cloning your repository, but I couldn't unzip all the zip archives in HIPT/2-Weakly-Supervised-Subtyping/dataset_csv. What is the procedure for unzipping them? Thank you.

Training procedure for custom dataset?

Hi, great work and repo :)

I wanted to try this on a private dataset of WSIs at my lab. Do you have instructions for doing this? I can't find a clear path. It seems the 3 stages must be trained independently?

Survival Code

Hi,
Thanks for sharing the code.
It seems that hipt_lgp is not implemented in the MCAT repo yet, but this repo says we should use the command below, which I assume refers to the MCAT repo (please correct me if I'm wrong):

python main.py --data_root_dir $DATAROOT --which_splits 5foldcv --split_dir tcga_brca --mode pyramid --model_type hipt_lgp --pretrain_4k vit4k_xs_dino --freeze_4k

I wonder if you could provide more info on how to use the MCAT code with the hipt_lgp backbone for survival prediction, especially when there is no omics data available in the dataset, or share the main file along with the other files required for this task.

Thanks.

Recommended GPUs

What GPUs are recommended/required for running the experiments?
I am especially curious about the GPU memory requirements.

Thank you for this awesome work! @Richarizardd

Number of epochs required for finetuning

Hi, my goal is to fine-tune the first-stage DINO model from the provided checkpoint (vit256_small_dino.pth) on the Camelyon16 dataset. Roughly how many epochs are needed for a good result?

Some issues about creating patches

Thanks for your intuitive work on WSI classification. Could you please share some details about the feature extraction? Here are my questions about processing TCGA lung:

  1. To extract features from 256*256 patches for the MIL baselines, we set patch_size=256 in CLAM's create_patches.py to generate patches (python create_patches_fp.py --source DATA_DIRECTORY --save_dir RESULTS_DIRECTORY --patch_size 256 --preset bwh_biopsy.csv --seg --patch --stitch), and then switch the model to the pre-trained ViT-16 in CLAM's extract_features_fp.py to extract features. Is this correct?
  2. To extract features from 4096*4096 regions for the HIPT baseline, we set patch_size=4096 in CLAM's create_patches.py, and then switch the model to hipt_4k in CLAM's extract_features_fp.py to extract features (the pre-trained ViT-16 and ViT-4K are both used inside the hipt_4k model). Is this correct?
  3. If instruction 2 is correct, I wonder how to extract features quickly with hipt_4k, since its default batch_size is 1. Did you modify hipt_4k for a bigger batch size, or spread the extraction across several servers? (If I process almost 900 slides on two Nvidia 3090s with batch_size = 1, it will take almost 80 days.)

Worse Performance in CAMELYON16 only

Hi @Richarizardd, I have a question about the worse performance I got on the CAMELYON16 dataset only. Here are the results of my experiments after following all of the provided settings and pretrained models:

  • CAMELYON16
    1 fold: Train: 242 WSIs, Val: 28 WSIs, Test: 129 WSIs
    Mean Test AUC across 10 folds: 0.709 ± 0.024
    Mean Test ACC across 10 folds: 0.764 ± 0.021

  • UCEC
    1 fold: Train: 668 WSIs, Val: 58 WSIs, Test: 238 WSIs
    Mean Test AUC across 10 folds: 0.991 ± 0.004
    Mean Test ACC across 10 folds: 0.958 ± 0.012

  • our own dataset (Leica colon)
    1 fold: Train: 469 WSIs, Val: 49 WSIs, Test: 223 WSIs
    Mean Test AUC across 10 folds: 0.977 ± 0.04
    Mean Test ACC across 10 folds: 0.941 ± 0.009

Are you perhaps able to explain why the mean test AUC and ACC on CAMELYON16 aren't that good? Could it be that the pretraining dataset is very different from the training dataset? Or is it because there aren't many training slides? In fact, I trained CLAM on the same CAMELYON16 dataset and distribution, and it achieved an AUC and ACC of around 85%. Your insight would be greatly appreciated. Thank you~

Question regarding Evaluation

Hello Richard,
I have a few questions regarding your paper. I hope you can answer them.

  1. Is it possible to use your implementation for slides with less than 4096 x 4096 pixels? I am curious because biopsy slides with tissue gathered by fine needles usually have a high background ratio, and the tissue area is smaller than 4096 x 4096 pixels.
  2. Why did you not include CLAM-SB in your survival prediction comparison, since it seems to be the second-best model in your slide-level classification task? Also, do you use multiple WSIs per patient for survival prediction, or just one WSI?

I hope my questions are clear, and I am looking forward to your reply!

Thanks, Fabian

Split indices of precomputed embeddings for reproducibility & comparison

Dear Team,

Excellent work - thanks for providing precomputed embeddings!

For reproducibility, benchmarking and comparison, could you please share the indices of the train-val-test split for the patch embeddings under "embeddings_patch_lib/", e.g. in "bcss_val_vits_tcga_brca_dino.pkl"?

The file contains the embeddings and labels, but there seems to be no way to map them back to the case id / slide each embedding came from, so it is difficult to use them for benchmarking... Looking at patch_extraction_utils.py, the indices in the pickles seem to be tied to "BCSS/40x/patches/summary.csv", but that file is not provided either, so we can't reproduce the train-val-test split...

Thanks in advance!

Some questions

Hi, thank you for a great paper.
I would like to ask some questions.

1/ About the training code for slide-level classification.
Are you still updating the code? I can't find the file 2-Weakly-Supervised-Train-Val/Model%20Walkthrough.ipynb, or the main.py referenced in the training commands.
The only training code I could find is in eval_linear.py. Is that the training code for slide-level classification?

2/ You set num_classes=0 in this code, so I guess you train $ViT_{WSI}-4096$ together with an additional classification layer at the fine-tuning stage. However, I can't find the details of the MIL classification layer in the paper or the supplementary.
The fine-tuning part of the paper only mentions that $ViT_{WSI}-4096$ is fine-tuned, with no information about the classification layer.
In addition, in this code you only train a single linear layer for classification on top of the features.
I'm confused: I thought you trained $ViT_{WSI}-4096$ plus a classification layer.
Can you give me more details about the fine-tuning stage?

3/ Also, is the main_dino256.py in the command the same as 1-Hierarchical-Pretraining/main_dino.py?

4/ In the Supplementary, Table 4 (screenshot omitted):
In the 4th row, is it a typo?

  • If you used only $ViT-16_{PF}$ and $ViT-256_{PF}$, then AP-4096 should be marked on the red line.

  • If it is not a typo, how can you get that result with only the two pretrained $ViT-16_{PF}$ and $ViT-256_{PF}$?
    Furthermore, it would mean that using only the two DINO-pretrained $ViT-16_{PF}$ and $ViT-256_{PF}$ gives the best result (the same as the best result in Table 1), which would imply that $ViT_{WSI}-4096$ is not important, right?

Thank you for your support.

requirements

Hi, could you provide a requirements file, i.e. the list of packages that need to be installed?

Issue with Attention Visualization

Hello, thank you for the work you published, very exciting!
I am trying to play with the code and took the code from the Attention Visualization notebook.
I get the following error when trying to run create_hierarchical_heatmaps_indiv:

Traceback (most recent call last):
  File "hipt4kinference_attentionvisualization.py", line 28, in <module>
    create_hierarchical_heatmaps_indiv(region, model256, model4k,
  File "/storage01/nikitam/HIPT/HIPT_4K/hipt_heatmap_utils.py", line 388, in create_hierarchical_heatmaps_indiv
    score256_1 = concat_scores256(a256_1[:,i,:,:], size=(s//16,)*2)
TypeError: concat_scores256() missing 2 required positional arguments: 'w_256' and 'h_256'

Thank you!

cannot load pretrained model weights

Hi! I am trying to load the pretrained DINO weights found in HIPT/HIPT_4K/Checkpoints using the following code:

pretrained_weights4k = 'vit256_small_dino.pth'
state_dict = torch.load(pretrained_weights4k)['teacher']

but I am getting an error: UnpicklingError: invalid load key, 'v'.

Also, running the notebook HIPT_4K Inference + Attention Visualization, I checked that when get_vit256 or get_vit4k are called, os.path.isfile(pretrained_weights) returns False inside these functions, so the code doesn't load the model weights.
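One hypothesis on my side: if git lfs pull did not actually download the objects (see the quota issue above), the *.pth files are just Git LFS pointer stubs, and torch.load on a stub fails with exactly "invalid load key, 'v'". A quick check:

# Check whether the checkpoint is a real file or a Git LFS pointer stub.
with open("vit256_small_dino.pth", "rb") as f:
    head = f.read(64)
print(head)
# A pointer stub starts with b"version https://git-lfs.github.com/spec/v1",
# which is why torch.load fails with "invalid load key, 'v'".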

Can you please help with this issue?

Thank you!

'Weakly-Supervised Training' task - Issue with fast_cluster_ids.pkl file

Hello,
Thank you for the great work and the released code. I am trying to train the ViT-WSI for slide-level classification on a customized dataset. Even though I have structured the TCGA_ROOT_DIR folder as shown in the README and adapted the code in main.py to my use case by adding the following lines

elif args.task == 'tcga_h_subtype':
    args.n_classes = 2
    dataset = Generic_MIL_Dataset(csv_path = './dataset_csv/tcga_h_subset.csv',
                                  data_dir = os.path.join(args.data_root_dir, study_dir),
                                  mode = args.mode,
                                  shuffle = False,
                                  seed = args.seed,
                                  print_info = True,
                                  label_col = 'oncotree_code',
                                  label_dict = {0:0, 1:1},
                                  patient_strat = False,
                                  prop = args.prop)

when running the command

CUDA_VISIBLE_DEVICES=$GPU python main.py --data_root_dir $DATAROOT --model_type hipt_lgp --task $TASK --prop 1.0 --pretrain_4k vit4k_xs_dino --freeze_4k

I get the following error:

File "HIPT/2-Weakly-Supervised-Subtyping/datasets/dataset_generic.py", line 411, in __init__
    with open(os.path.join(cluster_dir, 'fast_cluster_ids.pkl'), 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: 'data_dir/extracted_mag20x_patch4096_fp/fast_cluster_ids.pkl'

Which script should I run to produce this file? What does it represent? I cannot find any indication in the README.

Moreover, I noticed that on lines 123 and 125 of main.py the args.pretrain_WSI argument is used, but it is not defined in the argument parser.

Thanks!!

WSI preprocessing

Hi, Thanks for sharing this inspiring work :)

Could you please share the steps and criteria used for preprocessing in this project? For example, the preset parameters used for CLAM preprocessing, the patch selection criteria (e.g. excluding artefacts and patches without enough tissue), and any other preprocessing steps such as stain normalisation.

Many thanks!

Extracting features from 4096 x 4096 patches (M x L x D)

Hello @Richarizardd,

Could you please guide me on how to extract the vits_tcga_pancancer_dino_pt_patch_features for every 4096 x 4096 region and save them in the extracted_mag20x_patch4096_fp folder?

Also, I was wondering whether a feature extraction pipeline is available for this task, or whether we need to extract features for each 4096 x 4096 region using the HIPT_4K API and then vstack them. If you have code for this, it would be great if you could share it; a rough sketch of what I had in mind follows.
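Here is roughly what I had in mind (treat the helper imports, the checkpoint path and the unfold ordering as my assumptions, not the official pipeline):

import glob
import torch
from PIL import Image
from hipt_model_utils import get_vit256, eval_transforms    # assumed helpers shipped with HIPT_4K

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vit256 = get_vit256(pretrained_weights="Checkpoints/vit256_small_dino.pth").to(device)
vit256.eval()

features = []                                                # one [256, 384] tensor per region
with torch.no_grad():
    for path in sorted(glob.glob("regions/slide_1/*.png")):  # saved [4096 x 4096] region images (hypothetical path)
        region = Image.open(path).convert("RGB")
        x = eval_transforms()(region).unsqueeze(0)           # [1, 3, 4096, 4096]
        # unfold into the 16 x 16 grid of [256 x 256] patches -> [256, 3, 256, 256]
        patches = x.unfold(2, 256, 256).unfold(3, 256, 256)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3, 256, 256)
        feats = vit256(patches.to(device)).cpu()             # [256, 384] CLS tokens (chunk this if memory is tight)
        features.append(feats)

torch.save(torch.stack(features), "slide_1.pt")              # [M, 256, 384] tensor for this slide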

Thank you!

Pretrained-WSI

Thank you for your great work on the HIPT model!

In HIPT_LGP_FC, it seems that the code for loading a pretrained ViT-WSI is just "pass". Is there a way to load pretrained WSI-level weights? Also, is the pretrained model for this part included in the repository?

the reasoning behind attention heatmaps code

Hi, I went through the attention heatmap generation code thoroughly, and there is one thing I have trouble understanding.
I'd love to hear your take on it, as it would give me the part of the picture I'm missing.

To keep it simple, let's focus on the create_patch_heatmaps_indiv function.

patch2 = add_margin(patch.crop((16,16,256,256)), top=0, left=0, bottom=16, right=16, color=(255,255,255))

In the line above, you take the (240, 240) bottom-right crop of the input patch, then paste it into the top-left corner of a white (256, 256) image. Then you retrieve the attention scores for the original input patch as well as for patch2.
Eventually, you combine both attention scores in the following lines:

new_score256_2 = np.zeros_like(score256_2)
new_score256_2[offset_2:s, offset_2:s] = score256_2[:(s-offset_2), :(s-offset_2)]
overlay256 = np.ones_like(score256_2)*100
overlay256[offset_2:s, offset_2:s] += 100
score256 = (score256_1+new_score256_2)/overlay256

Here, all you do is restrict the attention scores from score256_2 to those corresponding to the tissue crop in patch2.
Then you sum score256_1 and new_score256_2, making sure to divide by a twice-larger weight (200) in the region where score256_1 and score256_2 overlap (because they represent the same tissue crop).

I drew a summary of what is happening (diagram omitted: attention_patch_summary).

My question then boils down to: what is the reasoning behind blending a shifted crop rather than simply computing score256 via:

_, a256 = get_patch_attention_scores(patch, model256, device256=device256)
score256 = get_scores256(a256[:,i,:,:], size=(s,)*2)
score256 = score256 / 100

Thanks!

ModuleNotFoundError: No module named 'utils.gpu_utils'

Hello; I am trying to reproduce the results in the 'Weakly-Supervised Training + Evaluation' section; however, when I run the code I get this error. There is no file named gpu_utils under HIPT/2-Weakly-Supervised-Subtyping/utils. I also checked the CLAM and MCAT GitHub repos but could not find the file in their utils folders either.

name 'get_patch_attention_scores' is not defined

Hi,

Where is the get_patch_attention_scores function?

Here is the error when performing the 256 x 256 demo (Saving Attention Maps Individually):


NameError Traceback (most recent call last)
/tmp/ipykernel_3452983/1904309063.py in
4 create_patch_heatmaps_indiv(patch=patch, model256=model256,
5 output_dir=output_dir, fname='patch',
----> 6 cmap=light_jet, device256=device256)

/data1/partitionA/CUHKSZ/histopath_2022/codes/histopathology_pretraining/HIPT/HIPT_4K/hipt_heatmap_utils.py in create_patch_heatmaps_indiv(patch, model256, output_dir, fname, threshold, offset, alpha, cmap, device256)
174 patch1 = patch.copy()
175 patch2 = add_margin(patch.crop((16,16,256,256)), top=0, left=0, bottom=16, right=16, color=(255,255,255))
--> 176 b256_1, a256_1 = get_patch_attention_scores(patch1, model256, device256=device256)
177 b256_1, a256_2 = get_patch_attention_scores(patch2, model256, device256=device256)
178 save_region = np.array(patch.copy())

NameError: name 'get_patch_attention_scores' is not defined

Batch-wise extract features

Hello,

I'm trying to extract and save features using the HIPT/hipt_4k.py code. However, it seems to work only with a batch size of 1.
Do you have any suggestions or tips for modifying the code to allow batch-wise processing?

Thank you!

Stochastic behavior when extracting the features for one image

Hi,
Really great work!

I have been extracting features at the first two levels (256 and 4K) for one whole-slide image. Running it four times (to check reproducibility), I noticed that I got four different sets of features. Setting torch.backends.cudnn.deterministic=True solved the issue; however, I still do not understand why there are differences, since the weights should be fixed.

The line that creates the stochastic behavior is the following:
from line 43 in HIPT_4K/hipt_4k.py: self.model256 = get_vit256(pretrained_weights=model256_path).to(device256)
then line 54 in HIPT_4K/hipt_model_utils.py: model256 = vits.__dict__[arch](patch_size=16, num_classes=0) (which calls vit_small)
then line 165 in HIPT_4K/vision_transformer.py: self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

This last line is where I found the stochastic behavior. Why is that?
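For reference, these are the switches I ended up setting to make the extraction reproducible (standard PyTorch flags, nothing HIPT-specific):

import torch

# Standard reproducibility switches; cudnn.deterministic was the one that mattered for me.
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False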
Thanks!

Some Questions

Hi @Richarizardd,

I would be glad if you can answer some of my questions below:

  1. Before running main_dino4k.py, I noticed you saved all the [256-Length x 384-Dim] tensors for the input, which correspond to the extracted ViT-16 features for each 4K x 4K region. May I know in which part of the code you did that? Do I need to extract the 4K x 4K regions as jpg or png images to get the input tensors?

  2. Should I change the equation below if a batch size of 64 cannot be used? How about the learning rate? (See the worked example after this list.)

args.lr * (args.batch_size_per_gpu * utils.get_world_size()) / 256.

  3. Is HIPT_LGP_FC your main model?
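Regarding question 2, this is the linear-scaling arithmetic as I understand it (the 5e-4 base learning rate is the usual DINO default and an assumption on my part):

# effective_lr = base_lr * (batch_size_per_gpu * world_size) / 256, as in the DINO scripts.
base_lr = 5e-4                                    # assumed default for --lr
for batch_size_per_gpu, world_size in [(256, 1), (64, 1), (32, 2)]:
    effective_lr = base_lr * (batch_size_per_gpu * world_size) / 256.0
    print(f"{batch_size_per_gpu} x {world_size} GPU(s): lr = {effective_lr:.2e}")
# A smaller total batch automatically gets a proportionally smaller LR, so in principle
# only --batch_size_per_gpu needs to change.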

Thanks for your time and kindness!
