
chexpert's Introduction

Top1 Solution of CheXpert

What is CheXpert?

CheXpert is a large dataset of chest X-rays and a competition for automated chest X-ray interpretation, featuring uncertainty labels and radiologist-labeled reference standard evaluation sets.

Why CheXpert?

Chest radiography is the most common imaging examination globally, critical for screening, diagnosis, and management of many life-threatening diseases. Automated chest radiograph interpretation at the level of practicing radiologists could provide substantial benefit in many medical settings, from improved workflow prioritization and clinical decision support to large-scale screening and global population health initiatives. For progress in both development and validation of automated algorithms, we realized there was a need for a labeled dataset that (1) was large, (2) had strong reference standards, and (3) provided expert human performance metrics for comparison.

How to take part?

CheXpert uses a hidden test set for official evaluation of models. Teams submit their executable code on Codalab, which is then run on a test set that is not publicly readable. Such a setup preserves the integrity of the test results.

Here's a tutorial walking you through the official evaluation of your model. Once your model has been evaluated officially, your scores will be added to the leaderboard. Please refer to https://stanfordmlgroup.github.io/competitions/chexpert/

What does the code include?

  • If you want to train from scratch, we provide the training and test pipeline code, along with a complete training walkthrough.
  • If you want to use our model in your own method, we provide the pre-trained weights of our best single network; the network code is included in this repository.

Train the model by yourself

  • Data preparation

We provide an example file at config/train.csv. You can follow its format and write the path of your own CSV into config/example.json.
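For illustration, the relevant part of config/example.json might look like the sketch below; the train_csv key name is hypothetical (check the shipped example.json for the actual key), while "save_top_k": 3 is referenced later in this README:

{
    "train_csv": "config/train.csv",
    "save_top_k": 3
}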

  • If you want to train the model, please run the commands below. (We use four 1080Ti GPUs for training, so four or more GPUs are recommended.)

pip install -r requirements.txt

python Chexpert/bin/train.py Chexpert/config/example.json logdir --num_workers 8 --device_ids "0,1,2,3"

  • If you want to test your model, please run the command:

cd logdir/

  • Because we set "save_top_k": 3 in config/example.json, up to 3 checkpoints are saved, which can also be used as an ensemble (see the sketch after the test command). To test a single checkpoint, copy it to best.ckpt as below:

cp best1.ckpt best.ckpt

python classification/bin/test.py
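If you want an actual ensemble of the three saved checkpoints, one simple option is to average their predicted probabilities across test runs. A minimal sketch, assuming you have run the test once per checkpoint and saved each per-class probability matrix as CSV (the file names here are hypothetical, not produced by this repo):

import numpy as np

# hypothetical per-checkpoint prediction files, each of shape (num_images, num_classes)
runs = ["preds_best1.csv", "preds_best2.csv", "preds_best3.csv"]
probs = [np.loadtxt(f, delimiter=",") for f in runs]
ensemble = np.mean(probs, axis=0)  # simple probability averaging
np.savetxt("preds_ensemble.csv", ensemble, delimiter=",")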

  • If you want to plot the ROC figure and get the AUC, please run the command:

python classification/bin/roc.py plotname

  • How about a cup of coffee?

You can run the command below and go have a cup of coffee; the log will be written to disk:

python Chexpert/bin/train.py Chexpert/config/example.json logdir --num_workers 8 --device_ids "0,1,2,3" --logtofile True &

Train the model with pre-trained weights

  • We provide one pre-trained model at config/pre_train.pth. We tested it on the 200-patient validation dataset and got the AUCs below (mean AUC: 0.91336):

Cardiomegaly  Edema   Consolidation  Atelectasis  Pleural_Effusion
0.8703        0.9436  0.9334         0.9029       0.9166
  • To train the model starting from the pre-trained weights, run the command below:

python Chexpert/bin/train.py Chexpert/config/example.json logdir --num_workers 8 --device_ids "0,1,2,3" --pre_train "Chexpert/config/pre_train.pth"

Plot heatmaps using the trained model

  • Currently supported global_pool options in config/example.json for plotting heatmaps:

global_pool   Heatmap support
MAX           Yes
AVG           Yes
EXP           Yes
LSE           Yes
LINEAR        Yes
PCAM          Yes
AVG_MAX       No
AVG_MAX_LSE   No
  • We also provide a heatmap comparison of AVG, LSE, and our own PCAM pooling.
    [Comparison figures not reproduced here: original images alongside AVG (dev mAUC 0.895), LSE (dev mAUC 0.896), and PCAM (dev mAUC 0.896) heatmaps for Cardiomegaly, Atelectasis, Pleural Effusion, and Consolidation.]
  • You can plot heatmaps using the command below:

python Chexpert/bin/heatmap.py logdir/best1.ckpt logdir/cfg.json CheXper_valid.txt logdir/heatmap_Cardiomegaly/ --device_ids '0' --prefix 'Cardiomegaly'

where CheXper_valid.txt contains one JPG path per line.
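For illustration, such a file might contain lines like the following (hypothetical paths following the CheXpert directory layout):

CheXpert-v1.0/valid/patient64541/study1/view1_frontal.jpg
CheXpert-v1.0/valid/patient64542/study1/view1_frontal.jpg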

About PCAM pooling

  • PCAM overview: PCAM pooling turns the per-class logit map into a spatial attention map that both pools the features and directly yields the heatmap (the overview figure is not reproduced here); see the sketch below.
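In brief, PCAM pooling converts the per-location logits into sigmoid probabilities, normalizes them over the spatial dimensions into an attention map, and uses that map to take a probability-weighted average of the feature map. A minimal sketch, adapted from the PcamPool module quoted in the issues below:

import torch
import torch.nn as nn

class PcamPool(nn.Module):
    """Probabilistic-CAM pooling: probability-weighted spatial average."""

    def forward(self, feat_map, logit_map):
        # feat_map: (N, C, H, W) features; logit_map: (N, C, H, W) per-class logits
        prob_map = torch.sigmoid(logit_map)
        # normalize probabilities over the spatial dims to form an attention map
        weight_map = prob_map / prob_map.sum(dim=(2, 3), keepdim=True)
        # probability-weighted average of the features, kept as (N, C, 1, 1)
        return (feat_map * weight_map).sum(dim=(2, 3), keepdim=True)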

  • If you think PCAM is a good way to generate heatmaps, you can cite our article like this:

Citation

@misc{ye2020weakly,
    title={Weakly Supervised Lesion Localization With Probabilistic-CAM Pooling},
    author={Wenwu Ye and Jin Yao and Hui Xue and Yi Li},
    year={2020},
    eprint={2005.14480},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Contact

  • If you have any questions, please post them on GitHub issues or email [email protected]


chexpert's People

Contributors

buaatao, deadpoppy, dependabot[bot], yil8, yww211


chexpert's Issues

Lower AUC score of pre-trained weights compared to the README

Hi, thanks for sharing your code.

I tried testing on dev.csv (the validation set with about 200 cases) with the pretrained model in config/pre_train.pth and got the AUC scores below:
Cardiomegaly : 0.75921332
Edema: 0.91922399
Consolidation: 0.92703151
Atelectasis: 0.89496753
Pleural_Effusion: 0.91831263

mean AUC: 0.88375

and the mean AUC is lower than the result reported in your README, where the mean AUC was 0.91336.
Can you let me know why the pretrained model has a lower AUC score?
I ran the code on the validation set of the original CheXpert dataset.
Maybe I ran the code with a different dataset?

Thank you for reading!

Definition of PcamPool in global_pool.py

Hi,

Thanks for the neat repo. Quick question: in global_pool.py, the class PcamPool is defined as:

class PcamPool(nn.Module):

    def __init__(self):
        super(ProbPool, self).__init__()

    def forward(self, feat_map, logit_map):
        assert logit_map is not None

        prob_map = torch.sigmoid(logit_map)
        weight_map = prob_map / prob_map.sum(dim=2, keepdim=True)\
            .sum(dim=3, keepdim=True)
        feat = (feat_map * weight_map).sum(dim=2, keepdim=True)\
            .sum(dim=3, keepdim=True)

        return feat

Should it be super(PcamPool, self).__init__()?

RuntimeError: stack expects a non-empty TensorList

I get a runtime error when I execute heatmap.py: at logit_maps = torch.stack(logit_maps), the model has returned an empty tensor list for logit_maps. I am not sure where the problem is coming from; any assistance would be highly appreciated. Take a look:
W NNPACK.cpp:51] Could not initialize NNPACK! Reason: Unsupported hardware.
Traceback (most recent call last):
  File "/root/CCF5/Chexpert/bin/heatmap.py", line 114, in <module>
    main()
  File "/root/CCF5/Chexpert/bin/heatmap.py", line 110, in main
    run(args)
  File "/root/CCF5/Chexpert/bin/heatmap.py", line 96, in run
    prefix, figure_data = heatmaper.gen_heatmap(jpg_file)
  File "/root/CCF5/Chexpert/bin/../utils/heatmaper.py", line 129, in gen_heatmap
    logit_maps = torch.stack(logit_maps)
RuntimeError: stack expects a non-empty TensorList

Function snippet from heatmaper.py where the error is raised:

def gen_heatmap(self, image_file):
    """
    Args:
        image_file: str to a jpg file path
    Returns:
        prefix_name: str of a prefix_name of a jpg with/without prob
        figure_data: numpy array of a color image
    """
    image_tensor, image_color = self.image_reader(image_file)
    image_tensor = image_tensor.to(self.device)
    # model inference
    logits, logit_maps = self.model(image_tensor)
    logits = torch.stack(logits)
    logit_maps = torch.stack(logit_maps)

On the uncertain label problem

Dear Author,
Hello, first of all, thank you for sharing this great project! During my study I ran into some problems I would like to ask you about. First: the training and validation sets in the CheXpert dataset are labeled with all 14 categories, so I want to extend the five categories in your code to 14. I noticed that the official evaluation only covers five categories, and that several policies for handling uncertain labels are given. Why did you use U-ones and U-zeros in your code? Second: do you have any suggestions for choosing the uncertainty policy for the other nine diseases?
I look forward to your reply! Wish you a happy life!
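(For context: the U-ones / U-zeros policies simply map the uncertain label, stored as -1 in the CheXpert CSVs, to 1 or 0 per class. A minimal pandas sketch; the class split shown is illustrative, not necessarily what this repo uses:)

import pandas as pd

df = pd.read_csv("CheXpert-v1.0/train.csv")

u_ones = ["Atelectasis", "Edema"]                                # uncertain -> positive
u_zeros = ["Cardiomegaly", "Consolidation", "Pleural Effusion"]  # uncertain -> negative

df[u_ones] = df[u_ones].replace(-1, 1)
df[u_zeros] = df[u_zeros].replace(-1, 0)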

Questions about normalization and AUC score

Hello. I really appreciate you sharing this great project!
I have some questions.

You set pixel_mean and pixel_std like this:

"pixel_mean": 128.0,
"pixel_std": 64.0,

I have seen many examples that compute the mean and std from the data and normalize it in the following way.

Chexpert/data/utils.py, lines 44 to 68 at commit 96d4906:

def transform(image, cfg):
    assert image.ndim == 2, "image must be gray image"
    if cfg.use_equalizeHist:
        image = cv2.equalizeHist(image)
    if cfg.gaussian_blur > 0:
        image = cv2.GaussianBlur(
            image,
            (cfg.gaussian_blur, cfg.gaussian_blur), 0)
    image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)
    image = fix_ratio(image, cfg)
    # augmentation for train or co_train
    # normalization
    image = image.astype(np.float32) - cfg.pixel_mean
    # vgg and resnet do not use pixel_std, densenet and inception use.
    if cfg.pixel_std:
        image /= cfg.pixel_std
    # normal image tensor : H x W x C
    # torch image tensor : C X H X W
    image = image.transpose((2, 0, 1))
    return image

Is there any problem with fixing them to specific values rather than using the mean and std computed from the data? And on what basis were these particular values chosen? I'm trying to test on 16-bit images.
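(For comparison, a minimal sketch of computing a dataset-level mean and std from grayscale images; this is a hypothetical helper, not part of this repo:)

import cv2
import numpy as np

def dataset_mean_std(paths):
    # accumulate sums rather than loading every image at once
    total, total_sq, n_pix = 0.0, 0.0, 0
    for p in paths:
        img = cv2.imread(p, cv2.IMREAD_GRAYSCALE).astype(np.float64)
        total += img.sum()
        total_sq += (img ** 2).sum()
        n_pix += img.size
    mean = total / n_pix
    std = np.sqrt(total_sq / n_pix - mean ** 2)
    return mean, std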

And I have another question, about the AUC score.
The AUC score of your model on the CheXpert leaderboard is 0.929.
When I tested the model you provided on the validation data, I got an AUC score of 0.89.
I think you used an ensemble method. Should I implement additional code to achieve performance similar to yours?

Thank you!

Can you add a license file?

Great work! Thanks for open-sourcing the code! Can you add a LICENSE file (see the GitHub docs)? Having an explicit open-source license makes it much easier for us (and others) to build on top of your code. Otherwise the code is copyright-protected by default. Thanks a lot!

What's the pretrained model?

Hi, I saw you provided config/pre_train.pth.
However, I do not know which model architecture I should use to load these weights. Is it a DenseNet, or a customized model that you built?
Thank you.

About the AUC

I have followed the steps in the README.md (which contains several mistakes), but my AUC is not the same as yours. I tested pre_train.pth and got the same AUC as you report. Could you give me some advice on how to improve my AUC? My result is below:

Save best is step : 2600 AUC : 0.8863004293372964

Cardiomegaly auc 0.801560379918589
Edema auc 0.9314345991561181
Consolidation auc 0.9283854166666666
Atelectasis auc 0.8830933333333334
Pleural_Effusion auc 0.9126838235294118

Discrepancy in AUC Scores Using Provided Pre-trained Model

I am currently working with the pre-trained model located at "config/pre_train.pth" as indicated in the provided data. My objective is to evaluate the AUC scores for the five given findings using the dev.csv dataset. However, the AUC scores that I have obtained appear to be significantly lower than the values mentioned in the readme.md file.

According to the readme.md file, the AUC scores obtained from testing the pre-trained model on a dataset of 200 patients are higher. However, after using the provided pre-trained model and evaluating it on the same dataset, I obtained the following AUC scores:

Finding    Cardiomegaly  Edema   Consolidation  Atelectasis  Pleural_Effusion
Mentioned  0.8703        0.9436  0.9334         0.9029       0.9166
Obtained   0.7727        0.8711  0.8582         0.8434       0.9113

I am seeking clarification on this discrepancy. Could you please help me understand why there is a difference between the AUC scores I have obtained and the values mentioned in the readme.md file? Is there something specific I need to consider or any steps that I might be missing? Your guidance would be greatly appreciated in resolving this issue.

About image transforms

Hi, thanks for open-sourcing this nice work!!

I found that in the config example.json, use_transforms_type is set to 'Aug'. I wonder what kind of augmentation you used? Based on imgaug.py (the Aug function), it looks like you only apply RandomAffine. Did you try things like RandomCrop?

Also, how did you determine pixel_mean and pixel_std? Does the image transformation matter a lot for the final performance? I found many people use the following normalization for the original RGB images:

transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

Meanwhile, you convert the images to grayscale. I guess that's why you chose not to use such normalization. Is my understanding correct?

Any suggestion would be appreciated :-) Thanks.

Heatmaps with avg_max pooling

Hi, can you tell me how to get heatmaps when the pooling is avg_max?
Is it correct to take the last convolution layer of the backbone and use its gradients and activations to generate the heatmap?

Reproducible Result Problem

Hi,
Thanks for the great work.

I trained on the CheXpert dataset with this model, using the pretrained weights in the repo. I got the following results:
best1.ckpt -> 0.8554021060651713
best2.ckpt -> 0.8531909236770152
best3.ckpt -> 0.8546921678446425

Then I wanted to train on the same dataset with the same config and pretrained weights. This time, the results were:
best1.ckpt -> 0.8405812891919311
best2.ckpt -> 0.8603955345253775
best3.ckpt -> 0.8383620949529762

Although seeds are set in train.py for torch, and I set the seed for numpy random as well, the results change every time I train the model. What could be the problem? How can I solve it? Thanks in advance.

torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
np.random.seed(0)
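(Note: these seeds alone do not make cuDNN deterministic. A minimal sketch of the additional PyTorch settings usually needed for reproducible GPU training, to be placed at the top of train.py:)

import random
import numpy as np
import torch

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True  # force deterministic conv algorithms
torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning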

Training the network gives high AUC but low ACC

Thanks for open-sourcing this solution. I am using it as the backbone to evaluate a Federated Learning approach to medical imaging for my final year individual undergraduate project.

I trained the model (not using the pre-saved weights) on a 20% sample of the data, which follows the same distribution as the overall training set, and although the AUC scores are quite good, the ACC scores are quite low for some of the observations. I was wondering if you had some insight on what might be causing this. What Acc scores were you getting for your best model?

Atelectasis - AUCROC: 0.871, Acc: 0.375
Cardiomegaly - AUCROC: 0.834, Acc: 0.670
Consolidation - AUCROC: 0.908, Acc: 0.860
Edema - AUCROC: 0.888, Acc: 0.790
Pleural Effusion - AUCROC: 0.901, Acc: 0.335

Also, I am using the CheXpert downsampled dataset (11GB). The config file I am using is similar to the provided one except I have had to reduce the batch size to 8 due to GPU memory constraints.

Thanks!

Use pre-trained model on customised dataset

Hi,

Thanks for sharing this. I'm wondering whether I can use the pre-trained model on a customised dataset, and what kind of results I should expect. I have some chest CT scans, but they are not labelled, and I'm neither a medical expert nor a medical student (i.e., radiographs mean nothing to me). If I feed those data in, will it give me probabilities for the 14 observations? Or do I need to adjust the model?

Training the model on all 14 classes gives lower AUC scores

Thanks for open-sourcing your code, it's a huge help. I trained your model on 128x128 images and got the following AUC scores:
Cardiomegaly : 0.8282
Edema: 0.8878
Consolidation: 0.9250
Atelectasis: 0.8216
Pleural_Effusion: 0.9227

but when I trained the model with all 14 classes, the AUC scores of the same findings dropped to:
Cardiomegaly : 0.7698
Edema: 0.7843
Consolidation: 0.8443
Atelectasis: 0.7332
Pleural_Effusion: 0.7595

Is there a way to keep the AUC scores high while training on all 14 classes?

Render attention maps

Greetings! Thanks for the nice work. Is it possible to render the attention map as a heatmap on the picture?

TensorFlow trained model

Hi,

Do you also have a pre-trained model that is compatible with TensorFlow? I have been converting the PyTorch model here into ONNX and then into TensorFlow, and I am getting a lot of extra nodes as part of the conversion. It would be great if you could provide a TensorFlow version.

About “label_header” from dataset.py

self._label_header = [
    header[3],
    header[6],
    header[7],
    header[9],
    header[11]]

Are the column indices into train.csv wrong here? Should it be changed to:

self._label_header = [
    header[7],
    header[10],
    header[11],
    header[13],
    header[15]]?

Example Configuration

Hi! We tried the pretrained model with the given example.json configuration file, but we could not reproduce the results given in the README. Could you share the configuration file you used, so we can get the same results? Thanks!

Here are the results we get:

Cardiomegaly auc 0.7554274084124831
Edema auc 0.8814044605183846
Consolidation auc 0.8677455357142857
Atelectasis auc 0.84928
Pleural Effusion auc 0.9034926470588235

Are you sincere?

 ori_image: (H, W) numpy array of gray image
 logit_map: (H, W) numpy array of model prediction
 prob_map: (H, W) numpy array of model prediction with prob

Is the size of the logit maps really (H, W)? When I train another network with the global pooling layer and the final linear layer removed, I get outputs of size (N, C, 20, 20) when I print the shape of self.model.features. Why is that, or is the docstring correct?

Training the net again gives outputs ranging between 0.4 and 0.5 only

When I train the model again, it gives probability values for Consolidation between 0.4 and 0.5, and for the other labels the spread between the max and min probability is likewise only about 0.1.
The AUC for Consolidation comes out to 0.945.
Can you comment on this, please?

Thank you

About changing dataset

Dear author, hello!
First of all, thank you for sharing your great project. I saw in your paper that you experimented with the ChestX-ray14 dataset, but the dataset used in this project is CheXpert. I would like to re-run the experiment with the ChestX-ray14 dataset, but the two datasets are different. Which parts of the code need to be changed, and how should I modify them? Finally, I wish you a happy life! Looking forward to your advice, thank you!
