
mlmed / torchxrayvision

842 stars · 18 watchers · 206 forks · 44.88 MB

TorchXRayVision: A library of chest X-ray datasets and models. Classifiers, segmentation, and autoencoders.

Home Page: https://mlmed.org/torchxrayvision

License: Apache License 2.0

Languages: Python 3.29% · Jupyter Notebook 96.69% · Shell 0.01%
medical-imaging medical-ai deep-learning machine-learning transfer-learning dataset image-classification medical-image-processing medical-application medical

torchxrayvision's People

Contributors

a-parida12, eczy, ieee8023, josephdviviano, kilj4eden, kingjr, matteoguarrera, mrtj, parsatorb, rupertbrooks



torchxrayvision's Issues

pixel normalization and NN weight inits

Hi,

Thanks for sharing these helpful dataset objects. I tried training another (shallower) architecture on these data with default weight inits, and I started getting infs and nans after the first training iteration. When I normalized the pixels to mean=0, std=1, it worked fine.
Did you encounter this problem? How did you initialize your weights to prevent things from blowing up?
Dan
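
A minimal sketch of the per-image standardization described above (plain NumPy, nothing library-specific):

    import numpy as np

    def standardize(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
        """Rescale pixels to zero mean and unit variance."""
        return (img - img.mean()) / (img.std() + eps)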

Same Output for each image

I've tried to use the pretrained models on images from MIMIC-CXR. For each model I get the same probabilities for every image, even if the images are very different from each other.

For example:
image
image

Both images have been center cropped and resized to 224x224 resolution. I used the following code to get outputs from the model:

model = xrv.models.DenseNet(weights="densenet121-res224-all")

with torch.no_grad(): 
    out0 = model(inputs['image'][0].unsqueeze(0))
    out1 = model(inputs['image'][1].unsqueeze(0))

Both out0 and out1 equal
tensor([[0.5995, 0.5762, 0.5281, 0.5486, 0.5477, 0.5251, 0.5390, 0.6766, 0.5525, 0.5918, 0.6122, 0.5254, 0.5949, 0.3159, 0.5500, 0.5202, 0.6688, 0.6592]])

The same problem occurs with the other pretrained models, such as the MIMIC-CXR one: the probabilities differ per model, but are still identical for every image.
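
For reference, the preprocessing used in the repo's own examples (see the pneumothorax issue further down this page) includes an xrv.datasets.normalize step that is not shown in the snippet above; a hedged guess is that skipping it can collapse the outputs. A minimal sketch of the full pipeline:

    import torch
    import torchvision
    import torchxrayvision as xrv

    # img_data: a single-channel image array with 8-bit pixel values (assumption).
    img = xrv.datasets.normalize(img_data, 255)  # rescale to the range the models expect
    img = img[None, :, :]                        # add a channel dimension

    transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(),
                                                xrv.datasets.XRayResizer(224)])
    img = transform(img)

    model = xrv.models.DenseNet(weights="densenet121-res224-all")
    with torch.no_grad():
        out = model(torch.from_numpy(img).unsqueeze(0).float())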

Split covid19 dataset group by patient ID

The COVID19_Dataset(Dataset) provides the image and the binary decision is_covid, but lacks the patient ID.

In the process of creating the dataset, the information about the patient ID is dropped:
https://github.com/mlmed/torchxrayvision/blob/master/torchxrayvision/datasets.py#L859

Why does it matter?
I would like to avoid having images from the same patient in both the train and valid (or test) sets.

How to solve it?
Could you please provide one of the following:

  • the patient ID on the output,
  • or a split method which respects the patient ID (see the sketch below)?

Thanks!
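
A minimal sketch of such a patient-respecting split using scikit-learn's GroupShuffleSplit; patient_ids is a hypothetical per-sample array that the dataset would need to expose for this to work:

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit
    from torch.utils.data import Subset

    # patient_ids: hypothetical array with one patient ID per image.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
    train_idx, valid_idx = next(splitter.split(np.arange(len(patient_ids)),
                                               groups=patient_ids))

    train_set = Subset(dataset, train_idx)  # no patient appears in both subsets
    valid_set = Subset(dataset, valid_idx)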

VinBigData Dataset

Hello! Do you plan to add the Kaggle 'VinBigData Chest X-ray' dataset in the future, or is it too label-noisy?

Dataset Creator

Thanks for the wonderful repo. I had some ideas to add flexibility to the dataset loaders for further experiments.

  1. For the CheXpert dataset, the pure_label option is not available. My current understanding is that pure labels for the other datasets filter to scans with only one finding; currently the CheXpert loader only returns multi-labelled examples.
  2. Question/doubt: how are the NaNs handled for CheXpert? Does the loader convert NaNs to 0?
  3. The pathologies for all the datasets are hard-coded. Maybe we could provide users with an option to choose pathologies from each dataset, like including a "No Finding" label if they want.

Do share your thoughts on the same. Let me know if you would like me to raise a PR for some of these.

Test csv on CheX_Dataset

patientid = self.csv.Path.str.split("train/", expand=True)[1]

It seems that CheX_Dataset was designed only for loading train.csv.
I've tried to use it with valid.csv as well, but I got an error, since it splits the path using the constant string "train/".

The following lines should fix the problem!

    # patientid
    if 'train' in csvpath:
        patientid = self.csv.Path.str.split("train/", expand=True)[1]
    elif 'valid' in csvpath:
        patientid = self.csv.Path.str.split("valid/", expand=True)[1]
    else:
        raise NotImplementedError

Thank you so much!

Normalization for the probability thresholds

outputs_new[mask_gt] = 1.0 - ((1.0 - outputs[mask_gt])/((1-op_threshs[mask_gt])*2))

Shouldn't this be computed as: outputs_new[mask_gt] = 0.5 + 0.5 * (outputs[mask_gt] - op_threshs[mask_gt]) / (1 - op_threshs[mask_gt])

instead of outputs_new[mask_gt] = 1.0 - ((1.0 - outputs[mask_gt]) / ((1 - op_threshs[mask_gt]) * 2))

In your computation, if outputs[mask_gt] == op_threshs[mask_gt], then the computed new value outputs_new[mask_gt] < 0.

More in-depth documentation

Hey, I'm new to the library and it looks fantastic. However, after some time fiddling around and reading the documentation, it's still a bit confusing to me. I'm trying to scan an image and get predictions from the model that was trained on all diseases, but I couldn't get it to work. I understand the project is still under construction, but more examples and documentation would help a lot!

The train/val/test split of datasets

Firstly, thanks a lot for your sharing code!

I want to compare performance in the traditional setting (train on NIH, test on NIH) and the leave-one-out setting (train on PC+MIMIC-CXR+CheXpert, test on NIH). But in torchxrayvision we only get each dataset as a whole. The comparison between the leave-one-out setting and other papers' traditional setting is unfair because the test data are different (my understanding is that the traditional setting uses a small part of the dataset for testing, while the leave-one-out setting uses the whole dataset); this problem has confused me for days.

I see that in Figure 2 of your paper 'On the limits of cross-domain generalization in automated X-ray prediction' (MIDL 2020), your team compares 'training domain only', 'all other domains' and 'all domains'. Could you please share how you did this comparison, and what your dataset-splitting strategy was?

Thank you!

Very different results between process_image.py and the Chester X-ray web app with the same image

process_image.py result:
{'preds': {'Atelectasis': 0.5126234,
'Cardiomegaly': 0.06345778,
'Consolidation': 0.5328131,
'Edema': 0.2958934,
'Effusion': 0.53310466,
'Emphysema': 0.50264,
'Enlarged Cardiomediastinum': 0.5229955,
'Fibrosis': 0.5155035,
'Fracture': 0.51720035,
'Hernia': 0.0015778106,
'Infiltration': 0.5470137,
'Lung Lesion': 0.16751169,
'Lung Opacity': 0.8801794,
'Mass': 0.65471303,
'Nodule': 0.5540755,
'Pleural_Thickening': 0.50598806,
'Pneumonia': 0.033414263, <------------------------------------------------------------------------------
'Pneumothorax': 0.50849223}}

chester-xray web result:
Atelectasis,pred:0.16236889362335205,OP_POINT:0.07422872->normalized:0.5476036444030495
Consolidation,pred:0.14904548227787018,OP_POINT:0.038290843->normalized:0.5575821902452106
Infiltration,pred:0.18629415333271027,OP_POINT:0.09814756->normalized:0.5488697426669435
Pneumothorax,pred:0.03416414186358452,OP_POINT:0.0098118475->normalized:0.5122968015230744
Edema,pred:0.1063307672739029,OP_POINT:0.023601074->normalized:0.5423646990338368
Emphysema,pred:0.01815347746014595,OP_POINT:0.0022490358->normalized:0.507970145973699
Fibrosis,pred:0.02496715821325779,OP_POINT:0.010060724->normalized:0.5075289639347826
Effusion,pred:0.1678045243024826,OP_POINT:0.103246614->normalized:0.5359953535221347
Pneumonia,pred:0.2875974476337433,OP_POINT:0.056810737->normalized:0.8090469355374046 <------------------------
Pleural Thickening,pred:0.039720576256513596,OP_POINT:0.026791653->normalized:0.5066424231236653
Cardiomegaly,pred:0.01207550149410963,OP_POINT:0.050318155->normalized:0.11999149704624136
Nodule,pred:0.13887496292591095,OP_POINT:0.023985857->normalized:0.558856271064256
Mass,pred:0.24335533380508423,OP_POINT:0.01939503->normalized:0.7984534567199927
Hernia,pred:0.00022874104615766555,OP_POINT:0.042889766->normalized:0.0026666156928632527
Lung Lesion,pred:0.020575225353240967,OP_POINT:0.053369623->normalized:0.19276157668605762
Fracture,pred:0.06660972535610199,OP_POINT:0.035975814->normalized:0.5158885595408194
Lung Opacity,pred:0.7050759196281433,OP_POINT:0.20204692->normalized:1
Enlarged Cardiomedia.,pred:0.08681041747331619,OP_POINT:0.05015312->normalized:0.5192964246370511

The result for Pneumonia is very different. Is this normal behavior?

And what is the difference between the web app and process_image.py?

Model not classifying obvious pneumothorax

Hi, I am trying to run a quick test using this model. I have 2 different chest XR images that show an entire lung collapsed from pneumothorax, and I'd like to verify that the model correctly picks them up. However, it isn't. My starting image is "img_data", a numpy.ndarray of size (2800, 3408) that looks like this:

image

below is the code I'm running (indentation restored):

    import torchxrayvision as xrv
    import skimage
    import torchvision
    import torch

    model = xrv.models.ResNet(weights="resnet50-res512-all")

    img = xrv.datasets.normalize(img_data, 255)

    if len(img.shape) > 2:
        img = img[:, :, 0]
    if len(img.shape) < 2:
        print("error, dimension lower than 2 for image")

    img = img[None, :, :]

    transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(),
                                                xrv.datasets.XRayResizer(512)])

    img = transform(img)

    output = {}
    with torch.no_grad():
        img = torch.from_numpy(img).unsqueeze(0)
        preds = model(img).cpu()
        output["preds"] = dict(zip(xrv.datasets.default_pathologies,
                                   preds[0].detach().numpy()))

Running all this code, I get the following "output" variable:

{'preds': {'Atelectasis': 0.031930413,
'Consolidation': 0.0079838885,
'Infiltration': 0.022067936,
'Pneumothorax': 0.012027948,
'Edema': 3.992413e-06,
'Emphysema': 0.008683062,
'Fibrosis': 0.0037461556,
'Effusion': 0.012206978,
'Pneumonia': 0.005400587,
'Pleural_Thickening': 0.043657843,
'Cardiomegaly': 0.0010988085,
'Nodule': 0.011990261,
'Mass': 0.20278542,
'Hernia': 1.3901392e-05,
'Lung Lesion': 0.5,
'Fracture': 0.033246215,
'Lung Opacity': 0.04536338,
'Enlarged Cardiomediastinum': 0.5}}

Here we can see that Pneumothorax has a score of 0.012; it should be much higher given the obvious pneumothorax. The other test image does the same thing: it shows an obvious pneumothorax but scores about 0.01 through this pipeline. What am I doing wrong here? Thanks much!

NLMTB_Dataset cannot be run without images

Traceback (most recent call last):
  File "/home/mila/v/vivianoj/code/torchxrayvision/scripts/train_model.py", line 131, in <module>
    dataset = xrv.datasets.NLMTB_Dataset(
  File "/home/mila/v/vivianoj/code/torchxrayvision/scripts/../torchxrayvision/datasets.py", line 1356, in __init__
    for fname in sorted(os.listdir(os.path.join(self.imgpath, "CXR_png"))):
FileNotFoundError: [Errno 2] No such file or directory: './CXR_png'

num_classes no longer changing the classifier

Today I noticed that all predictions by the "All" model were "16" despite the dataset only giving labels between 0 and 3. At first I thought it was a mistake on my end, but I knew I hadn't changed anything today. I checked the code anyway and found that setting num_classes to 4 still gives a model.classifier of Linear(in_features=1024, out_features=18, bias=True).

Using the normal DenseNet121 from torchvision models works fine, so it is not an issue with my datamodule or data. The very strange part is that this changed all of a sudden while optuna was searching through hyperparameters: the first 3 or 4 trials were fine, and then I suddenly saw label "4" and even "5" on subsequent runs.
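
As a possible workaround (a minimal sketch in plain PyTorch, not necessarily the library's intended API), the classifier head can be swapped manually after loading:

    import torch
    import torchxrayvision as xrv

    model = xrv.models.DenseNet(weights="densenet121-res224-all")

    # Swap the 18-way head for a 4-way one; in_features matches the
    # Linear(in_features=1024, ...) head reported above.
    model.classifier = torch.nn.Linear(model.classifier.in_features, 4)

    # Assumption: the pretrained models carry per-class operating thresholds
    # (op_threshs) that no longer line up with a new head, so clear them.
    model.op_threshs = None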

data loading

Hi,

I use your kaggle dataset object (including all data points) and define a data loader to train my model on an AWS EC2 instance with a GPU.
I am experiencing volatile GPU utilization that flickers between 0% and 90%, but sits mostly at 0%.
I tried to make my dataloader as efficient as possible:

    train_loader = torch.utils.data.DataLoader(train_data, 
        batch_size=batch_size, num_workers=8, pin_memory=True, shuffle=True)

But I'm also double-checking other places in my code that might slow things down.

I'm curious what your experience of training on these datasets was. Did you encounter anything similar?

Example notebook doesn't work and has poor results when fixed

Hi,

In using Torch XRV we started with the example notebook. However, it has multiple issues such that it does not run to begin with. We were able to fix it, but the results of the pretrained weights seem poor on the NIH dataset. Additionally, we think there is a bug in the code that will erroneously apply sigmoid twice if apply_sigmoid=True link.

I've opened a PR with our changes that fix the notebook here but we would appreciate input on the model performance.

EDIT: I cleared the notebook in the PR so there wouldn't be a lot of output, but essentially what we are seeing is very low precision on NIH res224. Here are the metrics we got from running the above notebook:

image

Thanks!

How to get binary scores?

The process_image.py script provides scores from the classifier models in the interval [0, 1]. In order to make a binary decision, these scores have to be compared to a threshold (normally 0.5). Are these the thresholds that are made available in the models.py script? What's the exact procedure to convert the model output scores to binary 0/1 decisions?
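
A hedged sketch of both options, assuming preds is one row of model outputs in [0, 1] and that the thresholds referenced above are exposed as model.op_threshs (attribute name inferred from models.py, worth verifying):

    import torch
    import torchxrayvision as xrv

    model = xrv.models.DenseNet(weights="densenet121-res224-all")
    # preds: one row of model outputs, e.g. preds = model(img)[0]

    binary = (preds > 0.5).int()  # flat 0.5 cut-off

    # Per-pathology operating points, if the model exposes them; note that if
    # the outputs are already rescaled around the operating point, 0.5 is the
    # intended cut-off.
    if getattr(model, "op_threshs", None) is not None:
        binary = (preds > model.op_threshs).int()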

extract 14 classes for MIMIC-CXR

I want to use this codebase to predict images in MIMIC-CXR with 14 classes of diseases; how should I do that?
I find that all the models output an 18-dim vector, and many of the dims == 0.
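
One way to pull a named subset out of the 18-dim output is to key it by pathology name, as the repo's examples do with xrv.datasets.default_pathologies; here wanted is a hypothetical list of the 14 MIMIC-CXR label names:

    import torchxrayvision as xrv

    # out: the (1, 18) model output from a forward pass.
    preds = dict(zip(xrv.datasets.default_pathologies,
                     out[0].detach().numpy()))

    # wanted: hypothetical list of the 14 label names you care about.
    subset = {name: preds[name] for name in wanted if name in preds}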

Bad performance when making predictions with the CheXpert model

Hi, thanks for making your work available! I'm trying to do a fairness analysis, and as a first step I need to obtain the model's predictions. I'm focusing on the CheXpert dataset and the CheXpert model. I reproduced the same split (seed=0) as you do, and then made predictions for the test set using your CheXpert model. Computing AUC and other metrics on the test set results in quite mediocre performance, far worse than what is reported in the paper. So I was wondering if I'm missing something big.

Let me note here that I am using the 'small' version of CheXpert (same as you do) and that I am transforming the Test set data when I create the CheX_Dataset object in the following way:
image

Your feedback on what I might be doing wrong would be extremely helpful!

Refactor image loading code to handle more file types in a standardized way

Add a utility function that takes in a path (jpg, png, dcm, etc) and returns a standardized tensor. Most of this code is in the get function on the dataloaders and it should be factored out.

Possibly in a new namespace like xrv.utils.load_img

This will make it very easy to include dcm support and also make tools which load and process images.
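
A rough sketch of what such a helper could look like; xrv.utils.load_img is the proposal above, not an existing API, and the DICOM scaling here is a crude assumption:

    import os
    import numpy as np
    import skimage.io
    import torchxrayvision as xrv

    def load_img(path: str) -> np.ndarray:
        """Load a jpg/png/dcm file into a single-channel array in xrv's range."""
        if os.path.splitext(path)[1].lower() == ".dcm":
            import pydicom  # optional dependency, only needed for DICOM
            img = pydicom.dcmread(path).pixel_array.astype(np.float32)
            maxval = img.max() if img.max() > 0 else 1.0  # crude scale guess
        else:
            img = skimage.io.imread(path).astype(np.float32)
            maxval = 255.0
        if img.ndim > 2:  # drop color channels if present
            img = img[:, :, 0]
        img = xrv.datasets.normalize(img, maxval)
        return img[None, :, :]  # (1, H, W)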

Sharing models through Hugging Face Hub

Hi TorchXRayVision team!

This project is amazing! Several Hugging Face followers and members of the "ML for healthcare" community recommended that we contact you 🤗. I see you host and share models/datasets on your own server. Would you be interested in sharing your models on the Hugging Face Hub?

This integration would allow you to freely download/upload models and make your work more accessible and visible to the rest of the ML community. We can help you set up a TorchXRayVision organization (examples: Facebook AI and Stanford NLP).

Creating the repos and adding new models should be a relatively straightforward process. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.

Some of the benefits of sharing your models through the Hub would be:

  • Presence on the HF Hub might lower the barrier to entry to TorchXRayVision as well as increase its visibility.
    • Repos provide useful metadata about their tasks, languages, metrics, etc. that make them discoverable.
  • versioning, commit history, and diffs.
  • multiple features from TensorBoard visualizations, PapersWithCode integration, and more.

Additionally, we have a library to programmatically access repositories (both downloading pretrained models and pushing, with a lot of nice things such as filtering, caching, etc). If we want to try out this integration, I would suggest you add one or two models manually and then use the huggingface_hub library to implement downloading those models programmatically from torchxrayvision. You might want to check our documentation to read more about it.
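
For illustration, the download side with huggingface_hub could look like this; the repo id and filename below are hypothetical placeholders for whatever a TorchXRayVision organization would publish:

    import torch
    from huggingface_hub import hf_hub_download

    # Hypothetical repo id and filename.
    weights_path = hf_hub_download(repo_id="torchxrayvision/densenet121-res224-all",
                                   filename="pytorch_model.bin")
    state_dict = torch.load(weights_path, map_location="cpu")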


Happy to hear your thoughts,

Omar and the Hugging Face team (cc @osanseviero @abidlabs )

Which pixel data range?

Hi. Firstly, thank you for sharing.
I'm using your weights to finetune and train the network, with additional fully connected layers, on my own dataset. Which normalization did you use to train your network? I suppose the data are in the [0, 1] pixel range.
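
For context, a quick check of the library's normalize helper (worth verifying against datasets.py) suggests it maps a [0, maxval] image onto roughly [-1024, 1024] rather than [0, 1]:

    import numpy as np
    import torchxrayvision as xrv

    img8 = np.random.randint(0, 256, (224, 224)).astype(np.float32)
    img = xrv.datasets.normalize(img8, 255)
    print(img.min(), img.max())  # expected: approximately -1024 and 1024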

Training Script

Are you using the merged dataset for training? I was not able to find the training script for the provided pretrained models.

Incorrect URLs for trained model weights

Thank you for your work!

I noticed that the URLs for 'chex', 'kaggle', 'mimic_nb', and 'mimic_ch' are set to the exact same file. Could you update them to the correct paths?

model_urls = {
    'all': 'https://github.com/mlmed/torchxrayvision/releases/download/v1/nih-pc-chex-mimic_ch-google-openi-kaggle-densenet121-d121-tw-lr001-rot45-tr15-sc15-seed0-best.pt',
    'nih': 'https://github.com/mlmed/torchxrayvision/releases/download/v1/nih-densenet121-d121-tw-lr001-rot45-tr15-sc15-seed0-best.pt',
    'pc': 'https://github.com/mlmed/torchxrayvision/releases/download/v1/pc-densenet121-d121-tw-lr001-rot45-tr15-sc15-seed0-best.pt',
    'chex': 'https://github.com/mlmed/torchxrayvision/releases/download/v1/chex-densenet121-d121-tw-lr001-rot45-tr15-sc15-seed0-best.pt',
    'kaggle': 'https://github.com/mlmed/torchxrayvision/releases/download/v1/chex-densenet121-d121-tw-lr001-rot45-tr15-sc15-seed0-best.pt',
    'mimic_nb': 'https://github.com/mlmed/torchxrayvision/releases/download/v1/chex-densenet121-d121-tw-lr001-rot45-tr15-sc15-seed0-best.pt',
    'mimic_ch': 'https://github.com/mlmed/torchxrayvision/releases/download/v1/chex-densenet121-d121-tw-lr001-rot45-tr15-sc15-seed0-best.pt',
}

Thanks!

No module named 'torchxrayvision.baseline_models.jfhealthcare.model'

I did
pip install torchxrayvision
and in an ipython session, when I do:
import torchxrayvision as xrv
I get the following error:

In [1]: import torchxrayvision as xrv
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-04203dc2a6b0> in <module>
----> 1 import torchxrayvision as xrv

~/workspace/torchxrayvision/lib/python3.6/site-packages/torchxrayvision/__init__.py in <module>
      1 from . import datasets
      2 from . import models
----> 3 from . import baseline_models
      4 from . import autoencoders

~/workspace/torchxrayvision/lib/python3.6/site-packages/torchxrayvision/baseline_models/__init__.py in <module>
----> 1 from . import jfhealthcare
      2

~/workspace/torchxrayvision/lib/python3.6/site-packages/torchxrayvision/baseline_models/jfhealthcare/__init__.py in <module>
      5 import csv
      6 import numpy as np
----> 7 from .model import classifier
      8 import json
      9 import argparse

ModuleNotFoundError: No module named 'torchxrayvision.baseline_models.jfhealthcare.model'

Confusion about input image format

Hello author, thank you for sharing the code! I have a question: I noticed that the input format of the chest image has been changed from 3 channels to a single channel. What is the benefit of doing this? Does it have any effect on the results? Looking forward to your insights!

taking gradients through the 'all' densenet

Hi,

I am trying to plug your 'all' densenet (in eval mode with fixed weights) into my generative pipeline. However, I'm getting errors taking gradients. I saw that you updated the package with the 'op' util we discussed here a couple of weeks ago, so I updated the package myself as well. Now I'm getting a different error, which is:

Warning: Error detected in AddBackward0. Traceback of forward call that caused the error:
  ...(some prints)...
  File "/home/ubuntu/DR-VAE/drvae/model/vae.py", line 233, in lossfun
    return vae_loss + disc_loss
 (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:60)
Traceback (most recent call last):
File "/home/ubuntu/DR-VAE/drvae/model/train.py", line 297, in train_epoch_xraydata
    loss.backward()
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
**RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected type TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false) but got TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false) (validate_outputs at /pytorch/torch/csrc/autograd/engine.cpp:484)**
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7fbd188b5536 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2d84224 (0x7fbd57f1e224 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #2: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x548 (0x7fbd57f1fd58 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7fbd57f21ce2 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fbd57f1a359 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7fbd646594d8 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0xee0f (0x7fbd65246e0f in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x76ba (0x7fbd683996ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #8: clone + 0x6d (0x7fbd680cf41d in /lib/x86_64-linux-gnu/libc.so.6)

I'm not sure why it expects cpu there.

What is the difference between weights="densenet121-res224-mimic_ch" and weights="densenet121-res224-mimic_nb"?

I used the model with weights="densenet121-res224-mimic_ch" to extract features from MIMIC-CXR pictures, and I found that 7 dims of the feature vector always equal 0.5, which was strange.
I am new to medical image processing, and I found there are two types of pre-trained models for MIMIC-CXR. What is the difference between them?
By the way, I also used the model with weights="densenet121-res224-all", and it doesn't always output 0.5 in some dimensions like "densenet121-res224-mimic_ch" does. Is the model with weights="densenet121-res224-all" suitable for all medical image datasets?

Covid19 Dataset Memory Issue

Has anyone encountered this sort of error?

It implies a memory error, but my batch size is 4 and I set num_workers to 1. Oddly, it also happens right at the beginning of training.

I'm trying to train a DenseNet initialized using this package on the Covid-19 dataset.

Thanks in advance

Epoch 1:   0%|          | 0/16 [00:00<?, ?it/s]Begin training...
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/multiprocessing/connection.py", line 920, in wait
    ready = selector.select(timeout)
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 814) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/scratch/ssd001/home/ptorabi/dev/dlt/dlt/experiments/baseline1.py", line 112, in <module>
    fit_function_kwargs={}
  File "/scratch/ssd001/home/ptorabi/dev/dlt/dlt/commons/train.py", line 124, in fit
    for batch_index, batch in enumerate(dataloader):
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
    idx, data = self._get_data()
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 808, in _get_data
    success, data = self._try_get_data()
  File "/h/ptorabi/.anaconda3/envs/torch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 774, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 814) exited unexpectedly
Epoch 1:   0%|          | 0/16 [00:00<?, ?it/s]

xrv.models.DenseNet(weights="all").cuda() problem in new update

Hello,
Thanks for the great repo!

It seems you recently updated the code; this error happens when I try to load the pretrained net:

xrv.models.DenseNet(weights="all").cuda()

    /usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
        770             return modules[name]
        771         raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
    --> 772             type(self).__name__, name))
        773
        774     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

    ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'

Finetune pretrained models on different dataset

Hi all and thank you for developing this library.
In the readme it's said that the pretrained models can be used for feature reuse in few-shot learning.
Does that mean it's possible to take a pretrained model and finetune it for a couple of epochs on a different dataset?
I spent some time exploring the repo, but I couldn't find anything related to that.
I would be truly grateful if you could help me with this.
Thanks
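
In plain PyTorch terms, nothing special should be needed; here is a minimal finetuning sketch (generic PyTorch, not a documented torchxrayvision recipe; train_loader is your own DataLoader, and clearing op_threshs is an assumption to get raw logits out of the head):

    import torch
    import torchxrayvision as xrv

    model = xrv.models.DenseNet(weights="densenet121-res224-all")
    model.op_threshs = None  # assumption: disable output calibration

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.BCEWithLogitsLoss()

    model.train()
    for imgs, labels in train_loader:  # your own multi-label DataLoader
        optimizer.zero_grad()
        loss = criterion(model(imgs), labels)
        loss.backward()
        optimizer.step()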

The output of the kaggle densenet model

Hi,
I started playing with your cool package and wanted to make sure I follow. How do I work with the output of the final linear layer of the kaggle model?

If

    model = xrv.models.DenseNet(weights="kaggle")
    d_kaggle = xrv.datasets.Kaggle_Dataset(..)

and we push one image forward,

    sample = d_kaggle[92]
    out = model(torch.tensor(sample['PA']).unsqueeze(0))

then, given that the relevant labels in d_kaggle.pathologies appear at indices 8 and 16 of xrv.datasets.default_pathologies, with

    out_softmax = torch.nn.functional.softmax(out[0, [8, 16]], dim=0)

(or sigmoid, for that matter) I always get out_softmax = [~x, ~x] for every example that I've pushed forward, regardless of the label.

autoencoder input size

I understand that the native autoencoder AE101 is a resnet autoencoder, but I wanted to clarify what the expected input dimensionality is.

Is it 224x224 or 512x512? Also, like the resnet model, does it upscale the input automatically if one passes the wrong dimensionality?

PadChest dataset resized images are padded

Hello, I guess you're maintaining the torrent file of PadChest resized (224x224) images referenced from the datasets.py source code, so I decided to open the issue here.

The png files in this torrent are resized to 224x224 pixels by means of white padding. Effectively, the original dimensions of these images are lost, and there is no (easy) way to separate the padding from the original image and use a different cropping method (e.g. center cropping). May I suggest saving these files with:

  1. different resizing options (e.g. padding, center cropping), or
  2. the smaller dimension fixed to 224 and the other dimension kept at the original aspect ratio, so the user can choose a resize method of their preference?

Mismatch between CheX_Dataset and model with weights="densenet121-res224-chex"

I found that the summary of CheX_Dataset has 13 disease classes, but at the head of model.py the model with weights="densenet121-res224-chex" provides 11 disease classes; I think these do not match. The same applies to "mimic_ch" and "mimic_nb".
This means the model provides 7 useless dimensions for each sample. Are some disease classes missing?

Recreating Model Performance

We tried to recreate your pretrained model in: Link

Somehow we couldn't achieve the same performance. We used the open datasets (NIH, Padchest, MIMIC, RSNA, NIH_Google, Chexpert) from your paper "On the limits of cross-domain generalization in automated X-ray prediction" as training/validation/test sets, and we retrieved all the datasets in their original size.

As such, we have some questions that we would like to ask you:

  1. Regarding your augmentation settings in datasets.py: are they enough (cropping to 224x224 pixels, center crop, etc.) to recreate the same models as your pretrained ones? Should the dataset be in RGB format or grayscale?

  2. Did you use the same preprocessing and augmentation for every dataset?

  3. We tried your code in mlmed/covid-severity using the models we created. We tested it with your pretrained model: Link. Then we compared it with our own models (trained from scratch on NIH, Padchest, MIMIC, RSNA, NIH_Google, Chexpert) using the default hyper-parameters and default augmentation settings from your code. Somehow we could not achieve the same performance. Do you have any suggestions on why, or are we missing some steps here?

  4. Did you use any form of weight initialization or pretrained models?

How to use op_thresh during model testing

Hi,

I'm using the library to measure cross-domain generalization performance with the pre-trained models. For example, I'm using the CheXpert pre-trained model and testing it on another dataset to measure its performance. I would like to know how I can use the op_thresh attribute of the model to threshold the outputs for metric calculation (F1, sensitivity, etc.), or should I use 0.5 as the threshold?

I'm new to using the library and any suggestions are greatly appreciated. Thanks!
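
A hedged sketch of the metric side, treating the choice of threshold as an open question per the issue above:

    import numpy as np
    from sklearn.metrics import f1_score

    # y_true, y_score: arrays gathered over the test set for one pathology.
    # Whether to use 0.5 or a per-pathology operating point depends on whether
    # the model already rescales its outputs around that operating point.
    thresh = 0.5
    y_pred = (y_score > thresh).astype(int)
    print(f1_score(y_true, y_pred))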

Dataset class problem

Hi there,
thanks for your repo.
I ran dataset = xrv.datasets.StonyBrookCOVID_Dataset() in Colab and it shows the error: AttributeError: module 'torchxrayvision.datasets' has no attribute 'StonyBrookCOVID_Dataset'. Is there any way to fix this error?

Histogram of scores produced by chex model

Hello! We are using your chex model to evaluate the whole CheXpert-v1.0-small dataset. Here are the histograms of scores per disease:
image
We were wondering if you could help us understand the jumps that happen at 0.5 for every disease.
We read the relevant section in the paper, where you discuss the calibration of the model, but we don't fully understand how to connect it with the above question.
Thank you in advance!
