lindawangg / covid-net

COVID-Net Open Source Initiative

License: Other

Jupyter Notebook 56.92% Python 43.08%

coronavirus coronavirus-dataset coronavirus-detect chest-radiography covid-net covidx-dataset covid-19 sars-cov-2

covid-net's Issues

Number of Trainable Parameters for COVID-Net

Any idea on the number of Trainable Parameters for COVID-Net CXR small and large?

The paper says the number of parameters in COVID-Net is 116.6 million. However, the Keras network at https://github.com/busyyang/COVID-19, which replicates the architecture in the paper, has 364.6 million parameters, more than 3 times as many.

Duplicate filenames in train/test text files as per `create_COVIDx_v2.ipynb` and a way to resolve the issue

Hi. First of all, thank you so much for sharing the data and network. Although you have removed the duplicates from test_COVIDx.txt, there still appear to be some duplicate filenames in train_COVIDx.txt. Please consider adding the following function at the end of your create_COVIDx_v2.ipynb notebook. It drops the duplicate entries and sorts the images in the train/test data into their respective subfolders (normal, pneumonia and COVID-19) as follows.

1- train
           |_____ normal
                    |_____ 7,966 images
           |_____ pneumonia
                    |_____ 5442 images
           |_____ COVID-19
                    |_____ 92 images

2- test
           |_____ normal
                    |_____ 885 images
           |_____ pneumonia
                    |_____ 594 images
           |_____ COVID-19
                    |_____ 10 images

The function is

import os
import shutil

import pandas as pd
from tqdm.notebook import tqdm


def ArrangeData_LabelNamedFolders(file_path, folder_path, dest_folder_path, indicator):
    """Copy each listed image into a subfolder named after its label."""
    print('{} Operation'.format(indicator))
    # Split files are space-separated: patientid filename label
    df = pd.read_csv(file_path, sep=' ', names=['patientid', 'filename', 'label'])
    # Keep only the first entry for each filename
    df = df.drop_duplicates(subset='filename', keep='first')
    labelFolders = df.label.unique()
    print(labelFolders)
    for labelFolder in labelFolders:
        os.makedirs(os.path.join(dest_folder_path, labelFolder), exist_ok=True)
    # After de-duplication each filename maps to exactly one label
    labels = dict(zip(df['filename'], df['label']))
    for imageName in tqdm(sorted(os.listdir(folder_path))):
        if imageName not in labels:
            continue  # skip files that are not listed in the split file
        src = os.path.join(folder_path, imageName)
        dest = os.path.join(dest_folder_path, str(labels[imageName]), imageName)
        shutil.copy(src, dest)


train_file = 'train_split_v2.txt'
train_folder = './data/train'
dest_train_folder = './categorize data/train'

test_file = 'test_split_v2.txt'
test_folder = './data/test'
dest_test_folder = './categorize data/test'


ArrangeData_LabelNamedFolders(test_file, test_folder, dest_test_folder, indicator='Test')
ArrangeData_LabelNamedFolders(train_file, train_folder, dest_train_folder, indicator='Train')

Incorrect reference in white paper

There is an error in the white paper on the last reference point, where the Kaggle dataset link is pointing to Cohen's dataset on github.

Test size too small for COVID-19 cases

With only 10 samples for COVID-19 cases, isn't the test set size too small to have any degree of confidence?
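One way to put a number on this concern is a binomial confidence interval for sensitivity. The sketch below computes a 95% Wilson score interval; the helper name `wilson_interval` and the example outcomes (10 of 10 and 8 of 10 detected) are illustrative assumptions, not results from the repository.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical outcomes on a 10-image COVID-19 test set.
for hits in (10, 8):
    low, high = wilson_interval(hits, 10)
    print(f"{hits}/10 detected: sensitivity in [{low:.2f}, {high:.2f}]")
```

Under these assumptions, even a perfect 10 out of 10 leaves a lower bound of roughly 0.72, which illustrates how wide the uncertainty is at this sample size.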

Confusion Matrix and Results Should Be Updated Or Clarified

It looks like additional training and test examples were added but the Confusion Matrix and Results have not been updated to reflect this. I recommend either updating the results, or if the results are not available yet (possibly still training the new model?) a quick note added to make sure that there isn't confusion about the Confusion Matrix, which only shows 8 ground truth COVID-19 samples still. As there are two false positives in the Confusion Matrix, it's possible to assume that the results have been miscalculated with false negatives as false positives, which would reverse the precision and recall.

Out-of-Sample filtering and transformation

Hello, and thank you for sharing this great project.

I am testing the net on out-of-sample data (some known COVID and non-COVID images) and running into trouble. These are my questions to the community, if anybody can help:

Is it mandatory to filter the out-of-sample images to PA projections? I don't know whether this matters or whether the network is supposed to work fine with AP too.
Is it necessary to convert the image to RGB? The README says the net expects a 224x224x3 array, and DICOM images are grayscale. I'm trying a conversion with OpenCV, but I don't know whether this is the correct way to handle DICOM files:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)

DataLossError While Loading Model

Issue

DataLossError While Loading Model

Environment

Libraries
opencv-contrib-python 4.2.0.34
tensorflow-gpu 1.15.0
Docker Environment, Tensorflow-GPU Image
JupyterLab 2.0.1

Description

Hi, when I try to load the model for inference or evaluation in Jupyter, I always get this error.

DataLossError: Checksum does not match: stored 1497157360 vs. calculated on the restored bytes 2410561084
	 [[node save/RestoreV2 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

My full notebook is shown below

import numpy as np
import os, argparse
import cv2
import tensorflow as tf

weightspath = '/tf/notebooks/mrc_is_here/COVID-Net/models/covid-net-large'
metaname = "model.meta_eval"
ckptname = "model-2069"
imagepath = 'assets/ex-covid.jpeg'

mapping = {'normal': 0, 'pneumonia': 1, 'COVID-19': 2}
inv_mapping = {0: 'normal', 1: 'pneumonia', 2: 'COVID-19'}

sess = tf.Session()
tf.get_default_graph()
saver = tf.train.import_meta_graph(os.path.join(weightspath, metaname))
saver.restore(sess, os.path.join(weightspath, ckptname))

My folder structure is:

COVID-NET

inference.ipynb
models
- covid-net-large
  - checkpoint
  - model-2069.data-00000-of-00001
  - model-2069.index
  - mode.meta_eval
  - mode.meta_train
- covid-net-small
  - checkpoint
  - model-6207.data-00000-of-00001
  - model-6207.index
  - mode.meta_eval
  - mode.meta_train

Steps to Reproduce

Create a model folder as shown above
Copy all the code to a new jupyter notebook
Run step by step

Expected behavior

Restore model from meta and checkpoint files

Actual behavior

Throw exception DataLossError
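A checksum mismatch during restore can mean that the checkpoint data file was truncated or corrupted in transit, although that is only one possible cause and nothing in this report confirms it. A minimal sketch for comparing the local file against a fresh download follows; `sha256_of` is a hypothetical helper, and the path is taken from the folder structure listed in this report.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Path assumed from the folder structure in this report.
ckpt_data = "models/covid-net-large/model-2069.data-00000-of-00001"
if os.path.exists(ckpt_data):
    print(os.path.getsize(ckpt_data), sha256_of(ckpt_data))
else:
    print("not found:", ckpt_data)
```

If two downloads of the same file give different sizes or digests, re-downloading is a reasonable first step before debugging the TensorFlow side.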

Dataset distribution

There seems to be a discrepancy between the number of training and testing samples in the RSNA pneumonia dataset and the dataset distribution given on the GitHub page. This is probably because the RSNA dataset's CSV file contains multiple rows for the same patient ID, since it was designed as a detection task. Please verify.
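The multiple-rows hypothesis can be checked by comparing the number of label rows with the number of distinct patient IDs. The sketch below assumes the label CSV has a `patientId` column alongside bounding-box columns (an assumption about the file layout) and runs on a made-up inline sample, not on the real data.

```python
import csv
import io


def count_rows_and_patients(csv_file):
    """Count label rows and distinct patient IDs in a detection-style CSV."""
    rows = 0
    patients = set()
    for record in csv.DictReader(csv_file):
        rows += 1
        patients.add(record["patientId"])
    return rows, len(patients)


# Made-up sample: patient "b" has two bounding-box rows.
sample = io.StringIO(
    "patientId,x,y,width,height,Target\n"
    "a,,,,,0\n"
    "b,10,20,30,40,1\n"
    "b,50,60,70,80,1\n"
)
print(count_rows_and_patients(sample))
```

On the real file, pass an open file handle instead of the `io.StringIO` sample; a row count larger than the patient count would support the explanation above.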

Missing checkpoint file

Could you please include the checkpoint file, which is missing from your pretrained model (COVID-Netv1.zip)?

When saving with a train saver, the following files should have been generated on your end:

model.data-00000-of-00001
model.meta
model.index
checkpoint

These 4 files would be required when importing the meta graph and restoring the latest checkpoint.

Thanks!
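A quick way to see which of the four files listed above is absent from an unpacked model directory is sketched below. The helper name `missing_checkpoint_files` and the directory name `COVID-Netv1` are assumptions made for illustration.

```python
from pathlib import Path


def missing_checkpoint_files(model_dir, prefix="model"):
    """List expected TF1 checkpoint files that are absent from model_dir."""
    expected = [
        f"{prefix}.data-00000-of-00001",
        f"{prefix}.meta",
        f"{prefix}.index",
        "checkpoint",
    ]
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


# "COVID-Netv1" is an assumed name for the unzipped archive directory.
print(missing_checkpoint_files("COVID-Netv1"))
```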

Missing requirements.txt

Hi @lindawangg,

Can you please add requirements.txt so we can install the exact library versions you tested with?

pip freeze > requirements.txt and then remove the libraries you aren't explicitly using here.

Open Source Helps!

Thanks for your work to help the people in need! Your site has been added! I currently maintain the Open-Source-COVID-19 page, which collects all open source projects related to COVID-19, including maps, data, news, api, analysis, medical and supply information, etc. Please share to anyone who might need the information in the list, or will possibly contribute to some of those projects. You are also welcome to recommend more projects.

http://open-source-covid-19.weileizeng.com/

Cheers!

Issue related to Confidence score in Inference.py

A recent commit to inference.py is missing the "COVID-19" label in the confidence output; "Normal" is shown twice:

print('Confidence')
print('Normal: {:.3f}, Pneumonia: {:.3f}, Normal: {:.3f}'.format(pred[0][0], pred[0][1], pred[0][2]))
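One way to avoid this kind of mislabel is to build the line from the class mapping instead of typing each label by hand. The sketch below reuses the `inv_mapping` shown in the DataLossError report; `format_confidence` is a hypothetical helper and `pred` is a made-up softmax output, not a real prediction.

```python
# Class order taken from the inv_mapping in the DataLossError report;
# `pred` is a made-up softmax output, not a real model prediction.
inv_mapping = {0: 'normal', 1: 'pneumonia', 2: 'COVID-19'}
pred = [[0.10, 0.25, 0.65]]


def format_confidence(pred_row, names):
    """Build the confidence line with one label per class index."""
    return ', '.join(
        '{}: {:.3f}'.format(names[i], score) for i, score in enumerate(pred_row)
    )


print('Confidence')
print(format_confidence(pred[0], inv_mapping))
```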

unable to download dataset from kaggle

Missing RSNA images

RSNA has ~26k images, but COVIDx has ~13k samples.
Could you include some explanation on how the samples are chosen?

Covid-net model?

The paper mentions that the code for the COVID-Net architecture is available in the repo, but it is not present at the moment.

inference on cpu vs gpu

What's the recommended environment for inference? CPU or GPU?

I've tested this on Google Colab

python inference.py --weightspath ./ --metaname model.meta_eval --ckptname model-6207 --imagepath assets/ex-covid.jpeg

With CPU it takes ~ 4.9s

With GPU it takes ~ 6.5s
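A single run of inference.py includes session start-up, graph import and checkpoint restore, so the wall-clock time for one image may say little about per-image speed; device initialisation in particular can dominate. The sketch below times setup and repeated calls separately. `load_model` and `predict` are hypothetical stand-ins so the code runs without TensorFlow.

```python
import time


def time_phases(load_model, predict, image, repeats=10):
    """Time one-off setup separately from repeated per-image calls."""
    start = time.perf_counter()
    model = load_model()
    setup_s = time.perf_counter() - start

    predict(model, image)  # warm-up call, excluded from the average
    start = time.perf_counter()
    for _ in range(repeats):
        predict(model, image)
    per_image_s = (time.perf_counter() - start) / repeats
    return setup_s, per_image_s


# Stand-in callables so the sketch runs without TensorFlow.
setup_s, per_image_s = time_phases(lambda: object(), lambda m, x: sum(x), [1, 2, 3])
print(f"setup {setup_s:.6f}s, per image {per_image_s:.6f}s")
```

Comparing the per-image figure on CPU and GPU, rather than the total script time, gives a fairer picture when many images are processed in one session.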

How to obtain the COVID-Netv2 model?

Hello my friends,

I'm trying to use train_tf to train with a pre-trained model, but it doesn't have this COVIDNetv2 file or model-2069. How to get them?

Missing images from Figure1 dataset

Thanks for the amazing work!

I see that in Covidx2 you used only 3 images from Figure1 collections. Is there a reason for that, or was it just timing? Do you know if there are overlaps between Figure1 and ieee8023/covid-chestxray-dataset?

Can you provide pretrained model on imageNet?

Issue Template

Description

Please include a summary of the issue.
Please include the steps to reproduce.
List any additional libraries that are affected.

Steps to Reproduce

First step
Second step
Third step

Expected behavior

A description of what you expected to happen.

Actual behavior

A description of what happens instead.

Environment

Build: [e.g. 3180 - type "About" in the Command Palette]
Operating system and version: [e.g. macOS 10.14, Windows 10, Ubuntu 18.04]
[Linux] Desktop Environment and/or Window Manager: [e.g. Gnome, LXDE, i3]

running inference file is getting Killed.

I am trying to run inference on multiple input images (six of them) and get the error below.
It works fine with 4 input images.

2020-04-14 07:43:21.737334: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-14 07:43:21.748542: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-04-14 07:43:21.748968: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2a7c638260 executing computations on platform Host. Devices:
2020-04-14 07:43:21.749010: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
Killed
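A bare "Killed" with no Python traceback usually comes from the operating system terminating the process, commonly because it ran out of memory; the log does not state the cause, so this is an assumption. If memory is the limit, one workaround is to feed the images in smaller groups. In the sketch below, `run_model` is a hypothetical callable standing in for the actual inference step.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_batches(image_paths, run_model, batch_size=4):
    """Call run_model on small groups and concatenate the results."""
    results = []
    for group in chunks(image_paths, batch_size):
        results.extend(run_model(group))
    return results


# Hypothetical file names and a stand-in model that echoes its input.
paths = [f"img{i}.jpeg" for i in range(6)]
print(predict_in_batches(paths, run_model=lambda group: [p.upper() for p in group]))
```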

PEPX Network Design Pattern

Thanks for working on this project! This is very interesting and very impactful.

COVID-Net relies on a design pattern of projection-expansion-projection-extension (PEPX) throughout the network. I have beginner-level knowledge of computer vision, and I haven't seen this design pattern before.

Without loading in the model, what are the output dimensions of each layer in the PEPX module (Figure 2, top right box) for PEPX1.1? This would give me a better understanding of how dimension is changing within the module.
What is the intuition around the effectiveness of this design pattern? Are there some previous papers that use this design pattern for their core results?

Model detects COVID-19 or Data Source?

Given the datasource is different for the two classes: COVID and Pneumonia/Normal, how do you validate that the model doesn't classify the data source, but actually classifies the presence of COVID-19?

Name of model ckpts

I couldn't find the model checkpoint files inside the covid_net.large file. What should I do?

Confirmation on the data splits and benchmark results with {train,test}_COVIDx2.txt

Hi Linda,

Thanks for providing nice guidelines for the COVIDx dataset and COVID-Net. I recently compiled the dataset using the guideline provided in https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md. However, I noticed that the test class distribution is slightly different than the one presented in https://github.com/lindawangg/COVID-Net#results. I have used the https://github.com/lindawangg/COVID-Net/blob/master/create_COVIDx_v3.ipynb script, train_COVIDx2.txt and test_COVIDx2.txt files. For your reference, I observed the following data distribution:

COVID19 -> Train (223 images), Test (31 images)
Normal -> Train (7966 images), Test (885 images)
Pneumonia -> Train (5451 images), Test (594 images)

Kindly confirm whether the distribution is correct.

Furthermore, do you have any benchmark results with the above data distribution? The benchmark presented in https://github.com/lindawangg/COVID-Net#results uses fewer test samples. Which version of the data distribution do you recommend for comparison with COVID-Net? Kindly advise.

I look forward to your answers. Thank you.

Regards,
Saimun

Visual result control

Hello everyone,

I ran the inference.py script on some "healthy" X-ray images, but the result was "Covid-19". I would like to check what the network is really classifying.
I read in the accompanying paper that the GSInquire method was used to check the results visually. Is the method available? Does anyone know how it works?
Or is there an alternative way to visualise the results?

Thanks,

PDF says softmax 4 but create_COVIDx_v2.ipynb only creates 3 categories.

Apparently the model uses 4 softmax layers, but the latest dataset creation notebook only splits the cases into Normal, Pneumonia and Covid-19. There seems to be a mismatch here. What is the latest approach?

OSError: File models/COVIDNet-CXR-Large/model.meta does not exist

How to fix it ?

Migrate to tensorflow 2

Can you please make your code compatible with TensorFlow 2.0+ by default?

Pretty easy to do with no code changes, see:
https://www.tensorflow.org/guide/migrate

Just add this wherever you previously imported tensorflow:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

And update your requirements.txt per #45

model-8485 vs model-10 (epoch 10)

Why are the evaluation results different when loading model-8485 and model-10 (epoch 10)?
These are my results when running eval.py with COVIDNet-CXR-Large, test_COVIDx2.txt and model-10 (epoch 10):

[[93. 7. 0.]
[ 5. 93. 2.]
[ 2. 2. 27.]]
Sens Normal: 0.930, Pneumonia: 0.930, COVID-19: 0.871
PPV Normal: 0.930, Pneumonia 0.912, COVID-19: 0.931
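The sensitivity and PPV figures can be recomputed from the matrix as a sanity check. The sketch below assumes rows are ground truth and columns are predictions, in the order Normal, Pneumonia, COVID-19; `sensitivity_and_ppv` is a hypothetical helper, not a function from eval.py.

```python
def sensitivity_and_ppv(matrix):
    """Per-class sensitivity (row-wise) and PPV (column-wise).

    Assumes rows are ground truth and columns are predictions.
    """
    n = len(matrix)
    sens, ppv = [], []
    for k in range(n):
        row_total = sum(matrix[k])
        col_total = sum(matrix[r][k] for r in range(n))
        sens.append(matrix[k][k] / row_total if row_total else 0.0)
        ppv.append(matrix[k][k] / col_total if col_total else 0.0)
    return sens, ppv


# Matrix copied from the post; class order Normal, Pneumonia, COVID-19.
matrix = [[93, 7, 0], [5, 93, 2], [2, 2, 27]]
sens, ppv = sensitivity_and_ppv(matrix)
print(["{:.3f}".format(v) for v in sens])
print(["{:.3f}".format(v) for v in ppv])
```

With that row and column convention, the recomputed values agree with the figures quoted in the post.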

Missing Expected regions where xray analysis is expected to be viable

There are many things one may learn from your work here with CovidNet, as it is robust work that will reasonably help to push/democratize Ai in a positive direction.

One thing of note, is that there are expected regimes for which xray/ct based techniques, be it by human or by ai, are expected to be viable. Unless I am mistaken, I did not detect that in the CovidNet paper nor the CovidNet/repository's readme file.

I think a similar section from the repository seen in issue 55 after title "Preliminary Conclusion", concerning expected constraints on testing/diagnosis, should be considered for CovidNet.

A quick snippet can be seen below:

Information regarding GSInquire

Example chest radiography images of COVID-19 cases from 2 different patients and their associated critical factors (highlighted in red) as identified by GSInquire.

Could you explain what GSInquire is and how it was used to identify the associated critical factors?

At least two variables have the same name: conv1_conv/bias

Hi,
I frequently get the error message below:
"At least two variables have the same name: conv1_conv/bias"
It appears when I test a pneumonia image; I also saw it with normal and COVID-19 images, but less frequently.

TensorFlow versions I have:
tensorboard = 1.14.0
tensorflow = 1.14.0

Percentage training and test dataset

Hi, why do they use 99% for training and 1% for testing? The standard is 80-20 or 70-30.
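The split can be checked directly from image counts. The sketch below uses the counts another poster reports for train_COVIDx2.txt and test_COVIDx2.txt in the data-split confirmation post; they may differ from the dataset version this question refers to, and `held_out_fraction` is a hypothetical helper.

```python
# Counts as reported in the data-split confirmation post in this list.
counts = {
    "COVID-19": (223, 31),
    "Normal": (7966, 885),
    "Pneumonia": (5451, 594),
}


def held_out_fraction(train, test):
    """Share of samples held out for testing."""
    return test / (train + test)


for label, (train, test) in counts.items():
    print(f"{label}: {100 * held_out_fraction(train, test):.1f}% test")

total_train = sum(train for train, _ in counts.values())
total_test = sum(test for _, test in counts.values())
print(f"Overall: {100 * held_out_fraction(total_train, total_test):.1f}% test")
```

With those counts, roughly a tenth of the images are held out for testing, so the 99%/1% figure may reflect a different dataset version or a different way of counting.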

Code for the model

Hi,

Will you be publishing the code for the model anytime soon? Thanks.

Best,

Arijit.

Example of how to use the pre-trained model

I've managed to load the provided model, but I'm not sure how to proceed from there to actually use it on an image.

In #2 @Vikramank mentioned a Flask app, so I assume it's possible to use the pre-trained models, but without more info or docs, I don't know how to do it.

EDIT: actually, digging a bit more into this, it seems that, without the actual Keras model, using the Tensorflow checkpoint is pretty hard :-? From keras-team/keras#5273 (comment):

Fundamentally, you cannot "turn an arbitrary TensorFlow checkpoint into a Keras model".

What you can do, however, is build an equivalent Keras model then load into this Keras model the weights contained in a TensorFlow checkpoint that corresponds to the saved model. In fact this is how the pre-trained InceptionV3 in Keras was obtained.

So, without more info, it seems pretty hard (or impossible?) to do :-?

Would it be possible to add a simple example script that (1) gets the path to an image as input, (2) loads the model, and (3) uses it and outputs the "COVID probability"?

Thanks!

Script for loading the model

Could you provide simple code to load the model? I have tried many methods and all of them produce errors. Please also mention the TensorFlow versions. The last error I got was:
module 'tensorflow._api.v2.train' has no attribute 'import_meta_graph'
The code was as follows:
import tensorflow as tf
new_graph = tf.Graph()
with tf.compat.v1.Session(graph=new_graph) as sess:
    saver = tf.train.import_meta_graph('/content/models2/model.meta')
    saver.restore(sess, "/content/models2/model")

I have also tried tf.compat.v1.Session and tf.Session but am unable to load the model.

OSError: File models/COVIDNet-CXR-Small/model.meta_eval does not exist

How to fix it?

Tensorflow 2.0

Hi do you plan to upgrade to TF2.0?

Run train_tf.py failed

Failure to acknowledge 1 other open source lung scan based ConvNet that was published back on February 9, over a month before CovidNet

Just a heads up.

With reference to this line on main repository: "Motivated by this, a number of artificial intelligence (AI) systems based on deep learning have been proposed and results have been shown to be quite promising in terms of accuracy in detecting patients infected with COVID-19 using chest radiography images. However, to the best of the authors' knowledge, these developed AI systems have been closed source and unavailable to the research community for deeper understanding and extension, and unavailable for public access and use."

There has been another open source project since February 9, that pre-dated CovidNet by over a month:

Emails were sent to one of the authors, namely Alexander, but the main CovidNet GitHub repository has yet to acknowledge the much earlier repository, which would be quick and easy to do. Why hasn't this been done?

Can't reconcile layer dimensions in chart from COVID_Netv2.pdf

Hi, First of all thanks for sharing your work.

When will you release the training script?
I can't seem to reconcile the layer dimensions in the PDF. The first layer gives the dimensions of the input images in parentheses, so I assume the numbers in parentheses are the dimensions of what is passed to the next layer. If so, how does a 7x7 convolutional layer output 112x112x64 from an input of 224x224x3? Assuming step size and padding size are integers, this doesn't seem to work with the formula for calculating the output size of convolutions unless you have unreasonably huge padding.

For the conv1x1 layers the dimension gets cut in half, suggesting the step size is 2. However with a 1x1 filter this means you're dropping half of the pixels. Is this correct?

Furthermore, the first flatten layer is said to have a flattened dimension of 100352, but that is what you would get from PEPX 4.3 alone. PEPX 4.2, PEPX 4.1 and the last conv1x1 on the right also feed into the flatten layer, and each of them has 100352 elements. So are these 4x100352 values all flattened together, feeding a vector of 401408 elements into the first FC layer (as I would expect, since they all come from the same input image), or are you treating them separately?

Could you please specify the PEPX layer dimensions?
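For reference, the standard output-size formula for a convolution is floor((n + 2p - k) / s) + 1. The sketch below evaluates it for the cases raised above; the stride and padding values are assumptions, since the PDF does not state them, and `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# 7x7 convolution on a 224-pixel side; stride 2 and padding 3 are assumed,
# since the PDF does not state them.
print(conv_output_size(224, kernel=7, stride=2, padding=3))

# 1x1 convolution with an assumed stride of 2 on a 112-pixel side.
print(conv_output_size(112, kernel=1, stride=2))

# Flattened sizes from the question; 7 * 7 * 2048 is one factorisation
# consistent with 100352, not a confirmed layer shape.
print(7 * 7 * 2048, 4 * 100352)
```

Under these assumptions a 7x7 convolution with stride 2 needs only 3 pixels of padding to map 224 to 112, and a strided 1x1 convolution does halve each spatial dimension by skipping every other position.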

Dataset RSNA Pneumonia Challenge

Hi to everyone,

I have a question about the RSNA Pneumonia Challenge dataset.
I'm going to download the dataset (4 GB) to detect pneumonia with my own NN model.
I was wondering whether the dataset is paid, or whether the term "challenge" implies any constraints on its use.

Direct link to RSNA dataset

Can the dataset be downloaded directly from here as well?

https://www.rsna.org/en/education/ai-resources-and-training/ai-image-challenge/RSNA-Pneumonia-Detection-Challenge-2018

This is to avoid signing up for a Kaggle account with mobile number confirmation.

Not able to obtain the same sensitivity values (96.8) when trained on COVIDNet-CXR-Large model

I am not getting the same sensitivity values after training for 15 more epochs. The sensitivity values are not retained even after the first epoch!
I used the COVIDNet-CXR-Large model with the dataset files train_COVIDx2.txt and test_COVIDx2.txt.
The paper mentions a learning rate policy that reduces the learning rate when learning stagnates for a period of time, with factor and patience values of 0.7 and 5 respectively. However, I did not come across any line in the code that implements this.
I also tried training the model on the same dataset for another 30 epochs with different learning rates (2e-07 and 2e-08). The sensitivity kept dropping.
Am I missing something?
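For anyone who wants to add such a policy themselves, a minimal framework-free sketch of a reduce-on-plateau rule follows. The factor of 0.7 and patience of 5 come from the post; the class name, the choice of loss as the monitored quantity and the starting rate are assumptions, and this is not the authors' implementation.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` steps without improvement."""

    def __init__(self, lr, factor=0.7, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        """Record one epoch's monitored loss and return the learning rate to use next."""
        if loss < self.best:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr


# Made-up loss curve that stalls after the second epoch; 2e-5 is an arbitrary start.
scheduler = ReduceOnPlateau(lr=2e-5)
for epoch, loss in enumerate([0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8], start=1):
    print(epoch, scheduler.step(loss))
```

In a training loop, the value returned by `step` would be fed to the optimizer at the start of the next epoch.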

Re-Implementation in PyTorch

I'm currently working on an implementation of COVID-Net in PyTorch:
https://github.com/IliasPap/COVIDNet

I'm looking forward to the exact network details (num_of_filters, activation functions, ...).

Do I need to train the "COVIDNet-CXR-Large" model, or can I run inference directly with the inference.py script?

Do I need to train the "COVIDNet-CXR-Large" model, or can I run inference directly with the inference.py script?
What pixel dimensions should the X-ray image have for inference, and what should its file size be?

How to train network from scratch?

Hello,

I want to train your network from scratch and not from the pre-trained "Small" and "Large" models.

Could you please describe how to do that since I want to compare the effects of data augmentation vs dropout vs no augmentation & no dropout?

Thank you!

CovidNet query

Can COVIDNet be used for binary class classification? (normal and COVID19)

Can you please upload the ImageNet pretrained weights For COVID-Net?

Description

As mentioned in the paper, COVID-Net was pretrained on ImageNet. Could you please upload the ImageNet-pretrained weights so we can reproduce your results?

Normal, Pneumonia and Covid-19 split

Dear Linda,

For training the model, is there a script available that can separate the classes ( Normal, Pneumonia and Covid-19) based on the train and test text files?

I've managed to build the train and test data sets but they aren't labeled at the moment.

Thanks,
Babs
