lindawangg / covid-net
COVID-Net Open Source Initiative
License: Other
Any idea on the number of Trainable Parameters for COVID-Net CXR small and large?
The paper says the number of parameters in COVID-Net is 116.6 million. However, the Keras network compiled in https://github.com/busyyang/COVID-19, which replicates the architecture in the paper, has 364.6 million parameters, more than three times as many.
Hi. First of all, thank you so much for sharing the data and network. Although you have removed the duplicates from test_COVIDx.txt, I believe there are still some duplicate filenames in train_COVIDx.txt. Could you please add the following function at the end of your create_COVIDx_v2.ipynb notebook? It resolves the duplicates and sorts the train/test images into subfolders by label (normal, pneumonia and COVID-19), as follows.
1- train
    |_____ normal
        |_____ 7,966 images
    |_____ pneumonia
        |_____ 5,442 images
    |_____ COVID-19
        |_____ 92 images
2- test
    |_____ normal
        |_____ 885 images
    |_____ pneumonia
        |_____ 594 images
    |_____ COVID-19
        |_____ 10 images
The function is
import os
import shutil

import pandas as pd
from tqdm.notebook import tqdm


def ArrangeData_LabelNamedFolders(file_path, folder_path, dest_folder_path, indicator):
    """Copy images into one subfolder per label, ignoring duplicate filenames."""
    print('{} Operation'.format(indicator))
    df = pd.read_csv(file_path, sep=' ', names=['patientid', 'filename', 'label'])
    # Keep only the first row for each filename.
    df = df.drop_duplicates(subset='filename', keep="first")
    labelFolders = df.label.unique()
    print(labelFolders)
    # Create one destination folder per label.
    for labelFolder in labelFolders:
        os.makedirs(os.path.join(dest_folder_path, labelFolder), exist_ok=True)
    imageNames = sorted(os.listdir(folder_path))
    for imageName in tqdm(imageNames):
        temp_df = df.loc[df['filename'] == imageName]
        if temp_df.empty:
            # Image is on disk but not listed in the split file; skip it.
            continue
        class_ = temp_df['label'].values.item()
        src = os.path.join(folder_path, imageName)
        dest = os.path.join(dest_folder_path, str(class_), imageName)
        shutil.copy(src, dest)


train_file = 'train_split_v2.txt'
train_folder = './data/train'
dest_train_folder = './categorize data/train'
test_file = 'test_split_v2.txt'
test_folder = './data/test'
dest_test_folder = './categorize data/test'
ArrangeData_LabelNamedFolders(test_file, test_folder, dest_test_folder, indicator='Test')
ArrangeData_LabelNamedFolders(train_file, train_folder, dest_train_folder, indicator='Train')
With only 10 samples for COVID-19 cases, isn't the test set size too small to have any degree of confidence?
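One way to put a number on this concern is a binomial confidence interval. The sketch below computes a 95% Wilson score interval for a proportion measured on 10 samples. The count of 8 correct out of 10 is a hypothetical figure chosen only for illustration, not a reported result.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical example: 8 of 10 COVID-19 test images classified correctly.
low, high = wilson_interval(8, 10)
print(f"observed sensitivity 0.80, 95% interval {low:.2f} to {high:.2f}")
```

With so few samples the interval spans roughly half of the possible range, which is the statistical form of the worry raised in the question.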
It looks like additional training and test examples were added but the Confusion Matrix and Results have not been updated to reflect this. I recommend either updating the results, or if the results are not available yet (possibly still training the new model?) a quick note added to make sure that there isn't confusion about the Confusion Matrix, which only shows 8 ground truth COVID-19 samples still. As there are two false positives in the Confusion Matrix, it's possible to assume that the results have been miscalculated with false negatives as false positives, which would reverse the precision and recall.
Hello, and thank you for sharing this great project.
I am testing the net on out-of-sample data (some known COVID and non-COVID images) and am having some trouble. These are my questions to the community, if anybody can help:
Is it mandatory to filter the out-of-sample images to PA projections? I don't know whether this matters or whether the net is supposed to work fine with AP too.
Does the image need to be converted to RGB? The README says the net expects a 224,224,3 array, and DICOM images are grayscale. I'm trying an OpenCV conversion, but I don't know whether this is the correct way to handle DICOM files:
"img=cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)"
DataLossError While Loading Model
Hi, whenever I try to load the model for inference or evaluation in Jupyter, I get this error.
DataLossError: Checksum does not match: stored 1497157360 vs. calculated on the restored bytes 2410561084
[[node save/RestoreV2 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
My full notebook is shown below
import numpy as np
import os, argparse
import cv2
import tensorflow as tf
weightspath = '/tf/notebooks/mrc_is_here/COVID-Net/models/covid-net-large'
metaname = "model.meta_eval"
ckptname = "model-2069"
imagepath = 'assets/ex-covid.jpeg'
mapping = {'normal': 0, 'pneumonia': 1, 'COVID-19': 2}
inv_mapping = {0: 'normal', 1: 'pneumonia', 2: 'COVID-19'}
sess = tf.Session()
tf.get_default_graph()
saver = tf.train.import_meta_graph(os.path.join(weightspath, metaname))
saver.restore(sess, os.path.join(weightspath, ckptname))
My folder structure is:
COVID-NET
Restore model from meta and checkpoint files
Throw exception DataLossError
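A checksum mismatch during restore often points to a truncated or corrupted checkpoint file, for example from an interrupted download, though other causes are possible. As a rough check, the sketch below lists the size and SHA-256 digest of every file that belongs to one checkpoint so they can be compared with a fresh download. The folder path is a hypothetical placeholder, and `file_digests` is a helper written for this example, not part of the repository.

```python
import hashlib
import os


def file_digests(directory, prefix):
    """Return {filename: (size_in_bytes, sha256_hex)} for files starting with prefix."""
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.startswith(prefix) or not os.path.isfile(path):
            continue
        sha = hashlib.sha256()
        with open(path, "rb") as handle:
            # Read in 1 MiB blocks so large checkpoints do not fill memory.
            for block in iter(lambda: handle.read(1 << 20), b""):
                sha.update(block)
        results[name] = (os.path.getsize(path), sha.hexdigest())
    return results


weights_dir = "path/to/checkpoint/folder"  # hypothetical placeholder
if os.path.isdir(weights_dir):
    for name, (size, digest) in file_digests(weights_dir, "model-2069").items():
        print(name, size, digest)
```

If the digests differ between two downloads of the same archive, the local copy is the likely culprit rather than the loading code.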
There seems to be a discrepancy between the number of training and testing samples in the RSNA pneumonia dataset and the dataset distribution given on the GitHub page. This is probably because the RSNA dataset's CSV file, which was built for a detection task, contains multiple rows for the same patient ID. Please verify.
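The effect described above can be checked by comparing the row count of the label file with the number of distinct patient IDs. The sketch below uses a tiny made-up sample rather than the real file; the column names, including `patientId`, are assumptions about the file layout.

```python
import csv
import io

# Made-up sample in the shape of a detection-style label file:
# one row per bounding box, so a patient can appear more than once.
sample = io.StringIO(
    "patientId,x,y,width,height,Target\n"
    "p001,,,,,0\n"
    "p002,264,152,213,379,1\n"
    "p002,562,152,256,453,1\n"
    "p003,,,,,0\n"
)
rows = list(csv.DictReader(sample))
unique_patients = {row["patientId"] for row in rows}
print(len(rows), "rows,", len(unique_patients), "unique patients")
```

Running the same two counts on the real CSV would show how much of the gap is explained by repeated patient IDs.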
Could you please include the checkpoint file, which is missing from your pretrained model (COVID-Netv1.zip)?
On using a train saver, the following files might have been generated at your end:
These 4 files would be required when importing the meta graph and restoring the latest checkpoint.
Thanks!
Hi @lindawangg,
Can you please add requirements.txt so we can install the exact library versions you tested with?
pip freeze > requirements.txt
and then remove the libraries you aren't explicitly using here.
Thanks for your work to help the people in need! Your site has been added! I currently maintain the Open-Source-COVID-19 page, which collects all open source projects related to COVID-19, including maps, data, news, api, analysis, medical and supply information, etc. Please share to anyone who might need the information in the list, or will possibly contribute to some of those projects. You are also welcome to recommend more projects.
http://open-source-covid-19.weileizeng.com/
Cheers!
A recent commit to the Inference.py file drops the "Covid-19" label from the output below; "Normal" is printed twice.
print('Confidence')
print('Normal: {:.3f}, Pneumonia: {:.3f}, Normal: {:.3f}'.format(pred[0][0], pred[0][1], pred[0][2]))
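A corrected version of the print statement might look like the sketch below. The index order follows the mapping quoted elsewhere on this page (normal 0, pneumonia 1, COVID-19 2); the `pred` values are hypothetical.

```python
# Hypothetical softmax output, shaped like the pred array in the snippet above.
pred = [[0.10, 0.20, 0.70]]

# Index order: normal = 0, pneumonia = 1, COVID-19 = 2.
line = 'Normal: {:.3f}, Pneumonia: {:.3f}, COVID-19: {:.3f}'.format(
    pred[0][0], pred[0][1], pred[0][2])
print('Confidence')
print(line)
```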
RSNA has ~26k images, but COVIDx has ~13k samples.
Could you include some explanation on how the samples are chosen?
Paper mentions that code for covid-net architecture is available at repo but it is not present atm.
What's the recommended environment for inference? CPU or GPU?
I've tested this on Google Colab
python inference.py --weightspath ./ --metaname model.meta_eval --ckptname model-6207 --imagepath assets/ex-covid.jpeg
With CPU it takes ~ 4.9s
With GPU it takes ~ 6.5s
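A single run of inference.py also pays for starting Python, building the graph and restoring the checkpoint, so the two timings above may say little about per-image speed. The sketch below separates the first call from later calls. `predict` is a hypothetical stand-in for one forward pass and would need to be replaced with the real call.

```python
import time


def time_calls(fn, repeats=5):
    """Return (first_call_seconds, mean_seconds_of_later_calls)."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start
    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        later.append(time.perf_counter() - start)
    return first, sum(later) / len(later)


def predict():
    # Hypothetical stand-in for one forward pass; replace with the real call.
    return sum(i * i for i in range(10000))


first, steady = time_calls(predict)
print(f"first call {first:.4f}s, later calls {steady:.4f}s on average")
```

Comparing the steady-state figure on CPU and GPU is usually more informative than comparing whole-script wall time for one image.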
Hello my friends,
I'm trying to use train_tf to train from a pre-trained model, but I don't have the COVIDNetv2 file or model-2069. How can I get them?
Thanks for the amazing work!
I see that in Covidx2 you used only 3 images from Figure1 collections. Is there a reason for that, or was it just timing? Do you know if there are overlaps between Figure1 and ieee8023/covid-chestxray-dataset?
Please include a summary of the issue.
Please include the steps to reproduce.
List any additional libraries that are affected.
A description of what you expected to happen.
A description of what happens instead.
I am trying to run inference on multiple input images (six of them) and get the error below.
It works fine for four input images.
2020-04-14 07:43:21.737334: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-14 07:43:21.748542: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-04-14 07:43:21.748968: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2a7c638260 executing computations on platform Host. Devices:
2020-04-14 07:43:21.749010: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
Killed
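A bare `Killed` with no Python traceback is commonly the operating system ending a process that ran out of memory, although the log above does not confirm that. If memory is the cause, one workaround is to feed the images in smaller groups. In the sketch below, `run_model` and the file names are hypothetical placeholders.

```python
def chunks(items, size):
    """Yield consecutive slices of items with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_model(batch):
    # Hypothetical placeholder for a forward pass on a small batch of images.
    return [f"prediction for {name}" for name in batch]


image_paths = [f"img_{i}.png" for i in range(6)]  # made-up file names
predictions = []
for batch in chunks(image_paths, 2):
    predictions.extend(run_model(batch))
print(len(predictions))
```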
Thanks for working on this project! This is very interesting and very impactful.
COVID-Net relies on a design pattern of projection-expansion-projection-extension (PEPX) throughout the network. I have beginner-level knowledge of computer vision, and I haven't seen this design pattern before.
Without loading in the model, what are the output dimensions of each layer in the PEPX module (Figure 2, top right box) for PEPX1.1? This would give me a better understanding of how dimension is changing within the module.
What is the intuition around the effectiveness of this design pattern? Are there some previous papers that use this design pattern for their core results?
Given the datasource is different for the two classes: COVID and Pneumonia/Normal, how do you validate that the model doesn't classify the data source, but actually classifies the presence of COVID-19?
I couldn't find the model checkpoint files inside the covid_net.large file. What should I do?
Hi Linda,
Thanks for providing nice guidelines for the COVIDx dataset and COVID-Net. I recently compiled the dataset using the guideline provided in https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md. However, I noticed that the test class distribution is slightly different than the one presented in https://github.com/lindawangg/COVID-Net#results. I have used the https://github.com/lindawangg/COVID-Net/blob/master/create_COVIDx_v3.ipynb script, train_COVIDx2.txt and test_COVIDx2.txt files. For your reference, I observed the following data distribution:
COVID19 -> Train (223 images), Test (31 images)
Normal -> Train (7966 images), Test (885 images)
Pneumonia -> Train (5451 images), Test (594 images)
Kindly confirm whether the distribution is correct.
Furthermore, do you have any benchmark results for the above data distribution? The benchmark presented in https://github.com/lindawangg/COVID-Net#results uses fewer test samples. Which version of the data distribution do you recommend for comparison with COVID-Net? Kindly advise.
I look forward to your answers. Thank you.
Regards,
Saimun
Hello everyone,
I ran the inference.py script on some "healthy" X-ray images but the result was "Covid-19". I would like to check what the network is really classifying.
I read in the joint paper that the GSInquire method was used to visually check the results. Is the method available? Does anyone know how it works?
Or is there an alternative way of visualising the result?
Thanks,
Apparently the model uses 4 softmax layers, but the latest dataset creation notebook only splits the cases into Normal, Pneumonia and Covid-19. There seems to be a mismatch here. What is the latest approach?
Can you please make your code compatible with Tensorflow 2.0+ by default.
Pretty easy to do with no code changes, see:
https://www.tensorflow.org/guide/migrate
Just add this wherever you previously imported tensorflow:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
And update your requirements.txt per #45
Why are the evaluation results different when loading model-8485 and model-10 (epoch 10)?
These are my results when running eval.py using COVIDNet-CXR-Large, test_COVIDx2.txt and model-10 (epoch 10)
[[93. 7. 0.]
[ 5. 93. 2.]
[ 2. 2. 27.]]
Sens Normal: 0.930, Pneumonia: 0.930, COVID-19: 0.871
PPV Normal: 0.930, Pneumonia 0.912, COVID-19: 0.931
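For reference, the sensitivity and PPV figures above can be recomputed from the confusion matrix. The sketch assumes rows are ground-truth classes and columns are predicted classes; that reading is an assumption, but it does reproduce the printed numbers.

```python
matrix = [
    [93, 7, 0],   # true normal
    [5, 93, 2],   # true pneumonia
    [2, 2, 27],   # true COVID-19
]
labels = ["Normal", "Pneumonia", "COVID-19"]


def sensitivity(m, i):
    """Correct predictions for class i divided by all true samples of class i."""
    return m[i][i] / sum(m[i])


def ppv(m, i):
    """Correct predictions for class i divided by all predictions of class i."""
    return m[i][i] / sum(row[i] for row in m)


for i, name in enumerate(labels):
    print(f"{name}: sens {sensitivity(matrix, i):.3f}, PPV {ppv(matrix, i):.3f}")
```

Swapping rows and columns exchanges sensitivity and PPV, so it is worth confirming the orientation before comparing two checkpoints.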
There are many things one may learn from your work here with CovidNet, as it is robust work that will reasonably help to push/democratize AI in a positive direction.
I think a similar section from the repository seen in issue 55 after title "Preliminary Conclusion", concerning expected constraints on testing/diagnosis, should be considered for CovidNet.
Example chest radiography images of COVID-19 cases from 2 different patients and their associated critical factors (highlighted in red) as identified by GSInquire.
Could you kindly explain what GSInquire is and how it was used to identify the associated critical factors?
Hi,
I frequently get the error message below
"At least two variables have the same name: conv1_conv/bias"
when testing a pneumonia image. I also saw this error with normal and COVID-19 images, but less frequently.
TensorFlow versions I have:
tensorboard = 1.14.0
tensorflow = 1.14.0
Hi, why do they use 99% for training and 1% for testing? The standard is 80-20 or 70-30.
Hi,
Will you be publishing the code for the model anytime soon? Thanks.
Best,
Arijit.
I've managed to load the provided model, but I'm not sure how to proceed from there to actually use it on an image.
In #2 @Vikramank mentioned a Flask app, so I assume it's possible to use the pre-trained models, but without more info or docs, I don't know how to do it.
EDIT: actually, digging a bit more into this, it seems that, without the actual Keras model, using the Tensorflow checkpoint is pretty hard :-? From keras-team/keras#5273 (comment):
Fundamentally, you cannot "turn an arbitrary TensorFlow checkpoint into a Keras model".
What you can do, however, is build an equivalent Keras model then load into this Keras model the weights contained in a TensorFlow checkpoint that corresponds to the saved model. In fact this is how the pre-trained InceptionV3 in Keras was obtained.
So, without more info, it seems pretty hard (or impossible?) to do :-?
Would it be possible to add a simple example script that (1) gets the path to an image as input, (2) loads the model, and (3) uses it and outputs the "COVID probability"?
Thanks!
Sir, can you provide simple code to load the model? I have tried many methods and all of them give errors. Kindly also mention the TensorFlow versions. The last error I got was:
module 'tensorflow._api.v2.train' has no attribute 'import_meta_graph'
The code was as follows:
import tensorflow as tf
new_graph = tf.Graph()
with tf.compat.v1.Session(graph=new_graph) as sess:
    saver = tf.train.import_meta_graph('/content/models2/model.meta')
    saver.restore(sess, "/content/models2/model")
I have also tried tf.compat.v1.Session and tf.Session, but I am still unable to load the model.
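Under TensorFlow 2.x the TF1 graph-loading functions are exposed under `tf.compat.v1`, which is why `tf.train.import_meta_graph` is not found. A minimal sketch of loading a TF1-style checkpoint that way follows; `load_checkpoint` is a helper name invented for this example, and whether a particular meta graph restores cleanly under a given TensorFlow version is not guaranteed.

```python
def load_checkpoint(meta_path, ckpt_prefix):
    """Restore a TF1-style checkpoint under TensorFlow 2.x via tf.compat.v1.

    Returns (session, graph). TensorFlow is imported lazily so that this
    helper can be defined on a machine where it is not installed.
    """
    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # In TF 2.x, import_meta_graph lives under tf.compat.v1.train.
        saver = tf.compat.v1.train.import_meta_graph(meta_path)
        sess = tf.compat.v1.Session(graph=graph)
        saver.restore(sess, ckpt_prefix)
    return sess, graph


# Example call with the paths from the snippet above (not executed here):
# sess, graph = load_checkpoint('/content/models2/model.meta',
#                               '/content/models2/model')
```

Working inside `graph.as_default()` keeps the import in graph mode, which meta-graph import requires.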
Hi do you plan to upgrade to TF2.0?
Just a heads up.
With reference to this line on main repository: "Motivated by this, a number of artificial intelligence (AI) systems based on deep learning have been proposed and results have been shown to be quite promising in terms of accuracy in detecting patients infected with COVID-19 using chest radiography images. However, to the best of the authors' knowledge, these developed AI systems have been closed source and unavailable to the research community for deeper understanding and extension, and unavailable for public access and use."
There has been another open source project since February 9, that pre-dated CovidNet by over a month:
Emails were sent to one of the authors, Alexander, but the main CovidNet GitHub repository has yet to acknowledge the much earlier repository, which would be quick and easy to do. Why hasn't this been done?
Hi, First of all thanks for sharing your work.
For the conv1x1 layers the dimension gets cut in half, suggesting the step size is 2. However with a 1x1 filter this means you're dropping half of the pixels. Is this correct?
Furthermore, the first flatten layer is said to have a flattened dimension of 100352, but that's what you'd get from PEPX 4.3 alone. However, PEPX 4.2, PEPX 4.1, and the last conv1x1 on the right also feed into the flattened layer, and each has 100352 elements. So are these 4x100352 all flattened together, feeding a vector of 401408 elements into the first FC layer (as I would expect, since they all come from the same input image), or are you treating them separately?
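The arithmetic behind both questions can be written out as a short sketch. The 14-pixel input size and the 7x7x2048 factorisation are assumptions used only for illustration; the 100352 figure is the one quoted in the question.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# A 1x1 kernel with stride 2 reads only every second position along each axis.
out = conv_output_size(14, kernel=1, stride=2)  # 14 is a hypothetical input size
kept = list(range(14))[::2]                     # positions that are actually read
print(out, kept)

flat_one_branch = 100352                  # figure quoted in the question
print(flat_one_branch == 7 * 7 * 2048)    # one possible factorisation (assumption)
print(4 * flat_one_branch)                # size if four such branches were concatenated
```

Note that in two dimensions a stride of 2 keeps every second row and every second column, so a 1x1 convolution with stride 2 reads one quarter of the positions rather than half.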
Hi to everyone,
I have a question about the RSNA Pneumonia Challenge dataset.
I'm going to download the dataset (4 GB) to detect pneumonia with my own NN model.
I was wondering whether the dataset is paid, or whether the term "challenge" implies any constraint on its use.
Just to confirm if the dataset can be downloaded directly from here as well?
This is to avoid the signing up a Kaggle account with mobile number confirmation.
I am not getting the same sensitivity values after training for 15 more epochs. The sensitivity values are not retained even after the first epoch!
I used the COVIDNet-CXR-Large model with the dataset files train_COVIDx2.txt and test_COVIDx2.txt.
The paper mentions a learning rate policy that reduces the learning rate when learning stagnates for a period of time, with factor and patience values of 0.7 and 5 respectively. However, I did not come across any line in the code that implements this.
I have tried to train the model on the same dataset for another 30 epochs as well with different learning rates (2e-07 & 2e-08). The sensitivity kept dropping.
Am I missing something?
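For anyone wanting to experiment with such a policy, a reduce-on-plateau schedule with factor 0.7 and patience 5 can be sketched in a few lines. This is an illustration of the general technique, not the authors' implementation; the class name, the starting learning rate and the loss values are all made up.

```python
class PlateauLR:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.7, patience=5, min_delta=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        """Record one epoch's monitored loss and return the learning rate to use next."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr


scheduler = PlateauLR(lr=2e-5)                    # illustrative starting value
losses = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]           # made-up, stagnating loss
history = [scheduler.step(loss) for loss in losses]
print(history)
```

In this run the first epoch sets the best loss, and the rate is reduced once, after the fifth epoch in a row without improvement.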
I'm currently working on an implementation of COVID-Net in PyTorch:
https://github.com/IliasPap/COVIDNet
I'm looking forward to the exact network details (num_of_filters, activation functions, ...).
Do I need to train the "COVIDNet-CXR-Large" model, or can I run inference directly with the inference.py script?
What should the dimensions (in pixels) of the X-ray image used for inference be, and what should its file size be?
Hello,
I want to train your network from scratch rather than from the pre-trained "Small" and "Large" models.
Could you please describe how to do that? I want to compare the effects of data augmentation vs. dropout vs. no augmentation and no dropout.
Thank you!
Before posting, have you looked at the FAQ page?
As mentioned in the paper, COVID-Net was pretrained on ImageNet. Could you please upload the ImageNet-pretrained weights so we can reproduce your results?
Dear Linda,
For training the model, is there a script available that can separate the classes ( Normal, Pneumonia and Covid-19) based on the train and test text files?
I've managed to build the train and test data sets but they aren't labeled at the moment.
Thanks,
Babs