
lidc-idri-preprocessing's Introduction

LIDC Preprocessing with Pylidc library

Medium Link

This repository preprocesses the LIDC-IDRI dataset. We use the pylidc library to save nodule images in .npy file format. The file structure is as below:

+-- LIDC-IDRI
|    # This folder should contain the original LIDC-IDRI dataset
+-- data
|    # This folder contains the preprocessed data
|   |-- _Clean
|       +-- Image
|       +-- Mask
|   |-- Image
|       +-- LIDC-IDRI-0001
|       +-- LIDC-IDRI-0002
|       +-- ...
|   |-- Mask
|       +-- LIDC-IDRI-0001
|       +-- LIDC-IDRI-0002
|       +-- ...
|   |-- Meta
|       +-- meta.csv
+-- figures
|    # Figures are saved here
+-- notebook
|    # This notebook edits the meta.csv file to make indexing easier
+-- config_file_create.py
|    # Creates the configuration file. You can edit the hyperparameters of the pylidc library here
+-- prepare_dataset.py
|    # Run this file to preprocess the LIDC-IDRI DICOM files. Results are saved in the data folder
+-- utils.py
     # Utility script

Segmented Image

1. Download the LIDC-IDRI dataset

First, you have to download the whole LIDC-IDRI dataset. On the website you will see the Data Access section. Click the Search button to filter the images by modality. I selected CT only and downloaded a total of 1010 patients.

2. Set up the pylidc library

You need to set up the pylidc library for preprocessing. There are instructions in its documentation; make sure to create the configuration file as described there. I am currently using library version 0.2.1.
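For reference, the pylidc documentation describes a small configuration file (`C:\Users\[User]\pylidc.conf` on Windows, `~/.pylidcrc` on Mac/Linux) that tells the library where the DICOM data lives; the path below is a placeholder you should replace with your own:

```ini
[dicom]
path = /path/to/datasets/LIDC-IDRI
warn = True
```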

3. Explanation of each Python file

python config_file_create.py

This script contains the configuration settings for the directories. Change the directory settings to wherever you want to save your output files. Without modification, the preprocessed files are saved in the data folder. Running this script creates a configuration file, 'lung.conf'.
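As a minimal sketch of what such a script might write with Python's configparser (the section and option names here are illustrative assumptions, not the repo's exact schema):

```python
import configparser

# Illustrative sketch: write a 'lung.conf' holding directory settings.
# Section and option names are assumptions, not the repo's exact schema.
config = configparser.ConfigParser()
config["prepare_dataset"] = {
    "lidc_dicom_path": "./LIDC-IDRI",
    "image_path": "./data/Image",
    "mask_path": "./data/Mask",
    "clean_path_image": "./data/Clean/Image",
    "clean_path_mask": "./data/Clean/Mask",
    "meta_path": "./data/Meta",
}
with open("lung.conf", "w") as f:
    config.write(f)
```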

python utils.py

This script contains functions to segment the lung. Segmenting the lung and segmenting the nodule are two different things: the former keeps only the lung region, while the latter finds prospective nodule regions within the lung. Don't confuse the two.
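As a rough illustration of the lung-vs-nodule distinction (a sketch under assumptions, not the repo's exact algorithm), lung segmentation can be approximated by HU thresholding plus removal of the border-connected air around the body:

```python
import numpy as np
from scipy import ndimage

def segment_lung_sketch(hu_slice):
    """Rough sketch (an assumption, not this repo's exact algorithm):
    threshold at -400 HU, then drop regions touching the image border
    (the air surrounding the body), keeping internal lung fields."""
    binary = hu_slice < -400
    labels, _ = ndimage.label(binary)
    # Collect every connected-component label that touches the border
    border = set(labels[0, :]) | set(labels[-1, :]) | set(labels[:, 0]) | set(labels[:, -1])
    for b in border:
        binary[labels == b] = False
    return hu_slice * binary  # lung HU values kept, everything else zeroed

# Toy 10x10 slice: soft tissue (0 HU), border air (-1000), a lung region (-800)
toy = np.zeros((10, 10))
toy[0, :] = -1000.0
toy[4:6, 4:6] = -800.0
seg = segment_lung_sketch(toy)
```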

python prepare_dataset.py

This script creates the image and mask files and saves them to the data folder. It also creates a meta.csv file containing information about each nodule, including whether it is cancerous. In the LIDC dataset, each nodule is annotated by up to four radiologists, each of whom rated its malignancy on a scale of 1 to 5. I chose the median high of these ratings as the final malignancy label. The meta.csv data contains all of this information and is used later in the classification stage. prepare_dataset.py looks for the lung.conf file, which must be in the same directory. Running this script outputs .npy files for each slice, with a size of 512×512.
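The median-high reduction can be computed with Python's statistics module; the ratings below are hypothetical:

```python
from statistics import median_high

# Hypothetical malignancy ratings (1-5) from up to four radiologists
ratings = [3, 4, 4, 5]

# median_high returns the larger of the two middle values for an even
# count, so a tie between the middle ratings leans toward "malignant".
label = median_high(ratings)
```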

To make a train/val/test split, run the Jupyter notebook in the notebook folder. This creates an additional clean_meta.csv and a meta.csv containing information about the nodules and the train/val/test split.

A nodule may span several image slices. Some studies have treated each of these slices as independent of one another. However, I believe these slices should not be treated as independent of their adjacent slices. I have therefore kept all slices of the same nodule within the same split. Although this approach lowers the reported test accuracy, it is the more honest approach.
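The idea can be sketched as a nodule-level split, where every slice of a nodule lands in the same fold (the nodule IDs and fractions here are hypothetical):

```python
import random

def split_by_nodule(nodule_ids, val_frac=0.1, test_frac=0.2, seed=42):
    """Assign whole nodules (hence all their slices) to one fold each.
    A sketch under assumptions, not this repo's exact procedure."""
    ids = sorted(set(nodule_ids))
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test = set(ids[:n_test])
    val = set(ids[n_test:n_test + n_val])
    train = set(ids[n_test + n_val:])
    return train, val, test

# Hypothetical nodule IDs
train, val, test = split_by_nodule([f"N{i:03d}" for i in range(100)])
```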

4. Data folder

The data folder stores all the output images and masks. Inside it there are four subfolders.

1. Clean

The Clean folder contains two subfolders, Image and Mask. Some patients have no nodules, and in a real-world application a person will have many more slices without a nodule than with one. To evaluate how well we generalize to real-world applications, we save lung images without nodules for testing purposes. These images are used only in the test set.

2. Image

The Image folder contains the segmented lung .npy files, organized into one folder per patient.

3. Mask

The Mask folder contains the nodule mask files, organized into one folder per patient.

4. Meta

The Meta folder contains the meta.csv file. This CSV holds information about each image slice: its malignancy, whether the slice belongs to the train, val, or test split, and so on.

5. Contributing and Acknowledgement

I started this lung cancer detection project a year ago as a real newbie to Python. I didn't even understand what a directory setting was at the time! However, I had to complete this project for personal reasons. I looked through Google and other GitHub repositories, but most of them were too hard to understand and the code itself lacked documentation. I hope the code here can help other researchers starting their first lung cancer detection projects. Please give a star if you found this repository useful.

Here is the GitHub repository I learned a lot from; some of the code is sourced from it:

  1. https://github.com/mikejhuang/LungNoduleDetectionClassification

lidc-idri-preprocessing's People

Contributors

jaeho3690


lidc-idri-preprocessing's Issues

Issue in LIDC-IDRI-Segmentation

The issue that I am raising is regarding LIDC-IDRI-Segmentation project.
Since I was not able to receive a reply there, I am posting here and I apologize for posting query in some other project.

I am unable to find the function crop_nodule in View_output.ipynb.
The function crop_nodule is called in function crop_patch.

Question

Hi, I noticed that your code does not process the HU values, which makes my segmentation model's accuracy rather low. Do you have a solution?
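A common way to address this (an assumption about a typical CT pipeline, not code from this repository) is to apply the DICOM rescale slope/intercept and clip to a lung window before training:

```python
import numpy as np

# Common HU preprocessing (an assumption; not part of this repo's pipeline):
# convert raw pixel values to Hounsfield Units with the DICOM
# RescaleSlope/RescaleIntercept, then clip to a lung window.
def to_hu(pixel_array, slope=1.0, intercept=-1024.0):
    hu = pixel_array.astype(np.float32) * slope + intercept
    return np.clip(hu, -1000.0, 400.0)

hu = to_hu(np.array([[0, 1024, 3000]]))
```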

FileNotFoundError for "Clean" data

Hi Jaeho, my name is Harsha. Thank you for this repository. I am using it to preprocess data for my segmentation model. My issue is that as soon as your script encounters the first "clean" data files, it throws the following error:
Patient ID: LIDC-IDRI-0028 Dicom Shape: (512, 512, 141) Number of Annotated Nodules: 0
Clean Dataset LIDC-IDRI-0028
Traceback (most recent call last):
File "prepare_dataset.py", line 173, in
test.prepare_dataset()
File "prepare_dataset.py", line 157, in prepare_dataset
np.save(patient_clean_dir_mask / mask_name, lung_mask)
File "<array_function internals>", line 6, in save
File "/home/ramanha/env3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 541, in save
fid = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: 'data/Clean/Mask/LIDC-IDRI-0028/LIDC-IDRI-0028/0028_CM001_slice000.npy'

Error running prepare_dataset

Hello, when I run "vol = scan.to_volume()", the program reports an error: "RuntimeError: Could not establish path to dicom files. Have you specified the path option in the configuration file C:\Users\Administrator\pylidc.conf?". I only recently started learning Python and really can't solve it. I hope to get your advice. Thank you very much.

dataset

Do I have to download the entire LIDC dataset?

Error saving clean dataset

prepare_dataset.py gave me a NotADirectoryError while saving the clean dataset.

but I already resolved it by changing these lines (line 152),

nodule_name = "{}/{}_CN001_slice{}".format(pid,pid[-4:],prefix[slice])
mask_name = "{}/{}_CM001_slice{}".format(pid,pid[-4:],prefix[slice])

into,

nodule_name = "{}_CN001_slice{}".format(pid[-4:],prefix[slice])
mask_name = "{}_CM001_slice{}".format(pid[-4:],prefix[slice])

File not found error

I have created all the folders as mentioned but still getting this error

FileNotFoundError Traceback (most recent call last)
in
166
167 test= MakeDataSet(LIDC_IDRI_list,IMAGE_DIR,MASK_DIR,CLEAN_DIR_IMAGE,CLEAN_DIR_MASK,META_DIR,mask_threshold,padding,confidence_level)
--> 168 test.prepare_dataset()

in prepare_dataset(self)
130
131 self.save_meta(meta_list)
--> 132 np.save(patient_image_dir / nodule_name,lung_segmented_np_array)
133 np.save(patient_mask_dir / mask_name,mask[:,:,nodule_slice])
134 else:

<array_function internals> in save(*args, **kwargs)

~\Anaconda3\envs\cpu_env\lib\site-packages\numpy\lib\npyio.py in save(file, arr, allow_pickle, fix_imports)
539 if not file.endswith('.npy'):
540 file = file + '.npy'
--> 541 fid = open(file, "wb")
542 own_fid = True
543

FileNotFoundError: [Errno 2] No such file or directory: 'D:\data\Image\LIDC-IDRI-0001\LIDC-IDRI-0001\0001_NI000_slice000.npy'

ImportError

Good evening friend,

I'm trying to run the prepare_dataset code without success. I keep getting this error:

ImportError: cannot import name 'is_dir_path'

Please, what can I do?

cluster_annotations

Hi, my name is David, I would like to thank you for taking the time to share your knowledge, it was really helpful for understanding such a complex topic.
A question, at the moment I try to run prepare_dataset I got this error message, I am a little bit lost, perhaps could you guide me with this?
0%| | 0/135 [00:00<?, ?it/s]
Traceback (most recent call last):
File "prepare_dataset.py", line 173, in
test.prepare_dataset()
File "prepare_dataset.py", line 99, in prepare_dataset
nodules_annotation = scan.cluster_annotations()
AttributeError: 'NoneType' object has no attribute 'cluster_annotations'

AttributeError: 'NoneType' object has no attribute 'cluster_annotations'

Hi Jaeho! First of all thanks for the detailed explanation for the preprocess. I can follow so far for my first project try on this data.

I encounter a problem after running the code. Can I ask for a help to solve this issue? Thank you very much

AttributeError Traceback (most recent call last)
in
155
156 test= MakeDataSet(LIDC_IDRI_list,IMAGE_DIR,MASK_DIR,CLEAN_DIR_IMAGE,CLEAN_DIR_MASK,META_DIR,mask_threshold,padding,confidence_level)
--> 157 test.prepare_dataset()

in prepare_dataset(self)
81 pid = patient #LIDC-IDRI-0001~
82 scan = pl.query(pl.Scan).filter(pl.Scan.patient_id == pid).first()
---> 83 nodules_annotation = scan.cluster_annotations()
84 vol = scan.to_volume()
85 print("Patient ID: {} Dicom Shape: {} Number of Annotated Nodules: {}".format(pid,vol.shape,len(nodules_annotation)))

AttributeError: 'NoneType' object has no attribute 'cluster_annotations'

ImportError

Thanks for your response.

Utils.ipynb is there. I ran it. I found the function is_dir_path inside also but I keep getting 'cannot import name is_dir_path' error message.

Please do you have any other advice for me?

Thank you

problem in running the code

RuntimeError: Could not establish path to dicom files. Have you specified the path option in the configuration file E:\Users\oqla2\pylidc.conf?

Installation help

Hi,
First of all, thank you so much for the effort put into the guide you wrote up and all this code! I know basically nothing about pre-processing and this is really helping me out.
To start, I wanted to just get your preprocessing code working. I created a new anaconda env and downloaded all the necessary packages. Sadly pylidc wasn't available, so I installed pip in the anaconda env and used pip to install the package.

Now when I run my code, I'm not getting any errors about imports anymore, but I get this error:

AttributeError: 'NoneType' object has no attribute 'cluster_annotations'

For this line:
nodules_annotation = scan.cluster_annotations()

Is this a pylidc installation issue?
Also can you please share exactly how you downloaded the packages (did you install all of them through pip?) so I can try that instead?

Thanks!

'NoneType' object has no attribute 'cluster_annotations'

I am getting error in LIDC-IDRI preprocessing
~\AppData\Local\Temp\ipykernel_11144\2002799093.py in prepare_dataset(self)
140
141 for patient in tqdm(self.IDRI_list):
--> 142 pid = LIDC-IDRI-0x17
143 scan = pl.query(pl.Scan).filter(pl.Scan.patient_id == pid).first()
144 nodules_annotation = scan.cluster_annotations()

if I am writing pid= patient
then getting error
~\AppData\Local\Temp\ipykernel_11144\2591889043.py in prepare_dataset(self)
142 pid = patient
143 scan = pl.query(pl.Scan).filter(pl.Scan.patient_id == pid).first()
--> 144 nodules_annotation = scan.cluster_annotations()
145 vol = scan.to_volume()
146 print("Patient ID: {} Dicom Shape: {} Number of Annotated Nodules: {}".format(pid,vol.shape,len(nodules_annotation)))

AttributeError: 'NoneType' object has no attribute 'cluster_annotations'

How many classes

Hi Jaeho, this is rather a (stupid) question than an issue. What are the number of classes for segmentation from this dataset? 2, right? Malignant or not?

RuntimeError: Could not establish path to dicom files.

hi friend
I am a beginner in Python and new to pylidc. Can you tell me how to specify the path option in the configuration file C:\Users\varun\pylidc.conf?
My LIDC data is available in folder D:\LIDCPREPROCESSING CODE\LIDC-IDRI-Preprocessing-master\LIDC-IDRI.
expecting positive response
chinnu

Generating masks for multiple nodules within the same slice

Hello, your repo has been extremely helpful in the data preprocessing for the LIDC dataset. much thanks.

There is one question that I have which is regarding the nodule masks generation.

For example, if Slice_50 contains 2 nodules, this code will generate 2 npy images for the lung, and 2 npy masks for the nodule right?

The generated npy images for the lung will be the same slice_50, however there will be 2 respective npy masks for each of the nodules within the slice_50.

How will this affect the training and validation accuracies?
