
LIDC Matlab Toolbox

Thomas A. Lampert, ICube, University of Strasbourg

This work was carried out as part of the FOSTER project, which is funded by the French Research Agency (Contract ANR Cosinus, ANR-10-COSI-012-03-FOSTER, 2011—2014): http://foster.univ-nc.nc/

Introduction

This toolbox accompanies the following paper:

T. Lampert, A. Stumpf, and P. Gancarski, 'An Empirical Study of Expert Agreement and Ground Truth Estimation', IEEE Transactions on Image Processing 25 (6): 2557–2572, 2016.

If you use this toolbox for research purposes, please cite the paper.

The toolbox contains functions for converting the LIDC database XML annotation files into images. The main function is LIDC_process_annotations, which extracts the readings of each individual reader in the database and creates a TIFF image for each slice of the scan.

Overview

The function works whether or not the images are present. Nevertheless, the images are used to sort the slices, so without them the output will not be in 'anatomical' order. The slice spacing is first determined from the DICOM images if they are present; if Max fails to determine it, it is then calculated automatically from the annotations.

There are two paths to set in the LIDC_process_annotations.m file. The first points to the LIDC dataset; it will be searched recursively for XML files and each one will be processed. The second is the output path; if the images are present in the dataset then three folders will be created there: gts, images, and masks. Please note that neither path may contain a space.

Each of these folders will contain subfolders named after the StudyInstanceID of the relevant scan (minus the prefix '1.3.6.1.4.1.14519.5.2.1.6279.6001.', which seems to be constant throughout the dataset). Within the gts folder are several folders named slice1 ... sliceX, where X is the number of slices for which reader annotations were found. Each of these folders contains the files GT_id1.tif ... GT_idY.tif, where Y is the number of readers found for that particular scan (each file is a binary image in which ones denote the reader's ground truth). The gts folder will also contain a text file that details the correspondence between each folder's name (slice number), the SOPInstanceUID (unique to each slice of the scan), and the DICOM filename that contains that slice (if the images are present).

The masks folder contains binary images slice1.tif ... sliceY.tif, in which ones indicate areas that are out-of-bounds of the scan. The images folder contains the slice images slice1.tif ... sliceY.tif. The masks and images folders are only created if the images are present in the folder in which the XML annotations exist (as is the structure when the LIDC dataset is downloaded).

The toolbox only extracts the slices for which annotations are found; the remaining slices can be obtained from the DICOM images quite easily.
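Once processing has finished, the per-reader ground truths can be iterated over along the lines of the following sketch; the output path and scan ID used here are hypothetical examples, not part of the toolbox:

```matlab
% Sketch: load every reader's ground truth for one slice of one scan.
% 'output' and the StudyInstanceID folder name are illustrative only.
gt_dir = fullfile('output', 'gts', '191425307197546732281885591780', 'slice1');
gt_files = dir(fullfile(gt_dir, 'GT_id*.tif'));   % one binary image per reader
for i = 1:numel(gt_files)
    gt = imread(fullfile(gt_dir, gt_files(i).name));
    fprintf('%s: %d marked pixels\n', gt_files(i).name, nnz(gt));
end
```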

Installation

To use the toolbox's functions, simply add the toolbox directory to Matlab's path. A short description of each function's purpose can be found in its header.
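For example (the toolbox location below is a placeholder, not the real path):

```matlab
% Add the toolbox and its subdirectories to Matlab's path.
addpath(genpath('/path/to/LIDCtoolbox'));   % replace with the actual location
savepath;                                   % optional: persist across sessions
```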

The function LIDC_xml_2_pmap uses (a slightly modified version of) the external Perl script max.pl (https://wiki.cancerimagingarchive.net/display/Public/Lung+Image+Database+Consortium) and therefore requires that Perl is installed; furthermore, the following Perl packages must be installed:

XML::Twig
XML::Parser
Math::Polygon::Calc
Tie::IxHash

More information can be found in the header of LIDC_process_annotations.m and in the max.pl script located at ./support_software/max.pl. If you are using OS X (and perhaps Linux) you may also need to update the perl_library_path variable in LIDC_xml_2_pmap.m to point to the correct location of these libraries (particularly if you receive the error "Can't locate XML/Twig.pm", or Perl complains that XML::Twig is not installed when it is). These packages can be installed with the following command (use sudo on OS X):

    perl -MCPAN -e "install XML::Twig"

NOTE: This toolbox was created under Matlab 2013a on OS X; it has also been tested under Windows 7 x64 using Matlab 2012b.

NOTE: Many functions include assignments such as [~, var1] = someFunction(input), which is only supported in Matlab R2009b and later. For earlier versions you can replace these assignments with [ignore, var1] = someFunction(input), although I have not tested whether other incompatibilities exist.
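As a sketch of the substitution (using size as a stand-in for any function whose first output is unwanted):

```matlab
% R2009b and later: the tilde discards the first output.
[~, n] = size(zeros(2, 3));        % n is 3

% Pre-R2009b replacement: capture the unwanted output in a dummy variable.
[ignore, n] = size(zeros(2, 3));   % n is 3; 'ignore' is simply never used
```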

Quick Start - Windows

  1. Download and install ActivePerl (http://www.activestate.com/activeperl/downloads)
  2. Install the following perl packages
  • XML::Twig
  • XML::Parser
  • Math::Polygon::Calc
  • Tie::IxHash

To do that, start the Windows command prompt and execute the following commands:

    perl -MCPAN -e "install XML::Twig"
    perl -MCPAN -e "install XML::Parser"
    perl -MCPAN -e "install Math::Polygon::Calc"
    perl -MCPAN -e "install Tie::IxHash"

  3. Restart your PC

Your computer is now ready to use the toolbox.

To begin using the toolbox

  1. Open the file "LIDC_process_annotations.m"
  2. Set the paths to the LIDC-IDRI folder and to the output folder
  • Note: Make sure your paths do not include a space. For example:
      "c:\LIDCtoolbox v1.3" is an invalid path due to the space before "v1.3"
      "c:\LIDCtoolbox_v1.3" is a valid path
  3. Run "LIDC_process_annotations.m"

Quick Start - OS X/Linux

  1. Install the following perl packages
  • XML::Twig
  • XML::Parser
  • Math::Polygon::Calc
  • Tie::IxHash

To do that, start the terminal and execute the following commands:

    sudo perl -MCPAN -e "install XML::Twig"
    sudo perl -MCPAN -e "install XML::Parser"
    sudo perl -MCPAN -e "install Math::Polygon::Calc"
    sudo perl -MCPAN -e "install Tie::IxHash"

Your computer is now ready to use the toolbox.

To begin using the toolbox

  1. Open the file "LIDC_process_annotations.m"
  2. Set the paths to the LIDC-IDRI folder and to the output folder
  • Note: Make sure your paths do not include a space. For example:
      "/users/LIDCtoolbox v1.3" is an invalid path due to the space before "v1.3"
      "/users/LIDCtoolbox_v1.3" is a valid path
  3. Open the file "LIDC_xml_2_pmap.m"
  4. Set the path stored in the variable "perl_library_path" to the directory that contains the Perl packages installed above
  5. Run "LIDC_process_annotations.m"
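As a sketch, the perl_library_path assignment in LIDC_xml_2_pmap.m might look like the following; the path shown is an illustrative example only and depends on where CPAN installed the packages on your system:

```matlab
% Inside LIDC_xml_2_pmap.m: point Perl at the directory that contains the
% installed packages (i.e. the one holding XML/Twig.pm and friends).
% This path is an example; locate the real one with: perl -e 'print "@INC"'
perl_library_path = '/opt/local/lib/perl5/site_perl';
```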

If you find any problems or would like to contribute some code to the toolbox then please contact me.

Validation

The sample output of three scans is included in the "sample_output" directory. This is a sample of the output that should be expected when the full dataset is downloaded (i.e. annotations and images) and processed. When only the annotations are present, the "images" and "masks" directories will not be created.

Please note that the images will appear black in most standard viewers. This is because the masks contain binary values (0s and 1s), and the scan images contain the original DICOM values, which are not scaled to the standard image value range (0–255). The recommended viewing method is to load them into a programming environment and use an image viewing function that performs automatic range scaling, e.g. (using Matlab):

    img = imread(filename);
    imagesc(img);
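Alternatively, if you want copies that open correctly in a standard viewer, the raw values can be rescaled to 8 bits first. A minimal sketch, assuming the Image Processing Toolbox is available (the filenames are examples):

```matlab
% Rescale a raw-valued scan slice to 0-255 and save a viewable PNG copy.
img  = imread('slice1.tif');                  % original DICOM intensity values
img8 = uint8(255 * mat2gray(double(img)));    % map min..max onto 0..255
imwrite(img8, 'slice1_preview.png');
```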

Known Problems

Without the images, Max (used in the backend) will attempt to infer the slice spacing from the annotations. This may fail, or the automatically calculated value may cause error messages such as:

    FATAL[6501] (in MAX in sub rSpass2 near line 4366): Couldn't get a Z index at Z coordinate -105.300 mm

I have manually added values for these cases; see the switch statement on line 175 of LIDC_process_annotations.m. If this happens for other cases, you will need to add the slice spacing to the switch statement manually. The studyID to use in the switch statement is output during the failure. The slice spacing can be inferred from the DICOM info contained in the images (this is done automatically when they are present), by analysing the XML file with Max using the following command (executed from the toolbox path):

    perl support_software/max-V107b/max-V107b.pl --skip-num-files-check --z-analyze --files=<path to xml file>

or by trial and error. Note that the toolbox does not use any information derived from this value; moreover, this problem occurs mainly (but not exclusively) in scans where only small nodules are marked, which are therefore unused.
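As a sketch, a new entry added to that switch statement might look as follows; the variable names, studyID, and spacing value below are illustrative, so check the actual code around line 175 for the real identifiers:

```matlab
% Hypothetical sketch of adding a manually determined slice spacing;
% the studyID and the value 2.5 are examples, not real entries.
switch studyID
    case '280315210397549164238230581781'   % ID reported during the failure
        slice_spacing = 2.5;                % mm, found via max.pl --z-analyze
    % ... existing cases remain here unchanged ...
end
```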

Acknowledgements

Peyton Bland gave considerable advice with regards to the Max software upon which this toolbox is based. Hamada Rasheed Hassan also contributed through extensive testing of the toolbox under Windows and in helping write this readme file.


lidctoolbox's Issues

Missing GTs

First of all, thanks for making this toolbox! I have a question regarding the difference between an empty .tif file in the gts folder and missing .tif files.
For instance, in sample_output/gts/191425307197546732281885591780/slice1/ there are 3 (out of 4) .tif files which only contain zeros. When looking at the corresponding .xml we also see that only in the last readingSession there is an entry for this imageZposition. However, in sample_output/gts/490157381160200744295382098329/slice1/ there are only two .tif files. What is the reason that sometimes the script adds empty (containing only zeros) .tif files in the gts folder while in other cases it does not?

output file issue

Hi,
I'm generating the LIDC labeled dataset using your toolbox, but when comparing the results with your sample output for "LIDC-IDRI-0002", "LIDC-IDRI-0004", and "LIDC-IDRI-0005", I can see that your folders contain only two files, i.e., GT_id1.tif and GT_id2.tif. In my case there are four: GT_id1.tif, GT_id2.tif, GT_id3.tif, and GT_id4.tif. Most of them are blank. Even the image and mask folders contain fully black images. Moreover, in the "slice_correspondences.txt" file, the filenames are different.
Questions:

  1. Could you tell me how I can be sure that this code is generating the actual ground truth?
  2. "Each of these folders contains the files GT_id1.tif ... GT_idY.tif where Y is the number of readers found for that particular scan (each file is a binary image where ones denote the markers GT)." Does "reader" denote the number of experts/radiologists? If so, which results should we consider for training and testing?

Working with a subset of LIDC

Does this require the entire LIDC dataset to be downloaded, or would it work with only a few patients' data in the directory?

Images from scans made with Toshiba scanner

I have a question regarding the slices generated from Toshiba scanners. They all seem to be very dark compared to slices generated from other scanners. I simply import the generated .tif image and use Matplotlib with a grey colormap, and this works perfectly for all images except the Toshiba ones.

Below you can find an example of how the generated image looks and how the original .dcm image looks when imported using Pydicom and saved as .png with matplotlib. When reading the files with imagesc() from Matlab (as you mentioned in a different issue) you also see that these scans are much darker. Is there something I can do to improve this?

[attached images: slice73-150 (generated) and slice73-150 (original)]

When I run it, I get an error

Hello! I tried to run it on Matlab 2018b, but I got the following error:

    Error using LIDC_xml_2_pmap (line 143)
    There was a problem executing Max (perhaps the slice spacing or something more serious -- see Max output
    above). The studyID was 280315210397549164238230581781 and the input file was
    F:\LIDC\input\LIDC-IDRI-0007\01-01-2000-81781\3000631-57680\1\081_1.xmltemp

    Error in LIDC_process_annotations (line 234)
    LIDC_xml_2_pmap(new_xml_paths{i}{j}, new_xml_filenames{j}, pixel_spacing,
    slice_thickness, [xml_path, filename], studyID);

Could you please help me fix it?

Strange output for around 120 cases with Toshiba scanners

I've found that there is still something strange happening with Toshiba scans, for instance case LIDC-IDRI-0892, where one (000042.dcm) of the original DICOM files looks like this:
[screenshot of the original DICOM slice]

And for this image, the output of the Toolbox is:
[screenshot of the toolbox output for the same slice]

There seems to be something strange happening due to the fact that there are a lot of negative pixel values in the original image, as you can see in this histogram:
[histogram of the original image's pixel values]

Could this be caused by the following warning the toolbox gives when processing this patient?

    Warning: No padding value found. If negative values are found in the image then these will be used for the mask, if not, the mask will be empty

Tiff image issues

First of all, thank you very much for your toolbox!

I have set up Matlab + the LIDC toolbox on my local computer and converted a small LIDC directory via the toolbox. But I found that the converted TIFFs appear black in Windows Photo Viewer.

I don't know what causes this; waiting for help :)

PS: What application should I use to view these TIFFs?

What is the definition of "gts", "images", "masks" ?

Hi, TesterTi

  Thanks for your help.

  I don't know the definitions of these three directories (gts, images, masks).

  1. What is the difference between them?
  2. Do I view them all using the same method (Matlab, imagesc())?
  3. Which directory should I use if I want to make a presentation to other people for a medicine demo?

How can I find the information about "characteristics" in xml file?

Hi, TesterTi

  I am studying the annotation XML for LIDC-IDRI-0002. There is a section named "characteristics" in the XML, but I cannot find a description of it.
  For example:

    <characteristics>
      <confidence>3</confidence>
      <subtlety>4</subtlety>
      <obscuration>2</obscuration>
      <reason>1</reason>
    </characteristics>

  What is the meaning of "confidence" and "reason"? Where can I find these descriptions?

  I have searched for them at "https://cdebrowser.nci.nih.gov/CDEBrowser/", but nothing was found.

 Thanks for your help! :)

Negative pixel values in .dcm images

Related to my previous question, I found out that .dcm files made with Toshiba scanners contain a lot of negative pixel values. Here you can find a histogram of the pixel values for one of the cases in the LIDC dataset.
[histogram of pixel values for one Toshiba case before preprocessing]

However, when I inspect the image that matches this slice after preprocessing it with this toolbox, I see that there are no negative pixel values anymore, and these images are also a lot darker than images from all other scanners. I've inspected the code, but I can only see you replacing the out-of-scan negative pixel values. Could you explain how you deal with negative pixel values in the original DICOM images and why they disappear after preprocessing? Thanks a lot!

Getting the output of the toolbox

First of all thank you very much for creating this toolbox. It should save me and others a lot of time!
I was wondering if it is possible to get the output of matched nodules created by the toolbox. Maybe the output is already sufficient and I won't actually need to run the code.

Thank you.
