aim-harvard / pyradiomics


Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics

Home Page: http://pyradiomics.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Python 9.18% Jupyter Notebook 88.57% C 2.10% Shell 0.01% Dockerfile 0.13% Batchfile 0.01%
radiomics cancer-imaging-research medical-imaging computational-imaging nci-qin tcia-dac python radiomics-features docker nci-itcr

pyradiomics's People

Contributors

anthonyjatoba, bgeorge0, blezek, chchuj, danxiejy, dfsp-spirit, drkuzmin, fabianbalsiger, fedorov, haarburger, hugoaerts, jaasantinha, jbvimort, jcfr, joostjm, kathryn-schutte, lyhyl, maekclena, mattwarkentin, michaelschwier, pieper, piiq, rcuocolo, risheng1128, tommydino93, vnarayan13, ysuter, zivy

pyradiomics's Issues

Perform padding on sitkImage instead of array

As discussed yesterday, cropping the volume and adding padding should be done on the sitkImage, so as to be able to perform it outside the feature classes and keep information on location of tumor correct.

Python 3 compatibility

Do you see any potential issues for adding python 3 compatibility? As far as I see the dependencies, making pyradiomics compatible with both python 2 and 3 shouldn't be very difficult...

Versioning at install

Currently, versioning is added to the folder name of the package when it is installed in site-packages. However, this creates a new folder for every new commit that is installed. With pip distributions, older versions of a package are normally uninstalled before the new version is installed.

Will this also happen when pyradiomics is published? Is there another way to specify that the version information should not be included in the name of the installed folder? Or that older versions of the package are uninstalled first?

Cloning the repository takes some time

An idea: before doing a public release and involving a broader community, I would suggest looking for an alternative approach to storing the binary data.

time git clone git@github.com:Radiomics/pyradiomics.git
Cloning into 'pyradiomics'...
[...]

real    2m53.441s

Maybe they could be stored on data.radiomics.io? Or in a repo named Radiomics/pyradiomics-data? ...

Debugging wavelet and LoG operations

It would be helpful to compare the outputs of the LoG and wavelet operations with those implemented in Matlab. We do not test those operations at all at this point, and it makes sense to compare the outputs visually before writing the tests.

Initial testing framework setup

Nicole, this is just a placeholder to put all testing-related issues and keep track of status.

One idea I've just had (in addition to the other things we discussed before) is to also include tests that confirm that the development practices are followed. For example, we can have a test that confirms that docstrings are not empty for all feature getters.
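
A minimal sketch of such a docstring check, under stated assumptions: `ExampleFeatures` is a hypothetical stand-in for a feature class, and the `get...FeatureValue` naming pattern is assumed for illustration rather than taken from the code base.

```python
import inspect


class ExampleFeatures:
    """Hypothetical stand-in for a feature class."""

    def getMeanFeatureValue(self):
        """Return the mean gray level."""
        return 0.0

    def getEnergyFeatureValue(self):
        return 0.0  # deliberately left without a docstring


def undocumented_getters(cls):
    """Return names of get...FeatureValue methods whose docstring is missing or blank."""
    missing = []
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("get") and name.endswith("FeatureValue"):
            if not (member.__doc__ or "").strip():
                missing.append(name)
    return missing


print(undocumented_getters(ExampleFeatures))
```

A test could then assert that this list is empty for every real feature class.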

Adding TCGA testing data

@vnarayan13 has access to 100 patient images and mask files plus Matlab processing output.
Where should this be hosted for testing?
We also need the information to link image file, mask file, Matlab results (patient ID?)

This will be used to test the feature ranking, please add some details here about how that will be generated and tested from this data set.

Shift value in firstorder

There is a constant shift of 2000. At the very least, it should be a class constant. But I suggest we parameterize it (it will be irrelevant for MR images, for example), and also revise where it is actually needed (in many instances, intensity values are squared, so this becomes irrelevant).
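
One way the parameterization could look, as a sketch only: the `shift` argument and the `energy` helper below are hypothetical names chosen for illustration, and the example voxel values are made up.

```python
def energy(voxels, shift=0):
    """Sum of squared intensities after adding a configurable shift.

    `shift` is a hypothetical parameter replacing the hard-coded constant;
    a caller working with CT data could pass 2000, an MR caller could pass 0.
    """
    return sum((v + shift) ** 2 for v in voxels)


ct_voxels = [-1000, -50, 40]  # made-up intensities
print(energy(ct_voxels, shift=2000))
print(energy(ct_voxels))
```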

Use simple ITK labelShapeStatisticsFilter for computation of Shape features

See also @fedorov's comment in #115.
SimpleITK.LabelShapeStatisticsImageFilter exposes most of our shape features (and some we want to add in the future). This would simplify this feature class.

Nearly all features currently in shape.py are based upon volume and surface area and could therefore be calculated if necessary.

Additionally, I think we need to think about the implementation of shape, as it is currently calculated at the same points as the other feature classes, e.g. for every input/filtered image. However, for shape the values of the gray level intensities play no role, so shape only has to be calculated once per label map.

Installation instructions

We need instructions on how to install the package and a basic usage example, following helloRadiomics or something simpler.

While adding documentation, please investigate how those pages can be integrated with the sphinx documentation auto-generated from docstrings, so we have all of the documentation in one place and together with the code.

Degrees of freedom for skewness and kurtosis

Currently, the standard deviation in the pyradiomics package is calculated with 1 degree of freedom (i.e. the sum of squares is multiplied by 1 / (N-1)).
Standard deviation is also used in skewness and kurtosis, raised to the power of 3 and 4, respectively.
However, in the current version of the package, those two features use a standard deviation with 0 degrees of freedom (i.e. the sum of squares is multiplied by 1 / N).

I think we should make it consistent: either change the standard deviation to use 0 degrees of freedom, or change skewness and kurtosis. Both are equally simple to implement.
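
To illustrate the size of the inconsistency, here is a small pure-Python sketch. The `ddof` argument name is borrowed from `numpy.std`; the helper functions and sample values are assumptions made for this example, not the package's code.

```python
import math


def std(values, ddof):
    """Standard deviation with the sum of squares divided by (N - ddof)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - ddof))


def skewness(values, ddof):
    """Third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / std(values, ddof) ** 3


values = [1.0, 2.0, 2.0, 3.0, 9.0]  # made-up sample
ratio = skewness(values, ddof=1) / skewness(values, ddof=0)
print(ratio)
```

The two variants differ by a factor of ((N-1)/N)^(3/2), so the choice matters most for small regions.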

Confirm all feature classes are being tested

Currently, testing is not enabled for all of the feature classes due to some of the formulas being debugged. Whenever the implementation is finalized, make sure all feature classes are tested, and close this issue then.

Empty matrix handling

During testing of #63 testCase breast1 caused an empty glcm matrix for several angles.
How should the code handle this? Currently this empty matrix introduces some NaN-values due to 0-division in normalisation (P_glcm is divided by the sum, which is 0 in case of an empty matrix). This is causing all calculated feature values to also return NaN.
Possible solutions would be

  • Leave it as it is. This will cause all feature values to return NaN
  • Using numpy.where, prevent the matrix from containing NaN values (so that it contains only 0's instead). This will cause some featureValues to return NaN
  • Delete the angles which have an empty matrix associated with them. The resultant feature value will then be based only on angles containing a non-empty matrix and shouldn't return NaN

How should we go about this?
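
The third option could be sketched as follows. This uses plain nested lists to stay self-contained; `normalize_non_empty` is a hypothetical helper, not an existing function in the package.

```python
def normalize_non_empty(matrices):
    """Normalize each per-angle matrix by its sum, skipping matrices whose sum is 0."""
    normalized = []
    for matrix in matrices:
        total = sum(sum(row) for row in matrix)
        if total == 0:
            continue  # empty angle: dividing by the sum would produce NaN
        normalized.append([[cell / total for cell in row] for row in matrix])
    return normalized


angles = [
    [[2, 0], [0, 2]],
    [[0, 0], [0, 0]],  # empty matrix for this angle
]
result = normalize_non_empty(angles)
print(len(result), result[0])
```

Features would then be averaged over the remaining angles only.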

Padding size when filtering is applied

In the situations when, for example, we apply LoG, padding should be added to make sure filtering is possible, and pad size should be calculated as a function of kernel size.
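
As a rough sketch of "pad size as a function of kernel size": the helper below assumes a Gaussian-type kernel that is cut off at a multiple of sigma. The function name, the `cutoff` of 4 sigma, and the example spacing are all assumptions for illustration.

```python
import math


def pad_size_voxels(sigma_mm, spacing_mm, cutoff=4.0):
    """Voxels of padding per axis so a kernel of radius cutoff * sigma fits.

    sigma_mm is the kernel sigma in mm, spacing_mm the voxel spacing per axis.
    """
    return [math.ceil(cutoff * sigma_mm / s) for s in spacing_mm]


print(pad_size_voxels(3.0, (1.0, 1.0, 2.5)))
```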

pyradiomics tests always use default settings

In test_features, CircleCI tests all specified features (the standard set is now only firstorder).
However, when the feature calculation classes are initialized, no arguments are passed.
Therefore, the tests will always use the standard settings.
Currently:

self.binWidth = 25
self.resampledPixelSpacing = None # no resampling
self.interpolator = sitk.sitkBSpline
self.padDistance = 5 # no padding
self.padFillValue = 0
self.verbose = True
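
A sketch of how tests could exercise non-default settings, assuming the classes accept keyword overrides. `FeatureClassSketch` and `SETTINGS_UNDER_TEST` are hypothetical names; only the setting names and default values come from the list above.

```python
class FeatureClassSketch:
    """Hypothetical stand-in for a feature calculation class."""

    DEFAULTS = {"binWidth": 25, "resampledPixelSpacing": None, "padDistance": 5}

    def __init__(self, **kwargs):
        unknown = sorted(set(kwargs) - set(self.DEFAULTS))
        if unknown:
            raise ValueError(f"Unknown settings: {unknown}")
        # Explicit arguments override the defaults
        self.settings = {**self.DEFAULTS, **kwargs}


# Settings combinations a test could loop over instead of only the defaults
SETTINGS_UNDER_TEST = [{}, {"binWidth": 10}, {"binWidth": 50, "padDistance": 0}]
bin_widths = [FeatureClassSketch(**s).settings["binWidth"] for s in SETTINGS_UNDER_TEST]
print(bin_widths)
```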

2D maximum diameters

Thibaud Coroller asked me whether it was possible to add the maximum diameters in 2D, for correlation with those measured manually by radiologists.
I implemented these, but I'd rather wait with the PR until current issues with shape are resolved.

During development I did discover some differences with Matlab. These are due to a bug in the Matlab code. When the volume is small, Matlab calculates a distance matrix (distMat) containing the Euclidean distance. When the volume is larger and memory issues may occur, this matrix is calculated differently, and the square root is not applied.
For the calculation of max3D this does not matter (calculated directly after assignment of distMat), with the square root applied once.
For the max2D, the square root is also applied, meaning that for small volumes, the square root is applied twice, causing an erroneous distance to be calculated. For larger volumes (where the square root is not applied to distMat), the calculated max2D distances are correct.
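
The effect of the double square root can be reproduced with a toy example. This is an illustration of the described bug only, not the Matlab or pyradiomics code; `max_distance` and the sample points are made up.

```python
import math


def max_distance(points, extra_sqrt=False):
    """Largest pairwise Euclidean distance; extra_sqrt mimics the double square root."""
    best = 0.0
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            d = math.dist(p, q)  # square root applied once (correct)
            if extra_sqrt:
                d = math.sqrt(d)  # square root applied a second time (the bug)
            best = max(best, d)
    return best


points = [(0, 0), (3, 4), (0, 4)]
print(max_distance(points), max_distance(points, extra_sqrt=True))
```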

Create artificial test cases

Some extreme cases of certain textures can still cause bugs in the pyradiomics code, but won't be triggered during continuous testing because the current test cases do not present such a case. Adding certain artificial test cases may address this (for example, a test case containing a flat region).

Revise communication of feature computation results back to the caller

Currently, result of feature computation is communicated as a flat dictionary of shortFeatureName:value pairs: https://github.com/Radiomics/pyradiomics/blob/master/radiomics/base.py#L42-L45

It will probably be better if instead we map feature name to a structure that includes not only the value of the feature, but also the inputs and parameters used to get to this number. This helps with data provenance in general, and specifically should be useful for testing. Does this make sense? I propose we brainstorm what should be the first iteration of that structure.

{
  pyradiomicsResult : {
    commonParameters : {
      inputImage : string,
      inputMask : string,
      binWidth : integer,
      ...
    },
    firstOrderFeaturesResult : {
      parameters : {
        <not sure if there are any first order specific parameters ... >
      },
      values : {
        mean : float,
        STD : float,
        ...
      }
    }
  }
}

commonParameters would be populated by the base class, and feature-class-specific parameters by the implementing class.

We can also consider JSON-LD for referring back to the versioned documentation page describing the concepts https://en.wikipedia.org/wiki/JSON-LD.

Let's discuss this at the next call?

Investigate the use of parameterization to simplify testing

Information on the nose-parameterized plugin: https://pypi.python.org/pypi/nose-parameterized/
It looks like exactly what we need: http://stackoverflow.com/questions/30874795/use-class-method-in-nose-parameterize-expand-call
Please compare it with the capabilities of nose2, which seems to include parameterization natively, and check whether it is better to switch to nose2 instead of using the parameterization plugin: http://nose2.readthedocs.org/en/latest/differences.html
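
For comparison, the same idea can be sketched with only the standard library. Note that this swaps in `unittest`'s `subTest` rather than nose-parameterized or nose2; the case list and `compute_feature_count` are hypothetical placeholders.

```python
import unittest

# Hypothetical (test case, feature class) combinations
CASES = [
    ("breast1", "firstorder"),
    ("breast1", "glcm"),
    ("example2", "firstorder"),
]


def compute_feature_count(test_case, feature_class):
    """Hypothetical stand-in for running a feature class on a test case."""
    return len(test_case) + len(feature_class)


class TestFeatures(unittest.TestCase):
    def test_all_combinations(self):
        for test_case, feature_class in CASES:
            # Each combination is reported separately if it fails
            with self.subTest(test_case=test_case, feature_class=feature_class):
                self.assertGreater(compute_feature_count(test_case, feature_class), 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFeatures)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Unlike a parameterization plugin, this runs as a single test method, so the combinations do not show up as separate tests in the count.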

Add padding to cropToTumorMask

For small masks, the result of cropToTumorMask can be too small for many of the feature calculations, causing exceptions. Could cropToTumorMask have a pad parameter? My hack looks like this:

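  import numpy
  import SimpleITK as sitk

  # Determine bounds of the labelled region: [xmin, xmax, ymin, ymax, zmin, zmax]
  lsif = sitk.LabelStatisticsImageFilter()
  lsif.Execute(imageNode, maskNode)
  bb = numpy.array(lsif.GetBoundingBox(label))

  # Expand the bounds by `pad` voxels on every side
  bb = bb + [-pad, pad, -pad, pad, -pad, pad]

  # `size` is assumed here to be the image size as a numpy array
  ijkMinBounds = bb[0::2]
  ijkMaxBounds = size - bb[1::2] - 1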
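
One thing a `pad` parameter would probably need is clamping, so that the expanded box never leaves the image. A minimal sketch in plain Python, with `padded_bounds` as a hypothetical helper and made-up example values:

```python
def padded_bounds(bounding_box, size, pad):
    """Expand a bounding box by `pad` voxels and clamp it to the image extent.

    bounding_box is [xmin, xmax, ymin, ymax, zmin, zmax] in voxel indices,
    size is the image size per axis.
    """
    lower = [max(lo - pad, 0) for lo in bounding_box[0::2]]
    upper = [min(hi + pad, n - 1) for hi, n in zip(bounding_box[1::2], size)]
    return lower, upper


print(padded_bounds([2, 10, 0, 5, 7, 9], size=(12, 8, 10), pad=3))
```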

Build around Pandas / DataFrames instead of CSVWriter

The library would be a lot more flexible if it were built around Pandas and DataFrames, since they are better suited for looking at collections of data. They also allow for many more export options such as JSON, HTML, and CSV, and have better support for types inside columns, extendable to figures and arrays.

Add checks to ensure image and label correspondence

Add checks (in base class?):

  1. image and label have similar (within tolerance) base pixel spacing
  2. image and label have similar origins (does this vary with dimension?)
  3. image and label have similar 3-space orientation (space directions in nrrd header?)
  4. image and label have similar dimensions (not necessary, but need to somehow check registration)
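
The checks above could be sketched like this. `Geometry` is a hypothetical container; with SimpleITK images its fields would presumably be filled from `GetSize()`, `GetSpacing()`, `GetOrigin()` and `GetDirection()`. The tolerance value is an arbitrary assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Geometry:
    """Hypothetical container for the geometric metadata of an image or label map."""
    size: tuple
    spacing: tuple
    origin: tuple
    direction: tuple


def geometry_mismatches(image, label, tol=1e-3):
    """Return the names of the properties on which image and label disagree."""
    problems = []
    if image.size != label.size:
        problems.append("size")
    for name in ("spacing", "origin", "direction"):
        a, b = getattr(image, name), getattr(label, name)
        same = len(a) == len(b) and all(
            math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b)
        )
        if not same:
            problems.append(name)
    return problems


identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
image = Geometry((64, 64, 32), (1.0, 1.0, 2.5), (0.0, 0.0, 0.0), identity)
label = Geometry((64, 64, 32), (1.0, 1.0, 3.0), (0.0, 0.0, 0.0), identity)
print(geometry_mismatches(image, label))
```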

`imageoperations.interpolateImage` not called

In radiomics.base, settings are provided to resample an image.
The actual resampling is then implemented in imageoperations.interpolateImage.
However, this function is never called in the code.
It may be that this is applied in some API that is not included in the repository, but I can't find any reference in any of the radiomicsfeatureclasses or in any testclasses.

Add documentation on feature class at start of class

Issues to cover:

  • Explanation of feature class
  • Calculation of data matrix (if applicable)
  • Small matrix example
  • Possible custom settings (none necessary for currently implemented feature classes)
  • References (Article / Wiki / Other URLs)

Update baselines for first order Uniformity and Entropy

These should be taken to be the values currently generated by the pyradiomics code implementation. The discrepancy is most likely due to the differences in the binning approach implementation, according to @JoostJM. Consensus was reached in the group that it is not worthwhile to investigate this, but use the pyradiomics implementation.

Command line tool for features/signature computation

As discussed with Hugo, it will be helpful to have a command-line tool accompanied by a configuration file to select features to be calculated. Given config file, image, and mask, it would output features in a JSON or another format.
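
A possible shape for such a tool's argument handling, sketched with `argparse`. All argument names, the `--config` and `--format` flags, and the example file names are hypothetical; no feature extraction is performed here.

```python
import argparse
import json


def build_parser():
    """Build a parser for a hypothetical feature-extraction command."""
    parser = argparse.ArgumentParser(description="Compute radiomics features")
    parser.add_argument("image", help="path to the input image")
    parser.add_argument("mask", help="path to the label map")
    parser.add_argument("--config", help="configuration file selecting the features")
    parser.add_argument("--format", choices=["json", "csv"], default="json",
                        help="output format")
    return parser


# Parse an example command line instead of sys.argv
args = build_parser().parse_args(["scan.nrrd", "mask.nrrd", "--config", "params.json"])
print(json.dumps(vars(args), sort_keys=True))
```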
