aim-harvard / pyradiomics

Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics

Home Page: http://pyradiomics.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Python 9.18% Jupyter Notebook 88.57% C 2.10% Shell 0.01% Dockerfile 0.13% Batchfile 0.01%

radiomics cancer-imaging-research medical-imaging computational-imaging nci-qin tcia-dac python radiomics-features docker nci-itcr

pyradiomics's Issues

Protection of master branch to avoid inadvertent overwrite

Here are the settings I just enabled:

Cc: @Radiomics/developers

Perform padding on sitkImage instead of array

As discussed yesterday, cropping the volume and adding padding should be done on the sitkImage, so as to be able to perform it outside the feature classes and keep information on location of tumor correct.

Python 3 compatibility

Do you see any potential issues for adding python 3 compatibility? As far as I see the dependencies, making pyradiomics compatible with both python 2 and 3 shouldn't be very difficult...

Make progress bar optional

The progress bar causes issues when running the code in batch mode.

Versioning at install

Currently, versioning is added to the folder name of the package when it is installed in site-packages. However, this causes a new folder to be created for every new commit that is installed. With pip distributions, pip normally uninstalls older versions of a package before installing the new version.

Will this also happen when pyradiomics is published? Is there another way to specify that the version information should not be included in the name of the installed folder? Or that older versions of the package are uninstalled first?

Cloning the repository takes some time

An idea: before doing a public release and involving a broader community, I would suggest looking for an alternative approach to storing the binary data.

time git clone git@github.com:Radiomics/pyradiomics.git
Cloning into 'pyradiomics'...
[...]

real    2m53.441s

Maybe they could be stored on data.radiomics.io? Or in a repo named Radiomics/pyradiomics-data? ...

Investigate integration of formulas into docstrings

also see

https://pythonhosted.org/an_example_pypi_project/sphinx.html

Debugging wavelet and LoG operations

It would be helpful to compare the outputs of the LoG and wavelet operations with those implemented in Matlab. At this point, we don't test those operations at all, and it makes sense to visually compare the outputs before writing the tests.

Initial testing framework setup

Nicole, this is just a placeholder to put all testing-related issues and keep track of status.

One idea I've just had (in addition to the other things we discussed before) is to also include tests that confirm that the development practices are followed. For example, we can have a test that confirms that docstrings are not empty for all feature getters.
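
A minimal sketch of such a development-practice test, using only the standard library. The class ExampleFeatures and the getter naming pattern get<Name>FeatureValue are assumptions made for illustration; the real feature classes and naming convention may differ.

```python
import inspect


class ExampleFeatures:
    """Hypothetical feature class, used only for this illustration."""

    def getMeanFeatureValue(self):
        """Mean gray level intensity within the region of interest."""
        return 0.0

    def getEnergyFeatureValue(self):
        # Deliberately left without a docstring.
        return 0.0


def getters_missing_docstrings(feature_class):
    """Return the names of feature getters whose docstring is empty or absent."""
    missing = []
    for name, member in inspect.getmembers(feature_class, inspect.isfunction):
        # Assumed naming convention: get<Name>FeatureValue
        if name.startswith("get") and name.endswith("FeatureValue"):
            if not (member.__doc__ or "").strip():
                missing.append(name)
    return missing
```

A test could then assert that the returned list is empty for every feature class, so a getter added without documentation fails the build.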

Stabilize dynamic binning for Wavelet transform output

Recompute bin edges and binwidth to maintain consistent bin count for all normalized stationary wavelet transform outputs.
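
One possible reading of this, as a sketch only: derive the bin width from the value range of each output so that the number of bins stays fixed. The function name and the equal-width scheme are assumptions, not the method used in the package.

```python
def fixed_count_bin_edges(values, bin_count):
    """Return (bin_width, edges) so that `values` span exactly `bin_count` bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat input has no range, so no bin width can be derived from it.
        raise ValueError("flat region: cannot derive a bin width")
    bin_width = (hi - lo) / bin_count
    edges = [lo + i * bin_width for i in range(bin_count + 1)]
    edges[-1] = hi  # guard against floating-point drift in the last edge
    return bin_width, edges
```

With this scheme the bin width varies per image while the bin count does not, which is the opposite trade-off from a fixed bin width.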

Resolve calculation mismatches between Matlab and python code

Collect information about failing tests.
Calculate features on more TCGA data sets and compare Matlab vs python results.

Adding TCGA testing data

@vnarayan13 has access to 100 patient images and mask files plus Matlab processing output.
Where should this be hosted for testing?
We also need the information to link image file, mask file, Matlab results (patient ID?)

This will be used to test the feature ranking, please add some details here about how that will be generated and tested from this data set.

Migrate documentation for individual functions from the HeterogeneityCAD wiki

As described in the comments to #2, a lot of functions are thoroughly documented on this page: http://www.slicer.org/slicerWiki/index.php/Documentation/Nightly/Modules/HeterogeneityCAD. These descriptions should be added to the code as docstrings.

Add flake8 and/or pylint check

Doing so will ensure consistency of the code.

Example: https://github.com/scikit-build/scikit-build/blob/master/.flake8

Shift value in firstorder

There is a constant shift of 2000. At the very least, it should be a class constant. But I suggest we parameterize it (it will be irrelevant for MR images, for example), and also revise where it is actually needed (in many instances, intensity values are squared, so this becomes irrelevant).

Use simple ITK labelShapeStatisticsFilter for computation of Shape features

See also @fedorov's comment in #115.
SimpleITK.LabelShapeStatisticsImageFilter exposes most of our shape features (and some we want to add in the future). This would simplify this feature class.

Nearly all features currently in shape.py are based upon volume and surface area and could therefore be calculated if necessary.

Additionally, I think we need to think about the implementation of shape, as it is currently calculated at the same points as the other feature classes, e.g. for every input/filtered image. However, for shape the values of the gray level intensities play no role, so shape only has to be calculated once per label map.

Standard python package layout

@fedorov Assuming the package won't depend on Slicer, I would suggest organizing it as a standard Python package.

See https://github.com/pypa/sampleproject and https://packaging.python.org/en/latest/distributing/

Installation instructions

We need instructions on how to install the package, and a basic usage example, following helloRadiomics or something simpler.

While adding documentation, please investigate how those pages can be integrated with the sphinx documentation auto-generated from docstrings, so we have all of the documentation in one place and together with the code.

Degrees of freedom for skewness and kurtosis

Currently, the standard deviation in the pyradiomics package is calculated with 1 degree of freedom (i.e. the sum of squares is multiplied by 1 / (N-1)).
Standard deviation is also used in skewness and kurtosis, raised to the power of 3 and 4, respectively.
However, in the current version of the package, skewness and kurtosis use a standard deviation with 0 degrees of freedom (i.e. the sum of squares is multiplied by 1 / N).

I think we should make it consistent: either change the standard deviation to use 0 degrees of freedom, or change skewness and kurtosis. Both are equally simple to implement.
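
To make the size of the inconsistency concrete, here is a small standard-library sketch. The skewness function is an illustrative definition (third central moment divided by N, over the cubed standard deviation), not the package's code, and the sample values are made up.

```python
import statistics


def skewness(values, ddof):
    """Skewness using a standard deviation with the given degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    std = variance ** 0.5
    # Assumption: the third central moment itself is always divided by N.
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / std ** 3


values = [1.0, 2.0, 2.0, 3.0, 10.0]  # made-up sample
std_sample = statistics.stdev(values)       # sum of squares divided by N - 1
std_population = statistics.pstdev(values)  # sum of squares divided by N
```

Under this definition the two variants differ by a factor of ((N-1)/N)^1.5 for skewness, so the discrepancy is largest for small regions and vanishes as N grows.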

Improve testing output

@naucoin If the expected vs current output of a test is more than one value, you could display the diff between two files doing something like this: https://github.com/kripken/emscripten/blob/07b87426f898d6e9c677db291d9088c839197291/tests/runner.py#L460
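
A sketch of the same idea with Python's standard difflib module; the function name is hypothetical.

```python
import difflib


def describe_mismatch(expected, actual):
    """Return a unified diff of two multi-line strings ('' if they are equal)."""
    diff = difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    )
    return "\n".join(diff)
```

The returned string can be passed as the message of a failing assertion, so the test log shows only the lines that differ.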

Confirm all feature classes are being tested

Currently, testing is not enabled for all of the feature classes due to some of the formulas being debugged. Whenever the implementation is finalized, make sure all feature classes are tested, and close this issue then.

Empty matrix handling

During testing of #63 testCase breast1 caused an empty glcm matrix for several angles.
How should the code handle this? Currently this empty matrix introduces NaN values due to division by zero during normalisation (P_glcm is divided by its sum, which is 0 for an empty matrix). This causes all calculated feature values to return NaN as well.
Possible solutions would be:

Leave it as it is. This will cause all feature values to return NaN.
Using numpy.where, prevent the matrix from containing NaN values (so that it contains only 0's instead). This will still cause some feature values to return NaN.
Delete the angles which have an empty matrix associated with them. The resultant feature value will then be based only on angles with a non-empty matrix and shouldn't return NaN.

How should we go about this?
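
The third option could look roughly like the sketch below. It uses plain nested lists instead of numpy arrays to stay self-contained, and the function name is hypothetical.

```python
def normalize_non_empty(matrices):
    """Normalize each per-angle matrix, dropping angles whose matrix sums to 0."""
    normalized = []
    for matrix in matrices:
        total = sum(sum(row) for row in matrix)
        if total == 0:
            # Empty matrix for this angle: skip it instead of dividing by zero.
            continue
        normalized.append([[cell / total for cell in row] for row in matrix])
    return normalized
```

If every angle yields an empty matrix the result is an empty list, so the caller still has to decide what to report in that case.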

Padding size when filtering is applied

In the situations when, for example, we apply LoG, padding should be added to make sure filtering is possible, and pad size should be calculated as a function of kernel size.
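
As a rough sketch, the pad size per axis could be derived from the Gaussian sigma and the voxel spacing. The cutoff of 4 sigma is an assumption chosen for illustration, as are the function and parameter names; the kernel extent actually used by the filter should be checked before relying on it.

```python
import math


def pad_size_for_log(sigma_mm, spacing_mm, cutoff=4.0):
    """Return the number of padding voxels per axis for a LoG filter.

    Assumption: the kernel is treated as negligible beyond `cutoff` * sigma.
    """
    return [math.ceil(cutoff * sigma_mm / s) for s in spacing_mm]
```

For example, sigma = 2.0 mm with spacing (1.0, 1.0, 2.5) mm gives a padding of 8, 8 and 4 voxels.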

Computation time performance profiling of feature calculation

Some information that gives the user an idea of the relative computation time of each feature class could be helpful.
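
A minimal way to gather such numbers, sketched with the standard library. The function name is hypothetical, and the callables stand in for whatever computes each feature class.

```python
import time


def relative_timings(callables):
    """Run each callable once and return its share of the total run time."""
    timings = {}
    for name, func in callables.items():
        start = time.perf_counter()
        func()
        timings[name] = time.perf_counter() - start
    total = sum(timings.values()) or 1.0  # avoid dividing by zero
    return {name: elapsed / total for name, elapsed in timings.items()}
```

Single runs are noisy; averaging over several test cases would give more stable shares.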

Add bin (and other?) parameters to the constructor

Default bin width should be 25

Related issue pointed out by JC: http://pythontips.com/2013/08/04/args-and-kwargs-in-python-explained/
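
A sketch of what a keyword-argument based constructor might look like. FeatureClassBase is a hypothetical stand-in, not the actual base class; the attribute names binWidth and verbose are taken from the settings discussed in these issues.

```python
class FeatureClassBase:
    """Hypothetical base class; not the actual pyradiomics implementation."""

    def __init__(self, inputImage, inputMask, **kwargs):
        self.inputImage = inputImage
        self.inputMask = inputMask
        # Settings fall back to their defaults when not passed by the caller.
        self.binWidth = kwargs.get("binWidth", 25)
        self.verbose = kwargs.get("verbose", True)
```

Calling FeatureClassBase(image, mask, binWidth=10) overrides the default, while FeatureClassBase(image, mask) keeps a bin width of 25.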

Add appveyor and travisci testing

One example is in https://github.com/qiicr/dcmqi

pyradiomics tests always use default settings

In test_features, CircleCI tests all specified features (the standard set is currently only firstorder).
However, when the feature calculation classes are initialized, no arguments are passed.
Therefore, the tests always use the standard settings.
Currently:

self.binWidth = 25
self.resampledPixelSpacing = None # no resampling
self.interpolator = sitk.sitkBSpline
self.padDistance = 5 # no padding
self.padFillValue = 0
self.verbose = True

Provide version information in output

Pyradiomics / dependencies versions should be passed as part of the output. Probably not logging but maybe as part of the feature table?
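
One way to collect this information, sketched with importlib.metadata from the standard library (Python 3.8 and later). The function name is hypothetical, and which packages to report is left to the caller.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return a dict of version strings for Python and the named packages."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    return versions
```

The resulting dictionary could be written as extra columns of the feature table or as a separate block in the output.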

Laplacian smoothing filter options

ITK filter has SetNormalizeAcrossScale = false by default. Make sure this is corrected to True if needed when LoG is integrated into the new classes.

http://www.itk.org/Doxygen/html/classitk_1_1LaplacianRecursiveGaussianImageFilter.html#a35e2271327e14dc69b8e8371839b9883

Add documentation generation based on sphinx/rst

A possible example: https://github.com/scikit-build/scikit-build/tree/master/docs

Add requirements.txt with the versioned dependencies

2D maximum diameters

Thibaud Coroller asked me whether it was possible to add the maximum diameters in 2D, for correlation with those measured manually by radiologists.
I implemented these, but I'd rather wait with the PR until current issues with shape are resolved.

During development I did discover some differences with matlab. These are due to a bug in the matlab code. When the volume is small, matlab calculates a distance matrix (distMat) containing the euclidean distance. When the volume is larger and memory issues may occur, this matrix is calculated differently, but also the square root is not applied.
For the calculation of max3D this does not matter (calculated directly after assignment of distMat), with the square root applied once.
For the max2D, the square root is also applied, meaning that for small volumes, the square root is applied twice, causing an erroneous distance to be calculated. For larger volumes (where the square root is not applied to distMat), the calculated max2D distances are correct.
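
The effect of the double square root is easy to see in a small sketch. This is an illustration of the arithmetic only, not the Matlab or pyradiomics code, and the function name is hypothetical.

```python
import math


def max_diameter(points):
    """Largest Euclidean distance between any two points."""
    max_squared = max(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p in points
        for q in points
    )
    # The square root is applied exactly once, to the squared distance.
    return math.sqrt(max_squared)
```

For the points (0, 0) and (3, 4) the distance is 5; taking the square root a second time would give about 2.24, which is the kind of error described above for small volumes.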

Clear dashboard from failing tests

https://circleci.com/gh/Radiomics/pyradiomics/66

Most of the failures can be resolved by adding baselines (features added by @JoostJM).

The only feature that is failing and is also implemented in matlab is test_scenario_brain2_glcm_SumSquares.

Separation of functionality in pyradiomics and pyradiomicsbatch command line tools

@Radiomics/developers: is there a good reason to have these two, as opposed to having a single pyradiomics command line tool that would support operation on both a single input and a directory?

Among other things, having a single processing script might help make things more straightforward with Docker deployment.

Create artificial test cases

Some extreme cases of certain textures can still cause bugs in the pyradiomics code, but won't be triggered during continuous testing because the current test cases do not contain such cases. Adding certain artificial test cases may address this (for example, a test case containing a flat region).

Document the process of implementing feature classes and individual features

issues to cover:

inherit from base class
follow conventions for naming feature getters
adding tests
adding documentation

Add feature tests that include LoG

add corresponding checks to the test utils/classes

Establish mapping between Matlab features and pyradiomics features

Also revise the names to make them as readable as possible

Revise communication of feature computation results back to the caller

Currently, result of feature computation is communicated as a flat dictionary of shortFeatureName:value pairs: https://github.com/Radiomics/pyradiomics/blob/master/radiomics/base.py#L42-L45

It will probably be better if instead we map each feature name to a structure that includes not only the value of the feature, but also the inputs and parameters used to get to this number. This helps with data provenance in general, and specifically should be useful for testing. Does this make sense? I propose we brainstorm what the first iteration of that structure should be.

{
  pyradiomicsResult : {
    commonParameters : {
      inputImage : string,
      inputMask : string,
      binWidth : integer,
      ...
    },
    firstOrderFeaturesResult : {
      parameters : {
        <not sure if there are any first order specific parameters ... >
      },
      values : {
        mean : float,
        STD : float,
        ...
      }
    }
  }
}

commonParameters would be populated by the base class, and feature-class-specific parameters by the implementing class.

We can also consider JSON-LD for referring back to the versioned documentation page describing the concepts https://en.wikipedia.org/wiki/JSON-LD.

Let's discuss this at the next call?
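
As a starting point for that discussion, the proposed structure maps directly onto nested Python dictionaries that serialize to JSON. The helper name and the file names below are placeholders for illustration.

```python
import json


def build_result(common_parameters, feature_class_name, parameters, values):
    """Wrap feature values together with the parameters used to compute them."""
    return {
        "pyradiomicsResult": {
            "commonParameters": dict(common_parameters),
            feature_class_name: {
                "parameters": dict(parameters),
                "values": dict(values),
            },
        }
    }


result = build_result(
    {"inputImage": "image.nrrd", "inputMask": "mask.nrrd", "binWidth": 25},
    "firstOrderFeaturesResult",
    {},  # no first-order specific parameters in this example
    {"mean": 1.5, "STD": 0.5},  # made-up values
)
serialized = json.dumps(result, indent=2, sort_keys=True)
```

Keeping the parameters next to the values means a baseline file records not only what was computed but also how.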

Investigate the use of parameterization to simplify testing

nose-parameterized plugin information: https://pypi.python.org/pypi/nose-parameterized/
looks like exactly what we need: http://stackoverflow.com/questions/30874795/use-class-method-in-nose-parameterize-expand-call
please compare with the capabilities of nose2 that seems to include parameterization natively, and whether it is better to switch to nose2 instead of using the parameterization plugin: http://nose2.readthedocs.org/en/latest/differences.html
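
For comparison with both options, the sketch below shows the same idea using only the standard library's unittest.subTest, which is a different mechanism from nose-parameterized or nose2. The sample data, baseline table and compute_feature function are all made up for illustration.

```python
import unittest

# Hypothetical inputs and baseline: (test case, feature name, expected value).
SAMPLES = {"caseA": [1.0, 2.0, 3.0], "caseB": [4.0, 4.0]}
BASELINE = [("caseA", "Mean", 2.0), ("caseA", "Range", 2.0), ("caseB", "Mean", 4.0)]


def compute_feature(case, feature):
    """Stand-in for the real feature computation."""
    values = SAMPLES[case]
    if feature == "Mean":
        return sum(values) / len(values)
    if feature == "Range":
        return max(values) - min(values)
    raise KeyError(feature)


class TestFeatures(unittest.TestCase):
    def test_against_baseline(self):
        for case, feature, expected in BASELINE:
            # Each combination is reported separately if it fails.
            with self.subTest(case=case, feature=feature):
                self.assertAlmostEqual(compute_feature(case, feature), expected)
```

Unlike a parameterization plugin, subTest keeps all combinations inside one test method, so the test count does not grow with the baseline table.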

Refactor image resampling out of the base class

The resampling option should be removed from the base class, and instead exposed as a separate preprocessing step, consistent with how LoG and wavelet operations are applied.

Add padding to cropToTumorMask

For small masks, the result of cropToTumorMask can be too small for many of the feature calculations, causing exceptions. Could cropToTumorMask have a pad parameter? My hack looks like this:

  # Determine bounds
  lsif = sitk.LabelStatisticsImageFilter()
  lsif.Execute(imageNode, maskNode)
  bb = numpy.array(lsif.GetBoundingBox(label))

  # Expand the bounds
  bb = bb + [-pad, pad, -pad, pad, -pad, pad]
  
  ijkMinBounds = bb[0::2]
  ijkMaxBounds = size - bb[1::2] - 1
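
One caveat with expanding the bounds is that they can end up outside the image. The sketch below pads a bounding box in the same [xmin, xmax, ymin, ymax, zmin, zmax] layout used above and clamps it to the image size. It uses plain lists rather than numpy, and the function name is hypothetical.

```python
def pad_bounding_box(bb, size, pad):
    """Expand a [min, max, min, max, ...] bounding box by `pad` voxels per side.

    The result is clamped so that it never leaves the image of the given size.
    """
    padded = []
    for axis in range(len(size)):
        lower = max(bb[2 * axis] - pad, 0)
        upper = min(bb[2 * axis + 1] + pad, size[axis] - 1)
        padded.extend([lower, upper])
    return padded
```

For a 10 x 5 x 10 image, pad_bounding_box([1, 3, 0, 4, 5, 9], [10, 5, 10], 2) keeps every bound inside the valid index range.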

update setup.py with proper attribution and contact info

@hugoaerts what do you want to have as the contact email? maybe a mailing list?

Add LoG to preprocessing

Build around Pandas / DataFrames instead of CSVWriter

The library would be a lot more flexible if it were built around Pandas and DataFrames, since they are better suited for looking at collections of data. They also allow for many more export options, such as JSON, HTML and CSV, and have better support for types inside columns, extendable to figures and arrays.

Add checks to ensure image and label correspondence

Add checks (in base class?):

image and label have similar (within tolerance) base pixel spacing
image and label have similar origins (does this vary with dimension?)
image and label have similar 3-space orientation (space directions in nrrd header?)
image and label have similar dimensions (not necessary, but need to somehow check registration)
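
These checks could be sketched as a single function that reports which properties disagree. The geometry is passed as plain dictionaries to keep the example self-contained; the key names, the function name and the default tolerance are assumptions.

```python
import math


def geometry_mismatches(image, mask, tolerance=1e-3):
    """Return the names of geometric properties that differ between image and mask."""
    problems = []
    for key in ("spacing", "origin", "direction"):
        a, b = image[key], mask[key]
        same = len(a) == len(b) and all(
            math.isclose(x, y, abs_tol=tolerance) for x, y in zip(a, b)
        )
        if not same:
            problems.append(key)
    # Sizes are integers, so they are compared exactly.
    if tuple(image["size"]) != tuple(mask["size"]):
        problems.append("size")
    return problems
```

Since matching dimensions are not strictly required, a caller might treat "size" as a warning and the other entries as errors.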

`imageoperations.interpolateImage` not called

In radiomics.base, settings are provided to resample an image.
The actual resampling is then implemented in imageoperations.interpolateImage.
However, this function is never called in the code.
It may be that this is applied in some API that is not included in the repository, but I can't find any reference in any of the radiomicsfeatureclasses or in any testclasses.

Add documentation on feature class at start of class

Issues to cover:

Explanation of feature class
Calculation of data matrix (if applicable)
Small matrix example
Possible custom settings (none necessary for currently implemented feature classes)
References (Article / Wiki / Other URLs)

Update baselines for first order Uniformity and Entropy

These should be taken to be the values currently generated by the pyradiomics code implementation. The discrepancy is most likely due to the differences in the binning approach implementation, according to @JoostJM. Consensus was reached in the group that it is not worthwhile to investigate this further, and that the pyradiomics implementation should be used.

Command line tool for features/signature computation

As discussed with Hugo, it will be helpful to have a command-line tool accompanied by a configuration file to select features to be calculated. Given config file, image, and mask, it would output features in a JSON or another format.
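
A rough sketch of what the interface of such a tool could look like, using argparse and a JSON configuration file. All argument names, the "features" key and the function names are hypothetical; nothing here reflects an existing command-line tool.

```python
import argparse
import json


def build_parser():
    """Create the argument parser for the hypothetical command-line tool."""
    parser = argparse.ArgumentParser(
        description="Compute features for one image and mask pair."
    )
    parser.add_argument("image", help="path to the input image")
    parser.add_argument("mask", help="path to the input mask")
    parser.add_argument("--config", help="path to a JSON configuration file")
    parser.add_argument(
        "--format", choices=["json", "csv"], default="json", help="output format"
    )
    return parser


def load_enabled_features(config_text):
    """Return the sorted feature names listed under 'features' in the config."""
    config = json.loads(config_text)
    return sorted(config.get("features", []))
```

A call such as build_parser().parse_args(["img.nrrd", "mask.nrrd", "--config", "params.json"]) would then yield the paths and options needed to run the computation.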