
pydicom / deid

best effort anonymization for medical images using python

Home Page: https://pydicom.github.io/deid/

License: MIT License

Python 99.92% Dockerfile 0.08%
dicom pydicom medical medical-images deidentify anonymize

deid's Introduction

Deidentify (deid)

Best effort anonymization for medical images in Python.


Please see our Documentation.

These are basic Python-based tools for working with medical images and text, specifically for de-identification. The cleaning method used here mirrors the one by CTP in that we can identify images based on known locations. We are looking for collaborators to develop and validate an OCR cleaning method! Please reach out if you would like to help work on this.

Installation

Local

For the stable release, install via pip:

pip install deid

For the development version, install from GitHub:

pip install git+https://github.com/pydicom/deid

Docker

docker build -t pydicom/deid .
docker run pydicom/deid --help

Issues

If you have an issue, or want to request a feature, please do so on our issues board.

deid's People

Contributors

briankolowitz, dimitripapadopoulos, fcossio, glebsts, howff, jjderidder, johannesu, jstorrs, kolowitzbj, mjcbello, petkaze, robinfrcd, sjswerdloff, vsoch, wetzelj

deid's Issues

check out tag issue

This seems to be a common issue:

NotImplementedError: Invalid tag (403e, 3f62): Unknown Value Representation 'ร@' in tag (403e, 3f62)

identifier for image should be file

I'm not comfortable with the fact that the API returns an entity id that is different than what goes in (without the original slash) as this is bound to lead to some error. We need to index the data on something that doesn't change until the end that further is one identifier per image, file name is reasonable to try.

items and lists

I find it a bit confusing that I can pass an item or a list into many methods and always get a list back in return. This pattern is repeated many times throughout the code:

# validate.py
if not isinstance(dcm_files, list):
    dcm_files = [dcm_files]
# header.py
if not isinstance(dicom_files, list):
    dicom_files = [dicom_files]

etc. I think it would be cleaner if the methods only took lists as arguments.
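Short of changing every signature, the repeated check could at least live in one place. The following is a minimal sketch of that idea; `ensure_list` is a hypothetical helper name and is not part of deid.

```python
def ensure_list(value):
    """Return value unchanged if it is already a list, otherwise wrap it in one."""
    if isinstance(value, list):
        return value
    return [value]


# A single path and a list of paths both come out as lists.
dcm_files = ensure_list("image.dcm")
many_files = ensure_list(["a.dcm", "b.dcm"])
```

Each module would then call the helper once at the top of a function instead of repeating the isinstance test.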

fresh build of Docker yields "ModuleNotFoundError: No module named 'pydicom'"

Although the Dockerfile does in fact run "pip install pydicom", once the Docker image is built, running "docker run pydicom/deid" with a command like inspect fails with a package-not-found error.

It is possible this is related to miniconda3 using Python 3.7 but "matplotlib=2.1.2" forces a downgrade to Python 3.6.6, which occurs after the "pip install pydicom" command in the Dockerfile.

I switched the order of the conda install for matplotlib and pip install pydicom in Dockerfile and the problem went away, like this:

RUN apt-get update && apt-get install -y wget git pkg-config libfreetype6-dev
RUN /opt/conda/bin/conda install matplotlib==2.1.2
RUN pip install pydicom
RUN mkdir /code
ADD . /code
WORKDIR /code
RUN python /code/setup.py install

Could this be investigated and incorporated into the code base, if it makes sense? Thank you!

interactive web interface for generating deid files

The user should be able to interactively generate a spec file that says how they want their de-identification task to be done. For the SOM, we can point users here to generate one, and then associate their spec files with the pipelines we run for them.

Issue with full path names?

I updated to the latest version and my code broke. I'm using the full file path as the key for the ids dictionary.

if idx in ids: breaks b/c idx is the basename but my keys are the full path

a) is this a bug?
b) (or) do i need to modify my code to remove the path from the ids keys

header.py

if recipe.deid is not None:
    if idx in ids:
        for action in deid.get_actions():
            dicom = perform_action(dicom=dicom,
                                   item=ids[idx],
                                   action=action)
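If option (b) turns out to be the answer, the caller's dictionary can be re-keyed before it is handed to the library. This is only a sketch of that workaround; `rekey_by_basename` is a hypothetical helper, and it assumes no two files share a name.

```python
import os


def rekey_by_basename(ids):
    """Return a copy of ids keyed by file name instead of full path."""
    return {os.path.basename(path): fields for path, fields in ids.items()}


ids = {"/data/study/image1.dcm": {"entity_id": "abc"}}
by_name = rekey_by_basename(ids)
```

Note that files with the same name in different folders would collapse into one entry, so this only helps when names are unique.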

pydicom.read_file() --> _get_pixel_array() no longer exists

Attempting to use the DicomCleaner class to do pixel-level cleaning.

Looks like I keep getting an error in the clean.py file:

in
5 print(out)
6 if out['flagged']:
----> 7 client.clean()

/opt/conda/lib/python3.6/site-packages/deid-0.1.23-py3.6.egg/deid/dicom/pixels/clean.py in clean(self)
106
107 # We will set original image to image, cleaned to clean
--> 108 self.original = dicom._get_pixel_array()
109 self.cleaned = self.original.copy()
110

/opt/conda/lib/python3.6/site-packages/pydicom/dataset.py in __getattr__(self, name)
530 if tag is None: # name isn't a DICOM element keyword
531 # Try the base class attribute getter (fix for issue 332)
--> 532 return super(Dataset, self).__getattribute__(name)
533 tag = Tag(tag)
534 if tag not in self.tags: # DICOM DataElement not in the Dataset

AttributeError: 'FileDataset' object has no attribute '_get_pixel_array'

Tracing this issue back it looks like pydicom's FileDataset doesn't actually have a _get_pixel_array() function, as follows:

from pydicom import read_file
dicom = read_file(dicom_files[0])
dicom._get_pixel_array()

(dicom_files is a list of local paths to DICOM files). Gives the same error at the same location:

AttributeError Traceback (most recent call last)
in
1 from pydicom import read_file
2 dicom = read_file(dicom_files[0])
----> 3 dicom._get_pixel_array()

/opt/conda/lib/python3.6/site-packages/pydicom/dataset.py in __getattr__(self, name)
530 if tag is None: # name isn't a DICOM element keyword
531 # Try the base class attribute getter (fix for issue 332)
--> 532 return super(Dataset, self).__getattribute__(name)
533 tag = Tag(tag)
534 if tag not in self.tags: # DICOM DataElement not in the Dataset

AttributeError: 'FileDataset' object has no attribute '_get_pixel_array'

On the other hand, I seem to be able to call for the variable directly using dicom.pixel_array or, alternatively, dicom.__getattribute__("pixel_array")

Maybe pydicom changed the Dataset class at some point and broke the clean.py implementation?
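One way to tolerate both behaviors is to prefer the public `pixel_array` attribute and fall back to the private method only when the attribute is missing. The sketch below is not the fix that deid adopted; `get_pixel_array`, `NewStyle` and `OldStyle` are hypothetical names, and the two classes are stand-ins so the example runs without pydicom.

```python
def get_pixel_array(dataset):
    """Return the decoded pixels, preferring the public pixel_array attribute."""
    if hasattr(dataset, "pixel_array"):
        return dataset.pixel_array
    # Fallback for older objects that only expose the private method.
    return dataset._get_pixel_array()


class NewStyle:
    """Stand-in for a dataset that only has the public attribute."""
    pixel_array = [[0, 1], [2, 3]]


class OldStyle:
    """Stand-in for a dataset that only has the private method."""
    def _get_pixel_array(self):
        return [[4, 5], [6, 7]]


new_pixels = get_pixel_array(NewStyle())
old_pixels = get_pixel_array(OldStyle())
```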

AttributeError: 'function' object has no attribute 'tostring'

Hi,
Thanks for the opportunity to use this cool library 🙂

So, everything is going great until the .save_dicom() step.

client = DicomCleaner(output_folder='/output', deid=my_deid)
client.clean()

scrubbing happens:
Scrubbing /Users/me/dicoms/dicom.dcm.
Then,
client.save_dicom()
gives

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-77-35491d2123a0> in <module>()
----> 1 client.save_dicom()

/Users/me/anaconda3/envs/python_p27/lib/python2.7/site-packages/deid/dicom/pixels/clean.pyc in save_dicom(self, output_folder, image_type)
    179             dicom_name = self._get_clean_name(output_folder)
    180             dicom = read_file(self.dicom_file,force=True)
--> 181             dicom.PixelData = self.clean.tostring()
    182             dicom.save_as(dicom_name)
    183             return dicom_name

AttributeError: 'function' object has no attribute 'tostring'

It looks like the problem is with the read_file from pydicom...but I'm assuming the dicoms are readable since they were opened and read in the scrubbing step.

To experiment, I tried to save as a png:
client.save_png()
and got

Error in callback <function post_execute at 0x111179668> (for post_execute):
UsageError: Invalid GUI request 'pdf', valid ones are:[None, 'osx', 'widget', 'qt5', 'qt', 'nbagg', 'gtk', 'qt4', 'gtk3', 'notebook', 'tk', 'ipympl', 'inline', 'asyncio', 'wx']
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Users/me/anaconda3/envs/python_p27/lib/python2.7/site-packages/matplotlib/pyplot.py in post_execute()
    146 
    147             def post_execute():
--> 148                 if matplotlib.is_interactive():
    149                     draw_all()
    150 

AttributeError: 'NoneType' object has no attribute 'is_interactive'

Not sure why the error has to do with 'pdf'?
Thanks for taking a look
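Judging only from the traceback, a likely cause is that `self.clean` refers to the `clean()` method itself, while the cleaned pixels are stored under a different attribute (the `clean.py` excerpt quoted in the `_get_pixel_array` issue assigns them to `self.cleaned`). The sketch below reproduces that naming clash with a hypothetical `Cleaner` class built on the standard library `array` type; it is not deid's code.

```python
from array import array


class Cleaner:
    """Toy class with a clean() method and a separate cleaned attribute."""

    def __init__(self, pixels):
        self.original = array("B", pixels)
        self.cleaned = None

    def clean(self):
        # Copy the original pixels and zero out the first one.
        self.cleaned = self.original[:]
        self.cleaned[0] = 0

    def save_bytes(self):
        # self.clean is the bound method above, so calling a data method on it
        # raises AttributeError; the data lives in self.cleaned.
        return self.cleaned.tobytes()


client = Cleaner([9, 8, 7])
client.clean()
data = client.save_bytes()
```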

add custom "endswith:" filter for fields

instead of the following:

FORMAT dicom

%header

REPLACE PatientID var:entity_id
REPLACE SOPInstanceUID var:item_id
ADD PatientIdentityRemoved Yes
JITTER InstanceCreationDate var:jitter
JITTER InstanceCreationTime var:jitter
JITTER StudyDate var:jitter
JITTER SeriesDate var:jitter
JITTER AcquisitionDate var:jitter
JITTER OverlayDate var:jitter

I should be able to do:

FORMAT dicom

%header

REPLACE PatientID var:entity_id
REPLACE SOPInstanceUID var:item_id
ADD PatientIdentityRemoved Yes
JITTER endswith:Date var:jitter
JITTER endswith:Time var:jitter

to apply the same filter over all fields that end with (and start with) the term of interest.
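The expansion step could look roughly like this. It is a sketch under assumptions: `expand_fields` is a hypothetical function, and the `endswith:` / `startswith:` prefixes follow the syntax proposed in this issue rather than any released recipe format.

```python
def expand_fields(expression, field_names):
    """Expand a recipe field expression into the matching header field names."""
    if ":" not in expression:
        # A plain field name matches only itself.
        return [name for name in field_names if name == expression]
    matcher, term = expression.split(":", 1)
    if matcher == "endswith":
        return [name for name in field_names if name.endswith(term)]
    if matcher == "startswith":
        return [name for name in field_names if name.startswith(term)]
    raise ValueError("unknown matcher: %s" % matcher)


fields = ["StudyDate", "SeriesDate", "InstanceCreationTime", "PatientID"]
date_fields = expand_fields("endswith:Date", fields)
time_fields = expand_fields("endswith:Time", fields)
```

A JITTER line would then be applied once per expanded field name.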

Allow for custom functions to be passed into deid recipes

I would want to be able to have this in a deid recipe:

%header
REPLACE PatientID func:generate_id

And the action would be to use a function in the global space called "generate_id" to provide the PatientID, and return the new value. This is appropriate for "on the fly" generation of values.
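A minimal sketch of how a `func:` value might be resolved follows. It swaps the global-namespace lookup described above for an explicit dictionary of functions, which is easier to test; `resolve_value` and the body of `generate_id` are hypothetical, and the unsalted hash is for illustration only.

```python
import hashlib


def generate_id(item):
    """Hypothetical generator: derive a stable pseudonym from the original ID."""
    digest = hashlib.sha256(item["PatientID"].encode("utf-8")).hexdigest()
    return digest[:12]


def resolve_value(spec, item, functions):
    """Return the value for a recipe spec, calling a registered function for func: specs."""
    if spec.startswith("func:"):
        name = spec[len("func:"):]
        if name not in functions:
            raise KeyError("no function registered as %r" % name)
        return functions[name](item)
    # Anything else is treated as a literal value.
    return spec


functions = {"generate_id": generate_id}
new_id = resolve_value("func:generate_id", {"PatientID": "12345"}, functions)
```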

[priority] add CHANGELOG

we need to be very on top of keeping track of changes, and making changes to development that coincide with particular versions.

ignoring user specified deid configuration

If I specify a deid file, why do you append the default 'dicom' file? It overrides my preferences. For instance, I'd like to specify my own PatientID in my deid file, but your 'dicom' default file removes it.

in my deid.dicom file i'd like to have the following override your base configuration

%header
REPLACE PatientID var:patient_id

my "fix" is in header.py

if deid is not None:
    # deid = load_combined_deid([deid])
    deid = get_deid(deid, load=True)
else:
    deid = get_deid('dicom', load=True)

possible resolutions seem to be
a) allow the user specified configuration to override your base configuration
b) add an option that allows the user to specify if they'd like to take your base configuration in addition to what they specify
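Resolution (a) could be sketched as a merge in which any user action on a field displaces the base actions on that same field. The dictionary layout for an action and the `merge_actions` name are assumptions made for this example, not deid's internal representation.

```python
def merge_actions(base_actions, user_actions):
    """Combine two action lists; user actions replace base actions on the same field."""
    user_fields = {action["field"] for action in user_actions}
    kept = [action for action in base_actions if action["field"] not in user_fields]
    return kept + list(user_actions)


base = [
    {"action": "REMOVE", "field": "PatientID"},
    {"action": "REMOVE", "field": "PatientName"},
]
user = [{"action": "REPLACE", "field": "PatientID", "value": "var:patient_id"}]
merged = merge_actions(base, user)
```

Resolution (b) would amount to skipping the merge and using only the user list when the option is switched off.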

Editing Dicom Preamble

Hello, I'll preface this by saying that I've only been learning Python since March, but have come a long way. I am using Pydicom and Deid to do some mass de-identification of DICOM files. When I check the files, all the header fields I wanted changed are changing, but the Media Storage SOP Class UID and Media Storage SOP Instance UID in the preamble are not. The SOP Class UID isn't that big of a deal because it's just an image type identifier, but more often than not the Media Storage SOP Instance UID is just a copy of the actual SOP Instance UID, which is PHI that needs to be removed. Is there a way to alter some code to get the Deid process to change fields in the preamble as well? Thank you in advance for any help or guidance you can provide.
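For context, these two elements belong to the File Meta Information (group 0002) that follows the 128-byte preamble, and pydicom exposes them on a separate `file_meta` attribute rather than on the main dataset. A possible workaround, sketched here with a hypothetical `sync_media_storage_uid` helper and a stand-in object so it runs without pydicom, is to copy the already replaced UID across after de-identification:

```python
from types import SimpleNamespace


def sync_media_storage_uid(dataset):
    """Copy the dataset's SOPInstanceUID into its file meta header, if present."""
    file_meta = getattr(dataset, "file_meta", None)
    if file_meta is not None and hasattr(dataset, "SOPInstanceUID"):
        file_meta.MediaStorageSOPInstanceUID = dataset.SOPInstanceUID
    return dataset


# Stand-in object; with pydicom this would be the dataset read from a file.
dataset = SimpleNamespace(
    SOPInstanceUID="1.2.3.4",
    file_meta=SimpleNamespace(MediaStorageSOPInstanceUID="9.9.9.9"),
)
sync_media_storage_uid(dataset)
```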

clean method appears to be off

In clean.py:

Line 122:
self.cleaned[minr:maxr, minc:maxc] = 0 # should fill with black

For coordinates [0,0,800,59] it blacks out the vertical left section instead of horizontal upper section of the image.
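The symptom is consistent with x and y being swapped: two-dimensional arrays are indexed row first, so if the four numbers are meant as (xmin, ymin, xmax, ymax), the rows must come from the y values. The sketch below shows that ordering on a nested list; the coordinate order is an assumption, and `blank_region` is a hypothetical helper rather than the code in clean.py.

```python
def blank_region(image, xmin, ymin, xmax, ymax, fill=0):
    """Fill the rectangle in place; rows are selected by y and columns by x."""
    for row in range(ymin, min(ymax, len(image))):
        for col in range(xmin, min(xmax, len(image[row]))):
            image[row][col] = fill
    return image


# A 4-row by 6-column image; blank the top two rows across the full width.
image = [[1] * 6 for _ in range(4)]
blank_region(image, 0, 0, 6, 2)
```

With a NumPy array the equivalent slice would be `cleaned[ymin:ymax, xmin:xmax] = 0`.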

Example used with current source code gives ImportError

I checked out the current source tree, built the Docker image and then tried to use the data in the basic example. I get the exception detailed below when trying this, and it seems indeed like the import is missing in the code.

➜  code git clone git@github.com:pydicom/deid.git
Cloning into 'deid'...
➜  ~ cd deid
➜  deid git:(master) docker build -t pydicom/deid .
➜  deid git:(master) docker run pydicom/deid inspect --deid /home/peter/code/deid/examples/deid/deid.dicom /home/peter/code/deid/deid/data/dicom-cookies --save

Traceback (most recent call last):
  File "/opt/conda/bin/deid", line 11, in <module>
    load_entry_point('deid==0.1.11', 'console_scripts', 'deid')()
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/main/__init__.py", line 157, in main
    from .inspect import main
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/main/inspect.py", line 30, in <module>
    from deid.dicom import get_files
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/dicom/__init__.py", line 1, in <module>
    from .header import (
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/dicom/header.py", line 29, in <module>
    from .tags import (
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/dicom/tags.py", line 28, in <module>
    from pydicom.tag import tag_in_exception
ImportError: cannot import name 'tag_in_exception'

Any ideas what might be wrong?

pip3 download question

For pretty much all Python packages, pip3 download results in a whl file being downloaded. For pip3 download deid==0.1.18, however, the gz source is downloaded, and I need to generate the whl file myself by running setup.py with the bdist_wheel option. Not a big deal, but is there a reason why deid behaves differently?

Pixel Data with undefined length must start with an item tag

The following image:
IMG00001.dcm.zip

The error happens after the clean() method successfully returns (blanking out the supplied coordinates) and the save_dicom method is called.

With tag (7fe0, 0010) got exception: Pixel Data with undefined length must start with an item tag
Traceback (most recent call last):
File "/data/anaconda3/lib/python3.6/site-packages/pydicom/tag.py", line 30, in tag_in_exception
yield
File "/data/anaconda3/lib/python3.6/site-packages/pydicom/filewriter.py", line 475, in write_dataset
write_data_element(fp, dataset.get_item(tag), dataset_encoding)
File "/data/anaconda3/lib/python3.6/site-packages/pydicom/filewriter.py", line 435, in write_data_element
raise ValueError('Pixel Data with undefined length must '
ValueError: Pixel Data with undefined length must start with an item tag

deid version option has a bug

Running the container reveals:

(base) root@4abd1befe0c8:/code# which deid
/opt/conda/bin/deid
(base) root@4abd1befe0c8:/code# deid
Traceback (most recent call last):
  File "/opt/conda/bin/deid", line 11, in <module>
    load_entry_point('deid==0.1.19', 'console_scripts', 'deid')()
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.19-py3.6.egg/deid/main/__init__.py", line 141, in main
    if args.command == "version" or args.version is True:
AttributeError: 'Namespace' object has no attribute 'version'
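The traceback suggests that `args.version` is read even when the parser never defined a `version` attribute. A defensive check with `getattr` avoids the crash; the parser below is a stripped-down stand-in for illustration, not deid's actual command-line definition.

```python
import argparse


def get_parser():
    """Build a tiny parser with a version subcommand and no --version flag."""
    parser = argparse.ArgumentParser(prog="deid")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("version")
    return parser


def wants_version(args):
    """True if the version subcommand or a version flag was given."""
    return args.command == "version" or getattr(args, "version", False) is True


# Running with no arguments leaves args without a version attribute.
args = get_parser().parse_args([])
no_args_result = wants_version(args)
```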

Add function to perform JITTER

Jittering a timestamp means taking a value from a variable and using it to shift one or more fields. For example:

JITTER InstanceCreationTime var:item_timestamp

This would find the field InstanceCreationTime and shift it by the number stored in the variable item_timestamp.
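For date fields, the shift could be sketched as below. The unit of the jitter is not stated in this issue, so the example assumes whole days and only handles DICOM date (DA) values in YYYYMMDD form; `jitter_date` is a hypothetical name.

```python
from datetime import datetime, timedelta


def jitter_date(value, days):
    """Shift a YYYYMMDD date string by a signed number of days."""
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
    return shifted.strftime("%Y%m%d")


later = jitter_date("20180131", 1)
earlier = jitter_date("20180101", -1)
```

Time and datetime fields would need their own parsing before the same idea applies.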

deid identifiers --action all fails with NoneType object has no attribute get_actions

I have pulled the current master branch and tried the following:

(deid) ➜  deid git:(master) ✗ cd examples/dicom
(deid) ➜  dicom git:(master) ✗ python deid-dicom-example.py
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
WARNING No specification, loading default base deid.dicom
WARNING No specification, loading default base deid.dicom
WARNING No specification, loading default base deid.dicom
Traceback (most recent call last):
  File "deid-dicom-example.py", line 214, in <module>
    output_folder='/home/vanessa/Desktop')
  File "/home/peter/code/deid/deid/dicom/header.py", line 272, in replace_identifiers
    for action in deid.get_actions():
AttributeError: 'NoneType' object has no attribute 'get_actions'

Some information about my environment:

(deid) ➜  dicom git:(master) ✗ python --version
Python 3.6.2
(deid) ➜  dicom git:(master) ✗ pip list
certifi (2018.4.16)
chardet (3.0.4)
cycler (0.10.0)
deid (0.1.13, /home/peter/code/deid)
idna (2.6)
kiwisolver (1.0.1)
matplotlib (2.2.2)
numpy (1.14.3)
pip (9.0.1)
pydicom (1.0.2)
Pygments (2.2.0)
pyparsing (2.2.0)
python-dateutil (2.7.3)
pytz (2018.4)
requests (2.18.4)
retrying (1.3.3)
setuptools (28.8.0)
simplejson (3.15.0)
six (1.11.0)
urllib3 (1.22)
validator.py (1.2.5)

Any ideas what might be wrong?

Applying whitelist and blacklist filters

Hi, I have a question regarding the filter section of my config and my source code. In my configuration https://github.com/BrianKolowitz/deid/blob/development/my_examples/deid/deid.dicom I specify a whitelist

%filter whitelist

LABEL Xray
  contains Modality CR|DX

In my code https://github.com/BrianKolowitz/deid/blob/development/my_examples/dicom/my_deid.py I specify the configuration

cleaned_files = replace_identifiers(dicom_files=dicom_files,
                                    ids=updated_ids,
                                    deid=deid,
                                    config=config_file_path,
                                    remove_private=True,
                                    output_folder=output_path)

but I see images with modalities PR and RG in my output_folder.

Is this a bug or am I not properly using the library?
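The behavior the poster expected can be written down as a small predicate. This does not describe how deid itself applies `%filter whitelist`; it only illustrates the intended check, assuming that `contains` means a regular-expression search on the field value. Both function names are hypothetical.

```python
import re


def contains(header, field, pattern):
    """True if the field exists and its value matches the pattern somewhere."""
    value = header.get(field)
    if value is None:
        return False
    return re.search(pattern, str(value)) is not None


def passes_whitelist(header):
    """Mirror of the Xray label above: keep only CR or DX modalities."""
    return contains(header, "Modality", "CR|DX")


kept = [m for m in ["CR", "DX", "PR", "RG"] if passes_whitelist({"Modality": m})]
```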

Include specific Dicom group on get_identifiers

Is there a way to ensure that specific tag groups are included? Currently, the get_identifiers function does not retrieve the 0051 group (which is essential for my image reconstruction). Thanks!

DicomCleaner save_dicom

Hello,
Thanks a lot for all these very useful functions!
I am having trouble generating a DICOM file after cleaning:
File "C:\deid\dicom\pixels\clean.py", line 181, in save_dicom
dicom.PixelData = self.clean.tostring()
AttributeError: 'function' object has no attribute 'tostring'

Can you help me?
Regards

add logic for within line testing

eg, a filter might have a check in parentheses:

if (Criteria 1 and Criteria 2)
OR Criteria 3

So we need to evaluate the first parens first!
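One way to get that ordering is to store the criteria as a nested structure and evaluate it recursively, so that an inner group is always resolved before the operator around it. The tuple layout and the `evaluate` name are assumptions for this sketch.

```python
def evaluate(expr):
    """Evaluate a nested (operator, operand, ...) tuple of boolean criteria."""
    if isinstance(expr, bool):
        return expr
    operator, *operands = expr
    # Inner groups are evaluated first, which is what the parentheses require.
    results = [evaluate(operand) for operand in operands]
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError("unknown operator: %s" % operator)


# (Criteria 1 and Criteria 2) OR Criteria 3
criteria_1, criteria_2, criteria_3 = True, False, True
flagged = evaluate(("or", ("and", criteria_1, criteria_2), criteria_3))
```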

Support for duplicate DICOM file names

In the function deid.dicom.get_identifiers the dicom files are identified only by the file name.

This leads to hard-to-find bugs if you are trying to de-identify several series at the same time.

E.g.
Suppose you have two series

dicom/seriesA/00001.DCM
dicom/seriesB/00001.DCM

And you wish to update the Series Instance UID.

Following the example code you may write something like this

ids = get_identifiers(dicom_files)
updated_ids = {}
for image, fields in ids.items():
    fields['instance_id'] = pydicom.uid.generate_uid(entropy_srcs=uid)
    updated_ids[image] = fields

Then both images would get the same instance_id.
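Until the keys are made unique, a quick check can at least reveal when a file list would collide. The helper below is a sketch with a hypothetical name, `find_basename_collisions`; it groups paths by file name and reports any name that occurs more than once.

```python
import os
from collections import defaultdict


def find_basename_collisions(paths):
    """Return a dict of file name -> paths for names shared by several files."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.basename(path)].append(path)
    return {name: found for name, found in groups.items() if len(found) > 1}


dicom_files = ["dicom/seriesA/00001.DCM", "dicom/seriesB/00001.DCM"]
collisions = find_basename_collisions(dicom_files)
```

Keying the identifiers by full path instead of file name would remove the collision altogether.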
