dicom-attribute-scraper's People

Contributors

arjunvenkatainnolitics, ccwoolfolk

Forkers

guilhermeadams

dicom-attribute-scraper's Issues

Aggregator script

Given multiple JSON mappings from #3, aggregate the results into a single mapping. For example, { tag: "stringA" } and { tag: "stringB" } can be aggregated to a single { tag: ["stringA", "stringB"] }.

This aggregator should accept the same write method arguments as #3. That is, if the scraping script outputs both JSON and SQLite, the aggregator should be able to aggregate both (I suspect the SQLite aggregator would be quite simple).

  • The source files are not required to contain identical tags. In other words, the number of example values can vary by tag.
  • Tags that are not included in any of the source files are not included in the output object. That is, the key does not exist in the output object. This script promises to aggregate the inputs but provides no guarantee about the coverage of those inputs.
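The aggregation described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `aggregate_mappings` and `aggregate_json_files` are hypothetical, and dropping duplicate example values is an assumption that the issue does not state.

```python
import json
from collections import defaultdict


def aggregate_mappings(mappings):
    """Merge several {tag: value} mappings into one {tag: [values, ...]} mapping."""
    merged = defaultdict(list)
    for mapping in mappings:
        for tag, value in mapping.items():
            # Accept a single example value or an already-aggregated list.
            values = value if isinstance(value, list) else [value]
            for item in values:
                # Assumption: repeated example values are kept only once.
                if item not in merged[tag]:
                    merged[tag].append(item)
    # Tags absent from every input never become keys in the output.
    return dict(merged)


def aggregate_json_files(paths):
    """Load each JSON mapping from disk and aggregate the results."""
    loaded = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            loaded.append(json.load(handle))
    return aggregate_mappings(loaded)
```

Because the inputs are merged key by key, source files with different sets of tags need no special handling, and the number of example values can differ per tag.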

Create and run script to aggregate DICOM files from the Innolitics library

We need to generate the "big" example file. Since we may need to do this again in the future, we should make a little script to handle it. Here are the tasks I expect we will need to complete, but chime in below if you see anything I have missed.

Information gathering:

  • Ping Yujan to clarify/confirm the file structure and to grant you access to Doomfist. Confirming the structure isn't strictly necessary, since you could write a fully flexible implementation, but I suspect matching the existing structure at least somewhat will save time and effort.

Script:

  • Find all DICOM files in the example file directories (the find command may be useful here).
  • Run each file through the scraper, saving the results in a temporary JSON file.
  • Aggregate the temporary JSON files using the aggregator.
  • Clean up any temporary files.
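The four script steps above could look roughly like this in Python. This is only a sketch under stated assumptions: `scrape` and `aggregate` are hypothetical callables standing in for the scraper and aggregator, and matching files by a `.dcm` extension is an assumption (DICOM files do not always carry one).

```python
import json
import tempfile
from pathlib import Path


def find_dicom_files(root):
    """Return every *.dcm file under root, sorted for reproducible runs."""
    # Assumption: example files use the .dcm extension.
    return sorted(Path(root).rglob("*.dcm"))


def run_pipeline(root, scrape, aggregate, output_path):
    """Scrape each file to a temporary JSON file, aggregate, then clean up."""
    # TemporaryDirectory deletes the intermediate files when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        temp_paths = []
        for index, dicom_path in enumerate(find_dicom_files(root)):
            temp_path = Path(tmp_dir) / f"{index}.json"
            temp_path.write_text(json.dumps(scrape(dicom_path)), encoding="utf-8")
            temp_paths.append(temp_path)
        mappings = [json.loads(p.read_text(encoding="utf-8")) for p in temp_paths]
        result = aggregate(mappings)
    Path(output_path).write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result
```

Passing the scraper and aggregator in as arguments keeps the sketch independent of how those two pieces end up being implemented.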

Follow-up:

  • Retrieve the resulting aggregated file (this may be trivial).
  • Manually review a few of the Patient fields (name, in particular). I would be very surprised if we had unintentionally stored any files with real patient data, but it's probably worth a 5-minute check before we put the information in front of tens of thousands of views per week. :)
  • Create a PR in the DICOM standard browser to add the newly generated example file if everything looks good.

I think a makefile or a shell script would be the best choices for format. I would probably lean toward Make but do not have a strong preference one way or the other.

Modify scraper to delete spaces in tags

Tags used in the browser do not include the space after the comma: "(0008,0008)" instead of "(0008, 0008)". This discrepancy prevents the tags from being recognized by the DICOM browser code, so the spaces must be removed during attribute scraping.
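A minimal sketch of the normalization, assuming tags arrive as strings; the helper name `normalize_tag` is hypothetical:

```python
import re


def normalize_tag(tag):
    """Remove all whitespace from a tag string such as "(0008, 0008)"."""
    return re.sub(r"\s+", "", tag)
```

Stripping every whitespace character, rather than only the one after the comma, also covers stray leading or trailing spaces.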

Add SQLite option to scraper and aggregator

When processing a large number of files, it may be more efficient to write to a simple SQLite table instead of creating a JSON file for every DICOM file. For example:

attribute      value
"0010,0020"    "patient-id-12345"
"0010,0040"    "M"
...            ...
"0010,0040"    "O"

This functionality can be enabled by passing a command-line argument to the script. E.g., python scraper.py --use-sqlite "sqlite-db-file-name".
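One way this could work, using Python's standard sqlite3 and argparse modules. The table name `examples`, the function names, and the positional `input_file` argument are assumptions made for illustration; only the `--use-sqlite` flag and the attribute/value columns come from the issue.

```python
import argparse
import sqlite3


def write_to_sqlite(db_path, mapping):
    """Append one (attribute, value) row per entry in the mapping."""
    connection = sqlite3.connect(db_path)
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "CREATE TABLE IF NOT EXISTS examples (attribute TEXT, value TEXT)"
            )
            connection.executemany(
                "INSERT INTO examples (attribute, value) VALUES (?, ?)",
                list(mapping.items()),
            )
    finally:
        connection.close()


def aggregate_from_sqlite(db_path):
    """Group the distinct stored values by attribute."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT DISTINCT attribute, value FROM examples "
            "ORDER BY attribute, value"
        ).fetchall()
    finally:
        connection.close()
    merged = {}
    for attribute, value in rows:
        merged.setdefault(attribute, []).append(value)
    return merged


def build_parser():
    """Command-line interface with the optional --use-sqlite argument."""
    parser = argparse.ArgumentParser(description="Scrape example attribute values.")
    parser.add_argument("input_file")
    parser.add_argument("--use-sqlite", metavar="DB_FILE", default=None)
    return parser
```

Since every scraped file appends rows to the same table, aggregation reduces to a single grouped query, which is why the SQLite aggregator is expected to be simple.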

Find example DICOM files

It would be nice to have a few shared example files to use during development. There aren't any hard requirements, but ideally, the files would include a variety of attributes (i.e., overlapping attributes are less useful). The analyzer tab at https://dicom.innolitics.com may be useful for roughly gauging overlap.

Attribute mapping script

Given a DICOM file, create a mapping from tag ("(0010,0010)" or its ID or hex equivalent) to value for each attribute in the file, subject to the criteria below.

Deliverable: Python script that accepts a DICOM file path and saves the result. Ex:

python scriptname.py input-file.dcm --json output-file.json

  • Include only VRs that can be usefully represented by a string. For example, exclude Unknown, Other Byte, etc.
  • Sequence (SQ) elements themselves are ignored, but the tags nested inside them are still used as examples
  • MVP: Proprietary tags are ignored
  • Allow for exclusion of user-specified tags (i.e., if we want to exclude a tag for privacy reasons or because it is malformed in the example file)
  • Truncate example values at [x] characters by default and allow the user to specify a non-default value
  • The script defaults to JSON output but supports multiple write methods. For example, it should be easy to write to a SQLite database rather than a JSON file. We can start with JSON to nail the design, but if we are running tens of files, something like SQLite will probably be easier to use in practice.
  • Automated tests are included. We can write unit tests as needed, but at a minimum, we need a few integration tests on minimal example files.
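The filtering criteria above can be sketched independently of any DICOM reader. The sketch below assumes the elements have already been read and flattened into (group, element, VR, value) tuples, for instance by a library such as pydicom. The names `build_mapping` and `EXCLUDED_VRS`, the exact set of excluded VRs, the uppercase hex formatting, and the default of 64 characters (a placeholder for the unspecified [x]) are all assumptions.

```python
# Assumption: VRs treated as not usefully representable by a string.
EXCLUDED_VRS = {"SQ", "UN", "OB", "OW", "OF", "OD", "OL"}


def format_tag(group, element):
    """Format a tag without a space after the comma, e.g. "(0010,0010)"."""
    return f"({group:04X},{element:04X})"


def build_mapping(elements, excluded_tags=(), max_length=64):
    """Build {tag: example value} from (group, element, vr, value) tuples."""
    excluded = set(excluded_tags)
    mapping = {}
    for group, element, vr, value in elements:
        if vr in EXCLUDED_VRS:
            continue  # binary, unknown, and sequence VRs are skipped
        if group % 2 == 1:
            continue  # odd group numbers mark private (proprietary) tags
        tag = format_tag(group, element)
        if tag in excluded:
            continue  # user-specified exclusions, e.g. for privacy
        # max_length is a placeholder default; the issue leaves it as [x].
        mapping[tag] = str(value)[:max_length]
    return mapping
```

Keeping the filter separate from file reading makes it straightforward to unit test with hand-written tuples, while the integration tests can exercise the reader on minimal example files.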

I suspect https://pydicom.github.io/pydicom/stable/ will be useful.
