pennlinc / cubids

Curation of BIDS (CuBIDS): A sanity-preserving software package for processing BIDS datasets.

Home Page: https://cubids.readthedocs.io/

License: MIT License

Languages: Python 97.91%, Makefile 1.16%, Shell 0.93%

Topics: neuroimaging, neuroimaging-data-science, python-package, data-curation, data-organization, neuroscience, neuroscience-methods, neuroinformatics

cubids's Introduction

CuBIDS: Curation of BIDS

About

CuBIDS (Curation of BIDS) is a workflow and software package designed to facilitate reproducible curation of neuroimaging BIDS datasets. CuBIDS breaks down BIDS dataset curation into five main components and addresses each one using command line programs with version control capabilities. These components are not necessarily linear, but all are critical in preparing BIDS data for successful preprocessing and analysis pipeline runs.

CuBIDS facilitates the validation of BIDS data.

CuBIDS visualizes and summarizes the heterogeneity in a BIDS dataset.

CuBIDS helps users test pipelines on the entire parameter space of a BIDS dataset.

CuBIDS allows users to perform metadata-based quality control on their BIDS data.

CuBIDS helps users clean protected information in BIDS datasets, in order to prepare them for public sharing.

For full documentation, please visit our ReadTheDocs site: https://cubids.readthedocs.io/

Citing CuBIDS

If you use CuBIDS in your research, please cite the following paper:

Covitz, S., Tapera, T. M., Adebimpe, A., Alexander-Bloch, A. F., Bertolero, M. A., Feczko, E., ... & Satterthwaite, T. D. (2022). Curation of BIDS (CuBIDS): A workflow and software package for streamlining reproducible curation of large BIDS datasets. NeuroImage, 263, 119609. doi:10.1016/j.neuroimage.2022.119609.

Please also cite the Zenodo DOI for the version you used.

cubids's Issues

[DOC] Describe the usage of key/param group csv files

Describe what these do and how to use them

Add cubids-make-exemplars CLI program

A CLI function that takes one subject per Acquisition Group and copies them into a new BIDS directory

Copy over everything else necessary to make a complete BIDS dataset (dataset_description.json, etc)
Create other directories, YODA-style (eg code/, bidsdatasets/)

Add --config flag to apply

Add to cli and integrate into the call to get_CSVs within apply!

bond-index

BOnD utility that should be called after bond-apply is done. This utility will

Create and store a pybids layout object
Generate layout report https://bids-standard.github.io/pybids/reports/index.html (make report optional flag)
Use python argparse (parse_known_args) to get unknown args
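
The last item can be sketched with the standard library. `parse_known_args` returns the recognized namespace plus a list of the strings it did not recognize, so those can be forwarded elsewhere. The `bids_dir` and `--report` arguments below are hypothetical names chosen for illustration, not the utility's actual interface.

```python
import argparse


def build_parser():
    # Hypothetical arguments, for illustration only.
    parser = argparse.ArgumentParser(prog="bond-index")
    parser.add_argument("bids_dir", help="path to the BIDS dataset")
    parser.add_argument("--report", action="store_true",
                        help="also generate the layout report")
    return parser


def parse(argv):
    # Unlike parse_args, parse_known_args does not exit on unrecognized
    # options; it hands them back as a list of leftover strings.
    known, unknown = build_parser().parse_known_args(argv)
    return known, unknown


known, unknown = parse(["/data/bids", "--report", "--validate", "false"])
print(known.bids_dir, known.report, unknown)
```

Here `--validate false` is not declared on the parser, so it ends up in `unknown` rather than raising an error.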

How to pass git user information to containerized datalad?

There is no git configuration inside the container, so anyone who uses the docker version of datalad gets warnings printed. How do we pass git configurations into the docker image?

[ENH] Add bond-datalad-init CLI

let users initialize a directory with datalad using BOnD

Change apply and purge to run a .sh file

This avoids the "command too long to execute" error that subprocess.run throws on a very long string of commands.
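
A minimal sketch of the idea, assuming a POSIX `sh` is available: write the commands to a temporary script, one per line, and pass only the script path to `subprocess.run`. The function name `run_commands_via_script` is hypothetical and not part of the package.

```python
import os
import subprocess
import tempfile


def run_commands_via_script(commands, workdir=None):
    """Write `commands` to a temporary .sh file and execute that file."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as fh:
        fh.write("#!/bin/sh\nset -e\n")       # stop at the first failure
        fh.write("\n".join(commands) + "\n")  # one command per line
        script = fh.name
    try:
        # Only the short script path is passed as an argument, so the
        # length of the command list no longer matters.
        return subprocess.run(["sh", script], cwd=workdir, check=True,
                              capture_output=True, text=True)
    finally:
        os.remove(script)


result = run_commands_via_script(["echo one", "echo two"])
print(result.stdout)
```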

[ENH] Add ApplyMerges Function

Way of harmonizing metadata. Change sidecars to match the param group you want to merge into.

[CI] Add test for MergeInto 0

There needs to be a test to verify that MergeInto==0 deletes a file

[ENH] Make a function that lists ALL unique keys in sidecars, then lets you delete some

Before checking into DataLad, some sidecars might have patient info in them. So there has to be a way to list every unique field seen in a project and then offer to delete some of them. Once all the info you don't want tracked is gone, you can check everything into datalad
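
One way this could look, as a sketch using only the standard library. The function names are hypothetical, and the code assumes every `.json` file under the dataset root holds a JSON object (note that this also matches files such as dataset_description.json).

```python
import json
import tempfile
from pathlib import Path


def list_sidecar_fields(bids_root):
    """Return every top-level key seen in any .json file under bids_root."""
    fields = set()
    for sidecar in Path(bids_root).rglob("*.json"):
        fields.update(json.loads(sidecar.read_text()).keys())
    return sorted(fields)


def remove_sidecar_fields(bids_root, unwanted):
    """Delete the unwanted keys from each sidecar; return the files changed."""
    unwanted = set(unwanted)
    changed = []
    for sidecar in sorted(Path(bids_root).rglob("*.json")):
        metadata = json.loads(sidecar.read_text())
        cleaned = {k: v for k, v in metadata.items() if k not in unwanted}
        if cleaned != metadata:
            sidecar.write_text(json.dumps(cleaned, indent=4) + "\n")
            changed.append(sidecar.name)
    return changed


# Demo on a throwaway directory with two made-up sidecars.
with tempfile.TemporaryDirectory() as root:
    anat = Path(root, "sub-01", "anat")
    func = Path(root, "sub-01", "func")
    anat.mkdir(parents=True)
    func.mkdir(parents=True)
    (anat / "sub-01_T1w.json").write_text(
        json.dumps({"EchoTime": 0.003, "PatientName": "Jane Doe"}))
    (func / "sub-01_task-rest_bold.json").write_text(
        json.dumps({"RepetitionTime": 2.0}))

    fields_before = list_sidecar_fields(root)
    changed = remove_sidecar_fields(root, ["PatientName"])
    fields_after = list_sidecar_fields(root)
```

Listing first and deleting second keeps the decision about which fields to drop with the user.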

organize csvs by modality

Relational Params need to be boolean

IntendedForKeyXX and FieldmapKeyXX values are now True/False instead of filepaths. This should decrease the number of fmap param groups significantly.

add back precision

We want to round AFTER clustering so that params that don't belong in the acq-VARIANT string don't end up there.

[ENH] Add subject/session Acquisition Groups

A set of scans belonging to either a subject or a session will also be a set of Key/Param groups. The combination of Key/Param groups determines how a pipeline will run on that data.

TODO

A function that groups subjects or sessions based on the Key/Param groups contained
A CLI entrypoint for that function
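
The grouping function might be sketched as follows. The input format (tuples of subject, key group, param group) and the key group strings are assumptions made for illustration; the real function would derive them from the dataset.

```python
from collections import defaultdict


def group_by_acquisition(scans):
    """Group subjects by the set of (key group, param group) pairs they have.

    `scans` is an iterable of (subject, key_group, param_group) tuples.
    """
    contents = defaultdict(set)
    for subject, key_group, param_group in scans:
        contents[subject].add((key_group, param_group))

    groups = defaultdict(list)
    for subject, pairs in contents.items():
        # frozenset is hashable, so identical sets land in the same group.
        groups[frozenset(pairs)].append(subject)

    # Largest groups first.
    return sorted(groups.values(), key=len, reverse=True)


# Made-up example: sub-03 has a different param group for its bold scan.
scans = [
    ("sub-01", "datatype-anat_suffix-T1w", 1),
    ("sub-01", "datatype-func_suffix-bold_task-rest", 1),
    ("sub-02", "datatype-anat_suffix-T1w", 1),
    ("sub-02", "datatype-func_suffix-bold_task-rest", 1),
    ("sub-03", "datatype-anat_suffix-T1w", 1),
    ("sub-03", "datatype-func_suffix-bold_task-rest", 2),
]
acquisition_groups = group_by_acquisition(scans)
print(acquisition_groups)
```

Subjects in the same group should run through a pipeline the same way, so one subject per group is enough to cover the parameter space.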

delete associations when MergeInto == 0

Call purge on list of deletion scans identified in metadata_merge.py (check_merging_operations) and returned to apply_csv_changes

Rename everything to CuBIDS

To change:

CLI functions
name in setup.cfg, setup.py
Name in Docs
Python code

[ENH] Speed up fieldmap checking

The current implementation of get_fieldmaps is incredibly slow for large datasets.

[ENH] Add lengths of IntendedFor list in param groups

Rename Key Groups and Filenames

[ENH] Detect key groups and param groups

We need to be able to find all param groups associated with a key group

To Do:

Find the key groups in the testdata/inconsistent dataset
Find the param groups under each of these
Determine which metadata values are part of the param groups for each datatype

be able to ignore sidecar fields you don't want to determine param groups

read a config file where you can add or ignore fields to the grouping e.g. IntendedFor

[ENH] Singularity validator wrapper

Run validator in singularity

apply doesn't work if group uses a relative path

Paths in the files csv can't include '/gpfs/..', but they do if the path to the dataset was relative when group was run.

This would be an issue if running CuBIDS on a dataset stored on a different machine, because then you have to use relative paths.

IDEAS FOR FIXING THIS:

change the way we get the new path so it's not adding "self.path" to the old path (get the old front and add to the new path instead of self.path + new stem)

How to read and write JSON files

Proof of Concept on reading & writing json files in a Jupyter notebook

Displaying the data in the sidecar
Editing this data
Checking that the sidecar will write "valid json"
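
The three steps can be sketched with the standard `json` module. The helper name `edit_sidecar` and the sidecar contents are made up for illustration; "valid json" is checked here by serializing with `allow_nan=False` and parsing the result back before anything is written.

```python
import json
import tempfile
from pathlib import Path


def edit_sidecar(path, updates):
    """Apply `updates` to a sidecar, writing only if the result is valid JSON."""
    path = Path(path)
    metadata = json.loads(path.read_text())   # read
    metadata.update(updates)                   # edit
    # allow_nan=False makes dumps raise ValueError on NaN/Infinity, which
    # Python would otherwise emit even though they are not valid JSON.
    text = json.dumps(metadata, indent=4, allow_nan=False)
    json.loads(text)                           # round-trip check
    path.write_text(text + "\n")               # write
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp, "sub-01_task-rest_bold.json")
    sidecar.write_text(json.dumps({"RepetitionTime": 2.0}))

    # Display the data in the sidecar.
    print(json.dumps(json.loads(sidecar.read_text()), indent=4))

    edited = edit_sidecar(sidecar, {"TaskName": "rest"})

    # An edit that cannot be valid JSON is rejected before the file changes.
    try:
        edit_sidecar(sidecar, {"EchoTime": float("nan")})
        rejected = False
    except ValueError:
        rejected = True

    on_disk = json.loads(sidecar.read_text())
```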

Sort summary csv by param group count values

Within modality, across count (in descending order)

fix asl issues

[ENH] Check a single new subject against a summary group

for checking if any new parameter groups were introduced

[ENH] Add installation files

Add files for creating a python package, docs and CI tests

[ENH] Make BIDS validator easy to use

Problem: NodeJS is hard to install and maintain and it is not secure

To Do:

Make a Docker wrapper that pulls and runs BIDS-Validator on a BIDS tree
Make the same wrapper, but with Singularity
Copy Tinashe's parsing code from RBC

Integrate new modality specific config file into bond-group

[ENH] Add image info to sidecars

The dimensions and voxel size are stored in nifti headers, which we don't want to have to read in order to group scans by this information.

Check BIDS spec to see if they have tags for image size/dimensions
Add a CLI bond-nifti-info command that adds image info from the nifti headers to sidecars
Add "Dimension1", etc to IMAGING_PARAMETERS in bond.constants
Add pytests

rename .bval, .bvec, and .tsv files.

CLEAN HOUSE + valid docstrings

make datalad optional for apply and purge

The only occurrence of datalad in the apply and purge functions is where we datalad run the merge commands for apply and the rm commands for purge.

Change those two lines to use subprocess.run instead of datalad_handle.run if the use_datalad flag is unset
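
A sketch of that switch. The wrapper name `run_command` is hypothetical, and the call `datalad_handle.run(cmd=..., message=...)` assumes the handle is a DataLad dataset object whose `run` method accepts those arguments; only the `subprocess` branch is exercised here.

```python
import subprocess
import sys


def run_command(command, use_datalad=False, datalad_handle=None, message=None):
    """Run `command` through DataLad when requested, else through subprocess."""
    if use_datalad:
        if datalad_handle is None:
            raise ValueError("use_datalad=True requires a datalad_handle")
        # Assumed call shape; records the command in the dataset history.
        return datalad_handle.run(cmd=command, message=message)
    # Plain execution with no provenance tracking.
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Without DataLad the command is simply executed.
result = run_command([sys.executable, "-c", "print('ok')"])
print(result.stdout)
```

Keeping the branch in one helper means apply and purge can share it instead of each checking the flag.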

fix change filename errors

[ENH] Add number of scans that appear under each param group

List number of scans in a new column

no looping through pandas dfs

THIS IS A RUNTIME SUCK

Comb through the repository and refactor df loops

automate renaming based on column names

Populate RenameKeyGroups column of the summary df with auto detection of variance from the dominant group.

sort the key groups by count within modality

allow valid merges

[ENH] Infrastructure: make a class

Create an object that encapsulates the BIDS directory and operations on it.

It should

Find and validate a BIDS tree
List unique key-value pair sequences
Provide methods to alter json sidecars based on key-value pair sequences

Ideas:

Use AnyTree
PyBIDS - look into their caching feature

fix sorting within modality by key group count

Create new column KeyParamGroup in summary and files csvs

[ENH] Set up testing

To Do

[ENH] use get_fieldmap() to check for fieldmap types

The fieldmap type needs to go in the param groups

New Columns in the summary csv

-rename key group

-notes

-manual check

Key-Value Pairs PofC

Changing or rearranging bids name key value pairs

Recommend:

Read the name of a bids file
Create a dictionary of each of the key value pairs
Read multiple files
Create 1 dictionary where the keys come from the files' keys, and the values are a list of the possible values
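
The recommended steps might look like this, as a sketch with hypothetical function names. It assumes each filename is a run of `key-value` pairs joined by underscores, ending in a suffix and an extension; the suffix is stored under a `"suffix"` key, which is a choice made here for illustration.

```python
from collections import defaultdict
from pathlib import Path


def parse_entities(filename):
    """Split one BIDS filename into a dict of its key-value pairs."""
    name = Path(filename).name
    stem = name.split(".", 1)[0]          # drop the extension, e.g. .nii.gz
    *pairs, suffix = stem.split("_")      # the last chunk is the suffix
    # dict() raises ValueError if a chunk has no "-" (a key without a value).
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    return entities


def collect_values(filenames):
    """Map each key to the list of distinct values seen across all files."""
    values = defaultdict(list)
    for filename in filenames:
        for key, value in parse_entities(filename).items():
            if value not in values[key]:
                values[key].append(value)
    return dict(values)


files = [
    "sub-01_task-rest_bold.nii.gz",
    "sub-02_task-rest_acq-mb_bold.nii.gz",
    "sub-01_T1w.nii.gz",
]
single = parse_entities(files[0])
combined = collect_values(files)
print(single)
print(combined)
```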

Add string key/param column to summary and files csvs

make self.layout lazy

Renaming Files PofC

Create a proof of concept of some code that can rename a file in memory and test if that file name is valid before renaming the actual file.

Valid: Follows a key-value pair structure (e.g. no dunders __; each key has a value EXCEPT the last one; has a suffix like nii.gz)

See https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html
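
A minimal sketch of such a check, assuming the rule above can be approximated by a regular expression over alphanumeric keys and values. It is deliberately looser than the full BIDS specification (it does not know which keys are allowed or in what order), and the function names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

_NAME = re.compile(
    r"^(?:[a-zA-Z0-9]+-[a-zA-Z0-9]+_)+"   # key-value pairs, each ending in "_"
    r"[a-zA-Z0-9]+"                        # final chunk with no value (e.g. bold)
    r"(?:\.[a-zA-Z0-9]+)+$"                # extension (e.g. .nii.gz)
)


def is_valid_name(filename):
    """Check the proposed name in memory, without touching the file."""
    return _NAME.match(filename) is not None


def rename_if_valid(path, new_name):
    """Rename the file only after the new name passes the check."""
    if not is_valid_name(new_name):
        raise ValueError(f"not a valid key-value filename: {new_name}")
    target = Path(path).with_name(new_name)
    Path(path).rename(target)
    return target


# Demo on an empty throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp, "sub-01_task-rest_bold.nii.gz")
    old.touch()
    new = rename_if_valid(old, "sub-01_task-rest_acq-VARIANT_bold.nii.gz")
    renamed_ok = new.exists() and not old.exists()
```

Because the check runs on the string alone, a rejected name never reaches the filesystem.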
