ReproMan

ReproMan aims to simplify the creation and management of computing environments in neuroimaging. While concentrating on neuroimaging use cases, it is by no means limited to this field of science, and its tools should find utility in other fields as well.

Status

ReproMan is under rapid development. While the code base is still growing, the focus is increasingly shifting towards robust and safe operation with a sensible API. There has been no major public release yet, as organization and configuration are still subject to considerable reorganization and standardization.

See CONTRIBUTING.md if you are interested in internals and/or contributing to the project.

Installation

ReproMan requires Python 3 (>= 3.8).

Linux and OSX (Windows support is still TODO) - via pip

By default, installation via pip (pip install reproman) installs the core functionality of reproman, allowing for managing datasets etc. Additional installation schemes are available: you can request an enhanced installation via pip install 'reproman[SCHEME]', where SCHEME can be

  • tests to also install the dependencies used by reproman's unit-test battery
  • full to install all possible dependencies, e.g. DataLad

Installation through pip does not provide some external dependencies (e.g. docker, singularity); for those, please refer to the next section.
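Since pip does not install these external tools, it can help to check which of them are already available on PATH. The following is a minimal sketch and not part of ReproMan: the function name missing_tools is hypothetical, and the tool list simply repeats the examples named above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ReproMan): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command is available
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The tools named above; extend the list for your own setup.
missing_tools docker singularity
```

An empty output means every listed tool was found; any name that is printed still needs to be installed through your system's package manager.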

Debian-based systems

On Debian-based systems we recommend enabling NeuroDebian, from which we will soon provide recent releases of ReproMan. We will also provide backports of all necessary packages from that repository.

Dependencies

Python 3.8+ with header files, which may be needed to build some extensions that are not available as wheels. They are provided by python3-dev on Debian-based systems or python-devel on Red Hat systems.

Our setup.py and the corresponding packaging describe all necessary Python dependencies. On Debian-based systems we recommend enabling NeuroDebian, since we use it to provide backports of recent, fixed external modules we depend upon. Additionally, if you would like to develop and run our test battery, see CONTRIBUTING.md regarding additional dependencies.

A typical workflow for reproman run

This example is heavily based on the "Typical workflow" example created for ///repronim/containers, to which we refer you to learn more about YODA principles etc. In this reproman example we pursue exactly the same goal, running MRIQC on a sample dataset, but this time using ReproMan's ability to run computation remotely. DataLad and ///repronim/containers are still used for data and container logistics, while reproman establishes a small HTCondor cluster in the AWS cloud, runs the analysis, and fetches the results.

Step 1: Create the HTCondor AWS EC2 cluster

If this is the first time you are using ReproMan to interact with AWS cloud services, you first need to provide ReproMan with the secret credentials for AWS. To do so, edit its configuration file (~/.config/reproman/reproman.cfg on Linux, ~/Library/Application Support/reproman/reproman.cfg on OSX), filling out the ...s:

[aws]
access_key_id = ...
secret_access_key = ...

Disclaimer/Warning: Never share or post those secrets publicly.

If reproman fails to find this information, the error message Unable to locate credentials will appear.
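Because this file holds secrets, one cautious option is to create it with owner-only permissions before any key is typed in. The sketch below is an assumption about good practice, not something ReproMan requires or provides: write_aws_cfg_template is a hypothetical helper, and the ...s are deliberately left as placeholders for you to fill in by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of ReproMan): write a credentials template
# into the given configuration directory, readable by the owner only.
write_aws_cfg_template() {
    cfg_dir=$1
    cfg_file="$cfg_dir/reproman.cfg"
    # Refuse to clobber an existing configuration.
    if [ -e "$cfg_file" ]; then
        echo "$cfg_file already exists; edit it by hand" >&2
        return 1
    fi
    mkdir -p "$cfg_dir" || return 1
    # Restrict permissions before the file is created, then restore the umask.
    old_umask=$(umask)
    umask 077
    cat > "$cfg_file" <<'EOF'
[aws]
access_key_id = ...
secret_access_key = ...
EOF
    umask "$old_umask"
    # Report where the template was written.
    printf '%s\n' "$cfg_file"
}

# Example (Linux path from the text above):
#   write_aws_cfg_template "$HOME/.config/reproman"
```

The helper never overwrites an existing reproman.cfg, so other settings already stored there stay untouched; in that case add the [aws] section manually.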

Run the following (this needs to be done only once; it makes the resource available for reproman login or reproman run):

reproman create aws-hpc2 -t aws-condor -b size=2 -b instance_type=t2.medium

to create a new ReproMan resource: 2 AWS EC2 instances, with HTCondor installed (we use NITRC-CE instances).

Disclaimer/Warning: It is important to monitor your cloud resources in the cloud provider's dashboard(s) to make sure there are no runaway instances etc., and so avoid incurring heavy costs for cloud services.

Step 2: Create analysis DataLad dataset and run computation on aws-hpc2

The following script is an exact replica of the one from ///repronim/containers, with one change: the datalad containers-run command, which fetches data and runs the computation locally and serially, is replaced with reproman run, which publishes the dataset (without data) to the remote resource, fetches the data there, runs the computation via HTCondor in parallel across 2 nodes, and then fetches the results back:

#!/bin/sh
(  # so it could be just copy pasted or used as a script
PS4='> '; set -xeu  # to see what we are doing and exit upon error
# Work in some temporary directory
cd "$(mktemp -d "${TMPDIR:-/tmp}/repro-XXXXXXX")"
# Create a dataset to contain mriqc output
datalad create -d ds000003-qc -c text2git
cd ds000003-qc
# Install our containers collection:
datalad install -d . ///repronim/containers
# (optionally) Freeze container of interest to the specific version desired
# to facilitate reproducibility of some older results
datalad run -m "Downgrade/Freeze mriqc container version" \
    containers/scripts/freeze_versions bids-mriqc=0.16.0
# Install input data:
datalad install -d . -s https://github.com/ReproNim/ds000003-demo sourcedata
# Setup git to ignore workdir to be used by pipelines
echo "workdir/" > .gitignore && datalad save -m "Ignore workdir" .gitignore
# Execute desired preprocessing in parallel across two subjects
# on remote AWS EC2 cluster, creating a provenance record
# in git history containing all condor submission scripts and logs, and
# fetching them locally
reproman run -r aws-hpc2 \
   --sub condor --orc datalad-pair \
   --jp "container=containers/bids-mriqc" --bp subj=02,13 --follow \
   --input 'sourcedata/sub-{p[subj]}' \
   --output . \
   '{inputs}' . participant group -w workdir --participant_label '{p[subj]}'
)
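To make the placeholders in the reproman run call above more concrete, the sketch below prints what the command line would look like once {inputs} and {p[subj]} are filled in for each value given to --bp. It is an illustration only: reproman performs this expansion itself, and expand_for_subjects is a hypothetical name rather than a reproman command.

```shell
#!/bin/sh
# Illustration only: mimic how the placeholders in the command above
# would be filled in for each value of the batch parameter "subj".
# expand_for_subjects is a hypothetical name, not a reproman command.
expand_for_subjects() {
    for subj in "$@"; do
        # --input 'sourcedata/sub-{p[subj]}'
        input="sourcedata/sub-$subj"
        # '{inputs}' becomes the input path, '{p[subj]}' the subject label
        printf '%s . participant group -w workdir --participant_label %s\n' \
            "$input" "$subj"
    done
}

# --bp subj=02,13 supplies two values, hence two expanded commands
expand_for_subjects 02 13
```

Running it prints two lines, one for sub-02 and one for sub-13, which corresponds to the two subjects processed in parallel in the script above.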

The "ReproMan: Execute" documentation section provides more information on the principles behind the reproman run command.

Step 3: Remove resource

Once everything is computed and fetched, and you are satisfied with the results, use reproman delete aws-hpc2 to terminate the remote cluster in AWS and avoid unnecessary charges.

License

MIT/Expat

Disclaimer

ReproMan is in a beta stage: the majority of the functionality is usable, but documentation and API enhancements are still a work in progress. Please do not be shy about filing an issue or a pull request. See CONTRIBUTING.md for guidance.
