NIVM

This repository provides the Neuroimaging extension for the Data Science Virtual Machine (DSVM), a customized VM image on Microsoft’s Azure cloud for neuroimaging research. The central goal of this extension is to give neuroimaging scientists easy access to scalable computing and storage resources for their research projects, and to make the transition to the cloud as simple as possible. See the documentation below for details, including how to try the extension for free.

Once you are ready to try it out, press the deploy button in this repository to deploy this VM to Microsoft Azure, then wait about 30 minutes until all the packages and data are downloaded and installed.

Note: You will be asked to create a resource group, which allows you to organize your VMs for different projects. For the location, East US 2 is recommended for now.

Incentives for moving to the cloud

Currently, many researchers run their analyses on desktop computers or even laptops. In some cases, neuroimaging laboratories have acquired an expensive on-premises computing and storage cluster with something like 50TB of storage and 30 computing cores. Only in rare circumstances do laboratories take advantage of shared computing resources provided by the host institution (e.g. a university) for its entire scientific community.

Using the cloud allows scientists to quickly resize their virtual machine, from a dual-core CPU up to several GPUs (DLVM). It is even possible to deploy a farm of VMs for rapid speed-up. Similarly, storage capacity can be increased and decreased on demand, so researchers do not need to buy large amounts of expensive redundant storage to guard against running out of space.

Consider how neuroimaging analyses would benefit from running on the cloud:

Performance and Storage space

  • MRI analyses require the execution of complicated pipelines that are both time-consuming and computationally intensive. Computing jobs typically run for hours or days, requiring researchers to keep their laptops or workstations running.
  • Each neuroimaging project requires about 2TB of data, forcing labs to purchase large amounts of redundant storage, as data must be stored securely for many years, as required by federal (NIH, NSF) and many private funding agencies.

Scalability

  • Neuroimaging jobs are highly parallelizable. For example, each participant's data can be processed independently. Parameter and model selection can also be performed in parallel.
  • Many neuroimaging packages support GPUs, but GPUs aren’t required in every step of the analysis process. The ability to switch between cheaper, slower CPU VMs (DSVM) and more expensive, much faster GPU VMs (DLVM) would be beneficial.
  • Advanced data analysis techniques have led to impressive improvements in analysis results, but are also computationally intensive. They often also require storing intermediate results, which can take up a lot of storage space for a limited amount of time.
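
As a minimal sketch of the per-participant parallelism described above, the following Python example fans a placeholder processing step out over several workers. The function names, participant IDs, and worker count are hypothetical and not part of this extension. It uses threads on the assumption that each worker mainly launches an external command-line tool; a CPU-bound pure-Python step would call for processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def process_participant(participant_id):
    """Placeholder for one participant's pipeline (hypothetical).

    A real implementation would launch the preprocessing and analysis
    tools for this participant, for example through subprocess.run().
    """
    return f"{participant_id}: done"


def process_all(participant_ids, max_workers=4):
    """Run the per-participant step for all participants in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as participant_ids.
        return list(pool.map(process_participant, participant_ids))


if __name__ == "__main__":
    for line in process_all(["sub-01", "sub-02", "sub-03"]):
        print(line)
```

On a larger VM, raising max_workers to match the number of cores is one way to use the added capacity without changing the per-participant code.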

Recent developments that favor cloud computing

  • Since the advent of data-sharing initiatives, many data sets are online and can be shared by scientists. Microsoft is currently considering mirroring the most essential of these data sets, so that their storage wouldn't have to be paid for.
  • Increases in MRI scanner magnetic field strengths and improvements to data acquisition protocols have enabled higher-resolution data sets, requiring larger amounts of storage and more time/cycles to process.
  • Many current analysis approaches require permutation testing to establish empirically how likely the observed results would be if there were no true effect in the data. Permutation testing is computationally intensive, as the same analyses must be run hundreds or thousands of times on simulated (hypothetical) data.
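
To illustrate why permutation testing is so costly, here is a minimal Python sketch of a two-group permutation test on a difference of means. It is a generic textbook version, not the procedure of any particular neuroimaging package; the function name, the test statistic, and the example values are assumptions chosen for illustration. The whole statistic is recomputed once per permutation, which is what multiplies the running time.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=0):
    """Estimate a two-sided p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values breaks any true group effect.
        rng.shuffle(pooled)
        permuted = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if permuted >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    patients = [10, 11, 12, 13, 14]  # made-up example values
    controls = [0, 1, 2, 3, 4]
    print(permutation_p_value(patients, controls))
```

Because each permutation is independent of the others, the loop can be split across many cores or VMs.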

Reproducibility

  • VMs can be captured, and images of the VM can be downloaded. This will benefit efforts to increase the reproducibility of published scientific findings. VM images will contain all the data and analysis scripts needed to reproduce the published results. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4936733/)

Overview of extension

Operating System: Ubuntu Linux Data Science Virtual Machine, with NeuroDebian.ORG used for package management. (An extension for Windows 10 may also be created if there is demand for it.)

This DSVM extension for neuroimaging relies heavily on packages available through the NeuroDebian.ORG software repository. Specifically, this extension contains the following software: afni, connectome-workbench, connectomeviewer, itksnap.

FSL, however, is installed using fsl_installer.py. The installation path is /data/fsl.

The extension also comes with Docker support, making it easy to install fmriprep.

Note: MATLAB and SPM are currently not pre-installed with this extension. Please let us know of a good way of doing this without having a MATLAB license.

Getting started:

Go to the Microsoft Azure portal to create a free account. This will let you try this extension for free.

It may be beneficial to briefly browse the help pages for Microsoft Azure (https://ms.portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/overview).

The documentation for DSVMs is also a helpful resource.

Workflow

The very first step is to provision and deploy a Data Science Virtual Machine (DSVM) with this extension (click button above).

At the very core, using this extension requires the following steps:

  1. Look up the IP address of your virtual machine after booting it up.
  2. Upload your data using, for example, rsync or scp. It is recommended to store the data in /data. Note that it is easy to add additional disks to your VM.
  3. Use the x2go client to connect to your virtual machine. Make sure to read the documentation for specific settings you need for this to work.
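
The upload in step 2 can be sketched in Python as a thin wrapper around rsync. The IP address, user name, and local folder below are placeholders and not values provided by this extension; only the /data destination comes from the recommendation above. By default the sketch prints the command rather than running it, so it can be checked before any data is transferred.

```python
import shlex
import subprocess


def build_rsync_command(local_dir, vm_ip, user, remote_dir="/data"):
    """Build an rsync command that copies local_dir's contents to the VM."""
    # A trailing slash makes rsync copy the folder's contents,
    # not the folder itself.
    source = local_dir.rstrip("/") + "/"
    destination = f"{user}@{vm_ip}:{remote_dir}"
    # -a: archive mode, -v: verbose, -z: compress during transfer.
    return ["rsync", "-avz", "--progress", source, destination]


def upload(local_dir, vm_ip, user, remote_dir="/data", dry_run=True):
    """Print the rsync command, and run it only when dry_run is False."""
    command = build_rsync_command(local_dir, vm_ip, user, remote_dir)
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Placeholder values: replace with your VM's IP address and login.
    upload("my_study", "203.0.113.10", "researcher")
```

Replace the placeholder values with the IP address from step 1 and your VM login, then pass dry_run=False to start the transfer.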

Existing Cloud solutions

Further Reading:

Support

See troubleshooting for some commonly encountered issues.

Please feel free to email any of the contributors for help.

Contribute

Please feel free to fork this repository if you want other or additional software installed. Improvements to the documentation for this extension are also more than welcome. Please create a pull request if you think your fork is worth merging into the main branch.
