
CyCIF Manager

Purpose: Provide pipeline infrastructure that streamlines CyCIF analysis, both on a local machine and on the O2 cluster at HMS

Requirements

  • Can be run either locally or on O2
  • The user must have an O2 account
    • with access to the 'transfer_users' and 'ImStor_sorger' groups
    • to check your group membership, run:
groups
    • if you lack O2 access or group membership, request it at "https://rc.hms.harvard.edu/"
  • Data must follow the folder organization shown below
  • A file 'markers.csv' that lists the name of each marker, one per row, in the order imaged
    • Example:
DNA1
AF488
AF555
AF647
DNA2
mLY6C
mCD8A
mCD68
DNA3
CD30
CPARP
CD7
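The requirements above can be checked before starting with a short bash sketch. It is not part of CyCIF Manager: the function names `check_groups` and `summarize_markers` are hypothetical, and the cycle count assumes, as in the example list, that each cycle begins with a DNA channel.

```shell
# Report whether the current user belongs to each O2 group listed in Requirements.
check_groups() {
  local g
  for g in transfer_users ImStor_sorger; do
    if id -nG | tr ' ' '\n' | grep -qxF "$g"; then
      echo "ok: $g"
    else
      echo "missing: $g"
    fi
  done
}

# Summarize a markers.csv file (one marker name per row, in imaging order).
# Assumption: each cycle begins with a DNA channel, as in the example above.
summarize_markers() {
  local file=${1:-markers.csv}
  local n_markers n_cycles
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  # grep -c prints 0 but exits 1 when nothing matches; `|| true` keeps that 0.
  n_markers=$(grep -c '[^[:space:]]' "$file" || true)
  n_cycles=$(grep -c '^DNA' "$file" || true)
  echo "markers: $n_markers, cycles: $n_cycles"
}
```

Run on the example list above, `summarize_markers markers.csv` reports `markers: 12, cycles: 3`.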

Pipeline Workflow

CyCIF Pipeline Plan

Folder Organization Example

The project folder must be at a location accessible from O2

(base) bionerd@MTS-LSP-L06275:~/Dana_Farber/CyCif/git/CyCif_O2_Manager/example_data$ pwd
/home/bionerd/Dana_Farber/CyCif/git/CyCif_O2_Manager/example_data

Within your data folder, there is a separate folder for each imaged slide

  • Each can be a whole-tissue slide or (eventually) a TMA
(base) bionerd@MTS-LSP-L06275:~/Dana_Farber/CyCif/git/CyCif_O2_Manager/example_data$ ll
drwxrwxrwx 1 bionerd bionerd 4096 Aug  9 08:03 image_1/
drwxrwxrwx 1 bionerd bionerd 4096 Aug  9 08:04 image_2/
  • Each slide folder should contain a subfolder, 'raw_files', holding the raw microscope images for every CyCIF cycle
    • for example, from RareCyte: '.rcpnl' and '.metadata' files
(base) bionerd@MTS-LSP-L06275:~/Dana_Farber/CyCif/git/CyCif_O2_Manager/example_data$ ll image_1/
total 0
drwxrwxrwx 1 bionerd bionerd 4096 Aug  9 10:44 ./
drwxrwxrwx 1 bionerd bionerd 4096 Aug  7 12:19 ../
drwxrwxrwx 1 bionerd bionerd 4096 Aug  9 08:04 raw_files/
[ntj8@login01 image_1]$ cd raw_files/
[ntj8@login01 raw_files]$ ll
total 3326644
-rwxrwx--- 1 ntj8 ntj8      11516 Jul  9 17:30 Scan_20190612_164155_01x4x00154.metadata
-rwxrwx--- 1 ntj8 ntj8 1703221248 Jul  9 17:31 Scan_20190612_164155_01x4x00154.rcpnl
-rwxrwx--- 1 ntj8 ntj8      11524 Jul  9 17:31 Scan_20190613_125815_01x4x00154.metadata
-rwxrwx--- 1 ntj8 ntj8 1703221248 Jul  9 17:32 Scan_20190613_125815_01x4x00154.rcpnl

After the CyCIF pipeline is run, additional folders are created for each slide (explained later)
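A project folder can be checked against this layout before transferring it with a minimal bash sketch. The function `check_layout` is hypothetical, and it assumes RareCyte data in which each `.rcpnl` file has a `.metadata` file with the same base name, as in the listing above.

```shell
# Check <project>/<slide>/raw_files/ for paired .rcpnl/.metadata files.
# Prints one line per problem; returns non-zero if anything is wrong.
check_layout() {
  local project=${1%/} status=0 found=0 slide panel
  for slide in "$project"/*/; do
    [ -d "$slide" ] || continue          # glob matched no slide folders
    found=1
    if [ ! -d "${slide}raw_files" ]; then
      echo "missing raw_files/: $slide"
      status=1
      continue
    fi
    for panel in "${slide}raw_files"/*.rcpnl; do
      if [ ! -e "$panel" ]; then         # glob matched no .rcpnl files
        echo "no .rcpnl files in: ${slide}raw_files"
        status=1
        break
      fi
      if [ ! -e "${panel%.rcpnl}.metadata" ]; then
        echo "no matching .metadata for: $panel"
        status=1
      fi
    done
  done
  if [ "$found" -eq 0 ]; then
    echo "no slide folders in: $project"
    status=1
  fi
  return "$status"
}
```

For example, `check_layout /home/bionerd/Dana_Farber/CyCif/git/CyCif_O2_Manager/example_data` prints nothing and returns 0 when every slide folder is complete.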

Run CyCIF Pipeline

On O2

New User Installation

Run the following on O2 to add the pipeline's commands to your PATH via your .bash_profile

echo 'CYCIF=/n/groups/lsp/cycif/CyCif_Manager/O2:/n/groups/lsp/cycif/CyCif_Manager/bin' >> ~/.bash_profile

echo 'export PATH=$CYCIF:$PATH' >> ~/.bash_profile

source ~/.bash_profile

Test that the CyCIF pipeline is found. If it is, the command prints its path; if not, the output is blank

which cycif_pipeline_activate.sh
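Re-running the `echo ... >> ~/.bash_profile` lines appends duplicate entries each time. A hypothetical bash helper, `append_once`, adds a line only if it is not already present, and a second hypothetical function, `check_cycif_on_path`, wraps the `which` test with an explicit message. Neither is part of CyCIF Manager.

```shell
# Append LINE to FILE only if FILE does not already contain exactly that line.
append_once() {
  local line=$1 file=$2
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Like `which`, but says explicitly whether the activation script was found.
check_cycif_on_path() {
  if command -v cycif_pipeline_activate.sh >/dev/null 2>&1; then
    echo "found: $(command -v cycif_pipeline_activate.sh)"
  else
    echo "cycif_pipeline_activate.sh is not on PATH" >&2
    return 1
  fi
}

# Usage on O2 (the same two lines as the setup above):
#   append_once 'CYCIF=/n/groups/lsp/cycif/CyCif_Manager/O2:/n/groups/lsp/cycif/CyCif_Manager/bin' ~/.bash_profile
#   append_once 'export PATH=$CYCIF:$PATH' ~/.bash_profile
#   source ~/.bash_profile
#   check_cycif_on_path
```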

Run CyCIF Pipeline on O2

Three stages:

  • Transfer data
  • Activate CyCIF Pipeline: generates the dataset-specific files needed to submit jobs to O2
  • Run CyCIF Pipeline: submits all modules to the O2 job scheduler

*Currently, large datasets (>10 images) overwhelm the O2 queue capacity. The next version will fix this

Transfer data to the scratch disk. Example:

  • transfer.sbatch [from] [to]
  • Change 'ntj8' to your O2 username
  • Data must use the folder organization defined above

sbatch transfer.sbatch /n/files/ImStor/sorger/data/RareCyte/nathantjohnson/Data/example_data/ /home/ntj8/scratch
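Because the example paths hard-code the username 'ntj8', a small hypothetical bash wrapper can take your own username from `$USER` (assuming your O2 login matches it). `submit_transfer` is not part of CyCIF Manager; it only checks that two arguments were given and forwards them to `transfer.sbatch`, and setting `DRY_RUN=1` prints the command instead of submitting it.

```shell
# Hypothetical wrapper: forwards FROM and TO to transfer.sbatch.
# DRY_RUN=1 prints the sbatch command instead of submitting the job.
submit_transfer() {
  if [ "$#" -ne 2 ]; then
    echo "usage: submit_transfer FROM TO" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'sbatch transfer.sbatch %q %q\n' "$1" "$2"
  else
    sbatch transfer.sbatch "$1" "$2"
  fi
}

# Example mirroring the command above, with your username filled in:
#   DRY_RUN=1 submit_transfer /n/files/ImStor/sorger/data/RareCyte/nathantjohnson/Data/example_data/ "/home/$USER/scratch"
```

The source path is not checked for existence because ImStor may not be mounted on the node where you submit the job.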

Activate CyCIF Pipeline and Run

  • Move into your dataset folder first, so that all working and log files are written within your dataset
  • Change '/n/scratch2/ntj8/example_data' to your data's path
  • DON'T FORGET YOUR 'markers.csv' file (see above)
cd /n/scratch2/ntj8/example_data
cycif_pipeline_activate.sh /n/scratch2/ntj8/example_data
bash Run_CyCif_pipeline.sh

*If part of the pipeline has already been run, that part will not be re-run and its previous output files will not be overwritten
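The activate-and-run steps can be wrapped in a hypothetical bash function, `run_cycif`, that refuses to start when 'markers.csv' is missing. It assumes 'markers.csv' sits at the top level of the dataset folder, and its body runs in a subshell so the `cd` does not change your interactive shell's directory.

```shell
# Hypothetical wrapper for: cd DATA_DIR; cycif_pipeline_activate.sh DATA_DIR; bash Run_CyCif_pipeline.sh
run_cycif() (
  if [ "$#" -ne 1 ]; then
    echo "usage: run_cycif DATA_DIR" >&2
    exit 2
  fi
  # Resolve to an absolute path so it stays valid after the cd below.
  data_dir=$(cd "$1" 2>/dev/null && pwd) || { echo "not a directory: $1" >&2; exit 1; }
  # Assumption: markers.csv sits at the top level of the dataset folder.
  if [ ! -f "$data_dir/markers.csv" ]; then
    echo "missing $data_dir/markers.csv" >&2
    exit 1
  fi
  # Run from inside the dataset so working and log files are written there.
  cd "$data_dir" || exit 1
  cycif_pipeline_activate.sh "$data_dir" && bash Run_CyCif_pipeline.sh
)
```

Usage: `run_cycif /n/scratch2/ntj8/example_data`, with the path changed to your data's location.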

Transfer data back to ImStor (don't forget to upload to Omero)

sbatch transfer.sbatch /home/ntj8/scratch/example_data/ /n/files/ImStor/sorger/data/RareCyte/nathantjohnson/Data/example_data

Run On Local Machines (request analysis time: https://ppms.us/hms-lsp/login/)

Talk to Nathan if needed

Results

When the pipeline completes, your project directory will contain the following folders, each holding the processed output of one part of the pipeline:

  • cell_states
    • Placeholder for future analysis
  • clustering
    • Placeholder for future analysis
  • dearray
    • Contains masks
  • feature_extraction
    • The single-cell counts matrix of marker expression for all images (output of the HistoCAT software)
  • illumination_profiles
    • Preprocessing files required for stitching the acquired raw tiles into a single image (Ashlar)
  • prob_maps
    • Probability maps, predicted by a U-Net deep learning model, for identifying nuclei, cell borders and background
  • raw_files
    • Your original folder containing the raw images
  • registration
    • The image stitched and registered across all cycles (upload it to Omero for viewing, or open it in ImageJ)
  • segmentation
    • The locations of the nuclei and cells within each image
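To see which of these folders a run has produced so far, a hypothetical bash helper, `report_outputs`, can compare a directory against the list above. Point it at the folder where your outputs are written; it prints one line per expected folder and returns non-zero if any are missing.

```shell
# Report which of the documented result folders exist under DIR.
report_outputs() {
  local dir=${1%/} name missing=0
  for name in cell_states clustering dearray feature_extraction \
              illumination_profiles prob_maps raw_files registration segmentation; do
    if [ -d "$dir/$name" ]; then
      echo "present: $name"
    else
      echo "missing: $name"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```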

Advertising Pros

  • Anyone familiar with the command line can run it:
    • transferable
    • automatic
    • straightforward to run
    • one location, ease of use
    • user supplies location for data
  • any new analysis methods can be added/switched as a module for CyCif analysis
  • ability to scale (the pipeline manager organizes work at both the method and the image level)
  • merges every component of the CyCIF pipeline so that it is runnable and scalable on O2
  • With minor modification can switch to AWS (or any cloud computing)

What's Next? (By Priority)

  • add analysis parameters
  • manage moving image data between ImStor and scratch
  • move to the Nextflow pipeline management system
  • improve usability
  • benchmark scalability
  • add QC metrics/prompting for the user
  • track usage
  • test for scalability limitations
  • GUI/webpage interface for non-command-line users to provide information
  • update to Singularity/Docker containers for method transferability

For Developers

Updated your code? Want to add your method to the pipeline? Contact Nathan

Install & Run

Assumptions: MATLAB installed; Linux environment (Windows Subsystem for Linux can be used)


git clone git@github.com:bioinfonerd/CyCif_O2_Manager.git

Install the conda environments and example data by running the following from within the cloned repository directory

bash install.sh
bash install_example_dataset.sh
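Before running the install scripts, a hypothetical bash check, `check_prereqs`, can confirm that the assumed tools are on your PATH. `git` and `matlab` come from the clone step and the stated assumptions; `conda` is assumed because the install step sets up conda environments.

```shell
# Report whether each assumed prerequisite is on PATH; non-zero if any is missing.
check_prereqs() {
  local tool missing=0
  for tool in git conda matlab; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```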

Local Install

  • Talk to Nathan if needed

Contributors

bioinfonerd
