SCanD_project

This is a base repo for the Schizophrenia Canadian Neuroimaging Database (SCanD) codebase. It is meant to be folked/cloned for every SCanD dataset

General folder structure for the repo (when all is run)

${BASEDIR}
├── code                         # a clone of this repo
│   └── ...    
├── containers                   # the singularity image are copied or linked to here
│   ├── fmriprep-20.1.1.simg 
│   ├── fmriprep_ciftity-v1.3.2-2.3.3.simg 
│   └── mriqc-22.0.6.simg simg
├── data
│   ├── local                    # folder for the "local" dataset
│   │   ├── bids                 # the defaced BIDS dataset
│   │   ├── mriqc                # mriqc derivatives
│   │   ├── fmriprep             # fmriprep derivatives
│   │   ├── freesurfer           # freesurfer derivative - generated during fmriprep
│   │   ├── ciftify              # ciftify derivatives
│   │   ├── cifti_clean          # dtseries post confound regression
│   │   └── parcellated          # parcellated timeseries
│   |
│   └── share                    # folder with a smaller subset ready to share
│       ├── ciftify              # contains only copied over qc images and logs
│       ├── fmriprep             # contains only qc images, metadata and anat tsvs
│       └── parcellated          # contains the parcellated data
├── logs                       # logs from jobs run on cluster                 
|── README.md
└── templates                  # an extra folder with pre-downloaded fmriprep templates (see setup section)
    └── parcellations
        ├── README.md
        └── tpl-fsLR_res-91k_atlas-GlasserTianS2_dseg.dlabel.nii

Currently this repo is going to be set up for running things on SciNet Niagara cluster - but we can adapt later to create local set-ups behind hospital firewalls if needed.

The general overview of what to do

Organize your data into BIDS..
Deface the BIDS data (if not done during step 1)
Setting your SciNet enviroment/code/and data
1. Clone the Repo
2. Run the software set-up script (takes a few seconds)
3. Copy or link your bids data to this folder
Run MRIQC
Run fmriprep
Run ciftify
Run ciftify_clean and parcellate
Run the scripts to extract sharable data into the sharable folder

Organize your data into BIDS

This is the longest - most human intensive - step. But it will make everything else possible! BIDS is really a naming convention for your MRI data that will make it easier for other people the consortium (as well as the software) to understand what your data is (what scan types, how many participants, how many sessions..ect). Converting to BIDS may require renaming and/or reorganizing your current data. No coding is required, but there now a lot of different software projects out there to help out with the process.

For amazing tools and tutorials for learning how to BIDS convert your data, check out the BIDS starter kit.

Deface the BIDS data (if not done during step 1)

A useful tool is this BIDSonym BIDS app.

Setting your SciNet enviroment/code/and data

Cloning this Repo

cd $SCRATCH
git clone https://github.com/TIGRLab/SCanD_project.git

Run the software set-up script

cd ${SCRATCH}/SCanD_project
source code/00_setup_data_directories.sh

put your bids data into the data/local folder

We want to put your data into:

./data/local/bids

You can do this by either copying "scp -r", linking ln -s or moving the data to this place - it's your choice.

To copy the data from another computer/server you should be on the datamover node:

ssh <cc_username>@niagara.scinet.utoronto.ca
ssh nia-dm1
rsync -av <local_server>@<local_server_address>:/<local>/<server>/<path>/<bids> ${SCRATCH}/SCanD_project/data/local/

To link existing data from another location on SciNet Niagara to this folder:

ln -s /your/data/on/scinet/bids ${SCRATCH}/SCanD_project/data/local/bids

Running mriqc

## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca

## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull         #in case you need to pull new code

## calculate the length of the array-job given
SUB_SIZE=10
N_SUBJECTS=$(( $( wc -l ./data/local/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"

## submit the array job to the queue
sbatch --array=0-${array_job_length} ./code/01_mriqc.sh

Running fmriprep-anatomical (includes freesurfer)

Note: this step uses and estimated 16hrs for processing time per participant! So if all participants run at once (in our parallel cluster) it will still take a day to run.

## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca

# module load singularity/3.8.0 - singularity already on most nodes
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull         #in case you need to pull new code

## calculate the length of the array-job given
SUB_SIZE=5
N_SUBJECTS=$(( $( wc -l ./data/local/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"

## submit the array job to the queue
sbatch --array=0-${array_job_length} code/01_fmriprep_anat_scinet.sh

submitting the fmriprep func step

Running the functional step looks pretty similar to running the anat step. The time taken and resources needed will depend on how many functional tasks exists in the experiment - fMRIprep will try to run these in paralell if resources are available to do that.

Note - the script enclosed uses some interesting extra opions:

it defaults to running all the fmri tasks - the --task-id flag can be used to filter from there
it is running synthetic distortion correction by default - instead of trying to work with the datasets available feildmaps - because feildmaps correction can go wrong - but this does require that the phase encoding direction is specificed in the json files (for example "PhaseEncodingDirection": "j-").

## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca

## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull

## figuring out appropriate array-job size
SUB_SIZE=2 # for func the sub size is moving to 1 participant because there are two runs and 8 tasks per run..
N_SUBJECTS=$(( $( wc -l ./data/local/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"

## submit the array job to the queue
sbatch --array=0-${array_job_length} ./code/02_fmriprep_func_scinet.sh

running ciftify

## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca

## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull

## figuring out appropriate array-job size
SUB_SIZE=8 # for func the sub size is moving to 1 participant because there are two runs and 8 tasks per run..
N_SUBJECTS=$(( $( wc -l ./data/local/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"

## submit the array job to the queue
cd ${SCRATCH}/SCanD_project
sbatch --array=0-${array_job_length} ./code/03_ciftify_scinet.sh

running cifti clean

## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca

## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull

## figuring out appropriate array-job size
SUB_SIZE=10 # for func the sub size is moving to 1 participant because there are two runs and 8 tasks per run..
N_DTSERIES=$(ls -1d ./data/local/ciftify/sub*/MNINonLinear/Results/*task*/*dtseries.nii | wc -l)
array_job_length=$(echo "$N_DTSERIES/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"

## submit the array job to the queue
sbatch --array=0-${array_job_length} ./code/04_cifti_clean.sh

running the parcellation step

## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca

## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull

## figuring out appropriate array-job size
SUB_SIZE=10 # for func the sub size is moving to 1 participant because there are two runs and 8 tasks per run..
N_DTSERIES=$(ls -1d ./data/local/ciftify/sub*/MNINonLinear/Results/*task*/*dtseries.nii | wc -l)
array_job_length=$(echo "$N_DTSERIES/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"

## submit the array job to the queue
sbatch --array=0-${array_job_length} ./code/05_parcellate.sh

syncing the data with to the share directory

This step does calls some "group" level bids apps to build summary sheets and html index pages. It also moves a meta data, qc pages and a smaller subset of summary results into the data/share folder.

It takes about 10 minutes to run (depending on how much data you are synching). It could also be submitted.

## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca

## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull

source ./code/06_extract_to_share.sh

Appendix - Adding a test dataset from openneuro

(To test this repo - using an openneuro dataset)

To get an openneuro dataset for testing - we will use datalad

loading datalad on SciNet niagara

## loading Erin's datalad environment on the SciNet system
module load git-annex/8.20200618 # git annex is needed by datalad
module use /project/a/arisvoin/edickie/modules #this let's you read modules from Erin's folder
module load datalad/0.15.5 # this is the datalad module in Erin's folder

using datalad to install a download a dataset

cd ${SCRATCH}/SCanD_project/data/local/
datalad clone https://github.com/OpenNeuroDatasets/ds000115.git bids

Before running fmriprep anat get need to download/"get" the anat derivatives

cd bids
datalad get sub*/anat/*T1w.nii.gz

Before running fmriprep func - we need to download the fmri scans

cd bids
datalad get sub*/func/*

But - with this dataset - there is also the issue that this dataset is old enough that no Phase Encoding Direction was given for the fMRI scans - we really want at least to have this so we can run Synth Distortion Correction. So we are going to guess it..

To guess - we add this line into the middle of the top level json ().

"PhaseEncodingDirection": "j-",

note: now - thanks to the people at repronim - we can also add the repronim derivatives !

cd ${SCRATCH}/SCanD_project/data/local/ls

datalad clone https://github.com/OpenNeuroDerivatives/ds000115-fmriprep.git fmriprep
datalad clone https://github.com/OpenNeuroDerivatives/ds000115-mriqc.git mriqc

getting the data files we actually use for downstream ciftify things

slimnsour / scand_project Goto Github PK

scand_project's Introduction