pegasus-isi / freesurfer-osg-workflow
A Pegasus workflow for running FreeSurfer on the Open Science Grid
When the job finishes, I get an output file like subject_output.tar.gz inside my workdir. Is there a way to make the workflow not tar up all the files and instead leave them expanded under the "subject" directory? I think I just have to adjust scripts/autorecon3.sh, but I am not sure how best to handle this.
I just ran my first freesurfer-osg-workflow on osgconnect, but I can't seem to find the generated output. I believe the job failed, but I can't tell which log to check to see what went wrong.
On osgconnect, here is the path for the workflow I've submitted:
/local-scratch/hayashis/workflows/5e03c800cc6723b534bd2d7b/5e03c800cc67234a3ebd2d7e
Could someone help me troubleshoot the problem?
I believe the current version of freesurfer-osg-workflow runs FreeSurfer 6.0.1(?). Is there any way to specify the FreeSurfer version? We run 6.0.0 and 6.0.1, and sometimes the dev (nightly build) version. Also, FreeSurfer 7.0 RC is out now, so I might want to start testing with it as well.
I've run another test job and it failed with this error message:
##################### Checking file integrity for input files #####################
Integrity check: output-t1.nii.gz: Expected checksum (315e3a7365017a2dfd9472e629a53d48b2777767b3a9ec7a830e9bb609a45685) does not match the calculated checksum (1819fdd61883a2da3e5c249649ef91034df97ce6891a66fb5a6676797581386e) (timing: 0.446)
I don't see output-t1.nii.gz delivered to the submit host (osgconnect), though. Here is a bit more log output:
FAILED ~5 hours ago
Submit Directory : work
Total jobs : 14 (100.00%)
# jobs succeeded : 4 (28.57%)
# jobs failed : 1 (7.14%)
# jobs held : 1 (7.14%)
# jobs unsubmitted : 9 (64.29%)
*******************************Held jobs' details*******************************
===========================autorecon1_sh_output_00001===========================
submit file : autorecon1_sh_output_00001.sub
last_job_instance_id : 7
reason : Error from slot1_1@[email protected]: STARTER at 192.168.3.174 failed to send file(s) to <192.170.227.166:9618>: error reading from /sge-batch/3958692.1.lnxfarm/glide_LzUphU/execute/dir_31678/output_recon1_output.tar.xz: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <192.12.238.129:35960>
******************************Failed jobs' details******************************
===========================autorecon1_sh_output_00001===========================
last state: POST_SCRIPT_FAILED
site: condorpool
submit file: 00/00/autorecon1_sh_output_00001.sub
output file: 00/00/autorecon1_sh_output_00001.out.002
error file: 00/00/autorecon1_sh_output_00001.err.002
-------------------------------Task #1 - Summary--------------------------------
site : condorpool
hostname : -
executable : /public/hayashis/workdir/5ee37996529ab4fcd686772f/5ef206c9bf709388263a46aa/work/00/00/autorecon1_sh_output_00001.sh
arguments : -
exitcode : -1
working dir : /public/hayashis/workdir/5ee37996529ab4fcd686772f/5ef206c9bf709388263a46aa/work
-----------Job stderr file - 00/00/autorecon1_sh_output_00001.err.002-----------
/bin/singularity exec --bind /cvmfs --bind /hadoop --bind /mnt/hadoop --contain --bind /batch/lnxfarm274/3958692.1.lnxfarm/glide_LzUphU/execute/dir_31678:/srv --no-home --ipc --pid /cvmfs/singularity.opensciencegrid.org/.images/d2/9c76edc34defb5998d5f6c0fa9704c55485d03141e6e3b112076ee777c9bea /srv/.osgvo-user-job-wrapper.sh /srv/condor_exec.exe
2020-06-23 15:05:28: PegasusLite: version 4.9.3dev
2020-06-23 15:05:29: Executing on host lnxfarm274.colorado.edu OSG_SITE_NAME=UColorado_HEP GLIDEIN_Site=Colorado GLIDEIN_ResourceName=UColorado_HEP
########################[Pegasus Lite] Setting up workdir ########################
2020-06-23 15:05:29: Not creating a new work directory as it is already set to /srv
##############[Pegasus Lite] Figuring out the worker package to use ##############
2020-06-23 15:05:29: The job contained a Pegasus worker package
##################### Setting the xbit for executables staged #####################
##################### Checking file integrity for input files #####################
Integrity check: output-t1.nii.gz: Expected checksum (315e3a7365017a2dfd9472e629a53d48b2777767b3a9ec7a830e9bb609a45685) does not match the calculated checksum (1819fdd61883a2da3e5c249649ef91034df97ce6891a66fb5a6676797581386e) (timing: 0.446)
2020-06-23 15:05:30: Last command exited with 1
PegasusLite: exitcode 1
Hi,
I am trying to run the test script from the /freesurfer-osg-workflow directory using the following command:
./submit.sh --input-def example-run.yml
however, I got the following error message:
workflow-generator.py: error: argument --inputs-def is required
Do you have an idea on what is going on?
Thanks
If I submit a workflow (with pegasus-plan --submit), and immediately run pegasus-status, it reports Failure.
STAT IN_STATE JOB
Idle 00:16 freesurfer-0 ( /home/hayashis/git/app-freesurfer-osg/work )
Summary: 1 Condor job total (I:1)
STATE
Failure
Summary: 1 DAG total (Failure:1)
If I sleep for about 20 seconds and then run pegasus-status, it reports Running.
STAT IN_STATE JOB
Run 00:20 freesurfer-0 ( /home/hayashis/git/app-freesurfer-osg/work )
Idle 00:19 ┗━create_dir_freesurfer_0_local
Summary: 2 Condor jobs total (I:1 R:1)
STATE
Running
Summary: 1 DAG total (Running:1)
I've added sleep 20 to my startup script for now, but it would be nice if pegasus-status reported something other than "Failure" right after the job is submitted.
I am updating our freesurfer App to use the new pegasus workflow version.
For it to function properly on our system, I will need to query and translate the pegasus-status output into one of the following exit codes:
#return code 0 = running
#return code 1 = finished successfully
#return code 2 = failed
#return code 3 = unknown status
I am currently working on the following script
https://github.com/brainlife/app-freesurfer-osg/blob/master/status.sh
My questions are..
Thanks!
I was able to run the test job successfully, and obtained what seems to be a valid freesurfer output.
However, I ran another test job using the same t1 input, and this time it failed with this error message.
$ pegasus-analyzer work
************************************Summary*************************************
Submit Directory : work
Total jobs : 14 (100.00%)
# jobs succeeded : 4 (28.57%)
# jobs failed : 1 (7.14%)
# jobs held : 1 (7.14%)
# jobs unsubmitted : 9 (64.29%)
*******************************Held jobs' details*******************************
==========================autorecon1_sh_subject_00001===========================
submit file : autorecon1_sh_subject_00001.sub
last_job_instance_id : 7
reason : Error from slot1_6@[email protected]: STARTER at 192.168.4.2 failed to send file(s) to <192.170.227.166:9618>: error reading from /var/lib/condor/execute/dir_1807/subject_recon1_output.tar.xz: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <192.170.236.165:60541>
******************************Failed jobs' details******************************
==========================autorecon1_sh_subject_00001===========================
last state: POST_SCRIPT_FAILED
site: condorpool
submit file: 00/00/autorecon1_sh_subject_00001.sub
output file: 00/00/autorecon1_sh_subject_00001.out.002
error file: 00/00/autorecon1_sh_subject_00001.err.002
-------------------------------Task #1 - Summary--------------------------------
site : condorpool
hostname : condor-worker-7c7d97844f-ht4ml
executable : /srv/autorecon1_sh
arguments : subject subject-t1.nii.gz 4 -notal-check -cw256
exitcode : 1
working dir : /srv
----------------Task #1 - autorecon1.sh - subject_00001 - stdout----------------
Will use SUBJECTS_DIR=/srv/tmp.1PLxkG0bOW
Subject Stamp: freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.1-f53a55a
Current Stamp: freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.1-f53a55a
INFO: SUBJECTS_DIR is /srv/tmp.1PLxkG0bOW
Actual FREESURFER_HOME /opt/freesurfer-6.0.1
Linux condor-worker-7c7d97844f-ht4ml 5.3.2-1.el7.elrepo.x86_64 #1 SMP Tue Oct 1 08:18:21 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
'/opt/freesurfer-6.0.1/bin/recon-all' -> '/srv/tmp.1PLxkG0bOW/subject/scripts/recon-all.local-copy'
-cw256 option is now persistent (remove with -clean-cw256)
/srv/tmp.1PLxkG0bOW/subject
mri_convert /srv/subject-t1.nii.gz /srv/tmp.1PLxkG0bOW/subject/mri/orig/001.mgz
mri_convert.bin /srv/subject-t1.nii.gz /srv/tmp.1PLxkG0bOW/subject/mri/orig/001.mgz
$Id: mri_convert.c,v 1.226 2016/02/26 16:15:24 mreuter Exp $
reading from /srv/subject-t1.nii.gz...
TR=6.40, TE=0.00, TI=0.00, flip angle=0.00
i_ras = (1, 0, 0)
j_ras = (0, 1, 0)
k_ras = (0, 0, 1)
writing to /srv/tmp.1PLxkG0bOW/subject/mri/orig/001.mgz...
#--------------------------------------------
How should I handle this error?
Would it be possible to stream the output from the recon-all command back to the submit host while the job is being executed? I can live without this, but I think users will want to see how the job is progressing.
I am particularly interested in the following markers in the stdout
#--------------------------------------------
#@# ASeg Stats Mon Apr 20 21:52:51 UTC 2020
...
#-----------------------------------------
#@# WMParc Mon Apr 20 21:53:29 UTC 2020
...
#--------------------------------------------
#@# BA_exvivo Labels lh Mon Apr 20 21:59:04 UTC 2020
...
#--------------------------------------------
#@# BA_exvivo Labels rh Mon Apr 20 22:01:08 UTC 2020
These markers indicate how far along the overall processing is.
If something similar to this can be output to the log, I can then relay that to the users. Would this be difficult to do?
I am trying to figure out how to set the -hippocampal-subfields-T1T2 option for autorecon-options in run.yml. This option requires a path to the T2 file if specified.
Currently, I have the following script to customize the command line option based on user input.
cmd="-i $t1 -subject output -all -parallel -openmp $OMP_NUM_THREADS"
if [ -f "$t2" ]; then
    cmd="$cmd -T2 $t2 -T2pial"
    #https://surfer.nmr.mgh.harvard.edu/fswiki/HippocampalSubfields
    #https://surfer.nmr.mgh.harvard.edu/fswiki/HippocampalSubfieldsAndNucleiOfAmygdala
    if [ "$hippocampal" = "true" ]; then
        cmd="$cmd -hippocampal-subfields-T1T2 $t2 t1t2"
    fi
else
    if [ "$hippocampal" = "true" ]; then
        cmd="$cmd -hippocampal-subfields-T1"
    fi
fi
Since I can't just use the local path for T2 in autorecon-options (right?), I am not sure how to go about specifying this option. Is it possible to set it?
I just tried running this again, and now I am seeing a different error message.
Submitting with this config:
+ cat run.yml
output:
input: ../5ed1a215529ab4221883209e/5e6a9956874067bc9ea3d445/t1.nii.gz
+ ./workflow-generator.py --inputs-def run.yml
{'output': {'input': '../5ed1a215529ab4221883209e/5e6a9956874067bc9ea3d445/t1.nii.gz'}}
+ export PYTHONPATH=:/usr/lib/python2.6/site-packages
+ PYTHONPATH=:/usr/lib/python2.6/site-packages
+ pegasus-plan --conf pegasus.conf --dir /public/hayashis/workdir/5ed1a215529ab491cf83209d/5ed1a215529ab4764b8320a0 --relative-dir work --sites condorpool --output-site local --dax freesurfer-osg.xml --cluster horizontal --submit
2020.05.30 00:00:40.676 GMT:
2020.05.30 00:00:40.682 GMT: -----------------------------------------------------------------------
2020.05.30 00:00:40.688 GMT: File for submitting this DAG to HTCondor : freesurfer-0.dag.condor.sub
2020.05.30 00:00:40.694 GMT: Log of DAGMan debugging messages : freesurfer-0.dag.dagman.out
2020.05.30 00:00:40.701 GMT: Log of HTCondor library output : freesurfer-0.dag.lib.out
2020.05.30 00:00:40.707 GMT: Log of HTCondor library error messages : freesurfer-0.dag.lib.err
2020.05.30 00:00:40.714 GMT: Log of the life of condor_dagman itself : freesurfer-0.dag.dagman.log
2020.05.30 00:00:40.720 GMT:
2020.05.30 00:00:40.727 GMT: -no_submit given, not submitting DAG to HTCondor. You can do this with:
2020.05.30 00:00:40.738 GMT: -----------------------------------------------------------------------
2020.05.30 00:00:42.582 GMT: Your database is compatible with Pegasus version: 4.9.3dev
2020.05.30 00:00:42.691 GMT: Submitting to condor freesurfer-0.dag.condor.sub
2020.05.30 00:01:03.444 GMT: Submitting job(s)
2020.05.30 00:01:03.451 GMT: ERROR: store_cred failed!
2020.05.30 00:01:03.509 GMT: [ERROR] ERROR: Running condor_submit /usr/local/bin/condor_submit freesurfer-0.dag.condor.sub failed with exit code 1 at /usr/bin/pegasus-run line 327.
2020.05.30 00:01:03.515 GMT: [FATAL ERROR]
[1] java.lang.RuntimeException: Unable to submit the workflow using pegasus-run at edu.isi.pegasus.planner.client.CPlanner.executeCommand(CPlanner.java:697)
I'd like to request that a FreeSurfer 7.1.0 container be made available.
ls: cannot access /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-freesurfer:7.1.0/: No such file or directory
Error: unable to access /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-freesurfer:7.1.0
Also, the FreeSurfer 7.0.0 container should be removed, as it was recalled by the developer.