
atac-seq-pipeline's Introduction

ENCODE ATAC-seq pipeline

Introduction

This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq and DNase-seq data. It can be run on compute clusters with job submission engines as well as on standalone machines, and it inherently makes use of parallelized/distributed computing. Installation is also easy, as most dependencies are installed automatically. The pipeline can be run end-to-end, starting from raw FASTQ files all the way to peak calling and signal track generation, using a single caper submit command. One can also start the pipeline from intermediate stages (for example, using alignment files as input). The pipeline supports both single-end and paired-end data as well as replicated or non-replicated datasets. The outputs produced by the pipeline include 1) formatted HTML reports that include quality control measures specifically designed for ATAC-seq and DNase-seq data, 2) analysis of reproducibility, 3) stringent and relaxed thresholding of peaks, and 4) fold-enrichment and p-value signal tracks. The pipeline also supports detailed error reporting and allows for easy resumption of interrupted runs. It has been tested on some human, mouse and yeast ATAC-seq datasets as well as on human and mouse DNase-seq datasets.

The ATAC-seq pipeline protocol specification is here. Some parts of the ATAC-seq pipeline were developed in collaboration with Jason Buenrostro, Alicia Schep and Will Greenleaf at Stanford.

Issues with PE Fastqs downloaded from SRA

Read names in PE FASTQs should be consistent across the two files of a pair. Do not use --readids in fastq-dump, so that reads in a pair keep the same read name. Inconsistent read names (for example, READNAME.1 in FQ1 and READNAME.2 in FQ2) will result in an empty-BAM error in the filter step.
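
As a minimal sketch (the accession below is a placeholder, and exact options depend on your SRA toolkit version), dumping a paired-end run without --readids looks like this:

# dump a paired-end SRA run into two FASTQ files with matching read names
# SRRXXXXXXX is a placeholder accession
$ fastq-dump --split-files --gzip SRRXXXXXXX

# do NOT add --readids; it appends .1/.2 to read names and breaks mate pairing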

Features

  • Portability: The pipeline run can be performed across different cloud platforms such as Google, AWS and DNAnexus, as well as on cluster engines such as SLURM, SGE and PBS.
  • User-friendly HTML report: In addition to the standard outputs, the pipeline generates an HTML report that consists of a tabular representation of quality metrics, including alignment/peak statistics and FRiP, along with many useful plots (IDR/TSS enrichment). An example of the HTML report. The JSON file used in generating this report.
  • Supported genomes: The pipeline needs genome-specific data such as aligner indices, a chromosome sizes file and a blacklist. We provide a genome database downloader/builder for hg38, hg19, mm10 and mm9. You can also use this builder to build a genome database from a FASTA file for your custom genome (see the sketch below).
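
A hedged sketch of how the downloader/builder is typically invoked; the script names and arguments below are assumptions based on the pipeline's scripts directory and may differ between pipeline versions, so check the genome database documentation in the repo first:

# download a pre-built genome database (e.g. hg38) into a destination directory
# (script names/arguments are assumptions; verify against the repo's docs)
$ bash scripts/download_genome_data.sh hg38 /path/to/genome_data

# or build a database from your own FASTA for a custom genome
$ bash scripts/build_genome_data.sh [GENOME_NAME] /path/to/genome_data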

Installation

  1. Install Caper (Python Wrapper/CLI for Cromwell).

    $ pip install caper
  2. IMPORTANT: Read Caper's README carefully to choose a backend for your system. Follow the instructions in the configuration file.

    # backend: local or your HPC type (e.g. slurm, sge, pbs, lsf). read Caper's README carefully.
    $ caper init [YOUR_BACKEND]
    
    # IMPORTANT: edit the conf file and follow commented instructions in there
    $ vi ~/.caper/default.conf
  3. Git clone this pipeline.

    $ cd
    $ git clone https://github.com/ENCODE-DCC/atac-seq-pipeline
    $ cd atac-seq-pipeline
  4. Define test input JSON.

    INPUT_JSON="https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json"
  5. If you have Docker and want to run the pipeline locally on your laptop: --max-concurrent-tasks 1 limits the number of concurrent tasks so that the pipeline can be test-run on a laptop. Remove that option if you run it on a workstation/HPC.

    # check if Docker works on your machine
    $ docker run ubuntu:latest echo hello
    
    # --max-concurrent-tasks 1 is for computers with limited resources
    $ caper run atac.wdl -i "${INPUT_JSON}" --docker --max-concurrent-tasks 1
  6. Otherwise, install Singularity on your system. Please follow these instructions to install Singularity on a Debian-based OS, or ask your system administrator to install Singularity on your HPC.

    # check if Singularity works on your machine
    $ singularity exec docker://ubuntu:latest echo hello
    
    # on your local machine (--max-concurrent-tasks 1 is for computers with limited resources)
    $ caper run atac.wdl -i "${INPUT_JSON}" --singularity --max-concurrent-tasks 1
    
    # on HPC, make sure that Caper's conf ~/.caper/default.conf is correctly configured to work with your HPC
    # the following command will submit Caper as a leader job to SLURM with Singularity
    $ caper hpc submit atac.wdl -i "${INPUT_JSON}" --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
    
    # check job ID and status of your leader jobs
    $ caper hpc list
    
    # cancel the leader node to close all of its children jobs
    # If you directly use cluster command like scancel or qdel then
    # child jobs will not be terminated
    $ caper hpc abort [JOB_ID]
  7. (Optional Conda method) WE DO NOT HELP USERS FIX CONDA DEPENDENCY ISSUES. IF CONDA METHOD FAILS THEN PLEASE USE SINGULARITY METHOD INSTEAD. DO NOT USE A SHARED CONDA. INSTALL YOUR OWN MINICONDA3 AND USE IT.

    # check if you are not using a shared conda, if so then delete it or remove it from your PATH
    $ which conda
    
    # uninstall pipeline's old environments
    $ bash scripts/uninstall_conda_env.sh
    
    # install new envs, you need to run this for every pipeline version update.
    # it may be killed if you run this command line on a login node on HPC.
    # it's recommended to make an interactive node with enough resources and run it there.
    $ bash scripts/install_conda_env.sh
    
    # if installation fails please use Singularity method instead.
    
    # on your local machine (--max-concurrent-tasks 1 is for computers with limited resources)
    $ caper run atac.wdl -i "${INPUT_JSON}" --conda --max-concurrent-tasks 1
    
    # on HPC, make sure that Caper's conf ~/.caper/default.conf is correctly configured to work with your HPC
    # the following command will submit Caper as a leader job to SLURM with Conda
    $ caper hpc submit atac.wdl -i "${INPUT_JSON}" --conda --leader-job-name ANY_GOOD_LEADER_JOB_NAME
    
    # check job ID and status of your leader jobs
    $ caper hpc list
    
    # cancel the leader node to close all of its children jobs
    # If you directly use cluster command like scancel or qdel then
    # child jobs will not be terminated
    $ caper hpc abort [JOB_ID]

Input JSON file specification

IMPORTANT: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE. ESPECIALLY FOR AUTODETECTING/DEFINING ADAPTERS.

An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and to the raw FASTQ files. Please make sure to specify absolute paths rather than relative paths in your input JSON files. A minimal illustrative example is sketched after the following links.

  1. Input JSON file specification (short)
  2. Input JSON file specification (long)
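
The following is a minimal illustrative sketch for one paired-end replicate, not a template to copy blindly. The key names follow the examples appearing later on this page; the exact set of required keys (in particular for adapter auto-detection/definition) depends on your pipeline version, so consult the specifications above. Paths are placeholders.

{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/absolute/path/to/genome/hg38.tsv",
    "atac.fastqs_rep1_R1" : ["/absolute/path/to/rep1_R1.fastq.gz"],
    "atac.fastqs_rep1_R2" : ["/absolute/path/to/rep1_R2.fastq.gz"],
    "atac.paired_end" : true
}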

Running and sharing on Truwl

You can run this pipeline on truwl.com. This provides a web interface that allows you to define inputs and parameters, run the job on GCP, and monitor progress. To run it you will need to create an account on the platform then request early access by emailing [email protected] to get the right permissions. You can see the example case from this repo at https://truwl.com/workflows/instance/WF_e85df4.f10.8880/command. The example job (or other jobs) can be forked to pre-populate the inputs for your own job.

If you do not run the pipeline on Truwl, you can still share your use-case/job on the platform by getting in touch at [email protected] and providing your inputs.json file.

Running on Terra/Anvil (using Dockstore)

Visit our pipeline repo on Dockstore. Click on Terra or Anvil. Follow Terra's instruction to create a workspace on Terra and add Terra's billing bot to your Google Cloud account.

Download this test input JSON for Terra and upload it to Terra's UI and then run analysis.

If you want to use your own input JSON file, then make sure that all files in the input JSON are on a Google Cloud Storage bucket (gs://). URLs will not work.

Running on DNAnexus (using Dockstore)

Sign up for a new account on DNAnexus and create a new project on either AWS or Azure. Visit our pipeline repo on Dockstore. Click on DNAnexus. Choose a destination directory on your DNAnexus project. Click on Submit and visit DNAnexus. This will submit a conversion job; you can check its status under Monitor in the DNAnexus UI.

Once conversion is done download one of the following input JSON files according to your chosen platform (AWS or Azure) for your DNAnexus project:

You cannot use these input JSON files directly. Go to the destination directory on DNAnexus and click on the converted workflow atac. You will see input file boxes on the left-hand side of the task graph. Expand them and define FASTQs (fastq_repX_R1, and also fastq_repX_R2 if it's paired-end) and genome_tsv as in the downloaded input JSON file. Click on the common task box and define other non-file pipeline parameters, e.g. auto_detect_adapters and paired_end.

We have a separate project on DNAnexus to provide example FASTQs and genome_tsv for hg38 and mm10. We recommend making copies of these directories in your own project.

genome_tsv

Example FASTQs

Running on DNAnexus (using our pre-built workflows)

See this for details.

How to organize outputs

Install Croo. You can skip this installation if you have installed the pipeline's Conda environment and activated it. Make sure that you have python3 (> 3.4.1) installed on your system. Find metadata.json in Caper's output directory.

$ pip install croo
$ croo [METADATA_JSON_FILE]

How to make a spreadsheet of QC metrics

Install qc2tsv. Make sure that you have python3 (> 3.4.1) installed on your system.

Once you have organized the output with Croo, you will be able to find the pipeline's final output file qc/qc.json, which has all QC metrics in it. Simply feed qc2tsv with multiple qc.json files. It can take various URIs such as local paths, gs:// and s3://.

$ pip install qc2tsv
$ qc2tsv /sample1/qc.json gs://sample2/qc.json s3://sample3/qc.json ... > spreadsheet.tsv

QC metrics for each experiment (qc.json) will be split into multiple rows (1 for overall experiment + 1 for each bio replicate) in a spreadsheet.

atac-seq-pipeline's People

Contributors

akundaje, ammawla, annashcherbina, karl616, leepc12, ottojolanki, strattan, vervacity


atac-seq-pipeline's Issues

Error in pipeline

Describe the bug
Hi. I'm trying to run the atac-seq-pipeline; the old (deprecated) version works,
but this new version produces some errors when running the pipeline (encode_trim_adapter.py).

[error] WorkflowManagerActor Workflow 5a715660-fc5d-425f-9b4a-8b5b94ce81e8 failed (during ExecutingWorkflowState): Job atac.trim_adapter:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.

I attached the stderr log, and I'm not sure my input.json is correct.
I tried to use replicate 2 of this experiment (https://www.encodeproject.org/experiments/ENCSR245LNF/).

input.json
{

"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/home/dmb/hoyongLee/atac-seq-pipeline/hg38.tsv",

"atac.fastqs" : [[
           [
             "/home/dmb/hoyongLee/data/ENCFF154KNN.fastq",
             "/home/dmb/hoyongLee/data/ENCFF829SBE.fastq"
           ],
           [
            "/home/dmb/hoyongLee/data/ENCFF565CVN.fastq",
            "/home/dmb/hoyongLee/data/ENCFF092IJN.fastq"
           ],
           [
            "/home/dmb/hoyongLee/data/ENCFF351LRE.fastq",
            "/home/dmb/hoyongLee/data/ENCFF803XRX.fastq"
           ],
           [
            "/home/dmb/hoyongLee/data/ENCFF709FLO.fastq",
            "/home/dmb/hoyongLee/data/ENCFF396BSZ.fastq"
           ],
           [
            "/home/dmb/hoyongLee/data/ENCFF247VZG.fastq",
            "/home/dmb/hoyongLee/data/ENCFF738DHW.fastq"
           ]
]],

"atac.paired_end" : true,
"atac.multimapping" : 4,

"atac.trim_adapter.auto_detect_adapter" : true,

"atac.bowtie2.cpu" : 4,
"atac.bowtie2.mem_mb" : 16000,

"atac.filter.cpu" : 4,
"atac.filter.mem_mb" : 12000,

"atac.macs2_mem_mb" : 16000,

"atac.smooth_win" : 73,
"atac.enable_idr" : true,
"atac.idr_thresh" : 0.05,

"atac.qc_report.name" : "ENCSR245LNF",
"atac.qc_report.desc" : "ATAC-seq on Mus musculus C57BL/6 frontal cortex adult"

}

OS/Platform and dependencies

  • OS or Platform: Ubuntu 16.04, AWS instance
  • Cromwell/dxWDL version: cromwell-33.1
  • Conda version: 4.5.4

Attach logs

File "/home/dmb/hoyongLee/atac-seq-pipeline/cromwell-executions/atac/e765d0d0-ffee-4c81-b679-1d2cccce874c/call-trim_adapter/shard-0/execution/write_tsv_55ca5bd3e4730afa22126a0109b4636d.tmp", line 1
2 /home/dmb/hoyongLee/atac-seq-pipeline/cromwell-executions/atac/e765d0d0-ffee-4c81-b679-1d2cccce874c/call-trim_adapter/shard-0/inputs/-1676945734/ENCFF154KNN.fastq /home/dmb/hoyongLee/atac-seq-pipeline/ cromwell-executions/atac/e765d0d0-ffee-4c81-b679-1d2cccce874c/call-trim_adapter/shard-0/inputs/-1676945734/ENCFF829SBE.fastq
3 ^
4 SyntaxError: invalid syntax

Error during the end of pipeline at call-reproducibility_overlap step, using real data.

Hi, first things first, thanks for all the help and for this beautiful pipeline.

I was able to install it and the test ran smoothly. Now I tried to run my real samples and it broke at the very end. I can't explain why. Can you help me understand what the issue is? I am attaching all the logs of the run.

cat `./call-reproducibility_overlap/execution/stderr`

 Traceback (most recent call last):
  File "/ru-auth/local/home/trezende/miniconda3/envs/encode-atac-seq-pipeline/bin/encode_reproducibility_qc.py", line 154, in <module>
    main()
  File "/ru-auth/local/home/trezende/miniconda3/envs/encode-atac-seq-pipeline/bin/encode_reproducibility_qc.py", line 50, in main
    args = parse_arguments()
  File "/ru-auth/local/home/trezende/miniconda3/envs/encode-atac-seq-pipeline/bin/encode_reproducibility_qc.py", line 42, in parse_arguments
    'Invalid number of peak files or --peak-pr.')
argparse.ArgumentTypeError: Invalid number of peak files or --peak-pr.

So, I built the mm10 mouse database successfully and tried the pipeline.

Is there any way to fix/test it without having to start from the beginning? Can the pipeline skip the steps that already finished successfully?
debug_34.tar.gz

ataqc module still fails

The ataqc module still will not finish running without an error. It looks like a similar issue to #66 , perhaps related to non-conventional chromosome names.

Picked up _JAVA_OPTIONS: -Xms256M -Xmx16000M -XX:ParallelGCThreads=1
Traceback (most recent call last):
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 444, in <module>
    ataqc()
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 231, in ataqc
    read_len)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/run_ataqc.py", line 438, in make_tss_plot
    processes=processes, stranded=True)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/site-packages/metaseq/_genomic_signal.py", line 122, in array
    chunksize=chunksize, **kwargs)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/site-packages/metaseq/array_helpers.py", line 383, in _array_parallel
    itertools.repeat(kwargs)))
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
ValueError: invalid reference `chr1`

OS/Platform and dependencies

  • OS or Platform: Ubuntu 16.04.4 (durga)
  • Cromwell/dxWDL version: cromwell 34-unknown-SNAP
  • Conda version: conda 4.5.11

debug_ataqc.tar.gz

TSS enrichment?

Are TSS fold enrichment or FRiP for enhancers reported by this pipeline? An older version of the pipeline included those stats in the HTML report, but I cannot find them in these newer HTML reports (human or custom reference). I was able to find the TSS enrichment plots (distance from TSS vs average read coverage) but not a specific TSS fold enrichment value.

OS/Platform and dependencies

  • Platform: Ubuntu 16.04.4
  • Cromwell: cromwell-34
  • Conda version: conda 4.5.11

Google changed a domain for hosting files for tutorial test samples and genome data

a question about reads shifting

Thank you for the construction of the pipeline and all the hard work.

I'd like to confirm one issue: do those BAM files have the reads shifted? It seems many groups shift reads by +4/-5 bp, because of the insertion offset of the Tn5 transposase, before calling peaks with MACS2 or doing footprint analysis.

Do I need to shift reads in the bam files or re-do peak calling with the shifted bam?

Thank you very much!
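
For context, the file names that appear in the error logs elsewhere on this page (e.g. *.nodup.tn5.tagAlign.gz) suggest the Tn5 shift is applied to the tagAlign files used for peak calling, while the BAMs themselves are typically left unshifted. As a hedged illustration only (file names are placeholders; this is not the pipeline's own script), the conventional +4/-5 bp shift on a BED6-style tagAlign looks like this:

# illustrative only: shift + strand starts by +4 bp and - strand ends by -5 bp
# on a BED6/tagAlign file (placeholder file names)
$ zcat rep1.tagAlign.gz \
    | awk 'BEGIN{OFS="\t"} {if ($6=="+") {$2=$2+4} else if ($6=="-") {$3=$3-5} print $0}' \
    | gzip -nc > rep1.tn5.tagAlign.gz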

Error running test data with CONDA through virtualbox

OS/Platform and dependencies

  • OS or Platform: macOS High Sierra running a Linux VirtualBox VM (4.15.0-23-generic, Ubuntu 64-bit)
  • Conda version: 4.5.11

I get an error while running the test data on the pipeline that says that it failed to find index Success(WomInteger(0)) on an array. The error is shown below:

WorkflowManagerActor Workflow 7a43f040-d436-4dbc-9596-8fbbccfa2827 failed (during ExecutingWorkflowState): cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
Bad output 'filter.flagstat_qc': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'filter.dup_qc': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'filter.pbc_qc': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'filter.mito_dup_log': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
	at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$handleExecutionSuccess$1(StandardAsyncExecutionActor.scala:839)
	at scala.util.Success.$anonfun$map$1(Try.scala:251)
	at scala.util.Success.map(Try.scala:209)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:288)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
	at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
	at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
	at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I have attached the tar ball.
I have also captured my session using script which is attached as the typescript.

I followed the instructions posted on GitHub for the Tutorial for general UNIX computers without Docker. I'm not sure what this error message means and was hoping that you had some advice to fix this. Thanks!

typescript.txt
debug_61.tar.gz

Resume function in the pipeline

General Question

Hi Jin,

I am wondering whether there is any resume function in the script, as I could not find anything close to it. The idea is that if there is a breakdown at a certain task in the pipeline, the next time I run the pipeline it should not start from the beginning but from the task that broke down.

OS/Platform and dependencies

  • OS or Platform: CentOS Linux release 7.4.1708
  • Cromwell/dxWDL version: Cromwell-34
  • Conda version: 4.5.10

Processed bed files from Corces MR et al., 2017 paper

Hi,

I am guessing (please confirm) that the Corces MR et al., 2017 paper used this pipeline for their ATAC-seq data processing. I am wondering if they re-processed previously published data (GM12878 and CD4+) from the Buenrostro JD et al., 2013 paper using this pipeline. Please let me know if it has been deposited under another accession number.

Again, it will save much of my time and compute power if I can find out where the processed bed/bigBed files for the Corces MR et al., 2017 paper are deposited.

Your help is very much appreciated!

Sincerely,
Satya

Bug in call-ataqc module

The call-ataqc module is failing with the following python variable assignment error:

Traceback (most recent call last):
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 355, in <module>
    ataqc()
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 122, in ataqc
    chr_m_reads, fraction_chr_m = get_chr_m(COORDSORT_BAM)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/run_ataqc.py", line 215, in get_chr_m
    fract_chr_m = float(chr_m_reads) / tot_reads
UnboundLocalError: local variable 'chr_m_reads' referenced before assignment

OS/Platform and dependencies

  • OS or Platform: Ubuntu 16.04.4 (durga)
  • Cromwell/dxWDL version: cromwell 34-unknown-SNAP
  • Conda version: conda 4.5.11

I'm happy to provide error logs if needed, but this seems pretty cut and dry.

Can't complete test run

ENCSR356KRQ_subsampled_issue52.json.zip
debug_issue52.tar.gz
testRun_env_issue52.txt
testError_issue52.txt

Can't complete a test run on a local installation. Although I know that singularity or docker is preferred, that is not possible on this machine.

In addition to the information below I've attached error logs, my input .json file and my environment (after activating encode-atac-seq-pipeline).

$ uname -a
Linux node061.hpc.local 2.6.32-504.el6.x86_64 #1 SMP Tue Sep 16 01:56:35 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
$ conda -V
conda 4.5.11

cromwell-34

Technical vs. Biological Replicates

Hey

Not really an issue per se, if this is not the right place I'll move it to the googlegroup for klab genomic pipelines discussion.

I am not 100% sure how technical replicates fit into the pipeline...

So next to the naive overlapping peaks we can additionally filter peaks for meeting specific IDR criteria: the way I see it, this seems like a pretty stringent way to address biological replicates. Whether I will use it or not will probably depend on how much biological variation I want to address in my downstream integrative analyses. For example, for patient-derived data (one individual per replicate) I would probably use the entire naive_overlap peakset.

Technical replicates, on the other hand, I usually address by simply subsetting my peaksets for shared peaks... which does not seem to be a straightforward possibility within this pipeline, but maybe I've overlooked something. And merging technical replicates into a single fastq file somehow defeats their purpose in the first place.

Could you tell me how you specifically address the difference between technical and biological replicates?

Keep up the good work, awesome pipeline!
Chris

hg19 REG2MAP_bed

I got an error at the call-ataqc step. I think the problem is caused by a missing reg2map_bed file. I would like to know where I can download the hg19 reg2map_bed file. Thanks!

Error during ataqc step

kundajelab/atac_dnase_pipelines#136 reported by @Chokaro

First of all, thanks for the hard work. Deploying your pipeline via docker was rather easy, even for a bioinformatics amateur like me.

Sadly it didn't go smoothly all the way to the end. I am using Cromwell and atac.wdl to access your Docker container, and during the ataqc step I get the following error (I get this error for the R1 files of both PE replicates):

Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_ataqc.py", line 355, in <module>
    ataqc()
  File "/software/atac-seq-pipeline/src/encode_ataqc.py", line 213, in ataqc
    ROADMAP_META, OUTPUT_PREFIX)
  File "/software/atac-seq-pipeline/src/run_ataqc.py", line 948, in compare_to_roadmap
    sample_data = pd.read_table(out_file, header=None)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 402, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
IOError: File LRSC1_CD34_50k_R1.trim.merged.signal does not exist

My input .json looks like this; the fastq.gz files are stored locally on a different HDD:

{
"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/media/chokaro/2TB_Storage_2/genome/local/hg19_local.tsv",
"atac.fastqs" : [
[
["/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_10k_R1.fastq.gz",
"/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_10k_R2.fastq.gz"]
],
[
["/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_50k_R1.fastq.gz",
"/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_50k_R2.fastq.gz"]
]
],

"atac.paired_end" : true,
"atac.multimapping" : 4,

"atac.trim_adapter.auto_detect_adapter" : true,

"atac.bowtie2.cpu" : 6,
"atac.bowtie2.mem_mb" : 16000,
"atac.bowtie2.time_hr" : 36,

"atac.filter.cpu" : 2,
"atac.filter.mem_mb" : 12000,
"atac.filter.time_hr" : 23,

"atac.macs2_mem_mb" : 16000,

"atac.smooth_win" : 73,
"atac.enable_idr" : true,
"atac.idr_thresh" : 0.05,

"atac.qc_report.name" : "test1",
"atac.qc_report.desc" : "test1 on CD34 omni ATAC"
}

And finally my OS and system config are the following:

OS: Ubuntu Xenial 16.04
cromwell 34
conda 4.5.11
Docker version 18.06.1-ce, build e68fc7a

Hoping you guys have some advice for this... In any case many thanks in advance!

best
Chris

conda requirements_py3.txt has insufficient requirements

conda/requirements_py3.txt is missing requirements for running this pipeline.

For example, running the pipeline with python3 fails because cutadapt is not installed. This requirement is missing from requirements_py3.txt.
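
A hedged workaround sketch, assuming the environment name used elsewhere on this page and that cutadapt is available from the bioconda channel:

# install the missing dependency into the pipeline's conda environment
$ conda install -n encode-atac-seq-pipeline -c bioconda cutadapt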

Multiple samples with replicates

Hello,

How can we configure/define the fastq arrays for multiple samples with technical replicates? As far as I can understand, we can only define a single sample with multiple replicates if we use 1-dimensional arrays. Is this true?
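
For illustration, a hedged sketch of the per-replicate array layout using the flat keys shown elsewhere on this page (paths are placeholders): each fastqs_repX_R1/_R2 array belongs to one replicate, multiple FASTQs inside one array are merged as technical replicates/lanes, and a separate biological sample/experiment is normally run with its own input JSON.

{
    "atac.fastqs_rep1_R1" : ["/path/to/rep1_lane1_R1.fastq.gz", "/path/to/rep1_lane2_R1.fastq.gz"],
    "atac.fastqs_rep1_R2" : ["/path/to/rep1_lane1_R2.fastq.gz", "/path/to/rep1_lane2_R2.fastq.gz"],
    "atac.fastqs_rep2_R1" : ["/path/to/rep2_lane1_R1.fastq.gz"],
    "atac.fastqs_rep2_R2" : ["/path/to/rep2_lane1_R2.fastq.gz"],
    "atac.paired_end" : true
}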

Error during local conda pipeline execution

Hi,

I was testing the local pipeline with Conda, and it threw an error. I can't tell what the problem is from this error message. Can you tell me what I could possibly be doing wrong?

This is the command I used:

java -jar -Dconfig.file=backends/backend.conf cromwell-34.jar run atac.wdl -i examples/local/ENCSR356KRQ_subsampled.json | tee -a output.txt

Thanks!

[2018-09-20 16:10:58,17] [info] BackgroundConfigAsyncJobExecutionActor [142a7da3atac.pool_ta:NA:1]: Status change from WaitingForReturnCodeFile to Done
[2018-09-20 16:15:14,03] [info] BackgroundConfigAsyncJobExecutionActor [142a7da3atac.macs2:0:1]: Status change from WaitingForReturnCodeFile to Done
[2018-09-20 16:15:46,95] [info] BackgroundConfigAsyncJobExecutionActor [142a7da3atac.macs2:1:1]: Status change from WaitingForReturnCodeFile to Done
[2018-09-20 16:15:47,22] [error] WorkflowManagerActor Workflow 142a7da3-bbb8-4762-9e12-e11886eb6c0c failed (during ExecutingWorkflowState): Job atac.xcor:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/trezende/atac-seq-pipeline/cromwell-executions/atac/142a7da3-bbb8-4762-9e12-e11886eb6c0c/call-xcor/shard-0/execution/stderr.
 Traceback (most recent call last):
  File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 102, in <module>
    main()
  File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 91, in main
    ta_subsampled, args.speak, args.nth, args.out_dir)
  File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 53, in xcor
    run_shell_cmd(cmd1)
  File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_common.py", line 230, in run_shell_cmd
    os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process

Job atac.xcor:1:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/trezende/atac-seq-pipeline/cromwell-executions/atac/142a7da3-bbb8-4762-9e12-e11886eb6c0c/call-xcor/shard-1/execution/stderr.
 Traceback (most recent call last):
  File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 102, in <module>
    main()
  File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 91, in main
    ta_subsampled, args.speak, args.nth, args.out_dir)
  File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 53, in xcor
    run_shell_cmd(cmd1)
  File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_common.py", line 230, in run_shell_cmd
    os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process

[2018-09-20 16:15:47,22] [info] WorkflowManagerActor WorkflowActor-142a7da3-bbb8-4762-9e12-e11886eb6c0c is in a terminal state: WorkflowFailedState
[2018-09-20 16:15:59,19] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-09-20 16:16:03,74] [info] Workflow polling stopped

pipeline fails at call-filter step

I am experiencing an issue with the call-filter step. The pipeline fails with the following stderr:

Traceback (most recent call last):
  File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_filter.py", line 392, in <module>
    main()
  File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_filter.py", line 319, in main
    filt_bam, args.out_dir)
  File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_filter.py", line 176, in mark_dup_picard
    run_shell_cmd(cmd)
  File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
    os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
ln: failed to access '*.flagstat.qc': No such file or directory
ln: failed to access '*.dup.qc': No such file or directory
mkdir: cannot create directory '/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/execution/glob-37a6259cc0c1dae299a7866489dff0bd': File exists
ln: failed to create hard link '/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/execution/glob-37a6259cc0c1dae299a7866489dff0bd/null': File exists
ln: failed to access '*.pbc.qc': No such file or directory
mkdir: cannot create directory '/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/execution/glob-37a6259cc0c1dae299a7866489dff0bd': File exists
ln: failed to create hard link '/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/execution/glob-37a6259cc0c1dae299a7866489dff0bd/null': File exists
ln: failed to access '*.mito_dup.txt': No such file or directory

And stdout:
[2018-07-10 20:47:57,293 INFO] ['/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_filter.py', '/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/inputs/-4793129/SCS47_merge_R1.trim.merged.bam', '--paired-end', '--multimapping', '4', '--dup-marker', 'picard', '--mapq-thresh', '30']
[2018-07-10 20:47:57,294 INFO] Initializing and making output directory...
[2018-07-10 20:47:57,294 INFO] Removing unmapped/low-quality reads...
[2018-07-10 20:47:57,299 INFO] run_shell_cmd: PID=128254, CMD=samtools view -F 524 -f 2 -u /labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/inputs/-4793129/SCS47_merge_R1.trim.merged.bam | sambamba sort -n /dev/stdin -o SCS47_merge_R1.trim.merged.tmp_filt.bam -t 1
[2018-07-10 21:28:04,936 INFO] run_shell_cmd: PID=132462, CMD=samtools view -h SCS47_merge_R1.trim.merged.tmp_filt.bam -@ 1 | $(which assign_multimappers.py) -k 4 --paired-end | samtools fixmate -r /dev/stdin SCS47_merge_R1.trim.merged.fixmate.bam
[2018-07-10 21:41:49,466 INFO] run_shell_cmd: PID=133516, CMD=rm -f SCS47_merge_R1.trim.merged.tmp_filt.bam
[2018-07-10 21:41:50,121 INFO] run_shell_cmd: PID=133518, CMD=samtools view -F 1804 -f 2 -u SCS47_merge_R1.trim.merged.fixmate.bam | sambamba sort /dev/stdin -o SCS47_merge_R1.trim.merged.filt.bam -t 1
[2018-07-10 22:00:35,046 INFO] run_shell_cmd: PID=135008, CMD=rm -f SCS47_merge_R1.trim.merged.fixmate.bam
[2018-07-10 22:00:35,366 INFO] Marking dupes with picard...
[2018-07-10 22:00:35,371 INFO] run_shell_cmd: PID=135010, CMD=java -Xmx4G -jar $(which picard.jar) MarkDuplicates INPUT=SCS47_merge_R1.trim.merged.filt.bam OUTPUT=SCS47_merge_R1.trim.merged.dupmark.bam METRICS_FILE=SCS47_merge_R1.trim.merged.dup.qc VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true REMOVE_DUPLICATES=false
PID=135010: which: no picard.jar in (/home/mdegorte/miniconda3/envs/encode-atac-seq-pipeline/bin:/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src:/home/mdegorte/miniconda3/bin:/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src:/home/mdegorte/miniconda3/bin:/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src:/home/mdegorte/miniconda3/bin:/scg/slurm/current/bin:/scg/slurm/current/sbin:/scg/slurm/utils:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/mdegorte/.bds:/home/mdegorte/.globus-cli-virtualenv/bin:/home/mdegorte/bin:/home/mdegorte/.bds:/home/mdegorte/.globus-cli-virtualenv/bin:/home/mdegorte/.bds:/home/mdegorte/.globus-cli-virtualenv/bin:/home/mdegorte/bin)
PID=135010: Error: Unable to access jarfile MarkDuplicates
[2018-07-10 22:00:35,399 ERROR] Unknown exception caught. Killing process group 135010...
Traceback (most recent call last):
  File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_common.py", line 224, in run_shell_cmd
    p.returncode, cmd)
CalledProcessError: Command 'java -Xmx4G -jar $(which picard.jar) MarkDuplicates INPUT=SCS47_merge_R1.trim.merged.filt.bam OUTPUT=SCS47_merge_R1.trim.merged.dupmark.bam METRICS_FILE=SCS47_merge_R1.trim.merged.dup.qc VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true REMOVE_DUPLICATES=false' returned non-zero exit status 1

It looks like an issue with picard. Any help would be appreciated. Thanks!

directory question

Hi Jin,

Thank you for your help so far! I've run into a little issue that may be easy to fix or just not an option. So far I've been successful in running the pipeline on the test data. I noticed that I have to run it from within the atac-seq-pipeline directory and the input data seems to need to be contained in that directory or a subdirectory as well. I was told by our cluster admin that I should install the pipeline on my $HOME directory but store my large input fastq files and submit my sbatch jobs on my $WORK directory. I've tried a few different things, but I always get errors about the pipeline not finding a directory because it has appended the current working directory to the path I gave it in the json.

For example, my input data is currently in work/eaclark/fastq. The pipeline is in home/eaclark/atac-seq-pipeline. If I submit the job from within atac-seq-pipeline I get an error because it tried to find the input file in "/home/eaclark/atac-seq-pipeline/work/eaclark/fastq/sample.fastq.gz". So, it took the path I gave in the json, which was "work/eaclark/fastq/sample.fastq.gz", and added the current working directory "/home/eaclark/atac-seq-pipeline" in front, suggesting to me that it will always look for files in the current directory or a subdirectory. I tried it the other way around, i.e. submitting the job from $WORK, but then I get an error that it can't find pipeline files.

Is there a way to run the pipeline the way my cluster admin wants? Or is that not an option, and I'll either have to move the input files to home/eaclark/atac-seq-pipeline, or install the pipeline on $WORK.

I hope my question makes sense.

Thanks!
Erin

ATAC pipeline run on slurm report error

Hi, thanks for your wonderful work.
I run in /mypath/atac-seq-pipeline/
and source activate encode-atac-seq-pipeline

java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /my_path/local/bin/cromwell-34.jar run atac.wdl -i /my_path1/input.json -o /my_path2/atac-seq-pipeline/workflow_opts/slurm.json

But just one directory named "cromwell-workflow-logs" is left, with nothing in it:
Jenkinsfile LICENSE README.md atac.wdl backends conda cromwell-workflow-logs docker_image docs examples genome src test workflow_opts
What's more, when it was running, it showed the following on the screen:

[2018-09-08 09:23:52,43] [info] Running with database db.url = jdbc:hsqldb:mem:a42fb754-58fc-418e-8224-01cd57b5b131;shutdown=false;hsqldb.tx=mvcc
[2018-09-08 09:24:01,66] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-09-08 09:24:01,67] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-09-08 09:24:01,78] [info] Running with database db.url = jdbc:hsqldb:mem:8c25714f-6a58-4b03-bf8d-b686ee8442fc;shutdown=false;hsqldb.tx=mvcc
[2018-09-08 09:24:02,13] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-08 09:24:02,16] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-09-08 09:24:02,16] [info] Using noop to send events.
[2018-09-08 09:24:02,44] [info] Slf4jLogger started
[2018-09-08 09:24:02,66] [info] Workflow heartbeat configuration:
{
  "cromwellId" : "cromid-d9e2d67",
  "heartbeatInterval" : "2 minutes",
  "ttl" : "10 minutes",
  "writeBatchSize" : 10000,
  "writeThreshold" : 10000
}
[2018-09-08 09:24:02,69] [info] Metadata summary refreshing every 2 seconds.
[2018-09-08 09:24:02,72] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-08 09:24:02,72] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-08 09:24:02,72] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-09-08 09:24:03,69] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-09-08 09:24:03,71] [info] SingleWorkflowRunnerActor: Version 34
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] PAPIQueryManager Running with 3 workers
[2018-09-08 09:24:03,72] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-09-08 09:24:03,77] [info] Unspecified type (Unspecified version) workflow 1e03bf36-d64b-42a7-9857-a644de257de3 submitted
[2018-09-08 09:24:03,82] [info] SingleWorkflowRunnerActor: Workflow submitted 1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,82] [info] 1 new workflows fetched
[2018-09-08 09:24:03,82] [info] WorkflowManagerActor Starting workflow 1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,83] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-09-08 09:24:03,83] [info] WorkflowManagerActor Successfully started WorkflowActor-1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,83] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-09-08 09:24:03,85] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-09-08 09:24:03,89] [info] MaterializeWorkflowDescriptorActor [1e03bf36]: Parsing workflow as WDL draft-2
[2018-09-08 09:24:22,52] [error] WorkflowManagerActor Workflow 1e03bf36-d64b-42a7-9857-a644de257de3 failed (during MaterializingWorkflowDescriptorState): cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anon$1: Workflow input processing failed:
Unexpected character ']' at input index 643 (line 13, position 5), expected JSON Value:
    ],
    ^


        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.cromwell$engine$workflow$lifecycle$materialization$MaterializeWorkflowDescriptorActor$$workflowInitializationFailed(MaterializeWorkflowDescriptorActor.scala:200)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:170)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:165)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
        at akka.actor.FSM.processEvent(FSM.scala:670)
        at akka.actor.FSM.processEvent$(FSM.scala:667)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.akka$actor$LoggingFSM$$super$processEvent(MaterializeWorkflowDescriptorActor.scala:123)
        at akka.actor.LoggingFSM.processEvent(FSM.scala:806)
        at akka.actor.LoggingFSM.processEvent$(FSM.scala:788)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.processEvent(MaterializeWorkflowDescriptorActor.scala:123)
        at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:664)
        at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:658)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.aroundReceive(MaterializeWorkflowDescriptorActor.scala:123)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
        at akka.actor.ActorCell.invoke(ActorCell.scala:557)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


[2018-09-08 09:24:22,52] [info] WorkflowManagerActor WorkflowActor-1e03bf36-d64b-42a7-9857-a644de257de3 is in a terminal state: WorkflowFailedState
[2018-09-08 09:24:25,09] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-09-08 09:24:27,74] [info] Workflow polling stopped
[2018-09-08 09:24:27,76] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2018-09-08 09:24:27,76] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2018-09-08 09:24:27,76] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2018-09-08 09:24:27,77] [info] Aborting all running workflows.
[2018-09-08 09:24:27,77] [info] JobExecutionTokenDispenser stopped
[2018-09-08 09:24:27,77] [info] WorkflowStoreActor stopped
[2018-09-08 09:24:27,78] [info] WorkflowLogCopyRouter stopped
[2018-09-08 09:24:27,78] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2018-09-08 09:24:27,78] [info] WorkflowManagerActor All workflows finished
[2018-09-08 09:24:27,78] [info] WorkflowManagerActor stopped
[2018-09-08 09:24:27,78] [info] Connection pools shut down
[2018-09-08 09:24:27,78] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] SubWorkflowStoreActor stopped
[2018-09-08 09:24:27,79] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] JobStoreActor stopped
[2018-09-08 09:24:27,79] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] CallCacheWriteActor stopped
[2018-09-08 09:24:27,79] [info] KvWriteActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] DockerHashActor stopped
[2018-09-08 09:24:27,79] [info] IoProxy stopped
[2018-09-08 09:24:27,79] [info] ServiceRegistryActor stopped
[2018-09-08 09:24:27,81] [info] Database closed
[2018-09-08 09:24:27,81] [info] Stream materializer shut down
Workflow 1e03bf36-d64b-42a7-9857-a644de257de3 transitioned to state Failed
[2018-09-08 09:24:27,85] [info] Automatic shutdown of the async connection
[2018-09-08 09:24:27,85] [info] Gracefully shutdown sentry threads.
[2018-09-08 09:24:27,85] [info] Shutdown finished.

I followed https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/tutorial_slurm.md, since I should run on my school's SLURM cluster, not my local PC and not Stanford University's SLURM.

Would you have any advice about my two errors?
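
A hedged reading of the second error, not confirmed against the actual input file: "Unexpected character ']' ... expected JSON Value" at line 13 is the message a JSON parser emits when it hits a trailing comma before a closing bracket, for example:

"atac.fastqs" : [[
    ["/path/rep1_R1.fastq.gz", "/path/rep1_R2.fastq.gz"],
]],

Removing the comma after the last inner array (so that "]]" follows it directly) makes such a fragment valid. The paths above are placeholders, not taken from the original post.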

Major error not caught by pipeline

The pipeline does not output an error in call-macs2/shard-0/execution/stderr if the specified blacklist BED file is improperly formatted. It instead runs to completion, which results in empty final peak files without an obvious error.

The problem occurs within the following step in the call-macs2 module (from call-macs2/shard-0/execution/stdout):

[2018-10-28 03:38:52,988 INFO] run_shell_cmd: PID=131102, CMD=bedtools intersect -v -a 20180815-14-Adipose-002-powder_S14_L001_R1_001.trim.merged.nodup.tn5.pval0.01.300K.narrowPeak.tmp1 -b rn6_blacklist.bed.tmp2 | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | grep -P 'chr[\dXY]+[ \t]' | gzip -nc > 20180815-14-Adipose-002-powder_S14_L001_R1_001.trim.merged.nodup.tn5.pval0.01.300K.bfilt.narrowPeak.gz

It is easy to see why an improperly formatted BED file would have caused this step to fail, but it should have output the error to the stderr file (which was empty in call-macs2/shard-0/execution) and terminated the pipeline. Instead, at first glance it appeared that the pipeline finished running without error, and it took a bit of digging to find the problem. A quick sanity check for the blacklist is sketched below.

I don't have the exact error or log files because they were removed by another user after fixing the issue.
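
Until that is fixed, a hedged sanity check one could run on a blacklist before passing it to the pipeline (the file name is a placeholder; the chr-name requirement is inferred from the grep pattern in the stdout above):

# quick sanity check: the blacklist should be a tab-delimited BED with chr-prefixed names
$ zcat -f rn6_blacklist.bed* | head -n 3
$ zcat -f rn6_blacklist.bed* | awk -F'\t' 'NF < 3 { print "line " NR " has fewer than 3 tab-separated fields"; bad=1 } END { exit bad }'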

OS/Platform and dependencies

  • Platform: Ubuntu 16.04.4
  • Cromwell: cromwell-34
  • Conda version: conda 4.5.11

ataqc module fails

After implementing the change in #64, call-ataqc fails with the following error:

Traceback (most recent call last):
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 355, in <module>
    ataqc()
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 186, in ataqc
    read_len)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/run_ataqc.py", line 438, in make_tss_plot
    processes=processes, stranded=True)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/site-packages/metaseq/_genomic_signal.py", line 122, in array
    chunksize=chunksize, **kwargs)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/site-packages/metaseq/array_helpers.py", line 383, in _array_parallel
    itertools.repeat(kwargs)))
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
ValueError: invalid reference `chr1`

This might be related to non-standard chromosome names. I am using the "atac.keep_irregular_chr_in_bfilt_peak" : false option.

OS/Platform and dependencies

  • OS or Platform: Ubuntu 16.04.4 (durga)
  • Cromwell/dxWDL version: cromwell 34-unknown-SNAP
  • Conda version: conda 4.5.11

debug_65.tar.gz

genome data on sherlock

Hi,
Thanks again for making your pipeline available.
I'm trying to run the WDL pipeline on Sherlock, using the 'sherlock.tsv' file for the genome and the shared genome data. It seems the ".tar" files for the bowtie2 and bwa indexes can't be found?
Thanks!

Incompatibility issues with custom reference genome

MACS2 peak calling fails if the FASTA file used to build a custom genome database does not follow the chr[\dXY] naming convention. For example, I am using the Ensembl masked version of the rat genome (rn6, release 94) found here ftp://ftp.ensembl.org/pub/release-94/fasta/rattus_norvegicus/dna/, which does not prepend 'chr' to chromosome names. The error is produced by the following call (a possible workaround is sketched after the error):

[2018-10-31 18:06:20,887 ERROR] Unknown exception caught. Killing process group 72093...
Traceback (most recent call last):
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_common.py", line 224, in run_shell_cmd
    p.returncode, cmd)
CalledProcessError: Command 'cat /mnt/lab_data/montgomery/nicolerg/motrpac/atac/pipeline-output/cromwell-executions/atac/03a28d2a-f364-4ce1-bfd5-f10488cf42a9/call-macs2/shard-0/inputs/-78707573/rn6_masked.chrom.sizes | grep -P 'chr[\dXY]+[ \t]' > 20180725_2_Gastroc_002_powder_S1_L001_R1_001.trim.merged.nodup.tn5.pval0.01.300K.bfilt.chrsz.tmp' returned non-zero exit status 1
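
Since the grep pattern in the log only keeps chr-prefixed chromosome names, one common workaround (a hedged sketch, not an official fix; the FASTA file name is a placeholder and the sed expressions assume Ensembl-style headers such as ">1 dna_sm:chromosome ...") is to prefix the chromosome names before building the custom genome database:

# add a "chr" prefix to Ensembl-style chromosome names (numeric, X, Y) and rename MT to chrM;
# unplaced scaffolds keep their original names and are then filtered out by the pipeline
$ zcat Rattus_norvegicus.dna_sm.toplevel.fa.gz \
    | sed -e 's/^>\([0-9XY][0-9XY]*\) />chr\1 /' -e 's/^>MT />chrM /' \
    | gzip -nc > rn6_chr_prefixed.fa.gz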

OS/Platform and dependencies

  • Platform: Ubuntu 16.04.4
  • Cromwell: cromwell-34
  • Conda version: conda 4.5.11

Simplify conda installation

Hi,

would it be interesting to you to simplify the conda installation procedure?

I would be willing to help. There are a couple of things that could be done:

  1. Add idr, phantompeakqualtools and picard-v2.10.6 to bioconda
  2. See if it is possible to fix graphviz, otherwise enforce the anaconda version already in the environment file

My goal would be to be able to set up the environment directly from an environment file.
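
For what it's worth, these packages are nowadays generally available on bioconda/conda-forge (which may not have been true when this issue was filed), so a hedged sketch of a one-line environment setup could look like the following; the package names and channels are assumptions to verify, and this is not the pipeline's supported installation method:

# illustrative only: create a conda environment with some of the tools discussed above
$ conda create -n atac-tools-sketch -c bioconda -c conda-forge idr phantompeakqualtools picard graphviz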

Incompatibility with masked reference genomes

call-macs2 fails if a masked reference (where Ns are used to indicate repetitive regions) is used to build a custom genome reference database.

run_shell_cmd: PID=70151, CMD=bedtools intersect -a ${SAMPLE}.trim.merged.nodup.tn5.tagAlign.tmp1 -b ${SAMPLE}.trim.merged.nodup.tn5.pval0.01.300K.bfilt.narrowPeak.tmp2 -wa -u | wc -l

Traceback (most recent call last):
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_macs2_atac.py", line 210, in <module>
    main()
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_macs2_atac.py", line 200, in main
    frip_qc = frip( args.ta, bfilt_npeak, args.out_dir)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_frip.py", line 54, in frip
    write_txt(frip_qc, str(float(val1)/float(val2)))
ValueError: could not convert string to float: ***** WARNING: File rat_liver7_S12_L001_R1_001.trim.merged.nodup.tn5.tagAlign.tmp1 has inconsistent naming convention for record:
AABR07024382.1	100568	100639	N	1000	+

If it would be impractical to include compatibility with masked references, it would be helpful to specify that the pipeline is incompatible with masked references in the documentation. I understand that this is the function of the blacklist input, but blacklisted regions can be more difficult to define for less popular model organisms.

Empty Success array causes pipeline to fail

I'm running the ATAC-seq pipeline on a toy dataset with one paired-end read (2 FASTQ files, both 100 lines long). I would like the pipeline to output BAM files, but it keeps crashing at a stage near the end.

I'm following the NIH's guide for running ATAC-SEQ, so I'm working with the SLURM scheduler using the NIH's settings (backend, Cromwell, and wdl) and a file stored by them for the hg19 genome. https://hpc.nih.gov/apps/encode-atac-seq-pipeline.html While I can run through their example, I cannot get my data to work.

I believe this issue has come up a few times in the past, but the original posters went inactive before a solution could be settled on. See https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/15 and https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/17. I did try loading picard, but that did not change my results

Here is where the error message appears:

[2018-10-25 18:05:25,88] [error] WorkflowManagerActor Workflow 37f13417-8676-469f-b575-aa80b5ce6c24 failed (during ExecutingWorkflowState): cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
Bad output 'filter.mito_dup_log': Failed to find index Success(WomInteger(0)) on array:

Success([])

Here is my JSON input:
{
"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/fdb/encode-atac-seq-pipeline/hg19/hg19.tsv",
"atac.fastqs_rep1_R1" : ["SRX860_1.fastq.gz"],
"atac.fastqs_rep1_R2" : ["SRX860_2.fastq.gz"],
"atac.paired_end" : true,
"atac.align_only" : true
}

And here are the attached files:
debug_[44].tar.gz
SRX860_1.fastq.gz
SRX860_2.fastq.gz

Any help would be appreciated.

Thanks,

Jonathan

Pipeline fails using test data

Hi Jin,
The atac-seq pipeline fails at BackgroundConfigAsyncJobExecutionActor. I'm also getting a warning that "Localization via hard link has failed". Attached are my script and the output. I'm using conda version 4.5.11 and the .json file you provide.

Thanks,
Kirsty

run.sh.txt
run.sh.o489151.txt

IDR section fails in qc report due to error in call-reproducibility_overlap

The pipeline runs and goes to completion, but when I check the qc report I get this error:
(screenshot of the QC report error, attached in the original issue)

However, when I look at the directory I see the optimal set and conservative set files.
stderr output is:
ln: failed to access 'conservative_peak.*.hammock_gz*': No such file or directory

It doesn't appear to derail the pipeline, but there seems to be an error in generating that file for downstream use. I did pull the most recent release before running this, and another run with an older release gave the same error.

OS/Platform and dependencies

  • CentOS cluster with SLURM scheduler
  • Cromwell-34
  • Singularity version-2.5.2.

Here's the entire debug log:
debug_68.tar.gz

Error while running test data with Conda

Hi! I'm getting an error when running encode-atac-seq-pipeline with the test data. I've installed it locally with Conda. I think the problem could be related to call-trim_adapter, but I don't know how to fix it. I'm posting the log here:

[2018-10-03 19:36:33,60] [info] Running with database db.url = jdbc:hsqldb:mem:7053f622-c273-4e8c-9de4-e8d2b6ac8888;shutdown=false;hsqldb.tx=mvcc
[2018-10-03 19:36:39,46] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-10-03 19:36:39,48] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-10-03 19:36:39,56] [info] Running with database db.url = jdbc:hsqldb:mem:c6327cac-109b-4c33-8b59-4d807ad2cdcb;shutdown=false;hsqldb.tx=mvcc
[2018-10-03 19:36:39,84] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-10-03 19:36:39,85] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-10-03 19:36:39,85] [info] Using noop to send events.
[2018-10-03 19:36:40,01] [info] Slf4jLogger started
[2018-10-03 19:36:40,14] [info] Workflow heartbeat configuration:
{
"cromwellId" : "cromid-bb0edc6",
"heartbeatInterval" : "2 minutes",
"ttl" : "10 minutes",
"writeBatchSize" : 10000,
"writeThreshold" : 10000
}
[2018-10-03 19:36:40,17] [info] Metadata summary refreshing every 2 seconds.
[2018-10-03 19:36:40,20] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-10-03 19:36:40,20] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-10-03 19:36:40,22] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-10-03 19:36:40,70] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-10-03 19:36:40,71] [info] JES batch polling interval is 33333 milliseconds
[2018-10-03 19:36:40,71] [info] JES batch polling interval is 33333 milliseconds
[2018-10-03 19:36:40,72] [info] JES batch polling interval is 33333 milliseconds
[2018-10-03 19:36:40,72] [info] PAPIQueryManager Running with 3 workers
[2018-10-03 19:36:40,73] [info] SingleWorkflowRunnerActor: Version 34
[2018-10-03 19:36:40,73] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-10-03 19:36:40,76] [info] Unspecified type (Unspecified version) workflow 6b7a8369-13c6-46a6-8aaf-777ad039f3b1 submitted
[2018-10-03 19:36:40,79] [info] SingleWorkflowRunnerActor: Workflow submitted 6b7a8369-13c6-46a6-8aaf-777ad039f3b1
[2018-10-03 19:36:40,80] [info] 1 new workflows fetched
[2018-10-03 19:36:40,80] [info] WorkflowManagerActor Starting workflow 6b7a8369-13c6-46a6-8aaf-777ad039f3b1
[2018-10-03 19:36:40,80] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-10-03 19:36:40,80] [info] WorkflowManagerActor Successfully started WorkflowActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1
[2018-10-03 19:36:40,80] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-10-03 19:36:40,81] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-10-03 19:36:40,85] [info] MaterializeWorkflowDescriptorActor [6b7a8369]: Parsing workflow as WDL draft-2
[2018-10-03 19:36:48,39] [info] MaterializeWorkflowDescriptorActor [6b7a8369]: Call-to-Backend assignments: atac.filter -> Local, atac.overlap_pr -> Local, atac.macs2 -> Local, atac.macs2_ppr1 -> Local, atac.reproducibility_overlap -> Local, atac.pool_ta -> Local, atac.read_genome_tsv -> Local, atac.macs2_ppr2 -> Local, atac.idr -> Local, atac.macs2_pooled -> Local, atac.idr_ppr -> Local, atac.pool_ta_pr2 -> Local, atac.spr -> Local, atac.bowtie2 -> Local, atac.qc_report -> Local, atac.bam2ta -> Local, atac.xcor -> Local, atac.ataqc -> Local, atac.pool_ta_pr1 -> Local, atac.macs2_pr2 -> Local, atac.trim_adapter -> Local, atac.reproducibility_idr -> Local, atac.macs2_pr1 -> Local, atac.overlap_ppr -> Local, atac.idr_pr -> Local, atac.overlap -> Local
[2018-10-03 19:36:48,45] [warn] Local [6b7a8369]: Key/s [cpu, memory, time, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
(the warning above is repeated once per call; one instance also lists [preemptible, disks, cpu, time, memory])
[2018-10-03 19:36:50,61] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Starting atac.read_genome_tsv
[2018-10-03 19:36:50,61] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: '!align_only && !true_rep_only && enable_idr'. Running conditional section
[2018-10-03 19:36:50,61] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: 'enable_idr'. Running conditional section
[2018-10-03 19:36:50,62] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: '!align_only && !true_rep_only'. Running conditional section
[2018-10-03 19:36:50,62] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: 'enable_idr'. Running conditional section
[2018-10-03 19:36:50,62] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: '!disable_xcor'. Running conditional section
[2018-10-03 19:36:50,62] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: '!true_rep_only'. Running conditional section
[2018-10-03 19:36:50,77] [warn] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: Unrecognized runtime attribute keys: disks, cpu, time, memory
[2018-10-03 19:36:51,03] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: cat /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-read_genome_tsv/inputs/1631258567/hg38_local.tsv
[2018-10-03 19:36:51,06] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: executing: /bin/bash /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-read_genome_tsv/execution/script
[2018-10-03 19:36:54,71] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Starting atac.trim_adapter (2 shards)
[2018-10-03 19:36:55,23] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: job id: 6170
[2018-10-03 19:36:55,23] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: Status change from - to Done
[2018-10-03 19:36:55,72] [warn] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: Unrecognized runtime attribute keys: disks, cpu, time, memory
[2018-10-03 19:36:55,72] [warn] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: Unrecognized runtime attribute keys: disks, cpu, time, memory
[2018-10-03 19:36:55,75] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: python $(which encode_trim_adapter.py)
/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/write_tsv_6a0314610cecf7758f36a04f6f18802a.tmp
--adapters /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/write_tsv_d41d8cd98f00b204e9800998ecf8427e.tmp
--paired-end
--auto-detect-adapter
--min-trim-len 5
--err-rate 0.1
--nth 1
[2018-10-03 19:36:55,75] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: executing: /bin/bash /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/script
[2018-10-03 19:36:55,75] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: python $(which encode_trim_adapter.py)
/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/write_tsv_4702c8116b4f355f887138a23f9f2e3d.tmp
--adapters /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/write_tsv_d41d8cd98f00b204e9800998ecf8427e.tmp
--paired-end
--auto-detect-adapter
--min-trim-len 5
--err-rate 0.1
--nth 1
[2018-10-03 19:36:55,76] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: executing: /bin/bash /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/script
[2018-10-03 19:37:00,22] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: job id: 6195
[2018-10-03 19:37:00,22] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: job id: 6203
[2018-10-03 19:37:00,22] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: Status change from - to Done
[2018-10-03 19:37:00,22] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: Status change from - to Done
[2018-10-03 19:37:00,87] [error] WorkflowManagerActor Workflow 6b7a8369-13c6-46a6-8aaf-777ad039f3b1 failed (during ExecutingWorkflowState): Job atac.trim_adapter:1:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/stderr.
File "/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/write_tsv_4702c8116b4f355f887138a23f9f2e3d.tmp", line 1
/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/inputs/-505442145/ENCFF641SFZ.subsampled.400.fastq.gz /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/inputs/-505442144/ENCFF031ARQ.subsampled.400.fastq.gz
^
SyntaxError: invalid syntax

Job atac.trim_adapter:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/stderr.
File "/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/write_tsv_6a0314610cecf7758f36a04f6f18802a.tmp", line 1
/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/inputs/-1392945826/ENCFF341MYG.subsampled.400.fastq.gz /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/inputs/-1392945825/ENCFF248EJF.subsampled.400.fastq.gz
^
SyntaxError: invalid syntax

[2018-10-03 19:37:00,87] [info] WorkflowManagerActor WorkflowActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 is in a terminal state: WorkflowFailedState
[2018-10-03 19:37:07,81] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-10-03 19:37:10,23] [info] Workflow polling stopped
[2018-10-03 19:37:10,25] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2018-10-03 19:37:10,25] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2018-10-03 19:37:10,25] [info] Aborting all running workflows.
[2018-10-03 19:37:10,25] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2018-10-03 19:37:10,25] [info] JobExecutionTokenDispenser stopped
[2018-10-03 19:37:10,25] [info] WorkflowStoreActor stopped
[2018-10-03 19:37:10,26] [info] WorkflowLogCopyRouter stopped
[2018-10-03 19:37:10,26] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2018-10-03 19:37:10,26] [info] WorkflowManagerActor All workflows finished
[2018-10-03 19:37:10,26] [info] WorkflowManagerActor stopped
[2018-10-03 19:37:10,26] [info] Connection pools shut down
[2018-10-03 19:37:10,26] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] SubWorkflowStoreActor stopped
[2018-10-03 19:37:10,26] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] JobStoreActor stopped
[2018-10-03 19:37:10,26] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2018-10-03 19:37:10,26] [info] KvWriteActor Shutting down: 0 queued messages to process
[2018-10-03 19:37:10,26] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2018-10-03 19:37:10,26] [info] DockerHashActor stopped
[2018-10-03 19:37:10,26] [info] IoProxy stopped
[2018-10-03 19:37:10,26] [info] CallCacheWriteActor stopped
[2018-10-03 19:37:10,26] [info] ServiceRegistryActor stopped
[2018-10-03 19:37:10,27] [info] Database closed
[2018-10-03 19:37:10,27] [info] Stream materializer shut down
Workflow 6b7a8369-13c6-46a6-8aaf-777ad039f3b1 transitioned to state Failed
[2018-10-03 19:37:10,30] [info] Automatic shutdown of the async connection
[2018-10-03 19:37:10,30] [info] Gracefully shutdown sentry threads.
[2018-10-03 19:37:10,30] [info] Shutdown finished.

OS/Platform and dependencies

  • OS or Platform: Ubuntu 14.04.5 LTS
  • Cromwell/dxWDL version: cromwell-34
  • Conda version: conda 4.5.11

Thank you so much!!

debug_41.tar.gz

Peak statistics in QC reports

It looks like IDR peaks are being used to generate raw peak statistics. The peak statistics for the raw peaks and IDR peaks are identical in the HTML report, and the reported numbers of raw peaks and IDR peaks are the same. Screenshots from the HTML report for one sample are below.

(screenshots of the peak statistics sections from the HTML report attached)

There is one set of peak statistics given at the end of the ataqc section of the QC JSON, but it does not specify which peak set those statistics are for. Peak statistics for Naive overlap peaks and IDR peaks are not explicitly given in the JSON. See snippet below.

    "ataqc": [
        {
            ...
            ...
            "Raw peaks": [
                49747,
                "OK"
            ],
            "Naive overlap peaks": [
                88397,
                "OK"
            ],
            "IDR peaks": [
                49747,
                "OK"
            ],
            "Min size": 150.0,
            "25 percentile": 488.0,
            "50 percentile (median)": 714.0,
            "75 percentile": 957.0,
            "Max size": 4710.0,
            "Mean": 743.380605866,
            "TSS_enrichment": 11.9348989018
        }

OS/Platform and dependencies

  • OS or Platform: Ubuntu 16.04.4 (durga)
  • Cromwell/dxWDL version: cromwell 34-unknown-SNAP
  • Conda version: conda 4.5.11

Pipeline fails with my data despite successful test run

I had a successful test run with a new v1.1.2 local installation. It is failing with my actual data (Xenopus laevis, with a custom-built genome database). I am starting the run from deduplicated BAM files. After seeing the failure with v1.1.2, I tried a new local installation of v1.1.3. The error logs and terminal output for the v1.1.3 run are attached. I do not have access to machines on which I can install Singularity or Docker, so I'm restricted to the local installation.

$ conda --version
conda 4.5.11
$ uname -a
Linux node107.hpc.local 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
cromwell-34

debug_issue62.tar.gz

4_10_issue62.stderr.txt
4_10_issue62.stdout.txt

Localization issues : cromwell just uses hard-links

Hi,
I've posted this issue on the cromwell github too.
So I'm running the ENCODE ATAC-seq pipeline on an SGE cluster.
We don't allow hard links at my facility (BeeGFS filesystem), so I've been trying to use the localization parameters in the Cromwell configuration file, but to no avail. The backend file is definitely being used, since I get error messages when I put unsupported keywords in the localization array.

I've tried this with different versions of Cromwell (30.2, 31, 32, 32).

Here is the script generated by cromwell based on my WDL file :

# make the directory which will keep the matching files
mkdir /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2

# symlink all the files into the glob directory
( ln -L merge_fastqs_R?_*.fastq.gz /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 2> /dev/null ) || ( ln merge_fastqs_R?_*.fastq.gz /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 )

# list all the files that match the glob into a file called glob-[md5 of glob].list
ls -1 /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 > /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2.list

I get the error when the script tries to link the files into the glob directory.
Here is the WDL code:

 scatter( i in range(length(fastqs_)) ) {
                # trim adapters and merge trimmed fastqs
                call trim_adapter { input :
                        fastqs = fastqs_[i],
                        adapters = if length(adapters_)>0 then adapters_[i] else [],
                        paired_end = paired_end,
                }
                # align trimmed/merged fastqs with bowtie2
                call bowtie2 { input :
                        idx_tar = bowtie2_idx_tar,
                        fastqs = trim_adapter.trimmed_merged_fastqs, #[R1,R2]
                        paired_end = paired_end,
                        multimapping = multimapping,
                }
        }

And here is the task definition:

task trim_adapter { # trim adapters and merge trimmed fastqs
        # parameters from workflow
        Array[Array[File]] fastqs               # [merge_id][read_end_id]
        Array[Array[String]] adapters   # [merge_id][read_end_id]
        Boolean paired_end
        # mandatory
        Boolean? auto_detect_adapter    # automatically detect/trim adapters
        # optional
        Int? min_trim_len               # minimum trim length for cutadapt -m
        Float? err_rate                 # Maximum allowed adapter error rate
                                                        # for cutadapt -e
        # resource
        Int? cpu
        Int? mem_mb
        Int? time_hr
        # Commenting this line out as a test. Problem with hard links
        String? disks

        command {
                python $(which encode_trim_adapter.py) \
                        ${write_tsv(fastqs)} \
                        --adapters ${write_tsv(adapters)} \
                        ${if paired_end then "--paired-end" else ""} \
                        ${if select_first([auto_detect_adapter,false]) then "--auto-detect-adapter" else ""} \
                        ${"--min-trim-len " + select_first([min_trim_len,5])} \
                        ${"--err-rate " + select_first([err_rate,'0.1'])} \
                        ${"--nth " + select_first([cpu,2])}
        }
        output {
                # WDL glob() globs in an alphabetical order
                # so R1 and R2 can be switched, which results in an
                # unexpected behavior of a workflow
                # so we prepend merge_fastqs_'end'_ (R1 or R2)
                # to the basename of original filename
                # this prefix will be later stripped in bowtie2 task
                Array[File] trimmed_merged_fastqs = glob("merge_fastqs_R?_*.fastq.gz")
        }
        runtime {
                cpu : select_first([cpu,2])
                memory : "${select_first([mem_mb,'12000'])} MB"
                time : select_first([time_hr,24])
                disks : select_first([disks,"local-disk 100 HDD"])
        }
}

My backend.conf:

include required(classpath("application"))

backend {
  default="SGE"
  providers {
    SGE {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 10000
        runtime-attributes= """
        Int? cpu=1
        Int? memory=4
        String? disks
        String? time
        String? preemptible
        """
        submit = """
        qsub \
            -terse \
            -V \
            -b n \
            -wd ${cwd} \
            -N ${job_name} \
            ${'-pe smp ' + cpu} \
            ${'-l h_vmem=' + memory + "G"} \
            -o ${out} \
            -e ${err} \
            ${script}
        """
        kill = "qdel ${job_id}"
        check-alive = "qstat -j ${job_id}"
        job-id-regex = "(\\d+)"

        filesystems {
          local {
            localization: [
              "soft-link","copy","hard-link"
            ]
            caching {
              duplication-strategy: [ "soft-link","copy","hard-link"]
              hashing-strategy: "file"
            }
          }
        }
      }
    }
  }
}
engine {
  filesystems {
    local {
      localization: [
        "soft-link", "copy", "hard-link"
      ]
      caching {
        duplication-strategy: [ "soft-link", "copy", "hard-link" ]
        hashing-strategy: "file"
      }
    }
  }
}

I wonder if there is something wrong with my config files or if Cromwell's localization is at fault.

Macs2 p-val threshold change

It seems that a user can't adjust the MACS2 callpeak p-value threshold from 0.1? I might be missing something, but how can one do this within the pipeline? Thank you.
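
For reference, a minimal input JSON sketch of how a different peak-calling threshold might be requested, assuming the workflow exposes a top-level p-value threshold input (shown here as atac.pval_thresh; the exact key name and its default should be confirmed against docs/input.md for your pipeline version):

{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/path/to/hg38.tsv",
    "atac.fastqs_rep1_R1" : ["rep1_R1.fastq.gz"],
    "atac.fastqs_rep1_R2" : ["rep1_R2.fastq.gz"],
    "atac.paired_end" : true,
    "atac.pval_thresh" : 0.05
}

The genome TSV path and FASTQ names above are placeholders; only the threshold line is the point of the sketch.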

nodup bam

Hi Jin,
Could you provide an example .json file for starting with nodup bam instead of fastq?

Thanks,
Kirsty
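
For reference, a minimal sketch of what such an input JSON might look like, assuming the workflow accepts filtered/deduplicated BAMs through a nodup_bams-style input with one entry per replicate (the exact key name and array shape should be confirmed against docs/input.md for the installed pipeline version):

{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/path/to/hg38.tsv",
    "atac.nodup_bams" : ["rep1.nodup.bam", "rep2.nodup.bam"],
    "atac.paired_end" : true
}

All file paths above are placeholders.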

Error in read genome tsv module

When running the pipeline on my samples, the following error occurs at the read_genome_tsv step:

Bad output 'read_genome_tsv.genome': java.io.IOException: Could not read from ~/atac-seq-pipeline/cromwell-executions/atac/8ffa8456-f560-45ae-b478-8a32c14b8e90/call-read_genome_tsv/execution/tmp.tsv: File ~/atac-seq-pipeline/cromwell-executions/atac/8ffa8456-f560-45ae-b478-8a32c14b8e90/call-read_genome_tsv/execution/tmp.tsv is larger than 128000 Bytes. Maximum read limits can be adjusted in the configuration under system.input-read-limits.

I'm using mm10_no_alt_analysis_set_ENCODE.fasta.gz as my reference genome, which is 830 MB; could that be what's causing the error?

This discussion suggests increasing limits in the call to Java (using java -Dsystem.input-read-limits.lines=500000 -jar /cromwell-34.jar). Is that the recommended solution in this case as well?

OS/Platform and dependencies

  • OS or Platform: SGE cluster
  • Cromwell/dxWDL version: cromwell-34.jar
  • Conda version: conda 4.3.30

Please find logs here:
debug_75.tar.gz

Error in pipeline

I get the following error when running the atac-seq pipeline:

[2018-06-27 23:46:30,67] [error] WorkflowManagerActor Workflow 9f195d75-602d-4df8-822d-d4737d7c99c8 failed (during ExecutingWorkflowState): Job atac.xcor:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/simon/git/atacomate/pipeline/cromwell-executions/atac/9f195d75-602d-4df8-822d-d4737d7c99c8/call-xcor/shard-0/execution/stderr.
 Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 102, in <module>
    main()
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 91, in main
    ta_subsampled, args.speak, args.nth, args.out_dir)
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 53, in xcor
    run_shell_cmd(cmd1)
  File "/software/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
    os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process

Job atac.spr:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/simon/git/atacomate/pipeline/cromwell-executions/atac/9f195d75-602d-4df8-822d-d4737d7c99c8/call-spr/shard-0/execution/stderr.
 Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_spr.py", line 130, in <module>
    main()
  File "/software/atac-seq-pipeline/src/encode_spr.py", line 116, in main
    ta_pr1, ta_pr2 = spr_pe(args.ta, args.out_dir)
  File "/software/atac-seq-pipeline/src/encode_spr.py", line 81, in spr_pe
    run_shell_cmd(cmd1)
  File "/software/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
    os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process

Failed to evaluate job outputs:
Bad output 'macs2.bfilt_npeak': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'macs2.bfilt_npeak_bb': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'macs2.sig_pval': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'macs2.frip_qc': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
Bad output 'macs2.bfilt_npeak': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'macs2.bfilt_npeak_bb': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'macs2.sig_pval': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
Bad output 'macs2.frip_qc': Failed to find index Success(WomInteger(0)) on array:

Success([])

0
	at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$handleExecutionSuccess$1(StandardAsyncExecutionActor.scala:786)
	at scala.util.Success.$anonfun$map$1(Try.scala:251)
	at scala.util.Success.map(Try.scala:209)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:288)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
	at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
	at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
	at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The command I used:

java -jar -Dconfig.file=../../atac-seq-pipeline/backends/backend.conf cromwell-32.jar run ../../atac-seq-pipeline/atac.wdl -i input.json -o ../../atac-seq-pipeline/workflow_opts/docker.json

The contents of input.json:

{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/data/genome_data/hg38/hg38.tsv",
    "atac.fastqs" : [[
	    [
		 "/home/simon/git/atacomate/Supp_GSM2977488_hESC_ATAC/SRR6667571_pass_1.fastq.gz",
		 "/home/simon/git/atacomate/Supp_GSM2977488_hESC_ATAC/SRR6667571_pass_2.fastq.gz"
    ]]],

    "atac.paired_end" : true,
    "atac.multimapping" : 4,

    "atac.trim_adapter.auto_detect_adapter" : true,

    "atac.bowtie2.cpu" : 12,
    "atac.bowtie2.mem_mb" : 16000,
    "atac.bowtie2.time_hr" : 36,

    "atac.filter.cpu" : 2,
    "atac.filter.mem_mb" : 12000,
    "atac.filter.time_hr" : 23,

    "atac.macs2_mem_mb" : 16000,

    "atac.smooth_win" : 73,
    "atac.enable_idr" : true,
    "atac.idr_thresh" : 0.05,

    "atac.qc_report.name" : "GSM2977488",
    "atac.qc_report.desc" : "hESC_ATAC"
}

Any ideas? Please let me know if you need more information.

ATAQC from QC sequencing

Hello,

When I ran a low number of reads through the pipeline (9,800 reads), I didn't get some of the QC data, such as chrM % and MAPQ-filtered %. Is there a threshold for read count?

Error running test data on SLURM cluster

Hi Jin,
I'm a little new to programming, so I hope you will bear with me.

I installed the pipeline on our cluster following the SLURM instructions, and downloaded the test data and genome. I think everything worked fine.
$ ls atac-seq-pipeline/
atac.wdl conda cromwell-workflow-logs ENCSR356KRQ_fastq_subsampled.tar Jenkinsfile src test_genome_database_hg38_atac.tar backends cromwell-34.jar docker_image examples LICENSE test test_sample bash_logs cromwell-executions docs genome README.md test_genome_database workflow_opts

I didn't fully understand step 5, but I think I filled in the right info.
$ cat workflow_opts/slurm.json
{
    "default_runtime_attributes" : {
        "slurm_partition" : "neuro-largemem",
        "slurm_account" : "neuro",
        "singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.1.simg"
    }
}

However, when I tried to run the pipeline on test data it initiates but errors out.
[2018-11-28 17:42:19,32] [error] WorkflowManagerActor Workflow 09488c8c-8169-4359-9a30-44eb45387723 failed (during ExecutingWorkflowState): Job atac.trim_adapter:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details. Check the content of stderr for potential additional information: /home/eaclark/software/atac-seq-pipeline/cromwell-executions/atac/09488c8c-8169-4359-9a30-44eb45387723/call-trim_adapter/shard-0/execution/stderr.
ImportError: No module named site
ln: failed to access 'merge_fastqs_R?_*.fastq.gz': No such file or directory

I'm not sure why it can't import the module, so I'm hoping you can give me some advice.
Thank you!
debug_[eirn.a.clark].tar.gz

Error when in installing conda dependencies.sh

Hi, when I try to install the conda dependencies I get the error below. Can I install the dependencies manually?
Or do you have an idea of how I can fix this? This is the only error I get, about one second after running the shell script.

I tried in two different places and got exactly the same error. Also, in the documentation you say:

bash installers/uninstall_dependencies.sh

but uninstall_dependencies.sh is located at the conda folder, not installers folder.

Thank you.

bash conda/install_dependencies.sh                                             
/ru-auth/local/home/trezende/localPrograms/Miniconda/bin/conda
=== Found Conda (conda 4.5.11).
=== Installing packages for python3 env...
Solving environment: failed

CondaHTTPError: HTTP 404 NOT FOUND for url <https://conda.anaconda.org/r/noarch/repodata.json>
Elapsed: 00:00.020204
CF-RAY: 45d006588c6d9200-EWR

The remote server could not find the noarch directory for the
requested channel with url: https://conda.anaconda.org/r

As of conda 4.3, a valid channel must contain a `noarch/repodata.json` and
associated `noarch/repodata.json.bz2` file, even if `noarch/repodata.json` is
empty. please request that the channel administrator create
`noarch/repodata.json` and associated `noarch/repodata.json.bz2` files.
$ mkdir noarch
$ echo '{}' > noarch/repodata.json
$ bzip2 -k noarch/repodata.json

You will need to adjust your conda configuration to proceed.
Use `conda config --show channels` to view your configuration's current state.
Further configuration help can be found at <https://conda.io/docs/config.html>.

Install Dependencies Error

I have been following the tutorial for running the pipeline on SGE using conda. When I run install_dependencies.sh I run into the following error and the conda environment is not created:

Verifying transaction: failed

PaddingError: Placeholder of length '80' too short in package /ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/miniconda3/envs/encode-atac-seq-pipeline-python3/bin/hb-ot-shape-closure.
The package must be rebuilt with conda-build > 2.0.

I looked into my conda-build version, and it appears to be using conda-build 3.17.5. So I don't see how that could be contributing to this error (unless the read and write filter support indicated in the output matters?):

ha4c6n8:atac-seq-pipeline(master)] conda build -V
read filter "zstd" is not supported
write filter "zstd" is not supported
conda-build 3.17.5

I deleted the pipeline and re-started the tutorial from scratch, and still ran into the same error. This is the entire run information:

ha4c6n8:atac-seq-pipeline(master)] bash conda/uninstall_dependencies.sh
/ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/bin/miniconda3/bin/conda
=== Found Conda (conda 4.5.12).
=== Pipeline's py3 Conda env (encode-atac-seq-pipeline-python3) does not exist or has already been removed.
=== Pipeline's Conda env (encode-atac-seq-pipeline) does not exist or has already been removed.
ha4c6n8:atac-seq-pipeline(master)] bash conda/install_dependencies.sh 
/ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/bin/miniconda3/bin/conda
=== Found Conda (conda 4.5.12).
=== Installing packages for python3 env...
Solving environment: done

## Package Plan ##

  environment location: /ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/miniconda3/envs/encode-atac-seq-pipeline-python3

  added / updated specs: 
    - bedtools==2.26.0
    - idr==2.0.4.2
    - java-jdk==8.0.92
    - libgcc==5.2.0
    - matplotlib==1.5.1
    - ncurses==6.1
    - numpy==1.11.3
    - openblas==0.2.20
    - python==3.5.0
    - tabix==0.2.6


The following NEW packages will be INSTALLED:

    bedtools:        2.26.0-0                              bioconda   
    blas:            1.1-openblas                          conda-forge
    ca-certificates: 2018.11.29-ha4d7672_0                 conda-forge
    cairo:           1.14.6-0                              conda-forge
    certifi:         2018.8.24-py35_1001                   conda-forge
    cycler:          0.10.0-py_1                           conda-forge
    fontconfig:      2.11.1-6                              conda-forge
    freetype:        2.6.3-1                               conda-forge
    gettext:         0.19.8.1-h5e8e0c9_1                   conda-forge
    glib:            2.56.2-h464dc38_1                     conda-forge
    harfbuzz:        1.0.6-1                               conda-forge
    icu:             56.1-4                                conda-forge
    idr:             2.0.4.2-py35h24bf2e0_0                bioconda   
    java-jdk:        8.0.92-1                              bioconda   
    jpeg:            9c-h470a237_1                         conda-forge
    libffi:          3.2.1-hfc679d8_5                      conda-forge
    libgcc:          5.2.0-0                               conda-forge
    libgcc-ng:       7.2.0-hdf63c60_3                      conda-forge
    libgfortran:     3.0.0-1                               conda-forge
    libiconv:        1.15-h470a237_3                       conda-forge
    libpng:          1.6.36-ha92aebf_0                     conda-forge
    libstdcxx-ng:    7.2.0-hdf63c60_3                      conda-forge
    libtiff:         4.0.6-5                               conda-forge
    libxml2:         2.9.3-8                               conda-forge
    matplotlib:      1.5.1-np111py35_4                     conda-forge
    ncurses:         6.1-hfc679d8_2                        conda-forge
    numpy:           1.11.3-py35_blas_openblashd3ea46f_205 conda-forge [blas_openblas]
    openblas:        0.2.20-8                              conda-forge
    openssl:         1.0.2p-h470a237_1                     conda-forge
    pango:           1.40.1-0                              conda-forge
    pcre:            8.41-hfc679d8_3                       conda-forge
    pip:             18.0-py35_1001                        conda-forge
    pixman:          0.34.0-h470a237_3                     conda-forge
    pyparsing:       2.3.0-py_0                            conda-forge
    pyqt:            4.11.4-py35_3                         conda-forge
    python:          3.5.0-1                                          
    python-dateutil: 2.7.5-py_0                            conda-forge
    pytz:            2018.7-py_0                           conda-forge
    qt:              4.8.7-6                               conda-forge
    readline:        6.2-2                                            
    scipy:           1.1.0-py35_blas_openblash7943236_201  conda-forge [blas_openblas]
    setuptools:      40.4.3-py35_0                         conda-forge
    sip:             4.18-py35_1                           conda-forge
    six:             1.11.0-py35_1                         conda-forge
    sqlite:          3.19.3-1                              conda-forge
    tabix:           0.2.6-ha92aebf_0                      bioconda   
    tk:              8.5.19-2                              conda-forge
    wheel:           0.32.0-py35_1000                      conda-forge
    xz:              5.0.5-1                               conda-forge
    zlib:            1.2.11-h470a237_3                     conda-forge

Preparing transaction: done
Verifying transaction: failed

PaddingError: Placeholder of length '80' too short in package /ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/miniconda3/envs/encode-atac-seq-pipeline-python3/bin/hb-ot-shape-closure.
The package must be rebuilt with conda-build > 2.0.

Thanks for any guidance.

Paul

Core dumped in call-filter step

The pipeline fails in the call-filter step:

INFO    2018-12-02 11:16:03     MarkDuplicates  Reads are assumed to be ordered by: coordinate
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f8211ec5344, pid=65067, tid=65200
#
# JRE version: OpenJDK Runtime Environment (11.0.1+13) (build 11.0.1+13-LTS)
# Java VM: OpenJDK 64-Bit Server VM (11.0.1+13-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x7f8344]  G1ParScanThreadState::copy_to_survivor_space(InCSetState, oopDesc*, markOopDesc*)+0x334
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /mnt/lab_data/montgomery/nicolerg/motrpac/atac/pipeline-output/test-new/cromwell-executions/atac/938324f7-a83a-4947-809e-e44338f6374b/call-filter/shard-0/execution/core.65067)
#
# An error report file with more information is saved as:
# /mnt/lab_data/montgomery/nicolerg/motrpac/atac/pipeline-output/test-new/cromwell-executions/atac/938324f7-a83a-4947-809e-e44338f6374b/call-filter/shard-0/execution/hs_err_pid65067.log
#
# If you would like to submit a bug report, please visit:
#   http://www.azulsystems.com/support/
#
Aborted (core dumped)

I am using the "atac.keep_irregular_chr_in_bfilt_peak" : false option.

OS/Platform and dependencies

  • OS or Platform: Ubuntu 16.04.4 (durga)
  • Cromwell/dxWDL version: cromwell 34-unknown-SNAP
  • Conda version: conda 4.5.11

Error logs are attached.
debug_issue63.tar.gz

Adapting your pipeline to run in SGE (JHPCE cluster): specify qsub shell + ataqc mem + finding picard.jar

Hi,

We were able to run your pipeline using Cromwell 34 at JHPCE http://www.jhpce.jhu.edu/ with some effort. We have some questions that we might have missed in the docs.

Issue 1: conda env

We noticed that the qsub'ed jobs were not finding the conda env, so we edited our ~/.bashrc file to load it (snippet below). However, the backend configuration file uses /bin/sh instead of /bin/bash, and thus we had to edit the backend conf file at https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/backends/backend.conf#L138.

## For forcing the activation of the conda ATAC seq env
if [[ $HOSTNAME == compute-* ]]; then
       echo "Activating the ATAC seq conda environment"
       source activate encode-atac-seq-pipeline
fi

While the issue with the conda environment not being activated was unexpected, is there an option we missed for specifying the shell for the qsub calls? It looks like there isn't, and this is a small edit that a user can do.

Issue 2: ataqc memory

By default the ataqc step uses a maximum of 16,000 MB, and you can see that value specified in both the Java options and the memory requested from SGE. That is all OK, except that for some reason we seem to need an extra 4 GB beyond the max Java limit in SGE at JHPCE.

Is there an option we can use to increase the max SGE memory for the ataqc step while keeping the max at 16,000 MB for the Java options (or for controlling only the Java options)? It could be that we just need to fork your pipeline and make a small edit in the atac.wdl file.

For example, we might need to edit https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/atac.wdl#L976 to read

export _JAVA_OPTIONS="-Xms256M -Xmx16000M -XX:ParallelGCThreads=1"

and then run the pipeline with atac.ataqc.mem_mb set to 21,000 MB (if that option is valid, since it's not mentioned in https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input.md#resource).
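
If that per-task override is accepted by the installed version, the corresponding input JSON entry might look like the sketch below; treat the key as an assumption until it is confirmed against atac.wdl or docs/input.md:

{
    "atac.ataqc.mem_mb" : 21000
}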

Issue 3: which picard.jar

When the pipeline crashed due to memory in ataqc we debugged things by modifying the script.submit script but then ran into an error with which picard.jar. For some reason,

def locate_picard():
    try:
        cmd='which picard.jar'
        ret=run_shell_cmd(cmd)
        return ret

crashed and didn't go into the exception. We solved this again by editing the ~/.bashrc file with:

## For using a specific picard.jar
if [[ $HOSTNAME == compute-* ]]; then
       echo "Tricks for picard"
       export PATH=/users/bbarry/.conda/envs/encode-atac-seq-pipeline/share/picard-2.10.6-0:$PATH
fi

I don't know if you've seen this before. It could again be an issue with the OS version at JHPCE.

Let us know if you need more info!

Best,
Leonardo and @BriannaBarry

(cc @andrewejaffe)

OS/Platform and dependencies

  • OS or Platform: SGE Cluster
  • Cromwell/dxWDL version: Cromwell v34 (we could try v36 but doubt our issues are cromwell-specific)
  • Conda version: 4.3.22

JHPCE info: /dcl01/lieber/ajaffe/Brianna/jaffe_lab/atac-seq-pipeline/cromwell-executions/atac/198ad417-0ccf-4a84-8cee-2707166bc53b/call-ataqc/shard-1/execution (in case we need to revisit this later)

xcor not completing on Sherlock singularity

I'm running the pipeline with Singularity on Sherlock 2.0 and the xcor job doesn't work on the test data:

[2018-10-14 01:26:02,28] [error] WorkflowManagerActor Workflow b3ff0230-746e-42f6-bea3-9a6aafa8ceb2 failed (during ExecutingWorkflowState): Job atac.xcor:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /scratch/users/knoedler/atac-seq-pipeline/cromwell-executions/atac/b3ff0230-746e-42f6-bea3-9a6aafa8ceb2/call-xcor/shard-0/execution/stderr.
Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 102, in <module>
    main()
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 91, in main
    ta_subsampled, args.speak, args.nth, args.out_dir)
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 53, in xcor
    run_shell_cmd(cmd1)
  File "/software/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
    os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
ln: failed to access '*.cc.plot.pdf': No such file or directory
ln: failed to access '*.cc.plot.png': No such file or directory
ln: failed to access '*.cc.qc': No such file or directory
ln: failed to access '*.cc.fraglen.txt': No such file or directory

Job atac.xcor:1:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /scratch/users/knoedler/atac-seq-pipeline/cromwell-executions/atac/b3ff0230-746e-42f6-bea3-9a6aafa8ceb2/call-xcor/shard-1/execution/stderr.
Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 102, in <module>
    main()
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 91, in main
    ta_subsampled, args.speak, args.nth, args.out_dir)
  File "/software/atac-seq-pipeline/src/encode_xcor.py", line 53, in xcor
    run_shell_cmd(cmd1)
  File "/software/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
    os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
ln: failed to access '*.cc.plot.pdf': No such file or directory
ln: failed to access '*.cc.plot.png': No such file or directory
ln: failed to access '*.cc.qc': No such file or directory
ln: failed to access '*.cc.fraglen.txt': No such file or directory

Cromwell version

Very nice tool, thanks for the valuable work! I'm running this pipeline on SLURM, and to do so I followed the official guidance, but there seem to be two guides:

one is tutorial_slurm.md and the other is slurm

At first I thought they would complement each other, but some things conflict. My questions: 1) If I run into a problem, which guide should I follow? 2) Which Cromwell version should I use here? 3) Should I install WOMtool and the related toolchain (Scala, ...) to run this pipeline with Cromwell? This also confuses me. Thank you!

Error in the pipeline for test data

Hi Jin,

I have pulled the ATAC-seq pipeline and started experimenting with the test data. Initially it ran and completed successfully. Today I am getting the following error. I am using the sge_singularity backend to run the pipeline.

$ singularity --version
2.5.2-dist

The pulled Singularity image for ATAC-seq is as follows:

$ ls -lrth ~/.singularity/
total 2.6G
-rwxr-xr-x 1 padmanabs1 reslnusers 1.1G Sep 21 09:42 chip-seq-pipeline-v1.1.simg
drwxr-xr-x 2 padmanabs1 reslnusers 4.4K Sep 24 09:09 docker
drwxr-xr-x 2 padmanabs1 reslnusers  192 Sep 24 09:09 metadata
-rwxr-xr-x 1 padmanabs1 reslnusers 1.2G Sep 24 09:10 atac-seq-pipeline-v1.1.simg

And the commands to run the pipeline are

INPUT=examples/local/ENCSR356KRQ_subsampled.json
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=sge_singularity cromwell-34.jar run atac.wdl -i ${INPUT} -o workflow_opts/sge.json

More info on sge.json

$ cat workflow_opts/sge.json
{
    "default_runtime_attributes" : {
        "sge_pe" : "smp",
        "sge_queue" : "all.q",
        "singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.simg"
    }
}

The error I am getting is

[2018-09-26 13:51:45,54] [error] WorkflowManagerActor Workflow 59fb6fa8-c5bc-4928-9d8a-6a8fea701b24 failed (during ExecutingWorkflowState): Job atac.trim_adapter:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.

It seems like it is unable to access the directory.

Check the content of stderr for potential additional information: $ cromwell-executions/atac/59fb6fa8-c5bc-4928-9d8a-6a8fea701b24/call-trim_adapter/shard-0/execution/stderr.
 Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_trim_adapter.py", line 269, in <module>
    main()
  File "/software/atac-seq-pipeline/src/encode_trim_adapter.py", line 233, in main
    fastqs = ret_val.get(BIG_INT)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
OSError: [Errno 3] No such process
ln: failed to access 'merge_fastqs_R?_*.fastq.gz': No such file or directory

Please help me fix this issue.

OS/Platform and dependencies

  • OS or Platform: CentOS Linux release 7.4.1708
  • Cromwell/dxWDL version: Cromwell-34
  • Conda version: 4.5.10

Attach logs
I have attached the logs.
debug_10.tar.gz

Error while running test data SGE Singularity on Cromwell server mode

Describe the bug
Hi Jin,

I am following the atac-seq pipeline documentation for running the pipeline in Cromwell server mode with SGE Singularity. I have started the Cromwell server using the following command on a qlogin interactive node:

$ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backend.conf -Dbackend.default=sge_singularity cromwell-34.jar server
2018-10-02 12:39:42,353 cromwell-system-akka.dispatchers.engine-dispatcher-7 INFO  - Cromwell 34 service started on 0:0:0:0:0:0:0:0:8000...

After that I ran the following command to run the pipeline

$ INPUT=ENCSR356KRQ_subsampled.json
$ curl -X POST --header "Accept: application/json" -v "ServerIP:8000/api/workflows/v1" \
    -F workflowSource=@atac.wdl \
    -F workflowInputs=@${INPUT} \
    -F workflowOptions=@workflow_opts/sge.json

The job was submitted to the Cromwell server running on the qlogin interactive node, but I get the following error in the first step of the pipeline, which reads the test genome TSV.

2018-10-02 12:40:56,446 cromwell-system-akka.dispatchers.backend-dispatcher-74 INFO  - DispatchedConfigAsyncJobExecutionActor [UUID(5d79e08e)atac.read_genome_tsv:NA:1]: `cat $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/inputs/1073766326/hg38_local.tsv`
2018-10-02 12:40:56,494 cromwell-system-akka.dispatchers.backend-dispatcher-74 INFO  - DispatchedConfigAsyncJobExecutionActor [UUID(5d79e08e)atac.read_genome_tsv:NA:1]: executing: echo "chmod u+x $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/script && SINGULARITY_BINDPATH=$(echo $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv | sed 's/cromwell-executions/\n/g' | head -n1) singularity  exec     ~/.singularity/atac-seq-pipeline-v1.1.simg $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/script" | qsub \
-terse \
-b n \
-N cromwell_5d79e08e_read_genome_tsv \
-wd $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv \
-o $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stdout \
-e $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr \
  \
-l h_vmem=4000m \
-l s_vmem=4000m \
-l h_rt=3600 \
-l s_rt=3600 \
-q all.q \
 \
 \
-V
2018-10-02 12:40:57,094 cromwell-system-akka.dispatchers.backend-dispatcher-72 ERROR - DispatchedConfigAsyncJobExecutionActor [UUID(5d79e08e)atac.read_genome_tsv:NA:1]: Error attempting to Execute
java.lang.RuntimeException: Could not find job ID from stdout file. Check the stderr file for possible errors: $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr.submit
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.getJob(ConfigAsyncJobExecutionActor.scala:226)
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.$anonfun$execute$2(SharedFileSystemAsyncJobExecutionActor.scala:133)
        at scala.util.Either.fold(Either.scala:188)
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:126)
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:121)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:600)
        at scala.util.Try$.apply(Try.scala:209)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:600)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:600)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:915)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:907)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
        at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
        at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:208)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
        at akka.actor.ActorCell.invoke(ActorCell.scala:557)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-10-02 12:40:57,673 cromwell-system-akka.dispatchers.engine-dispatcher-66 ERROR - WorkflowManagerActor Workflow 5d79e08e-65d8-49bd-8a66-632b5cdf284f failed (during ExecutingWorkflowState): cromwell.core.CromwellFatalException: java.lang.RuntimeException: Could not find job ID from stdout file. Check the stderr file for possible errors: $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr.submit
        at cromwell.core.CromwellFatalException$.apply(core.scala:18)
        at cromwell.core.retry.Retry$$anonfun$withRetry$1.applyOrElse(Retry.scala:38)
        at cromwell.core.retry.Retry$$anonfun$withRetry$1.applyOrElse(Retry.scala:37)
        at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:413)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
        at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: Could not find job ID from stdout file. Check the stderr file for possible errors: $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr.submit
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.getJob(ConfigAsyncJobExecutionActor.scala:226)
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.$anonfun$execute$2(SharedFileSystemAsyncJobExecutionActor.scala:133)
        at scala.util.Either.fold(Either.scala:188)
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:126)
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:121)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:600)
        at scala.util.Try$.apply(Try.scala:209)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:600)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:600)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:915)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:907)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
        at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
        at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:208)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
        at akka.actor.ActorCell.invoke(ActorCell.scala:557)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        ... 4 more

2018-10-02 12:40:57,678 cromwell-system-akka.dispatchers.engine-dispatcher-66 INFO  - WorkflowManagerActor WorkflowActor-5d79e08e-65d8-49bd-8a66-632b5cdf284f is in a terminal state: WorkflowFailedState
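The "Could not find job ID from stdout file" error generally means the qsub submission did not print a job ID that Cromwell could parse. A minimal first check (a sketch only, mirroring the resource flags from the submit command in the log above; the queue name and limits are that cluster's and may need adjusting) is to reproduce the submission by hand and read the stderr.submit file the error points to:

# With -terse, a successful qsub prints only the numeric job ID on stdout.
$ echo "sleep 1" | qsub -terse -b n -N test_submit \
    -l h_vmem=4000m -l s_vmem=4000m -l h_rt=3600 -l s_rt=3600 \
    -q all.q -V

# If the submission fails, the same error is usually captured in stderr.submit:
$ cat $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr.submit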

OS/Platform and dependencies

  • OS or Platform: CentOS Linux release 7.4.1708
  • Cromwell/dxWDL version: Cromwell-34
  • Conda version: 4.5.10

Attach logs
I am attaching the logs here
debug_40.tar.gz
