encode-dcc / atac-seq-pipeline
ENCODE ATAC-seq pipeline
License: MIT License
Hi! I'm getting an error when running encode-atac-seq-pipeline on the test data. I installed it locally with Conda. I think the problem may be related to call-trim_adapter, but I don't know how to fix it. I'm posting the log here:
[2018-10-03 19:36:33,60] [info] Running with database db.url = jdbc:hsqldb:mem:7053f622-c273-4e8c-9de4-e8d2b6ac8888;shutdown=false;hsqldb.tx=mvcc
[2018-10-03 19:36:39,46] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-10-03 19:36:39,48] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-10-03 19:36:39,56] [info] Running with database db.url = jdbc:hsqldb:mem:c6327cac-109b-4c33-8b59-4d807ad2cdcb;shutdown=false;hsqldb.tx=mvcc
[2018-10-03 19:36:39,84] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-10-03 19:36:39,85] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-10-03 19:36:39,85] [info] Using noop to send events.
[2018-10-03 19:36:40,01] [info] Slf4jLogger started
[2018-10-03 19:36:40,14] [info] Workflow heartbeat configuration:
{
"cromwellId" : "cromid-bb0edc6",
"heartbeatInterval" : "2 minutes",
"ttl" : "10 minutes",
"writeBatchSize" : 10000,
"writeThreshold" : 10000
}
[2018-10-03 19:36:40,17] [info] Metadata summary refreshing every 2 seconds.
[2018-10-03 19:36:40,20] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-10-03 19:36:40,20] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-10-03 19:36:40,22] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-10-03 19:36:40,70] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-10-03 19:36:40,71] [info] JES batch polling interval is 33333 milliseconds
[2018-10-03 19:36:40,71] [info] JES batch polling interval is 33333 milliseconds
[2018-10-03 19:36:40,72] [info] JES batch polling interval is 33333 milliseconds
[2018-10-03 19:36:40,72] [info] PAPIQueryManager Running with 3 workers
[2018-10-03 19:36:40,73] [info] SingleWorkflowRunnerActor: Version 34
[2018-10-03 19:36:40,73] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-10-03 19:36:40,76] [info] Unspecified type (Unspecified version) workflow 6b7a8369-13c6-46a6-8aaf-777ad039f3b1 submitted
[2018-10-03 19:36:40,79] [info] SingleWorkflowRunnerActor: Workflow submitted 6b7a8369-13c6-46a6-8aaf-777ad039f3b1
[2018-10-03 19:36:40,80] [info] 1 new workflows fetched
[2018-10-03 19:36:40,80] [info] WorkflowManagerActor Starting workflow 6b7a8369-13c6-46a6-8aaf-777ad039f3b1
[2018-10-03 19:36:40,80] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-10-03 19:36:40,80] [info] WorkflowManagerActor Successfully started WorkflowActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1
[2018-10-03 19:36:40,80] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-10-03 19:36:40,81] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-10-03 19:36:40,85] [info] MaterializeWorkflowDescriptorActor [6b7a8369]: Parsing workflow as WDL draft-2
[2018-10-03 19:36:48,39] [info] MaterializeWorkflowDescriptorActor [6b7a8369]: Call-to-Backend assignments: atac.filter -> Local, atac.overlap_pr -> Local, atac.macs2 -> Local, atac.macs2_ppr1 -> Local, atac.reproducibility_overlap -> Local, atac.pool_ta -> Local, atac.read_genome_tsv -> Local, atac.macs2_ppr2 -> Local, atac.idr -> Local, atac.macs2_pooled -> Local, atac.idr_ppr -> Local, atac.pool_ta_pr2 -> Local, atac.spr -> Local, atac.bowtie2 -> Local, atac.qc_report -> Local, atac.bam2ta -> Local, atac.xcor -> Local, atac.ataqc -> Local, atac.pool_ta_pr1 -> Local, atac.macs2_pr2 -> Local, atac.trim_adapter -> Local, atac.reproducibility_idr -> Local, atac.macs2_pr1 -> Local, atac.overlap_ppr -> Local, atac.idr_pr -> Local, atac.overlap -> Local
[2018-10-03 19:36:48,45] [warn] Local [6b7a8369]: Key/s [cpu, memory, time, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
(the warning above is repeated once per task; one repeat lists [preemptible, disks, cpu, time, memory] instead)
[2018-10-03 19:36:50,61] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Starting atac.read_genome_tsv
[2018-10-03 19:36:50,61] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: '!align_only && !true_rep_only && enable_idr'. Running conditional section
[2018-10-03 19:36:50,61] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: 'enable_idr'. Running conditional section
[2018-10-03 19:36:50,61] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: '!align_only && !true_rep_only'. Running conditional section
[2018-10-03 19:36:50,62] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: 'enable_idr'. Running conditional section
[2018-10-03 19:36:50,62] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: '!disable_xcor'. Running conditional section
[2018-10-03 19:36:50,62] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Condition met: '!true_rep_only'. Running conditional section
[2018-10-03 19:36:50,77] [warn] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: Unrecognized runtime attribute keys: disks, cpu, time, memory
[2018-10-03 19:36:51,03] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: cat /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-read_genome_tsv/inputs/1631258567/hg38_local.tsv
[2018-10-03 19:36:51,06] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: executing: /bin/bash /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-read_genome_tsv/execution/script
[2018-10-03 19:36:54,71] [info] WorkflowExecutionActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 [6b7a8369]: Starting atac.trim_adapter (2 shards)
[2018-10-03 19:36:55,23] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: job id: 6170
[2018-10-03 19:36:55,23] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.read_genome_tsv:NA:1]: Status change from - to Done
[2018-10-03 19:36:55,72] [warn] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: Unrecognized runtime attribute keys: disks, cpu, time, memory
[2018-10-03 19:36:55,72] [warn] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: Unrecognized runtime attribute keys: disks, cpu, time, memory
[2018-10-03 19:36:55,75] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: python $(which encode_trim_adapter.py)
/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/write_tsv_6a0314610cecf7758f36a04f6f18802a.tmp
--adapters /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/write_tsv_d41d8cd98f00b204e9800998ecf8427e.tmp
--paired-end
--auto-detect-adapter
--min-trim-len 5
--err-rate 0.1
--nth 1
[2018-10-03 19:36:55,75] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: executing: /bin/bash /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/script
[2018-10-03 19:36:55,75] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: python $(which encode_trim_adapter.py)
/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/write_tsv_4702c8116b4f355f887138a23f9f2e3d.tmp
--adapters /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/write_tsv_d41d8cd98f00b204e9800998ecf8427e.tmp
--paired-end
--auto-detect-adapter
--min-trim-len 5
--err-rate 0.1
--nth 1
[2018-10-03 19:36:55,76] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: executing: /bin/bash /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/script
[2018-10-03 19:37:00,22] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: job id: 6195
[2018-10-03 19:37:00,22] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: job id: 6203
[2018-10-03 19:37:00,22] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:0:1]: Status change from - to Done
[2018-10-03 19:37:00,22] [info] BackgroundConfigAsyncJobExecutionActor [6b7a8369atac.trim_adapter:1:1]: Status change from - to Done
[2018-10-03 19:37:00,87] [error] WorkflowManagerActor Workflow 6b7a8369-13c6-46a6-8aaf-777ad039f3b1 failed (during ExecutingWorkflowState): Job atac.trim_adapter:1:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/stderr.
File "/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/execution/write_tsv_4702c8116b4f355f887138a23f9f2e3d.tmp", line 1
/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/inputs/-505442145/ENCFF641SFZ.subsampled.400.fastq.gz /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-1/inputs/-505442144/ENCFF031ARQ.subsampled.400.fastq.gz
^
SyntaxError: invalid syntax
Job atac.trim_adapter:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/stderr.
File "/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/execution/write_tsv_6a0314610cecf7758f36a04f6f18802a.tmp", line 1
/media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/inputs/-1392945826/ENCFF341MYG.subsampled.400.fastq.gz /media/usuario/360AA7340AA6EFD3/Users/User/atac-seq-pipeline/cromwell-executions/atac/6b7a8369-13c6-46a6-8aaf-777ad039f3b1/call-trim_adapter/shard-0/inputs/-1392945825/ENCFF248EJF.subsampled.400.fastq.gz
^
SyntaxError: invalid syntax
[2018-10-03 19:37:00,87] [info] WorkflowManagerActor WorkflowActor-6b7a8369-13c6-46a6-8aaf-777ad039f3b1 is in a terminal state: WorkflowFailedState
[2018-10-03 19:37:07,81] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-10-03 19:37:10,23] [info] Workflow polling stopped
[2018-10-03 19:37:10,25] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2018-10-03 19:37:10,25] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2018-10-03 19:37:10,25] [info] Aborting all running workflows.
[2018-10-03 19:37:10,25] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2018-10-03 19:37:10,25] [info] JobExecutionTokenDispenser stopped
[2018-10-03 19:37:10,25] [info] WorkflowStoreActor stopped
[2018-10-03 19:37:10,26] [info] WorkflowLogCopyRouter stopped
[2018-10-03 19:37:10,26] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2018-10-03 19:37:10,26] [info] WorkflowManagerActor All workflows finished
[2018-10-03 19:37:10,26] [info] WorkflowManagerActor stopped
[2018-10-03 19:37:10,26] [info] Connection pools shut down
[2018-10-03 19:37:10,26] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] SubWorkflowStoreActor stopped
[2018-10-03 19:37:10,26] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] JobStoreActor stopped
[2018-10-03 19:37:10,26] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2018-10-03 19:37:10,26] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2018-10-03 19:37:10,26] [info] KvWriteActor Shutting down: 0 queued messages to process
[2018-10-03 19:37:10,26] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2018-10-03 19:37:10,26] [info] DockerHashActor stopped
[2018-10-03 19:37:10,26] [info] IoProxy stopped
[2018-10-03 19:37:10,26] [info] CallCacheWriteActor stopped
[2018-10-03 19:37:10,26] [info] ServiceRegistryActor stopped
[2018-10-03 19:37:10,27] [info] Database closed
[2018-10-03 19:37:10,27] [info] Stream materializer shut down
Workflow 6b7a8369-13c6-46a6-8aaf-777ad039f3b1 transitioned to state Failed
[2018-10-03 19:37:10,30] [info] Automatic shutdown of the async connection
[2018-10-03 19:37:10,30] [info] Gracefully shutdown sentry threads.
[2018-10-03 19:37:10,30] [info] Shutdown finished.
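For anyone hitting the same `SyntaxError`, it is the classic symptom of `python $(which encode_trim_adapter.py)` expanding with an empty `which` result: when the script is not on `PATH` (for example, the pipeline's Conda environment is not activated), `python` falls through to executing the first positional argument, here the `write_tsv_*.tmp` file of fastq paths, as Python source. A minimal reproduction sketch (the helper name `check_on_path` and the file `paths.tmp` are mine, not pipeline code):

```python
import shutil
import subprocess
import sys

def check_on_path(script_name):
    """Return the resolved path of a script, or None if it is not on PATH.

    If this returns None, `python $(which script)` collapses into
    `python <first-arg>`, and Python tries to parse that argument
    (here, a TSV of fastq paths) as source code.
    """
    return shutil.which(script_name)

# Demonstrate the failure mode with a throwaway "TSV" file:
with open("paths.tmp", "w") as f:
    f.write("/some/dir/a.fastq.gz /some/dir/b.fastq.gz\n")

# `python paths.tmp` is what effectively ran in the failing shard.
result = subprocess.run([sys.executable, "paths.tmp"],
                        capture_output=True, text=True)
# Python reports "SyntaxError: invalid syntax", exactly as in the log above.
```

If that matches, activating the pipeline's Conda environment (e.g. `conda activate encode-atac-seq-pipeline`) in the shell that launches Cromwell usually resolves this class of error.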
OS/Platform and dependencies
Thank you so much!!
I get the following error when running the atac-seq pipeline:
[2018-06-27 23:46:30,67] [error] WorkflowManagerActor Workflow 9f195d75-602d-4df8-822d-d4737d7c99c8 failed (during ExecutingWorkflowState): Job atac.xcor:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/simon/git/atacomate/pipeline/cromwell-executions/atac/9f195d75-602d-4df8-822d-d4737d7c99c8/call-xcor/shard-0/execution/stderr.
Traceback (most recent call last):
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 102, in <module>
main()
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 91, in main
ta_subsampled, args.speak, args.nth, args.out_dir)
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 53, in xcor
run_shell_cmd(cmd1)
File "/software/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
Job atac.spr:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/simon/git/atacomate/pipeline/cromwell-executions/atac/9f195d75-602d-4df8-822d-d4737d7c99c8/call-spr/shard-0/execution/stderr.
Traceback (most recent call last):
File "/software/atac-seq-pipeline/src/encode_spr.py", line 130, in <module>
main()
File "/software/atac-seq-pipeline/src/encode_spr.py", line 116, in main
ta_pr1, ta_pr2 = spr_pe(args.ta, args.out_dir)
File "/software/atac-seq-pipeline/src/encode_spr.py", line 81, in spr_pe
run_shell_cmd(cmd1)
File "/software/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
Failed to evaluate job outputs:
Bad output 'macs2.bfilt_npeak': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'macs2.bfilt_npeak_bb': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'macs2.sig_pval': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'macs2.frip_qc': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
Bad output 'macs2.bfilt_npeak': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'macs2.bfilt_npeak_bb': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'macs2.sig_pval': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'macs2.frip_qc': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$handleExecutionSuccess$1(StandardAsyncExecutionActor.scala:786)
at scala.util.Success.$anonfun$map$1(Try.scala:251)
at scala.util.Success.map(Try.scala:209)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:288)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
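A note on the tracebacks above: the `OSError: [Errno 3] No such process` comes from the cleanup path of `run_shell_cmd`, which calls `os.killpg` on a process group that has already exited; that secondary error masks the original failure, so the real cause is in each shard's `stderr` file. A hedged sketch of a `run_shell_cmd`-style helper whose cleanup tolerates an already-dead group (this is my rewrite, not the pipeline's actual `encode_common.py`):

```python
import errno
import os
import signal
import subprocess

def run_shell_cmd(cmd):
    """Run `cmd` in its own process group; raise CalledProcessError on
    failure without letting the group cleanup mask the real error."""
    p = subprocess.Popen(cmd, shell=True, preexec_fn=os.setsid)
    pgid = os.getpgid(p.pid)
    try:
        rc = p.wait()
        if rc:
            raise subprocess.CalledProcessError(rc, cmd)
    finally:
        try:
            os.killpg(pgid, signal.SIGKILL)   # reap any stragglers
        except OSError as e:
            if e.errno != errno.ESRCH:        # "No such process" is expected
                raise
```

With a guard like this, the exception that surfaces is the `CalledProcessError` for the failing command rather than the unrelated `OSError`.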
The command I used:
java -jar -Dconfig.file=../../atac-seq-pipeline/backends/backend.conf cromwell-32.jar run ../../atac-seq-pipeline/atac.wdl -i input.json -o ../../atac-seq-pipeline/workflow_opts/docker.json
The contents of input.json:
{
"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/data/genome_data/hg38/hg38.tsv",
"atac.fastqs" : [[
[
"/home/simon/git/atacomate/Supp_GSM2977488_hESC_ATAC/SRR6667571_pass_1.fastq.gz",
"/home/simon/git/atacomate/Supp_GSM2977488_hESC_ATAC/SRR6667571_pass_2.fastq.gz"
]]],
"atac.paired_end" : true,
"atac.multimapping" : 4,
"atac.trim_adapter.auto_detect_adapter" : true,
"atac.bowtie2.cpu" : 12,
"atac.bowtie2.mem_mb" : 16000,
"atac.bowtie2.time_hr" : 36,
"atac.filter.cpu" : 2,
"atac.filter.mem_mb" : 12000,
"atac.filter.time_hr" : 23,
"atac.macs2_mem_mb" : 16000,
"atac.smooth_win" : 73,
"atac.enable_idr" : true,
"atac.idr_thresh" : 0.05,
"atac.qc_report.name" : "GSM2977488",
"atac.qc_report.desc" : "hESC_ATAC"
}
Any ideas? Please let me know if you need more information.
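For what it's worth, the nesting of `atac.fastqs` can be sanity-checked before launching. A minimal sketch (the helper `check_fastqs` is mine; the assumption that `fastqs[replicate][merge_group]` is an `[R1, R2]` pair when `atac.paired_end` is true is inferred from the JSON above, so confirm it against the WDL):

```python
import json

def check_fastqs(obj):
    """Sanity-check the nesting of "atac.fastqs" in a pipeline input JSON.

    Assumed layout: a triply nested list where fastqs[replicate][merge_group]
    is an [R1, R2] pair for paired-end runs.
    """
    inputs = json.loads(obj) if isinstance(obj, str) else obj
    fastqs = inputs["atac.fastqs"]
    paired = inputs.get("atac.paired_end", False)
    assert isinstance(fastqs, list) and all(isinstance(r, list) for r in fastqs)
    for rep in fastqs:
        for pair in rep:
            assert isinstance(pair, list), "expected three levels of nesting"
            if paired:
                assert len(pair) == 2, "paired-end runs need [R1, R2] pairs"
    return True
```

Running a check like this on the JSON file catches a missing bracket level before Cromwell spends time materializing the workflow.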
I am experiencing an issue with the call-filter step. The pipeline fails with the following stderr:
Traceback (most recent call last):
File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_filter.py", line 392, in <module>
main()
File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_filter.py", line 319, in main
filt_bam, args.out_dir)
File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_filter.py", line 176, in mark_dup_picard
run_shell_cmd(cmd)
File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
ln: failed to access ‘.flagstat.qc’: No such file or directory
ln: failed to access ‘.dup.qc’: No such file or directory
mkdir: cannot create directory ‘/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/execution/glob-37a6259cc0c1dae299a7866489dff0bd’: File exists
ln: failed to create hard link ‘/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/execution/glob-37a6259cc0c1dae299a7866489dff0bd/null’: File exists
ln: failed to access ‘.pbc.qc’: No such file or directory
mkdir: cannot create directory ‘/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/execution/glob-37a6259cc0c1dae299a7866489dff0bd’: File exists
ln: failed to create hard link ‘/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/execution/glob-37a6259cc0c1dae299a7866489dff0bd/null’: File exists
ln: failed to access ‘.mito_dup.txt’: No such file or directory
And stdout:
[2018-07-10 20:47:57,293 INFO] ['/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_filter.py', '/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/inputs/-4793129/SCS47_merge_R1.trim.merged.bam', '--paired-end', '--multimapping', '4', '--dup-marker', 'picard', '--mapq-thresh', '30']
[2018-07-10 20:47:57,294 INFO] Initializing and making output directory...
[2018-07-10 20:47:57,294 INFO] Removing unmapped/low-quality reads...
[2018-07-10 20:47:57,299 INFO] run_shell_cmd: PID=128254, CMD=samtools view -F 524 -f 2 -u /labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/cromwell-executions/atac/2ebc8ee7-ecae-441c-9df5-82898ae59666/call-filter/shard-0/inputs/-4793129/SCS47_merge_R1.trim.merged.bam | sambamba sort -n /dev/stdin -o SCS47_merge_R1.trim.merged.tmp_filt.bam -t 1
[2018-07-10 21:28:04,936 INFO] run_shell_cmd: PID=132462, CMD=samtools view -h SCS47_merge_R1.trim.merged.tmp_filt.bam -@ 1 | $(which assign_multimappers.py) -k 4 --paired-end | samtools fixmate -r /dev/stdin SCS47_merge_R1.trim.merged.fixmate.bam
[2018-07-10 21:41:49,466 INFO] run_shell_cmd: PID=133516, CMD=rm -f SCS47_merge_R1.trim.merged.tmp_filt.bam
[2018-07-10 21:41:50,121 INFO] run_shell_cmd: PID=133518, CMD=samtools view -F 1804 -f 2 -u SCS47_merge_R1.trim.merged.fixmate.bam | sambamba sort /dev/stdin -o SCS47_merge_R1.trim.merged.filt.bam -t 1
[2018-07-10 22:00:35,046 INFO] run_shell_cmd: PID=135008, CMD=rm -f SCS47_merge_R1.trim.merged.fixmate.bam
[2018-07-10 22:00:35,366 INFO] Marking dupes with picard...
[2018-07-10 22:00:35,371 INFO] run_shell_cmd: PID=135010, CMD=java -Xmx4G -jar $(which picard.jar) MarkDuplicates INPUT=SCS47_merge_R1.trim.merged.filt.bam OUTPUT=SCS47_merge_R1.trim.merged.dupmark.bam METRICS_FILE=SCS47_merge_R1.trim.merged.dup.qc VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true REMOVE_DUPLICATES=false
PID=135010: which: no picard.jar in (/home/mdegorte/miniconda3/envs/encode-atac-seq-pipeline/bin:/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src:/home/mdegorte/miniconda3/bin:/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src:/home/mdegorte/miniconda3/bin:/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src:/home/mdegorte/miniconda3/bin:/scg/slurm/current/bin:/scg/slurm/current/sbin:/scg/slurm/utils:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/mdegorte/.bds:/home/mdegorte/.globus-cli-virtualenv/bin:/home/mdegorte/bin:/home/mdegorte/.bds:/home/mdegorte/.globus-cli-virtualenv/bin:/home/mdegorte/.bds:/home/mdegorte/.globus-cli-virtualenv/bin:/home/mdegorte/bin)
PID=135010: Error: Unable to access jarfile MarkDuplicates
[2018-07-10 22:00:35,399 ERROR] Unknown exception caught. Killing process group 135010...
Traceback (most recent call last):
File "/labs/smontgom/mdegorte/ATAC/atac-seq-pipeline/src/encode_common.py", line 224, in run_shell_cmd
p.returncode, cmd)
CalledProcessError: Command 'java -Xmx4G -jar $(which picard.jar) MarkDuplicates INPUT=SCS47_merge_R1.trim.merged.filt.bam OUTPUT=SCS47_merge_R1.trim.merged.dupmark.bam METRICS_FILE=SCS47_merge_R1.trim.merged.dup.qc VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true REMOVE_DUPLICATES=false' returned non-zero exit status 1
It looks like an issue with Picard: `which` cannot find picard.jar on my PATH. Any help would be appreciated. Thanks!
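The stdout above shows the mechanism: `$(which picard.jar)` expands to nothing, so the command collapses into `java -Xmx4G -jar MarkDuplicates ...` and Java reports "Unable to access jarfile MarkDuplicates". A sketch of a fail-fast guard (the helper `build_markdup_cmd` is mine, not pipeline code; the command line mirrors the one in the log):

```python
import shutil

def build_markdup_cmd(in_bam, out_bam, metrics, mem_gb=4):
    """Build the Picard MarkDuplicates command, failing fast if picard.jar
    is not resolvable on PATH instead of letting `$(which picard.jar)`
    silently expand to nothing."""
    picard = shutil.which("picard.jar")
    if picard is None:
        raise FileNotFoundError(
            "picard.jar not found on PATH; activate the pipeline's "
            "Conda environment before running")
    return ["java", f"-Xmx{mem_gb}G", "-jar", picard, "MarkDuplicates",
            f"INPUT={in_bam}", f"OUTPUT={out_bam}", f"METRICS_FILE={metrics}",
            "VALIDATION_STRINGENCY=LENIENT", "ASSUME_SORTED=true",
            "REMOVE_DUPLICATES=false"]
```

If `picard.jar` is genuinely absent from the Conda environment, reinstalling the environment (or installing picard into it) is the usual fix.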
conda/requirements_py3.txt is missing requirements needed to run this pipeline. For example, running the pipeline under Python 3 fails because cutadapt is not installed; that requirement is absent from requirements_py3.txt.
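A quick way to confirm which dependencies are absent from the active environment before filing or debugging such a report; a sketch (the function and the example names are mine, and the authoritative dependency list lives in conda/requirements*.txt in the repo):

```python
import shutil
from importlib import util

def missing_pipeline_deps(modules=("cutadapt",), binaries=("cutadapt",)):
    """Report which Python modules and executables are absent from the
    current environment. The defaults here are examples only."""
    missing = [m for m in modules if util.find_spec(m) is None]
    missing += [f"bin:{b}" for b in binaries if shutil.which(b) is None]
    return missing
```

Running this inside the activated Conda environment makes it easy to see at a glance whether a failure is a genuine pipeline bug or a missing requirement.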
Hi,
I am guessing (please confirm) that Corces MR et al., 2017 used this pipeline to process their ATAC-seq data. I am wondering whether they also re-processed the previously published data (GM12878 and CD4+) from Buenrostro JD et al., 2013 with this pipeline; please let me know if that has been deposited under another accession number.
It would also save me considerable time and compute power to know where the processed bed/bigBed files for the Corces MR et al., 2017 paper are deposited.
Your help is very much appreciated!
Sincerely,
Satya
The pipeline runs and goes to completion, but when I check the qc report I get this error:
However, when I look at the directory I see the optimal set and conservative set files.
stderr output is:
ln: failed to access 'conservative_peak.*.hammock_gz*': No such file or directory
It doesn't appear to derail the pipeline, but something seems to go wrong when generating that file for downstream use. I pulled the most recent release before running this, and another run with an older release gave the same error.
OS/Platform and dependencies
Here's the entire debug log:
debug_68.tar.gz
I'm running the ATAC-seq pipeline on a toy dataset with one paired-end read set (2 fastq files, each 100 lines long). I would like the pipeline to output BAM files, but it keeps crashing at a stage near the end.
I'm following the NIH's guide for running the ATAC-seq pipeline, so I'm using the SLURM scheduler with the NIH's settings (backend, Cromwell, and WDL) and their stored hg19 genome file: https://hpc.nih.gov/apps/encode-atac-seq-pipeline.html. While I can run their example, I cannot get my own data to work.
I believe this issue has come up a few times in the past, but the original posters went inactive before a solution could be settled on; see https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/15 and https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/17. I did try loading picard, but that did not change my results.
Here is where the error message appears:
[2018-10-25 18:05:25,88] [error] WorkflowManagerActor Workflow 37f13417-8676-469f-b575-aa80b5ce6c24 failed (during ExecutingWorkflowState): cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
Bad output 'filter.mito_dup_log': Failed to find index Success(WomInteger(0)) on array:
Success([])
Here is my JSON input:
{
"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/fdb/encode-atac-seq-pipeline/hg19/hg19.tsv",
"atac.fastqs_rep1_R1" : ["SRX860_1.fastq.gz"],
"atac.fastqs_rep1_R2" : ["SRX860_2.fastq.gz"],
"atac.paired_end" : true,
"atac.align_only" : true
}
And here are the attached files:
debug_[44].tar.gz
SRX860_1.fastq.gz
SRX860_2.fastq.gz
Any help would be appreciated.
Thanks,
Jonathan
General Question
Hi Jin,
I am wondering whether there is any resume function in the script, as I could not find anything close to it. The idea is that if the pipeline breaks down at a certain task, the next run should not start from the beginning but resume from the failed task.
OS/Platform and dependencies
Thank you for the construction of the pipeline and all the hard work.
I'd like to confirm one thing: do those bam files have the reads shifted? Many groups shift reads +4/-5 bp, because of the adaptor insertion by the Tn5 transposase, before calling peaks with MACS2 or doing footprint analysis.
Do I need to shift the reads in the bam files, or re-do peak calling with the shifted bam?
Thank you very much!
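For reference, the +4/-5 convention the question refers to can be sketched as a one-liner on a BED6/tagAlign stream. This is a standalone illustration, not the pipeline's own code; whether the pipeline's bams are already shifted is for the maintainers to confirm.

```shell
# Hypothetical sketch of the Tn5 +4/-5 shift on BED6/tagAlign records:
# plus-strand reads move +4, minus-strand reads move -5, so the
# coordinates center on the Tn5 insertion site.
printf 'chr1\t100\t150\tN\t0\t+\nchr1\t200\t250\tN\t0\t-\n' |
awk 'BEGIN{OFS="\t"} $6=="+"{$2+=4;$3+=4} $6=="-"{$2-=5;$3-=5} {print}'
```

On real data the same awk command would be applied to a (decompressed) tagAlign file rather than the printf example shown here.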
Hi,
I've posted this issue on the cromwell github too.
So I'm running the ENCODE ATAC-seq pipeline on an SGE cluster.
We don't allow hard links at my facility (BeeGFS filesystem). I've therefore been trying to use the localization parameters in the Cromwell configuration file, but to no avail. The backend file is definitely being read, since I get error messages if I put an unsupported keyword in the localization array.
I've tried different versions of Cromwell (30.2, 31, 32, 32).
Here is the script generated by Cromwell based on my WDL file:
# make the directory which will keep the matching files
mkdir /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2
# symlink all the files into the glob directory
( ln -L merge_fastqs_R?_*.fastq.gz /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 2> /dev/null ) || ( ln merge_fastqs_R?_*.fastq.gz /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 )
# list all the files that match the glob into a file called glob-[md5 of glob].list
ls -1 /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 > /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2.list
I get the error when the script tries to symlink the files into the glob directory.
Here is the WDL code:
scatter( i in range(length(fastqs_)) ) {
# trim adapters and merge trimmed fastqs
call trim_adapter { input :
fastqs = fastqs_[i],
adapters = if length(adapters_)>0 then adapters_[i] else [],
paired_end = paired_end,
}
# align trimmed/merged fastqs with bowtie2
call bowtie2 { input :
idx_tar = bowtie2_idx_tar,
fastqs = trim_adapter.trimmed_merged_fastqs, #[R1,R2]
paired_end = paired_end,
multimapping = multimapping,
}
}
And the task definition:
task trim_adapter { # trim adapters and merge trimmed fastqs
# parameters from workflow
Array[Array[File]] fastqs # [merge_id][read_end_id]
Array[Array[String]] adapters # [merge_id][read_end_id]
Boolean paired_end
# mandatory
Boolean? auto_detect_adapter # automatically detect/trim adapters
# optional
Int? min_trim_len # minimum trim length for cutadapt -m
Float? err_rate # Maximum allowed adapter error rate
# for cutadapt -e
# resource
Int? cpu
Int? mem_mb
Int? time_hr
# Commenting this line as a test. Problem with hard link
String? disks
command {
python $(which encode_trim_adapter.py) \
${write_tsv(fastqs)} \
--adapters ${write_tsv(adapters)} \
${if paired_end then "--paired-end" else ""} \
${if select_first([auto_detect_adapter,false]) then "--auto-detect-adapter" else ""} \
${"--min-trim-len " + select_first([min_trim_len,5])} \
${"--err-rate " + select_first([err_rate,'0.1'])} \
${"--nth " + select_first([cpu,2])}
}
output {
# WDL glob() globs in an alphabetical order
# so R1 and R2 can be switched, which results in an
# unexpected behavior of a workflow
# so we prepend merge_fastqs_'end'_ (R1 or R2)
# to the basename of original filename
# this prefix will be later stripped in bowtie2 task
Array[File] trimmed_merged_fastqs = glob("merge_fastqs_R?_*.fastq.gz")
}
runtime {
cpu : select_first([cpu,2])
memory : "${select_first([mem_mb,'12000'])} MB"
time : select_first([time_hr,24])
disks : select_first([disks,"local-disk 100 HDD"])
}
}
My backend.conf:
include required(classpath("application"))
backend {
default="SGE"
providers {
SGE {
actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
config {
concurrent-job-limit = 10000
runtime-attributes= """
Int? cpu=1
Int? memory=4
String? disks
String? time
String? preemptible
"""
submit = """
qsub \
-terse \
-V \
-b n \
-wd ${cwd} \
-N ${job_name} \
${'-pe smp ' + cpu} \
${'-l h_vmem=' + memory + "G"} \
-o ${out} \
-e ${err} \
${script}
"""
kill = "qdel ${job_id}"
check-alive = "qstat -j ${job_id}"
job-id-regex = "(\\d+)"
filesystems {
local {
localization: [
"soft-link","copy","hard-link"
]
caching {
duplication-strategy: [ "soft-link","copy","hard-link"]
hashing-strategy: "file"
}
}
}
}
}
}
}
engine{
filesystems{
local{
localization: [
"soft-link","copy","hard-link"
]
caching {
duplication-strategy: [ "soft-link","copy","hard-link"]
hashing-strategy: "file"
}
}
}
}
I wonder if there is something wrong with my config files or if Cromwell's localization is at fault.
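For comparison, the localization order the config requests (soft-link, then copy, then hard-link) would behave like the sketch below, while the generated glob script quoted above hard-codes `ln -L`/`ln`, which appears to be why the setting has no effect there. This is a hypothetical standalone illustration, not Cromwell's code.

```shell
# Sketch of the requested localization order: soft link first, then a
# copy, and a hard link only as the last resort.
localize() {
  ln -s "$1" "$2" 2>/dev/null || cp "$1" "$2" 2>/dev/null || ln "$1" "$2"
}
tmp=$(mktemp -d)
echo hello > "$tmp/src.txt"
localize "$tmp/src.txt" "$tmp/dst.txt"
cat "$tmp/dst.txt"   # prints "hello" via the symlink
```

On a filesystem that forbids hard links, the first branch (`ln -s`) succeeds and the `ln` fallback is never reached, which is the behavior the backend config above is trying to request.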
Hello,
When I ran a low number of reads through the pipeline (9,800 reads), I didn't get some of the QC data, such as chrM% and MAPQ-filtered%. Is there a threshold for read count?
I have been following the tutorial for running the pipeline on SGE using Conda. When I run install_dependencies.sh I hit the following error and the conda environment is not created:
`Verifying transaction: failed
PaddingError: Placeholder of length '80' too short in package /ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/miniconda3/envs/encode-atac-seq-pipeline-python3/bin/hb-ot-shape-closure.
The package must be rebuilt with conda-build > 2.0.`
I looked into my conda-build version, and it appears to be conda-build 3.17.5, so I don't see how that could be contributing to this error (unless the read/write filter support indicated in the output matters?):
`ha4c6n8:atac-seq-pipeline(master)] conda build -V
read filter "zstd" is not supported
write filter "zstd" is not supported
conda-build 3.17.5`
I deleted the pipeline and re-started the tutorial from scratch, and still ran into the same error. This is the entire run information:
`ha4c6n8:atac-seq-pipeline(master)] bash conda/uninstall_dependencies.sh
/ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/bin/miniconda3/bin/conda
=== Found Conda (conda 4.5.12).
=== Pipeline's py3 Conda env (encode-atac-seq-pipeline-python3) does not exist or has already been removed.
=== Pipeline's Conda env (encode-atac-seq-pipeline) does not exist or has already been removed.
ha4c6n8:atac-seq-pipeline(master)] bash conda/install_dependencies.sh
/ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/bin/miniconda3/bin/conda
=== Found Conda (conda 4.5.12).
=== Installing packages for python3 env...
Solving environment: done
## Package Plan ##
environment location: /ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/miniconda3/envs/encode-atac-seq-pipeline-python3
added / updated specs:
- bedtools==2.26.0
- idr==2.0.4.2
- java-jdk==8.0.92
- libgcc==5.2.0
- matplotlib==1.5.1
- ncurses==6.1
- numpy==1.11.3
- openblas==0.2.20
- python==3.5.0
- tabix==0.2.6
The following NEW packages will be INSTALLED:
bedtools: 2.26.0-0 bioconda
blas: 1.1-openblas conda-forge
ca-certificates: 2018.11.29-ha4d7672_0 conda-forge
cairo: 1.14.6-0 conda-forge
certifi: 2018.8.24-py35_1001 conda-forge
cycler: 0.10.0-py_1 conda-forge
fontconfig: 2.11.1-6 conda-forge
freetype: 2.6.3-1 conda-forge
gettext: 0.19.8.1-h5e8e0c9_1 conda-forge
glib: 2.56.2-h464dc38_1 conda-forge
harfbuzz: 1.0.6-1 conda-forge
icu: 56.1-4 conda-forge
idr: 2.0.4.2-py35h24bf2e0_0 bioconda
java-jdk: 8.0.92-1 bioconda
jpeg: 9c-h470a237_1 conda-forge
libffi: 3.2.1-hfc679d8_5 conda-forge
libgcc: 5.2.0-0 conda-forge
libgcc-ng: 7.2.0-hdf63c60_3 conda-forge
libgfortran: 3.0.0-1 conda-forge
libiconv: 1.15-h470a237_3 conda-forge
libpng: 1.6.36-ha92aebf_0 conda-forge
libstdcxx-ng: 7.2.0-hdf63c60_3 conda-forge
libtiff: 4.0.6-5 conda-forge
libxml2: 2.9.3-8 conda-forge
matplotlib: 1.5.1-np111py35_4 conda-forge
ncurses: 6.1-hfc679d8_2 conda-forge
numpy: 1.11.3-py35_blas_openblashd3ea46f_205 conda-forge [blas_openblas]
openblas: 0.2.20-8 conda-forge
openssl: 1.0.2p-h470a237_1 conda-forge
pango: 1.40.1-0 conda-forge
pcre: 8.41-hfc679d8_3 conda-forge
pip: 18.0-py35_1001 conda-forge
pixman: 0.34.0-h470a237_3 conda-forge
pyparsing: 2.3.0-py_0 conda-forge
pyqt: 4.11.4-py35_3 conda-forge
python: 3.5.0-1
python-dateutil: 2.7.5-py_0 conda-forge
pytz: 2018.7-py_0 conda-forge
qt: 4.8.7-6 conda-forge
readline: 6.2-2
scipy: 1.1.0-py35_blas_openblash7943236_201 conda-forge [blas_openblas]
setuptools: 40.4.3-py35_0 conda-forge
sip: 4.18-py35_1 conda-forge
six: 1.11.0-py35_1 conda-forge
sqlite: 3.19.3-1 conda-forge
tabix: 0.2.6-ha92aebf_0 bioconda
tk: 8.5.19-2 conda-forge
wheel: 0.32.0-py35_1000 conda-forge
xz: 5.0.5-1 conda-forge
zlib: 1.2.11-h470a237_3 conda-forge
Preparing transaction: done
Verifying transaction: failed
PaddingError: Placeholder of length '80' too short in package /ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/miniconda3/envs/encode-atac-seq-pipeline-python3/bin/hb-ot-shape-closure.
The package must be rebuilt with conda-build > 2.0.`
Thanks for any guidance.
Paul
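For what it's worth, the PaddingError message refers to how the package was built upstream, not to the local conda-build version: packages built with an 80-character relocation placeholder cannot be installed into a longer prefix. The env path in the error is easy to check, and a common workaround (an assumption, not an official fix) is reinstalling Miniconda under a shorter prefix and rerunning conda/install_dependencies.sh.

```shell
# The env prefix from the error message above is 100 characters long,
# which exceeds the 80-char placeholder baked into hb-ot-shape-closure.
prefix=/ifs/scratch/columbia/CSCI/Passegue/Paul_Dellorusso/miniconda3/envs/encode-atac-seq-pipeline-python3
printf '%s' "$prefix" | wc -c   # 100 — longer than the 80-char placeholder
```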
Hi Jin,
I'm a little new to programming, so I hope you will bear with me.
I installed the pipeline on our cluster following the SLURM instructions, and downloaded the test data and genome. I think everything worked fine.
$ ls atac-seq-pipeline/
atac.wdl  conda  cromwell-workflow-logs  ENCSR356KRQ_fastq_subsampled.tar  Jenkinsfile  src  test_genome_database_hg38_atac.tar  backends  cromwell-34.jar  docker_image  examples  LICENSE  test  test_sample  bash_logs  cromwell-executions  docs  genome  README.md  test_genome_database  workflow_opts
I didn't fully understand step 5, but I think I filled in the right info.
$ cat workflow_opts/slurm.json
{
    "default_runtime_attributes" : {
        "slurm_partition" : "neuro-largemem",
        "slurm_account" : "neuro",
        "singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.1.simg"
    }
}
However, when I try to run the pipeline on the test data, it starts but errors out.
[2018-11-28 17:42:19,32] [error] WorkflowManagerActor Workflow 09488c8c-8169-4359-9a30-44eb45387723 failed (during ExecutingWorkflowState): Job atac.trim_adapter:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details. Check the content of stderr for potential additional information: /home/eaclark/software/atac-seq-pipeline/cromwell-executions/atac/09488c8c-8169-4359-9a30-44eb45387723/call-trim_adapter/shard-0/execution/stderr. ImportError: No module named site ln: failed to access ‘merge_fastqs_R?_*.fastq.gz’: No such file or directory
I'm not sure why it can't import the module; could you give me some advice?
Thank you!
debug_[eirn.a.clark].tar.gz
The pipeline fails in the call-filter step:
INFO 2018-12-02 11:16:03 MarkDuplicates Reads are assumed to be ordered by: coordinate
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f8211ec5344, pid=65067, tid=65200
#
# JRE version: OpenJDK Runtime Environment (11.0.1+13) (build 11.0.1+13-LTS)
# Java VM: OpenJDK 64-Bit Server VM (11.0.1+13-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x7f8344] G1ParScanThreadState::copy_to_survivor_space(InCSetState, oopDesc*, markOopDesc*)+0x334
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /mnt/lab_data/montgomery/nicolerg/motrpac/atac/pipeline-output/test-new/cromwell-executions/atac/938324f7-a83a-4947-809e-e44338f6374b/call-filter/shard-0/execution/core.65067)
#
# An error report file with more information is saved as:
# /mnt/lab_data/montgomery/nicolerg/motrpac/atac/pipeline-output/test-new/cromwell-executions/atac/938324f7-a83a-4947-809e-e44338f6374b/call-filter/shard-0/execution/hs_err_pid65067.log
#
# If you would like to submit a bug report, please visit:
# http://www.azulsystems.com/support/
#
Aborted (core dumped)
I am using the "atac.keep_irregular_chr_in_bfilt_peak" : false option.
OS/Platform and dependencies
Error logs are attached.
debug_issue63.tar.gz
Hi, when I try to install the conda dependencies I get this error. Can I install the dependencies manually?
Or do you have an idea of how I can fix this? This is the only error I get, one second after starting the shell script.
I tried in two different places and got the exact same error. Also, the documentation says:
bash installers/uninstall_dependencies.sh
but uninstall_dependencies.sh is located in the conda folder, not an installers folder.
Thank you.
bash conda/install_dependencies.sh
/ru-auth/local/home/trezende/localPrograms/Miniconda/bin/conda
=== Found Conda (conda 4.5.11).
=== Installing packages for python3 env...
Solving environment: failed
CondaHTTPError: HTTP 404 NOT FOUND for url <https://conda.anaconda.org/r/noarch/repodata.json>
Elapsed: 00:00.020204
CF-RAY: 45d006588c6d9200-EWR
The remote server could not find the noarch directory for the
requested channel with url: https://conda.anaconda.org/r
As of conda 4.3, a valid channel must contain a `noarch/repodata.json` and
associated `noarch/repodata.json.bz2` file, even if `noarch/repodata.json` is
empty. please request that the channel administrator create
`noarch/repodata.json` and associated `noarch/repodata.json.bz2` files.
$ mkdir noarch
$ echo '{}' > noarch/repodata.json
$ bzip2 -k noarch/repodata.json
You will need to adjust your conda configuration to proceed.
Use `conda config --show channels` to view your configuration's current state.
Further configuration help can be found at <https://conda.io/docs/config.html>.
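The 404 points at a stale `r` channel in the conda configuration (conda.anaconda.org/r no longer serves repodata for it). An assumed fix, rather than creating the noarch files the message suggests, is to remove that channel so `~/.condarc` lists only the channels the installer needs (the channel order shown is a guess):

```
channels:
  - bioconda
  - conda-forge
  - defaults
```

Equivalently, `conda config --remove channels r` should drop the broken channel from the existing configuration.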
Hi Jin,
Could you provide an example .json file for starting with nodup bam instead of fastq?
Thanks,
Kirsty
I'm running the pipeline with Singularity on Sherlock 2.0 and the xcor job doesn't work on the test data:
[2018-10-14 01:26:02,28] [error] WorkflowManagerActor Workflow b3ff0230-746e-42f6-bea3-9a6aafa8ceb2 failed (during ExecutingWorkflowState): Job atac.xcor:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /scratch/users/knoedler/atac-seq-pipeline/cromwell-executions/atac/b3ff0230-746e-42f6-bea3-9a6aafa8ceb2/call-xcor/shard-0/execution/stderr.
Traceback (most recent call last):
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 102, in
main()
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 91, in main
ta_subsampled, args.speak, args.nth, args.out_dir)
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 53, in xcor
run_shell_cmd(cmd1)
File "/software/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
ln: failed to access '.cc.plot.pdf': No such file or directory
ln: failed to access '.cc.plot.png': No such file or directory
ln: failed to access '.cc.qc': No such file or directory
ln: failed to access '.cc.fraglen.txt': No such file or directory
Job atac.xcor:1:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /scratch/users/knoedler/atac-seq-pipeline/cromwell-executions/atac/b3ff0230-746e-42f6-bea3-9a6aafa8ceb2/call-xcor/shard-1/execution/stderr.
Traceback (most recent call last):
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 102, in
main()
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 91, in main
ta_subsampled, args.speak, args.nth, args.out_dir)
File "/software/atac-seq-pipeline/src/encode_xcor.py", line 53, in xcor
run_shell_cmd(cmd1)
File "/software/atac-seq-pipeline/src/encode_common.py", line 230, in run_shell_cmd
os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
ln: failed to access '.cc.plot.pdf': No such file or directory
ln: failed to access '.cc.plot.png': No such file or directory
ln: failed to access '.cc.qc': No such file or directory
ln: failed to access '.cc.fraglen.txt': No such file or directory
Users lost access to test sample fastqs and genome data.
Both URLs work in a web browser, but the old one doesn't work with wget.
We need to replace storage.cloud.google.com with storage.googleapis.com.
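The substitution can be sketched as a sed one-liner; the example URL path below is hypothetical, only the host rewrite matches the fix described above.

```shell
# Rewrite the console-style host into the direct download host so wget works.
old='https://storage.cloud.google.com/encode-pipeline-test-samples/example.fastq.gz'
echo "$old" | sed 's/storage\.cloud\.google\.com/storage.googleapis.com/'
```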
Are TSS fold enrichment or FRiP for enhancers reported by this pipeline? An older version of the pipeline included those stats in the HTML report, but I cannot find them in these newer HTML reports (human or custom reference). I was able to find the TSS enrichment plots (distance from TSS vs average read coverage) but not a specific TSS fold enrichment value.
OS/Platform and dependencies
Hi,
Would it be interesting to you to simplify the conda installation procedure?
I would be willing to help. There are a couple of things that could be done:
My goal would be to be able to set up the environment directly from an environment file.
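As a starting point, a hypothetical environment.yml for the py3 env could pin the same specs the installer requests (names/versions copied from the install log quoted earlier on this page; an untested sketch, not an official file):

```
name: encode-atac-seq-pipeline-python3
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.5.0
  - bedtools=2.26.0
  - idr=2.0.4.2
  - java-jdk=8.0.92
  - matplotlib=1.5.1
  - ncurses=6.1
  - numpy=1.11.3
  - openblas=0.2.20
  - tabix=0.2.6
```

With such a file, `conda env create -f environment.yml` would build the whole env in one step.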
OS/Platform and dependencies
I get an error while running the test data through the pipeline saying it failed to find index Success(WomInteger(0)) on an array. The error is shown below:
WorkflowManagerActor Workflow 7a43f040-d436-4dbc-9596-8fbbccfa2827 failed (during ExecutingWorkflowState): cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
Bad output 'filter.flagstat_qc': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'filter.dup_qc': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'filter.pbc_qc': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
Bad output 'filter.mito_dup_log': Failed to find index Success(WomInteger(0)) on array:
Success([])
0
at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$handleExecutionSuccess$1(StandardAsyncExecutionActor.scala:839)
at scala.util.Success.$anonfun$map$1(Try.scala:251)
at scala.util.Success.map(Try.scala:209)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:288)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
I have attached the tarball.
I have also captured my session using script, which is attached as the typescript file.
I followed the instructions posted on GitHub in the "Tutorial for general UNIX computers without docker". I'm not sure what this error message means and was hoping you had some advice. Thanks!
Hi,
I was testing the local pipeline with Conda, and it threw an error. I can't tell what the problem is from this error message. Can you tell me what I could be doing wrong?
This is the command I used:
java -jar -Dconfig.file=backends/backend.conf cromwell-34.jar run atac.wdl -i examples/local/ENCSR356KRQ_subsampled.json | tee -a output.txt
Thanks!
[2018-09-20 16:10:58,17] [info] BackgroundConfigAsyncJobExecutionActor [142a7da3atac.pool_ta:NA:1]: Status change from WaitingForReturnCodeFile to Done
[2018-09-20 16:15:14,03] [info] BackgroundConfigAsyncJobExecutionActor [142a7da3atac.macs2:0:1]: Status change from WaitingForReturnCodeFile to Done
[2018-09-20 16:15:46,95] [info] BackgroundConfigAsyncJobExecutionActor [142a7da3atac.macs2:1:1]: Status change from WaitingForReturnCodeFile to Done
[2018-09-20 16:15:47,22] [error] WorkflowManagerActor Workflow 142a7da3-bbb8-4762-9e12-e11886eb6c0c failed (during ExecutingWorkflowState): Job atac.xcor:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/trezende/atac-seq-pipeline/cromwell-executions/atac/142a7da3-bbb8-4762-9e12-e11886eb6c0c/call-xcor/shard-0/execution/stderr.
Traceback (most recent call last):
File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 102, in <module>
main()
File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 91, in main
ta_subsampled, args.speak, args.nth, args.out_dir)
File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 53, in xcor
run_shell_cmd(cmd1)
File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_common.py", line 230, in run_shell_cmd
os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
Job atac.xcor:1:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/trezende/atac-seq-pipeline/cromwell-executions/atac/142a7da3-bbb8-4762-9e12-e11886eb6c0c/call-xcor/shard-1/execution/stderr.
Traceback (most recent call last):
File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 102, in <module>
main()
File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 91, in main
ta_subsampled, args.speak, args.nth, args.out_dir)
File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_xcor.py", line 53, in xcor
run_shell_cmd(cmd1)
File "/home/trezende/anaconda3/envs/encode-atac-seq-pipeline/bin/encode_common.py", line 230, in run_shell_cmd
os.killpg(pgid, signal.SIGKILL)
OSError: [Errno 3] No such process
[2018-09-20 16:15:47,22] [info] WorkflowManagerActor WorkflowActor-142a7da3-bbb8-4762-9e12-e11886eb6c0c is in a terminal state: WorkflowFailedState
[2018-09-20 16:15:59,19] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-09-20 16:16:03,74] [info] Workflow polling stopped
Hi Jin,
The atac-seq pipeline fails at BackgroundConfigAsyncJobExecutionActor. I'm also getting a warning that "Localization via hard link has failed". Attached are my script and the output. I'm using conda version 4.5.11 and the .json file you provide.
Thanks,
Kirsty
Hi Jin,
I have pulled the ATAC-seq pipeline and started experimenting with the test data. Initially it ran and completed. Today I am getting the following error. I am using the sge_singularity backend to run the pipeline.
$ singularity --version
2.5.2-dist
The Singularity image pulled for ATAC-seq is as follows
$ ls -lrth ~/.singularity/
total 2.6G
-rwxr-xr-x 1 padmanabs1 reslnusers 1.1G Sep 21 09:42 chip-seq-pipeline-v1.1.simg
drwxr-xr-x 2 padmanabs1 reslnusers 4.4K Sep 24 09:09 docker
drwxr-xr-x 2 padmanabs1 reslnusers 192 Sep 24 09:09 metadata
-rwxr-xr-x 1 padmanabs1 reslnusers 1.2G Sep 24 09:10 atac-seq-pipeline-v1.1.simg
And the commands to run the pipeline are
INPUT=examples/local/ENCSR356KRQ_subsampled.json
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=sge_singularity cromwell-34.jar run atac.wdl -i ${INPUT} -o workflow_opts/sge.json
More info on sge.json
$ cat workflow_opts/sge.json
{
"default_runtime_attributes" : {
"sge_pe" : "smp",
"sge_queue" : "all.q",
"singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.simg"
}
}
The error I am getting is
[2018-09-26 13:51:45,54] [error] WorkflowManagerActor Workflow 59fb6fa8-c5bc-4928-9d8a-6a8fea701b24 failed (during ExecutingWorkflowState): Job atac.trim_adapter:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
It seems the job is unable to access the directory.
Check the content of stderr for potential additional information: $ cromwell-executions/atac/59fb6fa8-c5bc-4928-9d8a-6a8fea701b24/call-trim_adapter/shard-0/execution/stderr.
Traceback (most recent call last):
File "/software/atac-seq-pipeline/src/encode_trim_adapter.py", line 269, in <module>
main()
File "/software/atac-seq-pipeline/src/encode_trim_adapter.py", line 233, in main
fastqs = ret_val.get(BIG_INT)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
OSError: [Errno 3] No such process
ln: failed to access 'merge_fastqs_R?_*.fastq.gz': No such file or directory
Please help me fix this issue.
OS/Platform and dependencies
Attach logs
I have attached the logs.
debug_10.tar.gz
Describe the bug
Hi Jin,
I am following the atac-seq pipeline documentation for running the pipeline in Cromwell server mode with SGE Singularity. I have started the Cromwell server on a qlogin interactive node using the command
$ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backend.conf -Dbackend.default=sge_singularity cromwell-34.jar server
2018-10-02 12:39:42,353 cromwell-system-akka.dispatchers.engine-dispatcher-7 INFO - Cromwell 34 service started on 0:0:0:0:0:0:0:0:8000...
After that I ran the following commands to submit the pipeline:
$ INPUT=ENCSR356KRQ_subsampled.json
$ curl -X POST --header "Accept: application/json" -v "ServerIP:8000/api/workflows/v1" \
-F workflowSource=@atac.wdl \
-F workflowInputs=@${INPUT} \
-F workflowOptions=@workflow_opts/sge.json
The job was submitted to the Cromwell server running on the qlogin interactive node, but I get the following error in the first step of the pipeline, which reads the test genome.
2018-10-02 12:40:56,446 cromwell-system-akka.dispatchers.backend-dispatcher-74 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(5d79e08e)atac.read_genome_tsv:NA:1]: `cat $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/inputs/1073766326/hg38_local.tsv`
2018-10-02 12:40:56,494 cromwell-system-akka.dispatchers.backend-dispatcher-74 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(5d79e08e)atac.read_genome_tsv:NA:1]: executing: echo "chmod u+x $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/script && SINGULARITY_BINDPATH=$(echo $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv | sed 's/cromwell-executions/\n/g' | head -n1) singularity exec ~/.singularity/atac-seq-pipeline-v1.1.simg $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/script" | qsub \
-terse \
-b n \
-N cromwell_5d79e08e_read_genome_tsv \
-wd $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv \
-o $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stdout \
-e $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr \
\
-l h_vmem=4000m \
-l s_vmem=4000m \
-l h_rt=3600 \
-l s_rt=3600 \
-q all.q \
\
\
-V
2018-10-02 12:40:57,094 cromwell-system-akka.dispatchers.backend-dispatcher-72 ERROR - DispatchedConfigAsyncJobExecutionActor [UUID(5d79e08e)atac.read_genome_tsv:NA:1]: Error attempting to Execute
java.lang.RuntimeException: Could not find job ID from stdout file. Check the stderr file for possible errors: $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr.submit
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.getJob(ConfigAsyncJobExecutionActor.scala:226)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.$anonfun$execute$2(SharedFileSystemAsyncJobExecutionActor.scala:133)
at scala.util.Either.fold(Either.scala:188)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:126)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:121)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:600)
at scala.util.Try$.apply(Try.scala:209)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:600)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:600)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:915)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:907)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:208)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-10-02 12:40:57,673 cromwell-system-akka.dispatchers.engine-dispatcher-66 ERROR - WorkflowManagerActor Workflow 5d79e08e-65d8-49bd-8a66-632b5cdf284f failed (during ExecutingWorkflowState): cromwell.core.CromwellFatalException: java.lang.RuntimeException: Could not find job ID from stdout file. Check the stderr file for possible errors: $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr.submit
at cromwell.core.CromwellFatalException$.apply(core.scala:18)
at cromwell.core.retry.Retry$$anonfun$withRetry$1.applyOrElse(Retry.scala:38)
at cromwell.core.retry.Retry$$anonfun$withRetry$1.applyOrElse(Retry.scala:37)
at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:413)
at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: Could not find job ID from stdout file. Check the stderr file for possible errors: $ATAC/cromwell-executions/atac/5d79e08e-65d8-49bd-8a66-632b5cdf284f/call-read_genome_tsv/execution/stderr.submit
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.getJob(ConfigAsyncJobExecutionActor.scala:226)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.$anonfun$execute$2(SharedFileSystemAsyncJobExecutionActor.scala:133)
at scala.util.Either.fold(Either.scala:188)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:126)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:121)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:600)
at scala.util.Try$.apply(Try.scala:209)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:600)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:600)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:915)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:907)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:208)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
... 4 more
2018-10-02 12:40:57,678 cromwell-system-akka.dispatchers.engine-dispatcher-66 INFO - WorkflowManagerActor WorkflowActor-5d79e08e-65d8-49bd-8a66-632b5cdf284f is in a terminal state: WorkflowFailedState
OS/Platform and dependencies
Attach logs
I am attaching the logs here
debug_40.tar.gz
Hi, thanks for your wonderful work.
I run the pipeline in /mypath/atac-seq-pipeline/
after source activate encode-atac-seq-pipeline:
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /my_path/local/bin/cromwell-34.jar run atac.wdl -i /my_path1/input.json -o /my_path2/atac-seq-pipeline/workflow_opts/slurm.json
But only one directory named "cromwell-workflow-logs" is left, and there is nothing in it:
Jenkinsfile LICENSE README.md atac.wdl backends conda **cromwell-workflow-logs** docker_image docs examples genome src test workflow_opts
What's more, while it was running it showed the following on the screen:
[2018-09-08 09:23:52,43] [info] Running with database db.url = jdbc:hsqldb:mem:a42fb754-58fc-418e-8224-01cd57b5b131;shutdown=false;hsqldb.tx=mvcc
[2018-09-08 09:24:01,66] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-09-08 09:24:01,67] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-09-08 09:24:01,78] [info] Running with database db.url = jdbc:hsqldb:mem:8c25714f-6a58-4b03-bf8d-b686ee8442fc;shutdown=false;hsqldb.tx=mvcc
[2018-09-08 09:24:02,13] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-08 09:24:02,16] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-09-08 09:24:02,16] [info] Using noop to send events.
[2018-09-08 09:24:02,44] [info] Slf4jLogger started
[2018-09-08 09:24:02,66] [info] Workflow heartbeat configuration:
{
"cromwellId" : "cromid-d9e2d67",
"heartbeatInterval" : "2 minutes",
"ttl" : "10 minutes",
"writeBatchSize" : 10000,
"writeThreshold" : 10000
}
[2018-09-08 09:24:02,69] [info] Metadata summary refreshing every 2 seconds.
[2018-09-08 09:24:02,72] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-08 09:24:02,72] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-08 09:24:02,72] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-09-08 09:24:03,69] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-09-08 09:24:03,71] [info] SingleWorkflowRunnerActor: Version 34
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] PAPIQueryManager Running with 3 workers
[2018-09-08 09:24:03,72] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-09-08 09:24:03,77] [info] Unspecified type (Unspecified version) workflow 1e03bf36-d64b-42a7-9857-a644de257de3 submitted
[2018-09-08 09:24:03,82] [info] SingleWorkflowRunnerActor: Workflow submitted 1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,82] [info] 1 new workflows fetched
[2018-09-08 09:24:03,82] [info] WorkflowManagerActor Starting workflow 1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,83] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-09-08 09:24:03,83] [info] WorkflowManagerActor Successfully started WorkflowActor-1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,83] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-09-08 09:24:03,85] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-09-08 09:24:03,89] [info] MaterializeWorkflowDescriptorActor [1e03bf36]: Parsing workflow as WDL draft-2
[2018-09-08 09:24:22,52] [error] WorkflowManagerActor Workflow 1e03bf36-d64b-42a7-9857-a644de257de3 failed (during MaterializingWorkflowDescriptorState): cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anon$1: Workflow input processing failed:
Unexpected character ']' at input index 643 (line 13, position 5), expected JSON Value:
],
^
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.cromwell$engine$workflow$lifecycle$materialization$MaterializeWorkflowDescriptorActor$$workflowInitializationFailed(MaterializeWorkflowDescriptorActor.scala:200)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:170)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:165)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at akka.actor.FSM.processEvent(FSM.scala:670)
at akka.actor.FSM.processEvent$(FSM.scala:667)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.akka$actor$LoggingFSM$$super$processEvent(MaterializeWorkflowDescriptorActor.scala:123)
at akka.actor.LoggingFSM.processEvent(FSM.scala:806)
at akka.actor.LoggingFSM.processEvent$(FSM.scala:788)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.processEvent(MaterializeWorkflowDescriptorActor.scala:123)
at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:664)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:658)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.aroundReceive(MaterializeWorkflowDescriptorActor.scala:123)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[2018-09-08 09:24:22,52] [info] WorkflowManagerActor WorkflowActor-1e03bf36-d64b-42a7-9857-a644de257de3 is in a terminal state: WorkflowFailedState
[2018-09-08 09:24:25,09] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-09-08 09:24:27,74] [info] Workflow polling stopped
[2018-09-08 09:24:27,76] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2018-09-08 09:24:27,76] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2018-09-08 09:24:27,76] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2018-09-08 09:24:27,77] [info] Aborting all running workflows.
[2018-09-08 09:24:27,77] [info] JobExecutionTokenDispenser stopped
[2018-09-08 09:24:27,77] [info] WorkflowStoreActor stopped
[2018-09-08 09:24:27,78] [info] WorkflowLogCopyRouter stopped
[2018-09-08 09:24:27,78] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2018-09-08 09:24:27,78] [info] WorkflowManagerActor All workflows finished
[2018-09-08 09:24:27,78] [info] WorkflowManagerActor stopped
[2018-09-08 09:24:27,78] [info] Connection pools shut down
[2018-09-08 09:24:27,78] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] SubWorkflowStoreActor stopped
[2018-09-08 09:24:27,79] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] JobStoreActor stopped
[2018-09-08 09:24:27,79] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] CallCacheWriteActor stopped
[2018-09-08 09:24:27,79] [info] KvWriteActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] DockerHashActor stopped
[2018-09-08 09:24:27,79] [info] IoProxy stopped
[2018-09-08 09:24:27,79] [info] ServiceRegistryActor stopped
[2018-09-08 09:24:27,81] [info] Database closed
[2018-09-08 09:24:27,81] [info] Stream materializer shut down
Workflow 1e03bf36-d64b-42a7-9857-a644de257de3 transitioned to state Failed
[2018-09-08 09:24:27,85] [info] Automatic shutdown of the async connection
[2018-09-08 09:24:27,85] [info] Gracefully shutdown sentry threads.
[2018-09-08 09:24:27,85] [info] Shutdown finished.
I followed https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/tutorial_slurm.md, since I need to run on my school's SLURM cluster, not my local PC and not Stanford University's SLURM.
Do you have any advice about my two errors?
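The `Unexpected character ']'` message in the log above is the kind of error a trailing comma in input.json produces; JSON forbids a comma before a closing bracket. A minimal standard-library check (the sample string below is a hypothetical reproduction, not the actual input file):

```python
import json

# Hypothetical minimal reproduction: a trailing comma before ']' is invalid JSON.
bad = '{"atac.fastqs": ["a.fastq.gz", ]}'
try:
    json.loads(bad)
except json.JSONDecodeError as err:
    # Cromwell's message ("line 13, position 5") points at the same thing:
    # the parser expected a JSON value where the ']' appears.
    print("parse error at line %d, column %d: %s" % (err.lineno, err.colno, err.msg))
```

Running the input file through `python -m json.tool input.json` locates the offending line before handing it to Cromwell.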
Incompatibility issues with custom reference genome
MACS2 peak calling fails if the FASTA file used to build a custom genome database does not follow the chr[\dXY] naming convention. For example, I am using the Ensembl masked version of the rat genome (rn6, release 94) found at ftp://ftp.ensembl.org/pub/release-94/fasta/rattus_norvegicus/dna/, which does not prepend 'chr' to chromosome names. The error is produced by the following call:
[2018-10-31 18:06:20,887 ERROR] Unknown exception caught. Killing process group 72093...
Traceback (most recent call last):
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_common.py", line 224, in run_shell_cmd
p.returncode, cmd)
CalledProcessError: Command 'cat /mnt/lab_data/montgomery/nicolerg/motrpac/atac/pipeline-output/cromwell-executions/atac/03a28d2a-f364-4ce1-bfd5-f10488cf42a9/call-macs2/shard-0/inputs/-78707573/rn6_masked.chrom.sizes | grep -P 'chr[\dXY]+[ \t]' > 20180725_2_Gastroc_002_powder_S1_L001_R1_001.trim.merged.nodup.tn5.pval0.01.300K.bfilt.chrsz.tmp' returned non-zero exit status 1
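One possible workaround, sketched below on a toy chrom.sizes file: convert Ensembl-style names to the UCSC chr-prefixed convention before building the genome database. This assumes renaming the FASTA/chrom.sizes consistently is acceptable for your analysis; 'MT' to 'chrM' is the usual special case.

```shell
# Toy Ensembl-style chrom.sizes (no 'chr' prefix, mitochondrion named 'MT')
printf '1\t282763074\nX\t159970021\nMT\t16313\n' > ensembl.chrom.sizes

# Prepend 'chr' and map MT -> M so names match the chr[\dXY]+ pattern
awk 'BEGIN{FS=OFS="\t"} {n=$1; if (n=="MT") n="M"; print "chr" n, $2}' \
    ensembl.chrom.sizes > ucsc.chrom.sizes
cat ucsc.chrom.sizes
```

The same renaming would need to be applied to the FASTA headers so that the aligner index, chrom.sizes, and peak files all agree.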
OS/Platform and dependencies
After implementing the change in #64, call-ataqc fails with the following error:
Traceback (most recent call last):
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 355, in <module>
ataqc()
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 186, in ataqc
read_len)
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/run_ataqc.py", line 438, in make_tss_plot
processes=processes, stranded=True)
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/site-packages/metaseq/_genomic_signal.py", line 122, in array
chunksize=chunksize, **kwargs)
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/site-packages/metaseq/array_helpers.py", line 383, in _array_parallel
itertools.repeat(kwargs)))
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/multiprocessing/pool.py", line 253, in map
return self.map_async(func, iterable, chunksize).get()
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
ValueError: invalid reference `chr1`
This might be related to non-standard chromosome names. I am using the "atac.keep_irregular_chr_in_bfilt_peak" : false option.
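A quick way to see this kind of mismatch is to diff the contig names the BAM knows about against the names in the TSS BED; "invalid reference `chr1`" means the BED asks for a contig the BAM header lacks. A sketch with mock name lists (in practice the BAM side would come from `samtools view -H`):

```shell
# Mock contig names from a BAM header (Ensembl-style) and a TSS BED (UCSC-style)
printf '1\nX\nMT\n' | sort > bam.chroms
printf 'chr1\t1000\t2000\nchrX\t5\t10\n' > tss.bed
cut -f1 tss.bed | sort -u > bed.chroms

# Contigs referenced by the BED but absent from the BAM: these trigger the error
comm -13 bam.chroms bed.chroms
```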
OS/Platform and dependencies
When running the pipeline on my samples, the following error occurs at the read_genome_tsv step:
Bad output 'read_genome_tsv.genome': java.io.IOException: Could not read from ~/atac-seq-pipeline/cromwell-executions/atac/8ffa8456-f560-45ae-b478-8a32c14b8e90/call-read_genome_tsv/execution/tmp.tsv: File ~/atac-seq-pipeline/cromwell-executions/atac/8ffa8456-f560-45ae-b478-8a32c14b8e90/call-read_genome_tsv/execution/tmp.tsv is larger than 128000 Bytes. Maximum read limits can be adjusted in the configuration under system.input-read-limits.
I'm using mm10_no_alt_analysis_set_ENCODE.fasta.gz as my reference genome, which is 830M; could that be what's causing the error?
This discussion suggests increasing the limits in the call to Java (using java -Dsystem.input-read-limits.lines=500000 -jar /cromwell-34.jar). Is that the recommended solution in this case as well?
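For reference, the same limits can be set in the Cromwell configuration file instead of on the command line. A sketch of the relevant fragment; treat the exact key names and defaults under system.input-read-limits as something to verify against the Cromwell documentation for your version:

```
# Cromwell config fragment (assumption: standard system.input-read-limits keys)
system {
  input-read-limits {
    lines  = 500000   # limit for read_lines()
    tsv    = 500000   # limit for read_tsv()
    string = 500000   # limit (bytes) for read_string()
  }
}
```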
OS/Platform and dependencies
cromwell-34.jar
conda 4.3.30
Please find logs here:
debug_75.tar.gz
It seems that a user can't adjust the MACS2 callpeak p-value threshold from 0.1? I might be missing something, but how can one do this within the pipeline? Thank you.
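If the WDL exposes a peak-calling p-value input, it would go in the input JSON. The parameter name below is an assumption to be checked against atac.wdl (the filenames elsewhere on this tracker suggest the built-in default is pval0.01):

```json
{
  "atac.pval_thresh" : 0.05
}
```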
Hi Jin,
Thank you for your help so far! I've run into a little issue that may be easy to fix, or may just not be an option. So far I've been successful in running the pipeline on the test data. I noticed that I have to run it from within the atac-seq-pipeline directory, and the input data seems to need to be contained in that directory or a subdirectory as well. I was told by our cluster admin that I should install the pipeline in my $HOME directory but store my large input fastq files, and submit my sbatch jobs, in my $WORK directory. I've tried a few different things, but I always get errors about the pipeline not finding a directory because it has appended the current working directory to the path I gave it in the json.
For example, my input data is currently in work/eaclark/fastq. The pipeline is in home/eaclark/atac-seq-pipeline. If I submit the job from within atac-seq-pipeline, I get an error because it tried to find the input file in "/home/eaclark/atac-seq-pipeline/work/eaclark/fastq/sample.fastq.gz". So it took the path I gave in the json, which was "work/eaclark/fastq/sample.fastq.gz", and added the current working directory "/home/eaclark/atac-seq-pipeline" in front, suggesting to me that it will always look for files in the current directory or a subdirectory. I tried it the other way around, i.e. submitting the job from $WORK, but then I get an error that it can't find the pipeline files.
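A minimal illustration of the path behavior described above, using Python's os.path.join as a stand-in for the resolution step: relative paths are joined onto the launch directory, while absolute paths pass through untouched.

```python
import os

launch_dir = "/home/eaclark/atac-seq-pipeline"    # where the job is submitted
rel = "work/eaclark/fastq/sample.fastq.gz"        # relative path from the json
print(os.path.join(launch_dir, rel))
# -> /home/eaclark/atac-seq-pipeline/work/eaclark/fastq/sample.fastq.gz

absolute = "/work/eaclark/fastq/sample.fastq.gz"  # absolute path is unaffected
print(os.path.join(launch_dir, absolute))
# -> /work/eaclark/fastq/sample.fastq.gz
```

So putting absolute paths in the input json is the simplest way to keep the pipeline in $HOME and the data in $WORK.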
Is there a way to run the pipeline the way my cluster admin wants? Or is that not an option, so that I'll either have to move the input files to home/eaclark/atac-seq-pipeline or install the pipeline in $WORK?
I hope my question makes sense.
Thanks!
Erin
In a local installation with Conda, the test run (as described in the tutorial) fails.
OS/Platform and dependencies
$ uname -a
Linux node061.hpc.local 2.6.32-504.el6.x86_64 #1 SMP Tue Sep 16 01:56:35 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
cromwell-34
$ conda -V
conda 4.5.11
ATACseq_pipeline_error_issueID_51.txt
debug_issueID_51.tar.gz
Describe the bug
Hi. I'm trying to run the atac-seq-pipeline. The old (deprecated) version works,
but this new version produces errors when running the pipeline (encode_trim_adapter.py).
[error] WorkflowManagerActor Workflow 5a715660-fc5d-425f-9b4a-8b5b94ce81e8 failed (during ExecutingWorkflowState): Job atac.trim_adapter:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
I attached the stderr log, and I'm not sure my 'input.json' is correct.
I tried to use replicate 2 of this experiment (https://www.encodeproject.org/experiments/ENCSR245LNF/).
input.json
{
"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/home/dmb/hoyongLee/atac-seq-pipeline/hg38.tsv",
"atac.fastqs" : [[
[
"/home/dmb/hoyongLee/data/ENCFF154KNN.fastq",
"/home/dmb/hoyongLee/data/ENCFF829SBE.fastq"
],
[
"/home/dmb/hoyongLee/data/ENCFF565CVN.fastq",
"/home/dmb/hoyongLee/data/ENCFF092IJN.fastq"
],
[
"/home/dmb/hoyongLee/data/ENCFF351LRE.fastq",
"/home/dmb/hoyongLee/data/ENCFF803XRX.fastq"
],
[
"/home/dmb/hoyongLee/data/ENCFF709FLO.fastq",
"/home/dmb/hoyongLee/data/ENCFF396BSZ.fastq"
],
[
"/home/dmb/hoyongLee/data/ENCFF247VZG.fastq",
"/home/dmb/hoyongLee/data/ENCFF738DHW.fastq"
]
]],
"atac.paired_end" : true,
"atac.multimapping" : 4,
"atac.trim_adapter.auto_detect_adapter" : true,
"atac.bowtie2.cpu" : 4,
"atac.bowtie2.mem_mb" : 16000,
"atac.filter.cpu" : 4,
"atac.filter.mem_mb" : 12000,
"atac.macs2_mem_mb" : 16000,
"atac.smooth_win" : 73,
"atac.enable_idr" : true,
"atac.idr_thresh" : 0.05,
"atac.qc_report.name" : "ENCSR245LNF",
"atac.qc_report.desc" : "ATAC-seq on Mus musculus C57BL/6 frontal cortex adult"
}
OS/Platform and dependencies
Attach logs
File "/home/dmb/hoyongLee/atac-seq-pipeline/cromwell-executions/atac/e765d0d0-ffee-4c81-b679-1d2cccce874c/call-trim_adapter/shard-0/execution/write_tsv_55ca5bd3e4730afa22126a0109b4636d.tmp", line 1
2 /home/dmb/hoyongLee/atac-seq-pipeline/cromwell-executions/atac/e765d0d0-ffee-4c81-b679-1d2cccce874c/call-trim_adapter/shard-0/inputs/-1676945734/ENCFF154KNN.fastq /home/dmb/hoyongLee/atac-seq-pipeline/ cromwell-executions/atac/e765d0d0-ffee-4c81-b679-1d2cccce874c/call-trim_adapter/shard-0/inputs/-1676945734/ENCFF829SBE.fastq
3 ^
4 SyntaxError: invalid syntax
It looks like IDR peaks are being used to generate raw peak statistics. The peak statistics for the raw peaks and IDR peaks are identical in the HTML report, and the reported numbers of raw peaks and IDR peaks are the same. Screenshots from the HTML report for one sample are below.
There is one set of peak statistics given at the end of the ataqc section of the QC JSON, but it does not specify which peak set those statistics are for. Peak statistics for Naive overlap peaks and IDR peaks are not explicitly given in the JSON. See snippet below.
"ataqc": [
{
...
...
"Raw peaks": [
49747,
"OK"
],
"Naive overlap peaks": [
88397,
"OK"
],
"IDR peaks": [
49747,
"OK"
],
"Min size": 150.0,
"25 percentile": 488.0,
"50 percentile (median)": 714.0,
"75 percentile": 957.0,
"Max size": 4710.0,
"Mean": 743.380605866,
"TSS_enrichment": 11.9348989018
}
OS/Platform and dependencies
Hi,
Thanks again for making your pipeline available.
I'm trying to run the WDL pipeline on Sherlock, using the 'sherlock.tsv' file for the genome and the shared genome data. It seems the ".tar" files for the bowtie2 and bwa indexes can't be found?
Thanks!
Very nice tool, thanks for your valuable work! I used this pipeline and ran it on SLURM. To do this, I followed the official guides, of which there seem to be two:
one is tutorial_slurm.md, the other is slurm.
At first I thought they would complement each other, but some things conflict. My questions: 1. If I hit this problem, which guide should I follow? 2. Which Cromwell version should I use here? 3. Should I install WOMtool and the related toolchain (Scala, ...) to use Cromwell for this pipeline? This also confuses me. Thank you!!!
kundajelab/atac_dnase_pipelines#136 reported by @Chokaro
First of all, thanks for the hard work. Deploying your pipeline via docker was rather easy, even for a bioinformatics amateur like me.
Sadly it didn't go smoothly all the way to the end. I am using Cromwell and atac.wdl to access your Docker container, and during the ataqc step I get the following error (for the R1 files of both PE replicates):
Traceback (most recent call last):
File "/software/atac-seq-pipeline/src/encode_ataqc.py", line 355, in
ataqc()
File "/software/atac-seq-pipeline/src/encode_ataqc.py", line 213, in ataqc
ROADMAP_META, OUTPUT_PREFIX)
File "/software/atac-seq-pipeline/src/run_ataqc.py", line 948, in compare_to_roadmap
sample_data = pd.read_table(out_file, header=None)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in init
self._make_engine(self.engine)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in init
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 402, in pandas._libs.parsers.TextReader.cinit
File "pandas/_libs/parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
IOError: File LRSC1_CD34_50k_R1.trim.merged.signal does not exist
My input .json looks like this; the fastq.gz files are stored locally on a different HDD:
{
"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/media/chokaro/2TB_Storage_2/genome/local/hg19_local.tsv",
"atac.fastqs" : [
[
["/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_10k_R1.fastq.gz",
"/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_10k_R2.fastq.gz"]
],
[
["/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_50k_R1.fastq.gz",
"/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_50k_R2.fastq.gz"]
]
],
"atac.paired_end" : true,
"atac.multimapping" : 4,
"atac.trim_adapter.auto_detect_adapter" : true,
"atac.bowtie2.cpu" : 6,
"atac.bowtie2.mem_mb" : 16000,
"atac.bowtie2.time_hr" : 36,
"atac.filter.cpu" : 2,
"atac.filter.mem_mb" : 12000,
"atac.filter.time_hr" : 23,
"atac.macs2_mem_mb" : 16000,
"atac.smooth_win" : 73,
"atac.enable_idr" : true,
"atac.idr_thresh" : 0.05,
"atac.qc_report.name" : "test1",
"atac.qc_report.desc" : "test1 on CD34 omni ATAC"
}
And finally my OS and system config are the following:
OS: Ubuntu Xenial 16.04
cromwell 34
conda 4.5.11
Docker version 18.06.1-ce, build e68fc7a
Hoping you guys have some advice for this... In any case many thanks in advance!
best
Chris
Hey
Not really an issue per se; if this is not the right place, I'll move it to the Google group for Klab genomic pipelines discussion.
I am not 100% sure how technical replicates fit into the pipeline...
So next to the naive overlapping peaks, we can additionally filter peaks for meeting specific IDR criteria: the way I see it, this seems like a pretty stringent way to address biological replicates. Whether I will use it or not will probably depend on how much biological variation I want to address in my downstream integrative analyses. For example, for patient-derived data (one individual per replicate) I would probably use the entire naive_overlap peak set.
Technical replicates, on the other hand, I usually address by simply subsetting my peak sets for shared peaks... which does not seem to be a straightforward possibility within this pipeline, but maybe I've overlooked something. And merging technical replicates into a single fastq file somehow defeats their purpose in the first place.
Could you tell me how you specifically address the difference between technical and biological replicates?
Keep up the good work, awesome pipeline!
Chris
I got an error at the call-ataqc step. I suspect the problem is caused by a missing reg2map_bed file. Where can I download the hg19 reg2map_bed file? Thanks!
The call-ataqc module is failing with the following Python variable-assignment error:
Traceback (most recent call last):
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 355, in <module>
ataqc()
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 122, in ataqc
chr_m_reads, fraction_chr_m = get_chr_m(COORDSORT_BAM)
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/run_ataqc.py", line 215, in get_chr_m
fract_chr_m = float(chr_m_reads) / tot_reads
UnboundLocalError: local variable 'chr_m_reads' referenced before assignment
OS/Platform and dependencies
I'm happy to provide error logs if needed, but this seems pretty cut and dry.
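The traceback shows chr_m_reads being read before any assignment, which happens when no contig matches the name the counter loops over (e.g. a genome that calls the mitochondrion 'MT' rather than 'chrM'). A defensive sketch of the pattern; this is a hypothetical helper, not the pipeline's actual run_ataqc.py code:

```python
def get_chr_m_safe(idxstats_lines, mito_names=("chrM", "MT")):
    """Count mitochondrial reads from samtools-idxstats-style lines.

    chr_m_reads is initialized up front, so a genome without a recognized
    mitochondrial contig yields (0, 0.0) instead of an UnboundLocalError.
    """
    chr_m_reads = 0
    tot_reads = 0
    for line in idxstats_lines:
        chrom, _length, mapped, _unmapped = line.strip().split("\t")
        tot_reads += int(mapped)
        if chrom in mito_names:
            chr_m_reads += int(mapped)
    fract_chr_m = float(chr_m_reads) / tot_reads if tot_reads else 0.0
    return chr_m_reads, fract_chr_m

print(get_chr_m_safe(["chr1\t1000\t90\t0", "MT\t16569\t10\t0"]))  # -> (10, 0.1)
```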
The ataqc module still will not finish running without an error. It looks like a similar issue to #66, perhaps related to non-conventional chromosome names.
Picked up _JAVA_OPTIONS: -Xms256M -Xmx16000M -XX:ParallelGCThreads=1
Traceback (most recent call last):
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 444, in <module>
ataqc()
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_ataqc.py", line 231, in ataqc
read_len)
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/run_ataqc.py", line 438, in make_tss_plot
processes=processes, stranded=True)
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/site-packages/metaseq/_genomic_signal.py", line 122, in array
chunksize=chunksize, **kwargs)
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/site-packages/metaseq/array_helpers.py", line 383, in _array_parallel
itertools.repeat(kwargs)))
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/multiprocessing/pool.py", line 253, in map
return self.map_async(func, iterable, chunksize).get()
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
ValueError: invalid reference `chr1`
OS/Platform and dependencies
Incompatibility with masked reference genomes
call-macs2 fails if a masked reference (where Ns are used to indicate repetitive regions) is used to build a custom genome reference database.
run_shell_cmd: PID=70151, CMD=bedtools intersect -a ${SAMPLE}.trim.merged.nodup.tn5.tagAlign.tmp1 -b ${SAMPLE}.trim.merged.nodup.tn5.pval0.01.300K.bfilt.narrowPeak.tmp2 -wa -u | wc -l
Traceback (most recent call last):
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_macs2_atac.py", line 210, in <module>
main()
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_macs2_atac.py", line 200, in main
frip_qc = frip( args.ta, bfilt_npeak, args.out_dir)
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_frip.py", line 54, in frip
write_txt(frip_qc, str(float(val1)/float(val2)))
ValueError: could not convert string to float: ***** WARNING: File rat_liver7_S12_L001_R1_001.trim.merged.nodup.tn5.tagAlign.tmp1 has inconsistent naming convention for record:
AABR07024382.1 100568 100639 N 1000 +
If it would be impractical to include compatibility with masked references, it would be helpful to specify that the pipeline is incompatible with masked references in the documentation. I understand that this is the function of the blacklist input, but blacklisted regions can be more difficult to define for less popular model organisms.
I ran the pipeline but I had to restart the server. Is there any way to resume the pipeline from where it left off?
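Cromwell, which runs this pipeline, supports call caching, which lets a rerun reuse results from calls that already succeeded. A sketch of the relevant configuration fragment, with two caveats: the exact keys should be verified against the Cromwell 34 documentation, and the server log above shows an in-memory HSQLDB (jdbc:hsqldb:mem:...), which is lost on restart, so resuming also requires pointing Cromwell at a persistent, file-backed database (the file path below is an illustration):

```hocon
# Enable call caching so reruns can reuse completed calls.
call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}

# Persist metadata to disk instead of the default in-memory HSQLDB,
# so cached results survive a server restart.
database {
  db {
    driver = "org.hsqldb.jdbcDriver"
    url = "jdbc:hsqldb:file:cromwell-db/cromwell-db;shutdown=false;hsqldb.tx=mvcc"
  }
}
```

With this in place, rerunning the same workflow with the same inputs should skip calls whose results are already in the cache.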
I had a successful test run with a new 1.1.2 local installation. It is failing with my actual data (Xenopus laevis, custom-built genome). I am starting the run from deduped BAM files. After seeing the failure with v1.1.2, I tried a new local installation of 1.1.3. The error logs and terminal output for the 1.1.3 run are attached. I do not have access to machines on which I can install Singularity or Docker, so I'm restricted to the local installation.
$ conda --version
conda 4.5.11
$ uname -a
Linux node107.hpc.local 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
cromwell-34
Hello,
How can we configure/define the fastq arrays for multiple samples with technical replicates? As far as I can tell, we can only define a single sample with multiple replicates if we use 1-dimensional arrays. Is this true?
Hi,
We were able to run your pipeline using Cromwell 34 at JHPCE http://www.jhpce.jhu.edu/ with some effort. We have some questions that we might have missed in the docs.
We noticed that the qsub'ed jobs were not finding the conda env, so we edited our ~/.bashrc file to load it. However, the backend configuration file uses /bin/sh instead of /bin/bash, and thus we also had to edit the backend conf file at https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/backends/backend.conf#L138.
## For forcing the activation of the conda ATAC seq env
if [[ $HOSTNAME == compute-* ]]; then
echo "Activating the ATAC seq conda environment"
source activate encode-atac-seq-pipeline
fi
While the issue with the conda environment not being activated was unexpected, is there an option we missed for specifying the shell for the qsub calls? It looks like there isn't, though this is a small edit a user can make.
By default the ataqc step uses a max of 16,000 MB, and that value is specified both in the Java options and in the memory requested from SGE. That is all OK, except that for some reason we seem to need an extra 4 GB beyond the max Java heap limit in SGE at JHPCE.
Is there an option we can use to increase the max SGE memory for the ataqc step while keeping the max at 16,000 MB for the Java options (or for controlling only the Java options)? It could be that we just need to fork your pipeline and make a small edit in the atac.wdl file.
For example, we might need to edit https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/atac.wdl#L976 to read
export _JAVA_OPTIONS="-Xms256M -Xmx16000M -XX:ParallelGCThreads=1"
and then run the pipeline with atac.ataqc.mem_mb set to 21,000 MB (if that option is valid, since it's not mentioned in https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input.md#resource).
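The extra headroom is needed because a JVM uses memory beyond its -Xmx heap (metaspace, thread stacks, GC bookkeeping), so the scheduler request has to exceed the Java limit. A trivial sketch of the relationship; the helper name and the 5,000 MB overhead figure are my assumptions, chosen to match the 16,000 MB heap / 21,000 MB SGE request discussed above:

```python
def sge_request_mb(java_heap_mb, overhead_mb=5000):
    """Memory to request from SGE for a JVM task: the -Xmx heap limit
    plus an allowance for non-heap JVM memory (assumed figure)."""
    return java_heap_mb + overhead_mb

print(sge_request_mb(16000))  # 21000 -> the value suggested for atac.ataqc.mem_mb
```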
When the pipeline crashed due to memory in ataqc, we debugged things by modifying the script.submit script, but then ran into an error with which picard.jar. For some reason, the picard.jar lookup at atac-seq-pipeline/src/encode_common_genomic.py (lines 104 to 108 in e9c9744) was failing, so we worked around it by editing our ~/.bashrc file with:
## For using a specific picard.jar
if [[ $HOSTNAME == compute-* ]]; then
echo "Tricks for picard"
export PATH=/users/bbarry/.conda/envs/encode-atac-seq-pipeline/share/picard-2.10.6-0:$PATH
fi
I don't know if you've seen this before. It could again be an issue with the OS version at JHPCE.
Let us know if you need more info!
Best,
Leonardo and @BriannaBarry
(cc @andrewejaffe)
OS/Platform and dependencies
JHPCE info: /dcl01/lieber/ajaffe/Brianna/jaffe_lab/atac-seq-pipeline/cromwell-executions/atac/198ad417-0ccf-4a84-8cee-2707166bc53b/call-ataqc/shard-1/execution
(in case we need to revisit this later)
Major error not caught by pipeline
The pipeline does not output an error in call-macs2/shard-0/execution/stderr if the specified blacklist BED file is improperly formatted. Instead, it runs to completion, which results in empty final peak files without an obvious error.
The problem occurs within the following step in the call-macs2 module (from call-macs2/shard-0/execution/stdout):
[2018-10-28 03:38:52,988 INFO] run_shell_cmd: PID=131102, CMD=bedtools intersect -v -a 20180815-14-Adipose-002-powder_S14_L001_R1_001.trim.merged.nodup.tn5.pval0.01.300K.narrowPeak.tmp1 -b rn6_blacklist.bed.tmp2 | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | grep -P 'chr[\dXY]+[ \t]' | gzip -nc > 20180815-14-Adipose-002-powder_S14_L001_R1_001.trim.merged.nodup.tn5.pval0.01.300K.bfilt.narrowPeak.gz
It is easy to see why an improperly formatted BED file would have caused this step to fail, but it should have output the error to the stderr file (which was empty in call-macs2/shard-0/execution) and terminated the pipeline. Instead, at first glance it appeared the pipeline had finished running without error, and it took a bit of digging to find the problem.
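Separately from the malformed blacklist, the grep -P 'chr[\dXY]+[ \t]' stage visible in the command above silently drops any peak whose chromosome name is not UCSC-style (chr1, chrX, ...). This is also why custom or non-model genomes with scaffold names like AABR07024382.1 (seen in an earlier report) end up with empty filtered peak files. A small sketch of that filter in Python; the example records are made up:

```python
import re

# Same pattern the pipeline's narrowPeak lines are piped through,
# anchored at the start of the line, where the chromosome name sits.
CHR_FILTER = re.compile(r"chr[\dXY]+[ \t]")

peaks = [
    "chr1\t100\t200\tpeak1\t900\t.",
    "chrX\t300\t400\tpeak2\t850\t.",
    "AABR07024382.1\t100568\t100639\tpeak3\t1000\t+",  # scaffold name
]
kept = [p for p in peaks if CHR_FILTER.match(p)]
print(len(kept))  # 2 -- the scaffold record is silently filtered out
```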
I don't have the exact error or log files because they were removed by another user after fixing the issue.
OS/Platform and dependencies
Hi, first things first, thanks for all the help and for this beautiful pipeline.
I was able to install it, and the test ran smoothly. Now I tried to run my real samples and it broke at the very end. I can't explain why. Can you help me understand what the issue is? I am attaching all the logs of the run.
cat ./call-reproducibility_overlap/execution/stderr
Traceback (most recent call last):
File "/ru-auth/local/home/trezende/miniconda3/envs/encode-atac-seq-pipeline/bin/encode_reproducibility_qc.py", line 154, in <module>
main()
File "/ru-auth/local/home/trezende/miniconda3/envs/encode-atac-seq-pipeline/bin/encode_reproducibility_qc.py", line 50, in main
args = parse_arguments()
File "/ru-auth/local/home/trezende/miniconda3/envs/encode-atac-seq-pipeline/bin/encode_reproducibility_qc.py", line 42, in parse_arguments
'Invalid number of peak files or --peak-pr.')
argparse.ArgumentTypeError: Invalid number of peak files or --peak-pr.
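For context, this error is raised when the number of peak files handed to encode_reproducibility_qc.py does not match what it expects for the declared replicates. A plausible reconstruction of the expected count, on the assumption that the overlap step produces one peak file per unordered pair of true replicates; the exact rule should be read from the script itself, as this formula is my guess:

```python
from math import comb  # Python 3.8+

def expected_pair_peak_files(num_reps):
    """Assumed expectation: one overlap peak file per unordered
    pair of true replicates (this is a reconstruction, not the
    pipeline's actual check)."""
    return comb(num_reps, 2)

print(expected_pair_peak_files(2))  # 1
print(expected_pair_peak_files(3))  # 3
```

If the run was started from BAMs with a replicate count the pipeline did not expect, the peak-file count can end up out of step with this expectation.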
So, I built the mm10 mouse database successfully and tried the pipeline.
Is there any way to fix/test it without having to start again from the beginning? Can the pipeline skip the steps that already finished successfully?
debug_34.tar.gz
The GitHub page for the Usage link inside the README does not exist.
ENCSR356KRQ_subsampled_issue52.json.zip
debug_issue52.tar.gz
testRun_env_issue52.txt
testError_issue52.txt
I can't complete a test run with a local installation. Although I know Singularity or Docker is preferred, that is not possible on this machine.
In addition to the information below, I've attached the error logs, my input .json file, and my environment (after activating encode-atac-seq-pipeline).
$ uname -a
Linux node061.hpc.local 2.6.32-504.el6.x86_64 #1 SMP Tue Sep 16 01:56:35 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
$ conda -V
conda 4.5.11
cromwell-34