datirium / workflows

CWL based Bioinformatics Workflows

License: Apache License 2.0

Common Workflow Language 91.01% Shell 3.52% R 5.38% HTML 0.02% Dockerfile 0.06% Awk 0.01%
biowardrobe cwl workflow epigenetics bioinformatics bioinformatics-pipeline bioinformatics-analysis chip-seq rna-seq clip-seq


Bioinformatics Workflows by Datirium LLC

ChIP-Seq, ATAC-Seq, CLIP-Seq, and RNA-Seq CWL workflows for use in the Scientific Data Analysis Platform (SciDAP), in the BioWardrobe project, or standalone with cwltool.

All of the original BioWardrobe pipelines have been rewritten in CWL, and new workflows have been added. The repository is pulled automatically into the SciDAP platform.

Augmented CWL standard for SciDAP

There are four additional SciDAP-specific fields (prefixed with sd:) that can be added to a workflow for compatibility with SciDAP.

  1. Metadata
  2. Upstreams
  3. Visual Plugins
  4. Service Tags

Metadata

To extend the user interface (dynamic form) with extra input fields that the workflow itself does not require, the 'sd:metadata' field was introduced. It lists workflow templates whose inputs objects are used to construct these extra fields and to store them together with the original workflow.

Example of a 'metadata' template for the user interface:

chipseq-header.cwl

cwlVersion: v1.0
class: Workflow

inputs:
  cells:
    type: string
    label: "Cells"
    sd:preview:
        position: 1
  conditions:
    type: string
    label: "Conditions"
    sd:preview:
        position: 3
  alias:
    type: string
    label: "Experiment short name/Alias"
    sd:preview:
        position: 2
  catalog:
    type: string?
    label: "Catalog #"

outputs: []
steps: []

Then include the file in the workflow with 'sd:metadata':

'sd:metadata':
    - "../metadata/chipseq-header.cwl"
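As an illustration only, the sketch below shows how a client could combine the inputs of a metadata template such as chipseq-header.cwl with a workflow's own inputs and order the fields that carry sd:preview. It works on already-parsed dicts. The merge rule (workflow inputs win on name clashes), the reading of sd:preview position as display order, and the fastq_file input are assumptions, not SciDAP's documented behavior.

```python
# Hypothetical sketch: merge sd:metadata template inputs with a workflow's
# inputs and list the preview fields in sd:preview position order.
# Both documents are plain dicts here, as if already parsed from YAML.


def merge_metadata_inputs(workflow_inputs, metadata_templates):
    """Return a combined inputs dict; workflow inputs win on name clashes."""
    merged = {}
    for template in metadata_templates:
        merged.update(template.get("inputs", {}))
    merged.update(workflow_inputs)
    return merged


def preview_order(inputs):
    """Labels of inputs that declare sd:preview, sorted by position."""
    with_preview = [
        (spec["sd:preview"]["position"], spec.get("label", name))
        for name, spec in inputs.items()
        if "sd:preview" in spec
    ]
    return [label for _, label in sorted(with_preview)]


chipseq_header = {
    "inputs": {
        "cells": {"type": "string", "label": "Cells", "sd:preview": {"position": 1}},
        "conditions": {"type": "string", "label": "Conditions", "sd:preview": {"position": 3}},
        "alias": {"type": "string", "label": "Experiment short name/Alias",
                  "sd:preview": {"position": 2}},
        "catalog": {"type": "string?", "label": "Catalog #"},
    }
}
# Hypothetical workflow input, used only to show the merge.
workflow_inputs = {"fastq_file": {"type": "File", "label": "FASTQ file"}}

merged = merge_metadata_inputs(workflow_inputs, [chipseq_header])
print(preview_order(merged))  # ['Cells', 'Experiment short name/Alias', 'Conditions']
```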

Upstreams

To let already analysed data be selected as inputs in the SciDAP UI, separate workflows are organized into a graph. Workflows are linked with 'sd:upstream', which maps a name to a list of upstream workflows whose outputs this workflow can access. An input then refers to one of those outputs through 'sd:upstreamSource', as in the example below.

...
'sd:upstream':
  rnaseq_sample_untreated:
    - "rnaseq-se.cwl"
    - "rnaseq-pe.cwl"
    - "rnaseq-se-dutp.cwl"
    - "rnaseq-pe-dutp.cwl"
    - "rnaseq-se-dutp-mitochondrial.cwl"
    - "rnaseq-pe-dutp-mitochondrial.cwl"
    - "trim-rnaseq-pe.cwl"
    - "trim-rnaseq-se.cwl"
    - "trim-rnaseq-pe-dutp.cwl"
    - "trim-rnaseq-se-dutp.cwl"
  rnaseq_sample_treated:
    - "rnaseq-se.cwl"
    - "rnaseq-pe.cwl"
    - "rnaseq-se-dutp.cwl"
    - "rnaseq-pe-dutp.cwl"
    - "rnaseq-se-dutp-mitochondrial.cwl"
    - "rnaseq-pe-dutp-mitochondrial.cwl"
    - "trim-rnaseq-pe.cwl"
    - "trim-rnaseq-se.cwl"
    - "trim-rnaseq-pe-dutp.cwl"
    - "trim-rnaseq-se-dutp.cwl"

inputs:

  untreated_files:
    type: File[]
    format:
     - "http://edamontology.org/format_3752"
     - "http://edamontology.org/format_3475"
    label: "Untreated input CSV/TSV files"
    doc: "Untreated input CSV/TSV files"
    'sd:upstreamSource': "rnaseq_sample_untreated/rpkm_common_tss"
    'sd:localLabel': true
...
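The value of 'sd:upstreamSource' in the example appears to combine an 'sd:upstream' key with the name of an output of the upstream workflow. The sketch below splits such a value and checks the key against the 'sd:upstream' map. This reading of the format is an assumption drawn from the single example above, the helper name is hypothetical, and the upstream lists are shortened.

```python
# Hypothetical sketch: resolve an 'sd:upstreamSource' value such as
# "rnaseq_sample_untreated/rpkm_common_tss" against an 'sd:upstream' map.
# ASSUMPTION: the format is "<sd:upstream key>/<upstream output id>".


def resolve_upstream_source(source, upstream):
    """Return (key, output_id, allowed upstream workflow files)."""
    key, sep, output_id = source.partition("/")
    if not sep or not output_id:
        raise ValueError(f"expected '<upstream>/<output>', got {source!r}")
    if key not in upstream:
        raise KeyError(f"{key!r} is not declared in sd:upstream")
    return key, output_id, upstream[key]


# Shortened version of the 'sd:upstream' map from the example.
upstream = {
    "rnaseq_sample_untreated": ["rnaseq-se.cwl", "rnaseq-pe.cwl"],
    "rnaseq_sample_treated": ["rnaseq-se.cwl", "rnaseq-pe.cwl"],
}
key, output_id, candidates = resolve_upstream_source(
    "rnaseq_sample_untreated/rpkm_common_tss", upstream
)
print(key, output_id, candidates)
```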

Visual plugins for outputs of type File

Workflow outputs (especially files) are usually offered as download links in web interfaces. With SciDAP visualization plugins, the data can instead be presented as a plot, in a genome browser, as a table, or (for HTML outputs) opened in a new tab. The keyword 'sd:visualPlugins' enables these plugins. The following plugin types are already available in the platform: line, pie, chart, igvbrowser, syncfusiongrid, and linkList.

outputs:
    ...
    fastx_statistics:
        type: File
        label: "FASTQ statistics"
        format: "http://edamontology.org/format_2330"
        doc: "fastx_quality_stats generated FASTQ file quality statistics file"
        outputSource: fastx_quality_stats/statistics_file
        'sd:visualPlugins':
        - line:
            Title: 'Base frequency plot'
            xAxisTitle: 'Nucleotide position'
            yAxisTitle: 'Frequency'
            colors: ["#b3de69", "#99c0db", "#fb8072", "#fdc381", "#888888"]
            data: [$12, $13, $14, $15, $16]
    ...
    diff_expr_file:
        type: File
        label: "DESeq results, TSV"
        format: "http://edamontology.org/format_3475"
        doc: "DESeq generated list of differentially expressed items grouped by isoforms, genes or common TSS"
        outputSource: deseq/diff_expr_file
        'sd:visualPlugins':
        - syncfusiongrid:
            Title: 'Combined DESeq results'
    ...
    bigwig:
        type: File
        format: "http://edamontology.org/format_3006"
        label: "BigWig file"
        doc: "Generated BigWig file"
        outputSource: bam_to_bigwig/bigwig_file
        'sd:visualPlugins':
        - igvbrowser:
            id: 'igvbrowser'
            type: 'wig'
            name: "BigWig Track"
            height: 120
    ...
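The line plugin above lists its series as data: [$12, $13, $14, $15, $16], which appears to reference columns of the statistics file by number. The sketch below shows one way such references could be read from a tab-separated file. Whether SciDAP counts columns from zero or from one is not stated here, so the zero-based reading is an assumption, and the helper names and the tiny table are made up for illustration.

```python
import csv
import io

# Hypothetical sketch of reading "$N" column references from a TSV file.
# ASSUMPTION: "$N" is a zero-based column index into a file with a header row.


def parse_column_refs(refs):
    """Convert YAML values like "$12" into integer column indexes."""
    return [int(str(ref).lstrip("$")) for ref in refs]


def extract_series(tsv_text, refs):
    """Return {column header: [float values]} for each referenced column."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    return {header[i]: [float(row[i]) for row in body] for i in parse_column_refs(refs)}


# Made-up three-column table, used only to exercise the helpers.
tsv = "pos\tA\tC\n1\t10\t20\n2\t30\t40\n"
series = extract_series(tsv, ["$1", "$2"])
print(series)  # {'A': [10.0, 30.0], 'C': [20.0, 40.0]}
```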

Service Tags for workflows

The 'sd:serviceTag' keyword specifies what a newly added workflow is used to create:

  • samples: uses keyword 'sample'
  • analyses: uses keyword 'analysis'
  • gene lists: uses keyword 'genelist'

Contributors

andreykartashov, carcassona, heylel-b-sh, michael-kotliar, mr-c, ndeeseee, portah, qmccourt, robert-player, scrowley-datirium, tyomach


Issues

rna-seq-pe workflow is stuck

The workflow is stuck at a Docker step. Here are the Docker logs:

docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /tmp/node-096185fd-7cf7-44a5-a462-8e7c6002543d-0ec45386-81be-47c0-919e-658edac745bb/tmpfbzh2pcg/aa3cb6f7-e16a-4a7f-8a5e-7449fff895a1/thvrv357w/tmp-out6ne9jcfd.
See 'docker run --help'.

Workflow file: rnaseq-pe.cwl
Configuration file: tests/rnaseq-pe-1.json
Workflow tool: toil

Toil logs:

[2020-07-17T06:49:26+0000] [MainThread] [I] [cwltool] Resolved 'rnaseq-pe.cwl' to 'file:///Users/prakash/work/kallisto/workflows/workflows/rnaseq-pe.cwl'
URI prefix 'sd' of 'sd:metadata' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:metadata' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:upstream' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:upstream' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:upstreamSource' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:upstreamSource' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:upstreamSource' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:upstreamSource' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:upstreamSource' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:upstreamSource' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:layout' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:layout' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:layout' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:layout' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:layout' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:layout' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:upstreamSource' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:upstreamSource' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:layout' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:layout' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
URI prefix 'sd' of 'sd:visualPlugins' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] URI prefix 'sd' of 'sd:visualPlugins' not recogni[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] rnaseq-pe.cwl:205:7: Warning: checking item
                     Warning:   checking object `rnaseq-pe.cwl#bambai_pair/igvbrowser`
rnaseq-pe.cwl:209:9: Warning:     Field `type` references unknown identifier `alignment`, tried
                     file:///Users/prakash/work/kallisto/workflows/workflows/rnaseq-pe.cwl#alignment
rnaseq-pe.cwl:112:7: Warning: checking item
                     Warning:   checking object `rnaseq-pe.cwl#bigwig/igvbrowser`
rnaseq-pe.cwl:115:9: Warning:     Field `type` references unknown identifier `wig`, tried
                     file:///Users/prakash/work/kallisto/workflows/workflows/rnaseq-pe.cwl#wig
[2020-07-17T06:49:34+0000] [MainThread] [W] [salad] rnaseq-pe.cwl:112:7: Warning: checking item
                     Warning:   checking object `rnaseq-pe.cwl#bigwig/igvbrowser`
rnaseq-pe.cwl:115:9: Warning:     Field `type` references unknown identifier `wig`, tried
                     file:///Users/prakash/work/kallisto/workflows/workflows/rnaseq-pe.cwl#wig
URI prefix 'doi' of 'doi: 10.1093/bioinformatics/bts635' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:47+0000] [MainThread] [W] [salad] URI prefix 'doi' of 'doi: 10.1093/bioinformatics/bts635' not recognized, are you missing a $namespaces section?
[2020-07-17T06:49:47+0000] [MainThread] [W] [cwltool] Workflow checker warning:
rnaseq-pe.cwl:412:7:                    Parameter 'bambai_pair' requires secondaryFiles ['.bai'] but
rnaseq-pe.cwl:361:11:                     source 'bam_bai_pair' does not provide those
                                          secondaryFiles.
../tools/samtools-sort-index.cwl:154:5:   To resolve, add missing secondaryFiles patterns to
                                          definition of 'bam_bai_pair' or
../tools/samtools-stats.cwl:40:5:         mark missing secondaryFiles in definition of
                                          'bambai_pair' as optional.
rnaseq-pe.cwl:334:9:                    Source 'log_final' of type ["null", "File"] may be
                                        incompatible
rnaseq-pe.cwl:120:5:                      with sink 'star_final_log' of type "File"
[2020-07-17T06:49:52+0000] [MainThread] [I] [toil] Running Toil version 4.2.0a1-6fee089e407c7487f8d2ef3aa7f8a485e8c51caf on host bfa671af2216.
[2020-07-17T06:49:53+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/node-096185fd-7cf7-44a5-a462-8e7c6002543d-0ec45386-81be-47c0-919e-658edac745bb/tmpjpwiotpm/worker_log.txt
[2020-07-17T06:49:54+0000] [MainThread] [I] [toil.leader] 0 jobs are running, 0 jobs are issued and waiting to run
[2020-07-17T06:49:54+0000] [MainThread] [I] [toil.leader] Issued job 'file:///Users/prakash/work/kallisto/workflows/tools/extract-fastq.cwl' bash -c kind-file_Users_prakash_work_kallisto_workflows_tools_extract-fastq.cwl/instance-sjkyy9lc with job batch system ID: 1 and cores: 1, disk: 3.0 G, and memory: 2.0 G
[2020-07-17T06:49:54+0000] [MainThread] [I] [toil.leader] Issued job 'file:///Users/prakash/work/kallisto/workflows/tools/extract-fastq.cwl' bash -c kind-file_Users_prakash_work_kallisto_workflows_tools_extract-fastq.cwl/instance-2l_kx_q_ with job batch system ID: 2 and cores: 1, disk: 3.0 G, and memory: 2.0 G
[2020-07-17T06:49:54+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/node-096185fd-7cf7-44a5-a462-8e7c6002543d-0ec45386-81be-47c0-919e-658edac745bb/tmpcth8q0u9/worker_log.txt
[2020-07-17T06:49:54+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/node-096185fd-7cf7-44a5-a462-8e7c6002543d-0ec45386-81be-47c0-919e-658edac745bb/tmpfbzh2pcg/worker_log.txt
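Most of the toil output above consists of schema-salad warnings that the 'sd' prefix is not declared. They are warnings rather than errors (the run goes on to issue jobs afterwards), so they are probably unrelated to the Docker bind-mount failure. Declaring the prefix in a top-level $namespaces section is the standard CWL way to make it recognized. The sketch below checks a parsed document for undeclared key prefixes; the helper is hypothetical and the namespace URI it uses is a placeholder, not SciDAP's actual one.

```python
# Hypothetical sketch: list "prefix:name" keys whose prefix is not declared
# in $namespaces. Only mapping keys are checked (not values like 'doi: ...').
# ASSUMPTION: the "sd" URI below is a placeholder, not SciDAP's real one.


def undeclared_prefixes(node, namespaces, found=None):
    """Recursively collect key prefixes that are missing from `namespaces`."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            # Skip directives such as "$namespaces" or "$schemas".
            if isinstance(key, str) and ":" in key and not key.startswith("$"):
                prefix = key.split(":", 1)[0]
                if prefix not in namespaces:
                    found.add(prefix)
            undeclared_prefixes(value, namespaces, found)
    elif isinstance(node, list):
        for item in node:
            undeclared_prefixes(item, namespaces, found)
    return found


doc = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "sd:upstream": {"rnaseq_sample_untreated": ["rnaseq-se.cwl"]},
    "inputs": {
        "untreated_files": {
            "type": "File[]",
            "sd:upstreamSource": "rnaseq_sample_untreated/rpkm_common_tss",
        }
    },
}
print(undeclared_prefixes(doc, {}))  # {'sd'}

# With the prefix declared, nothing is reported.
doc["$namespaces"] = {"sd": "https://example.org/scidap#"}  # placeholder URI
print(undeclared_prefixes(doc, doc["$namespaces"]))  # set()
```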

Preseq graph

For all pipelines that use preseq, the graph is too wide, which makes it difficult to see the region of interest. We need to either:

  • change the X axis to a log scale, with X from 1,000,000 to 1,000,000,000, or
  • limit the X axis to 200,000,000.

Also, show the current read/duplicate number as a red dot.
(Less important) In the data labels that appear when the mouse hovers over a point, the Y coordinate is shown in a readable format (e.g. 17 202 064.5), whereas the X coordinate is shown without separators (e.g. 21000000). Spaces need to be added to the X coordinate as well.
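The requested label style can be sketched as follows. format_with_spaces and log_ticks are hypothetical helpers written for this note, not functions from the SciDAP plugins; they only show spaces as thousands separators and the power-of-ten ticks for a log-scale X axis from 1,000,000 to 1,000,000,000.

```python
import math

# Hypothetical helpers for the preseq plot labels; not SciDAP plugin code.


def format_with_spaces(value):
    """Format a number with spaces between thousands groups."""
    # The "," format option inserts commas; swap them for spaces.
    return f"{value:,}".replace(",", " ")


def log_ticks(lo, hi):
    """Powers of ten from lo to hi inclusive (both assumed powers of ten)."""
    start, stop = round(math.log10(lo)), round(math.log10(hi))
    return [10 ** k for k in range(start, stop + 1)]


print(format_with_spaces(21000000))    # 21 000 000
print(format_with_spaces(17202064.5))  # 17 202 064.5
print([format_with_spaces(t) for t in log_ticks(1_000_000, 1_000_000_000)])
```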

bigWig track is not displayed: is it because the output is File[], or for some other reason?

Using rgt-thor.cwl as an example, the following output is not displayed in IGV:

  cond_1_bigwig_file:
    type: File[]
    format: "http://edamontology.org/format_3006"
    label: "First biological condition ChIP-seq signals"
    doc: "Postprocessed ChIP-seq signals from the first biological condition samples"
    outputSource: thor/cond_1_bigwig_file
    'sd:visualPlugins':
    - igvbrowser:
        tab: 'IGV Genome Browser'
        id: 'igvbrowser'
        type: 'wig'
        name: "Biological condition 1"
        height: 120

PCA workflow fails if the expression_aliases input has duplicates

In the PCA workflow, the expression_aliases input is expected to contain unique values because they are used as the legend on the generated plots. However, the workflow should not fail even if the values are not unique. Fixing this probably requires correcting the R script and rebuilding the Docker container. A similar problem might occur in other R-based workflows, such as DESeq (not tested).
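One possible way to keep duplicates from breaking the plots is to de-duplicate the aliases before they are used as legend labels. In the R script itself, base R's make.unique() does this. The Python sketch below mirrors that idea (appending .1, .2, and so on) purely for illustration; it is a hypothetical helper, not the workflow's code, and its handling of collisions is this sketch's own choice.

```python
# Hypothetical sketch: make legend labels unique, in the spirit of R's
# make.unique(). The first occurrence keeps its name; later duplicates get
# a numeric suffix that does not clash with any label already used.


def make_unique(names, sep="."):
    used = set()
    counters = {}
    result = []
    for name in names:
        if name not in used:
            used.add(name)
            result.append(name)
            continue
        n = counters.get(name, 0)
        candidate = name
        while candidate in used:
            n += 1
            candidate = f"{name}{sep}{n}"
        counters[name] = n
        used.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["ctrl", "ctrl", "treat", "ctrl"]))  # ['ctrl', 'ctrl.1', 'treat', 'ctrl.2']
```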

File input in `sd:layout advanced: true` causes error in web interface

It looks like uncommenting sd:layout causes the error 'Input type "file" isn't supported by matInput' when the workflow is displayed. The bug can be reproduced with the rgt-thor.cwl workflow.

Workflow input example

housekeeping_genes_bed_file:
  type: File?
  format: "http://edamontology.org/format_3003"
  label: "Housekeeping genes file"
  doc: "Define housekeeping genes (BED format) used for normalizing"
  # 'sd:layout':
  #   advanced: true
