
patterns's People

Contributors

abhi18av, alperyilmaz, bentsherman, davidmasp, egonw, evanfloden, ewels, heuermh, jsalignon, kevinsayers, mfoll, mribeirodantas, odoublewen, pditommaso, sateeshperi


patterns's Issues

Port the patterns to DSL2

I noticed that the patterns (i) are still written in DSL1, (ii) use deprecated features such as set val(...), and (iii) do not make use of newer helpful features such as stub for quick iterations. It might be beneficial for the overall community to move the patterns to DSL2 to ease the transition.

What do you think?
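
For illustration, here is a minimal sketch of what a DSL2 pattern with a stub block could look like (the process name, command and params.reads are placeholders, not taken from the repository):

process FOO {
    input:
    path reads

    output:
    path 'result.txt'

    script:
    """
    your_command --in ${reads} --out result.txt
    """

    // With `nextflow run ... -stub-run`, this block runs instead of the
    // script block, so the workflow wiring can be iterated on quickly.
    stub:
    """
    touch result.txt
    """
}

workflow {
    FOO(Channel.fromPath(params.reads))
}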

Example of conditional execution based on channel output

The conditional process example is great, but it only covers a conditional based on a pre-set param value (params.flag in the example) and does not cover dynamic conditionals based on process/workflow output. For example, one may want to run Sub-workflow1 if Process1 generates non-empty files, while Sub-workflow2 is run if the files are all empty.

Code like the following does not work:

  if( MY_PROCESS.out.map{ it.size() }.sum() == 0 ){
    ch_out = WORKFLOW1()
  } else {
    ch_out = WORKFLOW2()
  }

...since MY_PROCESS.out.map{ it.size() }.sum() is a channel rather than an integer that can be compared to 0. So how can one handle dynamic flow control in Nextflow based on process/workflow output?
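
One possible workaround (a sketch, not an official pattern from this repository) is to keep the decision inside the dataflow graph: filter the summed size into two mutually exclusive value channels, so that only one of the two sub-workflows ever receives an input and runs. This assumes WORKFLOW1 and WORKFLOW2 take that value channel as their trigger input:

total_size = MY_PROCESS.out.map { it.size() }.sum()

// Each filter emits either a single value or nothing, so exactly one
// of the two sub-workflows is triggered at run time.
WORKFLOW1( total_size.filter { it == 0 } )
WORKFLOW2( total_size.filter { it > 0 } )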

Groovy syntax on GitHub

You can get GitHub to play nicely with the Nextflow script file extension (.nf) with these two tricks:

Add the following to the top of every .nf script (enables syntax highlighting):

// vim: syntax=groovy
// -*- mode: groovy;-*-

Create a file called .gitattributes with the following (changes the coloured language bar at the top of the repo to say 100% Groovy instead of 100% Shell):

*.nf linguist-language=Groovy

Hope this helps! It was annoying me on our Nextflow repos 😉

Dockerfile refers to non-existing file

bin/AMPA.pl does not exist in this examples repository, and $ docker build . fails.

$ docker build .
Sending build context to Docker daemon 6.546 MB
Step 0 : FROM pditommaso/dkrbase:1.1
 ---> ae4cb2b803ba
Step 1 : MAINTAINER Paolo Di Tommaso <[email protected]>
 ---> Using cache
 ---> 7f956c07387e
Step 2 : RUN apt-get install -q -y gnuplot python && apt-get clean
 ---> Using cache
 ---> 730aeb7ec1b6
Step 3 : RUN cpanm Math::CDF Math::Round &&   rm -rf /root/.cpanm/work/
 ---> Using cache
 ---> 1d85dd9a180e
Step 4 : RUN wget -q ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.29/ncbi-blast-2.2.29+-x64-linux.tar.gz &&     tar xf ncbi-blast-2.2.29+-x64-linux.tar.gz &&     mv ncbi-blast-2.2.29+ /opt/ &&     rm -rf ncbi-blast-2.2.29+-x64-linux.tar.gz &&     ln -s /opt/ncbi-blast-2.2.29+/ /opt/blast
 ---> Using cache
 ---> c8bfe75956a3
Step 5 : RUN wget -q http://tcoffee.org/Packages/Stable/Version_11.00.8cbe486/linux/T-COFFEE_installer_Version_11.00.8cbe486_linux_x64.tar.gz &&   tar xf T-COFFEE_installer_Version_11.00.8cbe486_linux_x64.tar.gz -C /opt &&   mv /opt/T-COFFEE_installer_Version_11.00.8cbe486_linux_x64 /opt/tcoffee &&   rm -rf /opt/tcoffee/plugins/linux/*  &&   rm T-COFFEE_installer_Version_11.00.8cbe486_linux_x64.tar.gz
 ---> Using cache
 ---> 5381d47ca2b9
Step 6 : ADD bin/AMPA.pl /usr/local/bin/
bin/AMPA.pl: no such file or directory

Temporary resolution: comment out the ADD line in the Dockerfile, or add the script found in another repository (nextflow-io/tests, bin/AMPA.pl).
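
For example, the offending step could simply be commented out until the script is restored (a stopgap, assuming AMPA.pl is not needed for the part of the example being built):

# Stopgap: skip the missing helper script so the image still builds
# ADD bin/AMPA.pl /usr/local/bin/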

manipulating variable outside of scripts

My title may be slightly misleading; however, bear with me.

I have a process iterate_list. Process iterate_list takes a list and does something with each item in the list. When running the script, it takes two inputs: the list and the item it needs to process (which it gets as a consumer from a RabbitMQ queue).

Currently, I give a python script the entire list, and it iterates over each item and does the processing (as one big chunk), returning after completion. This is fine; however, if the system restarts, it starts all over again.

I was wondering how I can make it so that every time my python script processes a single item, it returns that item, I remove it from the list, and then pass the new list back to the process. That way, in case of a system restart/crash, Nextflow knows where it left off and can continue from there.

import groovy.json.JsonSlurper

def jsonSlurper = new JsonSlurper()
def cfg_file = new File('/config.json')
def analysis_config = jsonSlurper.parse(cfg_file)
def cfg_json = cfg_file.getText()
def list_of_items_to_process = []

items = Channel.from(analysis_config.items.keySet())

for (String item : items) {
    list_of_items_to_process << item
}

process iterate_list {
    echo true

    input:
    list_of_items_to_process

    output:
    val 1 into typing_cur

    script:
    """
    python3.7 process_list_items.py ${my_queue} \'${list_of_items_to_process}\'
    """
}

process signal_completion {
    echo true

    input:
    val typing_cur

    script:
    """
    echo "all done!"
    """
}

Basically, the process "iterate_list" takes one "item" from a queue in the message broker. Process iterate_list should look something like:

    process iterate_list{
        echo true

        input:
        list_of_items_to_process

        output:
        val 1 into typing_cur

        script:
        """
        python3.7 process_list_items.py ${my_queue} \'${list_of_items_to_process}\'
        list_of_items_to_process.remove(<output from python script>)
        """
    }

And so for each one, it should run, remove the item it just processed, and restart with a new list.

    initial_list = [1,2,3,4]
    after_first_process_completes = [2,3,4]
    and_eventually = [] <- This is when it should move on to the next process.

Excuse the indents; Stack Overflow wasn't letting me post the code without them.
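
For reference, a sketch of how this is often approached in Nextflow (not from the original post): emit one item per task instead of passing one big list, so that a restarted run with -resume can skip the items that already completed. It assumes process_list_items.py can handle a single item, and uses params.queue as a stand-in for the undefined my_queue variable:

import groovy.json.JsonSlurper

def analysis_config = new JsonSlurper().parse(new File('/config.json'))

// One channel element per item, rather than one big list
items_ch = Channel.from(analysis_config.items.keySet())

process iterate_item {
    echo true

    input:
    val item from items_ch

    output:
    val item into typing_cur

    script:
    """
    python3.7 process_list_items.py ${params.queue} '${item}'
    """
}

process signal_completion {
    echo true

    input:
    val done from typing_cur.collect()

    script:
    """
    echo "all done!"
    """
}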

Merge patterns into main Nextflow docs

The Nextflow patterns provide a lot of value to users at every level. I think they would have more visibility if they were part of the main Nextflow docs, since that seems to be the starting point for most people.

What do you think, @pditommaso?

Example for optional input in tuple?

I am trying to run a configuration where the input is a tuple of paths, some of which are optional. The pattern in this repository works for separate path inputs (or, so says the author), but extending it to my use case results in the error Not a valid path value: NO_FILE.

In this simple example demonstrating the issue, the input is a CSV file defining RNA-seq sample names, forward and reverse read fastqs, and a STAR genome index to align them to. The reverse read is optional.

My question to the community is how I can work around this issue.

Repository setup

nextflow.config

params {
  manifest = null  // csv file name,R1,R2?,index
  outdir = "outs"  // save output bams
}
profiles {
  conda {
    conda.enabled = true
    process.conda = "star samtools"
  }
}

main.nf

process MyProcess {
  publishDir outdir, mode: "copy"
  input:
    tuple val(name), path(R1), path(R2), path(index)
    path outdir
  output:
    path "${name}_Aligned.out.sortedByCoord.bam"
    path "${name}_Aligned.out.sortedByCoord.bam.bai"
  script:
    R2_arg = R2.name == "NO_FILE" ? "" : R2
"""
STAR --readFilesIn $R1 $R2_arg --readFilesCommand gunzip -c \
     --genomeDir $index --outSAMtype BAM SortedByCoordinate \
     --outFileNamePrefix ${name}_
samtools index ${name}_Aligned.out.sortedByCoord.bam
"""
}

workflow {
  MyProcess(
    Channel.fromPath(params.manifest)
      .splitCsv(header: ["name", "R1", "R2", "index"])
      .map { row -> tuple(row.name, file(row.R1), row.R2 ?: "NO_FILE", file(row.index)) },
    params.outdir
  )
}

Run

nextflow run main.nf [-profile conda] --manifest path/to/manifest.csv
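
One possible workaround (a sketch, assuming an empty placeholder file exists at assets/NO_FILE inside the project directory) is to substitute that real file for the missing R2 value, so the path qualifier always receives an actual path while the R2.name == "NO_FILE" check in the process still works:

workflow {
  no_file = file("${projectDir}/assets/NO_FILE")

  samples = Channel.fromPath(params.manifest)
    .splitCsv(header: ["name", "R1", "R2", "index"])
    .map { row ->
      // Fall back to the real placeholder file when R2 is empty
      def r2 = row.R2 ? file(row.R2) : no_file
      tuple(row.name, file(row.R1), r2, file(row.index))
    }

  MyProcess(samples, params.outdir)
}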

Blast example stuck

I'm trying to run blast.nf on my workstation; blast_result and top_hits have been generated, but then the run gets stuck. Any possible reason?

nextflow run examples/blast.nf -with-docker -with-report -with-timeline
N E X T F L O W ~ version 0.27.0
Launching examples/blast.nf [small_nightingale] - revision: 7b4b740be4
[warm up] executor > local
[94/2ab84a] Submitted process > blast (1)

nextflow info
Version: 0.27.0 build 4751
Modified: 09-01-2018 10:18 UTC (05:18 EDT)
System: Linux 3.10.0-514.10.2.el7.x86_64
Runtime: Groovy 2.4.13 on OpenJDK 64-Bit Server VM 1.8.0_121-b13
Encoding: UTF-8 (UTF-8)

Multiple input and multiple output

Hi, I am a new user of Nextflow. I have test1.bed, test1.bim and test1.fam files, and I want to do some QC using plink in Nextflow. How can I import those three files into Nextflow and get the outputs test2.bed, test2.bim and test2.fam, which will be used in the next step to generate test3.bed, test3.bim, test3.fam, and so on? I also need to save all the intermediate files in a directory. Any help with an example?
Kind regards, Zillur
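
A minimal sketch of one way to do this (the QC flags and file names are placeholders; it assumes DSL2 and plink on the PATH): group the .bed/.bim/.fam trio with fromFilePairs and pass the whole trio from step to step, publishing intermediates via publishDir.

nextflow.enable.dsl = 2

params.outdir = 'results'

process PLINK_QC {
    publishDir params.outdir, mode: 'copy'

    input:
    tuple val(prefix), path(bfiles)   // e.g. [ 'test1', [test1.bed, test1.bim, test1.fam] ]

    output:
    tuple val(out_prefix), path("${out_prefix}.{bed,bim,fam}")

    script:
    out_prefix = "${prefix}_qc"
    """
    plink --bfile ${prefix} --maf 0.01 --geno 0.1 --make-bed --out ${out_prefix}
    """
}

workflow {
    plink_ch = Channel.fromFilePairs('test1.{bed,bim,fam}', size: 3) { it.baseName }
    PLINK_QC(plink_ch)
    // PLINK_QC.out has the same [prefix, files] shape, so it can feed the next step directly.
}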

Example of aggregating file with associated metadata

Currently, there is no example of aggregating files AND associated metadata. For instance, in many/most nf-core pipelines the process outputs are something like:

output:
tuple val(meta), path("file.txt")

...but what if one wants to then aggregate all of the file.txt outputs into one table AND include the meta metadata in that output table?

As far as I can tell from scouring the nextflow slack channel, one must "embed" the metadata in the file paths and then parse the file paths in the aggregation step. For example:

Per-file process:

output:
tuple val(meta), path("${meta}.txt")

Aggregation process:

input:
path("*")

script:
"""
[somehow parse {meta} from input file path] 
"""

Is there a better way, especially given the substantial limitations of trying to embed metadata into a file path (e.g., dealing with multiple values and special characters in the metadata values)?

I'm sure a lot of pipeline developers would like a best-practices example of how to deal with this situation (without having to decipher how meta is dealt with in aggregation steps of nf-core pipelines).
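
One alternative (a sketch; PER_SAMPLE, the tuple val(meta), path(...) output shape, and the meta.id field are assumptions) is to keep the metadata in the channel and only join it with the file contents at aggregation time, for example with collectFile:

// Build one line of the aggregated table per sample, carrying meta.id explicitly
PER_SAMPLE.out
    .map { meta, txt -> "${meta.id}\t" + txt.text.trim() }
    .collectFile(name: 'summary.tsv', newLine: true)
    .view()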

Multiple optional inputs

The optional input pattern does not seem to work if a process has more than one optional input.

For example, the following test:

params.inputs = "$projectDir/data/sites.txt"
params.filter = "$projectDir/assets/NO_FILE"

process foo {
  debug true
  input:
  path seq
  path(opt)
  path(opt2)

  script:
  def filter = opt.name != 'NO_FILE' ? "--filter $opt" : ''
  def filter2 = opt2.name != 'NO_FILE' ? "--filter $opt2" : ''
  """
  echo your_command --input $seq $filter $filter2
  """
}

workflow {
  prots_ch = Channel.fromPath(params.inputs, checkIfExists:true)
  opt_file = file(params.filter, checkIfExists:true)
  opt2_file = file(params.filter, checkIfExists:true)

  foo(prots_ch, opt_file, opt2_file)
}

Returns a collision error:

ERROR ~ Error executing process > 'foo (1)'

Caused by:
  Process `foo` input file name collision -- There are multiple input files for each of the following file names: NO_FILE

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

How should one generalize this pattern to handle multiple optional inputs?
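
One way to generalize the pattern (a sketch, assuming two distinct placeholder files assets/NO_FILE and assets/NO_FILE2 exist) is to give each optional input its own placeholder name, so the staged file names no longer collide:

params.inputs  = "$projectDir/data/sites.txt"
params.filter  = "$projectDir/assets/NO_FILE"
params.filter2 = "$projectDir/assets/NO_FILE2"

process foo {
  debug true

  input:
  path seq
  path opt
  path opt2

  script:
  def filter  = opt.name  != 'NO_FILE'  ? "--filter $opt"  : ''
  def filter2 = opt2.name != 'NO_FILE2' ? "--filter $opt2" : ''
  """
  echo your_command --input $seq $filter $filter2
  """
}

workflow {
  prots_ch = Channel.fromPath(params.inputs, checkIfExists: true)
  foo(prots_ch, file(params.filter, checkIfExists: true), file(params.filter2, checkIfExists: true))
}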

blast-parallel.nf adding makeblastdb process

Hi,
Would you be able to add a makeblastdb process to blast-parallel.nf?

genomes = Channel.fromPath(params.genomes)

process formatBlastDatabases {

  storeDir '/db/genomes'

  input:
  file species from genomes

  output:
  file "${dbName}.*" into blastDb

  script:
  dbName = species.baseName
  """
  makeblastdb -dbtype nucl -in ${species} -out ${dbName}
  """
}

Thank you in advance.

Michal

Typo for "Process when empty"?

Current example for process-when-empty:

params.inputs = ''

process foo {
  debug true  
  input:
  val x
  when:
  x ## 'EMPTY'

  script:
  '''
  echo hello
  ''' 
}

workflow {
  reads_ch = params.inputs
    ? Channel.fromPath(params.inputs, checkIfExists:true)
    : Channel.empty()

  reads_ch \
    | ifEmpty { 'EMPTY' } \
    | foo
}

I'm guessing that x ## 'EMPTY' should be x == 'EMPTY'

Parsing from initial run argument into process scripts

How do I pass an argument from the initiation command into one of my process scripts that is written in another language, such as Python?

Initiation cmd:
nextflow run /Path/to/myscript.nf --in '/Path/to/MyData'

process dataDirectories {

"""
#!/usr/bin/env python2.7

import os
import getpass

currentUser=getpass.getuser()

dataPath="/home/" + currentUser + "/WGS_Data/" + MyData
resultsPath="/home/" + currentUser + "/WGS_Results/" + MyData

try:
	os.makedirs(dataPath, 0o777)    
	os.makedirs(resultsPath, 0o777)

except:
	pass
"""

}

I would like to get the string 'MyData' from the path (which is in my '--in' argument) into my python script.
Can this be done?
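
A sketch of one way to do it (assuming the pipeline is launched with --in /Path/to/MyData as above): read the value from params.in, derive the directory name with file(params.in).name, and interpolate it into the Python block:

params.in = '/Path/to/MyData'

process dataDirectories {

    script:
    def dataName = file(params.in).name   // 'MyData', taken from the --in argument
    """
    #!/usr/bin/env python2.7

    import os
    import getpass

    currentUser = getpass.getuser()

    dataPath    = "/home/" + currentUser + "/WGS_Data/" + "${dataName}"
    resultsPath = "/home/" + currentUser + "/WGS_Results/" + "${dataName}"

    try:
        os.makedirs(dataPath, 0o777)
        os.makedirs(resultsPath, 0o777)
    except OSError:
        pass
    """
}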

Add combinations pattern

Channel.from([['A', 10], ['B', 8], ['C', 5], ['D', 4]])
  .toList().map{ [it, it].combinations().findAll{ a, b -> a[1] < b[1]} }
  .flatMap()
  .view()
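
For context, this keeps every pair [a, b] whose first element has the lower score. With the example values, six pairs are emitted, one per line (the exact ordering may vary):

[[B, 8], [A, 10]]
[[C, 5], [A, 10]]
[[C, 5], [B, 8]]
[[D, 4], [A, 10]]
[[D, 4], [B, 8]]
[[D, 4], [C, 5]]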
