
Repository to host tool-specific module files for the Nextflow DSL2 community!

Home Page: https://nf-co.re/modules

License: MIT License


Introduction

nf-core/modules


THIS REPOSITORY IS UNDER ACTIVE DEVELOPMENT. SYNTAX, ORGANISATION AND LAYOUT MAY CHANGE WITHOUT NOTICE!

A repository for hosting Nextflow DSL2 module files containing tool-specific process definitions and their associated documentation.


Using existing modules

The module files hosted in this repository define a set of processes for software tools such as fastqc, bwa, samtools etc. This allows you to share and add common functionality across multiple pipelines in a modular fashion.

We have written a helper command in the nf-core/tools package that uses the GitHub API to obtain information about the module files present in the modules/ directory of this repository. This includes using git commit hashes to track changes for reproducibility, and downloading and installing all of the relevant module files.

  1. Install the latest version of nf-core/tools (>=2.0)

  2. List the available modules:

    $ nf-core modules list remote
    
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'
    
    nf-core/tools version 2.0
    
    INFO     Modules available from nf-core/modules (master):                       pipeline_modules.py:164
    
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃ Module Name                    ┃
    ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
    │ bandage/image                  │
    │ bcftools/consensus             │
    │ bcftools/filter                │
    │ bcftools/isec                  │
    ..truncated..
  3. Install the module in your pipeline directory:

    $ nf-core modules install fastqc
    
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'
    
    nf-core/tools version 2.0
    
    INFO     Installing fastqc                                                      pipeline_modules.py:213
    INFO     Downloaded 3 files to ./modules/nf-core/modules/fastqc                 pipeline_modules.py:236
  4. Import the module in your Nextflow script:

    #!/usr/bin/env nextflow
    
    nextflow.enable.dsl = 2
    
    include { FASTQC } from './modules/nf-core/modules/fastqc/main'
  5. Remove the module from the pipeline repository if required:

    $ nf-core modules remove fastqc
    
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'
    
    nf-core/tools version 2.0
    
    INFO     Removing fastqc                                                        pipeline_modules.py:271
    INFO     Successfully removed fastqc                                            pipeline_modules.py:285
  6. Check that a locally installed nf-core module is up-to-date compared to the one hosted in this repo:

    $ nf-core modules lint fastqc
    
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'
    
    nf-core/tools version 2.0
    
    INFO     Linting pipeline: .                                                    lint.py:104
    INFO     Linting module: fastqc                                                 lint.py:106
    
    ╭─────────────────────────────────────────────────────────────────────────────────╮
    │ [!] 1 Test Warning                                                              │
    ╰─────────────────────────────────────────────────────────────────────────────────╯
    ╭──────────────┬───────────────────────────────┬──────────────────────────────────╮
    │ Module name  │ Test message                  │ File path                        │
    ├──────────────┼───────────────────────────────┼──────────────────────────────────┤
    │ fastqc       │ Local copy of module outdated │ modules/nf-core/modules/fastqc/  │
    ╰──────────────┴───────────────────────────────┴──────────────────────────────────╯
    ╭──────────────────────╮
    │ LINT RESULTS SUMMARY │
    ├──────────────────────┤
    │ [✔]  15 Tests Passed │
    │ [!]   1 Test Warning │
    │ [✗]   0 Test Failed  │
    ╰──────────────────────╯

Adding new modules

If you wish to contribute a new module, please see the documentation on the nf-core website.

Please be kind to our code reviewers and submit one pull request per module :)

Help

For further information or help, don't hesitate to get in touch in the #modules channel on Slack (you can join with this invite).

Citation

If you use the module files in this repository for your analysis, please cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

People

Contributors

adamrtalbot, andersgs, apeltzer, asp8200, chris-cheshire, drpatelh, edmundmiller, ewels, friederikehanssen, grst, heuermh, jasmezz, jfy133, jianhong, joseespinosa, kevinmenden, louislenezet, mahesh-panchal, mashehu, matthdsm, maxulysse, muffato, nvnieuwk, pinin4fjords, ramprasadn, rpetit3, sateeshperi, sruthipsuresh, susijo, veitveit


Issues

Handle module / process imports

Lots of people use nf-core pipelines offline. We want to make the process of using modules from a different repository as simple as possible.

One solution would be to use git submodule to add nf-core/modules as a git submodule to every pipeline. By default, doing git clone will not pull the submodules. Doing git clone --recursive or git submodule update --init --recursive will pull the module repository.

Loading logic could then be:

  • Try to load the files locally - works if submodule is initialised. Fails otherwise.
  • If fails, try to load from the web
  • If fails, exit with an error

Then by default most people running online will pull the online files dynamically. But pulling a pipeline to use offline is super easy and does not require any changes to files or config.

Currently nf-core download manually pulls institutional config files and edits nextflow.config so that the pipeline loads these files. This could also be done with submodules as above, without any need to edit any files.

Limitations would be that we have to manage the git hash of the modules repository in two places - the git submodule file and the nextflow.config file. We can lint to check that these two are the same. Also, this forces pipelines to use a single hash for all modules in the pipeline. I think this is probably ok for reasons of maintaining sanity though.
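That two-places consistency check could be automated. A minimal lint sketch, assuming a hypothetical `modules_git_hash` parameter in nextflow.config and the submodule commit obtained separately (e.g. from `git submodule status`) — the real parameter name and location would need to be agreed first:

```python
import re

def extract_config_hash(config_text, key="modules_git_hash"):
    """Pull the commit hash assigned to `key` out of nextflow.config text.

    `modules_git_hash` is a hypothetical setting name used only for
    illustration; pipelines may record the commit differently.
    """
    m = re.search(key + r"\s*=\s*['\"]([0-9a-f]{7,40})['\"]", config_text)
    return m.group(1) if m else None

def hashes_match(submodule_hash, config_text):
    """Lint check: the submodule commit and the config commit must agree."""
    return extract_config_hash(config_text) == submodule_hash

config = "params.modules_git_hash = 'a1b2c3d'"
print(hashes_match("a1b2c3d", config))  # prints True when the hashes agree
```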

Thoughts?

Use JSON for meta data

The more I think about it, the more I think that JSON is more appropriate for the meta information. We have nested lists and other semi-complicated structures, and JSON is more verbose and clear with this kind of data.
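For illustration, a nested metadata structure of the kind discussed round-trips cleanly through JSON. The field names below are hypothetical — the actual schema was still under discussion:

```python
import json

# Field names here are illustrative only, loosely mirroring the FastQC
# documentation example discussed elsewhere in this repository.
meta = {
    "name": "fastqc",
    "keywords": ["read qc", "adapter"],
    "tools": [
        {"fastqc": {
            "homepage": "https://www.bioinformatics.babraham.ac.uk/projects/fastqc/",
            "description": "FastQC gives general quality metrics about your reads.",
        }}
    ],
}

text = json.dumps(meta, indent=2)   # nested lists and maps serialise cleanly
assert json.loads(text) == meta     # and round-trip without ambiguity
```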

Write custom test for checking contents of BAM file.

https://pytest-workflow.readthedocs.io/en/stable/#writing-custom-tests

Bowtie/Bowtie2 include the run command in the header, which is never going to be the same between environments, so the md5 hash will never be equal across different containers.

$ samtools view -H test.bam
## Singularity
@HD     VN:1.0  SO:unsorted
@SQ     SN:gi|170079663|ref|NC_010473.1|        LN:4686137
@PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 -x ./bowtie2/NC_010473 --threads 1 -1 test_R1.fastq.gz -2 test_R2.fastq.gz"
@PG     ID:samtools     PN:samtools     PP:bowtie2      VN:1.11 CL:samtools view -@ 1 -bhS -o test.bam -
## Conda
@HD     VN:1.0  SO:unsorted
@SQ     SN:gi|170079663|ref|NC_010473.1|        LN:4686137
@PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/tmp/pytest_workflow_4fbqrxe4/Run_bowtie2_index_and_align_paired-end/work/conda/env-10b78180015f409ae983f51f20f43c6a/bin/bowtie2-align-s --wrapper basic-0 -x ./bowtie2/NC_010473 --threads 1 -1 test_R1.fastq.gz -2 test_R2.fastq.gz"
@PG     ID:samtools     PN:samtools     PP:bowtie2      VN:1.11 CL:samtools view -@ 1 -bhS -o test.bam -
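One way to write such a custom test is to hash the header with the @PG records stripped, since those are the lines embedding the environment-specific command paths. A minimal Python sketch (sequence names and paths abbreviated; not nf-core's actual implementation):

```python
import hashlib

def header_md5_ignoring_pg(header_text):
    """md5 of a SAM header with the @PG lines dropped.

    @PG records embed the exact command line (absolute paths differ
    between Singularity and Conda), so they are excluded before hashing.
    """
    kept = [line for line in header_text.splitlines()
            if not line.startswith("@PG")]
    return hashlib.md5("\n".join(kept).encode()).hexdigest()

singularity = ("@HD\tVN:1.0\tSO:unsorted\n"
               "@SQ\tSN:ref\tLN:4686137\n"
               "@PG\tID:bowtie2\tCL:/usr/local/bin/bowtie2-align-s ...")
conda = ("@HD\tVN:1.0\tSO:unsorted\n"
         "@SQ\tSN:ref\tLN:4686137\n"
         "@PG\tID:bowtie2\tCL:/tmp/pytest_workflow_xxx/bowtie2-align-s ...")

# The filtered hashes agree even though the @PG command lines differ.
assert header_md5_ignoring_pg(singularity) == header_md5_ignoring_pg(conda)
```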

Share singularity images using CVMFS

Hi here,

Maybe this was already suggested somewhere?
Have you ever thought about sharing the Singularity images using the CernVM File System? It allows files to be provided over the network via a web of mirrors.

The Galaxy project uses this technology to share databanks, Singularity images and configs across Galaxy instances.
https://galaxyproject.org/blog/2019-02-cvmfs/

Easier to suggest than to implement, of course. So far I'm just a client (and soon a stratum 1) and have never tried to build something from scratch.

My 2 cents

Test how variable numbers of inputs and outputs work

Need to look into how Nextflow DSL2 handles variable numbers of inputs or outputs.

For example - TrimGalore! can optionally save untrimmed reads. If that is enabled, we will have an additional output channel. How do pipelines handle this?

Port modules and single-tool workflows from Babraham

I'll be working on adding the modules and single-tool workflows that were already used and tested at the Babraham.

To avoid duplication of efforts, the tools I'll be working on initially will include the following:

QC

  • FastQC
  • FastQ Screen
  • MultiQC

Trimming

  • Trim Galore

Alignment

  • Bowtie2
  • HISAT2
  • Bismark
  • deduplicate_bismark
  • bismark_methylation_extractor
  • bismark2bedGraph
  • coverage2cytosine
  • bismark2summary
  • bismark2report

Read Simulator:

  • Sherman

Allele-specific sorting:

  • SNPsplit

Module documentation format

We need to decide how best to document each individual module, e.g. what the module does, keywords for findability, homepage links for each tool used in the process, etc. @sven and I came up with a rudimentary version of this, but I think we will need more discussion to get this right.

/*
* Description:
*     Run FastQC on sequenced reads
* Keywords:
*     read qc
*     adapter
* Tools:
*     FastQC:
*         homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
*         documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
*         description: FastQC gives general quality metrics about your reads.
*                      It provides information about the quality score distribution
*                      across your reads, the per base sequence content (%A/C/G/T).
*                      You get information about adapter contamination and other
*                      overrepresented sequences.
*/

It would also be good to be able to generate automated docs for the types of objects that are required as input: and output: for each module, the script: section and any other information that may be useful. @sven suggested we may be able to get this directly by plugging into NF.

This is all still open for discussion so please chime in if you have some ideas.

Add codeowners

So that owners get pinged and there are actual people who keep up with the software that the modules are taking advantage of.

Use --seed parameters for aligners / other tools wherever possible

Given that we now test for identical outputs from a given module in order to detect changes when updating the module itself, it would be good to factor in instances where, for example, alignments are generated at random each time the same tool is run. This will rightly break the CI tests, but one way around that is to use --seed parameters where available, e.g. in Bowtie2.

The implementation should be as simple as passing the appropriate optional argument to the tool in the main.nf script for the tests.

Also see #143 (comment)

Module tests

There will be various tests we can perform on individual module files...how far we go and how we implement this is up for discussion.

  1. Test and parse the module file to create documentation with information about the tools used in the process, e.g. homepage links etc.
  2. Test and parse the content of the process via NF e.g. input:, output: and script:
  3. Test the module works on include with a vanilla template script
  4. Test the actual process command works by bundling containers from biocontainers as default and testing the execution - this will also require the appropriate test data to be hosted somewhere for CI tests. This could be a can of worms as we should be able to expect contributors to test this anyway (@sven?)

Add t-coffee module

Write a tcoffee module:

  • Create the module for tcoffee itself.
  • Since there is no data set available for testing multiple sequence alignment, include a dataset on the modules branch of nf-core/test-datasets

Module file versioning

We need to come up with a way to version each module or at least be able to use a particular version of a module within the main pipeline script. Through previous discussions we have somewhat agreed that we need to be able to do this via git commit as we are able to do with nf-core/configs. Whether we are able to do this at the level of individual module files or a commit id for the entire nf-core/modules repo is still up for discussion.

Run a test on all modules which have been modified in that push / PR, all in parallel in separate jobs

I was wondering if, using the bits @ewels posted on Slack, we could solve both of these issues and introduce "test autodiscovery" from a single GitHub Action that spawns a pytest-workflow job for each changed folder using the test-matrix strategy. That way, the pytest-workflow test-dir could be set to each module directory, with the tests folder contained in each module.

IMO, that would clean up quite a few redundancies.

Originally posted by @grst in #80 (comment)

Without separate workflow files for each module.

Direct download of Singularity images via HTTPS

Before we released v2.0 of the rnaseq pipeline Nextflow didn't have direct download support for Singularity images. Paolo has now added this functionality here and it will be available in any releases after 20.10.0.

I had already added some logic to download the Singularity images in the DSL2 module files but it had to be removed in #76 for the reasons outlined above. Be great to add it back in after the next stable Nextflow release!

Use remote repos with include statement

Need to test and possibly work out a way to use a remote git repo with the include statement
e.g.

modules_base = "https://raw.githubusercontent.com/nf-core/modules/${params.module_version}"
include "${modules_base}" params(params)

File structure

Suggested during discussion at the Stockholm hackathon about potential repository organisation:

.
├── .github
│   └── workflows
│       └── test-processes.yml
├── README.md
├── nf-core
└── tools
    ├── bwa
    │   └── mem
    │       ├── main.nf
    │       ├── meta.yml
    │       └── test-action.yml
    ├── fastqc
    │   ├── main.nf
    │   ├── meta.yml
    │   └── test-action.yml
    └── samtools
        ├── index
        │   ├── main.nf
        │   ├── meta.yml
        │   └── test-action.yml
        └── sort
            ├── main.nf
            ├── meta.yml
            └── test-action.yml
  • Have a directory for every tool
  • Have subdirectories for every subcommand
  • Have a yaml meta file with descriptions of the process
  • .github/workflows/test-processes.yml will have a step for each process tool.
    • Each step can use path to only run when those files are changed (docs)
    • Each step can reference the test-action.yml file held in the process subdirectory with uses (docs)
    • Need to lint that .github/workflows/test-processes.yml has a step for every process

  • QUESTION: commands that can be run in very different ways?
    • Should we have a different subdirectory for commands that can be run in a very different manner?
  • QUESTION: What happens with variable numbers of inputs and outputs? cf. #6
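The lint point above could be as simple as a set comparison between the tool/subcommand directories and the step names found in the workflow file. A minimal sketch — directory and step names are illustrative, and parsing of the actual YAML is omitted:

```python
# Sketch of the proposed lint: every tool/subcommand directory must have
# a matching step in .github/workflows/test-processes.yml.
module_dirs = {"bwa/mem", "fastqc", "samtools/index", "samtools/sort"}
workflow_steps = {"bwa/mem", "fastqc", "samtools/index"}

missing = module_dirs - workflow_steps
if missing:
    print("Lint failed: no CI step for " + ", ".join(sorted(missing)))
# prints: Lint failed: no CI step for samtools/sort
```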

Add Editor Config lint back in

I commented out the Editor Config linting in 082c582, but it would be good to fix it and add it back in. Possibly in one go, to get all of the tests passing again.

Cache nextflow binary

Edit: cache the Nextflow binary during CI jobs to speed up the workflows. This will be applicable across nf-core CI jobs.

Module parameter inheritance and parameter wrapping

Copied from the slack channel:


Hi guys,

Can I get your feedback on a custom parameter inheritance model we have built-in for our modules?

Our user story is such that we wanted a set of default params defined inside the module to run the process in the case that the user imports the module and does nothing else.
We then wanted to be able to override the params with those from the parent nf file, but without making large boilerplate calls using addParams or by passing arguments as channels as we feel these should be retained for data.
Finally, we wanted to be able to set group parameters on multiple includes of the same module but retaining the ability to override the module params individually if we wanted to.
We found during our testing that any module params defined actually override the global parameters which is the opposite of what we wanted. This forces either the route via addParams or the route via channels, neither of which we wanted to use.

I constructed a custom Groovy class which automatically overrides the params by matching names. First, the module params are prefixed with internal_* - then any parameter in the parent nf file can override an internal param by prefixing it with the module name (e.g. for cutadapt, params.cutadapt_adapter_seq would override params.internal_adapter_seq inside the module).
This provides a model where defaults are used unless explicitly overridden in the parent. The same param is overridden in all module instances unless specifically overridden using addParams. This gives us the flexibility for example to define a global adapter sequence for cutadapt, but define separate output directories for each module instance.

The functionality requires 3 lines of code per module to implement.

I have posted the code below - please ignore the rest of the module parameter wise as we are still building out and generalising (we also know there is a cutadapt module, its just an easy example)

#!/usr/bin/env nextflow
// Include NfUtils
Class groovyClass = new GroovyClassLoader(getClass().getClassLoader()).parseClass(new File("groovy/NfUtils.groovy"));
GroovyObject nfUtils = (GroovyObject) groovyClass.newInstance();
// Define internal params
module_name = 'cutadapt'
// Specify DSL2
nextflow.preview.dsl = 2
// TODO check version of cutadapt in host process
// Define default nextflow internals
params.internal_outdir = './results'
params.internal_process_name = 'cutadapt'
params.internal_output_prefix = ''
params.internal_min_quality = 10
params.internal_min_length = 16
params.internal_adapter_sequence = 'AGATCGGAAGAGC'
// Check if global params need to override the internal defaults
nfUtils.check_internal_overrides(module_name, params)
// Trimming reusable component
process cutadapt {
    // Tag
    tag "${sample_id}"
    publishDir "${params.internal_outdir}/${params.internal_process_name}",
        mode: "copy", overwrite: true
    input:
        //tuple val(sample_id), path(reads)
        path(reads)
    output:
        //tuple val(sample_id), path("${reads.simpleName}.trimmed.fq.gz")
        path("${params.internal_output_prefix}${reads.simpleName}.trimmed.fq.gz")
    shell:
    """
    cutadapt \
        -j ${task.cpus} \
        -q ${params.internal_min_quality} \
        --minimum-length ${params.internal_min_length} \
        -a ${params.internal_adapter_sequence} \
        -o ${params.internal_output_prefix}${reads.simpleName}.trimmed.fq.gz $reads
    """
}
class NfUtils{
    def check_internal_overrides(String moduleName, Map params)
    {
        // get params set of keys
        Set paramsKeySet = params.keySet()
        // Iterate through and set internals to the correct parameter at runtime
        paramsKeySet.each {
            if(it.startsWith("internal_")) {
                def searchString = moduleName + '_' + it.replace('internal_', '');
                if(paramsKeySet.contains(searchString)) {
                    params.replace(it, params.get(searchString))
                }
            }
        }
    }
}
#!/usr/bin/env nextflow
// Define DSL2
nextflow.preview.dsl=2
// Log
log.info ("Starting Cutadapt trimming test pipeline")
/* Define global params
--------------------------------------------------------------------------------------*/
params.cutadapt_output_prefix = 'trimmed_'
/* Module inclusions 
--------------------------------------------------------------------------------------*/
include cutadapt from './trim-reads.nf' addParams(cutadapt_process_name: 'cutadapt1')
include cutadapt as cutadapt2 from './trim-reads.nf' addParams(cutadapt_process_name: 'cutadapt2')
/*------------------------------------------------------------------------------------*/
/* Define input channels
--------------------------------------------------------------------------------------*/
testPaths = [
  ['Sample 1', "$baseDir/input/readfile1.fq.gz"],
  ['Sample 2', "$baseDir/input/readfile2.fq.gz"],
  ['Sample 3', "$baseDir/input/readfile3.fq.gz"],
  ['Sample 4', "$baseDir/input/readfile4.fq.gz"],
  ['Sample 5', "$baseDir/input/readfile5.fq.gz"],
  ['Sample 6', "$baseDir/input/readfile6.fq.gz"]
]
// Create channel of test data (excluding the sample ID)
Channel
  .from(testPaths)
  .map { row -> file(row[1]) }
  .set { ch_test_inputs }

Channel
  .from(testPaths)
  .map { row -> file(row[1]) }
  .set { ch_test_inputs2 }
/*------------------------------------------------------------------------------------*/
// Run workflow
workflow {
    // Run cutadapt
    cutadapt( ch_test_inputs )
    // Run cutadapt
    cutadapt2( ch_test_inputs2 )
    // Collect file names and view output
    //cutadapt.out | view 
}

Configure Homer reproducibly and efficiently

This is just a placeholder for a future discussion. I'm working on adding some Homer modules. The problem is the way configuration occurs in the currently used Dockerfile, and the fact that the Docker images are read-only.

https://hub.docker.com/r/dennishazelett/homer

Here's a documented example of how they create various genomes off a base docker file.

So far I have

    perl /usr/local/share/homer-4.11-2/configureHomer.pl \\
        -install $genome \\
        -keepScript

Which runs but I'm not able to take the /usr/local/share/homer-4.11-2/ directory and use it as an output.
