
Repository to host tool-specific module files for the Nextflow DSL2 community!

Home Page: https://nf-co.re/modules

License: MIT License


Introduction

nf-core/modules


THIS REPOSITORY IS UNDER ACTIVE DEVELOPMENT. SYNTAX, ORGANISATION AND LAYOUT MAY CHANGE WITHOUT NOTICE!

A repository for hosting Nextflow DSL2 module files containing tool-specific process definitions and their associated documentation.


Using existing modules

The module files hosted in this repository define a set of processes for software tools such as fastqc, bwa, samtools etc. This allows you to share and add common functionality across multiple pipelines in a modular fashion.

We have written a helper command in the nf-core/tools package that uses the GitHub API to obtain information about the module files present in the modules/ directory of this repository. This includes using git commit hashes to track changes for reproducibility, and downloading and installing all of the relevant module files.

  1. Install the latest version of nf-core/tools (>=2.0)

  2. List the available modules:

    $ nf-core modules list remote
    
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'
    
    nf-core/tools version 2.0
    
    INFO     Modules available from nf-core/modules (master):                       pipeline_modules.py:164
    
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃ Module Name                    ┃
    ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
    │ bandage/image                  │
    │ bcftools/consensus             │
    │ bcftools/filter                │
    │ bcftools/isec                  │
    ..truncated..
  3. Install the module in your pipeline directory:

    $ nf-core modules install fastqc
    
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'
    
    nf-core/tools version 2.0
    
    INFO     Installing fastqc                                                      pipeline_modules.py:213
    INFO     Downloaded 3 files to ./modules/nf-core/modules/fastqc                 pipeline_modules.py:236
  4. Import the module in your Nextflow script:

    #!/usr/bin/env nextflow
    
    nextflow.enable.dsl = 2
    
    include { FASTQC } from './modules/nf-core/modules/fastqc/main'
  5. Remove the module from the pipeline repository if required:

    $ nf-core modules remove fastqc
    
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'
    
    nf-core/tools version 2.0
    
    INFO     Removing fastqc                                                        pipeline_modules.py:271
    INFO     Successfully removed fastqc                                            pipeline_modules.py:285
  6. Check that a locally installed nf-core module is up-to-date compared to the one hosted in this repo:

    $ nf-core modules lint fastqc
    
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'
    
    nf-core/tools version 2.0
    
    INFO     Linting pipeline: .                                                    lint.py:104
    INFO     Linting module: fastqc                                                 lint.py:106
    
    ╭─────────────────────────────────────────────────────────────────────────────────╮
    │ [!] 1 Test Warning                                                              │
    ╰─────────────────────────────────────────────────────────────────────────────────╯
    ╭──────────────┬───────────────────────────────┬──────────────────────────────────╮
    │ Module name  │ Test message                  │ File path                        │
    ├──────────────┼───────────────────────────────┼──────────────────────────────────┤
    │ fastqc       │ Local copy of module outdated │ modules/nf-core/modules/fastqc/  │
    ╰──────────────┴───────────────────────────────┴──────────────────────────────────╯
    ╭──────────────────────╮
    │ LINT RESULTS SUMMARY │
    ├──────────────────────┤
    │ [✔]  15 Tests Passed │
    │ [!]   1 Test Warning │
    │ [✗]   0 Test Failed  │
    ╰──────────────────────╯

Adding new modules

If you wish to contribute a new module, please see the documentation on the nf-core website.

Please be kind to our code reviewers and submit one pull request per module :)

Help

For further information or help, don't hesitate to get in touch in the #modules channel on Slack (you can join with this invite).

Citation

If you use the module files in this repository for your analysis, please cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

People

Contributors

adamrtalbot, andersgs, apeltzer, asp8200, chris-cheshire, drpatelh, edmundmiller, ewels, friederikehanssen, grst, heuermh, jasmezz, jfy133, jianhong, joseespinosa, kevinmenden, louislenezet, mahesh-panchal, mashehu, matthdsm, maxulysse, muffato, nvnieuwk, pinin4fjords, ramprasadn, rpetit3, sateeshperi, sruthipsuresh, susijo, veitveit


Issues

Handle module / process imports

Lots of people use nf-core pipelines offline. We want to make the process of using modules from a different repository as simple as possible.

One solution would be to use git submodule to add nf-core/modules as a git submodule to every pipeline. By default, doing git clone will not pull the submodules. Doing git clone --recursive or git submodule update --init --recursive will pull the module repository.

Loading logic could then be:

  • Try to load the files locally - works if submodule is initialised. Fails otherwise.
  • If fails, try to load from the web
  • If fails, exit with an error

Then by default most people running online will pull the online files dynamically. But pulling a pipeline to use offline is super easy and does not require any changes to files or config.

Currently nf-core download manually pulls institutional config files and edits nextflow.config so that the pipeline loads these files. This could also be done with submodules as above, without any need to edit any files.

Limitations would be that we have to manage the git hash of the modules repository in two places - the git submodule file and the nextflow.config file. We can lint to check that these two are the same. Also, this forces pipelines to use a single hash for all modules in the pipeline. I think this is probably ok for reasons of maintaining sanity though.
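That two-places consistency check could be automated. A minimal lint sketch, assuming a hypothetical `modules_git_hash` parameter in nextflow.config and the submodule commit obtained separately (e.g. from `git submodule status`) — the real parameter name and location would need to be agreed first:

```python
import re

def extract_config_hash(config_text, key="modules_git_hash"):
    """Pull the commit hash assigned to `key` out of nextflow.config text.

    `modules_git_hash` is a hypothetical setting name used only for
    illustration; pipelines may record the commit differently.
    """
    m = re.search(key + r"\s*=\s*['\"]([0-9a-f]{7,40})['\"]", config_text)
    return m.group(1) if m else None

def hashes_match(submodule_hash, config_text):
    """Lint check: the submodule commit and the config commit must agree."""
    return extract_config_hash(config_text) == submodule_hash

config = "params.modules_git_hash = 'a1b2c3d'"
print(hashes_match("a1b2c3d", config))  # prints True when the hashes agree
```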

Thoughts?

Use JSON for meta data

The more I think about it, the more I think that JSON is more appropriate for the meta information. We have nested lists and other semi-complicated structures, and JSON is more verbose and clear with this kind of data.
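For illustration, a nested metadata structure of the kind discussed round-trips cleanly through JSON. The field names below are hypothetical — the actual schema was still under discussion:

```python
import json

# Field names here are illustrative only, loosely mirroring the FastQC
# documentation example discussed elsewhere in this repository.
meta = {
    "name": "fastqc",
    "keywords": ["read qc", "adapter"],
    "tools": [
        {"fastqc": {
            "homepage": "https://www.bioinformatics.babraham.ac.uk/projects/fastqc/",
            "description": "FastQC gives general quality metrics about your reads.",
        }}
    ],
}

text = json.dumps(meta, indent=2)   # nested lists and maps serialise cleanly
assert json.loads(text) == meta     # and round-trip without ambiguity
```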

Write custom test for checking contents of BAM file.

https://pytest-workflow.readthedocs.io/en/stable/#writing-custom-tests

Bowtie/Bowtie2 include the run command in the header, which is never going to be the same between environments, so the md5 hash will never be equal across different containers.

$ samtools view -H test.bam
## Singularity
@HD     VN:1.0  SO:unsorted
@SQ     SN:gi|170079663|ref|NC_010473.1|        LN:4686137
@PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 -x ./bowtie2/NC_010473 --threads 1 -1 test_R1.fastq.gz -2 test_R2.fastq.gz"
@PG     ID:samtools     PN:samtools     PP:bowtie2      VN:1.11 CL:samtools view -@ 1 -bhS -o test.bam -
## Conda
@HD     VN:1.0  SO:unsorted
@SQ     SN:gi|170079663|ref|NC_010473.1|        LN:4686137
@PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/tmp/pytest_workflow_4fbqrxe4/Run_bowtie2_index_and_align_paired-end/work/conda/env-10b78180015f409ae983f51f20f43c6a/bin/bowtie2-align-s --wrapper basic-0 -x ./bowtie2/NC_010473 --threads 1 -1 test_R1.fastq.gz -2 test_R2.fastq.gz"
@PG     ID:samtools     PN:samtools     PP:bowtie2      VN:1.11 CL:samtools view -@ 1 -bhS -o test.bam -
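One way to write such a custom test is to hash the header with the @PG records stripped, since those are the lines embedding the environment-specific command paths. A minimal Python sketch (sequence names and paths abbreviated; not nf-core's actual implementation):

```python
import hashlib

def header_md5_ignoring_pg(header_text):
    """md5 of a SAM header with the @PG lines dropped.

    @PG records embed the exact command line (absolute paths differ
    between Singularity and Conda), so they are excluded before hashing.
    """
    kept = [line for line in header_text.splitlines()
            if not line.startswith("@PG")]
    return hashlib.md5("\n".join(kept).encode()).hexdigest()

singularity = ("@HD\tVN:1.0\tSO:unsorted\n"
               "@SQ\tSN:ref\tLN:4686137\n"
               "@PG\tID:bowtie2\tCL:/usr/local/bin/bowtie2-align-s ...")
conda = ("@HD\tVN:1.0\tSO:unsorted\n"
         "@SQ\tSN:ref\tLN:4686137\n"
         "@PG\tID:bowtie2\tCL:/tmp/pytest_workflow_xxx/bowtie2-align-s ...")

# The filtered hashes agree even though the @PG command lines differ.
assert header_md5_ignoring_pg(singularity) == header_md5_ignoring_pg(conda)
```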

Share singularity images using CVMFS

Hi here,

Maybe this was already suggested somewhere?
Have you ever thought about sharing the Singularity images using the CernVM File System? It allows files to be provided over the network via a web of mirrors.

The Galaxy project uses this technology to share databanks, Singularity images and configs across Galaxy instances.
https://galaxyproject.org/blog/2019-02-cvmfs/

Easier to suggest than to implement, of course. So far I'm just a client (and soon a stratum 1) and have never tried to build something from scratch.

My 2 cents

Test how variable numbers of inputs and outputs work

Need to look into how Nextflow DSL2 handles variable numbers of inputs or outputs.

For example - TrimGalore! can optionally save untrimmed reads. If that is enabled, we will have an additional output channel. How do pipelines handle this?

Port modules and single-tool workflows from Babraham

I'll be working on adding the modules and single-tool workflows that were already used and tested at the Babraham.

To avoid duplication of efforts, the tools I'll be working on initially will include the following:

QC

  • FastQC
  • FastQ Screen
  • MultiQC

Trimming

  • Trim Galore

Alignment

  • Bowtie2
  • HISAT2
  • Bismark
  • deduplicate_bismark
  • bismark_methylation_extractor
  • bismark2bedGraph
  • coverage2cytosine
  • bismark2summary
  • bismark2report

Read Simulator:

  • Sherman

Allele-specific sorting:

  • SNPsplit

Module documentation format

We need to decide how best to document each individual module, e.g. what the module does, keywords for findability, homepage links for each tool used in the process, etc. @sven and I came up with a rudimentary version of this, but I think we will need more discussion to get this right.

/*
* Description:
*     Run FastQC on sequenced reads
* Keywords:
*     read qc
*     adapter
* Tools:
*     FastQC:
*         homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
*         documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
*         description: FastQC gives general quality metrics about your reads.
*                      It provides information about the quality score distribution
*                      across your reads, the per base sequence content (%A/C/G/T).
*                      You get information about adapter contamination and other
*                      overrepresented sequences.
*/

It would also be good to be able to generate automated docs for the types of objects that are required as input: and output: for each module, the script: section and any other information that may be useful. @sven suggested we may be able to get this directly by plugging into NF.

This is all still open for discussion so please chime in if you have some ideas.

Add codeowners

So that owners get pinged and there are actual people who keep up with the software that the modules are taking advantage of.

Use --seed parameters for aligners / other tools wherever possible

Given that we now test for identical outputs from a given module in order to detect changes when updating the module itself, it would be good to factor in instances where, for example, alignments are generated at random each time the same tool is run. This will rightly break the CI tests, but one way around that is to use --seed parameters where available, e.g. in Bowtie2.

The implementation should be as simple as passing the appropriate optional argument to the tool in the main.nf script for the tests.

Also see #143 (comment)

Module tests

There will be various tests we can perform on individual module files...how far we go and how we implement this is up for discussion.

  1. Test and parse the module file to create documentation with information about the tools used in the process, e.g. homepage links etc.
  2. Test and parse the content of the process via NF e.g. input:, output: and script:
  3. Test the module works on include with a vanilla template script
  4. Test the actual process command works by bundling containers from biocontainers as default and testing the execution - this will also require the appropriate test data to be hosted somewhere for CI tests. This could be a can of worms as we should be able to expect contributors to test this anyway (@sven?)

Add t-coffee module

Write a tcoffee module:

  • Create the module for tcoffee itself.
  • Since there is no data set available for testing multiple sequence alignment, include a dataset on the modules branch of nf-core/test-datasets

Module file versioning

We need to come up with a way to version each module or at least be able to use a particular version of a module within the main pipeline script. Through previous discussions we have somewhat agreed that we need to be able to do this via git commit as we are able to do with nf-core/configs. Whether we are able to do this at the level of individual module files or a commit id for the entire nf-core/modules repo is still up for discussion.

Run a test on all modules which have been modified in that push / PR, all in parallel in separate jobs

I was wondering if, using the bits @ewels posted on Slack, we could solve both of these issues and introduce "test autodiscovery" from a single GitHub Action that spawns a pytest-workflow job for each changed folder using the test-matrix strategy. That way, the pytest-workflow test-dir could be set to each module directory, with the tests folder contained in each module.

IMO, that would clean up quite a few redundancies.

Originally posted by @grst in #80 (comment)

Without separate workflow files for each module.

Direct download of Singularity images via HTTPS

Before we released v2.0 of the rnaseq pipeline Nextflow didn't have direct download support for Singularity images. Paolo has now added this functionality here and it will be available in any releases after 20.10.0.

I had already added some logic to download the Singularity images in the DSL2 module files but it had to be removed in #76 for the reasons outlined above. Be great to add it back in after the next stable Nextflow release!

Use remote repos with include statement

Need to test and possibly work out a way to use a remote git repo with the include statement
e.g.

modules_base = "https://raw.githubusercontent.com/nf-core/modules/${params.module_version}"
include "${modules_base}" params(params)

File structure

Suggested during discussion at the Stockholm hackathon about potential repository organisation:

.
├── .github
│   └── workflows
│       └── test-processes.yml
├── README.md
├── nf-core
└── tools
    ├── bwa
    │   └── mem
    │       ├── main.nf
    │       ├── meta.yml
    │       └── test-action.yml
    ├── fastqc
    │   ├── main.nf
    │   ├── meta.yml
    │   └── test-action.yml
    └── samtools
        ├── index
        │   ├── main.nf
        │   ├── meta.yml
        │   └── test-action.yml
        └── sort
            ├── main.nf
            ├── meta.yml
            └── test-action.yml
  • Have a directory for every tool
  • Have subdirectories for every subcommand
  • Have a yaml meta file with descriptions of the process
  • .github/workflows/test-processes.yml will have a step for each process tool.
    • Each step can use path to only run when those files are changed (docs)
    • Each step can reference the test-action.yml file held in the process subdirectory with uses (docs)
    • Need to lint that .github/workflows/test-processes.yml has a step for every process

  • QUESTION: commands that can be run in very different ways?
    • Should we have a different subdirectory for commands that can be run in a very different manner?
  • QUESTION: What happens with variable numbers of inputs and outputs? cf. #6
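The lint point above could be as simple as a set comparison between the tool/subcommand directories and the step names found in the workflow file. A minimal sketch — directory and step names are illustrative, and parsing of the actual YAML is omitted:

```python
# Sketch of the proposed lint: every tool/subcommand directory must have
# a matching step in .github/workflows/test-processes.yml.
module_dirs = {"bwa/mem", "fastqc", "samtools/index", "samtools/sort"}
workflow_steps = {"bwa/mem", "fastqc", "samtools/index"}

missing = module_dirs - workflow_steps
if missing:
    print("Lint failed: no CI step for " + ", ".join(sorted(missing)))
# prints: Lint failed: no CI step for samtools/sort
```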

Add Editor Config lint back in

I commented out the Editor Config linting in 082c582, but it would be good to fix it and add it back in. Possibly in one go, to get all of the tests passing again.

Cache nextflow binary

Edit: cache the Nextflow binary during CI jobs to speed up the workflows. This will be applicable across nf-core CI jobs.

Module parameter inheritance and parameter wrapping

Copied from the slack channel:


Hi guys,

Can I get your feedback on a custom parameter inheritance model we have built-in for our modules?

Our user story is such that we wanted a set of default params defined inside the module to run the process in the case that the user imports the module and does nothing else.
We then wanted to be able to override the params with those from the parent nf file, but without making large boilerplate calls using addParams or by passing arguments as channels as we feel these should be retained for data.
Finally, we wanted to be able to set group parameters on multiple includes of the same module but retaining the ability to override the module params individually if we wanted to.
We found during our testing that any module params defined actually override the global parameters which is the opposite of what we wanted. This forces either the route via addParams or the route via channels, neither of which we wanted to use.

I constructed a custom Groovy class which automatically overrides the params by matching names. First, the module params are prefixed with internal_* - then any parameter in the parent nf file can override an internal param by prefixing it with the module name (e.g. for cutadapt, params.cutadapt_adapter_seq would override params.internal_adapter_seq inside the module).
This provides a model where defaults are used unless explicitly overridden in the parent. The same param is overridden in all module instances unless specifically overridden using addParams. This gives us the flexibility for example to define a global adapter sequence for cutadapt, but define separate output directories for each module instance.

The functionality requires 3 lines of code per module to implement.

I have posted the code below - please ignore the rest of the module parameter wise as we are still building out and generalising (we also know there is a cutadapt module, its just an easy example)

#!/usr/bin/env nextflow
// Include NfUtils
Class groovyClass = new GroovyClassLoader(getClass().getClassLoader()).parseClass(new File("groovy/NfUtils.groovy"));
GroovyObject nfUtils = (GroovyObject) groovyClass.newInstance();
// Define internal params
module_name = 'cutadapt'
// Specify DSL2
nextflow.preview.dsl = 2
// TODO check version of cutadapt in host process
// Define default nextflow internals
params.internal_outdir = './results'
params.internal_process_name = 'cutadapt'
params.internal_output_prefix = ''
params.internal_min_quality = 10
params.internal_min_length = 16
params.internal_adapter_sequence = 'AGATCGGAAGAGC'
// Check if global params need to override the internal defaults
nfUtils.check_internal_overrides(module_name, params)
// Trimming reusable component
process cutadapt {
    // Tag
    tag "${sample_id}"
    publishDir "${params.internal_outdir}/${params.internal_process_name}",
        mode: "copy", overwrite: true
    input:
        //tuple val(sample_id), path(reads)
        path(reads)
    output:
        //tuple val(sample_id), path("${reads.simpleName}.trimmed.fq.gz")
        path("${params.internal_output_prefix}${reads.simpleName}.trimmed.fq.gz")
    shell:
    """
    cutadapt \
        -j ${task.cpus} \
        -q ${params.internal_min_quality} \
        --minimum-length ${params.internal_min_length} \
        -a ${params.internal_adapter_sequence} \
        -o ${params.internal_output_prefix}${reads.simpleName}.trimmed.fq.gz $reads
    """
}
class NfUtils{
    def check_internal_overrides(String moduleName, Map params)
    {
        // get params set of keys
        Set paramsKeySet = params.keySet()
        // Iterate through and set internals to the correct parameter at runtime
        paramsKeySet.each {
            if(it.startsWith("internal_")) {
                def searchString = moduleName + '_' + it.replace('internal_', '');
                if(paramsKeySet.contains(searchString)) {
                    params.replace(it, params.get(searchString))
                }
            }
        }
    }
}
#!/usr/bin/env nextflow
// Define DSL2
nextflow.preview.dsl=2
// Log
log.info ("Starting Cutadapt trimming test pipeline")
/* Define global params
--------------------------------------------------------------------------------------*/
params.cutadapt_output_prefix = 'trimmed_'
/* Module inclusions 
--------------------------------------------------------------------------------------*/
include cutadapt from './trim-reads.nf' addParams(cutadapt_process_name: 'cutadapt1')
include cutadapt as cutadapt2 from './trim-reads.nf' addParams(cutadapt_process_name: 'cutadapt2')
/*------------------------------------------------------------------------------------*/
/* Define input channels
--------------------------------------------------------------------------------------*/
testPaths = [
  ['Sample 1', "$baseDir/input/readfile1.fq.gz"],
  ['Sample 2', "$baseDir/input/readfile2.fq.gz"],
  ['Sample 3', "$baseDir/input/readfile3.fq.gz"],
  ['Sample 4', "$baseDir/input/readfile4.fq.gz"],
  ['Sample 5', "$baseDir/input/readfile5.fq.gz"],
  ['Sample 6', "$baseDir/input/readfile6.fq.gz"]
]
// Create channel of test data (excluding the sample ID)
Channel
  .from(testPaths)
  .map { row -> file(row[1]) }
  .set { ch_test_inputs }

Channel
  .from(testPaths)
  .map { row -> file(row[1]) }
  .set { ch_test_inputs2 }
/*------------------------------------------------------------------------------------*/
// Run workflow
workflow {
    // Run cutadapt
    cutadapt( ch_test_inputs )
    // Run cutadapt
    cutadapt2( ch_test_inputs2 )
    // Collect file names and view output
    //cutadapt.out | view 
}

Configure Homer reproducibly and efficiently

This is just a placeholder for a future discussion. I'm working on adding some Homer modules. The problem is the way configuration occurs in the currently used Dockerfile, and the fact that the Docker images are read-only.

https://hub.docker.com/r/dennishazelett/homer

Here's a documented example of how they create various genomes off a base docker file.

So far I have

    perl /usr/local/share/homer-4.11-2/configureHomer.pl \\
        -install $genome \\
        -keepScript

Which runs but I'm not able to take the /usr/local/share/homer-4.11-2/ directory and use it as an output.
