
A small-RNA sequencing analysis pipeline

Home Page: https://nf-co.re/smrnaseq

License: MIT License



nf-core/smrnaseq

GitHub Actions CI Status · GitHub Actions Linting Status · AWS CI · Cite with Zenodo · nf-test

Nextflow · run with conda · run with docker · run with singularity · Launch on Seqera Platform

Get help on Slack · Follow on Twitter · Follow on Mastodon · Watch on YouTube

Introduction

nf-core/smrnaseq is a bioinformatics best-practice analysis pipeline for Small RNA-Seq.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Online videos

A short talk about the history, current status and functionality on offer in this pipeline was given by Lorena Pantano (@lpantano) on 9th November 2021 as part of the nf-core/bytesize series.

You can find numerous talks on the nf-core events page on various topics, including writing pipelines/modules in Nextflow DSL2, using nf-core tooling and running nf-core pipelines, as well as more generic content like contributing to GitHub. Please check them out!

Pipeline summary

  1. Quality check and trimming
    1. Raw read QC (FastQC)
    2. UMI extraction and miRNA adapter trimming (UMI-tools) (Optional)
    3. 3' adapter trimming (fastp)
    4. Read quality and length filter (fastp)
    5. Trimmed read QC (FastQC)
  2. UMI deduplication (Optional)
    1. Deduplication at the FastQ level (UMICollapse)
    2. Barcode and miRNA adapter extraction (UMI-tools)
    3. Read length filter (fastp)
  3. miRNA QC (miRTrace)
  4. Contamination filtering (Bowtie2) (Optional)
    1. rRNA filtering
    2. tRNA filtering
    3. cDNA filtering
    4. ncRNA filtering
    5. piRNA filtering
    6. Other contamination filtering
  5. UMI barcode deduplication (UMI-tools)
  6. miRNA quantification
    • EdgeR
      1. Read alignment against miRBase mature miRNA (Bowtie1)
      2. Post-alignment processing of the alignment against mature miRNA (SAMtools)
      3. Alignment of unmapped reads (from the mature miRNA step) against miRBase hairpin (Bowtie1)
      4. Post-alignment processing of the alignment against hairpin (SAMtools)
      5. Analysis of miRBase or MirGeneDB hairpin counts (edgeR)
        • TMM normalization and a table of top-expressed hairpins
        • MDS plot clustering samples
        • Heatmap of sample similarities
    • Mirtop quantification
      1. Read collapsing (seqcluster)
      2. miRNA and isomiR annotation (mirtop)
  7. Genome Quantification (Optional)
    1. Read alignment against the host reference genome (Bowtie1)
    2. Post-alignment processing of the alignment against the host reference genome (SAMtools)
  8. Novel miRNAs and known miRNAs discovery (MiRDeep2) (Optional)
    1. Mapping against reference genome with the mapper module
    2. Known and novel miRNA discovery with the mirdeep2 module
  9. Present QC for raw read, alignment, and expression results (MultiQC)

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1
Clone1_N1,s3://ngi-igenomes/test-data/smrnaseq/C1-N1-R1_S4_L001_R1_001.fastq.gz
Clone1_N3,s3://ngi-igenomes/test-data/smrnaseq/C1-N3-R1_S6_L001_R1_001.fastq.gz
Clone9_N1,s3://ngi-igenomes/test-data/smrnaseq/C9-N1-R1_S7_L001_R1_001.fastq.gz
Clone9_N2,s3://ngi-igenomes/test-data/smrnaseq/C9-N2-R1_S8_L001_R1_001.fastq.gz
Clone9_N3,s3://ngi-igenomes/test-data/smrnaseq/C9-N3-R1_S9_L001_R1_001.fastq.gz
Control_N1,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N1-R1_S1_L001_R1_001.fastq.gz
Control_N2,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N2-R1_S2_L001_R1_001.fastq.gz
Control_N3,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N3-R1_S3_L001_R1_001.fastq.gz

Each row represents a single-end FastQ file.
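A quick shell check can confirm the samplesheet has the expected two-column shape before launching; the check_samplesheet helper here is hypothetical and not part of the pipeline:

```shell
# Hypothetical pre-flight check: every data row should have exactly two
# comma-separated fields and a path ending in .fastq.gz
check_samplesheet() {
    awk -F, 'NR == 1 { next }                        # skip header
             NF != 2 || $2 !~ /\.fastq\.gz$/ { bad++ }
             END { print (bad ? bad : 0), "bad rows" }' "$1"
}

# Example against a minimal two-row sheet:
printf 'sample,fastq_1\nClone1_N1,s3://bucket/C1-N1.fastq.gz\n' > samplesheet.csv
check_samplesheet samplesheet.csv   # -> 0 bad rows
```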

Now, you can run the pipeline using:

nextflow run nf-core/smrnaseq \
  -profile <docker/singularity/.../institute> \
  --input samplesheet.csv \
  --genome 'GRCh37' \
  --mirtrace_species 'hsa' \
  --protocol 'illumina' \
  --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/smrnaseq was originally written by P. Ewels, C. Wang, R. Hammarén, L. Pantano, A. Peltzer.

We thank the following people for their extensive assistance in the development of this pipeline:

Lorena Pantano (@lpantano) from MIT updated the pipeline to Nextflow DSL2.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #smrnaseq channel (you can join with this invite).

Citations

If you use nf-core/smrnaseq for your analysis, please cite it using the following doi: 10.5281/zenodo.3456879

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

smrnaseq's People

Contributors

abartlett004, adamrtalbot, apeltzer, christopher-mohr, chuan-wang, ckcomputomics, drejom, drpatelh, erikdanielsson, ewels, fhausmann, grst, hammarn, jemten, joseespinosa, kevinmenden, klkeys, kstawiski, lcabus-flomics, lpantano, magdalenazz, maxulysse, mjsteinbaugh, nf-core-bot, pditommaso, robsyme, sdjebali, sguizard, sirselim, wbau


smrnaseq's Issues

User provided --three_prime_adapter is not recognised

When running on several small RNA-seq data sets we have in house, no miRNAs are detected. Upon inspecting the trimmed files, they are all on the order of ~80 kB in size, and checking the trimming logs, the issue is that the adapter being used isn't the one provided by the user.

This is the command being used:

nextflow run nf-core/smrnaseq -r 1.0.0 --reads 'fastq/*.fastq.gz' -profile conda \
  --genome 'GRCh37' --saveReference -resume --min_length 17 \
  --three_prime_adapter AGATCGGAAGAGC

However the adapter being reported in the trimming output is TGGAATTCTCGGGTGCCAAGG - the adapter that is defined in the illumina protocol.

When I clone the pipeline and hack the illumina protocol adapter to be the one we want to use for trimming, everything works as expected:

nextflow run /tmp/smrnaseq/main.nf --reads 'fastq/*.fastq.gz' \
  -profile conda --protocol illumina --genome 'GRCh37' \
  --saveReference -resume --min_length 17

So for some reason the 'custom' user-defined adapter parameter isn't being assigned, or the illumina protocol is taking precedence. I haven't had time to dig into this any further, but am happy to do some more testing if required.
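A hedged sketch of what a precedence fix might look like (hypothetical variable names; the pipeline's actual code may differ): only fall back to the protocol's adapter when the user has not supplied one explicitly.

```groovy
// Hypothetical sketch, not the pipeline's actual code:
// prefer an explicitly supplied adapter over the protocol default
def adapter = params.three_prime_adapter ?: protocol_defaults[params.protocol].adapter
```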

Make sample name cleaning regex configurable

A bunch of steps in the pipeline attempt to clean off common filename suffixes to give nicer sample names, eg:

prefix = reads.toString() - ~/(.R1)?(_R1)?(_trimmed)?(\.fq)?(\.fastq)?(\.gz)?$/

  1. (.R1)? should probably be (\.R1)? as the unescaped . is a wildcard matching any character
  2. This regex should be kept as a params variable to avoid repeating, and to make it configurable by the end user if it's having an undesirable effect.
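The effect of the suffix-stripping regex (with the dot escaped as suggested) can be demonstrated in plain shell; the clean() helper here is hypothetical:

```shell
# Hypothetical demo of the suffix-cleaning regex with the dot escaped
clean() {
    echo "$1" | sed -E 's/(\.R1)?(_R1)?(_trimmed)?(\.fq)?(\.fastq)?(\.gz)?$//'
}

clean "sample_R1.fastq.gz"       # -> sample
clean "sample.R1_trimmed.fq.gz"  # -> sample
```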

mirdeep2 errors in v1.1.0 pipeline

Check Documentation

I have checked the following places for your error:

Description of the bug

I'm seeing mirdeep2 errors pop up with running the default configuration of the pipeline:

Error executing process > 'mirdeep2 (1)'

Caused by:
  Process `mirdeep2 (1)` terminated with an error exit status (255)

Command executed:

  perl -ane 's/[ybkmrsw]/N/ig;print;' hairpin.fa > hairpin_ok.fa
  sed 's/ .*//' genome.edited.fa | awk '$1 ~ /^>/ {gsub(/_/,"",$1); print; next} {print}' > genome_nowhitespace.fa

  miRDeep2.pl \
  30607-032.R1_trimmed_collapsed.fa \
  genome_nowhitespace.fa \
  30607-032.R1_trimmed_reads_vs_refdb.arf \
  mature.fa \
  none \
  hairpin_ok.fa \
  -d \
  -z _30607-032

Command exit status:
  255

Command output:


  #####################################
  #                                   #
  # miRDeep2.0.1.2                    #
  #                                   #
  # last change: 22/01/2019           #
  #                                   #
  #####################################

  miRDeep2 started at 18:54:23


  #Starting miRDeep2

Command error:
  #Starting miRDeep2
  /opt/conda/envs/nf-core-smrnaseq-1.1.0/bin/miRDeep2.pl 30607-032.R1_trimmed_collapsed.fa genome_nowhitespace.fa 30607-032.R1_trimmed_reads_vs_refdb.arf mature.fa none hairpin_ok.fa -d -z _30607-032

  miRDeep2 started at 18:54:23


  mkdir mirdeep_runs/run_30_09_2021_t_18_54_23_30607-032

  The mapped reference id chr22_KI270733v1_random from file 30607-032.R1_trimmed_reads_vs_refdb.arf is not an id of the genome file genome_nowhitespace.fa

Work dir:
  /opt/nextflow/work/work/fe/60acc7f099188ab8709ab98cf007f4

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Steps to reproduce

nextflow run 'nf-core/smrnaseq' -profile 'docker' \
    --genome 'GRCh38' \
    --input 'fastq/*.R1.fastq.gz'

I'm currently re-running the pipeline with --skip_mirdeep to see if it will successfully run through.
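The error looks consistent with the genome-renaming step shown in the command above: the awk call strips underscores from genome FASTA headers, while the ARF file keeps them, so ids like chr22_KI270733v1_random no longer match. A small demonstration of that awk step:

```shell
# The awk step from the command above removes '_' from FASTA headers,
# so the id recorded in the ARF file no longer matches the edited genome
echo ">chr22_KI270733v1_random" \
  | awk '$1 ~ /^>/ {gsub(/_/,"",$1); print; next} {print}'
# -> >chr22KI270733v1random
```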

System

  • Hardware: AWS AMI Ubuntu 20
  • Executor: EC2 / Docker
  • OS: Ubuntu 20
  • Version: 1.1.0 pipeline

Nextflow Installation

  • Version: 21.04.3

Set up AWS megatests

AWS megatests is now running nicely and we’re trying to set up all (most) nf-core pipelines to run a big dataset. We need to identify a set of public data to run benchmarks for the pipeline.

The idea is that this will run automatically for every release of the nf-core/smrnaseq pipeline. The results will then be publicly accessible from s3 and viewable through the website: https://nf-co.re/smrnaseq/results - this means that people can manually compare differences in output between pipeline releases if they wish.

We need a dataset that is as “normal” as possible, mouse or human, sequenced relatively recently and with a bunch of replicates etc. It can be a fairly large project

I'm hoping that @lpantano can help here, but suggestions from anyone and everyone are more than welcome! ✋🏻

In practical terms, once decided we need to:

  • Upload the FastQ files to s3: s3://nf-core-awsmegatests/smrnaseq/input_data/ (I can help with this)
  • Update test_full.config to work with these file paths
  • Check .github/workflows/awsfulltest.yml (should be no changes required I think?)
  • Merge, and try running the dev branch manually

AWS S3 Issue

Hello,

I'm trying to run smRNASeq pipeline as below. However keep running in to the S3 Issue.

nextflow run nf-core/smrnaseq -profile singularity --genome CanFam3.1 --mirtrace_species cfa --mirtrace_protocol qiaseq --input '*.fastq.gz' --protocol qiaseq
N E X T F L O W ~ version 21.04.3
Launching nf-core/smrnaseq [tender_marconi] - revision: 03333bf [master]


[nf-core ASCII art logo]
nf-core/smrnaseq v1.1.0

Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: KR7X00MMWCJEK9JE; S3 Extended Request ID: Ep7w7RBLvRc/uP/NXcIxainkPrD73+6nof37vxNW85gFlYZQCukp4d+Os2ZR/msdmpDUDMbly4E=)

cat .nextflow.log:

Oct-19 10:31:59.899 [main] DEBUG nextflow.plugin.PluginUpdater - Starting plugin nf-amazon version: 1.0.5
Oct-19 10:31:59.900 [main] INFO org.pf4j.AbstractPluginManager - Start plugin '[email protected]'
Oct-19 10:31:59.925 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started [email protected]
Oct-19 10:31:59.940 [main] DEBUG nextflow.file.FileHelper - > Added 'S3FileSystemProvider' to list of installed providers [s3]
Oct-19 10:31:59.940 [main] DEBUG nextflow.file.FileHelper - Started plugin 'nf-amazon' required to handle file: s3://ngi-igenomes/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa
Oct-19 10:31:59.946 [main] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Oct-19 10:31:59.953 [main] DEBUG nextflow.Global - Using AWS credential defined in default section in file: /sc/kzd/home/ponnar02/.aws/credentials
Oct-19 10:31:59.955 [main] DEBUG nextflow.file.FileHelper - AWS S3 config details: {secret_key=nolvM8.., region=us-east-1, access_key=AKIAI4..}
Oct-19 10:32:01.477 [main] DEBUG nextflow.Session - Session aborted -- Cause: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: KR7X00MMWCJEK9JE; S3 Extended Request ID: Ep7w7RBLvRc/uP/NXcIxainkPrD73+6nof37vxNW85gFlYZQCukp4d+Os2ZR/msdmpDUDMbly4E=)
Oct-19 10:32:01.496 [main] ERROR nextflow.cli.Launcher - @unknown
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: KR7X00MMWCJEK9JE; S3 Extended Request ID: Ep7w7RBLvRc/uP/NXcIxainkPrD73+6nof37vxNW85gFlYZQCukp4d+Os2ZR/msdmpDUDMbly4E=)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4854)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:880)
at com.upplication.s3fs.AmazonS3Client.listObjects(AmazonS3Client.java:105)
at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:113)
at com.upplication.s3fs.S3FileSystemProvider.getAccessControl(S3FileSystemProvider.java:921)
at com.upplication.s3fs.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:607)
at java.nio.file.Files.exists(Files.java:2385)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at org.codehaus.groovy.runtime.callsite.StaticMetaMethodSite.invoke(StaticMetaMethodSite.java:44)
at org.codehaus.groovy.runtime.callsite.StaticMetaMethodSite.call(StaticMetaMethodSite.java:89)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.StaticMetaMethodSite.call(StaticMetaMethodSite.java:94)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
at nextflow.extension.FilesEx.exists(FilesEx.groovy:454)
at nextflow.file.FileHelper.checkIfExists(FileHelper.groovy:986)
at nextflow.file.FileHelper$checkIfExists$2.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
at nextflow.Nextflow.file(Nextflow.groovy:159)
at nextflow.Nextflow$file.callStatic(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:55)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:217)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:240)
at Script_4145bff8.runScript(Script_4145bff8:104)
at nextflow.script.BaseScript.runDsl1(BaseScript.groovy:164)
at nextflow.script.BaseScript.run(BaseScript.groovy:200)
at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:221)
at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:212)
at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:120)
at nextflow.cli.CmdRun.run(CmdRun.groovy:302)
at nextflow.cli.Launcher.run(Launcher.groovy:475)
at nextflow.cli.Launcher.main(Launcher.groovy:657)

Error in the collapse process

Hi, thanks for your work. I'm trying to run this pipeline but it stops with the following error.

Error executing process > 'bowtie_miRBase_hairpin_collapsed (10.umitransformed.clean.fastq 10.umitransformed.clean_trimmed_umi_trimmed.fastq)'

Caused by:
  No such property: baseName for class: nextflow.util.BlankSeparatedList

Source block:
  index_base = index.toString().tokenize(' ')[0].tokenize('.')[0]
  prefix = reads.baseName
  seq_center = params.seq_center ? "--sam-RG ID:${prefix} --sam-RG 'CN:${params.seq_center}'" : ''
  """
      bowtie \\
          $index_base \\
          -p ${task.cpus} \\
          -t \\
          -k 50 \\
          -a \\
          --best \\
          --strata \\
          -e 99999 \\
          --chunkmbs 2048 \\
          -q <(cat $reads) \\
          -S $seq_center \\
          | samtools view -bS - > ${prefix}.bam
      """

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I use the pipeline with this configuration:

~/nextflow run nf-core/smrnaseq --reads '../cleaned/*.umitransformed.clean.fq.gz' --min_length 17 --genome 'GRCh37' --skipQC -profile singularity

How could I resolve this problem?
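The error suggests that reads arrives as a list of files (a BlankSeparatedList) rather than a single file, so .baseName is not defined on it. A hypothetical sketch of a workaround in the process block (untested):

```groovy
// Hypothetical sketch: take the first file's baseName when 'reads'
// is a collection of files rather than a single path
prefix = reads instanceof java.nio.file.Path ? reads.baseName : reads[0].baseName
```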

[FEATURE] (Optional) Additional filtering / contamination filtering steps

nf-core/smrnaseq feature request

Is your feature request related to a problem? Please describe

There are currently no (or few?) filtering / contamination filtering steps implemented (for tRNA, rRNA, cDNA, ncRNA, piRNA, ...) - we have some modules available for this and would like to contribute them.

Describe the solution you'd like

See above - making this available for everyone to optionally filter contaminants out.

Describe alternatives you've considered

Keeping these out ;-)

Add takarabio SMARTer smRNA-Seq Kit profile option

This could be a useful addition for the profiles, which could be named 'smart', for instance.

In their documentation, Takara provides settings for cutadapt which I think translate to the following settings in the Trim Galore config for the pipeline:

min_length = 15
clip_R1 = 3
three_prime_adapter = 'AAAAAAAAAA'

I will test these settings and report if this works...
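Expressed as a hypothetical Nextflow config fragment (untested; parameter names assumed to match the existing min_length style):

```groovy
// Hypothetical 'smart' protocol preset mirroring Takara's cutadapt settings
params {
    min_length          = 15
    clip_R1             = 3
    three_prime_adapter = 'AAAAAAAAAA'
}
```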

Changelog mentions duplicate releases

Also, according to the CHANGELOG, there are already two 1.0 releases on top of the 1.0.0, but the SciLifeLab/NGI-smRNAseq repository has none...

I think the version should be bumped to the next dev release, and the matter of the old 1.0 releases should be fixed.

mirtrace tries to allocate half of the installed RAM

mirtrace can be executed in two ways: with the mirtrace wrapper script or calling the jar directly (java -jar mirtrace.jar...).

mirtrace requires a large amount of heap allocated on the Java virtual machine. Its README states that the wrapper script sets the heap to half of the system's RAM. For custom heap values, they suggest not using the wrapper script and calling the jar directly. See https://github.com/friedlanderlab/mirtrace

Currently this Nextflow pipeline uses the mirtrace wrapper script, so it tries to allocate half of the RAM installed in the system. I'm trying to run this pipeline on a shared server with a lot of RAM installed but with a smaller RAM limit available for me. If the system has 1.4TB of RAM installed, mirtrace tries to allocate 700GB, but I have a limit of 300GB.

I made a pull request to mirtrace to let the user define the heap limit in the mirtrace wrapper as well (friedlanderlab/mirtrace#4). In parallel, while my contribution is reviewed, I would like to report this issue and ask if you could please call the mirtrace jar directly with a reasonable heap value (maybe derived from the Nextflow memory limit?). My Nextflow skills are still very limited and I would appreciate it a lot if someone could fix this (I would definitely learn from the solution) or give me an over-detailed explanation so I could fix it myself and submit a PR here.

My first successful attempt to get this fixed is at zeehio@27ec0cf.

It requires (1) finding the location of mirtrace.jar and (2) calling java directly.

For (1) I used the same approach used in the mirtrace wrapper script:

mirtracejar=\$(dirname \$(which mirtrace))

For (2) I replaced the mirtrace qc call (in the mirtrace process, in main.nf) with:

java -Xms4096M -Xmx4096M -jar \$mirtracejar/mirtrace.jar --mirtrace-wrapper-name mirtrace qc 

A proper solution would set -Xms4096M -Xmx4096M dynamically, based on the current nextflow memory limits. That's beyond my current nextflow skills but it should be easy for someone with experience.
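The dynamic part could be sketched like this (values and variable names are hypothetical; in a Nextflow process the MB figure would come from something like task.memory.toMega()):

```shell
# Sketch: derive the JVM heap from the memory granted to the task,
# leaving some headroom for the JVM itself (values are hypothetical)
MEM_MB=8192                          # e.g. task.memory.toMega() in Nextflow
HEAP_MB=$(( MEM_MB * 80 / 100 ))     # 80% of the task allocation
echo "-Xms${HEAP_MB}M -Xmx${HEAP_MB}M"
# -> -Xms6553M -Xmx6553M
```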

Thanks in advance

PS: might be related to #44, but I can't tell from that issue if that's the case.

Cannot change Trimgalore max_length parameter

When investigating other small RNAs one might want to increase the max_length of Trim Galore, so that reads > 40 bp are not thrown out. However, this is not possible in the current 1.0.0 version of the smrnaseq pipeline. In the main.nf script the max_length is hardcoded as 40 bp, making it impossible to change this parameter.

I changed the hardcoded bit of Trim Galore and added a parameter that can be assigned in the config file, trying to mimic the syntax and style of the min_length parameter. Please note that there is no default in the adapted script. This could very well be added as 40 bp in the future so people won't experience changes when rerunning the pipeline.
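A hypothetical sketch of such a change, mimicking the min_length style (untested; names assumed):

```groovy
// Hypothetical: expose max_length as a params entry with the old
// hardcoded value as its default, then reference it in the process
params.max_length = 40
// in the trim_galore process script:
// --max_length ${params.max_length}
```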

I have trouble with creating a branch/PR, so if one of the devs could help me with that, that would be amazing.

Error executing process > bowtie indices... mirBase issue?

Hi there!

I used to run nf-core/smrnaseq and it worked like a charm until last week, when I started having issues with the pipeline. I used the same command earlier:
nextflow run ~/Downloads/smrnaseq/ -profile docker --input 'fq.gz' --outdir result --genome GRCm38

Now, when I run the same command on the same dataset I get errors:
Error executing process bowtie indices..

Caused by:
Can't stage file ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz -- reason: Connection refused (Connection refused)

Can you please let me know if there's an issue with mirbase datasets, or maybe something wrong that I overlooked? thx!

The pipeline gets stuck in the 'mirtrace' process with no progress for days.

missing mirtop results

Check Documentation

I have checked the following places for your error:

Description of the bug

I recently applied the pipeline to my miRNAseq data and I was particularly interested into mirtop results.
However, the pipeline does not produce either the mirtop results or the results/mirtop directory.
In other words, the step "miRNA and isomiR annotation from step 4.1 (mirtop)" seems missing.
I also checked the "Nextflow workflow report" produced by the pipeline and the step of mirtop is not included in the execution. It does not appear in the legend of tasks and it does not have a running time or memory usage.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: [nextflow run nf-core/smrnaseq -r 1.1.0 --input '/storage_1/fastq2_small/*N*.fastq.gz' -profile docker --protocol custom --genome GRCm38 --clip_r1 0 --three_prime_clip_r1 0 --max_cpus 30 --three_prime_adapter AGATCGGAAGAGCACACGTCT --mirtrace_protocol Illumina]

Expected behaviour

The results/mirtop directory with isomiR annotation.

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline

System

  • Hardware: [Desktop]
  • Executor: [local]
  • OS: [Ubuntu 18.04.5 LTS]

Container engine

  • Engine: [Docker]
  • version: [1.1.0]
  • Image tag: [e.g. nfcore/smrnaseq:1.1.0]

Thank you for your future reply.

edgeR_mirna error

Running smrnaseq on an AWS EC2 (Ubuntu 18.04, 64 GB memory, 16 cpu) with the following command line:

nextflow run nf-core/smrnaseq -r 1.0.0 -profile docker \
--reads 'sample1.*' \
--genome 'GRCh37' \
--max_memory '60.GB' \
 --outdir smrnaseq_output \
--protocol cats

gives the following error:

Error executing process > 'edgeR_mirna'

Caused by:
  Process `edgeR_mirna` terminated with an error exit status (1)

Command executed:

  edgeR_miRBase.r Kit4RNA3.mature.stats Kit4RNA3.hairpin.stats

Command exit status:
  1

Command output:
  $mature
  [1] "sample1.mature.stats"

  $hairpin
  [1] "sample1.hairpin.stats"

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Loading required package: limma
  Loading required package: edgeR
  Loading required package: statmod
  Loading required package: data.table
  Loading required package: gplots

  Attaching package: ‘gplots’

  The following object is masked from ‘package:stats’:

      lowess

  Error in apply(data, 1, function(row) all(row == 0)) :
    dim(X) must have a positive length
  Execution halted

Running edgeR_miRBase.r locally gives the same error.

trimming options ignored

Hi,
I'd like to point out that at least some of the trimming options are not properly recognised.
When I set --clip_R1 4 --three_prime_clip_R1 4, I got the following on screen:

Pipeline Release  : 1.0.0
Run Name          : hopeful_joliot
Reads             : data/reads/test/*.fastq.gz
Genome            : GRCh37
Min Trimmed Length: 17
Trim 5' R1        : 0
Trim 3' R1        : 0
miRBase mature    : s3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/SmallRNA/mature.fa
miRBase hairpin   : s3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/SmallRNA/hairpin.fa
Bowtie Index for Ref: s3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome
Save Reference    : Yes
Protocol          : illumina
miRTrace species  : hsa
3' adapter        : TGGAATTCTCGGGTGCCAAGG
...
Config Profile    : singularity
Max Resources     : 14.GB memory, 8 cpus, 10d time per job
Container         : singularity - nfcore/smrnaseq:1.0.0

So it seems the trimming arguments were ignored. Did I miss something?

Thanks

Inconsistencies in hairpins aligned read numbers

Hi!

Check Documentation

I have checked the following places for your error:

Description of the bug

In the MultiQC report, the number of reads aligned to the hairpin database differs slightly from the number of reads that did not map to the matures. In our real-life example we got a ~2M read difference:
[screenshot of the MultiQC alignment stats]
Here, we got 22.1M reads not mapped to the mature database. However, the report states that 23.9M sequences were aligned against the hairpin database.

Steps to reproduce

This can be reproduced with the test profile, to a lesser extent; it's not visible in the report because of the low number of reads:

$ nextflow run nf-core/smrnaseq -profile test,singularity
$ cd work/xx/xxxxx # directory for a hairpin alignment task
$ echo $(zcat sample_1.mature_unmapped.fq.gz | wc -l)/4 | bc
20399

$ samtools flagstat sample_1.hairpin.bam
20411 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
29 + 0 mapped (0.14% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Expected behaviour

flagstat should report the same number of reads. This seems to be a problem of multiple alignments being incorrectly flagged in the BAM file. I assume this also slightly impacts quantification.
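As a side note on the counting: `samtools flagstat` counts BAM *records*, so secondary (flag 0x100) and supplementary (0x800) alignments inflate the total beyond the number of input reads. A toy Python sketch (not pipeline code) of counting only primary records:

```python
# Toy illustration: flagstat-style totals vs. primary-only read counts.
# Secondary (0x100) and supplementary (0x800) records repeat the same read.
def primary_count(flags):
    """Count records that are neither secondary nor supplementary."""
    return sum(1 for f in flags if f & 0x900 == 0)

# three records, but the third is a secondary alignment of an earlier read
flags = [0x0, 0x10, 0x100]
print(len(flags), primary_count(flags))  # 3 2
```

With real data, `samtools view -c -F 0x900 sample_1.hairpin.bam` gives the primary-only record count for comparison against the fastq read count.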

Log files

Have you provided the following extra information/files:

System

  • Hardware: Laptop + slurm cluster
  • Executor: local + slurm
  • OS: Ubuntu + CentOS

Nextflow Installation

  • Version: 20.10.0.5430

Container engine

  • Engine: Singularity
  • Image tag: nfcore/smrnaseq:1.1.0

edgeR_mirna - argument is of length zero

Description of the bug

Running v1.1 of the pipeline on 80 samples (R1 of a paired-end Novaseq run with Nextflex):

Getting this in the edgeR_mirna step:

Command error:
Loading required package: limma
Loading required package: edgeR
Loading required package: statmod
Loading required package: data.table
Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

  lowess

Error in if (nr_keep > 0) { : argument is of length zero
Execution halted


Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: nextflow run nf-core/smrnaseq -profile ccga_med --input 'reads_concat/concatenated/*_R1.fastq.gz' --genome GRCh37 --seq_center CCGA --protocol nextflex
  2. See error: see above

Expected behaviour

No crashing? ;)

Log files

Have you provided the following extra information/files:

System

  • Hardware: HPC (Centos 7)
  • Executor: slurm
  • OS: Centos 7
  • Version

Nextflow Installation

  • Version: 21.05

Container engine

  • Engine: Singularity
  • version: 3.5.2
  • Image tag: nfcore-smrnaseq-1.1.0.img

Error with singularity profile

Dear all,

I am trying to run the pipeline on 4 small rnaseq fastq files from pig using a singularity profile.

However I am getting this error message very early on:

Error executing process > 'output_documentation (1)'

Caused by:
  java.nio.file.NoSuchFileException: <outdir>/work/singularity/nfcore-smrnaseq-1.0.0.img.pulling.1607022732854

and a bit before I had this

Pulling Singularity image docker://nfcore/smrnaseq:1.0.0 [cache <outdir>/work/singularity/nfcore-smrnaseq-1.0.0.img]
WARN: Singularity cache directory has not been defined -- Remote image will be stored in the path: <outdir>/work/singularity -- Use env variable NXF_SINGULARITY_CACHEDIR to specify a different location

This is the bash script I sent to our slurm cluster

cd <outdir>
module load bioinfo/Nextflow-v20.10.0
module load system/singularity-3.6.4
nextflow run <path_to_latest_code>/smrnaseq --reads '<indir>/*.fastq.gz' --max_memory '16.GB' --max_cpus 2 --protocol nextflex --genome 'Sscrofa10.2' --outdir <outdir> --email [email protected] -profile singularity -resume > nextflow.out 2> nextflow.err

Do you see what the problem could be?

Best,
Sarah

EDIT: code block
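One workaround sketch for the failed image pull (paths are placeholders): define the Singularity cache directory up front, as the warning suggests, so the image lands in a stable location, and optionally pre-pull it before launching Nextflow.

```shell
# Create and advertise a shared Singularity image cache (example path)
export NXF_SINGULARITY_CACHEDIR="$HOME/singularity_cache"
mkdir -p "$NXF_SINGULARITY_CACHEDIR"
echo "cache dir: $NXF_SINGULARITY_CACHEDIR"
# Optionally pre-pull the image so Nextflow finds it already cached
# (requires network access and singularity; shown commented out):
# singularity pull --name nfcore-smrnaseq-1.0.0.img docker://nfcore/smrnaseq:1.0.0
```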

smrnaseq: Convert usage docs to JSON schema

Hi!

This is not necessarily an issue with the pipeline, but in order to streamline the documentation group for next week's hackathon, I'm opening issues in all pipeline repositories that might need this update to switch from parameter docs to auto-generated documentation based on the JSON schema.

This will then supersede any further parameter documentation, thus making things a bit easier :-)

If this doesn't apply (anymore), please close the issue. Otherwise, I'm hoping to have some helping hands on this next week in the documentation team on Slack https://nfcore.slack.com/archives/C01QPMKBYNR

Migrate badge to travis-ci.com

We've been migrating this over to travis-ci.com (as all travis-ci.org repositories will be shut down at some point). Please update the badge in the readme accordingly :-)

Single end support: input settings

Hi
Can I use single-end reads from Illumina as input? How can I distinguish control and case samples when giving them as input, since we use `*.fastq.gz` as the input flag?

Submitting LSF job for the mirtrace process in test causes Bad Job name error

Check Documentation

I have checked the following places for your error:

Description of the bug

When starting the pipeline with nextflow run nf-core/smrnaseq -profile test,singularity, local runs cause no problem. However, if we set the executor to LSF, it reports an error, caused by bsub, with the output Bad job name. Job not submitted. We believe the problem is caused by the mirtrace process, which prints mirtrace ([sample_1.fastq.gz, sample_2.fastq.gz, sample_3.fastq.gz]).

Steps to reproduce

Steps to reproduce the behaviour:

  1. Set "lsf" as process.executor in config.
  2. Command line: nextflow run nf-core/smrnaseq -profile test,singularity
  3. See error:
Error executing process > 'mirtrace ([sample_1.fastq.gz, sample_2.fastq.gz, sample_3.fastq.gz])'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  bsub

Command exit status:
  255

Command output:
  Bad job name. Job not submitted.

We believe that the following job name, which contains special characters [ and ], may cause the problem with LSF:

#BSUB -J "nf-mirtrace_([sample_1.fastq.gz,_sample_2.fastq.gz,_sample_3.fastq.gz])"

Expected behaviour

It should run error-free, as local runs do.
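A possible workaround sketch, assuming Nextflow's `executor.jobName` config option is honoured by the LSF executor: sanitise the submitted job names in a custom config file.

```groovy
// custom.config -- replace LSF-unfriendly characters in submitted job names
executor {
    jobName = { task.name.replaceAll(/[^a-zA-Z0-9_.-]/, '_') }
}
```

Pass it with `nextflow run nf-core/smrnaseq -profile test,singularity -c custom.config`.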

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file

System

  • Hardware: HPC
  • Executor: LSF
  • OS: CentOS Linux
  • Version

Nextflow Installation

  • Version: 20.10

Container engine

  • Engine: Singularity
  • version:
  • Image tag:

Allow "--userns" when running singularity

Hi there!

Thanks for suggesting a new feature for the pipeline!

Is there any way to provide "--userns" when running Singularity? On our HPC clusters we need to pass "--userns" to run Singularity. Is there a way to provide this via config or directly as an option?

Thanks,
Keyur
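A sketch of one way this can be done today, assuming the `singularity.runOptions` config setting applies here (it forwards extra flags to the container invocation):

```groovy
// custom.config -- extra flags passed to singularity when running containers
singularity {
    enabled    = true
    runOptions = '--userns'
}
```

Run with `nextflow run nf-core/smrnaseq -c custom.config ...` alongside the usual profile.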

pipeline reads only a single input fastq file

Hi there!

I am running the pipeline to report differentially expressed miRNAs in C vs T samples. The pipeline works perfectly when I run it on a single input file, e.g. the command below:
nextflow run ~/Downloads/smrnaseq/ -profile docker --protocol custom --input F1.fq.gz --outdir Results --genome GRCm38 --min_length 15 --trim_galore_max_length 50 --three_prime_adapter AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

I get the prediction of the novel and known miRNAs which is good. However, I need to run the pipeline on two input fastq files, C vs T, to report differentially expressed miRNAs. I placed the C and T fastq files in a single folder and used the below code to run it:
nextflow run ~/Downloads/smrnaseq/ -profile docker --protocol custom --input fq.gz --outdir Results --genome GRCm38 --min_length 15 --trim_galore_max_length 50 --three_prime_adapter AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

I see that the pipeline only reads one of the fastq files, in this case C1.fq.gz. I wonder if I can specify multiple input fastq files so that the pipeline would report the differentially expressed miRNAs with edgeR?

Thx!
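For what it's worth, the likely culprit is shell globbing: `--input` takes a single pattern, so the glob must be quoted for Nextflow (rather than the shell) to expand it. A minimal demonstration with toy files:

```shell
mkdir -p demo_fq && touch demo_fq/C1.fq.gz demo_fq/T1.fq.gz
set -- demo_fq/*.fq.gz       # unquoted: the shell expands to several words
echo "unquoted -> $# arguments (--input sees only the first)"
set -- 'demo_fq/*.fq.gz'     # quoted: one literal pattern for Nextflow
echo "quoted   -> $# argument"
```

So `--input 'some_dir/*.fq.gz'` (quoted) should pick up both the C and T files in one run.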

Multiple Sample running Issue

Hello,
I'm trying to run multiple microRNA samples and the pipeline starts to die at the trimming step. Any suggestions?

nextflow run nf-core/smrnaseq -r 1.1.0 --input sample.sheet.csv -profile singularity --protocol qiaseq --fasta genome.fa --mirtrace_species cfa --mirna_gtf cfa.gff3 --bt_index /pathtogenomebowtie/ --hairpin hairpin.fa --mature mature.fa --mirtrace_protocol qiaseq --max_cpus 90 --max_memory 1000GB &&

eg: sample.sheet.csv is below:

sample, fastq_1
01-C04,01-C04_S15_R1_001.fastq.gz
01-C04, 01-C04_S3_R1_001.fastq.gz
01-C07, 01-C07__S5_R1_001.fastq.gz
01-C08, 01-C08__S9_R1_001.fastq.gz
01-C10, 01-C10__S8_R1_001.fastq.gz
01-D01,01-D01__S4_R1_001.fastq.gz
01-D02,01-D02__S1_R1_001.fastq.gz
01-D02,01-D02__S2_R1_001.fastq.gz
01-D03, 01-D03__S2_R1_001.fastq.gz

Error executing process > 'trim_galore (sample.sheet.csv)'

Caused by:
Process trim_galore (sample.sheet.csv) terminated with an error exit status (25)

Command executed:

trim_galore --adapter AACTGTAGGCACCATCAAT --length 17 --max_length 40 --gzip sample.sheet.csv --fastqc

Command exit status:
25

Command output:
(empty)

Command error:
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 3.4
single-core operation.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Maximum length cutoff set to >> 40 bp <<; sequences longer than this threshold will be removed (only advised for smallRNA-trimming!)

File seems to be in SOLiD colorspace format which is not supported by Trim Galore (sequence is: '01-C04,01-C04_S15_R1_001.fastq.gz
')! Please use Cutadapt on colorspace files separately and check its documentation!
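Two things stand out here: the samplesheet has stray spaces after commas, and the error shows the CSV file itself being handed to Trim Galore as if it were a read file, which suggests this pipeline version expects a fastq glob for `--input` rather than a samplesheet. For the whitespace part, a quick hypothetical clean-up sketch (not pipeline code):

```python
# Hypothetical helper: strip stray whitespace from samplesheet fields so
# strict CSV parsers see clean sample names and file paths.
import csv
import io

raw = "sample, fastq_1\n01-C04, 01-C04_S3_R1_001.fastq.gz\n"
cleaned = [[field.strip() for field in row] for row in csv.reader(io.StringIO(raw))]
print(cleaned)  # [['sample', 'fastq_1'], ['01-C04', '01-C04_S3_R1_001.fastq.gz']]
```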

Use bowtie 1 for human genome alignment

Moved from SciLifeLab#3


Probably shouldn't have put Bowtie 2 into the latest change.

Need to read this paper and take on advice:
http://bib.oxfordjournals.org/content/16/6/950.full

Alignments performed using Bowtie 2 resulted in an even greater number of miRNAs with higher counts; examination of the alignment output revealed that many of these were attributed to the allowance of insertions and deletions.

Bowtie 2, which was developed for gapped alignment, was included in our comparison to illustrate the consequences of selecting an inappropriate aligner.


@zhenyisong:
There is another paper regarding the same issue, "Evaluation of microRNA alignment techniques". PMID: 27284164.

Problems with miRTrace output

Hi,
I was running the pipeline with 40 samples and everything worked well, but at the end I didn't obtain the miRTrace files (mirtrace-report.html, mirtrace-results.json, etc.); I only obtained qc_passed_reads.all.collapsed and qc_passed_reads.rnatype_unknown.collapsed.
Nonetheless, the pipeline completed successfully and I think there are no other errors. When I run the pipeline with only 10 samples everything seems to work, so I don't know what the problem could be...

Thank you very much!

weird launchDir from UPPMAX profile

Check Documentation

I have checked the following places for your error:

Description of the bug

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: nextflow run ~/nf-core-smrnaseq/workflow -profile uppmax --project xxx --genome 'GRCh37' --input '../data/*gz' --protocol 'illumina' -with-singularity /home/xxx/nf-core-smrnaseq/singularity-images/nfcore-smrnaseq-1.1.0.img --mirna_gtf /proj/xxx/nobackup/Lokesh_analysis/microRNA/hsa.gff3 --hairpin /proj/xxx/nobackup/Lokesh_analysis/microRNA/hairpin.fa.gz --mature /proj/xxx/nobackup/Lokesh_analysis/microRNA/mature.fa.gz
  2. See error: launchDir : /castor/project/proj_nobackup/Lokesh_analysis/microRNA/NF_run_results
    Unable to create folder=/castor/project/proj_nobackup/Lokesh_analysis/microRNA/NF_run_results/work/ab/b27626ba774e671f9bb5fe0484060d

Expected behaviour

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file

System

  • Hardware: bianca node with 16 threads
  • Executor: slurm
  • OS: centOS
  • Version

Nextflow Installation

  • Version:

Container engine

  • Engine: Singularity
  • version:
  • Image tag:nfcore/smrnaseq:1.1.0

Documentation and code don't match for 'cats' protocol

Hi,

just found a tiny issue - the adapter sequences for the 'cats' protocol in the code and in the documentation don't match. It's uncommented in the code, see line 107 in main.nf

Small issue but can lead to some frustration I imagine.

Cheers,
Kevin

Unable to pull singularity image

I am attempting to pull an image of the smrnaseq pipeline on a compute cluster that uses sge and getting an error (see below) that I am unable to figure out the way around. Any pointers would be helpful. Thanks.

  Failed to pull singularity image
  command: singularity pull  --name nfcore-smrnaseq-1.0.0.img docker://nfcore/smrnaseq:1.0.0 > /dev/null
  status : 255
  message:
    INFO:    Converting OCI blobs to SIF format
    INFO:    Starting build...
    Getting image source signatures
    Copying blob sha256:cc1a78bfd46becbfc3abb8a74d9a70a0e0dc7a5809bbd12e814f9382db003707
    Copying blob sha256:420ea9ce27d99662ea212155ed07e75827356c8d37f8587414bff1ae8b9624b8
    Copying blob sha256:1b219c0505472aa2627d80fbb385d28eb0c9d099f79f0ddf477e50050b5d66b9
    Copying blob sha256:1c177edca126d0c36166cda39ee2b9df28d9f4f8ab975d8cee58869000a79912
    Copying blob sha256:3b1707d1b6ccd87289085dd42f4153c90a9b0084ec88b7f69da4207202c5ab26
    Copying blob sha256:948ec94fc8551061bac9b2cfcc1d786bbbfc3bd4c06fbcce7b19f53690761d2d
    Copying blob sha256:75a3ff91ce43fa9e7fbd5fd951df283ef8d24fbc44a5725d47d638e057dad638
    Copying blob sha256:ceec4fd711c53b4171c2ce7c715e8f8ace07a117ba3bf07b176afc3916eb7cde
    Copying config sha256:0fa79f59552300796ec2b72cf6f776827f42d4a5efd2088115f2802a973fc900
    Writing manifest to image destination
    Storing signatures
    2020/10/23 16:26:37  info unpack layer: sha256:cc1a78bfd46becbfc3abb8a74d9a70a0e0dc7a5809bbd12e814f9382db003707
    2020/10/23 16:26:38  warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
    2020/10/23 16:26:38  warn xattr{/tmp/rootfs-12b5ab23-1587-11eb-95bd-1418773e5343/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
    2020/10/23 16:26:49  warn rootless{usr/local/man} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"
    2020/10/23 16:27:24  info unpack layer: sha256:420ea9ce27d99662ea212155ed07e75827356c8d37f8587414bff1ae8b9624b8
    2020/10/23 16:27:26  warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
    2020/10/23 16:27:26  warn xattr{/tmp/rootfs-12b5ab23-1587-11eb-95bd-1418773e5343/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
    2020/10/23 16:28:40  info unpack layer: sha256:1b219c0505472aa2627d80fbb385d28eb0c9d099f79f0ddf477e50050b5d66b9
    2020/10/23 16:28:40  warn xattr{opt/conda/LICENSE.txt} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
    2020/10/23 16:28:40  warn xattr{/tmp/rootfs-12b5ab23-1587-11eb-95bd-1418773e5343/opt/conda/LICENSE.txt} destination filesystem does not support xattrs, further warnings will be suppressed
    FATAL:   While making image from oci registry: error fetching image to cache: while building SIF from layers: packer failed to pack: while unpacking tmpfs: error unpacking rootfs: unpack layer: unpack entry: opt/conda/pkgs/asn1crypto-0.24.0-py27_0/lib/python2.7/site-packages/asn1crypto/__init__.py: link: unpriv.link: unpriv.wrap target: operation not permitted

Think about mapping against hairpin only

Moved from SciLifeLab#31


Marc thinks that mapping against only the hairpin miRBase sequences could be a better approach:

  • Map against hairpin only
    • Check mapping within the mature region
    • Allow overlap of 1bp at 5' end
    • Allow 2 bp overlap at 3' end
  • Treat the 5' and 3' arm alignments separately
  • Allow a single mismatch in mapping

Remove the samtools and mature mapping steps; have a custom script do the filtering and counting.
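The proposed mature-region check could look something like this sketch (coordinates are illustrative and 0-based half-open; the slack values come from the list above: 1 bp at the 5' end, 2 bp at the 3' end):

```python
# Sketch: keep a hairpin alignment only if it lies within the annotated
# mature region, tolerating small overhangs at each end.
def in_mature(aln_start, aln_end, mat_start, mat_end,
              five_prime_slack=1, three_prime_slack=2):
    """True if the alignment fits the mature region within the allowed slack."""
    return (aln_start >= mat_start - five_prime_slack and
            aln_end <= mat_end + three_prime_slack)

print(in_mature(9, 31, 10, 30))   # True: 1 bp 5' and 1 bp 3' overhang
print(in_mature(7, 30, 10, 30))   # False: 3 bp 5' overhang exceeds the slack
```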

Error in the edgeR_mirna process

Hi! I was trying to run this pipeline with some miRNA data, but I encountered the following error.

Error executing process > 'edgeR_mirna'

Caused by:
Process edgeR_mirna terminated with an error exit status (1)

Command executed:

edgeR_miRBase.r Plasma_S1_L001_R2_001.hairpin.stats Plasma_S1_L001_R2_001.mature.stats Plasma_S1_L001_R1_001.mature.stats Plasma_S1_L001_R1_001.hairpin.stats

Command exit status:
1

Command output:
$mature
[1] "Plasma_S1_L001_R2_001.mature.stats" "Plasma_S1_L001_R1_001.mature.stats"

$hairpin
[1] "Plasma_S1_L001_R2_001.hairpin.stats" "Plasma_S1_L001_R1_001.hairpin.stats"

Command error:
Loading required package: limma
Loading required package: edgeR
Loading required package: statmod
Loading required package: data.table
Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

  lowess

Error in quantile.default(x, p = p) :
missing values and NaN's not allowed if 'na.rm' is FALSE
Calls: calcNormFactors ... .calcFactorQuantile -> apply -> FUN -> quantile -> quantile.default
In addition: Warning message:
In DGEList(counts = data, genes = rownames(data)) :
library size of zero detected
Execution halted

I use the pipeline with this configuration:
nextflow run nf-core/smrnaseq --reads '*{1,2}.fastq.gz' --genome GRCh37 -profile singularity

How can I solve this?
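The `library size of zero detected` warning is the clue: one of the `.stats` inputs contributes an all-zero count column (here likely the R2 files, which rarely map in a small-RNA workflow), and `calcNormFactors` then hits NaNs in its quantile step. A hedged Python sketch of the kind of pre-filter the R script would need (toy counts, not the pipeline's actual code):

```python
# Drop samples whose library size is zero before normalisation, since
# quantile calculations on all-zero columns yield NaN.
counts = {                      # toy count matrix: miRNA -> per-sample counts
    "mir-1": [10, 0, 5],
    "mir-2": [3,  0, 7],
}
n_samples = 3
lib_sizes = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
keep = [i for i, size in enumerate(lib_sizes) if size > 0]
filtered = {mirna: [row[i] for i in keep] for mirna, row in counts.items()}
print(lib_sizes, filtered)
```

In practice, running the pipeline on R1 files only (`'*1.fastq.gz'`) may avoid the empty columns altogether.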

adding coreutils/realpath

Hello,
When I run nf-core/smrnaseq revision 1.0.0, the mirtrace steps fail due to a missing "realpath" command. Could coreutils perhaps be added to the conda environment?
Thank you,
Peter Bazeley

Adding UMI-tools

Hi all!
Firstly, I would like to thank you for this awesome work. Secondly, I have a request:
would it be possible to add UMI-tools to the (Docker) environment to allow working with UMIs?
Thank you for your answer

Add option for sequencing centre in BAM file

Moved from SciLifeLab#35


Bowtie 1

Output options:
--sam-RG <text>
Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields.
--sam-RG is ignored unless -S/--sam is also specified.

Bowtie2

SAM options: --rg <text>

Add <text> (usually of the form TAG:VAL, e.g. SM:Pool1) as a field on the @RG header line.

e.g.:

--rg CN:nameofourgroup
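Putting the two together, hypothetical invocations (index and file names are placeholders):

```shell
# Bowtie 1: read-group fields require SAM output (-S / --sam)
bowtie -S --sam-RG ID:sample1 --sam-RG CN:our_seq_centre genome_index reads.fq > out.sam

# Bowtie 2: --rg only takes effect once --rg-id is set
bowtie2 --rg-id sample1 --rg CN:our_seq_centre -x genome_index -U reads.fq -S out.sam
```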

[FEATURE] UMI Handling (extract, trimming, merging trimming reports)

nf-core/smrnaseq feature request

Is your feature request related to a problem? Please describe

We have some data that has UMIs attached to the reads and would like to be able to both extract & trim these and have the reports of that step in the final report available as well.

Describe the solution you'd like

We have functional code to perform this using cutadapt plus some extra scripts, and would like to contribute it and/or consider an alternative solution if there is a better approach.

Describe alternatives you've considered

Doing this prior to running smrnaseq, but we figured it's a more common use case and could be an optional addition to the pipeline.
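For reference, the plain UMI-tools route we are comparing against might look like this command-line fragment (pattern and file names are placeholders; a 12-nt 5' UMI is assumed):

```shell
# Extract the UMI into the read name before trimming/alignment
umi_tools extract --extract-method=string --bp-pattern=NNNNNNNNNNNN \
    -I sample_R1.fastq.gz -S sample_R1.umi_extracted.fastq.gz -L extract.log
```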

edgeR_mirna step failing on samples with low mature mapping rate

v1.0.0 of the pipeline is erroring out at the edgeR_mirna step, due to some samples with low mature mapping rate. @lpantano any ideas here?

Happy to help debug this and provide some example files.

Execution cancelled -- Finishing pending tasks before exit
[nf-core/smrnaseq] Pipeline completed with errors
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'edgeR_mirna'

Caused by:
  Process `edgeR_mirna` terminated with an error exit status (1)

Command executed:

  edgeR_miRBase.r 30193-035.mature.stats 30193-041.hairpin.stats 30193-005.mature.stats 30193-025.hairpin.stats 30193-032.hairpin.stats 30193-015.hairpin.stats 30193-040.hairpin.stats 30193-042.mature.stats 30193-007.hairpin.stats 30193-012.mature.stats 30193-039.mature.stats 30193-001.mature.stats 30193-025.mature.stats 30193-028.hairpin.stats 30193-030.mature.stats 30193-039.hairpin.stats 30193-003.mature.stats 30193-023.hairpin.stats 30193-023.mature.stats 30193-011.mature.stats 30193-038.hairpin.stats 30193-012.hairpin.stats 30193-034.hairpin.stats 30193-008.mature.stats 30193-036.mature.stats 30193-032.mature.stats 30193-038.mature.stats 30193-008.hairpin.stats 30193-021.hairpin.stats 30193-024.mature.stats 30193-033.mature.stats 30193-003.hairpin.stats 30193-001.hairpin.stats 30193-009.hairpin.stats 30193-019.hairpin.stats 30193-021.mature.stats 30193-005.hairpin.stats 30193-031.mature.stats 30193-037.hairpin.stats 30193-029.hairpin.stats 30193-010.hairpin.stats 30193-015.mature.stats 30193-019.mature.stats 30193-016.mature.stats 30193-010.mature.stats 30193-016.hairpin.stats 30193-011.hairpin.stats 30193-017.hairpin.stats 30193-027.mature.stats 30193-040.mature.stats 30193-013.hairpin.stats 30193-041.mature.stats 30193-042.hairpin.stats 30193-004.hairpin.stats 30193-022.mature.stats 30193-020.hairpin.stats 30193-029.mature.stats 30193-030.hairpin.stats 30193-033.hairpin.stats 30193-017.mature.stats 30193-031.hairpin.stats 30193-027.hairpin.stats 30193-022.hairpin.stats 30193-013.mature.stats 30193-026.mature.stats 30193-006.mature.stats 30193-020.mature.stats 30193-007.mature.stats 30193-002.hairpin.stats 30193-009.mature.stats 30193-028.mature.stats 30193-034.mature.stats 30193-018.mature.stats 30193-036.hairpin.stats 30193-004.mature.stats 30193-006.hairpin.stats 30193-002.mature.stats 30193-037.mature.stats 30193-026.hairpin.stats 30193-014.mature.stats 30193-024.hairpin.stats 30193-018.hairpin.stats 30193-035.hairpin.stats 30193-014.hairpin.stats

Command exit status:
  1

Command output:
  $mature
   [1] "30193-035.mature.stats" "30193-005.mature.stats" "30193-042.mature.stats"
   [4] "30193-012.mature.stats" "30193-039.mature.stats" "30193-001.mature.stats"
   [7] "30193-025.mature.stats" "30193-030.mature.stats" "30193-003.mature.stats"
  [10] "30193-023.mature.stats" "30193-011.mature.stats" "30193-008.mature.stats"
  [13] "30193-036.mature.stats" "30193-032.mature.stats" "30193-038.mature.stats"
  [16] "30193-024.mature.stats" "30193-033.mature.stats" "30193-021.mature.stats"
  [19] "30193-031.mature.stats" "30193-015.mature.stats" "30193-019.mature.stats"
  [22] "30193-016.mature.stats" "30193-010.mature.stats" "30193-027.mature.stats"
  [25] "30193-040.mature.stats" "30193-041.mature.stats" "30193-022.mature.stats"
  [28] "30193-029.mature.stats" "30193-017.mature.stats" "30193-013.mature.stats"
  [31] "30193-026.mature.stats" "30193-006.mature.stats" "30193-020.mature.stats"
  [34] "30193-007.mature.stats" "30193-009.mature.stats" "30193-028.mature.stats"
  [37] "30193-034.mature.stats" "30193-018.mature.stats" "30193-004.mature.stats"
  [40] "30193-002.mature.stats" "30193-037.mature.stats" "30193-014.mature.stats"

  $hairpin
   [1] "30193-041.hairpin.stats" "30193-025.hairpin.stats"
   [3] "30193-032.hairpin.stats" "30193-015.hairpin.stats"
   [5] "30193-040.hairpin.stats" "30193-007.hairpin.stats"
   [7] "30193-028.hairpin.stats" "30193-039.hairpin.stats"
   [9] "30193-023.hairpin.stats" "30193-038.hairpin.stats"
  [11] "30193-012.hairpin.stats" "30193-034.hairpin.stats"
  [13] "30193-008.hairpin.stats" "30193-021.hairpin.stats"
  [15] "30193-003.hairpin.stats" "30193-001.hairpin.stats"
  [17] "30193-009.hairpin.stats" "30193-019.hairpin.stats"
  [19] "30193-005.hairpin.stats" "30193-037.hairpin.stats"
  [21] "30193-029.hairpin.stats" "30193-010.hairpin.stats"
  [23] "30193-016.hairpin.stats" "30193-011.hairpin.stats"
  [25] "30193-017.hairpin.stats" "30193-013.hairpin.stats"
  [27] "30193-042.hairpin.stats" "30193-004.hairpin.stats"
  [29] "30193-020.hairpin.stats" "30193-030.hairpin.stats"
  [31] "30193-033.hairpin.stats" "30193-031.hairpin.stats"
  [33] "30193-027.hairpin.stats" "30193-022.hairpin.stats"
  [35] "30193-002.hairpin.stats" "30193-036.hairpin.stats"
  [37] "30193-006.hairpin.stats" "30193-026.hairpin.stats"
  [39] "30193-024.hairpin.stats" "30193-018.hairpin.stats"
  [41] "30193-035.hairpin.stats" "30193-014.hairpin.stats"

Command error:
  Loading required package: limma
  Loading required package: edgeR
  Loading required package: statmod
  Loading required package: data.table
  Loading required package: gplots

  Attaching package: ‘gplots’

  The following object is masked from ‘package:stats’:

      lowess

  Error in quantile.default(x, p = p) :
    missing values and NaN's not allowed if 'na.rm' is FALSE
  Calls: calcNormFactors ... .calcFactorQuantile -> apply -> FUN -> quantile -> quantile.default
  In addition: Warning message:
  In DGEList(counts = data, genes = rownames(data)) :
    library size of zero detected
  Execution halted
