Giter Site home page Giter Site logo

germs_16s_pipeline's Introduction

This is a remake of the original 16S pipeline developed with in the Genome Institute of Singapore, designed for Illumina shotgun sequencing of 16S rRNA amplicon sequences (Ong et al., 2013, PMID 23579286).

Running the pipeline

The pipeline is based on snakemake. The main program (S16.py) will write a config file (conf.json) and snakemake file (snake.make) in the given output directory. These are then used to call snakemake via qsub using the also created wrapper script snake.sh. The system will send an email to you upon completion (be it successful or not).

For help see S16.py --help.

Only upon successful completion the output directory will contain an empty file called COMPLETE.

Results (abundance tables and piecharts) can then be found in results subdirectory (see report.html there).

Steps involved

  • Reads are first preprocessed (merging of multiple files, trimming etc.) with famas.
  • Preprocessed reads are classified into 'unspecific', 'spike-in' and '16S', based on a BWA-MEM mapping against the original EMIRGE/ARB SSU database containing the spike-in (see ratios.txt for corresponding ratios).
  • Only 16S sequences are used for reconstructing the full-length sequences with EMIRGE (Miller et al., 2011, PMID 21595876) or EMIRGE amplicon (Miller et al., 2013; PMID 23405248)
  • Primers are clipped from the reconstructed sequences
  • Clipped sequences are mapped with Graphmap against a preclustered version (99% OTU) of the Greengenes database for classification.
  • Abundance-tables and piecharts are created in the results subdirectory of the output directory. Pairwise identity thresholds for the different taxonomi ranks are implemented as determined by Yarza et al. (2014; PMID 25118885).

Tip

If you want to first see what the pipeline would do, call S16.py with --no-run. Then check the created files (see above).

To print all commands that would be run, use:

snakemake -s snake.make --configfile conf.json -n -p

To get a graphical representation of the workflow, run (from the output directory):

snakemake -s snake.make --configfile conf.json --dag --forceall | dot -Tpdf > dag.pdf

and have a look at dag.pdf. Once you're satisfied just run bash snake.sh.

Setup

This is tuned towards an in-house setup. If you want to replicate it see the CONF variable in S16.py, which lists expected binaries and databases.

germs_16s_pipeline's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

germs_16s_pipeline's Issues

rpos >= len(ref)

See dir bug-rpos-not-lt-ref
Must be Graphmap
Reported by Collins 2016-05-06

Trim GG before clustering

The classification database (99% OTU) should have been trimmed before clustering instead of using the preclustered database. Pointed out by Christophe LAY

Determine source of EMIRGE/Bowtie unaligned reads

  • example emirge-bowtie-unmapped-source-PHH4059
  • note 1: emirge is a b**** and sequentially renames reads before mapping starting from 0
  • note 2: only properly paired reads seem to be listed in bam
  • created tmp copy: emirge_tmp_reads_[12].fastq.keep.gz
  • samtools view -F 0x4 emirge/iter.10/bowtie.iter.10.PE.u.b.bam | cut -f 1 | sort -us -o bowtie.iter.10.PE.u.b.bam.mapped-ids.txt

Empty BAM

No errors in logs, but BAM files empty. See bug-empty-bam-PHH4043

Graphmap 0.3.0 results in many unaligned seqs

See also isovic/graphmap#28

See e.g. AHS182

cd /mnt/projects/wilma/16s/S16_V2/bugs/AHS182

source /mnt/software/etc/gis.bashrc
source activate py3k
snakemake -s snake.make --configfile conf.json -p -T --dryrun --force


new:

samtools view graphmap-0.3.0-1d16f07.ident.bam | head -n 1
82|ACCG02000012.525518.527039   0   742595  281 1   477M1D249M  *   0   0   <removed>   *   MD:Z:3C473^G246T2   NM:i:3  AS:i:3604   H0:i:4  ZE:f:0  ZF:f:0.334728   ZQ:i:726    ZR:i:1382   Xi:f:99.6

samtools view greengenes-hits-graphmap.bam | grep '821|AF414833.1.1363'
821|AF414833.1.1363 4   *   0   255 *   *   0   0   <removed> * NM:i:-1 AS:i:-731   H0:i:0  ZE:f:inf    ZF:f:0  ZQ:i:731    ZR:i:0  Xi:f:0

/mnt/projects/wilma/16s/S16_V2_inst/f714a48/classify_hits.py -f -q emirge_outprimer_trimmed.fa -i graphmap-0.3.0-1d16f07.ident.bam -t /mnt/genomeDB/misc/greengenes.secondgenome.com/downloads/13_5/gg_13_5_otus/taxonomy/99_otu_taxonomy.txt -o -
82|ACCG02000012.525518.527039   0.290834    742595  99.6    k:Bacteria  p:Actinobacteria    c:Actinobacteria    o:Bifidobacteriales f:Bifidobacteriaceae    g:Bifidobacterium   s:breve
821|AF414833.1.1363 0.127213    *   0   k:  p:  c:  o:  f:  g:  s:

old:

samtools view graphmap-0.2.2-dev-604a386.ident.bam | head -n 1
82|ACCG02000012.525518.527039   0   484303  338 1   726M    *   0   0   <removed>   *   NM:i:2  AS:i:3612   H0:i:1  ZE:f:0  ZF:f:0.334728   ZQ:i:726    ZR:i:1535   Xi:f:99.7
821|AF414833.1.1363 0   4480544 347 0   731M    *   0   0   <removed> * NM:i:5  AS:i:3610   H0:i:1  ZE:f:0  ZF:f:0.335099   ZQ:i:731    ZR:i:1536   Xi:f:99.3

/mnt/projects/wilma/16s/S16_V2_inst/f714a48/classify_hits.py -f -q emirge_outprimer_trimmed.fa -i graphmap-0.2.2-dev-604a386.ident.bam -t /mnt/genomeDB/misc/greengenes.secondgenome.com/downloads/13_5/gg_13_5_otus/taxonomy/99_otu_taxonomy.txt -o -
82|ACCG02000012.525518.527039   0.290834    484303  99.7    k:Bacteria  p:Actinobacteria    c:Actinobacteria    o:Bifidobacteriales f:Bifidobacteriaceae    g:Bifidobacterium   s:breve
821|AF414833.1.1363 0.127213    4480544 99.3    k:Bacteria  p:Bacteroidetes c:Bacteroidia   o:Bacteroidales f:Prevotellaceae    g:Prevotella    s:nigrescens


What does "Skipping because of primer matches with wrong orientation" mean?

When I run primer_trimmer.py, I get a lot of warning messages about “Skipping ___ because of primer matches with wrong orientation”. What does this mean?

The code that probably caused this situation was in lines 181-183 of primer_trimmer.py and it looked like this:

        if pos['fw'] > pos['rv']:
            LOG.info("Skipping %s because of primer matches with wrong orientation" % (seqrec.id))
            Continue

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.