stlfrassembly's Introduction

MetaSTHD Assembly Pipeline

Description:

A hybrid de novo assembly pipeline for metagenomic stLFR and TGS data, based on Wengan.

The purpose of the pipeline:

stLFR short reads are accurate and carry co-barcoding information. The pipeline combines them with the long-read advantage of third-generation sequencing (TGS) data to improve de novo assembly of the metagenome.

Required software:

  1. Python3 (version >2.7) and gcc (version >7.2)
  2. metaWRAP
  3. quast
  4. perl
  5. WENGAN and CLOUDSPADES
  6. longranger4stLFR
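
Before installing anything, it can help to see which of these tools are already on the PATH. The sketch below is not part of the pipeline: `missing_tools` is a hypothetical helper, and the command names in the example call are assumptions about how each tool is exposed on your system.

```shell
# Print every command from the argument list that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of STLFRassembly.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call; the command names are assumptions about your installation.
# missing_tools python3 gcc perl quast.py metawrap
```

An empty result means every listed command was found; version requirements still need to be checked by hand.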

Install Wengan and create the environment

Wengan : https://github.com/adigenova/wengan/releases/download/v0.2/wengan-v0.2-bin-Linux.tar.gz

$ cd usr/software 
$ wget https://github.com/adigenova/wengan/releases/download/v0.2/wengan-v0.2-bin-Linux.tar.gz
$ tar -zxvf wengan-v0.2-bin-Linux.tar.gz

Set the path to Wengan

$ vi ~/.bashrc
export wengan="usr/software/wengan-v0.2-bin-Linux/wengan.pl"
$ source ~/.bashrc

Hybrid assembly with Wengan

Some required shell scripts need to be downloaded.
Shellfile : https://github.com/sumoii/STLFRassembly.git

bin
|-- run.sh
|-- Step.1.0.getsource.sh
|-- Step.1.1.splitbarcode.sh
|-- Step.1.2.getcleandata.sh
|-- Step.1.3.stlfrto10x.sh
|-- Step.2.1.stlfrcloudspades.sh
|-- Step.2.2.ontwtdbg.sh
|-- Step.3.1.wenganaseembly.sh
|-- Step.4.1.1.prequast.sh
|-- Step.4.1.2.quast.sh
|-- Step.4.1.3.purify.sh
|-- Step.4.1.4.afterpurifyquast.sh
`-- Step.4.2.1.binning.sh
$ cd usr/software
$ git clone https://github.com/sumoii/STLFRassembly.git
$ vi ~/.bashrc
export PATH="/usr/software/STLFRassembly/shellfile/bin:$PATH"
$ source ~/.bashrc

All right, now we can begin the process.

Quick run

The pipeline includes run.sh. Running sh run.sh -1 -2 ... executes all steps in a single run.

$ sh run.sh
        -l   The longread path
        -1   The shortreads1 path
        -2   The shortreads2 path
        -f   The first prefix [default: STLFR_CLOUDSPADES ]
        -s   The other prefix [default: WTDBG ]
        -t   The threads number
        -m   The model of Wengan
        -g   The memory to set [GB]
        -x   The longreads format [default : ont]
        -w   The whitelist path
        -L   The longranger4stLFR/longranger path
        -M   The method you choose  [Quast or Binning]
                Quast The quast way
                        requires the following options
                        -r   The reference path
                        -q   The quast path
                Binning The binning way
                        requires the following options
                        -b   The binning path (metaWRAP)
        -A   Choose all methods (binning and quast)
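
To see how these options fit together for the Quast method, the sketch below only prints a candidate command line and does not run anything. `print_run_cmd` is a hypothetical helper, the file names in the example call are placeholders, and the values given to -t, -m, and -x are illustrative. Whether run.sh accepts exactly this combination has not been verified here.

```shell
# Print (do not execute) a run.sh command line for the Quast method.
# Positional arguments, in order: long reads, short reads 1, short reads 2,
# whitelist, longranger4stLFR path, reference path, quast path.
# The values given to -t, -m, and -x are illustrative placeholders.
print_run_cmd() {
    printf 'sh run.sh -l %s -1 %s -2 %s -t 40 -m A -x ont -w %s -L %s -M Quast -r %s -q %s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Example call with placeholder paths.
print_run_cmd usr/database/long.fq.gz \
    usr/database/read_1.fq.gz usr/database/read_2.fq.gz \
    usr/database/whitelist longranger/to/path \
    usr/database/reference usr/software/quast/quast.py
```

Printing the command first makes it easy to review the paths before committing to a long run.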

Or step by step

Data preprocessing

$ source Step.1.0.getsource.sh \
-l usr/database/Zymo-GridION-EVEN-BB-SN-PCR-R10HC-flipflop.fq.gz  \
-1 usr/database/V300045526B_L01_read_1.fq.gz  \
-2 usr/database/V300045526B_L01_read_2.fq.gz
$ sh Step.1.1.splitbarcode.sh -d Dataprepare -1 $shortreads1 -2 $shortreads2 -t $atools
$ sh Step.1.2.getcleandata.sh -t  $atools  -d Dataprepare
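
Because the preprocessing steps read large FASTQ files, a missing or empty input is worth catching before anything starts. The following is a minimal sketch under the assumption that the inputs are ordinary files; `require_inputs` is a hypothetical helper and not part of the pipeline.

```shell
# Report each listed input that is missing or empty on stderr.
# Returns non-zero if at least one input failed the check.
require_inputs() {
    rc=0
    for f in "$@"; do
        # -s is true only for an existing file with size greater than zero
        if [ ! -s "$f" ]; then
            printf 'missing or empty input: %s\n' "$f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example call, using the variable names that appear in the commands above:
# require_inputs "$longreads" "$shortreads1" "$shortreads2"
```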

Step.1.3 STLFR > 10X

$ sh Step.1.3.stlfrto10x.sh
Usage:

         Preparing for  short assembly
         STLFR to 10X

Option:
        -a The tools path
        -1 The split_reads.1.fq.gz.clean.gz path
        -2 The split_reads.2.fq.gz.clean.gz path
        -t The threads numbers [default: 40]
        -w The whitelist path
        -f filter_num [default: 2]
        -m mapratio_num [default: 8]
        -M The memory number GB [default: 300]  
        
$ sh Step.1.3.stlfrto10x.sh -a $atools -1 Dataprepare/split_reads.1.fq.gz.clean.gz  -2 Dataprepare/split_reads.2.fq.gz.clean.gz \
-w usr/database/whitelist \
-f 2 -m 8 -M 300 -t 40
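
Paired-end inputs are easy to mix up when both file names are typed by hand. The sketch below derives the read-2 path from the read-1 path, relying on the split_reads.1 / split_reads.2 naming that the binning step uses. `mate_path` is a hypothetical helper, and the naming rule is an assumption about how Step.1.2 names its outputs.

```shell
# Derive the read-2 file name from a read-1 file name by swapping the
# "split_reads.1." infix for "split_reads.2." (assumed naming convention).
mate_path() {
    printf '%s\n' "$1" | sed 's/split_reads\.1\./split_reads.2./'
}

r1="Dataprepare/split_reads.1.fq.gz.clean.gz"
r2=$(mate_path "$r1")
# r1 and r2 can now be passed as the -1 and -2 arguments.
```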

Assemble

CLOUDSPADES AND WTDBG

$ name1="STLFR_CLOUDSPADES"
$ name2="WTDBG"
$ model="A"
$ sh Step.2.1.stlfrcloudspades.sh -f STLFR10X/longranger/out/barcoded.fastq.gz -o ${name1}_contigs -t 8 -m 250 -l longranger/to/path
$ sh Step.2.2.ontwtdbg.sh -l $longreads -w usr/software/wtdbg -t 40 -x ont 

WENGAN

$ sh Step.3.1.wenganaseembly.sh -l ${name2}_contigs/WTDBG.fa \
-s ${name1}_contigs/cloudspades_out/contigs.fasta -m ${model} \
-1 Dataprepare/split_reads.1.fq.gz.clean.gz \
-2 Dataprepare/split_reads.2.fq.gz.clean.gz \
-f ${name1} -d ${name2} -t 40 -g 3000 -x ontraw
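
The later steps locate the Wengan result from the two prefixes and the model. The sketch below composes that path following the pattern used in the Quast and Purify commands of this document; `assembly_path` is a hypothetical helper, and the pattern is inferred from those commands rather than from the scripts themselves.

```shell
name1="STLFR_CLOUDSPADES"
name2="WTDBG"
model="A"

# Compose <name1>_<name2>_<model>_assemble/<name1>_<name2>.SPolished.asm.wengan.fasta
assembly_path() {
    printf '%s_%s_%s_assemble/%s_%s.SPolished.asm.wengan.fasta\n' \
        "$1" "$2" "$3" "$1" "$2"
}

asm=$(assembly_path "$name1" "$name2" "$model")
# [ -s "$asm" ] can then be used to confirm the assembly was written.
```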

Reference quast

Quast

$ sh Step.4.1.1.prequast.sh 
Description:
        Prepare for quast
Option:
        -r The path of the reference genome folder
        -1 <The first name of your contigs prefix>
        -2 <The other name of your contigs prefix>      
        -m The model you choose  
        
$ sh Step.4.1.1.prequast.sh -r usr/database/mock10_kraken2-fa -1 ${name1} -2 ${name2} -m ${model}
$ sh Step.4.1.2.quast.sh
Description:
        Quast of the assembled genome
        
Option:
        -f The path of the assembly file
        -q The path of quast.py  
     
$ cd ${name1}_${name2}_${model}_quast
$ sh Step.4.1.2.quast.sh \
-f ../${name1}_${name2}_${model}_assemble/${name1}_${name2}.SPolished.asm.wengan.fasta \
-q usr/software/quast/quast.py
$ cd ..
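
As a quick sanity check on the FASTA file passed to -f, the number of contigs can be counted from its header lines. This is a minimal sketch that assumes an uncompressed FASTA file; `count_contigs` is a hypothetical helper and not part of the pipeline.

```shell
# Count FASTA records by counting header lines (lines starting with ">").
count_contigs() {
    grep -c '^>' "$1"
}

# Example call from the working directory, using the path from the command above:
# count_contigs ${name1}_${name2}_${model}_assemble/${name1}_${name2}.SPolished.asm.wengan.fasta
```

A count of zero usually means the assembly step did not finish or the path is wrong.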

Purify

$ sh Step.4.1.3.purify.sh
Description:

        Purify of Quast report

Option:
        -p the path of Purify
        -r the path of the assembled genome
        -1 <The first name of your contigs prefix>
        -2 <The other name of your contigs prefix>
        -m the model set before
$ sh Step.4.1.3.purify.sh -p $atools/contig_purify.py \
-r ${name1}_${name2}_${model}_assemble/${name1}_${name2}.SPolished.asm.wengan.fasta \
-1 ${name1} -2 ${name2} -m ${model}

Quast after Purify

$ sh Step.4.1.4.afterpurifyquast.sh
Usage:
        Quast after Purify
Option:
        -1 <The first name of your contigs prefix>
        -2 <The other name of your contigs prefix>
        -m <The model of Wengan that you choose>
        -p <The path of quast>
 
$ sh Step.4.1.4.afterpurifyquast.sh -1 ${name1} -2 ${name2} -m ${model} -p usr/software/quast/quast.py

Binning

$ sh Step.4.2.1.binning.sh -1 Dataprepare/split_reads.1.fq.gz.clean.gz -2 Dataprepare/split_reads.2.fq.gz.clean.gz \
-o ${name1}_${name2}_${model}_binning \
-M usr/software/metaWRAP \
-c 50 -x 10 -t 40 -l 1000

Result

$ tree -L 1
.
|-- Dataprepare
|-- WTDBG_contigs
|-- STLFR_CLOUDSPADES_contigs
|-- STLFR_CLOUDSPADES_WTDBG_A_assemble
|-- STLFR_CLOUDSPADES_WTDBG_A_Purify
|-- STLFR_CLOUDSPADES_WTDBG_A_quast
|-- STLFR_CLOUDSPADES_WTDBG_A_quast_afterPurify
`-- STLFR_CLOUDSPADES_WTDBG_A_binning

.
|-- STLFR_CLOUDSPADES_WTDBG_A_assemble
|   `-- STLFR_CLOUDSPADES_WTDBG.SPolished.asm.wengan.fasta
|-- STLFR_CLOUDSPADES_WTDBG_A_Purify
|   |-- 1280_genomic.fasta
|   |-- 1351_genomic.fasta
|   |-- 1423_genomic.fasta
|   |-- 1613_genomic.fasta
|   |-- 1639_genomic.fasta
|   |-- 287_genomic.fasta
|   |-- 28901_genomic.fasta
|   |-- 4932_genomic.fasta
|   |-- 5207_genomic.fasta
|   |-- 562_genomic.fasta
|   `-- filename.txt
|-- STLFR_CLOUDSPADES_WTDBG_A_quast
|   |-- 1280_genomic.quast
|   |-- 1351_genomic.quast
|   |-- 1423_genomic.quast
|   |-- 1613_genomic.quast
|   |-- 1639_genomic.quast
|   |-- 287_genomic.quast
|   |-- 28901_genomic.quast
|   |-- 4932_genomic.quast
|   |-- 5207_genomic.quast
|   |-- 562_genomic.quast
|   |-- filename.txt
|   `-- genomic.txt
`-- STLFR_CLOUDSPADES_WTDBG_A_quast_afterPurify
    |-- 1280_genomic_afterPurify.quast
    |-- 1351_genomic_afterPurify.quast
    |-- 1423_genomic_afterPurify.quast
    |-- 1613_genomic_afterPurify.quast
    |-- 1639_genomic_afterPurify.quast
    |-- 287_genomic_afterPurify.quast
    |-- 28901_genomic_afterPurify.quast
    |-- 4932_genomic_afterPurify.quast
    |-- 5207_genomic_afterPurify.quast
    `-- 562_genomic_afterPurify.quast
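
After a full run, the top-level layout above can be checked mechanically. The sketch below lists which of the expected directories are absent. `missing_outputs` is a hypothetical helper, and the directory names are taken from the listing above with the default prefixes.

```shell
# Print each expected output directory that does not exist under the given root.
# $1: working directory, $2: combined prefix such as STLFR_CLOUDSPADES_WTDBG_A
missing_outputs() {
    root=$1
    prefix=$2
    for d in Dataprepare WTDBG_contigs STLFR_CLOUDSPADES_contigs \
             "${prefix}_assemble" "${prefix}_Purify" "${prefix}_quast" \
             "${prefix}_quast_afterPurify" "${prefix}_binning"; do
        [ -d "$root/$d" ] || printf '%s\n' "$d"
    done
}

# Example call from the working directory:
# missing_outputs . STLFR_CLOUDSPADES_WTDBG_A
```

If only the Quast or only the Binning method was run, the directories of the other method are expected to be reported as missing.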

stlfrassembly's People

Contributors

sumoii

stlfrassembly's Issues

Where do I find longranger4stLFR?

Hi,

Your pipeline seems very interesting and I'd like to try it with my stLFR data, but I can't find the longranger4stLFR dependency anywhere.
Where can I download it from?

Thanks, Ido
