
De Novo Assembly of Long-Read Sequences

Long-read sequencing technologies such as PacBio and Oxford Nanopore are becoming increasingly popular in genomics. Because their reads can span repetitive regions that short reads cannot resolve, they are well suited to assembling the genomes of complex organisms. In this project, we use PacBio HiFi reads to assemble the genome of the yeast Saccharomyces cerevisiae.

"Software installation is like a box of chocolates: you never know what you're gonna get. No matter what tool you use (conda, mamba, sudo, pip or whatever), it always finds a way to throw an error and refuse to install."

---- A frustrated bioinformatician

Data

I used the filtered and clipped version of the FASTQ file for this analysis (link).

For simplicity, I put all the inputs in the ../data/ folder, outside the repository, to keep git pushes light (a .gitignore entry would work too). Don't worry: every time we need an input, I will mention its URL.

1 - Quality Control of the reads

software: FastQC v0.11.9

The first step is to check the quality of the reads. We can use FastQC to do this. The command is given below:

$ fastqc <input> -o <output>
$ mkdir -p 1-QC/        # FastQC does not create the output directory itself
$ fastqc ../data/SRR13577846.fastq.gz -o 1-QC/

$ firefox 1-QC/SRR13577846_fastqc.html  # open the report in Firefox

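The FastQC report raises some warnings. We ignore them for now because this is a time-limited exercise; we may come back to them later. Some are expected for HiFi data anyway: for example, the Sequence Length Distribution module warns whenever the reads are not all the same length.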
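As a quick complement to the FastQC report, it can help to summarise the read lengths directly, since length is the main advantage of HiFi data. The sketch below is a hypothetical helper (read_stats is not part of FastQC) that prints the number of reads, total bases, mean length and read N50; it assumes standard 4-line FASTQ records, plain or gzipped.

```shell
# read_stats: summarise read lengths in a FASTQ file (plain or gzipped).
# Hypothetical helper, not part of FastQC. Assumes standard 4-line FASTQ
# records (the sequence is line 2 of each record, never wrapped).
read_stats() {
  # gzip -cdf decompresses gzipped input and passes plain text through as-is
  gzip -cdf < "$1" \
    | awk 'NR % 4 == 2 { print length($0) }' \
    | sort -rn \
    | awk '
        { len[NR] = $1; total += $1 }
        END {
          if (NR == 0) { print "reads=0"; exit }
          # walk from the longest read down; the read N50 is the length at
          # which the running total first reaches half of all bases
          cum = 0
          for (i = 1; i <= NR; i++) {
            cum += len[i]
            if (cum >= total / 2) { n50 = len[i]; break }
          }
          # %.0f rather than %d avoids integer overflow in some awks
          printf "reads=%d bases=%.0f mean=%.1f n50=%.0f\n", NR, total, total / NR, n50
        }'
}

# Usage with the input from this project:
# read_stats ../data/SRR13577846.fastq.gz
```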

2 - Perform de novo assembly using Hifiasm

software: Hifiasm 0.18.8-r525 (link)

Hifiasm is a fast and accurate assembler for PacBio HiFi reads. The command is given below:

$ hifiasm -o <output> -t <threads> <input>
$ mkdir -p 2_assembly     # make sure the output directory exists
$ hifiasm -o 2_assembly/SRR13577846 -t 5 ../data/SRR13577846.fastq.gz

# Real time: 1602.548 sec; CPU: 7809.897 sec; Peak RSS: 13.164 GB
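The last line of the hifiasm log reports wall-clock time, total CPU time and peak memory. Dividing CPU time by wall-clock time gives the average number of busy cores: here 7809.897 / 1602.548 ≈ 4.87, close to the 5 threads requested with -t 5, and the run took about 27 minutes with about 13 GB of memory at peak. A small sketch that extracts these numbers from a saved log file, assuming the line keeps exactly the format shown above (hifiasm_usage and the log file name are hypothetical):

```shell
# hifiasm_usage: turn the "Real time: ... sec; CPU: ... sec; Peak RSS: ... GB"
# line of a saved hifiasm log into wall-clock minutes, average busy cores
# (CPU time / wall-clock time) and peak memory. Hypothetical helper.
hifiasm_usage() {
  grep -o 'Real time: .*' "$1" | tail -n 1 | awk '{
    # whitespace-separated fields: $3 = real seconds, $6 = CPU seconds, $10 = peak RSS (GB)
    printf "wall=%.1f min cores=%.2f peak_rss=%s GB\n", $3 / 60, $6 / $3, $10
  }'
}

# Usage, assuming the log was saved to a file (hypothetical name):
# hifiasm_usage 2_assembly/hifiasm.log
```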

hifiasm writes several output files, which are described in its documentation. We use <output>.bp.p_ctg.gfa, which contains the primary assembled contigs.
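Before converting, a quick look at how many primary contigs hifiasm produced and how long they are can be informative. A minimal sketch, assuming each contig sequence is stored inline in column 3 of its S (segment) line rather than as "*"; contig_lengths is a hypothetical helper name.

```shell
# contig_lengths: print "<contig name><TAB><length>" for every S (segment)
# line of a GFA file, longest first. Hypothetical helper; assumes each
# sequence is stored inline in column 3 (not as "*").
contig_lengths() {
  awk -F'\t' '$1 == "S" { print $2 "\t" length($3) }' "$1" | sort -k2,2nr
}

# Usage:
# contig_lengths 2_assembly/SRR13577846.bp.p_ctg.gfa | head     # longest contigs
# contig_lengths 2_assembly/SRR13577846.bp.p_ctg.gfa | wc -l    # number of contigs
```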

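Quast and BUSCO need the contigs in FASTA format. In the GFA file, each contig is stored on an S (segment) line, with its name in the second column and its sequence in the third, so awk can convert the file by printing those two columns as a FASTA header and sequence: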

$ awk '/^S/{print ">"$2;print$3}' 2_assembly/SRR13577846.bp.p_ctg.gfa > 2_assembly/SRR13577846.fa

The fasta file "SRR13577846.fa" is the input for Quast and BUSCO.
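Because any mistake in the conversion would propagate to both Quast and BUSCO, a cheap sanity check is to compare the number of sequences and total bases between the GFA and the FASTA produced by the awk one-liner above. A sketch under the same assumption (one inline sequence per S line); check_fasta_vs_gfa is a hypothetical name.

```shell
# check_fasta_vs_gfa: compare the number of sequences and total bases between
# the S lines of a GFA file and the FASTA converted from it. Returns non-zero on
# a mismatch. Hypothetical helper; assumes one inline sequence per S line.
check_fasta_vs_gfa() {
  local gfa_stats fa_stats
  gfa_stats=$(awk -F'\t' '$1 == "S" { n++; b += length($3) }
                          END { printf "n=%d bases=%.0f", n, b }' "$1")
  # FASTA sequences may span several lines, so count bases on non-header lines
  fa_stats=$(awk '/^>/ { n++; next } { b += length($0) }
                  END { printf "n=%d bases=%.0f", n, b }' "$2")
  if [ "$gfa_stats" = "$fa_stats" ]; then
    echo "OK ($fa_stats)"
  else
    echo "MISMATCH: gfa ($gfa_stats) vs fasta ($fa_stats)" >&2
    return 1
  fi
}

# Usage:
# check_fasta_vs_gfa 2_assembly/SRR13577846.bp.p_ctg.gfa 2_assembly/SRR13577846.fa
```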

3 - Perform quality assessment using Quast

software: Quast v5.0.2 (link)

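Quast is a tool for quality assessment of genome assemblies. When given a reference genome, it also reports reference-based metrics such as genome fraction and misassemblies.

The Saccharomyces cerevisiae reference genome is available at this page. You can download it and put it in the ../data/ folder; I renamed it to ref.fna. The command is given below: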

$ quast -r <reference> <input> -o <output>
$ quast -r ../data/ref.fna 2_assembly/SRR13577846.fa -o 3_QUAST/

$ firefox 3_QUAST/report.html
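Quast's report includes N50, a common contiguity metric: sort the contigs from longest to shortest and add up their lengths; N50 is the length of the contig at which the running total first reaches half of the assembly size. The sketch below computes it directly from the FASTA for a quick check without rerunning Quast (assembly_n50 is a hypothetical helper). Note that Quast ignores contigs shorter than 500 bp by default (--min-contig), so its numbers can differ slightly from this unfiltered calculation.

```shell
# assembly_n50: print the number of contigs, total length and N50 of a FASTA
# file. Hypothetical helper for a quick check; Quast reports these and more.
# Sequences wrapped over several lines are handled by summing per record.
assembly_n50() {
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" \
    | sort -rn \
    | awk '
        { len[NR] = $1; total += $1 }
        END {
          if (NR == 0) { print "contigs=0"; exit }
          # walk from the longest contig down; N50 is the length at which the
          # running total first reaches half of the assembly size
          cum = 0
          for (i = 1; i <= NR; i++) {
            cum += len[i]
            if (cum >= total / 2) { n50 = len[i]; break }
          }
          printf "contigs=%d total=%.0f n50=%.0f\n", NR, total, n50
        }'
}

# Usage:
# assembly_n50 2_assembly/SRR13577846.fa
```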

4 - Perform quality assessment using BUSCO

software: BUSCO v4.1.4 (link)

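BUSCO assesses the completeness of a genome assembly by searching it for benchmarking universal single-copy orthologs (BUSCOs), genes expected to be present in a single copy in most species of a given lineage.

$ busco --list-datasets   # list the available lineage datasets (or let BUSCO pick one with --auto-lineage)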

The command is given below:

$ busco -i <input> -o <output> -l <lineage> -m <mode> -c <threads>   # optional: -f overwrites existing results, -q prints errors only
$ busco -m genome -i 2_assembly/SRR13577846.fa -o 4_BUSCO -f -q -l saccharomycetes_odb10
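BUSCO reports completeness in a compact notation, C:x%[S:x%,D:x%],F:x%,M:x%,n:N, where C is the share of complete BUSCOs (split into S, single-copy, and D, duplicated), F fragmented, M missing, and n the number of BUSCO genes searched for in the lineage. When comparing several runs it can be handy to split that line into fields; the sketch below is a hypothetical helper (busco_fields) that assumes exactly this notation.

```shell
# busco_fields: split a BUSCO notation string such as
#   C:x%[S:x%,D:x%],F:x%,M:x%,n:N
# into "key<TAB>value" lines (C, S, D, F, M, n). Hypothetical helper; assumes
# exactly this layout and ignores percent signs and surrounding whitespace.
busco_fields() {
  printf '%s\n' "$1" | awk '{
    # split on every character that cannot be part of a "key:value" token
    # (brackets, commas, percent signs, whitespace)
    n = split($0, parts, /[^A-Za-z0-9.:]+/)
    for (i = 1; i <= n; i++)
      if (split(parts[i], kv, ":") == 2) print kv[1] "\t" kv[2]
  }'
}

# Usage (BUSCO v4 writes the line to a short_summary*.txt file in the output folder):
# busco_fields "$(grep -h 'C:[0-9]' 4_BUSCO/short_summary*.txt | head -n 1)"
```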

Finally, let's combine the results of all steps into a single report with MultiQC:

$ multiqc 1-QC/ 3_QUAST/ 4_BUSCO/ -o 5_multiQC/
$ firefox 5_multiQC/multiqc_report.html

Contributors

aradar46


