
Singlet sequences (megahit issue, closed)

voutcn commented on May 26, 2024
Singlet sequences

from megahit.

Comments (8)

voutcn commented on May 26, 2024

We don't provide such an option. You could align the reads back to the contigs with a short-read aligner such as Bowtie2.

deprekate commented on May 26, 2024

Could this be tagged as a bug or feature request? One of the standard pipelines for metagenome analysis is to assemble, pool the singlet (unassembled) reads with the contigs, and then do downstream analysis.

As an example, I grabbed the newest WGS metagenome uploaded to MG-RAST (4640988.3) and assembled it with MIRA to find out approximately how many reads do not assemble into contigs. Out of 519,098 input reads, less than half assembled into contigs. It seems unwise to discard half of your data.

$ grep -c "^>" mgm4640988.3.050.upload.fna 
519098

$ wc -l Test_info_contigreadlist.txt 
254040

$ wc -l Test_info_debrislist.txt 
265058
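As a quick check on those numbers (assuming each MIRA list file holds exactly one read name per line with no header, so that wc -l equals the read count), the two lists add up exactly to the input: 254,040 + 265,058 = 519,098. That is roughly 48.9% of reads in contigs and 51.1% in the debris list. The pct helper below is a hypothetical one-liner for redoing that arithmetic:

```shell
# Hypothetical helper: percentage of part $1 in total $2, one decimal place.
pct() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# Counts taken from the grep/wc output above.
total=519098
in_contigs=254040
debris=265058

echo "assembled:   $(pct "$in_contigs" "$total")%"
echo "debris:      $(pct "$debris" "$total")%"
echo "unaccounted: $(( total - in_contigs - debris ))"
```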

And since MEGAHIT also does not report which reads went into which contigs, there is no way to calculate coverage. This means you cannot do abundance calculations on the data (or use this data on MG-RAST).
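For reference, once reads have been mapped back to the contigs, the usual back-of-the-envelope estimate of a contig's mean fold coverage is (mapped reads × read length) / contig length. A minimal sketch, assuming a single fixed read length and ignoring reads that overhang contig ends; the function name and example numbers are hypothetical:

```shell
# Hypothetical helper: mean fold coverage = mapped_reads * read_length / contig_length.
# Arguments: <mapped_reads> <read_length_bp> <contig_length_bp>
fold_coverage() {
  awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.2f\n", n * l / g }'
}

# Invented example: 100 reads of 150 bp on a 3,000 bp contig.
fold_coverage 100 150 3000
```

With those invented numbers it prints 5.00, i.e. each base of the contig is covered about five times on average.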

voutcn commented on May 26, 2024

It is not a bug. I know little about MIRA, but I think most de Bruijn graph-based assemblers do not report which reads go into which contigs, because the reads have already been chopped into k-mers and the k-mers have been de-duplicated.
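A toy illustration of that point (not MEGAHIT's actual code; it ignores details such as reverse-complement k-mers and multiplicity counting): once reads are split into k-mers and duplicates are collapsed, a k-mer shared by two reads is stored only once and no longer records which read it came from. The function name is hypothetical.

```shell
# Hypothetical toy: emit every k-mer of every input read (one read per line),
# then collapse duplicates. The output keeps no link back to the source reads.
distinct_kmers() {
  local k=$1
  awk -v k="$k" '{
    for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k)
  }' | LC_ALL=C sort -u
}
```

For example, printf 'ACGTAC\nCGTACG\n' | distinct_kmers 4 turns six 4-mers into four distinct ones (ACGT, CGTA, GTAC, TACG); CGTA and GTAC occur in both reads but appear only once, with nothing saying which read contributed them.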

One could easily align the reads back to the contigs and output the unmapped reads for subsequent analysis. As far as I know, Bowtie2 provides some options to output unmapped reads. I will probably write a tutorial on this in the near future.
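One possible version of that workflow, sketched as a bash function under the assumption that bowtie2 and samtools are both on PATH; the function name and output file names are hypothetical. Rather than Bowtie2's own unaligned-read options, this sketch pipes the SAM output through samtools fastq -f 12 (SAM flag 4 = read unmapped, 8 = mate unmapped), so only pairs in which neither mate aligned are written out. Pairs where one mate maps are therefore excluded.

```shell
# Hypothetical sketch: map paired reads to the contigs with Bowtie2 and keep
# only pairs where BOTH mates are unmapped (SAM flags 4 + 8 = 12).
# Arguments: <contigs.fa> <reads_1.fq> <reads_2.fq> <output_prefix> [threads]
unmapped_pairs_bowtie2() {
  local contigs=$1 r1=$2 r2=$3 prefix=$4 threads=${5:-4}

  # Build the Bowtie2 index; its files share the basename "$prefix.idx".
  bowtie2-build "$contigs" "$prefix.idx" || return 1

  # Bowtie2 writes SAM (including unaligned reads) to stdout when -S is
  # omitted; samtools reads it from stdin ("-") and writes the two mate files.
  bowtie2 -p "$threads" -x "$prefix.idx" -1 "$r1" -2 "$r2" \
    | samtools fastq -f 12 \
        -1 "${prefix}_unmapped_1.fq" -2 "${prefix}_unmapped_2.fq" -
}
```

Usage would look like unmapped_pairs_bowtie2 final.contigs.fa read_1.fq read_2.fq sample1, producing sample1_unmapped_1.fq and sample1_unmapped_2.fq.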

epruesse commented on May 26, 2024

BBMap can do it like so:

bbmap.sh in=read_1.fq in2=read_2.fq ref=final.contigs.fa nodisk outu=unmapped_reads_1.fq outu2=unmapped_reads_2.fq

You might want to change the stringency of the mapping though.
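One way to expose the mapping stringency in that command is BBMap's minid= setting (approximate minimum alignment identity). The wrapper name and its 0.95 default below are illustrative assumptions, not tuned or recommended values:

```shell
# Hypothetical wrapper around the bbmap.sh command above. The 0.95 default
# for minid is an illustrative assumption, not a tuned value.
# Arguments: <reads_1.fq> <reads_2.fq> <contigs.fa> [minid]
bbmap_unmapped() {
  local r1=$1 r2=$2 contigs=$3 minid=${4:-0.95}
  # nodisk builds the index in memory instead of writing it to disk.
  bbmap.sh in="$r1" in2="$r2" ref="$contigs" nodisk minid="$minid" \
    outu=unmapped_reads_1.fq outu2=unmapped_reads_2.fq
}
```

For example, bbmap_unmapped read_1.fq read_2.fq final.contigs.fa 0.97 reruns the same mapping with a stricter identity cutoff, which makes it easy to compare singlet counts across settings.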

voutcn commented on May 26, 2024

Thanks @epruesse. I decided to write a simple tutorial based on BBMap.

deprekate commented on May 26, 2024

Cool, I got BBMap to work. Out of my test assembly of 30,000 sequences, MEGAHIT/BBMap left 1,805 singlets. I used CAP3 (very simplistic, but super slow) as a basis for comparison, and the CAP3 assembly had 1,580 singlets, which is about the same as MEGAHIT/BBMap. So the only thing needed is to tweak the mapping stringency to match MEGAHIT's default settings.
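Comparing singlet counts like these only requires counting the records in the unmapped-read files. A minimal sketch, assuming plain (uncompressed) FASTQ with exactly four lines per record; the function name is hypothetical:

```shell
# Hypothetical helper: count FASTQ records across one or more files,
# assuming 4 lines per record (no wrapped sequence or quality lines).
count_fastq_reads() {
  awk 'END { print NR / 4 }' "$@"
}
```

Running count_fastq_reads unmapped_reads_1.fq unmapped_reads_2.fq gives the total across both mate files. For scale, the counts above work out to 1,805 / 30,000 ≈ 6.0% singlets for MEGAHIT/BBMap versus 1,580 / 30,000 ≈ 5.3% for CAP3.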

Where is the tutorial write-up? I am now trying to figure out how to get the coverage of each contig from BBMap, since I believe that is not something MEGAHIT estimates. I tried different BBMap "Histogram and statistics output parameters:" but all gave weird results (lots of zero coverage, when technically a contig should have had at least two reads go into it).
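BBMap can write per-contig coverage statistics with its covstats= option. The sketch below assumes that file is tab-separated with a header line containing #ID and Avg_fold columns (worth checking against your BBMap version); it finds those columns by name rather than by position and counts contigs whose average fold is zero, which is a quick way to quantify the "lots of zero coverage" symptom. Both function names are hypothetical.

```shell
# Produce the stats file first, e.g.:
#   bbmap.sh in=read_1.fq in2=read_2.fq ref=final.contigs.fa nodisk covstats=covstats.txt

# Hypothetical helper: print "<contig>\t<Avg_fold>" for each contig, locating
# the columns from the header line instead of hard-coding their positions.
contig_fold() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "#ID") id = i
        if ($i == "Avg_fold") fold = i
      }
      if (!id || !fold) { print "missing #ID or Avg_fold column" > "/dev/stderr"; exit 1 }
      next
    }
    { print $id "\t" $fold }
  ' "$1"
}

# Hypothetical helper: number of contigs whose average fold coverage is zero.
count_zero_fold() {
  contig_fold "$1" | awk -F'\t' '$2 + 0 == 0 { n++ } END { print n + 0 }'
}
```

Usage would be count_zero_fold covstats.txt; contig_fold covstats.txt | sort -t "$(printf '\t')" -k2,2g gives the contigs ordered from lowest to highest coverage.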

voutcn commented on May 26, 2024

@deprekate https://github.com/voutcn/megahit/wiki/An-example-of-real-assembly
BTW, I am consulting with the author of BBMap about a set of parameters for mapping reads to contigs, so the tutorial will be updated in the near future.

voutcn commented on May 26, 2024

@deprekate The tutorial has been updated. I am closing this issue.