Comments (8)
We don't provide such an option. You could align the reads back to the contigs with a short-read aligner such as Bowtie2.
from megahit.
It might be good to tag this as a bug/feature request? One of the standard pipelines for metagenome analysis is to assemble, then dump the singlet reads in with the contigs, and then do downstream analysis.
As an example, I just grabbed the newest uploaded WGS metagenome on MGRAST (4640988.3).
I assembled it using MIRA to find out approximately how many reads do not assemble into contigs. Out of 519,098 input reads, fewer than half assembled into contigs. It seems unwise to discard half of your data.
$ grep -c "^>" mgm4640988.3.050.upload.fna
519098
$ wc -l Test_info_contigreadlist.txt
254040
$ wc -l Test_info_debrislist.txt
265058
And since which reads went into which contigs is also not reported, there is no way to calculate coverage. This means you cannot do abundance calculations on the data (or use this data on MGRAST).
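For reference, the proportions implied by the counts above can be worked out with a short shell sketch. The helper name `pct` is hypothetical; the three numbers are copied from the `grep` and `wc` output shown earlier, and the two list files are assumed to hold one read per line.

```shell
# Counts copied from the grep/wc output above.
total=519098      # reads in mgm4640988.3.050.upload.fna
assembled=254040  # lines in Test_info_contigreadlist.txt
debris=265058     # lines in Test_info_debrislist.txt

# pct NUMERATOR DENOMINATOR: print a percentage with one decimal place.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f", 100 * n / d }'
}

echo "assembled:   $(pct "$assembled" "$total")%"
echo "unassembled: $(pct "$debris" "$total")%"
```

The two lists add up to the input read count, so every read is accounted for as either assembled or debris.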
It is not a bug. I know little about MIRA, but I think most de Bruijn graph based assemblers do not report which reads go into which contigs, because the reads have already been chopped into k-mers and the k-mers de-duplicated.
One could easily align the reads back to the contigs and output the unmapped reads for subsequent analysis. As far as I know, Bowtie2 provides options to output unmapped reads. I will probably write a tutorial on this in the near future.
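A minimal sketch of that idea with Bowtie2, for paired-end reads. The function name, argument order, and output file names are assumptions for illustration, not something MEGAHIT provides; `bowtie2-build`, `-x`, `-1`, `-2`, `-S`, and `--un-conc` are standard Bowtie2 options.

```shell
# extract_unmapped_pairs CONTIGS READS_1 READS_2 PREFIX
# Index the contigs, map the pairs back to them, and write the pairs
# that fail to align concordantly to PREFIX.unmapped.1.fq / .2.fq.
extract_unmapped_pairs() {
    contigs=$1
    r1=$2
    r2=$3
    prefix=$4

    # Build a Bowtie2 index from the assembled contigs.
    bowtie2-build "$contigs" "$prefix.idx" > /dev/null

    # --un-conc collects pairs that did not align concordantly;
    # Bowtie2 inserts .1 / .2 before the extension for the two mates.
    bowtie2 -x "$prefix.idx" -1 "$r1" -2 "$r2" \
        --un-conc "$prefix.unmapped.fq" \
        -S "$prefix.sam"
}
```

It could be called as `extract_unmapped_pairs final.contigs.fa read_1.fq read_2.fq run1`. Note that `--un-conc` also keeps pairs that aligned discordantly or with only one mate, so it is a looser definition of "unmapped" than filtering on the SAM unmapped flag.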
bbmap can do it like so:
bbmap in=read_1.fq in2=read_2.fq ref=final.contigs.fa nodisk outu=unmapped_reads_1.fq outu2=unmapped_reads_2.fq
You might want to change the stringency of the mapping though.
(Sent as an email reply to Dinghua Li's comment of Jul 29, 2015.)
Thanks @epruesse. I decided to write a simple tutorial based on BBMap.
Cool, I got bbmap to work. Out of my test assembly of 30,000 sequences, MEGAHIT/bbmap had 1,805 singlets. I used CAP3 (as it is very simplistic, but super slow) to get a basis of comparison, and the CAP3 assembly had 1,580 singlets, which is about the same as MEGAHIT/bbmap. So all that is needed is to tweak the mapping stringency to match MEGAHIT's default settings.
Where is the tutorial writeup? I am now trying to figure out how to get the coverage of each contig from bbmap, since I believe that is not something estimated by MEGAHIT. I tried different bbmap "Histogram and statistics output parameters:" but all gave weird results (lots of zero coverage, when technically every contig should have had at least two reads go into it).
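One tool-independent way to sanity-check per-contig coverage is to compute it directly from a SAM file of reads mapped back to the contigs. The sketch below is only an approximation under stated assumptions: the SAM file has `@SQ` header lines, each primary mapped read contributes its full sequence length (soft clipping and indels are ignored), and the function name `contig_coverage` is hypothetical.

```shell
# contig_coverage FILE.sam
# Print: contig <TAB> length <TAB> mapped reads <TAB> approx. mean coverage
contig_coverage() {
    awk -F '\t' '
        /^@SQ/ {
            # Contig name and length come from the SN: and LN: header tags.
            name = ""; len = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4) + 0
            }
            order[++n] = name
            length_of[name] = len
            next
        }
        /^@/ { next }                          # other header lines
        {
            flag = $2 + 0
            if (int(flag / 4) % 2) next        # 0x4: read unmapped
            if (int(flag / 256) % 2) next      # 0x100: secondary alignment
            if (int(flag / 2048) % 2) next     # 0x800: supplementary alignment
            reads[$3]++
            bases[$3] += length($10)           # read length as a proxy for aligned bases
        }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                cov = (length_of[c] > 0) ? bases[c] / length_of[c] : 0
                printf "%s\t%d\t%d\t%.2f\n", c, length_of[c], reads[c], cov
            }
        }
    ' "$1"
}
```

Contigs that show zero reads here would point at the mapping step (for example, overly strict settings) rather than at the coverage report itself.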
@deprekate https://github.com/voutcn/megahit/wiki/An-example-of-real-assembly
BTW, I am consulting the author of BBMap about a set of parameters for mapping reads to contigs, so the tutorial will be updated in the near future.
@deprekate The tutorial has been updated. I am closing this issue.