Comments (10)
Hi,
1. No. NextDenovo tries to correct every read longer than 1001 bp, but it filters out some low-quality reads, low-depth reads and..., so the corrected output contains far fewer reads than the raw set of reads longer than 1001 bp.
2 & 3. Your data is not sufficient for assembly with the current version of NextDenovo using default options, because all default options are optimized for 60-100x NanoPore data, so it will produce an unexpected assembly result. If you still want to use NextDenovo for the assembly, you can try setting correction_options = -b and -k 20 in sort_options and then rerun the whole pipeline, although I cannot guarantee a good result. You can also try other assemblers.
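For readers who want to check where their own data falls relative to the 60-100x range mentioned above, depth is simply total sequenced bases divided by genome size. A minimal sketch; the function names and the toy numbers are illustrative assumptions, not part of NextDenovo:

```python
def sequencing_depth(read_lengths, genome_size):
    """Average depth = total sequenced bases / genome size (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def within_default_range(depth, min_depth=60.0):
    """True if depth reaches the lower bound quoted in this thread (60x)."""
    return depth >= min_depth


# Toy data (hypothetical): 1,500 reads of 20 kb against a 1 Mb genome.
toy_reads = [20_000] * 1_500
toy_depth = sequencing_depth(toy_reads, 1_000_000)
print(f"depth = {toy_depth:.1f}x, enough for defaults: {within_default_range(toy_depth)}")
```

In practice the read lengths would come from your FASTQ/FASTA files; the threshold is only the figure quoted in the reply above.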
from nextdenovo.
Thanks for your reply, it helps a lot. What still confuses me is this: should I set seed_cutoff as low as possible (1001, for example) when I have enough nanopore data and enough CPUs? Will correcting as many reads as possible improve assembly quality? Thanks again.
from nextdenovo.
Yes, but I recommend using bin/seq_stat to calculate the expected seed cutoff.
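To give an idea of what an "expected seed cutoff" can mean, one common approach is to pick the shortest read length such that all reads at least that long add up to a target seed depth. The sketch below illustrates that general idea only; it is an assumption for illustration and is not the actual algorithm used by bin/seq_stat, and the function and parameter names are hypothetical:

```python
def seed_cutoff_for_depth(read_lengths, genome_size, target_seed_depth):
    """Return the length of the shortest read needed so that the longest reads
    together reach target_seed_depth x genome_size bases.

    Returns None if all reads together cannot reach the target depth.
    """
    needed_bases = target_seed_depth * genome_size
    accumulated = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(read_lengths, reverse=True):
        accumulated += length
        if accumulated >= needed_bases:
            return length
    return None


# Toy example: four reads, genome size 10 bp, target seed depth 7x.
print(seed_cutoff_for_depth([10, 20, 30, 40], genome_size=10, target_seed_depth=7))
```

With too little data the target cannot be reached at any cutoff, which this sketch reports as None.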
from nextdenovo.
Dear Dr. Hu,
I recently ran NextDenovo on 33x ONT reads from a 1 Gb genome. After running seq_stat, the suggested seed cutoff was 0 bp. However, as the minimum read length was 1000 bp, I set seed_cutoff to 1.1k. The results were a bit disappointing, with N50 = 2 Mb. For comparison, for a mammalian genome with 70x PacBio data I got an N50 of 76 Mb!
I wonder whether it is worth tweaking some of the parameters as suggested above (correction_options = -b and -k 20 in sort_options), or whether it would be necessary to gather more data to reach 60x (which is not always possible)?
I would also like to know whether there is a document with more detailed help on this assembler. Do you have any sort of manual or white paper, or do you plan to upload a manuscript to bioRxiv?
The results on the mammal are very encouraging; the program is definitely a tool to consider for achieving chromosome-scale assemblies.
Thanks,
Fernando
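For anyone comparing contiguity numbers like the N50 values quoted in this comment, here is a minimal sketch of the standard N50 definition (the contig length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembly size). The function name and toy lengths are illustrative:

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Accumulate from the longest contig until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy example with hypothetical contig lengths.
print(n50([80, 70, 50, 40, 30, 20]))
```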
from nextdenovo.
Hi, the input data is not sufficient and the seed length is too short. The default value of the -min_len_seed option in nextcorrect.py is 10k, so most of the corrected seeds will be filtered out. Currently, the default options are optimized for input data >= 60x and seed length >= 20 kb; otherwise NextDenovo may produce unexpected results, and the assembly quality needs to be checked carefully.
BTW, I am now preparing the NextDenovo manuscript, and I will also provide default options for short seeds and 30x input data in the next release. However, if you want a better assembly result, I recommend sequencing >= 60x data using NanoPore ultra-long libraries.
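To illustrate why a 10k minimum seed length removes most seeds when the seed cutoff is only about 1.1k, here is a toy sketch. It assumes the filter behaves like a plain length threshold, which is an assumption for illustration and not code taken from nextcorrect.py; the names and lengths are hypothetical:

```python
def fraction_of_seeds_kept(seed_lengths, min_len_seed=10_000):
    """Fraction of seeds whose length is at least min_len_seed."""
    if not seed_lengths:
        return 0.0
    kept = sum(1 for length in seed_lengths if length >= min_len_seed)
    return kept / len(seed_lengths)


# Hypothetical seed lengths selected with a ~1.1 kb seed cutoff.
toy_seeds = [1_100, 2_500, 4_000, 7_500, 9_000, 12_000, 15_000, 25_000]
print(f"{fraction_of_seeds_kept(toy_seeds):.0%} of seeds pass the 10k threshold")
```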
from nextdenovo.
Hi Hu,
I just read that the latest release (version 2.3.1) uses non-seed reads to correct structural and base errors if seed depth < 35.
I guess this corresponds to the default options you mentioned above. Should I therefore also expect better results with ONT coverage >= 30x? Did you run tests on that front?
Thanks,
Fernando
from nextdenovo.
NextDenovo is only an assembler, so if you need a more accurate assembly, you can try NextPolish.
from nextdenovo.
Hi,
OK, I see. The option only affects base-level accuracy (i.e. use non-seed reads to correct structural & base errors if seed depth < 35).
I was thinking about getting better contiguity and assembly quality (fewer misassemblies) with less data. So v2.3.1 still requires coverage >= 60x for optimal results, right?
Thanks,
Fernando
from nextdenovo.
- No, the main purpose of this step is to correct structural errors, using mapping depth information and the overlap coordinates between seeds.
- Yes.
from nextdenovo.