
Comments (10)

moold avatar moold commented on May 23, 2024

Hi,
1. No. Although NextDenovo will try to correct reads longer than 1001 bp, it also filters out low-quality, low-depth reads and so on, so the output of corrected reads is much smaller than the set of raw reads with length > 1001 bp.

2 & 3. Your data is not enough for assembly with the current version of NextDenovo under default options, because the defaults are optimized for 60-100x Nanopore data, so it will produce an unexpected assembly result. If you still want to use NextDenovo, you can try setting correction_options = -b, changing -k to 20 in sort_options, and then rerunning the whole pipeline (see the sketch below), although I cannot guarantee a good result. You can also try other assemblers.
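For reference, a minimal sketch of where those two options would go in a NextDenovo run.cfg. The section and key names follow the example config shipped with NextDenovo as I remember it, so treat everything other than correction_options, sort_options and seed_cutoff as assumptions and compare against the example run.cfg bundled with your install:

    # run.cfg sketch (assumed layout)
    [General]
    job_type = local
    task = all
    input_type = raw
    # input.fofn is a placeholder list of your raw read files
    input_fofn = input.fofn
    workdir = 01_rundir

    [correct_option]
    # correct reads down to ~1 kb, as discussed in this thread
    seed_cutoff = 1001
    # -b as suggested above for low-coverage data
    correction_options = -b
    # -k changed to 20 as suggested above; -m/-t are the usual example values
    sort_options = -m 20g -t 15 -k 20

    [assemble_option]
    minimap2_options_cns = -t 8

After editing the config, rerunning the whole pipeline is just nextDenovo run.cfg again.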


TypicalSEE avatar TypicalSEE commented on May 23, 2024

Thanks for your reply, it helps a lot. But what still confuses me is: should I set seed_cutoff as low as possible (1001, for example) when I have enough Nanopore data and enough CPUs? Will correcting as many reads as possible improve assembly quality? Thanks again.


moold avatar moold commented on May 23, 2024

Yes, but I recommend using bin/seq_stat to calculate the expected seed cutoff.
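For example, something like the following (a sketch only; -g is the genome size, input.fofn is a placeholder list of read files, and the exact flags should be checked with bin/seq_stat -h):

    # report read-length statistics and a recommended seed_cutoff (assumed usage)
    bin/seq_stat -g 1g input.fofn

The suggested seed cutoff it reports is the value to copy into seed_cutoff in run.cfg.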


gitcruz avatar gitcruz commented on May 23, 2024

Dear Dr. Hu,

I've recently run NextDenovo using 33x ONT reads from a 1 Gb genome. After running seq_stat, the suggested seed cutoff was 0 bp. However, as the minimum read length was 1000 bp, I set seed_cutoff to 1.1k. Results were a bit disappointing, with N50 = 2 Mb. As a comparison, for a mammalian genome with 70x PacBio I got an N50 of 76 Mb!!!

I wonder whether it is worth tweaking some of the parameters as suggested above (correction_options = -b and -k 20 in sort_options), or whether it would be necessary to gather more data to reach 60x (which is not always possible)?

I would also like to know if there is a document with more detailed help on this assembler. Do you have any sort of manual or white paper, or do you plan to upload a manuscript to bioRxiv?

The results on the mammal are very encouraging; the program is definitely a tool to consider for achieving chromosome-scale assemblies.

Thanks,
Fernando


moold avatar moold commented on May 23, 2024

Hi, the input data is not enough and the seed length is too short. You can see that the default value of the -min_len_seed option in nextcorrect.py is 10k, so most of the corrected seeds will be filtered. Currently, the default options are optimized for input data size >= 60x and seed length >= 20 kb; otherwise, the run may produce unexpected results and you need to check the assembly quality carefully.
BTW, I am now preparing the manuscript for NextDenovo, and I will also provide some default options for short seeds and 30x input data in the next release. But if you want a better assembly result, it is recommended to sequence >= 60x data using Nanopore ultra-long libraries.
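If you want to confirm that default on your own install, something like this works (the path is an assumption; nextcorrect.py may sit in lib/ or elsewhere depending on the version):

    # show the -min_len_seed default in the correction module (path assumed)
    grep -n "min_len_seed" lib/nextcorrect.py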



gitcruz avatar gitcruz commented on May 23, 2024

Hi Hu,

I just read that the latest release (version 2.3.1) will "use non-seed reads to correct structural & base errors if seed depth < 35".
I guess those are the default options you mentioned above. Thus, should I also expect better results in cases with ONT coverage >= 30x? Did you run tests on that front?

Thanks,
Fernando


moold avatar moold commented on May 23, 2024

NextDenovo is only assembly software, so if you need a more accurate assembly, you can try NextPolish.
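In case it is useful, a minimal sketch of a NextPolish run based on my reading of its documentation; the section and key names here are assumptions, so compare them against the example config that ships with NextPolish:

    # run.cfg for NextPolish (sketch)
    [General]
    job_type = local
    task = best
    # the NextDenovo assembly to polish (placeholder path)
    genome = ./assembly.fasta
    workdir = ./01_polish

    [sgs_option]
    # placeholder list of short-read files used for polishing
    sgs_fofn = ./sgs.fofn

and then run it with nextPolish run.cfg.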


gitcruz avatar gitcruz commented on May 23, 2024

Hi,
OK, I see. The option just affects base-level accuracy (i.e. "use non-seed reads to correct structural & base errors if seed depth < 35").
I was thinking about getting better contiguity and assembly quality (fewer misassemblies) with less data. Thus, v2.3.1 still requires coverage >= 60x for optimal results, right?
Thanks,
Fernando


moold avatar moold commented on May 23, 2024
  1. No, the main purpose of this step is to correct structural errors, using mapping depth information and the overlapping coordinates between seeds.
  2. Yes.

