
Comments (10)

moold avatar moold commented on May 23, 2024

Hi,
1. No. Although NextDenovo will try to correct reads longer than 1001 bp, it also filters out low-quality, low-depth reads and so on, so the output of corrected reads is much smaller than the set of raw reads with length > 1001 bp.

2 & 3. Your data is not enough for assembly with the current version of NextDenovo under default options, because the defaults are optimized for 60-100x Nanopore data, so it will produce an unexpected assembly result. If you still want to use NextDenovo, you can try setting correction_options = -b, changing -k to 20 in sort_options, and then rerunning the whole pipeline (see the sketch below), although I cannot guarantee a good result. You can also try other assemblers.
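For reference, a minimal sketch of where those two options would go in a NextDenovo run.cfg. The section and key names follow the example config shipped with NextDenovo as I remember it, so treat everything other than correction_options, sort_options and seed_cutoff as assumptions and compare against the example run.cfg bundled with your install:

    # run.cfg sketch (assumed layout)
    [General]
    job_type = local
    task = all
    input_type = raw
    # input.fofn is a placeholder list of your raw read files
    input_fofn = input.fofn
    workdir = 01_rundir

    [correct_option]
    # correct reads down to ~1 kb, as discussed in this thread
    seed_cutoff = 1001
    # -b as suggested above for low-coverage data
    correction_options = -b
    # -k changed to 20 as suggested above; -m/-t are the usual example values
    sort_options = -m 20g -t 15 -k 20

    [assemble_option]
    minimap2_options_cns = -t 8

After editing the config, rerunning the whole pipeline is just nextDenovo run.cfg again.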


TypicalSEE avatar TypicalSEE commented on May 23, 2024

Thanks for your reply, it helps a lot. But what still confuses me is: should I set seed_cutoff as low as possible (1001, for example) when I have enough Nanopore data and enough CPUs? Will correcting as many reads as possible improve assembly quality? Thanks again.


moold avatar moold commented on May 23, 2024

Yes, but I recommend using bin/seq_stat to calculate the expected seed cutoff.
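For example, something like the following (a sketch only; -g is the genome size, input.fofn is a placeholder list of read files, and the exact flags should be checked with bin/seq_stat -h):

    # report read-length statistics and a recommended seed_cutoff (assumed usage)
    bin/seq_stat -g 1g input.fofn

The suggested seed cutoff it reports is the value to copy into seed_cutoff in run.cfg.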


gitcruz avatar gitcruz commented on May 23, 2024

Dear Dr. Hu,

I've recently run NextDenovo using 33x ONT reads from a 1 Gb genome. After running seq_stat, the suggested seed cutoff was 0 bp. However, as the minimum read length was 1000 bp, I set seed_cutoff to 1.1k. Results were a bit disappointing, with N50 = 2 Mb. As a comparison, for a mammalian genome with 70x PacBio I got an N50 of 76 Mb!!!

I wonder whether it is worth tweaking some of the parameters as suggested above (correction_options = -b and -k 20 in sort_options), or whether it would be necessary to gather more data to reach 60x (which is not always possible)?

I would also like to know if there is a document with more detailed help on this assembler. Do you have any sort of manual or white paper, or do you plan to upload a manuscript to bioRxiv?

The results on the mammal are very encouraging; the program is definitely a tool to consider for achieving chromosome-scale assemblies.

Thanks,
Fernando


moold avatar moold commented on May 23, 2024

Hi, the input data is not enough and the seed length is too short. You can see that the default value of the -min_len_seed option in nextcorrect.py is 10k, so most of the corrected seeds will be filtered. Currently, the default options are optimized for input data size >= 60x and seed length >= 20 kb; otherwise, the run may produce unexpected results and you need to check the assembly quality carefully.
BTW, I am now preparing the manuscript for NextDenovo, and I will also provide some default options for short seeds and 30x input data in the next release. But if you want a better assembly result, it is recommended to sequence >= 60x data using Nanopore ultra-long libraries.
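If you want to confirm that default on your own install, something like this works (the path is an assumption; nextcorrect.py may sit in lib/ or elsewhere depending on the version):

    # show the -min_len_seed default in the correction module (path assumed)
    grep -n "min_len_seed" lib/nextcorrect.py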



gitcruz avatar gitcruz commented on May 23, 2024

Hi Hu,

I just read that the latest release (version 2.3.1) will "use non-seed reads to correct structural & base errors if seed depth < 35".
I guess those are the default options you mentioned above. Thus, should I also expect better results in cases with ONT coverage >= 30x? Did you run tests on that front?

Thanks,
Fernando


moold avatar moold commented on May 23, 2024

NextDenovo is only assembly software, so if you need a more accurate assembly, you can try NextPolish.
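In case it is useful, a minimal sketch of a NextPolish run based on my reading of its documentation; the section and key names here are assumptions, so compare them against the example config that ships with NextPolish:

    # run.cfg for NextPolish (sketch)
    [General]
    job_type = local
    task = best
    # the NextDenovo assembly to polish (placeholder path)
    genome = ./assembly.fasta
    workdir = ./01_polish

    [sgs_option]
    # placeholder list of short-read files used for polishing
    sgs_fofn = ./sgs.fofn

and then run it with nextPolish run.cfg.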


gitcruz avatar gitcruz commented on May 23, 2024

Hi,
OK, I see. The option just affects base-level accuracy (i.e. "use non-seed reads to correct structural & base errors if seed depth < 35").
I was thinking about getting better contiguity and assembly quality (fewer misassemblies) with less data. Thus, v2.3.1 still requires coverage >= 60x for optimal results, right?
Thanks,
Fernando


moold avatar moold commented on May 23, 2024
  1. No, the main purpose of this step is to correct structural errors, using mapping depth information and the overlapping coordinates between seeds.
  2. Yes.

