Comments (8)

chuckpr commented on June 6, 2024

+1 I am curious about this myself as many of our 2x150 reads can be merged before assembly.

from megahit.

voutcn commented on June 6, 2024

Is k-max 127 long enough in this case? Is there a "rule" to determine k-max?

Honestly, I am not sure. Basically, I would not recommend setting k larger than 100 unless the sequencing depth is very high (some single-cell data can be > 1000x). Roughly speaking, the error rate of HiSeq is about 1%, so when k goes as large as 100, you may rarely see any correct k-mers from reads.

However, I don't have much experience with MiSeq, so I am leaving this issue open for more discussion.
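The k = 100 estimate above can be checked with a quick calculation. The sketch below assumes independent, uniform per-base errors, which is a simplification of real Illumina error profiles. The function names, the 150 bp read length (borrowed from the 2x150 reads mentioned in the first comment) and the 30x depth are illustrative assumptions and are not part of megahit.

```python
def error_free_kmer_fraction(k, error_rate):
    """Probability that a k-mer drawn from a read contains no error,
    assuming independent per-base errors (a simplifying assumption)."""
    return (1.0 - error_rate) ** k


def effective_kmer_depth(base_depth, read_len, k, error_rate):
    """Expected depth of error-free k-mers.

    A read of length read_len yields (read_len - k + 1) k-mers, so the
    k-mer depth shrinks by that factor before errors are accounted for.
    """
    if k > read_len:
        return 0.0
    kmers_per_base = (read_len - k + 1) / read_len
    return base_depth * kmers_per_base * error_free_kmer_fraction(k, error_rate)


if __name__ == "__main__":
    # Hypothetical inputs: 30x base depth, 150 bp reads, 1% error rate.
    for k in (21, 61, 100, 127):
        frac = error_free_kmer_fraction(k, 0.01)
        depth = effective_kmer_depth(30, 150, k, 0.01)
        print(f"k={k:3d}  error-free fraction={frac:.3f}  effective depth={depth:.2f}x")
```

Under these assumptions roughly 37% of 100-mers are error-free, and a 150 bp read yields only 51 of them, so the depth of correct k-mers falls to about one eighth of the base depth. That is why a large k tends to need very deep sequencing.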

Can we set k-max to a higher value? Does it make any sense?

As far as I know, the A5-Miseq pipeline uses very large k-mer sizes for MiSeq data. So it probably makes sense, but you still have to try and compare before going ahead.

To use a larger k value, two files should be modified.

chuckpr commented on June 6, 2024

Thanks, I'll give it a shot.

If I change the --k-max value in the opts.txt file and run megahit with the --continue flag, can I start an assembly where a previous assembly left off? This would be a quick way to see if k > 127 helps with my data. Thanks again!

voutcn commented on June 6, 2024

@chuckpr No, you have to rerun it from the beginning. You can let k-max go as large as you want, and then evaluate the contigs assembled from different k in intermediate_contigs/k*.contigs.fa
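One way to compare the per-k contigs is to tabulate contig count, N50 and longest contig for each file. The script below is a minimal sketch: the intermediate_contigs/k*.contigs.fa layout comes from the comment above, while the default output directory name megahit_out and all function names are assumptions made for illustration.

```python
import glob
import os
import re
import sys

# Only accept names of the form k<number>.contigs.fa.
K_FILE = re.compile(r"k(\d+)\.contigs\.fa")


def contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = 0
    in_record = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if in_record:
                    lengths.append(current)
                current = 0
                in_record = True
            elif in_record:
                current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(out_dir):
    """Return (k, contig count, N50, longest contig) per intermediate file, sorted by k."""
    pattern = os.path.join(out_dir, "intermediate_contigs", "k*.contigs.fa")
    rows = []
    for path in glob.glob(pattern):
        match = K_FILE.fullmatch(os.path.basename(path))
        if match is None:
            continue  # skip files that merely share the suffix
        lengths = contig_lengths(path)
        rows.append((int(match.group(1)), len(lengths), n50(lengths), max(lengths, default=0)))
    return sorted(rows)


if __name__ == "__main__":
    out_dir = sys.argv[1] if len(sys.argv) > 1 else "megahit_out"  # assumed directory name
    for k, count, n50_value, longest in summarize(out_dir):
        print(f"k={k}\tcontigs={count}\tN50={n50_value}\tmax={longest}")
```

If N50 and the longest contig stop improving, or start dropping, beyond some k, that suggests the larger k values are not helping for that dataset.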

jvollme commented on June 6, 2024

Honestly, I am not sure. Basically, I would not recommend setting k to larger than 100, unless the sequencing depth is very high

But I thought there was an iterative read correction beginning at the lower kmer steps (I thought that was one of the points of doing assemblies with iterative kmer sizes)? Shouldn't that remove most such errors before you reach the high kmers?
For MiSeq data of up to 300 bp I got huge improvements (in N50 and maximum contig length) with IDBA_UD after modifying it to allow kmer lengths up to 251 bp. I am really looking forward to trying this with megahit, as megahit is far more user friendly.

jvollme commented on June 6, 2024

But I have a question regarding the exact modification for megahit:
The first change (in https://github.com/voutcn/megahit/blob/master/definitions.h#L56) is straightforward. But what do I have to change here:

modify the > 127 to the k-max you want at https://github.com/voutcn/megahit/blob/master/megahit#L374

Can't find the value I am supposed to change here.

voutcn commented on June 6, 2024

@jvollme If you are using the latest code, simply changing the value of kMaxK in definitions.h works.

jvollme commented on June 6, 2024

Thanks!
Maybe you could mention that info somewhere in your manual or readme? I think it would interest a number of people, but it may be hard to find (even though Megahit is otherwise quite well documented).
