Comments (5)

holtjma commented on July 18, 2024

Hello,

We've never explicitly tested this, but I can give you some information on what I would expect to happen.

For FMLRC, the coverage of your long reads doesn't actually matter, since each read is corrected individually. Instead, the outcome will depend largely on the short-read data you're using. FMLRC looks for evidence of the k-mer sequences in the short reads, so if a particular allele is absent from (or at low frequency in) that short-read data, FMLRC will treat it as a sequencing error and will most likely correct it to an allele that is present in the short-read data. However, if you have multiple alleles and they are all present at the required thresholds, then FMLRC should recognize each allele as a valid k/K-mer. Does that make sense?

As for suggested parameters, I don't have any reason to believe one value for -k or -K will outperform another from a heterozygosity perspective. However, -m and -f both influence what FMLRC will consider a supported k/K-mer. If you were using something like 20x short-read data on heterozygous samples, then I would likely recommend lowering to -m 3 (indicating at least 3 reads required for a valid k/K-mer) simply because you are expecting fewer reads per allele and the default of -m 5 is possibly too high for some situations. Again, we never performed any tests on that type of data, so this is all based on my expectations for how the algorithm would perform.
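To make the reasoning behind lowering -m more concrete, here is a rough sketch, not something taken from FMLRC itself. It assumes a diploid sample where coverage splits evenly between alleles and that k-mer counts follow a Poisson distribution; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def per_allele_depth(total_depth, ploidy=2):
    """Expected depth per allele, assuming reads split evenly across alleles."""
    return total_depth / ploidy


def prob_count_below(mean_depth, min_count):
    """P(X < min_count) for X ~ Poisson(mean_depth).

    Approximates how often a true allele k-mer would fall under the
    absolute minimum (-m) and be treated as an error. The Poisson model
    is an assumption for illustration only.
    """
    return sum(
        math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
        for i in range(min_count)
    )


depth = per_allele_depth(20)          # 20x short reads, heterozygous site
p_default = prob_count_below(depth, 5)  # default -m 5
p_lowered = prob_count_below(depth, 3)  # suggested -m 3
print(f"per-allele depth: {depth:.0f}x")
print(f"P(count < 5) = {p_default:.4f}, P(count < 3) = {p_lowered:.4f}")
```

Under these assumptions each allele sees about 10x, and the chance that a true allele k-mer falls below the threshold drops from roughly 3% with -m 5 to roughly 0.3% with -m 3. Real k-mer depth is somewhat lower than base-level coverage, so the actual numbers would be less favorable than this sketch suggests.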

Let me know if you have any more questions!

from fmlrc.

holtjma commented on July 18, 2024

Closing due to inactivity. Feel free to reopen if you have more questions.

dcopetti commented on July 18, 2024

Hello,
Another question: how does FMLRC deal with indels? For example, if a raw ONT read has a 5 bp insertion that introduces new (rare) k-mers, how is that region corrected, assuming that the 5 new k-mers are at low frequency in the BWT index?
Even leaving them uncorrected should be OK, under the assumption that indel errors occur randomly: other overlapping error-corrected reads will not have that insertion and will drive the consensus.
Does that make sense?

holtjma commented on July 18, 2024

In the code, indels and single-base changes are indistinguishable. When there are multiple possible corrections to choose from, we calculate the edit distance between the uncorrected sequence and each candidate correction.
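As an illustration of that selection step, here is a minimal sketch and not FMLRC's actual code. It assumes the candidate with the smallest Levenshtein distance to the uncorrected sequence is preferred; the function names and the tie-breaking behavior (first candidate wins) are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions, and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def pick_correction(uncorrected, candidates):
    """Return the candidate closest to the uncorrected sequence (assumed rule)."""
    return min(candidates, key=lambda c: edit_distance(uncorrected, c))


# An indel and a single-base change are scored the same way.
print(edit_distance("ACGTTACG", "ACGTACG"))  # one deleted base
print(edit_distance("ACGT", "ACCT"))         # one substituted base
print(pick_correction("ACGTTTACG", ["ACGTACG", "ACGTTACG"]))
```

Because the distance treats every edit type identically, a one-base insertion and a one-base mismatch cost the same, which matches the point that the two are not distinguished.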

The short answer is that any k-mer that is not solid (i.e. present) in the short read BWT will be treated as an error, even if that same k-mer block occurs hundreds or thousands of times in the long read data (remember, each long read is handled independently).

Currently, solid is defined using two parameters:

  1. -m INT (default: 5) - this is the absolute minimum count for a k-mer to be considered solid; any count less than this will be considered an error that needs correcting
  2. -f FLOAT (default: 0.10) - this creates a dynamic minimum based on the read. Given a read, we first calculate all k-mer counts for that read, and then calculate the median of all counts greater than the absolute minimum (the -m parameter above). Then, we calculate a second minimum, min2 = median*f. Any counts less than that second minimum are also considered errors that need correction.
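The two thresholds above can be sketched roughly as follows. This is an illustration of the description, not FMLRC's implementation: the function names are hypothetical, and it assumes that counts equal to -m are included when computing the median.

```python
from statistics import median


def solid_threshold(counts, min_count=5, frac=0.10):
    """Effective minimum count for a k-mer in this read to be treated as solid.

    min_count plays the role of -m and frac the role of -f.
    """
    # Assumption: counts equal to min_count are kept for the median.
    supported = [c for c in counts if c >= min_count]
    if not supported:
        return min_count
    dynamic_min = median(supported) * frac  # min2 = median * f
    return max(min_count, dynamic_min)


def flag_errors(counts, min_count=5, frac=0.10):
    """True for each k-mer count that falls below either minimum."""
    threshold = solid_threshold(counts, min_count, frac)
    return [c < threshold for c in counts]


# Per-read k-mer counts looked up in the short-read index (made-up values).
counts = [100, 98, 102, 7, 3, 101, 99]
print(solid_threshold(counts))
print(flag_errors(counts))
```

In this made-up read, a count of 7 passes the absolute minimum but still falls below the dynamic minimum, because the rest of the read sits near 100x.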

So if the 5-bp insertion meets the above requirements in your short-read dataset, then I would not expect fmlrc to correct it, because it would consider those k-mers solid rather than errors.

dcopetti commented on July 18, 2024

Clear now, thanks!
