Hello, I have about 20x (per allele) PromethION data of a highly heterozygous plan

correcting low cov reads from heterozygous genomes about fmlrc (closed)

dcopetti commented on July 18, 2024

correcting low cov reads from heterozygous genomes

from fmlrc.

Comments (5)

holtjma commented on July 18, 2024

Hello,

We've never explicitly tested this, but I can give you some information on what I would expect to happen.

For FMLRC, the coverage of your long reads doesn't actually matter, since each read is corrected individually. Instead, the result is going to depend largely on the short-read data you're using. FMLRC looks for evidence of the k-mer sequences in the short reads, so if a particular allele is absent (or at a low frequency) in that short-read data, FMLRC will treat it as a sequencing error and will most likely correct it to an allele that is present in the short-read data. However, if you have multiple alleles and those alleles are present at the required thresholds, then FMLRC should recognize each allele as a valid k/K-mer. Does that make sense?

As for suggested parameters, I don't have any reason to believe one value for -k or -K will outperform another from a heterozygosity perspective. However, -m and -f both influence what FMLRC will consider a supported k/K-mer. If you were using something like 20x short-read data on heterozygous samples, then I would likely recommend lowering to -m 3 (indicating at least 3 reads required for a valid k/K-mer) simply because you are expecting fewer reads per allele and the default of -m 5 is possibly too high for some situations. Again, we never performed any tests on that type of data, so this is all based on my expectations for how the algorithm would perform.
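To make the reasoning behind lowering -m concrete, here is a rough back-of-the-envelope sketch. It is not part of FMLRC: it assumes a diploid site where each allele gets half of the 20x short-read depth, and it models the k-mer count as Poisson-distributed, which is only an approximation (real k-mer depth is somewhat lower than base depth and is affected by sequencing errors). The function name `prob_count_below` is made up for illustration.

```python
import math


def prob_count_below(mean_depth: float, min_count: int) -> float:
    """Return P(X < min_count) for X ~ Poisson(mean_depth)."""
    return sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(min_count)
    )


total_depth = 20.0                   # short-read depth from the example above
per_allele_depth = total_depth / 2   # assumption: diploid, both alleles evenly covered

for m in (5, 3):
    p = prob_count_below(per_allele_depth, m)
    print(f"-m {m}: P(a true allele k-mer is seen fewer than {m} times) = {p:.4f}")
```

Under these assumptions, a true heterozygous k-mer falls below the threshold noticeably more often with -m 5 than with -m 3, which matches the intuition in the comment above.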

Let me know if you have any more questions!


holtjma commented on July 18, 2024

Closing due to inactivity. Feel free to open if you have more questions.


dcopetti commented on July 18, 2024

Hello,
Another question: how are indels dealt with in FMLRC? I mean, if the raw ONT read has a 5 bp insertion that introduces new (rare) k-mers, how is that region corrected? I am assuming that the 5 new k-mers will be at low frequency in the BWT index.
Even leaving them uncorrected should be OK, under the assumption that indel errors occur randomly. In that case, other overlapping error-corrected reads will not have that insertion and will drive the consensus.
Does that make sense?


holtjma commented on July 18, 2024

In the code, indels and single-base changes are indistinguishable. When there are multiple possible corrections to choose from, we calculate the edit distance between the uncorrected sequence and each candidate correction.
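As an illustration of why indels and substitutions look the same under this scheme, here is a minimal sketch of choosing among candidate corrections by edit distance. This is not FMLRC's actual code: the function names `edit_distance` and `pick_correction` are hypothetical, and the sketch assumes unit cost for insertions, deletions, and substitutions and that the candidate closest to the uncorrected sequence wins.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete ca from a
                curr[j - 1] + 1,          # insert cb into a
                prev[j - 1] + (ca != cb)  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def pick_correction(uncorrected: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest edit distance to the uncorrected sequence."""
    return min(candidates, key=lambda c: edit_distance(uncorrected, c))


# A one-base deletion and a one-base substitution both cost 1.
print(edit_distance("ACGTTA", "ACGTA"))
print(edit_distance("ACGTTA", "ACGATA"))
print(pick_correction("ACGTTA", ["ACGTA", "TTTTTT"]))
```

Because every edit type has the same cost here, a candidate that removes an inserted base is scored exactly like one that changes a single base.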

The short answer is that any k-mer that is not solid (i.e. present) in the short read BWT will be treated as an error, even if that same k-mer block occurs hundreds or thousands of times in the long read data (remember, each long read is handled independently).

Currently, solid is defined using two parameters:

-m INT (default: 5) - this is the absolute minimum for a k-mer to be considered solid; any count less than this will be considered an error that needs correcting.
-f FLOAT (default: 0.10) - this creates a dynamic minimum based on the read. Given a read, we first calculate all k-mer counts for that read, and then calculate the median of all counts greater than the absolute minimum (the -m parameter above). Then, we calculate a second minimum, min2 = median*f. Any counts less than that second minimum are also considered errors that need correction.

So if the short 5-bp insertion is present in your short-read dataset at counts that meet the above requirements, then I would not expect fmlrc to correct it, because its k-mers would be considered solid rather than errors.
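The two thresholds described above can be sketched as follows. This is an illustrative reading of the description, not FMLRC's implementation: the function names are hypothetical, the input is assumed to be a plain list of per-position k-mer counts already looked up in the short-read BWT, and "counts greater than the absolute minimum" is interpreted here as counts that are at least -m.

```python
from statistics import median


def solid_threshold(counts: list[int], m: int = 5, f: float = 0.10) -> float:
    """Combine the absolute minimum (-m) with the per-read dynamic minimum (median * f)."""
    supported = [c for c in counts if c >= m]  # assumption: counts >= m feed the median
    if not supported:
        return float(m)                        # nothing passes -m, so only -m applies
    min2 = median(supported) * f               # dynamic minimum for this read
    return max(float(m), min2)


def flag_errors(counts: list[int], m: int = 5, f: float = 0.10) -> list[int]:
    """Return the indices of k-mers whose count falls below the solid threshold."""
    threshold = solid_threshold(counts, m, f)
    return [i for i, c in enumerate(counts) if c < threshold]


# Hypothetical k-mer counts along one long read from deep short-read data.
counts = [100, 98, 102, 7, 0, 101]
print(flag_errors(counts))  # positions that would be treated as errors
```

In this made-up read the median of supported counts is 100, so min2 is 10: the k-mer with count 7 passes -m but still fails the dynamic minimum, while the k-mer with count 0 fails both.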


dcopetti commented on July 18, 2024

clear now, thanks!

