Comments (5)
Hello,
We've never explicitly tested this, but I can give you some information on what I would expect to happen.
For FMLRC, the coverage of your long reads doesn't actually matter, since each read is corrected individually. Instead, the outcome will depend largely on the short-read data you're using. FMLRC looks for evidence of each k-mer sequence in the short reads, so if a particular allele is absent from the short-read data (or present only at low frequency), FMLRC will treat it as a sequencing error and will most likely correct it to an allele that is present in the short-read data. However, if you have multiple alleles and each is present at the required thresholds, then FMLRC should recognize each allele as a valid k/K-mer. Does that make sense?
As for suggested parameters, I don't have any reason to believe one value for -k or -K will outperform another from a heterozygosity perspective. However, -m and -f both influence what FMLRC will consider a supported k/K-mer. If you were using something like 20x short-read data on heterozygous samples, then I would likely recommend lowering to -m 3 (indicating at least 3 reads required for a valid k/K-mer) simply because you are expecting fewer reads per allele and the default of -m 5 is possibly too high for some situations. Again, we never performed any tests on that type of data, so this is all based on my expectations for how the algorithm would perform.
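As a rough illustration of why 20x heterozygous data might call for a lower -m, the sketch below estimates how often a true allele-specific k-mer would fall below the threshold. It assumes k-mer counts are Poisson-distributed and that each allele receives half of the total coverage. The read length (150) and k (21) are illustrative values, and the function names are hypothetical; none of this is taken from the FMLRC code.

```python
import math

def kmer_coverage(read_coverage, read_len, k):
    """Expected k-mer coverage for a given read coverage (standard approximation)."""
    return read_coverage * (read_len - k + 1) / read_len

def prob_below(mean_count, m):
    """P(count < m) for a Poisson-distributed count with the given mean (assumption)."""
    return sum(
        math.exp(-mean_count) * mean_count**i / math.factorial(i)
        for i in range(m)
    )

# 20x total coverage on a heterozygous site: roughly 10x per allele (assumption).
per_allele = kmer_coverage(20 / 2, read_len=150, k=21)

for m in (5, 3):
    p = prob_below(per_allele, m)
    print(f"-m {m}: P(true allele k-mer has count < {m}) = {p:.4f}")
```

Under these assumptions, the fraction of genuine allele-specific k-mers that would be treated as errors drops noticeably when going from -m 5 to -m 3, which matches the intuition above. Real data will deviate from the Poisson model, so treat the numbers as a guide only.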
Let me know if you have any more questions!
from fmlrc.
Closing due to inactivity. Feel free to reopen if you have more questions.
from fmlrc.
Hello,
Another question: how are indels dealt with in FMLRC? I mean, if the ONT raw read has a 5 bp insertion that introduces a new (rare) k-mer, how is that region corrected? Assuming that the 5 new k-mers will be at low frequency in the BWT index.
Even leaving them uncorrected should be OK, under the assumption that indel errors occur randomly. In that case, other overlapping error-corrected reads will not have that insertion and will drive the consensus.
Does that make sense?
from fmlrc.
In the code, indels and single-base changes are indistinguishable. When there are multiple possible corrections to select from, we calculate the edit distance between the uncorrected sequence and each candidate correction.
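To make the edit-distance step concrete, here is a minimal sketch that scores each candidate correction against the uncorrected sequence. It assumes the candidate with the smallest edit distance is preferred, which is not stated explicitly above, and the function names are hypothetical rather than taken from the FMLRC source. Note that insertions, deletions, and substitutions all cost 1, so they are treated uniformly.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]

def pick_closest(uncorrected, candidates):
    """Assumption: prefer the candidate closest to the uncorrected sequence."""
    return min(candidates, key=lambda c: edit_distance(uncorrected, c))

# An uncorrected segment and two hypothetical candidate corrections.
uncorrected = "ACGTTTTTACGT"
candidates = ["ACGTACGT", "ACGTTTTACGT"]
print(pick_closest(uncorrected, candidates))
```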
The short answer is that any k-mer that is not solid (i.e. present) in the short read BWT will be treated as an error, even if that same k-mer block occurs hundreds or thousands of times in the long read data (remember, each long read is handled independently).
Currently, solid is defined using two parameters:
- -m INT (default: 5) - this is the absolute minimum count for a k-mer to be considered solid; any count less than this will be considered an error that needs correcting
- -f FLOAT (default: 0.10) - this creates a dynamic minimum based on the read. Given a read, we first calculate all k-mer counts for that read, and then calculate the median of all counts greater than the absolute minimum (the -m parameter above). Then, we calculate a second minimum, min2 = median*f. Any counts less than that second minimum are also considered errors that need correction.
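The two thresholds above can be sketched as follows. This is an illustration of the description, not the FMLRC implementation: the function names are hypothetical, the k-mer counts are made up, and it assumes counts equal to -m are included when taking the median.

```python
from statistics import median

def solid_threshold(counts, m=5, f=0.10):
    """Effective minimum count for a k-mer in this read to be considered solid."""
    # Assumption: counts equal to m count as passing the absolute minimum.
    passing = [c for c in counts if c >= m]
    if not passing:
        return float(m)
    min2 = median(passing) * f  # dynamic, per-read minimum
    return max(float(m), min2)

def flag_errors(counts, m=5, f=0.10):
    """True where a k-mer count falls below either minimum."""
    threshold = solid_threshold(counts, m, f)
    return [c < threshold for c in counts]

# Hypothetical short-read counts for consecutive k-mers along one long read.
counts = [100, 98, 3, 7, 102, 0]
print(solid_threshold(counts))
print(flag_errors(counts))
```

In this made-up read, a count of 7 passes -m 5 but still falls below the dynamic minimum, because the median of the passing counts is high. That is the case -f is meant to catch.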
So if the k-mers spanning the 5-bp insertion meet the above thresholds in your short-read dataset, then I would not expect fmlrc to correct it, because it will not consider those k-mers to be errors.
from fmlrc.
clear now, thanks!
from fmlrc.