
Comments (5)

fhormozd commented on September 8, 2024

I think this is the reason why so many of the "interesting" k-mers are in fact inherited from one of the parents. I don't think this is an issue of false positive rate.

from kevlar.

standage commented on September 8, 2024

No, I'd say it's either a problem with abundance queries (very unlikely) or my new banding approach to counting the k-mers in the first place (much more likely).


standage commented on September 8, 2024

Several issues I've uncovered over the last couple of days.

  • The assumption that a k-mer hashes to the same value as its reverse complement is baked into the kevlar code everywhere. This was true for khmer's original hash function, but not for the recent updates to support k > 32 and k-mer banding. This is fixed now in khmer's refactor/hashing2 branch.
  • The number of reads reported for each contig is computed using read IDs as a unique key, but read IDs are not always unique per read. Paired reads from the BAM-derived human data have identical IDs, with only the bitwise flag distinguishing forward from reverse. Even reads downloaded from the SRA using fastq-dump don't include pairing information (/1 and /2) by default.

These findings don't address all of the concerns we've seen with the results, but they will definitely help cut out some sources of confusion.
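
Both issues in the list above can be illustrated with a small sketch. This is not kevlar's or khmer's code: `revcomp`, `canonical`, `kmer_hash`, and `read_key` are hypothetical names, and SHA-1 is only a stand-in for whatever hash function is actually used. The point is that hashing the canonical form (the lexicographically smaller of a k-mer and its reverse complement) restores the "same hash for both strands" assumption, and that combining the read ID with the SAM flag bits for first mate (0x40) and second mate (0x80) gives a key that is unique per mate.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick the lexicographically smaller of a k-mer and its reverse complement."""
    rc = revcomp(kmer)
    return kmer if kmer <= rc else rc


def kmer_hash(kmer):
    """Hash the canonical form so a k-mer and its reverse complement agree.

    SHA-1 is an illustrative stand-in, not the hash khmer uses.
    """
    digest = hashlib.sha1(canonical(kmer).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def read_key(read_id, flag):
    """Build a per-mate key from a read ID and its SAM bitwise flag."""
    if flag & 0x40:  # first segment in the template
        return f"{read_id}/1"
    if flag & 0x80:  # last segment in the template
        return f"{read_id}/2"
    return read_id  # unpaired, or no pairing information available
```

With this, `kmer_hash("CACTATGAAAGTGGGCTTC")` and `kmer_hash("GAAGCCCACTTTCATAGTG")` return the same value, and two mates sharing the ID `r1` with flags 99 and 147 map to the distinct keys `r1/1` and `r1/2`.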


standage commented on September 8, 2024

Over the weekend I ran kevlar on 3 E. coli genomes. See https://gist.github.com/standage/05c407ccb2cc13c382e1880a6254ed30. The pipeline reported 1451 novel k-mers associated with 517 contigs. These results need some more inspection, but I noticed immediately that kevlar is still reporting some highly inflated k-mer abundances. One of the reported k-mers, CACTATGAAAGTGGGCTTC, has a reported abundance of 15 and 17 in the cases but a true abundance of 1 and 0. Note: this time kevlar was not run in banding mode.
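
One way to double-check a reported abundance against the true value is an exact count restricted to the k-mers in question. The sketch below is an assumption about how such a check could be done, not the procedure used to obtain the numbers above; `exact_abundance` and `canonical` are hypothetical helpers, and reads are assumed to be plain uppercase strings already loaded into memory. Counting only a handful of target k-mers keeps memory use small even when the read set is large.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Pick the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return kmer if kmer <= rc else rc


def exact_abundance(reads, targets):
    """Count exact occurrences of each target k-mer, on either strand."""
    wanted = {canonical(t) for t in targets}
    lengths = {len(t) for t in wanted}
    if len(lengths) != 1:
        raise ValueError("targets must be non-empty and share a single k")
    k = lengths.pop()

    counts = dict.fromkeys(wanted, 0)
    for read in reads:
        # Slide a window of size k over the read and tally matching k-mers.
        for i in range(len(read) - k + 1):
            kmer = canonical(read[i:i + k])
            if kmer in counts:
                counts[kmer] += 1
    return counts
```

For example, `exact_abundance(reads, ["CACTATGAAAGTGGGCTTC"])` returns a dictionary keyed by the canonical form of each target, which can then be compared sample by sample with the abundances kevlar reports.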


standage commented on September 8, 2024

It turns out that my intuition regarding FPR was a bit off. See dib-lab/khmer#1619.
