
Speeding up dnmtools states [CLOSED]

hchetia commented on August 17, 2024
Speeding up dnmtools states

from dnmtools.

Comments (6)

andrewdavidsmith commented on August 17, 2024

Possibly. I'll keep this open and probably rename the issue and add detail, so that we have a roadmap for how to do that. However, there's no simple switch we can use to make this happen right away.


hchetia commented on August 17, 2024

Hi @andrewdavidsmith
Thanks for your response.
I have run dnmtools states on a SAM file (44 GB). It has been running for >100 hours and has generated 3 MB of epireads.
A snippet from "top":
[image]

It seems like "states" could be made capable of using more memory and cores. The input SAM could be split into temporary samlets, and those samlets could be read in parallel.

Thanks.


andrewdavidsmith commented on August 17, 2024

That's not the expected behavior. If the reads are not sorted in the expected order, there's a chance it could turn from a linear-time computation into a quadratic-time one. If you could find a way to share the data with me, I can check. I know you might not want to share all the data, but the problem might not be easily revealed on just a small part. Let me know, and feel free to email me.


andrewdavidsmith commented on August 17, 2024

The right thing for us to do is have the code verify the sorting of reads, but sometimes the code attempts to just proceed and compensate if it gets unexpected input.

I also notice from your screen capture that the program is using 17.2g of vmem, but only 3.2g of pmem, which suggests something else is going on and the program is likely thrashing. Are you sure you have sufficient available physical memory for that process?
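The comparison of virtual and resident memory above can also be made without a screenshot. The following is a minimal sketch, assuming a Linux system where `/proc/<pid>/status` reports `VmSize` and `VmRSS` in kB; the function name `resident_fraction` is hypothetical and is not part of dnmtools. A low ratio on its own does not prove thrashing; it is only a hint to check how much physical memory is actually available.

```python
def resident_fraction(status_text):
    """Return VmRSS / VmSize from the text of a Linux /proc/<pid>/status file.

    VmSize is the virtual memory size and VmRSS is the resident
    (physical) memory size; both are reported in kB.
    """
    sizes_kb = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "   17200000 kB"; keep only the number.
            sizes_kb[key] = int(value.split()[0])
    return sizes_kb["VmRSS"] / sizes_kb["VmSize"]
```

To use it on a running process, read the file for that process ID, for example `resident_fraction(open("/proc/12345/status").read())`, where `12345` stands in for the real PID.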


hchetia commented on August 17, 2024

@andrewdavidsmith You were right, the algorithm was compensating for unsorted input. I have now successfully run the conversion to epireads, and it finished quickly.
In terms of accelerating the program, I agree that the code should verify the sorting first.
Run details: ~15 min to generate 1.5 GB of epireads from 35 GB of deduplicated, sorted SAM input (HG38).
Adding CPU and memory info here in case it's helpful to you:
CPU(s): 96
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
Memory: 790 GB
I don't understand the thrashing part. Sharing another snippet here:
[image]


andrewdavidsmith commented on August 17, 2024

@hchetia I'm closing this because I think the issue has been solved. It should not continue if the input is unsorted in a way that will cause slowdown. Specifically, all reads from the same chromosome need to be consecutive.
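The requirement that all reads from the same chromosome be consecutive can be checked before a long run. This is a minimal sketch, not the check dnmtools itself performs: it assumes an uncompressed, tab-separated SAM file whose third column (RNAME) is the chromosome, and the function name `first_nonconsecutive_chrom` is hypothetical.

```python
def first_nonconsecutive_chrom(sam_lines):
    """Find the first read that breaks the "one block per chromosome" rule.

    Returns (line_number, chrom) for the first read whose chromosome was
    seen earlier and then left, or None if every chromosome's reads form
    one consecutive block.
    """
    finished = set()  # chromosomes whose block has already ended
    current = None    # chromosome of the block being read
    for line_number, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a usable alignment line; skipped in this sketch
        chrom = fields[2]  # RNAME, the third SAM column
        if chrom == current:
            continue
        if chrom in finished:
            return line_number, chrom
        if current is not None:
            finished.add(current)
        current = chrom
    return None
```

A file object can be passed directly, for example `first_nonconsecutive_chrom(open("reads.sam"))`, where `reads.sam` is a placeholder name. The check makes a single pass and only remembers chromosome names, so it needs little memory even on a large input.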

