Comments (4)

lcolladotor commented on June 1, 2024

Regarding mem_free: I do see that Nextflow passes the memory value to another SGE option, but I'm not sure whether we also need to pass it explicitly to mem_free. This might be something we could ask Mark Miller about.

from speaqeasy.

gpertea commented on June 1, 2024

Could (or should) the mem_free option used in the CountObjects step be scaled dynamically based on the number of samples in samples.manifest?
After using a larger mem_free value in the jhpce.config file for a few small runs last week, I got an email from JHPCE support (subject: "JHPCE Cluster Job RAM Exception Report") essentially scolding me for wasting RAM on the cluster. I had used mf=24G for the countObjects step because I had processed a larger batch before that. Using mf=70G with 2 cores effectively requests 140G of RAM, which would be an even bigger waste for regular-size batches.
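
To make the arithmetic above explicit: assuming, as this comment implies, that JHPCE charges mem_free per slot (core), the total reservation is the per-slot value multiplied by the number of slots. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
def total_request_gb(mem_free_per_slot_gb: float, slots: int) -> float:
    """Return the total RAM (GB) reserved when mem_free is charged per slot.

    Assumes per-slot accounting, as described in the comment above.
    """
    if slots < 1:
        raise ValueError("slots must be >= 1")
    return mem_free_per_slot_gb * slots


if __name__ == "__main__":
    # mf=70G on 2 cores, the case mentioned above
    print(total_request_gb(70, 2))  # 140
```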

lcolladotor commented on June 1, 2024

I don't think this is super easy to calculate. Some software has a baseline memory usage regardless of the number of samples or the size of the input (for example, a FASTQ with 1 million reads vs. one with 100 million). Also, it's hard to guess the actual size of the dataset (such as the number of reads) from the number of samples alone.

Overall, you can initially dodge the JHPCE usage reports by using bluejay. I personally think it's OK to overshoot a little, though I also try to keep track of my memory usage and adjust things manually if the default setting is way too high.

Having said that, if you want to give it a go, that'd be great. I think that in the past we had 2 settings: a default and a "high mem" setting. So we had basically figured out some of these numbers for 2 broadly common scenarios.
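
One way to encode the two broad scenarios mentioned above is a simple tier lookup keyed on an estimated input size. This is a hypothetical sketch: the tier names, the GB values, and the cutoff are placeholders chosen for illustration, not the numbers SPEAQeasy actually used.

```python
# Hypothetical placeholder values, NOT SPEAQeasy's actual settings.
MEM_FREE_TIERS_GB = {"default": 20, "high_mem": 48}
HIGH_MEM_CUTOFF_GB = 100.0  # hypothetical cutoff on estimated uncompressed input size


def choose_tier(estimated_input_gb: float, cutoff_gb: float = HIGH_MEM_CUTOFF_GB) -> str:
    """Pick "high_mem" only when the estimated input exceeds the cutoff."""
    return "high_mem" if estimated_input_gb > cutoff_gb else "default"


def mem_free_for(estimated_input_gb: float, cutoff_gb: float = HIGH_MEM_CUTOFF_GB) -> int:
    """Return the per-slot mem_free (GB) for the chosen tier."""
    return MEM_FREE_TIERS_GB[choose_tier(estimated_input_gb, cutoff_gb)]


if __name__ == "__main__":
    for size_gb in (30.0, 250.0):
        print(size_gb, choose_tier(size_gb), f"mf={mem_free_for(size_gb)}G")
```

Keeping only two tiers mirrors the earlier approach and avoids pretending to a precise memory model, which this comment notes is hard to build.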

gpertea commented on June 1, 2024

I think a decent estimate of the number of reads (or total bases) in a sample can be made from the input file sizes (multiplying by ~3 for compressed files, I guess). Anyway, for now I think it would be good if the user could provide this as an option for a specific run by modifying their own instance of the run*.sh script. I hope Nextflow allows this kind of override of values in jhpce.config.
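
The file-size heuristic above could be sketched as follows, assuming inputs ending in .gz are gzip-compressed FASTQ files and that ~3x is a reasonable expansion factor (the guess from this comment, not a measured ratio). The function names are hypothetical; the pure function takes a path-to-size mapping so it can be checked without touching disk.

```python
import os
from typing import Iterable, Mapping

GZIP_EXPANSION = 3.0  # rough guess from the comment above, not a measured ratio


def estimate_uncompressed(sizes: Mapping[str, int], gzip_factor: float = GZIP_EXPANSION) -> float:
    """Sum file sizes in bytes, scaling gzip-compressed (.gz) inputs by gzip_factor."""
    total = 0.0
    for path, size in sizes.items():
        total += size * gzip_factor if path.endswith(".gz") else size
    return total


def estimate_from_disk(paths: Iterable[str], gzip_factor: float = GZIP_EXPANSION) -> float:
    """Look up on-disk sizes with os.path.getsize, then estimate the uncompressed total."""
    return estimate_uncompressed({p: os.path.getsize(p) for p in paths}, gzip_factor)


if __name__ == "__main__":
    # Made-up sizes for illustration only.
    demo = {"sample1_R1.fastq.gz": 2_000_000_000, "sample1_R2.fastq.gz": 2_000_000_000}
    print(f"{estimate_uncompressed(demo) / 1e9:.1f} GB")  # 12.0 GB
```

The result is only a proxy for read count; as noted earlier in the thread, some software has a baseline memory usage that does not scale with input size.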
