Is the current JHPCE config file well specified?

Check JHPCE config file `clusterOptions` (speaqeasy, closed)

lcolladotor commented on June 1, 2024

Check JHPCE config file `clusterOptions`

from speaqeasy.

Comments (4)

lcolladotor commented on June 1, 2024

Regarding `mem_free`, I do see that Nextflow passes the memory value to another SGE option, but I'm not sure whether we also need to pass it explicitly to `mem_free`. This might be something we could ask Mark Miller.


gpertea commented on June 1, 2024

Could (should?) the `mem_free` option as used in the CountObjects step be scaled dynamically depending on the number of samples in samples.manifest?
After using a larger `mem_free` value in the jhpce.config file for a few small runs last week, I got an email from JHPCE support (Subj: "JHPCE Cluster Job RAM Exception Report") basically scolding me for wasting RAM on the cluster. I had used mf=24G for the CountObjects step because I had a larger batch to process before that; using mf=70G for 2 cores essentially asks for 140G of RAM, which would be an even bigger waste for regular-size batches.
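
To make the arithmetic above concrete, here is a minimal sketch. It assumes, as the comment describes, that `mem_free` is counted per slot, so the total request grows with the number of cores. The function name is hypothetical and is not part of SPEAQeasy or Nextflow.

```python
def total_requested_gb(mem_free_gb: float, slots: int) -> float:
    """Total RAM requested, assuming mem_free is counted once per slot."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    return mem_free_gb * slots


# The mf=70G, 2-core case from the comment above.
print(total_requested_gb(70, 2))  # 140
```

Under that assumption, halving the per-slot value or dropping to one core each halve the total request.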


lcolladotor commented on June 1, 2024

I don't think that this is super easy to calculate. Some software has a baseline usage regardless of the number of samples and/or the size of the input (like a FASTQ with 1 million reads vs one with 100 million). Also, from the number of samples it's hard to guess the actual size of the dataset (like number of reads).

Overall, you can initially dodge the JHPCE usage reports by using bluejay. I personally think that it's ok to overshoot a little bit, though I also try to keep track of my memory usage and adjust things manually if the default setting is way too high.

Having said that, if you want to try to give it a go, that'd be great. I think that in the past we had 2 settings: like a default and a "high mem" setting. So we basically had figured out some of these numbers for 2 broadly common scenarios.


gpertea commented on June 1, 2024

I think a decent estimate for the number of reads (or total bases) in a sample can be made by looking at the input file sizes (multiplying by ~3 for the compressed ones, I guess). Anyway, I think it would be good for now if the user could provide this as an option for a specific run by modifying their own instance of the run*.sh script. I hope Nextflow allows this kind of override of values in jhpce.config.
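
The file-size heuristic above could be sketched roughly as follows. This is only an illustration: the ~3x factor is the guess from the comment, the tier names echo the "default" and "high mem" settings mentioned earlier in the thread, and the threshold value and function names are made up rather than taken from SPEAQeasy.

```python
import os

# Hypothetical values for illustration only; not taken from SPEAQeasy.
COMPRESSION_FACTOR = 3          # the "~3" guess for compressed inputs
HIGH_MEM_THRESHOLD_GB = 100.0   # made-up cutoff between the two tiers


def estimate_uncompressed_bytes(paths):
    """Sum file sizes, scaling .gz files by the assumed compression factor."""
    total = 0
    for path in paths:
        size = os.path.getsize(path)
        if path.endswith(".gz"):
            size *= COMPRESSION_FACTOR
        total += size
    return total


def pick_tier(paths):
    """Return 'high_mem' when the estimated input size exceeds the cutoff."""
    estimated_gb = estimate_uncompressed_bytes(paths) / 1e9
    return "high_mem" if estimated_gb > HIGH_MEM_THRESHOLD_GB else "default"
```

A run script could call `pick_tier` on the FASTQ paths listed in samples.manifest and use the result to choose which memory setting to pass along, leaving the final decision to the user.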

