
Comments (4)

dklein-pik commented on July 29, 2024

In the process we had before, the run setup was done sequentially on the login node, so when several runs were started, the last one could take quite a long time to begin. By integrating the run setup into the individual SLURM jobs we intended to make starting runs faster for the user and to parallelize the parts of the script that can run in parallel. However, you are right: during the short run-setup phase we now block more CPUs than required. To find out how severe this is, I looked at my latest coupled runs (n=177). Here are the mean durations:

type   section   mean duration
rem    GAMS      67.272571 mins
rem    output     7.906585 mins
rem    prep       2.562874 mins

The mean preparation time (2.56 min) is short compared to the mean GAMS runtime (67.27 min). However, across all runs we blocked 12 CPUs for 2.56 minutes 177 times, which adds up to roughly 90 CPU-hours. By splitting the start-up process into two parts again (one SLURM job preparing all the runs, plus the actual parallel runs), we could save cluster resources, but lose time due to the sequential preparation of the runs.
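As a quick back-of-the-envelope check (a sketch only; the run count, CPU count, and mean prep time are the figures from the comment above), the blocked resources add up to roughly 90 CPU-hours:

```shell
# Blocked CPU time during run setup: 177 runs, each holding 12 CPUs
# for a mean preparation time of 2.562874 minutes.
awk 'BEGIN {
  cpu_minutes = 177 * 12 * 2.562874
  printf "%.1f CPU-hours\n", cpu_minutes / 60
}'
# prints "90.7 CPU-hours"
```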

from remind.

giannou commented on July 29, 2024

I don't get what you mean by "sequential procedure". Can you specify? The login nodes on the cluster have around 100 CPUs available; I'm sure they can host our model preparation jobs, so there is no need to send them to the compute nodes. But if we do have to send them to the compute nodes: if we had one SLURM job on one CPU preparing the run (no need to prompt the users here) and then a second one (specified by the user, for the GAMS part and the reporting), we would save resources, no?


dklein-pik commented on July 29, 2024

By "sequential procedure" I mean all the parts of the preparation scripts that are inside the "lock" block. Inside this block are mainly the NDC calculations and singleGAMSfile(), which is, I admit, probably the heaviest part. This part cannot be parallelized, meaning the runs wait for each other at this bottleneck. All the rest can run in parallel. Even though the parts mentioned above are executed sequentially, starting runs is now faster from a user's point of view, because they do not need to wait for the start script to loop over all runs (including the slow singleGAMSfile() part). All runs are sent to the cluster immediately and the user is done with starting them.
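The lock bottleneck described above can be sketched with a plain file lock (a minimal illustration only, assuming Linux `flock`; this is not the actual REMIND lock implementation, and the function name and lock-file path are made up):

```shell
# Each run would execute something like this. The flock'd subshell is the
# serial bottleneck (in REMIND terms: NDC calculations and singleGAMSfile()),
# while the code after it can run in parallel across jobs.
setup_run() {
  (
    flock 9                       # wait until no other run holds the lock
    echo "serial setup: $1"       # stand-in for the non-parallelizable part
  ) 9>/tmp/remind_setup.lock      # hypothetical lock-file path
  echo "parallel setup: $1"       # stand-in for the parallelizable part
}

setup_run demo
```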

> If we had one SLURM job on one CPU preparing the run (no need to prompt the users here) and then a second one (specified by the user, for the GAMS part and the reporting) we would save resources, no?

That's right. This would have a similar positive effect for the user as described above. It needs some rework of the starting scripts. I can take care of that after the first month of my parental leave, which I expect to start soon.


johanneskoch94 commented on July 29, 2024

David, maybe switching to dependencies between job submissions is useful for this? I've played with the setup that I showed you a while back, remember? And since you've already split the prepare_run function into two, it was pretty straightforward to submit a prepare job and a run job with different sets of resource requirements.
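Such a two-stage submission could look roughly like this (a sketch only: the script names prepare_run.sh and run_gams.sh are hypothetical, and the CPU counts are just examples; --parsable and --dependency=afterok are standard sbatch options):

```shell
# Stage 1: cheap preparation job on a single CPU;
# --parsable makes sbatch print only the job ID.
prep_id=$(sbatch --parsable --cpus-per-task=1 prepare_run.sh)

# Stage 2: the expensive GAMS/reporting job, which starts only after
# the preparation job has finished successfully.
sbatch --dependency=afterok:"$prep_id" --cpus-per-task=12 run_gams.sh
```

With afterok, the second job stays pending (and consumes no CPUs) until the preparation job exits with status 0.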

