Comments (6)
I'm a little puzzled here. The Python client includes a resource manager plugin, e.g., for Slurm, and that plugin can request an allocation itself. The intent was that the Python client would not need to be started from within an allocation, but would instead execute its initial operations (fetching and building things) and then request an allocation when one was needed for actually running the tests.
The only issue we have encountered with that method is that the allocation request can take some time to be granted. Our internal solution was to simply create a high-priority queue for MTT operations and submit the request to it.
Is this not adequate for ECP systems? If not, note that the Python client already has a C/R capability in that you can have one ini file that downloads and builds things, and another ini file that flags the download/build components with ASIS to indicate that MTT is to use the existing installations if present. Does that not also solve the problem?
from mtt.
I don't see how this would work with the current IU database reporter. Anyway, I've written the code and it appears to serve my purposes.
You are welcome to use your code - however, the methods I described work just fine with the current IU reporter. We use it every day precisely that way. The builds are reported correctly even if previously built.
Hi @hppritcha, I am trying to understand your use case and how it differs from ours. Does your process request the allocation from a compute node? How does this work?
You said:
"the requesting process is typically put on some compute node"
Does this mean the process that requested the allocation runs on a compute node? Don't you need an allocation before running anything on a compute node?
@ribab one invokes the allocation command from a front-end node, the one you get placed on when you ssh to the system. For example, on the ANL theta cluster, here's how you'd get an allocation:
ssh theta
XXX@thetalogin6:qsub -n 8 --jobname ompi -q debug-flat-quad -t 60 -I
Upon granting the allocation, the user is placed on one of the internal mom nodes on theta. One then uses aprun or mpirun to launch the application.
A similar thing occurs on SLURM-configured systems like NERSC cori. One can try using the salloc --no-shell option to remain on one of the cori login nodes, but we've found that option to be unreliable for running many tests like we do in MTT.
The only way I can see to use the MTT ALPS plugin on theta would be for the allocate command to include the name of a script that would somehow continue the MTT run on the backend mom node. The front-end process that had been running the ALPS plugin up to that point would be disconnected from whatever was going on in the backend.
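The difference between the interactive request shown earlier and a script-driven one can be sketched as a small Python helper that builds the argument list. The flags are the ones from the theta example above; passing a script path in place of `-I` is an assumption about the scheduler's `qsub`, and `cobalt_qsub_argv` and `continue_mtt.sh` are hypothetical names used only for illustration.

```python
def cobalt_qsub_argv(nodes, queue, minutes, jobname, script=None):
    """Build a qsub argument list (hypothetical helper, not MTT plugin code).

    With script=None the request is interactive (-I), as in the theta
    example. Otherwise the script path is appended instead, on the
    assumption that qsub accepts a job script as its final argument.
    """
    argv = [
        "qsub",
        "-n", str(nodes),
        "--jobname", jobname,
        "-q", queue,
        "-t", str(minutes),
    ]
    argv.append("-I" if script is None else script)
    return argv


if __name__ == "__main__":
    # Interactive form, matching the command typed on the login node.
    print(" ".join(cobalt_qsub_argv(8, "debug-flat-quad", 60, "ompi")))
    # Script form: the named script would have to resume the MTT run.
    print(" ".join(cobalt_qsub_argv(8, "debug-flat-quad", 60, "ompi",
                                    script="continue_mtt.sh")))
```

In the script form, the front-end process only submits the job; whatever continues the MTT run has to live in the script itself, which is the disconnect described above.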
Given the way the MTT SLURM and ALPS plugins are written, I suspect they were developed on systems configured similarly to some of Cray's internal systems. There, SLURM is configured so that when one does an salloc, one remains on the front-end nodes rather than being ssh'd into a compute or mom node on the backend. In that case, the plugins as written, with their allocate/deallocate commands, should work fine. PBS was similarly configured on those systems.
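For the front-end-resident case, the allocate/deallocate pattern can be sketched as a Python context manager. This is not the plugins' code: `allocation`, `allocate`, and `deallocate` are hypothetical stand-ins for whatever commands a given site uses, and the sketch assumes the calling process stays on the front end for the whole run.

```python
from contextlib import contextmanager


@contextmanager
def allocation(allocate, deallocate):
    """Hold an allocation for the duration of a with-block.

    allocate() is assumed to return a job identifier, and
    deallocate(job_id) to release it; both are hypothetical callables.
    """
    job_id = allocate()
    try:
        yield job_id
    finally:
        # Release the allocation even if the tests raise.
        deallocate(job_id)


if __name__ == "__main__":
    events = []
    with allocation(lambda: "job-1",
                    lambda job: events.append(("released", job))) as job:
        events.append(("running tests in", job))
    print(events)
```

This only works when the process that entered the `with` block is still the one running the tests, which is the configuration described above.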
closed via #916