Comments (6)
The default behaviour of ASE is to allow non-zero exit codes for the wrapped DFT binaries. So, it works fine without Covalent but not with!
from covalent-slurm-plugin.
Where I ran into this is using ASE with Quantum ESPRESSO (although I have also seen it with other codes).
Typically, in ASE, one would have an environment variable set:
os.environ["ASE_ESPRESSO_COMMAND"] = "srun pw.x -npool %d -ndiag 1 -input PREFIX.pwi > PREFIX.pwo" % num_nodes
If this command finishes with an exit code other than 0, Covalent will crash out. You can get around this by doing the below instead:
# Treat exit code 3 as success, but still forward the captured stderr.
# exitcode is set in the main shell (not a subshell) so the later checks can see it.
script = r'''bash -c 'exitcode=0
srun pw.x -npool %d -ndiag 1 -input PREFIX.pwi 2>stderr.log 1>PREFIX.pwo || exitcode=$?
if [[ $exitcode -eq 3 ]]; then
>&2 cat stderr.log
exitcode=0
fi
exit $exitcode'
''' % (num_nodes)
os.environ["ASE_ESPRESSO_COMMAND"] = script
but it's a bit of a hack. The error codes could be handled in the ASE internals, but the standard user isn't going to want to do that. From a UX standpoint, we may wish to do something about all of this.
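For what it's worth, the same workaround can be generated rather than hand-written. The following is only a sketch: `tolerate_exit_codes` is a hypothetical helper (not part of ASE or Covalent), and the choice of which exit codes to tolerate is an assumption that depends on the wrapped code.

```python
import shlex
import subprocess


def tolerate_exit_codes(command, tolerated=(3,), shell="bash"):
    """Return a shell one-liner that runs `command` and reports tolerated exit codes as 0.

    Hypothetical helper for illustration; not part of ASE or Covalent.
    """
    if not tolerated:
        raise ValueError("at least one tolerated exit code is required")
    # Build a `case` pattern such as "3|4" from the tolerated codes.
    pattern = "|".join(str(int(code)) for code in tolerated)
    inner = (
        f"{command}; exitcode=$?; "
        f"case $exitcode in {pattern}) exitcode=0 ;; esac; "
        "exit $exitcode"
    )
    # shlex.quote keeps any quotes inside `command` intact.
    return f"{shell} -c {shlex.quote(inner)}"


if __name__ == "__main__":
    # Stand-in command that always exits with code 3.
    wrapped = tolerate_exit_codes("sh -c 'exit 3'", tolerated=(3,), shell="sh")
    print(subprocess.run(wrapped, shell=True).returncode)
```

The returned string could then be assigned to `os.environ["ASE_ESPRESSO_COMMAND"]` in place of the plain `srun pw.x ...` command. Unlike the snippet above, this version does not capture or forward stderr.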
from covalent-slurm-plugin.
Interesting. I could see the use case although I personally haven't run into this myself because I am almost always running quantum chemistry codes with Python wrappers that automatically capture the exit code and then decide if an error should be raised (so, one abstraction layer higher if that makes sense).
In the end, if the user is launching the executable with a Python subprocess, this never really has to be an issue. But if they are calling the executable directly in the Slurm script I could see it coming up.
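To illustrate the Python subprocess route, here is a minimal sketch of a wrapper that captures the exit code and decides for itself whether to raise. `run_executable` and `ok_codes` are made-up names for this example, and the stand-in command just uses the Python interpreter to exit with a chosen code.

```python
import subprocess
import sys


def run_executable(args, ok_codes=(0,)):
    """Run a command and raise only if its exit code is not in `ok_codes`."""
    # check=False: a non-zero exit code does not raise on its own.
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    if result.returncode not in ok_codes:
        raise RuntimeError(
            f"{args[0]} exited with code {result.returncode}:\n{result.stderr}"
        )
    return result


if __name__ == "__main__":
    # Stand-in for a DFT binary that exits with code 3.
    fake_binary = [sys.executable, "-c", "import sys; sys.exit(3)"]
    result = run_executable(fake_binary, ok_codes=(0, 3))
    print(result.returncode)
```

Because the non-zero exit code is handled inside the Python function, the process that launched the function never sees it.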
from covalent-slurm-plugin.
Ah! You said the magic words. I totally get what you mean now! I'm 100% on board.
from covalent-slurm-plugin.
As a side-comment: doesn't that exit code issue make the ASE calculator unusable as-is? Put another way, would things work fine without Covalent but not with Covalent? I'm just curious.
from covalent-slurm-plugin.
Ah, I realized another related scenario where this comes up (for me at least). A lot of the workflows I run have some sort of error-handling routine associated with them, such that if the executable errors out, the error-handling routine will try to fix it and re-launch the executable rather than cancel the Slurm job entirely. That's the basis behind the Custodian package we use with VASP. For the kinds of calculations I do, I don't want Covalent to care about the exit code in a subprocess.
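A rough sketch of that fix-and-relaunch pattern is below. This is not Custodian's API; `run_with_recovery` and the `fix` callback are hypothetical names, and the "executable" is a stand-in Python one-liner that fails until a flag file exists.

```python
import os
import subprocess
import sys
import tempfile


def run_with_recovery(args, fix, max_attempts=3):
    """Re-launch `args` after calling `fix` whenever it exits non-zero.

    Returns the number of attempts used; raises if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(args, check=False)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Give the error handler a chance to repair inputs before retrying.
            fix(result.returncode, attempt)
    raise RuntimeError(f"command still failing after {max_attempts} attempts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flag = os.path.join(tmp, "fixed")
        # Stand-in executable: exits 1 until the flag file exists.
        code = "import os, sys; sys.exit(0 if os.path.exists(sys.argv[1]) else 1)"
        cmd = [sys.executable, "-c", code, flag]
        attempts = run_with_recovery(cmd, fix=lambda rc, n: open(flag, "w").close())
        print(attempts)
```

The intermediate non-zero exit codes stay inside the loop; only the final outcome is surfaced to whatever launched the function.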
from covalent-slurm-plugin.