libatoms / expyre
Execute Python Remotely
License: GNU General Public License v2.0
Modify tests so that those that can run without a queuing system do so, and run those automatically in CI.
Perhaps eventually install a queuing system in CI and run all tests.
I had a problem with a remote DFT calculation which stopped with an unreproducible error.
I suspect this might have to do with a read/write bottleneck: I'm supposed to carry out the DFT calculation in a scratch folder.
I wanted to set rundir in the config.json file to a scratch folder, which is not the same as .expyre.
But it seems that the stage files for the DFT submission are still generated in .expyre, not in what I have specified as rundir in config.json.
Below is a snippet of my config.json file.
I wonder whether I'm doing something wrong or it is a bug.
I'd appreciate any comments.
Best regards,
Hyunwook
{
  "systems": {
    "local": {
      "host": null,
      "remsh_cmd": "/usr/bin/ssh",
      "scheduler": "slurm",
      "header": [
        "#SBATCH --no-requeue",
        "#SBATCH --nodes={num_nodes}",
        "#SBATCH --ntasks-per-node={num_cores}",
        "#SBATCH --mem=180000"
      ],
      "commands": [
        "module purge",
        "module load anaconda/3/2021.05",
        "module load gcc/10 intel/19.1.2 impi/2019.8 mkl/2020.2 fftw-mpi/3.3.9",
        "source ${MKLROOT}/bin/mklvars.sh intel64",
        "conda activate /u/hjung/conda-envs/mace_env",
        "export QUIP_ARCH=linux_x86_64_gfortran_openmp",
        "export QUIPPY_INSTALL_OPTS=--user",
        "export WFL_AUTOPARA_NPOOL=40",
        "export OMP_NUM_THREADS=1",
        "export GAP_FIT_OMP_NUM_THREADS=40",
        "export ASE_ESPRESSO_COMMAND='srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x -in PREFIX.pwi > PREFIX.pwo'",
        "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2020.2/mkl/lib/intel64/",
        "export ESPRESSO_PSEUDO='/u/hjung/Softwares/QE/pseudopotentials'"
      ],
      "partitions": {
        "tiny":   { "num_cores": 1,  "max_time": "24:00:00", "max_mem": "12GB" },
        "medium": { "num_cores": 40, "max_time": "24:00:00", "max_mem": "24GB" }
      },
      "rundir": "/u/hjung/workflow/scratch"
    },
On some remote machines just the ssh connection is somewhat slow. It would be nice if multiple job start commands could be combined, perhaps by gathering all the remote commands into an array of strings, and then running all of them in a single ssh connection.
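The batching described above could be as simple as joining the gathered commands into one remote shell line, so that only a single ssh connection is opened. A minimal sketch, assuming the commands are plain shell strings and that joining with `&&` is acceptable (the helper name is hypothetical, not expyre's current API):

```python
def batch_ssh_argv(remsh_cmd, host, commands):
    """Build a single ssh invocation that runs several remote commands
    over one connection by joining them with '&&'. Hypothetical helper,
    not expyre's actual API."""
    return [remsh_cmd, host, " && ".join(commands)]

# This argv could then be passed to e.g. subprocess.run(), replacing
# one ssh round-trip per command with a single connection.
```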
It's sometimes nice to resubmit a remote job by hand (e.g. after tweaking the max runtime). However, right now expyre will be confused if it tries to check on the new job while it's running, because the jobid won't match what's saved in the database. After the job is done it's fine, because the jobid doesn't matter. Maybe expyre should not save the jobid at all, and just extract it as needed (e.g. for qdel) by filtering the output of qstat based on the job name, which is already a unique string. Note that the job name can be mangled by the queuing system wrappers to avoid forbidden characters, so you can't just use the hash part of the rundir - you have to mangle it the same way.
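The name-based lookup could work along these lines; a sketch under assumptions, since both the mangling rule and the qstat output format vary by scheduler and neither below is expyre's actual implementation:

```python
import re

def mangle_job_name(name):
    """Replace characters some schedulers forbid in job names with '_'.
    An assumed mangling rule; the real one must mirror whatever the
    submission wrapper does."""
    return re.sub(r'[^A-Za-z0-9_.-]', '_', name)

def find_job_id(qstat_lines, job_name):
    """Scan scheduler status output lines for the (mangled) job name
    and return the job id from the first matching line, or None."""
    target = mangle_job_name(job_name)
    for line in qstat_lines:
        fields = line.split()
        if len(fields) >= 2 and target in line:
            return fields[0]
    return None
```

The key point is that the same mangling function must be applied both at submission time and at lookup time, otherwise jobs whose names contain forbidden characters will never match.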
Add keys with information about the job's memory requirements to the dictionary used to format the System.header string templates. These may need to be per task, per node, or per job, and possibly support different units.
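The formatting step in question can be sketched with a hypothetical extra key (`max_mem_per_node` is an assumption for illustration, not an existing expyre key name):

```python
# Sketch: formatting System.header-style templates with an added,
# hypothetical memory key alongside the existing node/core keys.
header = [
    "#SBATCH --nodes={num_nodes}",
    "#SBATCH --mem={max_mem_per_node}",
]
fields = {"num_nodes": 2, "max_mem_per_node": "180GB"}
lines = [tmpl.format(**fields) for tmpl in header]
```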
Go through feedback, resolve how to incorporate and open PRs/Issues
It would be useful to support more arbitrary queuing system header entries. Sometimes those need to replace one of the normal ones, like the node specification. Probably this can only be implemented in general by replacing all the logic inside the queuing-system Python functions with a templating system. There will still be queuing-system dependence in the actual commands and the parsing of the status queries, but the files may be better off entirely templated.
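A fully templated submit script could be built with the standard library's `string.Template`; a minimal sketch, where the placeholder names and script layout are illustrative rather than a proposed expyre scheme:

```python
from string import Template

# Sketch: generate the whole submit file from one template, so
# arbitrary header entries can be injected or overridden without
# scheduler-specific Python logic.
submit_tmpl = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=$job_name\n"
    "$extra_header\n"
    "$commands\n"
)
script = submit_tmpl.substitute(
    job_name="fit_gap",
    extra_header="#SBATCH --mem=12GB",
    commands="srun pw.x -in relax.pwi > relax.pwo",
)
```

The scheduler-specific parts that remain are the submission command and the parsing of status output, exactly as noted above.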
I was having trouble with a low-memory node in the womble cluster in Engineering, where the standard partition is listed as having more memory than this node (which is in the partition). One solution would be to allow specifying additional scheduler header lines for certain jobs, such as memory requirements.
Make ExPyRe available on PyPI for easy pip installation. Package name?
So it doesn't involve workflow; e.g. just an os.system() call of CASTEP or ORCA, etc.
Mostly a placeholder, but it would be possible to submit each job to multiple machines (or even partitions within a single machine), and monitor them, killing the others once one starts.
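The monitor-and-kill logic could look roughly like this; a sketch in which the `Job` class is a stand-in for whatever expyre's real remote-job objects would provide, not an existing interface:

```python
# Sketch of the "submit to several machines, keep whichever starts
# first" idea. Job is a hypothetical stand-in for a remote job handle.
class Job:
    def __init__(self, name, running=False):
        self.name = name
        self.running = running
        self.cancelled = False

    def is_running(self):
        return self.running

    def cancel(self):
        self.cancelled = True

def keep_first_started(jobs):
    """Return the first job that reports running and cancel all the
    others; None if nothing has started yet."""
    for job in jobs:
        if job.is_running():
            for other in jobs:
                if other is not job:
                    other.cancel()
            return job
    return None

jobs = [Job("clusterA"), Job("clusterB", running=True), Job("clusterC")]
winner = keep_first_started(jobs)
```

In practice this poll would run inside the existing status-check loop, so no extra monitoring process is needed.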
Hi all,
when running a geometry optimization with Quantum Espresso via calculation = 'relax' with a small maximum number of relaxation steps (e.g. 20), the calculation exits with error code 3.
The QE output still terminates with
=------------------------------------------------------------------------------=
JOB DONE.
=------------------------------------------------------------------------------=
as it should and includes all necessary information (i.e. energy and forces).
However, wfl will throw an error and stop the iterative training.
Is it possible to prevent certain error messages from halting the whole program?
Here is an example QE_run.tar.gz
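One way to tolerate this case would be to check whether the output actually completed before treating the nonzero exit code as fatal; a sketch under assumptions (the helper name is hypothetical and this is not wfl's actual error handling):

```python
def qe_output_usable(pwo_text):
    """Treat a Quantum Espresso run as usable if the output reached
    'JOB DONE.', even when pw.x exited nonzero (e.g. because the
    relaxation step limit was hit). Hypothetical check, not wfl's
    actual behaviour."""
    return "JOB DONE." in pwo_text
```

A caller could then parse energies and forces from such an output instead of aborting the training loop.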