expyre's Issues

automated tests in CI

Modify the tests so that those that can run without a queuing system do so, and run them automatically in CI.

Perhaps eventually install a queuing system in CI and run all tests.
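One way the split between scheduler-free and scheduler-dependent tests might look is sketched below. It uses the standard library `unittest` skip decorator purely for illustration (the project's actual test framework may differ), and `have_queuing_system` and `SUBMIT_COMMANDS` are hypothetical names, not part of expyre.

```python
import shutil
import unittest

# Hypothetical list of scheduler submit commands to probe for; adjust as needed.
SUBMIT_COMMANDS = ("sbatch", "qsub")


def have_queuing_system(commands=SUBMIT_COMMANDS):
    """Return True if any of the given submit commands is found on PATH."""
    return any(shutil.which(cmd) is not None for cmd in commands)


class TestWithoutScheduler(unittest.TestCase):
    def test_runs_everywhere(self):
        # Needs no scheduler, so it always runs, including in CI.
        self.assertEqual(sum([1, 2, 3]), 6)


@unittest.skipUnless(have_queuing_system(), "no queuing system available")
class TestWithScheduler(unittest.TestCase):
    def test_needs_scheduler(self):
        # Placeholder for a test that would submit a real job.
        self.assertTrue(have_queuing_system())
```

With this layout, CI machines without a scheduler report the second class as skipped rather than failed.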

rundir in config.json

I had a problem with a remote DFT calculation, which stopped with an error I could not reproduce.
I suspect this is related to a read/write bottleneck, since I'm supposed to carry out DFT calculations in a scratch folder.
I wanted to set rundir in config.json to a scratch folder that is not the same as .expyre.
However, the stage files for the DFT submission still seem to be generated in .expyre, not in the rundir I specified in config.json.
The following is a snippet of my config.json file.
I wonder whether I'm doing something wrong or whether this is a bug.
I'd appreciate any comments.

Best regards,
Hyunwook

{
    "systems": {
        "local": {
            "host": null,
            "remsh_cmd": "/usr/bin/ssh",
            "scheduler": "slurm",
            "header": [
                "#SBATCH --no-requeue",
                "#SBATCH --nodes={num_nodes}",
                "#SBATCH --ntasks-per-node={num_cores}",
                "#SBATCH --mem=180000"
            ],
            "commands": [
                "module purge",
                "module load anaconda/3/2021.05",
                "module load gcc/10 intel/19.1.2 impi/2019.8 mkl/2020.2 fftw-mpi/3.3.9",
                "source ${MKLROOT}/bin/mklvars.sh intel64",
                "conda activate /u/hjung/conda-envs/mace_env",
                "export QUIP_ARCH=linux_x86_64_gfortran_openmp",
                "export QUIPPY_INSTALL_OPTS=--user",
                "export WFL_AUTOPARA_NPOOL=40",
                "export OMP_NUM_THREADS=1",
                "export GAP_FIT_OMP_NUM_THREADS=40",
                "export ASE_ESPRESSO_COMMAND='srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x -in PREFIX.pwi > PREFIX.pwo'",
                "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2020.2/mkl/lib/intel64/",
                "export ESPRESSO_PSEUDO='/u/hjung/Softwares/QE/pseudopotentials'"
            ],
            "partitions": {
                "tiny" : { "num_cores" : 1, "max_time" : "24:00:00", "max_mem" : "12GB" },
                "medium" : { "num_cores" : 40, "max_time" : "24:00:00", "max_mem" : "24GB" }
            },
            "rundir": "/u/hjung/workflow/scratch"
        },

combining multiple job starts

On some remote machines, even establishing the ssh connection is somewhat slow. It would be nice if multiple job start commands could be combined, perhaps by gathering all the remote commands into an array of strings and then running all of them over a single ssh connection.
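A minimal sketch of the idea, assuming each job start can be expressed as one shell command string. The function names `build_combined_ssh_argv` and `run_combined` are hypothetical and not part of expyre; the default `remsh_cmd` simply mirrors the value seen in config.json.

```python
import subprocess


def build_combined_ssh_argv(host, commands, remsh_cmd="/usr/bin/ssh"):
    """Build a single ssh argv that runs every command in one connection."""
    # Wrap each command in a subshell so a 'cd' in one does not affect the next,
    # and join with ';' so one failed start does not prevent the others.
    remote_script = "; ".join(f"( {cmd} )" for cmd in commands)
    return [remsh_cmd, host, remote_script]


def run_combined(host, commands, remsh_cmd="/usr/bin/ssh"):
    """Run all gathered commands over one ssh connection and return the result."""
    argv = build_combined_ssh_argv(host, commands, remsh_cmd)
    return subprocess.run(argv, capture_output=True, text=True)
```

Joining with `;` means the exit status only reflects the last command, so a real implementation would probably need each command to report its own success in the output.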

stop relying on remote job id

It's sometimes nice to resubmit a remote job by hand (e.g. after tweaking up the max runtime). However, right now expyre will be confused if it tries to check on the new job while it's running, because the jobid won't match what's saved in the database. After the job is done it's fine, because the jobid doesn't matter. Maybe expyre should not even save the jobid, and just extract it as needed (e.g. for qdel) by filtering the output of qstat based on the job name, which is already a unique string. Note that the job name can be mangled by the queuing system wrappers, to avoid forbidden characters, so you can't just use the hash part of the rundir - you have to mangle it the same way.
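The lookup by job name could be sketched as follows, assuming the status query output has already been reduced to one `jobid name` pair per line (for Slurm, something like `squeue -h -o "%i %j"` should give that shape). Both `mangle_job_name` and `find_job_ids` are hypothetical, and the mangling rule shown is only a stand-in for whatever the queuing system wrappers actually do.

```python
import re


def mangle_job_name(name):
    """Hypothetical mangling: replace anything outside [A-Za-z0-9_.-] with '_'."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", name)


def find_job_ids(status_output, job_name):
    """Return the ids of all jobs whose name equals the mangled job_name."""
    wanted = mangle_job_name(job_name)
    ids = []
    for line in status_output.splitlines():
        fields = line.split()
        # Expect exactly two whitespace-separated fields: job id and job name.
        if len(fields) == 2 and fields[1] == wanted:
            ids.append(fields[0])
    return ids
```

Returning a list rather than a single id leaves room for the case where a job was resubmitted by hand and two jobs with the same name exist briefly.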

make memory information available in System.header

Add keys with information about the job's memory requirements to the dictionary used to format the System.header string templates. These may need to be per task, per node, or per job, and may need to support different units.
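As a rough illustration, the extra keys could be computed once and passed to the same `str.format` call that fills in `{num_nodes}`. The key names below (`mem_per_node_MB` and so on) and both function names are hypothetical, chosen only to show per-node, per-task and per-job variants.

```python
def memory_keys(mem_per_node_mb, num_nodes, tasks_per_node):
    """Return hypothetical memory-related keys for header templates."""
    return {
        "mem_per_node_MB": mem_per_node_mb,
        "mem_per_node_GB": mem_per_node_mb // 1024,
        "mem_per_task_MB": mem_per_node_mb // tasks_per_node,
        "mem_per_job_MB": mem_per_node_mb * num_nodes,
    }


def format_header(header, **keys):
    """Fill each header template line with the given keys."""
    return [line.format(**keys) for line in header]


header = ["#SBATCH --nodes={num_nodes}", "#SBATCH --mem={mem_per_node_MB}"]
keys = memory_keys(mem_per_node_mb=180000, num_nodes=2, tasks_per_node=40)
lines = format_header(header, num_nodes=2, **keys)
```

Integer division is used for the derived values, so a real implementation would need to decide how to round and which units each scheduler expects.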

more flexibility in queued job batch files

It would be useful to support more arbitrary queuing system header entries. Sometimes those need to replace one of the normal ones, like the node specification. Probably this can only be implemented generally by replacing all the logic inside the queuing system Python functions with a templating system. There will still be queuing system dependence because of the actual commands and the parsing of the status queries, but the files may be better off entirely templated.

specifying additional header lines in remoteinfo files

I was having trouble with a low-memory node in the womble cluster in Engineering, where the standard partition is listed as having more memory than this node (which is in the partition). One solution would be to allow specifying additional scheduler header lines for certain jobs, such as memory requirements.
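One possible shape for per-job header lines is a merge step in which a job-specific line either replaces the system line with the same option or is appended. This is only a sketch: `option_name` and `merge_header` are hypothetical helpers, and the parsing assumes lines of the form `#SBATCH --option[=value]`.

```python
def option_name(line):
    """Return the option part of a header line, e.g. '--mem' from '#SBATCH --mem=12GB'."""
    parts = line.split()
    if len(parts) < 2:
        return line
    return parts[1].split("=")[0]


def merge_header(system_header, extra_header):
    """Merge per-job lines into the system header, replacing matching options."""
    # Dicts preserve insertion order, so a replaced line keeps its position.
    merged = {option_name(line): line for line in system_header}
    for line in extra_header:
        merged[option_name(line)] = line
    return list(merged.values())


system_header = ["#SBATCH --no-requeue", "#SBATCH --nodes={num_nodes}"]
extra_header = ["#SBATCH --nodes=1", "#SBATCH --mem=12GB"]
merged = merge_header(system_header, extra_header)
```

Here the job-specific node specification replaces the templated one, and the memory line is added at the end.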

ExPyRe on PyPI

Make ExPyRe available on PyPI for easy pip installation. Package name?

Example of a DFT job

An example that doesn't involve workflow, e.g. just an os.system() call to CASTEP, ORCA, etc.

speculative job submission

Mostly a placeholder, but it would be possible to submit each job to multiple machines (or even partitions within a single machine), and monitor them, killing the others once one starts.
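The monitoring step might reduce to a small decision function like the one below. It assumes some other code has already collected a status string for each submitted copy; the status names and `jobs_to_cancel` itself are hypothetical.

```python
def jobs_to_cancel(statuses):
    """Given {(system, jobid): status}, return the copies to kill.

    Status values are assumed to be 'queued', 'running' or 'done'.
    Nothing is cancelled until at least one copy has started.
    """
    started = [key for key, status in statuses.items() if status in ("running", "done")]
    if not started:
        return []
    # Keep the first copy seen to have started and cancel every other one.
    winner = started[0]
    return [key for key in statuses if key != winner]
```

If two copies start between polls, this keeps whichever appears first in the dictionary, which wastes a little compute time but never leaves the job without a running copy.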

Quantum Espresso Geometry Optimization Error Code 3

Hi all,
When running a geometry optimization with Quantum Espresso via calculation = 'relax' and a small maximum number of relaxation steps (e.g. 20), the calculation exits with error code 3.
The QE output still terminates with

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=

as it should and includes all necessary information (i.e. energy and forces).
However, wfl will throw an error and stop the iterative training.

Is it possible to prevent certain error codes from halting the whole program?

Here is an example: QE_run.tar.gz
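A possible check for this situation is sketched below: a nonzero exit code is accepted only if it is on an explicit allow-list and the output still contains the `JOB DONE.` marker. The function `qe_run_succeeded` and its `tolerated_codes` parameter are hypothetical and do not correspond to an existing wfl or ASE option.

```python
def qe_run_succeeded(returncode, output_text, tolerated_codes=(3,)):
    """Decide whether a QE run should count as usable.

    A zero exit code always counts. A nonzero code counts only if it is in
    tolerated_codes and the output still ends with the 'JOB DONE.' marker.
    """
    if returncode == 0:
        return True
    return returncode in tolerated_codes and "JOB DONE." in output_text
```

Whether energies and forces from an unconverged relaxation should enter the training set is a separate decision that such a check cannot make.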
