coltonbh / qcop

A package for operating Quantum Chemistry programs using qcio standardized data structures. Compatible with TeraChem, psi4, QChem, NWChem, ORCA, Molpro, geomeTRIC and many more.

License: MIT License

Python 99.45% Shell 0.55%

qcop's Introduction

Quantum Chemistry Operate

qcop works in harmony with a suite of other quantum chemistry tools for fast, structured, and interoperable quantum chemistry.

The QC Suite of Programs

qcio - Beautiful and user friendly data structures for quantum chemistry.
qcparse - A library for efficient parsing of quantum chemistry data into structured qcio objects.
qcop - A package for operating quantum chemistry programs using qcio standardized data structures. Compatible with TeraChem, psi4, QChem, NWChem, ORCA, Molpro, geomeTRIC and many more.
BigChem - A distributed application for running quantum chemistry calculations at scale across clusters of computers or the cloud. Bring multi-node scaling to your favorite quantum chemistry program.
ChemCloud - A web application and associated Python client for exposing a BigChem cluster securely over the internet.

Installation

pip install qcop

Quickstart

qcop uses the qcio data structures to drive quantum chemistry programs in a standardized way. This allows for a simple and consistent interface to a wide variety of quantum chemistry programs. See the qcio library for documentation on the input and output data structures.

The compute function is the main entry point for the library and is used to run a calculation.

from qcio import Structure, ProgramInput
from qcop import compute
from qcop.exceptions import ExternalProgramError
# Create the Structure
h2o = Structure.open("h2o.xyz")

# Define the program input
prog_input = ProgramInput(
    structure=h2o,
    calctype="energy",
    model={"method": "hf", "basis": "sto-3g"},
    keywords={"purify": "no", "restricted": False},
)

# Run the calculation; will return ProgramOutput or raise an exception
try:
    po = compute("terachem", prog_input, collect_files=True)
except ExternalProgramError as e:
    # External QC program failed in some way
    po = e.program_output
    po.input_data # Input data used by the QC program
    po.success # Will be False
    po.results # Any half-computed results before the failure
    po.traceback # Stack trace from the calculation
    po.ptraceback # Shortcut to print out the traceback in human readable format
    po.stdout # Stdout log from the calculation
    raise e
else:
    # Calculation succeeded
    po.input_data # Input data used by the QC program
    po.success # Will be True
    po.results # All structured results from the calculation
    po.stdout # Stdout log from the calculation
    po.pstdout # Shortcut to print out the stdout in human readable format
    po.files # Any files returned by the calculation
    po.provenance # Provenance information about the calculation
    po.extras # Any extra information not in the schema

One may also call compute(..., raise_exc=False) to return a ProgramOutput object rather than raising an exception when a calculation fails. This may allow easier handling of failures in some cases.

from qcio import Structure, ProgramInput
from qcop import compute
# Create the Structure
h2o = Structure.open("h2o.xyz")

# Define the program input
prog_input = ProgramInput(
    structure=h2o,
    calctype="energy",
    model={"method": "hf", "basis": "sto-3g"},
    keywords={"purify": "no", "restricted": False},
)

# Run the calculation; will return a ProgramOutput object
po = compute("terachem", prog_input, collect_files=True, raise_exc=False)
if not po.success:
    # External QC program failed in some way
    po.input_data # Input data used by the QC program
    po.success # Will be False
    po.results # Any half-computed results before the failure
    po.traceback # Stack trace from the calculation
    po.ptraceback # Shortcut to print out the traceback in human readable format
    po.stdout # Stdout log from the calculation

else:
    # Calculation succeeded
    po.input_data # Input data used by the QC program
    po.success # Will be True
    po.results # All structured results from the calculation
    po.stdout # Stdout log from the calculation
    po.pstdout # Shortcut to print out the stdout in human readable format
    po.files # Any files returned by the calculation
    po.provenance # Provenance information about the calculation
    po.extras # Any extra information not in the schema

Alternatively, the compute_args function can be used to run a calculation with the input data structures passed in as arguments rather than as a single ProgramInput object.

from qcio import Structure
from qcop import compute_args
# Create the Structure
h2o = Structure.open("h2o.xyz")

# Run the calculation
output = compute_args(
    "terachem",
    h2o,
    calctype="energy",
    model={"method": "hf", "basis": "sto-3g"},
    keywords={"purify": "no", "restricted": False},
    files={...},
    collect_files=True
)

The behavior of compute() and compute_args() can be tuned by passing keyword arguments such as collect_files, shown above. These keyword arguments control which scratch directory to use, whether to delete or keep scratch files after a calculation completes, which files to collect from a calculation, whether to print the program's stdout in real time as it executes, and whether to propagate a wavefunction through a series of calculations. They also include hooks for passing in update functions that are called in real time as a program executes. See the compute function docstring for more details.

See the /examples directory for more examples.

Support

If you have any issues with qcop or would like to request a feature, please open an issue.

qcop's Issues

Capture geomeTRIC (or any other external package) errors and wrap them in a `QCOPException` class

When geomeTRIC raises an exception (e.g., failed to converge) we get various ChemCloud server errors (same issue Umberto was having). This happens because the exception is not caught and re-raised under the QCOP banner. When the ChemCloud server re-raises the error from celery (result.ready()), the exception class does not exist because geomeTRIC is not installed on the server, so a second exception is raised and the server returns a 500.

I need to capture all external package exceptions--like QCEngine exceptions--and wrap them in a QCOP exception class.
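A minimal sketch of what such wrapping could look like, using only the standard library. The QCOPException class below is a local stand-in (the real class hierarchy in qcop may differ), and wrap_external_errors and FakeGeometricError are hypothetical names introduced for illustration. The wrapped exception carries only a plain string, so a server that lacks the external package never needs to import that package's exception class.

```python
import functools


class QCOPException(Exception):
    """Local stand-in for a qcop-owned exception (hypothetical definition)."""


def wrap_external_errors(func):
    """Re-raise any non-qcop exception as QCOPException with a string message."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except QCOPException:
            raise  # already a qcop exception; pass it through unchanged
        except Exception as exc:
            # Keep only text so consumers never need the external package
            message = f"{type(exc).__name__}: {exc}"
            raise QCOPException(message) from exc

    return wrapper


class FakeGeometricError(Exception):
    """Pretend this class lives in a package the server does not have."""


@wrap_external_errors
def run_optimizer():
    raise FakeGeometricError("failed to converge")


try:
    run_optimizer()
except QCOPException as err:
    print(err)
```

The original exception stays reachable through `__cause__` on the machine that ran the calculation, while anything that only sees the wrapped error gets a self-contained message.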

Propagate `print_stdout` to subprograms.

When running compute(..., print_stdout=True) with a DualProgramInput, the stdout of the subprogram (e.g., terachem when using geomeTRIC as the optimizer) is not printed to the terminal.

[FEATURE] Move update_func up to just `BaseAdapter.compute()` call

I'm currently pushing down the update_func into low level functions like execute_subprocess and various places in the GeometricAdapter. I also have multiple ways (wrappers) to capture stdout and log files covering three cases:

The package uses the python logger
The package writes to sys.stdout
The package (like xtb) writes to system stdout outside of Python's access to sys.stdout (e.g., by writing directly with a C library)

I think I could create a single high level wrapper function that I put in BaseAdapter.compute to capture all these forms of stdout and then work with the update_func only at that level. Would be much easier and cleaner.

Something like this:

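# Very lazy pseudocode
with capture_logs(...) as logs:  # logs is some stream that gets written to
    # Run the update loop in a background thread while the program executes.
    # `stop` is an added threading.Event, since Python threads cannot be killed.
    stop = threading.Event()
    thread = threading.Thread(
        target=update_loop, args=(update_func, update_interval, logs, stop)
    )
    thread.start()
    # Execute the program. results will be None if FileInput
    results, stdout = self.compute_results(
        inp_obj,
        update_func,
        update_interval,
        propagate_wfn=propagate_wfn,
        **kwargs,
    )
    # lazy pseudocode: signal the update loop to finish and wait for it
    stop.set()
    thread.join()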

Then I would run the program in a background thread, capture the logs/stdout in the main thread and can run an update_func until the thread completes execution.
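The background-thread idea described here can be sketched with the standard library alone. Everything named below (fake_program, run_with_updates, the queue-based log channel) is a hypothetical stand-in and not qcop's implementation: the "program" runs in a worker thread and pushes log chunks onto a queue, while the main thread drains the queue and calls update_func until the worker finishes.

```python
import queue
import threading
import time


def fake_program(log_queue):
    """Stand-in for an external QC program that emits log lines over time."""
    for step in range(3):
        log_queue.put(f"step {step}\n")
        time.sleep(0.01)
    return None


def run_with_updates(program, update_func, update_interval=0.005):
    """Run `program` in a background thread; feed new logs to `update_func`."""
    log_queue = queue.Queue()
    result = {}

    def target():
        result["value"] = program(log_queue)

    worker = threading.Thread(target=target)
    worker.start()
    collected = []
    # Poll until the worker has finished and the queue has been drained
    while worker.is_alive() or not log_queue.empty():
        try:
            chunk = log_queue.get(timeout=update_interval)
        except queue.Empty:
            continue
        collected.append(chunk)
        update_func(chunk)
    worker.join()
    return result.get("value"), "".join(collected)


updates = []
_, stdout = run_with_updates(fake_program, updates.append)
print(stdout, end="")
```

Because the loop only exits once the worker is no longer alive and the queue is empty, no log chunk written before the program finished is lost.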

This would also fix #19 and maybe #18

Maybe have the @capture_logs decorator be a master function that uses one of the three approaches to capture logs depending on a class variable on an Adapter that declares how this program writes logs (one of logger, stdout, or nonpython)?
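As a rough illustration of that dispatch, here is a sketch written as a context manager rather than a decorator. The log_mode class variable, the FakeAdapter class, and the body of capture_logs are all assumptions made for this example. Only the logger and stdout modes are shown; the nonpython case would need file-descriptor level redirection and is left out.

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def capture_logs(mode, logger_name=None):
    """Yield a StringIO that receives log output for the chosen mode."""
    buffer = io.StringIO()
    if mode == "stdout":
        # Package writes through Python's sys.stdout
        with contextlib.redirect_stdout(buffer):
            yield buffer
    elif mode == "logger":
        # Package uses the Python logging module
        logger = logging.getLogger(logger_name)
        handler = logging.StreamHandler(buffer)
        logger.addHandler(handler)
        try:
            yield buffer
        finally:
            logger.removeHandler(handler)
    else:
        # "nonpython" (C-level writes) is not covered by this sketch
        raise NotImplementedError(f"capture mode {mode!r} is not sketched here")


class FakeAdapter:
    """Hypothetical adapter declaring how its program writes logs."""

    log_mode = "stdout"

    def compute(self):
        with capture_logs(self.log_mode) as logs:
            print("SCF converged")  # stands in for the program's output
        return logs.getvalue()


print(FakeAdapter().compute(), end="")
```

With this shape, each adapter only declares its log_mode, and the capture logic lives in one place.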

[FEATURE] - Add CREST

CREST offers geometry optimizations as well as a number of new and very useful calc_types that I'd like to add to qcop.

[Add TeraChem Optimization] CalcType

Depends on: coltonbh/qcparse#14

[BUG] - Optimizations with geomeTRIC and xtb start to pick up the `qcio_optim.xyz` file created by geomeTRIC after a few steps in the trajectory

This code

from pathlib import Path

from qcio import Structure, CalcType, ProgramInput, DualProgramInput, constants
from qcop import compute

from xtb.utils import Solvent

STRUCT_DIR = Path("./data/2017-jacs/structures")

u1_anion_180 = Structure.open(STRUCT_DIR / "U1-_mmff94s_180.json")
pi = DualProgramInput(
    calctype=CalcType.optimization,
    structure=u1_anion_180,
    subprogram="xtb",
    subprogram_args={
        "model": {"method": "GFN2xTB"},
        "keywords": {"solvent": Solvent.thf},
    },
)
po = compute("geometric", pi, raise_exc=False, print_stdout=True)

Some outputs

>>> po.files.keys()
dict_keys(['qcio_optim.xyz'])
>>> po.files.keys()
dict_keys(['qcio_optim.xyz'])
>>> po.results.trajectory[0].files.keys()
dict_keys([])
>>> po.results.trajectory[1].files.keys()
dict_keys([])
>>> po.results.trajectory[2].files.keys()
dict_keys(['qcio_optim.xyz'])
>>> po.results.trajectory[3].files.keys()
dict_keys(['qcio_optim.xyz'])
>>> po.results.trajectory[4].files.keys()
dict_keys(['qcio_optim.xyz'])
>>> po.results.trajectory[4].files.keys()
dict_keys(['qcio_optim.xyz'])
>>> po.results.trajectory[5].files.keys()
dict_keys(['qcio_optim.xyz'])
>>> po.results.trajectory[5].files['qcio_optim.xyz'] == po.results.trajectory[6].files['qcio_optim.xyz']
False
>>> len(po.results.trajectory[5].files['qcio_optim.xyz'])
10725
>>> len(po.results.trajectory[6].files['qcio_optim.xyz'])
12870
>>> len(po.results.trajectory[7].files['qcio_optim.xyz'])
15015
>>> len(po.results.trajectory[8].files['qcio_optim.xyz'])

I did not pass collect_files=True.
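A quick arithmetic check on the file lengths reported above. The lengths grow by a constant amount per step and are exact multiples of that amount, which would be consistent with one fixed-size frame being appended to qcio_optim.xyz per optimization step. That interpretation is an inference from the numbers, not a confirmed diagnosis.

```python
# File lengths reported above for trajectory steps 5, 6, and 7
lengths = {5: 10725, 6: 12870, 7: 15015}

steps = sorted(lengths)
increments = [lengths[b] - lengths[a] for a, b in zip(steps, steps[1:])]
print(increments)  # [2145, 2145]

# Each length is an exact multiple of the per-step increment
frame_size = increments[0]
frames = {step: lengths[step] // frame_size for step in steps}
print(frames)  # {5: 5, 6: 6, 7: 7}
assert all(lengths[step] % frame_size == 0 for step in steps)
```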

Update geomeTRIC logs (DualOutputHandler) to handle any arbitrary update function instead of just print_stdout

Update the DualOutputHandler to run arbitrary update functions to send logs anywhere. Have punted on this for now as it's not needed and will require some extra engineering work.
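Without assuming anything about how DualOutputHandler is implemented, the general idea of sending log records to arbitrary update functions can be sketched with a custom logging.Handler. CallbackHandler and the logger name below are hypothetical and exist only for this example.

```python
import logging


class CallbackHandler(logging.Handler):
    """Forward each formatted log record to every registered update function."""

    def __init__(self, *update_funcs):
        super().__init__()
        self.update_funcs = update_funcs

    def emit(self, record):
        message = self.format(record)
        for func in self.update_funcs:
            func(message)


received = []
logger = logging.getLogger("optimizer_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = CallbackHandler(received.append, print)
logger.addHandler(handler)
logger.info("Step 1 complete")
logger.removeHandler(handler)
```

Printing to the terminal then becomes just one of several possible update functions rather than a special case.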

[FEATURE] - Add optimizations with Sella/ASE

I like the API for qcop and would love to use it as a common interface for energy, hessian, and structure calculations with Sella/ASE, similar to the current geomeTRIC/QCEngine calculators in qcop. The motivation for adding the former is two-fold:

Sella seems to have more robust support for saddle-point/TS optimizations
ASE provides a QCEngine-style wrapper around another set of electronic structure codes, in addition to those already present in qcop

In particular, my workflow involves the following list of routines, which might be performed through Sella/ASE:

ASE: Single-point energy calculations using an ASE calculator (ideally, preserving the option to pass in a custom calculator)
ASE: Gradient (force) calculations using an ASE calculator
ASE: Hessian (harmonic vibration) calculations using an ASE calculator
Sella: Optimization of a minimum-energy (equilibrium) structure using internal coordinates
Sella: Optimization of a saddle-point (TS) structure using internal coordinates
Sella: Optimization of a minimum energy structure, subject to one or more constraints of bond distance, angle, etc., also in internal coordinates (for generating relaxed scans)
Sella: IRC calculation in internal coordinates (lower priority than the other items, but would be nice to have)

I don't need any further tools from ASE, beyond these basic ones. These building blocks would allow me to write my own routines for conformer sampling, relaxed scanning, and transition state finding using qcop.

Cannot see stdout in real time when using a QCEngine adapter.

QCEngine executes its subprocesses with subprocess.run, so it's not possible to poll the process in real time and extract stdout. However, for programs like psi4 that capture logs in Python, perhaps there's a way to hack this and get access to the logs in real time?
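For contrast, this is how stdout can be read line by line when the caller controls the subprocess call and uses subprocess.Popen instead of subprocess.run. It is a generic standard-library sketch, not QCEngine's or qcop's code. stream_stdout is a hypothetical helper, and a child Python process stands in for the QC program.

```python
import subprocess
import sys


def stream_stdout(cmd, update_func):
    """Run `cmd`, calling `update_func` on each stdout line as it arrives."""
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered on the parent side
    ) as proc:
        for line in proc.stdout:
            lines.append(line)
            update_func(line)
    # Leaving the `with` block waits for the process, so returncode is set
    return proc.returncode, "".join(lines)


# Stand-in for a QC program: a child Python process printing two lines
child = [sys.executable, "-u", "-c", "print('iter 1'); print('iter 2')"]
seen = []
returncode, stdout = stream_stdout(child, seen.append)
print(returncode, seen)
```

Whether lines arrive promptly also depends on the child program flushing its own output, which is why the stand-in is started with `-u`.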

`print_stdout` appears to not work for geomeTRIC

qcop CLI for converting an xyz file to a printed array of arrays in bohr

qcop bohr file.xyz

symbols = [...]
geometry = [[...]]

For quick conversion of xyz files to copy and paste into input.toml files.
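A minimal sketch of the conversion such a command could perform, assuming a standard XYZ file (atom count, comment line, then one "symbol x y z" line per atom in Angstrom). The function name xyz_to_bohr and the example molecule are illustrative only, and no qcop or qcio API is used.

```python
ANGSTROM_TO_BOHR = 1 / 0.529177210903  # CODATA 2018 Bohr radius in Angstrom


def xyz_to_bohr(xyz_text):
    """Parse XYZ text (coordinates in Angstrom) into symbols and bohr geometry."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])
    symbols, geometry = [], []
    # Line 0 is the atom count, line 1 is a comment; atom lines follow
    for line in lines[2 : 2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)
        geometry.append([float(c) * ANGSTROM_TO_BOHR for c in (x, y, z)])
    return symbols, geometry


example = """2
hydrogen molecule
H 0.0 0.0 0.0
H 0.0 0.0 0.74
"""
symbols, geometry = xyz_to_bohr(example)
print(f"symbols = {symbols}")
print(f"geometry = {geometry}")
```

The printed lines follow the symbols / geometry layout shown above, ready to paste into an input.toml file.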