Giter Site home page Giter Site logo

reactionmechanismgenerator / autotst Goto Github PK

View Code? Open in Web Editor NEW
31.0 10.0 16.0 12.08 MB

AutoTST: A framework to perform automated transition state theory calculations

License: Other

Python 83.16% Jupyter Notebook 16.83% Makefile 0.01%
chemistry quantum-chemistry kinetics combustion

autotst's People

Contributors

alongd avatar amarkpayne avatar c-underkoffler avatar calvinp0 avatar davidfarinajr avatar rwest avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

autotst's Issues

[Bug Report] AttributeError: module 'scipy.stats' has no attribute 'gibrat'

Describe the bug

The package 'descriptastorus' has had a recent fix in the naming convention for the attribute 'gibrat' - https://github.com/bp-kelley/descriptastorus/blob/86eedc60546abe6f59cdbcb12025a61157ba178d/descriptastorus/descriptors/dists.py#L207C1-L207C1 as in scipy they used to refer it to gilbrat but have since fixed it to be gibrat (notice the removal of the l). This is not reflected n the git repository but there was an upload of a new version, 2.6.0, in the conda-forge channel: https://anaconda.org/conda-forge/descriptastorus. And so, the fix up is included in this version

However, since AutoTST is kept to python 3.7.* (I believe due to RMG), it installs an older version of scipy (1.7.3), whereas descriptastorus is expecting scipy>=1.9.0. and therefore, the older version still maintains the attribute naming convention as gilbrat

Gaussian doesn't like writing scratch temp files to '/tmp' in discovery

Gaussian crashes when $GAUSS_SCRDIR is set to /tmp in Discovery
sample error message

PGFIO/stdio: No such file or directory
PGFIO-F-/OPEN/unit=11/error code returned by host stdio - 2.
 File name = /tmp/farina.d/autotst/Gau-48004.inp
 In source file ml0.f, at line number 197
  g16() [0xb91318]
  g16() [0xb926da]
  g16() [0xb9205c]
  g16() [0x404619]
  g16() [0x40397d]
  /lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaef1d78445]
  g16(sched_setaffinity+0xa1) [0x4038a9]
/var/spool/slurm/d/job5752911/slurm_script: line 3: 48004 Aborted 

resonance generation creates charged species

Currently, AutoTST uses RMG's generate_resonance_structures() method to automatically generate resonance structures when an autotst species is initialized. However, occasionally, this resonance generation method raises a ValueError when trying to create a charged species. This is happening to some of the weird HAN nitrogen intermediates. This is the traceback when trying to generate a species from smiles string 'OO[N+](O)[O-]'

Traceback (most recent call last):
  File "molecules.py", line 103, in <module>
    species = Species(smiles=[smiles])
  File "/home/farina.d/AutoTST/autotst/species.py", line 143, in __init__
    species_list.append(molecule.generate_resonance_structures())
  File "rmgpy/molecule/molecule.py", line 2089, in rmgpy.molecule.molecule.Molecule.generate_resonance_structures
  File "rmgpy/molecule/molecule.py", line 2091, in rmgpy.molecule.molecule.Molecule.generate_resonance_structures
  File "rmgpy/molecule/resonance.py", line 150, in rmgpy.molecule.resonance.generate_resonance_structures
  File "rmgpy/molecule/resonance.py", line 185, in rmgpy.molecule.resonance.generate_resonance_structures
ValueError: Got the following structure:
SMILES: OO[N+2](O)[O-]
AdjacencyList:
1 O u0 p2 c0 {3,S} {5,S}
2 O u0 p2 c0 {5,S} {6,S}
3 O u0 p2 c0 {1,S} {7,S}
4 O u0 p3 c-1 {5,S}
5 N u0 p0 c+2 {1,S} {2,S} {4,S}
6 H u0 p0 c0 {2,S}
7 H u0 p0 c0 {3,S}

Net charge: 1

Currently RMG cannot process charged species correctly.
If this structure was entered in SMILES, try using the adjacencyList format for an unambiguous definition.

would be nice to avoid needing $AUTOTST environment variable

At very least we should document that setting this is necessary.
It'd be nicer if it wasn't necessary.

Is it only used for unit tests? What is it in fact used for? Can we instead detect the path based on the currently running file or something?

Symmetry number estimations not always accurate

Symmetry numbers calculated using RMG's symmetry package tend to result in incorrect estimations of symmetry numbers. Below is an example of ethane which should have a symmetry number of 6. This issue arose when creating PR #48. Some investigation into alternate packages may remedy this.

In [1]: from autotst.species import Conformer                                                                                

In [2]: conf = Conformer("CC")                                                                                               

In [3]: conf.calculate_symmetry_number()                                                                                     
symmetry.py:253 write_input_file INFO Symmetry input file written to ./CC.symm
symmetry.py:219 parse INFO Point group: C1
Out[3]: 1

Add Unit Tests

Currently there are no unit tests for AutoTST, we should probably add these for the following files:

  • species.py
  • reaction.py
  • base.py
  • job.py
  • calculators/vibrational_analysis.py
  • calculators/gaussian.py
  • calculators/statmech.py
  • conformer/systematic.py

reduce computational cost

ideas to reduce computational cost:

  1. during shell minimizations, compare energies with loose convergence criteria to eliminate high energy conformers earlier in the process (don't run high energy conformers all the way down hill as they will be discarded anyways)
  2. only run saddle point searches on low energy conformers after shell optimization (instead of trying to find saddle points for all conformers)
  • generate conformers --> prune with DFTB (low level) --> prune with DFT shell opt (high level) --> run saddle point searched on remaining conformers

1-D Array Error

Occasionally when running conformational analysis, the following error is returned:

Traceback (most recent call last):
  File "transitionstates.py", line 36, in <module>
    rxn.generate_conformers(calculator=Hotbit())
  File "/home/harms.n/Code/AutoTST/autotst/reaction.py", line 510, in generate_conformers
    conformers = systematic_search(conformer, delta=60)
  File "/home/harms.n/Code/AutoTST/autotst/conformer/systematic.py", line 271, in systematic_search
    is_close = (np.sqrt(((df.distances[index] - df.distances)**2).apply(np.mean)) > 0.1)
  File "/home/harms.n/anaconda2/envs/rmg_env/lib/python2.7/site-packages/pandas/core/ops.py", line 745, in wrapper
    dtype=dtype,
  File "/home/harms.n/anaconda2/envs/rmg_env/lib/python2.7/site-packages/pandas/core/ops.py", line 653, in _construct_result
    return left._constructor(result, index=index, name=name, dtype=dtype)
  File "/home/harms.n/anaconda2/envs/rmg_env/lib/python2.7/site-packages/pandas/core/series.py", line 264, in __init__
    raise_cast_failure=True)
  File "/home/harms.n/anaconda2/envs/rmg_env/lib/python2.7/site-packages/pandas/core/series.py", line 3275, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional

This occurs after AutoTST has generated an ensemble of conformers, but is having an issue when identifying unique low energy conformers. Probably going to run through conformers one at a time rather than using Pandas logic to identify unique conformers.

Trouble identifying `intra_H_migration` and `R_Addition_Multiple_Bond` reactions

I've been noticing that there is a bug in reaction.py that causes some reactions to not be identified properly when trying to create an AutoTST Reaction object. It seems like H_Abstraction is just fine, but there are a few errors with the other families. Specifically, when looking at Sarathy's butanol model, the current master of AutoTST identified the following reactions:

We identified 143 possible R_Addition_MultipleBond reactions
We identified 855 possible H_Abstraction reactions
We identified 0 possible intra_H_migration reactions
And a total of 998 reactions

There should be:

  • 855 H Abstraction
  • 78 intra H migration
  • 184 R addition
  • 1117 total reactions

Suggestion to increment cclib version

Describe the bug

AutoTST uses cclib == v.1.6.2 which does not support the new openbabel >= 3.0 import syntax. A new environment made recently for AutoTST contains cclib 1.6.2 and openbabel 3.1.1, leading to a crash when cclib is called and tries to import OB. The most recent release of cclib (1.7) does import OB >= 3.0 correctly.

How to reproduce the bug

Create a new tst_env and run AutoTST.

Expected behavior

For the libraries to not collide. The root problem is probably that AutoTST requires cclib to be exactly v. 1.6.2, while this version of cclib does not cap the version of OB to <= 3.0, leading to a collision.

Additional context
I don't know whether cclib v. 1.6.2 is a hard requirement in AutoTST, if it's not, I'd suggest changing to cclib >= 1.7

Building of TSs with reactant and product species geometries

Conformational changes like inversion of chiral centers and rotation about double bonds rarely occur when reactants go to TSs (e.g. a reactant containing a cis double bond will probably have a TS with that corresponding double bond being cis). This is similar to the templating idea proposed in #33, but with a variation. I'm proposing that we do the following when building a TS geometry:

  1. Read in QM optimized geometries, if they exist, for reactants and products.
  2. Perform conformer analysis and QM optimization for reactants and products that have no optimized geometries and read them in.
  3. Using optimized geometries for reactants and products, construct a TS geometry using a tighter bounds matrix for bound atoms (because we trust these values)
  4. Perform a conformer analysis on rotatable torsions only because cis/trans bonds and chiral centers probably won't invert.

This would require us to do the following:

  • Create a database of optimized geometries for autotst.reaction to read from
    • Possibly build the geometry with RDKit if a reactant or product geometry isn't available but post a warning.
  • Revise the way the bounds matrix is edited in autotst.reaction
  • Revise the conformer analysis in autotst.conformer.systematic to not include cit/trans bonds and chiral centers

Add type hint support for AutoTST

We should add type hinting support for AutoTST methods. It makes it easier to refactor and debug the code. RMG has some partial support in some modules for type hinting so we can start off from there. An example method from RMG.

def get_thermo_data(self, molecule: Union[Molecule, str]) -> ThermoData:
        """
        Return thermodynamic parameters corresponding to a given
        :class:`Molecule` object `molecule` or a SMILES string.
        Returns: ThermoData
        """

`ActionError` and `IndexError` while trying to get transition states of H abstraction reactions from importer

I'm getting the following errors while trying to get a TS object.
Action Error for this reaction CC1=C[CH]C=CC1=O+CCCCO_CC1=CC=CC=C1O+CCC[CH]O1

ActionError                               Traceback (most recent call last)
<ipython-input-160-bddd08767730> in <module>
----> 1 rxn.get_labeled_reaction()

~/Code/RMG-Py3/AutoTST/autotst/reaction.py in get_labeled_reaction(self)
    337                 try:
    338                     labeled_r, labeled_p = family.get_labeled_reactants_and_products(
--> 339                         test_reaction.reactants, test_reaction.products)
    340                 except ValueError:
    341                     continue

~/Code/RMG-Py3/RMG-Py/rmgpy/data/kinetics/family.py in get_labeled_reactants_and_products(self, reactants, products)
   2672         # raise exception
   2673         if num_mappings > 0:
-> 2674             raise ActionError('Something wrong with products that RMG cannot find a match!')
   2675 
   2676         return None, None

ActionError: Something wrong with products that RMG cannot find a match!

while IndexError for this reaction
C1=CC=C(C=CC2=CC=CC=C2)C=C1+[O]O_C(=CC1=CC=CC=C1)=C1C=C[CH]C=C1+OO

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-164-a9a69e534d3e> in <module>
----> 1 rxn.ts

~/Code/RMG-Py3/AutoTST/autotst/reaction.py in ts(self)
    116         if self._ts is None:
    117             ts_dict = {}
--> 118             for direction, complex in self.get_rmg_complexes().items():
    119                 ts = TS(
    120                     reaction_label=self.label,

~/Code/RMG-Py3/AutoTST/autotst/reaction.py in get_rmg_complexes(self)
    521 
    522         if self.rmg_reaction is None:
--> 523             self.get_labeled_reaction()
    524 
    525         reactant_complex = RMGMolecule()

~/Code/RMG-Py3/AutoTST/autotst/reaction.py in get_labeled_reaction(self)
    337                 try:
    338                     labeled_r, labeled_p = family.get_labeled_reactants_and_products(
--> 339                         test_reaction.reactants, test_reaction.products)
    340                 except ValueError:
    341                     continue

~/Code/RMG-Py3/RMG-Py/rmgpy/data/kinetics/family.py in get_labeled_reactants_and_products(self, reactants, products)
   2631             # get mappings in forward direction
   2632             mappings_a = self._match_reactant_to_template(molecule_a, template.reactants[0].item)
-> 2633             mappings_b = self._match_reactant_to_template(molecule_b, template.reactants[1].item)
   2634             mappings = list(itertools.product(mappings_a, mappings_b))
   2635             # get 
mappings in the reverse direction

IndexError: list index out of range

Both of these errors can be traced back to family.py in rmgpy. The reactions in question are scraped from importer project and were able to make a transition state in py2. I'm seeing these errors after we updated to py3.

How to run AutoTST

Hello, developers. I have created AutoTST environment on my Mac, but there is no guide to tell us how to run or activate AutoTST simulation.

Because the AutoTST is based on RMG, my guess is that once we activate AutoTST environment (tst_env), we can run AutoTST as same as run RMG, for example, type " rmg.py input.py" in the terminal (it actually works).

Here are my questions, 1st, do we need to modify input file to tell RMG we need to activate AutoTST; 2nd, what and where is the output of AutoTST simulation, for example, when we finish RMG simulation, we can get a mechanism(chem.inp), for AutoTST what kind of files we are looking for?

Regard to the 2nd question, I run 2 RMG simulations under rmg_env environment and tst_env environment with the same input, respectively. After finishing these 2 simulations, I found that they have exactly the same output files, so I wonder if I activate AutoTST under the tst_env environment.

Below is my simple input file, just for the test.
input.txt

Looking forward to your response, thanks!!

Think about adding `templating` when generating TS guesses

From reading about other tools, many will template when generating a TS guess. From the AutoTS paper

In addition to interpolation, we have the ability to use geometric information from a previously located TS. We refer to this approach as templating and to the predefined TS geometries as templates. TS templates, along with the reactant and product geometries that they connect (as verified by an IRC calculation), are stored in an auxiliary computer file. Several string representations of the input reaction are generated and used to search this template file for matching reactions. Each representation is of a particular order, i, and is constructed by generating SMILES strings for the atoms in the reaction center plus all (i – 1)th nearest neighbors. We tend to use only three orders starting with i = 1. The SMILES patterns of the reactant and product structures are combined with the symbol “≫” as in reaction SMILES. The algorithm has the preference for matches at the highest order attainable.
Once it is recognized that a usable template exists, we extract it from the file and compute an atom correspondence between the input structures and template structures. This is done by utilizing an MCS algorithm, which computes a bijection for the reaction center atoms. The reaction matching described above guarantees that the reaction center and possibly some number of neighbors make up a substructure shared between the reactant and product.

Addition of Hindered Rotor Calculations to AutoTST

Currently, AutoTST can generate calculators and write input files for 1-D hindered rotor scans, however, there is a lot of fine tuning that needs to be done in order for this to be successfully implemented. Below is a general ToDo:

  • Adding hindered rotor scans to Arkane input files
  • Addressing discontinuous rotor scans when they occur

input file for Job

There are a lot of options for Kinetics and ThermoJob (optimize, conformer rmsd/energy cutoff, single_point, arkane,etc.). We could make this better by creating input files, similar to how Arkane works.

Be ready to run many jobs in parallel

For example, if you start 1000 reactions at once (on XSEDE) and 100 of them have a reactant in common, they will each ask themselves "do I have finished log files for this reactant?" and decide "no, I don't, so let me start calculating it". Then you are calculating the same reactant 100 times, probably fighting over saving the log files in the same folder.

This is the sort of thing we need to anticipate and avoid.

We should do something like a "HAZOP" of our code. Think about "too much flow?" etc. at each point. or "what if...".

Loading RMG-database and TS database

When estimating TS distances for a reaction, we currently load RMG-database and AutoTST database. For example, when we create an autotst H Abstraction reaction and want to generate a TS, AutoTST loads both the RMG-database H Abstraction kinetics tree and the AutoTST H Abstraction TS tree. I don't think its necessary to load the RMG H Abstraction kinetics tree.

Arkane Job Is Broken

Currently, AutoTST can optimize and converge on species and transition state geometries, but the writing of Arkane input files is now broken because of the recent merge. Will address it soon.

Add additional tests when validating IRCs

Currently the testing around validating IRCs is a bit lax (as seen in PR #52). We should make a test that reads an existing IRC calculation and validated (and / or invalidates it).

Add additional calculators to AutoTST

Due to lack of Gaussian on all computing clusters, we are hoping to add additional calculators to AutoTST:

  • NWChem
  • MolPro

Any additional suggestions?

log to TS

To update the TS database, we need to work backwards: generate a TS from a log file. We currently rely on the atom sorting labels to identify the atoms in the reaction center. However, sometimes the sorting labels in the log file don't match up with the TS sorting labels (if using a different version of ase for example). We could make the log to TS method more robust if we include key information in the gaussian log file needed to generate the TS, such as ts.reaction_label, ts.reaction_family, ts.direction, and ts.rmg_molecule.get_all_labeled_atoms(). We could do this by adding these attributes to the title block of the gaussian input file (unfortunately Gaussian doesn't print comments in output). However, ase's gaussian calculator does not have a title parameter (it just writes Gaussian input prepared by ASE), so we would need to change ase Gaussian code

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.