reactionmechanismgenerator / autotst Goto Github PK
View Code? Open in Web Editor NEWAutoTST: A framework to perform automated transition state theory calculations
License: Other
AutoTST: A framework to perform automated transition state theory calculations
License: Other
The package 'descriptastorus' has had a recent fix in the naming convention for the attribute 'gibrat' - https://github.com/bp-kelley/descriptastorus/blob/86eedc60546abe6f59cdbcb12025a61157ba178d/descriptastorus/descriptors/dists.py#L207C1-L207C1 as in scipy they used to refer it to gilbrat
but have since fixed it to be gibrat
(notice the removal of the l
). This is not reflected n the git repository but there was an upload of a new version, 2.6.0, in the conda-forge channel: https://anaconda.org/conda-forge/descriptastorus. And so, the fix up is included in this version
However, since AutoTST is kept to python 3.7.* (I believe due to RMG), it installs an older version of scipy (1.7.3), whereas descriptastorus is expecting scipy>=1.9.0. and therefore, the older version still maintains the attribute naming convention as gilbrat
Gaussian crashes when $GAUSS_SCRDIR is set to /tmp in Discovery
sample error message
PGFIO/stdio: No such file or directory
PGFIO-F-/OPEN/unit=11/error code returned by host stdio - 2.
File name = /tmp/farina.d/autotst/Gau-48004.inp
In source file ml0.f, at line number 197
g16() [0xb91318]
g16() [0xb926da]
g16() [0xb9205c]
g16() [0x404619]
g16() [0x40397d]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaef1d78445]
g16(sched_setaffinity+0xa1) [0x4038a9]
/var/spool/slurm/d/job5752911/slurm_script: line 3: 48004 Aborted
Conformer
objects generated by HotBit
calculator leads to TypeError: cannot serialize '_io.TextIOWrapper'
error. Googling this error resulted in several posts stating that this is unsolvable and inherent in multiprocessing
and I don't know where I should look if I were to fix it. I'm just opening this issue to keep track of it.Currently, AutoTST uses RMG's generate_resonance_structures()
method to automatically generate resonance structures when an autotst species is initialized. However, occasionally, this resonance generation method raises a ValueError
when trying to create a charged species. This is happening to some of the weird HAN nitrogen intermediates. This is the traceback when trying to generate a species from smiles string 'OO[N+](O)[O-]'
Traceback (most recent call last):
File "molecules.py", line 103, in <module>
species = Species(smiles=[smiles])
File "/home/farina.d/AutoTST/autotst/species.py", line 143, in __init__
species_list.append(molecule.generate_resonance_structures())
File "rmgpy/molecule/molecule.py", line 2089, in rmgpy.molecule.molecule.Molecule.generate_resonance_structures
File "rmgpy/molecule/molecule.py", line 2091, in rmgpy.molecule.molecule.Molecule.generate_resonance_structures
File "rmgpy/molecule/resonance.py", line 150, in rmgpy.molecule.resonance.generate_resonance_structures
File "rmgpy/molecule/resonance.py", line 185, in rmgpy.molecule.resonance.generate_resonance_structures
ValueError: Got the following structure:
SMILES: OO[N+2](O)[O-]
AdjacencyList:
1 O u0 p2 c0 {3,S} {5,S}
2 O u0 p2 c0 {5,S} {6,S}
3 O u0 p2 c0 {1,S} {7,S}
4 O u0 p3 c-1 {5,S}
5 N u0 p0 c+2 {1,S} {2,S} {4,S}
6 H u0 p0 c0 {2,S}
7 H u0 p0 c0 {3,S}
Net charge: 1
Currently RMG cannot process charged species correctly.
If this structure was entered in SMILES, try using the adjacencyList format for an unambiguous definition.
At very least we should document that setting this is necessary.
It'd be nicer if it wasn't necessary.
Is it only used for unit tests? What is it in fact used for? Can we instead detect the path based on the currently running file or something?
Symmetry numbers calculated using RMG's symmetry
package tend to result in incorrect estimations of symmetry numbers. Below is an example of ethane which should have a symmetry number of 6. This issue arose when creating PR #48. Some investigation into alternate packages may remedy this.
In [1]: from autotst.species import Conformer
In [2]: conf = Conformer("CC")
In [3]: conf.calculate_symmetry_number()
symmetry.py:253 write_input_file INFO Symmetry input file written to ./CC.symm
symmetry.py:219 parse INFO Point group: C1
Out[3]: 1
Currently there are no unit tests for AutoTST, we should probably add these for the following files:
ideas to reduce computational cost:
Unit tests currently exist but are very sparse right now for the update methods. This was caught in PR #43 but will be addressed within this issue
Occasionally when running conformational analysis, the following error is returned:
Traceback (most recent call last):
File "transitionstates.py", line 36, in <module>
rxn.generate_conformers(calculator=Hotbit())
File "/home/harms.n/Code/AutoTST/autotst/reaction.py", line 510, in generate_conformers
conformers = systematic_search(conformer, delta=60)
File "/home/harms.n/Code/AutoTST/autotst/conformer/systematic.py", line 271, in systematic_search
is_close = (np.sqrt(((df.distances[index] - df.distances)**2).apply(np.mean)) > 0.1)
File "/home/harms.n/anaconda2/envs/rmg_env/lib/python2.7/site-packages/pandas/core/ops.py", line 745, in wrapper
dtype=dtype,
File "/home/harms.n/anaconda2/envs/rmg_env/lib/python2.7/site-packages/pandas/core/ops.py", line 653, in _construct_result
return left._constructor(result, index=index, name=name, dtype=dtype)
File "/home/harms.n/anaconda2/envs/rmg_env/lib/python2.7/site-packages/pandas/core/series.py", line 264, in __init__
raise_cast_failure=True)
File "/home/harms.n/anaconda2/envs/rmg_env/lib/python2.7/site-packages/pandas/core/series.py", line 3275, in _sanitize_array
raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional
This occurs after AutoTST has generated an ensemble of conformers, but is having an issue when identifying unique low energy conformers. Probably going to run through conformers one at a time rather than using Pandas logic to identify unique conformers.
I've been noticing that there is a bug in reaction.py
that causes some reactions to not be identified properly when trying to create an AutoTST Reaction object. It seems like H_Abstraction is just fine, but there are a few errors with the other families. Specifically, when looking at Sarathy's butanol model, the current master of AutoTST identified the following reactions:
We identified 143 possible R_Addition_MultipleBond reactions
We identified 855 possible H_Abstraction reactions
We identified 0 possible intra_H_migration reactions
And a total of 998 reactions
There should be:
AutoTST uses cclib == v.1.6.2 which does not support the new openbabel >= 3.0 import syntax. A new environment made recently for AutoTST contains cclib 1.6.2
and openbabel 3.1.1
, leading to a crash when cclib is called and tries to import OB. The most recent release of cclib (1.7) does import OB >= 3.0 correctly.
Create a new tst_env and run AutoTST.
For the libraries to not collide. The root problem is probably that AutoTST requires cclib to be exactly v. 1.6.2, while this version of cclib does not cap the version of OB to <= 3.0, leading to a collision.
Additional context
I don't know whether cclib v. 1.6.2 is a hard requirement in AutoTST, if it's not, I'd suggest changing to cclib >= 1.7
Conformational changes like inversion of chiral centers and rotation about double bonds rarely occur when reactants go to TSs (e.g. a reactant containing a cis double bond will probably have a TS with that corresponding double bond being cis). This is similar to the templating idea proposed in #33, but with a variation. I'm proposing that we do the following when building a TS geometry:
This would require us to do the following:
We should add type hinting support for AutoTST methods. It makes it easier to refactor and debug the code. RMG has some partial support in some modules for type hinting so we can start off from there. An example method from RMG.
def get_thermo_data(self, molecule: Union[Molecule, str]) -> ThermoData:
"""
Return thermodynamic parameters corresponding to a given
:class:`Molecule` object `molecule` or a SMILES string.
Returns: ThermoData
"""
I'm getting the following errors while trying to get a TS object.
Action Error for this reaction CC1=C[CH]C=CC1=O+CCCCO_CC1=CC=CC=C1O+CCC[CH]O1
ActionError Traceback (most recent call last)
<ipython-input-160-bddd08767730> in <module>
----> 1 rxn.get_labeled_reaction()
~/Code/RMG-Py3/AutoTST/autotst/reaction.py in get_labeled_reaction(self)
337 try:
338 labeled_r, labeled_p = family.get_labeled_reactants_and_products(
--> 339 test_reaction.reactants, test_reaction.products)
340 except ValueError:
341 continue
~/Code/RMG-Py3/RMG-Py/rmgpy/data/kinetics/family.py in get_labeled_reactants_and_products(self, reactants, products)
2672 # raise exception
2673 if num_mappings > 0:
-> 2674 raise ActionError('Something wrong with products that RMG cannot find a match!')
2675
2676 return None, None
ActionError: Something wrong with products that RMG cannot find a match!
while IndexError for this reaction
C1=CC=C(C=CC2=CC=CC=C2)C=C1+[O]O_C(=CC1=CC=CC=C1)=C1C=C[CH]C=C1+OO
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-164-a9a69e534d3e> in <module>
----> 1 rxn.ts
~/Code/RMG-Py3/AutoTST/autotst/reaction.py in ts(self)
116 if self._ts is None:
117 ts_dict = {}
--> 118 for direction, complex in self.get_rmg_complexes().items():
119 ts = TS(
120 reaction_label=self.label,
~/Code/RMG-Py3/AutoTST/autotst/reaction.py in get_rmg_complexes(self)
521
522 if self.rmg_reaction is None:
--> 523 self.get_labeled_reaction()
524
525 reactant_complex = RMGMolecule()
~/Code/RMG-Py3/AutoTST/autotst/reaction.py in get_labeled_reaction(self)
337 try:
338 labeled_r, labeled_p = family.get_labeled_reactants_and_products(
--> 339 test_reaction.reactants, test_reaction.products)
340 except ValueError:
341 continue
~/Code/RMG-Py3/RMG-Py/rmgpy/data/kinetics/family.py in get_labeled_reactants_and_products(self, reactants, products)
2631 # get mappings in forward direction
2632 mappings_a = self._match_reactant_to_template(molecule_a, template.reactants[0].item)
-> 2633 mappings_b = self._match_reactant_to_template(molecule_b, template.reactants[1].item)
2634 mappings = list(itertools.product(mappings_a, mappings_b))
2635 # get
mappings in the reverse direction
IndexError: list index out of range
Both of these errors can be traced back to family.py
in rmgpy. The reactions in question are scraped from importer project and were able to make a transition state in py2. I'm seeing these errors after we updated to py3.
Hello, developers. I have created AutoTST environment on my Mac, but there is no guide to tell us how to run or activate AutoTST simulation.
Because the AutoTST is based on RMG, my guess is that once we activate AutoTST environment (tst_env), we can run AutoTST as same as run RMG, for example, type " rmg.py input.py" in the terminal (it actually works).
Here are my questions, 1st, do we need to modify input file to tell RMG we need to activate AutoTST; 2nd, what and where is the output of AutoTST simulation, for example, when we finish RMG simulation, we can get a mechanism(chem.inp), for AutoTST what kind of files we are looking for?
Regard to the 2nd question, I run 2 RMG simulations under rmg_env environment and tst_env environment with the same input, respectively. After finishing these 2 simulations, I found that they have exactly the same output files, so I wonder if I activate AutoTST under the tst_env environment.
Below is my simple input file, just for the test.
input.txt
Looking forward to your response, thanks!!
From reading about other tools, many will template
when generating a TS guess. From the AutoTS paper
In addition to interpolation, we have the ability to use geometric information from a previously located TS. We refer to this approach as templating and to the predefined TS geometries as templates. TS templates, along with the reactant and product geometries that they connect (as verified by an IRC calculation), are stored in an auxiliary computer file. Several string representations of the input reaction are generated and used to search this template file for matching reactions. Each representation is of a particular order, i, and is constructed by generating SMILES strings for the atoms in the reaction center plus all (i – 1)th nearest neighbors. We tend to use only three orders starting with i = 1. The SMILES patterns of the reactant and product structures are combined with the symbol “≫” as in reaction SMILES. The algorithm has the preference for matches at the highest order attainable.
Once it is recognized that a usable template exists, we extract it from the file and compute an atom correspondence between the input structures and template structures. This is done by utilizing an MCS algorithm, which computes a bijection for the reaction center atoms. The reaction matching described above guarantees that the reaction center and possibly some number of neighbors make up a substructure shared between the reactant and product.
Currently, AutoTST can generate calculators and write input files for 1-D hindered rotor scans, however, there is a lot of fine tuning that needs to be done in order for this to be successfully implemented. Below is a general ToDo:
I just saw that ASE now has native TS optimizers and IRC calculations for Gaussian. This may be worth looking into 🤔
See the link below:
https://wiki.fysik.dtu.dk/ase/dev/ase/calculators/gaussian.html
There are a lot of options for Kinetics and ThermoJob (optimize, conformer rmsd/energy cutoff, single_point, arkane,etc.). We could make this better by creating input files, similar to how Arkane works.
For example, if you start 1000 reactions at once (on XSEDE) and 100 of them have a reactant in common, they will each ask themselves "do I have finished log files for this reactant?" and decide "no, I don't, so let me start calculating it". Then you are calculating the same reactant 100 times, probably fighting over saving the log files in the same folder.
This is the sort of thing we need to anticipate and avoid.
We should do something like a "HAZOP" of our code. Think about "too much flow?" etc. at each point. or "what if...".
When estimating TS distances for a reaction, we currently load RMG-database and AutoTST database. For example, when we create an autotst H Abstraction reaction and want to generate a TS, AutoTST loads both the RMG-database H Abstraction kinetics tree and the AutoTST H Abstraction TS tree. I don't think its necessary to load the RMG H Abstraction kinetics tree.
Currently, AutoTST can optimize and converge on species and transition state geometries, but the writing of Arkane input files is now broken because of the recent merge. Will address it soon.
Currently the testing around validating IRCs is a bit lax (as seen in PR #52). We should make a test that reads an existing IRC calculation and validated (and / or invalidates it).
Due to lack of Gaussian on all computing clusters, we are hoping to add additional calculators to AutoTST:
Any additional suggestions?
To update the TS database, we need to work backwards: generate a TS from a log file. We currently rely on the atom sorting labels to identify the atoms in the reaction center. However, sometimes the sorting labels in the log file don't match up with the TS sorting labels (if using a different version of ase for example). We could make the log to TS method more robust if we include key information in the gaussian log file needed to generate the TS, such as ts.reaction_label
, ts.reaction_family
, ts.direction
, and ts.rmg_molecule.get_all_labeled_atoms()
. We could do this by adding these attributes to the title block of the gaussian input file (unfortunately Gaussian doesn't print comments in output). However, ase's gaussian calculator does not have a title
parameter (it just writes Gaussian input prepared by ASE
), so we would need to change ase Gaussian code
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.