Enerzyme

Towards a next-generation machine learning force field for enzymatic catalysis.

Current model architectures:

Usage

Installation

Recommended environment

python==3.10.12
pip==23.2.1
setuptools==68.1.2
h5py==3.9.0
numpy==1.24.3
addict==2.4.0
tqdm==4.66.1
joblib==1.3.2
pandas==2.1.0
pytorch==2.0.1
scikit-learn==1.3.0
ase==3.22.1
transformers==4.33.1
torch-ema==0.3
pyyaml==6.0.1

pip install -e .

Training

Energy (force) / Atomic Charge / Dipole moment fitting.

enerzyme train -c <configuration yaml file> -o <output directory>

Please see enerzyme/config/train.yaml for details and recommended configurations.

Enerzyme saves the preprocessed dataset, split indices, final <configuration yaml file>, and the best model to the <output directory>.

Evaluation

Energy (force) / Atomic Charge / Dipole moment prediction.

enerzyme predict -c <configuration yaml file> -o <output directory> -m <model directory>

Please see enerzyme/config/predict.yaml for details.

Enerzyme reads the model configuration from the <model directory>, loads the models, runs prediction with all active models, saves the predicted values as a pickle file in each model's subfolder, and reports the results as a csv file in the <output directory>.

Simulation

Supported simulation types:

  • Flexible scanning on the distance between two atoms.
  • Constrained Langevin MD.

enerzyme simulate -c <configuration yaml file> -o <output directory> -m <model directory>

Enerzyme reads the model configuration from the <model directory>, loads the models, runs the simulation, and reports the results in the <output directory>.

enerzyme's People

Contributors

benzoin96485

enerzyme's Issues

Optimize the storage of the full neighbor list

The full neighbor list scales as O($N^2$) with the system size and takes up a lot of disk space when the preprocessed dataset is stored. If the dataset contains only one system in different configurations, a single full neighbor list is enough. The storage of atom types and total charges can be optimized in the same way.
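
As a rough illustration of the saving, the sketch below builds the full neighbor list (every ordered pair with i != j) once from the atom count and reuses it for all configurations of the same system. The function name `full_neighbor_list`, the array layout, and the random data are assumptions made for this example, not Enerzyme's actual preprocessing code.

```python
import numpy as np

def full_neighbor_list(n_atoms):
    """Return index arrays (idx_i, idx_j) of all ordered pairs with i != j."""
    idx_i, idx_j = np.nonzero(~np.eye(n_atoms, dtype=bool))
    return idx_i, idx_j

# Hypothetical dataset: 1000 configurations of one 5-atom system.
n_conf, n_atoms = 1000, 5
positions = np.random.default_rng(0).normal(size=(n_conf, n_atoms, 3))

# The pair indices depend only on the atom count,
# so one copy can serve every configuration.
idx_i, idx_j = full_neighbor_list(n_atoms)

# Pairwise distances for all configurations, using the shared index arrays.
distances = np.linalg.norm(positions[:, idx_i] - positions[:, idx_j], axis=-1)

shared_entries = idx_i.size               # N * (N - 1) pairs stored once
per_config_entries = n_conf * idx_i.size  # the same pairs stored per configuration
```

Storing the indices once reduces the neighbor list footprint by a factor equal to the number of configurations; the positions still have to be stored per configuration.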

General-purpose dispersion layer: DFT-D4

The DFT-D4 energy calculation should be separated from SpookyNet. A general layer that takes positions as input and returns atomic dispersion energies is better for extensibility.

Support and standardize the format of datasets

Since npz and hdf5 are better suited to storing huge datasets, enerzyme should support and standardize reading data from them. The pickled format should be further standardized, too.

Feature calculation, preloading and storage

For fast computation and rigorous reproducibility across different models, the data split and the processed features (scaled and shifted energies, atomic numbers, neighbor lists, batch segmentations, ...) should be stored for reuse.

General-purpose Coulomb layer

The Coulomb energy calculation should be separated from specific models. A general layer that takes atomic charges and positions as input and returns atomic Coulomb energies is better for extensibility.
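
A minimal sketch of what such a model-independent interface could look like is given below, using plain NumPy rather than a trainable layer. It evaluates the undamped point-charge Coulomb sum and assigns half of each pair energy to each atom. The function name, the choice of units (eV, Angstrom, elementary charges), and the absence of any short-range damping are assumptions for illustration; the models in the repository may treat electrostatics differently.

```python
import numpy as np

# Coulomb constant e^2 / (4 * pi * eps0) in eV * Angstrom (assumed unit system).
KE_EV_ANGSTROM = 14.399645

def atomic_coulomb_energy(charges, positions):
    """Per-atom Coulomb energy: E_i = 0.5 * k * sum_{j != i} q_i * q_j / r_ij."""
    charges = np.asarray(charges, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Pairwise distance matrix r[i, j].
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    # An infinite self-distance makes the i == j term vanish.
    np.fill_diagonal(r, np.inf)
    pair = KE_EV_ANGSTROM * charges[:, None] * charges[None, :] / r
    # Each pair appears twice in the matrix, so half goes to each atom.
    return 0.5 * pair.sum(axis=1)

# Two opposite unit charges 2 Angstrom apart.
charges = [1.0, -1.0]
positions = [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
e_atomic = atomic_coulomb_energy(charges, positions)
e_total = e_atomic.sum()
```

Because the function only needs charges and positions, any model that predicts atomic charges could call it without knowing about the other model internals.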

Package distribution through setup.py

Write a setup.py so that the package can be installed locally. An enerzyme command should be registered that invokes main.py for training, evaluation or simulation.
The immediate motivation for this issue is to separate the models currently in use from the models under development.

General-purpose dispersion layer: DFT-D3

The DFT-D3 energy calculation should be separated from PhysNet. A general layer that takes positions as input and returns atomic dispersion energies is better for extensibility.

Support EMA training scheme

Exponential moving average (EMA) is a training scheme widely used in machine learning force fields, including PhysNet, SpookyNet, and NequIP. It is believed to improve convergence, so its effect is worth studying. The implementation in the NequIP repo can serve as a reference.
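
The idea can be sketched in a few lines: a shadow copy of the parameters is updated after every optimizer step as shadow = decay * shadow + (1 - decay) * parameter, and the shadow copy is used for validation and for the saved model. The class below is a framework-free illustration with NumPy arrays standing in for model parameters; the class name and the decay value are assumptions, and it is not the NequIP implementation.

```python
import numpy as np

class ParameterEMA:
    """Keeps an exponential moving average of a list of parameter arrays."""

    def __init__(self, params, decay=0.99):
        self.decay = decay
        # Independent copies, so later in-place training updates do not leak in.
        self.shadow = [np.array(p, dtype=float) for p in params]

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current parameters
        for s, p in zip(self.shadow, params):
            s *= self.decay
            s += (1.0 - self.decay) * np.asarray(p, dtype=float)

# Toy "training loop": the raw weights jump by 1.0 per step,
# while the averaged weights follow more slowly.
weights = [np.zeros(2)]
ema = ParameterEMA(weights, decay=0.9)
for step in range(3):
    weights[0] += 1.0    # stand-in for an optimizer step
    ema.update(weights)  # called once after every step
```

In this toy run the raw weights end at 3.0 while the averaged weights lag behind, which is the smoothing effect the scheme relies on.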

SpookyNet refactor

Support SpookyNet through model builders, as is done for PhysNet, and share the common modules and layers between the two.

Add total energy scaling and shifting transformation

This type of data normalization is not well defined when priors such as ZBL repulsion, electrostatic energy and dispersion correction are introduced. However, a comparison should still be made, especially against the vanilla NequIP results, to make sure that any advantage in the results does not come from data normalization.
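
For reference, the plain form of this transformation (without any physical priors) can be sketched as follows: the shift and scale are fitted on the training energies only, the model is trained on the normalized targets, and predictions are mapped back with the inverse transformation. The function names and the use of the mean and standard deviation of total energies are assumptions for illustration; per-atom or per-element variants are also common.

```python
import numpy as np

def fit_shift_scale(energies):
    """Fit the shift (mean) and scale (standard deviation) on training energies."""
    energies = np.asarray(energies, dtype=float)
    return energies.mean(), energies.std()

def normalize(energies, shift, scale):
    """Map raw energies to the targets the model is trained on."""
    return (np.asarray(energies, dtype=float) - shift) / scale

def denormalize(normalized, shift, scale):
    """Map model outputs back to the original energy units."""
    return np.asarray(normalized, dtype=float) * scale + shift

# Hypothetical training energies.
train_energies = [-10.0, -12.0, -14.0]
shift, scale = fit_shift_scale(train_energies)
targets = normalize(train_energies, shift, scale)
recovered = denormalize(targets, shift, scale)
```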

Add energy decomposition monitoring

Since electrostatic, ZBL repulsion and dispersion correction layers are used, it is important to monitor how their contributions evolve over the course of training. The first step is to provide a monitoring option that reports the averaged energy terms on the training and validation sets at every epoch.
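
One possible shape for such a monitor is sketched below: each batch contributes its per-structure energy terms, and a per-epoch report returns the mean of every term and resets the accumulators. The class name, the term names, and the numbers are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict

class EnergyTermMonitor:
    """Accumulates per-structure energy terms and reports their epoch means."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def update(self, batch_terms):
        # batch_terms maps a term name to a list of per-structure energies.
        for name, values in batch_terms.items():
            self._sums[name] += sum(values)
            self._counts[name] += len(values)

    def report(self):
        # Mean of every term over all structures seen since the last report.
        means = {name: self._sums[name] / self._counts[name] for name in self._sums}
        self._sums.clear()
        self._counts.clear()
        return means

# One "epoch" consisting of two batches with made-up values.
monitor = EnergyTermMonitor()
monitor.update({"electrostatic": [-1.0, -3.0], "zbl": [0.5, 0.5], "dispersion": [-0.2, -0.4]})
monitor.update({"electrostatic": [-2.0], "zbl": [0.2], "dispersion": [-0.3]})
epoch_means = monitor.report()
```

Keeping one monitor instance for the training set and another for the validation set would give the two curves the issue asks for.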
