Machine Learning Enabled Development of Accurate Force Fields for Hydrofluorocarbons
Authors: Ning Wang, Montana Carlozo, Eliseo Marin-Rimoldi, Bridgette Befort, Alexander W. Dowling, and Edward J. Maginn
Introduction
HFC-FFO is a repository used to rapidly calibrate the Lennard-Jones (LJ) parameters of hydrofluorocarbon (HFC) force fields given experimental data. The key feature of this work is the use of machine learning tools in the form of Gaussian processes (GPs), which allow us to cheaply estimate the results of a molecular simulation given temperature state points and thermophysical property data.
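As a rough illustration of the surrogate idea, the sketch below fits a GP with a squared-exponential kernel to a handful of (parameter, property) pairs and predicts the property at an untested parameter value. It is a minimal pure-Python sketch, not the GP implementation used in this repository; the function names, the kernel hyperparameters, and the data values are all hypothetical.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_new, length_scale=1.0, jitter=1e-8):
    """Posterior mean at x_new for a GP fitted to mean-centered training data."""
    n = len(x_train)
    y_mean = sum(y_train) / n
    residuals = [y - y_mean for y in y_train]
    # Kernel matrix with a small diagonal jitter for numerical stability.
    k = [
        [rbf_kernel(x_train[i], x_train[j], length_scale) + (jitter if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve_linear(k, residuals)
    k_star = [rbf_kernel(x_new, xi, length_scale) for xi in x_train]
    return y_mean + sum(ks * a for ks, a in zip(k_star, alpha))


# Hypothetical data: one normalized LJ parameter vs. simulated liquid density (kg/m^3).
x_train = [0.0, 0.25, 0.5, 0.75, 1.0]
y_train = [1180.0, 1135.0, 1102.0, 1078.0, 1061.0]
print(gp_posterior_mean(x_train, y_train, 0.6, length_scale=0.3))
```

In the actual workflow the GP inputs include several force field parameters as well as temperature; the one-dimensional input here only keeps the example short.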
Citation
This work has been submitted for review. In the meantime, you
may cite https://doi.org/10.1021/acs.jctc.3c00338 as:
Ning Wang, Montana N. Carlozo, Eliseo Marin-Rimoldi, Bridgette J. Befort, Alexander W. Dowling, and Edward J. Maginn*, “Machine Learning-Enabled Development of Accurate Force Fields for Refrigerants”, J. Chem. Theory Comput., 2023, 19, 14, 4546–4558
Available Data
HFC Parameter Sets
The non-dominated and best parameter sets for each HFC are
provided under HFC-FFO/rXX/analysis/csv/, where XX identifies the
HFC (for example r-143a, r-14, or r-170). The non-dominated
sets are found in rXX-pareto.csv, and
the best sets are found in rXX-final.csv. The parameter values in the csv files are
normalized between 0 and 1 based upon the parameter bounds for each
atom type (see the manuscript, or HFC-FFO/rXX/analysis/utils/rXX.py,
for definitions of the upper and lower parameter bounds for each refrigerant).
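To recover physical parameter values from the csv files, the normalization has to be inverted. The sketch below assumes a simple linear (min-max) scaling, which is our reading of "normalized between 0 and 1 based upon the parameter bounds"; the function name and the bounds shown are hypothetical placeholders, and the real bounds should be taken from the manuscript or from rXX.py.

```python
def denormalize(scaled, lower, upper):
    """Map values in [0, 1] back to physical units, assuming linear min-max scaling."""
    if not (len(scaled) == len(lower) == len(upper)):
        raise ValueError("scaled, lower, and upper must have the same length")
    return [lo + s * (hi - lo) for s, lo, hi in zip(scaled, lower, upper)]


# Hypothetical bounds for one atom type: sigma in Angstrom, epsilon in K.
lower_bounds = [2.0, 10.0]
upper_bounds = [4.0, 70.0]

# One row of normalized values as it might appear in a csv file.
print(denormalize([0.5, 0.25], lower_bounds, upper_bounds))
```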
Molecular Simulations Inputs
All molecular simulations were performed inside HFC-FFO/r##/runs,
where it exists.
Each iteration was managed with signac-flow. Inside each
directory in runs, you will find all the files necessary to
run the simulations. Note that you may not get exactly the same simulation
results due to differences in software versions, random seeds, etc.
Nonetheless, all of the results from our molecular simulations are saved
under HFC-FFO/analysis/csv/rXX-YY-iterZZ-results.csv, where XX
is the molecule, YY is the stage (liquid density or VLE), and
ZZ is the iteration number.
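The naming scheme can be handled programmatically, for instance when collecting results across iterations. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the stage appears in file names as "density" or "vle" (inferred from the directory and file names used elsewhere in this README) and that the molecule is written without hyphens, such as r143a.

```python
import re

# Assumed pattern: rXX-YY-iterZZ-results.csv with YY in {density, vle}.
RESULT_NAME = re.compile(
    r"^(?P<molecule>r[0-9a-z]+)-(?P<stage>density|vle)-iter(?P<iteration>\d+)-results\.csv$"
)


def result_name(molecule, stage, iteration):
    """Build a results file name from its three components."""
    return f"{molecule}-{stage}-iter{iteration}-results.csv"


def parse_result_name(name):
    """Split a results file name into molecule, stage, and iteration."""
    match = RESULT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized results file name: {name}")
    return {
        "molecule": match.group("molecule"),
        "stage": match.group("stage"),
        "iteration": int(match.group("iteration")),
    }


print(parse_result_name(result_name("r143a", "vle", 2)))
```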
Surrogate Modeling Analysis
All of the scripts for the surrogate modeling are provided in
HFC-FFO/r##/analysis, following the same naming structure as
the csv files.
Figures
All scripts required to generate the primary figures in the
manuscript are reported under HFC-FFO/r##/analysis/final-figs, and the
associated PDF files are located under
HFC-FFO/r##/analysis/final-figs/pdfs.
Installation
To run this software, you must have access to all packages in the hfcs-fffit environment (hfcs-fffit.yml), which can be installed using the instructions below.
This package has a number of requirements that can be installed in different ways. We recommend using a conda environment to manage most of the installation and dependencies. However, some items will need to be installed from source or pip.
Running the simulations will also require an installation of GROMACS.
This can be installed separately (see the installation instructions at
https://manual.gromacs.org/documentation/2021.2/install-guide/index.html).
An example of the procedure is provided below:
# First clone hfcs-fffit and install pip/conda available dependencies
# with a new conda environment named hfcs-fffit
git clone git@github.com:dowlinglab/hfcs-fffit.git
cd hfcs-fffit/
conda create --name hfcs-fffit python=3.7 -c conda-forge
conda activate hfcs-fffit
python3 -m pip install -r requirements-pip.txt
conda install --file requirements-conda.txt -c conda-forge
cd ../
# Now clone and install other dependencies
git clone git@github.com:dowlinglab/fffit.git
# Checkout the v0.1 release of fffit and install
cd fffit/
git checkout tags/v0.1
pip install .
cd ../
# Checkout the v0.1 release of block average and install
git clone git@github.com:rsdefever/block_average.git
cd block_average/
git checkout tags/v0.1
pip install .
cd ../
Usage
Liquid Density Optimization
NOTE: We use signac and signac-flow (https://signac.io/)
to manage the setup and execution of the molecular simulations. These
instructions assume a working knowledge of that software.
The first iteration of the liquid density simulations was
performed under HFC-FFO/r##/runs/rXX-density-iter1/.
To run liquid density iterations, follow these steps:
- Create the initial configuration
- Prepare rXX_gaff.xml
- Go to the data folder and use the run.sh file
conda activate hfcs-fffit
cd HFC-FFO/rXX/run/rXX-density-iter1/data
source run.sh
- Initialize signac workflow
- Leave HFC-FFO/rXX/run/rXX-density-iter1/data untouched
- Initialize files for simulation use
cd HFC-FFO/rXX/run/rXX-density-iter1/
python init.py
- Check status a few times throughout the process
python project.py status
- Create force fields and generate inputs
python project.py run -o create_forcefield
python project.py run -o generate_inputs
- Create systems
- Note: if you make a mistake, rm -r workspace/ signac_project_document.json signac.rc will remove everything and allow you to start fresh
python project.py run -o create_system
- Fix topology
python project.py run -o fix_topology
- Run simulation
python project.py submit -o simulate --bundle=24 --parallel
- Calculate density
python project.py submit -o calculate_density --bundle=24 --parallel
- After each liquid density (LD) iteration, extract the density by running the following in the analysis/ folder
python extract_rXX_density.py ZZ
- Run the GP optimization and get samples for the next iteration in the analysis/ folder
module load gcc/11.2.0
python id-new-samples.py
python plotfig_gp_examples.py
VLE Optimization
To run vapor-liquid equilibrium (VLE) iterations, follow these steps:
- Use analysis/csv/rXX-vle-iter1-params.csv to initialize files for simulation use
cd HFC-FFO/rXX/run/rXX-vle-iter1/
python init.py
- Check status a few times throughout the process
python project.py status
- Create force fields
python project.py run -o create_forcefield
- Calculate vapor/liquid box size
python project.py run -o calc_vapboxl
python project.py run -o calc_liqboxl
- Run simulation
python project.py submit -o equilibrate_liqbox --bundle=12 --parallel
python project.py run -o extract_final_liqbox
python project.py submit -o run_gemc --bundle=12 --parallel
- Calculate VLE Properties
python project.py run -o calculate_props
- After each VLE iteration, extract the VLE properties by running the following in the analysis/ folder
python extract_rXX_vle.py ZZ
- Analyze Data
module load gcc/11.2.0
python id-new-samples.py
python get-new-samples.py
python analysis.py
Final Analysis
The non-dominated parameter sets and final processing steps can be run using the following:
- Summarize data
cd HFC-FFO/rXX/run/rXX-vle-iterKK
python id-pareto.py
cd HFC-FFO/rXX/analysis/final-analysis
python select_final.py
- The plots in the paper were generated by the scripts in HFC-FFO/rXX/analysis/final-figs/
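For readers unfamiliar with the term, a parameter set is non-dominated when no other set is at least as good in every objective and strictly better in at least one. The sketch below illustrates that filter for objectives that are minimized (for example, property errors). It is not the logic of id-pareto.py; the function names and the error values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Return the points that are not dominated by any other point (minimization)."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical errors (%) in two properties for four candidate parameter sets.
errors = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
print(non_dominated(errors))
```

Here the third candidate is dropped because the second is better in both objectives, while the remaining three each represent a different trade-off.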
Credits
This work is funded by the National Science Foundation, EFRI DChem: Next-generation Low Global Warming Refrigerants, Award no. 2029354, and uses the computing resources provided by the Center for Research Computing (CRC) at the University of Notre Dame. The authors would like to thank Bridgette Befort, whose work is the basis of this method.
Contact
Please contact Ning Wang ([email protected]), Eliseo Marin-Rimoldi ([email protected]), or Montana Carlozo ([email protected]) with any questions, suggestions, or issues.