The QMC4AQML dataset
Authors:
- Bing Huang (University of Vienna, [email protected])
- O. Anatole von Lilienfeld (University of Toronto, [email protected])
- Jaron T. Krogel (Oak Ridge National Laboratory, [email protected])
- Anouar Benali (Oak Ridge National Laboratory, [email protected])
- Purpose: testing multi-level quantum machine learning models that aim for accuracy at the quantum Monte Carlo (QMC) level
- Molecular data are provided in the extxyz format (https://wiki.fysik.dtu.dk/ase/ase/io/formatoptions.html#extxyz).
- Geometries were optimized at the B3LYP/cc-pVTZ level.
- All FN-QMC energies were obtained using the B3LYP/cc-pVTZ nodal surfaces.
  - retrievable through the key `qmc_b3lyp` in the extxyz file
- Quantum chemical (QC) levels of theory include
  - HF and DFT (PBE, B3LYP)
  - post-HF levels of theory: MP2, CCSD(T)
  - the basis set is fixed to cc-pVTZ
  - single-point energies (SPE) are retrievable through keys of the format `theory+vtz`
    - e.g., the B3LYP/cc-pVTZ energy is assigned the key `b3lypvtz` (which is identical to keyb5vlypvtz)
- The test molecules are located in the folder `targets/` (all energies in Hartree)
  - totalling 50 molecules, drawn randomly from the QM9 dataset
  - calculated at the FN-QMC level and at various other, cheaper levels
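As a rough illustration of how the energy keys described above might be read, the sketch below parses the comment line (line 2) of a single extxyz frame using only the Python standard library. It assumes that the energies are stored as `key=value` pairs on that comment line, which this README does not state explicitly. The helper name `read_extxyz_info` is hypothetical, and the sample frame (geometry and energy values) is a made-up placeholder, not data from this dataset.

```python
import shlex


def read_extxyz_info(text):
    """Return (atom count, key=value pairs) from one extxyz frame given as text."""
    lines = text.splitlines()
    natoms = int(lines[0])  # line 1: number of atoms
    info = {}
    # Line 2: whitespace-separated key=value pairs; shlex keeps quoted values intact.
    for token in shlex.split(lines[1]):
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip bare words that carry no value
        try:
            info[key] = float(value)
        except ValueError:
            info[key] = value  # non-numeric entries stay as strings
    return natoms, info


# Placeholder frame: geometry and energies are invented for illustration only.
sample = """2
Properties=species:S:1:pos:R:3 qmc_b3lyp=-1.0 b3lypvtz=-2.0
H 0.0 0.0 0.0
H 0.0 0.0 0.74
"""

natoms, info = read_extxyz_info(sample)
print(natoms, info["qmc_b3lyp"], info["b3lypvtz"])
```

For a real file, the same function could be applied to the text of, e.g., `targets/frag_0001.xyz`. If ASE is installed, `ase.io.read` with the extxyz format should expose comment-line entries through `atoms.info`, provided the energies are indeed stored there.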
Two sets of amons are available:
- graph amons (with QMC energy) of the QM9 dataset (in the folder `amons-ni5-qmc/`) (all energies in kcal/mol)
  - made up of at most 5 heavy atoms
  - totalling 1175
  - each corresponds to a force-field global minimum
- conformer amons (without QMC energy) for the 50 test molecules (in `amons-ni7/`) (all energies in Hartree)
  - the maximal number of heavy atoms is now 7
  - include all conformational degrees of freedom
    - that is, multiple conformer amons may share the same SMILES string
  - the conformer amons for, say, the first test molecule (`targets/frag_0001.xyz`) are `amons-ni7/frag_0001_a*xyz`
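The naming convention above can be turned into a small lookup, sketched here under the assumption that the folder layout matches this README exactly. The function name `conformer_amon_paths` and its `amons_dir` parameter are hypothetical. The glob pattern is taken verbatim from the example above.

```python
from pathlib import Path


def conformer_amon_paths(target_path, amons_dir="amons-ni7"):
    """List the conformer amon files that belong to one test molecule."""
    stem = Path(target_path).stem  # "targets/frag_0001.xyz" -> "frag_0001"
    folder = Path(amons_dir)
    if not folder.is_dir():
        return []  # dataset folder not present at this location
    # Pattern from the README: amons-ni7/frag_0001_a*xyz
    return sorted(folder.glob(f"{stem}_a*xyz"))


# Prints an empty list unless run from the dataset root.
print(conformer_amon_paths("targets/frag_0001.xyz"))
```

Returning a sorted list keeps the order reproducible across file systems, which helps when amons are later matched against computed energies.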