Molecular energy and ligand stability prediction models based on deep tensor neural networks (DTNN) and MMFF-optimized geometries.
├── LICENSE
├── Makefile <- Makefile with commands to create the environment and install requirements.
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data used to generate the results in the paper.
│ ├── processed <- Processed data built by the preparation scripts.
│ └── raw <- The original, immutable data dump.
├── models <- Trained models.
├── notebooks <- Jupyter notebooks, including tutorials.
├── reports <- Generated analysis results.
│ └── result_data <- Result data for conformers, used to produce the conformation stability results.
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
├── setup.py <- makes project pip installable (pip install -e .) so src can be imported
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module
│ │
│ ├── data <- Scripts to download or generate data
│ │ └── prepare_dataset.py
│ ├── models <- Scripts to train models and then use trained models to make
│ │ │ predictions
│ │ ├── predict_model.py
│ │ └── train_model.py
│
└── tox.ini <- tox file with settings for running tox; see tox.testrun.org
Setup targets are defined in the Makefile. To list them:
make -f Makefile
- Create the DTNN_7ib environment:
make create_environment
- Activate the environment:
source activate DTNN_7ib
- Install requirements:
make requirements
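Put together, a typical first-time setup is sketched below (it assumes conda is installed and the Makefile targets described above exist in this repository):

```shell
# Sketch of a first-time setup; assumes conda is available and the
# Makefile targets described above exist in this repository.
make create_environment    # create the DTNN_7ib conda environment
source activate DTNN_7ib   # activate it
make requirements          # install requirements.txt into the environment
```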
All scripts for data preparation are in the src/data directory. To see the options for data preparation:
python prepare_dataset.py --help
To build the qm9mmff dataset for training, validation, testlive and testing:
python prepare_dataset.py
To build eMol9_CM:
python prepare_dataset.py --datatype emol9mmff
To build Plati_CM:
python prepare_dataset.py --datatype platinummmff
Raw data are in the data/raw directory and processed data in data/processed. The processed data used in the paper are in data/external.
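To rebuild every dataset in one pass, the three commands above can be chained (a sketch; it assumes you run from src/data with the DTNN_7ib environment active):

```shell
# Sketch: build all three datasets; run from src/data with DTNN_7ib active.
python prepare_dataset.py                          # qm9mmff
python prepare_dataset.py --datatype emol9mmff     # eMol9_CM
python prepare_dataset.py --datatype platinummmff  # Plati_CM
```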
All scripts for model training are in the src/models directory. To see the options for model training:
python train_model.py --help
Train DTNN_7ib model:
python train_model.py --addnewm
Train TL_QM9M:
python train_model.py --geometry MMFF --transferlearning
Train TL_eMol9_CM:
python train_model.py --datatype emol9mmff --geometry MMFF --transferlearning
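Since the transfer-learning models reuse atom vectors learned by a trained DTNN_7ib model, a plausible order is to train the base model first (a sketch; run from src/models with the environment active):

```shell
# Sketch: train the base model, then the transfer-learning variants.
python train_model.py --addnewm                                                # DTNN_7ib
python train_model.py --geometry MMFF --transferlearning                       # TL_QM9M
python train_model.py --datatype emol9mmff --geometry MMFF --transferlearning  # TL_eMol9_CM
```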
All scripts for prediction are in the src/models directory. To see the options for prediction:
python predict_model.py --help
Performance of DTNN_7ib on QM9:
python predict_model.py
Performance of DTNN_7ib on QM9MMFF:
python predict_model.py --testpositions mmffpositions
Performance of TL_QM9M on QM9MMFF:
python predict_model.py --modelname TL_QM9_name --testpositions mmffpositions
Performance of TL_eMol9_CM on eMol9_CM:
python predict_model.py --modelname TL_eMOL9_CM_name --testtype emol9mmff --testpositions positions1
Performance of TL_eMol9_CM on Plati_CM:
python predict_model.py --modelname TL_eMOL9_CM_name --testtype platinummmff --testpositions positions1
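The MAE and RMSE reported by predict_model.py are the standard error metrics; a minimal stdlib sketch with hypothetical predicted and reference energies (not project data):

```python
import math

# Hypothetical predicted vs. reference energies; not project data.
pred = [1.0, 2.5, 3.0, 4.2]
ref = [1.2, 2.0, 3.3, 4.0]

errors = [p - r for p, r in zip(pred, ref)]
mae = sum(abs(e) for e in errors) / len(errors)             # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # root-mean-square error
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")  # → MAE = 0.300, RMSE = 0.324
```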
Note:
- If you run predict_model.py directly, the reported MAE and RMSE are for a DTNN_7ib model trained to the best validation error on one of the splits (the DTNN_7ib performance in the paper is the average over five different splits); transfer learning is applied to the atom vectors learned from this DTNN_7ib model.
- Remember to change the model name accordingly.
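As the note says, the paper's DTNN_7ib numbers are averages over five splits; with hypothetical per-split MAEs (not the paper's values), the averaging is simply:

```python
from statistics import mean, stdev

# Hypothetical per-split test MAEs; not the paper's values.
split_mae = [0.84, 0.79, 0.88, 0.81, 0.83]

avg = mean(split_mae)  # average reported across the five splits
sd = stdev(split_mae)  # spread across splits
print(f"MAE over 5 splits = {avg:.3f} +/- {sd:.3f}")  # → MAE over 5 splits = 0.830 +/- 0.034
```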
Thanks to the DTNN code (https://github.com/atomistic-machine-learning/dtnn); we reimplemented its elementary blocks.
Ref: Jianing Lu, Cheng Wang and Yingkai Zhang, J. Chem. Theory Comput., https://pubs.acs.org/doi/pdf/10.1021/acs.jctc.9b00001