
josejimenezluna / delfta


Δ-QML for medicinal chemistry

License: GNU Affero General Public License v3.0

Python 99.18% Makefile 0.38% Dockerfile 0.29% Shell 0.09% Batchfile 0.06%
machine-learning deep-learning pytorch quantum-chemistry

delfta's Introduction

DelFTa: Open-source Δ-quantum machine learning for medicinal chemistry


Overview

The DelFTa application is an easy-to-use, open-source toolbox for predicting quantum-mechanical properties of drug-like molecules. Using either Δ-learning (with a GFN2-xTB baseline) or direct learning (without a baseline), the application accurately approximates DFT reference values (ωB97X-D/def2-SVP). It employs 3D message-passing neural networks trained on the QMugs dataset of quantum-mechanical properties, and can predict formation and orbital energies, dipoles, Mulliken partial charges and Wiberg bond orders. See the paper for more details (version 1.0.0 used in this work).

Installation

We currently only support Python 3.8 and 3.9 Linux builds.

Installation via conda

We recommend and support installation via the conda package manager, ideally into a freshly created environment. Then fetch the package from our channel:

conda install delfta -c delfta -c pytorch -c rusty1s -c conda-forge

Installation via Docker

A CUDA-enabled container can be pulled from DockerHub.

We also provide a Dockerfile for manual builds:

docker build -t delfta . 

Attach to the provided container with:

docker run -it delfta bash

First run

DelFTa requires some additional files (e.g. trained models) before it can be used. Execute the following in order to fetch those:

python -c "import runpy; _ = runpy.run_module('delfta.download', run_name='__main__')"

Quick start

We interface with Pybel (OpenBabel). Most molecular file formats are supported (e.g. .sdf, .xyz).

from openbabel.pybel import readstring
mol = readstring("smi", "CCO")

from delfta.calculator import DelftaCalculator
calc = DelftaCalculator()
preds = calc.predict(mol)

print(preds)

Further documentation on how to use the package is available on ReadTheDocs.

Tutorials

In-depth tutorials can be found in the tutorials subfolder. These include:

  • delta_vs_direct.ipynb: This showcases the basics of how to run the calculator, and compares results using direct- and Δ-learning models.
  • calculator_options.ipynb: This dives into the different options you can initialize the calculator class with.
  • training.ipynb: A simple example of how networks can be trained.

Citation

If you use this software or parts thereof, please consider citing the following BibTeX entry:

@article{atz2022delta,
  title={$\Delta$-Quantum machine-learning for medicinal chemistry},
  author={Atz, Kenneth and Isert, Clemens and B{\"o}cker, Markus NA and Jim{\'e}nez-Luna, Jos{\'e} and Schneider, Gisbert},
  journal={Physical Chemistry Chemical Physics},
  volume={24},
  number={18},
  pages={10775--10783},
  year={2022},
  publisher={Royal Society of Chemistry}
}

delfta's People

Contributors

atzkenneth, cisert, josejimenezluna


delfta's Issues

Check for validity of molecules when reading

When an invalid input file is provided, OpenBabel does print a warning, but we try to keep going and return somewhat confusing errors.

  • example_file_1.sdf returns Molecules at position [0] have no 3D conformations available. Either provide a mol with one or re-run calculator with force3D=True.
  • example_file_2.sdf returns need at least one array to concatenate

example_files.zip

Checking if molecules are valid and throwing an error if not might be clearer for the user. In the two cases above, input_ is either a list of empty (no atoms) pybel molecules (for example_file_1.sdf) or just an empty list (for example_file_2.sdf), so we could check for that.
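
The check proposed above could look roughly like the following. This is a minimal sketch, not the package's actual code: validate_molecules is a hypothetical helper, and it only assumes that each molecule exposes an atoms list, as pybel molecules do.

```python
def validate_molecules(mols):
    """Raise ValueError if the input holds no usable molecules.

    Hypothetical helper: `mols` is assumed to be a list of objects that
    expose an `atoms` list (as pybel molecules do).
    """
    # Case of example_file_2.sdf: nothing could be read at all.
    if len(mols) == 0:
        raise ValueError("No molecules could be read from the input.")

    # Case of example_file_1.sdf: molecules were read but contain no atoms.
    empty = [idx for idx, mol in enumerate(mols) if len(mol.atoms) == 0]
    if empty:
        raise ValueError(f"Molecules at position {empty} contain no atoms.")

    return mols
```

Calling such a check right after reading the input would surface the problem before the 3D-conformation check or the array concatenation is reached.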

Installation: Docker image missing libXau.so.6

When using the docker image from DockerHub, the import of openbabel fails:

>>> from openbabel.pybel import readstring
==============================
*** Open Babel Error  in openLib
  /opt/conda/envs/delfta/lib/openbabel/3.1.0/png2format.so did not load properly.
 Error: libXau.so.6: cannot open shared object file: No such file or directory

This can be solved by running (inside the container):

apt-get update
apt-get install libxtst6

(see also https://stackoverflow.com/questions/17355863/cant-find-install-libxtst-so-6)

tarfile.ReadError

I have installed delfta with conda install delfta -c delfta -c pytorch -c rusty1s -c conda-forge on WSL Ubuntu 20.04.
Now I'm trying to execute the "First Run" commands, but both
python -c "import runpy; _ = runpy.run_module('delfta.download', run_name='__main__')" and

from delfta.download import _download_required
_download_required()

throw tarfile.ReadError: file could not be opened successfully
The file in question is under home/user/miniconda3/envs/delfta/lib/python3.8/tarfile.py
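
One way to narrow this down is to inspect the downloaded archive itself, since tarfile raises this error for any file it cannot recognize as a tar archive (for example an empty file or a saved HTML error page). The snippet below is a minimal sketch using only the standard library; describe_archive is a hypothetical helper, and the path of the downloaded archive is not stated in this report, so it has to be supplied by the user.

```python
import os
import tarfile


def describe_archive(path, preview_bytes=64):
    """Collect basic facts about a file that is expected to be a tar archive."""
    info = {
        "exists": os.path.isfile(path),
        "size_bytes": None,
        "is_tar": False,
        "head": b"",
    }
    if not info["exists"]:
        return info

    info["size_bytes"] = os.path.getsize(path)
    # is_tarfile returns False instead of raising for unreadable archives.
    info["is_tar"] = tarfile.is_tarfile(path)
    # The first bytes often reveal what was downloaded instead of the archive.
    with open(path, "rb") as handle:
        info["head"] = handle.read(preview_bytes)
    return info
```

If is_tar is False and head starts with text such as an HTML tag, the download most likely returned an error page rather than the archive.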

Use single progress bar for inputs > batch_size

When running DelftaCalculator on more than batch_size molecules, a new progress bar is created for each batch. This somewhat defeats the purpose of a progress bar: each one only goes from 0/1 to 1/1 before the next is created, so there is no estimate of how long the overall process will take or what percentage of the overall task has been completed. As a minor point, we can probably skip the progress bar if the input is smaller than batch_size, although the user could also turn that off themselves.
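
The idea can be sketched independently of the calculator internals: keep one running counter over all molecules and update it once per batch. Everything here is illustrative; predict_batch and report are hypothetical callables standing in for the per-batch prediction and for whatever progress display is used.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_with_single_progress(items, batch_size, predict_batch, report):
    """Run `predict_batch` on each batch while reporting overall progress.

    `predict_batch(batch)` returns one result per item in the batch;
    `report(done, total)` is called once after every batch.
    """
    results = []
    done = 0
    total = len(items)
    for batch in iter_batches(items, batch_size):
        results.extend(predict_batch(batch))
        done += len(batch)
        # One shared counter, so the display covers the whole input.
        report(done, total)
    return results
```

With tqdm, report could be replaced by a single bar created with tqdm(total=len(items)) and advanced with update(len(batch)) after each batch.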


Change order of 3D-check and hydrogen check; modify hydrogen check

  • make3D automatically assigns hydrogens, so when passing a molecule without hydrogens but using force3D=True, there's no feedback in the logger that hydrogens have been added. Additionally, this means that force3d=True overrides addh=False. I suggest first checking if hydrogens are present, and, if that is not the case but force3d=True, to return an error.
  • Checking for hydrogens with 1 in set([atom.atomicnum for atom in mol.atoms]) runs the risk of ignoring molecules where only certain hydrogens are present. It's a strange edge case, but I suggest checking if e.g. [atom.atomicnum for atom in mol.atoms] changes after adding hydrogens to a copy of the molecule. Not elegant, but I haven't found an alternative yet.
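
The second suggestion could be sketched as follows. This is only an illustration under stated assumptions: has_all_hydrogens is a hypothetical helper, mol is assumed to behave like a pybel Molecule (clone returns an independent copy, addh() adds missing hydrogens in place), and comparing atom counts is used here as a simpler stand-in for comparing the full list of atomic numbers.

```python
def has_all_hydrogens(mol):
    """Return True if adding hydrogens to a copy leaves the atom count unchanged.

    Assumes a pybel-like interface: `mol.clone` gives an independent copy,
    `addh()` adds missing hydrogens in place, `atoms` lists the atoms.
    """
    probe = mol.clone          # work on a copy so the input stays untouched
    n_before = len(probe.atoms)
    probe.addh()               # adds hydrogens only where they are missing
    return len(probe.atoms) == n_before
```

A molecule with only some of its hydrogens present would fail this check, whereas the membership test on atomic number 1 would accept it.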

Parallelize xtb code

For large numbers of molecules (e.g. an SDF file), it may make more sense to use joblib with large batch sizes.
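
As a rough sketch of the pattern, the input can be split into large chunks that are processed concurrently and then flattened back in the original order. Note the substitutions: this uses the standard library's ThreadPoolExecutor rather than joblib, worker is a hypothetical function that processes one chunk, and threads are only a good fit under the assumption that each worker mostly waits on an external xtb process.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, chunk_size):
    """Split `items` into consecutive lists of at most `chunk_size` elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def run_in_parallel(items, worker, chunk_size=100, n_jobs=4):
    """Apply `worker` to chunks of `items` concurrently, keeping input order.

    `worker(chunk)` is a hypothetical callable returning one result per item.
    """
    chunks = chunked(items, chunk_size)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() yields results in the order the chunks were submitted.
        per_chunk = list(pool.map(worker, chunks))
    return [result for chunk_result in per_chunk for result in chunk_result]
```

Larger chunks reduce scheduling overhead per molecule, which is the motivation for the large batch sizes mentioned above.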

Fix tests to refer to correct directory

  • need to download to and run from the same folder
  • assert len(mols) == 100 in test_xtb.py and test_calculator.py to make sure we actually have something to run the tests on

E_gap doesn't always equal E_lumo - E_homo

While I realize E_gap is a separate model itself, the value it returns is sometimes way different from calculating E_lumo - E_homo. For example, take the following molecule with the following settings:

calc = DelftaCalculator(delta=True, xtbopt=True, return_optmols=True, addh=True)
mol = readstring("smi", "C[C@@H]1CC[C@@H]2C(C)(C)[C@H](O)[C@]3(C)CC[C@@]21C3")

Results are:

'E_homo': -0.23676376
'E_lumo': 0.11727003
'E_gap': 0.12012529

E_gap should be close to 0.354
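
The arithmetic behind that number, using only the values reported above:

```python
# Values copied from the results above.
e_homo = -0.23676376
e_lumo = 0.11727003
e_gap_model = 0.12012529

# Gap implied by the two orbital-energy predictions.
e_gap_from_orbitals = e_lumo - e_homo

# How far the dedicated E_gap prediction is from that value.
discrepancy = e_gap_from_orbitals - e_gap_model

print(f"E_lumo - E_homo = {e_gap_from_orbitals:.4f}")    # 0.3540
print(f"difference to E_gap = {discrepancy:.4f}")        # 0.2339
```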

Cannot download files

Hi! Thanks for a great repository.
I am trying to download the files as suggested in the README, but unfortunately I get this error:

(delfta) mduranfrigola@raluy:~/Desktop$ python -c "import runpy; _ = runpy.run_module('delfta.download', run_name='__main__')"
2024/04/24 05:17:59 PM | DelFTa | INFO: Now downloading trained models and utils...
Traceback (most recent call last):                                                                                                  
  File "<string>", line 1, in <module>
  File "/home/mduranfrigola/miniconda3/envs/delfta/lib/python3.9/runpy.py", line 228, in run_module
    return _run_code(code, {}, init_globals, run_name, mod_spec)
  File "/home/mduranfrigola/miniconda3/envs/delfta/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/mduranfrigola/miniconda3/envs/delfta/lib/python3.9/site-packages/delfta/download.py", line 134, in <module>
    _download_required()
  File "/home/mduranfrigola/miniconda3/envs/delfta/lib/python3.9/site-packages/delfta/download.py", line 93, in _download_required
    with tarfile.open(models_tar) as handle:
  File "/home/mduranfrigola/miniconda3/envs/delfta/lib/python3.9/tarfile.py", line 1797, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

Do you know what might be going on?
Thanks a lot in advance.
