Giter Site home page Giter Site logo

espaloma_charge's Introduction

Espaloma Charge

CI

Standalone charge assignment from Espaloma framework. https://doi.org/10.1039/D2SC02739A

Installation

conda-forge

We recomend installing espaloma_charge via conda-forge:

$ mamba create -n espaloma -c conda-forge espaloma_charge

If you plan on using openff-toolkit, then install that as well when creating your espaloma_charge environment for optimal dependency solving:

$ mamba create -n espaloma -c conda-forge espaloma_charge openff-toolkit

pypi

We also have espaloma_charge on pypi, but the dgl dependency must be installed first.

# First create a conda env with mamba, conda, or micromamba
$ mamba create -n espaloma -c conda-forge dgl==1.1.2 pip python
$ mamba activate espaloma
$ pip install espaloma_charge

Examples

Option0: Assign charges to rdkit molecule.

>>> from rdkit import Chem; from espaloma_charge import charge
>>> molecule = Chem.MolFromSmiles("N#N")
>>> charge(molecule)
array([0., 0.], dtype=float32)

Assign charges to your favorite molecule in Google Colab

Option1: Use with openff-toolkit(installation required)

>>> from openff.toolkit.topology import Molecule
>>> from espaloma_charge.openff_wrapper import EspalomaChargeToolkitWrapper
>>> toolkit_registry = EspalomaChargeToolkitWrapper()
>>> molecule = Molecule.from_smiles("N#N")
>>> molecule.assign_partial_charges('espaloma-am1bcc', toolkit_registry=toolkit_registry)
>>> molecule.partial_charges
<Quantity([0. 0.], 'elementary_charge')>

Option2: Use as Command Line Interface to write antechamber-compatible charges.

$ espaloma_charge -i in.mol2 -o in.crg
$ antechamber -fi mol2 -fo mol2 -i in.mol2 -o out.mol2 -c rc -cf in.crg 

Reference

If you are using this little tool in your pipeline, please consider citing:

@Article{D2SC02739A,
author ="Wang, Yuanqing and Fass, Josh and Kaminow, Benjamin and Herr, John E. and Rufa, Dominic and Zhang, Ivy and Pulido, Iván and Henry, Mike and Bruce Macdonald, Hannah E. and Takaba, Kenichiro and Chodera, John D.",
title  ="End-to-end differentiable construction of molecular mechanics force fields",
journal  ="Chem. Sci.",
year  ="2022",
volume  ="13",
issue  ="41",
pages  ="12016-12033",
publisher  ="The Royal Society of Chemistry",
doi  ="10.1039/D2SC02739A",
url  ="http://dx.doi.org/10.1039/D2SC02739A"}


@misc{https://doi.org/10.48550/arxiv.2302.06758,
  doi = {10.48550/ARXIV.2302.06758},
  
  url = {https://arxiv.org/abs/2302.06758},
  
  author = {Wang, Yuanqing and Pulido, Iván and Takaba, Kenichiro and Kaminow, Benjamin and Scheen, Jenke and Wang, Lily and Chodera, John D.},
  
  keywords = {Machine Learning (cs.LG), Chemical Physics (physics.chem-ph), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Physical sciences, FOS: Physical sciences},
  
  title = {EspalomaCharge: Machine learning-enabled ultra-fast partial charge assignment},
  
  publisher = {arXiv},
  
  year = {2023},
  
  copyright = {Creative Commons Attribution 4.0 International}
}


espaloma_charge's People

Contributors

ijpulidos avatar j-wags avatar jchodera avatar mikemhenry avatar yuanqing-wang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

espaloma_charge's Issues

Espaloma performance

Hi all and thanks for this repo!
I tried to evaluate on my favorite toy (Benzamidine) and I didn't get very good performance and was curious if you had any comment on it.
image

First of all the charges look very different from AM1-BCC charges (init here is Espaloma, fit is AM1-BCC)

=== Molecule MOL, 18 atoms, 18 bonds ===
Name (init)  Name (new)  Element  Atomtype  Charge (init)  Charge (fit)  Chiral centers
          C          C1        C   (C1) za         -0.254        -0.262                
          C          C2        C   (C2) zb         -0.164        -0.082                
          C          C3        C        C3         -0.162        -0.116                
          C          C4        C        C4         -0.188        -0.058                
          C          C5        C        C5         -0.162        -0.116                
          C          C6        C   (C6) zb         -0.164        -0.082                
          C          C7        C   (C7) zc          0.300         0.522                
          N          N1        N   (N1) zd         -0.362        -0.469                
          N          N2        N   (N2) ze         -0.363        -0.453                
          H          H1        H        H1          0.273         0.149                
          H          H2        H        H2          0.251         0.170                
          H          H3        H        H3          0.258         0.170                
          H          H4        H        H4          0.251         0.170                
          H          H5        H        H5          0.273         0.149                
          H          H6        H   (H6) zf          0.363         0.327                
          H          H7        H   (H7) zf          0.363         0.327                
          H          H8        H   (H8) zg          0.245         0.327                
          H          H9        H   (H9) zg          0.245         0.327 

Beyond that, if I use Espaloma charges for torsion scans of the C-C-C-N torsion I get very weird torsion profiles compared to AM1-BCC. Left: AM1-BCC, Right: Espaloma. Focus on the green curve which is the initial MM evaluation with Sage FF and AM1-BCC and Espaloma respectively. I find it concerning that it considers the barrier at 0 degrees the highest instead of the one at 90 degrees which is also shown by xTB to be the highest barrier.
image

Support for DGL 2

We currently support outdated versions of DGL (namely 1.1.2), we probably want to support more up to date versions of this library in order to be able to have espaloma_charge as part of an environment with reasonable up to date packages.

related to #34

Calculations on charged molecules

Dear developers,
I tried to calculate charge distribution of positively charged molecules using espaloma charge v 0.0.8 from pip distribution, but the resulting partial atomic charges always sum to 0.0. The following piece of code demonstrates the behavior

from espaloma_charge import charge
from rdkit import Chem
mol=Chem.MolFromSmiles("[NH4+]")
mol=Chem.AddHs(mol)
tc=Chem.GetFormalCharge(mol)
print(tc);
print([atom.GetFormalCharge() for atom in mol.GetAtoms()])
c=charge(mol,total_charge=tc)
print(c)
print(sum(c))`

The corresponding output is

1
[1, 0, 0, 0, 0]
/opt/conda/lib/python3.10/site-packages/dgl/heterograph.py:92: DGLWarning: Recommend creating graphs by `dgl.graph(data)` instead of `dgl.DGLGraph(data)`.
  dgl_warning(
[-0.93732476  0.2343312   0.2343312   0.2343312   0.23433116]
1.4901161193847656e-08

Is it a bug or I just missinterpreted the output?

Thank you

Jiri

Is there any way to generate equivalent charges for resonance hybrids like carboxylate?

Hi all, thank you for making this package available!

I tested this out on some molecules and found that, for what we might think of as equivalent atoms, partial charges are different.

Example (see charges on oxygens):

from rdkit import Chem; from espaloma_charge import charge
from rdkit.Chem.Draw import IPythonConsole, rdMolDraw2D
import numpy as np
from IPython.display import SVG

m = Chem.MolFromSmiles('OC(c1ccccc1)=O')
ch = charge(m, total_charge=-1)

for i in range(m.GetNumAtoms()):
   m.GetAtomWithIdx(i).SetProp("atomNote",str(np.around(ch[i],3)))
   
d2d = rdMolDraw2D.MolDraw2DSVG(350,300)
d2d.DrawMolecule(m)
d2d.FinishDrawing()
SVG(d2d.GetDrawingText())

Screenshot 2023-02-24 at 6 33 54 pm

This may be expected: Section 8.5 of the manuscript says Ensuring full chemical equivalence is nontrivial:

the non-uniqueness of formal charge assignment (obvious in molecules such as guanidinium, where resonance forms locate the formal charge on different atoms) does not guarantee the assigned parameters will respect chemical equivalence (a form of invariance)

I see why it's hard - SMILES requires a kekule form rather than a resonance form. But is it possible to get around this? One might simply average the carboxylate charges, for instance, but I wonder if there's a more principled way?

Thanks
Lewis

Is total_charge always assumed to be 0 in the charge() API?

In the charge(...) API, the total_charge is computed but never used.

When the SPICE run() is called in production.py, the model is called with ChargeEquilibrium() with no kwargs, which means the total_charge could be erroneously assumed to be 0 if total_charge isn't specified and the model doesn't have q_ref.

It's not clear to me that there isn't a code path that mistakenly sets the total_charge to 0. Can we fix this to require total_charge, or eliminate the argument altogether to require that we have the formal charges defined in the graph?

fewer charges than atoms

Hi,

Thanks for this very useful tool!

I installed with pip, and the option 0 example you include on the github webpiage seems to work.
however, when i try larger molecules I keep getting charge arrays that have fewer entries
than my molecules have atoms

e.g.:

from rdkit import Chem; from espaloma_charge import charge
molecule = Chem.MolFromSmiles("CCOCCNC(=O)C[C@@h]1N=C(c2ccc(cc2)Cl)c2c(-n3c1nnc3C)sc(c2C)C")
charge(molecule)
/home/celeristx/miniconda3/lib/python3.10/site-packages/dgl/heterograph.py:72: DGLWarning: Recommend creating graphs by dgl.graph(data) instead of dgl.DGLGraph(data).
dgl_warning('Recommend creating graphs by dgl.graph(data)'
array([-0.18779773, 0.4208883 , -0.44761604, 0.419579 , 0.12307103,
-0.29645967, 0.51088446, -0.41038862, -0.13668227, 0.28527066,
-0.58568656, 0.5189817 , -0.11827499, 0.07983082, -0.0337906 ,
0.0861465 , -0.03379034, 0.07983054, -0.08033632, -0.18743275,
0.3917905 , -0.49400392, 0.46795568, -0.27843612, -0.21633013,
0.47469214, -0.29152048, -0.13895345, 0.03585355, -0.07050467,
0.01740551, 0.09582561], dtype=float32)

the molecule has 58 atoms, while above only 32 numbers are provided.
Turns out, if I only count non-hydrogens i get 32 atoms, it looks as if the charge
command only tells me the charges of the nonH atoms, or fails to infer the implicit
hydrogens in the smiles string.

also, even if this worked, I'd struggle to assign the charges. Using the tool as seen above
it is not clear which charge belongs to which atom.

I also tried to directly read a local sdf/mol file incuding explicit hydrogens with some rdkit reader but the result is the
same (charge command only gives an array with 32 entries while the molecule in the mol file has 58 atoms.

from rdkit import Chem; from espaloma_charge import charge
mol=Chem.rdmolfiles.MolFromMolFile('small.mol')
charge(mol)
/home/celeristx/miniconda3/lib/python3.10/site-packages/dgl/heterograph.py:72: DGLWarning: Recommend creating graphs by dgl.graph(data) instead of dgl.DGLGraph(data).
dgl_warning('Recommend creating graphs by dgl.graph(data)'
array([-0.14054929, 0.40022638, -0.44654357, 0.4333703 , 0.16091529,
-0.35090274, 0.61365485, -0.15334736, -0.49362743, 0.2768011 ,
-0.59814954, 0.4693746 , 0.5192658 , -0.26549223, -0.54531276,
-0.1246964 , -0.24202514, -0.29489523, 0.414872 , 0.531917 ,
0.07317285, 0.07317285, -0.09172964, -0.19458656, -0.21533589,
-0.04015996, -0.0401599 , 0.1066979 , 0.15635435, 0.07988627,
0.01111788, -0.0832857 ], dtype=float32)

As rdkit would not write mol2 files, I also see no way to directly write the result as a file to disk
for further use after this procedure.

Finally the command line tool you mention in Option 2 (espaloma_charge) is nowhere to
be found in the repository.

Any help very much appreciated!
cheers,
Michael

Toolkit wrapper should not issue `IncorrectNumConformersWarning`

From #19:

By the way, also the warning Espaloma gives (for which I'm deleting here the conformer) seems a bit excessive. It feels to me like it could just silently ignore the existence of conformers in the molecule.

IncorrectNumConformersWarning: Molecule 'Molecule with name 'BEN_pH7_4.sdf' and SMILES '[H][c]1[c]([H])[c]([H])[c]([C]([N]([H])[H])=[N+]([H])[H])[c]([H])[c]1[H]'' has 1 conformers, but charge method 'espaloma-am1bcc' expects exactly 0

Cannot `import espaloma_charge` upon fresh installation

I used pip to install espaloma_charge in an environment with the openff-toolkit package installed via conda. The respective install commands run, and appear to pull in sensible packages, but when I start an interpreter session and try to import something from espaloma charge I get the following error:

ImportError: cannot import name 'DILL_AVAILABLE' from 'torch.utils.data.datapipes.utils.common' (path/to/miniconda3/envs/espalomatester/lib/python3.12/site-packages/torch/utils/data/datapipes/utils/common.py)

I've tried doing several different things, but this will minimally reproduce this issue if you have a fresh miniconda3 around:

mamba create -n espalomatester -c conda-forge openff-toolkit
conda activate espalomatester
pip install espaloma_charge
python -m espaloma_charge

One can of course open an interpreter session and try to do things more similar to the example input in the readme, but when I do that I get the same error.

Of particular note, when I check to see if it failed to install dill somehow I get the following, implying that some version of it has been installed:

$ pip install dill
Requirement already satisfied: dill in ./miniconda3/envs/espalomatester/lib/python3.12/site-packages (0.3.8)

I'd really appreciate any direction, or if there's some mistake I've made in trying to install this tool--though I must say I don't know how the error could be on my end, given the minimal nature of the install procedure.

0 charge for Hydrogens

Hi!
I am so happy to find this tool.
Here is a simple question:
I test espaloma_charge + antechamber on a small molecule but found charges were 0 for all hydrogens.
As a comparison, using antechamber alone assign positive charges to all Hs.
The SMILES is
CC(C)(C)c1ccc(cc1)c2nnc(o2)c3cccc(c3O)OC ZINC001252803730_1
Could you please take a look?
Thank you!

espaloma_charge and openff-toolkit

Hi, I tried to install the latest version of the openff-toolkit 0.12 together with espaloma_charge, but both conda and mamba were not able to make it. I had to pin python=3.9 and that made me able to install ambertools=22, then finally the toolkit. I think this is something to bear in mind, as I guess other people might be interested in using espaloma with the openff-toolkit.

Handling of non-zero charged molecules

The model seems to assume that all molecules have a formal charge of zero and equilibrates the charges accordingly. This looks like it was also an issue here https://github.com/choderalab/espaloma but was fixed choderalab/espaloma#122.

The total_charge argument to the espaloma_charge.charge function also looks to not be used in practice.

if total_charge is None:
total_charge = Chem.GetFormalCharge(molecule)
graph = from_rdkit_mol(molecule)
if torch.cuda.is_available():
graph = graph.to("cuda:0")
model = model.cuda()
graph = model(graph)
return graph.ndata["q"].cpu().detach().flatten().numpy()

To reproduce, using the pip installed v0.0.8.

import espaloma_charge
import numpy as np
from rdkit import Chem

smiles = ["C", "[NH4+]", "CC(=O)[O-]"]
mols = [Chem.AddHs(Chem.MolFromSmiles(s)) for s in smiles]

charges = [espaloma_charge.charge(m, total_charge=Chem.GetFormalCharge(m)) for m in mols]

print(charges)
print([np.sum(c) for c in charges])

The second print statement outputs: [0.0, 0.0, -1.4901161e-08]

Espaloma assigns total neutral charge to charged molecules

Thanks for putting this up as a package, @yuanqing-wang -- it makes it very easy to use!

I did notice that Espaloma always assigns charges that sum up to 0 to a molecule, even for charged molecules where a total_charge is passed in. Is that intentional -- do we need to rescale manually?

from espaloma_charge import charge
from openff.toolkit.topology.molecule import Molecule

offmol = Molecule.from_smiles("O=C[O-]")
esp_charges = charge(offmol.to_rdkit(), total_charge=-1)
esp_charges.sum()  # 0.0

EspalomaCharge toolkit wrapper generates np.float32 charges, but np.float64 are expected

I'm tracking down an odd issue where it seems like the pint.Quantity wrapped np.float32 is causing problems when molecule.to_openeye() is called afterwards:

Traceback (most recent call last):
  File "/lila/data/chodera/chodera/espaloma_charge/scripts/hydration-free-energies/compare-models.py", line 265, in <module>
    cli()
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/lila/data/chodera/chodera/espaloma_charge/scripts/hydration-free-energies/compare-models.py", line 194, in errors
    oemol = charged_molecule.to_openeye()
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/openff/toolkit/topology/molecule.py", line 4791, in to_openeye
    return toolkit_registry.call(
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/openff/toolkit/utils/toolkit_registry.py", line 356, in call
    raise e
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/openff/toolkit/utils/toolkit_registry.py", line 352, in call
    return method(*args, **kwargs)
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/openff/toolkit/utils/openeye_wrapper.py", line 1504, in to_openeye
    oe_atom.SetPartialCharge(
  File "/lila/home/chodera/miniconda/envs/hydration/lib/python3.10/site-packages/openeye/oechem.py", line 13912, in SetPartialCharge
    return _oechem.OEAtomBase_SetPartialCharge(self, arg2)
TypeError: in method 'OEAtomBase_SetPartialCharge', argument 2 of type 'double'

but np.float64 generated by the other toolkit wrappers appears to be fine.

I'll try to create a minimal test case here once I figure out what is going on.

In the meantime, I'll make changes to follow this convention used in the toolkit in the hydration branch.

Retrain EspalomaCharge with resonance-invariant features

Thanks to @ljmartin in #16, we realize we used a resonance-sensitive feature atom.GetTotalValence() in the training of EspalomaCharge 0.0.8.

I've opened this issue so we can track progress toward the following updates:

  • Retraining EspalomaCharge using the resonance-invariant atom.GetDegree() instead
  • Removing the ability for users to specify net charge in charge()
  • Adding tests to make sure the total partial charge matches the net formal charge (to within a tolerance)
  • Adding tests to make sure carboxylates and other resonance-equivalent atoms end up with the same partial charge
  • Releasing a new version

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.