Comments (12)

davidlmobley commented on August 26, 2024

@shuail - I think the key step is to send this to the RDKit guys. Have you done so?

from lomap.

shuail commented on August 26, 2024

@davidlmobley I posted this issue to the RDKit mailing list, and Greg's answer was:

"The RDKit Mol2 parser is really only validated for the atom types generated by corina. I'm not surprised that the output from open babel would not be understood. This is documented:
http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File"

I then asked whether there is a plan for RDKit to write mol2 files directly. It seems someone has already worked on that code, and it is almost ready to merge into the RDKit main branch (it still needs some testing). I will post here if RDKit implements the mol2 file writer. In that case, I could generate the mol2 files directly from RDKit using a SMILES string and feed them into LOMAP.

davidlmobley commented on August 26, 2024

@shuail - why don't you just generate mol2 files directly from SMILES via OEChem?

Though, it's not actually clear to me that this would solve the problem generally, as I just checked the OpenEye docs and I don't find any mention of Corina atom types, so it's not at all clear that RDKit could handle mol2 files written by OEChem either. But maybe it is worth checking this for one of your problem cases.

In my opinion RDKit's position is a little crazy and very frustrating. The mol2 format was invented by TRIPOS and the spec indicates, as I understand it, that the atom types are SYBYL atom types. What RDKit is saying is that they're not going to support SYBYL atom types, so can this format even be legitimately called ".mol2" if it insists on using atom types which are not SYBYL types?

I suppose there's another possibility which actually may be better. Is there a mechanism to BUILD molecules IN RDKit rather than reading them? Something like looping over atoms, creating a bunch of atoms, and connecting them by bonds of specified order? If so then one very general purpose approach would be to:

  • Provide a reader that reads in molecules from sdf, mol2, or whatever using OEChem to OEMol objects (trivial)
  • Provide a utility which converts OEMol objects to RDKit molecules (we'd have to write this)
  • Take those molecules as input for LOMAP

If someone wanted to put molecules into LOMAP in some other way (such as without OEChem) then it would be their job to figure out how to turn them into RDKit molecules.

This would fix the issue since OEChem's mol2 reader is very general/robust and has basically handled anything I've ever thrown at it (and it would also be able to handle sdf or well-formed PDB files as input, broadening the range of input possibilities), and then once we have an OEMol it should be easy (I think) to convert it to an RDKit molecule by looping over the relevant info. (If this appeals, I can show an example of doing something similar -- turning an OEMol into an OpenMM Topology and back.)

shuail commented on August 26, 2024

@davidlmobley here is the link about the RDKit mol2 file writer: rdkit/rdkit#415. It looks like the RDKit developers still plan to use the Sybyl atom types and are doing some final testing (see the comments by UnixJunkie at the end), though I don't know how soon it will be finished. Yes, I think I could try using the OpenEye tools to construct the molecules instead of Open Babel. I hope RDKit will accept the OpenEye mol2 files in general. As for building molecules inside RDKit from OEMol objects, that sounds interesting, but it may be time-consuming for me to explore how RDKit builds topologies from scratch. I will keep this suggestion in mind. Thx!

UnixJunkie commented on August 26, 2024

You should use SDF if you want RDKit to eat your molecules.

davidlmobley commented on August 26, 2024

@UnixJunkie - the issue with SDF is that it doesn't carry partial charges, and LOMAP needs partial charges.

I suppose it could be redesigned to avoid this -- @shuail , we could work based on formal charges at the planning stage if needed. Still, it seems silly to have to go to a lot of effort to avoid mol2 format just because RDKit doesn't like it when it is a perfectly good format for the rest of what we want to do (probably the best present-day format for getting small molecules into molecular simulations).

davidlmobley commented on August 26, 2024

@shuail - it looks to me like it's rather trivial to create molecules and atoms with RDKit; Googling for 30 seconds leads to this: http://www.rdkit.org/Python_Docs/rdkit.Chem.rdchem.EditableMol-class.html

It looks like once you have a molecule, you can easily AddAtom, AddBond, and perform the equivalent removal operations -- e.g. see http://www.rdkit.org/Python_Docs/rdkit.Chem.rdchem.EditableMol-class.html . There's even an example of this usage in the "getting started" info (http://www.rdkit.org/docs/GettingStartedInPython.html):

More complex transformations can be carried out using the RWMol class:

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('CC(=O)C=CC=C')
>>> mw = Chem.RWMol(m)
>>> mw.ReplaceAtom(4,Chem.Atom(7))
>>> mw.AddAtom(Chem.Atom(6))
7
>>> mw.AddAtom(Chem.Atom(6))
8
>>> mw.AddBond(6,7,Chem.BondType.SINGLE)
7
>>> mw.AddBond(7,8,Chem.BondType.DOUBLE)
8
>>> mw.AddBond(8,3,Chem.BondType.SINGLE)
9
>>> mw.RemoveAtom(0)
>>> mw.GetNumAtoms()
8

So, it looks like you should be able to generate an RDKit molecule from another source in a very similar way to how we generate an OpenMM topology here (https://github.com/open-forcefield-group/openforcefield/blob/master/openforcefield/utils/utils.py#L157) from an OEMol. Instead you could go from OEMol to RDKit molecule.
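A rough sketch of what that conversion could look like, written against toolkit-neutral records rather than OEChem itself. The `AtomRecord` and `BondRecord` tuples, the function names, and the `"partial_charge"` property name are all hypothetical and only for illustration; pulling the values out of an OEMol is left out. It also assumes bond orders arrive as Kekulé integers (1, 2, 3) rather than as aromatic flags.

```python
from collections import namedtuple

# Hypothetical, toolkit-neutral records; fill them from OEChem or any other reader.
AtomRecord = namedtuple("AtomRecord", "atomic_num formal_charge partial_charge")
BondRecord = namedtuple("BondRecord", "begin end order")


def check_records(atoms, bonds):
    """Raise ValueError if a bond points at a missing atom or has an unknown order."""
    for bond in bonds:
        for idx in (bond.begin, bond.end):
            if not 0 <= idx < len(atoms):
                raise ValueError(f"{bond} refers to missing atom index {idx}")
        if bond.begin == bond.end:
            raise ValueError(f"{bond} connects an atom to itself")
        if bond.order not in (1, 2, 3):
            raise ValueError(f"{bond} has an unsupported bond order")


def build_rdkit_mol(atoms, bonds):
    """Build an RDKit molecule atom by atom, skipping the mol2 reader entirely."""
    # Imported here so that check_records can be used without RDKit installed.
    from rdkit import Chem

    orders = {
        1: Chem.BondType.SINGLE,
        2: Chem.BondType.DOUBLE,
        3: Chem.BondType.TRIPLE,
    }
    check_records(atoms, bonds)

    rwmol = Chem.RWMol()
    for rec in atoms:
        atom = Chem.Atom(rec.atomic_num)
        atom.SetFormalCharge(rec.formal_charge)
        # "partial_charge" is an arbitrary property name chosen for this sketch.
        atom.SetDoubleProp("partial_charge", rec.partial_charge)
        rwmol.AddAtom(atom)  # atoms are added in list order, so indices match
    for rec in bonds:
        rwmol.AddBond(rec.begin, rec.end, orders[rec.order])

    Chem.SanitizeMol(rwmol)  # perceives aromaticity and checks valences
    return rwmol.GetMol()
```

For example, three records (C, C, O) with a single and a double bond should give acetaldehyde, with the per-atom charges available afterwards through `GetDoubleProp("partial_charge")`. Whether hydrogens are passed as explicit atoms or left implicit is a choice for the caller.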

UnixJunkie commented on August 26, 2024

shuail commented on August 26, 2024

@davidlmobley I just installed OEChem and tested all my existing systems. The mol2 files generated by OEChem work perfectly with RDKit, which no longer reports the "cannot kekulize" errors. I will continue to work on the more general approach, which uses the RDKit constructor to create the RDKit mol object and avoids the RDKit mol2 file reader.

davidlmobley commented on August 26, 2024

Ah, that's great that you have an interim fix, then. :)

maxjump commented on August 26, 2024

I have also been running into this error message when using mol2 files generated with Open Babel 2.3.2. However, I find it somewhat puzzling that the "kekulization" works fine for most of the mol2 files generated this way. Do you have any idea whether this error is caused by certain atom names from the Sybyl atom types?

davidlmobley commented on August 26, 2024

I'd guess it's that RDKit's mol2 reader (which is rather limited) strictly supports only Corina atom types, while many other tools write Sybyl atom types. So you probably have atom types RDKit can't interpret coming in on occasion.
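One way to test that guess, sketched here with only the Python standard library: tabulate the atom types in a file that fails and in one that works, then look at the types that appear only in the failing file. The function names and the sample record are made up for illustration, and the sketch assumes the usual mol2 layout in which the atom type is the sixth whitespace-separated field of each `@<TRIPOS>ATOM` line. It makes no claim about which types RDKit actually accepts.

```python
from collections import Counter

# Made-up three-atom record, only here to exercise the parser.
SAMPLE = """\
@<TRIPOS>MOLECULE
example
 3 2 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 C1   0.0000 0.0000 0.0000 C.3   1 LIG  -0.1000
      2 C2   1.5000 0.0000 0.0000 C.2   1 LIG   0.4000
      3 O1   2.1000 1.0000 0.0000 O.2   1 LIG  -0.3000
@<TRIPOS>BOND
     1 1 2 1
     2 2 3 2
"""


def atom_type_counts(mol2_text):
    """Count the atom types found in the @<TRIPOS>ATOM section(s) of mol2 text."""
    counts = Counter()
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Any new record type ends the ATOM section unless it is ATOM itself.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if not in_atoms or not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) >= 6:
            counts[fields[5]] += 1  # id, name, x, y, z, then the atom type
    return counts


def types_only_in(failing_text, working_text):
    """Return the atom types present in the failing file but not the working one."""
    failing = set(atom_type_counts(failing_text))
    working = set(atom_type_counts(working_text))
    return sorted(failing - working)
```

Reading the two files with `open(...).read()` and passing the text to `types_only_in` gives a short list of candidate types to look at first. If the list comes back empty, the atom types are probably not the cause and the bond records would be the next thing to compare.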

A better option may be to shift (in a few months) to SDF format; RDKit is shifting to an SDF format which supports partial charges via a standard tag that OEChem will also support, and it has better SDF support than mol2 support.
