mobleylab / lomap

Alchemical mutation scoring map

License: MIT License

Python 99.81% Shell 0.19%

lomap's Issues

Re-license under MIT

@nividic - I see that for some reason we have this under GPL currently. Any issues with me switching it to MIT? @shuail , we also need your input as you've contributed to this.

@halx - I don't think any of your code is in here yet but I probably should get your input too.

[WIP] Lomap Refactoring

Dear All, this is a partial starting plan to the Lomap code refactoring.

My idea is to refactor Lomap into building blocks. The refactoring will be large, and the current code will not be compatible with future versions. We can afford this kind of refactoring mainly because the current Lomap user base is not large and we do not rely on specific Lomap input/output formats that need to be preserved between versions. The refactoring should address the following key points:

(1) support for multiple toolkits;

(2) support for rules that are NOT hardcoded, to score molecules and to generate graphs;

(3) catching up with the latest package dependencies.

The first point will allow users to choose their favorite cheminformatics environment. Two toolkits have been selected so far: RDKit and OpenEye, which are both widely used in the community. The second and most important point will allow users to introduce new science into the construction of RBFE pre-calculation plans, with the hope of increasing RBFE efficiency. Finally, Lomap has not been maintained for a long time, and package dependency updates have broken the current code in many places.

Multiple toolkit support

The first step is to support multiple toolkits. This can be accomplished in many ways, and I advise building a "common API" across the different toolkits. For example, in one of its first steps Lomap requires the construction of a molecule database. This database could be populated by reading molecules from a given file format, so somewhere there must be a reading function that accomplishes this task. The reading function should be part of the "common API"; when it is invoked, the portion of code related to the selected toolkit is executed. The common API is a sort of cushion between the users and the toolkit actually in use.

The toolkit selection happens when the Lomap module is loaded. If multiple toolkits are present, one will be defined as "DEFAULT". The user will be allowed to switch between toolkits at run time, but particular care will be required not to invalidate structures previously built with a different toolkit (this functionality could create issues, and I do not think it is an important user scenario). The toolkit selection will load the "common API", which will automatically point to the exposed toolkit functionality (classes, functions, etc.). The drawbacks of a common API that I can see are:

it assumes that the same tasks/science can be accomplished with the different toolkits;
it requires symmetry in the API across the toolkits.
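A minimal sketch of how such a common API could dispatch to the selected toolkit. All names here (`register_toolkit`, `use_toolkit`, `read_molecules`) are hypothetical and not part of Lomap, and the two backends are stand-ins that return strings rather than real RDKit or OpenEye molecules:

```python
# Hypothetical registry: toolkit name -> molecule reader function.
_READERS = {}
_DEFAULT = None  # name of the toolkit currently in use


def register_toolkit(name, reader):
    """Register a backend; the first one registered becomes the default."""
    global _DEFAULT
    _READERS[name] = reader
    if _DEFAULT is None:
        _DEFAULT = name


def use_toolkit(name):
    """Switch the active toolkit at run time."""
    global _DEFAULT
    if name not in _READERS:
        raise ValueError(f"toolkit {name!r} is not available")
    _DEFAULT = name


def read_molecules(path):
    """Common-API entry point: dispatch to the active toolkit's reader."""
    return _READERS[_DEFAULT](path)


# Stand-in backends; real ones would wrap RDKit or OpenEye calls.
register_toolkit("rdkit", lambda path: [f"rdkit:{path}"])
register_toolkit("openeye", lambda path: [f"openeye:{path}"])

print(read_molecules("ligands.sdf"))  # handled by the default backend
use_toolkit("openeye")
print(read_molecules("ligands.sdf"))  # handled by the other backend
```

Callers only ever see `read_molecules`, which is what makes the symmetry requirement above a real constraint: every backend has to provide a reader with the same signature and comparable output.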

Support for rules that are NOT hardcoded, to score molecules and to generate graphs

This is the most important key point and, despite a lot of thinking, I'm still not sure what the best thing to do is. Here I was looking for API simplicity and flexibility. What I have come up with so far is the following:
The user can define rules or use rules from a repository. A rule is a function that executes a specific simple task. Rules will take a generic number of arguments and can return a generic number of outputs. Rules can operate on different objects. So far, the Lomap rules operate on two molecules only and return a number. Rules can be combined into an "algorithm" to accomplish a set of simple tasks. When the Lomap module is loaded the rules are loaded as well; users can add new rules at run time (here we can also have asymmetry between the toolkits, loading different rules based on the default toolkit). Rules can handle sets of molecules, returning numbers, other molecules, graphs, or a combination of them. The user defines at run time "algorithms" that mix the rules, or uses predefined "algorithms" from a repository. I think this is a quite general idea that should handle a large class of problems that we would like to tackle.
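To make the rule/algorithm idea concrete, here is a small sketch under stated assumptions: the repository `RULES`, the `rule` decorator, `make_algorithm`, and the two example rules are all hypothetical, and molecules are plain dictionaries rather than toolkit objects.

```python
from math import prod

RULES = {}  # hypothetical rule repository: name -> function


def rule(func):
    """Decorator that adds a function to the rule repository."""
    RULES[func.__name__] = func
    return func


@rule
def same_charge(mol_a, mol_b):
    """1.0 if both molecules carry the same charge, otherwise 0.0."""
    return 1.0 if mol_a["charge"] == mol_b["charge"] else 0.0


@rule
def size_ratio(mol_a, mol_b):
    """Ratio of the smaller to the larger heavy-atom count."""
    small, large = sorted((mol_a["heavy_atoms"], mol_b["heavy_atoms"]))
    return small / large


def make_algorithm(*rule_names):
    """Combine named rules into one scoring function (product of scores)."""
    selected = [RULES[name] for name in rule_names]
    return lambda mol_a, mol_b: prod(r(mol_a, mol_b) for r in selected)


score = make_algorithm("same_charge", "size_ratio")
mol_1 = {"charge": 0, "heavy_atoms": 20}
mol_2 = {"charge": 0, "heavy_atoms": 25}
print(score(mol_1, mol_2))
```

New rules can be added at run time simply by decorating another function, and an "algorithm" is just a named selection from the repository. Rules that return molecules or graphs instead of numbers would need a different way of combining results than the product used here.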

Catch up with the latest package dependencies

Some important package dependencies have been updated over the past years, and we need to catch up with these updates. Networkx 2.2 has broken the code in the graph generation section, but I think @ppxasjsm has fixed it, also updating the graph plotting with new design features that are not based on the old PyQt code. We can incorporate all these changes, but at the end of the large refactoring; in the meantime users should use @ppxasjsm's previously updated version.

Please comment on the key points above; once we agree on them, we can start the development. I'm willing to work on the code (compatibly with my working schedule), and once the refactoring is completed I would/have to work mostly on the development of the OpenEye side.

Best

Discuss support of rule sets and future handling of rules

A few questions about how to handle rules came to mind.

Would we want to support rule sets? Not because they need to be implemented right now, but just to prepare the code for the possibility. Rule sets are distinct from each other because they have, in whole or in part, a different set of individual rules. The current set is basically all MCS based. To make up a possibly not totally useless example, another rule set could be based on, say, fingerprints or shapes and such. Probably irrelevant for RAFEs, but can you see a use case for other rule sets?

Optionality and order, which also means how much influence a user can/should have. In the current rule set the same-charge rule could be optional (ignoring for the moment how clever that would be), maybe others? How much do rules need to run in order, i.e. do we need to impose sequentiality, or is it possible to make rules order-independent? The final score is a product, and as such order doesn't matter.
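A quick check of the order-independence argument, as a sketch with made-up rule names and scores: because the final score is a product, any evaluation order gives the same result, and an optional rule can simply be left out of the product.

```python
from itertools import permutations
from math import isclose, prod

# Hypothetical per-rule scores for one pair of molecules, each in [0, 1].
rule_scores = {"mcs_size": 0.8, "min_common_atoms": 1.0, "same_charge": 0.5}


def total_score(scores, skip=()):
    """Multiply rule scores, leaving out any rule the user disabled."""
    return prod(value for name, value in scores.items() if name not in skip)


# Every evaluation order gives the same product (up to rounding).
reference = total_score(rule_scores)
for order in permutations(rule_scores):
    reordered = {name: rule_scores[name] for name in order}
    assert isclose(total_score(reordered), reference)

print(reference)
print(total_score(rule_scores, skip={"same_charge"}))
```

Order would only start to matter for efficiency, for example if cheap rules are evaluated first so that an expensive rule can be skipped once the running product is already zero.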

Update to not pin to specific PyQt4 versions

We've had to pin to a specific PyQt4 version, as well as certain specific versions of other software, because of API changes. When someone is back on developing this in more detail, we should bring things up to the more current versions.

Add a function to lay out information (filename, score, connectivity) in txt files

Add the layout_info function to output the information in txt format, including ligand index (corresponding to the index in the graph), ligand filename, strict, loose, and charge scores, and connectivity (corresponding to the graph).

Update documentation for output of mapping info; handling of different charges

We implemented changes allowing easy access to/output of mapping info (#9), and for planning calculations involving mapping between molecules of different charges when an expert desires (#10). We need to update the documentation/README.md to reflect this.

can't read reference graph pickle

The file "test/basic/molecules.gpickle" does not seem to contain a networkx readable graph. With networkx 2.2 I get an error of:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-11-9ad07ad142b9> in <module>
----> 1 nx.read_gpickle('test.gpickle')

<decorator-gen-708> in read_gpickle(path)

~/miniconda3/envs/lomap2/lib/python3.6/site-packages/networkx/utils/decorators.py in _open_file(func_to_be_decorated, *args, **kwargs)
    238         # Finally, we call the original function, making sure to close the fobj
    239         try:
--> 240             result = func_to_be_decorated(*new_args, **kwargs)
    241         finally:
    242             if close_fobj:

~/miniconda3/envs/lomap2/lib/python3.6/site-packages/networkx/readwrite/gpickle.py in read_gpickle(path)
     99     .. [1] https://docs.python.org/2/library/pickle.html
    100     """
--> 101     return pickle.load(path)
    102 
    103 # fixture for nose tests

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd5 in position 2: ordinal not in range(128)

With networkx 1.1 I can read the file, but the resulting graph does not have any graph-like attributes, such as edges, nodes and adjacency information.
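A `UnicodeDecodeError` from the `'ascii'` codec inside `pickle.load` is the usual symptom of reading a pickle written by Python 2 under Python 3. The sketch below assumes that is what happened here; `load_legacy_pickle` is a hypothetical helper, not part of Lomap or networkx.

```python
import pickle


def load_legacy_pickle(path):
    """Load a pickle that may have been written by Python 2."""
    with open(path, "rb") as handle:
        try:
            return pickle.load(handle)
        except UnicodeDecodeError:
            # Python 2 'str' payloads are raw bytes; latin1 maps every
            # byte value to a character, so decoding cannot fail.
            handle.seek(0)
            return pickle.load(handle, encoding="latin1")
```

Even if the file loads this way, the object was pickled against the networkx 1.x class layout, so it may still lack the attributes that networkx 2.x expects. Regenerating the reference graph with the current versions is probably the more robust fix.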

Documentation

It would be nice to have an automatically built online documentation. The docstrings are all formatted well to achieve this.

Colouring code for connecting compounds in FEP

Dear all,

I would like to ask whether there is a reason that the graph produced by lomap sometimes contains molecules connected by light cyan lines and sometimes by red lines. Despite my efforts, I could not find any document mentioning this difference.

Best,

Agisilaos

Handle chirality

Due to difficulties with maximal common substructures and chirality, LOMAP currently deals with chiral centers in a rather extreme manner -- specifically, the chiral center is deleted before doing the mapping, then only the largest fragment of the resulting molecule is retained. While this "works" (it produces mappings that are valid) in the cases of larger molecules with chiral centers in the middle of a scaffold, it will result in extreme transformations even when such transformations are not needed.

Chirality handling will need to be updated carefully, probably by modifying the MCSS search to be chiral and/or using coordinate information to identify which stereoisomer is being considered and include this when selecting which match to use.

Network figure can have bad resolution

Another minor issue is the resolution of the PNG image drawn by the method graphgen.draw().

The image produced is very nice and useful, but in my applications its quality is relatively poor, e.g. the labels are almost unreadable. This is the case both when displaying the figure and when saving it with the command

plt.savefig('graph.png', facecolor=fig.get_facecolor())

I have played with the numerous image parameters in that function, but the required choice is not obvious to me and seems non-trivial.

MCS test fails with most recent version of rdkit

The test for MCS passes with RDKIT version 2018.03, but fails with version 2018.09. Something worth investigating.

For now, I'd suggest dependency of rdkit<=2018.03

graph visualisation issues

Running the example script on the networkx2 branch I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-01daf4821019> in <module>()
----> 1 nx_graph = db_mol.build_graph()

~/miniconda3/envs/lomap/lib/python3.5/site-packages/lomap-0.0.0-py3.5.egg/lomap/dbmol.py in build_graph(self)
    588         # Display the graph by using Matplotlib
    589         if self.options.display:
--> 590             Gr.draw()
    591 
    592         return self.Graph

~/miniconda3/envs/lomap/lib/python3.5/site-packages/lomap-0.0.0-py3.5.egg/lomap/graphgen.py in draw(self)
   1080         # print 'Graph .png file has been generated...'
   1081 
-> 1082         plt.show()
   1083 
   1084         return

~/miniconda3/envs/lomap/lib/python3.5/site-packages/matplotlib/pyplot.py in show(*args, **kw)
    242     """
    243     global _show
--> 244     return _show(*args, **kw)
    245 
    246 

~/miniconda3/envs/lomap/lib/python3.5/site-packages/ipykernel/pylab/backend_inline.py in show(close, block)
     37             display(
     38                 figure_manager.canvas.figure,
---> 39                 metadata=_fetch_figure_metadata(figure_manager.canvas.figure)
     40             )
     41     finally:

~/miniconda3/envs/lomap/lib/python3.5/site-packages/ipykernel/pylab/backend_inline.py in _fetch_figure_metadata(fig)
    172     """Get some metadata to help with displaying a figure."""
    173     # determine if a background is needed for legibility
--> 174     if _is_transparent(fig.get_facecolor()):
    175         # the background is transparent
    176         ticksLight = _is_light([label.get_color()

~/miniconda3/envs/lomap/lib/python3.5/site-packages/ipykernel/pylab/backend_inline.py in _is_transparent(color)
    193 def _is_transparent(color):
    194     """Determine transparency from alpha."""
--> 195     rgba = colors.to_rgba(color)
    196     return rgba[3] < .5

AttributeError: module 'matplotlib.colors' has no attribute 'to_rgba'

All dependencies were installed using conda, i.e.
conda install networkx
conda install graphviz
conda install pyQt=4

matplotlib version: '1.5.1'
networkx version: '2.2'
python version 3.5

Initialisation was done with:

db_mol = lomap.DBMolecules('path', parallel=12, display=True, output=True, verbose='info')

Code splitting

I opened this issue to discuss if and how to split the new Lomap code developed so far.

(1) As discussed by email, Lomap should work both as a terminal application and as an API. I decided to merge these two parts into one file (dbmol.py) to avoid duplicating the user input checking, which we would love to keep. We agreed to come back to the original solution, dividing this part into two files: a main script for the terminal part and another for the API. I have already started coding it and, if you have no further concerns, I would like to complete it.

(2) Currently the dbmol.py file hosts the main data classes used in the code:
DBMolecules responsible for the molecule bookkeeping;
Molecule as a container for each molecule;
SMatrix a basic symmetric matrix container.

DBMolecules is also responsible for calculating the loose and strict matrices and coordinates the code execution. It is not clear to me how, semantically, you would like to divide it. Do you want to create a file for each class? The file is not huge, but I'm open to other ideas. After the splitting in point (1), the code coordination will be passed out to this file as well.

Total charge comparison threshold

I loosened the total charge comparison threshold from 1e-3 to 1e-2 and committed it to the shuaidev branch. It improves the behavior on our test cases (it puts molecules that we thought had identical net charge into the same group), although more validation may be needed. If this looser threshold causes any problems, we could discuss it here, or change to a net charge comparison instead of summing all partial charges, which is what the code does now.
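The two alternatives mentioned above can be compared side by side. This is only a sketch with hypothetical function names and made-up partial charges; it is not the code on the shuaidev branch.

```python
def same_charge_by_threshold(charges_a, charges_b, tol=1e-2):
    """Compare summed partial charges within a tolerance."""
    return abs(sum(charges_a) - sum(charges_b)) < tol


def same_charge_by_net(charges_a, charges_b):
    """Compare net charges, i.e. summed partial charges rounded to integers."""
    return round(sum(charges_a)) == round(sum(charges_b))


# Made-up partial charges whose sums are about +0.004 and -0.003.
mol_a = [0.504, -0.5]
mol_b = [-0.503, 0.5]

print(same_charge_by_threshold(mol_a, mol_b, tol=1e-3))  # strict threshold
print(same_charge_by_threshold(mol_a, mol_b, tol=1e-2))  # loosened threshold
print(same_charge_by_net(mol_a, mol_b))                  # integer net charge
```

With these numbers the strict threshold separates the two molecules, while the loosened threshold and the integer comparison both group them together. The integer comparison avoids choosing a tolerance at all, at the cost of assuming that partial charges always sum to something close to an integer.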

output pickle files containing DBMolecules database with graph info

This enhancement will generate pickle files containing the DBMolecules object with the graph information. This is useful for avoiding regeneration of the matrices/graph.

Add a fingerprint similarity option to calculate the structure similarity

The MCSS calculation can take up to 20 seconds per pair. For a large dataset (~1000 ligands), the matrix of pairwise MCSS searches will take a long time to finish. So add a fast fingerprint similarity calculation as an alternative structural similarity estimator.
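As an illustration of why this is cheap, here is a sketch of a Tanimoto similarity matrix computed from fingerprints represented as sets of on-bits. The fingerprints are invented for the example; in practice a toolkit such as RDKit would supply them, and the function names here are hypothetical.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def similarity_matrix(fingerprints):
    """Symmetric matrix of pairwise Tanimoto similarities."""
    n = len(fingerprints)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = tanimoto(fingerprints[i], fingerprints[j])
    return matrix


fps = [{1, 2, 3, 4}, {3, 4, 5}, {7, 8}]
for row in similarity_matrix(fps):
    print(row)
```

Each pair costs a couple of set operations, so even ~1000 ligands (about half a million pairs) stays fast compared with a per-pair MCSS search.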

Include substantial test set, output graphics

@shuail - the different mapping options would be a lot easier to explain/document if we could provide output mappings illustrating them. It would be a lot easier to do THAT if we had a substantial test set in here. Do you guys have one you're using which could actually be deposited in the repo? Maybe we should be using the "JACS" set from Schrodinger and deposit the ligands here? Thoughts?

change the max_mol_size in graph layout

I changed max_mol_size from 11 to 50 angstrom to accommodate the larger molecules in our test cases, and also increased the molecule graph size from (100, 100) to (200, 200) to get better resolution. So far there is no bad effect, but if more test cases yield a bad layout, we could discuss it here.

Change the code to be compatible with networkx 2.0

In pull request #28 we pinned networkx to the older version 1.11 to let the tests pass. In the long term, we may need to change the code to be compatible with the latest version of networkx.

Allow use of coordinates to select MCSS matching when desired

Currently, matching for scoring (and for setting up transformations) is done entirely based on MCSS search. In some cases this will be undesirable, such as when a ligand binding mode is expected to be well known for all ligands in a series and is indicated in the input coordinates. In such cases, it might be preferred to use the input coordinates to bias the MCSS matching process rather than going with a "pure MCSS" approach. This should be implemented and made available as an option.
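One possible way to use the coordinates, sketched here under the assumption that the MCSS search can return several candidate atom mappings and that all ligands are already posed in the same reference frame: score each candidate by the RMSD of the mapped atom pairs and keep the lowest. The function names and data layout are hypothetical.

```python
from math import dist, sqrt


def mapping_rmsd(mapping, coords_a, coords_b):
    """RMSD between mapped atom pairs, using the input coordinates as given."""
    squared = [dist(coords_a[i], coords_b[j]) ** 2 for i, j in mapping]
    return sqrt(sum(squared) / len(squared))


def pick_mapping(candidates, coords_a, coords_b):
    """Return the candidate mapping whose atoms overlap best in space."""
    return min(candidates, key=lambda m: mapping_rmsd(m, coords_a, coords_b))


# Two-atom toy example: the second molecule lists its atoms in reverse order.
coords_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
coords_b = [(1.5, 0.0, 0.0), (0.0, 0.0, 0.0)]
candidates = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]

best = pick_mapping(candidates, coords_a, coords_b)
print(best, mapping_rmsd(best, coords_a, coords_b))
```

Both candidates are equally valid from a pure substructure point of view; only the coordinates distinguish them, which is exactly the situation described above for symmetric groups in a known binding mode.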

Molecule RDKit sanitization

As discussed by email, the code currently requires sanitization of the loaded molecules. This is necessary and is relied on in different parts of the current code (mainly in the mcs.py file). So far the code requires information about rings and ring aromaticity (the code follows the approach of the original paper and the original code). I can show examples where, without proper sanitization, RDKit is not able to extract ring info. Therefore code like this, for example:

ring_info = mol.GetRingInfo()
rings = ring_info.AtomRings()

for ring in rings:
    if idx in ring:
        found = True
        break

is not always going to work without proper sanitization. I'm going to double check this to see if we can use some other RDKit functionality. In addition, apparently we would like to skip the ring aromaticity part (to allow transformations from flat to non-flat rings; please correct me if I'm wrong), keeping just the extraction of the ring info. This will require many changes in the code, because ring aromaticity was a key feature, and it needs to be discussed extensively here before I modify this part.

Add a fast graphing option

For a dataset with ~1000 ligands, the graph minimization algorithm will take a long time to finish, since the initial graph contains a large number of connections. It would be good to have an option to grow the graph from a least-connected graph to a larger graph.
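One candidate for the "least connected graph" starting point is a maximum spanning tree over the similarity scores, which connects all ligands with the fewest possible edges while preferring the most similar pairs. The sketch below uses Kruskal's algorithm with a union-find structure; it is an illustration with a hypothetical function name and toy scores, not the algorithm Lomap uses.

```python
def max_spanning_tree(n_nodes, scores):
    """Kruskal's algorithm on similarity scores.

    scores maps (i, j) pairs to a similarity; higher is better.
    Returns the edges of a maximum spanning tree (n_nodes - 1 edges,
    or fewer if the scored pairs do not connect all nodes).
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (i, j), _score in ranked:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two separate components
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


scores = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.8,
          (2, 3): 0.7, (0, 3): 0.1, (1, 3): 0.3}
print(max_spanning_tree(4, scores))
```

From this tree, further edges could be added in order of decreasing similarity until the desired redundancy (for example, every node in a cycle) is reached, instead of starting from the fully connected graph and pruning.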

Error in installing Lomap with networkx 2/fix networkx 2 compatibility

I tried to install Lomap using conda following the installation instructions from the README.md. It appeared to install without any trouble, but when I ran the test I got an error in the build_graph function (see the output from test_lomap.py below). What do I need to do to get Lomap correctly installed and running?

Thanks, Sarah

.usage: LOMAPv1.0 [-h] [-p PARALLEL] [-v {off,info,pedantic}] [-t TIME]
[-e ECRSCORE] [-o] [-n NAME] [-d] [-m MAX] [-c CUTOFF]
directory
LOMAPv1.0: error: argument -p/--parallel: invalid int value: '-1.5'
usage: LOMAPv1.0 [-h] [-p PARALLEL] [-v {off,info,pedantic}] [-t TIME]
[-e ECRSCORE] [-o] [-n NAME] [-d] [-m MAX] [-c CUTOFF]
directory
LOMAPv1.0: error: argument -v/--verbose: invalid choice: 'err_option' (choose from 'off', 'info', 'pedantic')
usage: LOMAPv1.0 [-h] [-p PARALLEL] [-v {off,info,pedantic}] [-t TIME]
[-e ECRSCORE] [-o] [-n NAME] [-d] [-m MAX] [-c CUTOFF]
directory
LOMAPv1.0: error: argument -t/--time: invalid int value: '-1.5'
usage: LOMAPv1.0 [-h] [-p PARALLEL] [-v {off,info,pedantic}] [-t TIME]
[-e ECRSCORE] [-o] [-n NAME] [-d] [-m MAX] [-c CUTOFF]
directory
LOMAPv1.0: error: argument -m/--max: invalid int value: '-5.0'
usage: LOMAPv1.0 [-h] [-p PARALLEL] [-v {off,info,pedantic}] [-t TIME]
[-e ECRSCORE] [-o] [-n NAME] [-d] [-m MAX] [-c CUTOFF]
directory
LOMAPv1.0: error: argument -m/--max: invalid int value: 'string'
..E....

ERROR: test_graph (__main__.TestLomap)

Traceback (most recent call last):
File "test_lomap.py", line 92, in test_graph
graph = db.build_graph()
File "/home/skfegan/miniconda2/lib/python2.7/site-packages/lomap-0.0.0-py2.7.egg/lomap/dbmol.py", line 531, in build_graph
Gr = graphgen.GraphGen(self)
File "/home/skfegan/miniconda2/lib/python2.7/site-packages/lomap-0.0.0-py2.7.egg/lomap/graphgen.py", line 164, in __init__
self.connectSubgraphs()
File "/home/skfegan/miniconda2/lib/python2.7/site-packages/lomap-0.0.0-py2.7.egg/lomap/graphgen.py", line 524, in connectSubgraphs
connectSuccess = self.connectGraphComponents_brute_force()
File "/home/skfegan/miniconda2/lib/python2.7/site-packages/lomap-0.0.0-py2.7.egg/lomap/graphgen.py", line 586, in connectGraphComponents_brute_force
similarity = self.dbase.loose_mtx[nodesOfI[k],nodesOfJ[l]]
File "/home/skfegan/miniconda2/lib/python2.7/site-packages/networkx/classes/reportviews.py", line 178, in __getitem__
return self._nodes[n]
KeyError: 0

Ran 8 tests in 4.037s

FAILED (errors=1)

Add support to python 3

The code must support python 3 as well

Confusion about example output

If I run the example script I get an output as seen in the attached image. Two molecules and the detected MCSS. To me this looks wrong, but maybe I am just misunderstanding the output I am supposed to get.
Should I not allow for rings to break and therefore one of the rings should be detected in the MCSS?

MCS atom mappings and hydrogens

For my applications I need the mappings between the atom indices of the pairs of molecules that are connected in the network graph, i.e. a list of pairs of integers. For instance, suppose there are two molecules, each having 3 atoms, and the two molecules share two atoms; I need a mapping of the form

1 1
2 3

where the first column contains the indices of the "common" atoms of the first molecule and the second column contains the indices of the common atoms in the second molecule.

Your code seems to be able to provide this information, but due to the yet minimal documentation (and lack of time to study the source code thoroughly) I just added a quick hack which gave me the information I was looking for (which might not be the most elegant way). I got the information (written to files) by adding the last three of the following lines to the code in the dbmol.py:

# Maximum Common Subgraph (MCS) calculation    
MC = mcs.MCS(moli, molj, options=self.options)
with open("mcs_mapping_" + str(i) + "_" + str(j), "w") as mappingFile:
    for item in MC._MCS__map_moli_molj:
        mappingFile.write(str(item[0]) + " " + str(item[1]) + "\n")

Now the problem is that this outputs only the MCS mappings without the hydrogen atoms, but I also need the hydrogens included in the mappings. How can this be done?

In the file dbmol.py, line 310, there is also a small bug/typo:
loggign.warning(30*'-')
which results in the termination of the program when the warning is about to be issued.

Finish getting `dev` branch up to speed and merged into master

See discussion here:
#34 (comment)

rdkit cannot kekulize mol to remove hydrogens

Here is an example where rdkit cannot kekulize a molecule when it tries to remove the hydrogen atoms. To simplify the problem, I just copy and paste the molecule and the two lines of code that read the molecule in as an rdkit mol object and remove the hydrogens. The code is borrowed from the LOMAP code, so running the LOMAP code gives the same warning message.

The example.mol2 file looks like below:

@<TRIPOS>MOLECULE
example
46 49 0 0 0
SMALL
GASTEIGER

@<TRIPOS>ATOM
1 C -4.5556 -0.2844 1.1718 C.3 1 LIG1 -0.0109
2 C -6.0291 -0.7271 1.2334 C.3 1 LIG1 0.0493
3 C -6.4413 -0.5958 -1.0493 C.3 1 LIG1 0.0493
4 C -5.1977 0.3130 -1.1927 C.3 1 LIG1 -0.0109
5 C 5.5992 -2.5640 -0.8780 C.ar 1 LIG1 -0.0253
6 O -6.3822 -1.4588 0.0764 O.3 1 LIG1 -0.3796
7 C 2.8943 1.6722 0.9911 C.ar 1 LIG1 0.2664
8 C 5.1745 -2.0407 0.3480 C.ar 1 LIG1 0.1371
9 C -1.6179 0.4017 0.1577 C.ar 1 LIG1 0.2173
10 C -4.0573 -0.1702 -0.2838 C.3 1 LIG1 0.0275
11 C 0.8767 -0.2307 1.1489 C.ar 1 LIG1 0.0370
12 C 2.1438 -0.5325 1.6439 C.ar 1 LIG1 -0.0306
13 C 6.1958 -1.7294 -1.8279 C.ar 1 LIG1 -0.0590
14 C 6.3717 -0.3702 -1.5525 C.ar 1 LIG1 -0.0605
15 C 5.9487 0.1564 -0.3282 C.ar 1 LIG1 -0.0452
16 C 0.6358 1.0320 0.5744 C.ar 1 LIG1 0.1483
17 C -0.1716 -1.1537 1.2042 C.ar 1 LIG1 0.0418
18 C 3.1618 0.4153 1.5592 C.ar 1 LIG1 0.0780
19 C 5.3424 -0.6749 0.6231 C.ar 1 LIG1 0.0480
20 C 1.3530 3.2786 -0.1013 C.3 1 LIG1 0.0167
21 F 4.6032 -2.8623 1.2640 F 1 LIG1 -0.2043
22 S 4.7969 0.0115 2.1898 S.3 1 LIG1 -0.0812
23 N -1.3906 -0.8211 0.7091 N.ar 1 LIG1 -0.2222
24 O 3.8206 2.5277 0.9363 O.2 1 LIG1 -0.2664
25 N 1.6412 1.9659 0.5033 N.ar 1 LIG1 -0.2949
26 N -0.6088 1.3106 0.0937 N.ar 1 LIG1 -0.1964
27 N -2.9091 0.7394 -0.3655 N.pl3 1 LIG1 -0.3104
28 H -3.9262 -1.0225 1.7144 H 1 LIG1 0.0305
29 H -4.4544 0.6942 1.6907 H 1 LIG1 0.0305
30 H -6.1785 -1.3738 2.1237 H 1 LIG1 0.0560
31 H -6.6965 0.1565 1.3647 H 1 LIG1 0.0560
32 H -7.3658 0.0220 -1.0063 H 1 LIG1 0.0560
33 H -6.5227 -1.2302 -1.9574 H 1 LIG1 0.0560
34 H -4.8575 0.3261 -2.2513 H 1 LIG1 0.0305
35 H -5.4753 1.3532 -0.9112 H 1 LIG1 0.0305
36 H 5.4676 -3.6168 -1.0922 H 1 LIG1 0.0646
37 H -3.7461 -1.1771 -0.6436 H 1 LIG1 0.0500
38 H 2.3428 -1.4998 2.0895 H 1 LIG1 0.0638
39 H 6.5237 -2.1362 -2.7758 H 1 LIG1 0.0618
40 H 6.8363 0.2748 -2.2870 H 1 LIG1 0.0618
41 H 6.0904 1.2094 -0.1219 H 1 LIG1 0.0630
42 H -0.0243 -2.1352 1.6372 H 1 LIG1 0.0838
43 H 2.2342 3.9528 -0.1073 H 1 LIG1 0.0457
44 H 0.5450 3.7853 0.4685 H 1 LIG1 0.0457
45 H 1.0258 3.1432 -1.1544 H 1 LIG1 0.0457
46 H -3.0166 1.6655 -0.8392 H 1 LIG1 0.1492
@<TRIPOS>BOND
1 1 2 1
2 1 10 1
3 2 6 1
4 3 4 1
5 3 6 1
6 4 10 1
7 5 8 ar
8 5 13 ar
9 7 18 ar
10 7 24 2
11 7 25 ar
12 8 19 ar
13 8 21 1
14 9 23 ar
15 9 26 ar
16 9 27 1
17 10 27 1
18 11 12 ar
19 11 16 ar
20 11 17 ar
21 12 18 ar
22 13 14 ar
23 14 15 ar
24 15 19 ar
25 16 25 ar
26 16 26 ar
27 17 23 ar
28 18 22 1
29 19 22 1
30 20 25 1
31 1 28 1
32 1 29 1
33 2 30 1
34 2 31 1
35 3 32 1
36 3 33 1
37 4 34 1
38 4 35 1
39 5 36 1
40 10 37 1
41 12 38 1
42 13 39 1
43 14 40 1
44 15 41 1
45 17 42 1
46 20 43 1
47 20 44 1
48 20 45 1
49 27 46 1

And the example.py code looks like

from rdkit.Chem import AllChem
from rdkit import Chem

rdkit_mol = Chem.MolFromMol2File("example.mol2", sanitize=False, removeHs=False)
mol = AllChem.RemoveHs(rdkit_mol)

If running the example.py, it will return an error as below:

ValueError: Sanitization error: Can't kekulize mol. Unkekulized atoms: 8 10 11 15 16 17 22 24 25

It seems rdkit cannot understand the molecule when it tries to remove the hydrogens, probably because of the format of the mol2 file? I used openbabel to convert the mol2 file from an sdf file.

Rule mncar does not make any sense

The central bit of the rule is

scr_mncar = float((nha_mcs_mol >= ths) or (nha_moli + 3) or (nha_molj + 3))

where the variables are numbers of atoms and as such are either positive integers or zero. ths is a threshold value, 4 by default. This means that the first expression is either True or False. If it is False, the second expression will be evaluated and will be a positive integer. As such, the third expression will never be evaluated.

So in sum, scr_mncar will be either 1.0 or float(nha_moli + 3).

What's the point of that?
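The behaviour described above can be reproduced directly. The wrapper function is only there for the demonstration; the expression inside is the one quoted from the rule.

```python
def scr_mncar(nha_mcs_mol, nha_moli, nha_molj, ths=4):
    # Expression as quoted above; Python's 'or' returns the first truthy operand.
    return float((nha_mcs_mol >= ths) or (nha_moli + 3) or (nha_molj + 3))


print(scr_mncar(6, 10, 12))  # MCS at or above threshold: float(True) -> 1.0
print(scr_mncar(2, 10, 12))  # MCS below threshold: float(10 + 3) -> 13.0
```

Since `nha_moli + 3` is always at least 3 and therefore truthy, the third operand is never reached, and the rule can return values far outside the 0 to 1 range expected of a score.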

Enhancement questions

Hi,

would it make sense to add some of these enhancements?

automatic versioning using something like versioneer
switch to pytest rather than nose
offer a pypi install rather than the outdated conda?

In terms of coding style there seems quite a vast mix of camelcase and python style naming of functions. Would it make sense to make this more uniform?

Problem specifying the hub molecule

Dear all,

I am running lomap on a set of molecules and it runs fine using python 2.7. I have the following problem though: when I try to specify the hub molecule, so as to estimate the similarity scores of all the rest of the molecules with respect to the hub molecule I do not get the desired results. The output is the same graph as the one I get without specifying the hub option. I use the following command:

lomap -v pedantic -b system_01.mol2 -a ~/molecules/test

Can anyone please tell me what might be the problem?

Sincerely,

Agisilaos

improve the drawing function to lay out the 2D structure

The original code skips the cases where rdkit cannot kekulize molecules correctly, and so cannot lay out the image properly in the final graph for those molecules. Looking at this in detail, the "cannot kekulize" error happens in AllChem.RemoveHs (rdkit has bad hydrogen detection?). So here, for molecules that fail in AllChem.RemoveHs, we just skip this function but still compute the 2D coordinates and draw the image files. This improves the layout: all the molecules can be drawn in the final graph, and some molecules that originally could not be kekulized will be shown with hydrogens.

Issues with current perturbation map generation

[dummy issue will expand tomorrow, but needed it to reference]

Installing lomap via mobleylab conda channel

Hi,

I am trying to use the conda install of lomap from the Mobley lab channel. Is this still a valid way to install things?
I find it quite worrying that conda can simply downgrade my python from 3.5 to 2.7:

The following packages will be DOWNGRADED:

    python:                        3.5.5-h5001a0f_2      conda-forge --> 2.7.15-h43f7c74_0 conda-forge

If you don't know what you are doing this may mess with your conda environments quite badly.

Output problems for example.py and import problems for dbmol.py

Dear Lomap,

I had some problems when I tried to run the python scripts of lomap; please see the following for details.

(1) #LINUX VERSION
LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: SUSE LINUX
Description: SUSE Linux Enterprise Server 11 (x86_64)
Release: 11
Codename: n/a

(2) #LOMAP INSTALLATION
conda config --add channels nividic
conda create -c nividic -n my-lomap lomap
source activate my-lomap

#all prerequisites were installed
conda list
boost 1.56.0 py34_1 nividic
graphviz 2.38.0 1 mobleylab
lomap 0.0.0 py34_0 nividic
pygraphviz 1.4rc1 py34_0 nividic
rdkit 2015.09.2 np110py34_0 nividic
matplotlib 1.5.1 np110py34_0
networkx 1.11 py34_0
pyqt 4.11.4 py34_4
…..

(3) python example.py
#basic or radial
#db_mol = lomap.DBMolecules('test/basic/', output=True)
#db_mol = lomap.DBMolecules('test/radial/', output=True)
#outputs are as following
out.txt
out_score_with_connection.txt
mcs.png

Question One: Why didn’t the script output all results? Such as out.png, out.pickle, out.pdf, out.eps, and out.doc?

(4) #open dbmol.py and input commands
python

from lomap import mcs
Traceback (most recent call last):
File "", line 1, in
ImportError: cannot import name 'mcs'
from lomap import graphgen
Traceback (most recent call last):
File "", line 1, in
ImportError: cannot import name 'graphgen'

Question Two: What caused these problems and how to solve them?

Thank you and look forward to your reply.

How to plot the graph?

Hi, I've found this picture generated by this package in a paper. But I can only get the following picture. What is the right way to plot the image with the molecule structures as the nodes?

Allowing different charges in the network graph

Lomap seems to connect molecules only if they have the same charges.

However, I also need to connect molecules with different charges for experimental purposes. How is it possible to disable this restriction? (I am aware that it is recommended not to change charges in alchemical free energy simulations.)

Developer guide as a wiki page?

Everyone good with using a wiki page on github for the developer guide?

add the radial option in graphing

I added the radial graphing options (the commit in shuaidev on 07/11). The idea is that if we have a lead compound and the rest of the compounds are derivatives of it, we want direct calculations from all derivatives to the lead compound and also a cycle among the derivatives. The layout is similar to a bicycle wheel, so we named this the radial option. I made changes to the code to provide the two radial options below:

The Complete radial option picks the radial center (lead compound) automatically by structural similarity: the ligand that shares the most structural similarity with the others is picked as the lead compound.

The Biased radial option takes a user-specified compound as the lead compound.
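The automatic hub choice can be pictured as follows. This is a sketch under the assumption that "most structural similarity with others" means the highest summed pairwise similarity; the function names are hypothetical, and only the spokes from the hub are built, not the cycle among the derivatives.

```python
def pick_hub(similarity):
    """Index of the ligand with the highest summed similarity to the others."""
    n = len(similarity)
    totals = [sum(similarity[i][j] for j in range(n) if j != i)
              for i in range(n)]
    return max(range(n), key=totals.__getitem__)


def radial_edges(hub, n_ligands):
    """One edge from the hub to every other ligand."""
    return [(hub, i) for i in range(n_ligands) if i != hub]


similarity = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.3],
    [0.8, 0.3, 1.0],
]
hub = pick_hub(similarity)                 # "complete" radial: chosen automatically
print(hub, radial_edges(hub, len(similarity)))
print(radial_edges(2, len(similarity)))    # "biased" radial: hub given by the user
```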

Wishlist

This is a wishlist of features I would find useful in LOMAP [currently incomplete, will expand in the next couple of days]:

When a network is drawn, keep molecules in the same orientation for the MCS (see attached image, where even the same groups are pictured in different ways).
Rather than using a node ID, give each node the filename as its ID. Then I don't have to go back to the text file that contains the dictionary mapping node IDs to filenames.
Write out edges in a csv style, i.e. filename, filename. The additional information with regard to similarity etc. is also useful; however, if I want to quickly generate perturbations based on mol2 files, I just need a list of edges given by filenames.
I would like easy integration with BioSimSpace, so we can use automated generation of mappings. Currently, about 50% of the autogenerated mappings are unlikely to work in a free energy code, see issue #46.
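The csv-style edge list could look like the sketch below. It assumes the edges are available as pairs of node indices and the filenames as a list indexed by node; `write_edges_csv` and the filenames are hypothetical, not existing Lomap output.

```python
import csv
import io


def write_edges_csv(edges, filenames, handle):
    """Write one 'filename,filename' row per edge of the network."""
    writer = csv.writer(handle)
    for i, j in edges:
        writer.writerow([filenames[i], filenames[j]])


filenames = ["lig_a.mol2", "lig_b.mol2", "lig_c.mol2"]
edges = [(0, 1), (1, 2)]

buffer = io.StringIO()  # a real call would pass an open file instead
write_edges_csv(edges, filenames, buffer)
print(buffer.getvalue())
```

Using the filenames directly also covers the previous wish: the node-ID-to-filename dictionary is no longer needed to interpret the output.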

What would be really cool:
An output that could be run in some kind of interactive network display environment (Javascript?), where nodes expand only on click to show the structure, and otherwise only filenames and maybe similarity scores are displayed.