xyz2mol's Introduction

xyz2mol has now been implemented in RDKit:

from rdkit import Chem
from rdkit.Chem import rdDetermineBonds

raw_mol = Chem.MolFromXYZFile('acetate.xyz')
mol = Chem.Mol(raw_mol)
rdDetermineBonds.DetermineBonds(mol,charge=-1)

Convert Cartesian coordinates to one or more molecular graphs

Given Cartesian coordinates in the form of a .xyz file, the code constructs a list of one or more molecular graphs. If several resonance forms are possible, xyz2mol returns a list of all of them; otherwise it returns a list with a single entry.
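
To illustrate the first step of such a conversion, the sketch below derives atom connectivity from Cartesian coordinates with a simple distance cutoff. This is not xyz2mol's actual implementation: the small covalent-radii table, the 1.3 scaling factor, and the function name `connectivity_from_xyz` are assumptions made for illustration only.

```python
import math

# Approximate single-bond covalent radii in angstrom (illustrative values).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def connectivity_from_xyz(symbols, coords, scale=1.3):
    """Return a symmetric 0/1 adjacency matrix based on a distance cutoff."""
    n = len(symbols)
    ac = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two atoms count as bonded if they are closer than the
            # scaled sum of their covalent radii.
            cutoff = scale * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if math.dist(coords[i], coords[j]) <= cutoff:
                ac[i][j] = ac[j][i] = 1
    return ac


# Water: O is bonded to both H atoms; the two H atoms are not bonded.
symbols = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
adjacency = connectivity_from_xyz(symbols, coords)
```

Connectivity alone does not fix bond orders or formal charges; assigning those is the part that can yield more than one valid graph.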

This code is based on the work of DOI: 10.1002/bkcs.10334

Yeonjoon Kim and Woo Youn Kim
"Universal Structure Conversion Method for Organic Molecules:
From Atomic Connectivity to Three-Dimensional Geometry"
Bull. Korean Chem. Soc.
2015, Vol. 36, 1769-1777

Setup

Depends on rdkit, numpy, and networkx. The easiest way to set it up is via anaconda/conda:

conda install -c conda-forge xyz2mol

Setup for a standalone environment is available via the Makefile. To set up and test, simply clone the project and run make.

git clone https://github.com/jensengroup/xyz2mol

and then run the following in the xyz2mol folder

make
make test

Note that it is also possible to run the code without the networkx dependency, but it is slower.

Example usage

Read in an xyz file and print out the SMILES, but don't encode the chirality.

xyz2mol.py examples/chiral_stereo_test.xyz --ignore-chiral

Read in an xyz file, print it out in SDF format, and save it to a file

xyz2mol.py examples/chiral_stereo_test.xyz -o sdf > save_file.sdf

Read in an xyz file with a charge and print out the SMILES

xyz2mol.py examples/acetate.xyz --charge -1

Dependencies:

rdkit # (version 2019.9.1 or later needed for huckel option)
networkx

xyz2mol's Issues

fixed exploring all bond orders with different starting atoms

the changed lines:

def get_bo(ac, p_ua, du, valences):
    bo = ac.copy()
    ua = list(p_ua)
    while len(du) > 1:
        ua_pairs = itertools.combinations(ua, 2)
        for i, j in ua_pairs:
            if bo[i, j] > 0:
                bo[i, j] += 1
                bo[j, i] += 1
                break
        bo_valence = list(bo.sum(axis=1))
        ua_new, du_new = get_ua(valences, bo_valence)
        if du_new != du:
            ua = [ua_i for ua_i in p_ua if ua_i in ua_new]
            du = copy.copy(du_new)
        else:
            break
    return bo
.
.

# implementation of algorithm shown in Figure 2
# UA: unsaturated atoms
# DU: degree of unsaturation (u matrix in Figure)
# best_BO: Bcurr in Figure 

is_best_bo = False
for valences in valences_list:
    ac_valence = list(ac.sum(axis=1))
    ua, du_from_ac = get_ua(valences, ac_valence)
    if len(ua) == 0 or bo_is_ok(ac, ac, charg, du_from_ac,
                                atomic_valence_electrons,
                                atomic_num_list,
                                charged_fragments):
        best_bo = ac.copy()
        break
    ua_perm = itertools.permutations(ua)
    for a in ua_perm:
        bo = get_bo(ac, a, du_from_ac, valences)
        if bo_is_ok(bo, ac, charg, du_from_ac,
                    atomic_valence_electrons,
                    atomic_num_list,
                    charged_fragments):
            best_bo = bo.copy()
            is_best_bo = True
            break
        elif bo.sum() > best_bo.sum():
            best_bo = bo.copy()
            print('best comb not found')
    if is_best_bo:
        break
.
.
.

atomic valence of O atom inconsistent with RDkit

Line 56 in xyz2mol.py:
atomic_valence[8] = [2,1,3]

That causes failures when parsing some structures. I suggest removing 3 from the list of allowed valences.

Complaint from RDKit:
[16:14:52] Explicit valence for atom # 1 O, 3, is greater than permitted

NameError: name 'defaultdict' is not defined

"import xyz2mol" gives the following error: "NameError: name 'defaultdict' is not defined"
Am I doing something wrong, or did the last commit cause this issue ("Removed defaultdict import")?
(see: 14068eb)

My use case:
I want to read an .xyz file and convert it to an RDKit mol file.

best regards,
Urmas

Stack trace:

NameError Traceback (most recent call last)
in ()
----> 1 import xyz2mol

/usr/local/lib/python3.7/dist-packages/xyz2mol.py in ()
47 global atomic_valence_electrons
48
---> 49 atomic_valence = defaultdict(list)
50 atomic_valence[1] = [1]
51 atomic_valence[5] = [3,4]

NameError: name 'defaultdict' is not defined
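
The traceback shows that the name `defaultdict` is used at module level without being imported. A minimal sketch of the fix, assuming the tables are built as in the trace (the two entries are copied from it, the lookup of element 118 is only a demonstration):

```python
from collections import defaultdict  # the import the traceback shows is missing

atomic_valence = defaultdict(list)
atomic_valence[1] = [1]
atomic_valence[5] = [3, 4]

# With a defaultdict, unknown elements yield an empty list
# instead of raising KeyError.
unknown = atomic_valence[118]
```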

Exceptionally long runtime for a few organic structures from materials project

Hello,

first of all thanks for this great script! It is really useful and does a good job at solving this tricky task.

I was using it on a few thousand organic molecules from the materials project database and realized that a few structures always lead to exceptionally long runtime (~10 minutes compared to <1 second for most other molecules).

  • Python 3.7.5
  • RDKit 2019.09.2

I'm calling with:
python xyz2mol.py molecule.xyz --use-huckel --charge 0
When I set --no-charged-fragments the calculation instead takes only ~1 minute but this is still a lot longer than for other structures.

Do you have any idea why these structures take so long? Is there anything I could do about it?

You can find the .xyz of two example structures and the resulting SMILES strings below:

  • 1 N#CC(C#N)=C1c2cc([N+](=O)[O-])cc([N+](=O)[O-])c2-c2c1cc([N+](=O)[O-])cc2[N+](=O)[O-]
34
Properties=species:S:1:pos:R:3 pbc="F F F"
C        0.73692500       0.51863000      -0.03802500
C       -0.73709800       0.51853500       0.03824500
C        1.17836400      -0.83219800       0.04762500
C        1.70690900       1.51141400      -0.22401300
C       -1.17839500      -0.83234800      -0.04754000
C       -1.70721100       1.51120700       0.22420600
C        0.00006400      -1.72012500       0.00007400
C        2.53564000      -1.15199000       0.09649200
C        3.06628200       1.21639600      -0.15076800
N        1.37416700       2.88102000      -0.66635000
C       -2.53564500      -1.15224700      -0.09678300
C       -3.06652800       1.21608200       0.15058700
N       -1.37459700       2.88074400       0.66686200
C        0.00030300      -3.09187400       0.00024400
C        3.45228900      -0.10548900       0.03352700
O        0.38166200       2.99492000      -1.38745000
O        2.13692500       3.78391800      -0.35163400
C       -3.45237300      -0.10580900      -0.03401100
O       -0.38185500       2.99457700       1.38765000
O       -2.13743300       3.78365300       0.35239600
C       -1.18832100      -3.88288100      -0.11103400
C        1.18921900      -3.88238300       0.11179100
N        4.89748200      -0.41699000       0.11258300
N       -4.89754100      -0.41737300      -0.11356500
N       -2.12696000      -4.56470000      -0.20142600
N        2.12802200      -4.56397100       0.20223900
O        5.20955100      -1.59603200       0.25743400
O        5.68097700       0.52605800       0.03167900
O       -5.20951000      -1.59645600      -0.25831000
O       -5.68110900       0.52560000      -0.03243400
H        2.90408600      -2.16593900       0.17608200
H        3.80486200       2.00100800      -0.26123200
H       -2.90394700      -2.16622200      -0.17654900
H       -3.80521500       2.00060700       0.26094000
  • 2 [NH2+]=C1N=CN=C2N3[C@@H]4O[C@H](CO[P@@](=O)(O[P@](=O)(O)O[P@@]([O-])(O)=[OH+])OC35[N-]C125)[C@@H](O)[C@H]4O
45
Properties=species:S:1:pos:R:3 pbc="F F F"
O        4.36294500      -1.55898200       1.19840700
C        3.10513300      -0.92751400       0.97875100
C        2.86634300      -0.72258900      -0.53503300
C        2.02171300      -1.95959500       1.33356600
N        1.91655400       0.40588800      -0.76397200
O        2.34711500      -1.92674100      -1.03009800
O        2.29221600      -2.71678000       2.48855300
C        1.92958000      -2.80125800       0.04338700
C        2.15543200       1.70089100      -0.30613100
C        0.73195100       0.48380800      -1.46547600
C        0.54836100      -3.34299100      -0.26937900
C        1.06929500       2.46700000      -0.72438200
N        3.21680800       2.13289500       0.38457700
N        0.19196000       1.68276100      -1.45614000
O       -0.39916100      -2.24266900      -0.37758300
C        1.08410500       3.83051700      -0.35954800
C        3.12623500       3.44233300       0.65789300
P       -0.83728200      -1.70962400      -1.81922600
N        0.10761000       4.70578100      -0.70064300
N        2.14223600       4.29011900       0.33496900
O       -2.20702200      -0.94827500      -1.47786400
O        0.23407600      -0.52044500      -2.20670500
O       -0.92585400      -2.67604700      -2.91904500
P       -2.85940700       0.33289900      -0.70924700
O       -1.97321500       0.42381000       0.64791500
O       -2.45515700       1.58735300      -1.56091900
O       -4.30444600       0.14212700      -0.44905300
P       -2.44426400       0.14947500       2.21721200
O       -1.53871300       0.80975600       3.17277700
O       -3.97216200       0.57457000       2.23570700
O       -2.45168500      -1.46410900       2.28946500
H        5.04302400      -0.88269600       1.34022000
H        3.02449600      -0.00216500       1.54945000
H        3.78897400      -0.48062500      -1.06991600
H        1.07625300      -1.43888400       1.50963300
H        3.25561600      -2.86261800       2.51878700
H        2.63099400      -3.64393100       0.10106200
H       -1.44141000       1.72447800      -1.63391000
H        0.53920000      -3.92335800      -1.19561200
H        0.19327800      -3.96716600       0.55497200
H        3.95475600       3.87280900       1.21424500
H       -0.77975900       4.37659500      -1.05328500
H        0.13788800       5.62638600      -0.28299800
H       -4.43938400       0.38504100       1.38103200
H       -1.79620500      -1.78939000       2.92936300

Release on conda

Would you be interested in releasing this library on PyPI (just the source package would be enough), so that I can create a conda package from it?

does this code need hydrogens to be present ?

I am looking for code that assigns bond orders for structures without hydrogens. I assume xyz2mol needs hydrogens to be present and correct in order to work. Are you planning to extend this?

Conversion takes too much time for some molecules

This script takes too much time (more than 30 minutes) for some molecules.

How to reproduce:

  • Python 3.6.5
  • RDKit 2019.09.1 (from anaconda)

Run the following script.

python xyz2mol.py test.xyz

The content of test.xyz is as follows:

61
0FX_3VBL_A
O        -10.51300       36.55700      -27.27400
P        -11.25000       35.54400      -26.43500
O        -10.78200       34.13900      -26.15300
O        -12.65500       35.24600      -27.15400
C        -13.51900       36.30600      -27.56700
C        -14.93800       36.12500      -27.04000
O        -14.90500       35.78700      -25.64900
C        -15.44200       34.90700      -27.80900
O        -16.75100       34.53100      -27.35500
C        -15.29800       35.10000      -29.32200
N        -15.64600       33.87000      -30.01300
C        -13.87800       35.53100      -29.70800
C        -13.59800       35.60000      -31.21200
O        -13.48100       36.68500      -28.94500
O        -11.65300       36.16700      -25.00900
P        -10.58100       36.93800      -24.08500
O         -9.26000       36.18000      -24.12600
O        -11.33600       37.17300      -22.80000
O        -10.36700       38.39500      -24.71900
C        -11.50200       39.22600      -24.97700
C        -11.07500       40.67100      -25.21600
O        -10.51700       41.22000      -24.02100
C         -9.91900       40.67700      -26.21600
O        -10.44500       40.76600      -27.54300
C         -9.09100       41.89000      -25.81000
C         -9.32400       41.96600      -24.30500
N         -8.16000       41.47000      -23.56100
C         -8.17300       40.23100      -23.03100
C         -7.07300       39.76700      -22.32500
C         -7.01100       38.39400      -21.69400
C         -7.09000       42.26400      -23.45200
O         -7.09800       43.41300      -23.96200
N         -5.99900       41.85800      -22.77600
C         -5.99300       40.62500      -22.21800
O         -4.97900       40.25600      -21.60000
H        -13.17510       37.10900      -27.08020
H        -15.50310       36.93190      -27.21210
H        -14.06460       35.28360      -25.44810
H        -14.83410       34.15150      -27.56480
H        -17.44190       35.01140      -27.89520
H        -15.93440       35.81520      -29.61090
H        -15.54980       34.00320      -30.99940
H        -15.03590       33.13600      -29.71470
H        -13.29060       34.79340      -29.37500
H        -12.65230       35.88820      -31.36210
H        -13.74010       34.69780      -31.61930
H        -14.21870       36.25900      -31.63670
H        -12.11810       39.19250      -24.19000
H        -11.97860       38.88690      -25.78810
H        -11.84030       41.22430      -25.54490
H         -9.37620       39.84290      -26.11760
H         -9.69160       40.77020      -28.20060
H         -9.41570       42.71780      -26.26750
H         -8.12280       41.75190      -26.01850
H         -9.47860       42.92170      -24.05440
H         -8.97580       39.64610      -23.14680
H         -6.12840       38.27580      -21.23900
H         -7.74680       38.30190      -21.02310
H         -7.11680       37.69640      -22.40260
H         -5.20450       42.45870      -22.68740
H        -16.59180       33.62440      -29.80050

This xyz file is generated from 0FX_3VBL_A of the Platinum Dataset 2017_01 by Open Babel 3.0.0.

This molecule actually has a charge of -1 according to Chem.rdmolops.GetFormalCharge(), but adding the correct charge information (replacing the second line with charge=-1=) did not solve the problem.

segmentation fault of RDkit Get3DDistanceMatrix()

I get segmentation faults when using the Get3DDistanceMatrix function of RDKit (I updated to the latest stable RDKit release and still get the issue).

By changing
dMat = Chem.Get3DDistanceMatrix(mol)
in xyz2AC
to
dMat = distance_matrix(xyz, xyz)

and adding from scipy.spatial import distance_matrix to the preamble, I don't have any issues at all.
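
For reference, the replacement only needs plain pairwise Euclidean distances between atoms. The sketch below computes the same kind of matrix with the standard library alone; `pairwise_distances` is a hypothetical helper name and is not part of xyz2mol, RDKit, or SciPy.

```python
import math


def pairwise_distances(xyz):
    """Return the full matrix of Euclidean distances between all points."""
    return [[math.dist(a, b) for b in xyz] for a in xyz]


# Two points forming a 3-4-5 right triangle with the origin.
xyz = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
d_mat = pairwise_distances(xyz)
```

This runs in pure Python and is therefore slower than a vectorised routine for large molecules; it is meant only to show what the matrix contains.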

Strange case when converting xyz file to mol

Thank you for your work! Your code helped me a lot.
I have used xyz2mol to convert many .xyz files to RDKit mol objects, but for some molecules in the QM9 dataset the result looks strange and seems incorrect.
strange_mol_case

I have taken a picture of this case. I used RDKit to visualize the resulting mol, and it has an H atom that is not connected to any other atom. This is my file (sorry for the file extension; I had to change it to .txt to upload):
dsgdb9nsd_059827.txt

Do you have any idea about this case?
Something seems wrong and I don't know how to fix it.
Thank you a lot!

Add UserWarning for incorrect charge?

If the following check fails

xyz2mol/xyz2mol.py

Lines 507 to 509 in f512673

# If charge is not correct don't return mol
if Chem.GetFormalCharge(mol) != charge:
    return []

an empty list is returned by AC2mol and therefore xyz2mol.py terminates without any output whatsoever.

I suggest adding a UserWarning in case this check fails, which prints out both Chem.GetFormalCharge(mol) and charge, to indicate to the user what is happening.

I can open a PR from the code on my fork, unless you prefer to keep the current "silent mode".
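
A minimal sketch of what such a warning could look like. The helper name `charge_matches` and the message text are assumptions for illustration; in xyz2mol the first argument would come from Chem.GetFormalCharge(mol).

```python
import warnings


def charge_matches(formal_charge, charge):
    """Return True if the charges agree; otherwise warn and return False."""
    if formal_charge != charge:
        warnings.warn(
            f"Formal charge of generated molecule ({formal_charge}) "
            f"does not match requested charge ({charge}); no molecule returned.",
            UserWarning,
        )
        return False
    return True


# Record warnings so the example captures what the user would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = charge_matches(0, -1)
```

Using the warnings module rather than print lets callers silence or escalate the message with the standard warning filters.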

'charge_OK' referenced before assignment

With this molecule from the QM9 dataset, the code crashes.

14
gdb 17732	4.70059	2.58463	2.12795	1.0519	59.76	-0.2373	0.0592	0.2964	645.2599	0.108401	-382.406252	-382.40113	-382.400186	-382.435117	21.702	
C	-0.0014375802	 1.4421265986	-0.0783539803	-0.173276
O	 1.4279653634	 1.680945134	-0.0080214605	-0.29881
C	 2.039377242	 0.3725618284	 0.2411593666	 0.062175
C	 1.2851358508	-0.6213188634	-0.6464578571	-0.111225
C	 0.8430229634	-1.6066653822	 0.4586764181	-0.013319
O	-0.6285080863	-1.2983195241	 0.3001821354	-0.387437
C	 0.0746804748	-0.0497085445	 0.1590760325	 0.408626
C	 1.1413118459	-0.3240770307	 1.2671705337	-0.111221
H	-0.5189406562	 2.0037218427	 0.7059437231	 0.118485
H	-0.386047951	 1.7290984788	-1.0622458982	 0.118485
H	 3.1161899719	 0.5067876894	 0.3012405973	 0.091326
H	 1.3044717917	-0.7751572283	-1.7230279892	 0.095211
H	 1.0880849399	-2.6505456812	 0.6392374907	 0.105768
H	 0.99978614	-0.1454833675	 2.3307110179	 0.095211

Here is the stack trace:

python /home/peter/kodesjov/xyz2mol/xyz2mol.py dsgdb9nsd_017732.xyz 
Traceback (most recent call last):
  File "/home/peter/kodesjov/xyz2mol/xyz2mol.py", line 789, in <module>
    mol = xyz2mol(atoms, xyz_coordinates,
  File "/home/peter/kodesjov/xyz2mol/xyz2mol.py", line 710, in xyz2mol
    new_mol = AC2mol(mol, AC, atoms, charge,
  File "/home/peter/kodesjov/xyz2mol/xyz2mol.py", line 478, in AC2mol
    BO, atomic_valence_electrons = AC2BO(
  File "/home/peter/kodesjov/xyz2mol/xyz2mol.py", line 468, in AC2BO
    if not charge_OK:
UnboundLocalError: local variable 'charge_OK' referenced before assignment
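
An UnboundLocalError of this kind usually means the variable is only assigned inside a loop or branch that never ran for this input. A generic sketch of the pattern and the usual remedy follows; `find_valid` and its arguments are hypothetical and do not reproduce the actual AC2BO code.

```python
def find_valid(candidates, is_valid):
    """Return (found, candidate); safe even when `candidates` is empty."""
    found = False  # initialise before the loop so the name always exists
    best = None
    for candidate in candidates:
        if is_valid(candidate):
            found = True
            best = candidate
            break
    if not found:  # without the initialisation this line could raise
        return False, None
    return True, best


empty_result = find_valid([], lambda c: True)
hit_result = find_valid([1, 4, 6], lambda c: c % 2 == 0)
```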

missing support for boron atoms

For any compound containing boron atoms the code fails with the following traceback:

Traceback (most recent call last):
  File "test.py", line 201, in <module>
    test_smiles_from_adjacent_matrix(smiles)
  File "test.py", line 108, in test_smiles_from_adjacent_matrix
    new_mol = x2m.AC2mol(new_mol, adjacent_matrix, atoms, charge, charged_fragments, quick)
  File "~/xyz2mol/xyz2mol.py", line 481, in AC2mol
    allow_charged_fragments=allow_charged_fragments)
  File "~/xyz2mol/xyz2mol.py", line 307, in BO2mol
    mol_charge)
  File "~/xyz2mol/xyz2mol.py", line 321, in set_atomic_charges
    charge = get_atomic_charge(atom, atomic_valence_electrons[atom], BO_valences[i])
KeyError: 5

This can be easily reproduced by adding 'B' to __TEST_SMILES__ in test.py.
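
KeyError: 5 indicates that a lookup table has no entry for atomic number 5 (boron). A hedged sketch of the kind of entry that would be needed, using a plain dict as a stand-in for the `atomic_valence_electrons` table named in the traceback; the other entries and the helper `valence_electrons` are illustrative assumptions.

```python
# Stand-in for the valence-electron table indexed by atomic number.
atomic_valence_electrons = {1: 1, 6: 4, 7: 5, 8: 6}


def valence_electrons(atomic_number):
    """Look up valence electrons, failing with a clearer message."""
    try:
        return atomic_valence_electrons[atomic_number]
    except KeyError:
        raise ValueError(
            f"No valence data for atomic number {atomic_number}"
        ) from None


# Boron (Z = 5) has three valence electrons.
atomic_valence_electrons[5] = 3
boron = valence_electrons(5)
```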

Failure for Ruthenium complexes

Hello,

Thanks for the great script! I was looking for a method that could convert my complexes into SMILES format and happened to find this code.

In these particular complexes the Ru has a valency of 5, so I added these two lines to the script:

atomic_valence[44] = [5]
atomic_valence_electrons[44] = 8

What I did:
Then I took the XYZ file posted below, placed it in the examples folder, and ran the command python3 xyz2mol.py examples/RuPNP.xyz --charge -0. When visualizing the result I noticed that it doesn't look like my original complex at all. I was wondering whether this is due to a mistake in my procedure or whether there is a general problem with this kind of complex and SMILES conversion.

Input
RuPNP.xyz

59

Ru 15.3980042 10.5490630 5.2033603
H 16.9439317 10.7730840 5.4520114
P 15.7467822 10.3527306 2.9458026
P 14.9265047 11.3401653 7.3234641
O 16.0603873 7.6485323 5.7451690
N 14.2736536 12.1247227 4.6360560
C 14.5298784 11.5516168 2.2257427
H 14.8802410 11.9773868 1.2738167
H 13.6165897 10.9742169 2.0208314
C 14.2307908 12.6388847 3.2622000
H 14.9512548 13.4761988 3.1471242
H 13.2320531 13.0746795 3.0573791
C 13.7876659 13.1571420 5.5562860
H 12.8123790 13.5471668 5.2006236
H 14.4766243 14.0278764 5.5498860
C 13.6278365 12.6270450 6.9819159
H 12.6639566 12.1040191 7.0756667
H 13.6419961 13.4406238 7.7225548
C 16.2852502 12.2723648 8.2092159
H 15.8439149 12.6793921 9.1342785
C 17.4506661 11.3397622 8.5621893
H 17.8198203 10.8290013 7.6611655
H 17.1657364 10.5718051 9.2916701
H 18.2809748 11.9191043 8.9944284
C 16.7710910 13.4398232 7.3379693
H 17.6240801 13.9396163 7.8223041
H 15.9899998 14.1950917 7.1793382
H 17.0997678 13.0763378 6.3533387
C 14.0767878 10.2898563 8.6239594
H 13.1373135 10.0390773 8.1001085
C 14.7987023 8.9710750 8.9257662
H 15.7188447 9.1352633 9.5027507
H 15.0643092 8.4269478 8.0124229
H 14.1467697 8.3232460 9.5319371
C 13.7332503 11.0502086 9.9116889
H 13.0884044 10.4315338 10.5546623
H 13.2030242 11.9932364 9.7177824
H 14.6405895 11.2811454 10.4889444
C 15.3994234 8.7261483 2.0949264
H 16.1679274 8.0563431 2.5157924
C 14.0284390 8.1834535 2.5225436
H 13.9244905 8.1663091 3.6155495
H 13.8974136 7.1578248 2.1458085
H 13.2079949 8.7913866 2.1120321
C 15.5398019 8.7657253 0.5683917
H 16.5550882 9.0336455 0.2459060
H 14.8386391 9.4876193 0.1224554
H 15.3083298 7.7777050 0.1412298
C 17.4065070 10.8737300 2.2713575
H 17.3229432 10.8594744 1.1716855
C 17.7383451 12.3024017 2.7229766
H 17.7288895 12.3718898 3.8202981
H 17.0227734 13.0376813 2.3300824
H 18.7401961 12.5875695 2.3669701
C 18.4902053 9.8846383 2.7182649
H 18.3195573 8.8712523 2.3283327
H 18.5229301 9.8239489 3.8160972
H 19.4775347 10.2164776 2.3623808
C 15.8294397 8.7898140 5.5755584

Output: C[CH+]C.C[CH-]C.C[CH-]C.C[CH-]C.[CH2-]CN1CC[P-][RuH+5]1([P-2])C#[O+]

Additional info:
Operating system: CentOS Linux release 7.3.1611 (Core)
xyz2mol version 723a2fa and x2m_env Anaconda environment was used

Screenshots:

[image: XYZ file]
[image: SMILES representation after using xyz2mol]

Comparison with OpenBabel?

Is there a comparison with OpenBabel's bond order assignment algorithm somewhere? Would be super helpful :)

AttributeError: 'module' object has no attribute 'DetectBondStereochemistry'

qiang@qli:~/Desktop/xyz2mol-master$ ls
acetate.xyz chiral_stereo_test.xyz ethane.xyz LICENSE README.md test.py xyz2mol.py
qiang@qli:~/Desktop/xyz2mol-master$ python xyz2mol.py ethane.xyz
Traceback (most recent call last):
  File "xyz2mol.py", line 432, in <module>
    mol = xyz2mol(atomicNumList, charge, xyz_coordinates, charged_fragments, quick)
  File "xyz2mol.py", line 408, in xyz2mol
    new_mol = chiral_stereo_check(new_mol)
  File "xyz2mol.py", line 392, in chiral_stereo_check
    Chem.DetectBondStereochemistry(mol,-1)
AttributeError: 'module' object has no attribute 'DetectBondStereochemistry'

folding out list of iterator, can give memory errors

In AC2BO,
valences_list = list(itertools.product(*valences_list_of_lists))
is only used for iteration. Because the iterator is expanded into a list first, I have run into a MemoryError from Python (even on a machine with plenty of memory).

xyz2mol.py in AC2BO(AC, atomicNumList, charge, charged_fragments, quick)
    281
    282 # convert [[4],[2,1]] to [[4,2],[4,1]]
--> 283     valences_list = list(itertools.product(*valences_list_of_lists))
    284
    285     best_BO = AC.copy()

MemoryError:

I suggest removing the list() part and just using the iterator as is.

link to similar issue:
https://stackoverflow.com/questions/6503388/prevent-memory-error-in-itertools-permutation
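
The difference can be sketched with a small example: iterating over itertools.product directly yields one combination at a time, so the full set of combinations is never held in memory at once. The input list below is the one from the code comment in the traceback.

```python
import itertools

valences_list_of_lists = [[4], [2, 1]]

# Lazy: nothing is materialised until the loop asks for the next item.
valences_iter = itertools.product(*valences_list_of_lists)

seen = []
for valences in valences_iter:
    seen.append(valences)  # each item is a tuple such as (4, 2)
```

One caveat of this change is that the iterator can be consumed only once, so any code that loops over the combinations a second time would need to call itertools.product again.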

Boost.Python.ArgumentError in SetFormalCharge function

Dear Developer,

My name is Joaquim Jornet-Somoza (quim), postdoctoral researcher on theoretical and computational chemistry.
I have a set of xyz files (in fact, one file with concatenated xyz blocks) that I would like to pass to RDKit as mol objects.
I looked at your code and found it interesting for my purpose.
However, when I tried the test examples, it failed for the acetate xyz file (not for ethane.xyz), saying:

$ python xyz2mol.py acetate.xyz
Traceback (most recent call last):
  File "xyz2mol.py", line 362, in <module>
    mol = xyz2mol(atomicNumList,charge,xyz_coordinates,charged_fragments)
  File "xyz2mol.py", line 346, in xyz2mol
    new_mol = AC2mol(mol,AC,atomicNumList,charge,charged_fragments)
  File "xyz2mol.py", line 251, in AC2mol
    mol = BO2mol(mol,BO, atomicNumList,atomic_valence_electrons,charge,charged_fragments)
  File "xyz2mol.py", line 141, in BO2mol
    mol = set_atomic_charges(mol,atomicNumList,atomic_valence_electrons,BO_valences,BO_matrix,mol_charge)
  File "xyz2mol.py", line 163, in set_atomic_charges
    a.SetFormalCharge(charge)
Boost.Python.ArgumentError: Python argument types in
    Atom.SetFormalCharge(Atom, numpy.int64)
did not match C++ signature:
    SetFormalCharge(RDKit::Atom {lvalue}, int)

Could you tell me where this error comes from?
Sincerely
quim

Parsing nciatlas geometries

Hi all, and thanks for providing xyz2mol.

Is it possible to parse multi-molecule xyz files? For example, the NCIAtlas provides geometries of pairs of molecules in xyz format. Invoking xyz2mol from the command line fails while reading the charge line; it seems to be because there are two additional charges, charge_a and charge_b. Is that nonstandard?
Example here:

7.113_noHB__water--methylisocyanide_200.xyz:

9
charge=0 charge_a=0 charge_b=0 selection_a=1-3 selection_b=4-9 scaling=2.0
  O    2.117646266  -0.063971009   0.000000000
  H    1.566706960   0.725029523   0.000000000
  H    3.022595991   0.254923981   0.000000000
  H   -3.957827806  -1.581005514  -0.887150416
  H   -3.957827806  -1.581005514   0.887150416
  H   -2.416049485  -1.548657916   0.000000000
  C   -3.430031556   1.371722631   0.000000000
  N   -3.465218185   0.209914627   0.000000000
  C   -3.449363210  -1.210254769   0.000000000

xyz2mol.py 7.113_noHB__water--methylisocyanide_200.xyz returns:

Traceback (most recent call last):
  File "/Users/ljmartin/miniconda3/envs/compchem/lib/python3.9/site-packages/xyz2mol.py", line 795, in <module>
    atoms, charge, xyz_coordinates = read_xyz_file(filename)
  File "/Users/ljmartin/miniconda3/envs/compchem/lib/python3.9/site-packages/xyz2mol.py", line 548, in read_xyz_file
    charge = int(line.split("=")[1])
ValueError: invalid literal for int() with base 10: '0 charge_a'

Thanks a lot!
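
The failure comes from splitting the whole comment line on "=", which leaves '0 charge_a' as the text to convert. One possible way to read only the `charge` key is a regular expression. This is a sketch; `parse_charge` is a hypothetical helper, not the function shipped in xyz2mol.

```python
import re


def parse_charge(comment_line, default=0):
    """Extract the integer after 'charge=' and ignore keys like 'charge_a='."""
    # The lookbehind rejects matches that are the tail of a longer word.
    match = re.search(r"(?<!\w)charge=(-?\d+)", comment_line)
    return int(match.group(1)) if match else default


line = "charge=0 charge_a=0 charge_b=0 selection_a=1-3 selection_b=4-9 scaling=2.0"
total_charge = parse_charge(line)
```

Falling back to a default of 0 when no `charge=` key is present is a design choice of this sketch; raising an error instead may be preferable when a wrong charge must not go unnoticed.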

Failed to return the radical smiles at alpha-C

20
test
C 1.946483 1.206551 0.000157
C 0.557868 1.211882 0.000209
C -0.183791 -0.000006 0.000084
C 0.557871 -1.211887 -0.000106
C 1.946488 -1.206548 -0.000175
C 2.653807 0.000002 -0.000057
H 2.485035 2.147902 0.000311
H 0.032031 2.159512 0.000486
H 0.032040 -2.159521 -0.000265
H 2.485039 -2.147900 -0.000328
H 3.737395 0.000005 -0.000165
C -1.617056 -0.000003 -0.000038
C -2.374708 -1.296605 0.000275
H -3.452167 -1.122153 0.000408
H -2.135352 -1.908833 -0.880101
H -2.135065 -1.908601 0.880692
C -2.374698 1.296610 -0.000331
H -2.134393 1.908972 -0.880312
H -3.452159 1.122170 -0.001312
H -2.135994 1.908472 0.880480

The SMILES for above structure is: C[C-](C)c1ccccc1 or C[C-](c1ccccc1)C
But the output from xyz2mol.py is: CC(C)=C1C=CC=C[CH-]1

sdf file is empty while converting from xyz file

Hi all,

I am trying to use xyz2mol to convert xyz files to SDF; however, the result is an empty file, without any error whatsoever.

Attached you can find the molecule that I want to convert (as a .txt file so that I could attach it here). I'm using the following line directly in the terminal:
xyz2mol molecule_1.xyz -o sdf > molecule_1_from_xyz.sdf

xyz2mol version: 0.1.2

Thank you in advance :)

molecule_1.txt

Problem with assignment of charges

Many thanks for writing this program!

I encountered some problems while generating SMILES strings from xyz files for Bodipys. The expected structure with most mesomeric weight would have a negative charge on the Boron and a positive charge on the neighboring Nitrogen. The script correctly recognises the zwitterionic state but in some cases places the cationic charge on other hetero atoms:

example output [xyz2mol]:
F[B-]1(F)n2c(cc3c2=c2sccc2=[S+]3)=C(C(F)(F)F)c2cc3sc4ccsc4c3n21

expected output [non-canonical]:
FC(F)(F)C=5c3cc2sc1ccsc1c2n3[B-](F)(F)[N+]4=C6C(=CC4=5)Sc7ccsc67

This happens for a number of different heteroatom placements/variations:
F[B-]1(F)n2c(ccc2-c2cccs2)C(C(F)(F)F)=c2cc/c(=C3/C=CC=[S+]3)n21
CN(C)c1ccc(-c2ccc3n2[B-](F)(F)n2c(=C4C=CC(=[N+](C)C)C=C4)ccc2=C3C(F)(F)F)cc1

About canonical hack

Hi, at the end of the code, why do we convert the molecule to a SMILES string and then load the mol from that SMILES string?

    smiles = Chem.MolToSmiles(mol, isomericSmiles=True)
    m = Chem.MolFromSmiles(smiles)
    smiles = Chem.MolToSmiles(m, isomericSmiles=True)

Failed to return the radical smiles at alpha-C (Second Issue)

Dear Prof. Jensen.

Following up on the previous issue about the SMILES of alpha-C radicals, I have tried the updated script. It works for some of the alpha-C radicals, but there are still some that cannot be represented correctly.

In the Dropbox shared link below, I have attached both the successful and the failed examples. I hope this will help in coming up with a general way to solve this problem.

https://www.dropbox.com/s/duuzciq2k9d2pnj/examples_for_xyz2mol.zip?dl=0

Propyl radical gives empty list of molecules

I tried various options, but all of them returned an empty list for the propyl radical.

8

C     -0.3027758391    0.5794019014    1.0345046298
C      0.0425157439   -0.4399390123    0.1832077568
C      0.2724509312   -0.2796867185   -1.1600106986
H     -0.4707683461    0.4079216362    2.0816470556
H     -0.4162158143    1.5863557406    0.6730322879
H      0.1401386036   -1.4356735654    0.5989624308
H      0.5435628550   -1.1069559010   -1.7895205051
H      0.1910918658    0.6885759191   -1.6218229572

`--sdf` argument

Does the --sdf argument dump anything? I just see that it's a boolean, but nothing is done with it in xyz2mol.py

typo?

python xyz2mol.py examples/acetate.xyz --charge -1

Traceback (most recent call last):
  File "/Users/michalkrompiec/xyz2mol/xyz2mol/xyz2mol.py", line 806, in <module>
    mols = xyz2mol(atoms, xyz_coordinates,
  File "/Users/michalkrompiec/xyz2mol/xyz2mol/xyz2mol.py", line 732, in xyz2mol
    for new_mol in new_mols:
NameError: name 'new_mols' is not defined

Looks like there is a typo in line 725 (introduced by the commit on Oct 2), which should be changed to:
new_mols = AC2mol(mol, AC, atoms, charge,
