drrdom / crem



CReM: chemically reasonable mutations framework

License: BSD 3-Clause "New" or "Revised" License

Python 3.54% Shell 0.03% Jupyter Notebook 96.43%

chemoinformatics chemical-space

crem's Introduction

CReM - chemically reasonable mutations

CReM is an open-source Python framework to generate chemical structures using a fragment-based approach.

The main idea is similar to context-based matched molecular pairs: fragments occurring in an identical context are interchangeable. Therefore, one can create a database of interchangeable fragments and use it to generate chemically valid structures.

Features:

Generation of a custom fragment database
Three modes of structure generation: MUTATE, GROW, LINK
Context radius to consider for replacement
Fragment size to replace and the size of a replacing fragment
Protection of atoms from modification (e.g. scaffold protection)
Replacements only with fragments that occur in the fragment database with a certain minimal frequency
Make randomly chosen replacements up to the specified number

Limitations and known issues

New ring systems cannot be constructed from fragments; thus, the representativeness of ring systems in generated structures depends on the fragment database used. We are working on this issue.
Very large molecules will not be processed by CReM. If a molecule has more than 30 non-ring single bonds it will not be MUTATED. If a molecule has more than 100 hydrogen atoms it will not be processed by GROW and LINK.
Canonicalisation of contexts depends on the RDKit SMILES representation. Thus, changes in RDKit's SMILES representation may affect fragment databases and make it impossible to use a database prepared with a previous RDKit version from code running under later RDKit versions.

Documentation

https://crem.readthedocs.io/en/latest/

Web app

To try the tool online:
https://crem.imtm.cz/

Installation

Several command-line utilities will be installed to create fragment databases, and the crem module will become available for Python imports to generate structures.

From pypi package

pip install crem

Manually from repository

git clone https://github.com/DrrDom/crem
cd crem
python3 setup.py sdist bdist_wheel
pip install dist/crem-0.1-py3-none-any.whl

Uninstall

pip uninstall crem

Dependencies

crem requires rdkit>=2017.09. To run the GuacaMol test, guacamol should be installed.

Generation of a fragment database

This step is required only if you want to generate a custom fragment database. Precompiled databases obtained by fragmentation of the whole ChEMBL can be downloaded from the links provided on this page: http://www.qsar4u.com/pages/crem.php.

For convenience, there is a bash script, crem_create_frag_db.sh, which includes all the steps below. It takes three positional arguments: an input file with SMILES, an output directory where intermediate files and the final database will be stored, and the number of CPUs to use (optional, default value is 1).

crem_create_frag_db.sh input.smi fragdb_dir 32

Fragmentation of input structures:

fragmentation -i input.smi -o frags.txt -c 32 -v

Convert fragments to standardized representation of a core and a context of a given radius:

frag_to_env -i frags.txt -o r3.txt -r 3 -c 32 -v

Remove duplicated lines in the output file and count the frequency of occurrence of fragment-context pairs. sort and uniq are standard Unix command-line utilities; since Windows 10 can run Linux tools (e.g. through WSL), executing them should not be a big issue for Windows users.

sort r3.txt | uniq -c > r3_c.txt
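For users without these utilities, the counting step can be approximated in plain Python. This is a minimal sketch, not part of CReM: the function name count_unique_lines is hypothetical, and it assumes env_to_db only needs each distinct line prefixed by its count in the layout `uniq -c` produces (count right-aligned in a 7-character field, then a space). Lines are sorted with Python's default ordering, which may differ from `sort` under some locales; that should not change the counts.

```python
from collections import Counter

def count_unique_lines(in_path, out_path):
    """Hypothetical stand-in for `sort in_path | uniq -c > out_path`."""
    with open(in_path, encoding="utf-8") as f:
        # text mode normalises \r\n to \n; strip only the newline itself
        counts = Counter(line.rstrip("\n") for line in f)
    with open(out_path, "w", encoding="utf-8") as out:
        for line in sorted(counts):
            # mimic GNU uniq -c: 7-character right-aligned count, a space, the line
            out.write(f"{counts[line]:7d} {line}\n")
    return len(counts)  # number of distinct fragment-context lines
```

Usage: `count_unique_lines("r3.txt", "r3_c.txt")` writes the counted file expected by the next step.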

Create DB and import the file to a database table

env_to_db -i r3_c.txt -o fragments.db -r 3 -c -v

The last three steps should be executed for each radius. All tables can be stored in the same database.
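Because the per-radius steps repeat, they can be scripted. The following is a hedged sketch that builds (and optionally runs) the command sequence shown above for several radii; the function name build_fragment_db and its defaults are hypothetical, while the command-line flags are copied verbatim from the steps above. With dry_run=True it only returns the commands so they can be reviewed first.

```python
import subprocess

def build_fragment_db(radii, smi="input.smi", frags="frags.txt",
                      db="fragments.db", ncpu=32, dry_run=True):
    """Hypothetical driver for the database-building steps described above."""
    # fragmentation runs once; the remaining three steps repeat for each radius
    commands = [["fragmentation", "-i", smi, "-o", frags, "-c", str(ncpu), "-v"]]
    for r in radii:
        env_file, counted = f"r{r}.txt", f"r{r}_c.txt"
        commands.append(["frag_to_env", "-i", frags, "-o", env_file,
                         "-r", str(r), "-c", str(ncpu), "-v"])
        # the pipe needs a shell, so this step is kept as one string
        commands.append(f"sort {env_file} | uniq -c > {counted}")
        # every radius is written into the same database file
        commands.append(["env_to_db", "-i", counted, "-o", db,
                         "-r", str(r), "-c", "-v"])
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, shell=isinstance(cmd, str), check=True)
    return commands
```

For example, `for cmd in build_fragment_db([1, 2, 3]): print(cmd)` lists the ten commands without running anything.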

Structure generation

Import necessary functions from the main module

from crem.crem import mutate_mol, grow_mol, link_mols
from rdkit import Chem

Create a molecule and mutate it. Only one heavy atom will be substituted. The default radius is 3.

m = Chem.MolFromSmiles('c1cc(OC)ccc1C')  # methoxytoluene
mols = list(mutate_mol(m, db_name='replacements.db', max_size=1))

output example

['CCc1ccc(C)cc1',
 'CC#Cc1ccc(C)cc1',
 'C=C(C)c1ccc(C)cc1',
 'CCCc1ccc(C)cc1',
 'CC=Cc1ccc(C)cc1',
 'CCCCc1ccc(C)cc1',
 'CCCOc1ccc(C)cc1',
 'CNCCc1ccc(C)cc1',
 'COCCc1ccc(C)cc1',
 ...
 'Cc1ccc(C(C)(C)C)cc1']

Add hydrogens to the molecule to mutate hydrogens as well

mols = list(mutate_mol(Chem.AddHs(m), db_name='replacements.db', max_size=1))

output

['CCc1ccc(C)cc1',
 'CC#Cc1ccc(C)cc1',
 'C=C(C)c1ccc(C)cc1',
 'CCCc1ccc(C)cc1',
 'Cc1ccc(C(C)C)cc1',
 'CC=Cc1ccc(C)cc1',
 ...
 'COc1ccc(C)cc1C',
 'C=Cc1cc(C)ccc1OC',
 'COc1ccc(C)cc1Cl',
 'COc1ccc(C)cc1CCl']

Grow molecule. Only hydrogens will be replaced. Hydrogens should not be added explicitly.

mols = list(grow_mol(m, db_name='replacements.db'))

output

['COc1ccc(C)c(Br)c1',
 'COc1ccc(C)c(C)c1',
 'COc1ccc(C)c(Cl)c1',
 'COc1ccc(C)c(OC)c1',
 'COc1ccc(C)c(N)c1',
 ...
 'COc1ccc(CCN)cc1']

Create the second molecule and link it to methoxytoluene

m2 = Chem.MolFromSmiles('NCC(=O)O')  # glycine
mols = list(link_mols(m, m2, db_name='replacements.db'))

output

['Cc1ccc(OCC(=O)NCC(=O)O)cc1',
 'Cc1ccc(OCCOC(=O)CN)cc1',
 'COc1ccc(CC(=N)NCC(=O)O)cc1',
 'COc1ccc(CC(=O)NCC(=O)O)cc1',
 'COc1ccc(CC(=S)NCC(=O)O)cc1',
 'COc1ccc(CCOC(=O)CN)cc1']

You can vary the size of a linker and specify the distance between two attachment points in a linking fragment. There are many other arguments available in these functions, look at their docstrings for details.

Additional filters to control fragments chosen for replacing

An example of a filtering function which will keep only fragments containing a specific atom to be chosen for replacing.

from collections import defaultdict
from functools import partial
from rdkit import Chem
from crem.crem import mutate_mol

def filter_function(row_ids, cur, radius, atom_number):

    """
    The first three arguments should be always the same as shown in the example. These parameters will be passed to a function from a main function, e.g. from mutate_mol. All other arguments are user-defined. The function should return the list of row ids of fragments which will be used for replacing. 

    :param row_id: a list of row ids from CReM database of those fragments which satisfy other selection criteria
    :param cur: cursor of CReM database
    :param radius: radius of a context 
    :param atom_number: an atomic number, fragments with this number will be discarded
    :return list of remaining row ids
    """

    # this part may be kept intact, it collects from DB SMILES of fragments with given row ids
    # since fragments may occur multiple times (due to different contexts) the results are collected in a dict
    if not row_ids:
        return []
    batch_size = 32000  # SQLite has a limit on a number of passed values to a query
    row_ids = list(row_ids)
    smis = defaultdict(list)  # {smi_1: [rowid_1, rowid_5, ...], ...}
    for start in range(0, len(row_ids), batch_size):
        batch = row_ids[start:start + batch_size]
        sql = f"SELECT rowid, core_smi FROM radius{radius} WHERE rowid IN ({','.join('?' * len(batch))})"
        for i, smi in cur.execute(sql, batch).fetchall():
            smis[smi].append(i)

    output_row_ids = []
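    for smi, ids in smis.items():
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip cores that RDKit cannot parse
            continue
        # extend once per fragment, even if it contains several matching atoms
        if any(a.GetAtomicNum() == atom_number for a in mol.GetAtoms()):
            output_row_ids.extend(ids)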
    return output_row_ids

# only F-containing fragments will be chosen for replacing
mol = Chem.MolFromSmiles('c1ccccc1C')
mols = list(mutate_mol(mol, db_name='replacements.db', filter_func=partial(filter_function, atom_number=9), max_size=1, max_inc=3))

output

['Fc1ccccc1', 
 'FC(F)(F)c1ccccc1',
 'FC(F)Oc1ccccc1',
 'FC(F)Sc1ccccc1']

Iterative enumeration

For convenience, there is a function enumerate_compounds in the utils module (added in version 0.2.6). It performs iterative growing (scaffold decoration) or mutation (analog enumeration) of a supplied molecule. More details are in the function's docstring.

Example. Enumerate derivatives of 1-chloro-3-methylbenzene at positions 2 and 4 of the ring and at the methyl group at the same time. In this case one should choose scaffold mode, 3 iterations, specify atom ids (0-based indices) where fragments can be attached and set protect_added_frag=True to restrict enumeration only to selected positions.

from rdkit import Chem
from crem.utils import enumerate_compounds

mol = Chem.MolFromMolBlock("""
  Mrv1922 05242309182D          

  8  8  0  0  0  0            999 V2000
   -3.2813    1.3161    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9957    0.9036    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9957    0.0786    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2813   -0.3339    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5668    0.0786    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5668    0.9036    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2813   -1.1589    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8523    1.3161    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
  5  6  1  0  0  0  0
  1  6  2  0  0  0  0
  4  7  1  0  0  0  0
  6  8  1  0  0  0  0
M  END
""")

mols = enumerate_compounds(mol, 'replacements_sa2.db', mode='scaffold', n_iterations=3,
                           radius=3, max_replacements=2, replace_ids=[2,4,6], protect_added_frag=True, 
                           return_smi=True)

output

['COc1c(C)cccc1Cl',
 'Cc1cc(Cl)ccc1Cl',
 'COc1ccc(Cl)c(OC)c1C',
 'COc1c(Cl)cccc1CF',
 'Cc1c(Cl)ccc(Cl)c1C',
 'CSCc1cc(Cl)ccc1Cl',
 'COc1ccc(Cl)c(OC)c1CC#N',
 'COCc1c(OC)ccc(Cl)c1OC',
 'COc1c(Cl)cccc1C(F)F',
 'COc1c(Cl)ccc(CO)c1CF',
 'Cc1c(Cl)ccc(Cl)c1CC#N',
 'Cc1c(Cl)ccc(Cl)c1CCN',
 'CSCc1c(Cl)ccc(Cl)c1C',
 'CSCc1c(Cl)ccc(Cl)c1Cl']

Multiprocessing

All functions have an ncores argument and can make multiple replacements in one molecule in parallel. If you want to process several molecules in parallel, you have to write your own code. However, the functions described above are generators and cannot be used with the multiprocessing module. Therefore, three complementary functions, mutate_mol2, grow_mol2 and link_mols2, were created. They return a list of results and can be pickled and used with multiprocessing.Pool or other tools.

Example:

from multiprocessing import Pool
from functools import partial
from crem.crem import mutate_mol2
from rdkit import Chem

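if __name__ == '__main__':  # required where worker processes are spawned (e.g. Windows, macOS)
    input_smi = ['c1ccccc1N', 'NCC(=O)OC', 'NCCCO']
    input_mols = [Chem.MolFromSmiles(s) for s in input_smi]

    with Pool(2) as p:  # the pool is closed automatically
        res = list(p.imap(partial(mutate_mol2, db_name='replacements.db', max_size=1), input_mols))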

res would be a list of lists with SMILES of generated molecules

Benchmarks

Guacamol

task	SMILES LSTM*	SMILES GA*	Graph GA*	Graph MCTS*	CReM
Celecoxib rediscovery	1.000	0.732	1.000	0.355	1.000
Troglitazone rediscovery	1.000	0.515	1.000	0.311	1.000
Thiothixene rediscovery	1.000	0.598	1.000	0.311	1.000
Aripiprazole similarity	1.000	0.834	1.000	0.380	1.000
Albuterol similarity	1.000	0.907	1.000	0.749	1.000
Mestranol similarity	1.000	0.79	1.000	0.402	1.000
C11H24	0.993	0.829	0.971	0.410	0.966
C9H10N2O2PF2Cl	0.879	0.889	0.982	0.631	0.940
Median molecules 1	0.438	0.334	0.406	0.225	0.371
Median molecules 2	0.422	0.38	0.432	0.170	0.434
Osimertinib MPO	0.907	0.886	0.953	0.784	0.995
Fexofenadine MPO	0.959	0.931	0.998	0.695	1.000
Ranolazine MPO	0.855	0.881	0.92	0.616	0.969
Perindopril MPO	0.808	0.661	0.792	0.385	0.815
Amlodipine MPO	0.894	0.722	0.894	0.533	0.902
Sitagliptin MPO	0.545	0.689	0.891	0.458	0.763
Zaleplon MPO	0.669	0.413	0.754	0.488	0.770
Valsartan SMARTS	0.978	0.552	0.990	0.04	0.994
Deco Hop	0.996	0.970	1.000	0.590	1.000
Scaffold Hop	0.998	0.885	1.000	0.478	1.000
total score	17.341	14.398	17.983	9.011	17.919
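The total score row appears to be the plain sum of the 20 task scores (each at most 1.0). Below is a quick arithmetic check for the CReM column, with values copied from the table. The same sum over the Graph GA column gives 17.983, which also matches its total.

```python
# CReM column of the GuacaMol table above, in table order
crem_scores = {
    "Celecoxib rediscovery": 1.000,
    "Troglitazone rediscovery": 1.000,
    "Thiothixene rediscovery": 1.000,
    "Aripiprazole similarity": 1.000,
    "Albuterol similarity": 1.000,
    "Mestranol similarity": 1.000,
    "C11H24": 0.966,
    "C9H10N2O2PF2Cl": 0.940,
    "Median molecules 1": 0.371,
    "Median molecules 2": 0.434,
    "Osimertinib MPO": 0.995,
    "Fexofenadine MPO": 1.000,
    "Ranolazine MPO": 0.969,
    "Perindopril MPO": 0.815,
    "Amlodipine MPO": 0.902,
    "Sitagliptin MPO": 0.763,
    "Zaleplon MPO": 0.770,
    "Valsartan SMARTS": 0.994,
    "Deco Hop": 1.000,
    "Scaffold Hop": 1.000,
}

total = sum(crem_scores.values())
print(len(crem_scores), f"{total:.3f}")  # 20 17.919
```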

License

BSD-3

Citation

CReM: chemically reasonable mutations framework for structure generation
Pavel Polishchuk
Journal of Cheminformatics 2020, 12, (1), 28
https://doi.org/10.1186/s13321-020-00431-w

Control of Synthetic Feasibility of Compounds Generated with CReM
Pavel Polishchuk
Journal of Chemical Information and Modeling 2020, 60, 6074-6080
https://dx.doi.org/10.1021/acs.jcim.0c00792


crem's Issues

Parameters relevant for speed in mutate_mol

Hi! Thanks for maintaining this repo! I was wondering which parameters were relevant for speeding up mutate_mol? I've found better speeds by increasing n_cores but if I wanted to speed it up even more would decreasing radius help too?

how to employ the grow_mol2/mutate_mol2 functions

hi, DrrDom,

could you please provide some examples of using the grow_mol2 and mutate_mol2 functions?
In particular, how should parameters such as replace_ids be set?
Many many thanks

Multiple replacements in grow_mol

Hi, thanks for creating this useful repo which allows us to generate new molecules from a framework!

I have been playing around with the grow_mol function, and I want to specify 3 positions in my starting molecule to grow the molecule. I can do so with the parameter replace_ids and specify the 3 positions in a list. However, I realised that all of the generated molecules only had 1 replacement per molecule. What I am looking for is a way to possibly generate new molecules with 2 or even all 3 positions replaced at once.

I have read the documentation for crem, and as far as I'm aware, there is no way to do this currently. Do you have any advice regarding this issue?

Thank you!
Alan

fragmentation py issue: Can't generate mol for: SMILES

Dear Pavel,

I'm very grateful for your powerful work. I tried to generate a custom fragment database as you described,
but after running fragmentation -i input.smi -o frags.txt -c 32 -v, an error occurred, indicating that

<Can't generate mol for: SMILES
import sys; print('Python %s on %s' % (sys.version, sys.platform))>

could you please give some suggestions on what causes this error and how to fix it?
Or did I use the wrong format for the input.smi file? Could you please show me an example of a correctly formatted .smi file?
many many thanks,

best,
Sh-Y
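The message reports a line whose SMILES could not be turned into a molecule, and the literal text "SMILES" hints at a header line (an assumption, not confirmed in this thread). A minimal pre-check sketch: find_unparsable_lines is a hypothetical helper, and it assumes the SMILES is the first whitespace-separated field of each line. The parser is injectable and defaults to RDKit's Chem.MolFromSmiles.

```python
def find_unparsable_lines(path, parse=None):
    """Return (line_number, first_field) pairs that the SMILES parser rejects.

    Hypothetical helper: assumes the SMILES is the first whitespace-separated
    field on each non-empty line.
    """
    if parse is None:
        from rdkit import Chem  # RDKit is already required by crem
        parse = Chem.MolFromSmiles
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue  # blank lines are ignored
            if parse(fields[0]) is None:  # MolFromSmiles returns None on failure
                bad.append((lineno, fields[0]))
    return bad
```

Running `print(find_unparsable_lines("input.smi"))` before fragmentation shows which lines (for example, a header) would need to be removed.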

Reproducible output

Implement random seed to make results reproducible

Opening Pre-Compiled Databases

Perhaps a novice question, but opening the precompiled fragment databases given here: http://www.qsar4u.com/pages/crem.php

with gzip, etc. gives an "unexpected end of file" error. Any advice on properly unarchiving these?
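An "unexpected end of file" error from a gzip tool usually means the archive itself is incomplete, for example because of an interrupted download. Below is a hedged sketch of decompressing in Python with a clearer message in that case. It assumes the download is a gzip file; gunzip is a hypothetical helper name, and file names such as replacements02_sa2.db.gz are illustrative.

```python
import gzip
import shutil

def gunzip(src, dst, chunk_size=1 << 20):
    """Decompress the gzip file src into dst, streaming in chunks."""
    try:
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout, chunk_size)
    except EOFError as exc:
        # gzip raises EOFError when the compressed stream stops early,
        # which usually means the downloaded file is truncated
        raise RuntimeError(f"{src} appears truncated; try downloading it again") from exc
```

Usage: `gunzip("replacements02_sa2.db.gz", "replacements02_sa2.db")`.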

No such table: radius3

Hi! I followed the instructions to install the crem package. However, when I ran the demo code, I got the error below:

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
/tmp/ipykernel_1568311/2420725268.py in <module>
      1 m = Chem.MolFromSmiles('c1cc(OC)ccc1C')  # methoxytoluene
----> 2 mols = list(mutate_mol(m, db_name='replacements.db', max_size=1,radius=3))

/home/shuangjia/anaconda3/lib/python3.7/site-packages/crem/crem.py in mutate_mol(mol, db_name, radius, min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc, max_replacements, replace_cycles, replace_ids, protected_ids, symmetry_fixes, min_freq, return_rxn, return_rxn_freq, return_mol, ncores, **kwargs)
    493                                                                 protected_ids_1=protected_ids, protected_ids_2=None,
    494                                                                 min_freq=min_freq, symmetry_fixes=symmetry_fixes,
--> 495                                                                 **kwargs):
    496             for smi, m, rxn in __frag_replace(mol, None, frag_sma, core_sma, radius, ids, None):
    497                 if max_replacements is None or len(products) < (max_replacements + 1):  # +1 because we added source mol to output smiles

/home/shuangjia/anaconda3/lib/python3.7/site-packages/crem/crem.py in __gen_replacements(mol1, mol2, db_name, radius, dist, min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc, max_replacements, replace_cycles, protected_ids_1, protected_ids_2, min_freq, symmetry_fixes, **kwargs)
    342                 max_atoms = num_heavy_atoms + max_inc
    343 
--> 344                 row_ids = __get_replacements_rowids(cur, env, dist, min_atoms, max_atoms, radius, min_freq, **kwargs)
    345 
    346                 if max_replacements is None:

/home/shuangjia/anaconda3/lib/python3.7/site-packages/crem/crem.py in __get_replacements_rowids(db_cur, env, dist, min_atoms, max_atoms, radius, min_freq, **kwargs)
    284         else:
    285             sql += f" AND {k} = {v}"
--> 286     db_cur.execute(sql)
    287     return set(i[0] for i in db_cur.fetchall())
    288 

OperationalError: no such table: radius3

I'm not sure whether this is a bug introduced by a package update; the problem occurs whether I install from PyPI or manually. I look forward to your answer, thank you!
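One possible cause of `no such table: radius3` (an assumption, not confirmed in this thread) is that db_name does not point to an existing database file. Python's sqlite3.connect() creates a new, empty file when the path does not exist, so the first query fails. The radius{N} table naming follows the SQL in the filter example of this README. Below is a small check, using the hypothetical helper list_tables, that lists the tables actually present.

```python
import os
import sqlite3

def list_tables(db_name):
    """Return sorted table names in an SQLite file, e.g. ['radius1', 'radius3']."""
    if not os.path.isfile(db_name):
        # sqlite3.connect() would silently create a new, empty database here
        raise FileNotFoundError(f"no database file at {db_name!r}")
    conn = sqlite3.connect(db_name)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

If `list_tables("replacements.db")` raises or lacks 'radius3', either the path is wrong or the database was built without radius 3.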

[bug] package is incompatible with python 3.11+

per the python 3.11 documentation:

The population parameter of random.sample() must be a sequence, and automatic conversion of sets to lists is no longer supported. Also, if the sample size is larger than the population size, a ValueError is raised. (Contributed by Raymond Hettinger in bpo-40465.)

This causes crem to crash, as it calls random.sample on a dict.keys() view, which is set-like rather than a sequence and is therefore no longer converted automatically.

This is fixed by first converting to a list (or other sequence datatype) prior to calling random.sample.
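The fix described above can be sketched as follows. sample_keys is a hypothetical helper, not crem code. Capping k at the population size is an extra assumption that avoids the ValueError mentioned in the 3.11 notes, and passing a seeded random.Random instance also makes the draw repeatable.

```python
import random

def sample_keys(mapping, k, rng=random):
    """Draw up to k distinct keys from a dict in a Python 3.11-compatible way."""
    # dict.keys() is a set-like view; since Python 3.11 random.sample()
    # requires a sequence, so convert it explicitly
    keys = list(mapping.keys())
    # extra safeguard: a sample larger than the population raises ValueError
    return rng.sample(keys, min(k, len(keys)))
```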

Database Visualization and Variation Inquiry

Thank you for creating this valuable tool. I'm wondering how I can visualize the database I've created following the documentation. Also, could you please summarize the variations I can expect in the outcome database if I've created it with different radii (for example: 1, 2, 3, 4, 5)? Additionally, what if I only want to create fragmented databases with different heteroatoms and not couple them?
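The fragment database is a plain SQLite file, so any SQLite browser can open it. Below is a minimal Python sketch that lists the columns of one radius table and its first rows. preview_table is hypothetical, the radius{N} table naming follows the filter example in this README, and no column set is assumed beyond what the database itself reports. Previews of tables for different radii can help compare how the stored entries differ.

```python
import sqlite3

def preview_table(db_name, radius, limit=5):
    """Return (column_names, first_rows) of the table radius<radius>."""
    table = f"radius{int(radius)}"  # int() keeps the interpolated table name safe
    conn = sqlite3.connect(db_name)
    try:
        # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        rows = conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,)).fetchall()
    finally:
        conn.close()
    return columns, rows
```

For example, `cols, rows = preview_table("fragments.db", 3)` returns the column names and up to five rows of the radius 3 table.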

AtomKekulizeException: non-ring atom 2 marked aromatic

Hello, I am getting a strange error with this molecule.

mol = Chem.MolFromSmiles('C[C@H](C(:[O]):[O-])[C@H](c1ccc2c(c1)O[C@@H](C1CCN([C@@H](C)c3cc(F)ccc3OC(F)(F)F)CC1)CC2)C1CC1')
if mol: print(True)
a =list( mutate_mol(mol, db_name='replacements02_sc2.5.db', radius=3, min_size=0, max_size=2, min_inc=-5, max_inc=6))

The molecule appears to be a valid RDKit mol, but I got the following error:

[16:29:26] non-ring atom 2 marked aromatic
---------------------------------------------------------------------------
AtomKekulizeException                     Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 a =list( mutate_mol(mol, db_name='/home/ale/GITLAB/bi_crem_database/replacements02_sc2.5.db', radius=3, min_size=0, max_size=2, min_inc=-5, max_inc=6))

File ~/anaconda3/envs/moldrug/lib/python3.9/site-packages/crem/crem.py:487, in mutate_mol(mol, db_name, radius, min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc, max_replacements, replace_cycles, replace_ids, protected_ids, symmetry_fixes, min_freq, return_rxn, return_rxn_freq, return_mol, ncores, **kwargs)
    483 protected_ids = sorted(protected_ids)
    485 if ncores == 1:
--> 487     for frag_sma, core_sma, freq, ids in __gen_replacements(mol1=mol, mol2=None, db_name=db_name, radius=radius,
    488                                                             min_size=min_size, max_size=max_size,
    489                                                             min_rel_size=min_rel_size, max_rel_size=max_rel_size,
    490                                                             min_inc=min_inc, max_inc=max_inc,
    491                                                             max_replacements=max_replacements,
    492                                                             replace_cycles=replace_cycles,
    493                                                             protected_ids_1=protected_ids, protected_ids_2=None,
    494                                                             min_freq=min_freq, symmetry_fixes=symmetry_fixes,
    495                                                             **kwargs):
    496         for smi, m, rxn in __frag_replace(mol, None, frag_sma, core_sma, radius, ids, None):
    497             if max_replacements is None or len(products) < (max_replacements + 1):  # +1 because we added source mol to output smiles

File ~/anaconda3/envs/moldrug/lib/python3.9/site-packages/crem/crem.py:314, in __gen_replacements(mol1, mol2, db_name, radius, dist, min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc, max_replacements, replace_cycles, protected_ids_1, protected_ids_2, min_freq, symmetry_fixes, **kwargs)
    312 else:
    313     mol = mol1
--> 314     f = __fragment_mol(mol, radius, protected_ids=protected_ids_1, symmetry_fixes=symmetry_fixes)
    316 if f:
    317     mol_hac = mol.GetNumHeavyAtoms()

File ~/anaconda3/envs/moldrug/lib/python3.9/site-packages/crem/crem.py:113, in __fragment_mol(mol, radius, return_ids, keep_stereo, protected_ids, symmetry_fixes)
    110             output.add((env, frag, ids_0))
    111     else:  # multiple cuts
    112         # there are no checks for H needed because H can be present only in single cuts
--> 113         env, frag = get_canon_context_core(chains, core, radius, keep_stereo)
    114         output.add((env, frag, get_atom_prop(core) if return_ids else tuple()))
    116 if symmetry_fixes:

File ~/anaconda3/envs/moldrug/lib/python3.9/site-packages/crem/mol_context.py:302, in get_canon_context_core(context, core, radius, keep_stereo)
    299 def get_canon_context_core(context, core, radius, keep_stereo=False):
    300     # context and core are Mols or SMILES
    301     # returns SMILES by default
--> 302     res = get_std_context_core_permutations(context, core, radius, keep_stereo)
    303     if res:
    304         env, cores = res

File ~/anaconda3/envs/moldrug/lib/python3.9/site-packages/crem/mol_context.py:247, in get_std_context_core_permutations(context, core, radius, keep_stereo)
    245 # remove Hs from context and core
    246 if context:  # context cannot be H (no check needed), if so the user will obtain meaningless output
--> 247     context = Chem.RemoveHs(context)
    248 if core and Chem.MolToSmiles(core) != '[H][*:1]':
    249     core = Chem.RemoveHs(core)

AtomKekulizeException: non-ring atom 2 marked aromatic

Any idea what is going on? A lot of thanks in advance!

fragmentation problem

Hi,

I am trying to build my own db according to the instructions, but when I run mols = list(mutate_mol(m, db_name='test.db', max_size=3)) from the tutorial with my own db, the resulting list of mols is always empty; with the precompiled db replacements02_sa2.db there is no problem. I used the CHEMBL231.smi file in the example folder and followed the instructions:

fragmentation -i CHEMBL231.smi -o frags.txt -c 32 -v
frag_to_env -i frags.txt -o r3.txt -r 3 -c 32 -v
sort r3.txt | uniq -c > r3_c.txt
env_to_db -i r3_c.txt -o tert.db -r 3 -c -v

got the result

 root@872fabd400c3:/home/crem# python test.py
[]

Is there something I missed?

auxiliary function get_replacements

How can we use the auxiliary function get_replacements added in the new version 0.2.11?

Could you please provide some examples of how to run get_replacements? Many thanks,

best,

Fragmentation

Was a bit confused by the fragment_mol method in fragmentation.py, specifically these lines:

frags = rdMMPA.FragmentMol(mol, pattern="[!#1]!@!=!#[!#1]", maxCuts=4, resultsAsMols=False, maxCutBonds=30)
frags += rdMMPA.FragmentMol(mol, pattern="[!#1]!@!=!#[!#1]", maxCuts=3, resultsAsMols=False, maxCutBonds=30)

Shouldn't maxCuts=3 be included as a subset of all fragmentations for maxCuts=4?
What exactly is the difference between maxCutBonds and maxCuts? I tried looking for documentation on this and was unfortunately unable to find it.
Would it make more sense for the maxCuts parameter to be user-defined rather than preset?

Thanks for your help!
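On the second question, here is our reading of RDKit's rdMMPA, which is not confirmed in this thread. maxCuts is the maximum number of bonds cut at once. maxCutBonds caps how many bonds matching pattern a molecule may contain, and molecules above that cap are not fragmented, which is consistent with the 30-bond limit in the README. The number of candidate cut sets grows quickly with both values. max_cut_combinations is a hypothetical helper that gives only an upper bound, since rdMMPA does not keep every subset of bonds.

```python
from math import comb

def max_cut_combinations(n_bonds, max_cuts):
    """Upper bound: number of bond subsets of size 1..max_cuts among n_bonds cuttable bonds."""
    return sum(comb(n_bonds, k) for k in range(1, max_cuts + 1))

for n in (10, 20, 30):
    # columns: cuttable bonds, bound for maxCuts=3, bound for maxCuts=4
    print(n, max_cut_combinations(n, 3), max_cut_combinations(n, 4))
# 10 175 385
# 20 1350 6195
# 30 4525 31930
```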

enumerate_compounds: filtering the generated molecules to meet my criteria

Hi,

I am using the enumerate_compounds function to generate new molecules, with 3 positions for potential replacement. Hence, I set n_iterations = 3. As far as I'm aware, this does a stepwise growing of my base molecule mol.

However, I am only interested in the generated molecules where at least 1 specified position is grown. Is there any way to filter out molecules which don't meet this criteria, or how do you suggest I go about this problem?

Thank you so much!

Problem with guacamol_test.py

Hi all
I was trying the guacamol test script and got a warning about the GPU from TensorFlow (I don't have a GPU set up), but what I had a hard time understanding is this:

(my-rdkit-env) J:\esp\Personal\AndreaZaliani\pubchem_yojana\crem>python crem/guacamol_crem_test.py
2022-07-04 11:22:30.488812: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-04 11:22:30.492783: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "J:\esp\Personal\AndreaZaliani\pubchem_yojana\crem\crem\guacamol_crem_test.py", line 30, in
from crem import mutate_mol2
File "J:\esp\Personal\AndreaZaliani\pubchem_yojana\crem\crem\crem.py", line 11, in
from crem.mol_context import get_canon_context_core, combine_core_env_to_rxn_smarts
ModuleNotFoundError: No module named 'crem.mol_context'; 'crem' is not a package

Any hint for me?

Best
andrea
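The message "'crem' is not a package" is consistent with Python resolving `crem` to the file crem/crem/crem.py. When a script is run as `python crem/guacamol_crem_test.py`, the script's own directory is placed first on sys.path, so that module shadows the installed package. This is an interpretation of the traceback, not a confirmed answer. describe_import is a hypothetical diagnostic using only the standard library.

```python
import importlib.util

def describe_import(name):
    """Report which file `import name` would load and whether it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the name cannot be imported at all
    # packages expose submodule search locations; plain modules do not
    is_package = spec.submodule_search_locations is not None
    return spec.origin, is_package
```

Running `print(describe_import("crem"))` from the same location as the failing script and seeing a path ending in crem.py together with False would point to this shadowing.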

error of enumerate_compounds function

Hi, Crem,

I tried the new function enumerate_compounds in the utils module, but I get an error when I use it to reproduce the example you provided (73dd185):

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

My PC has 32 GB of memory, so I don't know why this error occurs.
Could you please provide some suggestions to fix it? Many thanks,

Shengyang

Results different between online app and local version

If I draw the following molecule in your web app:

And try to use the "Scaffold" (e.g. grow) functionality at the selected index, I get a number of results (maybe a few hundred/thousand) at radius = 1.

However, if I iterate over the atoms of the molecule with a local installation, crem reports that it can't generate any results at the same position:

smiles = 'c1cc(OC)ccc1C'
base_mol = Chem.MolFromSmiles(smiles)
atoms = len(list(base_mol.GetAtoms()))

for atom in range(0, atoms):
    gen_results = list(grow_mol(base_mol, return_mol=True, max_replacements=20000,
                                db_name='replacements02_sc2.5.db', radius=1, replace_ids=[atom], max_atoms=15, ncores=4))
    num_results = len(gen_results)
    print(atom, num_results)

The output of this is

[10:36:04] WARNING: not removing hydrogen atom with dummy atom neighbors
0 19999
[10:36:06] WARNING: not removing hydrogen atom with dummy atom neighbors
1 20000
2 0
3 0
[10:36:10] WARNING: not removing hydrogen atom with dummy atom neighbors
4 19999
5 0
6 0
7 0
[10:36:15] WARNING: not removing hydrogen atom with dummy atom neighbors
8 19995
[10:36:18] WARNING: not removing hydrogen atom with dummy atom neighbors
9 19999
[10:36:21] WARNING: not removing hydrogen atom with dummy atom neighbors
10 19998
[10:36:25] WARNING: not removing hydrogen atom with dummy atom neighbors
11 19999
12 0
13 0
14 0
15 0
[10:36:29] WARNING: not removing hydrogen atom with dummy atom neighbors
16 19994
17 0
18 0

And just as a sanity check, this is the numbering of the molecule:

I'm not sure why index 1 is able to produce results, but index 5 doesn't (even though it should by symmetry), and the same position (index 5) works in the web app. Do you know why these would produce different results?

grow_mol error when used on fragments with dummy atoms

Hello,

Thank you for maintaining CReM! I ran into an error while using the program and would like to know if you have any insights into this problem.

I'm trying to grow a fragment generated using RGroupDecomposition in rdkit, like so:

from rdkit import Chem
from rdkit.Chem import rdRGroupDecomposition as rgd
from crem.crem import grow_mol

mol = Chem.MolFromSmiles("Cc4cccc(Cc3cnc(NC(=O)c1cnn2ccc(C)nc12)s3)c4")
core = Chem.MolFromSmiles("Cc1cnc(NC(=O))s1")
params = rgd.RGroupDecompositionParameters()
params.RGroupLabels = rgd.RGroupLabels.AtomIndexLabels
fragment = rgd.RGroupDecompose([core], [mol], options=params)[0][0]['R1']
display(fragment)

The R1:1 is a dummy atom generated by RGroupDecomposition. If I try to run grow_mol at this point, I receive this error:

derivatives = list(grow_mol(r, db_name='/home/javi/data/rgroup/replacements02_sa2.db', max_replacements=9, return_mol=True))
>>>
ValueError                                Traceback (most recent call last)
crem.ipynb Cell 3' in <cell line: 62>()
---> 66 derivatives = list(grow_mol(r, db_name='replacements02_sa2.db', max_replacements=9, return_mol=True))

File ~/workspace/sci-dev/sci-dev-venv/lib/python3.8/site-packages/crem/crem.py:487, in mutate_mol(mol, db_name, radius, min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc, max_replacements, replace_cycles, replace_ids, protected_ids, symmetry_fixes, min_freq, return_rxn, return_rxn_freq, return_mol, ncores, **kwargs)
    483 protected_ids = sorted(protected_ids)
    485 if ncores == 1:
--> 487     for frag_sma, core_sma, freq, ids in __gen_replacements(mol1=mol, mol2=None, db_name=db_name, radius=radius,
    488                                                             min_size=min_size, max_size=max_size,
    489                                                             min_rel_size=min_rel_size, max_rel_size=max_rel_size,
    490                                                             min_inc=min_inc, max_inc=max_inc,
    491                                                             max_replacements=max_replacements,
    492                                                             replace_cycles=replace_cycles,
    493                                                             protected_ids_1=protected_ids, protected_ids_2=None,
    494                                                             min_freq=min_freq, symmetry_fixes=symmetry_fixes,
    495                                                             **kwargs):
    496         for smi, m, rxn in __frag_replace(mol, None, frag_sma, core_sma, radius, ids, None):
    497             if max_replacements is None or (max_replacements is not None and len(products) < max_replacements):

File ~/workspace/sci-dev/sci-dev-venv/lib/python3.8/site-packages/crem/crem.py:339, in __gen_replacements(mol1, mol2, db_name, radius, dist, min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc, max_replacements, replace_cycles, protected_ids_1, protected_ids_2, min_freq, symmetry_fixes, **kwargs)
    334 hac_ratio = num_heavy_atoms / mol_hac
    336 if (min_size <= num_heavy_atoms <= max_size and min_rel_size <= hac_ratio <= max_rel_size) \
    337         or (replace_cycles and cycle_pattern.search(core)):
--> 339     frag_sma = combine_core_env_to_rxn_smarts(core, env)
    341     min_atoms = num_heavy_atoms + min_inc
    342     max_atoms = num_heavy_atoms + max_inc

File ~/workspace/sci-dev/sci-dev-venv/lib/python3.8/site-packages/crem/mol_context.py:349, in combine_core_env_to_rxn_smarts(core, env, keep_h)
    346         links[i].append(a.GetNeighbors()[0].GetIdx())
    347         att_to_remove.append(a.GetIdx())
--> 349 for i, j in links.values():
    350     m.AddBond(i, j, Chem.BondType.SINGLE)
    352 for i in sorted(att_to_remove, reverse=True):

ValueError: too many values to unpack (expected 2)

However, when I try to replicate the problem later on in a notebook, I get a different error:

list(grow_mol(fragment, db_name='replacements02_sa2.db', max_replacements=9, return_mol=True))
>>>
RuntimeError                              Traceback (most recent call last)
crem.ipynb Cell 4' in <cell line: 10>()
---> 10 list(grow_mol(fragment, db_name='replacements02_sa2.db', max_replacements=9, protected_ids=[9,10], return_mol=True))

File ~/workspace/sci-dev/sci-dev-venv/lib/python3.8/site-packages/crem/crem.py:487, in mutate_mol(mol, db_name, radius, min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc, max_replacements, replace_cycles, replace_ids, protected_ids, symmetry_fixes, min_freq, return_rxn, return_rxn_freq, return_mol, ncores, **kwargs)
    483 protected_ids = sorted(protected_ids)
    485 if ncores == 1:
--> 487     for frag_sma, core_sma, freq, ids in __gen_replacements(mol1=mol, mol2=None, db_name=db_name, radius=radius,
    488                                                             min_size=min_size, max_size=max_size,
    489                                                             min_rel_size=min_rel_size, max_rel_size=max_rel_size,
    490                                                             min_inc=min_inc, max_inc=max_inc,
    491                                                             max_replacements=max_replacements,
    492                                                             replace_cycles=replace_cycles,
    493                                                             protected_ids_1=protected_ids, protected_ids_2=None,
    494                                                             min_freq=min_freq, symmetry_fixes=symmetry_fixes,
    495                                                             **kwargs):
    496         for smi, m, rxn in __frag_replace(mol, None, frag_sma, core_sma, radius, ids, None):
    497             if max_replacements is None or (max_replacements is not None and len(products) < max_replacements):

File ~/workspace/sci-dev/sci-dev-venv/lib/python3.8/site-packages/crem/crem.py:314, in __gen_replacements(mol1, mol2, db_name, radius, dist, min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc, max_replacements, replace_cycles, protected_ids_1, protected_ids_2, min_freq, symmetry_fixes, **kwargs)
    312 else:
    313     mol = mol1
--> 314     f = __fragment_mol(mol, radius, protected_ids=protected_ids_1, symmetry_fixes=symmetry_fixes)
    316 if f:
    317     mol_hac = mol.GetNumHeavyAtoms()

File ~/workspace/sci-dev/sci-dev-venv/lib/python3.8/site-packages/crem/crem.py:95, in __fragment_mol(mol, radius, return_ids, keep_stereo, protected_ids, symmetry_fixes)
     92         atom.SetIntProp("Index", atom.GetIdx())
     94 # heavy atoms
---> 95 frags = rdMMPA.FragmentMol(mol, pattern="[!#1]!@!=!#[!#1]", maxCuts=4, resultsAsMols=True, maxCutBonds=30)
     96 frags += rdMMPA.FragmentMol(mol, pattern="[!#1]!@!=!#[!#1]", maxCuts=3, resultsAsMols=True, maxCutBonds=30)
     97 # hydrogen atoms

RuntimeError: Pre-condition Violation
	RingInfo not initialized
	Violation occurred on line 83 in file Code/GraphMol/RingInfo.cpp
	Failed Expression: df_init
	RDKIT: 2021.09.5
	BOOST: 1_67

This expression works fine when I use a molecule without a dummy atom, for example the core molecule defined above. Any help with this would be greatly appreciated.

Thank you!
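RDKit raises a `RingInfo not initialized` pre-condition violation when a molecule that was never sanitized reaches a routine that needs ring perception. In this traceback that routine is `rdMMPA.FragmentMol`. A molecule can end up unsanitized after being parsed with `sanitize=False` or after being edited as an `RWMol`. Whether this is the cause of the report above is an assumption. The sketch below sanitizes a copy of a hypothetical dummy-atom fragment before it would be passed to `grow_mol`. The helper name `with_ring_info` and the SMILES are illustrative only.

```python
from rdkit import Chem


def with_ring_info(mol):
    """Return a sanitized copy of `mol` so that its RingInfo is initialized.

    Assumption: the input was built without sanitization (e.g. sanitize=False
    or RWMol editing), which leaves ring information uninitialized.
    """
    fixed = Chem.Mol(mol)    # work on a copy; the caller's molecule is untouched
    Chem.SanitizeMol(fixed)  # perceives rings, aromaticity, valences, ...
    return fixed


# Hypothetical fragment with a dummy atom, parsed without sanitization
raw = Chem.MolFromSmiles("C1CC1C(=O)N*", sanitize=False)
fragment = with_ring_info(raw)
print(fragment.GetRingInfo().NumRings())
# `fragment` could then be passed on, e.g. grow_mol(fragment, db_name=...)
```

If sanitization fails for a given fragment, the resulting exception names the chemistry problem directly. That is usually easier to debug than the pre-condition violation.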

Tutorials

Hi! I was wondering whether you might have some more Jupyter tutorials that aren't uploaded to the current GitHub repository? Thanks for building this software, it's super cool!

Issue in

Fragmentation of certain molecules fails because ',' does not always work as a delimiter. A fix might be to use '\t' as the delimiter instead, since it does not appear in SMARTS patterns.

Error in process_line() of frag_to_env_mp.py:

original line CC[C@H]1C[C@@h]1C(=O)NC(C)C,|&1:2,4|,C1C@H[C@H]1[:2],CC(C)NC(=O)[:1].CC[:2]
too many values to unpack (expected 4) ['CC[C@H]1C[C@@h]1C(=O)NC(C)C', '|&1:2', '4|', 'C1C@H[C@H]1[:2]', 'CC(C)NC(=O)[:1].CC[:2]']
There are 5 unpacked values instead of the expected 4 because the annotation '|&1:2,4|' itself contains a ','.
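The failure and the proposed tab-delimiter fix can be reproduced with plain string handling. This is a minimal sketch, not CReM's own code. The four field names (`smi`, `id`, `core`, `context`) are assumptions based on the four values the error message expects. The core and context strings are placeholders, and only the `|&1:2,4|` annotation is taken from the report.

```python
# Hypothetical sketch (not CReM's code): a 4-field record written and read
# back with a single separator; field names are assumptions.
FIELDS = ("smi", "id", "core", "context")


def make_line(record, sep="\t"):
    """Join the fields with `sep`, refusing any value that contains it."""
    for value in record:
        if sep in value:
            raise ValueError(f"separator {sep!r} occurs inside field {value!r}")
    return sep.join(record)


def parse_line(line, sep="\t"):
    """Split a line back into exactly len(FIELDS) named fields."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


# Illustrative record: placeholder core/context, annotation from the issue
record = ("CC[C@H]1C[C@@H]1C(=O)NC(C)C", "|&1:2,4|",
          "[*:1]C1CC1[*:2]", "CC(C)NC(=O)[*:1].CC[*:2]")

print(len(",".join(record).split(",")))     # comma-separated: 5 parts, not 4
print(parse_line(make_line(record))["id"])  # tab-separated: the id survives
```

`make_line` refuses to write a field that contains the separator. A bad record therefore fails at write time rather than as an unpacking error in a later processing step.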

Annoying [13:51:55] WARNING: not removing hydrogen atom with dummy atom neighbors

Hi, excellent package! I was just wondering how to stop the warning message that pops up during link_mol and mutate_mol. I tried

import warnings
warnings.filterwarnings("ignore", message='not removing hydrogen atom with dummy atom neighbors')

without success.
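The `[13:51:55] WARNING:` prefix suggests that the message is written by RDKit's own logger rather than through Python's `warnings` module. If so, that would explain why `warnings.filterwarnings` has no effect. The sketch below silences the RDKit warning channel with `RDLogger.DisableLog` for the duration of a block. The context-manager name `rdkit_warnings_silenced` is hypothetical. A plain RDKit call stands in for the CReM calls so that the example runs on its own.

```python
from contextlib import contextmanager

from rdkit import Chem, RDLogger


@contextmanager
def rdkit_warnings_silenced(channel="rdApp.warning"):
    """Temporarily disable one RDKit log channel, re-enabling it afterwards."""
    RDLogger.DisableLog(channel)
    try:
        yield
    finally:
        RDLogger.EnableLog(channel)


with rdkit_warnings_silenced():
    # mutate_mol()/link_mol() calls would go here; this RDKit call stands in
    mol = Chem.MolFromSmiles("CC(=O)N*")
```

The context manager re-enables the channel unconditionally on exit. A blunter alternative is to call `RDLogger.DisableLog("rdApp.*")` once at the top of a script.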
