abuttonch / ai-on-a-chip

This code was written for the publication "Combining generative artificial intelligence and on-chip synthesis for de novo drug design"

License: MIT License

Python 95.60% Shell 4.40%

ai-on-a-chip's Introduction

Combining generative artificial intelligence and on-chip synthesis for de novo drug design

This repository contains the data and code associated with the study "Combining Generative Artificial Intelligence and On-Chip Synthesis for De Novo Drug Design", in which a Long Short-Term Memory (LSTM) network was combined with a microfluidics platform to design novel bioactive compounds (see Grisoni, Huisman et al. 2020). The material provided here can be used to reproduce the results of the study.

  1. Getting started
  2. Data
  3. Virtual reaction filter code
    1. Installation
    2. Usage
    3. Example
  4. Generative deep learning code
  5. How to cite

Getting started

To access the content of this repository on your local machine, you can clone it as follows:

git clone git@github.com:abuttonCH/ai-on-a-chip.git

Data

This repository contains several datasets linked to our publication. They are located in the data folder, which contains the following files:

  • LSTM_pretraining_data.zip: Contains the SMILES of the compounds used for pretraining the LSTM model. Such SMILES were obtained from a library of commercial compounds, which were retained by our virtual reaction filter (see below)
  • decomposition_reactions.txt: Reaction SMARTS used to convert the molecules into their corresponding reactants.
  • LSTM_FLOW-MOL_DB_DATA.npy: Molecular database of commercially available molecules. Each entry stores the molecule number, the molecular SMILES, and the molecular weight. The numpy array object is too large to upload to git. See the Virtual reaction filter section below for how to use this file.
  • mol_db_data.csv: Molecular database stored as a csv file. This file needs to be converted to the corresponding numpy array object in order to work with decompose.py and retrieve_bb.py (see below).

Virtual reaction filter

Here you will find instructions for applying the virtual reaction filter, as explained in the paper. The retro-synthesis is performed in two steps. In the first step (decompose.py), a series of reactions is applied to each product molecule in order to decompose it into its corresponding reactants. The reaction used and the reactant molecules are stored in a text file. In the second step (retrieve_bb.py), once all of the products have been decomposed, the reactant molecules are compared against a database of known, commercially available molecules. If all of the reactant molecules for a given reaction are found in the database, the product molecule is written to the output file along with the reaction and the retrieved reactant molecules.
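The acceptance rule of the second step can be sketched in plain Python. This is a minimal illustration, not the code in retrieve_bb.py: the function name filter_products, the dictionary layout, the reaction names, and the toy SMILES are all assumptions made for the example, and the database is reduced to a set of SMILES strings. A real comparison would also need the SMILES on both sides to be written in the same canonical form.

```python
def filter_products(decompositions, database):
    """Keep (product, reaction, reactants) triples whose reactants are all available.

    decompositions: dict mapping product SMILES -> list of (reaction_name, reactants)
    database: set of SMILES strings of commercially available molecules
    """
    hits = []
    for product, routes in decompositions.items():
        for reaction_name, reactants in routes:
            # A route is accepted only if every one of its reactants is in the database.
            if reactants and all(r in database for r in reactants):
                hits.append((product, reaction_name, tuple(reactants)))
    return hits


# Toy data (hypothetical): one product with two made-up decomposition routes.
database = {"CC(=O)O", "NCc1ccccc1"}
decompositions = {
    "CC(=O)NCc1ccccc1": [
        ("amide_coupling", ["CC(=O)O", "NCc1ccccc1"]),        # both reactants available
        ("reductive_amination", ["CC(N)=O", "O=Cc1ccccc1"]),  # neither reactant available
    ],
}
hits = filter_products(decompositions, database)
print(hits)
```

Only the first route survives, because a single missing reactant is enough to reject a route.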

Installation

This code requires rdkit version 2018.09.1 to be installed. The best way to do this is to create a conda environment.

conda create -n my_retrieve_env -c rdkit rdkit=2018.09.1

Once you have created the conda environment, you need to activate it.

conda activate my_retrieve_env

Usage

The code for the decomposition and building block retrieval is located in the code folder.

The folder contains the following files:

  • create_npy_db.py (generates the mol db array object)
  • decompose.py (converts molecules into reactants)
  • retrieve_bb.py (returns reactants found in mol db)
  • reaction_library.py (defines the reaction object)
  • reaction_class_auto.py (performs the reaction)

Below, you will find a step-by-step explanation of how to use the code.

Create the Molecular DB

Before running the method, one first needs to create the mol db numpy array object:

python create_npy_db.py --input ../data/mol_db_data.csv --output ../data/LSTM_FLOW-MOL_DB_DATA.npy

--input: Molecules saved in a csv file (Number of Molecules, SMILES, Molecular Weight). Type: csv file

--output: Molecules converted to a numpy array. Type: numpy array
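The reading half of this conversion can be sketched as follows. This is a hedged illustration that uses only the standard library; the function name read_mol_db, the header names, and the sample rows are assumptions, and the actual layout of mol_db_data.csv and the behavior of create_npy_db.py may differ.

```python
import csv
import io


def read_mol_db(handle):
    """Parse rows of (number, SMILES, molecular weight) from an open csv file."""
    records = []
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        number, smiles, weight = row
        try:
            records.append((int(number), smiles.strip(), float(weight)))
        except ValueError:
            continue  # skip a header line or rows with non-numeric fields
    return records


# Hypothetical file content, used here in place of the real csv file.
sample = io.StringIO("Number,SMILES,MolWt\n1,CCO,46.07\n2,c1ccccc1,78.11\n")
records = read_mol_db(sample)
print(records)
```

A list of records like this could then be converted with numpy.array and written to disk with numpy.save; whether create_npy_db.py does exactly this is not stated here.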

Conversion to Reactants

decompose.py applies each of the reactions specified in decomposition_reactions.txt to the input molecules. Each set of reactants generated from a given molecule and reaction is written to the output file.

python decompose.py --mol molfile.txt --reaction decomposition_reactions.txt --out decomposition_output.txt --limit molecule_limit

--mol: The molecules to be decomposed. Each line of the input file should contain a single SMILES string. Type: text file

--reaction: The decomposition reactions written in the SMARTS format (decomposition reaction | SMARTS | number of conserved rings). Type: text file

--out: The file path to output the results to. Type: text file

--limit: The number of molecules to process. If not specified, decompose.py will process the entire file. Type: int
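The two input formats described above can be read with a few lines of Python. This is a sketch under stated assumptions, not the parsing code of decompose.py: the function names are hypothetical, the reaction file is assumed to use "|" as a field separator exactly as written in the option description, and the example reaction line is made up for illustration.

```python
from itertools import islice


def read_smiles(lines, limit=None):
    """Return up to `limit` non-empty SMILES strings; all of them if limit is None."""
    stripped = (line.strip() for line in lines)
    non_empty = (s for s in stripped if s)
    return list(islice(non_empty, limit))


def parse_reaction_line(line):
    """Split 'name | SMARTS | number of conserved rings' into typed fields."""
    # Split off the first and last field separately so that a '|' inside
    # the SMARTS itself would not break the parse.
    name, rest = line.split("|", 1)
    smarts, rings = rest.rsplit("|", 1)
    return name.strip(), smarts.strip(), int(rings)


# Hypothetical inputs for illustration only.
molecules = read_smiles(["CCO\n", "\n", "CCN\n", "CCC\n"], limit=2)
reaction = parse_reaction_line("amide | [C:1](=[O:2])[N:3]>>[C:1](=[O:2])O.[N:3] | 0")
print(molecules)
print(reaction)
```

Passing limit=None mirrors the documented default, in which the whole input file is processed.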

Building Block Retrieval

retrieve_bb.py searches the molecular database for matches between the decomposed reactant molecules and the molecules in the database. If all of the reactants for a given input molecule are found, the result is written to the output file.

python retrieve_bb.py --decomp decomposition_output.txt --mol_db molecular_database.npy --out retrieve_output.txt

--decomp: The decomposed products generated by decompose.py. Type: text file

--mol_db: Database of commercially available molecules stored as a numpy array object. Type: numpy array

--out: The file path specifying where to write the outputs. Type: string

Example

The commands below show how to run create_npy_db.py, decompose.py, and retrieve_bb.py, with paths given relative to the repository root. All files are provided except the molecular database file (LSTM_FLOW-MOL_DB_DATA.npy), which was too large to upload. LSTM_FLOW-MOL_DB_DATA.npy is available in the supplementary information of "Combining generative artificial intelligence and on-chip synthesis for de novo drug design".

python code/create_npy_db.py --input data/mol_db_data.csv --output data/LSTM_FLOW-MOL_DB_DATA.npy
python code/decompose.py --mol data/data_val.txt --reaction data/decomposition_reactions.txt --out output/test_decomp.txt --limit 100
python code/retrieve_bb.py --decomp output/test_decomp.txt --mol_db data/LSTM_FLOW-MOL_DB_DATA.npy --out output/test_retrieve.txt

Generative deep learning code

The code used for molecule generation can be found in the dedicated repository: ETHmodlab/virtual_libraries. To repeat our fine-tuning experiment and generate molecules, you can follow the instructions there and:

  1. Replace the parameters file by the one provided here
  2. Modify the path in the new parameters file to point toward the right data (provided here) and to the right pretrained CLM (provided here)

How to cite

If you use any data or scripts associated with this repo, please cite:

@article{grisoni2020,
  title         = {Combining generative artificial intelligence and on-chip synthesis for de novo drug design},
  author        = {Grisoni, Francesca and Huisman, Berend and Button, Alex and Moret, Michael and Atz, Kenneth and Merk, Daniel and Schneider, Gisbert},
  journal       = {Science Advances},
  volume        = {7},
  pages         = {eabg3338},
  year          = {2021},
  doi           = {10.1126/sciadv.abg3338},
  publisher     = {American Association for the Advancement of Science}
}
