This project forked from wengong-jin/multiobj-rationale

Multi-Objective Molecule Generation using Interpretable Substructures (ICML 2020)

License: MIT License

Python 100.00%

Multi-Objective Molecule Generation using Interpretable Substructures

This is the implementation of our ICML 2020 paper: https://arxiv.org/abs/2002.03244

Property Predictors

The property predictors for GSK3 and JNK3 are provided in data/gsk3/gsk3.pkl and data/jnk3/jnk3.pkl. For example, to predict properties of given molecules, run

python properties.py --prop jnk3 < data/jnk3/rationales.txt
python properties.py --prop gsk3,jnk3 < data/dual_gsk3_jnk3/rationales.txt

Rationale Extraction

The rationale extraction module produces a list of triplets (molecule, rationale, score), where molecule is an active compound, rationale is a subgraph that explains the property, and score is the predicted score of that rationale. The following commands use 4 CPU cores (adjustable with the --ncpu argument):

python mcts.py --data data/jnk3/actives.txt --prop jnk3 --ncpu 4 > jnk3_rationales.txt
python mcts.py --data data/gsk3/actives.txt --prop gsk3 --ncpu 4 > gsk3_rationales.txt
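
As a minimal sketch of how the extracted triplets could be loaded for inspection, the snippet below parses lines of the form (molecule, rationale, score). The whitespace-separated column layout, the `RationaleRecord` type, and the example lines are assumptions made for illustration; check them against the actual output of mcts.py before relying on this.

```python
from typing import List, NamedTuple


class RationaleRecord(NamedTuple):
    molecule: str   # SMILES of the active compound
    rationale: str  # SMILES of the extracted subgraph
    score: float    # predicted property score of the rationale


def parse_rationale_lines(lines) -> List[RationaleRecord]:
    """Parse whitespace-separated (molecule, rationale, score) lines.

    The column layout is an assumption; adjust the indices if the
    actual output of mcts.py differs.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        records.append(RationaleRecord(fields[0], fields[1], float(fields[2])))
    return records


# Made-up example lines, not real model output.
example = [
    "CCOc1ccccc1 c1ccc[c:1]c1 0.71",
    "",
    "CCNc1ccccc1 c1ccc[c:1]c1 0.64",
]
records = parse_rationale_lines(example)
best = max(records, key=lambda r: r.score)
print(best.molecule, best.score)
```

To read a real file, pass an open file handle (for example `open("jnk3_rationales.txt")`) in place of the example list.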

To construct multi-property rationales, we can merge the single-property rationales for GSK3 and JNK3:

python merge_rationale.py --rationale1 data/gsk3/rationales.txt --rationale2 data/jnk3/rationales.txt > gsk3_jnk3.txt

Generative Model Pre-training

The molecule completion model is pre-trained on the ChEMBL dataset. To construct the training set, run

python preprocess.py --train data/chembl/all.txt --ncpu 4
mkdir chembl-processed
mv tensor-* chembl-processed

To train the molecule completion model, run

python gnn_train.py --train chembl-processed --save_dir ckpt/chembl-molgen

Generate molecules with specific substructures

To generate molecules that contain specific substructures (e.g. benzene), first create a rationale file named rationale.txt. Here is an example file containing a single line for benzene:

c1ccc[c:1]c1

where the atoms marked with 1 are the attachment points from which the model should grow the fragment. Then run

python decode.py --rationale rationale.txt --model ckpt/chembl-h400beta0.3/model.20 --num_decode 1000

This will generate 1000 molecules with at least one benzene ring.
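
Before decoding, it can help to confirm that every line of the rationale file marks at least one attachment atom. The following is a rough sketch using a regular expression rather than a real SMILES parser; the function names are hypothetical, and a cheminformatics toolkit would be more robust for unusual SMILES.

```python
import re

# Matches a bracket atom carrying an atom-map number, e.g. "[c:1]" or "[CH2:1]".
MAPPED_ATOM = re.compile(r"\[([^\[\]:]+):(\d+)\]")


def attachment_atoms(smiles: str, map_num: int = 1):
    """Return the atom symbols in `smiles` tagged with atom-map number `map_num`."""
    return [
        symbol
        for symbol, num in MAPPED_ATOM.findall(smiles)
        if int(num) == map_num
    ]


def check_rationale_lines(lines):
    """Yield (smiles, ok) pairs; ok is False when no atom is marked with 1."""
    for line in lines:
        smiles = line.strip()
        if smiles:
            yield smiles, bool(attachment_atoms(smiles))


print(attachment_atoms("c1ccc[c:1]c1"))  # ['c']
print(list(check_rationale_lines(["c1ccc[c:1]c1", "c1ccccc1"])))
```

A line that fails this check has no marked attachment atom, which is not the form shown in the example above.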

GSK3 + JNK3 + QED + SA Molecule Design

This task seeks to design dual inhibitors against GSK3 and JNK3 with drug-likeness and synthetic accessibility constraints. We have already computed multi-property rationales in data/gsk3_jnk3_qed_sa/rationales.txt. This file is a subset of the GSK3-JNK3 rationales with QED > 0.6 and SA < 4.0.
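
The subset selection described above amounts to a threshold filter. Here is a minimal sketch, assuming the QED and SA scores have already been computed elsewhere; the function names, the tuple layout, and the example scores are hypothetical and are not taken from the repository.

```python
QED_MIN = 0.6  # keep rationales with QED strictly above this
SA_MAX = 4.0   # keep rationales with SA strictly below this


def passes_constraints(qed: float, sa: float) -> bool:
    """True when both the drug-likeness and synthetic accessibility cutoffs hold."""
    return qed > QED_MIN and sa < SA_MAX


def filter_rationales(scored):
    """`scored` is an iterable of (smiles, qed, sa) tuples with precomputed scores."""
    return [smiles for smiles, qed, sa in scored if passes_constraints(qed, sa)]


# Placeholder labels and made-up scores, for illustration only.
scored = [
    ("rationale_A", 0.72, 3.1),
    ("rationale_B", 0.55, 2.8),  # fails the QED cutoff
    ("rationale_C", 0.80, 4.6),  # fails the SA cutoff
]
print(filter_rationales(scored))  # ['rationale_A']
```

Note that the two cutoffs point in opposite directions: QED has a lower bound, while SA has an upper bound.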

Step 1: Fine-tuning with Policy Gradient

Given a set of rationales, the model learns to complete them into full molecules. The molecule completion model has been pre-trained on ChEMBL, and it needs to be fine-tuned so that generated molecules will satisfy all the property constraints. To fine-tune the model on the GSK3 + JNK3 + QED + SA task, run

python finetune.py \
  --init_model ckpt/chembl-h400beta0.3/model.20 --save_dir ckpt/tmp/ \
  --rationale data/gsk3_jnk3_qed_sa/rationales.txt --num_decode 200 --prop gsk3,jnk3,qed,sa --epoch 30 --alpha 0.5

Step 2: Molecule Generation

The molecule generation script expands the extracted rationales into full molecules. The output is a list of pairs (rationale, molecule), where molecule is a completion of rationale. In the following example, each rationale is completed 100 times, each time with a different sampled latent vector z.

python decode.py --model ckpt/gsk3_jnk3_qed_sa/model.final > outputs.txt
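
Since each rationale is completed many times, a quick sanity check is to count how many distinct molecules were produced per rationale. The sketch below assumes each output line holds a whitespace-separated (rationale, molecule) pair; that layout, the `group_completions` helper, and the example lines are assumptions for illustration only.

```python
from collections import defaultdict


def group_completions(lines):
    """Map each rationale to the set of distinct molecules completed from it."""
    completions = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rationale, molecule = fields[0], fields[1]
        completions[rationale].add(molecule)
    return dict(completions)


# Made-up example lines, not real model output.
example = [
    "c1ccc[c:1]c1 CCc1ccccc1",
    "c1ccc[c:1]c1 CCc1ccccc1",   # duplicate completion
    "c1ccc[c:1]c1 OCc1ccccc1",
]
grouped = group_completions(example)
for rationale, molecules in grouped.items():
    print(rationale, len(molecules))
```

A low distinct count for a rationale suggests the sampled completions are collapsing onto a few molecules.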

Step 3: Evaluation

To evaluate the outputs for the four-property constraint task, run

python properties.py --prop gsk3,jnk3,qed,sa < outputs.txt | python scripts/qed_sa_dual_eval.py --ref_path data/dual_gsk3_jnk3/actives.txt

Here --ref_path points to the file of reference molecules, which are used to compute the novelty score.
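
To make the role of the reference set concrete, here is a deliberately simplified sketch that treats a generated molecule as novel when its string does not appear among the reference molecules. This exact-match definition is an assumption chosen for illustration; the evaluation script may use a different criterion (for example one based on molecular similarity), and the SMILES strings would need to be canonicalized first for exact matching to be meaningful.

```python
def exact_match_novelty(generated, reference):
    """Fraction of distinct generated molecules absent from the reference set.

    Simplified, exact-string definition of novelty; not necessarily the
    metric computed by the repository's evaluation script.
    """
    reference_set = set(reference)
    unique_generated = set(generated)
    if not unique_generated:
        return 0.0  # nothing generated, so nothing novel
    novel = unique_generated - reference_set
    return len(novel) / len(unique_generated)


# Placeholder identifiers standing in for canonical SMILES strings.
generated = ["mol_A", "mol_B", "mol_B", "mol_C"]
reference = ["mol_A", "mol_D"]
print(exact_match_novelty(generated, reference))
```

Duplicates among the generated molecules are collapsed before the fraction is taken, so repeated samples do not inflate the score.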
