
3dlinker's Introduction

3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design

About

This directory contains the code and resources of the following paper:

"3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design".

  1. 3DLinker is a 3D graph variational auto-encoder that is equivariant to rigid transformations and reflections (the E(3) group). It takes two molecular fragments as input and generates a "linker" (both its graph and its spatial coordinates) that attaches the two fragments.
  2. We thank the authors of "Deep generative models for 3D linker design" for releasing their code. Our code is based on their source code release (link).
  3. Please feel free to contact Yinan Huang [email protected] if you have issues using the code.

Overview of Model

We introduce 3DLinker, a variational auto-encoder, to address the simultaneous generation of graphs and spatial coordinates in molecular linker design. Our model leverages an important geometric inductive bias: equivariance w.r.t. E(3) transformations. The encoding and decoding (generation) steps are described below.

[Figure: overview of the 3DLinker model]

Step 1. Encode the fragments and ground-truth into equivariant node-level embeddings

An equivariant GNN jointly embeds the fragments and the ground-truth into node-level embeddings, which include both scalar-type and vector-type embeddings. They are equivariant in the sense that the scalars and vectors transform as order-0 and order-1 E(3) tensors, respectively.
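The equivariance property can be checked numerically. The block below is a minimal toy layer in NumPy, loosely in the spirit of EGNN-style message passing. It is not the 3DLinker architecture, and the names `toy_equivariant_layer` and `W` are hypothetical. Edge weights are computed only from invariant quantities (scalar features and squared distances). The scalar update is therefore unchanged by any rotation, reflection, or translation. The vector output is a weighted sum of relative positions, so it rotates and reflects with the input and ignores translations.

```python
import numpy as np


def toy_equivariant_layer(x, h, W):
    """Toy E(3)-equivariant layer (illustrative only, not the 3DLinker model).

    x: (N, 3) coordinates, h: (N, F) scalar features,
    W: (2F + 1,) hypothetical edge-scoring weights.
    Returns invariant scalars (N, F) and equivariant vectors (N, 3).
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # (N, N, 3) relative positions
    dist2 = np.sum(diff ** 2, axis=-1)               # (N, N) invariant squared distances
    hi = np.repeat(h[:, None, :], n, axis=1)         # (N, N, F) features of node i
    hj = np.repeat(h[None, :, :], n, axis=0)         # (N, N, F) features of node j
    feats = np.concatenate([hi, hj, dist2[..., None]], axis=-1)
    w = np.tanh(feats @ W)                           # (N, N) invariant edge weights
    np.fill_diagonal(w, 0.0)                         # no self-messages
    h_new = h + w.sum(axis=1, keepdims=True)         # order-0 (scalar) update
    v = np.einsum("ij,ijk->ik", w, diff)             # order-1 (vector) embedding
    return h_new, v


def equivariance_errors(seed=0):
    """Max deviation from invariance (scalars) and equivariance (vectors)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(5, 3))
    h = rng.normal(size=(5, 4))
    W = rng.normal(size=2 * 4 + 1)
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # random orthogonal matrix
    Q[:, 0] *= -1.0                                  # still orthogonal; flips det sign
    t = rng.normal(size=3)                           # random translation
    h1, v1 = toy_equivariant_layer(x, h, W)
    h2, v2 = toy_equivariant_layer(x @ Q.T + t, h, W)
    return np.abs(h1 - h2).max(), np.abs(v1 @ Q.T - v2).max()


if __name__ == "__main__":
    print(equivariance_errors())  # both values should be at floating-point noise level
```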

Step 2. Predict anchor nodes

Use the node embeddings to predict the anchor nodes, i.e., the atoms in the two fragments to which the linker will attach.
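The sketch below shows one way anchor selection could be set up. It makes several assumptions that the README does not confirm: one anchor is chosen per fragment, each atom is scored with a hypothetical linear head `w` on its invariant scalar embedding, and the scores are normalized with a softmax over that fragment's atoms only. Because the scores use only invariant scalars, the chosen anchors do not change when the input is rotated, reflected, or translated.

```python
import numpy as np


def predict_anchors(h, frag_id, w):
    """Pick one anchor atom per fragment (hypothetical scoring scheme).

    h: (N, F) invariant scalar node embeddings of the fragment atoms,
    frag_id: (N,) fragment index of each atom (0 or 1),
    w: (F,) hypothetical scoring weights.
    Returns the anchor indices and the per-fragment probability vectors.
    """
    logits = h @ w
    anchors, probs_per_frag = [], []
    for f in (0, 1):
        idx = np.flatnonzero(frag_id == f)
        z = logits[idx] - logits[idx].max()      # stabilize the softmax
        p = np.exp(z) / np.exp(z).sum()          # softmax over this fragment's atoms
        anchors.append(int(idx[np.argmax(p)]))   # most likely anchor in fragment f
        probs_per_frag.append(p)
    return anchors, probs_per_frag
```

Normalizing within each fragment guarantees exactly one anchor per fragment. A single softmax over all atoms could instead place both anchors in the same fragment.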

Step 3. Predict node type

Use the node embeddings to predict the type of nodes in the linker.
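Node-type prediction can be read as a per-node classification over an atom vocabulary. The sketch below samples one type per linker node from a temperature-scaled softmax. The vocabulary `ATOM_TYPES`, the classifier weights `W`, and the `temperature` parameter are all hypothetical illustrations, not settings taken from the repository.

```python
import numpy as np

ATOM_TYPES = ["C", "N", "O", "S", "F", "Cl"]  # hypothetical atom vocabulary


def sample_node_types(h_linker, W, rng, temperature=1.0):
    """Sample an atom type for each linker node (illustrative sketch).

    h_linker: (M, F) invariant embeddings of the M linker nodes,
    W: (F, K) hypothetical classifier weights with K = len(ATOM_TYPES),
    rng: a numpy Generator used for sampling.
    """
    logits = (h_linker @ W) / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)     # row-wise softmax
    idx = [rng.choice(len(ATOM_TYPES), p=row) for row in probs]
    return [ATOM_TYPES[i] for i in idx], probs
```

Sampling rather than taking the argmax is what lets repeated generations for the same fragments produce different linkers. Lower temperatures make the samples more deterministic.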

Step 4. Predict edges and coordinates

Following an auto-regressive policy, we repeatedly select a node and predict its edges and coordinates. Nodes are selected in breadth-first search (BFS) order.

For more details, see the Methodology section of our paper.
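The BFS selection order can be illustrated on a known graph, as happens when a ground-truth linker is replayed during training. The sketch below covers only the ordering and edge bookkeeping; coordinate prediction is omitted. The function name and the adjacency-dict format are hypothetical. Each edge is emitted exactly once, when its first endpoint becomes the focus node, so ring-closing edges between two already-queued nodes are also captured.

```python
from collections import deque


def bfs_generation_order(adjacency, start):
    """Return the BFS focus order and the edges emitted at each step.

    adjacency: dict mapping node -> list of neighbours (hypothetical format),
    start: the node expansion begins from (e.g. an anchor atom).
    """
    order, steps = [], []
    visited = {start}     # nodes already placed in the queue
    focused = set()       # nodes whose edges have already been emitted
    queue = deque([start])
    while queue:
        focus = queue.popleft()
        focused.add(focus)
        order.append(focus)
        new_edges = []
        for nb in adjacency[focus]:
            if nb in focused:
                continue              # edge was emitted when nb was the focus
            new_edges.append((focus, nb))
            if nb not in visited:     # first time we reach nb: enqueue it
                visited.add(nb)
                queue.append(nb)
        steps.append((focus, new_edges))
    return order, steps


# Small graph with a three-membered ring (1-2-4), starting from node 0.
graph = {0: [1], 1: [0, 2, 4], 2: [1, 3, 4], 3: [2], 4: [1, 2]}
order, steps = bfs_generation_order(graph, 0)
print(order)  # [0, 1, 2, 4, 3]
```

In actual generation the graph is not known in advance. The neighbours of the focus node would come from the model's edge predictions instead of a stored adjacency list.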

Sub-directories

  • [generated_samples] contains the generated molecules. Each generation run produces a .smi file (graph information) and a .sdf file (coordinate information).
  • [zinc] contains preprocessed ZINC data. The training dataset is not included due to the upload size limit; see Data for how to download it.
  • [check_points] contains PyTorch model checkpoints. "pretrained_model.pickle" is a provided checkpoint that can reproduce the experimental results in the paper.
  • [analysis] contains evaluation code.

Data

Only the test dataset is included in this directory; it can be used for generation and evaluation. To train your own model, download the training dataset from here.

Code Usage

Python Environment

The code is tested with Python 3.9 and PyTorch 1.11.

You can create a new conda environment using the provided yaml file: conda env create -f env.yml

or manually install the following packages:

  • PyTorch: install a version compatible with your platform (see PyTorch versions)
  • RDKit: conda install -c rdkit rdkit
  • Docopt: pip install docopt
  • Joblib: pip install joblib
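Before running anything, it can help to confirm that the packages listed above are importable in the active environment. This small script is a convenience sketch and is not part of the repository. It uses only the standard library (`importlib.util.find_spec`), so it can run even when the packages themselves are missing.

```python
import importlib.util
import sys

# Display name -> top-level module name for the packages listed above.
REQUIRED = {"PyTorch": "torch", "RDKit": "rdkit", "Docopt": "docopt", "Joblib": "joblib"}


def missing_packages(required=REQUIRED):
    """Return the display names of packages whose module cannot be found."""
    return [name for name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} (the README reports testing with 3.9)")
    missing = missing_packages()
    print("Missing: " + ", ".join(missing) if missing else "All required packages found.")
```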

Generation

To generate new molecules using the pretrained model, run python main.py --dataset zinc --config-file test_config.json --generation True --load_cpt ./check_points/pretrained_model.pickle

By default, 250 samples are generated per test example and saved in the directory "./generated_samples" as an .smi file and an .sdf file. Each line of the .smi file contains the fragments, the ground-truth molecule, and the generated molecule. See "test_config.json" to view and modify these settings.
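To inspect the .smi output, you can group the generated molecules by test example. The sketch below assumes each line holds three whitespace-separated SMILES strings in the order given above: fragments, ground truth, generated. This delimiter is an assumption; check a few lines of your own output file before relying on it. The function name is hypothetical.

```python
from collections import defaultdict


def load_generated_smi(path):
    """Group generated molecules by (fragments, ground truth).

    Assumes each non-empty line is 'FRAGMENTS GROUND_TRUTH GENERATED',
    separated by whitespace (an assumption about the file layout).
    """
    groups = defaultdict(list)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or unexpected lines
            frags, truth, gen = parts
            groups[(frags, truth)].append(gen)
    return dict(groups)
```

With the default setting, `{k: len(v) for k, v in load_generated_smi("PATH_TO_GENERATED_MOLS").items()}` should report 250 samples for each test example, assuming every generation attempt was written out.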

Evaluation

To evaluate the generated molecules, enter the analysis directory and run python evaluate_generated_mols.py ZINC PATH_TO_GENERATED_MOLS ../zinc/smi_train.txt 1 True None ./wehi_pains.csv
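The generation and evaluation commands can be chained from one driver script. The sketch below copies both commands verbatim from this README, including the positional evaluation arguments, and runs each from the directory the README specifies. The helper names are hypothetical. Pass an absolute path for the generated .smi file, because evaluation runs from inside the analysis directory.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(repo_root, generated_smi):
    """Return (argv, working_directory) pairs for generation then evaluation."""
    repo = Path(repo_root)
    generate = [sys.executable, "main.py", "--dataset", "zinc",
                "--config-file", "test_config.json", "--generation", "True",
                "--load_cpt", "./check_points/pretrained_model.pickle"]
    evaluate = [sys.executable, "evaluate_generated_mols.py", "ZINC", str(generated_smi),
                "../zinc/smi_train.txt", "1", "True", "None", "./wehi_pains.csv"]
    return [(generate, repo), (evaluate, repo / "analysis")]


def run_pipeline(repo_root, generated_smi):
    """Run both steps in order, stopping at the first non-zero exit code."""
    for argv, cwd in build_commands(repo_root, generated_smi):
        subprocess.run(argv, cwd=cwd, check=True)
```

Building the argument lists separately from running them makes it easy to print or log the exact commands before launching a long generation job.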

Training

To train your own model, first download the training dataset from here and save it under the [zinc] directory. Then run python main.py --dataset zinc --config-file train_config.json

Hyper-parameters such as batch size can be changed in train_config.json; additional hyper-parameters are defined in main.py.
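When sweeping hyper-parameters, it is often easier to write variants of train_config.json than to edit the original file. This sketch copies a JSON config and overrides selected top-level keys. It refuses keys that are not already present, which catches misspelled or guessed key names. The key `batch_size` in the usage line below is an assumption; use whatever names actually appear in train_config.json.

```python
import json
from pathlib import Path


def write_config_variant(src, dst, **overrides):
    """Copy the JSON config at src to dst with some top-level keys overridden."""
    config = json.loads(Path(src).read_text())
    unknown = [k for k in overrides if k not in config]
    if unknown:
        raise KeyError(f"keys not present in {src}: {unknown}")
    config.update(overrides)
    Path(dst).write_text(json.dumps(config, indent=2))
    return config
```

For example, `write_config_variant("train_config.json", "train_config_bs16.json", batch_size=16)` followed by python main.py --dataset zinc --config-file train_config_bs16.json trains with the modified setting and leaves the original file untouched.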

3dlinker's People

Contributors

yinanhuang


3dlinker's Issues

Code Reproduction Problem with 3DLinker

Hello, author. We have run into some problems reproducing your results and would like to ask why. After generating linker molecules with your pre-trained model, we ran the evaluation script. The first few metrics look fine, but the final filter checks all report zero, and we don't understand why. We hope you can help. Thank you!

Need documentation on how to use 3DLinker with new data

Hello Yinan, thank you for the awesome work! I am trying to use 3DLinker on a few toy examples of fragments, but there seem to be no instructions on how to use it with custom fragments. Would you please share a step-by-step workflow for using 3DLinker with two fragments given as SMILES strings? Thank you.
P.S. I went through the instructions given by DeLinker, but there are still differences between the two.

For reproducing the reported results

Hi, Yinan

Thanks for the great work and for providing your code and checkpoint for the followers to reproduce 3DLinker.

First, I follow your instruction in the README to reproduce the experimental results in the paper.
Specifically, I use the "pretrained_model.pickle" in [check_points] to run the command python3 main.py --dataset zinc --config-file test_config.json --generation True --load_cpt ./check_points/pretrained_model.pickle to generate the candidates, and python3 evaluate_generated_mols.py ZINC ../generated_samples/generated_smiles_zinc.smi ../zinc/smi_train.txt 1 True None ./wehi_pains.csv to evaluate them.
But I get the results as follows:

Pass all 2D filters:            0.00%
Valid and pass all 2D filters:  0.00%
Pass synthetic accessibility (SA) filter:       0.00%
Pass ring aromaticity filter:                   0.00%
Pass SA and ring filters:                       0.00%
Pass PAINS filters:                             0.00%
Aveage RMSD is 0.068152

Is there anything wrong with my practice?

Second, I notice there is a step that processes the raw graphs of the training data when initializing "Linker(args)".
How long does it take to process the ZINC data? In my environment it takes around 4 hours. Is that normal? Can we do the preprocessing just once and save the result before initializing the Linker?

Third, I find that re-training 3DLinker may take a very long time. Could you share more details about re-training it, such as the training environment, batch size, number of epochs, and training time?

I really appreciate any help you can provide.
Thank you in advance!

Best wishes
Yu
