3dlinker's Introduction

3DLinker: An E (3) Equivariant Variational Autoencoder for Molecular Linker Design

About

This directory contains the code and resources of the following paper:

"3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design".

3DLinker is a 3D graph variational auto-encoder that is equivariant to rigid transformations and reflections (E(3) group). It takes two molecular fragments as input and generates a "linker" (both with graphs and spatial coordinates) attaching these two fragments.
We thank the authors of Deep generative models for 3D linker design for releasing their code. Our code is based on their source code release (link).
Please feel free to contact Yinan Huang [email protected] if you have issue using the code.

Overview of Model

We introduce 3DLinker, a variational auto-encoder, to address the simultaneous generation of graphs and spatial coordinates in molecular linker design. Our model leverages an important geometric inductive bias: equivariance w.r.t. E(3) transformations. See the concrete encoding and decoding (generation) process below.

Step 1. Encode the fragments and ground-truth into equivariant node-level embeddings

An equivariant GNN is applied to jointly embed the fragments and ground-truth into node-level embeddings, including both scalar-type and vector-type embeddings. They are equivariant in the sense that scalars and vectors are 0-order and 1-order E(3) tensors respectively.

Step2. Predict anchor nodes

Use the node embeddings to predict the anchor nodes that the linker will attach the two fragments.

Step3. Predict node type

Use the node embeddings to predict the type of nodes in the linker.

Step4. Predict edges and coordinates

Following an auto-regressive policy, we sequentially predict the edges and coordinates of the selected node. The nodes are selected in a BFS manner.

For more details, see Methodology of our paper.

Sub-directories

[generated_samples] contains the generated molecules. Each generation will produce a .smi file (graphs info) and a .sdf file (coordinates info).
[zinc] contains preprocessed ZINC data. Tranining dataset is not included due to upload limit. See Data for downloading training dataset.
[check_points] contains pytorch model checkpoints. "pretrained_model.pickle" is a provided checkpoint that can recover the experimental results in the paper.
[analysis] contains evaluation code.

Data

Only test dataset is included in this directory, which can be used for generation and evaluation. To train your own model, you can download the training dataset from here.

Code Usage

Python Envirnoment

The code is tested in Python 3.9 with Pytorch 1.11.

You can create a new conda environment using the provided yaml file: conda env create -f env.yml

or manually install the following packages:

Pytorch: install a proper version compatible with your platform (see Pytorch versions)
RDKit: conda install -c rdkit rdkit
Docopt: pip install docopt
Joblib: pip install joblib

Generation

To generate new molecules using pretrained model, run python main.py --dataset zinc --config-file test_config.json --generation True --load_cpt ./check_points/pretrained_model.pickle

The default setting is to generate 250 samples per test data, saved in directory "./generated_samples" as a smi file and a sdf file. The .smi file contains lines of fragments, ground-truth, generation. Look up "test_config.json" to see and modify the setting.

Evaluation

To evaluate the generated molecules, enter the analysis directory and run python evaluate_generated_mols.py ZINC PATH_TO_GENERATED_MOLS ../zinc/smi_train.txt 1 True None ./wehi_pains.csv

Training

To train your own model, first download training dataset from here and save it under directory [zinc]. Then run python main.py --dataset zinc --config-file train_config.json

Change hyper-parameters like batch size in file train_config.json. More hyper-params can be found in main.py.

3dlinker's People

Contributors

Stargazers

Watchers

3dlinker's Issues

Code Recurrence Problem About 3DLinker

Hello, author. We have some problems in reproducing your code. Now I want to ask why. After generating connector molecules with your pre trained model, we run the evaluation function. The results of some indicators in the front are OK, but in the final filter detection, the results are all zero. We don't understand why. We hope we can wait for your help. Thank you!

Need documentation on how to use 3DLinker with new data

Hello Yinan, thank you for the awesome work! I am trying to use 3DLinker on a few toy examples of fragments, but there seems not to be instructions on how to use it with customized fragments. Would you please share the step-to-step workflow of using 3DLinker with two fragments in SMILES string as input? Thank you.
ps. I went through the instructions given by DeLinker, but there are still differences between.

For reproducing the reported results

Hi, Yinan

Thanks for the great work and for providing your code and checkpoint for the followers to reproduce 3DLinker.

First, I follow your instruction in the README to reproduce the experimental results in the paper.
Specifically, I use the "pretrained_model.pickle" in the [check_points] for run the command python3 main.py --dataset zinc --config-file test_config.json --generation True --load_cpt ./check_points/pretrained_model.pickle to generate the candidates, and python3 evaluate_generated_mols.py ZINC ../generated_samples/generated_smiles_zinc.smi ../zinc/smi_train.txt 1 True None ./wehi_pains.csv to evaluate the generated candidates.
But I get the results as follows:

Pass all 2D filters:            0.00%
Valid and pass all 2D filters:  0.00%
Pass synthetic accessibility (SA) filter:       0.00%
Pass ring aromaticity filter:                   0.00%
Pass SA and ring filters:                       0.00%
Pass PAINS filters:                             0.00%
Aveage RMSD is 0.068152

Is there anything wrong with my practice?

Second, I notice there is a step for processing raw graphs of train data while initializing the "Linker(args)."
How much time does it need to process ZINC data? In my env, around 4 hours. Is it normal? Can we do preprocess just once and save it before initializing the Linker?

Third, I find it may take a very long time to re-train the 3DLinker. Are there more statistics about re-implementing the 3DLinker, such as training environment, batch_size, epochs, and training time?

I really appreciate any help you can provide.
Thank you in advance!

Best wishes
Yu

Recommend Projects

yinanhuang / 3dlinker Goto Github PK

3dlinker's Introduction

3DLinker: An E (3) Equivariant Variational Autoencoder for Molecular Linker Design

About

Overview of Model

Sub-directories

Data

Code Usage

3dlinker's People

Contributors

Stargazers

Watchers

Forkers

3dlinker's Issues

Code Recurrence Problem About 3DLinker

Need documentation on how to use 3DLinker with new data

For reproducing the reported results

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent