protac-design's Introduction

Protac-Design

Description

This repo contains the code behind our workshop paper at the NeurIPS 2022 AI4Science Workshop. It is organized into the following notebooks and scripts:

  • surrogate_model.ipynb: Contains the code for processing the raw PROTAC data and training the DC50 surrogate model. Note that you will need to download the public PROTAC data from PROTAC-DB in order to reproduce the results.
  • molecule_metrics.ipynb: Contains code for computing metrics on a set of generated molecules. Metrics include percentage predicted active, percentage of duplicate molecules, percentage of molecules regenerated from training set, average number of atoms, chemical diversity, and drug-likeness.
  • binary_label_metrics.py: Contains useful functions for analyzing performance of binary classification models.
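As a rough illustration of two of the metrics listed above (percentage of duplicate molecules and percentage regenerated from the training set), here is a minimal sketch. It is not the code from molecule_metrics.ipynb: the function name `generation_metrics` is hypothetical, the metric definitions are one plausible reading of the descriptions, and the SMILES strings are assumed to be canonicalized already (for example with RDKit), so that string equality can stand in for molecular identity.

```python
from typing import Dict, Iterable


def generation_metrics(generated: Iterable[str], training: Iterable[str]) -> Dict[str, float]:
    """Percentage of duplicates and of training-set regenerations.

    Assumption: every SMILES string is already canonical, so two identical
    molecules always have identical strings.
    """
    generated = list(generated)
    if not generated:
        raise ValueError("no generated molecules were provided")

    n = len(generated)
    unique = set(generated)
    training_set = set(training)

    # Duplicates: every occurrence beyond the first of a given molecule.
    pct_duplicate = 100.0 * (n - len(unique)) / n
    # Regenerated: generated molecules that also appear in the training set.
    pct_regenerated = 100.0 * sum(s in training_set for s in generated) / n

    return {"pct_duplicate": pct_duplicate, "pct_regenerated": pct_regenerated}
```

For example, calling `generation_metrics(["CCO", "CCO", "CCN"], ["CCN"])` counts one duplicate and one regenerated molecule out of three.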

The repo also contains the following additional files:

  • surrogate_model.pkl: Contains the pre-trained surrogate model for DC50 prediction.
  • features.pkl: Contains the list of features used in surrogate model training; required to reproduce reinforcement learning jobs that use the PROTAC scoring function.
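Both files can presumably be read with Python's standard `pickle` module. The sketch below assumes they sit in the current working directory; the helper names `load_pickle` and `load_surrogate` are hypothetical and do not come from the repo. Unpickling the model will likely require the same libraries it was trained with (for example scikit-learn) to be installed, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Unpickle and return the single object stored at `path`."""
    with open(Path(path), "rb") as fp:
        return pickle.load(fp)


def load_surrogate(model_path="surrogate_model.pkl", features_path="features.pkl"):
    """Load the pre-trained surrogate model and its feature list.

    The default paths assume both files are in the working directory.
    """
    model = load_pickle(model_path)
    features = load_pickle(features_path)
    return model, features
```

A call such as `model, features = load_surrogate()` would then return both objects for use in a notebook.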

Instructions

  1. Before running any of the notebooks, you will need to download the PROTAC data from the public PROTAC-DB database.
  2. You will then need to create a conda environment containing the following main packages: rdkit, pandas, scikit-learn, scipy, ipython, and optuna. See the instructions in the next section for setting this up.
  3. Open the notebooks on your favorite platform, and make sure to select the right kernel before executing.

Environment

To set up the environment for running the notebooks in this repo, run the following commands:

conda create -n protacs-env -c conda-forge scikit-learn optuna rdkit
conda activate protacs-env
conda install pandas scipy 
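Once the environment is active, a quick check like the one below can confirm that the main packages are importable. This is only a sketch: the helper `missing_modules` is hypothetical, and the module list uses import names, which differ from some package names (scikit-learn is imported as `sklearn`, ipython as `IPython`).

```python
from importlib.util import find_spec

# Import names for the main packages named in the instructions.
REQUIRED_MODULES = ["rdkit", "pandas", "sklearn", "scipy", "IPython", "optuna"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]
```

Running `print(missing_modules())` inside the activated environment should print an empty list if everything is installed. If `IPython` shows up as missing, it may need to be installed separately, since the commands above do not list it explicitly.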

Citation

Nori, Divya et al. (2022) "De novo PROTAC design using graph-based deep generative models." NeurIPS 2022 AI4Science Workshop.

Additional data

Additional data, including saved GraphINVENT model states, generated structures, analysis scripts, and training data, are available on Zenodo.

Authors

  • Divya Nori
  • Rocío Mercado

protac-design's People

Contributors

divnori, mattsonthieme

protac-design's Issues

GraphInvent -finetune issue

Hi,
This is outstanding work on PROTACs. I went through the code and ran into some difficulty when executing GraphINVENT.

During the fine-tuning step, I have to specify the following (shown below, commented out) in the constants.py file:
"""
scoring_model_activity = pickle.load(open('/home/gridsan/dnori/GraphINVENT/data/protac_scoring_models/Protac_Scoring_Model_1024_100nM.pkl', 'rb'))
scoring_model_structure = pickle.load(open('/home/gridsan/dnori/GraphINVENT/data/protac_scoring_models/Protac_Scoring_Model_Structure.pkl', 'rb'))
with open('/home/gridsan/dnori/GraphINVENT/data/protac_scoring_models/features_1024_100nM.pkl','rb') as fp:
    features_activity = pickle.load(fp)
with open('/home/gridsan/dnori/GraphINVENT/data/protac_scoring_models/features_structure.pkl','rb') as fp:
"""

Question 1. How can I get these files, or can I use the features.pkl and surrogate_model.pkl files here? If so, which .pkl file should be used for which variable?
Question 2. How can I get the protac_activity result that is mentioned in the scoring_function.py file?
Question 3. After running the fine-tuning job, I get files such as agent.smi, BASF.smi, .valid, and .likelihood. What are those?
