choderalab / autonomous-molecular-design

Sandbox for the development of active-learning algorithms for automated drug discovery.

License: MIT License

Autonomous Molecular Design

Drug discovery is an expensive and time-consuming process that aims to pick a few useful compounds out of the many billions of possible molecules that make up chemical space. Given the size of this search space, traversing it in a fully autonomous manner requires advanced algorithms. This code base is a sandbox for developing active-learning algorithms and applying them to autonomous molecular design.

Data

This project used the 8.5-million-compound Enamine REAL diverse drug-like dataset, available at https://enamine.net/library-synthesis/real-compounds/real-compound-libraries. Random subsets of size 10,000 and 100,000 were selected using scripts/enamineRandomSubsetGenerator.py. These datasets included only SMILES-encoded structures.
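The contents of scripts/enamineRandomSubsetGenerator.py are not reproduced here. As a minimal sketch of how a fixed-size random subset could be drawn from a large SMILES file without loading it all into memory, the example below uses reservoir sampling. The function name `reservoir_sample`, the seed, and the toy SMILES strings are all assumptions made for illustration and may not match what the script actually does.

```python
import random
from typing import Iterable, List


def reservoir_sample(items: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Draw k items uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Toy stand-in for the SMILES file (hypothetical data, not Enamine REAL).
toy_smiles = ["C" * (n + 1) + "O" for n in range(1000)]
subset = reservoir_sample(toy_smiles, k=10, seed=42)
print(len(subset))
```

With a real file, the `items` argument could be an open file handle iterated line by line, so only the selected lines are held in memory.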

To featurize our datasets, we trained a DeepChem graph convolutional neural network on each of the three properties we considered: beta-secretase 1 affinity ("bace"), solubility ("esol"), and the log of the distribution coefficient (logD). We then used these networks to predict each property for every molecule in each dataset. This was performed using the scripts/enamineSubsetXXKGroundTruthGenerator.py files. All datasets are available in the directory labeled "data".

Note: the predictions made by the DeepChem models seem to vary widely given a fixed dataset and a given model. This issue is discussed at deepchem/deepchem#1629.

Models

The models directory contains Jupyter notebooks used for creating the ground truth property models and those models' checkpoints, and a notebook that was used early on to implement a DeepChem tutorial model.

The notebooks directory contains Jupyter notebooks for the sandbox model (SimpleADDScenario and the more up-to-date SimpleADDScenarioColab), a notebook for visualizing the distributions of molecular properties, and a notebook used for establishing a random-search baseline.

The scripts directory also contains simpleaddscenariocolab.py, a preliminary script version of the Colab notebook edited to run on an HPC cluster.

The images directory contains plots that measure the model's progress using a variety of metrics.
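The random-search baseline mentioned above is not spelled out in this README. One way such a baseline could look is sketched below: molecules are picked uniformly at random, without replacement, in fixed-size batches, and the best property value seen so far is recorded after each round. The function name `random_search`, the batch size, the toy scores, and the assumption that higher values are better are all hypothetical.

```python
import random
from typing import Dict, List


def random_search(
    scores: Dict[str, float], batch_size: int, n_rounds: int, seed: int = 0
) -> List[float]:
    """Return the best score seen after each round of random selection."""
    rng = random.Random(seed)
    pool = list(scores)
    rng.shuffle(pool)  # a shuffled pool gives sampling without replacement
    best = float("-inf")
    history: List[float] = []
    for r in range(n_rounds):
        batch = pool[r * batch_size:(r + 1) * batch_size]
        if not batch:
            break  # the pool is exhausted
        # Assumption: larger property values are better.
        best = max(best, max(scores[m] for m in batch))
        history.append(best)
    return history


# Toy ground-truth table (hypothetical values, not from the repository).
toy_scores = {f"mol{i}": float(i % 97) for i in range(500)}
history = random_search(toy_scores, batch_size=10, n_rounds=20)
print(history[-1])
```

An active-learning strategy can then be compared against `history` round by round, since both spend the same number of property evaluations.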

Notes

The temp directory contains a file "training_dataset.csv" that is used for training the model ensemble on the subsets of the data seen as the model progresses. This file is written to and read from while the sandbox model runs.
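The write-then-read pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the column names `smiles` and `value`, the helper names `append_observations` and `load_observations`, and the numeric values are hypothetical, and the real file layout may differ.

```python
import csv
import os
import tempfile
from typing import List, Tuple


def append_observations(path: str, rows: List[Tuple[str, float]]) -> None:
    """Append newly observed (SMILES, value) rows, writing a header once."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["smiles", "value"])  # hypothetical column names
        writer.writerows(rows)


def load_observations(path: str) -> List[Tuple[str, float]]:
    """Read back everything observed so far, e.g. to retrain the ensemble."""
    with open(path, newline="") as f:
        return [(row["smiles"], float(row["value"])) for row in csv.DictReader(f)]


# A temporary directory stands in for the repository's temp directory.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "training_dataset.csv")

# Two rounds of made-up observations (placeholder values, not real data).
append_observations(csv_path, [("CCO", -0.77), ("c1ccccc1", -2.13)])
append_observations(csv_path, [("CC(=O)O", 0.5)])

training_rows = load_observations(csv_path)
print(len(training_rows))
```

Appending rather than rewriting keeps earlier rounds intact, so each retraining step sees all molecules observed up to that point.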

The list of packages and versions I had installed when DeepChem first worked for me is available at "DG Conda List Output 2019_6_10 (Functioning DeepChem)".

Copyright

Copyright (c) 2019, Chodera Lab

Acknowledgements

DeepChem is available at https://github.com/deepchem/deepchem and https://deepchem.io.

Project based on the Computational Molecular Science Python Cookiecutter version 1.0.

autonomous-molecular-design's People

Contributors

darnellgranberry, maxentile

autonomous-molecular-design's Issues

Travis CI Security Breach Notice

MolSSI is reaching out to every repository created from the MolSSI Cookiecutter-CMS with a .travis.yml file present to alert them to a potential security breach in using the Travis-CI service.

Between September 3 and September 10, 2021, the secure environment variables that Travis-CI uses were leaked for ALL projects and injected into the publicly available runtime logs. See more details here. All Travis-CI users should cycle any secure variables/files and associated objects as soon as possible. We are reaching out to our users as good stewards of the third-party products we recommended, which might still be in use, and to fulfill a duty to warn our end users given the potential severity of the breach.

We at MolSSI recommend moving away from Travis-CI to another CI provider as soon as possible. Given the nature of this breach and the way Travis-CI mishandled the response, MolSSI cannot recommend the Travis-CI platform for any reason at this time. We suggest either GitHub Actions (as is used from v1.5 of the Cookiecutter-CMS) or some other service offered on GitHub.

If you have already addressed this security concern or it does not apply to you, feel free to close this issue.

This issue was created programmatically to reach as many potential end-users as possible. We do apologize if this was sent in error.
