
Predict drug mechanism of action from high content screening images using PyTorch

License: MIT License



pytorch-hcs

Convolutional neural network-based prediction of drug mechanism-of-action (MoA) from high content screening images and use of CNN image embeddings to find outliers/novel images.

Background

High content screening / imaging

Fluorescence microscopy is a core tool in biology and drug discovery. High content screening automates fluorescence microscopy at scale, allowing researchers to measure the impact of thousands of perturbations on cellular morphology and health in a single assay. Screens that treat cells with biologically active molecules or drugs can lend insight into the function of those compounds based on how they alter the imaged cellular structures. These functional insights can help identify compound "hits" as potential drug candidates.

BBBC021 dataset

See the BBBC021 landing page for more info on the dataset.

tl;dr:

  • Human breast cancer cell line (MCF-7) treated with various compounds of both known and unknown MoA.
  • Following treatment, cells are stained for their nuclei (blue) and the cytoskeletal proteins tubulin (green) and actin (red).

[Figure: example images from BBBC021. Panels: Aurora kinase inhibitor, tubulin stabilizer, Eg5 inhibitor.]

Project goals

  • Given a multi-channel fluorescence image of MCF-7 cells, train a convolutional neural network to predict the mechanism-of-action of the compound the cells were treated with.
  • Use the trained CNN to extract image embeddings.
  • Perform UMAP dimensionality reduction on embeddings for dataset visualization and exploration.
  • Find interesting / artifactual image outliers in the BBBC021 dataset using image embeddings.
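
The last goal can be approached in several ways. The sketch below shows one common option: score each embedding by its mean distance to its k nearest neighbours, so that isolated points score highest. This is an illustrative assumption, not necessarily the method used in the notebooks; the function name `knn_outlier_scores`, the parameter `k`, and the toy 2-D embeddings are all hypothetical.

```python
import math


def knn_outlier_scores(embeddings, k=2):
    """Score each embedding by its mean distance to its k nearest neighbours.

    Hypothetical helper for illustration: larger scores mean more isolated
    points. Brute force, O(n^2), so suitable for a small demo only.
    """
    if not 1 <= k < len(embeddings):
        raise ValueError("k must be between 1 and len(embeddings) - 1")
    scores = []
    for i, point in enumerate(embeddings):
        # Distances from this point to every other point, smallest first.
        dists = sorted(
            math.dist(point, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy 2-D "embeddings": a tight cluster plus one far-away point.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(toy_embeddings, k=2)
most_isolated = max(range(len(scores)), key=scores.__getitem__)
print(most_isolated)
```

With real CNN embeddings the same idea would typically be run with a vectorised or approximate nearest-neighbour search, and the highest-scoring images inspected by eye.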

Getting started

  1. Clone the repository:
git clone https://github.com/zbarry/pytorch-hcs.git
  2. Install the environment and the pytorch_hcs package:
cd pytorch-hcs
conda install -c conda-forge mamba -y
mamba env update

(Installing mamba is optional but recommended: it is a conda replacement with much faster dependency solving.)

This will create a pytorch-hcs environment and pip install the Python package in one go.

A fork of pybbbc will also be installed. We use this to download the BBBC021 dataset and access individual images and metadata.

  3. Acquire the BBBC021 dataset:

Either run notebooks/01_download_bbbc021.ipynb from top to bottom, or run the following in a Python session (with the pytorch-hcs environment activated):

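from pybbbc import BBBC021

# Download the raw BBBC021 images and metadata (many files, so this is slow).
BBBC021.download()
# Pre-process the raw download into the dataset that BBBC021() reads.
BBBC021.make_dataset(max_workers=2)

# Quick test: load the dataset and fetch its first item.
bbbc021 = BBBC021()
bbbc021[0]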

There are a lot of files to download. Plan on this process taking hours.

Project structure

Key dependencies

Python package

Reusable code modules are found in the pytorch_hcs package.

  • datasets.py - PyTorch dataset and PyTorch-Lightning DataModule for working with BBBC021 data.
  • models.py - PyTorch-Lightning modules wrapping CNN models.
  • transforms.py - image transforms for data augmentation.
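
Cell culture images generally have no preferred orientation, so flips and 90-degree rotations are a common label-preserving augmentation. The sketch below is an assumption about what such a transform might look like, not the contents of transforms.py; the name `random_flip_rotate` and the channels-first (C, H, W) layout are hypothetical choices, and NumPy stands in for tensors to keep the example small.

```python
import numpy as np


def random_flip_rotate(image, rng):
    """Randomly flip and rotate a channels-first (C, H, W) image.

    Hypothetical transform for illustration. Only the spatial axes are
    touched, so each fluorescence channel keeps its position and values.
    """
    if rng.random() < 0.5:
        image = np.flip(image, axis=2)  # horizontal flip (width axis)
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)  # vertical flip (height axis)
    quarter_turns = int(rng.integers(0, 4))  # 0, 90, 180 or 270 degrees
    image = np.rot90(image, k=quarter_turns, axes=(1, 2))
    # Flips and rotations return views; copy into a contiguous array.
    return np.ascontiguousarray(image)


rng = np.random.default_rng(seed=0)
image = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
augmented = random_flip_rotate(image, rng)
```

Because the transform only rearranges pixels, it can be applied on the fly during training without changing the MoA label.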

Notebooks

The notebooks in the notebooks/ folder orchestrate the modules from the Python package.

Available notebooks (in order of execution):

  1. 01_download_bbbc021.ipynb - download raw BBBC021 images and pre-process them using pybbbc.
  2. 02_bbbc021_visualization.ipynb - explore the BBBC021 dataset with an interactive visualization.
  3. 03_train_model.ipynb - train a CNN to predict MoA from BBBC021 images.
  4. 04_evaluate_model.ipynb - evaluate performance of trained CNN on test set.
  5. 05_visualize_embeddings.ipynb - produce image embeddings, UMAP them, visualize and find outliers.

Extras:

  • dataset_cleaning_visualization.ipynb - manually step through BBBC021 with a visualization to label images in the training and validation sets as "good" or "bad".
  • notebooks/analysis/umap_param_sweep.ipynb - sweep through UMAP parameterizations to assess impact on resulting embeddings.

Development

Install pre-commit hooks

The hooks clear notebook outputs and run code formatters on each commit.

pre-commit install

Ways to contribute

  • Decrease plate effects on embeddings (e.g., through adversarial learning).
  • Add hyperparameter sweep capability using Weights and Biases / improve model classification performance.
  • Log model test set evaluation results to W&B.
  • Make better use of W&B in general for tracking results.
  • Move BBBC021 dataset to ActiveLoop Hub to speed up download / dataset prep times.
  • Try out a k-fold cross validation strategy.
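
For the last item, a stratified k-fold split keeps the MoA class balance roughly equal across folds. The sketch below is a standard-library illustration under assumed names (`stratified_k_fold`, the toy labels); in practice a library splitter such as scikit-learn's `StratifiedKFold` would likely be used instead, and this sketch does nothing about plate effects.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_indices, val_indices) for each of k stratified folds.

    Hypothetical helper: the indices of each class are shuffled and dealt
    round-robin into folds so class proportions stay similar per fold.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for fold_number in range(k):
        val = sorted(folds[fold_number])
        train = sorted(
            index
            for other, fold in enumerate(folds)
            if other != fold_number
            for index in fold
        )
        yield train, val


# Toy labels: two MoA classes with six images each.
toy_labels = ["Eg5 inhibitor"] * 6 + ["Tubulin stabilizer"] * 6
splits = list(stratified_k_fold(toy_labels, k=3))
```

Each `(train, val)` pair could then drive one training run, with the validation metrics averaged across folds.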
