broadinstitute / profiling-resistance-mechanisms

Predicting pharmacodynamic responses to cancer drugs using cell morphology

License: BSD 3-Clause "New" or "Revised" License

Shell 0.01% Jupyter Notebook 72.29% R 0.84% Python 1.17% HTML 25.58% MATLAB 0.11%
morphology cell-painting machine-learning cancer pharmacodynamics resistance carpenter-lab

profiling-resistance-mechanisms's Introduction

Discovering Morphological Markers of Drug Resistance

In this repository we analyze Cell Painting data generated from multiple cell line clones that were resistant or sensitive to bortezomib.

Citation

Kelley ME, Berman AY, Stirling DR, Cimini BA, Han Y, Singh S, Carpenter AE, Kapoor TM, Way GP. High-content microscopy reveals a morphological signature of bortezomib resistance. (2023) eLife; 12:e91362. DOI: https://doi.org/10.7554/eLife.91362.

Data collection and processing

We cultured a colon cancer cell line (HCT116), treated it with a proteasome inhibitor (Bortezomib), and selected two resistant clones. We applied Cell Painting to these cell lines (in triplicate) under four conditions (DMSO, 0.7 nM, 7 nM, and 70 nM Bortezomib).

The Cell Painting assay captures several cellular morphology features (described in more detail here). Our hypothesis was that morphological features could distinguish wildtype from resistant clones.

We processed the Cell Painting data using CellProfiler. We use CellProfiler to perform quality control, segment images to extract nuclei, and measure the features captured by Cell Painting.

This repository contains all image analysis pipelines and image-based profiling pipelines (see 0.generate-profiles).

Pilot analyses

This repository ingests the processed Cell Painting data and performs several downstream analyses.

Using the triplicate measurements, and two batches, we perform the following pilot analyses:

  • Obtain similarity matrices for each batch independently and combined; perform hierarchical clustering; visualize heatmaps.
    • These analyses were performed using the Morpheus WebApp
    • An outline of the results can be viewed here.
  • Apply UMAP to the batched data to observe large differences across variables
  • Apply t-tests to determine cell morphology differences between conditions:
    • We test for differences between resistant clones at two doses of Bortezomib (0.7 nM and 7 nM)
    • We also test for differences between wildtype and resistant clones at a low dose of Bortezomib (0.7 nM)
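
The per-feature comparison in the last step can be sketched as follows. This is a minimal illustration, not the repository's implementation: the source only says "t-tests", so the choice of Welch's unequal-variance variant is an assumption, and the feature names and values are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom for two samples."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Squared standard errors of each group mean
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical triplicate values for two morphology features (not real data)
profiles = {
    "Cells_AreaShape_Area": {"wildtype": [1.0, 1.2, 0.9], "resistant": [2.1, 1.9, 2.3]},
    "Nuclei_Intensity_Mean": {"wildtype": [0.5, 0.4, 0.6], "resistant": [0.5, 0.6, 0.4]},
}

results = {
    feature: welch_t(groups["resistant"], groups["wildtype"])
    for feature, groups in profiles.items()
}

# Rank features by the magnitude of the t statistic
ranked = sorted(results, key=lambda f: abs(results[f][0]), reverse=True)
for feature in ranked:
    t, df = results[feature]
    print(f"{feature}: t = {t:.2f}, df = {df:.1f}")
```

With only three replicates per condition, such statistics are noisy, so the ranking is best read as a screen for candidate features rather than as a final result.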

UMAP Batch Analysis

UMAP

T-test to Determine Morphological Differences

ttest

Reproducibility

We use conda to manage package versions. After installing conda, obtain all required packages:

conda env create --force --file environment.yml

# Activate environment
conda activate resistance-mechanisms

Clone the GitHub repository. First, generate and enable SSH keys if you haven't already.

# Then clone and enter repo
git clone git@github.com:broadinstitute/profiling-resistance-mechanisms
cd profiling-resistance-mechanisms

All analyses are presented in analysis.sh. To reproduce, perform the following:

./analysis.sh

Bug Reporting

Please file an issue with any questions or bug reports.

Internal documents

GDrive folder

profiling-resistance-mechanisms's People

Contributors

davidstirling, gwaybio, mekelley, shntnu, yhan8

profiling-resistance-mechanisms's Issues

Duplicate Well Info

Warning message in profiling audit:

"WARNING! Duplicate well information detected in: Batch: 2019_06_25_Batch3 Plate: MutClones"

Combining plates in Batch 9 and 10

The data are described in more detail in #89. This issue is to discuss the results and interpretations of #90

Update 11/16/20

We discussed this plate layout in detail, and determined that the data are still usable. Our primary question is whether or not we can distinguish resistant clones from sensitive clones. We do not need drug treatment and we only need the DMSO-only plates to determine this.

Critical Issue

Specifically, each plate contains only a single treatment (either DMSO or drug). This prevents us from performing our traditional per-plate normalization strategy, and might make it impossible for us to reliably combine data across plates.
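
A toy example shows why single-treatment plates are a problem for per-plate normalization. The sketch below assumes per-plate normalization means z-scoring each plate against its own mean and standard deviation; the feature values are hypothetical.

```python
import statistics


def zscore(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical values of one feature: the drug shifts it upward by 5 units
dmso_plate = [10.0, 11.0, 9.0, 10.5]   # plate containing only DMSO wells
drug_plate = [15.0, 16.0, 14.0, 15.5]  # plate containing only drug-treated wells

# Per-plate normalization centers each plate on its own mean
dmso_norm = zscore(dmso_plate)
drug_norm = zscore(drug_plate)

gap_before = statistics.fmean(drug_plate) - statistics.fmean(dmso_plate)
gap_after = statistics.fmean(drug_norm) - statistics.fmean(dmso_norm)
print(f"treatment gap before: {gap_before:.2f}, after per-plate z-scoring: {gap_after:.2f}")
```

Because each plate is centered on its own mean, the treatment shift is removed together with any plate effect, and the two cannot be told apart afterwards.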

Approach to Address

  1. Determine the extent of the problem
    • Combine all plates together within batch
    • Apply PCA retaining 20 components, and then apply an ANOVA using Treatment, Clone, and Plate (and an interaction term between treatment and plate) as factors
    • Visualize the ANOVA F scores per factor
  2. Attempt to solve the problem in a naive way
    • Concatenate "augmented" (aggregated, unnormalized) per-plate "matches" (meaning all the plates that have DMSO and compound treatments for the specific clones)
    • Apply whole-plate standardization
    • Apply the steps outlined in the first step above
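
The F-score comparison in step 1 can be sketched as below. This is a simplification under stated assumptions: it computes a separate one-way ANOVA F statistic per factor on a single, already-computed principal component, whereas the approach above fits Treatment, Clone, and Plate jointly with an interaction term. The scores and labels are hypothetical.

```python
import statistics
from collections import defaultdict


def one_way_f(values, labels):
    """One-way ANOVA F statistic for values grouped by labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    grand_mean = statistics.fmean(values)
    n_groups, n_total = len(groups), len(values)
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - statistics.fmean(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (n_groups - 1)) / (ss_within / (n_total - n_groups))


# Hypothetical scores on one principal component: (score, treatment, clone, plate)
wells = [
    (1.0, "DMSO", "WT", "P1"), (1.2, "drug", "WT", "P1"),
    (0.9, "DMSO", "BZ", "P1"), (1.1, "drug", "BZ", "P1"),
    (5.0, "DMSO", "WT", "P2"), (5.2, "drug", "WT", "P2"),
    (4.9, "DMSO", "BZ", "P2"), (5.1, "drug", "BZ", "P2"),
]

scores = [w[0] for w in wells]
f_scores = {
    factor: one_way_f(scores, [w[i] for w in wells])
    for i, factor in enumerate(["Treatment", "Clone", "Plate"], start=1)
}
for factor, f in f_scores.items():
    print(f"{factor}: F = {f:.2f}")
```

In this toy layout the plate factor dominates, which is the pattern that would indicate a plate effect.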

Results

normalized_pca_anova_batch_effects

  • Batch 7 and Batch 8 have good plate designs, Batch 9 and Batch 10 have suboptimal designs.
  • None of the four batches show plate effects! 🎉
  • However, in Batch 9 and Batch 10 the Metadata treatment factor has very little impact ☹️ - this is because normalization occurs per plate, and each plate only has one treatment (either DMSO or drug)

naive_correction_pca_anova_batch_effects

  • A naive attempt (described above) explodes the impact of batch.
  • Maybe this is ok? Perhaps we can regress out those components and move on?

cc'ing @shntnu @AnneCarpenter - Comments/suggestions welcome, if there are any. Can we salvage these data? (there are 14 plates...)

Comparing Multiple Clones to Clone A and E

Previously, @mekelley and @ayberman sequenced Clones A and E and observed PSMB5 mutations (see #49 (comment)).

One goal is to determine if any of the four resistant clones (BZ001, BZ002, BZ003, and BZ004) also harbors a PSMB5 mutation based on a signature of morphology. The initial approach was to see where clone A and clone E clustered in comparison to each of the four "groups" we previously observed in #25. However, those groups appear to be artifactual - see #51.

An additional wrinkle in comparing the six clones is that there appear to be batch effects between the batches analyzed. We do not observe a substantial batch effect between the BZ001, BZ002, BZ003, and BZ004 clones, which were measured across 3 different batches (which totaled 5 different plates) (see #48). We also do not observe substantial batch effects between the two different batches (2 total plates) that measured clone A and clone E (batch 1 and 2). See below, which was added in #52.

merged_umap_clone_ae

However, we do observe batch effects when combining the two groups of data (also added in #52):

clone_compare_batch_effect

Note that we can visually inspect for batch effects in this reduced dimension space (UMAP). It is not a great way to detect batch effects, but usually when we see batches cluster separately, especially when they both contain wild-type parental profiles, it hints towards a batch effect.

The current normalization procedure calculates z-scores for all samples. We can explore whether alternative normalization procedures overcome this batch effect.

Notes from Meeting 12/9/20

(I'm using this as a scratchpad for now)

This is the data used in the first part of the analysis presented

Metadata_batch         Metadata_Plate      Metadata_clone_number  Metadata_model_split  Metadata_clone_type  n
2019_03_20_Batch2      207106_exposure320  CloneA                 test                  resistant            1
2020_07_02_Batch8      218361              CloneA                 test                  resistant            2
2019_02_15_Batch1_20X  HCT116bortezomib    CloneE                 test                  resistant            1
2019_03_20_Batch2      207106_exposure320  CloneE                 test                  resistant            1
2020_07_02_Batch8      218361              CloneE                 test                  resistant            1
2020_07_02_Batch8      218361              WT_parental            test                  sensitive            3
2019_02_15_Batch1_20X  HCT116bortezomib    CloneA                 training              resistant            3
2019_03_20_Batch2      207106_exposure320  CloneA                 training              resistant            2
2020_07_02_Batch8      218361              CloneA                 training              resistant            3
2019_02_15_Batch1_20X  HCT116bortezomib    CloneE                 training              resistant            2
2019_03_20_Batch2      207106_exposure320  CloneE                 training              resistant            2
2020_07_02_Batch8      218361              CloneE                 training              resistant            4
2019_02_15_Batch1_20X  HCT116bortezomib    WT_parental            training              sensitive            3
2019_03_20_Batch2      207106_exposure320  WT_parental            training              sensitive            3
2020_07_02_Batch8      218361              WT_parental            training              sensitive            2
2020_07_02_Batch8      218360              CloneA                 validation            resistant            5
2020_07_02_Batch8      218360              CloneE                 validation            resistant            5
2020_07_02_Batch8      218360              WT_parental            validation            sensitive            5

Grouping further:

Metadata_model_split  Metadata_clone_type  Metadata_clone_number  n
test                  resistant            CloneA                 3
test                  resistant            CloneE                 3
test                  sensitive            WT_parental            3
training              resistant            CloneA                 8
training              resistant            CloneE                 8
training              sensitive            WT_parental            8
validation            resistant            CloneA                 5
validation            resistant            CloneE                 5
validation            sensitive            WT_parental            5

Grouping even further:

Metadata_model_split  Metadata_clone_type  n
training              resistant            16
training              sensitive            8
test                  resistant            6
test                  sensitive            3
validation            resistant            10
validation            sensitive            5

We are building a binary classifier of sorts (via regression-based feature selection + singscore) on the training set, which comprises n=16 resistant and n=8 sensitive samples.

Update Normalization (Batch 3)

@bethac07 created the profiles for batch 3 data by, essentially, following the standard profiling handbook protocol. The variable selection steps that were skipped (because the platemaps didn't permit them) are "Correlation Threshold" and "Variance Threshold".

The plates were also normalized on the full plate. We need to check if normalizing by the parental cell line improves reproducibility in replicate profiles.
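
Normalizing by the parental cell line can be sketched as follows. This is a minimal illustration assuming "normalizing by the parental line" means z-scoring every well against the mean and standard deviation of the WT_parental wells on the same plate; the helper name `normalize_to_controls` and the feature values are hypothetical.

```python
import statistics


def normalize_to_controls(plate, control_label="WT_parental"):
    """Z-score every well against the mean and sd of the control wells only."""
    control_values = [value for label, value in plate if label == control_label]
    center = statistics.fmean(control_values)
    scale = statistics.stdev(control_values)
    return [(label, (value - center) / scale) for label, value in plate]


# Hypothetical values of one feature on a single plate: (sample label, value)
plate = [
    ("WT_parental", 10.0), ("WT_parental", 12.0), ("WT_parental", 11.0),
    ("BZ001", 15.0), ("BZ001", 16.0),
]

normalized = normalize_to_controls(plate)
for label, value in normalized:
    print(f"{label}: {value:.2f}")
```

Unlike whole-plate normalization, the reference distribution here does not change with the mix of clones on the plate, so the resulting values read as distance from the parental line.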

Strange Well Assignments for Batch 3 Mutant Clones

I noticed this when examining potential plate artifacts.

It appears that in the file backend/2019_06_25_Batch3/WTClones/WTClones_normalized_variable_selected.csv there are two different samples measured in the same well.

Samples BZ009 and BZ010 are each measured three times. The wells indicated for BZ009 are ["B10", "C10", "D10"] while the wells indicated for BZ010 are ["B11", "C10", "D10"]. Going by these data, it looks like the two samples were measured in the same wells! We know that this is not the case, however. See below:

Plate for Cell Count

Screen Shot 2019-08-15 at 11 37 00 AM

Note that this data was extracted directly from the sqlite files.

Plate for Replicate Correlation

Screen Shot 2019-08-15 at 11 42 34 AM

My guess is that BZ010 should be occupying wells "C11" and "D11" in the figure above.

Recommendation

I am not sure we need to do anything at the moment, since we have the profiles already aggregated by well and this issue does not seem to have resulted in lost data. It is important to document though - maybe this persists in other projects? cc @bethac07 and @shntnu in case we've seen this before
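
A check for this kind of conflict can be sketched as below. The well assignments are the ones reported in this issue; the helper name `find_shared_wells` is hypothetical and not part of the profiling pipeline.

```python
from collections import defaultdict


def find_shared_wells(assignments):
    """Return wells that are assigned to more than one sample."""
    samples_by_well = defaultdict(set)
    for sample, wells in assignments.items():
        for well in wells:
            samples_by_well[well].add(sample)
    return {
        well: sorted(samples)
        for well, samples in samples_by_well.items()
        if len(samples) > 1
    }


# Well assignments as reported above
assignments = {
    "BZ009": ["B10", "C10", "D10"],
    "BZ010": ["B11", "C10", "D10"],
}

shared = find_shared_wells(assignments)
for well, samples in sorted(shared.items()):
    print(f"{well}: {', '.join(samples)}")
```

Running a check like this on the platemap metadata of other batches would show whether the problem persists elsewhere.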

Single Cell Analysis Results

In #77 we upgraded the single cell analysis to use data from batch 8 (data added in #75), and we use a multi-class classifier to predict clone A, clone E, and wildtype parental lines.

The results of this analysis were super interesting! Links to slides are below:

Preliminary analysis (#74) - https://docs.google.com/presentation/d/1_NwAVFJhqkA87duCBL5c2uxG9RlEj-qV4c5AFd4McWI/edit?usp=sharing
Batch 8 analysis (#77) - https://docs.google.com/presentation/d/13rOueMBlk8QbGx0G-Kl2qj42HVVShCbMF06PxXUfrv0/edit?usp=sharing

Primary Result

Many results and figures described in the batch 8 analysis slides relate to model performance and benchmarking model behavior. Below is the primary result to interpret.

wt_parental_dose

Predicting wildtype parental single cells treated with various doses of bortezomib. Real data model compared to a model trained with shuffled data. As the drug dose increases, the clones appear more wildtype in nature, and thus, more likely to be killed by bortezomib treatment. The wildtype lines are not impacted by bortezomib treatment, indicating that the initial model is isolating the core differences between resistant and non-resistant lines, and that these features are not directly associated with bortezomib resistance.

Cloning Details

As discussed at the meeting today, we were curious about the cloning details. We can use this issue to elaborate on the cloning procedure. We are specifically interested in knowing more details about how resistance was identified and selected for.

cc @mekelley @ayberman

Incorporate cytominer-eval

With an update to pycytominer@02ed6647a0913e9f0b28cbafa97766d55eeffd20, we can no longer rely on the audit function.

Traceback (most recent call last):
  File "generate-profiles.py", line 11, in <module>
    from scripts.profile_util import process_profile, load_config
  File "/home/ubuntu/efs/2018_05_30_ResistanceMechanisms_Kapoor/workspace/software/2018_05_30_ResistanceMechanisms_Kapoor/0.generate-profiles/scripts/profile_util.py", line 10, in <module>
    from pycytominer import (
ImportError: cannot import name 'audit' from 'pycytominer' (/home/ubuntu/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/pycytominer/__init__.py)

I need to incorporate cytominer-eval into the automated processing pipeline.

Bulk signature analysis

In #82, I performed an analysis of bulk (aggregated) signatures from two compiled datasets. #58 was an initial attempt at this analysis, but it used earlier (and lower quality) data.

I summarize the experiment immediately below, and then describe the results in more detail further below.

Summary

The Clone AE results are promising. The signature and method clearly work in both training and testing splits, and there appears to be some sort of dose response. What this dose response means biologically is unclear, but technically (in the resistant lines) it means that the signature features become less extreme in their ranking. This means that the absolute values of signature features are higher in the Wildtype_parental profiles.

The Four Clone signature applied to the Clone AE data is odd. The results (at least for the DMSO treated samples) are mostly outside the null, and the score is less extreme than the clone AE signature, but the sign is flipped! This could be the result of some programmatic anomaly in fitting linear models (I checked one possible cause and ruled it out), a metadata label mixup in the four clone dataset, or a method that isn't robust across batches.

The signatures applied to the four clone dataset (even the four clone signature) are less conclusive. I am not confident in these data in nearly the same way that I am about the Batch 8 profiles.

Next steps

Signature titration

The number of features in the signature is high. Since one goal of the project is to identify a smaller set of features to potentially use as a biomarker of drug resistance, I will perform a "signature titration" analysis in which I systematically add features (starting with the most significant), and quantify the average difference of test set TotalScore between sensitive and resistant clones. This approach will give us a way to select the minimal set of features required to separate the clone types.
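
The titration loop described above can be sketched as follows. This is a sketch under stated assumptions: the placeholder `score` function (mean of direction-adjusted feature values) stands in for singscore's TotalScore, and the feature names, directions, and values are hypothetical.

```python
import statistics


def score(sample, features):
    """Placeholder score: mean of direction-adjusted feature values (stand-in for TotalScore)."""
    return statistics.fmean([direction * sample[name] for name, direction in features])


def titrate(ranked_features, resistant, sensitive):
    """Add features one at a time (most significant first) and track class separation."""
    separation = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        mean_resistant = statistics.fmean([score(s, subset) for s in resistant])
        mean_sensitive = statistics.fmean([score(s, subset) for s in sensitive])
        separation.append((k, mean_resistant - mean_sensitive))
    return separation


# Hypothetical signature, ordered by significance: (feature, direction), +1 = up, -1 = down
ranked_features = [("f1", 1), ("f2", -1), ("f3", 1)]

# Hypothetical test set profiles
resistant = [{"f1": 2.0, "f2": -2.0, "f3": 0.1}, {"f1": 1.8, "f2": -1.6, "f3": -0.1}]
sensitive = [{"f1": 0.0, "f2": 0.2, "f3": 0.0}, {"f1": 0.2, "f2": -0.2, "f3": 0.0}]

curve = titrate(ranked_features, resistant, sensitive)
for k, gap in curve:
    print(f"top {k} features: mean score difference = {gap:.2f}")
```

The smallest k at which the difference stops improving is a natural candidate for the minimal signature; in this toy data the third, uninformative feature dilutes the separation.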

More data collection

We will work with the Rockefeller team to decide next steps in data collection. I see a few additional datasets we could collect (note that I have not yet processed batch 9 or 10):

  • Four plates (like in batch 8) except with Clone A/E, Wildtype parental, and wildtype clones
  • Four plates (like in batch 8) except with the four wildtype clones, four resistant clones, and wildtype parental
  • Redo batch 3 (lots of WT and resistant clones with more wildtype parental lines) with updated protocols

I would also like to double check the platemap metadata labels for batches 4, 5, 6, and 7

Data

Clone A/E

  • Batch 8 profiles (four plates)
  • Only DMSO treated samples
  • Cell lines:
    • Resistant: CloneA, CloneE
    • Polyclonal wildtype: WT_parental
    • n = 240
  • Feature selection:
    • Operations: variance_threshold, correlation_threshold, drop_na_columns, blocklist, drop_outliers
    • Correlation threshold: 0.95
    • Performed independently
    • p = 3538, after feature selection = 434
  • Caveat:
    • We're comparing WT_parental to two clones. The signature may include features representing clonal selection.

Four Clone

  • Batches 4, 5, 6, and 7 (seven plates)
  • Only DMSO treated samples
  • Cell lines:
    • Wildtype (sensitive): WT002, WT008, WT009, WT009
    • Resistant: BZ001, BZ008, BZ017, BZ018
    • Polyclonal wildtype: WT_parental
    • n = 420
  • Feature selection:
    • Operations: variance_threshold, correlation_threshold, drop_na_columns, blocklist, drop_outliers
    • Correlation threshold: 0.95
    • Performed independently
    • p = 3538, after feature selection = 338

Signature generation

Procedure

For each dataset, I perform the following procedure:

  1. I split the data into an 85% training and 15% testing set balanced by cell line. I built the signature using only the training set.
  2. Using the features that passed feature selection, fit a linear model using the following covariates:
    • Metadata_clone_type_indicator (resistant vs. sensitive)
    • Metadata_batch (four_clone only; cloneAE is one batch)
    • Metadata_Plate
    • Metadata_clone_number (clone id)
  3. Perform a TukeyHSD test to adjust p values for within feature multiple comparisons.
    • (e.g. in the cloneAE dataset and in the linear model, testing an individual feature using the plates covariate is actually 6 comparisons!)
  4. Adjust the TukeyHSD adjusted p values further by a Bonferroni correction.
  5. Select all features with a p value below this adjusted rate for the Metadata_clone_type_indicator covariate; this is the "PreSignature".
  6. Remove all features from the "PreSignature" with a p value below the adjusted rate for the Metadata_Plate and Metadata_batch covariates.
    • Effect: This reduces the impact of technical artifacts in the signature application
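
Steps 4 to 6 can be sketched as below. This is a minimal illustration, not the repository's code: it assumes the Bonferroni correction divides a significance level of 0.05 by the number of features tested (neither is stated in the source), and the feature names and p values are hypothetical. The function name `build_signature` is also an assumption.

```python
def build_signature(pvalues, alpha=0.05):
    """Select signature features from Tukey-adjusted p values.

    pvalues maps feature -> {covariate: adjusted p value}.
    Returns (pre_signature, signature).
    """
    # Assumed Bonferroni correction: divide alpha by the number of features tested
    threshold = alpha / len(pvalues)

    # Step 5: features significant for the resistant vs. sensitive covariate
    pre_signature = [
        feature
        for feature, p in pvalues.items()
        if p["Metadata_clone_type_indicator"] < threshold
    ]

    # Step 6: drop features that are also significant for technical covariates
    technical = ("Metadata_Plate", "Metadata_batch")
    signature = [
        feature
        for feature in pre_signature
        if not any(pvalues[feature].get(cov, 1.0) < threshold for cov in technical)
    ]
    return pre_signature, signature


# Hypothetical adjusted p values for four features
pvalues = {
    "feature_a": {"Metadata_clone_type_indicator": 0.0001, "Metadata_Plate": 0.5, "Metadata_batch": 0.9},
    "feature_b": {"Metadata_clone_type_indicator": 0.001, "Metadata_Plate": 0.0002, "Metadata_batch": 0.7},
    "feature_c": {"Metadata_clone_type_indicator": 0.2, "Metadata_Plate": 0.4, "Metadata_batch": 0.6},
    "feature_d": {"Metadata_clone_type_indicator": 0.012, "Metadata_Plate": 0.3},
}

pre_signature, signature = build_signature(pvalues)
print("PreSignature:", pre_signature)
print("Signature:", signature)
```

A missing covariate (such as Metadata_batch in the single-batch cloneAE dataset) is treated as non-significant, so it never removes a feature.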

Volcano plots

These plots visualize feature significance for each linear model covariate

Clone AE

bulk_tukey_volcano_cloneAE

Four Clone

bulk_tukey_volcano_four_clone

Result

The signatures contain many features, and we make a distinction between features "up" and features "down":

  • Four Clone: 76 features
    • Up: 39
    • Down: 37
  • CloneAE: 188 features
    • Up: 95
    • Down: 93

Apply signatures

Approach

Because we have two datasets and two signatures, I applied each signature to each dataset independently. I also apply each signature with 1,000 random permutations to define a null distribution.

Method

I use the singscore method (Foroutan et al.). This is a "single sample" method to detect signature enrichment. It is a relatively simple, rank-based approach bounded between -1 and 1, where a score of 1 means that the sample is enriched for signature features.
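
The idea of a bounded, rank-based score with a permutation null can be sketched as follows. This is a simplified illustration in the spirit of singscore, not the singscore package itself: tie handling and other details are omitted, the function names are hypothetical, and the profile is made up.

```python
import random
import statistics


def directional_score(ranks, feature_set, n_total):
    """Mean rank of the set, scaled by its theoretical min and max, centered at 0."""
    n_set = len(feature_set)
    raw = statistics.fmean([ranks[f] for f in feature_set]) / n_total
    s_min = (n_set + 1) / (2 * n_total)
    s_max = (2 * n_total - n_set + 1) / (2 * n_total)
    return (raw - s_min) / (s_max - s_min) - 0.5


def total_score(sample, up, down):
    """Sum of the up-set score and the down-set score (down uses reversed ranks)."""
    names = sorted(sample, key=sample.get)
    n = len(names)
    rank_up = {name: i + 1 for i, name in enumerate(names)}
    rank_down = {name: n - i for i, name in enumerate(names)}
    return directional_score(rank_up, up, n) + directional_score(rank_down, down, n)


def null_scores(sample, n_up, n_down, n_permutations=1000, seed=0):
    """Score random feature sets of the same sizes to build a null distribution."""
    rng = random.Random(seed)
    names = list(sample)
    scores = []
    for _ in range(n_permutations):
        drawn = rng.sample(names, n_up + n_down)
        scores.append(total_score(sample, drawn[:n_up], drawn[n_up:]))
    return scores


# Hypothetical profile with ten features whose values increase with the index
sample = {f"feature_{i}": float(i) for i in range(10)}
up = ["feature_8", "feature_9"]    # expected to be high in this sample
down = ["feature_0", "feature_1"]  # expected to be low in this sample

observed = total_score(sample, up, down)
null = null_scores(sample, len(up), len(down))
empirical_p = sum(s >= observed for s in null) / len(null)
print(f"observed score: {observed:.2f}, empirical p: {empirical_p:.3f}")
```

Because the score depends only on within-sample ranks, it can be computed for one profile at a time without reference to the other samples.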

Results

Comparison 1: Clone AE Dataset - Clone AE Signature

data_cloneAE_signature_cloneAE_apply_singscore

Comparison 2: Clone AE Dataset - Four Clone Signature

data_cloneAE_signature_four_clone_apply_singscore

Comparison 3: Four Clone Dataset - Clone AE Signature

data_four_clone_signature_cloneAE_apply_singscore

Comparison 4: Four Clone Dataset - Four Clone Signature

data_four_clone_signature_four_clone_apply_singscore

Extracting XY location of sites

Get the names of all channel 1 images of well B03 (and then download them)

parallel aws s3 ls s3://imaging-platform/projects/2018_05_30_ResistanceMechanisms_Kapoor/2019_11_19_Batch5/images/217755/20191119-TH-WTMUT-4h-0-7nMbz_B03_s{1}_w1 ::: 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 17

Use bftools to get the metadata

parallel ~/Downloads/bftools/showinf -nopix 20191119-TH-WTMUT-4h-0-7nMbz_B03_s{1}_w1* '>' site_{1}.txt ::: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17

Extract stage information

parallel grep stage site_{1}.txt '>' stage_{1}.txt ::: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17

Then format it using some SublimeText magic to make it a csv

site,x,y
2,29736.2,48083.54
3,31175.9,48083.64
1,28298.52,48083.46
4,29016.74,48802.78
5,29735.66,48802.68
7,28297.44,49522.04
6,30456.76,48802.82
14,30456.84,50241.42
15,28297.12,50960.68
17,31174.62,50960.62
12,29016.64,50241.36
13,29738.18,50241.36
11,31174.24,49522.04
10,30456.86,49522.08
8,29017.98,49522.16
9,29737.52,49522.08

library(tibble)   # provides tribble()
library(ggplot2)  # provides ggplot(), aes(), geom_text()

sites <- tribble(~site,~x,~y,
                 2,29736.2,48083.54,
                 3,31175.9,48083.64,
                 1,28298.52,48083.46,
                 4,29016.74,48802.78,
                 5,29735.66,48802.68,
                 7,28297.44,49522.04,
                 6,30456.76,48802.82,
                 14,30456.84,50241.42,
                 15,28297.12,50960.68,
                 17,31174.62,50960.62,
                 12,29016.64,50241.36,
                 13,29738.18,50241.36,
                 11,31174.24,49522.04,
                 10,30456.86,49522.08,
                 8,29017.98,49522.16,
                 9,29737.52,49522.08)

ggplot(sites, aes(x, y, label = site)) + geom_text()

image

Within Sub-Cluster Analyses

Based on the Morpheus heatmap (shown below) it appears there are four distinct sub-clusters, each with different correlational structure, but also each with representation from both wild-type and resistant clones.

batch3_morpheus

Experiment

  • Perform a t-test within each individual subcluster and see if the feature differences are the same

Making our images publicly-available

As we work towards publication, we also need to work towards making our data publicly available. The bulk profiles are already available, but it is a larger push to make our images public.

The group discussed this process today, and @shntnu recommended the Cell Painting Gallery.

In this issue, we can discuss:

  1. Selecting batches to upload (which do we include in the manuscript, some were pilot batches),
  2. Thawing data
  3. Moving to the gallery
  4. Adding fluorescence and brightfield images

Notes from Meeting 1/21/20

  • Data from merged batches 5, 6, and 7 looking good so far
    • Batch correction not necessary
    • There might be some minor well effects
      • We discussed randomizing platemaps to the extent possible
  • Morphology signature is interesting, and worth having a biologist dig a bit deeper into
    • Still some concern over confluence issues
      • While it doesn't seem to impact profiles, it is impacting explanatory morphology features that are typically only important in high confluence settings
  • There might be something additionally interesting about the way the clones were selected
    • We talked about how they were all selected from the same cluster of resistant clones
      • Can you check me on this @mekelley? It is likely I didn't truly appreciate this point and I think it is important. (if possible, please add this to a new issue similar to #45 )
      • We know which cluster each sample belongs to which will help with:
        • Sequence specific clusters to determine molecular resistance mechanism
        • Use cluster info in future covariate models to observe which features are important for cluster of origin
  • The Rockefeller team is actively onboarding new clones and a new proteasome inhibitor
    • They will present at next check-in
  • We also briefly discussed presentations at next CSHL
    • It will be analysis heavy and @AnneCarpenter will represent (I don't remember if we've discussed this elsewhere before!)

Experiment Design - New Drugs

@mekelley and @ayberman are getting ready to ramp up data collection. Specifically, we will collect Cell Painting data from 7 resistant clones and a wildtype parental line undergoing two drug treatments and a DMSO negative control. The drugs are ixazomib (proteasome inhibitor) and CB-5083 (p97 inhibitor). The cells will be treated for four hours.

Clarification Questions

  • Will we also collect Cell Painting profiles with different drug doses?
  • Will the 7 resistant clones be selected independently per drug? In other words, will there actually be 14 new resistant clones?
  • Do we need to include a wild-type clonal line?

Wildtype Clonal Line Discussion

In my opinion, we definitely should try to mirror the same clonal selection procedure for the wildtype parental line to acquire a wildtype clonal line. Our approach in identifying morphology features of resistance mechanisms will be skewed by clonal selection signal. In other words, we're likely to obscure the resistance signals by also isolating the clonal selection signals.

I view the wild-type parental line as a really great validation resource. We should see a lot of heterogeneity in these samples, and could validate some resistance signatures we find by comparing wildtype clones and other resistant clones.

If we use the same clonal selection procedure for selecting resistant cells to the new drugs, then I think using the same WT clones for batches 4-7 (see #40) is sufficient.

Running Jupyter Notebook on AWS

I am getting the following errors when trying to run a jupyter notebook with an R kernel on AWS. An example of the errors I am receiving is:

Error: package ‘base64enc’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

A description of this problem is provided here.

I will document the packages that I needed to install with the given conda environment here:

  • digest
  • base64enc
  • Rcpp
  • rlang>0.4
  • glue
  • pillar
  • tibble
  • purrr
  • tidyselect
  • dplyr
  • tidyr
  • plyr
  • mnormt
  • foreign
  • stringi
  • reshape2
  • colorspace
  • scales
  • lazyeval
  • haven
  • jsonlite
  • lubridate
  • readr
  • readxl
  • xml2
  • tidyverse
  • ggplot2
  • htmltools
  • backports
  • bit
  • bit64
  • RSQLite
  • dbplyr
  • utf8

Lots of broken packages. Jupyter works again after reinstalling.

Batch 9 and 10 Data Summary

We added Batch 9 (2020_08_24_Batch9) and Batch 10 (2020_09_09_Batch10) profiles in #85 and audited in #88. I describe them in more detail below:

Including the data described in #40, we now have a total of 1,713 profiles

treatment_count

Split by batch:

treatment_count_by_batch

Plate Contents

Batch 9 (2020_08_24_Batch9)

  • 6 plates
    • 218775
      • Treatment: CB-5083 (700 nM)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
    • 218774
      • Treatment: DMSO (0.1%)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
    • 218699
      • Treatment: Ixazomib (50 nM)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
    • 218698
      • Treatment: DMSO (0.1%)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
    • 218697
      • Treatment: CB-5083 (700 nM)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
    • 218696
      • Treatment: DMSO (0.1%)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13

Batch 10 (2020_09_09_Batch10)

  • 8 plates
    • 218852
      • Treatment: DMSO (0.1%)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
    • 218853
      • Treatment: CB-5083 (700 nM)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
    • 218854
      • Treatment: DMSO (0.1%)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
    • 218855
      • Treatment: Ixazomib (50 nM)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
    • 218856
      • Treatment: DMSO (0.1%)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
    • 218857
      • Treatment: CB-5083 (700 nM)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
    • 218858
      • Treatment: DMSO (0.1%)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
    • 218859
      • Treatment: Ixazomib (50 nM)
      • Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05

Audit

Below, I report Percent Strong for each plate. For a full report see the figures in https://github.com/broadinstitute/profiling-resistance-mechanisms/tree/7e5ee27021816012297c38d58088438eb7ad3a53/1.profiling-audit/figures

Plate   Batch               Perturbation      Percent Strong
218775  2020_08_24_Batch9   CB-5083 (700 nM)  65.33%
218774  2020_08_24_Batch9   DMSO (0.1%)       55.33%
218699  2020_08_24_Batch9   Ixazomib (50 nM)  87.33%
218698  2020_08_24_Batch9   DMSO (0.1%)       87.33%
218697  2020_08_24_Batch9   CB-5083 (700 nM)  75.33%
218696  2020_08_24_Batch9   DMSO (0.1%)       74.67%
218852  2020_09_09_Batch10  DMSO (0.1%)       60%
218853  2020_09_09_Batch10  CB-5083 (700 nM)  59.33%
218854  2020_09_09_Batch10  DMSO (0.1%)       52%
218855  2020_09_09_Batch10  Ixazomib (50 nM)  67.33%
218856  2020_09_09_Batch10  DMSO (0.1%)       40%
218857  2020_09_09_Batch10  CB-5083 (700 nM)  57.33%
218858  2020_09_09_Batch10  DMSO (0.1%)       71.33%
218859  2020_09_09_Batch10  Ixazomib (50 nM)  58.67%

Limitation

All plates in both batches include the same treatment in all wells. This prevents us from normalizing per plate. We will need to normalize all plates together and then adjust for batch effects.

Add License

The repo requires an open source license!

Correlate Mechanisms of Resistance with Morphology Profiles

@mekelley asked via an email thread what I am copying and pasting below (also noting here that I received permission to do so 😸)


Can identify the dominant morphological features that contributed to the 0.1% DMSO treated clones A and E from our earlier data (without the Costes features) and see if they’re the same dominant features from the 0.1% DMSO treated bortezomib resistant clones from group 3 (our recent data)?

What about dominant features from bortezomib resistant clones in the other groups from batch 3 (again without Costes features)?

The reason for these questions is to begin to correlate mechanisms of resistance (i.e. mutation in the target protein, PSMB5, or multidrug resistance or something else) with morphological profiles.

CellProfiler feature analysis

In a recent meeting, the team discussed a followup analysis for interrogating the signature features. We have already performed at least one feature analysis that defines all of the bz signature features (all features in the analysis) https://github.com/broadinstitute/profiling-resistance-mechanisms/blob/master/3.resistance-signature/5.visualize-signature-features.ipynb

Task 1: We'll also likely want to generate feature heatmaps (in a similar way that we define profile heatmaps) for only the bz signature features.

Task 2: The final feature analysis ranks all features on their ability to separate the two classes (resistant vs. sensitive). We'll select the top discriminative features based on their ranks in the holdout set, but we should also perform this ranking procedure using all sets (training, test, validation, etc.).

The three above efforts nicely narrow down our search space from 1) all features, to 2) bz features, to 3) top bz features.

We can use this issue to elaborate on our approaches to form the heatmaps, to perform the ranking, and to determine effective visualizations.
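As a starting point for the ranking in Task 2, here is a minimal sketch that ranks features by AUROC (the probability that a resistant sample has a higher value than a sensitive one). AUROC is an assumption chosen for illustration, not necessarily the metric the team will settle on, and the feature names and values below are made up.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(resistant: Sequence[float], sensitive: Sequence[float]) -> float:
    """Probability that a resistant value exceeds a sensitive value (ties count 0.5)."""
    wins = 0.0
    for r in resistant:
        for s in sensitive:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(resistant) * len(sensitive))


def rank_features(
    profiles: Dict[str, List[float]], labels: List[str]
) -> List[Tuple[str, float, float]]:
    """Sort features by |AUROC - 0.5|; 0.5 means no separation between classes."""
    ranked = []
    for feature, values in profiles.items():
        res = [v for v, lab in zip(values, labels) if lab == "resistant"]
        sen = [v for v, lab in zip(values, labels) if lab == "sensitive"]
        score = auroc(res, sen)
        ranked.append((feature, score, abs(score - 0.5)))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Toy data: hypothetical feature names and values, one value per sample.
labels = ["resistant", "resistant", "resistant", "sensitive", "sensitive", "sensitive"]
profiles = {
    "Cells_AreaShape_Area": [2.1, 1.8, 2.5, 0.3, 0.1, 0.4],
    "Nuclei_Intensity_Mean": [0.5, 0.2, 0.9, 0.4, 0.6, 0.3],
}
ranking = rank_features(profiles, labels)
for feature, score, separation in ranking:
    print(f"{feature}\tAUROC={score:.3f}\tseparation={separation:.3f}")
```

To rank within every data split, the same function could be called once per split (training, test, validation, holdout) and the resulting ranks compared.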

Create Combined Heatmap

Currently, the heatmaps are separated by batch. We need to combine heatmaps together, generate a new figure, and note in a ColSideColor the batch number.

Notes from Meeting 11/13/19

  • The confluence of cells appears to be a big issue
    • The current molecular pipeline is to seed at uniform density
    • If we generate additional plates, seeded at different densities, we can test which densities are best for specific clones
    • I will also add cell count to the subcluster morpheus heatmap (see #33)
  • Next steps:
    • @bethac07 and team will run image analysis pipeline on new data
    • @mekelley and Adi (we need to get Adi on github! 😄) will generate new data with different seeding concentrations (see above)
    • I will send out a list of the top 10-15 features (based on certain metrics) to biologists for interpretation (@bethac07, @mekelley, Adi, and @AnneCarpenter) (see #34)

Enhancement of 3.resistance-signature/0.training-test-split.ipynb

We dropped the inference set (batch 3) because of overly confluent plates and suboptimal plate design. In the notebook, I still need to output the bortezomib signature analytical set. I can also include the new batches of data, as included in pull request #114, which will serve as a better experimentally designed inference set.

What is the best way to select Profiling Variables?

In working through #1 and #8 I thought about how I would combine data from batches together. My two options seemed to be:

  1. Take feature union
  2. Take feature intersection

The problem with taking the feature intersection is that fewer features are selected. The problem with taking feature union is that some variables may not be "good" features.
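The trade-off between the two options can be seen with plain Python sets. This is only an illustration; the batch names and feature names below are hypothetical.

```python
# Hypothetical selected features per batch (names are made up for illustration).
batch_features = {
    "batch_1": {"Cells_AreaShape_Area", "Nuclei_Texture_Contrast", "Cytoplasm_Intensity_Mean"},
    "batch_2": {"Cells_AreaShape_Area", "Nuclei_Texture_Contrast", "Cells_Granularity_1"},
}

# Option 1: union keeps every feature selected in at least one batch.
feature_union = set.union(*batch_features.values())

# Option 2: intersection keeps only features selected in every batch.
feature_intersection = set.intersection(*batch_features.values())

# Features that the union adds but that failed selection in at least one batch.
only_in_some = feature_union - feature_intersection

print(len(feature_union), len(feature_intersection), sorted(only_in_some))
```

The `only_in_some` set is the group worth auditing: whether those features belong in the combined data depends on why they were dropped in the other batch.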

What is the method for cell painting variable selection? If the features are removed b/c of lack of consistency across replicates, then they should not be included in the feature union. However, if the features are removed because they are deemed redundant, then they should be included in the feature union.

@shntnu is there any way to track the decisions for feature selection?

Site-Level Profiles - Decision

We chatted today about site-level profile strategy during our checkin meeting. I'll summarize the decision and discuss implications below:

  • We decided to revert to well-level profiles
    • We observed smaller batch effects with well-level profiles than with site-level profiles
    • We don't currently know site layout so we can't adjust
    • We don't know if site-level profiles will actually provide any additional signal benefit
  • We will generate well-level signatures, and then apply them to single cell profiles
    • Doing this will help us to determine if the signature we find is specific to bortezomib resistance

Site level results discussed in #59, #60, #61, #63, and #69

Bulk profile normalization strategy

Goal

To build a bulk profile signature to distinguish wildtype clones from clone A/E resistant clones.

Challenges

  • No single plate exists to facilitate this analysis, which requires us to combine plates. The current normalization strategy prohibits us from merging plates without difficulty
  • Batches 4 - 7 have behaved poorly in the past. It is unclear why.

Requirement

  • We need the wildtype clones and clone A/E resistant clones to be aligned

Approach

Simple

Plate-based normalization with wildtype parental lines only, and then merge batch 4, 5, 6, 7, and 8.
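A minimal sketch of the simple approach, assuming "plate-based normalization with wildtype parental lines" means centering and scaling each plate's features by the mean and standard deviation of that plate's wildtype parental wells. The function name, plate names, and values are hypothetical.

```python
import numpy as np


def normalize_to_controls(features, is_control):
    """Center and scale every feature using only the control wells of one plate."""
    features = np.asarray(features, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    center = features[is_control].mean(axis=0)
    scale = features[is_control].std(axis=0, ddof=1)
    scale[scale == 0] = 1.0  # avoid dividing by zero for constant features
    return (features - center) / scale


# Toy plates: rows are wells, columns are features (values are made up).
# The boolean mask marks the wildtype parental wells on each plate.
plates = {
    "plate_batch4": (
        np.array([[1.0, 10.0], [3.0, 14.0], [5.0, 20.0], [2.0, 12.0]]),
        np.array([True, True, False, True]),
    ),
    "plate_batch8": (
        np.array([[11.0, 1.0], [13.0, 3.0], [19.0, 9.0]]),
        np.array([True, True, False]),
    ),
}

# Normalize each plate independently, then stack the plates into one matrix.
normalized = {name: normalize_to_controls(x, ctrl) for name, (x, ctrl) in plates.items()}
merged = np.vstack(list(normalized.values()))
print(merged.shape)
```

After this step the wildtype parental wells on every plate have mean 0 and unit standard deviation per feature, which is what makes the plates comparable before merging.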

2-Step-Sphering

W_8 -> sphering transform learned on D_8

D_4
D_5
D_7
D_8

W_8 * D_8 -> has a unit covariance matrix

W_4 -> sphering transform learned on W_8 * D_4 (and not D_4)

Corrected data:
D'_8 <- W_8 * D_8
D'_4 <- W_4 * W_8 * D_4
D'_5 <- W_5 * W_8 * D_5
D'_7 <- W_7 * W_8 * D_7

All data
A_4
A_5
A_7
A_8

A'_4 <- W_4 * W_8 * A_4
A'_5
A'_7
A'_8 <- W_8 * A_8
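
The two-step scheme above can be sketched with NumPy as follows. This assumes ZCA whitening as the sphering transform, mean-centering before each transform, and rows as samples (so the transform is applied as `X @ W` rather than `W * D`); none of these choices are confirmed by the notes, and the toy data are random.

```python
import numpy as np


def fit_sphering(X, eps=1e-6):
    """Return (mean, W) so that (X - mean) @ W has approximately identity covariance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA whitening matrix: symmetric inverse square root of the covariance.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, W


def apply_sphering(X, mean, W):
    return (np.asarray(X, dtype=float) - mean) @ W


rng = np.random.default_rng(0)
mix_8 = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
mix_4 = np.array([[1.0, 0.0, 0.8], [0.2, 1.5, 0.0], [0.0, 0.4, 1.0]])
D_8 = rng.normal(size=(200, 3)) @ mix_8          # reference subset of batch 8
D_4 = rng.normal(size=(200, 3)) @ mix_4 + 3.0    # reference subset of batch 4
A_4 = rng.normal(size=(50, 3)) @ mix_4 + 3.0     # all batch 4 data

# Step 1: learn W_8 on D_8 and apply it to every batch.
mean_8, W_8 = fit_sphering(D_8)
D8_corrected = apply_sphering(D_8, mean_8, W_8)
D4_step1 = apply_sphering(D_4, mean_8, W_8)

# Step 2: learn W_4 on the W_8-transformed D_4 (not on raw D_4).
mean_4, W_4 = fit_sphering(D4_step1)
D4_corrected = apply_sphering(D4_step1, mean_4, W_4)

# Apply the same chain of transforms to all batch 4 data.
A4_corrected = apply_sphering(apply_sphering(A_4, mean_8, W_8), mean_4, W_4)
print(np.round(np.cov(D4_corrected, rowvar=False), 3))
```

Keeping the fitted `(mean, W)` pairs per batch is what would allow the same correction to be replayed on future data, which relates directly to the concern below.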

Concern

One concern is that with such a complicated normalization strategy, applying this method to any future data will be extremely challenging.

Remaining analyses before writing the paper

A note from Adi on the remaining analyses required for a paper:

  • Building the bortezomib sensitivity signature
    • Cell line selection and clone generation
    • Any quality control analysis that went into this
    • Generating the Bz signature (WT clones 1-5, BZ clones A, E, 1-5)
    • Applying the Bz signature to other WT (10, 12-15) and Bz-resistant (6-10) clones to see that it applies well to clones the signature wasn't trained on.
  • Applying the Bz signature to Ixazomib resistant clones
    • expectation: signature should pick up Ixazomib resistant clones somewhat, but not be a perfect fit
  • Applying the Bz signature to CB-5083 resistant clones
    • expectation: signature should pick up CB-5083 resistant clones somewhat, but fit should be similar or worse than Ixazomib

Investigate Bortezomib signature in the context of other proteasomal inhibitors

We've received some insightful feedback from reviewers on our paper. Here, we will address two major comments:

Reviewer 2 - Major Comment 6.
Interestingly, the Bortezomib signature is specific to the drug and not a broad range of proteasomal inhibitors. However, seeing the common features between all the proteasomal inhibitors would be interesting.

Reviewer 3 - Major Comment 4
There was some predictive ability of the Bortezomib Signature for ixazomib resistance. Were there some features that were correlated with IX-resistance, i.e. UPS pathway, versus specific to bortezomib? Do the features suggest anything about resistance mechanisms or is the feature set too abstruse to interpret?

Next steps: @yhan8 will outline the analysis that's needed, seeking inputs from @gwaybio and @shntnu as needed, and then perform the analysis.

Notes from Meeting 10/16

  • Writing progress report
    • Megan and Tarun will send out for comments on final version
    • No figures necessary
    • Beth will add language around wild-type clonal vs. resistance signatures (4 subclusters in heatmap)
  • Mixture experiment
    • If we titrate wild-type and resistant clones into the same dish, can we distinguish?
    • Will help to set baseline of models predicting one from the other
    • Add in some ground truth fluorescent marker
    • We can simulate this experiment in silico first
  • Collecting more data
    • More bortezomib plates (with a mixture of wild-type and resistant clones on the same plate)
    • Low dose treatments to reduce dead cell effect and determine when we can start to isolate resistant populations
  • Focus on Proteasome inhibitors
    • Collect Cell Painting using other proteasome inhibitors (one that just failed clinical trials, other that has been approved)
    • Answer question if resistance signature is the same across inhibitors? Is there a universal signature that represents proteasome inhibition resistance? What proportion of untreated cells have these conditions?
  • Big Picture: Can we characterize all common resistance signatures via Cell Painting?
    • There is a paper in Cancer Cell that talks about them (there are surprisingly few core resistance mechanisms). @mekelley, can you add this paper here? I couldn't seem to find it.
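
The in silico version of the mixture experiment mentioned in the notes above could be sketched as follows. This assumes single-cell profiles are available as NumPy arrays (rows are cells, columns are features) and that a mixed well can be approximated by sampling cells with replacement from each population; the function name and toy data are hypothetical.

```python
import numpy as np


def simulate_mixed_well(wildtype_cells, resistant_cells, resistant_fraction, n_cells, rng):
    """Sample cells with replacement to mimic a well with a known resistant fraction."""
    n_resistant = int(round(resistant_fraction * n_cells))
    n_wildtype = n_cells - n_resistant
    wt_idx = rng.integers(0, len(wildtype_cells), size=n_wildtype)
    res_idx = rng.integers(0, len(resistant_cells), size=n_resistant)
    cells = np.vstack([wildtype_cells[wt_idx], resistant_cells[res_idx]])
    # Ground-truth labels play the role of the fluorescent marker: 0 = wild-type, 1 = resistant.
    labels = np.array([0] * n_wildtype + [1] * n_resistant)
    return cells, labels


rng = np.random.default_rng(1)
wildtype_cells = rng.normal(loc=0.0, scale=1.0, size=(500, 4))   # toy single-cell profiles
resistant_cells = rng.normal(loc=1.0, scale=1.0, size=(500, 4))  # toy single-cell profiles

cells, labels = simulate_mixed_well(wildtype_cells, resistant_cells, 0.25, 200, rng)
well_profile = np.median(cells, axis=0)  # aggregate to a well-level profile
print(cells.shape, int(labels.sum()), np.round(well_profile, 2))
```

Sweeping `resistant_fraction` from 0 to 1 would give a titration series for checking at what mixing ratio a model can still detect the resistant population.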

Add Cell Count to SubCluster Morpheus Heatmap

In #25, I performed a subcluster analysis to identify differential features across mutant and wild-type clones in each cluster independently.

There is one large red cluster (the "big red cluster") in the heatmap. We noticed that random example images from this block had high confluence (see below).

To more systematically address this question, I will add cell count to morpheus heatmaps.

Cells in Big Red Cluster

WT Clones in Well C02

image

Mut Clones in Well B06

image

WT Parental Clones in D11

image

Cells not in big red cluster

Mut Clones in D09

image

WT Clones in E05

image

UMAP bug

In step 2, I ran into a new bug:

[NbConvertApp] Converting notebook 1.merge-datasets-gct.ipynb to html
[NbConvertApp] Writing 600777 bytes to scripts/html/1.merge-datasets-gct.html
[NbConvertApp] Converting notebook 2.umap-aggregate-profiles.ipynb to html
Traceback (most recent call last):
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/bin/jupyter-nbconvert", line 11, in <module>
    sys.exit(main())
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/jupyter_core/application.py", line 254, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/traitlets/config/application.py", line 845, in launch_instance
    app.start()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/nbconvertapp.py", line 350, in start
    self.convert_notebooks()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/nbconvertapp.py", line 524, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/nbconvertapp.py", line 489, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/nbconvertapp.py", line 418, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/exporter.py", line 181, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/exporter.py", line 199, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/html.py", line 119, in from_notebook_node
    return super().from_notebook_node(nb, resources, **kw)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 369, in from_notebook_node
    nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/exporter.py", line 143, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/exporter.py", line 318, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 79, in preprocess
    self.execute()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/util.py", line 74, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/util.py", line 53, in just_run
    return loop.run_until_complete(coro)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/client.py", line 554, in async_execute
    cell, index, execution_count=self.code_cells_executed + 1
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 123, in async_execute_cell
    cell, resources = self.preprocess_cell(cell, self.resources, cell_index)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 146, in preprocess_cell
    cell = run_sync(NotebookClient.async_execute_cell)(self, cell, index, store_history=self.store_history)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/util.py", line 74, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/util.py", line 53, in just_run
    return loop.run_until_complete(coro)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nest_asyncio.py", line 70, in run_until_complete
    return f.result()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/asyncio/futures.py", line 178, in result
    raise self._exception
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/asyncio/tasks.py", line 223, in __step
    result = coro.send(None)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/client.py", line 857, in async_execute_cell
    self._check_raise_for_error(cell, exec_reply)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/client.py", line 760, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
import os
import numpy as np
import pandas as pd
import umap

import plotnine as gg

from pycytominer import feature_select
from pycytominer.cyto_utils import infer_cp_features
------------------

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/errors.py in new_error_context(fmt_, *args, **kwargs)
    743     try:
--> 744         yield
    745     except NumbaError as e:

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_block(self, block)
    229                                    loc=self.loc, errcls_=defaulterrcls):
--> 230                 self.lower_inst(inst)
    231         self.post_block(block)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_inst(self, inst)
    327             val = self.lower_assign(ty, inst)
--> 328             self.storevar(val, inst.target.name)
    329

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in storevar(self, value, name)
   1277                                                           name=name)
-> 1278             raise AssertionError(msg)
   1279

AssertionError: Storing i64 to ptr of i32 ('dim'). FE type int32

During handling of the above exception, another exception occurred:

LoweringError                             Traceback (most recent call last)
<ipython-input-1-886110eebc29> in <module>
      2 import numpy as np
      3 import pandas as pd
----> 4 import umap
      5
      6 import plotnine as gg

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/__init__.py in <module>
----> 1 from .umap_ import UMAP
      2
      3 # Workaround: https://github.com/numba/numba/issues/3341
      4 import numba
      5

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/umap_.py in <module>
     52 from umap.spectral import spectral_layout
     53 from umap.utils import deheap_sort, submatrix
---> 54 from umap.layouts import (
     55     optimize_layout_euclidean,
     56     optimize_layout_generic,

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py in <module>
     34         "result": numba.types.float32,
     35         "diff": numba.types.float32,
---> 36         "dim": numba.types.int32,
     37     },
     38 )

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/decorators.py in wrapper(func)
    219             with typeinfer.register_dispatcher(disp):
    220                 for sig in sigs:
--> 221                     disp.compile(sig)
    222                 disp.disable_compile()
    223         return disp

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, sig)
    907                 with ev.trigger_event("numba:compile", data=ev_details):
    908                     try:
--> 909                         cres = self._compiler.compile(args, return_type)
    910                     except errors.ForceLiteralArg as e:
    911                         def folded(args, kws):

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, args, return_type)
     77
     78     def compile(self, args, return_type):
---> 79         status, retval = self._compile_cached(args, return_type)
     80         if status:
     81             return retval

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_cached(self, args, return_type)
     91
     92         try:
---> 93             retval = self._compile_core(args, return_type)
     94         except errors.TypingError as e:
     95             self._failed_cache[key] = e

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_core(self, args, return_type)
    109                                       args=args, return_type=return_type,
    110                                       flags=flags, locals=self.locals,
--> 111                                       pipeline_class=self.pipeline_class)
    112         # Check typing error if object mode is used
    113         if cres.typing_error is not None and not flags.enable_pyobject:

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    604     pipeline = pipeline_class(typingctx, targetctx, library,
    605                               args, return_type, flags, locals)
--> 606     return pipeline.compile_extra(func)
    607
    608

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(self, func)
    351         self.state.lifted = ()
    352         self.state.lifted_from = None
--> 353         return self._compile_bytecode()
    354
    355     def compile_ir(self, func_ir, lifted=(), lifted_from=None):

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in _compile_bytecode(self)
    413         """
    414         assert self.state.func_ir is None
--> 415         return self._compile_core()
    416
    417     def _compile_ir(self):

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in _compile_core(self)
    393                 self.state.status.fail_reason = e
    394                 if is_final_pipeline:
--> 395                     raise e
    396         else:
    397             raise CompilerError("All available pipelines exhausted")

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in _compile_core(self)
    384             res = None
    385             try:
--> 386                 pm.run(self.state)
    387                 if self.state.cr is not None:
    388                     break

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_machinery.py in run(self, state)
    337                     (self.pipeline_name, pass_desc)
    338                 patched_exception = self._patch_error(msg, e)
--> 339                 raise patched_exception
    340
    341     def dependency_analysis(self):

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_machinery.py in run(self, state)
    328                 pass_inst = _pass_registry.get(pss).pass_inst
    329                 if isinstance(pass_inst, CompilerPass):
--> 330                     self._runPass(idx, pass_inst, state)
    331                 else:
    332                     raise BaseException("Legacy pass in use")

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     33         def _acquire_compile_lock(*args, **kwargs):
     34             with self:
---> 35                 return func(*args, **kwargs)
     36         return _acquire_compile_lock
     37

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_machinery.py in _runPass(self, index, pss, internal_state)
    287             mutated |= check(pss.run_initialization, internal_state)
    288         with SimpleTimer() as pass_time:
--> 289             mutated |= check(pss.run_pass, internal_state)
    290         with SimpleTimer() as finalize_time:
    291             mutated |= check(pss.run_finalizer, internal_state)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_machinery.py in check(func, compiler_state)
    260
    261         def check(func, compiler_state):
--> 262             mangled = func(compiler_state)
    263             if mangled not in (True, False):
    264                 msg = ("CompilerPass implementations should return True/False. "

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/typed_passes.py in run_pass(self, state)
    461
    462         # TODO: Pull this out into the pipeline
--> 463         NativeLowering().run_pass(state)
    464         lowered = state['cr']
    465         signature = typing.signature(state.return_type, *state.args)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/typed_passes.py in run_pass(self, state)
    382                 lower = lowering.Lower(targetctx, library, fndesc, interp,
    383                                        metadata=metadata)
--> 384                 lower.lower()
    385                 if not flags.no_cpython_wrapper:
    386                     lower.create_cpython_wrapper(flags.release_gil)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower(self)
    134         if self.generator_info is None:
    135             self.genlower = None
--> 136             self.lower_normal_function(self.fndesc)
    137         else:
    138             self.genlower = self.GeneratorLower(self)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_normal_function(self, fndesc)
    188         # Init argument values
    189         self.extract_function_arguments()
--> 190         entry_block_tail = self.lower_function_body()
    191
    192         # Close tail of entry block

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_function_body(self)
    214             bb = self.blkmap[offset]
    215             self.builder.position_at_end(bb)
--> 216             self.lower_block(block)
    217         self.post_lower()
    218         return entry_block_tail

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_block(self, block)
    228             with new_error_context('lowering "{inst}" at {loc}', inst=inst,
    229                                    loc=self.loc, errcls_=defaulterrcls):
--> 230                 self.lower_inst(inst)
    231         self.post_block(block)
    232

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/contextlib.py in __exit__(self, type, value, traceback)
    128                 value = type()
    129             try:
--> 130                 self.gen.throw(type, value, traceback)
    131             except StopIteration as exc:
    132                 # Suppress StopIteration *unless* it's the same exception that

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/errors.py in new_error_context(fmt_, *args, **kwargs)
    749         newerr = errcls(e).add_context(_format_msg(fmt_, args, kwargs))
    750         tb = sys.exc_info()[2] if numba.core.config.FULL_TRACEBACKS else None
--> 751         raise newerr.with_traceback(tb)
    752
    753

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../../../../../../miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py", line 52:
def rdist(x, y):
    <source elided>
    result = 0.0
    dim = x.shape[0]
    ^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=<built-in function getitem>)" at /Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py (52)
LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../../../../../../miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py", line 52:
def rdist(x, y):
    <source elided>
    result = 0.0
    dim = x.shape[0]
    ^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=<built-in function getitem>)" at /Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py (52)
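
The traceback shows `import umap` failing while numba compiles `rdist` in `umap/layouts.py`, where `dim` is declared as `int32` but receives a 64-bit value. One plausible cause, which is an assumption and not confirmed in this issue, is a mismatch between the installed numba and umap-learn versions. A small sketch for recording the installed versions before changing the environment:

```python
import sys

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7 (as in the traceback) needs the importlib_metadata backport.
    from importlib_metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distributions involved in the failing import chain.
report = installed_versions(["numba", "llvmlite", "umap-learn"])
print("python", sys.version.split()[0])
for name, ver in report.items():
    print(name, ver)
```

Recording these versions in the issue makes it easier to pin a known-good combination in the conda environment file afterwards.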

Notes from Meeting 1/8/20

Documenting notes here that I previously transcribed

  • Cloning procedure was clarified in #45
  • Next steps computationally
    • Look at features that differentiate resistant and wild-type
      • Are these features consistent in different batches?
    • Figure out the best way to merge datasets together
  • Next steps molecularly
    • Collect cell painting data in a new cell line (this will require additional troubleshooting to onboard new cell line)
      • ~2 to 4 months for new cell line
        • Broad may help with acquisition?
    • Collect resistance mechanisms from new drug
      • ~2 months for another drug
  • Potential high-level outcomes:
    • We find a clear resistance signature that helps form hypothesis
    • The resistance signature is a mystery and we'll need to figure out next steps

Signature Analysis - PSMB5 and generic resistance signatures

In #57 I add a signature analysis. The purpose of this analysis is to identify morphology features that are significantly different between wildtype and resistant clones. The next step is to apply the signatures to other profiles to 1) validate the approach and 2) predict the resistance status of different samples.

This analysis was prompted by #49 .

I will describe the approach, results, and conclusions in this issue.

Site Level Audit - Results

In #60 I add code and results for performing an analysis audit (determining profile reproducibility and plate effects). I generate site level profiles in #59 for a subset of batches (batch 1 20x, batch 2, batch 5, batch 6, and batch 7). I show results below:

Batch 1 - 20X

Cell Counts

2019_02_15_Batch1_20XHCT116bortezomib_plate_effects_cell_count_by_site

Replicate Reproducibility

2019_02_15_Batch1_20X_HCT116bortezomib_site_correlation

Batch 2

Cell Counts

2019_03_20_Batch2207106_exposure320_plate_effects_cell_count_by_site

Replicate Reproducibility

2019_03_20_Batch2_207106_exposure320_site_correlation

Batch 5

Cell Counts

2019_11_19_Batch5217755_plate_effects_cell_count_by_site

Replicate Reproducibility

2019_11_19_Batch5_217755_site_correlation

Batch 6

Cell Counts - Plate 217760

2019_11_20_Batch6217760_plate_effects_cell_count_by_site

Replicate Reproducibility - Plate 217760

2019_11_20_Batch6_217760_site_correlation

Cell Counts - Plate 217762

2019_11_20_Batch6217762_plate_effects_cell_count_by_site

Replicate Reproducibility - Plate 217762

2019_11_20_Batch6217762_plate_effects_cell_count_by_site

Batch 7

Cell Counts - Plate 217766

2019_11_22_Batch7217766_plate_effects_cell_count

Replicate Reproducibility - Plate 217766

2019_11_22_Batch7_217766_site_correlation

Cell Counts - Plate 217768

2019_11_22_Batch7217768_plate_effects_cell_count_by_site

Replicate Reproducibility - Plate 217768

2019_11_22_Batch7_217768_site_correlation

Summary

There are many cells in nearly every site (some sites are completely missing) and the impact of plate effects appears minimal. I will proceed with analyzing these data downstream using site-level profiles.

Document results of profile reproducibility at 40x vs 20x

@bethac07 reported this:

I've attached the PNGs of the clustering (and Shantanu, for you the GCTs)- on the left inserted here in the email we have the clustering from the 20X images, and on the right we have the 40x images. On the whole, they're actually remarkably similar, and pretty sensible from what we know about the biology- the biggest supercluster at the top left for both are the untreated and the low dose wells of all three cell clones, the supercluster at the other end are the higher doses of drug (and for WT, the intermediate dose), and for the resistant clones the middle dose produces an intermediate phenotype. Overall, I'm pretty encouraged by this- it means we're getting solid profiles (congrats guys!), and to me this doesn't suggest there is a "wrong" choice with respect to 20 vs 40x. There certainly seems to be more substructure in the 20X, but I'm not entirely certain how much we should really read into that.

20x:
20XClustering

40x:
40XClustering

20x GCT (rename to .gct)
2019_02_15_Batch1_20X_collapsed.gct.txt

40x GCT (rename to .gct)
2019_02_15_Batch1_40X_collapsed.gct.txt

Since this is the only project where we collected 20x vs 40x data, it will be very useful to quantify this result.

@gwaygenomics Any ideas on how to do this? I'd imagine a one-number summary will be hard, since it probably makes sense to report across all doses.
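
One possible way to quantify the agreement, offered as a sketch and not as an analysis that was performed: build a sample-by-sample correlation matrix at each magnification (wells in the same order) and correlate the upper triangles of the two matrices. Comparing similarity matrices avoids requiring identical feature sets at 20x and 40x. The function name is hypothetical and the toy profiles are random.

```python
import numpy as np


def similarity_agreement(profiles_a, profiles_b):
    """Pearson correlation between the upper triangles of two well-by-well correlation matrices."""
    sim_a = np.corrcoef(np.asarray(profiles_a, dtype=float))
    sim_b = np.corrcoef(np.asarray(profiles_b, dtype=float))
    upper = np.triu_indices_from(sim_a, k=1)  # unique well pairs, diagonal excluded
    return float(np.corrcoef(sim_a[upper], sim_b[upper])[0, 1])


rng = np.random.default_rng(2)
base = rng.normal(size=(6, 20))  # toy data: 6 wells, 20 features
profiles_20x = base + rng.normal(scale=0.05, size=base.shape)
profiles_40x = base + rng.normal(scale=0.05, size=base.shape)

agreement = similarity_agreement(profiles_20x, profiles_40x)
print(round(agreement, 3))
```

Since a single number may hide dose-specific differences, the same function could be called on the subset of wells at each dose, provided each subset has enough wells to give a meaningful number of pairs.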

We no longer observe four clear "groups" in batch 3 data

Previously, as described in #25, we observed four distinct groups in batch 3 data. Interestingly, these four groups seemed to have similar representation of resistant and wildtype clones.

In #25, I describe results from a subcluster analysis that was performed to determine if any features were consistently different between resistant and wild-type clones. As shown here #25 (comment), we did not observe many features with consistent differences. This suggests that core differences between wild-type and resistant clones are not consistent across these groups.

Additionally, it was determined that one driving force behind the groupings was wonky normalized profiles, primarily because of Costes features. So, starting in #32, I began reprocessing all profiles using a unified pipeline based on pycytominer. In an upcoming pull request, I will add all software, data, and results, but I will show below that we no longer observe four clear groups. There is still definitely interesting structure in the data, but the groupings are not as striking, and an analysis that uses these groups is not likely to find consistent and reproducible patterns.

Batch 3 Data

Note that batch 3 profiles were all untreated. For more details on available data, see #40.

heatmap_2019_06_25_Batch3

Conclusion

Recently, in #49, it was hypothesized that data from clones A and E (batches 1 and 2) may harbor resistance features similar to those of other resistant clones (BZ001, BZ002, BZ003, and BZ004). We know that clones A and E have PSMB5 mutations, and these mutations confer bortezomib resistance (Lu and Wang). If we can link morphology signatures of A and E clones to the other resistant clones, this may suggest that other resistant clones also have PSMB5 mutations.

One way to perform this analysis is to see where clones A and E fall within each "group". Since the grouping structure does not appear to be as consistent as before, I think an alternative strategy will be more effective. I will outline an alternative strategy in a separate issue!

Potential Inconsistency in Platemap for Batch 1 (20X)

Hi @mekelley and @ayberman,

I am working through implementing a PSMB5 signature analysis (described briefly here #49 (comment))

This requires using data from early batches (batch 1 20x and batch 2). These are the only batches to have profiles for clones A and E (the only confirmed PSMB5 mutations), correct?

I noticed that there is a bit of a wonky platemap:

2019_02_15_Batch1_20XHCT116bortezomib_plate_effects_cell_count

Copy and pasted below:

plate_map_name well_position CellLine Dosage
PlateMap_HCT116bortezomib B03 WT 0
PlateMap_HCT116bortezomib B04 WT 0
PlateMap_HCT116bortezomib B05 WT 0
PlateMap_HCT116bortezomib B06 CloneA 0
PlateMap_HCT116bortezomib B07 CloneA 0
PlateMap_HCT116bortezomib B08 CloneA 0
PlateMap_HCT116bortezomib B09 CloneE 0
PlateMap_HCT116bortezomib B10 CloneE 0
PlateMap_HCT116bortezomib B11 CloneE 0
PlateMap_HCT116bortezomib C03 WT 0.7
PlateMap_HCT116bortezomib C04 WT 0.7
PlateMap_HCT116bortezomib C05 WT 0.7
PlateMap_HCT116bortezomib C06 CloneA 0.7
PlateMap_HCT116bortezomib C07 CloneA 0.7
*PlateMap_HCT116bortezomib F08 CloneA 0.7
PlateMap_HCT116bortezomib C09 CloneE 0.7
PlateMap_HCT116bortezomib C10 CloneE 0.7
PlateMap_HCT116bortezomib C11 CloneE 0.7
PlateMap_HCT116bortezomib D03 WT 7
PlateMap_HCT116bortezomib D04 WT 7
PlateMap_HCT116bortezomib D05 WT 7
PlateMap_HCT116bortezomib D06 CloneA 7
PlateMap_HCT116bortezomib D07 CloneA 7
PlateMap_HCT116bortezomib D08 CloneA 7
PlateMap_HCT116bortezomib D09 CloneE 7
PlateMap_HCT116bortezomib D10 CloneE 7
PlateMap_HCT116bortezomib D11 CloneE 7
PlateMap_HCT116bortezomib E03 WT 70
PlateMap_HCT116bortezomib E04 WT 70
PlateMap_HCT116bortezomib E05 WT 70
PlateMap_HCT116bortezomib E06 CloneA 70
PlateMap_HCT116bortezomib E07 CloneA 70
PlateMap_HCT116bortezomib E08 CloneA 70
PlateMap_HCT116bortezomib E09 CloneE 70
PlateMap_HCT116bortezomib E10 CloneE 70
PlateMap_HCT116bortezomib E11 CloneE 70

This doesn't impact analysis much at all, just confirming that the F above should be a C. Also, on the chance that this is indeed a small mistake - this stuff happens literally all the time, so no worries! 😹 It is good to confirm and clear up once identified. (I didn't catch it in my original processing!)
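A check like this could be automated. The sketch below is one hedged way to do it, assuming each dose occupies a single plate row (which holds for the platemap pasted above). The function name `find_row_outliers` is hypothetical, and only a subset of the rows is included to keep the example short.

```python
from collections import Counter, defaultdict

# A subset of the platemap rows pasted above: (well_position, CellLine, Dosage).
platemap = [
    ("B06", "CloneA", 0.0), ("B07", "CloneA", 0.0), ("B08", "CloneA", 0.0),
    ("C03", "WT", 0.7), ("C04", "WT", 0.7), ("C05", "WT", 0.7),
    ("C06", "CloneA", 0.7), ("C07", "CloneA", 0.7), ("F08", "CloneA", 0.7),
    ("C09", "CloneE", 0.7), ("C10", "CloneE", 0.7), ("C11", "CloneE", 0.7),
]


def find_row_outliers(rows):
    """Flag wells whose plate row differs from the most common row for their dose."""
    letters_by_dose = defaultdict(list)
    for well, _, dose in rows:
        letters_by_dose[dose].append(well[0])
    # Assumption: the most frequent row letter for a dose is the intended one.
    expected = {
        dose: Counter(letters).most_common(1)[0][0]
        for dose, letters in letters_by_dose.items()
    }
    return [
        (well, line, dose, expected[dose])
        for well, line, dose in rows
        if well[0] != expected[dose]
    ]


outliers = find_row_outliers(platemap)
for well, line, dose, expected_row in outliers:
    print(f"{well} ({line}, dose {dose}) is off the expected row {expected_row}")
```

The majority-row rule is a heuristic: it only flags candidates for a human to confirm, and it would not work on a layout where doses are deliberately spread across rows.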

Data Summary 2019

We are nearing the end of 2019, and we are starting to increase data collection. This issue will document the data we have currently collected.

Overall Summary

In summary, we've made a lot of progress in optimizing data collection, and a lot of the profiling results look promising. For example, we've identified some morphology features that are consistently different between mutant and wild-type clones. We can use the current data to finalize optimal conditions and optimal plate layouts so that we can efficiently scale up in 2020.

We can decide to scale up to profile different bortezomib resistant clones in HCT116. This would answer how many resistance mechanisms this system develops. We could also scale up to test different proteasome inhibitors. The hypothesis there is that HCT116 cells develop the same resistance mechanism against any proteasome inhibitor. We could also scale to different cell lines, although this may be difficult since we'll have to optimize conditions again. The overall goal is to be able to predict resistance in single cells based on cell morphology. More exciting progress and questions to attack in 2020! 👀

cc @bethac07 @mekelley @shntnu @AnneCarpenter

At a Glance

HCT116 Cell Line (colorectal cancer), 2 treatments (DMSO control and Bortezomib (proteasome inhibitor)), 7 batches of data, 633 total profiles

batch_count

We have noted a potential issue in the number of cells in each well. High confluence may lead to incorrect segmentation and inaccurate profiles. This is one issue that we are working towards solving. I note the size of the single cell profiles (.sqlite file) for each plate. The bigger the size, the higher the confluence.

Batch Plate sqlite file size
2019_02_15_Batch1_20X HCT116bortezomib 11G
2019_02_15_Batch1_40X HCT116bortezomib 4.8G
2019_03_20_Batch2 207106_exposure320 7.2G
2019_06_25_Batch3 WTClones 23G
2019_06_25_Batch3 MutClones 26G
2019_11_11_Batch4 WTmut04hWed 56G
2019_11_11_Batch4 WTmut04hTh 56G
2019_11_19_Batch5 217755 37G
2019_11_20_Batch6 217762 28G
2019_11_20_Batch6 217760 48G
2019_11_22_Batch7 217768 17G
2019_11_22_Batch7 217766 39G

Traditionally, these files are between 10 and 25 GB in 384 well plates. These are only 96 well plates and the files are generally much larger.
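As a rough illustration, the table above can be screened against the 10 to 25 GB range quoted for 384-well plates. This is a minimal sketch: it assumes the `G` suffix can be read as gigabytes, and the threshold constant and the function name `flag_large_plates` are hypothetical choices, not part of the project's pipeline.

```python
# (batch, plate, sqlite size in GB), copied from the table above.
sqlite_sizes = [
    ("2019_02_15_Batch1_20X", "HCT116bortezomib", 11.0),
    ("2019_02_15_Batch1_40X", "HCT116bortezomib", 4.8),
    ("2019_03_20_Batch2", "207106_exposure320", 7.2),
    ("2019_06_25_Batch3", "WTClones", 23.0),
    ("2019_06_25_Batch3", "MutClones", 26.0),
    ("2019_11_11_Batch4", "WTmut04hWed", 56.0),
    ("2019_11_11_Batch4", "WTmut04hTh", 56.0),
    ("2019_11_19_Batch5", "217755", 37.0),
    ("2019_11_20_Batch6", "217762", 28.0),
    ("2019_11_20_Batch6", "217760", 48.0),
    ("2019_11_22_Batch7", "217768", 17.0),
    ("2019_11_22_Batch7", "217766", 39.0),
]

# Upper end of the range quoted above for 384-well plates (assumed cutoff).
TYPICAL_MAX_GB = 25.0


def flag_large_plates(sizes, threshold_gb=TYPICAL_MAX_GB):
    """Return the plates whose single cell file is larger than the threshold."""
    return [(batch, plate, gb) for batch, plate, gb in sizes if gb > threshold_gb]


flagged = flag_large_plates(sqlite_sizes)
for batch, plate, gb in flagged:
    print(f"{batch} / {plate}: {gb:g} GB exceeds {TYPICAL_MAX_GB:g} GB")
```

File size is only a proxy for confluence here, so a flagged plate is a prompt to inspect segmentation, not proof that the profiles are unusable.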

Current Summary

Initial Testing

  • Batch 1 - tested magnification
    • We decided on 20X
  • Batch 2 - Replicate of batch 1 using only 20x data

Measure different clones

  • Batch 3 - Tested a bunch of different wildtype and mutation clones

Measure different time points

  • Batch 4 - Tested one extra day of growth (the cells in this batch might be too confluent to use)

Measure with lower confluence

  • Batch 5 - We tried to measure the same layout as batch 4 but with lower cell density

Test different confluence levels

  • Batch 6 - Testing two different plating densities
  • Batch 7 - Testing two different plating densities

Batch Details

Batch 1 - Acquired on 15 February 2019

Batch 1 data was acquired using two different magnifications (20X and 40X) (TODO see #28). The two batches are: 2019_02_15_Batch1_20X and 2019_02_15_Batch1_40X.

Notes - Batch 1

  • There were three cell lines tested: CloneA, CloneE, and WT.
  • The treatment tested was: Bortezomib
  • There were four doses tested: 0.0, 0.7, 7.0, and 70.0 nM
  • The 0.0 dose represents 0.1% DMSO (control vehicle only)
  • Each profile was acquired in triplicate

Batch 2 - Acquired on 20 March 2019 (2019_03_20_Batch2)

Batch 2 repeated the Batch 1 design. It tested the same cell lines, the same perturbations, the same doses, and the same number of replicates as Batch 1.

Batch 3 - Acquired on 25 June 2019 (2019_06_25_Batch3)

Batch 3 data saw a shift in data collection. These cells have not undergone any treatment (not even DMSO/control vehicle). There were two plates (MutClones and WTClones). Each plate had many different wildtype and mutant clones. There were three wildtype parental lines acquired on both plates.

Notes - Batch 3

  • There were 18 mutant clones profiled: BZ001, BZ002, BZ003, ...
  • There were 15 wildtype clones profiled: WT001, WT002, WT003, ...
  • Wildtype parental profiles were collected on both plates
  • Profiles were acquired in triplicate

Batch 4 - Acquired 11 November 2019 (2019_11_11_Batch4)

Batch 4 also saw a shift in data collection. Cells were grown for 48h before fixation on both plates; however, different densities were plated (10.5x10^3 cells/well for WTmut04hWed [plate #217744] and 7x10^3 cells/well for WTmut04hTh [plate #217748]). We've also now started collecting wildtype parental clones on every plate with 9 replicates.

Notes - Batch 4

  • These files are HUGE (56G) - there are too many cells here for the profiles to be reliable.
  • The plates were identical
  • Each profile was treated with DMSO or Bortezomib (7 nM)
  • Four mutant clones were tested (BZ001, BZ008, BZ017, BZ018)
  • Four wildtype clones were tested (WT002, WT008, WT009, WT011)
  • Wildtype parental lines were acquired on both plates, with both DMSO and bortezomib treatments
  • All profiles were captured in triplicate, except for wildtype parental lines treated with DMSO. These were collected 9 times.
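The replicate scheme in these notes can be tallied to get an expected number of wells per plate, which is a handy sanity check against a platemap. The sketch below assumes every clone received both treatments and that the parental line with bortezomib was in triplicate; the label `WT_parental` and the function names are hypothetical.

```python
MUTANT_CLONES = ["BZ001", "BZ008", "BZ017", "BZ018"]
WILDTYPE_CLONES = ["WT002", "WT008", "WT009", "WT011"]
TREATMENTS = ["DMSO", "Bortezomib"]
PARENTAL = "WT_parental"  # hypothetical label for the wildtype parental line


def expected_replicates(clone, treatment):
    """Triplicate everywhere, except 9 replicates for parental + DMSO."""
    if clone == PARENTAL and treatment == "DMSO":
        return 9
    return 3


def expected_layout():
    """Map each (clone, treatment) pair to its expected number of wells."""
    clones = MUTANT_CLONES + WILDTYPE_CLONES + [PARENTAL]
    return {
        (clone, treatment): expected_replicates(clone, treatment)
        for clone in clones
        for treatment in TREATMENTS
    }


layout = expected_layout()
total_wells = sum(layout.values())
print(f"expected wells per plate: {total_wells}")
```

Comparing this total with the number of annotated wells in a platemap would surface missing or duplicated entries early.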

Batch 5 - Acquired 19 November 2019 (2019_11_19_Batch5)

Batch 5 was the same experimental design as Batch 4. Batch 5 is a bit smaller than batch 4, but still very large (37G). Batch 5 was plated at 5x10^3 cells/well.

Notes - Batch 5

  • There is only one plate measured (217755)
  • Bortezomib and DMSO treatments
  • The same clones and replicate numbers as Batch 4

Batch 6 - Acquired 20 November 2019 (2019_11_20_Batch6)

Batch 6 was the same experimental design as Batches 4 and 5. There were two plates acquired in Batch 6: 217760 and 217762. We acquired brightfield images of these plates as well.

Notes - Batch 6

  • Bortezomib and DMSO treatments
  • The same clones and replicate numbers as Batches 4 and 5
  • Two different cell counts initially plated (217760 was 48G and 217762 was 28G)
  • Plated 5x10^3 cells/well in plate 217760 (imaged 5 channels)
    • Plate 217761 is the brightfield of plate 217760
  • Plated 2.5x10^3 cells/well in plate 217762 (imaged 5 channels)
    • Plate 217763 is the brightfield of plate 217762

Batch 7 - Acquired 22 November 2019 (2019_11_22_Batch7)

Batch 7 was the same experimental design as Batches 4, 5, and 6. There were two plates acquired in Batch 7: 217766 and 217768. Brightfield was also captured for this batch.

Notes - Batch 7

  • Bortezomib and DMSO treatments
  • The same clones and replicate numbers as Batches 4, 5, and 6
  • Two different cell counts initially plated (217766 was 39G and 217768 was 17G)
  • Plated 5x10^3 cells/well in plate 217766 (imaged 5 channels)
    • Plate 217767 is the brightfield of plate 217766
  • Plated 2.5x10^3 cells/well in plate 217768 (imaged 5 channels)
    • Plate 217769 is the brightfield of plate 217768
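To see how plating density tracks with file size across Batches 4 through 7, the numbers from the batch notes can be grouped by density. This is a small sketch, not an analysis from the project: the function name `mean_size_by_density` is hypothetical, and file size is treated as a rough stand-in for confluence, as suggested earlier in this issue.

```python
from collections import defaultdict

# (plate, cells plated per well in thousands, sqlite size in GB),
# taken from the batch notes and the file size table in this issue.
plates = [
    ("WTmut04hWed", 10.5, 56.0),
    ("WTmut04hTh", 7.0, 56.0),
    ("217755", 5.0, 37.0),
    ("217760", 5.0, 48.0),
    ("217762", 2.5, 28.0),
    ("217766", 5.0, 39.0),
    ("217768", 2.5, 17.0),
]


def mean_size_by_density(rows):
    """Average sqlite size (GB) for each plating density (thousand cells/well)."""
    sizes = defaultdict(list)
    for _, density, gb in rows:
        sizes[density].append(gb)
    return {density: sum(v) / len(v) for density, v in sorted(sizes.items())}


summary = mean_size_by_density(plates)
for density, mean_gb in summary.items():
    print(f"{density:g}x10^3 cells/well: mean {mean_gb:.1f} GB")
```

With so few plates per density, the means are descriptive only; plates at the same density already differ noticeably in size.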

edits to add important details @mekelley described in #40 (comment)
