broadinstitute / profiling-resistance-mechanisms

Predicting pharmacodynamic responses to cancer drugs using cell morphology

License: BSD 3-Clause "New" or "Revised" License

Shell 0.01% Jupyter Notebook 72.29% R 0.84% Python 1.17% HTML 25.58% MATLAB 0.11%

morphology cell-painting machine-learning cancer pharmacodynamics resistance carpenter-lab

profiling-resistance-mechanisms's Introduction

Discovering Morphological Markers of Drug Resistance

In this repository we analyze Cell Painting data generated from multiple cell line clones that were resistant or sensitive to bortezomib.

Citation

Kelley ME, Berman AY, Stirling DR, Cimini BA, Han Y, Singh S, Carpenter AE, Kapoor TM, Way GP. High-content microscopy reveals a morphological signature of bortezomib resistance. (2023) eLife; 12:e91362. DOI: https://doi.org/10.7554/eLife.91362.

Data collection and processing

We cultured a colon cancer cell line (HCT116), treated it with a proteasome inhibitor (bortezomib), and selected two resistant clones. We applied Cell Painting to these cell lines (in triplicate) under four conditions (DMSO, 0.7 nM, 7 nM, and 70 nM bortezomib).

The Cell Painting assay captures several cellular morphology features (described in more detail here). Our hypothesis was that morphological features could distinguish wildtype from resistant clones.

We processed the Cell Painting data using CellProfiler. We use CellProfiler to perform quality control, segment images to identify nuclei, and measure the features captured by Cell Painting.

This repository contains all image analysis pipelines and image-based profiling pipelines (see 0.generate-profiles).

Pilot analyses

This repository ingests the processed Cell Painting data and performs several downstream analyses.

Using the triplicate measurements, and two batches, we perform the following pilot analyses:

Obtain similarity matrices for each batch independently and combined; perform hierarchical clustering; visualize heatmaps.
- These analyses were performed using the Morpheus WebApp
- An outline of the results can be viewed here.
Apply UMAP to the batched data to observe large differences across variables
Apply t-tests to determine cell morphology differences between conditions:
- We test for differences between resistant clones at two doses of bortezomib (0.7 nM and 7 nM)
- We also test for differences between wildtype and resistant clones at a low dose of bortezomib (0.7 nM)
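
The per-feature comparison described in the t-test step can be sketched as follows. This is a minimal illustration using Welch's t statistic and only the Python standard library. The feature name and measurement values are hypothetical, and the notebooks in this repository may use a different test variant or library.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Squared standard errors of each group mean
    se_a = statistics.variance(group_a) / len(group_a)
    se_b = statistics.variance(group_b) / len(group_b)
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, dof


# Hypothetical triplicate measurements of one morphology feature
profiles = {
    "Cells_AreaShape_Area": {
        "wildtype": [0.10, 0.20, 0.15],
        "resistant": [0.90, 1.10, 1.00],
    },
}

results = {
    feature: welch_t(groups["resistant"], groups["wildtype"])
    for feature, groups in profiles.items()
}
```

With only three replicates per group, the degrees of freedom are small, so any p values derived from such a statistic should be corrected for the number of features tested.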

UMAP Batch Analysis

T-test to Determine Morphological Differences

Reproducibility

We use conda to manage package versions. After installing conda, obtain all required packages:

conda env create --force --file environment.yml

# Activate environment
conda activate resistance-mechanisms

Clone the github repository. First, generate and enable SSH Keys if you haven't already.

# Then clone and enter repo
git clone git@github.com:broadinstitute/profiling-resistance-mechanisms
cd profiling-resistance-mechanisms

All analyses are presented in analysis.sh. To reproduce, perform the following:

./analysis.sh

Bug Reporting

Please file an issue with any questions or bug reports.

Internal documents

GDrive folder

profiling-resistance-mechanisms's People

Contributors


gwaybio davidstirling yhan8 mekelley

profiling-resistance-mechanisms's Issues

Duplicate Well Info

Warning message in profiling audit:

"WARNING! Duplicate well information detected in: Batch: 2019_06_25_Batch3 Plate: MutClones"

Combining plates in Batch 9 and 10

The data are described in more detail in #89. This issue is to discuss the results and interpretations of #90

Update 11/16/20

We discussed this plate layout in detail, and determined that the data are still usable. Our primary question is whether or not we can distinguish resistant clones from sensitive clones. We do not need drug treatment and we only need the DMSO-only plates to determine this.

Critical Issue

Specifically, each plate contains only a single treatment (either DMSO or drug). This prevents us from performing our traditional per-plate normalization strategy, and might make it impossible for us to reliably combine data across plates.

Approach to Address

Determine the extent of the problem
- Combine all plates together within batch
- Apply PCA retaining 20 components, and then apply an ANOVA using Treatment, Clone, and Plate (and an interaction term between treatment and plate) as factors
- Visualize the ANOVA F scores per factor
Attempt to solve the problem in a naive way
- Concatenate "augmented" (aggregated, unnormalized) per-plate "matches" (meaning all the plates that have DMSO and compound treatments for the specific clones)
- Apply whole-plate standardization
- Apply the steps outlined in the first step above
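
To illustrate what an ANOVA F score per factor measures, here is a minimal sketch of a one-way F statistic computed on a single principal component. It is a simplification: the analysis described above uses 20 components and a multi-factor model with an interaction term, whereas this sketch treats each factor separately. The component scores and metadata values are hypothetical.

```python
import statistics


def one_way_f(values, labels):
    """One-way ANOVA F statistic for `values` grouped by `labels`."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = statistics.mean(values)
    # Variation of group means around the grand mean
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of observations around their own group mean
    ss_within = sum(
        (v - statistics.mean(g)) ** 2 for g in groups.values() for v in g
    )
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores on one principal component for six wells
pc1 = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
metadata = {
    "Treatment": ["DMSO", "DMSO", "DMSO", "drug", "drug", "drug"],
    "Plate": ["p1", "p2", "p1", "p2", "p1", "p2"],
}
f_scores = {factor: one_way_f(pc1, labels) for factor, labels in metadata.items()}
```

A large F score for a factor means that factor explains much of the variance in that component. In this toy example treatment and plate are crossed; when each plate holds a single treatment, as in Batch 9 and Batch 10, the two factors cannot be separated this way.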

Results

Batch 7 and Batch 8 have good plate designs, Batch 9 and Batch 10 have suboptimal designs.
None of the four batches show plate effects! 🎉
However, in Batch 9 and Batch 10 the Metadata treatment factor has very little impact ☹️ - this is because normalization occurs per plate, and each plate only has one treatment (either DMSO or drug)

A naive attempt (described above) explodes the impact of batch.
Maybe this is ok? Perhaps we can regress out those components and move on?

cc'ing @shntnu @AnneCarpenter - Comments/suggestions welcome, if there are any. Can we salvage these data? (there are 14 plates...)

Comparing Multiple Clones to Clone A and E

Previously, @mekelley and @ayberman sequenced Clones A and E and observed PSMB5 mutations (see #49 (comment)).

One goal is to determine if any of the four resistant clones (BZ001, BZ002, BZ003, and BZ004) also harbors a PSMB5 mutation based on a signature of morphology. The initial approach was to see where clone A and clone E clustered in comparison to each of the four "groups" we previously observed in #25. However, those groups appear to be artifactual - see #51.

An additional wrinkle in comparing the six clones is that there appear to be batch effects between the batches analyzed. We do not observe substantial batch effects between the BZ001, BZ002, BZ003, and BZ004 clones, which were measured across 3 different batches (5 plates in total) (see #48). We also do not observe substantial batch effects between the two different batches (2 plates in total) that measured clone A and clone E (batch 1 and 2). See below, which was added in #52

However, we do observe batch effects when combining the two groups of data (also added in #52):

Note that we can visually inspect for batch effects in this reduced dimension space (UMAP). It is not a great way to detect batch effects, but usually when we see batches cluster separately, especially when they both contain wild-type parental profiles, it hints towards a batch effect.

The current normalization procedure calculates z-scores for all samples. We can explore whether alternative normalization procedures overcome this batch effect.

Notes from Meeting 12/9/20

(I'm using this as a scratchpad for now)

This is the data used in the first part of the analysis presented

Metadata_batch	Metadata_Plate	Metadata_clone_number	Metadata_model_split	Metadata_clone_type	n
2019_03_20_Batch2	207106_exposure320	CloneA	test	resistant	1
2020_07_02_Batch8	218361	CloneA	test	resistant	2
2019_02_15_Batch1_20X	HCT116bortezomib	CloneE	test	resistant	1
2019_03_20_Batch2	207106_exposure320	CloneE	test	resistant	1
2020_07_02_Batch8	218361	CloneE	test	resistant	1
2020_07_02_Batch8	218361	WT_parental	test	sensitive	3
2019_02_15_Batch1_20X	HCT116bortezomib	CloneA	training	resistant	3
2019_03_20_Batch2	207106_exposure320	CloneA	training	resistant	2
2020_07_02_Batch8	218361	CloneA	training	resistant	3
2019_02_15_Batch1_20X	HCT116bortezomib	CloneE	training	resistant	2
2019_03_20_Batch2	207106_exposure320	CloneE	training	resistant	2
2020_07_02_Batch8	218361	CloneE	training	resistant	4
2019_02_15_Batch1_20X	HCT116bortezomib	WT_parental	training	sensitive	3
2019_03_20_Batch2	207106_exposure320	WT_parental	training	sensitive	3
2020_07_02_Batch8	218361	WT_parental	training	sensitive	2
2020_07_02_Batch8	218360	CloneA	validation	resistant	5
2020_07_02_Batch8	218360	CloneE	validation	resistant	5
2020_07_02_Batch8	218360	WT_parental	validation	sensitive	5

Grouping further:

Metadata_model_split	Metadata_clone_type	Metadata_clone_number	n
test	resistant	CloneA	3
test	resistant	CloneE	3
test	sensitive	WT_parental	3
training	resistant	CloneA	8
training	resistant	CloneE	8
training	sensitive	WT_parental	8
validation	resistant	CloneA	5
validation	resistant	CloneE	5
validation	sensitive	WT_parental	5

Grouping even further:

Metadata_model_split	Metadata_clone_type	n
training	resistant	16
training	sensitive	8
test	resistant	6
test	sensitive	3
validation	resistant	10
validation	sensitive	5

We are building a binary classifier of sorts (via regression-based feature selection + singscore) on the training set, which comprises n=16 and n=8 samples in the two classes (resistant and sensitive, respectively).
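
The grouped counts can be re-derived from the first table. The sketch below copies the rows from that table and sums `n` by model split and clone type using only the standard library; the variable names are illustrative.

```python
from collections import Counter

# (batch, plate, clone, split, clone_type, n) rows copied from the first table
rows = [
    ("2019_03_20_Batch2", "207106_exposure320", "CloneA", "test", "resistant", 1),
    ("2020_07_02_Batch8", "218361", "CloneA", "test", "resistant", 2),
    ("2019_02_15_Batch1_20X", "HCT116bortezomib", "CloneE", "test", "resistant", 1),
    ("2019_03_20_Batch2", "207106_exposure320", "CloneE", "test", "resistant", 1),
    ("2020_07_02_Batch8", "218361", "CloneE", "test", "resistant", 1),
    ("2020_07_02_Batch8", "218361", "WT_parental", "test", "sensitive", 3),
    ("2019_02_15_Batch1_20X", "HCT116bortezomib", "CloneA", "training", "resistant", 3),
    ("2019_03_20_Batch2", "207106_exposure320", "CloneA", "training", "resistant", 2),
    ("2020_07_02_Batch8", "218361", "CloneA", "training", "resistant", 3),
    ("2019_02_15_Batch1_20X", "HCT116bortezomib", "CloneE", "training", "resistant", 2),
    ("2019_03_20_Batch2", "207106_exposure320", "CloneE", "training", "resistant", 2),
    ("2020_07_02_Batch8", "218361", "CloneE", "training", "resistant", 4),
    ("2019_02_15_Batch1_20X", "HCT116bortezomib", "WT_parental", "training", "sensitive", 3),
    ("2019_03_20_Batch2", "207106_exposure320", "WT_parental", "training", "sensitive", 3),
    ("2020_07_02_Batch8", "218361", "WT_parental", "training", "sensitive", 2),
    ("2020_07_02_Batch8", "218360", "CloneA", "validation", "resistant", 5),
    ("2020_07_02_Batch8", "218360", "CloneE", "validation", "resistant", 5),
    ("2020_07_02_Batch8", "218360", "WT_parental", "validation", "sensitive", 5),
]

# Sum profile counts per (split, clone type)
by_split_and_type = Counter()
for _batch, _plate, _clone, split, clone_type, n in rows:
    by_split_and_type[(split, clone_type)] += n

for (split, clone_type), n in sorted(by_split_and_type.items()):
    print(split, clone_type, n)
```

The training split is imbalanced two to one in favor of resistant profiles, which is worth keeping in mind when interpreting classifier performance.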

Update Normalization (Batch 3)

@bethac07 created the profiles for batch 3 data by, essentially, following the standard profiling handbook protocol. The variable selection steps skipped (b/c the platemaps didn't permit) include "Correlation Threshold" and "Variance Threshold".

The plates were also normalized on the full plate. We need to check if normalizing by the parental cell line improves reproducibility in replicate profiles.
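
The difference between the two normalization choices can be sketched as below. This is a minimal standard-library illustration, not the profiling pipeline itself: the feature values are hypothetical, and `BZ001` simply stands in for any non-parental clone on the plate.

```python
import statistics


def zscore(values, reference):
    """Standardize `values` using the mean and standard deviation of `reference`."""
    center = statistics.mean(reference)
    scale = statistics.stdev(reference)
    return [(v - center) / scale for v in values]


# Hypothetical values of one feature on one plate
plate = {
    "WT_parental": [1.0, 1.2, 0.8],
    "BZ001": [2.0, 2.2, 1.8],
}
all_wells = [v for wells in plate.values() for v in wells]

# Whole-plate normalization: every well contributes to the reference
whole_plate = {clone: zscore(v, all_wells) for clone, v in plate.items()}

# Parental normalization: only WT_parental wells define the reference
parental_ref = {clone: zscore(v, plate["WT_parental"]) for clone, v in plate.items()}
```

Under parental normalization the parental wells are centered at zero, so every other clone is expressed as a deviation from the parental line. Under whole-plate normalization the reference shifts with the mix of clones on the plate.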

Strange Well Assignments for Batch 3 Mutant Clones

I noticed this when examining potential plate artifacts.

It appears that in the file backend/2019_06_25_Batch3/WTClones/WTClones_normalized_variable_selected.csv there are two different samples measured in the same well.

Samples BZ009 and BZ010 are each measured three times. The wells indicated for BZ009 are ["B10", "C10", "D10"] while the wells indicated for BZ010 are ["B11", "C10", "D10"]. Going by these data, it looks like the two samples were measured in the same wells! We know that this is not the case, however. See below:

Plate for Cell Count

Note that this data was extracted directly from the sqlite files.

Plate for Replicate Correlation

My guess is that BZ010 should be occupying well "C11" and "D11" in the figure above.

Recommendation

I am not sure we need to do anything at the moment, since we have the profiles already aggregated by well and this issue does not seem to have resulted in lost data. It is important to document though - maybe this persists in other projects? cc @bethac07 and @shntnu in case we've seen this before
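
A check for this kind of metadata problem can be sketched in a few lines. The well assignments below are the ones reported in this issue; the variable names are illustrative, and a real check would read the assignments from the platemap or profile metadata.

```python
from collections import defaultdict

# Well assignments as reported in this issue
well_assignments = {
    "BZ009": ["B10", "C10", "D10"],
    "BZ010": ["B11", "C10", "D10"],
}

# Invert the mapping: which samples claim each well?
samples_by_well = defaultdict(list)
for sample, wells in well_assignments.items():
    for well in wells:
        samples_by_well[well].append(sample)

# Any well claimed by more than one sample is suspicious
duplicates = {
    well: samples for well, samples in samples_by_well.items() if len(samples) > 1
}
print(duplicates)
# {'C10': ['BZ009', 'BZ010'], 'D10': ['BZ009', 'BZ010']}
```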

Single Cell Analysis Results

In #77 we upgraded the single cell analysis to use data from batch 8 (data added in #75), and we use a multi-class classifier to predict clone A, clone E, and wildtype parental lines.

The results of this analysis were super interesting! Links to slides are below:

Preliminary analysis (#74) - https://docs.google.com/presentation/d/1_NwAVFJhqkA87duCBL5c2uxG9RlEj-qV4c5AFd4McWI/edit?usp=sharing
Batch 8 analysis (#77) - https://docs.google.com/presentation/d/13rOueMBlk8QbGx0G-Kl2qj42HVVShCbMF06PxXUfrv0/edit?usp=sharing

Primary Result

Many results and figures described in the batch 8 analysis slides relate to model performance and benchmarking model behavior. Below is the primary result to interpret.

Predicting wildtype parental single cells treated with various doses of bortezomib. Real data model compared to a model trained with shuffled data. As the drug dose increases, the clones appear more wildtype in nature, and thus, more likely to be killed by bortezomib treatment. The wildtype lines are not impacted by bortezomib treatment, indicating that the initial model is isolating the core differences between resistant and non-resistant lines, and that these features are not directly associated with bortezomib resistance.

Additional Experiments Comparing Feature Differences

t-test with nonnormalized data
t-test with nonvariable selected data
Plot feature distributions of top differential features using non-normalized data

Cloning Details

As discussed at the meeting today, we were curious about the cloning details. We can use this issue to elaborate on the cloning procedure. We are specifically interested in knowing more details about how resistance was identified and selected for.

cc @mekelley @ayberman

Incorporate cytominer-eval

With an update to pycytominer@02ed6647a0913e9f0b28cbafa97766d55eeffd20, we can no longer rely on the audit function.

Traceback (most recent call last):
  File "generate-profiles.py", line 11, in <module>
    from scripts.profile_util import process_profile, load_config
  File "/home/ubuntu/efs/2018_05_30_ResistanceMechanisms_Kapoor/workspace/software/2018_05_30_ResistanceMechanisms_Kapoor/0.generate-profiles/scripts/profile_util.py", line 10, in <module>
    from pycytominer import (
ImportError: cannot import name 'audit' from 'pycytominer' (/home/ubuntu/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/pycytominer/__init__.py)

I need to incorporate cytominer-eval into the automated processing pipeline.

Bulk signature analysis

In #82, I added an analysis of bulk (aggregated) signatures from two compiled datasets. #58 was an initial attempt at this analysis, but it used earlier (and lower quality) data.

I summarize the experiment immediately below, and then describe the results in more detail further below.

Summary

The Clone AE results are promising. The signature and method clearly work in both training and testing splits, and there appears to be some sort of dose response. What this dose response means biologically is unclear, but technically (in the resistant lines) it means that the signature features become less extreme in their ranking. This means that the absolute values of signature features are higher in the WT_parental profiles.

The Four Clone signature applied to the Clone AE data is odd. The results (at least for the DMSO treated samples) are mostly outside the null, and the score is less extreme than the Clone AE signature, but the sign is flipped! This could be the result of some programmatic anomaly in fitting the linear models (I checked one possible cause and ruled it out), a metadata label mixup in the four clone dataset, or a method that is not robust across batches.

The signatures applied to the four clone dataset (even the four clone signature) are less conclusive. I am not confident in these data in nearly the same way that I am about the Batch 8 profiles.

Next steps

Signature titration

The number of features in the signature is high. Since one goal of the project is to identify a smaller set of features to potentially use as a biomarker of drug resistance, I will perform a "signature titration" analysis in which I systematically add features (starting with the most significant), and quantify the average difference of test set TotalScore between sensitive and resistant clones. This approach will give us a way to select the minimal set of features required to separate the clone types.

More data collection

We will work with the Rockefeller team to decide next steps in data collection. I see the following additional data we could collect (note that I have not yet processed batch 9 or 10):

Four plates (like in batch 8) except with Clone A/E, Wildtype parental, and wildtype clones
Four plates (like in batch 8) except with the four wildtype clones, four resistant clones, and wildtype parental
Redo batch 3 (lots of WT and resistant clones with more wildtype parental lines) with updated protocols

I would also like to double check the platemap metadata labels for batches 4, 5, 6, and 7

Data

Clone A/E

Batch 8 profiles (four plates)
Only DMSO treated samples
Cell lines:
- Resistant: CloneA, CloneE
- Polyclonal wildtype: WT_parental
- n = 240
Feature selection:
- Operations: variance_threshold, correlation_threshold, drop_na_columns, blocklist, drop_outliers
- Correlation threshold: 0.95
- Performed independently
- p = 3538, after feature selection = 434
Caveat:
- We're comparing WT_parental to two clones. The signature may include features representing clonal selection.

Four Clone

Batches 4, 5, 6, and 7 (seven plates)
Only DMSO treated samples
Cell lines:
- Wildtype (sensitive): WT002, WT008, WT009, WT009
- Resistant: BZ001, BZ008, BZ017, BZ018
- Polyclonal wildtype: WT_parental
- n = 420
Feature selection:
- Operations: variance_threshold, correlation_threshold, drop_na_columns, blocklist, drop_outliers
- Correlation threshold: 0.95
- Performed independently
- p = 3538, after feature selection = 338

Signature generation

Procedure

For each dataset, I perform the following procedure:

I split the data into an 85% training and 15% testing set balanced by cell line. I built the signature using only the training set.
Using the selected features, fit a linear model with the following covariates:
- Metadata_clone_type_indicator (resistant vs. sensitive)
- Metadata_batch (four_clone only; cloneAE is one batch)
- Metadata_Plate
- Metadata_clone_number (clone id)
Perform a TukeyHSD test to adjust p values for within feature multiple comparisons.
- (e.g. in the cloneAE dataset and in the linear model, testing an individual feature using the plates covariate is actually 6 comparisons!)
Adjust the TukeyHSD adjusted p values further by a bonferroni correction.
Select all features with a p value below this adjusted rate for the Metadata_clone_type_indicator covariate; this is the "PreSignature".
Remove all features from the "PreSignature" with a p value below the adjusted rate for the Metadata_Plate and Metadata_batch covariates.
- Effect: This reduces the impact of technical artifacts in the signature application
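
The last three steps of the procedure (Bonferroni threshold, "PreSignature" selection, and removal of plate- or batch-associated features) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the feature names, the p values, and the significance level `alpha=0.05` are all hypothetical, and the linear model fit and TukeyHSD adjustment are assumed to have been done already.

```python
def build_signature(adjusted_p, n_features, alpha=0.05):
    """Select features tied to clone type but not to plate or batch.

    `adjusted_p` maps feature -> covariate -> TukeyHSD-adjusted p value.
    `alpha` is an assumed significance level, not taken from the analysis.
    """
    threshold = alpha / n_features  # Bonferroni correction across features
    pre_signature = [
        feature
        for feature, p in adjusted_p.items()
        if p["Metadata_clone_type_indicator"] < threshold
    ]
    technical = ("Metadata_Plate", "Metadata_batch")
    # Drop features that are also significant for a technical covariate.
    # A missing covariate (e.g. batch in a one-batch dataset) counts as p = 1.
    return [
        feature
        for feature in pre_signature
        if all(adjusted_p[feature].get(c, 1.0) >= threshold for c in technical)
    ]


# Hypothetical adjusted p values for three features
adjusted_p = {
    "feature_a": {"Metadata_clone_type_indicator": 1e-4, "Metadata_Plate": 0.5},
    "feature_b": {"Metadata_clone_type_indicator": 1e-4, "Metadata_Plate": 1e-3},
    "feature_c": {"Metadata_clone_type_indicator": 0.2, "Metadata_Plate": 0.9},
}
signature = build_signature(adjusted_p, n_features=len(adjusted_p))
print(signature)
```

Here `feature_b` passes the clone type test but is dropped because it also tracks plate, and `feature_c` never enters the "PreSignature".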

Volcano plots

These plots visualize feature significance for each linear model covariate

Clone AE

Four Clone

Result

The signatures contain many features, and we make a distinction between features "up" and features "down":

Four Clone: 76 features
- Up: 39
- Down: 37
CloneAE: 188 features
- Up: 95
- Down: 93

Apply signatures

Approach

Because we have two datasets and two signatures, I applied each signature to each dataset independently. I also apply each signature with 1,000 random permutations to define a null distribution.

Method

I use the singscore method (Foroutan et al.). This is a "single sample" method to detect signature enrichment. It is a relatively simple, rank-based approach bounded between -1 and 1, where a score of 1 means that the sample is enriched for signature features.
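
To make the rank-based scoring and the permutation null concrete, here is a simplified re-implementation of a directional rank score in the spirit of singscore. It is a sketch, not the singscore package: ties are broken by input order, the null is built by drawing random feature sets of the same size (one of several ways to permute), and the feature names and values are hypothetical.

```python
import random


def directional_score(sample, feature_set, reverse=False):
    """Centered, normalized mean rank of `feature_set` within one sample."""
    ordered = sorted(sample, key=sample.get, reverse=reverse)
    rank = {feature: i + 1 for i, feature in enumerate(ordered)}
    n_total, n_set = len(sample), len(feature_set)
    mean_rank = sum(rank[f] for f in feature_set) / n_set / n_total
    # Smallest and largest mean rank a set of this size can reach
    s_min = (n_set + 1) / (2 * n_total)
    s_max = (2 * n_total - n_set + 1) / (2 * n_total)
    return (mean_rank - s_min) / (s_max - s_min) - 0.5


def total_score(sample, up, down):
    """Sum of the up-set score and the (rank-reversed) down-set score."""
    return directional_score(sample, up) + directional_score(sample, down, reverse=True)


def null_scores(sample, n_up, n_down, n_permutations=1000, seed=0):
    """Scores for random feature sets of the same sizes as the signature."""
    rng = random.Random(seed)
    features = list(sample)
    scores = []
    for _ in range(n_permutations):
        drawn = rng.sample(features, n_up + n_down)
        scores.append(total_score(sample, drawn[:n_up], drawn[n_up:]))
    return scores


# Hypothetical profile: f5 and f6 are the highest features, f1 and f2 the lowest
sample = {"f1": -2.0, "f2": -1.0, "f3": 0.0, "f4": 0.5, "f5": 2.0, "f6": 3.0}
observed = total_score(sample, up=["f5", "f6"], down=["f1", "f2"])
null = null_scores(sample, n_up=2, n_down=2)
```

In this toy profile the up features hold the top ranks and the down features the bottom ranks, so the observed score reaches the upper bound of 1. Comparing the observed score against the spread of `null` indicates how unusual it is for that sample.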

Results

Comparison 1: Clone AE Dataset - Clone AE Signature

Comparison 2: Clone AE Dataset - Four Clone Signature

Comparison 3: Four Clone Dataset - Clone AE Signature

Comparison 4: Four Clone Dataset - Four Clone Signature

Extracting XY location of sites

Get the names of all channel 1 images of well B03 (and then download them)

parallel aws s3 ls s3://imaging-platform/projects/2018_05_30_ResistanceMechanisms_Kapoor/2019_11_19_Batch5/images/217755/20191119-TH-WTMUT-4h-0-7nMbz_B03_s{1}_w1 ::: 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 17

use bftools to get metadata

parallel ~/Downloads/bftools/showinf -nopix 20191119-TH-WTMUT-4h-0-7nMbz_B03_s{1}_w1* '>' site_{1}.txt ::: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17

Extract stage information

parallel grep stage site_{1}.txt '>' stage_{1}.txt ::: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17

Then format it using some SublimeText magic to make it a csv

site,x,y
2,29736.2,48083.54
3,31175.9,48083.64
1,28298.52,48083.46
4,29016.74,48802.78
5,29735.66,48802.68
7,28297.44,49522.04
6,30456.76,48802.82
14,30456.84,50241.42
15,28297.12,50960.68
17,31174.62,50960.62
12,29016.64,50241.36
13,29738.18,50241.36
11,31174.24,49522.04
10,30456.86,49522.08
8,29017.98,49522.16
9,29737.52,49522.08

library(tibble)   # provides tribble()
library(ggplot2)  # provides ggplot(), aes(), geom_text()

sites <- tribble(~site,~x,~y,
                 2,29736.2,48083.54,
                 3,31175.9,48083.64,
                 1,28298.52,48083.46,
                 4,29016.74,48802.78,
                 5,29735.66,48802.68,
                 7,28297.44,49522.04,
                 6,30456.76,48802.82,
                 14,30456.84,50241.42,
                 15,28297.12,50960.68,
                 17,31174.62,50960.62,
                 12,29016.64,50241.36,
                 13,29738.18,50241.36,
                 11,31174.24,49522.04,
                 10,30456.86,49522.08,
                 8,29017.98,49522.16,
                 9,29737.52,49522.08)

ggplot(sites, aes(x, y, label = site)) + geom_text()
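
As a plot-free alternative, the same coordinates can be snapped to a row/column grid and printed as text. This is a minimal sketch: the coordinates are copied from the table above, while the step size of 719 stage units is an assumption read off the spacing between neighboring sites.

```python
# Stage coordinates (x, y) per site, copied from the table above
sites = {
    2: (29736.2, 48083.54), 3: (31175.9, 48083.64), 1: (28298.52, 48083.46),
    4: (29016.74, 48802.78), 5: (29735.66, 48802.68), 7: (28297.44, 49522.04),
    6: (30456.76, 48802.82), 14: (30456.84, 50241.42), 15: (28297.12, 50960.68),
    17: (31174.62, 50960.62), 12: (29016.64, 50241.36), 13: (29738.18, 50241.36),
    11: (31174.24, 49522.04), 10: (30456.86, 49522.08), 8: (29017.98, 49522.16),
    9: (29737.52, 49522.08),
}

STEP = 719.0  # assumed stage distance between adjacent sites

x_min = min(x for x, _ in sites.values())
y_min = min(y for _, y in sites.values())

# Map each site to its (row, column) position on the acquisition grid
grid = {
    (round((y - y_min) / STEP), round((x - x_min) / STEP)): site
    for site, (x, y) in sites.items()
}

n_rows = max(r for r, _ in grid) + 1
n_cols = max(c for _, c in grid) + 1
for r in range(n_rows):
    # Print '.' where no site was imaged
    print(" ".join(f"{grid.get((r, c), '.'):>2}" for c in range(n_cols)))
```

Empty grid positions are printed as dots, which makes it easy to see which positions of the rectangular grid were not imaged.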

Within Sub-Cluster Analyses

Based on the Morpheus heatmap (shown below) it appears there are four distinct sub-clusters, each with different correlational structure, but also each with representation from both wild-type and resistant clones.

Experiment

Perform a t-test within each individual subcluster and see if the feature differences are the same

Making our images publicly-available

As we work towards publication, we also need to work towards making our data publicly available. The bulk profiles are already available, but it is a larger push to make our images public.

The group discussed this process today, and @shntnu recommended CellPainting gallery.

In this issue, we can discuss:

Selecting batches to upload (which do we include in the manuscript, some were pilot batches),
Thawing data
Moving to the gallery
Adding fluorescence and brightfield images

Select most representative images of interesting doses

Related to #2

Notes from Meeting 1/21/20

Data from merged batches 5, 6, and 7 looking good so far
- Batch correction not necessary
- There might be some minor well effects
  - We discussed randomizing platemaps to the extent possible
Morphology signature is interesting, and worth having a biologist dig a bit deeper into
- Still some concern over confluence issues
  - While it doesn't seem to impact profiles, it is impacting explanatory morphology features that are typically only important in high confluence settings
There might be something additionally interesting about the way the clones were selected
- We talked about how they were all selected from the same cluster of resistant clones
  - Can you check me on this @mekelley? It is likely I didn't truly appreciate this point and I think it is important. (if possible, please add this to a new issue similar to #45 )
  - We know which cluster each sample belongs to which will help with:
    - Sequence specific clusters to determine molecular resistance mechanism
    - Use cluster info in future covariate models to observe which features are important for cluster of origin
The Rockefeller team is actively onboarding new clones and a new proteasome inhibitor
- They will present at next check-in
We also briefly discussed presentations at next CSHL
- It will be analysis heavy and @AnneCarpenter will represent (I don't remember if we've discussed this elsewhere before!)

Experiment Design - New Drugs

@mekelley and @ayberman are getting ready to ramp up data collection. Specifically, we will collect Cell Painting data from 7 resistant clones and a wildtype parental line undergoing two drug treatments and a DMSO negative control. The drugs are ixazomib (proteasome inhibitor) and CB-5083 (p97 inhibitor). The cells will be treated for four hours.

Clarification Questions

Will we also collect Cell Painting profiles with different drug doses?
Will the 7 resistant clones be selected independently per drug? In other words, will there actually be 14 new resistant clones?
Do we need to include a wild-type clonal line?

Wildtype Clonal Line Discussion

In my opinion, we definitely should try to mirror the same clonal selection procedure for the wildtype parental line to acquire a wildtype clonal line. Our approach in identifying morphology features of resistance mechanisms will be skewed by clonal selection signal. In other words, we're likely to obscure the resistance signals by also isolating the clonal selection signals.

I view the wild-type parental line as a really great validation resource. We should see a lot of heterogeneity in these samples, and could validate some resistance signatures we find by comparing wildtype clones and other resistant clones.

If we use the same clonal selection procedure for selecting resistant cells to the new drugs, then I think using the same WT clones for batches 4-7 (see #40) is sufficient.

Running Jupyter Notebook on AWS

I am getting the following errors when trying to run a jupyter notebook with an R kernel on AWS. An example of the errors I am receiving is:

Error: package ‘base64enc’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

A description of this problem is provided here.

I will document the packages that I needed to install with the given conda environment here:

digest
base64enc
Rcpp
rlang>0.4
glue
pillar
tibble
purrr
tidyselect
dplyr
tidyr
plyr
mnormt
foreign
stringi
reshape2
colorspace
scales
lazyeval
haven
jsonlite
lubridate
readr
readxl
xml2
tidyverse
ggplot2
htmltools
backports
bit
bit64
RSQLite
dbplyr
utf8

Lots of broken packages. Jupyter works again after reinstalling.

Batch 9 and 10 Data Summary

We added Batch 9 (2020_08_24_Batch9) and Batch 10 (2020_09_09_Batch10) profiles in #85 and audited in #88. I describe them in more detail below:

Including the data described in #40, we now have a total of 1,713 profiles

Split by batch:

Plate Contents

Batch 9 (`2020_08_24_Batch9`)

6 plates
- 218775
  - Treatment: CB-5083 (700 nM)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
- 218774
  - Treatment: DMSO (0.1%)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
- 218699
  - Treatment: Ixazomib (50 nM)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
- 218698
  - Treatment: DMSO (0.1%)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
- 218697
  - Treatment: CB-5083 (700 nM)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
- 218696
  - Treatment: DMSO (0.1%)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13

Batch 10 (`2020_09_09_Batch10`)

8 plates
- 218852
  - Treatment: DMSO (0.1%)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
- 218853
  - Treatment: CB-5083 (700 nM)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
- 218854
  - Treatment: DMSO (0.1%)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
- 218855
  - Treatment: Ixazomib (50 nM)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
- 218856
  - Treatment: DMSO (0.1%)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
- 218857
  - Treatment: CB-5083 (700 nM)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, CB5038 clone 17, CB5038 clone 16, CB5038 clone 15, CB5038 clone 14, CB5038 clone 13
- 218858
  - Treatment: DMSO (0.1%)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05
- 218859
  - Treatment: Ixazomib (50 nM)
  - Clones: WT Parental, WT Clone 07, WT Clone 06, WT Clone 05, WT Clone 04, Ixazomib clone 01, Ixazomib clone 02, Ixazomib clone 03, Ixazomib clone 04, Ixazomib clone 05

Audit

Below, I report Percent Strong for each plate. For a full report see the figures in https://github.com/broadinstitute/profiling-resistance-mechanisms/tree/7e5ee27021816012297c38d58088438eb7ad3a53/1.profiling-audit/figures

Plate	Batch	Perturbation	Percent Strong
218775	2020_08_24_Batch9	CB-5083 (700 nM)	65.33%
218774	2020_08_24_Batch9	DMSO (0.1%)	55.33%
218699	2020_08_24_Batch9	Ixazomib (50 nM)	87.33%
218698	2020_08_24_Batch9	DMSO (0.1%)	87.33%
218697	2020_08_24_Batch9	CB-5083 (700 nM)	75.33%
218696	2020_08_24_Batch9	DMSO (0.1%)	74.67%
218852	2020_09_09_Batch10	DMSO (0.1%)	60%
218853	2020_09_09_Batch10	CB-5083 (700 nM)	59.33%
218854	2020_09_09_Batch10	DMSO (0.1%)	52%
218855	2020_09_09_Batch10	Ixazomib (50 nM)	67.33%
218856	2020_09_09_Batch10	DMSO (0.1%)	40%
218857	2020_09_09_Batch10	CB-5083 (700 nM)	57.33%
218858	2020_09_09_Batch10	DMSO (0.1%)	71.33%
218859	2020_09_09_Batch10	Ixazomib (50 nM)	58.67%
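
For readers unfamiliar with the metric, the sketch below shows one common way to compute a percent strong score: the percentage of replicate correlations that exceed the 95th percentile of a null distribution of non-replicate correlations. That definition is an assumption here and is not spelled out in this issue; the correlation values are hypothetical and the quantile uses a simple nearest-rank rule.

```python
import math


def percent_strong(replicate_corr, null_corr, quantile_percent=95):
    """Percent of replicate correlations above a quantile of the null."""
    ordered = sorted(null_corr)
    # Nearest-rank quantile of the non-replicate (null) correlations
    index = max(0, math.ceil(quantile_percent * len(ordered) / 100) - 1)
    threshold = ordered[index]
    strong = sum(1 for r in replicate_corr if r > threshold)
    return 100.0 * strong / len(replicate_corr)


# Hypothetical correlations: null spans -0.50 to 0.49 in steps of 0.01
null_corr = [i / 100 for i in range(-50, 50)]
replicate_corr = [0.9, 0.7, 0.5, 0.3, 0.1, 0.6]
score = percent_strong(replicate_corr, null_corr)
print(f"{score:.2f}%")
```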

Limitation

All plates in both batches include the same treatment in all wells. This prevents us from normalizing per plate. We will need to normalize all plates together and then adjust for batch effects.

Compile list of top 10 or so morphology features indicative of resistance mechanism

These will be compiled using batch 3 data, by various metrics that I will define here

Add License

The repo requires an open source license!

Correlate Mechanisms of Resistance with Morphology Profiles

@mekelley asked via an email thread what I am copying and pasting below (also noting here that I received permission to do so 😸)

Can you identify the dominant morphological features that contributed to the 0.1% DMSO treated clones A and E from our earlier data (without the Costes features) and see if they’re the same dominant features from the 0.1% DMSO treated bortezomib resistant clones from group 3 (our recent data)?

What about dominant features from bortezomib resistant clones in the other groups from batch 3 (again without Costes features)?

The reason for these questions is to begin to correlate mechanisms of resistance (i.e. mutation in the target protein, PSMB5, or multidrug resistance or something else) with morphological profiles.

CellProfiler feature analysis

In a recent meeting, the team discussed a followup analysis for interrogating the signature features. We have already performed at least one feature analysis that defines all of the bz signature features (all features in the analysis) https://github.com/broadinstitute/profiling-resistance-mechanisms/blob/master/3.resistance-signature/5.visualize-signature-features.ipynb

Task 1 We'll also likely want to generate feature heatmaps (in a similar way that we define profile heatmaps) for only the bz signature features

Task 2 The final feature analysis ranks all features on their ability to separate the two classes (resistant vs. sensitive). We'll select the top discriminative features based on their ranks in the holdout set, but we should also perform this ranking procedure using all sets (training, test, validation, etc.).

The three above efforts nicely narrow down our search space from 1) all features, 2) bz features, 3) top bz features.

We can use this issue to elaborate on our approaches to form the heatmaps, to perform the ranking, and to determine effective visualizations.

Create Combined Heatmap

Currently, the heatmaps are separated by batch. We need to combine heatmaps together, generate a new figure, and note in a ColSideColor the batch number.

Notes from Meeting 11/13/19

The confluence of cells appears to be a big issue
- The current molecular pipeline is to seed at uniform density
- If we generate additional plates, seeded at different densities, we can test which densities are best for specific clones
- I will also add cell count to the subcluster morpheus heatmap (see #33)
Next steps:
- @bethac07 and team will run image analysis pipeline on new data
- @mekelley and Adi (we need to get Adi on github! 😄 ) will generate new data with different seeding concentrations (see above)
- I will send a list of the top 10-15 features (based on certain metrics) to the biologists for interpretation (@bethac07, @mekelley, Adi, and @AnneCarpenter) (see #34)

Adding CellProfiler Pipelines

In #79 @DavidStirling added CellProfiler pipelines for batches 4 and onwards. This issue will serve as a placeholder/reminder that we should add the pipelines for batches 1-3.

Efforts in installing R package of ComplexHeatmap

Script 3.resistance-signature/7.profile-heatmaps.ipynb requires the R package ComplexHeatmap, which in turn requires the Cairo package (https://anaconda.org/conda-forge/r-cairo). Cairo seems to be an OS-level package (I am not sure, though). I installed Cairo with brew but still cannot get ComplexHeatmap working.

Increase Interpretability of Signature Interpretation Figure

On a call today @mekelley and @ayberman noted that the signature interpretation figure is difficult to interpret - and I agree! Before publication we will need to clean it up to increase interpretability.

Let's use this issue to brainstorm how to improve the figure:

Migrate UMAP in `3.feature-differences` to `2.describe data`

In #52 I add notebooks to generate gct files and perform a series of UMAP visualizations. There is some overlap, which could be confusing, with notebooks in 3.feature-differences. I need to consolidate.

Can I reproduce batch-separated heatmaps?

@bethac07 generated correlation heatmaps for each analysis batch using a local morpheus implementation. I will reproduce.

Enhancement of 3.resistance-signature/0.training-test-split.ipynb

We dropped the inference set (batch 3) because of overly confluent plates and suboptimal plate design. In the notebook, I still need to output the bortezomib signature analytical set, but I can also include the new batches of data, which will serve as a better-designed inference set (included in pull request #114).

What is the best way to select Profiling Variables?

In working through #1 and #8 I thought about how I would combine data from batches together. My two options seemed to be:

Take feature union
Take feature intersection

The problem with taking the feature intersection is that fewer features are selected. The problem with taking feature union is that some variables may not be "good" features.

What is the method for cell painting variable selection? If the features are removed b/c of lack of consistency across replicates, then they should not be included in the feature union. However, if the features are removed because they are deemed redundant, then they should be included in the feature union.
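The trade-off described above can be sketched with plain Python sets. The feature names and the `dropped_as_inconsistent` bookkeeping are hypothetical; the sketch assumes we can record, per batch, which features were dropped for poor replicate consistency (as opposed to redundancy).

```python
def combine_features(selected_by_batch, how="intersection"):
    """Combine per-batch selected feature lists by set intersection or union."""
    feature_sets = [set(features) for features in selected_by_batch.values()]
    if how == "intersection":
        combined = set.intersection(*feature_sets)
    elif how == "union":
        combined = set.union(*feature_sets)
    else:
        raise ValueError(f"unknown combination strategy: {how}")
    return sorted(combined)


def union_without_inconsistent(selected_by_batch, dropped_as_inconsistent):
    """Union of selected features, minus features dropped as inconsistent in any batch."""
    union = set(combine_features(selected_by_batch, how="union"))
    flagged = set().union(*dropped_as_inconsistent.values())
    return sorted(union - flagged)


# Illustrative feature names only
selected = {
    "batch_1": ["Cells_AreaShape_Area", "Nuclei_Intensity_MeanIntensity_DNA"],
    "batch_2": ["Cells_AreaShape_Area", "Cytoplasm_Texture_Entropy_RNA_5_0"],
}
dropped = {"batch_1": ["Cytoplasm_Texture_Entropy_RNA_5_0"], "batch_2": []}

print(combine_features(selected, "intersection"))
print(combine_features(selected, "union"))
print(union_without_inconsistent(selected, dropped))
```

The third call keeps features that were removed only as redundant in some batch while excluding those flagged as inconsistent, which is the hybrid rule suggested in the paragraph above.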

@shntnu is there any way to track the decisions for feature selection?

Site-Level Profiles - Decision

We chatted today about site-level profile strategy during our checkin meeting. I'll summarize the decision and discuss implications below:

We decided to revert to well-level profiles
- We observed smaller batch effects using well-level profiles than using site-level profiles
- We don't currently know site layout so we can't adjust
- We don't know if site-level profiles will actually provide any additional signal benefit
We will generate well-level signatures, and then apply them to single cell profiles
- Doing this will help us to determine if the signature we find is specific to bortezomib resistance

Site level results discussed in #59, #60, #61, #63, and #69

Bulk profile normalization strategy

Goal

To build a bulk profile signature to distinguish wildtype clones from clone A/E resistant clones.

Challenges

No single plate exists to facilitate this analysis, which requires us to combine plates. The current normalization strategy makes it difficult to merge plates.
Batches 4 - 7 have behaved poorly in the past. It is unclear why.

Requirement

We need the wildtype clones and clone A/E resistant clones to be aligned

Approach

Simple

Plate-based normalization with wildtype parental lines only, and then merge batch 4, 5, 6, 7, and 8.
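A minimal sketch of the simple approach, assuming a z-score against each plate's WT parental wells. The column names (`Metadata_Plate`, `is_wt_parental`), the toy values, and the choice of z-scoring are assumptions for illustration, not the normalization used in the pipeline.

```python
import pandas as pd


def normalize_to_plate_controls(
    profiles, features, plate_col="Metadata_Plate", control_col="is_wt_parental"
):
    """Z-score each plate's features against that plate's WT parental wells."""
    normalized = []
    for _, plate_df in profiles.groupby(plate_col):
        controls = plate_df.loc[plate_df[control_col], features]
        scaled = plate_df.copy()
        # Center and scale using only the control wells from the same plate
        scaled[features] = (plate_df[features] - controls.mean()) / controls.std(ddof=0)
        normalized.append(scaled)
    # Merge the per-plate results and restore the original row order
    # (assumes the index of `profiles` is unique)
    return pd.concat(normalized).loc[profiles.index]


# Toy data: two plates with the same structure but a large plate offset
profiles = pd.DataFrame(
    {
        "Metadata_Plate": ["p4", "p4", "p4", "p5", "p5", "p5"],
        "is_wt_parental": [True, True, False, True, True, False],
        "feature_x": [1.0, 3.0, 5.0, 11.0, 13.0, 15.0],
    }
)
merged = normalize_to_plate_controls(profiles, ["feature_x"])
print(merged)
```

In the toy data the plate offset disappears after normalization, so the merged plates share a common reference. A feature with zero variance in the control wells would produce infinite or missing values and would need to be handled separately.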

2-Step-Sphering

W_8 -> sphering transform learned on D_8

D_4
D_5
D_7
D_8

W_8 * D_8 -> has a unit covariance matrix

W_4 -> sphering transform learned on W_8 * D_4 (and not D_4)

Corrected data:
D’_8 <- W_8 * D_8
D’_4 <- W_4 * W_8 * D_4
D’_5 <- W_5 * W_8 * D_5
D’_7 <- W_7 * W_8 * D_7

All data
A_4
A_5
A_7
A_8

A’_4 <- W_4 * W_8 * A_4
A’_5
A’_7
A’_8 <- W_8 * A_8
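The two-step scheme can be sketched in NumPy as follows. This is a sketch under stated assumptions: it uses ZCA whitening with a small regularizer `eps`, centers the data before each transform, and stores profiles as rows, so the transform is applied by right-multiplication rather than as W * D. The toy arrays stand in for the reference profiles (D) and all profiles (A) of batches 4 and 8.

```python
import numpy as np


def learn_sphering(X, eps=1e-6):
    """Return (mean, W) so that (X - mean) @ W has approximately unit covariance (ZCA)."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, W


def apply_sphering(X, mean, W):
    """Center X with the learned mean and apply the sphering transform."""
    return (X - mean) @ W


# Toy stand-ins (hypothetical data): rows are profiles, columns are features
rng = np.random.default_rng(0)
mixing = np.triu(np.ones((5, 5)))  # introduces correlation between features
D_8 = rng.normal(size=(300, 5)) @ mixing
D_4 = rng.normal(size=(300, 5)) @ mixing * 1.5 + 2.0
A_4 = np.vstack([D_4, rng.normal(size=(50, 5)) @ mixing * 1.5 + 2.5])

# Step 1: W_8 is learned on D_8
mean_8, W_8 = learn_sphering(D_8)
D_8_corrected = apply_sphering(D_8, mean_8, W_8)

# Step 2: W_4 is learned on W_8 * D_4 (not on the raw D_4)
D_4_step1 = apply_sphering(D_4, mean_8, W_8)
mean_4, W_4 = learn_sphering(D_4_step1)
D_4_corrected = apply_sphering(D_4_step1, mean_4, W_4)

# Apply both transforms to all batch 4 data: A'_4 <- W_4 * W_8 * A_4
A_4_corrected = apply_sphering(apply_sphering(A_4, mean_8, W_8), mean_4, W_4)

print(np.round(np.cov(D_4_corrected, rowvar=False), 3))
```

The same two calls would be repeated for batches 5 and 7, each with its own second-step transform learned on the batch 8-transformed reference profiles.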

Concern

One concern is that with such a complicated normalization strategy, applying this method to any future data will be extremely challenging.

Remaining analyses before writing the paper

A note from Adi on the remaining analyses required for a paper:

Building the bortezomib sensitivity signature
- Cell line selection and clone generation
- Any quality control analysis that went into this
- Generating the Bz signature (WT clones 1-5, BZ clones A, E, 1-5)
- Applying the Bz signature to other WT (10, 12-15) and Bz-resistant (6-10) clones to see that it applies well to clones the signature wasn't trained on.
Applying the Bz signature to Ixazomib resistant clones
- expectation: signature should pick up Ixazomib resistant clones somewhat, but not be a perfect fit
Applying the Bz signature to CB-5083 resistant clones
- expectation: signature should pick up CB-5083 resistant clones somewhat, but fit should be similar or worse than Ixazomib

Investigate Bortezomib signature in the context of other proteasomal inhibitors

We've received some insightful feedback from reviewers on our paper. Here, we will address two major comments:

Reviewer 2 - Major Comment 6.
Interestingly, the Bortezomib signature is specific to the drug and not a broad range of proteasomal inhibitors. However, seeing the common features between all the proteasomal inhibitors would be interesting.

Reviewer 3 - Major Comment 4
There was some predictive ability of the Bortezomib Signature for ixazomib resistance. Were there some features that were correlated with IX-resistance, i.e. UPS pathway, versus specific to bortezomib? Do the features suggest anything about resistance mechanisms or is the feature set too abstruse to interpret?

Next steps: @yhan8 will outline the analysis that's needed, seeking inputs from @gwaybio and @shntnu as needed, and then perform the analysis.

Identifying features that best distinguish between low and high dose across clones

The biggest differences between samples occur between the 0.7 and 7 nM doses. Develop a strategy to identify which features are the most significantly different.

Refactor Repository

Data is beginning to pour in. I will need to refactor the repository to reduce code redundancy. See broadinstitute/image-profiling-workflow-template#2

Notes from Meeting 10/16

Writing progress report
- Megan and Tarun will send out for comments on final version
- No figures necessary
- Beth will add language around wild-type clonal vs. resistance signatures (4 subclusters in heatmap)
Mixture experiment
- If we titrate wild-type and resistant clones into the same dish, can we distinguish?
- Will help to set baseline of models predicting one from the other
- Add in some ground truth fluorescent marker
- We can simulate this experiment in silico first
Collecting more data
- More bortezomib plates (with a mixture of wild-type and resistant clones on the same plate)
- Low dose treatments to reduce dead cell effect and determine when we can start to isolate resistant populations
Focus on Proteasome inhibitors
- Collect Cell Painting using other proteasome inhibitors (one that just failed clinical trials, another that has been approved)
- Answer the question: is the resistance signature the same across inhibitors? Is there a universal signature that represents proteasome inhibition resistance? What proportion of untreated cells have these conditions?
Big Picture: Can we characterize all common resistance signatures via Cell Painting?
- There is a paper in Cancer Cell that talks about them (there are surprisingly few core resistance mechanisms). @mekelley, can you add this paper here? I couldn't seem to find it.

Add Cell Count to SubCluster Morpheus Heatmap

In #25 I performed a subcluster analysis to identify differential features across mutant and wild-type clones in each cluster independently.

There is one big red cluster in the heatmap. We noticed that random example images from this block had high confluence (see below).

To more systematically address this question, I will add cell count to morpheus heatmaps.

Cells in Big Red Cluster

WT Clones in Well C02

Mut Clones in Well B06

WT Parental Clones in D11

Cells not in big red cluster

Mut Clones in D09

WT Clones in E05

UMAP bug

In step 2, I ran into a new bug:

[NbConvertApp] Converting notebook 1.merge-datasets-gct.ipynb to html
[NbConvertApp] Writing 600777 bytes to scripts/html/1.merge-datasets-gct.html
[NbConvertApp] Converting notebook 2.umap-aggregate-profiles.ipynb to html
Traceback (most recent call last):
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/bin/jupyter-nbconvert", line 11, in <module>
    sys.exit(main())
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/jupyter_core/application.py", line 254, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/traitlets/config/application.py", line 845, in launch_instance
    app.start()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/nbconvertapp.py", line 350, in start
    self.convert_notebooks()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/nbconvertapp.py", line 524, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/nbconvertapp.py", line 489, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/nbconvertapp.py", line 418, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/exporter.py", line 181, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/exporter.py", line 199, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/html.py", line 119, in from_notebook_node
    return super().from_notebook_node(nb, resources, **kw)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 369, in from_notebook_node
    nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/exporter.py", line 143, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/exporters/exporter.py", line 318, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 79, in preprocess
    self.execute()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/util.py", line 74, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/util.py", line 53, in just_run
    return loop.run_until_complete(coro)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/client.py", line 554, in async_execute
    cell, index, execution_count=self.code_cells_executed + 1
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 123, in async_execute_cell
    cell, resources = self.preprocess_cell(cell, self.resources, cell_index)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 146, in preprocess_cell
    cell = run_sync(NotebookClient.async_execute_cell)(self, cell, index, store_history=self.store_history)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/util.py", line 74, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/util.py", line 53, in just_run
    return loop.run_until_complete(coro)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nest_asyncio.py", line 70, in run_until_complete
    return f.result()
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/asyncio/futures.py", line 178, in result
    raise self._exception
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/asyncio/tasks.py", line 223, in __step
    result = coro.send(None)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/client.py", line 857, in async_execute_cell
    self._check_raise_for_error(cell, exec_reply)
  File "/Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/nbclient/client.py", line 760, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
import os
import numpy as np
import pandas as pd
import umap

import plotnine as gg

from pycytominer import feature_select
from pycytominer.cyto_utils import infer_cp_features
------------------

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/errors.py in new_error_context(fmt_, *args, **kwargs)
    743     try:
--> 744         yield
    745     except NumbaError as e:

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_block(self, block)
    229                                    loc=self.loc, errcls_=defaulterrcls):
--> 230                 self.lower_inst(inst)
    231         self.post_block(block)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_inst(self, inst)
    327             val = self.lower_assign(ty, inst)
--> 328             self.storevar(val, inst.target.name)
    329

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in storevar(self, value, name)
   1277                                                           name=name)
-> 1278             raise AssertionError(msg)
   1279

AssertionError: Storing i64 to ptr of i32 ('dim'). FE type int32

During handling of the above exception, another exception occurred:

LoweringError                             Traceback (most recent call last)
<ipython-input-1-886110eebc29> in <module>
      2 import numpy as np
      3 import pandas as pd
----> 4 import umap
      5
      6 import plotnine as gg

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/__init__.py in <module>
----> 1 from .umap_ import UMAP
      2
      3 # Workaround: https://github.com/numba/numba/issues/3341
      4 import numba
      5

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/umap_.py in <module>
     52 from umap.spectral import spectral_layout
     53 from umap.utils import deheap_sort, submatrix
---> 54 from umap.layouts import (
     55     optimize_layout_euclidean,
     56     optimize_layout_generic,

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py in <module>
     34         "result": numba.types.float32,
     35         "diff": numba.types.float32,
---> 36         "dim": numba.types.int32,
     37     },
     38 )

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/decorators.py in wrapper(func)
    219             with typeinfer.register_dispatcher(disp):
    220                 for sig in sigs:
--> 221                     disp.compile(sig)
    222                 disp.disable_compile()
    223         return disp

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, sig)
    907                 with ev.trigger_event("numba:compile", data=ev_details):
    908                     try:
--> 909                         cres = self._compiler.compile(args, return_type)
    910                     except errors.ForceLiteralArg as e:
    911                         def folded(args, kws):

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, args, return_type)
     77
     78     def compile(self, args, return_type):
---> 79         status, retval = self._compile_cached(args, return_type)
     80         if status:
     81             return retval

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_cached(self, args, return_type)
     91
     92         try:
---> 93             retval = self._compile_core(args, return_type)
     94         except errors.TypingError as e:
     95             self._failed_cache[key] = e

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_core(self, args, return_type)
    109                                       args=args, return_type=return_type,
    110                                       flags=flags, locals=self.locals,
--> 111                                       pipeline_class=self.pipeline_class)
    112         # Check typing error if object mode is used
    113         if cres.typing_error is not None and not flags.enable_pyobject:

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    604     pipeline = pipeline_class(typingctx, targetctx, library,
    605                               args, return_type, flags, locals)
--> 606     return pipeline.compile_extra(func)
    607
    608

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(self, func)
    351         self.state.lifted = ()
    352         self.state.lifted_from = None
--> 353         return self._compile_bytecode()
    354
    355     def compile_ir(self, func_ir, lifted=(), lifted_from=None):

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in _compile_bytecode(self)
    413         """
    414         assert self.state.func_ir is None
--> 415         return self._compile_core()
    416
    417     def _compile_ir(self):

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in _compile_core(self)
    393                 self.state.status.fail_reason = e
    394                 if is_final_pipeline:
--> 395                     raise e
    396         else:
    397             raise CompilerError("All available pipelines exhausted")

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler.py in _compile_core(self)
    384             res = None
    385             try:
--> 386                 pm.run(self.state)
    387                 if self.state.cr is not None:
    388                     break

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_machinery.py in run(self, state)
    337                     (self.pipeline_name, pass_desc)
    338                 patched_exception = self._patch_error(msg, e)
--> 339                 raise patched_exception
    340
    341     def dependency_analysis(self):

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_machinery.py in run(self, state)
    328                 pass_inst = _pass_registry.get(pss).pass_inst
    329                 if isinstance(pass_inst, CompilerPass):
--> 330                     self._runPass(idx, pass_inst, state)
    331                 else:
    332                     raise BaseException("Legacy pass in use")

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     33         def _acquire_compile_lock(*args, **kwargs):
     34             with self:
---> 35                 return func(*args, **kwargs)
     36         return _acquire_compile_lock
     37

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_machinery.py in _runPass(self, index, pss, internal_state)
    287             mutated |= check(pss.run_initialization, internal_state)
    288         with SimpleTimer() as pass_time:
--> 289             mutated |= check(pss.run_pass, internal_state)
    290         with SimpleTimer() as finalize_time:
    291             mutated |= check(pss.run_finalizer, internal_state)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/compiler_machinery.py in check(func, compiler_state)
    260
    261         def check(func, compiler_state):
--> 262             mangled = func(compiler_state)
    263             if mangled not in (True, False):
    264                 msg = ("CompilerPass implementations should return True/False. "

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/typed_passes.py in run_pass(self, state)
    461
    462         # TODO: Pull this out into the pipeline
--> 463         NativeLowering().run_pass(state)
    464         lowered = state['cr']
    465         signature = typing.signature(state.return_type, *state.args)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/typed_passes.py in run_pass(self, state)
    382                 lower = lowering.Lower(targetctx, library, fndesc, interp,
    383                                        metadata=metadata)
--> 384                 lower.lower()
    385                 if not flags.no_cpython_wrapper:
    386                     lower.create_cpython_wrapper(flags.release_gil)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower(self)
    134         if self.generator_info is None:
    135             self.genlower = None
--> 136             self.lower_normal_function(self.fndesc)
    137         else:
    138             self.genlower = self.GeneratorLower(self)

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_normal_function(self, fndesc)
    188         # Init argument values
    189         self.extract_function_arguments()
--> 190         entry_block_tail = self.lower_function_body()
    191
    192         # Close tail of entry block

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_function_body(self)
    214             bb = self.blkmap[offset]
    215             self.builder.position_at_end(bb)
--> 216             self.lower_block(block)
    217         self.post_lower()
    218         return entry_block_tail

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/lowering.py in lower_block(self, block)
    228             with new_error_context('lowering "{inst}" at {loc}', inst=inst,
    229                                    loc=self.loc, errcls_=defaulterrcls):
--> 230                 self.lower_inst(inst)
    231         self.post_block(block)
    232

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/contextlib.py in __exit__(self, type, value, traceback)
    128                 value = type()
    129             try:
--> 130                 self.gen.throw(type, value, traceback)
    131             except StopIteration as exc:
    132                 # Suppress StopIteration *unless* it's the same exception that

~/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/numba/core/errors.py in new_error_context(fmt_, *args, **kwargs)
    749         newerr = errcls(e).add_context(_format_msg(fmt_, args, kwargs))
    750         tb = sys.exc_info()[2] if numba.core.config.FULL_TRACEBACKS else None
--> 751         raise newerr.with_traceback(tb)
    752
    753

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../../../../../../miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py", line 52:
def rdist(x, y):
    <source elided>
    result = 0.0
    dim = x.shape[0]
    ^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=<built-in function getitem>)" at /Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py (52)
LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../../../../../../miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py", line 52:
def rdist(x, y):
    <source elided>
    result = 0.0
    dim = x.shape[0]
    ^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=<built-in function getitem>)" at /Users/gway/miniconda3/envs/resistance-mechanisms/lib/python3.7/site-packages/umap/layouts.py (52)
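The `LoweringError` is raised inside numba while `umap` is being imported, which suggests (but does not prove) a version incompatibility between the installed `umap-learn` and `numba` packages rather than a problem in the notebook itself. A minimal diagnostic sketch that records the installed versions for the bug report follows. The package list is an assumption, and on Python 3.7 the sketch relies on the `importlib_metadata` backport being installed.

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python < 3.8: fall back to the backport, if installed
    from importlib_metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, found in installed_versions(["umap-learn", "numba", "llvmlite", "numpy"]).items():
    print(f"{name}: {found or 'not installed'}")
```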

Apply UMAP to combined dataset and generate figure

The UMAP analysis will help to determine batch concordance

Notes from Meeting 1/8/20

Documenting notes here that I previously transcribed

Cloning procedure was clarified in #45
Next steps computationally
- Look at features that differentiate resistant and wild-type
  - Are these features consistent in different batches?
- Figure out the best way to merge datasets together
Next steps molecularly
- Collect cell painting data in a new cell line (this will require additional troubleshooting to onboard new cell line)
  - ~2 to 4 months for new cell line
    - Broad may help with acquisition?
- Collect resistance mechanisms from new drug
  - ~2 months for another drug
Potential high-level outcomes:
- We find a clear resistance signature that helps form hypothesis
- The resistance signature is a mystery and we'll need to figure out next steps

Signature Analysis - PSMB5 and generic resistance signatures

In #57 I add a signature analysis. The purpose of this analysis is to identify morphology features that are significantly different between wildtype and resistant clones. The next step is to apply the signatures to other profiles to 1) validate the approach and 2) predict the resistance status of different samples.

This analysis was prompted by #49 .

I will describe the approach, results, and conclusions in this issue.

Site Level Audit - Results

In #60 I add code and results for performing an analysis audit (determining profile reproducibility and plate effects). I generate site level profiles in #59 for a subset of batches (batch 1 20x, batch 2, batch 5, batch 6, and batch 7). I show results below:

Batch 1 - 20X

Cell Counts

Replicate Reproducibility

Batch 2

Cell Counts

Replicate Reproducibility

Batch 5

Cell Counts

Replicate Reproducibility

Batch 6

Cell Counts - Plate 217760

Replicate Reproducibility - Plate 217760

Cell Counts - Plate 217762

Replicate Reproducibility - Plate 217762

Batch 7

Cell Counts - Plate 217766

Replicate Reproducibility - Plate 217766

Cell Counts - Plate 217768

Replicate Reproducibility - Plate 217768

Summary

There are many cells in nearly every site (some sites are completely missing) and the impact of plate effects appears minimal. I will proceed with analyzing these data downstream using site-level profiles.

Document results of profile reproducibility at 40x vs 20x

@bethac07 reported this:

I've attached the PNGs of the clustering (and Shantanu, for you the GCTs)- on the left inserted here in the email we have the clustering from the 20X images, and on the right we have the 40x images. On the whole, they're actually remarkably similar, and pretty sensible from what we know about the biology- the biggest supercluster at the top left for both are the untreated and the low dose wells of all three cell clones, the supercluster at the other end are the higher doses of drug (and for WT, the intermediate dose), and for the resistant clones the middle dose produces an intermediate phenotype. Overall, I'm pretty encouraged by this- it means we're getting solid profiles (congrats guys!), and to me this doesn't suggest there is a "wrong" choice with respect to 20 vs 40x. There certainly seems to be more substructure in the 20X, but I'm not entirely certain how much we should really read into that.

20x:

40x

20x GCT (rename to .gct)
2019_02_15_Batch1_20X_collapsed.gct.txt

40x (rename to .gct)
2019_02_15_Batch1_40X_collapsed.gct.txt

Since this is the only project where we collected 20x vs 40x data, it will be very useful to quantify this result.

@gwaygenomics Any ideas on how to do this? I'd imagine a one-number summary will be hard, since it probably makes sense to report across all doses.
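One way this could be quantified, offered as a sketch rather than an agreed method: collapse each magnification to one median profile per cell line and dose, then compute the Pearson correlation between the matched 20x and 40x profiles over their shared features. The `CellLine` and `Dosage` column names follow the platemap used in this project; the feature columns and toy data are hypothetical, and all non-key columns are assumed to be numeric features.

```python
import numpy as np
import pandas as pd


def magnification_agreement(profiles_20x, profiles_40x, keys=("CellLine", "Dosage")):
    """Pearson correlation between matched 20x and 40x median profiles."""
    keys = list(keys)
    shared = [
        col for col in profiles_20x.columns
        if col in profiles_40x.columns and col not in keys
    ]
    median_20x = profiles_20x.groupby(keys)[shared].median()
    median_40x = profiles_40x.groupby(keys)[shared].median()
    common = median_20x.index.intersection(median_40x.index)
    # Row-wise correlation across features for each (cell line, dose) pair
    r = median_20x.reindex(common).corrwith(median_40x.reindex(common), axis=1)
    return r.rename("pearson_r").reset_index()


# Toy data: the 40x profiles are a linear rescaling of the 20x profiles
rng = np.random.default_rng(0)
meta = pd.DataFrame(
    {"CellLine": ["WT", "WT", "CloneA", "CloneA"], "Dosage": [0, 0.7, 0, 0.7]}
)
feats = pd.DataFrame(
    rng.normal(size=(4, 6)), columns=[f"feature_{i}" for i in range(6)]
)
toy_20x = pd.concat([meta, feats], axis=1)
toy_40x = pd.concat([meta, feats * 2 + 1], axis=1)

agreement = magnification_agreement(toy_20x, toy_40x)
per_dose = agreement.groupby("Dosage")["pearson_r"].mean()
print(agreement)
print(per_dose)
```

Reporting `per_dose` keeps one number per dose instead of a single overall summary, in line with the concern above that agreement should be reported across all doses.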

We no longer observe four clear "groups" in batch 3 data

Previously, as described in #25, we observed four distinct groups in batch 3 data. Interestingly, these four groups seemed to have similar representation of resistant and wildtype clones.

In #25, I describe results from a subcluster analysis that was performed to determine if any features were consistently different between resistant and wild-type clones. As shown here #25 (comment), we did not observe many features with consistent differences. This suggests that core differences between wild-type and resistant clones are not consistent across these groups.

Additionally, it was determined that one driving force behind the groupings was wonky normalized profiles, primarily because of Costes features. So, starting in #32, I began reprocessing all profiles using a unified pipeline based on pycytominer. In an upcoming pull request, I will add all software, data, and results, but I will show below that we no longer observe four clear groups. There is still definitely interesting structure in the data, but the groupings are not as striking, and an analysis that uses these groups is not likely to find consistent and reproducible patterns.

Batch 3 Data

Note that batch 3 profiles were all untreated. For more details on available data, see #40.

Conclusion

Recently, in #49, it was hypothesized that data from clones A and E (batches 1 and 2) may harbor similar resistance features as other resistant clones (BZ001, BZ002, BZ003, and BZ004). We know that clones A and E have PSMB5 mutations, and these mutations confer bortezomib resistance (Lu and Wang). If we can link morphology signatures of A and E clones to the other resistant clones, this may suggest that other resistant clones also have PSMB5 mutations.

One way to perform this analysis is to see where clones A and E fall within each "group". Since the grouping structure does not appear to be as consistent as before, I think an alternative strategy will be more effective. I will outline an alternative strategy in a separate issue!

Bortezomib signature platemap effect?

Confirm that platemap is not influencing the signature we observed in #94

Potential Inconsistency in Platemap for Batch 1 (20X)

Hi @mekelley and @ayberman,

I am working through implementing a PSMB5 signature analysis (described briefly here #49 (comment))

This requires using data from early batches (batch 1 20x and batch 2). These are the only batches to have profiles for clones A and E (the only confirmed PSMB5 mutations), correct?

I noticed that there is a bit of a wonky platemap:

Copy and pasted below:

plate_map_name	well_position	CellLine	Dosage
PlateMap_HCT116bortezomib	B03	WT	0
PlateMap_HCT116bortezomib	B04	WT	0
PlateMap_HCT116bortezomib	B05	WT	0
PlateMap_HCT116bortezomib	B06	CloneA	0
PlateMap_HCT116bortezomib	B07	CloneA	0
PlateMap_HCT116bortezomib	B08	CloneA	0
PlateMap_HCT116bortezomib	B09	CloneE	0
PlateMap_HCT116bortezomib	B10	CloneE	0
PlateMap_HCT116bortezomib	B11	CloneE	0
PlateMap_HCT116bortezomib	C03	WT	0.7
PlateMap_HCT116bortezomib	C04	WT	0.7
PlateMap_HCT116bortezomib	C05	WT	0.7
PlateMap_HCT116bortezomib	C06	CloneA	0.7
PlateMap_HCT116bortezomib	C07	CloneA	0.7
*PlateMap_HCT116bortezomib	F08	CloneA	0.7
PlateMap_HCT116bortezomib	C09	CloneE	0.7
PlateMap_HCT116bortezomib	C10	CloneE	0.7
PlateMap_HCT116bortezomib	C11	CloneE	0.7
PlateMap_HCT116bortezomib	D03	WT	7
PlateMap_HCT116bortezomib	D04	WT	7
PlateMap_HCT116bortezomib	D05	WT	7
PlateMap_HCT116bortezomib	D06	CloneA	7
PlateMap_HCT116bortezomib	D07	CloneA	7
PlateMap_HCT116bortezomib	D08	CloneA	7
PlateMap_HCT116bortezomib	D09	CloneE	7
PlateMap_HCT116bortezomib	D10	CloneE	7
PlateMap_HCT116bortezomib	D11	CloneE	7
PlateMap_HCT116bortezomib	E03	WT	70
PlateMap_HCT116bortezomib	E04	WT	70
PlateMap_HCT116bortezomib	E05	WT	70
PlateMap_HCT116bortezomib	E06	CloneA	70
PlateMap_HCT116bortezomib	E07	CloneA	70
PlateMap_HCT116bortezomib	E08	CloneA	70
PlateMap_HCT116bortezomib	E09	CloneE	70
PlateMap_HCT116bortezomib	E10	CloneE	70
PlateMap_HCT116bortezomib	E11	CloneE	70

This doesn't impact analysis much at all; I am just confirming that the F above should be a C. Also, on the chance that this is indeed a small mistake - this stuff happens literally all the time, so no worries! 😹 It is good to confirm and clear up once identified. (I didn't catch it in my original processing!)
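A check like this can be automated. The sketch below is a minimal example, assuming that each dose occupies a single plate row (as in the layout pasted above); `find_row_outliers` is a hypothetical helper, not a function from this repository or from pycytominer. It flags any well whose row letter differs from the majority row for its dose.

```python
from collections import Counter, defaultdict


def find_row_outliers(platemap_rows):
    """Return wells whose plate row differs from the majority row for their dose.

    `platemap_rows` is a list of (well_position, cell_line, dosage) tuples.
    Assumption: each dose occupies one plate row, as in the Batch 1 layout.
    """
    rows_by_dose = defaultdict(Counter)
    for well, _, dose in platemap_rows:
        rows_by_dose[dose][well[0]] += 1

    outliers = []
    for well, cell_line, dose in platemap_rows:
        expected_row = rows_by_dose[dose].most_common(1)[0][0]
        if well[0] != expected_row:
            outliers.append((well, cell_line, dose, expected_row))
    return outliers


# The dose 0.7 rows of the Batch 1 platemap pasted above.
platemap = [
    ("C03", "WT", 0.7), ("C04", "WT", 0.7), ("C05", "WT", 0.7),
    ("C06", "CloneA", 0.7), ("C07", "CloneA", 0.7), ("F08", "CloneA", 0.7),
    ("C09", "CloneE", 0.7), ("C10", "CloneE", 0.7), ("C11", "CloneE", 0.7),
]

for well, cell_line, dose, expected_row in find_row_outliers(platemap):
    # prints: F08 (CloneA, dose 0.7) is not in expected row C
    print(f"{well} ({cell_line}, dose {dose}) is not in expected row {expected_row}")
```

The majority-vote rule only works when most wells for a dose are labeled correctly, so it is a sanity check rather than a replacement for confirming the layout with whoever prepared the plate.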

Data Summary 2019

We are nearing the end of 2019, and we are starting to increase data collection. This issue will document the data we have currently collected.

Overall Summary

In summary, we've made a lot of progress in optimizing data collection, and a lot of the profiling results look promising. For example, we've identified some morphology features that are consistently different between mutant and wild-type clones. We can use the current data to finalize optimal conditions and optimal plate layouts so that we can efficiently scale up in 2020.

We can decide to scale up to profile different bortezomib-resistant clones in HCT116. This would answer how many resistance mechanisms this system develops. We could also scale up to test different proteasome inhibitors. The hypothesis there is that HCT116 cells develop the same resistance mechanism against any proteasome inhibitor. We could also scale to different cell lines, although this may be difficult since we'll have to optimize conditions again. The overall goal is to be able to predict resistance in single cells based on cell morphology. More exciting progress and questions to attack in 2020! 👀

cc @bethac07 @mekelley @shntnu @AnneCarpenter

At a Glance

HCT116 Cell Line (colorectal cancer), 2 treatments (DMSO control and Bortezomib (proteasome inhibitor)), 7 batches of data, 633 total profiles

We have noted a potential issue in the number of cells in each well. High confluence may lead to incorrect segmentation and inaccurate profiles. This is one issue that we are working towards solving. I note the size of the single cell profiles (.sqlite file) for each plate. The bigger the size, the higher the confluence.

Batch	Plate	sqlite file size
2019_02_15_Batch1_20X	HCT116bortezomib	11G
2019_02_15_Batch1_40X	HCT116bortezomib	4.8G
2019_03_20_Batch2	207106_exposure320	7.2G
2019_06_25_Batch3	WTClones	23G
2019_06_25_Batch3	MutClones	26G
2019_11_11_Batch4	WTmut04hWed	56G
2019_11_11_Batch4	WTmut04hTh	56G
2019_11_19_Batch5	217755	37G
2019_11_20_Batch6	217762	28G
2019_11_20_Batch6	217760	48G
2019_11_22_Batch7	217768	17G
2019_11_22_Batch7	217766	39G

Traditionally, these files are between 10 and 25 GB in 384 well plates. These are only 96 well plates and the files are generally much larger.
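As a rough illustration, the table above can be screened against the upper end of the traditional range. This is a minimal sketch: the 25 GB cutoff is taken from the sentence above, but using it as a flagging threshold is an assumption, and file size is only a proxy for confluence. `flag_large_plates` is a hypothetical helper.

```python
# (batch, plate, sqlite size in GB), copied from the table above.
SQLITE_SIZES_GB = [
    ("2019_02_15_Batch1_20X", "HCT116bortezomib", 11),
    ("2019_02_15_Batch1_40X", "HCT116bortezomib", 4.8),
    ("2019_03_20_Batch2", "207106_exposure320", 7.2),
    ("2019_06_25_Batch3", "WTClones", 23),
    ("2019_06_25_Batch3", "MutClones", 26),
    ("2019_11_11_Batch4", "WTmut04hWed", 56),
    ("2019_11_11_Batch4", "WTmut04hTh", 56),
    ("2019_11_19_Batch5", "217755", 37),
    ("2019_11_20_Batch6", "217762", 28),
    ("2019_11_20_Batch6", "217760", 48),
    ("2019_11_22_Batch7", "217768", 17),
    ("2019_11_22_Batch7", "217766", 39),
]

# Assumed threshold: upper end of the typical 384-well range noted above.
TYPICAL_MAX_GB = 25


def flag_large_plates(sizes, threshold_gb=TYPICAL_MAX_GB):
    """Return the plates whose sqlite file exceeds the threshold."""
    return [(batch, plate, gb) for batch, plate, gb in sizes if gb > threshold_gb]


flagged = flag_large_plates(SQLITE_SIZES_GB)
for batch, plate, gb in flagged:
    print(f"{batch} / {plate}: {gb}G exceeds {TYPICAL_MAX_GB}G")
print(f"{len(flagged)} of {len(SQLITE_SIZES_GB)} plates flagged")
```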

Current Summary

Initial Testing

Batch 1 - tested magnification
- We decided on 20X
Batch 2 - Replicate of batch 1 using only 20x data

Measure different clones

Batch 3 - Tested a bunch of different wildtype and mutation clones

Measure different time points

Batch 4 - Tested one extra day of growth (the cells in this batch might be too confluent to use)

Measure with lower confluence

Batch 5 - We tried to measure the same layout as batch 4 but with lower cell density

Test different confluence levels

Batch 6 - Testing two different plating densities
Batch 7 - Testing two different plating densities

Batch Details

Batch 1 - Acquired on 15 February 2019

Batch 1 data was acquired using two different magnifications (20X and 40X) (TODO see #28). The two batches are: 2019_02_15_Batch1_20X and 2019_02_15_Batch1_40X.

Notes - Batch 1

There were three cell lines tested: CloneA, CloneE, and WT.
The treatment tested was: Bortezomib
There were four doses tested: 0.0, 0.7, 7.0, and 70.0
The 0.0 dose represents 0.1% DMSO (control vehicle only)
Each profile was acquired in triplicate

Batch 2 - Acquired on 20 March 2019 (`2019_03_20_Batch2`)

Batch 2 data had the same data acquired as Batch 1. Batch 2 tested the same cell lines, the same perturbations, the same doses, and the same number of replicates as Batch 1.

Batch 3 - Acquired on 25 June 2019 (`2019_06_25_Batch3`)

Batch 3 data saw a shift in data collection. These cells have not undergone any treatment (not even DMSO/control vehicle). There were two plates (MutClones and WTClones). Each plate had many different wildtype and mutant clones. There were three wildtype parental lines acquired on both plates.

Notes - Batch 3

There were 18 mutant clones profiled: BZ001, BZ002, BZ003, ...
There were 15 wildtype clones profiled: WT001, WT002, WT003, ...
Wildtype parental profiles were collected on both plates
Profiles were acquired in triplicate

Batch 4 - Acquired 11 November 2019 (`2019_11_11_Batch4`)

Batch 4 also saw a shift in data collection. Cells were grown for 48h before fixation on both plates, but the plates were seeded at different densities (10.5x10^3 cells/well for WTmut04hWed [plate #217744] and 7x10^3 cells/well for WTmut04hTh [plate #217748]). We've also now started collecting wildtype parental clones on every plate with 9 replicates.

Notes - Batch 4

These files are HUGE (56G) - there are too many cells here for the profiles to be reliable.
The plates were identical
Each profile was treated with DMSO or Bortezomib (7 nM)
Four mutant clones were tested (BZ001, BZ008, BZ017, BZ018)
Four wildtype clones were tested (WT002, WT008, WT009, WT011)
Wildtype parental lines were acquired on both plates, with both DMSO and bortezomib treatments
All profiles were captured in triplicate, except for wildtype parental lines treated with DMSO. These were collected 9 times.

Batch 5 - Acquired 19 November 2019 (`2019_11_19_Batch5`)

Batch 5 was the same experimental design as Batch 4. Batch 5 is a bit smaller than batch 4, but still very large (37G). Batch 5 was plated at 5x10^3 cells/well.

Notes - Batch 5

There is only one plate measured (217755)
Bortezomib and DMSO treatments
The same clones and replicate numbers as Batch 4

Batch 6 - Acquired 20 November 2019 (`2019_11_20_Batch6`)

Batch 6 was the same experimental design as Batches 4 and 5. There were two plates acquired in Batch 6: 217760 and 217762. We acquired brightfield images of these plates as well.

Notes - Batch 6

Bortezomib and DMSO treatments
The same clones and replicate numbers as Batches 4 and 5
Two different cell counts initially plated (217760 was 48G and 217762 was 28G)
Plated 5x10^3 cells/well in plate 217760 (imaged 5 channels)
- Plate 217761 is the brightfield of plate 217760
Plated 2.5x10^3 cells/well in plate 217762 (imaged 5 channels)
- Plate 217763 is the brightfield of plate 217762

Batch 7 - Acquired 22 November 2019 (`2019_11_22_Batch7`)

Batch 7 was the same experimental design as Batches 4, 5, and 6. There were two plates acquired in Batch 7: 217766 and 217768. Brightfield was also captured for this batch.

Notes - Batch 7

Bortezomib and DMSO treatments
The same clones and replicate numbers as Batches 4, 5, and 6
Two different cell counts initially plated (217766 was 39G and 217768 was 17G)
Plated 5x10^3 cells/well in plate 217766 (imaged 5 channels)
- Plate 217767 is the brightfield of plate 217766
Plated 2.5x10^3 cells/well in plate 217768 (imaged 5 channels)
- Plate 217769 is the brightfield of plate 217768
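To see how file size tracks plating density across Batches 4 to 7, the numbers in the notes above can be combined into a simple ratio. This is only a back-of-the-envelope sketch: the ratio (GB of sqlite file per 1,000 cells plated per well) is a made-up summary for illustration, and it ignores differences between batches such as growth time.

```python
# (plate, cells plated per well, sqlite size in GB), from the batch notes
# and the file size table in this issue.
PLATES = [
    ("WTmut04hWed", 10500, 56),
    ("WTmut04hTh", 7000, 56),
    ("217755", 5000, 37),
    ("217760", 5000, 48),
    ("217762", 2500, 28),
    ("217766", 5000, 39),
    ("217768", 2500, 17),
]


def gb_per_thousand_cells(cells_per_well, size_gb):
    """Hypothetical summary: sqlite GB per 1,000 cells plated per well."""
    return size_gb / (cells_per_well / 1000)


ratios = {
    plate: round(gb_per_thousand_cells(cells, gb), 1)
    for plate, cells, gb in PLATES
}
for plate, ratio in ratios.items():
    print(f"{plate}: {ratio} GB per 1,000 cells plated")
```

If file size scaled linearly with plating density, these ratios would be roughly constant. Where they are not, file size alone is probably an unreliable stand-in for confluence.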

edits to add important details @mekelley described in #40 (comment)
