2021_09_01_varchamp's Introduction

VarChAMP: Variant Characterization across the Mendelian Proteome

We aim to functionally characterize approximately 100,000 coding variants across Mendelian disease genes, addressing the significant gap in understanding the impact of human genomic variations. By analyzing the phenotypic impacts of these variants, we seek to elucidate genotype-phenotype relationships in inherited disorders. We will create a searchable database detailing these variant effects, accessible through the IGVF consortium, which will contribute to public health by aiding in the diagnosis and treatment of Mendelian disorders.

Documents

GDrive folder (internal): link

What's in this repo?

This repo contains the analysis scripts and notebooks for the VarChAMP project. The data is stored in a separate repo, 2021_09_01_VarChAMP-data, which is added as a submodule to this repo. Profiles from all the plates are in 2021_09_01_VarChAMP-data/profiles. All levels of profiles downstream of the aggregation step in the pycytominer workflow are in that folder.

How to use this repo?

  1. Fork the repo

  2. Clone the repo

    git clone git@github.com:<YOUR USER NAME>/2021_09_01_VarChAMP.git
  3. Download the contents of the submodule

    git submodule update --init --recursive
    cd 2021_09_01_VarChAMP-data
    dvc pull
    git lfs pull
  4. Install the conda environment within each folder before running the notebooks. We use mamba to manage the computational environment. To install mamba, see its installation instructions. After installing mamba, run the following to create and activate the environment:

    # First, install the conda environment
    mamba env create --force --file environment.yml
    
    # If you had already installed this environment and now want to update it
    mamba env update --file environment.yml --prune
    
    # Then, activate the environment and you're all set!
    environment_name=$(grep "^name:" environment.yml | awk '{print $2}')
    mamba activate "$environment_name"
  5. Run the notebooks

2021_09_01_varchamp's People

Contributors

jessica-ewald, marziehhaghighi, shntnu, yhan8, zitong-chen-16

2021_09_01_varchamp's Issues

Selecting negative and positive control ORFs

Negative controls:

  • We want negative controls to:
    • Have no signature, as confirmed by existing datasets.
      • Having no signature is usually equivalent to having low replicate correlations, so, relying on that rule, Niranj searched for negative controls with low replicate correlations in two CPJUMP1 experiments and in the JUMP production experiments.

Positive controls:

  • We want positive controls to:
    • Have strong WT-to-mutant mislocalization
      • We check the manual impact score by Jessie.
    • Have a strong WT phenotype in the rest of the Cell Painting channels compared to controls
      • We check the replicate correlation in the CPJUMP1 dataset.

First Pass - Pilot Variant Painting data analysis

Goal:

  • What proportion of variants and which ones show a signal relative to their WT?

Basic Analysis using mean profiles:

  • Using correlation coefficients (by Marzieh)

    • Replicate correlation + null distributions
    • list of correlation coefficient scores for each pair
  • Using MAP (by @yhan8)

    • #15
    • list of map scores for each pair
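The "replicate correlation + null distributions" idea above can be sketched roughly as follows. This is a minimal illustration, not the project's analysis code: it assumes each well is a plain list of feature values, and the function names, the toy data, and the use of the 95th percentile of the null as a cut-off are all assumptions made for the example.

```python
import itertools
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def replicate_correlation(replicates):
    """Median pairwise correlation among the replicate wells of one perturbation."""
    pairs = itertools.combinations(replicates, 2)
    return statistics.median(pearson(a, b) for a, b in pairs)


def null_correlations(profiles_by_pert, n_samples=200, seed=0):
    """Correlations between wells drawn from two different perturbations."""
    rng = random.Random(seed)
    names = sorted(profiles_by_pert)
    null = []
    for _ in range(n_samples):
        first, second = rng.sample(names, 2)
        null.append(pearson(rng.choice(profiles_by_pert[first]),
                            rng.choice(profiles_by_pert[second])))
    return null


# Toy data (hypothetical): three perturbations, three replicate wells, four features.
profiles_by_pert = {
    "pert_A": [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9]],
    "pert_B": [[4.0, 1.0, 3.0, 2.0], [3.8, 1.2, 3.1, 2.2], [4.1, 0.9, 2.8, 1.9]],
    "pert_C": [[2.0, 4.0, 1.0, 3.0], [2.2, 3.9, 1.1, 2.8], [1.9, 4.1, 0.8, 3.1]],
}

scores = {name: replicate_correlation(reps) for name, reps in profiles_by_pert.items()}
null = null_correlations(profiles_by_pert)
threshold = sorted(null)[int(0.95 * len(null))]  # assumed cut-off: 95th percentile of the null
active = [name for name, score in scores.items() if score > threshold]
```

A perturbation whose replicate correlation exceeds the null cut-off would be treated as having a detectable signature; the same scoring could be applied to WT-variant pairs instead of replicates.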

poscon negcon selection experiment by Chloe

Notes from Chloe's emails:

  • Email on Aug 19, 2022:

    • For our positive controls, ideally we’d like to establish a reference ORF paired with two mutants, one showing strong shifts and one subtle in the protein channel as well as detectable changes in morphology. In this case, profiling would especially be helpful. For the NegCons, we must slim down our selection to only 4 ORFs – I’m not sure if you guys have preference for selection there.
  • Email on Sep 15, 2022:
     

    • Regarding the PosCons, we’d like to select either IMPDH1 or ALK as our reference allele,
      plus two of their respective variants (one which shows strong morphological shifts/localization patterns,
      and one that’s subtle). For NegCons, we can only select 4 to include in our screen –
      I’ll leave it up to you guys which 4 best suit your needs.

    • You can disregard all wells that are not labelled either PosCon or NegCon for this screen.
      And please keep in mind each quadrant received a varying dose of viral supernatant.
      The amount I settled on for our final pipeline is 6 uL, so perhaps you want to pay attention to the wells which received a vTitre = 6.
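As a small sketch of the well selection described in the email, assuming the platemap has been loaded as a list of dictionaries: the `label` and `well` keys and the toy rows are hypothetical, and only the PosCon/NegCon labels and the vTitre value of 6 come from the email.

```python
def select_control_wells(wells, dose_ul=6):
    """Keep wells labelled PosCon or NegCon that received the given viral dose (uL)."""
    wanted = {"PosCon", "NegCon"}
    return [w for w in wells if w["label"] in wanted and w["vTitre"] == dose_ul]


# Toy platemap rows (hypothetical values).
wells = [
    {"well": "A01", "label": "PosCon", "vTitre": 6},
    {"well": "A02", "label": "NegCon", "vTitre": 3},
    {"well": "A03", "label": "NegCon", "vTitre": 6},
    {"well": "A04", "label": "other", "vTitre": 6},
]
selected = select_control_wells(wells)
```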

Why do we treat the protein channel differently?

⚠️ tedium ahead! I confused myself after writing it down so I suggest we just discuss this in person, but feel free to add notes.


I’d find it so helpful if we can clearly articulate two things

1. In the context of controls, why should we think of the protein channel differently from a Cell Painting channel, say, the ER channel for the sake of this discussion?

It is seemingly obvious, but if you take a simplified/abstracted view that all channels are just measuring some aspect of a cell’s phenotype, it’s less obvious to me.

I get it for no-protein negative controls — there’s nothing to mark in the protein channel. I'll note that there's no equivalent for the ER channel, that is, there isn't a "no-ER" negative control, and that's because the ER is always present and will therefore always be marked. Perhaps the only equivalent would be if a hypothetical negative control somehow destroyed the ability of Con A to bind to ER (without affecting the ER itself). In this case, all perturbations that didn't destroy the ability of Con A to bind to ER – whether or not they affected ER itself – would have a phenotype.

But what about a protein control that had any localization other than the protein of interest? Here you say the protein of interest will always have a phenotype. Why? Again, it is seemingly obvious because if we know the negative control has localization X, then any protein that doesn't have localization X will have a phenotype. Is there an equivalent for the ER channel? I suppose the equivalent is again a hypothetical negative control that would somehow change – but not destroy – the ability of Con A's binding to ER (without affecting the ER itself). In this case, all perturbations that didn't similarly change the ability of Con A to bind to ER – whether or not they affected ER itself – would have a phenotype.

All this leads to the next question

2. What is a good negative control, when we limit ourselves only to the phenotype observed in the protein channel?


Context:

Shantanu Singh
That said, if we’ve settled on negative controls, we can still use the framework for reporting phenotypic activity.

Anne Carpenter
(which would be for the non-protein channels - the protein channel would always have a phenotype if we are comparing to no-protein controls, or if we choose a protein control that had any localization other than the protein of interest)

Processing Pipeline

Instructions on the processing steps and parameters in each step:

1- Clean the platemap and save the cleaned version in metadata/reprocessed

  • The input platemap for this project has been inconsistent across batches and also within the experiment.
  • We should check this input metadata and make it consistent with a standard for each batch of data.
  • Sometimes even the SQLite file has irregular column names, which need separate handling.

2- Save a subset of intensity features for transfection efficiency exploration and parameter selection of transfection detection

  • Save folder: plate_raw_intensity_features

3- Read Intensity features and save their distribution in results/intensity_dists

4- Based on the fixed parameters for transfection detection, generate and save population level profiles and also save transfected single cells for visualization and subpopulation analysis

  • Save folder for mean profiles: '/population_profiles/'+batchName+'/'+plateName
  • Save folder for transfected single cell profiles: '/singlecell_profiles/'+batchName+'/'+plateName
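Step 4 could look roughly like the sketch below. It assumes transfection is detected by thresholding a single intensity feature and that population profiles are per-feature means over the transfected cells; the feature names, the threshold value, and the plate name are hypothetical, and the source does not state which rule the pipeline actually uses.

```python
import statistics


def detect_transfected(cells, intensity_feature, threshold):
    """Keep single cells whose intensity feature exceeds a fixed threshold."""
    return [c for c in cells if c[intensity_feature] > threshold]


def population_profile(cells, features):
    """Mean of each feature over the given cells, or None if there are no cells."""
    if not cells:
        return None
    return {f: statistics.fmean(c[f] for c in cells) for f in features}


def save_folder(kind, batch_name, plate_name):
    """Build the save folder following the '/<kind>/'+batchName+'/'+plateName pattern."""
    return "/" + kind + "/" + batch_name + "/" + plate_name


# Toy single-cell table (hypothetical feature names and values).
cells = [
    {"protein_intensity": 0.9, "area": 100.0},
    {"protein_intensity": 0.1, "area": 80.0},
    {"protein_intensity": 0.7, "area": 120.0},
]
transfected = detect_transfected(cells, "protein_intensity", threshold=0.5)
profile = population_profile(transfected, ["protein_intensity", "area"])
mean_folder = save_folder("population_profiles", "B1A1R1", "plate_1")
single_cell_folder = save_folder("singlecell_profiles", "B1A1R1", "plate_1")
```

Keeping detection and aggregation as separate functions makes it easy to save the transfected single cells for visualization while reusing the same cells for the population profile.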

5- Read data for analysis

  • Parameters:
    - single-cell scaling: 'sc_per_plate_scaling': 'raw' or 'sc_scaled_per_plate'
    - well-level profile z-scoring: 'zscored_profiles': 'untransfected' or 'untransfected_stringent'
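One possible reading of the z-scoring option is that well-level profiles are standardized against a reference population of untransfected wells. The sketch below illustrates that reading only; the function name, the data layout, and the use of the population standard deviation are assumptions, and the source does not define how 'untransfected' and 'untransfected_stringent' differ.

```python
import statistics


def zscore_against_reference(wells, reference_wells, features):
    """Standardize each feature using the mean and std of the reference wells."""
    stats = {}
    for f in features:
        values = [w[f] for w in reference_wells]
        # Population standard deviation; an assumption, not taken from the source.
        stats[f] = (statistics.fmean(values), statistics.pstdev(values))
    scaled = []
    for w in wells:
        row = dict(w)
        for f in features:
            mean, sd = stats[f]
            # Constant reference features carry no scale, so map them to 0.
            row[f] = (w[f] - mean) / sd if sd > 0 else 0.0
        scaled.append(row)
    return scaled


# Toy data (hypothetical): two untransfected reference wells, one treated well.
reference = [{"well": "A01", "f1": 1.0}, {"well": "A02", "f1": 3.0}]
treated = [{"well": "B02", "f1": 4.0}]
scaled = zscore_against_reference(treated, reference, ["f1"])
```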

6- Calculate replicate correlation of profiles

  • Save curve plots and values to results/replicate_corr_curves

7- Calculate WT-MT impact scores and save

  • Approach 1: average replicate level profiles and score treatment level profiles
    • save the results in: /results/Impact-Scores/Method-MeanProfiles/impact_scores_trt_todaydate
  • Approach 2: calculate impact scores per plate
    • save the results in: /results/Impact-Scores/Method-MeanProfiles/impact_scores_perplate_todaydate
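The two approaches differ only in where the averaging happens, which the sketch below tries to make concrete. The impact score itself is not defined in this issue, so `1 - Pearson correlation` between the WT and mutant mean profiles is used purely as a placeholder; the function names, the toy profiles, and the ISO date format standing in for "todaydate" are likewise assumptions.

```python
import statistics
from datetime import date


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def mean_profile(profiles):
    """Feature-wise mean over a list of replicate profiles."""
    return [statistics.fmean(column) for column in zip(*profiles)]


def impact_score(wt_profile, mt_profile):
    """Placeholder score (assumption): 0 for identical shape, up to 2 for opposite."""
    return 1 - pearson(wt_profile, mt_profile)


def score_treatment_level(wt_by_plate, mt_by_plate):
    """Approach 1: pool replicates from all plates, average, then score once."""
    wt = [p for reps in wt_by_plate.values() for p in reps]
    mt = [p for reps in mt_by_plate.values() for p in reps]
    return impact_score(mean_profile(wt), mean_profile(mt))


def score_per_plate(wt_by_plate, mt_by_plate):
    """Approach 2: average and score within each plate separately."""
    return {
        plate: impact_score(mean_profile(wt_by_plate[plate]), mean_profile(mt_by_plate[plate]))
        for plate in wt_by_plate
        if plate in mt_by_plate
    }


def output_name(kind, today=None):
    """Result path with a date suffix; ISO format is an assumed choice."""
    stamp = (today or date.today()).isoformat()
    return f"/results/Impact-Scores/Method-MeanProfiles/impact_scores_{kind}_{stamp}"


# Toy replicate profiles per plate (hypothetical values).
wt_by_plate = {"P1": [[1.0, 2.0, 3.0], [1.2, 2.1, 2.9]], "P2": [[0.9, 2.2, 3.1]]}
mt_by_plate = {"P1": [[3.0, 1.0, 2.0], [2.8, 1.1, 2.2]], "P2": [[3.1, 0.9, 1.8]]}
trt_score = score_treatment_level(wt_by_plate, mt_by_plate)
plate_scores = score_per_plate(wt_by_plate, mt_by_plate)
```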

Evaluation of Negative and Positive Control Selections

This issue documents information and evaluation results about the negative and positive controls we included in the VarChAMP CellPainting experiments. From the previous discussion in GH issue #3, the following treatments are selected as controls in batch B1A1R1.

  • Negative Control for Morphological Changes: MAPK9, RHEB, SLIRP, & PRKACB
    • Labeled "NC" in "node_type" column in metadata file.
    • Treatments selected because they have low replicate correlations in CPJUMP1 experiments.
  • Positive Control for Morphological Changes: PTK2B
    • Labeled "PC" in "node_type".
    • Selected because its function is likely to induce morphological changes but hasn't been confirmed in our assay.
  • Positive Control for Protein Instability/Mislocalization - ALK vs. ALK R1275Q
    • Labeled "PC" in "node_type".
    • Selected because the protein of variant R1275Q should mislocalize to the ER.
  • Transduction/Selection Control: 516 - TC
    • Labeled "TC" in "node_type".
    • There should be no remaining cells in the well after selection.

Plates 1 to 4 in B1A1R1 contain four replicates of each control treatment on each plate. Plate 4 also contains 21 candidate controls (one replicate on each plate) that could be used as controls in future experiments.

For our reference, the annotations in the "node_type" column in plate maps are:

  • "allele" - missense mutant
  • "disease_wt" - reference/WT allele
  • "TC" - transduction/selection control
  • "PC" - positive control
  • "NC" - negative control
  • "cPC" - candidate positive control (morphology)
  • "cPPC" - candidate positive control (protein)
  • "cNC" - candidate negative control (morphology)
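For analysis code, the annotation list above can be captured as a lookup table. The sketch below is illustrative: the `node_type` values come from the list, while the role names, the `symbol` key, the "GENE_X" rows, and the toy platemap are assumptions.

```python
# node_type value -> coarse role used for grouping (role names are hypothetical).
ROLE_BY_NODE_TYPE = {
    "allele": "variant",
    "disease_wt": "reference",
    "TC": "control",
    "PC": "control",
    "NC": "control",
    "cPC": "candidate_control",
    "cPPC": "candidate_control",
    "cNC": "candidate_control",
}


def group_by_role(platemap):
    """Group platemap rows by role; unrecognized node_type values go to 'unknown'."""
    groups = {}
    for row in platemap:
        role = ROLE_BY_NODE_TYPE.get(row["node_type"], "unknown")
        groups.setdefault(role, []).append(row)
    return groups


# Toy platemap rows (hypothetical layout; control names taken from the list above).
platemap = [
    {"symbol": "MAPK9", "node_type": "NC"},
    {"symbol": "PTK2B", "node_type": "PC"},
    {"symbol": "516 - TC", "node_type": "TC"},
    {"symbol": "GENE_X", "node_type": "disease_wt"},
    {"symbol": "GENE_X variant", "node_type": "allele"},
]
groups = group_by_role(platemap)
```

Routing unrecognized values to an "unknown" group, rather than raising an error, makes inconsistent platemap annotations visible without stopping the analysis.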
