
Comments (30)

AnneCarpenter commented on May 29, 2024

Is this plot for all reagents (WT and MUT) being able to retrieve replicates of themselves against a background of all samples on the plates? And we are seeing roughly half do so?

from 2021_09_01_varchamp.

AnneCarpenter commented on May 29, 2024

Awesome, could you provide zooms of both where the x axis ends around 0.1?
And can you make the legend the same in both so we don't re-learn the colors' meanings? (also good to use colorblind friendly palette, IIRC one of our labmates has trouble w red/green)


AnneCarpenter commented on May 29, 2024

This sounds great! I just want to clarify about the 2nd step where we compare WT and MT using MAP which you described by email: 

  • Calculating the MAP for each MT with respect to WTs (for each MT profiles we query WTs and then average over per MT AP values)

Each MT profile will try to retrieve its WT profiles against a pool of what? (its own MT replicates, or the whole experiment of profiles? if it's the former I could imagine that almost all MT/WT pairs will look different enough to pass this threshold, such that offering it the whole experiment or plate of profiles provides better resolution of the ability to retrieve?)


MarziehHaghighi commented on May 29, 2024
  • Against the whole experiment's profiles. In the previous "correlation coefficient" based impact score calculations, we set the 15th percentile of the replicate correlation distribution as the threshold, and WT-MT scores below that threshold were considered impactful. Here, we instead say that the MAP of an MT versus its WTs should be less than the 15th percentile of the MAP distribution for retrieval of replicates. Does that make sense?
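
To make the thresholding rule above concrete, here is a minimal sketch in plain Python. It is not the evalzoo or Copairs implementation: the helper names (`average_precision`, `percentile`) and all numbers are hypothetical, and the percentile uses simple linear interpolation, which may differ from whatever the real pipeline uses.

```python
def average_precision(ranked_is_match):
    """AP for one query profile.

    ranked_is_match lists, in order of decreasing similarity to the query,
    whether each profile in the pool is a true match (here: a WT of the MT's gene).
    """
    hits = 0
    precisions = []
    for rank, is_match in enumerate(ranked_is_match, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical MAP values for retrieval of replicates (one value per reagent).
replicate_map = [0.9, 0.8, 0.85, 0.4, 0.7, 0.95, 0.6, 0.75, 0.5, 0.65]
threshold = percentile(replicate_map, 15)

# Two hypothetical profiles of one MT, each querying the whole experiment.
per_profile_ap = [
    average_precision([False, False, True, False, True]),
    average_precision([False, True, False, False, True]),
]
mt_map = sum(per_profile_ap) / len(per_profile_ap)  # average over the MT's profiles

# The MT is called impactful when it retrieves its WT worse than the threshold.
is_impactful = mt_map < threshold
```

The design choice worth noting is that the pool is the whole experiment, so a low MAP means the WT profiles are ranked behind many unrelated profiles, not just behind the MT's own replicates.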


AnneCarpenter commented on May 29, 2024

Makes sense!


yhan8 commented on May 29, 2024
  • Using MAP (by @yhan8)

    • Replicate correlation + null distributions
    • list of map scores for each pair

Drafting my steps here. In the metadata, the column Metadata_Sample_Unique includes the wild-type and mutant names. Two kinds of replicability will be calculated using evalzoo:

  1. Technical replicability: whether samples that share Metadata_Sample_Unique are retrieved as replicates. There are no controls in the data (i.e., remove all 516-TC), so it will be replicates against non-replicates, by plate.
  2. Biological replicability: whether the mutants of a given wild type can be retrieved from the wild type itself.

Need to discuss with @shntnu how to edit the evalzoo script to accommodate this study.


MarziehHaghighi commented on May 29, 2024

Summary of the results for basic analysis using correlation coefficient metric

Data stats

  • 100 unique WTs
  • 254 unique pairs
  • Used feature-selected profiles
  • 95 protein-channel features and 584 non-protein-channel features (from the remaining channels) were used in the analysis

Replicate correlation + null distributions

  • Based on protein channel features
    image

  • Based on non-protein channel features
    image

Table of all scores

  • Scores are based on approach 2 (average of per plate cc impact scores)
  • Source data on s3
Index Gene Metadata_Sample_Unique cc_p wt_RepCor_p cc_np wt_RepCor_np RepCor_p Rand90Perc_p Rep10Perc_p RepCor_np Rand90Perc_np Rep10Perc_np
0 DOLK DOLK Tyr441Ser 0.812154 0.303609 0.549158 0.246678 0.459496 0.236407 0.359196 0.32718 0.183632 0.206219
1 EMD EMD Ala56Thr 0.400925 0.663506 0.141576 0.108136 0.541148 0.236407 0.359196 0.638897 0.183632 0.206219
2 EMD EMD Asp72Val 0.496056 0.663506 0.261438 0.108136 0.305962 0.236407 0.359196 0.265102 0.183632 0.206219
3 EMD EMD Met1Val 0.689438 0.663506 0.358089 0.108136 0.212016 0.236407 0.359196 0.248557 0.183632 0.206219
4 EMD EMD Pro183His 0.101571 0.663506 0.263846 0.108136 0.361373 0.236407 0.359196 0.235395 0.183632 0.206219
5 EMD EMD Pro183Thr 0.290396 0.663506 0.10283 0.108136 0.338601 0.236407 0.359196 0.506725 0.183632 0.206219
6 EMD EMD Ser54Phe 0.543792 0.663506 0.156373 0.108136 0.360591 0.236407 0.359196 0.30438 0.183632 0.206219
7 IMPDH1 IMPDH1 Arg309Pro 0.354869 0.545031 0.242057 0.0643272 0.226381 0.236407 0.359196 0.170917 0.183632 0.206219
8 IMPDH1 IMPDH1 Asp311Asn -0.209642 0.545031 0.458999 0.0643272 0.542843 0.236407 0.359196 0.0557816 0.183632 0.206219
9 AIPL1 AIPL1 Arg270His 0.273113 0.810226 0.605787 0.546889 0.792534 0.236407 0.359196 0.542621 0.183632 0.206219
10 AIPL1 AIPL1 Arg302Leu 0.862609 0.810226 0.717986 0.546889 0.830929 0.236407 0.359196 0.356391 0.183632 0.206219
11 AIPL1 AIPL1 Met79Thr 0.154947 0.810226 0.491816 0.546889 0.693704 0.236407 0.359196 0.260006 0.183632 0.206219
12 AIPL1 AIPL1 Thr114Ile 0.965001 0.810226 0.939664 0.546889 0.840271 0.236407 0.359196 0.661714 0.183632 0.206219
13 EIF2B4 EIF2B4 Ala228Val 0.369129 0.399015 0.618664 0.481428 0.75831 0.236407 0.359196 0.210716 0.183632 0.206219
14 EIF2B4 EIF2B4 Ala391Asp 0.510709 0.399015 -0.172932 0.481428 0.636257 0.236407 0.359196 0.435466 0.183632 0.206219
15 EIF2B4 EIF2B4 Arg306Gly 0.626046 0.399015 0.555183 0.481428 0.443025 0.236407 0.359196 0.233263 0.183632 0.206219
16 ALAS2 ALAS2 Ala135Thr 0.957832 0.942857 0.74251 0.509418 0.919249 0.236407 0.359196 0.382541 0.183632 0.206219
17 ALAS2 ALAS2 Arg374Cys 0.92887 0.942857 0.584661 0.509418 0.839 0.236407 0.359196 0.516608 0.183632 0.206219
18 ALAS2 ALAS2 Asp122Asn 0.96137 0.942857 0.628351 0.509418 0.892947 0.236407 0.359196 0.588997 0.183632 0.206219
19 ALAS2 ALAS2 Asp153Val 0.950552 0.942857 0.576244 0.509418 0.918158 0.236407 0.359196 0.532146 0.183632 0.206219
20 ALAS2 ALAS2 Cys358Tyr 0.94791 0.942857 0.566148 0.509418 0.947057 0.236407 0.359196 0.846761 0.183632 0.206219
21 ALAS2 ALAS2 Gly254Ser 0.973936 0.942857 0.579645 0.509418 0.947381 0.236407 0.359196 0.842696 0.183632 0.206219
22 ALAS2 ALAS2 Lys262Gln 0.97146 0.942857 0.633109 0.509418 0.941559 0.236407 0.359196 0.716846 0.183632 0.206219
23 ALAS2 ALAS2 Phe128Leu 0.899358 0.942857 0.665138 0.509418 0.892667 0.236407 0.359196 0.307553 0.183632 0.206219
24 ALAS2 ALAS2 Ser531Gly 0.854929 0.942857 0.59907 0.509418 0.798461 0.236407 0.359196 0.491536 0.183632 0.206219
25 ALAS2 ALAS2 Thr351Ser 0.957671 0.942857 0.610488 0.509418 0.876557 0.236407 0.359196 0.528331 0.183632 0.206219
26 ALAS2 ALAS2 Tyr549Phe 0.961277 0.942857 0.621325 0.509418 0.932656 0.236407 0.359196 0.418941 0.183632 0.206219
27 CLCNKA CLCNKA Trp80Cys -0.471066 0.37997 -0.0133432 0.244383 0.696592 0.236407 0.359196 0.76506 0.183632 0.206219
28 FBP1 FBP1 Ala177Asp -0.168784 0.836435 -0.200173 0.392719 0.247128 0.236407 0.359196 0.154469 0.183632 0.206219
29 CTRC CTRC Arg246Cys 0.634972 0.797619 0.111161 0.216912 0.801093 0.236407 0.359196 0.455059 0.183632 0.206219
30 CTRC CTRC Arg37Gln 0.765336 0.797619 0.265112 0.216912 0.796699 0.236407 0.359196 0.334993 0.183632 0.206219
31 CTRC CTRC Gln178Arg -0.340483 0.797619 0.353732 0.216912 0.867971 0.236407 0.359196 0.568933 0.183632 0.206219
32 CTRC CTRC Glu225Ala 0.87933 0.797619 0.599191 0.216912 0.776341 0.236407 0.359196 0.24573 0.183632 0.206219
33 DCX DCX Ala251Ser -0.208084 0.790209 -0.524234 0.305862 -0.00795505 0.236407 0.359196 0.331395 0.183632 0.206219
34 DCX DCX Ala71Ser 0.889039 0.790209 0.862022 0.305862 0.666018 0.236407 0.359196 0.477633 0.183632 0.206219
35 DCX DCX Arg102Cys 0.857378 0.790209 0.778809 0.305862 0.812403 0.236407 0.359196 0.509653 0.183632 0.206219
36 DCX DCX Arg186His 0.935348 0.790209 0.786348 0.305862 0.818851 0.236407 0.359196 0.407649 0.183632 0.206219
37 DCX DCX Arg186Leu 0.147867 0.790209 0.469696 0.305862 0.816109 0.236407 0.359196 0.440123 0.183632 0.206219
38 DCX DCX Arg196Cys 0.814541 0.790209 0.643877 0.305862 0.670303 0.236407 0.359196 0.421047 0.183632 0.206219
39 DCX DCX Arg196His 0.679362 0.790209 0.724891 0.305862 0.6509 0.236407 0.359196 0.283375 0.183632 0.206219
40 DCX DCX Arg59His 0.93895 0.790209 0.886552 0.305862 0.794319 0.236407 0.359196 0.462963 0.183632 0.206219
41 DCX DCX Arg78Cys 0.60454 0.790209 0.353412 0.305862 0.771091 0.236407 0.359196 0.378881 0.183632 0.206219
42 DCX DCX Arg78His 0.870223 0.790209 0.864304 0.305862 0.835428 0.236407 0.359196 0.287032 0.183632 0.206219
43 DCX DCX Arg89Gly 0.735337 0.790209 0.375841 0.305862 0.77287 0.236407 0.359196 0.461303 0.183632 0.206219
44 DCX DCX Ile214Thr 0.680948 0.790209 0.56692 0.305862 0.640235 0.236407 0.359196 0.60222 0.183632 0.206219
45 DCX DCX Lys174Glu 0.666268 0.790209 0.529493 0.305862 0.780097 0.236407 0.359196 0.58299 0.183632 0.206219
46 DCX DCX Lys50Asn 0.885656 0.790209 0.673236 0.305862 0.812871 0.236407 0.359196 0.266286 0.183632 0.206219
47 DCX DCX Met1Thr 0.36452 0.790209 0.568294 0.305862 0.730808 0.236407 0.359196 0.559341 0.183632 0.206219
48 DCX DCX Pro191Arg -0.279817 0.790209 -0.514462 0.305862 0.259648 0.236407 0.359196 0.328803 0.183632 0.206219
49 DCX DCX Ser129Leu 0.686765 0.790209 0.669147 0.305862 0.634991 0.236407 0.359196 0.427628 0.183632 0.206219
50 DCX DCX Thr203Ala 0.957748 0.790209 0.82135 0.305862 0.860252 0.236407 0.359196 0.387076 0.183632 0.206219
51 DCX DCX Thr203Arg 0.907237 0.790209 0.729365 0.305862 0.761325 0.236407 0.359196 0.399816 0.183632 0.206219
52 DCX DCX Tyr125His 0.609427 0.790209 0.618964 0.305862 0.574572 0.236407 0.359196 0.333535 0.183632 0.206219
53 CRADD CRADD Arg185Gln 0.371742 0.821747 -0.31208 0.285804 0.814273 0.236407 0.359196 0.284464 0.183632 0.206219
54 CRADD CRADD Gly128Arg -0.10655 0.821747 -0.225103 0.285804 0.2063 0.236407 0.359196 0.131331 0.183632 0.206219
55 ACSF3 ACSF3 Ala197Thr -0.348342 0.733033 0.293595 0.304979 0.571281 0.236407 0.359196 0.201265 0.183632 0.206219
56 ACSF3 ACSF3 Arg10Trp 0.131518 0.733033 0.374758 0.304979 0.770124 0.236407 0.359196 0.541322 0.183632 0.206219
57 ACSF3 ACSF3 Arg471Trp -0.3424 0.733033 -0.174532 0.304979 0.498657 0.236407 0.359196 0.366746 0.183632 0.206219
58 ACSF3 ACSF3 Arg558Trp -0.314254 0.733033 -0.099857 0.304979 0.550779 0.236407 0.359196 0.328222 0.183632 0.206219
59 ACSF3 ACSF3 Asp236Asn -0.01745 0.733033 -0.214152 0.304979 0.32766 0.236407 0.359196 0.541452 0.183632 0.206219
60 ACSF3 ACSF3 Asp457Asn -0.223768 0.733033 0.294623 0.304979 0.497507 0.236407 0.359196 0.27002 0.183632 0.206219
61 ACSF3 ACSF3 Glu359Lys -0.344937 0.733033 0.159046 0.304979 0.51067 0.236407 0.359196 0.246498 0.183632 0.206219
62 ACSF3 ACSF3 Gly119Asp -0.309312 0.733033 -0.128686 0.304979 0.575937 0.236407 0.359196 0.208485 0.183632 0.206219
63 ACSF3 ACSF3 Gly225Arg -0.289699 0.733033 0.433965 0.304979 0.634696 0.236407 0.359196 0.349261 0.183632 0.206219
64 ACSF3 ACSF3 Ile200Met -0.31487 0.733033 0.039181 0.304979 0.534015 0.236407 0.359196 0.293977 0.183632 0.206219
65 ACSF3 ACSF3 Met198Arg -0.316022 0.733033 0.078226 0.304979 0.566512 0.236407 0.359196 0.217637 0.183632 0.206219
66 ACSF3 ACSF3 Met266Val -0.315708 0.733033 -0.298251 0.304979 0.531917 0.236407 0.359196 0.185549 0.183632 0.206219
67 ACSF3 ACSF3 Pro243Leu -0.279633 0.733033 -0.163358 0.304979 0.601404 0.236407 0.359196 0.40753 0.183632 0.206219
68 ACSF3 ACSF3 Pro285Leu -0.368099 0.733033 0.0446327 0.304979 0.629503 0.236407 0.359196 0.3804 0.183632 0.206219
69 ACSF3 ACSF3 Ser431Tyr 0.202273 0.733033 0.388348 0.304979 0.414894 0.236407 0.359196 0.199084 0.183632 0.206219
70 ACSF3 ACSF3 Thr358Ile -0.310495 0.733033 0.0161624 0.304979 0.630992 0.236407 0.359196 0.365734 0.183632 0.206219
71 FA2H FA2H Arg143Cys 0.497748 0.261103 0.25313 0.266671 0.560055 0.236407 0.359196 0.0899539 0.183632 0.206219
72 FA2H FA2H Arg62Cys 0.561424 0.261103 0.669551 0.266671 0.510462 0.236407 0.359196 0.46593 0.183632 0.206219
73 FA2H FA2H Phe144Ser 0.362088 0.261103 0.0193911 0.266671 0.19953 0.236407 0.359196 0.203714 0.183632 0.206219
74 FAM161A FAM161A Leu269Arg 0.404807 0.66421 0.0746539 0.554056 0.208519 0.236407 0.359196 0.477677 0.183632 0.206219
75 ASNS ASNS Ala6Glu -0.393304 0.86979 -0.0282066 0.552998 0.777259 0.236407 0.359196 0.467151 0.183632 0.206219
76 BCL10 BCL10 Ala5Ser 0.87405 0.838855 0.831978 0.632199 0.85294 0.236407 0.359196 0.71313 0.183632 0.206219
77 BCL10 BCL10 Leu8Leu -0.207115 0.838855 -0.0647577 0.632199 0.783252 0.236407 0.359196 0.622756 0.183632 0.206219
78 CREB1 CREB1 Asp116Gly -0.399006 0.845436 0.678742 0.364229 0.877664 0.236407 0.359196 0.60409 0.183632 0.206219
79 CRYAB CRYAB Asp109His -0.036847 0.927002 0.334968 0.691911 0.798813 0.236407 0.359196 0.43454 0.183632 0.206219
80 CRYAB CRYAB Gly154Ser 0.974476 0.927002 0.69362 0.691911 0.901846 0.236407 0.359196 0.365807 0.183632 0.206219
81 DES DES Ala135Val 0.498567 0.96098 -0.137358 0.666678 0.900919 0.236407 0.359196 0.492537 0.183632 0.206219
82 DES DES Ala213Val 0.918395 0.96098 0.287841 0.666678 0.933563 0.236407 0.359196 0.392204 0.183632 0.206219
83 DES DES Ala237Thr 0.555215 0.96098 -0.256816 0.666678 0.647107 0.236407 0.359196 0.461733 0.183632 0.206219
84 DES DES Ala337Pro -0.472672 0.96098 -0.320047 0.666678 0.400892 0.236407 0.359196 0.397876 0.183632 0.206219
85 DES DES Ala357Pro 0.0334099 0.96098 -0.26341 0.666678 0.241328 0.236407 0.359196 0.516166 0.183632 0.206219
86 DES DES Ala397Thr 0.500556 0.96098 -0.255255 0.666678 0.802505 0.236407 0.359196 0.548423 0.183632 0.206219
87 DES DES Arg127Pro 0.39693 0.96098 -0.120309 0.666678 0.698106 0.236407 0.359196 0.316707 0.183632 0.206219
88 DES DES Arg150Gln 0.436826 0.96098 -0.128675 0.666678 0.705245 0.236407 0.359196 0.525045 0.183632 0.206219
89 DES DES Arg16Cys 0.974271 0.96098 0.895651 0.666678 0.916468 0.236407 0.359196 0.582325 0.183632 0.206219
90 DES DES Arg212Gln 0.876609 0.96098 0.257424 0.666678 0.939741 0.236407 0.359196 0.602379 0.183632 0.206219
91 DES DES Arg222His 0.476384 0.96098 -0.139527 0.666678 0.784399 0.236407 0.359196 0.319717 0.183632 0.206219
92 DES DES Arg227Cys 0.909985 0.96098 0.47929 0.666678 0.880824 0.236407 0.359196 0.398954 0.183632 0.206219
93 DES DES Arg278Pro 0.339447 0.96098 -0.203738 0.666678 0.717046 0.236407 0.359196 0.523847 0.183632 0.206219
94 DES DES Arg350Pro 0.838215 0.96098 0.334518 0.666678 0.940407 0.236407 0.359196 0.53797 0.183632 0.206219
95 DES DES Arg355Pro -0.262917 0.96098 -0.0822246 0.666678 0.0933792 0.236407 0.359196 0.00857638 0.183632 0.206219
96 DES DES Arg37Trp 0.379265 0.96098 -0.0111762 0.666678 0.454628 0.236407 0.359196 0.325438 0.183632 0.206219
97 DES DES Asn342Asp -0.292271 0.96098 -0.234666 0.666678 0.651685 0.236407 0.359196 0.497773 0.183632 0.206219
98 DES DES Asp312Ala 0.613102 0.96098 0.101297 0.666678 0.671385 0.236407 0.359196 0.473157 0.183632 0.206219
99 DES DES Asp343Asn 0.581868 0.96098 -0.212229 0.666678 0.798856 0.236407 0.359196 0.31538 0.183632 0.206219
100 DES DES Gln131Lys 0.845324 0.96098 0.0421202 0.666678 0.956003 0.236407 0.359196 0.552383 0.183632 0.206219
101 DES DES Gln389Pro 0.948128 0.96098 0.485484 0.666678 0.954953 0.236407 0.359196 0.59401 0.183632 0.206219
102 DES DES Gln99Glu -0.0921232 0.96098 -0.312078 0.666678 0.294887 0.236407 0.359196 0.397539 0.183632 0.206219
103 DES DES Glu245Asp -0.112741 0.96098 -0.248482 0.666678 0.476539 0.236407 0.359196 0.419267 0.183632 0.206219
104 DES DES Glu413Lys 0.688131 0.96098 -0.183831 0.666678 0.660136 0.236407 0.359196 0.445209 0.183632 0.206219
105 DES DES Gly20Arg 0.830217 0.96098 0.302616 0.666678 0.838598 0.236407 0.359196 0.610163 0.183632 0.206219
106 DES DES Gly44Ser 0.489784 0.96098 -0.0703174 0.666678 0.560471 0.236407 0.359196 0.371118 0.183632 0.206219
107 DES DES Gly84Ser 0.456559 0.96098 -0.133157 0.666678 0.575886 0.236407 0.359196 0.306097 0.183632 0.206219
108 DES DES His243Tyr 0.832262 0.96098 0.452825 0.666678 0.882084 0.236407 0.359196 0.249457 0.183632 0.206219
109 DES DES His441Leu 0.691012 0.96098 -0.102578 0.666678 0.832119 0.236407 0.359196 0.454825 0.183632 0.206219
110 DES DES Leu136Pro 0.910889 0.96098 0.390357 0.666678 0.949513 0.236407 0.359196 0.663502 0.183632 0.206219
111 DES DES Leu274Pro 0.558595 0.96098 -0.338633 0.666678 0.824172 0.236407 0.359196 0.379396 0.183632 0.206219
112 DES DES Leu338Arg -0.118026 0.96098 -0.113017 0.666678 0.312462 0.236407 0.359196 0.328504 0.183632 0.206219
113 DES DES Leu345Pro -0.488238 0.96098 -0.275704 0.666678 0.442715 0.236407 0.359196 0.209938 0.183632 0.206219
114 DES DES Met349Ile 0.50701 0.96098 -0.233906 0.666678 0.837868 0.236407 0.359196 0.516332 0.183632 0.206219
115 DES DES Pro419Ser 0.352441 0.96098 -0.190622 0.666678 0.412984 0.236407 0.359196 0.37414 0.183632 0.206219
116 DES DES Pro433Thr 0.671607 0.96098 -0.266801 0.666678 0.769299 0.236407 0.359196 0.499264 0.183632 0.206219
117 DES DES Ser298Leu 0.59923 0.96098 -0.165427 0.666678 0.753565 0.236407 0.359196 0.550371 0.183632 0.206219
118 DES DES Ser424Phe 0.680491 0.96098 -0.240006 0.666678 0.856886 0.236407 0.359196 0.507256 0.183632 0.206219
119 DES DES Ser46Tyr 0.15768 0.96098 -0.221751 0.666678 0.141532 0.236407 0.359196 0.172794 0.183632 0.206219
120 DES DES Thr219Ile 0.856067 0.96098 0.479039 0.666678 0.918727 0.236407 0.359196 0.574198 0.183632 0.206219
121 DES DES Thr445Ala 0.769654 0.96098 0.233077 0.666678 0.899982 0.236407 0.359196 0.206988 0.183632 0.206219
122 DES DES Thr453Ile 0.507269 0.96098 -0.286501 0.666678 0.783687 0.236407 0.359196 0.290476 0.183632 0.206219
123 DES DES Tyr122Asp 0.966974 0.96098 0.6171 0.666678 0.906981 0.236407 0.359196 0.470362 0.183632 0.206219
124 DES DES Tyr331Asn 0.0531358 0.96098 0.149425 0.666678 0.622804 0.236407 0.359196 0.415561 0.183632 0.206219
125 DES DES Val126Leu 0.961887 0.96098 0.636213 0.666678 0.907963 0.236407 0.359196 0.526978 0.183632 0.206219
126 DES DES Val394Met 0.42363 0.96098 -0.139181 0.666678 0.736489 0.236407 0.359196 0.404362 0.183632 0.206219
127 DES DES Val469Met -0.165913 0.96098 -0.272643 0.666678 0.251948 0.236407 0.359196 0.216013 0.183632 0.206219
128 DES DES Val56Leu 0.547154 0.96098 -0.155579 0.666678 0.781271 0.236407 0.359196 0.337276 0.183632 0.206219
129 CA8 CA8 Arg237Gln 0.392047 0.766353 0.738549 0.678179 0.925869 0.236407 0.359196 0.561532 0.183632 0.206219
130 CDKN1A CDKN1A Arg67Leu 0.883155 0.860748 0.580158 0.638411 0.711187 0.236407 0.359196 0.638306 0.183632 0.206219
131 CDKN1A CDKN1A Arg84Gln 0.879257 0.860748 0.599658 0.638411 0.698247 0.236407 0.359196 0.590087 0.183632 0.206219
132 CDKN1A CDKN1A Asp149Gly 0.952675 0.860748 0.919989 0.638411 0.892883 0.236407 0.359196 0.834276 0.183632 0.206219
133 CDKN1A CDKN1A Ser31Arg -0.143582 0.860748 0.242012 0.638411 0.587451 0.236407 0.359196 0.576635 0.183632 0.206219
134 EFHC1 EFHC1 Arg159Trp 0.442311 0.353609 0.65473 0.308837 0.278683 0.236407 0.359196 0.286627 0.183632 0.206219
135 EFHC1 EFHC1 Asp210Asn 0.710313 0.353609 0.314558 0.308837 0.647675 0.236407 0.359196 0.484588 0.183632 0.206219
136 EFHC1 EFHC1 Asp253Tyr 0.720759 0.353609 0.558822 0.308837 0.459286 0.236407 0.359196 0.133959 0.183632 0.206219
137 EFHC1 EFHC1 Cys259Tyr 0.55515 0.353609 -0.388751 0.308837 0.844412 0.236407 0.359196 0.555367 0.183632 0.206219
138 EFHC1 EFHC1 Ile174Val 0.662932 0.353609 0.620766 0.308837 0.390085 0.236407 0.359196 0.27129 0.183632 0.206219
139 EFHC1 EFHC1 Met448Thr 0.554476 0.353609 -0.328411 0.308837 0.774648 0.236407 0.359196 0.478676 0.183632 0.206219
140 EFHC1 EFHC1 Phe229Leu 0.700251 0.353609 0.275369 0.308837 0.706964 0.236407 0.359196 0.477644 0.183632 0.206219
141 BAG3 BAG3 Arg218Trp 0.866316 0.839816 0.36141 0.500452 0.798331 0.236407 0.359196 0.52026 0.183632 0.206219
142 BAG3 BAG3 Arg258Trp -0.346637 0.839816 0.163514 0.500452 0.937087 0.236407 0.359196 0.203287 0.183632 0.206219
143 BAG3 BAG3 Arg477His 0.862916 0.839816 0.723351 0.500452 0.852504 0.236407 0.359196 0.484095 0.183632 0.206219
144 BAG3 BAG3 Leu462Pro 0.657287 0.839816 0.274224 0.500452 0.773469 0.236407 0.359196 0.349204 0.183632 0.206219
145 BAG3 BAG3 Pro380Ser 0.946702 0.839816 0.798054 0.500452 0.920722 0.236407 0.359196 0.659353 0.183632 0.206219
146 CSNK1D CSNK1D His46Arg 0.489742 0.652241 0.0320267 0.600537 0.771899 0.236407 0.359196 0.401865 0.183632 0.206219
147 BFSP2 BFSP2 Ala407Asp 0.188576 0.86151 0.0961269 0.525939 0.451919 0.236407 0.359196 0.365802 0.183632 0.206219
148 BFSP2 BFSP2 Arg287Trp -0.203345 0.86151 0.112663 0.525939 0.827873 0.236407 0.359196 0.672278 0.183632 0.206219
149 BFSP2 BFSP2 Arg339His 0.766487 0.86151 0.498634 0.525939 0.663021 0.236407 0.359196 0.434368 0.183632 0.206219
150 FADD FADD Cys105Trp 0.333091 0.369791 0.218344 0.158193 0.324666 0.236407 0.359196 0.621813 0.183632 0.206219
151 AGXT AGXT Ala186Val 0.40574 0.849535 0.20802 0.664522 0.881702 0.236407 0.359196 0.151812 0.183632 0.206219
152 AGXT AGXT Ala210Pro 0.339954 0.849535 0.46798 0.664522 0.692378 0.236407 0.359196 0.14402 0.183632 0.206219
153 AGXT AGXT Ala248Ser 0.599298 0.849535 0.407043 0.664522 0.736866 0.236407 0.359196 0.276903 0.183632 0.206219
154 AGXT AGXT Ala248Val 0.919494 0.849535 0.884055 0.664522 0.767134 0.236407 0.359196 0.548562 0.183632 0.206219
155 AGXT AGXT Ala280Val 0.846497 0.849535 0.672835 0.664522 0.651513 0.236407 0.359196 0.52764 0.183632 0.206219
156 AGXT AGXT Ala295Thr 0.964667 0.849535 0.853996 0.664522 0.846539 0.236407 0.359196 0.607658 0.183632 0.206219
157 AGXT AGXT Ala85Asp 0.335741 0.849535 0.186985 0.664522 0.808038 0.236407 0.359196 0.238338 0.183632 0.206219
158 AGXT AGXT Arg111Gln 0.639816 0.849535 0.762217 0.664522 0.756375 0.236407 0.359196 0.518734 0.183632 0.206219
159 AGXT AGXT Arg118Cys 0.676583 0.849535 0.563772 0.664522 0.702174 0.236407 0.359196 0.611143 0.183632 0.206219
160 AGXT AGXT Arg197Gln 0.896846 0.849535 0.580685 0.664522 0.64914 0.236407 0.359196 0.410531 0.183632 0.206219
161 AGXT AGXT Arg289His 0.374419 0.849535 0.452246 0.664522 0.807065 0.236407 0.359196 0.572713 0.183632 0.206219
162 AGXT AGXT Arg301Cys 0.634586 0.849535 0.281424 0.664522 0.741832 0.236407 0.359196 0.450398 0.183632 0.206219
163 AGXT AGXT Arg36Cys 0.095247 0.849535 -0.256261 0.664522 0.68908 0.236407 0.359196 0.212972 0.183632 0.206219
164 AGXT AGXT Arg381Lys 0.887036 0.849535 0.689149 0.664522 0.781288 0.236407 0.359196 0.54443 0.183632 0.206219
165 AGXT AGXT Asn22Ser 0.773378 0.849535 0.594779 0.664522 0.580641 0.236407 0.359196 0.462521 0.183632 0.206219
166 AGXT AGXT Asp129His 0.927968 0.849535 0.750045 0.664522 0.717215 0.236407 0.359196 0.335258 0.183632 0.206219
167 AGXT AGXT Asp201Asn -0.153482 0.849535 -0.238843 0.664522 0.3578 0.236407 0.359196 0.466604 0.183632 0.206219
168 AGXT AGXT Asp341Glu 0.0697666 0.849535 -0.0565573 0.664522 0.356182 0.236407 0.359196 0.276413 0.183632 0.206219
169 AGXT AGXT Glu274Asp 0.607957 0.849535 0.58081 0.664522 0.577989 0.236407 0.359196 0.5882 0.183632 0.206219
170 AGXT AGXT Gly116Arg 0.41309 0.849535 0.752629 0.664522 0.708328 0.236407 0.359196 0.568779 0.183632 0.206219
171 AGXT AGXT Gly156Arg 0.216318 0.849535 0.228235 0.664522 0.583163 0.236407 0.359196 0.281612 0.183632 0.206219
172 AGXT AGXT Gly161Arg 0.278343 0.849535 0.67403 0.664522 0.73253 0.236407 0.359196 0.47044 0.183632 0.206219
173 AGXT AGXT Gly161Ser 0.728239 0.849535 0.520919 0.664522 0.781395 0.236407 0.359196 0.327061 0.183632 0.206219
174 AGXT AGXT Gly41Arg 0.664937 0.849535 0.807193 0.664522 0.381512 0.236407 0.359196 0.204492 0.183632 0.206219
175 AGXT AGXT Gly41Glu 0.814439 0.849535 0.611575 0.664522 0.745704 0.236407 0.359196 0.4382 0.183632 0.206219
176 AGXT AGXT Gly82Arg 0.915432 0.849535 0.927311 0.664522 0.869626 0.236407 0.359196 0.668718 0.183632 0.206219
177 AGXT AGXT Ile202Asn 0.203934 0.849535 0.514362 0.664522 0.717426 0.236407 0.359196 0.34812 0.183632 0.206219
178 AGXT AGXT Ile279Met 0.078649 0.849535 0.686356 0.664522 0.824106 0.236407 0.359196 0.437371 0.183632 0.206219
179 AGXT AGXT Ile279Thr -0.204764 0.849535 -0.26962 0.664522 0.207085 0.236407 0.359196 0.198754 0.183632 0.206219
180 AGXT AGXT Ile340Met 0.903956 0.849535 0.644107 0.664522 0.691263 0.236407 0.359196 0.635978 0.183632 0.206219
181 AGXT AGXT Leu298Pro 0.235721 0.849535 0.26251 0.664522 0.602883 0.236407 0.359196 0.226917 0.183632 0.206219
182 AGXT AGXT Lys12Arg 0.937222 0.849535 0.939283 0.664522 0.787612 0.236407 0.359196 0.686885 0.183632 0.206219
183 AGXT AGXT Met195Leu 0.670139 0.849535 0.443538 0.664522 0.674409 0.236407 0.359196 0.311371 0.183632 0.206219
184 AGXT AGXT Met49Leu 0.666889 0.849535 0.43624 0.664522 0.765939 0.236407 0.359196 0.509871 0.183632 0.206219
185 AGXT AGXT Phe152Ile 0.704864 0.849535 0.857269 0.664522 0.363803 0.236407 0.359196 0.142884 0.183632 0.206219
186 AGXT AGXT Pro10Ala 0.906964 0.849535 0.89344 0.664522 0.809542 0.236407 0.359196 0.572227 0.183632 0.206219
187 AGXT AGXT Pro11His 0.689958 0.849535 0.591079 0.664522 0.753937 0.236407 0.359196 0.314062 0.183632 0.206219
188 AGXT AGXT Pro11Leu 0.33432 0.849535 0.158312 0.664522 0.659494 0.236407 0.359196 0.11871 0.183632 0.206219
189 AGXT AGXT Pro319Leu 0.59928 0.849535 0.328312 0.664522 0.733404 0.236407 0.359196 0.291737 0.183632 0.206219
190 AGXT AGXT Ser187Phe 0.59755 0.849535 0.759807 0.664522 0.804119 0.236407 0.359196 0.642991 0.183632 0.206219
191 AGXT AGXT Ser218Leu 0.64313 0.849535 0.881151 0.664522 0.826705 0.236407 0.359196 0.762984 0.183632 0.206219
192 AGXT AGXT Ser221Pro 0.248213 0.849535 0.672312 0.664522 0.686656 0.236407 0.359196 0.368276 0.183632 0.206219
193 AGXT AGXT Val162Met 0.65846 0.849535 0.557249 0.664522 0.304567 0.236407 0.359196 0.244205 0.183632 0.206219
194 AGXT AGXT Val326Ile 0.684498 0.849535 0.577242 0.664522 0.612692 0.236407 0.359196 0.536806 0.183632 0.206219
195 COQ8A COQ8A Gly272Asp 0.674594 0.317254 0.469798 0.243587 0.672593 0.236407 0.359196 0.215357 0.183632 0.206219
196 COQ8A COQ8A Gly549Ser 0.34926 0.317254 -0.0127499 0.243587 0.732797 0.236407 0.359196 0.394383 0.183632 0.206219
197 COQ8A COQ8A His80Tyr -0.376851 0.317254 -0.444023 0.243587 0.755664 0.236407 0.359196 0.572319 0.183632 0.206219
198 CHN1 CHN1 Glu313Lys 0.823906 0.497718 0.649934 0.537013 0.737574 0.236407 0.359196 0.527971 0.183632 0.206219
199 CHN1 CHN1 Ile126Met 0.742673 0.497718 0.509495 0.537013 0.721022 0.236407 0.359196 0.707186 0.183632 0.206219
200 CHN1 CHN1 Pro141Leu 0.762687 0.497718 0.537037 0.537013 0.831193 0.236407 0.359196 0.675641 0.183632 0.206219
201 CHN1 CHN1 Pro252Ser -0.1729 0.497718 0.415451 0.537013 0.844275 0.236407 0.359196 0.771959 0.183632 0.206219
202 CHN1 CHN1 Tyr143His 0.733163 0.497718 0.640303 0.537013 0.722927 0.236407 0.359196 0.774065 0.183632 0.206219
203 CDC73 CDC73 Met1Ile -0.682187 0.765012 0.11803 0.527967 0.817551 0.236407 0.359196 0.434372 0.183632 0.206219
204 COMP COMP Ala171Thr 0.0519353 0.48337 -0.0979181 0.565594 0.757712 0.236407 0.359196 0.479403 0.183632 0.206219
205 COMP COMP Arg718Pro 0.014536 0.48337 -0.427918 0.565594 0.870755 0.236407 0.359196 0.377358 0.183632 0.206219
206 COMP COMP Asn523Lys 0.0469446 0.48337 -0.370681 0.565594 0.735884 0.236407 0.359196 0.171554 0.183632 0.206219
207 COMP COMP Asn555Lys 0.278054 0.48337 0.291873 0.565594 0.888965 0.236407 0.359196 0.870107 0.183632 0.206219
208 COMP COMP Asp271His 0.448496 0.48337 0.305684 0.565594 0.214483 0.236407 0.359196 0.487396 0.183632 0.206219
209 COMP COMP Asp319Val 0.482311 0.48337 0.112505 0.565594 0.42538 0.236407 0.359196 0.328242 0.183632 0.206219
210 COMP COMP Asp342Tyr 0.0366465 0.48337 -0.564795 0.565594 0.741217 0.236407 0.359196 0.421236 0.183632 0.206219
211 COMP COMP Asp408Asn 0.652258 0.48337 0.683344 0.565594 0.159243 0.236407 0.359196 0.407203 0.183632 0.206219
212 COMP COMP Asp408His 0.0460424 0.48337 0.137493 0.565594 0.318446 0.236407 0.359196 0.192894 0.183632 0.206219
213 COMP COMP Asp511Glu 0.098606 0.48337 -0.178495 0.565594 0.690808 0.236407 0.359196 0.512038 0.183632 0.206219
214 COMP COMP Asp530Glu -0.0878529 0.48337 -0.311885 0.565594 0.914823 0.236407 0.359196 0.29386 0.183632 0.206219
215 COMP COMP Asp605Asn -0.0132953 0.48337 -0.343115 0.565594 0.815929 0.236407 0.359196 0.347799 0.183632 0.206219
216 COMP COMP Cys348Arg 0.1296 0.48337 -0.320824 0.565594 0.879598 0.236407 0.359196 0.574866 0.183632 0.206219
217 COMP COMP Gly207Asp -0.163465 0.48337 -0.386234 0.565594 0.768955 0.236407 0.359196 0.388437 0.183632 0.206219
218 COMP COMP His189Arg -0.309563 0.48337 -0.339264 0.565594 0.793272 0.236407 0.359196 0.364794 0.183632 0.206219
219 COMP COMP His441Arg -0.271095 0.48337 -0.36719 0.565594 0.753647 0.236407 0.359196 0.433715 0.183632 0.206219
220 COMP COMP His587Arg -0.0776071 0.48337 -0.475992 0.565594 0.865808 0.236407 0.359196 0.471006 0.183632 0.206219
221 COMP COMP Ser681Cys 0.654227 0.48337 0.675944 0.565594 0.266211 0.236407 0.359196 0.207867 0.183632 0.206219
222 COMP COMP Thr529Ile -0.0165323 0.48337 0.100917 0.565594 0.634356 0.236407 0.359196 0.47933 0.183632 0.206219
223 COMP COMP Thr585Arg 0.278612 0.48337 -0.0138169 0.565594 0.833655 0.236407 0.359196 0.415051 0.183632 0.206219
224 COMP COMP Thr585Lys 0.37571 0.48337 0.13773 0.565594 0.520079 0.236407 0.359196 0.371695 0.183632 0.206219
225 COMP COMP Thr585Met 0.0867655 0.48337 -0.326112 0.565594 0.848373 0.236407 0.359196 0.363455 0.183632 0.206219
226 AMPD2 AMPD2 Glu697Asp 0.766463 0.81178 0.849455 0.470379 0.772086 0.236407 0.359196 0.249831 0.183632 0.206219
227 CORO1A CORO1A Val397Ile 0.0615145 0.852053 0.306829 0.385279 0.81264 0.236407 0.359196 0.64429 0.183632 0.206219
228 APOA1 APOA1 Ala188Ser -0.092556 0.596283 -0.152395 0.463634 0.484829 0.236407 0.359196 0.358139 0.183632 0.206219
229 APOA1 APOA1 Ala199Pro -0.00654168 0.596283 0.693163 0.463634 0.886548 0.236407 0.359196 0.0456008 0.183632 0.206219
230 APOA1 APOA1 Arg197Cys 0.0274639 0.596283 -0.173555 0.463634 0.875393 0.236407 0.359196 0.228524 0.183632 0.206219
231 APOA1 APOA1 Arg34Leu -0.261367 0.596283 0.698514 0.463634 0.861617 0.236407 0.359196 0.448169 0.183632 0.206219
232 APOA1 APOA1 Leu114Pro -0.0300296 0.596283 0.840867 0.463634 0.838112 0.236407 0.359196 0.455979 0.183632 0.206219
233 APOA1 APOA1 Leu198Ser -0.0397486 0.596283 -0.0862165 0.463634 0.748647 0.236407 0.359196 0.0596162 0.183632 0.206219
234 APOA1 APOA1 Leu84Arg -0.0390537 0.596283 0.230308 0.463634 0.859377 0.236407 0.359196 0.246977 0.183632 0.206219
235 APOA1 APOA1 Trp74Arg -0.157724 0.596283 0.747219 0.463634 0.873373 0.236407 0.359196 0.448962 0.183632 0.206219
236 APOA1 APOA1 Val180Glu -0.124348 0.596283 0.887283 0.463634 0.834699 0.236407 0.359196 0.426655 0.183632 0.206219
237 CFP CFP Tyr414Asp -0.457254 0.862304 0.264033 0.204986 0.691247 0.236407 0.359196 0.423041 0.183632 0.206219
238 DIABLO DIABLO Ala3Gly 0.406146 0.544288 0.406309 0.36727 0.328608 0.236407 0.359196 0.255339 0.183632 0.206219
239 DIABLO DIABLO Gly224Arg 0.71521 0.544288 0.395186 0.36727 0.461083 0.236407 0.359196 0.504745 0.183632 0.206219
240 DIABLO DIABLO Ile59Val 0.426364 0.544288 0.211285 0.36727 0.787706 0.236407 0.359196 0.479351 0.183632 0.206219
241 DIABLO DIABLO Ser126Leu 0.55814 0.544288 0.42785 0.36727 0.651042 0.236407 0.359196 0.166687 0.183632 0.206219

Summary

  • If we call a variant impactful when the correlation coefficient of its WT/MT pair is below the 10th percentile of the replicate correlation distribution, the percentages of impactful variants are as follows:
  • ~45% of variants are impactful in the protein channel
  • ~41% of variants are impactful in the non-protein channels
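
As a small worked example of this rule, the sketch below applies it to the six EMD rows of the table above. It assumes that `Rep10Perc_p` and `Rep10Perc_np` are the 10th-percentile thresholds and that `cc_p` / `cc_np` are the WT/MT impact scores; the helper name `impactful_fraction` is made up for illustration. The fractions for this six-row subset are not expected to match the ~45% / ~41% figures, which are computed over all pairs.

```python
# (variant, cc_p, cc_np) copied from the six EMD rows of the table.
emd_rows = [
    ("EMD Ala56Thr", 0.400925, 0.141576),
    ("EMD Asp72Val", 0.496056, 0.261438),
    ("EMD Met1Val", 0.689438, 0.358089),
    ("EMD Pro183His", 0.101571, 0.263846),
    ("EMD Pro183Thr", 0.290396, 0.10283),
    ("EMD Ser54Phe", 0.543792, 0.156373),
]

# Assumed thresholds: the constant Rep10Perc_* columns of the table.
REP10PERC_P = 0.359196
REP10PERC_NP = 0.206219


def impactful_fraction(scores, threshold):
    """Fraction of WT/MT scores that fall below the replicate threshold."""
    return sum(score < threshold for score in scores) / len(scores)


frac_p = impactful_fraction([row[1] for row in emd_rows], REP10PERC_P)
frac_np = impactful_fraction([row[2] for row in emd_rows], REP10PERC_NP)

print(f"protein channel: {frac_p:.0%} of EMD variants below threshold")      # 2 of 6
print(f"non-protein channels: {frac_np:.0%} of EMD variants below threshold")  # 3 of 6
```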


AnneCarpenter commented on May 29, 2024

Exciting!
I'm confused though because it looks like 80% of the red pairs are to the right of the dotted lines here and 10-20% of pairs are on the left of the lines, am I missing something? Neither is around 40% so I must be misinterpreting something. Also I wasn't sure what "the replicate correlate dist" means?

Screenshot 2023-04-11 at 9 49 35 AM


MarziehHaghighi commented on May 29, 2024

These are the regular replicate correlation distributions (red), shown with their corresponding null distributions (blue). The 40% is not shown here. The only value in this figure that influences the 40% number is the position of the red dotted line, which marks the 10th percentile of the red distribution (the correlation coefficient values among replicates). The impact score distribution is not plotted in this figure, but you can check the per-WT/MT-pair values in the table. For example, in the "cc_np" column of that table, about 40% of the values should be less than the red dotted line value in the figure you copied for the non-protein channels, which is 0.2.

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

I see! It would be nice to visually see the distribution of the WT-MT pairs (because IIUC the histograms are only showing WT replicates and MT replicates in red, or scrambled replicates in blue) but I am following the logic now.

from 2021_09_01_varchamp.

yhan8 avatar yhan8 commented on May 29, 2024

Drafting my steps here. In the metadata, the column Metadata_Sample_Unique holds the wild-type and mutant names. Two kinds of replicability will be calculated using evalzoo:

  1. Technical replicability: whether wells that share a Metadata_Sample_Unique value can be retrieved as replicates. There are no controls in the data (i.e., all 516-TC are removed), so this is replicates against non-replicates, by plate.

All analysis was done with Copairs. The three plates were combined and all 516-TC were removed, which gave us 1077 samples. We define replicates as samples that have the same Metadata_Sample_Unique. To see whether we can retrieve replicates from non-replicates, the following parameters were passed to Copairs: pos_sameby = ['Metadata_Sample_Unique'], neg_diffby = ['Metadata_Sample_Unique']. This gives a p value for each individual sample, so I then aggregated the results by Metadata_Sample_Unique. The table and figure show the unique Metadata_Sample_Unique values that passed the significance threshold.
Screenshot 2023-05-24 at 2 59 57 PM

Screenshot 2023-05-24 at 3 00 04 PM
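
For readers unfamiliar with the retrieval score, here is a minimal NumPy sketch of average precision for replicate retrieval. It is not the Copairs implementation: cosine similarity, the helper names, and the toy profiles are all assumptions made for illustration, and no p value is computed.

```python
import numpy as np

def average_precision(similarities, is_positive):
    """AP of one query: rank candidates by similarity (descending) and
    average the precision at the rank of each positive."""
    order = np.argsort(-np.asarray(similarities, dtype=float))
    hits = np.asarray(is_positive, dtype=bool)[order]
    if not hits.any():
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())

def replicate_ap(profiles, labels):
    """AP per well: wells with the same label are positives, all others negatives."""
    profiles = np.asarray(profiles, dtype=float)
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    similarity = unit @ unit.T  # cosine similarity (an assumption)
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        others = np.arange(len(labels)) != i  # a well never retrieves itself
        scores.append(average_precision(similarity[i, others],
                                        labels[others] == labels[i]))
    return np.array(scores)

# Toy profiles: two samples ("A", "B") with two replicate wells each.
profiles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = np.array(["A", "A", "B", "B"])
ap = replicate_ap(profiles, labels)

# Aggregate per sample, analogous to grouping by Metadata_Sample_Unique.
mean_ap = {name: float(ap[labels == name].mean()) for name in np.unique(labels)}
print(mean_ap)
```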

from 2021_09_01_varchamp.

yhan8 avatar yhan8 commented on May 29, 2024
  2. Biological replicability: whether the mutants of a wild type can be retrieved from the wild type itself.

To see whether we can retrieve each mutant from its own WT, the following parameters were passed to Copairs: pos_sameby = ['Metadata_Gene'], pos_diffby = ['Metadata_type'], neg_diffby = ['Metadata_Gene']. That is, we match mutants to their WT (a particular gene name) against the rest of the gene names, including both their WTs and MTs. This gives a p value for each individual sample. Because each WT has different mutants, it is interesting to see which particular mutant has an impact on its WT. I therefore removed all the WTs from the Copairs results and aggregated the results by Metadata_Sample_Unique, which in this case corresponds to each unique mutant. The table and figure below show whether each unique MT passed the significance threshold. In this case, however, we care about those that did not pass the threshold, meaning the MT had an effect on its WT.
Screenshot 2023-05-24 at 3 08 10 PM
Screenshot 2023-05-24 at 3 08 16 PM
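
As a reading aid, the snippet below shows one plausible interpretation of how "same by" and "different by" constraints select positive and negative pairs. It is a plain-Python sketch, not Copairs code: `select_pairs` is a hypothetical helper, and the "WT"/"MT" values in Metadata_type are assumed for illustration.

```python
from itertools import combinations

def select_pairs(rows, sameby=(), diffby=()):
    """Index pairs whose rows agree on every `sameby` column and
    differ on every `diffby` column."""
    pairs = []
    for i, j in combinations(range(len(rows)), 2):
        same = all(rows[i][col] == rows[j][col] for col in sameby)
        diff = all(rows[i][col] != rows[j][col] for col in diffby)
        if same and diff:
            pairs.append((i, j))
    return pairs

# Toy metadata; the Metadata_type values are assumed.
rows = [
    {"Metadata_Gene": "DIABLO", "Metadata_type": "WT"},
    {"Metadata_Gene": "DIABLO", "Metadata_type": "MT"},
    {"Metadata_Gene": "CFP", "Metadata_type": "WT"},
    {"Metadata_Gene": "CFP", "Metadata_type": "MT"},
]

# Positives: same gene, different type (a mutant paired with its own WT).
positives = select_pairs(rows, sameby=["Metadata_Gene"], diffby=["Metadata_type"])
# Negatives: different gene, regardless of type.
negatives = select_pairs(rows, diffby=["Metadata_Gene"])
print(positives, negatives)
```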

from 2021_09_01_varchamp.

yhan8 avatar yhan8 commented on May 29, 2024

@MarziehHaghighi has generated a correlation score for each Metadata_Sample_Unique to show whether we can retrieve the MT from its WT, which is the equivalent of this analysis. I plotted the mAP score for each Metadata_Sample_Unique from Copairs against the correlation score. I noticed that 12 Metadata_Sample_Unique values are included in my mAP scores but not in @MarziehHaghighi's correlation scores.

['CTH Gln240Glu',
'BLMH Ile443Val',
'AP2S1 Arg15Cys',
'CLDN19 Arg200Gln',
'CUL3 Lys459Arg',
'CTNNA3 Val94Asp',
'AP2S1 Arg15His',
'BLK Ala71Thr',
'CCBE1 Gly136Arg',
'CLDN19 Gln57Glu',
'CTH Thr67Ile',
'CTH Ser403Ile']

For protein channel

output

For non protein channel

output

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

IIUC the samples that correlate highly but have low average precision must be getting mixed up with lots of other samples in the experiment. That is, they have a strong phenotype that is similar to many other samples so it's hard to retrieve. (does anyone have an alternate explanation?) I am surprised it happens so often - the top left quadrant is much more full than I would have guessed.

from 2021_09_01_varchamp.

MarziehHaghighi avatar MarziehHaghighi commented on May 29, 2024

@yhan8 please post your data stats, as I did in my report comment, so we can first make sure they are consistent. Here that would be the number of samples, the level of profiles you used, and the number of features for each of "p" and "np". The pattern for "np" is weird, so I guess there might be some discrepancy in "np". If you check out the short script I used to generate my analysis, you can figure out what I filtered and the reason behind the extra samples you have.

  • About the plots: it would be great to have the x and y axes on the same scale. One is 0 to 1 and the other is -1 to 1, so your x would ideally be half of y in length.
  • The first plot makes more sense to me. I was expecting a linear relationship from 0 to 1 between cc and mAP! (Confirming Anne's comment, I was not expecting many samples to have a strong phenotype on average but not be retrievable, given that we have high replicability in the dataset in general.) Negative cc values are all near zero for mAP, which also makes sense! But the second plot ("np") is weird. So let's check that the features used (and the level of data) match, as I can't think of other preprocessing details that might have caused this (there is really nothing more to it, as we just read the profiles and filter the feature names for each cc and mAP analysis :D)

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

That's a good idea - I agree Yu that it's good to record those stats so we can reality-check that everything looks sensible.

from 2021_09_01_varchamp.

yhan8 avatar yhan8 commented on May 29, 2024

Data Stats:

100 unique WTs
254 unique pairs
Used feature selected level of profiles
101 protein channel features and 584 non-protein channel features (corresponding to the rest of the channels) were used in the analysis
Note: all stats are consistent except I have 6 more protein features.

I am unclear on why @MarziehHaghighi's analysis filtered out an additional 12 pairs; I'll leave that to her.

Regarding the x and y scales, I am not sure if we want to visualize it like this?
output

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

I should add a comment, Marzieh - I think the two plots are really similar - I don't see np as problematic (except those two weird points!) They are both showing a gentle curve but with lots in the top left.

from 2021_09_01_varchamp.

MarziehHaghighi avatar MarziehHaghighi commented on May 29, 2024

@AnneCarpenter well the unexpected pattern is much bolder for np to me. ~50 samples for MAP>0.3 for protein but ~15 for non-protein plot although in cc they look the same (many high cc points exist for both).

Thanks for checking @yu. I cloned the repo (as it was among the repos I lost) but realized I can't regenerate the results, since I was using functions from the main rare_disease repo, which I lost in the EC2 termination incident! I had done major refactoring on that repo during the past few months, which is all gone :((! Anyway, I wanted to check the reason behind the missing variants, but since you have more variants I'll skip further checking, as we want to switch to your way of analysis anyway!

from 2021_09_01_varchamp.

yhan8 avatar yhan8 commented on May 29, 2024

Next steps:
Plot the nlogp values, with biological retrieval on the x-axis and technical retrieval on the y-axis. The quadrant we care about contains the WT/MT pairs that passed the technical significance threshold but did not pass the biological threshold, meaning there is a real signal that the MT has an effect on its WT, not just random noise.
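
A small sketch of the quadrant logic, under stated assumptions: nlogp is taken to be -log10(p), the significance level is set to 0.05, and the quadrant labels are hypothetical names chosen for this example.

```python
import math

def nlogp(p):
    """Negative log10 of a p value (base 10 is an assumption)."""
    return -math.log10(p)

def quadrant(p_technical, p_biological, alpha=0.05):
    """Label a WT/MT pair by which retrieval tests it passed."""
    technical_ok = p_technical < alpha    # replicates can be retrieved
    biological_ok = p_biological < alpha  # MT can be retrieved from its WT
    if technical_ok and not biological_ok:
        return "candidate hit"            # reproducible, yet MT differs from WT
    if technical_ok and biological_ok:
        return "reproducible, WT-like"
    return "not reproducible"             # too noisy to interpret

print(quadrant(0.001, 0.40), nlogp(0.05))
```

Plotting `nlogp(p_biological)` against `nlogp(p_technical)` with a line at `nlogp(alpha)` on each axis would then separate these groups visually.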

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

And to clarify, we probably want to reverse the axes so biological is on the y axis. Also, we couldn't decide if the technical should be retrieval for WT or for MUT of each pair. We waffled between making two copies of this chart, one with each on that axis, or if both ought to be plotted (with a line connecting them, even better). Depends how complex it is to plot all this. Yu is going to read the lung allele paper to understand better the concepts and take a look at the sparkler plot which aims to address this plotting conundrum (but is not very intuitive!).

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

In checkin we talked about different visualizations that could work here:

  1. 3D with WT and MUT tech retrieval on separate axes (though hard to publish unless the data cooperates to be nicely viewable at a good angle),
  2. drawing a straight line between paired WT and MUT dots on x axis (with bio value being the same for both, such that all lines would be horizontal), could get messy but shows all the info we need.
  3. I realized another approach that may be ideal and also easy to implement: just plot the maximum of the WT or MUT tech retrieval for each pair on the x axis (and maybe have a 3-color legend for the dots, where the color indicates whether (a) the WT tech retrieval is above the threshold, (b) the MUT is above the threshold, or (c) both).
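
Option 3 could be sketched roughly as follows. `technical_summary` is a hypothetical helper, the scores and threshold are toy numbers, and the fourth label, "neither", is an addition for pairs where both scores fall at or below the threshold.

```python
def technical_summary(wt_score, mut_score, threshold):
    """Return the x-axis value (max of the two technical retrieval scores)
    and a color category for one WT/MUT pair."""
    x_value = max(wt_score, mut_score)
    wt_ok = wt_score > threshold
    mut_ok = mut_score > threshold
    if wt_ok and mut_ok:
        category = "both"
    elif wt_ok:
        category = "WT only"
    elif mut_ok:
        category = "MUT only"
    else:
        category = "neither"
    return x_value, category

# Toy scores for three pairs, with a made-up threshold of 1.3.
pairs = {"pair_a": (2.0, 0.5), "pair_b": (0.4, 1.9), "pair_c": (2.2, 3.1)}
summary = {name: technical_summary(wt, mut, 1.3) for name, (wt, mut) in pairs.items()}
print(summary)
```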

from 2021_09_01_varchamp.

yhan8 avatar yhan8 commented on May 29, 2024
  1. I realized another approach that may be ideal and also easy to implement: just plot the maximum of the WT or MUT tech retrieval for each pair on the x axis (and maybe have a 3-color legend for the dots, where the color indicates whether (a) the WT tech retrieval is above the threshold, (b) the MUT is above the threshold, or (c) both).

Here is the plot. It includes the 254 WT+MT pairs (i.e., excluding all WTs) and shows their biological retrieval, that is, whether each unique MT has an effect on its WT. The x-axis is the average precision score and the y-axis is the nlogp value. The dotted red line is the p=0.05 threshold. I color-coded the points by technical retrieval: 1) the WT passed the technical retrieval threshold (p<0.05); 2) the MT but not the WT passed the technical retrieval threshold; 3) both WT and MT passed the technical retrieval threshold; 4) "False" means neither WT nor MT passed the technical retrieval threshold. This lets us determine whether the signal we see is noise, and where the noise comes from.

Note that we care about WT+MT pairs that do not pass the significance threshold, meaning we cannot retrieve its MT from its WT, hence the MT has an effect.

For protein channel

table protein

protein

For non-protein channel

Screenshot 2023-06-14 at 11 43 56 AM

output

from 2021_09_01_varchamp.

yhan8 avatar yhan8 commented on May 29, 2024

Check in on 6/22:
We have come to the conclusion that the distance metric (mAP) may not be the best approach because of the common protein properties across different genes. @shntnu will discuss with Marzieh whether we should try a classifier, and we will go from there.

from 2021_09_01_varchamp.

shntnu avatar shntnu commented on May 29, 2024

We have thus far used the mean average precision (mAP) framework for hit calling. In this method, a variant is deemed a 'hit' if it cannot retrieve wild-type replicates efficiently against all other wells on the same plate. To be specific:

One replicate of a variant is queried against all different-gene wells on the same plate; these serve as negative connections (typical n = 384 - number of same-gene wells on the plate). It is also compared to the corresponding (same-gene) wild-type wells across all plates; these serve as positive connections (typical n = 4, due to the typical four replicates of everything).

An mAP score and a corresponding p-value are derived for each variant using these query results. If the FDR-adjusted p-value is less than the prespecified alpha, we infer that we can retrieve the wild-type replicates well, implying that the wild-type and the mutant are practically indistinguishable (w.r.t. other perturbations -- this is a key point). Otherwise, that is, if the mAP score is too low to clear this significance threshold, the variant is considered a hit. (Note, this is somewhat convoluted as we're essentially stating that we designate it as a hit if we can't reject the null hypothesis, but this is not a major issue)
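
The hit-calling rule can be illustrated with a short sketch. The comment above does not name the FDR procedure, so Benjamini-Hochberg is assumed here; the function names, the alpha of 0.05, and the p-values are likewise illustrative.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

def call_hits(p_values, alpha=0.05):
    """A variant is a hit when WT retrieval is NOT significant,
    i.e. when the null hypothesis cannot be rejected."""
    return benjamini_hochberg(p_values) >= alpha

# Toy p-values, one per variant.
p_values = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(p_values)
hits = call_hits(p_values)
print(adjusted, hits)
```

Note how the second and third variants are nominally below 0.05 but become hits after adjustment, which shows how the choice of correction changes the hit list.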

Problematic Scenario:

Complications arise when the wild-type and the mutant only have subtle differences. For instance, if the wild-type elongates the cell nucleus and the mutant further enhances this phenotype, it will likely result in a high retrieval score due to the rarity of nuclear elongation as a phenotype. This implies the variant would not be tagged as a 'hit' despite the minor yet significant difference between the wild-type and the mutant.

Potential Solution (long-term):

The primary focus should be determining whether a screenable phenotype exists. In this context, we should aim to train a classifier capable of differentiating between the wild-type and the mutant. If the classifier exhibits good accuracy, it suggests the presence of a screenable phenotype.

Previously, this method was not explored due to insufficient replicates. Despite four replicates being inadequate even now, we can execute this approach at a single-cell level, which is the most promising direction moving forward.

Stop-gap solution:

The current mAP-based approach is a reasonable stop-gap solution. We may miss some hits (lower sensitivity), but the called hits will likely be true (high specificity).

Additional notes:

Marzieh and I evaluated whether her previous method of hit calling (as discussed above in this GitHub issue) was fundamentally distinct. It is not remarkably different. Her approach also designates a 'hit' based on the similarity between the wild-type and the mutant, providing it's under a certain threshold. This threshold is based on the similarities between replicates of the same perturbation.

Hence, both methodologies depend on similarities amongst other perturbations to set a threshold. However, we truly need a strategy focusing mainly on the phenotype of the wild-type and the mutant rather than their comparison to other perturbations. A targeted approach like this could yield more precise results when determining 'hits'.

Marzieh and I also discussed the broader concern about the application of supervised methods in profiling, with primary concerns in two specific areas:

  • Phenotypic Determination: A classifier could be built at the single-cell level to distinguish between the profiles of a perturbation and those of negative controls. A high classification score on a held-out test set suggests the presence of a detectable phenotype. However, while this method could help identify whether a perturbation has a phenotype, it is not conducive to creating a profile for the perturbation that could be used for clustering. Previous approaches using SVM classifiers have yielded profiles but are not wide-ranging enough, as they focus narrowly on the general perturbation effect.

  • Mechanisms of Action (MOA) Classification: Although there's no inherent issue with constructing a classifier for MOA classification, we risk failing to predict the classes of novel mechanisms. We prefer addressing this problem in a more unsupervised manner.

However, supervised learning can be a perfectly acceptable approach to predicting whether a variant has an impact, because that is the endpoint of the analysis.

One final aspect is whether we'd recommend a supervised (single-cell) approach, even for studies such as LUAD. For instance, should we recommend building a classifier to distinguish between the variant and the reference in the example below, using the accuracy of that classifier to declare whether the variant is a hit? I'm inclined to say yes, but this is open for debate (but we needn't debate that right now)

image

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

I thought through what makes sense to me and perhaps you can cross check if it’s the same as the long-term solution you propose - I think it is!

Anne’s plan:

  • train a classifier to distinguish a particular variant from its WT? WTgene1_well1 vs VARgene1_well1 and so on for all combinations of replicate wells for each. (I guess at the single cell level?)

  • caveat: such classifiers may always seem effective (due to plate layout, slight changes in infection efficiency/expression, etc). So, we want to get a sense of what level of classification accuracy is significant.

  • to create such a null we can use replicates of all samples as the baseline. So the null pairings would be WTgene2_well1 vs WTgene2_well2 and so on for all the WT genes and similarly for all the variants?

  • (fancy extra detail) We could exclude the query gene itself in creating this null baseline - maybe that’s unnecessary if there are ~382 other samples on the plate. If there are lots of variants of one gene on a plate then we may want to do this step.

  • (ruled out alternative) for each gene’s null we could instead try to train a classifier to distinguish replicates of only the query gene itself: WTgene1_well1 vs WTgene1_well2 but this will likely always yield ‘successful’ classifiers due to plate layout effects.
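
To make the pairings in this plan concrete, here is a plain-Python sketch that only enumerates which wells would be compared; it trains nothing. The function names and well labels are hypothetical. `matched=False` reads "all combinations" literally (every WT well against every variant well), while `matched=True` pairs wells by position (well1 with well1, and so on).

```python
from itertools import combinations, product

def query_pairs(wt_wells, var_wells, matched=False):
    """WT-vs-variant well pairs for one gene."""
    if matched:
        return list(zip(wt_wells, var_wells))
    return list(product(wt_wells, var_wells))

def null_pairs(wells_by_sample, exclude=()):
    """Replicate-vs-replicate well pairs within each sample, optionally
    skipping samples of the query gene."""
    pairs = []
    for sample, wells in wells_by_sample.items():
        if sample in exclude:
            continue
        pairs.extend(combinations(wells, 2))
    return pairs

# Toy layout with hypothetical well names.
queries = query_pairs(["WTgene1_well1", "WTgene1_well2"],
                      ["VARgene1_well1", "VARgene1_well2"])
nulls = null_pairs({
    "WTgene2": ["WTgene2_well1", "WTgene2_well2"],
    "VARgene2": ["VARgene2_well1", "VARgene2_well2", "VARgene2_well3"],
})
print(len(queries), len(nulls))
```

Classifier accuracies from the `nulls` pairs would then form the baseline against which the accuracies from the `queries` pairs are judged.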

Furthermore, I don’t see why the LUAD case is any different than the WT/VAR case of Variant Painting experiments so I would also say Yes to your query there that the same approach is appropriate there.

I made a schematic. Sort of obvious now that I drew it out so I don't expect to spark major insight here, but adding a link to google slide in case it helps anyone think through things.

Variant Painting single cell analysis

from 2021_09_01_varchamp.

shntnu avatar shntnu commented on May 29, 2024
  • train a classifier to distinguish a particular variant from its WT? WTgene1_well1 vs VARgene1_well1 and so on for all combinations of replicate wells for each. (I guess at the single cell level?)

We will do this at the single-cell level but will build a single model, so I wasn't sure what you mean by "all combinations of replicate wells". Maybe you are referring to the way we do train-test splits? If so, yes, we'd want to factor in the experimental hierarchy in some way when splitting.
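
One simple way to respect the experimental hierarchy is to split by well, so that cells from the same well never appear in both train and test. The sketch below assumes that design; `split_by_well` and the well names are hypothetical, and a fuller version might also group by plate.

```python
import numpy as np

def split_by_well(well_ids, test_fraction=0.5, seed=0):
    """Boolean train/test masks over cells, holding out whole wells."""
    rng = np.random.default_rng(seed)
    wells = np.unique(well_ids)
    rng.shuffle(wells)
    n_test = max(1, int(round(len(wells) * test_fraction)))
    test_wells = set(wells[:n_test].tolist())
    test_mask = np.array([w in test_wells for w in well_ids])
    return ~test_mask, test_mask

# Toy example: four wells with three cells each.
well_ids = np.array(["A1"] * 3 + ["A2"] * 3 + ["B1"] * 3 + ["B2"] * 3)
train_mask, test_mask = split_by_well(well_ids)
print(sorted(set(well_ids[train_mask])), sorted(set(well_ids[test_mask])))
```

In practice the split would probably be done separately within the WT wells and within the variant wells, so that both classes are present on each side.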

  • caveat: such classifiers may always seem effective (due to plate layout, slight changes in infection efficiency/expression, etc). So, we want to get a sense of what level of classification accuracy is significant.
  • to create such a null we can use replicates of all samples as the baseline. So the null pairings would be WTgene2_well1 vs WTgene2_well2 and so on for all the WT genes and similarly for all the variants?

This sounds sensible

  • (fancy extra detail) We could exclude the query gene itself in creating this null baseline - maybe that’s unnecessary if there are ~382 other samples on the plate. If there are lots of variants of one gene on a plate then we may want to do this step.

No need to exclude the query gene when creating the null in this manner, because our null hypothesis is that the wells are arbitrarily assigned WT and MUT labels. The fancy thing to do would be to have a separate null for each WT-MUT pair, where we only consider the wells of the WT-MUT pair and shuffle their labels. But that is an overkill
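
The per-pair label-shuffling null could look roughly like this. It is a sketch under several assumptions, not a proposed pipeline: a nearest-centroid classifier stands in for whatever model is eventually used, the data are synthetic, labels are shuffled at the well level separately within the training and held-out wells (so both classes stay present in each split), and all names are hypothetical.

```python
import numpy as np

def centroid_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy of a nearest-centroid classifier on held-out cells (labels 0/1)."""
    c0 = train_x[train_y == 0].mean(axis=0)
    c1 = train_x[train_y == 1].mean(axis=0)
    d0 = np.linalg.norm(test_x - c0, axis=1)
    d1 = np.linalg.norm(test_x - c1, axis=1)
    return float(((d1 < d0).astype(int) == test_y).mean())

def permutation_p_value(x, well_of_cell, label_of_well, test_wells,
                        n_perm=200, seed=0):
    """Compare the observed accuracy with accuracies after shuffling well labels."""
    rng = np.random.default_rng(seed)
    wells = np.array(sorted(label_of_well))
    labels = np.array([label_of_well[w] for w in wells])
    is_test_well = np.isin(wells, list(test_wells))
    test_cells = np.isin(well_of_cell, list(test_wells))

    def accuracy(well_labels):
        lookup = dict(zip(wells.tolist(), well_labels.tolist()))
        y = np.array([lookup[w] for w in well_of_cell])
        return centroid_accuracy(x[~test_cells], y[~test_cells],
                                 x[test_cells], y[test_cells])

    observed = accuracy(labels)
    null = []
    for _ in range(n_perm):
        shuffled = labels.copy()
        shuffled[~is_test_well] = rng.permutation(labels[~is_test_well])
        shuffled[is_test_well] = rng.permutation(labels[is_test_well])
        null.append(accuracy(shuffled))
    p_value = (1 + np.sum(np.array(null) >= observed)) / (1 + n_perm)
    return observed, float(p_value)

# Synthetic data: 4 WT wells (label 0) and 4 MUT wells (label 1), 30 cells each.
rng = np.random.default_rng(1)
label_of_well = {f"wt{i}": 0 for i in range(1, 5)}
label_of_well.update({f"mut{i}": 1 for i in range(1, 5)})
well_of_cell = np.repeat(sorted(label_of_well), 30)
x = rng.normal(size=(len(well_of_cell), 5))
x[:, 0] += 6.0 * np.array([label_of_well[w] for w in well_of_cell])  # MUT shift

observed, p_value = permutation_p_value(
    x, well_of_cell, label_of_well, test_wells={"wt3", "wt4", "mut3", "mut4"})
print(observed, p_value)
```

With only four wells per class there are few distinct label assignments, so the attainable p-values are coarse even for a strong effect, which is one practical reason a pooled null across many samples may be preferable.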

  • (ruled out alternative) for each gene’s null we could instead try to train a classifier to distinguish replicates of only the query gene itself: WTgene1_well1 vs WTgene1_well2 but this will likely always yield ‘successful’ classifiers due to plate layout effects.

Oh, maybe this is the same as my fancy idea right above. We can pay closer attention to the design of the splits when we actually do the experiment.

Furthermore, I don’t see why the LUAD case is any different than the WT/VAR case of Variant Painting experiments so I would also say Yes to your query there that the same approach is appropriate there.

I think you are right

I made a schematic. Sort of obvious now that I drew it out so I don't expect to spark major insight here, but adding a link to google slide in case it helps anyone think through things.

This makes sense

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

To address your first comment, I agree it's sensible to build a single model that distinguishes WTgene1 (all wells) vs VARgene1 (all wells) but what I was proposing was different :D I propose actually training a bunch of small classifiers on each pair, like single cells in WTgene1_well1 vs VARgene1_well1 and so on with well2, well3, etc. I should've been more clear and said "all pairs of replicate wells" instead of "all combinations of replicate wells".

One reason to do this pairwise across individual wells is to put error bars on classification accuracy, I guess. But mainly to make it easier to calculate a realistic null because now we can use two replicates of a sample that we KNOW should look alike (WTgene2_well1 vs WTgene2_well2 and so on for all the WT genes and similarly for all the variants). I guess a downside of this approach, though, is that WTgene2_well1 vs WTgene2_well2 is likely to always be same-well-position whereas the query test WTgene1_well1 vs VARgene1_well1 is not :(

Still, if we instead make a single classifier for each WT-MUT, those values will almost certainly always seem to be accurate classifiers (due to technical variations) so to decide if they are significant, we need a suitable null. To make its null we need to get WTs with a similar number of replicates and similar number of single cells to be fair (?). And maybe choose those having similar plate positions? I dunno.

from 2021_09_01_varchamp.

MarziehHaghighi avatar MarziehHaghighi commented on May 29, 2024

Some questions/notes:

  • My understanding is that we don't really care about an overall phenotype impact score in the space of all perturbations in the experiment. Instead, we care about whether there is any signature that is consistent (across all cells) for WT versus mutant. For example, a 100% score means that there is a phenotype that distinguishes all single cells of the WT from those of the mutant (in contrast to the previous unbiased score over the full space, in which 100% (or 1) meant a signature that is on average the most distinct across all WTs and VARs of an experiment). Let me know if there is any flaw in my understanding.

  • If the above is correct and we indeed care about a phenotype that is consistent across single cells for WT versus VAR, we should be careful of the following:

    • The heterogeneity of the samples: we had a huge amount of heterogeneity across the samples of the Taipale Lab rare diseases datasets. I have not looked into subpopulation analysis data for the new VarChamp batches to have an idea of if this has changed in the new experiments. But wanted to give you a heads up given this prior knowledge.

    • Number of cells versus number of features for the classification problem: again, this may not hold in the new batches of data, as all the cells are now transfected, but in the old batches we had a small number of single cells for many wells, and we should be careful about (n of features) >> (n of samples), which causes overfitting.

    • Position effect: in the first pilot batch of data that we analysed for VarChamp, the plate layout was the same. If the position effect is strong, it is problematic for any scoring method, but more problematic for a classifier, which cares about all single cells in a well having a phenotype that other wells don't have.

    • The overall amount of computational complexity we add to the problem (by using single cells) versus what we gain.

    • I'll comment on the null after I fully understand your suggestions, but for now I wanted to give my two cents on this thread.

from 2021_09_01_varchamp.

AnneCarpenter avatar AnneCarpenter commented on May 29, 2024

Yes, your understanding is correct!

Your bullet points are beautifully helpful. Yes, we do anticipate heterogeneity. Essentially every MUT will be distinguishable (classifiable) from its WT just due to technical variations so we have to be careful how to set the null to know which ones are really distinguishable.

I agree with the rest of the points too.

from 2021_09_01_varchamp.
