Goal: What proportion of variants and which ones show a sign

broadinstitute/2021_09_01_varchamp

AnneCarpenter commented on May 29, 2024

Is this plot for all reagents (WT and MUT) being able to retrieve replicates of themselves against a background of all samples on the plates? And we are seeing roughly half do so?

from 2021_09_01_varchamp.

AnneCarpenter commented on May 29, 2024

Awesome, could you provide zooms of both where the x axis ends around 0.1?
And can you make the legend the same in both so we don't have to re-learn the colors' meanings? (It's also good to use a colorblind-friendly palette; IIRC one of our labmates has trouble with red/green.)

AnneCarpenter commented on May 29, 2024

This sounds great! I just want to clarify the 2nd step, where we compare WT and MT using MAP, which you described by email:

Calculating the MAP for each MT with respect to WTs (for each MT profile we query the WTs and then average the per-MT AP values)

Each MT profile will try to retrieve its WT profiles against a pool of what? (Its own MT replicates, or the whole experiment's profiles? If it's the former, I could imagine that almost all MT/WT pairs will look different enough to pass this threshold, so offering the whole experiment or plate of profiles would give better resolution of the ability to retrieve?)

MarziehHaghighi commented on May 29, 2024

Against the profiles of the whole experiment. In the previous "correlation coefficient"-based impact score calculations, we set the 15th percentile of the replicate correlation distribution as the threshold and considered WT-MT scores below that threshold impactful. Here, we instead require the MAP of MT versus WTs to be below the 15th percentile of the MAP distribution for retrieval of replicates. Does that make sense?
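A minimal sketch of the threshold rule just described, assuming the replicate-retrieval MAP values and the MT-vs-WT MAP values have already been computed (e.g., with copairs) and are available as plain dictionaries. The sample names and numbers are hypothetical. The percentile uses linear interpolation between sorted values (NumPy's default convention); a different convention would shift the threshold slightly.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def call_impactful(replicate_map, mt_vs_wt_map, q=15.0):
    """Flag each MT whose MT-vs-WT MAP is below the q-th percentile of replicate-retrieval MAPs.

    replicate_map: {sample: MAP of retrieving its own replicates}
    mt_vs_wt_map:  {mutant: MAP of the mutant's profiles retrieving their WT profiles}
    """
    threshold = percentile(replicate_map.values(), q)
    flags = {mt: score < threshold for mt, score in mt_vs_wt_map.items()}
    return threshold, flags


# Hypothetical values for illustration only.
replicate_map = {"S1": 0.9, "S2": 0.8, "S3": 0.7, "S4": 0.6, "S5": 0.5}
mt_vs_wt_map = {"GENE_X MUT_1": 0.30, "GENE_X MUT_2": 0.75}
threshold, flags = call_impactful(replicate_map, mt_vs_wt_map)
print(round(threshold, 3), flags)  # 0.56 {'GENE_X MUT_1': True, 'GENE_X MUT_2': False}
```

A low MT-vs-WT MAP means the mutant's profiles do not retrieve their WT well, i.e., the mutant looks different from its WT, which is why "below the threshold" is the impactful direction.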

AnneCarpenter commented on May 29, 2024

Makes sense!

yhan8 commented on May 29, 2024

Using MAP (by @yhan8)

Replicate correlation + null distributions

list of map scores for each pair

Drafting my steps here. In the metadata, the column Metadata_Sample_Unique contains the wild-type and mutant names. Two kinds of replicability will be calculated using evalzoo:

technical replicability: whether profiles that share a Metadata_Sample_Unique behave as replicates. There are no controls in the data (i.e., all 516-TC are removed), so replicates will be compared against non-replicates, by plate.
biological replicability: whether mutants of a given wild type can be retrieved from the wild type itself.

Need to discuss with @shntnu about editing the evalzoo script to accommodate this study.
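Both kinds of replicability come down to a retrieval question, scored with average precision (AP). Here is a minimal, dependency-free sketch of AP for a single query profile: rank every candidate by similarity to the query, then average the precision at each rank where a positive (e.g., a replicate with the same Metadata_Sample_Unique) appears. Cosine similarity and the toy vectors are assumptions for illustration; this is not evalzoo's or copairs' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_precision(query, positives, negatives):
    """AP of retrieving `positives` for `query` from the pooled positives + negatives.

    Requires at least one positive.
    """
    scored = [(cosine(query, p), True) for p in positives]
    scored += [(cosine(query, n), False) for n in negatives]
    scored.sort(key=lambda item: item[0], reverse=True)  # most similar first
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(scored, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive's rank
    return sum(precisions) / len(precisions)


# Toy 2-feature profiles: the query's replicates vs. non-replicates on the same plate.
query = [1.0, 0.0]
replicates = [[0.9, 0.1], [0.2, 0.8]]
non_replicates = [[0.8, 0.3], [0.0, 1.0]]
ap = average_precision(query, replicates, non_replicates)
print(round(ap, 4))  # 0.8333: replicates ranked 1st and 3rd, so (1/1 + 2/3) / 2
```

For biological replicability the same function applies with the mutant profile as the query and its WT profiles as the positives.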

MarziehHaghighi commented on May 29, 2024

Summary of the results for basic analysis using correlation coefficient metric

Data stats

100 unique WTs
254 unique pairs
Used feature-selected profiles
95 protein-channel features and 584 non-protein-channel features (from the remaining channels) were used in the analysis

Replicate correlation + null distributions

Based on protein channel features
Based on non-protein channel features

Table of all scores

Scores are based on approach 2 (average of per-plate cc impact scores)
Source data on s3

	Gene	Metadata_Sample_Unique	cc_p	wt_RepCor_p	cc_np	wt_RepCor_np	RepCor_p	Rand90Perc_p	Rep10Perc_p	RepCor_np	Rand90Perc_np	Rep10Perc_np
0	DOLK	DOLK Tyr441Ser	0.812154	0.303609	0.549158	0.246678	0.459496	0.236407	0.359196	0.32718	0.183632	0.206219
1	EMD	EMD Ala56Thr	0.400925	0.663506	0.141576	0.108136	0.541148	0.236407	0.359196	0.638897	0.183632	0.206219
2	EMD	EMD Asp72Val	0.496056	0.663506	0.261438	0.108136	0.305962	0.236407	0.359196	0.265102	0.183632	0.206219
3	EMD	EMD Met1Val	0.689438	0.663506	0.358089	0.108136	0.212016	0.236407	0.359196	0.248557	0.183632	0.206219
4	EMD	EMD Pro183His	0.101571	0.663506	0.263846	0.108136	0.361373	0.236407	0.359196	0.235395	0.183632	0.206219
5	EMD	EMD Pro183Thr	0.290396	0.663506	0.10283	0.108136	0.338601	0.236407	0.359196	0.506725	0.183632	0.206219
6	EMD	EMD Ser54Phe	0.543792	0.663506	0.156373	0.108136	0.360591	0.236407	0.359196	0.30438	0.183632	0.206219
7	IMPDH1	IMPDH1 Arg309Pro	0.354869	0.545031	0.242057	0.0643272	0.226381	0.236407	0.359196	0.170917	0.183632	0.206219
8	IMPDH1	IMPDH1 Asp311Asn	-0.209642	0.545031	0.458999	0.0643272	0.542843	0.236407	0.359196	0.0557816	0.183632	0.206219
9	AIPL1	AIPL1 Arg270His	0.273113	0.810226	0.605787	0.546889	0.792534	0.236407	0.359196	0.542621	0.183632	0.206219
10	AIPL1	AIPL1 Arg302Leu	0.862609	0.810226	0.717986	0.546889	0.830929	0.236407	0.359196	0.356391	0.183632	0.206219
11	AIPL1	AIPL1 Met79Thr	0.154947	0.810226	0.491816	0.546889	0.693704	0.236407	0.359196	0.260006	0.183632	0.206219
12	AIPL1	AIPL1 Thr114Ile	0.965001	0.810226	0.939664	0.546889	0.840271	0.236407	0.359196	0.661714	0.183632	0.206219
13	EIF2B4	EIF2B4 Ala228Val	0.369129	0.399015	0.618664	0.481428	0.75831	0.236407	0.359196	0.210716	0.183632	0.206219
14	EIF2B4	EIF2B4 Ala391Asp	0.510709	0.399015	-0.172932	0.481428	0.636257	0.236407	0.359196	0.435466	0.183632	0.206219
15	EIF2B4	EIF2B4 Arg306Gly	0.626046	0.399015	0.555183	0.481428	0.443025	0.236407	0.359196	0.233263	0.183632	0.206219
16	ALAS2	ALAS2 Ala135Thr	0.957832	0.942857	0.74251	0.509418	0.919249	0.236407	0.359196	0.382541	0.183632	0.206219
17	ALAS2	ALAS2 Arg374Cys	0.92887	0.942857	0.584661	0.509418	0.839	0.236407	0.359196	0.516608	0.183632	0.206219
18	ALAS2	ALAS2 Asp122Asn	0.96137	0.942857	0.628351	0.509418	0.892947	0.236407	0.359196	0.588997	0.183632	0.206219
19	ALAS2	ALAS2 Asp153Val	0.950552	0.942857	0.576244	0.509418	0.918158	0.236407	0.359196	0.532146	0.183632	0.206219
20	ALAS2	ALAS2 Cys358Tyr	0.94791	0.942857	0.566148	0.509418	0.947057	0.236407	0.359196	0.846761	0.183632	0.206219
21	ALAS2	ALAS2 Gly254Ser	0.973936	0.942857	0.579645	0.509418	0.947381	0.236407	0.359196	0.842696	0.183632	0.206219
22	ALAS2	ALAS2 Lys262Gln	0.97146	0.942857	0.633109	0.509418	0.941559	0.236407	0.359196	0.716846	0.183632	0.206219
23	ALAS2	ALAS2 Phe128Leu	0.899358	0.942857	0.665138	0.509418	0.892667	0.236407	0.359196	0.307553	0.183632	0.206219
24	ALAS2	ALAS2 Ser531Gly	0.854929	0.942857	0.59907	0.509418	0.798461	0.236407	0.359196	0.491536	0.183632	0.206219
25	ALAS2	ALAS2 Thr351Ser	0.957671	0.942857	0.610488	0.509418	0.876557	0.236407	0.359196	0.528331	0.183632	0.206219
26	ALAS2	ALAS2 Tyr549Phe	0.961277	0.942857	0.621325	0.509418	0.932656	0.236407	0.359196	0.418941	0.183632	0.206219
27	CLCNKA	CLCNKA Trp80Cys	-0.471066	0.37997	-0.0133432	0.244383	0.696592	0.236407	0.359196	0.76506	0.183632	0.206219
28	FBP1	FBP1 Ala177Asp	-0.168784	0.836435	-0.200173	0.392719	0.247128	0.236407	0.359196	0.154469	0.183632	0.206219
29	CTRC	CTRC Arg246Cys	0.634972	0.797619	0.111161	0.216912	0.801093	0.236407	0.359196	0.455059	0.183632	0.206219
30	CTRC	CTRC Arg37Gln	0.765336	0.797619	0.265112	0.216912	0.796699	0.236407	0.359196	0.334993	0.183632	0.206219
31	CTRC	CTRC Gln178Arg	-0.340483	0.797619	0.353732	0.216912	0.867971	0.236407	0.359196	0.568933	0.183632	0.206219
32	CTRC	CTRC Glu225Ala	0.87933	0.797619	0.599191	0.216912	0.776341	0.236407	0.359196	0.24573	0.183632	0.206219
33	DCX	DCX Ala251Ser	-0.208084	0.790209	-0.524234	0.305862	-0.00795505	0.236407	0.359196	0.331395	0.183632	0.206219
34	DCX	DCX Ala71Ser	0.889039	0.790209	0.862022	0.305862	0.666018	0.236407	0.359196	0.477633	0.183632	0.206219
35	DCX	DCX Arg102Cys	0.857378	0.790209	0.778809	0.305862	0.812403	0.236407	0.359196	0.509653	0.183632	0.206219
36	DCX	DCX Arg186His	0.935348	0.790209	0.786348	0.305862	0.818851	0.236407	0.359196	0.407649	0.183632	0.206219
37	DCX	DCX Arg186Leu	0.147867	0.790209	0.469696	0.305862	0.816109	0.236407	0.359196	0.440123	0.183632	0.206219
38	DCX	DCX Arg196Cys	0.814541	0.790209	0.643877	0.305862	0.670303	0.236407	0.359196	0.421047	0.183632	0.206219
39	DCX	DCX Arg196His	0.679362	0.790209	0.724891	0.305862	0.6509	0.236407	0.359196	0.283375	0.183632	0.206219
40	DCX	DCX Arg59His	0.93895	0.790209	0.886552	0.305862	0.794319	0.236407	0.359196	0.462963	0.183632	0.206219
41	DCX	DCX Arg78Cys	0.60454	0.790209	0.353412	0.305862	0.771091	0.236407	0.359196	0.378881	0.183632	0.206219
42	DCX	DCX Arg78His	0.870223	0.790209	0.864304	0.305862	0.835428	0.236407	0.359196	0.287032	0.183632	0.206219
43	DCX	DCX Arg89Gly	0.735337	0.790209	0.375841	0.305862	0.77287	0.236407	0.359196	0.461303	0.183632	0.206219
44	DCX	DCX Ile214Thr	0.680948	0.790209	0.56692	0.305862	0.640235	0.236407	0.359196	0.60222	0.183632	0.206219
45	DCX	DCX Lys174Glu	0.666268	0.790209	0.529493	0.305862	0.780097	0.236407	0.359196	0.58299	0.183632	0.206219
46	DCX	DCX Lys50Asn	0.885656	0.790209	0.673236	0.305862	0.812871	0.236407	0.359196	0.266286	0.183632	0.206219
47	DCX	DCX Met1Thr	0.36452	0.790209	0.568294	0.305862	0.730808	0.236407	0.359196	0.559341	0.183632	0.206219
48	DCX	DCX Pro191Arg	-0.279817	0.790209	-0.514462	0.305862	0.259648	0.236407	0.359196	0.328803	0.183632	0.206219
49	DCX	DCX Ser129Leu	0.686765	0.790209	0.669147	0.305862	0.634991	0.236407	0.359196	0.427628	0.183632	0.206219
50	DCX	DCX Thr203Ala	0.957748	0.790209	0.82135	0.305862	0.860252	0.236407	0.359196	0.387076	0.183632	0.206219
51	DCX	DCX Thr203Arg	0.907237	0.790209	0.729365	0.305862	0.761325	0.236407	0.359196	0.399816	0.183632	0.206219
52	DCX	DCX Tyr125His	0.609427	0.790209	0.618964	0.305862	0.574572	0.236407	0.359196	0.333535	0.183632	0.206219
53	CRADD	CRADD Arg185Gln	0.371742	0.821747	-0.31208	0.285804	0.814273	0.236407	0.359196	0.284464	0.183632	0.206219
54	CRADD	CRADD Gly128Arg	-0.10655	0.821747	-0.225103	0.285804	0.2063	0.236407	0.359196	0.131331	0.183632	0.206219
55	ACSF3	ACSF3 Ala197Thr	-0.348342	0.733033	0.293595	0.304979	0.571281	0.236407	0.359196	0.201265	0.183632	0.206219
56	ACSF3	ACSF3 Arg10Trp	0.131518	0.733033	0.374758	0.304979	0.770124	0.236407	0.359196	0.541322	0.183632	0.206219
57	ACSF3	ACSF3 Arg471Trp	-0.3424	0.733033	-0.174532	0.304979	0.498657	0.236407	0.359196	0.366746	0.183632	0.206219
58	ACSF3	ACSF3 Arg558Trp	-0.314254	0.733033	-0.099857	0.304979	0.550779	0.236407	0.359196	0.328222	0.183632	0.206219
59	ACSF3	ACSF3 Asp236Asn	-0.01745	0.733033	-0.214152	0.304979	0.32766	0.236407	0.359196	0.541452	0.183632	0.206219
60	ACSF3	ACSF3 Asp457Asn	-0.223768	0.733033	0.294623	0.304979	0.497507	0.236407	0.359196	0.27002	0.183632	0.206219
61	ACSF3	ACSF3 Glu359Lys	-0.344937	0.733033	0.159046	0.304979	0.51067	0.236407	0.359196	0.246498	0.183632	0.206219
62	ACSF3	ACSF3 Gly119Asp	-0.309312	0.733033	-0.128686	0.304979	0.575937	0.236407	0.359196	0.208485	0.183632	0.206219
63	ACSF3	ACSF3 Gly225Arg	-0.289699	0.733033	0.433965	0.304979	0.634696	0.236407	0.359196	0.349261	0.183632	0.206219
64	ACSF3	ACSF3 Ile200Met	-0.31487	0.733033	0.039181	0.304979	0.534015	0.236407	0.359196	0.293977	0.183632	0.206219
65	ACSF3	ACSF3 Met198Arg	-0.316022	0.733033	0.078226	0.304979	0.566512	0.236407	0.359196	0.217637	0.183632	0.206219
66	ACSF3	ACSF3 Met266Val	-0.315708	0.733033	-0.298251	0.304979	0.531917	0.236407	0.359196	0.185549	0.183632	0.206219
67	ACSF3	ACSF3 Pro243Leu	-0.279633	0.733033	-0.163358	0.304979	0.601404	0.236407	0.359196	0.40753	0.183632	0.206219
68	ACSF3	ACSF3 Pro285Leu	-0.368099	0.733033	0.0446327	0.304979	0.629503	0.236407	0.359196	0.3804	0.183632	0.206219
69	ACSF3	ACSF3 Ser431Tyr	0.202273	0.733033	0.388348	0.304979	0.414894	0.236407	0.359196	0.199084	0.183632	0.206219
70	ACSF3	ACSF3 Thr358Ile	-0.310495	0.733033	0.0161624	0.304979	0.630992	0.236407	0.359196	0.365734	0.183632	0.206219
71	FA2H	FA2H Arg143Cys	0.497748	0.261103	0.25313	0.266671	0.560055	0.236407	0.359196	0.0899539	0.183632	0.206219
72	FA2H	FA2H Arg62Cys	0.561424	0.261103	0.669551	0.266671	0.510462	0.236407	0.359196	0.46593	0.183632	0.206219
73	FA2H	FA2H Phe144Ser	0.362088	0.261103	0.0193911	0.266671	0.19953	0.236407	0.359196	0.203714	0.183632	0.206219
74	FAM161A	FAM161A Leu269Arg	0.404807	0.66421	0.0746539	0.554056	0.208519	0.236407	0.359196	0.477677	0.183632	0.206219
75	ASNS	ASNS Ala6Glu	-0.393304	0.86979	-0.0282066	0.552998	0.777259	0.236407	0.359196	0.467151	0.183632	0.206219
76	BCL10	BCL10 Ala5Ser	0.87405	0.838855	0.831978	0.632199	0.85294	0.236407	0.359196	0.71313	0.183632	0.206219
77	BCL10	BCL10 Leu8Leu	-0.207115	0.838855	-0.0647577	0.632199	0.783252	0.236407	0.359196	0.622756	0.183632	0.206219
78	CREB1	CREB1 Asp116Gly	-0.399006	0.845436	0.678742	0.364229	0.877664	0.236407	0.359196	0.60409	0.183632	0.206219
79	CRYAB	CRYAB Asp109His	-0.036847	0.927002	0.334968	0.691911	0.798813	0.236407	0.359196	0.43454	0.183632	0.206219
80	CRYAB	CRYAB Gly154Ser	0.974476	0.927002	0.69362	0.691911	0.901846	0.236407	0.359196	0.365807	0.183632	0.206219
81	DES	DES Ala135Val	0.498567	0.96098	-0.137358	0.666678	0.900919	0.236407	0.359196	0.492537	0.183632	0.206219
82	DES	DES Ala213Val	0.918395	0.96098	0.287841	0.666678	0.933563	0.236407	0.359196	0.392204	0.183632	0.206219
83	DES	DES Ala237Thr	0.555215	0.96098	-0.256816	0.666678	0.647107	0.236407	0.359196	0.461733	0.183632	0.206219
84	DES	DES Ala337Pro	-0.472672	0.96098	-0.320047	0.666678	0.400892	0.236407	0.359196	0.397876	0.183632	0.206219
85	DES	DES Ala357Pro	0.0334099	0.96098	-0.26341	0.666678	0.241328	0.236407	0.359196	0.516166	0.183632	0.206219
86	DES	DES Ala397Thr	0.500556	0.96098	-0.255255	0.666678	0.802505	0.236407	0.359196	0.548423	0.183632	0.206219
87	DES	DES Arg127Pro	0.39693	0.96098	-0.120309	0.666678	0.698106	0.236407	0.359196	0.316707	0.183632	0.206219
88	DES	DES Arg150Gln	0.436826	0.96098	-0.128675	0.666678	0.705245	0.236407	0.359196	0.525045	0.183632	0.206219
89	DES	DES Arg16Cys	0.974271	0.96098	0.895651	0.666678	0.916468	0.236407	0.359196	0.582325	0.183632	0.206219
90	DES	DES Arg212Gln	0.876609	0.96098	0.257424	0.666678	0.939741	0.236407	0.359196	0.602379	0.183632	0.206219
91	DES	DES Arg222His	0.476384	0.96098	-0.139527	0.666678	0.784399	0.236407	0.359196	0.319717	0.183632	0.206219
92	DES	DES Arg227Cys	0.909985	0.96098	0.47929	0.666678	0.880824	0.236407	0.359196	0.398954	0.183632	0.206219
93	DES	DES Arg278Pro	0.339447	0.96098	-0.203738	0.666678	0.717046	0.236407	0.359196	0.523847	0.183632	0.206219
94	DES	DES Arg350Pro	0.838215	0.96098	0.334518	0.666678	0.940407	0.236407	0.359196	0.53797	0.183632	0.206219
95	DES	DES Arg355Pro	-0.262917	0.96098	-0.0822246	0.666678	0.0933792	0.236407	0.359196	0.00857638	0.183632	0.206219
96	DES	DES Arg37Trp	0.379265	0.96098	-0.0111762	0.666678	0.454628	0.236407	0.359196	0.325438	0.183632	0.206219
97	DES	DES Asn342Asp	-0.292271	0.96098	-0.234666	0.666678	0.651685	0.236407	0.359196	0.497773	0.183632	0.206219
98	DES	DES Asp312Ala	0.613102	0.96098	0.101297	0.666678	0.671385	0.236407	0.359196	0.473157	0.183632	0.206219
99	DES	DES Asp343Asn	0.581868	0.96098	-0.212229	0.666678	0.798856	0.236407	0.359196	0.31538	0.183632	0.206219
100	DES	DES Gln131Lys	0.845324	0.96098	0.0421202	0.666678	0.956003	0.236407	0.359196	0.552383	0.183632	0.206219
101	DES	DES Gln389Pro	0.948128	0.96098	0.485484	0.666678	0.954953	0.236407	0.359196	0.59401	0.183632	0.206219
102	DES	DES Gln99Glu	-0.0921232	0.96098	-0.312078	0.666678	0.294887	0.236407	0.359196	0.397539	0.183632	0.206219
103	DES	DES Glu245Asp	-0.112741	0.96098	-0.248482	0.666678	0.476539	0.236407	0.359196	0.419267	0.183632	0.206219
104	DES	DES Glu413Lys	0.688131	0.96098	-0.183831	0.666678	0.660136	0.236407	0.359196	0.445209	0.183632	0.206219
105	DES	DES Gly20Arg	0.830217	0.96098	0.302616	0.666678	0.838598	0.236407	0.359196	0.610163	0.183632	0.206219
106	DES	DES Gly44Ser	0.489784	0.96098	-0.0703174	0.666678	0.560471	0.236407	0.359196	0.371118	0.183632	0.206219
107	DES	DES Gly84Ser	0.456559	0.96098	-0.133157	0.666678	0.575886	0.236407	0.359196	0.306097	0.183632	0.206219
108	DES	DES His243Tyr	0.832262	0.96098	0.452825	0.666678	0.882084	0.236407	0.359196	0.249457	0.183632	0.206219
109	DES	DES His441Leu	0.691012	0.96098	-0.102578	0.666678	0.832119	0.236407	0.359196	0.454825	0.183632	0.206219
110	DES	DES Leu136Pro	0.910889	0.96098	0.390357	0.666678	0.949513	0.236407	0.359196	0.663502	0.183632	0.206219
111	DES	DES Leu274Pro	0.558595	0.96098	-0.338633	0.666678	0.824172	0.236407	0.359196	0.379396	0.183632	0.206219
112	DES	DES Leu338Arg	-0.118026	0.96098	-0.113017	0.666678	0.312462	0.236407	0.359196	0.328504	0.183632	0.206219
113	DES	DES Leu345Pro	-0.488238	0.96098	-0.275704	0.666678	0.442715	0.236407	0.359196	0.209938	0.183632	0.206219
114	DES	DES Met349Ile	0.50701	0.96098	-0.233906	0.666678	0.837868	0.236407	0.359196	0.516332	0.183632	0.206219
115	DES	DES Pro419Ser	0.352441	0.96098	-0.190622	0.666678	0.412984	0.236407	0.359196	0.37414	0.183632	0.206219
116	DES	DES Pro433Thr	0.671607	0.96098	-0.266801	0.666678	0.769299	0.236407	0.359196	0.499264	0.183632	0.206219
117	DES	DES Ser298Leu	0.59923	0.96098	-0.165427	0.666678	0.753565	0.236407	0.359196	0.550371	0.183632	0.206219
118	DES	DES Ser424Phe	0.680491	0.96098	-0.240006	0.666678	0.856886	0.236407	0.359196	0.507256	0.183632	0.206219
119	DES	DES Ser46Tyr	0.15768	0.96098	-0.221751	0.666678	0.141532	0.236407	0.359196	0.172794	0.183632	0.206219
120	DES	DES Thr219Ile	0.856067	0.96098	0.479039	0.666678	0.918727	0.236407	0.359196	0.574198	0.183632	0.206219
121	DES	DES Thr445Ala	0.769654	0.96098	0.233077	0.666678	0.899982	0.236407	0.359196	0.206988	0.183632	0.206219
122	DES	DES Thr453Ile	0.507269	0.96098	-0.286501	0.666678	0.783687	0.236407	0.359196	0.290476	0.183632	0.206219
123	DES	DES Tyr122Asp	0.966974	0.96098	0.6171	0.666678	0.906981	0.236407	0.359196	0.470362	0.183632	0.206219
124	DES	DES Tyr331Asn	0.0531358	0.96098	0.149425	0.666678	0.622804	0.236407	0.359196	0.415561	0.183632	0.206219
125	DES	DES Val126Leu	0.961887	0.96098	0.636213	0.666678	0.907963	0.236407	0.359196	0.526978	0.183632	0.206219
126	DES	DES Val394Met	0.42363	0.96098	-0.139181	0.666678	0.736489	0.236407	0.359196	0.404362	0.183632	0.206219
127	DES	DES Val469Met	-0.165913	0.96098	-0.272643	0.666678	0.251948	0.236407	0.359196	0.216013	0.183632	0.206219
128	DES	DES Val56Leu	0.547154	0.96098	-0.155579	0.666678	0.781271	0.236407	0.359196	0.337276	0.183632	0.206219
129	CA8	CA8 Arg237Gln	0.392047	0.766353	0.738549	0.678179	0.925869	0.236407	0.359196	0.561532	0.183632	0.206219
130	CDKN1A	CDKN1A Arg67Leu	0.883155	0.860748	0.580158	0.638411	0.711187	0.236407	0.359196	0.638306	0.183632	0.206219
131	CDKN1A	CDKN1A Arg84Gln	0.879257	0.860748	0.599658	0.638411	0.698247	0.236407	0.359196	0.590087	0.183632	0.206219
132	CDKN1A	CDKN1A Asp149Gly	0.952675	0.860748	0.919989	0.638411	0.892883	0.236407	0.359196	0.834276	0.183632	0.206219
133	CDKN1A	CDKN1A Ser31Arg	-0.143582	0.860748	0.242012	0.638411	0.587451	0.236407	0.359196	0.576635	0.183632	0.206219
134	EFHC1	EFHC1 Arg159Trp	0.442311	0.353609	0.65473	0.308837	0.278683	0.236407	0.359196	0.286627	0.183632	0.206219
135	EFHC1	EFHC1 Asp210Asn	0.710313	0.353609	0.314558	0.308837	0.647675	0.236407	0.359196	0.484588	0.183632	0.206219
136	EFHC1	EFHC1 Asp253Tyr	0.720759	0.353609	0.558822	0.308837	0.459286	0.236407	0.359196	0.133959	0.183632	0.206219
137	EFHC1	EFHC1 Cys259Tyr	0.55515	0.353609	-0.388751	0.308837	0.844412	0.236407	0.359196	0.555367	0.183632	0.206219
138	EFHC1	EFHC1 Ile174Val	0.662932	0.353609	0.620766	0.308837	0.390085	0.236407	0.359196	0.27129	0.183632	0.206219
139	EFHC1	EFHC1 Met448Thr	0.554476	0.353609	-0.328411	0.308837	0.774648	0.236407	0.359196	0.478676	0.183632	0.206219
140	EFHC1	EFHC1 Phe229Leu	0.700251	0.353609	0.275369	0.308837	0.706964	0.236407	0.359196	0.477644	0.183632	0.206219
141	BAG3	BAG3 Arg218Trp	0.866316	0.839816	0.36141	0.500452	0.798331	0.236407	0.359196	0.52026	0.183632	0.206219
142	BAG3	BAG3 Arg258Trp	-0.346637	0.839816	0.163514	0.500452	0.937087	0.236407	0.359196	0.203287	0.183632	0.206219
143	BAG3	BAG3 Arg477His	0.862916	0.839816	0.723351	0.500452	0.852504	0.236407	0.359196	0.484095	0.183632	0.206219
144	BAG3	BAG3 Leu462Pro	0.657287	0.839816	0.274224	0.500452	0.773469	0.236407	0.359196	0.349204	0.183632	0.206219
145	BAG3	BAG3 Pro380Ser	0.946702	0.839816	0.798054	0.500452	0.920722	0.236407	0.359196	0.659353	0.183632	0.206219
146	CSNK1D	CSNK1D His46Arg	0.489742	0.652241	0.0320267	0.600537	0.771899	0.236407	0.359196	0.401865	0.183632	0.206219
147	BFSP2	BFSP2 Ala407Asp	0.188576	0.86151	0.0961269	0.525939	0.451919	0.236407	0.359196	0.365802	0.183632	0.206219
148	BFSP2	BFSP2 Arg287Trp	-0.203345	0.86151	0.112663	0.525939	0.827873	0.236407	0.359196	0.672278	0.183632	0.206219
149	BFSP2	BFSP2 Arg339His	0.766487	0.86151	0.498634	0.525939	0.663021	0.236407	0.359196	0.434368	0.183632	0.206219
150	FADD	FADD Cys105Trp	0.333091	0.369791	0.218344	0.158193	0.324666	0.236407	0.359196	0.621813	0.183632	0.206219
151	AGXT	AGXT Ala186Val	0.40574	0.849535	0.20802	0.664522	0.881702	0.236407	0.359196	0.151812	0.183632	0.206219
152	AGXT	AGXT Ala210Pro	0.339954	0.849535	0.46798	0.664522	0.692378	0.236407	0.359196	0.14402	0.183632	0.206219
153	AGXT	AGXT Ala248Ser	0.599298	0.849535	0.407043	0.664522	0.736866	0.236407	0.359196	0.276903	0.183632	0.206219
154	AGXT	AGXT Ala248Val	0.919494	0.849535	0.884055	0.664522	0.767134	0.236407	0.359196	0.548562	0.183632	0.206219
155	AGXT	AGXT Ala280Val	0.846497	0.849535	0.672835	0.664522	0.651513	0.236407	0.359196	0.52764	0.183632	0.206219
156	AGXT	AGXT Ala295Thr	0.964667	0.849535	0.853996	0.664522	0.846539	0.236407	0.359196	0.607658	0.183632	0.206219
157	AGXT	AGXT Ala85Asp	0.335741	0.849535	0.186985	0.664522	0.808038	0.236407	0.359196	0.238338	0.183632	0.206219
158	AGXT	AGXT Arg111Gln	0.639816	0.849535	0.762217	0.664522	0.756375	0.236407	0.359196	0.518734	0.183632	0.206219
159	AGXT	AGXT Arg118Cys	0.676583	0.849535	0.563772	0.664522	0.702174	0.236407	0.359196	0.611143	0.183632	0.206219
160	AGXT	AGXT Arg197Gln	0.896846	0.849535	0.580685	0.664522	0.64914	0.236407	0.359196	0.410531	0.183632	0.206219
161	AGXT	AGXT Arg289His	0.374419	0.849535	0.452246	0.664522	0.807065	0.236407	0.359196	0.572713	0.183632	0.206219
162	AGXT	AGXT Arg301Cys	0.634586	0.849535	0.281424	0.664522	0.741832	0.236407	0.359196	0.450398	0.183632	0.206219
163	AGXT	AGXT Arg36Cys	0.095247	0.849535	-0.256261	0.664522	0.68908	0.236407	0.359196	0.212972	0.183632	0.206219
164	AGXT	AGXT Arg381Lys	0.887036	0.849535	0.689149	0.664522	0.781288	0.236407	0.359196	0.54443	0.183632	0.206219
165	AGXT	AGXT Asn22Ser	0.773378	0.849535	0.594779	0.664522	0.580641	0.236407	0.359196	0.462521	0.183632	0.206219
166	AGXT	AGXT Asp129His	0.927968	0.849535	0.750045	0.664522	0.717215	0.236407	0.359196	0.335258	0.183632	0.206219
167	AGXT	AGXT Asp201Asn	-0.153482	0.849535	-0.238843	0.664522	0.3578	0.236407	0.359196	0.466604	0.183632	0.206219
168	AGXT	AGXT Asp341Glu	0.0697666	0.849535	-0.0565573	0.664522	0.356182	0.236407	0.359196	0.276413	0.183632	0.206219
169	AGXT	AGXT Glu274Asp	0.607957	0.849535	0.58081	0.664522	0.577989	0.236407	0.359196	0.5882	0.183632	0.206219
170	AGXT	AGXT Gly116Arg	0.41309	0.849535	0.752629	0.664522	0.708328	0.236407	0.359196	0.568779	0.183632	0.206219
171	AGXT	AGXT Gly156Arg	0.216318	0.849535	0.228235	0.664522	0.583163	0.236407	0.359196	0.281612	0.183632	0.206219
172	AGXT	AGXT Gly161Arg	0.278343	0.849535	0.67403	0.664522	0.73253	0.236407	0.359196	0.47044	0.183632	0.206219
173	AGXT	AGXT Gly161Ser	0.728239	0.849535	0.520919	0.664522	0.781395	0.236407	0.359196	0.327061	0.183632	0.206219
174	AGXT	AGXT Gly41Arg	0.664937	0.849535	0.807193	0.664522	0.381512	0.236407	0.359196	0.204492	0.183632	0.206219
175	AGXT	AGXT Gly41Glu	0.814439	0.849535	0.611575	0.664522	0.745704	0.236407	0.359196	0.4382	0.183632	0.206219
176	AGXT	AGXT Gly82Arg	0.915432	0.849535	0.927311	0.664522	0.869626	0.236407	0.359196	0.668718	0.183632	0.206219
177	AGXT	AGXT Ile202Asn	0.203934	0.849535	0.514362	0.664522	0.717426	0.236407	0.359196	0.34812	0.183632	0.206219
178	AGXT	AGXT Ile279Met	0.078649	0.849535	0.686356	0.664522	0.824106	0.236407	0.359196	0.437371	0.183632	0.206219
179	AGXT	AGXT Ile279Thr	-0.204764	0.849535	-0.26962	0.664522	0.207085	0.236407	0.359196	0.198754	0.183632	0.206219
180	AGXT	AGXT Ile340Met	0.903956	0.849535	0.644107	0.664522	0.691263	0.236407	0.359196	0.635978	0.183632	0.206219
181	AGXT	AGXT Leu298Pro	0.235721	0.849535	0.26251	0.664522	0.602883	0.236407	0.359196	0.226917	0.183632	0.206219
182	AGXT	AGXT Lys12Arg	0.937222	0.849535	0.939283	0.664522	0.787612	0.236407	0.359196	0.686885	0.183632	0.206219
183	AGXT	AGXT Met195Leu	0.670139	0.849535	0.443538	0.664522	0.674409	0.236407	0.359196	0.311371	0.183632	0.206219
184	AGXT	AGXT Met49Leu	0.666889	0.849535	0.43624	0.664522	0.765939	0.236407	0.359196	0.509871	0.183632	0.206219
185	AGXT	AGXT Phe152Ile	0.704864	0.849535	0.857269	0.664522	0.363803	0.236407	0.359196	0.142884	0.183632	0.206219
186	AGXT	AGXT Pro10Ala	0.906964	0.849535	0.89344	0.664522	0.809542	0.236407	0.359196	0.572227	0.183632	0.206219
187	AGXT	AGXT Pro11His	0.689958	0.849535	0.591079	0.664522	0.753937	0.236407	0.359196	0.314062	0.183632	0.206219
188	AGXT	AGXT Pro11Leu	0.33432	0.849535	0.158312	0.664522	0.659494	0.236407	0.359196	0.11871	0.183632	0.206219
189	AGXT	AGXT Pro319Leu	0.59928	0.849535	0.328312	0.664522	0.733404	0.236407	0.359196	0.291737	0.183632	0.206219
190	AGXT	AGXT Ser187Phe	0.59755	0.849535	0.759807	0.664522	0.804119	0.236407	0.359196	0.642991	0.183632	0.206219
191	AGXT	AGXT Ser218Leu	0.64313	0.849535	0.881151	0.664522	0.826705	0.236407	0.359196	0.762984	0.183632	0.206219
192	AGXT	AGXT Ser221Pro	0.248213	0.849535	0.672312	0.664522	0.686656	0.236407	0.359196	0.368276	0.183632	0.206219
193	AGXT	AGXT Val162Met	0.65846	0.849535	0.557249	0.664522	0.304567	0.236407	0.359196	0.244205	0.183632	0.206219
194	AGXT	AGXT Val326Ile	0.684498	0.849535	0.577242	0.664522	0.612692	0.236407	0.359196	0.536806	0.183632	0.206219
195	COQ8A	COQ8A Gly272Asp	0.674594	0.317254	0.469798	0.243587	0.672593	0.236407	0.359196	0.215357	0.183632	0.206219
196	COQ8A	COQ8A Gly549Ser	0.34926	0.317254	-0.0127499	0.243587	0.732797	0.236407	0.359196	0.394383	0.183632	0.206219
197	COQ8A	COQ8A His80Tyr	-0.376851	0.317254	-0.444023	0.243587	0.755664	0.236407	0.359196	0.572319	0.183632	0.206219
198	CHN1	CHN1 Glu313Lys	0.823906	0.497718	0.649934	0.537013	0.737574	0.236407	0.359196	0.527971	0.183632	0.206219
199	CHN1	CHN1 Ile126Met	0.742673	0.497718	0.509495	0.537013	0.721022	0.236407	0.359196	0.707186	0.183632	0.206219
200	CHN1	CHN1 Pro141Leu	0.762687	0.497718	0.537037	0.537013	0.831193	0.236407	0.359196	0.675641	0.183632	0.206219
201	CHN1	CHN1 Pro252Ser	-0.1729	0.497718	0.415451	0.537013	0.844275	0.236407	0.359196	0.771959	0.183632	0.206219
202	CHN1	CHN1 Tyr143His	0.733163	0.497718	0.640303	0.537013	0.722927	0.236407	0.359196	0.774065	0.183632	0.206219
203	CDC73	CDC73 Met1Ile	-0.682187	0.765012	0.11803	0.527967	0.817551	0.236407	0.359196	0.434372	0.183632	0.206219
204	COMP	COMP Ala171Thr	0.0519353	0.48337	-0.0979181	0.565594	0.757712	0.236407	0.359196	0.479403	0.183632	0.206219
205	COMP	COMP Arg718Pro	0.014536	0.48337	-0.427918	0.565594	0.870755	0.236407	0.359196	0.377358	0.183632	0.206219
206	COMP	COMP Asn523Lys	0.0469446	0.48337	-0.370681	0.565594	0.735884	0.236407	0.359196	0.171554	0.183632	0.206219
207	COMP	COMP Asn555Lys	0.278054	0.48337	0.291873	0.565594	0.888965	0.236407	0.359196	0.870107	0.183632	0.206219
208	COMP	COMP Asp271His	0.448496	0.48337	0.305684	0.565594	0.214483	0.236407	0.359196	0.487396	0.183632	0.206219
209	COMP	COMP Asp319Val	0.482311	0.48337	0.112505	0.565594	0.42538	0.236407	0.359196	0.328242	0.183632	0.206219
210	COMP	COMP Asp342Tyr	0.0366465	0.48337	-0.564795	0.565594	0.741217	0.236407	0.359196	0.421236	0.183632	0.206219
211	COMP	COMP Asp408Asn	0.652258	0.48337	0.683344	0.565594	0.159243	0.236407	0.359196	0.407203	0.183632	0.206219
212	COMP	COMP Asp408His	0.0460424	0.48337	0.137493	0.565594	0.318446	0.236407	0.359196	0.192894	0.183632	0.206219
213	COMP	COMP Asp511Glu	0.098606	0.48337	-0.178495	0.565594	0.690808	0.236407	0.359196	0.512038	0.183632	0.206219
214	COMP	COMP Asp530Glu	-0.0878529	0.48337	-0.311885	0.565594	0.914823	0.236407	0.359196	0.29386	0.183632	0.206219
215	COMP	COMP Asp605Asn	-0.0132953	0.48337	-0.343115	0.565594	0.815929	0.236407	0.359196	0.347799	0.183632	0.206219
216	COMP	COMP Cys348Arg	0.1296	0.48337	-0.320824	0.565594	0.879598	0.236407	0.359196	0.574866	0.183632	0.206219
217	COMP	COMP Gly207Asp	-0.163465	0.48337	-0.386234	0.565594	0.768955	0.236407	0.359196	0.388437	0.183632	0.206219
218	COMP	COMP His189Arg	-0.309563	0.48337	-0.339264	0.565594	0.793272	0.236407	0.359196	0.364794	0.183632	0.206219
219	COMP	COMP His441Arg	-0.271095	0.48337	-0.36719	0.565594	0.753647	0.236407	0.359196	0.433715	0.183632	0.206219
220	COMP	COMP His587Arg	-0.0776071	0.48337	-0.475992	0.565594	0.865808	0.236407	0.359196	0.471006	0.183632	0.206219
221	COMP	COMP Ser681Cys	0.654227	0.48337	0.675944	0.565594	0.266211	0.236407	0.359196	0.207867	0.183632	0.206219
222	COMP	COMP Thr529Ile	-0.0165323	0.48337	0.100917	0.565594	0.634356	0.236407	0.359196	0.47933	0.183632	0.206219
223	COMP	COMP Thr585Arg	0.278612	0.48337	-0.0138169	0.565594	0.833655	0.236407	0.359196	0.415051	0.183632	0.206219
224	COMP	COMP Thr585Lys	0.37571	0.48337	0.13773	0.565594	0.520079	0.236407	0.359196	0.371695	0.183632	0.206219
225	COMP	COMP Thr585Met	0.0867655	0.48337	-0.326112	0.565594	0.848373	0.236407	0.359196	0.363455	0.183632	0.206219
226	AMPD2	AMPD2 Glu697Asp	0.766463	0.81178	0.849455	0.470379	0.772086	0.236407	0.359196	0.249831	0.183632	0.206219
227	CORO1A	CORO1A Val397Ile	0.0615145	0.852053	0.306829	0.385279	0.81264	0.236407	0.359196	0.64429	0.183632	0.206219
228	APOA1	APOA1 Ala188Ser	-0.092556	0.596283	-0.152395	0.463634	0.484829	0.236407	0.359196	0.358139	0.183632	0.206219
229	APOA1	APOA1 Ala199Pro	-0.00654168	0.596283	0.693163	0.463634	0.886548	0.236407	0.359196	0.0456008	0.183632	0.206219
230	APOA1	APOA1 Arg197Cys	0.0274639	0.596283	-0.173555	0.463634	0.875393	0.236407	0.359196	0.228524	0.183632	0.206219
231	APOA1	APOA1 Arg34Leu	-0.261367	0.596283	0.698514	0.463634	0.861617	0.236407	0.359196	0.448169	0.183632	0.206219
232	APOA1	APOA1 Leu114Pro	-0.0300296	0.596283	0.840867	0.463634	0.838112	0.236407	0.359196	0.455979	0.183632	0.206219
233	APOA1	APOA1 Leu198Ser	-0.0397486	0.596283	-0.0862165	0.463634	0.748647	0.236407	0.359196	0.0596162	0.183632	0.206219
234	APOA1	APOA1 Leu84Arg	-0.0390537	0.596283	0.230308	0.463634	0.859377	0.236407	0.359196	0.246977	0.183632	0.206219
235	APOA1	APOA1 Trp74Arg	-0.157724	0.596283	0.747219	0.463634	0.873373	0.236407	0.359196	0.448962	0.183632	0.206219
236	APOA1	APOA1 Val180Glu	-0.124348	0.596283	0.887283	0.463634	0.834699	0.236407	0.359196	0.426655	0.183632	0.206219
237	CFP	CFP Tyr414Asp	-0.457254	0.862304	0.264033	0.204986	0.691247	0.236407	0.359196	0.423041	0.183632	0.206219
238	DIABLO	DIABLO Ala3Gly	0.406146	0.544288	0.406309	0.36727	0.328608	0.236407	0.359196	0.255339	0.183632	0.206219
239	DIABLO	DIABLO Gly224Arg	0.71521	0.544288	0.395186	0.36727	0.461083	0.236407	0.359196	0.504745	0.183632	0.206219
240	DIABLO	DIABLO Ile59Val	0.426364	0.544288	0.211285	0.36727	0.787706	0.236407	0.359196	0.479351	0.183632	0.206219
241	DIABLO	DIABLO Ser126Leu	0.55814	0.544288	0.42785	0.36727	0.651042	0.236407	0.359196	0.166687	0.183632	0.206219
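The "approach 2" scores in the table are described as the average of per-plate cc impact scores. Here is a minimal sketch of that averaging. It assumes that on each plate the WT and the MT are each summarized by one profile (for example, the mean of their replicate profiles) and that cc is the Pearson correlation between those two feature vectors. The plate names and numbers are hypothetical, and running it separately on protein-channel and non-protein-channel features is presumably what yields the separate _p and _np scores.

```python
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cc_impact_score(per_plate):
    """Average over plates of the WT-vs-MT Pearson correlation.

    per_plate: {plate_id: (wt_profile, mt_profile)}
    """
    return statistics.fmean([pearson(wt, mt) for wt, mt in per_plate.values()])


# Hypothetical per-plate consensus profiles (3 features each).
per_plate = {
    "plate_1": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # r = 1.0
    "plate_2": ([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]),  # r = 0.5
}
print(round(cc_impact_score(per_plate), 3))  # 0.75
```

A lower averaged cc means the MT profile resembles its WT less, so lower values point toward a more impactful variant.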

Summary

If we call a variant impactful when the correlation coefficient of its WT/MT pair is below the 10th percentile of the replicate correlation distribution, the percentages of impactful variants are:
~45% of variants in protein channel are impactful
~41% of variants in non-protein channel are impactful
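The impact call above is a per-row comparison of cc_p and cc_np against the Rep10Perc_p and Rep10Perc_np columns of the table (which, per the discussion that follows, hold the 10th percentile of the replicate correlation distribution). A minimal sketch using only the first ten rows of the table, so the fractions it prints are not the ~45% / ~41% reported for all variants:

```python
# (Metadata_Sample_Unique, cc_p, cc_np) copied from the first ten rows of the table above.
ROWS = [
    ("DOLK Tyr441Ser", 0.812154, 0.549158),
    ("EMD Ala56Thr", 0.400925, 0.141576),
    ("EMD Asp72Val", 0.496056, 0.261438),
    ("EMD Met1Val", 0.689438, 0.358089),
    ("EMD Pro183His", 0.101571, 0.263846),
    ("EMD Pro183Thr", 0.290396, 0.10283),
    ("EMD Ser54Phe", 0.543792, 0.156373),
    ("IMPDH1 Arg309Pro", 0.354869, 0.242057),
    ("IMPDH1 Asp311Asn", -0.209642, 0.458999),
    ("AIPL1 Arg270His", 0.273113, 0.605787),
]
REP10PERC_P = 0.359196   # Rep10Perc_p column (same value on every row)
REP10PERC_NP = 0.206219  # Rep10Perc_np column (same value on every row)


def impactful(rows, column, threshold):
    """Variants whose WT/MT correlation in `column` (1 = cc_p, 2 = cc_np) is below `threshold`."""
    return [row[0] for row in rows if row[column] < threshold]


for name, column, threshold in [("protein", 1, REP10PERC_P), ("non-protein", 2, REP10PERC_NP)]:
    hits = impactful(ROWS, column, threshold)
    print(f"{name}: {len(hits)}/{len(ROWS)} impactful -> {hits}")
```

Applied to every row of the table, the same count is, as we understand it, the calculation behind the percentages above.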

AnneCarpenter commented on May 29, 2024

Exciting!
I'm confused, though, because it looks like 80% of the red pairs are to the right of the dotted lines here and 10-20% of pairs are to the left of the lines. Am I missing something? Neither is around 40%, so I must be misinterpreting something. Also, I wasn't sure what "the replicate correlate dist" means.

MarziehHaghighi commented on May 29, 2024

These distributions are the regular replicate correlation distributions (red), along with their corresponding null distributions (blue). The 40% is not captured here. The only number in this figure that influences the 40% is where the red dotted line falls: the 10th percentile of the red distribution (the distribution of correlation coefficients among replicates). The distribution of impact scores is not shown in this figure, but you can check the per-WT/MT-pair values in the table. For example, in the "cc_np" column of that table, 40% of the values should be below the red dotted line in the non-protein-channel figure you copied, which is at 0.2.

AnneCarpenter commented on May 29, 2024

I see! It would be nice to visually see the distribution of the WT-MT pairs (because IIUC the histograms are only showing WT replicates and MT replicates in red, or scrambled replicates in blue) but I am following the logic now.

yhan8 commented on May 29, 2024

Drafting my steps here. In the metadata, the column Metadata_Sample_Unique includes the wild-type and mutant names. Two kinds of replicability will be calculated using evalzoo:

Technical replicability: whether wells with the same Metadata_Sample_Unique can be retrieved as replicates. There are no controls in the data (all 516-TC wells are removed), so it will be replicates against non-replicates, by plate.

All analysis was done with Copairs. The three plates were combined and all 516-TC wells were removed, which gave us 1077 samples. We define replicates as wells that have the same Metadata_Sample_Unique. To see if we can retrieve replicates from non-replicates, the following parameters were implemented in Copairs: pos_sameby = ['Metadata_Sample_Unique'], neg_diffby = ['Metadata_Sample_Unique']. We got a p value for each individual sample, so I then aggregated the results by Metadata_Sample_Unique. The table and figure show the unique Metadata_Sample_Unique values that passed the significance threshold.
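For readers new to the retrieval setup, here is a rough numpy sketch of what per-sample average precision means under these settings: positives share the query's Metadata_Sample_Unique, negatives have a different one, and scores are then averaged per Metadata_Sample_Unique. It does not call copairs, it assumes Pearson correlation as the similarity, and it leaves out the permutation p-values that copairs reports. The function names and toy profiles are hypothetical:

```python
import numpy as np

def average_precision(sims, is_pos):
    """AP of one query: rank candidates by similarity, average precision at each positive."""
    order = np.argsort(-sims, kind="stable")
    hits = is_pos[order].astype(float)
    if hits.sum() == 0:
        return np.nan  # no positives for this query
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def per_sample_map(profiles, labels):
    """Technical replicability sketch: positives share the label, negatives do not."""
    x = profiles - profiles.mean(axis=1, keepdims=True)
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    corr = x @ x.T  # Pearson correlation between every pair of wells
    labels = np.asarray(labels)
    aps = np.empty(len(labels))
    for i in range(len(labels)):
        others = np.arange(len(labels)) != i  # never compare a well with itself
        aps[i] = average_precision(corr[i, others], labels[others] == labels[i])
    # Aggregate per-well AP into one mAP per Metadata_Sample_Unique value
    return {str(lab): float(np.nanmean(aps[labels == lab])) for lab in np.unique(labels)}

profiles = np.array([[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1]], dtype=float)
labels = ["sample_A", "sample_A", "sample_B", "sample_B"]
result = per_sample_map(profiles, labels)
print(result)  # {'sample_A': 1.0, 'sample_B': 1.0}
```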

yhan8 commented on May 29, 2024

Biological replicability: whether a mutant can be retrieved from its own wild type.

To see if we can retrieve mutants from their own WT, the following parameters were implemented in Copairs: pos_sameby = ['Metadata_Gene'], pos_diffby = ['Metadata_type'], neg_diffby = ['Metadata_Gene']. That is, we match mutants to their WT (a particular gene name) against the rest of the gene names, including both their WTs and MTs. I got a p value for each individual sample. Since each WT has different mutants, the interesting question is which particular mutants have an effect relative to their WT. Thus, I removed all the WTs from the Copairs results and then aggregated the results by Metadata_Sample_Unique, which in this case corresponds to each unique mutant. The table and figure below show whether each unique MT passed the significance threshold; in this case, however, we care about those that did not pass the threshold, meaning the MT had an effect relative to its WT.
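The positive/negative definitions above can be written as boolean masks over the metadata. This sketch assumes hypothetical `Metadata_type` values "WT" and "MT" and invented wells. One consequence worth noticing: other mutants of the query's own gene end up as neither positives nor negatives, so they are ignored for that query.

```python
import numpy as np

def biological_masks(genes, types, query):
    """Positives: same gene, different type. Negatives: different gene."""
    genes, types = np.asarray(genes), np.asarray(types)
    pos = (genes == genes[query]) & (types != types[query])  # pos_sameby gene, pos_diffby type
    neg = genes != genes[query]                               # neg_diffby gene
    return pos, neg

genes = ["APOA1", "APOA1", "APOA1", "DIABLO", "DIABLO"]
types = ["WT", "MT", "MT", "WT", "MT"]  # hypothetical Metadata_type values
pos, neg = biological_masks(genes, types, query=1)  # an APOA1 mutant well
print(pos.tolist())  # [True, False, False, False, False]
print(neg.tolist())  # [False, False, False, True, True]

# Only mutant queries are kept before aggregating by Metadata_Sample_Unique
mt_queries = np.flatnonzero(np.asarray(types) == "MT")
print(mt_queries.tolist())  # [1, 2, 4]
```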

yhan8 commented on May 29, 2024

@MarziehHaghighi has generated a correlation score for each Metadata_Sample_Unique indicating whether we can retrieve an MT from its WT, the equivalent of this analysis. I plotted the mAP score for each Metadata_Sample_Unique from Copairs on the same plot as the correlation score. I noticed that 12 unique Metadata_Sample_Unique values are included in my mAP scores but not in @MarziehHaghighi's correlation scores:

['CTH Gln240Glu',
'BLMH Ile443Val',
'AP2S1 Arg15Cys',
'CLDN19 Arg200Gln',
'CUL3 Lys459Arg',
'CTNNA3 Val94Asp',
'AP2S1 Arg15His',
'BLK Ala71Thr',
'CCBE1 Gly136Arg',
'CLDN19 Gln57Glu',
'CTH Thr67Ile',
'CTH Ser403Ile']

For protein channel

For non-protein channel

AnneCarpenter commented on May 29, 2024

IIUC the samples that correlate highly but have low average precision must be getting mixed up with lots of other samples in the experiment. That is, they have a strong phenotype that is similar to many other samples so it's hard to retrieve. (does anyone have an alternate explanation?) I am surprised it happens so often - the top left quadrant is much more full than I would have guessed.
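Anne's explanation can be checked with a toy calculation. With a single positive, AP is simply 1 divided by the rank of that positive, so a high correlation still gives a low AP when several other samples correlate with the query even more strongly. The correlation values here are invented for illustration, and ties are assumed to break in favor of the positive:

```python
import numpy as np

# Invented correlations of one query profile with its single positive
# and with six negatives (other samples in the experiment)
corr_to_positive = 0.8
corr_to_negatives = np.array([0.9, 0.85, 0.82, 0.3, 0.1, -0.2])

# With one positive, AP = 1 / (rank of the positive)
rank = 1 + int(np.sum(corr_to_negatives > corr_to_positive))
ap = 1.0 / rank
print(rank, ap)  # 4 0.25
```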

MarziehHaghighi commented on May 29, 2024

@yhan8, as a first step, please post your data stats as I did in my report comment so we can make sure they are consistent. Here that would be the number of samples, the level of profiles you used, and the number of features for each of "p" and "np". The pattern for "np" is weird, so I suspect there is some discrepancy in "np". If you check out the short script I used to generate my analysis, you can figure out what I filtered and why you have extra samples.

About the plots: it would be great to have the x and y axes on the same scale. One spans 0 to 1 and the other -1 to 1, so ideally your x axis would be half the length of your y axis.
The first plot makes more sense to me; I was expecting a linear relationship from 0-1 between cc and map! (This confirms Anne's comment: I was not expecting many samples to have a strong phenotype on average but not be retrievable, given that we have high replicability in the dataset in general.) The negative cc values are all near zero for map, which also makes sense! But the second plot ("np") is weird. So let's check whether the features that are used (and the level of data) match, as I can't think of other preprocessing details that might have caused this (there is really nothing more to it: we just read profiles and filter the feature names for each of the cc and map analyses :D)

AnneCarpenter commented on May 29, 2024

That's a good idea. I agree, Yu, that it's good to record those stats so we can reality-check that everything looks sensible.

yhan8 commented on May 29, 2024

Data Stats:

100 unique WTs
254 unique pairs
Used feature selected level of profiles
101 protein channel features and 584 non-protein channel features corresponding to the rest of channels went for analysis
Note: all stats are consistent except I have 6 more protein features.

I am unclear on why @MarziehHaghighi's profiles filtered out an additional 12 pairs; I'll leave that to her.

Regarding the x and y scales, I am not sure if we want to visualize it like this?

AnneCarpenter commented on May 29, 2024

I should add a comment, Marzieh: I think the two plots are really similar. I don't see np as problematic (except those two weird points!). They are both showing a gentle curve but with lots in the top left.

MarziehHaghighi commented on May 29, 2024

@AnneCarpenter, to me the unexpected pattern is much stronger for np: ~50 samples have MAP > 0.3 in the protein plot but only ~15 in the non-protein plot, although in cc they look the same (many high-cc points exist for both).

Thanks for checking, @yu. I cloned the repo (as it was among the repos I lost) but realized I can't regenerate the results, since I was using functions from the main rare_disease repo, which I lost in the EC2 termination incident! I had done major refactoring on that repo over the past few months, and it is all gone :((! Anyway, I wanted to check the reason behind the missing variants, but since you have more variants, I'll skip further checking, as we want to switch to your way of analysis anyway!

yhan8 commented on May 29, 2024

Next steps:
Plot the nlogp values, with biological on the x axis and technical on the y axis. The quadrant we care about contains the WT/MT pairs that passed the technical significance threshold but did not pass the biological threshold, meaning there is a real signal that the MT has an effect relative to its WT, not just random noise.
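The quadrant of interest can be flagged directly from the two nlogp values (nlogp = -log10 p). This is a sketch that assumes p < 0.05 as the significance cutoff on both axes; the function name and example values are hypothetical:

```python
import numpy as np

NLOGP_CUTOFF = -np.log10(0.05)  # about 1.30, i.e. p < 0.05

def flag_quadrant(tech_nlogp, bio_nlogp, cutoff=NLOGP_CUTOFF):
    """True where technical retrieval is significant but biological retrieval is not."""
    tech, bio = np.asarray(tech_nlogp), np.asarray(bio_nlogp)
    return (tech > cutoff) & (bio <= cutoff)

# Hypothetical nlogp values for three WT/MT pairs
flags = flag_quadrant([3.0, 0.5, 2.0], [0.4, 0.2, 5.0])
print(flags.tolist())  # [True, False, False]
```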

AnneCarpenter commented on May 29, 2024

And to clarify, we probably want to reverse the axes so biological is on the y axis. Also, we couldn't decide whether the technical axis should show retrieval for the WT or for the MUT of each pair. We waffled between making two copies of this chart, one with each on that axis, and plotting both (even better, with a line connecting them). It depends on how complex it is to plot all this. Yu is going to read the lung allele paper to better understand the concepts and take a look at the sparkler plot, which aims to address this plotting conundrum (but is not very intuitive!).

AnneCarpenter commented on May 29, 2024

In checkin we talked about different visualizations that could work here:

3D with WT and MUT tech retrieval on separate axes (though hard to publish unless the data cooperates to be nicely viewable at a good angle),
drawing a straight line between paired WT and MUT dots on x axis (with bio value being the same for both, such that all lines would be horizontal), could get messy but shows all the info we need.
I realized another approach that may be ideal and also easy to implement: just plot the maximum of the WT or MUT tech retrieval for each pair on the x axis (and maybe have a 3-color legend for the dots, where the color indicates whether (a) WT tech retrieval is above the threshold, (b) MUT is above the threshold, or (c) both).
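The "maximum plus color" idea takes only a few lines. This sketch assumes technical retrieval is summarized as nlogp for each pair's WT and MUT, uses p < 0.05 as the threshold, and adds a fourth "neither" category; all names and values are hypothetical:

```python
import numpy as np

def tech_axis_and_color(wt_nlogp, mut_nlogp, cutoff=-np.log10(0.05)):
    """x = max(WT, MUT) technical nlogp; color = which of the two passed the cutoff."""
    wt = np.asarray(wt_nlogp, dtype=float)
    mut = np.asarray(mut_nlogp, dtype=float)
    x = np.maximum(wt, mut)
    colors = [
        "both" if w and m else "WT" if w else "MUT" if m else "neither"
        for w, m in zip(wt > cutoff, mut > cutoff)
    ]
    return x, colors

x, colors = tech_axis_and_color([2.0, 0.5, 3.0, 0.1], [0.2, 1.5, 4.0, 0.3])
print(x.tolist(), colors)  # [2.0, 1.5, 4.0, 0.3] ['WT', 'MUT', 'both', 'neither']
```

Taking the maximum keeps one point per pair, so the biological value stays on the other axis without the horizontal connector lines.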

yhan8 commented on May 29, 2024

I realized another approach that may be ideal and also easy to implement: just plot the maximum of the WT or MUT tech retrieval for each pair on the x axis (and maybe have a 3-color legend for the dots, where the color indicates whether (a) WT tech retrieval is above the threshold, (b) MUT is above the threshold, or (c) both).

Here is the plot of biological retrieval for the 254 WT+MT pairs (i.e., excluding the WT queries), that is, whether each unique MT has an effect relative to its WT. The x axis is the average precision score and the y axis is the nlogp value. The dotted red line is the p = 0.05 threshold. I color-coded the points by whether: 1) the WT passed the technical retrieval threshold (p < 0.05); 2) the MT but not the WT passed the technical retrieval threshold; 3) both the WT and the MT passed the technical retrieval threshold; 4) neither the WT nor the MT passed it (shown as "False"). This helps us determine whether the signal we see is noise, and where the noise comes from.

Note that we care about WT+MT pairs that do not pass the significance threshold, meaning we cannot retrieve the MT from its WT, and hence the MT has an effect.

For protein channel

For non-protein channel

yhan8 commented on May 29, 2024

Check-in on 6/22:
We have concluded that the distance metric (mAP) may not be the best approach because of the common protein properties across different genes. @shntnu will discuss with Marzieh whether we should try a classifier, and we will go from there.

shntnu commented on May 29, 2024

We have thus far used the mean average precision (mAP) framework for hit calling. In this method, a variant is deemed a 'hit' if it cannot retrieve wild-type replicates efficiently against all other wells on the same plate. To be specific:

One replicate of a variant is queried against all different-gene wells on the same plate; these serve as negative connections (typically n = 384 minus the number of same-gene wells on the plate). It is also compared to the corresponding (same-gene) wild-type wells across all plates; these serve as positive connections (typically n = 4, since everything typically has four replicates).
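The query construction can be expressed as two masks over well metadata. This sketch assumes hypothetical plate, gene, and type arrays (type values "WT"/"MT") and a query that is a variant well, so the query itself is never a positive:

```python
import numpy as np

def hit_calling_masks(plates, genes, types, query):
    """Positives: same-gene WT wells on any plate. Negatives: different-gene wells on the query's plate."""
    plates, genes, types = (np.asarray(a) for a in (plates, genes, types))
    pos = (genes == genes[query]) & (types == "WT")
    neg = (genes != genes[query]) & (plates == plates[query])
    return pos, neg

plates = ["P1", "P1", "P2", "P1", "P2"]
genes = ["CFP", "CFP", "CFP", "DIABLO", "DIABLO"]
types = ["MT", "WT", "WT", "WT", "WT"]
pos, neg = hit_calling_masks(plates, genes, types, query=0)
print(pos.tolist(), neg.tolist())
# [False, True, True, False, False] [False, False, False, True, False]
```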

An mAP score and a corresponding p-value are derived for each variant from these query results. If the FDR-adjusted p-value is less than the prespecified alpha, we infer that we can retrieve the wild-type replicates well, implying that the wild type and the mutant are practically indistinguishable (w.r.t. other perturbations; this is a key point). If the adjusted p-value is not below this threshold, the variant is considered a hit. (Note, this is somewhat convoluted, as we're essentially designating it as a hit if we can't reject the null hypothesis, but this is not a major issue.)
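The FDR step is commonly a Benjamini-Hochberg adjustment, although the method is not named above, so treat that as an assumption. Below is a self-contained sketch of that adjustment followed by the hit rule as stated, where a variant is a hit when its adjusted p-value is not below alpha. The p-values are made up:

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each value becomes the min over itself and all larger ranks
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def call_hits(pvals, alpha=0.05):
    """Hit = WT replicates NOT retrieved significantly (null not rejected)."""
    return benjamini_hochberg(pvals) >= alpha

pvals = [0.01, 0.04, 0.03, 0.2]  # invented per-variant mAP p-values
print(np.round(benjamini_hochberg(pvals), 4).tolist())  # [0.04, 0.0533, 0.0533, 0.2]
print(call_hits(pvals).tolist())                         # [False, True, True, True]
```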

Problematic Scenario:

Complications arise when the wild-type and the mutant only have subtle differences. For instance, if the wild-type elongates the cell nucleus and the mutant further enhances this phenotype, it will likely result in a high retrieval score due to the rarity of nuclear elongation as a phenotype. This implies the variant would not be tagged as a 'hit' despite the minor yet significant difference between the wild-type and the mutant.
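Here is a toy version of this scenario, using cosine similarity on three invented features (nuclear elongation first); both the similarity measure and the numbers are assumptions. The mutant differs from the WT by a full unit of elongation, yet the WT is still its closest match among the profiles, so AP stays at 1 and the variant is not called a hit:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented features: [nuclear elongation, cell area, texture]
wt = np.array([2.0, 0.1, 0.0])   # WT elongates the nucleus
mut = np.array([3.0, 0.1, 0.0])  # mutant elongates it further
others = np.array([[0.0, 1.0, 0.5], [0.2, -0.8, 1.0], [-0.1, 0.5, -1.0]])  # no elongation

sim_pos = cosine(mut, wt)
sim_neg = np.array([cosine(mut, o) for o in others])
rank = 1 + int(np.sum(sim_neg > sim_pos))  # single positive, so AP = 1 / rank
print(np.linalg.norm(mut - wt), 1.0 / rank)  # 1.0 1.0
```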

Potential Solution (long-term):

The primary focus should be determining whether a screenable phenotype exists. In this context, we should aim to train a classifier capable of differentiating between the wild-type and the mutant. If the classifier exhibits good accuracy, it suggests the presence of a screenable phenotype.

Previously, this method was not explored due to insufficient replicates. Despite four replicates being inadequate even now, we can execute this approach at a single-cell level, which is the most promising direction moving forward.
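As a stand-in for whatever classifier is eventually chosen, here is a nearest-centroid WT-vs-MUT classifier on single-cell features. It is evaluated by holding out whole wells, so that cells from one well never appear in both training and test data. The classifier choice, function name, and simulated cells are assumptions for illustration, and the sketch needs at least two wells per label:

```python
import numpy as np

def leave_one_well_out_accuracy(cells, labels, wells):
    """Nearest-centroid WT-vs-MUT classifier; each well is held out in turn."""
    cells = np.asarray(cells, dtype=float)
    labels, wells = np.asarray(labels), np.asarray(wells)
    correct = 0
    for well in np.unique(wells):
        test = wells == well
        train = ~test
        classes = np.unique(labels[train])
        # One centroid per label, computed from training wells only
        centroids = np.stack([cells[train & (labels == c)].mean(axis=0) for c in classes])
        dists = np.linalg.norm(cells[test][:, None, :] - centroids[None, :, :], axis=2)
        predicted = classes[np.argmin(dists, axis=1)]
        correct += int(np.sum(predicted == labels[test]))
    return correct / len(labels)

# Simulated single cells: 2 WT wells and 2 MUT wells, 20 cells each, 5 features
rng = np.random.default_rng(0)
wells = np.repeat(["wt_1", "wt_2", "mut_1", "mut_2"], 20)
labels = np.repeat(["WT", "WT", "MUT", "MUT"], 20)
cells = rng.normal(size=(80, 5))
cells[labels == "MUT", 0] += 6.0  # mutant shifts the first feature
acc = leave_one_well_out_accuracy(cells, labels, wells)
print(acc)
```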

Stop-gap solution:

The current mAP-based approach is a reasonable stop-gap solution. We may miss some hits (lower sensitivity), but the called hits will likely be true (high specificity).

Additional notes:

Marzieh and I evaluated whether her previous method of hit calling (as discussed above in this GitHub issue) was fundamentally distinct. It is not remarkably different. Her approach also designates a 'hit' based on the similarity between the wild type and the mutant, provided it is below a certain threshold. This threshold is based on the similarities between replicates of the same perturbation.

Hence, both methodologies depend on similarities amongst other perturbations to set a threshold. However, we truly need a strategy focusing mainly on the phenotype of the wild-type and the mutant rather than their comparison to other perturbations. A targeted approach like this could yield more precise results when determining 'hits'.

Marzieh and I also discussed the broader concern about the application of supervised methods in profiling, with primary concerns in two specific areas:

Phenotypic Determination: A classifier could be built at the single-cell level to distinguish between the profiles of a perturbation and those of negative controls. A high classification score on a held-out test set suggests the presence of a detectable phenotype. However, while this method could help identify whether a perturbation has a phenotype, it is not conducive to creating a profile for the perturbation that could be used for clustering. Previous approaches using SVM classifiers have yielded profiles but are not wide-ranging enough, as they focus narrowly on the general perturbation effect.
Mechanisms of Action (MOA) Classification: Although there's no inherent issue with constructing a classifier for MOA classification, we risk failing to predict the classes of novel mechanisms. We prefer addressing this problem in a more unsupervised manner.

However, supervised learning can be a perfectly acceptable approach to predicting whether a variant has an impact, because that is the endpoint of the analysis.

One final aspect is whether we'd recommend a supervised (single-cell) approach, even for studies such as LUAD. For instance, should we recommend building a classifier to distinguish between the variant and the reference in the example below, using the accuracy of that classifier to declare whether the variant is a hit? I'm inclined to say yes, but this is open for debate (but we needn't debate that right now)

AnneCarpenter commented on May 29, 2024

I thought through what makes sense to me and perhaps you can cross check if it’s the same as the long-term solution you propose - I think it is!

Anne’s plan:

train a classifier to distinguish a particular variant from its WT? WTgene1_well1 vs VARgene1_well1 and so on for all combinations of replicate wells for each. (I guess at the single cell level?)
caveat: such classifiers may always seem effective (due to plate layout, slight changes in infection efficiency/expression, etc). So, we want to get a sense of what level of classification accuracy is significant.
to create such a null we can use replicates of all samples as the baseline. So the null pairings would be WTgene2_well1 vs WTgene2_well2 and so on for all the WT genes and similarly for all the variants?
(fancy extra detail) We could exclude the query gene itself in creating this null baseline - maybe that’s unnecessary if there are ~382 other samples on the plate. If there are lots of variants of one gene on a plate then we may want to do this step.
(ruled out alternative) for each gene’s null we could instead try to train a classifier to distinguish replicates of only the query gene itself: WTgene1_well1 vs WTgene1_well2 but this will likely always yield ‘successful’ classifiers due to plate layout effects.
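Comparing a query accuracy against the replicate-vs-replicate null could look like this. The accuracy values are invented, and the +1 in numerator and denominator is the usual correction that keeps an empirical p-value above zero:

```python
import numpy as np

def empirical_pvalue(observed, null_accuracies):
    """Share of null accuracies at least as high as the observed one (+1 corrected)."""
    null = np.asarray(null_accuracies, dtype=float)
    return float((1 + np.sum(null >= observed)) / (1 + null.size))

# Invented numbers: a WTgene1-vs-VARgene1 accuracy, and replicate-vs-replicate accuracies
# (e.g. WTgene2_well1 vs WTgene2_well2) collected across the other genes and variants
observed = 0.85
null_accuracies = [0.55, 0.60, 0.62, 0.70, 0.90]
p = empirical_pvalue(observed, null_accuracies)
print(p)  # 0.3333333333333333
```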

Furthermore, I don’t see why the LUAD case is any different than the WT/VAR case of Variant Painting experiments so I would also say Yes to your query there that the same approach is appropriate there.

I made a schematic. Sort of obvious now that I drew it out so I don't expect to spark major insight here, but adding a link to google slide in case it helps anyone think through things.

shntnu commented on May 29, 2024

train a classifier to distinguish a particular variant from its WT? WTgene1_well1 vs VARgene1_well1 and so on for all combinations of replicate wells for each. (I guess at the single cell level?)

We will do this at the single-cell level but will build a single model, so I wasn't sure what you mean by "all combinations of replicate wells". Maybe you are referring to the way we do train-test splits? If so, yes, we'd want to factor in the experimental hierarchy in some way when splitting.

caveat: such classifiers may always seem effective (due to plate layout, slight changes in infection efficiency/expression, etc). So, we want to get a sense of what level of classification accuracy is significant.

to create such a null we can use replicates of all samples as the baseline. So the null pairings would be WTgene2_well1 vs WTgene2_well2 and so on for all the WT genes and similarly for all the variants?

This sounds sensible

(fancy extra detail) We could exclude the query gene itself in creating this null baseline - maybe that’s unnecessary if there are ~382 other samples on the plate. If there are lots of variants of one gene on a plate then we may want to do this step.

No need to exclude the query gene when creating the null in this manner, because our null hypothesis is that the wells are arbitrarily assigned WT and MUT labels. The fancy thing to do would be to have a separate null for each WT-MUT pair, where we only consider the wells of the WT-MUT pair and shuffle their labels. But that is overkill.
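Even if it is overkill, a per-pair label-shuffling null is small enough to enumerate exactly when a pair has only a few wells (70 relabelings for 4 WT + 4 MUT wells, including the observed labeling and its mirror image). This sketch assumes well-level profiles and a leave-one-well-out nearest-centroid accuracy as the statistic, although the proposal above is about single cells; the data are simulated:

```python
import numpy as np
from itertools import combinations

def lowo_centroid_accuracy(profiles, is_mut):
    """Leave-one-well-out nearest-centroid accuracy on well-level profiles."""
    correct = 0
    for i in range(len(profiles)):
        train = np.arange(len(profiles)) != i
        c_mut = profiles[train & is_mut].mean(axis=0)
        c_wt = profiles[train & ~is_mut].mean(axis=0)
        called_mut = np.linalg.norm(profiles[i] - c_mut) < np.linalg.norm(profiles[i] - c_wt)
        correct += int(called_mut == is_mut[i])
    return correct / len(profiles)

def label_shuffle_test(profiles, is_mut):
    """Exact null: every way of assigning the MUT label to k of the pair's n wells."""
    profiles = np.asarray(profiles, dtype=float)
    is_mut = np.asarray(is_mut, dtype=bool)
    n, k = is_mut.size, int(is_mut.sum())
    observed = lowo_centroid_accuracy(profiles, is_mut)
    null = []
    for mut_wells in combinations(range(n), k):
        relabeled = np.zeros(n, dtype=bool)
        relabeled[list(mut_wells)] = True
        null.append(lowo_centroid_accuracy(profiles, relabeled))
    # The observed labeling is one of the relabelings, so p >= 1 / C(n, k)
    return observed, float(np.mean(np.array(null) >= observed))

# Simulated well profiles for one WT-MUT pair: 4 WT wells, then 4 MUT wells
rng = np.random.default_rng(1)
profiles = rng.normal(scale=0.3, size=(8, 3))
profiles[4:, 0] += 5.0
is_mut = np.array([False] * 4 + [True] * 4)
observed, p = label_shuffle_test(profiles, is_mut)
print(observed, round(p, 4))  # 1.0 0.0286
```

With four wells per side, the smallest attainable p-value is 2/70, because swapping the two labels wholesale separates the wells just as well.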

(ruled out alternative) for each gene’s null we could instead try to train a classifier to distinguish replicates of only the query gene itself: WTgene1_well1 vs WTgene1_well2 but this will likely always yield ‘successful’ classifiers due to plate layout effects.

Oh, maybe this is the same as my fancy idea right above. We can pay closer attention to the design of the splits when we actually do the experiment.

Furthermore, I don’t see why the LUAD case is any different than the WT/VAR case of Variant Painting experiments so I would also say Yes to your query there that the same approach is appropriate there.

I think you are right

I made a schematic. Sort of obvious now that I drew it out so I don't expect to spark major insight here, but adding a link to google slide in case it helps anyone think through things.

This makes sense

AnneCarpenter commented on May 29, 2024

To address your first comment, I agree it's sensible to build a single model that distinguishes WTgene1 (all wells) vs VARgene1 (all wells) but what I was proposing was different :D I propose actually training a bunch of small classifiers on each pair, like single cells in WTgene1_well1 vs VARgene1_well1 and so on with well2, well3, etc. I should've been more clear and said "all pairs of replicate wells" instead of "all combinations of replicate wells".

One reason to do this pairwise across individual wells is to put error bars on classification accuracy, I guess. But mainly to make it easier to calculate a realistic null because now we can use two replicates of a sample that we KNOW should look alike (WTgene2_well1 vs WTgene2_well2 and so on for all the WT genes and similarly for all the variants). I guess a downside of this approach, though, is that WTgene2_well1 vs WTgene2_well2 is likely to always be same-well-position whereas the query test WTgene1_well1 vs VARgene1_well1 is not :(

Still, if we instead make a single classifier for each WT-MUT, those values will almost certainly always seem to be accurate classifiers (due to technical variations) so to decide if they are significant, we need a suitable null. To make its null we need to get WTs with a similar number of replicates and similar number of single cells to be fair (?). And maybe choose those having similar plate positions? I dunno.

MarziehHaghighi commented on May 29, 2024

Some questions/notes:

My understanding is that we don't really care about an overall phenotype impact score in the space of all perturbations in the experiment. Instead, we care about a score for "is there any consistent (across all cells) signature for WT versus mutant?" For example, a 100% score means that there is a phenotype present in all single cells of the WT versus the mutant (in contrast to the previous unbiased score computed over the full space, in which 100% (or 1) meant a signature that is, on average, the most distinct across all WTs and VARs of an experiment). Let me know if there is any flaw in my understanding.
If the above is correct and we indeed care about a phenotype that is consistent across single cells for WT versus VAR, we should be careful of the following:
- The heterogeneity of the samples: we had a huge amount of heterogeneity across the samples of the Taipale Lab rare diseases datasets. I have not looked into subpopulation analysis data for the new VarChamp batches to have an idea of if this has changed in the new experiments. But wanted to give you a heads up given this prior knowledge.
- Number of cells versus number of features for the classification problem: again, this may not be an issue in the new batches of data, as all the cells are now transfected, but in the old batches many wells had a small number of single cells, and we should be careful about (n of features) >> (n of samples), which causes overfitting.
- Position effect: in the first pilot batch of data that we analysed for VarChamp, the plate layout was the same. If the position effect is strong, it is problematic for any method of scoring, but more problematic for a classifier, which cares about all single cells in a well having a phenotype that other wells don't have.
- The overall amount of computational complexity we add to the problem (by using single cells) versus what we gain.
- I'll comment on the null once I fully understand your suggestions, but for now I wanted to give my two cents on this thread.

AnneCarpenter commented on May 29, 2024

Yes, your understanding is correct!

Your bullet points are beautifully helpful. Yes, we do anticipate heterogeneity. Essentially every MUT will be distinguishable (classifiable) from its WT just due to technical variations so we have to be careful how to set the null to know which ones are really distinguishable.

I agree with the rest of the points too.

First Pass - Pilot Variant Painting data analysis (2021_09_01_varchamp)
