gnomad_lof's Introduction

Code for gnomAD LoF flagship manuscript

Manuscript at: Karczewski et al., 2019

Details

This repo has two main components:

  • The constraint computation pipeline, written in Hail 0.2
  • The figure-generating code for the manuscript

Constraint

The constraint pipeline, as initially described here and here, has been updated with a number of improvements as described in the supplement of Karczewski et al. Notably, the pipeline is now written in Hail, which enables scalability to large datasets like gnomAD, and can compute constraint against arbitrary sets of variants.

The main components of the pipeline can be found in constraint/constraint.py, which uses the public gnomAD data and a dataset of every possible variant (~9B variants) to compute the observed and expected number of variants per transcript/gene. This script is provided primarily for reference: it cannot be run as-is because paths to Google Cloud buckets are hard-coded, though it could be run with modifications. Additionally, we combine the constraint data with aggregate LoF frequencies in constraint/gene_lof_matrix.py. This script cannot be run outside of the gnomAD team, as it requires access to the individual-level data, but it is also provided for reference.
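
To make the idea of observed and expected counts per transcript concrete, here is a minimal plain-Python sketch. It is not the Hail pipeline in constraint/constraint.py: the record layout, the transcript names, and the simplification that the expected count is the sum of per-variant probabilities of being observed are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy input: one record per possible variant, as
# (transcript, assumed probability of being observed, was it observed?).
POSSIBLE_VARIANTS = [
    ("TX1", 0.5, True),
    ("TX1", 0.25, False),
    ("TX1", 0.25, True),
    ("TX2", 0.5, False),
    ("TX2", 0.5, False),
]


def observed_expected(variants):
    """Aggregate observed and expected variant counts per transcript.

    Assumption: the expected count is the sum of per-variant probabilities.
    The real pipeline derives expectations from a calibrated mutational model.
    """
    totals = defaultdict(lambda: {"obs": 0, "exp": 0.0})
    for transcript, p_observed, was_observed in variants:
        totals[transcript]["obs"] += int(was_observed)
        totals[transcript]["exp"] += p_observed

    results = {}
    for transcript, counts in totals.items():
        # Guard against division by zero for transcripts with no expectation.
        oe = counts["obs"] / counts["exp"] if counts["exp"] > 0 else None
        results[transcript] = {"obs": counts["obs"], "exp": counts["exp"], "oe": oe}
    return results


if __name__ == "__main__":
    for transcript, row in sorted(observed_expected(POSSIBLE_VARIANTS).items()):
        print(transcript, row)
```

A low observed/expected ratio for a transcript indicates fewer variants than the model predicts, which is the signal that constraint metrics build on.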

Figure-generating code

The code to generate all the figures can be found in R/. These scripts use only aggregated data and can therefore be run by anyone. They look for data files in ./data and, if the files are not found, download them as needed from the public data repository.
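
The look-locally-then-download behavior can be sketched as follows. The R scripts implement this themselves; this Python version only illustrates the pattern, and the function name get_or_download, its parameters, and the placeholder URL are hypothetical rather than taken from the repository.

```python
from pathlib import Path
from urllib.request import urlretrieve


def get_or_download(filename, base_url, data_dir="data", fetch=urlretrieve):
    """Return the local path to `filename`, downloading it only if missing.

    `base_url` stands in for the public data repository (hypothetical here).
    `fetch` takes (url, destination) and can be swapped out for testing.
    """
    local_path = Path(data_dir) / filename
    if not local_path.exists():
        # Create ./data (or the chosen directory) on first use.
        local_path.parent.mkdir(parents=True, exist_ok=True)
        fetch(f"{base_url.rstrip('/')}/{filename}", str(local_path))
    return local_path


if __name__ == "__main__":
    # Placeholder URL and file name; substitute the real data location.
    print(get_or_download("example.txt", "https://example.org/data"))
```

Because the existence check comes first, repeated runs reuse the cached copy instead of downloading the file again.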

Figures can be generated panel by panel or as a whole. For instance, R/fig3_spectrum.R provides a function called figure3 that generates the entirety of Figure 3. Alternatively, running the code inside the function generates each figure panel separately. Note that for some figures, on some R setups, generating the full figure by calling the function directly can crash R. We are uncertain of the cause, but the issue can be resolved by running the code inside the function step-wise.

The code was run using R 3.5.1 with the following packages:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin18.2.0 (64-bit)
Running under: macOS  10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /opt/local/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hexbin_1.27.2        bindrcpp_0.2.2       cowplot_0.9.4        RMySQL_0.10.16       DBI_1.0.0           
 [6] ggrepel_0.8.0        pbapply_1.3-4        rlang_0.3.1          tidygraph_1.1.1      STRINGdb_1.22.0     
[11] meta_4.9-4           ggrastr_0.1.7        ggpubr_0.2           ggridges_0.5.1       readxl_1.2.0        
[16] corrr_0.3.0          corrplot_0.84        patchwork_0.0.1      naniar_0.4.1         plotROC_2.2.1       
[21] gghighlight_0.1.0    skimr_1.0.4          gapminder_0.3.0      trelliscopejs_0.1.18 scales_1.0.0        
[26] magrittr_1.5         slackr_1.4.2         plotly_4.8.0         broom_0.5.1          forcats_0.3.0       
[31] stringr_1.3.1        dplyr_0.7.8          purrr_0.2.5          readr_1.3.1          tidyr_0.8.2         
[36] tibble_2.0.1         tidyverse_1.2.1      Hmisc_4.1-1          ggplot2_3.1.0        Formula_1.2-3       
[41] survival_2.43-3      lattice_0.20-38     

loaded via a namespace (and not attached):
 [1] colorspace_1.4-0        visdat_0.5.2            mclust_5.4.2            htmlTable_1.13.1       
 [5] base64enc_0.1-3         rstudioapi_0.9.0        hash_2.2.6              bit64_0.9-7            
 [9] fansi_0.4.0             lubridate_1.7.4         sqldf_0.4-11            xml2_1.2.0             
[13] codetools_0.2-16        splines_3.5.1           knitr_1.21              jsonlite_1.6           
[17] Cairo_1.5-9             cluster_2.0.7-1         png_0.1-7               compiler_3.5.1         
[21] httr_1.4.0              backports_1.1.3         assertthat_0.2.0        Matrix_1.2-15          
[25] lazyeval_0.2.1          cli_1.0.1               acepack_1.4.1           htmltools_0.3.6        
[29] prettyunits_1.0.2       tools_3.5.1             igraph_1.2.2            gtable_0.2.0           
[33] glue_1.3.0              Rcpp_1.0.0              cellranger_1.1.0        gdata_2.18.0           
[37] nlme_3.1-137            autocogs_0.1.1          xfun_0.4                proto_1.0.0            
[41] rvest_0.3.2             gtools_3.8.1            DistributionUtils_0.6-0 hms_0.4.2              
[45] parallel_3.5.1          RColorBrewer_1.1-2      yaml_2.2.0              memoise_1.1.0          
[49] gridExtra_2.3           rpart_4.1-13            latticeExtra_0.6-28     stringi_1.2.4          
[53] RSQLite_2.1.1           plotrix_3.7-4           checkmate_1.9.1         caTools_1.17.1.1       
[57] chron_2.3-53            pkgconfig_2.0.2         bitops_1.0-6            bindr_0.1.1            
[61] labeling_0.3            htmlwidgets_1.3         bit_1.1-14              tidyselect_0.2.5       
[65] plyr_1.8.4              R6_2.3.0                gplots_3.0.1            generics_0.0.2         
[69] gsubfn_0.7              pillar_1.3.1            haven_2.0.0             foreign_0.8-71         
[73] withr_2.1.2             RCurl_1.95-4.11         nnet_7.3-12             modelr_0.1.2           
[77] crayon_1.3.4            utf8_1.1.4              KernSmooth_2.23-15      progress_1.2.0         
[81] data.table_1.12.0       blob_1.1.1              digest_0.6.18           webshot_0.5.1          
[85] munsell_0.5.0           viridisLite_0.3.0       egg_0.4.2              

gnomad_lof's Issues

Code to run UMAP for Figure 1a

I'm looking for the code that runs UMAP to produce Figure 1a, and also runs PCA to produce PC components that are later used by UMAP, but somehow I could not find it here. Where should I look?

The caption of Fig 1a says that you used 10 PCs to run UMAP, but Suppl Materials (page 16) say that you used 7 PCs. So I wanted to check the code, but could not find it.

Thanks!

Issue on mis & lof z score normalization

Hi,

May I know if the missense Z scores on the gnomAD website are calculated from this code? It seems the labels used for missense variants and LoF variants are incorrect. Lines 819 and 820 in constraint_basics.py label outlier genes as "mis_too_many" and "lof_too_many", while the filters below use "mis_outlier" (line 831) and "lof_outlier" (line 838).
