yu-1011 / uk_biobank_gwas

This project forked from nealelab/uk_biobank_gwas

Overview of the data QC, code, and GWAS summary output from the 2017 UK Biobank data release

uk_biobank_gwas's Introduction

Updates

With the re-release of UK Biobank genotype imputation (which we term imputed-v3), we have generated an updated set of GWAS summary statistics for the genetics community.

  • Increased the number of phenotypes with application UKB31063 and additional custom-curated phenotypes (see imputed-v3 Phenotypes)
  • More liberal inclusion of samples (see imputed-v3 Sample QC)
  • Inclusion of more SNPs (see imputed-v3 Variant QC)
  • Updates to our association model (see imputed-v3 Association model)

Our largest change is that, for all phenotypes, we have run a female-only and a male-only GWAS along with the full set.

Information and scripts from the previous round of GWAS are available in the imputed-v2-gwas subdirectory.

imputed-v3 Phenotypes

  • Auto-curated phenotypes using PHESANT

  • ICD10 codes (all non-coded individuals treated as controls)

  • Curated phenotypes in collaboration with the FinnGen consortium

  • Phenotypes in both sexes

    • PHESANT: 2891 total (274 continuous / 271 ordinal / 2346 binary)
    • ICD10: 633 binary
    • FinnGen curated: 559
  • Phenotypes in females

    • PHESANT: 2393 total (259 continuous / 257 ordinal / 1877 binary)
    • ICD10: 482 binary
    • FinnGen curated: 412
  • Phenotypes in males

    • PHESANT: 2305 total (262 continuous / 259 ordinal / 1784 binary)
    • ICD10: 439 binary
    • FinnGen curated: 400
  • Unique PHESANT phenotypes: 3011, of which 274 are continuous

  • 4203 total unique phenotypes: 3011 PHESANT + 559 FinnGen + 633 ICD10

  • Summary files:

    • phenotypes.both_sexes.tsv.gz
    • phenotypes.female.tsv.gz
    • phenotypes.male.tsv.gz
  • Columns in each summary file:

    • phenotype - phenotype ID
    • description - short description of phenotype
    • source - PHESANT auto-curation, ICD10, or FinnGen
    • n_controls - number of QC-positive samples responding negatively to phenotype designation (NA if quantitative)
    • n_cases - number of QC-positive samples responding affirmatively to phenotype designation (NA if quantitative)
    • n_missing - number of QC-positive samples with a missing value
    • n_non_missing - number of QC-positive samples with a non-missing value
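
As a rough illustration of how the summary files might be used, the sketch below reads one of them with the Python standard library and lists case/control phenotypes above a case-count threshold. It assumes the files are tab-separated with a header row matching the column names listed above. The helper names `load_phenotype_summary` and `binary_phenotypes_with_min_cases` are hypothetical and not part of this repository.

```python
import csv
import gzip
import io


def load_phenotype_summary(path_or_file):
    """Read a phenotype summary TSV into a list of dicts.

    Accepts a path (gzipped if it ends in .gz) or an open text file.
    Assumes a header row with the column names documented above.
    """
    if isinstance(path_or_file, str):
        opener = gzip.open if path_or_file.endswith(".gz") else open
        with opener(path_or_file, "rt", newline="") as handle:
            return list(csv.DictReader(handle, delimiter="\t"))
    return list(csv.DictReader(path_or_file, delimiter="\t"))


def binary_phenotypes_with_min_cases(rows, min_cases):
    """Return phenotype IDs of case/control phenotypes with >= min_cases cases.

    Quantitative phenotypes carry NA in n_cases and are skipped.
    """
    kept = []
    for row in rows:
        if row["n_cases"] == "NA":
            continue
        if int(row["n_cases"]) >= min_cases:
            kept.append(row["phenotype"])
    return kept
```

A call such as `load_phenotype_summary("phenotypes.both_sexes.tsv.gz")` would then give one dict per phenotype, keyed by column name.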

imputed-v3 Sample QC

  • imputed-v3 parameters
    • Used.in.pca.calculation filter (unrelated samples)
    • sex chromosome aneuploidy filter
    • Use provided PCs for European sample selection to determine British ancestry
      • Keep samples within 7 standard deviations on the first 6 PCs
      • Further filter to self-reported 'white-British' / 'Irish' / 'White'
    • QCed sample count: 361194 samples
  • imputed-v2 parameters
    • Used.in.pca.calculation filter (unrelated samples)
    • sex chromosome aneuploidy filter
    • White.british.ancestry filter
    • QCed sample count: 337199 samples
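
The PC-based step of the imputed-v3 sample selection can be sketched as follows. This is one reading of the "7 standard deviations" rule, not the project's actual code: it assumes the mean and standard deviation of each PC are computed over the samples passed in, whereas the original analysis may have used a different reference set. The function name `within_pc_bounds` and the input layout are hypothetical, and the self-report filter is not included.

```python
from statistics import mean, pstdev


def within_pc_bounds(pcs_by_sample, n_pcs=6, n_sd=7.0):
    """Return sample IDs lying within n_sd SDs of the mean on each of the first n_pcs PCs.

    pcs_by_sample maps a sample ID to a sequence of PC scores.
    Assumption: centre and spread are estimated from the input samples themselves.
    """
    ids = list(pcs_by_sample)
    keep = set(ids)
    for k in range(n_pcs):
        values = [pcs_by_sample[s][k] for s in ids]
        center = mean(values)
        spread = pstdev(values)
        for s in ids:
            if abs(pcs_by_sample[s][k] - center) > n_sd * spread:
                keep.discard(s)
    # Preserve the input order of the surviving samples.
    return [s for s in ids if s in keep]
```

Samples retained by this step would then be restricted further by self-reported ancestry, as described in the list above.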

imputed-v3 Variant QC

  • imputed-v3 parameters
    • Autosomes and X chromosome (but not pseudo-autosomal region or XY)
    • SNPs from HRC, UK10K, and 1KG imputation (~90 million)
    • INFO score > 0.8
    • MAF > 0.0001
      • Exception: VEP-annotated missense and protein-truncating variants (PTVs) are kept at MAF > 1e-6
    • HWE p-value > 1e-10
    • QCed SNP count: 13.7 million
  • imputed-v2 parameters
    • Autosomes only
    • SNPs from HRC imputation (~40 million)
    • INFO score > 0.8
    • MAF > 0.0001
    • QCed SNP count: 10.9 million
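
The imputed-v3 thresholds above can be expressed as a single predicate. The sketch below is illustrative only: the `Variant` record, its field names, and the consequence labels `"missense"` and `"ptv"` are assumptions made for the example, and the chromosome restriction (autosomes and X) is left out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    info: float                  # imputation INFO score
    maf: float                   # minor allele frequency
    hwe_p: float                 # Hardy-Weinberg equilibrium p-value
    consequence: str = "other"   # hypothetical label derived from VEP annotation


def passes_v3_variant_qc(v):
    """Apply the imputed-v3 INFO, MAF, and HWE thresholds to one variant."""
    # Missense and protein-truncating variants get the relaxed MAF floor.
    maf_floor = 1e-6 if v.consequence in {"missense", "ptv"} else 1e-4
    return v.info > 0.8 and v.maf > maf_floor and v.hwe_p > 1e-10
```

For example, a missense variant with MAF 5e-5 passes under the relaxed floor, while a variant with the same frequency and no such annotation does not.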

imputed-v3 Association model

  • imputed-v3 model
    • Linear regression model in Hail (linreg)
    • Three GWAS per phenotype
      • Both sexes
      • Female only
      • Male only
    • Covariates: 1st 20 PCs + sex + age + age^2 + sex*age + sex*age^2
    • Sex-specific covariates: 1st 20 PCs + age + age^2
    • Extra column for variant confidence in case/control phenotypes
      • column name: expected_case_minor_AC
      • Used to filter out false-positive SNPs when case count is low
      • Details are given in the accompanying blog post
  • imputed-v2 model
    • Linear regression model in Hail (linreg)
    • Covariates: 1st 10 PCs + sex

Contributors

howrigan, liameabbott, astheeggeggs, rkwalters
