# GWAS-PIPEDREAM

### Description

GWAS-Pipedream (working title) is a pipeline to perform basic SNP, gender, and population QC on genomic data.
Copyright (C) 2015 JBonnie, WMChen

### Contents

#### Folder: SHcode

makedata.sh
pheno_inc.sh
qc1.sh
relatedness_qc.sh

#### Folder: Rcode

#### Folder: PYTHONcode

### Inputs

nickname - alias for the data set, used in all filenames
DATAFILE - path to the Full Data Table exported from GenomeStudio, in Top Allele format
phenofile - phenotype file; specific columns are expected (see SHcode/pheno_inc.sh for details)
covariablecount - integer, the number of covariables to incorporate from the hard-coded list
chip - I or E, indicating whether the data were generated on the Immunochip (I) or the HumanCoreExomeChip (E)
covariablevalue - integer, the position on the hard-coded list of the covariable to include in the table for checking or to use for coloring the graphs
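As a sketch of how these inputs fit together, the Python below checks one set of values before a run. It is not part of the pipeline: `parse_inputs` and `PipelineInputs` are hypothetical names, and the check assumes (unconfirmed by the scripts) that `covariablevalue` is a 1-based position no larger than `covariablecount`.

```python
from dataclasses import dataclass

# Chip codes as described in the input list.
CHIPS = {"I": "Immunochip", "E": "HumanCoreExomeChip"}


@dataclass
class PipelineInputs:
    """Hypothetical container for one run's inputs."""
    nickname: str
    datafile: str
    phenofile: str
    covariablecount: int
    chip: str
    covariablevalue: int


def parse_inputs(nickname, datafile, phenofile, covariablecount, chip, covariablevalue):
    """Hypothetical check of the six inputs; raises ValueError on bad values."""
    count = int(covariablecount)
    value = int(covariablevalue)
    chip = chip.upper()
    if chip not in CHIPS:
        raise ValueError(f"chip must be one of {sorted(CHIPS)}, got {chip!r}")
    if count < 0:
        raise ValueError("covariablecount must be a non-negative integer")
    # Assumption: covariablevalue is a 1-based position among the
    # covariablecount covariables that were incorporated.
    if not 1 <= value <= count:
        raise ValueError("covariablevalue must be between 1 and covariablecount")
    return PipelineInputs(nickname, datafile, phenofile, count, chip, value)
```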

### Order of Scripts

SHcode/makedata.sh
Transforms the GenomeStudio output into PLINK files.
See the script for option/parameter details.
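For readers unfamiliar with the target format, this sketch shows how one SNP's Top Allele calls could be laid out as a PLINK TPED record: chromosome, SNP ID, genetic distance, base-pair position, then two allele columns per sample, with `0` marking a missing allele. The function names are hypothetical, the genetic distance is simply set to 0, and alleles are written in the order they appear in each call; makedata.sh itself does this with awk and sed and may order alleles differently.

```python
def genotype_to_alleles(call):
    """Split a two-letter Top Allele call such as 'AG' into ('A', 'G').

    GenomeStudio writes no-calls as '--'; PLINK's missing allele code is '0'.
    """
    if call == "--":
        return ("0", "0")
    if len(call) != 2:
        raise ValueError(f"unexpected genotype call: {call!r}")
    return (call[0], call[1])


def to_tped_line(chrom, snp, bp, calls):
    """Build one TPED record from a SNP's position and per-sample calls.

    Columns: chromosome, SNP ID, genetic distance (0 here), bp position,
    followed by two allele columns for every sample in `calls`.
    """
    fields = [str(chrom), snp, "0", str(bp)]
    for call in calls:
        fields.extend(genotype_to_alleles(call))
    return " ".join(fields)
```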

SHcode/pheno_inc.sh
This script incorporates phenotypic information (disease status, sex, family relationships) into the raw PLINK files from makedata.sh.
It creates numeric covariate files based on a hard-coded list of column names.
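A minimal sketch of the covariate step, assuming the phenotype rows have already been read into dictionaries: each column named on a hard-coded list is recoded to integers in order of first appearance, and rows are written as `FID IID cov1 cov2 ...`, the layout PLINK accepts for `--covar` files. The column names `site` and `batch`, the coding scheme, and the use of `-9` for missing values are assumptions for illustration; the real list and rules live in SHcode/pheno_inc.sh.

```python
# Hypothetical hard-coded list; the real column names are in pheno_inc.sh.
COVARIABLES = ["site", "batch"]
MISSING = "-9"  # PLINK's default missing-value code


def make_covariate_table(rows, covariables=COVARIABLES):
    """Turn phenotype rows (dicts with 'FID', 'IID' and covariable columns)
    into covariate lines: a header, then 'FID IID cov1 cov2 ...' per row.

    Each categorical value is replaced by an integer code assigned in order
    of first appearance (1, 2, ...); empty or absent values become MISSING.
    """
    codes = {name: {} for name in covariables}
    lines = ["FID IID " + " ".join(covariables)]
    for row in rows:
        fields = [row["FID"], row["IID"]]
        for name in covariables:
            value = row.get(name, "")
            if value == "":
                fields.append(MISSING)
                continue
            table = codes[name]
            if value not in table:
                table[value] = len(table) + 1
            fields.append(str(table[value]))
        lines.append(" ".join(fields))
    return lines
```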

SHcode/qc1.sh
This script runs the initial SNP and sample QC steps.
It writes its output files to the 2_QC1 folder and, most importantly, produces a list of SNPs and samples to be removed.
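The SNP and sample removal lists can be illustrated with a simple call-rate filter. This is a generic sketch, not the contents of qc1.sh: the 5% missingness thresholds and the in-memory dictionary layout are assumptions, and in practice PLINK's `--geno` and `--mind` options apply similar filters directly to PLINK files.

```python
def call_rate_qc(calls, samples, snp_max_missing=0.05, sample_max_missing=0.05):
    """Return (snps_to_remove, samples_to_remove) based on missingness.

    calls: dict mapping SNP ID -> list of genotype calls, one per sample,
           in the same order as `samples`; '--' marks a no-call.
    Both rates are computed on the unfiltered data.
    """
    if not calls or not samples:
        return [], []
    n_samples = len(samples)
    missing_per_sample = [0] * n_samples
    snps_to_remove = []
    for snp, genotypes in calls.items():
        n_missing = 0
        for i, g in enumerate(genotypes):
            if g == "--":
                n_missing += 1
                missing_per_sample[i] += 1
        if n_missing / n_samples > snp_max_missing:
            snps_to_remove.append(snp)
    n_snps = len(calls)
    samples_to_remove = [
        s for s, m in zip(samples, missing_per_sample) if m / n_snps > sample_max_missing
    ]
    return snps_to_remove, samples_to_remove
```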

SHcode/relatedness_qc.sh
This script runs an initial relatedness check on the data after the initial QC step.
It uses files in the 2_QC1 folder as well as the covariable lists created by pheno_inc.sh.
Its output is used by qc_pdf.sh for graphing.
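KING relatedness results are usually interpreted through the estimated kinship coefficient. The sketch below bins a coefficient using the cutoffs given in the KING documentation (about 0.354, 0.177, 0.0884 and 0.0442, i.e. 2^-1.5 through 2^-4.5). Whether relatedness_qc.sh applies exactly these bins is an assumption, and the treatment of values falling exactly on a boundary is a choice made here; `classify_kinship` and `related_pairs` are hypothetical names.

```python
# Lower bounds of each relationship class, from closest to most distant.
DEGREE_CUTOFFS = [
    (2 ** -1.5, "duplicate/MZ twin"),  # ~0.354
    (2 ** -2.5, "1st-degree"),         # ~0.177
    (2 ** -3.5, "2nd-degree"),         # ~0.0884
    (2 ** -4.5, "3rd-degree"),         # ~0.0442
]


def classify_kinship(phi):
    """Return the closest relationship class whose cutoff phi exceeds."""
    for cutoff, label in DEGREE_CUTOFFS:
        if phi > cutoff:
            return label
    return "unrelated"


def related_pairs(pairs):
    """Keep (id1, id2, kinship) tuples inferred as 3rd-degree or closer."""
    result = []
    for a, b, phi in pairs:
        label = classify_kinship(phi)
        if label != "unrelated":
            result.append((a, b, label))
    return result
```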

### Requirements

PLINK
KING
Python
R

### Authors

JBonnie
WMChen

### Issues

#### Why does makedata.sh reverse allele order?

Around line 137, the alleles are copied from the full data table to an intermediate tped. Why are they reversed at that point?

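```bash
# Skip the header row and strip Windows carriage returns, then print the
# $top1 column followed by every dif-th column starting at $top2
# (presumably the per-sample Top Allele calls). Missing calls ("--") become
# PLINK's "0 0", and each two-letter call is split into two alleles,
# written in reverse order.
awk 'NR>1' "$DATAFILE" | sed "s/\r//g" |\
  awk -v top1=${top1} -v top2=${top2} -v dif=${dif} '{printf("%s", $top1); for(i=top2;i<=NF;i+=dif) printf(" %s", $i);printf("\n");}' |\
  sed 's/--/0 0/g' |\
  sed 's/\([A-Z]\)\([A-Z]\)/\2 \1/g' > topAllele_tped.tmp
```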
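To make the question concrete, here is a Python re-expression of the last two `sed` steps applied to a single row. It is an illustration only (`recode_row` is a hypothetical name) and assumes the row has already been reduced by the preceding `awk` to the `$top1` column plus the genotype columns.

```python
import re


def recode_row(fields):
    """Mimic the pipeline's final two sed steps on one row of fields."""
    line = " ".join(fields)
    # sed 's/--/0 0/g': GenomeStudio no-call -> PLINK missing alleles
    line = line.replace("--", "0 0")
    # sed 's/\([A-Z]\)\([A-Z]\)/\2 \1/g': split each pair of adjacent
    # capital letters into two fields, swapping their order
    return re.sub(r"([A-Z])([A-Z])", r"\2 \1", line)
```

For example, `recode_row(["rs123", "AG", "--", "CC"])` gives `"rs123 G A 0 0 C C"`: each call is split into two alleles with their order swapped. As written, the substitution runs over the whole row, so any two adjacent capital letters in the `$top1` column would be rewritten as well.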

#### Dealing with duplicate sample IDs in raw data from the start

In theory, if the sample tracking sheet has been checked for duplicate IDs, a list of those duplicate IDs could be fed to makedata.sh. Idea: create a new header for the Full Data Table (which is too big to copy/keep) in which one copy of each duplicate ID is renamed. Store this header permanently alongside the Full Data Table and attach it as needed. This would allow the rest of the QC process to proceed unimpeded, without the risk of making changes to the original raw data table. It would also allow changes to sample ID names to be traced back as close to the raw data table as possible.
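A minimal sketch of the proposed header rewrite, assuming per-sample columns in the Full Data Table are named `<sampleID>.<field>` (an assumption; check the actual GenomeStudio header). The function name and the `_dup<k>` renaming scheme are hypothetical. The first block for a duplicated ID keeps its name, later blocks get a suffix, and the returned change list records exactly which columns were renamed so the new IDs can be traced back to the raw table.

```python
def rename_duplicate_samples(header, duplicate_ids, sep="."):
    """Return (new_header, changes) with repeated sample blocks renamed.

    header: list of column names from the Full Data Table.
    duplicate_ids: set of sample IDs known to occur more than once.
    The k-th occurrence (k >= 2) of '<sampleID><sep><field>' becomes
    '<sampleID>_dup<k><sep><field>'; `changes` lists
    (column index, old name, new name) for every renamed column.
    """
    seen = {}
    new_header, changes = [], []
    for idx, col in enumerate(header):
        sample, found, field = col.partition(sep)
        if found and sample in duplicate_ids:
            k = seen.get(col, 0) + 1
            seen[col] = k
            if k > 1:
                new_col = f"{sample}_dup{k}{sep}{field}"
                changes.append((idx, col, new_col))
                col = new_col
        new_header.append(col)
    return new_header, changes
```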
