cnnGWAS

This is the repository for cnnGWAS, a method that captures complex biological patterns shared by GWAS risk variants. Using a convolutional neural network (CNN), it prioritizes functional variants among all candidate variants in GWAS loci.

Environment setting

Installing miniconda:

Environment setting:

  • All software tests were run in the following environment, so we strongly recommend creating the miniconda environment with these commands:
conda create --name cnnGWAS python=2.7
conda activate cnnGWAS
conda install theano=1.0.3 numpy=1.15.4 matplotlib=2.2.3 pandas=0.24.2 bedtools=2.28.0 bedops=2.4.36 mkl=2019.4

Required R packages (R version 3.5.2):

library(ggplot2)
library(gplots)
library(grid)
library(gridExtra)
library(plyr)
library(reshape2)

Required tools:

awk (GNU Awk 3.1.7)
wget (GNU Wget 1.12 built on linux-gnu)

Tools that will be installed:

impG-summary (http://bogdan.bioinformatics.ucla.edu/software/impg/)

Cloning the repository:

  • Once the environment is set up, clone or download the repository. The steps below download the feature DB and train the model. An input text file (GWAS summary statistics) is required.

Feature DB preparation

  • Required data can be downloaded by executing the following commands:
cd DB
bash Download_DB.sh 
cd ..

Input file preparation

Input file format:

  • The input summary statistics file should be placed in the directory "00_RawDatas/". An example input file, "AUTsummary.hg19.txt", is provided.
  • The input file must be a tab-delimited text file with 6 columns.
  • The first row must be a header naming the columns as follows:
snpid  chr     bp      a1      a2      zscore
rs3131972       chr1    752721  A       G       0.618384984132729
rs3131969       chr1    754182  A       G       1.20540840742781
  • All coordinates are based on hg19 (GRCh37).
  • Raw GWAS summary statistics files vary in the information they report; in most cases, the z-score can be calculated from the reported values.
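
As one illustration of the z-score calculation mentioned above: when a raw file reports an effect size (beta, or the natural log of an odds ratio) and its standard error, the z-score is their ratio. The sketch below is not part of cnnGWAS. It is a standalone Python 3 helper, and the function names and the "beta" and "se" keys are hypothetical; adapt them to whatever your raw file provides.

```python
import csv

# Column names required by the pipeline's input format.
HEADER = ["snpid", "chr", "bp", "a1", "a2", "zscore"]


def zscore_from_beta_se(beta, se):
    """Wald z-score: effect estimate divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


def write_input_table(rows, handle):
    """Write rows as a tab-delimited table with the six required columns.

    Each row is a dict with the keys snpid, chr, bp, a1, a2 plus the
    hypothetical raw columns 'beta' and 'se'.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        z = zscore_from_beta_se(float(row["beta"]), float(row["se"]))
        writer.writerow(
            [row["snpid"], row["chr"], row["bp"], row["a1"], row["a2"], repr(z)]
        )
```

To use it, open the destination under "00_RawDatas/" in text mode with newline="" and pass the file handle as the second argument. Coordinates still need to be on hg19 (GRCh37); this sketch does not convert them.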

Input file name:

  • We recommend a simple file prefix, preferably the disease name (e.g. SLE), always followed by the ".txt" extension.
  • The file prefix must not contain an underscore "_" (e.g. "PSY_AUT_dis.txt" is not allowed), since it causes a string-processing error.
  • Example file names: "SLE.txt", "MDD2019.txt"
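
The naming rules above can be checked before starting the pipeline. The following is a minimal sketch, not part of cnnGWAS; the function name check_input_name is hypothetical, and it only enforces the two rules stated here (".txt" extension, no underscore in the prefix).

```python
import os


def check_input_name(filename):
    """Return the file prefix if the name follows the rules, else raise ValueError."""
    base = os.path.basename(filename)
    if not base.endswith(".txt"):
        raise ValueError("file name must end with '.txt': %s" % base)
    prefix = base[: -len(".txt")]
    if not prefix:
        raise ValueError("file prefix must not be empty")
    if "_" in prefix:
        raise ValueError("file prefix must not contain '_': %s" % base)
    return prefix
```

For example, check_input_name("SLE.txt") returns "SLE", while check_input_name("PSY_AUT_dis.txt") raises ValueError.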

1. Imputation of association p-value using ImpG-summary

The following command runs the imputation. Multiple summary statistics files can be given as input (e.g. bash RUN.sh FILE1.txt FILE2.txt FILE3.txt).

cd 01_RunImpG
bash RUN.sh AUTsummary.hg19.txt   
cd ..

2. Make input pickled files for CNN model training

In this stage, the input files (*.pkl.gz) for training are produced.

cd 02_PreProc
bash RUN.sh AUTsummary.hg19.txt 
cd ..
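
To inspect what this stage produced, a *.pkl.gz file can be opened from Python. The sketch below assumes the files are ordinary gzip-compressed pickles, as the extension suggests; the structure of the stored object is not described here, so the helper simply returns whatever was pickled. The name load_pickled is hypothetical. Only unpickle files you generated yourself or otherwise trust.

```python
import gzip
import pickle


def load_pickled(path, **kwargs):
    """Load one object from a gzip-compressed pickle file.

    Extra keyword arguments are passed through to pickle.load. On Python 3,
    encoding="latin1" is commonly needed for pickles written by Python 2.
    """
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle, **kwargs)
```

Calling type() on the returned object is a quick first look at its layout.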

3. Model training

This stage trains the model. A sample input file is provided as an example (02_PreProc/Data/AUTsummary.hg19_AE_30SNP.pkl.gz).

cd 03_RunCNN
bash RUN.sh AUTsummary.hg19.txt 
cd ..

4. Evaluation of the model performance

The model's performance is saved as a PDF file. The scores of candidate SNPs are also calculated and written to a text file. SNPs with a causal score > 0.5 are classified as positive and considered candidate causal variants.

cd 04_Performance
bash RUN.sh AUTsummary.hg19.txt
cd ..
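
The classification rule above (causal score > 0.5) can be applied to the scores once they are loaded. This is a minimal sketch; the layout of the output text file is not specified here, so the function takes a plain {SNP ID: score} mapping. The function name and the SNP IDs in the usage note are hypothetical.

```python
def classify_snps(scores, threshold=0.5):
    """Split a {snpid: causal score} mapping into (positive, negative) ID lists.

    A SNP is positive only when its score is strictly greater than the
    threshold, matching the "> 0.5" rule; both lists are sorted by ID.
    """
    positive = sorted(snp for snp, score in scores.items() if score > threshold)
    negative = sorted(snp for snp, score in scores.items() if score <= threshold)
    return positive, negative
```

With scores = {"snpA": 0.91, "snpB": 0.5, "snpC": 0.12}, only "snpA" lands in the positive list, because a score of exactly 0.5 does not exceed the threshold.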

Notes

  • Each folder contains a "NOTE" or "RUN_NOTE.txt" file. Please read it before running the software.
