
chizhou-tj / cage-dev


This project forked from bm2-lab/cage


CRISPR KO Analysis based on Genomic Editing data

Home Page: https://github.com/bm2-lab/cage-dev

License: MIT License

Languages: Python 89.14%, HTML 8.77%, Shell 2.09%


cage-dev

CRISPR KO Analysis based on Genomic Editing data (development)

Introduction

CAGE is a CRISPR-Cas9 genome editing data analysis pipeline for analyzing indels and microhomology patterns, identifying personalized features correlated with sgRNA KO efficiency under heterogeneous experimental conditions, and evaluating sgRNA KO efficiency based on CRISPR-Cas9 knock-out NGS data or sgRNA KO assay data.

The ultimate goals of CAGE are to (1) provide a standard crowdsourcing platform for users to share CRISPR-Cas9 gene KO data, (2) provide an efficient interface to analyze and visualize CRISPR KO NGS data, (3) provide a robust learning pipeline that derives sequence determinants from heterogeneous genome editing data for different cell types and organisms, and (4) provide a personalized scoring framework for on-target sgRNA design, based on the sequence determinants derived for specific cell types or organisms.

Currently, CAGE records the optimal sgRNA KO efficiency prediction models and the personalized score functions for sgRNA design for the following XXX cell types. The optimal results for new cell types, as well as for the current ones, will be updated in a timely manner.

Implementation

  • Python >= 2.7
  • Numpy >= 1.9.2
  • Scipy >= 0.15.1
  • Pandas >= 0.16.0
  • scikit-learn >= 0.16.1
  • lxml >= 3.4.4
  • pyfasta >= 0.5.2
  • bwa >= 0.7.12
  • samtools >= 0.1.19
  • bedtools >= 2.23.0
  • LaTeX (for visualization)
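Before running the pipeline, it can help to confirm that the Python packages and command-line tools listed above are installed. The sketch below is a hypothetical helper, not part of CAGE; it assumes Python 3.8+ (for `importlib.metadata`), compares only the leading numeric parts of version strings, and only checks that `bwa`, `samtools` and `bedtools` are on PATH (their versions and the LaTeX installation are not checked).

```python
import importlib.metadata
import re
import shutil

# Minimum versions copied from the requirements list above.
PYTHON_PACKAGES = {
    "numpy": "1.9.2",
    "scipy": "0.15.1",
    "pandas": "0.16.0",
    "scikit-learn": "0.16.1",
    "lxml": "3.4.4",
    "pyfasta": "0.5.2",
}
# External tools are only checked for presence on PATH, not for version.
COMMAND_LINE_TOOLS = ["bwa", "samtools", "bedtools"]


def version_tuple(version):
    """Turn '1.9.2' or '0.16.1rc1' into a tuple of leading integers, e.g. (1, 9, 2)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for package, minimum in PYTHON_PACKAGES.items():
        try:
            installed = importlib.metadata.version(package)
        except importlib.metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    for tool in COMMAND_LINE_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```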

Presetting

Perform this presetting carefully, because correct reference settings are essential for the rest of the pipeline.

For the sake of simplicity, we use hg19 as the example.

  1. Download the hg19 genome (FASTA file) from UCSC, put it in a directory of your choice, name it hg19.fa, and set that directory path as $FASTADB.

  2. Generate BWA index files from hg19.fa, put them in a directory, and set that directory path as $BWADB.

  3. (Optional) Download the hg19 gene annotation files from UCSC, convert them to BED6 format with the gene name in the 4th column, put them in a directory, and set that directory path as $BEDDB. Rename the files as follows:

File              Standard     Requirement
hg19ref.bed       RefSeq       required
hg19ucsc.bed      UCSC Genes   optional
hg19gencode.bed   GENCODE      optional
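Step 3 does not say how to produce the BED6 files. One possible approach for hg19ref.bed is sketched below, assuming the input is UCSC's tab-separated refGene.txt table dump for hg19, where column 3 is chrom, 4 is strand, 5 and 6 are txStart and txEnd (already 0-based, as BED expects), and 13 is name2 (the gene symbol). The script name refgene_to_bed.py and the placeholder score of 0 are assumptions; the UCSC Genes and GENCODE tables have different layouts and would need their own column mapping.

```python
import csv
import gzip
import sys


def refgene_to_bed6(rows):
    """Yield BED6 records (chrom, start, end, gene, score, strand) from refGene rows.

    Assumes the UCSC refGene.txt layout: column 3 = chrom, 4 = strand,
    5 = txStart, 6 = txEnd (0-based, half-open), 13 = name2 (gene symbol).
    """
    for row in rows:
        chrom, strand = row[2], row[3]
        tx_start, tx_end = int(row[4]), int(row[5])
        gene_name = row[12]
        # The score column is not used by the README; 0 is a placeholder.
        yield (chrom, tx_start, tx_end, gene_name, 0, strand)


def convert(refgene_path, bed_path):
    """Read refGene.txt(.gz) and write one BED6 line per transcript."""
    opener = gzip.open if refgene_path.endswith(".gz") else open
    with opener(refgene_path, "rt") as src, open(bed_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for record in refgene_to_bed6(reader):
            writer.writerow(record)


if __name__ == "__main__":
    # Example: python refgene_to_bed.py refGene.txt.gz $BEDDB/hg19ref.bed
    convert(sys.argv[1], sys.argv[2])
```

Because one line is written per transcript, genes with several RefSeq transcripts appear several times; whether CAGE expects merged gene-level intervals is not stated here.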

Installation

git clone https://github.com/bm2-lab/cage-dev.git

Usage

python cage.py <command> [option] ...

Command

  1. sg: process sgRNA sequences into an sgRNA information table
  2. prep: process NGS data into an sgRNA-indel table
  3. mh: microhomology detection
  4. indel: feature selection and model prediction of the sgRNA OTF ratio based on NGS data
  5. fs: feature selection and model prediction of clearly defined sgRNA KO efficiency
  6. eval: sgRNA KO efficiency evaluation
  7. vis: visualization of feature selection results
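When driving CAGE from a script, it may be convenient to build each command line programmatically. The helper below is hypothetical, not part of CAGE: it uses only the subcommands and short flags shown in this README, assumes cage.py is in the current directory, and passes option values through unchanged (the expected form of values such as the reference genome is not specified here).

```python
import subprocess

# Subcommands listed in the Command section above.
COMMANDS = {"sg", "prep", "mh", "indel", "fs", "eval", "vis"}


def build_cage_command(command, python="python", script="cage.py", **options):
    """Return an argv list such as ['python', 'cage.py', 'sg', '-s', 'sgRNA.fq'].

    Each keyword argument becomes a short flag: s="x" -> "-s", "x".
    Options whose value is None are skipped.
    """
    if command not in COMMANDS:
        raise ValueError(f"unknown CAGE command: {command!r}")
    argv = [python, script, command]
    for flag, value in options.items():
        if value is not None:
            argv += [f"-{flag}", str(value)]
    return argv


def run_cage(command, **options):
    # check=True raises CalledProcessError if cage.py exits with a non-zero status.
    return subprocess.run(build_cage_command(command, **options), check=True)
```

For example, build_cage_command("fs", i="labels.txt", s="sg.txt", o="out", g="hg19", m="lasso") mirrors the fs usage shown further down, with hypothetical file names.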

sgRNA processing

python cage.py sg -s <sgRNA.fq> \
                  -o <output directory> \
                  -g <reference genome> \
                  -t <bwa threads>

For more detail on the options, see python cage.py sg -h.
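The sg command reads sgRNA sequences from a FASTQ file. The README does not describe the expected read names or quality values, so the sketch below assumes one plain four-line FASTQ record per sgRNA with a constant placeholder quality character ("I", Phred 40 in Sanger encoding). The guide name and sequence are illustrative only; the files under test/ are the better reference for the real conventions.

```python
def sgrnas_to_fastq(sgrnas, quality_char="I"):
    """Render {name: sequence} as FASTQ text, one four-line record per sgRNA."""
    lines = []
    for name, sequence in sgrnas.items():
        sequence = sequence.upper()
        # Reject anything that is not a nucleotide (or N) before writing.
        if set(sequence) - set("ACGTN"):
            raise ValueError(f"{name}: unexpected characters in {sequence!r}")
        lines += [f"@{name}", sequence, "+", quality_char * len(sequence)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical guide name with an example 20-nt sequence, for illustration only.
    guides = {"sg_example_1": "GAGTCCGAGCAGAAGAAGAA"}
    with open("sgRNA.fq", "w") as handle:
        handle.write(sgrnas_to_fastq(guides))
```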

NGS data preprocessing

  • Single-end
python cage.py prep -s <sg file> \
                    -f <reads.fq> \
                    -o <output directory> \
                    -g <reference genome> \
                    -t <bwa threads>
  • Paired-end
python cage.py prep -s <sg file> \
                    -f <reads_1.fq> \
                    -r <reads_2.fq> \
                    -o <output directory> \
                    -g <reference genome> \
                    -t <bwa threads>

For more detail on the options, see python cage.py prep -h.
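For paired-end input, reads_1.fq and reads_2.fq normally list mates in the same order, so a name mismatch or a different record count usually points to truncated or mismatched files. The hypothetical pre-flight check below (not part of CAGE) assumes uncompressed four-line FASTQ records and strips /1 and /2 suffixes as well as anything after the first whitespace in the header.

```python
import itertools


def fastq_read_names(lines):
    """Yield read names from four-line FASTQ records, dropping any /1 or /2 suffix."""
    for header in itertools.islice(lines, 0, None, 4):
        name = header.rstrip("\n")[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def check_pairs(path_1, path_2):
    """Raise ValueError at the first record where the two files disagree."""
    with open(path_1) as fq1, open(path_2) as fq2:
        # zip_longest pads the shorter file with None, so unequal lengths are caught too.
        names = itertools.zip_longest(fastq_read_names(fq1), fastq_read_names(fq2))
        for index, (name_1, name_2) in enumerate(names):
            if name_1 != name_2:
                raise ValueError(f"record {index}: {name_1!r} vs {name_2!r}")
```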

Microhomology detection

python cage.py mh -i <samind file> \
                  -o <output directory> \
                  -g <reference genome>

For more detail on the options, see python cage.py mh -h.
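Microhomology (MH) around a deletion is commonly defined as the stretch of sequence found both at the start of the deleted segment and immediately after it (equivalently, at its end and immediately before it). The function below implements that generic definition on a reference string with 0-based, half-open deletion coordinates; it illustrates the concept and is not necessarily the exact rule used by cage.py mh.

```python
def deletion_microhomology(reference, start, end):
    """Return the microhomology for a deletion of reference[start:end].

    Compares the deleted segment with the sequence that follows it, base by base,
    and returns the longest identical prefix (capped at the deletion length).
    """
    if not 0 <= start < end <= len(reference):
        raise ValueError("expected 0 <= start < end <= len(reference)")
    deleted = reference[start:end]
    downstream = reference[end:end + len(deleted)]
    length = 0
    for deleted_base, downstream_base in zip(deleted, downstream):
        if deleted_base != downstream_base:
            break
        length += 1
    return deleted[:length]
```

For example, deleting GTCCC (positions 3 to 8) from AAAGTCCCGTCTTT gives AAAGTCTTT with microhomology GTC; deleting positions 6 to 11 yields the same product, which is why MH-flanked deletions cannot be placed uniquely.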

Feature selection and model prediction of the sgRNA OTF ratio based on NGS data

python cage.py indel -i <samind file> \
                     -s <sg file> \
                     -o <output directory> \
                     -g <reference genome>

For more detail on the options, see python cage.py indel -h.
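The indel command models each sgRNA's OTF ratio. Assuming OTF stands for out-of-frame (the README does not spell it out), the ratio for one sgRNA can be computed as the fraction of indel-bearing reads whose net length change is not a multiple of 3. The sketch below takes a hypothetical {net indel length: read count} summary; ignoring zero-length entries is a choice of this sketch, not a documented CAGE rule.

```python
def out_of_frame_ratio(indel_counts):
    """Fraction of indel reads whose net length change shifts the reading frame.

    indel_counts maps a signed net indel length (+ insertion, - deletion)
    to the number of reads carrying it. Length 0 entries are ignored.
    """
    total = 0
    out_of_frame = 0
    for length, count in indel_counts.items():
        if length == 0:
            continue
        total += count
        # Python's % returns a non-negative result, so -2 % 3 == 1 (out of frame).
        if length % 3 != 0:
            out_of_frame += count
    if total == 0:
        raise ValueError("no indel-bearing reads")
    return out_of_frame / total
```

For example, {1: 10, -3: 5, -2: 5} gives 15 / 20 = 0.75: the +1 and -2 indels shift the frame, while the -3 deletion does not.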

Feature selection and model prediction on clearly defined sgRNA KO efficiency

python cage.py fs -i <label file> \
                  -s <sg file> \
                  -o <output directory> \
                  -g <reference genome> \
                  -m <lasso|logit>

For more detail on the options, see python cage.py fs -h.
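The -m option selects LASSO or logistic regression. The README does not list which sequence features CAGE extracts; a common choice for sgRNA models is position-specific one-hot nucleotide encoding, sketched below as an assumption (the feature names such as pos1_A are hypothetical). A matrix built this way could then be passed to an L1-penalized model such as scikit-learn's Lasso (continuous efficiency) or LogisticRegression(penalty="l1", solver="liblinear") (binary labels), whose non-zero coefficients indicate the selected features.

```python
NUCLEOTIDES = "ACGT"


def positional_feature_names(length):
    """Names like 'pos1_A', ..., 'pos20_T' for a sequence of the given length."""
    return [f"pos{i}_{base}" for i in range(1, length + 1) for base in NUCLEOTIDES]


def one_hot_positional(sequence):
    """Encode a sequence as 4 binary features per position (A, C, G, T order)."""
    features = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"cannot encode base {base!r}")
        features.extend(1 if base == nucleotide else 0 for nucleotide in NUCLEOTIDES)
    return features


def build_matrix(sequences):
    """Return (feature names, rows) for equal-length sgRNA sequences."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    return positional_feature_names(lengths.pop()), [one_hot_positional(s) for s in sequences]
```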

sgRNA KO efficiency evaluation

python cage.py eval -s <sg file> \
                    -f <score function file> \
                    -o <output directory> \
                    -g <reference genome>

For more detail on the options, see python cage.py eval -h.
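eval applies a score function file to sgRNAs. That file's format is not described here, so the sketch below assumes a hypothetical linear score function stored as tab-separated feature/weight lines (with an optional intercept line) over position-specific nucleotide features named like pos1_A. It only illustrates linear scoring in general, not CAGE's own file format or scoring code.

```python
import io


def load_score_function(handle):
    """Parse 'feature<TAB>weight' lines into (intercept, {feature: weight})."""
    intercept = 0.0
    weights = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        feature, weight = line.split("\t")
        if feature == "intercept":
            intercept = float(weight)
        else:
            weights[feature] = float(weight)
    return intercept, weights


def score_sgrna(sequence, intercept, weights):
    """Linear score: intercept plus the weight of every position/base present."""
    score = intercept
    for position, base in enumerate(sequence.upper(), start=1):
        # Features missing from the score function contribute nothing.
        score += weights.get(f"pos{position}_{base}", 0.0)
    return score


if __name__ == "__main__":
    example = io.StringIO("intercept\t0.1\npos1_A\t0.5\npos2_C\t-0.2\n")
    intercept, weights = load_score_function(example)
    print(score_sgrna("AC", intercept, weights))
```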

Visualization

python cage.py vis -f <feature report file> \
                   -o <output directory>

For more detail on the options, see python cage.py vis -h.
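vis relies on LaTeX for its output. The feature report layout is not documented in this README, so the sketch below assumes a hypothetical list of (feature, weight) pairs and renders the features with the largest absolute weights as a LaTeX tabular. It is an illustration of that kind of step, not CAGE's vis implementation.

```python
def escape_latex(text):
    """Escape the LaTeX special characters most likely to appear in feature names."""
    replacements = {"\\": r"\textbackslash{}", "_": r"\_", "%": r"\%", "&": r"\&",
                    "#": r"\#", "$": r"\$", "{": r"\{", "}": r"\}"}
    return "".join(replacements.get(char, char) for char in text)


def features_to_tabular(features, top=10):
    """Render the `top` features with the largest |weight| as a LaTeX tabular."""
    ranked = sorted(features, key=lambda item: abs(item[1]), reverse=True)[:top]
    lines = [r"\begin{tabular}{lr}", r"\hline", r"Feature & Weight \\", r"\hline"]
    for name, weight in ranked:
        lines.append(f"{escape_latex(name)} & {weight:.3f} \\\\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)
```

The returned string can be placed inside a table environment of a LaTeX document and compiled as usual.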

Test

To test the commands, first cd into the test directory, then run the following:

  • Testing sg: sh test.sh sg
  • Testing single-end prep: sh test.sh prep_se
  • Testing paired-end prep: sh test.sh prep_pe
  • Testing mh: sh test.sh mh
  • Testing indel without auto detection: sh test.sh indel
  • Testing indel with auto detection: sh test.sh indel_a
  • Testing fs using LASSO without auto detection: sh test.sh fs_las
  • Testing fs using LASSO with auto detection: sh test.sh fs_las_a
  • Testing fs using Logistic Regression without auto detection: sh test.sh fs_log
  • Testing fs using Logistic Regression with auto detection: sh test.sh fs_log_a
  • Testing eval: sh test.sh eval
  • Testing vis: sh test.sh vis
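To run every test target in one go and collect the failures, a small driver like the following could be used from inside test/. The target names are copied from the list above and run in that order; the helper itself is hypothetical and assumes sh is available and that test.sh signals failure with a non-zero exit status.

```python
import subprocess

# Targets listed in the Test section of this README, in the same order.
TEST_TARGETS = [
    "sg", "prep_se", "prep_pe", "mh",
    "indel", "indel_a",
    "fs_las", "fs_las_a", "fs_log", "fs_log_a",
    "eval", "vis",
]


def run_tests(targets=TEST_TARGETS, runner=subprocess.run):
    """Run `sh test.sh <target>` for each target; return the targets that failed."""
    failed = []
    for target in targets:
        result = runner(["sh", "test.sh", target])
        if result.returncode != 0:
            failed.append(target)
    return failed


if __name__ == "__main__":
    failures = run_tests()
    print("all targets passed" if not failures else f"failed: {', '.join(failures)}")
```

The runner parameter exists so the loop can be exercised with a stand-in function instead of actually invoking test.sh.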


Contributors

lq19811015, michaelchuai
