emmadebayos / rnaseqview

This project forked from ncbi-hackathons/rnaseqview

RNA-seq Viewer Team at the NCBI-assisted Boston Genomics Hackathon

License: Creative Commons Zero v1.0 Universal

Shell 0.06% JavaScript 69.52% Python 13.97% Perl 4.18% CSS 1.55% HTML 10.72%

rnaseqview

Visualize genome-wide RNA-Seq data

The Genome-Wide RNA-Seq Viewer is a web application that enables users to visualize genome-wide expression data from NCBI's Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO) databases.

This repository contains a data pipeline written in Python. It extracts aligned RNA-Seq data from SRA or GEO and transforms it into the format used by Ideogram.js, a JavaScript library for chromosome visualization. The minimal front-end shows the distribution of genes across the entire human genome and lets users filter them by expression level in the SRA/GEO sample or by gene type.

How to

Broadly, the pipeline does the following:

  1. Get data for an SRR accession from NCBI SRA
  2. Count reads for each gene and normalize expression values to TPM units
  3. Get coordinates and type for each gene from a GFF file in the NCBI Homo sapiens Annotation Release
  4. Format coordinates and TPM values for each gene into JSON used by Ideogram.js
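Step 2 is where the counting happens. The real counter.py relies on the external tools listed in counter/deps.txt; the sketch below only illustrates the idea in plain Python. It assumes reads have been reduced to (chromosome, start position) pairs and genes are given as 1-based inclusive intervals. The function name count_reads_per_gene is hypothetical.

```python
def count_reads_per_gene(genes, reads):
    """Count reads whose start position falls inside each gene.

    genes: list of (gene_name, chrom, start, end), 1-based inclusive coordinates.
    reads: iterable of (chrom, position) pairs, one per aligned read.
    """
    # Group gene intervals by chromosome so a read is only compared
    # with genes on the same chromosome.
    by_chrom = {}
    for name, chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    # Start every gene at zero so unexpressed genes still appear in the output.
    counts = {name: 0 for name, _, _, _ in genes}
    for chrom, pos in reads:
        for start, end, name in by_chrom.get(chrom, []):
            if start <= pos <= end:
                counts[name] += 1
    return counts
```

In this sketch a read that falls inside two overlapping genes is counted for both; dedicated counting tools usually treat such ambiguous reads specially, so expect their numbers to differ.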

Counter

Counter dependencies

See counter/deps.txt for the tools the counter needs. If you have a conda environment set up, you can install all of them from the bioconda channel.

An easy way to install conda:

  wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
  bash Miniconda-latest-Linux-x86_64.sh -b -p ~/install
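Before running the counter, it can help to confirm that the required tools are actually on your PATH. The helper below is a hypothetical sketch: it assumes counter/deps.txt lists one tool per line, optionally followed by a conda-style version pin such as =1.3, and that each name matches the executable it installs (not always true for bioconda packages).

```python
import re
import shutil
from pathlib import Path


def missing_tools(deps_file):
    """Return the tools named in deps_file that shutil.which cannot find.

    Assumed format: one tool per line, optional version pin
    (e.g. "samtools=1.3"), blank lines and '#' comments ignored.
    """
    missing = []
    for line in Path(deps_file).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop any version specifier such as "=1.3" or ">=1.3".
        tool = re.split(r"[=<>\s]", line, maxsplit=1)[0]
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing
```

From the repository root, print(missing_tools("counter/deps.txt")) lists anything still to install; an empty list means every name was found.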

Counter how to

First, change into the counter directory: cd counter.

The counter.py script quantifies gene expression from data stored in NCBI's SRA database.

Run python counter.py to show usage information. The script accepts SAM/BAM files or SRA accession numbers such as SRR562646.

python counter.py --inp SRR562645 --out SRR562645_counts

The script connects to NCBI and retrieves the genome reference used for the alignment; if no alignment information is available, it stops. It then downloads the gene annotation from NCBI. Only GRCh37 and GRCh38 are currently supported.
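counter.py takes either an SRA run accession or a local SAM/BAM file through --inp. How it tells the two input kinds apart is not documented here; one plausible approach is sketched below with a hypothetical classify_input helper that only recognizes the SRR prefix used in this README.

```python
import re
from pathlib import Path

# Hypothetical check mirroring what --inp accepts: an SRA run accession
# (SRR followed by digits, as in SRR562645) or a SAM/BAM file path.
SRR_ACCESSION = re.compile(r"^SRR\d+$")


def classify_input(inp):
    """Return "accession", "sam" or "bam" for an --inp value; raise ValueError otherwise."""
    if SRR_ACCESSION.match(inp):
        return "accession"
    suffix = Path(inp).suffix.lower()
    if suffix in (".sam", ".bam"):
        return suffix[1:]  # strip the leading dot
    raise ValueError(f"expected an SRR accession or a .sam/.bam file, got {inp!r}")
```

With this rule, classify_input("SRR562645") returns "accession" while classify_input("SRR562645.bam") returns "bam", because the file extension prevents the accession pattern from matching.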

You can also provide a SAM/BAM file together with your own GTF annotation:

python counter.py --inp SRR562645.bam --out SRR562645_counts --gtf GTF_file

In that case the script uses the given GTF to create the count data. The GTF must include ID and gene in its attributes field.
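To check up front that a GTF meets this requirement, you can parse its ninth (attributes) column. The sketch below assumes standard GTF attribute syntax (key "value"; pairs) and raises an error on the first feature missing ID or gene. The function names are hypothetical, and this is not necessarily how counter.py reads the file.

```python
import re

# GTF attributes look like: ID "gene0"; gene "DDX11L1";
ATTR = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def parse_attributes(field):
    """Parse a GTF attributes column into a dict of key -> value."""
    return dict(ATTR.findall(field))


def genes_from_gtf(lines):
    """Yield (gene, chrom, start, end, attributes) for each feature line.

    Raises ValueError when a feature lacks the ID or gene attribute.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and headers
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        attrs = parse_attributes(cols[8])
        if "ID" not in attrs or "gene" not in attrs:
            raise ValueError(f"missing ID/gene attribute: {line!r}")
        yield attrs["gene"], cols[0], int(cols[3]), int(cols[4]), attrs
```

Because genes_from_gtf is a generator, the check runs as you consume it, for example with open("annotation.gtf") as fh: genes = list(genes_from_gtf(fh)).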

Counter outputs

counter.py creates two outputs:

  • *.tsv: raw read counts per gene
  • *_norm.tsv: expression per gene normalized for gene length (counts per kb) and scaled to TPM
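The usual TPM definition divides each gene's count by its length in kilobases and then rescales so that the values across all genes sum to one million. A minimal sketch of that calculation, assuming plain dictionaries of counts and gene lengths (whether counter.py uses gene length or an effective transcript length is not stated here):

```python
def tpm(counts, lengths):
    """Convert raw per-gene read counts to TPM (transcripts per million).

    counts:  dict mapping gene -> raw read count
    lengths: dict mapping gene -> gene length in base pairs
    """
    # 1. Reads per kilobase: corrects for longer genes collecting more reads.
    rpk = {g: counts[g] / (lengths[g] / 1000) for g in counts}
    # 2. Rescale so all values sum to one million: corrects for sequencing depth.
    total = sum(rpk.values())
    if total == 0:
        return {g: 0.0 for g in rpk}
    return {g: v * 1e6 / total for g, v in rpk.items()}


print(tpm({"A": 10, "B": 20}, {"A": 1000, "B": 2000}))  # {'A': 500000.0, 'B': 500000.0}
```

The example shows why length matters: gene B has twice the reads of A but is also twice as long, so both end up with the same TPM.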

Formatter

Run the formatter.py script, which converts the Counter output to the JSON format used by Ideogram.js. For example:

formatter.py --type srr --lookup gene_lookup_GRCh37.tsv --inp SRR562645_counts_norm.tsv --out SRR562645.json
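Conceptually, the formatter joins the normalized counts with a gene lookup table and groups genes by chromosome. The sketch below is not formatter.py itself. It assumes, hypothetically, that the *_norm.tsv file has gene and TPM columns and the lookup file has gene, chromosome, start, stop and type columns, both tab-separated without headers. The output follows the keys/annots layout used by Ideogram.js annotation files; check it against the Ideogram.js version you use.

```python
import csv
import json


def format_for_ideogram(norm_path, lookup_path, out_path):
    """Join TPM values with gene coordinates and write Ideogram-style JSON.

    Assumed (hypothetical) inputs, tab-separated, no header:
      norm_path:   gene, tpm
      lookup_path: gene, chr, start, stop, type
    """
    with open(norm_path, newline="") as fh:
        tpm = {row[0]: float(row[1]) for row in csv.reader(fh, delimiter="\t") if row}

    by_chr = {}
    with open(lookup_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if not row or row[0] not in tpm:
                continue  # skip blank rows and genes without an expression value
            gene, chrom, start, stop, gene_type = row[:5]
            start, stop = int(start), int(stop)
            by_chr.setdefault(chrom, []).append(
                [gene, start, stop - start + 1, round(tpm[gene], 2), gene_type]
            )

    data = {
        "keys": ["name", "start", "length", "expression-level", "gene-type"],
        "annots": [{"chr": c, "annots": a} for c, a in by_chr.items()],
    }
    with open(out_path, "w") as fh:
        json.dump(data, fh)
    return data
```

Genes present in only one of the two files are dropped, so a mismatch between the lookup table's assembly and the counts shows up as missing annotations rather than an error.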

Visualization

After running the steps above, you can plug the JSON data into Ideogram.js to view and filter RNA-Seq data on the entire human genome.

Visualization of a filtered genome-wide expression dataset for SRR562646
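The filtering itself happens in the JavaScript front-end. The Python sketch below only illustrates the same logic on the JSON, assuming each annotation is a list of [name, start, length, tpm, gene_type] grouped per chromosome under "annots". Both the layout and the filter_annots name are hypothetical.

```python
def filter_annots(data, min_tpm=0.0, gene_types=None):
    """Return a copy of Ideogram-style annotation data keeping only matching genes.

    Assumed annotation layout: [name, start, length, tpm, gene_type].
    gene_types=None keeps every type.
    """
    kept = []
    for chrom in data["annots"]:
        rows = [
            a for a in chrom["annots"]
            if a[3] >= min_tpm and (gene_types is None or a[4] in gene_types)
        ]
        kept.append({"chr": chrom["chr"], "annots": rows})
    # Copy the top-level dict so the original data is left untouched.
    return {**data, "annots": kept}
```

Chromosomes whose genes are all filtered out stay in the result with an empty list, which keeps chromosome order stable for the ideogram.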

Contributors

eweitz, lpantano, jingzhizhu, trollgrr, dcgenomics