angrymaciek / snakemake_qdgedtu

RNA-Seq data analysis: Quantification, Differential Gene Expression, Differential Transcripts Usage

License: Apache License 2.0


Snakemake pipeline for transcript expression analyses

Maciej_Bak
Swiss_Institute_of_Bioinformatics

This is a small Snakemake pipeline I have put together for quantifying transcript expression and analysing differential gene expression and differential transcript usage from RNA-Seq data.
Transcript quantification is performed with salmon.
Differential gene expression (DESeq and edgeR) and differential transcript usage (DEXSeq and DRIMSeq) are analysed according to: https://f1000research.com/articles/7-952/v3

Snakemake pipeline execution

Snakemake is a workflow management system that helps to create and execute data processing pipelines. It requires Python 3 and is most easily installed from the bioconda channel of the Anaconda cloud service.

Step 1: Download and installation of Miniconda3

Linux:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

macOS:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
source ~/.bashrc

Step 2: Pandas and Snakemake installation

To execute the workflow you need the pandas Python library and the snakemake workflow manager.
Unless a specific snakemake version is required, installing the latest versions is most likely the best choice:

conda install -c conda-forge pandas
conda install -c bioconda snakemake

If any dependency packages are missing, please install them first (also with conda install ...).
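
As a quick sanity check after installation, the sketch below reports whether pandas and snakemake are visible to the active Python interpreter. It is not part of the pipeline; the helper name missing_modules is made up for this illustration, and the check only confirms that the packages can be found, not that their versions suit the workflow.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the active interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'pandas' and 'snakemake' are the import names of the two packages
    # installed in Step 2.
    missing = missing_modules(["pandas", "snakemake"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pandas and snakemake are both importable.")
```

Run it inside the same conda environment in which you intend to start the pipeline.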

Step 3: Pipeline execution

Specify all the required information (input/output/parameters) in config.yaml. In addition to the genome in FASTA format and the genomic annotation in GTF format, the user needs to provide a TSV design table with detailed information about the sequencing data. A sample design table is available in the repository as sample_design_table.tsv. Note that:

  • the design table requires 4 columns; column names are fixed;
  • the first column is the sample ID; the last column is the condition ('treated' or 'untreated' only);
  • the 2nd and 3rd columns are paths to the fastq/fasta files, and their order does not matter;
  • if the sequencing data are single-end, provide a path only in the 2nd column and leave the 3rd column as an empty string.
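
The rules above can be checked before starting a run. The following is a minimal sketch using only the Python standard library; it is not part of the pipeline. The function name check_design_table is hypothetical, the header names in the example are placeholders (the fixed names should be taken from sample_design_table.tsv), and the requirement that sample IDs be unique is an assumption rather than something stated here.

```python
import csv
import io

ALLOWED_CONDITIONS = {"treated", "untreated"}


def check_design_table(handle):
    """Return a list of problems found in a tab-separated design table."""
    rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return ["design table is empty"]
    problems = []
    header, records = rows[0], rows[1:]
    # Header names are not compared: the fixed names are not listed here.
    if len(header) != 4:
        problems.append(f"header has {len(header)} columns, expected 4")
    seen_ids = set()
    for line_no, row in enumerate(records, start=2):
        if not row:
            continue  # ignore completely blank lines
        if len(row) != 4:
            problems.append(f"line {line_no}: {len(row)} columns, expected 4")
            continue
        sample_id, path_1, _path_2, condition = (field.strip() for field in row)
        if not sample_id:
            problems.append(f"line {line_no}: empty sample ID")
        elif sample_id in seen_ids:
            # Assumption: sample IDs must be unique.
            problems.append(f"line {line_no}: duplicate sample ID '{sample_id}'")
        seen_ids.add(sample_id)
        if not path_1:
            problems.append(f"line {line_no}: 2nd column (file path) is empty")
        # The 3rd column may be empty: that marks single-end data.
        if condition not in ALLOWED_CONDITIONS:
            problems.append(f"line {line_no}: unexpected condition '{condition}'")
    return problems


# Example with placeholder header names; s2 is a single-end sample.
example = (
    "col_1\tcol_2\tcol_3\tcol_4\n"
    "s1\ts1_R1.fastq\ts1_R2.fastq\ttreated\n"
    "s2\ts2.fastq\t\tuntreated\n"
)
print(check_design_table(io.StringIO(example)))
```

To check a real file, pass an open handle instead, for example open("sample_design_table.tsv", newline="").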

Once the metadata are ready, write the workflow DAG (directed acyclic graph) to dag.pdf:

bash snakemake_dag_run.sh

There are two scripts to start the pipeline, depending on whether you want to run it locally or on a SLURM computational cluster. To execute the workflow, snakemake automatically creates internal conda virtual environments and installs the software from the Anaconda cloud service. For cluster execution, you may need to adapt 'cluster_config.json' and the submission scripts before starting the run.

bash snakemake_local_run_conda_env.sh
bash snakemake_cluster_run_conda_env.sh

License

Apache 2.0
