paularthurm / pitviper

Snakemake pipeline for meta-analysis of functional screening data

License: GNU General Public License v3.0

Python 70.58% Jupyter Notebook 3.64% Shell 2.80% R 4.34% HTML 18.47% Dockerfile 0.17%
bioinformatics-pipeline snakemake

pitviper's Introduction

A pipeline to analyze functional screening data from shRNA, CRISPR/Cas9 and CRISPR/dCas9 experiments.

Introduction

PitViper is a versatile and user-friendly pipeline designed to process, interpret, and visualize functional screening data obtained from various experimental methods such as shRNA, CRISPR/Cas9, and CRISPR/dCas9. This repository hosts the codebase for PitViper, providing researchers with a powerful tool for extracting meaningful insights from their screening experiments.

PitViper stands for "Processing, InTerpretation, and VIsualization of PoolEd screening Results." It is a pipeline that streamlines the analysis workflow for functional screening data. It combines Snakemake for workflow management, Flask for the web interface, and Jupyter for interactive reports.

Features

  • Modularity: PitViper supports a wide range of functional screening data, including shRNA, CRISPR/Cas9, and CRISPR/dCas9 experiments.

  • Reproducibility: Built using Snakemake, PitViper ensures reproducibility and scalability in data analysis.

  • User Interface: PitViper offers a user-friendly web-based interface powered by Flask, making it accessible to both beginners and experienced researchers.

  • Flexibility: Whether you have raw FASTQ or BAM files, or a pre-processed count matrix, PitViper adapts to your data input preference.

  • Visualization: Interactive Jupyter Notebook reports facilitate results visualization and interpretation.

Prerequisites

Before running the installation script, ensure you have the following prerequisites:

  • Conda: Install Conda to manage the application's environment and dependencies. You can download Conda from the official Conda website.

  • Git: If the application's dependencies include external repositories, Git is required for cloning those repositories. You can download Git from the official Git website.

Installation

Using the Automated Script

PitViper can be effortlessly installed using the provided run.sh script. This script sets up the Conda environment, installs all necessary dependencies, and launches the PitViper GUI. Dependencies are only installed during the first run.

# Clone the PitViper repository and navigate to its root directory
$ git clone https://github.com/PaulArthurM/PitViper.git
$ cd PitViper

# Run the installation script
$ ./run.sh

Available options:

--mode: Specify the mode for creating the Conda environment (yaml or lock).
--installer: Specify the Conda package manager to use (conda or mamba).
--noflask: Optionally skip running the Flask application.

Once installation is complete, the PitViper GUI will automatically open in your default web browser, allowing you to start analyzing your functional screening data seamlessly.

Run PitViper from Docker container

Building Image

The PitViper main directory contains a Dockerfile for building the PitViper image. From this directory, run:

$ docker build -t [image_name] .

Downloading image from Docker Hub

The PitViper Docker image can also be pulled from Docker Hub:

$ docker pull lobrylab/pitviper:v1.0

Running container

To start a PitViper container simply run:

$ docker run -p 5000:5000 -p 8888:8888 -v [fastq/bam_files_path]:[fastq/bam_files_path] --name [name_the_container] [PitViper_image_name]

For example:

$ docker run -p 5000:5000 -p 8888:8888 -v /home/data/:/home/data/ --name pitviper lobrylab/pitviper:v1.0

PitViper GUI can now be accessed in your web browser at the address: localhost:5000.

Upon completion of the PitViper analysis, the Jupyter notebook is accessible via the localhost:8888/[token] link displayed in the terminal.

To quit PitViper and stop the container, quit the Jupyter notebook session.

Re-starting container

To start a new analysis, restart the container with the following command:

$ docker start -a [container_name]

Accessing Jupyter notebooks from previous Docker PitViper runs

To access the notebook of a previously run PitViper analysis, first start the container:

$ docker start [container_name]

Then execute the notebook script:

$ docker exec -ti [container_name] bash /PitViper/notebook.sh

The folder containing all previously run analyses is then accessible in Jupyter via the localhost:8888/[token] link displayed in the terminal.

Inputs

PitViper accommodates diverse input data formats, allowing you to initiate an analysis using raw data files, aligned BAM files, or a count matrix. The specific input requirements vary based on your starting data type.

Starting from raw FASTQ files

For this approach, you will need the following:

  1. Path to FASTQ files on your system.
  2. A library file comprising three comma-separated columns without headers: guide ID, guide sequence, target element.
  3. A design matrix summarizing your experimental conditions, their replicates, and associated FASTQ files.

Design matrix

Suppose you have two conditions, A (control) and B (treatment), each with three replicates. Your design matrix should be structured as follows:

condition replicate fastq order
A A_1 /path/to/A_rep1.fastq 0
A A_2 /path/to/A_rep2.fastq 0
A A_3 /path/to/A_rep3.fastq 0
B B_1 /path/to/B_rep1.fastq 1
B B_2 /path/to/B_rep2.fastq 1
B B_3 /path/to/B_rep3.fastq 1

In the order column, the control condition (A) is designated by order = 0, and any condition with an order value greater than 0 (e.g., 1 for B) is treated as a treatment condition. The order should remain consistent across replicates.
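The rule described above can be checked before launching an analysis. The following is a minimal sketch, not PitViper's own code: the function names are hypothetical, and it assumes the design matrix is whitespace-delimited as displayed here.

```python
from collections import defaultdict

def read_design(text):
    """Parse a whitespace-delimited design matrix into a list of row dicts."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]

def split_conditions(rows):
    """Return (controls, treatments) after checking that order is consistent."""
    orders = defaultdict(set)
    for row in rows:
        orders[row["condition"]].add(int(row["order"]))
    # Every replicate of a condition must carry the same order value.
    inconsistent = sorted(c for c, seen in orders.items() if len(seen) > 1)
    if inconsistent:
        raise ValueError(f"order differs between replicates of: {inconsistent}")
    controls = sorted(c for c, seen in orders.items() if seen == {0})
    treatments = sorted(c for c, seen in orders.items() if min(seen) > 0)
    return controls, treatments

design = """\
condition replicate fastq order
A A_1 /path/to/A_rep1.fastq 0
A A_2 /path/to/A_rep2.fastq 0
A A_3 /path/to/A_rep3.fastq 0
B B_1 /path/to/B_rep1.fastq 1
B B_2 /path/to/B_rep2.fastq 1
B B_3 /path/to/B_rep3.fastq 1
"""

controls, treatments = split_conditions(read_design(design))
print("control:", controls, "treatment:", treatments)
```

A design in which replicates of the same condition carry different order values raises a ValueError instead of silently picking one.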

Library file

The library file should be comma-separated with three columns: guide's ID, guide's sequence, and the corresponding target element.

guide_A.1,CTTAGTTTTGAACAAGTACA,element_A
guide_A.2,GTTGAGTTATCACACATCAT,element_A
guide_A.3,AATGTAGTGTAGCTACAGTG,element_A
guide_B.1,TTAGTTTATATCTTATGGCA,element_B
guide_B.2,GATTGTCTGTGAAATTTCTG,element_B
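As an illustration of this layout, a library file can be read with Python's csv module and grouped by target element. This is a sketch under assumptions: read_library is a hypothetical helper, and the check that sequences contain only A, C, G and T is an added sanity check, not a documented PitViper requirement.

```python
import csv
import io
from collections import defaultdict

def read_library(handle):
    """Group guide IDs by target element from a headerless 3-column CSV."""
    guides_by_element = defaultdict(list)
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        guide_id, sequence, element = (field.strip() for field in row)
        # Assumption: guide sequences are plain DNA.
        if set(sequence.upper()) - set("ACGT"):
            raise ValueError(f"line {line_no}: unexpected character in {sequence!r}")
        guides_by_element[element].append(guide_id)
    return dict(guides_by_element)

library = io.StringIO(
    "guide_A.1,CTTAGTTTTGAACAAGTACA,element_A\n"
    "guide_A.2,GTTGAGTTATCACACATCAT,element_A\n"
    "guide_A.3,AATGTAGTGTAGCTACAGTG,element_A\n"
    "guide_B.1,TTAGTTTATATCTTATGGCA,element_B\n"
    "guide_B.2,GATTGTCTGTGAAATTTCTG,element_B\n"
)
guides = read_library(library)
print(guides)
```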

Starting from aligned BAM files

Design matrix

If you have aligned BAM files instead of raw FASTQ files, follow these modifications to the design matrix:

  • Replace the fastq column with a bam column, and replace the paths to FASTQ files with paths to BAM files.

condition replicate bam order
A A_1 /path/to/A_rep1.bam 0
A A_2 /path/to/A_rep2.bam 0
A A_3 /path/to/A_rep3.bam 0
B B_1 /path/to/B_rep1.bam 1
B B_2 /path/to/B_rep2.bam 1
B B_3 /path/to/B_rep3.bam 1

Starting from count matrix

Starting from a count matrix eliminates the need for a fastq or bam column:

condition replicate order
A A_1 0
A A_2 0
A A_3 0
B B_1 1
B B_2 1
B B_3 1

However, the replicate column must contain the same labels as those in the count matrix header:

shRNA Gene A_1 A_2 A_3 B_1 B_2 B_3
guide_A.1 element_A 456 273 345 354 587 258
guide_A.2 element_A 354 234 852 546 64 452
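The label requirement can be verified with a short comparison. The sketch below is illustrative only: check_replicates is a hypothetical helper, and it assumes the first two columns of the count matrix hold the guide and element identifiers, as in the example above.

```python
def check_replicates(design_replicates, count_header, id_columns=2):
    """Compare design-matrix replicate labels with count-matrix sample columns.

    Returns (missing, extra): labels absent from the count matrix, and
    count-matrix columns that no design row refers to.
    """
    sample_columns = count_header.split()[id_columns:]
    missing = [r for r in design_replicates if r not in sample_columns]
    extra = [c for c in sample_columns if c not in design_replicates]
    return missing, extra

header = "shRNA Gene A_1 A_2 A_3 B_1 B_2 B_3"
replicates = ["A_1", "A_2", "A_3", "B_1", "B_2", "B_3"]
missing, extra = check_replicates(replicates, header)
print("missing:", missing, "extra:", extra)
```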

PitViper CLI

PitViper CLI is a command-line tool for running the PitViper pipeline.

Usage

$ pitviper.py --configfile <configfile> [--dry_run] [--jobs <jobs>] [--notebook <notebook>]

--configfile: Path to the configuration file. The configuration file must be in YAML format.
--dry_run: If enabled, the pipeline will be run in dry run mode, which means that no commands will be executed. This is useful for testing the pipeline without actually running it.
--jobs: Number of Snakemake rules to run in parallel. The default is 1.
--notebook: Output file(s) of a PitViper rule with "notebook" entry.

Examples

$ pitviper.py --configfile config.yaml

# Run the pipeline in dry run mode
$ pitviper.py --configfile config.yaml --dry_run

# Run the pipeline with 4 jobs in parallel
$ pitviper.py --configfile config.yaml --jobs 4

# Run the pipeline and save the output notebook to notebook.html
$ pitviper.py --configfile config.yaml --notebook notebook.html
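For readers who want to see how such an interface is typically declared, here is a hypothetical argparse sketch that mirrors the documented options. It is not taken from pitviper.py; the help strings and the empty-string default for --notebook are assumptions.

```python
import argparse

def build_parser():
    """Declare the four options documented above (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="pitviper.py")
    parser.add_argument("--configfile", required=True,
                        help="path to the YAML configuration file")
    parser.add_argument("--dry_run", action="store_true",
                        help="show what would be run without executing anything")
    parser.add_argument("--jobs", type=int, default=1,
                        help="number of Snakemake rules to run in parallel")
    parser.add_argument("--notebook", default="",
                        help='output file of a rule with a "notebook" entry')
    return parser

args = build_parser().parse_args(["--configfile", "config.yaml", "--jobs", "4"])
print(args.configfile, args.dry_run, args.jobs, repr(args.notebook))
```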

pitviper's People

Contributors

paularthurm, camillelobry

pitviper's Issues

Hardcoded cut-off

Min. reads cut-off in GUI.

Remove:

sgRNA_to_keep = sum_by_group[sum_by_group.value > 50].sgRNA.values

and move the following threshold to the GUI:

cts['below_threshold'] = cts['value'] < 100

A refactoring of the previous script is necessary to make the purpose of the script clear.

  • Add docstrings
  • Add a main function
  • Remove unused functions
  • Threshold from GUI/config
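One way the hardcoded value could become a parameter is sketched below. This is an assumption-laden illustration: it uses plain Python dictionaries in place of the pandas objects in the snippets above, and the min_reads configuration key is a hypothetical name.

```python
def flag_below_threshold(counts, min_reads):
    """Return copies of the count rows with a 'below_threshold' flag added.

    counts: list of dicts with at least a 'value' entry (read count).
    min_reads: cut-off taken from the GUI or the configuration file.
    """
    return [dict(row, below_threshold=row["value"] < min_reads) for row in counts]

config = {"min_reads": 100}  # hypothetical key, set from the GUI or a config file
counts = [
    {"sgRNA": "guide_A.1", "value": 456},
    {"sgRNA": "guide_A.2", "value": 64},
]
flagged = flag_below_threshold(counts, config["min_reads"])
for row in flagged:
    print(row["sgRNA"], row["below_threshold"])
```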

Save alignment statistics - Bowtie2

Alignment statistics should be saved in a results sub-directory, such as results/{token}/bowtie2/statistics.txt.

  • Save statistics in a text file
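Bowtie2 prints its alignment summary to standard error, so one option is to capture that text and write it to the proposed location. The sketch below only covers the writing step; save_alignment_stats is a hypothetical helper and the summary string is a placeholder, not real Bowtie2 output.

```python
import tempfile
from pathlib import Path

def save_alignment_stats(stats_text, token, results_dir="results"):
    """Write alignment statistics to <results_dir>/<token>/bowtie2/statistics.txt."""
    out_path = Path(results_dir) / token / "bowtie2" / "statistics.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(stats_text)
    return out_path

# Demonstration in a temporary directory, with placeholder text standing in
# for the summary captured from Bowtie2's standard error.
with tempfile.TemporaryDirectory() as tmp:
    path = save_alignment_stats("example summary text\n", "my_token", results_dir=tmp)
    saved = path.read_text()
    relative = path.relative_to(tmp).as_posix()
print(relative)
```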

Run PitViper with subprocess

To improve the function, we could use Python's subprocess library to execute the command instead of using os.system to run the command. This would allow us to access the output of the command and take action based on it. Additionally, we could use string formatters to make the command string more readable and easier to debug. Here is an example of how the function could be improved:

import subprocess

def run_pitviper(configfile, dry_run, jobs, notebook):
    """Run the Snakemake workflow and return True if it succeeded."""
    # Only pass --edit-notebook when a notebook target was requested.
    nb_opt = f"--edit-notebook {notebook}" if notebook else ""
    if dry_run:
        # -n asks Snakemake for a dry run: nothing is executed.
        cmd = f"snakemake -s workflow/Snakefile -n --configfile {configfile} --use-conda --cores {jobs}"
    else:
        cmd = f"snakemake -s workflow/Snakefile --configfile {configfile} --use-conda --cores {jobs} {nb_opt}"
    print("Command:", cmd)
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode == 0:
        # Command succeeded.
        return True
    # Command failed: surface the captured error output.
    print(result.stderr)
    return False

  • Use subprocess instead of os.system

Lineplots of normalized counts

Improve visualization by:

  • Compute, display and link the mean normalized read counts of replicates per time-points
  • Display all individual points
  • Show normalization method
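The first item amounts to grouping normalized counts by time-point and averaging over replicates. A minimal sketch follows, assuming the data arrive as (time-point, replicate, value) tuples; the function name and the numbers are illustrative only.

```python
from collections import defaultdict
from statistics import mean

def mean_per_timepoint(points):
    """Average normalized counts over replicates for each time-point.

    points: iterable of (timepoint, replicate, normalized_count) tuples.
    """
    grouped = defaultdict(list)
    for timepoint, _replicate, value in points:
        grouped[timepoint].append(value)
    return {tp: mean(values) for tp, values in sorted(grouped.items())}

# Made-up values for one guide; the individual points would be drawn as
# markers and the means joined by a line.
points = [
    ("A", "A_1", 10.0), ("A", "A_2", 20.0),
    ("B", "B_1", 4.0), ("B", "B_2", 6.0),
]
means = mean_per_timepoint(points)
print(means)
```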

Rename in-house method

Find a better and more representative name for "in-house method". Then rename it in all scripts... :(

Normalization

To do:

  • Use raw counts for RRA and MLE
  • Add normalization option to RRA
  • Review the normalization process
  • Remove normalization option of MAGeCK counts

Add Union button

Add a check button for union selection instead of default intersection in integration module.
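The difference between the two selection modes can be sketched with Python sets. The function and the method names below are hypothetical and do not reflect how the integration module is implemented.

```python
def integrate_hits(hits_by_method, union=False):
    """Combine per-method hit lists by intersection (default) or union."""
    hit_sets = [set(hits) for hits in hits_by_method.values()]
    if not hit_sets:
        return set()
    return set.union(*hit_sets) if union else set.intersection(*hit_sets)

hits = {
    "method_1": ["element_A", "element_B"],
    "method_2": ["element_B", "element_C"],
}
print(sorted(integrate_hits(hits)))              # elements found by every method
print(sorted(integrate_hits(hits, union=True)))  # elements found by any method
```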

Bowtie2

Add bowtie2=2.2.4 and samtools=1.14 to env.

To do:

  • Preset options (the values below match the presets Bowtie2 documents for --local mode):

--very-fast
Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

--fast
Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

--sensitive
Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)

--very-sensitive
Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

Heatmap with zeros

Example: the SFPQ gene returns an error when using show_sgRNA_counts(token):

ValueError: The condensed distance matrix must contain only finite values.

A non-zero value should be added to all cells before using Clustergrammer2.
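To illustrate why zeros can produce non-finite distances, the sketch below assumes a cosine distance, which is undefined for an all-zero row; which metric is actually used in the PitViper report is not stated here. Both functions are hypothetical helpers written in plain Python.

```python
import math

def cosine_distance(u, v):
    """Cosine distance; NaN when either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return float("nan")
    return 1.0 - sum(a * b for a, b in zip(u, v)) / norm

def add_pseudocount(matrix, pseudocount=1.0):
    """Return a copy of the count matrix with a constant added to every cell."""
    return [[value + pseudocount for value in row] for row in matrix]

counts = [[0, 0, 0], [12, 30, 7]]  # first guide has no reads in any sample
before = cosine_distance(*counts)
after = cosine_distance(*add_pseudocount(counts))
print(math.isnan(before), math.isfinite(after))
```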

Improve tool implementation in report

Extensibility was one of the primary goals of PitViper. However, the current implementation of the report makes it difficult and tedious to add new tools.

We should think of a way to generalize how all functions are used.

Creating a tool-agnostic class as a common interface for all results could be a solution. This class should have several characteristics:

  • Generalization: should work with different results and metrics, such as FDR, Bayes factor or any other uncertainty measure.
  • Common interface: the API should be consistent across all methods.

To create a new method, two features are mandatory: the names of the elements and at least one numerical score to rank them (FDR, Bayes factor, etc.).
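A possible shape for such a class is sketched below as a dataclass. Everything here is hypothetical: the class name, its fields, and the example scores are made up for illustration and do not exist in PitViper.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ToolResult:
    """Tool-agnostic container: element names plus one ranking score."""
    tool: str
    score_name: str
    scores: Dict[str, float]      # element name -> score
    lower_is_better: bool = True  # True for FDR-like scores, False for Bayes factors

    def ranked(self) -> List[str]:
        """Elements ordered from best to worst score."""
        return sorted(self.scores, key=self.scores.__getitem__,
                      reverse=not self.lower_is_better)

    def hits(self, threshold: float) -> List[str]:
        """Elements whose score passes the threshold, best first."""
        if self.lower_is_better:
            return [e for e in self.ranked() if self.scores[e] <= threshold]
        return [e for e in self.ranked() if self.scores[e] >= threshold]

# Made-up scores for two hypothetical tools.
fdr = ToolResult("tool_1", "FDR", {"element_A": 0.01, "element_B": 0.40})
bf = ToolResult("tool_2", "Bayes factor", {"element_A": 12.0, "element_B": -3.0},
                lower_is_better=False)
print(fdr.hits(0.05), bf.hits(5.0))
```

With this shape, report functions only need ranked() and hits(), so adding a tool means constructing one more ToolResult rather than writing tool-specific plotting code.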
