Snakemake workflow: scrna-seq

A Snakemake workflow for single-cell RNA-seq analysis

🤪 Authors

📦 Analysis modules

  1. Differential analysis: This module covers differentially expressed gene (DEG) analysis, GO and KEGG pathway enrichment, and GSEA. The Seurat package is used for DEG analysis; the clusterProfiler package is used for functional enrichment and GSEA.

  2. Scoring: This module uses the GSVA algorithm to score gene sets from the H, C2, C5, C6 and C7 collections of the MSigDB database. The GSVA package is used for scoring; the limma package is used for differential analysis of the pathway scores.

  3. Transcription factor prediction: This module uses pySCENIC for transcription factor prediction, and the limma package for differential analysis of transcription factors between groups.

🕹️ Usage

1. Input data requirements

The input is an rds file containing a Seurat object. The meta.data of the Seurat object must include a cell type column (cell_type) and a grouping column (for example, group, with two groups such as Normal and Tumor).
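Before running the workflow, it can help to confirm that the metadata meets these requirements. The following is a minimal sketch in Python, assuming the meta.data has already been exported from R to CSV text (for example with write.csv). The function name check_metadata and the example rows are illustrative and not part of the workflow.

```python
import csv
import io


def check_metadata(csv_text, group_col="group", cell_type_col="cell_type"):
    """Return a list of problems found in an exported meta.data table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    problems = []
    # Both the cell type column and the grouping column must be present.
    for col in (cell_type_col, group_col):
        if col not in columns:
            problems.append(f"missing column: {col}")
    # The grouping column must contain exactly two distinct groups.
    if group_col in columns:
        levels = {row[group_col] for row in reader}
        if len(levels) != 2:
            problems.append(
                f"{group_col} has {len(levels)} group(s), expected 2: {sorted(levels)}"
            )
    return problems


# Hypothetical three-column export with two groups.
example = "cell,cell_type,group\nc1,T cell,Normal\nc2,B cell,Tumor\n"
print(check_metadata(example))  # prints []
```

An empty list means both columns are present and the grouping column has exactly two groups. Pass group_col="METTL3_group" (or whatever GROUP is set to in config.yaml) when the grouping column has a different name.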

2. Obtain a copy of this workflow

cd ~/
git clone https://github.com/zerostwo/scrna-seq.git

3. Configure workflow

Configure the workflow to your needs by editing the file config.yaml.

#### Required content ----
# Absolute path to Seurat object
INPUT: /home/duansq/pipeline/scrna-seq/resources/test_data.rds
# Grouping column; it must exist in the meta.data of the Seurat object and contain exactly two groups
GROUP: METTL3_group
# The group to treat as the positive (treatment) group in the comparison
TREATMENT: positive
# Differentially expressed gene analysis method (options: MAST, bimod, wilcox, LR, t)
TEST_METHOD: wilcox
# Assay to analyze, usually RNA
ASSAY: RNA
# Species (options: Homo sapiens or Mus musculus)
SPECIES: Homo sapiens
# Scoring method (options: GSVA, AddModuleScore, AUCell)
SCORE_METHOD: AddModuleScore
#### Software settings ----
# Path to the pyscenic executable
PYSCENIC_PATH: /opt/pySCENIC/0.11.2/bin/pyscenic
# Path to the Python interpreter of the environment where pyscenic is installed
PYTHON_PATH: /opt/pySCENIC/0.11.2/venv/bin/python
# pyscenic annotation file. Download from https://pyscenic.readthedocs.io/en/latest/installation.html#auxiliary-datasets
ANNOTATIONS_FILE_PATH: /DATA/public/cisTarget_databases/human/motifs-v9-nr.hgnc-m0.001-o0.0.tbl 
# pyscenic database. Download from https://resources.aertslab.org/cistarget/
DATABASE_FILE_PATH: 
  /DATA/public/cisTarget_databases/human/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather
  /DATA/public/cisTarget_databases/human/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather
# A list of transcription factors. Download from https://github.com/aertslab/pySCENIC/tree/master/resources
TF_LIST: /DATA/public/cisTarget_databases/resources/hs_hgnc_tfs.txt
#### Optional ----
# p-value and log2(fold change) thresholds for differentially expressed gene analysis
P_VALUE: 0.05
LOG2FC: 0.25
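
A quick sanity check of the configuration can catch typos before a long run. The sketch below is one possible check in Python, assuming config.yaml has already been parsed into a dictionary (for example with a YAML library). The names validate_config and database_paths are hypothetical helpers, not part of the workflow. It also relies on the fact that YAML folds a multi-line plain value such as DATABASE_FILE_PATH into a single space-separated string.

```python
# Allowed values, copied from the comments in config.yaml.
ALLOWED = {
    "TEST_METHOD": {"MAST", "bimod", "wilcox", "LR", "t"},
    "SPECIES": {"Homo sapiens", "Mus musculus"},
    "SCORE_METHOD": {"GSVA", "AddModuleScore", "AUCell"},
}
REQUIRED = (
    "INPUT", "GROUP", "TREATMENT", "TEST_METHOD",
    "ASSAY", "SPECIES", "SCORE_METHOD",
)


def validate_config(config):
    """Return a list of problems found in an already-parsed config mapping."""
    problems = [f"missing key: {key}" for key in REQUIRED if not config.get(key)]
    for key, allowed in ALLOWED.items():
        value = config.get(key)
        if value and value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    return problems


def database_paths(config):
    """Split the folded DATABASE_FILE_PATH string into individual paths."""
    return str(config.get("DATABASE_FILE_PATH") or "").split()


config = {
    "INPUT": "/home/duansq/pipeline/scrna-seq/resources/test_data.rds",
    "GROUP": "METTL3_group",
    "TREATMENT": "positive",
    "TEST_METHOD": "wilcox",
    "ASSAY": "RNA",
    "SPECIES": "Homo sapiens",
    "SCORE_METHOD": "AddModuleScore",
}
print(validate_config(config))  # prints []
```

The check only covers the required keys and the enumerated options; it does not verify that the paths exist or that TREATMENT is one of the two groups in the GROUP column.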

4. Execute workflow

# 1. Switch to the pipeline path
cd ~/scrna-seq
# 2. Activate the snakemake environment
conda activate snakemake
# 3. Test your configuration by performing a dry-run via
snakemake -np
# 4. Execute the workflow locally via
snakemake --cores 10
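
The dry-run and the real run can also be chained so that the workflow only starts when the dry-run succeeds. This is a minimal sketch in Python, assuming the snakemake environment is already active and snakemake is on the PATH. The function run_workflow and its runner parameter are illustrative and not part of the workflow.

```python
import subprocess


def run_workflow(workdir, cores=10, runner=subprocess.run):
    """Dry-run first; start the real run only if the dry-run succeeds.

    `runner` defaults to subprocess.run and can be swapped out for testing.
    """
    # -n performs a dry-run, -p prints the shell commands that would be executed.
    dry = runner(["snakemake", "-np"], cwd=workdir)
    if dry.returncode != 0:
        return False
    real = runner(["snakemake", "--cores", str(cores)], cwd=workdir)
    return real.returncode == 0
```

Calling run_workflow("/path/to/scrna-seq", cores=10), with the placeholder replaced by the path of the cloned repository, mirrors steps 3 and 4 above and returns True only when both commands exit with status 0.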

📂 Result file description

Once the workflow has finished, the results are written to the results folder. Six folders are usually generated for each input dataset, structured as follows:

test_data
├── benchmark
├── deg
├── function
│   ├── GO
│   ├── GSEA
│   └── KEGG
├── logs
├── scenic
└── score

  • benchmark contains the CPU, memory and time consumed by each analysis script;
  • deg contains the results of the differentially expressed gene analysis;
  • function contains three subfolders with the results of GO enrichment, KEGG enrichment, and GSEA;
  • logs contains the log files generated by each analysis script;
  • scenic contains the transcription factor prediction results and the between-group differential analysis results;
  • score contains a subfolder with the GSVA scores and the results of the differential analysis between groups.
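
To confirm that a run produced the folders shown in the tree above, a small check like the following can be used. It is a sketch in Python; the function name missing_outputs is hypothetical, and the location results/test_data is an assumption based on the tree and the results folder mentioned above.

```python
from pathlib import Path

# Folders taken from the tree above, relative to one result directory.
EXPECTED = (
    "benchmark",
    "deg",
    "function/GO",
    "function/GSEA",
    "function/KEGG",
    "logs",
    "scenic",
    "score",
)


def missing_outputs(result_dir):
    """Return the expected subfolders that do not exist under result_dir."""
    root = Path(result_dir)
    return [name for name in EXPECTED if not (root / name).is_dir()]


# Assumed location of the results for the test data.
print(missing_outputs("results/test_data"))
```

An empty list means every expected folder is present; any names returned point to analysis steps that did not produce output.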
