yixf-self / meripseqpipe-1

This project forked from canceromics/meripseqpipe

MeRIPseqPipe: An integrated analysis pipeline for MeRIP-seq data based on Nextflow.

License: MIT License

Dockerfile 0.75% HTML 1.27% R 27.54% Shell 11.79% Perl 8.24% Python 3.27% Groovy 3.32% Nextflow 43.82%

MeRIPseqPipe

A MeRIP-seq analysis pipeline that combines multiple alignment tools, peak-calling tools, peak-merging methods, and methylation analysis methods.

Introduction

Here, we present MeRIPseqPipe, an integrated analysis pipeline for MeRIP-seq data based on Nextflow. It integrates ten main functional modules including data preprocessing, quality control, read mapping, peak calling, peak merging, motif searching, peak annotation, differential methylation analysis, differential expression analysis, and data visualization, which covers the basic analysis of MeRIP-seq data.

All analysis modules are implemented in Nextflow, and all third-party tools are packaged in a Docker container.

Quick Start

  1. Install Nextflow.

  2. Pull the Docker image from Docker Hub: kingzhuky/meripseqpipe:dev

  3. Clone this repository and check that the pipeline starts:

    git clone https://github.com/canceromics/MeRIPseqPipe.git
    nextflow run /path/to/MeRIPseqPipe --help

  4. Test it on a minimal dataset with a single command:

    nextflow run path/to/meripseqpipe -profile test,docker

  5. Start running your own analysis!

    nextflow run path/to/meripseqpipe -profile docker \
      --designfile designfile.tsv --comparefile compare.txt -resume \
      --aligners star --fasta hg38_genome.fa \
      --gtf gencode.v25.annotation.gtf --rRNA_fasta hg38_rRNA.fasta \
      --outdir path/to/results --skip_createbedgraph \
      --peakMerged_mode rank --star_index hg38/starindex \
      --skip_meyer --skip_matk --methylation_analysis_mode Wilcox-test
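The command in step 5 mixes two kinds of options: single-dash options such as -profile and -resume are handled by Nextflow itself, while double-dash options such as --aligners are passed to the pipeline as parameters. The following is a minimal sketch of how such a command could be assembled from a script. The helper name build_command and its arguments are hypothetical and not part of MeRIPseqPipe; only the flags and values shown in the Quick Start are used.

```python
import shlex


def build_command(pipeline_path, profile, params, flags=(), resume=True):
    """Assemble a `nextflow run` command as a list of arguments.

    Hypothetical helper for illustration only. `params` maps pipeline
    parameter names to values; `flags` lists value-less switches.
    """
    # Single-dash options belong to Nextflow itself.
    cmd = ["nextflow", "run", pipeline_path, "-profile", profile]
    if resume:
        cmd.append("-resume")
    # Double-dash options are pipeline parameters.
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    for name in flags:
        cmd.append(f"--{name}")
    return cmd


# Values copied from the Quick Start example above.
cmd = build_command(
    "path/to/meripseqpipe",
    "docker",
    {
        "designfile": "designfile.tsv",
        "comparefile": "compare.txt",
        "aligners": "star",
        "fasta": "hg38_genome.fa",
        "gtf": "gencode.v25.annotation.gtf",
        "rRNA_fasta": "hg38_rRNA.fasta",
        "outdir": "path/to/results",
        "peakMerged_mode": "rank",
        "star_index": "hg38/starindex",
        "methylation_analysis_mode": "Wilcox-test",
    },
    flags=["skip_createbedgraph", "skip_meyer", "skip_matk"],
)

# Print the command for inspection instead of running it.
print(shlex.join(cmd))
```

Printing the command first makes it easy to review before passing the same list to a process launcher such as subprocess.run.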

See usage docs for more details and all of the available options when running the pipeline.

Documentation

The MeRIPseqPipe documentation is split into the following files:

  1. Usage
    • Parameter Documentation
    • An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
    • Let us know if you need more customization!
  2. Output
    • An overview of the different results produced by the pipeline

Pipeline overview

This pipeline is built using Nextflow and integrates tools as follows:

  • Quality control and preprocessing of raw data
    • fastp: quality trimming and adapter clipping
    • FastQC: generate quality reports
    • RSeQC: assess mapping performance to give more insight into data quality
  • Read alignment
    • STAR: Spliced Transcripts Alignment to a Reference
    • HISAT2: memory efficient splice aware alignment to a reference
    • TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions
    • BWA: fast and accurate short read alignment with Burrows-Wheeler transform
  • Peak calling
    • MACS2: Model-based Analysis of ChIP-seq
    • MeTPeak: a novel, graphical model-based peak-calling method
    • MATK: a deep learning-based MeRIP-seq analysis tool at single-nucleotide-resolution
    • Meyer: a peak-calling tool based on Fisher's exact test
  • Peak merging
    • RobustRankAggreg: a rank aggregation algorithm
    • MSPC: using combined evidence from replicates to evaluate ChIP-seq peaks
    • BEDTools: uses the "mergeBed" and "intersectBed" functions
  • Peak annotation
    • Perl scripts: peak start/end position, gene start/end position, transcript ID, strand, gene type (coding or noncoding, lncRNA or mRNA, etc.), peak location, gene Ensembl ID, etc.
    • annotatePeaks.pl: whether a peak is in the TSS (transcription start site), TTS (transcription termination site), Exon (Coding), 5' UTR Exon, 3' UTR Exon, Intronic, or Intergenic and also shows the distance to TSS
  • Motif searching
    • HOMER: Hypergeometric Optimization of Motif EnRichment
  • m6A site prediction
    • MATK: predict m6A sites at single nucleotide resolution
  • Differential expression analysis
    • featureCounts: read counting relative to gene biotype
    • DESeq2: for differential expression analysis of RNA-Seq, SAGE-Seq, ChIP-Seq or HiC count data
    • edgeR: for differential expression analysis of RNA-Seq, SAGE-Seq, ChIP-Seq or HiC count data
  • Differential methylation analysis
    • QNB: a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data
    • MATK: using a Bayesian hierarchical model to eliminate the effect of basal expression and quantify the true m6A level by Markov Chain Monte Carlo sampling
    • Wilcox-test: results are generated by custom R scripts based on RPKM methods
    • DESeq2: use a generalized linear model to detect changes in IP coverage while controlling for differences in Input coverage
    • edgeR: use a generalized linear model to detect changes in IP coverage while controlling for differences in Input coverage
  • Report
    • MultiQC: summarize all results from quality control and alignment
    • R packages
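
The list above describes the Meyer peak caller as based on Fisher's exact test. As an illustration of that idea only, the sketch below computes a one-sided Fisher's exact test p-value for whether IP reads are enriched over input reads in a single genomic window. It is not the pipeline's implementation: the function names, the 2x2 table layout, and the read counts are assumptions made for this example.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_enrichment_pvalue(ip_in, ip_out, input_in, input_out):
    """One-sided Fisher's exact test for IP enrichment in a window.

    Assumed 2x2 table (illustrative layout):
                 in window   outside window
        IP       ip_in       ip_out
        input    input_in    input_out
    Returns P(X >= ip_in) under the hypergeometric null.
    """
    total = ip_in + ip_out + input_in + input_out
    in_window = ip_in + input_in
    ip_total = ip_in + ip_out
    log_denom = log_comb(total, ip_total)
    upper = min(in_window, ip_total)
    p = 0.0
    # Sum the probabilities of tables at least as extreme as the observed one.
    for k in range(ip_in, upper + 1):
        log_p = (
            log_comb(in_window, k)
            + log_comb(total - in_window, ip_total - k)
            - log_denom
        )
        p += exp(log_p)
    return min(p, 1.0)


# Made-up counts: 50 of 1000 IP reads versus 10 of 1000 input reads in the window.
print(fisher_enrichment_pvalue(50, 950, 10, 990))
```

Working with log-gamma values rather than raw binomial coefficients keeps the arithmetic stable when the read counts are large.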

Credits

MeRIPseqPipe was originally written by Xiaoqiong Bao and Kaiyu Zhu.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

BioTreasury

MeRIPseqPipe is available on BioTreasury (https://biotreasury.rjmart.cn/#/tool?id=61140). You are welcome to use it and leave comments!

Citation

Xiaoqiong Bao, Kaiyu Zhu, Xuefei Liu, Zhihang Chen, Ziwei Luo, Qi Zhao, Jian Ren, Zhixiang Zuo. MeRIPseqPipe: an integrated analysis pipeline for MeRIP-seq data based on Nextflow. Bioinformatics, 2022, btac025. https://doi.org/10.1093/bioinformatics/btac025

Acknowledgements

Thanks to nf-core for the support and guidance!

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
