anavalente / dna-methylation-analysis

DNA methylation analysis pipeline for reduced representation bisulfite sequencing data

License: GNU General Public License v3.0

Nextflow 31.69% R 38.81% Python 25.24% Shell 4.27%
methylation methylation-analysis nextflow rrbs rrbs-data-analysis rrbs-pipeline

DNA-methylation-analysis

This Nextflow pipeline for reduced representation bisulfite sequencing (RRBS) data identifies genes associated with differentially methylated regions (DMRs) from CpG methylation patterns, using MethylDackel to extract per-base methylation metrics and Metilene to call DMRs.

The pipeline takes BAM files as input and produces the following text, bedGraph, image and PDF outputs (the number of files depends on the number of samples):

  • Per base methylation metrics (.bedGraph)
  • Differentially methylated regions (.bedGraph)
  • Correlation matrix and PCA (.png)
  • Heatmap of methylation signature differences between controls and samples (.pdf)
  • Genomic distribution, across the hg38 reference genome, of CpGs whose methylation frequencies differ between samples and controls (.png)
  • Genomic distribution, across the hg38 reference genome, of differentially methylated regions (.png)
  • RefSeq genes closest to the differentially methylated regions (RefSeq version from 2023-11-24) (.txt/.bedGraph)
  • Venn diagram of the closest genes (only if two or more samples are used as input) (.png)
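The per-base methylation bedGraph files are produced by MethylDackel. As a minimal sketch, assuming MethylDackel's default extract layout (a "track" header line followed by chromosome, start, end, methylation percentage, methylated read count and unmethylated read count), such a file could be loaded in Python as shown below. The names read_methyldackel_bedgraph, CpGSite and min_coverage are hypothetical and are not part of the pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CpGSite:
    chrom: str
    start: int
    end: int
    percent: float      # methylation percentage (0-100) reported for this CpG
    methylated: int     # reads supporting a methylated C
    unmethylated: int   # reads supporting an unmethylated C

    @property
    def coverage(self) -> int:
        return self.methylated + self.unmethylated


def read_methyldackel_bedgraph(lines: Iterable[str], min_coverage: int = 1) -> List[CpGSite]:
    """Parse bedGraph lines, assuming the 6-column MethylDackel layout."""
    sites = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the 'track ...' header line.
        if not line or line.startswith("track"):
            continue
        chrom, start, end, pct, meth, unmeth = line.split("\t")[:6]
        site = CpGSite(chrom, int(start), int(end), float(pct), int(meth), int(unmeth))
        # Keep only sites covered by at least min_coverage reads.
        if site.coverage >= min_coverage:
            sites.append(site)
    return sites


example = [
    'track type="bedGraph" description="Control CpG methylation levels"',
    "chr1\t10468\t10469\t100\t5\t0",
    "chr1\t10470\t10471\t50\t2\t2",
    "chr1\t10483\t10484\t0\t0\t1",
]
sites = read_methyldackel_bedgraph(example, min_coverage=2)
for s in sites:
    print(s.chrom, s.start, s.percent, s.coverage)
```

With min_coverage=2 the last record (one read) is dropped; the coverage filter is only an illustration of how the read counts can be used.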

Install conda environment

To use this pipeline, conda and Nextflow must be installed. Clone the repository, then create and activate the conda environment:

git clone https://github.com/AnaValente/DNA-methylation-analysis/
cd DNA-methylation-analysis
conda env create -f methylation_env.yml
conda activate methylation

Usage

Mandatory inputs:

  • --files          Path to the scripts and samples folder (e.g. "Scripts/*")
  • --samples        [String] sample names separated by commas (the control name must always come first)
  • --replicates     [Integer] number of replicates per sample
  • --genome         Path to the hg38 reference genome file (.fa.gz) (available in: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz)

Note: all samples and additional files must be placed in the scripts folder.
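Because the shell joins adjacent quoted strings, --samples 'Control','Sample1','Sample2' reaches the pipeline as the single argument Control,Sample1,Sample2. The Python sketch below shows one way such a value could be split into the control (first name, as required above) and the remaining samples. The function parse_samples and its validation rules (at least one sample, unique names, positive replicate count) are illustrative assumptions, not code taken from the pipeline.

```python
from typing import List, Tuple


def parse_samples(samples: str, replicates: int) -> Tuple[str, List[str]]:
    """Split a comma-separated --samples value into (control, samples)."""
    names = [name.strip() for name in samples.split(",") if name.strip()]
    # Assumed rule: a control plus at least one sample to compare against it.
    if len(names) < 2:
        raise ValueError("expected a control followed by at least one sample")
    if len(set(names)) != len(names):
        raise ValueError("sample names must be unique")
    if replicates < 1:
        raise ValueError("--replicates must be a positive integer")
    # The control is always the first name in the list.
    return names[0], names[1:]


control, treated = parse_samples("Control,Sample1,Sample2", replicates=2)
print(control)
print(treated)
```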

Optional inputs:

  • --concat           Concatenate BAM files from different runs
  • --cell_tpm         File with two columns, gene names and expression levels in transcripts per million (TPM), for a cell line or cell type identical or similar to the cells under study, used for gene name filtering (available in: https://www.ebi.ac.uk/gxa/experiments/E-MTAB-2770/Results)
  • --cutoff_regions   [Integer] cutoff (1 to 100) on the difference between sample and control methylation frequencies, used for genomic annotations (default: 75)
  • --cutoff_heatmap   [Integer] cutoff (1 to 100) on the difference between sample and control methylation frequencies, used for clustering analysis (default: 100)
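Both cutoffs are thresholds on the difference between a sample's methylation frequency and the control's at the same position. A minimal Python sketch follows, assuming the difference is an absolute value in percentage points and that a position passes when the difference is at least the cutoff; the pipeline's exact comparison may differ. The names passes_cutoff and filter_by_cutoff are hypothetical.

```python
from typing import Dict, Tuple

Position = Tuple[str, int]  # (chromosome, start)


def passes_cutoff(control_pct: float, sample_pct: float, cutoff: int) -> bool:
    """True if |sample - control| (in percentage points) reaches the cutoff."""
    if not 1 <= cutoff <= 100:
        raise ValueError("cutoff must be between 1 and 100")
    return abs(sample_pct - control_pct) >= cutoff


def filter_by_cutoff(control: Dict[Position, float],
                     sample: Dict[Position, float],
                     cutoff: int = 75) -> Dict[Position, float]:
    """Return signed differences (sample - control) for shared positions that pass."""
    shared = control.keys() & sample.keys()
    return {pos: sample[pos] - control[pos]
            for pos in sorted(shared)
            if passes_cutoff(control[pos], sample[pos], cutoff)}


control = {("chr1", 100): 10.0, ("chr1", 200): 90.0, ("chr1", 300): 50.0}
sample = {("chr1", 100): 95.0, ("chr1", 200): 5.0, ("chr1", 300): 60.0}
print(filter_by_cutoff(control, sample, cutoff=75))
# {('chr1', 100): 85.0, ('chr1', 200): -85.0}
```

Under this assumption, the heatmap default of 100 keeps only positions that switch between fully unmethylated and fully methylated, so lowering --cutoff_heatmap admits more positions into the clustering.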

Examples

Basic example

nextflow run Methylation_pipeline.nf --files "Scripts/*" --samples 'Control','Sample1','Sample2' --replicates 2 --genome Scripts/hg38.fa.gz

Example with BAM concatenation

nextflow run Methylation_pipeline.nf --files "Scripts/*" --samples 'Control','Sample1','Sample2' --replicates 2 --genome Scripts/hg38.fa.gz --concat

Example with genes filtered by file

nextflow run Methylation_pipeline.nf --files "Scripts/*" --samples 'Control','Sample1','Sample2' --replicates 2 --genome Scripts/hg38.fa.gz --cell_tpm E-MTAB-2770-query-results.tsv 
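The --cell_tpm file is used for gene name filtering. As a minimal sketch, assuming a tab-separated file with gene names in the first column and TPM values in the second, and assuming genes are kept when their TPM reaches a threshold, the filtering could look like this in Python. The functions read_tpm_table and filter_genes and the min_tpm threshold are hypothetical; the README does not state which threshold, if any, the pipeline applies.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def read_tpm_table(handle: TextIO) -> Dict[str, float]:
    """Read a (gene name, TPM) tab-separated table into a dict."""
    tpm = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip short rows and comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        try:
            tpm[row[0].strip()] = float(row[1])
        except ValueError:
            # Non-numeric second field, e.g. a header row or an empty value.
            continue
    return tpm


def filter_genes(genes: Iterable[str], tpm: Dict[str, float], min_tpm: float = 1.0) -> List[str]:
    """Keep genes listed in the table with TPM >= min_tpm, preserving order."""
    return [g for g in genes if g in tpm and tpm[g] >= min_tpm]


table = io.StringIO("Gene Name\tTPM\nGAPDH\t1500\nMYOD1\t0\nTP53\t42.5\n")
tpm = read_tpm_table(table)
closest = ["TP53", "MYOD1", "BRCA1", "GAPDH"]
print(filter_genes(closest, tpm))  # ['TP53', 'GAPDH']
```

Genes absent from the expression table (BRCA1 here) are dropped, on the assumption that an unlisted gene is not known to be expressed in the reference cells.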

Example with different cutoffs

nextflow run Methylation_pipeline.nf --files "Scripts/*" --samples 'Control','Sample1','Sample2' --replicates 2 --genome Scripts/hg38.fa.gz --cutoff_regions 50 --cutoff_heatmap 75