SkewX

Introduction

SkewX is a Nextflow pipeline to measure skewed X inactivation from long-read sequencing of native DNA, with either PacBio or Nanopore technologies. It starts from BAM files that include modified basecalls for 5mCG. It first calls heterozygous variants with DeepVariant and phases them into haplotypes with WhatsHap. It then clusters reads based on their methylation profile over CpG islands, and pools this haplotype and epiallele information to measure the skew in X inactivation.
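To make the idea of pooling haplotype and epiallele information concrete, here is a minimal sketch. It is not the calculation SkewX performs. It assumes a hypothetical three-column TSV (read ID, haplotype 1 or 2, methylation cluster) and assumes that reads in the "unmethylated" cluster come from the active X. The function name `active_h1_fraction` and the cluster labels are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: NOT the calculation used by SkewX.
# Input (hypothetical TSV on stdin): read_id <TAB> haplotype (1 or 2) <TAB> cluster
# "unmethylated" reads are assumed here to come from the active X.
active_h1_fraction() {
    awk -F'\t' '
        # Count unmethylated (assumed active) reads per haplotype.
        $3 == "unmethylated" && ($2 == 1 || $2 == 2) { n[$2]++ }
        END {
            total = n[1] + n[2]
            if (total == 0) { print "NA"; exit }
            # Fraction of active-X reads carried by haplotype 1.
            printf "%.2f\n", n[1] / total
        }'
}
```

Under these assumptions, a value near 0.5 would suggest balanced inactivation, while values near 0 or 1 would suggest that one haplotype is preferentially active.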

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

Pipeline summary

The required input is modBAM files (BAM files carrying 5mCG modification information). Then:

  1. If the reads are not already aligned, align them to the reference genome with 'Minimap2'
  2. If multiple samples per individual are present, for instance multiple tissues, merge them into a single BAM file
  3. Call variants with 'DeepVariant'
  4. Phase SNPs with 'WhatsHap'
  5. Haplotag reads with 'WhatsHap'
  6. Cluster reads based on methylation profile with 'NanoMethViz'
  7. Measure skew in X inactivation and generate a report for each individual.
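Every step above depends on the input carrying modified basecalls, so it can be worth confirming that a BAM file has the relevant tags before launching. The sketch below is not part of SkewX. It assumes the modifications are stored in the standard `MM` tag (or the legacy `Mm` spelling) with a `C+m` entry for 5mC, and the helper name `has_mod_tags` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SkewX.
# Reads SAM records on stdin and succeeds if at least one record has an
# MM (or legacy Mm) tag that mentions 5mC on cytosine ("C+m").
has_mod_tags() {
    awk -F'\t' '
        {
            # Optional SAM fields start at column 12.
            for (i = 12; i <= NF; i++)
                if ($i ~ /^(MM|Mm):Z:/ && index($i, "C+m")) found = 1
        }
        END { exit !found }'
}

# Usage (assumes samtools is on PATH; sample.bam is a placeholder name):
#   samtools view sample.bam | head -n 1000 | has_mod_tags \
#     && echo "5mC modification tags found"
```

Checking only the first thousand records keeps the test quick. A file whose tags list the modification codes in another order would need a looser pattern.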

Quick Start

  1. Install or module load Nextflow (>=21.10.3)

  2. Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility (you can use Conda both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see docs).

  3. IMPORTANT - ensure Singularity mounts your home directory (add "export NXF_SINGULARITY_HOME_MOUNT=true" to your .bashrc, or set it in your session environment before launching the pipeline - by default Singularity will not be able to find your home directory)

  4. Ensure required files (.bed files, .fa reference) are properly specified as parameters in the config (nextflow.config)

  5. Start running your own analysis!

    nextflow main.nf --input samplesheet.csv --outdir skew_results --fasta chm13v2.0.fa --cgi CGIs_CHM13v2_chrX.bed -profile singularity
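Steps 3 and 4 can be wrapped in a small pre-flight script. This is a sketch under stated assumptions, not something shipped with SkewX: the helper names `ensure_line` and `check_inputs` are hypothetical, and the file names are simply those from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; the names are illustrative, not part of SkewX.

# Step 3: let Singularity see the home directory for this session.
export NXF_SINGULARITY_HOME_MOUNT=true

# Append a line to a file only if that exact line is not already present.
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Step 4: succeed only if every listed file exists and is non-empty.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# Usage:
#   ensure_line 'export NXF_SINGULARITY_HOME_MOUNT=true' "$HOME/.bashrc"
#   check_inputs samplesheet.csv chm13v2.0.fa CGIs_CHM13v2_chrX.bed \
#     && nextflow main.nf --input samplesheet.csv --outdir skew_results \
#          --fasta chm13v2.0.fa --cgi CGIs_CHM13v2_chrX.bed -profile singularity
```

Keeping the `.bashrc` edit idempotent means the script can be re-run safely, and the file check fails fast before any containers are pulled.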

Documentation

Example data

An example dataset is available in the test_data directory of this repository. It covers a small region of the mouse X chromosome and includes a BAM file with methylation information. The pipeline can be run on this dataset with the following command:

nextflow main.nf --input test_data/samplesheet.csv --outdir skew_test_results --fasta test_data/mm10_chrX.fa --cgi test_data/mm10_chrX_CGI.bed -profile test

Credits

SkewX was originally written by Quentin Gouil, James Lancaster and Ed Yang.

We thank the following people for their extensive assistance in the development of this pipeline:

  • Kathleen Zeglinski for her superior Nextflow expertise
  • Shian Su for implementing new features in NanoMethViz

Citations

If you use SkewX for your analysis, please cite it using the following DOI: 10.1101/2024.03.20.585856

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
