This project forked from ebivariation/variant-remapping

The pipeline for remapping VCF variants between two arbitrary FASTA assemblies.

License: Apache License 2.0

Shell 31.41% Python 21.77% Nextflow 46.82%

variant-remapping

Pipeline for remapping VCF variants between two arbitrary assemblies in FASTA format. No chain file is required.

Method: creates reads from the flanking sequences of each variant, then maps them to the new assembly using bowtie2.
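As a rough illustration of the read-creation step, the sketch below builds one read from the bases surrounding a variant. It is not the pipeline's own code: the function name `make_flanking_read` and the choice to place the REF allele between the two flanks are assumptions made for this example.

```python
def make_flanking_read(sequence: str, pos: int, ref: str, flank: int = 50) -> str:
    """Return left flank + REF allele + right flank around a variant.

    `pos` is the 1-based VCF position of the first REF base.
    Hypothetical helper, written for illustration only.
    """
    start = pos - 1            # 0-based index of the first REF base
    end = start + len(ref)     # 0-based index just past the REF allele
    if sequence[start:end].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    left = sequence[max(0, start - flank):start]   # truncated at the contig start
    right = sequence[end:end + flank]              # truncated at the contig end
    return left + sequence[start:end] + right


if __name__ == "__main__":
    contig = "A" * 50 + "C" + "G" * 50 + "T" * 10
    read = make_flanking_read(contig, pos=51, ref="C", flank=50)
    print(len(read))
```

With a SNP and 50-base flanks on each side, the read is 50 + 1 + 50 bases long.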

Currently, it supports only SNPs and short indels; it has not been tested with larger or more complex variants.

Prerequisites

To run this pipeline you will need to install and configure Nextflow. The pipeline also depends on other software that must be downloaded and installed locally. You can install these dependencies manually or use Miniconda.

Installation using conda

git clone https://github.com/EBIvariation/variant-remapping.git
cd variant-remapping
conda env create -f conda.yml
conda activate variant-remapping
pip install -r requirements.txt

Installation without conda

Download and manually install the following programs, and make sure the executables are in your PATH

Then run

git clone https://github.com/EBIvariation/variant-remapping.git
cd variant-remapping
pip install -r requirements.txt

Testing the installation

Run the test script to check that all the required dependencies are installed properly:

tests/test_pipeline.sh

Executing the pipeline

nextflow run main.nf \
    --oldgenome <genome.fa> \
    --newgenome <new_genome.fa> \
    --vcffile <source.vcf> \
    --outfile <remap.vcf> \
    [--flankingseq 50] \
    [--scorecutoff 0.6] \
    [--diffcutoff 0.04]
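If the pipeline is launched from a script, the same command can be assembled as an argument list. The following is a minimal sketch: `build_remap_command` is a hypothetical helper, not part of the repository, and it uses only the flags shown above. The optional flags are appended only when a value is given.

```python
from typing import List, Optional


def build_remap_command(
    old_genome: str,
    new_genome: str,
    vcf_file: str,
    out_file: str,
    flanking_seq: Optional[int] = None,
    score_cutoff: Optional[float] = None,
    diff_cutoff: Optional[float] = None,
) -> List[str]:
    """Build the argument list for `nextflow run main.nf` (illustrative helper)."""
    cmd = [
        "nextflow", "run", "main.nf",
        "--oldgenome", old_genome,
        "--newgenome", new_genome,
        "--vcffile", vcf_file,
        "--outfile", out_file,
    ]
    optional = [
        ("--flankingseq", flanking_seq),
        ("--scorecutoff", score_cutoff),
        ("--diffcutoff", diff_cutoff),
    ]
    for flag, value in optional:
        if value is not None:      # leave the pipeline's own defaults untouched
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_remap_command("genome.fa", "new_genome.fa",
                                       "source.vcf", "remap.vcf",
                                       flanking_seq=50)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains main.nf.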

Input

  • --oldgenome: Old genome assembly file (FASTA format): the genome you have variants for.
  • --newgenome: New genome assembly file (FASTA format): the genome you want to remap the variants to.
  • --vcffile: Variants file (VCF format): contains the list of variants you want to remap.
  • --flankingseq: Length, in bases, of the flanking sequence taken on each side of a variant to generate the reads.
  • --scorecutoff: Fraction of the flanking sequence length used as the alignment score (AS) cut-off threshold.
  • --diffcutoff: Fraction of the flanking sequence length used as the AS-XS difference cut-off threshold.

Output

--outfile specifies the output VCF file, which contains:

  • remapped coordinates (position on the new assembly)
  • rsIDs
  • the correct chromosome/contig names
  • the new REF alleles (reverse strand mapping taken into account)
  • the ALT, QUAL, FILTER and INFO columns of the input VCF
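To illustrate what taking reverse-strand mapping into account can mean for the REF allele: when a read aligns to the reverse strand of the new assembly, the allele on the new forward strand is the reverse complement of the original one. The sketch below shows only that transformation. The helper names are hypothetical, and how the pipeline itself performs this step is not described here.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def ref_on_new_assembly(ref: str, reverse_strand: bool) -> str:
    """Return the REF allele as read on the forward strand of the new assembly.

    Illustrative helper: `reverse_strand` would come from the alignment of
    the variant's read (an assumption of this sketch).
    """
    return reverse_complement(ref) if reverse_strand else ref


if __name__ == "__main__":
    print(ref_on_new_assembly("AAC", reverse_strand=True))
```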

Example:

"I want to remap the variants in droso_variants_renamed.vcf from droso_dm3.fasta to droso_dm6.fasta (its accession is GCA_000001215.4), with flanking sequences of 50 bases, which will create 101-base reads (50 + 1 + 50). The alignment score cut-off will be -(50 x 0.6) = -30, meaning that reads with an alignment score lower than -30 will not be kept. The AS-XS difference cut-off will be 0.04, meaning that reads whose AS-XS difference is less than 50 x 0.04 = 2 will not be kept. The remapped variants will be in test.vcf."
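The arithmetic in this example can be checked with a short sketch. The function names below are hypothetical, and treating a read without an XS score (no secondary alignment reported) as passing the difference filter is an assumption of this example, not something the text above states.

```python
from typing import Optional, Tuple


def thresholds(flank: int, score_cutoff: float, diff_cutoff: float) -> Tuple[int, float, float]:
    """Return (read length, minimum AS, minimum AS-XS difference)."""
    read_length = 2 * flank + 1          # left flank + variant base + right flank
    min_as = -(flank * score_cutoff)     # alignment scores below this are dropped
    min_diff = flank * diff_cutoff       # AS-XS differences below this are dropped
    return read_length, min_as, min_diff


def keep_read(as_score: float, xs_score: Optional[float],
              min_as: float, min_diff: float) -> bool:
    """Apply both cut-offs to one aligned read."""
    if as_score < min_as:
        return False
    if xs_score is None:                 # assumption: no second-best alignment, keep
        return True
    return (as_score - xs_score) >= min_diff


if __name__ == "__main__":
    read_length, min_as, min_diff = thresholds(50, 0.6, 0.04)
    print(read_length, min_as, min_diff)
    print(keep_read(-10, -20, min_as, min_diff))
```

With the values from the example (flank 50, score cut-off 0.6, difference cut-off 0.04), this gives a read length of 101, a minimum alignment score of -30, and a minimum AS-XS difference of 2.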
