itsroops / variant-remapping

This project forked from ebivariation/variant-remapping

The pipeline for remapping VCF variants between two arbitrary FASTA assemblies.

License: Apache License 2.0

Languages: Shell 7.79%, Python 48.60%, Nextflow 43.61%

variant-remapping

Pipeline for remapping VCF variants between two arbitrary assemblies in FASTA format. No chain file is required. However, the pipeline assumes that the source and destination genomes are closely related: it was designed specifically to lift over variants from one version of a genome assembly to another.

Method: the pipeline creates reads from the sequences flanking each variant in the old assembly, then maps them to the new assembly using minimap2.
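
The read-construction step can be sketched as follows. This is a minimal illustration, not the pipeline's own code: it assumes the old assembly is already loaded into a dict mapping chromosome names to sequences, and the function name `flanking_read` and the default flank length of 50 bp are hypothetical.

```python
def flanking_read(genome: dict, chrom: str, pos: int, ref: str, flank: int = 50) -> str:
    """Return left flank + REF allele + right flank taken from the old assembly.

    genome: hypothetical in-memory FASTA, chromosome name -> sequence.
    pos:    1-based VCF POS of the first REF base.
    flank:  hypothetical flank length; the pipeline's own value may differ.
    """
    seq = genome[chrom]
    start = pos - 1                  # VCF is 1-based, Python slices are 0-based
    end = start + len(ref)
    if seq[start:end].upper() != ref.upper():
        raise ValueError(f"REF {ref} does not match {chrom}:{pos} in the old assembly")
    left = seq[max(0, start - flank):start]   # shorter near the chromosome start
    right = seq[end:end + flank]              # slicing past the end just truncates
    return left + seq[start:end] + right


if __name__ == "__main__":
    genome = {"chr1": "ACGTACGTAC"}
    print(flanking_read(genome, "chr1", 5, "A", flank=2))  # GTACG
```

Because the left flank length is known, the variant sits at offset `len(left)` inside the read; in a scheme like this, that offset is what allows an aligned read's position to be translated back into a coordinate on the new assembly.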

Currently, it only supports SNPs and short indels; it has not been tested with larger or more complex variants.
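
If you want to pre-filter a VCF to the variant classes described above, a sketch like the following could be used. It is an illustration under stated assumptions rather than part of the pipeline: the 50 bp threshold for a "short" indel is hypothetical (the README gives no number), and treating a simple indel as two alleles that share a single leading padding base follows the usual VCF convention.

```python
MAX_INDEL_LEN = 50  # hypothetical cut-off for a "short" indel; not taken from the pipeline
DNA = set("ACGTN")


def is_supported(ref: str, alt: str, max_indel_len: int = MAX_INDEL_LEN) -> bool:
    """True for SNPs and simple short indels, False for anything else."""
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt or not set(ref) <= DNA or not set(alt) <= DNA:
        return False  # symbolic (<DEL>), breakend, '*' and missing alleles
    if len(ref) == 1 and len(alt) == 1:
        return True   # SNP
    # Simple indel: one allele is the single padding base shared by both alleles
    if (len(ref) == 1 or len(alt) == 1) and ref[0] == alt[0]:
        return abs(len(ref) - len(alt)) <= max_indel_len
    return False      # MNPs and complex substitutions


def filter_supported(lines):
    """Yield VCF header lines, plus records whose every ALT allele is supported."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if all(is_supported(ref, alt) for alt in alts):
            yield line
```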

Prerequisites

To run this pipeline you will need to install and configure Nextflow version 20.7 or later. The pipeline also depends on other software that must be installed locally, either manually or with Miniconda.

Installation using conda

git clone https://github.com/EBIvariation/variant-remapping.git
conda env create -f variant-remapping/conda.yml
conda activate variant-remapping
pip install -r variant-remapping/requirements.txt

Installation without conda

Download and manually install the following programs, and make sure their executables are in your PATH:

Then run

git clone https://github.com/EBIvariation/variant-remapping.git
pip install -r variant-remapping/requirements.txt
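
A quick way to confirm that the executables are visible is a check with Python's `shutil.which`. Only `nextflow` and `minimap2` are named in this README; the rest of the dependency list is not reproduced here, so extend `REQUIRED` (a hypothetical name) with the programs listed in the repository.

```python
import shutil

# Only these two tools are named in this README; add the others the repository lists.
REQUIRED = ["nextflow", "minimap2"]


def missing_executables(names=REQUIRED):
    """Return the names that shutil.which cannot resolve on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    print("Missing from PATH:", ", ".join(missing) if missing else "none")
```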

Testing the installation

Run the test script to check that all the required dependencies are installed correctly:

tests/test_pipeline.sh

Executing the pipeline

nextflow run main.nf \
    --oldgenome <genome.fa> \
    --newgenome <new_genome.fa> \
    --vcffile <source.vcf> \
    --outfile <remap.vcf>
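
If you drive the pipeline from Python, passing the arguments as a list avoids shell quoting and line-continuation mistakes. This is a sketch, not a wrapper shipped with the pipeline: `build_command` and `run_remapping` are hypothetical names, and it assumes `nextflow` is on your PATH and that you run it from the repository directory containing `main.nf`.

```python
import subprocess


def build_command(oldgenome, newgenome, vcffile, outfile, main_nf="main.nf"):
    """Return the nextflow invocation from the README as an argument list."""
    return [
        "nextflow", "run", main_nf,
        "--oldgenome", oldgenome,
        "--newgenome", newgenome,
        "--vcffile", vcffile,
        "--outfile", outfile,
    ]


def run_remapping(oldgenome, newgenome, vcffile, outfile, main_nf="main.nf"):
    # check=True raises subprocess.CalledProcessError if nextflow exits non-zero
    subprocess.run(build_command(oldgenome, newgenome, vcffile, outfile, main_nf), check=True)


if __name__ == "__main__":
    print(" ".join(build_command("genome.fa", "new_genome.fa", "source.vcf", "remap.vcf")))
```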

Input

  • --oldgenome: Old genome assembly file (FASTA format): the genome you have variants for.
  • --newgenome: New genome assembly file (FASTA format): the genome you want to remap the variants to.
  • --vcffile: Variants file (VCF format): contains the list of variants you want to remap.

Output

--outfile specifies the output VCF file, which contains:

  • remapped coordinates (chromosome and position on the new assembly)
  • the new REF alleles from the new assembly
  • the ALT allele, possibly modified if the strand or the REF allele has changed
  • the ID, QUAL, FILTER and INFO columns of the input VCF
  • additional fields in the INFO column
  • FORMAT and Sample columns if they were present in the input
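
When a flanking read aligns to the reverse strand of the new assembly, its alleles have to be reverse-complemented to be expressed on the forward strand, which is one reason the ALT allele can change. The sketch below shows that step for SNPs only; it is an illustration under that assumption, not the pipeline's code, and `remap_snp_alleles` is a hypothetical name. Indels need extra handling because their padding base must be re-anchored after reverse-complementing, which this sketch does not attempt.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def remap_snp_alleles(ref: str, alt: str, reverse_strand: bool):
    """Return (ref, alt) on the new assembly's forward strand (SNPs only)."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles SNPs")
    if reverse_strand:
        return reverse_complement(ref), reverse_complement(alt)
    return ref, alt
```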

Other files are created alongside the main output:

  • <output>_nra_variants.vcf: variants that were successfully remapped but landed at a position where the reference allele has changed. Each record contains the original variant, with the original reference allele listed as an alternate allele.
  • <output>_unmapped.vcf: original variants that could not be remapped.
  • <output>_count.yml: YAML file containing the counts for each round of remapping.
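
For a quick overview of a run, you can count the records in the companion VCF files. This is a minimal sketch, assuming `prefix` is the `<output>` value in the file names above; how the pipeline derives it from `--outfile` is not stated here, so check the files it actually writes. `count_records` and `summarise` are hypothetical helpers.

```python
from pathlib import Path

SUFFIXES = ["_nra_variants.vcf", "_unmapped.vcf"]


def count_records(lines):
    """Count non-empty VCF data lines, skipping '##' meta lines and the '#CHROM' header."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def summarise(prefix):
    """Map each companion file name to its number of records."""
    counts = {}
    for suffix in SUFFIXES:
        path = Path(f"{prefix}{suffix}")
        with path.open() as handle:
            counts[path.name] = count_records(handle)
    return counts
```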

Contributors

andresfsilva, apriltuesday, mistyskye, nitin-ebi, tcezard, tskir
