tsnorri / panvc3

Variant calling using a pan-genomic reference, version 3

License: MIT License

PanVC 3

PanVC 3 is a set of tools to be used as part of a variant calling workflow that uses short reads as its input. The reads are aligned to an index generated from a multiple sequence alignment. A suitable index may be built from founder sequences.

A variant calling workflow that uses PanVC 3 may consist of, for example, the following phases:

  • Generating founder sequences from known variants
  • Indexing the founder sequences
  • Running the read alignment and variant calling workflow

The founder sequences may be generated with vcf2multialign.

Academic Use

If you use the software in an academic setting, we kindly ask you to cite "Tackling reference bias in genotyping by using founder sequences with PanVC 3".

@article{Norri2024TacklingReferenceBias,
  author = {Norri, Tuukka and Mäkinen, Veli},
  title = {Tackling reference bias in genotyping by using founder sequences with PanVC 3},
  journal = {Bioinformatics Advances},
  volume = {4},
  number = {1},
  pages = {vbae027},
  year = {2024},
  month = {03},
  issn = {2635-0041},
  doi = {10.1093/bioadv/vbae027},
  url = {https://doi.org/10.1093/bioadv/vbae027},
  eprint = {https://academic.oup.com/bioinformaticsadvances/article-pdf/4/1/vbae027/56912765/vbae027.pdf},
}

Running

A simple example workflow and test data are provided in the test-workflow subdirectory. The workflow downloads PanVC 3 as well as other required software automatically from Anaconda. Please see README.md in the subdirectory.

A more complex workflow that uses Bowtie 2 and loads the settings using Snakemake’s configuration (e.g. a YAML file) is in the bowtie2-workflow subdirectory. Please see README.md in the subdirectory.

Contents

  • index_msa builds a co-ordinate transformation data structure from a multiple sequence alignment, as well as the sequences as unaligned FASTA to be used as input for a read aligner.
  • project_alignments uses the co-ordinate transformation data structure to project alignments in BAM or SAM format to well-known co-ordinates, rewrites the CIGAR strings to match the new reference sequence, and realigns parts of the reads if needed.
  • recalculate_mapq recalculates the mapping qualities of the alignments given as input, taking into account the projected co-ordinate of each alignment.
  • subset_alignments subsets the given alignments by some criteria, e.g. selecting the (paired) alignment with the best mapping quality for each read.
  • count_supporting_reads counts the number of aligned reads that support some known variants. From the output, reference bias can be calculated with calculate_reference_bias.py.
  • rewrite_cigar replaces sequence match operations in CIGAR strings (= and X) with alignment match operations (M) and vice-versa.
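To make the last item concrete, here is a minimal Python sketch of the direction from sequence match operations (= and X) to alignment match operations (M). It is not the implementation used by rewrite_cigar; the function names are hypothetical and only the CIGAR string is handled. The opposite direction (M to = and X) also needs the read and reference sequences and is not shown.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# A whole CIGAR string: one or more operations and nothing else.
CIGAR_STRING = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if not CIGAR_STRING.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def sequence_to_alignment_match(cigar):
    """Replace = and X with M and merge adjacent operations of the same type."""
    if cigar == "*":
        # "*" means that no CIGAR is available; leave it untouched.
        return cigar
    merged = []
    for length, op in parse_cigar(cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            # Extend the previous run instead of emitting e.g. 5M1M4M.
            merged[-1] = (merged[-1][0] + length, op)
        else:
            merged.append((length, op))
    return "".join(f"{n}{op}" for n, op in merged)


if __name__ == "__main__":
    print(sequence_to_alignment_match("5=1X4=2I3="))  # prints 10M2I3M
```

Merging the runs keeps the output compact: five matches, one mismatch and four matches become a single 10M.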

Please use the --help option with each of the tools for usage. See also the workflow written for the test data.
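The idea of a co-ordinate transformation between rows of a multiple sequence alignment, on which index_msa and project_alignments rely, can be illustrated with a short Python sketch. This is an assumption-laden toy version: it stores one integer per character, the function name build_projection is hypothetical, and it says nothing about the data structure that index_msa actually builds.

```python
def build_projection(aligned_src, aligned_dst, gap="-"):
    """Map unaligned positions of one alignment row to those of another.

    Both arguments are rows of the same multiple sequence alignment, so they
    must have equal length. Element i of the result is the unaligned position
    in the destination row that corresponds to unaligned position i in the
    source row. A source character that faces a gap in the destination row is
    mapped to the position of the next destination character.
    """
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("rows of one alignment must have equal length")
    projection = []
    dst_pos = 0  # Number of non-gap destination characters seen so far.
    for src_char, dst_char in zip(aligned_src, aligned_dst):
        if src_char != gap:
            projection.append(dst_pos)
        if dst_char != gap:
            dst_pos += 1
    return projection


if __name__ == "__main__":
    reference_row = "ACG-TA"  # Row with the well-known co-ordinates.
    founder_row = "A-GCTA"    # Row that the reads were aligned to.
    print(build_projection(founder_row, reference_row))  # prints [0, 2, 3, 3, 4]
```

In the example the founder sequence lacks the reference's C and has an extra C of its own, so founder positions 2 and 3 both project to reference position 3.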

Installing

Binaries for Linux on x86-64 are available on Anaconda. PanVC 3 may be installed with conda install -c tsnorri -c conda-forge panvc3=v1.0. glibc 2.28 or newer is required. (ldd --version may be used to check the version installed with your operating system.)

Building

To clone the repository with submodules, please use git clone --recursive https://github.com/tsnorri/panvc3.git.

A conda package can be built with conda-build as follows. The build script has been tested with conda-build 3.25.0. glibc 2.28 or newer is required.

  1. cd conda
  2. ./conda-build.sh

Conda-build will then report the location of the package from which binaries may be extracted.

By Hand

The following software and libraries are required to build PanVC 3. The tested versions are also listed.

The following are needed to build libdispatch (provided as a Git submodule) on Linux:

After installing the prerequisites, please do the following:

  1. Create a file called local.mk in the root of the cloned repository to specify build variables. One of the files linux-static.local.mk and conda/local.mk.m4 may be used as a starting point.
  2. Run Make with e.g. make -j16.

panvc3's Issues

Handle CIGAR stored in CG tag

The BAM file format allows the CIGAR field of a record to contain at most 65535 operations, since the number of operations is stored in an unsigned 16-bit integer. When an alignment has more operations than that, the complete CIGAR is stored in the CG tag instead. Either our tools or SeqAn 3 should handle this.
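The convention in question can be sketched in Python as follows. The sketch follows the SAM/BAM specification as far as I understand it: an over-long CIGAR is replaced in the record by a placeholder of the form kSmN, where k is the read length and m the reference length of the alignment, and the real operations are stored in CG as 32-bit integers encoded as length << 4 | operation code. The function names are hypothetical, and this is neither PanVC 3 nor SeqAn 3 code.

```python
import re

MAX_BAM_CIGAR_OPS = 0xFFFF   # The operation count is an unsigned 16-bit field.
OPS = "MIDNSHP=X"            # Operation codes 0..8 in BAM order.
QUERY_OPS = "MIS=X"          # Operations that consume read bases.
REFERENCE_OPS = "MDN=X"      # Operations that consume reference bases.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def bam_cigar_fields(cigar):
    """Return (cigar_for_record, cg_values) for a textual CIGAR.

    cg_values is None when the CIGAR fits in the record itself.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if len(ops) <= MAX_BAM_CIGAR_OPS:
        return cigar, None
    read_len = sum(n for n, op in ops if op in QUERY_OPS)
    ref_len = sum(n for n, op in ops if op in REFERENCE_OPS)
    cg_values = [(n << 4) | OPS.index(op) for n, op in ops]
    return f"{read_len}S{ref_len}N", cg_values


def cigar_from_cg_values(cg_values):
    """Rebuild the textual CIGAR from the integers stored in the CG tag."""
    return "".join(f"{value >> 4}{OPS[value & 0xF]}" for value in cg_values)


if __name__ == "__main__":
    long_cigar = "1M1I" * 40000  # 80000 operations, more than 65535.
    placeholder, cg_values = bam_cigar_fields(long_cigar)
    print(placeholder)  # prints 80000S40000N
    print(cigar_from_cg_values(cg_values) == long_cigar)  # prints True
```

A tool that reads BAM would apply the second function whenever it finds a CG tag next to such a placeholder.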
