
serena-aneli / recombulator-x


🧬 Recombination and mutation rate inference from polymorphisms (STRs, SNPs and INDELs) along the X chromosome

Home Page: https://serena-aneli.github.io/recombulator-x/

License: MIT License

forensic-genetics forensics-tools recombination short-tandem-repeats x-chromosome pedigrees

recombulator-x's Introduction

RECOMBULATOR-X



recombulator-x is a Python module and command-line tool for computing recombination rates between short tandem repeat (STR) markers and other polymorphisms (SNPs and INDELs) along the X chromosome from pedigree data in forensic genetics.



📖 Documentation website

📄 Please cite the paper



recombulator-x is written in Python (3.7) and can be used either as a module or as a command-line tool. It is the first open source implementation of the estimation method introduced in Nothnagel et al., 2012, which is the gold standard for estimating recombination rates between X-chromosomal markers. We designed recombulator-x to solve some practical issues with the original R implementation. Its main advantages are:

  • performance: much faster than the original implementation, thanks to algorithmic improvements (dynamic programming)
  • open source: full source code and documentation available on GitHub
  • input parsing: reads pedigree data in the standard PED format
  • user friendly: easy installation (via pip) and usage through a simple command-line tool
  • comprehensive toolkit: handles short tandem repeats, SNPs and INDELs

We thank Prof. Michael Nothnagel for kindly sharing the original R implementation with us, which was an important reference for the development.

📖 Documentation

Full documentation is available online at the 📖 dedicated website, or in this repository under docs.

🔧 Installation

You can install recombulator-x via the pip command from the standard PyPI repository:

pip install recombulator-x

🎓 Overview

STRs located on the X chromosome are a valuable resource for solving complex kinship cases in forensic genetics thanks to their peculiar mode of inheritance. At the same time, using multiple markers linked on the same chromosome increases the evidential weight but also requires proper consideration of the recombination rates between markers in the biostatistical evaluation of kinship.

For more details on X-STR kinship analyses in forensics, see Gomes et al., 2020 and Tillmar et al., 2017.

For forensic X-STRs, recombination rates have either been inferred from population samples using high-density multi-point single nucleotide polymorphism (SNP) data or estimated directly in large pedigree-based studies.
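Map-based estimates are typically reported as genetic distances (centimorgans), whereas pedigree studies estimate recombination fractions directly. As an illustrative aside, not part of recombulator-x, the sketch below converts between the two using Haldane's map function; the choice of map function (which assumes no crossover interference) and the function names are assumptions made for this example.

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Recombination fraction implied by a genetic distance in Morgans.

    Haldane's map function assumes no crossover interference; the result
    lies in [0, 0.5).
    """
    if d_morgans < 0:
        raise ValueError("genetic distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_d(r: float) -> float:
    """Inverse map: genetic distance in Morgans for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    d_cm = 10.0  # hypothetical map distance between two markers, in centimorgans
    print(f"{d_cm} cM -> r = {haldane_r(d_cm / 100):.4f}")  # prints 10.0 cM -> r = 0.0906
```

Other map functions (for example Kosambi's, which allows for interference) give somewhat different fractions for the same distance, so the choice matters when comparing map-based and pedigree-based estimates.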

The main statistical approach for estimating recombination rates from pedigrees computes the likelihood of kinship by taking into account all possible recombinations within the maternal haplotype, which gives the underlying algorithm exponential complexity (see Nothnagel et al., 2012 for a thorough description of the likelihood computation). Despite a computational update in C++ allowing multi-core parallelization, this approach is expected to be unsuitable for panels of more than 15 X-STRs (Diegoli et al., 2016).

We developed recombulator-x to overcome this issue. Built upon the same statistical framework as the previous work (Nothnagel et al., 2012), recombulator-x uses a new computational strategy to infer recombination rates for X-STRs while also taking the probability of mutation into account.
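To see why dynamic programming helps, consider a deliberately simplified model: one son whose maternal X is a mosaic of his mother's two phase-known haplotypes, with recombination fraction r[i] between adjacent markers and a crude per-marker mutation probability mu. This is a hypothetical sketch, not recombulator-x's likelihood (which handles unknown phase, several relatives and a proper mutation model), and all names are invented for illustration. Summing over every recombination pattern takes time exponential in the number of markers n, while a forward (hidden Markov model style) recursion returns the same value in time linear in n:

```python
from itertools import product
from typing import Sequence


def _emission(observed, source_allele, mu: float) -> float:
    # Crude mutation model (assumption): the son's allele differs from the
    # transmitted maternal allele with probability mu.
    return 1.0 - mu if observed == source_allele else mu


def brute_force_likelihood(son: Sequence, hap0: Sequence, hap1: Sequence,
                           r: Sequence[float], mu: float) -> float:
    """Sum over all 2**n choices of maternal source haplotype per marker."""
    n = len(son)
    haps = (hap0, hap1)
    total = 0.0
    for path in product((0, 1), repeat=n):
        p = 0.5  # either maternal haplotype at the first marker
        for i in range(n):
            if i > 0:
                switched = path[i] != path[i - 1]
                p *= r[i - 1] if switched else 1.0 - r[i - 1]
            p *= _emission(son[i], haps[path[i]][i], mu)
        total += p
    return total


def forward_likelihood(son: Sequence, hap0: Sequence, hap1: Sequence,
                       r: Sequence[float], mu: float) -> float:
    """Same quantity via a forward recursion: O(n) instead of O(n * 2**n)."""
    haps = (hap0, hap1)
    # f[s]: probability of the son's alleles up to marker i with source s at marker i
    f = [0.5 * _emission(son[0], haps[s][0], mu) for s in (0, 1)]
    for i in range(1, len(son)):
        stay, switch = 1.0 - r[i - 1], r[i - 1]
        f = [(f[s] * stay + f[1 - s] * switch) * _emission(son[i], haps[s][i], mu)
             for s in (0, 1)]
    return f[0] + f[1]


if __name__ == "__main__":
    hap0, hap1 = [10, 12, 8, 15], [11, 13, 9, 14]  # hypothetical phase-known maternal haplotypes
    son = [10, 12, 9, 14]                          # one crossover between markers 2 and 3
    r, mu = [0.05, 0.10, 0.02], 0.002
    print(brute_force_likelihood(son, hap0, hap1, r, mu))
    print(forward_likelihood(son, hap0, hap1, r, mu))
```

Both calls print the same likelihood (up to floating-point rounding). The brute-force version quickly becomes impractical as n grows, while the forward recursion only keeps two partial sums per marker.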

💥 Additional features

  • Recombulator-X can also analyse SNPs and INDELs
  • Data consistency checks
  • Automatic family preprocessing and informative family extraction
  • Multiple likelihood implementations included
  • Accelerated likelihood computation with Numba, a JIT compiler for Python
  • Mutation rates can be estimated for each marker separately or as a single shared parameter
  • Simulation of pedigrees typed with STRs
  • Optional bootstrapping available
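The simulation and bootstrapping features can be illustrated with a toy version, written here as a hypothetical sketch rather than recombulator-x's own code. It simulates sons of a single phase-known mother who is heterozygous at every marker and ignores mutation; under those strong assumptions the maximum likelihood estimate of the recombination fraction between two adjacent markers reduces to the fraction of recombinant sons, and a percentile bootstrap over sons gives an approximate confidence interval. recombulator-x itself works with the full pedigree likelihood described in the overview.

```python
import random
from typing import List, Sequence, Tuple


def simulate_son(hap0: Sequence, hap1: Sequence, r: Sequence[float],
                 rng: random.Random) -> List:
    """Son's maternal X: random starting haplotype, switching with probability r[i]."""
    haps = (hap0, hap1)
    src = rng.randrange(2)
    son = [haps[src][0]]
    for i in range(1, len(hap0)):
        if rng.random() < r[i - 1]:
            src = 1 - src  # crossover between markers i-1 and i
        son.append(haps[src][i])
    return son


def is_recombinant(son: Sequence, hap0: Sequence, i: int) -> int:
    """1 if the son's alleles at markers i and i+1 come from different maternal haplotypes.

    Assumes a phase-known mother heterozygous at both markers and no mutation.
    """
    return int((son[i] == hap0[i]) != (son[i + 1] == hap0[i + 1]))


def bootstrap_ci(indicators: Sequence[int], n_boot: int, rng: random.Random,
                 alpha: float = 0.05) -> Tuple[float, float]:
    """Percentile bootstrap interval for the recombinant fraction (resampling sons)."""
    n = len(indicators)
    estimates = sorted(
        sum(indicators[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_boot)
    )
    return estimates[int(alpha / 2 * n_boot)], estimates[int((1 - alpha / 2) * n_boot) - 1]


if __name__ == "__main__":
    rng = random.Random(42)
    hap0, hap1 = [10, 12, 8], [11, 13, 9]  # hypothetical heterozygous mother
    sons = [simulate_son(hap0, hap1, [0.1, 0.3], rng) for _ in range(2000)]
    flags = [is_recombinant(s, hap0, 0) for s in sons]
    r_hat = sum(flags) / len(flags)  # ML estimate under these assumptions
    low, high = bootstrap_ci(flags, 500, rng)
    print(f"r_hat = {r_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With 2000 simulated sons the estimate should land close to the simulated value of 0.1; real pedigrees with unknown phase or homozygous mothers are less informative than this idealized case.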

🚀 Benchmark

recombulator-x far exceeds the computational speed of the previous approach and scales to many more markers.
Indeed, performance has been the main focus of recombulator-x: in a test with simulated data of the same size as the two previous works, the time needed to compute the likelihood of a single family drops from "several months" on 32 cores of an HPC node with the previous approach to 20 minutes on a single core with recombulator-x. This is due to an algorithmic improvement that reduces the time complexity from exponential to linear. Type II families are the exception: there the time complexity is still exponential, but the speed-up over the previous implementation remains substantial. Moreover, its ability to handle SNPs and INDELs makes it a comprehensive toolkit for addressing linked markers in forensics.

💻 Usage

recombulator-x takes as input PED files based on the PLINK pedigree file format. A PED file stores sample pedigree information (i.e., the familial relationships between samples) together with the genotypes. More information on the input file can be found in this repository under docs/3_usage.md.
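For orientation, the sketch below reads the standard PLINK PED layout: six pedigree columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per marker, with 0 marking an unknown parent or missing allele and -9 (or 0) a missing phenotype. Columns may be separated by any run of spaces or tabs. It is a hypothetical reader, not recombulator-x's parser; the optional header row and the way male X genotypes are written are assumptions, so check docs/3_usage.md for the exact format recombulator-x expects.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

MISSING = "0"  # PLINK code for an unknown parent or a missing allele


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    father_id: Optional[str]
    mother_id: Optional[str]
    sex: Optional[int]        # 1 = male, 2 = female, None = unknown
    phenotype: Optional[str]  # None when coded as -9 or 0 (PLINK missing codes)
    genotypes: List[Tuple[Optional[str], Optional[str]]]  # one allele pair per marker


def _or_none(value: str) -> Optional[str]:
    return None if value == MISSING else value


def parse_ped(lines: Iterable[str], has_header: bool = False) -> List[PedRecord]:
    rows = iter(lines)
    if has_header:
        next(rows, None)  # hypothetical header row with column/marker names
    records = []
    for lineno, line in enumerate(rows, start=2 if has_header else 1):
        fields = line.split()  # any run of spaces or tabs separates columns
        if not fields:
            continue  # skip blank lines
        if len(fields) < 6 or (len(fields) - 6) % 2:
            raise ValueError(f"line {lineno}: expected 6 pedigree columns plus allele pairs")
        fid, iid, pat, mat, sex, pheno = fields[:6]
        alleles = [_or_none(a) for a in fields[6:]]
        records.append(PedRecord(
            family_id=fid,
            individual_id=iid,
            father_id=_or_none(pat),
            mother_id=_or_none(mat),
            sex=int(sex) if sex in ("1", "2") else None,
            phenotype=None if pheno in ("-9", "0") else pheno,
            genotypes=list(zip(alleles[0::2], alleles[1::2])),
        ))
    return records


if __name__ == "__main__":
    # Hypothetical two-marker family; the male is written as homozygous here,
    # which is an assumption about how hemizygous X genotypes are coded.
    ped = "FAM1 MOTHER 0 0 2 -9 10 11 12 13\nFAM1 SON1 0 MOTHER 1 -9 10 10 13 13\n"
    for rec in parse_ped(ped.splitlines()):
        print(rec.individual_id, rec.mother_id, rec.genotypes)
```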

The program can be used either as a Python module or as a command-line tool. A detailed notebook for the Python module is also available.

recombulator-x's People

Contributors

gbirolo, serena-aneli

recombulator-x's Issues

issues with ped reading

Check the following comments in recombulatorx/io.py:

  • FIXME separator may be any space
  • FIXME na values should be 0 (or -9 for phenotype)
  • FIXME handle headerless peds
