
ga4gh / benchmarking-tools


Repository for the GA4GH Benchmarking Team work developing standardized benchmarking methods for germline small variant calls

License: Apache License 2.0

Shell 0.52% Python 0.96% HTML 96.98% JavaScript 1.15% CSS 0.39%
ga4gh variant-calls benchmarking genomics genome-sequencing standardization reference-materials

Germline Small Variant Benchmarking Tools and Standards

This repository hosts the work of the Global Alliance for Genomics and Health (GA4GH) Benchmarking Team, which is developing standardized performance metrics and tools for benchmarking germline small variant calls. The Team includes representatives from sequencing technology developers, government agencies, academic bioinformatics groups, clinical laboratories, and commercial technology and bioinformatics companies. We have worked on solutions to several challenges in benchmarking variant calls: (1) defining high-confidence variant calls and regions that can serve as a benchmark, (2) developing comparison tools that are robust to differing variant representations, (3) defining performance metrics such as false positives and false negatives under different matching stringencies, and (4) developing methods to stratify performance by variant type and genome context. We also provide links to our reference benchmarking engines and their implementations, and to benchmarking datasets.
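To make the metrics in point (3) concrete, here is a minimal sketch of how recall, precision, and F1 follow from false positive and false negative counts. It assumes that true positives are counted separately on the truth side and the query side, since one call in one representation can match several calls in the other; hap.py's summary output reports counts in this split form, but the `Counts` class and function names below are hypothetical and not part of any tool's API.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for benchmarking counts."""
    truth_tp: int  # truth variants matched by the query
    truth_fn: int  # truth variants with no matching query variant
    query_tp: int  # query variants matched to the truth set
    query_fp: int  # query variants with no matching truth variant


def recall(c: Counts) -> float:
    """Fraction of truth variants that were found (truth-side counts)."""
    denom = c.truth_tp + c.truth_fn
    return c.truth_tp / denom if denom else 0.0


def precision(c: Counts) -> float:
    """Fraction of query variants that are correct (query-side counts)."""
    denom = c.query_tp + c.query_fp
    return c.query_tp / denom if denom else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative numbers only, not results from any real benchmark.
counts = Counts(truth_tp=90, truth_fn=10, query_tp=92, query_fp=8)
print(round(recall(counts), 3), round(precision(counts), 3), round(f1(counts), 3))
```

Keeping the two TP counts separate avoids silently mixing truth-side and query-side denominators when representations differ.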

A manuscript from the GA4GH Benchmarking Team describing best practices for benchmarking germline small variant calls is available on bioRxiv. Please cite it in any work that uses these tools: https://doi.org/10.1101/270157

Note: This site is still a work in progress.

Standards and Definitions

See doc/standards/ for the current benchmarking standards and definitions.

Reference tool implementations

The primary reference implementation of the GA4GH Benchmarking methods is hap.py. It lets users choose vcfeval (recommended) or xcmp as the comparison engine, and it can use GA4GH stratification BED files to assess performance in different genome contexts. A web-based version of this tool is available as the GA4GH Benchmarking app by peter.krusche on precisionFDA.
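A hap.py run can be scripted from Python. The sketch below only assembles the argument list (it does not execute anything), assuming the commonly documented hap.py options `-f` (confident regions BED), `-r` (reference FASTA), `-o` (output prefix), `--engine`, and `--stratification` (a TSV of region names and BED paths). The helper name and all file names are hypothetical; check the options against `hap.py --help` for your installed version, since vcfeval may need extra engine settings.

```python
import shlex
from typing import List, Optional


def build_happy_command(
    truth_vcf: str,
    query_vcf: str,
    reference: str,
    confident_bed: str,
    output_prefix: str,
    engine: str = "vcfeval",
    stratification_tsv: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build a hap.py argument list without running it."""
    if engine not in ("vcfeval", "xcmp"):
        raise ValueError(f"unsupported engine: {engine}")
    cmd = [
        "hap.py", truth_vcf, query_vcf,
        "-f", confident_bed,   # high-confidence regions of the truth set
        "-r", reference,       # reference FASTA used for both VCFs
        "-o", output_prefix,   # prefix for summary and annotated outputs
        f"--engine={engine}",  # comparison engine
    ]
    if stratification_tsv is not None:
        # TSV listing stratification region names and their BED files
        cmd += ["--stratification", stratification_tsv]
    return cmd


cmd = build_happy_command(
    "truth.vcf.gz", "query.vcf.gz", "GRCh38.fa", "truth_confident.bed",
    "out/sample", stratification_tsv="strata.tsv",
)
print(shlex.join(cmd))
```

With hap.py on the `PATH`, the list could be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell-quoting problems with unusual file names.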

Other reference implementations that follow the standards outlined above are available in tools/. These are submodules that link to the original tool repositories.

Benchmarking Intermediate Files

The benchmarking process involves a variety of steps and inputs. In doc/ref-impl/, we standardise intermediate formats for specifying truth sets and stratification regions, and for the intermediate outputs of comparison tools.
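As an illustration of consuming an intermediate comparison output, the sketch below tallies per-record decisions from an annotated VCF. It assumes the layout produced by hap.py: two sample columns named `TRUTH` and `QUERY`, a `BD` FORMAT field holding the decision (for example `TP`, `FP`, `FN`) and a `BK` field holding the match type (for example `gm` for a genotype match). Treat these field and sample names as assumptions to verify against the specification in doc/ref-impl/; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair the colon-separated FORMAT keys with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def tally_decisions(vcf_lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count (BD, BK) pairs separately for the TRUTH and QUERY samples."""
    truth, query = Counter(), Counter()
    truth_idx = query_idx = None
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Locate the sample columns by name rather than by position.
            truth_idx, query_idx = cols.index("TRUTH"), cols.index("QUERY")
            continue
        if truth_idx is None:
            raise ValueError("data line found before the #CHROM header")
        fmt = cols[8]
        for idx, counter in ((truth_idx, truth), (query_idx, query)):
            values = parse_sample(fmt, cols[idx])
            bd = values.get("BD", ".")
            if bd != ".":  # "." means this sample has no call at the record
                counter[(bd, values.get("BK", "."))] += 1
    return truth, query


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTRUTH\tQUERY",
    "chr1\t100\t.\tA\tG\t.\t.\t.\tGT:BD:BK\t0/1:TP:gm\t0/1:TP:gm",
    "chr1\t200\t.\tC\tT\t.\t.\t.\tGT:BD:BK\t0/1:FN:.\t.:.:.",
    "chr1\t300\t.\tG\tA\t.\t.\t.\tGT:BD:BK\t.:.:.\t0/1:FP:.",
]
truth_counts, query_counts = tally_decisions(example)
print(truth_counts)
print(query_counts)
```

Keying the counts on both decision and match type keeps the matching stringency visible, so the same file can be summarized at genotype-level or allele-level stringency.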

Benchmarking resources

In resources/, we provide files that are useful in the benchmarking process. Currently, these include links to benchmark calls and datasets from Genome in a Bottle and Illumina Platinum Genomes, as well as standardized BED files describing potentially difficult regions for performance stratification.
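Stratifying performance with one of these BED files amounts to restricting the counts to variants inside the listed regions. The minimal sketch below assumes standard coordinate conventions (BED intervals are 0-based and half-open, VCF `POS` is 1-based) and, as a simplification, tests only each variant's `POS` rather than its full reference span, which real tools such as hap.py may handle differently. All function names are hypothetical.

```python
import bisect
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # BED-style, 0-based, half-open [start, end)


def load_bed(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Read BED lines into sorted, merged intervals per chromosome."""
    raw: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    merged: Dict[str, List[Interval]] = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out: List[Interval] = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(regions: Dict[str, List[Interval]], chrom: str, pos_1based: int) -> bool:
    """True if the 1-based VCF position lies inside a BED interval."""
    intervals = regions.get(chrom)
    if not intervals:
        return False
    p = pos_1based - 1  # convert VCF 1-based POS to a 0-based coordinate
    # Index of the last interval whose start is <= p.
    i = bisect.bisect_right(intervals, (p, math.inf)) - 1
    return i >= 0 and p < intervals[i][1]


def stratified_recall(decisions: Iterable[Tuple[str, int, str]],
                      regions: Dict[str, List[Interval]]) -> float:
    """Recall over truth decisions (chrom, pos, 'TP' or 'FN') inside regions."""
    tp = fn = 0
    for chrom, pos, bd in decisions:
        if not in_regions(regions, chrom, pos):
            continue
        if bd == "TP":
            tp += 1
        elif bd == "FN":
            fn += 1
    return tp / (tp + fn) if (tp + fn) else math.nan


# Made-up regions and decisions for illustration only.
regions = load_bed(["chr1\t0\t150", "chr1\t100\t250", "chr2\t500\t600"])
decisions = [("chr1", 100, "TP"), ("chr1", 200, "FN"),
             ("chr1", 300, "TP"), ("chr2", 550, "TP")]
print(regions)
print(stratified_recall(decisions, regions))
```

Merging overlapping intervals first keeps the lookup a single binary search per variant, which matters when stratification files contain millions of small regions.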
