Giter Site home page Giter Site logo

snpahoy's Introduction

install with conda CI Coverage Status

SNP Ahoy!

Just a little tool for checking ID SNPs. It works in both germline and somatic modes as described in the sections below. By default, only sites with at least 30X coverage are considered, and sites with major allele frequency greater than or equal to 95% are considered homyzygote.

$ snpahoy --help
Usage: snpahoy [OPTIONS] COMMAND [ARGS]...

Options:
  --minimum_coverage INTEGER      Only consider SNP positions with a least this
                                  coverage  [default: 30]

  --minimum_base_quality INTEGER  Only count bases with at least this quality
                                  [default: 1]

  --homozygosity_threshold FLOAT  Consider a SNP position homozygote if
                                  frequency of most common allele is this or
                                  higher  [default: 0.95]

  --help                          Show this message and exit.

Commands:
  germline
  somatic

Germline Mode

To run in germline mode, simply provide a BAM file using the --bam_file option.

$ snpahoy germline --help
Usage: snpahoy germline [OPTIONS]

Options:
  --bed_file PATH          BED file with SNP postions  [required]
  --bam_file PATH          BAM file (must be indexed)  [required]
  --output_json_file PATH  JSON output file  [required]
  --help                   Show this message and exit.

The output JSON file contains input information, genotypes at all SNP positions, and a summary. In case a SNP is not genotyped (as for the chrY ones in the example below), the empty string is reported as genotype.

{
    "input": {
        "settings": {
            "minimum-coverage": 30,
            "minimum-base-quality": 1,
            "homozygosity-threshold": 0.95
        },
        "files": {
            "bed-file": "snps.bed",
            "bam-file": "germline.bam"
        }
    },
    "output": {
        "details": {
            "chr1:4789323": {
                "genotype": "CC",
                "depth": 88,
                "counts": {
                    "A": 1,
                    "C": 87,
                    "G": 0,
                    "T": 0
                }
            },
            "chr1:4895801": {
                "genotype": "TT",
                "depth": 65,
                "counts": {
                    "A": 0,
                    "C": 0,
                    "G": 0,
                    "T": 65
                }
            },

            (...)

            "chrY:23164803": {
                "genotype": "",
                "depth": 0,
                "counts": {
                    "A": 0,
                    "C": 0,
                    "G": 0,
                    "T": 0
                }
            }
        },
        "summary": {
            "snps": {
                "total": 1041,
                "genotyped": 1016
            },
            "heterozygotes-fraction": 0.4744,
            "mean-maf-homozygote-sites": 0.0022,
            "mean-off-genotype-frequency": 0.0023
        }
    }
}

Somatic Mode

To run in somatic mode, provide tumor and germline BAM files using the --tumor_bam_file and --germline_bam_file options.

$ snpahoy somatic --help
Usage: snpahoy somatic [OPTIONS]

Options:
  --bed_file PATH           BED file with SNP postions  [required]
  --tumor_bam_file PATH     Tumor BAM file (must be indexed)  [required]
  --germline_bam_file PATH  Germline BAM file (must be indexed)  [required]
  --output_json_file PATH   JSON output file  [required]
  --help                    Show this message and exit.

Output is similar to that in germline mode. Only sites which are genotyping in both tumor and germline are used, and the homozygote sites used in mean MAF calculations are the homozygote sites in the germline sample.

{
    "input": {
        "settings": {
            "minimum-coverage": 30,
            "minimum-base-quality": 1,
            "homozygosity-threshold": 0.95
        },
        "files": {
            "bed-file": "snps.bed",
            "tumor-bam-file": "tumor.bam",
            "germline-bam-file": "germline.bam"
        },
        "output": {
            "details": {
                "tumor": { ... },
                "germline": { ... }
            }
        },
        "summary": {
            "snps": {
                "total": 1041,
                "genotyped": 1000
            },
            "tumor": {
                "heterozygotes-fraction": 0.474,
                "mean-maf-homozygote-sites": 0.0019,
                "mean-off-genotype-frequency": 0.0019
            },
            "germline": {
                "heterozygotes-fraction": 0.474,
                "mean-maf-homozygote-sites": 0.0022,
                "mean-off-genotype-frequency": 0.0019
            }
        }
    }
}

This tool is developed with the MSK IMPACT panel in mind. Suggeseted cut-offs for identifying sample swap or contamination are 0.55 for heterozygotes fractions and 0.01 for mean MAFs.

Installation

The recommended way to install snpahoy is by using conda:

$ conda install -c micknudsen snpahoy

snpahoy's People

Contributors

micknudsen avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.