ngs-fzb / binosnp

Detection of low-frequency SNPs in next-generation sequencing data of Mycobacterium tuberculosis complex strains

License: GNU General Public License v3.0

Perl 100.00%
variants low-frequency-variants ngs tuberculosis resistance

binosnp's Introduction


SNP detection based on a binomial test procedure. Perl scripts with R integration, using the bam-readcount tool.


The script accepts any BAM file as input, but ideally PCR duplicates should have been removed and base quality scores recalibrated. In addition, the script requires an interval list naming the positions to be examined, and a RefAlt table defining the reference and alternative allele for each of those positions. binoSNP ships with an example list (Resisnps) for the analysis of Mycobacterium tuberculosis complex strains; it contains positions known to be associated with drug resistance.

First, the bam-readcount tool by Larson is run to extract the number and quality of reference and alternative alleles at each position in the interval list, and the results are stored in a text file. Second, this text file is read into R, and a p-value is calculated for each position using a binomial test. Third, a table is produced containing all information, including the calculated p-value, for each position in the interval list. Finally, the results are filtered according to user-defined criteria, e.g. reporting only variants with a p-value below 0.05 (the conventional threshold for statistical significance).
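The per-position test described above can be illustrated with a minimal Python sketch. It is not binoSNP's code (binoSNP does this step in R), and the null hypothesis is an assumption: alternative-allele reads are treated as sequencing errors occurring at a fixed rate, here the Phred-20 error rate of 0.01, matching the `-b 20` base-quality cut-off seen in binoSNP's bam-readcount call. The input line is assumed to follow bam-readcount's tab-separated output (chromosome, position, reference base, depth, then one `base:count:...` field per allele); `allele_counts`, `binom_sf`, `alt_pvalue` and `ERROR_RATE` are hypothetical names.

```python
"""Sketch: one-sided binomial test on one bam-readcount output line.

Assumptions (not taken from binoSNP itself):
- the line follows bam-readcount's tab-separated layout:
  chrom, position, reference base, depth, then one 'base:count:...'
  field per allele;
- H0: alternative-allele reads are sequencing errors occurring at a
  fixed rate ERROR_RATE (hypothetical; here the Phred-20 error rate).
"""
from __future__ import annotations

from math import comb

ERROR_RATE = 10 ** (-20 / 10)  # hypothetical null error rate (0.01)


def allele_counts(line: str) -> tuple[str, int, dict[str, int]]:
    """Return (chrom, position, {base: read count}) for one line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    counts: dict[str, int] = {}
    for entry in fields[4:]:
        base, count = entry.split(":")[:2]  # ignore the quality metrics
        counts[base] = int(count)
    return chrom, pos, counts


def binom_sf(k: int, n: int, p: float) -> float:
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def alt_pvalue(line: str, alt: str, error_rate: float = ERROR_RATE) -> float:
    """p-value that the alt-allele count exceeds what errors alone explain."""
    _, _, counts = allele_counts(line)
    depth = sum(counts.get(base, 0) for base in "ACGT")  # A/C/G/T reads only
    k = counts.get(alt, 0)
    if k == 0 or depth == 0:
        return 1.0
    return binom_sf(k, depth, error_rate)


if __name__ == "__main__":
    # Made-up example line: 90 A reads and 10 G reads at position 761140.
    example = (
        "M.tuberculosis_H37Rv\t761140\tA\t100\t=:0:0.00\tA:90:60.00"
        "\tC:0:0.00\tG:10:60.00\tT:0:0.00\tN:0:0.00"
    )
    print(alt_pvalue(example, "G"))
```

With 10 G reads out of 100 and an assumed 1 % error rate, the p-value is far below 0.05, so the G allele would be reported as a low-frequency variant under these assumptions.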

Getting Started

These instructions will get you a copy of the project up and running on your local machine.

Requirements

You need to have the following programs installed and available in your PATH to run binoSNP; for installation, please refer to the respective manuals:

  • Perl
  • R
  • bam-readcount
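Because a missing dependency only shows up once binoSNP shells out to it (for example `sh: 1: bam-readcount: not found`), it can help to check PATH up front. The following is a minimal Python sketch using `shutil.which`; the executable names in `REQUIRED` (`perl`, `R`, `bam-readcount`) are assumptions, and binoSNP may invoke R under a different name such as `Rscript`. `missing_tools` is a hypothetical helper.

```python
"""Sketch: check that external programs are on PATH before running binoSNP.

The executable names below are assumptions based on the requirements
(Perl, R, bam-readcount); binoSNP itself may invoke R differently.
"""
from __future__ import annotations

import shutil

REQUIRED = ["perl", "R", "bam-readcount"]  # assumed executable names


def missing_tools(names: list[str]) -> list[str]:
    """Return the names that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required programs found.")
```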

Installing binoSNP

The script does not need to be specifically installed or compiled. Once the repository is downloaded or cloned (git clone https://github.com/ngs-fzb/binoSNP) you can run binoSNP from the console. You may need to change the permissions of the binoSNP executable with chmod 755 binoSNP. Optionally add binoSNP to your path or run it directly from the binoSNP directory.

Usage

binoSNP --help

Will print the following help message:

binoSNP 1.0.0 - Copyright (C) 2019  Viola Dreyer, Christian Utpatel
   
   [USAGE]: binoSNP [--OPTION PARAMETER] <.bam file>
   
   Available OPTIONS and default PARAMETERS:
   -i [--interval]      List of intervals to be analyzed
                        Default [Resisnps_Master.v28.interval_list.tsv]

   -m [--mut]           Mutation table (RefAlt table)
                        Default [Resisnps_Master.v28_RefuAlt.tsv]

   -o [--outdir]        Output directory
                        Default [./Low_Freq]

   -r [--ref]           Reference sequence used for aligment in fasta format
                        Default [M._tuberculosis_H37Rv_2015-11-13.fasta]

   -p [--pvalue]        p-value used to filter the results
                        Default [0.05]

   -h [--help]          This help message

   -v [--version]       Version of binoSNP
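The `-p` option sets the threshold for the final filtering step. As a rough illustration only, the sketch below keeps positions whose p-value lies strictly below the threshold (default 0.05). The `(position, p_value)` pair layout and the helper name `significant` are hypothetical and do not reproduce binoSNP's actual output table.

```python
"""Sketch of the final -p/--pvalue filtering step (default 0.05).

Hypothetical data layout: results are (position, p_value) pairs; this
does not reproduce binoSNP's actual output table.
"""
from __future__ import annotations


def significant(
    results: list[tuple[int, float]], threshold: float = 0.05
) -> list[tuple[int, float]]:
    """Keep only positions whose p-value lies strictly below threshold."""
    return [(pos, p) for pos, p in results if p < threshold]


if __name__ == "__main__":
    # Made-up p-values for two positions.
    print(significant([(761140, 0.001), (761155, 0.2)]))  # [(761140, 0.001)]
```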

You can either run a single file by using:

binoSNP /path/to/your/filename.bam

Or run all BAM files in a directory as a batch:

binoSNP path/to/your/bamfiles/*.bam

Specify the output directory with -o path/to/output; otherwise the results will be written to ./Low_Freq in the directory from which the script was invoked.

If you need other positions analyzed, you can create and use your own interval list. It should follow the structure below:

M.tuberculosis_H37Rv \t start1 \t stop1 \n
M.tuberculosis_H37Rv \t start2 \t stop2 \n
M.tuberculosis_H37Rv \t start3 \t stop3 \n
M.tuberculosis_H37Rv \t start4 \t stop4 \n
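If your positions of interest are single bases, a file in this layout can be generated programmatically. The Python sketch below writes one interval per position with start equal to stop; `interval_lines` is a hypothetical helper, and the chromosome name must match the sequence name used in your reference FASTA and BAM file (here the name from the example above).

```python
"""Sketch: build a custom interval list in the tab-separated
'chromosome<TAB>start<TAB>stop' layout shown above.

Hypothetical helper; the chromosome name must match the sequence name
in your reference FASTA/BAM (here the name from binoSNP's example).
"""
from __future__ import annotations

from collections.abc import Iterable


def interval_lines(chrom: str, positions: Iterable[int]) -> str:
    """One single-base interval (start == stop) per unique position."""
    return "".join(f"{chrom}\t{pos}\t{pos}\n" for pos in sorted(set(positions)))


if __name__ == "__main__":
    print(interval_lines("M.tuberculosis_H37Rv", [761155, 761140, 761140]), end="")
```

Redirect or write the returned text to a file and pass that file to binoSNP with -i.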

For annotation of the positions, the user should also provide a mutation list in the following format:

Pos \t REF \t ALT \t Annotation \n
761140 \t A \t G \t RMP resistance \n
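To sanity-check a custom mutation list before a run, it can be read back into a lookup table. This Python sketch assumes a tab-separated file with one header row, as in the format above; since one position may carry several ALT alleles, each position maps to a list. `read_refalt` is a hypothetical helper, not part of binoSNP.

```python
"""Sketch: read a RefAlt mutation table (Pos, REF, ALT, Annotation).

Assumes a tab-separated file with one header row, as in the format
above; a position may carry several ALT alleles, so values are lists.
"""
from __future__ import annotations


def read_refalt(text: str) -> dict[int, list[tuple[str, str, str]]]:
    """Map position -> [(REF, ALT, annotation), ...]."""
    table: dict[int, list[tuple[str, str, str]]] = {}
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        pos, ref, alt, annotation = line.split("\t", 3)
        table.setdefault(int(pos), []).append((ref, alt, annotation))
    return table


if __name__ == "__main__":
    example = "Pos\tREF\tALT\tAnnotation\n761140\tA\tG\tRMP resistance\n"
    print(read_refalt(example))  # {761140: [('A', 'G', 'RMP resistance')]}
```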

To use your own lists:

binoSNP -i path/to/your/interval.list -m path/to/your/mutations.list

Authors

  • Viola Dreyer - Initial work - binoSNP
  • Christian Utpatel - Code contribution
  • Stefan Niemann - Head

License

Copyright (C) 2019 Viola Dreyer, Christian Utpatel, Stefan Niemann. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see LICENCES.

Publication

In review.

Acknowledgments

Parts of this work have been supported by the German Center for Infection Research (DZIF).

binosnp's Issues

bam-readcount: not found

Hi, I have a problem when running the script: it does not detect the *.bam file. This is the terminal output:

./binoSNP ./GATK_Bam/*.bam

[2022-04-14 00:45:33] Processing ./GATK_Bam/sequence_01.gatk.bam

[2022-04-14 00:45:33] Starting bam-readcount

[2022-04-14 00:45:33] bam-readcount ./GATK_Bam/sequence_01.gatk.bam -b 20 -w 0 -f /home/.../binoSNP/var/M._tuberculosis_H37Rv_2015-11-13.fasta -l /home/.../binoSNP/var/Resisnps_Master.v28.interval_list.tsv > Low_Freq/sequence_01.gatk.txt
sh: 1: bam-readcount: not found

[2022-04-14 00:45:33] bam-readcount ./GATK_Bam/sequence_01.gatk.bam -b 20 -w 0 -f /home/.../binoSNP/var/M._tuberculosis_H37Rv_2015-11-13.fasta -l /home/.../binoSNP/var/Resisnps_Master.v28.interval_list.tsv > Low_Freq/sequence_01.gatk.txt did not work: 32512

What do you think is the problem?

best regards
Ricardo

How to use binoSNP to detect low frequency SNPs in eukaryotic RNA-seq data?

Hi there,

I'd like to use binoSNP to detect subpopulations in my eukaryotic RNA-seq data and tried to set it up accordingly by supplying my own list of intervals, mutation table and (multiple scaffold/chromosome) reference sequence.

Although binoSNP appears to run smoothly on the supplied files, the output is empty.
I currently don't know where the issue is, but could you provide an example BAM file so I can test that binoSNP runs smoothly in my environment?

Also, I'd appreciate any tips on how to use binoSNP on eukaryotic data sets.

Many thanks in advance for your help!

Use #!/usr/bin/env perl

I don't have a system Perl; mine is in a different location. Using #!/usr/bin/env perl will ensure that the first perl in PATH is used.