# BAM_mtrx

Overview

BAM_mtrx is a Python application designed to analyze and visualize quality distributions of sequencing reads from BAM files. It processes BAM files to compute various metrics and generates histograms for base and mapping quality distributions.

Features

  • GUI for BAM file selection: Users can select their input file using a simple file dialog.
  • Comprehensive metrics computation: Total reads, mapped reads, unmapped reads, and more.
  • Quality distribution plots: Generates histograms of base and mapping qualities.
  • Command-line interface: Supports customization through command-line arguments.

Bioinformatics Context and Logic

BAM_mtrx processes BAM files, which are binary versions of SAM (Sequence Alignment/Map) files, to assess the quality of DNA sequence alignments. Below is a detailed explanation of the program's logic, particularly focusing on CIGAR string processing and alignment data handling:

CIGAR String Handling

  • Identifying Deletions: The program includes a function to check if a CIGAR string represents a deletion in the alignment. If a deletion is detected during the read processing, the read is either skipped or specially handled depending on its impact on the alignment metrics.
  • Efficiency: The deletion check is designed for efficiency, returning immediately when a deletion is found, which optimizes performance when processing large datasets.
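
The early-return deletion check described above might look like the following minimal sketch. It is not the program's actual code: the function name `has_deletion` is hypothetical, and the input is assumed to be a list of (operation, length) pairs in the shape pysam exposes as `AlignedSegment.cigartuples`, where operation code 2 denotes a deletion.

```python
# BAM CIGAR operation code for a deletion from the reference ("D").
CIGAR_DELETION = 2


def has_deletion(cigartuples):
    """Return True if any CIGAR operation is a deletion.

    `cigartuples` is a list of (operation, length) pairs. It may be None
    or empty for reads without alignment information (assumed here).
    """
    if not cigartuples:
        return False
    for op, _length in cigartuples:
        if op == CIGAR_DELETION:
            return True  # stop scanning at the first deletion
    return False


print(has_deletion([(0, 50), (2, 3), (0, 47)]))  # 50M3D47M -> True
print(has_deletion([(0, 100)]))                  # 100M     -> False
```

Returning inside the loop means a read with an early deletion is classified without walking the rest of its CIGAR operations.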

Alignment Quality Analysis

  • Quality Score Collection: For reads that pass the initial CIGAR check (i.e., no problematic deletions), the program collects base quality scores and mapping quality scores. These scores are crucial for assessing the reliability of the alignment.
  • High-Quality Reads Identification: Reads with base quality scores of 30 or higher are flagged as high-quality reads. Reads with a mapping quality score of 60, the highest value reported by common aligners such as BWA-MEM and treated here as perfect mapping, are counted separately to give a picture of the dataset's overall alignment quality.
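
As a rough illustration of the two steps above, the sketch below collects scores and counts high-quality reads from plain Python data. The function name and the input shape are hypothetical. It also assumes that a read counts as high quality when its mean base quality is at least 30; the text does not say how per-base scores are aggregated per read. With pysam, the inputs would come from `read.query_qualities` and `read.mapping_quality`.

```python
from statistics import mean

HIGH_BASE_QUALITY = 30  # threshold named in the text
PERFECT_MAPQ = 60       # mapping quality treated as "perfect" in the text


def summarize_qualities(reads):
    """Collect quality scores and count high-quality reads.

    `reads` is an iterable of (base_qualities, mapping_quality) pairs.
    Assumption: "high base quality" means MEAN base quality >= 30.
    """
    base_quals, map_quals = [], []
    high_base = perfect_map = 0
    for quals, mapq in reads:
        base_quals.extend(quals)   # pooled for the base quality histogram
        map_quals.append(mapq)     # one value per read for the MAPQ histogram
        if quals and mean(quals) >= HIGH_BASE_QUALITY:
            high_base += 1
        if mapq == PERFECT_MAPQ:
            perfect_map += 1
    return {
        "base_qualities": base_quals,
        "mapping_qualities": map_quals,
        "high_base_quality_reads": high_base,
        "perfect_mapping_reads": perfect_map,
    }


example = [([35, 36, 34], 60), ([10, 20, 30], 20), ([30, 30], 60)]
summary = summarize_qualities(example)
print(summary["high_base_quality_reads"], summary["perfect_mapping_reads"])
```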

Tested Data

BAM_mtrx was extensively tested using mitochondrial DNA data that had been pre-processed (mapped, sorted, and indexed). This type of data provides a robust framework for evaluating the program's effectiveness in handling complex genomic datasets.

Requirements

  • Python 3
  • Libraries: pysam, matplotlib, tkinter

Installation

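Ensure Python 3 is installed. tkinter is part of the Python standard library and is not installed through pip; on some Linux distributions it is packaged separately (for example, python3-tk on Debian and Ubuntu). Install the remaining libraries with pip: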

pip install pysam matplotlib

Usage

To use BAM_mtrx, follow these instructions:

  1. Run the script from the command line:
     python BAM_mtrx.py

  2. A file dialog will appear for you to select your BAM file.

  3. The program will process the selected BAM file and generate two plots showing the base and mapping quality distributions, which will be saved as PNG files.

  4. A text file containing detailed metrics about the processed reads will also be generated.

Command-line Arguments

  • --title (optional): Specify a base title for the graphs.
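
One way an optional `--title` argument could be wired up with argparse is sketched below. Only the `--title` flag comes from this README; the helper `plot_titles` and the exact title wording are hypothetical and may differ from what the script does.

```python
import argparse


def build_parser():
    """Build a parser with the optional --title argument."""
    parser = argparse.ArgumentParser(
        description="Plot quality distributions from a BAM file."
    )
    parser.add_argument(
        "--title",
        default=None,
        help="Base title for the graphs (optional).",
    )
    return parser


def plot_titles(base_title):
    """Hypothetical helper: derive one title per plot from the base title."""
    prefix = f"{base_title}: " if base_title else ""
    return (
        f"{prefix}Base Quality Distribution",
        f"{prefix}Mapping Quality Distribution",
    )


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(["--title", "Sample A"])
print(plot_titles(args.title))
```

From a shell, the equivalent call would be `python BAM_mtrx.py --title "Sample A"`.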

Output

PNG files: Plots showing the distributions of base and mapping qualities, saved with filenames that indicate their content (e.g., your_bam_file_base_quality.png).
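
The naming pattern in the example above can be reproduced with pathlib, as in this small sketch. The `_base_quality.png` suffix is taken from the example filename; the `_mapping_quality.png` suffix and the function name `output_paths` are assumptions made for illustration.

```python
from pathlib import Path


def output_paths(bam_path):
    """Derive plot filenames next to the input BAM file."""
    bam = Path(bam_path)
    stem = bam.stem  # "your_bam_file" for "your_bam_file.bam"
    return {
        # Suffix taken from the README example.
        "base_quality_plot": bam.with_name(f"{stem}_base_quality.png"),
        # Assumed by analogy; the actual name is not stated in the README.
        "mapping_quality_plot": bam.with_name(f"{stem}_mapping_quality.png"),
    }


paths = output_paths("your_bam_file.bam")
print(paths["base_quality_plot"].name)
```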

Text file: Detailed metrics of the processed reads, including total reads, mapped reads, unmapped reads, average alignment score, the number of reads with high base quality (threshold of 30), and the number of reads with perfect mapping quality (score of 60).

Note:

The script filters out reads with missing base quality scores to avoid errors. This filtering is reported in the metrics file.
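
A minimal sketch of this kind of filtering is shown below, assuming reads arrive as (name, base_qualities) pairs and that missing qualities appear as `None` (which is how pysam typically exposes them through `query_qualities`). The function name and the wording of the metrics line are hypothetical.

```python
def split_by_quality_presence(reads):
    """Separate reads that have base qualities from those that do not.

    `reads` is an iterable of (read_name, base_qualities) pairs, where
    base_qualities is None when the scores are missing (assumed shape).
    Returns the kept reads and the number of skipped reads.
    """
    kept, skipped = [], 0
    for name, quals in reads:
        if quals is None:
            skipped += 1  # counted so the metrics file can report it
            continue
        kept.append((name, quals))
    return kept, skipped


reads = [("r1", [30, 31]), ("r2", None), ("r3", [12, 40])]
kept, skipped = split_by_quality_presence(reads)
# Hypothetical wording for the line written to the metrics file.
print(f"Reads skipped (missing base qualities): {skipped}")
```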

Contributing

Contributions to BAM_mtrx are welcome. Please feel free to fork the repository, make your changes, and submit a pull request.

Contact

For issues, questions, or contributions, please open an issue in the GitHub repository.

Version 1.0 - May 2024
