
zackrhodes86 / dnanalyzer


This project forked from verisimilitudex/dnanalyzer


A highly efficient, powerful, and feature-rich algorithm for analyzing DNA sequences

License: Other

Python 1.04% Java 97.82% Kotlin 1.14%

DNAnalyzer identifies proteins, amino acids, start and stop codons, high coverage regions, regions susceptible to neurodevelopmental disorders, transcription factors, and regulatory elements. Researchers are working to extract valuable information from such software to better understand human health and disease. Currently, we are working on developing a command-line interface (CLI) and a graphical user interface (GUI) that will let physicians interact with the software more quickly and easily, helping them identify genetic mutations that may cause disease.

Background

The human genome is composed of over 3 billion base pairs, making manual analysis nearly impossible. Consequently, powerful computational and statistical methods are necessary to decode the functional information hidden in DNA sequences. The genome is also extremely intricate and contains a plethora of data, which must be organized and converted into an analyzable form. Current analytical tools and software make this arduous for both geneticists and physicians, preventing them from acquiring crucial information that would help them better understand humans. [1]

Features

  • Start and stop codons
    • Codons are three-nucleotide units; a start codon marks where translation of a protein begins, and a stop codon marks where it ends. There are 20 standard amino acids. A protein consists of one or more chains of amino acids (called polypeptides) whose sequence is encoded in a gene. [2]
  • High coverage regions
    • Regions of a genome that code for proteins and have a relatively high proportion of guanine (G) and cytosine (C) among the four nucleotide bases (45-60% GC content). [3]
  • Longest genes
    • The longest genes are the most susceptible to disease implications and are especially linked to neurodevelopmental disorders (e.g., autism). [4]
  • Transcription factors
    • Proteins that help turn specific genes "on" or "off" by binding to nearby DNA. [5]
  • Regulatory elements
    • Binding sites for transcription factors, which are involved in gene regulation. [6]
  • FASTA files (.fa)
    • Supports multi-line and single-line FASTA database files. Files can either be uploaded or linked to from the web. [7]
  • Command-line interface (Met CLI)
    • The Methionine command-line interface (abbreviated as Met CLI) is a unified tool for running DNAnalyzer services from the command line. It can be used both to run DNAnalyzer services and to script a sequence of commands. All core features of DNAnalyzer are currently accessible without logging in, although account support will be implemented soon. For more information on Met CLI installation and currently supported commands, refer to the Met CLI GitHub repository.
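
As a rough illustration of how the codon, GC-content, and longest-gene features relate, the Java sketch below scans the three forward reading frames for stretches that begin with ATG and end at the first in-frame stop codon (TAA, TAG, or TGA), sorts them longest first, and flags those whose GC content falls in the 45-60% range described for high coverage regions. It is a hypothetical example, not DNAnalyzer's implementation: the names `OrfSketch`, `findOrfs`, and `gcContent` are made up, only ATG is treated as a start codon, the reverse strand is ignored, and a start codon with no downstream stop is dropped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class OrfSketch {
    // Standard stop codons; ATG (methionine) is the only start codon in this sketch.
    private static final Set<String> STOP_CODONS = Set.of("TAA", "TAG", "TGA");
    private static final String START_CODON = "ATG";

    /** A reading frame [start, end) in the input; end includes the stop codon. */
    record Orf(int start, int end, String sequence) {
        int length() {
            return end - start;
        }
    }

    /** Fraction of bases that are G or C (0.0 for an empty sequence). */
    static double gcContent(String dna) {
        if (dna.isEmpty()) {
            return 0.0;
        }
        long gc = dna.chars().filter(c -> c == 'G' || c == 'C').count();
        return (double) gc / dna.length();
    }

    /** Finds ATG...stop stretches in the three forward frames, longest first. */
    static List<Orf> findOrfs(String dna) {
        String seq = dna.toUpperCase();
        List<Orf> orfs = new ArrayList<>();
        for (int frame = 0; frame < 3; frame++) {
            int start = -1; // -1 means "not currently inside a reading frame"
            for (int i = frame; i + 3 <= seq.length(); i += 3) {
                String codon = seq.substring(i, i + 3);
                if (start < 0 && codon.equals(START_CODON)) {
                    start = i; // later in-frame ATGs are ignored until a stop
                } else if (start >= 0 && STOP_CODONS.contains(codon)) {
                    orfs.add(new Orf(start, i + 3, seq.substring(start, i + 3)));
                    start = -1;
                }
            }
        }
        // Stable sort, longest first, mirroring the "longest genes" feature.
        orfs.sort(Comparator.comparingInt(Orf::length).reversed());
        return orfs;
    }

    public static void main(String[] args) {
        String dna = "CCATGGCGCCCTAAGATGAAATTTTGA";
        for (Orf orf : findOrfs(dna)) {
            double gc = gcContent(orf.sequence());
            boolean highCoverage = gc >= 0.45 && gc <= 0.60;
            System.out.printf(Locale.ROOT, "%d-%d %s GC=%.2f high=%b%n",
                    orf.start(), orf.end(), orf.sequence(), gc, highCoverage);
        }
    }
}
```

A real tool would also scan the reverse complement and apply a minimum length, which is roughly what the `--min` and `--max` options hint at, though this sketch does not model them.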

Getting Started

System Requirements

To build and run DNAnalyzer, you need:

  • JDK 17 or later
  • a JAVA_HOME environment variable pointing to your JDK 17 installation, or the java executable on your PATH

Build & Run

We use Gradle for the build. The Gradle wrapper takes care of downloading the dependencies, so simply run:

gradlew build

On UNIX-like operating systems, you might have to prefix this with './' so the shell looks in the current directory:

./gradlew build

Then run the CLI with:

java -jar build/libs/DNAnalyzer.jar <arguments>

Alternatively, you can run DNAnalyzer directly from the compiled classes with:

java -cp build/classes/java/main/ DNAnalyzer.Main

Usage

DNAnalyzer takes its input as command-line arguments rather than from stdin. For example:

<executable> assets/dna/random/dnalong.fa --amino=ser

or

<executable> assets/dna/random/dnalong.fa --amino=ser --min=0 --max=100

Help message:

Usage: DNAnalyzer [-hrV] --amino=<aminoAcid> [--find=<proteinFile>]
                  [--max=<maxCount>] [--min=<minCount>] DNA
A program to analyze DNA sequences.
      DNA                    The FASTA file to be analyzed.
      --amino=<aminoAcid>    The amino acid representing the start of a gene.
      --find=<proteinFile>   The DNA sequence to be found within the FASTA file.
  -h, --help                 Show this help message and exit.
      --max=<maxCount>       The maximum count of the reading frame.
      --min=<minCount>       The minimum count of the reading frame.
  -r, --reverse              Reverse the DNA sequence before processing.
  -V, --version              Print version information and exit.
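
For readers who want to experiment with FASTA input outside DNAnalyzer, here is a minimal Java sketch (the names `FastaSketch`, `parseFasta`, `readFasta`, and `reverse` are hypothetical) that flattens single-line and multi-line FASTA records into one sequence string. It assumes one record per file (additional records are simply concatenated), and its `reverse` method performs a plain reversal, following the wording of the `--reverse` help text; whether DNAnalyzer itself also complements the sequence is not stated here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FastaSketch {
    /**
     * Joins the sequence lines of FASTA text into one uppercase string.
     * Header lines ('>') and blank lines are skipped, so single-line and
     * multi-line records are handled the same way. If the file holds several
     * records, their sequences are simply concatenated.
     */
    static String parseFasta(List<String> lines) {
        StringBuilder sequence = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.strip();
            if (trimmed.isEmpty() || trimmed.startsWith(">")) {
                continue;
            }
            sequence.append(trimmed.toUpperCase());
        }
        return sequence.toString();
    }

    /** Reads a FASTA file from disk (UTF-8) and flattens it with parseFasta. */
    static String readFasta(Path path) throws IOException {
        return parseFasta(Files.readAllLines(path));
    }

    /** Plain reversal, matching the --reverse help text (no complementing). */
    static String reverse(String dna) {
        return new StringBuilder(dna).reverse().toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FastaSketch.java <file.fa>");
            return;
        }
        String dna = readFasta(Path.of(args[0]));
        System.out.println("Read " + dna.length() + " bases");
    }
}
```

With JDK 17 the file can be run directly from source, for example `java FastaSketch.java assets/dna/random/dnalong.fa`.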

Future Support and Improvements

GUI

A cross-platform GUI application that runs the algorithms implemented in DNAnalyzer. Until it is available, Met CLI serves as a stopgap for this feature; once the GUI is implemented, Met CLI will remain the main tool for power users.

Needleman-Wunsch Algorithm

This algorithm is used primarily for sequence alignment: it finds the optimal global alignment between two gene sequences. While the Boyer-Moore string-search algorithm is undoubtedly more efficient for exact matching, the Needleman-Wunsch algorithm remains one of the most accurate algorithms for aligning genomic sequences. [8]
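
To make the idea concrete, here is a compact Java sketch of Needleman-Wunsch global alignment: it fills a dynamic-programming table of best prefix scores and then traces back from the bottom-right cell to recover one optimal alignment. The scoring scheme (match +1, mismatch -1, linear gap -1) and all names (`NeedlemanWunschSketch`, `align`, `Alignment`) are illustrative assumptions, not part of DNAnalyzer.

```java
public class NeedlemanWunschSketch {
    // Illustrative scoring scheme (an assumption, not taken from DNAnalyzer).
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -1;

    /** Optimal score plus one optimal alignment ('-' marks a gap). */
    record Alignment(int score, String first, String second) {}

    private static int substitution(char x, char y) {
        return x == y ? MATCH : MISMATCH;
    }

    static Alignment align(String a, String b) {
        int n = a.length();
        int m = b.length();
        // score[i][j] = best score for aligning a[0..i) with b[0..j).
        int[][] score = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) score[i][0] = i * GAP;
        for (int j = 1; j <= m; j++) score[0][j] = j * GAP;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = score[i - 1][j - 1] + substitution(a.charAt(i - 1), b.charAt(j - 1));
                int up = score[i - 1][j] + GAP;   // a[i-1] aligned to a gap
                int left = score[i][j - 1] + GAP; // b[j-1] aligned to a gap
                score[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        // Trace back from the bottom-right cell; ties prefer diagonal, then up, then left.
        StringBuilder outA = new StringBuilder();
        StringBuilder outB = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && score[i][j] == score[i - 1][j - 1] + substitution(a.charAt(i - 1), b.charAt(j - 1))) {
                outA.append(a.charAt(i - 1));
                outB.append(b.charAt(j - 1));
                i--;
                j--;
            } else if (i > 0 && score[i][j] == score[i - 1][j] + GAP) {
                outA.append(a.charAt(i - 1));
                outB.append('-');
                i--;
            } else {
                outA.append('-');
                outB.append(b.charAt(j - 1));
                j--;
            }
        }
        return new Alignment(score[n][m], outA.reverse().toString(), outB.reverse().toString());
    }

    public static void main(String[] args) {
        Alignment result = align("GATTACA", "GCATGCU");
        System.out.println("score = " + result.score());
        System.out.println(result.first());
        System.out.println(result.second());
    }
}
```

The table takes O(nm) time and memory for sequences of lengths n and m, which is why faster exact-match algorithms such as Boyer-Moore are preferred when gaps and mismatches do not need to be scored.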

Cytogenetic Location

DNAnalyzer will implement cytogenetic location, a notation that describes where a specific gene is located by giving its chromosome, arm, region, band, and sub-band. For example, 7q31.2, the location of the CFTR gene, refers to the long arm (q) of chromosome 7, region 3, band 1, sub-band 2. [9]
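
A hypothetical Java sketch of how a location such as 7q31.2 could be split into its parts with a regular expression. The `CytogeneticLocationSketch` class and its `Location` record are invented for illustration, and the pattern is a simplification: it accepts a one- or two-digit chromosome number (not range-checked) or X/Y, the arm p or q, a single-digit region and band, and an optional sub-band after the dot.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CytogeneticLocationSketch {
    /** Parsed parts of a location such as "7q31.2"; subBand is null if absent. */
    record Location(String chromosome, char arm, int region, int band, String subBand) {}

    // Chromosome (one or two digits, not range-checked, or X/Y), arm (p = short,
    // q = long), one-digit region, one-digit band, optional ".sub-band" digits.
    private static final Pattern PATTERN =
            Pattern.compile("^(\\d{1,2}|X|Y)([pq])(\\d)(\\d)(?:\\.(\\d+))?$");

    static Optional<Location> parse(String text) {
        Matcher m = PATTERN.matcher(text.strip());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Location(
                m.group(1),
                m.group(2).charAt(0),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)),
                m.group(5))); // null when there is no sub-band
    }

    public static void main(String[] args) {
        parse("7q31.2").ifPresent(loc -> System.out.println(
                "chromosome " + loc.chromosome()
                + ", arm " + (loc.arm() == 'q' ? "long" : "short")
                + ", region " + loc.region()
                + ", band " + loc.band()
                + ", sub-band " + loc.subBand()));
    }
}
```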

Citations

  1. Genomic Data Science Fact Sheet. (n.d.). Genome.gov. https://www.genome.gov/about-genomics/fact-sheets/Genomic-Data-Science
  2. DNA and RNA codon tables. (2020, December 13). Wikipedia. https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables
  3. GC-content - an overview | ScienceDirect Topics. (n.d.). Www.sciencedirect.com. https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/gc-content
  4. Length matters: Disease implications for long genes. (2013, October 22). Spectrum | Autism Research News. https://www.spectrumnews.org/opinion/viewpoint/length-matters-disease-implications-for-long-genes/
  5. Suter, D. M. (2020). Transcription Factors and DNA Play Hide and Seek. Trends in Cell Biology. https://pubmed.ncbi.nlm.nih.gov/32413318/
  6. What is noncoding DNA?: MedlinePlus Genetics. (n.d.). Medlineplus.gov. https://medlineplus.gov/genetics/understanding/basics/noncodingdna/
  7. BLAST TOPICS. (2019). Nih.gov. https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp
  8. Wikipedia Contributors. (2021, March 24). Needleman–Wunsch algorithm. Wikipedia; Wikimedia Foundation. https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm
  9. Cytogenetics. (2020, December 13). Wikipedia. https://en.wikipedia.org/wiki/Cytogenetics

Terms of Use

You are entirely responsible for the use of this application, including any and all activities that occur. While the DNAnalyzer Team strives to fix all major bugs that may be either reported by a user or discovered while debugging, they will not be held liable for any loss that the user may incur as a result of using this application, under any circumstances. For further inquiries, please contact the following email address: [email protected]

Contribution Guidelines

  • Drop a ⭐ on the GitHub repository (optional).

  • Before contributing, please read Contributing_Guidelines.md and CODE_OF_CONDUCT.md.

  • Create an issue describing the bug fix or feature you would like to add to the project, and get the task assigned to yourself.

  • Fork the repository to your GitHub account.

  • Clone the repository into a folder on your local machine using your Git client, replacing the link below with the link to your fork:
    git clone https://github.com/Verisimilitude11/DNAnalyzer

  • Create a branch:
    git branch <your branch name>

  • Check out your branch:
    git checkout <your branch name>

  • Add your code to the local folder, then stage it:
    git add .

  • Commit your changes:
    git commit -m "<add your message here>"

  • Push your changes:
    git push --set-upstream origin <your branch name>

  • Make a pull request! (Compare your branch with the owner's main branch.)

Copyright Pending © 2022 DNAnalyzer. Some rights reserved. This is an open source project.
