Giter Site home page Giter Site logo

sleekpanther / sequence-alignment Goto Github PK

View Code? Open in Web Editor NEW
5.0 2.0 3.0 913 KB

Sequence Alignment (Needleman–Wunsch Algorithm using Dynamic Programming) for aligning sequences (words, sentences, DNA etc.)

Java 100.00%
dynamic-programming dynamic pseudocode optimal-substructure optimal optimality needleman-wunsch needlemanwunsch needleman wunsch algorithm algorithms algorithm-design memoization memo memorization noah-patullo noah patullo patulo

sequence-alignment's Introduction

Sequence Alignment

Implementation of the classic Dynamic Programming problem using the Needleman–Wunsch algorithm which requires quadratic space & time complexity.

Problem Statement

Given 2 sequences, find the minimum cost of aligning the 2 sequences (case insensitive).
Gaps can be inserted to 1 sequence or the other, but incur a penalty.

2 = Gap Penalty (δ)
If 2 characters are aligned with each other, there may be a mismatch penalty (αi j)

  • 0 = aligning identical letters
  • 1 = aligning a vowel with a vowel, or a consonant with a consonant
  • 3 = aligning a vowel with a consonant

Minimum cost = sum of mismatch & gap penalties (the optimal alignment)

Optimal Substructure

There may be multiple optimal paths, but this only finds 1 of them

Runtime

O(M*N)
Storage: also O(M*N)

Pseudocode

Usage

Requirements & Caveats

  • Alphanumeric characters only (all others including spaces are sanitized out)
  • Case insensitive
  • _ (Underscore character) represents gaps but only when displaying results (it will be removed & ignored if it's present in a sequence)
  • View results in a fixed-width font for the 2 sequences to be lined up
  • predecessorIndexes is calculated when creating memoTable but not used to find the actual alignment, only to show where the values in memoTable came from
    • For example: if predecessorIndexes[4][4] contains the array [4, 3], it means the value of memoTable[4][4] (which is 6) came from memoTable[4][3] (i.e. case3, a character in seq2 was aligned with a gap so it came from the left)
    • predecessorIndexes[0][0] contains the array [-1, -1] because the upper left corner has no predecessor

Setup

  • Provide 2 strings (edit testSequences 2D array) & run calculateAndPrintOptimalAlignment()
  • Optionally change the penalties by passing in arguments to the non-default constructor
    Currently vowelVowelMismatchPenalty, consonantConsonantMismatchPenalty & numberNumberMismatchPenalty are the same, but all are arbitrary

Example: aligning "mean" with "name"

Code Notes

  • seq1 & seq2 have a leading space added (after sanitizing illegal & whitespace characters)
    This is so that seq1.charAt(i) & seq2.charAt(j) work in the loops & the string indexes match up
    It causes some adjustments for the array sizes.
  • memoTable = new int[seq1.length()][seq2.length()] uses the exact value of length() which includes the space
  • Loops like for(int i=0; i < seq1.length(); i++) use i < seq1.length() to not go out of bounds of the string indexes
  • In findAlignment(), int i = seq1.length() - 1; & int j = seq2.length() - 1; since the 2 sequences have leading spaces & arrays indexes start from 0
  • findAlignment() retraces each calculation in the memoTable to see where it came from

References

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.