tayyabrahmani / protein-sequence-analysis

Explore amino acid patterns in biological sequences with this R repository. Utilizing stringr, mmand, and ape libraries, it offers spatial analysis functionalities including pattern recognition, frequency analysis, and phylogenetic tree construction. Dive into documentation for usage guidance.

Spatial Analysis of Amino Acid Patterns

This repository contains R code for spatial analysis of amino acid patterns in biological sequences. The code uses functions from the stringr, mmand, and ape libraries for pattern recognition, frequency analysis, morphological operations, and phylogenetic tree construction. Documentation on how to use the code and what it does follows below.

Olfaction plays a crucial role in insects, including the common fruit fly Drosophila melanogaster, which possesses 66 distinct olfactory receptor protein sequences of significant diversity. This article presents a novel approach for clustering these 66 protein sequences using image-based similarity indices derived from mathematical morphology. Two methodologies are explored: one uses the natural occurrence of the twenty standard amino acids, and the other uses a chemical grouping of amino acids. A metric is devised to cluster pairs of sequences by their similarity indices, categorizing them into three classes: Highest, Moderate, and Least. Notably, OR83b emerges as a prominent olfactory receptor expressed across diverse insect populations, a finding corroborated quantitatively by this investigation.
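
To make the two ideas in the paragraph above concrete, here is a minimal base-R sketch of recoding residues into chemical groups and of binning similarity indices into the three classes. Both the grouping scheme and the tertile rule are assumptions made for illustration; the repository does not state which grouping or which thresholds it uses, and the names `chemical_groups`, `recode_sequence`, and `classify_similarity` are hypothetical.

```r
# ASSUMPTION: one common textbook grouping of the twenty standard amino acids.
# The grouping actually used by the repository is not documented here.
chemical_groups <- c(
  G = "nonpolar", A = "nonpolar", V = "nonpolar", L = "nonpolar",
  I = "nonpolar", M = "nonpolar", P = "nonpolar",
  F = "aromatic", W = "aromatic", Y = "aromatic",
  S = "polar", T = "polar", C = "polar", N = "polar", Q = "polar",
  K = "positive", R = "positive", H = "positive",
  D = "negative", E = "negative"
)

# Recode one sequence into group labels, one label per residue.
# Gap characters and unknown symbols have no entry and become NA.
recode_sequence <- function(seq) {
  residues <- strsplit(toupper(seq), "")[[1]]
  unname(chemical_groups[residues])
}

# ASSUMPTION: split the similarity indices into three classes at the tertiles.
# The repository's own metric may use different cut points.
classify_similarity <- function(index) {
  breaks <- quantile(index, probs = c(0, 1 / 3, 2 / 3, 1))
  cut(index, breaks = breaks,
      labels = c("Least", "Moderate", "Highest"),
      include.lowest = TRUE)
}

recode_sequence("GF-K")
classify_similarity(c(0.1, 0.2, 0.3, 0.7, 0.8, 0.9))
```

The tertile rule needs distinct break points, so it fails on inputs where many indices are identical; a fixed-threshold rule would avoid that at the cost of choosing thresholds by hand.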

Installation

To run the code, you need to have R installed on your system along with the following libraries:

  • stringr
  • mmand
  • ape

You can install these libraries using the following R commands:

install.packages("stringr")
install.packages("mmand")
install.packages("ape")
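
If some of these libraries may already be installed, a small check avoids reinstalling them. The helper name `missing_packages` below is hypothetical, and the install call is left commented out so the snippet has no side effects when run as-is.

```r
required <- c("stringr", "mmand", "ape")

# Return the packages from `pkgs` that cannot be loaded from the current library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```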

Usage

  1. Clone the Repository:
     git clone https://github.com/tayyabrahmani/Protein-Sequence-analysis.git

  2. Prepare Data: Ensure that your data is in the appropriate format. For the amino acid analysis, the sequences should be in a file named drosophilaaligned.txt. Each sequence should be separated by a newline character.

  3. Run the R Script: Open RStudio or any R environment and run the script algorithm.R. Ensure that all required libraries are loaded.

  4. Interpret Results: After running the script, you will get various outputs including heatmaps, dendrograms, and phylogenetic trees. Interpret these results based on your analysis requirements.
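
Before running the script, it can help to confirm that the input file has the expected shape. The sketch below reads one sequence per line, as described in the data preparation step, and warns when the sequences differ in length. The function name `read_aligned_sequences` is hypothetical, and the equal-length check and upper-casing are assumptions based on the file name suggesting aligned sequences.

```r
# Read one sequence per line, skipping blank lines.
read_aligned_sequences <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  sequences <- toupper(lines[nzchar(lines)])  # ASSUMPTION: case is not significant
  if (length(sequences) == 0) {
    stop("no sequences found in ", path)
  }
  # ASSUMPTION: aligned sequences share one length (gaps included).
  if (length(unique(nchar(sequences))) > 1) {
    warning("sequences differ in length; the file may not be aligned")
  }
  sequences
}

# Usage, assuming both files sit in the working directory:
# sequences <- read_aligned_sequences("drosophilaaligned.txt")
# source("algorithm.R")
```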

Code Overview

The code performs the following main tasks:

  • Pattern Recognition: Defines amino acid patterns and identifies their frequencies in biological sequences.
  • Infima and Suprema Calculation: Computes infima and suprema matrices based on the identified patterns.
  • Area Interaction Matrix: Calculates area interaction matrices using infima and suprema matrices.
  • Grayscale Morphological Operations: Performs grayscale morphological dilation and erosion operations to compute distances.
  • Ranking Spatial Fields: Ranks pairs of spatial fields based on spatial interactions.
  • Phylogenetic Tree Construction: Constructs phylogenetic trees and dendrograms based on the spatial analysis results.
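
The tasks above can be sketched end to end in base R. This is an illustrative reading of the pipeline, not the code in algorithm.R: it assumes single-residue patterns, takes the similarity index to be the area of the pointwise infimum divided by the area of the pointwise supremum, and uses a flat-window local maximum or minimum as a stand-in for the grayscale dilation and erosion that mmand provides. All function names are hypothetical.

```r
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Pattern recognition: count each single-residue pattern in one sequence.
count_patterns <- function(seq) {
  residues <- strsplit(toupper(seq), "")[[1]]
  vapply(amino_acids, function(a) sum(residues == a), numeric(1))
}

# One row per sequence, one column per amino acid.
frequency_matrix <- function(sequences) {
  m <- t(vapply(sequences, count_patterns, numeric(length(amino_acids))))
  rownames(m) <- names(sequences)
  m
}

# ASSUMPTION: similarity = area of infimum / area of supremum.
similarity_index <- function(f, g) {
  infimum <- pmin(f, g)   # pointwise minimum of the two fields
  supremum <- pmax(f, g)  # pointwise maximum of the two fields
  sum(infimum) / sum(supremum)
}

# Symmetric matrix of similarity indices for all pairs of rows.
pairwise_similarity <- function(m) {
  n <- nrow(m)
  s <- matrix(1, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      s[i, j] <- s[j, i] <- similarity_index(m[i, ], m[j, ])
    }
  }
  s
}

# Base-R stand-in for grayscale dilation (op = max) and erosion (op = min)
# with a flat 1-D window; the window is truncated at the ends of the vector.
flat_morph <- function(x, width = 3, op = max) {
  half <- width %/% 2
  vapply(seq_along(x), function(i) {
    op(x[max(1, i - half):min(length(x), i + half)])
  }, numeric(1))
}

# Toy input, invented for illustration only.
sequences <- c(s1 = "ACDA", s2 = "ACDC", s3 = "WWYY")
m <- frequency_matrix(sequences)
s <- pairwise_similarity(m)

# Ranking: list all pairs, most similar first.
pairs <- which(upper.tri(s), arr.ind = TRUE)
ranking <- data.frame(
  a = rownames(s)[pairs[, 1]],
  b = rownames(s)[pairs[, 2]],
  index = s[upper.tri(s)]
)
ranking <- ranking[order(-ranking$index), ]

# Dendrogram from the dissimilarity 1 - similarity.
tree <- hclust(as.dist(1 - s), method = "average")

# Optional conversion to a phylogenetic tree object when ape is installed.
if (requireNamespace("ape", quietly = TRUE)) {
  phylo <- ape::as.phylo(tree)
}
```

Average linkage is an arbitrary choice here; any linkage accepted by `hclust` would work in its place, and the resulting tree can be drawn with `plot(tree)`.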
