Generate a rich feature space describing all possible mutations in a given protein sequence (+ structure)

License: MIT License


MutationFeatures

Derives tabular features for each possible mutation in a protein.

Description

A container that takes a protein sequence [.fasta] and, optionally, a structure [.pdb]. It returns a table in which each row represents one possible amino acid (AA) mutation and each column is a distinct quantitative descriptor of that mutation.

The aim is to provide a useful tool for those looking for patterns that distinguish, for example, resistance-conferring or disease-causing mutations.
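
To make the shape of the output table concrete, here is a minimal sketch of how the rows could be enumerated. It assumes that "each possible mutation" means every single-residue substitution to one of the other 19 standard amino acids; the function name `enumerate_mutations` and the tuple layout are illustrative and are not taken from the MutationFeatures code.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_mutations(sequence):
    """Yield (position, from_aa, to_aa) for every single substitution.

    Positions are 1-based. The wild-type residue itself is skipped,
    so each site contributes 19 rows (an assumption of this sketch).
    """
    for position, from_aa in enumerate(sequence.upper(), start=1):
        for to_aa in AMINO_ACIDS:
            if to_aa != from_aa:
                yield position, from_aa, to_aa


rows = list(enumerate_mutations("MKT"))
print(len(rows))  # 3 residues x 19 alternatives = 57
print(rows[0])    # (1, 'M', 'A')
```

Under this assumption a protein of length N gives 19 × N rows, which is a quick sanity check for the size of the table.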

Features include:

  • Evolutionary: residue frequencies, site conservation, site-site co-evolution
  • Structural: disorder, solvent accessibility, secondary structure
  • Physicochemical: change in charge, hydrophobicity, VDW radius
  • Ligand: probability that the residue is in a pocket, and whether the residue contacts the most likely drug pocket
  • Language embedding of residue: Prot5
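
As an illustration of the physicochemical group, the sketch below computes the change in charge and hydrophobicity for one substitution. The scales are assumptions: hydrophobicity uses the Kyte-Doolittle hydropathy index and charge uses a simple side-chain charge at neutral pH (histidine treated as neutral). The README does not say which scales MutationFeatures uses, and VDW radius is left out here.

```python
# Kyte-Doolittle hydropathy index (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Side-chain charge at neutral pH (assumption: histidine counted as 0).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def physicochemical_delta(from_aa, to_aa):
    """Return mutant-minus-wild-type differences for one substitution."""
    return {
        "delta_charge": CHARGE.get(to_aa, 0) - CHARGE.get(from_aa, 0),
        "delta_hydrophobicity": round(
            KYTE_DOOLITTLE[to_aa] - KYTE_DOOLITTLE[from_aa], 2
        ),
    }


print(physicochemical_delta("D", "K"))
# {'delta_charge': 2, 'delta_hydrophobicity': -0.4}
```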

When provided only a sequence, only predicted structural features are generated.

When provided both a sequence and a pdb file, structural features derived from the structure will be appended (all residues must be resolved in the structure).
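
Because every residue has to be resolved, it can help to compare the structure against the sequence before running the tool. The following is a rough pre-flight check, not part of MutationFeatures: it reads the CA atoms from the fixed-width ATOM records of a PDB file and compares the resulting one-letter sequence with the first FASTA record. It assumes a single-chain structure with standard residues; all function names are hypothetical.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def read_fasta_sequence(lines):
    """Concatenate the sequence lines of the first FASTA record."""
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop at the second record
            seen_header = True
        elif line:
            parts.append(line)
    return "".join(parts).upper()


def pdb_ca_sequence(lines):
    """One-letter sequence from the CA atoms of ATOM records, in file order."""
    seen = set()
    residues = []
    for line in lines:
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        # Chain ID plus residue number and insertion code identify a residue.
        key = (line[21:22], line[22:27])
        if key in seen:  # skip alternate locations of the same residue
            continue
        seen.add(key)
        residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)


def structure_covers_sequence(fasta_lines, pdb_lines):
    """True when the PDB CA sequence equals the FASTA sequence exactly."""
    return read_fasta_sequence(fasta_lines) == pdb_ca_sequence(pdb_lines)


# Usage with the file names from this README:
# with open("query/my.fasta") as f, open("query/my.pdb") as p:
#     print(structure_covers_sequence(f, p))
```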

Usage

For all intents and purposes one can replace "podman" with "docker" below.

To run the program you need a few things:

  • A linux environment with podman and ncbi-blast installed
  • The code for MutationFeatures, with your shell in that directory
    git clone https://github.com/ojcharles/MutationFeatures
    cd MutationFeatures
    
  • Create the subfolders ./db, ./query, ./temp
  • A blast database to mount in the container. MutationFeatures currently requires uniref50
    mkdir -p ./db
    wget -P ./db https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz
    gunzip ./db/uniref50.fasta.gz
    makeblastdb -in ./db/uniref50.fasta -parse_seqids -dbtype prot
    

Then you can build the container by running the command: podman build . -t mf

To run MutationFeatures against a query protein, drop a file, say my.fasta, in ./query, and optionally a file with the same basename, such as my.pdb. Ensure the PDB file contains all residues in the protein primary sequence, as is the case for files produced by AlphaFold. Then run the following command:

podman run -e NVIDIA_VISIBLE_DEVICES=1 --rm -it --name mf \
    -v ./db:/db \
    -v ./lib:/mflibs \
    -v ./query:/query \
    -v ./temp:/tmp \
    mf /bin/bash \
    -c "Rscript /scripts/mf.R /query/my.fasta uniref50.fasta 32 1e-7" # query_fasta blast_db_name threads psiblast_eval

The "-e NVIDIA_VISIBLE_DEVICES=1" option is optional.
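
For scripted runs over several queries, the command can be assembled programmatically. The sketch below only rebuilds the argument list shown in this README, with the four positional arguments of mf.R (query FASTA, blast database name, threads, psiblast e-value) as named parameters. The helper `build_run_command` is hypothetical and the optional NVIDIA environment variable is left out.

```python
import shlex


def build_run_command(query_fasta, blast_db_name, threads, psiblast_eval,
                      engine="podman"):
    """Return the container command from this README as an argument list."""
    script = (
        f"Rscript /scripts/mf.R /query/{query_fasta} "
        f"{blast_db_name} {threads} {psiblast_eval}"
    )
    return [
        engine, "run", "--rm", "-it", "--name", "mf",
        "-v", "./db:/db",
        "-v", "./lib:/mflibs",
        "-v", "./query:/query",
        "-v", "./temp:/tmp",
        "mf", "/bin/bash", "-c", script,
    ]


command = build_run_command("my.fasta", "uniref50.fasta", 32, "1e-7")
print(shlex.join(command))  # shell-quoted form, ready to inspect or paste
```

The list form can be handed to `subprocess.run(command, check=True)` when the run should be launched from Python; passing `engine="docker"` swaps the container runtime.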

The resulting CSV file is written to the same directory as your query FASTA file. It contains a row for every possible mutation, and columns representing a feature space suitable for machine learning.
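
A quick way to inspect the result is to count its rows and list its columns. This sketch uses only the Python standard library and makes no assumption about the column names, which are not documented here; the output path in the usage comment is hypothetical, since the README does not state the CSV file name.

```python
import csv


def summarise_feature_table(path):
    """Return (number_of_rows, column_names) for a CSV with a header row."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return len(rows), columns


# Usage (hypothetical output path):
# n_rows, columns = summarise_feature_table("query/my.csv")
# print(n_rows, "mutations x", len(columns), "columns")
```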

Oscar J Charles 2022

mutationfeatures's Issues

natlang + s4pred -> 1 tool

s4pred essentially does embedding -> 2D (secondary structure) prediction.
Prot5 (the current natlang embedding) can also do that, but we only take the embedding.

Clean up the container by having only a single approach for these two things.

+Feature

Generate an HMM profile for the psi_blast alignment, and use the to_aa emission probability as a feature.
Should be fairly similar to the pfam emission probability used in other tools, but with a guaranteed value per residue.
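
A rough sketch of the idea, assuming the alignment is available as equal-length strings with "-" for gaps: per-column amino acid frequencies with a pseudocount stand in for match-state emission probabilities. This is a simplification rather than a full profile HMM (no insert or delete states, no sequence weighting), and `emission_probabilities` is a hypothetical helper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def emission_probabilities(alignment, pseudocount=1.0):
    """Per-column P(aa) from aligned sequences; gaps are ignored.

    The pseudocount gives every amino acid a non-zero probability at
    every column, so each (site, to_aa) pair always has a value.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(
            seq[col] for seq in alignment if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


profile = emission_probabilities(["MK", "MR", "M-"])
print(round(profile[0]["M"], 3))  # (3 + 1) / (3 + 20) = 0.174
```

The feature for a mutation at site i to residue to_aa would then be `profile[i - 1][to_aa]`, using 1-based site numbering.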
