PyDSSP

A simplified implementation of DSSP algorithm for PyTorch and NumPy

What's this?

DSSP (Dictionary of Secondary Structure of Protein) is a popular algorithm for assigning secondary structure to a protein backbone structure (Kabsch and Sander, 1983). This repository is a Python implementation of the DSSP algorithm that simplifies some parts of it.

General Info

  • It's NOT a complete implementation of the original DSSP, as some parts have been simplified (see "Differences from the original DSSP" below). However, on average more than 97% of its secondary structure assignments agree with the original.
  • The algorithm used to identify hydrogen bonded residue pairs is exactly the same as the original DSSP algorithm, but is extended to output the hydrogen-bond-pair-matrix as continuous values in the range [0,1].
  • Thanks to this continuous extension, the hydrogen-bond-pair-matrix is differentiable when a torch.Tensor is given as input.

Install

Install through PyPI

pip install pydssp

Install by git clone

git clone https://github.com/ShintaroMinami/PyDSSP.git
cd PyDSSP
python setup.py install

How to use

To use pydssp script

If you have already installed pydssp, the pydssp command should be available.

pydssp  input_01.pdb input_02.pdb ... input_N.pdb -o output.result

The output.result file is plain text and looks like the following:

-EEEEE-E--EEEEEE---EEEE-HHHH--EEEE--------- input_01.pdb
-HHHHHHHHHHHHHH----HHHHHHHHHHHHHHHHHHH--- input_02.pdb
-EEEE-----EEEE----EEEE--E---EEE-----EEE-EEE-- input_03.pdb
...
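The result file shown above can be read back into Python with a few lines. This is a minimal sketch, assuming each record is one line holding the secondary structure string, whitespace, and then the input file name, as in the sample. The helper names `parse_pydssp_output` and `composition` are hypothetical and not part of pydssp.

```python
def parse_pydssp_output(text):
    """Map each input file name to its secondary structure string.

    Assumes one record per line: '<ss string> <file name>'.
    Lines that do not have both fields are skipped.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split(None, 1)  # split on the first run of whitespace
        if len(fields) != 2:
            continue
        ss, name = fields
        result[name.strip()] = ss
    return result


def composition(ss):
    """Return the fraction of loop, helix, and strand residues in a C3 string."""
    if not ss:
        raise ValueError("empty secondary structure string")
    return {state: ss.count(state) / len(ss) for state in "-HE"}


if __name__ == "__main__":
    with open("output.result") as handle:
        for name, ss in parse_pydssp_output(handle.read()).items():
            print(name, len(ss), composition(ss))
```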

To use as python module

Import & test coordinates

# Import
import torch
import pydssp

# Sample coordinates
batch, length, atoms, xyz = 10, 100, 4, 3
## atoms should be 4 (N, CA, C, O) or 5 (N, CA, C, O, H)
coord = torch.randn([batch, length, atoms, xyz]) # batch-dim is optional
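Real inputs usually come from a PDB file rather than random numbers. The sketch below shows one way to build the (length, 4, 3) layout described above using only the standard library and the fixed-column ATOM record format. The function name `backbone_from_pdb` is hypothetical, and the sketch does not rely on any reader that pydssp itself may provide. It assumes a single model, keeps the first alternate location of each atom, and silently drops residues that lack any of N, CA, C, or O.

```python
BACKBONE = ("N", "CA", "C", "O")  # atom order expected along the 'atoms' axis


def backbone_from_pdb(pdb_text):
    """Return a nested list shaped (length, 4, 3) of backbone coordinates."""
    residues = {}  # (chain ID, residue number + insertion code) -> {atom: xyz}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        name = line[12:16].strip()  # columns 13-16: atom name
        if name not in BACKBONE:
            continue
        key = (line[21], line[22:27])  # column 22: chain, 23-27: resSeq + iCode
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        # setdefault keeps the first occurrence, i.e. the first alternate location
        residues.setdefault(key, {}).setdefault(name, xyz)
    # dicts preserve insertion order, so residues stay in file order
    return [
        [list(atoms[name]) for name in BACKBONE]
        for atoms in residues.values()
        if len(atoms) == len(BACKBONE)
    ]
```

The nested list can then be converted with `torch.tensor(...)` or `numpy.array(...)` before it is passed on.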

To get hydrogen-bond matrix: pydssp.get_hbond_map()

hbond_matrix = pydssp.get_hbond_map(coord)

print(hbond_matrix.shape) # should be (batch, length, length)
  • For hbond_matrix[b, i, j], index 'i' is the donor (N-H) and 'j' is the acceptor (C=O)
  • The output matrix consists of continuous values in the range [0,1], defined as follows.

$\mathrm{HbondMat}(i,j) = \frac{1}{2}\left[1+\sin\left(\frac{-0.5-E(i,j)-\mathrm{margin}}{\mathrm{margin}}\cdot\frac{\pi}{2}\right)\right]$

Here $E(i,j)$ is the electrostatic energy defined by Kabsch and Sander (1983), and $\mathrm{margin}$ (default 1.0) is introduced to control smoothness.
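To make the formula concrete, here is a minimal pure-Python sketch of it for a single donor/acceptor pair. The energy function follows the Kabsch and Sander (1983) definition, with partial charges 0.42e and 0.20e, a factor of 332, distances in angstroms, and energy in kcal/mol. The clamp on the sine argument is an assumption added here so that the value saturates at 0 and 1 instead of oscillating; the formula above does not state it. The function names are hypothetical and this is not the library's own code.

```python
import math


def kabsch_sander_energy(r_on, r_ch, r_oh, r_cn):
    """Electrostatic H-bond energy in kcal/mol from four distances in angstroms."""
    q1, q2, f = 0.42, 0.20, 332.0
    return q1 * q2 * f * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def hbond_value(energy, cutoff=-0.5, margin=1.0):
    """Map an energy to a smooth hydrogen-bond value in [0, 1]."""
    # Sine argument of the formula above, before the pi/2 scaling
    x = (cutoff - energy - margin) / margin
    # Assumption: clamp to [-1, 1] so the map is monotonic and saturates
    x = max(-1.0, min(1.0, x))
    return (1.0 + math.sin(x * math.pi / 2.0)) / 2.0


if __name__ == "__main__":
    for energy in (0.0, -0.5, -1.0, -1.5, -2.0, -2.5, -4.0):
        print(f"E = {energy:5.1f} kcal/mol -> {hbond_value(energy):.3f}")
```

With margin = 1.0 and the clamp in place, the value is 0 at the -0.5 cutoff, 0.5 at -1.5, and reaches 1 at -2.5, so a larger margin spreads the transition over a wider energy range.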

To get secondary structure assignment: pydssp.assign()

dssp = pydssp.assign(coord, out_type='c3')
## output is batched np.ndarray of C3 annotation, like ['-', 'H', 'H', ..., 'E', '-']

# To get secondary str. as index
dssp = pydssp.assign(coord, out_type='index')
## 0: loop,  1: alpha-helix,  2: beta-strand

# To get secondary str. as onehot representation
dssp = pydssp.assign(coord, out_type='onehot')
## dim-0: loop,  dim-1: alpha-helix,  dim-2: beta-strand
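The three output types carry the same information, so one can be converted into another. The sketch below uses plain Python lists and the index convention listed above (0: loop, 1: alpha-helix, 2: beta-strand). The helper names are hypothetical and not part of pydssp; for arrays returned by the library, convert with `.tolist()` first or adapt the code to array operations.

```python
C3_ALPHABET = "-HE"  # position in this string = index (0: loop, 1: helix, 2: strand)


def index_to_c3(index):
    """[0, 1, 2] -> '-HE'"""
    return "".join(C3_ALPHABET[i] for i in index)


def c3_to_index(c3):
    """'-HE' -> [0, 1, 2]"""
    return [C3_ALPHABET.index(c) for c in c3]


def index_to_onehot(index):
    """[0, 2] -> [[1, 0, 0], [0, 0, 1]]"""
    return [[1 if i == k else 0 for k in range(len(C3_ALPHABET))] for i in index]


def onehot_to_index(onehot):
    """[[1, 0, 0], [0, 0, 1]] -> [0, 2] (argmax over the last axis)"""
    return [row.index(max(row)) for row in onehot]
```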

Differences from the original DSSP

This implementation is simplified relative to the original DSSP algorithm. The differences are as follows:

  • β-bulge annotation is omitted, so a β-bulge is assigned as a loop instead of a β-strand.
  • Parameters for adding hydrogen atoms are slightly different from the original DSSP, which may cause small differences in hydrogen bond annotation.
  • Only C3 type assignment ('-', 'H', and 'E') is supported, not the C8 type (B, E, G, H, I, S, T, and ' ').

Despite the above simplifications, the C3 type annotation still matches the original DSSP for more than 97% of residues on average.
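A per-residue agreement like the one quoted above can be measured on your own structures with a sketch along these lines. The C8 to C3 reduction used here (H, G, I to 'H'; E, B to 'E'; everything else to '-') is one common convention and is an assumption; the README does not say which reduction was used for the reported figure. The function names are hypothetical.

```python
# Assumption: a common C8 -> C3 reduction, not necessarily the one behind
# the reported agreement figure.
C8_TO_C3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_c8(c8):
    """Reduce an original-DSSP C8 string to C3; unlisted states become '-'."""
    return "".join(C8_TO_C3.get(state, "-") for state in c8)


def agreement(ss_a, ss_b):
    """Fraction of positions where two equal-length C3 strings agree."""
    if len(ss_a) != len(ss_b) or not ss_a:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(ss_a, ss_b)) / len(ss_a)
```

For example, `agreement(reduce_c8(original_c8), pydssp_c3)` compares one chain, where `original_c8` and `pydssp_c3` stand for strings you have produced with the original DSSP and with pydssp.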

Reference

@article{kabsch1983dictionary,
  title={Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features},
  author={Kabsch, Wolfgang and Sander, Christian},
  journal={Biopolymers: Original Research on Biomolecules},
  volume={22},
  number={12},
  pages={2577--2637},
  year={1983},
  publisher={Wiley Online Library}
}
