benjiemc / pythonpdb

Tools for working with Protein Data Bank Files

License: BSD 3-Clause "New" or "Revised" License

PythonPDB

Tools for working with Protein Data Bank files. This package was created to bridge Python-based object-oriented interfaces (like BioPython) and dataframe/pandas-style interfaces (like BioPandas).

Key Links

Source code: https://github.com/benjiemc/PythonPDB
Documentation: https://benjiemc.github.io/PythonPDB/
PyPI: https://pypi.org/project/python-pdb/

Installation

The package is available on PyPI and can be installed as follows.

pip install python-pdb

Installing for Development

To work on this package, download the git repository...

git clone https://github.com/benjiemc/PythonPDB.git

# or using SSH
git clone git@github.com:benjiemc/PythonPDB.git

cd PythonPDB/

... and then install the package with development dependencies (best practice is to use a virtual environment).

python3 -m venv venv
# activate the environment so pip installs into it (POSIX shells)
source venv/bin/activate
pip install -e '.[develop]'

Documentation

Documentation can be found at https://benjiemc.github.io/PythonPDB/.

pythonpdb's Issues

Structure to_pandas: Fix documentation pd.Dataframe (or something)

Wrong type is used here I think

Create parent class for entity classes (Atom, Residue, Chain, Model, Structure)

Similar to how it's done in BioPython. Would save having multiple implementations of methods (copy, get_atoms(), etc) that are all very similar. Should keep the API the same but make existing methods and properties wrappers around the base class implementation.
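A minimal sketch of what such a parent class could look like. The class `Entity`, its `children` attribute, and its `add` method are hypothetical names chosen for illustration, and the `Atom`, `Residue`, and `Chain` classes here are simplified stand-ins, not the package's actual classes.

```python
from __future__ import annotations

import copy
from typing import Iterator


class Entity:
    """Hypothetical shared base class (an assumption, not the package's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: list[Entity] = []

    def add(self, child: Entity) -> None:
        """Attach a child entity (e.g. a Residue to a Chain)."""
        self.children.append(child)

    def get_atoms(self) -> Iterator[Atom]:
        # Recurse through the children until leaf Atom objects are reached.
        for child in self.children:
            yield from child.get_atoms()

    def copy(self) -> Entity:
        # One deep-copy implementation shared by every entity level.
        return copy.deepcopy(self)


class Atom(Entity):
    def get_atoms(self) -> Iterator[Atom]:
        # An atom is a leaf, so it yields only itself.
        yield self


class Residue(Entity):
    pass


class Chain(Entity):
    pass


residue = Residue("ALA")
residue.add(Atom("N"))
residue.add(Atom("CA"))
chain = Chain("A")
chain.add(residue)
atom_names = [atom.name for atom in chain.get_atoms()]
```

With this layout, `copy()` and `get_atoms()` are written once, and each subclass overrides only what differs at its level.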

Add license for software

Possibly BSD 3-Clause License?

Add automatic versioning to package

Should be bump-able via minor, major, etc notes in the commit messages.

The git tag, pyproject.toml, and package version should be updated.
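As a rough illustration of the bump logic only, here is a sketch that derives the next version from a commit message. The `#major` / `#minor` / `#patch` marker convention and the function name `bump_version` are assumptions made for this example; the issue does not specify a format, and updating the git tag and pyproject.toml is left out.

```python
import re


def bump_version(version: str, commit_message: str) -> str:
    """Return the next version for a 'MAJOR.MINOR.PATCH' string.

    Assumes a hypothetical marker such as '#minor' in the commit message.
    If no marker is present, the version is returned unchanged.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    match = re.search(r"#(major|minor|patch)\b", commit_message)
    level = match.group(1) if match else None
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```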

Add deploy workflow for PyPI

Bug with split_states method?

PDB code 8ecq is splitting into three states instead of the expected two.

test pipeline does not run on pull requests

Add direct parse pdb file to pandas function

Create parsing engine

The idea here is to create a layer for different formats (PDB, mmCIF, pandas, csv, etc) to interface with to reduce duplicate code.

Eg: Structure.from_pdb(...) and Structure.from_pandas(...) are very similar but have slightly different input formats, requiring different methods. The goal would be to have one layer that standardizes the input and a second layer that builds the structure from it.

Ideas:

Use namedtuple objects to store record information and feed them into the parsing engine to build the structure
Use generators to reduce looping through records multiple times
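The two ideas above could be combined roughly as follows. This is a sketch under stated assumptions: `AtomRecord`, `records_from_pdb`, `records_from_rows`, and `build_structure` are hypothetical names, the nested dict stands in for the real entity classes, and only the fixed-column ATOM/HETATM fields of the PDB format are read.

```python
from collections import namedtuple
from typing import Iterable, Iterator

# Hypothetical standardized record that every input format is converted to.
AtomRecord = namedtuple("AtomRecord", "chain_id res_name res_seq atom_name x y z")


def records_from_pdb(lines: Iterable[str]) -> Iterator[AtomRecord]:
    """Lazily yield records from PDB text using the fixed column layout."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            yield AtomRecord(
                chain_id=line[21],
                res_name=line[17:20].strip(),
                res_seq=int(line[22:26]),
                atom_name=line[12:16].strip(),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            )


def records_from_rows(rows: Iterable[dict]) -> Iterator[AtomRecord]:
    """Lazily yield records from dict-like rows (e.g. dataframe rows)."""
    for row in rows:
        yield AtomRecord(**{field: row[field] for field in AtomRecord._fields})


def build_structure(records: Iterable[AtomRecord]) -> dict:
    """Single pass over the records: chain -> residue -> list of atoms."""
    structure: dict = {}
    for record in records:
        chain = structure.setdefault(record.chain_id, {})
        chain.setdefault((record.res_seq, record.res_name), []).append(record)
    return structure
```

Because both readers are generators that emit the same record type, `build_structure` is written once and consumes either source in a single loop.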

Add `in` property to entities

It would be useful to be able to check whether one entity is part of another.

Eg: atom in residue

Implemented through __contains__(self, key) method
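A minimal sketch of how `__contains__` could support this, using simplified stand-in classes rather than the package's real ones. Membership by identity (`is`) is an assumption made here so that two different atoms that happen to share a name are not confused.

```python
class Atom:
    def __init__(self, name):
        self.name = name


class Residue:
    def __init__(self, name, atoms):
        self.name = name
        self.atoms = list(atoms)

    def __contains__(self, item):
        # Enables `atom in residue`; compares by identity, not by name.
        return any(atom is item for atom in self.atoms)


class Chain:
    def __init__(self, chain_id, residues):
        self.chain_id = chain_id
        self.residues = list(residues)

    def __contains__(self, item):
        # A residue is checked directly; anything else is delegated downward,
        # so `atom in chain` also works.
        if isinstance(item, Residue):
            return any(residue is item for residue in self.residues)
        return any(item in residue for residue in self.residues)


ca = Atom("CA")
other_ca = Atom("CA")
residue = Residue("ALA", [Atom("N"), ca])
chain = Chain("A", [residue])
```

Here `ca in residue` and `ca in chain` are both true, while `other_ca in residue` is false because it is a different object.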

Add functionality for `to_sequence()` method

This would convert a Structure, Model, or Chain into a FASTA-formatted sequence.

The entity level could be conveyed in the FASTA header:

...
print(chain.to_sequence())

>A
PROTEIN

all the way up to the structure level:

...
print(structure.to_sequence())

>1:A
PROTEIN
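One way the header scheme could be produced is sketched below. The functions `chain_to_fasta` and `structure_to_fasta` are hypothetical, plain lists and dicts stand in for the entity classes, and mapping unknown residue names to `X` is an assumption of this example.

```python
from __future__ import annotations

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_to_fasta(chain_id: str, residue_names: list[str], model_id: int | None = None) -> str:
    """Build one FASTA entry; the model id is prefixed when it is given."""
    header = chain_id if model_id is None else f"{model_id}:{chain_id}"
    # Residues outside the standard table are written as 'X' (an assumption).
    sequence = "".join(THREE_TO_ONE.get(name.upper(), "X") for name in residue_names)
    return f">{header}\n{sequence}"


def structure_to_fasta(models: dict[int, dict[str, list[str]]]) -> str:
    """Build one FASTA entry per chain, with 'model:chain' headers."""
    return "\n".join(
        chain_to_fasta(chain_id, names, model_id)
        for model_id, chains in models.items()
        for chain_id, names in chains.items()
    )
```

Calling `chain_to_fasta("A", ["MET", "ALA", "GLY"])` gives a `>A` header, while `structure_to_fasta` prefixes each chain with its model number, matching the `>1:A` form above.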