DLTKcat

DLTKcat v1.0: Deep learning based prediction of temperature dependent enzyme turnover rates

Dataset curation from SABIO-RK and BRENDA

The dataset curation process is in /code/GetData.ipynb.

How to use DLTKcat?

  1. Required inputs: substrate name, UniProt ID of the enzyme, and temperature.
  2. Get SMILES strings and enzyme protein sequences using convert_input(path, enz_col, sub_col) in /code/feature_functions.py.
  3. The input must be a CSV file with the columns 'smiles', 'seq', 'Temp_K_norm', and 'Inv_Temp_norm'.
    'Temp_K_norm' and 'Inv_Temp_norm' are the normalized temperature and the normalized inverse temperature.
  4. Run prediction:
python predict.py --model_path [default = /data/performances/model_latentdim=40_outlayer=4_rmsetest=0.8854_rmsedev=0.908.pth] \
    --param_dict_pkl [default = /data/hyparams/param_2.pkl] \
    --input [input.csv] --output [output file name] \
    --has_label [default = False]
  5. Get attention weights of protein residues:
python get_attention.py --input [input.csv] --output [output file name]
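
Step 3 names the two temperature columns but does not say how they are normalized. The sketch below assumes min-max scaling with hypothetical bounds T_MIN_K and T_MAX_K. Neither the scaling method nor the bounds are confirmed by this README, so replace them with whatever was applied when the model was trained. The SMILES string and sequence are toy placeholders.

```python
import csv
import io

# ASSUMPTION: hypothetical temperature bounds in Kelvin; use the training values instead.
T_MIN_K = 273.15
T_MAX_K = 373.15

COLUMNS = ["smiles", "seq", "Temp_K_norm", "Inv_Temp_norm"]


def min_max(value, lo, hi):
    """Scale value linearly so that lo maps to 0 and hi maps to 1."""
    return (value - lo) / (hi - lo)


def add_temperature_features(row, temp_k):
    """Return a copy of row with the two normalized temperature columns added."""
    # 1/T is smallest at the highest temperature, so the bounds swap.
    inv_lo, inv_hi = 1.0 / T_MAX_K, 1.0 / T_MIN_K
    out = dict(row)
    out["Temp_K_norm"] = min_max(temp_k, T_MIN_K, T_MAX_K)
    out["Inv_Temp_norm"] = min_max(1.0 / temp_k, inv_lo, inv_hi)
    return out


def write_input_csv(rows, handle):
    """Write rows with the four column names listed in step 3."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Toy record: ethanol SMILES and a three-residue placeholder sequence.
example = add_temperature_features({"smiles": "CCO", "seq": "MKT"}, temp_k=323.15)
buffer = io.StringIO()
write_input_csv([example], buffer)
print(buffer.getvalue())
```

To write a file for predict.py, pass an open file handle (opened with newline="") to write_input_csv in place of the in-memory buffer.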

Case studies

  1. Mutants of Pyrococcus furiosus ornithine carbamoyltransferase obtained via directed evolution (/data/PFOCT/, /code/CaseStudy_PFOCT.ipynb).
    Ref: https://doi.org/10.1128/jb.183.3.1101-1105.2001
  2. Growth and metabolism of Lactococcus lactis and Streptococcus thermophilus at different temperatures (/data/GEMs, /code/GEMs.ipynb).
    Ref: https://doi.org/10.1038/srep14199, https://doi.org/10.1111/j.1365-2672.2004.02418.x

Dependencies

  1. PyTorch: https://pytorch.org/
  2. Scikit-learn: https://scikit-learn.org/
  3. RDKit: https://www.rdkit.org/
  4. BRENDApyrser: https://github.com/Robaina/BRENDApyrser
  5. COBRApy: https://github.com/opencobra/cobrapy
  6. Seaborn: https://seaborn.pydata.org/index.html
  7. Escher: https://github.com/zakandrewking/escher

Citation

Sizhe Qiu, Simiao Zhao, Aidong Yang. DLTKcat: deep learning based prediction of temperature dependent enzyme turnover rates. bioRxiv 2023.08.10.552798; doi: https://doi.org/10.1101/2023.08.10.552798

Issue

Users might encounter an "Index out of range" error at amino_vector = self.embedding_layer_amino(amino).
A potential solution is to add 1 to n_atom and n_amino in the model parameters and train a new model.
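
An embedding table with n rows accepts only the indices 0 to n-1. One common cause of this error is that the largest index in the input equals the table size; whether that is what happens in DLTKcat is an assumption here. The plain-Python sketch below uses hypothetical helper names and toy data (neither comes from the DLTKcat code) to check index lists against a table size before training or prediction.

```python
def out_of_range(index_sequences, n_rows):
    """Return the sorted indices that a table with n_rows rows cannot look up."""
    return sorted({i for seq in index_sequences for i in seq if not 0 <= i < n_rows})


def required_embedding_rows(index_sequences):
    """Smallest table size that holds every index in the data (largest index + 1)."""
    return max(i for seq in index_sequences for i in seq) + 1


# Toy data: four distinct tokens whose ids start at 1 rather than 0.
amino_ids = [[1, 2, 3], [3, 4]]
n_amino = 4  # hypothetical: set to the number of distinct tokens

print(out_of_range(amino_ids, n_amino))      # index 4 does not fit in 4 rows
print(required_embedding_rows(amino_ids))    # n_amino + 1 rows are needed
print(out_of_range(amino_ids, n_amino + 1))  # nothing is out of range after the +1
```

The same check applies to atom indices and n_atom. Because changing the table size changes the model's shape, weights trained with the old size no longer load, which is consistent with the advice to train a new model.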
