felixgabler / master_thesis

Supplementary code for the master's thesis "Exploring the Use of Protein Language Models to Predict Intrinsic Disorder in Proteins"

Python 4.27% Shell 0.06% Jupyter Notebook 95.67%

Exploring the Use of Protein Language Models to Predict Intrinsic Disorder in Proteins

Supplementary code for the master's thesis by Felix Gabler

Abstract excerpt

Intrinsically disordered proteins (IDPs) are an abundant class of proteins that do not adopt a fixed or ordered three-dimensional structure, typically in the absence of molecular interactions with other proteins or macromolecules such as DNA or RNA. IDPs may be completely unstructured, or partially structured and containing intrinsically disordered regions (IDRs). Since these proteins are associated with various diseases such as Alzheimer's disease and Huntington's disease, their study is of great importance. Unfortunately, experimental determination of intrinsic disorder in proteins is tedious and expensive, as evidenced by the lack of data in disorder databases. While there are many computational methods for predicting IDRs, most are too inefficient to be applied to larger databases. In our work, we evaluated the use of transformer-based protein language models (pLMs) for the prediction of intrinsic disorder in proteins. These deep learning models are trained in an unsupervised manner exclusively on sequences and have been shown to capture biophysical properties of amino acids in their embeddings. To evaluate the utility of these models for our use case, we experimented with various state-of-the-art pre-trained pLMs, with the degree of fine-tuning that is useful, and with the complexity of the models required to extract disorder-related information from the embeddings. In addition, we explored the benefits of training with more nuanced continuous disorder scores.
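
To illustrate the general idea of predicting disorder from per-residue embeddings, here is a minimal sketch. It is not the code from this repository: the function names, the toy sequence, the embedding size of 8, and the random weights are all assumptions made for illustration, and the random vectors only stand in for real pLM embeddings. It shows the simplest possible prediction head (a linear layer with a sigmoid) that produces one continuous disorder score per residue, which can then be thresholded into binary order/disorder labels.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_disorder(embeddings, weights, bias):
    """Return one continuous disorder score in [0, 1] per residue.

    `embeddings` holds one vector per residue; `weights` and `bias`
    define a hypothetical linear prediction head.
    """
    return [
        sigmoid(sum(w * e for w, e in zip(weights, emb)) + bias)
        for emb in embeddings
    ]


def to_binary(scores, threshold=0.5):
    """Turn continuous scores into binary labels (1 = disordered)."""
    return [1 if s >= threshold else 0 for s in scores]


random.seed(0)
sequence = "MKTAYIAKQR"  # hypothetical toy sequence
dim = 8  # toy size; real pLM embeddings are much larger

# Random stand-ins for per-residue pLM embeddings (assumption).
embeddings = [[random.gauss(0, 1) for _ in range(dim)] for _ in sequence]

# Untrained, random head parameters (assumption); in practice these are learned.
weights = [random.gauss(0, 1) for _ in range(dim)]
bias = 0.0

scores = predict_disorder(embeddings, weights, bias)
labels = to_binary(scores)
print(list(zip(sequence, labels)))
```

Training against continuous scores rather than binary labels would mean fitting the head's output to the score directly instead of to the thresholded label; the threshold of 0.5 used here is likewise only an illustrative choice.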

About this repository

This repository contains all the data (see folder /data), methods (see folder /bin/disorder), and experiments (see folder /experiments and other folders as described) presented in the thesis.
