
anazhmetdin / golite


Python package to train CNN and DenseNet models for protein function prediction

License: MIT License

Python 90.68% Makefile 9.32%
deep-learning cnn densenet protein-function-prediction cafa bioinformatics


GOlite

Description

GOlite is a Python package that trains neural network models for protein function prediction. It adapts to different input encodings by building the network structure dynamically, and it is designed to train on encoded data produced by protEncoder.
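As a rough illustration of what adapting to the encoding can mean, the sketch below picks a 1D or 2D convolutional design from the shape of an encoded batch. The shape conventions (`(proteins, length, channels)` for 1D, `(proteins, height, width, channels)` for 2D) and the helper name `choose_conv_dims` are assumptions made for illustration; they are not taken from GOlite's code.

```python
from typing import Sequence


def choose_conv_dims(batch_shape: Sequence[int]) -> int:
    """Return 1 or 2: the convolution dimensionality suggested by an encoded batch.

    Hypothetical convention (not confirmed by GOlite):
      (proteins, length, channels)         -> 1D convolutions
      (proteins, height, width, channels)  -> 2D convolutions
    """
    if len(batch_shape) == 3:
        return 1
    if len(batch_shape) == 4:
        return 2
    raise ValueError(f"unsupported encoded shape: {tuple(batch_shape)}")


# Made-up example: 50 proteins, length 1000, 21 channels per position
print(choose_conv_dims((50, 1000, 21)))  # 1
```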

Available Structures

1. Convolutional Neural Network (CNN)

Both 1D and 2D CNNs are available; which one to use depends on the shape of the input. The network uses the following elements:

  • binary cross-entropy loss
  • Adam optimizer
  • ReLU activation function in the hidden layers
  • Sigmoid activation function in the final classification layer
  • filter size changes from layer to layer; the user selects it as a range in the form start,end,step
  • the number of filters is selected by the user
  • one fully connected layer at the end
  • a MaxPooling layer is added after each convolutional layer to help reduce overfitting
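The elements above read like a recipe for assembling the network layer by layer. The sketch below expands a `start,end,step` string into one convolution plus max-pooling pair per filter size and finishes with a single sigmoid-activated dense layer. It only describes the layers in plain Python. The helper names, the inclusive end value, the pool size of 2, and the reading of `-f` as the filter count and `-s` as the size range are all assumptions, not GOlite's confirmed behavior.

```python
from typing import Dict, List


def parse_filter_sizes(spec: str) -> List[int]:
    """Expand 'start,end,step' into filter sizes (end assumed inclusive)."""
    start, end, step = (int(x) for x in spec.split(","))
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, end + 1, step))


def build_layer_plan(spec: str, n_filters: int, n_labels: int, dims: int = 1) -> List[Dict]:
    """Describe a CNN: Conv(ReLU) + MaxPool per filter size, then one Dense(sigmoid)."""
    plan: List[Dict] = []
    for size in parse_filter_sizes(spec):
        plan.append({"layer": f"Conv{dims}D", "filters": n_filters,
                     "kernel_size": size, "activation": "relu"})
        plan.append({"layer": f"MaxPooling{dims}D", "pool_size": 2})  # assumed pool size
    # Single fully connected output layer: sigmoid gives one independent probability per label
    plan.append({"layer": "Dense", "units": n_labels, "activation": "sigmoid"})
    return plan


# Values mirror the example command (-f 64 -s 8,64,8); the label count is made up
plan = build_layer_plan("8,64,8", n_filters=64, n_labels=1000)
print(parse_filter_sizes("8,64,8"))  # [8, 16, 24, 32, 40, 48, 56, 64]
print(len(plan))  # 17
```

In Keras, each entry would correspond to a `Conv1D`, `MaxPooling1D` or `Dense` layer, with the model compiled using `optimizer="adam"` and `loss="binary_crossentropy"`. Sigmoid outputs with binary cross-entropy are the usual pairing for multi-label problems such as GO term prediction, where one protein can carry many labels at once.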

2. Dense Convolutional Network (DenseNet)

figures from [1]

The DenseNet implementation in Keras is not as flexible as the CNN, so we had to use the provided architectures without modification. Three prebuilt designs of different depths are available: DenseNet121, DenseNet169, and DenseNet201. In image classification, DenseNet is considered superior to conventional CNNs in both efficiency and accuracy.
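The number in each name counts the layers with weights. Keras builds all three designs from one template and only changes how many layers sit in each of the four dense blocks: [6, 12, 24, 16], [6, 12, 32, 32] and [6, 12, 48, 32]. Each dense-block layer is a 1×1 bottleneck convolution followed by a 3×3 convolution; adding the initial convolution, the three transition convolutions and the final classifier gives the depth. A small check of that arithmetic:

```python
# Layers per dense block in the Keras DenseNet121/169/201 definitions
BLOCKS = {
    "DenseNet121": [6, 12, 24, 16],
    "DenseNet169": [6, 12, 32, 32],
    "DenseNet201": [6, 12, 48, 32],
}


def densenet_depth(blocks):
    # 2 convolutions per dense layer (1x1 bottleneck + 3x3),
    # plus 1 stem convolution, 3 transition convolutions and 1 dense classifier
    return 2 * sum(blocks) + 1 + 3 + 1


for name, blocks in BLOCKS.items():
    print(name, densenet_depth(blocks))
# DenseNet121 121
# DenseNet169 169
# DenseNet201 201
```

Since the designs differ only in block sizes, moving from DenseNet121 to DenseNet201 adds depth (and training cost) mainly in the last two blocks.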

Input

For the full list of input options, see the package help:

GOlite --help

Output

Training

  • A model folder for each epoch.
    • Folder name pattern: outPrefix_modelType_epochNumber
  • Five images plotting the following metrics on the training and validation subsets across all epochs so far:
    • AUC
    • Loss
    • Mean Squared Error (MSE)
    • Categorical Accuracy
    • Categorical Cross Entropy
  • Five npy files storing these performance metrics for all previous epochs
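Because a model folder is written after every epoch, a natural follow-up is to pick the epoch that did best on the validation subset. The sketch below rebuilds folder names from the `outPrefix_modelType_epochNumber` pattern and selects the epoch with the highest validation AUC. The metric values are passed in as a plain list (for instance, after loading one of the npy files with NumPy); the helper names, the `"cnn"` model-type string, and 1-based epoch numbering are assumptions.

```python
from typing import List


def model_folder(out_prefix: str, model_type: str, epoch: int) -> str:
    """Folder name following the README pattern outPrefix_modelType_epochNumber."""
    return f"{out_prefix}_{model_type}_{epoch}"


def best_epoch(val_auc: List[float], first_epoch: int = 1) -> int:
    """Epoch number with the highest validation AUC (numbering base is an assumption)."""
    if not val_auc:
        raise ValueError("no epochs recorded")
    best_index = max(range(len(val_auc)), key=val_auc.__getitem__)
    return best_index + first_epoch


aucs = [0.71, 0.78, 0.80, 0.79]  # made-up validation AUC per epoch
epoch = best_epoch(aucs)
print(model_folder("oneHot", "cnn", epoch))  # oneHot_cnn_3  ("cnn" is a hypothetical modelType)
```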

Prediction

  • One npy file for each batch storing the encoded label vectors.
    • File name pattern: inputFile_prdcts.npy
  • One npy file for each batch storing the predicted probability of each label for every protein.
    • File name pattern: inputFile_prdctsCert.npy
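The README does not state how the encoded label vectors relate to the probabilities. One common way to turn per-label probabilities into binary label vectors is to apply a fixed threshold, sketched below with a cutoff of 0.5. Whether GOlite uses this rule or this cutoff is an assumption, and `probabilities_to_labels` is a hypothetical helper.

```python
from typing import List


def probabilities_to_labels(probs: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a proteins x labels probability matrix into binary label vectors."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


probs = [[0.92, 0.10, 0.55],   # protein 1
         [0.05, 0.61, 0.49]]   # protein 2
print(probabilities_to_labels(probs))  # [[1, 0, 1], [0, 1, 0]]
```

For CAFA-style evaluation the probability file is the more informative of the two, because Fmax sweeps over all thresholds instead of fixing one.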

Getting started

Installing the package

  • Download the latest release from the releases page.

  • In your command line environment:

pip install path/to/GOlite-x.x.x-py2.py3-none-any.whl

Run an example

  • In your command line environment:

GOlite -d "m1000_s50_n1000_part*_onehot.npy" -l "m1000_s50_n1000_part*_FGOA.npy" -o oneHot -e 100 -b 500 -f 64 -s 8,64,8
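The example reads data and labels from numbered parts whose names differ only in their suffix (`_onehot.npy` versus `_FGOA.npy`). Before a long run it can help to confirm that every data part has a matching label part. The sketch below pairs files on the text matched by the single `*` in each pattern; `pair_parts` is a hypothetical helper, not part of GOlite.

```python
import glob
import os
from typing import Dict, Tuple


def pair_parts(data_pattern: str, label_pattern: str) -> Dict[str, Tuple[str, str]]:
    """Match data and label files on the text that fills the single '*' in each pattern."""

    def by_key(pattern: str) -> Dict[str, str]:
        prefix, suffix = os.path.basename(pattern).split("*", 1)
        found = {}
        for path in glob.glob(pattern):
            name = os.path.basename(path)
            found[name[len(prefix):len(name) - len(suffix)]] = path
        return found

    data, labels = by_key(data_pattern), by_key(label_pattern)
    unpaired = sorted(set(data) ^ set(labels))  # parts present on only one side
    if unpaired:
        raise FileNotFoundError(f"unpaired parts: {unpaired}")
    return {key: (data[key], labels[key]) for key in sorted(data)}


# Usage, run from the folder holding the example files:
# pairs = pair_parts("m1000_s50_n1000_part*_onehot.npy", "m1000_s50_n1000_part*_FGOA.npy")
```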

Performance

Fmax scores of GOlite models

These scores were obtained following the CAFA3 instructions.

Fmax scores of the top GOlite models against the top models in CAFA3 and published deep-learning-based models

These scores were obtained on the no-knowledge (NK) benchmark in full evaluation mode.
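For readers unfamiliar with the metric, Fmax is CAFA's protein-centric maximum F-measure. At each decision threshold τ, precision is averaged over the proteins with at least one prediction scoring τ or more, and recall is averaged over all benchmark proteins (this is what full mode means; partial mode averages only over proteins that received predictions). Fmax is the best harmonic mean of the two across thresholds. The sketch below implements that definition in plain Python. It assumes true and predicted terms have already been propagated up the GO hierarchy, and it is not the official CAFA evaluation code.

```python
from typing import Dict, Iterable, Optional, Set


def fmax(truth: Dict[str, Set[str]],
         scores: Dict[str, Dict[str, float]],
         thresholds: Optional[Iterable[float]] = None) -> float:
    """Protein-centric Fmax in full evaluation mode.

    truth:  protein -> set of true (already propagated) terms
    scores: protein -> {term: predicted score in [0, 1]}
    """
    if any(not terms for terms in truth.values()):
        raise ValueError("every benchmark protein needs at least one true term")
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]  # grid 0.01 .. 1.00
    n = len(truth)  # full mode: recall is averaged over all benchmark proteins
    best = 0.0
    for tau in thresholds:
        precision_sum, recall_sum, m = 0.0, 0.0, 0
        for protein, true_terms in truth.items():
            predicted = {term for term, s in scores.get(protein, {}).items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with a prediction at tau
                m += 1
                precision_sum += hits / len(predicted)
            recall_sum += hits / len(true_terms)
        if m == 0:
            continue
        pr, rc = precision_sum / m, recall_sum / n
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Placeholder terms; P2 gets no prediction, so full mode counts its recall as 0
truth = {"P1": {"T1", "T2"}, "P2": {"T3"}}
scores = {"P1": {"T1": 0.9, "T2": 0.4}}
print(round(fmax(truth, scores), 3))  # 0.667
```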

For more details on the experimental methodology, please refer to my thesis or contact me by email.

