Codon-Optimization

A deep-learning-based approach to genetic codon prediction and optimization. We propose an LSTM-Transducer model for this task, which achieves modest improvements in accuracy and perplexity when predicting codon choice compared with frequency-based methods.

This was originally implemented as an undergraduate project in Google Colab using namedtensor, a PyTorch wrapper, for Harvard's CS287r course, Machine Learning for Natural Language Processing. After the work was presented as a poster at MLCB 2019, the code was revised for readability and reproducibility *. Note that no model-generated sequences have been experimentally tested in the lab for expression against frequency baselines.

Data

The models were tested on highly expressed genes of E. coli MG1655 and human (hg19). The highly expressed gene sets in data (data/ecoli.heg.fasta and data/human_HE.fasta) can be used directly to train a new model; alternatively, any set of transcripts in FASTA format can be used as input to the model. The script src/download_human_genes.py was used to resolve nucleotide sequences for the human housekeeping gene set.
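For readers preparing their own input, the following is a minimal sketch of how a FASTA file of coding sequences might be read and split into codons. It is an illustration only, not the loader used in src/; the function names parse_fasta and to_codons are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping.

    Hypothetical helper for illustration; not part of this repository.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; concatenate and upper-case them.
            records[header] += line.upper()
    return records


def to_codons(seq: str) -> List[str]:
    """Split a coding sequence into triplets, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]
```

With a file on disk, this could be used as `records = parse_fasta(open("data/ecoli.heg.fasta"))` followed by `to_codons(seq)` for each sequence.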

Running the code

After downloading a set of transcripts in FASTA format from the sources above and removing redundancy (e.g. using CD-HIT), a model can be trained and predictions generated on a random train/val/test split using the command:

python src/main.py --data-file [data/datafile.fasta]

Different models can be selected for the codon layer using the --codon-model-name flag and for the amino acid layer using the --aa-model-name flag.
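The random train/val/test split mentioned above can be sketched as follows. This is a minimal illustration, not the splitting code in src/main.py; the function name random_split, the 80/10/10 proportions, and the seed are assumptions chosen for the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T],
    val_frac: float = 0.1,   # assumed fraction, not taken from the repository
    test_frac: float = 0.1,  # assumed fraction, not taken from the repository
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items with a fixed seed and cut them into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test
```

Splitting at the level of whole genes after redundancy removal, rather than at the level of individual codons, helps keep near-identical sequences from appearing in both the training and test sets.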

To run baseline models:

python src/main.py --data-file [data/datafile.fasta] --run-baselines
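As an illustration of what a frequency-based baseline can look like, the sketch below picks, for each amino acid, the codon seen most often in the training sequences and scores it by per-codon accuracy. This is one common form of frequency baseline and is an assumption; it is not necessarily what --run-baselines computes. The names fit_most_frequent and accuracy are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, Iterable, List

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fit_most_frequent(train_seqs: Iterable[List[str]]) -> Dict[str, str]:
    """Map each amino acid to its most frequent codon in the training data."""
    counts = defaultdict(Counter)
    for codons in train_seqs:
        for codon in codons:
            aa = CODON_TO_AA.get(codon)
            if aa is not None:  # skip codons with ambiguous bases such as N
                counts[aa][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def accuracy(table: Dict[str, str], seqs: Iterable[List[str]]) -> float:
    """Fraction of codons for which the table's choice matches the observed codon."""
    correct = total = 0
    for codons in seqs:
        for codon in codons:
            aa = CODON_TO_AA.get(codon)
            if aa is None or aa not in table:
                continue
            total += 1
            correct += table[aa] == codon
    return correct / total if total else 0.0
```

Each sequence is passed as a list of codon strings, for example `fit_most_frequent([["ATG", "GCT", "TAA"]])`.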

* This work is not being actively continued. Although a --gpu flag is provided, the code has only been tested on CPU and has not yet been used to reproduce the results table or free energy analysis from the original Colab experiments.

Contributors

samgoldman97, dkmy, mzio
