dna-barcode-classification's Introduction

README

Methods

In this competition, I approached the prediction with seven methods, plus an eighth entry that combines their results to account for new species.

  1. 1D CNN

    • File location: Methods/1D_CNN.ipynb
    • The final accuracy on Kaggle is 94.004%.
  2. 2D CNN trained on the one-hot encoding

    • File location: Methods/2D_CNN.ipynb
    • The final accuracy on Kaggle is 95.208%.
  3. LSTM recurrent neural network

    • File location: Methods/LSTM_Recurrent_neural_networksipynb.ipynb
    • The final accuracy on Kaggle is 94.943%.
  4. Variational autoencoder (VAE) fed into a supervised learning algorithm

    • File location: Methods/Variational_auto_encoder_(VAE).ipynb
    • The final accuracy on Kaggle is 94.317%.
  5. Feed-forward neural network

    • File location: Methods/FNN.ipynb
    • The final accuracy on Kaggle is 94.341%.
  6. Natural language processing (NLP) embeddings of k-mers + multinomial classifier

    • File location: Methods/NLP_with_Multinomial_Classifier.ipynb
    • The final accuracy on Kaggle is 96.990%. (Because of a Kaggle submission issue, this result was not submitted successfully.)
  7. Natural language processing (NLP) + CNN

    • File location: Methods/NLP_with_CNN.ipynb
    • The final accuracy on Kaggle is 96.291%.
  8. Combined prediction that accounts for new species (submission failed)

    • File location: Data/mv_predict_3.csv
    • The final accuracy on Kaggle is 97.592% on the public leaderboard and 97.432% on the private leaderboard.
    • One shortcoming: Kaggle should allow five entries per day, but after four entries the system would not accept another submission. I am new to Kaggle and do not know what caused this. This is the best prediction among all my results once new species are taken into account. The main idea is to combine all seven results and train on them to identify the new species.
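As a rough illustration of combining several models' predictions, here is a minimal sketch of majority voting with an agreement threshold. It is not the notebook code: the README describes training on the seven results, whereas this sketch uses a simpler voting rule. The function name `majority_vote`, the `min_agreement` threshold, and the `new_species` label are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions, min_agreement=0.5, new_label="new_species"):
    """Combine per-model label lists into one prediction per sample.

    predictions: list of lists, one inner list of labels per model,
                 all of the same length (one label per sample).
    min_agreement: assumed threshold; if the winning label gets a smaller
                   share of the votes, the sample is flagged as new_label.
    """
    n_models = len(predictions)
    combined = []
    # zip(*predictions) yields the votes of all models for one sample at a time.
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        if count / n_models >= min_agreement:
            combined.append(label)
        else:
            # The models disagree too much: treat the sample as a possible new species.
            combined.append(new_label)
    return combined


if __name__ == "__main__":
    model_a = ["sp1", "sp2", "sp3"]
    model_b = ["sp1", "sp2", "sp4"]
    model_c = ["sp1", "sp3", "sp5"]
    print(majority_vote([model_a, model_b, model_c]))
```

Flagging low-agreement samples is only one possible heuristic for new species; a trained second-stage classifier over the seven outputs, as the README describes, would replace the fixed threshold.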

Data

  • All methods can be run directly on the original data sets in the Data folder. No external data is needed.

  • Because different methods use different data representations (character-to-integer encoding, one-hot encoding, or k-mers), the methods are organized separately and can be run independently.
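The three data representations mentioned above can be sketched as follows. This is a minimal illustration, not the notebooks' code: the helper names, the `ACGT` alphabet, the choice of mapping unknown symbols to 0 (or an all-zero row), and the default `k=3` are assumptions.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed alphabet; any other symbol (e.g. N or a gap) is "unknown"
CHAR_TO_INT = {base: i + 1 for i, base in enumerate(ALPHABET)}  # 0 is reserved for unknown


def to_integers(seq):
    """Character-to-integer encoding: A=1, C=2, G=3, T=4, unknown=0."""
    return np.array([CHAR_TO_INT.get(b, 0) for b in seq.upper()], dtype=np.int64)


def to_one_hot(seq):
    """One-hot encoding of shape (len(seq), 4); unknown symbols give all-zero rows."""
    codes = to_integers(seq)
    one_hot = np.zeros((len(codes), len(ALPHABET)), dtype=np.float32)
    known = codes > 0
    one_hot[np.nonzero(known)[0], codes[known] - 1] = 1.0
    return one_hot


def to_kmers(seq, k=3):
    """Overlapping k-mers, usable as 'words' for NLP-style embeddings."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    example = "ACGTN"
    print(to_integers(example))
    print(to_one_hot(example))
    print(to_kmers(example))
```

The integer form suits embedding layers, the one-hot matrix suits convolutional inputs, and the k-mer list suits bag-of-words or embedding models.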
