
Educational LSTM From Scratch in Vanilla Python

  • Use this repo to train and test your own RNN and LSTM.
  • You can train and fine-tune a model on any text file, and it will generate text that sounds like it.
  • The LSTM layers, with full forward and backprop, are in layers_torch.py.

1. Project Structure

  • numpy_implementations/ : Folder with the model and every layer implemented from scratch using only numpy.

  • data/ : Folder to store the text file. Currently holds shakespeare.txt (which is the default).

  • models/ : Folder that stores the saved models. Further explanation in section 2.

  • config.py : File with all model configuration. Edit this file to alter model layers and hyperparameters.

  • layers_torch.py : File containing every layer of the LSTM. Each layer is a class with a .forward and a .backward method (see the sketch at the end of this list).

  • model_torch.py : File with the Model class.

  • run.py : Script run by the ./run.sh command. Trains the model.

  • utils.py : File with helper functions and classes.
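
As mentioned for layers_torch.py above, each layer is a class exposing a .forward and a .backward method. The snippet below is a rough sketch of that pattern, not the repo's actual code; ToyTemporalDense and its shapes are illustrative assumptions.

import torch

# Rough sketch of the forward/backward layer pattern described above.
# ToyTemporalDense is an illustrative example, NOT the repo's actual TemporalDense.
class ToyTemporalDense:
    def __init__(self, in_size, out_size, device="cpu"):
        # Plain tensors (no autograd): gradients are computed manually in backward().
        self.W = torch.randn(in_size, out_size, device=device) * 0.01
        self.b = torch.zeros(out_size, device=device)

    def forward(self, x):
        # x: (batch, timesteps, in_size); cache the input for the backward pass.
        self.x = x
        return x @ self.W + self.b

    def backward(self, dout):
        # dout: (batch, timesteps, out_size) -- upstream gradient.
        self.dW = torch.einsum("bti,bto->io", self.x, dout)  # gradient w.r.t. W
        self.db = dout.sum(dim=(0, 1))                       # gradient w.r.t. b
        return dout @ self.W.T                                # gradient w.r.t. the input

# Example usage:
#   layer = ToyTemporalDense(64, 128)
#   out = layer.forward(torch.randn(4, 20, 64))
#   dx = layer.backward(torch.ones_like(out))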

2. Running it Yourself

Requirements

  • The required packages are listed in requirements.txt. The numpy-based implementations of the layers are in the numpy_implementations folder, in layers.py and model.py, and the torch implementation is in layers_torch.py and model_torch.py.
  • The torch version is a little faster and is the one used by run.py. The numpy files are provided for educational purposes only.
  • To set up and activate a miniconda virtual environment, run in the terminal:
conda create -n environment_name python=3.8
conda activate environment_name
  • The requirements can be installed in the virtual environment with the command
pip install -r requirements.txt
  • To run the project, install the requirements and add a text corpus (any text you wish to replicate, in .txt format).
  • Place your text file in the data directory.

Pretraining

  • To pretrain an RNN on language modeling (predicting the next character), first go into config.py and choose the necessary arguments.

  • In the training_params dictionary, choose:

    • --corpus (name of file in data directory with the text you want to train the model on)
    • --to_path (.json file that will be created to store the model) [OPTIONAL]
  • And you can choose the hyperparameters (although the defaults work pretty well; these entries are sketched after the list below):

    • n_iter (number of times the model will run a full sequence during training)
    • n_timesteps (number of characters the model will see/predict on each iteration in n_iter)
    • batch_size (number of parallel iterations the model will run)
    • learning_rate (scalar regulating how quickly model parameters change. Should be smaller for fine-tuning)
    • regularization (scalar penalizing large weights to reduce overfitting) [OPTIONAL]
    • patience (after how many iterations without improvement should the learning rate be reduced) [OPTIONAL]
  • Under model_layers, you can choose whatever configuration works best. Usually, layers with more parameters require larger text files to avoid overfitting and repetitive outputs.
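
As a rough illustration, the relevant config.py entries might look something like this. The key names follow this README, but the exact structure and all values are assumptions, so check the actual file in the repo:

# Illustrative sketch of config.py training settings -- structure and values
# are assumptions based on this README, not the repo's actual file.
training_params = {
    "corpus": "shakespeare.txt",        # text file inside data/
    "to_path": "models/my_model.json",  # optional: where the trained model is saved
    "n_iter": 1500,                     # full-sequence passes during training
    "n_timesteps": 200,                 # characters seen/predicted per iteration
    "batch_size": 16,                   # parallel sequences per iteration
    "learning_rate": 1e-3,              # use a smaller value for fine-tuning
    "regularization": 1e-4,             # optional: penalizes large weights
    "patience": 5,                      # optional: iterations before reducing the learning rate
}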

  • Finally, simply run on terminal:

python3 run.py --train --config=config.py
  • Whenever you feel like the samples are good enough, you can kill the training at any time. This will NOT corrupt the saved model .json file, and you may proceed to testing and fine-tuning on smaller datasets.

Note: For pretraining, a really large text corpus is usually necessary. I obtained good results with ~1M characters. If you want to alter layers/dimensions, do so in the config.py file, as described in the Build a custom Model section.

Fine-Tuning

  • To fine-tune an RNN on a given text file, go to config.py and choose the arguments:

  • In the fine_tuning_params dictionary, choose (these entries are sketched below, after the list):

    • --corpus (name of file in data directory with the text you want to train the model on)
    • --from_path (.json file that contains pretrained model)
    • --to_path (.json file that will be created to store the model) [OPTIONAL]
  • And you can choose the hyperparameters (although the defaults work pretty well):

    • n_iter (number of times the model will run a full sequence during training)
    • n_timesteps (number of characters the model will see/predict on each iteration in n_iter)
    • batch_size (number of parallel iterations the model will run)
    • learning_rate (scalar regulating how quickly model parameters change)
    • regularization (scalar penalizing large weights to reduce overfitting) [OPTIONAL]
    • patience (after how many iterations without improvement should the learning rate be reduced) [OPTIONAL]
  • model_layers will not be accessed during fine-tuning, as the layers of the pretrained model will be automatically loaded.
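
Again as a rough sketch (key names follow this README; the structure and all values, including the corpus filename, are assumptions):

# Illustrative sketch of config.py fine-tuning settings -- structure and values
# are assumptions based on this README, not the repo's actual file.
fine_tuning_params = {
    "corpus": "bee_gees_songs.txt",       # hypothetical smaller text file inside data/
    "from_path": "models/my_model.json",  # pretrained model to load
    "to_path": "models/fine_tuned.json",  # optional: where the fine-tuned model is saved
    "n_iter": 300,
    "n_timesteps": 200,
    "batch_size": 16,
    "learning_rate": 2e-4,                # smaller than for pretraining
}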

  • Finally, simply run on terminal:

python3 run.py --fine_tune --config=config.py

Note: For fine-tuning, you can get adventurous with smaller text files. I obtained really nice results with ~10K characters, such as a small Shakespeare dataset and Bee Gees' songs.

Testing

  • To test your RNN, go to config.py and choose the arguments:
  • In the testing_params dictionary, choose (see the sketch below):
    • --from_path (.json file that contains pretrained model)
    • --sample_size (how many characters will be generated, "sounding" like the source text) [OPTIONAL]
    • --seed (the start of the string your model generates; the model will "continue" it) [OPTIONAL]

Note: the testing script does not access any hyperparameters, because the model is already trained.

  • model_layers will not be accessed during testing, as you will use the layers of the pretrained model.
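
A rough sketch of the testing entries (same caveat: key names follow this README, structure and values are assumptions):

# Illustrative sketch of config.py testing settings -- structure and values
# are assumptions based on this README, not the repo's actual file.
testing_params = {
    "from_path": "models/fine_tuned.json",  # trained model to load
    "sample_size": 500,                     # optional: number of characters to generate
    "seed": "QUEEN ELIZABETH:",             # optional: prompt the model will continue
}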

  • Finally, simply run on terminal:

python3 run.py --test --config=config.py

Build a custom Model

  • To customize the model layers, go into config.py and edit the model_layers dictionary (a sketch follows the note below).
  • Each layer takes as arguments the input and output sizes.
  • You may choose among the following layers:
    • Embedding (turns input indexes into vectors)
    • TemporalDense (simple fully-connected layer)
    • RNN (Recurrent Neural Network layer)
    • RNNBlock (RNN + TemporalDense with residual connections)
    • LSTM (Long Short Term Memory layer)
    • TemporalSoftmax (returns probabilities for next generated character)

Note: The first layer must be an Embedding layer with input size equal to vocab_size. The last layer must be a TemporalSoftmax layer, with the previous layer's output size equal to vocab_size. Training detects CUDA availability by default and runs on CUDA if found.
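
As a rough sketch of what a model_layers configuration could look like (the real config.py may instantiate the layer classes directly; this plain-dict form, the string layer names, and the sizes are all assumptions):

# Illustrative sketch of a model_layers configuration -- the actual config.py
# may instantiate layer classes directly; this form and the sizes are assumptions.
vocab_size = 78  # size of the character vocabulary (78 for the Shakespeare corpus)

model_layers = {
    "layer_1": ("Embedding", vocab_size, 256),              # turns input indexes into vectors
    "layer_2": ("LSTM", 256, 512),
    "layer_3": ("LSTM", 512, 512),
    "layer_4": ("TemporalDense", 512, vocab_size),           # output size must equal vocab_size
    "layer_5": ("TemporalSoftmax", vocab_size, vocab_size),  # probabilities for the next character
}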

3. Results

  • The Recurrent Neural Network implementation achieved a loss of 1.42 with a vocabulary size of 78, training on the tiny Shakespeare corpus in shakespeare.txt.
CORIOLANUS:
I am the guilty of us, friar is too tate.

QUEEN ELIZABETH:
You are! Marcius worsed with thy service, if nature all person, thy tear. My shame;
I will be deaths well; I say
Of day, who nay, embrace
The common on him;
To him life looks,
Yet so made thy breast,
From nightly:
Stand good.

BENVOLIO:
Why, whom I come in his own share; so much for it;
For that O, they say they shall, for son that studies soul
Having done,
And this is the rest in this in a fellow.

Note: Results achieved with the model configuration exactly as presented in this repo. The training took ~1h and 1500 steps.

  • The Long Short Term Memory (LSTM) implementation, using LSTM layers instead of RNNs, achieved a loss of 1.32 with a vocabulary size of 78, training on the tiny Shakespeare corpus in shakespeare.txt.
HERMIONE:
Of all the sin of the hard heart; and hence,
For all the blessing from the king.

QUEEN ELIZABETH:
Ah, that away?

HERMIONE:
I'll go along.

QUEEN ELIZABETH:
Thou wear'st out yourself, and indeed Edward,
and his hours' vent, O why, away.

Note: Training times seemed to be a little faster with a GPU (GTX 1070 vs. M2 CPU), but the improvement was not dramatic (perhaps due to the iterative, non-parallelizable nature of RNNs). The training took ~2h30min and 1500 steps.

  • Thanks for reading!
