
Educational LSTM From Scratch in Vanilla Python

  • Use this repo to train and test your own RNN and LSTM.
  • You can train and fine-tune a model on any text file, and it will generate text that sounds like it.
  • The LSTM layers, with full forward and backprop, are in layers_torch.py.

1. Project Structure

  • numpy_implementations/ : Folder with the model and every layer implemented from scratch using only numpy.

  • data/ : Folder to store the text file. Currently holds shakespeare.txt (which is the default).

  • models/ : Folder that stores the saved models. Further explanation in section 2.

  • config.py : File with all model configuration. Edit this file to alter model layers and hyperparameters.

  • layers_torch.py : File containing every layer of the LSTM. Each layer is a class with a .forward and a .backward method (see the sketch at the end of this list).

  • model_torch.py : File with the Model class.

  • run.py : Script run by the ./run.sh command. Trains the model.

  • utils.py : File with helper functions and classes.
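
As mentioned for layers_torch.py above, each layer is a class exposing a .forward and a .backward method. The snippet below is a rough sketch of that pattern, not the repo's actual code; ToyTemporalDense and its shapes are illustrative assumptions.

import torch

# Rough sketch of the forward/backward layer pattern described above.
# ToyTemporalDense is an illustrative example, NOT the repo's actual TemporalDense.
class ToyTemporalDense:
    def __init__(self, in_size, out_size, device="cpu"):
        # Plain tensors (no autograd): gradients are computed manually in backward().
        self.W = torch.randn(in_size, out_size, device=device) * 0.01
        self.b = torch.zeros(out_size, device=device)

    def forward(self, x):
        # x: (batch, timesteps, in_size); cache the input for the backward pass.
        self.x = x
        return x @ self.W + self.b

    def backward(self, dout):
        # dout: (batch, timesteps, out_size) -- upstream gradient.
        self.dW = torch.einsum("bti,bto->io", self.x, dout)  # gradient w.r.t. W
        self.db = dout.sum(dim=(0, 1))                       # gradient w.r.t. b
        return dout @ self.W.T                                # gradient w.r.t. the input

# Example usage:
#   layer = ToyTemporalDense(64, 128)
#   out = layer.forward(torch.randn(4, 20, 64))
#   dx = layer.backward(torch.ones_like(out))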

2. Running it Yourself

Requirements

  • The required packages are listed in requirements.txt. The numpy-based implementations of the layers are in the numpy_implementations folder, in layers.py and model.py, and the torch implementation is in layers_torch.py and model_torch.py.
  • The torch version is a little faster and is the one used by run.py. The numpy files are provided for educational purposes only.
  • To set up and activate a miniconda virtual environment, run in the terminal:
conda create -n environment_name python=3.8
conda activate environment_name
  • The requirements can be installed in the virtual environment with the command
pip install -r requirements.txt
  • To run the project, install the requirements and add a text corpus (any text you wish to replicate, in .txt format).
  • Place your text file in the data directory.

Pretraining

  • To pretrain an RNN on language modeling (predicting the next character), first go into config.py and choose the necessary arguments.

  • In the training_params dictionary, choose:

    • --corpus (name of file in data directory with the text you want to train the model on)
    • --to_path (.json file that will be created to store the model) [OPTIONAL]
  • And you can choose the hyperparameters (although the defaults work pretty well; these entries are sketched after the list below):

    • n_iter (number of times the model will run a full sequence during training)
    • n_timesteps (number of characters the model will see/predict on each iteration in n_iter)
    • batch_size (number of parallel iterations the model will run)
    • learning_rate (scalar regulating how quickly model parameters change. Should be smaller for fine-tuning)
    • regularization (scalar penalizing large weights to reduce overfitting) [OPTIONAL]
    • patience (after how many iterations without improvement should the learning rate be reduced) [OPTIONAL]
  • Under model_layers, you can choose whatever configuration works best. Usually, layers with more parameters require larger text files to avoid overfitting and repetitive outputs.
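
As a rough illustration, the relevant config.py entries might look something like this. The key names follow this README, but the exact structure and all values are assumptions, so check the actual file in the repo:

# Illustrative sketch of config.py training settings -- structure and values
# are assumptions based on this README, not the repo's actual file.
training_params = {
    "corpus": "shakespeare.txt",        # text file inside data/
    "to_path": "models/my_model.json",  # optional: where the trained model is saved
    "n_iter": 1500,                     # full-sequence passes during training
    "n_timesteps": 200,                 # characters seen/predicted per iteration
    "batch_size": 16,                   # parallel sequences per iteration
    "learning_rate": 1e-3,              # use a smaller value for fine-tuning
    "regularization": 1e-4,             # optional: penalizes large weights
    "patience": 5,                      # optional: iterations before reducing the learning rate
}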

  • Finally, simply run on terminal:

python3 run.py --train --config=config.py
  • Whenever you feel like the samples are good enough, you can kill the training at any time. This will NOT corrupt the saved model .json file, and you may proceed to testing and fine-tuning on smaller datasets.

Note: For pretraining, a really large text corpus is usually necessary. I obtained good results with ~1M characters. If you want to alter layers/dimensions, do so in the config.py file, as described in the Build a custom Model section.

Fine-Tuning

  • To fine-tune an RNN on a given text file, go to config.py and choose the arguments:

  • In the fine_tuning_params dictionary, choose (these entries are sketched below, after the list):

    • --corpus (name of file in data directory with the text you want to train the model on)
    • --from_path (.json file that contains pretrained model)
    • --to_path (.json file that will be created to store the model) [OPTIONAL]
  • And you can choose the hyperparameters (although the defaults work pretty well):

    • n_iter (number of times the model will run a full sequence during training)
    • n_timesteps (number of characters the model will see/predict on each iteration in n_iter)
    • batch_size (number of parallel iterations the model will run)
    • learning_rate (scalar regulating how quickly model parameters change)
    • regularization (scalar penalizing large weights to reduce overfitting) [OPTIONAL]
    • patience (after how many iterations without improvement should the learning rate be reduced) [OPTIONAL]
  • model_layers will not be accessed during fine-tuning, as the layers of the pretrained model will be automatically loaded.
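
Again as a rough sketch (key names follow this README; the structure and all values, including the corpus filename, are assumptions):

# Illustrative sketch of config.py fine-tuning settings -- structure and values
# are assumptions based on this README, not the repo's actual file.
fine_tuning_params = {
    "corpus": "bee_gees_songs.txt",       # hypothetical smaller text file inside data/
    "from_path": "models/my_model.json",  # pretrained model to load
    "to_path": "models/fine_tuned.json",  # optional: where the fine-tuned model is saved
    "n_iter": 300,
    "n_timesteps": 200,
    "batch_size": 16,
    "learning_rate": 2e-4,                # smaller than for pretraining
}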

  • Finally, simply run on terminal:

python3 run.py --fine_tune --config=config.py

Note: For fine-tuning, you can get adventurous with smaller text files. I obtained really nice results with ~10K characters, such as a small Shakespeare dataset and Bee Gees' songs.

Testing

  • To test your RNN, go to config.py and choose the arguments:
  • In the testing_params dictionary, choose (see the sketch below):
    • --from_path (.json file that contains pretrained model)
    • --sample_size (how many characters will be generated, "sounding" like the source text) [OPTIONAL]
    • --seed (the start of the string your model generates; the model will "continue" it) [OPTIONAL]

Note: the testing script does not access any hyperparameters, because the model is already trained.

  • model_layers will not be accessed during testing, as you will use the layers of the pretrained model.
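
A rough sketch of the testing entries (same caveat: key names follow this README, structure and values are assumptions):

# Illustrative sketch of config.py testing settings -- structure and values
# are assumptions based on this README, not the repo's actual file.
testing_params = {
    "from_path": "models/fine_tuned.json",  # trained model to load
    "sample_size": 500,                     # optional: number of characters to generate
    "seed": "QUEEN ELIZABETH:",             # optional: prompt the model will continue
}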

  • Finally, simply run on terminal:

python3 run.py --test --config=config.py

Build a custom Model

  • To customize the model layers, go into config.py and edit the model_layers dictionary (a sketch follows the note below).
  • Each layer takes as arguments the input and output sizes.
  • You may choose among the following layers:
    • Embedding (turns input indexes into vectors)
    • TemporalDense (simple fully-connected layer)
    • RNN (Recurrent Neural Network layer)
    • RNNBlock (RNN + TemporalDense with residual connections)
    • LSTM (Long Short Term Memory layer)
    • TemporalSoftmax (returns probabilities for next generated character)

Note: The first layer must be an Embedding layer with input size equal to vocab_size. The last layer must be a TemporalSoftmax layer, with the previous layer's output size equal to vocab_size. Training detects CUDA availability by default and runs on CUDA if found.
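
As a rough sketch of what a model_layers configuration could look like (the real config.py may instantiate the layer classes directly; this plain-dict form, the string layer names, and the sizes are all assumptions):

# Illustrative sketch of a model_layers configuration -- the actual config.py
# may instantiate layer classes directly; this form and the sizes are assumptions.
vocab_size = 78  # size of the character vocabulary (78 for the Shakespeare corpus)

model_layers = {
    "layer_1": ("Embedding", vocab_size, 256),              # turns input indexes into vectors
    "layer_2": ("LSTM", 256, 512),
    "layer_3": ("LSTM", 512, 512),
    "layer_4": ("TemporalDense", 512, vocab_size),           # output size must equal vocab_size
    "layer_5": ("TemporalSoftmax", vocab_size, vocab_size),  # probabilities for the next character
}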

3. Results

  • The Recurrent Neural Network implementation achieved a loss of 1.42 with a vocabulary size of 78, training on the tiny Shakespeare corpus in shakespeare.txt.
CORIOLANUS:
I am the guilty of us, friar is too tate.

QUEEN ELIZABETH:
You are! Marcius worsed with thy service, if nature all person, thy tear. My shame;
I will be deaths well; I say
Of day, who nay, embrace
The common on him;
To him life looks,
Yet so made thy breast,
From nightly:
Stand good.

BENVOLIO:
Why, whom I come in his own share; so much for it;
For that O, they say they shall, for son that studies soul
Having done,
And this is the rest in this in a fellow.

Note: Results achieved with the model configuration exactly as presented in this repo. The training took ~1h and 1500 steps.

  • The Long Short Term Memory (LSTM) implementation, using LSTM layers instead of RNNs, achieved a loss of 1.32 with a vocabulary size of 78, training on the tiny Shakespeare corpus in shakespeare.txt.
HERMIONE:
Of all the sin of the hard heart; and hence,
For all the blessing from the king.

QUEEN ELIZABETH:
Ah, that away?

HERMIONE:
I'll go along.

QUEEN ELIZABETH:
Thou wear'st out yourself, and indeed Edward,
and his hours' vent, O why, away.

Note: Training times seemed to be a little faster with a GPU (GTX 1070 vs. M2 CPU), but the improvement was not dramatic (perhaps due to the iterative, non-parallelizable nature of RNNs). The training took ~2h30min and 1500 steps.

  • Thanks for reading!
