I've written nanoGPT following Andrej Karpathy's tutorial before, but the jump from that character-level GPT to the models we pull down from HuggingFace felt too big.
I wrote this to get my hands dirty implementing a decoder-only transformer architecture from scratch in PyTorch.
There are a couple of resources I followed to do this:
- This blog post from Deep Learning Focus https://cameronrwolfe.substack.com/p/decoder-only-transformers-the-workhorse
- This repo: https://github.com/alan-cooney/transformer-from-scratch
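To give a feel for what "decoder-only from scratch" means here, below is a minimal sketch of the architecture in PyTorch: token and position embeddings, a stack of pre-norm blocks with causally masked self-attention, and a language-model head. All names (`TinyDecoder`, `DecoderBlock`) and hyperparameters are my own illustrative choices, not necessarily what this repo implements; it also leans on `nn.MultiheadAttention` rather than writing attention by hand.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One pre-norm decoder block: masked self-attention + feed-forward MLP."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        T = x.size(1)
        # Causal mask: True entries are blocked, so position i only
        # attends to positions <= i.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device),
                          diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                 # residual around attention
        x = x + self.mlp(self.ln2(x))    # residual around the MLP
        return x

class TinyDecoder(nn.Module):
    """Embeddings -> stack of decoder blocks -> final norm -> LM head."""
    def __init__(self, vocab_size, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(
            DecoderBlock(d_model, n_heads) for _ in range(n_layers)
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) of token ids
        T = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))   # (batch, seq_len, vocab_size) logits
```

The pre-norm layout (LayerNorm before attention/MLP, as in GPT-2) tends to train more stably than the original post-norm Transformer; the real jump to HuggingFace-scale models is mostly in scale, tokenization, and training infrastructure rather than in this core structure.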