My work to implement Generative Pre-trained Transformers
Baseline Implementation
- Create a baseline implementation based on the LLMs From Scratch course
Improvements to the GPT-2 Architecture
- Watch the Zero To Hero coursework and brainstorm improvements
- Implement sinusoidal positional embeddings
- Improve attention layer performance using FlashAttention and measure the speedup
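
As a rough illustration of the sinusoidal positional embedding item above, here is a minimal, dependency-free sketch of the fixed sine/cosine table from the original Transformer formulation. The function name and the `context_length` / `emb_dim` parameter names are assumptions made for this example and are not taken from the project code. In a PyTorch model the same table would more likely be built with tensor operations and registered as a non-trainable buffer.

```python
import math


def sinusoidal_positional_embeddings(context_length, emb_dim):
    """Return a context_length x emb_dim table (list of lists) of fixed embeddings.

    Hypothetical helper for illustration only; names are not from the project.
    """
    if emb_dim % 2 != 0:
        raise ValueError("emb_dim must be even so sin/cos values can be paired")

    table = []
    for pos in range(context_length):
        row = []
        for i in range(0, emb_dim, 2):
            # Each sin/cos pair shares one frequency; the frequency falls
            # geometrically as the dimension index i grows.
            angle = pos / (10000 ** (i / emb_dim))
            row.append(math.sin(angle))  # even dimension i
            row.append(math.cos(angle))  # odd dimension i + 1
        table.append(row)
    return table


if __name__ == "__main__":
    pe = sinusoidal_positional_embeddings(context_length=4, emb_dim=8)
    for pos, row in enumerate(pe):
        print(pos, [round(v, 3) for v in row])
```

Because the table is a fixed function of position, it adds no trainable parameters, unlike the learned positional embedding that GPT-2 uses. One plausible way to use it is to add row `pos` to the token embedding at position `pos`.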
Follow-up Work
- Implement a Mixture of Experts (MoE) layer, as GPT-4 is reported to use
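
To make the MoE item more concrete, here is a minimal sketch of generic top-k expert routing for a single token vector, written in plain Python. It is not a description of GPT-4's internals, which have not been published; `moe_forward`, `gate_weights`, and the toy experts are hypothetical names chosen for this example. A real layer would use learned feed-forward experts, batch the routing across tokens, and usually add a load-balancing loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route one token vector x through its top_k highest-scoring experts.

    gate_weights: one weight row per expert (hypothetical linear router).
    experts: list of callables that map a vector to a same-sized vector.
    Returns (output_vector, chosen_expert_indices).
    """
    # Router logits: dot product of x with each expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]

    # Keep only the top_k experts, highest logit first.
    ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    chosen = ranked[:top_k]

    # Renormalise the gate probabilities over the chosen experts only.
    probs = softmax([logits[e] for e in chosen])

    # Output is the probability-weighted sum of the chosen experts' outputs.
    out = [0.0] * len(x)
    for p, e in zip(probs, chosen):
        y = experts[e](x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    toy_experts = [
        lambda v: [2 * a for a in v],
        lambda v: [-a for a in v],
        lambda v: [a + 1 for a in v],
    ]
    toy_gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    print(moe_forward([3.0, 1.0], toy_gate, toy_experts, top_k=2))
```

Only the selected experts run for each token, which is what lets an MoE model grow its parameter count without a proportional increase in compute per token.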