Implementations of machine learning algorithms, techniques, and model architectures:
- fully-connected layer
- layer normalization
- scaled dot-product attention
- multi-head attention
- positional encoding
- softmax
- dropout
- Xavier uniform/normal
- He uniform/normal
- input-based uniform/normal
- ones or zeros
- Transformer
- GPT
- MoE
- log softmax
- RNN
- CNN
- optimizers (SGD, Adam, ...)
- loss functions (cross-entropy, MSE, ...)
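
To give a feel for the kind of building blocks listed above, here is a minimal, dependency-free sketch of three of them: softmax, log softmax, and scaled dot-product attention. It is an illustration only, not necessarily how this repository implements them. The function names, the nested-list tensor layout, and the absence of masking and batching are all assumptions made for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(xs):
    """log(softmax(xs)), computed with the log-sum-exp trick."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def scaled_dot_product_attention(q, k, v):
    """Attention over nested lists (hypothetical layout, no masking).

    q: [n_q][d_k], k: [n_k][d_k], v: [n_k][d_v]
    Returns a list of shape [n_q][d_v].
    """
    d_k = len(q[0])
    scale = 1.0 / math.sqrt(d_k)
    d_v = len(v[0])
    out = []
    for q_row in q:
        # Similarity of this query with every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted average of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(d_v)])
    return out
```

When every key is identical, the attention weights are uniform, so the output for each query is simply the mean of the value rows. That makes a convenient sanity check for a sketch like this one.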