This repo contains fast Triton-based implementations (with CUTLASS/CuTe possibly to come) of causal linear attention (i.e., RNNs with 2D hidden states), with a specific focus on modern decoder-only language models. Join our Discord if you are interested in this project!
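For orientation, "causal linear attention" here means attention whose causal form can be evaluated as a recurrence over a matrix-valued (2D) hidden state rather than a softmax over all past tokens. Below is a minimal PyTorch sketch of the recurrent form; it is purely illustrative (the function name is made up, and this is not one of the repo's fused Triton kernels):

```python
import torch

def linear_attention_recurrent(q, k, v):
    """Step-by-step (recurrent) form of causal linear attention.

    q, k: (batch, seq_len, d_k); v: (batch, seq_len, d_v).
    The hidden state S is a d_k x d_v matrix per batch element,
    updated as S_t = S_{t-1} + k_t^T v_t, with read-out o_t = q_t S_t.
    """
    B, T, d_k = q.shape
    d_v = v.shape[-1]
    S = q.new_zeros(B, d_k, d_v)  # the 2D hidden state
    outs = []
    for t in range(T):
        # Add the outer product k_t^T v_t into the state.
        S = S + k[:, t, :, None] * v[:, t, None, :]
        # Read out with the current query: o_t = q_t S_t.
        outs.append(torch.einsum('bd,bdv->bv', q[:, t], S))
    return torch.stack(outs, dim=1)  # (batch, seq_len, d_v)
```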
Ordered by my expected implementation time.
Date | Title | Paper | Code | Support |
---|---|---|---|---|
2023-07 | 🔥🔥🔥[RetNet] Retentive Network: A Successor to Transformer for Large Language Models (@MSRA @THU) | [arxiv] | [official] [RetNet] | Parallel✅ FusedRecurrent✅ FusedChunkwise✅ |
2023-12 | 🔥🔥[GLA] Gated Linear Attention Transformers with Hardware-Efficient Training (@MIT @IBM) | [arxiv] | [official] | FusedRecurrent✅ BlockParallelChunk✅ FusedChunkwise✅ |
2023-12 | 🔥🔥[Based] An Educational and Effective Sequence Mixer (@Stanford HazyResearch) | [blog] | [official] | TODO |
2023-07 | 🔥🔥[TransNormerLLM] A Faster and Better Large Language Model with Improved TransNormer (@Shanghai AI Lab) | [openreview] [arxiv] | [official] | TODO |
2023-05 | 🔥🔥🔥[RWKV-v6] Reinventing RNNs for the Transformer Era (@BlinkDL) | [arxiv] | [official] | TODO |
2023-10 | 🔥[GateLoop] Fully Data-Controlled Linear Recurrence for Sequence Modeling | [openreview] [arxiv] | [jax] | TODO |
2021-10 | [ABC] Attention with Bounded-memory Control (@UW) | [arxiv] | - | TODO |
2023-09 | 🔥[Transformer-VQ] Linear-Time Transformers via Vector Quantization | [arxiv] | [official] | TODO |
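The support columns above refer to different ways of evaluating the same recurrence: Parallel materializes the full (quadratic) attention matrix, FusedRecurrent steps token by token as in the sketch above, and FusedChunkwise splits the sequence into chunks, carrying the 2D state between chunks and using masked matmuls within each chunk. A plain-PyTorch sketch of the chunkwise idea (again illustrative only, not this repo's API):

```python
import torch

def linear_attention_chunkwise(q, k, v, chunk_size=64):
    """Chunkwise form: past chunks contribute through the running state S;
    the current chunk contributes through a causally masked matmul.
    Matches linear_attention_recurrent above up to floating-point error."""
    B, T, d_k = q.shape
    d_v = v.shape[-1]
    assert T % chunk_size == 0
    S = q.new_zeros(B, d_k, d_v)
    mask = torch.tril(torch.ones(chunk_size, chunk_size,
                                 dtype=torch.bool, device=q.device))
    out = []
    for s in range(0, T, chunk_size):
        qc = q[:, s:s + chunk_size]
        kc = k[:, s:s + chunk_size]
        vc = v[:, s:s + chunk_size]
        inter = qc @ S  # contribution of all previous chunks via the state
        attn = (qc @ kc.transpose(-1, -2)).masked_fill(~mask, 0.0)
        intra = attn @ vc  # causal intra-chunk contribution
        out.append(inter + intra)
        S = S + kc.transpose(-1, -2) @ vc  # advance the state by one chunk
    return torch.cat(out, dim=1)
```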
Install the Triton nightly build:

```sh
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
```
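After installing, a quick sanity check (assumes only a working Python environment):

```sh
python -c "import triton; print(triton.__version__)"
```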
If you find this repo useful, please consider citing our work:
```bib
@article{yang2023gated,
  title   = {Gated Linear Attention Transformers with Hardware-Efficient Training},
  author  = {Yang, Songlin and Wang, Bailin and Shen, Yikang and Panda, Rameswar and Kim, Yoon},
  journal = {arXiv preprint arXiv:2312.06635},
  year    = {2023}
}
```