This repo contains fast Triton-based implementations (with CUTLASS/CuTe possibly to come) of causal linear attention (i.e., RNNs with 2D hidden states), with a specific focus on modern decoder-only language models. Join our Discord if you are interested in this project!
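For orientation, "causal linear attention" here means attention whose causal form can be evaluated as a recurrence over a matrix-valued (2D) hidden state rather than a softmax over all past tokens. Below is a minimal PyTorch sketch of the recurrent form; it is purely illustrative (the function name is made up, and this is not one of the repo's fused Triton kernels):

```python
import torch

def linear_attention_recurrent(q, k, v):
    """Step-by-step (recurrent) form of causal linear attention.

    q, k: (batch, seq_len, d_k); v: (batch, seq_len, d_v).
    The hidden state S is a d_k x d_v matrix per batch element,
    updated as S_t = S_{t-1} + k_t^T v_t, with read-out o_t = q_t S_t.
    """
    B, T, d_k = q.shape
    d_v = v.shape[-1]
    S = q.new_zeros(B, d_k, d_v)  # the 2D hidden state
    outs = []
    for t in range(T):
        # Add the outer product k_t^T v_t into the state.
        S = S + k[:, t, :, None] * v[:, t, None, :]
        # Read out with the current query: o_t = q_t S_t.
        outs.append(torch.einsum('bd,bdv->bv', q[:, t], S))
    return torch.stack(outs, dim=1)  # (batch, seq_len, d_v)
```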
Ordered by my expected implementation time.
Date | Title | Paper | Code | Support |
---|---|---|---|---|
2023-07 | 🔥🔥🔥[RetNet] Retentive Network: A Successor to Transformer for Large Language Models (@MSRA @THU) | [arxiv] | [official] [RetNet] | Parallel✅ FusedRecurrent✅ FusedChunkwise✅ |
2023-12 | 🔥🔥[GLA] Gated Linear Attention Transformers with Hardware-Efficient Training (@MIT @IBM) | [arxiv] | [official] | FusedRecurrent✅ BlockParallelChunk✅ FusedChunkwise✅ |
2023-12 | 🔥🔥[Based] An Educational and Effective Sequence Mixer (@Stanford HazyResearch) | [blog] | [official] | TODO |
2023-07 | 🔥🔥[TransNormerLLM] A Faster and Better Large Language Model with Improved TransNormer (@Shanghai AI Lab) | [openreview] [arxiv] | [official] | TODO |
2023-05 | 🔥🔥🔥[RWKV-v6] Reinventing RNNs for the Transformer Era (@BlinkDL) | [arxiv] | [official] | TODO |
2023-10 | 🔥[GateLoop] Fully Data-Controlled Linear Recurrence for Sequence Modeling | [openreview] [arxiv] | [jax] | TODO |
2021-10 | [ABC] Attention with Bounded-memory Control (@UW) | [arxiv] | - | TODO |
2023-09 | 🔥[Transformer-VQ] Linear-Time Transformers via Vector Quantization | [arxiv] | [official] | TODO |
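The support columns above refer to different ways of evaluating the same recurrence: Parallel materializes the full (quadratic) attention matrix, FusedRecurrent steps token by token as in the sketch above, and FusedChunkwise splits the sequence into chunks, carrying the 2D state between chunks and using masked matmuls within each chunk. A plain-PyTorch sketch of the chunkwise idea (again illustrative only, not this repo's API):

```python
import torch

def linear_attention_chunkwise(q, k, v, chunk_size=64):
    """Chunkwise form: past chunks contribute through the running state S;
    the current chunk contributes through a causally masked matmul.
    Matches linear_attention_recurrent above up to floating-point error."""
    B, T, d_k = q.shape
    d_v = v.shape[-1]
    assert T % chunk_size == 0
    S = q.new_zeros(B, d_k, d_v)
    mask = torch.tril(torch.ones(chunk_size, chunk_size,
                                 dtype=torch.bool, device=q.device))
    out = []
    for s in range(0, T, chunk_size):
        qc = q[:, s:s + chunk_size]
        kc = k[:, s:s + chunk_size]
        vc = v[:, s:s + chunk_size]
        inter = qc @ S  # contribution of all previous chunks via the state
        attn = (qc @ kc.transpose(-1, -2)).masked_fill(~mask, 0.0)
        intra = attn @ vc  # causal intra-chunk contribution
        out.append(inter + intra)
        S = S + kc.transpose(-1, -2) @ vc  # advance the state by one chunk
    return torch.cat(out, dim=1)
```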
Install the Triton nightly build:

```sh
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
```
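After installing, a quick sanity check (assumes only a working Python environment):

```sh
python -c "import triton; print(triton.__version__)"
```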
If you find this repo useful, please consider citing our work:
```bib
@article{yang2023gated,
  title   = {Gated Linear Attention Transformers with Hardware-Efficient Training},
  author  = {Yang, Songlin and Wang, Bailin and Shen, Yikang and Panda, Rameswar and Kim, Yoon},
  journal = {arXiv preprint arXiv:2312.06635},
  year    = {2023}
}
```