
This project is forked from sustcsonglin/flash-linear-attention.


Fast implementations of causal linear attention for autoregressive language modeling.


flash-linear-attention

This repo contains fast Triton-based implementations (CUTLASS/CuTe versions may follow in the future) of causal linear attention (i.e., RNNs with 2D hidden states), with a specific focus on modern decoder-only language models. Join the Discord if you are interested in this project!
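Concretely, "RNNs with 2D hidden states" means softmax attention is replaced by a running matrix-valued state. Below is a minimal, unoptimized PyTorch sketch of the recurrent form of (unnormalized) linear attention, for intuition only; it is not how the fused Triton kernels in this repo are written:

```python
import torch

def recurrent_linear_attn(q, k, v):
    """Naive reference: o_t = q_t S_t, with S_t = S_{t-1} + k_t v_t^T."""
    # q, k: (batch, heads, seq_len, d_k); v: (batch, heads, seq_len, d_v)
    B, H, T, D = q.shape
    S = torch.zeros(B, H, D, v.shape[-1], dtype=q.dtype, device=q.device)
    out = torch.empty_like(v)
    for t in range(T):
        # accumulate the outer product k_t v_t^T into the 2D hidden state
        S = S + k[:, :, t].unsqueeze(-1) * v[:, :, t].unsqueeze(-2)
        # read the state with the current query
        out[:, :, t] = torch.einsum('bhd,bhde->bhe', q[:, :, t], S)
    return out
```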

Models

Ordered by my expected implementation time.

| Date | Title | Paper | Code | Support |
| --- | --- | --- | --- | --- |
| 2023-07 | 🔥🔥🔥 [RetNet] Retentive Network: A Successor to Transformer for Large Language Models (@MSRA @THU) | [arxiv] | [official] [RetNet] | Parallel ✅ FusedRecurrent ✅ FusedChunkwise ✅ |
| 2023-12 | 🔥🔥 [GLA] Gated Linear Attention Transformers with Hardware-Efficient Training (@MIT @IBM) | [arxiv] | [official] | FusedRecurrent ✅ BlockParallelChunk ✅ FusedChunkwise ✅ |
| 2023-12 | 🔥🔥 [Based] An Educational and Effective Sequence Mixer (@Stanford HazyResearch) | [blog] | [official] | TODO |
| 2023-07 | 🔥🔥 [TransnormerLLM] A Faster and Better Large Language Model with Improved TransNormer (@Shanghai AI Lab) | [openreview] [arxiv] | [official] | TODO |
| 2023-05 | 🔥🔥🔥 [RWKV-v6] Reinventing RNNs for the Transformer Era (@BlinkDL) | [arxiv] | [official] | TODO |
| 2023-10 | 🔥 [GateLoop] Fully Data-Controlled Linear Recurrence for Sequence Modeling | [openreview] [arxiv] | [jax] | TODO |
| 2021-10 | [ABC] Attention with Bounded-memory Control (@UW) | [arxiv] | - | TODO |
| 2023-09 | 🔥 [VQ-transformer] Linear-Time Transformers via Vector Quantization | [arxiv] | [official] | TODO |
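The Support column lists the execution modes implemented per model: Parallel (attention-style, materializes the full score matrix), FusedRecurrent (token-by-token state updates), and FusedChunkwise (processes the sequence in blocks, passing the state between chunks). As a hedged illustration only — the repo's actual kernels are fused Triton implementations — here is a minimal PyTorch sketch of the chunkwise form for plain (no decay, no gate) linear attention:

```python
import torch

def chunk_linear_attn(q, k, v, chunk_size=64):
    # q, k: (batch, heads, seq_len, d_k); v: (batch, heads, seq_len, d_v)
    # seq_len is assumed divisible by chunk_size for brevity
    B, H, T, D = q.shape
    n = T // chunk_size
    q = q.view(B, H, n, chunk_size, D)
    k = k.view(B, H, n, chunk_size, D)
    v = v.view(B, H, n, chunk_size, -1)
    S = torch.zeros(B, H, D, v.shape[-1], dtype=q.dtype, device=q.device)
    causal = torch.tril(torch.ones(chunk_size, chunk_size,
                                   dtype=torch.bool, device=q.device))
    out = torch.empty_like(v)
    for i in range(n):
        qi, ki, vi = q[:, :, i], k[:, :, i], v[:, :, i]
        # intra-chunk: causal attention within the current block
        intra = (qi @ ki.transpose(-1, -2)).masked_fill(~causal, 0) @ vi
        # inter-chunk: contribution of all previous chunks via the state
        inter = qi @ S
        out[:, :, i] = intra + inter
        # carry the state to the next chunk
        S = S + ki.transpose(-1, -2) @ vi
    return out.view(B, H, T, -1)
```

The chunkwise form trades the O(T²) cost of the parallel form and the serial dependency of the recurrent form for matmul-friendly blockwise work, which is what makes fused chunkwise kernels fast on GPUs.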

Requirements

This repo depends on the Triton nightly build:

```sh
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
```
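As a quick sanity check (a suggested snippet, not part of the repo), you can verify that the nightly wheel imports and that a CUDA device is visible, since the Triton kernels require a GPU:

```python
import torch
import triton

print(triton.__version__)         # expect a nightly version string
print(torch.cuda.is_available())  # the Triton kernels require a CUDA GPU
```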

Citation

If you find this repo useful, please consider citing our work:

```bibtex
@article{yang2023gated,
  title={Gated Linear Attention Transformers with Hardware-Efficient Training},
  author={Yang, Songlin and Wang, Bailin and Shen, Yikang and Panda, Rameswar and Kim, Yoon},
  journal={arXiv preprint arXiv:2312.06635},
  year={2023}
}
```

