Giter Site home page Giter Site logo

kousw / recurrent-memory-transformer Goto Github PK

View Code? Open in Web Editor NEW

This project forked from booydar/recurrent-memory-transformer

0.0 0.0 0.0 64.55 MB

[NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture.

Shell 1.07% C++ 1.51% Python 26.98% C 0.05% Cuda 0.57% Makefile 0.01% Jupyter Notebook 69.82%

recurrent-memory-transformer's Introduction

Recurrent Memory Transformer implementation for Hugging Face models

RMT resources

[ paper ] [ code ] Recurrent Memory Transformer implementation and various training examples.

[ paper ] [ code ] BABILong - a long-context benchmark that supports 20 tasks and various sources of background text.

[ code ] Evaluate your long-context LLM on BABILong!

RMT is a memory-augmented segment-level recurrent Transformer. We implement our memory mechanism as a wrapper for any Hugging Face model by adding special memory tokens to the input sequence. The model is trained to control both memory operations and sequence representations processing.

drawing

Open In Colab Example: LM with RMT

Installation

pip install -e .

This command will install lm_experiments_tools with only required packages for Trainer and tools.

lm_experiments_tools Trainer supports gradient accumulation, logging to tensorboard, saving the best models based on metrics, custom metrics and data transformations support.

Install requirements for all experiments

Full requirements for all experiments are specified in requirements.txt. Install requirements after cloning the repo:

pip install -r requirements.txt

Citation

If you find our work useful, please consider citing the RMT papers:

@inproceedings{
        bulatov2022recurrent,
        title={Recurrent Memory Transformer},
        author={Aydar Bulatov and Yuri Kuratov and Mikhail Burtsev},
        booktitle={Advances in Neural Information Processing Systems},
        editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
        year={2022},
        url={https://openreview.net/forum?id=Uynr3iPhksa}
}
@misc{bulatov2023scaling,
      title={Scaling Transformer to 1M tokens and beyond with RMT}, 
      author={Aydar Bulatov and Yuri Kuratov and Mikhail S. Burtsev},
      year={2023},
      eprint={2304.11062},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{kuratov2024search,
      title={In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss}, 
      author={Yuri Kuratov and Aydar Bulatov and Petr Anokhin and Dmitry Sorokin and Artyom Sorokin and Mikhail Burtsev},
      year={2024},
      eprint={2402.10790},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

recurrent-memory-transformer's People

Contributors

booydar avatar yurakuratov avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.