
lm-infinite's Introduction


LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

LM-Infinite is a method proposed by Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, and Sinong Wang to address the failure of Large Language Models (LLMs) to generalize to long sequences. Transformer-based LLMs show impressive performance across many domains but struggle with longer reasoning processes and larger contexts. Current pre-training schemes truncate training sequences to a fixed length, and even with relative positional encoding, LLMs fail to generate coherent text or perform downstream tasks once the context grows beyond that length.

The authors investigate the main out-of-distribution factors behind this failure and propose LM-Infinite as an efficient remedy. LM-Infinite requires only a Λ-shaped attention mask and a distance limit, with no parameter updates or additional training, and it can be applied to any LLM that uses a relative-position encoding. It maintains fluency and generation quality on sequences as long as 32k tokens on datasets such as ArXiv and OpenWebText2, with a 2.72x decoding speedup. It also continues to perform well on inputs far longer than the training length in downstream tasks such as passkey retrieval, where vanilla models fail immediately.
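To make the core idea concrete, here is a minimal sketch of a Λ-shaped attention mask, assuming the common additive 0/-inf masking convention. This illustrates the concept only and is not the repository's internal implementation; the helper name lambda_shaped_mask and its signature are hypothetical. Each query position attends to the first n_global tokens (the "global" branch of the Λ) plus a causal window of the most recent l_pretrain tokens (the "local" branch):

import torch

def lambda_shaped_mask(seq_len: int, n_global: int, l_pretrain: int) -> torch.Tensor:
    # Additive mask: 0.0 where attention is allowed, -inf where it is blocked.
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (rows)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions (columns)
    causal = j <= i                         # no attention to future tokens
    global_branch = j < n_global            # always-visible leading tokens
    local_branch = (i - j) < l_pretrain     # sliding window of recent tokens
    allowed = causal & (global_branch | local_branch)
    mask = torch.zeros(seq_len, seq_len)
    mask[~allowed] = float("-inf")
    return mask

With the values used in the Usage section below (seq_len = 100, n_global = 100, l_pretrain = 50), the global branch covers every position and the mask reduces to plain causal attention; the Λ shape only becomes visible once seq_len exceeds n_global + l_pretrain.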

Paper: https://arxiv.org/abs/2308.16137


Appreciation

  • Lucidrains
  • Agorians

Install

pip install lm-infinite

Usage

import torch
from infinite.main import LMInfinite

d_model = 512      # model (embedding) dimension
seq_len = 100      # length of the input sequence
n_global = 100     # leading tokens kept globally visible (top of the Λ)
l_pretrain = 50    # pretraining length, used as the distance limit


# sample query, key, and value tensors
q = torch.randn(1, seq_len, d_model)
k = torch.randn(1, seq_len, d_model)
v = torch.randn(1, seq_len, d_model)


# LM-Infinite attention module
model = LMInfinite(
    d_model,
    n_global,
    l_pretrain,
)

# forward pass
output = model(q, k, v)
print(output.shape)
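The output has the same shape as the query, so print(output.shape) should report torch.Size([1, 100, 512]) for the values above.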

Architecture

Todo

License

MIT

Citations

@misc{2308.16137,
    author = {Chi Han and Qifan Wang and Wenhan Xiong and Yu Chen and Heng Ji and Sinong Wang},
    title = {LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models},
    year = {2023},
    eprint = {arXiv:2308.16137},
}


lm-infinite's Issues

Working examples

Has this library been integrated anywhere? I looked for package dependents, and there aren't any.

As I asked here:

wait, so do I understand correctly, this not only generalizes to different models - it also speeds them up by almost 3 times??

You replied with "affirmative", so I suppose you must have actually tested it?

I'm really eager to see this in action. 😊

(FYI, I'm just a developer, not an ML expert - so in this context, I'm just an end user hoping for a truly useful LLM to help me organize and research details for a project I'm working on.)

Has this technique been implemented anywhere else?


Is the implementation the same as in the paper?

I ported this code to the gpt_neox model's decoding, and generation quality was severely degraded, with these settings:

seq_len = 8192
l_pretrain = 4096
n_global = 2048


Is the code on line 51 wrong?

logits = logits = mask.to(logits.device)

or

logits = logits + mask.to(logits.device)
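For reference, with an additive 0/-inf mask (the convention sketched in the Introduction above), the second form is the standard one: adding -inf to disallowed positions drives their softmax weights to exactly zero, while the first form discards the attention logits and keeps only the mask. A minimal, self-contained illustration:

import torch

logits = torch.randn(4, 4)  # raw attention scores
# simple causal example mask: 0 on and below the diagonal, -inf above it
mask = torch.triu(torch.full((4, 4), float("-inf")), diagonal=1)

masked = logits + mask                   # additive masking preserves the real scores
weights = torch.softmax(masked, dim=-1)  # -inf entries get exactly zero weight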

