glaciohound / lm-infinite


Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"

Home Page: https://arxiv.org/abs/2308.16137

License: MIT License

Languages: Python 96.92%, Shell 3.08%
Topics: language-model, long-context, model-diagnostics

lm-infinite's People

Contributors

glaciohound


lm-infinite's Issues

Should the llama model be fine-tuned?

Hello! I am new to LLMs and I want to reproduce your nice work with the LLaMA model (not LLaMA-2).
Should I fine-tune LLaMA on ArXiv or OpenWebText2 before evaluating it?
As far as I understand, both of these datasets are part of LLaMA's pre-training data, so perhaps the raw LLaMA weights will just work?
Thank you so much for your reply!

Some errors.

Hi,

When I run the code, I encounter two errors:

1. The first error occurred when running evaluation on the passkey retrieval task:
Traceback (most recent call last):
File "scripts/eval_downstream_tasks.py", line 121, in
main(args)
File "scripts/eval_downstream_tasks.py", line 71, in main
output, output_ids = model.generate(
TypeError: generate() missing 1 required positional argument: 'do_sample'
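
A likely cause of the first error: the repo's generate() wrapper declares do_sample as a required argument that the eval script doesn't pass. A minimal call-site sketch, assuming the call otherwise keeps its existing inputs (every name other than do_sample is a guess, not the script's actual code):

    output, output_ids = model.generate(
        input_ids,        # whatever inputs the script already passes
        do_sample=False,  # added: greedy decoding for evaluation
    )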

2. The second error occurred when running generation:
Traceback (most recent call last):
File "scripts/eval_generation.py", line 107, in
main(args)
File "scripts/eval_generation.py", line 94, in main
scores = generation_overall_metric(
File "LM-Infinite/data/generation_metrics.py", line 6, in generation_overall_metric
rouge = evaluate.load("rouge")
File "python3.8/dist-packages/evaluate/loading.py", line 731, in load
evaluation_module = evaluation_module_factory(
File "python3.8/dist-packages/evaluate/loading.py", line 681, in evaluation_module_factory
raise FileNotFoundError(
FileNotFoundError: Couldn't find a module script at LM-Infinite/rouge/rouge.py. Module 'rouge' doesn't exist on the Hugging Face Hub either.
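
A likely cause of the second error: evaluate.load("rouge") has to fetch the metric script from the Hugging Face Hub and also needs the rouge_score package installed, and the FileNotFoundError suggests it found neither a local rouge.py nor a reachable Hub copy. A hedged workaround sketch; the local path is hypothetical:

    # pip install rouge_score     # dependency of the rouge metric
    import evaluate

    # With network access, the plain name resolves from the Hub:
    rouge = evaluate.load("rouge")

    # Offline, point evaluate at a locally downloaded copy of the metric
    # script instead (hypothetical path to a folder containing rouge.py):
    rouge = evaluate.load("path/to/local/rouge")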

Looking forward to your reply!

Implementation with RoPE

Hi, thanks for sharing this nice work!
I am a little confused about why all k vectors are kept unrotated while all q vectors are rotated on the global branch. Any explanation would be appreciated!
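
A toy sketch of one reading of this design (mine, not necessarily the authors'): RoPE scores depend only on the relative rotation between q and k, so leaving k unrotated and rotating q by the (capped) distance is equivalent to rotating both, and it lets the global branch clamp every faraway pair to the limit distance. Single-frequency 2-D example; the variable names are illustrative only:

    import math
    import torch

    def rope_rotate(x, pos, theta=0.1):
        # rotate a 2-D vector by angle pos * theta (toy single-frequency RoPE)
        a = pos * theta
        rot = torch.tensor([[math.cos(a), -math.sin(a)],
                            [math.sin(a),  math.cos(a)]])
        return rot @ x

    q, k = torch.randn(2), torch.randn(2)
    m, n = 5000, 100            # absolute positions of query and key
    limit = 4096                # e.g. the --limit_distance value

    # Scores depend only on the relative distance m - n:
    s_both = rope_rotate(q, m) @ rope_rotate(k, n)
    s_q_only = rope_rotate(q, m - n) @ k          # key left unrotated
    print(torch.allclose(s_both, s_q_only, atol=1e-4))   # True

    # Global-branch idea: keep k unrotated and rotate q at the capped
    # distance, so every distant pair behaves as if exactly `limit` apart.
    s_capped = rope_rotate(q, limit) @ k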

How to run inference?

The documentation does not make it clear how to perform inference using lambda attention.
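
Until the docs cover this, here is a hedged sketch of the Λ-shaped mask the paper describes (a global branch over the first tokens plus a causal local window). The parameter names mirror the --global_branch/--local_branch flags used by the eval scripts, but the function itself is illustrative, not the repo's API:

    import torch

    def lambda_mask(seq_len, n_global=100, n_local=4096):
        # Boolean [seq_len, seq_len] mask, True = query may attend to key.
        # Each query sees the first n_global tokens (global branch) plus
        # the most recent n_local tokens (local branch), causally.
        q = torch.arange(seq_len).unsqueeze(1)   # query positions
        k = torch.arange(seq_len).unsqueeze(0)   # key positions
        causal = k <= q
        global_branch = k < n_global
        local_branch = (q - k) < n_local
        return causal & (global_branch | local_branch)

    print(lambda_mask(8, n_global=2, n_local=3).int())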

limited_distance_forward() got an unexpected keyword argument 'padding_mask'

I'm trying to run the eval script.

PYTHONPATH=. deepspeed --include localhost:$CUDA_VISIBLE_DEVICES --master_port $MASTER_PORT \
    scripts/eval_downstream_tasks.py \
    --deepspeed_config configs/zero3_efficient_config.json \
    --model meta-llama/Llama-2-7b-hf --tokenizer_path meta-llama/Llama-2-7b-hf \
    --use_lambda_attention --local_branch 4096 --global_branch 100 --limit_distance 4096 \
    --dataset passkey_retrieval --dataset_dir ${PASSKEY_DATA} --dataset_group ${MAX_LENGTH} \
    --max_generation_length 10 --evaluate_metrics \
    --log_dir $LOG_DIR/$TRIAL
[screenshot of the error]
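
This looks like a transformers version mismatch: newer releases (around v4.34) started passing an extra padding_mask keyword into the Llama attention forward, which a patched forward that doesn't declare it will reject. Two hedged options: pin the transformers version the repo was developed against, or make the patched forward tolerant of unknown kwargs, for example with a small wrapper like the sketch below (names are assumptions, not the repo's code):

    import functools

    def drop_padding_mask(forward_fn):
        # Strip the `padding_mask` kwarg that newer transformers versions
        # pass into attention forwards before calling the patched forward.
        @functools.wraps(forward_fn)
        def wrapped(*args, **kwargs):
            kwargs.pop("padding_mask", None)
            return forward_fn(*args, **kwargs)
        return wrapped

    # usage sketch (layer lookup is hypothetical):
    # attn.forward = drop_padding_mask(attn.forward)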

GPTNeoX or Transformers support?

I'm trying to integrate LM-Infinite into GPTNeoX (pythia-deduped). I managed to get lambda_attn working, but GPTNeoX's rotary implementation is a bit different, and its attention uses a single 3 * hidden_size projection to form QKV, whereas the other model has separate 1 * hidden_size layers for independent Q/K/V. It trains fine, but during inference or evaluation (single batch) I get stuck on a shape mismatch.
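
On the shape mismatch: GPT-NeoX's fused query_key_value projection packs Q/K/V per head as [..., num_heads, 3 * head_size], so a Llama-style split of the full hidden dimension into three chunks gives the wrong layout. A standalone sketch of unpacking it the way the HF GPT-NeoX attention does; not a drop-in for the repo:

    import torch

    def split_fused_qkv(qkv, num_heads):
        # qkv: [batch, seq, 3 * hidden_size] from the fused projection.
        # Returns q, k, v, each [batch, seq, num_heads, head_size].
        b, s, three_h = qkv.shape
        head_size = three_h // (3 * num_heads)
        qkv = qkv.view(b, s, num_heads, 3 * head_size)
        q, k, v = qkv.split(head_size, dim=-1)
        return q, k, v

    q, k, v = split_fused_qkv(torch.randn(1, 16, 3 * 1024), num_heads=16)
    print(q.shape, k.shape, v.shape)   # each: [1, 16, 16, 64]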

I did manage to see the training benefit of lambda_attn, with higher it/s. The GPU metrics are smoother and steadier at high throughput. The CPU also shows higher compute demand than with traditional training, but it doesn't appear to cause any contention. As a test, I managed to train with a larger context on the same hardware at higher performance, so that clearly works.

I was wondering whether a folder, or a separate repo, with these modeling_$model.py files that can fit into transformers would help simplify setup and adoption?
