glaciohound / lm-infinite
Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
Home Page: https://arxiv.org/abs/2308.16137
License: MIT License
Hello! I am new to LLMs and I want to reproduce your nice work with the LLaMA model (not Llama 2).
Should I fine-tune LLaMA on ArXiv or OpenWebText2 before evaluating it?
As I understand it, both datasets are part of LLaMA's pre-training data, so maybe the raw LLaMA weights just work?
Thank you so much for your reply!
Hi,
When I run the code, I encounter two errors:
1. Error when running the evaluation on the passkey retrieval task:
Traceback (most recent call last):
  File "scripts/eval_downstream_tasks.py", line 121, in <module>
    main(args)
  File "scripts/eval_downstream_tasks.py", line 71, in main
    output, output_ids = model.generate(
TypeError: generate() missing 1 required positional argument: 'do_sample'
2. Error when running the generation evaluation:
Traceback (most recent call last):
  File "scripts/eval_generation.py", line 107, in <module>
    main(args)
  File "scripts/eval_generation.py", line 94, in main
    scores = generation_overall_metric(
  File "LM-Infinite/data/generation_metrics.py", line 6, in generation_overall_metric
    rouge = evaluate.load("rouge")
  File "python3.8/dist-packages/evaluate/loading.py", line 731, in load
    evaluation_module = evaluation_module_factory(
  File "python3.8/dist-packages/evaluate/loading.py", line 681, in evaluation_module_factory
    raise FileNotFoundError(
FileNotFoundError: Couldn't find a module script at LM-Infinite/rouge/rouge.py. Module 'rouge' doesn't exist on the Hugging Face Hub either.
Looking forward to your reply!
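This note concerns the second error. Our reading of the message, which the thread does not confirm, is as follows: evaluate first looked for a local rouge/rouge.py and then could not fetch the metric script from the Hugging Face Hub. A machine without network access would produce exactly this. As a stopgap while debugging, ROUGE-L can be approximated locally. The sketch below computes a simplified ROUGE-L F1 using lowercasing, whitespace tokens and no stemming, so its numbers will not match the rouge_score package exactly. It is not the metric used by generation_metrics.py. The helper names lcs_length and rouge_l_f1 are hypothetical, and the code sticks to typing.List so it runs on the Python 3.8 shown in the traceback.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-L F1: lowercase, whitespace tokenization, no stemming."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # LCS is "the cat on the mat" (5 tokens), so precision = recall = 5/6.
    print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
```

This is only useful as a quick sanity check. For numbers comparable to the paper, the original evaluate-based metric would need to be restored once the Hub is reachable.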
Hi, thanks for sharing this nice work!
I am a little confused about why all k vectors are kept unrotated while all q vectors are rotated on the global branch. Any explanation would be appreciated!
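Here is one way to think about the question. With rotary position embeddings (RoPE), the q·k score depends only on the offset between the two positions. So leaving a key unrotated (position 0) and rotating the query by d gives the same score as a pair that sits d positions apart. Suppose the global branch rotates queries by a fixed amount such as limit_distance; this is our assumption, not something the authors confirm here. Then every global-branch token would be seen at that same capped distance. The pure-Python sketch below checks the offset property numerically. The helper names rope and score, the dimension 8 and the base 10000 are illustrative choices.

```python
import cmath
import random
from typing import List


def rope(vec: List[float], pos: int, base: float = 10000.0) -> List[complex]:
    """Rotary embedding of an even-length real vector at position `pos`.

    Each pair (x[2i], x[2i+1]) is treated as one complex number and rotated
    by the angle pos * theta_i, where theta_i = base ** (-2i / dim).
    """
    dim = len(vec)
    out = []
    for i in range(dim // 2):
        theta = base ** (-2 * i / dim)
        z = complex(vec[2 * i], vec[2 * i + 1])
        out.append(z * cmath.exp(1j * pos * theta))
    return out


def score(q: List[complex], k: List[complex]) -> float:
    """Real dot product of the underlying real vectors: sum of Re(q_i * conj(k_i))."""
    return sum((a * b.conjugate()).real for a, b in zip(q, k))


random.seed(0)
dim = 8
q = [random.gauss(0.0, 1.0) for _ in range(dim)]
k = [random.gauss(0.0, 1.0) for _ in range(dim)]
limit_distance = 4096  # value taken from the eval command in this thread

# Standard RoPE: query at position 9000, key at position 4904 (offset 4096).
s_both_rotated = score(rope(q, 9000), rope(k, 4904))
# Assumed global-branch style: key unrotated, query rotated by limit_distance only.
s_query_only = score(rope(q, limit_distance), rope(k, 0))
print(abs(s_both_rotated - s_query_only) < 1e-6)  # True
```

Because only the offset matters, rotating the query alone is enough to place every unrotated key at one fixed relative distance. The same effect could be obtained by rotating each key individually, but that would cost extra work per key.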
The documentation does not make clear how to run inference with lambda attention.
I'm trying to run the eval script:
PYTHONPATH=. deepspeed --include localhost:$CUDA_VISIBLE_DEVICES --master_port $MASTER_PORT scripts/eval_downstream_tasks.py --deepspeed_config configs/zero3_efficient_config.json --model meta-llama/Llama-2-7b-hf --tokenizer_path meta-llama/Llama-2-7b-hf --use_lambda_attention --local_branch 4096 --global_branch 100 --limit_distance 4096 --dataset passkey_retrieval --dataset_dir ${PASSKEY_DATA} --dataset_group ${MAX_LENGTH} --max_generation_length 10 --evaluate_metrics --log_dir $LOG_DIR/$TRIAL
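This is a rough guide to the flags in that command. The paper describes a Λ-shaped attention mask, in which every query sees a few starting tokens plus a window of recent tokens. It pairs that mask with a cap on the distance used for positional encoding. The sketch below is our guess at how --global_branch, --local_branch and --limit_distance map onto those pieces. lambda_mask and effective_distance are hypothetical helpers, not the repository's code, and the boundary conventions (< versus <=) are assumptions.

```python
from typing import List


def lambda_mask(seq_len: int, local_branch: int, global_branch: int) -> List[List[bool]]:
    """mask[i][j] is True if query i may attend to key j (assumed convention).

    The mask is causal. Key j is visible if it is one of the first
    `global_branch` tokens or lies within the last `local_branch` positions up to i.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i
            is_global = j < global_branch
            is_local = i - j < local_branch
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def effective_distance(i: int, j: int, limit_distance: int) -> int:
    """Relative distance fed to the positional encoding, capped at limit_distance."""
    return min(i - j, limit_distance)


if __name__ == "__main__":
    # Small example: 2 global tokens, local window of 3.
    for row in lambda_mask(seq_len=8, local_branch=3, global_branch=2):
        print("".join("x" if v else "." for v in row))
    # Printed rows (x = attended):
    # x.......
    # xx......
    # xxx.....
    # xxxx....
    # xxxxx...
    # xx.xxx..
    # xx..xxx.
    # xx...xxx
```

The command above sets global_branch to 100 and sets both local_branch and limit_distance to 4096. 4096 matches Llama 2's pre-training context length.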
I'm trying to integrate LM-Infinite into GPTNeoX (pythia-dedup). I managed to get lambda_attn working, but the rotary implementation in GPTNeoX is a bit different. It uses a single fused projection of 3 * hidden_size to form QKV, whereas the other models have separate 1 * hidden_size layers for Q, K and V. Training runs, but during inference or evaluation (single batch) I get stuck on a shape mismatch.
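A likely source of shape mismatches when porting from LLaMA-style models is how the fused query_key_value output is laid out. LLaMA-style models have separate q_proj, k_proj and v_proj layers. As we understand the Hugging Face GPTNeoX implementation, the 3 * hidden_size output is first viewed as (batch, seq, num_heads, 3 * head_size). Each head's slice is then split into Q, K and V, so the layout is interleaved per head rather than three [Q | K | V] blocks. GPT-NeoX also applies rotary only to the first rotary_pct fraction of each head's dimensions. The numpy sketch below illustrates that split under those assumptions, using the hypothetical helper names split_fused_qkv and split_rotary_dims. Please check it against the modeling_gpt_neox.py in your installed transformers version.

```python
import numpy as np


def split_fused_qkv(qkv: np.ndarray, num_heads: int):
    """Split a fused GPT-NeoX-style QKV projection into Q, K, V.

    qkv: (batch, seq, 3 * hidden_size) output of the fused query_key_value layer.
    Returns three arrays of shape (batch, num_heads, seq, head_size).
    Assumes the per-head interleaved layout [q_h | k_h | v_h] for each head h.
    """
    batch, seq, fused = qkv.shape
    head_size = fused // (3 * num_heads)
    assert fused == 3 * num_heads * head_size, "fused dim must be 3 * hidden_size"
    per_head = qkv.reshape(batch, seq, num_heads, 3 * head_size)
    q = per_head[..., :head_size].transpose(0, 2, 1, 3)
    k = per_head[..., head_size:2 * head_size].transpose(0, 2, 1, 3)
    v = per_head[..., 2 * head_size:].transpose(0, 2, 1, 3)
    return q, k, v


def split_rotary_dims(x: np.ndarray, rotary_pct: float):
    """Split the last dim into the rotated part and the pass-through part."""
    rotary_ndims = int(x.shape[-1] * rotary_pct)
    return x[..., :rotary_ndims], x[..., rotary_ndims:]


batch, seq, num_heads, head_size = 1, 5, 4, 8
hidden_size = num_heads * head_size
qkv = np.arange(batch * seq * 3 * hidden_size, dtype=np.float32).reshape(batch, seq, 3 * hidden_size)
q, k, v = split_fused_qkv(qkv, num_heads)
print(q.shape, k.shape, v.shape)  # (1, 4, 5, 8) for each of q, k, v
q_rot, q_pass = split_rotary_dims(q, rotary_pct=0.25)
print(q_rot.shape, q_pass.shape)  # (1, 4, 5, 2) (1, 4, 5, 6)
```

Suppose lambda_attn was written for separate Q/K/V tensors with rotary applied over the full head dimension. Then passing it GPT-NeoX tensors where only rotary_ndims dimensions are rotated is one plausible source of the mismatch. This is a guess, not a diagnosis.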
I did see a training benefit from lambda_attn, with higher it/s. The GPU metrics are smoother and steadier at high throughput. The CPU also shows higher compute demand than in standard training, but this doesn't appear to cause any contention. As a test, I managed to train with a longer context on the same hardware and at higher throughput, so that part clearly works.
I was wondering whether a folder or a separate repo of modeling_$model.py files that plug into transformers would help simplify setup and adoption?