llm-efficiency-challenge / neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
Hello, I have a question about that file.
Other than that, there seem to be a total of three files that can be used to run the HELM benchmark, and I wonder whether the final evaluation is done with that file or with "run_specs_full_coase_600_budge.conf".
Thank you.
Hey, I am trying to run the tutorial at https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge/tree/master/sample-submissions/lit-gpt
It doesn't say so explicitly, but if I want to make a Lit-GPT submission, do I have to cd into /neurips_llm_efficiency_challenge/sample-submissions/lit-gpt because that's where the Dockerfile is?
Now, when running the docker build step, I am getting the following error:
master ~/neurips_llm_efficiency_challenge/sample-submissions/lit-gpt docker build -t toy_submission .
[+] Building 2.5s (12/16) docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.52kB 0.0s
=> [internal] load metadata for ghcr.io/pytorch/pytorch-nightly:c69b6e5-cu11.8.0 0.1s
=> [ 1/12] FROM ghcr.io/pytorch/pytorch-nightly:c69b6e5-cu11.8.0@sha256:748628fda7661f7e0612299b2012c 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 19.21kB 0.0s
=> CACHED [ 2/12] WORKDIR /submission 0.0s
=> CACHED [ 3/12] COPY /lit-gpt/ /submission/ 0.0s
=> CACHED [ 4/12] COPY ./fast_api_requirements.txt fast_api_requirements.txt 0.0s
=> CACHED [ 5/12] RUN pip install --no-cache-dir --upgrade -r fast_api_requirements.txt 0.0s
=> CACHED [ 6/12] RUN apt-get update && apt-get install -y git 0.0s
=> CACHED [ 7/12] RUN pip install -r requirements.txt huggingface_hub sentencepiece 0.0s
=> ERROR [ 8/12] RUN python scripts/download.py --repo_id openlm-research/open_llama_3b 2.3s
Trying the last step manually, it gives me an error that the scripts/download.py file doesn't exist. Shouldn't it be lit-gpt/scripts/download.py instead?
Following the README, I tried building the Docker image, but I get the following error in the weights-conversion step.
=> ERROR [10/13] RUN python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/open-llama/7B --model_size 7B 66.8s
------
> [10/13] RUN python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/open-llama/7B --model_size 7B:
65.78 Initializing lit-llama
65.78 Saving to disk at checkpoints/lit-llama/7B
65.78 Processing checkpoints/open-llama/7B/pytorch_model-00001-of-00002.bin
65.78 Traceback (most recent call last):
65.78 File "/submission/scripts/convert_hf_checkpoint.py", line 136, in convert_hf_checkpoint
65.78 sd[sd_key] = saver.store_early(sd[sd_key])
65.78 File "/submission/lit_llama/utils.py", line 469, in store_early
65.78 return SavingProxyForTensor(tensor, self)
65.78 File "/submission/lit_llama/utils.py", line 387, in __init__
65.78 storage_proxy = SavingProxyForStorage(
65.78 File "/submission/lit_llama/utils.py", line 363, in __init__
65.78     storage_key = saver._write_storage_and_return_key(storage)
65.78   File "/submission/lit_llama/utils.py", line 492, in _write_storage_and_return_key
65.79 self.zipfile.write_record(name, storage.data_ptr(), num_bytes)
65.79 RuntimeError: [enforce fail at inline_container.cc:471] . PytorchStreamWriter failed writing file data/151: file write failed
65.79
65.79 During handling of the above exception, another exception occurred:
65.79
65.79 Traceback (most recent call last):
65.79 File "/submission/scripts/convert_hf_checkpoint.py", line 166, in <module>
65.79 CLI(convert_hf_checkpoint)
65.79 File "/opt/conda/lib/python3.10/site-packages/jsonargparse/_cli.py", line 85, in CLI
65.79 return _run_component(component, cfg_init)
65.79 File "/opt/conda/lib/python3.10/site-packages/jsonargparse/_cli.py", line 147, in _run_component
65.79 return component(**cfg)
65.79 File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
65.79 return func(*args, **kwargs)
65.79 File "/submission/scripts/convert_hf_checkpoint.py", line 88, in convert_hf_checkpoint
65.79 with incremental_save(output_dir / "lit-llama.pth") as saver:
65.79 File "/submission/lit_llama/utils.py", line 496, in __exit__
65.79 self.zipfile.write_end_of_file()
65.79 RuntimeError: [enforce fail at inline_container.cc:337] . unexpected pos 17973300160 vs 17973300056
------
Dockerfile:22
--------------------
20 | # get open-llama weights
21 | RUN python scripts/download.py --repo_id openlm-research/open_llama_7b --local_dir checkpoints/open-llama/7B
22 | >>> RUN python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/open-llama/7B --model_size 7B
23 |
24 | # Copy over single file server
--------------------
ERROR: failed to solve: process "/bin/sh -c python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/open-llama/7B --model_size 7B" did not complete successfully: exit code: 1
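For what it's worth, the two RuntimeErrors from PytorchStreamWriter above ("file write failed" and "unexpected pos") are commonly symptoms of the disk filling up while the converted lit-llama.pth checkpoint is being written. As a minimal sketch (the path and the size threshold below are assumptions, not from the repo), you can check free space on the host before rebuilding:

```python
import shutil

# Check free space on the filesystem Docker builds on (path is an
# assumption; on many Linux hosts Docker data lives under /var/lib/docker).
total, used, free = shutil.disk_usage("/")
free_gb = free / 1e9
print(f"free: {free_gb:.1f} GB")

# A 7B-parameter fp32 checkpoint is roughly 28 GB on disk (illustrative
# figure), so the conversion step needs at least that much headroom.
enough = free_gb > 30
```

If `enough` is False, pruning old images (`docker system prune`) or moving the Docker data root to a larger disk usually resolves this class of failure.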
$ docker build -t toy_submission .
[+] Building 0.1s (2/2) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
ERROR: failed to solve: failed to read dockerfile: open /var/lib/docker/tmp/buildkit-mount3987352626/Dockerfile: no such file or directory
cc: @carmocca
Hi!
I just wanted to point out that the NeurIPS Competitions blog post still links to https://weiweiy.github.io/, which forwards to https://kaleab-k.github.io/LLMEC/index instead of https://llm-efficiency-challenge.github.io/.
Best,
Harald
I wonder if you might consider getting rid of the conlang_translation task. Doing multilingual NLP is a whole different thing, and it's useful, but if everything else is English, then having one translation task seems like a distraction.
...And maybe language_identification. That's already so easy and effective with fasttext-style models anyway that it seems like a low-priority thing for people to be working on with LLMs.
AFAICT removing these two would mean people can focus on English tasks. There's already a lot to do, so this would make life easier!
cc @weiweiy, @artidoro and @perlitz as requested by @msaroufim
I got an error in this code block:

from quantize.bnb import Linear4bit

class QuantizedLinear(Linear4bit):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, quant_type="nf4", compress_statistics=True, **kwargs)

How can I fix it?
Ref: https://github.com/Lightning-AI/lit-llama
The open-source code in this repository works with the original LLaMA weights that are distributed by Meta under a research-only license.
New Apache 2.0 licensed weights are being released as part of the Open LLaMA project. To use the Open LLaMA weights or other LLaMA-like checkpoints such as Vicuna, check out the Lit-GPT repository.
In the toy submission docs here, I think you'd also need to first install the dependencies, e.g., via pip install -r requirements.txt from the lit-gpt subfolder, before running docker build -t toy_submission .
Or perhaps this should be added as a step in the Dockerfile (but I don't know enough about Docker).
Could we allow users to be able to choose the batch size when doing evaluation? It is important to be able to change batch size when experimenting with different sized models on GPU. @drisspg
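As a hedged illustration of what a configurable batch size amounts to (the function below is hypothetical, not part of the challenge harness): the evaluation prompts just need to be chunked before each forward pass.

```python
def batched(items, batch_size):
    """Yield successive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Example: 5 prompts with batch_size=2 means 3 forward passes
# instead of 5, trading memory for throughput.
prompts = [f"prompt {i}" for i in range(5)]
batches = list(batched(prompts, batch_size=2))
```

Exposing batch_size as a user-facing knob would let people tune this trade-off per model size and GPU.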
In the line here, the lengths of the iterables passed to zip do not match. More specifically, tokens includes all of the tokens (i.e., input prompt + model response), whereas logprobs corresponds only to the newly generated tokens (i.e., the model response). As a result, I believe the generated_tokens variable has incorrect values stored in it.
Can someone look into this / confirm it?
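To make the mismatch concrete, here is a minimal sketch (the token values and prompt length are made up; only the zip behavior matters). Python's zip silently truncates to the shortest iterable, so the leading prompt tokens get paired with the response-only logprobs:

```python
tokens = ["The", "cat", "sat", "on", "a", "mat"]   # prompt + generated response
logprobs = [-0.5, -0.2]                            # generated tokens only

# zip() stops at the shorter iterable, so the FIRST two entries of
# tokens (prompt tokens) are paired with the response logprobs.
generated_tokens = list(zip(tokens, logprobs))
# -> [("The", -0.5), ("cat", -0.2)]  -- misaligned

# One possible fix: skip the prompt tokens before zipping.
num_prompt_tokens = 4
aligned = list(zip(tokens[num_prompt_tokens:], logprobs))
# -> [("a", -0.5), ("mat", -0.2)]
```

With `strict=True` (Python 3.10+), `zip` would raise instead of silently truncating, which would have surfaced this bug immediately.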
ubuntu@146-235-201-180:~/neurips_llm_efficiency_challenge/sample-submissions/lit-gpt$ sudo docker build -t toy_submission .
[+] Building 0.5s (7/16)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.38kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for ghcr.io/pytorch/pytorch-nightly:c69b6e5-cu11.8.0 0.5s
=> [internal] load build context 0.0s
=> => transferring context: 128B 0.0s
=> [ 1/12] FROM ghcr.io/pytorch/pytorch-nightly:c69b6e5-cu11.8.0@sha256:748628fda7661f7e0612299b2012ca3a9407ac920ea791398f9d553de8a43380 0.0s
=> CACHED [ 2/12] WORKDIR /submission 0.0s
=> ERROR [ 3/12] COPY /lit-gpt/ /submission/ 0.0s
------
> [ 3/12] COPY /lit-gpt/ /submission/:
------
Dockerfile:10
--------------------
8 |
9 | # Copy the specific file into the container at /submission
10 | >>> COPY /lit-gpt/ /submission/
11 |
12 | # Setup server requriements
--------------------
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::de7206o7kdfxfqkatsgfjhmw2: "/lit-gpt": not found
I followed all the instructions in README.md, but I got the above error.
Does the Dockerfile need to be modified?