Comments (6)
from transformers.
I successfully performed the following steps to test whether adding Llama 3 support to this script would facilitate the use case I outlined above:
I downloaded Llama 2 in meta format (not HF format) with the torchtune CLI. It is important not to test with a model in HF format, because Llama 3 has not yet been officially uploaded in HF format; testing with HF-format Llama 2 would not be an apples-to-apples comparison.
CLI command used:
tune download meta-llama/Llama-2-7b-chat --output-dir <checkpoint_path> --hf-token $HF_TOKEN
I then fine-tuned the downloaded meta-format Llama 2 with torchtune:
tune run \
--nproc_per_node=4 \
lora_finetune_distributed \
--config llama2/7B_lora \
batch_size=1 \
seed=29 \
tokenizer.path=<checkpoint_path> \
checkpointer.checkpoint_dir=<checkpoint_path> \
checkpointer.output_dir=<checkpoint_path> \
dataset=torchtune.datasets.my_custom_dataset \
checkpointer.checkpoint_files=['consolidated.00.pth'] \
checkpointer=torchtune.utils.FullModelMetaCheckpointer \
gradient_accumulation_steps=1 \
lr_scheduler.num_warmup_steps=5 \
enable_activation_checkpointing=False \
dataset.max_rows=100 \
epochs=1
Then I converted the fine-tuned model to a format that can be loaded with from_pretrained(), using the convert_llama_weights_to_hf() function. I named the script convert_checkpoint.py, but it just contains the convert_llama_weights_to_hf() function copy/pasted from Transformers:
python convert_checkpoint.py --input_dir <checkpoint_dir> --output_dir <output_dir> --model_size 7B
I then ran a Gradio chatbot on the fine-tuned/converted model, and it worked as expected. So, it does seem that adding Llama 3 support to this script will unlock my particular use case (and probably many others).
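For context on the chatbot step: a Llama 2 chat model expects prompts in the [INST]/<<SYS>> format, which a Gradio handler would assemble for each turn. A minimal single-turn sketch (an assumption about the serving code, not part of the conversion script; with a transformers tokenizer you would normally call tokenizer.apply_chat_template instead):

```python
def build_llama2_prompt(system: str, user: str) -> str:
    # Single-turn Llama 2 chat prompt: the system message is wrapped in
    # <<SYS>> tags inside the [INST] ... [/INST] block. BOS/EOS tokens are
    # added by the tokenizer, so they are omitted here.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
```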
see #30334
Thank you! I will keep an eye on that PR.
Just needs the doc 🤗
Good to know.
Let me know if that's something I could contribute to; otherwise, thanks for the good work.