Comments (6)

amyeroberts avatar amyeroberts commented on May 22, 2024

cc @ArthurZucker

from transformers.

calmitchell617 avatar calmitchell617 commented on May 22, 2024

I successfully performed the following steps to test whether adding Llama 3 support to this script would facilitate the use case I outlined above:

I downloaded Llama 2 in Meta format (not HF format) with the torchtune CLI. It is important not to test with a model in HF format: Llama 3 is not yet officially uploaded in HF format, so that wouldn't be an apples-to-apples comparison.

CLI command used:

tune download meta-llama/Llama-2-7b-chat --output-dir <checkpoint_path> --hf-token $HF_TOKEN

I then fine-tuned the downloaded Meta-format Llama 2 with torchtune:

tune run \
    --nproc_per_node=4 \
    lora_finetune_distributed \
    --config llama2/7B_lora \
    batch_size=1 \
    seed=29 \
    tokenizer.path=<checkpoint_path> \
    checkpointer.checkpoint_dir=<checkpoint_path> \
    checkpointer.output_dir=<checkpoint_path> \
    dataset=torchtune.datasets.my_custom_dataset \
    checkpointer.checkpoint_files=['consolidated.00.pth'] \
    checkpointer=torchtune.utils.FullModelMetaCheckpointer \
    gradient_accumulation_steps=1 \
    lr_scheduler.num_warmup_steps=5 \
    enable_activation_checkpointing=False \
    dataset.max_rows=100 \
    epochs=1

Then I converted the fine-tuned model to a format that can be loaded with from_pretrained(), using the convert_llama_weights_to_hf() function. I named the script convert_checkpoint.py, but it just contains the convert_llama_weights_to_hf() function copied and pasted from Transformers:

python convert_checkpoint.py --input_dir <checkpoint_dir> --output_dir <output_dir>  --model_size 7B
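
For anyone following along, here is a minimal sketch of how the converted directory might be loaded afterwards. It assumes the conversion wrote a standard HF layout (a config.json next to the weights) into <output_dir>; the helper names has_hf_config and load_converted are hypothetical and are not part of Transformers or torchtune.

```python
from pathlib import Path


def has_hf_config(output_dir: str) -> bool:
    """Return True if output_dir contains the config.json that from_pretrained() reads."""
    return (Path(output_dir) / "config.json").is_file()


def load_converted(output_dir: str):
    """Load the converted model and tokenizer (needs transformers and torch installed)."""
    # Imported lazily so the directory check above works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if not has_hf_config(output_dir):
        raise FileNotFoundError(
            f"No config.json in {output_dir}; did the conversion finish?"
        )
    tokenizer = AutoTokenizer.from_pretrained(output_dir)
    model = AutoModelForCausalLM.from_pretrained(output_dir)
    return model, tokenizer
```

Calling load_converted() with the same path passed as --output_dir should return a model and tokenizer that can be used for generation, assuming the conversion completed without errors.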

I then ran a Gradio chatbot on the fine-tuned/converted model, and it worked as expected. So, it does seem that adding Llama 3 support to this script will unlock my particular use case (and probably many others).

ArthurZucker avatar ArthurZucker commented on May 22, 2024

see #30334

calmitchell617 avatar calmitchell617 commented on May 22, 2024

Thank you! I will keep an eye on that PR.

ArthurZucker avatar ArthurZucker commented on May 22, 2024

Just needs the doc 🤗

calmitchell617 avatar calmitchell617 commented on May 22, 2024

Good to know.

Let me know if that's something I could contribute. Otherwise, thanks for the good work.
