Comments (6)
from transformers.
I successfully performed the following steps to test whether adding Llama 3 support to this script would facilitate the use case I outlined above:
I downloaded Llama 2 in meta format (not HF format) with the torchtune CLI. It is important not to test with a model in HF format, because Llama 3 has not yet been officially uploaded in HF format; testing with HF-format Llama 2 would not be an apples-to-apples comparison.
CLI command used:
tune download meta-llama/Llama-2-7b-chat --output-dir <checkpoint_path> --hf-token $HF_TOKEN
I then fine-tuned the downloaded meta-format Llama 2 with torchtune:
tune run \
--nproc_per_node=4 \
lora_finetune_distributed \
--config llama2/7B_lora \
batch_size=1 \
seed=29 \
tokenizer.path=<checkpoint_path> \
checkpointer.checkpoint_dir=<checkpoint_path> \
checkpointer.output_dir=<checkpoint_path> \
dataset=torchtune.datasets.my_custom_dataset \
checkpointer.checkpoint_files=['consolidated.00.pth'] \
checkpointer=torchtune.utils.FullModelMetaCheckpointer \
gradient_accumulation_steps=1 \
lr_scheduler.num_warmup_steps=5 \
enable_activation_checkpointing=False \
dataset.max_rows=100 \
epochs=1
Then I converted the fine-tuned model to a format that can be loaded with from_pretrained(), using the convert_llama_weights_to_hf() function. I named the script convert_checkpoint.py, but it just contains the convert_llama_weights_to_hf() function copy/pasted from Transformers:
python convert_checkpoint.py --input_dir <checkpoint_dir> --output_dir <output_dir> --model_size 7B
I then ran a Gradio chatbot on the fine-tuned/converted model, and it worked as expected. So, it does seem that adding Llama 3 support to this script will unlock my particular use case (and probably many others).
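For context on the chatbot step: a Llama 2 chat model expects prompts in the [INST]/<<SYS>> format, which a Gradio handler would assemble for each turn. A minimal single-turn sketch (an assumption about the serving code, not part of the conversion script; with a transformers tokenizer you would normally call tokenizer.apply_chat_template instead):

```python
def build_llama2_prompt(system: str, user: str) -> str:
    # Single-turn Llama 2 chat prompt: the system message is wrapped in
    # <<SYS>> tags inside the [INST] ... [/INST] block. BOS/EOS tokens are
    # added by the tokenizer, so they are omitted here.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
```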
see #30334
Thank you! I will keep an eye on that PR.
Just needs the doc 🤗
Good to know.
Let me know if that's something I could contribute to; otherwise, thanks for the good work.