Comments (2)
In [3]: assistant_model.config
Out[3]:
LlamaConfig {
  "_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
}
In [4]: model.config
Out[4]:
LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.0.dev0",
  "use_cache": true,
  "vocab_size": 128256
}
The vocab sizes are very different 😉 you need to resize the embeddings or choose a better assistant that was trained with the same vocab size!
Closing, as the issue is simply that the assistant model is not compatible!
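For anyone hitting the same failure: assisted (speculative) decoding verifies the draft model's token IDs directly against the target model, so the two checkpoints need to agree on the vocabulary. Below is a minimal sketch of the compatibility check and the generate call; the assistant checkpoint and prompt are my own examples (meta-llama/Llama-3.2-1B-Instruct shares the Llama-3 tokenizer, but any small model trained with the same tokenizer will do):

from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# Assumed draft model: it shares Llama-3's 128256-token vocabulary.
assistant_id = "meta-llama/Llama-3.2-1B-Instruct"

model = AutoModelForCausalLM.from_pretrained(target_id)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant_id)

# Guard against the mismatch reported in this issue (128256 vs 32000).
if model.config.vocab_size != assistant_model.config.vocab_size:
    raise ValueError(
        f"Incompatible assistant: vocab {assistant_model.config.vocab_size} "
        f"!= {model.config.vocab_size}"
    )

tokenizer = AutoTokenizer.from_pretrained(target_id)
inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
outputs = model.generate(**inputs, assistant_model=assistant_model, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Matching vocab_size makes the call run, but for a real speedup the assistant must also use the same tokenizer, so that its drafted token IDs mean the same thing to the target model.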
Thanks, that makes a lot of sense!