Comments (4)
Hi @vsocrates
Thanks for the issue!
You are getting that warning because the model's maximum positional embedding stops at 1024 tokens. Some models have fixed positional embeddings (e.g. an nn.Embedding layer), where you cannot exceed the maximum number of tokens by design; for other models it is possible to exceed it, at your own risk, since the model has not been trained on that many tokens. If you are getting consistent / nice generations, I would say there is nothing to worry about; otherwise you might need to use other models that support a longer context length.
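For what it's worth, here is a minimal sketch of where that limit lives, assuming GPT-2 as the example of a model with learned absolute position embeddings capped at 1024 tokens (your model may differ):

```python
# Sketch: inspect the 1024-token position-embedding limit (assumes GPT-2).
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("gpt2")
# Inputs longer than this trigger the "maximum sequence length" warning.
print(config.max_position_embeddings)  # 1024

model = AutoModelForCausalLM.from_pretrained("gpt2")
# The limit comes from a fixed-size learned embedding table for positions,
# so positions past 1024 simply have no trained embedding.
print(model.transformer.wpe)  # Embedding(1024, 768)
```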
Understood! I went through some other issues and it looks like T5 uses relative position embeddings, so in theory it should be possible to extend beyond its max context length (512 tokens), potentially with some loss of accuracy / weird generations. Is that correct?
Yes, this is correct. From my experience with flan-T5 that was possible, but with some potential loss of accuracy / inconsistent generation.
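For reference, a minimal sketch of feeding flan-T5 an input longer than 512 tokens; this assumes google/flan-t5-base and an artificially long input built by repetition. Because T5 computes a relative position bias rather than looking up absolute position embeddings, the forward pass runs, but quality beyond the trained length is not guaranteed:

```python
# Sketch: run flan-T5 on an input past its nominal 512-token limit
# (assumes google/flan-t5-base; quality past 512 tokens may degrade).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

long_text = "Summarize: " + "The quick brown fox jumps over the lazy dog. " * 200
inputs = tokenizer(long_text, return_tensors="pt", truncation=False)
print(inputs["input_ids"].shape)  # well beyond 512 tokens; the tokenizer may warn

# Relative position embeddings let this run end to end.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```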
Great, thanks, closing this issue!