Actually, we want inv_freq to be in float32 to keep as much precision as possible.
Because the code specifies the dtype explicitly, the default torch dtype is not used:
import torch
import torch.nn as nn

class myModule(nn.Module):
    def __init__(self):
        super().__init__()
        # explicit dtype: the global default dtype is ignored
        self.register_buffer("freq", torch.ones(12, 12, dtype=torch.float32))

torch.set_default_dtype(torch.bfloat16)
module = myModule()
module.freq.dtype  # torch.float32
vs
import torch
import torch.nn as nn

class myModule(nn.Module):
    def __init__(self):
        super().__init__()
        # no dtype given: the buffer follows the global default dtype
        self.register_buffer("freq", torch.ones(12, 12))

torch.set_default_dtype(torch.bfloat16)
module = myModule()
module.freq.dtype  # torch.bfloat16
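Both snippets can be folded into one self-contained check. This is an illustrative sketch, not code from the thread: the class names and the helper build_with_default_dtype are hypothetical. The helper sets the default dtype to bfloat16 only while a module is constructed and then restores the previous default, so the global setting does not leak into later code.

```python
import torch
import torch.nn as nn


class ExplicitDtypeModule(nn.Module):
    def __init__(self):
        super().__init__()
        # dtype is given explicitly, so the global default is not consulted
        self.register_buffer("freq", torch.ones(12, 12, dtype=torch.float32))


class DefaultDtypeModule(nn.Module):
    def __init__(self):
        super().__init__()
        # no dtype given, so torch.ones falls back to the global default
        self.register_buffer("freq", torch.ones(12, 12))


def build_with_default_dtype(module_cls, dtype):
    """Hypothetical helper: instantiate module_cls while `dtype` is the default."""
    previous = torch.get_default_dtype()
    torch.set_default_dtype(dtype)
    try:
        return module_cls()
    finally:
        # restore the previous default even if construction raises
        torch.set_default_dtype(previous)


explicit = build_with_default_dtype(ExplicitDtypeModule, torch.bfloat16)
implicit = build_with_default_dtype(DefaultDtypeModule, torch.bfloat16)
print(explicit.freq.dtype)  # torch.float32
print(implicit.freq.dtype)  # torch.bfloat16
```

Restoring the default in a finally block is a design choice: torch.set_default_dtype is process-wide, so leaving it at bfloat16 would silently change every later tensor created without a dtype.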
from transformers.
Hey!
To answer that question I would need a simple reproducer, as well as the output of transformers-cli env,
to see which version you are running.
AFAIK, buffers should be affected. I recently wanted to keep certain buffers in a specific precision, and it did not work.
@ArthurZucker
Sure, you can get the code here: Colab
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-0.5B', torch_dtype=torch.bfloat16)
# list the buffers that were NOT converted to bfloat16
for name, buffer in model.named_buffers():
    if buffer.dtype == torch.float32:
        print(name, buffer.dtype)
The model was picked arbitrarily for testing.
It seems only the rotary_emb.inv_freq buffer is
still torch.float32.
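Assuming the model code computes inv_freq from a tensor that is explicitly cast to float32, as the first comment in this thread implies, the buffer survives for the same reason as in the first snippet. The sketch below illustrates that pattern; SketchRotaryEmbedding is a hypothetical, simplified stand-in (not the actual Qwen class in transformers), and dim=64 and base=10000.0 are assumed values. It also illustrates the precision argument by rounding the same frequencies to bfloat16.

```python
import torch
import torch.nn as nn


class SketchRotaryEmbedding(nn.Module):
    """Hypothetical, simplified rotary embedding that keeps inv_freq in float32."""

    def __init__(self, dim=64, base=10000.0):
        super().__init__()
        # .float() forces float32, so torch's default dtype is not consulted
        exponents = torch.arange(0, dim, 2, dtype=torch.int64).float() / dim
        inv_freq = 1.0 / (base ** exponents)
        self.register_buffer("inv_freq", inv_freq, persistent=False)


previous = torch.get_default_dtype()
torch.set_default_dtype(torch.bfloat16)
try:
    rope = SketchRotaryEmbedding()
finally:
    torch.set_default_dtype(previous)

print(rope.inv_freq.dtype)  # torch.float32

# bfloat16 keeps 8 significant bits versus 24 for float32, so rounding loses precision
rounded = rope.inv_freq.to(torch.bfloat16).float()
rel_err = ((rounded - rope.inv_freq).abs() / rope.inv_freq).max()
print(f"max relative error after bfloat16 rounding: {rel_err.item():.2e}")
```

The relative error is bounded by roughly 2^-8, which is small per element but is multiplied by position indices when the rotary angles are computed, which is why keeping the frequencies in float32 is the safer choice.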
Closing this, as the default dtype affects buffers that don't specify a dtype, but not those that do.
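One caveat, based on PyTorch's documented nn.Module.to behavior rather than on anything stated in this thread: specifying a dtype only protects a buffer from the default dtype at creation time. An explicit cast such as module.to(torch.bfloat16) converts every floating-point parameter and buffer, including ones registered as float32. A minimal sketch, where the Holder class is hypothetical:

```python
import torch
import torch.nn as nn


class Holder(nn.Module):
    """Hypothetical module with one float32 buffer and one integer buffer."""

    def __init__(self):
        super().__init__()
        # registered explicitly as float32
        self.register_buffer("freq", torch.ones(4, dtype=torch.float32))
        # integer buffers are left alone by a floating-point dtype cast
        self.register_buffer("step", torch.zeros(1, dtype=torch.int64))


m = Holder().to(torch.bfloat16)
print(m.freq.dtype)  # torch.bfloat16: the explicit float32 did not protect it
print(m.step.dtype)  # torch.int64: only floating-point tensors are cast
```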
Related Issues (20)
- AttributeError: 'BertModel' object has no attribute 'attn_implementation'
- Training GPT2 with run_clm.py exceeds the described memory amount .
- LayoutLMv3 Significant Training Slowdown from 4.33.3 -> 4.34.0 and beyond versions
- Off-by-one error in strided perplexity calculation
- RuntimeError: unique_by_key: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
- Autotokenizer."from_pretrained" read wrong config file. not "tokenizer_config.json", but "config.json"
- ViTLayer.forward() needs to be in "eager" mode when `output_attentions=True`
- Fix for hardcoded `final_labels` to enable loss calculation in PaliGemma
- Sentence Transformers Gets Stuck loading
- Paligemma causal attention still not causal ?
- Add Nomic Embed Code to Transformers
- loss calculation for PaliGemmaForConditionalGeneration potentially not cast to correct device
- Trainer should throw a warning if max_sequence_length < number of tokens in dataset sample record.
- Missing "config.json" when loading Llama-2-7b-chat-hf
- OSError due to huggingface-hub FutureWarning about resume_download
- Is there a source code installation method available? For example: from Test_transformers import AutoModelForCausalLM, AutoTokenizer
- Mistral compile not working on T4 GPU (torch 2.3 + cu121)
- model.generate() able to accept past_key_values=None
- RuntimeError: unable to open file when calling from_pretrained on multiple processes after upgrading hugginface_hub to 0.23.1
- Models with Phi3Config fail due to missing attention_bias