Issue related to dtype with F.conv1d in Whisper evaluation (transformers, open)

moncefbenaicha commented on June 14, 2024

Issue related to dtype with F.conv1d in Whisper evaluation

Comments (4)

ylacombe commented on June 14, 2024

Hey @moncefbenaicha, your temporary solution is actually the right one, since the processor only outputs torch.float32 arrays!

However, I do believe it should work with Flash Attention. Did you get an error when using bfloat16 together with FA?

LysandreJik commented on June 14, 2024

cc @sanchit-gandhi @ylacombe

moncefbenaicha commented on June 14, 2024

Update

A temporary solution is to cast the input features to bfloat16 and disable flash_attention.

batch = self.processor(
    audio=audio_arrays,
    sampling_rate=16000,
    padding="max_length",
    return_tensors="pt",
)
# The processor returns float32 features; cast them to match the bf16 model weights.
batch["input_features"] = batch["input_features"].to(dtype=torch.bfloat16)
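
For anyone who wants to see the mismatch without loading Whisper, here is a minimal sketch that uses only PyTorch. The Conv1d layer, its sizes, and the helper name cast_to_module_dtype are illustrative assumptions, not code from transformers. The layer only stands in for a convolution whose weights were cast to bfloat16 while the features stay float32.

```python
import torch
import torch.nn as nn

# Stand-in for a convolution with bf16 weights (sizes are hypothetical).
conv = nn.Conv1d(in_channels=80, out_channels=8, kernel_size=3, padding=1)
conv = conv.to(torch.bfloat16)

# Stand-in for processor output: float32 features, shape (batch, mel_bins, frames).
input_features = torch.randn(2, 80, 100, dtype=torch.float32)


def cast_to_module_dtype(features, module):
    """Cast features to the dtype of the module's first parameter (hypothetical helper)."""
    param_dtype = next(module.parameters()).dtype
    return features.to(dtype=param_dtype)


# Feeding float32 features into bf16 weights is expected to fail with a dtype error.
try:
    conv(input_features)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)

# After casting, the input and the weights share one dtype and the call goes through.
output = conv(cast_to_module_dtype(input_features, conv))
print(mismatch_error)
print(output.dtype, tuple(output.shape))
```

Reading the dtype from the module rather than hard-coding torch.bfloat16 means the same cast still works if the model is later loaded in float16 or float32.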

sanchit-gandhi commented on June 14, 2024

Hey @moncefbenaicha - it would be great to see:

How you're instantiating the model with .from_pretrained. Specifically, which values you're passing for attn_implementation and torch_dtype, and whether you're moving the model to a torch device manually.
The training args you're using. Specifically, what you set for fp16, bf16, fp16_full_eval and bf16_full_eval.

Passing bf16_full_eval=True might be of interest to you if you're casting the model weights to bf16 manually yourself.
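
As a rough illustration of the settings asked about above, the sketch below collects them in two plain dictionaries and runs a small sanity check over them. The values shown and the helper check_precision_flags are assumptions made for this example, not the original poster's configuration and not part of transformers. In real code the dictionaries would be unpacked into .from_pretrained and the training arguments, with the dtype given as torch.bfloat16.

```python
# Hypothetical values for illustration only; adjust to your own setup.
model_kwargs = {
    "torch_dtype": "bfloat16",        # pass torch.bfloat16 in real code
    "attn_implementation": "sdpa",    # alternatives: "eager", "flash_attention_2"
}

eval_precision_kwargs = {
    "fp16": False,
    "bf16": True,
    "fp16_full_eval": False,
    "bf16_full_eval": True,
}


def check_precision_flags(model_kwargs, training_kwargs):
    """Return notes about precision settings that look inconsistent (hypothetical helper)."""
    notes = []
    if training_kwargs.get("fp16") and training_kwargs.get("bf16"):
        notes.append("fp16 and bf16 are both set; pick one.")
    if training_kwargs.get("fp16_full_eval") and training_kwargs.get("bf16_full_eval"):
        notes.append("fp16_full_eval and bf16_full_eval are both set; pick one.")
    if (model_kwargs.get("torch_dtype") == "bfloat16"
            and not training_kwargs.get("bf16_full_eval")):
        notes.append("model weights are bf16; consider bf16_full_eval=True.")
    return notes


for note in check_precision_flags(model_kwargs, eval_precision_kwargs):
    print(note)
```

Sharing these two sets of values in the thread is usually enough for a maintainer to tell whether the float32 features meet bf16 weights during evaluation.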
