Hey all, after a nice conversation with ...

Benchmarks for whisperx, faster-whisper, and whispers2t! about whisperx HOT 9 OPEN

BBC-Esq commented on July 16, 2024

Benchmarks for whisperx, faster-whisper, and whispers2t!

from whisperx.

Comments (9)

MahmoudAshraf97 commented on July 16, 2024

Interesting results indeed, thanks for sharing, but afaik whispers2t is just an interface for multiple backends, so which one are you using here?

BBC-Esq commented on July 16, 2024

Oh yeah, sorry, using the ctranslate2 backend. It's important to note that it's ctranslate2 and not just faster-whisper. As far as I know, whisperX and whisperS2T are the only repositories that have batch processing using ctranslate2. faster-whisper should hopefully be getting it soon, however. See Here.

At any rate, out of respect for the hard work of all the repositories I'm benching, it's important to note that different libraries have different benefits/drawbacks...my benchmarks are only for speed purposes.

Infinitay commented on July 16, 2024

Recently https://github.com/ictnlp/StreamSpeech was released and I'm curious how it stacks up, although currently it doesn't support many languages unless you train it yourself, and it's more real-time focused. Any chance you could benchmark it alongside whisperx if possible? Thanks

BBC-Esq commented on July 16, 2024

> Recently https://github.com/ictnlp/StreamSpeech was released and I'm curious how it stacks up, although currently it doesn't support many languages unless you train it yourself, and it's more real-time focused. Any chance you could benchmark it alongside whisperx if possible? Thanks

Interesting...Thanks for the link. I briefly checked it out and the model names imply that they only handle translation. I didn't see a model that handles straight transcription from one language to the same language. With that being said, if you find out otherwise and provide me with a basic script that can perform inference, I'll fine-tune it to get VRAM measurements and timing, and run it on the same audio file that my other benchmarks used.
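For anyone putting together such a script, here is a minimal sketch of the timing half only. It is not the harness used for the benchmarks in this thread; `benchmark` and its parameters are hypothetical names, and the callable you pass in is assumed to wrap whatever inference call your library exposes. VRAM is deliberately left out, because how you read it depends on the backend (ctranslate2 does not allocate through PyTorch, so PyTorch's memory counters would not see it).

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], runs: int = 3, warmup: int = 1) -> dict:
    """Time a zero-argument callable (hypothetical helper, not from any library).

    Warm-up calls are executed first and not timed, so one-off costs such as
    model loading or kernel compilation do not skew the result.
    """
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)

    return {
        "runs": runs,
        "best_s": min(timings),
        "mean_s": statistics.mean(timings),
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with your own transcription call.
    print(benchmark(lambda: sum(range(1_000_000))))
```

Running every library on the same audio file with the same number of warm-up and timed runs keeps the numbers comparable.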

stri8ed commented on July 16, 2024

It looks like whispers2t does not use the previous segment transcription as context. This is the same with WhisperX. Would be interesting to see WER benchmarks alongside the performance, especially for long audio, which may be more sensitive to the context, or lack thereof.
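To make the WER comparison concrete, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name is hypothetical, and the normalization here (lowercasing and whitespace splitting) is an assumption; published WER numbers usually apply a more careful text normalizer, so results from this sketch will not match them exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, which is exactly what repeated or hallucinated segments tend to produce on long audio.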

MahmoudAshraf97 commented on July 16, 2024

I guess the whisperx paper showed that using the previous segment's transcription in the prompt isn't useful

stri8ed commented on July 16, 2024

> I guess the whisperx paper showed that using the previous segment's transcription in the prompt isn't useful

Indeed. I recall reading that. Anecdotally, that does not seem to be the case for me, but interested to hear if anyone else has more data on that. Intuitively, I would expect additional context to be useful, given the model was trained to condition the result based on the prompt/context.

BBC-Esq commented on July 16, 2024

> It looks like whispers2t does not use the previous segment transcription as context. This is the same with WhisperX. Would be interesting to see WER benchmarks alongside the performance, especially for long audio, which may be more sensitive to the context, or lack thereof.

If you go here you can see that the WER is actually better...lol. Still trying to figure that out, but the guy seems solid in his testing so far:

https://github.com/shashikg/WhisperS2T/releases

Jiltseb commented on July 16, 2024

Generally, very long context (>30 sec) is not needed for ASR (unlike paralinguistic tasks). By not passing in the previous context, we can prevent some repetitions/hallucinations from carrying over to the next segment, as we see in batched faster_whisper, and in turn get better WER.
