Comments (9)
Interesting results indeed, thanks for sharing, but AFAIK WhisperS2T is just an interface for multiple backends, so which one are you using here?
from whisperx.
Oh yeah, sorry, using the CTranslate2 backend. It's important to note that it's CTranslate2 and not just faster-whisper. As far as I know, whisperX and WhisperS2T are the only repositories that have batch processing using CTranslate2. faster-whisper should hopefully be getting it soon, however. See here.
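The batching these CTranslate2-based repositories do comes down to cutting the audio into Whisper's fixed 30-second windows and decoding several windows in one forward pass. A minimal sketch of the splitting step, with illustrative names only (the real libraries also run VAD so words aren't cut in half):

```python
import numpy as np

SAMPLE_RATE = 16_000           # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 30             # Whisper's fixed receptive window
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def make_batches(audio: np.ndarray, batch_size: int = 8):
    """Split a mono waveform into zero-padded 30-second chunks and group
    them into batches that a CTranslate2-style backend could decode in a
    single forward pass. Purely illustrative, not any library's API."""
    chunks = []
    for start in range(0, len(audio), CHUNK_SAMPLES):
        chunk = audio[start:start + CHUNK_SAMPLES]
        if len(chunk) < CHUNK_SAMPLES:      # zero-pad the final chunk
            chunk = np.pad(chunk, (0, CHUNK_SAMPLES - len(chunk)))
        chunks.append(chunk)
    # group consecutive chunks into fixed-size batches
    return [np.stack(chunks[i:i + batch_size])
            for i in range(0, len(chunks), batch_size)]

# 95 seconds of audio -> 4 chunks (30 + 30 + 30 + 5 s) -> one batch of 4
batches = make_batches(np.zeros(95 * SAMPLE_RATE, dtype=np.float32))
```

Sequential transcription decodes those chunks one at a time; batched backends hand a whole stack to the GPU at once, which is where most of the speedup in these benchmarks comes from.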
At any rate, out of respect for the hard work behind all the repositories I'm benchmarking, it's important to note that different libraries have different benefits and drawbacks...my benchmarks measure speed only.
Recently https://github.com/ictnlp/StreamSpeech was released and I'm curious how it stacks up. Currently it doesn't support many languages unless you train it yourself, and it's more real-time focused. Any chance you could benchmark it alongside whisperX if possible? Thanks
Interesting...thanks for the link. I briefly checked it out, and the model names imply that they only handle translation; I didn't see a model that handles straight transcription from one language to the same language. That said, if you find out otherwise and provide me with a basic script that can perform inference, I'll adapt it to get VRAM measurements and timing and have it process the same audio file that my other benchmarks did.
It looks like WhisperS2T does not use the previous segment transcription as context. The same is true of whisperX. It would be interesting to see WER benchmarks alongside the speed numbers, especially for long audio, which may be more sensitive to the context, or lack thereof.
I guess the whisperX paper showed that using the previous segment transcription in the prompt isn't useful.
Indeed. I recall reading that. Anecdotally, that does not seem to be the case for me, but I'm interested to hear if anyone else has more data on that. Intuitively, I would expect additional context to be useful, given that the model was trained to condition the result on the prompt/context.
If you go here you can see that the WER is actually better...lol. Still trying to figure that out, but the guy seems solid in his testing so far:
https://github.com/shashikg/WhisperS2T/releases
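For anyone wanting to sanity-check WER numbers like those in the linked release notes: the metric is just word-level edit distance divided by the reference length. A minimal self-contained implementation (no external scoring library):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) divided
    by the number of reference words, via Levenshtein distance over
    whitespace-split word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[-1] / len(ref)

# one dropped word out of six reference words -> WER of 1/6
score = wer("the cat sat on the mat", "the cat sat on mat")
```

Established toolkits like `jiwer` also normalize case and punctuation before scoring, which can shift results noticeably between benchmarks.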
Generally, very long context (>30 sec) is not needed for ASR (unlike paralinguistic tasks). By not passing in the previous context, we can prevent some repetitions/hallucinations from carrying over to the next segment, as we see in batched faster_whisper, and in turn get a better WER.
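The trade-off debated above comes down to how a sequential decoding loop threads each segment's transcript into the next segment's prompt; drop that feedback and segments become independent, so they can be decoded in parallel and a hallucination cannot steer later segments. A toy sketch with a stand-in decode function (the option name mirrors faster-whisper's `condition_on_previous_text`; everything else is illustrative):

```python
def transcribe_segments(segments, decode_fn, condition_on_previous_text=True):
    """Sequentially decode segments, optionally feeding each segment's
    transcript back in as the prompt for the next. `decode_fn(segment,
    prompt)` stands in for a real Whisper model call."""
    prompt = ""
    out = []
    for seg in segments:
        text = decode_fn(seg, prompt)
        out.append(text)
        if condition_on_previous_text:
            prompt = text   # context (and any hallucination) carries forward
        # else: prompt stays empty, so each segment is independent
    return out

# stub "model" that echoes the prompt it received, to show propagation
decode = lambda seg, prompt: f"[{prompt}]{seg}"
with_ctx = transcribe_segments(["a", "b", "c"], decode)
no_ctx = transcribe_segments(["a", "b", "c"], decode,
                             condition_on_previous_text=False)
```

With conditioning enabled the loop is inherently serial (each call waits on the previous transcript); with it disabled, the calls could be stacked into one batched forward pass, which is exactly the design choice whisperX and WhisperS2T make.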