Comments (9)
I think I finally found a reliable way to get subtitles, with near-perfect timestamp alignment and sentence lengths that are segmented naturally, using this spaCy script.
What I did is basically divide what WhisperX does into 3 parts:
Transcription, Aligning, Segmenting
I do each of them separately and then combine the different methods so they work with each other.
It's a bit long, but it's perfect.
from whisperx.
Did you solve this problem?
Yes, I did.
Basically, some changes to the script made it run with WhisperX as well.
So can you teach me how to do this? I tried to change it many times, but I failed.
I have only implemented it on whisper. The JSON file output by whisperx does not contain the relevant token information, and I do not know how to handle this part in the script.
Yes, I created different spaCy files for WhisperX and whisper-timestamped:
https://gist.github.com/ankitgurua/eac069ed0c95e1ce5924a10923883133
https://gist.github.com/ankitgurua/7b0db06baa8e2c7288cbbf396169120d
The problem with WhisperX is that its alignment model cannot align numbers, so its JSON has numbers in the sentences but no timestamps for them. To deal with WhisperX output, use --suppress_numerals in your command and then use the spaCy script I provided to segment it.
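To check whether a given WhisperX JSON is affected, a small sketch like the following can list the words that came back without timestamps. It assumes the output has a top-level "segments" list whose entries carry a "words" list of dicts with "word", "start" and "end" keys; the function name `untimed_words` is made up for illustration and is not part of WhisperX.

```python
from typing import Any


def untimed_words(result: dict[str, Any]) -> list[tuple[int, str]]:
    """Return (segment index, word text) for every word lacking a timestamp.

    Assumes a WhisperX-style layout: result["segments"][i]["words"][j]
    is a dict that normally carries "word", "start" and "end".
    """
    missing = []
    for seg_idx, segment in enumerate(result.get("segments", [])):
        for word in segment.get("words", []):
            # Numerals the alignment model could not place have no start/end.
            if "start" not in word or "end" not in word:
                missing.append((seg_idx, word.get("word", "")))
    return missing


if __name__ == "__main__":
    sample = {
        "segments": [
            {
                "words": [
                    {"word": "about", "start": 0.0, "end": 0.4},
                    {"word": "2023"},
                    {"word": "people", "start": 1.0, "end": 1.5},
                ]
            }
        ]
    }
    print(untimed_words(sample))
```

If the list is empty after running with --suppress_numerals, every word has a timestamp and the segmenting script has what it needs.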
It works and I understand what you mean. Regarding the lack of timestamp information for numbers, I've been inferring and filling in this data using the timestamps of adjacent words. However, during processing, the original spaCy file likely uses the token information from the JSON file generated by Whisper, a feature that WhisperX lacks. I'm uncertain whether this difference impacts spaCy's functionality.
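The "fill in from adjacent words" idea could look roughly like this. It is only a sketch of one possible approach, not the commenter's actual code: each run of untimed words is spread evenly across the gap between the previous timed word's end and the next timed word's start. The function name `fill_missing_timestamps` and the `seg_start`/`seg_end` fallbacks are assumptions for illustration.

```python
from typing import Any


def _is_timed(word: dict[str, Any]) -> bool:
    return "start" in word and "end" in word


def fill_missing_timestamps(
    words: list[dict[str, Any]], seg_start: float, seg_end: float
) -> list[dict[str, Any]]:
    """Return a copy of `words` where untimed words get interpolated times.

    seg_start / seg_end are used when the untimed run touches the
    beginning or end of the segment and has no neighbour on that side.
    """
    filled = [dict(w) for w in words]  # leave the input untouched
    n = len(filled)
    i = 0
    while i < n:
        if _is_timed(filled[i]):
            i += 1
            continue
        # Find the end of the run of untimed words: indices [i, j).
        j = i
        while j < n and not _is_timed(filled[j]):
            j += 1
        left = filled[i - 1]["end"] if i > 0 else seg_start
        right = filled[j]["start"] if j < n else seg_end
        step = (right - left) / (j - i)
        # Give each untimed word an equal slice of the gap.
        for k in range(i, j):
            filled[k]["start"] = round(left + (k - i) * step, 3)
            filled[k]["end"] = round(left + (k - i + 1) * step, 3)
        i = j
    return filled


if __name__ == "__main__":
    words = [
        {"word": "about", "start": 0.0, "end": 0.4},
        {"word": "2023"},
        {"word": "people", "start": 1.0, "end": 1.5},
    ]
    for w in fill_missing_timestamps(words, seg_start=0.0, seg_end=1.5):
        print(w)
```

The interpolated times are estimates, so a number spoken right after a long pause may get a start time that is too early.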
What I did is basically divide what WhisperX does into 3 parts: Transcription, Aligning, Segmenting
Could you tell me how to do it? Thank you very much.
Transcription → WhisperX
Aligning → WhisperX (wav2vec2)
Segmenting → spaCy
You can find the script and environment requirements above.
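To give a feel for the last stage, here is a minimal sketch of turning aligned words into subtitle cues. It is not the gist's spaCy script: to stay dependency-free it swaps spaCy's sentence segmentation for a plain rule that breaks on sentence-ending punctuation or a maximum line length. `Cue`, `words_to_cues` and `max_chars` are hypothetical names, and the input is assumed to be a flat list of word dicts that all have "word", "start" and "end".

```python
from dataclasses import dataclass
from typing import Any

SENTENCE_END = (".", "?", "!")


@dataclass
class Cue:
    start: float
    end: float
    text: str


def words_to_cues(words: list[dict[str, Any]], max_chars: int = 42) -> list[Cue]:
    """Group timed words into cues, breaking at sentence ends or max_chars."""
    cues: list[Cue] = []
    current: list[dict[str, Any]] = []

    def flush() -> None:
        # Emit the words collected so far as one cue.
        if current:
            text = " ".join(w["word"] for w in current)
            cues.append(Cue(current[0]["start"], current[-1]["end"], text))
            current.clear()

    for word in words:
        candidate = " ".join(w["word"] for w in current + [word])
        # Start a new cue if adding this word would make the line too long.
        if current and len(candidate) > max_chars:
            flush()
        current.append(word)
        # Stand-in for spaCy sentence boundaries: break on . ? !
        if word["word"].endswith(SENTENCE_END):
            flush()
    flush()
    return cues


if __name__ == "__main__":
    words = [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "there.", "start": 0.5, "end": 0.9},
        {"word": "How", "start": 1.2, "end": 1.4},
        {"word": "are", "start": 1.5, "end": 1.6},
        {"word": "you?", "start": 1.7, "end": 2.0},
    ]
    for cue in words_to_cues(words):
        print(cue)
```

A real spaCy-based segmenter can also split at clause boundaries, which is presumably why the thread's script produces more natural line breaks than a punctuation rule would.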