Comments (4)
OK, I figured out what I was doing wrong. I will leave this comment here in case someone has a similar problem, and I will close the issue.
When sending to diarization, I was using the segments created by the transcription process. Those segments were too long (i.e. 3-5 sentences), so the speaker sometimes changed within a segment and the model assigned whichever speaker was most common in it. I have now switched to sending the segments created by the alignment process, which are much shorter, and the result is much better.
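To illustrate the effect, here is a minimal sketch. It is not WhisperX's actual code; it assumes a simple rule where each segment gets the speaker whose diarization turns overlap it the most. The function `dominant_speaker`, the `turns` list, and all timings are hypothetical.

```python
from collections import defaultdict

def dominant_speaker(seg_start, seg_end, turns):
    """Return the speaker whose turns overlap [seg_start, seg_end] the most."""
    overlap = defaultdict(float)
    for turn_start, turn_end, speaker in turns:
        shared = min(seg_end, turn_end) - max(seg_start, turn_start)
        if shared > 0:
            overlap[speaker] += shared
    if not overlap:
        return None  # no diarization turn touches this segment
    return max(overlap, key=overlap.get)

# Hypothetical diarization output: (start, end, speaker), in seconds
turns = [
    (0.0, 6.0, "SPEAKER_00"),
    (6.0, 9.0, "SPEAKER_01"),
    (9.0, 12.0, "SPEAKER_00"),
]

# One long transcription-style segment vs. three short alignment-style segments
long_segments = [(0.0, 12.0)]
short_segments = [(0.0, 6.0), (6.0, 9.0), (9.0, 12.0)]

long_labels = [dominant_speaker(s, e, turns) for s, e in long_segments]
short_labels = [dominant_speaker(s, e, turns) for s, e in short_segments]

print(long_labels)   # ['SPEAKER_00']
print(short_labels)  # ['SPEAKER_00', 'SPEAKER_01', 'SPEAKER_00']
```

With the long segment, SPEAKER_01 disappears entirely because SPEAKER_00 covers more of the span. With the shorter segments, each one falls inside a single turn, so the speaker change is kept.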
from whisperx.
I have tried upgrading to Pyannote 3.1, and the problem persists. The alignment is pretty useless: even in a very controlled environment (i.e. a studio recording, a BBC podcast with 3 speakers), it misses quite a bit.
Has anyone had success in making this better?
@nikola1975 I am having the same issue, but your solution (the default code example in the README) doesn't solve it. Here's my code:
import whisperx

# device, compute_type, model_dir, language, file_path, batch_size,
# HF_TOKEN and min_speakers are defined elsewhere in my script.
options = {
    "max_new_tokens": None,
    "clip_timestamps": None,
    "hallucination_silence_threshold": None,
}
# Transcribe
model = whisperx.load_model("large-v3", device, compute_type=compute_type, download_root=model_dir, language=language, asr_options=options)
audio = whisperx.load_audio(file_path)
result = model.transcribe(audio, batch_size=batch_size, chunk_size=10, print_progress=True)
# Align the transcription segments
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
# Diarize and assign speakers to the aligned result
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio, min_speakers=min_speakers)
result = whisperx.assign_word_speakers(diarize_segments, result)
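One way to see where the speaker assignment goes wrong is to count the word-level speaker labels inside each segment. The sketch below is only an assumption about the output shape: it expects `result` to have a "segments" list whose entries carry a "words" list, with an optional "speaker" key on each word. The helper `speaker_mix` and the sample data are hypothetical, so check the field names against your installed version.

```python
from collections import Counter

def speaker_mix(result):
    """Count word-level speaker labels for each segment.

    Words without a "speaker" key are counted as "UNASSIGNED".
    """
    mixes = []
    for segment in result.get("segments", []):
        labels = Counter(
            word.get("speaker", "UNASSIGNED") for word in segment.get("words", [])
        )
        mixes.append(dict(labels))
    return mixes

# Hypothetical result in the assumed shape (not real WhisperX output)
example_result = {
    "segments": [
        {"words": [
            {"word": "Hello", "speaker": "SPEAKER_00"},
            {"word": "there", "speaker": "SPEAKER_00"},
            {"word": "Hi", "speaker": "SPEAKER_01"},
        ]},
        {"words": [
            {"word": "2024"},  # e.g. a word that received no speaker label
            {"word": "was", "speaker": "SPEAKER_01"},
        ]},
    ]
}

mix = speaker_mix(example_result)
for index, counts in enumerate(mix):
    print(index, counts)
```

A segment with several speakers in its counts is one where the speaker changes mid-segment, and a high "UNASSIGNED" count points at words that never received a label.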
Are you getting poor results from the diarization in general, or is it recognizing the wrong speakers? My results are not 100% precise now, but they are relatively close. I am not sure what your expectations are :)
I suppose you are using the Pyannote 3.1 model? Try running the diarization through this link and check whether you get the same results:
https://huggingface.co/spaces/pyannote/pretrained-pipelines