I am running speaker diarization, with Pyannote 3.0.1, and am struggling to improve re


Diarization precision - is there way to improve it? about whisperx HOT 4 CLOSED

nikola1975 commented on September 25, 2024

Diarization precision - is there way to improve it?

Comments (4)

nikola1975 commented on September 25, 2024

Ok, I figured out what I was doing wrong. I will leave this comment here in case someone has a similar problem, and will close the issue.

When sending to diarization, I was using the segments created by the transcription process. Those segments were too long (i.e. 3-5 sentences), so the speaker sometimes changed within a segment and the model assigned the whole segment to whichever speaker was most common in it. I now send the segments created by the alignment process, which are much shorter, and the result is much better.
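To illustrate why segment length matters here, below is a minimal sketch of majority-overlap speaker assignment in plain Python. It is not whisperx's actual implementation; the function name `dominant_speaker`, the tuple layout, and all timings are made up for illustration.

```python
from collections import defaultdict


def dominant_speaker(segment, turns):
    """Return the speaker whose diarization turns overlap `segment` the most.

    segment: (start, end) in seconds.
    turns:   list of (start, end, speaker) tuples (hypothetical layout).
    """
    seg_start, seg_end = segment
    overlap = defaultdict(float)
    for turn_start, turn_end, speaker in turns:
        shared = min(seg_end, turn_end) - max(seg_start, turn_start)
        if shared > 0:
            overlap[speaker] += shared
    # No overlapping turn at all means no label can be assigned.
    return max(overlap, key=overlap.get) if overlap else None


# Hypothetical diarization output: SPEAKER_01 interjects briefly.
turns = [
    (0.0, 6.0, "SPEAKER_00"),
    (6.0, 8.0, "SPEAKER_01"),
    (8.0, 12.0, "SPEAKER_00"),
]

# One long transcription-style segment spanning all three turns.
long_segment = (0.0, 12.0)
# Shorter segments, roughly one per sentence.
short_segments = [(0.0, 5.8), (6.1, 7.9), (8.2, 12.0)]

long_label = dominant_speaker(long_segment, turns)
short_labels = [dominant_speaker(s, turns) for s in short_segments]

print(long_label)    # the interjection by SPEAKER_01 is lost
print(short_labels)  # each short segment keeps its own speaker
```

With the long segment, the two seconds spoken by SPEAKER_01 are absorbed into SPEAKER_00's label; with the shorter segments, the interjection gets its own label.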

nikola1975 commented on September 25, 2024

I have tried upgrading to Pyannote 3.1, and the problem persists. The alignment is pretty useless: even in a very controlled environment (i.e. a studio recording, a BBC podcast with 3 speakers), it is missing quite a bit.

Has anyone had success in making this better?

drstuggels commented on September 25, 2024

@nikola1975 I am having the same issue, but your solution (the default code example in the README) doesn't solve it. Here's my code:

import whisperx

# Note: device, compute_type, model_dir, language, file_path, batch_size,
# HF_TOKEN and min_speakers are not defined in this snippet.

# ASR options explicitly set to None.
options = {
    "max_new_tokens": None,
    "clip_timestamps": None,
    "hallucination_silence_threshold": None
}

# 1. Transcribe the audio in 10-second chunks.
model = whisperx.load_model("large-v3", device, compute_type=compute_type, download_root=model_dir, language=language, asr_options=options)
audio = whisperx.load_audio(file_path)
result = model.transcribe(audio, batch_size=batch_size, chunk_size=10, print_progress=True)

# 2. Align the transcript with the audio.
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)

# 3. Diarize, then assign speaker labels using the aligned result.
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio, min_speakers=min_speakers)
result = whisperx.assign_word_speakers(diarize_segments, result)
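If a segment still spans a speaker change after this step, one possible workaround is to regroup the output by word-level speaker label instead of relying on segment boundaries. The sketch below assumes the result has the shape `result["segments"][i]["words"][j]` with `"word"`, `"start"`, `"end"` and `"speaker"` keys; that shape is an assumption about the output and may differ between versions. The helper name `words_to_speaker_turns` and the sample data are hypothetical.

```python
def words_to_speaker_turns(result):
    """Regroup word-level speaker labels into contiguous single-speaker turns."""
    speaker_turns = []
    for segment in result["segments"]:
        for word in segment.get("words", []):
            # Words without their own label fall back to the segment label.
            speaker = word.get("speaker", segment.get("speaker"))
            if speaker_turns and speaker_turns[-1]["speaker"] == speaker:
                # Same speaker as the previous word: extend the current turn.
                current = speaker_turns[-1]
                current["text"] += " " + word["word"]
                current["end"] = word.get("end", current["end"])
            else:
                # Speaker changed: start a new turn at this word.
                speaker_turns.append({
                    "speaker": speaker,
                    "start": word.get("start"),
                    "end": word.get("end"),
                    "text": word["word"],
                })
    return speaker_turns


# Hypothetical result: one segment that contains a speaker change.
sample = {
    "segments": [
        {
            "speaker": "SPEAKER_00",
            "words": [
                {"word": "Hello", "start": 0.0, "end": 0.4, "speaker": "SPEAKER_00"},
                {"word": "there.", "start": 0.5, "end": 0.9, "speaker": "SPEAKER_00"},
                {"word": "Hi!", "start": 1.2, "end": 1.5, "speaker": "SPEAKER_01"},
            ],
        }
    ]
}

speaker_turns = words_to_speaker_turns(sample)
for turn in speaker_turns:
    print(turn["speaker"], turn["start"], turn["end"], turn["text"])
```

This only changes how the labels are grouped for display; it cannot correct words that the diarization step attributed to the wrong speaker in the first place.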

nikola1975 commented on September 25, 2024

Are you getting poor results from the diarization, or is it recognizing the wrong speakers? My results are not 100% precise now, but they are relatively close to it. I am not sure what your expectations are :)

I suppose you are using the Pyannote 3.1 model? Try running diarization through this link and check whether you get the same results:
https://huggingface.co/spaces/pyannote/pretrained-pipelines
