Comments (16)
It's easy to add, not a big deal, but I first want to make sure that it actually affects inference, because it was reported earlier that this parameter has no effect on inference, only on evaluation, which is not used here.
from whisper-diarization.
The speaker count in the config is for evaluation purposes only. To tune the diarization performance, you need to play with the speaker_embeddings parameters, such as window_length_in_sec and shift_length_in_sec. diar_window_length and sigmoid_threshold might be worth a shot too.
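For reference, these knobs live under the diarizer section of NeMo's inference config; a minimal sketch of where they sit (key layout follows NeMo's diar_infer_*.yaml files, the values here are only illustrative, not recommendations):

```yaml
diarizer:
  speaker_embeddings:
    parameters:
      window_length_in_sec: 1.5   # embedding window size; smaller = finer granularity
      shift_length_in_sec: 0.75   # hop between windows; smaller = more overlap, slower
  msdd_model:
    parameters:
      diar_window_length: 50      # MSDD input window length
      sigmoid_threshold: [0.7]    # decision threshold(s) for speaker activity
```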
Thanks, I'll give that a try :)
Hi,
I'm having the same issue. Can you provide some instructions on how to tune those parameters to detect a higher number of speakers by default, please?
Thanks in advance.
Hi @famda, I haven't tinkered with these before, but it's totally trial and error.
No worries, I can run some tests with it. I just need to understand where to start. Can you guide me a little so I can play around with it?
Just by playing around with shift_length and lowering it to about 0.25, I was able to detect 7 speakers in an audio file where it only detected 3 before (there are 9 actual speakers in it). Going lower than that didn't make much of a difference, but it increased inference time dramatically (running locally on CPU), so I would start there. Changing sigmoid_threshold didn't do much, but you can try that as well.
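The trial-and-error described above can be sketched as a small parameter sweep. This is a hedged sketch with a hypothetical make_config helper standing in for however you build the NeMo diarizer config; in practice you would run diarization with each config and compare the detected speaker counts:

```python
def make_config(shift_length_in_sec):
    # Minimal stand-in for the NeMo diarizer config; real configs
    # have many more keys (this nesting mirrors speaker_embeddings.parameters).
    return {
        "speaker_embeddings": {
            "parameters": {
                "window_length_in_sec": 1.5,
                "shift_length_in_sec": shift_length_in_sec,
            }
        }
    }

def sweep(shift_values):
    # Build one candidate config per shift value to try, coarsest first.
    return [make_config(s) for s in shift_values]

configs = sweep([0.75, 0.5, 0.25])
```

Stopping around 0.25 matches the observation above: finer shifts cost much more compute for little gain.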
You should also try playing with the scale windows and weights.
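In NeMo's multiscale setup, the scale windows, shifts, and their weights are parallel lists of equal length; a hedged sketch in the style of the telephonic config (values illustrative):

```yaml
diarizer:
  speaker_embeddings:
    parameters:
      window_length_in_sec: [1.5, 1.25, 1.0, 0.75, 0.5]   # one window per scale
      shift_length_in_sec: [0.75, 0.625, 0.5, 0.375, 0.25] # matching hops
      multiscale_weights: [1, 1, 1, 1, 1]                  # relative weight per scale
```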
Hi, how do you configure speaker separation?
Which config flag should be set? There is nothing about this in the instructions.
Can you help me?
Hello, what do you mean exactly by separating the speaker?
Sorry for my bad English!
Actually I have this:
{
    "speaker": "Speaker 0",
    "start_time": 433460,
    "end_time": 435480,
    "text": "Adesso approfondiamo un pochino meglio. "
},
{
    "speaker": "Speaker 0",
    "start_time": 435520,
    "end_time": 437420,
    "text": "Allora, innanzitutto, da dove venite ragazzi? "
},
Both sentences are categorized as "Speaker 0", but the first sentence is spoken by a woman and the second by a man.
Is it necessary to set some special configuration parameter to get Speaker 0 and Speaker 1?
Hi, I want to know whether I can use task='translate' in the Whisper_Transcription_+_NeMo_Diarization.ipynb file. I want to pass non-English (Hindi) audio to the model, obtain an English transcription (using task='translate'), and then perform speaker diarization.
However, I think that because the translated transcript and the audio are in different languages (English and Hindi, respectively), I am not able to achieve this.
Can somebody help me? How can I perform both speaker diarization and translation of the transcription?
For testing purposes I am using Whisper_Transcription_+_NeMo_Diarization.ipynb.
@francescocassini Usually the default settings work as expected, but you can check my second comment about what to change if they don't.
@01Ashish I haven't tested the translate task yet, but as a starting point, you should enable word timestamps in Whisper and remove the alignment model, then see how it goes.
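If the alignment model is dropped, the word timestamps coming out of Whisper still need to be mapped onto the diarizer's speaker turns. A minimal sketch of that mapping, assuming words and turns are plain dicts with times in seconds (these are illustrative structures, not the notebook's actual data types):

```python
def assign_speakers(words, turns):
    """Attach a speaker label to each word by looking up its midpoint.

    words: [{"word": str, "start": float, "end": float}, ...]
    turns: [{"speaker": str, "start": float, "end": float}, ...]
    """
    labeled = []
    for w in words:
        mid = (w["start"] + w["end"]) / 2
        speaker = "unknown"  # fallback when no turn covers the word
        for t in turns:
            if t["start"] <= mid <= t["end"]:
                speaker = t["speaker"]
                break
        labeled.append({**w, "speaker": speaker})
    return labeled

words = [{"word": "hello", "start": 0.0, "end": 0.4},
         {"word": "world", "start": 2.0, "end": 2.5}]
turns = [{"speaker": "Speaker 0", "start": 0.0, "end": 1.0},
         {"speaker": "Speaker 1", "start": 1.5, "end": 3.0}]
result = assign_speakers(words, turns)
```

Since this only uses timestamps and never the text, it works even when the transcript language (English) differs from the audio language (Hindi).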
I want to use a max_speaker parameter as a CLI argument, like whisperx does. Do you have any plans or a solution?
In our case it is known how many people speak in each audio, so I expect the model to perform well if I assign the correct max_speaker number for each execution. However, I don't know how to do that.
You can modify this parameter in the telephonic YAML config found in the configs folder. Can you try it on an audio file where it predicts the wrong number of speakers and see if it makes a difference?
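If editing the file by hand is awkward, the same override can be applied programmatically before inference. A sketch assuming the key layout of NeMo's stock configs, where max_num_speakers sits under diarizer.clustering.parameters (the helper itself is hypothetical):

```python
def set_max_speakers(cfg, n):
    # Walk (or create) the nested sections and set the speaker cap.
    # Mirrors diarizer.clustering.parameters.max_num_speakers in NeMo configs.
    cfg.setdefault("diarizer", {}) \
       .setdefault("clustering", {}) \
       .setdefault("parameters", {})["max_num_speakers"] = n
    return cfg

cfg = {}
set_max_speakers(cfg, 9)
```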
@MahmoudAshraf97
Thank you for the quick reply. Unfortunately, I haven't actually faced a wrong-number-of-speakers problem; I'm just considering how to use this repository well.
I use this repository to create transcriptions with Docker on cloud services, so changing the YAML is a bit difficult in my case.
Ideally, if I could assign the max_speaker value from the CLI, I could change the behavior dynamically via the docker run command.
Will that kind of option be supported in this repository in the future? If there is no plan, I will use a larger speaker count.
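Until such a flag exists upstream, a thin wrapper script can expose it. A sketch with argparse; the --max-speakers flag and the wrapper itself are assumptions, not an existing CLI of this repo:

```python
import argparse

def build_parser():
    # Hypothetical wrapper exposing the speaker cap on the command line;
    # the wrapper would write this value into the YAML before launching inference.
    p = argparse.ArgumentParser(description="diarization wrapper (sketch)")
    p.add_argument("-a", "--audio", help="path to the input audio file")
    p.add_argument("--max-speakers", type=int, default=8,
                   help="upper bound for the clustering speaker count")
    return p

args = build_parser().parse_args(["--max-speakers", "4"])
```

Inside Docker, this would let something like `docker run image --max-speakers 4` change the behavior per execution without rebuilding the image.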