
pyannote-whisper's Introduction

pyannote-whisper

Run ASR and speaker diarization based on whisper and pyannote.audio.

Installation

  1. Install whisper.
  2. Install pyannote.audio.
  3. Downgrade setuptools to 59.5.0
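After the three steps above, one way to confirm what is actually installed is to query package metadata from Python. This is a minimal sketch: the distribution names openai-whisper and pyannote.audio are assumed to be the PyPI names of the two packages, and installed_versions is a hypothetical helper that is not part of this project.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them to your environment.
    wanted = ["openai-whisper", "pyannote.audio", "setuptools"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

If setuptools reports anything other than 59.5.0, step 3 has not taken effect in the active environment.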

Command-line usage

Usage is the same as the whisper command line, with one additional parameter, --diarization:

python -m pyannote_whisper.cli.transcribe data/afjiv.wav --model tiny --diarization True

Python usage

Transcription can also be performed within Python:

import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="your/token")
model = whisper.load_model("tiny.en")
asr_result = model.transcribe("data/afjiv.wav")
diarization_result = pipeline("data/afjiv.wav")
final_result = diarize_text(asr_result, diarization_result)

for seg, spk, sent in final_result:
    line = f'{seg.start:.2f} {seg.end:.2f} {spk} {sent}'
    print(line)
0.00 10.34 SPEAKER_00  I think if you're a leader and you don't understand the terms that you're using, that's probably the first start.
10.34 16.24 SPEAKER_00  It's really important that as a leader in the organisation you understand what digitisation means.
16.24 18.52 SPEAKER_00  You take the time to read widely in the sector.
18.52 26.16 SPEAKER_00  There are a lot of really good books, Kevin Kelly, who started Wired magazine has written a great book on various technologies.
26.16 34.80 SPEAKER_00  I think understanding the technologies, understanding what's out there so that you can separate the hype from the hope is really an important first step.
34.80 41.04 SPEAKER_00  And then making sure you understand the relevance of that for your function and how that fits into your business is the second step.
41.04 44.92 SPEAKER_01  I think two simple suggestions.
44.92 49.68 SPEAKER_01  One is I love the phrase brilliant at the basics.
49.68 52.00 SPEAKER_01  How can you become brilliant at the basics?
52.00 62.48 SPEAKER_01  But beyond that, the fundamental thing I've seen which hasn't changed is so few organisations as a first step have truly taken control of their spend data.
62.48 68.44 SPEAKER_01  As a key first step on a digital transformation, taking ownership of data.
68.44 71.76 SPEAKER_01  That's not a decision to use one vendor over someone else.
71.76 76.40 SPEAKER_01  That says we are going to be completely data driven, we're going to try and be as real time as possible.
76.40 81.04 SPEAKER_01  And we're going to be able to explain that data to anyone the way they want to see it.
81.04 91.04 SPEAKER_03  Understand why you're doing it.
91.04 95.24 SPEAKER_03  Talk to them, collaborate with them, you'll get a much better outcome.
95.24 104.32 SPEAKER_04  Think about what outcome you want at the end instead of thinking about the different processes and their software names.
104.32 108.32 SPEAKER_04  So, e-sourcing being one of 20.
108.32 109.52 SPEAKER_04  Think big and be brave.
109.52 118.56 SPEAKER_04  I think and talk to technology vendors because rather than just sending them forms, we won't bite you.
118.56 130.96 SPEAKER_02  I think we should fundamentally, all of us, rethink how procurement should be done and then start to define the functionality that we need and how we can make this work.
130.96 135.68 SPEAKER_02  What we do today is absolutely wrong.
135.68 172.00 SPEAKER_02  We don't like it, but we don't like it, our colleagues don't like it, nobody wants it and we're spending a huge amount of money for no reason.
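The README does not describe how diarize_text pairs transcribed segments with speakers. One common approach, sketched here as an assumption and not as this project's implementation, is to give each ASR segment the speaker whose diarization turns overlap it for the longest total time. The function names and the sample data are illustrative only.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(asr_segments, speaker_turns):
    """Label each ASR segment with the speaker that overlaps it the most.

    asr_segments:  list of (start, end, text)
    speaker_turns: list of (start, end, speaker)
    Returns a list of (start, end, speaker, text); speaker is None if no turn overlaps.
    """
    labelled = []
    for seg_start, seg_end, text in asr_segments:
        totals = {}
        for turn_start, turn_end, speaker in speaker_turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else None
        labelled.append((seg_start, seg_end, speaker, text))
    return labelled


# Made-up data for illustration; not taken from a real diarization run.
asr = [(0.0, 10.0, "first sentence"), (41.0, 45.0, "second sentence")]
turns = [(0.0, 40.5, "SPEAKER_00"), (40.5, 81.0, "SPEAKER_01")]
for start, end, speaker, text in assign_speakers(asr, turns):
    print(f"{start:.2f} {end:.2f} {speaker} {text}")
```

This version compares every segment with every turn, which is simple but quadratic; for long recordings a sorted sweep over both lists would do less work.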

Python usage 2

Please find more details in this notebook.

import whisper
from pyannote.audio import Pipeline
from pyannote.audio import Audio
from pyannote_whisper.utils import diarize_text
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="your/token")
model = whisper.load_model("tiny.en")
diarization_result = pipeline("data/afjiv.wav")

# Crop each diarized segment from the file and transcribe it on its own.
audio = Audio(sample_rate=16000, mono=True)
audio_file = "data/afjiv.wav"
for segment, _, speaker in diarization_result.itertracks(yield_label=True):
    waveform, sample_rate = audio.crop(audio_file, segment)
    text = model.transcribe(waveform.squeeze().numpy())["text"]
    print(f"{segment.start:.2f}s {segment.end:.2f}s {speaker}: {text}")
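Transcribing every diarized segment separately can hand the model many very short crops. A possible refinement, shown here as a sketch and not as part of this project, is to merge consecutive turns by the same speaker before cropping. merge_turns and the max_gap threshold are hypothetical; the input is assumed to be a time-sorted list of (start, end, speaker) tuples built from itertracks.

```python
def merge_turns(turns, max_gap=0.5):
    """Merge consecutive turns by the same speaker separated by at most max_gap seconds.

    turns: time-sorted list of (start, end, speaker).
    """
    merged = []
    for start, end, speaker in turns:
        same_speaker = bool(merged) and merged[-1][2] == speaker
        if same_speaker and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


# Made-up turns for illustration.
example = [(0.0, 1.0, "A"), (1.2, 2.0, "A"), (2.1, 3.0, "B"), (5.0, 6.0, "B")]
print(merge_turns(example))
```

Each merged tuple can then be turned back into a segment and passed to audio.crop as in the loop above.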

Meeting summary from ChatGPT

summarize the following words: SPEAKER_00  I think if you're a leader and you don't understand the terms that you're using, that's probably the first start.
SPEAKER_00  It's really important that as a leader in the organisation you understand what digitisation means.
SPEAKER_00  You take the time to read widely in the sector.
SPEAKER_00  There are a lot of really good books, Kevin Kelly, who started Wired magazine has written a great book on various technologies.
...
Leaders need to understand the terms and technologies of digitization in order to separate hype from hope and apply it effectively to their business. Becoming data-driven and collaborating with technology vendors can help organizations achieve their desired outcomes. Procurement processes need to be fundamentally rethought to improve functionality and eliminate unnecessary spending.
summarize the opinions of different speaker
Speaker 00 emphasizes the importance of leaders understanding digitization terms and technologies to apply them effectively to their business. 
Speaker 01 suggests that taking control of spend data is a crucial first step in digital transformation. 
Speaker 02 believes that procurement processes need to be rethought to improve functionality and eliminate unnecessary spending. 
Speaker 03 advises understanding the purpose of digitization and collaborating with colleagues for a better outcome. 
Speaker 04 suggests thinking big and being brave, as well as talking to technology vendors to achieve desired outcomes.
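The prompts above can be assembled from the diarized transcript with a few lines of Python. This is a minimal sketch: build_summary_prompt is a hypothetical helper, and final_result is assumed to be an iterable of (segment, speaker, sentence) tuples as in the first Python usage example. Sending the prompt to a chat model is left out.

```python
def build_summary_prompt(final_result, instruction="summarize the following words:"):
    """Join speaker-labelled sentences into a single prompt string."""
    lines = [f"{spk} {sent.strip()}" for _seg, spk, sent in final_result]
    return instruction + " " + "\n".join(lines)


# Segment objects are not needed for the prompt, so None stands in for them here.
sample = [
    (None, "SPEAKER_00", " You take the time to read widely in the sector."),
    (None, "SPEAKER_01", " I think two simple suggestions."),
]
print(build_summary_prompt(sample))
```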

pyannote-whisper's People

Contributors

benjamin-loison, benoitrolland, gsheni, guillermo1996, jqueguiner, rbroderi, yinruiqing

pyannote-whisper's Issues

No longer diarizes

It seems that it now only performs transcription and no longer performs diarization. The screenshot below is based on the shared example file (for which the repo is still using yinruiqing's HF token, as pointed out by Jordi in another thread). So scary~

Screenshot 2024-02-18 at 11 35 25

Diarization ends in crash

Command used: python -m pyannote_whisper.cli.transcribe C:\Users\style\Desktop\sicherheit\1045400524.wav --model tiny --diarization True

Traceback (most recent call last):
  File "C:\Users\style\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\style\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\style\PycharmProjects\pyannote-whisper\pyannote_whisper\cli\transcribe.py", line 131, in <module>
    cli()
  File "C:\Users\style\PycharmProjects\pyannote-whisper\pyannote_whisper\cli\transcribe.py", line 126, in cli
    res = diarize_text(result, diarization_result)
  File "C:\Users\style\PycharmProjects\pyannote-whisper\pyannote_whisper\utils.py", line 59, in diarize_text
    res_processed = merge_sentence(spk_text)
  File "C:\Users\style\PycharmProjects\pyannote-whisper\pyannote_whisper\utils.py", line 43, in merge_sentence
    elif text[-1] in PUNC_SENT_END:
IndexError: string index out of range

Any idea why this is happening? I have this for multiple files, but not for all files.
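The last frame of the traceback indexes text[-1], which raises IndexError whenever text is an empty string, so a segment with empty text would explain why only some files fail. A defensive sketch follows; it is not the repository's code, the contents of PUNC_SENT_END are assumed, and ends_sentence and drop_empty are hypothetical helpers.

```python
# Assumed contents; the real constant in pyannote_whisper/utils.py may differ.
PUNC_SENT_END = (".", "?", "!")


def ends_sentence(text):
    """True if text ends with sentence punctuation; safe for empty strings."""
    return text.strip().endswith(PUNC_SENT_END)


def drop_empty(spk_text):
    """Remove (segment, speaker, text) entries whose text is empty or whitespace."""
    return [(seg, spk, text) for seg, spk, text in spk_text if text.strip()]


try:
    ""[-1]  # what text[-1] does when text is empty
except IndexError as err:
    print("IndexError:", err)

print(ends_sentence(""))        # no exception on empty text
print(ends_sentence("Hello."))
```

Filtering with drop_empty before merging, or replacing the text[-1] test with an endswith check, would both avoid the crash under these assumptions.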

Script exits with Cuda Error under Ubuntu WSL2

System: Ubuntu WSL2, Windows 11

Hey, when I run your script, I encounter the following error:

python3 -m pyannote_whisper.cli.transcribe "videoplayback.m4a" --model large --diarization True
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/**/.local/lib/python3.8/site-packages/pyannote_whisper/cli/transcribe.py", line 124, in <module>
    cli()
  File "/home/**/.local/lib/python3.8/site-packages/pyannote_whisper/cli/transcribe.py", line 91, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
  File "/home/**/.local/lib/python3.8/site-packages/whisper/__init__.py", line 115, in load_model
    return model.to(device)
  File "/home/**/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 987, in to
    return self._apply(convert)
  File "/home/**/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/**/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/**/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/**/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 662, in _apply
    param_applied = fn(param)
  File "/home/**/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 985, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 4.00 GiB total capacity; 3.22 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

However, if I run nvidia-smi under Ubuntu, it displays that only 10MiB/4GB of GPU memory is used. I have the newest NVIDIA Cuda drivers installed.
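The error is raised while the large model is being moved to a GPU with 4.00 GiB in total, and nvidia-smi run afterwards would no longer show memory held by the process that crashed. The sketch below picks the largest Whisper model likely to fit a given amount of VRAM. The figures are the approximate requirements listed in the Whisper README and should be treated as rough; largest_model_that_fits and the headroom value are hypothetical.

```python
# Approximate VRAM needs in GB, as listed in the Whisper README (rough figures).
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(total_vram_gb, headroom_gb=0.5):
    """Return the largest model name whose approximate need fits, or None.

    headroom_gb is an assumed safety margin for anything else using the GPU.
    """
    budget = total_vram_gb - headroom_gb
    fitting = [name for name, need in APPROX_VRAM_GB if need <= budget]
    return fitting[-1] if fitting else None


print(largest_model_that_fits(4.0))
```

On a 4 GB card this suggests trying --model small instead of --model large.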

Segmentation fault Error

Hi,

I followed the GitHub README instructions to install all the packages and ran the "Python usage" sample code (exactly as it appears in the README) on an EC2 server, but I got a "Segmentation fault" error and don't know how to resolve it. I copied all of the Python command-line output below. Can you help me with this? Thanks!

/home/ec2-user/pyannote-whisper/venv/lib/python3.10/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
Segmentation fault

Should Diarization take this long?

Hi, how can I improve the diarization time? Currently, 40+ seconds of audio takes nearly 3 minutes to process (see times below).

Load Model
2023-02-09 17:24:42.884215

Transcribe
2023-02-09 17:24:43.236359

Diarize
2023-02-09 17:24:48.702966

Combined
2023-02-09 17:27:40.335563

0.00-4.00 SPEAKER_02  Thanks to our partner Square, everything your business needs.
4.00-10.00 SPEAKER_02  Like payments, point of sale, e-commerce, inventory management, charging endlessly about the weather?
10.00-12.00 SPEAKER_02  Well, you're on your own there.
14.00-16.00 SPEAKER_01  Square, everything your business needs.
16.00-20.00 SPEAKER_01  Almost. Visit Square.com.
20.00-30.00 SPEAKER_00  This is an Irish Independent Podcast.
30.00-34.00 SPEAKER_03  I'm Adrian Wacler and this is the Big Tech Show.
34.00-37.00 SPEAKER_03  Is your traditional office doomed?
37.00-50.00 SPEAKER_03  Dropbox founder and CEO Drew Huyerson thinks so, and was in double in the week to show off the company's new revamped virtual first office in the series.

Here is my Python code. Any help in optimizing it, or even implementing a better solution, would be greatly appreciated. Cheers.

from pyannote.audio import Pipeline
import whisper
from pyannote_whisper.utils import diarize_text

pipeline = Pipeline.from_pretrained("Models/config.yaml")
tiny = "Models/tiny.pt"
audio = "Audio/WAV-CLIP-The-Big-Tech-Show_Dropbox-CEO.wav"

model = whisper.load_model(tiny)
asr_result = model.transcribe(audio)
diarization_result = pipeline(audio)
final_result = diarize_text(asr_result, diarization_result)

for seg, spk, sent in final_result:
    line = f'{seg.start:.2f}-{seg.end:.2f} {spk} {sent}'
    print(line)
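To see which stage dominates, the four timestamps reported above can be turned into per-stage durations. This sketch assumes each timestamp was printed just before the named stage started, which the post does not state explicitly; stage_durations is a hypothetical helper.

```python
from datetime import datetime

# Timestamps copied from the report above.
STAMPS = [
    ("Load Model", "2023-02-09 17:24:42.884215"),
    ("Transcribe", "2023-02-09 17:24:43.236359"),
    ("Diarize", "2023-02-09 17:24:48.702966"),
    ("Combined", "2023-02-09 17:27:40.335563"),
]


def stage_durations(stamps):
    """Seconds between each stage's timestamp and the next one."""
    parsed = [(name, datetime.fromisoformat(ts)) for name, ts in stamps]
    return [
        (name, (next_time - time).total_seconds())
        for (name, time), (_next_name, next_time) in zip(parsed, parsed[1:])
    ]


for name, seconds in stage_durations(STAMPS):
    print(f"{name}: {seconds:.2f} s")
```

Under that assumption, loading takes about 0.35 s, transcription about 5.47 s, and diarization about 171.63 s, so the pyannote pipeline call accounts for almost all of the run time.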

Syntax error on model_name

I think this is a really cool project and am trying to get it working, but unfortunately I keep getting this error:

python -m pyannote_whisper.cli.transcribe ~/Downloads/preamble10.wav --model medium --diarization True
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 163, in _run_module_as_main
    mod_name, _Error)
  File "/usr/lib/python2.7/runpy.py", line 119, in _get_module_details
    code = loader.get_code(mod_name)
  File "/usr/lib/python2.7/pkgutil.py", line 281, in get_code
    self.code = compile(source, self.filename, 'exec')
  File "/home/conrad/Documents/pyannote-whisper-main/pyannote_whisper/cli/transcribe.py", line 67
    model_name: str = args.pop("model")
              ^
SyntaxError: invalid syntax

Any thoughts?
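The paths in the traceback (/usr/lib/python2.7/...) show that the python command resolves to Python 2.7 on this machine. The flagged line uses a variable annotation (model_name: str = ...), which is only valid syntax from Python 3.6 onward, so Python 2.7 rejects it. A small check, written so that it also runs on old interpreters, is sketched below; supports_annotations is a hypothetical helper.

```python
import sys

# Variable annotations (PEP 526) were added in Python 3.6.
MIN_VERSION = (3, 6)


def supports_annotations(version_info=sys.version_info):
    """True if the given interpreter version can parse 'name: type = value'."""
    return tuple(version_info[:2]) >= MIN_VERSION


print(sys.version.split()[0], supports_annotations())
```

Running the same module with python3 -m pyannote_whisper.cli.transcribe ... should get past this particular error, assuming the packages are installed for that interpreter.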

No SPEAKER label even when Diarization is enabled.

I've run "python setup.py build" and "python setup.py install" to install this package, but when running the following

$ python -m pyannote_whisper.cli.transcribe oxventure.mp3 --model medium.en --diarization True
[00:00.000 --> 00:03.440]  Hello and welcome to the Oxventure podcast. I'm Jane his Andy.
[00:03.440 --> 00:04.840]  Hey, Jane, thanks for having me.
[00:04.840 --> 00:10.520]  Welcome to this podcast version of the first ever episode of Oxventure D&D.
[00:10.520 --> 00:14.360]  Yep, which stands for Dungeons and Dragons, if you're not aware.
[00:14.360 --> 00:15.760]  Which we know now.

there are no SPEAKER labels like the ones shown in the README. Perhaps I'm doing something wrong?

The audio file to test can be found here https://outsidexbox.libsyn.com/1-the-spicy-rat-caper-part-1

Audio annotation application like prodigy or descript

I am developing an audio annotation application based on Vue.js. The project repo is here. It could take advantage of ASR and speaker diarization systems like Whisper and pyannote-audio. If you are interested in this project and have frontend skills (JavaScript, Vue.js, React), I can add you as a collaborator.

invalid str2bool value

I'm getting "transcribe.py: error: argument --diarization: invalid str2bool value: 'true'".

How do I fix this?
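The README example passes --diarization True with a capital T, while the failing value here is 'true'. The error is what argparse reports when the type function rejects a value, so a case-sensitive parser would explain it. The sketch below reproduces that behaviour under the assumption that the CLI's str2bool accepts only the exact strings "True" and "False"; str2bool_any_case is a hypothetical, more forgiving variant and not part of the project.

```python
import argparse


def str2bool(string):
    """Strict parser: only the exact strings "True" and "False" are accepted."""
    str2val = {"True": True, "False": False}
    if string in str2val:
        return str2val[string]
    raise ValueError(f"Expected one of {set(str2val)}, got {string}")


def str2bool_any_case(string):
    """Hypothetical lenient variant: ignores case and surrounding whitespace."""
    lowered = string.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"Expected true or false, got {string}")


parser = argparse.ArgumentParser(prog="transcribe.py")
parser.add_argument("--diarization", type=str2bool, default=False)

print(parser.parse_args(["--diarization", "True"]).diarization)
print(str2bool_any_case("true"))
```

With the strict parser, passing --diarization True (capitalized) avoids the error.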

Oh, and I have a question: how would I go about splitting the audio into individual files by speaker? Maybe this is a feature you could add?

Thanks!
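For the per-speaker split, a first step could be to group the diarized time ranges by speaker label. This is only a sketch of that grouping: segments_by_speaker is a hypothetical helper, the turns are assumed to be (start, end, speaker) tuples collected from itertracks as in the README, and cropping and writing the audio for each range is left out.

```python
from collections import defaultdict


def segments_by_speaker(turns):
    """Group (start, end) ranges under each speaker label, in input order."""
    grouped = defaultdict(list)
    for start, end, speaker in turns:
        grouped[speaker].append((start, end))
    return dict(grouped)


# Made-up turns for illustration.
turns = [
    (0.0, 4.0, "SPEAKER_00"),
    (4.0, 9.5, "SPEAKER_01"),
    (9.5, 12.0, "SPEAKER_00"),
]
for speaker, ranges in segments_by_speaker(turns).items():
    print(speaker, ranges)
```

Each speaker's ranges could then be cropped from the source file and concatenated into one output file per speaker.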

code is returning two digits output?

output:
0.000 13.000 SPEAKER_01 xxx
13.000 17.000 SPEAKER_00 xxx xxxx
17.000 19.000 SPEAKER_01 xxxxxx

when it instead should be:

11.954 13.183 SPEAKER_01 xxx
13.677 16.681 SPEAKER_00 xxx xxxx
17.568 18.763 SPEAKER_01 xxxxxx

License

Thank you for the cool project.
This is a request to choose and add a license to your repo. An open license such as MIT would be great, however, it's up to you.

Why does the diarize_text method take such a long time?

As far as I can see, the method is just doing some string merging, but it takes much longer than I expected.
For example, a 7-minute audio file takes 30s to process with the whisper and pyannote models, but then takes 70s to process with the diarize_text method! I have no idea why this happens.
The diarization variable already contains all the information we need, so all we need to do is concatenate it, right?
