collabora / whisperlive
A nearly-live implementation of OpenAI's Whisper.
License: MIT License
Hello,
I have a problem when transcribing an audio file.
File "C:\Users\atrabels\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1538, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] Le fichier spécifié est introuvable (the specified file cannot be found)
I worked around this by setting shell=True in subprocess.py, but then I hit a new problem:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 84: invalid start byte
Any solution ?
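The UnicodeDecodeError comes from Python decoding the subprocess's output as UTF-8 while the Windows console emits a legacy code page. A minimal sketch of two more tolerant decodes (cp850 here is an assumption, a common Windows OEM code page; the same `encoding=` / `errors=` arguments can be passed to `subprocess.run`):

```python
raw = b"caf\x82"  # sample console bytes; 0x82 is 'e' with acute accent in cp850

# Option 1: decode with the console's OEM code page instead of UTF-8
text_cp850 = raw.decode("cp850")

# Option 2: keep UTF-8 but substitute undecodable bytes instead of raising
text_safe = raw.decode("utf-8", errors="replace")

print(text_cp850)
print(text_safe)
```

With `subprocess.run(..., text=True, encoding="cp850", errors="replace")` the same tolerance applies to captured output.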
When the server tries to process the first 30 seconds of input from the browser I get this error:
INFO:faster_whisper:Processing audio with duration 00:30.000
Exception in thread Thread-11:
Traceback (most recent call last):
File "/media/UltraStorageBTRFS/Programs/Linux/anaconda3/envs/whisper-live/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/media/UltraStorageBTRFS/Programs/Linux/anaconda3/envs/whisper-live/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/marc/Documents/GitClones/whisper-live/server.py", line 152, in speech_to_text
self.language, lang_prob = self.transcriber.transcribe(
ValueError: too many values to unpack (expected 2)
The issue is with line 152: even though it unpacks into two variables, it throws this error. I fixed it as follows, though there's probably a better way. Line 152 onwards:
transcriber_output = self.transcriber.transcribe(
    input_bytes,
    initial_prompt=None,
    language=self.language,
    task=self.task
)
self.language = transcriber_output[0]
lang_prob = transcriber_output[1]
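A slightly more defensive variant of the same workaround: unpack only the first two values regardless of how many the library returns (the `transcribe` stub below is hypothetical, just to make the sketch runnable):

```python
def transcribe(audio):
    # Hypothetical stub standing in for self.transcriber.transcribe,
    # which may return more than two values depending on the version.
    return "en", 0.98, ["segment data"]

# Take the first two values and ignore any extras
language, lang_prob, *_ = transcribe(b"...")
print(language, lang_prob)
```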
I start up the server via $ python ./run_server.py
(whisper_live) whisperlive git:(main)✗ 🚀 python ./run_server.py
Downloading: "https://github.com/snakers4/silero-vad/archive/master.zip" to /Users/justinwinter/.cache/torch/hub/master.zip
2023-08-21 12:14:34.119619 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119647 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119652 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119655 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119659 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119696 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119701 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119704 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119708 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119711 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
ERROR:root:no close frame received or sent
Then start up the client via:
(whisper_live) whisperlive git:(main)✗ 🚀 python ./run_client.py
[INFO]: * recording
[INFO]: Waiting for server ready ...
False en transcribe
[INFO]: Opened connection
[INFO]: Server Ready!
Traceback (most recent call last):
File "/Users/justinwinter/projects/whisperlive/./run_client.py", line 3, in <module>
client()
File "/Users/justinwinter/projects/whisperlive/whisper_live/client.py", line 298, in __call__
self.client.record()
File "/Users/justinwinter/projects/whisperlive/whisper_live/client.py", line 234, in record
data = self.stream.read(self.CHUNK)
File "/opt/homebrew/Caskroom/miniconda/base/envs/whisper_live/lib/python3.9/site-packages/pyaudio/__init__.py", line 570, in read
return pa.read_stream(self._stream, num_frames,
OSError: [Errno -9981] Input overflowed
# run_client.py
from whisper_live.client import TranscriptionClient
client = TranscriptionClient("0.0.0.0", "8080", is_multilingual=False, lang="en", translate=False)
client()
the new onnxruntime==1.16.0 breaks the whisper server
I installed it with "docker build . -t whisper-live -f docker/Dockerfile.cpu" and then ran it with "docker run -it -p 9090:9090 whisper-live:latest". But it seems to just hang in the terminal (after a while it asked for microphone access, but that was it). Could you help troubleshoot this issue? Thank you!
Hello!
I am currently using Twilio as my method to make phone calls + streaming the voice data to my server.
However, when I try to convert Twilio's x-mulaw audio format to linear PCM (as expected by WhisperLive), I don't get any response from WhisperLive. In other words, I know my GPU is working, and I know the audio data is good (I converted it to a wav file and it plays back clearly), but I'm not getting any transcription.
I also thought it worth noting that the audio is a bit quiet; not sure whether that could be the culprit.
Here's my conversion code (x-mulaw to linear PCM):
audio = base64.b64decode(packet['media']['payload'])
audio = audioop.ulaw2lin(audio, 2)
audio = audioop.ratecv(audio, 2, 1, 8000, 16000, None)[0]
await websocket.send(audio)
Let me know if you'd like me to provide any additional context :)
Thanks for the help in advance!
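For reference, the mu-law to 16-bit PCM step can also be done in pure Python (useful since the audioop module was removed in Python 3.13). A sketch assuming standard G.711 mu-law encoding, with a naive 2x upsample from 8 kHz to 16 kHz:

```python
import struct

def ulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~u & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample

def ulaw_to_pcm16_bytes(data: bytes) -> bytes:
    samples = [ulaw_byte_to_pcm16(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)

def upsample_2x(pcm: bytes) -> bytes:
    """Naive 8 kHz -> 16 kHz: repeat each 16-bit frame twice."""
    out = bytearray()
    for i in range(0, len(pcm), 2):
        out += pcm[i:i + 2] * 2
    return bytes(out)

pcm = ulaw_to_pcm16_bytes(b"\xff\x7f")  # 0xFF decodes to 0 (silence)
print(len(upsample_2x(pcm)))
```

If the converted audio is very quiet, the server-side VAD may also be classifying it as silence, which would explain getting no transcription at all.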
Hi,
I need a bit of technical aid.
I want to forward the transcription to an HTML client.
How can I do that by tweaking the client invocation?
I suggest emitting start and end as strings, fixed to three decimal places, which is accurate to milliseconds.
Current output:
{
    "start": 29.304000000000002,
    "end": 30.624000000000002,
    "text": "OK"
}
Suggested:
{
    "start": "29.304",
    "end": "30.624",
    "text": "OK"
}
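If the goal is just stable three-decimal output, the server could format the floats when building the JSON; a sketch, assuming the segment dict shown above:

```python
import json

segment = {"start": 29.304000000000002, "end": 30.624000000000002, "text": "OK"}

formatted = {
    "start": f"{segment['start']:.3f}",  # three decimals = millisecond precision
    "end": f"{segment['end']:.3f}",
    "text": segment["text"],
}
print(json.dumps(formatted))
```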
Hi!
Thank you for this great project!
How can we switch the Whisper model to the latest large-v3 model?
Also, if I have a fine-tuned large-v3 model, how can I use that custom model?
The problem occurs when building the Docker image.
I ran the server.py on my Mac and was successful.
I then ran the client.py and got the following error:
# Run the client
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=True,
    lang="ko",
    translate=False,
    model_size="base"
)
client()
Traceback (most recent call last):
File "/Users/asadal/Documents/Dev/Hani/WhisperLive_streamlit.py", line 5, in <module>
client = TranscriptionClient(
TypeError: TranscriptionClient.__init__() got an unexpected keyword argument 'model_size'
How do I fix this?
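The installed whisper_live version likely predates the model_size argument, so upgrading the package is the real fix. As a stopgap, a sketch that drops keyword arguments the installed class doesn't accept (filter_kwargs is a hypothetical helper, not part of WhisperLive; OldClient stands in for an older TranscriptionClient):

```python
import inspect

def filter_kwargs(cls, **kwargs):
    """Keep only the keyword arguments that cls.__init__ actually accepts."""
    allowed = set(inspect.signature(cls.__init__).parameters)
    return {k: v for k, v in kwargs.items() if k in allowed}

class OldClient:  # hypothetical: an older TranscriptionClient without model_size
    def __init__(self, host, port, lang=None):
        self.host, self.port, self.lang = host, port, lang

kwargs = filter_kwargs(OldClient, lang="ko", model_size="base")
client = OldClient("localhost", 9090, **kwargs)
print(kwargs)
```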
==============
Plus:
Is there a way to save the text output by the chrome extension to a file?
Thanks,
Looks like this package is not tested on Windows.
(whisper-live) E:\Workspace\github\whisper-live>python server.py
INFO:websockets.server:connection open
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to C:\Users\ufo/.cache\torch\hub\master.zip
2023-07-02 10:32:14.0518293 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0551451 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0586288 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0626213 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0667860 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0703673 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0745995 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0776970 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0812995 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2023-07-02 10:32:14.0841789 [W:onnxruntime:, graph.cc:3543 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
E:\Workspace\github_me\whisper-live\server.py:112: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at C:\cb\pytorch_1000000000000\work\torch\csrc\utils\tensor_numpy.cpp:205.)
speech_prob = self.vad_model(torch.from_numpy(frame_np), self.RATE).item()
With option to record and transcribe system audio it would be possible to cover all sorts of useful use cases like:
By transcribing system audio I mean to capture and transcribe anything that is coming out of the system speakers or headphones, similar to this example https://github.com/tez3998/loopback-capture-sample
I'm trying to do realtime transcription from the browser, I have the server running on a container. I connect to it and do the handshake. After the handshake I start recording, when I stop the audio I convert the audio to a Float32Array, so that it can be understood by the server. Right now I'm just trying to send one full length audio Float32Array, however, when I send it I don't get a response. I'm wondering if my logic is correct also if the format I'm sending the data is correct. Removed some stuff from my code for brevity.
let socket = new WebSocket('wss://');
socket.onopen = function(e) {
    socket.send(
        JSON.stringify({
            uid: v4(),
            multilingual: true,
            language: "en",
            task: "transcribe"
        })
    );
};
socket.addEventListener('message', (event) => {
    console.log('Message from server: ', event);
});

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = (e) => {
    mediaChunks.push(e.data);
};
mediaRecorder.onstop = function() {
    const audioBlob = new Blob(mediaChunks, { 'type': 'audio/ogg; codecs=opus' });
    mediaChunks = [];
    audioBlob.arrayBuffer().then(buffer => {
        const audioContext = new (window.AudioContext || window.webkitAudioContext)();
        // decodeAudioData takes the encoded buffer first, then a success callback
        audioContext.decodeAudioData(buffer, (decodedData) => {
            const monoChannelData = decodedData.getChannelData(0); // mono channel data
            socket.send(monoChannelData.buffer);
        });
    });
};
Hi, I have a specific use case in mind and I'm looking for any assistance I can get modifying this software for it. It might be a beneficial addition to the project to have a mode which operates this way.
I have Zello (a walkie-talkie app) running and playing its output through a virtual audio cable, whose other end is the default input.
When I run the whisper-live client in microphone mode it picks up the audio stream from Zello just fine and transcription proceeds. Great!
Now my goal is to get it so that when I speak a message, the transcribed text gets URL-encoded and passed to
mpg123 http://my.server/my.php?text=<text>
This relates to an AI text-to-speech generator which replies with streaming mp3 audio (and Zello is set to VOX transmit so the voice gets played back to me on Zello).
The idea is that both server and client would stay running 24/7, transcribing messages only when received, waiting for the message to be completely transcribed, then running mpg123.
Once a message has been played we can dispose of any temporary audio chunks since we don't want to repeat any old text.
A few challenges I'm facing currently...
It seems that the longer the client runs, the more audio chunks it writes to disk, and transcription takes longer and longer.
Also, in the on_message function, the message keeps getting longer and longer, and I don't want the full message, only the new text.
Hope this makes sense; if anyone can help, please reply here or on Discord (laozi101). Thanks!
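For the two pieces above, a sketch of the URL-encode-and-play step and of keeping only the unseen tail of each message. This assumes each on_message payload carries the full transcript so far (that matches the "keeps getting longer" observation); the server URL is the one from the post:

```python
import shlex
import urllib.parse

def build_command(text: str) -> str:
    """URL-encode the transcript and build the mpg123 command line."""
    url = "http://my.server/my.php?text=" + urllib.parse.quote(text)
    return "mpg123 " + shlex.quote(url)

_last = ""

def new_text(full_transcript: str) -> str:
    """Return only the part of the transcript not seen in the previous message."""
    global _last
    delta = full_transcript[len(_last):] if full_transcript.startswith(_last) else full_transcript
    _last = full_transcript
    return delta

print(build_command("hello world"))
print(new_text("hello"))
print(new_text("hello there"))
```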
Hey guys. Really appreciate the project. I'm really new to Whisper and Python but have a fair amount of coding background in other languages. Wondering if you could provide any strategy ideas or an outline on the best way to approach the below.
I've got an existing websocket server implementation that accepts a websocket connection from Twilio
The websocket media messages look like this:
{
"event": "media",
"sequenceNumber": "4",
"media": {
"track": "inbound",
"chunk": "2",
"timestamp": "5",
"payload": "no+JhoaJjpzSHxAKBgYJDhtEopGKh4aIjZm7JhILBwYIDRg1qZSLh4aIjJevLBUMBwYHDBUsr5eMiIaHi5SpNRgNCAYHCxImu5mNiIaHipGiRBsOCQYGChAf0pyOiYaGiY+e/x4PCQYGCQ4cUp+QioaGiY6bxCIRCgcGCA0ZO6aSi4eGiI2YtSkUCwcGCAwXL6yVjIeGh4yVrC8XDAgGBwsUKbWYjYiGh4uSpjsZDQgGBwoRIsSbjomGhoqQn1IcDgkGBgkPHv+ej4mGhomOnNIfEAoGBgkOG0SikYqHhoiNmbsmEgsHBggNGDWplIuHhoiMl68sFQwHBgcMFSyvl4yIhoeLlKk1GA0IBgcLEia7mY2IhoeKkaJEGw4JBgYKEB/SnI6JhoaJj57/Hg8JBgYJDhxSn5CKhoaJjpvEIhEKBwYIDRk7ppKLh4aIjZi1KRQLBwYIDBcvrJWMh4aHjJWsLxcMCAYHCxQptZiNiIaHi5KmOxkNCAYHChEixJuOiYaGipCfUhwOCQYGCQ8e/56PiYaGiY6c0h8QCgYGCQ4bRKKRioeGiI2ZuyYSCwcGCA0YNamUi4eGiIyXrywVDAcGBwwVLK+XjIiGh4uUqTUYDQgGBwsSJruZjYiGh4qRokQbDgkGBgoQH9KcjomGhomPnv8eDwkGBgkOHFKfkIqGhomOm8QiEQoHBggNGTumkouHhoiNmLUpFAsHBggMFy+slYyHhoeMlawvFwwIBgcLFCm1mI2IhoeLkqY7GQ0IBgcKESLEm46JhoaKkJ9SHA4JBgYJDx7/no+JhoaJjpzSHxAKBgYJDhtEopGKh4aIjZm7JhILBwYIDRg1qZSLh4aIjJevLBUMBwYHDBUsr5eMiIaHi5SpNRgNCAYHCxImu5mNiIaHipGiRBsOCQYGChAf0pyOiYaGiY+e/x4PCQYGCQ4cUp+QioaGiY6bxCIRCgcGCA0ZO6aSi4eGiI2YtSkUCwcGCAwXL6yVjIeGh4yVrC8XDAgGBwsUKbWYjYiGh4uSpjsZDQgGBwoRIsSbjomGhoqQn1IcDgkGBgkPHv+ej4mGhomOnNIfEAoGBgkOG0SikYqHhoiNmbsmEgsHBggNGDWplIuHhoiMl68sFQwHBgcMFSyvl4yIhoeLlKk1GA0IBgcLEia7mY2IhoeKkaJEGw4JBgYKEB/SnI6JhoaJj57/Hg8JBgYJDhxSn5CKhoaJjpvEIhEKBwYIDRk7ppKLh4aIjZi1KRQLBwYIDBcvrJWMh4aHjJWsLxcMCAYHCxQptZiNiIaHi5KmOxkNCAYHChEixJuOiYaGipCfUhwOCQYGCQ8e/56PiYaGiY6c0h8QCgYGCQ4bRKKRioeGiA=="
},
"streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
source: https://www.twilio.com/docs/voice/twiml/stream#websocket-messages-from-twilio
Here is my existing websocket proof of concept that accepts an incoming stream fine and I can transcribe using whisper_cpp after the stream has completed. I'm looking to get realtime transcription working though if possible.
@app.websocket("/stream")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    audio_bytes_buffer = bytearray()
    try:
        while True:
            message = await websocket.receive_text()
            packet = json.loads(message)
            if packet["event"] == "start":
                print("Streaming is starting")
            elif packet["event"] == "stop":
                print("\nStreaming has stopped")
                # global accumulated_audio, accumulated_frames
                # accumulated_audio = bytearray()  # Reset accumulated_audio
                # accumulated_frames = []  # Reset accumulated_frames
                break
            elif packet["event"] == "media":
                audio = bytes.fromhex(packet["media"]["payload"])
                audio = audioop.ulaw2lin(audio, 2)
                audio = audioop.ratecv(audio, 2, 1, 8000, 16000, None)[0]
                audio_bytes_buffer.extend(audio)
                # Append the processed audio to the audio buffer for asynchronous processing
                audio_buffer.append(audio)
                # length of audio_bytes_buffer in seconds
                length_in_seconds = len(audio_bytes_buffer) / BYTES_IN_1_MS / 1000
                logger.info(f"audio_bytes_buffer seconds: {length_in_seconds}")
                # Schedule background task for transcription
                asyncio.create_task(execute_transcription(model, audio_bytes_buffer))
        # SAVE COMPLETE AUDIO FILE
        filename = f"99_complete_audio.wav"
        length_in_seconds = len(audio_bytes_buffer) / BYTES_IN_1_MS / 1000
        print(f"Saving (unknown) seconds: {length_in_seconds}")
        asyncio.create_task(execute_save_segment(audio_bytes_buffer, filename))
    except Exception as e:
        print(f"WebSocket closed unexpectedly: {e}")
What I'm wondering is what would be the best way to send the live streaming audio data to the server? Would it make sense to create a new websocket server to listen for incoming Twilio stream data and then send that to the TwilioClient somehow. Thinking of modifying the record method to handle incoming audio data instead of recording from the mic. Any feedback would be greatly appreciated.
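One detail worth checking if you feed audio straight into WhisperLive's socket: as far as I can tell from the stock client, the server expects little-endian float32 samples, while the Twilio pipeline above produces 16-bit PCM. Treat that wire format as an assumption; if it holds, the conversion is a few lines of standard library:

```python
import struct

def pcm16_to_float32_bytes(pcm: bytes) -> bytes:
    """Convert little-endian int16 PCM to little-endian float32 in [-1.0, 1.0)."""
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm)
    return struct.pack(f"<{n}f", *(s / 32768.0 for s in samples))

frame = struct.pack("<3h", -32768, 0, 16384)
floats = struct.unpack("<3f", pcm16_to_float32_bytes(frame))
print(floats)
```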
cheers!
Hi, I have an issue running run_server. How can I solve it?
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /root/.cache/torch/hub/master.zip
2023-09-16 19:29:25.999206343 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999219739 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999221888 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999223751 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999225479 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999253937 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999255898 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999258033 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999259496 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2023-09-16 19:29:25.999261155 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.
I'm not sure if what I'm doing is right, but setting the language string to "ar" didn't switch to Arabic; it was still showing English.
Hello,
It would be interesting to have speaker diarization for offline transcription. Do you already have this functionality?
Best regards
here is the log after this code
from whisper_live.server import TranscriptionServer
server = TranscriptionServer()
server.run("0.0.0.0", 9090)
Traceback (most recent call last):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 187, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 110, in _get_module_details
__import__(pkg_name)
File "D:\Code\whisper\WhisperLive-0.0.7\main.py", line 2, in
server = TranscriptionServer()
File "D:\Code\whisper\WhisperLive-0.0.7\whisper_live\server.py", line 38, in __init__
self.vad_model = VoiceActivityDetection()
File "D:\Code\whisper\WhisperLive-0.0.7\whisper_live\vad.py", line 21, in __init__
self.session = onnxruntime.InferenceSession(path, providers=['CPUExecutionProvider'], sess_options=opts)
File "C:\Users\25813\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "C:\Users\25813\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 452, in _create_inference_session
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\25813/.cache/whisper-live/silero_vad.onnx failed:C:\a_work\1\s\onnxruntime\core\graph\model.cc:134 onnxruntime::Model::Model ModelProto does not have a graph.
I have installed the requirements from client.txt and server.txt. I could not install them via setup.sh, because I am running the server and client on the same Windows system; setup.py also errors, perhaps because my account does not have write access to the directory.
I also haven't installed CUDA; is this error linked to that, or something else?
Looking forward to your early reply.
is there a demo of this somewhere to get a sense of the transcription latency?
I can't seem to reopen the issue, so just bumping here.
I have gone through server.py and noticed that the initial_prompt and vad_parameters are hardcoded. Is there a way to make them configurable by the client? Perhaps set them in the first message after establishing a WebSocket connection? Alternatively, could you provide some guidance so that I could submit a PR?
I feel a little foolish here, as I am probably missing something obvious.
I get that the client has the onMessage function, but I am struggling to see where I can access this to either save to a file or pass to another function.
Also, while I am here: do any flags need to be passed to use the GPU? Output feels a little sluggish for having a 3090.
Where can I change and specify the Whisper model, e.g. switching from base to medium or another size?
How do I point it at a specific local model? I want to use the large model.
I want to know if bufferSize is required to be a fixed value. Thank you. ❤️
client.py
if len(text) > 3: text = text[-3:]
The first time WhisperLive is deployed and a client request is made, the model is downloaded. However, at this time, there is always a WebSocket error. But subsequent client requests all proceed normally.
python3 run_server.py
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2.39k/2.39k [00:00<00:00, 16.6MB/s]
preprocessor_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████| 340/340 [00:00<00:00, 2.30MB/s]
vocabulary.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1.07M/1.07M [00:00<00:00, 1.65MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2.48M/2.48M [00:00<00:00, 3.08MB/s]
model.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 3.09G/3.09G [04:18<00:00, 11.9MB/s]
ERROR:websockets.server:connection handler failed
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.10/site-packages/websockets/sync/server.py", line 499, in conn_handler
handler(connection)
File "/home/ubuntu/WhisperLive/whisper_live/server.py", line 99, in recv_audio
client = ServeClient(
File "/home/ubuntu/WhisperLive/whisper_live/server.py", line 255, in __init__
self.websocket.send(
File "/home/ubuntu/.local/lib/python3.10/site-packages/websockets/sync/connection.py", line 284, in send
with self.send_context():
File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/home/ubuntu/.local/lib/python3.10/site-packages/websockets/sync/connection.py", line 724, in send_context
raise self.protocol.close_exc from original_exc
websockets.exceptions.ConnectionClosedError: no close frame received or sent
Hi,
I tested with a single client on a Tesla T4 and the transcriptions are real-time. But what is the best way to scale the server code to at least 50-100 concurrent users (more preferred, but I mention 50-100 because that would already need 3 k8s pods with a Tesla T4 each)?
I've successfully built the Docker image and ran the docker run ... command from the documentation; then I got this log:
Downloading VAD ONNX model...
--2023-11-18 04:58:37-- https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/snakers4/silero-vad/master/files/silero_vad.onnx [following]
--2023-11-18 04:58:38-- https://raw.githubusercontent.com/snakers4/silero-vad/master/files/silero_vad.onnx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 182.43.124.6, ::
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|182.43.124.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1807522 (1.7M) [application/octet-stream]
Saving to: ‘/root/.cache/whisper-live/silero_vad.onnx’
/root/.cache/whisper-live/sil 100%[=================================================>] 1.72M 318KB/s in 5.8s
2023-11-18 04:58:46 (302 KB/s) - ‘/root/.cache/whisper-live/silero_vad.onnx’ saved [1807522/1807522]
After this, I tried to load the extension in the browser, but after I click Start Capture, nothing happens.
I tried to go to localhost:9090 in the browser and got this:
Failed to open a WebSocket connection: invalid Connection header: keep-alive.
You cannot access a WebSocket server directly with a browser. You need a WebSocket client.
So, did I build the image correctly? And how do I use the client against the Docker server?
Thanks.
I want to add speaker diarization. Wondering how much I would have to change (existing code), also if you can point me to how you would approach this since I'm still getting used to the code base.
Hello! I'm trying to run the server but I keep getting this error:
ImportError: cannot import name '_LANGUAGE_CODES' from 'faster_whisper.tokenizer' (C:\Users\Víctor Masip\AppData\Local\Programs\Python\Python310\lib\site-packages\faster_whisper\tokenizer.py)
Thanks for your work!
I've tried transcribing an HLS stream by inputting a link to a .m3u8 file, but it does not seem to work. Is that feature supported somehow?
@author I am using this repo for server-client speech-to-text live transcription. I want to generate text only on the server side: the client will only send audio to the server, and the text recognized by the server will be shown on the server side.
But when I say something once, it prints more than once. Can you please help me with this?
Using the JFK wave file here: https://github.com/ggerganov/whisper.cpp/blob/master/samples/jfk.wav
Here is my run_client.py
from whisper_live.client import TranscriptionClient
client = TranscriptionClient("0.0.0.0", "8080", is_multilingual=False, lang="en", translate=False)
client("./jfk.wav")
running the client only transcribes the following:
And so, my fellow America, ask not what your country can do
for you.
Hi,
I would like to know if Windows will be supported. I tried with the current files but I am not able to make it work…
Any hint on how to make it work would be welcome.
This issue refers to changes introduced in the recent commit 72ead71, related to the addition of the vad_parameters and initial_prompt parameters. I didn't see any branch related to it, so I'm raising an issue instead of a comment.
Client requests seem to fail because of the missing initial_prompt and vad_parameters keys in the first message. I reproduced this error with servers run through either run_server.py or a Docker container.
Here are small snippets to reproduce the error:
from whisper_live.server import TranscriptionServer

if __name__ == "__main__":
    server = TranscriptionServer()
    server.run("0.0.0.0", 9090)

from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=False,
    lang="en",
    translate=False,
    model_size="small",
)
client("tests/jfk.flac")
ERROR:websockets.server:connection handler failed
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/websockets/sync/server.py", line 499, in conn_handler
handler(connection)
File "/app/whisper_live/server.py", line 104, in recv_audio
initial_prompt=options["initial_prompt"],
KeyError: 'initial_prompt'
I guess this comes from the lack of parameter forwarding in the client.py file during the opening of the WebSocket, which could be solved like this:
def on_open(self, ws):
    """
    Callback function called when the WebSocket connection is successfully opened.
    Sends an initial configuration message to the server, including client UID, multilingual mode,
    language selection, and task type.

    Args:
        ws (websocket.WebSocketApp): The WebSocket client instance.
    """
    print(self.multilingual, self.language, self.task)
    print("[INFO]: Opened connection")
    ws.send(
        json.dumps(
            {
                "uid": self.uid,
                "multilingual": self.multilingual,
                "language": self.language,
                "task": self.task,
                "model_size": self.model_size,
                "initial_prompt": self.initial_prompt,  # added line
                "vad_parameters": self.vad_parameters,  # added line
            }
        )
    )
Hi everyone,
I am working on WhisperLive to get the transcriptions on the client side. It is working now, but there is an issue: after a few minutes (1 or 2), it stops printing the updated (new) transcription for the audio in the terminal. It feels like it gets stuck.
Can anyone help me in the issue mentioned above? Thanks in advance.
Could you please help explain what these error messages mean and how to resolve them? Thank you.
ERROR:root:[ERROR]: sent 1000 (OK); then received 1000 (OK)
ERROR:root:received 1001 (going away); then sent 1001 (going away)
ERROR:root:[ERROR]: 'WhisperModel' object has no attribute 'model'
ERROR:root:received 1001 (going away); then sent 1001 (going away)
ERROR:root:[ERROR]: 'WhisperModel' object has no attribute 'model'
ERROR:root:no close frame received or sent
ERROR:root:[ERROR]: 'WhisperModel' object has no attribute 'model'
There are a lot of segments which are small. I was going to change self.pick_previous_segments = 2 to 3 or more, but found that fill_output isn't used, so it wouldn't make a difference. Is there any quick line change I can do to achieve lengthier segments?
Can we incorporate speaker identification into the transcription results?
I found a project called whisper-diarization from Faster Whisper's Community integrations section.
Is it possible for us to integrate it?
Seeing the following error trying to run the docker image on my Mac M1
whisperlive git:(main)✗ 🚀 docker run -it --gpus all -p 9090:9090 whisper-live:latest
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container:
whisperlive git:(main)✗ 🚀 docker run -it -p 9090:9090 whisper-live:latest
==========
== CUDA ==
==========
CUDA Version 11.2.2
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
It wasn't clear to me that this Docker implementation required NVIDIA GPUs.