fl33tw00d / whisper-turbo

Cross-Platform, GPU Accelerated Whisper 🏎️

Home Page: https://whisper-turbo.com

License: Apache License 2.0

JavaScript 2.01% TypeScript 85.30% CSS 12.58% Just 0.12%
machine-learning rust webgpu whisper audio speech-recognition windows

whisper-turbo's Introduction

What is Whisper Turbo?

Whisper Turbo is a fast, cross-platform Whisper implementation, designed to run entirely client-side in your browser/electron app.

Check out the Rust library behind Whisper Turbo, Ratchet

Demo

readme-demo.mp4

Supported Platforms

WebGPU is only officially supported on Chromium-based browsers running on Windows and macOS. For more information, check out Supported Platforms.

Want to get involved?

  • Are you a GPU wizard?
  • Do you know what a HRTB is in Rust?
  • Do you know what is going on here?
  • Reach out: [email protected]

whisper-turbo's Issues

[Feature Request] Basic integration tests.

Unit and integration tests should have to pass before anything is merged to master.
We could check the translation and transcription of a small file, then repeat the process with the file encoded in several formats.

  • unit tests that check the translation and transcription of a file transcoded into several formats
  • GitHub Actions workflow that runs on every push
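
One way to make the check above concrete is to score each format's transcript against a reference with word error rate (WER). The sketch below is an assumption about how such a test could look, not something taken from the repository: `word_error_rate`, `failing_formats` and the 0.1 threshold are all hypothetical, and producing the transcripts (running the model on each transcoded file) is left out.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def failing_formats(reference: str, transcripts: Dict[str, str],
                    max_wer: float = 0.1) -> Dict[str, float]:
    """Return {format: wer} for every transcript whose WER exceeds max_wer."""
    failures = {}
    for fmt, text in transcripts.items():
        wer = word_error_rate(reference, text)
        if wer > max_wer:
            failures[fmt] = wer
    return failures
```

A CI job could then fail whenever `failing_formats(...)` returns a non-empty dict.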

[Question] How to introduce VAD to solve the problem of hallucinations

Background:
I've noticed that when processing audio files containing silent or non-speech segments, Whisper tends to generate hallucinatory content. This not only affects the segments with silence or non-human voices but also seems to impact the subsequent normal speech parts in the audio.

Inquiry:
Given that this is an inherent issue with Whisper, I am curious to know if it's feasible to incorporate strategies similar to VAD in Whisper-turbo. I am aware of approaches like those used in projects such as WhisperX, which seem to effectively mitigate such issues.
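
For illustration only, the simplest form of VAD is an energy threshold over short frames. The sketch below assumes mono samples given as floats in [-1, 1]. `speech_regions`, the 30 ms frame and the 0.01 threshold are hypothetical choices, and this is much cruder than the model-based VAD that dedicated tools use. It is not part of Whisper-turbo.

```python
import math
from typing import List, Optional, Tuple


def speech_regions(samples: List[float], sample_rate: int = 16000,
                   frame_ms: int = 30,
                   threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose per-frame RMS energy reaches threshold."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    regions: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        t = i * frame_len / sample_rate
        if rms >= threshold and start is None:
            start = t                      # speech begins
        elif rms < threshold and start is not None:
            regions.append((start, t))     # speech ends
            start = None
    if start is not None:                  # audio ended while still in speech
        regions.append((start, n_frames * frame_len / sample_rate))
    return regions
```

The returned spans could be used to cut the audio before transcription, with segment timestamps shifted back by each span's start afterwards.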

Thank you for your time and the incredible work on this project.

[Question] Support Large Model

  1. I don't see a binary file for the Large model yet, although the source code seems to support it. Is one currently being built?
  2. As far as I know, OpenAI also has Whisper large-v2. Are you planning to support large-v2 as well?
    Looking at this link, only v1 seems to be available.

[Feature request] Add kill() to whisper-webgpu API

The only way to use JS workers with whisper-webgpu seems to be this repository. I found whisper-webgpu failing some of the time when saving the "medium" .bin model in the DB.

I actually have all that infrastructure built for myself (when I was using whisper.cpp).

Therefore, I think a kill() method in whisper-webgpu (maybe using flags or something similar) would be very beneficial for wider adoption of the API. This API looks like the next big thing, and it would be good to request features from it directly instead of from this wrapper.

[BUG] Problems with music transition (+ comparison with whisper.cpp)

I have this interesting audio file: http://sndup.net/r7pp that is giving me problems.

I found a major problem that whisper.cpp apparently does not suffer from. There is a music transition in the podcast (around 20 seconds long) that whisper-turbo simply ignores, so from that point on the timestamps are about 20 seconds behind.

Whisper-WebGPU transcription

[00:03:43.200 --> 00:03:46.200] Per començar bé l'any.
[00:03:46.280 --> 00:03:49.280] Ho trobarem.
[00:03:49.360 --> 00:03:51.360] Eren bons els ous, Gio.
[00:03:51.400 --> 00:03:53.400] Ui, ui, ui, ui!
[00:03:53.480 --> 00:03:55.480] Next!
[00:03:55.560 --> 00:03:57.560] De pagès.
(!!!) [00:03:57.640 --> 00:03:59.640] Normalment ara diríem allò de baixar revolucions,
[00:03:59.720 --> 00:04:01.720] posar música calmada i anar a l'entrevista,
[00:04:01.800 --> 00:04:03.800] però el nostre convidat d'avui és d'aquells canyeros.

Whisper.cpp transcription

[00:03:49.520 --> 00:03:52.440]   Ah, està bé, està bé. Per començar bé l'any.
[00:03:52.440 --> 00:03:54.520]   Ho trobarem. Ho trobarem.
[00:03:54.520 --> 00:03:56.640]   Sí, sí. Eren bons els ous, tio.
[00:03:56.640 --> 00:03:59.080]   #
(!!!) [00:03:59.160 --> 00:04:19.520]   Normalment ara diríem allò de baixar revolucions,
[00:04:19.520 --> 00:04:22.520]   posar música calmada i anar a l'entrevista,
[00:04:22.520 --> 00:04:25.000]   però el nostre convidat d'avui és d'aquells canyeros.
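
The size of the shift can be read off the two listings by subtracting the start times of the same sentence. A small sketch, using the two segments that follow the transition (the helper `start_ms` is hypothetical):

```python
import re

_STAMP = re.compile(r"(\d+):(\d+):(\d+)\.(\d{3})")


def start_ms(line: str) -> int:
    """Start time of a '[hh:mm:ss.mmm --> hh:mm:ss.mmm] text' line, in milliseconds."""
    match = _STAMP.search(line)  # the first timestamp on the line is the start
    if match is None:
        raise ValueError(f"no timestamp found in {line!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


# (Whisper-WebGPU line, whisper.cpp line) for the same sentence, from the listings above.
pairs = [
    ("[00:03:59.720 --> 00:04:01.720] posar música calmada i anar a l'entrevista,",
     "[00:04:19.520 --> 00:04:22.520]   posar música calmada i anar a l'entrevista,"),
    ("[00:04:01.800 --> 00:04:03.800] però el nostre convidat d'avui és d'aquells canyeros.",
     "[00:04:22.520 --> 00:04:25.000]   però el nostre convidat d'avui és d'aquells canyeros."),
]

drifts_ms = [start_ms(cpp) - start_ms(turbo) for turbo, cpp in pairs]
print(drifts_ms)  # [19800, 20720]
```

Both values are close to the 20 seconds of music mentioned in the report.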

Can not define language for model. Model sometimes wrong about language.

Describe the bug
Thanks for your awesome work! Whisper sometimes detects the language correctly and sometimes does not. Is there a way to force the language, or to force the model to always translate? Right now the output is a random mix of two languages.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the playground and select the small model.
  2. Speak in any non-English language.
  3. Try two cases: start with an English word and then switch to your second language, or start with your second language.

Desktop (please complete the following information):

  • Reproduces everywhere

Additional context

[BUG] medium model will not output complete results

Describe the bug

  • The medium model does not output complete results
    • The larger the model, the more likely the bug is.
    • It occasionally happens with the small model as well.
    • It happens almost 99% of the time when I use the medium model on audio longer than 1 minute.
    • It occurs with audio in most languages.
    • I suspect a timeout or a leak.

To Reproduce
Steps to reproduce the behavior:

  1. Select the medium model
  2. Select test.mp3
  3. Setting the 'zh' language may reduce the chance of problems.
  4. The console only shows the processing output and never returns the JSON.

Desktop (please complete the following information):

  • OS: [Windows]
  • Browser [Chrome 119]
  • Also tried compiling locally

image
test-audio.zip

whisper-webgpu.js panicked

Error when trying to use the playground.

example audio
example-audio.wav.zip

whisper-webgpu.js:522 panicked at /Users/fleetwood/Code/whisper-web/crates/whisper-core/src/whisper.rs:211:24:
called `Result::unwrap()` on an `Err` value: TooWide

Stack:

Error
    at imports.wbg.__wbg_new_abda76e883ba8a5f (webpack-internal:///./node_modules/whisper-webgpu/whisper-webgpu.js:515:21)
    at http://localhost:3000/_next/static/media/whisper-webgpu_bg.572596d0.wasm:wasm-function[1818]:0x266653
    at http://localhost:3000/_next/static/media/whisper-webgpu_bg.572596d0.wasm:wasm-function[2455]:0x27cc55
    at http://localhost:3000/_next/static/media/whisper-webgpu_bg.572596d0.wasm:wasm-function[2059]:0x2711aa
    at http://localhost:3000/_next/static/media/whisper-webgpu_bg.572596d0.wasm:wasm-function[214]:0x50dc6
    at http://localhost:3000/_next/static/media/whisper-webgpu_bg.572596d0.wasm:wasm-function[1112]:0x22eee8
    at http://localhost:3000/_next/static/media/whisper-webgpu_bg.572596d0.wasm:wasm-function[2648]:0x27efec
    at __wbg_adapter_28 (webpack-internal:///./node_modules/whisper-webgpu/whisper-webgpu.js:217:10)
    at real (webpack-internal:///./node_modules/whisper-webgpu/whisper-webgpu.js:202:20)

[Question] License Check

Can I use this project commercially?
I'm wondering how permissive the license for this project is.

Add a license to the project

Without a license for your project, the default copyright laws kick in. Your code can't be reproduced, distributed or have derivative works created.

I have a project in mind based on your work, but my hands are tied if I can't fork and modify your project.
Could you add a permissive license such as MIT so that I can use your project?

Memory issues

Experimenting with the project I found two issues:

  1. GPU memory is not released after inference.
  2. The bigger the audio file, the more memory is consumed. A 4-minute file uses 4.48 GB on my computer. Bigger files have frozen my computer, using up to 20 GB.
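
As a rough sanity check on the second point, the decoded audio alone is small. Assuming the file is resampled to 16 kHz mono (the rate Whisper models take as input) and held as 32-bit floats, which is an assumption about this implementation, the arithmetic is:

```python
SAMPLE_RATE_HZ = 16_000   # Whisper models expect 16 kHz mono input
BYTES_PER_SAMPLE = 4      # assumption: samples are stored as 32-bit floats


def pcm_bytes(duration_s: float) -> int:
    """Size of the decoded waveform for a clip of the given length."""
    return int(duration_s * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE


four_min = pcm_bytes(4 * 60)   # 15_360_000 bytes, about 15.4 MB
reported = 4.48e9              # 4.48 GB from the report, read as decimal gigabytes
ratio = reported / four_min    # roughly 292 times the raw waveform

print(four_min, round(ratio))
```

Under these assumptions the waveform accounts for well under 1% of the reported usage, so the growth would have to come from somewhere else, such as buffers that are allocated per chunk and not released.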

macOS M1 16GB Ventura 13.1
Chromium 118.0.5993.32

[BUG] whisper.rs:208:76: called `Result::unwrap()` on an `Err` value: BufferAsyncError(BufferAsyncError)

Thanks for your work, very cool project.
I'm getting these errors in browser console (Chrome 117.0.5938.150 on Windows 10):

451-bef1ecb884dd18e5.js:1 panicked at /Users/fleetwood/Code/whisper-web/crates/whisper-core/src/whisper.rs:208:76:
called `Result::unwrap()` on an `Err` value: BufferAsyncError(BufferAsyncError)

Stack:

Error
    at t.wbg.__wbg_new_abda76e883ba8a5f (https://whisper-turbo.com/_next/static/chunks/451-bef1ecb884dd18e5.js:1:17432)
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.9e5117bc.wasm:wasm-function[2042]:0x2e752a
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.9e5117bc.wasm:wasm-function[2734]:0x2ff385
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.9e5117bc.wasm:wasm-function[2307]:0x2f3038
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.9e5117bc.wasm:wasm-function[229]:0x7b44b
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.9e5117bc.wasm:wasm-function[1244]:0x292720
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.9e5117bc.wasm:wasm-function[2947]:0x301b9f
    at m (https://whisper-turbo.com/_next/static/chunks/451-bef1ecb884dd18e5.js:1:13180)
    at _ (https://whisper-turbo.com/_next/static/chunks/451-bef1ecb884dd18e5.js:1:13056)
whisper-webgpu_bg.9e5117bc.wasm:0x2e7640 Uncaught (in promise) RuntimeError: unreachable
    at whisper-webgpu_bg.9e5117bc.wasm:0x2e7640
    at whisper-webgpu_bg.9e5117bc.wasm:0x2ff385
    at whisper-webgpu_bg.9e5117bc.wasm:0x2f3038
    at whisper-webgpu_bg.9e5117bc.wasm:0x7b44b
    at whisper-webgpu_bg.9e5117bc.wasm:0x292720
    at whisper-webgpu_bg.9e5117bc.wasm:0x301b9f
    at m (451-bef1ecb884dd18e5.js:1:13180)
    at _ (451-bef1ecb884dd18e5.js:1:13056)

Possibly because of these errors that look like Chrome ones:

ID3D12Device::GetDeviceRemovedReason failed with DXGI_ERROR_DEVICE_HUNG (0x887A0006)
 - While handling unexpected error type Internal when allowed errors are (Validation|DeviceLost).
    at CheckHRESULTImpl (..\..\third_party\dawn\src\dawn\native\d3d\D3DError.cpp:96)
    at CheckAndUpdateCompletedSerials (..\..\third_party\dawn\src\dawn\native\d3d12\DeviceD3D12.cpp:359)
    at CheckPassedSerials (..\..\third_party\dawn\src\dawn\native\ExecutionQueue.cpp:33)
    at Tick (..\..\third_party\dawn\src\dawn\native\Device.cpp:1325)

whisper-turbo.com/:1 ID3D12Device::GetDeviceRemovedReason failed with DXGI_ERROR_DEVICE_HUNG (0x887A0006)
 - While handling unexpected error type Internal when allowed errors are (Validation|DeviceLost).
    at CheckHRESULTImpl (..\..\third_party\dawn\src\dawn\native\d3d\D3DError.cpp:96)
    at CheckAndUpdateCompletedSerials (..\..\third_party\dawn\src\dawn\native\d3d12\DeviceD3D12.cpp:359)
    at CheckPassedSerials (..\..\third_party\dawn\src\dawn\native\ExecutionQueue.cpp:33)
    at Tick (..\..\third_party\dawn\src\dawn\native\Device.cpp:1325)

My main problem is not the error itself, but that I wasted some time because the UI didn't tell me an error had happened and the spinner kept spinning happily forever.

CORS is not enabled at https://rmbl.us

I'm trying to set up the library in a test project, but CORS 😈 does not allow me to get access 🙅‍♂️.

Access to fetch at 'https://rmbl.us/whisper-turbo/whisper-small-pf16-full.bin' from origin 'http://localhost:19006' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

Any chance to get support for Linux?

Hello,

This is a really cool project. I wanted to try it out, but when the base model loads and reaches 100% I get this error:

image

I was about to file a bug report, then saw in the bug report that only Mac and Windows are supported, so maybe that's expected.

Are you planning on adding Linux support at some point?

Thanks!

Replicate OAI transcribe interface

In order to be a drop-in replacement for OAI, we need to replicate the transcribe interface

def transcribe(
    model: "Whisper",
    audio: Union[str, np.ndarray, torch.Tensor],
    *,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    prepend_punctuations: str = "\"'“¿([{-",
    append_punctuations: str = "\"'.。,!?:”)]}、",
    **decode_options,
):
  • Beam sampling
  • Word level timestamps
  • Initial prompting
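
To show what several of these parameters are for, here is a simplified sketch of the temperature fallback loop they drive in the reference implementation: decode at the lowest temperature, and retry at the next one if the text looks too repetitive (high gzip-style compression ratio) or too unlikely (low average log-probability). The `decode` callable is a hypothetical stand-in for the real decoder, and the `no_speech_threshold` handling is left out.

```python
import zlib
from typing import Callable, Optional, Sequence, Tuple


def compression_ratio(text: str) -> float:
    """Raw size over zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], Tuple[str, float]],  # hypothetical: temperature -> (text, avg_logprob)
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
) -> Tuple[str, float, float]:
    """Return (text, avg_logprob, temperature) of the first acceptable decode,
    or of the last attempt if none passes both checks."""
    if not temperatures:
        raise ValueError("at least one temperature is required")
    result = ("", 0.0, 0.0)
    for temperature in temperatures:
        text, avg_logprob = decode(temperature)
        result = (text, avg_logprob, temperature)
        too_repetitive = (compression_ratio_threshold is not None
                          and compression_ratio(text) > compression_ratio_threshold)
        too_unlikely = (logprob_threshold is not None
                        and avg_logprob < logprob_threshold)
        if not (too_repetitive or too_unlikely):
            break
    return result


def fake_decode(temperature: float) -> Tuple[str, float]:
    """Stub decoder: loops on one syllable at temperature 0, recovers above it."""
    if temperature == 0.0:
        return ("la " * 100, -0.2)
    return ("hello there, how are you today?", -0.3)


text, avg_logprob, used_temperature = decode_with_fallback(fake_decode)
print(used_temperature)  # 0.2
```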

Model not loading

After choosing the model, nothing happens; only the text "Load" appears. But after choosing an audio file, I get the error "no model loaded".

[BUG] crash on Mac Chrome with 81MB file

Describe the bug

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://whisper-turbo.com/
  2. Click on Tiny
  3. Upload the amazon.mp3 file (56 minutes 29 seconds long)
  4. See error
    image
269-03125f41d949514b.js:1 panicked at crates/whisper-core/src/decoding.rs:85:17:
index out of bounds: the len is 509 but the index is 4294967295

Stack:

Error
    at t.wbg.__wbg_new_abda76e883ba8a5f (https://whisper-turbo.com/_next/static/chunks/269-03125f41d949514b.js:1:17434)
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.d05673f1.wasm:wasm-function[2029]:0x2e31ac
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.d05673f1.wasm:wasm-function[2722]:0x2fafdf
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.d05673f1.wasm:wasm-function[2364]:0x2f313a
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.d05673f1.wasm:wasm-function[756]:0x22edd9
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.d05673f1.wasm:wasm-function[241]:0x9662c
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.d05673f1.wasm:wasm-function[1241]:0x28f015
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.d05673f1.wasm:wasm-function[2935]:0x2fd7f4
    at m (https://whisper-turbo.com/_next/static/chunks/269-03125f41d949514b.js:1:13180)
    at _ (https://whisper-turbo.com/_next/static/chunks/269-03125f41d949514b.js:1:13056)


t.wbg.__wbg_error_f851667af71bcfc6 @ 269-03125f41d949514b.js:1
$func2029 @ whisper-webgpu_bg.d05673f1.wasm:0x2e3279
$func2722 @ whisper-webgpu_bg.d05673f1.wasm:0x2fafdf
$func2364 @ whisper-webgpu_bg.d05673f1.wasm:0x2f313a
$func756 @ whisper-webgpu_bg.d05673f1.wasm:0x22edd9
$func241 @ whisper-webgpu_bg.d05673f1.wasm:0x9662c
$func1241 @ whisper-webgpu_bg.d05673f1.wasm:0x28f015
$__wbindgen_export_3 @ whisper-webgpu_bg.d05673f1.wasm:0x2fd7f4
m @ 269-03125f41d949514b.js:1
_ @ 269-03125f41d949514b.js:1
Promise.then (async)
t.wbg.__wbg_then_f7e06ee3c11698eb @ 269-03125f41d949514b.js:2
$func2048 @ whisper-webgpu_bg.d05673f1.wasm:0x2e41e7
$func2185 @ whisper-webgpu_bg.d05673f1.wasm:0x2ea92e
$func502 @ whisper-webgpu_bg.d05673f1.wasm:0x1c5ee0
$func2112 @ whisper-webgpu_bg.d05673f1.wasm:0x2e7485
$func1553 @ whisper-webgpu_bg.d05673f1.wasm:0x2c3fd5
$__wbindgen_export_3 @ whisper-webgpu_bg.d05673f1.wasm:0x2fd7f4
m @ 269-03125f41d949514b.js:1
_ @ 269-03125f41d949514b.js:1
Promise.then (async)
t.wbg.__wbg_then_b2267541e2a73865 @ 269-03125f41d949514b.js:2
$func1507 @ whisper-webgpu_bg.d05673f1.wasm:0x2c027d
$func238 @ whisper-webgpu_bg.d05673f1.wasm:0x8b780
$func241 @ whisper-webgpu_bg.d05673f1.wasm:0x962ae
$func1241 @ whisper-webgpu_bg.d05673f1.wasm:0x28f015
$__wbindgen_export_3 @ whisper-webgpu_bg.d05673f1.wasm:0x2fd7f4
m @ 269-03125f41d949514b.js:1
_ @ 269-03125f41d949514b.js:1
(the two Promise.then (async) groups above repeat 15 more times)

Desktop (please complete the following information):

  • OS: [MacOS Monterey 12.6 (21G115)]
  • Browser [Version 117.0.5938.88 (Official Build) (x86_64)]
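
A note on the index in the panic message: 4294967295 is 2^32 - 1, the largest value a 32-bit unsigned integer can hold, and wasm32 uses 32-bit indices. It is also what 0 - 1 wraps to in 32-bit unsigned arithmetic. One plausible reading, and only an assumption since the code at decoding.rs:85 is not shown here, is an index produced by a wrapping subtraction or by a -1 sentinel converted to an unsigned type. The arithmetic can be checked with a hypothetical helper:

```python
U32_MAX = 2**32 - 1


def wrapping_sub_u32(a: int, b: int) -> int:
    """Subtract the way a 32-bit unsigned integer does, wrapping on underflow."""
    return (a - b) & U32_MAX


print(wrapping_sub_u32(0, 1))      # 4294967295
print(wrapping_sub_u32(509, 1))    # 508, the last valid index for a length of 509
```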

[Feature Request] Integration of word-level timestamps

The "word timestamps" feature would allow users to control the playback progress of an audio player and edit audio segments at a word level. This would be incredibly useful for applications requiring precise editing and navigation within audio files, such as in transcription, language learning tools, or detailed audio analysis.

Whisper already provides word-level timestamps.
I am looking forward to any information regarding the potential inclusion of this feature in the roadmap for Whisper-turbo.

Thank you for your time and the incredible work on this project.

[Question] Select Language

In OpenAI's original Whisper model, I can choose which language to transcribe into. Can I choose the language with this model as well?

I couldn't find it in the source code.
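
For background, Whisper models take the language as a special token in the decoder prompt, between the start-of-transcript token and the task token. The sketch below only builds that token sequence as strings. `decoder_prompt` is a hypothetical helper, and whether Whisper-turbo exposes a setting for this is not something this thread answers.

```python
from typing import List


def decoder_prompt(language: str, task: str = "transcribe",
                   timestamps: bool = True) -> List[str]:
    """Special tokens that start a Whisper decoding pass for a fixed language."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


print(decoder_prompt("ca"))
# ['<|startoftranscript|>', '<|ca|>', '<|transcribe|>']
```

Fixing the language token skips the automatic detection step, and choosing the `translate` task asks the model for English output.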

[BUG] WAVE and WEBM crashes whisper-turbo

This issue covers two problems:

  • Weak handling of WAV and WebM files (even after fixing the duration in the WebM file)
  • An uncaught error in a promise that JS cannot catch, which leaves the failure unhandled

The same youtube-news file that we used previously, converted to wav or wav16, makes whisper-turbo crash with no error handling possible. I'm sending the files to your email. Try them at whisper-turbo.com.

462.98ecf0d3854ca5c4.js:2 panicked at crates/whisper-core/src/logit_mutators/timestamp_rules.rs:87:62:
called `Result::unwrap()` on an `Err` value: UndefinedOrder

Stack:

Error
    at t.wbg.__wbg_new_abda76e883ba8a5f (https://whisper-turbo.com/_next/static/chunks/462.98ecf0d3854ca5c4.js:2:1742)
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.1f358181.wasm:wasm-function[2328]:0x34911a
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.1f358181.wasm:wasm-function[3093]:0x363550
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.1f358181.wasm:wasm-function[2642]:0x356fdc
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.1f358181.wasm:wasm-function[231]:0x48347
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.1f358181.wasm:wasm-function[236]:0x5d1db
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.1f358181.wasm:wasm-function[1419]:0x2eeb7a
    at https://whisper-turbo.com/_next/static/media/whisper-webgpu_bg.1f358181.wasm:wasm-function[3315]:0x365fc6
    at x (https://whisper-turbo.com/_next/static/chunks/462.98ecf0d3854ca5c4.js:1:4056)
    at o (https://whisper-turbo.com/_next/static/chunks/462.98ecf0d3854ca5c4.js:1:3932)


t.wbg.__wbg_error_f851667af71bcfc6 @ 462.98ecf0d3854ca5c4.js:2
$func2328 @ whisper-webgpu_bg.1f358181.wasm:0x3491e7
$func3093 @ whisper-webgpu_bg.1f358181.wasm:0x363550
$func2642 @ whisper-webgpu_bg.1f358181.wasm:0x356fdc
$func231 @ whisper-webgpu_bg.1f358181.wasm:0x48347
$func236 @ whisper-webgpu_bg.1f358181.wasm:0x5d1db
$func1419 @ whisper-webgpu_bg.1f358181.wasm:0x2eeb7a
$__wbindgen_export_3 @ whisper-webgpu_bg.1f358181.wasm:0x365fc6
x @ 462.98ecf0d3854ca5c4.js:1
o @ 462.98ecf0d3854ca5c4.js:1
whisper-webgpu_bg.1f358181.wasm:0x349230 Uncaught RuntimeError: unreachable
    at whisper-webgpu_bg.1f358181.wasm:0x349230
    at whisper-webgpu_bg.1f358181.wasm:0x363550
    at whisper-webgpu_bg.1f358181.wasm:0x356fdc
    at whisper-webgpu_bg.1f358181.wasm:0x48347
    at whisper-webgpu_bg.1f358181.wasm:0x5d1db
    at whisper-webgpu_bg.1f358181.wasm:0x2eeb7a
    at whisper-webgpu_bg.1f358181.wasm:0x365fc6
    at x (462.98ecf0d3854ca5c4.js:1:4056)
    at o (462.98ecf0d3854ca5c4.js:1:3932)

Minimum whisper code:

import * as whisper from 'whisper-webgpu';
 ...

try {
  await whisper.default();
  const builder = new whisper.SessionBuilder();
  const session = await builder.setModel(model).setTokenizer(tokenizer).build();
  const segments = await session.run(audio).catch(err => { throw err });
} catch (err) {
  // never enters here
}

macOS M1 16GB Ventura 13.1
Chromium 118.0.5993.32
