sandrohanea / whisper.net
Whisper.net. Speech to text made simple using Whisper Models
License: MIT License
Either I don't understand what this is supposed to do, or it doesn't work.
I tried a variety of values from 0.1f to 10f and still had hallucinations in my output, such as "you" repeated over and over. This seems to occur whenever an empty audio stream is passed to Whisper, likely because such input was not included in the model's training data.
I tried using WithThreads
I have a very powerful processor
But it didn't help anything, although it took more power from the processor, but it took the same amount of time, which means I just lost
Is there something I'm doing wrong?
I also tried WithSpeedUp2x, and then it doesn't transcribe anything at all!
Thanks for this wonderful library
and for your help
Thanks for creating this project!
I'd like to use it in an Android app with Maui. Are you planning to add support for Android?
I've managed to hack OpenBLAS support into the cmake files for linux-x64. For some reason, find_package(BLAS) does not work, and I had to set some variables manually. This resulted in greatly improved processing time on a 2 thread VM. On a random audio file I've been testing with, originally it took 1511 seconds, now it takes 770 seconds!
I'm very rusty with CMake, so I'm open to ideas.
I implemented this library per the example provided here and I get the following exception when trying to submit requests:
System.Security.Authentication.AuthenticationException: Authentication failed
I guess this is due to missing api key.
Hello there, greetings.
I have a desktop application into which I am trying to integrate the Whisper.net library. The project is based on .NET Framework 4.7.2 and WPF, built with Visual Studio 2019.
After adding the package from NuGet, when I add the sample code to the project I get a compilation error on the following line:
await foreach (var result in processor.ProcessAsync(fileStream))
{
Console.WriteLine($"{result.Start}->{result.End}: {result.Text}");
}
The error is: The type 'IAsyncEnumerable<>' is defined in an assembly that is not referenced. You must add a reference to assembly 'Microsoft.Bcl.AsyncInterfaces, Version=7.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
After some research I found that IAsyncEnumerable is part of the .NET Core SDK, and that to use it on .NET Framework I need the Microsoft.Bcl.AsyncInterfaces package. Even after installing it, the error is still there. I also have .NET Core installed on the system and tried changing the C# language version from 7.3 to 8.0, but the error persists.
I have successfully built the sample project, which is based on .NET Core.
So my question is: is this library compatible with .NET Framework, or is .NET Core required?
Hi!
I want to transcribe only one portion of the audio. When I use both WithOffset() and WithDuration(), Whisper often outputs text that extends beyond the configured duration. Is this a problem on my side?
I can't get it to work in a MAUI project for macOS (debugging MAUI code targeting net7.0-maccatalyst/maccatalyst-arm64).
I downloaded the specific libs for osx-arm64 and maccatalyst, but the code:
await foreach (var segment in processor.ProcessAsync(fileStream, CancellationToken.None))
It doesn't enter foreach and doesn't give an error. Can someone help me?
Thanks.
Is there a way in the library to use the microphone and not just transcribe an existing recording?
because the original library has
in whisper.cpp
When trying to build the WhisperProcessor I get the following error: 'Failed to load native whisper library.'
I am using Windows 11 Pro on ARM64 with the latest Visual Studio running .NET 7.
I don't know why this error occurs because I took this code from the example on GitHub and the model file is downloaded correctly.
Can you help me out?
Great job! Will it be available for .NET Framework 4.7?
thanks!
Please ignore this post.
I'd love to use whisper.net with a graphics card. Waiting half an hour every time I run the code gets a tad tedious after some time. Is there any possibility of adding support for GPU cores and memory?
Anyways, thanks for the port :)
Sometimes we have to use WinForms to develop our software, so it would be good if .NET Framework 4.7.2 could be supported.
Hello, thanks for porting this to .NET! I was playing around with it last night and found that each time a new segment is generated, the event handler receives all previous segments.
Here's a short example output from my program:
await using var fileStream = File.OpenRead("/home/evan/Downloads/audio/output.wav");
using var processor = WhisperProcessorBuilder.Create()
.WithSegmentEventHandler((sender, e) => Console.WriteLine("{0} - {1} - {2}", e.Start, e.End, e.Segment))
.WithFileModel("ggml-base.en.bin")
.WithThreads(1)
.WithLanguage("en")
.Build();
00:00:00 - 00:00:25.8400000 - CHAPTER I
00:00:00 - 00:00:25.8400000 - CHAPTER I
00:00:25.8400000 - 00:00:32.1600000 - The Jacques-Arde bathrobe hanging on his bedpost bore the monogram Hotel Ritz Paris.
00:00:00 - 00:00:25.8400000 - CHAPTER I
00:00:25.8400000 - 00:00:32.1600000 - The Jacques-Arde bathrobe hanging on his bedpost bore the monogram Hotel Ritz Paris.
00:00:32.1600000 - 00:00:36.4000000 - Slowly the fog began to lift. Langdon picked up the receiver.
00:00:00 - 00:00:25.8400000 - CHAPTER I
00:00:25.8400000 - 00:00:32.1600000 - The Jacques-Arde bathrobe hanging on his bedpost bore the monogram Hotel Ritz Paris.
00:00:32.1600000 - 00:00:36.4000000 - Slowly the fog began to lift. Langdon picked up the receiver.
00:00:36.4000000 - 00:00:37.4000000 - "Hello?"
00:00:00 - 00:00:25.8400000 - CHAPTER I
00:00:25.8400000 - 00:00:32.1600000 - The Jacques-Arde bathrobe hanging on his bedpost bore the monogram Hotel Ritz Paris.
00:00:32.1600000 - 00:00:36.4000000 - Slowly the fog began to lift. Langdon picked up the receiver.
00:00:36.4000000 - 00:00:37.4000000 - "Hello?"
00:00:37.4000000 - 00:00:43.1600000 - "Mr. Langdon?" a man's voice said. "I hope I have not awoken you."
Each time we get another segment, we also receive all previous segments... Is this by design?
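Until the behaviour is clarified, one possible workaround is to filter repeats on the caller's side. The sketch below is a hypothetical helper (SegmentDeduplicator is not part of Whisper.net) and assumes e.Start and e.End are TimeSpan values, as the printed output suggests. It is not thread-safe, so it also assumes the handler is invoked from one thread at a time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Whisper.net: remembers which (start, end)
// time ranges were already reported so repeated callbacks can be ignored.
public sealed class SegmentDeduplicator
{
    private readonly HashSet<(TimeSpan Start, TimeSpan End)> _seen =
        new HashSet<(TimeSpan Start, TimeSpan End)>();

    // Returns true only the first time a given time range is offered.
    public bool TryAccept(TimeSpan start, TimeSpan end)
    {
        return _seen.Add((start, end));
    }
}
```

Inside the segment handler, the Console.WriteLine call would then be wrapped in a check such as dedup.TryAccept(e.Start, e.End), where dedup is an instance created before the processor is built.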
Thanks for this great project!
I am attempting to use a WhisperProcessor instance more than once, calling ProcessAsync serially. However, after the first successful recognition, it only sometimes returns SegmentData objects. It appears that OnSegmentHandler is not being called while in whisper_full_with_state.
To work around this, I save the builder and then call Build() before every ProcessAsync. Everything gets recognized successfully.
Hi,
I got this error message, which indicates that a program has attempted to execute an invalid or unsupported CPU instruction. My CPU is Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz , X64-based processor.
Any idea?
The code
`void FullDetection()
{
var processor = WhisperProcessorBuilder.Create()
.WithSegmentEventHandler(OnNewSegment)
.WithFileModel(modelName)
.WithTranslate()
.WithLanguage("auto")
.Build();
void OnNewSegment(object sender, OnSegmentEventArgs e)
{
textBox1.Text= ($"CSSS {e.Start} ==> {e.End} : {e.Segment}");
}
lock (new object())
{
using (var fileStream = File.OpenRead(filename))
{
processor.Process(fileStream);
}
}
}`
Whisper.Net version: 1.2.2
Environment: win10-x64
.NET version: Framework 4.7.2
Model: Small.bin
wav file language: Japanese
Use Whisper.net.Demo sample code, in Program.cs, pass a 1 minute cancellation token to ProcessAsync():
var cts = new CancellationTokenSource(TimeSpan.FromMinutes(1));
await foreach (var segment in processor.ProcessAsync(fileStream, cts.Token))
{
Console.WriteLine($"New Segment: {segment.Start} ==> {segment.End} : {segment.Text}");
}
After 1 minute, the demo crashed with an AccessViolationException:
Exception thrown at 0x00007FFB245934AE (whisper.dll) in Whisper.net.Demo.exe: 0xC0000005: Access violation reading location 0x000001D0863A1D90.
Thank you
The NativeLibraryLoader uses conditional preprocessor directives (#if ANDROID) which are not being applied so the existing NuGet ends up trying to load the Linux arm64 library and fails. Might be wrong but I think looking for the platform at runtime would be the correct way of supporting the various platforms. Cheers!
I was able to get the demo code running using the Kennedy.wav file. But when I recorded a file using the Windows 11 Recorder it said the wave file header was invalid.
Whisper.net.Wave.CorruptedWaveException: 'Invalid wave file header.'
Windows 11 Sound Recorder can generate Wav files of various qualities.
I took your suggestion from Issue #33 and wrote out the headers for each quality level.
Kennedy.wav: RIFF?¶WAVEfmt
Auto.wav: RIFF??☻WAVEJUNK
Medium.wav: RIFFJ?☺WAVEJUNK
Best.wav: RIFFB?WAVEJUNK
High.wav: RIFFv?♠WAVEJUNK
I would have expected these files to be valid. Is there something I'm missing?
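The printed headers show a JUNK chunk where Kennedy.wav has fmt. RIFF allows extra chunks such as JUNK (padding) before the format chunk, so a parser that expects fmt immediately after WAVE may reject these files. The sketch below is only a diagnostic aid under that assumption: RiffInspector is a hypothetical name, it uses only the .NET base class library, and it needs a seekable stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical diagnostic helper, not part of Whisper.net.
public static class RiffInspector
{
    // Lists the (id, size) pairs of all chunks that follow the
    // 12-byte "RIFF <size> WAVE" preamble of a seekable stream.
    public static List<(string Id, uint Size)> ListChunks(Stream stream)
    {
        var chunks = new List<(string Id, uint Size)>();
        var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

        if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "RIFF")
            throw new InvalidDataException("Missing RIFF tag.");
        reader.ReadUInt32(); // overall RIFF size, not needed here
        if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "WAVE")
            throw new InvalidDataException("Missing WAVE tag.");

        while (stream.Position + 8 <= stream.Length)
        {
            string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
            uint size = reader.ReadUInt32();
            chunks.Add((id, size));

            // Chunk bodies are padded to an even number of bytes.
            long skip = (long)size + (size & 1);
            long remaining = stream.Length - stream.Position;
            stream.Seek(Math.Min(skip, remaining), SeekOrigin.Current);
        }
        return chunks;
    }
}
```

Running this over each recording would show whether a "fmt " chunk is present after the JUNK chunk, which would help narrow down whether the files or the parser are at fault.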
at Whisper.net.WhisperProcessor.Process(System.IO.Stream)
at WhisperAI.AudioProcessor+d__1.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
at WhisperAI.AudioProcessor.ProcessAudio(System.String)
at Program+<
This error happens with large, long .wav files (200 MB, 40 minutes). It happens seemingly at random.
there is no interface in the binding?
The GetAvgSamples() method reads the number of samples from the stream based on the value in SamplesCount. However the async version of that method reads until the end of the stream without considering SamplesCount. If there are other chunks after the data chunk this leads to an out of bounds exception.
A workaround right now is processor.ProcessAsync(new WaveParser(fileStream).GetAvgSamples()), effectively reading synchronously but still processing asynchronously.
Hello!
Is there any information on which "With~" method in the fluent API corresponds to which setting or flag in whisper.cpp?
I'm mostly interested in the -ml flag, which allows limiting the output length per line.
It looks like WithMaxSegmentLength() should work the same way as -ml, but I think it does not.
Thanks!
When trying to use the WhisperGgmlDownloader.GetGgmlModelAsync method with the large model:
using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.Large);
I get the following error:
System.Net.Http.HttpRequestException: Cannot write more bytes to the buffer than the configured maximum buffer size: 2147483647
A pull request with the bugfix is out. Marking the issue as solved.
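For context, this exception is typically raised when HttpClient buffers a whole response in memory and the body exceeds about 2 GB. A common way around it is to request only the headers first and stream the body to disk. The sketch below shows that general technique with standard HttpClient calls; it is not the code from the pull request, and the ModelDownload class and its parameters are hypothetical.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of Whisper.net.
public static class ModelDownload
{
    // Copies a stream to a file chunk by chunk, without holding it all in memory.
    public static async Task CopyToFileAsync(Stream body, string path)
    {
        using (var file = File.Create(path))
        {
            await body.CopyToAsync(file);
        }
    }

    // Downloads a URL to disk. ResponseHeadersRead makes HttpClient return
    // as soon as the headers arrive instead of buffering the full body.
    public static async Task DownloadToFileAsync(HttpClient client, string url, string path)
    {
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (var body = await response.Content.ReadAsStreamAsync())
            {
                await CopyToFileAsync(body, path);
            }
        }
    }
}
```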
I copied the demo code into the file whisper.cs and installed the packages from NuGet. The only changes I made were the model (base to large) and the file name inside Default="" in the 'f' option. Apart from that, the code is the same as the demo code in this repo!
The wav files are in the project folder and registered by Whisper.net.
Unfortunately, I get the following error message every time, no matter which wav file I try:
at Whisper.net.Wave.WaveParser.InitializeAsync()
at Whisper.net.Wave.WaveParser.GetAvgSamplesAsync(CancellationToken cancellationToken)
at Whisper.net.WhisperProcessor.ProcessAsync(Stream waveStream, CancellationToken cancellationToken)+MoveNext()
at Whisper.net.WhisperProcessor.ProcessAsync(Stream waveStream, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
at Program.<<Main>$>g__FullDetection|0_2(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 80
at Program.<<Main>$>g__FullDetection|0_2(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 80
at Program.<<Main>$>g__Demo|0_0(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 33
at CommandLine.ParserResultExtensions.WithParsedAsync[T](ParserResult`1 result, Func`2 action)
at Program.<Main>$(String[] args) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 13
at Program.<Main>(String[] args)
The output before the error message:
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: type = 5
whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 2950.97 MB
whisper_model_load: model size = 2950.66 MB
I tried the sample wav files from this repo as well as audio recordings converted to wav via cloudconvert and ffmpeg.
Environment:
MS Windows 11 Pro 22H2
.Net v7.0.203
Jetbrains Rider 2023.1.1
What am I doing wrong here?
I can't get it to work; I keep getting the error "Failed to load native whisper library". I'm not sure if I'm supposed to do something other than adding the packages to the project, downloading the model, and creating the processor. I'm doing exactly what is done in the Simple example.
The error appears at following line:
using var whisperFactory = WhisperFactory.FromPath("ggml-base.bin");
I'm integrating it into a .NET 5 application running on a linux-x64 machine.
Should I manually run the whisper library or add it somewhere? I'm not sure I understand the process at all.
`
// This section detects whether the "ggml-base.bin" file exists in our project disk. If it doesn't, it downloads it from the internet
if (!System.IO.File.Exists(modelFileName))
{
await DownloadModel(modelFileName, ggmlType);
}
// This section creates the whisperFactory object which is used to create the processor object.
using var whisperFactory = WhisperFactory.FromPath("ggml-base.bin");
// This section creates the processor object which is used to process the audio file, it uses language `auto` to detect the language of the audio file.
using var processor = whisperFactory.CreateBuilder()
.WithLanguage("auto")
.Build();
`
If you give an invalid path, or a path that does not exist yet, to WhisperFactory.FromPath, the program does not fail until NativeMethods.whisper_init_state is called. And it fails hard by throwing a non-recoverable AccessViolationException. If that is the intended approach, could you add validation to the Whisper factory to prevent this?
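Until such validation exists in the library, the check can be done on the caller's side before any native code is reached. This is a minimal sketch using only System.IO; ModelPathGuard is a hypothetical name and not part of Whisper.net.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Whisper.net.
public static class ModelPathGuard
{
    // Returns the full path if the model file exists; otherwise throws an
    // ordinary, catchable exception instead of letting native code crash.
    public static string EnsureModelFile(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentException("Model path is empty.", nameof(path));

        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
            throw new FileNotFoundException("Model file not found.", fullPath);

        return fullPath;
    }
}
```

The result can then be passed on, for example WhisperFactory.FromPath(ModelPathGuard.EnsureModelFile("ggml-base.bin")). Note that this only confirms the file exists; it does not confirm that the file is a valid model.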
I seem to have run into an issue with a MemoryStream created by letting FFMpegCore download a video and strip the audio from it.
As soon as execution reaches the ProcessAsync call, the application uses almost 10 GB of memory. Calling Process instead of the async variant leads to "unable to read beyond the end of the stream".
I debugged part of it already and saw that dataChunkSize seems to be massive compared to the MemoryStream's length (888910): https://github.com/sandrohanea/whisper.net/blob/main/Whisper.net/Wave/WaveParser.cs#L356
When I hardcoded the dataChunkSize to be the length, it read the stream fine and gave me the expected output.
I wondered if you could tell me what might be the issue here. Either by the settings for the wave (unsupported codec or something else) or what might go wrong with reading the created memorystream. I added an example project to this post. (You might need to get the required ffmpeg binaries from https://ffbinaries.com/downloads)
Is there any built-in way to identify two speakers?
Thank you
I'm looking for ways to make the transcription faster?
This library is excellent and I really enjoy it, but the transcription takes me a long time.
I have a powerful CPU
And a powerful GPU
And still it's very slow, it doesn't use at all the CPU it could use, certainly not the GPU
Is there a way I can make it work faster?
I tested the new version, and most of the usage is on the CPU, only occasionally it uses the GPU for a moment, I don't know why not all the usage is on the GPU, which would be much faster?
You can also see here
If you use what he did, it is only on GPU and works very fast
https://github.com/Const-me/Whisper
Thanks for everything, it definitely improved performance, but not as much as I expected
I get an error when initializing WhisperFactory:
System.EntryPointNotFoundException: Unable to find an entry point named 'whisper_init_from_file_no_state' in DLL 'whisper'.
at Whisper.net.Native.NativeMethods.whisper_init_from_file_no_state(String path)
at Whisper.net.Internals.ModelLoader.WhisperProcessorModelFileLoader.LoadNativeContext()
at Whisper.net.WhisperFactory..ctor(IWhisperProcessorModelLoader loader, Boolean delayInit, String libraryPath, Boolean bypassLoading)
at Whisper.net.WhisperFactory.FromPath(String path, Boolean delayInitialization, String libraryPath, Boolean bypassLoading)
Tested with both version 1.4.3 and 1.4.2 on Windows Server 2022 x64 AMD EPYC.
using (var whisperFactory = WhisperFactory.FromPath(modelPath)) // this gives exception
{
// ...
}
Hello
In both 1.4 and the newly released 1.4.2, I'm getting the error
System.Exception : Failed to load native whisper library. Error: The specified module could not be found.
I've dug into the Whisper.net.Runtime package and it looks like the win-x64/whisper.dll is not supported. The Rider assembly explorer is flagging it as a Win32 resource - perhaps the pipeline built the wrong file?
Any suggestions on how to fix this?
Many thanks
How can we use the library with MP3 files? At the moment when working with MP3, the error "Invalid wave file RIFF header" is thrown. The original Whisper supports MP3 files.
I downloaded the Medium and Large models from https://ggml.ggerganov.com/
After running:
whisper_init_from_file: loading model from 'ggml-model-whisper-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 2
whisper_model_load: type = 4
whisper_model_load: mem required = 1720.00 MB (+ 43.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 1462.35 MB
Is there a way to do this in C#?
Hi y'all, thank you for working on this .NET implementation for Whisper!
I'm trying to run the "Simple" example from the repo but run into issues on macOS Ventura (ARM, M1 Pro).
It appears to find the native library but can't call it correctly.
Older Whisper.net versions (1.4.4, 1.4.3) are exhibiting the same behavior.
whisper.net/examples/Simple on main [✘] via .NET 7.0.101
➜ dotnet run --framework net6.0
Downloading Model ggml-base.bin
whisper_init_from_file_no_state: loading model from 'ggml-base.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2
whisper_model_load: mem required = 310.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 140.66 MB
whisper_model_load: model size = 140.54 MB
Unhandled exception. System.EntryPointNotFoundException: Unable to find an entry point named 'whisper_full_default_params_by_ref' in shared library 'whisper'.
at Whisper.net.Native.NativeMethods.whisper_full_default_params_by_ref(WhisperSamplingStrategy strategy)
at Whisper.net.WhisperProcessor.GetWhisperParams()
at Whisper.net.WhisperProcessor..ctor(WhisperProcessorOptions options)
at Whisper.net.WhisperProcessorBuilder.Build()
at Program.Main(String[] args) in /Users/philippbauer/Learning/whisper.net/examples/Simple/Program.cs:line 29
at Program.<Main>(String[] args)
When using .WithLanguageDetection() or .WithLanguage("auto"), the language is always detected as English: auto-detected language: en (p = 0.515557)
But if you specify the correct language, everything is fine.
Other "port" and "bindings" libraries work fine and detect the language correctly.
Whisper.net.Wave.NotSupportedWaveException
HResult=0x80131500
Message=Only 16KHz sample rate is supported.
How can I solve this problem?
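One common approach is to convert the audio to 16 kHz before passing it to the processor, for example with ffmpeg. The sketch below assumes ffmpeg is installed and on the PATH. The exception only mentions the sample rate, so the mono and 16-bit PCM settings are extra assumptions, and WaveConverter is a hypothetical name rather than part of Whisper.net.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Whisper.net. Assumes ffmpeg is on the PATH.
public static class WaveConverter
{
    // Builds ffmpeg arguments: overwrite output (-y), resample to 16 kHz (-ar),
    // downmix to one channel (-ac), encode as 16-bit PCM (-c:a pcm_s16le).
    public static string BuildArguments(string input, string output)
    {
        return $"-y -i \"{input}\" -ar 16000 -ac 1 -c:a pcm_s16le \"{output}\"";
    }

    // Runs ffmpeg and throws if it cannot be started or exits with an error.
    public static void Convert(string input, string output)
    {
        var info = new ProcessStartInfo("ffmpeg", BuildArguments(input, output))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(info))
        {
            if (process == null)
                throw new InvalidOperationException("Could not start ffmpeg.");
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("ffmpeg exited with code " + process.ExitCode);
        }
    }
}
```

Because ffmpeg decodes many input formats, the same call may also help with MP3 input, which the wave parser rejects.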
Hello, the code examples are all asynchronous. Do you have any synchronous examples?
HI,
I have been trying to run this code, but I keep getting this error message:
System.Exception: 'Failed to load native whisper library.'
`
using System;
using System.IO;
using System.Threading.Tasks;
using System.Windows.Forms;
using CommandLine;
using Whisper.net;
using Whisper.net.Ggml;
using Whisper.net.Wave;
namespace NameSpace
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private async void button1_Click(object sender, EventArgs e)
{
await Parser.Default.ParseArguments<object>(new string[] { })
.WithParsedAsync(this.Demo);
}
string modelName = "ggml-base.bin";
string filename = "1min.wav";
async Task Demo(object opt)
{
if (!File.Exists(modelName))
{
Console.WriteLine($"Downloading Model ggml-base.bin");
var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.BaseEn);
var fileWriter = File.OpenWrite(modelName);
await modelStream.CopyToAsync(fileWriter);
}
FullDetection();
}
void FullDetection()
{
var processor = WhisperProcessorBuilder.Create()
.WithSegmentEventHandler(OnNewSegment)
.WithFileModel(modelName)
.WithTranslate()
.WithLanguage("auto")
.Build();
void OnNewSegment(object sender, OnSegmentEventArgs e)
{
Console.WriteLine($"CSSS {e.Start} ==> {e.End} : {e.Segment}");
}
var fileStream = File.OpenRead(filename);
processor.Process(fileStream);
}
}
}
`
Improve model downloader with new HuggingFace link, to include quantized models and CoreML models.
If you specify .WithMaxTokensPerSegment(1), there will be only one segment in the output. Everything works fine in the whisper.cpp library.
Hi,
Thanks for an awesome library. When looking to incorporate Whisper.net into a product, we would need the ability to load the native .dylib from a location other than where the NativeLibraryLoader is currently trying to find the binary (under runtimes/...).
Would it be possible to make an extension to the API to allow the library user to specify search paths manually? We'd much prefer this over having to fork the library just for this purpose.
CUDA error 35 at Z:\Projects\sandro\whisper.net\whisper.cpp\ggml-cuda.cu:420: CUDA driver version is insufficient for CUDA runtime version,
My computer has no Z: drive or such a path, and the CUDA driver I installed is the newest one from the official site. I installed Whisper.net.Runtime.Cublas and Whisper.net, version 1.4.5.
try
{
await foreach (var segment in processor.ProcessAsync(decodedFileStream, ctx))
yield return segment;
}
finally
{
processor.Dispose();
}
// CPU Usage is still 100% here
I can see 100% CPU usage after ProcessAsync throws.
If I start to process another file before the CPU usage drops to zero, it pretty much crawls to a halt for minutes until the original instance terminates.