sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models

License: MIT License

CMake 11.85% C# 30.35% PowerShell 1.11% Makefile 3.93% Metal 52.74%

cross-platform dotnet dotnetcore speech-recognition speech-to-text translation

whisper.net's Introduction

Whisper.net

Open-Source Whisper.net

Dotnet bindings for OpenAI Whisper made possible by whisper.cpp

Getting started

To install Whisper.net, run the following command in the Package Manager Console:

PM> Install-Package Whisper.net
PM> Install-Package Whisper.net.Runtime

or simply add a package reference in your csproj:

    <PackageReference Include="Whisper.net" Version="1.5.0" />
    <PackageReference Include="Whisper.net.Runtime" Version="1.5.0" />

GPT for whisper

We also have a custom-built GPT inside ChatGPT, which can answer questions based on this code, previous issues and releases. It is available here.

Please try asking it before publishing a new question here, as it can be a lot faster.

Runtime

The runtime package, Whisper.net.Runtime, contains the native whisper.cpp library and it is required in order to run Whisper.net.

CoreML Runtime

Whisper.net.Runtime.CoreML contains the native whisper.cpp library with Apple CoreML support enabled. Using this on Apple hardware (macOS, iOS, etc.) can net performance improvements over the core runtimes. To use it, reference the Whisper.net.Runtime.CoreML NuGet package:

    <PackageReference Include="Whisper.net" Version="1.5.0" />
    <PackageReference Include="Whisper.net.Runtime.CoreML" Version="1.5.0" />

Note that this package contains only the CoreML-built libraries; it does not contain libraries for other platforms (Linux, Windows, etc.). If you are creating a cross-platform application, you can use conditional target frameworks to install the correct library package for each platform.

Using the ggml whisper models with CoreML requires an additional mlmodelc file to be placed alongside your whisper model.

You can download and extract these using WhisperGgmlDownloader. Check the CoreML example.

You can also generate these via the whisper.cpp scripts. As whisper.cpp uses filepaths to detect this folder, you must load your whisper model with a file path.

If successful, the whisper output logs will announce:

whisper_init_state: loading Core ML model from...

If not, it will announce an error and use the original core library instead.

GPU Support

Depending on your GPU, you can use either Whisper.net.Runtime.Cublas or Whisper.net.Runtime.Clblast. For now, Cublas is available on Windows x64 and Linux x64, and Clblast only on Windows x64.

Check the Cublas and Clblast examples.

Blazor and WASM

Blazor is supported with both the InteractiveServer and InteractiveWebAssembly render modes. You can check the Blazor example here.

Versioning

Each version of Whisper.net is tied to a specific version of Whisper.cpp. The major and minor version of Whisper.net match those of the Whisper.cpp release it is based on. For example, Whisper.net 1.2.0 is based on Whisper.cpp 1.2.0.

However, the patch version is not tied to Whisper.cpp. For example, Whisper.net 1.2.1 is based on Whisper.cpp 1.2.0 and Whisper.net 1.5.0 is based on Whisper.cpp 1.5.1.

Ggml Models

Whisper.net uses Ggml models to perform speech recognition and translation.

You can find more about Ggml models here

Also, for easier integration, Whisper.net provides a downloader that fetches models from https://huggingface.co.

    var modelName = "ggml-base.bin";
    if (!File.Exists(modelName))
    {
        using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.Base);
        using var fileWriter = File.OpenWrite(modelName);
        await modelStream.CopyToAsync(fileWriter);
    }

Usage

    using var whisperFactory = WhisperFactory.FromPath("ggml-base.bin");

    using var processor = whisperFactory.CreateBuilder()
        .WithLanguage("auto")
        .Build();

    using var fileStream = File.OpenRead(wavFileName);

    await foreach(var result in processor.ProcessAsync(fileStream))
    {
        Console.WriteLine($"{result.Start}->{result.End}: {result.Text}");
    }

Examples

Check more examples here

Documentation

You can find the documentation and code samples here: https://github.com/sandrohanea/whisper.net

Building The Runtime

The build scripts are a combination of PowerShell scripts and a Makefile. You can read each of them for the underlying cmake commands being used, or run them directly from the scripts.

Android:

make android

Before running, create an environment variable for NDK_PATH with the path to your Android NDK. For example,

NDK_PATH=/Users/UserName/Library/Developer/Xamarin/android-sdk-macosx/ndk-bundle

Apple:

make apple

Compiling the Apple libraries requires a Mac with Xcode installed.

Apple CoreML:

make apple_coreml

Compiling the Apple libraries requires a Mac with Xcode installed.

Linux:

make linux

Windows:

Import the PowerShell module: Import-Module ./windows-scripts.ps1
Run BuildWindowsAll to build all Windows libraries.

License

MIT License: https://github.com/sandrohanea/whisper.net/blob/main/LICENSE

Supported platforms

Whisper.net is supported on the following platforms:

Windows x86
Windows x64
Windows ARM64
Windows ARM
Linux x64
Linux ARM64
Linux ARM
macOS x64
macOS ARM64 (Apple Silicon)
Android
iOS
MacCatalyst
tvOS
WebAssembly

whisper.net's Issues

Using WhisperProcessor.ProcessAsync more than once

Thanks for this great project!
I am attempting to use a WhisperProcessor instance more than once, calling ProcessAsync serially. However, after the first successful recognition, it only sometimes returns SegmentDatas. It appears that OnSegmentHandler is not being called while in whisper_full_with_state.

To work around this, I save the builder and then Build() before every ProcessAsync. Everything gets recognized successfully.

Issue with Wave Memorystream obtained through ffmpegcore

I seem to have run into an issue with a MemoryStream created by letting ffmpegcore download a video and strip the audio from it.
As soon as it arrives at the ProcessAsync call, the application uses almost 10 GB of memory. Calling Process instead of the async variant leads to "unable to read beyond the end of the stream".

I debugged part of it already and saw that the dataChunkSize seems to be massive compared to the memorystreams length (memory streams length: 888910)(https://github.com/sandrohanea/whisper.net/blob/main/Whisper.net/Wave/WaveParser.cs#L356)
When I hardcoded the dataChunkSize to be the length, it read the stream fine and gave me the expected output.

I wondered if you could tell me what might be the issue here. Either by the settings for the wave (unsupported codec or something else) or what might go wrong with reading the created memorystream. I added an example project to this post. (You might need to get the required ffmpeg binaries from https://ffbinaries.com/downloads)

WhisperIssueExample project
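
One possible explanation, offered as an assumption rather than a confirmed diagnosis: when a WAV file is written to a non-seekable output such as a pipe, the writer cannot go back and patch the size fields, so the header can carry a placeholder that is far larger than the real payload. The hardcoded workaround described above can be generalized into a small guard. ClampDataChunkSize below is a hypothetical helper, not part of Whisper.net; it is a minimal sketch that trusts the stream length over the declared size.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Whisper.net): returns how many bytes of the
// "data" chunk can really be read from a stream positioned at the start of the
// chunk payload.
static long ClampDataChunkSize(Stream stream, uint declaredSize)
{
    if (!stream.CanSeek)
    {
        // Without a known length there is nothing to clamp against.
        return declaredSize;
    }

    long remaining = stream.Length - stream.Position;
    return Math.Min((long)declaredSize, remaining);
}

// Example: 100 payload bytes whose header claimed 0xFFFFFFFF.
using var payload = new MemoryStream(new byte[100]);
long usable = ClampDataChunkSize(payload, 0xFFFFFFFF);
Console.WriteLine($"Declared 4294967295 bytes, usable {usable} bytes");
```

A declared size that already fits inside the stream is returned unchanged, so well-formed files are not affected.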

The program has exited with code -1073741795 (0xc000001d) 'Illegal Instruction'.

Hi,
I got this error message, which indicates that a program has attempted to execute an invalid or unsupported CPU instruction. My CPU is Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz , X64-based processor.

Any idea?
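
A plausible cause, stated as an assumption: the i5-2430M is a Sandy Bridge CPU, which supports AVX but not AVX2 or FMA, so a native library compiled with those extensions enabled would hit an unsupported instruction. This README does not say which extensions the prebuilt runtime requires. A minimal sketch for checking what the CPU exposes to .NET (requires .NET Core 3.0 or later):

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Report which x86 SIMD extensions the current CPU exposes to .NET.
Console.WriteLine($"AVX:  {Avx.IsSupported}");
Console.WriteLine($"AVX2: {Avx2.IsSupported}");
Console.WriteLine($"FMA:  {Fma.IsSupported}");

if (!Avx2.IsSupported)
{
    // Assumption: the native build in use was compiled with AVX2 enabled.
    Console.WriteLine("No AVX2: a native build that assumes AVX2 would fail with 'Illegal Instruction'.");
}
```

If AVX2 is reported as unsupported, building the runtime from source with those extensions disabled is one direction to investigate.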

The code

    void FullDetection()
    {
        var processor = WhisperProcessorBuilder.Create()
            .WithSegmentEventHandler(OnNewSegment)
            .WithFileModel(modelName)
            .WithTranslate()
            .WithLanguage("auto")
            .Build();

        void OnNewSegment(object sender, OnSegmentEventArgs e)
        {
            textBox1.Text = ($"CSSS {e.Start} ==> {e.End} : {e.Segment}");
        }

        lock (new object())
        {
            using (var fileStream = File.OpenRead(filename))
            {
                processor.Process(fileStream);
            }
        }
    }

Failed to load native whisper library

I can't get it to work; I keep getting the error "Failed to load native whisper library". I'm not sure whether I'm supposed to do something other than adding the packages to the project, downloading the model and creating the processor. I'm doing exactly what is done in the Simple example.
The error appears at following line:
using var whisperFactory = WhisperFactory.FromPath("ggml-base.bin");

I'm integrating into dotnet core 5 application running on linux-x64 machine.
Should I manually run whisper library or add it somewhere? Not sure that I understand the process at all..

    // This section detects whether the "ggml-base.bin" file exists in our project disk. If it doesn't, it downloads it from the internet
    if (!System.IO.File.Exists(modelFileName))
    {
      await DownloadModel(modelFileName, ggmlType);
    }

    // This section creates the whisperFactory object which is used to create the processor object.
    using var whisperFactory = WhisperFactory.FromPath("ggml-base.bin");

    // This section creates the processor object which is used to process the audio file, it uses language `auto` to detect the language of the audio file.
    using var processor = whisperFactory.CreateBuilder()
        .WithLanguage("auto")
        .Build();

Do you have any synchronous examples?

Hello, the code examples are all asynchronous. Do you have any synchronous examples?

Tiny CoreML zip file contains duplicate of itself

All files in the CoreML encoder directory exist as a duplicate within themselves.

GetAvgSamplesAsync reads beyond data chunk

The GetAvgSamples() method reads the number of samples from the stream based on the value in SamplesCount. However, the async version of that method reads until the end of the stream without considering SamplesCount. If there are other chunks after the data chunk, this leads to an out of bounds exception.
A workaround right now is processor.ProcessAsync(new WaveParser(fileStream).GetAvgSamples()), effectively reading sync, but still processing async.

Controlling the length of the generated text segments

Please ignore this post.

System.Exception: 'Failed to load native whisper library.'

Hi,

I have been trying to run this code, but I keep getting this error message:
System.Exception: 'Failed to load native whisper library.'

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using System.Windows.Forms;
    using CommandLine;
    using Whisper.net;
    using Whisper.net.Ggml;
    using Whisper.net.Wave;

    namespace NameSpace
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }

            private async void button1_Click(object sender, EventArgs e)
            {
                await Parser.Default.ParseArguments<object>(new string[] { })
                    .WithParsedAsync(this.Demo);
            }

            string modelName = "ggml-base.bin";
            string filename = "1min.wav";

            async Task Demo(object opt)
            {
                if (!File.Exists(modelName))
                {
                    Console.WriteLine($"Downloading Model ggml-base.bin");
                    // Dispose both streams so the model file is fully flushed before it is loaded.
                    using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.BaseEn);
                    using var fileWriter = File.OpenWrite(modelName);
                    await modelStream.CopyToAsync(fileWriter);
                }

                FullDetection();
            }

            void FullDetection()
            {
                var processor = WhisperProcessorBuilder.Create()
                    .WithSegmentEventHandler(OnNewSegment)
                    .WithFileModel(modelName)
                    .WithTranslate()
                    .WithLanguage("auto")
                    .Build();

                void OnNewSegment(object sender, OnSegmentEventArgs e)
                {
                    Console.WriteLine($"CSSS {e.Start} ==> {e.End} : {e.Segment}");
                }

                // Dispose the input stream once processing has finished.
                using var fileStream = File.OpenRead(filename);
                processor.Process(fileStream);
            }
        }
    }

I hope this project can support .NET Framework 4.7.2

Sometimes we have to use WinForms to develop our software, so it would be good for WinForms if .NET Framework 4.7.2 were supported.

Only 16KHz sample rate is supported.

Whisper.net.Wave.NotSupportedWaveException
HResult=0x80131500
Message=Only 16KHz sample rate is supported.

How can I solve this problem?
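
The exception means the input has to be converted to 16 kHz before it is passed to the processor. A dedicated tool such as ffmpeg is usually the better choice, but the idea can be sketched in plain C#. ResampleLinear below is a hypothetical helper, not part of Whisper.net; it assumes mono float samples and uses naive linear interpolation with no low-pass filter, so downsampled audio may contain aliasing.

```csharp
using System;

// Hypothetical helper (not part of Whisper.net): naive linear-interpolation
// resampler for mono samples. No anti-aliasing filter is applied.
static float[] ResampleLinear(float[] input, int sourceRate, int targetRate)
{
    if (input.Length == 0 || sourceRate == targetRate)
    {
        return (float[])input.Clone();
    }

    int outputLength = (int)((long)input.Length * targetRate / sourceRate);
    var output = new float[outputLength];
    double step = (double)sourceRate / targetRate;

    for (int i = 0; i < outputLength; i++)
    {
        // Position of output sample i on the input time axis.
        double position = i * step;
        int left = (int)position;
        int right = Math.Min(left + 1, input.Length - 1);
        double fraction = position - left;
        output[i] = (float)(input[left] * (1.0 - fraction) + input[right] * fraction);
    }

    return output;
}

// Example: six samples recorded at 48 kHz become two samples at 16 kHz.
float[] resampled = ResampleLinear(new float[] { 0, 1, 2, 3, 4, 5 }, 48000, 16000);
Console.WriteLine(string.Join(", ", resampled));
```

The resampled samples would still need to be written back into a 16 kHz WAV container, or handed to an API that accepts raw samples, before processing.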

Unable to compile and build when integrating into a .NET Framework 4.7.2 project

Hello there, greetings.

I have a desktop application in which I am trying to integrate the Whisper.net library. The project is based on .NET Framework 4.7.2 and WPF, and I am using Visual Studio 2019.

After adding the package from NuGet, when I add the sample code to the project, I get a compilation error on the following line.

    await foreach (var result in processor.ProcessAsync(fileStream))
    {
        Console.WriteLine($"{result.Start}->{result.End}: {result.Text}");
    }

The error is: The type 'IAsyncEnumerable<>' is defined in an assembly that is not referenced. You must add a reference to assembly 'Microsoft.Bcl.AsyncInterfaces, Version=7.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'

After doing some research, I found that IAsyncEnumerable is part of the .NET Core SDK. To use it with .NET Framework, I need the Microsoft.Bcl.AsyncInterfaces package. Even after installing it, the error is still there. I also have .NET Core installed on the system and tried changing the C# language version from 7.3 to 8.0. The error is still there.

I have successfully built the sample project, which is based on .NET Core.

So my question is: is this library compatible with .NET Framework, or is .NET Core a must?

Is there a possibility to add CUDA or OpenCL support to whisper.net?

I'd love to use whisper.net with a graphics card. Waiting half an hour every time I run the code gets a tad tedious after some time. Is there a way to add support for GPU cores and GPU memory?

Anyways, thanks for the port :)

GPU support Error

CUDA error 35 at Z:\Projects\sandro\whisper.net\whisper.cpp\ggml-cuda.cu:420: CUDA driver version is insufficient for CUDA runtime version

There is no Z drive or such a path on my computer, and the CUDA driver I installed is the newest one from the official site. I installed Whisper.net.Runtime.Cublas and Whisper.net, version 1.4.5.

NuGet library is not able to load the whisper library on Android

The NativeLibraryLoader uses conditional preprocessor directives (#if ANDROID) which are not being applied so the existing NuGet ends up trying to load the Linux arm64 library and fails. Might be wrong but I think looking for the platform at runtime would be the correct way of supporting the various platforms. Cheers!

Most of the usage is in the CPU and not in the GPU

I tested the new version, and most of the usage is on the CPU; only occasionally does it use the GPU for a moment. I don't know why not all of the work runs on the GPU, which would be much faster.
You can also see here.
If you use what he did, it runs only on the GPU and works very fast:
https://github.com/Const-me/Whisper
Thanks for everything. It definitely improved performance, but not as much as I expected.

How do you do it faster?

I'm looking for ways to make the transcription faster.
This library is excellent and I really enjoy it, but the transcription takes a long time.
I have a powerful CPU and a powerful GPU, and it is still very slow. It doesn't use anywhere near all the CPU it could, and certainly not the GPU.
Is there a way I can make it work faster?

netstandard2.0 dll is not installed

I want to run this project in Visual Studio 2022. All DLLs are installed except netstandard2.0. I installed .NET Core 2 and the Desktop development with C++ workload, but the error remains.

Canceling ProcessAsync returns before inference terminates.

try
{
    await foreach (var segment in processor.ProcessAsync(decodedFileStream, ctx))
        yield return segment;
}
finally
{
    processor.Dispose();
}

// CPU Usage is still 100% here

I can see 100% CPU usage after ProcessAsync throws.

If I start to process another file before the CPU usage drops to zero, it pretty much crawls to a halt for minutes until the original instance terminates.

Using a microphone

Is there a way in the library to use the microphone and not just transcribe an existing recording?
The original library, whisper.cpp, supports this.

Unable to find an entry point named 'whisper_init_from_file_no_state' in DLL 'whisper'

I get an error when initializing WhisperFactory:

System.EntryPointNotFoundException: Unable to find an entry point named 'whisper_init_from_file_no_state' in DLL 'whisper'.
   at Whisper.net.Native.NativeMethods.whisper_init_from_file_no_state(String path)
   at Whisper.net.Internals.ModelLoader.WhisperProcessorModelFileLoader.LoadNativeContext()
   at Whisper.net.WhisperFactory..ctor(IWhisperProcessorModelLoader loader, Boolean delayInit, String libraryPath, Boolean bypassLoading)
   at Whisper.net.WhisperFactory.FromPath(String path, Boolean delayInitialization, String libraryPath, Boolean bypassLoading)

Tested with both version 1.4.3 and 1.4.2 on Windows Server 2022 x64 AMD EPYC.

            using (var whisperFactory = WhisperFactory.FromPath(modelPath)) // this gives exception
            {
                 // ...
            }

How to convert whisper model to GGML

Is there a way to do this in C#?

Explanation of FluentAPI settings

Hello!

Is there any information which "With~" in the fluent api corresponds to which settings/flags in whisper.cpp?
I'm mostly interested in -ml flag, which allows for limiting output length per line.

Looks like the WithMaxSegmentLength() should work the same way as -ml but I think it does not

Thanks!

Repeats Previous Segments

Hello, thanks for porting this to .NET! I was playing around with it last night and found that each time a new segment is generated, the event handler receives all previous segments.

Here's a short example output from my program:

await using var fileStream = File.OpenRead("/home/evan/Downloads/audio/output.wav");
using var processor = WhisperProcessorBuilder.Create()
    .WithSegmentEventHandler((sender, e) => Console.WriteLine("{0} - {1} - {2}", e.Start, e.End, e.Segment))
    .WithFileModel("ggml-base.en.bin")
    .WithThreads(1)
    .WithLanguage("en")
    .Build();

00:00:00 - 00:00:25.8400000 -  CHAPTER I
00:00:00 - 00:00:25.8400000 -  CHAPTER I
00:00:25.8400000 - 00:00:32.1600000 -  The Jacques-Arde bathrobe hanging on his bedpost bore the monogram Hotel Ritz Paris.
00:00:00 - 00:00:25.8400000 -  CHAPTER I
00:00:25.8400000 - 00:00:32.1600000 -  The Jacques-Arde bathrobe hanging on his bedpost bore the monogram Hotel Ritz Paris.
00:00:32.1600000 - 00:00:36.4000000 -  Slowly the fog began to lift. Langdon picked up the receiver.
00:00:00 - 00:00:25.8400000 -  CHAPTER I
00:00:25.8400000 - 00:00:32.1600000 -  The Jacques-Arde bathrobe hanging on his bedpost bore the monogram Hotel Ritz Paris.
00:00:32.1600000 - 00:00:36.4000000 -  Slowly the fog began to lift. Langdon picked up the receiver.
00:00:36.4000000 - 00:00:37.4000000 -  "Hello?"
00:00:00 - 00:00:25.8400000 -  CHAPTER I
00:00:25.8400000 - 00:00:32.1600000 -  The Jacques-Arde bathrobe hanging on his bedpost bore the monogram Hotel Ritz Paris.
00:00:32.1600000 - 00:00:36.4000000 -  Slowly the fog began to lift. Langdon picked up the receiver.
00:00:36.4000000 - 00:00:37.4000000 -  "Hello?"
00:00:37.4000000 - 00:00:43.1600000 -  "Mr. Langdon?" a man's voice said. "I hope I have not awoken you."

Each time we get another segment, we also receive all previous segments... Is this by design?

Identifying two speakers

Is there any built-in way to identify two speakers?
Thank you

Win-x64 library issue

Hello

In both 1.4 and the newly released 1.4.2, I'm getting the error
System.Exception : Failed to load native whisper library. Error: The specified module could not be found.

I've dug into the Whisper.net.Runtime package and it looks like the win-x64/whisper.dll is not supported. The Rider assembly explorer is flagging it as a Win32 resource - perhaps the pipeline built the wrong file?

Suggestions on how to fix?

Many thanks

How to handle real-time sound streams

Thank you.

Improve Model Downloader

Improve model downloader with new HuggingFace link, to include quantized models and CoreML models.

Working with MP3s

How can we use the library with MP3 files? At the moment when working with MP3, the error "Invalid wave file RIFF header" is thrown. The original Whisper supports MP3 files.

Where should I insert my api key?

I implemented this library per the example provided here and I get the following exception when trying to submit requests:
System.Security.Authentication.AuthenticationException: Authentication failed

I guess this is due to missing api key.

Failed to load native whisper library.

When trying to build the WhisperProcessor I get the following error: 'Failed to load native whisper library.'

I am using Windows 11 Pro on ARM64 with the latest Visual Studio running .NET 7.

I don't know why this error occurs because I took this code from the example on GitHub and the model file is downloaded correctly.
Can you help me out?

Unable to find an entry point named 'whisper_full_default_params_by_ref' in shared library 'whisper'.

Hi y'all, thank you for working on this .NET implementation for Whisper!

I'm trying to run the "Simple" example from the repo but run into issues on macOS Ventura (ARM, M1 Pro).
It appears to find the native library but can't call it correctly.

Older Whisper.net versions (1.4.4, 1.4.3) are exhibiting the same behavior.

whisper.net/examples/Simple on  main [✘] via .NET 7.0.101 
➜ dotnet run --framework net6.0
Downloading Model ggml-base.bin
whisper_init_from_file_no_state: loading model from 'ggml-base.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: mem required  =  310.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
Unhandled exception. System.EntryPointNotFoundException: Unable to find an entry point named 'whisper_full_default_params_by_ref' in shared library 'whisper'.
   at Whisper.net.Native.NativeMethods.whisper_full_default_params_by_ref(WhisperSamplingStrategy strategy)
   at Whisper.net.WhisperProcessor.GetWhisperParams()
   at Whisper.net.WhisperProcessor..ctor(WhisperProcessorOptions options)
   at Whisper.net.WhisperProcessorBuilder.Build()
   at Program.Main(String[] args) in /Users/philippbauer/Learning/whisper.net/examples/Simple/Program.cs:line 29
   at Program.<Main>(String[] args)

Add Android support

Thanks for creating this project!
I'd like to use it in an Android app with Maui. Are you planning to add support for Android?

How to retrieve diarization and how to use a prompt?

Is there no interface for these in the binding?

WithDuration() is not working?

Hi!
I want to transcribe only a fixed duration of the audio. When I use both WithOffset() and WithDuration(), Whisper often outputs text that extends beyond the configured duration. Is the problem on my side?

Catching the native call exception

If you give an invalid path or a path that does not exist yet to WhisperFactory.FromPath, the program waits until NativeMethods.whisper_init_state is called to fail. And it fails hard by throwing a non-recoverable AccessViolationException. If it is the correct approach, could you add a validation to the whisper factory to prevent this?
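
Until such validation exists in the library, a caller can guard the path itself. The following is a minimal sketch; EnsureModelFileExists is a hypothetical helper, not part of Whisper.net, and it only checks that the file is present, not that it is a valid model.

```csharp
using System;
using System.IO;

// Hypothetical guard (not part of Whisper.net): fail early with a managed
// exception instead of passing a missing file to the native loader.
static string EnsureModelFileExists(string modelPath)
{
    if (string.IsNullOrWhiteSpace(modelPath))
    {
        throw new ArgumentException("Model path must not be empty.", nameof(modelPath));
    }

    string fullPath = Path.GetFullPath(modelPath);
    if (!File.Exists(fullPath))
    {
        throw new FileNotFoundException("Whisper model file not found.", fullPath);
    }

    return fullPath;
}

// Example: a missing file produces a catchable exception.
try
{
    EnsureModelFileExists("does-not-exist.bin");
}
catch (FileNotFoundException ex)
{
    Console.WriteLine($"{ex.Message} ({ex.FileName})");
}
```

The returned path could then be passed on, for example as WhisperFactory.FromPath(EnsureModelFileExists("ggml-base.bin")).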

Memory required with model medium and large

I downloaded the Medium and Large models from https://ggml.ggerganov.com/
After running:
whisper_init_from_file: loading model from 'ggml-model-whisper-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 2
whisper_model_load: type = 4
whisper_model_load: mem required = 1720.00 MB (+ 43.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 1462.35 MB

Invalid wave file header when using Win11 Recorder

I was able to get the demo code running using the Kennedy.wav file. But when I recorded a file using the Windows 11 Recorder it said the wave file header was invalid.

Whisper.net.Wave.CorruptedWaveException: 'Invalid wave file header.'

Windows 11 Sound Recorder can generate Wav files of various qualities.

I took your suggestion from Issue #33 and wrote out the headers for each quality level.

Kennedy.wav: RIFF?¶WAVEfmt
Auto.wav: RIFF??☻WAVEJUNK
Medium.wav: RIFFJ?☺WAVEJUNK
Best.wav: RIFFB?WAVEJUNK
High.wav: RIFFv?♠WAVEJUNK

I would have expected these files to be valid. Is there something I'm missing?
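
The printed headers show that the Windows 11 recordings place a JUNK chunk directly after WAVE, where Kennedy.wav has fmt. RIFF allows extra chunks before fmt, so a parser that expects fmt at a fixed offset would reject such files; whether that is what WaveParser did at the time is an assumption. A minimal sketch of chunk scanning follows. SeekToChunk and BuildSampleHeader are hypothetical helpers, not part of Whisper.net.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: scans RIFF chunks until it finds the one with the given
// four-character id, skipping others such as "JUNK" or "LIST". Returns the
// payload size with the reader at the payload start, or -1 if it is missing.
static long SeekToChunk(BinaryReader reader, string wantedId)
{
    Stream stream = reader.BaseStream;
    while (stream.Length - stream.Position >= 8)
    {
        string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
        uint size = reader.ReadUInt32(); // BinaryReader reads little-endian, as RIFF requires
        if (id == wantedId)
        {
            return size;
        }
        // Chunk payloads are padded to an even number of bytes.
        stream.Seek((long)size + (size & 1), SeekOrigin.Current);
    }
    return -1;
}

// Builds a tiny in-memory header with a JUNK chunk in front of fmt.
static byte[] BuildSampleHeader()
{
    using var buffer = new MemoryStream();
    using var writer = new BinaryWriter(buffer, Encoding.ASCII);
    writer.Write(Encoding.ASCII.GetBytes("RIFF"));
    writer.Write(0u);                 // overall size, unused in this sketch
    writer.Write(Encoding.ASCII.GetBytes("WAVE"));
    writer.Write(Encoding.ASCII.GetBytes("JUNK"));
    writer.Write(3u);                 // odd size, so one pad byte follows
    writer.Write(new byte[4]);        // 3 payload bytes + 1 pad byte
    writer.Write(Encoding.ASCII.GetBytes("fmt "));
    writer.Write(16u);                // fmt payload size
    writer.Write((ushort)1);          // PCM
    writer.Write((ushort)1);          // mono
    writer.Write(16000u);             // sample rate
    writer.Write(32000u);             // byte rate
    writer.Write((ushort)2);          // block align
    writer.Write((ushort)16);         // bits per sample
    writer.Flush();
    return buffer.ToArray();
}

using var reader = new BinaryReader(new MemoryStream(BuildSampleHeader()));
reader.BaseStream.Seek(12, SeekOrigin.Begin); // skip "RIFF", size and "WAVE"
long fmtSize = SeekToChunk(reader, "fmt ");
reader.ReadUInt16();                          // format tag
reader.ReadUInt16();                          // channel count
uint sampleRate = reader.ReadUInt32();
Console.WriteLine($"fmt size {fmtSize}, sample rate {sampleRate}");
```

The same scan can be repeated for the data chunk once fmt has been read.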

Any version for .netframework 4.7.x ?

Great job! Will it be available for .NET Framework 4.7?

thanks!

Faster transcription

I tried using WithThreads. I have a very powerful processor, but it didn't help: it drew more power from the processor yet took the same amount of time, so I gained nothing.
Is there something I'm doing wrong?
I also tried WithSpeedUp2x, and then it just doesn't transcribe anything!
Thanks for this wonderful library and for your help.

Always english

When using .WithLanguageDetection() or .WithLanguage("auto"), the detected language is always English (auto-detected language: en (p = 0.515557)), but if you specify the correct language, everything is fine.

Other "port" and "bindings" libraries work fine and detect the language correctly

[BUG] Getting error "System.Net.Http.HttpRequestException" when trying to use WhisperGgmlDownloader.GetGgmlModelAsync with the large model

When trying to use the WhisperGgmlDownloader.GetGgmlModelAsync method with the large model
using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.Large);
I get the following error:
System.Net.Http.HttpRequestException: Cannot write more bytes to the buffer than the configured maximum buffer size: 2147483647
Pull request for bugfix is out. Changed issue to solved
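
The message matches what HttpClient reports when it tries to buffer a response body larger than its 2 GB limit. The pull request itself is not shown here, so the following is only a sketch of the general technique: request the response with HttpCompletionOption.ResponseHeadersRead and copy the body to disk as a stream. DownloadToFileAsync is a hypothetical helper, not part of Whisper.net.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: streams an HTTP response straight to disk so the body
// is never held in memory as a whole. Returns the number of bytes written.
static async Task<long> DownloadToFileAsync(HttpClient client, string url, string destinationPath)
{
    // ResponseHeadersRead returns as soon as the headers arrive; the body is
    // then read incrementally instead of being collected into one buffer.
    using HttpResponseMessage response =
        await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();

    await using Stream body = await response.Content.ReadAsStreamAsync();
    await using FileStream file = File.Create(destinationPath);
    await body.CopyToAsync(file);
    await file.FlushAsync();
    return file.Length;
}
```

A caller would pass its own HttpClient, the model URL and a target path; none of these are prescribed by the library.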

AccessViolation when try to cancel ProcessAsync()

Whisper.Net version: 1.2.2
Environment: win10-x64
.NET version: Framework 4.7.2
Model: Small.bin
wav file language: Japanese

Use Whisper.net.Demo sample code, in Program.cs, pass a 1 minute cancellation token to ProcessAsync():

    var cts = new CancellationTokenSource(TimeSpan.FromMinutes(1));
    await foreach (var segment in processor.ProcessAsync(fileStream, cts.Token))
    {
        Console.WriteLine($"New Segment: {segment.Start} ==> {segment.End} : {segment.Text}");
    }

After 1 minute, the demo crashed with an AccessViolationException:
Exception thrown at 0x00007FFB245934AE (whisper.dll) in Whisper.net.Demo.exe: 0xC0000005: Access violation reading location 0x000001D0863A1D90.

Error Message of "Invalid wave file RIFF header" with various valid wav files

I copied the demo code into the file whisper.cs and installed the packages from NuGet. The only changes I made are the model (base to large) and the file name inside Default="" in the 'f' option. Apart from that, the code is the same as the demo code in this repo.
The wav files are in the project folder and are found by whisper.net.

Unfortunately, I get the following error message every time, no matter which wav file I try:

   at Whisper.net.Wave.WaveParser.InitializeAsync()
   at Whisper.net.Wave.WaveParser.GetAvgSamplesAsync(CancellationToken cancellationToken)
   at Whisper.net.WhisperProcessor.ProcessAsync(Stream waveStream, CancellationToken cancellationToken)+MoveNext()
   at Whisper.net.WhisperProcessor.ProcessAsync(Stream waveStream, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Program.<<Main>$>g__FullDetection|0_2(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 80
   at Program.<<Main>$>g__FullDetection|0_2(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 80
   at Program.<<Main>$>g__Demo|0_0(Options opt) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 33
   at CommandLine.ParserResultExtensions.WithParsedAsync[T](ParserResult`1 result, Func`2 action)
   at Program.<Main>$(String[] args) in C:\Users\huddeij\RiderProjects\whisperTest\whisper2.cs:line 13
   at Program.<Main>(String[] args)

The output before the error message:

whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: type = 5
whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 2950.97 MB
whisper_model_load: model size = 2950.66 MB

I tried the sample wav files from this repo, audio records, converted into wav via cloudconvert and ffmpeg.

Environment:
MS Windows 11 Pro 22H2
.Net v7.0.203
Jetbrains Rider 2023.1.1

What am I doing wrong here?

API doesn't permit loading native .dylib from non standard location

Hi,

Thanks for an awesome library. When looking to incorporate Whisper.net into a product, we would need the ability to load the native .dylib from a location other than where the NativeLibraryLoader is currently trying to find the binary (under runtimes/...).

Would it be possible to make an extension to the API to allow the library user to specify search paths manually? We'd much prefer this over having to fork the library just for this purpose.

OpenBLAS support

I've managed to hack OpenBLAS support into the cmake files for linux-x64. For some reason, find_package(BLAS) does not work, and I had to set some variables manually. This resulted in greatly improved processing time on a 2 thread VM. On a random audio file I've been testing with, originally it took 1511 seconds, now it takes 770 seconds!

I'm very rusty with CMake, so I'm open to ideas.

Process terminated. A callback was made on a garbage collected delegate of type 'Whisper.net!Whisper.net.Native.WhisperNewSegmentCallback::Invoke'

Process terminated. A callback was made on a garbage collected delegate of type 'Whisper.net!Whisper.net.Native.WhisperNewSegmentCallback::Invoke'.
Repeat 2 times:

at Whisper.net.Native.NativeMethods.whisper_full(IntPtr, Whisper.net.Native.WhisperFullParams, IntPtr, Int32)
at Whisper.net.WhisperProcessor.Process(System.IO.Stream)
at WhisperAI.AudioProcessor+<ProcessAudio>d__1.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
at WhisperAI.AudioProcessor.ProcessAudio(System.String)
at Program+<<Main>$>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
at Program.<Main>$(System.String[])
at Program.<Main>(System.String[])

This error happens on big, long .WAV files (200 MB and 40 minutes long). It happens at random.

WithNoSpeechThreshold doesn't seem to do anything

Either I lack the understanding what this is supposed to do or it doesn't work.
I tried a variety of values from 0.1f to 10f and I still had hallucinations remaining in my output, such as "you" repeated over and over. This seems to occur whenever an empty audio stream is passed to whisper, as that has likely not been included in the model training data.

.WithMaxTokensPerSegment(1) returns only one segment

If you specify .WithMaxTokensPerSegment(1), there will be only one segment in the output. Everything is fine in the whisper.cpp library.

Debugging code made in MAUI for macOS (maccatalyst-arm64)

I can't get it to work for a project made with MAUI for macOS (net7.0-maccatalyst/maccatalyst-arm64).

I downloaded the specific libs for osx-arm64 and maccatalyst, but the code:
await foreach (var segment in processor.ProcessAsync(fileStream, CancellationToken.None))

It doesn't enter foreach and doesn't give an error. Can someone help me?

Thanks.