
vilassn / whisper_android

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android

License: MIT License

Topics: asr, openai, texttospeech, tts, whisper, text-to-speech, speech-recognition, tensorflow, tflite, offline, tensorflowlite, android, automatic-speech-recognition, transcription, transcribe, embedded, mobile

whisper_android's Introduction

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite

This guide explains how to integrate the Whisper and Recorder classes into Android apps for audio recording and speech recognition.

Whisper ASR Integration Guide

Here are separate code snippets for using Whisper and Recorder:

Whisper (Speech Recognition)

Initialization and Configuration:

// Initialize Whisper
Whisper mWhisper = new Whisper(this); // Create Whisper instance

// Load model and vocabulary for Whisper
String modelPath = getFilePath("whisper-tiny.tflite"); // Provide model file path
String vocabPath = getFilePath("filters_vocab_multilingual.bin"); // Provide vocabulary file path
mWhisper.loadModel(modelPath, vocabPath, true); // Load model and set multilingual mode

// Set a listener for Whisper to handle updates and results
mWhisper.setListener(new IWhisperListener() {
    @Override
    public void onUpdateReceived(String message) {
        // Handle Whisper status updates
    }

    @Override
    public void onResultReceived(String result) {
        // Handle transcribed results
    }
});
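These snippets call a getFilePath() helper that resolves a file name to an absolute path on device storage. That helper is not part of the Whisper class itself; a minimal sketch, assuming the model and vocabulary files ship as app assets and are copied to the app's files directory on first use, could look like this:

// Hypothetical helper: copies a bundled asset to the app's files directory
// on first use and returns its absolute path.
// Requires: java.io.*, android.util.Log; call from an Activity or other Context.
private String getFilePath(String assetName) {
    File outFile = new File(getFilesDir(), assetName);
    if (!outFile.exists()) {
        try (InputStream in = getAssets().open(assetName);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        } catch (IOException e) {
            Log.e("Whisper", "Failed to copy asset " + assetName, e);
        }
    }
    return outFile.getAbsolutePath();
}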

Transcription:

// Set the audio file path for transcription. The audio should be 16 kHz, mono, 16-bit PCM WAV
String waveFilePath = getFilePath("your_audio_file.wav"); // Provide audio file path
mWhisper.setFilePath(waveFilePath); // Set audio file path

// Start transcription
mWhisper.setAction(Whisper.ACTION_TRANSCRIBE); // Set action to transcription
mWhisper.start(); // Start transcription

// Perform other operations
// Add your additional code here

// Stop transcription
mWhisper.stop(); // Stop transcription
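Note that transcription runs on a background thread, so calling stop() immediately after start() would interrupt it before any result arrives; in practice you wait for onResultReceived(). A minimal sketch using a CountDownLatch (the latch and timeout are our addition, not part of the library; don't call await() on the UI thread):

// Wait for a single transcription result before proceeding.
// Requires: java.util.concurrent.CountDownLatch, java.util.concurrent.TimeUnit
final CountDownLatch done = new CountDownLatch(1);
final String[] transcript = new String[1];

mWhisper.setListener(new IWhisperListener() {
    @Override
    public void onUpdateReceived(String message) {
        // Handle Whisper status updates
    }

    @Override
    public void onResultReceived(String result) {
        transcript[0] = result;
        done.countDown(); // signal that transcription finished
    }
});

mWhisper.setAction(Whisper.ACTION_TRANSCRIBE);
mWhisper.start();
try {
    done.await(60, TimeUnit.SECONDS); // give up after a minute
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}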

Recorder (Audio Recording)

Initialization and Configuration:

// Initialize Recorder
Recorder mRecorder = new Recorder(this); // Create Recorder instance

// Set a listener for Recorder to handle updates and audio data
mRecorder.setListener(new IRecorderListener() {
    @Override
    public void onUpdateReceived(String message) {
        // Handle Recorder status updates
    }

    @Override
    public void onDataReceived(float[] samples) {
        // Handle audio data received during recording
        // You can forward this data to Whisper for live recognition using writeBuffer()
        // mWhisper.writeBuffer(samples);
    }
});

Recording:

// Check and request recording permissions
checkRecordPermission(); // Check and request recording permissions

// Set the audio file path for recording. It records audio in 16 kHz, mono, 16-bit format
String waveFilePath = getFilePath("your_audio_file.wav"); // Provide audio file path
mRecorder.setFilePath(waveFilePath); // Set audio file path

// Start recording
mRecorder.start(); // Start recording

// Perform other operations
// Add your additional code here

// Stop recording
mRecorder.stop(); // Stop recording
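The checkRecordPermission() helper is not shown in this guide. A minimal sketch using the standard Android runtime-permission APIs (the request code is arbitrary, and RECORD_AUDIO must also be declared in AndroidManifest.xml):

// Minimal runtime-permission check for RECORD_AUDIO.
// Requires: android.Manifest, android.content.pm.PackageManager,
// androidx.core.app.ActivityCompat, androidx.core.content.ContextCompat
private static final int REQUEST_RECORD_AUDIO = 1; // arbitrary request code

private void checkRecordPermission() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.RECORD_AUDIO},
                REQUEST_RECORD_AUDIO);
    }
}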

Please adapt these code snippets to your specific use case, provide the correct file paths, and handle exceptions appropriately in your application.

Note: Ensure that you have the necessary permissions, error handling, and file path management in your application when using the Recorder class.

Demo Video

(Demo video available in the original repository.)

Important Note

Whisper ASR is a powerful tool for transcribing speech into text. However, keep in mind that handling audio data and transcriptions may require careful synchronization and error handling in your Android application to ensure a smooth user experience.
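For example, the listener callbacks typically fire on the worker thread that runs recognition, so any view updates should be posted back to the main thread. A minimal sketch (resultTextView is a hypothetical field in your Activity):

@Override
public void onResultReceived(String result) {
    // Callbacks may arrive on a background thread; hop to the
    // main thread before touching any views.
    runOnUiThread(() -> resultTextView.setText(result));
}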

Enjoy using the Whisper ASR Android app to enhance your speech recognition capabilities!

whisper_android's People

Contributors

vilassn

whisper_android's Issues

Getting CMake exception during Gradle sync/build

I cloned the repository and started the Gradle sync, but got the following exception. Can anyone help with that?

1: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':app:configureCMakeDebug[arm64-v8a]'.
> [CXX1429] error when building with cmake using C:\Users\15010\AndroidStudioProjects\whisper_android\app\src\main\cpp\CMakeLists.txt: C++ build system [configure] failed while executing:
      @echo off
      "C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\cmake\\3.22.1\\bin\\cmake.exe" ^
        "-HC:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\src\\main\\cpp" ^
        "-DCMAKE_SYSTEM_NAME=Android" ^
        "-DCMAKE_EXPORT_COMPILE_COMMANDS=ON" ^
        "-DCMAKE_SYSTEM_VERSION=26" ^
        "-DANDROID_PLATFORM=android-26" ^
        "-DANDROID_ABI=arm64-v8a" ^
        "-DCMAKE_ANDROID_ARCH_ABI=arm64-v8a" ^
        "-DANDROID_NDK=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620" ^
        "-DCMAKE_ANDROID_NDK=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620" ^
        "-DCMAKE_TOOLCHAIN_FILE=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620\\build\\cmake\\android.toolchain.cmake" ^
        "-DCMAKE_MAKE_PROGRAM=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\cmake\\3.22.1\\bin\\ninja.exe" ^
        "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=C:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\build\\intermediates\\cxx\\Debug\\6v4z4y72\\obj\\arm64-v8a" ^
        "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=C:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\build\\intermediates\\cxx\\Debug\\6v4z4y72\\obj\\arm64-v8a" ^
        "-DCMAKE_BUILD_TYPE=Debug" ^
        "-BC:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\.cxx\\Debug\\6v4z4y72\\arm64-v8a" ^
        -GNinja
    from C:\Users\15010\AndroidStudioProjects\whisper_android\app

Realtime use possible?

There are some Whisper realtime libraries out there.
Is there any way to make this library work in real time?

I have an issue with real-time transcription: when I am not talking, it seems to parse random text.

I was able to set up the model and it works really great. My code is:

private fun testAudio() {
    // Initialize Whisper
    val mWhisper = Whisper(this) // Create Whisper instance

    // Load model and vocabulary for Whisper
    val basePath = Global.fileOperations.getOutputDirectory("/Models", this)!!.path
    val modelPath = basePath + "/whisper-tiny.tflite" // Provide model file path
    val vocabPath = basePath + "/filters_vocab_multilingual.bin" // Provide vocabulary file path
    println("PATHS: ")
    println(modelPath)
    println(vocabPath)
    mWhisper.loadModel(modelPath, vocabPath, true) // Load model and set multilingual mode

    // Set a listener for Whisper to handle updates and results
    mWhisper.setListener(object : IWhisperListener {
        override fun onUpdateReceived(message: String?) {
            Log.i("TRANSCRIBE_WHISPER", "New State: $message")
            // Handle Whisper status updates
        }

        override fun onResultReceived(result: String?) {
            Log.i("TRANSCRIBE_WHISPER", result ?: "")
            // Handle transcribed results
        }
    })

    // Initialize Recorder
    val mRecorder = Recorder(this) // Create Recorder instance

    // Set a listener for Recorder to handle updates and audio data
    mRecorder.setListener(object : IRecorderListener {
        override fun onUpdateReceived(message: String) {
            // Handle Recorder status updates
        }

        override fun onDataReceived(samples: FloatArray) {
            // Handle audio data received during recording
            // You can forward this data to Whisper for live recognition using writeBuffer()
            mWhisper.writeBuffer(samples)
        }
    })

    mRecorder.start() // Start recording
}

and this onResultReceived override:

override fun onResultReceived(result: String?) {
    Log.i("TRANSCRIBE_WHISPER", result ?: "")
    // Handle transcribed results
}
seemed to return:

[audioRecordData][fine] 5s(f:5014 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1

I'll make a hole in the hole.
Two times this:

[audioRecordData][fine] 10s(f:10000 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1
then
I'll be back with a little .... <== repeated a lot

Thanks for your hard work :P
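A common mitigation for this (not part of this repository) is to gate the stream on signal energy so that near-silent buffers are never forwarded to the recognizer, since Whisper tends to hallucinate text on silence. A minimal RMS-threshold sketch in Java, mirroring the README's listener; the threshold value is an assumption and needs tuning per device and environment:

// Hypothetical RMS gate: forward audio to Whisper only when the buffer's
// energy exceeds a silence threshold.
private static final float SILENCE_RMS_THRESHOLD = 0.01f; // assumed value, tune as needed

@Override
public void onDataReceived(float[] samples) {
    double sumSquares = 0.0;
    for (float s : samples) {
        sumSquares += s * s;
    }
    double rms = Math.sqrt(sumSquares / samples.length);
    if (rms > SILENCE_RMS_THRESHOLD) {
        mWhisper.writeBuffer(samples); // drop near-silent buffers
    }
}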

Get timestamps at the segment or word level

Thanks for the port.

Can this output a transcript of the provided audio with timestamps at the segment level, word level, or both? I'm trying to transcribe audio files for dubbing, and I need timestamp precision for WAV file transcripts. Basically, the start and end times for words or text segments.

OpenAI provides an API for this through the [timestamp_granularities[] parameter](https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-timestamp_granularities).

Can you add this feature?

Not working on virtual devices

I have tried the project as-is on two virtual devices (Android 14 and 12) and one physical device (Android 12). It does not seem to run on virtual devices; you might want to mention this in the README.md.

IllegalArgumentException: Internal error: Failed to run on the given Interpreter

2024-04-29 09:48:50.970 24207-24753 Whisper com.whispertflite E Error...
java.lang.IllegalArgumentException: Internal error: Failed to run on the given Interpreter: tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T, U>, args) was not true.
tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T, U>, args) was not true.
tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T, U>, args) was not true.
tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T, U>, args) was not true.
tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T,
at org.tensorflow.lite.NativeInterpreterWrapper.run(Native Method)
at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:247)
at org.tensorflow.lite.InterpreterImpl.runForMultipleInputsOutputs(InterpreterImpl.java:107)
at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:80)
at org.tensorflow.lite.InterpreterImpl.run(InterpreterImpl.java:100)
at org.tensorflow.lite.Interpreter.run(Interpreter.java:80)
at com.whispertflite.engine.WhisperEngine.runInference(WhisperEngine.java:147)
at com.whispertflite.engine.WhisperEngine.transcribeFile(WhisperEngine.java:74)
at com.whispertflite.asr.Whisper.threadFunction(Whisper.java:129)
at com.whispertflite.asr.Whisper.lambda$start$0$com-whispertflite-asr-Whisper(Whisper.java:76)
at com.whispertflite.asr.Whisper$$ExternalSyntheticLambda0.run(Unknown Source:2)
at java.lang.Thread.run(Thread.java:930)

The only difference between success and failure is the tflite file. Here are their tensor dumps; note the output shape: 448 tokens for the working model vs. 451 for the failing one:
success Input Tensor Dump ===>
2024-04-29 09:50:13.724 24920-25027 WhisperEngineJava com.whispertflite D shape.length: 3
2024-04-29 09:50:13.725 24920-25027 WhisperEngineJava com.whispertflite D shape[0]: 1
2024-04-29 09:50:13.725 24920-25027 WhisperEngineJava com.whispertflite D shape[1]: 80
2024-04-29 09:50:13.725 24920-25027 WhisperEngineJava com.whispertflite D shape[2]: 3000
2024-04-29 09:50:13.725 24920-25027 WhisperEngineJava com.whispertflite D dataType: FLOAT32
2024-04-29 09:50:13.726 24920-25027 WhisperEngineJava com.whispertflite D name: serving_default_input_ids:0
2024-04-29 09:50:13.726 24920-25027 WhisperEngineJava com.whispertflite D numBytes: 960000
2024-04-29 09:50:13.726 24920-25027 WhisperEngineJava com.whispertflite D index: 0
2024-04-29 09:50:13.726 24920-25027 WhisperEngineJava com.whispertflite D numDimensions: 3
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D numElements: 240000
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D shapeSignature.length: 3
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D quantizationParams.getScale: 0.0
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D quantizationParams.getZeroPoint: 0
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D ==================================================================
2024-04-29 09:50:13.728 24920-25027 WhisperEngineJava com.whispertflite D Output Tensor Dump ===>
2024-04-29 09:50:13.728 24920-25027 WhisperEngineJava com.whispertflite D shape.length: 2
2024-04-29 09:50:13.728 24920-25027 WhisperEngineJava com.whispertflite D shape[0]: 1
2024-04-29 09:50:13.728 24920-25027 WhisperEngineJava com.whispertflite D shape[1]: 448
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D dataType: INT32
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D name: StatefulPartitionedCall:0
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D numBytes: 1792
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D index: 1047
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D numDimensions: 2
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D numElements: 448
2024-04-29 09:50:13.730 24920-25027 WhisperEngineJava com.whispertflite D shapeSignature.length: 2
2024-04-29 09:50:13.730 24920-25027 WhisperEngineJava com.whispertflite D quantizationParams.getScale: 0.0
2024-04-29 09:50:13.730 24920-25027 WhisperEngineJava com.whispertflite D quantizationParams.getZeroPoint: 0
2024-04-29 09:50:13.730 24920-25027 WhisperEngineJava com.whispertflite D ==================================================================

failed Input Tensor Dump ===>
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape.length: 3
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[0]: 1
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[1]: 80
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[2]: 3000
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D dataType: FLOAT32
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D name: serving_default_input_ids:0
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D numBytes: 960000
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D index: 0
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D numDimensions: 3
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D numElements: 240000
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shapeSignature.length: 3
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D quantizationParams.getScale: 0.0
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D quantizationParams.getZeroPoint: 0
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D ==================================================================
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D Output Tensor Dump ===>
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape.length: 2
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[0]: 1
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[1]: 451
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D dataType: INT32
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D name: StatefulPartitionedCall:0
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D numBytes: 1804
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D index: 559
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D numDimensions: 2
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D numElements: 451
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D shapeSignature.length: 2
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D quantizationParams.getScale: 0.0
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D quantizationParams.getZeroPoint: 0
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D ==================================================================

The script for generating the failing tflite file is as follows:

import tensorflow as tf
from datasets import load_dataset
from transformers import WhisperProcessor, WhisperFeatureExtractor, TFWhisperForConditionalGeneration, WhisperTokenizer

whisperPath = "openai/whisper-tiny.en"
saved_model_dir = 'path/to/tf_whisper_saved'
tflite_model_path = 'path/to/whisper111.tflite'

feature_extractor = WhisperFeatureExtractor.from_pretrained(whisperPath)
tokenizer = WhisperTokenizer.from_pretrained(whisperPath, predict_timestamps=True)
processor = WhisperProcessor(feature_extractor, tokenizer)
model = TFWhisperForConditionalGeneration.from_pretrained(whisperPath, from_pt=True)

# Loading dataset
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation", trust_remote_code=True)

inputs = feature_extractor(
    ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="tf"
)
input_features = inputs.input_features

# Generating transcription
generated_ids = model.generate(input_features=input_features)
print(generated_ids)
transcription = processor.tokenizer.decode(generated_ids[0])
print(transcription)
model.save(saved_model_dir)

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,    # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the model
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)

class GenerateModel(tf.Module):
    def __init__(self, model):
        super(GenerateModel, self).__init__()
        self.model = model

    @tf.function(
        # shouldn't need static batch size, but throws exception without it (needs to be fixed)
        input_signature=[
            tf.TensorSpec((1, 80, 3000), tf.float32, name="input_ids"),
        ],
    )
    def serving(self, input_features):
        outputs = self.model.generate(
            input_features,
            max_new_tokens=450,  # change as needed
            return_dict_in_generate=True,
        )
        return {"sequences": outputs["sequences"]}

saved_model_dir = '/content/tf_whisper_saved'
tflite_model_path = 'whisper-tiny.en.tflite'
tflite_model_path = 'path/to/whisper222.tflite'
tflite_model_path = 'path/to/whisper_vi222.tflite'

generate_model = GenerateModel(model=model)
tf.saved_model.save(generate_model, saved_model_dir, signatures={"serving_default": generate_model.serving})

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,    # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the model
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)
