
vilassn / whisper_android

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android

License: MIT License

Topics: asr, openai, texttospeech, tts, whisper, text-to-speech, speech-recognition, tensorflow, tflite, offline, tensorflowlite, android, automatic-speech-recognition, transcription, transcribe, embedded, mobile

whisper_android's Introduction

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite

This guide explains how to integrate the Whisper and Recorder classes into Android apps for audio recording and speech recognition.

Whisper ASR Integration Guide

Here are separate code snippets for using Whisper and Recorder:

Whisper (Speech Recognition)

Initialization and Configuration:

// Initialize Whisper
Whisper mWhisper = new Whisper(this); // Create Whisper instance

// Load model and vocabulary for Whisper
String modelPath = getFilePath("whisper-tiny.tflite"); // Provide model file path
String vocabPath = getFilePath("filters_vocab_multilingual.bin"); // Provide vocabulary file path
mWhisper.loadModel(modelPath, vocabPath, true); // Load model and set multilingual mode

// Set a listener for Whisper to handle updates and results
mWhisper.setListener(new IWhisperListener() {
    @Override
    public void onUpdateReceived(String message) {
        // Handle Whisper status updates
    }

    @Override
    public void onResultReceived(String result) {
        // Handle transcribed results
    }
});
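These snippets call a getFilePath() helper that resolves a file name to an absolute path on device storage. That helper is not part of the Whisper class itself; a minimal sketch, assuming the model and vocabulary files ship as app assets and are copied to the app's files directory on first use, could look like this:

// Hypothetical helper: copies a bundled asset to the app's files directory
// on first use and returns its absolute path.
// Requires: java.io.*, android.util.Log; call from an Activity or other Context.
private String getFilePath(String assetName) {
    File outFile = new File(getFilesDir(), assetName);
    if (!outFile.exists()) {
        try (InputStream in = getAssets().open(assetName);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        } catch (IOException e) {
            Log.e("Whisper", "Failed to copy asset " + assetName, e);
        }
    }
    return outFile.getAbsolutePath();
}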

Transcription:

// Set the audio file path for transcription. The audio should be 16 kHz, mono, 16-bit PCM WAV
String waveFilePath = getFilePath("your_audio_file.wav"); // Provide audio file path
mWhisper.setFilePath(waveFilePath); // Set audio file path

// Start transcription
mWhisper.setAction(Whisper.ACTION_TRANSCRIBE); // Set action to transcription
mWhisper.start(); // Start transcription

// Perform other operations
// Add your additional code here

// Stop transcription
mWhisper.stop(); // Stop transcription
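Note that transcription runs on a background thread, so calling stop() immediately after start() would interrupt it before any result arrives; in practice you wait for onResultReceived(). A minimal sketch using a CountDownLatch (the latch and timeout are our addition, not part of the library; don't call await() on the UI thread):

// Wait for a single transcription result before proceeding.
// Requires: java.util.concurrent.CountDownLatch, java.util.concurrent.TimeUnit
final CountDownLatch done = new CountDownLatch(1);
final String[] transcript = new String[1];

mWhisper.setListener(new IWhisperListener() {
    @Override
    public void onUpdateReceived(String message) {
        // Handle Whisper status updates
    }

    @Override
    public void onResultReceived(String result) {
        transcript[0] = result;
        done.countDown(); // signal that transcription finished
    }
});

mWhisper.setAction(Whisper.ACTION_TRANSCRIBE);
mWhisper.start();
try {
    done.await(60, TimeUnit.SECONDS); // give up after a minute
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}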

Recorder (Audio Recording)

Initialization and Configuration:

// Initialize Recorder
Recorder mRecorder = new Recorder(this); // Create Recorder instance

// Set a listener for Recorder to handle updates and audio data
mRecorder.setListener(new IRecorderListener() {
    @Override
    public void onUpdateReceived(String message) {
        // Handle Recorder status updates
    }

    @Override
    public void onDataReceived(float[] samples) {
        // Handle audio data received during recording
        // You can forward this data to Whisper for live recognition using writeBuffer()
        // mWhisper.writeBuffer(samples);
    }
});

Recording:

// Check and request recording permissions
checkRecordPermission(); // Check and request recording permissions

// Set the audio file path for recording. It records audio in 16 kHz, mono, 16-bit format
String waveFilePath = getFilePath("your_audio_file.wav"); // Provide audio file path
mRecorder.setFilePath(waveFilePath); // Set audio file path

// Start recording
mRecorder.start(); // Start recording

// Perform other operations
// Add your additional code here

// Stop recording
mRecorder.stop(); // Stop recording
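The checkRecordPermission() helper is not shown in this guide. A minimal sketch using the standard Android runtime-permission APIs (the request code is arbitrary, and RECORD_AUDIO must also be declared in AndroidManifest.xml):

// Minimal runtime-permission check for RECORD_AUDIO.
// Requires: android.Manifest, android.content.pm.PackageManager,
// androidx.core.app.ActivityCompat, androidx.core.content.ContextCompat
private static final int REQUEST_RECORD_AUDIO = 1; // arbitrary request code

private void checkRecordPermission() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.RECORD_AUDIO},
                REQUEST_RECORD_AUDIO);
    }
}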

Please adapt these code snippets to your specific use case, provide the correct file paths, and handle exceptions appropriately in your application.

Note: Ensure that you have the necessary permissions, error handling, and file path management in your application when using the Recorder class.

Demo Video

(Demo video available in the original repository.)

Important Note

Whisper ASR is a powerful tool for transcribing speech into text. However, keep in mind that handling audio data and transcriptions may require careful synchronization and error handling in your Android application to ensure a smooth user experience.
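For example, the listener callbacks typically fire on the worker thread that runs recognition, so any view updates should be posted back to the main thread. A minimal sketch (resultTextView is a hypothetical field in your Activity):

@Override
public void onResultReceived(String result) {
    // Callbacks may arrive on a background thread; hop to the
    // main thread before touching any views.
    runOnUiThread(() -> resultTextView.setText(result));
}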

Enjoy using the Whisper ASR Android app to enhance your speech recognition capabilities!

whisper_android's People

Contributors

vilassn

whisper_android's Issues

Getting CMake exception during Gradle sync/build

I cloned the repository and started the Gradle sync, but got the following exception. Can anyone help with that?

1: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':app:configureCMakeDebug[arm64-v8a]'.
> [CXX1429] error when building with cmake using C:\Users\15010\AndroidStudioProjects\whisper_android\app\src\main\cpp\CMakeLists.txt: C++ build system [configure] failed while executing:
      @echo off
      "C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\cmake\\3.22.1\\bin\\cmake.exe" ^
        "-HC:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\src\\main\\cpp" ^
        "-DCMAKE_SYSTEM_NAME=Android" ^
        "-DCMAKE_EXPORT_COMPILE_COMMANDS=ON" ^
        "-DCMAKE_SYSTEM_VERSION=26" ^
        "-DANDROID_PLATFORM=android-26" ^
        "-DANDROID_ABI=arm64-v8a" ^
        "-DCMAKE_ANDROID_ARCH_ABI=arm64-v8a" ^
        "-DANDROID_NDK=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620" ^
        "-DCMAKE_ANDROID_NDK=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620" ^
        "-DCMAKE_TOOLCHAIN_FILE=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\ndk\\23.1.7779620\\build\\cmake\\android.toolchain.cmake" ^
        "-DCMAKE_MAKE_PROGRAM=C:\\Users\\15010\\AppData\\Local\\Android\\Sdk\\cmake\\3.22.1\\bin\\ninja.exe" ^
        "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=C:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\build\\intermediates\\cxx\\Debug\\6v4z4y72\\obj\\arm64-v8a" ^
        "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=C:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\build\\intermediates\\cxx\\Debug\\6v4z4y72\\obj\\arm64-v8a" ^
        "-DCMAKE_BUILD_TYPE=Debug" ^
        "-BC:\\Users\\15010\\AndroidStudioProjects\\whisper_android\\app\\.cxx\\Debug\\6v4z4y72\\arm64-v8a" ^
        -GNinja
    from C:\Users\15010\AndroidStudioProjects\whisper_android\app

Realtime use possible?

There are some Whisper realtime libraries out there.
Is there any way to make this library work in real time?

I have an issue with real-time transcription: when I am not talking, it seems to parse random text.

I was able to set up the model and it works really great. My code is:

private fun testAudio() {
    // Initialize Whisper
    val mWhisper = Whisper(this) // Create Whisper instance

    // Load model and vocabulary for Whisper
    val basePath = Global.fileOperations.getOutputDirectory("/Models", this)!!.path
    val modelPath = basePath + "/whisper-tiny.tflite" // Provide model file path
    val vocabPath = basePath + "/filters_vocab_multilingual.bin" // Provide vocabulary file path
    println("PATHS: ")
    println(modelPath)
    println(vocabPath)
    mWhisper.loadModel(modelPath, vocabPath, true) // Load model and set multilingual mode

    // Set a listener for Whisper to handle updates and results
    mWhisper.setListener(object : IWhisperListener {
        override fun onUpdateReceived(message: String?) {
            Log.i("TRANSCRIBE_WHISPER", "New State: $message")
            // Handle Whisper status updates
        }

        override fun onResultReceived(result: String?) {
            Log.i("TRANSCRIBE_WHISPER", result ?: "")
            // Handle transcribed results
        }
    })

    // Initialize Recorder
    val mRecorder = Recorder(this) // Create Recorder instance

    // Set a listener for Recorder to handle updates and audio data
    mRecorder.setListener(object : IRecorderListener {
        override fun onUpdateReceived(message: String) {
            // Handle Recorder status updates
        }

        override fun onDataReceived(samples: FloatArray) {
            // Handle audio data received during recording
            // You can forward this data to Whisper for live recognition using writeBuffer()
            mWhisper.writeBuffer(samples)
        }
    })

    mRecorder.start() // Start recording
}

and this onResultReceived override:

override fun onResultReceived(result: String?) {
    Log.i("TRANSCRIBE_WHISPER", result ?: "")
    // Handle transcribed results
}
seemed to return:

[audioRecordData][fine] 5s(f:5014 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1

I'll make a hole in the hole.
Two times this:

[audioRecordData][fine] 10s(f:10000 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1
then
I'll be back with a little .... <== repeated a lot

Thanks for your hard work :P
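A common mitigation for this (not part of this repository) is to gate the stream on signal energy so that near-silent buffers are never forwarded to the recognizer, since Whisper tends to hallucinate text on silence. A minimal RMS-threshold sketch in Java, mirroring the README's listener; the threshold value is an assumption and needs tuning per device and environment:

// Hypothetical RMS gate: forward audio to Whisper only when the buffer's
// energy exceeds a silence threshold.
private static final float SILENCE_RMS_THRESHOLD = 0.01f; // assumed value, tune as needed

@Override
public void onDataReceived(float[] samples) {
    double sumSquares = 0.0;
    for (float s : samples) {
        sumSquares += s * s;
    }
    double rms = Math.sqrt(sumSquares / samples.length);
    if (rms > SILENCE_RMS_THRESHOLD) {
        mWhisper.writeBuffer(samples); // drop near-silent buffers
    }
}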

Get timestamps at the segment or word level

Thanks for the port.

Can this output a transcript of the provided audio with timestamps at the segment level, word level, or both? I'm trying to transcribe audio files for dubbing, and I need timestamp precision for WAV file transcripts. Basically, the start and end times for words or text segments.

OpenAI provides an API for this through the [timestamp_granularities[] parameter](https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-timestamp_granularities).

Can you add this feature?

Not working on virtual devices

I have tried the project as-is on two virtual devices (Android 14 and 12) and one physical device (Android 12). It does not seem to run on virtual devices; you might want to mention this in the README.md.

IllegalArgumentException: Internal error: Failed to run on the given Interpreter

2024-04-29 09:48:50.970 24207-24753 Whisper com.whispertflite E Error...
java.lang.IllegalArgumentException: Internal error: Failed to run on the given Interpreter: tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T, U>, args) was not true.
tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T, U>, args) was not true.
tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T, U>, args) was not true.
tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T, U>, args) was not true.
tensorflow/lite/kernels/reduce.cc:390 std::apply(optimized_ops::Mean<T,
at org.tensorflow.lite.NativeInterpreterWrapper.run(Native Method)
at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:247)
at org.tensorflow.lite.InterpreterImpl.runForMultipleInputsOutputs(InterpreterImpl.java:107)
at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:80)
at org.tensorflow.lite.InterpreterImpl.run(InterpreterImpl.java:100)
at org.tensorflow.lite.Interpreter.run(Interpreter.java:80)
at com.whispertflite.engine.WhisperEngine.runInference(WhisperEngine.java:147)
at com.whispertflite.engine.WhisperEngine.transcribeFile(WhisperEngine.java:74)
at com.whispertflite.asr.Whisper.threadFunction(Whisper.java:129)
at com.whispertflite.asr.Whisper.lambda$start$0$com-whispertflite-asr-Whisper(Whisper.java:76)
at com.whispertflite.asr.Whisper$$ExternalSyntheticLambda0.run(Unknown Source:2)
at java.lang.Thread.run(Thread.java:930)

The only difference between success and failure is the tflite file. Here are their tensor dumps; note the output shape: 448 tokens for the working model vs. 451 for the failing one:
success Input Tensor Dump ===>
2024-04-29 09:50:13.724 24920-25027 WhisperEngineJava com.whispertflite D shape.length: 3
2024-04-29 09:50:13.725 24920-25027 WhisperEngineJava com.whispertflite D shape[0]: 1
2024-04-29 09:50:13.725 24920-25027 WhisperEngineJava com.whispertflite D shape[1]: 80
2024-04-29 09:50:13.725 24920-25027 WhisperEngineJava com.whispertflite D shape[2]: 3000
2024-04-29 09:50:13.725 24920-25027 WhisperEngineJava com.whispertflite D dataType: FLOAT32
2024-04-29 09:50:13.726 24920-25027 WhisperEngineJava com.whispertflite D name: serving_default_input_ids:0
2024-04-29 09:50:13.726 24920-25027 WhisperEngineJava com.whispertflite D numBytes: 960000
2024-04-29 09:50:13.726 24920-25027 WhisperEngineJava com.whispertflite D index: 0
2024-04-29 09:50:13.726 24920-25027 WhisperEngineJava com.whispertflite D numDimensions: 3
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D numElements: 240000
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D shapeSignature.length: 3
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D quantizationParams.getScale: 0.0
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D quantizationParams.getZeroPoint: 0
2024-04-29 09:50:13.727 24920-25027 WhisperEngineJava com.whispertflite D ==================================================================
2024-04-29 09:50:13.728 24920-25027 WhisperEngineJava com.whispertflite D Output Tensor Dump ===>
2024-04-29 09:50:13.728 24920-25027 WhisperEngineJava com.whispertflite D shape.length: 2
2024-04-29 09:50:13.728 24920-25027 WhisperEngineJava com.whispertflite D shape[0]: 1
2024-04-29 09:50:13.728 24920-25027 WhisperEngineJava com.whispertflite D shape[1]: 448
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D dataType: INT32
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D name: StatefulPartitionedCall:0
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D numBytes: 1792
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D index: 1047
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D numDimensions: 2
2024-04-29 09:50:13.729 24920-25027 WhisperEngineJava com.whispertflite D numElements: 448
2024-04-29 09:50:13.730 24920-25027 WhisperEngineJava com.whispertflite D shapeSignature.length: 2
2024-04-29 09:50:13.730 24920-25027 WhisperEngineJava com.whispertflite D quantizationParams.getScale: 0.0
2024-04-29 09:50:13.730 24920-25027 WhisperEngineJava com.whispertflite D quantizationParams.getZeroPoint: 0
2024-04-29 09:50:13.730 24920-25027 WhisperEngineJava com.whispertflite D ==================================================================

failed Input Tensor Dump ===>
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape.length: 3
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[0]: 1
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[1]: 80
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[2]: 3000
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D dataType: FLOAT32
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D name: serving_default_input_ids:0
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D numBytes: 960000
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D index: 0
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D numDimensions: 3
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D numElements: 240000
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shapeSignature.length: 3
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D quantizationParams.getScale: 0.0
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D quantizationParams.getZeroPoint: 0
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D ==================================================================
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D Output Tensor Dump ===>
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape.length: 2
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[0]: 1
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D shape[1]: 451
2024-04-29 09:48:45.685 24207-24753 WhisperEngineJava com.whispertflite D dataType: INT32
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D name: StatefulPartitionedCall:0
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D numBytes: 1804
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D index: 559
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D numDimensions: 2
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D numElements: 451
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D shapeSignature.length: 2
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D quantizationParams.getScale: 0.0
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D quantizationParams.getZeroPoint: 0
2024-04-29 09:48:45.686 24207-24753 WhisperEngineJava com.whispertflite D ==================================================================

The script for generating the failing tflite file is as follows:

import tensorflow as tf
from datasets import load_dataset
from transformers import WhisperProcessor, WhisperFeatureExtractor, TFWhisperForConditionalGeneration, WhisperTokenizer

whisperPath = "openai/whisper-tiny.en"
saved_model_dir = 'path/to/tf_whisper_saved'
tflite_model_path = 'path/to/whisper111.tflite'

feature_extractor = WhisperFeatureExtractor.from_pretrained(whisperPath)
tokenizer = WhisperTokenizer.from_pretrained(whisperPath, predict_timestamps=True)
processor = WhisperProcessor(feature_extractor, tokenizer)
model = TFWhisperForConditionalGeneration.from_pretrained(whisperPath, from_pt=True)

# Loading dataset
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation", trust_remote_code=True)

inputs = feature_extractor(
    ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="tf"
)
input_features = inputs.input_features

# Generating transcription
generated_ids = model.generate(input_features=input_features)
print(generated_ids)
transcription = processor.tokenizer.decode(generated_ids[0])
print(transcription)
model.save(saved_model_dir)

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,    # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the model
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)

class GenerateModel(tf.Module):
    def __init__(self, model):
        super(GenerateModel, self).__init__()
        self.model = model

    @tf.function(
        # shouldn't need static batch size, but throws exception without it (needs to be fixed)
        input_signature=[
            tf.TensorSpec((1, 80, 3000), tf.float32, name="input_ids"),
        ],
    )
    def serving(self, input_features):
        outputs = self.model.generate(
            input_features,
            max_new_tokens=450,  # change as needed
            return_dict_in_generate=True,
        )
        return {"sequences": outputs["sequences"]}

saved_model_dir = '/content/tf_whisper_saved'
tflite_model_path = 'whisper-tiny.en.tflite'
tflite_model_path = 'path/to/whisper222.tflite'
tflite_model_path = 'path/to/whisper_vi222.tflite'

generate_model = GenerateModel(model=model)
tf.saved_model.save(generate_model, saved_model_dir, signatures={"serving_default": generate_model.serving})

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,    # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the model
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)
