
whisper.rn


React Native binding of whisper.cpp.

whisper.cpp: High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model

Screenshots

iOS: Tested on iPhone 13 Pro Max (tiny.en, Core ML enabled, release mode + archive)
Android: Tested on Pixel 6 (tiny.en, armv8.2-a+fp16, release mode)

Installation

npm install whisper.rn

iOS

After installing, run npx pod-install.

If you want to use a medium or large model, it's recommended to enable the Extended Virtual Addressing capability in your iOS project.

Android

If ProGuard is enabled in your project, add the following rule to android/app/proguard-rules.pro:

# whisper.rn
-keep class com.rnwhisper.** { *; }

For builds on Apple Silicon Macs, it's recommended to set ndkVersion = "24.0.8215888" (or above) in your root project build configuration. Otherwise, please follow this troubleshooting issue.
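As a sketch, this setting usually lives in the root android/build.gradle; the exact block layout depends on your project template:

```gradle
// android/build.gradle (root project)
buildscript {
    ext {
        // ...other versions
        ndkVersion = "24.0.8215888" // recommended for Apple Silicon Macs
    }
}
```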

Expo

You will need to prebuild the project (e.g. with npx expo prebuild) before using this library. See the Expo guide for more details.

Add Microphone Permissions (Optional)

If you want to use real-time transcription, you need to add the microphone permission to your app.

iOS

Add these lines to ios/[YOUR_APP_NAME]/Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>This app requires microphone access in order to transcribe speech</string>

Note that the microphone is not supported on tvOS.

Android

Add the following line to android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

Tips & Tricks

The Tips & Tricks document is a collection of tips and tricks for using whisper.rn.

Usage

import { initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: 'file://.../ggml-tiny.en.bin',
})

const sampleFilePath = 'file://.../sample.wav'
const options = { language: 'en' }
const { stop, promise } = whisperContext.transcribe(sampleFilePath, options)

const { result } = await promise
// result: (The inference text result from audio file)

Real-time transcription:

const { stop, subscribe } = await whisperContext.transcribeRealtime(options)

subscribe(evt => {
  const { isCapturing, data, processTime, recordingTime } = evt
  console.log(
    `Realtime transcribing: ${isCapturing ? 'ON' : 'OFF'}\n` +
      // The inference text result from audio record:
      `Result: ${data.result}\n\n` +
      `Process time: ${processTime}ms\n` +
      `Recording time: ${recordingTime}ms`,
  )
  if (!isCapturing) console.log('Finished realtime transcribing')
})

On iOS, you may need to change the audio session so it can coexist with other audio playback, or to optimize recording quality. AudioSession utilities are provided for this:

Option 1 - Use options in transcribeRealtime:

import { AudioSessionIos } from 'whisper.rn'

const { stop, subscribe } = await whisperContext.transcribeRealtime({
  audioSessionOnStartIos: {
    category: AudioSessionIos.Category.PlayAndRecord,
    options: [AudioSessionIos.CategoryOption.MixWithOthers],
    mode: AudioSessionIos.Mode.Default,
  },
  audioSessionOnStopIos: 'restore', // Or an AudioSessionSettingIos
})

Option 2 - Manage the audio session anywhere:

import { AudioSessionIos } from 'whisper.rn'

await AudioSessionIos.setCategory(
  AudioSessionIos.Category.PlayAndRecord, [AudioSessionIos.CategoryOption.MixWithOthers],
)
await AudioSessionIos.setMode(AudioSessionIos.Mode.Default)
await AudioSessionIos.setActive(true)
// Then you can start recording

On Android, you may need to request the microphone permission via PermissionsAndroid.
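As a sketch, a runtime permission request with React Native's PermissionsAndroid API could look like this (the rationale strings are placeholders):

```typescript
import { PermissionsAndroid, Platform } from 'react-native'

// Request RECORD_AUDIO at runtime on Android; resolves to true when granted.
async function requestMicrophonePermission(): Promise<boolean> {
  if (Platform.OS !== 'android') return true
  const status = await PermissionsAndroid.request(
    PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
    {
      title: 'Microphone Permission',
      message: 'This app needs microphone access to transcribe speech.',
      buttonPositive: 'OK',
    },
  )
  return status === PermissionsAndroid.RESULTS.GRANTED
}
```

Call it (and check the result) before starting transcribeRealtime.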

Please visit the Documentation for more details.

Usage with assets

You can also use the model file / audio file from assets:

import { initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: require('../assets/ggml-tiny.en.bin'),
})

const { stop, promise } =
  whisperContext.transcribe(require('../assets/sample.wav'), options)

// ...

This requires editing the metro.config.js to support assets:

// ...
const defaultAssetExts = require('metro-config/src/defaults/defaults').assetExts

module.exports = {
  // ...
  resolver: {
    // ...
    assetExts: [
      ...defaultAssetExts,
      'bin', // whisper.rn: ggml model binary
      'mil', // whisper.rn: CoreML model asset
    ]
  },
}

Please note that:

  • Bundling model files will significantly increase the app size in release mode.
  • The RN packager does not allow files larger than 2 GB, so the original f16 large model (2.9 GB) cannot be bundled; use quantized models instead.

Core ML support

Platform: iOS 15.0+, tvOS 15.0+

To use Core ML on iOS, you will need to have the Core ML model files.

Which .mlmodelc model file is loaded depends on the ggml model file path. For example, if your ggml model path is ggml-tiny.en.bin, the Core ML model path will be ggml-tiny.en-encoder.mlmodelc. Please note that the ggml model is still needed as the decoder, and as an encoder fallback.
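The naming convention can be sketched as a tiny helper (illustrative only; the library resolves this path natively):

```javascript
// Derive the expected Core ML encoder path from a ggml model path.
function coreMLModelPath(ggmlPath) {
  return ggmlPath.replace(/\.bin$/, '-encoder.mlmodelc')
}

console.log(coreMLModelPath('ggml-tiny.en.bin'))
// -> 'ggml-tiny.en-encoder.mlmodelc'
```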

The Core ML models are hosted here: https://huggingface.co/ggerganov/whisper.cpp/tree/main

If you want to download the model at runtime and the hosted file is an archive, you will need to unzip it to get the .mlmodelc directory. You can use a library like react-native-zip-archive, or host the individual files and download them yourself.

The .mlmodelc is a directory; it usually includes 5 files (3 required):

[
  'model.mil',
  'coremldata.bin',
  'weights/weight.bin',
  // Not required:
  // 'metadata.json', 'analytics/coremldata.bin',
]

Or just use require to bundle that in your app, like the example app does, but this would increase the app size significantly.

const whisperContext = await initWhisper({
  filePath: require('../assets/ggml-tiny.en.bin'),
  coreMLModelAsset:
    Platform.OS === 'ios'
      ? {
          filename: 'ggml-tiny.en-encoder.mlmodelc',
          assets: [
            require('../assets/ggml-tiny.en-encoder.mlmodelc/weights/weight.bin'),
            require('../assets/ggml-tiny.en-encoder.mlmodelc/model.mil'),
            require('../assets/ggml-tiny.en-encoder.mlmodelc/coremldata.bin'),
          ],
        }
      : undefined,
})

In the real world, it's recommended to split the asset imports into a platform-specific file (e.g. context-opts.ios.js) to avoid bundling these unused files on Android.
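A hedged sketch of such a split, reusing the asset paths from the snippet above (the default export shape is an assumption; a sibling context-opts.js would export an empty object for other platforms):

```typescript
// context-opts.ios.ts - iOS-only asset imports, only bundled for iOS
export default {
  coreMLModelAsset: {
    filename: 'ggml-tiny.en-encoder.mlmodelc',
    assets: [
      require('../assets/ggml-tiny.en-encoder.mlmodelc/weights/weight.bin'),
      require('../assets/ggml-tiny.en-encoder.mlmodelc/model.mil'),
      require('../assets/ggml-tiny.en-encoder.mlmodelc/coremldata.bin'),
    ],
  },
}
```

Then spread it into the init call: await initWhisper({ filePath: require('../assets/ggml-tiny.en.bin'), ...contextOpts })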

Run with example

The example app provides a simple UI for testing the functions.

Used Whisper model: tiny.en from https://huggingface.co/ggerganov/whisper.cpp
Sample file: jfk.wav from https://github.com/ggerganov/whisper.cpp/tree/master/samples

Please follow the Development Workflow section of contributing guide to run the example app.

Mock whisper.rn

We have provided a mock version of whisper.rn for testing purposes, which you can use with Jest:

jest.mock('whisper.rn', () => require('whisper.rn/jest/mock'))

Contributing

See the contributing guide to learn how to contribute to the repository and the development workflow.

Troubleshooting

See the troubleshooting if you encounter any problem while using whisper.rn.

License

MIT


Made with create-react-native-library


Built and maintained by BRICKS.

