Giter Site home page Giter Site logo

swiftwhisper's Introduction

SwiftWhisper

The easiest way to use Whisper in Swift

Easily add transcription to your app or package. Powered by whisper.cpp.

Install

Swift Package Manager

Add SwiftWhisper as a dependency in your Package.swift file:

let package = Package(
  ...
  dependencies: [
    // Add the package to your dependencies
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", branch: "master"),
  ],
  ...
  targets: [
    // Add SwiftWhisper as a dependency on any target you want to use it in
    .target(name: "MyTarget",
            dependencies: [.byName(name: "SwiftWhisper")])
  ]
  ...
)

Xcode

Add https://github.com/exPHAT/SwiftWhisper.git in the "Swift Package Manager" tab.

Usage

API Documentation.

import SwiftWhisper

let whisper = Whisper(fromFileURL: /* Model file URL */)
let segments = try await whisper.transcribe(audioFrames: /* 16kHz PCM audio frames */)

print("Transcribed audio:", segments.map(\.text).joined())

Delegate methods

You can subscribe to segments, transcription progress, and errors by implementing WhisperDelegate and setting whisper.delegate = ...

protocol WhisperDelegate {
  // Progress updates as a percentage from 0-1
  func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double)

  // Any time a new segments of text have been transcribed
  func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int)
  
  // Finished transcribing, includes all transcribed segments of text
  func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment])

  // Error with transcription
  func whisper(_ aWhisper: Whisper, didErrorWith error: Error)
}

Misc

Downloading Models ๐Ÿ“ฅ

You can find the pre-trained models here for download.

CoreML Support ๐Ÿง 

To use CoreML, you'll need to include a CoreML model file with the suffix -encoder.mlmodelc under the same name as the whisper model (Example: tiny.bin would also sit beside a tiny-encoder.mlmodelc file). In addition to the additonal model file, you will also need to use the Whisper(fromFileURL:) initializer. You can verify CoreML is active by checking the console output during transcription.

Converting audio to 16kHz PCM ๐Ÿ”ง

The easiest way to get audio frames into SwiftWhisper is to use AudioKit. The following example takes an input audio file, converts and resamples it, and returns an array of 16kHz PCM floats.

import AudioKit

func convertAudioFileToPCMArray(fileURL: URL, completionHandler: @escaping (Result<[Float], Error>) -> Void) {
    var options = FormatConverter.Options()
    options.format = .wav
    options.sampleRate = 16000
    options.bitDepth = 16
    options.channels = 1
    options.isInterleaved = false

    let tempURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent(UUID().uuidString)
    let converter = FormatConverter(inputURL: fileURL, outputURL: tempURL, options: options)
    converter.start { error in
        if let error {
            completionHandler(.failure(error))
            return
        }

        let data = try! Data(contentsOf: tempURL) // Handle error here

        let floats = stride(from: 44, to: data.count, by: 2).map {
            return data[$0..<$0 + 2].withUnsafeBytes {
                let short = Int16(littleEndian: $0.load(as: Int16.self))
                return max(-1.0, min(Float(short) / 32767.0, 1.0))
            }
        }

        try? FileManager.default.removeItem(at: tempURL)

        completionHandler(.success(floats))
    }
}

Development speed boost ๐Ÿš€

You may find the performance of the transcription slow when compiling your app for the Debug build configuration. This is because the compiler doesn't fully optimize SwiftWhisper unless the build configuration is set to Release.

You can get around this by installing a version of SwiftWhisper that uses .unsafeFlags(["-O3"]) to force maximum optimization. The easiest way to do this is to use the latest commit on the fast branch. Alternatively, you can configure your scheme to build in the Release configuration.

  ...
  dependencies: [
    // Using latest commit hash for `fast` branch:
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", revision: "deb1cb6a27256c7b01f5d3d2e7dc1dcc330b5d01"),
  ],
  ...

swiftwhisper's People

Contributors

exphat avatar jonathanstorey avatar jordibruin avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.