saik0s / whisperboard

The open-source iOS app that's making quality voice transcription more accessible on mobile devices.

License: GNU General Public License v3.0

Swift 99.01% Shell 0.45% Makefile 0.40% C 0.01% PHP 0.13%
openai ios speech-recognition speech-to-text swiftui transcription audio-to-text composable-architecture tca tuist

whisperboard's Introduction

WhisperBoard

Welcome to WhisperBoard, the open-source iOS app that's making quality voice transcription more accessible on mobile devices. Built with the power of OpenAI's Whisper model, WhisperBoard is your go-to tool for capturing thoughts, meetings, and conversations with unparalleled accuracy.



Twitter: @sa1k0s

🎙️ Features That Speak Volumes

  • Simplicity at Your Fingertips: Start recording with a single tap and play back your audio with ease.
  • Transcription Magic: Powered by OpenAI's Whisper, your audio is transcribed with cutting-edge technology.
  • Audio File Mastery: Import your existing audio files or export new ones for seamless sharing and editing.
  • Mic Check: Choose your preferred microphone to ensure the best sound quality for your recordings.
  • Model Flexibility: Browse and download any Whisper model directly from the app to tailor your transcription experience.

🚀 On the Horizon

  • Continue where you left off: With resumable transcription, you can pick up right where you left off, even if the app closes.
  • Instant text: Real-time transcription is on our roadmap, aiming to give you immediate results with smaller, faster models.

Installation

  1. Clone this repository
  2. Run make
  3. Open the project in Xcode

License

This project is licensed under the GPL-3.0 license.

The Poppins and Karla fonts used in the project are licensed under the SIL Open Font License.

Links

Buy Me A Coffee

whisperboard's People

Contributors

saik0s

whisperboard's Issues

Suggestions

Great app, I would pay for this! A couple of ideas:

  1. Allow for import of existing voice recordings from the native iPhone app.
  2. Ability to see the transcribing and ability to copy and paste from it while it's still recording.
  3. Allow for saving and importing files from both the native files app and dropbox.

Thanks for putting this together!

make didn't work

When I tried to build your project, I got the following error.

make
Resolving and fetching plugins.
Plugins resolved and fetched successfully.
Resolving and fetching dependencies.
Installing Swift Package Manager dependencies.
error: 'swiftpackagemanager': Invalid manifest
/Path_to/Whisperboard-main/Tuist/Dependencies/SwiftPackageManager/Package.swift:2:8: error: module 'PackageDescription' was created for incompatible target x86_64-apple-macosx10.14: /var/folders/y9/4jvdtw7517q5c3tvz0ttyj2r0000gn/C/clang/ModuleCache/PackageDescription-3QVF8C4290OM5.swiftmodule
import PackageDescription
^
The 'swift' command exited with error code 1
Consider creating an issue using the following link: https://github.com/tuist/tuist/issues/new/choose
make: *** [all] Error 1

My swift version is
Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51)
Target: arm64-apple-darwin22.3.0

How can I modify the settings to build your project?
Thanks in advance.

Is there a way to turn off the diagonal swaying of the saved files on Whisperboard?

Thanks for making such a wonderful free and open source Whisper based speech recognition app for iOS!

Is there a way to turn off the diagonal swaying of the saved files screen on Whisperboard? It is visually interesting but somewhat visually distracting for me to see the files I saved diagonally swaying back and forth and needing to select from the swaying files. Would be nice to have an option to somehow select another screen (or side panel) which statically shows the saved files.

transcription while app is running in background

This app is amazing! I can't believe you are giving it away for free!

I see that you have the following future feature planned:
Enable background transcription when the app is minimized, allowing users to perform other tasks while the transcription proceeds.

I'm simply filing this issue to vote +1 for this feature. With this feature, I think your app would fully replace the cloud-based transcription apps like Otter.ai. I love the privacy, portability, and simplicity of transcribing on my device --- but I need to be able to do other things in the background, especially when using one of the larger models.

Microphone background audio when switching to other apps

Hi, I was wondering whether you plan to implement the ability to keep transcribing / recording when switching to another app. For example, when you leave Voice Memos for another app, an orange bar at the top indicates that something is still recording / using your microphone. Is this something on your radar? Thanks!

Some suggestions!

Great app! I've tried a few, and yours is the most stable. For my use case of taking notes rapidly and often (about two dozen a day), your app worked quite well.
I also tried another free app, but its transcription deteriorated after about a dozen notes.

Here is my wish list:

  1. When copying or exporting, support a configurable option to include the timestamp of when the notes were recorded. (The timestamp tells me about my state of productivity.)
  2. The flow of user interaction might be simplified. For example:
    When launching the app, on the note list screen, touching the microphone icon should start recording directly (currently one has to touch a mic icon again on the recording screen).
    On the recording screen, when one touches the check icon to accept the recording, I hope that the transcription can start by default. Currently, one has to explicitly request the transcription.
  3. Batch export/deletion of multiple notes. The set of notes might be selected by time range or user manual selection.
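
The first wish (an optional timestamp on copy or export) could look roughly like the sketch below. This is only an illustration: the `Note` type, the `exportText` function, and the `includeTimestamp` flag are hypothetical names, not taken from the WhisperBoard source, and ISO 8601 in UTC is an assumed format.

```swift
import Foundation

// Hypothetical model of a recorded note; not taken from the WhisperBoard source.
struct Note {
  let text: String
  let recordedAt: Date
}

// Formats a note for copy/export, optionally prefixing the recording time.
func exportText(for note: Note, includeTimestamp: Bool) -> String {
  guard includeTimestamp else { return note.text }
  // ISO8601DateFormatter formats in UTC by default.
  let formatter = ISO8601DateFormatter()
  return "[\(formatter.string(from: note.recordedAt))] \(note.text)"
}

let note = Note(text: "Buy milk", recordedAt: Date(timeIntervalSince1970: 0))
print(exportText(for: note, includeTimestamp: true))
// prints "[1970-01-01T00:00:00Z] Buy milk"
```

A real implementation would probably format the time in the user's locale and time zone rather than UTC.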

Thanks again for a very helpful app!

Repeated words and phrases in the transcription

I am noticing that many of my transcriptions contain many repetitions of a word or phrase that the algorithm seems to get stuck on. Sometimes the repetitions are all in a row, other times the repetitions will "echo" throughout a long transcription.

In this example, the word "certainty" was only said once:
image

I have noticed this with both the "small" and "medium" models, I haven't tested all of them.
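
As a stopgap until the cause is found, back-to-back repetitions could be trimmed after transcription. The sketch below is a post-processing heuristic only, not a fix for the model and not code from the app. The function name and the `maxRun` parameter are made up for illustration, and `maxRun` is assumed to be at least 1. It handles only immediate runs of a single word, so repetitions that "echo" later in the text are left alone, and words separated by punctuation (for example "certainty, certainty") are not matched.

```swift
// Collapses runs of the same word (compared case-insensitively)
// down to at most `maxRun` consecutive copies.
func collapseRepeatedWords(in text: String, maxRun: Int = 2) -> String {
  var result: [Substring] = []
  var runLength = 0
  for word in text.split(separator: " ") {
    if let last = result.last, last.lowercased() == word.lowercased() {
      runLength += 1
    } else {
      runLength = 1
    }
    // Keep the word only while the current run is within the limit.
    if runLength <= maxRun {
      result.append(word)
    }
  }
  return result.joined(separator: " ")
}

print(collapseRepeatedWords(in: "certainty certainty certainty of it", maxRun: 1))
// prints "certainty of it"
```

Note that a low `maxRun` will also remove legitimate repeats such as "had had", so any such cleanup should be optional.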

Folder support

Feature Request

It would be nice to organize voice memos in different folders, just like the official Voice Memos app by Apple.

My use case

I record 10-20 voice notes daily, ranging from checklist items to random thoughts and journaling. It would be nice to sort them into different folders, just like the official Voice Memos app does.

Make is not running. "No such file or directory: '../.env'"

This is what I get when I run "make".

MacBook-Pro Whisperboard % make
[Errno 2] No such file or directory: '../.env'
Failed to read .env file. Using default values.
Resolving and fetching plugins.
Plugins resolved and fetched successfully.
Resolving and fetching dependencies.
Installing Swift Package Manager dependencies.
error: terminated(1): /usr/bin/xcrun --sdk macosx --show-sdk-platform-path output:
xcrun: error: unable to lookup item 'PlatformPath' from command line tools installation
xcrun: error: unable to lookup item 'PlatformPath' in SDK '/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk'

The 'swift' command exited with error code 1
Consider creating an issue using the following link: https://github.com/tuist/tuist/issues/new/choose
make: *** [project_file] Error 1

Transcription accuracy

Can you add an option to use the large model? Otherwise the transcription accuracy is not great.

`Tuist` config is missing?

Great work.

But I have a problem generating the project. Following the README.md, I ran the make command and only got the following in the terminal:

tuist fetch
make: tuist: No such file or directory
make: *** [all] Error 1

Is there something I missed?

Support for large-v2

Will you add the support for large-v2? Maybe on the new iPhones the large model can run without any issue?

Crash on Transcribe/Start Over (was: Crash on Open)

Igor @Saik0s , what is the best way to help you debug and/or collect debugging info?

iOS 17.2.1
iPhone 14 Pro (not max)
Whisperboard 1.11.6

  • I imported a long podcast MP3 from the Overcast app via the share sheet
  • I started transcribing using the large model
  • It crashed, but I wasn't looking at the phone at that time
  • Since then, even after a reboot and the iOS update to the .1 version, the app crashes on launch
  • I briefly see the blue screen with a ✔ checkmark and "iphone Microphone" as text, and then it exits

As I have a couple of finished voice recording transcripts in the app that I have not yet saved elsewhere, I do not want to uninstall it, as I would lose all the data.

What would be helpful here is a "Files" folder in the iOS Files app where the recordings and transcripts reside. That way I could access them without the app and also remove buggy files if needed.

Great project

Great project that lets users transcribe on the iPhone.

May I ask something about future work?

  1. Currently the app can only share the transcribed text with another app. Can it, or will it, also allow sharing the audio with another app?
  2. In the recording view, can it, or will it, transcribe the speech in real time, showing the text on the fly like this:

image

word level timestamps and diarization?

Great app, really fast on my iPhone!

I was wondering if it'd be possible to get word level timestamps and/or speaker diarization.

I noticed ggml whisper has introduced diarization recently, so perhaps it's possible in your app now?

Would be great for podcasts or meetings.

App crashes

I was playing around with this in Xcode and then realized it’s live in the App Store. Awesome!

My three initial findings are (and I can file separate issues if preferable, and provide screen recordings):

  1. The app crashes if I hit “transcribe” after recording something. When I re-open the app and hit transcribe, then it succeeds.
  2. The transcription doesn’t always seem to be accurate. For a simple two-sentence recording, I had to re-transcribe it three times before it picked up the first and last five seconds.
  3. There doesn’t currently seem to be any way to delete models that are not needed, but taking up space.

UI refresh bug report

This is mostly a UI-update bug report:

When downloading a model, the size is not immediately reflected in the “storage taken” size on the settings screen.

After using the “Delete Storage” button, downloaded models still appear to be installed. Tapping a model refreshes the UI and shows the correct model list.

Also, after deleting storage, the recording list retains deleted recording entries. Tapping on one opens the detail screen, but with a spinner at the bottom of the screen.

Suggestion: tapping on the settings gear icon, when in a settings sub-screen, should take you back to the main settings screen.

[Feature request] Ability to search transcriptions

Is your feature request related to a problem? Please describe.
Having a lot of transcriptions in WhisperBoard and then needing to find one specific one is cumbersome, without the ability to search the transcription title & body

Describe the solution you'd like
Ability to search transcriptions (title & body)

Additional context
Search text
in transcription Title

in transcription body
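
The requested behavior (matching the search text against both title and body) can be sketched as a simple case-insensitive filter. The `Transcription` type and its field names here are hypothetical and only illustrate the idea. How WhisperBoard actually stores transcriptions is not shown in this issue.

```swift
import Foundation

// Hypothetical transcription record; field names are illustrative.
struct Transcription {
  let title: String
  let body: String
}

// Returns the transcriptions whose title or body contains the query, ignoring case.
// An empty or whitespace-only query returns everything.
func search(_ transcriptions: [Transcription], for query: String) -> [Transcription] {
  let trimmed = query.trimmingCharacters(in: .whitespacesAndNewlines)
  guard !trimmed.isEmpty else { return transcriptions }
  return transcriptions.filter {
    $0.title.localizedCaseInsensitiveContains(trimmed)
      || $0.body.localizedCaseInsensitiveContains(trimmed)
  }
}

let items = [
  Transcription(title: "Standup", body: "Discussed the release"),
  Transcription(title: "Groceries", body: "Milk and eggs"),
]
print(search(items, for: "RELEASE").map(\.title))
// prints ["Standup"]
```

In a SwiftUI list, a filter like this would typically be driven by the text bound to a search field.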

Passing argument of non-sendable type 'NSItemProvider' outside of main actor-isolated context may introduce data races

func processInputItems(extensionContext: NSExtensionContext) async {
  do {
    try setupSharedContainer()

    let itemsAttachments = extensionContext.inputItems
      .compactMap { $0 as? NSExtensionItem }
      .compactMap(\.attachments)
      .flatMap { $0 }

    for itemProvider in itemsAttachments where itemProvider.hasItemConformingToTypeIdentifier(UTType.url.identifier) {
      let data = try await itemProvider.loadItem(forTypeIdentifier: UTType.url.identifier, options: nil)
      if let data = data as? URL {
        try await handleLoadedData(data)
      } else {
        throw ShareError.somethingWentWrong
      }
    }
    state = .success
  } catch {
    print("\(#filePath):\(#line)", error.localizedDescription)
    presentErrorAlert(for: ShareError.somethingWentWrong)
  }
}

At let data = try await itemProvider.loadItem(...), there is an error:

/Whisperboard/App/ShareExtension/ShareViewController.swift:237:30 Passing argument of non-sendable type 'NSItemProvider' outside of main actor-isolated context may introduce data races

iOS 15 support

Great app, I really need something like that to transcribe lectures on the go. The problem is, my phone only runs iOS 15.1 for reasons. Would you be open to lowering the minimum version to 15?

Old downloaded models (i.e. large-v2) use up space (no delete option) in v1.11.7

I just updated to 1.11.7

  • In Settings - Models it showed "tiny" as active
  • All large models had the "Download" button active
  • I downloaded large-v3
  • Looking in iOS Settings Whisperboard uses up approx 6.5GB, so the old (large-v2) is still on the system but I cannot delete it

Additionally the buttons for the models only have the states

  • Download
  • Active (no background color, i.e. the model is inactive)
  • Active (green background) => the model is active and being used

Missing:

  • When a model has been downloaded and is not being used, there should be a swipe-left action to delete it and/or a second button with a delete icon
  • Old models with a different naming scheme seem to be kept on the system. They cannot be selected or deleted at all
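
A cleanup of leftover model files could look roughly like the sketch below. It assumes, purely for illustration, that models sit as ".bin" files in a single flat directory (the ".bin" extension matches model names such as "ggml-small.bin" mentioned in other issues). The function name and parameters are hypothetical, and the app's real storage layout may differ.

```swift
import Foundation

// Deletes every ".bin" file in `directory` except the active model
// and returns the removed file names, sorted.
// The flat directory layout is an assumption for illustration.
func removeInactiveModels(in directory: URL, keeping activeFileName: String) throws -> [String] {
  let fileManager = FileManager.default
  let files = try fileManager.contentsOfDirectory(at: directory, includingPropertiesForKeys: nil)
  var removed: [String] = []
  for file in files where file.pathExtension == "bin" && file.lastPathComponent != activeFileName {
    try fileManager.removeItem(at: file)
    removed.append(file.lastPathComponent)
  }
  return removed.sorted()
}
```

Because the deletion is irreversible, an app would normally list the candidate files and ask for confirmation first. Enumerating the directory rather than a fixed list of known model names is what would also catch files left behind under an older naming scheme.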

Multilingual model translated into English

tested on an iPhone 8 with iOS 16.3.1

I chose the "ggml-small.bin" which I thought would be the multilingual model (like the one from Whisper, without the "en" that I had tested before). But when I spoke 3 test sentences in German, in its output, it was translated into English:

"Okay, that's just a test. But I don't know exactly how this transcription should work. Maybe it would be a potential possibility."

That is indeed a very good translation of what I said in German, but not the task it is supposed to do. :-)

M2 iPad

Did someone try this app on an iPad with m2? How does it perform?

app crashes when switching models

tested on an iPhone 8 with iOS 16.3.1

The first model I had (successfully) downloaded was "ggml-small.bin". After testing it (see my other issue: #7), I downloaded the model "ggml-tiny.bin", but now every time I try to select it in the settings, the app completely crashes (without giving any error message). It takes me back to the iOS home screen right away.

Haven't tried a complete re-install yet (which will most probably solve this).

KeyboardKit_KeyboardKit.bundle: No such file or directory

Hi,

Thanks for your great work. I cloned your code and followed the instructions. After make completed successfully, I opened Xcode and pressed Run. There is an error:

/Users/binchen/Library/Developer/Xcode/DerivedData/WhisperBoard-enonstfmvuacfxcgqzvzabfepqgv/Build/Products/Debug-iphonesimulator/KeyboardKit_KeyboardKit.bundle /Users/binchen/Library/Developer/Xcode/DerivedData/WhisperBoard-enonstfmvuacfxcgqzvzabfepqgv/Build/Products/Debug-iphonesimulator/KeyboardKit_KeyboardKit.bundle: No such file or directory

I checked that the 'make' process successfully downloaded the KeyboardKit git repository. How can I work around this?

Ability to suspend or cancel transcription

Transcription can take a long time and lots of resources. Would be nice to be able to pause or cancel.

Other ideas to consider:

  • Compare/view retranscription history
  • Add metadata showing which model was used for transcription.

Great app! Really appreciate the work you've put into it.
