localstt's Introduction

LocalSTT

[Català]

Note: For now this application is only a proof of concept

LocalSTT is an Android application that provides automatic speech recognition without needing an internet connection, since all processing is done locally on the phone.

This is possible thanks to:

  • a RecognitionService that uses the Vosk library
  • a RecognitionService that uses the Mozilla DeepSpeech library
  • an Activity that handles RECOGNIZE_SPEECH intents, among others

The code is currently a proof of concept and is heavily based on the following projects:

LocalSTT should work with most keyboards and applications that implement speech recognition through a RECOGNIZE_SPEECH intent or by directly using Android's SpeechRecognizer class. It has been tested successfully with the following applications on an Android 9 device:

You can download an APK that includes Vosk and DeepSpeech models for Catalan here.

[English]

Note: This application is just a proof of concept for now

LocalSTT is an Android application that provides automatic speech recognition services without needing an internet connection, since all processing is done locally on your phone.

This is possible thanks to:

  • a RecognitionService wrapping the Vosk library
  • a RecognitionService wrapping Mozilla's DeepSpeech library
  • an Activity that handles RECOGNIZE_SPEECH intents amongst others

The code is currently just a PoC strongly based on:

LocalSTT should work with all keyboards and applications implementing speech recognition through the RECOGNIZE_SPEECH intent or Android's SpeechRecognizer class. It has been successfully tested using the following applications on Android 9:

You can download a pre-built binary with Vosk and DeepSpeech models for Catalan here.

If you want to use the application with your own language, replace the models in app/src/main/assets/sync and rebuild the application.
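
Because the models are bundled into the APK, it can help to check how much each model folder weighs before rebuilding. The following is a minimal sketch in plain Java, not part of LocalSTT itself. It assumes each model lives in its own subfolder of app/src/main/assets/sync; the class name ModelSizes and its methods are hypothetical helpers introduced here for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of the LocalSTT sources.
final class ModelSizes {

    /** Total size in bytes of all regular files below dir (recursive). */
    static long sizeOf(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** Maps each immediate subfolder of syncDir to its total size in bytes. */
    static Map<String, Long> perModel(Path syncDir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> children = Files.list(syncDir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                // Files placed directly in syncDir are skipped on purpose.
                if (Files.isDirectory(child)) {
                    sizes.put(child.getFileName().toString(), sizeOf(child));
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the README; pass another path as the first argument.
        Path syncDir = Paths.get(args.length > 0 ? args[0] : "app/src/main/assets/sync");
        for (Map.Entry<String, Long> e : perModel(syncDir).entrySet()) {
            System.out.printf("%-40s %8.1f MB%n",
                    e.getKey(), e.getValue() / (1024.0 * 1024.0));
        }
    }
}
```

Run from the repository root, this prints one line per model folder with its size in megabytes. It throws an exception if the directory does not exist.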

Demo

LocalSTT in action

localstt's People

Contributors

ccoreilly, darthpleurotus

localstt's Issues

Just an appreciation

Thanks for your great work. I'm not a programmer, but I've been looking to replace Google services, even though I love their speech-to-text system. Thanks to you and the developers of Kõnele, I can use STT without depending on Google. At least the things I want to use it for work great. Thank you.

English models by default

Hi.

From what I've found, the default models are for Catalan. Have you considered adding English as a default? I believe it would be useful for a great many people (including me). In my opinion there could even be two or more separate APKs to download, for Catalan and English, depending on the models. That would be OK for me at least. Then you could even add this application to Google Play or F-Droid, I believe.

For now the application looks promising, but without English models it's not really useful for me. I may find the models and rebuild the application manually, but that's far more effort than downloading an APK (and in theory it's something every English-speaking user would have to do).

Thank you for your work anyway. I don't think there are many other free and open-source projects like this, if any. I appreciate it.

How to use other (English for example) models?

I downloaded the Vosk models from https://alphacephei.com/vosk/models (vosk-model-small-en-us-0.15, to be exact) and added them to /app/src/main/assets/sync. I also wanted to add DeepSpeech models, but I couldn't find a .scorer smaller than 900 MB, which seems like too much (the APK was about 1 GB). Are those models required? Where can I get a smaller scorer? I experimented for quite a while (using only Vosk models, using DeepSpeech once without a scorer, and so on), and the app I built kept crashing some time after starting. Do the folders need to have specific names, for example?

more than one language

Since the naming is hardcoded, I have to replace the Vosk model files inside the vosk-catala folder.
What if I need a second language?
English and another?

Error loading recogniser

When I try to open LocalSTT on my phone I get the following error: "Error loading recogniser". The app has permissions to record audio and access the file system.

The Android version is 10 with kernel 4.9.206-perf+. I don't have Google Play services enabled.

If you need any more information, let me know. Thanks for an awesome piece of work!

error loading recogniser

I can install the LocalSTT release (2020-12-03).
But when I start it, I get the error stated above. I posted the log here.
Here is the filtered content (LocalSTT only) from a new test.

My device: Pixel 2 with LineageOS for microG.
Android 11 (18.1-202111208-microG-walleye).

Is there any solution? Or can someone give me a hint on how to build my own APK?

Small scorer file

Hey! Very excited to see this. It seems to still build and function in 2024. I had to update the version of the Vosk dependency to 0.3.47 to get it to build. For DeepSpeech/English, I had to replace the model files with the ones from the DeepSpeech 0.8.2 release and add org.gradle.jvmargs=-Xmx16g to gradle.properties so that Gradle could build with the massive multilingual .scorer file. After that I changed the paths in app/src/main/java/cat/oreilly/localstt/DeepSpeechRecognitionService.java and it works. I did have to use NDK 20.1.5948944; it wouldn't work with the latest NDK.

I was wondering how you got the very small .scorer file for Catalan? Is there a script somewhere I could run to extract single languages from the big scorer file? Or did you generate your own? It looks like there might be ways to do that, but I haven't looked into it too closely yet. The scorer file from the DeepSpeech release seems to work with the little bits of Spanish I know, so I'm guessing it works with every language that DeepSpeech supports. No wonder it's so big! 😅

Thank you so much for publishing this. It's really helpful.
