sveinbjornt / hear
Command line speech recognition and transcription for macOS
Home Page: https://sveinbjorn.org/hear
License: BSD 3-Clause "New" or "Revised" License
On Monterey 12.6 (arm64), this crash occurs on launch, whether from a fresh build or from the bash install:
Process: hear [32653]
Path: /usr/local/bin/hear
Identifier: hear
Version: ???
Code Type: ARM-64 (Native)
Parent Process: Exited process [32651]
Responsible: Electron [11201]
User ID: 0
Date/Time: 2022-09-16 19:05:40.5243 -0400
OS Version: macOS 12.6 (21G115)
Report Version: 12
Anonymous UUID: 041C5E4F-1501-2D80-5C6B-36160B336907
Time Awake Since Boot: 23000 seconds
System Integrity Protection: enabled
Crashed Thread: 2 Dispatch queue: com.apple.root.default-qos
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: Namespace TCC, Code 0
This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an NSSpeechRecognitionUsageDescription key with a string value explaining to the user how the app uses this data.
hear -d -l 'en-GB' -i someM4A.m4a > aTranscription.txt
results in
Error: The operation couldn’t be completed. (kAFAssistantErrorDomain error 203.)
Running 12.4 (21F79), with en-GB selected for on-device recognition.
$ hear
[1] 25534 abort hear
With macOS 13, an automatic punctuation option was added to the language model.
It seems to be a matter of setting a single property, but I don't even know how to build the program properly, so I'll leave it up to you knowledgeable folks :)
https://developer.apple.com/documentation/speech/sfspeechrecognitionrequest/3930023-addspunctuation?changes=late_1_2
First things first: I love this tool!
It's working great for the English language. Maybe I'm using it wrong, but I would expect this to return German text (if German was spoken):
hear -l de-DE
Instead I'm getting: No file at path de-DE
I confirmed that this language is supported by executing hear -s
(the list contains de-DE).
Where am I supposed to save which file, exactly? Or how should I call hear -l?
I cannot for the life of me get the -l flag to work.
Running an Intel build on 12.4, any call with a language other than the default results in:
No file at path en-GB
or en-IE or sk-SK, etc. I assume it either can't find the correct path to a language pack, or the OS somehow needs to be told to download that specific language pack?
Dictation in other applications (e.g. TextEdit) works as expected in the language selected in System Preferences.
Is there something I'm missing about how to use other language profiles?
Hi, great utility! Any way to integrate this with ffmpeg to be able to auto-subtitle videos? Maybe there's a piping command that could make this work today, but I'm not able to figure that out.
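There's no built-in integration, but the piping idea can be sketched, assuming ffmpeg/ffprobe are installed and a single subtitle cue spanning the whole clip is acceptable (hear does not expose word timings, so real per-line cues would need the audio chunked first; "video.mp4" is a placeholder):

```shell
#!/bin/bash
# Sketch: extract the audio track with ffmpeg, transcribe it with hear, and
# wrap the text in a minimal one-cue SRT file. Not a polished subtitler.
set -euo pipefail

srt_time() {  # seconds -> HH:MM:SS,mmm (SRT timestamp format)
  local s=$1
  printf '%02d:%02d:%02d,000' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

video="${1:-video.mp4}"
if [ -f "$video" ]; then
  audio="$(mktemp).wav"
  # hear accepts audio files via -i; extract a mono 16 kHz WAV track for it.
  ffmpeg -y -loglevel error -i "$video" -vn -ac 1 -ar 16000 "$audio"
  dur=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 "$video" | cut -d. -f1)
  printf '1\n00:00:00,000 --> %s\n%s\n' "$(srt_time "$dur")" "$(hear -i "$audio")" > subs.srt
fi
```

The single cue simply shows the whole transcription for the video's duration; anything finer-grained would need the segmenting approach discussed elsewhere in these issues.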
Hi, I'm interested in using hear for Speech Recognition when I have a call via e.g. FaceTime, Microsoft Teams or Google Meet. Most of these apps already have a recognition feature, but mostly it's for English only.
Do you know if it's possible to use the audio output, instead of the audio input, as the source for hear?
I run hear with:
hear -l it-IT -d
It works perfectly.
Then I leave it running for some time without speaking.
When I speak again, the speech is no longer logged.
After a few seconds of speech, hear starts logging again.
It seems as if there's an automatic idle state after some time without speech.
I checked in Activity Monitor, and neither the hear nor the localspeechrecognition process shows App Nap as active.
Any idea?
Hi,
thanks for this great work. I'm not exactly writing an issue, rather a feature request. hear adds a carriage return after a certain timespan and then starts logging the speech on a new line. It would be useful to indicate in the output when a sentence is considered finished and the next word will be logged on a new line. This could be achieved by simply emitting the newline at the proper deadline, without waiting for a new word to be logged.
cheers
michele
It would be nice if one could choose from one of the available languages. Is that planned? :-)
bash-5.1$ uname -v
Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000
bash-5.1$ /usr/libexec/PlistBuddy -c "Print:ProductName" -c "Print:ProductVersion" -c "Print:ProductBuildVersion" /System/Library/CoreServices/SystemVersion.plist
macOS
12.6
21G115
bash-5.1$ git clone https://github.com/sveinbjornt/hear
bash-5.1$ cd hear
bash-5.1$ make
** BUILD SUCCEEDED **
bash-5.1$ cd products
bash-5.1$ ./hear
Abort trap: 6
Timed text format?
Hello, I'm hoping to use this tool as a way to have speech-to-text in the terminal and to then send the query as a JSON payload to the ChatGPT API.
The little Bash script looks like this:
The problem that I'm encountering is that the JSON is malformed, and ends up looking like this:
{ "model": "text-davinci-003", "prompt": "\u001b[2K\rTell\u001b[2K\rTell me a\u001b[2K\rTell me a joke\u001b[2K\rTell me a joke send", "max_tokens": "100", "temperature": "0.5" }
Do you know why this is the case, and how I might resolve it so that only the full final output of hear is used? From what I can tell by cat-ing the /tmp/voice.txt file, it does contain just the prompt. I'm confused about why the captured output includes these extra ANSI escape characters.
When I run 'hear' from Alacritty it does not ask for access to speech recognition and returns 'zsh: abort'.
Console.app:
ERROR: This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an NSSpeechRecognitionUsageDescription key with a string value explaining to the user how the app uses this data.
On the other hand, when I tried to run 'hear' using Terminal.app, it worked perfectly. Terminal asked for access to speech recognition and started outputting the transcribed text.
Apple documentation says that "You must include the NSSpeechRecognitionUsageDescription key in your app’s Info.plist file. If this key is not present, your app will crash when it attempts to request authorization or use the APIs of the Speech framework."(1) Because "This key is required if your app uses APIs that send user data to Apple’s speech recognition servers."(2)
I added to /Applications/Alacritty.app/Contents/Info.plist:
<key>NSSpeechRecognitionUsageDescription</key>
<string>An application in Alacritty would like to access speech recognition.</string>
Now 'hear' exits with:
Error: Speech recognition authorization not determined
Line 114 in e324756
Does anyone know how to solve this issue? Or have any suggestion?
Thanks!
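One hedged guess: editing an app bundle's Info.plist invalidates its code signature, and a bundle whose signature no longer matches can leave the TCC decision stuck at "not determined". A sketch that re-signs ad hoc and resets the stored decision so macOS prompts again (the `org.alacritty` bundle identifier is an assumption; verify it with `mdls` first):

```shell
# Sketch: after editing Info.plist the app's signature no longer matches,
# which can leave speech-recognition authorization stuck. Ad-hoc re-sign the
# bundle, then reset the TCC entry so the permission prompt is shown again.
# The bundle identifier used below is an assumption; confirm it first.
app="/Applications/Alacritty.app"
if [ -d "$app" ]; then
  mdls -name kMDItemCFBundleIdentifier "$app"  # confirm the bundle id
  codesign --force --deep --sign - "$app"      # ad-hoc re-sign the bundle
  tccutil reset SpeechRecognition org.alacritty
fi
```

This is a sketch of one plausible cause, not a confirmed fix; if the prompt still never appears, the stuck state may live in the system TCC database instead.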
I have a Mac mini (M1), macOS Monterey 12.5.1 (21G83).
After attempting to launch hear, I see the following:
./hear -s
ar-SA
ca-ES
cs-CZ
da-DK
de-AT
de-CH
de-DE
el-GR
en-AE
en-AU
en-CA
en-GB
en-ID
en-IE
en-IN
en-NZ
en-PH
en-SA
en-SG
en-US
en-ZA
es-419
es-CL
es-CO
es-ES
es-MX
es-US
fi-FI
fr-BE
fr-CA
fr-CH
fr-FR
he-IL
hi-IN
hi-IN-translit
hi-Latn
hr-HR
hu-HU
id-ID
it-CH
it-IT
ja-JP
ko-KR
ms-MY
nb-NO
nl-BE
nl-NL
pl-PL
pt-BR
pt-PT
ro-RO
ru-RU
sk-SK
sv-SE
th-TH
tr-TR
uk-UA
vi-VN
wuu-CN
yue-CN
zh-CN
zh-HK
zh-TW
But nevertheless, I cannot transcribe any audio; I get the following error:
./hear -d -i ./test.m4a
2023-03-09 19:20:01.767 hear[21203:2358806] Required assets are not available for Locale:en-US
Error: On-device recognition is not supported for en-US
Here are the settings of dictation (see the screenshot).
That would be great. Thanks for making this awesome CLI!
Hi, thanks for the work. Do I understand correctly that the -d flag only works with a language pack that is installed and loaded for Siri? I only got -d working with 'ru-RU' and 'en-US', but I need 'uk-UA'. Siri does not speak Ukrainian, so I can't get a transcription with -d. As an experiment I set Siri to French; the language pack downloaded, and 'fr-FR' then worked locally with -d.
Please check in the build_unsigned file
Whatever command I run, other than hear -h or hear -v, I get a zsh: abort response. For instance:
$ hear -m
zsh: abort hear -m
Is there something I am missing?
I wish hear could stop automatically, instead of staying active. Maybe when there is no more input for 3 seconds, or when reaching a user-defined text length?
In my simple game, the user has to say back the word.
say --voice="Samantha" -- "$word [[slnc 400]]"
response=$(hear --mode)
shopt -s nocasematch
if [[ "${response}" == "${word}" ]]; then
  echo "✅ $word"
else
  echo "🛑 wrong. ➡ $word"
  echo " - press enter for next word -"; read
fi
However, pressing CTRL-C to stop the listening stops the script instead.
Also for scripting, the escape/erase-line control characters get in the way. Plain text output when the program completes would be preferred. The --mode output looks like this:
^[[2K
Good^[[2K
^[[2K
Good morning
Workarounds:
response=$(timeout 4 hear --mode)
response=$(echo "$response" | sed 's/.*\[2K//g' | tr -d '\r\n')
hear -i some.mp3 > text.txt
or
hear -i some.mp3
... only outputs the same first 500 bytes of text
macOS Monterey version 12.4
Thoughts?
chris
Is there any way to take audio input from inside the computer (any application), maybe via a virtual audio driver? For now, it can only take the default microphone input. Thanks. @sveinbjornt @MrYakobo @adisidev
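Since hear reads from the default input device, capturing app or system audio needs a virtual loopback device. A sketch, assuming the BlackHole driver and the switchaudio-osx utility are installed (both available via Homebrew):

```shell
# Sketch: route system output into a virtual loopback device and make that
# same device the default input, so hear transcribes what other apps play.
# Assumes BlackHole and switchaudio-osx are installed:
#   brew install blackhole-2ch switchaudio-osx
dev="BlackHole 2ch"
if command -v SwitchAudioSource >/dev/null 2>&1; then
  SwitchAudioSource -t output -s "$dev"  # send app audio into BlackHole
  SwitchAudioSource -t input  -s "$dev"  # hear now reads from it
  hear -d
fi
```

Note that with output routed into BlackHole you won't hear the audio yourself; a Multi-Output Device created in Audio MIDI Setup can duplicate it to the speakers at the same time.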
First try on a MacBook Pro M1 works fine, but the same is not working on a desktop iMac...
Looks like a security issue. I first allowed apps from the App Store and identified developers,
but I still get an abort.
hear -h works fine.
Checking why with:
xattr -l /usr/local/bin/hear
gives: com.apple.quarantine: 0081;665dd081;Chrome;
I also tried removing the quarantine with xattr -d com.apple.quarantine /path/to/file, but it didn't help.
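One thing to try, as a guess rather than a confirmed fix: an unsigned (or quarantine-flagged) binary can be killed before it ever gets to request speech-recognition access, so clearing the attribute and ad-hoc signing the installed binary sometimes helps:

```shell
# Sketch: clear the quarantine attribute (ignoring errors if it is already
# gone) and ad-hoc sign the installed binary, then verify the signature.
bin="/usr/local/bin/hear"
if [ -f "$bin" ]; then
  xattr -d com.apple.quarantine "$bin" 2>/dev/null || true
  codesign --force --sign - "$bin"      # ad-hoc signature
  codesign --verify --verbose=2 "$bin"  # check that the signature is valid
fi
```

If the abort persists after this, the crash report in Console.app should say whether it is still a TCC/privacy kill or something else.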
Submitted for the record.
I've just had a frustrating few hours getting hear going on Monterey 12.5.1. It's a neat piece of work.
If you have the same issue, it is not sveinbjornt's problem; it is caused by Apple not supporting continuous dictation on older Intel-based Macs.
See: https://discussions.apple.com/thread/253318311
Please see macOS Monterey - New Features. The availability of this feature is limited to Macs with the M1 chip. Does your Mac mini have the M1?
Perhaps a comment could be added to the installation notes to avoid more frustrated users?
The limitation could be worked around by automatically breaking the input audio file into chunks of less than 30 (or is it 60?) seconds. A task for someone cleverer than me.
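The chunking idea above can be sketched with ffmpeg's segment muxer, assuming ffmpeg is installed (input.m4a is a placeholder; whether this sidesteps the dictation limit on Intel Macs is untested):

```shell
#!/bin/bash
# Sketch: split the input into 30-second chunks, transcribe each chunk with
# hear separately, and concatenate the text into one transcript.
set -euo pipefail
in="${1:-input.m4a}"
if [ -f "$in" ]; then
  dir=$(mktemp -d)
  # The segment muxer cuts numbered 30 s chunks without re-encoding.
  ffmpeg -loglevel error -i "$in" -f segment -segment_time 30 -c copy "$dir/chunk%03d.m4a"
  for chunk in "$dir"/chunk*.m4a; do
    hear -i "$chunk"
    echo
  done > transcript.txt
  rm -rf "$dir"
fi
```

Each chunk is a fresh hear invocation, so even if the recognizer stops after its continuous-dictation limit, no single run ever exceeds it; sentence boundaries that straddle a cut will be split, though.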
Using iTerm 2 with zsh, hear immediately aborts:
hear -d -i example.m4a > output.txt
zsh: abort hear -d -i example.m4a > output.txt
iTerm 2, Build 3.4.19
MacBook Pro, M1 Pro
Ventura 13.3.1
Using Terminal Version 2.13, this issue does not occur.
Thanks for awesome software!
It works in the terminal, but when I run the same script from Visual Studio Code it outputs nothing (no errors either, neither in the console nor in result.stderr).
I run it like: result = subprocess.run(["./hear", "-l", "en-US", "-d", "-p", "-i", "output.wav"], capture_output=True, text=True)
Visual Studio Code has access to the microphone, and that works (I checked).
Thx!