sveinbjornt / hear
Command line speech recognition and transcription for macOS
Home Page: https://sveinbjorn.org/hear
License: BSD 3-Clause "New" or "Revised" License
On Monterey 12.6 (arm64), this crash occurs on launch, whether from a fresh build or from the bash install:
Process: hear [32653]
Path: /usr/local/bin/hear
Identifier: hear
Version: ???
Code Type: ARM-64 (Native)
Parent Process: Exited process [32651]
Responsible: Electron [11201]
User ID: 0
Date/Time: 2022-09-16 19:05:40.5243 -0400
OS Version: macOS 12.6 (21G115)
Report Version: 12
Anonymous UUID: 041C5E4F-1501-2D80-5C6B-36160B336907
Time Awake Since Boot: 23000 seconds
System Integrity Protection: enabled
Crashed Thread: 2 Dispatch queue: com.apple.root.default-qos
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: Namespace TCC, Code 0
This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an NSSpeechRecognitionUsageDescription key with a string value explaining to the user how the app uses this data.
hear -d -l 'en-GB' -i someM4A.m4a > aTranscription.txt
results in
Error: The operation couldn’t be completed. (kAFAssistantErrorDomain error 203.)
Running 12.4 (21F79), with en-GB selected for on-device recognition.
$ hear
[1] 25534 abort hear
With macOS 13, an automatic punctuation option was added to the language model.
It seems to be a matter of setting a single property, but I don't even know how to build the program properly, so I'll leave it up to you knowledgeable folks :)
https://developer.apple.com/documentation/speech/sfspeechrecognitionrequest/3930023-addspunctuation?changes=late_1_2
First things first: I love this tool!
It's working great for the English language. Maybe I'm using it wrong, but I would expect this to return German text (if German was spoken):
hear -l de-DE
Instead I'm getting: No file at path de-DE
I confirmed that this language is supported by executing hear -s
(the list contains de-DE).
Where am I supposed to save which file, exactly? Or how should I call hear -l?
I cannot for the life of me get the -l flag to work.
Running an Intel build on 12.4, any call with a language other than the default results in:
No file at path en-GB
or en-IE or sk-SK, etc. I assume it either can't find the correct path to a language pack, or the OS somehow needs to be told to download that specific language pack?
Dictation in other applications (e.g. TextEdit) works as expected in the language selected in System Preferences.
Is there something I'm missing about how to use other language profiles?
Hi, great utility! Any way to integrate this with ffmpeg to be able to auto-subtitle videos? Maybe there's a piping command that could make this work today, but I'm not able to figure that out.
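There's no built-in integration, but the piping idea can be sketched, assuming ffmpeg/ffprobe are installed and a single subtitle cue spanning the whole clip is acceptable (hear does not expose word timings, so real per-line cues would need the audio chunked first; "video.mp4" is a placeholder):

```shell
#!/bin/bash
# Sketch: extract the audio track with ffmpeg, transcribe it with hear, and
# wrap the text in a minimal one-cue SRT file. Not a polished subtitler.
set -euo pipefail

srt_time() {  # seconds -> HH:MM:SS,mmm (SRT timestamp format)
  local s=$1
  printf '%02d:%02d:%02d,000' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

video="${1:-video.mp4}"
if [ -f "$video" ]; then
  audio="$(mktemp).wav"
  # hear accepts audio files via -i; extract a mono 16 kHz WAV track for it.
  ffmpeg -y -loglevel error -i "$video" -vn -ac 1 -ar 16000 "$audio"
  dur=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 "$video" | cut -d. -f1)
  printf '1\n00:00:00,000 --> %s\n%s\n' "$(srt_time "$dur")" "$(hear -i "$audio")" > subs.srt
fi
```

The single cue simply shows the whole transcription for the video's duration; anything finer-grained would need the segmenting approach discussed elsewhere in these issues.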
Hi, I'm interested in using hear for Speech Recognition when I have a call via e.g. FaceTime, Microsoft Teams or Google Meet. Most of these apps already have a recognition feature, but mostly it's for English only.
Do you know if it's possible to use the audio output, instead of the audio input, as the source for hear?
I run hear with:
hear -l it-IT -d
It works perfectly.
Then I leave it running for some time without speaking.
When I speak again, the speech is no longer logged.
After a few seconds of speech, hear starts logging again.
It seems as if there's an automatic idle state after some time without speech.
I checked in Activity Monitor, and neither the hear nor the localspeechrecognition process shows App Nap as active.
Any idea?
Hi,
thanks for this great work. I'm not exactly writing an issue, rather a feature request. hear adds a carriage return after a certain timespan and then starts logging the speech on a new line. It would be useful to indicate in the output when a sentence is considered finished and the next word will be logged on a new line. This could be achieved by simply emitting the newline at the proper deadline, without waiting for a new word to be logged.
cheers
michele
It would be nice if one could choose from one of the available languages. Is that planned? :-)
bash-5.1$ uname -v
Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000
bash-5.1$ /usr/libexec/PlistBuddy -c "Print:ProductName" -c "Print:ProductVersion" -c "Print:ProductBuildVersion" /System/Library/CoreServices/SystemVersion.plist
macOS
12.6
21G115
bash-5.1$ git clone https://github.com/sveinbjornt/hear
bash-5.1$ cd hear
bash-5.1$ make
** BUILD SUCCEEDED **
bash-5.1$ cd products
bash-5.1$ ./hear
Abort trap: 6
Timed text format?
Hello, I'm hoping to use this tool as a way to have speech-to-text in the terminal and to then send the query as a JSON payload to the ChatGPT API.
The little Bash script looks like this:
The problem that I'm encountering is that the JSON is malformed, and ends up looking like this:
{ "model": "text-davinci-003", "prompt": "\u001b[2K\rTell\u001b[2K\rTell me a\u001b[2K\rTell me a joke\u001b[2K\rTell me a joke send", "max_tokens": "100", "temperature": "0.5" }
Do you know why this is the case, and how I might resolve it so that only the full final output of hear is used? From what I can tell by cat-ing the /tmp/voice.txt file, it does contain just the prompt. I'm confused about why the captured output includes these extra ANSI escape characters.
When I run 'hear' from Alacritty it does not ask for access to speech recognition and returns 'zsh: abort'.
Console.app:
ERROR: This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an NSSpeechRecognitionUsageDescription key with a string value explaining to the user how the app uses this data.
On the other hand, when I tried to run 'hear' using Terminal.app, it worked perfectly. Terminal asked for access to speech recognition and started outputting the transcribed text.
Apple documentation says that "You must include the NSSpeechRecognitionUsageDescription key in your app’s Info.plist file. If this key is not present, your app will crash when it attempts to request authorization or use the APIs of the Speech framework."(1) Because "This key is required if your app uses APIs that send user data to Apple’s speech recognition servers."(2)
I added to /Applications/Alacritty.app/Contents/Info.plist:
<key>NSSpeechRecognitionUsageDescription</key>
<string>An application in Alacritty would like to access speech recognition.</string>
Now 'hear' exits with:
Error: Speech recognition authorization not determined
Line 114 in e324756
Does anyone know how to solve this issue? Or have any suggestion?
Thanks!
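One hedged guess: editing an app bundle's Info.plist invalidates its code signature, and a bundle whose signature no longer matches can leave the TCC decision stuck at "not determined". A sketch that re-signs ad hoc and resets the stored decision so macOS prompts again (the `org.alacritty` bundle identifier is an assumption; verify it with `mdls` first):

```shell
# Sketch: after editing Info.plist the app's signature no longer matches,
# which can leave speech-recognition authorization stuck. Ad-hoc re-sign the
# bundle, then reset the TCC entry so the permission prompt is shown again.
# The bundle identifier used below is an assumption; confirm it first.
app="/Applications/Alacritty.app"
if [ -d "$app" ]; then
  mdls -name kMDItemCFBundleIdentifier "$app"  # confirm the bundle id
  codesign --force --deep --sign - "$app"      # ad-hoc re-sign the bundle
  tccutil reset SpeechRecognition org.alacritty
fi
```

This is a sketch of one plausible cause, not a confirmed fix; if the prompt still never appears, the stuck state may live in the system TCC database instead.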
I have a Mac mini (M1), macOS Monterey 12.5.1 (21G83).
After attempting to launch hear, I see the following:
./hear -s
ar-SA
ca-ES
cs-CZ
da-DK
de-AT
de-CH
de-DE
el-GR
en-AE
en-AU
en-CA
en-GB
en-ID
en-IE
en-IN
en-NZ
en-PH
en-SA
en-SG
en-US
en-ZA
es-419
es-CL
es-CO
es-ES
es-MX
es-US
fi-FI
fr-BE
fr-CA
fr-CH
fr-FR
he-IL
hi-IN
hi-IN-translit
hi-Latn
hr-HR
hu-HU
id-ID
it-CH
it-IT
ja-JP
ko-KR
ms-MY
nb-NO
nl-BE
nl-NL
pl-PL
pt-BR
pt-PT
ro-RO
ru-RU
sk-SK
sv-SE
th-TH
tr-TR
uk-UA
vi-VN
wuu-CN
yue-CN
zh-CN
zh-HK
zh-TW
But nevertheless, I cannot transcribe any audio; I get the following error:
./hear -d -i ./test.m4a
2023-03-09 19:20:01.767 hear[21203:2358806] Required assets are not available for Locale:en-US
Error: On-device recognition is not supported for en-US
Here are the settings of dictation (see the screenshot).
That would be great. Thanks for making this awesome CLI!
Hi, thanks for the work. Do I understand correctly that the -d flag only works with a language pack that is installed and loaded for Siri? I only got -d working with 'ru-RU' and 'en-US', but I need 'uk-UA'. Siri does not speak Ukrainian, so I can't get a transcription with -d. As an experiment I set Siri to French; the language pack downloaded, and 'fr-FR' then worked locally with -d.
Please check in the build_unsigned file
Whatever command I run, other than hear -h or hear -v, I get a zsh: abort response. For instance:
$ hear -m
zsh: abort hear -m
Is there something I am missing?
I wish hear could stop automatically, instead of staying active. Maybe when there is no more input for 3 seconds, or when reaching a user-defined text length?
In my simple game, the user has to say back the word.
say --voice="Samantha" -- "$word [[slnc 400]]"
response=$(hear --mode)
shopt -s nocasematch
if [[ "${response}" == "${word}" ]]; then
  echo "✅ $word"
else
  echo "🛑 wrong. ➡ $word"
  echo " - press enter for next word -"; read
fi
However, pressing CTRL-C to stop the listening stops the script instead.
Also for scripting, the escape/erase-line control characters get in the way. Plain text output when the program completes would be preferred. The --mode output looks like this:
^[[2K
Good^[[2K
^[[2K
Good morning
Workarounds:
response=$(timeout 4 hear --mode)
response=$(echo "$response" | sed 's/.*\[2K//g' | tr -d '\r\n')
hear -i some.mp3 > text.txt
or
hear -i some.mp3
... only outputs the same first 500 bytes of text
macOS Monterey version 12.4
Thoughts?
chris
Is there any way to take audio input from inside the computer (any application), maybe via a virtual audio driver? For now, it can only take the default microphone input. Thanks. @sveinbjornt @MrYakobo @adisidev
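Since hear reads from the default input device, capturing app or system audio needs a virtual loopback device. A sketch, assuming the BlackHole driver and the switchaudio-osx utility are installed (both available via Homebrew):

```shell
# Sketch: route system output into a virtual loopback device and make that
# same device the default input, so hear transcribes what other apps play.
# Assumes BlackHole and switchaudio-osx are installed:
#   brew install blackhole-2ch switchaudio-osx
dev="BlackHole 2ch"
if command -v SwitchAudioSource >/dev/null 2>&1; then
  SwitchAudioSource -t output -s "$dev"  # send app audio into BlackHole
  SwitchAudioSource -t input  -s "$dev"  # hear now reads from it
  hear -d
fi
```

Note that with output routed into BlackHole you won't hear the audio yourself; a Multi-Output Device created in Audio MIDI Setup can duplicate it to the speakers at the same time.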
First try on a MacBook Pro M1 works fine, but the same is not working on a desktop iMac...
Looks like a security issue. I first allowed apps from the App Store and identified developers,
but I still get an abort.
hear -h works fine.
Checking why with:
xattr -l /usr/local/bin/hear
gives: com.apple.quarantine: 0081;665dd081;Chrome;
I also tried removing the quarantine with xattr -d com.apple.quarantine /path/to/file, but it didn't help.
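One thing to try, as a guess rather than a confirmed fix: an unsigned (or quarantine-flagged) binary can be killed before it ever gets to request speech-recognition access, so clearing the attribute and ad-hoc signing the installed binary sometimes helps:

```shell
# Sketch: clear the quarantine attribute (ignoring errors if it is already
# gone) and ad-hoc sign the installed binary, then verify the signature.
bin="/usr/local/bin/hear"
if [ -f "$bin" ]; then
  xattr -d com.apple.quarantine "$bin" 2>/dev/null || true
  codesign --force --sign - "$bin"      # ad-hoc signature
  codesign --verify --verbose=2 "$bin"  # check that the signature is valid
fi
```

If the abort persists after this, the crash report in Console.app should say whether it is still a TCC/privacy kill or something else.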
Submitted for the record.
I've just had a frustrating few hours getting hear going on Monterey 12.5.1. It's a neat piece of work.
If you have the same issue, it is not sveinbjornt's problem; it is caused by Apple not supporting continuous dictation on older Intel-based Macs.
See: https://discussions.apple.com/thread/253318311
Please see macOS Monterey - New Features. The availability of this feature is limited to Macs with the M1 chip. Does your Mac mini have the M1?
Perhaps a comment could be added to the installation notes to avoid more frustrated users?
The limitation could be worked around by automatically breaking the input audio file into chunks of less than 30 (or is it 60?) seconds. A task for someone cleverer than me.
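The chunking idea above can be sketched with ffmpeg's segment muxer, assuming ffmpeg is installed (input.m4a is a placeholder; whether this sidesteps the dictation limit on Intel Macs is untested):

```shell
#!/bin/bash
# Sketch: split the input into 30-second chunks, transcribe each chunk with
# hear separately, and concatenate the text into one transcript.
set -euo pipefail
in="${1:-input.m4a}"
if [ -f "$in" ]; then
  dir=$(mktemp -d)
  # The segment muxer cuts numbered 30 s chunks without re-encoding.
  ffmpeg -loglevel error -i "$in" -f segment -segment_time 30 -c copy "$dir/chunk%03d.m4a"
  for chunk in "$dir"/chunk*.m4a; do
    hear -i "$chunk"
    echo
  done > transcript.txt
  rm -rf "$dir"
fi
```

Each chunk is a fresh hear invocation, so even if the recognizer stops after its continuous-dictation limit, no single run ever exceeds it; sentence boundaries that straddle a cut will be split, though.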
Using iTerm 2 with zsh, hear immediately aborts:
hear -d -i example.m4a > output.txt
zsh: abort hear -d -i example.m4a > output.txt
iTerm 2, Build 3.4.19
MacBook Pro, M1 Pro
Ventura 13.3.1
Using Terminal Version 2.13, this issue does not occur.
Thanks for awesome software!
It works in the terminal, but when I run the same script from Visual Studio Code it outputs nothing (no errors either, neither in the console nor in result.stderr).
I run it like: result = subprocess.run(["./hear", "-l", "en-US", "-d", "-p", "-i", "output.wav"], capture_output=True, text=True)
Visual Studio Code has access to the microphone, and that works (I checked).
Thx!