ideasman42 / nerd-dictation
Simple, hackable offline speech to text - using the VOSK-API.
License: GNU General Public License v3.0
First, very cool application!
I use Wayland on KDE and found that ydotool (https://github.com/ReimuNotMoe/ydotool) works pretty well as a replacement for xdotool. To use ydotool I changed the command from xdotool to ydotool, removed --clearmodifiers, and changed the backspace command from ["backspace"] to ['14', '14:0'].
ydotool is a bit harder to use because of Wayland, but it works with both X and Wayland, which I consider a plus.
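The changes described above can be sketched roughly like this (the helper name is hypothetical, not the actual nerd-dictation function; note that newer ydotool versions expect Linux input event codes in KEYCODE:STATE form, where 14 is KEY_BACKSPACE, so the exact syntax depends on the ydotool version):

```python
# Sketch of swapping xdotool for ydotool when simulating backspace presses.
# ydotool's "key" subcommand (recent versions) takes KEYCODE:STATE pairs;
# 14:1 presses KEY_BACKSPACE, 14:0 releases it.
def backspace_cmd(count: int, tool: str) -> list:
    """Build the command list used to simulate backspace presses."""
    if tool == "xdotool":
        return ["xdotool", "key", "--clearmodifiers"] + ["BackSpace"] * count
    elif tool == "ydotool":
        # One press/release pair per backspace; no --clearmodifiers equivalent.
        return ["ydotool", "key"] + ["14:1", "14:0"] * count
    raise ValueError(f"unknown tool: {tool}")
```

The resulting list would then be passed to subprocess as usual.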
The documentation says
Once this is working properly you may wish to download one of the larger language models for more accurate dictation. They are available here.
When one goes to that page, the intuitive thing to do is to download the top large model, currently "vosk-model-en-us-0.22"
However, that model on my i7 processor with nerd-dictation takes 5-10 seconds to start responding to speech input, plus it has the bug described in #31 issuing "the" intermittently if you leave it running. Together these really impair the functionality. My presumption is that the default use case for VOSK is to stay running and transcribe something longer than just a snippet of spoken text, where that loading time is not really an issue; but nerd-dictation users I think are going to tend to be turning it on and off, loading and unloading the model from memory.
So I tried the next model down, under "English Other", which is "vosk-model-en-us-daanzu-20200905". That model starts up within a second and seems to have extremely good accuracy. As a user experience, with nerd-dictation, I would describe it as night and day, not-usable vs usable, versus the "main" big VOSK model.
So, just for the benefit of new users trying out nerd-dictation, to spare them that experimentation and frustration with the "main" large model, please consider pointing to the daanzu model in that hint, as one that likely works better in the context of nerd-dictation, at least for now.
It would be nice if there was a flag so you could convert an audio or video file to text.
At the moment I use desktop background sound as a virtual microphone with pavucontrol
and it works flawlessly.
I am aware this is not the right place for that, but wanted to reach out just to say thanks for creating this project!
Only on Windows 11 is the Voice Dictation tool universally available to write anywhere.
As a hobby, I do some writing and this is exactly what I was looking for.
Just as an anecdote, I am using the following changes to support some basic keyboard commands:
text = text.replace(" new line", "\n")
text = text.replace(" dash", "-")
text = text.replace(" slash", "/")
text = text.replace(" period", ".")
text = text.replace(" comma", ",")
text = text.replace(" coma", ",") # misrecognized "comma", probably because of my accent; serves as an example too
text = text.replace(" calmer", ",") # another accent-driven misrecognition of "comma"
text = text.replace(" colon", ":")
text = text.replace(" question mark", "?")
text = text.replace(" exclamation mark", "!")
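Wrapped up, replacements like the ones above form a minimal user configuration (a sketch; nerd-dictation loads nerd_dictation_process() from ~/.config/nerd-dictation/nerd-dictation.py and calls it with each chunk of recognized text):

```python
# Minimal user configuration sketch (~/.config/nerd-dictation/nerd-dictation.py).
# nerd-dictation calls nerd_dictation_process() with each chunk of recognized text.

TEXT_REPLACE = (
    (" new line", "\n"),
    (" dash", "-"),
    (" slash", "/"),
    (" period", "."),
    (" comma", ","),
    (" coma", ","),     # accent-driven misrecognition of "comma"
    (" calmer", ","),   # another misrecognition of "comma"
    (" colon", ":"),
    (" question mark", "?"),
    (" exclamation mark", "!"),
)

def nerd_dictation_process(text: str) -> str:
    for spoken, replacement in TEXT_REPLACE:
        text = text.replace(spoken, replacement)
    return text
```

Note the replacement order matters: " period" must be applied before shorter overlapping phrases would interfere.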
I had to take some care with my microphone's input quality, which I managed with PipeWire + EasyEffects for noise suppression and the like; afterwards, the standard English models work more than fine with this.
Best regards and take care!
Hi, I'm trying to run nerd-dictation on Kubuntu 20.04.
I created a virtualenv, activated it and installed vosk by pip3.
I'm running nerd-dictation as the root user and I get
./nerd-dictation begin --vosk-model-dir=./model &
pa_context_connect() failed: Connection refused
(the process still runs in the background).
What is causing this error?
Am I missing something?
If I try to run it as a normal user, I get a permission error:
File "./nerd-dictation", line 1188, in <module>
    main()
File "./nerd-dictation", line 1184, in main
    args.func(args)
File "./nerd-dictation", line 1107, in <lambda>
    func=lambda args: main_begin(
File "./nerd-dictation", line 747, in main_begin
    touch(path_to_cookie)
File "./nerd-dictation", line 65, in touch
    os.utime(filepath, None)
PermissionError: [Errno 13] Permission denied
I tried to change the ownership of the main folder and model/ folder so they belong to my current user, but I still get the error.
I notice the error mentions a "path_to_cookie", but I have no idea what path that could be.
It would be nice to have an option for single-pass word-by-word processing (or similar) in order to enable continuous listening to commands without emitting them multiple times (cf. #17).
To avoid undesired behavior in continuous listening, the proposed '--commands' command-line argument to restrict input to a limited set of commands (#3) could also be a solution here.
Hello,
Thanks for this tool which improves the usage of vosk.
I have done some tests in French. What surprised me is that the dictation before the first output seems not to take the keyboard layout into account.
For example I got:
Ceci est un essqi
instead of
Ceci est un essai
which is what I said.
Next sentences are well transcribed.
I haven't explored the code yet.
When I run the program by assigning the command "nerd-dictation begin --timeout 1 --numbers-as-digits --numbers-use-separator"
to a custom keyboard shortcut on Ubuntu 20.04, it seems to freeze every single time. Any fixes for this?
It seems to behave like a memory leak; it completely crashes the OS.
Not sure why, but when using the large English model, all of my sentences appear as if delimited with the word 'the':
the that is a success the
xdotool makes the program crash when special characters are introduced. A workaround for now is to use unidecode to get rid of the special characters. I use Spanish, so I will need to proofread this more thoroughly. A better solution would be a tool that allows UTF-8 characters or other languages. I already tried the workaround with accented words ('á') and the special character 'ñ': the accent is removed and the 'ñ' is replaced with 'n'.
Thanks for the tool. I was having problems finding something like this and was thinking of building exactly this. I need to produce a big document in a short time, and without this tool it might have been very difficult. If I can, I will later look for a complete solution instead of a workaround.
pip install unidecode

from typing import List

from unidecode import unidecode

def run_xdotool(subcommand: str, payload: List[str]) -> None:
    cmd_base = "xdotool"
    # Strip accents and other special characters before handing text to xdotool.
    payload = [unidecode(word) for word in payload]
I had to copy the model to "~/.config/nerd-dictation"; the script doesn't check for a "model" folder in the script directory.
So I want to respeak my live recorded speech.
That means: mic -> text -> sound. Or in other words: Speech to Text and then Text to Speech.
The part for converting sounds from the microphone to text I achieve thanks to nerd-dictation.
The part for converting text back to sound I want to implement thanks to festival.
1 - I have sort of added a new output method to nerd-dictation. I call it file
because it's meant to go into a file.
My current work can be found at https://github.com/ruckard/nerd-dictation/tree/speech_to_file_v2 . As you can see I have not added a new option for this mode because I'm not sure if it's worth it.
The current way that I run nerd-dictation is like this:
./nerd-dictation begin --vosk-model-dir=/home/playg/vosk-models/vosk-model-small-es-0.22 --full-sentence --punctuate-from-previous-timeout 1 --idle-time 0.5 --continuous --timeout 0.5 --output=STDOUT > /tmp/output_test_file.txt
Then I just tail -f /tmp/output_test_file.txt.
2 - The current changes (ruckard@5acbd54) abuse the timeout option so that, instead of exiting the program, it processes the audio again and gives me another sentence. It also makes sure not to output new text if nothing else was said.
The idea is to read every line (after \n is issued) and reproduce it thanks to festival.
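The read-loop could be sketched like this (untested glue code, not from the linked branch; it assumes festival's --tts mode reading text from stdin, and a plain open() rather than a true tail -f):

```python
import subprocess

def complete_lines(stream):
    """Yield finished, non-empty lines (a newline marks a completed sentence)."""
    for line in stream:
        line = line.strip()
        if line:
            yield line

def speak_file(path: str) -> None:
    # festival --tts reads the text to speak from stdin.
    with open(path) as f:
        for sentence in complete_lines(f):
            subprocess.run(["festival", "--tts"], input=sentence, text=True)
```

A real version would need to keep following the file as nerd-dictation appends to it.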
3 - Anyway, in the end I have three questions for you:
Thank you very much for your feedback.
This should use pypy3 to launch nerd-dictation instead of python3. I got pypy3 to work, but I did not set it up as an executable like python3. It seemingly gives a performance increase, although I still get some lock-ups while waiting for the text to process. Another alternative is to use numba, but I haven't looked into whether that is possible, nor have I profiled the code to see where it would be most appropriate.
Thank you. I wrote this issue using nerd-dictation, with minor editing.
I would like to reduce the typing speed and limit the text-prediction period. It should look like 75-90 words per minute. A major issue is that I will dictate a sentence and then it will backspace a big portion of that sentence, replacing it with something else that isn't as accurate as the first time around. Please advise which file I should change. Thank you.
UPDATE:
Modifying the xdotool parameters can slow down the backspaces (set to 50 ms)...

def simulate_backspace_presses(count: int, simulate_input_tool: str) -> None:
    cmd = [simulate_input_tool.lower()]
    if simulate_input_tool == "XDOTOOL":
        cmd += ["key", "--delay", "50"] + ["BackSpace"] * count

... and the typing speed (set to 100 ms):

def simulate_typing(text: str, simulate_input_tool: str) -> None:
    cmd = [simulate_input_tool.lower()]
    if simulate_input_tool == "XDOTOOL":
        cmd += ["type", "--clearmodifiers", "--delay", "100", text]
I would still like to know how to prevent the program from backspacing most of a completed sentence only to reprint a slightly less accurate replacement.
PR #17 is a Proof of Concept of how speech to commands could be supported. The idea is the following:
- Match the first recognized word either to a command name (a key of WORD_CMD_MAP) or to the command name reserved for dictation ("type"); retry if no match (resetting everything).
- Match the following words against the command tree (WORD_CMD_MAP) until a full command is identified; then launch it and reset nerd-dictation.
For this workflow, nerd-dictation should provide a reset function that can be called from within the configuration script (nerd-dictation.py). Moreover, it could be very useful if this reset function accepted a command name which could then be passed further to nerd_dictation_process as an optional argument. When this optional argument is given, the first step above could be skipped, e.g., one could directly enter dictation mode just like before (and avoid the first word being the dictation command name, which likely influences the statistical natural-language prediction negatively). Furthermore, with little modification this would also allow freely dictated arguments to be passed to certain commands. Finally, the whole workflow enables continuous listening to commands while avoiding a reload of the VOSK model, and commands are only emitted if securely identified as the first word (and the following would not be needed: Add '--commands' command line argument to restrict input to a limited set of commands (#3)).
Now, while this all seems very straightforward, there is one crucial issue to solve in order to enable efficient speech to commands: commands seem to be very badly recognized by the normal VOSK natural-language model(s) (at least in my few tests). The model expects as first word, e.g., "hi" or "hello" instead of any random word that we might want to use as a command name. As a result, a command name like "right" (right mouse click) is most of the time recognized as "hi" by the VOSK model. Consequently, I believe it will be necessary to use a different VOSK model for command recognition than for natural-language dictation. I don't know if the "Speaker identification model" (see: https://alphacephei.com/vosk/models) might be of any use; otherwise, one could create a very simple VOSK model based on the command tree dictionary (WORD_CMD_MAP). For technical details, it would certainly help to learn more about how the VOSK model for command recognition was built for this Android app:
https://realize.be/blog/offline-speech-text-trigger-custom-commands-android-kaldi-and-vosk
alphacep/vosk-api#41
While the creation of a simple VOSK model for command recognition is probably a bit of work, I believe that it would lead to an exceptional model (as it would contain only exactly what should be recognized).
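For what it's worth, the stock VOSK API can also restrict a recognizer to a fixed vocabulary by passing a "grammar" (a JSON list of allowed phrases) to KaldiRecognizer, which might make a dedicated command model unnecessary. A sketch (the command list and model path are illustrative; only the grammar-building helper runs standalone here):

```python
import json

# Build a VOSK "grammar": a JSON list of phrases the recognizer is allowed to
# return. "[unk]" lets it report unknown speech instead of forcing a bad match.
def build_grammar(commands) -> str:
    return json.dumps(list(commands) + ["[unk]"])

COMMANDS = ["right", "left", "up", "down", "type"]  # example command names

# Usage with vosk (not run here; requires a downloaded model):
#   from vosk import Model, KaldiRecognizer
#   rec = KaldiRecognizer(Model("model"), 16000, build_grammar(COMMANDS))
```

With a grammar like this, "right" can no longer be misrecognized as "hi", because "hi" is simply not in the search space.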
When executing nerd-dictation with or without the --continuous option, the CPU load is still high, about one complete core.
When stopping the execution with a keyboard interrupt, I always get the interruption in the function exit_fn.
If I try to add:

    time.sleep(PROGRESSIVE_CONTINUOUS_SLEEP_WHEN_IDLE)

just before calling return 0, the CPU usage is low, without a big impact on accuracy.
Hi. I have a question that I cannot find the answer to, since I am a beginner in Python. I use a German model for nerd-dictation. Full sentences do not work here, so I have to dictate "Komma" or "Punkt" for comma and full stop. I have configured the user configuration file so that those two words are replaced with "," and ".". I need to delete the space in front of those characters. How can I achieve this via the configuration file? Your help is much appreciated.
By the way, I solved the "proper noun" thing for myself by integrating spaCy, letting it check for nouns and then capitalizing the word. It is a bit slow, but it works.
Thanks for this really great tool!
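One possible approach to the space-before-punctuation question above (a minimal sketch; the regex approach and the word map are illustrative, not from nerd-dictation itself): replace the spoken word together with the space in front of it, in the same substitution.

```python
import re

# Sketch: map spoken German punctuation words to characters, consuming any
# preceding whitespace in the same substitution (illustrative configuration).
PUNCT = {"komma": ",", "punkt": "."}

def nerd_dictation_process(text: str) -> str:
    for spoken, char in PUNCT.items():
        # "\s*" swallows the space nerd-dictation emits before the word.
        text = re.sub(r"\s*\b" + spoken + r"\b", char, text, flags=re.IGNORECASE)
    return text
```

The word boundaries (\b) keep longer words such as "Punktzahl" from being mangled.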
--full-sentence --punctuate-from-previous-timeout 2
This only capitalizes the first sentence, but not subsequent ones; please advise.
Hello,
I have the idea to package nerd-dictation for PyPI.org. I tested adding a setup.py and setup.cfg file.
Thus I tried to treat the nerd-dictation file as a module, adding a console-script entry.
At this step, I'm facing the problem that the name nerd-dictation is not allowed because of the dash; the name generates a syntax error with `import nerd-dictation`.
Could the name be changed to nerd_dictation instead of nerd-dictation?
I haven't yet explored another way: not using a module/console script, but installing the nerd-dictation script directly.
What do you think about that?
The background idea is to make it easy to install with pip install, and also to let elograf require it as a dependency.
Hi,
First, I'd like to congratulate Campbell Barton. Thank you very much for this wonderful script!
Melbourne, Berlin, John, etc. are recognized with a lower-case first letter. If possible, could someone write a script to add to nerd-dictation.py?
Unfortunately, I can't do it!
Thanks to you.
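A minimal sketch of what such a script could look like (the name list is illustrative; a real configuration in ~/.config/nerd-dictation/nerd-dictation.py would keep a much larger set):

```python
# Sketch: capitalize known proper nouns after recognition (illustrative set).
PROPER_NOUNS = {"melbourne", "berlin", "john"}

def nerd_dictation_process(text: str) -> str:
    words = text.split(" ")
    words = [w.capitalize() if w.lower() in PROPER_NOUNS else w for w in words]
    return " ".join(words)
```

This only handles single-word names; multi-word names would need phrase-level matching.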
Hello,
after starting recognition, it takes ages for any short sentence to be recognized and produce any output
(by ages, I mean 30 to 90 seconds; in the meantime, the computer is frozen).
After getting the first output, the terminal starts responding again (although not fluidly) and I'm able to stop it.
I am running Ubuntu 20.04.1 on an i7 .3GHz 8-core CPU with 15.4 GB RAM.
Any suggestion is highly appreciated.
Hi, I love ND!
I'm finally trying to get it integrated with the CLI with Bash aliases, but I'm getting a lot of false positives like "huh".
What is the best way to decrease the sensitivity to these?
First of all thank you for this very useful project.
I need to incorporate punctuation into dictation (in french).
Replacing single and multiple words works very well. But I have a difficulty: I need to be able to output a full stop and then go to a new line. I am a beginner in Python...
The best I've managed to do looks like this:

if text == "point à la ligne":
    text = "."
    run_xdotool("key", ["Return"])
And, by the way, I need to make sure that there is no space before the full stop.
I searched for a while but I am stuck here; thanks in advance.
After some recent upgrade of either Ubuntu or LibreOffice, I have noticed that I cannot use nerd-dictation in LibreOffice Writer. No text appears. nerd-dictation works fine with Chrome or Thunderbird windows. It did not used to be this way. I upgraded from Ubuntu 18 to 21.10 recently, so perhaps something changed around that; maybe there's some security policy that prevents simulated keystrokes? Just a guess. LibreOffice is 7.2.3.2.
"Packages: No packages published" is displayed right now; fortunately this pointless section can be removed.
Edit the repo page config to remove it (the cog next to the description).
I am not making a PR as it is defined in proprietary GitHub settings, not in a git repository - and I have no rights to modify repo settings.
Maybe also remove the releases section.
BTW, if there is an influx of new people, it may be a result of https://news.ycombinator.com/item?id=29972579
Use case: with an audio file as input
Example flag: --single-word-timestamp
Output: word: hi beginning: 1.54,6 end: 1.55,3
First of all thank you for this project, I haven't tried FOSS Speech-to-Text for a while, and I'm pleasantly surprised by the quality of the result of VOSK, and nerd-dictation makes it easily hackable, great!
I have a little issue though: the --numbers-as-digits option doesn't seem to work with the French model (the biggest one); when I try, I get this result:
% nerd-dictation begin --numbers-as-digits
deux mille vingt-deux un deux trois quatre cinq 6 sept huit neuf dix zéro
(curiously, 6 is output correctly as a number, but it's the only one). Is this feature supported for English only, or is it supposed to work with French too?
I've installed nerd-dictation on Arch Linux from the AUR package nerd-dictation-git.
Thanks!
In situations where only a limited set of commands is needed, it would be useful to pass this list in as an argument.
This has the advantage that dictation could end immediately once a unique command was matched.
It would also allow for fuzzy matching if exact matches could not be found.
Example:
COMMAND="$(nerd-dictation begin --commands=valid_commands.txt --timeout=1.0)"
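The matching side of such a flag could be sketched with stdlib fuzzy matching (the flag itself does not exist yet; difflib is a stand-in choice and the command list is illustrative):

```python
import difflib

# Sketch of a hypothetical --commands option: exact match first, then stdlib
# fuzzy matching as a fallback when no exact match is found.
def match_command(spoken: str, commands):
    spoken = spoken.strip().lower()
    if spoken in commands:
        return spoken
    close = difflib.get_close_matches(spoken, commands, n=1, cutoff=0.6)
    return close[0] if close else None

COMMANDS = ["shutdown", "reboot", "lock screen"]
```

Dictation could then end as soon as match_command() returns a non-None result.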
Hello, I wrote a ydotool alternative called dotool and think it could help.
It is designed to run without root permissions and the daemon is optional.
I wrote it because ydotool and all the required patching was preventing my program (https://numen.johngebbie.com), which depended on it, from getting packaged on distros.
All the best,
John
EDIT: I have got dotool packaged on Void Linux, hopefully it will be packaged elsewhere soon.
The characters are strangely out of order. Using the vosk-model-en-us-0.22-lgraph.zip model. Saying "This is a test of the emergency broadcast system" multiple times:
$ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
this i tstfesa o the mycnegeer broactdas ysstem
$ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
tihs is atesoft theem ergencbortsy adca systme
$ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
this is a se oftt the mereg aorbnecscdyta system
The Vosk API test_microphone.py works correctly:
$ python3 test_microphone.py
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.089 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from model/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading HCL and G from model/graph/HCLr.fst model/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:302) Loading winfo model/graph/phones/word_boundary.int
################################################################################
Press Ctrl+C to stop the recording
################################################################################
{
"partial" : ""
}
<SNIP DUPLICATES>
{
"partial" : "this"
}
{
"partial" : "this"
}
{
"partial" : "this is"
}
{
"partial" : "this is a"
}
{
"partial" : "this is a test of"
}
{
"partial" : "this is a test of"
}
{
"partial" : "this is a test of the"
}
{
"partial" : "this is a test of the emergency"
}
{
"partial" : "this is a test of the emergency broadcast"
}
<SNIP DUPLICATES>
{
"partial" : "this is a test of the emergency broadcast system"
}
<SNIP DUPLICATES>
{
"text" : "this is a test of the emergency broadcast system"
}
{
"partial" : ""
}
<SNIP DUPLICATES>
^C
Done
Hi there,
First of all, congrats on this tool, it's light-weight, simple, customizable, can be executed from emacs, just perfect. I was just wondering whether it was possible to convert a spoken command into a command with modifier keys (C-c C-c typically...!)?
Cheers,
Vian
I am on Arch Linux. After opening a terminal and typing pamac search vosk, I execute pamac install python-vosk and then I get the error "Failed to compile vosk-api".
The same error happens if I execute pamac install nerd-dictation-git.
Not a super big issue, because I have been able to install the code successfully by following the tutorial. But I wonder why pamac is not working for installing these packages.
Is there an easy way to fix missing initial caps? Maybe a keystroke?
I have been using nerd-dictation for a while and it's fantastic - open-source, adaptable, hackable, Python, Linux-friendly :) Thanks!
But there's a strange thing happening: English input works without any problems; German input, however, nearly always prints nein (no) or einen (one, pronoun) at the end of a spoken chunk. I have no idea why.
System settings:
Linux Mint Cinnamon 20.2 (Ubuntu 20.04.1)
I invoke both via keyboard shortcuts that call a bash script. Commands in the bash script (I skipped the venv-related paths etc)
# German
nerd-dictation begin --vosk-model-dir ~/opt/nerd-dictation/model-de --numbers-as-digits --timeout 5 --punctuate-from-previous-timeout 3
# English
nerd-dictation begin --vosk-model-dir ~/opt/nerd-dictation/model-en-us --numbers-as-digits --timeout 5 --punctuate-from-previous-timeout 3
Models are the full versions: German is vosk-model-de-0.21.zip, English is vosk-model-en-us-0.22.zip.
I suppose it might be related to:
I am at a loss how to debug it or discover the origin of the superfluous words. Any ideas, explanations, possibilities?
I'm curious to look at phoneme data from the input speech. Would this be part of the speech-to-text pipeline, and if so, is there a part of the program I could modify to provide this output?
The current documentation makes it seem like using this command should result in fully punctuated sentences:
nerd-dictation begin --full-sentence --continuous --punctuate-from-previous-timeout=2 --timeout=4
But instead I'm getting something like this:
"Sentence oneSentence two"
Specifically, you don't see nerd-dictation begin --help, which basically contains all the goodies?
I want to use nerd-dictation for processing my photos, basically with commands like:
next
previous
delete
promote
I am not entirely sure what would be the best way to implement this - has anyone done something like that already? It seems a relatively obvious use of actually-working voice-to-text.
(Maybe using nerd-dictation is a mistake and I should be using the vosk API directly?)
I would like to have support for easily having a separate script and model for languages other than English. I'd gladly contribute this if it's a feature you would like in nerd-dictation.
Currently, you have a config folder where you can place a model, and a Python file nerd-dictation will use by default.
The idea would be to add a new command-line option to choose a config subfolder, e.g. --config-subfolder=fr. In that case, nerd-dictation would try to find the model and the Python file in .config/nerd-dictation/fr/ instead of .config/nerd-dictation/.
This would allow users to have a configuration per language (or, I dunno, for different use cases) without adding complexity to the program.
Is this something you'd like to have in nerd-dictation? If so, is there anything you would like me to do to contribute this? Documentation, a specific CLI option name, an example configuration in another language...
This is a feature request. In Dragon, one can capitalize any word by saying "cap" before it, so saying "cap nerd cap dictation" outputs "Nerd Dictation". It would be nice to have this functionality in nerd-dictation.
It would also be nice if the command word were configurable. Although "cap" is short and intuitive, it really isn't the best choice, since it is a word one actually uses fairly often.
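A rough sketch of how the "cap" behavior could be done in the user configuration (all names are illustrative; the command word is kept in a constant so it stays configurable):

```python
# Sketch: saying CAP_WORD before a word capitalizes it,
# e.g. "cap nerd cap dictation" -> "Nerd Dictation".
CAP_WORD = "cap"

def nerd_dictation_process(text: str) -> str:
    out = []
    capitalize_next = False
    for word in text.split():
        if word == CAP_WORD:
            capitalize_next = True
        elif capitalize_next:
            out.append(word.capitalize())
            capitalize_next = False
        else:
            out.append(word)
    return " ".join(out)
```

Note this consumes the command word itself, which is exactly why a rarely spoken command word would be a better choice than "cap".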
Russian input lags the entire interface, but some programs (Blender, for example) don't lag at all (Blender is also usually launched in fullscreen). English input works fine. Model: "vosk-model-small-ru-0.22"
It would be good to support a --config=filepath argument so each command can specify a different configuration to use.
This would allow different use cases depending on who launches the command, where one call could be used for dictation and another for home-assistant actions (just as an example).
Thank you for writing this interesting project. It's running, but it's spitting out a lot of garbage along with the text.
❯ ./nerd-dictation begin
0.09997663497924805
0.09870014190673829
0.09955344200134278
0.09974346160888672
0.09971175193786622
0.0929502010345459
0.09946784973144532
0.09947595596313477
0.0925527572631836
0.09944138526916504
0.09245476722717286
0.09949836730957032
0.09236202239990235
0.09945592880249024
0.09939346313476563
0.0923090934753418
0.09901008605957032
THIS0.09907612800598145
IS0.039521551132202154
0.09932670593261719
0.09929046630859376
ANOTHER0.07741460800170899
0.09929213523864747
0.09936389923095704
TERRORIST0.015120840072631841
0.09926352500915528
ST0.09925565719604493
0.0896986484527588
0.09934697151184083
0.09947404861450196
0.09257588386535645
0.09938035011291504
0.09136066436767579
0.09934458732604981
0.06850967407226563
0.09943637847900391
0.09936747550964356
0.09154710769653321
0.09944114685058594
0.09195122718811036
0.09947142601013184
How do I suppress all these logits?
I have created, for my own usage, a launcher which displays an icon in the systray; clicking it launches or stops nerd-dictation.
The working version uses PyQt, and the icon acts as a toggle button. The memory impact seems to be 17 MB.
I have also tried pystray. It works, but with some caveats: in my LXQt the icon is not displayed, only a gear instead. It acts by displaying a contextual menu, thus needing two actions; I did not find how to catch a single click on the icon. And on a recent LXQt it doesn't work at all, because of a DBus error. The memory impact seems to be 13 MB.
I didn't check the size of what is pulled in as requirements.
This is not yet ready for wide usage, because I have no installation system and the paths are hardcoded.
But do you mean it's something to add to nerd-dictation, or to put somewhere else?
First off - thank you. This is precisely what I have been looking for. Great work here!
I want to ensure that the program is using the right microphone - I want to make sure it uses the external one, not the one on my laptop. Running pactl list
gives me a WHOLE slew of stuff, but I think this is the chunk I'm most interested in, since it lists my external microphone:
Card #2
Name: alsa_card.usb-BLUE_MICROPHONE_Blue_Snowball_201603-00
Driver: module-alsa-card.c
Owner Module: 28
Properties:
alsa.card = "1"
alsa.card_name = "Blue Snowball"
alsa.long_card_name = "BLUE MICROPHONE Blue Snowball at usb-0000:00:14.0-3, full speed"
alsa.driver_name = "snd_usb_audio"
device.bus_path = "pci-0000:00:14.0-usb-0:3:1.0"
sysfs.path = "/devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3:1.0/sound/card1"
udev.id = "usb-BLUE_MICROPHONE_Blue_Snowball_201603-00"
device.bus = "usb"
device.vendor.id = "0d8c"
device.vendor.name = "C-Media Electronics, Inc."
device.product.id = "0005"
device.product.name = "Blue Snowball"
device.serial = "BLUE_MICROPHONE_Blue_Snowball_201603"
device.string = "1"
device.description = "Blue Snowball"
module-udev-detect.discovered = "1"
device.icon_name = "audio-card-usb"
Profiles:
input:mono-fallback: Mono Input (sinks: 0, sources: 1, priority: 1, available: yes)
input:multichannel-input: Multichannel Input (sinks: 0, sources: 1, priority: 1, available: yes)
off: Off (sinks: 0, sources: 0, priority: 0, available: yes)
Active Profile: input:mono-fallback
Ports:
analog-input-mic: Microphone (priority: 8700, latency offset: 0 usec)
Properties:
device.icon_name = "audio-input-microphone"
Part of profile(s): input:mono-fallback
multichannel-input: Multichannel Input (priority: 0, latency offset: 0 usec)
Part of profile(s): input:multichannel-input
I have tried feeding the "Name" value (alsa_card.usb-BLUE_MICROPHONE_Blue_Snowball_201603-00), the udev.id, and the device.icon_name (long shot) into the CLI, each time getting the error Stream error: No such entity. If I don't include --pulse-device-name, dictation works fine, but I want to ensure it's getting the best input possible.
Which of the values from the pactl list output should we use for that flag? Or is there another value further up in the stream - i.e. not "Card #2" - that I should be looking at?
Thanks!
Instead of using a keyboard shortcut, I would like to use two keywords to begin and end dictation. Meaning that nerd-dictation would always listen to the microphone and, when recognized, "listen on" would start typing and "listen off" would stop typing (xdotool).
How could I implement that?
Hi, when starting nerd-dictation, I get the following message:
Stream error: No such entity
When speaking, no text shows.
I know this might be beyond the simplicity of this program, but it would be awesome to have at least an icon in the taskbar. That way I could put the timeout in the shortcut and know when it has timed out. It would be a quality-of-life improvement, although non-essential.
So far there is "begin", "end" and "cancel" - and it is wonderful. But I sometimes struggle to find words and start mumbling, and I do not want that to be transcribed. I just mute the mic now, but that results in fragments, which are highly annoying (see #26). Since this is due to VOSK, a nice workaround would be a "pause" mode of the input that I could bind a key to.
Maybe it is even possible to change the VOSK model during "pause" mode? Then one could switch languages.
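Until such a mode exists, a stopgap can live entirely in the user configuration: swallow all text between spoken keywords (the keyword names and the module-level flag are illustrative; how this interacts with nerd-dictation's rewriting of partial output is untested):

```python
# Sketch: spoken "pause"/"resume" keywords toggle whether recognized text is
# emitted; state is kept in a module-level flag (illustrative configuration).
_paused = False

def nerd_dictation_process(text: str) -> str:
    global _paused
    out = []
    for word in text.split():
        if word == "pause":
            _paused = True
        elif word == "resume":
            _paused = False
        elif not _paused:
            out.append(word)
    return " ".join(out)
```

Note this only filters the text; VOSK keeps running, so it does not save any CPU and cannot switch models.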