chigkim / vocr Goto Github PK

License: GNU General Public License v3.0

Shell 0.21% Swift 96.27% Python 3.52%

vocr's Introduction

Enhancing Accessibility with Seamless Screen Recognition

Welcome to VOCR

Discover the cutting-edge capabilities of VOCR, your ultimate OCR and AI-powered screen recognition tool designed to enhance your digital accessibility experience. Beyond the simple navigation feature with OCR, VOCR seamlessly integrates with VoiceOver, enabling users to effortlessly capture and recognize screen content with intuitive and customizable shortcuts. With features like Real-Time OCR, users can continuously monitor and read live content, such as subtitles. The ASK AI functionality allows you to leverage advanced AI models, including OpenAI GPT to ask detailed questions about images and receive insightful answers. It also supports local vision language models via Ollama for your privacy. Explore with AI takes it a step further by analyzing images, identifying different areas, and providing comprehensive descriptions.

VOCR's robust suite of features offers unparalleled control and precision, making it an indispensable tool for users seeking a seamless, efficient, and highly functional OCR solution. Whether you're navigating inaccessible applications or curious about images, VOCR empowers you to do more with ease and confidence.

WARNING: USE AT YOUR OWN RISK!

VOCR is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, expressed or implied, of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Please see the GNU General Public License for more details.

Download

Here is the direct link to download VOCR v2.0.1.

Setup

To ensure VOCR works properly, it is crucial to follow every step precisely. Missing even one step could prevent VOCR from functioning correctly.

After uncompressing the downloaded zip file, move the application to your Applications folder and run it.
Confirm VOCR is running in the menu bar by pressing vo+m twice.
In VoiceOver Utility, under the General category, check the box for "Allow VoiceOver to be controlled with AppleScript."
If active, turn off the screen curtain with vo+shift+f11. Note that the screen curtain must be off for the app to work properly.
Hide VoiceOver visuals with vo+command+f11 if they are displayed. If not hidden, elements like the VoiceOver caption panel will be recognized along with other screen content.
Press command+shift+control+w. You should receive a notification asking for accessibility permission. If VoiceOver does not automatically focus on the window, press vo+f1 twice to display the list of currently running apps; the system dialog should be in this list.
After granting accessibility permission, press command+shift+control+w again to receive a notification requesting permission for VOCR to take a screenshot. If you do not receive the alert, locate the system dialog as described previously.
If you cannot locate the system dialog, go to System Settings, Privacy & Security, then choose Screen Recording, and find the VOCR app.
After granting accessibility permission, restart the app as prompted.
Verify the app is in the menu bar by pressing vo+m twice.
Press command+shift+control+w. You should hear a beep and a voice prompt saying "finished."
You can now navigate the recognized results using command+control+arrows. Refer to the shortcuts section below for more information.
When navigating results for the first time, an alert will prompt you to allow VOCR to control VoiceOver for speaking announcements.
Press Escape to exit VOCR's navigation mode and free up navigation shortcuts.

OCR VoiceOver Cursor

This feature is useful for capturing specific portions of a screen, such as a video player on a webpage or images on social media.

Move your VoiceOver cursor to the element you want to recognize.
Press command+shift+control+v.
- The first time you use this feature, you will receive an alert to allow VOCR to run AppleScript.
After granting permission, press command+shift+control+v again.

Real-Time OCR

Press Command+Shift+Control+R after scanning a window or using VOCursor to start or stop real-time OCR. When activated, VOCR will continuously scan and report only new content. This is useful for reading live content such as subtitles.

Setup AI Model

You can host your own vision language model using Ollama or utilize OpenAI GPT to ask questions about images captured with VOCR.

To use the OpenAI GPT model:

Purchase API credits for your account.
Create an OpenAI API key.
Enter your OpenAI API key in the VOCR Menu: Settings > Engine > OpenAI API Key.

Note: It may take several hours for your API to become active after purchasing credits.

The usage cost from VOCR is an estimate. For the official usage and cost, please refer to the Usage Dashboard on OpenAI website.

To utilize a local vision language model with Ollama:

Ollama is free and private, but it is less accurate and requires a lot of computing power. I recommend M1 chip or later with minimum 16GB memory.

Download and install Ollama.
Download a multimodal (vision-language) model by executing the following command in your terminal:
```
ollama pull llava
```

Note that there are also llava:13b and llava:34b models, which offer higher accuracy but require more storage, memory, and computing power.

You may also want to try a related app called VOLlama. It is an accessible chat client for Ollama, allowing you to easily interact with an open-source large language model that runs locally on your computer.

ASK AI

After the setting up OpenAI and/or Ollama:

Choose Ollama or GPT in VOCR Menu > Settings > Engine.
Scan a window/VOCursor or capture an image from a camera.
Press Command+Shift+Control+A to ask the selected model a question about the image.

The response will be copied to the clipboard so you can review in case you miss it.

Also you can select an image file in Finder, bring up the contextual menu with VO+Shift+M, go to 'Open with,' and choose VOCR to ask a question about the image.

Explore with AI

Choose GPT in the VOCR Menu > Settings > Engine.
Provide your OpenAI API key in VOCR Menu > Settings > Engine > OpenAI API Key.
Scan a window or use VOCursor.
Press Command+Shift+Control+E.

VOCR will ask GPT to analyze the image, identify various areas, and describe the contents of each. You can navigate the results using the shortcuts Command + Control + Arrows.

Note: This feature is experimental and often produces inaccurate descriptions of locations and content.

Global Shortcuts

These shortcuts work at all times:

VOCR Menu: Command+Shift+Control+S
OCR Window: Command+Shift+Control+W
OCR VoiceOver Cursor: Command+Shift+Control+V
Camera Capture: Command+Shift+Control+C
Toggle Real-Time OCR: Command+Shift+Control+R
Ask AI: Command+Shift+Control+A
Explore with AI: Command+Shift+Control+E

Navigation Shortcuts

These shortcuts only work when navigation is active after a scan:

Move down/up: Command+Control+Down/Up Arrow
Move left/right: Command+Control+Left/Right Arrow
Previous/next character: Command+Shift+Control+Left/Right Arrow
Go to top/bottom: Command+Control+Page Up/Down
Go to beginning/end horizontally: Command+Control+Home/End
Exit navigation: Escape
Location: Command+Control+L (Reports current coordinates)
Identify Object: Command+Control+I (Identifies current object with AI when object detection is enabled in settings)

Settings

Access the VOCR Menu with Command+Control+Shift+S. This menu contains all settings and operations.

Target Window: Allows you to scan a different window than the current one.
Autoscan: Automatically scans after clicking an item with VO+Shift+Space.
Detect Object: Locates objects with no text such as icons.
Use Last Prompt: Reuses the last prompt when asking AI with Command+Shift+Control+A.
Move Mouse: Moves the mouse cursor when you navigate.
Positional Audio: Provides audio feedback as the mouse cursor moves. Frequency changes correspond to vertical location, and audio panning corresponds to horizontal position. If you don't hear the audio feedback, go to Settings > Sound Output.
Reset Position: When disabled, the cursor will not reset to the top-left corner after every new scan.
Launch on Login: Automatically runs VOCR when you log in.
Log: Starts writing logs to VOCR.txt in your Documents folder.
Sound Output: Choose a sound device for audio positional feedback.
Choose Camera: Select the camera to use for capturing an image.
Shortcuts: Customize shortcuts.
Engine: Choose between GPT or Ollama.

Note that Llama.cpp temporarily suspended support for the vision language model on their server.

Operation

When you open the VOCR menu, few operations are available after a scan:

Save Last Image
Save OCR Result
Updates

Troubleshooting

If you hear "nothing found" you likely need to turn off the VoiceOver screen curtain with vo+shift+f11 or adjust accessibility and screen recording permissions in System Settings > Privacy & Security.
If you do not hear anything after using the "OCR VoiceOver Cursor" feature, you probably need to grant VOCR permissions to: send Apple Events.

Usually, relaunching VOCR and reissuing the command retriggers the alerts to reappear in the system dialogs as described above.

Lastly, please enjoy using VOCR!

vocr's People

Contributors

Stargazers

Watchers

Forkers

mikidrums vick08 ronnajack thepowerofswift jpabloc jacofrank caxandre2107 okarim9 kevinu fandango64 pianist211 hnguyenly rrishu0thakurr blindguynw ssawczyn kchro3

vocr's Issues

Error with OCR Front Window Macro

After Scanning, the following Error Appears:
/var/folders/7c/nl8j8s0n5ps7v52bdn7wr0hw0000gn/T/Keyboard-Maestro-Script-5281AD59-443A-4FDF-9EC4-448192CAD144:2360:2392: execution error: Error on line 72: ReferenceError: Can't find variable: str (-2700)
I downloaded the newest commit and replaced the old Macros.

Suggestion: Allow Moving the Mouse Free-hand In Addition to From REsult to Result

Hi there,

One of the programs I miss on Mac OS is a simple add-on called Golden Cursor. It basically allows, among other things, the user to move the mouse pixel by pixel across the screen. This is occasionally useful in playing games, which don't expose control information but which VOCR recognizes very well.

I suggest that some hotkeys be added to allow the mouse to move arbitrarily across the screen. This would help in cases where clicking directly on the text itself is impractical.

The other major functionality of Golden Cursor was an ability to bookmark certain parts of the screen, that is, certain pixel locations, and return to them on demand. I don't know how practical that is but it would be very nice to have as well.

I realize this isn't really an OCR problem, but the two tools are used so often together in my use case that I think of them as almost in separable.

Thanks for your work on this.

VOCR can't OCR full-screened windows

It seems since installing the M1 universal version, you can no longer VOCR full-screen apps. I have confirmation that this was possible on the older Intel version.

VOCR keeps launching at login even if i disable the option in the menu extra.

hello.
once i disabled this option and i restarted my mac the app keeps launching. not sure if it's a system thing, but i wonder if can you get a work around to take it off the login items once i disable the option.

VOCR and Mac OS 12.x

Is VOCR working with the current Mac OS release 12.x?

I've done the upgrade a few days ago and VOCR does not longer work. To make sure if this not related toy my system it would be good to know if VOCR still works for your on Mac OS Monterey.

Image OCR doesn't work with MacOS 10.16 beta 3

I have following all steps. The screen OCR works but the image OCR not. I think VOCR has no access to the desktop folder. In the "system preferences" -> "security" -> "personal data" -> "files and folder" is no "VOCR" entry.

Some feedback

I love this, it works flawlessly and is super useful. This is one of the features that has been sadly missing from Voiceover for too long. Thank you for taking the time to develop it.

I would like to see the ability to change the keyboard shortcut. CMD Shift O is the same shortcut as used to open the documents folder in the finder and to perform tasks in many other applications.

Also, I'm not sure if this is possible. But it would be great to have a command to OCR only the currently focused element - an image for example.

Add option to take screenshot from the VOCR menu extras menu

Would it be possible to add a menu option under the VOCR menu to take the screenshot from there?

This might help with newcomers to make it more discoverable, and also those - like me - that use VOCR only occasionally who maybe are a bit forgetful and don't really need a shortcut key, or might be struggling with all the buttons needed to execute it.

I have only just started using VOCR, but it did allow mt to successfuly use an otherwise inaccessible part of an app, so I can see it being a hugely useful part of my toolbox. So, thank you so much for this.

Error with Macro Front Window xyz

Now the following error appears, after I made a Screenshot.
/var/folders/7c/nl8j8s0n5ps7v52bdn7wr0hw0000gn/T/Keyboard-Maestro-Script-D1F8938A-F77B-4A17-B1D0-0B9A6F565984:2020:2050: execution error: Error on line 60: Error: Die Datei wurde nicht gefunden. (-43)

Problem after Scanning the Screen

After Scanning the Screen, this Error appears.

Desktop: 2880,1800,261
Safari - README: 1264,872
Original: 2528, 1744
Enlarged: 2906, 2005
Scale: 2.2990506329113924, 2.2993119266055047

+~—nigname—ne mavro gornrr rom mmaow to matror your sureerr size: 1 or eaxampre ron ymaow —1o tor t ro—ro—mon: Make sure Keyboard Maestro is set to edit macros by selecting "Start Editing Macros" from the View menu.

<p class='ocr_par' id='par_1_4' lang='eng' title="bbox 612 174 1427 205">
 <span class='ocr_line' id='line_1_7' title="bbox 612 174 1427 205; baseline 0 -7; x_size 31; x_descenders 7; x_ascenders 6"><span class='ocrx_word' id='word_1_44' title='bbox 612 174 691 198; x_wconf 90'><strong>Press</strong></span> <span class='ocrx_word' id='word_1_45' title='bbox 700 174 758 205; x_wconf 84'><strong>vo+j</strong></span> <span class='ocrx_word' id='word_1_46' title='bbox 771 174 826 198; x_wconf 91'><strong>until</strong></span> <span class='ocrx_word' id='word_1_47' title='bbox 837 180 887 205; x_wconf 93'><strong>you</strong></span> <span class='ocrx_word' id='word_1_48' title='bbox 897 174 967 205; x_wconf 92' lang='deu'><strong>jump</strong></span> <span class='ocrx_word' id='word_1_49' title='bbox 977 176 1003 198; x_wconf 93' lang='deu'><strong>to</strong></span> <span class='ocrx_word' id='word_1_50' title='bbox 1012 174 1056 198; x_wconf 92' lang='deu'><strong>the</strong></span> <span class='ocrx_word' id='word_1_51' title='bbox 1068 180 1154 198; x_wconf 91' lang='deu'><strong>macro</strong></span> <span class='ocrx_word' id='word_1_52' title='bbox 1165 180 1262 205; x_wconf 91' lang='deu'><strong>groups</strong></span> <span class='ocrx_word' id='word_1_53' title='bbox 1273 174 1345 198; x_wconf 89' lang='deu'><strong>scroll</strong></span> <span class='ocrx_word' id='word_1_54' title='bbox 1357 180 1427 198; x_wconf 88' lang='deu'><strong>area.</strong></span> 
 </span>
</p>

<p class='ocr_par' id='par_1_5' lang='deu' title="bbox 611 224 1523 255">
 <span class='ocr_line' id='line_1_8' title="bbox 611 224 1523 255; baseline 0.001 -7; x_size 32; x_descenders 7; x_ascenders 7"><span class='ocrx_word' id='word_1_55' title='bbox 611 225 735 255; x_wconf 91'><strong>Navigate</strong></span> <span class='ocrx_word' id='word_1_56' title='bbox 745 226 771 249; x_wconf 93'><strong>to</strong></span> <span class='ocrx_word' id='word_1_57' title='bbox 781 224 824 249; x_wconf 89'><strong>the</strong></span> <span class='ocrx_word' id='word_1_58' title='bbox 836 224 926 249; x_wconf 90'><strong>VOCR</strong></span> <span class='ocrx_word' id='word_1_59' title='bbox 938 230 1025 249; x_wconf 90'><strong>macro</strong></span> <span class='ocrx_word' id='word_1_60' title='bbox 1036 230 1116 255; x_wconf 92'><strong>group</strong></span> <span class='ocrx_word' id='word_1_61' title='bbox 1127 224 1178 249; x_wconf 91'><strong>and</strong></span> <span class='ocrx_word' id='word_1_62' title='bbox 1190 225 1273 249; x_wconf 89'><strong>select</strong></span> <span class='ocrx_word' id='word_1_63' title='bbox 1284 225 1297 248; x_wconf 93'><strong>it</strong></span> <span class='ocrx_word' id='word_1_64' title='bbox 1307 224 1362 248; x_wconf 92'><strong>with</strong></span> <span class='ocrx_word' id='word_1_65' title='bbox 1373 224 1523 255; x_wconf 89'><strong>vo+Space.</strong></span> 
 </span>
</p>

<p class='ocr_par' id='par_1_6' lang='deu' title="bbox 612 275 1242 306">
 <span class='ocr_line' id='line_1_9' title="bbox 612 275 1242 306; baseline 0 -7; x_size 31; x_descenders 7; x_ascenders 6"><span class='ocrx_word' id='word_1_66' title='bbox 612 275 691 299; x_wconf 92'><strong>Press</strong></span> <span class='ocrx_word' id='word_1_67' title='bbox 700 275 758 306; x_wconf 24'><strong>vo+</strong></span> <span class='ocrx_word' id='word_1_68' title='bbox 769 277 795 299; x_wconf 93'><strong>to</strong></span> <span class='ocrx_word' id='word_1_69' title='bbox 804 275 874 306; x_wconf 92'><strong>jump</strong></span> <span class='ocrx_word' id='word_1_70' title='bbox 884 277 909 299; x_wconf 93'><strong>to</strong></span> <span class='ocrx_word' id='word_1_71' title='bbox 920 275 963 299; x_wconf 91'><strong>the</strong></span> <span class='ocrx_word' id='word_1_72' title='bbox 975 275 1078 299; x_wconf 91'><strong>Macros</strong></span> <span class='ocrx_word' id='word_1_73' title='bbox 1088 275 1161 299; x_wconf 90' lang='eng'><strong>scroll</strong></span> <span class='ocrx_word' id='word_1_74' title='bbox 1172 281 1242 299; x_wconf 91' lang='eng'><strong>area.</strong></span> 
 </span>
</p>

Navigate to the OCR FrontWindow 27 macro, and select it with vo+Space Bar.

Press vo+ to jump to the Macro edit detail scroll area.

Press vo+Right Arrow to move to the macro name edit field. o Delete the existing number 27 and type the number that matches your screen size in inches.

Getting Started

« Try performing OCR on the Keyboard Maestro Editor window first before trying it in other applications.

<p class='ocr_par' id='par_1_19' lang='deu' title="bbox 505 748 1286 773">
 <span class='ocr_line' id='line_1_23' title="bbox 505 748 1286 773; baseline 0 0; x_size 31.68919; x_descenders 6.689189; x_ascenders 7"><span class='ocrx_word' id='word_1_153' title='bbox 505 758 516 769; x_wconf 72'><strong>+</strong></span> <span class='ocrx_word' id='word_1_154' title='bbox 543 749 617 773; x_wconf 77'><strong>Make</strong></span> <span class='ocrx_word' id='word_1_155' title='bbox 629 755 689 773; x_wconf 91'><strong>sure</strong></span> <span class='ocrx_word' id='word_1_156' title='bbox 699 749 743 773; x_wconf 91' lang='eng'><strong>the</strong></span> <span class='ocrx_word' id='word_1_157' title='bbox 754 755 848 773; x_wconf 91' lang='eng'><strong>screen</strong></span> <span class='ocrx_word' id='word_1_158' title='bbox 860 749 953 773; x_wconf 91' lang='eng'><strong>curtain</strong></span> <span class='ocrx_word' id='word_1_159' title='bbox 966 749 987 773; x_wconf 91' lang='eng'><strong>is</strong></span> <span class='ocrx_word' id='word_1_160' title='bbox 997 748 1031 773; x_wconf 91' lang='eng'><strong>off</strong></span> <span class='ocrx_word' id='word_1_161' title='bbox 1040 749 1096 773; x_wconf 92' lang='eng'><strong>with</strong></span> <span class='ocrx_word' id='word_1_162' title='bbox 1107 748 1286 773; x_wconf 91' lang='eng'><strong>vo+shift+f11.</strong></span> 
 </span>
</p>

<p class='ocr_par' id='par_1_20' lang='deu' title="bbox 505 799 2377 881">
 <span class='ocr_line' id='line_1_24' title="bbox 505 799 2377 830; baseline 0 -7; x_size 30; x_descenders 6; x_ascenders 6"><span class='ocrx_word' id='word_1_163' title='bbox 505 808 516 820; x_wconf 85'><strong>«</strong></span> <span class='ocrx_word' id='word_1_164' title='bbox 543 799 622 824; x_wconf 83'><strong>Press</strong></span> <span class='ocrx_word' id='word_1_165' title='bbox 632 799 1001 824; x_wconf 90'><strong>command+control+shift+o</strong></span> <span class='ocrx_word' id='word_1_166' title='bbox 1013 799 1063 824; x_wconf 90'><strong>and</strong></span> <span class='ocrx_word' id='word_1_167' title='bbox 1075 799 1131 824; x_wconf 92'><strong>wait</strong></span> <span class='ocrx_word' id='word_1_168' title='bbox 1140 799 1178 824; x_wconf 92'><strong>for</strong></span> <span class='ocrx_word' id='word_1_169' title='bbox 1187 799 1230 824; x_wconf 91'><strong>the</strong></span> <span class='ocrx_word' id='word_1_170' title='bbox 1241 799 1311 824; x_wconf 90'><strong>OCR</strong></span> <span class='ocrx_word' id='word_1_171' title='bbox 1323 805 1433 830; x_wconf 91'><strong>process</strong></span> <span class='ocrx_word' id='word_1_172' title='bbox 1443 801 1468 824; x_wconf 93'><strong>to</strong></span> <span class='ocrx_word' id='word_1_173' title='bbox 1478 799 1560 824; x_wconf 91'><strong>finish.</strong></span> <span class='ocrx_word' id='word_1_174' title='bbox 1572 799 1663 824; x_wconf 90'><strong>VOCR</strong></span> <span class='ocrx_word' id='word_1_175' title='bbox 1673 799 1716 823; x_wconf 90'><strong>will</strong></span> <span class='ocrx_word' id='word_1_176' title='bbox 1729 801 1827 830; x_wconf 91'><strong>prompt</strong></span> <span class='ocrx_word' id='word_1_177' title='bbox 1836 805 1886 830; x_wconf 92'><strong>you</strong></span> <span class='ocrx_word' id='word_1_178' title='bbox 1897 799 1953 823; x_wconf 91' lang='eng'><strong>with</strong></span> <span class='ocrx_word' id='word_1_179' title='bbox 1964 805 1981 824; x_wconf 91' lang='eng'><strong>a</strong></span> <span class='ocrx_word' id='word_1_180' title='bbox 1991 799 2104 830; x_wconf 88' lang='eng'><strong>&quot;Ready&quot;</strong></span> <span class='ocrx_word' id='word_1_181' title='bbox 2116 805 2244 830; x_wconf 88' lang='eng'><strong>message</strong></span> <span class='ocrx_word' id='word_1_182' title='bbox 2254 799 2330 824; x_wconf 90'><strong>when</strong></span> <span class='ocrx_word' id='word_1_183' title='bbox 2343 799 2377 824; x_wconf 92' lang='eng'><strong>it‘s</strong></span> 
 </span>
 <span class='ocr_line' id='line_1_25' title="bbox 541 849 940 881; baseline 0 -7; x_size 32; x_descenders 7; x_ascenders 7"><span class='ocrx_word' id='word_1_184' title='bbox 541 850 611 874; x_wconf 90' lang='eng'><strong>done</strong></span> <span class='ocrx_word' id='word_1_185' title='bbox 621 850 676 874; x_wconf 92' lang='eng'><strong>with</strong></span> <span class='ocrx_word' id='word_1_186' title='bbox 687 850 730 874; x_wconf 91' lang='eng'><strong>the</strong></span> <span class='ocrx_word' id='word_1_187' title='bbox 741 849 811 874; x_wconf 90' lang='eng'><strong>OCR</strong></span> <span class='ocrx_word' id='word_1_188' title='bbox 823 856 940 881; x_wconf 88' lang='eng'><strong>process.</strong></span> 
 </span>
</p>

« Press to read the result and press vo+shift+space to click.

* To choose different language, press vo+shift+l. Vo+command+return will allow you to selecte more than one language.

NOTE: Keep in mind that many app interfaces use icons, and Tesseract may recognize them as weird symbols. For example, a right larrow might appear as a ”> than” sign and a left arrow as a "< than" sign. Tesseract may also ignore icons entirely that cannot be recognized as a character or symbol.

Reporting Issues

GitHub provides a convenient and reliable way to track and resolve issues. Please click here, and search for your issue. If you don‘t find an open issue relating to your problem, you can create a new one by clicking on "new issue" and filling out the required fields.

Generating A Report

When troubleshooting a problem, it might occasionally be necessary to have Keyboard Maestro generate a report when a script fails. By dafonlt VNCBR will innara arrare hiut far diannactin nurnacas it miaht be necessary to provide those results. Follow these StepS to

*n neyvuaru maesuy, navigate w ute un rrunmt vvinuuw anu Click once on the macro.

<p class='ocr_par' id='par_1_31' lang='deu' title="bbox 505 1866 2410 1947">
 <span class='ocr_line' id='line_1_40' title="bbox 505 1866 2410 1897; baseline 0 -7; x_size 31; x_descenders 7; x_ascenders 6"><span class='ocrx_word' id='word_1_383' title='bbox 505 1875 516 1887; x_wconf 87'>«</span> <span class='ocrx_word' id='word_1_384' title='bbox 543 1866 622 1891; x_wconf 87'>Press</span> <span class='ocrx_word' id='word_1_385' title='bbox 631 1866 682 1890; x_wconf 91'>Tab</span> <span class='ocrx_word' id='word_1_386' title='bbox 693 1872 761 1890; x_wconf 88'>once</span> <span class='ocrx_word' id='word_1_387' title='bbox 772 1866 823 1890; x_wconf 90'>and</span> <span class='ocrx_word' id='word_1_388' title='bbox 834 1866 904 1897; x_wconf 89'>you&#39;ll</span> <span class='ocrx_word' id='word_1_389' title='bbox 916 1866 950 1890; x_wconf 91' lang='eng'>be</span> <span class='ocrx_word' id='word_1_390' title='bbox 961 1866 1053 1897; x_wconf 91' lang='eng'>placed</span> <span class='ocrx_word' id='word_1_391' title='bbox 1065 1866 1086 1890; x_wconf 93' lang='eng'>in</span> <span class='ocrx_word' id='word_1_392' title='bbox 1098 1866 1141 1890; x_wconf 90' lang='eng'>the</span> <span class='ocrx_word' id='word_1_393' title='bbox 1153 1866 1239 1890; x_wconf 92' lang='eng'>Macro</span> <span class='ocrx_word' id='word_1_394' title='bbox 1252 1866 1305 1890; x_wconf 91' lang='eng'>Edit</span> <span class='ocrx_word' id='word_1_395' title='bbox 1316 1866 1394 1890; x_wconf 91' lang='eng'>Detail</span> <span class='ocrx_word' id='word_1_396' title='bbox 1406 1866 1484 1891; x_wconf 91' lang='eng'>Scroll</span> <span class='ocrx_word' id='word_1_397' title='bbox 1495 1866 1569 1890; x_wconf 90' lang='eng'>Area.</span> <span class='ocrx_word' id='word_1_398' title='bbox 1583 1866 1636 1891; x_wconf 92' lang='eng'>Use</span> <span class='ocrx_word' id='word_1_399' title='bbox 1647 1866 1795 1890; x_wconf 91' lang='eng'>VoiceOver</span> <span class='ocrx_word' id='word_1_400' title='bbox 1804 1868 1830 1890; x_wconf 92' lang='eng'>to</span> <span class='ocrx_word' id='word_1_401' title='bbox 1842 1866 1960 1897; x_wconf 92' lang='eng'>navigate</span> <span class='ocrx_word' id='word_1_402' title='bbox 1971 1872 2033 1890; x_wconf 92' lang='eng'>over</span> <span class='ocrx_word' id='word_1_403' title='bbox 2042 1868 2068 1890; x_wconf 91' lang='eng'>to</span> <span class='ocrx_word' id='word_1_404' title='bbox 2078 1866 2121 1890; x_wconf 92' lang='eng'>the</span> <span class='ocrx_word' id='word_1_405' title='bbox 2134 1866 2246 1890; x_wconf 91' lang='eng'>Execute</span> <span class='ocrx_word' id='word_1_406' title='bbox 2257 1866 2324 1890; x_wconf 91' lang='eng'>Java</span> <span class='ocrx_word' id='word_1_407' title='bbox 2335 1866 2410 1897; x_wconf 91' lang='eng'>script</span> 
 </span>
 <span class='ocr_line' id='line_1_41' title="bbox 540 1916 953 1947; baseline 0 -6; x_size 31; x_descenders 6; x_ascenders 7"><span class='ocrx_word' id='word_1_408' title='bbox 540 1916 577 1941; x_wconf 92' lang='eng'>for</span> <span class='ocrx_word' id='word_1_409' title='bbox 587 1917 748 1941; x_wconf 89' lang='eng'>Automation</span> <span class='ocrx_word' id='word_1_410' title='bbox 759 1917 846 1941; x_wconf 91' lang='eng'>Action</span> <span class='ocrx_word' id='word_1_411' title='bbox 858 1916 953 1947; x_wconf 87' lang='eng'>Group.</span> 
 </span>
</p>

<p class='ocr_par' id='par_1_32' lang='deu' title="bbox 505 1967 2387 1998">
 <span class='ocr_line' id='line_1_42' title="bbox 505 1967 2387 1998; baseline 0 -7; x_size 31; x_descenders 7; x_ascenders 6"><span class='ocrx_word' id='word_1_412' title='bbox 505 1976 516 1988; x_wconf 78'>«</span> <span class='ocrx_word' id='word_1_413' title='bbox 543 1968 647 1992; x_wconf 74'>Interact</span> <span class='ocrx_word' id='word_1_414' title='bbox 656 1967 712 1991; x_wconf 92' lang='eng'>with</span> <span class='ocrx_word' id='word_1_415' title='bbox 723 1967 766 1992; x_wconf 92' lang='eng'>the</span> <span class='ocrx_word' id='word_1_416' title='bbox 777 1968 861 1992; x_wconf 92' lang='eng'>action</span> <span class='ocrx_word' id='word_1_417' title='bbox 872 1973 952 1998; x_wconf 92' lang='eng'>group</span> <span class='ocrx_word' id='word_1_418' title='bbox 963 1967 1014 1992; x_wconf 92' lang='eng'>and</span> <span class='ocrx_word' id='word_1_419' title='bbox 1027 1968 1146 1998; x_wconf 91' lang='eng'>navigate</span> <span class='ocrx_word' id='word_1_420' title='bbox 1156 1973 1218 1992; x_wconf 92' lang='eng'>over</span> <span class='ocrx_word' id='word_1_421' title='bbox 1227 1969 1253 1992; x_wconf 92' lang='eng'>to</span> <span class='ocrx_word' id='word_1_422' title='bbox 1263 1967 1307 1992; x_wconf 92' lang='eng'>the</span> <span class='ocrx_word' id='word_1_423' title='bbox 1318 1973 1416 1998; x_wconf 89' lang='eng'>pop—up</span> <span class='ocrx_word' id='word_1_424' title='bbox 1428 1974 1504 1992; x_wconf 89' lang='eng'>menu</span> <span class='ocrx_word' id='word_1_425' title='bbox 1515 1967 1568 1992; x_wconf 91' lang='eng'>that</span> <span class='ocrx_word' id='word_1_426' title='bbox 1578 1973 1650 1998; x_wconf 91' lang='eng'>says,</span> <span class='ocrx_word' id='word_1_427' title='bbox 1663 1967 1763 1998; x_wconf 91' lang='eng'>&quot;Ignore</span> <span class='ocrx_word' id='word_1_428' title='bbox 1776 1967 1890 1992; x_wconf 90' lang='eng'>Results&quot;</span> <span class='ocrx_word' id='word_1_429' title='bbox 1902 1967 1953 1992; x_wconf 92' lang='eng'>and</span> <span class='ocrx_word' id='word_1_430' title='bbox 1965 1967 2068 1998; x_wconf 92' lang='eng'>change</span> <span class='ocrx_word' id='word_1_431' title='bbox 2078 1967 2128 1992; x_wconf 92'>this</span> <span class='ocrx_word' id='word_1_432' title='bbox 2138 1968 2223 1998; x_wconf 91'>option</span> <span class='ocrx_word' id='word_1_433' title='bbox 2234 1969 2260 1992; x_wconf 93'>to</span> <span class='ocrx_word' id='word_1_434' title='bbox 2271 1967 2387 1998; x_wconf 91' lang='eng'>&quot;Display</span> 
 </span>
</p>

Make ctrl command shift w shortcut configurable

Hello,
Is it possible to make the shortcut configurable?
Steam doesn't like this shortcut and it takes precedence over VOCR, so basically I cannot use Steam at all.

Thanks.

Request for Multilingual Support in VOCR App

I hope this message finds you well. I am reaching out to express my appreciation for your work on the VOCR app, which has been a valuable resource for visually impaired individuals. However, I noticed that the app primarily supports English-speaking users, which limits its accessibility to non-English speaking communities.
I believe that extending the language support to include other languages could significantly enhance the app's usability and impact. It would provide non-English speaking visually impaired individuals with a tool to navigate their digital environments more effectively.
Here are a few suggestions on how this could be achieved:

1 .
Translation of the User Interface : Collaborate with volunteer translators or use translation platforms to translate the UI into various languages.
2 .
Multilingual Optical Character Recognition : Incorporate multilingual Optical Character Recognition systems to understand and interpret commands in different languages. Additionally, I suggest leveraging the built-in OCR engine of Mac to ensure support across both Intel and Silicon Mac systems.

I am eager to contribute to this initiative and can assist in coordinating translation efforts or providing feedback on multilingual support.
Thank you for considering my request. I am looking forward to the possibility of making VOCR a more inclusive and accessible tool for visually impaired individuals across the globe.
Warmest regards,

Let us Browse OCR results without Moving MOuse

I find myself using applications which often pull up tooltips when the mouse is over specific parts of the screen, but the act of OCRing for them is tedious because I need to remember where the mouse was and return it to the same spot if I don’t find them on a given OCR run.

NVDA OCR, for instance, allows you to review the OCR results independently of mouse movement, with a command to route the mouse to whatever the last result was.

THis would be useful and could perhaps be a toggle, much like the existing shortcuts for positional audio and position reset.

Error with previous and next line macro

If I use the Macros an error like the following appears.

/var/folders/7c/nl8j8s0n5ps7v52bdn7wr0hw0000gn/T/Keyboard-Maestro-Script-03566237-88BA-4120-87E1-55AA130441F9:150:190: execution error: Error on line 5: SyntaxError: JSON Parse error: Unexpected identifier "OCR" (-2700)

OCR Front Window

If I Scan the Screen, the following Error appears:
/var/folders/7c/nl8j8s0n5ps7v52bdn7wr0hw0000gn/T/Keyboard-Maestro-Script-4BC595B6-AEB3-454B-976B-6C1EA6EC078C:2360:2392: execution error: Error on line 72: ReferenceError: Can't find variable: str (-2700)

The ask feature always describes an image that isn't on the screen

says it all in the title when I try to use the ask feature to see what's on my screen it always describes some image that isn't there. It must be the wallpaper of macOS but it won't read any texts or anything. Just show just describe an image.

force itself on top off steam's shameless keyboard hook.

please, try to make it prioritize itself instead off steam, and make it OCR steam windows

Feature Request: Language Selection via Custom Prompt for Chat GPT in VOCR

Hi chigkim,

Just tried out the new Chat GPT integration in VOCR – what an incredible addition! The potential for this feature is huge, and even from my brief testing, I can see how it's a game-changer.

I'd like to propose the idea of introducing a way to specify the language for Chat GPT's responses. A simple yet powerful option could be to have a text field or prompt within the settings where users can set a standard prompt with the desired language. For example, I would write "Norwegian" to receive responses in my native tongue.

Looking ahead, this prompt-based approach could bring even more value, allowing users to not only set the language but also include instructions on the kind of response they need – like "Give me a high-level overview," or "Please provide detailed descriptions of all interactive elements."

While the primary goal is to enable multilingual support, this flexibility could also pave the way for personalized content analysis – a bonus that might favor a prompt solution over a simple language option.

Seeing how quickly VOCR evolves, adding this feature could make the app an even more indispensable tool for international users, including those of us using VoiceOver.

Appreciate the work you're doing and looking forward to future updates!

All the best,

The prompt window to ask a question is usually include as part of the screenshot

When I execute the function to ask (CTRL +CMD +Shift +A), the dialog that asks for the user's prompt is described as part of the screen. Maybe the screenshot should be obtained before displaying the user prompt dialog.

issue during installing VOCR / install.command

Issue VOCR installation.txt
hi, I've got really no Idea what's going wrong with my System.
System: MacOs 10.13.6, latest security Update (2019.1)
I'm trying to install the VOCR Terminal Installer. At first everything's going right.
But during the Installation suddenly the following line appears:
"Invalid Option --with-all-Languages"
What am I doing wrong? I've used VOCR on my old system, wich i had to reinstall some days ago.
I really appreciate a Tip.
Thanks alot.
The Lines will be found in the Attachment.
Martin

Issue with previous and next word macros

KM reports the following error:

/var/folders/1_/47smkf8908s0tr18sdln0rw00000gp/T/Keyboard-Maestro-Script-58F88724-FD79-4288-9428-3DBE787FA46F:225:247: execution error: Error on line 9: TypeError: undefined is not an object (evaluating 'words.length') (-2700)

Translate VOCR into french and change key

Hi, there,
I'd like to translate VOCR into French, but I don't see a . string file to do it.
Do you plan to translate the application?

And in the documentation you display keyboard shortcuts that do not correspond to the French keyboard ;)

Mathieu

vocr needs update for macos ventura compatibility

hello.
there are some elements from some apps that are recognized just fine in the macos monterey by vocr, but on ventura they don't even show up. i met this issue in the vdj app on the login dialog where on the monterey i see the remember me checkbox and on ventura i don't. i assume there are some ocr things that were updated on ventura.
best regards

Image recognition recognises the whole screen if VoiceOver visuals are turned off, recognises image when they are turned on.

The title says it all: If I turn off VoiceOver visuals, as instructed by the ReadMe, image recognition will perform an OCR of the entire screen. This includes that I can navigate all the lines with CTRL+Cmd+arrows. If I turn them on, which is against the recommendation/instruction from the read me, image recognition will recognise only the image and just speak the description, not OCR the whole screen. But then, of course, window OCR is broken, because VO visuals get in the way.

This is on two M1 Macs, one running Big Sur, the other running Monterey beta.