alekssamos / cloudvision Goto Github PK

View Code? Open in Web Editor NEW

9.0 9.0 9.0 108 KB

Vision Bot NVDA addon

Home Page: https://visionbot.ru/addon/

License: MIT License

Python 98.83% Smarty 1.01% Batchfile 0.16%

cloudvision's People

Contributors

Stargazers

Watchers

Forkers

zstanecic vovamobile beqabeqa473 keyang556 akash07k jkinnunen edu-mx destranis rayo-alcantar

cloudvision's Issues

As of NVDA 2023.1, the plugin is often buggy.

The following issues listed below will help you temporarily resolve some issues from NVDA 2019.3 to NVDA 2022.4. The solution will be provided after mentioning the issues below.
When the plugin captures the image by pressing NVDA + CTRL + i keys, the plugin can recognize the text 2 to 3 times without problems.

However, after 2 or 3 times, the plugin simply no longer verbalizes the captured text.

When I first want to read previously captured text using NVDA + CTRL +"I I", the plugin allows me to read the text without problems, but after I close the curtain to read the text, the plugin stops working completely.

That being said, the issues mentioned above were resolved as follows.

At the time of writing this solution dated December 27, 2023, it is recommended to use NVDA 2022.4, as this is the latest version that can work without the bugs present as of NVDA 2023.1:

If you want to use an older version, the minimum and recommended version to use with this plugin is 2019.3:

Now, you will be told which functions you need to keep active so that the plugin can work correctly using its keyboard shortcuts:

NVDA+5: activated:

NVDA + 6: disabled:
NVDA + 7: disabled:
NVDA + 8: disabled:

NVDA + m: On:

Also, remember to keep the following function activated in the navigation mode tab indicated in quotes below:

"Enable navigation mode on page load."

If you encounter errors in NVDA versions 2019.3 or 2022.4, it is recommended to disable automatic updates for any plugin, as well as disable the following option for the IBMTTS plugin (indicated in quotes):
"Always send your current settings":

Now, the problems that usually appear from version 2023.1 of NVDA are the following:

Sometimes an error may occur when sending the analysis to the server:
When you run recognition for the first time on some website or program, the text will be displayed complete, but if you change programs or interfaces, the text will be displayed incomplete and the scans will stop before the required time.

Once again I mention that this plugin is currently only recommended for use on NVDA 2019.3 and NVDA 2022.4.

The following comments were for the purpose of finding the solution I just mentioned, so you don't need to read them.

Extending Be My AI functionality to support follow-up questions

@alekssamos Greetings. Now that CloudVision supports Be My AI, would it be possible to enhance the add-on to support the asking of follow-up questions, like what JAWS also does in its May 2024 release, after image recognition is finished?
Thanks.

.

Обновление украинского перевода

Здравствуйте. Я опубликовал в виде pull request обновление перевода дополнения для NVDA на украинский язык, поскольку он содержыт некоторые ошибки. С того времени вышло уже два обновления, но обновленный перевод в дополнение не был включен. Включите его, пожалуйста, и скажите, почему он не был включен.

CloudVision, any chance of extending it to recognize the current screen or window?

@alekssamos First and foremost, thanks for fixing the Be My AI issue in V3.2.0.1.
Now that Be My AI is available, would it be possible to extend CloudVision's functionality in a way that it can become capable of recognizing the currently focused window or screen? This would help a lot with, say, opened images in Telegram, inaccessible apps which display text which cannot be detected via NVDA's cursor/object navigation key strokes, etc. Maybe new hot keys can be added to cover screen/window detection/recognition. The current approach is either object-based or file-based, but my suggestion can expand the usefulness of the add-on.
Thanks.

The recognition result cannot be displayed in NVDA version alpha-21371

NVDA: alpha-21371,376edfcf
Add-on Version: 2.0.3.7

Quickly press NVDA+CTRL+I twice. After the recognition result is returned, an error sound will be heard, and NVDA is forced to restart.

error log:
INFO - main (10:15:59.742) - MainThread (16148):
Starting NVDA version alpha-21371,376edfcf
INFO - core.main (10:15:59.888) - MainThread (16148):
Config dir: C:\Users\manch\AppData\Roaming\nvda
INFO - config.ConfigManager._loadConfig (10:15:59.888) - MainThread (16148):
Loading config: C:\Users\manch\AppData\Roaming\nvda\nvda.ini
INFO - core.main (10:16:00.001) - MainThread (16148):
Using Windows version 10.0.19042 workstation
INFO - core.main (10:16:00.001) - MainThread (16148):
Using Python version 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:01:55) [MSC v.1900 32 bit (Intel)]
INFO - core.main (10:16:00.001) - MainThread (16148):
Using comtypes version 1.1.7
INFO - core.main (10:16:00.002) - MainThread (16148):
Using configobj version 5.1.0 with validate version 1.0.1
INFO - synthDriverHandler.setSynth (10:16:00.504) - MainThread (16148):
Loaded synthDriver vocalizer_expressive2
INFO - core.main (10:16:00.504) - MainThread (16148):
Using wx version 4.0.3 msw (phoenix) wxWidgets 3.0.5 with six version 1.12.0
INFO - brailleInput.initialize (10:16:00.505) - MainThread (16148):
Braille input initialized
INFO - braille.initialize (10:16:00.506) - MainThread (16148):
Using liblouis version 3.15.0
INFO - braille.initialize (10:16:00.509) - MainThread (16148):
Using pySerial version 3.4
INFO - braille.BrailleHandler.setDisplayByName (10:16:00.513) - MainThread (16148):
Loaded braille display driver noBraille, current display has 0 cells.
INFO - core.main (10:16:00.721) - MainThread (16148):
Java Access Bridge support initialized
INFO - _UIAHandler.UIAHandler.MTAThreadFunc (10:16:00.732) - _UIAHandler.UIAHandler.MTAThread (6816):
UIAutomation: IUIAutomation6
DEBUGWARNING - inputCore.InputManager.loadLocaleGestureMap (10:16:00.822) - MainThread (16148):
No locale gesture map for language zh_CN
DEBUGWARNING - touchHandler.touchSupported (10:16:00.858) - MainThread (16148):
No touch devices found
DEBUGWARNING - abstractEngine.AbstractEngineHandler.getEngineList (10:16:00.929) - MainThread (16148):
Engine 'captcha' doesn't pass the check, excluding from list
DEBUGWARNING - abstractEngine.AbstractEngineHandler.getEngineList (10:16:00.931) - MainThread (16148):
Engine 'sougouOCR' doesn't pass the check, excluding from list
DEBUGWARNING - abstractEngine.AbstractEngineHandler.getEngineList (10:16:00.938) - MainThread (16148):
Engine 'machineLearning' doesn't pass the check, excluding from list
IO - external:globalPlugins.fixime.GuessComposition.get_code_by_name (10:16:00.949) - MainThread (16148):
controlconverted to 17
IO - external:globalPlugins.fixime.GuessComposition.get_code_by_name (10:16:00.949) - MainThread (16148):
numpadenterconverted to 13
IO - external:globalPlugins.fixime.GuessComposition.get_code_by_name (10:16:00.949) - MainThread (16148):
enterconverted to 13
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.954) - MainThread (16148):
2052
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.954) - MainThread (16148):
00000804
IO - inputCore.InputManager.executeGesture (10:16:00.958) - winInputHook (15064):
Input: kb(laptop):shift+windows+space
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.961) - MainThread (16148):
2052
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.961) - MainThread (16148):
00000804
IO - inputCore.InputManager.executeGesture (10:16:00.962) - winInputHook (15064):
Input: kb(laptop):windows+space
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.965) - MainThread (16148):
2052
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.965) - MainThread (16148):
00000804
INFO - core.main (10:16:01.128) - MainThread (16148):
NVDA initialized
IO - speech.speak (10:16:01.293) - MainThread (16148):
Speaking ['空格', EndUtteranceCommand()]
IO - speech.speak (10:16:01.295) - MainThread (16148):
Speaking ['空格', EndUtteranceCommand()]
IO - speech.speak (10:16:01.386) - MainThread (16148):
Speaking ['放大', '按钮']
IO - inputCore.InputManager.executeGesture (10:16:01.885) - winInputHook (15064):
Input: kb(laptop):escape
DEBUGWARNING - NVDAObjects.IAccessible.IAccessible.get_IA2States (10:16:01.904) - MainThread (16148):
could not get IAccessible2 states
Traceback (most recent call last):
File "NVDAObjects\IAccessible_init.pyc", line 1633, in _get_IA2States
File "comtypesMonkeyPatches.pyc", line 26, in call
_ctypes.COMError: (-2147417848, '被调用的对象已与其客户端断开连接。', (None, None, None, 0, None))
DEBUGWARNING - NVDAObjects.IAccessible.IAccessible._get_IA2Attributes (10:16:01.905) - MainThread (16148):
IAccessibleObject.attributes COMError (-2147417848, '被调用的对象已与其客户端断开连接。', (None, None, None, 0, None))
DEBUGWARNING - NVDAObjects.IAccessible.IAccessible.get_IAccessibleRole (10:16:01.906) - MainThread (16148):
accRole failed: (-2147417848, '被调用的对象已与其客户端断开连接。', (None, None, None, 0, None))
IO - speech.speak (10:16:01.990) - MainThread (16148):
Speaking ['桌面', '列表']
IO - speech.speak (10:16:01.994) - MainThread (16148):
Speaking ['NVDA']
IO - inputCore.InputManager.executeGesture (10:16:02.333) - winInputHook (15064):
Input: kb(laptop):alt+f4
IO - speech.speak (10:16:02.482) - MainThread (16148):
Speaking ['关闭 Windows', '对话框', '关闭所有应用，然后关闭电脑。']
IO - speech.speak (10:16:02.488) - MainThread (16148):
Speaking ['希望计算机做什么(W)?', '组合框', '关机', '已折叠', 'Alt+w']
IO - inputCore.InputManager.executeGesture (10:16:02.830) - winInputHook (15064):
Input: kb(laptop):alt+f4
IO - speech.speak (10:16:02.882) - MainThread (16148):
Speaking ['桌面', '列表']
IO - speech.speak (10:16:02.884) - MainThread (16148):
Speaking ['NVDA']
IO - inputCore.InputManager.executeGesture (10:16:03.137) - winInputHook (15064):
Input: kb(laptop):alt+tab
IO - speech.speak (10:16:03.211) - MainThread (16148):
Speaking ['亲情永恒']
IO - speech.speak (10:16:03.442) - MainThread (16148):
Speaking ['亲情永恒']
IO - speech.speak (10:16:03.553) - MainThread (16148):
Speaking ['输入', '编辑框', '多行', '空白']
IO - inputCore.InputManager.executeGesture (10:16:03.746) - winInputHook (15064):
Input: kb(laptop):alt+f4
IO - speech.speak (10:16:03.823) - MainThread (16148):
Speaking ['桌面', '列表']
IO - speech.speak (10:16:03.831) - MainThread (16148):
Speaking ['NVDA']
IO - inputCore.InputManager.executeGesture (10:16:04.903) - winInputHook (15064):
Input: kb(laptop):control+NVDA+i
IO - speech.speak (10:16:04.907) - MainThread (16148):
Speaking ['识别查看对象']
DEBUGWARNING - Python warning (10:16:04.908) - MainThread (16148):
C:\Users\manch\AppData\Roaming\nvda\addons\CloudVision\globalPlugins\CloudVision_init.py:198: wxPyDeprecationWarning: Call to deprecated item EmptyBitmap. Use :class:wx.Bitmap instead
bmp = wx.EmptyBitmap(width, height)
IO - inputCore.InputManager.executeGesture (10:16:05.049) - winInputHook (15064):
Input: kb(laptop):control+NVDA+i
INFO - external:globalPlugins.fixime.patchIMESupport.inputLangChangeNotify (10:16:17.933) - Dummy-5 (2988):
threadID16148
hkl:134481924
layoutString微软拼音

INFO - external:globalPlugins.fixime.patchIMESupport.patchedHICM (10:16:17.951) - MainThread (16148):
oldFlags
1
newFlags:
1025
lcid
2052
INFO - external:globalPlugins.fixime.patchIMESupport.inputLangChangeNotify (10:16:17.991) - Dummy-5 (2988):
threadID392
hkl:134481924
layoutString微软拼音

IO - speech.speak (10:16:18.091) - MainThread (16148):
Speaking ['识别结果']
IO - inputCore.InputManager.executeGesture (10:16:19.741) - winInputHook (15064):
Input: kb(laptop):tab
IO - inputCore.InputManager.executeGesture (10:16:20.094) - winInputHook (15064):
Input: kb(laptop):escape
CRITICAL - watchdog._crashHandler (10:16:18.282) - Dummy-6 (392):
NVDA crashed! Minidump written to C:\Users\manch\AppData\Local\Temp\nvda.log..\nvda_crash.dmp
INFO - watchdog._crashHandler (10:16:20.650) - Dummy-6 (392):
Listing stacks for Python threads:
Python stack for thread 392 (Dummy-6):
File "watchdog.pyc", line 213, in _crashHandler
File "watchdog.pyc", line 63, in getFormattedStacksForAllThreads

Python stack for thread 15732 (virtualBuffers.MSHTML.VirtualBuffer.loadBuffer):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in bootstrap_inner
File "threading.pyc", line 870, in run
File "virtualBuffers_init.pyc", line 444, in _loadBuffer

Python stack for thread 13792 (watchdog):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "threading.pyc", line 870, in run
File "watchdog.pyc", line 120, in _watcher

Python stack for thread 15064 (winInputHook):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "threading.pyc", line 870, in run
File "winInputHook.pyc", line 79, in hookThreadFunc

Python stack for thread 6816 (_UIAHandler.UIAHandler.MTAThread):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "threading.pyc", line 870, in run
File "_UIAHandler.pyc", line 315, in MTAThreadFunc
File "queue.pyc", line 170, in get
File "threading.pyc", line 296, in wait

Python stack for thread 12476 (Thread-1):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "C:\Users\manch\AppData\Roaming\nvda\addons\vocalizer_expressive2_driver\synthDrivers\vocalizer_expressive2_vocalizer.py", line 34, in run
instance, inText = self._bgQueue.get()
File "queue.pyc", line 170, in get
File "threading.pyc", line 296, in wait

Python stack for thread 16148 (MainThread):
File "nvda.pyw", line 247, in
File "core.pyc", line 568, in main
File "wx\core.pyc", line 2134, in MainLoop
File "gui_init_.pyc", line 1062, in Notify
File "core.pyc", line 544, in run
File "baseObject.pyc", line 159, in invalidateCaches
File "comtypesMonkeyPatches.pyc", line 105, in newCpbDel
File "comtypes_init_.pyc", line 918, in del
File "comtypes_init_.pyc", line 1172, in Release
File "comtypesMonkeyPatches.pyc", line 26, in call

INFO - watchdog._crashHandler (10:16:20.650) - Dummy-6 (392):
Restarting due to crash
IO - speech.speak (10:16:20.697) - MainThread (16148):
Speaking ['正在加载文档...']

CloudVision V3.20, how does Be My AI integration work?

Greetings @alekssamos
V3.20 of CloudVision provides support for Be My AI. It even offers a check box for it which is checked by default. So how does it work, or what does it need? My results with the updated add-on seem similar to the ones with the older release. So please kindly shed some light on it.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.