alekssamos / cloudvision
Vision Bot NVDA addon
Home Page: https://visionbot.ru/addon/
License: MIT License
The issues described below affect NVDA 2019.3 through NVDA 2022.4. A temporary workaround follows the description of the issues.
When I capture an image by pressing NVDA + CTRL + I, the plugin recognizes the text without problems the first 2 or 3 times.
After that, however, the plugin simply stops speaking the captured text.
The first time I read previously captured text by pressing NVDA + CTRL + I twice, the plugin lets me read the text without problems, but after I close the curtain to read the text, the plugin stops working completely.
I resolved the issues mentioned above as follows.
If you want to use an older version of NVDA, 2019.3 is the minimum and recommended version for this plugin, with the following settings:
NVDA + 5: enabled
NVDA + 6: disabled
NVDA + 7: disabled
NVDA + 8: disabled
NVDA + M: on
Also, remember to keep the following option enabled in the browse mode settings:
"Enable browse mode on page load."
From NVDA 2023.1 onward, the following problems usually appear:
Sometimes an error occurs when sending the analysis to the server.
When you run recognition for the first time on a website or in a program, the full text is displayed, but if you switch programs or interfaces, the text is displayed incomplete and the scan stops before the required time.
Once again, this plugin is currently recommended only for NVDA 2019.3 to NVDA 2022.4.
The comments below were part of finding the solution I just described, so you don't need to read them.
@alekssamos Greetings. Now that CloudVision supports Be My AI, would it be possible to enhance the add-on so that users can ask follow-up questions after image recognition finishes, as JAWS does in its May 2024 release?
Thanks.
Hello. I submitted a pull request that updates the Ukrainian translation of the NVDA add-on, because the current translation contains some errors. Two updates have been released since then, but the updated translation has not been included in the add-on. Please include it, and tell me why it was not included.
@alekssamos First and foremost, thanks for fixing the Be My AI issue in V3.2.0.1.
Now that Be My AI is available, would it be possible to extend CloudVision so that it can recognize the currently focused window or the whole screen? This would help a lot with, say, images opened in Telegram, or inaccessible apps that display text which cannot be reached via NVDA's cursor/object navigation keystrokes. Maybe new hotkeys could be added to cover screen/window recognition. The current approach is either object-based or file-based, and my suggestion would expand the usefulness of the add-on.
Thanks.
NVDA: alpha-21371,376edfcf
Add-on Version: 2.0.3.7
Quickly press NVDA+CTRL+I twice. After the recognition result is returned, an error sound will be heard, and NVDA is forced to restart.
error log:
INFO - main (10:15:59.742) - MainThread (16148):
Starting NVDA version alpha-21371,376edfcf
INFO - core.main (10:15:59.888) - MainThread (16148):
Config dir: C:\Users\manch\AppData\Roaming\nvda
INFO - config.ConfigManager._loadConfig (10:15:59.888) - MainThread (16148):
Loading config: C:\Users\manch\AppData\Roaming\nvda\nvda.ini
INFO - core.main (10:16:00.001) - MainThread (16148):
Using Windows version 10.0.19042 workstation
INFO - core.main (10:16:00.001) - MainThread (16148):
Using Python version 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:01:55) [MSC v.1900 32 bit (Intel)]
INFO - core.main (10:16:00.001) - MainThread (16148):
Using comtypes version 1.1.7
INFO - core.main (10:16:00.002) - MainThread (16148):
Using configobj version 5.1.0 with validate version 1.0.1
INFO - synthDriverHandler.setSynth (10:16:00.504) - MainThread (16148):
Loaded synthDriver vocalizer_expressive2
INFO - core.main (10:16:00.504) - MainThread (16148):
Using wx version 4.0.3 msw (phoenix) wxWidgets 3.0.5 with six version 1.12.0
INFO - brailleInput.initialize (10:16:00.505) - MainThread (16148):
Braille input initialized
INFO - braille.initialize (10:16:00.506) - MainThread (16148):
Using liblouis version 3.15.0
INFO - braille.initialize (10:16:00.509) - MainThread (16148):
Using pySerial version 3.4
INFO - braille.BrailleHandler.setDisplayByName (10:16:00.513) - MainThread (16148):
Loaded braille display driver noBraille, current display has 0 cells.
INFO - core.main (10:16:00.721) - MainThread (16148):
Java Access Bridge support initialized
INFO - _UIAHandler.UIAHandler.MTAThreadFunc (10:16:00.732) - _UIAHandler.UIAHandler.MTAThread (6816):
UIAutomation: IUIAutomation6
DEBUGWARNING - inputCore.InputManager.loadLocaleGestureMap (10:16:00.822) - MainThread (16148):
No locale gesture map for language zh_CN
DEBUGWARNING - touchHandler.touchSupported (10:16:00.858) - MainThread (16148):
No touch devices found
DEBUGWARNING - abstractEngine.AbstractEngineHandler.getEngineList (10:16:00.929) - MainThread (16148):
Engine 'captcha' doesn't pass the check, excluding from list
DEBUGWARNING - abstractEngine.AbstractEngineHandler.getEngineList (10:16:00.931) - MainThread (16148):
Engine 'sougouOCR' doesn't pass the check, excluding from list
DEBUGWARNING - abstractEngine.AbstractEngineHandler.getEngineList (10:16:00.938) - MainThread (16148):
Engine 'machineLearning' doesn't pass the check, excluding from list
IO - external:globalPlugins.fixime.GuessComposition.get_code_by_name (10:16:00.949) - MainThread (16148):
controlconverted to 17
IO - external:globalPlugins.fixime.GuessComposition.get_code_by_name (10:16:00.949) - MainThread (16148):
numpadenterconverted to 13
IO - external:globalPlugins.fixime.GuessComposition.get_code_by_name (10:16:00.949) - MainThread (16148):
enterconverted to 13
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.954) - MainThread (16148):
2052
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.954) - MainThread (16148):
00000804
IO - inputCore.InputManager.executeGesture (10:16:00.958) - winInputHook (15064):
Input: kb(laptop):shift+windows+space
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.961) - MainThread (16148):
2052
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.961) - MainThread (16148):
00000804
IO - inputCore.InputManager.executeGesture (10:16:00.962) - winInputHook (15064):
Input: kb(laptop):windows+space
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.965) - MainThread (16148):
2052
INFO - external:globalPlugins.fixime.GlobalPlugin.refreshLayoutString (10:16:00.965) - MainThread (16148):
00000804
INFO - core.main (10:16:01.128) - MainThread (16148):
NVDA initialized
IO - speech.speak (10:16:01.293) - MainThread (16148):
Speaking ['空格', EndUtteranceCommand()]
IO - speech.speak (10:16:01.295) - MainThread (16148):
Speaking ['空格', EndUtteranceCommand()]
IO - speech.speak (10:16:01.386) - MainThread (16148):
Speaking ['放大', '按钮']
IO - inputCore.InputManager.executeGesture (10:16:01.885) - winInputHook (15064):
Input: kb(laptop):escape
DEBUGWARNING - NVDAObjects.IAccessible.IAccessible._get_IA2States (10:16:01.904) - MainThread (16148):
could not get IAccessible2 states
Traceback (most recent call last):
File "NVDAObjects\IAccessible\__init__.pyc", line 1633, in _get_IA2States
File "comtypesMonkeyPatches.pyc", line 26, in __call__
_ctypes.COMError: (-2147417848, '被调用的对象已与其客户端断开连接。', (None, None, None, 0, None))
DEBUGWARNING - NVDAObjects.IAccessible.IAccessible._get_IA2Attributes (10:16:01.905) - MainThread (16148):
IAccessibleObject.attributes COMError (-2147417848, '被调用的对象已与其客户端断开连接。', (None, None, None, 0, None))
DEBUGWARNING - NVDAObjects.IAccessible.IAccessible._get_IAccessibleRole (10:16:01.906) - MainThread (16148):
accRole failed: (-2147417848, '被调用的对象已与其客户端断开连接。', (None, None, None, 0, None))
IO - speech.speak (10:16:01.990) - MainThread (16148):
Speaking ['桌面', '列表']
IO - speech.speak (10:16:01.994) - MainThread (16148):
Speaking ['NVDA']
IO - inputCore.InputManager.executeGesture (10:16:02.333) - winInputHook (15064):
Input: kb(laptop):alt+f4
IO - speech.speak (10:16:02.482) - MainThread (16148):
Speaking ['关闭 Windows', '对话框', '关闭所有应用,然后关闭电脑。']
IO - speech.speak (10:16:02.488) - MainThread (16148):
Speaking ['希望计算机做什么(W)?', '组合框', '关机', '已折叠', 'Alt+w']
IO - inputCore.InputManager.executeGesture (10:16:02.830) - winInputHook (15064):
Input: kb(laptop):alt+f4
IO - speech.speak (10:16:02.882) - MainThread (16148):
Speaking ['桌面', '列表']
IO - speech.speak (10:16:02.884) - MainThread (16148):
Speaking ['NVDA']
IO - inputCore.InputManager.executeGesture (10:16:03.137) - winInputHook (15064):
Input: kb(laptop):alt+tab
IO - speech.speak (10:16:03.211) - MainThread (16148):
Speaking ['亲情永恒']
IO - speech.speak (10:16:03.442) - MainThread (16148):
Speaking ['亲情永恒']
IO - speech.speak (10:16:03.553) - MainThread (16148):
Speaking ['输入', '编辑框', '多行', '空白']
IO - inputCore.InputManager.executeGesture (10:16:03.746) - winInputHook (15064):
Input: kb(laptop):alt+f4
IO - speech.speak (10:16:03.823) - MainThread (16148):
Speaking ['桌面', '列表']
IO - speech.speak (10:16:03.831) - MainThread (16148):
Speaking ['NVDA']
IO - inputCore.InputManager.executeGesture (10:16:04.903) - winInputHook (15064):
Input: kb(laptop):control+NVDA+i
IO - speech.speak (10:16:04.907) - MainThread (16148):
Speaking ['识别查看对象']
DEBUGWARNING - Python warning (10:16:04.908) - MainThread (16148):
C:\Users\manch\AppData\Roaming\nvda\addons\CloudVision\globalPlugins\CloudVision\__init__.py:198: wxPyDeprecationWarning: Call to deprecated item EmptyBitmap. Use :class:`wx.Bitmap` instead
bmp = wx.EmptyBitmap(width, height)
IO - inputCore.InputManager.executeGesture (10:16:05.049) - winInputHook (15064):
Input: kb(laptop):control+NVDA+i
INFO - external:globalPlugins.fixime.patchIMESupport.inputLangChangeNotify (10:16:17.933) - Dummy-5 (2988):
threadID16148
hkl:134481924
layoutString微软拼音
INFO - external:globalPlugins.fixime.patchIMESupport.patchedHICM (10:16:17.951) - MainThread (16148):
oldFlags
1
newFlags:
1025
lcid
2052
INFO - external:globalPlugins.fixime.patchIMESupport.inputLangChangeNotify (10:16:17.991) - Dummy-5 (2988):
threadID392
hkl:134481924
layoutString微软拼音
IO - speech.speak (10:16:18.091) - MainThread (16148):
Speaking ['识别结果']
IO - inputCore.InputManager.executeGesture (10:16:19.741) - winInputHook (15064):
Input: kb(laptop):tab
IO - inputCore.InputManager.executeGesture (10:16:20.094) - winInputHook (15064):
Input: kb(laptop):escape
CRITICAL - watchdog._crashHandler (10:16:18.282) - Dummy-6 (392):
NVDA crashed! Minidump written to C:\Users\manch\AppData\Local\Temp\nvda.log..\nvda_crash.dmp
INFO - watchdog._crashHandler (10:16:20.650) - Dummy-6 (392):
Listing stacks for Python threads:
Python stack for thread 392 (Dummy-6):
File "watchdog.pyc", line 213, in _crashHandler
File "watchdog.pyc", line 63, in getFormattedStacksForAllThreads
Python stack for thread 15732 (virtualBuffers.MSHTML.VirtualBuffer.loadBuffer):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "threading.pyc", line 870, in run
File "virtualBuffers\__init__.pyc", line 444, in _loadBuffer
Python stack for thread 13792 (watchdog):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "threading.pyc", line 870, in run
File "watchdog.pyc", line 120, in _watcher
Python stack for thread 15064 (winInputHook):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "threading.pyc", line 870, in run
File "winInputHook.pyc", line 79, in hookThreadFunc
Python stack for thread 6816 (_UIAHandler.UIAHandler.MTAThread):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "threading.pyc", line 870, in run
File "_UIAHandler.pyc", line 315, in MTAThreadFunc
File "queue.pyc", line 170, in get
File "threading.pyc", line 296, in wait
Python stack for thread 12476 (Thread-1):
File "threading.pyc", line 890, in _bootstrap
File "threading.pyc", line 926, in _bootstrap_inner
File "C:\Users\manch\AppData\Roaming\nvda\addons\vocalizer_expressive2_driver\synthDrivers\vocalizer_expressive2_vocalizer.py", line 34, in run
instance, inText = self._bgQueue.get()
File "queue.pyc", line 170, in get
File "threading.pyc", line 296, in wait
Python stack for thread 16148 (MainThread):
File "nvda.pyw", line 247, in <module>
File "core.pyc", line 568, in main
File "wx\core.pyc", line 2134, in MainLoop
File "gui\__init__.pyc", line 1062, in Notify
File "core.pyc", line 544, in run
File "baseObject.pyc", line 159, in invalidateCaches
File "comtypesMonkeyPatches.pyc", line 105, in newCpbDel
File "comtypes\__init__.pyc", line 918, in __del__
File "comtypes\__init__.pyc", line 1172, in Release
File "comtypesMonkeyPatches.pyc", line 26, in __call__
INFO - watchdog._crashHandler (10:16:20.650) - Dummy-6 (392):
Restarting due to crash
IO - speech.speak (10:16:20.697) - MainThread (16148):
Speaking ['正在加载文档...']
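A side note on the wxPyDeprecationWarning in the log above: it is most likely unrelated to the crash itself, but the call to wx.EmptyBitmap could be replaced as the warning suggests. The following is only a minimal sketch, not the add-on's actual code. The helper name make_blank_bitmap and the idea of passing the wx module in as a parameter are assumptions made for illustration, so that the logic can be checked without a display. It relies on wxPython 4 listing "phoenix" in wx.PlatformInfo, which matches the "wx version 4.0.3 msw (phoenix)" line in the log.

```python
def make_blank_bitmap(wx_module, width, height):
    """Create a blank bitmap with the constructor the given wx module supports.

    Hypothetical helper: `wx_module` is the imported `wx` package, passed in
    explicitly so the function can be tried with a stand-in object.
    """
    # wxPython 4 (Phoenix) reports "phoenix" in wx.PlatformInfo and accepts
    # wx.Bitmap(width, height); wx.EmptyBitmap still works there but emits
    # the deprecation warning seen in the log.
    if "phoenix" in getattr(wx_module, "PlatformInfo", ()):
        return wx_module.Bitmap(width, height)
    # Classic wxPython: fall back to the older size-based constructor.
    return wx_module.EmptyBitmap(width, height)
```

Inside the add-on this would presumably be called as make_blank_bitmap(wx, width, height) in place of the line shown in the warning.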
Greetings @alekssamos
V3.20 of CloudVision provides support for Be My AI. It even offers a check box for it which is checked by default. So how does it work, or what does it need? My results with the updated add-on seem similar to the ones with the older release. So please kindly shed some light on it.