Comments (3)
Keep it simple. I think it would be sufficient to have a user option (similar to the tesseract path option) for the language / script, preset to `eng` (the default language, which is always installed). The user would be responsible for installing and selecting the right models; otherwise Tesseract would simply fail with an error.
`Latin` (or `script/Latin`, depending on your installation) is a good choice for all texts based on Latin script. Some users might need `Cyrillic`, `Greek`, `Arabic` or other scripts. The user option would also allow setting `Latin+Greek+Arabic`, for example, so I see no need to ask each time.
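To illustrate how such an option could be passed on, here is a minimal Python sketch (not the plugin's actual code, which is written in JavaScript). It relies only on Tesseract's documented `-l` option and the `pdf` output config; the function names `split_languages` and `build_ocr_command` are hypothetical, as is the fallback to `eng` for an empty option.

```python
def split_languages(spec: str) -> list[str]:
    """Split a language spec such as 'Latin+Greek+Arabic' into its parts."""
    return [part.strip() for part in spec.split("+") if part.strip()]


def build_ocr_command(tesseract_path: str, image: str, output_base: str,
                      languages: str = "eng") -> list[str]:
    """Build an argument list that passes the user's language option via -l.

    Hypothetical helper: an empty option falls back to 'eng' (an assumption).
    """
    parts = split_languages(languages) or ["eng"]
    # CLI shape: tesseract IMAGE OUTPUTBASE -l LANG[+LANG...] pdf
    return [tesseract_path, image, output_base, "-l", "+".join(parts), "pdf"]
```

If a selected model is not installed, Tesseract itself reports the error, so the sketch does no validation of its own.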
from zotero-ocr.
Regarding 1.
For Unix systems, it would probably be enough to run the command
tesseract --list-langs > /path/to/file.txt
to write all available languages to a file.
If this works, one could implement a drop-down menu for selecting the language. I think that would be enough.
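As a rough sketch of how the entries for such a menu could be collected, the Python below runs `tesseract --list-langs` and parses its output. It assumes the output consists of one header line starting with "List of" followed by one language per line, and it falls back to stderr when stdout is empty, since some older releases may print the list there. The function names are hypothetical, and this is not how the plugin itself works.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language names from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line (assumed to start with "List of").
        if not line or line.startswith("List of"):
            continue
        langs.append(line)
    return langs


def installed_languages(tesseract_path: str = "tesseract") -> list[str]:
    """Run Tesseract and return the languages it reports as installed."""
    result = subprocess.run(
        [tesseract_path, "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Assumption: fall back to stderr if nothing was written to stdout.
    return parse_list_langs(result.stdout or result.stderr)
```

The returned list could then populate the drop-down menu directly, with no intermediate file.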
A simple solution, a free text box in the new preferences as @stweil suggested, is now implemented.
I am aware of the Tesseract command that lists all available languages, but I don't see a way to call it from Zotero and save its output somewhere. But yes, we could create a file with something like this.
Let us wait a little longer and see how well the simple solution works in practice.