Comments (8)
With this change Lute could support things like Kobo dictionaries with a small amount of effort. From the Discord from hinufo:
To get kobo dicts: https://www.epubor.com/kobo-dictionary-download-and-install.html . The structure for the words within the dictionary is like this: <w><a name="WORD_NAME"> ...description...</w>
... And for variations it's something like this: <w><a name="WORD_NAME"> ...description...<variant="WORD_NAME1"><variant="WORD_NAME2"><variant="WORD_NAME3"></w>
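A minimal sketch of reading entries in the shape described above, assuming the markup really is as quoted (real Kobo files may differ, and they would first need to be unpacked, which is not shown). The names `parse_entries` and `build_variant_index` are hypothetical and not part of Lute.

```python
import re

# Assumed entry shape, copied from the description in this thread:
#   <w><a name="WORD"> ...description...<variant="FORM1"><variant="FORM2"></w>
ENTRY_RE = re.compile(r'<w><a name="([^"]*)">(.*?)</w>', re.DOTALL)
VARIANT_RE = re.compile(r'<variant="([^"]*)">')


def parse_entries(raw):
    """Return {headword: {"description": str, "variants": [str]}}."""
    entries = {}
    for headword, body in ENTRY_RE.findall(raw):
        variants = VARIANT_RE.findall(body)
        # Drop the variant tags so only the description text remains.
        description = VARIANT_RE.sub("", body).strip()
        entries[headword] = {"description": description, "variants": variants}
    return entries


def build_variant_index(entries):
    """Map every headword and every variant form to its headword."""
    index = {}
    for headword, entry in entries.items():
        index[headword] = headword
        for variant in entry["variants"]:
            # Keep the first headword seen for a variant shared by several entries.
            index.setdefault(variant, headword)
    return index
```

With an index like this, a lookup for an inflected form could first resolve to the headword and then show that entry's description.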
That code is a good start. For integration, Lute dict lookups need to have some kind of "type" field, so this would be a "kobo" type, vs an "online" type, and then Lute would exec the proper code. ... OR the "dict url" would need to handle a different "protocol" - http/s for online, "kobo" for lookup, etc. Anyway, it's a code change.
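The "protocol" idea could be sketched roughly as below, assuming the dictionary setting stays a single URL string and the scheme selects the handler. The `kobo` scheme, the handler functions, and their return values are all hypothetical placeholders, not existing Lute code.

```python
from urllib.parse import urlsplit


def lookup_online(dict_url, term):
    # Placeholder: a real handler would open the URL in the dictionary frame.
    return f"online:{dict_url}:{term}"


def lookup_kobo(dict_url, term):
    # Placeholder: a real handler would read the local file named by the URL path.
    return f"kobo:{urlsplit(dict_url).path}:{term}"


# Hypothetical registry: URL scheme -> lookup handler.
HANDLERS = {"http": lookup_online, "https": lookup_online, "kobo": lookup_kobo}


def lookup(dict_url, term):
    """Dispatch a lookup to the handler registered for the URL's scheme."""
    scheme = urlsplit(dict_url).scheme.lower()
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"Unsupported dictionary protocol: {scheme!r}")
    return handler(dict_url, term)
```

A separate "type" field would work the same way, with the field value replacing the scheme as the registry key.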
from lute-v3.
Another dictionary note: From MyCheze in discord:
I'm currently learning Czech, which is a language with a TON of inflections. Every noun and verb has many, many forms. And adjectives follow gender and yadda yadda. I've also learned Spanish which is similar, but has lots of "smushing together" of words (dámelo). Because of this, looking up words can be sort of annoying. Most Czech-English dictionaries are difficult to use because they're not "super extensive" or the lemmatization sucks. For example, negation is a ne- prefix, but the lemmatizers don't always know that. I'm not technical enough to know why that is since it seems like the most obvious thing to check.
I want to build my own dictionary by going through content and saving the words and their inflections so that future Czech learners (and myself) can actually do lookups. But, obviously, that would be a lot of work. I'm happy to do some, but I'd really prefer to start with something. Vocabsieve has pretty good lemmatization (for a ton of languages) and uses Wiktionary for lookups. An integration would be really nice for filling out the Lute database. Maybe it can analyze the text and update some backend information and then let the user go through things. It would be even better if it could somehow use the "known words" database for its text analysis (which is already pretty cool).
jz note: "all points make sense. The lookups are the hardest part, and Lute is at the mercy of what dictionaries can offer. Perhaps what might make sense would be to have a separate process build a dictionary using spaCy (https://www.tutorialspoint.com/spacy/spacy_models_and_languages.htm) to generate inflected forms of words, and then have a "custom dictionary" that has a map of inflected forms to their root/lemma form. That's technically feasible as a separate process, outside of Lute or whatever tool. For it to be available within Lute, Lute would then need to support custom dictionaries, which is a good idea that's come up before (#6)."
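A rough sketch of the "map of inflected forms to their root/lemma form" idea, built as a separate step outside Lute. The lemmatizer is passed in as a plain callable so the sketch runs on its own; in practice it might wrap a spaCy pipeline, but that wiring is not shown. The function names and the `ne` prefix fallback (prompted by the Czech negation point above) are assumptions for illustration only.

```python
def build_lemma_map(words, lemmatize):
    """Map each lowercased inflected form to its lemma.

    `lemmatize` is any callable taking a word and returning its lemma.
    """
    lemma_map = {}
    for word in words:
        # Lowercase to mirror how this thread says Lute currently saves terms.
        form = word.lower()
        lemma_map.setdefault(form, lemmatize(form))
    return lemma_map


def find_lemma(form, lemma_map, negation_prefix="ne"):
    """Look up a form, retrying without the negation prefix if it is missing."""
    form = form.lower()
    if form in lemma_map:
        return lemma_map[form]
    if negation_prefix and form.startswith(negation_prefix):
        # Naive fallback: strip the prefix and try once more.
        return lemma_map.get(form[len(negation_prefix):])
    return None
```

The prefix fallback is deliberately crude: it would also strip `ne` from words where it is not a negation, so a real dictionary builder would need a better check.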
from lute-v3.
Another change I would suggest is a control for whether to search for the word in lowercase or with the same capitalization as in the original text. This makes a difference for a language like German. For example, the page https://de.m.wiktionary.org/wiki/Mimik has content and the page https://de.m.wiktionary.org/wiki/mimik is empty.
from lute-v3.
Thank you @jahnke, great point. I haven't studied German with Lute yet but really want to. I believe that some other changes may be necessary -- I think Lute downcases things automatically when saving terms, which perhaps isn't a valid design choice. Will check.
EDIT:
- yes, Lute downcases all terms, so "Tag" gets saved as "tag". Will need to investigate further on how saving the case as part of the term would affect things.
- It appears that different dictionaries handle things differently. e.g., https://de.thefreedictionary.com/ looks up "mimik" and returns "Mimik". fyi only, I think the case still needs to be handled better.
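One way to handle the case problem, sketched under assumptions: try the term as written in the text first, then fall back to the lowercased form. The `{term}` placeholder and both function names are hypothetical and do not reflect how Lute actually builds dictionary URLs.

```python
from urllib.parse import quote


def lookup_candidates(term):
    """Return the term as written, then its lowercase form if different."""
    candidates = [term]
    lowered = term.lower()
    if lowered != term:
        candidates.append(lowered)
    return candidates


def lookup_urls(url_template, term):
    """Fill the hypothetical `{term}` placeholder for each candidate."""
    return [
        url_template.replace("{term}", quote(candidate))
        for candidate in lookup_candidates(term)
    ]
```

For "Mimik" this yields the capitalized Wiktionary URL first and the lowercase one second, which only helps if the original case is still available at lookup time.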
from lute-v3.
Update: in late v2, and v3, the user can change the term case (but only the case).
from lute-v3.
Picking back up b/c this should be completed, have WIP code and webofpies is working on it.
from lute-v3.
Merged into develop. Note that this code change only addresses the original issue outline.
from lute-v3.
Launched in 3.2.1.
from lute-v3.