
Comments (8)

jzohrab commented on August 27, 2024

With this change Lute could support things like Kobo dictionaries with a small amount of effort. From the Discord from hinufo:

To get Kobo dicts: https://www.epubor.com/kobo-dictionary-download-and-install.html . Each word in the dictionary is structured like this: <w><a name="WORD_NAME"> ...description...</w> ... Entries with variations look something like this: <w><a name="WORD_NAME"> ...description...<variant="WORD_NAME1"><variant="WORD_NAME2"><variant="WORD_NAME3"></w>
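
For illustration only, here is a minimal Python sketch of reading entries in the structure described above. It assumes the tags look exactly as quoted (they are not well-formed HTML, so it uses regular expressions instead of an HTML parser); real Kobo files may differ. The name `parse_entries` and the sample data are hypothetical.

```python
import re

# Patterns follow the structure quoted above; real Kobo files may differ.
ENTRY_RE = re.compile(r"<w>(.*?)</w>", re.DOTALL)
NAME_RE = re.compile(r'<a name="([^"]*)">')
VARIANT_RE = re.compile(r'<variant="([^"]*)">')


def parse_entries(text):
    """Return a dict mapping each headword and variant to its description."""
    lookup = {}
    for body in ENTRY_RE.findall(text):
        name_match = NAME_RE.search(body)
        if name_match is None:
            continue  # skip entries that have no headword tag
        variants = VARIANT_RE.findall(body)
        # Strip the headword and variant tags, leaving only the description.
        description = VARIANT_RE.sub("", NAME_RE.sub("", body, count=1)).strip()
        for word in [name_match.group(1)] + variants:
            # Keep the first description seen for any given word.
            lookup.setdefault(word, description)
    return lookup


# Hypothetical sample data in the quoted format.
sample = (
    '<w><a name="run"> to move quickly'
    '<variant="ran"><variant="running"></w>'
    '<w><a name="cat"> a small feline</w>'
)
entries = parse_entries(sample)
print(entries["ran"])
```

Indexing variants alongside the headword means a lookup for an inflected form lands on the same description without a second pass.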

A basic PHP parser:
[image: screenshot of the PHP parser code]

That code is a good start. For integration, Lute dict lookups need to have some kind of "type" field, so this would be a "kobo" type, vs an "online" type, and then Lute would exec the proper code. ... OR the "dict url" would need to handle a different "protocol" - http/s for online, "kobo" for lookup, etc. Anyway, it's a code change.
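
A rough sketch of the "protocol" option, in Python: the scheme of the dict URL picks the handler. This is not Lute's actual code. The `kobo` scheme, the handler names, and the `HANDLERS` table are all assumptions made for illustration, and the handlers are placeholders that only describe what they would do.

```python
from urllib.parse import urlsplit


def lookup_online(dict_url, term):
    # Placeholder: an online dictionary would be opened for the term.
    return f"open {dict_url} for {term!r}"


def lookup_kobo(dict_url, term):
    # Placeholder: a real handler would search the local Kobo file.
    path = urlsplit(dict_url).path
    return f"search {path} for {term!r}"


# Hypothetical mapping from URL scheme to lookup handler.
HANDLERS = {
    "http": lookup_online,
    "https": lookup_online,
    "kobo": lookup_kobo,
}


def lookup(dict_url, term):
    """Dispatch a lookup based on the scheme of the dictionary URL."""
    scheme = urlsplit(dict_url).scheme
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"unsupported dictionary protocol: {scheme!r}")
    return handler(dict_url, term)


print(lookup("kobo:///dicts/cs.html", "pes"))
```

The same table could just as well be keyed on a separate "type" field; the scheme approach only avoids adding a new field to the dictionary settings.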

from lute-v3.

jzohrab commented on August 27, 2024

Another dictionary note: From MyCheze in discord:

I'm currently learning Czech, which is a language with a TON of inflections. Every noun and verb has many, many forms. And adjectives follow gender and yadda yadda. I've also learned Spanish which is similar, but has lots of "smushing together" of words (dámelo). Because of this, looking up words can be sort of annoying. Most Czech-English dictionaries are difficult to use because they're not "super extensive" or the lemmatization sucks. For example, negation is a ne- prefix, but the lemmatizers don't always know that. I'm not technical enough to know why that is since it seems like the most obvious thing to check.

I want to build my own dictionary by going through content and saving the words and their inflections so that future Czech learners (and myself) can actually do lookups. But, obviously, that would be a lot of work. I'm happy to do some, but I'd really prefer to start with something. Vocabsieve has pretty good lemmatization (for a ton of languages) and uses Wiktionary for lookups. An integration would be really nice for filling out the Lute database. Maybe it can analyze the text and update some backend information and then let the user go through things. It would be even better if it could somehow use the "known words" database for its text analysis (which is already pretty cool).

jz note: "all points make sense. The lookups are the hardest part, and Lute is at the mercy of what dictionaries can offer. Perhaps what might make sense would be to have a separate process build a dictionary using spaCy (https://www.tutorialspoint.com/spacy/spacy_models_and_languages.htm) to generate inflected forms of words, and then have a "custom dictionary" that has a map of inflected forms to their root/lemma form. That's technically feasible as a separate process, outside of Lute or whatever tool. For it to be available within Lute, Lute would then need to support custom dictionaries, which is a good idea that's come up before (#6)."
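
To make the "map of inflected forms to their root/lemma form" idea concrete, here is a small Python sketch. It assumes the table has already been built by some separate process (the note suggests spaCy; this example just hard-codes a few Czech forms). The `FORM_TO_LEMMA` table, `find_lemma`, and the "ne-" fallback are illustrative assumptions, not an existing Lute feature.

```python
# Hypothetical table; in practice a separate process would generate it.
FORM_TO_LEMMA = {
    "psa": "pes",
    "psem": "pes",
    "dělám": "dělat",
    "děláš": "dělat",
}


def find_lemma(form, table=FORM_TO_LEMMA):
    """Return the lemma for an inflected form, or None if it is unknown."""
    key = form.lower()
    if key in table:
        return table[key]
    # Assumed fallback for Czech negation: retry without the "ne" prefix.
    if key.startswith("ne") and key[2:] in table:
        return table[key[2:]]
    return None


print(find_lemma("Nedělám"))
```

The prefix fallback is deliberately naive: it only fires when the remainder is already a known form, so ordinary words that merely start with "ne" are not stripped unless the rest happens to be in the table.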


jahnke commented on August 27, 2024

Another change I would suggest is the control if you want to search for the word using lowercase or using the same capitalization as in the original text. This makes a difference for a language like German. For example, the page https://de.m.wiktionary.org/wiki/Mimik has content and the page https://de.m.wiktionary.org/wiki/mimik is empty.
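
As a sketch of what such a control could look like (Python; not Lute's actual code), the lookup URL can be built with or without lowercasing the term. The `{term}` placeholder, `build_lookup_url`, and the `keep_case` flag are assumptions made up for this example.

```python
from urllib.parse import quote


def build_lookup_url(template, term, keep_case=True):
    """Fill the hypothetical {term} placeholder, optionally lowercasing."""
    query = term if keep_case else term.lower()
    # quote() percent-encodes characters that are not safe in a URL path.
    return template.replace("{term}", quote(query))


template = "https://de.m.wiktionary.org/wiki/{term}"
print(build_lookup_url(template, "Mimik"))
print(build_lookup_url(template, "Mimik", keep_case=False))
```

With `keep_case=True` the German example above resolves to the page that has content; with `keep_case=False` it resolves to the empty one.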


jzohrab commented on August 27, 2024

Thank you @jahnke, great point. I haven't studied German with Lute yet but really want to. I believe that some other changes may be necessary -- I think Lute downcases things automatically when saving terms, which perhaps isn't a valid design choice. Will check.


EDIT:

  • yes, Lute downcases all terms, so "Tag" gets saved as "tag". Will need to investigate further on how saving the case as part of the term would affect things.
  • Different dictionaries appear to handle case differently. For example, https://de.thefreedictionary.com/ looks up "mimik" and returns "Mimik". This is FYI only; I think case still needs to be handled better.


jzohrab commented on August 27, 2024

Update: in late v2, and v3, the user can change the term case (but only the case).


jzohrab commented on August 27, 2024

Picking this back up because it should be completed. I have WIP code, and webofpies is working on it.


jzohrab commented on August 27, 2024

Merged into develop. Note that this code change only addresses the original issue outline.


jzohrab commented on August 27, 2024

Launched in 3.2.1.

