luteorg / lute-v3
LUTE = Learning Using Texts: learn languages through reading. Python/Flask.
License: MIT License
It would be very nice to have a chapter marking or similar, and then have a table of contents or similar, showing the current chapter number at the top of the page perhaps.
When reading long texts, I sometimes want to know how many pages until I reach the end of the chapter.
Description
When using https://ctext.org/dictionary.pl?if=en&char=### as a dictionary 1, clicking a word in the text
To Reproduce
Steps to reproduce the behavior, e.g.:
This issue does not come up with other dictionaries such as:
https://www.archchinese.com/chinese_english_dictionary.html?find=###
Extra software info, if not already included in the Description:
Description
I've noticed that some lines (not very common, but some) are out of order compared with the original text file that was imported. I noticed this while listening along with an audiobook: certain passages would be skipped and then returned to a few moments later. It's not an issue with the text file (see attachments).
To Reproduce
Steps to reproduce the behavior, e.g.:
Screenshots
This is the text I confirmed with: Harry Potter a Fenixuv rad - J. K. Rowling.txt. An example of a line that is out of place is "S mou drahou starou matičkou, ano", which is on line 1889.
Here it is in Lute out of order:
Extra software info, if not already included in the Description:
It would be nice to be able to export individual books into files that someone else can load into their Lute. The export would include all the words/definitions, the audio, the bookmarks, everything. I think a feature like that would really help form a Lute community. I know that I'd love to share my texts with other Czech learners.
This would be something like an "anki package" zip file. Work involved:
Lute can import .txt files, would be nice to also support importing .epub.
There are python libraries for importing .epubs, eg: https://andrew-muller.medium.com/getting-text-from-epub-files-in-python-fbfe5df5c2da
Don't know if that's the best one.
The code in develop has things in place for the epub import to be implemented:
/lute/book/routes.py method _get_file_content(filefielddata) has a check for the filename extension, and calls the service.py for epub parsing.
/lute/book/service.py has a stub method get_epub_content(epub_file_field_data) to be implemented.
/tests/acceptance/book.feature has a commented-out epub import test. The implementation should add a short sample epub file to tests/acceptance/sample_files/.
The code has a few comments with "todo epub:", where things should be updated:
$ inv todos | grep -i epub
Group: epub
./pyproject.toml : # TODO epub: add epub parsing library to dependencies
./tests/acceptance/book.feature : # TODO epub: add an epub file to sample_files, activate this test.
./lute/book/service.py : raise ValueError("TODO epub: to be implemented.")
./lute/book/forms.py : # TODO epub: add epub to the list, change prompt.
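As a rough idea of what the epub parsing could involve, here is a minimal sketch that pulls plain text out of an epub using only the Python standard library. This is not the approach from the linked article (which uses a third-party library), and the function name epub_bytes_to_text is made up for illustration. It assumes the stub would receive, or could obtain, the raw bytes of the uploaded file, and that the chapters are UTF-8 XHTML. Real-world epubs have more edge cases than this handles.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


class _TextExtractor(HTMLParser):
    """Collects text content, adding a newline after block-level elements."""

    SKIP_TAGS = {"head", "script", "style"}
    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def epub_bytes_to_text(epub_bytes):
    """Hypothetical helper: return the book text in spine (reading) order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # container.xml points at the package (.opf) file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
        opf = ET.fromstring(zf.read(opf_path))

        # The manifest maps ids to files; the spine lists ids in reading order.
        manifest = {
            item.attrib["id"]: item.attrib["href"]
            for item in opf.findall("opf:manifest/opf:item", NS)
        }
        base = posixpath.dirname(opf_path)

        chapters = []
        for ref in opf.findall("opf:spine/opf:itemref", NS):
            href = manifest[ref.attrib["idref"]]
            extractor = _TextExtractor()
            extractor.feed(zf.read(posixpath.join(base, href)).decode("utf-8"))
            chapters.append("".join(extractor.parts).strip())

    return "\n\n".join(ch for ch in chapters if ch)
```

A dedicated epub library would likely be more robust; the point here is only that the extraction step itself is small.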
Is your feature request related to a problem? Please describe.
I sometimes have some dead time in my day when I'd like to just "add more words to my dictionary." I don't really want to read, I just want to mindlessly add more terms and definitions so that when I come across them in reading, it's more seamless.
Describe the solution you'd like
To be able to enter a special view of a book where it only has each new/unknown word ONCE, in order of frequency. Then I can just go through them.
It'd also be cool to have the same, but in alphabetical order so that you might find word families and be able to "kill a bunch of birds with one copy and paste." (Add the head word, then copy it and paste it as a parent in the next 5-6 in the list)
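A minimal sketch of the listing described above, assuming the book is already available as a list of word tokens and the known terms as a set of lowercase strings. Both inputs and the function name are hypothetical; how Lute would actually supply them is not covered here.

```python
from collections import Counter


def unknown_terms(tokens, known_terms, order="frequency"):
    """Return (term, count) pairs, each unknown term appearing once.

    tokens: iterable of word strings from the book (assumed input).
    known_terms: set of lowercase terms the user already has (assumed input).
    order: "frequency" (most common first) or "alphabetical".
    """
    counts = Counter(
        t.lower() for t in tokens if t.lower() not in known_terms
    )
    if order == "alphabetical":
        # Plain code-point ordering; a real implementation might want
        # language-aware collation instead.
        return sorted(counts.items())
    return counts.most_common()
```

The alphabetical variant is what would put word families next to each other for the copy-and-paste workflow.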
Store audio in same parent folder as where images are stored, perhaps ... or stream from URI?
Requirements:
The grinberg article appears to be the best resource.
References:
Audio files could be stored in the user's data folder, and files could be found by md5 of term, e.g.
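One way the md5 lookup might work, as a sketch only. The "audio" subfolder, the ".mp3" extension, and the choice to mix the language id into the hash and to lowercase the term are all assumptions made for the example, not decisions from this issue.

```python
import hashlib
from pathlib import Path


def term_audio_path(data_dir, lang_id, term_text, ext=".mp3"):
    """Hypothetical helper: map a term to a stable audio file path.

    The language id is part of the hashed key so that the same spelling in
    two languages does not collide (an assumption, not a requirement).
    """
    key = f"{lang_id}:{term_text.strip().lower()}"
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return Path(data_dir) / "audio" / f"{digest}{ext}"
```

Because the file name is derived from the term, no extra database column would be needed to find the audio.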
Big effort required:
Lots of things to do here.
If I know a book has a term, I just want to search for it somewhere, and have the pages where it shows up.
Either a slider, text box, or select box. When reading, I sometimes want to jump to the first page, or the list, arbitrarily. Shouldn't have to go through the book page by page.
Prerequisite for this: #86
Hotkey left-right moves to terms, so up/down could change status.
Would need to go up related to the current status. If multiple terms chosen, could just start with the lowest status. Go from 1-2-3-4-5-WellKnown, skip "ignored".
The function to update is in lute.js, handle_keydown -- at least, that is how I was intending to do it. If there is a better option, LMK.
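The stepping rule above, sketched in Python just to pin down the logic (the real change would live in lute.js). The numeric code 99 for "Well Known" is an assumption here, and the fallback for a selection with no status on the ladder is my own choice.

```python
# Ordered ladder of statuses; "ignored" is deliberately left out so that
# up/down never lands on it. 99 = "Well Known" is an assumed code.
STATUS_LADDER = [1, 2, 3, 4, 5, 99]


def step_status(selected_statuses, direction):
    """Return the new status for the selected terms.

    selected_statuses: statuses of the currently selected terms.
    direction: +1 for the up arrow, -1 for the down arrow.
    With multiple terms, start from the lowest status on the ladder.
    """
    positions = [
        STATUS_LADDER.index(s) for s in selected_statuses if s in STATUS_LADDER
    ]
    if not positions:
        # Nothing on the ladder (e.g. only ignored terms): start at the bottom.
        return STATUS_LADDER[0]
    i = min(positions) + (1 if direction > 0 else -1)
    # Clamp so repeated presses stop at either end.
    return STATUS_LADDER[max(0, min(i, len(STATUS_LADDER) - 1))]
```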
Currently users set font size through custom settings, but a nicer method would be a slider or dropdown.
rem is the way to go. I guess the rem could range from, what, 25% up to 500%? No idea what range makes sense.
For the first-pass implementation, don't bother storing this in the db settings table (i.e. where the custom css is stored). That would require a web service call to set the value, another to reload at launch, etc.: a bunch of code for little value. If it's easy enough to adjust, it should suffice.
Per https://discord.com/channels/1074759089051160647/1074759090338812015/1185063655977529404
On the last page, there's only a green checkmark, but not a ">" to mark page as done. Need something similar on last page.
(I've revised this issue based on new thoughts)
I wrote Lute based off of LWT, but dropped the SRS feature of LWT: the code was brutal, and for the initial MVP (minimal viable product) release of Lute I didn't feel that it was a necessary feature. I still don't :-) for a few reasons:
Even Steve Kaufmann of LingQ doesn't really recommend using their testing feature, probably for the same reasons as I have above. :-) (He does recommend their "sentence mode" for building sentences, I believe.)
Exporting terms to a CSV, and images to somewhere else, may be trickier than needed, so I'll start with AnkiConnect as the first iteration of this.
This will be an opinionated export: it will assume certain note types, deck name, field names, etc.
The following are some rough ideas only. I'd need to try implementations to really get a handle on the UX.
The term listing has a checkbox. Users could select the terms they want to export, and then click an "export to anki" button.
Ankiconnect supports exporting images, see FooSoft/anki-connect#158.
Lute doesn't store sample sentences for terms, but it does have a reference lookup that could get the latest sentence for any term. The sentences table is loaded on opening a page for reading, so even if the page isn't marked read something should be in the table. Need to update the export to include a non-read page's sentences.
Fields to export:
I'll put some kind of pre-designed note type in a public place so people can access it ... that's probably the easiest thing to do. Creating a single-card shared deck on AnkiWeb would be easiest. AnkiConnect apparently does let you create models using the API, I'm not sure how tough that would be. https://foosoft.net/projects/anki-connect/index.html#model-actions
It would be nice to have Anki cards be able to click back to Lute, if Lute's running, so that people can see the term and its sample sentences again.
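To make the "opinionated export" idea concrete, here is a rough sketch of building an AnkiConnect addNote request. The deck name, note type name, and field names are placeholders invented for the example, since the issue has not settled on them. The request shape (action / version / params, and the picture entry) follows my understanding of the AnkiConnect API and should be checked against its docs.

```python
import json
import urllib.request
from pathlib import Path

ANKICONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def build_add_note(term, translation, sentence, image_path=None,
                   deck="Lute", model="Lute Term"):
    """Build the JSON body for one note. Deck, model, and field names
    are hypothetical placeholders for the pre-designed note type."""
    note = {
        "deckName": deck,
        "modelName": model,
        "fields": {
            "Term": term,
            "Translation": translation,
            "Sentence": sentence,
            "Image": "",
        },
        "tags": ["lute"],
    }
    if image_path:
        # AnkiConnect copies the file into Anki's media folder and
        # adds an <img> tag to the listed fields.
        note["picture"] = [{
            "path": str(image_path),
            "filename": Path(image_path).name,
            "fields": ["Image"],
        }]
    return {"action": "addNote", "version": 6, "params": {"note": note}}


def send(request_body, url=ANKICONNECT_URL):
    """POST a request to a running AnkiConnect; raise on an error reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(request_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]
```

Usage would be something like send(build_add_note("psi", "dogs", "Psi štěkají.")) for each checked term in the listing, with Anki running.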
Just like bulk set parent, but status :)
Make it easy to export terms. This would let users share data, ppl could group up to make data mappings, etc.
Initial solution: just export everything into a CSV. :-) Good enough for now.
Possible long-term solution: somehow combine this with the "filters" in the Term Listing page, so that once a filter is applied, only those terms would be exported.
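A sketch of both ideas, assuming terms are available as plain dicts. The column names and the keep predicate are illustrative only; the real export would read from Lute's term model and take its filter from the Term Listing page.

```python
import csv

# Hypothetical column set for the export.
FIELDS = ["term", "parent", "translation", "status", "tags"]


def export_terms_csv(terms, out, keep=lambda term: True):
    """Write terms (dicts) to the open text file `out` as CSV.

    keep: predicate standing in for the Term Listing filters; by default
    everything is exported, matching the "initial solution".
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for term in terms:
        if keep(term):
            # Missing keys are written as empty cells.
            writer.writerow(term)
```

For example, export_terms_csv(terms, f, keep=lambda t: t["status"] < 99) would leave out well-known terms, assuming 99 is the code for that status.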
"Easy exporting and syncing of "parent" database (users learning the same lang could crowdsource)" - "crowdsourcing" to me implies some kind of central place to store definitions, choose the best, filter out trash etc -- that's a different beast.
In most languages, the parent is pretty visibly related to the child, but with a few letter changes. It'd be nice not to have to copy the whole word from below, but just press a button and then type the end of the word.
I'm not sure how this would be done for most languages ... this feels extremely tough.
This is a complicated issue
spaCy and the Stanford Stanza project are very good parsing libraries, and it would be nice to use something like that instead of (hacky?) regex solutions. Unfortunately spaCy is very slow, so things would need to change quite a lot to make it usable within Lute.
Currently Lute parses very frequently:
To get around the frequent parsing (for reading), Lute could:
Is your feature request related to a problem? Please describe.
Not all languages are supported and the regex/link stuff isn't possible for "non-techies."
Describe the solution you'd like
When adding a new language, there's the option to "load from predefined." It would be a great "stepping stone" if there was a simple text file that could be made and shared to "import" languages according to the settings that work well for another user.
This could also allow for more "default" languages to be supported if they're just a small text file that can be downloaded and added to future releases.
Describe alternatives you've considered
Additional context
This is how a file might look:
czechlanguagelute.txt
Todo:
Would it be possible to add a column in the Books section with the book creation date and an option to sort based on it (or maybe a simple ordinal number)?
I can use tags to mark records as newest (or a special naming convention), but that is not as convenient.
Discord discussion notes with user "Jiggle":
before I was editing the original css files (styles, styles-compact) and was storing the font files in the same folder as these css files
but unfortunately the custom_styles is not a static file as I understand
css file edits:
@font-face {
font-family: "MYFONT";
src: url("Rubik-Regular.woff") format("woff");
font-style: normal;
font-weight: normal;
}
if the font is in the same folder as the css file, it works
Notes:
When creating a new Term and adding a new Parent, currently both get the same translation. E.g. from the Tutorial, click on "dogs" and create a new Term with new parent "dog", translation "woof." When saved, "dogs: woof" and "dog: woof" are both created -- but that's kind of redundant.
Would it be better to save the translation with the parent only?
I've created branch set_translation_for_parent_only which implements this, but it makes Lute look like it loses data. E.g. create new term for "dogs":
On save, the "dogs" and "dog" term hovers look good, however, when I click on "dogs" again I see the following:
This looks like some data has been lost.
Currently pages break by tokens. Sometimes it would be nice to break chapters or sections forcibly.
e.g., creating a book with text
Hello.
---
Goodbye.
creates a book with two pages: "Hello.", "Goodbye." This page break marker does not change the max words per page, it works with it.
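A simplified sketch of that behaviour: split on the marker first, then apply the usual size limit inside each section. It counts whitespace-separated words rather than Lute's tokens and collapses line breaks, so it only illustrates how the marker and the max-words limit interact. The function and parameter names are made up.

```python
def paginate(text, max_words=250, marker="---"):
    """Split text into pages: forced breaks at marker lines, and
    size-based breaks every max_words words within each section."""
    sections, current = [], []
    for line in text.splitlines():
        if line.strip() == marker:
            sections.append("\n".join(current))
            current = []
        else:
            current.append(line)
    sections.append("\n".join(current))

    pages = []
    for section in sections:
        words = section.split()
        # An empty section (e.g. two markers in a row) yields no page.
        for i in range(0, len(words), max_words):
            pages.append(" ".join(words[i:i + max_words]))
    return pages
```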
Test cases:
Per https://discord.com/channels/1074759089051160647/1181103335265288202/1184966117941313536
a.completed_book:before {
content: url('/static/icn/tick.png');
margin-right: 5px;
}
but then also have the book titles line up. Easiest/best just to put a new unnamed column in the datatables output.
Is your feature request related to a problem? Please describe.
I'd like for there to be a way to parse Korean texts as I'm learning Korean.
Describe the solution you'd like
Implement a Korean parser based on MeCab-Ko.
Describe alternatives you've considered
I tried to use MeCab to parse a Korean text, but it didn't work, even though MeCab and MeCab-Ko seem to have similarities based on my online research.
(I was using \p{Hangul} as the Regex for character matching, but I'm not sure if that's correct either so that could have been the issue.)
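On the regex question: as far as I know, Python's built-in re module does not support \p{...} property classes (the third-party regex package does). If the character-matching setting ends up in re, which is an assumption about Lute's internals, explicit Unicode ranges may be needed instead. A small sketch using the Hangul Syllables, Hangul Jamo, and Hangul Compatibility Jamo blocks:

```python
import re

# Explicit ranges standing in for \p{Hangul}:
#   U+AC00-U+D7A3  Hangul Syllables
#   U+1100-U+11FF  Hangul Jamo
#   U+3130-U+318F  Hangul Compatibility Jamo
HANGUL_RANGES = "\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f"
HANGUL_WORD = re.compile(f"[{HANGUL_RANGES}]+")


def hangul_runs(text):
    """Return each run of consecutive Hangul characters in text."""
    return HANGUL_WORD.findall(text)
```

This only finds space-delimited runs of Hangul; it does not do the morphological splitting that a MeCab-Ko based parser would.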
Is your feature request related to a problem? Please describe.
Germanic languages like German or Dutch have words, in particular verbs, whose parts can be separated within a sentence.
For example: "Ich lade dich zu meiner Party ein." means "I invite you to my party." The verb is "einladen", but in the phrase the "lade" and the "ein" are separate (this is not only common, it is mandatory per the grammar). These kinds of verbs are very common.
Describe the solution you'd like
Current multi-word selection doesn't work, and shift+clicking on words is used for bulk selection. Maybe alt+clicking or some other combination could work.
Describe alternatives you've considered
I don't think there are any, or at least I can't imagine one. The suggested solution might work for adding terms, but I have no idea how it would work for showing the information back, to be honest. The main reason for the feature request is in case I'm missing something obvious that can work as a solution (short of trying to parse natural grammars).
Prerequisite (?) for audio support that doesn't break when moving to new pages, allow responsive paging.
Pressing the up arrow scrolls up on the page (and now with the audioplayer, there's less vertical space, so it's more common)
From Discord:
It'd be great if you could upload an .srt or other subtitle file with an audio file and have it convert it to a txt for reading, but also add bookmarks for the different pages (or even for all the sentences and next to them, there's a little button that, basically, says "skip to this sentence in the audio"). This would make Lute amazing to use with anything audio based alongside Whisper getting better and better.
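A minimal sketch of the conversion step, assuming a standard .srt layout (index line, "start --> end" timing line, then one or more text lines, with blank lines between cues). It returns the start time in seconds for each cue, which is the raw material for both the reading text and per-sentence bookmarks. The function name is hypothetical.

```python
import re

# Matches the first timestamp on a timing line, e.g. 00:01:02,500
TIME_RE = re.compile(r"(\d+):(\d\d):(\d\d)[,.](\d{3})")


def parse_srt(srt_text):
    """Return a list of (start_seconds, text) tuples, one per cue."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        timing_idx = next(
            (i for i, line in enumerate(lines) if "-->" in line), None
        )
        if timing_idx is None:
            continue
        match = TIME_RE.search(lines[timing_idx])
        if not match:
            continue
        h, m, s, ms = (int(g) for g in match.groups())
        start = h * 3600 + m * 60 + s + ms / 1000
        # Join multi-line cues into a single line of text.
        text = " ".join(
            line.strip() for line in lines[timing_idx + 1:] if line.strip()
        )
        if text:
            cues.append((start, text))
    return cues
```

The reading text would then be "\n".join(text for _, text in cues), and the start times could feed the bookmarks.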
Challenges I can see with this request:
Currently, lute/templates/read/textitem.html has the following:
tid="{{ item.text_id }}"
lid="{{ item.lang_id }}"
paraid="{{ item.para_id }}"
seid="{{ item.se_id }}"
data_text="{{ item.text }}"
data_status_class="{{ item.status_class }}"
data_order="{{ item.order }}"
{% if item.wo_id is not none %}
data_wid="{{ item.wo_id }}"
This doesn't follow the HTML convention for custom data attributes (data-* names), e.g. as outlined at https://dev.to/dev-harbiola/custom-data-attributes-in-html-a-guide-to-data--373.
These could be changed as follows:
tid => data-tid (or data-text-id)
lid => data-lid (or data-lang-id)
paraid => data-para-id
seid => data-se-id (or data-sentence-id)
data_status_class => data-status-class
data_order => data-order
data_wid => data-wid (or data-word-id)
I believe that these are only referenced in lute/static/js/lute.js:
(.venv) MacBook-Pro:lute-v3 jeff$ for t in tid lid paraid seid data_text data_status_class data_order data_wid; do
> echo ------------------------------------
> echo $t
> inv search $t | grep lute.js # limit search to only lute.js
> done
------------------------------------
tid
lute/static/js/lute.js:function prepareTextInteractions(textid) {
------------------------------------
lid
lute/static/js/lute.js: elid = parseInt(el.attr('data_wid'));
lute/static/js/lute.js: url: `/read/termpopup/${elid}`,
lute/static/js/lute.js: const lid = parseInt(el.attr('lid'));
lute/static/js/lute.js: const url = `/read/termform/${lid}/${sendtext}?${extras}`;
lute/static/js/lute.js: const langid = firstel.attr('lid');
------------------------------------
paraid
lute/static/js/lute.js: attr_name = 'paraid';
lute/static/js/lute.js: attr_value = w.attr('paraid');
------------------------------------
seid
lute/static/js/lute.js: let attr_name = 'seid';
lute/static/js/lute.js: let attr_value = w.attr('seid');
------------------------------------
data_text
lute/static/js/lute.js: let text = extra_args.textparts ?? [ el.attr('data_text') ];
------------------------------------
data_status_class
lute/static/js/lute.js: * Terms have data_status_class attribute. If highlights should be shown,
lute/static/js/lute.js:/** Add the data_status_class to the term's classes. */
lute/static/js/lute.js: el.addClass(el.attr("data_status_class"));
lute/static/js/lute.js: el.removeClass(el.attr("data_status_class"));
lute/static/js/lute.js: const st = nextword.attr('data_status_class');
lute/static/js/lute.js: let update_data_status_class = function (e) {
lute/static/js/lute.js: .attr('data_status_class',`${newClass}`);
lute/static/js/lute.js: $('span.kwordmarked').each(update_data_status_class);
lute/static/js/lute.js: $('span.wordhover').each(update_data_status_class);
------------------------------------
data_order
lute/static/js/lute.js:let save_curr_data_order = function(el) {
lute/static/js/lute.js: LUTE_CURR_TERM_DATA_ORDER = parseInt(el.attr('data_order'));
lute/static/js/lute.js: save_curr_data_order($(this));
lute/static/js/lute.js: save_curr_data_order($(this));
lute/static/js/lute.js: const first = parseInt(start_el.attr('data_order'))
lute/static/js/lute.js: const last = parseInt(end_el.attr('data_order'));
lute/static/js/lute.js: const ord = $(this).attr("data_order");
lute/static/js/lute.js: save_curr_data_order(el);
lute/static/js/lute.js: return $(a).attr('data_order') - $(b).attr('data_order');
lute/static/js/lute.js: const i = words.toArray().findIndex(x => parseInt(x.getAttribute('data_order')) === LUTE_CURR_TERM_DATA_ORDER);
lute/static/js/lute.js: save_curr_data_order(curr);
------------------------------------
data_wid
lute/static/js/lute.js: elid = parseInt(el.attr('data_wid'));
(.venv) MacBook-Pro:lute-v3 jeff$
I don't know if this work is worth it ... following standards is good, but not critical. Is this make-work only?
Currently, when the term form is displayed while reading, it sends an event to pause the audio. Some users want to be able to disable that, i.e. audio continues even when the form is opened.
I want to know about my backups:
It's not a big deal, but right now you need to edit the database file to migrate from python to docker config.
This configuration should be converted automatically, or it should be possible to edit all settings (like "backup directory").
Commit ce50ee5112f27 added a basic acceptance (browser-level) test of Lute using Panther: reading a text, and creating Terms and multi-word Terms.
Per https://github.com/jzohrab/lute/blob/develop/tests/acceptance/README.md#tests-to-write, there are a bunch of tests to write, and if extensive work is done on any section of Lute then some of these acceptance tests might be useful.
I once chose an image for a word and then realized that it would be very difficult to find an image that really represented that particular word.
It would be great to be able to click on a selected image again and have that image separate from the word.
The home screen filters are cleared on every page refresh. It should save state, like the Term listing does.
Is your feature request related to a problem? Please describe.
GoldenDict is an app that can hold many dictionaries. It might be good to integrate it with Lute.
Describe the solution you'd like
Describe alternatives you've considered
Additional context
Notes from a slack chat:
I will attempt to articulate what I think the deal breaker could potentially be without getting into the weeds of how Ancient Greek actually works. You may have noticed that the words contain accented characters. There are various features of the language that cause those accents to change without changing anything about the meaning of the word. For example, I would have to define γὰρ twice because it can appear as either γὰρ OR γάρ. That's one of the most common words in the language meaning something like "for" or "since" or "because". Now, it only comes in those two flavors but I think you could see how quickly it would become tedious to define words over and over just because of diacritics.
With respect to this question of accents, I have noticed that Chrome is accent-agnostic when it does its "find in page" search. If I search γὰρ it will highlight γάρ and even γαρ. Perhaps Lute could have a language-specific option to ignore accents as well?
Let's continue to use the example of γὰρ, when I am reading the text, I would still see the orthography displayed as the author intended but, behind the scenes in the database, as far as Lute is concerned, γὰρ, γάρ, and γαρ share the same entry.
I know the original LWT let you do character substitutions but it actually just hotswapped one character for another and that fact was reflected in the actual text that you are reading. Basically it would see the character set as consisting of only 24 characters (not accounting for uppercase). The unaccented Greek alphabet.
My thoughts:
Rendered TextTokens (i.e., words shown in the reading pane) would include the accents, but Terms (stored in the db) would be without accents, and the rendered TextTokens would be associated to Terms w/o accents.
No idea at the moment if this would be tough or not!
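The normalization itself is probably the easy part. A sketch using Python's unicodedata: decompose each character, drop the combining marks, and recompose. The displayed token keeps its accents; only the lookup key is stripped. The function names are made up, and whether to also lowercase the key is an assumption.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents, breathings) from text."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return unicodedata.normalize("NFC", stripped)


def term_key(token_text):
    """Hypothetical db lookup key for a rendered token: accent-free and
    lowercased, so that accented variants share one Term entry."""
    return strip_accents(token_text).lower()
```

With this, γὰρ, γάρ, and γαρ all map to the key γαρ, while the reading pane would still show whatever the author wrote. The harder part is likely wiring such a key through the existing Term matching.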
This is a good idea, simple offline-style dict.
Is it possible to add ability to export terms as a CSV or TXT file format? It would be awesome if we could export filtered terms, not all of them.
It is occasionally needed to be able to remove a term while staying on the reading page.
I think there are many use cases for this:
These notes could be tagged by category, or by term, say, and when looking at a term, the associated notes would be returned too.
Sentences are "Value objects", the id could be tracked by md5 etc of the sentence text, including the language id as part of the md5. case-insens md5 too.
Unused fields:
Is your feature request related to a problem? Please describe.
Many small issues with texts or dictionaries are easily solved by ChatGPT and a few prompts. But copy/pasting things between Lute and ChatGPT can be cumbersome and slow. It would be great to have them integrated a little.
Describe the solution you'd like
I would like to see a few ChatGPT features.
Additional context
Here're some example prompts that I've been using:
For defining stubborn words:
Help me translate this word as it doesn't appear in dictionaries.
The word is: olizovala
Format the response like this, but replace the capitalized words with the correct information:
WORD
UNCONJUGATED, UNDECLINED DICTIONARY FORM OF THE WORD
PART OF SPEECH
- TRANSLATION IN ENGLISH
- OTHER MEANING (ONLY IF APPLICABLE)
SHORT EXAMPLE SENTENCE USING THE WORD
VERY SHORT EXPLANATION OF THE SIGNIFICANCE OF THE WORD USING SIMPLE ENGLISH
For reformatting:
The following passage has a few typos and formatting issues. Please rewrite the passage exactly the same, but fix any typos and reformat it to be more readable. Keep all the "artistic" choices made by the author.
Here's the passage:
Terms created today/cumulative -- when reading, sometimes my reading is creating too many status = 1 terms in one day, too much to bite off. If I create too much new stuff, there's not enough time to digest everything.
Copying over notes from jzohrab/lute#21.
Currently, Lute stores "dictionary 1" and "dictionary 2" URLs in the Language table, with placeholders for term substitution. This creates a few limitations:
It is potentially worth it to change dictionaries into first-class entities, e.g. with a brand new user form like this:
field | notes |
---|---|
dictionary URL | textbox, the url with "###" placeholders -- better yet, change the placeholder to "[LUTETERM]" or similar, since "#" is a valid URL entry (e.g., looking up "https://en.m.wiktionary.org/wiki/essere#Italian" would use a URL like "https://en.m.wiktionary.org/wiki/[LUTETERM]#Italian") |
opens in pop-up? | checkbox |
encoding | dropdown or textbox |
returns | dropdown (html default, or json -- reason for the "json" option is that some languages seem to only have dictionaries available via a json API) |
active | checkbox. Sometimes some dictionaries will be more useful than others -- eg, when offline, any online dicts are useless, so I could potentially deactivate the online dicts and only use an offline Kobo dict or whatever. |
These would be stored in a new dictionaries table, and would be linked to the Languages. First draft UI implementation could be a dedicated UI screen to define dictionaries, that would be easiest (It's possible to create child subforms, but I haven't done that yet in Symfony :-) ).
One dict would have to be marked as primary. A language could define one or multiple dicts.
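A rough sketch of what such an entity might look like, written in Python for illustration. The field names follow the table above; the class, the method names, and the URL-encoding behaviour are assumptions. The "[LUTETERM]" placeholder is the one suggested in the table.

```python
from dataclasses import dataclass
from urllib.parse import quote

PLACEHOLDER = "[LUTETERM]"


@dataclass
class Dictionary:
    """One row of the proposed dictionaries table (hypothetical shape)."""
    url: str
    opens_in_popup: bool = False
    encoding: str = "utf-8"
    returns: str = "html"  # "html" or "json"
    active: bool = True
    primary: bool = False

    def lookup_url(self, term):
        """Substitute the percent-encoded term for the placeholder.

        The term is encoded with this dictionary's encoding first, which
        is one possible reading of the "encoding" field.
        """
        encoded = quote(term.encode(self.encoding), safe="")
        return self.url.replace(PLACEHOLDER, encoded)


def active_dictionaries(dicts):
    """Active dictionaries only, with the primary one listed first."""
    return sorted((d for d in dicts if d.active), key=lambda d: not d.primary)
```

Because the placeholder no longer uses "#", a URL such as https://en.m.wiktionary.org/wiki/[LUTETERM]#Italian keeps its fragment intact after substitution.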