With a Kindle in Portuguese, highlights' location and date aren't added in Notion...


Non-English language support about kindle2notion HOT 6 OPEN

paperboi commented on May 18, 2024

Non-English language support

from kindle2notion.

Comments (6)

mefonseca commented on May 18, 2024 3

Hi! Really appreciate this package!
I was also using a Kindle in Portuguese and not getting the location and date. I changed my device to English and it is all good now.
However, non-English letters are missing. I think it is the same problem as @asyr01's.
"Transformação" -> "Transformao"
"Mudança" -> "Mudana"
"Está" -> "est"
"Você" -> "voc"

I saw that the last commit was regarding this issue:

raw_clippings_text = raw_clippings_text.encode("ascii", errors="xmlcharrefreplace").decode()

If it were only utf-8-sig, I think it would read the non-English letters (I tried manually running the function "read_raw_clippings" on my "My Clippings.txt"), but I don't know what would happen in other parts of the code.

raw_clippings_text = open(clippings_file_path, "r", encoding="utf-8-sig").read()
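To illustrate the difference between the two lines above, here is a minimal sketch. It writes a small UTF-8 file with a byte-order mark (the file contents and the `read_clippings` helper are made up for illustration, not taken from the package), reads it back with utf-8-sig, and then shows what two ASCII error handlers do to the same text. Which step in the package actually drops the letters is not shown in this thread, so treat this only as a demonstration of the encodings.

```python
import codecs
import tempfile
from pathlib import Path


def read_clippings(path):
    # Hypothetical helper: "utf-8-sig" decodes UTF-8 and drops a leading BOM if present.
    with open(path, "r", encoding="utf-8-sig") as f:
        return f.read()


sample = "Transformação | Mudança | Você"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "My Clippings.txt"
    # Write the sample with a BOM in front, then read it back.
    path.write_bytes(codecs.BOM_UTF8 + sample.encode("utf-8"))
    text = read_clippings(path)

# errors="ignore" silently drops every non-ASCII character.
ascii_ignored = sample.encode("ascii", errors="ignore").decode()
# errors="xmlcharrefreplace" swaps them for numeric entities such as &#231;.
ascii_xmlref = sample.encode("ascii", errors="xmlcharrefreplace").decode()

print(text)
print(ascii_ignored)
print(ascii_xmlref)
```

Reading with utf-8-sig keeps the accented letters intact, while both ASCII conversions change the text and rely on later steps to cope with the result.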

Thank you!


paperboi commented on May 18, 2024 2

Placing #46 here for reference. Thanks for contributing again @asyr01!

Regarding your second question, the current package is already capable of doing that. It can be optimized with a JSON structure to track clippings instead of the current method.
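As a rough sketch of the JSON tracking idea mentioned above (this is not how the package currently works; `clipping_id`, `filter_new`, and the `seen_clippings.json` file name are hypothetical), one option is to store a hash of every clipping that has been processed and skip those hashes on later runs:

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple

Clipping = Tuple[str, str]  # (book title, highlighted text)


def clipping_id(title: str, text: str) -> str:
    # A stable fingerprint for one clipping.
    return hashlib.sha256(f"{title}\n{text}".encode("utf-8")).hexdigest()


def filter_new(clippings: Iterable[Clipping], seen_path: Path) -> List[Clipping]:
    # Load the fingerprints recorded by earlier runs, if any.
    seen = set()
    if seen_path.exists():
        seen = set(json.loads(seen_path.read_text(encoding="utf-8")))
    fresh = []
    for title, text in clippings:
        cid = clipping_id(title, text)
        if cid not in seen:
            seen.add(cid)
            fresh.append((title, text))
    # Persist the updated set so the next run can skip these clippings.
    seen_path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    return fresh


with tempfile.TemporaryDirectory() as tmp:
    seen_file = Path(tmp) / "seen_clippings.json"
    first_run = filter_new(
        [("Book A", "quote one"), ("Book A", "quote two")], seen_file
    )
    second_run = filter_new(
        [("Book A", "quote one"), ("Book A", "quote two"), ("Book B", "quote three")],
        seen_file,
    )
    print(first_run)
    print(second_run)
```

On the second call only the clipping that was not seen before is returned, so only that one would need to be sent to Notion.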


paperboi commented on May 18, 2024 1

For reference, this is the snippet of code that scrapes out the location, page and date information from the text file.
See lines 49-53 in /kindle2notion/parsing.py

The function that addresses this is pasted below:

from typing import List, Tuple

# Assumption: `parse` is dateutil's parser; the import is not shown in the snippet.
from dateutil.parser import parse


def _parse_page_location_and_date(raw_clipping_list: List) -> Tuple[str, str, str]:
    # The second line of a clipping holds the metadata, separated by ' | '.
    second_line = raw_clipping_list[1]
    second_line_as_list = second_line.strip().split(' | ')
    page = location = date = ''
    for element in second_line_as_list:
        element = element.lower()
        # The keywords below are English-only, so other Kindle languages never match.
        if 'page' in element:
            page = element[element.find('page'):].replace('page', '').strip()
        if 'location' in element:
            location = element[element.find('location'):].replace('location', '').strip()
        if 'added on' in element:
            date = parse(element[element.find('added on'):].replace('added on', '').strip())
            date = date.strftime('%A, %d %B %Y %I:%M:%S %p')

    return page, location, date

One would need to replace 'page', 'location' and 'added on' on lines 49, 51 and 53 with the equivalent terms used in the respective My Clippings.txt file to get the relevant result. Since each element is lowercased before matching, the replacement terms need to be lowercase too.

In your case, from my limited understanding, these would be 'destaque na página', 'destaque ou posição', and 'Adicionado:'.

I'm leaving this issue open because I'm unsure how to incorporate this feature within the structure of the package, and I'm open to input from the GH community on this one. A working solution may be to identify the language while scraping the first clipping and adapt the keywords accordingly. I can change the language on my Kindle and make some test clippings, so that they get saved in that language in the My Clippings file, and code from there.
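A minimal sketch of that idea, assuming one keyword table per language and detection based on the "added on" marker. The `KEYWORDS` table, `detect_language`, and `parse_page_location_date` are hypothetical names, and the Portuguese terms are shortened from the guesses above and are unverified against real Kindle output. The date is returned as a raw string, because parsing localized month and day names is a separate problem.

```python
from typing import Dict, List, Tuple

# One keyword set per language, all lowercase because elements are lowercased.
# Assumption: the Portuguese terms come from the guesses in this thread.
KEYWORDS: Dict[str, Dict[str, str]] = {
    "en": {"page": "page", "location": "location", "date": "added on"},
    "pt": {"page": "página", "location": "posição", "date": "adicionado:"},
}


def detect_language(second_line: str) -> str:
    # Pick the first language whose date marker appears in the metadata line.
    lowered = second_line.lower()
    for lang, words in KEYWORDS.items():
        if words["date"] in lowered:
            return lang
    return "en"  # Fall back to English when nothing matches.


def parse_page_location_date(raw_clipping_list: List[str]) -> Tuple[str, str, str]:
    second_line = raw_clipping_list[1]
    words = KEYWORDS[detect_language(second_line)]
    page = location = date = ""
    for element in second_line.strip().split(" | "):
        lowered = element.lower()
        # Keep whatever follows the keyword in each element.
        if words["page"] in lowered:
            page = lowered.split(words["page"], 1)[1].strip()
        if words["location"] in lowered:
            location = lowered.split(words["location"], 1)[1].strip()
        if words["date"] in lowered:
            date = lowered.split(words["date"], 1)[1].strip()
    return page, location, date


# Made-up metadata line in the English layout, for illustration only.
clipping = [
    "Some Book (Some Author)",
    "- Your Highlight on page 12 | Location 180-182 | Added on Sunday, 5 May 2024 10:15:00",
]
print(parse_page_location_date(clipping))
```

Supporting another language would then mean adding one entry to the table rather than editing the parsing function.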


asyr01 commented on May 18, 2024 1

Really appreciate the hard work you put in.
There is no problem with English. However, when it comes to my Turkish books,
words that contain special Turkish letters are unfortunately missing in Notion.
For example, non-English letters such as "i, ç, ü, ö" are missing.
Maybe we could find some way to handle it.
Also, when we start the script a second time and the clippings are all the same, could it skip the existing ones
and only append the new ones? Is that possible?
Thanks, have a good one.


paperboi commented on May 18, 2024

Thanks for the tip @mefonseca! Implemented your request in the latest release.
@asyr01 please update the package and try running it on your system. It should account for those letters now.

@lfschafaschek Will implement custom Portuguese support soon!

Thank you all for your patience and goodwill. Hope this fix addresses your issues here.


huhlik-cz commented on May 18, 2024

Hi, I'm running the latest version and I have the same issue as above, but with Czech characters like these: ěščřžňů. Can the Czech language also be supported? Thank you!

