Comments (6)

mefonseca commented on May 18, 2024

Hi! Really appreciate this package!
I was also using Kindle in Portuguese and not getting the location and date. I changed my device to English and it is all good now.
However, non-English letters are missing. I think it is the same problem as @asyr01's.
"Transformação" -> "Transformao"
"Mudança" -> "Mudana"
"Está" -> "est"
"Você" -> "voc"

I saw that the last commit was regarding this issue:

raw_clippings_text = raw_clippings_text.encode("ascii", errors="xmlcharrefreplace").decode()


If it only used utf-8-sig, I think it would read the non-English letters (I tried manually running the function "read_raw_clippings" on my "My Clippings.txt"), but I don't know what would happen in other parts of the code.

raw_clippings_text = open(clippings_file_path, "r", encoding="utf-8-sig").read()
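To make the difference concrete, here is a small sketch that compares the two lines quoted above on a throwaway file. The helper names `read_clippings` and `ascii_with_char_refs` are made up for this example and are not part of the package. It only shows what each line does to the text; whether the character references are what later cause letters to disappear is an assumption, not something checked here.

```python
import tempfile
from pathlib import Path


def read_clippings(path: Path) -> str:
    """Read the file as UTF-8, dropping a leading byte-order mark if present."""
    return path.read_text(encoding="utf-8-sig")


def ascii_with_char_refs(text: str) -> str:
    """Reproduce the encode/decode round trip from the last commit."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "My Clippings.txt"
    # Writing with utf-8-sig puts a BOM at the start of the file.
    sample.write_text("Transformação", encoding="utf-8-sig")
    text = read_clippings(sample)

# text keeps the accented letters: "Transformação"
# converted replaces them with numeric references: "Transforma&#231;&#227;o"
converted = ascii_with_char_refs(text)
```

So the utf-8-sig read alone preserves the letters, while the extra ASCII round trip rewrites every non-ASCII character as a `&#...;` reference.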


Thank you!

from kindle2notion.

paperboi commented on May 18, 2024

Placing #46 here for reference. Thanks for contributing again @asyr01!

Regarding your second question, the current package is already capable of doing that. It can be optimized with a JSON structure to track clippings instead of the current method.
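As a rough illustration of the JSON idea (not how the package currently tracks clippings), one option would be to store a hash of every clipping already sent and skip those on the next run. The names `clipping_id`, `filter_new_clippings` and the state file `seen.json` are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Iterable, List


def clipping_id(text: str) -> str:
    """Stable identifier for a clipping, derived from its text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_new_clippings(clippings: Iterable[str], state_path: Path) -> List[str]:
    """Return only clippings not recorded in the JSON state file, then update it."""
    seen = set()
    if state_path.exists():
        seen = set(json.loads(state_path.read_text(encoding="utf-8")))
    new_items = []
    for clip in clippings:
        cid = clipping_id(clip)
        if cid not in seen:
            seen.add(cid)
            new_items.append(clip)
    # Persist the updated set of ids for the next run.
    state_path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    return new_items


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "seen.json"  # hypothetical state file
    first_run = filter_new_clippings(["clip A", "clip B"], state)
    second_run = filter_new_clippings(["clip A", "clip B", "clip C"], state)
```

On the second run only "clip C" comes back, since the other two are already recorded in the state file.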

paperboi commented on May 18, 2024

For reference, this is the snippet of code that scrapes out the location, page and date information from the text file.
See lines 49-53 in /kindle2notion/parsing.py

The function that addresses this is pasted below:

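from typing import List, Tuple

# parse() is presumably dateutil's parser (third-party package python-dateutil).
from dateutil.parser import parse


def _parse_page_location_and_date(raw_clipping_list: List) -> Tuple[str, str, str]:
    # The second line of a clipping holds the metadata fields, separated by ' | '.
    second_line = raw_clipping_list[1]
    second_line_as_list = second_line.strip().split(' | ')
    page = location = date = ''
    for element in second_line_as_list:
        # Lowercase so the English keywords match regardless of capitalization.
        element = element.lower()
        if 'page' in element:
            page = element[element.find('page'):].replace('page', '').strip()
        if 'location' in element:
            location = element[element.find('location'):].replace('location', '').strip()
        if 'added on' in element:
            # Parse the date text, then normalize it to one display format.
            date = parse(element[element.find('added on'):].replace('added on', '').strip())
            date = date.strftime('%A, %d %B %Y %I:%M:%S %p')

    return page, location, date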

One would need to replace 'page', 'location' and 'added on' in lines 49, 51, 53 with the equivalent terms used in the respective My Clippings.txt file to get the relevant result.

In your case, from my limited understanding, it would be 'destaque na página', 'destaque ou posição', and 'Adicionado:'.

Leaving this issue open because I'm unsure how to incorporate this feature within the structure of the package. I'm open to hearing input from the GH community on this one. A working solution may be to identify the language when scraping the first clipping and adapt the relevant keywords accordingly. I can change the language on my Kindle and make some test clippings so that they get saved in that language in the My Clippings file, and code from there.
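A minimal sketch of that keyword-switching idea, under a few loud assumptions: `KEYWORDS`, `detect_language` and `split_fields` are hypothetical names, the Portuguese keywords are guesses that have not been checked against a real My Clippings.txt, and the Portuguese sample line is invented. Dates are returned as raw text here rather than parsed, to keep the example free of third-party imports.

```python
from typing import Dict, Tuple

# Hypothetical keyword table. Only the English entries come from the
# function above; the Portuguese ones are unverified guesses.
KEYWORDS: Dict[str, Dict[str, str]] = {
    "en": {"page": "page", "location": "location", "date": "added on"},
    "pt": {"page": "página", "location": "posição", "date": "adicionado:"},
}


def detect_language(second_line: str) -> str:
    """Pick the first language whose date keyword appears in the metadata line."""
    lowered = second_line.lower()
    for lang, words in KEYWORDS.items():
        if words["date"] in lowered:
            return lang
    return "en"  # fall back to English, the current behaviour


def split_fields(second_line: str) -> Tuple[str, str, str]:
    """Extract page, location and raw date text using language-specific keywords."""
    words = KEYWORDS[detect_language(second_line)]
    page = location = date = ""
    for element in second_line.strip().split(" | "):
        lowered = element.lower()
        if words["page"] in lowered:
            page = lowered.split(words["page"], 1)[1].strip()
        if words["location"] in lowered:
            location = lowered.split(words["location"], 1)[1].strip()
        if words["date"] in lowered:
            date = lowered.split(words["date"], 1)[1].strip()
    return page, location, date


english_line = "- Your Highlight on page 12 | Location 180-182 | Added on Monday, 17 May 2021 10:15:00 AM"
# Invented sample, for illustration only.
portuguese_line = "- Seu destaque na página 12 | posição 180-182 | Adicionado: segunda-feira, 17 de maio de 2021 10:15:00"

english_fields = split_fields(english_line)
portuguese_fields = split_fields(portuguese_line)
```

Adding a language would then mean adding one entry to the table, although localized dates would still need a parser that understands that language's month and weekday names.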

asyr01 commented on May 18, 2024

Really appreciate the hard work you put in.
There is no problem with English. However, when it comes to my Turkish books,
words that include special Turkish letters are unfortunately missing in Notion.
For example "i, ç, ü, ö": these non-English letters are missing.
Maybe we could find some way to handle it.
Also, when we run the script a second time, could it skip the clippings that already exist
and only append the new ones? Is that possible?
Thanks, have a good one.

paperboi commented on May 18, 2024

Thanks for the tip @mefonseca! Implemented your request in the latest release.
@asyr01 please update the package and try running it on your system. It should account for those letters now.

@lfschafaschek Will implement custom Portuguese support soon!

Thank you all for your patience and goodwill. Hope this fix addresses your issues here.

huhlik-cz commented on May 18, 2024

Hi, I'm running the latest version and I have the same issue as above, but with Czech characters like these: ěščřžňů. Can the Czech language also be supported? Thank you!
