Comments (4)

aytey avatar aytey commented on August 18, 2024

Could be related to file size again:

(venv) atg@vapvdatg01:/tmp> size=10057858 && tail -c $size original.in > v2 && python test.py v2
MacCyrillic
(venv) atg@vapvdatg01:/tmp> size=10057859 && tail -c $size original.in > v2 && python test.py v2
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    encoding = detect(open(fname, "rb").read())["encoding"]
  File "/tmp/venv/lib64/python3.8/site-packages/charset_normalizer/legacy.py", line 28, in detect
    r = from_bytes(byte_str).best()
  File "/tmp/venv/lib64/python3.8/site-packages/charset_normalizer/api.py", line 452, in from_bytes
    and fallback_u8.fingerprint != fallback_ascii.fingerprint
  File "/tmp/venv/lib64/python3.8/site-packages/charset_normalizer/models.py", line 274, in fingerprint
    return sha256(self.output()).hexdigest()
  File "/tmp/venv/lib64/python3.8/site-packages/charset_normalizer/models.py", line 265, in output
    self._output_payload = str(self).encode(encoding, "replace")
  File "/tmp/venv/lib64/python3.8/site-packages/charset_normalizer/models.py", line 114, in __str__
    self._string = str(self._payload, self._encoding, "strict")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 2711776: ordinal not in range(128)

Where:

(venv) atg@vapvdatg01:/tmp> diff <(tail -c 10057858 original.in)  <(tail -c 10057859 original.in)
1c1
< __true_type __type;
---
>  __true_type __type;

(so we gained a single ASCII space)
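For readers unfamiliar with the last line of the traceback, here is a minimal sketch of that error class in isolation. It only shows that a strict ASCII decode rejects a 0xe9 byte; it does not reproduce charset_normalizer's internal logic or explain why the extra leading space changes the outcome. The helper name `strict_ascii_decode_error` and the sample bytes are hypothetical, chosen purely for illustration.

```python
from typing import Optional


def strict_ascii_decode_error(payload: bytes) -> Optional[UnicodeDecodeError]:
    """Return the error raised by a strict ASCII decode, or None if it succeeds.

    Hypothetical helper: it mirrors the str(payload, encoding, "strict")
    call visible in the traceback, with the encoding fixed to "ascii".
    """
    try:
        str(payload, "ascii", "strict")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Hypothetical sample: plain ASCII text followed by a single 0xe9 byte.
sample = b"__true_type __type;" + b"\xe9"

err = strict_ascii_decode_error(sample)
if err is not None:
    # err.start is the offset of the first undecodable byte.
    print(f"{err.encoding!r} failed at position {err.start}: {err.reason}")
```

Pure-ASCII input, such as the line with the extra leading space on its own, decodes without error, so the space itself is not what the codec rejects.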

from charset_normalizer.

Ousret avatar Ousret commented on August 18, 2024

Feel free to give feedback on that matter.

aytey avatar aytey commented on August 18, 2024

> Feel free to give feedback on that matter.

Tested! Works great! Thanks, @Ousret!

JensTimmerman avatar JensTimmerman commented on August 18, 2024

I had the same issue; this patch also fixes it for me.
