Comments (5)

paulfitz commented on May 20, 2024

Thanks for reporting this @SonOfLilit. For daff.py, a hack to make this work is to edit it by hand, replacing codecs.open(path,"r","utf-8") with codecs.open(path,"r","iso-8859-1"). With that change, I see a diff of:

@@,a,b
→, à,á→â

You may need to change more if you want the diff itself to be produced in the same encoding rather than utf-8.
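A minimal sketch of what that hand edit amounts to, with the encoding lifted into an argument. It uses Python's built-in open(), which accepts the same encoding names as codecs.open; the helper name read_text and the sample data are assumptions for illustration, not code from daff.py.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole file as text, decoding it with the given encoding."""
    # Hypothetical helper: stands in for the hard-coded utf-8 open call.
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Write a small Latin-1 encoded table to a temporary file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write("a,b\n\u00e0,\u00e1\n".encode("iso-8859-1"))
    path = tmp.name

try:
    # Decoding with the right encoding recovers the accented cells.
    text = read_text(path, "iso-8859-1")
    # Decoding the same bytes as utf-8 fails, which is the reported problem.
    try:
        read_text(path)
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False
finally:
    os.remove(path)
```

The output side is a separate concern: text decoded this way still has to be encoded again when the diff is written.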

How ideally should this work? A parameter specifying encoding? An attempt at autodetection?

from daff.

dogmatic69 commented on May 20, 2024

A parameter would be best. You can't rely on what the file says, since you can have latin1 in a utf8 file 👎

I guess you could use auto-detection as a default, but there needs to be a way to specify the encoding when things are crazy.

SonOfLilit commented on May 20, 2024

Ideally there should be a cmd parameter because some poor people need to
use utf16, which can't be made sense of without very special treatment.

But more importantly, default behavior should be to work on raw, undecoded
bytes. As long as you never try to split cell contents (e.g. you must
output "[abc->aBc]" and not "a[b->B]c" which might split a character in the
middle in utf8), every other encoding I'm aware of would work just fine,
including utf8, DOS codepages, ISO codepages and Windows codepages (I must
admit I have no idea how pre-Unicode Chinese/Japanese codepages work, but
they would probably be fine too).
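A rough sketch of the raw-bytes idea, assuming a naive comma split with no quoting support and an ASCII "->" marker. The function names are made up for illustration and are not daff's API. The same byte-level code handles several encodings, and the last part shows what goes wrong when a change marker lands inside a cell.

```python
def split_cells(raw_row):
    """Split a row on the ASCII comma byte without decoding it."""
    # Naive on purpose: quoted cells containing commas are not handled.
    return raw_row.split(b",")


def diff_cell(old, new):
    """Report a changed cell as a whole, never splitting its bytes."""
    return old if old == new else old + b"->" + new


results = {}
for enc in ("utf-8", "iso-8859-1", "cp1252", "cp437"):
    old_row = "x,\u00e0,\u00e1".encode(enc)
    new_row = "x,\u00e0,\u00e2".encode(enc)
    cells = [diff_cell(a, b)
             for a, b in zip(split_cells(old_row), split_cells(new_row))]
    # Decode only to inspect the result; the diff itself never decoded.
    results[enc] = b",".join(cells).decode(enc)

# In utf-8 the two changed characters share their first byte (0xC3), so a
# byte-level diff inside the cell would cut a character in half.
old_cell = "\u00e1".encode("utf-8")   # b"\xc3\xa1"
new_cell = "\u00e2".encode("utf-8")   # b"\xc3\xa2"
broken = old_cell[:1] + b"[" + old_cell[1:] + b"->" + new_cell[1:] + b"]"
try:
    broken.decode("utf-8")
    split_is_safe = True
except UnicodeDecodeError:
    split_is_safe = False
```

Every entry in results decodes to the same text, because the comma and the marker are plain ASCII bytes in all four encodings.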

paulfitz commented on May 20, 2024

Ok, sounds like a parameter is important since there'll always be those who need it.

I'm not sure I can completely avoid touching cell contents. There are options for whitespace-insensitive and case-insensitive diffs, for example. These obviously get wacky in the general case, but people want them for the common special case of plain old ASCII. @SonOfLilit, do you think auto-detection, delegated to e.g. chardet [1] in Python, would be adequate?

[1] https://github.com/chardet/chardet
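One way the two proposals could combine, as a sketch only: an explicit encoding always wins, and detection is just the default. chardet is a third-party package, so the import is optional here; decode_table and its fallback order (utf-8, then iso-8859-1) are assumptions for illustration, not daff behaviour.

```python
try:
    import chardet  # third-party and optional; used only for guessing
except ImportError:
    chardet = None


def decode_table(raw, encoding=None):
    """Decode raw bytes, returning (text, encoding_used).

    Hypothetical helper: an explicit encoding overrides any detection.
    """
    if encoding is not None:
        return raw.decode(encoding), encoding
    if chardet is not None:
        # chardet.detect returns a dict whose "encoding" may be None.
        guess = chardet.detect(raw)["encoding"]
        if guess:
            try:
                return raw.decode(guess), guess
            except (UnicodeDecodeError, LookupError):
                pass  # bad guess; fall through to the fixed fallbacks
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # iso-8859-1 maps every byte to a character, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"


latin1_bytes = "\u00e0,\u00e1".encode("iso-8859-1")
explicit = decode_table(latin1_bytes, encoding="iso-8859-1")
plain = decode_table(b"a,b\n")
```

Detection on short inputs is a guess and can be wrong, which is the argument for keeping the parameter.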

SonOfLilit commented on May 20, 2024

As long as you're only touching characters that are ASCII (commas, double
quotes, tabs, spaces) you should be fine with all the encodings I listed as
not needing a parameter - the reason they don't is that they only differ in
the non-ASCII code points.
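A sketch of ASCII-only normalization on undecoded bytes, relying on the fact that Python's bytes.lower() and bytes.split() only act on ASCII letters and ASCII whitespace. The normalize function and its flags are hypothetical names, not daff options. The final lines show one multi-byte encoding, Shift-JIS, where a second byte can fall in the ASCII letter range, so case folding on raw bytes is not safe there.

```python
def normalize(cell, ignore_case=False, ignore_whitespace=False):
    """Normalize a raw cell, touching only ASCII bytes."""
    if ignore_whitespace:
        # bytes.split() with no argument splits on ASCII whitespace only.
        cell = b" ".join(cell.split())
    if ignore_case:
        # bytes.lower() changes only the ASCII letters A-Z.
        cell = cell.lower()
    return cell


normalized = {}
for enc in ("utf-8", "iso-8859-1"):
    raw = "  Caf\u00e9  \u00c1 ".encode(enc)
    normalized[enc] = normalize(raw, ignore_case=True,
                                ignore_whitespace=True).decode(enc)

# Caveat: in Shift-JIS the second byte of a character may be an ASCII
# letter value, so lowercasing raw bytes changes the character.
sjis = "\u30a2".encode("shift_jis")   # katakana A, bytes 0x83 0x41
sjis_corrupted = normalize(sjis, ignore_case=True) != sjis
```

The non-ASCII letters pass through unchanged, so the comparison is only case-insensitive for the ASCII part of each cell.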