Comments (5)
Thanks for reporting this @SonOfLilit. For daff.py, a hack to make this work is to edit it by hand, replacing codecs.open(path,"r","utf-8") with codecs.open(path,"r","iso-8859-1"). With that change, I see a diff of:
@@,a,b
→, à,á→â
You may need to change more if you want the diff itself to be produced in the same encoding rather than utf-8.
How ideally should this work? A parameter specifying encoding? An attempt at autodetection?
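As a rough illustration of the "parameter" option, the hand edit above could be generalized so the encoding is passed in rather than hard-coded. This is only a sketch: `read_text` and its `encoding` argument are hypothetical names, not part of daff, and the sample file is made up for the demonstration.

```python
import codecs
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a file with a caller-supplied encoding (hypothetical helper, not daff's API)."""
    with codecs.open(path, "r", encoding) as handle:
        return handle.read()


# Write a small Latin-1 file, then read it back with the matching encoding.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "latin1.csv")
    with open(sample_path, "wb") as out:
        out.write(u"a,b\n\u00e0,\u00e1\n".encode("iso-8859-1"))
    text = read_text(sample_path, encoding="iso-8859-1")
```

Reading the same file with the default of utf-8 would raise a UnicodeDecodeError, which is the failure the hand edit works around.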
from daff.
param should be best, can't rely on what the file says as you can have latin1 in a utf8 file
I guess you could use auto-detection as a default, but will need something to be able to specify when things are crazy.
Ideally there should be a command-line parameter, because some poor people need to
use utf16, which can't be made sense of without very special treatment.
But more importantly, the default behavior should be to work on raw, undecoded
bytes. As long as you never try to split cell contents (e.g. you must
output "[abc->aBc]" and not "a[b->B]c", which might split a character in the
middle in utf8), every other encoding I'm aware of would work just fine,
including utf8, DOS codepages, ISO codepages and Windows codepages (I must
admit I have no idea how pre-Unicode Chinese/Japanese codepages work, but
they would probably be fine too).
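The raw-bytes idea can be sketched as follows. This is an assumption-laden toy, not how daff works: `parse_rows` and `diff_cells` are hypothetical helpers, quoted fields are not handled, and both tables are assumed to have the same shape. The point is only that cells are compared and emitted whole, so no multi-byte character is ever split.

```python
def parse_rows(data):
    """Split raw CSV bytes into rows of byte-string cells.

    Simplification: no quoted fields; only the ASCII comma and ASCII line
    endings are treated as delimiters.
    """
    return [line.split(b",") for line in data.splitlines()]


def diff_cells(old, new):
    """Return rows in which each changed cell is rendered whole as b'old->new'."""
    result = []
    for old_row, new_row in zip(parse_rows(old), parse_rows(new)):
        result.append([
            a if a == b else a + b"->" + b
            for a, b in zip(old_row, new_row)
        ])
    return result


# The same undecoded logic handles Latin-1 and UTF-8 input.
for enc in ("iso-8859-1", "utf-8"):
    old = u"a,b\n\u00e0,\u00e1\n".encode(enc)
    new = u"a,b\n\u00e0,\u00e2\n".encode(enc)
    changed = diff_cells(old, new)[1][1]
    assert changed.decode(enc) == u"\u00e1->\u00e2"
```

The output stays in whatever encoding the input used, since the bytes are passed through untouched.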
On Sat, Sep 17, 2016, 12:40 AM Carl Sutton wrote:
> param should be best, can't rely on what the file says as you can have latin1 in a utf8 file 👎
> I guess you could use auto-detection as a default, but will need something to be able to specify when things are crazy.
Ok, sounds like a parameter is important since there'll always be those who need it.
I'm not sure I can completely avoid touching cell contents. There are options for whitespace-insensitive and case-insensitive diffs, for example. These obviously get wacky in the general case, but people want them for the common special case of plain old ASCII. Would auto-detection via delegation to e.g. chardet [1] in Python be adequate, do you think, @SonOfLilit?
[1] https://github.com/chardet/chardet
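A minimal sketch of how detection and an explicit parameter might coexist, assuming chardet is optional. `decode_bytes` is a hypothetical helper, not daff code; the fallback order (UTF-8, then Latin-1) is my own assumption for illustration. An explicit encoding always wins, so the "things are crazy" case can still be handled by hand.

```python
try:
    import chardet  # third-party; treated as optional in this sketch
except ImportError:
    chardet = None


def decode_bytes(data, encoding=None):
    """Decode bytes, preferring an explicit encoding over detection (hypothetical helper)."""
    if encoding is not None:
        # The caller knows best; do not second-guess an explicit parameter.
        return data.decode(encoding)
    candidates = []
    if chardet is not None:
        guess = chardet.detect(data).get("encoding")
        if guess:
            candidates.append(guess)
    candidates.append("utf-8")
    for candidate in candidates:
        try:
            return data.decode(candidate)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: Latin-1 maps every byte to a code point, so it cannot fail.
    return data.decode("iso-8859-1")
```

Detection is a guess, especially on short inputs, which is why the override is kept as the first branch.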
As long as you're only touching characters that are ASCII (commas, double
quotes, tabs, spaces) you should be fine with all the encodings I listed as
not needing a parameter - the reason they don't is that they only differ in
the non-ASCII code points.
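To make the ASCII-only point concrete, here is a small sketch of whitespace- and case-insensitive normalization done directly on bytes. `normalize_cell` and its flags are hypothetical, and the exact semantics of daff's own options are not assumed here. It relies on Python's `bytes.lower()` and `bytes.split()` only acting on ASCII letters and ASCII whitespace, and on the encoding being one where bytes below 0x80 always mean ASCII (true of UTF-8 and the ISO-8859 family).

```python
def normalize_cell(cell, ignore_case=False, ignore_whitespace=False):
    """Normalize a raw cell while touching only ASCII bytes (hypothetical helper)."""
    if ignore_whitespace:
        # bytes.split() with no argument splits on runs of ASCII whitespace only.
        cell = b" ".join(cell.split())
    if ignore_case:
        # bytes.lower() lowercases ASCII letters and leaves all other bytes alone.
        cell = cell.lower()
    return cell


# Non-ASCII bytes pass through unchanged in both encodings.
for enc in ("iso-8859-1", "utf-8"):
    raw = u"  Abc  \u00c9 ".encode(enc)
    cleaned = normalize_cell(raw, ignore_case=True, ignore_whitespace=True)
    assert cleaned == u"abc \u00c9".encode(enc)
```

Note that the non-ASCII letter is not case-folded; that is the "plain old ascii" limitation discussed above, not a bug in the sketch.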
On Tue, Sep 20, 2016, 12:17 AM Paul Fitzpatrick wrote:
> Ok, sounds like a parameter is important since there'll always be those who need it.
> I'm not sure I can completely avoid touching cell contents. There are options for whitespace-insensitive and case-insensitive diffs for example. These obviously get wacky in the general case but people want them for the common special case of plain old ascii. Would auto-detection via delegation to eg chardet [1] in python be adequate do you think @SonOfLilit?
> [1] https://github.com/chardet/chardet