ashapkin / xmldiff
Simple Xml diff tool based on Linq to Xml
License: MIT License
I really like that this library isn't too bothered about the order of differently named XML elements. For example, diffing the following:
<root>
  <example1 id="example1" />
  <example2 id="example2" />
  <example3 id="example3" />
  <example4 id="example4" />
</root>
and
<root>
  <example3 id="example3" />
  <example1 id="example1" />
  <example4 id="example4" />
</root>
results in:
= Element "root"
...- Element "example2" "id"="example2"
However, I've noticed that the diff isn't as "clean" when the removed element has siblings with the same name. For example, here is the same sample as above with the numbers removed from the element names:
<root>
  <example id="example1" />
  <example id="example2" />
  <example id="example3" />
  <example id="example4" />
</root>
vs
<root>
  <example id="example3" />
  <example id="example1" />
  <example id="example4" />
</root>
which results in:
= Element "root"
...= Element "example" "id"="example3"
......- Attribute: "id" with value: "example1"
......+ Attribute: "id" with value: "example3"
...= Element "example" "id"="example1"
......- Attribute: "id" with value: "example2"
......+ Attribute: "id" with value: "example1"
...= Element "example" "id"="example4"
......- Attribute: "id" with value: "example3"
......+ Attribute: "id" with value: "example4"
...- Element "example" "id"="example4"
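For what it's worth, the output above is consistent with same-named siblings being paired up by position. The Python sketch below reproduces the same shape under that assumed positional rule: three attribute changes followed by the removal of example4. It uses xml.etree.ElementTree rather than the library's own .NET code. It is not a claim about how XmlComparer is actually implemented. The function name positional_diff and its tuple output format are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = """<root>
  <example id="example1" />
  <example id="example2" />
  <example id="example3" />
  <example id="example4" />
</root>"""

DEST = """<root>
  <example id="example3" />
  <example id="example1" />
  <example id="example4" />
</root>"""


def positional_diff(source_root, dest_root):
    """Pair same-named children by position and list the resulting changes.

    Hypothetical illustration: one pairing rule that yields a diff of the
    same shape as the output above, not XmlComparer's actual code.
    """
    ops = []
    # Collect the distinct child tag names, in first-seen order.
    names = []
    for child in list(source_root) + list(dest_root):
        if child.tag not in names:
            names.append(child.tag)
    for name in names:
        src = source_root.findall(name)
        dst = dest_root.findall(name)
        # The i-th source <name> is compared with the i-th destination <name>.
        for s, d in zip(src, dst):
            for key in sorted(set(s.attrib) | set(d.attrib)):
                if s.get(key) != d.get(key):
                    if key in s.attrib:
                        ops.append(("-attr", key, s.get(key)))
                    if key in d.attrib:
                        ops.append(("+attr", key, d.get(key)))
        # Surplus source elements are removed; surplus destination ones are added.
        for s in src[len(dst):]:
            ops.append(("-element", name, dict(s.attrib)))
        for d in dst[len(src):]:
            ops.append(("+element", name, dict(d.attrib)))
    return ops


for op in positional_diff(ET.fromstring(SOURCE), ET.fromstring(DEST)):
    print(op)
```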
It'd be great if we could add the ability to produce a diff like the following instead:
= Element "root"
...- Element "example" "id"="example2"
In my opinion, this makes it much clearer what actually changed. If we want to keep the old behavior, we could add a parameter to the XmlComparer constructor that defines which comparison algorithm to use. It could have a default value so as not to affect existing applications that use this library.
I think it's this code and/or this code that would need to change.
I realize it would probably be useful to define what makes two elements "the same". When the elements have no children, it's fairly simple: an element with the same name and exactly the same attributes can be considered the same.
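A minimal Python sketch of that rule for childless elements, using xml.etree.ElementTree. The helper name is_same_leaf is hypothetical. As the rule states, it only compares the element name and attributes, so text content is not compared.

```python
import xml.etree.ElementTree as ET


def is_same_leaf(a: ET.Element, b: ET.Element) -> bool:
    """Return True when two childless elements have the same name and
    exactly the same attributes (attribute order does not matter).

    Text content is deliberately not compared, matching the rule above.
    """
    if len(a) or len(b):
        raise ValueError("is_same_leaf only handles elements without children")
    # Element.attrib is a plain dict, so == compares keys and values, ignoring order.
    return a.tag == b.tag and a.attrib == b.attrib


print(is_same_leaf(ET.fromstring('<example id="example1" />'),
                   ET.fromstring('<example id="example1"/>')))  # True
```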
In fact, rather than devising complicated logic for the other cases (elements with children, more or fewer attributes, different values, etc.), maybe the simplest way to produce the "cleanest" diffs would be to find the same-named element with the fewest differences. That is, for each element in the destination document, find all elements with the same name in the source document, order them by the number of differences between the two, and take the one with the fewest. Once every destination element has found its match in the source document, any remaining unmatched source elements can be considered removed. Hopefully this won't affect performance too much, as I like how fast this library currently is.
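The matching described above could look roughly like this Python sketch. It is greedy: each destination child, in document order, takes the unmatched same-named source child with the fewest differences. count_differences and match_children are hypothetical names. The difference metric is an assumption: one point per differing attribute, one for differing text, plus the difference in child count. Reporting destination children with no same-named candidate as added also goes slightly beyond what's described here.

```python
import xml.etree.ElementTree as ET


def count_differences(a: ET.Element, b: ET.Element) -> int:
    """Assumed metric: one point per attribute that differs or exists on
    only one side, one point if the (stripped) text differs, plus the
    difference in the number of children."""
    keys = set(a.attrib) | set(b.attrib)
    diffs = sum(1 for k in keys if a.get(k) != b.get(k))
    if (a.text or "").strip() != (b.text or "").strip():
        diffs += 1
    diffs += abs(len(a) - len(b))
    return diffs


def match_children(source: ET.Element, dest: ET.Element):
    """Greedily pair each destination child with the closest unmatched
    same-named source child. Returns (pairs, removed, added)."""
    unmatched = list(source)
    pairs, added = [], []
    for d in dest:
        candidates = [s for s in unmatched if s.tag == d.tag]
        if not candidates:
            # No same-named source element left: treat d as newly added.
            added.append(d)
            continue
        # min() keeps the first candidate on ties, i.e. document order.
        best = min(candidates, key=lambda s: count_differences(s, d))
        unmatched.remove(best)
        pairs.append((best, d))
    # Whatever is left in the source had no counterpart: report it as removed.
    return pairs, unmatched, added
```

On the example1 to example4 sample, every destination element finds an identical partner, and only example2 is left over. That is exactly the "- Element "example" "id"="example2"" diff I'd like to see. A greedy pass isn't guaranteed to find the globally cheapest pairing. Doing that is an assignment problem (solvable with, e.g., the Hungarian algorithm), at a higher cost.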
Maybe I'm over-engineering this in my head, but it might even make sense to add weighting, so that changes to attributes score differently from changes to children (or text content). That way, something like:
<root>
  <example>Hello World!</example>
  <example attr="value" />
</root>
vs
<root>
  <example attr="foobar" />
  <example>hello world.</example>
</root>
would produce a more "natural" diff: the text "Hello World!" becomes "hello world.", and the attr value becomes "foobar". Each of these is one "operation". The alternatives take two operations each: either 1. removing the attribute and 2. adding the text (or the opposite), or 1. removing the attribute and 2. adding the same attribute with a different value.
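Here is a Python sketch of the weighting idea. weighted_cost and cheapest_pairing are hypothetical names, and the equal default weights are placeholders. Changing an attribute's value counts as a single attribute operation, as does adding or removing one. A text change is one text operation. cheapest_pairing brute-forces every permutation, so it is only suitable for tiny examples like the one above.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical per-kind weights; tune to taste (e.g. make text edits cheaper).
WEIGHTS = {"attribute": 1.0, "text": 1.0, "child": 1.0}


def weighted_cost(a: ET.Element, b: ET.Element, weights=WEIGHTS) -> float:
    """Score how far apart two elements are.

    Each attribute whose value changed, or that exists on only one side,
    is one attribute operation; a text change is one text operation; each
    surplus child is one child operation. Different names never pair up.
    """
    if a.tag != b.tag:
        return float("inf")
    attr_ops = sum(1 for k in set(a.attrib) | set(b.attrib) if a.get(k) != b.get(k))
    text_ops = int((a.text or "").strip() != (b.text or "").strip())
    child_ops = abs(len(a) - len(b))
    return (attr_ops * weights["attribute"]
            + text_ops * weights["text"]
            + child_ops * weights["child"])


def cheapest_pairing(source: ET.Element, dest: ET.Element, weights=WEIGHTS):
    """Brute-force the lowest-cost one-to-one pairing of equally many children.

    Exponential in the number of children: only meant for tiny examples.
    """
    src, dst = list(source), list(dest)
    if len(src) != len(dst):
        raise ValueError("this sketch only pairs equally sized sibling lists")
    best = min(
        itertools.permutations(dst),
        key=lambda perm: sum(weighted_cost(s, d, weights) for s, d in zip(src, perm)),
    )
    return list(zip(src, best))
```

With these weights, pairing "Hello World!" with "hello world." and attr="value" with attr="foobar" costs 2. The crossed pairing costs 4, because each side needs both an attribute operation and a text operation, so the natural pairing wins. Raising weights["text"] relative to weights["attribute"], for instance, would make text edits count for more when deciding which elements belong together.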