xmldiff's Issues

"Diff cleanliness" when removing an element that has siblings with the same name

I really like that this library isn't bothered by the order of differently named XML elements. For example, diffing the following:

<root>
    <example1 id="example1" />
    <example2 id="example2" />
    <example3 id="example3" />
    <example4 id="example4" />
</root>

and

<root>
    <example3 id="example3" />
    <example1 id="example1" />
    <example4 id="example4" />
</root>

results in:

= Element "root"
...- Element "example2" "id"="example2"

However, I've noticed that the diff isn't as "clean" when the removed element has siblings with the same name. For example, taking the sample above and only removing the numbers from the element names:

<root>
    <example id="example1" />
    <example id="example2" />
    <example id="example3" />
    <example id="example4" />
</root>

vs

<root>
    <example id="example3" />
    <example id="example1" />
    <example id="example4" />
</root>

which results in:

= Element "root"
...= Element "example" "id"="example3"
......- Attribute: "id" with value: "example1"
......+ Attribute: "id" with value: "example3"
...= Element "example" "id"="example1"
......- Attribute: "id" with value: "example2"
......+ Attribute: "id" with value: "example1"
...= Element "example" "id"="example4"
......- Attribute: "id" with value: "example3"
......+ Attribute: "id" with value: "example4"
...- Element "example" "id"="example4"

It'd be great if we could add the ability to produce a diff like the following instead:

= Element "root"
...- Element "example" "id"="example2"

In my opinion, this makes it much clearer what actually changed. If we want to keep the old behavior, we could add a parameter to the XmlComparer constructor that selects which comparison algorithm to use. Giving it a default value would avoid affecting existing applications that use this library.


I think it's this code and/or this code that would need to change.
I realize that it would probably be useful to define what makes an element "the same". When the elements have no children, it is fairly simple: an element with the same name and exactly the same attributes can be considered the same.
In fact, rather than deciding on complicated logic for the other cases (elements with children, more or fewer attributes, different values, etc.), maybe the simplest way to produce the "cleanest" diffs would be to find the element with the same name that has the fewest differences. That is, for each element in the destination document, find all elements with the same name in the source document, order them by the number of differences between the two, and take the one with the fewest. Once all the destination elements have been matched, any source elements that remain unmatched can be considered removed. Hopefully this won't affect performance too much, as I like how fast this library currently is.
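To make the "no children" case concrete, here is a minimal Python sketch. It is not the library's code; it uses Python's standard xml.etree.ElementTree only for illustration, and the function name same_name_and_attributes is hypothetical.

```python
import xml.etree.ElementTree as ET


def same_name_and_attributes(a, b):
    """Return True when both elements share a name and exactly the same attributes.

    Attribute order is irrelevant because attrib is a dict.
    Children and text content are deliberately ignored in this sketch.
    """
    return a.tag == b.tag and a.attrib == b.attrib
```

Under this definition, two childless elements that differ only in the order of their attributes still count as the same element.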
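As a rough illustration of this matching idea, here is a minimal Python sketch. It is not the library's implementation: it uses Python's standard xml.etree.ElementTree, the names count_differences and match_children are hypothetical, and the difference count only looks at attributes and direct text, not nested children.

```python
import xml.etree.ElementTree as ET


def count_differences(a, b):
    """Assumed cost: attributes whose presence or value differs, plus 1 if the text differs."""
    names = set(a.attrib) | set(b.attrib)
    cost = sum(1 for n in names if a.attrib.get(n) != b.attrib.get(n))
    if (a.text or "").strip() != (b.text or "").strip():
        cost += 1
    return cost


def match_children(source_parent, dest_parent):
    """Greedily pair each destination child with the closest same-name source child.

    Returns (pairs, added, removed), where removed holds the source children
    left over once every destination child has been considered.
    """
    unmatched = list(source_parent)
    pairs, added = [], []
    for dest in dest_parent:
        candidates = [s for s in unmatched if s.tag == dest.tag]
        if not candidates:
            added.append(dest)  # nothing with this name remains in the source
            continue
        best = min(candidates, key=lambda s: count_differences(s, dest))
        unmatched.remove(best)
        pairs.append((best, dest))
    return pairs, added, unmatched
```

Applied to the same-name sample from this issue, every destination element finds a source element with zero differences, and only the element with id "example2" is left over as removed. The pairing is greedy, so it is not guaranteed to minimize the total number of differences, and it compares every destination child against every remaining same-name source child, which is quadratic in the number of siblings.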
Maybe I'm over-engineering this in my head, but it may even make sense to add weighting so that changes to attributes would score differently to changes to children (or text content), so that something like:

<root>
  <example>Hello World!</example>
  <example attr="value" />
</root>

vs

<root>
  <example attr="foobar" />
  <example>hello world.</example>
</root>

would produce a more "natural" diff: the Hello World! text becomes hello world. and the attr value becomes foobar. Each of those is one "operation". The alternatives take two operations each: either (1) removing the attribute and (2) adding the text (or the reverse), or (1) removing the attribute and (2) adding the same attribute with a different value.
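The weighting idea could be sketched along these lines in Python. This is only an illustration under assumptions, not the library's code: the weight values are placeholders to tune, and WEIGHTS, weighted_cost and closest are hypothetical names. A changed value is scored lower than an attribute or text that exists on only one side.

```python
import xml.etree.ElementTree as ET

# Hypothetical weights; the numbers are placeholders, not taken from the library.
WEIGHTS = {
    "attribute_changed": 1.0,
    "attribute_added_or_removed": 1.5,
    "text_changed": 1.0,
    "text_added_or_removed": 1.5,
}


def weighted_cost(a, b, weights=WEIGHTS):
    """Score how different two elements are, by kind of change."""
    cost = 0.0
    for name in set(a.attrib) | set(b.attrib):
        if name in a.attrib and name in b.attrib:
            if a.attrib[name] != b.attrib[name]:
                cost += weights["attribute_changed"]
        else:
            cost += weights["attribute_added_or_removed"]
    text_a = (a.text or "").strip()
    text_b = (b.text or "").strip()
    if text_a != text_b:
        both_present = bool(text_a) and bool(text_b)
        cost += weights["text_changed" if both_present else "text_added_or_removed"]
    return cost


def closest(source_parent, dest_element, weights=WEIGHTS):
    """Return the same-name source child with the lowest weighted cost, or None."""
    candidates = [s for s in source_parent if s.tag == dest_element.tag]
    if not candidates:
        return None
    return min(candidates, key=lambda s: weighted_cost(s, dest_element, weights))


source = ET.fromstring(
    '<root><example>Hello World!</example><example attr="value" /></root>'
)
dest = ET.fromstring(
    '<root><example attr="foobar" /><example>hello world.</example></root>'
)
```

With these placeholder weights, closest(source, dest[0]) picks the source element that already has attr, and closest(source, dest[1]) picks the one that already has text, which matches the "one operation each" reading described above.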
