
daisydiff / daisydiff


Visual comparison of HTML in Java

Home Page: https://daisydiff.github.io/

License: Apache License 2.0

CSS 0.54% JavaScript 43.07% Java 54.85% XSLT 1.54%
java comparison-tool text-processing html

daisydiff's Introduction

This is a maintenance project for DaisyDiff in Java. The initial commit is a checkout of version 1.2 of the old DaisyDiff project.

For more documentation see daisydiff.github.io.

WARNING The maintenance of this repository by the Nuxeo organization is now strictly limited to critical security fixes. If you need some other kind of maintenance, please check the repository's forks or fork it yourself.

Standalone usage

java -jar daisydiff-1.2-NX4-SNAPSHOT-jar-with-dependencies.jar [oldHTML] [newHTML] [optional arguments]

Optional Arguments:

  • --file=[filename] - Write output to the specified file.
  • --type=[html/tag] - Use the html (default) diff algorithm or the tag diff.
  • --css=[cssfile1;cssfile2;cssfile3] - Add external CSS files.
  • --output=[html/xml] - Write html (default) or xml output.
  • --q - Generate less console output.

Example:

java -jar daisydiff-1.2-NX4-SNAPSHOT-jar-with-dependencies.jar http://web.archive.org/web/20070107145418/http://news.bbc.co.uk/ http://web.archive.org/web/20070107182640/http://news.bbc.co.uk/ --css=http://web.archive.org/web/20070107145418/http://news.bbc.co.uk/nol/shared/css/news_r5.css

Requirements: Java 1.5 or 6

Embedded usage

org.outerj.daisy.diff.DaisyDiff{

/**
 * Diffs two html files, outputting the result to the specified consumer.
 */
public static void diffHTML(InputSource oldSource, InputSource newSource, ContentHandler consumer, String prefix, Locale locale) throws SAXException, IOException;

/**
 * Diffs two html files word for word as source, outputting the result to
 * the specified consumer.
 */            
public static void diffTag(String oldText, String newText, ContentHandler consumer) throws Exception;

}

Requirements: Java 1.5 or 6

To run Daisy Diff embedded in your application, you don't need the entire Jar file. A much smaller Jar file without Xerces and NekoHtml will suffice.
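
Both diffHTML and diffTag write their result to a SAX ContentHandler. A minimal sketch of one possible consumer follows, assuming you want the output as a string: it uses the JDK's own TransformerHandler, and the class and method names (ConsumerSketch, newSerializingHandler, demo) are hypothetical. The DaisyDiff call itself is shown only as a comment, and the events are fed by hand, so the sketch runs without the DaisyDiff jar. Whether the caller or DaisyDiff emits startDocument/endDocument is not stated in this README, so treat that part as an assumption to verify.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

final class ConsumerSketch {

    /** Builds a ContentHandler that serializes incoming SAX events into the given writer. */
    static TransformerHandler newSerializingHandler(StringWriter out) throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    /** Feeds a few SAX events by hand to show what the consumer produces. */
    static String demo() {
        try {
            StringWriter out = new StringWriter();
            TransformerHandler consumer = newSerializingHandler(out);

            // With DaisyDiff on the classpath, the consumer would be handed over here, e.g.:
            //   DaisyDiff.diffTag(oldText, newText, consumer);
            // The events below merely stand in for what a diff might emit.
            consumer.startDocument();
            consumer.startElement("", "span", "span", new AttributesImpl());
            char[] text = "added".toCharArray();
            consumer.characters(text, 0, text.length);
            consumer.endElement("", "span", "span");
            consumer.endDocument();

            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }
}
```

Any other ContentHandler works the same way; a serializing handler is just a convenient choice when the diff should end up in a file or an HTTP response.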

PHP

The DaisyDiff algorithm has been integrated in MediaWiki. However, it had major errors and has been pulled out. More info at www.mediawiki.org/wiki/Visual_Diff. See also github.com/cdauth/htmldiff.

Acknowledgements

daisydiff's People

Contributors

ataillefer, dependabot[bot], gzgreg, jlleitschuh, jsdelivrbot, peter-kehl, troger


daisydiff's Issues

Check Boxes

When I compare files that both contain checkboxes or bullet points, the diff.htm file displays everything else correctly, but the code &#9744 is displayed literally rather than as a box.

Issue while comparing table using daisy diff

table compare issue

Hi,

I am comparing two files that each contain a table; I have edited the contents of the table in one file.
After comparing these two files with DaisyDiff, the result is not as expected.
The removed content appears at the top of the table, whereas it should appear beside the content that was edited. Please refer to the attached image for more clarity on this scenario.

Node.getLastCommonParent Algorithm

Hi, I'm wondering if anyone could shed some light on the Node.getLastCommonParent() method. Suppose we have the following hierarchy, we're trying to find where to add some deleted text, and we invoke
node.getLastCommonParent(otherNode)
where

node.getParentTree() => [body, div, p class=foo, span id=_123]
otherNode.getParentTree() => [body, div, p class=bar, span id=_123]

The "last" common ancestor is then deemed to be the div at index 1. The algorithm starts from the root and works down until it finds a node that differs between the two lists. In this instance it gets as far as the p tag at index 2, and since the p tags have different classes in the two trees, it treats the previous entry as the last common parent. The issue this seems to cause is that DaisyDiff then places the removed text beneath that div. It would seem to me that since the two nodes have the same parent in both trees, that node should be selected, i.e. we would want to show the deleted text under the span id=_123.

Is there some fundamental reason I'm failing to grasp that informed the algorithm in getLastCommonParent?
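
To make the question concrete, here is a minimal sketch of the root-down walk as described above, next to the alternative check being proposed. It is not the DaisyDiff source: plain strings stand in for tag nodes, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class LastCommonParentSketch {

    /**
     * Walks both ancestor lists from the root and returns the index of the
     * last entry that matches in both, or -1 if even the roots differ.
     */
    static int lastCommonIndexFromRoot(List<String> a, List<String> b) {
        int limit = Math.min(a.size(), b.size());
        int i = 0;
        while (i < limit && a.get(i).equals(b.get(i))) {
            i++;
        }
        return i - 1;
    }

    /** The alternative raised in the question: compare only the immediate parents. */
    static boolean sameImmediateParent(List<String> a, List<String> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return false;
        }
        return a.get(a.size() - 1).equals(b.get(b.size() - 1));
    }

    /** The two parent trees from the example above. */
    static final List<String> NODE_TREE =
            Arrays.asList("body", "div", "p class=foo", "span id=_123");
    static final List<String> OTHER_TREE =
            Arrays.asList("body", "div", "p class=bar", "span id=_123");
}
```

For the two example trees, the root-down walk stops at the div (index 1), while the immediate-parent check reports that both nodes sit under an equal span. Note that the equal-looking spans still live under different p elements, which may be why a root-down comparison was chosen.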

Missing dependencies

How does one build a self-contained executable jar from this? I am on Ubuntu 16.04 using JDK 8. I have looked in vain for a goal or option that will accomplish this.

Tests fail

All tests fail because the HTML tags are wrapped across lines differently. The main problem is that the comparison is done with an assertEqual(str1, str2). The expected file (/target/test-classes/testdata/General/Minimal/32 Empty tag) contains, for example:

<p> A <span class="diff-html-added" id="added-diff-0" previous="first-diff" changeId="added-diff-0" next="last-diff"></span>
</p><p>
<table>
<tbody></tbody>
</table>
</p><p> B </p>

But the generated output is

<p>
    <table>
        <tbody></tbody>
    </table>
</p>

therefore the compare fails. This affects ALL test cases.
Tested with openjdk11.
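
One possible way to make such a comparison independent of line wrapping is to normalize both strings before asserting equality. The following is only a sketch under that assumption, not what the project's tests do; the class and method names are hypothetical. Be aware that whitespace between inline tags can be significant in rendered HTML, so this normalization is lossy.

```java
final class MarkupCompareSketch {

    /** Removes whitespace-only runs between tags and collapses all other whitespace runs. */
    static String normalize(String markup) {
        return markup
                .replaceAll(">\\s+<", "><") // drop indentation and line breaks between tags
                .replaceAll("\\s+", " ")    // collapse remaining whitespace runs to one space
                .trim();
    }

    /** Compares two markup strings while ignoring differences in wrapping and indentation. */
    static boolean equalIgnoringLayout(String expected, String actual) {
        return normalize(expected).equals(normalize(actual));
    }
}
```

A test would then assert on equalIgnoringLayout(expected, actual), or on the two normalized strings, instead of comparing the raw strings.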

DaisyDiff also adds unchanged HTML code to daisydiff.htm

Hello friends,

I have added the DaisyDiff library to our system and call it from the command line to compare two HTML files. Currently it works quite well. The only problem we are facing is that when there is no change between the two HTML files, or part of them remains the same, the unchanged content is still added to the diff report. How can I exclude it? Any ideas? Thank you.

HTMLDiffer.diff hangs *JVM* on dirty input

Last week we encountered an HTML page that has a lot of \ and ' characters dropped into it. The previous version of the same page didn't have that garbage. See the attached files: first.html is the initial version of the page and second.html is the updated version. When trying to calculate the differences between the two using the following sample code, HTMLDiffer hangs:

		String html1;
		try (BufferedReader reader = Files.newBufferedReader(Paths.get("first.html"))) {
			StringBuilder bodyBuilder = new StringBuilder();
			String line;
			while ((line = reader.readLine()) != null)
				bodyBuilder.append(line).append('\n');
			html1 = bodyBuilder.toString();
		}

		String html2;
		try (BufferedReader reader = Files.newBufferedReader(Paths.get("second.html"))) {
			StringBuilder bodyBuilder = new StringBuilder();
			String line;
			while ((line = reader.readLine()) != null)
				bodyBuilder.append(line).append('\n');
			html2 = bodyBuilder.toString();
		}
		
		InputStream oldStream = new ByteArrayInputStream(html1.getBytes());
		InputStream newStream = new ByteArrayInputStream(html2.getBytes());
	
		Locale locale = LocaleUtils.toLocale("hr");
	
		HtmlCleaner cleaner = new HtmlCleaner();
	
		InputSource oldSource = new InputSource(oldStream);
		InputSource newSource = new InputSource(newStream);
	
		DomTreeBuilder oldHandler = new DomTreeBuilder();
		cleaner.cleanAndParse(oldSource, oldHandler);
		TextNodeComparator leftComparator = new TextNodeComparator(oldHandler, locale);
	
		DomTreeBuilder newHandler = new DomTreeBuilder();
		cleaner.cleanAndParse(newSource, newHandler);
		TextNodeComparator rightComparator = new TextNodeComparator(newHandler, locale);

		DiffOutput nullDiffOutput = new DiffOutput() {
			@Override
			public void generateOutput(TagNode node) throws SAXException { /* ignore */ }
		};
		
		HTMLDiffer differ = new HTMLDiffer(nullDiffOutput);
		
		System.err.println("Going to hang now :)");
		differ.diff(leftComparator, rightComparator);	// HANG!

I understand that this is very unusual HTML, and it would be perfectly acceptable for the diff calculation to take a very long time. What is weird is that upon entering HTMLDiffer.diff the whole app, which is running inside a Tomcat container, stops accepting inbound connections. Other apps on the same Tomcat instance (the Manager app, for instance) stop accepting connections too, which suggests that something happens at the JVM level.
I've tried to figure out what global resource HTMLDiffer might consume that would cause the whole JVM to go down, but couldn't find any. It is reproducible, though, and happens every time with the given HTML input.

Ubuntu 16.04.1 LTS
Apache Tomcat 8.5.6
Oracle Java 1.8.0u112
daisydiff 1.2-NX5-SNAPSHOT

I would gladly provide any additional info or test case if needed.

Error in generating the output: accessExternalDTD' is not recognized

files-to-compare.zip
When my source and destination contain only this image, with a different attribute (such as height) in the img tag, I get this error:

Is it because of the size of the image?

Comparing documents: C:/Users/smahajan/Desktop/JAR_FILES/old.html and C:/Users/smahajan/Desktop/JAR_FILES/new.html
Diff type: html
Writing html output to daisydiff.htm

.Warning: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
Compiler warnings:
WARNING: 'org.apache.xerces.jaxp.SAXParserImpl: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.'
Warning: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
Compiler warnings:
WARNING: 'org.apache.xerces.jaxp.SAXParserImpl: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.'
.Warning: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
Compiler warnings:
WARNING: 'org.apache.xerces.jaxp.SAXParserImpl: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.'
.java.lang.StackOverflowError
at java.util.regex.Pattern.sequence(Pattern.java:2130)
at java.util.regex.Pattern.expr(Pattern.java:1996)
at java.util.regex.Pattern.compile(Pattern.java:1696)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
at java.lang.String.replaceAll(String.java:2223)
at org.outerj.daisy.diff.html.ancestor.ChangeText.clean(ChangeText.java:107)
at org.outerj.daisy.diff.html.ancestor.ChangeText.addText(ChangeText.java:33)
at org.outerj.daisy.diff.html.ancestor.ChangeText.addTextBrokenAcrossLines(ChangeText.java:90)
at org.outerj.daisy.diff.html.ancestor.ChangeText.addTextCarefully(ChangeText.java:65)
at org.outerj.daisy.diff.html.ancestor.ChangeText.addText(ChangeText.java:36)
