java-diff-utils / java-diff-utils

Diff Utils library is an OpenSource library for performing the comparison / diff operations between texts or some kind of data: computing diffs, applying patches, generating unified diffs or parsing them, generating diff output for easy future displaying (like side-by-side view) and so on.

Home Page: https://java-diff-utils.github.io/java-diff-utils/

License: Apache License 2.0

Java 94.10% Shell 5.90%

diff java unified-diffs tools inline merge-text java-diff-utils computing-diffs diff-algorithm meyer

java-diff-utils's Introduction

java-diff-utils

Intro

The Diff Utils library is an open source library for performing comparison operations between texts: computing diffs, applying patches, generating or parsing unified diffs, generating diff output for easy later display (such as a side-by-side view), and so on.

The main reason for building this library was the lack of easy-to-use libraries with all the usual functionality you need when working with diff files. It was originally inspired by the JRCS library and the nice design of its diff module.

This is originally a fork of java-diff-utils from Google Code Archive.

API

Javadocs of the current release version: JavaDocs java-diff-utils

Examples

More helpful information and examples can be found on the Examples page.

The two outputs below were generated using java-diff-utils. The source code can also be found on the Examples page:

Producing a one liner including all difference information.

//create a configured DiffRowGenerator
DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .mergeOriginalRevised(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")      //introduce markdown style for strikethrough
                .newTag(f -> "**")     //introduce markdown style for bold
                .build();

//compute the differences for two test texts.
List<DiffRow> rows = generator.generateDiffRows(
                Arrays.asList("This is a test senctence."),
                Arrays.asList("This is a test for diffutils."));
        
System.out.println(rows.get(0).getOldLine());

This is a test ~senctence~**for diffutils**.

Producing a side by side view of computed differences.

DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")
                .newTag(f -> "**")
                .build();
List<DiffRow> rows = generator.generateDiffRows(
                Arrays.asList("This is a test senctence.", "This is the second line.", "And here is the finish."),
                Arrays.asList("This is a test for diffutils.", "This is the second line."));
        
System.out.println("|original|new|");
System.out.println("|--------|---|");
for (DiffRow row : rows) {
    System.out.println("|" + row.getOldLine() + "|" + row.getNewLine() + "|");
}

|original|new|
|--------|---|
|This is a test ~senctence~.|This is a test **for diffutils**.|
|This is the second line.|This is the second line.|
|~And here is the finish.~||

Main Features

computing the difference between two texts.
capable of handling more than plain ASCII: arrays or lists of any type that implements hashCode() and equals() correctly can be diffed using this library
patch and unpatch the text with the given patch
parsing the unified diff format
producing human-readable differences
inline difference construction
Algorithms:
- Myers Standard Algorithm
- Myers with linear space improvement
- HistogramDiff using JGit Library

Algorithms

Myers diff
HistogramDiff

But the algorithm can easily be replaced by any other that is better suited to handling your texts. I plan to add implementations of some in the future.

Source Code conventions

Recently a checkstyle process was integrated into the build process. java-diff-utils follows the Sun Java format convention. Tabs are not allowed; use spaces.

public static <T> Patch<T> diff(List<T> original, List<T> revised,
    BiPredicate<T, T> equalizer) throws DiffException {
    if (equalizer != null) {
        return DiffUtils.diff(original, revised,
        new MyersDiff<>(equalizer));
    }
    return DiffUtils.diff(original, revised, new MyersDiff<>());
}

This is a valid piece of source code:

blocks without braces are not allowed
after control statements (if, while, for) a whitespace is expected
the opening brace should be in the same line as the control statement

To Install

Just add the code below to your Maven dependencies:

<dependency>
    <groupId>io.github.java-diff-utils</groupId>
    <artifactId>java-diff-utils</artifactId>
    <version>4.12</version>
</dependency>

or using Gradle:

// https://mvnrepository.com/artifact/io.github.java-diff-utils/java-diff-utils
implementation "io.github.java-diff-utils:java-diff-utils:4.12"

java-diff-utils's Issues

How to do git-like conflict output

When git hits a conflict, instead of completely failing, it will create a file like so:

# Unchanged Line
<<<<<<< HEAD
# their change
=======
# our change
>>>>>>> my-branch

Could I produce something similar with this library?

I'm already using UnifiedDiff, but it only throws when encountering an issue.
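The marker layout asked about above can be sketched independently of the library. The following is a minimal illustration in plain Java. ConflictMarkers and its format method are hypothetical names, not part of java-diff-utils, and the sketch assumes the caller already knows which lines of each side are in conflict.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of java-diff-utils. */
final class ConflictMarkers {

    /**
     * Wraps two competing versions of a region in git-style conflict markers.
     * The labels are whatever the caller wants to show after the markers.
     */
    static List<String> format(List<String> headLines, String headLabel,
            List<String> otherLines, String otherLabel) {
        List<String> out = new ArrayList<>();
        out.add("<<<<<<< " + headLabel);
        out.addAll(headLines);
        out.add("=======");
        out.addAll(otherLines);
        out.add(">>>>>>> " + otherLabel);
        return out;
    }

    public static void main(String[] args) {
        List<String> result = new ArrayList<>();
        result.add("# Unchanged Line");
        result.addAll(format(
                Arrays.asList("# their change"), "HEAD",
                Arrays.asList("# our change"), "my-branch"));
        result.forEach(System.out::println);
    }
}
```

Deciding which regions actually conflict is the harder part and is not covered here; the sketch only shows how the output could be assembled once that is known.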

linefeed possibly within new / old tag

e.g. <span <br/> class ... >

That should not happen.

javadoc: document when DiffException is thrown

Please add javadoc explaining why DiffException is thrown.

What's the reason that diff-ing two strings may throw DiffException (DiffUtils.diffInline)?
Shouldn't string diff never throw an exception?

DiffRowGenerator replaces < and >

Expected Behavior

DiffRowGenerator would leave < and > the way they are

Actual Behavior

DiffRowGenerator replaces < and > with their html entities.

Steps to Reproduce the Problem

Run DiffRowGenerator.create().generateDiffRows(List.of("<"), List.of("<"))
Observe that the resulting String has &lt; instead of <

Specifications

Version: 4.0
Platform: Linux
Subsystem:

Using reportLinesUnchanged is not an option since that means tags are missing if an entire line is added/deleted. Having something like replaceSomeHTMLEntities with default true but optionally disableable would be nice, or being able to supply a general string normalisation function.

If needed/wanted I can PR this.
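Until such an option exists, one possible workaround is to undo the escaping after the rows have been generated. The sketch below assumes that only < and > are replaced, as described in this issue. HtmlEntityUndo is a hypothetical helper, not part of the library.

```java
/** Hypothetical post-processing helper, not part of java-diff-utils. */
final class HtmlEntityUndo {

    /** Reverses the two replacements described in the issue: &lt; and &gt;. */
    static String unescapeAngleBrackets(String line) {
        return line.replace("&lt;", "<").replace("&gt;", ">");
    }

    public static void main(String[] args) {
        System.out.println(unescapeAngleBrackets("List&lt;String&gt; rows"));
    }
}
```

Note the limitation: if the original text already contained the literal sequence &lt; or &gt;, this step would convert it too, so it is not a faithful inverse for every input.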

upgrade infrastructure

junit 5
aspectj
pom dependencies

Support Hierarchy

Expected Behavior

We need support for generating tree like diffs. Similar to a file system diff where you have folders compared as well as files.
In our case, we have groups of groups and we currently create separate diffs for each group. Having hierarchy support will enable us to create a single unified diff for the customer.

Actual Behavior

Hierarchy not supported

Specifications

Version: 4.0
Platform: Win/Linux
Subsystem:

Problem with using 4.4 in modularized project (cannot determine module name)

Describe the bug
Update to 4.4 in our JabRef (modularized application) fails. Gradle complains about the module name:

cannot determine module name for /home/travis/.gradle/caches/modules-2/files-2.1/io.github.java-diff-utils/java-diff-utils/4.4/87ebb16140d120da919b62117865954da06981b6/java-diff-utils-4.4.jar
/home/travis/build/JabRef/jabref/src/main/java/org/jabref/gui/mergeentries/DiffHighlighting.java:10: error: package com.github.difflib does not exist

import com.github.difflib.DiffUtils;

To Reproduce
Use the 4.4 in a modularized java project
For reference, this is the PR which is failing: JabRef/jabref#5594

I tried also to add the parent as well:

 compile 'io.github.java-diff-utils:java-diff-utils-parent:4.4'
  compile 'io.github.java-diff-utils:java-diff-utils:4.4'

Expected behavior
No errors

System

Java version 13
Version 4.4

diff have a problem

Expected Behavior

original	new
This is a test ~~senctence~~.	This is a test for diffutils.
	This is the second line.

Actual Behavior

original	new
This is a test ~~senctence~~.	This is a test for diffutils.**
	This is the second line.

DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")
                .newTag(f -> "**")
                .build();
List<DiffRow> rows = generator.generateDiffRows(
                Arrays.asList("This is a test senctence."),
                Arrays.asList("This is a test for diffutils.", "This is the second line."));

System.out.println("|original|new|");
System.out.println("|--------|---|");
for (DiffRow row : rows) {
    System.out.println("|" + row.getOldLine() + "|" + row.getNewLine() + "|");
}

implement other versions of the Myers algorithm to successfully analyze large datasets

ignoreBlankLines does not appear to be implemented

Describe the bug
I'd like to remove blank lines from the diff generated by DiffRowGenerator but there's no option. I've had a look at the sources of the 4.5 release and I can see mention of this option there but no implementation or setter. WIP?

System

diffutils 4.5 from maven

Diff by phrase functionality?

Thanks a lot for this project. I was wondering if something can be done to achieve the following. Let's say I have 2 versions of a string:

"J. G. Feldstein, Chair" and "T. P. Pastor, Chair"

When using character level diff, I get this result:

Makes sense, of course, since it diffs by character and keeps all the ones that were there before. If I diff by word, this happens:

Again, makes sense. However, this makes it difficult for a user to understand what happens. What would be great is if we could "whitelist" a few characters, so that they don't count on the diff, and use it in phrase level. So in that case, if we whitelist whitespace and punctuation, we could get:

Would this be feasible and/or possible?

[Feature request] Consider making project modular

It's nice to have zero-dependency implementations. I think it's possible to split the library into 2 or 3 modules:

diff-utils-core
diff-utils-myers
diff-utils-jgit

or (just moving jgit dependency out of core library):

diff-utils
diff-utils-jgit

DiffRowGenerator returns too many diffs in special cases

Describe the bug
DiffRowGenerator returns too many diffs in special cases.

To Reproduce
Steps to reproduce the behavior:

 DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .reportLinesUnchanged(true)
                .oldTag(f -> "~")
                .newTag(f -> "**")
                .mergeOriginalRevised(true)
//                .inlineDiffByWord(true)
                .build();

        List<DiffRow> diffRows = generator.generateDiffRows(sortedRemovals, sortedAdditions);

sortedRemovals:
["Ich möchte nicht mit einem Bot sprechen.", "Ich soll das schon wieder wiederholen?"]

sortedAdditions:
["Ich möchte nicht mehr mit dir sprechen. Leite mich weiter.", "Kannst du mich zum Kundendienst weiterleiten?"]

Expected behavior
diffRows should have size 2 but it has size 3.

System

Java version 1.8
Version 4.5

Setting inlineDiffByWord(true) does not produce the error.

DiffRow implements Serializable

Use of DiffRow causes problems in clustered environments (using Infinispan), because DiffRow is not marked as Serializable, although it IS serializable. Add ... implements Serializable. See attached code
DiffRow.zip

More examples and some colored online differences at the readme and wiki

multi-file diffs are not supported

(this is probably not a bug - it's just missing)

The issue is that although a diff of multiple files is parsed, the patch information cannot be related back to the files it belongs to. At least I did not find an API to find out which files a particular hunk/delta refers to.

To Reproduce

'diff -U olddir newdir > udiff.diff' with some files differing

now use UnifiedDiffUtils.parseUnifiedDiff("udiff.diff")

I would expect that it is possible now for every hunk to understand which fromFile and which toFile have been used to produce a particular delta/hunk.

List<String> generated from ResultSet show inconsistent marks as a List<String> written by hand

Expected Behavior

ResultSet from JDBC converted into Strings and then added to a List should show the same differences as a List = Arrays.asList() created by hand.

When using this (entered by hand)

List<String> listOne = Arrays.asList("MASS_TABLE,ID,NUMBER,22,N,", "MASS_TABLE,ISSUEID,NUMBER,22,Y,", "MASS_TABLE,MODIFIED,NUMBER,22,Y,", "MASS_TABLE,SYSTEMNAME,VARCHAR2,1020,Y,");
List<String> listTwo = Arrays.asList("ATOM_TABLE,ID,NUMBER,22,N,", "ATOM_TABLE,ISSUEID,NUMBER,22,Y,", "ATOM_TABLE,MODIFIED,NUMBER,22,Y,", "ATOM_TABLE,ACRONYM,VARCHAR2,255,Y," );

You will get this when ran through the DiffGenerator:

|~MASS_TABLE~,ID,NUMBER,22,N, | **ATOM_TABLE**,ID,NUMBER,22,N,|
|~MASS_TABLE~,ISSUEID,NUMBER,22,Y, | **ATOM_TABLE**,ISSUEID,NUMBER,22,Y,|
|~MASS_TABLE~,MODIFIED,NUMBER,22,Y, | **ATOM_TABLE**,MODIFIED,NUMBER,22,Y,|
|~MASS_TABLE~,~SYSTEMNAME~,VARCHAR2,~1020~,Y, | **ATOM_TABLE**,**ACRONYM**,VARCHAR2,**255**,Y,|

Actual Behavior

DiffGenerator shows apparently incorrect results.

Example:

("NUMBER" on the first row, and "VARCHAR" on the last row show marks ("**") in front of the word to indicate the difference, but don't show the corresponding marks behind it). Also, as the results go further, some differences will be marked on the first list and no corresponding marks will be shown at all on the second list.

| ~MASS_TABLE~,ID,NUMBER,22,N, | **ATOM_TABLE**,**ID**,**NUMBER,22,N, |
| ~MASS_TABLE~,ISSUEID,NUMBER,22,Y, | ATOM_TABLE,ISSUEID,NUMBER,22,Y, |
| ~MASS_TABLE~,MODIFIED,NUMBER,22,Y, | ATOM_TABLE,MODIFIED,NUMBER,22,Y, |
| ~MASS_TABLE~,SYSTEMNAME,VARCHAR2,~1020~,Y, | ATOM_TABLE,ACRONYM,**VARCHAR2,**255**,Y, |

Steps to Reproduce the Problem

1. Connect to two different Oracle databases and run a SQL query, saving the information as text from the ResultSet into different Lists.
2. Compare the two Lists in the DiffGenerator.
3. Compare these results to the results when the information is put in by hand and compared using DiffGenerator.

Shows as expected:

  List<String> listOne = Arrays.asList("MASS_TABLE,ID,NUMBER,22,N,", "MASS_TABLE,ISSUEID,NUMBER,22,Y,", "MASS_TABLE,MODIFIED,NUMBER,22,Y,", "MASS_TABLE,SYSTEMNAME,VARCHAR2,1020,Y,");
        List<String> listTwo = Arrays.asList("ATOM_TABLE,ID,NUMBER,22,N,", "ATOM_TABLE,ISSUEID,NUMBER,22,Y,", "ATOM_TABLE,MODIFIED,NUMBER,22,Y,", "ATOM_TABLE,ACRONYM,VARCHAR2,255,Y," );

        DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")
                .newTag(f -> "**") 
                .build();

        List<DiffRow> diffRowList;

        try {
            diffRowList = generator.generateDiffRows(listOne, listTwo);

            for (DiffRow diffRow : diffRowList) {
                System.out.println("|" + diffRow.getOldLine() + "|" + diffRow.getNewLine() + "|");
            }
        } catch (DiffException e) {
            e.printStackTrace();
        }

Does not show as expected, using database:

...
  List<String> databaseSchemaList = new ArrayList<>();
        try {
            conn = DBHelper.getConnection(databaseName);
            stmt = Objects.requireNonNull(conn).createStatement();
            rs = stmt.executeQuery(query);

            while (rs.next()) {
                int columnCount = rs.getMetaData().getColumnCount();
                StringBuilder rsStringBuilder = new StringBuilder();
                for (int i = 1; i <= columnCount; i++) {
                    Object rsObject = rs.getObject(i);
                    String rsString = (rsObject == null) ? "NULL" : rsObject.toString();
                    rsStringBuilder.append(rsString).append(",");
                }         
                databaseSchemaList.add(rsStringBuilder.toString());
            }
       
        } catch (Exception e) {
            e.printStackTrace();
        } finally {     
            ...
        }
        return databaseSchemaList;   
    }

    private void getComparisonResults(List<String> listOne, List<String> listTwo) {

        DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~") 
                .newTag(f -> "**")
                .build();

        List<DiffRow> diffRowList;

        try {
            diffRowList = generator.generateDiffRows(listOne, listTwo);

            for (DiffRow diffRow : diffRowList) {
                System.out.println("| " + diffRow.getOldLine() + " | " + diffRow.getNewLine() + " |");
            }
      ...
}

 public static void main(String[] args) {

        SchemaComparison schemaComparison = new SchemaComparison();

        //get list from each database
        List<String> aSchemaList = schemaComparison.getDatabaseSchemaList(DatabaseName.A);
        List<String> bSchemaList = schemaComparison.getDatabaseSchemaList(DatabaseName.B);

       //run comparison
        schemaComparison.getComparisonResults(aSchemaList,bSchemaList);

    }

Specifications

Version: 2.3 SNAPSHOT
Platform: Windows 7
Subsystem: Oracle Database 11g, IntelliJ 2018.1

Get list of lines that are common between source and patch

I am trying to iterate through the deltas; however, I would like to get the list of lines that are common between the two Lists by checking
if (delta.getType() == DeltaType.EQUAL) {
}

However, I am not able to see how to get this. The JUnit code and test examples do not show such an example. Please help?
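If the patch at hand contains only change, insert and delete deltas, the common lines can be derived as everything in the original list that no delta covers. The sketch below shows that idea in plain Java. ChangedRange is a hypothetical stand-in for the position and size of a delta's original-side chunk; the sample data reuses the Hello / Hi / Last lists from another issue on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; ChangedRange stands in for a delta's source chunk. */
final class CommonLines {

    /** Position and size of one changed region in the original list. */
    static final class ChangedRange {
        final int position;
        final int size;

        ChangedRange(int position, int size) {
            this.position = position;
            this.size = size;
        }
    }

    /**
     * Returns the lines of original that no range covers.
     * Ranges must be sorted by position and must not overlap.
     */
    static <T> List<T> unchanged(List<T> original, List<ChangedRange> ranges) {
        List<T> common = new ArrayList<>();
        int next = 0; // index of the next original line not yet classified
        for (ChangedRange range : ranges) {
            common.addAll(original.subList(next, range.position));
            next = range.position + range.size;
        }
        common.addAll(original.subList(next, original.size()));
        return common;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("Hello", "Hi", "Last");
        // One changed line at index 0 and one deleted line at index 2.
        List<ChangedRange> ranges = Arrays.asList(
                new ChangedRange(0, 1), new ChangedRange(2, 1));
        System.out.println(unchanged(original, ranges)); // prints [Hi]
    }
}
```

An insert has size 0 on the original side, so it contributes nothing to skip and the loop handles it without a special case.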

text/DiffRowGenerator customizing (custom algorithm, custom equalizer)

The DiffRowGenerator should be as customizable as the DiffUtils itself:

algorithm
equalizer

Moreover, it would be helpful to provide the type of Tag being processed by the tagGenerator.

Need Help: UTF 8 Support

How to get UTF 8 support in DiffRowGenerator class?

Introduce more shortcuts in DiffUtil class

Whitespace visualization in diffs?

Code formatting often involves "extra whitespace" or "missing whitespace" or "CRLF vs LF" issues.
Unfortunately, whitespace is not visible by default.

What if there was an option to visualize the whitespace that contributes to the diff?

For instance: use ␍␊ characters when output encoding allows. Use · for space visualization, and so on.

Note: the point is not "visualize all the whitespace characters", but "visualize only those chars that contribute to the diff".
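The substitution itself is simple to sketch. The helper below is hypothetical and not part of the library. It maps space, CR and LF to the visible characters suggested above, and the intent would be to call it only on the fragments that a diff reports as changed, so that unchanged whitespace stays untouched.

```java
/** Hypothetical helper, not part of java-diff-utils. */
final class WhitespaceMarks {

    /** Replaces space, CR and LF with visible stand-ins. */
    static String visualize(String fragment) {
        StringBuilder out = new StringBuilder(fragment.length());
        for (int i = 0; i < fragment.length(); i++) {
            char c = fragment.charAt(i);
            switch (c) {
                case ' ':
                    out.append('\u00B7'); // middle dot
                    break;
                case '\r':
                    out.append('\u240D'); // symbol for carriage return
                    break;
                case '\n':
                    out.append('\u240A'); // symbol for line feed
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(visualize("a  b\r\n"));
    }
}
```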

integration Java 9 automatic module name via meta.inf

to make it useable in a java 9 module environment

Migration to new java-diff-utils organization

A code review due to the licensing issues is still required. (@koppor)

DiffRowGenerator should be able to deliver DiffRow list without HTML tags in oldLine and newLine

By default DiffRowGenerator.generateDiffRows delivers a List of DiffRow where the oldLine and the newLine property are polluted with HTML tags and HTML-escaping sequences like &gt;.

It should be possible to get the unmodified text lines back for use in projects where HTML enrichment/escaping is not necessary or useful.

My proposal is to introduce a new property reportLinesUnWrapped which can be set by the DiffRowGenerator.Builder. If this property is set to true the DiffRowGenerator delivers back the original text lines.

Please see the attached source.
DiffRowGenerator.zip

Insert next to change is not working as expected.

Expected Behavior

Old : [position: 0, size: 1, lines: [Hello]]
New : [position: 0, size: 1, lines: [Hello World]]
Diff Type : CHANGE

Old : [position: 0, size: 0, lines: []]
New : [position: 1, size: 1, lines: [new]]
Diff Type : INSERT

Old : [position: 2, size: 1, lines: [Last]]
New : [position: 3, size: 0, lines: []]
Diff Type : DELETE

Actual Behavior

Old : [position: 0, size: 1, lines: [Hello]]
New : [position: 0, size: 2, lines: [Hello World, new]]
Diff Type : CHANGE

Old : [position: 2, size: 1, lines: [Last]]
New : [position: 3, size: 0, lines: []]
Diff Type : DELETE

Steps to Reproduce the Problem

1.Use original.txt and revised.txt as the input and compute difference using the code given below.

Original.txt
Hello
Hi
Last

Revised.txt
Hello World
new
Hi

Java code

List<String> first = Files.readAllLines(Paths.get("src/main/resources/original.txt"));
List<String> second = Files.readAllLines(Paths.get("src/main/resources/revised.txt"));
Patch<String> diff = DiffUtils.diff(first, second);

for (Delta<String> s : diff.getDeltas()) {
    System.out.println("Old : " + s.getOriginal());
    System.out.println("New : " + s.getRevised());
    System.out.println("Diff Type : " + s.getType());
    System.out.println();
}

Specifications

Version: 2.2
Platform: Windows
Subsystem: Windows 7, JDK 1.8_141 (x64)

Using of LinkedList

Hi!
LinkedList is used in many places together with the method list.get(i). With a LinkedList, getting the n'th element means iterating through up to n-1 items in the list, so a loop of the form for (int i=0; ... has n^2 complexity.

Maybe such code should be replaced with for (T item : list), or LinkedList should be replaced with ArrayList.

For example, in DiffRowGenerator

final List<Delta<String>> deltaList = patch.getDeltas();
        for (int i = 0; i < deltaList.size(); i++) {
            Delta<String> delta = deltaList.get(i);

and patch#deltas is LinkedList.

We also can see get(i) in MyersDiff.java.
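Both remedies suggested in this issue can be illustrated with plain Java collections. The class and method names below are hypothetical and the data is made up; the point is only that the for-each loop uses the list's iterator (one pass), while a one-time copy into an ArrayList makes later get(i) calls constant time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

/** Illustration only; the names here are hypothetical. */
final class ListTraversal {

    /** Sums string lengths with the list's iterator: one pass for any List. */
    static int totalLength(List<String> items) {
        int total = 0;
        for (String item : items) {
            total += item.length();
        }
        return total;
    }

    /** Copies once into an ArrayList so later get(i) calls are constant time. */
    static <T> List<T> randomAccessCopy(List<T> items) {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        List<String> deltas = new LinkedList<>(Arrays.asList("ab", "c", "def"));
        System.out.println(totalLength(deltas));

        // Indexed access is fine once the list supports random access.
        List<String> indexed = randomAccessCopy(deltas);
        for (int i = 0; i < indexed.size(); i++) {
            System.out.println(i + ": " + indexed.get(i));
        }
    }
}
```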

usage: DiffRowGenerator to compute git-diff like diffs?

Hi!

First thanks for this project, it's quite interesting and useful 👍
I'm trying to use it to compute diffs between pieces of Java code and I want to obtain diffs like those created by git-diff. The ultimate objective would be to obtain exactly the same output, if possible.

So do you have any hint to achieve this objective?

I started with a very simple solution using DiffRowGenerator, but maybe there's already something created to transform a diff-utils patch into a git patch or something like that?

[Tests] `LRHistogramDiffTest` fails when run in isolation

Describe the bug
LRHistogramDiffTest is failing when run in isolation

To Reproduce

Run mvn -Dtest=LRHistogramDiffTest test.
Observe test failing.

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.github.difflib.algorithm.jgit.LRHistogramDiffTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 28.826 sec <<< FAILURE! - in com.github.difflib.algorithm.jgit.LRHistogramDiffTest
testPossibleDiffHangOnLargeDatasetDnaumenkoIssue26(com.github.difflib.algorithm.jgit.LRHistogramDiffTest)  Time elapsed: 28.776 sec  <<< FAILURE!
java.lang.AssertionError: expected:<50> but was:<246579>
	at com.github.difflib.algorithm.jgit.LRHistogramDiffTest.testPossibleDiffHangOnLargeDatasetDnaumenkoIssue26(LRHistogramDiffTest.java:88)


Results :

Failed tests: 
  LRHistogramDiffTest.testPossibleDiffHangOnLargeDatasetDnaumenkoIssue26:88 expected:<50> but was:<246579>

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0

However, running all tests with mvn test is successful.

Expected behavior
Test is passing.

System

Java version: 1.8.0_201.
Version: Current master (156f4f2).

unified diff parser in unified-diff-parser branch parsing issues

A unified diff such as the one below is not parsed properly. I do see that the parse completes; however, the fromFile gets updated with the content of the '---' line from within the hunk lines. Essentially, the fromFile is "some comment" after the parse.

--- a.vhd	2019-04-18 13:49:39.516149751 +0200
+++ b.vhd	2019-04-18 11:33:08.372563078 +0200
@@ -2819,3 +2819,3 @@
--- some comment
-bla
-bla
+
+

Read the above unified patch using:

  val diff = UnifiedDiffReader.parseUnifiedDiff(diff_file)
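One way to avoid this kind of confusion is to use the line counts in the hunk header: while a hunk still expects old-side lines, a line starting with '-' is a removal, even if it happens to start with "--- ". The scanner below is a hypothetical, simplified illustration of that idea in plain Java. It is not the library's parser and it only collects the from-file names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner, not the library's parser. */
final class FromFileScanner {

    private static final Pattern HUNK_HEADER =
            Pattern.compile("^@@ -\\d+(?:,(\\d+))? \\+\\d+(?:,(\\d+))? @@.*$");

    /** Collects the names from "--- " header lines, skipping hunk bodies. */
    static List<String> fromFiles(List<String> diffLines) {
        List<String> names = new ArrayList<>();
        int oldLeft = 0; // old-side lines still expected in the current hunk
        int newLeft = 0; // new-side lines still expected in the current hunk
        for (String line : diffLines) {
            if (oldLeft > 0 || newLeft > 0) {
                boolean context = line.startsWith(" ") || line.isEmpty();
                if (line.startsWith("-") && oldLeft > 0) {
                    oldLeft--;
                    continue;
                }
                if (line.startsWith("+") && newLeft > 0) {
                    newLeft--;
                    continue;
                }
                if (context && oldLeft > 0 && newLeft > 0) {
                    oldLeft--;
                    newLeft--;
                    continue;
                }
                if (line.startsWith("\\")) {
                    continue; // e.g. "\ No newline at end of file"
                }
                // Anything else ends the hunk early; treat the line as a header.
                oldLeft = 0;
                newLeft = 0;
            }
            Matcher hunk = HUNK_HEADER.matcher(line);
            if (hunk.matches()) {
                oldLeft = count(hunk.group(1));
                newLeft = count(hunk.group(2));
            } else if (line.startsWith("--- ")) {
                // Keep only the name, dropping a tab-separated timestamp.
                names.add(line.substring(4).split("\t", 2)[0]);
            }
        }
        return names;
    }

    /** A hunk header may omit the count, which then means 1. */
    private static int count(String group) {
        return group == null ? 1 : Integer.parseInt(group);
    }
}
```

For the diff shown in this issue, such a scanner would report a.vhd as the only from-file, because "--- some comment" is consumed as one of the three old-side lines announced by the hunk header.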

Need to get raw text from DiffRow

Code snippet

	DiffRowGenerator generator = DiffRowGenerator.create()
			.showInlineDiffs(true)
			.mergeOriginalRevised(false)
			.inlineDiffByWord(true)
			.oldTag(f -> "~~")
			.newTag(f -> "**")
			.ignoreWhiteSpaces(true)
			.build();
	List<DiffRow> rows = generator.generateDiffRows(content1, content2);
	int line = 1;
	for (DiffRow row : rows) {
		if (isIncluded(row)) {
			// Write out the markdown ...
		}
		line++;
	}

The function isIncluded() is implemented as

	private boolean isIncluded(DiffRow row) {
		if ( row.getTag() == Tag.EQUAL) {
			return false;
		}
		return excludePatterns.stream()
				.noneMatch(p -> p.matcher(row.getOldLine()).find()
						|| p.matcher(row.getNewLine()).find());
	}

where excludePatterns is a list of compiled regular expressions provided by the user.

Challenge

The pattern is matched on the formatted lines, so the user has to provide regular expressions such as
\* <dt>Generated</dt><dd>[0-9 :~*-]*</dd>
rather than
\* <dt>Generated</dt><dd>[0-9 :-]*</dd>
which is more readable and less error prone.

As a matter of fact, the first regular expression doesn't work in any case because the markdown tag falls in the middle of the </dd> tag. For example:

<dd>2019-07-**26** **05:00<**/dd>

NB I could have used the reportLinesUnchanged(boolean) builder config, but then I would lose the formatted lines which are used to output markdown code. (See the comment below as well)

Request

Provide methods in the DiffRow
public String getRawOldLine()
and
public String getRawNewLine().
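Until raw accessors exist, one possible stopgap is to strip the inline markers before applying the user's patterns. The helper below is hypothetical. It assumes the markers are exactly the "~~" and "**" strings configured in the snippet above, and it would also remove those sequences if they occur literally in the text (for example in a "/**" comment opener), so it is only an approximation of the raw line.

```java
import java.util.regex.Pattern;

/** Hypothetical stopgap, not part of java-diff-utils. */
final class MarkerStripper {

    /** Removes the two inline markers configured on the generator. */
    static String stripMarkers(String formattedLine) {
        return formattedLine.replace("~~", "").replace("**", "");
    }

    public static void main(String[] args) {
        // The formatted fragment from the example in this issue.
        String formatted = "<dd>2019-07-**26** **05:00<**/dd>";
        String raw = stripMarkers(formatted);
        Pattern readable = Pattern.compile("<dd>[0-9 :-]*</dd>");
        System.out.println(raw + " matches: " + readable.matcher(raw).find());
    }
}
```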

(By the way many thanks for this library, it's been very useful)

overcome inheritance of delta types

Depending on the delta type (insert, delete or change), operations could be implemented differently. Currently this is done using fixed inherited classes.

The aim is to implement some kind of DeltaProcessor that does exactly this: apply, restore and so on. With this, all change algorithms are externalized from these classes.

One possible use case is the computation of one document containing all inline changes. This would be a merge of those delta types.

API possibility to get inline changes

At the moment, changes are produced at row level. Inline changes can be displayed, but only for display purposes; they are not exposed through a class such as DiffLine in addition to DiffRow.

Push missing release 4.4 tag


Is it possible to avoid the dependency on jgit?

It's pulling it into my final jar even though I can see it is only used by jgit/HistogramDiff

wiki examples are not working (anymore)

The examples should be rebuilt and shortened. Some refactorings should be applied to those examples as well.

Release 3.0

[small thing] Maven dependency in README lists 4.0-SNAPSHOT instead of 4.0

As far as I can tell the version should just be 4.0, not 4.0-SNAPSHOT as listed in the readme, just thought I'd point that out!

Make new release

Release 4.0 was released over 9 months ago, and there have been 72 commits to master since this release. Could we make a new release?

Add docs explaining difference with diff-match-patch

Describe the bug
I'd like to know what are the differences between java-diff-utils and diff-match-patch, especially when it comes to performance. Is there a concrete reason why this library was created instead of reusing the former?

Expected behavior
There should be some docs in the wiki explaining the rationale behind creating this library and how it compares with diff-match-patch.

Differences are marked on some lines correctly, but not marked on others

Expected Behavior

Should mark the differences between the two lists or two files. In my example, I have information stored in CSV files.

Actual Behavior

At first appears to correctly identify the differences between two CSV files, but later the differences are incorrectly marked. I've also tried this with two lists that have the same structure.

The last three lines show it skipping some differences and marking others incorrectly, like putting ** in front rather than around the item that is not the same ('**TASK'), and not marking the differences at all on the last line on the second file but marking them on the first ('ACTIONS_C16913').

Steps to Reproduce the Problem

Have two lists or two files containing a structure similar to this:

File One
(snip)
TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, NULLABLE,
ACTIONS_C17005, ID, NUMBER, 22, 19, N,
ACTIONS_C17005, ISSUEID, NUMBER, 22, 19, Y,
ACTIONS_C17005, MODIFIED, NUMBER, 22, 10, Y,
ACTIONS_C17005, TABLE, VARCHAR2, 1020, null, Y,
ACTIONS_C17005, S_NAME, CLOB, 4000, null, Y,
ACTIONS_C17008, ID, NUMBER, 22, 19, N,
ACTIONS_C17008, ISSUEID, NUMBER, 22, 19, Y,
ACTIONS_C17008, MODIFIED, NUMBER, 22, 10, Y,

File Two
(snip)
TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, NULLABLE,
ACTIONS_C16913, ID, NUMBER, 22, 19, N,
ACTIONS_C16913, ISSUEID, NUMBER, 22, 19, Y,
ACTIONS_C16913, MODIFIED, NUMBER, 22, 10, Y,
ACTIONS_C16913, VRS, NUMBER, 22, 1, Y,
ACTIONS_C16913, ZTABS, VARCHAR2, 255, null, Y,
ACTIONS_C16913, ZTABS_S, VARCHAR2, 255, null, Y,
ACTIONS_C16913, TASK, VARCHAR2, 255, null, Y,
ACTIONS_C16913, HOURS_SPENT, VARCHAR2, 255, null, Y,

import com.github.difflib.algorithm.DiffException;
import com.github.difflib.text.DiffRow;
import com.github.difflib.text.DiffRowGenerator;

import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class Demo {

    public static void main(String[] args) throws DiffException, IOException {
        final String FIRSTDB = "C:\\dev\\fileOneSchema.csv";
        final String SECONDDB = "C:\\dev\\fileTwoSchema.csv";

        BufferedReader in = new BufferedReader(new FileReader(FIRSTDB));
        String strOne;

        List<String> listOne = new ArrayList<String>();
        while ((strOne = in.readLine()) != null) {
            listOne.add(strOne);
        }

        BufferedReader inTwo = new BufferedReader(new FileReader(SECONDDB));
        String strTwo;

        List<String> listTwo = new ArrayList<String>();
        while ((strTwo = inTwo.readLine()) != null) {
            listTwo.add(strTwo);
        }

        DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)      //show the ~ ~ and ** ** symbols on each difference
                .inlineDiffByWord(true)     //show the ~ ~ and ** ** around each different word instead of each letter
                //.reportLinesUnchanged(true) //experiment
                .oldTag(f -> "~")
                .newTag(f -> "**")
                .build();

        List<DiffRow> rows = generator.generateDiffRows(listOne, listTwo);

        for (DiffRow row : rows) {
            System.out.println("|" + row.getOldLine() + "| " + row.getNewLine() + " |");
        }
    }
}

Result
|TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, NULLABLE,| TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, NULLABLE, |
|~~ACTIONS_C17005~~, ID, NUMBER, 22, 19, N, | ACTIONS_C16913, ID, NUMBER, 22, 19, N, |
|~~ACTIONS_C17005~~, ISSUEID, NUMBER, 22, 19, Y, | ACTIONS_C16913, ISSUEID, NUMBER, 22, 19, Y, |
|~~ACTIONS_C17005~~, MODIFIED, NUMBER, 22, 10, Y, | ACTIONS_C16913, MODIFIED, NUMBER, 22, 10, Y, |
|~~ACTIONS_C17005~~, ~~TABLE~~, VARCHAR2, ~~1020~~, null, Y, | ACTIONS_C16913, VRS, **NUMBER, 22, 1, Y, |
|~~ACTIONS_C17005~~, ~~S_NAME~~, ~~CLOB~~, ~~4000~~, null, Y, | ACTIONS_C16913, ZTABS, **VARCHAR2, 255, null, Y, |
|~~ACTIONS_C17008~~, ID, NUMBER, 22, 19, N, | ACTIONS_C16913, ZTABS_S, VARCHAR2, 255, null, Y, |
|~~ACTIONS_C17008~~, ISSUEID, NUMBER, 22, 19, Y, | ACTIONS_C16913, **TASK, VARCHAR2, 255, null, Y, |
|~~ACTIONS_C17008~~, MODIFIED, NUMBER, 22, 10, Y, | ACTIONS_C16913, HOURS_SPENT, VARCHAR2, 255, null, Y, |

Specifications

Version: com.github.wumpz:diffutils:2.2
Platform: Windows 7, JDK 8, IntelliJ 2017.3.4
Subsystem:

Release to Maven Central

This project looks great!
I'd like to use it for our tests but without a published release it's hard to pull in...

[Feature request] Consider using gradle

What do you think about using Gradle as the build system? It's faster, doesn't require embedding an XML-parser chip in your head, and has lots of plugins.

Support for File Creation

Hi there, I ran into a problem when I tried to create a unified diff for the creation of a new file, which seems not to be supported by your library right now.

As far as I can see, this feature is very easy to implement and could help others as well.

It requires just a small change in the UnifiedDiffUtils.processDeltas() method, some changes in the method signatures to pass the new-file flag into the methods, and maybe another method so that calls to the old method signature keep working with default values.

Actually, it is just these few lines in the main logic that could cover file creation:

if (createNewFile) {
    origStart = 0; // the 0 here will trigger the creation of the new file
} else {
    origStart = curDelta.getSource().getPosition() + 1 - contextSize;
    // NOTE: +1 to overcome the 0-offset Position
    if (origStart < 1) {
        origStart = 1;
    }
}

Cheers, Eva
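The branch proposed above can be tried in isolation. The following is only a minimal sketch: the class `HunkStart` and the methods `originalStart` and `hunkHeader` are hypothetical names invented for illustration and are not part of java-diff-utils. It relies on the unified diff convention that a newly created file is described by a hunk whose original range is `-0,0`.

```java
final class HunkStart {

    /**
     * Returns the 1-based start line for the original side of a hunk header.
     * sourcePosition is the 0-based position of the first changed line.
     * Hypothetical helper, not a library method.
     */
    static int originalStart(boolean createNewFile, int sourcePosition, int contextSize) {
        if (createNewFile) {
            return 0; // "-0,0" marks an empty original, i.e. a newly created file
        }
        // +1 converts the 0-based position to a 1-based line number
        return Math.max(1, sourcePosition + 1 - contextSize);
    }

    /** Formats a hunk header such as "@@ -0,0 +1,3 @@". */
    static String hunkHeader(int origStart, int origCount, int revStart, int revCount) {
        return "@@ -" + origStart + "," + origCount + " +" + revStart + "," + revCount + " @@";
    }

    public static void main(String[] args) {
        // New file with three lines: the original side is empty.
        System.out.println(hunkHeader(originalStart(true, 0, 3), 0, 1, 3));
        // Ordinary change at 0-based position 10 with three lines of context.
        System.out.println(hunkHeader(originalStart(false, 10, 3), 7, 8, 8));
    }
}
```

Keeping the flag as an explicit parameter mirrors the request for an overload that lets existing callers keep the default behaviour.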

Documenting thread-safety of DiffRowGenerator

Expected Behavior

The Javadoc of DiffRowGenerator should state whether the class is thread-safe.

Actual Behavior

The Javadoc contains no information about thread-safety.

Steps to Reproduce the Problem

see https://github.com/wumpz/java-diff-utils/blob/master/src/main/java/com/github/difflib/text/DiffRowGenerator.java

Specifications

Version: current version
Platform:
Subsystem:

release version 2

Integrate some kind of progress listener for diff processing

Unified patches from git do not parse correctly

Expected Behavior

Unified patches from git to parse correctly

Actual Behavior

The git diffs contain extra context after the chunk header's @@ suffix (I think it's the closest line above the chunk with a '{'), but the regex in the parser ends with @@$, expecting @@ to be the last thing on the line.

Steps to Reproduce the Problem

create a patch file with git diff
note the line in the chunk header
attempt to parse with java-diff-utils
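The mismatch described above can be reproduced with plain `java.util.regex`. This is a sketch under assumptions: the two patterns are illustrative approximations written for this example, not copies of the library's parser regex, and the class name `HunkHeaderDemo` is hypothetical. The lenient variant simply allows optional text after the closing `@@`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HunkHeaderDemo {

    // Strict form: nothing may follow the closing "@@".
    static final Pattern STRICT = Pattern.compile(
            "^@@\\s+-(\\d+)(?:,(\\d+))?\\s+\\+(\\d+)(?:,(\\d+))?\\s+@@$");

    // Lenient form: optional trailing text (git's section heading) after "@@".
    static final Pattern LENIENT = Pattern.compile(
            "^@@\\s+-(\\d+)(?:,(\\d+))?\\s+\\+(\\d+)(?:,(\\d+))?\\s+@@(?:\\s.*)?$");

    public static void main(String[] args) {
        String gitHeader = "@@ -12,7 +12,8 @@ public class Demo {";

        // The strict pattern rejects the header because of the trailing text.
        System.out.println("strict matches:  " + STRICT.matcher(gitHeader).matches());

        // The lenient pattern accepts it and still exposes the line numbers.
        Matcher m = LENIENT.matcher(gitHeader);
        if (m.matches()) {
            System.out.println("lenient matches: old start=" + m.group(1)
                    + ", new start=" + m.group(3));
        }
    }
}
```

The capture groups are unchanged between the two patterns, so code that reads the start lines and counts would not need to change.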

How to get the original line number?

Expected Behavior

The DiffRow class could add a field that records the original line number.

Actual Behavior

There is no method to get the line number.

Steps to Reproduce the Problem

none

Specifications

Version: 4.0
Platform: Ubuntu 18
Subsystem:
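Until such a field exists, one possible workaround is to derive line numbers from the row tags while iterating. The sketch below is only an illustration: `Tag` is a local stand-in for the library's `DiffRow.Tag` so that the example runs without the dependency, and `numberRows` is a hypothetical helper. It assumes that every CHANGE row pairs exactly one original line with one revised line; if a change block is uneven and padded with empty lines, the counters would drift and need extra handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LineNumberDemo {

    // Local stand-in for DiffRow.Tag (assumption: same four values).
    enum Tag { INSERT, DELETE, CHANGE, EQUAL }

    /** Returns {oldLine, newLine} per row (1-based); 0 means no line on that side. */
    static List<int[]> numberRows(List<Tag> tags) {
        List<int[]> result = new ArrayList<>();
        int oldLine = 0;
        int newLine = 0;
        for (Tag tag : tags) {
            boolean hasOld = tag != Tag.INSERT; // inserted rows have no original line
            boolean hasNew = tag != Tag.DELETE; // deleted rows have no revised line
            if (hasOld) {
                oldLine++;
            }
            if (hasNew) {
                newLine++;
            }
            result.add(new int[] {hasOld ? oldLine : 0, hasNew ? newLine : 0});
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tag> tags = Arrays.asList(Tag.EQUAL, Tag.DELETE, Tag.INSERT, Tag.CHANGE);
        for (int[] pair : numberRows(tags)) {
            System.out.println(pair[0] + " | " + pair[1]);
        }
    }
}
```

With real rows, the same loop would read the tag from each row via `getTag()` instead of taking a list of tags.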

Ignore Whitespaces Only Partially Works

Preface
I'm not sure if this is a bug or if I am misusing the tool and/or not configuring it correctly. If it is the latter, please consider this to be a question; I would appreciate any information that could be provided.

Thanks!

Describe the bug
When checking for equality, whitespaces are correctly ignored. When generating differences, some whitespaces are still compared while others are deleted from the result.

To Reproduce
Steps to reproduce the behavior:

DiffRowGenerator generator = DiffRowGenerator.create()
		.showInlineDiffs(true)
		.inlineDiffByWord(true) 
		.ignoreWhiteSpaces(true)
		.oldTag(f -> "~")      //introduce markdown style for strikethrough
		.newTag(f -> "**")     //introduce markdown style for bold
		.build();

//compute the differences for two test texts.
List<DiffRow> rows1 = generator.generateDiffRows(
		Arrays.asList("This\nis\na\ntest."),
		Arrays.asList("This is a test"));

or a more basic example using tabs instead of newlines...

//compute the differences for two test texts.
List<DiffRow> rows2 = generator.generateDiffRows(
		Arrays.asList("This\tis\ta\ttest."),
		Arrays.asList("This is a test"));

or an even more basic example that just changes the number of spaces...

//compute the differences for two test texts.
List<DiffRow> rows3 = generator.generateDiffRows(
		Arrays.asList("This  is  a  test."),
		Arrays.asList("This is a test"));

Actual Result

rows1:
(period is correctly identified as an "old" tag while newlines are gone)

`Thisisatest~.~`

(spaces are considered to be "new" tags when we were asking for them to be ignored)

`This** **is** **a** **test`

rows2:
(period is correctly identified as an "old" tag while tabs are also treated as "old" tags)

`This~    ~is~    ~a~    ~test~.~`

(spaces are considered to be "new" tags when we are asking for them to be ignored)

`This** **is** **a** **test`

rows3:
(period is correctly identified as an "old" tag while the spaces are also treated as "old" tags)

`This~  ~is~  ~a~  ~test~.~`

(spaces are considered to be "new" tags when we are asking for them to be ignored)

`This** **is** **a** **test`

Expected Behavior

rows1:

This\nis\na\ntest~.~

This is a test

rows2:

This\tis\ta\ttest~.~

This is a test

rows3:

This  is  a  test~.~

This is a test

Notes
If a period is left at the end of the second string being diffed in any of the above blocks, this does not happen and the entire block is identified as matching, as expected.

Suggested Fix

Pass DiffRowGenerator.equalizer down to DiffUtils.diff when DiffRowGenerator.generateInlineDiffs is called.
Use an internal identifier for merging/splitting instead of something that could be contained within the comparison.
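To make the first suggestion concrete, here is a sketch of what a whitespace-insensitive equalizer could look like when applied to the tokens of an inline diff. It is an assumption for illustration only: `WhitespaceEqualizerDemo`, `normalize` and `IGNORE_WHITESPACE` are hypothetical names, and the normalization rule (trim, then collapse whitespace runs to a single space) is not confirmed to match the library's own equalizer.

```java
import java.util.function.BiPredicate;

final class WhitespaceEqualizerDemo {

    /** Collapses every whitespace run to one space and trims both ends. */
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Hypothetical equalizer: equal if the strings match after normalization.
    static final BiPredicate<String, String> IGNORE_WHITESPACE =
            (a, b) -> normalize(a).equals(normalize(b));

    public static void main(String[] args) {
        // Whole lines that differ only in whitespace compare as equal.
        System.out.println(IGNORE_WHITESPACE.test("This\tis\ta\ttest", "This is a test"));
        // Whitespace-only tokens of different kinds compare as equal too.
        System.out.println(IGNORE_WHITESPACE.test("\t", " "));
        // A real difference (the trailing period) is still reported.
        System.out.println(IGNORE_WHITESPACE.test("This  is  a  test.", "This is a test"));
    }
}
```

If the inline diff compared tokens with a predicate of this shape, the tab and double-space tokens in rows2 and rows3 would no longer be flagged, leaving only the period as a difference.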

System

Java version: 8 (1.8.0_151)
Diff Utils Version: 4.5
