java-diff-utils / java-diff-utils

Diff Utils is an open-source library for performing comparison / diff operations between texts or other kinds of data: computing diffs, applying patches, generating or parsing unified diffs, generating diff output for later display (such as a side-by-side view), and so on.

Home Page: https://java-diff-utils.github.io/java-diff-utils/

License: Apache License 2.0

Java 94.10% Shell 5.90%
diff java unified-diffs tools inline merge-text java-diff-utils computing-diffs diff-algorithm meyer

java-diff-utils's Introduction

java-diff-utils

Intro

Diff Utils is an open-source library for performing comparison operations between texts: computing diffs, applying patches, generating or parsing unified diffs, generating diff output for later display (such as a side-by-side view), and so on.

The main reason for building this library was the lack of easy-to-use libraries with all the usual features you need when working with diff files. It was originally inspired by the JRCS library and the nice design of its diff module.

This is originally a fork of java-diff-utils from Google Code Archive.

API

Javadocs of the current release version: JavaDocs java-diff-utils

Examples

More helpful information and examples can be found on the Examples page.

The two outputs below were generated with java-diff-utils. Their source code can also be found on the Examples page:

Producing a one liner including all difference information.

//create a configured DiffRowGenerator
DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .mergeOriginalRevised(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")      //introduce markdown style for strikethrough
                .newTag(f -> "**")     //introduce markdown style for bold
                .build();

//compute the differences for two test texts.
List<DiffRow> rows = generator.generateDiffRows(
                Arrays.asList("This is a test senctence."),
                Arrays.asList("This is a test for diffutils."));
        
System.out.println(rows.get(0).getOldLine());

This is a test ~senctence~**for diffutils**.

Producing a side by side view of computed differences.

DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")
                .newTag(f -> "**")
                .build();
List<DiffRow> rows = generator.generateDiffRows(
                Arrays.asList("This is a test senctence.", "This is the second line.", "And here is the finish."),
                Arrays.asList("This is a test for diffutils.", "This is the second line."));
        
System.out.println("|original|new|");
System.out.println("|--------|---|");
for (DiffRow row : rows) {
    System.out.println("|" + row.getOldLine() + "|" + row.getNewLine() + "|");
}

|original|new|
|--------|---|
|This is a test ~senctence~.|This is a test **for diffutils**.|
|This is the second line.|This is the second line.|
|~And here is the finish.~||

Main Features

  • computing the difference between two texts.
  • capable of handling more than plain ASCII. Arrays or Lists of any type that implements hashCode() and equals() correctly can be diffed using this library
  • patch and unpatch the text with the given patch
  • parsing the unified diff format
  • producing human-readable differences
  • inline difference construction
  • Algorithms:
    • Myers Standard Algorithm
    • Myers with linear space improvement
    • HistogramDiff using JGit Library
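
As a rough illustration of the point about arbitrary element types: the Version class below is hypothetical and not part of the library. It only sketches the equals()/hashCode() contract that list elements would need to honour so that two logically identical elements compare as equal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical element type, not part of java-diff-utils.
final class Version {
    private final int major;
    private final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Version)) {
            return false;
        }
        Version that = (Version) other;
        return major == that.major && minor == that.minor;
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal objects produce equal hashes.
        return Objects.hash(major, minor);
    }

    public static void main(String[] args) {
        List<Version> original = Arrays.asList(new Version(1, 0), new Version(1, 1));
        List<Version> revised = Arrays.asList(new Version(1, 0), new Version(2, 0));
        // Element-wise comparison relies on equals(), as a diff over these lists would.
        System.out.println(original.get(0).equals(revised.get(0)));
        System.out.println(original.get(1).equals(revised.get(1)));
    }
}
```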

Algorithms

  • Myers diff
  • HistogramDiff

The algorithm can easily be replaced by any other that is better suited to handling your texts. I plan to add implementations of some others in the future.

Source Code conventions

A Checkstyle check was recently integrated into the build process. java-diff-utils follows the Sun Java format convention. No TABs are allowed; use spaces.

public static <T> Patch<T> diff(List<T> original, List<T> revised,
    BiPredicate<T, T> equalizer) throws DiffException {
    if (equalizer != null) {
        return DiffUtils.diff(original, revised,
        new MyersDiff<>(equalizer));
    }
    return DiffUtils.diff(original, revised, new MyersDiff<>());
}

This is a valid piece of source code:

  • blocks without braces are not allowed
  • after control statements (if, while, for) a whitespace is expected
  • the opening brace should be in the same line as the control statement

To Install

Just add the code below to your Maven dependencies:

<dependency>
    <groupId>io.github.java-diff-utils</groupId>
    <artifactId>java-diff-utils</artifactId>
    <version>4.12</version>
</dependency>

or using Gradle:

// https://mvnrepository.com/artifact/io.github.java-diff-utils/java-diff-utils
implementation "io.github.java-diff-utils:java-diff-utils:4.12"

java-diff-utils's Issues

How to do git-like conflict output

When git hits a conflict, instead of completely failing, it will create a file like so:

# Unchanged Line
<<<<<<< HEAD
# their change
=======
# our change
>>>>>>> my-branch

Could I produce something similar with this library?

I'm already using UnifiedDiff, but it only throws when encountering an issue.
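
One possible direction, sketched with the standard library only: once the conflicting region has been located by some other means, rendering it in git's marker style is a small formatting step. ConflictMarkers and its parameter names are hypothetical and not part of java-diff-utils; locating the conflict is assumed to happen elsewhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of java-diff-utils.
final class ConflictMarkers {

    // Renders one conflicting region in git's marker style.
    static List<String> render(List<String> first, String firstLabel,
            List<String> second, String secondLabel) {
        List<String> out = new ArrayList<>();
        out.add("<<<<<<< " + firstLabel);
        out.addAll(first);
        out.add("=======");
        out.addAll(second);
        out.add(">>>>>>> " + secondLabel);
        return out;
    }

    public static void main(String[] args) {
        List<String> block = render(
                Arrays.asList("# change on HEAD"), "HEAD",
                Arrays.asList("# change on my-branch"), "my-branch");
        block.forEach(System.out::println);
    }
}
```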

javadoc: document when DiffException is thrown

Please add javadoc explaining why DiffException is thrown.

What's the reason that diff-ing two strings may throw DiffException (DiffUtils.diffInline)?
Shouldn't string diff never throw an exception?

DiffRowGenerator replaces < and >

Expected Behavior

DiffRowGenerator would leave < and > the way they are

Actual Behavior

DiffRowGenerator replaces < and > with their html entities.

Steps to Reproduce the Problem

  1. Run DiffRowGenerator.create().generateDiffRows(List.of("<"), List.of("<"))
  2. Observe that the resulting String has &lt; instead of <

Specifications

  • Version: 4.0
  • Platform: Linux
  • Subsystem:

Using reportLinesUnchanged is not an option since that means tags are missing if an entire line is added/deleted. Having something like replaceSomeHTMLEntities with default true but optionally disableable would be nice, or being able to supply a general string normalisation function.

If needed/wanted I can PR this.
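
The "general string normalisation function" idea can be sketched as a pluggable Function<String, String>. The names below are hypothetical and only show the shape such an option might take: one normaliser that escapes angle brackets as described above, and one that leaves lines untouched.

```java
import java.util.function.Function;

// Hypothetical sketch of a pluggable line normaliser, not the library's API.
final class LineNormalizers {

    // Escapes only < and >, mirroring the behaviour described in this issue.
    static final Function<String, String> ESCAPE_ANGLE_BRACKETS =
            line -> line.replace("<", "&lt;").replace(">", "&gt;");

    // Leaves every line exactly as it is.
    static final Function<String, String> IDENTITY = Function.identity();

    static String normalize(String line, Function<String, String> normalizer) {
        return normalizer.apply(line);
    }

    public static void main(String[] args) {
        System.out.println(normalize("<", ESCAPE_ANGLE_BRACKETS));
        System.out.println(normalize("<", IDENTITY));
    }
}
```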

Support Hierarchy

Expected Behavior

We need support for generating tree like diffs. Similar to a file system diff where you have folders compared as well as files.
In our case, we have groups of groups and we currently create separate diffs for each group. Having hierarchy support will enable us to create a single unified diff for the customer.

Actual Behavior

Hierarchy not supported

Specifications

  • Version: 4.0
  • Platform: Win/Linux
  • Subsystem:

Problem with using 4.4 in modularized project (cannot determine module name)

Describe the bug
Update to 4.4 in our JabRef (modularized application) fails. Gradle complains about the module name:

cannot determine module name for /home/travis/.gradle/caches/modules-2/files-2.1/io.github.java-diff-utils/java-diff-utils/4.4/87ebb16140d120da919b62117865954da06981b6/java-diff-utils-4.4.jar
/home/travis/build/JabRef/jabref/src/main/java/org/jabref/gui/mergeentries/DiffHighlighting.java:10: error: package com.github.difflib does not exist

import com.github.difflib.DiffUtils;

To Reproduce
Use version 4.4 in a modularized Java project
For reference, this is the PR which is failing: JabRef/jabref#5594

I tried also to add the parent as well:

 compile 'io.github.java-diff-utils:java-diff-utils-parent:4.4'
  compile 'io.github.java-diff-utils:java-diff-utils:4.4'

Expected behavior
No errors

System

  • Java version 13
  • Version 4.4

diff have a problem

Expected Behavior

original new
This is a test senctence. This is a test for diffutils.
This is the second line.

Actual Behavior

original new
This is a test senctence. This is a test for diffutils.**
This is the second line.

DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")
                .newTag(f -> "**")
                .build();
List<DiffRow> rows = generator.generateDiffRows(
                Arrays.asList("This is a test senctence."),
                Arrays.asList("This is a test for diffutils.", "This is the second line."));

System.out.println("|original|new|");
System.out.println("|--------|---|");
for (DiffRow row : rows) {
    System.out.println("|" + row.getOldLine() + "|" + row.getNewLine() + "|");
}

ignoreBlankLines does not appear to be implemented

Describe the bug
I'd like to remove blank lines from the diff generated by DiffRowGenerator but there's no option. I've had a look at the sources of the 4.5 release and I can see mention of this option there but no implementation or setter. WIP?

System

  • diffutils 4.5 from maven

Diff by phrase functionality?

Thanks a lot for this project. I was wondering if something can be done to achieve the following. Let's say I have 2 versions of a string:

"J. G. Feldstein, Chair" and "T. P. Pastor, Chair"

When using character level diff, I get this result:
image

Makes sense, of course, since it diffs by character and keeps all the ones that were there before. If I diff by word, this happens:
image

Again, makes sense. However, this makes it difficult for a user to understand what happens. What would be great is if we could "whitelist" a few characters, so that they don't count on the diff, and use it in phrase level. So in that case, if we whitelist whitespace and punctuation, we could get:
image

Would this be feasible and/or possible?

[Feature request] Consider making project modular

It's nice to have zero-dependency implementations. I think it's possible to split the library into 2 or more modules:

diff-utils-core
diff-utils-myers
diff-utils-jgit

or (just moving jgit dependency out of core library):

diff-utils
diff-utils-jgit

DiffRowGenerator returns too many diffs in special cases

Describe the bug
DiffRowGenerator returns too many diffs in special cases.

To Reproduce
Steps to reproduce the behavior:

 DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .reportLinesUnchanged(true)
                .oldTag(f -> "~")
                .newTag(f -> "**")
                .mergeOriginalRevised(true)
//                .inlineDiffByWord(true)
                .build();

        List<DiffRow> diffRows = generator.generateDiffRows(sortedRemovals, sortedAdditions);

sortedRemovals:
["Ich möchte nicht mit einem Bot sprechen.", "Ich soll das schon wieder wiederholen?"]

sortedAdditions:
["Ich möchte nicht mehr mit dir sprechen. Leite mich weiter.", "Kannst du mich zum Kundendienst weiterleiten?"]

Expected behavior
diffRows should have size 2 but it has size 3.

System

  • Java version 1.8
  • Version 4.5

Setting inlineDiffByWord(true) does not produce the error.

DiffRow implements Serializable

Using DiffRow causes problems in clustered environments (using Infinispan), because DiffRow is not marked as Serializable even though it IS serializable. Please add "implements Serializable". See the attached code:
DiffRow.zip
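
To illustrate why the marker interface matters, here is a standard-library sketch. SerializableRow is a hypothetical stand-in for a row class with two String fields, not the library's DiffRow; without "implements Serializable", writeObject would throw NotSerializableException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a row class; not the library's DiffRow.
final class SerializableRow implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String oldLine;
    private final String newLine;

    SerializableRow(String oldLine, String newLine) {
        this.oldLine = oldLine;
        this.newLine = newLine;
    }

    String getOldLine() {
        return oldLine;
    }

    String getNewLine() {
        return newLine;
    }

    // Serializes the row to bytes and reads it back, as a clustered cache would.
    static SerializableRow roundTrip(SerializableRow row) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(row);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableRow) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SerializableRow copy = roundTrip(new SerializableRow("old", "new"));
        System.out.println(copy.getOldLine() + " -> " + copy.getNewLine());
    }
}
```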

multi-file diffs are not supported

(this is probably not a bug, it's just missing)

The issue is that although a diff of multiple files is parsed, the patch information cannot be related back to the file it belongs to. At least I did not find an API that tells me which files a particular hunk/delta refers to.

To Reproduce

'diff -U olddir newdir > udiff.diff' with some files differing

  • now use UnifiedDiffUtils.parseUnifiedDiff("udiff.diff")

I would expect that it is possible now for every hunk to understand which fromFile and which toFile have been used to produce a particular delta/hunk.

List<String> generated from ResultSet show inconsistent marks as a List<String> written by hand

Expected Behavior

ResultSet from JDBC converted into Strings and then added to a List should show the same differences as a List = Arrays.asList() created by hand.

When using this (entered by hand)

List<String> listOne = Arrays.asList("MASS_TABLE,ID,NUMBER,22,N,", "MASS_TABLE,ISSUEID,NUMBER,22,Y,", "MASS_TABLE,MODIFIED,NUMBER,22,Y,", "MASS_TABLE,SYSTEMNAME,VARCHAR2,1020,Y,");
List<String> listTwo = Arrays.asList("ATOM_TABLE,ID,NUMBER,22,N,", "ATOM_TABLE,ISSUEID,NUMBER,22,Y,", "ATOM_TABLE,MODIFIED,NUMBER,22,Y,", "ATOM_TABLE,ACRONYM,VARCHAR2,255,Y," );

You will get this when it is run through the DiffGenerator:

|~MASS_TABLE~,ID,NUMBER,22,N, | **ATOM_TABLE**,ID,NUMBER,22,N,|
|~MASS_TABLE~,ISSUEID,NUMBER,22,Y, | **ATOM_TABLE**,ISSUEID,NUMBER,22,Y,|
|~MASS_TABLE~,MODIFIED,NUMBER,22,Y, | **ATOM_TABLE**,MODIFIED,NUMBER,22,Y,|
|~MASS_TABLE~,~SYSTEMNAME~,VARCHAR2,~1020~,Y, | **ATOM_TABLE**,**ACRONYM**,VARCHAR2,**255**,Y,|

Actual Behavior

DiffGenerator shows apparently incorrect results.

Example:

("NUMBER" on the first row, and "VARCHAR" on the last row show marks ("**") in front of the word to indicate the difference, but don't show the corresponding marks behind it). Also, as the results go further, some differences will be marked on the first list and no corresponding marks will be shown at all on the second list.

| ~MASS_TABLE~,ID,NUMBER,22,N, | **ATOM_TABLE**,**ID**,**NUMBER,22,N, |
| ~MASS_TABLE~,ISSUEID,NUMBER,22,Y, | ATOM_TABLE,ISSUEID,NUMBER,22,Y, |
| ~MASS_TABLE~,MODIFIED,NUMBER,22,Y, | ATOM_TABLE,MODIFIED,NUMBER,22,Y, |
| ~MASS_TABLE~,SYSTEMNAME,VARCHAR2,~1020~,Y, | ATOM_TABLE,ACRONYM,**VARCHAR2,**255**,Y, |

Steps to Reproduce the Problem

1. Connect to two different Oracle Databases and run a SQL query, saving information as text from ResultSet into different Lists
2. Compare the two Lists in the DiffGenerator
3. Compare these results to the results when the information is put in by hand and compared using DiffGenerator.

Shows as expected:

  List<String> listOne = Arrays.asList("MASS_TABLE,ID,NUMBER,22,N,", "MASS_TABLE,ISSUEID,NUMBER,22,Y,", "MASS_TABLE,MODIFIED,NUMBER,22,Y,", "MASS_TABLE,SYSTEMNAME,VARCHAR2,1020,Y,");
        List<String> listTwo = Arrays.asList("ATOM_TABLE,ID,NUMBER,22,N,", "ATOM_TABLE,ISSUEID,NUMBER,22,Y,", "ATOM_TABLE,MODIFIED,NUMBER,22,Y,", "ATOM_TABLE,ACRONYM,VARCHAR2,255,Y," );

        DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")
                .newTag(f -> "**") 
                .build();

        List<DiffRow> diffRowList;

        try {
            diffRowList = generator.generateDiffRows(listOne, listTwo);

            for (DiffRow diffRow : diffRowList) {
                System.out.println("|" + diffRow.getOldLine() + "|" + diffRow.getNewLine() + "|");
            }
        } catch (DiffException e) {
            e.printStackTrace();
        }

Does not show as expected, using database:

...
  List<String> databaseSchemaList = new ArrayList<>();
        try {
            conn = DBHelper.getConnection(databaseName);
            stmt = Objects.requireNonNull(conn).createStatement();
            rs = stmt.executeQuery(query);

            while (rs.next()) {
                int columnCount = rs.getMetaData().getColumnCount();
                StringBuilder rsStringBuilder = new StringBuilder();
                for (int i = 1; i <= columnCount; i++) {
                    Object rsObject = rs.getObject(i);
                    String rsString = (rsObject == null) ? "NULL" : rsObject.toString();
                    rsStringBuilder.append(rsString).append(",");
                }         
                databaseSchemaList.add(rsStringBuilder.toString());
            }
       
        } catch (Exception e) {
            e.printStackTrace();
        } finally {     
            ...
        }
        return databaseSchemaList;   
    }

    private void getComparisonResults(List<String> listOne, List<String> listTwo) {

        DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~") 
                .newTag(f -> "**")
                .build();

        List<DiffRow> diffRowList;

        try {
            diffRowList = generator.generateDiffRows(listOne, listTwo);

            for (DiffRow diffRow : diffRowList) {
                System.out.println("| " + diffRow.getOldLine() + " | " + diffRow.getNewLine() + " |");
            }
      ...
}

 public static void main(String[] args) {

        SchemaComparison schemaComparison = new SchemaComparison();

        //get list from each database
        List<String> aSchemaList = schemaComparison.getDatabaseSchemaList(DatabaseName.A);
        List<String> bSchemaList = schemaComparison.getDatabaseSchemaList(DatabaseName.B);

       //run comparison
        schemaComparison.getComparisonResults(aSchemaList,bSchemaList);

    }

Specifications

  • Version: 2.3 SNAPSHOT
  • Platform: Windows 7
  • Subsystem: Oracle Database 11g, IntelliJ 2018.1

Get list of lines that are common between source and patch

I am trying to iterate through the deltas; however, I would like to get the list of lines that are common between the two Lists by checking
if (delta.getType() == DeltaType.EQUAL) {
}

However, I am not able to see how to get this. The JUnit code and test examples do not show such an example. Please help.
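
A possible workaround, sketched with the standard library only and assuming the deltas cover just the changed regions: if each delta exposes the position and size of its chunk in the original list (as in the "position: 0, size: 1" output quoted elsewhere on this page), the common lines are whatever lies between those ranges. CommonLines and the int[][] range representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of java-diff-utils.
final class CommonLines {

    // changedRanges holds {position, size} pairs referring to the original list,
    // sorted by position and non-overlapping. Pure inserts have size 0.
    static <T> List<T> unchanged(List<T> original, int[][] changedRanges) {
        List<T> common = new ArrayList<>();
        int next = 0;
        for (int[] range : changedRanges) {
            // Everything between the previous changed range and this one is unchanged.
            common.addAll(original.subList(next, range[0]));
            next = range[0] + range[1];
        }
        common.addAll(original.subList(next, original.size()));
        return common;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("Hello", "Hi", "Last");
        // "Hello" was changed and "Last" was deleted.
        int[][] changed = {{0, 1}, {2, 1}};
        System.out.println(unchanged(original, changed));
    }
}
```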

Whitespace visualization in diffs?

Code formatting often involves "extra whitespace" or "missing whitespace" or "CRLF vs LF" issues.
Unfortunately, whitespace is not visible by default.

What if there was an option to visualize the whitespace that contributes to the diff?

For instance: use ␍␊ characters when output encoding allows. Use · for space visualization, and so on.

Note: the point is not "visualize all the whitespace characters", but "visualize only those chars that contribute to the diff".
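
A minimal sketch of the idea, assuming the caller passes in only the fragment that was already identified as changed. WhitespaceVisualizer is hypothetical, and the arrow used for tabs is my own choice rather than something proposed above.

```java
// Hypothetical helper, not part of java-diff-utils.
final class WhitespaceVisualizer {

    // Replaces whitespace in a changed fragment with visible symbols.
    static String visualize(String changedFragment) {
        StringBuilder out = new StringBuilder(changedFragment.length());
        for (int i = 0; i < changedFragment.length(); i++) {
            char c = changedFragment.charAt(i);
            switch (c) {
                case ' ':
                    out.append('\u00B7'); // middle dot for a space
                    break;
                case '\t':
                    out.append('\u2192'); // rightwards arrow for a tab (assumption)
                    break;
                case '\r':
                    out.append('\u240D'); // symbol for carriage return
                    break;
                case '\n':
                    out.append('\u240A'); // symbol for line feed
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(visualize("a b\r\n"));
    }
}
```

Unchanged parts of a line would be passed through as they are, so only whitespace that contributes to the diff becomes visible.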

DiffRowGenerator should be able to deliver DiffRow list without HTML tags in oldLine and newLine

By default DiffRowGenerator.generateDiffRows delivers a List of DiffRow where the oldLine and the newLine property are polluted with HTML tags like
and HTML-escaping sequences like &gt.

It should be possible to get the unmodified text lines back for use in projects where HTML enrichment/escaping is not necessary or useful.

My proposal is to introduce a new property reportLinesUnWrapped which can be set by the DiffRowGenerator.Builder. If this property is set to true the DiffRowGenerator delivers back the original text lines.

Please see the attached source.
DiffRowGenerator.zip

Insert next to change is not working as expected.

Expected Behavior

Old : [position: 0, size: 1, lines: [Hello]]
New : [position: 0, size: 1, lines: [Hello World]]
Diff Type : CHANGE

Old : [position: 0, size: 0, lines: []]
New : [position: 1, size: 1, lines: [new]]
Diff Type : INSERT

Old : [position: 2, size: 1, lines: [Last]]
New : [position: 3, size: 0, lines: []]
Diff Type : DELETE

Actual Behavior

Old : [position: 0, size: 1, lines: [Hello]]
New : [position: 0, size: 2, lines: [Hello World, new]]
Diff Type : CHANGE

Old : [position: 2, size: 1, lines: [Last]]
New : [position: 3, size: 0, lines: []]
Diff Type : DELETE

Steps to Reproduce the Problem

1. Use original.txt and revised.txt as the input and compute the difference using the code given below.

Original.txt
Hello
Hi
Last

Revised.txt
Hello World
new
Hi

Java code

List<String> first = Files.readAllLines(Paths.get("src/main/resources/original.txt"));
List<String> second = Files.readAllLines(Paths.get("src/main/resources/revised.txt"));
Patch<String> diff = DiffUtils.diff(first, second);

for (Delta<String> s : diff.getDeltas()) {
    System.out.println("Old : " + s.getOriginal());
    System.out.println("New : " + s.getRevised());
    System.out.println("Diff Type : " + s.getType());
    System.out.println();
}

Specifications

  • Version: 2.2
  • Platform: Windows
  • Subsystem: Windows 7, JDK 1.8_141 (x64)

Using of LinkedList

Hi!
You can see using LinkedList in many places and also using the method list.get(i). In this case, to get n'th element we should iterate through n-1 items in the list. And a cycle for (int i=0; ... has n^2 complexity.

Maybe such code should be replaced with for (T item : list), or LinkedList should be replaced with ArrayList.

For example, in DiffRowGenerator

final List<Delta<String>> deltaList = patch.getDeltas();
        for (int i = 0; i < deltaList.size(); i++) {
            Delta<String> delta = deltaList.get(i);

and patch#deltas is LinkedList.

We also can see get(i) in MyersDiff.java.
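
The difference between the two loop styles can be sketched with the standard library alone. ListIteration is a hypothetical demo class: both methods return the same result, but on a LinkedList the indexed version has to walk the list for every get(i), while the for-each version uses the iterator and visits each node once.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class, not part of java-diff-utils.
final class ListIteration {

    // Indexed access: on a LinkedList every get(i) walks the list, so the loop is O(n^2).
    static long sumIndexed(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Iterator-based access: O(n) for both LinkedList and ArrayList.
    static long sumIterated(List<Integer> list) {
        long sum = 0;
        for (int item : list) {
            sum += item;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        System.out.println(sumIndexed(list) == sumIterated(list));
    }
}
```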

usage: DiffRowGenerator to compute git-diff like diffs?

Hi!

First thanks for this project, it's quite interesting and useful 👍
I'm trying to use it to compute diffs between pieces of Java code, and I want to obtain diffs like those created by git-diff. The ultimate objective would be to obtain exactly the same output, if possible.

So do you have any hint to achieve this objective?

I started with a very simple solution using DiffRowGenerator, but maybe something has already been created to transform a diff-utils patch into a git patch or something like that?

[Tests] `LRHistogramDiffTest` fails when run in isolation

Describe the bug
LRHistogramDiffTest is failing when run in isolation

To Reproduce

  1. Run mvn -Dtest=LRHistogramDiffTest test.
  2. Observe test failing.
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.github.difflib.algorithm.jgit.LRHistogramDiffTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 28.826 sec <<< FAILURE! - in com.github.difflib.algorithm.jgit.LRHistogramDiffTest
testPossibleDiffHangOnLargeDatasetDnaumenkoIssue26(com.github.difflib.algorithm.jgit.LRHistogramDiffTest)  Time elapsed: 28.776 sec  <<< FAILURE!
java.lang.AssertionError: expected:<50> but was:<246579>
	at com.github.difflib.algorithm.jgit.LRHistogramDiffTest.testPossibleDiffHangOnLargeDatasetDnaumenkoIssue26(LRHistogramDiffTest.java:88)


Results :

Failed tests: 
  LRHistogramDiffTest.testPossibleDiffHangOnLargeDatasetDnaumenkoIssue26:88 expected:<50> but was:<246579>

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0

However, running all tests with mvn test is successful.

Expected behavior
Test is passing.

System

  • Java version: 1.8.0_201.
  • Version: Current master (156f4f2).

unified diff parser in unified-diff-parser branch parsing issues

Hi,

A unified diff such as the one below is not parsed properly. I do see that the parse completes; however, the fromFile gets overwritten with the content of a '---' line taken from the hunk lines. Essentially, the fromFile is "some comment" after the parse.

--- a.vhd	2019-04-18 13:49:39.516149751 +0200
+++ b.vhd	2019-04-18 11:33:08.372563078 +0200
@@ -2819,3 +2819,3 @@
--- some comment
-bla
-bla
+
+

Read the above unified patch using:

  val diff = UnifiedDiffReader.parseUnifiedDiff(diff_file)
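
One way to avoid this kind of misclassification, sketched with the standard library only: track how many old and new lines the hunk header announced, and while inside a hunk let the first character alone decide the line kind. HunkLineClassifier and its string labels are hypothetical, and the sketch assumes the counts in the header are accurate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the library's parser.
final class HunkLineClassifier {
    private static final Pattern HUNK_HEADER =
            Pattern.compile("^@@ -(\\d+)(?:,(\\d+))? \\+(\\d+)(?:,(\\d+))? @@.*$");

    static List<String> classify(List<String> lines) {
        List<String> kinds = new ArrayList<>();
        int oldRemaining = 0;
        int newRemaining = 0;
        for (String line : lines) {
            if (oldRemaining > 0 || newRemaining > 0) {
                // Inside a hunk: "--- some comment" is a deleted line, not a file header.
                if (line.startsWith("-") && oldRemaining > 0) {
                    kinds.add("DELETE");
                    oldRemaining--;
                } else if (line.startsWith("+") && newRemaining > 0) {
                    kinds.add("INSERT");
                    newRemaining--;
                } else if (line.startsWith(" ") && oldRemaining > 0 && newRemaining > 0) {
                    kinds.add("CONTEXT");
                    oldRemaining--;
                    newRemaining--;
                } else {
                    kinds.add("OTHER");
                }
                continue;
            }
            Matcher m = HUNK_HEADER.matcher(line);
            if (m.matches()) {
                // A missing count means 1 in the unified format.
                oldRemaining = m.group(2) == null ? 1 : Integer.parseInt(m.group(2));
                newRemaining = m.group(4) == null ? 1 : Integer.parseInt(m.group(4));
                kinds.add("HUNK_HEADER");
            } else if (line.startsWith("--- ")) {
                kinds.add("FROM_FILE");
            } else if (line.startsWith("+++ ")) {
                kinds.add("TO_FILE");
            } else {
                kinds.add("OTHER");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList(
                "--- a.vhd", "+++ b.vhd", "@@ -1,2 +1,1 @@",
                "--- some comment", "-bla", "+x")));
    }
}
```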

Need to get raw text from DiffRow

Code snippet

	DiffRowGenerator generator = DiffRowGenerator.create()
			.showInlineDiffs(true)
			.mergeOriginalRevised(false)
			.inlineDiffByWord(true)
			.oldTag(f -> "~~")
			.newTag(f -> "**")
			.ignoreWhiteSpaces(true)
			.build();
	List<DiffRow> rows = generator.generateDiffRows(content1, content2);
	int line = 1;
	for (DiffRow row : rows) {
		if (isIncluded(row)) {
			// Write out the markdown ...
		}
		line++;
	}

The function isIncluded() is implemented as

	private boolean isIncluded(DiffRow row) {
		if ( row.getTag() == Tag.EQUAL) {
			return false;
		}
		return excludePatterns.stream()
				.noneMatch(p -> p.matcher(row.getOldLine()).find()
						|| p.matcher(row.getNewLine()).find());
	}

where excludePatterns is a list of compiled regular expressions provided by the user.

Challenge

The pattern is matched on the formatted lines, so the user has to provide regular expressions such as
\* &lt;dt&gt;Generated&lt;/dt&gt;&lt;dd&gt;[0-9 :~*-]*&lt;/dd&gt;
rather than
\* <dt>Generated</dt><dd>[0-9 :-]*</dd>
which is more readable and less error prone.

As a matter of fact, the first regular expression doesn't work in any case, because the markdown tag falls in the middle of the </dd> tag. For example:

&lt;dd&gt;2019-07-**26** **05:00&lt;**/dd&gt;

NB I could have used the reportLinesUnchanged(boolean) builder config, but then I would lose the formatted lines which are used to output markdown code. (See the comment below as well)

Request

Provide methods in the DiffRow
public String getRawOldLine()
and
public String getRawNewLine().
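
Until such accessors exist, a rough workaround could be to strip the inline tags and undo the escaping before matching. RawLine is a hypothetical helper, not part of the library, and it has obvious limits: it would also remove any literal occurrence of the tag strings, and it would unescape a literal "&lt;" that was already present in the source text.

```java
// Hypothetical workaround, not part of java-diff-utils.
final class RawLine {

    // Removes the configured inline tags, then restores < and >.
    static String toRaw(String formatted, String oldTag, String newTag) {
        return formatted
                .replace(oldTag, "")
                .replace(newTag, "")
                .replace("&lt;", "<")
                .replace("&gt;", ">");
    }

    public static void main(String[] args) {
        String formatted = "&lt;dd&gt;2019-07-**26** **05:00&lt;**/dd&gt;";
        System.out.println(toRaw(formatted, "~~", "**"));
    }
}
```

With this in place, the more readable regular expression could be matched against toRaw(row.getOldLine(), "~~", "**") instead of the formatted line.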

(By the way many thanks for this library, it's been very useful)

overcome inheritance of delta types

Depending on the delta type (insert, delete or change), operations could be implemented differently. At the moment this is done using fixed inherited classes.

The aim is to implement some kind of DeltaProcessor that performs exactly these operations: apply, restore and so on. With this, all change algorithms are externalized from these classes.

One possible use case is the computation of one document containing all inline changes. This would be a merge of those delta types.

API possibility to get inline changes

At the moment changes are produced at row level. Inline changes can be displayed, but only for display purposes; they are not available through a class such as DiffLine in addition to DiffRow.

Push missing release 4.4 tag

Make new release

Release 4.0 was released over 9 months ago, and there have been 72 commits to master since this release. Could we make a new release?

Add docs explaining difference with diff-match-patch

Describe the bug
I'd like to know what are the differences between java-diff-utils and diff-match-patch, especially when it comes to performance. Is there a concrete reason why this library was created instead of reusing the former?

Expected behavior
There should be some docs in the wiki explaining the rationale behind creating this library and how it compares with diff-match-patch.

Differences are marked on some lines correctly, but not marked on others

Expected Behavior

Should mark the differences between the two lists or two files. In my example, I have information stored in CSV files.

Actual Behavior

At first appears to correctly identify the differences between two CSV files, but later the differences are incorrectly marked. I've also tried this with two lists that have the same structure.

The last three lines show it skipping some differences and marking others incorrectly, like putting ** in front rather than around the item that is not the same ('**TASK'), and not marking the differences at all on the last line on the second file but marking them on the first ('ACTIONS_C16913').

Steps to Reproduce the Problem

  1. Have two lists or two files containing a structure similar to this:

File One
(snip)
TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, NULLABLE,
ACTIONS_C17005, ID, NUMBER, 22, 19, N,
ACTIONS_C17005, ISSUEID, NUMBER, 22, 19, Y,
ACTIONS_C17005, MODIFIED, NUMBER, 22, 10, Y,
ACTIONS_C17005, TABLE, VARCHAR2, 1020, null, Y,
ACTIONS_C17005, S_NAME, CLOB, 4000, null, Y,
ACTIONS_C17008, ID, NUMBER, 22, 19, N,
ACTIONS_C17008, ISSUEID, NUMBER, 22, 19, Y,
ACTIONS_C17008, MODIFIED, NUMBER, 22, 10, Y,

File Two
(snip)
TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, NULLABLE,
ACTIONS_C16913, ID, NUMBER, 22, 19, N,
ACTIONS_C16913, ISSUEID, NUMBER, 22, 19, Y,
ACTIONS_C16913, MODIFIED, NUMBER, 22, 10, Y,
ACTIONS_C16913, VRS, NUMBER, 22, 1, Y,
ACTIONS_C16913, ZTABS, VARCHAR2, 255, null, Y,
ACTIONS_C16913, ZTABS_S, VARCHAR2, 255, null, Y,
ACTIONS_C16913, TASK, VARCHAR2, 255, null, Y,
ACTIONS_C16913, HOURS_SPENT, VARCHAR2, 255, null, Y,

import com.github.difflib.algorithm.DiffException;
import com.github.difflib.text.DiffRow;
import com.github.difflib.text.DiffRowGenerator;

import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class Demo {

    public static void main(String[] args) throws DiffException, IOException {
        final String FIRSTDB = "C:\\dev\\fileOneSchema.csv";
        final String SECONDDB = "C:\\dev\\fileTwoSchema.csv";

        BufferedReader in = new BufferedReader(new FileReader(FIRSTDB));
        String strOne;

        List<String> listOne = new ArrayList<String>();
        while ((strOne = in.readLine()) != null) {
            listOne.add(strOne);
        }

        BufferedReader inTwo = new BufferedReader(new FileReader(SECONDDB));
        String strTwo;

        List<String> listTwo = new ArrayList<String>();
        while ((strTwo = inTwo.readLine()) != null) {
            listTwo.add(strTwo);
        }

        DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)      //show the ~ ~ and ** ** symbols on each difference
                .inlineDiffByWord(true)     //show the ~ ~ and ** ** around each different word instead of each letter
                //.reportLinesUnchanged(true) //experiment
                .oldTag(f -> "~")
                .newTag(f -> "**")
                .build();

        List<DiffRow> rows = generator.generateDiffRows(listOne, listTwo);

        for (DiffRow row : rows) {
            System.out.println("|" + row.getOldLine() + "| " + row.getNewLine() + " |");
        }
    }
}

  2. Result
    |TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, NULLABLE,| TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, NULLABLE, |
    |ACTIONS_C17005, ID, NUMBER, 22, 19, N, | ACTIONS_C16913, ID, NUMBER, 22, 19, N, |
    |ACTIONS_C17005, ISSUEID, NUMBER, 22, 19, Y, | ACTIONS_C16913, ISSUEID, NUMBER, 22, 19, Y, |
    |ACTIONS_C17005, MODIFIED, NUMBER, 22, 10, Y, | ACTIONS_C16913, MODIFIED, NUMBER, 22, 10, Y, |
    |ACTIONS_C17005, TABLE, VARCHAR2, 1020, null, Y, | ACTIONS_C16913, VRS, **NUMBER, 22, 1, Y, |
    |ACTIONS_C17005, S_NAME, CLOB, 4000, null, Y, | ACTIONS_C16913, ZTABS, **VARCHAR2, 255, null, Y, |
    |ACTIONS_C17008, ID, NUMBER, 22, 19, N, | ACTIONS_C16913, ZTABS_S, VARCHAR2, 255, null, Y, |
    |ACTIONS_C17008, ISSUEID, NUMBER, 22, 19, Y, | ACTIONS_C16913, **TASK, VARCHAR2, 255, null, Y, |
    |ACTIONS_C17008, MODIFIED, NUMBER, 22, 10, Y, | ACTIONS_C16913, HOURS_SPENT, VARCHAR2, 255, null, Y, |

Specifications

  • Version: com.github.wumpz:diffutils:2.2
  • Platform: Windows 7, JDK 8, IntelliJ 2017.3.4
  • Subsystem:

Release to Maven Central

This project looks great!
I'd like to use it for our tests but without a published release it's hard to pull in...

Support for File Creation

Hi there, I ran into a problem when I tried to create a unified diff for the creation of a new file, which seems not to be supported by your library right now.

As far as I can see, this feature is very easy to implement and could help others as well.

It requires just a small change in the UnifiedDiffUtils.processDeltas() method, some changes in the method signatures to get the info about the new File creation into the methods and maybe another method to keep calls to old method signature running with default values.

Actually, it is just these few lines in the main logic that could cover file creation:

if (createNewFile) {
    origStart = 0; // the 0 here will trigger the creation of the new file
} else {
    origStart = curDelta.getSource().getPosition() + 1 - contextSize;
    // NOTE: +1 to overcome the 0-offset Position
    if (origStart < 1) {
        origStart = 1;
    }
}

Cheers, Eva
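
For readers following along, here is a minimal, self-contained sketch of the start-line rule proposed above. It is not the library's code: the class name HunkStart, the method originalStart, and the createNewFile flag are hypothetical names used only for illustration. It relies on the unified diff convention that a hunk header of the form "@@ -0,0 +1,N @@" describes an empty original.

```java
// Sketch only: mirrors the proposal above, not the actual UnifiedDiffUtils code.
final class HunkStart {
    private HunkStart() {}

    /**
     * @param sourcePosition 0-based position of the first delta in the original text
     * @param contextSize    number of context lines requested
     * @param createNewFile  hypothetical flag: true when the original file does not exist
     * @return the 1-based start line for the original side of the hunk, or 0 for a new file
     */
    static int originalStart(int sourcePosition, int contextSize, boolean createNewFile) {
        if (createNewFile) {
            return 0; // "-0,0" in the hunk header marks an empty original
        }
        // +1 converts the 0-based position to a 1-based line number;
        // the result is clamped so the context never starts before line 1.
        return Math.max(1, sourcePosition + 1 - contextSize);
    }

    public static void main(String[] args) {
        System.out.println(originalStart(0, 3, true));   // new file
        System.out.println(originalStart(1, 3, false));  // clamped to the first line
        System.out.println(originalStart(10, 3, false)); // regular case
    }
}
```

Keeping the rule in one small function would make it easy to keep the old method signatures as overloads that pass createNewFile = false.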

Unified patches from git do not parse correctly

Expected Behavior

Unified patches from git should parse correctly.

Actual Behavior

Git diffs contain extra context after the closing @@ of the chunk header (I think it's the closest line above the chunk that contains a '{'), but the regex in the parser ends with @@$, expecting @@ to be the last thing on the line.

Steps to Reproduce the Problem

  1. Create a patch file with git diff.
  2. Note the extra text after the closing @@ in the chunk header.
  3. Attempt to parse the patch with java-diff-utils.
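
To illustrate the problem, here is a small sketch of a chunk header pattern that tolerates trailing text after the closing @@. The pattern and the ChunkHeader class are assumptions written for this example, not the regex or types used inside java-diff-utils. It also treats an omitted count as 1, as the unified diff format specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a tolerant hunk header parser, not the library's implementation.
final class ChunkHeader {
    // Anything after the closing "@@" is captured in group 5 instead of being rejected.
    private static final Pattern HEADER = Pattern.compile(
            "^@@\\s+-(\\d+)(?:,(\\d+))?\\s+\\+(\\d+)(?:,(\\d+))?\\s+@@(.*)$");

    final int oldStart;
    final int oldCount;
    final int newStart;
    final int newCount;
    final String sectionHeading; // e.g. the enclosing function line that git appends

    private ChunkHeader(int oldStart, int oldCount, int newStart, int newCount, String heading) {
        this.oldStart = oldStart;
        this.oldCount = oldCount;
        this.newStart = newStart;
        this.newCount = newCount;
        this.sectionHeading = heading;
    }

    static ChunkHeader parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a chunk header: " + line);
        }
        return new ChunkHeader(
                Integer.parseInt(m.group(1)),
                m.group(2) == null ? 1 : Integer.parseInt(m.group(2)), // omitted count means 1
                Integer.parseInt(m.group(3)),
                m.group(4) == null ? 1 : Integer.parseInt(m.group(4)),
                m.group(5).trim());
    }

    public static void main(String[] args) {
        ChunkHeader h = parse("@@ -10,7 +10,8 @@ public void foo() {");
        System.out.println(h.oldStart + "," + h.oldCount + " -> " + h.newStart + "," + h.newCount);
        System.out.println("heading: " + h.sectionHeading);
    }
}
```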

How to get the original line number?

Expected Behavior

The DiffRow class could add a field that records the original line number.

Actual Behavior

There is no method to get the line number.

Steps to Reproduce the Problem

none

Specifications

  • Version: 4.0
  • Platform: Ubuntu 18
  • Subsystem:
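
As a possible workaround until such a field exists, line numbers can be derived by counting rows per side. The sketch below is an approximation under stated assumptions: the local Tag enum stands in for the library's DiffRow.Tag so the example compiles on its own, and every CHANGE row is assumed to carry a line on both sides, which may not hold when a changed block has a different number of old and new lines. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: derives 1-based line numbers from the sequence of row tags.
final class LineNumbering {
    /** Stand-in for the library's DiffRow.Tag, declared locally to keep the sketch self-contained. */
    enum Tag { INSERT, DELETE, CHANGE, EQUAL }

    /** Returns {oldLine, newLine} for each row; -1 means the row has no line on that side. */
    static List<int[]> number(List<Tag> tags) {
        List<int[]> result = new ArrayList<>();
        int oldLine = 0;
        int newLine = 0;
        for (Tag tag : tags) {
            // An inserted row consumes no original line; a deleted row consumes no revised line.
            int oldNumber = (tag == Tag.INSERT) ? -1 : ++oldLine;
            int newNumber = (tag == Tag.DELETE) ? -1 : ++newLine;
            result.add(new int[] {oldNumber, newNumber});
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tag> tags = Arrays.asList(Tag.EQUAL, Tag.DELETE, Tag.INSERT, Tag.CHANGE);
        for (int[] pair : number(tags)) {
            System.out.println(pair[0] + " | " + pair[1]);
        }
    }
}
```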

Ignore Whitespaces Only Partially Works

Preface
I'm not sure if this is a bug or if I am misusing the tool and/or not configuring it correctly. If so, please consider this to be a question and I would appreciate any information that could be provided.

Thanks!

Describe the bug
When checking for equality, whitespaces are correctly ignored. When generating differences, some whitespaces are still compared while others are deleted from the result.

To Reproduce
Steps to reproduce the behavior:

DiffRowGenerator generator = DiffRowGenerator.create()
		.showInlineDiffs(true)
		.inlineDiffByWord(true) 
		.ignoreWhiteSpaces(true)
		.oldTag(f -> "~")      //introduce markdown style for strikethrough
		.newTag(f -> "**")     //introduce markdown style for bold
		.build();

//compute the differences for two test texts.
List<DiffRow> rows1 = generator.generateDiffRows(
		Arrays.asList("This\nis\na\ntest."),
		Arrays.asList("This is a test"));

or a more basic example using tabs instead of newlines...

//compute the differences for two test texts.
List<DiffRow> rows2 = generator.generateDiffRows(
		Arrays.asList("This\tis\ta\ttest."),
		Arrays.asList("This is a test"));

or an even more basic example that just changes the number of spaces...

//compute the differences for two test texts.
List<DiffRow> rows3 = generator.generateDiffRows(
		Arrays.asList("This  is  a  test."),
		Arrays.asList("This is a test"));

Actual Result

  • rows1:
    (period is correctly identified as an "old" tag while newlines are gone)
    `Thisisatest~.~`
    (spaces are considered to be "new" tags when we were asking for them to be ignored)
    `This** **is** **a** **test`
  • rows2:
    (period is correctly identified as an "old" tag while tabs are also treated as "old" tags)
    `This~    ~is~    ~a~    ~test~.~`
    (spaces are considered to be "new" tags when we are asking for them to be ignored)
    `This** **is** **a** **test`
  • rows3:
    (period is correctly identified as an "old" tag while the spaces are also treated as "old" tags)
    `This~  ~is~  ~a~  ~test~.~`
    (spaces are considered to be "new" tags when we are asking for them to be ignored)
    `This** **is** **a** **test`

Expected Behavior

  • rows1:
This\nis\na\ntest~.~
This is a test
  • rows2:
This\tis\ta\ttest~.~
This is a test
  • rows3:
This  is  a  test~.~
This is a test

Notes
If a period is left at the end of the second string being diffed in any of the above blocks, this does not happen and the entire block is identified as matching, as expected.

Suggested Fix

  1. Pass DiffRowGenerator.equalizer down to DiffUtils.diff when DiffRowGenerator.generateInlineDiffs is called.
  2. Use an internal identifier for merging/splitting instead of something that could be contained within the comparison.
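
To make the first suggestion concrete, here is a sketch of the kind of whitespace-insensitive comparison that would need to reach the inline diff step. It uses only the JDK; the class name WhitespaceEqualizer and the normalize helper are hypothetical, and whether the generator forwards such a predicate to the inline diff is exactly the open question in this report.

```java
import java.util.function.BiPredicate;

// Sketch only: a whitespace-insensitive equality rule, not the library's equalizer.
final class WhitespaceEqualizer {
    private WhitespaceEqualizer() {}

    /** Collapses every run of whitespace (spaces, tabs, newlines) to one space and trims the ends. */
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    /** Two strings are equal if they match after whitespace normalization. */
    static final BiPredicate<String, String> IGNORE_WHITESPACE =
            (a, b) -> normalize(a).equals(normalize(b));

    public static void main(String[] args) {
        // Tabs versus single spaces: only whitespace differs.
        System.out.println(IGNORE_WHITESPACE.test("This\tis\ta\ttest.", "This is a test."));
        // The trailing period is a real difference and must still be reported.
        System.out.println(IGNORE_WHITESPACE.test("This  is  a  test.", "This is a test"));
    }
}
```

With a rule like this applied at the word level as well as the line level, only the trailing period in the examples above would be reported as a difference.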

System

  • Java version: 8 (1.8.0_151)
  • Diff Utils Version: 4.5
