Comments (8)
I agree. Thanks for bringing this up (and pinging me by email as requested).
This would be an issue with diff_cleanupSemanticLossless, which does currently slide differences to line up with whitespace, but doesn't rank line breaks as having a higher value than a space.
There would seem to be five things to attempt to slide edit boundaries to (in order of preference):
1. Start/End of entire text.
2. Blank lines.
3. Line breaks.
4. Whitespace.
5. Non-alphanumeric characters.
I'll get on this. If you want you can let me know what language you are using (C++, JavaScript, Java, Python) and I'll do that one first and send you a prerelease copy before I get the others updated. (Yay for 20% time at Google.)
Original comment by [email protected]
on 20 Jun 2008 at 10:34
from google-diff-match-patch.
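As a rough illustration of the five-point ordering in the comment above, the ranking could be sketched as follows. The function name boundaryScore and the numeric values are assumptions made for this example; they are not taken from the library's source.

```javascript
// Hypothetical helper, not the library's code: rank a candidate boundary
// between two adjacent strings using the five preferences listed above.
// Higher return values mean a more desirable place to put an edit boundary.
function boundaryScore(before, after) {
  if (before === '' || after === '') {
    return 5; // 1. Start/end of the entire text.
  }
  var prev = before.charAt(before.length - 1);
  var next = after.charAt(0);
  if (/\n\r?\n$/.test(before) || /^\r?\n\r?\n/.test(after)) {
    return 4; // 2. Blank line.
  }
  if (/[\r\n]/.test(prev) || /[\r\n]/.test(next)) {
    return 3; // 3. Line break.
  }
  if (/\s/.test(prev) || /\s/.test(next)) {
    return 2; // 4. Other whitespace.
  }
  if (/[^a-zA-Z0-9]/.test(prev) || /[^a-zA-Z0-9]/.test(next)) {
    return 1; // 5. Non-alphanumeric character.
  }
  return 0; // The boundary falls inside a word.
}

var lineScore = boundaryScore('line one\n', 'line two'); // 3
var wordScore = boundaryScore('hello ', 'world');        // 2
```

Because each case returns early, a boundary that qualifies for several categories is ranked by the most preferred one.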
I just managed to get what I expected by changing a single character in the source code :-)
I'm working with the JavaScript version.
In function diff_cleanupSemanticScore(), I changed:
    var whitespace = /(\s)/;
to:
    var whitespace = /(\n)/;
Of course the cleanup no longer works with words, just lines, but that's a start.
Original comment by [email protected]
on 20 Jun 2008 at 10:35
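The effect of that one-character change can be checked directly. This is a small sketch with made-up sample characters; it only shows which boundary characters each pattern accepts.

```javascript
// Compare which single characters each pattern treats as a boundary.
var samples = [' ', '\t', '\n', 'a'];

var matchesWhitespace = samples.map(function (ch) {
  return /\s/.test(ch);
});
var matchesNewline = samples.map(function (ch) {
  return /\n/.test(ch);
});

// matchesWhitespace is [true, true, true, false]
// matchesNewline    is [false, false, true, false]
```

Spaces and tabs no longer count as boundaries with /\n/, which is why the cleanup stops aligning edits to words and only aligns them to lines.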
Aye, a quick and dirty (and 90% effective) solution is to use the following:
function diff_cleanupSemanticScore(one, two, three) {
  if (!one || !three) {
    // Edges are the best.
    return 10;
  }
  var score = 0;
  var whitespace = /\s/;
  if (one.charAt(one.length - 1).match(whitespace) ||
      two.charAt(0).match(whitespace)) {
    score++;
  }
  if (two.charAt(two.length - 1).match(whitespace) ||
      three.charAt(0).match(whitespace)) {
    score++;
  }
  var linebreak = /[\r\n]/;
  if (one.charAt(one.length - 1).match(linebreak) ||
      two.charAt(0).match(linebreak)) {
    score++;
  }
  if (two.charAt(two.length - 1).match(linebreak) ||
      three.charAt(0).match(linebreak)) {
    score++;
  }
  return score;
}
This probably even passes the existing unit tests.
Original comment by [email protected]
on 20 Jun 2008 at 10:42
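To show how a score like this might drive the sliding step, here is a minimal sketch. The names score and slideEdit are hypothetical and exist only for this example; the library's own diff_cleanupSemanticLossless is more involved, and this is not a copy of it. The sketch assumes a single edit sitting between two unchanged strings.

```javascript
// Score a split (one | two | three) in the spirit of the comment above:
// edges score best, and a line break counts twice because it matches
// both the whitespace pattern and the line-break pattern.
function score(one, two, three) {
  if (!one || !three) {
    return 10;
  }
  var total = 0;
  [/\s/, /[\r\n]/].forEach(function (pattern) {
    if (pattern.test(one.charAt(one.length - 1)) ||
        pattern.test(two.charAt(0))) {
      total++;
    }
    if (pattern.test(two.charAt(two.length - 1)) ||
        pattern.test(three.charAt(0))) {
      total++;
    }
  });
  return total;
}

// Hypothetical helper: slide `edit` between its unchanged neighbours and
// keep the best-scoring position. Every position tried yields the same
// texts with and without the edit; only the boundaries move.
function slideEdit(before, edit, after) {
  // Shift the edit as far left as it can go.
  while (before && edit &&
         before.charAt(before.length - 1) === edit.charAt(edit.length - 1)) {
    var ch = before.charAt(before.length - 1);
    before = before.slice(0, -1);
    edit = ch + edit.slice(0, -1);
    after = ch + after;
  }
  var best = { before: before, edit: edit, after: after };
  var bestScore = score(before, edit, after);
  // Step right one character at a time, remembering the best position.
  // Ties go to the later (rightmost) position.
  while (after && edit.charAt(0) === after.charAt(0)) {
    before += edit.charAt(0);
    edit = edit.slice(1) + after.charAt(0);
    after = after.slice(1);
    var current = score(before, edit, after);
    if (current >= bestScore) {
      bestScore = current;
      best = { before: before, edit: edit, after: after };
    }
  }
  return best;
}

var result = slideEdit('alpha\nbe', 'ta gamma\nbe', 'ta delta');
// result.edit is 'beta gamma\n': the insertion now covers one whole line.
```

With only the whitespace pattern, the positions next to a space and next to a line break would tie, and the edit would end up as a fragment ending in a space instead of a whole line.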
Hehe, you're fast ;-)
Original comment by [email protected]
on 20 Jun 2008 at 10:51
What about:
    var linebreak = /[\r*\n]/;
instead of:
    var linebreak = /[\r\n]/;
Original comment by [email protected]
on 20 Jun 2008 at 10:55
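One point worth checking about the suggested pattern: inside square brackets, * is a literal asterisk rather than a quantifier. The sketch below compares the two patterns. The third pattern, optionalCr, is only an assumption about what the suggestion may have intended (an optional carriage return before the newline).

```javascript
var withStar = /[\r*\n]/;   // matches one of "\r", "*", "\n"
var plain = /[\r\n]/;       // matches one of "\r", "\n"
var optionalCr = /\r?\n/;   // assumed intent: "\n" or "\r\n" as one unit

var starMatchesAsterisk = withStar.test('*');              // true
var plainMatchesAsterisk = plain.test('*');                // false
var crlfLength = 'a\r\nb'.match(optionalCr)[0].length;     // 2
```

Since the scoring function tests one character at a time through charAt, /[\r\n]/ already covers both characters of a CRLF pair, and a multi-character pattern such as optionalCr would never see the pair together.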
A new version of diff, match, patch has just been posted in all four languages, which solves this problem thoroughly using the five-point list in comment 1.
Original comment by [email protected]
on 25 Jun 2008 at 12:22
- Changed state: Fixed
Fixed? Then try this with diff_match_patch_20110217:
-------------------------------left:
/**
 * init。
 *
 */
public void init() {
    this.start();
}
-------------------------------right:
/**
 * init。
 *
 * @throws InterruptedException
 * @throws SQLException
 *
 */
@SuppressWarnings("unused")
private void init() throws SQLException, InterruptedException {
    this.start();
}
Original comment by [email protected]
on 24 Mar 2011 at 7:05
The current behaviour is correct. I assume you are using the cleanupSemantic
function? In that case you get a diff with the following sequence:
EQUAL, DELETION, INSERTION, EQUAL
And I assume that you are expecting that the " */" line should be preserved
(which it currently is not). If that were the case, then the diff would have
the following sequence:
EQUAL, INSERTION, EQUAL(" */"), DELETION, INSERTION, EQUAL
This is a more complicated diff sequence, just to save three characters (two of
which match well on an earlier line).
Any change to make diff preserve this line at the expense of a more complicated
diff sequence would also cause it to preserve coincidental matches --
three characters is a pretty small match.
Original comment by [email protected]
on 24 Mar 2011 at 5:45