Comments (17)
FYI,
I've just been performance testing various implementations of text/binary delta generation algorithms. I timed your Java implementation against GNU diff. On average, your Java implementation was 2-3X slower. It does not appear to be a file I/O issue: Java file I/O is marginally slower, but not 2-3X. I haven't looked inside the GNU diff implementation, which is supposed to be based on the same algorithm.
Original comment by [email protected]
on 25 Sep 2007 at 4:33
from google-diff-match-patch.
GNU diff is line-by-line. This implementation is character-by-character. So if you had more than two characters per line, then I think I just beat GNU diff.
Original comment by [email protected]
on 25 Sep 2007 at 6:35
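To make the granularity difference concrete, here is a minimal sketch using Python's standard difflib. It is neither this library nor GNU diff, and difflib uses a different matching algorithm, so it only illustrates how much text each granularity reports as changed. The helper name changed_spans is made up for this example.

```python
import difflib


def changed_spans(a, b):
    """Return the non-equal opcodes between two sequences as (tag, old, new)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "the quick brown fox\njumps over\n"
new = "the quick brown cat\njumps over\n"

# Line granularity: the whole first line is reported as replaced.
line_level = changed_spans(old.splitlines(keepends=True), new.splitlines(keepends=True))

# Character granularity: only the three differing characters are reported.
char_level = changed_spans(old, new)

print(line_level)
print(char_level)
```

The character-level comparison does more work per line but yields a much finer result, which is why the two timings are not directly comparable.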
Hi Neil,
Great Work!!!!!!!!
I am analyzing your code to assess its suitability for our project.
When I do a diff calculation, I get a java.lang.OutOfMemoryError: Java heap space error. The following stack trace might help you understand the issue.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at java.lang.StringBuilder.toString(StringBuilder.java:430)
at name.fraser.neil.plaintext.diff_match_patch.diff_map(diff_match_patch.java:454)
at name.fraser.neil.plaintext.diff_match_patch.diff_compute(diff_match_patch.java:227)
at name.fraser.neil.plaintext.diff_match_patch.diff_main(diff_match_patch.java:140)
at name.fraser.neil.plaintext.diff_match_patch.diff_compute(diff_match_patch.java:269)
at name.fraser.neil.plaintext.diff_match_patch.diff_main(diff_match_patch.java:140)
Min heap size is 256M and max heap size is 1024M.
Can you please suggest any possible fix/solution to avoid this problem...
regards
Prasad Ganguri
Original comment by [email protected]
on 21 Apr 2008 at 8:29
What's the size of the text you are diffing?
Original comment by [email protected]
on 21 Apr 2008 at 8:39
Neil,
Thanks for the immediate response.
new text: 155978 characters
old text: 49278 characters
Original comment by [email protected]
on 22 Apr 2008 at 12:35
Neil,
Do you know of any project that does a "word" by "word" diff instead of char by char? For example, with "test" as text 1 and "tall" as text 2, it should show a complete delete of the old word and an add of the new word. Thanks.
Original comment by [email protected]
on 6 Oct 2008 at 4:04
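One way to get the word-level behaviour asked about here is to split both texts into word and whitespace tokens and diff the token lists. The sketch below is an assumption-laden illustration using Python's standard difflib rather than this library; word_diff and its tuple output format are invented for the example.

```python
import difflib
import re


def word_diff(old, new):
    """Diff two texts token by token, where a token is a run of
    non-whitespace characters or a run of whitespace."""
    a = re.findall(r"\s+|\S+", old)
    b = re.findall(r"\s+|\S+", new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("equal", "".join(a[i1:i2])))
            continue
        # A "replace" opcode is reported as a delete followed by an insert.
        if i2 > i1:
            result.append(("delete", "".join(a[i1:i2])))
        if j2 > j1:
            result.append(("insert", "".join(b[j1:j2])))
    return result


print(word_diff("test", "tall"))
# [('delete', 'test'), ('insert', 'tall')]
```

Because whole tokens are compared, the shared "t" in "test" and "tall" is never treated as a match.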
Thanks for this work!
Sorry, I don't have a suggested code change, but is this the expected behaviour?
If an inserted line starts with the same character as the next matching line, the first character is deemed to be a match, but the rest of the line and the first character of the next line are not.
Attached is some code that reproduces the problem.
Original comment by [email protected]
on 16 Dec 2008 at 5:36
To answer my own question: yes, this is the expected behaviour of a character-match algorithm (cf. line-by-line).
Original comment by [email protected]
on 16 Dec 2008 at 5:45
Yes, that's a valid diff, a minimal diff and the expected behaviour. However, you are right that this is semantically unusual behaviour. If you want a diff which makes sense to a human, run it through diff_cleanupSemantic. That will shift the diff sideways so that the end points line up with the line breaks, word breaks or other logical boundaries.
Original comment by [email protected]
on 16 Dec 2008 at 6:04
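The sideways shift can be sketched in a few lines. This is a simplified illustration of the idea only, not the code behind diff_cleanupSemantic: it handles a single insertion between two equal runs and only looks for line-break boundaries. The function name and the sample fruit list are invented for the example.

```python
def shift_insertion_to_line_boundary(before, inserted, after):
    """Slide an insertion left or right, without changing either text,
    until it starts after a line break and ends with one."""
    candidates = [(before, inserted, after)]

    # Slide left while the character before the insertion equals its last character.
    b, i, a = before, inserted, after
    while b and i and b[-1] == i[-1]:
        ch = b[-1]
        b, i, a = b[:-1], ch + i[:-1], ch + a
        candidates.append((b, i, a))

    # Slide right while the insertion's first character equals the character after it.
    b, i, a = before, inserted, after
    while a and i and i[0] == a[0]:
        ch = a[0]
        b, i, a = b + ch, i[1:] + ch, a[1:]
        candidates.append((b, i, a))

    # Prefer the first position where the insertion covers whole lines.
    for b, i, a in candidates:
        if (not b or b.endswith("\n")) and i.endswith("\n"):
            return b, i, a
    return before, inserted, after


# Minimal character diff of "apple\navocado\n" -> "apple\napricot\navocado\n":
# the shared leading "a" is matched, so the insertion straddles two lines.
print(shift_insertion_to_line_boundary("apple\na", "pricot\na", "vocado\n"))
# ('apple\n', 'apricot\n', 'avocado\n')
```

Both the original and the shifted triple rebuild the same old and new texts; only the position of the boundaries changes.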
Yep, thanks very much.
Good work!
Original comment by [email protected]
on 16 Dec 2008 at 6:26
Perhaps a very dumb question, but I'm stumped:
When I create a Python snippet modelled on the JavaScript patch demo, I get very different results. The patch looks very different and only the first element applies. Can you tell me what I'm doing wrong?
Obviously I've searched the web for answers, but no Python programs I've seen seem to use this pattern.
Thanks in advance.
Original comment by [email protected]
on 20 Apr 2009 at 6:47
Hello p.j.kers,
Found it, sort of. The issue is Python's treatment of whitespace. Your code's line breaks are entirely \n. If I execute your code, I get the same output as you do. But if I add a single blank line anywhere in your code, a \r\n line break is added to the source code (I'm on Windows). With that line in place, Python returns the same verbose output as JavaScript.
I'll take a closer look tomorrow to see what's going on. But it looks like Python's whitespace sensitivity, not the library.
Original comment by [email protected]
on 20 Apr 2009 at 7:45
Hi Neil,
Thanks for your rapid response.
Noting the %0A elements in the patch text, I too suspected whitespace issues. However, using U*X '\n' or DOS '\r\n' makes no difference on my platform (Linux).
After your response I tried mixed lines too, but I cannot confirm your observation; the results stay unchanged.
After that, I tried raw strings, unicode strings, loading from files with different line endings, even binary loading... All results are the same if you ignore the %0A to %0D%0A changes in some patch texts. For now I've run out of smart ideas.
FYI, I'm using Python 2.5.1 r251:54863 on Fedora 9, i386 (standard package).
Good luck tomorrow. I'm very curious about what this turns out to be.
Original comment by [email protected]
on 20 Apr 2009 at 8:44
> Good luck tomorrow. I'm very curious about what this turns out to be.
Got it. Whitespace was a complete red herring. The timeout value in diff match patch is in seconds, but in Python it was being treated as ms. On my computer it is right on the edge, so whether the diff algorithm timed out or not was influenced by everything from the presence of a pyc file to the kind of music iTunes was playing. That was fun to debug.
I've just uploaded a new version to SVN and to the download page which fixes the Python timeout. All other languages use milliseconds and do the conversion properly.
Thanks!
I'm also going to close issue #3, since Google has fixed this issue tracker to email me on new issues.
Original comment by [email protected]
on 20 Apr 2009 at 11:12
- Changed state: Fixed
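A unit mismatch of this kind can be sketched as follows. This is a hypothetical illustration, not the library's actual code: the function names are invented, and the exact form of the original bug is not shown in this thread. It only shows how treating a seconds value as milliseconds shrinks the time budget a thousandfold.

```python
import time


def deadline_mismatched(now_s, timeout_s):
    # Hypothetical bug: the timeout (given in seconds) is handled as if it
    # were milliseconds, so a 1.0 s budget shrinks to 0.001 s.
    return now_s + timeout_s / 1000.0


def deadline_correct(now_s, timeout_s):
    # Clock and timeout are both in seconds, so they can be added directly.
    return now_s + timeout_s


def timed_out(deadline_s, now_s):
    """Return True once the current time has passed the deadline."""
    return now_s > deadline_s


start = time.time()
print(deadline_correct(start, 1.0) - start)     # roughly 1.0
print(deadline_mismatched(start, 1.0) - start)  # roughly 0.001
```

With a budget that small, whether a given diff finishes in time depends on incidental machine load, which matches the flaky behaviour described above.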
Congratulations, it works! It wouldn't even have crossed my mind to think about timeout issues. Now I can use your code to build upon. En route you expanded my grasp of English idiom too: I had never heard of a red herring before. Funny how politics can influence language.
Thanks a lot!
Original comment by [email protected]
on 21 Apr 2009 at 9:14
Neil: Do you still update this project? There are a few issues open in this
project and you haven't been "pinged" likely because this issue is closed and
is not immediately visible in the issue list.
Original comment by [email protected]
on 24 Aug 2010 at 5:28