
Comments (17)

GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
FYI,

I've just been performance testing various implementations of text/binary delta generation algorithms, and timed your Java implementation against GNU diff. On average, your Java implementation was 2-3X slower. It does not appear to be a file IO issue: Java file IO is marginally slower, but not 2-3X. I haven't looked inside the GNU diff implementation, which is supposed to be based on the same algorithm.

Original comment by [email protected] on 25 Sep 2007 at 4:33

from google-diff-match-patch.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
GNU diff is line-by-line.  This implementation is character-by-character.  So if you had more than two characters per line, then I think I just beat GNU diff.

Original comment by [email protected] on 25 Sep 2007 at 6:35


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Hi Neil,

Great Work!!!!!!!!

I am analyzing your code to find out its suitability to use in our project.

When I do a diff calculation, I get a java.lang.OutOfMemoryError: Java heap space error. The following stack trace might help you understand the issue.

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at java.lang.StringBuilder.toString(StringBuilder.java:430)
        at name.fraser.neil.plaintext.diff_match_patch.diff_map(diff_match_patch.java:454)
        at name.fraser.neil.plaintext.diff_match_patch.diff_compute(diff_match_patch.java:227)
        at name.fraser.neil.plaintext.diff_match_patch.diff_main(diff_match_patch.java:140)
        at name.fraser.neil.plaintext.diff_match_patch.diff_compute(diff_match_patch.java:269)
        at name.fraser.neil.plaintext.diff_match_patch.diff_main(diff_match_patch.java:140)

Min heap size is 256M and max heap size is 1024M.

Can you please suggest any possible fix/solution to avoid this problem...

regards
Prasad Ganguri

Original comment by [email protected] on 21 Apr 2008 at 8:29


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
What's the size of the text you are diffing?

Original comment by [email protected] on 21 Apr 2008 at 8:39


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Neil, 

Thanks for the immediate response.

new text: 155978 characters
old text: 49278 characters

Original comment by [email protected] on 22 Apr 2008 at 12:35



GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Neil,

Do you know of any project that does a word-by-word diff instead of char-by-char? For example, with "test" as text 1 and "tall" as text 2, it should show a complete delete of the old word and an add of the new word. Thanks.

Original comment by [email protected] on 6 Oct 2008 at 4:04
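
A word-level diff of the kind asked about here can be sketched with Python's standard-library difflib, used as a stand-in rather than the diff-match-patch library itself. The function name word_diff and its output format are assumptions made for illustration only.

```python
import difflib
import re


def word_diff(old, new):
    """Return (tag, text) pairs; tag is 'equal', 'delete' or 'insert'.

    Hypothetical helper: the texts are compared token by token (words and
    whitespace runs), so a changed word is never split into characters.
    """
    # Keep whitespace runs as tokens so the original text can be rebuilt.
    a = re.findall(r"\s+|\S+", old)
    b = re.findall(r"\s+|\S+", new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("equal", "".join(a[i1:i2])))
            continue
        # difflib's 'replace' is reported here as a delete plus an insert.
        if i2 > i1:
            result.append(("delete", "".join(a[i1:i2])))
        if j2 > j1:
            result.append(("insert", "".join(b[j1:j2])))
    return result


print(word_diff("test", "tall"))
```

With "test" and "tall" as input, the whole old word is reported as deleted and the whole new word as inserted, instead of the shared "t" being matched.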


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Thanks for this work!

Sorry, I don't have a suggested code change, but is this the expected behaviour?

If an inserted line starts with the same character as the next matching line, the first character is deemed to be a match, but the rest of the line and the first character of the next line is not.

Attached is some code that reproduces the problem.

Original comment by [email protected] on 16 Dec 2008 at 5:36

Attachments:


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
To answer my own question: Yes, this is the expected behaviour of a character-match algorithm (cf. line-by-line).

Original comment by [email protected] on 16 Dec 2008 at 5:45


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Yes, that is a valid diff, a minimal diff, and the expected behaviour.  However, you are right that this is semantically unusual behaviour.  If you want a diff which makes sense to a human, run it through diff_cleanupSemantic.  That will shift the diff sideways so that the end points line up with the line breaks, word breaks or other logical boundaries.

Original comment by [email protected] on 16 Dec 2008 at 6:04
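
The behaviour discussed in this exchange can be reproduced with a small character-level diff. The sketch below uses Python's standard-library difflib as a stand-in, so it is not the diff-match-patch algorithm and does not call diff_cleanupSemantic; the sample texts and the helper name char_diff are made up for illustration.

```python
import difflib


def char_diff(a, b):
    """Character-level diff as (tag, text) pairs (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(("equal", a[i1:i2]))
            continue
        if i2 > i1:
            out.append(("delete", a[i1:i2]))
        if j2 > j1:
            out.append(("insert", b[j1:j2]))
    return out


# The inserted line "banana" starts with the same character as the
# following line "berry".
old = "apple\nberry\n"
new = "apple\nbanana\nberry\n"

for tag, text in char_diff(old, new):
    print(tag, repr(text))
```

Here the insertion is reported as "anana\nb" rather than "banana\n": the leading "b" is treated as matched, and the edit ends one character into the next line. Both placements insert the same number of characters, which is why a separate clean-up pass is needed to align the edit with line boundaries.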


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Yep, thanks very much.

Good work!

Original comment by [email protected] on 16 Dec 2008 at 6:26


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Perhaps a very dumb question, but I'm stumped:

When I create a Python snippet modelled on the JavaScript patch demo, I get very different results. The patch looks very different and only the first element applies. Can you tell me what I'm doing wrong?

Obviously I've searched the web for answers, but no Python programs I've seen seem to use this pattern.

Thanks in advance.

Original comment by [email protected] on 20 Apr 2009 at 6:47

Attachments:


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Hello p.j.kers,

Found it, sort of.  The issue is Python's treatment of whitespace.  Your code's line breaks are entirely \n.  If I execute your code, I get the same output as you do.  But if I add a single blank line anywhere in your code, a \r\n line break is added to the source code (I'm on Windows).  With that line in place, Python returns the same verbose output as JavaScript.

I'll take a closer look tomorrow to see what's going on.  But it looks like Python's whitespace sensitivity, not the library.

Original comment by [email protected] on 20 Apr 2009 at 7:45


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Hi Neil,

Thanks for your rapid response.

Noting the %0A elements in the patch text, I too suspected whitespace issues. However, using U*X '\n' or DOS '\r\n' makes no difference on my platform (Linux). After your response I tried mixed lines too, but I cannot second your observation - the results stay unchanged.

After that, I've tried raw strings, unicode strings, loading from files with different line codings, even with binary loading... All results are the same when you ignore the %0A to %0D%0A changes in some patch texts. For now I've run out of smart ideas.

FYI, I'm using Python 2.5.1 r251:54863 on Fedora 9, i386 (standard package).

Good luck tomorrow. I'm very curious about what this turns out to be.

Original comment by [email protected] on 20 Apr 2009 at 8:44


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
> Good luck tomorrow. I'm very curious about what this turns out to be.

Got it.  Whitespace was a complete red-herring.  The timeout value in diff match patch is in seconds, but in Python it was being treated as ms.  On my computer it is right on the edge, so whether the diff algorithm timed out or not was influenced by everything from the presence of a pyc file, to the kind of music iTunes was playing.  That was fun to debug.

I've just uploaded a new version to SVN and to the download page which fixes the Python timeout.  All other languages use milliseconds and do the conversion properly.  Thanks!

I'm also going to close issue #3, since Google has fixed this issue tracker to email me on new issues.

Original comment by [email protected] on 20 Apr 2009 at 11:12

  • Changed state: Fixed
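
The kind of unit mismatch described in this comment can be illustrated with a short sketch. It is not the library's actual code: both function names are hypothetical, and the only facts relied on are that the timeout is given in seconds and that Python's time.time() also returns seconds.

```python
import time


def deadline_from_timeout(timeout_seconds, now=None):
    """Return an absolute deadline, in seconds since the epoch."""
    if now is None:
        now = time.time()  # time.time() reports seconds as a float
    return now + timeout_seconds


def deadline_with_unit_bug(timeout_seconds, now=None):
    """Hypothetical buggy variant: the timeout is treated as milliseconds."""
    if now is None:
        now = time.time()
    # Dividing by 1000 shrinks a 1 second budget to 1 millisecond.
    return now + timeout_seconds / 1000.0


# With a fixed clock reading of 100.0 s and a 1.0 s timeout:
print(deadline_from_timeout(1.0, now=100.0))
print(deadline_with_unit_bug(1.0, now=100.0))
```

A deadline that is a thousand times too short would explain why the result depended on small timing differences: any run that took slightly longer than the shrunken budget would be cut off early.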


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Congratulations, it works! It wouldn't even have crossed my mind to think about timeout issues. Now I can use your code to build upon. En route you expanded my grasp of English idiom too - never heard of a red-herring before. Funny how politics can influence language.

Thanks a lot!

Original comment by [email protected] on 21 Apr 2009 at 9:14


GoogleCodeExporter avatar GoogleCodeExporter commented on June 22, 2024
Neil: Do you still update this project? There are a few issues open in this 
project and you haven't been "pinged" likely because this issue is closed and 
is not immediately visible in the issue list.

Original comment by [email protected] on 24 Aug 2010 at 5:28
