
diff-match-patch-cpp-stl's Issues

query: diff match patch python vs diff match patch c++ stl

Hi team,
I am getting different results for the same pair of texts when calling match_main() in the Python version of DMP and in the C++ STL version used from Python.

(diff_match_patch_cpp is diff_match_patch_python-1.0.2-py3.7-win-amd64.egg)

import diff_match_patch_cpp as dmp
dmp.match_main(text1, text2, 0, 1000, 32, 0.25)
-1

(this is Google's diff-match-patch)

import diff_match_patch
dmp_py = diff_match_patch.diff_match_patch()
dmp_py.match_main(text1, text2, 0)
0

The match distance and match threshold are set to their defaults in both versions.
Can you please clarify why the results differ?

Critical bug causing a crash in method diff_match_patch::diff_charsToLines

Running the algorithm under a fuzzing tool revealed a conversion problem: a signed integer is cast directly to a wider unsigned integer. This causes a heap-buffer-overflow and a subsequent crash:

static void diff_charsToLines(Diffs &diffs, const Lines& lineArray) {
    for (typename Diffs::iterator cur_diff = diffs.begin(); cur_diff != diffs.end(); ++cur_diff) {
      string_t text;
      for (int y = 0; y < (int)(*cur_diff).text.length(); y++) {
        const LinePtr& lp = lineArray[static_cast<size_t>((*cur_diff).text[y])];  // <= HERE IS THE PROBLEM
        text.append(lp.first, lp.second);
      }
      (*cur_diff).text.swap(text);
    }
  }

The following fix should be applied to address this. The point is to convert first to the unsigned type of the same size; after that it is safe to cast to size_t:

  static void diff_charsToLines(Diffs &diffs, const Lines& lineArray) {
    for (typename Diffs::iterator cur_diff = diffs.begin(); cur_diff != diffs.end(); ++cur_diff) {
      string_t text;
      for (int y = 0; y < (int)(*cur_diff).text.length(); y++) {
          typedef typename std::make_unsigned<typename string_t::value_type>::type unsigned_value_type;              // <= FIX
          const LinePtr& lp = lineArray[static_cast<size_t>(static_cast<unsigned_value_type>((*cur_diff).text[y]))]; // <= FIX
          text.append(lp.first, lp.second);
      }
      (*cur_diff).text.swap(text);
    }
  }
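
To make the difference between the two casts concrete, here is a minimal standalone sketch. The helper names index_direct and index_via_unsigned are hypothetical and are not part of the library; they only isolate the cast used before and after the fix.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: the direct cast used in the original code.
// A negative character value is sign-extended, giving a huge index.
template <typename CharT>
std::size_t index_direct(CharT c) {
  return static_cast<std::size_t>(c);
}

// Hypothetical helper: the cast used in the fix.
// Converting to the same-size unsigned type first keeps the index
// within the range of that character type.
template <typename CharT>
std::size_t index_via_unsigned(CharT c) {
  typedef typename std::make_unsigned<CharT>::type unsigned_value_type;
  return static_cast<std::size_t>(static_cast<unsigned_value_type>(c));
}
```

For a signed char holding -56, index_via_unsigned yields 200, while index_direct yields a value close to the maximum of size_t, far past the end of any realistic lineArray.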

API for Line or Word Diffs

How do I get line or word diffs to work in this library? Looking at the example in "Line or Word Diffs", most of the functions it uses are private here. I tried changing them to public, but then I get runtime errors; maybe I'm just not calling the API correctly. Could you please add an example?

Thanks.
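
For readers with the same question, the idea behind line mode can be sketched without touching the library at all: each unique line is replaced by an index into a shared table, the diff runs over those indices, and the indices are mapped back to lines afterwards. The sketch below covers only the encode and decode steps. All names in it (split_lines, lines_to_indices, indices_to_lines) are hypothetical, and it is not this library's API.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split text into lines, keeping each trailing '\n'.
std::vector<std::string> split_lines(const std::string& text) {
  std::vector<std::string> lines;
  std::size_t start = 0;
  while (start < text.size()) {
    std::size_t end = text.find('\n', start);
    end = (end == std::string::npos) ? text.size() : end + 1;
    lines.push_back(text.substr(start, end - start));
    start = end;
  }
  return lines;
}

// Hypothetical helper: encode each line as its index in a table of unique
// lines. The table and lookup map are shared across both texts.
std::vector<std::size_t> lines_to_indices(
    const std::string& text, std::vector<std::string>& table,
    std::map<std::string, std::size_t>& seen) {
  std::vector<std::size_t> out;
  const std::vector<std::string> lines = split_lines(text);
  for (std::size_t i = 0; i < lines.size(); ++i) {
    std::map<std::string, std::size_t>::const_iterator it = seen.find(lines[i]);
    if (it == seen.end()) {
      it = seen.insert(std::make_pair(lines[i], table.size())).first;
      table.push_back(lines[i]);
    }
    out.push_back(it->second);
  }
  return out;
}

// Hypothetical helper: decode indices back into text; at() checks bounds.
std::string indices_to_lines(const std::vector<std::size_t>& indices,
                             const std::vector<std::string>& table) {
  std::string text;
  for (std::size_t i = 0; i < indices.size(); ++i) {
    text += table.at(indices[i]);
  }
  return text;
}
```

Using size_t indices and at() here is a deliberate choice for the sketch: it avoids packing indices into character values, so there is no signedness conversion to get wrong.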

Potential for triggering debug assertion failure on Windows

std::isalnum, std::isspace, and std::isdigit all contain assertions, when built with the debug C++ runtime, that fail if the argument is outside the proper range (0 - 255). Since the type used in this library's is_alnum, etc., is char, the actual range is -128 to 127 when char is signed, meaning that any byte above 127 is passed as a negative value and causes a debug assertion failure.
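
A common way to avoid this is to convert the character to unsigned char before calling the <cctype> functions. The wrappers below are a minimal sketch of that idea. The names safe_isalnum, safe_isspace, and safe_isdigit are hypothetical, and the library's own is_alnum and related helpers may be structured differently.

```cpp
#include <cctype>

// Hypothetical wrappers: cast to unsigned char first so that bytes above
// 127 are passed as values in 128..255 rather than as negative ints.
inline bool safe_isalnum(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

inline bool safe_isspace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

inline bool safe_isdigit(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```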

Use C++11 compiler for continuous integration

It would be nice to use C++11 features in diff_match_patch. However, the Travis CI builds use older versions of gcc and clang that don't support C++11. Since C++11 support is generally widely available, could the Travis CI builds be updated to use newer compilers?

This would permit the use of:

  • lambdas
  • initializer lists
  • >> for nested template arguments
  • unordered_map
  • and more
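
As a small illustration of the features listed above, the following sketch uses all four in one function. It is unrelated to the library's actual code, and the function name and data are made up for the example.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical example function, not taken from diff_match_patch.
std::vector<std::string> keys_by_group_size() {
  // unordered_map, an initializer list, and ">>" closing nested templates.
  std::unordered_map<std::string, std::vector<int>> groups = {
      {"insert", {1, 2, 3}}, {"delete", {4}}, {"equal", {5, 6}}};

  std::vector<std::string> keys;
  for (const auto& kv : groups) {
    keys.push_back(kv.first);
  }

  // Lambda used as a comparator: keys with the most entries come first.
  std::sort(keys.begin(), keys.end(),
            [&groups](const std::string& a, const std::string& b) {
              return groups.at(a).size() > groups.at(b).size();
            });
  return keys;
}
```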

Larger files throw an std::vector[] out of bounds assert.

This issue happens when you attempt to do any kind of line matching.

From my debugging efforts I was able to narrow it down to the following function.
  static void diff_charsToLines(Diffs &diffs, Lines& lineArray) {
    for (typename Diffs::iterator cur_diff = diffs.begin(); cur_diff != diffs.end(); ++cur_diff) {
      string_t text;
      for (int y = 0; y < (int)(*cur_diff).text.length(); y++) {
        const LinePtr& lp = lineArray[static_cast<size_t>((*cur_diff).text[y])];
        text.append(lp.first, lp.second);
      }
      (*cur_diff).text.swap(text);
    }
  }
It looks like the lineArray[] is the object throwing the exception.
