leutloff / diff-match-patch-cpp-stl
C++ STL variant of https://code.google.com/p/google-diff-match-patch.
License: Apache License 2.0
Hi team,
I am getting different results when comparing the same pair of texts with match_main() in the Python version of DMP and in the C++ STL version called from Python.
(diff_match_patch_cpp is diff_match_patch_python-1.0.2-py3.7-win-amd64.egg)
import diff_match_patch_cpp as dmp
dmp.match_main(text1, text2, 0, 1000, 32, 0.25)
-1
(This is Google's diff-match-patch.)
import diff_match_patch
dmp_py = diff_match_patch.diff_match_patch()
dmp_py.match_main(text1, text2, 0)
0
The match distance and match threshold are set to their defaults in both versions.
Can you please clarify this?
Running the algorithm under a fuzzing tool revealed a conversion issue from a signed integer to a wider unsigned integer. It causes a heap-buffer-overflow and a subsequent crash:
static void diff_charsToLines(Diffs &diffs, const Lines& lineArray) {
  for (typename Diffs::iterator cur_diff = diffs.begin(); cur_diff != diffs.end(); ++cur_diff) {
    string_t text;
    for (int y = 0; y < (int)(*cur_diff).text.length(); y++) {
      const LinePtr& lp = lineArray[static_cast<size_t>((*cur_diff).text[y])];  // <= HERE IS THE PROBLEM
      text.append(lp.first, lp.second);
    }
    (*cur_diff).text.swap(text);
  }
}
The following fix addresses this. The point is to convert first to the unsigned type of the same size; after that it is safe to cast to size_t:
static void diff_charsToLines(Diffs &diffs, const Lines& lineArray) {
  for (typename Diffs::iterator cur_diff = diffs.begin(); cur_diff != diffs.end(); ++cur_diff) {
    string_t text;
    for (int y = 0; y < (int)(*cur_diff).text.length(); y++) {
      // FIX: std::make_unsigned needs <type_traits> (C++11).
      typedef typename std::make_unsigned<typename string_t::value_type>::type unsigned_value_type;
      // FIX: go through the same-size unsigned type before widening to size_t.
      const LinePtr& lp = lineArray[static_cast<size_t>(static_cast<unsigned_value_type>((*cur_diff).text[y]))];
      text.append(lp.first, lp.second);
    }
    (*cur_diff).text.swap(text);
  }
}
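The sign extension behind this overflow can be reproduced in isolation. The sketch below is only an illustration, assuming size_t is wider than the character type; index_naive, index_fixed and check_index_helpers are hypothetical names and not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helpers for illustration; not part of diff_match_patch.

// Mirrors the original cast: a negative character value is sign-extended
// during the conversion, so it turns into a huge index.
template <typename CharT>
std::size_t index_naive(CharT c) {
    return static_cast<std::size_t>(c);
}

// Mirrors the proposed fix: convert to the unsigned type of the same
// width first, then widen to size_t.
template <typename CharT>
std::size_t index_fixed(CharT c) {
    typedef typename std::make_unsigned<CharT>::type unsigned_char_type;
    return static_cast<std::size_t>(static_cast<unsigned_char_type>(c));
}

// Small self-check using an explicitly signed 8-bit character.
inline void check_index_helpers() {
    const signed char c = -56;       // same bit pattern as the byte 200
    assert(index_fixed(c) == 200u);  // stays within a 256-entry table
    assert(index_naive(c) > 255u);   // sign-extended, far past such a table
}
```

With a line table of a few hundred entries, the naive index lands far outside the allocation, which matches the heap-buffer-overflow reported by the fuzzer.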
How do I get line or word diffs to work in this library? Looking at the example on the "Line or Word Diffs" page, most of these functions are private. I tried changing them to public, but then I get runtime errors; maybe I'm just not calling the API correctly. Could you please add an example?
Thanks.
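Until an official example exists, the idea behind line mode can be sketched without touching the library's private functions: map every distinct line to one character, diff the encoded strings, then expand the characters back into lines. The code below is a minimal sketch under stated assumptions, not the library's API; LineTable, lines_to_chars and chars_to_lines are hypothetical names, and char32_t is used so that indices are never negative.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types and functions for illustration only.
struct LineTable {
    std::vector<std::string> lines;                   // index -> line text
    std::unordered_map<std::string, char32_t> index;  // line text -> index
    LineTable() : lines(1) {}  // slot 0 stays empty so no line maps to U+0000
};

// Encode each line of `text` (including its trailing '\n') as one character.
std::u32string lines_to_chars(const std::string& text, LineTable& table) {
    std::u32string encoded;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t end = text.find('\n', start);
        end = (end == std::string::npos) ? text.size() : end + 1;
        const std::string line = text.substr(start, end - start);
        auto it = table.index.find(line);
        if (it == table.index.end()) {
            // First time this line is seen: give it the next free index.
            it = table.index.emplace(
                line, static_cast<char32_t>(table.lines.size())).first;
            table.lines.push_back(line);
        }
        encoded.push_back(it->second);
        start = end;
    }
    return encoded;
}

// Expand an encoded string back into the lines it stands for.
std::string chars_to_lines(const std::u32string& encoded, const LineTable& table) {
    std::string text;
    for (char32_t c : encoded) {
        text += table.lines.at(static_cast<std::size_t>(c));  // bounds-checked
    }
    return text;
}
```

Both texts are encoded against the same table, so identical lines get identical characters; the character-level diff of the two encoded strings is then a line-level diff once each piece is passed through chars_to_lines. Whether the library's diff routine accepts such a string type is not confirmed here.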
std::isalnum, std::isspace, and std::isdigit all have assertions, when built with the debug C++ runtime, that fail when a value is outside the valid range (0 to 255). Since the type used in this library's is_alnum, etc., is char, the range where char is signed is actually -128 to 127, meaning that any byte above 127 will cause a debug assertion failure.
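One common way to avoid these assertions is to convert the character to unsigned char before calling the classification function. The wrappers below are a minimal sketch of that idea; safe_is_alnum, safe_is_space and safe_is_digit are hypothetical names, and the library's own is_alnum helpers may be structured differently.

```cpp
#include <cctype>

// Hypothetical wrappers for illustration; not the library's own helpers.
// The <cctype> functions expect a value representable as unsigned char
// (or EOF), so the char is converted before the call.

inline bool safe_is_alnum(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

inline bool safe_is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

inline bool safe_is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```

With the cast in place, a byte such as 0xE9 is passed as 233 rather than as a negative value, so it stays inside the range the debug runtime checks for.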
It would be nice to use C++11 features in diff_match_patch. However, the Travis CI builds use older versions of gcc and clang that don't support C++11. Since C++11 support is generally widely available, could the Travis CI builds be updated to use newer compilers?
This would permit the use of:
>> for nested template arguments
unordered_map
This issue happens when you attempt to do any kind of line matching.
From my debugging efforts I was able to narrow it down to the following function.
static void diff_charsToLines(Diffs &diffs, Lines& lineArray) {
  for (typename Diffs::iterator cur_diff = diffs.begin(); cur_diff != diffs.end(); ++cur_diff) {
    string_t text;
    for (int y = 0; y < (int)(*cur_diff).text.length(); y++) {
      const LinePtr& lp = lineArray[static_cast<size_t>((*cur_diff).text[y])];
      text.append(lp.first, lp.second);
    }
    (*cur_diff).text.swap(text);
  }
}
It looks like lineArray[] is the object throwing the exception.