jbedworth / cld2
Automatically exported from code.google.com/p/cld2
There are many uses of the "new" operator in the CLD source code, such as in
scoreonescriptspan.cc's "new ScoringHitBuffer":
https://code.google.com/p/cld2/source/browse/trunk/internal/scoreonescriptspan.cc#1168
There's no check that the "new" operator successfully allocated memory. In
low-memory conditions this can lead to an access violation and subsequent crash.
The code should fail gracefully under low-memory conditions, though it isn't
immediately obvious how to "gracefully" fail or how helpful it would be to the
caller to have such behavior if they are truly out of memory.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2015 at 12:19
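One possible way to fail gracefully is the non-throwing form of "new", which returns a null pointer instead of throwing. The sketch below is only an illustration: HitBuffer is a hypothetical stand-in for ScoringHitBuffer, and returning false to the caller is an assumed error convention, not what CLD2 does today.

```cpp
#include <cstdio>
#include <new>

namespace sketch {

// Hypothetical stand-in for CLD2's ScoringHitBuffer; the real struct differs.
struct HitBuffer {
  int next_hit;
  int hits[1000];
};

// Returns false instead of dereferencing a null pointer when allocation fails.
bool ScoreWithCheckedAllocation() {
  // new (std::nothrow) yields nullptr on failure rather than throwing.
  HitBuffer* buffer = new (std::nothrow) HitBuffer;
  if (buffer == nullptr) {
    std::fprintf(stderr, "sketch: out of memory, skipping scoring\n");
    return false;
  }
  buffer->next_hit = 0;
  // ... scoring work would go here ...
  delete buffer;
  return true;
}

}  // namespace sketch
```

Whether a false return is actually useful to a caller that is truly out of memory is the open question raised in the report.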
In these files (and any others, obviously):
cld2_generated_distinctoctachrome0122
cld2_generated_deltaoctachrome0122
The Windows compile chain for Chromium is upset because there is an attempt to
declare a zero-length array. Dick has noted this as a concern when we let the
size be zero, and it seems the concern is valid under the Chromium build chain
on Windows.
From Chromium's buildbots, here are the error messages from compilation:
FAILED: ninja -t msvc -e environment.x86 -- E:\b\build\goma\gomacc.exe
"E:\b\depot_tools\win_toolchain\vs2013_files\VC\bin\amd64_x86\cl.exe" /nologo
/showIncludes /FC
@obj\third_party\cld_2\src\internal\cld_2.cld2_generated_distinctoctachrome0122.
obj.rsp /c
..\..\third_party\cld_2\src\internal\cld2_generated_distinctoctachrome0122.cc
/Foobj\third_party\cld_2\src\internal\cld_2.cld2_generated_distinctoctachrome012
2.obj /Fdobj\third_party\cld_2\cld_2.cc.pdb
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_generated_dis
tinctoctachrome0122.cc(2184) : error C2466: cannot allocate an array of
constant size 0
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_generated_dis
tinctoctachrome0122.cc(2186) : error C2466: cannot allocate an array of
constant size 0
FAILED: ninja -t msvc -e environment.x86 -- E:\b\build\goma\gomacc.exe
"E:\b\depot_tools\win_toolchain\vs2013_files\VC\bin\amd64_x86\cl.exe" /nologo
/showIncludes /FC
@obj\third_party\cld_2\src\internal\cld_2.cld2_generated_deltaoctachrome0122.obj
.rsp /c
..\..\third_party\cld_2\src\internal\cld2_generated_deltaoctachrome0122.cc
/Foobj\third_party\cld_2\src\internal\cld_2.cld2_generated_deltaoctachrome0122.o
bj /Fdobj\third_party\cld_2\cld_2.cc.pdb
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_generated_del
taoctachrome0122.cc(4577) : error C2466: cannot allocate an array of constant
size 0
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_generated_del
taoctachrome0122.cc(4579) : error C2466: cannot allocate an array of constant
size 0
ninja: build stopped: subcommand failed.
The workaround we had in place before was to have the constants for size *say*
zero, i.e. the code will never read anything from the array and the dynamic
data tool will just skip it. We'd then actually allocate an array of size one
(however many bytes, usually 4 for our use cases of uint32). This makes the
compiler happy at a cost of a few bytes of overhead in non-dynamic mode. Seems
like we don't really have a choice here, so I'll prepare the patch.
Original issue reported on code.google.com by [email protected]
on 12 Mar 2014 at 11:32
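The workaround described above can be sketched roughly as follows. The names kPlaceholderIndirectSize, kPlaceholderIndirect and SumEntries are made up for illustration and are not the identifiers used in the generated files.

```cpp
#include <stdint.h>

namespace sketch {

// Logical size seen by readers of the table: zero entries.
static const uint32_t kPlaceholderIndirectSize = 0;

// Physical array has one element, so compilers that reject zero-length
// arrays (MSVC error C2466) still accept the declaration.
static const uint32_t kPlaceholderIndirect[1] = {0};

// Reads only `size` entries, so the padding element is never touched.
uint32_t SumEntries(const uint32_t* table, uint32_t size) {
  uint32_t total = 0;
  for (uint32_t i = 0; i < size; ++i) {
    total += table[i];
  }
  return total;
}

}  // namespace sketch
```

The cost is the single padding element (4 bytes for uint32) per empty table.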
On behalf of [email protected] (cc'd):
--- snip ---
I would like to send a patch to CLD_2 for adding ARMv8a to the supporting list
in internal/port.h.
It's my first time to send patches to CLD_2 project, and I have no idea how
to upload it.
Can you take a look at the attached file to check if this modification is
useful? Is there any issue related to this modification? Also, can you tell me
how to upload the patch properly?
Thanks for your kindly help.
--- snip ---
Original issue reported on code.google.com by [email protected]
on 5 May 2015 at 9:46
Attachments:
As discussed offline, this is a patch to enable CLD2 to run in "dynamic" mode.
In dynamic mode the kScoringtables struct is populated from a file at runtime
instead of being compiled into the program as a read-only section in the binary.
This patch adds a new cld2_dynamic_data_tool and accompanying build
instructions, and patches the unit tests to exercise all dynamic functionality.
Data can be loaded, unloaded, and reloaded - theoretically allowing continuous
operations of the program when updated tables are available.
It still has some hardcoding, but we can fix the underlying issues in the
source code easily in another pass as you've suggested.
Original issue reported on code.google.com by [email protected]
on 25 Feb 2014 at 6:30
Attachments:
Apparently, CLD2 has some difficulties(*) with
http://drugoi.livejournal.com/3971967.html
We are seeing UND (undefined) on chrome://translate-internals
*: or maybe we are mis-using it...
Original issue reported on code.google.com by [email protected]
on 5 Mar 2014 at 6:59
The 20141015 tables don't compile with the dynamic data tool because they are
missing the hand-crafted "agnostic" constants that were put in for the old
release. Attached is a patch that appears to make this work for the dynamic
data tool.
Original issue reported on code.google.com by [email protected]
on 31 Oct 2014 at 7:12
Attachments:
Hi, since Google Code is closing, where do you plan to move the packaging?
Thanks!
Original issue reported on code.google.com by [email protected]
on 6 May 2015 at 2:21
Chromium build output from one of the buildbots:
FAILED: ninja -t msvc -e environment.x86 -- E:\b\build\goma/gomacc
"E:\b\depot_tools\win_toolchain\vs2013_files\VC\bin\amd64_x86\cl.exe" /nologo
/showIncludes /FC
@obj\third_party\cld_2\src\internal\cld2_dynamic.cld2_dynamic_data.obj.rsp /c
..\..\third_party\cld_2\src\internal\cld2_dynamic_data.cc
/Foobj\third_party\cld_2\src\internal\cld2_dynamic.cld2_dynamic_data.obj
/Fdobj\third_party\cld_2\cld2_dynamic.cc.pdb
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_dynamic_data.
cc(33) : error C2039: 'max' : is not a member of 'std'
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_dynamic_data.
cc(33) : error C3861: 'max': identifier not found
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_dynamic_data.
cc(85) : warning C4018: '<' : signed/unsigned mismatch
FAILED: ninja -t msvc -e environment.x86 -- E:\b\build\goma/gomacc
"E:\b\depot_tools\win_toolchain\vs2013_files\VC\bin\amd64_x86\cl.exe" /nologo
/showIncludes /FC
@obj\third_party\cld_2\src\internal\cld2_dynamic.cld2_dynamic_data_loader.obj.rs
p /c ..\..\third_party\cld_2\src\internal\cld2_dynamic_data_loader.cc
/Foobj\third_party\cld_2\src\internal\cld2_dynamic.cld2_dynamic_data_loader.obj
/Fdobj\third_party\cld_2\cld2_dynamic.cc.pdb
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_dynamic_data_
loader.cc(99) : error C2220: warning treated as error - no 'object' file
generated
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_dynamic_data_
loader.cc(99) : warning C4018: '<' : signed/unsigned mismatch
e:\b\build\slave\win\build\src\third_party\cld_2\src\internal\cld2_dynamic_data_
loader.cc(235) : warning C4018: '<' : signed/unsigned mismatch
This should be fixed.
Original issue reported on code.google.com by [email protected]
on 1 Oct 2014 at 2:36
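Errors like C2039 for 'max' usually mean that the header declaring std::max was never included directly, and C4018 is raised when a signed value is compared with an unsigned one. A minimal sketch of both fixes, using a hypothetical helper (LargestOfFirst is not a CLD2 function) and assuming the cause here is a missing include:

```cpp
#include <algorithm>  // declares std::max and std::min
#include <cstddef>
#include <vector>

namespace sketch {

// Returns the largest of the first `limit` values, or 0 if there are none.
int LargestOfFirst(const std::vector<int>& values, int limit) {
  // Convert the signed limit once, so the loop compares unsigned with unsigned.
  const std::size_t wanted = static_cast<std::size_t>(std::max(limit, 0));
  const std::size_t count = std::min(values.size(), wanted);
  int largest = 0;
  for (std::size_t i = 0; i < count; ++i) {
    largest = std::max(largest, values[i]);
  }
  return largest;
}

}  // namespace sketch
```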
patch attached.
Description: Adding CFLAGS CXXFLAGS CPPFLAGS and LDFLAGS to the build
Author: Gianfranco Costamagna <[email protected]>
Origin: debian
Last-Update: <2015-01-10>
--- cld2-0.0.0~svn193.orig/internal/compile.sh
+++ cld2-0.0.0~svn193/internal/compile.sh
@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-g++ -O2 -m64 compact_lang_det_test.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS compact_lang_det_test.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -24,10 +24,10 @@ g++ -O2 -m64 compact_lang_det_test.cc \
cld_generated_cjk_delta_bi_4.cc generated_distinct_bi_0.cc \
cld2_generated_quadchrome_2.cc cld2_generated_deltaoctachrome.cc \
cld2_generated_distinctoctachrome.cc cld_generated_score_quad_octa_2.cc \
- -o compact_lang_det_test_chrome_2
+ -o compact_lang_det_test_chrome_2 $LDFLAGS
echo " compact_lang_det_test_chrome_2 compiled"
-g++ -O2 -m64 compact_lang_det_test.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS compact_lang_det_test.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -37,11 +37,11 @@ g++ -O2 -m64 compact_lang_det_test.cc \
cld_generated_cjk_delta_bi_4.cc generated_distinct_bi_0.cc \
cld2_generated_quadchrome_16.cc cld2_generated_deltaoctachrome.cc \
cld2_generated_distinctoctachrome.cc cld_generated_score_quad_octa_2.cc \
- -o compact_lang_det_test_chrome_16
+ -o compact_lang_det_test_chrome_16 $LDFLAGS
echo " compact_lang_det_test_chrome_16 compiled"
-g++ -O2 -m64 cld2_unittest.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS cld2_unittest.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -51,10 +51,10 @@ g++ -O2 -m64 cld2_unittest.cc \
cld_generated_cjk_delta_bi_4.cc generated_distinct_bi_0.cc \
cld2_generated_quadchrome_2.cc cld2_generated_deltaoctachrome.cc \
cld2_generated_distinctoctachrome.cc cld_generated_score_quad_octa_2.cc \
- -o cld2_unittest_chrome_2
+ -o cld2_unittest_chrome_2 $LDFLAGS
echo " cld2_unittest_chrome_2 compiled"
-g++ -O2 -m64 -Davoid_utf8_string_constants cld2_unittest.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS -Davoid_utf8_string_constants cld2_unittest.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -64,7 +64,7 @@ g++ -O2 -m64 -Davoid_utf8_string_consta
cld_generated_cjk_delta_bi_4.cc generated_distinct_bi_0.cc \
cld2_generated_quadchrome_2.cc cld2_generated_deltaoctachrome.cc \
cld2_generated_distinctoctachrome.cc cld_generated_score_quad_octa_2.cc \
- -o cld2_unittest_avoid_chrome_2
+ -o cld2_unittest_avoid_chrome_2 $LDFLAGS
echo " cld2_unittest_avoid_chrome_2 compiled"
--- cld2-0.0.0~svn193.orig/internal/compile_dynamic.sh
+++ cld2-0.0.0~svn193/internal/compile_dynamic.sh
@@ -15,7 +15,7 @@
# limitations under the License.
# The data tool, which can be used to read and write CLD2 dynamic data files
-g++ -O2 -m64 cld2_dynamic_data_tool.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS cld2_dynamic_data_tool.cc \
cld2_dynamic_data.h cld2_dynamic_data.cc \
cld2_dynamic_data_extractor.h cld2_dynamic_data_extractor.cc \
cld2_dynamic_data_loader.h cld2_dynamic_data_loader.cc \
@@ -28,11 +28,11 @@ g++ -O2 -m64 cld2_dynamic_data_tool.cc \
cld_generated_cjk_delta_bi_4.cc generated_distinct_bi_0.cc \
cld2_generated_quadchrome_2.cc cld2_generated_deltaoctachrome.cc \
cld2_generated_distinctoctachrome.cc cld_generated_score_quad_octa_2.cc \
- -o cld2_dynamic_data_tool
+ -o cld2_dynamic_data_tool $LDFLAGS
echo " cld2_dynamic_data_tool compiled"
# Tests for Chromium flavored dynamic CLD2
-g++ -O2 -m64 -D CLD2_DYNAMIC_MODE compact_lang_det_test.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS -D CLD2_DYNAMIC_MODE compact_lang_det_test.cc \
cld2_dynamic_data.h cld2_dynamic_data.cc \
cld2_dynamic_data_extractor.h cld2_dynamic_data_extractor.cc \
cld2_dynamic_data_loader.h cld2_dynamic_data_loader.cc \
@@ -41,12 +41,12 @@ g++ -O2 -m64 -D CLD2_DYNAMIC_MODE compac
generated_entities.cc generated_language.cc generated_ulscript.cc \
getonescriptspan.cc lang_script.cc offsetmap.cc scoreonescriptspan.cc \
tote.cc utf8statetable.cc \
- -o compact_lang_det_dynamic_test_chrome
+ -o compact_lang_det_dynamic_test_chrome $LDFLAGS
echo " compact_lang_det_dynamic_test_chrome compiled"
# Unit tests, in dynamic mode
-g++ -O2 -m64 -g3 -D CLD2_DYNAMIC_MODE cld2_unittest.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS -g3 -D CLD2_DYNAMIC_MODE cld2_unittest.cc \
cld2_dynamic_data.h cld2_dynamic_data.cc \
cld2_dynamic_data_loader.h cld2_dynamic_data_loader.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
@@ -54,11 +54,11 @@ g++ -O2 -m64 -g3 -D CLD2_DYNAMIC_MODE cl
generated_entities.cc generated_language.cc generated_ulscript.cc \
getonescriptspan.cc lang_script.cc offsetmap.cc scoreonescriptspan.cc \
tote.cc utf8statetable.cc \
- -o cld2_dynamic_unittest
+ -o cld2_dynamic_unittest $LDFLAGS
echo " cld2_dynamic_unittest compiled"
# Shared library, in dynamic mode
-g++ -shared -fPIC -O2 -m64 -D CLD2_DYNAMIC_MODE \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS -shared -fPIC -D CLD2_DYNAMIC_MODE \
cld2_dynamic_data.h cld2_dynamic_data.cc \
cld2_dynamic_data_loader.h cld2_dynamic_data_loader.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
@@ -66,6 +66,6 @@ g++ -shared -fPIC -O2 -m64 -D CLD2_DYNAM
generated_entities.cc generated_language.cc generated_ulscript.cc \
getonescriptspan.cc lang_script.cc offsetmap.cc scoreonescriptspan.cc \
tote.cc utf8statetable.cc \
- -o libcld2_dynamic.so
+ -o libcld2_dynamic.so $LDFLAGS
echo " libcld2_dynamic.so compiled"
--- cld2-0.0.0~svn193.orig/internal/compile_full.sh
+++ cld2-0.0.0~svn193/internal/compile_full.sh
@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-g++ -O2 -m64 compact_lang_det_test.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS compact_lang_det_test.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -24,10 +24,10 @@ g++ -O2 -m64 compact_lang_det_test.cc \
cld_generated_cjk_delta_bi_32.cc generated_distinct_bi_0.cc \
cld2_generated_quad0122.cc cld2_generated_deltaocta0122.cc \
cld2_generated_distinctocta0122.cc cld_generated_score_quad_octa_0122.cc \
- -o compact_lang_det_test_full
+ -o compact_lang_det_test_full $LDFLAGS
echo " compact_lang_det_test_full compiled"
-g++ -O2 -m64 cld2_unittest_full.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS cld2_unittest_full.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -37,10 +37,10 @@ g++ -O2 -m64 cld2_unittest_full.cc \
cld_generated_cjk_delta_bi_32.cc generated_distinct_bi_0.cc \
cld2_generated_quad0122.cc cld2_generated_deltaocta0122.cc \
cld2_generated_distinctocta0122.cc cld_generated_score_quad_octa_0122.cc \
- -o cld2_unittest_full
+ -o cld2_unittest_full $LDFLAGS
echo " cld2_unittest_full compiled"
-g++ -O2 -m64 -Davoid_utf8_string_constants cld2_unittest_full.cc \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS -Davoid_utf8_string_constants cld2_unittest_full.cc \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -50,6 +50,6 @@ g++ -O2 -m64 -Davoid_utf8_string_consta
cld_generated_cjk_delta_bi_32.cc generated_distinct_bi_0.cc \
cld2_generated_quad0122.cc cld2_generated_deltaocta0122.cc \
cld2_generated_distinctocta0122.cc cld_generated_score_quad_octa_0122.cc \
- -o cld2_unittest_full_avoid
+ -o cld2_unittest_full_avoid $LDFLAGS
echo " cld2_unittest_full_avoid compiled"
--- cld2-0.0.0~svn193.orig/internal/compile_libs.sh
+++ cld2-0.0.0~svn193/internal/compile_libs.sh
@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-g++ -shared -fPIC -O2 -m64 \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS -shared -fPIC \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -24,9 +24,9 @@ g++ -shared -fPIC -O2 -m64 \
cld_generated_cjk_delta_bi_4.cc generated_distinct_bi_0.cc \
cld2_generated_quadchrome_2.cc cld2_generated_deltaoctachrome.cc \
cld2_generated_distinctoctachrome.cc cld_generated_score_quad_octa_2.cc \
- -o libcld2.so
+ -o libcld2.so $LDFLAGS
-g++ -shared -fPIC -O2 -m64 \
+g++ $CFLAGS $CPPFLAGS $CXXFLAGS -shared -fPIC \
cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
generated_entities.cc generated_language.cc generated_ulscript.cc \
@@ -36,5 +36,5 @@ g++ -shared -fPIC -O2 -m64 \
cld_generated_cjk_delta_bi_32.cc generated_distinct_bi_0.cc \
cld2_generated_quad0122.cc cld2_generated_deltaocta0122.cc \
cld2_generated_distinctocta0122.cc cld_generated_score_quad_octa_0122.cc \
- -o libcld2_full.so
+ -o libcld2_full.so $LDFLAGS
(there is an ongoing debian effort to package it)
Original issue reported on code.google.com by [email protected]
on 10 Feb 2015 at 3:36
I'm using Mike McCandless' Python binding to cld2. I originally reported this
issue to him, and he suggested I report it here (see
https://code.google.com/p/chromium-compact-language-detector/issues/detail?id=15
).
The issue is that for a particular input string, cld2 reports that the
prediction is reliable, but the set of languages detected is empty.
What steps will reproduce the problem?
1. import cld2
2. cld2.detect('interaktive infografik \xc3\xbcber videospielkonsolen')
What is the expected output? What do you see instead?
The output is
(True, 49, ())
What version of the product are you using? On what operating system?
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
cld2 was built using SVN rev 63,
cld python module was built using hg changeset b1cad3f04ef4
Original issue reported on code.google.com by [email protected]
on 6 Aug 2013 at 3:08
The current ISO 639 code for Hebrew is "he" and no longer "iw". However the
code still returns "iw".
Original issue reported on code.google.com by [email protected]
on 12 Jun 2014 at 9:44
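Until the library itself changes, a caller could remap the legacy code on its side. This is only a sketch of such a workaround; ModernizeIso639Code is a hypothetical helper, not part of CLD2. The other two entries are the remaining ISO 639-1 codes that were replaced at the same time as "iw" and are included for completeness.

```cpp
#include <string>

namespace sketch {

// Maps withdrawn ISO 639-1 codes to their current equivalents.
// Any other code is returned unchanged.
std::string ModernizeIso639Code(const std::string& code) {
  if (code == "iw") return "he";  // Hebrew
  if (code == "in") return "id";  // Indonesian
  if (code == "ji") return "yi";  // Yiddish
  return code;
}

}  // namespace sketch
```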
There are a lot of data files that get generated, such as
cld_generated_cjk_uni_prop_80.cc and its ilk. There have been several problems
in the past with the generated files that have necessitated post-generation
fixes, e.g.:
https://code.google.com/p/cld2/source/detail?r=155
https://code.google.com/p/cld2/source/detail?r=156
https://code.google.com/p/cld2/source/detail?r=189
https://code.google.com/p/cld2/source/detail?r=192
https://code.google.com/p/cld2/source/detail?r=193
...
And now we have issue 32, which is more of the same. We don't have the
templates, or whatever is used to generate these source files, checked in; we
should. I get that the actual data is huge and isn't something we'd store in
Git, but I'd really like to see us put the templates/generators into the code
base so that we can maintain them alongside the code that they produce.
High priority because I feel that at this point there is likely drift between
the templates and the code they produce; we should probably get the templates
checked in and iterate on them until they produce exactly the same files that
we have today, then proceed forward with maintenance.
WDYT?
Original issue reported on code.google.com by [email protected]
on 1 May 2015 at 8:34
Dynamic data loading currently uses iostream for logging.
That would be fine, except that nowhere else in the library is iostream used,
meaning this is bringing in many classes for little gain, and only when dynamic
data loading is turned on.
Original issue reported on code.google.com by [email protected]
on 15 Jul 2014 at 9:39
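One low-cost alternative would be to log through the C stdio functions instead. The sketch below assumes a hypothetical helper, LogLoadFailure, and an invented message format; neither is taken from the CLD2 sources.

```cpp
#include <cstdio>

namespace sketch {

// Writes a diagnostic to stderr without pulling in <iostream>.
// Returns the number of characters written, or a negative value on error.
int LogLoadFailure(const char* file_name) {
  return std::fprintf(stderr, "sketch: could not load data from %s\n",
                      file_name);
}

}  // namespace sketch
```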
Can you please provide a SONAME for the library?
Installing something in /usr/lib without a SONAME is so painful.
Original issue reported on code.google.com by [email protected]
on 10 Feb 2015 at 3:37
What steps will reproduce the problem?
1. checkout revision 194
2. use the cmake file (probably doesn't change anything)
3. use ubuntu 14.10 x64
build it and run tests
make[1]: Entering directory '/tmp/buildd/cld2-0.0.0~svn194'
cd obj-* && echo "this is some english text" | ./compact_lang_det_test_chrome_2
ExtLanguage ENGLISH(96% 1851p), 27/26 bytes of non-tag letters, Summary: ENGLISH
SummaryLanguage ENGLISH at 0 of 26 81us (0 MB/sec), (null)
cd obj-* && echo "this is some english text" | ./compact_lang_det_test_chrome_16
ExtLanguage ENGLISH(96% 1851p), 27/26 bytes of non-tag letters, Summary: ENGLISH
SummaryLanguage ENGLISH at 0 of 26 79us (0 MB/sec), (null)
cd obj-* && ./cld2_unittest_chrome_2 > /dev/null
*** Bad UTF-8 after 40 bytes<br>
Checking that non-dynamic implementations of dynamic data methods are no-ops
(ignore the warnings).
WARNING: Dynamic mode not active, loadDataFromFile has no effect!
WARNING: Dynamic mode not active, loadDataFromRawAddress has no effect!
WARNING: Dynamic mode not active, unloadData has no effect!
Done checking non-dynamic implementations of dynamic data methods, care about
warnings again.
PASS
cd obj-* && ./cld2_unittest_avoid_chrome_2 > /dev/null
*** Bad UTF-8 after 40 bytes<br>
Checking that non-dynamic implementations of dynamic data methods are no-ops
(ignore the warnings).
WARNING: Dynamic mode not active, loadDataFromFile has no effect!
WARNING: Dynamic mode not active, loadDataFromRawAddress has no effect!
WARNING: Dynamic mode not active, unloadData has no effect!
Done checking non-dynamic implementations of dynamic data methods, care about
warnings again.
PASS
cd obj-* && echo "this is some english text" | ./compact_lang_det_test_full
ExtLanguage ENGLISH(96% 1772p), 27/26 bytes of non-tag letters, Summary: ENGLISH
SummaryLanguage ENGLISH at 0 of 26 153us (0 MB/sec), (null)
cd obj-* && ./cld2_unittest_full > /dev/null
PASS
cd obj-* && ./cld2_unittest_full_avoid > /dev/null
PASS
cd obj-* && ./cld2_dynamic_data_tool --dump cld2_data.bin
cd obj-* && ./cld2_dynamic_data_tool --verify cld2_data.bin
cd obj-* && echo "this is some english text" |
./compact_lang_det_dynamic_test_chrome --data-file cld2_data.bin
Loading data from: cld2_data.bin
Data loaded, test commencing
ExtLanguage ENGLISH(96% 1851p), 27/26 bytes of non-tag letters, Summary: ENGLISH
SummaryLanguage ENGLISH at 0 of 26 69us (0 MB/sec), --data-file
cd obj-* && ./cld2_dynamic_unittest --data-file cld2_data.bin > /dev/null
*** Bad UTF-8 after 40 bytes<br>
*** Bad UTF-8 after 40 bytes<br>
PASS
make[1]: Leaving directory '/tmp/buildd/cld2-0.0.0~svn194'
I don't know whether this output is expected. Is everything OK?
Original issue reported on code.google.com by [email protected]
on 12 Feb 2015 at 5:24
The existing signature of loadDataFromRawAddress:
void loadDataFromRawAddress(const void* rawAddress, const int length);
The use of "int" here is dangerous because we don't know what the length will
be on any platform. This is my fault, since I'm the one who introduced this
API. Before much more time elapses, we should use a type from stdint.h instead.
In this case I think uint32_t would make the most sense, as we need more than
16 bits for sure but more than 32 would be truly insane.
It's a simple patch; any objections?
Original issue reported on code.google.com by [email protected]
on 26 Mar 2014 at 11:56
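A rough sketch of what the fixed-width variant might look like. LoadFromRawAddressSketch is a hypothetical name, and the bool return value and the checks inside are assumptions added for illustration; the real function returns void.

```cpp
#include <stdint.h>

namespace sketch {

// Existing: void loadDataFromRawAddress(const void* rawAddress, const int length);
// Proposed shape: the length has an explicit, platform-independent width.
bool LoadFromRawAddressSketch(const void* raw_address, uint32_t length) {
  if (raw_address == nullptr || length == 0) {
    return false;  // nothing to load
  }
  // A real loader would parse the table header from raw_address here.
  return true;
}

}  // namespace sketch
```

An unsigned type also removes the need to handle negative lengths.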
I am trying to compile the chromium in Visual Studio 2013. I am actually trying
to create a .NET Wrapper for the library so I have added all the source files
inside my CLR project.
Now whenever I compile I get these linking errors.
error LNK2005: "struct CLD2::CLD2TableSummary const CLD2::kCjkDeltaBi_obj" (?kCjkDeltaBi_obj@CLD2@@3UCLD2TableSummary@1@B) already defined in cld_generated_cjk_delta_bi_32.obj
These all seem to be related, as I can see a connection between the 'generated'
files.
The problem is that I have a lot of these, and I am not sure which ones I should
exclude and which I should keep and use in my code.
Here is a list of all the generated files that came with the CLD2 code.
cld_generated_cjk_uni_prop_80.cc
cld_generated_score_quad_octa_2.cc
cld_generated_score_quad_octa_0122.cc
cld_generated_score_quad_octa_0122_2.cc
cld_generated_score_quad_octa_1024_256.cc
cld_generated_cjk_delta_bi_4.cc
cld_generated_cjk_delta_bi_32.cc
cld2_generated_octa2_dummy.cc
cld2_generated_quad0122.cc
cld2_generated_quad0720.cc
cld2_generated_quadchrome_2.cc
cld2_generated_quadchrome_16.cc
cld2_generated_cjk_compatible.cc
cld2_generated_deltaocta0122.cc
cld2_generated_deltaocta0527.cc
cld2_generated_deltaoctachrome.cc
cld2_generated_distinctocta0122.cc
cld2_generated_distinctocta0527.cc
cld2_generated_distinctoctachrome.cc
The naming convention of these suggests that I should only be using one of each
group. At least that is how I think I should use it, as I am not really an expert
in encoding nor in how CLD2 works. And I could not find any references online
explaining how to configure it.
I tried eliminating the linking errors by keeping only one of each generated
group:
for example: from `cld_generated_cjk_delta_bi_4` and
`cld_generated_cjk_delta_bi_32` I kept the 32 version. And so on for the rest
of the files.
Now this made CLD compile, yet when I tried testing it with languages I noticed
that the scores were way off and it was behaving inexplicably badly.
I am not trying to support all languages; I only need to support Latin languages
along with Hebrew, Arabic, Japanese and Chinese.
Can someone please explain how to configure CLD2 to compile and work correctly?
Original issue reported on code.google.com by [email protected]
on 30 Mar 2015 at 5:57
internal/unittest_data.h seems to use a mixture of escape sequences and raw
non-ASCII text. For maximum portability and safety, it would be best for the
source code to use all ASCII characters and escape the non-ASCII characters.
This should help compiler compatibility, though there are no reports of
breakage since this Chromium.org issue back in 2009:
https://code.google.com/p/chromium/issues/detail?id=20033
The change should be simple enough, and a script can be written to perform the
transformation.
Original issue reported on code.google.com by [email protected]
on 26 Aug 2014 at 3:27
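The transformation could be sketched as below. EscapeNonAscii is a hypothetical helper, not an existing script. It emits three-digit octal escapes rather than \x escapes, because a hex escape in a C++ string literal keeps consuming any hex digits that follow it (for example, "\xbc" followed by "be" would be read as one oversized escape).

```cpp
#include <string>

namespace sketch {

// Rewrites every byte >= 0x80 as a three-digit octal escape (e.g. \303).
// ASCII bytes are copied through unchanged.
std::string EscapeNonAscii(const std::string& text) {
  std::string out;
  for (std::string::size_type i = 0; i < text.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(text[i]);
    if (c < 0x80) {
      out += static_cast<char>(c);
    } else {
      out += '\\';
      out += static_cast<char>('0' + ((c >> 6) & 7));
      out += static_cast<char>('0' + ((c >> 3) & 7));
      out += static_cast<char>('0' + (c & 7));
    }
  }
  return out;
}

}  // namespace sketch
```

This sketch does not handle backslashes or quotes already present in the input; a real script would need to work on the source text of the literals rather than on their decoded contents.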
What steps will reproduce the problem?
This "g++" command-line is a mix between "full" and "dynamic":
g++ -O2 -m64 cld2_dynamic_data_tool.cc cld2_dynamic_data.cc
cld2_dynamic_data_extractor.cc cld2_dynamic_data_loader.cc cldutil.cc
cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc
compact_lang_det_impl.cc debug.cc fixunicodevalue.cc generated_entities.cc
generated_language.cc generated_ulscript.cc getonescriptspan.cc lang_script.cc
offsetmap.cc scoreonescriptspan.cc tote.cc utf8statetable.cc
cld_generated_cjk_uni_prop_80.cc cld2_generated_cjk_compatible.cc
cld_generated_cjk_delta_bi_32.cc generated_distinct_bi_0.cc
cld2_generated_quad0122.cc cld2_generated_deltaocta0122.cc
cld2_generated_distinctocta0122.cc cld_generated_score_quad_octa_0122.cc -o
cld2_dynamic_data_tool
What is the expected output? What do you see instead?
cld2_dynamic_data_tool.cc:(.text.startup+0x293): Undefined
`CLD2::kQuadChromeIndSize'
cld2_dynamic_data_tool.cc:(.text.startup+0x29d): Undefined
`CLD2::kQuadChrome2IndSize'
Original issue reported on code.google.com by [email protected]
on 9 Apr 2014 at 9:59
Today, we guard the declaration of the dynamic-data-related functions in
compact_lang_det.h with "#ifdef CLD2_DYNAMIC_MODE":
https://code.google.com/p/cld2/source/browse/trunk/public/compact_lang_det.h
This causes some unfortunate side effects when including CLD2 in another
project: unless building with a single compile pass including all sources, any
separate compilation unit that requires dynamic functionality has to have the
same define when it #includes compact_lang_det.h in order to keep the compiler
happy.
For example, Chromium builds CLD2 separately, then links it into the Chromium
binary; but if CLD2_DYNAMIC_MODE isn't defined in Chromium code that includes
compact_lang_det.h, you get compiler errors like the ones below even if CLD2
itself has been built with the define:
error: 'isDataLoaded' is not a member of 'CLD2'
error: 'loadDataFromRawAddress' is not a member of 'CLD2'
Ideally, the #define guard can be encapsulated entirely within CLD2 so that the
dependent library doesn't need to know about this at all.
The downside is that dependent code might accidentally try to use dynamic mode
even if it isn't available. Throwing exceptions isn't a viable solution, since
some projects disable exceptions when compiling. We'd presumably just have to
define the following behavior if CLD2_DYNAMIC_MODE is not defined:
isDataLoaded: return true
loadDataFromRawAddress: no-op and output a warning to stderr
loadDataFromFile: no-op and output a warning to stderr
This change should be fully backwards compatible, since it doesn't change or
remove any existing function declarations under any circumstances.
Original issue reported on code.google.com by [email protected]
on 23 Jun 2014 at 9:57
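A minimal sketch of the proposal: the declarations are always visible, and only the definitions depend on the define. Everything lives in a "sketch" namespace here to make clear this is an illustration rather than the actual CLD2 code, and the dynamic-mode branch is a placeholder.

```cpp
#include <cstdio>

namespace sketch {

// Header part: declared unconditionally, so dependent code never needs
// to define CLD2_DYNAMIC_MODE itself.
bool isDataLoaded();
void loadDataFromRawAddress(const void* rawAddress, const int length);

// Implementation part: the guard is confined to the library's own sources.
#ifdef CLD2_DYNAMIC_MODE
static bool g_data_loaded = false;  // placeholder for the real table state
bool isDataLoaded() { return g_data_loaded; }
void loadDataFromRawAddress(const void* rawAddress, const int length) {
  g_data_loaded = (rawAddress != nullptr && length > 0);
}
#else
// Static tables are compiled in, so data always counts as loaded.
bool isDataLoaded() { return true; }
void loadDataFromRawAddress(const void*, const int) {
  std::fprintf(stderr,
               "WARNING: Dynamic mode not active, "
               "loadDataFromRawAddress has no effect!\n");
}
#endif

}  // namespace sketch
```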
What steps will reproduce the problem?
1. Launch chrome with the flag --force-fieldtrials=CLD1VsCLD2/CLD2/
2. Open website <https://play.google.com/store>
3. Go to Movies, then "Romancing Bollywood", then click on "See more" movies.
4. For the India location it detects "Malay" as the language of the page, although
this page is in English (refer to the attached screenshot).
What is the expected output?
No translation bar as the language of website is English.
What do you see instead?
translation bar asking for translation from Malay to English.
What version of the product are you using? On what operating system?
Version: 32.0.1657.2 (Official Build 226144)
OS: Linux Ubuntu 12.04
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 7 Nov 2013 at 9:39
Attachments:
The following errors are produced by the GCC compiler:
c++ -MMD -MF
obj/third_party/cld_2/src/internal/cld2_static.cld_generated_cjk_uni_prop_80.o.d
-DV8_DEPRECATION_WARNINGS -D_FILE_OFFSET_BITS=64 -DCHROMIUM_BUILD
-DTOOLKIT_VIEWS=1 -DUI_COMPOSITOR_IMAGE_TRANSPORT -DUSE_AURA=1 -DUSE_ASH=1
-DUSE_PANGO=1 -DUSE_CAIRO=1 -DUSE_DEFAULT_RENDER_THEME=1 -DUSE_LIBJPEG_TURBO=1
-DUSE_X11=1 -DUSE_CLIPBOARD_AURAX11=1 -DENABLE_ONE_CLICK_SIGNIN
-DENABLE_PRE_SYNC_BACKUP -DENABLE_REMOTING=1 -DENABLE_WEBRTC=1
-DENABLE_PEPPER_CDMS -DENABLE_CONFIGURATION_POLICY -DENABLE_NOTIFICATIONS
-DUSE_UDEV -DDONT_EMBED_BUILD_METADATA -DENABLE_TASK_MANAGER=1
-DENABLE_EXTENSIONS=1 -DENABLE_PLUGINS=1 -DENABLE_SESSION_SERVICE=1
-DENABLE_THEMES=1 -DENABLE_AUTOFILL_DIALOG=1 -DENABLE_BACKGROUND=1
-DENABLE_GOOGLE_NOW=1 -DCLD_VERSION=2 -DENABLE_PRINTING=1
-DENABLE_BASIC_PRINTING=1 -DENABLE_PRINT_PREVIEW=1 -DENABLE_SPELLCHECK=1
-DENABLE_CAPTIVE_PORTAL_DETECTION=1 -DENABLE_APP_LIST=1 -DENABLE_SETTINGS_APP=1
-DENABLE_SUPERVISED_USERS=1 -DENABLE_MDNS=1 -DENABLE_SERVICE_DISCOVERY=1
-DV8_USE_EXTERNAL_STARTUP_DATA -DUSE_LIBPCI=1 -DUSE_GLIB=1 -DUSE_NSS=1 -DNDEBUG
-DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -Igen
-I../../third_party/cld_2/src/internal -I../../third_party/cld_2/src/public
-fstack-protector --param=ssp-buffer-size=4 -pthread -fno-strict-aliasing
-Wno-unused-parameter -Wno-missing-field-initializers -fvisibility=hidden -pipe
-fPIC
-B/home/marxin/Programming/chromium/src/third_party/binutils/Linux_x64/Release/b
in -Wno-unused-local-typedefs -Wno-format -Wno-unused-result -m64 -march=x86-64
-O2 -fno-ident -fdata-sections -ffunction-sections -funwind-tables
-fno-exceptions -fno-rtti -fno-threadsafe-statics -fvisibility-inlines-hidden
-Wno-deprecated -std=gnu++11 -Wno-narrowing -Wno-literal-suffix -c
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc -o
obj/third_party/cld_2/src/internal/cld2_static.cld_generated_cjk_uni_prop_80.o
-Wno-narrowing
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
};
^
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
../../third_party/cld_2/src/internal/cld_generated_cjk_uni_prop_80.cc:7089:1:
error: narrowing conversion of ‘-14’ from ‘int’ to ‘CLD2::uint8 {aka
unsigned char}’ inside { }
... (and many more)
The problem is discussed in more detail in the following thread:
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/D5YxoMmtEmE
I think the fix is quite obvious: the generator should emit only values that
fit in uint8.
Thanks,
Martin
Original issue reported on code.google.com by [email protected]
on 5 Jan 2015 at 10:34
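For readers unfamiliar with the error in this report, the following is a minimal sketch of the C++11 narrowing rule and one way a generator could avoid it. The `uint8` typedef stands in for `CLD2::uint8`, and `kTable` and `WrapToUint8` are hypothetical names with invented contents; the real generated table is far larger and may need a different fix.

```cpp
typedef unsigned char uint8;  // stand-in for CLD2::uint8

// In C++11 a braced initializer rejects a negative constant for an unsigned
// element type, which is the error GCC reports above:
//   static const uint8 kBad[] = {1, -14, 3};  // error: narrowing conversion
//
// Emitting the value already reduced modulo 256 keeps the same byte
// (-14 becomes 242) and needs no -Wno-narrowing.
static const uint8 kTable[] = {1, 242, 3};  // hypothetical table contents

// Hypothetical helper a generator could apply before printing each entry.
// Conversion of a signed value to an unsigned type is defined as modulo 2^N.
inline uint8 WrapToUint8(int v) { return static_cast<uint8>(v); }
```

Whether the table should keep the wrapped byte or be retyped as a signed array depends on how the consuming code interprets the entries, which this report does not say.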
What steps will reproduce the problem?
1. sh compile_libs.sh
2. Observe errors
What is the expected output? What do you see instead?
Expected: None (success)
Actual output:
compact_lang_det_test.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
compact_lang_det_test.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
cldutil.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
cldutil.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
cldutil_shared.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
cldutil_shared.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
compact_lang_det.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
compact_lang_det.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
compact_lang_det_hint_code.cc:1:0: warning: -fPIC ignored for target (all code
is position independent) [enabled by default]
compact_lang_det_hint_code.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
compact_lang_det_impl.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
compact_lang_det_impl.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
debug.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
debug.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
fixunicodevalue.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
fixunicodevalue.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
generated_entities.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
generated_entities.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
generated_language.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
generated_language.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
generated_ulscript.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
generated_ulscript.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
getonescriptspan.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
getonescriptspan.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
lang_script.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
lang_script.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
offsetmap.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
offsetmap.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
scoreonescriptspan.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
scoreonescriptspan.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
tote.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
tote.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
utf8statetable.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
utf8statetable.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
cld_generated_cjk_uni_prop_80.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld_generated_cjk_uni_prop_80.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
cld2_generated_cjk_compatible.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld2_generated_cjk_compatible.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
cld_generated_cjk_delta_bi_4.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld_generated_cjk_delta_bi_4.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
generated_distinct_bi_0.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
generated_distinct_bi_0.cc:1:0: sorry, unimplemented: 64-bit mode not compiled
in
cld2_generated_quadchrome0715.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld2_generated_quadchrome0715.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
cld2_generated_deltaoctachrome0614.cc:1:0: warning: -fPIC ignored for target
(all code is position independent) [enabled by default]
cld2_generated_deltaoctachrome0614.cc:1:0: sorry, unimplemented: 64-bit mode
not compiled in
cld2_generated_distinctoctachrome0604.cc:1:0: warning: -fPIC ignored for target
(all code is position independent) [enabled by default]
cld2_generated_distinctoctachrome0604.cc:1:0: sorry, unimplemented: 64-bit mode
not compiled in
cld_generated_score_quad_octa_1024_256.cc:1:0: warning: -fPIC ignored for
target (all code is position independent) [enabled by default]
cld_generated_score_quad_octa_1024_256.cc:1:0: sorry, unimplemented: 64-bit
mode not compiled in
compact_lang_det_test.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
compact_lang_det_test.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
cldutil.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
cldutil.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
cldutil_shared.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
cldutil_shared.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
compact_lang_det.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
compact_lang_det.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
compact_lang_det_hint_code.cc:1:0: warning: -fPIC ignored for target (all code
is position independent) [enabled by default]
compact_lang_det_hint_code.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
compact_lang_det_impl.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
compact_lang_det_impl.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
debug.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
debug.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
fixunicodevalue.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
fixunicodevalue.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
generated_entities.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
generated_entities.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
generated_language.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
generated_language.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
generated_ulscript.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
generated_ulscript.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
getonescriptspan.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
getonescriptspan.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
lang_script.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
lang_script.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
offsetmap.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
offsetmap.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
scoreonescriptspan.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
scoreonescriptspan.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
tote.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
tote.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
utf8statetable.cc:1:0: warning: -fPIC ignored for target (all code is position
independent) [enabled by default]
utf8statetable.cc:1:0: sorry, unimplemented: 64-bit mode not compiled in
cld_generated_cjk_uni_prop_80.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld_generated_cjk_uni_prop_80.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
cld2_generated_cjk_compatible.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld2_generated_cjk_compatible.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
cld_generated_cjk_delta_bi_32.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld_generated_cjk_delta_bi_32.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
generated_distinct_bi_0.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
generated_distinct_bi_0.cc:1:0: sorry, unimplemented: 64-bit mode not compiled
in
cld2_generated_quad0720.cc:1:0: warning: -fPIC ignored for target (all code is
position independent) [enabled by default]
cld2_generated_quad0720.cc:1:0: sorry, unimplemented: 64-bit mode not compiled
in
cld2_generated_deltaocta0527.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld2_generated_deltaocta0527.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
cld2_generated_distinctocta0527.cc:1:0: warning: -fPIC ignored for target (all
code is position independent) [enabled by default]
cld2_generated_distinctocta0527.cc:1:0: sorry, unimplemented: 64-bit mode not
compiled in
cld_generated_score_quad_octa_1024_256.cc:1:0: warning: -fPIC ignored for
target (all code is position independent) [enabled by default]
cld_generated_score_quad_octa_1024_256.cc:1:0: sorry, unimplemented: 64-bit
mode not compiled in
What version of the product are you using? On what operating system?
Windows 7 x64 SP1
gcc version 4.7.3 (GCC) (i686-pc-cygwin)
GNU bash, version 4.1.10(4)-release (i686-pc-cygwin)
Please provide any additional information below.
I'm trying to build this library on a Windows host for use in the
chromium-compact-language-detector Python extension.
When I remove the -fPIC and -m64 flags, compilation succeeds (though that is
probably not the right fix). I can't test the result, because the Python
extension requires *.lib files but *.so files are produced.
Original issue reported on code.google.com by [email protected]
on 9 Sep 2013 at 2:22
The problem appears in revision 215539. I have attached a simple one-line patch
that fixes it.
In method cld::GetNormalizedScore() in cld/compact_lang_det/cldutil.cc, the
loop at line 818 keeps overwriting "expected_score" with "kMeanScore[cur_lang *
4 + i]" whenever that value is larger than zero. Therefore, only the last
written value is visible outside the loop, and all the other writes and
iterations are unnecessary. The patch iterates "i" from the end and breaks the
first time "expected_score" is set.
A similar problem also appears in cld::GetReliability(), at line 846.
Original issue reported on code.google.com by [email protected]
on 6 Aug 2013 at 8:45
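The change described in this report can be sketched as follows. This is not the attached patch: the table contents are invented, the loop bound of 4 is inferred from the `cur_lang * 4 + i` indexing, and both function names are hypothetical.

```cpp
// Hypothetical stand-in for the real table: four entries per language.
static const int kMeanScore[] = {0, 120, 0, 95,   // language 0
                                 80, 0, 0, 0};    // language 1

// Shape of the loop as reported: every positive entry overwrites the
// previous one, so only the last positive entry is ever used.
int ExpectedScoreForward(int cur_lang) {
  int expected_score = 0;
  for (int i = 0; i < 4; ++i) {
    if (kMeanScore[cur_lang * 4 + i] > 0) {
      expected_score = kMeanScore[cur_lang * 4 + i];
    }
  }
  return expected_score;
}

// Shape of the proposed fix: scan from the end and stop at the first
// positive entry. The result is the same with fewer iterations.
int ExpectedScoreBackward(int cur_lang) {
  int expected_score = 0;
  for (int i = 3; i >= 0; --i) {
    if (kMeanScore[cur_lang * 4 + i] > 0) {
      expected_score = kMeanScore[cur_lang * 4 + i];
      break;
    }
  }
  return expected_score;
}
```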
Hi, thanks for the awesome library. I'm seeing a couple memory errors in
valgrind when I use it.
The first:
==7805== Conditional jump or move depends on uninitialised value(s)
==7805== at 0x4C2CB94: strcmp (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7805== by 0x43C412: CLD2::DoTLDLookup(char const*, CLD2::TLDLookup const*,
int) (compact_lang_det_hint_code.cc:1034)
==7805== by 0x43D705: CLD2::SetCLDTLDHint(char const*, CLD2::CLDLangPriors*)
(compact_lang_det_hint_code.cc:1452)
==7805== by 0x40CEB0: CLD2::ApplyHints(char const*, int, bool,
CLD2::CLDHints const*, CLD2::ScoringContext*) (compact_lang_det_impl.cc:1504)
==7805== by 0x40DC4F: CLD2::DetectLanguageSummaryV2(char const*, int, bool,
CLD2::CLDHints const*, bool, int, CLD2::Language, CLD2::Language*, int*,
double*, std::__1::vector<CLD2::ResultChunk,
std::__1::allocator<CLD2::ResultChunk> >*, int*, bool*)
(compact_lang_det_impl.cc:1644)
==7805== by 0x409BAE: CLD2::DetectLanguageSummary(char const*, int, bool,
char const*, int, CLD2::Language, CLD2::Language*, int*, int*, bool*)
(compact_lang_det.cc:133)
==7805== by 0x405932: codulus::main(int, char**)
(test_language_detection.cc:43)
==7805== by 0x406341: main (test_language_detection.cc:64)
==7805==
This one seems reasonable to me: DoTLDLookup uses strcmp, but the 'key' passed
to it is not null-terminated.
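One way to compare a key that carries an explicit length instead of a terminator is sketched below. `KeyEquals` is a hypothetical helper, not a CLD2 function, and it assumes the table entry itself is a normal null-terminated string.

```cpp
#include <cstddef>
#include <cstring>

// Compare a length-delimited key against a null-terminated table entry
// without reading past key[key_len - 1].
inline bool KeyEquals(const char* key, std::size_t key_len,
                      const char* entry) {
  // Check the lengths first so memcmp never reads beyond either string.
  return std::strlen(entry) == key_len &&
         std::memcmp(key, entry, key_len) == 0;
}
```

For example, `KeyEquals(buffer, 3, "com")` inspects only the first three bytes of `buffer`, whatever follows them.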
The other issue I see is an invalid read of one character past the end of my
input in a couple places in the code:
==8337== Invalid read of size 1
==8337== at 0x415932: CLD2::ScriptScanner::GetOneScriptSpan(CLD2::LangSpan*)
(getonescriptspan.cc:973)
==8337== by 0x415DAE:
CLD2::ScriptScanner::GetOneScriptSpanLower(CLD2::LangSpan*)
(getonescriptspan.cc:1074)
==8337== by 0x40DCE9: CLD2::DetectLanguageSummaryV2(char const*, int, bool,
CLD2::CLDHints const*, bool, int, CLD2::Language, CLD2::Language*, int*,
double*, std::__1::vector<CLD2::ResultChunk,
std::__1::allocator<CLD2::ResultChunk> >*, int*, bool*)
(compact_lang_det_impl.cc:1707)
==8337== by 0x40991E: CLD2::DetectLanguageSummary(char const*, int, bool,
char const*, int, CLD2::Language, CLD2::Language*, int*, int*, bool*)
(compact_lang_det.cc:133)
==8337== by 0x405869: codulus::main(int, char**)
(test_language_detection.cc:42)
==8337== by 0x4060B1: main (test_language_detection.cc:63)
==8337== Invalid read of size 1
==8337== at 0x414D3C: CLD2::UTF8OneCharLen(char const*)
(utf8statetable.h:270)
==8337== by 0x415A6D: CLD2::ScriptScanner::GetOneScriptSpan(CLD2::LangSpan*)
(getonescriptspan.cc:991)
==8337== by 0x415DAE:
CLD2::ScriptScanner::GetOneScriptSpanLower(CLD2::LangSpan*)
(getonescriptspan.cc:1074)
==8337== by 0x40DCE9: CLD2::DetectLanguageSummaryV2(char const*, int, bool,
CLD2::CLDHints const*, bool, int, CLD2::Language, CLD2::Language*, int*,
double*, std::__1::vector<CLD2::ResultChunk,
std::__1::allocator<CLD2::ResultChunk> >*, int*, bool*)
(compact_lang_det_impl.cc:1707)
==8337== by 0x40991E: CLD2::DetectLanguageSummary(char const*, int, bool,
char const*, int, CLD2::Language, CLD2::Language*, int*, int*, bool*)
(compact_lang_det.cc:133)
==8337== by 0x405869: codulus::main(int, char**)
(test_language_detection.cc:42)
==8337== by 0x4060B1: main (test_language_detection.cc:63)
==8337== Invalid read of size 1
==8337== at 0x41D1A3:
CLD2::UTF8GenericPropertyTwoByte(CLD2::UTF8StateMachineObj_2 const*, unsigned
char const**, int*) (utf8statetable.cc:403)
==8337== by 0x414D24: CLD2::GetUTF8LetterScriptNum(char const*)
(getonescriptspan.cc:1098)
==8337== by 0x415A87: CLD2::ScriptScanner::GetOneScriptSpan(CLD2::LangSpan*)
(getonescriptspan.cc:992)
==8337== by 0x415DAE:
CLD2::ScriptScanner::GetOneScriptSpanLower(CLD2::LangSpan*)
(getonescriptspan.cc:1074)
==8337== by 0x40DCE9: CLD2::DetectLanguageSummaryV2(char const*, int, bool,
CLD2::CLDHints const*, bool, int, CLD2::Language, CLD2::Language*, int*,
double*, std::__1::vector<CLD2::ResultChunk,
std::__1::allocator<CLD2::ResultChunk> >*, int*, bool*)
(compact_lang_det_impl.cc:1707)
==8337== by 0x40991E: CLD2::DetectLanguageSummary(char const*, int, bool,
char const*, int, CLD2::Language, CLD2::Language*, int*, int*, bool*)
(compact_lang_det.cc:133)
==8337== by 0x405869: codulus::main(int, char**)
(test_language_detection.cc:42)
==8337== by 0x4060B1: main (test_language_detection.cc:63)
For now, I'm working around this by passing (input, size - 1) instead of
(input, size) to cld2. My input is not null terminated, if that makes a
difference. It seems to happen with every input I try (they are all web pages,
by the way). Also, I am running this on x64 linux.
Any ideas?
Original issue reported on code.google.com by [email protected]
on 21 Aug 2013 at 6:04
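An alternative to passing `size - 1` is to hand the detector a copy that has a valid byte after the last input byte. The sketch below uses `std::string`, whose `c_str()` is guaranteed to be null-terminated; `MakeTerminatedCopy` is a hypothetical helper. It assumes the over-read is a single byte, which the traces above suggest but do not prove.

```cpp
#include <cstddef>
#include <string>

// Copy a length-delimited buffer so that reading copy.c_str()[size]
// is a valid read of '\0' rather than a read past the allocation.
inline std::string MakeTerminatedCopy(const char* input, int size) {
  return std::string(input, static_cast<std::size_t>(size));
}
```

The caller would then pass `copy.c_str()` and `copy.size()` to the detection call, at the cost of one extra copy per document.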
The following files are affected:
cld2_generated_quadchrome0122_16.cc
cld2_generated_deltaoctachrome0122.cc
cld2_generated_deltaocta0122.cc
cld2_generated_quadchrome0122_19.cc
cld2_generated_distinctoctachrome0122.cc
cld2_generated_distinctocta0122.cc
cld2_generated_quadchrome0122_2.cc
Unfortunately this prevents Chrome from rolling to the latest CLD2. Patch
attached.
Original issue reported on code.google.com by [email protected]
on 14 Mar 2014 at 5:20
There are a few things to clean up in r151:
* Use the newly-added constants in the table classes to avoid hardcoding sizes
* Ensure cld2_generated_quadchrome0122_16.cc works with both active tables in
dynamic mode
* Add the ability to use an already-extant mmap to load the data from (rather
than managing the mmap directly). This is necessary for systems (such as
Chromium) where the security model forbids direct access to the filesystem in
some contexts where CLD2 might be used
Should all be pretty straightforward. Remove all FIXME and TODO comments added
by [email protected] as well.
Original issue reported on code.google.com by [email protected]
on 3 Mar 2014 at 3:25
In building chromium, cld2 emits narrowing warnings due to stricter checks in
the new compiler.
Attached is a patch against the svn repo to fix them.
Original issue reported on code.google.com by [email protected]
on 30 Apr 2015 at 6:18
What steps will reproduce the problem?
1. try to detect the language of attached input file
2. see the output is "unknown"
What is the expected output? What do you see instead?
I would expect either 'persian' or 'arabic'
What version of the product are you using? On what operating system?
rev195 on centos 7
Please provide any additional information below.
CLD2 returns "unknown" because the reliability is lower than
kMinReliableKeepPercent (in compact_lang_det_impl.cc) :
static const int kMinReliableKeepPercent = 41; // Remove lang if reli < this
Would it be acceptable to add a parameter to the DetectLanguageXXX(...)
functions so that callers can set this threshold?
Regards
Original issue reported on code.google.com by [email protected]
on 11 Jun 2015 at 5:07
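The request above amounts to turning the constant into a defaulted parameter. The sketch below shows that shape in isolation. `KeepLanguage` is a hypothetical function, not part of CLD2, and the comparison direction follows the comment quoted in the report ("Remove lang if reli < this").

```cpp
static const int kMinReliableKeepPercent = 41;  // value quoted in the report

// Hypothetical filter: keep a language only when its reliability reaches the
// threshold. Existing callers get the current behavior through the default.
inline bool KeepLanguage(int reliability_percent,
                         int min_keep_percent = kMinReliableKeepPercent) {
  return reliability_percent >= min_keep_percent;
}
```

With the default, a reliability of 40 is dropped; a caller willing to accept weaker guesses could pass, say, 30 and keep it.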
http://build.chromium.org/p/chromium.fyi/builders/Cr%20Win%20Clang/builds/108/st
eps/compile/logs/stdio
..\..\third_party\cld_2\src\internal\offsetmap.cc(82,43) : warning(clang):
format specifies type 'long' but the argument has type 'size_type' (aka
'unsigned int') [-Wformat]
fprintf(fout, "Offsetmap: %ld bytes\n", diffs_.size());
~~~ ^~~~~~~~~~~~~
%u
There's no great portable way to printf size_t types. Since this is debugging
code, I suggest this patch:
Nicos-MacBook-Pro:src thakis$ svn diff
Index: internal/offsetmap.cc
===================================================================
--- internal/offsetmap.cc (revision 165)
+++ internal/offsetmap.cc (working copy)
@@ -79,7 +79,8 @@
}
Flush(); // Make sure any pending entry gets printed
- fprintf(fout, "Offsetmap: %ld bytes\n", diffs_.size());
+ fprintf(fout, "Offsetmap: %lu bytes\n",
+ static_cast<unsigned long>(diffs_.size()));
for (int i = 0; i < static_cast<int>(diffs_.size()); ++i) {
fprintf(fout, "%c%02d ", "&=+-"[OpPart(diffs_[i])], LenPart(diffs_[i]));
if ((i % 20) == 19) {fprintf(fout, "\n");}
Can you land this, please?
Original issue reported on code.google.com by [email protected]
on 18 Aug 2014 at 2:24
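The cast used in the patch above can be exercised on its own as follows. `DescribeSize` is a hypothetical helper and the `std::string` argument stands in for `diffs_`. C99 and C++11 also define `%zu` for `size_t`, but some older Windows C runtimes did not accept it, which is presumably why the patch casts instead.

```cpp
#include <cstdio>
#include <string>

// Format a container size portably: widen to unsigned long and print with
// %lu, so the format and argument types agree on 32-bit and 64-bit builds.
inline std::string DescribeSize(const std::string& diffs) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "Offsetmap: %lu bytes\n",
                static_cast<unsigned long>(diffs.size()));
  return std::string(buf);
}
```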
I'm trying to get CLD2 working on ARM32 inside of Chromium, cross-compiling
from a linux x64 host to arm32. The library loads properly, but the following
crash occurs when calling DetectLanguageSummary:
Program received signal SIGBUS, Bus error.
#0 CLD2::UTF8GenericScan (st=0x61a82104, str=<optimized out>,
bytes_consumed=0x5f00f88c)
at ../../third_party/cld_2/src/internal/utf8statetable.cc:518
I'll attach the full trace as a file. Well, minus the Chromium bits. Anyhow,
the problem appears to be with this snippet of code in utf8statetable.cc:
// Do fast for groups of 8 identity bytes.
// This covers a lot of 7-bit ASCII ~8x faster than the 1-byte loop,
// including slowing slightly on cr/lf/ht
//----------------------------
const uint8* Tbl2 = &st->fast_state[0];
uint32 losub = st->losub;
uint32 hiadd = st->hiadd;
while (src < srclimit8) {
uint32 s0123 = (reinterpret_cast<const uint32 *>(src))[0];
uint32 s4567 = (reinterpret_cast<const uint32 *>(src))[1];
src += 8;
Inspecting the pointers in the debugger during the crash, and looking at the
"src" variable, seems to reveal the problem:
(gdb) p src
$32 = (
const CLD2::uint8 *) 0x58de4bee "\n\n\n百度一下\n地图贴吧视频图片hao123\n新闻应用音乐文库更多\n小说游戏下载\n把百度放到桌面上,
搜索最方便\n触屏版极速版\nBaidu 京ICP证030173号"
Specifically, src is located at 0x58de4bee. Since this isn't a 4-byte (32-bit)
aligned address, the SIGBUS presumably comes from trying to read it as a
uint32*. Many thanks to [email protected] and [email protected] for the
help in diagnosing this, I was a bit lost in the weeds looking at my dynamic
data changes, which turn out to be completely unrelated (this happens with and
without dynamic data mode).
The suggested workaround for this case is to %4 the address and do a one-off
scan of the first 0-3 bytes (as necessary), and then descend into the fast
loop; the concern is that there may be other places in CLD2 that have similar
behavior and might be time bombs. It might be a good idea to add some memory
churning code to the unit tests, and then start running the unit tests
themselves on ARM to further diagnose other problems like this that might arise.
Original issue reported on code.google.com by [email protected]
on 20 Mar 2014 at 3:11
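The two ingredients of the workaround suggested in this report can be sketched as below. Neither function exists in CLD2; both names are hypothetical. `BytesUntilAligned4` gives the length of the byte-at-a-time prologue, and `UnalignedLoad32` is an alternative that avoids the `uint32*` cast altogether by letting the compiler pick a safe load.

```cpp
#include <cstring>
#include <stdint.h>

// Number of leading bytes to scan one at a time before src reaches a
// 4-byte boundary (0 to 3).
inline int BytesUntilAligned4(const unsigned char* src) {
  return static_cast<int>((4 - (reinterpret_cast<uintptr_t>(src) & 3)) & 3);
}

// Read 4 bytes from any address. memcpy has no alignment requirement, and
// compilers typically lower it to a single load where the target allows.
inline uint32_t UnalignedLoad32(const unsigned char* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
```

For the address in the report, 0x58de4bee, the low two bits are 2, so the prologue would consume 2 bytes before entering the fast loop.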
There appears to be a weird mix of both open() and fopen() (with corresponding
close() and fclose()) in cld2_dynamic_data_loader.cc, and possibly other places
in the code. We should consistently use one or the other. To use close() we'd
also technically need to depend on unistd.h, I think, which we currently don't.
This is causing some problems for Chromium, though why it has just cropped up
now I could not say:
https://code.google.com/p/chromium/issues/detail?id=403222
The fix here should be trivial, and I'll take care of it now.
Original issue reported on code.google.com by [email protected]
on 13 Aug 2014 at 9:41
What steps will reproduce the problem?
1. Try to compile the cld_2_dynamic_data_tool with gcc 4.8 in Ubuntu 12.04
What is the expected output? What do you see instead?
It fails because close() isn't defined. close() is declared in <unistd.h> and
adding that include makes it compile.
I could write a patch, but I suspect it is much faster for everyone if a
maintainer just does this manually:
index 7227b8e..06375e18 100644
--- a/third_party/cld_2/src/internal/cld2_dynamic_data_loader.cc
+++ b/third_party/cld_2/src/internal/cld2_dynamic_data_loader.cc
@@ -19,6 +19,7 @@
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
+#include <unistd.h>
#include "cld2_dynamic_data.h"
#include "cld2_dynamic_data_loader.h"
Original issue reported on code.google.com by [email protected]
on 29 Aug 2014 at 8:22
Upon running some browser tests in Chrome, the following error was encountered
when attempting to call CLD2::loadDataFromRawAddress():
memory allocation/deallocation mismatch at 0x155bb621cb20: allocated with new
[] being deallocated with delete
Received signal 11 SEGV_MAPERR 000000000039
...
#6 0x000002b7b791 MallocBlock::CheckLocked()
#7 0x000002b7b422 MallocBlock::CheckAndClear()
#8 0x000002b7bb4a MallocBlock::Deallocate()
#9 0x000002b79109 DebugDeallocate()
#10 0x000009e02885 operator delete()
#11 0x000006ecd635 CLD2DynamicDataLoader::loadDataInternal()
#12 0x000006ecd325 CLD2DynamicDataLoader::loadDataRaw()
#13 0x000006eba963 CLD2::loadDataFromRawAddress()
I'm not sure why this wasn't caught earlier in testing. It may be a consequence
of toolchain changes in Chromium, but the error seems valid and should be
fixed. This was previously working without issue on both Linux and Android
platform builds for x64 and ARM respectively.
I will review the other uses of delete to see if there are more occurrences.
This should be a trivial fix, but blocks adoption of CLD2 dynamic mode in
Chromium.
Original issue reported on code.google.com by [email protected]
on 15 May 2014 at 4:32
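The class of bug reported here, and one way to rule it out where C++11 is available, is sketched below. `MakeTable` is a hypothetical function; the report does not show the actual allocation in `loadDataInternal()`.

```cpp
#include <cstddef>
#include <memory>

// The mismatch the allocator checker complains about looks like this:
//   int* table = new int[n];
//   delete table;     // wrong: memory from new[] must go to delete[]
//   delete[] table;   // correct
//
// std::unique_ptr<T[]> always releases with delete[], so the pairing
// cannot drift apart when the code is edited later.
inline std::unique_ptr<int[]> MakeTable(std::size_t n) {
  // The trailing () value-initializes every element to zero.
  return std::unique_ptr<int[]>(new int[n]());
}
```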
As described in issue 19, the current implementation of dynamic data won't work
on Windows because it relies on:
* from sys/mman.h: mmap(), munmap()
* from unistd.h: close()
These header files don't exist in vanilla win32 build environments, so
compatibility is broken. The fix for close() is being implemented in issue 19,
but the fix for mmap() is less straightforward.
Original issue reported on code.google.com by [email protected]
on 13 Aug 2014 at 11:40
Hello,
I'm trying to extract natural language from a web crawl for use in NLP
applications. Since web pages often have multiple languages on them, I'm using
CLD2's ResultChunkVector API to split each page into chunks of known uniform
language. The problem I'm running into is that fairly often, the
ResultChunkVector simply doesn't include parts of the input text -- I've
attached two sample files that demonstrate this. In 32200.utf8, the first chunk
starts at position 8 -- I guess this has something to do with the fact that the
file starts with numbers/punctuation? In 27878255.utf8, the first chunk covers
positions 0-65530, and the second chunk begins at position 199884 (so there's a
very substantial amount of text being skipped! and the text appears to be plain
old English, nothing special) -- I guess this might have something to do with
the use of a 2-byte length field, but the length of the first chunk isn't
2**16. And perhaps there are other cases that also lead to gaps like this.
My expectation was that the first chunk would always start at position 0, that
each chunk would start where the previous one ended, and that the last chunk
would end at the end of the input file. Or, if this isn't possible, then is
there any guidance on how gaps like this should be interpreted? I could simply
pretend they were tagged "unknown", but this seems like a pretty weird way to
handle the 140 kB of English text in 27878255.utf8.
I'm using the "full" detector, but these files trigger the behaviour in both
full and regular modes (slightly differently).
Original issue reported on code.google.com by [email protected]
on 1 Jul 2014 at 11:04
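Until the gaps are explained, a caller can make them explicit by filling them with placeholder chunks, as the reporter suggests. The sketch below does that over a simplified record. `Chunk`, `kUnknownLang`, and `FillGaps` are hypothetical and do not reproduce the layout of CLD2's `ResultChunk`; the code assumes chunks arrive sorted by offset.

```cpp
#include <cstddef>
#include <vector>

struct Chunk {   // hypothetical stand-in, not CLD2::ResultChunk
  int offset;    // starting byte offset in the input
  int bytes;     // length of the chunk in bytes
  int lang;      // language code
};

const int kUnknownLang = -1;  // hypothetical marker for uncovered text

// Return a chunk list that covers [0, text_len) with no holes by inserting
// "unknown" chunks wherever the input list skips part of the text.
inline std::vector<Chunk> FillGaps(const std::vector<Chunk>& chunks,
                                   int text_len) {
  std::vector<Chunk> out;
  int pos = 0;  // first byte not yet covered
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    const Chunk& c = chunks[i];
    if (c.offset > pos) {
      Chunk gap = {pos, c.offset - pos, kUnknownLang};
      out.push_back(gap);
    }
    out.push_back(c);
    const int end = c.offset + c.bytes;
    if (end > pos) pos = end;
  }
  if (pos < text_len) {
    Chunk gap = {pos, text_len - pos, kUnknownLang};
    out.push_back(gap);
  }
  return out;
}
```

Applied to the second sample file, a first chunk covering 0-65530 followed by one starting at 199884 would yield an explicit unknown chunk of 134354 bytes between them, which makes the skipped text easy to log or re-submit for detection.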