charlock_holmes's People

Contributors

adaugherity, bbenezech, brianmario, dgraham, docwhat, gogainda, greysteil, grosser, holek, josh, ktdreyer, mistydemeo, nicolasleger, olleolleolle, stanhu, stephanvane, tenderlove, tmm1, vmg, vzctl

charlock_holmes's Issues

Failed to compile charlock_holmes 0.7.3 in heroku cedar-14 buildpack-multi

I followed http://tooky.co.uk/using-charklock_holmes-on-heroku/ with small modifications for the cedar-14 paths, which are based on Ubuntu Trusty.

My last deploy to Heroku, on 2015/01/11, succeeded. Today I tried to deploy the exact same code to another Heroku app, but it failed.

Here are the error messages I got:

          Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

          /tmp/build_22b7afabc74cec876e6a4a369b12c2f5/vendor/ruby-2.0.0/bin/ruby extconf.rb --with-icu-lib=/tmp/build_22b7afabc74cec876e6a4a369b12c2f5/.apt/usr/lib/x86_64-linux-gnu --with-icu-include=/tmp/build_22b7afabc74cec876e6a4a369b12c2f5/.apt/usr/include/x86_64-linux-gnu --with-ldflags='-fPIC' --with-cflags='-fPIC' --with-cxxflags='-fPIC'
          checking for main() in -licui18n... yes
          checking for main() in -licui18n... yes
          checking for unicode/ucnv.h... yes
          checking for main() in -lz... yes
          checking for main() in -licuuc... yes
          checking for main() in -licudata... yes
          creating Makefile

          make "DESTDIR="
          compiling encoding_detector.c
          In file included from encoding_detector.c:2:0:
          common.h:14:14: warning: ‘charlock_new_enc_str’ defined but not used [-Wunused-function]
          static VALUE charlock_new_enc_str(const char *str, size_t len, void *encoding)
          ^
          compiling ext.c
          In file included from ext.c:1:0:
          common.h:14:14: warning: ‘charlock_new_enc_str’ defined but not used [-Wunused-function]
          static VALUE charlock_new_enc_str(const char *str, size_t len, void *encoding)
          ^
          common.h:23:14: warning: ‘charlock_new_str’ defined but not used [-Wunused-function]
          static VALUE charlock_new_str(const char *str, size_t len)
          ^
          common.h:32:14: warning: ‘charlock_new_str2’ defined but not used [-Wunused-function]
          static VALUE charlock_new_str2(const char *str)
          ^
          compiling converter.c
          In file included from converter.c:2:0:
          common.h:23:14: warning: ‘charlock_new_str’ defined but not used [-Wunused-function]
          static VALUE charlock_new_str(const char *str, size_t len)
          ^
          common.h:32:14: warning: ‘charlock_new_str2’ defined but not used [-Wunused-function]
          static VALUE charlock_new_str2(const char *str)
          ^
          compiling transliterator.cpp
          linking shared-object charlock_holmes/charlock_holmes.so
          /usr/bin/ld: /tmp/build_22b7afabc74cec876e6a4a369b12c2f5/.apt/usr/lib/x86_64-linux-gnu/libicui18n.a(smpdtfmt.ao): relocation R_X86_64_PC32 against symbol `_ZN6icu_5216SimpleDateFormat22isAfterNonNumericFieldERKNS_13UnicodeStringEi' can not be used when making a shared object; recompile with -fPIC
          /usr/bin/ld: final link failed: Bad value
          collect2: error: ld returned 1 exit status
          make: *** [charlock_holmes.so] Error 1


          Gem files will remain installed in /tmp/build_22b7afabc74cec876e6a4a369b12c2f5/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.7.3 for inspection.
          Results logged to /tmp/build_22b7afabc74cec876e6a4a369b12c2f5/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.7.3/ext/charlock_holmes/gem_make.out
          Installing nested_form 0.3.2
          Installing newrelic_rpm 3.8.1.221
          An error occurred while installing charlock_holmes (0.7.3), and Bundler cannot
          continue.
          Make sure that `gem install charlock_holmes -v '0.7.3'` succeeds before
          bundling.
    !
    !     Failed to install gems via Bundler.
    !

    !     Push rejected, failed to compile Multipack app

I have tried various combinations, but I still get the same errors.

.buildpacks file:

https://github.com/ddollar/heroku-buildpack-apt
https://github.com/timolehto/heroku-bundle-config
https://github.com/heroku/heroku-buildpack-ruby#v129

.heroku-bundle/config file:


---
BUNDLE_FROZEN: '1'
BUNDLE_PATH: vendor/bundle
BUNDLE_BIN: vendor/bundle/bin
BUNDLE_JOBS: 4
BUNDLE_WITHOUT: development:test
BUNDLE_DISABLE_SHARED_GEMS: '1'
BUNDLE_BUILD__CHARLOCK_HOLMES: --with-icu-lib=/app/.apt/usr/lib/x86_64-linux-gnu --with-icu-include=/app/.apt/usr/include/x86_64-linux-gnu --with-ldflags="$LDFLAGS -fPIC" --with-cflags="$CFLAGS -fPIC" --with-cxxflags="$CXXFLAGS -fPIC"

In the previous app, I just used BUNDLE_BUILD__CHARLOCK_HOLMES: --with-icu-lib=/app/.apt/usr/lib/x86_64-linux-gnu --with-icu-include=/app/.apt/usr/include/x86_64-linux-gnu.

I have tried googling with various keywords, but only found references to Gentoo and to old versions of charlock_holmes. The only recent change I could find is that heroku-buildpack-ruby was updated to v130, which is why I tried pinning v129, but it shows the same error.
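The linker message in the log above names an object inside a static archive (libicui18n.a), so the -fPIC flags passed to the gem's own compile step would not be expected to change anything: the archive itself is what was built without position-independent code. As a minimal sketch, assuming the log format shown above, the following picks out which archive and member the linker is complaining about. `non_pic_archives` is a hypothetical helper, not part of charlock_holmes or any buildpack, and the path in the sample log is abbreviated.

```ruby
# Hypothetical helper: find static archives that ld rejected for lacking PIC.
def non_pic_archives(log)
  # Matches lines such as:
  #   /usr/bin/ld: /path/libfoo.a(member.o): relocation R_X ... recompile with -fPIC
  pattern = %r{ld: (?<archive>\S+\.a)\((?<member>[^)]+)\): relocation (?<reloc>\S+) .*recompile with -fPIC}
  log.each_line.map { |line|
    m = pattern.match(line)
    m && { archive: File.basename(m[:archive]), member: m[:member], reloc: m[:reloc] }
  }.compact
end

# Sample taken from the log above, with the build path shortened.
sample_log = <<~'LOG'
  /usr/bin/ld: /tmp/build/.apt/usr/lib/x86_64-linux-gnu/libicui18n.a(smpdtfmt.ao): relocation R_X86_64_PC32 against symbol `_ZN...' can not be used when making a shared object; recompile with -fPIC
  /usr/bin/ld: final link failed: Bad value
LOG

non_pic_archives(sample_log).each do |hit|
  puts "#{hit[:archive]} (#{hit[:member]}) was not built as PIC: #{hit[:reloc]}"
end
```

If the reported archive is a system-provided `.a` file, linking against the shared library (`.so`) instead is usually the direction to investigate, since a static archive can only be fixed by rebuilding it.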

My Gemfile:

source 'https://rubygems.org'
ruby '2.0.0'

gem 'puma'
gem "rack-timeout"

gem 'rails', '4.0.5'
gem 'sass-rails', '~> 4.0.0'
gem 'uglifier', '>= 1.3.0'
gem 'coffee-rails', '~> 4.0.0'

gem 'mongoid'
gem 'devise'
gem 'jquery-rails'
gem "twitter-bootstrap-rails"
gem 'jbuilder', '~> 1.2'
gem 'kaminari'
gem 'bootstrap-modal-rails', github: 'vicentereig/bootstrap-modal-rails'
gem 'hashie'
gem 'descriptive_statistics'

# db related
# gem 'acts_as_list_mongoid'
gem 'public_activity'
gem 'mongoid-tree', require: 'mongoid/tree'

# Exception handling
gem 'honeybadger'

# Encoding handling
# gem 'charlock_holmes', github: 'brianmario/charlock_holmes', branch: "bundle-icu"
gem 'charlock_holmes'

# Assets
gem 'jquery-ui-rails'
gem "selectize-rails"
gem 'jquery-datatables-rails', '~> 2.2.3'
gem 'nprogress-rails'
gem 'tabulous'

# forms
gem "simple_form", "~> 3.0.0"
gem "country_select", "~> 1.2.0"
gem "nested_form"
gem 'carrierwave'
gem 'carrierwave-mongoid', :require => 'carrierwave/mongoid'
gem 'mongoid-grid_fs', github: 'ahoward/mongoid-grid_fs'
gem 'open_uri_redirections'
gem 'remotipart', '~> 1.2.1'

# View generation
gem 'builder', '~> 3.1'
gem 'prawn'
gem 'prawn-html5', github: 'cs/prawn-html5'
gem 'wisepdf'
gem 'pandoc-ruby'
gem 'htmltoword'

# External apis
gem 'nokogiri'

# Heroku related
group :production do
  gem 'rails_12factor'
  gem "wkhtmltopdf-heroku", "~> 1.0.0"
end

group :development, :test do
  gem 'hirb'
  gem "dotenv-rails"
  gem 'rspec-rails', :require => false
  gem 'capybara-webkit'
  gem 'capybara'
  gem 'cucumber-rails', '~> 1.3', :require => false
  # database_cleaner is not required, but highly recommended
  gem 'database_cleaner'
  gem 'quiet_assets'
  gem 'factory_girl_rails'
  gem 'faker'
  gem 'email_spec'
  gem 'guard-cucumber'
  gem 'guard-rspec'
  gem 'guard-rails'
  gem 'guard-spork'
  gem 'guard-sidekiq'
  gem "rb-readline", "~> 0.5.0"
  gem 'rb-fsevent'
  gem 'growl'
  gem "spork", github: "sporkrb/spork"
  gem 'spork-rails', github: 'sporkrb/spork-rails'
  gem 'mongoid-rspec'
  gem 'fakeweb'
  gem 'commands'
end

group :development do
  gem "bullet"
  gem 'better_errors'
  gem 'binding_of_caller'
  gem 'meta_request'
  gem "foreman"
end

# monitoring
gem 'newrelic_rpm'

# background job
gem 'sidekiq', '~> 3'
gem 'sidekiq-failures'
gem 'sinatra', '>= 1.3.0', :require => nil

# bug tracking
gem "hirefire-resource"
gem 'rollbar'
gem 'raygun4ruby'

Failed to build on 64 bit system - static (absolute) path to libmagic.a

Hello,

here is a patch that detects whether the OS is 64-bit or 32-bit and sets the correct path to lib/libmagic.a in extconf.rb. I tested it on Linux, but it should work on Mac OS too.

From c83692d5fbe213bc3fbc0b672135130ca0de89fd Mon Sep 17 00:00:00 2001
From: Jaromír Červenka <[email protected]>
Date: Tue, 7 Feb 2012 10:24:01 +0100
Subject: [PATCH] Repaired absolute path to 'lib/libmagic.a' in extconf.rb.

---
 ext/charlock_holmes/extconf.rb |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/ext/charlock_holmes/extconf.rb b/ext/charlock_holmes/extconf.rb
index c955636..74a0cb5 100644
--- a/ext/charlock_holmes/extconf.rb
+++ b/ext/charlock_holmes/extconf.rb
@@ -63,7 +63,9 @@ Dir.chdir("#{CWD}/src") do
   end
 end

-FileUtils.cp "#{CWD}/dst/lib/libmagic.a", "#{CWD}/libmagic_ext.a"
+['foo'].pack('p').size == 8 ? lib_dir = 'lib64' : lib_dir = 'lib'
+
+FileUtils.cp "#{CWD}/dst/#{lib_dir}/libmagic.a", "#{CWD}/libmagic_ext.a"

 $INCFLAGS[0,0] = " -I#{CWD}/dst/include "
 $LDFLAGS << " -L#{CWD} "
--
1.7.7
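The patch above infers the library directory from the pointer size. A minimal sketch of that check, together with an alternative that simply probes which directory actually contains the archive, might look like the following. `pointer_size_bytes` and `find_static_lib` are hypothetical names introduced for illustration, and the temporary directory only stands in for the `dst` prefix used by extconf.rb.

```ruby
require "fileutils"
require "tmpdir"

# Size of a C pointer in bytes, using the same pack('p') trick as the patch.
def pointer_size_bytes
  ["foo"].pack("p").size
end

# Return the first existing path to `name` under prefix/lib or prefix/lib64,
# or nil when neither directory contains it.
def find_static_lib(prefix, name = "libmagic.a", dirs = %w[lib lib64])
  dirs.map { |dir| File.join(prefix, dir, name) }
      .find { |path| File.exist?(path) }
end

# Demonstration against a throwaway directory tree.
Dir.mktmpdir do |prefix|
  FileUtils.mkdir_p(File.join(prefix, "lib64"))
  FileUtils.touch(File.join(prefix, "lib64", "libmagic.a"))
  puts "pointer size: #{pointer_size_bytes} bytes"
  puts "archive found at: #{find_static_lib(prefix)}"
end
```

Probing the filesystem avoids assuming that every 64-bit system installs into lib64; some 64-bit distributions install into lib, so the pointer size alone may not identify the directory.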

Jruby?

Any suggestions for getting this working with JRuby? I see a JRuby fork, but it is quite dated (back at v0.1.2). I am currently preparing GitLab for deployment to a TorqueBox instance, but it complains that it needs this gem at 0.6.9.4, which of course would have to be provided without a C extension.

trouble with treating cpp file as binary

I have some source code files (*.cpp) that contain

#include
#include

int main(int , char *[])
{
char buff[10];
memcpy(buff,"123=^A456=^A",10);
std::cout<<buff;
return 0;
}

where ^A is the control character (0x01) used as a field separator. In vi it is entered as Control-V Control-A; it is a single character, not the two characters "^" and "A".

When I run:
irb(main):001:0> require 'charlock_holmes'
=> true
irb(main):002:0> contents = File.read('main.cpp')
=> "#include \n#include \nint main(int , char *[])\n{\n char buff[10];\n memcpy(buff,"123=\u0001456=\u0001",10);\n std::cout<<buff;\n return 0;\n}\n"
irb(main):003:0> CharlockHolmes::EncodingDetector.detect(contents)
=> {:type=>:binary, :confidence=>100}
irb(main):004:0>

I can't seem to find a way to have this detected as a text file. Is there a way to force a file to be treated as text, for example by setting the first N characters to a specified string such as "// charlock_holmes_type=text"?
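As a workaround outside the library, a caller could apply its own heuristic before trusting a `:binary` result. The sketch below is an assumption-laden illustration, not charlock_holmes behaviour: `control_byte_ratio` and `probably_text?` are hypothetical helpers, and the 10% threshold is an arbitrary choice. It treats content as text when it has no NUL bytes and only a small share of other control bytes.

```ruby
# Share of bytes that are control characters other than tab, LF and CR.
def control_byte_ratio(content)
  allowed = [0x09, 0x0A, 0x0D]
  bytes = content.bytes
  return 0.0 if bytes.empty?
  control = bytes.count { |b| b < 0x20 && !allowed.include?(b) }
  control.to_f / bytes.size
end

# Heuristic: no NUL bytes, and control bytes at or below max_ratio.
def probably_text?(content, max_ratio: 0.1)
  !content.bytes.include?(0x00) && control_byte_ratio(content) <= max_ratio
end

# Two lines from the C++ file above, with ^A written as \u0001.
source = "char buff[10];\nmemcpy(buff,\"123=\u0001456=\u0001\",10);\n"
puts probably_text?(source)
```

This does not change what `CharlockHolmes::EncodingDetector.detect` returns; it only gives the calling code a second opinion for files that contain a few separator bytes.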

After upgrading to the latest version getting an error after every deploy...

Here's the error:

Oct 03 14:16:14 stg-web-01:  /shared/bundle/ruby/2.1.0/gems/charlock_holmes-0.7.3/lib/charlock_holmes.rb:1:in `require': libicuuc.so.48: cannot open shared object file: No such file or directory - /shared/bundle/ruby/2.1.0/extensions/x86_64-linux/2.1.0/charlock_holmes-0.7.3/charlock_holmes/charlock_holmes.so (LoadError)

Here's how I temporarily stop the error (until the next deploy):

  1. Navigate to the directory on the server where my application resides
  2. bundle install --deployment --without development

That's all I do and the error goes away, until the next time I deploy.

I'm using Capistrano to deploy, and here's my gem list:

Using json (1.8.1)
Using mini_portile (0.6.0)
Using nokogiri (1.6.3.1)
Using aws-sdk-v1 (1.54.0)
Using aws-sdk (1.54.0)
Using hitimes (1.2.2)
Using timers (4.0.1)
Using celluloid (0.16.0)
Using charlock_holmes (0.7.3)
Using connection_pool (2.0.0)
Using pg (0.17.1)
Using redis (3.1.0)
Using redis-namespace (1.5.1)
Using rubyzip (1.1.6)
Using sidekiq (3.2.4)
Using sidekiq-pro (1.8.0)
Using bundler (1.5.3)

When capistrano deploys this is the command it outputs that it's running...

cd /releases/20141003182225 && ~/.rvm/bin/rvm default do bundle install --binstubs /shared/bin --path /shared/bundle --without development test travis --deployment --quiet
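The LoadError above names libicuuc.so.48, so one plausible reading is that the compiled extension in the shared bundle was linked against an ICU version that the dynamic loader cannot find at require time. As a minimal sketch, assuming the message format shown above, the following extracts the missing library name and checks a list of directories for it. `missing_shared_library` and `library_present?` are hypothetical helpers, and the directory list is an assumption about a typical Linux layout.

```ruby
# Extract the shared library name from a "cannot open shared object file" message.
def missing_shared_library(message)
  m = /(?<lib>\S+\.so[.\d]*): cannot open shared object file/.match(message)
  m && m[:lib]
end

# True when any of the given directories contains a file with that name.
def library_present?(soname, dirs)
  dirs.any? { |dir| File.exist?(File.join(dir, soname)) }
end

message = "libicuuc.so.48: cannot open shared object file: No such file or directory"
soname = missing_shared_library(message)

# Assumed search locations; adjust for the actual server.
search_dirs = %w[/usr/lib /usr/lib64 /usr/lib/x86_64-linux-gnu]
puts "#{soname} present: #{library_present?(soname, search_dirs)}"
```

If the library is absent, the extension has to be rebuilt against the ICU that is installed, which would match the observation that rerunning bundle install on the server makes the error go away.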

Fails to detect KOI8-R, ARABIC, HEBREW

When I try to detect the encoding of certain files, charlock_holmes returns the wrong encoding. It reports ISO-8859-1 with language "fr", no matter which encoding and language are actually involved.

Examples:

https://github.com/mcandre/enlint/tree/master/examples

examples/polite-russian.html (actual encoding: KOI8-R)

> contents = File.read('examples/polite-russian.html')
> detection = CharlockHolmes::EncodingDetector.detect(contents)
 => {:type=>:text, :encoding=>"ISO-8859-1", :ruby_encoding=>"ISO-8859-1", :confidence=>33, :language=>"fr"}

examples/www-arabic/polite-arabic.html (actual encoding: ARABIC)

> contents = File.read('examples/www-arabic/polite-arabic.html')
> detection = CharlockHolmes::EncodingDetector.detect(contents)
 => {:type=>:text, :encoding=>"ISO-8859-1", :ruby_encoding=>"ISO-8859-1", :confidence=>27, :language=>"fr"}

examples/checkstyle.xml (actual encoding: HEBREW)

> contents = File.read('examples/checkstyle.xml')
> detection = CharlockHolmes::EncodingDetector.detect(contents)
 => {:type=>:text, :encoding=>"ISO-8859-1", :ruby_encoding=>"ISO-8859-1", :confidence=>30, :language=>"fr"}
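To double-check by hand which encoding a file is really in, independent of the detector, the raw bytes can be decoded with Ruby's built-in transcoding and the results compared by eye. This is only a sketch: `decode_as` is a hypothetical helper, and the sample word is an illustration rather than content from the example files.

```ruby
# Interpret raw bytes as `encoding` and transcode them to UTF-8 for display.
def decode_as(bytes, encoding)
  bytes.dup.force_encoding(encoding).encode("UTF-8")
end

# "Привет" written with escapes, then stored as raw KOI8-R bytes.
word = "\u041F\u0440\u0438\u0432\u0435\u0442"
koi8_bytes = word.encode("KOI8-R").b

puts decode_as(koi8_bytes, "KOI8-R")      # readable Cyrillic
puts decode_as(koi8_bytes, "ISO-8859-1")  # accented Latin letters instead
```

Because ISO-8859-1 assigns a character to every byte value, decoding never fails under it, so a successful decode alone does not show that the guess was right. For real files, `File.binread` returns the bytes without applying a default encoding.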

Test case fails for French

+ testrb -Ilib test/converter_test.rb test/encoding_detector_test.rb test/string_methods_test.rb test/transliterator_test.rb
Run options: -Ilib

# Running tests:

..................F...

Finished tests in 0.107995s, 203.7137 tests/s, 601.8813 assertions/s.

  1) Failure:
TransliteratorTest#test_transliterate [/home/lolcat/rpmbuild/BUILD/charlock_holmes-0.6.9.4/test/transliterator_test.rb:64]:
--- expected
+++ actual
@@ -1 +1 @@
-"Je peux manger du verre, ca ne me fait pas de mal."
+"Je peux manger du verre, \uFFFD\uFFFDa ne me fait pas de mal."

The test case is given here: https://github.com/brianmario/charlock_holmes/blob/master/test/transliterator_test.rb#L15
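The actual value shows two U+FFFD replacement characters where a single "ç" was expected. That pattern is consistent with the two UTF-8 bytes of "ç" each being treated as an invalid byte, although the report does not confirm the cause. The sketch below reproduces the same shape with plain Ruby; `decode_as_ascii` is a hypothetical helper used only for illustration.

```ruby
# Treat the bytes of `text` as US-ASCII and replace every invalid byte with U+FFFD.
def decode_as_ascii(text)
  text.b.force_encoding("US-ASCII")
      .encode("UTF-8", invalid: :replace, replace: "\uFFFD")
end

cedilla_a = "\u00E7a"   # "ça"
puts cedilla_a.bytes.map { |b| format("0x%02X", b) }.join(" ")
puts decode_as_ascii(cedilla_a).inspect
```

If this is what is happening in the failing build, the locale or encoding settings of the build environment would be the first thing to compare against a machine where the test passes.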

push tags to github

When you tag releases, would you mind pushing the tags to GitHub? For example the latest version on Rubygems.org is 0.6.9.4, but the v0.6.9.4 tag is missing from the repository.

regexec error when detecting strings

It seems that some strings I try to detect are causing a "regexec error":

[1] pry(main)> s = "çe"
=> "çe"
[2] pry(main)> CharlockHolmes::EncodingDetector.detect(s)
=> {:type=>:text, :encoding=>"UTF-8", :confidence=>80}
[3] pry(main)> s = "eç"
=> "eç"
[4] pry(main)> CharlockHolmes::EncodingDetector.detect(s)
StandardError: line 250: regexec error 17, (illegal byte sequence), 
from /gems/charlock_holmes-0.6.9/lib/charlock_holmes/encoding_detector.rb:15:in `detect'

I've updated to the latest gem version of CharlockHolmes. I only get this error on my MacBook; on an Ubuntu machine it works fine.

Make failed. "transliterator.o: bad reloc address 0xf in section `.text$_ZN6icu_548ByteSinkC2Ev[__ZN6icu_548ByteSinkC2Ev]'". [Ruby 2.1.5p273] [i386-mingw32] [Windows 8.1 x64] [ICU 5.4]

Getting the following error when building charlock_holmes.so on Windows:

C:\>gem install charlock_holmes  -v '0.7.3' -- --with-opt-dir=C:/opt --with-icui18nlib=icuin --with-icudatalib=icudt --with-zlib=zlib                                                                                                                 
Temporarily enhancing PATH to include DevKit...                                                                                                                                                                                           
Building native extensions with: '--with-opt-dir=C:/opt --with-icui18nlib=icuin --with-icudatalib=icudt --with-zlib=zlib'                                                                                                                 
This could take a while...                                                                                                                                                                                                                
ERROR:  Error installing charlock_holmes:                                                                                                                                                                                                 
        ERROR: Failed to build gem native extension.                                                                                                                                                                                      

    C:/Ruby21/bin/ruby.exe extconf.rb --with-opt-dir=C:/opt --with-icui18nlib=icuin --with-icudatalib=icudt --with-zlib=zlib                                                                                                              
checking for main() in -licuin... yes                                                                                                                                                                                                     
checking for main() in -licuin... yes                                                                                                                                                                                                     
checking for unicode/ucnv.h... yes                                                                                                                                                                                                        
checking for main() in -lzlib... yes                                                                                                                                                                                                      
checking for main() in -licuuc... yes                                                                                                                                                                                                     
checking for main() in -licudt... yes                                                                                                                                                                                                     
creating Makefile                                                                                                                                                                                                                         

make "DESTDIR=" clean                                                                                                                                                                                                                     

make "DESTDIR="                                                                                                                                                                                                                           
generating charlock_holmes-i386-mingw32.def                                                                                                                                                                                               
compiling converter.c                                                                                                                                                                                                                     
In file included from converter.c:2:0:                                                                                                                                                                                                    
common.h:23:14: warning: 'charlock_new_str' defined but not used [-Wunused-function]                                                                                                                                                      
common.h:32:14: warning: 'charlock_new_str2' defined but not used [-Wunused-function]                                                                                                                                                     
compiling encoding_detector.c                                                                                                                                                                                                             
In file included from encoding_detector.c:2:0:                                                                                                                                                                                            
common.h:14:14: warning: 'charlock_new_enc_str' defined but not used [-Wunused-function]                                                                                                                                                  
compiling ext.c                                                                                                                                                                                                                           
In file included from ext.c:1:0:                                                                                                                                                                                                          
common.h:14:14: warning: 'charlock_new_enc_str' defined but not used [-Wunused-function]                                                                                                                                                  
common.h:23:14: warning: 'charlock_new_str' defined but not used [-Wunused-function]                                                                                                                                                      
common.h:32:14: warning: 'charlock_new_str2' defined but not used [-Wunused-function]                                                                                                                                                     
compiling transliterator.cpp                                                                                                                                                                                                              
linking shared-object charlock_holmes/charlock_holmes.so                                                                                                                                                                                  
transliterator.o:transliterator.cpp:(.text+0x14f): undefined reference to `icu_54::Transliterator::getAvailableIDs(UErrorCode&)'                                                                                                          
transliterator.o:transliterator.cpp:(.text+0x452): undefined reference to `icu_54::UnicodeString::UnicodeString(char const*, int)'                                                                                                        
transliterator.o:transliterator.cpp:(.text+0x483): undefined reference to `icu_54::Transliterator::createInstance(icu_54::UnicodeString const&, UTransDirection, UParseError&, UErrorCode&)'                                              
transliterator.o:transliterator.cpp:(.text+0x49a): undefined reference to `icu_54::UnicodeString::~UnicodeString()'                                                                                                                       
transliterator.o:transliterator.cpp:(.text+0x4e0): undefined reference to `icu_54::UMemory::operator new(unsigned int)'                                                                                                                   
transliterator.o:transliterator.cpp:(.text+0x511): undefined reference to `icu_54::UnicodeString::UnicodeString(char const*, int)'                                                                                                        
transliterator.o:transliterator.cpp:(.text+0x576): undefined reference to `icu_54::UnicodeString::toUTF8(icu_54::ByteSink&) const'                                                                                                        
transliterator.o:transliterator.cpp:(.text+0x646): undefined reference to `icu_54::UnicodeString::~UnicodeString()'                                                                                                                       
transliterator.o:transliterator.cpp:(.text+0x668): undefined reference to `icu_54::UMemory::operator delete(void*)'                                                                                                                       
c:/devkit-mingw64-32-4.7.2-20130224-1151/mingw/bin/../lib/gcc/i686-w64-mingw32/4.7.2/../../../../i686-w64-mingw32/bin/ld.exe: transliterator.o: bad reloc address 0xf in section `.text$_ZN6icu_548ByteSinkC2Ev[__ZN6icu_548ByteSinkC2Ev]'
collect2.exe: error: ld returned 1 exit status                                                                                                                                                                                            
make: *** [charlock_holmes.so] Error 1                                                                                                                                                                                                    

make failed, exit code 2                                                                                                                                                                                                                  

Gem files will remain installed in C:/Ruby21/lib/ruby/gems/2.1.0/gems/charlock_holmes-0.7.3 for inspection.                                                                                                                               
Results logged to C:/Ruby21/lib/ruby/gems/2.1.0/extensions/x86-mingw32/2.1.0/charlock_holmes-0.7.3/gem_make.out                                                                                                                           

C:/opt has /bin, /include and /lib subfolders containing all the required DLLs, headers and *.lib files from the Windows builds of ICU and zlib.
The Windows versions of "icui18nlib" and "icudatalib" have different names from their *nix counterparts, so I passed "--with-icui18nlib=icuin --with-icudatalib=icudt". The names appear to be correct, because the build produces many more errors without them.

But why does this error still happen: "transliterator.o: bad reloc address 0xf in section `.text$_ZN6icu_548ByteSinkC2Ev[__ZN6icu_548ByteSinkC2Ev]'"?

Thank you to anyone who can help.

fails to build on ruby 1.9.2

I think there is a build problem with Ruby 1.9.2.

It appears on Ubuntu 10.10 and Debian 5 with Ruby 1.9.2 (p180, p290). Ruby 1.8.7 and REE work fine.

Steps:

apt-get install -y libicu-dev
gem install charlock_holmes

Output:

Fetching: charlock_holmes-0.6.7.gem (100%)
Building native extensions.  This could take a while...
ERROR:  Error installing charlock_holmes:
    ERROR: Failed to build gem native extension.

        /usr/local/bin/ruby extconf.rb
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
  -- tar zxvf file-5.08.tar.gz
  -- ./configure --prefix=/usr/local/lib/ruby/gems/1.9.1/gems/charlock_holmes-0.6.7/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
  -- make
  -- make install
checking for main() in -lmagic_ext... yes
checking for magic.h... yes
creating Makefile

make
gcc -I.  -I/usr/local/lib/ruby/gems/1.9.1/gems/charlock_holmes-0.6.7/ext/charlock_holmes/dst/include -I/usr/local/include/ruby-1.9.1/x86_64-linux -I/usr/local/include/ruby-1.9.1/ruby/backward -I/usr/local/include/ruby-1.9.1 -I. -DHAVE_UNICODE_UCNV_H -DHAVE_MAGIC_H    -fPIC -O3 -ggdb -Wextra -Wno-unused-parameter -Wno-parentheses -Wpointer-arith -Wwrite-strings -Wno-missing-field-initializers -Wno-long-long  -fPIC -Wall -funroll-loops  -o encoding_detector.o -c encoding_detector.c
encoding_detector.c: In function 'detect_binary_content':
encoding_detector.c:64: warning: format not a string literal and no format arguments
encoding_detector.c: In function 'rb_encdec__alloc':
encoding_detector.c:278: warning: format not a string literal and no format arguments
encoding_detector.c:283: warning: format not a string literal and no format arguments
encoding_detector.c: At top level:
common.h:14: warning: 'charlock_new_enc_str' defined but not used
gcc -I.  -I/usr/local/lib/ruby/gems/1.9.1/gems/charlock_holmes-0.6.7/ext/charlock_holmes/dst/include -I/usr/local/include/ruby-1.9.1/x86_64-linux -I/usr/local/include/ruby-1.9.1/ruby/backward -I/usr/local/include/ruby-1.9.1 -I. -DHAVE_UNICODE_UCNV_H -DHAVE_MAGIC_H    -fPIC -O3 -ggdb -Wextra -Wno-unused-parameter -Wno-parentheses -Wpointer-arith -Wwrite-strings -Wno-missing-field-initializers -Wno-long-long  -fPIC -Wall -funroll-loops  -o ext.o -c ext.c
common.h:14: warning: 'charlock_new_enc_str' defined but not used
common.h:23: warning: 'charlock_new_str' defined but not used
common.h:32: warning: 'charlock_new_str2' defined but not used
gcc -I.  -I/usr/local/lib/ruby/gems/1.9.1/gems/charlock_holmes-0.6.7/ext/charlock_holmes/dst/include -I/usr/local/include/ruby-1.9.1/x86_64-linux -I/usr/local/include/ruby-1.9.1/ruby/backward -I/usr/local/include/ruby-1.9.1 -I. -DHAVE_UNICODE_UCNV_H -DHAVE_MAGIC_H    -fPIC -O3 -ggdb -Wextra -Wno-unused-parameter -Wno-parentheses -Wpointer-arith -Wwrite-strings -Wno-missing-field-initializers -Wno-long-long  -fPIC -Wall -funroll-loops  -o converter.o -c converter.c
converter.c: In function 'rb_converter_convert':
converter.c:26: warning: format not a string literal and no format arguments
converter.c:35: warning: format not a string literal and no format arguments
converter.c:39: error: lvalue required as left operand of assignment
make: *** [converter.o] Error 1


Gem files will remain installed in /usr/local/lib/ruby/gems/1.9.1/gems/charlock_holmes-0.6.7 for inspection.
Results logged to /usr/local/lib/ruby/gems/1.9.1/gems/charlock_holmes-0.6.7/ext/charlock_holmes/gem_make.out

Unable to build on 10.9

λ chaos code → CC=/usr/local/bin/gcc42 gem install charlock_holmes -- --with-icu-dir=/usr/local/opt/icu4c
Building native extensions with: '--with-icu-dir=/usr/local/opt/icu4c'
This could take a while...
ERROR:  Error installing charlock_holmes:
    ERROR: Failed to build gem native extension.

    /Users/max/.rvm/rubies/ruby-2.1.0/bin/ruby extconf.rb --with-icu-dir=/usr/local/opt/icu4c
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
  -- tar zxvf file-5.08.tar.gz
  -- ./configure --prefix=/Users/max/.rvm/gems/ruby-2.1.0/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
    --with-opt-dir
    --without-opt-dir
    --with-opt-include
    --without-opt-include=${opt-dir}/include
    --with-opt-lib
    --without-opt-lib=${opt-dir}/lib
    --with-make-prog
    --without-make-prog
    --srcdir=.
    --curdir
    --ruby=/Users/max/.rvm/rubies/ruby-2.1.0/bin/ruby
    --with-icu-dir
    --with-icu-include
    --without-icu-include=${icu-dir}/include
    --with-icu-lib
    --without-icu-lib=${icu-dir}/lib
    --with-icui18nlib
    --without-icui18nlib
    --with-icui18nlib
    --without-icui18nlib
extconf.rb:7:in `sys': ./configure --prefix=/Users/max/.rvm/gems/ruby-2.1.0/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic failed, please report issue on http://github.com/brianmario/charlock_holmes (RuntimeError)
    from extconf.rb:60:in `block (2 levels) in <main>'
    from extconf.rb:59:in `chdir'
    from extconf.rb:59:in `block in <main>'
    from extconf.rb:55:in `chdir'
    from extconf.rb:55:in `<main>'

extconf failed, exit code 1

Gem files will remain installed in /Users/max/.rvm/gems/ruby-2.1.0/gems/charlock_holmes-0.6.9.4 for inspection.
Results logged to /Users/max/.rvm/gems/ruby-2.1.0/extensions/x86_64-darwin-12/2.1.0-static/charlock_holmes-0.6.9.4/gem_make.out
λ chaos code →

I also previously had an issue with the gem install not finding stdarg.h. Weird.

Causes Ruby 1.9.2 and 1.9.3 on OS X to crash (segmentation fault)

Charlock holmes works for me on my end-2010 MacBookPro, using Ruby 1.9.2 or 1.9.3 built with rvm and Xcode's llvm-gcc, using icu4c from Homebrew built with llvm-gcc, and building charlock holmes with llvm-gcc.

It segfaults Ruby in CFUNC :convert on both of my coworkers' computers, which, as far as I can tell, use the same hardware and software.

Stack trace: https://gist.github.com/65e72463a2482eea262f

Crash report: https://gist.github.com/30f73a239cdfada30455

failed to build.

/usr/bin/ruby.exe extconf.rb
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
-- tar zxvf file-5.08.tar.gz
-- ./configure --prefix=/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
-- make -C src install
-- make -C magic install
checking for main() in -lmagic_ext... yes
checking for magic.h... yes
creating Makefile

make
gcc -I. -I/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/dst/include -I. -I/usr/lib/ruby/1.8/i386-cygwin -I. -DHAVE_UNICODE_UCNV_H -DHAVE_MAGIC_H -g -O2 -pipe -fno-strict-aliasing -Wall -funroll-loops -c converter.c
common.h:23:14: warning: ‘charlock_new_str’ defined but not used
common.h:32:14: warning: ‘charlock_new_str2’ defined but not used
gcc -I. -I/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/dst/include -I. -I/usr/lib/ruby/1.8/i386-cygwin -I. -DHAVE_UNICODE_UCNV_H -DHAVE_MAGIC_H -g -O2 -pipe -fno-strict-aliasing -Wall -funroll-loops -c encoding_detector.c
common.h:14:14: warning: ‘charlock_new_enc_str’ defined but not used
gcc -I. -I/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/dst/include -I. -I/usr/lib/ruby/1.8/i386-cygwin -I. -DHAVE_UNICODE_UCNV_H -DHAVE_MAGIC_H -g -O2 -pipe -fno-strict-aliasing -Wall -funroll-loops -c ext.c
common.h:14:14: warning: ‘charlock_new_enc_str’ defined but not used
common.h:23:14: warning: ‘charlock_new_str’ defined but not used
common.h:32:14: warning: ‘charlock_new_str2’ defined but not used
gcc -shared -s -o charlock_holmes.so converter.o encoding_detector.o ext.o -L. -L/usr/lib -L. -L/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes -Wl,--enable-auto-image-base,--enable-auto-import,--export-all -lruby -lmagic_ext -licui18n -licui18n -lrt -ldl -lcrypt
converter.o: In function `rb_converter_convert':
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/converter.c:24: undefined reference to `_ucnv_convert_48'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/converter.c:32: undefined reference to `_ucnv_convert_48'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/converter.c:35: undefined reference to `_u_errorName_48'
encoding_detector.o: In function `rb_get_supported_encodings':
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/encoding_detector.c:234: undefined reference to `_uenum_count_48'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/encoding_detector.c:237: undefined reference to `_uenum_next_48'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/encoding_detector.c:237: undefined reference to `_uenum_next_48'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/encoding_detector.c:237: undefined reference to `_uenum_next_48'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/encoding_detector.c:237: undefined reference to `_uenum_next_48'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/encoding_detector.c:237: undefined reference to `_uenum_next_48'
encoding_detector.o:/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/encoding_detector.c:237: more undefined references to `_uenum_next_48' follow
encoding_detector.o: In function `rb_encdec__alloc':
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/encoding_detector.c:274: undefined reference to `_u_errorName_48'
./libmagic_ext.a(compress.o): In function `uncompressgzipped':
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/src/file-5.08/src/compress.c:357: undefined reference to `_inflateInit2_'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/src/file-5.08/src/compress.c:363: undefined reference to `_inflate'
/usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.8/ext/charlock_holmes/src/file-5.08/src/compress.c:370: undefined reference to `_inflateEnd'
collect2: ld returned 1 exit status
Makefile:152: recipe for target `charlock_holmes.so' failed
make: *** [charlock_holmes.so] Error 1

my env:
ruby 1.8

Install error on Dreamhost shared hosting

Hi, I'm trying to get GitLab installed on my Dreamhost shared host, but I get the following error when trying to install charlock_holmes:

gem install charlock_holmes
Building native extensions.  This could take a while...
ERROR:  Error installing charlock_holmes:
ERROR: Failed to build gem native extension.

 /home/****/.rvm/rubies/ruby-1.9.3-p448/bin/ruby extconf.rb
checking for main() in -licui18n... no
checking for main() in -licui18n... no


***************************************************************************************
*********** icu required (brew install icu4c or apt-get install libicu-dev) ***********
***************************************************************************************
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.

Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=/home/****/.rvm/rubies/ruby-1.9.3-p448/bin/ruby
--with-icu-dir
--without-icu-dir
--with-icu-include
--without-icu-include=${icu-dir}/include
--with-icu-lib
--without-icu-lib=${icu-dir}/lib
--with-icui18nlib
--without-icui18nlib
--with-icui18nlib
--without-icui18nlib

Since libicu-dev is not installed and I can't use sudo on Dreamhost, could anyone tell me how to get charlock_holmes installed anyway?
Thanks!

Transliterator fails silently on very large strings

Transliterate works for large strings

>> CharlockHolmes::Transliterator.transliterate('a'*(1.megabyte), "Any-Latin").size
=> 1048576

But when strings get really long, they start to fail silently (Ubuntu Linux):

>> CharlockHolmes::Transliterator.transliterate('a'*(1.gigabyte), "Any-Latin").size
=> 0

On OS X, I get a bit of extra trace output on STDERR, but it still silently returns an empty string:

>> CharlockHolmes::Transliterator.transliterate('a'*(1.gigabyte), "Any-Latin").size
irb(99020,0x7fff711fd180) malloc: *** mmap(size=18446744071562072064) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
=> 0

The library should probably raise when the input exceeds whatever threshold triggers this, rather than returning an empty string.
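
Until the extension raises on its own, a caller-side guard is one way to avoid treating the empty result as success. This is only a sketch: `checked_transliterate` and `SilentTransliterationFailure` are hypothetical names, not part of charlock_holmes, and the transliteration call is passed in as a callable so the guard assumes nothing about the gem's internals.

```ruby
# Hypothetical caller-side guard; none of these names exist in charlock_holmes.
class SilentTransliterationFailure < StandardError; end

# transliterator: any callable taking (text, id) and returning a String, e.g.
#   CharlockHolmes::Transliterator.method(:transliterate)
def checked_transliterate(text, id, transliterator)
  result = transliterator.call(text, id)
  # An empty result for non-empty input is the silent-failure symptom reported above.
  if result.empty? && !text.empty?
    raise SilentTransliterationFailure,
          "transliteration of #{text.bytesize} bytes with #{id.inspect} returned an empty string"
  end
  result
end
```

A transform that legitimately deletes all of its input would also trip this check, so it only suits transforms such as "Any-Latin" where non-empty input should yield non-empty output.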

Detection fails (could not find any magic files!)

I can successfully detect a string's encoding on my local machine, but on Heroku, using the bundle-icu branch, I get this error:

 irb(main):010:0> str.encoding
 => #<Encoding:ASCII-8BIT>
 irb(main):011:0> str.detect_encoding
 StandardError: could not find any magic files!
    from /app/vendor/bundle/ruby/1.9.1/bundler/gems/charlock_holmes-fdaa437122ab/lib/charlock_holmes/string.rb:9:in `detect'
    from /app/vendor/bundle/ruby/1.9.1/bundler/gems/charlock_holmes-fdaa437122ab/lib/charlock_holmes/string.rb:9:in `detect_encoding'
    from (irb):11
    from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.2.3/lib/rails/commands/console.rb:47:in `start'
    from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.2.3/lib/rails/commands/console.rb:8:in `start'
    from /app/vendor/bundle/ruby/1.9.1/gems/railties-3.2.3/lib/rails/commands.rb:41:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'

problem when installing in ubuntu 12.04

Hi,

I installed the ICU packages on Ubuntu using sudo apt-get install libicu-dev.
When I then run a rake task (for example, precompiling assets), I get the following error:

/var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.9.2/lib/charlock_holmes/charlock_holmes.so: undefined symbol: _ZN6icu_518ByteSink15GetAppendBufferEiiPciPi - /var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.9.2/lib/charlock_holmes/charlock_holmes.so
/var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/activesupport-3.1.10/lib/active_support/dependencies.rb:240:in `require'
/var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/activesupport-3.1.10/lib/active_support/dependencies.rb:240:in `block in require'
/var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/activesupport-3.1.10/lib/active_support/dependencies.rb:223:in `block in load_dependency'
/var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/activesupport-3.1.10/lib/active_support/dependencies.rb:640:in `new_constants_in'
/var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/activesupport-3.1.10/lib/active_support/dependencies.rb:223:in `load_dependency'
/var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/activesupport-3.1.10/lib/active_support/dependencies.rb:240:in `require'
/var/www/monaqasat/shared/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.9.2/lib/charlock_holmes.rb:1:in `<top (required)>'
/var/lib/gems/1.9.1/gems/bundler-1.3.4/lib/bundler/runtime.rb:72:in `require'
/var/lib/gems/1.9.1/gems/bundler-1.3.4/lib/bundler/runtime.rb:72:in `block (2 levels) in require'
/var/lib/gems/1.9.1/gems/bundler-1.3.4/lib/bundler/runtime.rb:70:in `each'
/var/lib/gems/1.9.1/gems/bundler-1.3.4/lib/bundler/runtime.rb:70:in `block in require'
/var/lib/gems/1.9.1/gems/bundler-1.3.4/lib/bundler/runtime.rb:59:in `each'
/var/lib/gems/1.9.1/gems/bundler-1.3.4/lib/bundler/runtime.rb:59:in `require'
/var/lib/gems/1.9.1/gems/bundler-1.3.4/lib/bundler.rb:132:in `require'
/var/www/monaqasat/releases/20130324093011/config/application.rb:10:in `<top (required)>'
/var/www/monaqasat/releases/20130324093011/Rakefile:4:in `require'
/var/www/monaqasat/releases/20130324093011/Rakefile:4:in `<top (required)>'
(See full trace by running task with --trace)

Any ideas?

extconf.rb:7:in `sys': patch -p0 < ../file-soft-check.patch failed, please report issue

I am trying to install on CentOS 6.3. Please help me!

gem install charlock_holmes --version '0.6.9.4'
Building native extensions. This could take a while...
ERROR: Error installing charlock_holmes:
ERROR: Failed to build gem native extension.

/usr/local/bin/ruby extconf.rb

checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
-- tar zxvf file-5.08.tar.gz
-- ./configure --prefix=/usr/local/lib/ruby/gems/2.0.0/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
-- patch -p0 < ../file-soft-check.patch
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers. Check the mkmf.log file for more details. You may
need configuration options.

Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=/usr/local/bin/ruby
--with-icu-dir
--without-icu-dir
--with-icu-include
--without-icu-include=${icu-dir}/include
--with-icu-lib
--without-icu-lib=${icu-dir}/
--with-icui18nlib
--without-icui18nlib
--with-icui18nlib
--without-icui18nlib
extconf.rb:7:in `sys': patch -p0 < ../file-soft-check.patch failed, please report issue on http://github.com/brianmario/charlock_holmes (RuntimeError)
from extconf.rb:61:in `block (2 levels) in <main>'
from extconf.rb:59:in `chdir'
from extconf.rb:59:in `block in <main>'
from extconf.rb:55:in `chdir'
from extconf.rb:55:in `<main>'

Gem files will remain installed in /usr/local/lib/ruby/gems/2.0.0/gems/charlock_holmes-0.6.9.4 for inspection.
Results logged to /usr/local/lib/ruby/gems/2.0.0/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/gem_make.out

New Point Release

Would it be possible to get a new point release, since it has been quite some time since the last one?

I am, specifically, looking to get one including the commit from the following pull request: #27

Thanks!

Detecting MacRoman file as ISO-8859-1

Hey,

I'm trying to transcode some CSV files, but the output ends up as gobbledygook. Charlock is detecting the file as ISO-8859-1, whereas BBEdit says the same file, exported from the Mac version of Excel, is MacRoman (Western).

    def transcode(filename)
      require 'charlock_holmes'

      content = File.read(filename)
      detection = ::CharlockHolmes::EncodingDetector.detect(content)
      # `puts` returns nil, so `puts ... and return` would never return; use an explicit block.
      if detection.nil?
        puts "Unknown encoding for #{filename}"
        return
      end

      encoding = detection[:encoding]
      return if encoding == 'UTF-8' # already what we want

      puts "Transcoding from #{encoding} to UTF-8 for #{filename}"
      utf8_encoded_content = CharlockHolmes::Converter.convert(content, encoding, 'UTF-8')
      File.open(filename, 'w:utf-8') {|f| f.write(utf8_encoded_content) }
    end

Is there a way to tell if this is an issue with ICU's detection vs. something I'm doing wrong?

Thanks, Nathan.
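
One way to see which side is off, sketched with plain Ruby and no charlock_holmes involved: decode the same bytes under each candidate encoding and check which result reads correctly. MacRoman and ISO-8859-1 both assign a character to every byte value, so nothing in the bytes alone rules either one out, and a detector presumably has to guess from statistics. `decode_candidates` and the sample bytes are made up for illustration.

```ruby
# Hypothetical helper: decode the same raw bytes under several candidate encodings.
def decode_candidates(bytes, encodings)
  encodings.each_with_object({}) do |name, out|
    out[name] = bytes.dup.force_encoding(name).encode("UTF-8")
  end
end

# 0x8E is "é" in MacRoman but a C1 control character in ISO-8859-1.
sample = "caf\x8E".b
decode_candidates(sample, ["macRoman", "ISO-8859-1"]).each do |name, text|
  puts "#{name}: #{text.inspect}"
end
```

If the MacRoman decoding is the readable one for a few accented words from the real file, the transcoding code is fine and the detection result is what went wrong for that input.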

Unable to build on Mac OSX 10.7

Hi,

I've tried everything: brew install icu4c, removing and building icu4c from source, --with-icu-dir, --with-opt-include, upgrading Xcode from 4.1 to 4.6, and everything else I could find on Google. I'm still completely unable to get charlock_holmes to install on Mac OS X 10.7 (Lion).

Here's the output:

$ gem install charlock_holmes --version '0.6.9.4' -- --with-icu-dir=/usr/local/lib/icu
Building native extensions.  This could take a while...
ERROR:  Error installing charlock_holmes:
ERROR: Failed to build gem native extension.

/Users/aldavidson/.rvm/rubies/ruby-1.9.3-p125/bin/ruby extconf.rb --with-icu-dir=/usr/local/lib/icu
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
  -- tar zxvf file-5.08.tar.gz
  -- ./configure --prefix=/Users/aldavidson/.rvm/gems/ruby-1.9.3-p125@wms/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.

Provided configuration options:
    --with-opt-dir
    --with-opt-include
    --without-opt-include=${opt-dir}/include
    --with-opt-lib
    --without-opt-lib=${opt-dir}/lib
    --with-make-prog
    --without-make-prog
    --srcdir=.
    --curdir
    --ruby=/Users/aldavidson/.rvm/rubies/ruby-1.9.3-p125/bin/ruby
    --with-icu-dir
    --with-icu-include
    --without-icu-include=${icu-dir}/include
    --with-icu-lib
    --without-icu-lib=${icu-dir}/lib
    --with-icui18nlib
    --without-icui18nlib
    --with-icui18nlib
    --without-icui18nlib
extconf.rb:7:in `sys': ./configure --prefix=/Users/aldavidson/.rvm/gems/ruby-1.9.3-p125@wms/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic failed, please report issue on http://github.com/brianmario/charlock_holmes (RuntimeError)
    from extconf.rb:60:in `block (2 levels) in <main>'
    from extconf.rb:59:in `chdir'
    from extconf.rb:59:in `block in <main>'
    from extconf.rb:55:in `chdir'
    from extconf.rb:55:in `<main>'


Gem files will remain installed in /Users/aldavidson/.rvm/gems/ruby-1.9.3-p125@wms/gems/charlock_holmes-0.6.9.4 for inspection.
Results logged to /Users/aldavidson/.rvm/gems/ruby-1.9.3-p125@wms/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/gem_make.out

libicui18n.so.48: cannot open shared object file

Hi everybody,
I tried to use this gem in my development environment (Mac OS X, with icu4c installed from source), and everything worked just fine.

In the staging environment, though, the gem installed fine, but whenever I run a rake task this is what I get:

libicui18n.so.48: cannot open shared object file: No such file or directory - /path/to/my/app/shared/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.8/ext/charlock_holmes/charlock_holmes.so

I tried building the gem with a path specified (set to the directory where libicui18n.so.48 lives), as explained in the README, but no luck :(

Does anyone know what's going on?

Thanks

Marco

Conversion from "ISO-8859-2" to "UTF-8" doesn't work

line1 = "+ACI-id+ACIAfA-+ACI-code+ACIAfA-+ACI-system+AF8-code+ACIAfA-+ACI-assembly+AF8-code+ACIAfA-+ACI-component+AF8-code+ACIAfA-+ACI-description+ACIAfA-+ACI-comments+ACIAfA-+ACI-obsolete+ACIAfA-+ACI-created+AF8-at+ACIAfA-+ACI-updated+AF8-at+ACI-\r\n"

detection = CharlockHolmes::EncodingDetector.detect(line1)

detection #=> {:type=>:text, :encoding=>"ISO-8859-2", :confidence=>66, :language=>"ro"}

new_line = CharlockHolmes::Converter.convert(line1, detection[:encoding], 'UTF-8')
new_line # returns the same string as line1 rather than a UTF-8 encoded string
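
A note on why the output may look unchanged: the sample line contains only ASCII bytes, and ASCII text is byte-for-byte identical in ISO-8859-2 and UTF-8, so a correct conversion would return the same bytes. The `+ACI-` and `+AHw-` runs look like UTF-7 escapes (for `"` and `|`), which would also explain the odd detection result. The sketch below uses only core Ruby; `decode_utf7` is a hypothetical helper that handles just the simple cases in this sample, not a full UTF-7 decoder.

```ruby
# Shortened form of the sample line from the report.
line = "+ACI-id+ACIAfA-+ACI-code+ACI-\r\n"

# ASCII-only input transcodes to exactly the same bytes in UTF-8.
puts line.ascii_only?
puts line.encode("UTF-8", "ISO-8859-2").bytes == line.bytes

# Hypothetical minimal UTF-7 decoder: each "+<base64>-" run holds UTF-16BE text.
def decode_utf7(text)
  text.gsub(/\+([A-Za-z0-9+\/]*)-?/) do
    b64 = Regexp.last_match(1)
    next "+" if b64.empty? # "+-" is a literal plus sign
    padded = b64 + "=" * ((4 - b64.length % 4) % 4)
    padded.unpack1("m").force_encoding("UTF-16BE").encode("UTF-8")
  end
end

puts decode_utf7(line)
```

If the decoded line shows the expected quoted, pipe-separated column names, the file was most likely exported as UTF-7 and needs decoding rather than an ISO-8859-2 conversion.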

Reason: image not found

I installed it like this: gem install charlock_holmes -- --with-icu-dir=/usr/local/Cellar/icu4c

Then I get this error:
/Users/ted/.rvm/rubies/ruby-2.0.0-p451/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require': dlopen(/Users/ted/.rvm/gems/ruby-2.0.0-p451@global/extensions/x86_64-darwin-12/2.0.0-static/charlock_holmes-0.7.3/charlock_holmes/charlock_holmes.bundle, 9): Library not loaded: @@HOMEBREW_PREFIX@@/opt/icu4c/lib/libicudata.53.1.dylib (LoadError)
Referenced from: /Users/ted/.rvm/gems/ruby-2.0.0-p451@global/extensions/x86_64-darwin-12/2.0.0-static/charlock_holmes-0.7.3/charlock_holmes/charlock_holmes.bundle
Reason: image not found - /Users/ted/.rvm/gems/ruby-2.0.0-p451@global/extensions/x86_64-darwin-12/2.0.0-static/charlock_holmes-0.7.3/charlock_holmes/charlock_holmes.bundle

Why is the icu4c library being loaded from @@HOMEBREW_PREFIX@@/opt/icu4c/lib/?

Failed to install because of file-5.08 failed to build with gcc 4.8

Installing charlock_holmes (0.6.9.4) 
Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

    /usr/bin/ruby2.0 extconf.rb 
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
  -- tar zxvf file-5.08.tar.gz
  -- ./configure --prefix=/srv/www/gitorious/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
  -- patch -p0 < ../file-soft-check.patch
  -- make -C src install
  -- make -C magic install
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
        --with-opt-dir
        --without-opt-dir
        --with-opt-include
        --without-opt-include=${opt-dir}/include
        --with-opt-lib
        --without-opt-lib=${opt-dir}/lib64
        --with-make-prog
        --without-make-prog
        --srcdir=.
        --curdir
        --ruby=/usr/bin/ruby2.0
        --with-icu-dir
        --without-icu-dir
        --with-icu-include
        --without-icu-include=${icu-dir}/include
        --with-icu-lib
        --without-icu-lib=${icu-dir}/
        --with-icui18nlib
        --without-icui18nlib
        --with-icui18nlib
        --without-icui18nlib
/usr/lib64/ruby/2.0.0/fileutils.rb:1551:in `stat': No such file or directory - /srv/www/gitorious/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/lib/libmagic.a (Errno::ENOENT)
        from /usr/lib64/ruby/2.0.0/fileutils.rb:1551:in `block in fu_each_src_dest'
        from /usr/lib64/ruby/2.0.0/fileutils.rb:1567:in `fu_each_src_dest0'
        from /usr/lib64/ruby/2.0.0/fileutils.rb:1549:in `fu_each_src_dest'
        from /usr/lib64/ruby/2.0.0/fileutils.rb:393:in `cp'
        from extconf.rb:67:in `<main>'


Gem files will remain installed in /srv/www/gitorious/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.6.9.4 for inspection.
Results logged to /srv/www/gitorious/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/gem_make.out
An error occurred while installing charlock_holmes (0.6.9.4), and Bundler cannot continue.
Make sure that `gem install charlock_holmes -v '0.6.9.4'` succeeds before bundling.

Running make -C magic install manually produces this error:

gcc -g -O2    magic.c   -o magic
In file included from magic.c:33:0:
file.h:131:2: error: unknown type name ‘uint8_t’
  uint8_t b;
  ^
file.h:132:2: error: unknown type name ‘uint16_t’
  uint16_t h;
  ^
file.h:133:2: error: unknown type name ‘uint32_t’
  uint32_t l;
  ^
file.h:134:2: error: unknown type name ‘uint64_t’
  uint64_t q;
  ^
file.h:135:2: error: unknown type name ‘uint8_t’
  uint8_t hs[2]; /* 2 bytes of a fixed-endian "short" */
  ^
file.h:136:2: error: unknown type name ‘uint8_t’
  uint8_t hl[4]; /* 4 bytes of a fixed-endian "long" */
  ^
file.h:137:2: error: unknown type name ‘uint8_t’
  uint8_t hq[8]; /* 8 bytes of a fixed-endian "quad" */
  ^
file.h:146:2: error: unknown type name ‘uint16_t’
  uint16_t cont_level; /* level of ">" */
  ^
file.h:147:2: error: unknown type name ‘uint8_t’
  uint8_t flag;
  ^
file.h:157:2: error: unknown type name ‘uint8_t’
  uint8_t factor;
  ^
file.h:160:2: error: unknown type name ‘uint8_t’
  uint8_t reln;  /* relation (0=eq, '>'=gt, etc) */
  ^
file.h:161:2: error: unknown type name ‘uint8_t’
  uint8_t vallen;  /* length of string value, if any */
  ^
file.h:162:2: error: unknown type name ‘uint8_t’
  uint8_t type;  /* comparison type (FILE_*) */
  ^
file.h:163:2: error: unknown type name ‘uint8_t’
  uint8_t in_type; /* type of indirection */
  ^
file.h:225:2: error: unknown type name ‘uint8_t’
  uint8_t in_op;  /* operator for indirection */
  ^
file.h:226:2: error: unknown type name ‘uint8_t’
  uint8_t mask_op; /* operator for mask */
  ^
file.h:228:2: error: unknown type name ‘uint8_t’
  uint8_t cond;  /* conditional type */
  ^
file.h:232:2: error: unknown type name ‘uint8_t’
  uint8_t factor_op;
  ^
file.h:263:2: error: unknown type name ‘uint32_t’
  uint32_t offset; /* offset to magic number */
  ^
file.h:267:2: error: unknown type name ‘uint32_t’
  uint32_t lineno; /* line number in magic file */
  ^
file.h:270:3: error: unknown type name ‘uint64_t’
   uint64_t _mask; /* for use with numeric and date types */
   ^
file.h:272:4: error: unknown type name ‘uint32_t’
    uint32_t _count; /* repeat/line count */
    ^
file.h:273:4: error: unknown type name ‘uint32_t’
    uint32_t _flags; /* modifier flags */
    ^
file.h:327:2: error: unknown type name ‘uint32_t’
  uint32_t nmagic;   /* number of entries in array */
  ^
file.h:360:2: error: unknown type name ‘uint32_t’
  uint32_t offset;
  ^
file.h:385:46: error: expected ‘)’ before ‘int’
 protected const char *file_fmttime(uint32_t, int);
                                              ^
file.h:414:11: error: unknown type name ‘uint64_t’
 protected uint64_t file_signextend(struct magic_set *, struct magic *,
           ^
file.h:415:5: error: unknown type name ‘uint64_t’
     uint64_t);
     ^
file.h:449:14: error: conflicting types for ‘sys_errlist’
 extern char *sys_errlist[];
              ^
In file included from /usr/include/stdio.h:853:0,
                 from file.h:52,
                 from magic.c:33:
/usr/include/bits/sys_errlist.h:27:26: note: previous declaration of ‘sys_errlist’ was here
 extern const char *const sys_errlist[];
                          ^
In file included from magic.c:33:0:
file.h:455:26: error: conflicting types for ‘strtol’
 #define strtoul(a, b, c) strtol(a, b, c)
                          ^
In file included from magic.c:41:0:
/usr/include/stdlib.h:183:17: note: previous declaration of ‘strtol’ was here
 extern long int strtol (const char *__restrict __nptr,
                 ^
In file included from magic.c:33:0:
/usr/include/string.h:409:14: error: expected identifier or ‘(’ before ‘int’
 extern char *strerror (int __errnum) __THROW;
              ^
file.h:451:8: error: expected ‘)’ before ‘>=’ token
  (((e) >= 0 && (e) < sys_nerr) ? sys_errlist[(e)] : "Unknown error")
        ^
file.h:451:32: error: expected ‘)’ before ‘?’ token
  (((e) >= 0 && (e) < sys_nerr) ? sys_errlist[(e)] : "Unknown error")
                                ^
make: *** [magic] Error 1

This can be fixed by adding #include <stdint.h> to magic.c

Failed to install on freebsd9

I tried to install the gem with an explicit ICU path on FreeBSD 9:

[root@freebsd ~]# gem install charlock_holmes -- --with-icu-dir=/usr/local/share/icu/4.8.1.1/
Building native extensions.  This could take a while...
ERROR:  Error installing charlock_holmes:
    ERROR: Failed to build gem native extension.

        /usr/local/rvm/rubies/ruby-1.9.2-p290/bin/ruby extconf.rb --with-icu-dir=/usr/local/share/icu/4.8.1.1/
checking for main() in -licui18n... no
checking for main() in -licui18n... no


***************************************************************************************
*********** icu required (brew install icu4c or apt-get install libicu-dev) ***********
***************************************************************************************
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.

Provided configuration options:
    --with-opt-dir
    --with-opt-include
    --without-opt-include=${opt-dir}/include
    --with-opt-lib
    --without-opt-lib=${opt-dir}/lib
    --with-make-prog
    --without-make-prog
    --srcdir=.
    --curdir
    --ruby=/usr/local/rvm/rubies/ruby-1.9.2-p290/bin/ruby
    --with-icu-dir
    --with-icu-include
    --without-icu-include=${icu-dir}/include
    --with-icu-lib
    --without-icu-lib=${icu-dir}/lib
    --with-icui18nlib
    --without-icui18nlib
    --with-icui18nlib
    --without-icui18nlib


Gem files will remain installed in /usr/local/rvm/gems/ruby-1.9.2-p290/gems/charlock_holmes-0.6.8 for inspection.
Results logged to /usr/local/rvm/gems/ruby-1.9.2-p290/gems/charlock_holmes-0.6.8/ext/charlock_holmes/gem_make.out

System information:

[root@freebsd ~]# uname -a
FreeBSD freebsd 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012     [email protected]:/usr/obj/usr/src/sys/GENERIC  amd64

[root@freebsd ~]# ruby -v
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-freebsd9.0]

[Patch] is required to do a gem install

This needs to be checked in the Rakefile and also documented. I had to dig into the code to discover that was my problem when the gem installation failed on a vanilla CentOS server.

Thanks for the work on this, it was a great quick-fix for an issue I was having with a project!

Install failure on RedHat

Using RedHat 6 I try and run a gem install charlock_holmes but it fails:

gem install charlock_holmes
Building native extensions.  This could take a while...
ERROR:  Error installing charlock_holmes:
    ERROR: Failed to build gem native extension.

/usr/bin/ruby extconf.rb
checking for main() in -licui18n... no
which: no brew in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
checking for main() in -licui18n... no


***************************************************************************************
*********** icu required (brew install icu4c or apt-get install libicu-dev) ***********
***************************************************************************************
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.

Provided configuration options:
    --with-opt-dir
    --without-opt-dir
    --with-opt-include
    --without-opt-include=${opt-dir}/include
    --with-opt-lib
    --without-opt-lib=${opt-dir}/lib
    --with-make-prog
    --without-make-prog
    --srcdir=.
    --curdir
    --ruby=/usr/bin/ruby
    --with-icu-dir
    --without-icu-dir
    --with-icu-include
    --without-icu-include=${icu-dir}/include
    --with-icu-lib
    --without-icu-lib=${icu-dir}/lib
    --with-icui18nlib
    --without-icui18nlib
    --with-icui18nlib
    --without-icui18nlib


Gem files will remain installed in /usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.9 for inspection.
Results logged to /usr/lib/ruby/gems/1.8/gems/charlock_holmes-0.6.9/ext/charlock_holmes/gem_make.out

When trying to import some files to postgresql that I've run conversion on I get errors...

Here are a couple of example errors...

invalid byte sequence for encoding "UTF8": 0xba
invalid byte sequence for encoding "UTF8": 0xd0 0x34

Here's how I do the re-encoding...

def convert_file_to_utf8(file_path)
  contents = File.read(file_path)
  detection = CharlockHolmes::EncodingDetector.detect(contents)
  utf8_encoded_content = CharlockHolmes::Converter.convert(contents, detection[:encoding], 'UTF-8')

  return utf8_encoded_content
end

Am I doing something wrong, is the gem not accounting for some of the characters in the file correctly, or is it something else entirely?
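
Both error messages point at bytes that are not valid UTF-8, so one thing worth checking before the import is whether the converted string is actually valid. This sketch uses only core Ruby (`String#scrub` with a block); `invalid_utf8_sequences` is a made-up helper name, and the sample strings simply reuse the bytes from the two errors above.

```ruby
# Hypothetical helper: list the invalid byte runs found when text is read as UTF-8.
def invalid_utf8_sequences(text)
  bad = []
  text.dup.force_encoding("UTF-8").scrub do |bytes|
    bad << bytes.unpack1("H*") # record the offending bytes as hex
    ""                         # the replacement is irrelevant; only the log is used
  end
  bad
end

p invalid_utf8_sequences("ab\xBAcd".b)
p invalid_utf8_sequences("x\xD0\x34y".b)
p invalid_utf8_sequences("plain ascii")
```

If the list is non-empty right after conversion, the detected source encoding was probably wrong for that file; checking `detection[:confidence]` before trusting `detection[:encoding]` may help narrow that down.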

Failed to install 0.6.8 on Ubuntu Server 11.10

I can't install your gem :/
Could you help me, please?

Here's the log:

Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

/usr/local/bin/ruby extconf.rb
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
-- tar zxvf file-5.08.tar.gz
-- ./configure --prefix=/home/gitlabhq/gitlabhq/vendor/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.8/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers. Check the mkmf.log file for more
details. You may need configuration options.

Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=/usr/local/bin/ruby
--with-icu-dir
--without-icu-dir
--with-icu-include
--without-icu-include=${icu-dir}/include
--with-icu-lib
--without-icu-lib=${icu-dir}/lib
--with-icui18nlib
--without-icui18nlib
--with-icui18nlib
--without-icui18nlib
extconf.rb:7:in `sys': ./configure --prefix=/home/gitlabhq/gitlabhq/vendor/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.8/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic failed, please report issue on http://github.com/brianmario/charlock_holmes (RuntimeError)
from extconf.rb:60:in `block (2 levels) in <main>'
from extconf.rb:59:in `chdir'
from extconf.rb:59:in `block in <main>'
from extconf.rb:55:in `chdir'
from extconf.rb:55:in `<main>'

Gem files will remain installed in /home/gitlabhq/gitlabhq/vendor/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.8 for inspection.
Results logged to /home/gitlabhq/gitlabhq/vendor/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.8/ext/charlock_holmes/gem_make.out

fails to build

Hi,
I have the following problem building charlock_holmes.

I'm using Gentoo with ruby 1.9.3p125.

Can someone please tell me what I'm doing wrong or what's missing?

Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

        /usr/bin/ruby19 extconf.rb 
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
  -- tar zxvf file-5.08.tar.gz
  -- ./configure --prefix=/home/gitlab/.gem/ruby/1.9.1/gems/charlock_holmes-0.6.8/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
  -- make -C src install
  -- make -C magic install
checking for main() in -lmagic_ext... yes
checking for magic.h... yes
creating Makefile

make
compiling converter.c
common.h:23:14: warning: ‘charlock_new_str’ defined but not used
common.h:32:14: warning: ‘charlock_new_str2’ defined but not used
compiling encoding_detector.c
common.h:14:14: warning: ‘charlock_new_enc_str’ defined but not used
compiling ext.c
common.h:14:14: warning: ‘charlock_new_enc_str’ defined but not used
common.h:23:14: warning: ‘charlock_new_str’ defined but not used
common.h:32:14: warning: ‘charlock_new_str2’ defined but not used
linking shared-object charlock_holmes.so
converter.o: In function `rb_converter_convert':
converter.c:(.text+0x80): undefined reference to `ucnv_convert_48'
converter.c:(.text+0xc0): undefined reference to `ucnv_convert_48'
converter.c:(.text+0x139): undefined reference to `u_errorName_48'
encoding_detector.o: In function `rb_get_supported_encodings':
encoding_detector.c:(.text+0x9f): undefined reference to `uenum_count_48'
encoding_detector.c:(.text+0xca): undefined reference to `uenum_next_48'
encoding_detector.c:(.text+0x12e): undefined reference to `uenum_next_48'
encoding_detector.c:(.text+0x16d): undefined reference to `uenum_next_48'
encoding_detector.c:(.text+0x1b1): undefined reference to `uenum_next_48'
encoding_detector.c:(.text+0x1fd): undefined reference to `uenum_next_48'
encoding_detector.o:encoding_detector.c:(.text+0x23e): more undefined references to `uenum_next_48' follow
encoding_detector.o: In function `rb_encdec__alloc':
encoding_detector.c:(.text+0x774): undefined reference to `u_errorName_48'
./libmagic_ext.a(compress.o): In function `uncompressgzipped':
/home/gitlab/.gem/ruby/1.9.1/gems/charlock_holmes-0.6.8/ext/charlock_holmes/src/file-5.08/src/compress.c:357: undefined reference to `inflateInit2_'
/home/gitlab/.gem/ruby/1.9.1/gems/charlock_holmes-0.6.8/ext/charlock_holmes/src/file-5.08/src/compress.c:363: undefined reference to `inflate'
/home/gitlab/.gem/ruby/1.9.1/gems/charlock_holmes-0.6.8/ext/charlock_holmes/src/file-5.08/src/compress.c:370: undefined reference to `inflateEnd'
collect2: ld returned 1 exit status
make: *** [charlock_holmes.so] Error 1

Heroku Support

First off, great gem. Really like how quick and powerful it is. That being said, it took me about 20 hours to come up with a good deployment plan for it on Heroku. I would have switched to EC2 or Digital Ocean if I had a choice because this issue was so bad. So here are a few issues...

  1. Heroku runs Ubuntu 10.04, which only has icu4c v42 available unless you can run apt-get update, which isn't possible. This makes Steve Tooke's solution unusable.
  2. This solution uses a Heroku buildpack to install the correct version of icu4c into vendor and installs the gem with environment variables, which is cool but relies on hosting the package on S3 and maintaining it.
  3. The bundled version of the gem is not an acceptable answer, as deploys take 15+ minutes.

So is there any way to make the gem compatible with v42 of icu4c, or to redesign the gem so that deploys to Heroku are more manageable and take an acceptable amount of time?

charlock_holmes build problem with libicui18n after Ubuntu lucid upgraded to Ubuntu 12.04.2 LTS precise

I upgraded my old Ubuntu 10.04.4 LTS (Lucid Lynx) server to Ubuntu 12.04.2 LTS (Precise). As a result, the shared library the gem was originally linked against, libicui18n.so.42, was replaced by libicui18n.so.48, which broke the gem.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.2 LTS
Release:        12.04
Codename:       precise

$ ldd  /home/git/gitlab/vendor/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.9/lib/charlock_holmes/charlock_holmes.so | grep 'not found'
        libicui18n.so.42 => not found

I've tried uninstalling and reinstalling the gem, but that didn't fix the problem; I still see the same error message. Any ideas on the best way to fix this?
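
The `ldd` check above can be scripted when several native extensions need checking after a distribution upgrade. This is only a minimal Ruby sketch, assuming `ldd` output in the format shown above; `missing_shared_libs` is a hypothetical helper name and not part of charlock_holmes.

```ruby
# Return the names of shared libraries that ldd reports as "not found".
# Takes the raw text printed by `ldd /path/to/extension.so`.
def missing_shared_libs(ldd_output)
  ldd_output.each_line.map do |line|
    # Lines look like: "        libicui18n.so.42 => not found"
    line[/^\s*(\S+)\s*=>\s*not found/, 1]
  end.compact
end
```

Passing it the output of the `ldd` command from this report should list libicui18n.so.42. An ICU library in that list most likely means the compiled extension still refers to an ICU version that is no longer installed, so the extension has to be rebuilt against the installed one.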

please make a new release

There is already a fix in master:

fix lib64 library path (issue #8)

but by default "bundle" installs the latest release, which does not include this fix. That makes installing e.g. GitLab on RPM-based x86_64 systems (Fedora, CentOS, openSUSE) very annoying.

Unable to install 'charlock_holmes_bundle_icu' on Mavericks

I get the following error when trying to install on OS X Mavericks:

Installing charlock_holmes_bundle_icu (0.6.9.2)
Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

    /opt/rubies/ruby-2.0.0-p247/bin/ruby extconf.rb
  -- tar zxvf icu4c-49_1_2-src.tgz
  -- LDFLAGS= CXXFLAGS="-O2 -fPIC" CFLAGS="-O2 -fPIC" ./configure --prefix=/Users/silasjmatson/.gem/ruby/2.0.0/gems/charlock_holmes_bundle_icu-0.6.9.2/ext/charlock_holmes/dst/ --disable-tests --disable-samples --disable-icuio --disable-extras --disable-layout --enable-static --disable-shared
  -- make install
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
    --with-opt-dir
    --with-opt-include
    --without-opt-include=${opt-dir}/include
    --with-opt-lib
    --without-opt-lib=${opt-dir}/lib
    --with-make-prog
    --without-make-prog
    --srcdir=.
    --curdir
    --ruby=/opt/rubies/ruby-2.0.0-p247/bin/ruby
extconf.rb:7:in `sys': make install failed, please report issue on http://github.com/brianmario/charlock_holmes (RuntimeError)
    from extconf.rb:33:in `block (2 levels) in <main>'
    from extconf.rb:31:in `chdir'
    from extconf.rb:31:in `block in <main>'
    from extconf.rb:27:in `chdir'
    from extconf.rb:27:in `<main>'


Gem files will remain installed in /Users/silasjmatson/.gem/ruby/2.0.0/gems/charlock_holmes_bundle_icu-0.6.9.2 for inspection.
Results logged to /Users/silasjmatson/.gem/ruby/2.0.0/gems/charlock_holmes_bundle_icu-0.6.9.2/ext/charlock_holmes/gem_make.out

An error occurred while installing charlock_holmes_bundle_icu (0.6.9.2), and Bundler cannot continue.
Make sure that `gem install charlock_holmes_bundle_icu -v '0.6.9.2'` succeeds before bundling.

installation fails for gemsets

I'm having trouble installing on OS X into a gemset other than the default.

Here are the steps (I tried on both clean and dev machines):

Installation works

brew install icu4c
cd ~/myproject
rvm gemset clear
rvm gemset name # produces: /Users/sosedoff/.rvm/gems/ruby-1.9.2-p290
gem install charlock_holmes

Installation fails

brew install icu4c
cd ~/myproject
rvm gemset create test
rvm gemset use test
rvm gemset name # produces: test
gem install charlock_holmes

Error:

Building native extensions.  This could take a while...
ERROR:  Error installing charlock_holmes:
    ERROR: Failed to build gem native extension.

        /Users/sosedoff/.rvm/rubies/ruby-1.9.2-p290/bin/ruby extconf.rb
checking for main() in -licui18n... no
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
  -- tar zxvf file-5.08.tar.gz
  -- ./configure --prefix=/Users/sosedoff/.rvm/gems/ruby-1.9.2-p290@test/gems/charlock_holmes-0.6.7/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
  -- make
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.

Provided configuration options:
    --with-opt-dir
    --without-opt-dir
    --with-opt-include
    --without-opt-include=${opt-dir}/include
    --with-opt-lib
    --without-opt-lib=${opt-dir}/lib
    --with-make-prog
    --without-make-prog
    --srcdir=.
    --curdir
    --ruby=/Users/sosedoff/.rvm/rubies/ruby-1.9.2-p290/bin/ruby
    --with-icu-dir
    --without-icu-dir
    --with-icu-include
    --without-icu-include=${icu-dir}/include
    --with-icu-lib
    --without-icu-lib=${icu-dir}/lib
    --with-icui18nlib
    --without-icui18nlib
    --with-icui18nlib
    --without-icui18nlib
extconf.rb:7:in `sys': make failed, please report issue on http://github.com/brianmario/charlock_holmes (RuntimeError)
    from extconf.rb:61:in `block (2 levels) in <main>'
    from extconf.rb:59:in `chdir'
    from extconf.rb:59:in `block in <main>'
    from extconf.rb:55:in `chdir'
    from extconf.rb:55:in `<main>'


Gem files will remain installed in /Users/sosedoff/.rvm/gems/ruby-1.9.2-p290@test/gems/charlock_holmes-0.6.7 for inspection.
Results logged to /Users/sosedoff/.rvm/gems/ruby-1.9.2-p290@test/gems/charlock_holmes-0.6.7/ext/charlock_holmes/gem_make.out

I also tried to run (as described in readme) with option:

gem install charlock_holmes -- --with-icu-dir=/usr/local/Cellar/icu4c/4.4.1

Still no dice.

I think there is a problem with linking when using a gemset.

Any thoughts on this?

Failing to load libicu on debian wheezy

I recently upgraded to Debian wheezy. The libicu package is now called libicu48.

I have a GitLab instance installed, and when I try to spawn a Passenger instance, I get the following error:

libicui18n.so.44: cannot open shared object file: No such file or directory - /home/git/gitlab/vendor/bundle/ruby/1.9.1/gems/charlock_holmes-0.6.9/lib/charlock_holmes/charlock_holmes.so

I know that GitLab explicitly requires charlock_holmes version 0.6.9, but I am creating the ticket here because I think it's mainly a charlock_holmes issue.

However, installing the gem itself works. And if you tell me that I'm wrong here and should ask the GitLab maintainers... okay. ;)

Build fails to link `-licuuc -lz` on gentoo

gem install fails with these build errors (and warnings)

host ~ # gem install charlock_holmes
Building native extensions.  This could take a while...
ERROR:  Error installing charlock_holmes:
        ERROR: Failed to build gem native extension.

        /usr/bin/ruby19 extconf.rb
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
  -- tar zxvf file-5.08.tar.gz
  -- ./configure --prefix=/usr/local/lib64/ruby/gems/1.9.1/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
  -- patch -p0 < ../file-soft-check.patch
  -- make -C src install
  -- make -C magic install
checking for main() in -lmagic_ext... yes
checking for magic.h... yes
creating Makefile

make
compiling encoding_detector.c
common.h:14:14: warning: 'charlock_new_enc_str' defined but not used [-Wunused-function]
compiling converter.c
common.h:23:14: warning: 'charlock_new_str' defined but not used [-Wunused-function]
common.h:32:14: warning: 'charlock_new_str2' defined but not used [-Wunused-function]
compiling ext.c
common.h:14:14: warning: 'charlock_new_enc_str' defined but not used [-Wunused-function]
common.h:23:14: warning: 'charlock_new_str' defined but not used [-Wunused-function]
common.h:32:14: warning: 'charlock_new_str2' defined but not used [-Wunused-function]
compiling transliterator.cpp
common.h:14:14: warning: 'VALUE charlock_new_enc_str(const char*, size_t, void*)' defined but not used [-Wunused-function]
common.h:32:14: warning: 'VALUE charlock_new_str2(const char*)' defined but not used [-Wunused-function]
linking shared-object charlock_holmes/charlock_holmes.so
encoding_detector.o: In function `rb_get_supported_encodings':
encoding_detector.c:(.text+0xa0): undefined reference to `uenum_count_49'
encoding_detector.c:(.text+0xc1): undefined reference to `uenum_next_49'
encoding_detector.c:(.text+0x12a): undefined reference to `uenum_next_49'
encoding_detector.c:(.text+0x16c): undefined reference to `uenum_next_49'
encoding_detector.c:(.text+0x1ad): undefined reference to `uenum_next_49'
encoding_detector.c:(.text+0x1f7): undefined reference to `uenum_next_49'
encoding_detector.o:encoding_detector.c:(.text+0x234): more undefined references to `uenum_next_49' follow
encoding_detector.o: In function `rb_encdec__alloc':
encoding_detector.c:(.text+0x75d): undefined reference to `u_errorName_49'
converter.o: In function `rb_converter_convert':
converter.c:(.text+0xb1): undefined reference to `ucnv_convert_49'
converter.c:(.text+0xf1): undefined reference to `ucnv_convert_49'
converter.c:(.text+0x18a): undefined reference to `u_errorName_49'
transliterator.o: In function `rb_transliterator_id_list':
transliterator.cpp:(.text+0x15a): undefined reference to `u_errorName_49'
transliterator.o: In function `rb_transliterator_transliterate':
transliterator.cpp:(.text+0x242): undefined reference to `icu_49::UnicodeString::UnicodeString(char const*, int)'
transliterator.cpp:(.text+0x26e): undefined reference to `icu_49::UnicodeString::~UnicodeString()'
transliterator.cpp:(.text+0x287): undefined reference to `icu_49::UMemory::operator new(unsigned long)'
transliterator.cpp:(.text+0x29d): undefined reference to `icu_49::UnicodeString::UnicodeString(char const*, int)'
transliterator.cpp:(.text+0x2e5): undefined reference to `icu_49::UnicodeString::toUTF8(icu_49::ByteSink&) const'
transliterator.cpp:(.text+0x339): undefined reference to `icu_49::ByteSink::~ByteSink()'
transliterator.cpp:(.text+0x396): undefined reference to `u_errorName_49'
transliterator.cpp:(.text+0x40b): undefined reference to `icu_49::UMemory::operator delete(void*)'
transliterator.cpp:(.text+0x442): undefined reference to `icu_49::UnicodeString::~UnicodeString()'
transliterator.o: In function `icu_49::StringByteSink<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >::~StringByteSink()':
transliterator.cpp:(.text._ZN6icu_4914StringByteSinkISsED2Ev[_ZN6icu_4914StringByteSinkISsED5Ev]+0xf): undefined reference to `icu_49::ByteSink::~ByteSink()'
transliterator.o: In function `icu_49::StringByteSink<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >::~StringByteSink()':
transliterator.cpp:(.text._ZN6icu_4914StringByteSinkISsED0Ev[_ZN6icu_4914StringByteSinkISsED5Ev]+0x13): undefined reference to `icu_49::ByteSink::~ByteSink()'
transliterator.cpp:(.text._ZN6icu_4914StringByteSinkISsED0Ev[_ZN6icu_4914StringByteSinkISsED5Ev]+0x1c): undefined reference to `icu_49::UMemory::operator delete(void*)'
transliterator.o:(.data.rel.ro._ZTVN6icu_4914StringByteSinkISsEE[vtable for icu_49::StringByteSink<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >]+0x28): undefined reference to `icu_49::ByteSink::GetAppendBuffer(int, int, char*, int, int*)'
transliterator.o:(.data.rel.ro._ZTVN6icu_4914StringByteSinkISsEE[vtable for icu_49::StringByteSink<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >]+0x30): undefined reference to `icu_49::ByteSink::Flush()'
transliterator.o:(.data.rel.ro._ZTIN6icu_4914StringByteSinkISsEE[typeinfo for icu_49::StringByteSink<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >]+0x10): undefined reference to `typeinfo for icu_49::ByteSink'
./libmagic_ext.a(compress.o): In function `uncompressgzipped':
/usr/local/lib64/ruby/gems/1.9.1/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/src/file-5.08/src/compress.c:357: undefined reference to `inflateInit2_'
/usr/local/lib64/ruby/gems/1.9.1/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/src/file-5.08/src/compress.c:363: undefined reference to `inflate'
/usr/local/lib64/ruby/gems/1.9.1/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/src/file-5.08/src/compress.c:370: undefined reference to `inflateEnd'
collect2: ld returned 1 exit status
make: *** [charlock_holmes.so] Error 1


Gem files will remain installed in /usr/local/lib64/ruby/gems/1.9.1/gems/charlock_holmes-0.6.9.4 for inspection.
Results logged to /usr/local/lib64/ruby/gems/1.9.1/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/gem_make.out

I have also tried with --with-icu-dir to no avail.

Also see https://github.com/gitlabhq/gitlabhq/issues/1972
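
ICU appends its major version to exported symbol names (uenum_next_49, icu_49::UnicodeString), so the suffix in the errors above shows which ICU headers the extension was compiled against, while the inflate* symbols belong to zlib. A rough Ruby sketch for summarizing a long linker log along those lines; `tally_undefined_references` is a hypothetical helper, and the "other" bucket is simply everything without an ICU version marker.

```ruby
# Count "undefined reference" linker errors, grouped by the ICU version
# embedded in the symbol name. Symbols without an ICU marker go to "other".
def tally_undefined_references(linker_output)
  counts = Hash.new(0)
  linker_output.each_line do |line|
    symbol = line[/undefined reference to `(.*)'/, 1]
    next unless symbol
    # C++ symbols live in namespace icu_NN; C symbols end in _NN.
    version = symbol[/icu_(\d+)::/, 1] || symbol[/_(\d+)\z/, 1]
    counts[version ? "ICU #{version}" : "other"] += 1
  end
  counts
end
```

If every ICU entry carries the same version and that version matches the installed headers, the library is probably just missing from the link line (here -licuuc, and -lz for the zlib symbols) rather than mismatched.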

"gem install charlock_holmes -- --with-icu-dir=/usr/local/opt/icu4c" gives errors

gem install charlock_holmes -- --with-icu-dir=/usr/local/opt/icu4c
Upgraded http://rubygems.org/ to HTTPS
Building native extensions with: '--with-icu-dir=/usr/local/opt/icu4c'
This could take a while...
Successfully installed charlock_holmes-0.6.9.4
unable to convert "\xCF" from ASCII-8BIT to UTF-8 for ext/charlock_holmes/dst/bin/file, skipping
unable to convert "\xE1" from ASCII-8BIT to UTF-8 for ext/charlock_holmes/src/file-5.08/ChangeLog, skipping
unable to convert "\xBD" from ASCII-8BIT to UTF-8 for ext/charlock_holmes/src/file-5.08/magic/Magdir/filesystems, skipping
unable to convert "\xE1" from ASCII-8BIT to UTF-8 for ext/charlock_holmes/src/file-5.08/magic/Magdir/linux, skipping
unable to convert "\xE1" from ASCII-8BIT to UTF-8 for ext/charlock_holmes/src/file-5.08/magic/Magdir/natinst, skipping
unable to convert "\xE5" from ASCII-8BIT to UTF-8 for ext/charlock_holmes/src/file-5.08/magic/Magdir/riff, skipping
unable to convert "\xEE" from ASCII-8BIT to UTF-8 for ext/charlock_holmes/src/file-5.08/magic/Magdir/wordprocessors, skipping
unable to convert "\xCF" from ASCII-8BIT to UTF-8 for ext/charlock_holmes/src/file-5.08/src/file, skipping
unable to convert "\xCF" from ASCII-8BIT to UTF-8 for test/fixtures/hello_world, skipping

//////////

When using in ruby:
/Users/D3MZ/.rvm/rubies/ruby-2.0.0-p0/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require': dlopen(/Users/D3MZ/.rvm/gems/ruby-2.0.0-p0/gems/charlock_holmes-0.6.9.4/lib/charlock_holmes/charlock_holmes.bundle, 9): Library not loaded: /usr/local/opt/icu4c/lib/libicui18n.50.1.dylib (LoadError)
  Referenced from: /Users/D3MZ/.rvm/gems/ruby-2.0.0-p0/gems/charlock_holmes-0.6.9.4/lib/charlock_holmes/charlock_holmes.bundle
  Reason: image not found - /Users/D3MZ/.rvm/gems/ruby-2.0.0-p0/gems/charlock_holmes-0.6.9.4/lib/charlock_holmes/charlock_holmes.bundle
from /Users/D3MZ/.rvm/rubies/ruby-2.0.0-p0/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
from /Users/D3MZ/.rvm/gems/ruby-2.0.0-p0/gems/charlock_holmes-0.6.9.4/lib/charlock_holmes.rb:1:in `<top (required)>'
from /Users/D3MZ/.rvm/rubies/ruby-2.0.0-p0/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:110:in `require'
from /Users/D3MZ/.rvm/rubies/ruby-2.0.0-p0/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:110:in `rescue in require'
from /Users/D3MZ/.rvm/rubies/ruby-2.0.0-p0/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:35:in `require'
from /Users/D3MZ/RubymineProjects/shifthub scripts/world_cities.rb:2:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'

unable to compile 0.6.9.4 on RHEL

On a RHEL server as follows:

-bash-3.2$ lsb_release -i -r
Distributor ID: RedHatEnterpriseServer
Release:    5.6

I am attempting to install charlock_holmes 0.6.9.4. Installing 0.6.9 works fine; however, running gem install charlock_holmes -v '0.6.9.4' results in:

ERROR:  Error installing charlock_holmes:
    ERROR: Failed to build gem native extension.

        /home/jenkins/.rvm/rubies/ruby-1.9.3-p392/bin/ruby extconf.rb
checking for main() in -licui18n... yes
checking for main() in -licui18n... yes
checking for unicode/ucnv.h... yes
  -- tar zxvf file-5.08.tar.gz
  -- ./configure --prefix=/mnt/gitdev01cnc/jenkins/.rvm/gems/ruby-1.9.3-p392@gitlabhq/gems/charlock_holmes-0.6.9.4/ext/charlock_holmes/dst/ --disable-shared --enable-static --with-pic
  -- patch -p0 < ../file-soft-check.patch
  -- make -C src install
  -- make -C magic install
checking for main() in -lmagic_ext... yes
checking for magic.h... yes
creating Makefile

make
compiling converter.c
In file included from converter.c:2:
common.h:41:7: warning: no newline at end of file
converter.c: In function ‘rb_converter_convert’:
converter.c:7: warning: unused parameter ‘self’
converter.c: At top level:
common.h:24: warning: ‘charlock_new_str’ defined but not used
common.h:33: warning: ‘charlock_new_str2’ defined but not used
compiling encoding_detector.c
In file included from encoding_detector.c:3:
common.h:41:7: warning: no newline at end of file
common.h:15: warning: ‘charlock_new_enc_str’ defined but not used
compiling ext.c
In file included from ext.c:1:
common.h:41:7: warning: no newline at end of file
ext.c:15:2: warning: no newline at end of file
common.h:15: warning: ‘charlock_new_enc_str’ defined but not used
common.h:24: warning: ‘charlock_new_str’ defined but not used
common.h:33: warning: ‘charlock_new_str2’ defined but not used
compiling transliterator.cpp
cc1plus: warning: command line option "-Wdeclaration-after-statement" is valid for C/ObjC but not for C++
cc1plus: warning: command line option "-Wimplicit-function-declaration" is valid for C/ObjC but not for C++
In file included from transliterator.cpp:1:
common.h:41:7: warning: no newline at end of file
transliterator.cpp:37: warning: unused parameter ‘self’
transliterator.cpp: In function ‘VALUE rb_transliterator_transliterate(VALUE, VALUE, VALUE)’:
transliterator.cpp:108: error: ‘StringByteSink’ was not declared in this scope
transliterator.cpp:108: error: expected primary-expression before ‘>’ token
transliterator.cpp:108: error: ‘sink’ was not declared in this scope
transliterator.cpp:109: error: ‘class icu_3_6::UnicodeString’ has no member named ‘toUTF8’
transliterator.cpp: At global scope:
transliterator.cpp:78: warning: unused parameter ‘self’
common.h:14: warning: ‘VALUE charlock_new_enc_str(const char*, size_t, void*)’ defined but not used
common.h:32: warning: ‘VALUE charlock_new_str2(const char*)’ defined but not used
make: *** [transliterator.o] Error 1

GitLab/charlock issues with latest libicui18n

When installing the gem:
unable to convert "\xD0" from ASCII-8BIT to UTF-8 for lib/charlock_holmes/charlock_holmes.so, skipping

Libs installed:

ls /usr/lib/libicui18n.so*

/usr/lib/libicui18n.so /usr/lib/libicui18n.so.52 /usr/lib/libicui18n.so.52.1

gitlab env info output:

$ bundle exec rake gitlab:env:info RAILS_ENV=production
rake aborted!
libicui18n.so.51: cannot open shared object file: No such file or directory - /home/git/gitlab/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.6.9.4/lib/charlock_holmes/charlock_holmes.so

charlock_holmes wants libicui18n.so.51!
If I link .so.51 to the actual .so as follows:

ln -s /usr/lib/libicui18n.so /usr/lib/libicui18n.so.51

the output is:

$ bundle exec rake gitlab:env:info RAILS_ENV=production
rake aborted!
/home/git/gitlab/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.6.9.4/lib/charlock_holmes/charlock_holmes.so: undefined symbol: _ZTIN6icu_518ByteSinkE - /home/git/gitlab/vendor/bundle/ruby/2.0.0/gems/charlock_holmes-0.6.9.4/lib/charlock_holmes/charlock_holmes.so

Using kernel 3.10.9, qt4 4.8.5-3, icu 52.1-1

NOTE:
If I go back to the old ICU lib version:

cd /var/cache/pacman/pkg/

pacman -U icu-51.2-1-x86_64.pkg.tar.xz

It works great.
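
The symlink cannot work because the ICU version is part of the symbol names themselves, not only of the file name. In the Itanium C++ name mangling used by GCC, nested names are stored as length-prefixed components, so `_ZTIN6icu_518ByteSinkE` is the typeinfo for `icu_51::ByteSink`, which an ICU 52 library does not export. A small Ruby sketch for reading the namespace out of such a symbol; `icu_namespace` is a hypothetical helper and only handles this simple case.

```ruby
# Extract the ICU namespace (for example "icu_51") from an Itanium-mangled
# C++ symbol. Returns nil when the symbol has no icu_ component.
def icu_namespace(mangled)
  # Components are encoded as <length><identifier>, e.g. "6icu_51".
  pos = mangled.index(/\d+icu_/)
  return nil unless pos
  length = mangled[pos..-1][/\A\d+/]
  mangled[pos + length.length, length.to_i]
end
```

For the symbol in the error above this yields the 51 namespace, which is consistent with the gem having been compiled against ICU 51 and needing a rebuild after the upgrade to 52.1.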

"jobs".detect_encoding! => ArgumentError: unknown encoding name - IBM420_ltr

I don't know if this is a charlock_holmes issue or an ICU issue. For some reason the magic string "jobs" causes an explosion. I have no idea why it would insist that it is IBM420_ltr (which I've never even heard of before) when the singular version is not:

"job".detect_encoding => {:type=>:text, :encoding=>"UTF-8", :confidence=>10}

I tried giving it a hint but no-go:

"jobs".detect_encoding("UTF-8") => {:type=>:text, :encoding=>"IBM420_ltr", :confidence=>60, :language=>"ar"}
