ocreract.jl's People

Contributors

leferrad, sunoru, timholy

ocreract.jl's Issues

Allow handling of user-words and user-patterns

According to the documentation, tesseract accepts a file for user-words and another for user-patterns. These options could be supported in this wrapper:

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  --psm NUM             Specify page segmentation mode.
  --oem NUM             Specify OCR Engine mode.
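
One way the wrapper could forward these options is to append them to the command line it builds. The following is a minimal sketch using only Julia's `Cmd` interpolation. `build_tesseract_cmd` and its keyword names are hypothetical and not part of OCReract's current API; the file names are placeholders.

```julia
# Hypothetical helper (not part of OCReract): builds a tesseract command line
# that optionally includes --user-words and --user-patterns.
function build_tesseract_cmd(input::AbstractString, output_base::AbstractString;
                             user_words::Union{Nothing,AbstractString}=nothing,
                             user_patterns::Union{Nothing,AbstractString}=nothing,
                             lang::AbstractString="eng")
    args = String[input, output_base, "-l", lang]
    if user_words !== nothing
        append!(args, ["--user-words", user_words])
    end
    if user_patterns !== nothing
        append!(args, ["--user-patterns", user_patterns])
    end
    # Interpolating a vector into a Cmd passes each element as its own argument,
    # so paths containing spaces need no manual quoting.
    return `tesseract $args`
end

# Placeholder file names, for illustration only.
cmd = build_tesseract_cmd("page.png", "out"; user_words="words.txt")
println(cmd)
```

Building the argument list as a vector keeps optional flags easy to add or omit without string concatenation.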

CI is not working for the nightly version

As stated in #16, the CI pipeline breaks during test execution on the nightly version because of how Cassette (used by SimpleMocks) behaves there. We should see whether this can be solved in one of these ways:

  • Fix it by splitting the tests into suites that avoid SimpleMocks on the nightly version
  • Change the way of testing to avoid using Cassette or SimpleMocks
  • Disable CI for the nightly version (given that it does work with the other Julia versions)
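
The first option above could be sketched as a version gate in the test runner. This assumes nightly builds can be recognised by a non-empty `prerelease` field on `VERSION`; note that this would also match release candidates. `is_prerelease` and the testset names are hypothetical, and the placeholder tests stand in for the real suites.

```julia
using Test

# Nightly builds report versions such as v"1.12.0-DEV.100", whose `prerelease`
# tuple is non-empty; tagged releases have an empty tuple.
is_prerelease(v::VersionNumber) = !isempty(v.prerelease)

@testset "OCReract (sketch)" begin
    @testset "tests without mocks" begin
        @test 1 + 1 == 2  # placeholder for tests that do not need SimpleMocks
    end

    if is_prerelease(VERSION)
        @info "Skipping mock-based tests on prerelease Julia" VERSION
    else
        @testset "tests with mocks" begin
            @test true  # placeholder for tests that rely on SimpleMocks
        end
    end
end
```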

Broken installation on Windows

It seems that installation is broken on Windows. Here's the stack trace I'm seeing:

julia> using OCReract
[ Info: Precompiling OCReract [c9880795-194d-450c-832d-1e8a03a8ecd1]
┌ Error: Tesseract is not properly installed. Command tesseract not recognized
└ @ OCReract C:\Users\kawcz\.julia\packages\OCReract\scjW5\src\tesseract.jl:18
┌ Error: Exception while generating log record in module OCReract at C:\Users\kawcz\.julia\packages\OCReract\scjW5\src\tesseract.jl:31
│   exception =
│    IOError: could not spawn `tesseract --version`: no such file or directory (ENOENT)
│    Stacktrace:
│      [1] _spawn_primitive(file::String, cmd::Cmd, stdio::Vector{Union{RawFD, Base.Libc.WindowsRawSocket, IO}})
│        @ Base .\process.jl:128
│      [2] #725
│        @ .\process.jl:139 [inlined]
│      [3] setup_stdios(f::Base.var"#725#726"{Cmd}, stdios::Vector{Union{RawFD, Base.Libc.WindowsRawSocket, IO}})
│        @ Base .\process.jl:223
│      [4] _spawn
│        @ .\process.jl:138 [inlined]
│      [5] open(cmds::Cmd, stdio::Base.DevNull; write::Bool, read::Bool)
│        @ Base .\process.jl:393
│      [6] open
│        @ .\process.jl:383 [inlined]
│      [7] open(cmds::Cmd, mode::String, stdio::Base.DevNull)
│        @ Base .\process.jl:364
│      [8] read(cmd::Cmd)
│        @ Base .\process.jl:447
│      [9] read(cmd::Cmd, #unused#::Type{String})
│        @ Base .\process.jl:458
│     [10] get_tesseract_version()
│        @ OCReract C:\Users\kawcz\.julia\packages\OCReract\scjW5\src\tesseract.jl:26
│     [11] top-level scope
│        @ logging.jl:360
│     [12] include(mod::Module, _path::String)
│        @ Base .\Base.jl:419
│     [13] include(x::String)
│        @ OCReract C:\Users\kawcz\.julia\packages\OCReract\scjW5\src\OCReract.jl:1
│     [14] top-level scope
│        @ C:\Users\kawcz\.julia\packages\OCReract\scjW5\src\OCReract.jl:6
│     [15] include
│        @ .\Base.jl:419 [inlined]
│     [16] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
│        @ Base .\loading.jl:1554
│     [17] top-level scope
│        @ stdin:1
│     [18] eval
│        @ .\boot.jl:368 [inlined]
│     [19] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
│        @ Base .\loading.jl:1428
│     [20] include_string(m::Module, txt::String, fname::String)
│        @ Base .\loading.jl:1438
│     [21] exec_options(opts::Base.JLOptions)
│        @ Base .\client.jl:301
│     [22] _start()
│        @ Base .\client.jl:522
└ @ OCReract C:\Users\kawcz\.julia\packages\OCReract\scjW5\src\tesseract.jl:31
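
The `ENOENT` error in this trace means the `tesseract` executable could not be found when Julia tried to spawn it. A quick check from the REPL, using only `Sys.which` from Julia's standard library, is sketched below. `find_executable` is a hypothetical helper name.

```julia
# Returns the full path of `name` if it can be found on PATH, otherwise `nothing`.
find_executable(name::String) = Sys.which(name)

path = find_executable("tesseract")
if path === nothing
    println("tesseract was not found on PATH; Julia cannot spawn it")
else
    println("tesseract found at: ", path)
end
```

If this prints the "not found" message, the problem is with the environment of the Julia process and not with the package code itself.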

Compatibility with Julia 1.0+

So far, the package has only been tested on Julia 1.4, but from a quick inspection it could be compatible with 1.0+. To ensure this compatibility, the CI configuration needs to be fixed, and then the package can be registered for all these versions.

Option to return confidence levels?

Hi, I am wondering if there is a way to return the per-character prediction confidence? I couldn't find anything related to confidence levels in the docs. If I've missed something, please let me know.

There was an issue posted about this at ropensci/tesseract#8, which was resolved by calling the ocr_data() function or otherwise setting HOCR = TRUE in ocr().

Is there an equivalent approach to this in OCReract?
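
For reference, the Tesseract command line can write a TSV report when `tsv` is given as the output config (for example `tesseract image.png out tsv`), and that report has a `conf` column with one confidence value per recognised word, not per character. Whether OCReract exposes this is not stated here, so the sketch below only parses such a report in plain Julia. `parse_tesseract_tsv` is a hypothetical helper, and the sample rows are hand-written for illustration, not real OCR output.

```julia
# Hypothetical helper: extracts (word, confidence) pairs from Tesseract TSV text.
function parse_tesseract_tsv(tsv::AbstractString)
    lines = split(tsv, r"\r?\n"; keepempty=false)
    isempty(lines) && return Tuple{String,Float64}[]
    header = split(lines[1], '\t')
    conf_idx = findfirst(==("conf"), header)
    text_idx = findfirst(==("text"), header)
    (conf_idx === nothing || text_idx === nothing) &&
        error("TSV header must contain 'conf' and 'text' columns")

    words = Tuple{String,Float64}[]
    for line in lines[2:end]
        fields = split(line, '\t')
        length(fields) < max(conf_idx, text_idx) && continue
        conf = parse(Float64, fields[conf_idx])
        text = strip(fields[text_idx])
        # Rows for pages, blocks and lines carry conf = -1 and no text; skip them.
        (conf < 0 || isempty(text)) && continue
        push!(words, (String(text), conf))
    end
    return words
end

# Hand-written sample in the TSV column layout, for illustration only.
sample = join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
    "1\t1\t0\t0\t0\t0\t0\t0\t200\t50\t-1\t",
    "5\t1\t1\t1\t1\t1\t10\t10\t80\t30\t96\tHello",
    "5\t1\t1\t1\t1\t2\t100\t10\t90\t30\t91.5\tworld",
], '\n')

for (word, conf) in parse_tesseract_tsv(sample)
    println(word, " => ", conf)
end
```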

Create Tesseract_jll package

Ideally the dependency on Tesseract OCR would be handled by creating a _jll library using BinaryBuilder. That way the entire package would always be available simply through Pkg.add.

In the meantime, #19 clarifies the requirements.

Can't get it to work on Windows

Hi,
I have installed Tesseract (version 5.4.0) and can extract text using CMD commands. Unfortunately, the Julia package (v1.3.1) does not work for me. I am running Julia version 1.9.0 on Windows 11.

  [916415d5] Images v0.26.1
  [c9880795] OCReract v1.3.1

I have also run the tests and got the following output:

Testing Running tests...
┌ Error: Error ocurred while running Tesseract! Base.IOError("could not spawn `tesseract 'C:\\Users\\elias\\.julia\\packages\\OCReract\\Ax3XS\\/test/files/testocr.png' 'C:\\Users\\elias\\AppData\\Local\\Temp\\jl_M1QJGPKbgD/res.txt' --oem 1 --psm 3`: no such file or directory (ENOENT)", -4058)
└ @ OCReract C:\Users\elias\.julia\packages\OCReract\Ax3XS\src\tesseract.jl:137
RunTesseract: Test Failed at C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\test_tesseract.jl:33
  Expression: res == true
   Evaluated: false == true

Stacktrace:
 [1] macro expansion
   @ C:\Users\elias\AppData\Local\Programs\Julia-1.9.0\share\julia\stdlib\v1.9\Test\src\Test.jl:478 [inlined]
 [2] test_run_tesseract()
   @ Main C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\test_tesseract.jl:33
RunTesseract: Error During Test at C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\test_tesseract.jl:123
  Got exception outside of a @test
  SystemError: opening file "C:\\Users\\elias\\AppData\\Local\\Temp\\jl_M1QJGPKbgD/res.txt": No such file or directory
  Stacktrace:
    [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
      @ Base .\error.jl:176
    [2] #systemerror#82
      @ .\error.jl:175 [inlined]
    [3] systemerror
      @ .\error.jl:175 [inlined]
    [4] open(fname::String; lock::Bool, read::Bool, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing)
      @ Base .\iostream.jl:293
    [5] open
      @ .\iostream.jl:275 [inlined]
    [6] open(fname::String, mode::String; lock::Bool)
      @ Base .\iostream.jl:356
    [7] open(fname::String, mode::String)
      @ Base .\iostream.jl:355
    [8] open(::var"#3#4", ::String, ::Vararg{String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
      @ Base .\io.jl:393
    [9] open
      @ .\io.jl:392 [inlined]
   [10] test_run_tesseract()
      @ Main C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\test_tesseract.jl:36
   [11] macro expansion
      @ C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\test_tesseract.jl:125 [inlined]
   [12] macro expansion
      @ C:\Users\elias\AppData\Local\Programs\Julia-1.9.0\share\julia\stdlib\v1.9\Test\src\Test.jl:1498 [inlined]
   [13] top-level scope
      @ C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\test_tesseract.jl:125
   [14] include(fname::String)
      @ Base.MainInclude .\client.jl:478
   [15] macro expansion
      @ C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\runtests.jl:4 [inlined]
   [16] macro expansion
      @ C:\Users\elias\AppData\Local\Programs\Julia-1.9.0\share\julia\stdlib\v1.9\Test\src\Test.jl:1498 [inlined]
   [17] top-level scope
      @ C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\runtests.jl:4
   [18] include(fname::String)
      @ Base.MainInclude .\client.jl:478
   [19] top-level scope
      @ none:6
   [20] eval
      @ .\boot.jl:370 [inlined]
   [21] exec_options(opts::Base.JLOptions)
      @ Base .\client.jl:280
   [22] _start()
      @ Base .\client.jl:522
Test Summary:  | Fail  Error  Total  Time
OCReractTest   |    1      1      2  5.7s
  RunTesseract |    1      1      2  3.1s
ERROR: LoadError: Some tests did not pass: 0 passed, 1 failed, 1 errored, 0 broken.
in expression starting at C:\Users\elias\.julia\packages\OCReract\Ax3XS\test\runtests.jl:3
ERROR: Package OCReract errored during testing

Any help would be very much appreciated!
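
Since Tesseract works from CMD but Julia reports `ENOENT`, one thing worth checking is whether the directory containing `tesseract.exe` is on the `PATH` seen by the Julia process. A minimal sketch of prepending it for the current session follows. `prepend_to_path` is a hypothetical helper, and the install directory shown is an assumption that must be replaced with the actual location.

```julia
# Hypothetical helper: returns a PATH string with `dir` placed first.
function prepend_to_path(dir::AbstractString, path::AbstractString;
                         sep::Char = Sys.iswindows() ? ';' : ':')
    return isempty(path) ? String(dir) : string(dir, sep, path)
end

# Assumed install location; adjust to where tesseract.exe actually lives.
tesseract_dir = raw"C:\Program Files\Tesseract-OCR"

if isdir(tesseract_dir)
    ENV["PATH"] = prepend_to_path(tesseract_dir, get(ENV, "PATH", ""))
end

# If the directory is correct, Sys.which("tesseract") should now return a path.
println(Sys.which("tesseract"))
```

This only affects the running Julia session; a permanent fix would be to add the directory to the system `PATH` and restart Julia.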

New release?

Would it be possible to get a new release? I notice the released version is still pulling in Images.jl, which is heavier than needed.
