node-tika's Issues

unable to download java build files while installing the tika

While installing tika, I get a node-gyp rebuild error, which is strange. I changed the Python configuration and set the MSVS version to 2013, but I am still unable to install tika. I also tried to install the Windows build tools, but was unable to install them with the command

Upgrade Tika

If I want to use the newest Tika version, do I just have to put another jar in the folder, or is there more work to do?

cannot find module tika

I just copied the code from your example and got the following error:

module.js:529
throw err;
^

Error: Cannot find module 'tika'
at Function.Module._resolveFilename (module.js:527:15)
at Function.Module._load (module.js:476:23)
at Module.require (module.js:568:17)
at require (internal/module.js:11:18)
at Object.<anonymous> (/home/henrysachs/move-search/backend/node_approach/events.js:1:74)
at Module._compile (module.js:624:30)
at Object.Module._extensions..js (module.js:635:10)
at Module.load (module.js:545:32)
at tryModuleLoad (module.js:508:12)
at Function.Module._load (module.js:500:3)

Small issue in documentation

According to documentation:

tika.text('http://www.ohchr.org/EN/UDHR/Documents/UDHR_Translations/eng.pdf', 
    function(err, text, meta) {
        // ...
    });

should work. However, the documentation later states that you need to call tika.extract to get the metadata as well; tika.text passes only the extracted text to the callback (confirmed by an experiment :) ).
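
The difference reported here can be sketched as follows. This is a minimal illustration, assuming the callback shapes described in this issue: tika.extract passes both text and metadata, while tika.text passes text only. The helper name textAndMeta is hypothetical, and the module is passed in as a parameter so the snippet does not depend on a particular install.

```javascript
// Hypothetical helper: returns both text and metadata by calling
// tika.extract rather than tika.text.
// Assumption (taken from this issue): the extract callback receives
// (err, text, meta), while the text callback receives only (err, text).
function textAndMeta(tika, uri) {
  return new Promise((resolve, reject) => {
    tika.extract(uri, (err, text, meta) => {
      if (err) {
        reject(err);
        return;
      }
      resolve({ text, meta });
    });
  });
}

// Usage, assuming the module is installed and loads correctly:
// const tika = require('tika');
// textAndMeta(tika, 'path/to/file.pdf')
//   .then(({ text, meta }) => console.log(meta, text.length));
```

Wrapping the call in a promise is only a convenience; the same point holds when tika.extract is called directly with a three-argument callback.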

Module did not self-register

I'm attempting to use node-tika in a project that builds/tests via CircleCI. Their CI environment installs various things for me, but when my server attempts to start it fails as follows:

[17:54:45] Using gulpfile ~/ow-back/gulpfile.js
[17:54:45] Starting 'syncdb'...
[17:54:45] Finished 'syncdb' after 288 ms
[17:54:45] Starting 'serve'...
[17:54:46] 'serve' errored after 240 ms
[17:54:46] Error: Module did not self-register.
    at Error (native)
    at Object.Module._extensions..node (module.js:450:18)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:313:12)
    at Module.require (module.js:366:17)
    at require (module.js:385:17)
    at Object.<anonymous> (/home/ubuntu/ow-back/node_modules/tika/node_modules/java/lib/nodeJavaBridge.js:10:16)
    at Module._compile (module.js:425:26)
    at Object.Module._extensions..js (module.js:432:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:313:12)
    at Module.require (module.js:366:17)
    at require (module.js:385:17)
    at Object.<anonymous> (/home/ubuntu/ow-back/node_modules/tika/node_modules/java/index.js:2:18)
    at Module._compile (module.js:425:26)

CircleCI appears to be running Ubuntu, with JAVA_HOME=/usr/lib/jvm/jdk1.7.0, to give you a sense of my Java version.

It appears this is some kind of java incompatibility issue, but what are the troubleshooting steps associated with fixing this?

Tika 1.9

Version 1.9 is already released. 👍

cannot extract text from scanned PDF

I am trying to extract text from scanned pdf documents. It works fine for most of them except a couple I tested.
I am able to extract the metadata correctly but not the text in the pdf. It returns with a blank set of lines for the text part.
Are there any specific pdf versions or some other criteria that can cause this issue? Does it have anything to do with the pdf producer which in this case is Haru Free PDF Library 2.0.8?

Inconsistent naming of PDF options

The README lists PDF options with a pdf prefix; however, the fillPdfOptions() method looks for options without the prefix. I would be happy to make a PR; I just need to know whether you want to update the README or the option parser.

Thanks!

Module version mismatch - Electron app

I'm trying to get the Tika module to work in an Electron app, but as soon as I require the module, I get this error:
Uncaught Error: Module version mismatch. Expected 47, got 46.
I am using nodeVersion v4.2.1 and npmVersion 2.14.7
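
The numbers in this error are native-module ABI versions (NODE_MODULE_VERSION): a compiled addon loads only in a runtime with the same ABI as the one it was built for. As a hedged diagnostic sketch, the snippet below prints the ABI of whichever runtime executes it; running it once under plain Node and once inside the Electron app should show whether the two differ. The function name describeRuntime is hypothetical.

```javascript
// Report which runtime is executing this script and its native-module ABI.
// process.versions.modules holds the ABI number; process.versions.electron
// is only defined when running inside Electron.
function describeRuntime(versions = process.versions) {
  const runtime = versions.electron
    ? 'electron ' + versions.electron
    : 'node ' + versions.node;
  return { runtime, abi: versions.modules };
}

console.log(describeRuntime());
```

If the two ABI values differ, the native dependency was most likely compiled against the Node headers rather than Electron's and would need to be rebuilt for the Electron runtime.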

Need help making tika work in AWS Lambda

Our scenario is to get .pdf files uploaded to AWS S3 storage and process them later. We want to move to AWS Lambda. However, Lambda requires that the entire package (along with all node_modules) be uploaded as a zip file (i.e. it won't run npm install). This means that tika picks up whatever Java path the local machine happened to have and saves it in jvm_dll_path.json. The path to libjvm.so is different on the Lambda machine, and loading the module fails with "libjvm.so: cannot open shared object file: No such file or directory".
I tried just replacing the string in jvm_dll_path.json with the correct AWS path, but no dice.
Really appreciate any help to make this work on Lambda.
Thanks!
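
As a starting point for debugging, the check below reports which of a list of candidate directories actually contain libjvm.so on the machine where it runs. It is only a diagnostic sketch and does not fix the loading problem; the function name dirsWithLibjvm and the example directories are hypothetical, and the directory recorded in jvm_dll_path.json can be passed in as one of the candidates.

```javascript
const fs = require('fs');
const path = require('path');

// Return the subset of candidate directories that contain the JVM shared
// library. The default file name assumes Linux, as in the error above.
function dirsWithLibjvm(candidateDirs, libName = 'libjvm.so') {
  return candidateDirs.filter((dir) =>
    fs.existsSync(path.join(dir, libName))
  );
}

// Usage (the paths below are placeholders, not known Lambda locations):
// console.log(dirsWithLibjvm(['/path/from/jvm_dll_path', '/another/candidate']));
```

Running this inside the Lambda function, and logging the result, would at least confirm whether the path being tried exists in that environment.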
