Spark OCR (spark-ocr-workshop issue, open)

johnsnowlabs commented on June 3, 2024
Spark OCR

from spark-ocr-workshop.

Comments (6)

xyutech commented on June 3, 2024

Hello,
Could you share which version of spark-nlp you are using?

asismohanty81 commented on June 3, 2024

xyutech commented on June 3, 2024

Hello Asis,

Thank you for the information.
Could you make sure your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are valid?

xyutech commented on June 3, 2024

It would also be helpful to see the output of the spark invocation, something like:

Spark version: 3.0.2
Spark NLP version: 3.0.1
Spark OCR version: 3.7.0
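
If it helps to collect these lines without a running session, here is a minimal sketch that reads the installed pip package versions with the standard library. It assumes the distribution names pyspark, spark-nlp and spark-ocr (the Spark OCR name in particular is an assumption, since it is usually installed from a private index), and version_report is a hypothetical helper, not part of any library. Note that pip versions can differ from the JARs actually loaded into the JVM; inside a live session, spark.version reports the runtime Spark version.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed pip distribution names for each component mentioned in the thread.
PACKAGES = (
    ("Spark", "pyspark"),
    ("Spark NLP", "spark-nlp"),
    ("Spark OCR", "spark-ocr"),
)


def installed_version(dist_name: str) -> str:
    """Return the installed version of a pip distribution, or a marker if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "not installed"


def version_report(packages=PACKAGES) -> str:
    """Format one 'Name version: X' line per package, like the sample above."""
    return "\n".join(
        f"{label} version: {installed_version(dist)}" for label, dist in packages
    )


if __name__ == "__main__":
    print(version_report())
```

Pasting the printed report into the issue gives maintainers the same three lines as the example above, with "not installed" making a missing package obvious.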

jigsawcoder commented on June 3, 2024

I am facing a similar issue while running the following example in Google Colab:
https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/master/jupyter/SparkOcrImageTableRecognitionPdf.ipynb

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.ocr.OcrPythonResourceDownloader.getDownloadSize.
: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 3G6894YVGPKFHYRC; S3 Extended Request ID: cxQQE9B6i8HgmWAlJ72zulORmmV9ACK71mMXticDwDEoVHXgV/VU0yAMlsi/hvWTMqBXmxi2tXI=), S3 Extended Request ID: cxQQE9B6i8HgmWAlJ72zulORmmV9ACK71mMXticDwDEoVHXgV/VU0yAMlsi/hvWTMqBXmxi2tXI=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1467)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1326)
at com.johnsnowlabs.client.AWSGateway.getMetadata(AWSGateway.scala:112)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:62)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:68)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:145)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:378)
at com.johnsnowlabs.ocr.OcrPythonResourceDownloader$.getDownloadSize(OcrPythonResourceDownloader.scala:23)
at com.johnsnowlabs.ocr.OcrPythonResourceDownloader.getDownloadSize(OcrPythonResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

The error occurs when calling ImageTableDetector.pretrained("general_model_table_detection_v2", "en", "clinical/ocr").

I am currently using the 30-day trial version, and the AWS access key and secret key are 'Null', so I am passing blank strings:
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''
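
Passing blank strings means the S3 request for the pretrained model goes out without usable credentials, which is consistent with the 403 Access Denied above. A minimal sketch of a guard that fails fast instead, assuming the model downloader picks up the standard AWS environment variables named earlier in this thread. missing_aws_credentials and export_aws_credentials are hypothetical helpers, not part of Spark OCR. The variables have to be set before the Spark session is started, because the JVM inherits the process environment only when it launches.

```python
import os
from typing import List, Mapping

# Standard AWS SDK variable names, as referenced earlier in this thread.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env: Mapping[str, str]) -> List[str]:
    """Return the required AWS variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def export_aws_credentials(access_key: str, secret_key: str) -> None:
    """Hypothetical guard: export credentials only if neither value is blank.

    Call this before starting the Spark session; a JVM that is already
    running will not see later changes to os.environ.
    """
    values = {"AWS_ACCESS_KEY_ID": access_key, "AWS_SECRET_ACCESS_KEY": secret_key}
    missing = missing_aws_credentials(values)
    if missing:
        # Fail here instead of getting an S3 403 AccessDenied during download.
        raise ValueError("blank AWS credentials: " + ", ".join(missing))
    os.environ.update(values)
```

With the values shown above, export_aws_credentials(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) raises ValueError immediately, which points at missing credentials rather than at the model name or Spark version.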

Version details:

Spark version: 2.4.7
Spark OCR version: 3.8.0

SparkSession - in-memory
SparkContext: Version v2.4.7, Master local[*], AppName Spark OCR

kolia1985 commented on June 3, 2024

Hello @jigsawcoder. Did you receive AWS credentials in the email with your license key? If not, please contact customer support or ask in the public Slack.