License: MIT License

Serenata OCR

A Serverless API for OCRing Serenata de Amor's documents (currently limited to Chamber of Deputies receipts). Powered by Claudia.JS and Google Cloud Vision.

From zero to an OCR API in minutes

https://asciinema.org/a/149404

Initial setup

A Docker environment is in the works. Until then, this is what you'll need for development:

  • git clone git@github.com:fgrehm/serenata-ocr.git
  • cp config.json{.example,}
  • NodeJS 6.10 (:warning: This is important, it is the version executed in AWS Lambda).
  • yarn install or npm install
  • Claudia.JS CLI (npm install -g claudia)
  • AWS credentials configured for claudia as outlined in this tutorial

For OCRing with Google Cloud Vision you'll need:

Deployment

As mentioned above, make sure your AWS credentials are configured as outlined in this tutorial. Once you have that done, proceed to your first deployment of the API:

claudia create --region us-east-1 \
               --api-module app \
               --timeout 60 \
               --memory 512 \
               --set-env-from-json config.json

At the end of claudia create you'll get a URL. To test it, run:

API="https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/latest/chamber-of-deputies/receipt"
# One liner if you have `jq` installed
API="https://$(jq -r '.api.id' claudia.json).execute-api.us-east-1.amazonaws.com/latest/chamber-of-deputies/receipt"

# OCR a receipt and get the full text of the PDF
curl "${API}/1789/2015/5631380" > 5631380.json

# Play with the data
jq '.config + .extra' 5631380.json
jq '.ocrResponse.fullTextAnnotation.text' 5631380.json
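
For clients that consume the response from Node instead of jq, the two queries above can be mirrored with a couple of small functions. This is only a sketch: the field names (config, extra, ocrResponse.fullTextAnnotation.text) are taken from the jq commands above, the function names are hypothetical, and the sample object is made up for illustration.

```javascript
'use strict';

// Mirrors `jq '.config + .extra'`: merge both objects, keys in `extra` win.
function mergeConfigAndExtra(response) {
  return Object.assign({}, response.config, response.extra);
}

// Mirrors `jq '.ocrResponse.fullTextAnnotation.text'`.
// Returns null when the path is missing instead of throwing.
function fullText(response) {
  const annotation =
    response.ocrResponse && response.ocrResponse.fullTextAnnotation;
  return annotation && typeof annotation.text === 'string'
    ? annotation.text
    : null;
}

// Made-up sample with the same shape as the saved JSON file.
const sample = {
  config: { density: 300 },
  extra: { provider: 'example' },
  ocrResponse: { fullTextAnnotation: { text: 'RECEIPT TEXT' } },
};

console.log(mergeConfigAndExtra(sample));
console.log(fullText(sample));
```

To run it against a real response, replace the sample with JSON.parse(fs.readFileSync('5631380.json', 'utf8')).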

Documentation

🚧 Proper documentation is in the works 🚧

  • From a high level, this is what gets done under the hood:
    • The receipt PDF associated with the reimbursement is downloaded from the Chamber of Deputies website.
    • ImageMagick is used to convert the PDF to a PNG image with: convert -density <density> receipt.pdf -quality 100 -deskew 40% -append receipt.png
    • The PNG is uploaded to Google Cloud Vision and the results are sent back to the client.
  • For custom parameters supported by the API, see app.js for now.
  • For local execution, see local.js for now (run with node local.js).
  • Example responses at examples/
  • Some useful utilities at Deskfile
  • More info? Please read the code for now, it is super small.
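
The PDF to PNG step described above could be wrapped in Node roughly as follows. This is a minimal sketch and not necessarily how app.js does it: buildConvertArgs and convertPdfToPng are hypothetical names, and it assumes ImageMagick's convert binary is on the PATH. The flags are copied from the command quoted in the list.

```javascript
'use strict';

const { execFile } = require('child_process');

// Build the argument list for ImageMagick's `convert`, matching:
//   convert -density <density> receipt.pdf -quality 100 -deskew 40% -append receipt.png
// Passing an array to execFile avoids shell quoting problems.
function buildConvertArgs(density, pdfPath, pngPath) {
  if (!Number.isFinite(density) || density <= 0) {
    throw new RangeError('density must be a positive number');
  }
  return [
    '-density', String(density),
    pdfPath,
    '-quality', '100',
    '-deskew', '40%',
    '-append', // stack all PDF pages vertically into a single image
    pngPath,
  ];
}

// Hypothetical wrapper: runs `convert` and resolves with the PNG path.
function convertPdfToPng(density, pdfPath, pngPath) {
  return new Promise((resolve, reject) => {
    const args = buildConvertArgs(density, pdfPath, pngPath);
    execFile('convert', args, (err) => {
      if (err) reject(err);
      else resolve(pngPath);
    });
  });
}
```

A call such as convertPdfToPng(300, 'receipt.pdf', 'receipt.png') would then produce the image that gets uploaded for OCR. The density of 300 is an arbitrary example value, not the project's default.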

Wanna help?

See the issue tracker for inspiration.

Troubleshooting

Feel free to create an issue.

Function times out

Maybe the document is too big for your function to handle, so give it more :muscle:

claudia update --timeout 90 --memory 1024

serenata-ocr's Issues

Testing infrastructure

Now that the prototype works™, it is time to take a step back and implement some much needed automated tests along with setting up a CI.

Experiment with cropping images

In theory we shouldn't need to deskew images (at least for Google Cloud Vision), but I noticed that it yields better results. I wonder if cropping is going to help as well.

If it doesn't, at least we can save some $ in terms of data transferred to providers.

Documentation

  • API parameters
  • Response structure
  • Development with Docker
  • Costs
  • ...

Return full text by default, allow for fetching raw provider's response

With the upcoming support for multiple providers, it is going to be hard for clients to handle the specifics of each one of them. On top of that, AWS charges for data transferred and Google Cloud Vision API responses are huge.

By default, the response should be the full text of the document. Passing in raw=true would give back the provider's complete response.
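
One way the proposed behaviour could look is sketched below. It is an illustration of the idea and not existing project code: shapeResponse is a hypothetical helper, the { text } wrapper is an assumption about the default payload, and the provider response is assumed to follow the Google Cloud Vision shape (fullTextAnnotation.text).

```javascript
'use strict';

// Hypothetical helper. `providerResponse` is the OCR provider's result and
// `query` is the parsed query string of the incoming request.
function shapeResponse(providerResponse, query) {
  // Opt-in: raw=true returns the provider's response untouched.
  if (query && query.raw === 'true') {
    return providerResponse;
  }
  // Default: only the full text, which keeps the payload small.
  const annotation = providerResponse && providerResponse.fullTextAnnotation;
  const text =
    annotation && typeof annotation.text === 'string' ? annotation.text : '';
  return { text };
}

// Made-up provider response for illustration.
const visionResult = {
  fullTextAnnotation: { text: 'TOTAL 42,00' },
  textAnnotations: [{ description: 'TOTAL' }, { description: '42,00' }],
};

console.log(shapeResponse(visionResult, {}));
console.log(shapeResponse(visionResult, { raw: 'true' }) === visionResult);
```

Keeping the raw response behind an explicit flag means other providers only need to supply the text to stay compatible with existing clients.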
