cedws / finch

A proof-of-concept for enhancing and organising an image collection using Google's Vision API

License: GNU General Public License v3.0

Rust 100.00%
google-api image-processing vision-api

finch's People

Contributors

cedws

Forkers

davidssmith

finch's Issues

Provision Cloud Storage on-the-fly.

The Vision API has a request body limit of 10MB, which is easily reached when encoding high-quality images to Base64.

It's possible to pass a Google Cloud Storage URI to the Vision API instead of inline image data. If the images being processed are pushed to Cloud Storage first, the process will be faster overall, because all web detection queries can be bundled into one request. It will also allow images of up to 20MB to be processed.
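
As a rough sketch of what the bundled request could look like, the function below builds one `images:annotate` JSON body that asks for web detection on several Cloud Storage URIs. The function name and the bucket and object names in the usage note are made up for illustration, and the string formatting assumes the URIs contain no characters that need JSON escaping; a real implementation would more likely use a JSON library.

```rust
/// Builds the JSON body for a single `images:annotate` call that requests
/// web detection for every given Cloud Storage URI.
/// Assumption: the URIs contain no characters that need JSON escaping.
fn batch_web_detection_body(gcs_uris: &[&str]) -> String {
    let requests: Vec<String> = gcs_uris
        .iter()
        .map(|uri| {
            // One entry of the `requests` array, referencing the image by URI
            // instead of embedding Base64 data.
            format!(
                r#"{{"image":{{"source":{{"gcsImageUri":"{}"}}}},"features":[{{"type":"WEB_DETECTION"}}]}}"#,
                uri
            )
        })
        .collect();
    format!(r#"{{"requests":[{}]}}"#, requests.join(","))
}
```

Calling `batch_web_detection_body(&["gs://finch-tmp/a.jpg", "gs://finch-tmp/b.jpg"])` yields one body with two entries, so the request stays small no matter how large the images are.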

I would like Finch to provision a Cloud Storage bucket on-the-fly, upload the images to be processed, run the Vision API query, and then delete the bucket to avoid incurring unnecessary costs.
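
One possible shape for this, sketched under heavy assumptions: a guard type that creates the bucket up front and deletes it in `Drop`, so cleanup also happens when an upload or the Vision query fails part-way. The `BucketStore` trait and `TempBucket` type are hypothetical and stand in for whatever Cloud Storage client Finch would actually use.

```rust
/// Hypothetical abstraction over the storage operations Finch would need.
/// A real implementation would wrap a Cloud Storage client.
trait BucketStore {
    fn create_bucket(&mut self, name: &str) -> Result<(), String>;
    fn upload(&mut self, bucket: &str, object: &str, data: &[u8]) -> Result<(), String>;
    fn delete_bucket(&mut self, name: &str) -> Result<(), String>;
}

/// Owns a temporary bucket and deletes it when the guard goes out of scope.
struct TempBucket<'a, S: BucketStore> {
    store: &'a mut S,
    name: String,
}

impl<'a, S: BucketStore> TempBucket<'a, S> {
    /// Creates the bucket and returns a guard responsible for its cleanup.
    fn create(store: &'a mut S, name: &str) -> Result<Self, String> {
        store.create_bucket(name)?;
        Ok(TempBucket { store, name: name.to_string() })
    }

    /// Uploads one object and returns the `gs://` URI to pass to the Vision API.
    fn upload(&mut self, object: &str, data: &[u8]) -> Result<String, String> {
        self.store.upload(&self.name, object, data)?;
        Ok(format!("gs://{}/{}", self.name, object))
    }
}

impl<'a, S: BucketStore> Drop for TempBucket<'a, S> {
    fn drop(&mut self) {
        // Best effort: an error here cannot be propagated out of `drop`.
        let _ = self.store.delete_bucket(&self.name);
    }
}
```

The trade-off is that a failed deletion is silently ignored, so a real version would probably want to log it, since a leftover bucket is exactly the cost this issue is trying to avoid.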

Investigate Yandex's reverse image search.

Yandex's reverse image search seems to be superior to Google's in every way. Google has also shown little interest in fixing the API quirks that I reported. I haven't been able to find an API for Yandex, but perhaps there's another way to query it (preferably without HTML scraping).

Filesize check is incorrect.

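The Vision API's 10MB limit applies to the whole JSON request, not just the Base64-encoded image. An image whose encoded size is just under the limit can therefore still produce a request that is too large, which will cause issues in edge cases.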
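
A minimal sketch of a stricter check, assuming "10MB" means 10 * 1024 * 1024 bytes and that the size of the non-image part of the JSON (keys, feature list, brackets) is known or can be bounded. The function and constant names are illustrative and not taken from Finch's code.

```rust
/// Request limit from the issue, interpreted as 10 MiB (an assumption).
const MAX_REQUEST_BYTES: usize = 10 * 1024 * 1024;

/// Length of the padded Base64 encoding of `raw_len` bytes:
/// every group of up to 3 input bytes becomes 4 output characters.
fn base64_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}

/// Returns true if an image of `raw_len` bytes fits in a request whose
/// surrounding JSON takes `envelope_len` bytes.
fn fits_in_request(raw_len: usize, envelope_len: usize) -> bool {
    base64_len(raw_len) + envelope_len <= MAX_REQUEST_BYTES
}
```

Because Base64 inflates data by about a third, the largest raw image that fits is a little under 7.5 MiB, and less once the JSON envelope is counted.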

Finch fetches and overwrites images even if the webserver serves a dummy image.

After running Finch on a couple of thousand images, I noticed that some sites have expiry dates on the URLs given by the Vision API, and will serve dummy/error images whilst still returning 200 OK.

The only way I can think of to fix this is to introduce a perceptual hash algorithm to check that the fetched image still matches the original.
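
To make the idea concrete, here is a sketch of one simple perceptual hash, the average hash, compared by Hamming distance. It assumes both images have already been downscaled to an 8x8 grayscale thumbnail by some other code, and the distance threshold is a hypothetical value that would need tuning on real data.

```rust
/// Average hash of an 8x8 grayscale thumbnail (row-major, 64 pixels).
/// Bit `i` is set when pixel `i` is brighter than the mean brightness.
fn average_hash(pixels: &[u8; 64]) -> u64 {
    let sum: u32 = pixels.iter().map(|&p| u32::from(p)).sum();
    let mean = sum / 64;
    pixels.iter().enumerate().fold(0u64, |hash, (i, &p)| {
        if u32::from(p) > mean {
            hash | (1u64 << i)
        } else {
            hash
        }
    })
}

/// Number of bits that differ between two hashes.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Treats two thumbnails as the same picture when at most `max_distance`
/// hash bits differ. The threshold is an assumption, not a tested value.
fn looks_like_same_image(a: &[u8; 64], b: &[u8; 64], max_distance: u32) -> bool {
    hamming_distance(average_hash(a), average_hash(b)) <= max_distance
}
```

Finch could then refuse to overwrite the local file when the fetched image's hash is far from the original's, which should catch most dummy or error images served with 200 OK.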

Use gRPC instead of REST API.

The gRPC API offers increased performance and has no 10MB request body limit like the REST API. This also means we can avoid provisioning a Cloud Storage bucket just to circumvent these limits.

Finch should make requests asynchronously, not in parallel.

The parallel implementation is fairly inefficient. The goal is to have as many API requests in flight as possible, since each response can take several seconds, although this is bottlenecked by upload speed anyway. A better approach would be to make the API requests asynchronously.
