Giter Site home page Giter Site logo

wgwz / cdc-vaccine-card-validation Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 3 KB

Python script to validate if an image contains a CDC issued COVID vaccination card using AWS Textract.

License: MIT License

Python 100.00%
python aws aws-textract covid-19 vaccination ocr

cdc-vaccine-card-validation's Introduction

COVID-19 CDC Vaccination Card Validation

This repository contains a script that can be used to validate images of CDC issued vaccination cards. The script was designed to read in a list of files in a particular directory, and attempt to validate whether or not the image contains a valid CDC issued card.

The check only validates if the "COVID-19 Vaccination Record Card" text at the top of CDC cards is present. It does not validate any other features of the card such as the persons name or whether or not the person has received both dosages. It could perhaps be extended to do that but it will take some work.

This repository is MIT Licensed, and given the importance of this problem I wanted to share it. That said, there are real data privacy concerns that come along with using a service like AWS Textract and those should be taken into consideration.

Validation? Not really..

The script here does not actually validate the card. It's a bit of a misnomer for me to name it as such. All that the script concludes is that it found the "COVID-19 Vaccination Record Card" text somewhere in the image. Therefore as it stands, the validation provided in this repo could easily be cheated if someone were to print out a piece of paper with the same text printed on it. So this repo really provides a ground-level starting point for the validation of these cards. It is a crude check, which might be helpful in certain situations.

How it works?

This is a very simple script, it was a proof-of-concept. It is not running in production. Given a directory, it will find all the images in the directory and attempt to submit those images for processing by AWS Textract. It will print out a list of those that were valid or too large to be processed by AWS Textract.

$ python3 -m venv venv
$ . venv/bin/activate
(venv) $ pip install -r requirements.txt 
(venv) $ python validate.py
Image: X143A2818_1 (1).jpg is too large to process with AWS Textract
Image: X143A2818_1.jpg is too large to process with AWS Textract
Image: image_55415491.JPG is Valid Vaccine Card
Image: image_67155969.JPG is Valid Vaccine Card

Here is the specific API reference that the code uses.

Prerequisites

  • You need to have followed the boto3 installation guide.
  • This means you need valid AWS credentials.
  • Those AWS credentials need permissions to AWS Textract.

AWS Textract

This script relies on the AWS Textract service to do the OCR. At the time of writing the AWS Textract service offers a cheap, and high quality service. Currently is $1.50/1000 pages processed.

There are some limitations to be aware of with AWS Textract. Most importantly is that at the time of writing the service only accepts images up to 10MB in size.

The script in this repository takes that into account, but does not do anything to solve that problem. If the script finds some image that is too large to process with AWS Textract, it just skips that file.

Another important limitation is that you can only use JPG/PNG files with AWS Textract. Obviously any production application will need to take these things into account.

Data Privacy

Simply put, be aware of how AWS uses the data you process with AWS Textract. See the "Data Privacy" section of the FAQs. They will use the data you upload to make the service better. That may or may not be acceptable for you and your company.

Productionizing

You might want to tune the connection and read timeouts for boto.

cdc-vaccine-card-validation's People

Contributors

kyle-tc avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.