lukevanin / ocrai

Optical Character Recognition Artificial Intelligence iOS app for Udacity nanodegree

License: MIT License

Swift 91.04% Objective-C 8.96%
optical-character-recognition natural-language-processing contacts ios swift mapkit monkeylearn google-vision udacity-nanodegree monkeylearn-api

ocrai's Introduction

OCRAI

Optical Character Recognition Artificial Intelligence iOS app for Udacity nanodegree.

API Configuration

API services such as Google Vision and MonkeyLearn require credentials for access. The exact credentials depend on the service; usually this entails registering for an account and acquiring a key or authorization token. The credentials are not included in this repository, as including them would present a security risk. Instructions for each service follow below. Please check the documentation for the relevant service for more details.

Google Vision & Google Natural Language

  1. Follow the instructions for "Setup an API key" here: https://cloud.google.com/natural-language/docs/common/auth. Note: The API key should not be restricted.
  2. In the Xcode project, copy or rename "google-api-config.default.plist" to "google-api-config.plist".
  3. Edit the file from step 2 and enter the Google API key in the "key" field.

MonkeyLearn

  1. Log in to MonkeyLearn here: http://monkeylearn.com/
  2. Find the API token in the "API Keys" section under "My Account": https://app.monkeylearn.com/main/my-account/tab/api-keys/
  3. In the Xcode project, copy or rename "monkeylearn-api-config.default.plist" to "monkeylearn-api-config.plist".
  4. Edit the plist file from step 3 and enter the API token in the "authorizationToken" field.
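The README does not show how the app consumes these plist files at runtime. As a rough illustration, here is a minimal sketch of reading both credentials, assuming the file and field names from the steps above and a hypothetical `APIConfig` helper:

```swift
import Foundation

/// Hypothetical helper for reading a single API credential from a bundled plist.
struct APIConfig {

    enum ConfigError: Error {
        case missingFile(String)
        case missingKey(String)
    }

    /// Reads a string value for `key` from `<name>.plist` in the main bundle.
    static func value(forKey key: String, inPlist name: String) throws -> String {
        guard
            let url = Bundle.main.url(forResource: name, withExtension: "plist"),
            let data = try? Data(contentsOf: url),
            let plist = try? PropertyListSerialization.propertyList(from: data, options: [], format: nil),
            let dictionary = plist as? [String: Any]
        else {
            throw ConfigError.missingFile(name)
        }
        guard let value = dictionary[key] as? String, !value.isEmpty else {
            throw ConfigError.missingKey(key)
        }
        return value
    }
}

// Usage (field names as configured in the steps above):
// let googleKey = try APIConfig.value(forKey: "key", inPlist: "google-api-config")
// let monkeyLearnToken = try APIConfig.value(forKey: "authorizationToken", inPlist: "monkeylearn-api-config")
```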

TODO

Essential

  • Enable camera / photo library buttons only when functionality is available.
  • Normalize image orientation when taking photo and importing.
  • Color image card according to type (person, organization, event, etc).
  • Include postal addresses in exported contact.
  • Prompt to overwrite when scanning over existing information.
  • Resize image to 1024x768 for uploading (see the sketch after this list).
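A minimal sketch of the resize step from the last item, assuming a simple aspect-fit scale with `UIGraphicsImageRenderer` (the actual target size and strategy may differ):

```swift
import UIKit

/// Scales an image down so it fits within 1024x768 before upload, preserving
/// aspect ratio. Illustrative only; never scales an already-small image up.
func resizeForUpload(_ image: UIImage, maxSize: CGSize = CGSize(width: 1024, height: 768)) -> UIImage {
    let scale = min(maxSize.width / image.size.width,
                    maxSize.height / image.size.height,
                    1.0)
    let targetSize = CGSize(width: image.size.width * scale,
                            height: image.size.height * scale)
    let format = UIGraphicsImageRendererFormat.default()
    format.scale = 1.0 // work in pixels rather than points
    let renderer = UIGraphicsImageRenderer(size: targetSize, format: format)
    return renderer.image { _ in
        image.draw(in: CGRect(origin: .zero, size: targetSize))
    }
}
```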

Nice to have

  • Only use a single derived value for each line of text; don't re-scan a line that is already tagged (e.g. don't tag "Sandy Bay" as an organization if it is already tagged as an address).
  • Search: Name, organization, phone number.
  • iPad layout: grid documents list. Show the document as a modal popover, or in a side detail view.
  • Continue scanning in background when switching back to list from detail view.
  • Improve editing: remove the modal edit/done state. Tap on a text field to edit; press enter to save. Always show a blank text field; adding text and pressing enter saves the data and creates a new blank text field.
  • Improve organization name detection: Check remaining text for nouns, after name detection.
  • Pre-process scanned image: Histogram balance.
  • Scan raw / uncompressed image data (avoid JPEG artifacts).
  • Support additional services: Haven, Tesseract.
  • Extract date information, tag fragments with dates, as event type.
  • Extract faces from scanned image.
  • Extract machine codes (QR code, bar code) from scanned image.
  • Extract logos from scanned image.
  • 3D Touch shortcut actions: take photo.
  • App extension: Scan image from photos app (import into scanner app).

ocrai's People

Contributors

lukevanin, lukevanin-takealot


ocrai's Issues

Wrong colour shown when moving item between sections

Reproduce:

  1. Tap on an item to go to the detail view.
  2. Scan the item or add an entity.
  3. Tap to edit the item.
  4. Drag an item to a different section.

The item keeps the colour of the section it was in.

(Before and after screenshots attached to the issue, 2017-03-19.)

Search

Search documents from listing screen

App crashes when taking photo

Reproduce:

  1. Launch the app.
  2. Tap on the camera (accept permissions if prompted).
  3. Tap on the photo icon.
  4. Tap "Use Photo".
  5. The app crashes.

Refactor data model

Current: Raw data from the data detector is stored as fragments, with annotations demarcating the detected data. The user views and edits fragments directly.

Problem: Fragment data does not correspond directly to the user's needs. E.g. changing a field to a different type, inserting a new field, or removing a field causes the data to no longer correspond to the scanned data.

Goal: Decouple scanned data from user data. Data should be modelled to better fit the intent of user modification. Original data should be retained if needed separately from user modification.
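A minimal sketch of the proposed decoupling, using hypothetical type names: the raw scan output stays immutable, while the user edits a separate set of fields that only reference the fragment they were derived from.

```swift
import Foundation

/// Immutable record of what the scanner detected (never edited by the user).
struct ScannedFragment {
    let id: UUID
    let text: String
    let detectedType: String   // e.g. "address", "phoneNumber"
}

/// User-facing field, editable independently of the original scan.
struct DocumentField {
    var id: UUID
    var value: String
    var type: String
    var sourceFragmentID: UUID?   // nil when the field was added manually
}
```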

Add keyword detection

  1. Add support in the scanner for the keyword detection API.
  2. Show keywords in scanned document.
  3. Allow keywords to be edited, added, and removed.

List screen should show indicator when scanning is in progress

  1. Add a document from the camera or photo library, or tap on an existing document and tap on the scan button.
  2. Scanning begins.
  3. Tap the back button (while scanning is underway).
  4. List appears.
  5. Wait for scanning to complete.
  6. List is updated.

Expected:
Message or activity indicator should appear to show that scanning is in progress.

Improve scanner user feedback

Current: The scanning process works atomically: the document is scanned in full, then imported into the data store.

Problem: User must wait for the entire scanning process to complete before seeing results.

Goal: Scanner should update data store incrementally as soon as data becomes available.

Implementation: Create a builder interface for composing a document. Scanners send detected data to the builder, and the builder updates the data store. The view controller observes the data store and updates the view when the store changes.
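A minimal sketch of such a builder interface, with hypothetical names (the real implementation would write to the app's data store):

```swift
import Foundation

/// Receives partial scan results as they arrive, so the data store can be
/// updated incrementally instead of only when the whole scan completes.
protocol DocumentBuilder {
    /// Called whenever a scanner produces a new piece of detected data.
    func addFragment(text: String, type: String)
    /// Called once all scanners have finished.
    func finish()
}

/// Example conformance: persists each fragment immediately, relying on the
/// view controller's observation of the data store to refresh the UI.
final class IncrementalDocumentBuilder: DocumentBuilder {
    private let documentID: UUID

    init(documentID: UUID) {
        self.documentID = documentID
    }

    func addFragment(text: String, type: String) {
        // Insert the fragment into the data store here; the observing view
        // controller picks up the change and updates the view.
    }

    func finish() {
        // Mark the document as fully scanned.
    }
}
```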

UI: Improve field type indicators

Coloured dots are shown next to each field. The dots are intended to indicate the field type. The colour is ambiguous without context.

Goal: Add a legend to indicate the field type, or add icons instead of dots, or remove indicators entirely and rely on section headers.

Address sometimes parsed as two separate parts

Occurs when the scanned text data contains recognisable address data interleaved with other data. The app does not recognise that the two parts of data are related.

The two parts should be merged into a single address entity; distinct addresses should remain separate.

Possible solutions:

  1. Use coordinate proximity to determine relationship.
  2. Merge by matching data with corresponding missing fields, e.g. if A has a street but no country, and B has a country but no street, then the addresses can be merged (see the sketch below).
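A minimal sketch of the second approach, using a hypothetical PartialAddress type: two partial addresses merge only when no component is defined in both.

```swift
import Foundation

/// Hypothetical partial address produced by the scanner.
struct PartialAddress {
    var street: String?
    var city: String?
    var postalCode: String?
    var country: String?
}

/// Returns the merged address when the two parts are complementary, or nil
/// when any component is present in both (suggesting two distinct addresses).
func merge(_ a: PartialAddress, _ b: PartialAddress) -> PartialAddress? {
    // Combines one component; the Bool reports whether the pair was mergeable.
    func combine(_ x: String?, _ y: String?) -> (String?, Bool) {
        switch (x, y) {
        case (.some, .some):
            return (nil, false)               // conflict: both sides define it
        case (.some(let value), nil), (nil, .some(let value)):
            return (value, true)
        case (nil, nil):
            return (nil, true)
        }
    }
    let street = combine(a.street, b.street)
    let city = combine(a.city, b.city)
    let postalCode = combine(a.postalCode, b.postalCode)
    let country = combine(a.country, b.country)
    guard street.1, city.1, postalCode.1, country.1 else { return nil }
    return PartialAddress(street: street.0, city: city.0,
                          postalCode: postalCode.0, country: country.0)
}
```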

This may be resolved by using the Microsoft Vision API, which groups information differently.

Alternatively, allow user to select addresses to merge. Use case:

  1. Tap on address.
  2. Tap merge button on context menu.
  3. List of all other addresses appears.
  4. Tap address to merge into.
  5. Show a preview of the merged address. Corresponding fields which both contain content are concatenated. Alternatively, the user can control the merge by selecting the fields to be included.
  6. A new object is created with the merged data. The merged objects are deleted.

Blank field added to document

Reproduce:

  1. Select document from list (empty or pre-populated).
  2. Tap edit.
  3. Tap on empty field.
  4. Do not enter any text.
  5. Tap on another field.
  6. Note the first field is saved and a new empty field appears.

Expected:
Empty field should not be saved.

Use structured data for fields

After scanning, data is stored as key-value pairs. It would be beneficial to store certain kinds of data, such as addresses, in specialised data structures.

Structured data
Addresses consist of multiple components, and can be used to derive additional data, such as geographical coordinates. The current key-value storage schema prevents this.

Unstructured data
Unstructured data, such as names and untagged text, may be stored as key-value pairs. The data may be tagged to indicate its intent, e.g. name (first and last if possible), organisation, department, salutation.

Semi-structured data
Semi-structured data, such as phone numbers, URLs, email addresses, and social media names, may also be stored as plain text. These values may be labelled (e.g. home, work, fax) to indicate their role. It would be beneficial to provide UI functions specific to the type of data, e.g. call a phone number, send a message to a phone number or email address, or open a web page; all of these values can be shared. This kind of data should be validated for conformance to the accepted format: when the user edits the information it should be checked, and if it does not conform it should still be saved and a warning shown.

  • Tags for phone numbers: home, work, fax
  • Tags for email: home, work
  • URLs are not tagged, although they can be labelled: blog, web site, home page, news, Twitter, Facebook.
  • Social media names should be associated with recognised social media providers (Twitter, Facebook). It should be possible to derive a profile URL from the name. The user should be allowed to convert an unrecognised social media name into a URL. Social media accounts may be a specialised form of URL (i.e. the account name is converted to a URL, which is labelled automatically to indicate a social media account).
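A minimal sketch of how the structured and semi-structured values above could map onto the Contacts framework when exporting (the values are placeholders, and the app's intermediate model would likely differ):

```swift
import Contacts

/// Builds a contact from scanned data using a structured address and
/// labelled semi-structured values. All values here are placeholders.
func makeContact() -> CNMutableContact {
    let contact = CNMutableContact()
    contact.givenName = "Jane"
    contact.familyName = "Doe"

    // Structured data: an address with named components, not a single string.
    let address = CNMutablePostalAddress()
    address.street = "1 Example Street"
    address.city = "Exampletown"
    address.country = "Example Country"
    contact.postalAddresses = [CNLabeledValue(label: CNLabelWork, value: address)]

    // Semi-structured data: plain values with a label indicating their role.
    contact.phoneNumbers = [
        CNLabeledValue(label: CNLabelWork,
                       value: CNPhoneNumber(stringValue: "+00 000 000 0000"))
    ]
    contact.emailAddresses = [
        CNLabeledValue(label: CNLabelHome, value: "jane@example.com" as NSString)
    ]
    return contact
}
```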

Improve editing

Current: Fields are grouped by type. Fields are edited inline. Field type is changed by dragging to a different section.

Problem: Editing controls (edit, add, move) make the view feel busy and crowded, which impedes usability. Dragging fields is problematic: sections may be off screen, requiring scrolling while dragging, which is hard to do reliably, and the user may not know in which direction to drag a field.

Goal: Tap on a field to show an edit screen for that field. Show a picker with field types. Customise the view to accommodate the data being edited (allow multiple lines for addresses; disallow them for phone numbers and email addresses).

Normalize image orientation

Image orientation metadata is not used when rendering annotation overlays. Either the image should be redrawn to remove the orientation, or the annotations should be rendered using the orientation metadata.
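A minimal sketch of the first option, redrawing the image so its orientation becomes .up and overlays no longer need to account for the orientation flag:

```swift
import UIKit

/// Returns an image whose pixel data is upright, so annotation overlays can be
/// positioned without consulting the orientation metadata.
func normalizedImage(_ image: UIImage) -> UIImage {
    guard image.imageOrientation != .up else { return image }
    let renderer = UIGraphicsImageRenderer(size: image.size)
    return renderer.image { _ in
        // draw(in:) applies the orientation transform while rendering.
        image.draw(in: CGRect(origin: .zero, size: image.size))
    }
}
```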

Existing information is overwritten when scanning

  1. Create a document.
  2. Add fields to the document by scanning, or manual entry.
  3. Tap scan button.
  4. Existing fields are removed and replaced with scanned information.

Expected:

  1. App should prompt user before overwriting information.

Context aware actions

Actions which can be performed on any field:

  • Copy
  • Share
  • Delete

Define abstract interface to be implemented by model objects. Interface should define the actions which the object can perform.

Define abstract interface for actions. Actions do not have state. An action is simply an interface to a task which can be executed. Actions may need to be aware of the view hierarchy (i.e. view controller) to present UI. Do actions need to notify the application on completion? An action may be shown as a table view action (delete), or as an activity. Actions may need to define a presentation intent.
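A minimal sketch of the abstract action interface described above, with hypothetical names; the presenting view controller is passed in so an action can show UI when it runs:

```swift
import UIKit

/// A stateless task that can be performed on a field's value.
protocol FieldAction {
    var title: String { get }
    /// Executes the action; the completion flag reports success or failure.
    func perform(on value: String,
                 from presenter: UIViewController,
                 completion: @escaping (Bool) -> Void)
}

/// Example conformance: copies the field value to the pasteboard.
struct CopyAction: FieldAction {
    let title = "Copy"

    func perform(on value: String,
                 from presenter: UIViewController,
                 completion: @escaping (Bool) -> Void) {
        UIPasteboard.general.string = value
        completion(true)
    }
}
```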

Google Vision API Key

Hello, I'm getting an issue in this code. The Google Vision API key is expired, so what can I do?

Show type selector when adding a new field

  • The data type must be selected before a new field entity is created.
  • The UI should reflect the type of data being created.
  • The data should be checked for conformance when the data is entered.
  • UI features should match the data being entered.

Types of data:

  • Person (Faces, Names, Roles, Departments)
  • Organisation (Name, Logo)
  • Phone number
  • Email
  • URL
  • Network account (service, account name, public URL)
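A minimal sketch of a type model backing the selector, assuming a hypothetical FieldType enum mirroring the list above:

```swift
import Foundation

/// Hypothetical field types offered by the selector before a field is created.
enum FieldType: String, CaseIterable {
    case person
    case organisation
    case phoneNumber
    case email
    case url
    case networkAccount

    /// Whether the entered value can be checked mechanically for conformance.
    var isValidatable: Bool {
        switch self {
        case .phoneNumber, .email, .url:
            return true
        case .person, .organisation, .networkAccount:
            return false
        }
    }
}
```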

Improve organisation name detection

Possible solutions:

  1. Use lexical analysis of common and possessive nouns to determine whether a name is an organisation (risks false positives such as "Bill Hammer", and won't work for acronyms such as "NASA").
  2. Use context: relative text coverage (big names are probably organisations), names near logos are probably organisations.
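A minimal sketch of the lexical-analysis idea in solution 1 using Apple's NaturalLanguage framework; the project itself targets MonkeyLearn and Google Natural Language, so this is only an on-device illustration, not the app's implementation:

```swift
import NaturalLanguage

/// Returns the lexical class (noun, verb, etc.) of each word in the text.
func lexicalClasses(of text: String) -> [(String, NLTag)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var result: [(String, NLTag)] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, range in
        if let tag = tag {
            result.append((String(text[range]), tag))
        }
        return true
    }
    return result
}

// A name made up mostly of proper nouns ("Bill Hammer") is more likely a person,
// while common nouns hint at an organisation; acronyms such as "NASA" would
// still need a separate heuristic.
```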
