lukevanin / ocrai

Optical Character Recognition Artificial Intelligence iOS app for Udacity nanodegree

License: MIT License

Swift 91.04% Objective-C 8.96%
optical-character-recognition natural-language-processing contacts ios swift mapkit monkeylearn google-vision udacity-nanodegree monkeylearn-api

ocrai's Introduction

OCRAI

Optical Character Recognition Artificial Intelligence iOS app for Udacity nanodegree.

API Configuration

API services such as Google Vision and MonkeyLearn require credentials for access. The exact credentials depend on the service; usually this entails registering for an account and acquiring a key or authorization token. The credentials are not included in this repository, as including them would present a security risk. Instructions for each service follow below. Please check the documentation for the relevant service for more details.

Google Vision & Google Natural Language

  1. Follow the instructions for "Setup an API key" here: https://cloud.google.com/natural-language/docs/common/auth. Note: The API key should not be restricted.
  2. In the Xcode project, copy or rename "google-api-config.default.plist" to "google-api-config.plist".
  3. Edit the file from step 2 and enter the Google API key in the "key" field.

MonkeyLearn

  1. Log in to MonkeyLearn here: http://monkeylearn.com/
  2. Find the API token in the "API Keys" section under "My Account": https://app.monkeylearn.com/main/my-account/tab/api-keys/
  3. In the Xcode project, copy or rename "monkeylearn-api-config.default.plist" to "monkeylearn-api-config.plist".
  4. Edit the plist file from step 3 and enter the API token in the "authorizationToken" field.
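The README does not show how the app consumes these plist files at runtime. As a rough illustration, here is a minimal sketch of reading both credentials, assuming the file and field names from the steps above and a hypothetical `APIConfig` helper:

```swift
import Foundation

/// Hypothetical helper for reading a single API credential from a bundled plist.
struct APIConfig {

    enum ConfigError: Error {
        case missingFile(String)
        case missingKey(String)
    }

    /// Reads a string value for `key` from `<name>.plist` in the main bundle.
    static func value(forKey key: String, inPlist name: String) throws -> String {
        guard
            let url = Bundle.main.url(forResource: name, withExtension: "plist"),
            let data = try? Data(contentsOf: url),
            let plist = try? PropertyListSerialization.propertyList(from: data, options: [], format: nil),
            let dictionary = plist as? [String: Any]
        else {
            throw ConfigError.missingFile(name)
        }
        guard let value = dictionary[key] as? String, !value.isEmpty else {
            throw ConfigError.missingKey(key)
        }
        return value
    }
}

// Usage (field names as configured in the steps above):
// let googleKey = try APIConfig.value(forKey: "key", inPlist: "google-api-config")
// let monkeyLearnToken = try APIConfig.value(forKey: "authorizationToken", inPlist: "monkeylearn-api-config")
```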

TODO

Essential

  • Enable camera / photo library buttons only when functionality is available.
  • Normalize image orientation when taking photo and importing.
  • Color image card according to type (person, organization, event, etc).
  • Include postal addresses in exported contact.
  • Prompt to overwrite when scanning over existing information.
  • Resize image to 1024x768 for uploading (see the sketch after this list).
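A minimal sketch of the resize step from the last item, assuming a simple aspect-fit scale with `UIGraphicsImageRenderer` (the actual target size and strategy may differ):

```swift
import UIKit

/// Scales an image down so it fits within 1024x768 before upload, preserving
/// aspect ratio. Illustrative only; never scales an already-small image up.
func resizeForUpload(_ image: UIImage, maxSize: CGSize = CGSize(width: 1024, height: 768)) -> UIImage {
    let scale = min(maxSize.width / image.size.width,
                    maxSize.height / image.size.height,
                    1.0)
    let targetSize = CGSize(width: image.size.width * scale,
                            height: image.size.height * scale)
    let format = UIGraphicsImageRendererFormat.default()
    format.scale = 1.0 // work in pixels rather than points
    let renderer = UIGraphicsImageRenderer(size: targetSize, format: format)
    return renderer.image { _ in
        image.draw(in: CGRect(origin: .zero, size: targetSize))
    }
}
```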

Nice to have

  • Only use a single derived value for each line of text; don't re-scan a line that is already tagged (e.g. don't tag "Sandy Bay" as an organization if it is already tagged as an address).
  • Search: Name, organization, phone number.
  • iPad layout: grid documents list. Show the document as a modal popover, or in a side detail view.
  • Continue scanning in background when switching back to list from detail view.
  • Improve editing: remove the modal edit/done state. Tap on a text field to edit; press enter to save. Always show a blank text field; adding text and pressing enter saves the data and creates a new blank text field.
  • Improve organization name detection: Check remaining text for nouns, after name detection.
  • Pre-process scanned image: Histogram balance.
  • Scan raw / uncompressed image data (avoid JPEG artifacts).
  • Support additional services: Haven, Tesseract.
  • Extract date information, tag fragments with dates, as event type.
  • Extract faces from scanned image.
  • Extract machine codes (QR code, bar code) from scanned image.
  • Extract logos from scanned image.
  • 3D Touch shortcut actions: take photo.
  • App extension: Scan image from photos app (import into scanner app).

ocrai's People

Contributors

lukevanin, lukevanin-takealot


ocrai's Issues

Wrong colour shown when moving item between sections

Reproduce:

  1. Tap on an item to go to the detail view.
  2. Scan the item or add an entity.
  3. Tap to edit the item.
  4. Drag an item to a different section.

The item keeps the colour of the section it was in.

(Before and after screenshots attached to the issue, 2017-03-19.)

Search

Search documents from listing screen

App crashes when taking photo

Reproduce:

  1. Launch the app.
  2. Tap on the camera (accept permissions if prompted).
  3. Tap on the photo icon.
  4. Tap "Use Photo".
  5. The app crashes.

Refactor data model

Current: Raw data from the data detector is stored as fragments, with annotations demarcating the detected data. The user views and edits fragments directly.

Problem: Fragment data does not correspond directly to the user's needs. E.g. changing a field to a different type, inserting a new field, or removing a field causes the data to no longer correspond to the scanned data.

Goal: Decouple scanned data from user data. Data should be modelled to better fit the intent of user modification. Original data should be retained if needed separately from user modification.
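A minimal sketch of the proposed decoupling, using hypothetical type names: the raw scan output stays immutable, while the user edits a separate set of fields that only reference the fragment they were derived from.

```swift
import Foundation

/// Immutable record of what the scanner detected (never edited by the user).
struct ScannedFragment {
    let id: UUID
    let text: String
    let detectedType: String   // e.g. "address", "phoneNumber"
}

/// User-facing field, editable independently of the original scan.
struct DocumentField {
    var id: UUID
    var value: String
    var type: String
    var sourceFragmentID: UUID?   // nil when the field was added manually
}
```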

Add keyword detection

  1. Add support in the scanner for the keyword detection API.
  2. Show keywords in scanned document.
  3. Allow keywords to be edited, added, and removed.

List screen should show indicator when scanning is in progress

  1. Add a document from the camera or photo library, or tap on an existing document and tap on the scan button.
  2. Scanning begins.
  3. Tap the back button (while scanning is underway).
  4. List appears.
  5. Wait for scanning to complete.
  6. List is updated.

Expected:
Message or activity indicator should appear to show that scanning is in progress.

Improve scanner user feedback

Current: The scanning process works atomically: the document is scanned in full, then imported into the data store.

Problem: User must wait for the entire scanning process to complete before seeing results.

Goal: Scanner should update data store incrementally as soon as data becomes available.

Implementation: Create a builder interface for composing a document. Scanners send detected data to the builder, and the builder updates the data store. The view controller observes the data store and updates the view when the store changes.
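A minimal sketch of such a builder interface, with hypothetical names (the real implementation would write to the app's data store):

```swift
import Foundation

/// Receives partial scan results as they arrive, so the data store can be
/// updated incrementally instead of only when the whole scan completes.
protocol DocumentBuilder {
    /// Called whenever a scanner produces a new piece of detected data.
    func addFragment(text: String, type: String)
    /// Called once all scanners have finished.
    func finish()
}

/// Example conformance: persists each fragment immediately, relying on the
/// view controller's observation of the data store to refresh the UI.
final class IncrementalDocumentBuilder: DocumentBuilder {
    private let documentID: UUID

    init(documentID: UUID) {
        self.documentID = documentID
    }

    func addFragment(text: String, type: String) {
        // Insert the fragment into the data store here; the observing view
        // controller picks up the change and updates the view.
    }

    func finish() {
        // Mark the document as fully scanned.
    }
}
```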

UI: Improve field type indicators

Coloured dots are shown next to each field. The dots are intended to indicate the field type. The colour is ambiguous without context.

Goal: Add a legend to indicate the field type, or add icons instead of dots, or remove indicators entirely and rely on section headers.

Address sometimes parsed as two separate parts

Occurs when the scanned text data contains recognisable address data interleaved with other data. The app does not recognise that the two parts of data are related.

The two parts should be merged into a single address entity; distinct addresses should remain separate.

Possible solutions:

  1. Use coordinate proximity to determine relationship.
  2. Merge by matching data with corresponding missing fields, e.g. if A has a street but no country, and B has a country but no street, then the addresses can be merged (see the sketch below).
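A minimal sketch of the second approach, using a hypothetical PartialAddress type: two partial addresses merge only when no component is defined in both.

```swift
import Foundation

/// Hypothetical partial address produced by the scanner.
struct PartialAddress {
    var street: String?
    var city: String?
    var postalCode: String?
    var country: String?
}

/// Returns the merged address when the two parts are complementary, or nil
/// when any component is present in both (suggesting two distinct addresses).
func merge(_ a: PartialAddress, _ b: PartialAddress) -> PartialAddress? {
    // Combines one component; the Bool reports whether the pair was mergeable.
    func combine(_ x: String?, _ y: String?) -> (String?, Bool) {
        switch (x, y) {
        case (.some, .some):
            return (nil, false)               // conflict: both sides define it
        case (.some(let value), nil), (nil, .some(let value)):
            return (value, true)
        case (nil, nil):
            return (nil, true)
        }
    }
    let street = combine(a.street, b.street)
    let city = combine(a.city, b.city)
    let postalCode = combine(a.postalCode, b.postalCode)
    let country = combine(a.country, b.country)
    guard street.1, city.1, postalCode.1, country.1 else { return nil }
    return PartialAddress(street: street.0, city: city.0,
                          postalCode: postalCode.0, country: country.0)
}
```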

This may be resolved by using the Microsoft Vision API, which groups information differently.

Alternatively, allow user to select addresses to merge. Use case:

  1. Tap on address.
  2. Tap merge button on context menu.
  3. List of all other addresses appears.
  4. Tap address to merge into.
  5. Show a preview of the merged address. Corresponding fields which both contain content are concatenated. Alternatively, the user can control the merge by selecting the fields to be included.
  6. A new object is created with the merged data. The merged objects are deleted.

Blank field added to document

Reproduce:

  1. Select document from list (empty or pre-populated).
  2. Tap edit.
  3. Tap on empty field.
  4. Do not enter any text.
  5. Tap on another field.
  6. Note the first field is saved and a new empty field appears.

Expected:
Empty field should not be saved.

Use structured data for fields

After scanning, data is stored as key-value pairs. It would be beneficial to store certain kinds of data, such as addresses, in specialised data structures.

Structured data
Addresses consist of multiple components, and can be used to derive additional data, such as geographical coordinates. The current key-value storage schema prevents this.

Unstructured data
Unstructured data, such as names and untagged text, may be stored as key-value pairs. The data may be tagged to indicate its intent, e.g. name (first and last if possible), organisation, department, salutation.

Semi-structured data
Semi-structured data, such as phone numbers, URLs, email addresses, and social media names, may also be stored as plain text. These values may be labelled (e.g. home, work, fax) to indicate their role. It would be beneficial to provide UI functions specific to the type of data, e.g. call a phone number, send a message to a phone number or email address, or open a web page; all of these values can be shared. This kind of data should be validated for conformance to the accepted format: when the user edits the information it should be checked, and if it does not conform it should still be saved and a warning shown.

  • Tags for phone numbers: home, work, fax
  • Tags for email: home, work
  • URLs are not tagged, although they can be labelled: blog, web site, home page, news, Twitter, Facebook.
  • Social media names should be associated with recognised social media providers (Twitter, Facebook). It should be possible to derive a profile URL from the name. The user should be allowed to convert an unrecognised social media name into a URL. Social media accounts may be a specialised form of URL (i.e. the account name is converted to a URL, which is labelled automatically to indicate a social media account).
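A minimal sketch of how the structured and semi-structured values above could map onto the Contacts framework when exporting (the values are placeholders, and the app's intermediate model would likely differ):

```swift
import Contacts

/// Builds a contact from scanned data using a structured address and
/// labelled semi-structured values. All values here are placeholders.
func makeContact() -> CNMutableContact {
    let contact = CNMutableContact()
    contact.givenName = "Jane"
    contact.familyName = "Doe"

    // Structured data: an address with named components, not a single string.
    let address = CNMutablePostalAddress()
    address.street = "1 Example Street"
    address.city = "Exampletown"
    address.country = "Example Country"
    contact.postalAddresses = [CNLabeledValue(label: CNLabelWork, value: address)]

    // Semi-structured data: plain values with a label indicating their role.
    contact.phoneNumbers = [
        CNLabeledValue(label: CNLabelWork,
                       value: CNPhoneNumber(stringValue: "+00 000 000 0000"))
    ]
    contact.emailAddresses = [
        CNLabeledValue(label: CNLabelHome, value: "jane@example.com" as NSString)
    ]
    return contact
}
```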

Improve editing

Current: Fields are grouped by type. Fields are edited inline. Field type is changed by dragging to a different section.

Problem: Editing controls (edit, add, move) make the view feel busy and crowded, which impedes usability. Dragging fields is problematic: sections may be off screen, requiring scrolling while dragging, which is hard to do reliably, and the user may not know in which direction to drag a field.

Goal: Tap on a field to show an edit screen for that field. Show a picker with field types. Customise the view to accommodate the data being edited (allow multiple lines for addresses; disallow them for phone numbers and email addresses).

Normalize image orientation

Image orientation metadata is not used when rendering annotation overlays. Either the image should be redrawn to remove the orientation, or the annotations should be rendered using the orientation metadata.
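A minimal sketch of the first option, redrawing the image so its orientation becomes .up and overlays no longer need to account for the orientation flag:

```swift
import UIKit

/// Returns an image whose pixel data is upright, so annotation overlays can be
/// positioned without consulting the orientation metadata.
func normalizedImage(_ image: UIImage) -> UIImage {
    guard image.imageOrientation != .up else { return image }
    let renderer = UIGraphicsImageRenderer(size: image.size)
    return renderer.image { _ in
        // draw(in:) applies the orientation transform while rendering.
        image.draw(in: CGRect(origin: .zero, size: image.size))
    }
}
```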

Existing information is overwritten when scanning

  1. Create a document.
  2. Add fields to the document by scanning, or manual entry.
  3. Tap scan button.
  4. Existing fields are removed and replaced with scanned information.

Expected:

  1. App should prompt user before overwriting information.

Context aware actions

Actions which can be performed on any field:

  • Copy
  • Share
  • Delete

Define abstract interface to be implemented by model objects. Interface should define the actions which the object can perform.

Define abstract interface for actions. Actions do not have state. An action is simply an interface to a task which can be executed. Actions may need to be aware of the view hierarchy (i.e. view controller) to present UI. Do actions need to notify the application on completion? An action may be shown as a table view action (delete), or as an activity. Actions may need to define a presentation intent.
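A minimal sketch of the abstract action interface described above, with hypothetical names; the presenting view controller is passed in so an action can show UI when it runs:

```swift
import UIKit

/// A stateless task that can be performed on a field's value.
protocol FieldAction {
    var title: String { get }
    /// Executes the action; the completion flag reports success or failure.
    func perform(on value: String,
                 from presenter: UIViewController,
                 completion: @escaping (Bool) -> Void)
}

/// Example conformance: copies the field value to the pasteboard.
struct CopyAction: FieldAction {
    let title = "Copy"

    func perform(on value: String,
                 from presenter: UIViewController,
                 completion: @escaping (Bool) -> Void) {
        UIPasteboard.general.string = value
        completion(true)
    }
}
```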

Google Vision API Key

Hello, I'm getting an issue in this code. The Google Vision API key is expired, so what can I do?

Show type selector when adding a new field

  • The data type must be selected before a new field entity is created.
  • The UI should reflect the type of data being created.
  • The data should be checked for conformance when the data is entered.
  • UI features should match the data being entered.

Types of data:

  • Person (Faces, Names, Roles, Departments)
  • Organisation (Name, Logo)
  • Phone number
  • Email
  • URL
  • Network account (service, account name, public URL)
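A minimal sketch of a type model backing the selector, assuming a hypothetical FieldType enum mirroring the list above:

```swift
import Foundation

/// Hypothetical field types offered by the selector before a field is created.
enum FieldType: String, CaseIterable {
    case person
    case organisation
    case phoneNumber
    case email
    case url
    case networkAccount

    /// Whether the entered value can be checked mechanically for conformance.
    var isValidatable: Bool {
        switch self {
        case .phoneNumber, .email, .url:
            return true
        case .person, .organisation, .networkAccount:
            return false
        }
    }
}
```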

Improve organisation name detection

Possible solutions:

  1. Use lexical analysis of common and possessive nouns to determine whether a name is an organisation (risks false positives such as "Bill Hammer", and won't work for acronyms such as "NASA").
  2. Use context: relative text coverage (big names are probably organisations), names near logos are probably organisations.
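A minimal sketch of the lexical-analysis idea in solution 1 using Apple's NaturalLanguage framework; the project itself targets MonkeyLearn and Google Natural Language, so this is only an on-device illustration, not the app's implementation:

```swift
import NaturalLanguage

/// Returns the lexical class (noun, verb, etc.) of each word in the text.
func lexicalClasses(of text: String) -> [(String, NLTag)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var result: [(String, NLTag)] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, range in
        if let tag = tag {
            result.append((String(text[range]), tag))
        }
        return true
    }
    return result
}

// A name made up mostly of proper nouns ("Bill Hammer") is more likely a person,
// while common nouns hint at an organisation; acronyms such as "NASA" would
// still need a separate heuristic.
```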
