
shape-detection-api's Introduction

Shape Detection API Specification

This is the repository for shape-detection-api, an experimental API for detecting Shapes (e.g. Faces, Barcodes, Text) in live or still images on the Web by using accelerated hardware/OS resources.

You're welcome to contribute! Let's make the Web rock our socks off!

Introduction

Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, text or QR codes. Detecting these features is computationally expensive, but would lead to interesting use cases e.g. face tagging or detection of high saliency areas. Users interacting with WebCams or other Video Capture Devices have become accustomed to camera-like features such as the ability to focus directly on human faces on the screen of their devices. This is particularly true in the case of mobile devices, where hardware manufacturers have long been supporting these features. Unfortunately, Web Apps do not yet have access to these hardware capabilities, which makes the use of computationally demanding libraries necessary.

Use cases

QR/barcode/text detection can be used for:

  • user identification/registration, e.g. for voting purposes;
  • eCommerce, e.g. Walmart Pay;
  • Augmented Reality overlay, e.g. here;
  • Driving online-to-offline engagement, fighting fakes etc.

Face detection can be used for:

  • producing fun effects, e.g. Snapchat Lenses;
  • giving hints to encoders or auto focus routines;
  • user name tagging;
  • enhancing accessibility by, e.g., making objects appear larger as the user gets closer, like HeadTrackr;
  • speeding up Face Recognition by indicating the areas of the image where faces are present.
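Several of these use cases boil down to picking the most prominent detection. A minimal sketch against the proposed FaceDetector interface (feature-detected, since support is experimental; `largestFace` and `tagMostProminentFace` are hypothetical helpers, not part of the API):

```javascript
// Sketch only: FaceDetector is an experimental API and may be unavailable.
// largestFace() is a hypothetical helper, not part of the spec.
function largestFace(faces) {
  // Pick the detection with the largest bounding-box area.
  return faces.reduce((best, f) => {
    const area = f.boundingBox.width * f.boundingBox.height;
    const bestArea = best.boundingBox.width * best.boundingBox.height;
    return area > bestArea ? f : best;
  }, faces[0]);
}

async function tagMostProminentFace(image) {
  if (typeof FaceDetector === 'undefined') return null; // feature-detect
  const detector = new FaceDetector({ fastMode: true });
  const faces = await detector.detect(image);
  return faces.length ? largestFace(faces) : null;
}
```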

Current Related Efforts and Workarounds

Some Web Apps -gasp- run Detection in JavaScript. A performance comparison of some such libraries can be found here (note that this performance evaluation does not include e.g. WebCam image acquisition and/or canvas interactions).

Samsung Browser has a private API (click to unfold "Overview for Android", then search for "QR code reader").

TODO: compare a few JS/native libraries in terms of size and performance. A performance and detection comparison of some popular JS QR code scanners can be found here. zxingjs2 has a list of some additional JS libraries.

Android Native Apps usually integrate ZXing (which amounts to adding ~560KB when counting core.jar, android-core.jar and android-integration.jar).

OCR readers in JavaScript are north of 1 MB in size.

Potential for misuse

Face Detection is an expensive operation due to its algorithmic complexity. Frequent requests, or demanding sources such as a live video stream at a high frame rate, could slow down the whole system or greatly increase power consumption.
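A common mitigation is to throttle how often detection actually runs, e.g. only on every Nth frame of a stream. A rough sketch (`makeFrameThrottle` is a hypothetical helper, not part of any API):

```javascript
// Sketch: run detection only every N frames to bound CPU/power cost.
// makeFrameThrottle() is a hypothetical helper, not part of the spec.
function makeFrameThrottle(everyNthFrame) {
  let frame = 0;
  return () => frame++ % everyNthFrame === 0;
}

// Usage in a requestAnimationFrame loop (browser-only, for illustration):
// const shouldDetect = makeFrameThrottle(10);
// function onFrame() {
//   if (shouldDetect()) detector.detect(video).then(render);
//   requestAnimationFrame(onFrame);
// }
```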

Platform specific implementation notes

Overview

What platforms support what detector?

Encoder      Mac   Android   Win10   Linux   ChromeOS
Face         sw    hw/sw     sw      ✘       ✘
QR/Barcode   sw    sw        ✘       ✘       ✘
Text         sw    sw        sw      ✘       ✘

Android

Android provides both a stand-alone software face detector and an interface to the hardware ones.

API                 Uses...                                    Release notes
FaceDetector        Software, using the Neven face detector    API Level 1, 2008
Vision.Face         Software                                   Google Play services 7.2, Aug 2015
Camera2             Hardware                                   API Level 21/Lollipop, 2014
Camera.Face (old)   Hardware                                   API Level 14/Ice Cream Sandwich, 2011

The availability of actual hardware detection depends on the chip; according to market share in 1H 2016, Qualcomm, MediaTek, Samsung and HiSilicon are the largest individual chip vendors, and they all support Face Detection (all the top-10 phones are covered as well).

Barcode/QR and Text detection is available via Google Play Services barcode and text, respectively.

Mac OS X / iOS

Mac OS X / iOS provide CIDetector and the Vision Framework for Face, QR, Text and Rectangle detection in software or hardware.

API                          Uses...                 Release notes
Vision Framework, Mac OS X   Software and Hardware   OS X v10.13, 2017
Vision Framework, iOS        Software and Hardware   iOS v11.0, 2017
CIDetector, Mac OS X         Software                OS X v10.7, 2011
CIDetector, iOS              Software                iOS v5.0, 2011
AVFoundation                 Hardware                iOS 6.0, 2012

Apple has supported Face Detection in hardware since the Apple A5 processor introduced in 2011.

Windows

Windows 10 has a FaceDetector class and support for Text Detection OCR.

Rendered URL

The rendered version of this site can be found at https://wicg.github.io/shape-detection-api (if that's not available for some reason, try the rawgit rendering).

Examples and demos

https://wicg.github.io/shape-detection-api/#examples

Notes on bikeshedding

To compile, run:

curl https://api.csswg.org/bikeshed/ -F file=@index.bs -F force=1 > index.html

if the produced file has a strange size (e.g. zero), then something went terribly wrong; instead run

curl https://api.csswg.org/bikeshed/ -F file=@index.bs -F output=err

and try to figure out why bikeshed did not like the .bs :'(

shape-detection-api's People

Contributors

autokagami, beaufortfrancois, codeimpl, danimoh, dontcallmedom, fujunwei, huningxin, jchinlee, marcoscaceres, nadia-mint, neotan, reillyeon, saschanaz, scheib, yellowdoge


shape-detection-api's Issues

Uncaught SyntaxError: await is only valid in async function

Using a code block from Shape Detection API notes (https://developers.google.com/web/updates/2019/01/shape-detection) inside a Polymer 2.x application shell, I get the following error:

Uncaught SyntaxError: await is only valid in async function

ready() {
  super.ready();
  this.initImageDetection();
}

initImageDetection() {
  const barcodeDetector = new BarcodeDetector({
    formats: [
      'code_128'
    ]
  });
  try {
    const barcodes = await barcodeDetector.detect(image);
    barcodes.forEach(barcode => console.log(barcode));
  } catch (e) {
    console.error('Barcode detection failed:', e);
  }
}

Is the Shape Detection API compatible with Polymer 2.x?
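The error is unrelated to Polymer: `await` is used inside `initImageDetection()`, which is not declared `async`. A sketch of the fix, extracted here into a standalone async function (the `detector`/`image` parameters are added for testability and are not in the original snippet):

```javascript
// Minimal fix: `await` is only valid inside an async function, so the
// method (here, a standalone function) must be declared async.
async function initImageDetection(detector, image) {
  try {
    const barcodes = await detector.detect(image);
    barcodes.forEach(barcode => console.log(barcode.rawValue));
    return barcodes;
  } catch (e) {
    console.error('Barcode detection failed:', e);
    return [];
  }
}
```

In the Polymer element itself, the equivalent change is simply `async initImageDetection() { ... }`; `ready()` can call it without awaiting.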

Is the Shape Detection API compatible with Chrome on macOS?

maxDetectedFaces not working, what about fastMode?

Hello.
I've been playing with the API. I am about to try to utilize it in a small desktop app, which is already a bit heavy. Although limiting the number of faces detected won't matter in my case, it seemed strange to me that this option doesn't work. Here are the results of the simple test I made:

One face: (screenshot)

24 faces (all detected, although only 2 were requested): (screenshot)

Actually, reading the draft kind of left me with the impression that those two options (along with fastMode) had never been implemented, but I just wanted to share it here so that it's clear for anyone else :)
Always feel free to correct me!

Cheers,
Kostadin

Browser compatibility

Reading through the issues it looks like Shape Detection API is software compatible on macOS and Android.

(screenshot)

However when trying to run the demo at https://shape-detection-demo.glitch.me/

I get an error in Chrome Version 71.0.3578.98 (Official Build) (64-bit):

(screenshot of the error)

And also on Chrome Version 72.0.3626.105 on Android P:

(screenshot)

What is the availability of this technology for use in PWAs?

`ImageBitmapSource` is already defined and should not refer to other `typedef`s

Section 2.1 - Image sources for detection defines ImageBitmapSource as being

typedef (CanvasImageSource or
         Blob or
         ImageData) ImageBitmapSource;

Two problems: ImageBitmapSource is already defined (in https://html.spec.whatwg.org/multipage/webappapis.html#imagebitmapsource), and Web IDL typedefs cannot refer to each other (https://heycam.github.io/webidl/#dfn-typedef):

The Type must not identify the same or another typedef.

Replace supportedFormats with static getSupportedFormats() method

The problems with the current supportedFormats attribute are twofold:

  1. The property can be read synchronously, which is incompatible with modern user agent designs that run script in a sandboxed process; such a UA would have to cache the value of this attribute at startup in order to provide it synchronously, or pause script execution until it can be fetched asynchronously.
  2. While a developer can select the barcode formats they would like to use through BarcodeDetectorOptions they must first create a new BarcodeDetector in order to get access to this property.

This attribute should instead be (a) static and (b) a method returning a Promise.
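The proposed shape could look like the sketch below (`BarcodeDetectorSketch` is a stand-in class for illustration; the real method name and return values are still under discussion):

```javascript
// Sketch of the proposal: a static method returning a Promise, usable
// without constructing a detector first. Stand-in class, not the real API.
class BarcodeDetectorSketch {
  static async getSupportedFormats() {
    // A real UA would query the platform asynchronously here.
    return ['qr_code', 'code_128'];
  }
}

// Proposed usage:
// const formats = await BarcodeDetector.getSupportedFormats();
// const detector = new BarcodeDetector({ formats });
```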

Support face landmarks detection

Face landmarks detection capability of native platform:

iOS

CIFaceFeature: landmarks feature supports left eye, right eye, mouth

Android

FaceDetector.Face: landmarks feature supports left eye and right eye position (with eyes distance and middle point)
com.google.android.gms.vision.face.Face: landmarks feature supports mouth, cheek, ear, eye, nose

RealSense SDK for Windows

Face Module: landmarks detection feature supports 77 points of eye, eyebrow, nose, mouth and cheek

Support face tracking in live streams

Use Cases

Platform specific implementation notes

Android

Camera2 CaptureRequest
Event-Driven Pipeline for Face Tracking

iOS

Tracking Faces in Video

Rough sketch of MediaStreamFaceDetector

[Constructor(MediaStream stream, optional FaceDetectorOptions faceDetectorOptions)]
interface MediaStreamFaceDetector: EventTarget {
    readonly        attribute MediaStream       stream;
                    attribute EventHandler      onerror;
                    attribute EventHandler      onfacedetected;
};

[Constructor(DOMString type, FaceDetectionEventInit eventInitDict)]
interface FaceDetectionEvent : Event {
  readonly attribute FrozenArray<DetectedFace> faces;
};

dictionary FaceDetectionEventInit : EventInit {
  sequence<DetectedFace> faces;
};

Usage

navigator.mediaDevices.getUserMedia(constraints).then(stream => {
  var msfd = new MediaStreamFaceDetector(stream, {fastMode: true, maxDetectedFaces: 1});

  msfd.onfacedetected = function(event) {
    for (const face of event.faces) {
      console.log(`Face ${face.id} detected at (${face.boundingBox.x}, ${face.boundingBox.y}) with size ${face.boundingBox.width}x${face.boundingBox.height}`);
    }
  }
});

Optional interpretation of GS1 barcode

GS1, the standards organization behind the identifiers encoded as barcodes throughout the global supply chain, most famously the barcodes that go beep at the checkout, is (perhaps belatedly) working to bring its system to the Web. In August, we published a new standard (PDF, sorry) encoding any valid combination of GS1 identifiers in an HTTP URI and we're now working on a bunch of issues around defining resolver services, semantics and a compression algorithm. The latter is important as if you encode a GTIN (the high level product identifier that goes beep at the checkout) and the batch/lot number, a serial number and an expiry date into a URI it ends up being way too long to fit reliably in a QR code. So we need to reduce the number of characters but can't require an online look up in a critical system. So bit.ly and friends are not an answer.

One of the aims of the GS1 work is to (over time) make it possible for a URL in a QR code to be used in the same way the familiar 1D barcode is used today. The one code therefore does the job of providing a consumer-facing link and a supply chain ID. Another aim is to enable all GS1 barcodes, whatever the symbology, to act as a gateway to the Web. The new standard - which just defines a syntax that at its most basic is just https://example.com/gtin/{gtin} - is getting a lot of attention from major brands and retailers and I am hopeful of substantial adoption.

Anyway... lots more to say about this if people want me to, but the essence of the issue here is whether or not the developer community would see value in augmenting the API to work directly with the dominant global product identification system by including:

  1. the compression/decompression algorithm (currently being defined and expected around Q1 2019);
  2. a simple syntax for sending a GTIN to a GS1 conformant resolver service, i.e. if the barcode reader returns 614141123452 and a consumer product variant of '2A' (think of a Coke bottle with the name 'Tim' on it as opposed to 'Martha'), the API would automatically generate the correctly formatted URL of {givenStub}/gtin/614141123452/cpv/2A.

If I may push a little further, we might even have a default resolver service of id.gs1.org, see - you guessed it - https://id.gs1.org/gtin/614141123452/cpv/2A.

Of course, all this can be done by including the separate code library we're working on, and not making it part of the standard API, but it's a lot simpler if things like this work out of the box.
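The URL construction described above can be sketched as follows (`gs1DigitalLink` is a hypothetical helper name; the path syntax is taken from the examples in this issue):

```javascript
// Sketch: build a GS1-style URL of the form {stub}/gtin/{gtin}[/cpv/{cpv}].
function gs1DigitalLink(stub, gtin, cpv) {
  let url = `${stub}/gtin/${gtin}`;
  if (cpv) url += `/cpv/${cpv}`;
  return url;
}

// gs1DigitalLink('https://id.gs1.org', '614141123452', '2A')
// → 'https://id.gs1.org/gtin/614141123452/cpv/2A'
```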

Use case: planar surface detection

Recently, Apple showed off some demos of their new ARKit SDK, which apparently features planar surface detection without requiring a marker. A video of one of the demos can be found here.

Shape Detection API indirectly supports planar surface detection with the use of a marker (via BarcodeScanner). However, planar surface detection without requiring a marker has some obvious user experience advantages. Is such detection worth considering for inclusion in the Shape Detection API?

Restrict to secure contexts

While this API does not provide access to information not otherwise available to the page, it is likely to be processing privacy-sensitive information (such as images taken with the user's camera) that should not be leaked over an insecure connection, so this feature should be limited to secure contexts.

@tomayac

Include orientation angles for detected faces

The Android face detection API includes orientation angles for detected faces. Similarly, the Google Cloud Vision API includes roll, pan and tilt angles for detected faces. It would be useful to have these details for detected faces if they are made available by the underlying system API. In cases where the author intends to decorate detected faces with art work, these angles can help the author to make the decorations more adaptive to the scene.

Need `getSupportedLandmarks` function for FaceDetector

Similar to supportedFormats in BarcodeDetector, I would like to motivate a similar feature tentatively named getSupportedLandmarks for the FaceDetector to communicate to the developer whether or not landmarks like eyes, mouth, nose, etc. are detectable; and if so, which of them.

As motivated in #54, it should actually be a static method.

confidence factor in detected result

I haven't found any way to estimate the accuracy of the result provided by the API. It may be interesting for the application to have a confidence factor from the underlying technology.

The browser tells me there is a face here, but is it 50% or 99% sure? This information may be useful for the application to know.

Use case 1: for example, the app detected a face close to this location in a previous frame, so if the new detection has 50% confidence, it is good enough.

Use case 2: the application is doing an initial detection without previous knowledge, so the API has to be really sure; for example, the app would require 90% confidence.

Potentially support platform-specific formats in BarcodeFormat enum

Currently, the specification provides "unknown" as a valid BarcodeFormat [1]. However, it doesn't make sense for a user to hint "unknown" in a BarcodeDetectorOptions, and returning "unknown" in getSupportedFormats doesn't (currently) make sense either.

One possibility proposed by @reillyeon is to convert "unknown" to "platform-specific", i.e. encompassing platform-supported formats unknown to the spec.

This could be passed back to the user, in getSupportedFormats or elsewhere, to indicate the platform supports more formats than expressed in the spec, and could be used by the user to include platform-specific formats in the hint in addition to spec-known formats.

[1] https://wicg.github.io/shape-detection-api/#dom-barcodeformat-unknown

Consider a version without constructors and classes

The FaceDetector and BarcodeDetector classes seem unwieldy compared to simple function calls, e.g. navigator.detectFaces(source, options). Why do they exist? What state do they store that is so heavyweight it needs a potentially long-lived class?

Bikeshed/spec structure tips and nits

Bikeshed optimizations:

  • Replace <pre class="idl"> with <xmp class=idl> and then &lt;-style escaping can be dropped
  • Add Markup Shorthands: markdown yes to the metadata section, and then you can use:
    • : and :: as shortcuts for <dt> and <dd>
    • backticks instead of <code>
    • ```js ... ``` to surround code examples, no need for <div class=...><pre>

The spec is using "domintro", a concept introduced elsewhere, for non-normative sections that explain to developers how to use the feature:

  • It doesn't contain style definitions for these (which aren't present in the WICG template); those need to be copied from elsewhere for now. They should have a green/non-normative background.
  • <dfn> (normative) should not be occurring inside domintro (non-normative) sections.

face detector fastMode as boolean may be too binary ?

fastMode is a boolean which "Hint to the UA to try and prioritise speed over accuracy by e.g. operating on a reduced scale or looking for large features." - link

So it is a way for the user to express a preference for speed over accuracy. I don't know if a boolean isn't a bit binary (pun on purpose :) ).

One user may prefer speed over accuracy, but just a bit, not all the way.

PS: it is just a thought about API flexibility, not that important.

No Log

I have a canvas that gets populated with "image" data for lack of a better term, and am trying to run a barcode scan on it.

<canvas id="pic"></canvas>
<paper-button on-tap="scanBarcode"></paper-button>

...

  async scanBarcode() {
    const barcodeDetector = new BarcodeDetector({
      formats: [
        'code_128',
      ]
    });
    try {
        const barcodes = await barcodeDetector.detect(this.$.pic);
        barcodes.forEach(barcode => console.log(barcode));
    } catch (e) {
      console.error('Barcode detection failed:', e);
    }
  }

Using this.$.pic, nothing is logged to the console (I would expect null, undefined, or a result).

When I try this.$.pic.toDataURL("image/png") I get the following error:

home.html:168 Barcode detection failed: TypeError: Failed to execute 'detect' on 'BarcodeDetector': The provided value is not of type '(HTMLImageElement or SVGImageElement or HTMLVideoElement or HTMLCanvasElement or Blob or ImageData or ImageBitmap or OffscreenCanvas)'

What data type or DOM attribute must be passed into barcodeDetector.detect() to parse the image?

I'm not sure what form HTMLImageElement or HTMLCanvasElement take.
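For reference, detect() accepts image-like objects (the canvas/img/video element itself, a Blob, ImageData, ImageBitmap, or OffscreenCanvas), while toDataURL() returns a plain string, which explains the TypeError. A hedged sketch of a guard (`scanSource` is an illustrative helper, not part of the API):

```javascript
// detect() takes image-like objects, never a data-URL string; passing the
// element itself (e.g. this.$.pic) is the intended usage.
async function scanSource(detector, source) {
  if (typeof source === 'string') {
    throw new TypeError('Pass the canvas element itself, not toDataURL() output');
  }
  return detector.detect(source);
}
```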

browser support question

Hi,
The Shape Detection API can be used in Chrome since Chrome 70.

Does anyone know if there is any project to make it work in other browsers like Firefox or Edge?

interface DetectedBarcode should contain encoding

Given that a frame may contain multiple different barcode encodings during a detect(), it would be useful to have the interface return the format (similar to Android: https://developers.google.com/android/reference/com/google/android/gms/vision/barcode/Barcode.html#format), resulting in:

interface DetectedBarcode {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
  [SameObject] readonly attribute DOMString format;
  [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
};

Use Case

We have a series of various types of barcodes used for inventory tracking and shipping that are basically in the same processed frame. While we can run our own checks based on rawValue, it would be more consistent with other APIs like Android to simply return the detected format.
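Consuming the proposed format attribute might then look like this sketch (`groupByFormat` is a hypothetical helper, not part of the proposal):

```javascript
// Sketch: route mixed barcode types detected in a single frame by the
// proposed `format` attribute.
function groupByFormat(barcodes) {
  const groups = {};
  for (const b of barcodes) {
    (groups[b.format] = groups[b.format] || []).push(b.rawValue);
  }
  return groups;
}
```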

Bounding box is insufficient for AR marker use case

The spec as it is currently written uses a DOMRect (with x, y, width, and height properties) for describing the boundingBox of a detected QR code.

QR code detection needs to be exposed in terms of all 4 independent corners of the code (which will likely form a non-square quadrilateral, from which rotation and perspective can be determined) if a QR code is going to be recognized in the frame for overlay purposes (example: https://stuartpb.github.io/quirc.js/test_webrtc.html) and used as a marker (as described on http://www.multidots.com/augmented-reality/, a page linked in the current spec).
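As a sketch of why the corners matter: in-plane rotation can be recovered from the top edge of the quadrilateral, which a bounding box alone cannot provide (`rotationDegrees` is a hypothetical helper; the corner ordering starting at the top-left is an assumption, not spec text):

```javascript
// Sketch: estimate in-plane rotation from the top edge of four detected
// corners, assumed ordered clockwise from the top-left.
function rotationDegrees(cornerPoints) {
  const [topLeft, topRight] = cornerPoints;
  return Math.atan2(topRight.y - topLeft.y, topRight.x - topLeft.x) * 180 / Math.PI;
}
```

Perspective (tilt out of the image plane) would additionally need the other two corners, e.g. via a homography fit.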

Language around BarcodeDetectorOptions is confused

https://wicg.github.io/shape-detection-api/#dom-barcodedetector-barcodedetector says:

If barcodeDetectorOptions is passed, and its formats are empty

but barcodeDetectorOptions is always passed, since it's a dictionary in trailing position: Web IDL specifies that those always have a default value.

At the same time, the "formats" member of the dictionary may not be present, but the spec doesn't handle that case. It needs to, unless that member is marked required or given a default value.

Text detection should be removed or split out

I noticed that the text detection section was marked non-normative in its entirety, which is a bit unorthodox - is this a feature that is considered essential?

This would be problematic when progressing to rec. I'd be happy to sit down and discuss a way forward on this.

Selecting barcode formats

Currently BarcodeDetector searches for every supported barcode format. To improve performance it would be nice if you could select only the formats you need.

cornerPoints in Text Detection API ?

In the text detection section, I don't see cornerPoints. Is that on purpose?

cornerPoints may be quite useful for image processing. For example, we could track the blob in 2D across subsequent images, and possibly do some pose estimation if the size of the text image is known.

Potentially need `getSupportedLanguages` function for TextDetector

While text recognition (in the sense of "there is text within this bounding box" as in iOS) doesn't need language hints or return a detected language, true OCR (in the sense of "there is text, and this is what it spells" as in Tesseract) typically will offer best effort results for unknown languages, but activate special models if the language is known for improved results.

This motivates having the option for obtaining a list of supported languages by the UA's underlying implementation, tentatively named getSupportedLanguages, which should be a static method (as illustrated in #54).

Barcode format hinting is underspecified

The specification should be clearer about how the formats parameter is used by the detect() function to provide a hint to the underlying implementation about what barcode formats should be detected in the image.

Support custom detected objects

As discussed in https://discourse.wicg.io/t/rfc-proposal-for-face-detection-api/1642/20 and https://discourse.wicg.io/t/rfc-proposal-for-face-detection-api/1642/21, we could allow custom detected objects by implementing DetectedObject interface.

In this way, we can

  1. add shape type specific attributes to specific detected object interface, like landmarks of face or url of barcode.
  2. not need ShapeType enum anymore.

I understand one specific shape detector should only detect one specific shape, so it doesn't need to check the type in the detect() promise.

[NoInterfaceObject, Exposed=(Window,Worker)]
interface DetectedObject {
    readonly attribute DOMRect boundingBox;
};

interface DetectedFace {
    // readonly attribute unsigned long id;
    // readonly attribute sequence<Landmark>? landmarks;
};

DetectedFace implements DetectedObject;

// DetectedBarcode, DetectedText
