
shape-detection-api's Introduction

Shape Detection API Specification

This is the repository for shape-detection-api, an experimental API for detecting Shapes (e.g. Faces, Barcodes, Text) in live or still images on the Web by using accelerated hardware/OS resources.

You're welcome to contribute! Let's make the Web rock our socks off!

Introduction

Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, text or QR codes. Detecting these features is computationally expensive, but would lead to interesting use cases e.g. face tagging or detection of high saliency areas. Users interacting with WebCams or other Video Capture Devices have become accustomed to camera-like features such as the ability to focus directly on human faces on the screen of their devices. This is particularly true in the case of mobile devices, where hardware manufacturers have long been supporting these features. Unfortunately, Web Apps do not yet have access to these hardware capabilities, which makes the use of computationally demanding libraries necessary.

Use cases

QR/barcode/text detection can be used for:

  • user identification/registration, e.g. for voting purposes;
  • eCommerce, e.g. Walmart Pay;
  • Augmented Reality overlay, e.g. here;
  • Driving online-to-offline engagement, fighting fakes etc.

Face detection can be used for:

  • producing fun effects, e.g. Snapchat Lenses;
  • giving hints to encoders or auto focus routines;
  • user name tagging;
  • enhancing accessibility by, e.g., making objects appear larger as the user gets closer, like HeadTrackr;
  • speeding up Face Recognition by indicating the areas of the image where faces are present.
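Several of these use cases boil down to picking the most prominent detection. A minimal sketch against the proposed FaceDetector interface (feature-detected, since support is experimental; `largestFace` and `tagMostProminentFace` are hypothetical helpers, not part of the API):

```javascript
// Sketch only: FaceDetector is an experimental API and may be unavailable.
// largestFace() is a hypothetical helper, not part of the spec.
function largestFace(faces) {
  // Pick the detection with the largest bounding-box area.
  return faces.reduce((best, f) => {
    const area = f.boundingBox.width * f.boundingBox.height;
    const bestArea = best.boundingBox.width * best.boundingBox.height;
    return area > bestArea ? f : best;
  }, faces[0]);
}

async function tagMostProminentFace(image) {
  if (typeof FaceDetector === 'undefined') return null; // feature-detect
  const detector = new FaceDetector({ fastMode: true });
  const faces = await detector.detect(image);
  return faces.length ? largestFace(faces) : null;
}
```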

Current Related Efforts and Workarounds

Some Web Apps -gasp- run Detection in JavaScript. A performance comparison of some such libraries can be found here (note that this performance evaluation does not include e.g. WebCam image acquisition and/or canvas interactions).

Samsung Browser has a private API (click to unfold "Overview for Android", then search for "QR code reader").

TODO: compare a few JS/native libraries in terms of size and performance. A performance and detection comparison of some popular JS QR code scanners can be found here. zxingjs2 has a list of some additional JS libraries.

Android Native Apps usually integrate ZXing (which amounts to adding ~560KB when counting core.jar, android-core.jar and android-integration.jar).

OCR readers in JavaScript are north of 1 MB in size.

Potential for misuse

Face Detection is an expensive operation due to its algorithmic complexity. Frequent requests, or demanding sources such as a live video stream at a high frame rate, could slow down the whole system or greatly increase power consumption.
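A common mitigation is to throttle how often detection actually runs, e.g. only on every Nth frame of a stream. A rough sketch (`makeFrameThrottle` is a hypothetical helper, not part of any API):

```javascript
// Sketch: run detection only every N frames to bound CPU/power cost.
// makeFrameThrottle() is a hypothetical helper, not part of the spec.
function makeFrameThrottle(everyNthFrame) {
  let frame = 0;
  return () => frame++ % everyNthFrame === 0;
}

// Usage in a requestAnimationFrame loop (browser-only, for illustration):
// const shouldDetect = makeFrameThrottle(10);
// function onFrame() {
//   if (shouldDetect()) detector.detect(video).then(render);
//   requestAnimationFrame(onFrame);
// }
```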

Platform specific implementation notes

Overview

What platforms support what detector?

Encoder      Mac   Android   Win10   Linux   ChromeOS
Face         sw    hw/sw     sw      ✘       ✘
QR/Barcode   sw    sw        ✘       ✘       ✘
Text         sw    sw        sw      ✘       ✘

Android

Android provides both a stand-alone software face detector and an interface to the hardware ones.

API                 Uses...                                    Release notes
FaceDetector        Software, using the Neven face detector    API Level 1, 2008
Vision.Face         Software                                   Google Play services 7.2, Aug 2015
Camera2             Hardware                                   API Level 21/Lollipop, 2014
Camera.Face (old)   Hardware                                   API Level 14/Ice Cream Sandwich, 2011

The availability of actual hardware detection depends on the chip; according to market share in 1H 2016, Qualcomm, MediaTek, Samsung and HiSilicon are the largest individual chip vendors, and they all support Face Detection (all the top-10 phones are covered as well).

Barcode/QR and Text detection is available via Google Play Services barcode and text, respectively.

Mac OS X / iOS

Mac OS X / iOS provide CIDetector and the Vision Framework for Face, QR, Text and Rectangle detection in software or hardware.

API                          Uses...                 Release notes
Vision Framework, Mac OS X   Software and Hardware   OS X v10.13, 2017
Vision Framework, iOS        Software and Hardware   iOS v11.0, 2017
CIDetector, Mac OS X         Software                OS X v10.7, 2011
CIDetector, iOS              Software                iOS v5.0, 2011
AVFoundation                 Hardware                iOS 6.0, 2012

Apple has supported Face Detection in hardware since the Apple A5 processor introduced in 2011.

Windows

Windows 10 has a FaceDetector class and support for Text Detection OCR.

Rendered URL

The rendered version of this site can be found at https://wicg.github.io/shape-detection-api (if that's not available for some reason, try the rawgit rendering).

Examples and demos

https://wicg.github.io/shape-detection-api/#examples

Notes on bikeshedding

To compile, run:

curl https://api.csswg.org/bikeshed/ -F file=@index.bs -F force=1 > index.html

if the produced file has a strange size (e.g. zero), then something went terribly wrong; instead run

curl https://api.csswg.org/bikeshed/ -F file=@index.bs -F output=err

and try to figure out why bikeshed did not like the .bs :'(

shape-detection-api's People

Contributors

autokagami, beaufortfrancois, codeimpl, danimoh, dontcallmedom, fujunwei, huningxin, jchinlee, marcoscaceres, nadia-mint, neotan, reillyeon, saschanaz, scheib, yellowdoge


shape-detection-api's Issues

Uncaught SyntaxError: await is only valid in async function

Using a code block from Shape Detection API notes (https://developers.google.com/web/updates/2019/01/shape-detection) inside a Polymer 2.x application shell, I get the following error:

Uncaught SyntaxError: await is only valid in async function

ready() {
  super.ready();
  this.initImageDetection();
}

initImageDetection() {
  const barcodeDetector = new BarcodeDetector({
    formats: [
      'code_128'
    ]
  });
  try {
    const barcodes = await barcodeDetector.detect(image);
    barcodes.forEach(barcode => console.log(barcode));
  } catch (e) {
    console.error('Barcode detection failed:', e);
  }
}

Is the Shape Detection API compatible with Polymer 2.x?
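The error is unrelated to Polymer: `await` is used inside `initImageDetection()`, which is not declared `async`. A sketch of the fix, extracted here into a standalone async function (the `detector`/`image` parameters are added for testability and are not in the original snippet):

```javascript
// Minimal fix: `await` is only valid inside an async function, so the
// method (here, a standalone function) must be declared async.
async function initImageDetection(detector, image) {
  try {
    const barcodes = await detector.detect(image);
    barcodes.forEach(barcode => console.log(barcode.rawValue));
    return barcodes;
  } catch (e) {
    console.error('Barcode detection failed:', e);
    return [];
  }
}
```

In the Polymer element itself, the equivalent change is simply `async initImageDetection() { ... }`; `ready()` can call it without awaiting.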

Is the Shape Detection API compatible with Chrome on macOS?

maxDetectedFaces not working, what about fastMode?

Hello.
I've been playing with the API. I am about to try to utilize it in a small desktop app, which is already a bit heavy. Although limiting the number of faces detected won't matter in my case, it seemed strange to me that this option doesn't work. Here are the results of the simple test I made:

One face: (screenshot)

24 faces (all detected, although only 2 were requested): (screenshot)

Actually, reading the draft kind of left me with the impression that those two options (along with fastMode) had never been implemented, but I just wanted to share it here so that it's clear for anyone else :)
Always feel free to correct me!

Cheers,
Kostadin

Browser compatibility

Reading through the issues it looks like Shape Detection API is software compatible on macOS and Android.

(screenshot)

However when trying to run the demo at https://shape-detection-demo.glitch.me/

I get an error in Chrome Version 71.0.3578.98 (Official Build) (64-bit):

(screenshot of the error)

And also on Chrome Version 72.0.3626.105 on Android P:

(screenshot)

What is the availability of this technology for use in PWAs?

`ImageBitmapSource` is already defined and should not refer to other `typedef`s

Section 2.1 - Image sources for detection defines ImageBitmapSource as being

typedef (CanvasImageSource or
         Blob or
         ImageData) ImageBitmapSource;

Two problems: ImageBitmapSource is already defined (in https://html.spec.whatwg.org/multipage/webappapis.html#imagebitmapsource), and Web IDL typedefs cannot refer to each other (https://heycam.github.io/webidl/#dfn-typedef):

The Type must not identify the same or another typedef.

Replace supportedFormats with static getSupportedFormats() method

The problems with the current supportedFormats attribute are twofold:

  1. The property can be read synchronously, which is incompatible with modern user agent designs that run script in a sandboxed process; such a UA would have to cache the value of this attribute at startup in order to provide it synchronously, or pause script execution until it can be fetched asynchronously.
  2. While a developer can select the barcode formats they would like to use through BarcodeDetectorOptions they must first create a new BarcodeDetector in order to get access to this property.

This attribute should instead be (a) static and (b) a method returning a Promise.
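The proposed shape could look like the sketch below (`BarcodeDetectorSketch` is a stand-in class for illustration; the real method name and return values are still under discussion):

```javascript
// Sketch of the proposal: a static method returning a Promise, usable
// without constructing a detector first. Stand-in class, not the real API.
class BarcodeDetectorSketch {
  static async getSupportedFormats() {
    // A real UA would query the platform asynchronously here.
    return ['qr_code', 'code_128'];
  }
}

// Proposed usage:
// const formats = await BarcodeDetector.getSupportedFormats();
// const detector = new BarcodeDetector({ formats });
```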

Support face landmarks detection

Face landmarks detection capability of native platform:

iOS

CIFaceFeature: landmarks feature supports left eye, right eye, mouth

Android

FaceDetector.Face: landmarks feature supports left eye and right eye position (with eyes distance and middle point)
com.google.android.gms.vision.face.Face: landmarks feature supports mouth, cheek, ear, eye, nose

RealSense SDK for Windows

Face Module: landmarks detection feature supports 77 points of eye, eyebrow, nose, mouth and cheek

Support face tracking in live streams

Use Cases

Platform specific implementation notes

Android

Camera2 CaptureRequest
Event-Driven Pipeline for Face Tracking

iOS

Tracking Faces in Video

Rough sketch of MediaStreamFaceDetector

[Constructor(MediaStream stream, optional FaceDetectorOptions faceDetectorOptions)]
interface MediaStreamFaceDetector: EventTarget {
    readonly        attribute MediaStream       stream;
                    attribute EventHandler      onerror;
                    attribute EventHandler      onfacedetected;
};

[Constructor(DOMString type, FaceDetectionEventInit eventInitDict)]
interface FaceDetectionEvent : Event {
  readonly attribute FrozenArray<DetectedFace> faces;
};

dictionary FaceDetectionEventInit : EventInit {
  sequence<DetectedFace> faces;
};

Usage

navigator.mediaDevices.getUserMedia(constraints).then(stream => {
  var msfd = new MediaStreamFaceDetector(stream, {fastMode: true, maxDetectedFaces: 1});

  msfd.onfacedetected = function(event) {
    for (const face of event.faces) {
      console.log(`Face ${face.id} detected at (${face.boundingBox.x}, ${face.boundingBox.y}) with size ${face.boundingBox.width}x${face.boundingBox.height}`);
    }
  }
});

Optional interpretation of GS1 barcode

GS1, the standards organization behind the identifiers encoded as barcodes throughout the global supply chain, most famously the barcodes that go beep at the checkout, is (perhaps belatedly) working to bring its system to the Web. In August, we published a new standard (PDF, sorry) encoding any valid combination of GS1 identifiers in an HTTP URI and we're now working on a bunch of issues around defining resolver services, semantics and a compression algorithm. The latter is important as if you encode a GTIN (the high level product identifier that goes beep at the checkout) and the batch/lot number, a serial number and an expiry date into a URI it ends up being way too long to fit reliably in a QR code. So we need to reduce the number of characters but can't require an online look up in a critical system. So bit.ly and friends are not an answer.

One of the aims of the GS1 work is to (over time) make it possible for a URL in a QR code to be used in the same way the familiar 1D barcode is used today. The one code therefore does the job of providing a consumer-facing link and a supply chain ID. Another aim is to enable all GS1 barcodes, whatever the symbology, to act as a gateway to the Web. The new standard - which just defines a syntax that at its most basic is just https://example.com/gtin/{gtin} - is getting a lot of attention from major brands and retailers and I am hopeful of substantial adoption.

Anyway... lots more to say about this if people want me to, but the essence of the issue here is whether or not the developer community would see value in augmenting the API to work directly with the dominant global product identification system by including:

  1. the compression/decompression algorithm (currently being defined and expected around Q1 2019);
  2. a simple syntax for sending a GTIN to a GS1 conformant resolver service, i.e. if the barcode reader returns 614141123452 and a consumer product variant of '2A' (think of a Coke bottle with the name 'Tim' on it as opposed to 'Martha'), the API would automatically generate the correctly formatted URL of {givenStub}/gtin/614141123452/cpv/2A.

If I may push a little further, we might even have a default resolver service of id.gs1.org, see - you guessed it - https://id.gs1.org/gtin/614141123452/cpv/2A.

Of course, all this can be done by including the separate code library we're working on, and not making it part of the standard API, but it's a lot simpler if things like this work out of the box.
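The URL construction described above can be sketched as follows (`gs1DigitalLink` is a hypothetical helper name; the path syntax is taken from the examples in this issue):

```javascript
// Sketch: build a GS1-style URL of the form {stub}/gtin/{gtin}[/cpv/{cpv}].
function gs1DigitalLink(stub, gtin, cpv) {
  let url = `${stub}/gtin/${gtin}`;
  if (cpv) url += `/cpv/${cpv}`;
  return url;
}

// gs1DigitalLink('https://id.gs1.org', '614141123452', '2A')
// → 'https://id.gs1.org/gtin/614141123452/cpv/2A'
```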

Use case: planar surface detection

Recently, Apple showed off some demos of their new ARKit SDK, which apparently features planar surface detection without requiring a marker. A video of one of the demos can be found here.

Shape Detection API indirectly supports planar surface detection with the use of a marker (via BarcodeScanner). However, planar surface detection without requiring a marker has some obvious user experience advantages. Is such detection worth considering for inclusion in the Shape Detection API?

Restrict to secure contexts

While this API does not provide access to information not otherwise available to the page, it is likely to be processing privacy-sensitive information (such as images taken with the user's camera) that should not be leaked over an insecure connection, so this feature should be limited to secure contexts.

@tomayac

Include orientation angles for detected faces

The Android face detection API includes orientation angles for detected faces. Similarly, the Google Cloud Vision API includes roll, pan and tilt angles for detected faces. It would be useful to have these details for detected faces if they are made available by the underlying system API. In cases where the author intends to decorate detected faces with art work, these angles can help the author to make the decorations more adaptive to the scene.

Need `getSupportedLandmarks` function for FaceDetector

Similar to supportedFormats in BarcodeDetector, I would like to motivate a similar feature tentatively named getSupportedLandmarks for the FaceDetector to communicate to the developer whether or not landmarks like eyes, mouth, nose, etc. are detectable; and if so, which of them.

As motivated in #54, it should actually be a static method.

confidence factor in detected result

I haven't found any way to estimate the accuracy of the result provided by the API. It may be interesting for the application to have a confidence factor from the underlying technology.

The browser tells me there is a face here, but is it 50% or 99% sure? This information may be useful for the application to know.

Use case 1: for example, the app detected a face close to this location in a previous frame, so if the new detection has 50% confidence, it is good enough.

Use case 2: the application is doing an initial detection without previous knowledge, so the API has to be really sure; for example, the app would require 90% confidence.

Potentially support platform-specific formats in BarcodeFormat enum

Currently, the specification provides "unknown" as a valid BarcodeFormat [1]. However, it doesn't make sense for a user to hint "unknown" in a BarcodeDetectorOptions, and returning "unknown" in getSupportedFormats doesn't (currently) make sense either.

One possibility proposed by @reillyeon is to convert "unknown" to "platform-specific", i.e. encompassing platform-supported formats unknown to the spec.

This could be passed back to the user, in getSupportedFormats or elsewhere, to indicate the platform supports more formats than expressed in the spec, and could be used by the user to include platform-specific formats in the hint in addition to spec-known formats.

[1] https://wicg.github.io/shape-detection-api/#dom-barcodeformat-unknown

Consider a version without constructors and classes

The FaceDetector and BarcodeDetector classes seem unwieldy compared to simple function calls, e.g. navigator.detectFaces(source, options). Why do they exist? What state do they store that is so heavyweight it needs a potentially long-lived class?

Bikeshed/spec structure tips and nits

Bikeshed optimizations:

  • Replace <pre class="idl"> with <xmp class=idl> and then &lt;-style escaping can be dropped
  • Add Markup Shorthands: markdown yes to the metadata section, and then you can use:
    • : and :: as shortcuts for <dt> and <dd>
    • backticks instead of <code>
    • ```js ... ``` to surround code examples, no need for <div class=...><pre>

The spec is using "domintro", a concept introduced elsewhere, for non-normative sections that explain to developers how to use the feature:

  • It doesn't contain style definitions for these (which aren't present in the WICG template); those need to be copied from elsewhere for now. They should have a green/non-normative background.
  • <dfn> (normative) should not be occurring inside domintro (non-normative) sections.

face detector fastMode as boolean may be too binary ?

fastMode is a boolean which "Hint to the UA to try and prioritise speed over accuracy by e.g. operating on a reduced scale or looking for large features." - link

So it is a way for the user to express a preference for speed over accuracy. I don't know if a boolean isn't a bit binary (pun on purpose :) ).

One user may prefer speed over accuracy, but just a bit, not all the way.

PS: it is just a thought about API flexibility, not that important.

No Log

I have a canvas that gets populated with "image" data for lack of a better term, and am trying to run a barcode scan on it.

<canvas id="pic"></canvas>
<paper-button on-tap="scanBarcode"></paper-button>

...

  async scanBarcode() {
    const barcodeDetector = new BarcodeDetector({
      formats: [
        'code_128',
      ]
    });
    try {
        const barcodes = await barcodeDetector.detect(this.$.pic);
        barcodes.forEach(barcode => console.log(barcode));
    } catch (e) {
      console.error('Barcode detection failed:', e);
    }
  }

Using this.$.pic, nothing is logged to the console (I would expect null, undefined, or a result).

When I try this.$.pic.toDataURL("image/png") I get the following error:

home.html:168 Barcode detection failed: TypeError: Failed to execute 'detect' on 'BarcodeDetector': The provided value is not of type '(HTMLImageElement or SVGImageElement or HTMLVideoElement or HTMLCanvasElement or Blob or ImageData or ImageBitmap or OffscreenCanvas)'

What data type or DOM attribute must be passed into barcodeDetector.detect() to parse the image?

I'm not sure what form HTMLImageElement or HTMLCanvasElement take.
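For reference, detect() accepts image-like objects (the canvas/img/video element itself, a Blob, ImageData, ImageBitmap, or OffscreenCanvas), while toDataURL() returns a plain string, which explains the TypeError. A hedged sketch of a guard (`scanSource` is an illustrative helper, not part of the API):

```javascript
// detect() takes image-like objects, never a data-URL string; passing the
// element itself (e.g. this.$.pic) is the intended usage.
async function scanSource(detector, source) {
  if (typeof source === 'string') {
    throw new TypeError('Pass the canvas element itself, not toDataURL() output');
  }
  return detector.detect(source);
}
```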

browser support question

Hi,
The Shape Detection API can be used in Chrome since Chrome 70.

Does anyone know if there is any project to make it work in other browsers like Firefox or Edge?

interface DetectedBarcode should contain encoding

Given that a frame may contain multiple different barcode encodings during a detect(), it would be useful to have the interface return the format (similar to Android: https://developers.google.com/android/reference/com/google/android/gms/vision/barcode/Barcode.html#format), resulting in:

interface DetectedBarcode {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
  [SameObject] readonly attribute DOMString format;
  [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
};

Use Case

We have a series of various types of barcodes used for inventory tracking and shipping that are basically in the same processed frame. While we can run our own checks based on rawValue, it would be more consistent with other APIs like Android to simply return the detected format.
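Consuming the proposed format attribute might then look like this sketch (`groupByFormat` is a hypothetical helper, not part of the proposal):

```javascript
// Sketch: route mixed barcode types detected in a single frame by the
// proposed `format` attribute.
function groupByFormat(barcodes) {
  const groups = {};
  for (const b of barcodes) {
    (groups[b.format] = groups[b.format] || []).push(b.rawValue);
  }
  return groups;
}
```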

Bounding box is insufficient for AR marker use case

The spec as it is currently written uses a DOMRect (with x, y, width, and height properties) for describing the boundingBox of a detected QR code.

QR code detection needs to be exposed in terms of all 4 independent corners of the code (which will likely form a non-square quadrilateral, from which rotation and perspective can be determined) if a QR code is going to be recognized in the frame for overlay purposes (example: https://stuartpb.github.io/quirc.js/test_webrtc.html) and used as a marker (as described on http://www.multidots.com/augmented-reality/, a page linked in the current spec).
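As a sketch of why the corners matter: in-plane rotation can be recovered from the top edge of the quadrilateral, which a bounding box alone cannot provide (`rotationDegrees` is a hypothetical helper; the corner ordering starting at the top-left is an assumption, not spec text):

```javascript
// Sketch: estimate in-plane rotation from the top edge of four detected
// corners, assumed ordered clockwise from the top-left.
function rotationDegrees(cornerPoints) {
  const [topLeft, topRight] = cornerPoints;
  return Math.atan2(topRight.y - topLeft.y, topRight.x - topLeft.x) * 180 / Math.PI;
}
```

Perspective (tilt out of the image plane) would additionally need the other two corners, e.g. via a homography fit.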

Language around BarcodeDetectorOptions is confused

https://wicg.github.io/shape-detection-api/#dom-barcodedetector-barcodedetector says:

If barcodeDetectorOptions is passed, and its formats are empty

but barcodeDetectorOptions is always passed, since it's a dictionary in trailing position: Web IDL specifies that those always have a default value.

At the same time, the "formats" member of the dictionary may not be present, but the spec doesn't handle that case. It needs to, unless that member is marked required or given a default value.

Text detection should be removed or split out

I noticed that the text detection section was marked non-normative in its entirety, which is a bit unorthodox - is this a feature that is considered essential?

This would be problematic when progressing to rec. I'd be happy to sit down and discuss a way forward on this.

Selecting barcode formats

Currently BarcodeDetector searches for every supported barcode format. To improve performance it would be nice if you could select only the formats you need.

cornerPoints in Text Detection API ?

In the text detection section, I don't see cornerPoints. Is that on purpose?

cornerPoints may be quite useful for image processing. For example, we could track the blob in 2D across subsequent images, and possibly do some pose estimation if the size of the text image is known.

Potentially need `getSupportedLanguages` function for TextDetector

While text recognition (in the sense of "there is text within this bounding box" as in iOS) doesn't need language hints or return a detected language, true OCR (in the sense of "there is text, and this is what it spells" as in Tesseract) typically will offer best effort results for unknown languages, but activate special models if the language is known for improved results.

This motivates having the option for obtaining a list of supported languages by the UA's underlying implementation, tentatively named getSupportedLanguages, which should be a static method (as illustrated in #54).

Barcode format hinting is underspecified

The specification should be clearer about how the formats parameter is used by the detect() function to provide a hint to the underlying implementation about what barcode formats should be detected in the image.

Support custom detected objects

As discussed in https://discourse.wicg.io/t/rfc-proposal-for-face-detection-api/1642/20 and https://discourse.wicg.io/t/rfc-proposal-for-face-detection-api/1642/21, we could allow custom detected objects by implementing DetectedObject interface.

In this way, we can

  1. add shape type specific attributes to specific detected object interface, like landmarks of face or url of barcode.
  2. not need ShapeType enum anymore.

I understand one specific shape detector should only detect one specific shape, so it doesn't need to check the type in the detect() promise.

[NoInterfaceObject, Exposed=(Window,Worker)]
interface DetectedObject {
    readonly attribute DOMRect boundingBox;
};

interface DetectedFace {
    // readonly attribute unsigned long id;
    // readonly attribute sequence<Landmark>? landmarks;
};

DetectedFace implements DetectedObject;

// DetectedBarcode, DetectedText
