Spec draft: https://immersive-web.github.io/raw-camera-access/. Repository for experimentation around exposing raw camera access through WebXR Device API. Feature leads: Piotr Bialecki, Alex Turner, Nicholas Butko
We discussed the current shape of the API at the Immersive Web Working Group Face to Face.
There’s a growing recognition in the working group that the lack of camera access has held back adoption of WebXR compared to alternatives like getUserMedia. The ironic effect of an artificially limited API is that it pushes people toward less privacy-sensitive alternatives. The goal should be a user-consent flow that is potentially improved over existing flows (whatever that entails) but powerful enough to support the full gamut of valuable developer use cases.
In order to make the current proposal fully compatible with headsets, the following changes might be needed:
- frames are not coupled to XR frames, but arrive via a callback or some other mechanism
- frames are timestamped
- a camera-to-viewer transform is provided
- the camera's field of view is provided
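The headset-compatible shape above could look something like the sketch below. None of these names exist in the current spec — `CameraFrameSource`, `onFrame`, and the frame's field names are all hypothetical, and the mock `_deliver` call just stands in for the user agent delivering a frame:

```javascript
// Hypothetical sketch of callback-based camera frame delivery, decoupled
// from XR animation frames. All names here are invented for illustration.
class CameraFrameSource {
  constructor() { this.listeners = []; }
  onFrame(callback) { this.listeners.push(callback); }
  // In a real implementation the user agent would call this internally
  // whenever a new camera frame is available.
  _deliver(frame) { for (const cb of this.listeners) cb(frame); }
}

const source = new CameraFrameSource();
const received = [];
source.onFrame((frame) => received.push(frame));

// Simulated frame carrying the three pieces of data the list above asks for:
// a timestamp, a camera-to-viewer transform, and the camera's field of view.
source._deliver({
  timestamp: 1234.5,  // e.g. milliseconds on the device clock
  cameraToViewerTransform: {
    position: [0, 0.02, 0],       // metres
    orientation: [0, 0, 0, 1],    // quaternion
  },
  fieldOfView: { horizontal: 1.2, vertical: 0.9 },  // radians
});
```

The point of the sketch is only that a consumer never assumes a camera frame per XR frame; it reacts whenever one arrives and reads the pose and intrinsics off the frame itself.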
Here are some examples of reasons developers choose to use alternatives to WebXR because this API is lacking:
I'm preparing to kick off the Blink launch process for the Raw Camera Access API, and one part of the Intent to Ship asks for signals from web developers - please take a look and let me know what you think! If my understanding is correct, we're looking for feedback along the lines of "the API [does / does not] solve my use case with [no / some / major] issues/workarounds needed", but don't feel limited by this formula!
So far I'm aware of an issue around API ergonomics (see comment) - I think this can be tackled at a later stage; let's get the API out the door first and see what the main pain points are.
I understand there has been TAG feedback regarding the privacy concerns and how we can inform users of the risks. It's difficult to explain to users the potential risks of giving a website unfettered access to their camera as they move around their environment, pointing it at possessions and loved ones.
I would like to propose that WebXR features could be split in a way similar to position tracking's coarse-grained/fine-grained approach, where for us the coarse-grained features are the privacy-preserving ones and the fine-grained features are the ones that give more power to the developer at the risk of the user's privacy.
This way we can encourage developers not to rely on the availability of the API, because users may choose not to allow it if they are in an inappropriate place; the developer should instead use fallback behaviour to provide an approximate or alternative experience.
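One way to frame the fallback pattern in today's WebXR terms is to request the camera feature as an *optional* feature, so session creation still succeeds when the user declines, and then branch on what was actually granted. A minimal sketch, assuming a `'camera-access'` feature descriptor (hedged: whether that exact string is the shipped descriptor should be checked against the spec):

```javascript
// Sketch: request camera access optionally, so the session still starts
// when the user declines, and let the caller pick a fallback experience.
// Assumes a 'camera-access' feature descriptor and XRSession.enabledFeatures.
async function startSession(xr) {
  const session = await xr.requestSession('immersive-ar', {
    optionalFeatures: ['camera-access'],
  });
  const hasCamera = session.enabledFeatures?.includes('camera-access');
  // Caller switches to an approximate/alternative experience when false.
  return { session, hasCamera };
}
```

The key design point is that listing the feature under `optionalFeatures` rather than `requiredFeatures` is what makes a graceful fallback possible at all.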
While crawling WebXR Raw Camera Access Module, the following links to other specifications were detected as pointing to non-existing anchors, which should be fixed:
Thanks for putting this together, it's exciting. Can this indirectly also allow for deferring the output of particular camera frames? And if so, what would be the best way to use the API in order to achieve this goal?
But I think @nbutko 's usecase explanation here is the most succinct I've seen:
Delayed drawing - on mobile phones an application might need to synchronize the vision with the camera feed.
-- Need to be able to hold an XR frame for ~2-10 animation frames before it is presented to the user with other results from vision.
-- from immersive-web/proposals#4 (comment)
(With the caveat that we're doing remote rendering and need to sync that to the camera output, rather than syncing CV stuff. But the requirements are the same for both.)
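The "hold an XR frame for ~2-10 animation frames" requirement quoted above amounts to a small delay queue on the application side. A sketch of that bookkeeping (the `FrameDelayBuffer` name and shape are invented here, not part of any API):

```javascript
// Sketch: hold each frame for a fixed number of animation-frame ticks
// before presenting it, as the delayed-drawing use case requires.
class FrameDelayBuffer {
  constructor(delayFrames) {
    this.delayFrames = delayFrames;
    this.queue = [];
  }
  // Called once per animation frame; returns the frame to present,
  // or null while the delay is still filling up.
  push(frame) {
    this.queue.push(frame);
    if (this.queue.length > this.delayFrames) return this.queue.shift();
    return null;
  }
}

const buffer = new FrameDelayBuffer(3);
const presentedIds = [];
for (let i = 0; i < 6; i++) {
  const presented = buffer.push({ id: i });
  if (presented) presentedIds.push(presented.id);
}
// With a 3-frame delay, frames 0..2 come out after pushes 3..5.
```

In a real application the queued entries would be the retained camera textures (or remote-rendering results) rather than plain objects, which is exactly why the frames being decoupled from the XR frame loop matters.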
I've just been working on my code that used camera access as it was implemented up until Chrome 91.
Access to the camera seems to work, as I can successfully run the sample code provided on the Chromium site. But I now fail to get the pixels out of the WebGLTexture object returned by the glBinding.getCameraImage() call.
Since the WebGLTexture is described as an opaque texture object, I assume that accessing the pixels is not meant to work. But since computer vision is listed as one of the use cases in the explainer, I have the impression that access to the pixels must be possible somehow.
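For context, the standard WebGL pattern for reading back a texture's pixels is to attach it to a framebuffer object and call `gl.readPixels`. Whether an implementation permits this for the opaque camera texture is exactly the open question here — the sketch below only shows the pattern, not a guarantee that it works on the camera image:

```javascript
// Standard WebGL readback pattern: attach the texture to a framebuffer,
// then readPixels from that framebuffer. Whether the opaque camera texture
// is allowed as a color attachment is implementation-dependent.
function readPixelsFromTexture(gl, texture, width, height) {
  const framebuffer = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
  gl.framebufferTexture2D(
    gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);

  const pixels = new Uint8Array(width * height * 4);  // RGBA, 8 bits/channel
  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) === gl.FRAMEBUFFER_COMPLETE) {
    gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  }

  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  gl.deleteFramebuffer(framebuffer);
  return pixels;
}
```

If the camera texture cannot be attached directly, the usual workaround is to first draw it into an ordinary texture with a trivial shader pass and read back from that.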
I'd be interested to know what the situation and plans are around this.