
immersive-web / real-world-geometry

Additions to the WebXR Device API for exposing real-world data (Lead: Piotr Bialecki). Plane detection: https://immersive-web.github.io/real-world-geometry/plane-detection.html

License: Other

Makefile 2.76% Bikeshed 97.24%
augmented-reality incubation webxr

real-world-geometry's People

Contributors

bialpio, cabanier, johnpallett, manishearth, msub2, trevorfsmith, yonet


real-world-geometry's Issues

[meshing] provide support to pause meshing during a session

From @thetuvix:

One pattern that is common on HoloLens is for an app to have a scanning phase and then a running phase. The user starts the app in a new room where the device hasn't been before. The app asks the user to look around until the mesh that's been scanned is sufficient for the app to do whatever analysis and feature isolation that it needs. At a minimum, we need to allow apps to scan for some period and then stop scanning.
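A minimal sketch of what this could look like, assuming the updateWorldSensingState()-style configuration discussed in other issues in this repo; meshDetectionState is an assumed option name and waitUntilMeshIsSufficient() stands in for the app-specific analysis:

async function runScanningPhase(xrSession) {
  // Scanning phase: turn meshing on while the user looks around.
  await xrSession.updateWorldSensingState({
    meshDetectionState: { enabled: true }
  });

  // App-specific: wait until the scanned mesh is sufficient for analysis.
  await waitUntilMeshIsSufficient();

  // Running phase: stop scanning once the mesh is good enough, to save power.
  await xrSession.updateWorldSensingState({
    meshDetectionState: { enabled: false }
  });
}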

unique ids, arrays or maps?

In the demos and implementations I've done with planes and anchors, I leverage the fact that each plane or anchor has a unique id. This lets me create Map()s to store them and look them up.

In this code snippet iterating over detectedPlanes, either arrays or Maps would work, because the first argument forEach passes to its callback is the value (with subsequent parameters for array indices or map keys).

function onXRFrame(timestamp, frame) {
  let detectedPlanes = frame.worldInformation.detectedPlanes;
  detectedPlanes.forEach(plane => {
    let planePose = plane.getPose(xrReferenceSpace);
    let planeVertices = plane.polygon; // plane.polygon is an array of objects
                                       // containing x, y, z coordinates

    // ...draw planeVertices relative to planePose...
  });
}

Was this intentional? I would like these to be Maps, and I would like each object (plane, mesh, whatever we go with) to have a unique id. I currently support mesh.uid to get the id back, and that is the key used in the Map for that object.
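To illustrate, the pattern I use looks roughly like this (plane.uid is my own extension, mirroring the mesh.uid I support today, not part of the proposal):

const knownPlanes = new Map();

function onXRFrame(timestamp, frame) {
  const detectedPlanes = frame.worldInformation.detectedPlanes;
  detectedPlanes.forEach(plane => {
    if (knownPlanes.has(plane.uid)) {
      // Existing plane: update our app-side record in place.
      knownPlanes.get(plane.uid).lastSeen = timestamp;
    } else {
      // New plane: create an app-side record keyed by its unique id.
      knownPlanes.set(plane.uid, { plane, lastSeen: timestamp });
    }
  });
}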

[meshing] How should we request normals?

From @thetuvix:

Float32Array? normals;
Some app scenarios on HoloLens end up not requiring normals, which aren't free to calculate. We should also allow apps to skip requesting normals in the first place.
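A hedged sketch of what opting out might look like; meshDetectionState and its normals flag are assumed names, not from any proposal:

xrSession.updateWorldSensingState({
  meshDetectionState: {
    enabled: true,
    normals: false  // skip computing per-vertex normals entirely
  }
});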

Need to expose geometry in a way that is generic, but doesn't throw out too many semantics

A simple solution to world geometry is to pick a lowest-common-denominator representation, like "a forest of meshes".

But, then ARKit/ARCore planes become meshes, and lose semantics: we no longer know that they are meant to correspond to vertical or horizontal planes. Similarly, faces could be exposed as a mesh that comes and goes, as could moving objects or detected images, etc.

So, a slightly less "wasteful" approach (sketched in code after this list) might be to say:

  • we expose a forest of meshes
  • a mesh does not need to satisfy any particular geometric properties, aside from sharing a set of vertices that are used to construct the "mesh".
  • each mesh has an origin in world coordinates and is defined relative to it. That origin could change each frame (e.g., so ARKit/ARCore planes/faces can be represented this way, as can tracked objects if the browser supports such a thing)
  • each mesh can be typed, perhaps just using string names, and have additional information with it beyond the mesh that depends on the type. Apps that know of the type can use it, others will just use the mesh.
    • planes might have a surface normal
    • faces might have an array of blend shapes
    • when other things are detectable, they can be exposed similarly. So, if we exposed "image detection and tracking", the detected image could have a mesh defined for it that corresponds to the image in the real world -- same for objects being detected (the real-world equivalent OR the thing used for detection could be returned)
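A rough sketch of how an app might consume such typed meshes; the type tag, the per-type fields, and the helper functions here are all illustrative, not from any proposal:

meshes.forEach(mesh => {
  // Every mesh shares the generic parts: an origin pose plus vertex/triangle data.
  let pose = mesh.getPose(xrReferenceSpace);
  switch (mesh.type) {                // assumed: a string type name
    case 'plane':
      alignToSurface(mesh.normal);    // assumed per-type extra: surface normal
      break;
    case 'face':
      animateFace(mesh.blendShapes);  // assumed per-type extra: blend shapes
      break;
    default:
      break; // unknown type: apps that only care about geometry still get the mesh
  }
  drawMesh(mesh.vertices, mesh.triangles, pose); // app-defined rendering
});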

Add support for 3D geometry

Quest room setup allows for the creation of 3D objects, and we'd like to expose those like we do for planes.
For future-proofing, I think we can expose them as meshes (like we do for planes). I believe HoloLens already has APIs to expose the real world this way.

Should we extend the plane API or create a new mesh API that is almost identical to the plane one?

Plane detection should be promise based

From the proposal:

function onXRFrame(timestamp, frame) {
  let detectedPlanes = frame.worldInformation.detectedPlanes;
  detectedPlanes.forEach(plane => {
    let planePose = frame.getPose(plane.planeSpace, xrReferenceSpace);
    let planeVertices = plane.polygon; // plane.polygon is an array of objects
                                       // containing x, y, z coordinates

    // ...draw planeVertices relative to planePose...
  });
}

"planes" are usually requested by a user pointing to a certain area and then the logic tries to detect a plane in that area. It's not an ongoing process (such as meshing and headpose).

I think it would be better if this API takes a direction and some sort of area and then returns a promise with the detected planes.
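Something like the following, where detectPlanes(), its options, and placeContentOn() are all hypothetical names:

xrSession.detectPlanes({
  ray: hitTestRay,   // direction the user pointed, app-provided XRRay
  radius: 0.5        // rough size of the area of interest, in meters
}).then(planes => {
  planes.forEach(plane => placeContentOn(plane)); // app-defined
}).catch(err => {
  // No plane could be detected in that area.
});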

Create proposal for near/world mesh

@TrevorFSmith Is it possible for me to create a proposal in the repo that captures the proposed IDL for meshing?
It's by no means finished but having it buried in an issue makes it hard to discuss.

Thanks!

detectedPlanes throws exception, detection of feature availability is not graceful

Currently, it is possible to detect if the platform supports Plane Detection by checking the availability of XRPlane in window.
Once the session has started and the first XRFrame is provided by the XRSession, the only way to check whether the plane-detection feature is available is to attempt to read detectedPlanes from the XRFrame and use try/catch to find out.
Most other APIs have a better way to detect availability and do not follow the try/catch pattern.

It would be more graceful if detectedPlanes were null when the feature is not available, instead of throwing an exception when the value is read, especially since it is a property rather than a function.

Current approach (by the spec):

let planes;
try {
    planes = frame.detectedPlanes;
} catch(ex) {
    return;
}

Proposed approach:

const planes = frame.detectedPlanes;
if (planes === null)
    return;

having this as a request on an open session may cause extra dialogs for the user

Having the configuration of world tracking be a call on the session allows it to be updated while the session is running, which is good:

 planeDetectionState : {
   enabled : true
 }

BUT, it also means that the call may happen after a UA has asked the user if they will approve WebXR starting, which may require another dialog.

Can we change this to add something like:

navigator.xr.requestSession('immersive-ar', {worldKnowledge: true})
      .then(onSessionStarted)
      .catch(err => {
        // May fail for a variety of reasons. Probably just want to
        // render the scene normally without any tracking at this point.
        window.requestAnimationFrame(onDrawFrame);
      });

We could still support popping up permissions if sensing state is called when permissions haven't been requested. So

function onSessionStarted(session) {
  // Store the session for use later.
  xrSession = session;
  xrSession.updateWorldSensingState({
    planeDetectionState : {
      enabled : true
    }
  });
}

This could pop up permissions if worldKnowledge hasn't been requested. It would, however, mean that updateWorldSensingState returns a Promise, since it could fail:

xrSession.updateWorldSensingState({
  planeDetectionState : {
    enabled : true
  }
}).then(stateUpdated)
  .catch(err => {
    // May fail for a variety of reasons, depending on the implementation and the
    // permissions granted by the user, or the capabilities of the platform
  });

Also, two other things here:

  1. for the option for requestSession, I suggest worldKnowledge, meaning "this page has access to world knowledge". The sensing and tracking will happen in the platform regardless, even if the data isn't available to this app.

  2. I changed updateWorldTrackingState to updateWorldSensingState because tracking is about "tracking changes and motion", while sensing is broader (e.g., detecting static structure, etc.).

UPDATE: Changed a bit of the text above for clarity. This is also discussed in #7

[meshing] determine how to deliver the mesh to the author

From @thetuvix:

Given that getting to world mesh involves latent processing anyway, it may be less important to eagerly force finished mesh on apps ASAP. Instead, if the mesh metadata object has a .requestMesh() method that returns a promise, that could get us the best of both worlds, allowing apps to request the mesh they care about most earliest.

and

The optimal APIs to allow users to both asynchronously request durable world mesh updates and also synchronously receive per-frame pose-predicted hand mesh updates may differ. We should consider how best to address each of these cases first, and then see if there's common ground worth combining.
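A sketch of the .requestMesh() idea; the promise-returning requestMesh() comes from the quoted suggestion, while meshInfos, distanceToUser(), and uploadToGPU() are assumed app-side names:

function hydrateNearbyMeshes(meshInfos) {
  for (const meshInfo of meshInfos) {
    // Only hydrate vertex/index buffers for mesh volumes the app cares about most.
    if (distanceToUser(meshInfo) < 10) { // app-defined priority test
      meshInfo.requestMesh().then(mesh => {
        uploadToGPU(mesh.vertices, mesh.indices); // app-defined
      });
    }
  }
}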

Plane detection should be enabled during session creation

From the proposal:

xrSession.updateWorldTrackingState({
 planeDetectionState : {
   enabled : true
 }
});

This means that the session was already created and the permission dialog already happened.
I think plane detection should be requested during session creation so the user can be prompted for permission.
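Something like this, where the feature is requested up front so the UA can fold it into the initial permission prompt (the 'plane-detection' descriptor name here is one possibility):

navigator.xr.requestSession('immersive-ar', {
  requiredFeatures: ['plane-detection'] // prompt for the feature up front
}).then(onSessionStarted);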

Need a function to get all existing persisted anchor handles

Right now there's no way to recall persisted anchors from previous sessions. As an alternative, I resort to using localStorage to store the uuids of persisted anchors. However, sometimes accidents happen: I lose that localStorage list and am left with a set of persisted anchors in my browser that I cannot access. It would be awesome if there were some function getPersistentAnchorHandles that returned all stored persisted anchors' handles. A side benefit would be that I wouldn't need localStorage solutions for basic storage of anchors, making my AR application simpler.
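A sketch of the requested function; getPersistentAnchorHandles() is the hypothetical API proposed here, and restorePersistentAnchor() is assumed as the per-uuid restore call:

async function restoreAllAnchors(xrSession) {
  // Hypothetical: enumerate every anchor uuid persisted by this origin.
  const handles = await xrSession.getPersistentAnchorHandles();
  for (const uuid of handles) {
    // Assumed per-uuid restore call; no localStorage bookkeeping needed.
    const anchor = await xrSession.restorePersistentAnchor(uuid);
    attachContentTo(anchor); // app-defined
  }
}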

API for planes and meshes

In #6 we seem to have settled on the idea of allowing developers to request the kinds of data they actually want, be it planes of various kinds, meshes, or other things in the future (tables, the ground, etc. ... anything platforms might be able to provide). To do this, we need to update the proposal as follows. Initially, focusing on planes and meshes should be sufficient to test across a good number of platforms. (We might also be able to test faces in the WebXR Viewer.)

We need:

  • a way of requesting (and checking the success of) the specific types. For example, I might ask for meshes and, if those are unavailable, ask for planes (and then convert them to meshes myself). This may be done by adding features to the main WebXR API (see the sketch after this list).
  • a way of configuring each of these (already there with xrSession.updateWorldTrackingState). (Suggestion: change this to xrSession.updateWorldSensingState, as we will probably move this to the "Core AR Module" when that happens, and use it for other sorts of sensing configuration as well.)
  • a way to get the data (already there with frame.worldInformation).
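One possible shape for the first bullet, assuming feature-descriptor names like 'mesh-detection'/'plane-detection' and an enabledFeatures check on the session:

async function startSession() {
  const session = await navigator.xr.requestSession('immersive-ar', {
    optionalFeatures: ['mesh-detection', 'plane-detection']
  });
  if (!session.enabledFeatures.includes('mesh-detection')) {
    // Meshes unavailable: fall back to planes and convert them ourselves.
    usePlanesAsMeshes(session); // app-defined fallback
  }
  return session;
}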

Top view map of SLAM

When running SLAM in WebXR, is it possible to obtain a top-view plane map of the objects in the environment from the SLAM result?

Comments on the planes-explainer

A few high level comments, rather than doing detailed editing.

  1. Is the intent of having objects per "kind of world knowledge" to allow relatively complex configuration? For example, we could eventually pass in significant configuration for things like object and image trackers/detectors? (This seems reasonable.)

  2. I'm less keen on having the planes be a field in the returned value; this implies that each kind of info will be in its own area. Why aren't planes a kind of mesh? If all things that are meshes are exposed as meshes, then an app that cares only about meshes (for occlusion, for example, or drawing pretty effects on anything known in the world) can just deal with that.

meshes.forEach(mesh => {
  let pose = mesh.getPose(xrReferenceSpace); // origin of mesh or plane
  if (mesh.isPlane) {
    // mesh.polygon is an array of vertices (x, y, z coordinates)
    // defining the boundary of the 2D polygon
    let planeVertices = mesh.polygon;
    // ...draw planeVertices relative to pose...
  } else {
    // draw a more general mesh that isn't a special kind of mesh
    // (the plane mesh would also have these)
    let vertices = mesh.vertices;   // vertices for the mesh; for planes, might be
                                    // the same as mesh.polygon plus one at the origin
    let triangles = mesh.triangles; // the triangles using the vertices

    // ...draw mesh relative to pose...
  }
});
  3. I agree with synchronous, but why are things only valid in the rAF? As your example shows, the first thing people will do is copy it. This seems massively wasteful for non-trivial amounts of world info. Why not just say it's valid through the start of the next rAF?

  4. Why is this pull instead of push? Forcing folks to poll and step through the data to see if anything has changed seems awkward. In our polyfill we used two things:

    • events. When an existing thing had its values changed, a subscriber could be notified. This is the pattern used in the anchor proposal, for example.
    • difference lists. In addition to the "current" plane (mesh) set, we had a list of new mesh objects and a list of deleted mesh objects available, so apps could update their internal data structures as needed.

Detecting user's physical environment impact on plane detection

Is it possible to figure out the failure reason for plane detection (via WebXRPlaneDetector and/or light estimation) in Babylon?

Why? We want to show an in-app remediation prompt to the user along the lines of the latest ARCore best practices, which suggest informing the user what went wrong (or why it is taking so long): https://developers.google.com/ar/design/environment/definition#environmental_limitations

Examples of environments that can cause plane detection to fail (or be slow) include low light, bright walls, highly textured walls, moving surfaces, etc.

[meshing] Determine the region that should provide a mesh

From @thetuvix:

Note that even when a HoloLens app is scanning, it is common to not hydrate vertex/index buffers for all meshes. For example, an app may only care about mesh within 10m of the user, even if the user has previously scanned the entire building. Also, after the user places their primary anchor, the app may choose to only generate buffers for mesh within 5m of that anchor to save power.
I see a few options here:

  • We can provide metadata for all available mesh volumes, and allow apps to request vertex/index buffers on demand.
  • We can allow apps to specify a radius around the user within which mesh buffers are optimistically provided (see the sketch below).
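A hypothetical sketch of the second option; meshDetectionState and its radius field are assumed names, not from any proposal:

xrSession.updateWorldSensingState({
  meshDetectionState: {
    enabled: true,
    radius: 5  // only hydrate vertex/index buffers within 5 m of the user
  }
});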

Providing a clear signal for when we should call initiateRoomCapture

Having the ability to request room capture from the application is definitely useful, but currently there doesn't seem to be a straightforward way to determine whether it's necessary. It seems like the best way to check would currently be to wait for a certain number of frames in which detectedPlanes remains empty, but the uncertain amount of dead time before room capture gets called feels less than ideal.

If there's a way the browser could surface this (perhaps through a property on XRSystem or on XRSession) that one could immediately check, it'd definitely be more convenient.
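For reference, the workaround described above looks roughly like this; the 60-frame threshold is arbitrary, which is exactly the problem:

let emptyFrames = 0;
function onXRFrame(timestamp, frame) {
  if (frame.detectedPlanes.size === 0) {
    if (++emptyFrames === 60 && frame.session.initiateRoomCapture) {
      frame.session.initiateRoomCapture(); // prompts the user to set up the room
    }
  } else {
    emptyFrames = 0;
  }
  frame.session.requestAnimationFrame(onXRFrame);
}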

Permissions and ongoing user control

Access to real-world-geometry brings up the questions of permissions and levels of permissions and how they might work in practice.

The implementation I am working toward has 3 levels of permissions, with the UA also supporting a "reduced information" (lite) mode.

An app can request one of these progressive levels (each includes the previous):

  • minimal, basic WebXR: spatial tracking, hitTesting, creating anchors from absolute matrices or hitTest results
  • world sensing: real-world-geometry, illumination, detection and tracking of well-defined things the platform supports (images, objects, QRCodes, etc), world relocalization maps, etc
  • camera / sensor data: originally this was "camera" access, but it includes, I think, access to any of the low-level sensors a device might have.

There are questions, obviously. How can we standardize on some of the sensing bits? Where do things like eye trackers and other invasive sensors (like "affect" sensors, or facial tracking of the user in an HMD for creating their avatar elsewhere) fit?

For both the "minimal" and "world sensing" levels, there is a "limited" option, which (for our iOS app) has the user select a plane; that is the only world structure that gets exposed, or hit test against.

Finally, we are displaying the current state in the URL bar (when it's exposed), akin to the HTTPS icon or camera or microphone icon on a browser. Clicking on it brings up a detailed permissions sheet, where specific information can be turned on/off, including the most basic access. The web page is NOT notified of these changes; the data just stops flowing. The user can change this at any time.

This relates to the real-world-geometry in two ways:

  • I am adding an option to requestSession, called "worldSensing". If true, when the user is asked for permission to use XR, the request also includes permission to use "worldSensing". The advantage of doing this is that the user only gets one permission request, instead of getting a WebXR permission and then a worldSensing permission. We have not decided yet what to do about failure: does the user have the option to say "yes" to "webxr" but "no" to world sensing? Or does the webpage need to deal with requestSession failing, and then request a session with less access (or display an error)?
  • if the app requests anything requiring worldSensing later, and didn't put worldSensing in the original request, we could trigger a new permission prompt. Apps might do this if they want to wait until the user does something that requires world sensing, if they don't require it all the time. It would be nice if there were a way for the app to provide a string or description of why they want this. If not, pages can display info before they request (since they know the request is going to trigger one).

Thoughts?

[meshing] Type of the indexbuffer

From @thetuvix:

required Uint16Array indices;
Depending on the density of mesh requested, a mesh volume on HoloLens can sometimes have enough indices to require a 32-bit index buffer.
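One sketch of an accommodation (not spec text) would be a union over both index widths:

required (Uint16Array or Uint32Array) indices;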
