
Comments (14)

toji commented on May 29, 2024

I've been slow to respond to this because I've been looking for documentation on the two NVidia extensions you mentioned. That's proven to be frustratingly difficult. 😠 If anyone can point me at specs/docs I'd appreciate it, otherwise I'm going off of what I recall from various presentations and marketing materials.

I don't believe that either of those change the basic calculus of how many viewports the headset deals with. They can help the app render more efficiently by allowing it to reduce fillrate in some cases, but that's maybe something that can be handled via a WebGL extension with basically no input from the headset. So I don't think that those methods specifically make a good argument for having WebVR advertise N>2 viewports.

That's not to say there are NOT good reasons for N>2 viewports, though I don't think they really exist in any commonly available form at the moment. Off the top of my head: CAVE systems, foveated rendering, ultra-wide field of view systems, and (amusingly enough) mono displays could all benefit from moving away from an explicit left/right eye model to an N-viewport model. It would unquestionably be more complicated from an API and developer's perspective, but perhaps not unreasonably so. Worth sketching it out, at least.

The modifications to pose are easy. Still want a single pose matrix, but the view and projection matrices just become an array:

interface VRDevicePose {
  readonly attribute FrozenArray<VRView> views;
  readonly attribute Float32Array poseModelMatrix;
};

interface VRView {
  readonly attribute Float32Array projectionMatrix;
  readonly attribute Float32Array viewMatrix;
  readonly attribute VREye eye;
};

The inclusion of "eye" on the view probably raises eyebrows, but it would be necessary. If you're dealing with pre-baked stereo content (3D photos or videos) you can still render each viewport just fine, but you need to know which eye each is attributed to so you can choose the appropriate part of the stereo pair. For an N-view, non-stereo system (CAVE w/o 3D glasses) they'd just have to pick an eye and run with it.
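For illustration, here's a minimal sketch of how an app might pick the half of a side-by-side stereo video texture based on that eye attribute (assuming VREye values of "left"/"right"; the uvOffset/uvScale names are just placeholders for whatever the app's shader expects):

// Sketch only: maps a view's eye to the matching half of a side-by-side
// stereo texture. Names are illustrative, not part of any proposal.
function stereoUVTransformFor(view) {
  if (view.eye === "right") {
    return { uvOffset: [0.5, 0.0], uvScale: [0.5, 1.0] }; // right half
  }
  // "left", or whatever eye a non-stereo system chose to report
  return { uvOffset: [0.0, 0.0], uvScale: [0.5, 1.0] }; // left half
}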

The more complicated part is dealing with the layer bounds. Viewports now become something the app never makes assumptions about and has to query every time. They could still be changed, but it would be significantly trickier to do anything other than simple scaling.

interface VRCanvasLayer : VRLayer {
  attribute VRCanvasSource source;

  void setViewBounds(unsigned long long index, double left, double bottom, double right, double top);
  FrozenArray<double> getViewBounds(unsigned long long index);

  Promise<DOMHighResTimeStamp> commit();
};

The canvas should of course be pre-populated with the recommended viewports. Rendering to them now looks like this:

function DrawFrame() {
  // Ignoring magic window and non VR scenarios.
  let pose = vrSession.getDevicePose(coordinateSystem);
  for (let i = 0; i < pose.views.length; ++i) {
    let view = pose.views[i];
    let bounds = vrCanvasLayer.getViewBounds(i); // [left, bottom, right, top]
    // Remember that bounds are UVs and viewports are pixels, and that
    // gl.viewport takes (x, y, width, height).
    gl.viewport(bounds[0] * gl.drawingBufferWidth,
                bounds[1] * gl.drawingBufferHeight,
                (bounds[2] - bounds[0]) * gl.drawingBufferWidth,
                (bounds[3] - bounds[1]) * gl.drawingBufferHeight);
    DrawScene(view.viewMatrix, view.projectionMatrix);
  }
  vrCanvasLayer.commit().then(DrawFrame);
}

Not terrible, but more complex than before. Canvas size, reported via getSourceProperties, also comes into question. We could keep it as a single value you query to get a recommended canvas size that will fit all of the recommended viewports, which is nice and simple and doesn't require any changes. Maybe there's a reason for wanting to ask for separate scales per viewport, though? Then things get massively complicated, because you not only have to query each one separately but also have to run some packing algorithm on the app side to figure out the full canvas size. It would also make it impractical to return default bounds, since we don't know the shape of the canvas. 😱

I don't recommend doing that.

Of course, it also brings up the question of whether or not rendering to a single canvas is appropriate for more than 2 viewports. As the number gets higher I feel like we increasingly want a way to just submit N textures instead, but I'm not enthusiastic about pushing that as part of the WebVR API's first pass because it complicates several things, such as platforms that require rendering into a specific type of render target.

One other thing to consider is that for something like a foveated rendering scenario you'd probably end up with two viewports superimposed. As a result you'd want some way to stencil out the area of the base viewports that would be obscured by the higher res central viewports. I don't think that's worth baking into the API at this point, but it's something to consider for the future.

So in summary: the basic stereo case gets more complicated, viewport bounds manipulation becomes a LOT trickier to do in the general case, and content with pre-baked stereo has to jump through some hoops. But mono rendering actually gets nicer (we could easily use this for magic window scenarios) and we hopefully make wider device support easier down the road. Doesn't feel like a bad tradeoff to me.

It's probably also worth examining how this would work if we didn't commit to it now but decided to extend the API to support it later.

The simplest way (IMO) would be to handle N-view content through a new layer and pose type that mimic the structure above but exist alongside the current proposals. Might mean that it'd be smart to name the current ones VRStereoCanvasLayer and VRStereoDevicePose to avoid confusion down the road.

The main consequence of that would be that pre-existing content wouldn't be compatible with (or at least would be suboptimal for) N-viewport hardware if and when it ever reached the market. You'd have to explicitly feature detect for it and design your content with that in mind. Of course, even if we plan for it up front there is a non-trivial risk that we guess the needs of that hardware wrong, and it turns out that all the existing content is incompatible anyway despite trying its best to account for imaginary devices. 🤷‍♀️


toji commented on May 29, 2024

Ooh, also meant to note that generalizing to N views makes a culling frustum (https://github.com/w3c/webvr/issues/203) harder to implement and use. A CAVE system, for example, basically has no global culling frustum. We'd have to come up with a way of expressing that.


kearwood commented on May 29, 2024

It may be possible to achieve lens matched shading and multi-res shading by still having only two "viewports" from the perspective of the browser, but with additional details describing how the coordinates have been compressed into the sub-regions of the viewports.

For LMS, that would be 4 quadrants, with two coefficients that describe the w-scaling on each axis.
For MRS, there would be 9 regions, with a scaling factor for the outer 8 regions.
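As a rough sketch of what those "additional details" could look like (property names and numbers are purely illustrative, not proposed IDL):

// Hypothetical descriptions of how coordinates are compressed inside each
// eye viewport; everything here is illustrative only.
const lmsDetails = {
  type: "lens-matched",
  coefficientX: 0.25, // w-scaling on the horizontal axis, mirrored per quadrant
  coefficientY: 0.25  // w-scaling on the vertical axis
};

const mrsDetails = {
  type: "multi-res",
  outerRegionScale: 0.5 // resolution scale for the 8 outer regions of the 3x3 grid
};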


kearwood commented on May 29, 2024

When the 2d mirror is displayed on the canvas element, perhaps the browser should un-compress the texture before display rather than showing the distorted result.

For browsers that implement a separate "compositor" in their pipeline, this could happen at the final rasterization step within a shader.


kearwood commented on May 29, 2024

If we treat LMS and MRS viewports as sub-regions of the stereo viewports, this shouldn't have any effect on the calculation of culling frustums.


kearwood commented on May 29, 2024

This can also be made an optional feature.

Perhaps the VRDisplay could enumerate the acceptable "lens compression" pre-set formats and the parameters needed for them. Content could then decide to select one and activate it, rather than having full control over the individual parameters.

Typical profiles to select from might include:

  • Uniform Shading
  • Quality Lens Matched Shading
  • Conservative Lens Matched Shading
  • Aggressive Lens Matched Shading
  • Quality Multi-Res Shading
  • Conservative Multi-Res Shading
  • Aggressive Multi-Res Shading
  • Off-Axis Projection (enumerated for CAVE systems and perhaps HoloLens?)

For each of these profiles, there would be an "effective" viewport size and "compressed" viewport size. The "effective" viewport size would be the size normally used with traditional "Uniform Shading". This viewport size represents the near-plane of the view frustum. The "compressed" viewport size represents the actual pixel dimensions after the compression effect takes place.

This means that we may want to report different values for the recommended render target size depending on the shading profile selected. Perhaps each shading profile would list its recommended render target size rather than having a single value for the VRDisplay.

It would be acceptable for a UA that does not support Off-Axis, LMS, or MRS to simply enumerate the "Uniform Shading" profile with the normally recommended render target size attached.
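To make the shape of that concrete, here's a hedged sketch of what an enumerated profile list might look like (profile names, property names, and pixel counts are all illustrative, not a proposal):

// Hypothetical enumeration of shading profiles on a VRDisplay; all names and
// numbers are illustrative only.
const shadingProfiles = [
  {
    name: "uniform-shading",
    effectiveViewportSize:  { width: 1512, height: 1680 }, // near-plane size
    compressedViewportSize: { width: 1512, height: 1680 }, // actual pixels
    recommendedRenderTargetSize: { width: 3024, height: 1680 }
  },
  {
    name: "conservative-lens-matched-shading",
    effectiveViewportSize:  { width: 1512, height: 1680 },
    compressedViewportSize: { width: 1210, height: 1344 }, // fewer shaded pixels
    recommendedRenderTargetSize: { width: 2420, height: 1344 }
  }
];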


kearwood commented on May 29, 2024

Perhaps we should also evaluate the names "Lens Matched Shading" and "Multi Res Shading" and choose something less associated with one GPU vendor. Alternatively, we could describe these methods indirectly as parameters that effectively implement the algorithms.


kearwood commented on May 29, 2024

We could support LMS and MRS by describing their framebuffer formats with just a few values that capture the way the geometry has been packed into the eye viewport:

LMS requires just two coefficients: A and B.

They describe how the "w" value of the homogeneous clip space coordinate is modified immediately before the w-divide in the vertex shader:

w' = w + A * abs(x) + B * abs(y)

[Where (x, y) is relative to the center of the eye viewport.]
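A minimal sketch of how a vertex shader could apply that compression (e.g. as a polyfill path), assuming the clip-space x and y are already relative to the eye viewport's center and that A and B arrive as uniforms with hypothetical names:

// GLSL embedded in a JS string, as WebGL shaders typically are. Only the w
// modification is shown; the rest of the vertex shader is omitted.
const lmsVertexShaderSnippet = `
  uniform float uLmsA; // coefficient A
  uniform float uLmsB; // coefficient B

  vec4 applyLensMatchedShading(vec4 clipPos) {
    // w' = w + A * abs(x) + B * abs(y), applied before the w-divide
    clipPos.w += uLmsA * abs(clipPos.x) + uLmsB * abs(clipPos.y);
    return clipPos;
  }
`;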

For MRS, we can describe how the viewport is subdivided into a grid of 9 regions. Each of the 2 horizontal splitting lines and 2 vertical splitting lines would describe their source UV coordinate and destination UV coordinate within the eye viewport. This can be described with 8 values:

  • Left split source X
  • Left split destination X
  • Right split source X
  • Right split destination X
  • Top split source Y
  • Top split destination Y
  • Bottom split source Y
  • Bottom split destination Y

The outside edges of the source regions can be assumed to map to the outside edges of the destination regions.

I propose that we add these values to the VRWebGLLayer / VRCanvasLayer to describe the way to sample / unpack the texture within each eye viewport.
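Purely to illustrate the shape of those eight values on a layer (attribute names and numbers are hypothetical):

// Hypothetical multi-res description attached to a VRWebGLLayer / VRCanvasLayer.
// All UVs are relative to the eye viewport; values are illustrative only.
const mrsEyeViewportDescription = {
  leftSplitSourceX: 0.30,   leftSplitDestinationX: 0.25,
  rightSplitSourceX: 0.70,  rightSplitDestinationX: 0.75,
  topSplitSourceY: 0.70,    topSplitDestinationY: 0.75,
  bottomSplitSourceY: 0.30, bottomSplitDestinationY: 0.25
};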

This can be polyfilled with a simple shader to ensure support on all platforms; however, platforms that sample the texture during VR layer compositing or lens distortion will gain the most.


toji commented on May 29, 2024

Sorry, slow to respond because I'm OOO for a week. Wanted to get back with some feedback, though.

The proposals for how to expose the functionality seem good, and I imagine we could even break them up if there was concern about one method or the other. My thoughts are more about how this capability (and similar future ones) gets detected and used by the developer.

I'm working off of a couple of assumptions here that may not be correct, so please let me know where I'm wrong! It seems to me that:

  • Not all platforms will support these capabilities natively, and a shader-based approach will incur some fill rate overhead (which may be negated by the fill rate savings of the initial render?). So it may not be appropriate to expose on all platforms.
  • This does incur additional complexity during scene rendering, which depending on the exact scene setup may or may not be a problem. As such, these techniques are optimizations the developer should opt into when appropriate.

Given those two assumptions, several follow-up questions arise. Specifically:

  • What's the mechanism that we'll use to advertise to the developer that one or both of these techniques are available?
  • How does the user indicate they are rendering using those techniques?
  • In a multi-layer world do we allow the techniques to be opted into per-layer? (Seems like that should be a yes.)
  • Does this have any interaction (or does it work better) with any OpenGL extensions, and if so which ones and do they need to be proposed separately?

To be honest, I was planning on broadly avoiding speccing these sorts of optional optimizations as part of the first pass at the API for simplicity's sake, but I don't see much harm in blocking out how it could work to inform whether or not we want to include it right away. My initial take would be to make this a new layer type and define the system for checking if a layer is compatible or not. (Which could be as simple as trying to construct one and having it throw, but I'd like to hear other options.)


kearwood commented on May 29, 2024

Thanks for your feedback, @toji!

Not all platforms will support these capabilities natively, and a shader-based approach will incur some fill rate overhead (which may be negated by the fill rate savings of the initial render?). So it may not be appropriate to expose on all platforms.

This does incur additional complexity during scene rendering, which depending on the exact scene setup may or may not be a problem. As such, these techniques are optimizations the developer should opt into when appropriate.

Agree. Using these techniques will not always be the most efficient path for the VR compositor, and it should be completely optional for WebVR content and UAs to implement them. Any rendering engine sufficiently advanced to take advantage of these techniques should be expected to also have the smarts to expand out the texture itself before submitting it to WebVR if the UA does not support it.

What's the mechanism that we'll use to advertise to the developer that one or both of these techniques are available?

Perhaps we could add an enum value for each format (homogeneous shading rate, LMS, MRS). VRSourceProperties would include an additional attribute to designate the format for the submitted textures. getSourceProperties could accept a list of the rendering engine's supported formats and return the most optimal configuration supported by both the browser and rendering engine.

If a UA doesn't support MRS or LMS, it could simply ignore this and always return the normal "homogeneous shading rate" format. UAs that support MRS and LMS will be able to make adjustments to the render target size to compensate for the MRS and LMS compression.
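A hedged sketch of how that negotiation could look from the content side (the format strings and the extra getSourceProperties argument are hypothetical, and getSourceProperties is assumed to live on the session as in the explainer):

// Hypothetical negotiation: content lists the packed formats its renderer
// understands and the UA answers with the best mutually supported configuration.
const formatsContentSupports = ["homogeneous", "lens-matched", "multi-res"];

let props = vrSession.getSourceProperties(1.0 /* scale */, formatsContentSupports);

// A UA without LMS/MRS support would always answer "homogeneous" here, with
// its usual recommended render target size.
console.log(props.format, props.width, props.height);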

I can imagine this same extension mechanism also being used to support floating point (deep color) and wide gamut render targets.

How does the user indicate they are rendering using those techniques?

The VRCanvasLayer / VRWebGLLayer would be constructed with the format enum and related parameters. This could be passed in together with parameters such as "framebufferScaleHint" proposed in PR 218.
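For example (the constructor shape and option names here are hypothetical, loosely following the framebufferScaleHint style mentioned above):

// Hypothetical layer construction requesting a packed format; all option
// names are illustrative only.
const layer = new VRCanvasLayer(vrSession, canvas, {
  framebufferScaleHint: 1.0,
  format: "lens-matched",                       // or "multi-res" / "homogeneous"
  lensMatchedCoefficients: { a: 0.25, b: 0.25 }
});
vrSession.baseLayer = layer;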

In a multi-layer world do we allow the techniques to be opted into per-layer? (Seems like that should be a yes.)

Agree. Some layers such as a "floating quad in space" or a skybox should not have this applied, but may be composited together with a VRWebGLLayer that uses these formats.

Does this have any interaction (or does it work better) with any OpenGL extensions, and if so which ones and do they need to be proposed separately?

It does not require any OpenGL extensions; however, the multiview extension can be used to accelerate MRS. I would like to propose another extension, equivalent to NVidia's "Simultaneous Multiprojection" to help reduce the draw call count when using LMS.

avoiding speccing these sorts of optional optimizations as part of the first pass at the API

Perhaps the first pass of the API should at least define the extension mechanism for negotiating formats such as these, color spaces, and color depth. It could be as simple as the mechanism I propose here.
The surface area should be kept minimal. IMHO, the most important part to lock down in the first pass is the processing model for negotiating all texture-format-related parameters in a forward-compatible manner.

My initial take would be to make this a new layer type and define the system for checking if a layer is compatible or not

This is another possibility we should explore. When combined with other parameters that may be added later to describe the submitted texture format, such as deep color and wide gamut, I worry that we may end up with a complex matrix of layer types.


kearwood commented on May 29, 2024

In order to take advantage of the WEBGL_multiview extension (https://www.khronos.org/registry/webgl/extensions/proposals/WEBGL_multiview/), we may also need to define formats that sample from texture arrays. One example of this, with mixed resolution rendering and a 4-texture array: https://www.imgtec.com/blog/optimizing-vr-renderers-with-ovr_multiview/
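For reference, here's a sketch of the texture-array setup using the multiview extension that eventually shipped for WebGL 2 as OVR_multiview2 (the linked WEBGL_multiview proposal differs in some details; the view count and dimensions are placeholders):

// Assumes `gl` is a WebGL 2 context. OVR_multiview2 lets one draw call render
// into several layers of a 2D texture array.
const ext = gl.getExtension("OVR_multiview2");
const numViews = 2;  // e.g. 4 in the mixed-resolution example above, if the
                     // implementation's MAX_VIEWS_OVR allows it
const width = 1024, height = 1024;

const colorTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D_ARRAY, colorTex);
gl.texStorage3D(gl.TEXTURE_2D_ARRAY, 1, gl.RGBA8, width, height, numViews);

const fbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, fbo);
ext.framebufferTextureMultiviewOVR(
    gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, colorTex, 0, 0, numViews);

// Subsequent draws write to all numViews layers; gl_ViewID_OVR in the vertex
// shader selects the per-view matrices.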


kearwood commented on May 29, 2024

One way for content to express formats involving texture arrays would be to include a list of views, each with the attributes:

  • Source rect left position (U)
  • Source rect top position (V)
  • Source rect right position (U)
  • Source rect bottom position (V)
  • Source texture index
  • Destination rect left position (U within eye viewport)
  • Destination rect top position (V within eye viewport)
  • Destination rect right position (U within eye viewport)
  • Destination rect bottom position (V within eye viewport)
  • Destination rect eye

In the event that destination rects overlap, the UA would assume that the rect with the greatest sample density would take precedence.
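Purely to illustrate the shape of such a list (field names and values are hypothetical), a two-view example drawing one texture-array layer per eye:

// Hypothetical per-view descriptors sampling from a texture array; all names
// and numbers are illustrative only.
const views = [
  {
    sourceRect: { left: 0.0, top: 0.0, right: 1.0, bottom: 1.0 },
    sourceTextureIndex: 0,
    destinationRect: { left: 0.0, top: 0.0, right: 1.0, bottom: 1.0 },
    destinationEye: "left"
  },
  {
    sourceRect: { left: 0.0, top: 0.0, right: 1.0, bottom: 1.0 },
    sourceTextureIndex: 1,
    destinationRect: { left: 0.0, top: 0.0, right: 1.0, bottom: 1.0 },
    destinationEye: "right"
  }
];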


toji commented on May 29, 2024

Update: We had a vote today on which features should be in scope for WebVR 2.0, and these optimizations were voted out. The group did agree that we should have a clear path for how these types of extensions should be handled before finalizing the 2.0 spec, though.


toji commented on May 29, 2024

Last week during TPAC we reviewed all of the issues in this repo.

For this particular issue, the consensus seems to be that fixed foveated rendering, such as the type implemented by the Oculus Quest and Go, is a more straightforward way to expose this type of optimization. It could even be implemented using the techniques listed above, but that detail would be hidden away from the developer. Fortunately this is already a planned feature in the Layers API.

Aside from that, the more generalized version of this feature that desktop GPUs have been focusing on is Variable Rate Shading (VRS), which allows developers to supply a texture with the desired shading rate for various blocks of the scene, such as indicating that the sky should be shaded at a lower resolution than an object the user is holding. This feels like the best direction to investigate going forward, but it's not something uniquely beneficial to WebXR. As a result we would expect such a feature to appear as a WebGL or (more likely) WebGPU extension rather than a WebXR feature.

With that in mind, I'm closing this issue.
