immersive-web / model-element

Repository for the <model> tag. Feature leads: Marcos Cáceres and Laszlo Gombos

Home Page: https://immersive-web.github.io/model-element/

License: Other

HTML 99.84% Makefile 0.16%

webxr incubation

model-element's Introduction

The HTML `<model>` element

This is the Immersive-Web repository for the <model> element - HTMLModelElement and associated APIs.

Draft spec
<model> element explainer (does not match the spec! original proposal)

model-element's People

drx3d marcoscaceres himorin alexlakatos laszlogombos mikkoh expenses isabella232 seanpm2001 tidoust zachernuk

model-element's Issues

Fetch integration: new destination?

The formats that <model> supports can fetch many other resources. We probably need a new fetch destination ("model").

Enforce MIME types

Need to describe how we sniff for MIME types in [MIMESNIFF]. See also IANA "model" types. We might need specific rules for sniffing.

model is an appropriate child of figure

Need to specify that model is an appropriate child of <figure>.

Rename explainer to proposal

As we start spec development, we are likely to make breaking changes. As such, the explainer won't match the spec anymore.

It would probably be better to create a fresh explainer that is always in sync with the spec. That is, require that all pull requests with author-impacting changes include a change to the explainer.

rename the current explainer -> proposal.md or, probably better... move it to historicalAndEvolution.md (which we could probably just move to the wiki).
create a new explainer.md
add GitHub pull request template to enforce the change to explainer.

Describe what is in scope

It would be great if the spec could talk about what's in scope just for this initial version. We want to make sure that we provide enough functionality to make model useful within the context of a web page.

Integration with `preload`

Need to specify integration with preload link relationships.

Update poster algorithm for model?

HTML defines an algorithm to determine the [=poster frame=], but it's <video> specific. Should we adapt it to support <model> or specify our own?

We also need to consider what happens if the animation resets and the model is paused: does the poster show again? (i.e., do we follow video's behavior?)

Almost everything is `promise` based

Is it necessary that every method and attribute returns a promise?

Dealing with format specific animations and Interaction

glTF is not a run-time format. It does not define what an application should do with a model once it is loaded and rendered. It does provide some capabilities that a run-time engine may use to enhance the user experience. glTF currently does not store any interactivity information. Currently that is solely a run-time determination. The run-time determines what parts (if any) of the model may be active and the behavior based on any trigger.

Like interactivity, animation is not built into glTF. glTF files may contain animation parameters that specify the type of animation (e.g., morph, skin & bones, etc.) and the associated parameters needed to perform the animation. There is nothing in the glTF specification that defines how one animation interacts with another. For example, a human model may include walk, jump, and drop animations; but it is unlikely that they should all be played at the same time.

Any HTML element that wishes to handle animation as stored in a glTF file needs to understand how the content creator intended the animation to play.

CSS Integration: Fullscreen Presentation State

How does model work with the :fullscreen pseudo-class?

Formats and CORS

Need to clarify that 3D resources can fetch resources, and as such need to be subject to the document's security policy (CORS, and probably the CSP "media-src" directive). However, we need to clarify what this means in relation to, say, "img-src", for example... as models can load png/jpg textures.

Exit interaction event (with data?)

When exiting an AR experience it's sometimes useful to pass data from the model back out to the page. On iOS [1], for instance, a "message" event is sent:

This then allows the web page to take over and perform some action. In the case above, it triggers Apple Pay through (presumably) the Payment Request API.

Obviously, the "message" with the custom .data "_apple_ar_quicklook_button_tapped..." is not something we would want to standardize, but it might be good to consider some kind of user activation action, resulting from the format itself, that causes the scene to exit with some action. The .data could be an IDL object (or something better) that could be used to handle the action (e.g., buy a thing).

[1] https://developer.apple.com/videos/play/wwdc2020/1

CSS integrations: Media Playback States

If #13 holds, how does the <model> work with Media Playback States :playing, :paused?

Define interactivity

The proposal introduces an interactive attribute, which would allow a user to interact with the model. We need to specify what that means to some degree, or at least some expectations.

Add example for adding a model to a page

It would be great to have an example that shows how to add a model to a document.

What's the default style?

What's the default CSS style for a model element? Should it have a border around it? What about the background color, etc.?

Setting the image-based lighting (IBL)

In order for models to be rendered appropriately within the context of a web document, it might be useful to give developers some means of controlling the IBL.

For example, it might not make sense to use sunny environment light for models sitting on a predominately dark document (and vice versa).

There may be some overlap here with light/dark mode in CSS... but not sure.

Lazy loading

Some <model> resources can be significant in size. As such, it might be good to support the loading attribute to allow these resources to be lazy-loaded.

Require CORS for all <model>-initiated fetches

Modern web platform features, such as <script type="module">, CSS fonts, and web app manifest, use the "cors" fetch mode. This is in contrast to legacy features such as <img> and classic <script>.

This change is important for security, especially in light of Spectre. It obviates the need for obtuse workarounds like CORP, ORB, and cross-origin tainting.

I'd like to strongly request that <model> follow this guideline and use "cors" for all its fetches. (Thus, it can only load models cross-origin if they opt in with Access-Control-Allow-Origin: *.)

I originally filed this as WebKit/explainers#63 but it seems the draft resolved in a different direction.

An "ar" link relationship?

I was looking at "Viewing Augmented Reality Assets in Safari for iOS" and it describes an "ar" link relationship that applies to anchors. I don't think that's been standardized though.

On iOS, adding the "ar" link relation adds an AR overlay button inside the context of the anchor:

Can be seen here (on an iOS device):
https://jsfiddle.net/61g0m2ky/1/

I wonder if we should standardize that too?

Is media= enough for LOD handling? (or do we need srcset?)

One thing I've been considering is how LOD (level of detail) handling should work on the web. Could we leverage <source srcset> to define LOD swapping?

An example:

<model>
    <source 
      srcset="assets/example_200.usdz 200w, assets/example_1024.usdz 1024w"
      type="model/vnd.usdz+zip"
    >
</model>

The idea here would be that when the model is rendered at 200px or less in the viewport, the model from the URL assets/example_200.usdz would be rendered. Just to be clear, I'm not talking about the element's innerWidth but rather the render resolution of the model itself, calculated from a bounding sphere.

It could be assumed that the lowest quality model is the one that would be rendered at the lowest resolution.
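To make the selection rule above concrete, here is a rough sketch in plain JavaScript. It is only an illustration: nothing in the proposal defines width descriptors for models, the function names are made up, and the parsing is deliberately naive (it would break on URLs that contain commas).

```javascript
// Hypothetical helpers: width-based LOD selection is not part of any
// <model> proposal or specification.

// Parse a srcset-like string such as "a.usdz 200w, b.usdz 1024w"
// into [{ url, width }], sorted by ascending width.
function parseLodCandidates(srcset) {
  return srcset
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [url, descriptor] = entry.split(/\s+/);
      return { url, width: parseInt(descriptor, 10) };
    })
    .filter((candidate) => Number.isFinite(candidate.width))
    .sort((a, b) => a.width - b.width);
}

// Pick the smallest candidate whose width covers the rendered size,
// falling back to the largest candidate when none is big enough.
// Returns null when the srcset has no usable candidates.
function selectLod(srcset, renderedWidthPx) {
  const candidates = parseLodCandidates(srcset);
  if (candidates.length === 0) return null;
  const match = candidates.find((c) => c.width >= renderedWidthPx);
  return (match || candidates[candidates.length - 1]).url;
}
```

Here renderedWidthPx stands for the render resolution derived from the bounding sphere, which a user agent (not the page) would have to compute.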

Against this idea, it could be argued that the 3D model format itself should define this functionality.

I've created a demo using canvas/webgl that has similar functionality that I can share at some point if it's helpful.

Formats: privacy considerations

We need to investigate what the privacy implications are of each model format we will recommend. The model formats themselves can fetch resources, so we need to put a privacy and security framework around what schemes they can fetch (https only, for instance). We also need to say what all the fetch policies are. Need to investigate if the formats provide any guidance here, or if they leave it up to the implementation. If they leave it to the implementation, we need to specify it (i.e., don't send cookies, don't leak the referrer, etc.).

Means of controlling the shadow

There are cases where it may not be desirable to have a shadow being cast, because the shadow can be larger than the viewport in which an object is being shown. Also, as the "camera" rotates, the shadow can end up looking weird and exposing the boundary of the <model> element.

For example:

It might be good to allow disabling the shadow entirely, or to give developers some other means of expressing where the light source should be, so that the shadow is cast where they want it.

Rename `Camera` objects

The name camera implies that this feature controls the projection matrices of the model.
If it is implemented on a VR device, HTMLModelElementCamera wouldn't control the camera but the orientation.

Format security concerns

Need to describe that each format will come with its own security considerations (and link to the appropriate security considerations in their respective specs).

Constraints on rotation

With photogrammetry it's not always possible to capture every angle of an object. This means that a model can be missing details, such as in the situation below:

Wondering if it might make sense to constrain the angle(s) in which a user can spin a model when interactive is enabled? This could be done with JS and the camera control API, but it might end up being really jerky as the JS code would be fighting the user trying to spin the model.

Also noting that it may not make sense in an AR context, because a user could simply crouch to look underneath... but that might be ok.
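For reference, the constraint itself is just a clamp on the camera angles. The sketch below assumes the pitch/yaw/scale camera shape used elsewhere in the proposal; the limits object and both function names are hypothetical. Running this from page script on every change is exactly the approach that risks the jerkiness described above, which is the argument for letting the user agent apply such limits natively.

```javascript
// Hypothetical sketch: the proposal defines no rotation-constraint API.

// Limit a value to the inclusive range [min, max].
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Return a new camera whose pitch and yaw (in degrees) are limited to the
// ranges given in `limits`; other camera fields are copied unchanged.
function constrainCamera(camera, limits) {
  return {
    ...camera,
    pitch: clamp(camera.pitch, limits.minPitch, limits.maxPitch),
    yaw: clamp(camera.yaw, limits.minYaw, limits.maxYaw),
  };
}
```

For a photogrammetry capture with no underside, limits such as { minPitch: 0, maxPitch: 60, minYaw: -90, maxYaw: 90 } would keep the missing geometry out of view.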

Fullscreen integration

What should happen when requestFullscreen() is called?

HTML parser integration

Need the same behavior as audio and video when included in a p element with no end tag.

CSS Integration: the :muted and :volume-locked pseudo-classes?

If a model can play sound, and can be muted, then how does <model> work with the :muted and :volume-locked pseudo-classes?

Default width/height

When the model is missing width/height, and the resource itself doesn't provide them, it would be nice to have it fall back to something sensible. Seems to be controlled by https://html.spec.whatwg.org/#embedded-content-rendering-rules

Related to #27

Need to specify the eventing model

Like other media and resources, model could emit events. An obvious one being a network error if a resource can't load. We should figure out what these events might be, and if model is a media element (#13), then we should see what applies in general for media elements.

Add example for making model accessible

It would be great to have some examples that show how to maximize the accessibility of <model>.

Specify the behavior for when a <source> element's parent is a <model> element

The <source> element behaves differently depending on what its parent is. For instance, when the parent is <picture>, the srcset attribute comes into play. We need to look at the attributes of <source> and figure out what they mean when used in the context of <model>.

Which formats?

Need to investigate what formats are suitable for model. We might need some kind of evaluation matrix. Model can support multiple formats out of the box, but it might be good to evaluate what is best for users and developers and why.

Is model a "media element"?

The <model> element shares a lot of similarities with the <audio> and <video> elements, yet it's distinct in some ways (we need to tease these out). It's similar in being potentially temporal multimedia content (i.e., it has audio, it potentially animates over time). We need to figure out if model is sufficiently different to warrant being its own element class, or if it can reuse much of "media element"'s infrastructure.

Consistent presentation/rendering of model resources

I agree that it is very good to make it easy for people to display 3D content in a web page. I completely disagree with the methods and processes described in this proposal to make it an HTML element. HTML elements need to be fully defined so that they can be similarly implemented across browsers and reflect what people would see in applications outside of browsers. The process of rendering a high-quality model requires proper handling and rendering of the model's geometry, appearance, animation, and interaction.

My knowledge is in glTF (and glTF binary) so these comments may or may not reflect on the capabilities of USDZ. I will address the topics as separate issues: Appearance and Animation / interactivity; with respect to 3D models in glTF format. Static geometry is pretty straight-forward and not subject to much interpretation.

The really difficult part is appearance. The document states that "it is impractical to define a pixel accurate rendering..." for models. However, this is really important. Khronos has done extensive work in the 3D Commerce Working Group towards pixel accurate rendering across multiple 3D viewers (https://www.khronos.org/3dcommerce/certification/). The accuracy was demanded by retailers so their products would appear visually identical across different web sites. There were so many factors that mattered in producing acceptable renderings that include lighting, rendering calculations (including equation approximations), conversion from GPU to display, and tone mapping.

The component that caused the most issues and difficulties is lighting. A model built for physically-based rendering looks best in a complex lighting environment. This is usually done with image-based lighting, but punctual plus area lights will also work. The statement that "A future version ... will describe the lighting model and environment .... Both items will require community collaboration and some consensus." makes the process sound much easier than Khronos found it to be.

Some issues that came from the Certification work. Note that the Certification program did not solve all of these in the initial release.

Is lighting done as an 8-bit RGB or 16-bit HDR image?
Is lighting done with many point and area lights?
How does the content creator provide for different lighting?
How does the user adjust the lighting to match a particular environment?
What background is used for the model display?
How is the (floating-point) rendering converted to an 8-bit RGB display?
How is the rendering adjusted depending on user environment?
How is the rendering adjusted depending on device (hardware, OS, browser)?
How is time-dependent display degradation handled?

It may be possible to construct an initial release without resolving all of these items.

How would the model display on curved screens?

The Oculus browser is displayed on a curved surface.
How do we envision the display of multiple models? Would we allow them to bump into each other or would there be clipping?

Is it time for browsers to standardize 3D rendering?

As the maintainer of <model-viewer>, I'm humbled to have Apple referencing it in a web standards proposal. I've had a number of conversations now in various standards bodies about the <model> proposal, as well as various internal conversations at Google about whether we should propose something similar in Chrome going back at least three years. I figured I should summarize those conversations here publicly to stimulate further discussion.

As much as it would have been good for my career to push <model-viewer> into Chrome and the standards process, I have instead argued against it because I think it would hinder innovation in what is currently a rapidly-evolving field. I'll list out some pros and cons below of standardizing a <model> element vs. using a JS library like <model-viewer>, SketchFab, babylon.js, etc. Please add comments with any pros and cons I've missed, as well as discussion of those mentioned.

Pro: I'm just going to quote the only pro given in the explainer:

Do not add a new element. Pass enough data to WebGL to render accurately
As noted above, this would require any site that wants to use an AR experience to request and have the user trust that site enough to allow them access to the camera stream as well as other information. A new element allows this use case without requiring the user to make that decision.

First, this is largely false. AR within the browser today is accomplished via the WebXR standard (which iOS Safari has not implemented) and it was explicitly developed with privacy in mind. WebXR in fact works without giving the website access to the camera feed, hence the distinction between the XR permission and the camera permission. It does give access to the camera pose in order to make canvas rendering possible, but all of this has gone through numerous rounds of privacy and security review. Even the precision of available data is capped to limit fingerprinting.

Con: Device/browser compatibility & consistency.
The various JS libraries for 3D give uniform rendering and universal support for the file formats and extensions of their choice across devices and browsers today (including Safari). And when they implement a new extension, it is available on all browsers simultaneously. The only exception is AR QuickLook on iOS, which has neither the format support, nor the customizability, to achieve rendering consistency, which is constantly noticed by our users. First, <model> appears excited to follow the debacle of the <audio> tag regarding format support across browsers. However, even if a format were agreed upon, I would love to hear the plan for keeping extension support and rendering quality consistent across browsers over time. This is a rapidly-evolving field; Khronos has been releasing several new PBR extensions per year for some time, and that looks unlikely to slow. There is more competition between JS libraries than between browsers because the cost of switching is so much lower; the last thing we want to do is hand an innovative field over to a duopoly.

Con: Scale of the API to standardize.
The current <model> API proposal is deceptively simple. This may be because it is so focused on the AR use case and proposes to also solve 3D-in-the-DOM as a side-effect. glTF's usage across e-commerce has demonstrated clearly that while AR has some great niche value, 3D-in-the-DOM is actually the dominant use case. And it requires a lot more customization than AR, especially around camera controls, limits, interactions, and prompts. You can get a taste of the critical APIs <model> is currently missing here. Never mind the arbitrary choices like model framing, movement behaviors, etc. that Apple proposes be left up to browsers to create totally inconsistent experiences.

The bigger problem I foresee though, is requirements creep. I know this well from maintaining MV; I am constantly pushed to expose more and more of the underlying three.js API. I resist in order to keep my product differentiated at a higher level of abstraction, but it's a very fuzzy line. By natively supporting a 3D model in the browser, I predict no one will be satisfied until a Unity-sized API has been web-standardized around it. This is the same problem VRML ran into decades ago. Standards bodies are powerful, but slow - I fear to think how many years it would take to agree on a standard so complex.

In conclusion, I would say Apple's use case can be well solved with today's JS rendering libraries if they simply add WebXR support to iOS. Even if that privacy barrier is somehow insurmountable, they could also propose a standard way to launch native AR experiences from the browser without the need to standardize a new DOM element, which would keep the proposal much simpler, but sadly without any JS-based customization opportunity.

Add example for providing fallback content for legacy user agents

It would be great to show how one can provide fallback content for user agents that don't support model. This could include showing a <video> or <picture> instead.

Background color of model

Need to investigate setting the background color (or transparency) of model. Can the formats specify the background color (without using geometry)? Would it use a CSS background color?

Add example showing enabling interactivity

Add an example that shows how to enable interactivity.

Reuse media-src?

Given the close relationship to media elements, and given the reliance on <source> elements, we could just say that media-src applies to <model> too.

Adding "controls"

As with other media elements (again #13), having "controls" for media specific things can be extremely helpful for accessibility (and just generally helpful for developers not needing to deal with things like the fullscreen API).

It would be nice to consider adding support for controls and then leaving it mostly to the UA as to what those controls are... we could figure out a standard set of things, like <video> provides.

Camera controls and requestAnimationFrame()

The proposal includes the ability to change the orientation/camera's view of the model. However, it's unclear how that will interact with requestAnimationFrame(). That is, do we leave it to developers to control how those changes are interpolated, or do we just leave it to the user agent to perform the change in position?

There are naturally pros and cons to each approach, such as who controls the speed of the transition and the smoothness curve of the animation. Or what if the developer wants to just flip the object without any animation?
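As a point of comparison, this is roughly what the "leave it to developers" option would look like. It is a sketch under assumptions: the camera is the pitch/yaw/scale object from the proposal, setCamera is the proposal's promise-returning method (its result is ignored per frame here), and easeInOut, interpolateCamera and animateCamera are names invented for this example. Yaw wrap-around (e.g., 350 to 10 degrees) is not handled.

```javascript
// Smoothstep ease-in-out curve mapping t in [0, 1] to [0, 1].
function easeInOut(t) {
  return t * t * (3 - 2 * t);
}

// Interpolate between two camera states. `t` is the progress of the
// transition and is clamped to [0, 1] before the easing curve is applied.
function interpolateCamera(from, to, t, ease = easeInOut) {
  const k = ease(Math.min(Math.max(t, 0), 1));
  const lerp = (a, b) => a + (b - a) * k;
  return {
    pitch: lerp(from.pitch, to.pitch),
    yaw: lerp(from.yaw, to.yaw),
    scale: lerp(from.scale, to.scale),
  };
}

// Drive the transition from requestAnimationFrame (browser only).
// A duration of 0 or less flips the model to `to` without any animation.
function animateCamera(model, from, to, durationMs) {
  if (durationMs <= 0) return model.setCamera(to);
  return new Promise((resolve) => {
    const start = performance.now();
    function step(now) {
      const t = (now - start) / durationMs;
      model.setCamera(interpolateCamera(from, to, t));
      if (t < 1) requestAnimationFrame(step);
      else resolve();
    }
    requestAnimationFrame(step);
  });
}
```

The developer gets full control over duration and easing, at the cost of one setCamera call per frame; whether that is acceptable with a promise-based API is part of the open question.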

Accessibility of model

We need to figure out how to make <model> accessible on a number of different fronts:

Visual: describe what is being presented over time.
Interaction: describe what can be interacted with (regions, buttons, etc.).
Auditory: describe audio and possibly spoken sounds, potentially over time.

Usually, this would be provided by the embedded format... however, it appears that both glTF and USDZ are quite limited when it comes to accessibility.

As such, it may be that we need to leverage what we can from HTML + ARIA to overcome the shortcomings of these formats. We have quite a bit of precedent (e.g., from the humble, yet limited, alt attribute, to how <canvas> can be made accessible, to the potential inclusion of <track> elements, and so on).

Accessibility: ARIA integration and HTML Accessibility API Mappings

We need to define what the ARIA semantics are and what is exposed (application, probably). We need to coordinate with the accessibility folks and get this added to the HTML Accessibility API Mappings.

MIME integration

We need to check whether there are any relevant MIME parameters for model/* content.

Add example showing how to support multiple formats

Having an example that shows how to support multiple formats via <source> would be great.

Resetting the camera?

As the model can be interactive, it's foreseeable that users might place/rotate a model in such a way that they are unsure which direction the model is facing. It might be nice to provide some way of resetting the camera to its initial position.

This is a "nice to have" (i.e., probably not critical) because a developer could take a snapshot of the camera's position when the model loads. However, that's kinda annoying because then they need to keep their own accounting of the starting position of each model.
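The bookkeeping described above would look something like the following. It assumes a promise-returning getCamera() counterpart to setCamera(); treat that, and the helper names rememberInitialCamera and resetCamera, as assumptions made for this sketch rather than settled API.

```javascript
// Assumption: model.getCamera() and model.setCamera() both return promises.
// The helpers below are hypothetical and keep one snapshot per element.
const initialCameras = new WeakMap();

// Call once after the model has loaded to record its starting camera.
async function rememberInitialCamera(model) {
  initialCameras.set(model, await model.getCamera());
}

// Restore the recorded camera. Resolves to false when no snapshot exists.
async function resetCamera(model) {
  const initial = initialCameras.get(model);
  if (!initial) return false;
  await model.setCamera(initial);
  return true;
}
```

A built-in reset would remove the need for the WeakMap and for remembering to take the snapshot at the right moment.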

Camera control via HTML

Has consideration been given to positioning the camera via HTML attributes or CSS?

Assume that you're attempting to render a series of 3D models from different views in a page. With the current proposed API the code would look as follows:

<model id="model-1" style="width: 400px; height: 300px">
    <source src="assets/example.glb" type="model/gltf-binary">
</model>
<script>
  document.querySelector("#model-1").setCamera({
    pitch: 0,
    yaw: 0,
    scale: 1.5,
  })
</script>
<model id="model-2" style="width: 400px; height: 300px">
    <source src="assets/example.glb" type="model/gltf-binary">
</model>
<script>
  document.querySelector("#model-2").setCamera({
    pitch: 30,
    yaw: 45,
    scale: 1.5,
  })
</script>

It should also be noted that because setCamera returns a promise, the change is not immediate. The developer may therefore want to hide the model until the setCamera promise resolves, which makes the implementation more complex.
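To illustrate that extra complexity, here is a minimal sketch of the hide-until-ready pattern. It assumes only the setCamera method shown above; showWithCamera is a made-up helper name.

```javascript
// Hypothetical helper: hide the element, apply the camera, then reveal it.
// The `finally` block restores visibility even if setCamera rejects.
async function showWithCamera(model, camera) {
  model.style.visibility = "hidden";
  try {
    await model.setCamera(camera);
  } finally {
    model.style.visibility = "";
  }
}
```

In a page this would be called as showWithCamera(document.querySelector("#model-2"), { pitch: 30, yaw: 45, scale: 1.5 }), once per model.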

Ideally, it would instead look as follows:

<model camera="0deg 0deg 1.5" style="width: 400px; height: 300px">
    <source src="assets/example.glb" type="model/gltf-binary">
</model>
<model camera="30deg 45deg 1.5" style="width: 400px; height: 300px">
    <source src="assets/example.glb" type="model/gltf-binary">
</model>

An alternative would be to define the initial camera settings via inline CSS. We already have the CSS perspective property, which is sort of heading in that direction.
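For what it's worth, the attribute form could be prototyped in script today. The sketch below parses the suggested camera attribute value, assuming the order is pitch, yaw, scale (matching the setCamera calls earlier in this issue). The attribute, its syntax and the function name parseCameraAttribute are all hypothetical.

```javascript
// Parse a hypothetical camera attribute value such as "30deg 45deg 1.5"
// into the { pitch, yaw, scale } object that setCamera takes in the proposal.
// Returns null when the value does not match the assumed syntax.
function parseCameraAttribute(value) {
  const number = "(-?\\d+(?:\\.\\d+)?)";
  const pattern = new RegExp(`^\\s*${number}deg\\s+${number}deg\\s+${number}\\s*$`);
  const match = pattern.exec(value);
  if (!match) return null;
  return {
    pitch: Number(match[1]),
    yaw: Number(match[2]),
    scale: Number(match[3]),
  };
}
```

A page script could read the attribute from each model and pass the parsed result to setCamera, although that brings back the promise timing issue that a native attribute would avoid.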
