
Comments (50)

AdaRoseCannon commented on May 23, 2024

In the initial release of visionOS there was no primary input source; visionOS 1.1 beta (now available) has transient-pointer inputs, which are primary input sources.
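A minimal sketch (not from the thread) of how a page might watch for these transient inputs, assuming the 'transient-pointer' target ray mode they report and an active XRSession named session:

```js
// Watch for transient primary inputs as they appear and disappear.
session.addEventListener('inputsourceschange', (event) => {
  for (const source of event.added) {
    if (source.targetRayMode === 'transient-pointer') {
      // This source exists only for the duration of the pinch gesture.
      console.log('transient primary input from', source.handedness);
    }
  }
});
```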


hybridherbst commented on May 23, 2024

@cabanier Yes, I can confirm that AVP only returns "generic-hand".

@toji To the best of my understanding, the AVP currently does not have "persistent auxiliary inputs and a transient primary input". There is no primary input as far as I'm aware. The assumption it breaks is that there isn't any primary input source (a MUST as per the spec, at least right now).


AdaRoseCannon commented on May 23, 2024

I just tried it and created a recording

Looks correct to me.

Are you planning on exposing more than 2 input sources?
I've been thinking about doing the same since we can now track hands and controllers at the same time. I assumed this would need a new feature, or a new secondaryInputSources attribute.

In visionOS 1.1, if you enable hand-tracking, the transient inputs appear after the hand inputs, i.e. as elements 2 and 3 of the inputSources array.

I assumed this would need a new feature, or a new secondaryInputSources attribute.

We have events for new inputs being added, which can be used to detect the new inputs. I personally don't believe we need another way to inform developers to expect more than two inputs.
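A minimal sketch of that existing mechanism, tracking however many sources the session reports rather than hard-coding two (session is an assumed XRSession):

```js
// Track every source the session reports; no fixed count assumed.
const activeSources = new Set();
session.addEventListener('inputsourceschange', (event) => {
  event.added.forEach((source) => activeSources.add(source));
  event.removed.forEach((source) => activeSources.delete(source));
});
```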


cabanier commented on May 23, 2024

@AdaRoseCannon Are there any experiences that work correctly with hands and transient input?
@toji Should we move this to a different issue?


toji commented on May 23, 2024

The primary issue here is that libraries haven't been following the design patterns of the API.

The API fires select events on the session, and if you're listening for that and firing off interactions based on it then you'll be fine for the types of interactions the Vision Pro is proposing (because it's fundamentally the same as how mobile AR input works today). But if you've abstracted the input handling to surface select events from the input sources themselves AND trained your users through example code and library shape to generally only bother to track two inputs at a time (as Three has done) then you're going to have a bad time.

It's truly unfortunate that we've ended up in a situation where most content has optimized itself for a very specific set of inputs when the API itself is ostensibly agnostic to them (and I'm as guilty as anyone when it comes to the non-sample content I've built) but I don't think that we should be OK with making breaking changes to the API because of the choices of the libraries built atop it.
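A minimal sketch of the session-level pattern described above; referenceSpace and handleSelect are assumed app-side names:

```js
// Listening on the session works for persistent and transient sources alike.
session.addEventListener('select', (event) => {
  const pose = event.frame.getPose(
    event.inputSource.targetRaySpace, referenceSpace);
  if (pose) handleSelect(pose.transform); // app-specific handler (assumed)
});
```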


Manishearth commented on May 23, 2024

I believe the intent of that is as an internal spec convenience.

I'm not really convinced the use case of "we wish to wrap and supplement device events" is something designed to be supported in this regard, and using the notion of primary input devices to do so feels brittle. The proposal here solves the problem for these devices specifically, not in general.

I think wrapping until you know not to seems like an okay call to make.

I also think this can be solved in the Hands API via profiles: it does seem to make sense to expose "primary input capable hand" vs otherwise as a difference in the profiles string.

Unfortunately the current default hands profile is "generic-hand-select", which seems to imply a primary input action; I'm not sure if we should change the default or do something else.


hybridherbst commented on May 23, 2024

Thanks for the comment. With "wrapping" I don't mean "pretending this is a WebXR event" – I just mean that applications need to detect "hand selection", and that needs to work independently of whether the XRInputSource hand has a select event or not.

So to summarize:

  • there's no current mechanism to distinguish between these
    • maybe in the future with more diverse hand input profiles
  • we will have to live with the "double events"

If I were to add this to the spec, would this be valid wording:
"Input sources should be treated as auxiliary until the first primary action has happened, then they should be treated as primary."


Manishearth commented on May 23, 2024

"Input sources should be treated as auxiliary until the first primary action has happened, then they should be treated as primary."

No, I don't think that's accurate. That is an engineering decision based on a specific use case and does not belong in the standard.

applications need to detect "hand selection", and that needs to work independently of whether the XRInputSource hand has a select event or not.

I guess part of my position is that platforms like Vision should expose a select event if that is part of the OS behavior around hands. It's not conformant of them to not have any primary input sources whatsoever: devices with input sources are required to have at least one primary one.

There's little point attempting to address nonconformance with more spec work.

There's a valid angle for devices that have a primary input but also support hands (I do not believe that is the case here). In general this API is designed under the principle of matching device norms so if a device doesn't typically consider hand input a selection then apps shouldn't either, and apps wishing to do so can expect some manual tracking. That's a discussion that can happen when there is actually a device with these characteristics.


hybridherbst commented on May 23, 2024

That is an engineering decision based on a specific use case

I disagree – the spec notes what auxiliary and primary input sources are but does not note how to distinguish between them. That makes it ambiguous and impossible to detect what is what.

It's not conformant of them to not have any primary input sources whatsoever

I agree and believe this is a bug in visionOS; however, their choice may be to expose a transient pointer (with eye tracking) later (which would be the primary input source) and people still want to use their hands to select stuff.
In that case there could even be multiple input sources active at the same time – the transient one and the hand – and there would still need to be a mechanism to detect which of these is a "primary" source and which is not.


Manishearth commented on May 23, 2024

I disagree – the spec notes what auxiliary and primary input sources are but does not note how to distinguish between them

The spec is allowed to have internal affordances to make spec writing easier. A term being defined has zero implication on whether it ought to be exposed. Were "it's defined in the spec" a reason in and of itself to expose things in the API then a bunch of the internal privacy-relevant concepts could be exposed too.

The discussion here is "should the fact that a hand input can trigger selections be exposed by the API". If tomorrow we remove or redefine the term from the spec, which we are allowed to do, that wouldn't and shouldn't change the nature of this discussion, which is about functionality, not a specific spec term.

however, their choice may be to expose a transient pointer (with eye tracking) later (which would be the primary input source) and people still want to use their hands to select stuff

I addressed that in an edit to my comment above: in that case the WebXR API defaults to matching device behavior, and expects apps to do the same. There's a valid argument to be made about making it easier for apps to diverge, but I don't think it can be made until there is an actual device with this behavior, and it is against the spirit of this standard so still something that's not a slam dunk.


AdaRoseCannon commented on May 23, 2024

Unfortunately the current default hands profile is "generic-hand-select", which seems to imply a primary input action; I'm not sure if we should change the default or do something else.

In visionOS WebXR the profiles array for a hand is ["generic-hand"] because it does not fire a select event.
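A sketch of the distinction this enables, using the profile strings mentioned in this thread ("generic-hand" vs. "generic-hand-select"):

```js
// Infer from the profiles array whether a hand source advertises a
// select (primary) action; visionOS hands report only 'generic-hand'.
function handFiresSelect(source) {
  return source.hand !== null &&
         source.profiles.includes('generic-hand-select');
}
```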


Manishearth commented on May 23, 2024

@AdaRoseCannon should we update the spec to include that and allow it as an option?


AdaRoseCannon commented on May 23, 2024

That might be sensible. It's odd because generic-hand is already included in the WebXR input profiles repo.


Manishearth commented on May 23, 2024

immersive-web/webxr-hand-input#121


hybridherbst commented on May 23, 2024

@AdaRoseCannon thanks for clarifying! The spec notes that

The device MUST support at least one primary input source.

but it seems that hands are the only input source on visionOS WebXR, and it's not a primary input source. Am I missing something?


Manishearth commented on May 23, 2024

I actually think that line should probably be changed. Not all devices have input sources in the first place, and that's otherwise spec conformant.

I think it should instead be "for devices with input sources, at least one of them SHOULD be a primary input source"


toji commented on May 23, 2024

I don't think we need to change the primary input source requirement, simply because it should be valid to have the primary input source be transient. (This is the case for handheld AR devices, IIRC). It's somewhat unique for a device like the Vision Pro to expose persistent auxiliary inputs and a transient primary input, but I don't think that's problematic from a spec perspective. It may break assumptions that some apps have made.

I remember discussing with Ada in the past the reasons why the hands weren't considered the source of the select events, and being satisfied with the reasoning; I just don't recall it at the moment.


cabanier commented on May 23, 2024

Looking at our code, we emit "oculus-hand", "generic-hand" and "generic-hand-select".
Does VSP just emit "generic-hand"? Is Quest browser still allowed to emit "generic-hand"?


Manishearth commented on May 23, 2024

@cabanier continuing that discussion on the PR


cabanier commented on May 23, 2024

@Manishearth's new PR allows for both profiles to be exposed. This matches both implementations, so I'm good with that change.
This will allow you to disambiguate between VSP and other browsers.


toji commented on May 23, 2024

The assumption it breaks is that there isn't any primary input source (a MUST as per the spec, at least right now).

That conflicts with my understanding of the input model from prior conversations with @AdaRoseCannon. That said, I haven't used the AVP yet and it may have been that our discussion centered around future plans that have not yet been implemented. Perhaps Ada can help clarify?


cabanier commented on May 23, 2024

In the initial release of visionOS there was no primary input source; visionOS 1.1 beta (now available) has transient-pointer inputs, which are primary input sources.

Interesting! We have some devices here that we'll update to visionOS 1.1 beta.
Do you have any sample sites that work well with transient-pointer? We have it as an experimental feature and if it works well, we will enable it by default so our behavior will match.


AdaRoseCannon commented on May 23, 2024

A THREE.js demo which works well is: https://threejs.org/examples/?q=drag#webxr_xr_dragging but don't enable hand-tracking since THREE.js demos typically only look at the first two inputs and ignore events from other inputs.

Brandon's dinosaur demo also works well, although similar caveat.
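For illustration, a sketch of the two-slot pattern these demos use (standard three.js controller API; onSelectStart is an assumed app handler):

```js
// Only controller slots 0 and 1 are wired up, so with hand-tracking
// enabled the transient pointers at indices 2 and 3 are never seen.
const controller0 = renderer.xr.getController(0);
const controller1 = renderer.xr.getController(1);
controller0.addEventListener('selectstart', onSelectStart);
controller1.addEventListener('selectstart', onSelectStart);
```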


cabanier commented on May 23, 2024

I just tried it and created a recording:
https://github.com/immersive-web/webxr/assets/1513308/e1247e4b-1985-4a0e-a562-51d6aeb65f06

I will see if it matches Vision Pro.

THREE.js demos typically only look at the first two inputs and ignore events from other inputs.

Are you planning on exposing more than 2 input sources?
I've been thinking about doing the same since we can now track hands and controllers at the same time. I assumed this would need a new feature, or a new secondaryInputSources attribute.


toji commented on May 23, 2024

This is getting a little off topic for the thread, but would you want to expose hands and controllers as separate inputs? A single XRInputSource can have both a hand and a gamepad.

(EDIT: I guess the input profiles start to get messy if you combine them, but it still wouldn't be out-of-spec)
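A sketch of that combined shape (drawHandMesh and readButtons are assumed helpers):

```js
// One XRInputSource may carry both an articulated hand and a gamepad.
for (const source of session.inputSources) {
  if (source.hand && source.gamepad) {
    drawHandMesh(source.hand);   // joint poses for rendering
    readButtons(source.gamepad); // button/axis state for input
  }
}
```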


cabanier commented on May 23, 2024

I believe so, because if you expose hands and a transient input source, it would be weird if the ray space of the hand suddenly jumped and became a transient input source.


cabanier commented on May 23, 2024

I assumed this would need a new feature, or a new secondaryInputSources attribute.

We have events for new inputs being added, which can be used to detect the new inputs. I personally don't believe we need another way to inform developers to expect more than two inputs.

I was mostly concerned about broken experiences. I assume you didn't find issues in your testing?


cabanier commented on May 23, 2024

I worry that adding input sources is confusing for authors and might break certain experiences.

Since every site needs to be updated anyway, maybe we can introduce a new attribute (secondaryInputSources?) that contains all the input sources that don't generate input events.

/agenda should we move secondary input sources to their own attribute?


hybridherbst commented on May 23, 2024

I think there are a few cases where it won't be clear which thing is "secondary" and it highly depends on the application.

Example: if Quest had a mode where both hands and controllers are tracked at the same time, there could be up to 6 active input sources:

  • 2 hands (with or without select events)
  • 2 controllers (with select events)
  • 2 transient pointers
Of these, e.g. 4 could be active simultaneously, which would still be allowed according to the spec if I'm not mistaken.

I think instead of a way to see which input sources may be designated "primary" or "secondary" by the OS, it may be better to have a way to identify which input events are caused by the same physical action (e.g. "physical left hand has caused this transient pointer and that hand selection") so that application developers can decide if they want to e.g. only allow one event from the same physical source.
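A sketch of that correlation using the only signal the API exposes today, handedness (the debounce window is arbitrary, and no spec mechanism ties a transient pointer to the physical hand that caused it):

```js
// Drop a second event from the same physical hand within a short window.
const recentlyActed = new Set();
session.addEventListener('select', (event) => {
  const hand = event.inputSource.handedness; // 'left' | 'right' | 'none'
  if (hand !== 'none') {
    if (recentlyActed.has(hand)) return; // duplicate from the same hand
    recentlyActed.add(hand);
    setTimeout(() => recentlyActed.delete(hand), 100);
  }
  // ...handle the selection...
});
```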


cabanier commented on May 23, 2024

I don't think it's enough to disambiguate the events.
For instance, if a headset could track controllers and hands at the same time, what is the primary input?

If the user is holding the controllers, the controllers are primary and the hands are secondary.
However, if they put the controllers down, the hands become primary and the controllers are now secondary.

WebXR allows you to inspect the gamepad or look at finger distance, so we need to find a way to let authors know what the input state is. Just surfacing everything will be confusing.


Manishearth commented on May 23, 2024

Since every site needs to be updated anyway,

Hold on, does it? I don't think we're requiring any major changes here.


cabanier commented on May 23, 2024

Since every site needs to be updated anyway,

Hold on, does it? I don't think we're requiring any major changes here.

AFAIK no site today supports more than 2 input sources, so they need to be updated to get support for hands and transient-input.


Manishearth commented on May 23, 2024

I agree: if content isn't following the spec's model of things (as it can choose to do), I don't think adding more things will make it change its mind on that. Content had the option to treat these things in a more agnostic way, and it still does.


cabanier commented on May 23, 2024

The primary issue here is that libraries haven't been following the design patterns of the API.

Indeed, libraries such as aframe have been generating their own events based on either the gamepad or finger distance. Nobody looks at more than 2 inputs, so every experience that requests hands will be broken on the new Vision Pro release.
The hands spec mentions that it can be used for gesture recognition, so we can't really fault developers for using it as a design pattern.

(By "broken" I mean that Vision Pro's intent to use gaze/transient-input as the input is not honored)

It's truly unfortunate that we've ended up in a situation where most content has optimized itself for a very specific set of inputs when the API itself is ostensibly agnostic to them (and I'm as guilty as anyone when it comes to the non-sample content I've built) but I don't think that we should be OK with making breaking changes to the API because of the choices of the libraries built atop it.

My point is that things are already broken. If an experience requests hands and does its own event generation, it will be broken once Vision Pro ships the next version of its OS.
I'm seeing that all the developers on Discord are updating their experiences to do their own event generation, and that new logic will break in the near future because input is supposed to come from gaze.

My proposal to move secondary inputs to their own attribute will fix this and reduce confusion about what the primary input is. (See my hands and controllers example above.)
The only drawback is that existing experiences that request hands will only have transient-input.


hybridherbst commented on May 23, 2024

The primary issue here is that libraries haven't been following the design patterns of the API.

As both a library implementor and a library user, I can only partially agree. Yes, three.js handles it very minimalistically, as they often do, and that has already caused a number of problems (that are often promptly resolved when they actually happen).
Needle Engine, for example, handles any number of inputs, so I believe the next AVP OS update will "just work" for the most part.

However, I don't think the spec and API explain or handle:

  • close-range interactions that don't use select events (a finger poking at a UI element)
  • close-range interactions that may or may not have select events depending on device (a hand being pinched directly on an object in close range)
  • multi-source cases (a controller producing select events and a hand holding the same controller producing select events)

For example, the spec does not state that there is always an exact mapping of "one physical thing must only have one primary input source"; there could be more than one select event caused by the same physical action ("bending my finger") as per the spec, even if no device (that I'm aware of) does this today. I'm not sure if this is intended, and I'm not sure how anyone could build something entirely "future-proof" on top of this ambiguity.

I understand that cases like this are seen as "out of scope" for the spec, since they can be implemented on top of what the API returns. Yet, library users expect those cases to be handled or at least want to understand how to handle them. I don't think that counts as "not following the design patterns".


cabanier commented on May 23, 2024

I understand that cases like this are seen as "out of scope" for the spec, since they can be implemented on top of what the API returns. Yet, library users expect those cases to be handled or at least want to understand how to handle them. I don't think that counts as "not following the design patterns".

I agree. Putting every tracked item in inputSources and leaving it up to authors is not a good indicator of how to handle multiple tracked items. (Basing it on the name of the input profile feels like a hack.)

Even the name "inputSources" is confusing, since on Vision Pro, hands are NOT considered input; gaze is.
Likewise on Quest, if you hold controllers, your hands are NOT input, and if you put the controllers down, the controllers should stop being input.

Maybe instead of secondaryInputSources, we should call it trackedSources.
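A hypothetical sketch of that shape (no such attribute exists in the spec today; wireUpEvents and renderPose are assumed helpers):

```js
// inputSources would keep only event-generating sources; trackedSources
// (hypothetical) would hold pose-only hands/controllers.
for (const source of session.inputSources) {
  wireUpEvents(source); // these fire select/squeeze events
}
for (const tracked of session.trackedSources ?? []) {
  renderPose(tracked);  // tracked, but generates no input events
}
```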


cabanier commented on May 23, 2024

As an experiment, I added support for detached controllers to our WebXR implementation so you will now always get hands and controllers at the same time in the inputSources array.

I can report that every WebXR experience that I tested that uses controllers was broken in some way by that change. Some only worked if I put the controllers down, others rendered the wrong controllers on top of each other, and a couple completely stopped rendering because of JavaScript errors.
This is a clear indication that we can't just add entries to inputSources.


toji commented on May 23, 2024

I guess I'm confused by how that's supposed to improve the situation vs. where we're at now. If we continue to use the input system as originally designed, many apps will need to update their input handling patterns to account for new devices. If we introduce a new secondary input array... many apps will still need to update their input handling patterns to account for new devices?


cabanier commented on May 23, 2024

I guess I'm confused by how that's supposed to improve the situation vs. where we're at now. If we continue to use the input system as originally designed, many apps will need to update their input handling patterns to account for new devices. If we introduce a new secondary input array... many apps will still need to update their input handling patterns to account for new devices?

I'm saying:
If we continue to use the input system as originally designed, many apps will break.

I want to add support for concurrent hands and controllers but I can't make a change that breaks every site.


toji commented on May 23, 2024

I understand that position, but I'm trying to consider both the Quest's and the Vision Pro's use cases.

Quest, by virtue of being a first mover in the space and the most popular device to date, has a lot of existing content built specifically to target it using abstractions that only really panned out for systems with Quest-like inputs. It's understandable that you're reluctant to break those apps. And I'm not suggesting that we do break them! (I still feel like we can and should expose hand poses and gamepad inputs on the same XRInputSource, but that's a slightly different topic.)

An input system like Vision Pro's, however, will already be broken in those apps from day 1, so it's not a choice between breaking apps or not. They're just broken. So unless pages want to ignore Vision Pro users (or have been effectively abandoned by their creators, which we know is common) they'll have to update one way or the other. If updates are going to be mandatory to work on a given piece of hardware then I'd rather not invent new API surface to support it if what we already have serves the purpose.

Now, put bluntly, I think that this is something Apple brought on themselves. I'm not a big fan of the limitations imposed by their input system, even if I understand the logic behind it. And I do think that if compatibility with existing apps is a high priority for Safari then there are probably reasonable paths that can be taken to introduce a not-particularly-magical-but-at-least-functional mode where hands emulate single-button controllers. But those types of decisions aren't the sort of thing that this group is in the business of imposing on implementations.


cabanier commented on May 23, 2024

An input system like Vision Pro's, however, will already be broken in those apps from day 1, so it's not a choice between breaking apps or not. They're just broken.

I don't believe that is the case. Only sites that request hand tracking will be broken, since they won't look at more than 2 input sources.
My proposal will fix these sites because hands will no longer be in the inputSources array. Those sites should then work with transient-input, although they would no longer display hands.

So unless pages want to ignore Vision Pro users (or have been effectively abandoned by their creators, which we know is common) they'll have to update one way or the other. If updates are going to be mandatory to work on a given piece of hardware then I'd rather not invent new API surface to support it if what we already have serves the purpose.

How do you propose that I surface concurrent hands and controllers? How can I indicate to the author whether the hands or the controllers are the primary input?

Now, put bluntly, I think that this is something Apple brought on themselves. I'm not a big fan of the limitations imposed by their input system, even if I understand the logic behind it. And I do think that if compatibility with existing apps is a high priority for Safari then there are probably reasonable paths that can be taken to introduce a not-particularly-magical-but-at-least-functional mode where hands emulate single-button controllers. But those types of decisions aren't the sort of thing that this group is in the business of imposing on implementations.

Correct. Quest surfaces hands as single-button controllers if hand tracking is not requested, and this seems to work on a majority of sites.


Manishearth commented on May 23, 2024

How do you propose that I surface concurrent hands and controllers

They should be the same input source, with both a hand and a gamepad attribute, yes? The spec was designed with this use case in mind.


cabanier commented on May 23, 2024

How do you propose that I surface concurrent hands and controllers

They should be the same input source, with both a hand and a gamepad attribute, yes? The spec was designed with this use case in mind.

No, they are different input sources: 2 hands and 2 controllers.


Manishearth commented on May 23, 2024

Oh, I understand now. It's not just the case of a hand grasping a controller.
