video-dev / media-ui-extensions

Extending the HTMLVideoElement API to support advanced player user-interface features

License: MIT License
For live/"DVR" content, it's common to have some indication as to whether or not playback is currently "at the live edge". However, due to the nature of HTTP Adaptive Streaming (HAS), the live edge cannot be represented as a simple point/moment in the media's timeline. This is for a few reasons:
The `seekable.end(0)` of a media element can then be used as a reference for any other live edge window computation. (A visual representation may help here.)
Let's say a client player fetches a live HLS media playlist just before the server is about to update it with the following values:
```
# ...
# Unfortunately, EXT-X-TARGETDURATION is only an upper limit (>= any EXTINF duration) after rounding to the nearest integer
#EXT-X-TARGETDURATION:5
# Client-side "LIVE EDGE" will be 5.46 seconds into the segment below, aka 3 * 5 (target duration) = 15 seconds from the playlist end duration
# NOTE: Assume playback begins at the beginning of the segment below, since some client players choose to do this to avoid stalling/rebuffering, meaning playback starts -5.46 seconds from the "LIVE EDGE"
#EXTINF:5.49
#EXTINF:4.99
#EXTINF:4.99
#EXTINF:4.99
```
The server then updates the playlist with two larger-duration segments (within spec, and something that happens under sub-optimal but not unheard-of conditions) before the client re-requests the playlist after 4.99 seconds (the minimum amount of time the player must wait) and continues re-fetching the available segments. The updated playlist is:
```
# ...
#EXT-X-TARGETDURATION:5
# NOTE: Current playhead will be 4.99 seconds into the segment below, assuming optimal buffering and playback conditions at 1x playback speed
#EXTINF:5.49
#EXTINF:4.99
#EXTINF:4.99
# New client-side "LIVE EDGE" will be 0.97 seconds into the segment below, aka 3 * 5 (target duration) = 15 seconds from the playlist end duration
#EXTINF:4.99
#EXTINF:5.49
#EXTINF:5.49
```
In this example, playback started 5.46 seconds behind the computed "LIVE EDGE" and, after a single reload of the playlist, ended up 11.45 seconds behind the next computed "LIVE EDGE" without any stalls/rebuffering. Note that, even in this example, we do not account for round-trip times (RTT) for fetches, time to parse playlists, time to buffer segments, initial seeking of the player's playhead/`currentTime`, and the like. Note also that, even without those considerations, the playhead still ends up > 2 * `TARGETDURATION` behind the "LIVE EDGE".
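The arithmetic above can be sketched as a small helper. This is purely illustrative (not part of the proposal); `liveEdge` and its parameters are hypothetical names, and the implied 3x target-duration holdback is taken from the example.

```javascript
// Sketch: computing the client-side "LIVE EDGE" for the example playlists
// above. Hypothetical helper for illustration only.
function liveEdge(segmentDurations, targetDuration, holdbackMultiplier = 3) {
  // Total playlist duration is the sum of all #EXTINF values.
  const total = segmentDurations.reduce((sum, d) => sum + d, 0);
  // The live edge sits holdbackMultiplier * targetDuration (the implied
  // HOLD-BACK) before the end of the playlist.
  return total - holdbackMultiplier * targetDuration;
}

liveEdge([5.49, 4.99, 4.99, 4.99], 5);             // ≈ 5.46
liveEdge([5.49, 4.99, 4.99, 4.99, 5.49, 5.49], 5); // ≈ 16.44
```

With a playhead at 4.99 seconds after the reload, 16.44 - 4.99 gives the 11.45-second gap described above.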
Since this information can be derived from a media element's "playback engine"/by parsing the relevant playlists or manifest, the extended media element should have an API to advertise what the live edge window is for a given live HAS media source. Call this the "live window offset".
Additionally, due to consideration (3), above, we should treat `seekable.end(0)` as the end time of a live stream, accounting for the per-specification "holdback" or "delay".
**`seekable.end(0)` as "live edge" (with `HOLD-BACK`/etc.) for HAS**

To account for the distinction between the live edge duration of the media stream as advertised by the playlist or manifest vs. the latest time a client player should try to play, based on per-specification rules and additional information also provided in the playlist or manifest, extended media elements SHOULD set the `seekable.end(0)` value to account for this offset. This shall be assumed for all computations of the "live edge window", where `seekable.end(0)` will be the presumed "end" of the window/range, already taking into account the aforementioned offset. With these offsets presumed, `seekable.end(0)` may be treated as synonymous with a client player's "live edge", and these terms should be treated as interchangeable in this initial proposal.
For HLS, `seekable.end(0)` should be based on the inferred or explicit `HOLD-BACK` attribute value, where:

> HOLD-BACK
>
> The value is a decimal-floating-point number of seconds that indicates the server-recommended minimum distance from the end of the Playlist at which clients should begin to play or to which they should seek, unless PART-HOLD-BACK applies. Its value MUST be at least three times the Target Duration.
>
> This attribute is OPTIONAL. Its absence implies a value of three times the Target Duration. It MAY appear in any Media Playlist.
For Low-Latency HLS, `seekable.end(0)` should be based on the explicit `PART-HOLD-BACK` (REQUIRED) attribute value, where:

> PART-HOLD-BACK
>
> The value is a decimal-floating-point number of seconds that indicates the server-recommended minimum distance from the end of the Playlist at which clients should begin to play or to which they should seek when playing in Low-Latency Mode. Its value MUST be at least twice the Part Target Duration. Its value SHOULD be at least three times the Part Target Duration. If different Renditions have different Part Target Durations then PART-HOLD-BACK SHOULD be at least three times the maximum Part Target Duration.
For MPEG-DASH, `seekable.end(0)` should be based on the explicit `MPD@suggestedPresentationDelay` (OPTIONAL) attribute, when present; otherwise it may be whatever the client chooses based on its implementation rules. Per the spec:

> it specifies a fixed delay offset in time from the presentation time of each access unit that is suggested to be used for presentation of each access unit... When not specified, then no value is provided and the client is expected to choose a suitable value.

(Semantics of the `MPD` element. NOTE: there may be additional suggestions/recommendations available via the DASH IOP.)
For low-latency MPEG-DASH, `seekable.end(0)` should be based on the `ServiceDescription -> Latency@target` attribute. Note that this value is an offset not of the manifest timeline, but rather of the (presumed NTP or similarly synchronized) wallclock time. Per the spec:

> The service provider's preferred presentation latency in milliseconds compared to the producer reference time. Indicates a content provider's desire for the content to be presented as close to the indicated latency as is possible given the player's capabilities and observations.
>
> This attribute may express latency that is only achievable by low-latency players under favourable network conditions.

(NOTE: This implies that the value could change marginally over time based on precision and other wallclock time updates in the runtime environment. However, since these differences should be minor, it is likely fine to treat this value as static for the purposes of this document, and it can likely be implemented as such in an extended media element.)
`liveWindowOffset`

An offset or delta from the "live edge"/`seekable.end(0)`. An extended media element is playing "in the live window" iff: `mediaEl.currentTime > (mediaEl.seekable.end(0) - mediaEl.liveWindowOffset)`.
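The check above can be sketched as a small predicate. This is a minimal sketch assuming an (extended) media element that exposes the proposed `liveWindowOffset` alongside the standard `seekable` time ranges; the sentinel handling mirrors the value semantics listed below.

```javascript
// Sketch of the proposed "in the live window" check. `liveWindowOffset`
// is the proposed extension property; everything else is standard.
function isInLiveWindow(mediaEl) {
  const offset = mediaEl.liveWindowOffset;
  // undefined => unimplemented; NaN => "unknown"/"inapplicable" (e.g. VOD)
  if (offset === undefined || Number.isNaN(offset)) return false;
  if (!mediaEl.seekable.length) return false;
  return mediaEl.currentTime > mediaEl.seekable.end(0) - offset;
}
```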
Possible values:

- `undefined` - Unimplemented
- `NaN` - "unknown" or "inapplicable" (e.g. for `streamType = "on-demand"`)
- `0 <= x <= Number.MAX_SAFE_INTEGER` - known stable value for the current stream

For HLS: `liveWindowOffset = 3 * EXT-X-TARGETDURATION`
Note that this is a cautious computation. In many stream + playback scenarios, `2 * EXT-X-TARGETDURATION` will likely be sufficient. However, with this less cautious value, there may be edge cases where standard playback will "hop in and out of the live edge," so we recommend the more cautious value here.
For Low-Latency HLS: `liveWindowOffset = 2 * PART-TARGET`

Unlike "standard" segments (`#EXTINF`s), parts' durations must be <= `#EXT-X-PART-INF:PART-TARGET` (without rounding). Also unlike "standard" HLS, servers must add new Partial Segments to playlists within 1 (instead of 1.5) Part Target Duration after adding the previous Partial Segment. This means that, even under sub-optimal conditions, Low-Latency HLS should end up with a much smaller `liveWindowOffset`.
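The two recommended computations can be sketched together. This is illustrative only; the parsed-playlist input shape (`targetDuration`, `partTarget`) is a hypothetical stand-in for whatever a real HLS parser produces.

```javascript
// Sketch: choosing liveWindowOffset per the recommendations above.
// Input fields are hypothetical names for parsed playlist values.
function computeLiveWindowOffset({ targetDuration, partTarget } = {}) {
  if (partTarget != null) {
    // Low-Latency HLS: 2 * PART-TARGET (parts must be <= PART-TARGET and
    // servers must publish new parts within 1 Part Target Duration).
    return 2 * partTarget;
  }
  if (targetDuration != null) {
    // "Standard" HLS: the cautious 3 * EXT-X-TARGETDURATION recommendation.
    return 3 * targetDuration;
  }
  return NaN; // unknown/inapplicable
}
```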
TBD
Open questions:

- `targetLiveWindow`: Since this value represents a window for the "live edge" and not for "available live content to seek through/play", having both refer to the "live window" will likely be confusing. In the current related preliminary implementation in Media Chrome, we refer to the related attribute as the `livethreshold`. Should that be the name here as well? Do we want the name to try to capture the fact that this is an "offset" value from the "live edge"/`seekable.end(0)`?
- `livewindowoffsetchange` event: While we likely cannot rely on any of the built-in `HTMLMediaElement` events, we should be able to guarantee computation of the relevant values before dispatching the `streamtypechange` event, as documented in #3. Is this repurposing of the event acceptable? Should we consider a more generic event name that more clearly relates to states announced for stream type, DVR, live edge window offset, and potentially additional future properties/state?

---

Many video & podcast players have a playback speed control. I, along with 1 million+ other people, have installed @igrigorik's VideoSpeed extension.
Good idea for a standard HTML video/audio player UI?
Some video speed UIs also have +/- 10 second skips or frame-by-frame stepping. That might be out of scope for this proposal, but maybe worth cross-mentioning?
Bonus: an array-style attribute for hand-configuring the available speeds:

```html
<video controls speed="0.5, 0.75, 0.9, 1, 1.1, 1.25, 1.5, 2" width="250">
  <source src="/media/cc0-videos/flower.webm" type="video/webm">
  <source src="/media/cc0-videos/flower.mp4" type="video/mp4">
</video>
```
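A UI consuming such an attribute would need to parse it into a list of rates. A minimal sketch, assuming the hypothetical `speed` attribute above (`parseSpeeds` is an illustrative name, not part of any proposal):

```javascript
// Sketch: parse a comma-separated `speed` attribute value into playback
// rates, dropping anything that isn't a positive finite number.
function parseSpeeds(attrValue) {
  return attrValue
    .split(',')
    .map((s) => Number.parseFloat(s.trim()))
    .filter((n) => Number.isFinite(n) && n > 0);
}

parseSpeeds('0.5, 0.75, 0.9, 1, 1.1, 1.25, 1.5, 2');
// returns [0.5, 0.75, 0.9, 1, 1.1, 1.25, 1.5, 2]
```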
Today, `video.error` means video playback has failed, but there are many things that happen under the hood that are not ideal even if they don't cause a playback failure. Most notable are problems that result in rebuffering/stalling, slow startup times, or lower-quality renditions.
We don't have a standard way to report/capture these issues for the sake of reporting to analytics or responding to in real time.
The API could mimic the errors API
```javascript
video.addEventListener('warning', (evt) => {
  let warning = video.warning;
});
```
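For concreteness, here is one way an extended media element might surface such a warning, mirroring the shape of the `error` API. `MediaWarning`, its fields, and `emitWarning` are hypothetical names for illustration, not a proposed interface.

```javascript
// Sketch: a non-fatal warning surfaced like `video.error`. Hypothetical
// class/field names for illustration only.
class MediaWarning {
  constructor(code, message) {
    this.code = code;       // e.g. a numeric warning category
    this.message = message; // human-readable detail
  }
}

function emitWarning(mediaEl, code, message) {
  // Keep only the most recent warning, like `video.error` does for errors.
  mediaEl.warning = new MediaWarning(code, message);
  mediaEl.dispatchEvent(new Event('warning'));
}
```

Media elements are already `EventTarget`s, so listeners attach with the usual `addEventListener('warning', ...)` pattern.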
That seems good enough as long as:
I could also see an argument for limited-length array of warnings. But that feels more complicated to manage so I'd prefer to avoid that direction if possible.
This is as much a question for discussion as it is a suggestion. It's possible console warnings could be good enough.
Follow-ups:
Originally posted by @gkatsev in #5 (comment)
The idea of different “stream types” has been around for a long time in various HTTP Adaptive Streaming (HAS) standards and its precursors in some manner - minimally distinguishing between “live” content and “video on demand” content. However, these categories aren’t consistently named or distinguished in the same way across the various specifications. Moreover, there is no corresponding API in the browser. Yet these categories directly inform how one expects users to consume and interact with the media, including what sort of UI or “chrome” should be made available for the user. By way of example, the built in controls/UI in Safari that show up for a live src are different than those that show up for a VOD src. This proposal aims to normalize the names and definitions of StreamTypes (in a way that is extensible and evolvable over time) by way of how they are expected to be consumed and interacted with by a viewer/user. It also provides a concise and easy to understand differentiator for anyone implementing different UIs/controls/"chromes" for the various stream types.
An additional goal of this proposal is to recommend for MSE-based players or “playback engines” to try to normalize their use of existing APIs to be as consistent as possible with the proposed inferred StreamType Algorithm.
- `"unknown"` (default) - There is no media content or there is currently insufficient information to determine the StreamType of the current media content (e.g. metadata or similar is still loading, async default StreamType inference not yet done)
- `"vod"` ("Video on Demand") - The media content has a known start and end time and is intended to be randomly seekable from start to end as long as the content is available at all
- `"live"` - The media content is intended to be viewed at the "live edge" as forward/subsequent content is made available over time and is not intended to be seekable at all
- `"dvr"` - The media content has a known start time and by default is intended to be viewed at the "live edge" as forward content is made available over time, but all backward/previous content is also available for seeking from start to the current "live edge"
- `"sliding"` ("Sliding Window", "Partial DVR") - The media content is by default intended to be viewed at the "live edge" as forward content is made available over time, but is also intended to be seekable within a (roughly) consistent time window relative to the current "live edge"

`type StreamType = "unknown" | "vod" | "live" | "dvr" (| "sliding"?) (| string?)`
`HTMLMediaElement::get streamType() {} : StreamType`
- Returns the inferred stream type unless `streamType` is set. See below for the algorithm.

`HTMLMediaElement::set streamType() {}`

`streamtypechange`
- Dispatched whenever `streamType` changes (inferred or explicitly set)

Note on `"dvr"` and `"live"`: for a `"live"`/`"dvr"` stream, the computed stream type could change to `"vod"` based on the currently proposed algorithm.

HLS:
- `VOD` (`"vod"`), `EVENT` (`"dvr"`) - from the `EXT-X-PLAYLIST-TYPE` tag value (https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis-10#section-4.4.3.5)
- `"live"` - inferred from the absence of both the `EXT-X-ENDLIST` tag (https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis-10#section-4.4.3.4) and the `EXT-X-PLAYLIST-TYPE` tag (https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis-10#section-4.4.3.5)

MPEG-DASH:
- `static` (`"vod"`), `dynamic` (`"dvr"` or `"live"` - cannot differentiate by attr) - from the `MPD@type` attribute value (§5.3.1.2, Table 3 — Semantics of MPD element)
- `"dvr"` - when `MPD@timeShiftBufferDepth` (§5.3.1.2, Table 3 — Semantics of MPD element) grows consistently with the available Segments & wall clock time and has a consistent computed start time (similar to the inferred algorithm for `"dvr"`)

Relevant existing media APIs:
- `HTMLMediaElement::duration`
- `MediaSource::duration`
- `HTMLMediaElement::seekable`
- `MediaSource::setLiveSeekableRange()`
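A rough sketch of how inference might lean on those existing APIs, assuming only standard media element state. This is not the proposal's normative algorithm; `inferStreamType` and its thresholds are illustrative.

```javascript
// Sketch: approximate StreamType inference from standard media element
// state. Illustrative only; a real engine would also consult the
// playlist/manifest signals listed above.
function inferStreamType(mediaEl) {
  // HAVE_NOTHING: no metadata yet, so nothing to infer from.
  if (mediaEl.readyState === 0) return 'unknown';
  // A finite duration implies a known start and end time => "vod".
  if (Number.isFinite(mediaEl.duration)) return 'vod';
  if (!mediaEl.seekable.length) return 'unknown';
  const windowSize = mediaEl.seekable.end(0) - mediaEl.seekable.start(0);
  // A non-trivial seekable window suggests "dvr"; otherwise "live".
  // (Distinguishing "dvr" from "sliding" requires observation over time.)
  return windowSize > 0 ? 'dvr' : 'live';
}
```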
Issue for gathering the complexities of ads that should be supported via the media element API, to make ad-related changes to the user interface.
Related discussion for media-chrome as we determine the ad-related HTML attributes that could be set on UI elements.
muxinc/media-chrome#34
Ads are a complex technology space, but similar to adaptive streaming and media source extensions, much of that complexity should live "below the surface" of the media element API. What is exposed at the media element API level should be focused on making common ad-related user interfaces possible.
Directly assigning HTML controls to video player controls:

```html
<button type="button" invoketarget="my-video" invokeaction="playpause">Play/Pause</button>
<button type="button" invoketarget="my-video" invokeaction="mute">Mute</button>
<video id="my-video"></video>
```
https://github.com/keithamus/invoker-buttons-proposal#customizing-videoaudio-controls
https://github.com/keithamus/invoker-buttons-proposal/issues/28
https://github.com/keithamus/invoker-buttons-proposal/issues/14#issuecomment-1744204920
While this proposal is not adding capabilities to `<video>`, it can directly affect that element, and I hope you can give feedback.
Allow a user to select from a set of video quality levels/resolutions/renditions/bitrates/variants/representations.
Was hoping to start this with a PR, but some research and discussion will be helpful first.
Related conversation: whatwg/html#562 from @dmlap
The proposed extension to VideoTrack seems promising.
```
partial interface VideoTrack {
  sequence<string> getAvailableRenditions();
  // promise resolves when change has taken effect
  Promise<void> setPreferredRendition(string rendition);
};
```
Something to solve for is "auto".
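One possible way to handle "auto" is to treat it as a sentinel meaning "no explicit preference". A hypothetical usage sketch of the proposed `VideoTrack` extension (the `selectRendition` helper and the "auto" convention are illustrative, not part of the proposal):

```javascript
// Sketch: pick a rendition via the proposed VideoTrack extension, with
// "auto" as a hypothetical sentinel for engine-chosen quality.
async function selectRendition(videoTrack, preferred) {
  const available = videoTrack.getAvailableRenditions();
  // "auto" passes through; unknown ids fall back to the first rendition.
  const target = preferred === 'auto' || available.includes(preferred)
    ? preferred
    : available[0];
  // Per the proposal, the promise resolves when the change takes effect.
  await videoTrack.setPreferredRendition(target);
  return target;
}
```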
Ping @gkatsev @littlespex
NOTE: This proposal began as a subset of the Stream Type - Proposal #3 but was descoped due to complexities and the decision to model it as a separate state.
NOTE: A discussion on the complexities and permutations of "DVR", both using available HTTP Adaptive Streaming (HAS) manifests/playlists and inferring from the state of a given `HTMLMediaElement` instance, can be found in this Google Doc, which also has comments enabled. Please read this document, as it provides relevant context for the proposal below.
A subset of "live streaming media" is intended to be played with seek capabilities for the viewer. This is frequently referred to as "DVR," and typically falls into one of two categories:
For both of these cases, although the media is live, the "intention" is to still allow users to seek through the media during playback.
Below are the total possible DVR states (for more on why, see the Google Doc, referenced above).
- `"standard"` - The media stream is live and all previous media content will be available
- `"sliding"` - The media stream is live and a sufficient amount of previous media content will be available for seeking
- `"none"` - The media stream is on-demand, or the media stream is live and there will not be a sufficient amount of previous media content available for seeking.
- `"any"` - The media stream is live and is either `"standard"` or `"sliding"`, but it is (currently) ambiguous which of these two it is.
- `"unknown"` - There is no media stream, or the media stream is live, but it is (currently) ambiguous if it's `"none"` (no DVR), `"standard"`, `"sliding"`, or `"any"`.
- `undefined` - The DVR feature is unimplemented by the media element.

**(`"standard"` support only)**

This version of the proposal intentionally omits/"doesn't solve for" any account of `"sliding"`.
`HTMLMediaElement::get dvr() {} : boolean | null`
- `true` means `"standard"`
- `false` means `"none" | "sliding"` (where `"sliding"` is not within the scope of this proposal and therefore is "under-determined" by this value alone)
- `null` means `"unknown"`
- `undefined` or not defined means unsupported

`"dvrchange"` event with `detail = dvr`, dispatched on the `HTMLMediaElement` whenever `dvr` changes

**(`"standard"` support only)** Only rely on HLS playlist (`EXT-X-PLAYLIST-TYPE:EVENT`) or MPEG-DASH manifest (`MPD@type="dynamic" && !MPD@timeShiftBufferDepth`) parsing to derive `dvr`. Any other process will result in ambiguities. For more, see the Google Doc, referenced above.
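That playlist/manifest-only rule can be sketched as a pure function. The input shape here is a hypothetical stand-in for parsed playlist/manifest data, not a real parser's output:

```javascript
// Sketch: derive the boolean `dvr` value purely from parsed playlist or
// manifest data, per the recommendation above. Input fields are
// hypothetical names.
function deriveDvr({ hlsPlaylistType, mpdType, mpdTimeShiftBufferDepth } = {}) {
  // HLS: EXT-X-PLAYLIST-TYPE:EVENT => "standard" DVR.
  if (hlsPlaylistType != null) return hlsPlaylistType === 'EVENT';
  // MPEG-DASH: dynamic without @timeShiftBufferDepth => "standard" DVR.
  if (mpdType != null) {
    return mpdType === 'dynamic' && mpdTimeShiftBufferDepth == null;
  }
  return null; // "unknown"
}
```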
`type DVRType = "standard" | "sliding" | "none" | "any" | "unknown"`

`HTMLMediaElement::get dvr() {} : DVRType`
- `undefined` or not defined means unsupported

`"dvrchange"` event with `detail = dvr`, dispatched on the `HTMLMediaElement` whenever `dvr` changes

`HTMLMediaElement::get minSlidingWindow() {} : number`
- The minimum seekable time range for `"sliding"`, aka `HTMLMediaElement::seekable.end(0) - HTMLMediaElement::seekable.start(0) >= HTMLMediaElement::minSlidingWindow` -> `HTMLMediaElement::dvr === "sliding"`

`HTMLMediaElement::set minSlidingWindow(value: number) {} : void`
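One possible reading of the `minSlidingWindow` rule, sketched against the standard `seekable` ranges. This is an illustrative interpretation only (in particular, using `seekable.start(0) === 0` as the marker for `"standard"` is an assumption, not part of the proposal):

```javascript
// Sketch: classify a live stream's DVR type using minSlidingWindow.
// Illustrative assumptions: seekable.start(0) === 0 => "standard";
// otherwise a sufficiently wide window => "sliding".
function computeDvrType(mediaEl, minSlidingWindow) {
  if (!mediaEl.seekable.length) return 'unknown';
  if (mediaEl.seekable.start(0) === 0) return 'standard';
  const windowSize = mediaEl.seekable.end(0) - mediaEl.seekable.start(0);
  return windowSize >= minSlidingWindow ? 'sliding' : 'none';
}
```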
To be documented formally if this is the preferred adopted proposal. Most of this may be determined from the Google Doc, referenced above.
**(`"standard"` support only)**
- Can derive `true|false` for both HLS and MPEG-DASH "immediately" (after loading and parsing the playlists/manifests once per media stream)
- Safely defers `"sliding"` (and corresponding "uncertain" states such as `"any"` or `"unknown"` in the case of early stream starts). This is because any implementations that add future `"sliding"` support (assuming new properties are introduced) will simply treat these as `"live"` unless/until they integrate with the new interface. This feels far less risky than the other way around, where `"live"` streams would suddenly and unexpectedly start showing up as "DVR" (seekable).