
Support for content protection · w3c/webcodecs · Open

w3c commented on July 4, 2024
Support for content protection

Comments (32)

dalecurtis commented on July 4, 2024

FWIW, I'm prototyping support for EME+MSE+WebCodecs in Chromium at:
https://bugs.chromium.org/p/chromium/issues/detail?id=1144908

It's not something we're yet planning to ship, but I did want to flesh it out for folks to test. At this point, I've got clearkey working and tested (Widevine should work as well, but it's untested for now). It's available in Chrome 120.0.6074.0+ behind the --enable-blink-features=MediaSourceExtensionsForWebCodecs flag.

The new IDL ends up looking like this so far:

dictionary SubsampleEntry {
  required unsigned long clearBytes;
  required unsigned long cypherBytes;
};

dictionary EncryptionPattern {
  required unsigned long cryptByteBlock;
  required unsigned long skipByteBlock;
};

dictionary DecryptConfig {
  required DOMString encryptionScheme;
  required AllowSharedBufferSource keyId;
  required AllowSharedBufferSource initializationVector;
  required sequence<SubsampleEntry> subsampleLayout;
  EncryptionPattern encryptionPattern;
};

dictionary AudioDecoderConfig {
  // ...
  DOMString encryptionScheme;
};

dictionary VideoDecoderConfig {
  // ...
  DOMString encryptionScheme;
};

dictionary EncodedAudioChunkInit {
  // ...
  DecryptConfig decryptConfig;
};

dictionary EncodedVideoChunkInit {
  // ...
  DecryptConfig decryptConfig;
};
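To make the shape concrete, here is a rough sketch of how an application might use it from JavaScript. This is illustration only, not shipped API: the codec string, the buffers (encryptedAccessUnit, keyIdBytes, ivBytes) and the subsample sizes are placeholders, and key delivery (via EME/MediaKeys, as in the Chromium prototype) is not shown.

// Hedged sketch: chunk-level decryption metadata per the proposed IDL above.
const decoder = new VideoDecoder({
  output: (frame) => { /* paint the frame, then */ frame.close(); },
  error: (e) => console.error(e),
});

decoder.configure({
  codec: 'avc1.640028',            // placeholder codec string
  encryptionScheme: 'cenc',        // new member from the proposal above
});

// One encrypted access unit: clear NAL headers plus an encrypted payload.
decoder.decode(new EncodedVideoChunk({
  type: 'key',
  timestamp: 0,
  data: encryptedAccessUnit,       // ArrayBuffer from the app's own transport (placeholder)
  decryptConfig: {                 // new member from the proposal above
    encryptionScheme: 'cenc',
    keyId: keyIdBytes,             // Uint8Array matching the EME key ID (placeholder)
    initializationVector: ivBytes, // Uint8Array (placeholder)
    subsampleLayout: [{ clearBytes: 42, cypherBytes: 4054 }],
  },
}));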

padenot commented on July 4, 2024

Well, that's been shipping for some time already in multiple browser/OS combinations.

This is about extending that to WebCodecs, which essentially means giving a key to a decoder and then preventing readback of the resulting VideoFrames while still allowing them to be painted. There are already talks about write-only drawing primitives, iirc.

Essentially, do what browsers do when implementing (e.g.) Widevine Level 1 support, but with a bit more flexibility.

sandersdan commented on July 4, 2024

The definition of "chunk" is codec-specific; see the codec registry: https://www.w3.org/TR/webcodecs-codec-registry/.

As an example, an AVC chunk is an Access Unit.

rayvbr commented on July 4, 2024

🦄 present :)

Jokes aside, if WebCodecs + DRM existed, I can assure you there would be a market for it. The main reason we use WebCodecs over MSE/EME is not so much the decreased latency but the control it gives us over the rendering process. For our use case, an advanced multiview player, we need frame-level control over rendering, with each frame potentially requiring different shader parameters. An MSE-like approach, where the output of a decoder is rendered directly to a view, simply does not work in our case because of the lack of (control over) synchronisation between the WebGL pass and the video decoder output. Which is the primary reason that, so far, we've been limited to ClearKey-like DRM schemes in browsers, while we can offer proper DRM in native applications.

rayvbr commented on July 4, 2024

@murillo128 yes, that's correct. So having EME/DRM support for WebCodecs would be a necessary, although by itself not sufficient, step for enabling our use case.

I added a more extensive description of our use case here: #483

chcunningham commented on July 4, 2024

CC: @joeyparrish @xhwang-chromium

sandersdan commented on July 4, 2024

So far I have been assuming that we will be able to associate a MediaKeys (EME) instance with a VideoDecoder (or AudioDecoder), but lacking a protected video output path it's not yet clear how that will work. It may be possible to plumb a protected picture through a canvas ImageBitmap rendering context, but I don't know if that is workable on all platforms. It may be necessary to integrate more directly with <video> than the current proposal does.

I don't know of any way to use an app-supplied decoder with MediaKeys. I don't think there is any path for encoding with MediaKeys.
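For reference, the ImageBitmap plumbing mentioned above already exists for unprotected frames; a protected output path would presumably reuse this shape with readback blocked. A minimal sketch of the existing, unprotected version:

// VideoFrame -> ImageBitmap -> 'bitmaprenderer' canvas context.
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('bitmaprenderer');

const decoder = new VideoDecoder({
  output: async (frame) => {
    const bitmap = await createImageBitmap(frame);
    frame.close();                       // release the decoder's frame promptly
    ctx.transferFromImageBitmap(bitmap); // display without touching pixels in JS here
  },
  error: (e) => console.error(e),
});
// decoder.configure(...) and decoder.decode(...) as usual.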

aboba commented on July 4, 2024

@sandersdan The use cases I'm aware of typically involve decoding rather than encoding. Encoding would be involved in video upload, but typically content protection isn't introduced there. Decode use cases often involve low-latency streaming (over RTCDataChannel or WebTransport), where WebCodecs would provide a higher performance and potentially more interoperable substitute for "low-latency MSE". The events being streamed could be sporting events, musical or theatrical performances, political gatherings, company meetings, games or AR/VR demonstrations.

Some examples are collected here.

n8o commented on July 4, 2024

I am also interested in this scenario. Being able to use WebCodecs with DRM (FairPlay, Widevine, PlayReady) would be beneficial. We could keep the DRM'd elementary stream in the same format, so it stays compatible with CMAF, just without the container.

chcunningham commented on July 4, 2024

triage note: marking 'extension', as the anticipated shape (associating MediaKeys) would be done via new members on the config dict, or new methods on the codec interface. As Dan points out, further extensions to canvas are probably also necessary.

bobh-dazn commented on July 4, 2024

There are at least three key parts to a secure video pipeline:

  1. Trusted execution
  2. Secure video pipeline
  3. Secure key exchange

In the first part you, as the party issuing the protected content, have to trust the client implementation not to let the plaintext elementary streams, or the uncompressed essence, out of the system. You can't have an EME pipeline that passes plaintext elementary streams back in a way which can be inspected, modified or diverted. You also cannot have a decode pipeline which allows injection of code, inspection of code, or access to the image space by other code. In previous secure implementations I have worked with, any transformations of the video were usually done as a separate layer: the underlying software had no actual access to the video frame buffer, but it could command the frame buffer to resize, warp, etc. An upper-layer image compositor in secure memory space was the final arbiter for the render.

Secure key exchange doesn't have to be rocket science or proprietary; the principle is well understood. Again, it's about protected memory that other (untrusted) components of the system cannot access. It's possible to define secure domains of trust where a crypto pipeline could request secrets and not allow them to pass outside of that execution environment.

If there were a browser implementation of secure execution pipelines which was able to make use of native trusted execution, and in which the browser was effectively able to execute signed code (even if not encrypted/scrambled) against secure memory, then there would not even necessarily need to be proprietary DRM implementations. Ultimately it's about not allowing image/video data in memory to be inspected or read, even while it is being modified; that's the key factor. It's been done before in hardware, and it could probably be allowed in software when backed by a trusted system. But if the render pipeline isn't secured somehow, then it's basically not going to get traction with those who care about premium content rights.

dalecurtis commented on July 4, 2024

I agree with Paul. However, I think we could only offer that flexibility with L3 protected content; for L1 the frames never come back from the hardware. For L3, we could do something like mark the frames as tainted (VideoFrame would need to grow this) like we do for CORS, so sites can manipulate them via canvas/WebGL but can't read them back. For L1, probably at most we could allow sites to decode opaque frames and then pass them to a MediaStreamTrackGenerator that goes into a <video src>, since they can't be used with canvas or WebGL.
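To illustrate the L3 idea, here is a hypothetical sketch of how a tainted VideoFrame might behave, by analogy with today's CORS canvas tainting. None of this is specified; the SecurityError behavior is an assumption.

const canvas = new OffscreenCanvas(1920, 1080);
const ctx2d = canvas.getContext('2d');

function paintProtected(frame) {        // frame: a VideoFrame hypothetically marked as tainted
  ctx2d.drawImage(frame, 0, 0);         // compositing/manipulation would still work...
  frame.close();
  try {
    ctx2d.getImageData(0, 0, 1, 1);     // ...but readback would fail once the canvas is tainted
  } catch (e) {
    // expected (by analogy with CORS): a SecurityError
  }
}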

aboba commented on July 4, 2024

One question is how WebCodecs works with SFrame. Is it possible for a WebCodecs decoder to take an SFrame as input directly, without exposing the key and the cleartext EncodedChunk to JavaScript? Similar questions have arisen with WebRTC Encoded Transform.

chcunningham commented on July 4, 2024

My first time really looking at SFrame. At a glance it looks pretty different from the EME use cases. As you know, EME involves license servers, output protection, etc., often removing the UA entirely from the roles of decoding and rendering. Whereas, IIUC, SFrame is more about protecting frames on the wire and is less concerned with protecting them from JavaScript. My first thought would be that SFrame encryption/decryption seems like a step that happens after encode / before decode, so it's external processing outside of WebCodecs. Seem right?

Noob question: I see lots of discussion on how to prevent JavaScript access to the SFrame keys. Obviously keys are sensitive, but I don't quite follow why they're concerned w/ restricting JS access to the keys if they're not also restricting JS access to the raw media?

aboba commented on July 4, 2024

@chcunningham SFrame is about protecting content from access by untrusted parties. There are use cases where the Javascript may be trusted or untrusted.

Where the JS is trusted, the application is allowed to access keys as well as raw media (e.g. VideoFrames), but can encrypt the content to prevent access by an untrusted middlebox (e.g. a conference server provided by a CPaaS service). In this use case, the SFrame would be decrypted to yield an encoded chunk prior to WebCodecs decode, or the encoded chunk from the WebCodecs encoder would be encrypted prior to transmission. An example of this use case would be a JavaScript application written using a Cloud Communications Platform SDK.

If the JS is untrusted, then the web application should not have access to the keys and operations on the cleartext content should be restricted. For example, the application should not be able to record the content or have access to the raw data in order to transform it (e.g. so as to protect against creation of deep fakes). This use case more resembles an EME use case.

For example, a sporting event or concert could be streamed at low latency by using WebTransport or RTCDataChannel for transport and WebCodecs for decode. Content protection might be desired in this scenario, but without the overhead of containerization.

In this use case, the content is being played on a Javascript application or device from another vendor (e.g. a concert offered by a streaming service, played on a device like a Roku, AppleTV, etc.).

chcunningham commented on July 4, 2024

What would motivate folks to use sframe in the sporting event example? I definitely follow the use case, but I'd expect they'd essentially want EME for WebCodecs, reusing large parts of the existing EME infrastructure.

chcunningham commented on July 4, 2024

After learning more about the motivations behind the "untrusted JS" concern, I don't think we should pursue adding SFrame APIs to WebCodecs. JS trust is required for most of the web: authentication, email, banking, shopping, ... The platform increasingly ensures that trust at higher levels (https, cross-origin isolation, ...) while exposing more and more power to applications. This runs counter to all of that. The trust isn't perfect, but domain-specific (RTC, WebCodecs) solutions add complexity while leaving most of the problem unsolved.

I'm all for E2E encryption, but with the app managing it.

aboba commented on July 4, 2024

Agree that the goal is probably "EME for WebCodecs". But what does that imply for the format to be transferred over the wire and subsequently fed to the WebCodecs decoder? Is protected content transferred the same way it is today? Is the proposal to allow the WebCodecs decoder to decode that? If not, what is the alternative? I mentioned SFrame because it is a non-containerized format for encrypted frames.

chcunningham commented on July 4, 2024

My first thought would be to provide the EME stuff that is usually in the container as part of the *DecoderConfig (e.g. these things). Then a chunk would be just as it is now, only encrypted.

Also, to reuse most of EME, we might add MediaKeys as another member of *DecoderConfig. Flow is then

  1. get media keys as you do today
  2. configure decoder w/ it + decryption metadata usually found in container
  3. decode() encrypted chunks, get VideoFrame outputs

Then add the restrictions Dale talked about above.

Disclaimer: very off the cuff design.
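For concreteness, a sketch of that flow. The mediaKeys and encryptionScheme config members and the per-chunk decryptConfig are assumptions from this thread, not shipped API; handleFrame and encryptedChunkBytes are placeholders.

// 1. Get MediaKeys as you do today with EME.
const access = await navigator.requestMediaKeySystemAccess('org.w3.clearkey', [{
  initDataTypes: ['keyids'],
  videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.640028"' }],
}]);
const mediaKeys = await access.createMediaKeys();
// ...create a session and exchange license messages as usual (omitted)...

// 2. Configure the decoder with the keys plus the decryption metadata
//    usually found in the container.
const decoder = new VideoDecoder({ output: handleFrame, error: console.error });
decoder.configure({
  codec: 'avc1.640028',
  mediaKeys,                  // hypothetical member
  encryptionScheme: 'cenc',   // hypothetical member
});

// 3. decode() encrypted chunks, get (restricted) VideoFrame outputs.
decoder.decode(new EncodedVideoChunk({
  type: 'key',
  timestamp: 0,
  data: encryptedChunkBytes,  // placeholder buffer from the app's transport
  decryptConfig: { /* keyId, IV, subsamples: the per-chunk metadata */ },
}));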

chcunningham commented on July 4, 2024

Hey group, I want to follow up on the discussion from the WebRTC / Media TPAC call (minutes).

@dontcallmedom mentioned the concern could be that the app itself is untrusted with the media (may snoop). Questions:

  • Can you help me understand these users better? Why are they in this position? How hypothetical / practical is this concern? Where can I read more?
  • Even if we enable protection of the content all the way to the screen/speakers, aren't we still trusting the app to invoke those APIs that do the encryption?

At this point I see the value of EME for streaming scenarios (sporting, gaming, ...), but I'm less clear on solutions for trust problems in communications use cases.

@mwatson2 (and Richard?) mentioned that we may be able to register SFrame's encryption mechanism within EME. Sounds interesting. If we pursue EME:WC for streaming uses, my thought was to register a new stream type that re-used existing EME encryption modes (I guess CBC?) for maximum compatibility with existing infra. Does that sound right? I'm new to SFrame, but it looks like it uses an HMAC-based AEAD mode. If we additionally pursue EME:WC for RTC uses, maybe we'd want to support that AEAD mode in addition to CBC?

@fluffy mentioned that EME had been considered for RTC in the past, but found to be a poor fit. Something about the number of keys? Can you say more?

mwatson2 commented on July 4, 2024

@chcunningham Existing EME has a stream format registry which describes the supported stream formats (currently ISO BMFF and WebM), specifically how the encryption is applied to the media bytes inside those different containers.

Whereas existing EME works with MSE - which accepts media as a byte stream, and thus we need a "stream format" specification - for EME with WebCodecs you would presumably need a "frame format" specification. This would describe, for each frame format, how the encryption is applied to the bytes of the frame and - differently from the MSE case - any additional per-frame metadata that needs to be applied to drive the decryption.

I could imagine a frame format for SFrame and equally one for a Common Encryption frame. The latter would be useful for streaming applications that wanted to use the same source files as existing MSE.

I also find it hard to think of applications where EME would be useful for real-time communication (rather than streaming). Such an application would need the property that the sender wants some guarantees about what will be done with the media they are sending; the sender does not trust the application at the receiver, but does trust the CDM component at the receiver. Perhaps an "unrecordable" videoconference tool?

aboba commented on July 4, 2024

Mark said:

"I could imagine a frame format for SFrame and equally one for a Common Encryption frame. The latter would be useful for streaming applications that wanted use the same source files as existing MSE."

[BA] I'm trying to understand how "Common Encryption Frame" would differ from SFrame. Are there inherent differences in requirements that would lead to format differences? Or would CEF and SFrame be very similar, with the only major difference being the key management protocols used for different scenarios? In that situation, I'd suggest that only one of the formats is likely to be widely deployed.

"I also find it hard to think of applications where EME would be useful for real-time communication (rather than streaming)."

[BA] As noted in today's meeting, the use cases are blending together. In the "Together Mode" scenario, you have realtime streams ingested and combined with a low-latency sports stream. Since the goal is to produce a composited stream, it wouldn't make sense to E2E protect the realtime stream. However, it might make sense to protect the composited stream.

"Such an application would need the property that the sender wants some guarantees about what will be done with the media they are sending, the sender does not trust the application at the receiver but the sender does trust the CDM component at the receiver. "

[BA] Today very large realtime conferences are often implemented via a combination of "low-latency ingestion" plus low-latency streaming. Think of a company meeting, a very large class, a concert for a mass audience, or an online political rally. For these kinds of large meetings, the content can be considered valuable and vulnerable to theft, and even if there is no content fee, it might be important that the content not be modifiable so as to create "deep fakes".

In these scenarios, the media uploader trusts the ingestor. Since the ingestion system may need to modify the content (e.g. transcode it or combine it with other streams), there is no need for content protection or E2E encryption on the ingestion leg. However, there is a desire to prevent theft or manipulation of the finished product, so you might have content protection on the downstream link.

These scenarios can be implemented today using containerized media and transports such as WebTransport or RTCDataChannel, combined with MSE. So we're not really talking about new use cases or threat models. The question is how the wire format changes if WebCodecs is used instead of MSE. The goal is to transport an encrypted encoded chunk over the wire then decrypt it and decode it via WebCodecs.

mwatson2 commented on July 4, 2024

[BA] I'm trying to understand how "Common Encryption Frame" would differ from SFrame.

I'm not all that familiar with SFrame, but for Common Encryption there are several different encryption schemes for each of AES-CTR and AES-CBC, and each has some metadata which describes how the encryption is applied. For example, for the cenc scheme, which uses AES-CTR mode, there is a per-frame map describing which bytes of the frame are encrypted and which are not, as well as the Initialization Vector for the encryption. The AES-CTR 16-byte cipher blocks are then applied to the concatenation of the encrypted regions of the frame. There are several other schemes with slight variations on this.

I imagine for SFrame there is a similar description, for each cipher, of one or more ways the cipher can be applied to the bytes of the frame, and perhaps also metadata controlling that?
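To make the cenc description concrete, here is a simplified sketch of what a software (ClearKey-style) decryptor does per frame, using WebCrypto AES-CTR. It assumes key is a CryptoKey imported for AES-CTR decryption, iv has already been padded to a 16-byte counter block with a 64-bit block counter, and subsamples uses the { clearBytes, cypherBytes } shape from earlier in the thread; real CDMs do this inside protected memory.

// Decrypt the concatenation of the encrypted subsample regions with one
// continuous AES-CTR keystream, then splice the plaintext back into place.
async function decryptCencFrame(key, iv, data, subsamples) {
  const parts = [];
  let offset = 0;
  for (const { clearBytes, cypherBytes } of subsamples) {
    offset += clearBytes;
    parts.push(data.subarray(offset, offset + cypherBytes));
    offset += cypherBytes;
  }
  const joined = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let w = 0;
  for (const p of parts) { joined.set(p, w); w += p.length; }

  const clear = new Uint8Array(await crypto.subtle.decrypt(
    { name: 'AES-CTR', counter: iv, length: 64 }, key, joined));

  const out = new Uint8Array(data);     // copy; clear bytes stay as-is
  offset = 0; w = 0;
  for (const { clearBytes, cypherBytes } of subsamples) {
    offset += clearBytes;
    out.set(clear.subarray(w, w + cypherBytes), offset);
    offset += cypherBytes; w += cypherBytes;
  }
  return out;
}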

aboba commented on July 4, 2024

A link to a DASH-IF presentation describing Content Protection requirements for WebCodecs is here.

murillo128 commented on July 4, 2024

After thinking about this, I don't think that there is any use case for supporting DRM in webcodecs for playback. Let me explain.

If we are going to use WebCodecs with DRM for playback, we will need to use a VideoDecoder to extract the raw audio/video frame, then create a media stream track and play it on a video element. As we want to use DRM, the decoder will have to decrypt the encrypted frame, but we can't output a raw frame to JS; we would need to pass an opaque handle instead. After that we would need to create the media stream track from the opaque frames for playback in the video element.

This is already available in MSE and I don't see any added value from WebCodecs for this use case. The only caveat is that MSE uses "containerized" frames while we want to provide "uncontainerized" ones.

So wouldn't it just be simpler to extend MSE to accept Encoded(Audio/Video)Chunks instead?

What functionality would be missing?

aboba commented on July 4, 2024

@murillo128 Finding developers who require DRM support in WebCodecs is a bit like searching for a veterinarian properly trained to care for a unicorn. After a long (and unfruitful) search, you begin to wonder if they exist.

There are several reasons why developers who were formerly using MSE (and containerization) have been moving to WebCodecs (and raw media transport). Most find that WebCodecs provides decreased latency, partly due to the removal of containerization/decontainerization operations as well as support for workers (which are also now supported by MSEv2). However, one of the other key characteristics of applications that have moved from MSE to WebCodecs is that they do not need DRM.

So while I could speculate whether it makes sense for WebCodecs to support DRM or whether it would be better for MSEv2 to accept Encoded(Audio/Video)Chunks, it is probably best to wait until we "come across a unicorn that needs a veterinarian".

Do you know of an application that needs both WebCodecs and DRM?

murillo128 commented on July 4, 2024

@rayvbr wouldn't your use case also require WebGL to support DRM'd textures?

vitaly-castLabs commented on July 4, 2024

I totally agree with @murillo128 - MSE/EME for non-containerized media makes more sense (is better suited for DRM) than EME-for-WebCodecs. I'd use the latter for lack of better options, although I'm not putting myself on the unicorn-vets list, as #483 clearly states WebRTC cases are out of scope.

At the same time, the DRM-for-WebRTC topic is not getting much love (w3c/webrtc-nv-use-cases#86) even though there's definitely interest from DASH-IF. That is out of the scope of this discussion too, so I'll just wrap up by stating that the second-best option (MSE/EME for non-containerized media) would be hugely helpful too.

vitaly-castLabs commented on July 4, 2024

This is an exciting development, thank you for making this happen!

We (castLabs) discussed EME/DRM issues with Google folks (Chris, Matt and Harald to name a few) at length, but it didn't materialize into any tangible action, and then I learned both Chris and Matt were no longer with Google...

This already looks very promising as a prototype, however I don't fully grasp the IDL you posted. Based on my experience working with ISO/IEC 23001-7 compliant media, I'd expect something like this:

dictionary SegmentDecryptConfig {
  required DOMString encryptionScheme;
  required AllowSharedBufferSource keyId;
  AllowSharedBufferSource initializationVector;  // required if encryptionScheme is a CBC scheme
  EncryptionPattern encryptionPattern;
};

dictionary FrameDecryptConfig {
  AllowSharedBufferSource initializationVector;  // required if encryptionScheme is a CTR scheme
  required sequence<SubsampleEntry> subsampleLayout;
};

'Segment' here is a range of consecutive frames sharing the same en/decryption key - if there's no key rotation, then a segment is equivalent to the entire media stream.

dalecurtis commented on July 4, 2024

There's no concept of segments w/ decoders, so the client must map the segment config into a per-chunk config.
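In other words, an app using the hypothetical segment/frame split above would flatten it into the prototype's per-chunk DecryptConfig itself, roughly like this (names taken from the two sketches in this thread, not real API):

// Hypothetical helper: merge segment-level and per-frame decryption metadata.
function toChunkDecryptConfig(segment, frame) {
  return {
    encryptionScheme: segment.encryptionScheme,
    keyId: segment.keyId,
    // Per-frame IV when present (CTR schemes), otherwise the segment-level one (CBC schemes).
    initializationVector: frame.initializationVector ?? segment.initializationVector,
    subsampleLayout: frame.subsampleLayout,
    ...(segment.encryptionPattern && { encryptionPattern: segment.encryptionPattern }),
  };
}

// decoder.decode(new EncodedVideoChunk({ type, timestamp, data,
//   decryptConfig: toChunkDecryptConfig(currentSegment, frameInfo) }));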

vitaly-castLabs commented on July 4, 2024

This brings up the question I was going to ask before - what is a "chunk"? I don't recall ever seeing "chunks" in any audio/video specs. Does that stand for something usually referred to as a "frame" or "access unit"?

The WebCodecs spec itself only defines "key chunk": "An encoded chunk that does not depend on any other frames for decoding. Also commonly referred to as a 'key frame'." And the whole spec feels like an arbitrary mix of "chunk" and "frame" used interchangeably.
