
Video frame type · w3c/webcodecs · 13 comments · closed

w3c commented on July 4, 2024
Video frame type


Comments (13)

sandersdan commented on July 4, 2024

At this point I am leaning towards only supporting ImageBitmap in the first version of WebCodecs. This sidesteps plane and alignment questions (by not providing mappable buffers at all) while maintaining good performance for playback and transcode cases.

This does require a readback to access pixel data, and conversion to RGB would be done as part of that process.

For now the best approach may be to ensure that future versions of WebCodecs can easily add new image representations.
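For reference, that readback path exists today outside of WebCodecs: draw the ImageBitmap into a canvas and read the RGBA pixels back with getImageData. A minimal sketch:

    // Read 8-bit RGBA pixels out of an ImageBitmap by drawing it into an
    // OffscreenCanvas and calling getImageData (this forces the readback and
    // the conversion to RGBA).
    function readPixels(bitmap) {
      const canvas = new OffscreenCanvas(bitmap.width, bitmap.height);
      const ctx = canvas.getContext('2d');
      ctx.drawImage(bitmap, 0, 0);
      return ctx.getImageData(0, 0, bitmap.width, bitmap.height);
    }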


murillo128 commented on July 4, 2024

From my point of view, providing an efficient way of manipulating the video frame is a must for WebCodecs, with funny hats or head tracking being the obvious use cases.

Given typical encoder/decoder APIs, I think we should provide direct access to a planar image with stride.

Kind of like what is already available in Blink's native video frame object:

https://cs.chromium.org/chromium/src/media/base/video_frame.h


pthatcherg commented on July 4, 2024

Yeah, the ImageData is just what was convenient to stick there at the time.


guest271314 commented on July 4, 2024

See https://github.com/dsanders11/imagebitmap-getimagedata-demo.


jyavenard commented on July 4, 2024

I'm not sure if this should be pushed into another bug or if the discussion started here will do, but here goes my $0.02.

ImageData is totally unsuitable for modern-day applications. The only way to access the content is via an 8-bit RGB data buffer. Accessing that data when the decoder is a GPU one would require a memory readback, which would kill performance.

We need to be able to retrieve a decoded image directly via a handle such as a surface ID, so that it can be used directly with WebGL or accessed via GPU-only methods (such as a GL shader).

Additionally, we need to know what the format of that image is. Most hardware decoders output NV12 (8-bit), P010, or P016 (10-bit and 12/16-bit respectively); software decoders output YUV 4:2:0, etc.
We also need to know whether it's 4:2:0, 4:2:2, 4:4:4, etc.


pthatcherg commented on July 4, 2024

Yes, the more I've looked into it, the more I have to agree. Unfortunately, the same may be true also of ImageBitmap. Currently, I'm leaning toward defining a new VideoFrame type that has:

  • .format: enum of "i420", "nv12", etc.
  • .planes: int (convenience; could be inferred from format)
  • .onGpu: bool
  • .getPixelData(plane): returns raw pixel data of one plane; blows up if .onGpu

And then the same WebGL methods that work with ImageBitmap and HTMLVideoElement (such as texImage2D) would just work with a VideoFrame passed in, and no readback would be required.

If one really wanted to do a readback, we could support that with something like:
  • .readFromGpu(): returns a new VideoFrame (async) that has .onGpu == false.
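A rough usage sketch of the shape proposed above (onGpu, getPixelData, and readFromGpu are the proposed names, not a shipped API):

    // Get the raw bytes of the luma (Y) plane from the proposed VideoFrame type,
    // performing the async GPU readback first if the frame is GPU-backed.
    async function getLumaPlane(frame) {
      if (frame.onGpu)
        frame = await frame.readFromGpu(); // proposed async readback
      console.log(frame.format, frame.planes); // e.g. "i420", 3
      return frame.getPixelData(0); // raw pixel data of plane 0 (Y)
    }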


ytakio commented on July 4, 2024

I'm not very good at JS, so I have one question related to ImageData, ImageBitmap, and VideoFrame data.

How does WebCodecs take care of the stride of lines in picture data?
Some HW acceleration, such as SIMD (SSE, NEON, etc.), expects picture data in which each line is aligned to the bus bandwidth, which may not always be a multiple of the macroblock size (e.g. 720x480 with 16x16 macroblocks, AVX with 256/512-bit registers).

So the picture data object needs to carry some offset information for accessing each line in each format.

GPUs, meanwhile, have APIs to transfer data from the CPU domain into aligned memory in the GPU domain for their streaming processors.

I think WebCodecs should handle memory alignment for efficient video processing.
(Should I create a new issue? 😅)
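To illustrate the stride point (a sketch over a hypothetical plane layout, not a WebCodecs API): a 720-pixel-wide luma plane may be stored with, say, a 768-byte stride so that every row starts at an aligned address, so row access has to step by the stride rather than the width.

    // Copy a padded luma plane into a tightly packed buffer.
    // `src` holds `height` rows of `stride` bytes each, of which only the first
    // `width` bytes per row are real pixels; the rest is alignment padding.
    function packLumaPlane(src, width, height, stride) {
      const dst = new Uint8Array(width * height);
      for (let row = 0; row < height; row++) {
        dst.set(src.subarray(row * stride, row * stride + width), row * width);
      }
      return dst;
    }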


jyavenard commented on July 4, 2024

It makes you wonder who this API is targeted at, then, and who has shown interest in implementing such an API.

Any API that requires dealing with RGB and performing readbacks and conversions will not be used for video.


sandersdan commented on July 4, 2024

ImageBitmap provides a relatively efficient (GPU->GPU texture copy) path to shader access in WebGL, and a very efficient display path (ImageBitmapRenderingContext.transferFromImageBitmap()). Especially given that we don't have YUV data from all decoders (Android MediaCodec in particular), I don't think I want to start designing a new image primitive for the web for the first version of WebCodecs.

I do think we should design to allow opting-in to such a planar image primitive in the future, when it exists.
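For reference, the display path mentioned above looks roughly like this:

    // Hand a decoded ImageBitmap to a bitmaprenderer context; the canvas takes
    // ownership of the bitmap without an extra pixel copy, and the bitmap is
    // detached by the call.
    const canvas = document.querySelector('canvas');
    const bitmapContext = canvas.getContext('bitmaprenderer');
    bitmapContext.transferFromImageBitmap(bitmap);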


sandersdan commented on July 4, 2024

Efficient manipulation of video frames implies GPU-only operation (hardware buffer or texture primitive). There exist platforms where uncompressed frames are stored in CPU memory, in which case it is convenient to offer that access, but in general it implies a very expensive GPU readback operation.


murillo128 commented on July 4, 2024

By effective video manipulation I meant with as little memory copying/conversion as possible.

For example, in the funny-hats case:

    // Get cam
    const cam = await navigator.mediaDevices.getUserMedia({ video: true, audio: false });
    // Get video track reader
    const reader = window.reader = new VideoTrackReader(cam.getVideoTracks()[0]);
    // Create writer
    const writer = new VideoTrackWriter({});
    // Create transform stream
    const transformer = window.transformer = new TransformStream({
      transform: (frame, controller) => {
        // paint something on the frame
        controller.enqueue(frame);
      }
    });
    reader.readable.pipeTo(transformer.writable);
    transformer.readable.pipeTo(writer.writable);
    // Send it
    peerconnection.addTrack(writer.track);

(note that this code works in Chrome right now, except obviously for the image manipulation)

The image bytes would already be in CPU memory as I420 (in most cases), or would have to be converted to I420 for the WebRTC encoders (let's ignore VP9 mode 2 for now).

It would be desirable to expose the underlying image data (if in memory) and, if not, to be able to export it to an ImageBitmap. We would also need a way to create a VideoFrame from an ImageBitmap, ImageData, or raw YUV data.
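For what it's worth, the construction path asked for here ended up looking roughly like the following in later WebCodecs drafts; the constructor shape below is from those later drafts and is shown only as an illustration, not as the API under discussion at the time.

    // Wrap a raw I420 buffer (Y, U, V planes concatenated) in a VideoFrame,
    // per the later WebCodecs draft constructor.
    const frame = new VideoFrame(i420Data, {
      format: 'I420',
      codedWidth: 640,
      codedHeight: 480,
      timestamp: 0,
    });
    // An ImageBitmap (or any canvas image source) can be wrapped as well.
    const fromBitmap = new VideoFrame(bitmap, { timestamp: 0 });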


alexandre-janniaux commented on July 4, 2024

> I do think we should design to allow opting-in to such a planar image primitive in the future, when it exists.

My answer probably references #47 too.

I also don't think it should be a feature within WebCodecs or a substitute for ImageBitmap, because ImageBitmap is a really good piece of interoperability between the different APIs available for rendering and display.

But specific plane access from the start is a necessary feature, especially because if it isn't done from the beginning, you'll probably end up in the same situation as Android and never implement it, or struggle to implement it afterwards. Likewise, an onGpu flag probably provides too little abstraction compared to opaque pictures in general.

One solution to match both the WebCodecs API (which could stream ImageBitmaps) and the ImageBitmap API would be to allow getting an ImageBitmap reference for a single plane of the ImageBitmap, and to expose more metadata on it. That way, you could have:

    // pic is an ImageBitmap with format NV12
    gl.texImage2D(..., pic);        // performs the NV12 -> RGB conversion like before
    const plane = pic.getPlane(0);  // hypothetical per-plane accessor
    gl.texImage2D(..., plane);      // no conversion, this is a GL_LUMINANCE texture

And then you are still backward compatible with ImageBitmap, and you elegantly handle cases where you cannot extract the plane (Android, for example) by exposing an RGB chroma directly, letting the underlying graphics system handle the chroma conversion without extending the Vulkan or OpenGL APIs.

This is also in line with APIs like GBM; take a look at gbm_bo_get_plane_fd, for example.

To get back to #47, we probably don't care about colorspace within the ImageBitmap: it is information intended for the display system (so it can stay private data here) and for the processing systems, which will probably generate code or use extensions for it, so it can be provided by WebCodecs or even by the previous layers in a different object than the ImageBitmap itself. That's what we would expect here in VLC at least, as the information comes from the demuxer and not the decoder, and it could evolve quickly whenever you want to add colorspace, mastering data, etc.


chcunningham commented on July 4, 2024

The spec now offers a VideoFrame interface with Plane interfaces for accessing the pixel data. An ImageBitmap can be generated from a VideoFrame for painting to canvas.

With this now defined, I'd like to close this issue and have new sub-issues filed for remaining gaps. Some known issues/plans are described below. Please file a new issue for anything I've neglected.

We intend to add new features to this interface shortly, including

  • support for additional pixel formats (e.g. #90)
  • conversion between formats (#92)

Planar access to GPU-backed frames is still a problem. In the short term we intend to at least make this transparent by having GPU-backed VideoFrames not initially offer any planar access, but provide a converter function that performs the copy to CPU memory when invoked.

Down the road we would like GPU-backed frames to have some "buffer" type from WebGPU, such that inspection/manipulation of the pixels can happen without a GPU:CPU copy, using WebGPU APIs.
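A short sketch of what the interface described above enables (per-plane readInto() reflects the early draft referenced here; later drafts replaced it with VideoFrame.copyTo(), so treat the exact method names as assumptions):

    // Paint a decoded VideoFrame and, when planar access is available, read the
    // raw luma plane.
    async function handleFrame(frame, canvasContext) {
      const bitmap = await createImageBitmap(frame); // for painting to canvas
      canvasContext.drawImage(bitmap, 0, 0);
      if (frame.planes) { // GPU-backed frames may not expose planes yet
        const luma = new Uint8Array(frame.planes[0].length);
        frame.planes[0].readInto(luma); // copy raw Y samples (early draft API)
      }
      frame.close(); // release the underlying media resource
    }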

