
Proposal for new RGBA output API (libass, open, 23 comments)

wm4 commented on May 27, 2024

Comments (23)

wm4 commented on May 27, 2024

[...] a list of non-transparent regions (dirty rects) is returned.

PS: I think this is what xy's interface does, although I haven't checked again.

I also realize we need some functions to handle the vsfilter colorspace bullshit:

void ass_set_video_colorspace(ASS_Renderer *priv, ASS_YCbCrMatrix matrix);
void ass_set_color_transform(ASS_Renderer *priv, uint32_t (*transform)(void *priv, uint32_t color), void *priv);
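
To make the second hook concrete, here is a minimal sketch of a transform callback, assuming the signature proposed above and assuming colors arrive as 0xRRGGBBAA (both details are unsettled, not real libass API). It encodes R'G'B' with the BT.601 matrix and decodes with BT.709, one common form of the VSFilter mangling:

#include <algorithm>
#include <cstdint>

// Sketch only: encode R'G'B' with BT.601, decode with BT.709.
// The 0xRRGGBBAA layout is an assumption for this example.
static uint32_t mangle_601_to_709(void *priv, uint32_t color)
{
    (void)priv;
    double r = (double)(color >> 24 & 0xff);
    double g = (double)(color >> 16 & 0xff);
    double b = (double)(color >>  8 & 0xff);
    uint32_t a = color & 0xff;

    double y  = 0.299 * r + 0.587 * g + 0.114 * b;   // BT.601 encode
    double cb = (b - y) / 1.772;
    double cr = (r - y) / 1.402;

    double r2 = y + 1.5748 * cr;                     // BT.709 decode
    double b2 = y + 1.8556 * cb;
    double g2 = (y - 0.2126 * r2 - 0.0722 * b2) / 0.7152;

    auto clamp8 = [](double v) {
        return (uint32_t)std::min(255.0, std::max(0.0, v + 0.5));
    };
    return clamp8(r2) << 24 | clamp8(g2) << 16 | clamp8(b2) << 8 | a;
}

// Registered once after renderer creation:
// ass_set_color_transform(renderer, mangle_601_to_709, NULL);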


Cyberbeing commented on May 27, 2024

Note that I chose to make the rendering result one big bitmap [...]
avoid rendering transparent areas, a list of non-transparent regions (dirty rects) [...]
PS: I think this is what xy's interface does, although I haven't checked again.

This is not really how SubRenderIntf.h works. The interface only provides dimensions and positioning information tied to each individual subtitle bitmap per frame. There is no separate list of dirty rectangles exposed downstream.

// ---------------------------------------------------------------------------
// ISubRenderFrame
// ---------------------------------------------------------------------------

// This interface is the reply to a consumer's frame render request.

[uuid("81746AB5-9407-4B43-A014-1FAAC340F973")]
interface ISubRenderFrame : public IUnknown
{
  // "GetOutputRect()" specifies for which video rect the subtitles were
  // rendered. If the subtitle renderer doesn't scale the subtitles at all,
  // which is the recommended method for bitmap (DVD/PGS) subtitle formats,
  // GetOutputRect() should return "0, 0, originalVideoSize". If the subtitle
  // renderer scales the subtitles, which is the recommended method for text
  // (SRT, ASS) subtitle formats, GetOutputRect() should aim to match the
  // consumer's "videoOutputRect". In any case, the consumer can look at
  // GetOutputRect() to see if (and how) the rendered subtitles need to be
  // scaled before blending them onto the video image.
  STDMETHOD(GetOutputRect)(RECT *outputRect) = 0;

  // "GetClipRect()" specifies how the consumer should clip the rendered
  // subtitles, before blending them onto the video image. Usually,
  // GetClipRect() should be identical to "GetVideoOutputRect()", unless the
  // subtitle renderer repositioned the subtitles (see the top of this header
  // for more information about repositioning).
  STDMETHOD(GetClipRect)(RECT *clipRect) = 0;

  // How many separate bitmaps does this subtitle frame consist of?
  // The subtitle renderer should combine small subtitle elements which are
  // positioned near to each other, in order to optimize performance.
  // Ideally, if there are e.g. two subtitle lines, one at the top and one
  // at the bottom of the frame, the subtitle renderer should output two
  // bitmaps per frame.
  STDMETHOD(GetBitmapCount)(int *count) = 0;

  // Returns the premultiplied RGBA pixel data for the specified bitmap.
  // The ID is guaranteed to change if the content of the bitmap has changed.
  // The ID can stay identical if only the position changes.
  // Reusing the same ID for unchanged bitmaps can improve performance.
  // Subtitle bitmaps may move in and out of the video frame rectangle, so
  // the position of the subtitle bitmaps can become negative. The consumer
  // is required to do proper clipping if the subtitle bitmap is partially
  // outside the video rectangle.
  // The memory pointed to by "pixels" is only valid until the next
  // "GetBitmap" call, or until the "ISubRenderFrame" instance is released.
  STDMETHOD(GetBitmap)(int index, ULONGLONG *id, POINT *position, SIZE *size, LPCVOID *pixels, int *pitch) = 0;
};
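
For illustration, a rough consumer-side loop over this interface might look as follows; UploadOrReuse and BlendAt are hypothetical helpers, and error handling is omitted:

#include <windows.h>
#include "SubRenderIntf.h"  // declares ISubRenderFrame (excerpted above)

// Hypothetical helpers provided by the video renderer:
void UploadOrReuse(ULONGLONG id, LPCVOID pixels, SIZE size, int pitch);
void BlendAt(ULONGLONG id, POINT pos, SIZE size, RECT clip);

void ConsumeFrame(ISubRenderFrame *frame)
{
    RECT output, clip;
    frame->GetOutputRect(&output);  // tells us if scaling is needed
    frame->GetClipRect(&clip);

    int count = 0;
    frame->GetBitmapCount(&count);
    for (int i = 0; i < count; i++) {
        ULONGLONG id; POINT pos; SIZE size; LPCVOID pixels; int pitch;
        frame->GetBitmap(i, &id, &pos, &size, &pixels, &pitch);
        // Pixels are premultiplied RGBA, so blending is
        // dst = src + dst * (1 - src.alpha). Positions may be negative;
        // the consumer must clip against the clip rect itself.
        UploadOrReuse(id, pixels, size, pitch);
        BlendAt(id, pos, size, clip);
    }
}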


wm4 commented on May 27, 2024

Thanks. Can the bitmaps overlap?


wm4 commented on May 27, 2024

By the way, ISubRenderFrame looks good (and we should probably use something equivalent to make implementing a libass backend for it easier).

Do you know anything about how madVR (AFAIK the only renderer using this interface yet) handles texture management?


madshi commented on May 27, 2024

Hey there. I guess the bitmaps could theoretically overlap, but since I'm only responsible for madVR, I don't really know. Ultimately the subtitle renderer decides whether bitmaps overlap or not. madVR only renders what it's told to render. I think due to using pre-multiplied alpha, overlapping subtitle bitmaps would probably work fine, but I guess it usually won't happen.

What do you want to know about texture management? The interface is designed so that the subtitle renderer only changes the bitmap ID if the bitmap pixels really changed. This allows madVR to only upload every bitmap ID once to the GPU. madVR stores one ISubRenderFrame object per video frame in madVR's internal video frame queue. All the bitmaps are then uploaded to the GPU (but bitmaps with identical IDs only once per ID) in a background thread. The bitmaps are then blended onto the video image by using the GPU texture units. The ISubRenderFrame objects and the uploaded bitmaps/textures are freed only when the video frame is removed from the internal video frame queue. Hope that answers your question?
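
In other words, the bitmap ID serves as a cache key. A rough sketch of such a scheme (all names hypothetical; madVR's actual implementation is not public):

#include <cstdint>
#include <memory>
#include <unordered_map>

struct GpuTexture;                               // renderer-specific handle
using TexPtr = std::shared_ptr<GpuTexture>;

TexPtr UploadToGpu(const void *pixels, int w, int h, int pitch);  // hypothetical

// Each queued video frame holds TexPtr references to its subtitle bitmaps;
// a texture is freed once no queued frame references its ID anymore.
// (A real implementation would also prune expired cache entries.)
std::unordered_map<uint64_t, std::weak_ptr<GpuTexture>> g_cache;

TexPtr GetTexture(uint64_t id, const void *pixels, int w, int h, int pitch)
{
    if (TexPtr tex = g_cache[id].lock())
        return tex;                              // same ID: upload only once
    TexPtr tex = UploadToGpu(pixels, w, h, pitch);
    g_cache[id] = tex;
    return tex;
}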

Cyberbeing already linked to it, but just to make sure it's not overlooked, here's the full subtitle interface madVR and XySubFilter use to communicate:

http://madshi.net/SubRenderIntf.h

Please feel free to use any part of it, or all of it, if it suits your needs.

FYI, I think the internal video renderers of the latest MPC-HC version now also support this subtitle interface to communicate with XySubFilter. Let me know if you have any questions. I hope GitHub sends notifications about changes in this thread? If not, and if I fail to follow up here, feel free to ping me via email.


wm4 commented on May 27, 2024

The question is how to manage the textures. I see several possibilities. For example, you could pack all sub-images into a large texture, or you could keep a pool of textures (with annoying issues about picking texture sizes), or you could allocate/deallocate textures each time a sub-bitmap appears/disappears.

Initially, I had the idea that using a single large bitmap would make this easier. You could just upload the parts referenced by the dirty rectangles, and it would put a hard limit on the texture memory usage. Texture management would also be very easy.
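
With the single-big-bitmap model, partial uploads are straightforward in OpenGL. A sketch, assuming the dirty rects are delivered as a simple array (the exact list format would be part of the new API):

#include <GL/gl.h>
#include <cstdint>

struct DirtyRect { int x, y, w, h; };

// Re-upload only the dirty regions of a screen-sized RGBA8 overlay bitmap.
void UploadDirty(GLuint tex, const uint8_t *rgba, int stride_bytes,
                 const DirtyRect *rects, int n)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ROW_LENGTH, stride_bytes / 4);
    for (int i = 0; i < n; i++) {
        glPixelStorei(GL_UNPACK_SKIP_PIXELS, rects[i].x);
        glPixelStorei(GL_UNPACK_SKIP_ROWS, rects[i].y);
        glTexSubImage2D(GL_TEXTURE_2D, 0, rects[i].x, rects[i].y,
                        rects[i].w, rects[i].h,
                        GL_RGBA, GL_UNSIGNED_BYTE, rgba);
    }
    glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
    glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
    glPixelStorei(GL_UNPACK_SKIP_ROWS, 0);
}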

Whether sub bitmaps can overlap or not is not very important for OpenGL, apart from the implied texture memory usage. But disallowing overlaps would possibly be more practical for certain other video outputs.


madshi commented on May 27, 2024

I think you're seeing this from the viewpoint of the subtitle renderer, right? I'll try to answer from your point of view. However, I develop the other end of the interface, so take what I say with a grain of salt.

There are two key questions I see:

(1) Who allocates and frees the bitmap(s)? Both could be done by libass, or by the one calling libass. Or libass could allocate and the caller could free. Any combination is possible here.
(2) If libass allocates and frees the bitmaps, then how long are they valid? Only until the caller asks for the next frame?

Let's think all possibilities through:

(A) libass allocates and frees the bitmaps.
(A1) Either you allocate only one single bitmap and reuse it all the time. That means the caller would have to make a copy if he wants to store the bitmap in some sort of subtitle queue.
(A2) Or alternatively instead of reusing one bitmap, you could allocate a separate bitmap or a set of separate smaller bitmaps for every video frame. That way the caller could just store a reference to your bitmap(s) in the queue and wouldn't have to make a copy of the bitmaps. But in that case there'd have to be some way to let you know when you can release the bitmaps again.

(B) libass allocates the bitmaps. The caller frees them. This would avoid copy operations. The caller could store the bitmaps in the queue and free them whenever it makes sense. You could use one large bitmap, or multiple smaller bitmaps per video frame. (A possible API shape for this option is sketched after this list.)

(C) The caller allocates and frees the bitmap. Since the caller doesn't know the required bitmap sizes, it will have to allocate at the size of the video frame, so it would always be one big bitmap. The caller would be fully responsible for the queue management.
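
To make option (B) concrete, here is a hypothetical C-level shape; none of these names are real libass API, and the linked-list convention is simply borrowed from the existing ASS_Image:

#include <ass/ass.h>

// Hypothetical option-(B) interface: libass allocates, the caller frees
// whenever its queue is done with the frame.
typedef struct ASS_RgbaImage {
    int x, y, w, h;               // position in video space
    int stride;                   // bytes per row
    unsigned char *data;          // premultiplied RGBA
    struct ASS_RgbaImage *next;   // linked list, like ASS_Image
} ASS_RgbaImage;

ASS_RgbaImage *ass_render_frame_rgba(ASS_Renderer *renderer, ASS_Track *track,
                                     long long now, int *detect_change);
void ass_free_rgba_images(ASS_RgbaImage *head);   // called by the caller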

In any case, if you use a single large bitmap and the caller wants to store the bitmaps in a queue, then for animated ASS subtitles there will usually be one big bitmap for every video frame, which means a lot of RAM usage and also a full-frame alpha blending operation for every video frame.

Now consider an animated ASS subtitle where a subtitle text just moves from left to right without changing. If you used a small bitmap, you could keep using the same bitmap and just change the position. This would save A LOT of RAM and also save CPU/GPU performance. Ok, such an unchanging moving subtitle might not occur often in real life. But there are situations that will occur often, e.g. one static bottom subtitle line and one changing top subtitle line. If you use a static bitmap for the static subtitle line and a small changing bitmap for the top subtitle line, again you'd save a lot of RAM, especially if the caller wants to queue the subtitle bitmaps somehow.

FWIW, XySubFilter/madVR use something like (A2), with multiple smaller images, in order to optimize RAM usage and performance. XySubFilter allocates the bitmaps for every video frame and groups them into an ISubRenderFrame object instance. When madVR is done with one video frame, it releases the corresponding ISubRenderFrame interface instance. The destructor of the ISubRenderFrame class then frees the bitmaps. Object oriented programming helps here to make allocation/releasing simpler. Things get a bit complicated inside of XySubFilter, though, because it has to properly handle the situation when multiple different ISubRenderFrame objects share the same bitmap ID. So this logic does make things a bit more difficult to implement...


grigorig commented on May 27, 2024

Allocation/deallocation could also be done with callbacks. This has the advantage that buffers can be allocated in special memory if that's useful for the caller, e.g. in case OpenGL PBOs can be used for fast upload.
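
A hypothetical sketch of such hooks (again, not a real libass API): the caller hands libass an allocator, so rendered RGBA output can land directly in, say, a mapped PBO:

#include <cstddef>

// Hypothetical caller-supplied allocator for output bitmaps. With an
// OpenGL backend, alloc() could return a pointer into a mapped PBO so
// the rendered RGBA data is already in upload-friendly memory.
typedef struct ASS_Allocator {
    void *(*alloc)(void *priv, size_t size);
    void  (*free)(void *priv, void *ptr);
    void  *priv;
} ASS_Allocator;

// e.g.: void ass_set_allocator(ASS_Renderer *renderer,
//                              const ASS_Allocator *allocator);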

I'm not quite sure about animations and reusing bitmaps: in most cases, movements involve some subpixel adjustment, so reuse won't be possible, or rotations and the like are used.


madshi commented on May 27, 2024

Ok, but movements are only one possible scenario where reusing bitmaps would be useful. As I already mentioned, if you have some subtitle areas which stay static and some which are dynamic, you'd only have to allocate a different small bitmap for the changing areas, while you could reuse the bitmap(s) for the static areas.

If you consider a 16 frame subtitle queue (which some applications may use), and if you just create one big subtitle bitmap for each video frame, for 1080p video that would already consume 126.5MB RAM. For 4K video that would be 506MB. Just for subtitle bitmap storage. If there's only one subtitle line with the size 1920x100 which is changing dynamically, a 16 frame subtitle queue would only consume 11.7MB.
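
For reference, the arithmetic behind these figures (assuming 4 bytes per RGBA pixel, which is what the numbers imply):

16 frames × 1920 × 1080 × 4 bytes = 132,710,400 bytes ≈ 126.5 MiB
16 frames × 3840 × 2160 × 4 bytes = 530,841,600 bytes ≈ 506.3 MiB
16 frames × 1920 × 100 × 4 bytes = 12,288,000 bytes ≈ 11.7 MiB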


wm4 commented on May 27, 2024

In any case, if you use a single large bitmap and the caller wants to store the bitmaps in a queue, then for animated ASS subtitles there will usually be one big bitmap for every video frame, which means a lot of RAM usage

Sure, but it actually puts an upper bound on the texture memory usage. With ISubRenderFrame, texture memory usage could easily exceed that of a fullscreen texture.

And that is a good thing, because we're dealing with pretty insane cases here, like subtitle animations that cover the whole screen anyway.

and also a full frame alpha blending operation for every video frame.

Not if there's a list of rectangles covering all non-transparent regions in the overlay bitmap.

If you consider a 16 frame subtitle queue (which some applications may use), and if you just create one big subtitle bitmap for each video frame, for 1080p video that would already consume 126.5MB RAM.

Well, that's the price you pay for such a huge queue. If you go the other approach, you may have to pay for unbounded texture sizes and heavy texture reallocation, depending on what the subtitle renderer outputs.

Anyway, my real problem with this is: how do you manage textures? Unless you create a new texture for each changed sub bitmap, you're going to need some sort of texture pool, and even then you're going to need to reallocate textures if a sub bitmap doesn't fit. You're also going to waste some texture memory, even if you use best fit to allocate from the texture pool. An alternative approach would be packing sub-bitmaps into a large texture (then you only need to reallocate the texture a few times in the worst case). You probably don't do this, because it would pretty much disallow caching individual sub-bitmaps for several frames (or it would make it significantly more complex).
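
For illustration, a minimal best-fit pool of the kind mentioned might look like this (CreateTexture stands in for the actual GL allocation):

#include <map>

struct GlTexture { unsigned id; int w, h; };

GlTexture CreateTexture(int w, int h);   // hypothetical: glGenTextures + glTexImage2D

// Best-fit reuse keyed by area: take the smallest free texture large
// enough in both dimensions; otherwise allocate. The w*h minus the needed
// texels per hit are the wasted memory mentioned above.
std::multimap<long, GlTexture> g_pool;

GlTexture Acquire(int w, int h)
{
    for (auto it = g_pool.lower_bound((long)w * h); it != g_pool.end(); ++it) {
        if (it->second.w >= w && it->second.h >= h) {
            GlTexture t = it->second;
            g_pool.erase(it);
            return t;                    // reuse without reallocation
        }
    }
    return CreateTexture(w, h);          // nothing fits: reallocate
}

void Release(const GlTexture &t) { g_pool.emplace((long)t.w * t.h, t); }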

Ok, but movements are only one possible scenario where reusing bitmaps would be useful. As I already mentioned, if you have some subtitle areas which stay static and some which are dynamic, you'd only have to allocate a different small bitmap for the changing areas, while you could reuse the bitmap(s) for the static areas.

Well, I wonder if this is really worth the trouble.


madshi commented on May 27, 2024

Sure, but it actually puts an upper bound on the texture memory usage. With ISubRenderFrame, texture memory usage could easily exceed that of a fullscreen texture.

How would ISubRenderFrame exceed the RAM consumption of a fullscreen texture? I fail to see how that could ever happen, unless the subtitle renderer does weird things. The ISubRenderFrame logic expects the subtitle renderer to group small subtitle elements which are near to each other together into one bigger bitmap. Only separate subtitle "sections" which have some distance between them should go into separate bitmaps. This logic should in 99% of all cases save a lot of RAM compared to using a fullscreen texture. In the worst case RAM consumption should be the same as using a fullscreen bitmap.

Anyway, my real problem with this is: how do you manage textures?

I think reallocations with a good memory manager should not be a big problem. I don't think maintaining a texture pool on your own is necessary. A good memory manager should do the pooling job for you, and in a well optimized manner. But I've not tested it, so I can't say for sure. If you always use a full screen bitmap, of course you can easily use a simple texture pool to avoid the reallocation cost. But that does come at a pretty huge RAM cost, IMHO. And maybe also at a performance cost, if not every step of every algorithm limits processing to the specified subtitle rectangles. E.g. a video renderer might upload the whole full frame bitmap to the GPU, even if only some parts of it actually contain subtitle content. Uploading the whole frame costs CPU performance, too, probably more than reallocating bitmap buffers.

Anyway, I think I've sufficiently explained my view of things, so I think it's time for me to bow out of the discussion and let you decide what you think is the best approach for libass. Whatever you decide is fine with me.


wm4 commented on May 27, 2024

How would ISubRenderFrame exceed the RAM consumption of a fullscreen texture? I fail to see how that could ever happen, unless the subtitle renderer does weird things.

For example, if you have a large subtitle scrolling into the screen, then I would expect that it actually allocates a large sub-bitmap and references this bitmap in future frames (just with different positions). This sub-bitmap could easily be larger than the screen. Though I suspect XySubFilter wouldn't output this anyway, because sub-pixel rendering would change the bitmap every frame.

The ISubRenderFrame logic expects the subtitle renderer to group small subtitle elements which are near to each other together into one bigger bitmap.

That's not really apparent. But if it's meant to work this way, ok.

I think reallocations with a good memory manager should not be a big problem. I don't think maintaining a texture pool on your own is necessary. A good memory manager should do the pooling job for you, and in a well optimized manner. For memory bitmaps, this is probably less of an issue.

Well, AFAIK texture reallocations can be pretty expensive in OpenGL, and even cause issues like fragmentation. I don't know how Direct3D handles these things.

Anyway, I think I've sufficiently explained my view of things, so I think it's time for me to bow out of the discussion and let you decide what you think is the best approach for libass. Whatever you decide is fine with me.

Thanks for the valuable discussion. Note that apart from finding the ideal approach, I'm also somewhat interested in aligning libass and XySubFilter performance characteristics and tradeoffs to avoid surprises.


astiob commented on May 27, 2024

@madshi I just want to point out that wm4 is also a video renderer developer and tries to look at the issue from both points of view at the same time, perhaps even more from a video renderer’s than from a subtitle renderer’s.

I also want to clear this one bit up: am I right in understanding that in madVR you simply allocate a new texture for each bitmap? And that it seems to work well enough?

@wm4 The GetBitmapCount comment says this:

  // The subtitle renderer should combine small subtitle elements which are
  // positioned near to each other, in order to optimize performance.


astiob commented on May 27, 2024

Oh, and while I remember, here’s a use case for overlapping bitmaps: a long vertical strip of subtitles (I dunno, some karaoke or a sign perhaps) and a long horizontal strip of subtitles that overlap just a little bit in the corner.


wm4 commented on May 27, 2024

@astiob: yes, but it also doesn't say anything against overlapping bitmaps, and it does require the consumer to clip sub bitmaps against the clip rectangle; it even says that the sub bitmap position can be negative. So, in theory, the memory consumed by the sub bitmaps could grow larger than a screen-sized bitmap.
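
Concretely, that consumer-side clipping is just a rectangle intersection plus a source offset. A sketch:

#include <algorithm>

struct Rect { int x, y, w, h; };

// Clip a sub-bitmap placed at (pos_x, pos_y) against the clip rect and
// compute the matching offset into the source pixels. Returns false if
// nothing is left visible.
bool ClipBitmap(Rect clip, int bmp_w, int bmp_h, int pos_x, int pos_y,
                Rect *dst, int *src_x, int *src_y)
{
    int x0 = std::max(pos_x, clip.x);
    int y0 = std::max(pos_y, clip.y);
    int x1 = std::min(pos_x + bmp_w, clip.x + clip.w);
    int y1 = std::min(pos_y + bmp_h, clip.y + clip.h);
    if (x0 >= x1 || y0 >= y1)
        return false;                          // fully outside
    *dst = Rect{ x0, y0, x1 - x0, y1 - y0 };   // where to blend on screen
    *src_x = x0 - pos_x;                       // where to read in the bitmap
    *src_y = y0 - pos_y;
    return true;
}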


madshi commented on May 27, 2024

@astiob,

Ah, thanks, I didn't know wm4 is also a video renderer developer; he never clearly said from which side of the coin he was discussing things.

Yes, madVR is currently simply allocating a new texture for all bitmaps (but of course bitmaps with the same ID as bitmaps from the previous subtitle frame are reused). I'm not sure how much of a problem fragmentation is for Direct3D, but I think the allocation logic is probably designed by Microsoft and not by the GPU manufacturers, and Microsoft has A LOT of experience with memory allocation (after all, it's a key factor in OS performance), so I wouldn't expect Direct3D to suffer from fragmentation problems. But I don't know for sure, to be honest.

@wm4,

Yes, ok, I guess in that specific case you mentioned it could be possible for XySubFilter to allocate a bitmap which is larger than a fullscreen texture. However, I think this is a rather unlikely (or at least rare) situation, and even if it does occur, if you consider more than just one video frame, overall the RAM consumption would still be lower than always using fullscreen bitmaps, because this bigger-than-fullscreen bitmap would probably be reused for multiple video frames. Or if it's not going to be reused, then XySubFilter could already do the clipping and the bitmap would again be smaller than (or at most identical to) a fullscreen bitmap.


astiob commented on May 27, 2024

@wm4 Well, that was in reply to your

The ISubRenderFrame logic expects the subtitle renderer to group small subtitle elements which are near to each other together into one bigger bitmap.

That's not really apparent. But if it's meant to work this way, ok.

I agree that theoretically this approach might consume more memory than a single screen-sized bitmap, but I’m also inclined to agree that it will usually be the other way round, simply because you don’t allocate unused regions if nothing else. And if there’s no queueing going on, it probably won’t make any difference either way.


Cyberbeing commented on May 27, 2024

clip sub bitmaps against the clip rectangle
the sub bitmaps could grow larger than a screen sized bitmap.

SubRenderIntf only moves the cropping operation from the subtitle renderer to the video renderer.

The VSFilter.dll rasterizer always output bitmaps for entire objects, even if they were only partially visible. For this reason, VSFilter.dll always cropped bitmaps which were larger than the screen before output.

With XySubFilter we have implemented partial rasterization and partial scanline conversion, to prevent rendering of portions of objects which lie far outside the screen. But even then, it's not always possible or efficient to output bitmap dimensions which always fall exactly within the screen boundaries after positioning, leaving a need for cropping.

Even libass w/ mpv crops objects which lie partially outside the video rectangle by default, with the exception of objects which lie within a negative \clip, which are still shown for some reason.

it even says that the sub bitmap position can be negative

The ASS subtitle format itself allows negative text, drawing, and clip positions.


wm4 commented on May 27, 2024

Even libass w/ mpv crops objects which lie partially outside the video rectangle by default,

libass itself should output only clipped bitmaps, because that's what some software renderers expect. It's true that libass internally creates unclipped objects, which we view as a bad thing that should be changed.

with the exception of objects which lie within a negative \clip, which are still shown for some reason.

Sounds like a bug that could possibly cause crashes with software renderers. Got a test case?

The ASS subtitle format itself allows negative text, drawing, and clip positions.

But that has nothing to do with bitmap output.


Cyberbeing commented on May 27, 2024

libass itself should output only clipped bitmaps, because that's what some software renderers expect. It's true that libass internally creates unclipped objects, which we view as a bad thing that should be changed.

madshi only designed SubRenderIntf to not require pre-clipping; there is no reason libass couldn't continue to pre-clip bitmaps before output and still be compatible.

If you strongly believe there is good reason to redefine how clipping is handled in SubRenderIntf, you can email such requests to madshi or discuss them at: http://code.google.com/p/xy-vsfilter/issues/detail?id=152

Sounds like a bug that could possibly cause crashes with software renderers. Got a test case?

https://www.mediafire.com/?3t8po0sbcvedn31

But that has nothing to do with bitmap output.

Negative script values increase the potential that a subtitle renderer will rasterize an object which lies outside the visible video frame, while SubRenderIntf currently allows a subtitle renderer to convert rasterizer output directly to bitmaps without clipping.


DevSysEngineer commented on May 27, 2024

What is the current status of this issue? The background of the frame (set with the ass_set_frame_size function) is black and not transparent. All I can find in the code is that the output of the ass_render_frame function is RGB.


astiob commented on May 27, 2024

Not sure what you mean. None of this has been implemented yet.


windowsair commented on May 27, 2024

Will this feature be added in the foreseeable future?

