Several considerations, such as (re)initialization, buffering, failure recovery, and flushing, interact in somewhat complex ways. Obviously we'd like a good API for all of these, but there are tradeoffs between the different options. This issue is for tracking the discussion of what we want the API to look like.
Here are some options. We could choose different options for encode and decode; for example, B for encode and D for decode.
Option A: new Encoder/Decoder for each change
Every time you want to change something that requires (re)initialization, such as the codec or resolution, create a new Encoder/Decoder. Likewise, create a new one every time a flush is desired.
Pros:
- Simple API
- It's clear when (re)init fails, and recovering is straightforward
- If we have buffered frames, it's clear which initialization applies to which frames.
- Flushing is just closing the .writable
Cons:
- Dealing with downstream changes to the pipeline may be difficult (because you have a new .readable that you need to pipe somewhere).
- Dealing with upstream changes to the pipeline may be difficult (because you have a new .writable that you need to pipe into).
- It may be much more efficient to keep only one encoder/decoder alive at a time, which is difficult to manage when the JS creates a new one for every resolution change and/or flush.
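To make these tradeoffs concrete, here is a synchronous TypeScript mock of Option A. It is a sketch, not a proposal: FixedConfigEncoder, encodeAll() and the config/frame shapes are hypothetical, arrays stand in for the .writable/.readable streams, and "init failure" is reduced to a RangeError on an invalid size. It shows init failure surfacing at construction, flushing as closing, and the caller having to swap instances and re-route output on every resolution change:

```typescript
// Hypothetical types for illustration; these are not the real WebCodecs types.
interface EncoderConfig { codec: string; width: number; height: number; }
interface Frame { width: number; height: number; timestamp: number; }
interface EncodedChunk { codec: string; width: number; timestamp: number; }

// Option A: a (hypothetical) encoder whose configuration is fixed for its lifetime.
class FixedConfigEncoder {
  readonly config: EncoderConfig;
  private closed = false;
  private readonly pending: EncodedChunk[] = [];

  constructor(config: EncoderConfig) {
    // (Re)init failure surfaces here, before any frame has been written.
    if (config.width <= 0 || config.height <= 0) {
      throw new RangeError("invalid resolution");
    }
    this.config = config;
  }

  write(frame: Frame): void {
    if (this.closed) throw new Error("encoder is closed");
    this.pending.push({ codec: this.config.codec, width: this.config.width, timestamp: frame.timestamp });
  }

  // Flushing is just closing: pending output is returned and further writes fail.
  close(): EncodedChunk[] {
    this.closed = true;
    return this.pending.splice(0);
  }
}

// The caller swaps encoders on every resolution change and must route the
// output of each short-lived instance itself.
function encodeAll(frames: Frame[], codec: string): { chunks: EncodedChunk[]; encodersCreated: number } {
  const chunks: EncodedChunk[] = [];
  let encodersCreated = 0;
  let encoder: FixedConfigEncoder | null = null;
  for (const frame of frames) {
    if (encoder === null || encoder.config.width !== frame.width || encoder.config.height !== frame.height) {
      if (encoder !== null) chunks.push(...encoder.close()); // flush the old instance
      encoder = new FixedConfigEncoder({ codec, width: frame.width, height: frame.height });
      encodersCreated++;
    }
    encoder.write(frame);
  }
  if (encoder !== null) chunks.push(...encoder.close());
  return { chunks, encodersCreated };
}

const result = encodeAll(
  [
    { width: 640, height: 480, timestamp: 0 },
    { width: 640, height: 480, timestamp: 1 },
    { width: 1280, height: 720, timestamp: 2 },
  ],
  "vp8",
);
```

Two encoders are created for three frames here; a real pipeline would also have to re-pipe each new instance's streams, which is what the cons above are about.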
Option B: An Initialize() method for each change
If a change requires reinitialization, call Initialize(); it can be called as many times as needed. The .writable and .readable are stable.
Pros:
- It's clear when (re)init fails, and recovering is straightforward
- .readable is the same across multiple initializations, which makes the downstream consumption easier (nothing to re-pipe).
- .writable is the same across multiple initializations, which makes the upstream production easier (nothing to re-pipe).
- Reinitialization can be efficient because the implementation can keep only one encoder/decoder around at a time.
Cons:
- Flushing must be a separate method (since you can only call .close() on the .writable once)
- If buffering is used, it becomes unclear which initialization applies to which frames. We can avoid this by not buffering on the .writable, but then we must require the JS to respect .ready and define what happens when it does not.
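A similar mock for Option B shows the buffering problem. ReinitializableEncoder, initialize() and flush() are hypothetical names, the queue array stands in for frames buffered on the .writable, and the mock assumes that initialize() takes effect immediately rather than waiting behind already-queued frames. Under that assumption, a frame written while vp8 was configured comes out as vp9:

```typescript
// Hypothetical types for illustration; these are not the real WebCodecs types.
interface EncoderConfig { codec: string; width: number; height: number; }
interface Frame { timestamp: number; }
interface EncodedChunk { codec: string; timestamp: number; }

// Option B: one long-lived (hypothetical) encoder. `queue` stands in for frames
// buffered on the stable .writable, `output` for the stable .readable.
class ReinitializableEncoder {
  private config: EncoderConfig | null = null;
  private readonly queue: Frame[] = [];
  private readonly output: EncodedChunk[] = [];

  initialize(config: EncoderConfig): void {
    // Failure is reported directly to the caller of initialize().
    if (config.width <= 0 || config.height <= 0) {
      throw new RangeError("invalid resolution");
    }
    // The new config takes effect at once, including for frames still queued.
    this.config = config;
  }

  write(frame: Frame): void {
    this.queue.push(frame); // buffered; not encoded yet
  }

  // Flush has to be its own method: closing the .writable would end it for good.
  flush(): EncodedChunk[] {
    const config = this.config;
    if (config === null) throw new Error("not initialized");
    for (const frame of this.queue.splice(0)) {
      this.output.push({ codec: config.codec, timestamp: frame.timestamp });
    }
    return this.output.splice(0);
  }
}

const encoder = new ReinitializableEncoder();
encoder.initialize({ codec: "vp8", width: 640, height: 480 });
encoder.write({ timestamp: 0 }); // written while vp8 was configured...
encoder.initialize({ codec: "vp9", width: 640, height: 480 });
encoder.write({ timestamp: 1 });
const flushed = encoder.flush(); // ...yet both frames come out as vp9
```

If the .writable never buffers (so the queue is always empty when initialize() is called), the ambiguity goes away, but only as long as the JS respects .ready.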
Option C: An Initialize() method that produces a new WritableStream
If a change requires reinitialization, call Initialize(), as many times as needed; each call produces a new WritableStream. The .readable is stable, but the .writable (if there is one) is not.
Pros:
- It's clear when (re)init fails, and recovering is straightforward
- .readable is the same across multiple initializations, which makes the downstream consumption easier (nothing to re-pipe).
- Reinitialization can be efficient because the implementation can keep only one encoder/decoder around at a time.
- Flushing is just closing the WritableStream
- If we have buffered frames, it's clear which initialization applies to which frames.
Cons:
- Dealing with upstream changes to the pipeline may be difficult (because you have a new WritableStream that you need to pipe into).
- It's not exactly a TransformStream anymore
- Transferring streams may be tricky
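For Option C, the same kind of mock binds each input to the config it was created with. OptionCEncoder, ConfiguredWriter and their methods are hypothetical; ConfiguredWriter stands in for the WritableStream that Initialize() produces, and the shared output array stands in for the stable .readable. The sketch assumes the caller closes the old writer before the new one drains; an implementation that keeps a single codec alive would have to enforce that ordering itself:

```typescript
// Hypothetical types for illustration; these are not the real WebCodecs types.
interface EncoderConfig { codec: string; width: number; height: number; }
interface Frame { timestamp: number; }
interface EncodedChunk { codec: string; timestamp: number; }

// Stands in for the WritableStream returned by initialize(). Each instance is
// bound to exactly one config, so its buffered frames are unambiguous.
class ConfiguredWriter {
  private readonly config: EncoderConfig;
  private readonly sink: EncodedChunk[];
  private readonly queue: Frame[] = [];
  private closed = false;

  constructor(config: EncoderConfig, sink: EncodedChunk[]) {
    this.config = config;
    this.sink = sink;
  }

  write(frame: Frame): void {
    if (this.closed) throw new Error("writer is closed");
    this.queue.push(frame);
  }

  // Flushing is just closing this writer: buffered frames are encoded with
  // this writer's config and delivered to the shared output.
  close(): void {
    for (const frame of this.queue.splice(0)) {
      this.sink.push({ codec: this.config.codec, timestamp: frame.timestamp });
    }
    this.closed = true;
  }
}

// Option C: the output (a stand-in for .readable) is stable, but every
// initialize() call hands back a fresh input.
class OptionCEncoder {
  readonly output: EncodedChunk[] = [];

  initialize(config: EncoderConfig): ConfiguredWriter {
    if (config.width <= 0 || config.height <= 0) {
      throw new RangeError("invalid resolution");
    }
    return new ConfiguredWriter(config, this.output);
  }
}

const encoder = new OptionCEncoder();
const first = encoder.initialize({ codec: "vp8", width: 640, height: 480 });
first.write({ timestamp: 0 });
const second = encoder.initialize({ codec: "vp9", width: 640, height: 480 });
second.write({ timestamp: 1 });
first.close(); // the old input is closed (flushed) before the new one drains
second.close();
```

Unlike the Option B mock, the frame written through the vp8 writer stays vp8 even though a vp9 writer was created before it was flushed.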
Option D: In-band parameters
To reinitialize, put new parameters on the chunk passed into the .writable. Init failure is conveyed via a write failure.
Pros:
- For decode, the source of frames (such as a media container) is likely related to the decoding parameters desired, making this a convenient/natural fit.
- Cleaner encoder/decoder API (no extra Initialize/Flush methods)
- Transferring streams is likely less tricky
- If we have buffered frames, it's clear which initialization applies to which frames.
- Upstream and downstream piping is easy (since both .readable and .writable are stable)
- Flushing is just closing the .writable
Cons:
- (Re)init failure recovery isn't straightforward. You'll likely need to catch the rejection from .pipeTo() and probably want to pass preventCancel so that the source isn't canceled when the .writable errors.
- For encode, the source of frames (such as a MediaStreamTrack tied to a VideoTrackReader) likely isn't related to the encoding parameters desired, making this an inconvenient fit.
- More complex chunk types (not just EncodedVideoFrame and VideoFrame; the chunks must also be able to carry the parameters)
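An Option D sketch on the decode side, where a chunk optionally carries new parameters. InBandDecoder, InputChunk and the SUPPORTED_CODECS set are hypothetical, and write() stands in for writing to the .writable. As with an errored WritableStream, one failed write leaves the decoder unusable, which is why recovery is awkward:

```typescript
// Hypothetical types for illustration; these are not the real WebCodecs types.
interface DecoderConfig { codec: string; }
// In-band parameters: a config rides along only on chunks where it changes.
interface InputChunk { timestamp: number; data: Uint8Array; config?: DecoderConfig; }
interface DecodedFrame { codec: string; timestamp: number; }

// Hypothetical capability list standing in for what the platform can decode.
const SUPPORTED_CODECS = new Set(["vp8", "vp9"]);

class InBandDecoder {
  private config: DecoderConfig | null = null;
  private failure: Error | null = null;
  readonly output: DecodedFrame[] = [];

  // Stand-in for writing to .writable: an unsupported config fails the write,
  // and (like an errored WritableStream) every later write fails too.
  write(chunk: InputChunk): void {
    if (this.failure !== null) throw this.failure;
    if (chunk.config !== undefined) {
      if (!SUPPORTED_CODECS.has(chunk.config.codec)) {
        throw this.fail(`unsupported codec: ${chunk.config.codec}`);
      }
      this.config = chunk.config; // applies to this chunk and all later ones
    }
    const config = this.config;
    if (config === null) throw this.fail("first chunk must carry a config");
    this.output.push({ codec: config.codec, timestamp: chunk.timestamp });
  }

  private fail(message: string): Error {
    this.failure = new Error(message);
    return this.failure;
  }
}

const decoder = new InBandDecoder();
decoder.write({ timestamp: 0, data: new Uint8Array(0), config: { codec: "vp8" } });
decoder.write({ timestamp: 1, data: new Uint8Array(0) }); // reuses vp8
let errorMessage = "";
try {
  decoder.write({ timestamp: 2, data: new Uint8Array(0), config: { codec: "h263" } });
} catch (e) {
  errorMessage = (e as Error).message;
}
```

In a real pipeline the failed write would surface as a rejected pipeTo() promise, and continuing would mean building a new decoder and re-piping the (preventCancel-protected) source into it.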
Option E: Internal reinitialization
Instead of asking for an init, just give it what you want and have it (re)init when needed. There is a fine line between this and Option D, but consider resolution changes: instead of specifying that the codec reinit with a new size, you just give it whatever frame comes from a MediaStreamTrack, and it reinits based on that size. Similarly, an EncodedVideoFrame could simply express which codec it uses, and the decoder handles whatever that is.
Pros:
- The API is simple and easy to use
- Transferring streams is likely less tricky
- No problems with buffering and which settings apply to which frames
- Upstream and downstream piping is easy (since both .readable and .writable are stable)
- Flushing is just closing the .writable
Cons:
- (Re)init failure recovery isn't straightforward. You'll likely need to catch the rejection from .pipeTo() and probably want to pass preventCancel so that the source isn't canceled when the .writable errors.
- It's easy for performance issues to creep in and for the API to become too automatic/implicit. For example, if a codec switch happens and the new codec is only available in software, not hardware, should we reinit internally from hardware to software? This leads to a higher-level, more automatic API with constraints that are difficult to specify and keep consistent across browsers (somewhat like getUserMedia), whereas this API was initially intended to be low-level and explicit about performance.
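Finally, a mock of Option E in which the encoder reinitializes itself whenever the frame size changes. SelfConfiguringEncoder is hypothetical, and a counter stands in for the cost of each reinit. The caller never sees the reinit happen, which is both the convenience and the performance concern described above:

```typescript
// Hypothetical types for illustration; these are not the real WebCodecs types.
interface Frame { width: number; height: number; timestamp: number; }
interface EncodedChunk { width: number; height: number; timestamp: number; }

// Option E: the caller never initializes anything. This hypothetical encoder
// (re)initializes itself whenever a frame's size differs from the previous one.
class SelfConfiguringEncoder {
  private size: { width: number; height: number } | null = null;
  private closed = false;
  private readonly output: EncodedChunk[] = [];
  reinitCount = 0;

  write(frame: Frame): void {
    if (this.closed) throw new Error("encoder is closed");
    const size = this.size;
    if (size === null || size.width !== frame.width || size.height !== frame.height) {
      // Implicit reinit: the caller has no say in, for example, whether the new
      // configuration runs on a hardware or a software codec.
      this.size = { width: frame.width, height: frame.height };
      this.reinitCount++;
    }
    this.output.push({ width: frame.width, height: frame.height, timestamp: frame.timestamp });
  }

  // Flushing is just closing: remaining output is handed over, no more writes.
  close(): EncodedChunk[] {
    this.closed = true;
    return this.output.splice(0);
  }
}

const encoder = new SelfConfiguringEncoder();
encoder.write({ width: 640, height: 480, timestamp: 0 }); // initial init
encoder.write({ width: 640, height: 480, timestamp: 1 });
encoder.write({ width: 1280, height: 720, timestamp: 2 }); // silent reinit
const flushedChunks = encoder.close();
```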