
cros-codecs's Issues

We should not update the parser state when peeking an SPS

I found this corner case while developing the H.265 support. Apparently it is not picked up by any of the tests in the H.264 suites we are using, but it is nevertheless a possibility in the wild.

This line is wrong because it updates the parser state with a new SPS before any pending slices are processed. This can break decoding if the decoder is sent the following sequence of NALUs:

<old slice> <old slice> <new SPS> <new slice>

The current code will peek the SPS and update the parser state before processing the <old slice>s. In the likely event that the new SPS simply overwrites a previous SPS held in the parser (by using the same sps_id, e.g. 0), the old slices may wrongly refer to the new SPS.

It can be a difficult bug to track down as well.

Note that we are not in control of the sequence of NALUs we are given: our frame iterators are only used by ccdec, and client code is in charge of that process when using cros-codecs as a library. For our virtualization use case, it is expected that the guest userspace will have a sane implementation before submitting the data to the virtio video driver.

In order to fix this, we should introduce a new parser function, peek_sps(), which either doesn't take self or takes an immutable reference. This new function will parse an SPS without saving it in the parser. The current function, parse_sps(), can then be redefined in terms of peek_sps() plus an internal save_sps() parser function.
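A minimal sketch of that split, using simplified stand-in types (none of these names are necessarily the real cros-codecs API):

    use std::collections::BTreeMap;

    /// Simplified stand-ins for the real parser types.
    struct Sps {
        seq_parameter_set_id: u8,
    }
    struct ParseError;

    #[derive(Default)]
    struct Parser {
        spses: BTreeMap<u8, Sps>,
    }

    impl Parser {
        /// Parses an SPS from the NALU bytes without touching the parser state.
        fn peek_sps(&self, _nalu: &[u8]) -> Result<Sps, ParseError> {
            // Actual bitstream parsing elided; only the state-handling
            // contract matters for this sketch.
            Ok(Sps { seq_parameter_set_id: 0 })
        }

        /// Saves a previously parsed SPS into the parser state.
        fn save_sps(&mut self, sps: Sps) {
            self.spses.insert(sps.seq_parameter_set_id, sps);
        }

        /// The current entry point, redefined as peek + save.
        fn parse_sps(&mut self, nalu: &[u8]) -> Result<&Sps, ParseError> {
            let sps = self.peek_sps(nalu)?;
            let id = sps.seq_parameter_set_id;
            self.save_sps(sps);
            Ok(&self.spses[&id])
        }
    }

This way, the lookahead code can call peek_sps() without clobbering the SPS the pending slices refer to, and only commit the new SPS once those slices have been processed.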

This design fixed a crash in one of the H.265 tests.

vp90-2-22-svc_1280x720_3.ivf decodes almost correctly

This is an SVC stream, so there are resolution changes at a few points. Looking at the YUV output, it is mostly right, but sometimes both resolutions can be seen, with the lower-resolution image overlaying the larger one in the top corner.

SVC is quite important in video conferencing, and there are only two test vectors covering it.

Note: This also does not pass on GStreamer.

Largest layer is 720p

ccdec.tar.gz

We must block on all pending work during flush()

This appears to be a bug in our implementation. If we flush, we must

a) submit any leftover work
b) block on said work before returning

Notice that after we return from flush, a lot of our state is (or will be) invalid, so we must complete the work before returning.

For H.264, for example, the current code neither checks cur_pic_opt nor blocks on it. This means that if cur_pic_opt is Some for whatever reason, that picture will be lost.
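A minimal model of what a correct flush() could look like; the types and method names below mirror the issue text and are otherwise assumptions:

    /// Illustrative stand-ins, not the actual cros-codecs types.
    #[derive(Debug)]
    struct DecodeError;
    struct Picture;

    #[derive(Default)]
    struct Decoder {
        cur_pic_opt: Option<Picture>,
        pending: Vec<Picture>,
    }

    impl Decoder {
        fn submit(&mut self, pic: Picture) {
            self.pending.push(pic);
        }

        /// Blocks until a submitted picture is fully decoded (stubbed here).
        fn sync(&mut self, _pic: &Picture) -> Result<(), DecodeError> {
            Ok(())
        }

        fn flush(&mut self) -> Result<(), DecodeError> {
            // a) Submit any leftover work instead of silently dropping it.
            if let Some(pic) = self.cur_pic_opt.take() {
                self.submit(pic);
            }
            // b) Block on all submitted work before returning, since most
            // of the decoder state is invalid once flush() returns.
            while let Some(pic) = self.pending.pop() {
                self.sync(&pic)?;
            }
            Ok(())
        }
    }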

[Fluster] CAWP5_TOSHIBA_E.264 fails to parse

cargo run --example ccdec -- CAWP5_TOSHIBA_E.264
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/examples/ccdec CAWP5_TOSHIBA_E.264`
thread 'main' panicked at 'decoder error: No ShortTerm reference found with pic_num -2', /usr/local/google/home/acourbot/Work/cros-codecs/src/utils.rs:234:27
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

FFmpeg-VAAPI is able to pass this test successfully, so this is likely an issue with our H.264 parser.

h264: a slice may refer to a previous SPS but we do not renegotiate

This is not covered by any of our H.264 test suites, but a slice may very well refer to an SPS other than the latest one parsed. If we detect this, we should renegotiate if needed using the SPS referred to by the slice, instead of peeking for a new one.

We should also:

a) refrain from processing any more input in decode_access_unit() if this is detected
b) return the actual bitstream offset, so that the client can reissue the call with the data only from that point on, i.e.: decoder.decode(&bitstream[current_offset..]).

For b), we can query the current offset from the cursor and return it in DecodingState::AwaitingFormat, together with the actual SPS referred to by the slice.
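A sketch of what b) could look like; the variant shape is illustrative:

    /// Stand-in for the parsed SPS type.
    struct Sps;

    enum DecodingState {
        AwaitingStreamInfo,
        Decoding,
        /// Renegotiation needed: carries the SPS the slice actually refers
        /// to, plus the bitstream offset the client should resume from with
        /// decoder.decode(&bitstream[resume_offset..]).
        AwaitingFormat { sps: Sps, resume_offset: usize },
    }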

h264: unwrapping of None value

This unwrap in src/decoder/stateless/h264.rs seems to fail once in a while when playing around with ARCVM:

    /// Submits the picture to the accelerator.
    fn submit_picture(&mut self) -> Result<(PictureData, T), DecodeError> {
        let picture = self.cur_pic.take().unwrap();
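A sketch of a non-panicking alternative; DecodeError::InvalidState is a hypothetical variant used here for illustration, not the real error set:

    /// Submits the picture to the accelerator.
    fn submit_picture(&mut self) -> Result<(PictureData, T), DecodeError> {
        // Return an error instead of panicking when there is no current
        // picture (DecodeError::InvalidState is a made-up variant name).
        let picture = self
            .cur_pic
            .take()
            .ok_or(DecodeError::InvalidState("no current picture"))?;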

Can't iterate over ReadyFramesQueue<T> without consuming it

The actual code does not seem to match the docs:

/// Allows us to manipulate the frames list like an iterator without consuming it and resetting its
/// display order counter.
impl<'a, T> Iterator for &'a mut ReadyFramesQueue<T> {
    type Item = T;

    /// Returns the next frame (if any) waiting to be dequeued.
    fn next(&mut self) -> Option<T> {
        self.queue.pop_front()
    }
}

next() here calls pop_front(), so it is actually not possible to iterate without removing items.
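A sketch of a non-consuming alternative, assuming queue is a VecDeque<T>: iterate by reference and leave the queue intact.

    use std::collections::VecDeque;

    struct ReadyFramesQueue<T> {
        queue: VecDeque<T>,
    }

    /// Iterating over &ReadyFramesQueue borrows the frames instead of
    /// dequeuing them, so the queue is left untouched.
    impl<'a, T> IntoIterator for &'a ReadyFramesQueue<T> {
        type Item = &'a T;
        type IntoIter = std::collections::vec_deque::Iter<'a, T>;

        fn into_iter(self) -> Self::IntoIter {
            self.queue.iter()
        }
    }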

[Fluster] MR1_BT_A, MR2_TANDBERG_E, MR6_BT_B, MR7_BT_B, MR8_BT_B, MR9_BT_B failing on Intel only

A strange case of tests failing only on Intel platforms:

  • MR1_BT_A, MR2_TANDBERG_E, MR8_BT_B show macroblock corruption after a few frames.
  • MR6_BT_B, MR7_BT_B, MR9_BT_B don't have visible corruption, but rather very small alterations that require a hex editor to see.

Note that on MR2_TANDBERG_E, MR6_BT_B, MR7_BT_B and MR8_BT_B, the FFmpeg-H.264-VAAPI decoder seems to fall back to software decoding for some reason, but the other files seem to be successfully decoded by FFmpeg on Intel (and all of them pass with ccdec on AMD).

h264: we should check first_mb_in_slice when decoding

From the specification:

first_mb_in_slice specifies the address of the first macroblock in the slice

If first_mb_in_slice == 0, we have identified that the slice belongs to a new picture.

This should be one of the conditions checked here.

This is not a problem when testing with Fluster, because we check for this when using H264FrameIterator, but clients are free to submit data as they see fit.
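For illustration, the check itself is trivial (the types are stand-ins):

    /// Minimal stand-in for the parsed slice header.
    struct SliceHeader {
        first_mb_in_slice: u32,
    }

    /// A slice whose first macroblock is MB 0 starts a new picture.
    fn starts_new_picture(header: &SliceHeader) -> bool {
        header.first_mb_in_slice == 0
    }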

`decoder::stateless::h264::vaapi::tests::test_25fps_block` regression on Intel

This is a regression introduced by 39e3d00:

thread 'decoder::stateless::h264::vaapi::tests::test_25fps_block' panicked at 'called `Result::unwrap()` on an `Err` value: decoder error: decoder error: while syncing picture

The sync() call in submit_picture when in blocking mode causes an internal decoder error on the VAAPI side. This doesn't happen in non-blocking mode, and it also doesn't happen prior to commit 39e3d00. The AMD driver also seems to be unaffected.

I guess the first thing to do would be to set LIBVA_TRACE and check whether the sequence of calls to libva is different before and after this CL; it should not be, but obviously something has changed.

Limit use of anyhow

anyhow should only be used when the returned errors are not well defined. In most cases, thiserror is a better and lighter fit. This issue is to track the removal of anyhow from the codec and decoder code.
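For illustration, a well-defined error type using thiserror could look like this (the variants are made up):

    use thiserror::Error;

    #[derive(Debug, Error)]
    enum DecodeError {
        #[error("not enough output buffers available")]
        NotEnoughOutputBuffers,
        #[error("error parsing the bitstream: {0}")]
        Parse(String),
    }

Unlike an anyhow::Error, such an enum lets callers match on the exact failure, and it does not pull in anyhow's machinery.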

[Fluster] H.264 extended profile not listed in VAAPI (BA3_SVA_C failure)

Fluster's JVT-AVC_V1::BA3_SVA_C test vector requires the extended profile, which we don't support yet:

cargo run --example ccdec -- BA3_SVA_C.264
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/examples/ccdec BA3_SVA_C.264`
thread 'main' panicked at 'backend error: Invalid profile_idc 88', /usr/local/google/home/acourbot/Work/cros-codecs/src/utils.rs:234:27
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This profile is not listed in the VAProfile enum: https://github.com/intel/libva/blob/master/va/va.h#L502

Should we substitute another profile for it? The table in this post seems to suggest some features are exclusive to the extended profile, so there is no obvious candidate...

[Fluster] vp80-00-comprehensive-006 failure

$ python fluster.py run -d cros-codecs-VP8 -tv vp80-00-comprehensive-006
****************************************************************************************************
Running test suite VP8-TEST-VECTORS with decoder cros-codecs-VP8
Test vectors vp80-00-comprehensive-006
Using 12 parallel job(s)
****************************************************************************************************

[TEST SUITE      ] (DECODER        ) TEST VECTOR               ... RESULT
----------------------------------------------------------------------
[VP8-TEST-VECTORS] (cros-codecs-VP8) vp80-00-comprehensive-006 ... Fail


=======================================================================
FAIL: vp80-00-comprehensive-006 (cros-codecs-VP8.VP8-TEST-VECTORS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/stateful_partition/fluster/fluster/test.py", line 106, in _test
    self.test_vector.name,
AssertionError: '2d5fa3ec2f88404ae7b305c1074036f4' != '0c345d46643d9da1e1369e8ceb98e122'
- 2d5fa3ec2f88404ae7b305c1074036f4
+ 0c345d46643d9da1e1369e8ceb98e122
 : vp80-00-comprehensive-006

Ran 0/1 tests successfully               in 0.205 secs

The resulting output is visually close to the expected result, but it looks as if the image were offset a little. It is not a full-pixel shift either: the values of most pixels are slightly different, starting from the very first frame, as if the image were zoomed in just a little bit.

The bitstream has the unusual resolution of 175x143 (vs. the usual 176x144); this is probably related.

Failure to decode H.264 and VP9 in YUV422 and YUV444 formats

vp91-2-04-yuv422.webm and vp91-2-04-yuv444.webm are failing, with vaEndPicture returning a VA_STATUS_ERROR_INVALID_PARAMETER error.

This is a bit strange, as vp93-2-20-10bit-yuv444.webm and vp93-2-20-12bit-yuv444.webm, which are also 4:4:4 tests but with higher bit depths, pass successfully. This suggests that our handling of 4:4:4 is at least correct.

On the other hand, vp93-2-20-10bit-yuv422.webm and vp93-2-20-12bit-yuv422.webm are also failing.

It is unclear whether this is a problem with cros-codecs or VAAPI. On these tests the FFmpeg-VP9-VAAPI decoder seems to always fall back to software decoding.

Remove the BackendData generic argument of VaapiBackend

The BackendData generic argument of VaapiBackend was introduced by the H.265 code and is used exclusively by it.

It would be preferable (and more readable) to keep VaapiBackend codec-agnostic and remove that argument. The H.265 BackendData has two members:

  • last_slice, set and used in submit_last_slice and replace_last_slice.
  • va_references, set in handle_picture and used in decode_slice.

It should be possible to replace them with generic associated types of the H.265 stateless backend, and have the H.265 decoder manage them and pass them to the functions that need them.
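A sketch of that direction; the trait and method names are illustrative, not the actual backend trait:

    /// Hypothetical stateless backend trait with the codec-specific data
    /// moved behind an associated type.
    trait StatelessDecoderBackend {
        /// Codec-specific state (e.g. the H.265 last_slice and
        /// va_references) owned by the decoder and threaded through calls.
        type CodecData: Default;

        fn handle_picture(&mut self, data: &mut Self::CodecData);
        fn decode_slice(&mut self, data: &mut Self::CodecData, slice: &[u8]);
    }

With this, VaapiBackend itself no longer needs to know anything about H.265.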

VAAPI: format map should only be used when trying to map frames for the CPU

As far as decoding is concerned, the only relevant format is the RT format. The VA_FOURCC_* are only used to present a view of the buffer in the layout given by the fourcc.

However, the current code requires a fourcc to be specified (or at least chosen by default) in the open method, even if we have no intent to read the decoded result using the CPU. This should probably be changed to something that does not involve fourccs until we actually want to map a decoded buffer.

The interface would probably be such that we can query which DecodedFormats are supported by the decoder with the current settings; then we can request a mapping of a given frame into one of the supported formats.
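A sketch of such an interface; all names are illustrative:

    /// Stand-ins for the real types.
    enum DecodedFormat { I420, NV12 }
    struct Frame;
    struct MappedFrame;
    struct MapError;

    trait CpuMapping {
        /// Formats the decoded frames can be mapped to with the current
        /// stream settings.
        fn supported_formats(&self) -> Vec<DecodedFormat>;

        /// Maps a decoded frame into CPU-visible memory using the layout
        /// of the requested format.
        fn map(&self, frame: &Frame, format: DecodedFormat)
            -> Result<MappedFrame, MapError>;
    }

Decoding itself would only ever deal with the RT format, and fourccs would appear exclusively on this mapping path.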

We should run Fluster through some CI

Apparently, one of my minor changes broke all of the H.264 tests, and it took a while before it was fixed by 922d678.

We should have some automation in place to run Fluster on every commit.

@Gnurou I do not know much about CI systems in general, but I can do the work if you give me some initial directions.

h264: we should cope with a new picture during decode_access_unit

Somewhat related to #33

The current code checks for enough surfaces during decode_access_unit.

But again, aside from our own FrameIterator (which is used for ccdec), we are not in control of the order of NALUs passed in by users of the library. This can create a problem if we receive the following sequence:

<sps> <pps> <slice A> <slice B> <slice C> <slice D> ... <slice N>

Where A..N are different pictures.

I have just realized that we do not plan for the above scenario, so the first order of business is checking first_mb_in_slice == 0 in order to detect that a slice refers to a new picture. Better yet, we could reuse the logic from Chromium.

While not ideal, we can cope with identifying a new picture in decode_access_unit, but we must:

a) finish the current picture,
b) make sure that we have a surface to decode it to

Note that we already do a) if we notice a new field picture, but not a new frame. This also means moving the check on the current number of surfaces to begin_picture() so that we get b).

We should maybe expose DecoderState to upper layers

Imagine some broken media where no SPS or no vpx frames can be parsed at all.

The current design will exit normally, even though no output was produced and nothing useful was done. The decoder will forever be in the DecodingState::AwaitingStreamInfo state.

If by any means we detect EOS (be it at the virtio level, or in cros-codecs itself when executing tests or ccdec) and the decoder is still in that state, we should error out so as to make the user aware that the media is malformed and/or corrupt.

How exactly that is to be done is something to be discussed. My initial idea is to add something along the lines of:

fn decoding_state(&self) -> DecodeState<Box<dyn Any>>

To the VideoDecoder trait.

Users can then query the state at EOS to manually assert that the decoder is indeed in DecodingState::Decoding and proceed accordingly.

If for any reason we do not want to copy and/or expose T to clients, then another option is:

fn decoding_state(&self) -> DecodeState<()>

In which case the decoders would convert from AwaitingFormat(T) to AwaitingFormat(()).

@Gnurou any other ideas?

We can possibly get rid of a lot of generics through Cow<'_, [u8]>

I am experimenting with a new design for the AV1 code, one that uses Cow<'_, [u8]> in place of T: AsRef<[u8]>. This means we can remove the T: AsRef<[u8]> noise from a lot of functions and traits, while still making it possible to build Slices, Frames, etc. out of both borrowed and owned data without copying.

What's more, Cow<'_, [u8]> dereferences to &[u8], so nothing fundamentally changes in the code as far as I can see. Not only that, but we de facto do not support anything other than &[u8] in the decoders anyway, as introducing T in our backend traits would make them not object-safe. Using Cow<'_, [u8]> circumvents this problem, as there are no generics involved and lifetimes do not break object safety.

If the current AV1 design proves successful, we should maybe backport these changes to the other codecs as a way to clean up the codebase by a considerable amount.
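For reference, a minimal sketch of the pattern (Slice is a stand-in type):

    use std::borrow::Cow;

    /// The same type wraps either borrowed or owned bitstream data,
    /// without copying and without a type parameter.
    struct Slice<'a> {
        data: Cow<'a, [u8]>,
    }

    fn from_borrowed(data: &[u8]) -> Slice<'_> {
        Slice { data: Cow::Borrowed(data) }
    }

    fn from_owned(data: Vec<u8>) -> Slice<'static> {
        Slice { data: Cow::Owned(data) }
    }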

JVT-FR-EXT test suite has lots of failures

JVT-FR-EXT is failing quite badly compared to JVT-AVC_V1.

ccdec has 12/69 tests passing.

GStreamer-H.264-VAAPI-Gst1.0 has 43/69 tests passing.

Attached is the first frame from one of the first failing tests (brcm_freh3); it seems like key frames are not decoded correctly.

brcm_freh3_ccdec

h264: SPS update race

The end of the current picture cannot be detected before the next one starts, but by then we may already have parsed a new SPS through peek_sps. If this happens, we will finish the current picture with the new SPS, which can cause inconsistencies, like the DPB not being bumped because the new maximum number of reference frames has increased.

In order to fix this, the current picture should keep a reference to the PPS that was active at its creation time, and each PPS should keep a reference to its SPS. Putting SPSes and PPSes into Rc sounds like a good way to achieve this.
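A sketch of the proposed ownership chain (fields other than the references are elided):

    use std::rc::Rc;

    struct Sps { /* parsed SPS fields */ }

    struct Pps {
        /// Each PPS keeps a reference to the SPS it was parsed against.
        sps: Rc<Sps>,
    }

    struct PictureData {
        /// The picture keeps the PPS (and, transitively, the SPS) that
        /// were active at its creation time, so a later peek_sps cannot
        /// change what it refers to.
        pps: Rc<Pps>,
    }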

[Fluster] vp80-03-segmentation-1401 failure

$ python fluster.py run -d cros-codecs-VP8 -tv vp80-03-segmentation-1401
****************************************************************************************************
Running test suite VP8-TEST-VECTORS with decoder cros-codecs-VP8
Test vectors vp80-03-segmentation-1401
Using 12 parallel job(s)
****************************************************************************************************

[TEST SUITE      ] (DECODER        ) TEST VECTOR               ... RESULT
----------------------------------------------------------------------
[VP8-TEST-VECTORS] (cros-codecs-VP8) vp80-03-segmentation-1401 ... Fail


=======================================================================
FAIL: vp80-03-segmentation-1401 (cros-codecs-VP8.VP8-TEST-VECTORS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/stateful_partition/fluster/fluster/test.py", line 106, in _test
    self.test_vector.name,
AssertionError: 'f7acb74e99528568714129e2994ceca5' != '6e0ed3dd845a44b1239059942e5e53b0'
- f7acb74e99528568714129e2994ceca5
+ 6e0ed3dd845a44b1239059942e5e53b0
 : vp80-03-segmentation-1401

Ran 0/1 tests successfully               in 0.106 secs

Only the first frame appears to be incorrect (and heavily damaged); the other frames look OK.

Using LIBVA_TRACE, I have compared what is sent to libva by cros-codecs-VP8 and by FFmpeg-VP8-VAAPI (which passes the test). I found this difference on the first frame:

cros-codecs: bool_coder_ctx: range = f8, value = 58, count = 7
ffmpeg: bool_coder_ctx: range = f8, value = 59, count = 7

So I tried to make the value match what we see with FFmpeg, using the following patch:

diff --git a/src/decoders/vp8/backends/vaapi.rs b/src/decoders/vp8/backends/vaapi.rs
index f3173ef9..ccdc9097 100644
--- a/src/decoders/vp8/backends/vaapi.rs
+++ b/src/decoders/vp8/backends/vaapi.rs
@@ -148,9 +148,12 @@ impl VaapiBackend<Header> {
             u32::from(frame_hdr.loop_filter_level() == 0),
         );

+        let mut bd_value = frame_hdr.bd_value();
+        if bd_value == 0x58 { bd_value = 0x59 };
+
         let bool_coder_ctx = libva::BoolCoderContextVPX::new(
             u8::try_from(frame_hdr.bd_range())?,
-            u8::try_from(frame_hdr.bd_value())?,
+            u8::try_from(bd_value)?,
             u8::try_from(frame_hdr.bd_count())?,
         );

And lo and behold, the test passes with this workaround!

Could this be a bug in the boolean decoder somehow?

Remove unwraps and other runtime panics that result from invalid input

An invalid stream should not make the program panic. This issue is to track the removal of all sources of panic from the code:

  • unwraps as a result of invalid input,
  • explicit calls to panic! and expect,
  • potentially out-of-bounds accesses into vectors or arrays (replace them with get, as sketched below).
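As an illustration of the last point (the function is made up):

    /// Indexing with frames[idx] would panic on invalid input; get() lets
    /// us return an error instead.
    fn ref_frame(frames: &[u32], idx: usize) -> Result<u32, String> {
        frames
            .get(idx)
            .copied()
            .ok_or_else(|| format!("invalid reference frame index {idx}"))
    }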

Add support for more H.264 pixel formats

Support for 4:2:2, 4:4:4, and 10- and 12-bit HDR formats has recently been added and validated with VP9, which already had the code to detect such formats.

H.264 is lacking that code, but once it is added we should be able to pass more of the JVT-FR-EXT tests.

Decoder interface needs an `InvalidInput` error

Right now we don't have a distinct error to signal that the decoder could not find any meaningful data in the submitted input. While this is an error, most clients will want to keep submitting input until something catches on, so we should have a dedicated error type to allow proper matching.
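A sketch of what that could look like (the other variant is a placeholder):

    #[derive(Debug)]
    enum DecodeError {
        /// No meaningful unit was found in the submitted input; clients
        /// will typically keep submitting data until something catches on.
        InvalidInput,
        /// Any other decoder error.
        Other(String),
    }

Clients can then match on DecodeError::InvalidInput and keep feeding data, while treating other errors as fatal.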

vaapi: vaDeriveImage should be used for image creation?

It is desirable to use vaDeriveImage in order to access decoded frames using the CPU, but I have seen the following strange behavior when enabling it:

  • On Intel, all VAAPI decoding tests are passing,
  • H.264 tests also pass on AMD,
  • VP9 tests are failing on AMD, sometimes with a SIGSEGV as we try to access the mapped surface.

This may be a bug in Mesa, but we would need to investigate further. Note that once we start importing buffers this problem will be mitigated, since we can map the external buffers ourselves; still, it would be nice to have this fixed, as the self-allocated path is slower than it should be.

Flushing is fishy

When flushing, we currently clear all the reference frames, e.g. for VP9:

    fn flush(&mut self) {
        // Note: all the submitted frames are already in the ready queue.
        self.reference_frames = Default::default();
        self.decoding_state = DecodingState::AwaitingStreamInfo;
    }

However, the parser still contains some state related to the reference frames, and other preserved state is also probably invalid since we are back in AwaitingStreamInfo. Going back to this state also means that we will emit a DRC event even if the resolution has not actually changed.

We should probably do either more or less, but the current amount of state clearing does not seem very consistent.

decode() should process a single unit or be able to partially process input

The current implementation of decode expects the input buffer to contain one or more frames' worth of data. This constrains clients, who cannot e.g. send a single slice of a multi-slice H.264 frame, and the decoder, which should ideally check that there are enough resources to process the whole input but currently cannot guarantee that when the input contains multiple frames.

We could change the signature of decode to something like this:

fn decode(&mut self, timestamp: u64, mut bitstream: &[u8]) -> Result<usize, DecodeError> {

In this new implementation, decode only processes a single unit of data. For H.26x this would be a single NAL unit, for VP8/VP9 a single frame.

The return value is the number of bytes processed in bitstream. If it is less than bitstream.len(), the caller should call decode again on &bitstream[return_value..] until the whole bitstream is processed, as sketched below.
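The caller-side loop would then look something like this; Decoder and DecodeError are stand-ins for the real types under the proposed signature:

    struct DecodeError;
    struct Decoder;

    impl Decoder {
        /// Processes a single unit and returns the number of bytes consumed
        /// (stubbed here: pretend one unit spans the whole input).
        fn decode(&mut self, _timestamp: u64, bitstream: &[u8]) -> Result<usize, DecodeError> {
            Ok(bitstream.len())
        }
    }

    fn decode_all(
        decoder: &mut Decoder,
        timestamp: u64,
        mut bitstream: &[u8],
    ) -> Result<(), DecodeError> {
        while !bitstream.is_empty() {
            let consumed = decoder.decode(timestamp, bitstream)?;
            bitstream = &bitstream[consumed..];
        }
        Ok(())
    }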

This has the advantage of removing a loop in every implementation of decode, and will allow decode to detect when it doesn't have enough free output buffers to perform the operation and bail out in that case.

On the other hand, this means that H.264 (and probably H.265) will have to keep the current picture in its state again, and also that it won't be able to detect the picture's end until the first buffer of the next picture is queued or flush is called.

We should rename all occurrences of the term "Surface" in non-vaapi code

This will be very confusing when/if other backends (like v4l2 backends) come along.

For consistency, we should rename this at the decoder level.

My suggestion is to use the term resource, meaning some backend-specific buffer or data structure consumed during decoding, like VASurface, V4L2Request, V4L2Buffer, VkMemory, etc.

We shouldn't implement codec-specific methods for VaapiBackend

We currently implement quite a few methods on VaapiBackend<(), M> in stateless/h264/vaapi.rs. This pollutes the namespace: a structure used by all backends now contains methods that only work for a given codec, without any extra generic parameters to hide them away or otherwise preclude them from being used elsewhere.
