zune-inflate fails to decompress some valid Zlib inputs

#15 finds valid inputs in Zlib format compressed with zlib-ng that fail to decompress with zune-inflate.

PNG: some valid images fail to decode

Tested on commit 4e74f01

Images that fail to decode: zune-png-failures.tar.gz

I can't guarantee all of them are supposed to be decodable - some may have been generated by a fuzzer.

Using `mmap()` without additional locking is unsound

mmap() can only be used safely when the file is guaranteed not to be modified by other programs while it's being read. That's because any changes made to the file on disk are immediately reflected in the memory contents. This breaks Rust's assumptions on multiple levels.

First, in many places Rust relies on the memory contents not changing while borrowed. For example &str is checked for UTF-8 validity once; if the contents change later, it can switch to containing invalid UTF-8 and violate memory safety.

Even more fundamentally, exposing a memory-mapped file as &[u8] like the memmap2 crate does is fundamentally unsound if the file ever changes. That's because the contents of &[u8] are supposed to never change while the reference exists, and the compiler makes optimizations based on that assumption. Changing the bytes of a &[u8] is instant undefined behavior.

I believe it could be used safely on Windows which has mandatory locking, i.e. you can prevent other programs from changing the file while your program is accessing it. There is no equivalent mechanism on Unix platforms, so using memmap2 crate safely is straight-up impossible. The wrapper would have to expose something like &[Cell<u8>], or perhaps even stronger restrictions (raw pointers with read_volatile or perhaps atomics).

`--resize` garbles images

Original:

After zune --resize 200x200 --input tmpfqupp1o9.png --output doggo_thumb.ppm

This image was generated by me using Stable Diffusion, so I can license it as permissively as you like.

Happens on both PNG and JPEG, so I don't think the issue is tied to a particular format.

Tested on commit 48757e3

Adler32 mismatch - potential bug in `zune-inflate`?

The following image has valid CRC32, but zune complains that the decompressed result doesn't match the embedded adler32.

Since Adler32 in gzip applies to the decompressed data, This means one of three things:

the image is damaged in such a way that avoids detection by crc32
zune-inflate decompresses this stream incorrectly
adler32 is calculated incorrectly by zune

1443708892 angelfonds_1443660918 crazybear_05ccadee-c566-4e25-9046-f4a437530a2d

Tested on commit 37163f1

Inflate: More diverse benchmarking corpus

#46 showed that zune-inflate is not (yet?) a win across the board - there was a situation when it wasn't better than even miniz_oxide!

A more diverse benchmarking corpus is needed to track performance more accurately, and avoid such situations in the future.

Ideas for what could be useful as benchmarks:

Compressed JSON, e.g. from rustdoc
Data stream from a PNG with a screenshot (to stress RLE on flat surfaces)
Data stream from a PNG with artwork (to stress complex patterns)
Something poorly compressible, like JPEG, that someone decided to compress anyway

Roadmap to production

To get zune-image's decoders into the hands of users, these things need to be done:

Fuzz the decoders (since unsafe SIMD is used which doesn't insert bounds checks)
Test the decoders on real-world images that technically aren't compliant but libjpeg-turbo decodes anyway
Write an adapter to the image crate so that its existing users could benefit without API changes

I think polishing and shipping JPEG first is a reasonable plan, but I'm curious what you think.

Also, please let me know once each decoder is ready for real-world testing and fuzzing - I can help with that!

Shifted rows and other issues on multiple JPEG images

This is similar to #7, but occurs even after the fix for it. Same testing methodology and corpus as in #7.

Images with the largest mismatch compared to the expected behavior: https://mega.nz/file/k0V3CJTI#OPJMkK7ezkrsVN_qHgTvD2qf4RsNmpfRBF6L_7oZdrA

Tested on commit b22c7e0

Some images have an issue with shifted rows similar to #7, but there are also other issues, e.g. this image:

Which gets decoded like this:

Although it has some corrupted bands so it may very well be out of scope. RST markers are supposed to make decoding robust against this kind of corruption, however.

DecodingResult::u16_to_u8 is unsound

zune-image/zune-core/src/lib.rs

Lines 72 to 84 in 306ad41

    
               pub fn u16_to_u8(self) 
        
               { 
        
                   if let DecodingResult::U16(data) = self 
        
                   { 
        
                       unsafe { 
        
                           let (ptr, length, capacity) = into_raw_parts(data); 
        
                           let new_ptr = ptr.cast::<u8>(); 
        
                           let _new_value = Vec::from_raw_parts(new_ptr, length * 2, capacity * 2); 
        
                       } 
        
                   } 
        
               } 
        
           }

This code transforms a Vec<u16> into Vec<u8>. This cannot be done safely in Rust, or even in C, because memory must be allocated and deallocated with the same alignment, while the alignment requirements of u16 and u8 are different.

The documentation on Vec::from_raw_parts explicitly calls this out.

PNG: Grayscale+Alpha images converted to just Grayscale in PPM

Not sure if this is actually a bug, but it showed up in my search for divergences from imagemagick.

This file:

Gets a white background instead of a transparent one when converting to PPM with the zune command-line tool:

Infinite loop in zune-inflate on malformed inputs

Found by the decode_buffer fuzz target. Happens irrespective of #26.

Reproduce with: cargo +nightly fuzz run decode_buffer -- -timeout=1s

Without the -timeout argument the fuzzer just hangs and shows 0 executions per second.

Enum variants should not be added/removed based on features

zune-image/zune-image/src/errors.rs

Lines 12 to 21 in a54cd92

    
           #[cfg(feature = "zune-jpeg")] 
        
           JpegDecodeErrors(zune_jpeg::errors::DecodeErrors), 
        
           #[cfg(feature = "png")] 
        
           PngDecodeErrors(zune_png::error::PngErrors), 
        
           #[cfg(feature = "ppm")] 
        
           PPMDecodeErrors(zune_ppm::PPMDecodeErrors), 
        
           #[cfg(feature = "psd")] 
        
           PSDDecodeErrors(zune_psd::errors::PSDDecodeErrors), 
        
           #[cfg(feature = "qoi")] 
        
           QoiDecodeErrors(zune_qoi::QoiErrors),

This code may break some valid uses because Cargo features are additive.

Imagine you wrote a library that only decodes JPEG images. You can write the following matching code:

match error {
    // .. all variants shared between formats here ..
    JpegDecodeErrors(err) => eprintln!("Failed to decode JPEG: {err}");

However, if some user of your library also depends on another library that works with PNG images via zune-image, the required features for both libraries get merged. This causes the PngDecodeErrors variant to appear and break compilation for the library processing JPEG, because there is now an unhandled enum variant.

A quick and easy fix to prevent this kind of breakage is making the enum #[non_exhaustive] so that the caller always has to include a catch-all clause _ => eprintln!("Unknown error"). This also lets you add formats easily. It is however a little cumbersome to use for the caller.

A better way would be converting all format into an error type provided by zune-image, such as FormatError, instead of exposing the error types from format crates in the public API of zune-image crate.

PNG: incorrect decoding of indexed images

Original:

Zune:

Original:

Zune:

Original:

Zune:

Tested on commit b346ba3

Outdated README and benchmarks on `zune-jpeg` in this repo

The README for zune-jpeg in this repo still mentions multi-threading, but I understand it is no longer used. It would be nice to update the README and benchmarks.

JPEG: fails to decode some images

I've found that my testing script wasn't capturing decoding failures properly. It checked for decoding mismatches correctly, but the failure of the zune tool to decode an image was not recorded correctly.

I've fixed it and re-ran the decoding test, and found that ~300 images out of ~40,000 tested were not correctly decoded by the zune tool. The archive containing them can be found here.

Tested on commit 74c36b8

PNG: "Bit depth of 2 only allows Greyscale or Indexed color types, but found Palette"

ERROR [zune_png::headers] Bit depth of 1 only allows Greyscale or Indexed color types, but found Palette

Happens for bit depths of 1, 2 and 4.

But the spec clearly states that the valid bit depths for palettle type images are 1, 2, 4 and 8: http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html

Missing benchmark images

A number of images used by the benchmarks are missing. For instance, speed_bench_interlaced.png is referenced, but not present anywhere in the repository:

zune-image/zune-png/benches/decode.rs

Line 69 in ac4a162

    
           env!("CARGO_MANIFEST_DIR").to_string() + "/tests/benchmarks/speed_bench_interlaced.png";

Image Format Issue Tracker.

Image formats tracker.

This issue tracks currently supported images and gives details on their status and tracking issues.

Formats currently able to decode

Decoder
- JPG
- PPM
- PSD - (Experimental)
- QOI
- Farbfeld

Priority of decoders follows the following criteria.

First class image formats.

Images present everywhere and used for everything, and by first class support, we want to have excellent decoding +encoding of almost all images for that format.
Support for such takes priority.

1. JPEG

Full support is about to be complete, what is left is some polishing and some decoder fixes,
works on a lot of images but some cause errors, looking into them.

2. PNG

Not yet working, but work is underway, mainly spending time translating Eric Bigger's libdeflate to Rust, but support should be soon.

3. GIF.

Limited by the fact that I haven't decided how frame API will look, so this will take a while

Second class image formats.

Image formats on the rise

4. WEBP + AVIF + JPEG-XL+HEIF

Complicated and complex beasts, support for these formats will take a while, probably a year or so , decoding support should be pure rust(there is no hurry to reach production at all), while encoding should be their C libraries (encoding is another complicated beast and man has 24 hours a day).

Other file formats.

Support is welcome.

PNG: some images decoded incorrectly

zune-png-mismatches.tar.gz

Tested on commit 31899d8

PNG: test decoding on lots of real-world images

I have a large corpus of PNGs lying around, and I should run decode comparison tests against libpng. Just let me know once the decoder is complete enough for that kind of testing.

`Jpeg decoding failed:Error parsing DQT segment. Reason:No quantization table for component Y` error on valid images

I'm seeing this error happen on many images that are decoded fine by other decoders:

Jpeg decoding failed:Error parsing DQT segment. Reason:No quantization table for component Y

I have 1206 samples total, but here's a small subset so you can reproduce the issue: zune-y-component.tar.gz

Tested on commit 5877c2f

Panic on decoding some JPEG images - `unwrap()` in `idct/avx2.rs`

Samples to reproduce the issue: zune-avx2-panic.tar.gz

These images decode fine in other decoders; zune panics:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', zune-jpeg/src/idct/avx2.rs:258:5

Tested on commit 5877c2f

PNG: some images are too bright

Original:

Zune:

Original:

Zune:

Tested on commit 74c36b8

PNG: extend roundtrip fuzzing to low bit depths

Right now roundtrip fuzzing in PNG is disabled for bit depths lower than 8:

zune-image/zune-png/fuzz/fuzz_targets/roundtrip.rs

Line 17 in 008f02a

if bit_depth < 8 { return None } // TODO: remove this hack

As noted here, zune-png expands the low bit depths up to 8 bits, but the fuzz target doesn't account for that.

JPEG fails to decode a valid image: "No more bytes"

The attached image fails to decode with zune as of commit f749f95

ERROR [zune_bin] Could not complete workflow, reason jpg: "No more bytes"

1601903733902076022688216

Incorrect decoding of some real-world JPEG images

At a quick glance, there seem to be at least two issues going on:

Quality degradation / pixelation on some greyscale or mostly-greyscale images
Incorrect recovery after RST marker in damaged images, so that everything following the RST marker is misaligned slightly
There are also some small images that are completely garbled, but that might be due to issue 1, not sure

Samples for reproducing the issues: zune-divergences-3.tar.gz

Tested on commit 5877c2f

zune-inflate: return the resulting Vec on failure

Streaming decompressors like flate2 usually write to a caller-provided buffer, so they write out some data on failure, and it's possible to recover part of the data even if decompression fails.

Even the methods that write to a Vec in miniz_oxide return the Vec they've written so far on failure, using their custom error struct.

It would be nice to have a similar mechanism in zune-inflate. I don't actually have use cases for this, but it seems like an easy thing to do.

16-bit PNG images are cut off when decoding

Original:

Zune converted to PPM and then with imagemagick to PNG:

Original:

Zune:

Tested on commits dfb7ddd and 74c36b8

PNG: Some 16-bit images are garbled

Original:

Decoded with zune and converted back to PNG with imagemagick:

Tested on commit 74c36b8

Shifted rows on multiple images

The following image

gets decoded like this

Here's an rchive with 92 more examples showing significant mismatch compared to imagemagick:
https://mega.nz/file/M90XCIwK#cyiSqTo3KhtGCm7Fw6RxKoKH4sfraHEzC6wGA2UqD-8

At least some of them show the same issue, but there may be other issues in there as well.

Tested on commit b923233

Some real-world JPEG files fail to decode

The administrator of manebooru.art has kindly run zune-image on all 300,000 JPEG images they have. There was a total of 25 images that failed to decode. The archive with what should be a representative sample is attached.

manebooru_jpeg_failed_to_decode.tar.gz

Tested on commit 48757e3

PNG: a very small percentage of images decoded incorrectly

The results from a test run on the manebooru.art database are in.

There are no decode failures other than ERROR [zune_bin] Could not complete workflow, reason png: Error decoding idat chunks Output limit exceeded, set limit was 1073741824 and output size is 107374185 which appears to be legitimate.

However, there are some discrepancies in decoding compared to imagemagick on a total of 70 images out of 600,000. This should hopefully be a representative sample: https://mega.nz/file/tslC1azb#CBg56O8T7uwC1uXBnbcQLjSaYiQALtaCz_b97hXDm3Y

Tested on commit 15e5d6c. The run on 600,000 images was done on an earlier commit, but I've re-tested everything on the specified commit, which is the latest as of this writing.

Investigate autovectorization-based filtering

image-rs/image-png#363 has landed 100% safe Rust filters that benefit from autovectorization, without using any unsafe code.

The average filter was optimized into vector instructions earlier, also without unsafe: image-rs/image-png#198

It would be nice to investigate if that approach can replace some or all of the explicit unsafe used in zune-png.

Performance lessons from WUFFS

https://github.com/google/wuffs is a rather restrictive memory-safe language for writing image decoders.

They claim to have written the world's fastest PNG decoder, and the tricks that made it possible are detailed here:
https://nigeltao.github.io/blog/2021/fastest-safest-png-decoder.html

Most of them are applicable in Rust as well.

Slightly different decoding of a handful of JPEG images

On this image the red area is a few pixels higher compared to decoding with GIMP.

I'm genuinely not sure what the correct decoding is, and it's not a huge visual difference, but I thought I'd report it regardless.

Here's another one:

It seems to always affect the reddish parts, as far as I can tell.

Panic on out-of-bounds access when resizing some images

Resizing the following image causes a panic: gender-select

thread 'main' panicked at 'index out of bounds: the len is 146432 but the index is 146432', zune-image/zune-imageprocs/src/resize/bilinear.rs:48:31

This is a screenshot from the game Shadowkey

Command: zune --yes --resize 200x100 --input "gender-select.png" --out "thumb.ppm"

Tested on commit e794b64

PNG: don't verify adler32

A PNG file contains two checksums:

Adler32 describing the checksum of the decompressed ZIP data
CRC32 describing the checksum of the compressed file

Adler32 is slower than crc32 (which has a dedicated instruction to compute it on x86).

adler32 allows verification of the final uncompressed bitstream; however, the input file integrity is already verified by crc32 in the PNG format. This makes adler32 verification redundant, since the only things it can detect are:

A bug in the zlib decompression implementation
File corruption so subtle that the outer crc32 missed it

Both of these scenarios are so rare that they're not worth the extra runtime overhead in the default configuration.

Disclaimer: my understanding is based on the summary provided here, I'm not an expert on the PNG format.

Inflate: not faster than `miniz_oxide` on exr benchmark

git clone https://github.com/Shnatsel/exrs
cd exrs
git checkout zune-inflate
cargo bench read_single_image_non_parallel_zips_rgba

The Cargo.toml file is locked in to 0.2.1. Edit it to point to 0.2.2 and you'll see a performance regression, to the point that miniz_oxide used on the master branch becomes slightly faster than zune-inflate.

This is not reflected in the zune-inflate benchmark, so apparently it represents significantly different data patterns.

Edit: actually 0.2.1 doesn't pass tests, you have to upgrade to 0.2.2 for correctness.

Inflate: `copy_rep_matches` can be faster

This is a very hot function responsible for 30% of the time spent in inflate on an exr crate benchmark:

zune-image/zune-inflate/src/utils.rs

Lines 93 to 102 in 5097f4c

    
           for i in 0..length 
        
           { 
        
               // Safety: We assert the worst possible length is in place 
        
               // before the loop for both src and dest. 
        
               // 
        
               // Reason: This is perf sensitive, we can't afford such slowdowns 
        
               // and it activates optimizations i.e see 
        
               let byte = dest[offset + i]; 
        
               dest[dest_offset + i] = byte; 
        
           }

There are at least two ways to speed it up:

https://crates.io/crates/rle-decode-fast implements an algorithm copying data in batches exponentially increasing in size rather than byte-by-byte. This should be faster than the current approach, although may need some tweaking (e.g. capping the exponent at some size - needs benchmarks). Disclaimer - I took part in developing that, I may be biased.
Nothing is done to eliminate bounds checks. I don't think using an iterator will work here, but there's ample room for optimizer hints. I've successfully applied those in the inflate crate for this very algorithm.

Some images appear blank after PNG > PPM > imagemagick PNG conversion

These images become fully transparent after conversion to PPM with zune and then to PNG using convert:
zune-image-png-37163f128-blank.tar.gz

Is that an issue with zune-image or convert? Perhaps zune is using some advanced PPM features that imagemagick can't handle?

Tested on zune-image commit 37163f1

PNG: incorrect decoding of some PNGs

These files get decoded either to a blank, fully transparent image (at least after using imagemagick to convert them back to PNG):

zune-image-divergences-backup-png-e794b64a5.tar.gz

I think I've reported some of these, but I don't see any open issues associated with PNG, so my apologies if this is a duplicate.

Tested on commit e794b64

Resizing some PNGs fails: Components mismatch, expected 4 channels since image format is RGBA, but found 3

This happens on many images, here's a sample file.

The command to trigger the issue is: zune --yes --resize 200x100 --input "input.png" --out "thumb.ppm"

The error is: ERROR [zune_bin] Could not complete workflow, reason Components mismatch, expected 4 channels since image format is RGBA, but found 3

Regular conversion from PNG to PPM, without resizing, works fine.

Tested on commit e794b64

JPEG: fuzzer panic on `.unwrap()` in mcu_prog.rs

I've seeded the JPEG decoding fuzzer with the AFL test files as well as my own corpus generated by fuzzing the jpeg-decoder crate. It found a panic after 83012 executions:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /home/shnatsel/Code/zune-image/zune-jpeg/src/mcu_prog.rs:251:26

The file triggering the panic: crash-46acfa647cb1e34e2c77e9bd0d2bf2389ff6c2ba.gz

Reproduce with: cargo +nightly fuzz run -O decode_buffer crash-46acfa647cb1e34e2c77e9bd0d2bf2389ff6c2ba

zune-inflate fails to decompress some valid inputs

I've added a fuzz target in #13 that compresses arbitrary data from the fuzzer with miniz_oxide, then decompresses it with zune-inflate, and validates that the decompressed data is the same as the data before compression.

Right now this causes panics in zune-inflate, indicating that it cannot handle some valid inputs.

I've already tested miniz_oxide itself with this kind of fuzzing (decoding with miniz_oxide also), so I'm reasonably certain the compressed data it produces is correct.

Some images garbled after JPEG to PPM conversion

Running zune --input Pithecophaga_jefferyi.jpg --out out.ppm garbles the image.

Original:

Result:

The image is sourced from wikimedia.

`zune-inflate` fails to roundtrip in v0.2.0 release from crates.io

I've tried to add zune-inflate to the exr crate, and tests failed. I've re-run the roundtrip fuzz target and it immediately found inputs that failed to roundtrip.

Fortunately there aren't any production users yet, so nothing is broken by this.

I suggest running fuzzing for a small number of iterations on CI to make sure this doesn't happen again.

zune-png panics on decoding some valid inputs

Run the fuzz target added in #22 : cargo +nightly fuzz run roundtrip

It finds the following panic in a few seconds:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Corrupt data
', /home/shnatsel/Code/zune-image/zune-png/src/decoder.rs:264:50

Provide a dedicated DecodeError variant for exceeding the size limit

Right now it's impossible to distinguish the error caused by exceeding the size limit from other errors. It should be its own dedicated enum variant.

This can be important for showing user-friendly messages, such as "Use the --size argument to increase the limit" in a command-line tool.

JPEG: red channel shifted by 1 pixel

It seems that in the images decoded by zune-jpeg, the red channel is shifted by 1 pixel along the vertical axis.

We've seen this kind of failure mode in small images of flags before, but manebooru has an image that makes it very obvious:

16006511942024190319402

If you decode this with zune-jpeg and switch between the original and the zune decoded image, you'll clearly see the reds move.

Tested on commit 48757e3

Black stripe added to the end of some images

After running zune --input IMG_1859.JPG --out out.ppm the last 11 rows of this image are not decoded - they're just a black stripe. Other decoders seem to be decoding it fine.

	pub fn u16_to_u8(self)
	{
	if let DecodingResult::U16(data) = self
	{
	unsafe {
	let (ptr, length, capacity) = into_raw_parts(data);
	let new_ptr = ptr.cast::<u8>();

	let _new_value = Vec::from_raw_parts(new_ptr, length * 2, capacity * 2);
	}
	}
	}
	}

	#[cfg(feature = "zune-jpeg")]
	JpegDecodeErrors(zune_jpeg::errors::DecodeErrors),
	#[cfg(feature = "png")]
	PngDecodeErrors(zune_png::error::PngErrors),
	#[cfg(feature = "ppm")]
	PPMDecodeErrors(zune_ppm::PPMDecodeErrors),
	#[cfg(feature = "psd")]
	PSDDecodeErrors(zune_psd::errors::PSDDecodeErrors),
	#[cfg(feature = "qoi")]
	QoiDecodeErrors(zune_qoi::QoiErrors),

	for i in 0..length
	{
	// Safety: We assert the worst possible length is in place
	// before the loop for both src and dest.
	//
	// Reason: This is perf sensitive, we can't afford such slowdowns
	// and it activates optimizations i.e see
	let byte = dest[offset + i];
	dest[dest_offset + i] = byte;
	}

etemesi254 / zune-image Goto Github PK

zune-image's People

Contributors

Stargazers

Watchers

Forkers

zune-image's Issues

Image formats tracker.

Formats currently able to decode

First class image formats.

1. JPEG

2. PNG

3. GIF.

Second class image formats.

4. WEBP + AVIF + JPEG-XL+HEIF

Other file formats.

Recommend Projects

Recommend Topics

Recommend Org