etemesi254 / zune-image Goto Github PK
View Code? Open in Web Editor NEWLicense: Other
License: Other
#15 finds valid inputs in Zlib format compressed with zlib-ng that fail to decompress with zune-inflate
.
Tested on commit 4e74f01
Images that fail to decode: zune-png-failures.tar.gz
I can't guarantee all of them are supposed to be decodable - some may have been generated by a fuzzer.
mmap()
can only be used safely when the file is guaranteed not to be modified by other programs while it's being read. That's because any changes made to the file on disk are immediately reflected in the memory contents. This breaks Rust's assumptions on multiple levels.
First, in many places Rust relies on the memory contents not changing while borrowed. For example &str
is checked for UTF-8 validity once; if the contents change later, it can switch to containing invalid UTF-8 and violate memory safety.
Even more fundamentally, exposing a memory-mapped file as &[u8]
like the memmap2
crate does is fundamentally unsound if the file ever changes. That's because the contents of &[u8]
are supposed to never change while the reference exists, and the compiler makes optimizations based on that assumption. Changing the bytes of a &[u8]
is instant undefined behavior.
I believe it could be used safely on Windows which has mandatory locking, i.e. you can prevent other programs from changing the file while your program is accessing it. There is no equivalent mechanism on Unix platforms, so using memmap2
crate safely is straight-up impossible. The wrapper would have to expose something like &[Cell<u8>]
, or perhaps even stronger restrictions (raw pointers with read_volatile
or perhaps atomics).
Original:
After zune --resize 200x200 --input tmpfqupp1o9.png --output doggo_thumb.ppm
This image was generated by me using Stable Diffusion, so I can license it as permissively as you like.
Happens on both PNG and JPEG, so I don't think the issue is tied to a particular format.
Tested on commit 48757e3
The following image has valid CRC32, but zune
complains that the decompressed result doesn't match the embedded adler32.
Since Adler32 in gzip applies to the decompressed data, This means one of three things:
1443708892 angelfonds_1443660918 crazybear_05ccadee-c566-4e25-9046-f4a437530a2d
Tested on commit 37163f1
#46 showed that zune-inflate is not (yet?) a win across the board - there was a situation when it wasn't better than even miniz_oxide
!
A more diverse benchmarking corpus is needed to track performance more accurately, and avoid such situations in the future.
Ideas for what could be useful as benchmarks:
rustdoc
To get zune-image's decoders into the hands of users, these things need to be done:
image
crate so that its existing users could benefit without API changesI think polishing and shipping JPEG first is a reasonable plan, but I'm curious what you think.
Also, please let me know once each decoder is ready for real-world testing and fuzzing - I can help with that!
This is similar to #7, but occurs even after the fix for it. Same testing methodology and corpus as in #7.
Images with the largest mismatch compared to the expected behavior: https://mega.nz/file/k0V3CJTI#OPJMkK7ezkrsVN_qHgTvD2qf4RsNmpfRBF6L_7oZdrA
Tested on commit b22c7e0
Some images have an issue with shifted rows similar to #7, but there are also other issues, e.g. this image:
Which gets decoded like this:
Although it has some corrupted bands so it may very well be out of scope. RST markers are supposed to make decoding robust against this kind of corruption, however.
zune-image/zune-core/src/lib.rs
Lines 72 to 84 in 306ad41
This code transforms a Vec<u16>
into Vec<u8>
. This cannot be done safely in Rust, or even in C, because memory must be allocated and deallocated with the same alignment, while the alignment requirements of u16
and u8
are different.
The documentation on Vec::from_raw_parts
explicitly calls this out.
Found by the decode_buffer
fuzz target. Happens irrespective of #26.
Reproduce with: cargo +nightly fuzz run decode_buffer -- -timeout=1s
Without the -timeout
argument the fuzzer just hangs and shows 0 executions per second.
zune-image/zune-image/src/errors.rs
Lines 12 to 21 in a54cd92
This code may break some valid uses because Cargo features are additive.
Imagine you wrote a library that only decodes JPEG images. You can write the following matching code:
match error {
// .. all variants shared between formats here ..
JpegDecodeErrors(err) => eprintln!("Failed to decode JPEG: {err}");
However, if some user of your library also depends on another library that works with PNG images via zune-image
, the required features for both libraries get merged. This causes the PngDecodeErrors
variant to appear and break compilation for the library processing JPEG, because there is now an unhandled enum variant.
A quick and easy fix to prevent this kind of breakage is making the enum #[non_exhaustive]
so that the caller always has to include a catch-all clause _ => eprintln!("Unknown error")
. This also lets you add formats easily. It is however a little cumbersome to use for the caller.
A better way would be converting all format into an error type provided by zune-image
, such as FormatError
, instead of exposing the error types from format crates in the public API of zune-image
crate.
The README for zune-jpeg in this repo still mentions multi-threading, but I understand it is no longer used. It would be nice to update the README and benchmarks.
I've found that my testing script wasn't capturing decoding failures properly. It checked for decoding mismatches correctly, but the failure of the zune
tool to decode an image was not recorded correctly.
I've fixed it and re-ran the decoding test, and found that ~300 images out of ~40,000 tested were not correctly decoded by the zune
tool. The archive containing them can be found here.
Tested on commit 74c36b8
ERROR [zune_png::headers] Bit depth of 1 only allows Greyscale or Indexed color types, but found Palette
Happens for bit depths of 1, 2 and 4.
But the spec clearly states that the valid bit depths for palettle type images are 1, 2, 4 and 8: http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html
A number of images used by the benchmarks are missing. For instance, speed_bench_interlaced.png is referenced, but not present anywhere in the repository:
zune-image/zune-png/benches/decode.rs
Line 69 in ac4a162
This issue tracks currently supported images and gives details on their status and tracking issues.
Priority of decoders follows the following criteria.
Images present everywhere and used for everything, and by first class support, we want to have excellent decoding +encoding of almost all images for that format.
Support for such takes priority.
Full support is about to be complete, what is left is some polishing and some decoder fixes,
works on a lot of images but some cause errors, looking into them.
Not yet working, but work is underway, mainly spending time translating Eric Bigger's libdeflate to Rust, but support should be soon.
Limited by the fact that I haven't decided how frame API will look, so this will take a while
Image formats on the rise
Complicated and complex beasts, support for these formats will take a while, probably a year or so , decoding support should be pure rust(there is no hurry to reach production at all), while encoding should be their C libraries (encoding is another complicated beast and man has 24 hours a day).
Support is welcome.
Tested on commit 31899d8
I have a large corpus of PNGs lying around, and I should run decode comparison tests against libpng. Just let me know once the decoder is complete enough for that kind of testing.
I'm seeing this error happen on many images that are decoded fine by other decoders:
Jpeg decoding failed:Error parsing DQT segment. Reason:No quantization table for component Y
I have 1206 samples total, but here's a small subset so you can reproduce the issue: zune-y-component.tar.gz
Tested on commit 5877c2f
Samples to reproduce the issue: zune-avx2-panic.tar.gz
These images decode fine in other decoders; zune panics:
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', zune-jpeg/src/idct/avx2.rs:258:5
Tested on commit 5877c2f
Original:
Zune:
Original:
Zune:
Tested on commit 74c36b8
Right now roundtrip fuzzing in PNG is disabled for bit depths lower than 8:
As noted here, zune-png expands the low bit depths up to 8 bits, but the fuzz target doesn't account for that.
The attached image fails to decode with zune
as of commit f749f95
ERROR [zune_bin] Could not complete workflow, reason jpg: "No more bytes"
At a quick glance, there seem to be at least two issues going on:
Samples for reproducing the issues: zune-divergences-3.tar.gz
Tested on commit 5877c2f
Streaming decompressors like flate2
usually write to a caller-provided buffer, so they write out some data on failure, and it's possible to recover part of the data even if decompression fails.
Even the methods that write to a Vec
in miniz_oxide
return the Vec they've written so far on failure, using their custom error struct.
It would be nice to have a similar mechanism in zune-inflate
. I don't actually have use cases for this, but it seems like an easy thing to do.
The following image
gets decoded like this
Here's an rchive with 92 more examples showing significant mismatch compared to imagemagick
:
https://mega.nz/file/M90XCIwK#cyiSqTo3KhtGCm7Fw6RxKoKH4sfraHEzC6wGA2UqD-8
At least some of them show the same issue, but there may be other issues in there as well.
Tested on commit b923233
The administrator of manebooru.art has kindly run zune-image
on all 300,000 JPEG images they have. There was a total of 25 images that failed to decode. The archive with what should be a representative sample is attached.
manebooru_jpeg_failed_to_decode.tar.gz
Tested on commit 48757e3
The results from a test run on the manebooru.art database are in.
There are no decode failures other than ERROR [zune_bin] Could not complete workflow, reason png: Error decoding idat chunks Output limit exceeded, set limit was 1073741824 and output size is 107374185
which appears to be legitimate.
However, there are some discrepancies in decoding compared to imagemagick on a total of 70 images out of 600,000. This should hopefully be a representative sample: https://mega.nz/file/tslC1azb#CBg56O8T7uwC1uXBnbcQLjSaYiQALtaCz_b97hXDm3Y
Tested on commit 15e5d6c. The run on 600,000 images was done on an earlier commit, but I've re-tested everything on the specified commit, which is the latest as of this writing.
image-rs/image-png#363 has landed 100% safe Rust filters that benefit from autovectorization, without using any unsafe code.
The average filter was optimized into vector instructions earlier, also without unsafe: image-rs/image-png#198
It would be nice to investigate if that approach can replace some or all of the explicit unsafe
used in zune-png
.
https://github.com/google/wuffs is a rather restrictive memory-safe language for writing image decoders.
They claim to have written the world's fastest PNG decoder, and the tricks that made it possible are detailed here:
https://nigeltao.github.io/blog/2021/fastest-safest-png-decoder.html
Most of them are applicable in Rust as well.
On this image the red area is a few pixels higher compared to decoding with GIMP.
I'm genuinely not sure what the correct decoding is, and it's not a huge visual difference, but I thought I'd report it regardless.
Here's another one:
It seems to always affect the reddish parts, as far as I can tell.
Resizing the following image causes a panic: gender-select
thread 'main' panicked at 'index out of bounds: the len is 146432 but the index is 146432', zune-image/zune-imageprocs/src/resize/bilinear.rs:48:31
This is a screenshot from the game Shadowkey
Command: zune --yes --resize 200x100 --input "gender-select.png" --out "thumb.ppm"
Tested on commit e794b64
A PNG file contains two checksums:
Adler32 is slower than crc32 (which has a dedicated instruction to compute it on x86).
adler32 allows verification of the final uncompressed bitstream; however, the input file integrity is already verified by crc32 in the PNG format. This makes adler32 verification redundant, since the only things it can detect are:
Both of these scenarios are so rare that they're not worth the extra runtime overhead in the default configuration.
Disclaimer: my understanding is based on the summary provided here, I'm not an expert on the PNG format.
git clone https://github.com/Shnatsel/exrs
cd exrs
git checkout zune-inflate
cargo bench read_single_image_non_parallel_zips_rgba
The Cargo.toml file is locked in to 0.2.1. Edit it to point to 0.2.2 and you'll see a performance regression, to the point that miniz_oxide
used on the master
branch becomes slightly faster than zune-inflate
.
This is not reflected in the zune-inflate benchmark, so apparently it represents significantly different data patterns.
Edit: actually 0.2.1 doesn't pass tests, you have to upgrade to 0.2.2 for correctness.
This is a very hot function responsible for 30% of the time spent in inflate on an exr
crate benchmark:
zune-image/zune-inflate/src/utils.rs
Lines 93 to 102 in 5097f4c
There are at least two ways to speed it up:
inflate
crate for this very algorithm.These images become fully transparent after conversion to PPM with zune
and then to PNG using convert
:
zune-image-png-37163f128-blank.tar.gz
Is that an issue with zune-image
or convert
? Perhaps zune
is using some advanced PPM features that imagemagick
can't handle?
Tested on zune-image
commit 37163f1
These files get decoded either to a blank, fully transparent image (at least after using imagemagick to convert them back to PNG):
zune-image-divergences-backup-png-e794b64a5.tar.gz
I think I've reported some of these, but I don't see any open issues associated with PNG, so my apologies if this is a duplicate.
Tested on commit e794b64
This happens on many images, here's a sample file.
The command to trigger the issue is: zune --yes --resize 200x100 --input "input.png" --out "thumb.ppm"
The error is: ERROR [zune_bin] Could not complete workflow, reason Components mismatch, expected 4 channels since image format is RGBA, but found 3
Regular conversion from PNG to PPM, without resizing, works fine.
Tested on commit e794b64
I've seeded the JPEG decoding fuzzer with the AFL test files as well as my own corpus generated by fuzzing the jpeg-decoder
crate. It found a panic after 83012 executions:
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /home/shnatsel/Code/zune-image/zune-jpeg/src/mcu_prog.rs:251:26
The file triggering the panic: crash-46acfa647cb1e34e2c77e9bd0d2bf2389ff6c2ba.gz
Reproduce with: cargo +nightly fuzz run -O decode_buffer crash-46acfa647cb1e34e2c77e9bd0d2bf2389ff6c2ba
I've added a fuzz target in #13 that compresses arbitrary data from the fuzzer with miniz_oxide
, then decompresses it with zune-inflate
, and validates that the decompressed data is the same as the data before compression.
Right now this causes panics in zune-inflate
, indicating that it cannot handle some valid inputs.
I've already tested miniz_oxide
itself with this kind of fuzzing (decoding with miniz_oxide also), so I'm reasonably certain the compressed data it produces is correct.
Running zune --input Pithecophaga_jefferyi.jpg --out out.ppm
garbles the image.
Original:
Result:
The image is sourced from wikimedia.
I've tried to add zune-inflate
to the exr
crate, and tests failed. I've re-run the roundtrip
fuzz target and it immediately found inputs that failed to roundtrip.
Fortunately there aren't any production users yet, so nothing is broken by this.
I suggest running fuzzing for a small number of iterations on CI to make sure this doesn't happen again.
Run the fuzz target added in #22 : cargo +nightly fuzz run roundtrip
It finds the following panic in a few seconds:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Corrupt data
', /home/shnatsel/Code/zune-image/zune-png/src/decoder.rs:264:50
Right now it's impossible to distinguish the error caused by exceeding the size limit from other errors. It should be its own dedicated enum variant.
This can be important for showing user-friendly messages, such as "Use the --size argument to increase the limit" in a command-line tool.
It seems that in the images decoded by zune-jpeg, the red channel is shifted by 1 pixel along the vertical axis.
We've seen this kind of failure mode in small images of flags before, but manebooru has an image that makes it very obvious:
If you decode this with zune-jpeg
and switch between the original and the zune decoded image, you'll clearly see the reds move.
Tested on commit 48757e3
After running zune --input IMG_1859.JPG --out out.ppm
the last 11 rows of this image are not decoded - they're just a black stripe. Other decoders seem to be decoding it fine.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.