ebiggers commented on September 7, 2024

It's optimized by default. Please give details. What CPU? How did you measure the speed, exactly? What input sizes did you test? Is there any way that I could reproduce your results? Are you sure it was a controlled test?

from libdeflate.

kanryu commented on September 7, 2024

I first built it with MSVC, but the optimized functions did not seem to be included in the library, so that build was not optimized. When building with gcc, on the other hand, I confirmed that the optimized functions were included without any special handling.

As for how I measured it: I use lodepng, an open-source PNG library, and replaced its DEFLATE decompression (inflate) step with libdeflate. Decoding images then takes half the time of the original lodepng, but it is only about as fast as libpng.

ebiggers commented on September 7, 2024

Yes, due to limited resources to develop and test with MSVC, libdeflate is only properly optimized when built with gcc or clang.

Anyway, if I understand you correctly, your results were:

  • lodepng with its own DEFLATE implementation is slow.
  • lodepng with libdeflate is fast.
  • libpng with zlib is fast.

I don't see where you actually compared libdeflate to zlib directly. It could be that libpng is faster for other reasons. Can you please clarify whether you actually did a controlled test that compared libdeflate to zlib?
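
The objection here is that "lodepng + libdeflate" versus "libpng + zlib" changes more than one variable at once. A controlled comparison times only the DEFLATE step: the same zlib stream goes to each decompressor, in the same build, under the same timer, and every output is checked. The following is a minimal C sketch of such a harness. `decompress_fn`, `bench()`, and `copy_decompress()` are hypothetical names; in real use you would wrap zlib's `uncompress()` and libdeflate's `libdeflate_zlib_decompress()` in the `decompress_fn` signature (those wrappers are not shown).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical common signature for every candidate: decompress `in` into
 * exactly `out_len` bytes of `out`, returning 0 on success. */
typedef int (*decompress_fn)(const unsigned char *in, size_t in_len,
                             unsigned char *out, size_t out_len);

/* Run `fn` `reps` times on the same input, check every result against
 * `expected`, and return the best throughput in MB/s of uncompressed
 * output, or -1.0 if a call failed or produced wrong bytes. */
static double bench(decompress_fn fn, const unsigned char *in, size_t in_len,
                    const unsigned char *expected, size_t out_len, int reps)
{
    assert(reps > 0);
    unsigned char *out = malloc(out_len ? out_len : 1);
    if (out == NULL)
        return -1.0;
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t t0 = clock();                 /* processor time, not wall time */
        int rc = fn(in, in_len, out, out_len);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (rc != 0 || memcmp(out, expected, out_len) != 0) {
            free(out);
            return -1.0;                      /* wrong output invalidates the run */
        }
        if (best < 0.0 || secs < best)
            best = secs;                      /* keep the fastest run */
    }
    free(out);
    if (best < 1e-9)
        best = 1e-9;                          /* input too small for clock() resolution */
    return (double)out_len / 1e6 / best;
}

/* Stand-in "decompressor" for trying out the harness: treats input as stored data. */
static int copy_decompress(const unsigned char *in, size_t in_len,
                           unsigned char *out, size_t out_len)
{
    if (in_len != out_len)
        return -1;
    memcpy(out, in, out_len);
    return 0;
}
```

Keeping the fastest of several runs reduces noise from caches and scheduling, and `clock()` measures processor time, which suits a single-threaded decoder. The input should be large enough (a few MB) that timer resolution does not dominate.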

jibsen commented on September 7, 2024

If you are compiling with GCC for Windows x86 (and not x64), you might also want to add -msse2, unless you specifically need to support old machines. Compiling for x86 with SSE2 enabled has been the default in MSVC since 2012.

It is unlikely to affect decompression speed, since the decompressor chooses its optimized code by inspecting CPU features at runtime, but it does appear to speed up compression (as a quick test, the included benchmark tool reports 63 MB/s with the default x86 options and 76 MB/s with -msse2 for compression level 1 on silesia.tar).

ebiggers commented on September 7, 2024

Correct, I haven't made the matchfinder optimizations detect CPU features at runtime yet. So adding -msse2 (for x86) or -mavx2 (for x86 and x64) will help a bit, if you know the code will only be run on a CPU with those features. However, that only affects compression, whereas the original question here was about decompression.
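
The difference between the two approaches can be sketched in C: code enabled by -msse2/-mavx2 assumes the feature is present, whereas runtime dispatch compiles a feature-specific function and only calls it after asking the CPU. This sketch uses the GCC/clang extensions `__attribute__((target("avx2")))` and `__builtin_cpu_supports()`, which MSVC does not provide. The byte-summing kernel and every function name here are hypothetical stand-ins, not libdeflate code, and the AVX2 path is only compiled on x86 with GCC or clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn)(const uint8_t *p, size_t n);

/* Portable baseline: works on any CPU and any compiler. */
static uint64_t sum_bytes_generic(const uint8_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1

/* Compiled with AVX2 enabled for this function only, whatever the global
 * -m flags are; it must not be called unless the CPU reports AVX2. */
__attribute__((target("avx2")))
static uint64_t sum_bytes_avx2(const uint8_t *p, size_t n)
{
    const __m256i zero = _mm256_setzero_si256();
    __m256i acc = zero;
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        /* SAD against zero adds each group of 8 bytes into a 64-bit lane. */
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(v, zero));
    }
    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    uint64_t s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)
        s += p[i];  /* scalar tail */
    return s;
}
#endif

/* Choose an implementation from the features of the CPU actually running
 * the program, rather than from compile-time flags. */
static sum_fn select_sum_bytes(void)
{
#ifdef HAVE_AVX2_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_bytes_avx2;
#endif
    return sum_bytes_generic;
}
```

The selection would normally be done once, for example at startup, and the returned pointer reused for every call.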

kanryu commented on September 7, 2024

Hi,
After profiling lodepng since last time, I noticed that its default Adler-32 and unfilter processing is slow (PNG images use filters). After replacing lodepng's zlib inflate with libdeflate and its unfilter with the one from libpng, I confirmed that images decode 1.35 times faster than with plain libpng.
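
For reference, here is a minimal C sketch of Adler-32 as it is commonly implemented (zlib uses the same bound): the two modulo operations are deferred, because with 32-bit accumulators up to 5552 bytes can be summed before either sum can overflow. `adler32_update` is a hypothetical name, not lodepng's or libdeflate's code; libdeflate also exports its own `libdeflate_adler32()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */
/* Largest n with 255*n*(n+1)/2 + (n+1)*(ADLER_MOD-1) <= 2^32 - 1, so the
 * modulo can be skipped for n bytes without overflowing 32 bits. */
#define ADLER_NMAX 5552

/* Continue an Adler-32 checksum; start with adler = 1 for a new stream. */
uint32_t adler32_update(uint32_t adler, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t a = adler & 0xffff;
    uint32_t b = adler >> 16;
    while (len > 0) {
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= chunk;
        while (chunk--) {
            a += *data++;
            b += a;
        }
        a %= ADLER_MOD;   /* one modulo per chunk instead of per byte */
        b %= ADLER_MOD;
    }
    return (b << 16) | a;
}
```

For example, `adler32_update(1, (const uint8_t *)"Wikipedia", 9)` gives `0x11E60398`.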

Increasing the speed further seems to require modifying lodepng itself (I found some unnecessary processing).
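
The unfilter step reverses PNG's per-scanline filters: None, Sub, Up, Average, and Paeth (filter types 0 to 4 in the PNG specification). Below is a straightforward reference sketch in C; `paeth()` and `png_unfilter_line()` are hypothetical names, not lodepng's or libpng's code. The per-byte `switch` keeps it short; faster decoders usually hoist the switch out of the loop and specialize per filter type and pixel size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Paeth predictor from the PNG specification: pick whichever of left (a),
 * above (b), or upper-left (c) is closest to a + b - c, ties in that order. */
static uint8_t paeth(uint8_t a, uint8_t b, uint8_t c)
{
    int p = (int)a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc)
        return a;
    if (pb <= pc)
        return b;
    return c;
}

/* Undo the filter on one scanline in place.
 *   filter: filter-type byte that preceded the scanline (0-4)
 *   line:   filtered bytes, overwritten with reconstructed bytes
 *   prev:   reconstructed previous scanline, or NULL for the first row
 *   len:    bytes per scanline, excluding the filter byte
 *   bpp:    bytes per complete pixel, rounded up to at least 1
 * Returns 0 on success, -1 for an unknown filter type. */
int png_unfilter_line(int filter, uint8_t *line, const uint8_t *prev,
                      size_t len, size_t bpp)
{
    assert(bpp >= 1);
    if (filter < 0 || filter > 4)
        return -1;
    for (size_t i = 0; i < len; i++) {
        uint8_t a = i >= bpp ? line[i - bpp] : 0;            /* left, already reconstructed */
        uint8_t b = prev ? prev[i] : 0;                       /* above */
        uint8_t c = (prev && i >= bpp) ? prev[i - bpp] : 0;   /* upper-left */
        switch (filter) {
        case 1: line[i] = (uint8_t)(line[i] + a); break;                 /* Sub */
        case 2: line[i] = (uint8_t)(line[i] + b); break;                 /* Up */
        case 3: line[i] = (uint8_t)(line[i] + ((a + b) >> 1)); break;    /* Average */
        case 4: line[i] = (uint8_t)(line[i] + paeth(a, b, c)); break;    /* Paeth */
        default: break;                                                  /* None */
        }
    }
    return 0;
}
```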

I tried adding -O3, -Ofast, -mtune, etc. when building libdeflate with MSYS2's gcc, but there was no difference in performance.

ebiggers commented on September 7, 2024

Okay, so you're saying that libdeflate is faster after all, in a proper comparison?

Is there any remaining issue here?

kanryu commented on September 7, 2024

No.
Thank you very much for answering my original question :)