Giter Site home page Giter Site logo

Comments (16)

ebiggers avatar ebiggers commented on September 7, 2024 1

The fix is in the v1.1 release now.

from libdeflate.

ebiggers avatar ebiggers commented on September 7, 2024

Are you using the cdecl calling convention? See this section of the README file:

Windows developers: note that the calling convention of libdeflate.dll is cdecl.
This differs from the Windows API, which uses stdcall. If you call into
libdeflate.dll using a non-C/C++ language, or dynamically using LoadLibrary(),
make sure to use the cdecl calling convention. Using the wrong calling
convention may crash your application.

from libdeflate.

dosh1974 avatar dosh1974 commented on September 7, 2024

Yes, I am.
The signatures are identical for deflate and gzip. Since deflate (and 64-bit gzip) does work I don't think that's the issue.

See the x64 output of the validation / benchmark program in x64.txt

The program compresses, decompresses and validates, then it compresses with one encoder and decompresses with another and validates.
It does so with different compression levels (that loosely maps to the .NET levels).

x86.txt contains the output for deflate only, since it works.

interop.txt is the C# interop class (as you can see the signatures are identical between deflate and gzip besides the name):

I've tried to increase output buffers, but that didn't work.
Something I should try is to increase the size of the input buffer, to check if the code reads past the end.
I'll get back with results for these tests.

I'm really just interested in the X64 case anyway (with a fallback to the "slow" version on X86), but this COULD be a buffer overrun issues which could be used for exploits etc, hence I wanted to report / get to the bottom with this :)

from libdeflate.

dosh1974 avatar dosh1974 commented on September 7, 2024

After testing some more, I'm pretty sure that it's reading past (or before) the input buffer.
If I allocate a bigger data segment and compresses from that it works fine.,.
If I had the C++ version up and running I would have allocated 3 continuous memory pages (VirtualAlloc as readonly). Marked page 0 and 2 as fully protected (VirtualProtect), then compress page 1 and see what happens.
But I don't have any more time to spend on this issue and again since X86 isn't my concern I'll leave it at this... I hope you find the cause :)

from libdeflate.

ebiggers avatar ebiggers commented on September 7, 2024

It works for me. I tested gzip compression and decompression with the 32-bit Windows DLL using guard pages like you suggested, and all compression levels and CRC32 implementations. Previously, libdeflate has also been tested using valgrind, UBSAN, and afl-fuzz.

There's probably a bug in your C# code, such as providing the wrong buffer size. Don't assume there is no bug just because the 64-bit version seems to work.

from libdeflate.

ebiggers avatar ebiggers commented on September 7, 2024

@dosh1974 any update? If not I will close this, since as far as I can tell this isn't a libdeflate problem...

from libdeflate.

dosh1974 avatar dosh1974 commented on September 7, 2024

I didn't dig much deeper, but I'm still not 100% convinced that there isn't something fishy going on (like reading past input buffer in some cases, like odd input size or so).
But, I'm not using 32 bit so I don't really care so please close if you wish.

from libdeflate.

dosh1974 avatar dosh1974 commented on September 7, 2024

Well now you had me write a repro case..
I've attached a Visual Studio 2017 project with a case showing the issue.
LibDeflateTest.zip

It seems like it's data dependent, so my guess is it depends on what part of the compressor reads the final bytes.

Edit: Btw I get the same issue in debug / release and running with / without debugger attached.
Next step would be to include the source files instead of using the DLL's.

from libdeflate.

dosh1974 avatar dosh1974 commented on September 7, 2024

Including and using the latest source (instead of using the pre-built DLL for windows), seems to work, leaving a couple of options.

  • Something has been fixed (changed) since the v1.0 release.
  • The compiler used for building the v1.0 release binaries is doing some bad codegen (optimizations?).

I don't have any compiler but VS installed atm and I guess you're using some very specific, fine tuned, compiler setup to maximize performance, so I let it rest for now :)

from libdeflate.

ebiggers avatar ebiggers commented on September 7, 2024

Unfortunately I cannot reproduce it even with your test program, although I'm compiling it with a MinGW toolchain rather than Visual Studio. Can you send me your compiled LibDeflateTest.exe file which demonstrates the crash? Also, what type of CPU are you running this on?

Also, I don't think this is related to the crash, but I noticed the VS project file for your test program actually uses the static library version of libdeflate, not the DLL version. Coming mostly from a non-Windows background, I seem to have neglected to include an import library in the Windows binary releases: in v1.0 libdeflate.lib is actually a real static library, not an import library. I'll fix this in the next release by including the import library as libdeflate.lib and the static library as libdeflatestatic.lib.

from libdeflate.

dosh1974 avatar dosh1974 commented on September 7, 2024

Here's the error in VS.
The address is tries to read is -1, in __libdeflate_aligned_free, this suggests that my initial assumption of reading before / past the input buffer is wrong.
See screenshot:
vserror

Further I noticed that sometimes it does work!
That suggests to me that there might be an uninitialized memory/variable problem since I don't expect a compressor to use any randomness in it's process and I haven't seen any indication of it being multi-threaded (should be 100% deterministic IMO).
See:
success

It makes sense that the issue persist in both the statically linked (C++) and the DLL (my initial issue, from C#) if it's the same compiler.
Odd that I can't reproduce when I build everything using VS and you can't reproduce when building everything using MinGW.
This suggests that there are some interfacing issues, but since the deflate works 100% using identical signatures etc it doesn't seem very likely.

I've also attached (debug / release) binaries.
LibDeflateTestBinaries.zip

Edit: The function name: aligned free, does it make any assumptions on memory alignment given by the OS?
This could explain why VS / MinGW give different results (suppose the .exe stub generated by MinGW gives out 8 byte aligned memory, vs .NET and VS C++ stub gives out a different alignment?).

from libdeflate.

dosh1974 avatar dosh1974 commented on September 7, 2024

As for my CPU, it's a modest I5:

cpu

from libdeflate.

dosh1974 avatar dosh1974 commented on September 7, 2024

Figured out why building from source on VS works, it not using any of the SSE / AVX optimized versions... made a half arsed attempt of getting it working in VS but gave up... but at least that's probably why it worked.

from libdeflate.

jibsen avatar jibsen commented on September 7, 2024

Digging around a bit, I think this may be caused by an unaligned store to a temporary in crc32_pclmul_avx() (makes sense, since computing crc32 is pretty much the only difference between gzip and deflate). Visual Studio misinterprets the debug info and claims the error is in libdeflate_aligned_free(), but the example never calls the free function.

01320D40  lea         ebx,[edx+40h]  
01320D43  vmovdqa     xmm2,xmmword ptr [edx+10h]  
01320D48  cmp         edi,ebx  
01320D4A  vmovdqa     xmm4,xmmword ptr [edx+20h]  
01320D4F  vmovdqa     xmm5,xmmword ptr [edx+30h]  
01320D54  je          __libdeflate_aligned_free+32Dh (01320DCDh)  
01320D56  mov         eax,ebx  
01320D58  vmovdqa     xmm6,xmmword ptr ds:[13BAC20h]  
01320D60  vpclmulqdq  xmm1,xmm4,xmm6,0  
01320D66  add         eax,40h  
01320D69  vpclmulqdq  xmm4,xmm4,xmm6,11h  
01320D6F  vpclmulqdq  xmm7,xmm0,xmm6,0  
01320D75  vmovaps     xmmword ptr [esp],xmm1 ; <-- esp is 00EFF8DC

The problem might be that GCC assumes the stack is 16-byte aligned on entry, but x86 MSVC does not. So when you call the function in libdeflate.lib (which is compiled with GCC 7.3.0) from x86 MSVC this can happen.

from libdeflate.

jibsen avatar jibsen commented on September 7, 2024

A little further investigation suggests the problem is that GCC does not add stack alignment code when you use the target attribute alone. If you compile with the proper flags, or add the force_align_arg_pointer attribute, it does.

Small example:

#include <wmmintrin.h>

__attribute__((target("sse2")))
int foo(__m128i a)
{
	__m128i n = _mm_srli_si128(a, 8);
	
	return *(unsigned char *) &n;
}

compiles with gcc -O2 -m32 -S -c foo.c to

_foo:
	movdqa	%xmm0, %xmm1
	subl	$44, %esp
	psrldq	$8, %xmm1
	movaps	%xmm1, (%esp)
	movzbl	(%esp), %eax
	addl	$44, %esp
	ret

which will crash if the stack is not aligned on entry. Whereas if you add the force_align_arg_pointer attribute (or compile with -msse2 or -mincoming-stack-boundary=2), you get

_foo:
	pushl	%ebp
	movdqa	%xmm0, %xmm1
	psrldq	$8, %xmm1
	movl	%esp, %ebp
	andl	$-16, %esp
	subl	$32, %esp
	movaps	%xmm1, (%esp)
	movzbl	(%esp), %eax
	leave
	ret

from libdeflate.

ebiggers avatar ebiggers commented on September 7, 2024

Thanks for the extra information @dosh1974, and thanks for the detailed analysis @jibsen! It seems that's exactly what the problem was. gcc assumes the stack is 16-byte aligned and can take advantage of that to generate aligned SSE or AVX load/stores to the stack. But actually MSVC only guarantees 4-byte alignment. I've fixed this by adding __attribute__((force_align_arg_pointer)) to libdeflate's API entry points when it's compiled for 32-bit Windows with gcc.

I published a prerelease build of the fixed Windows binaries to https://github.com/ebiggers/libdeflate/releases/tag/v1.0-8-gd6d50c6. Please verify that they work for you now -- thanks!

from libdeflate.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.