
64/32 bit output mismatch. about nvidia-texture-tools HOT 10 CLOSED

castano commented on July 30, 2024

64/32 bit output mismatch.

from nvidia-texture-tools.

Comments (10)

GoogleCodeExporter commented on July 30, 2024

Jim, the CUDA compiler produces slightly different code when compiling for 32
or 64 bit targets. My first guess is that what you are seeing are just small
floating point differences. The 64 bit target supports 64 bit pointers, which
may result in larger register space requirements, and slightly different
optimization strategies. I'll try to confirm this later tonight or tomorrow.

Original comment by [email protected] on 26 May 2008 at 9:01


GoogleCodeExporter commented on July 30, 2024

Oh, I'm sorry Ignacio, I didn't make it clear that I don't have CUDA enabled
for either of the executables (or so I think, I might have made an error). This
should be for the straight up CPU implementation. And in some cases the
differences are quite noticeable.

Original comment by [email protected] on 26 May 2008 at 9:03


GoogleCodeExporter commented on July 30, 2024

Oh, I see. I would still assume it's just floating point differences. The
compiler has twice as many SSE registers when targeting x64. So, it could lay
down the expressions in a slightly different way. However, it could be
something else. I remember Simon mentioned a similar issue on some targets.
Let me check with him.

Original comment by [email protected] on 26 May 2008 at 9:20


GoogleCodeExporter commented on July 30, 2024

The issue I discovered was that RCPSS is implemented differently on Intel and
AMD hardware, which could result in different encodings on these two platforms.
A workaround is to mask off some of the low bits of the estimate, which I'm
considering for the next squish release.
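
To illustrate the masking idea, here is a minimal sketch. It is not squish code: the function names and the number of bits cleared are assumptions made for this example. It clears the low mantissa bits of a float so that two estimates differing only in those bits become identical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: clear the lowest `bits` mantissa bits of `value`.
// `bits` is expected to be in the range [0, 23].
float mask_low_bits(float value, unsigned bits) {
    std::uint32_t raw;
    std::memcpy(&raw, &value, sizeof raw);      // copy out the bit pattern
    raw &= ~((std::uint32_t{1} << bits) - 1u);  // zero the low `bits` bits
    std::memcpy(&value, &raw, sizeof raw);      // copy the masked pattern back
    return value;
}

// Hypothetical helper: build a float from a raw 32-bit pattern.
float float_from_bits(std::uint32_t raw) {
    float value;
    std::memcpy(&value, &raw, sizeof value);
    return value;
}
```

For instance, the patterns `0x3F800001` and `0x3F800FFF` both mask to `1.0f` when 12 bits are cleared. Masking narrows the mismatch rather than eliminating it: two estimates that fall on opposite sides of a masking boundary still differ afterwards.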

I can't see how a larger register file would alter the results (thankfully SSE
registers have no hidden bits like FPU ones), so I assume there must be some
instruction differences between the two builds.  Has anyone compared the asm
for the inner loops between platforms?

Original comment by [email protected] on 29 May 2008 at 12:25


GoogleCodeExporter commented on July 30, 2024

I don't know about msvc, but gcc produces very different code when compiling
for x86 and x64 targets. Last time I checked NVTT was a bit faster in 64 bit
mode. There might be other differences, but the most important one is the
doubled register count.

I'll have a closer look at the code over the weekend.

Original comment by [email protected] on 29 May 2008 at 7:15


GoogleCodeExporter commented on July 30, 2024

I have not checked the output assembly, although as I think I saw in the squish
library, most of the calculations are done through SSE, right? It was my
understanding that the SSE stuff was pretty much identical across the two modes
(32/64 bit). It's not as if I'm actually switching processor, it's the same
machine I'm running the different executables on... so I don't think it's some
instructions that are different. The larger register file should only matter if
we are compiling the library with the unsafe math transformations enabled (I'm
not sure if we are) but under just normal ANSI rules, there should be no
differences from the compiler in the emitted code, regardless of the larger
register file... or so I think :)

I'm very curious if you find anything Ignacio, as this really gives me a
slightly uneasy feeling as we're transitioning from 32-bit to exclusive
64-bit...

Original comment by [email protected] on 31 May 2008 at 6:35


GoogleCodeExporter commented on July 30, 2024

Yes, the code is compiled with the "precise" floating point model.

I had a closer look at the asm code for 32 and 64 targets, and while
instructions are scheduled very differently, I analyzed a few expressions, and
they seem to be coded the same way.

I guess I'll have to debug it side by side in order to find out where the
computations diverge. I'll let you know if I find anything.
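
One way to approach that side-by-side comparison is to record intermediate values from each build and compare their exact bit patterns instead of their printed decimal values. The sketch below is only an illustration under that assumption; `float_bits` and `first_divergence` are hypothetical names and not part of NVTT or squish.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: return the raw 32-bit pattern of a float.
std::uint32_t float_bits(float value) {
    std::uint32_t raw;
    std::memcpy(&raw, &value, sizeof raw);
    return raw;
}

// Hypothetical helper: index of the first element whose bit pattern differs
// between two traces, or the length of the shorter trace if none differs.
std::size_t first_divergence(const std::vector<float>& a,
                             const std::vector<float>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (float_bits(a[i]) != float_bits(b[i])) {
            return i;
        }
    }
    return n;
}
```

Comparing bit patterns catches differences in the last bit that a default decimal printout would hide, and it also distinguishes `0.0f` from `-0.0f`.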

Original comment by [email protected] on 19 Jun 2008 at 1:19

Changed state: Accepted


GoogleCodeExporter commented on July 30, 2024

Ok, I've located the problem. The function ComputePrincipleComponent produces
slightly different results in 64 and 32 bit targets. 

This function uses standard floating point arithmetic (no SSE intrinsics). The
32 bit compiler produces code that uses x87 instructions, even when SSE2 is
enabled. On the other hand, the 64 bit compiler always uses SSE instructions.
This obviously produces different results.
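
The effect can be shown with a small portable sketch. It does not reproduce x87 behaviour bit for bit; it uses `double` as a stand-in for wider intermediates, and the function names are made up for this example. The first function rounds every intermediate to a 32-bit float, as scalar SSE code does, while the second keeps the intermediate wider and rounds once at the end.

```cpp
// Every intermediate is rounded to a 32-bit float.
float sum_float_steps(float a, float b, float c) {
    volatile float t = a + b;  // volatile forces a store, so t is a true float
    return t + c;
}

// The intermediate is kept in a wider type and rounded once at the end.
// This imitates, but does not exactly match, x87 extended precision.
float sum_wide_steps(float a, float b, float c) {
    double t = static_cast<double>(a) + static_cast<double>(b);
    return static_cast<float>(t + static_cast<double>(c));
}
```

With the inputs `1e8f`, `1.0f` and `-1e8f`, the first function returns `0.0f` because `1e8f + 1.0f` rounds back to `1e8f` in single precision, while the second returns `1.0f`. A difference of this kind in the covariance computation can be enough to change which encoding gets selected.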

There are several possible workarounds. The ideal solution would be to 
vectorize the
functions: ComputePrincipleComponent and ComputeWeightedCovariance. This is not 
too
hard, and would produce even slightly results.

A simpler workaround is to reduce the x87 floating point precision. That seems
to work, at least with this particular code, where there are no transcendental
functions.

I'm not gonna do anything to fix this on my side, but applications can set the
floating point flags themselves:

_controlfp(_PC_24, _MCW_PC);

Let me know if that works for you.
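
For applications that build for several targets, the call can be wrapped so that it only takes effect where it applies. This is a minimal sketch with a hypothetical function name; it assumes that the precision control only matters for 32 bit MSVC builds, since as far as I know MSVC does not support changing the precision control on x64.

```cpp
#include <float.h>

// Hypothetical wrapper: returns true if the x87 precision control was set to
// 24 bits, false on targets where the setting does not apply (compilers other
// than MSVC, or MSVC targeting x64).
bool use_single_precision_x87() {
#if defined(_MSC_VER) && defined(_M_IX86)
    _controlfp(_PC_24, _MCW_PC);  // same call as suggested in the comment
    return true;
#else
    return false;
#endif
}
```

An application would call this once on each thread that runs the compressor, before compressing. The setting affects all x87 arithmetic on that thread, so code that relies on higher intermediate precision elsewhere may need the previous control word restored afterwards.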

Original comment by [email protected] on 19 Jun 2008 at 9:30

Changed state: WontFix


GoogleCodeExporter commented on July 30, 2024

I've added a wiki page explaining the issue and the workaround:

http://code.google.com/p/nvidia-texture-tools/wiki/CompressionDifferences

Original comment by [email protected] on 19 Jun 2008 at 9:50


GoogleCodeExporter commented on July 30, 2024

Thanks Ignacio for tracking this one down. Much like you I thought that the
regular 32 bit version also only had SSE instructions in the implementation,
but it seems that there were a few regular scalar floating point instructions.

No worries on my part, we're all 64 bit here :)

Original comment by [email protected] on 20 Jun 2008 at 5:46
