
Comments (10)

GoogleCodeExporter commented on July 30, 2024
Jim, the CUDA compiler produces slightly different code when compiling for 32- or 64-bit targets. My first guess is that what you are seeing are just small floating-point differences. The 64-bit target supports 64-bit pointers, which may result in larger register space requirements and slightly different optimization strategies. I'll try to confirm this later tonight or tomorrow.


Original comment by [email protected] on 26 May 2008 at 9:01

from nvidia-texture-tools.

GoogleCodeExporter commented on July 30, 2024
Oh, I'm sorry Ignacio, I didn't make it clear that I don't have CUDA enabled for either of the executables (or so I think; I might have made an error). This should be the straight-up CPU implementation, and in some cases the differences are quite noticeable.

Original comment by [email protected] on 26 May 2008 at 9:03


GoogleCodeExporter commented on July 30, 2024
Oh, I see. I would still assume it's just floating-point differences. The compiler has twice as many SSE registers when targeting x64 (16 instead of 8), so it could lay down the expressions in a slightly different way. However, it could be something else. I remember Simon mentioned a similar issue on some targets. Let me check with him.

Original comment by [email protected] on 26 May 2008 at 9:20


GoogleCodeExporter commented on July 30, 2024
The issue I discovered was that RCPSS is implemented differently on Intel and AMD hardware, which could result in different encodings on these two platforms. A workaround is to mask off some of the low bits of the estimate, which I'm considering for the next squish release.

I can't see how a larger register file would alter the results (thankfully SSE registers have no hidden extra-precision bits, unlike x87 FPU registers), so I assume there must be some instruction differences between the two builds. Has anyone compared the asm for the inner loops between platforms?
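The masking workaround can be sketched in portable C. This is a minimal sketch, not squish's code: the helper name `mask_low_mantissa_bits` and the choice of 12 bits are assumptions. The idea is that RCPSS only guarantees about 12 bits of relative precision (Intel documents a relative error of at most 1.5 × 2^-12), so the low mantissa bits are the ones most likely to differ between vendors, and zeroing them makes two slightly different estimates collapse to the same value in most cases.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear the lowest `bits` mantissa bits of an IEEE-754
   single-precision value, e.g. an RCPSS reciprocal estimate. For finite
   values this truncates the magnitude toward zero; sign and exponent bits
   are left untouched. */
float mask_low_mantissa_bits(float x, unsigned bits)
{
    assert(bits <= 23);                   /* only the 23 stored mantissa bits */
    uint32_t u;
    memcpy(&u, &x, sizeof u);             /* type-pun without aliasing UB */
    u &= ~((UINT32_C(1) << bits) - 1u);   /* zero the low `bits` bits */
    memcpy(&x, &u, sizeof u);
    return x;
}
```

Masking reduces the chance of a mismatch but does not eliminate it: two estimates that straddle a masking boundary still differ after truncation.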

Original comment by [email protected] on 29 May 2008 at 12:25


GoogleCodeExporter commented on July 30, 2024
I don't know about MSVC, but GCC produces very different code when compiling for x86 and x64 targets. Last time I checked, NVTT was a bit faster in 64-bit mode. There might be other differences, but the most important one is the doubled register count.

I'll have a closer look at the code over the weekend.

Original comment by [email protected] on 29 May 2008 at 7:15


GoogleCodeExporter commented on July 30, 2024
I have not checked the output assembly, but as far as I could see in the squish library, most of the calculations are done through SSE, right? It was my understanding that the SSE code was pretty much identical across the two modes (32/64-bit). It's not as if I'm actually switching processors; it's the same machine I'm running the different executables on, so I don't think differing instruction implementations are the cause. The larger register file should only matter if we are compiling the library with unsafe math transformations enabled (I'm not sure if we are), but under normal ANSI rules the compiler should emit code that computes the same results, regardless of the larger register file... or so I think :)
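The point about unsafe math transformations can be made concrete: floating-point addition is not associative, so a compiler that is allowed to reorder a sum (as GCC's `-ffast-math` or MSVC's `/fp:fast` permit) can change the result, while under strict rules it has to keep the order as written. A minimal sketch; the function names are illustrative, and `volatile` is used only to force each partial sum to be rounded to `float`.

```c
/* Sum three floats in the two possible association orders. Each partial sum
   is stored in a volatile float so it is rounded to single precision, even on
   a compiler that would otherwise keep it in a wider register. */
float sum_left(float a, float b, float c)
{
    volatile float t = a + b;   /* (a + b) + c */
    return t + c;
}

float sum_right(float a, float b, float c)
{
    volatile float t = b + c;   /* a + (b + c) */
    return a + t;
}
```

With a = 16777216 (2^24) and b = c = 1, `sum_left` returns 16777216, because 2^24 + 1 is an exact tie that rounds to even, while `sum_right` returns 16777218.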

I'm very curious whether you find anything, Ignacio, as this really gives me a slightly uneasy feeling now that we're transitioning from 32-bit to exclusively 64-bit...

Original comment by [email protected] on 31 May 2008 at 6:35


GoogleCodeExporter commented on July 30, 2024
Yes, the code is compiled with the "precise" floating-point model.

I had a closer look at the asm code for the 32- and 64-bit targets, and while the instructions are scheduled very differently, the few expressions I analyzed seem to be coded the same way.

I guess I'll have to debug it side by side in order to find out where the computations diverge. I'll let you know if I find anything.
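For side-by-side debugging, printing intermediates in decimal can hide one-ulp differences. One option, sketched here under the assumption that both builds log the same intermediates in the same order, is to log the raw IEEE-754 bit pattern of each value so the two logs can be compared with a plain text diff. The helper names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Raw IEEE-754 bit pattern of a float. */
uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical trace formatter: writes "label = 0xXXXXXXXX" into buf.
   Returns snprintf's result (the length of the untruncated line). */
int format_float_trace(char *buf, size_t size, const char *label, float x)
{
    return snprintf(buf, size, "%s = 0x%08" PRIX32, label, float_bits(x));
}

/* Convenience wrapper that prints one trace line to stderr. */
void trace_float(const char *label, float x)
{
    char line[128];
    format_float_trace(line, sizeof line, label, x);
    fprintf(stderr, "%s\n", line);
}
```

Calling something like `trace_float("dot", d)` at the same points in both executables (the label and variable are illustrative) and diffing the logs shows the first line where the bit patterns diverge.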

Original comment by [email protected] on 19 Jun 2008 at 1:19

  • Changed state: Accepted


GoogleCodeExporter commented on July 30, 2024
Ok, I've located the problem. The function ComputePrincipleComponent produces slightly different results on 64- and 32-bit targets.

This function uses standard floating-point arithmetic (no SSE intrinsics). The 32-bit compiler produces code that uses x87 instructions, even when SSE2 is enabled. On the other hand, the 64-bit compiler always uses SSE instructions. This produces different results, because x87 code keeps intermediate values in wider registers, while SSE rounds every operation to single precision.
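The effect can be reproduced without looking at assembly. The sketch below evaluates the same expression twice: once rounding the intermediate to `float` (as SSE scalar code does), and once carrying it in `double` and rounding only the final result. `double` is only a stand-in for x87's wider registers, not an exact model of them. `FLT_EVAL_METHOD` from `<float.h>` reports which scheme the current compiler uses; the function names are illustrative.

```c
#include <float.h>

/* (a + b) - a with the intermediate rounded to float, as SSE scalar code does.
   volatile forces the partial result to be stored as a 32-bit float. */
float diff_single(float a, float b)
{
    volatile float t = a + b;
    return t - a;
}

/* The same expression with a wider intermediate, rounded to float only at the
   end. double stands in for x87's wider registers; it is not an exact model. */
float diff_wide(float a, float b)
{
    double t = (double)a + (double)b;
    return (float)(t - (double)a);
}

/* FLT_EVAL_METHOD: 0 = operations evaluated in their own type (typical of SSE
   code generation), 2 = evaluated in long double (typical of x87 code),
   1 = float evaluated as double, -1 = indeterminate. */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```

With a = 1 and b = FLT_EPSILON / 2 (that is, 2^-24), `diff_single` returns 0 because 1 + 2^-24 rounds back to 1 in single precision, while `diff_wide` returns 2^-24.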

There are several possible workarounds. The ideal solution would be to vectorize the functions ComputePrincipleComponent and ComputeWeightedCovariance. This is not too hard, and would produce even slightly results.
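The vectorization route can be sketched with SSE intrinsics. Because the arithmetic is written as packed SSE operations, the 32- and 64-bit builds should compute it with the same instructions instead of one of them falling back to x87. This is a minimal sketch, not NVTT's code: the function name, the point/weight layout, and leaving the covariance unnormalized are assumptions, and it requires an x86 or x86-64 target with SSE. It uses an exact division for the centroid rather than `_mm_rcp_ps`, given the RCPSS vendor differences mentioned earlier in the thread.

```c
#include <xmmintrin.h>

/* Hypothetical weighted covariance of n 3-D points using packed SSE arithmetic.
   out receives the six unique entries of the symmetric 3x3 matrix in the order
   xx, xy, xz, yy, yz, zz (not divided by the total weight). */
void weighted_covariance_sse(const float (*points)[3], const float *weights,
                             int n, float out[6])
{
    __m128 sum = _mm_setzero_ps();
    __m128 total = _mm_setzero_ps();

    /* Weighted centroid: sum(w * p) / sum(w). Lane 3 stays zero throughout. */
    for (int i = 0; i < n; ++i) {
        __m128 p = _mm_setr_ps(points[i][0], points[i][1], points[i][2], 0.0f);
        __m128 w = _mm_set1_ps(weights[i]);
        sum = _mm_add_ps(sum, _mm_mul_ps(w, p));
        total = _mm_add_ps(total, w);
    }
    __m128 centroid = _mm_setzero_ps();
    if (_mm_cvtss_f32(total) > 0.0f)
        centroid = _mm_div_ps(sum, total);   /* exact IEEE division, no RCPSS */

    __m128 c0 = _mm_setzero_ps();   /* accumulates xx, xy, xz, yy */
    __m128 c1 = _mm_setzero_ps();   /* accumulates yz, zz, 0, 0 */
    for (int i = 0; i < n; ++i) {
        __m128 p = _mm_setr_ps(points[i][0], points[i][1], points[i][2], 0.0f);
        __m128 d = _mm_sub_ps(p, centroid);                  /* (dx, dy, dz, 0) */
        __m128 wd = _mm_mul_ps(_mm_set1_ps(weights[i]), d);  /* w * (dx, dy, dz, 0) */
        /* (w dx, w dx, w dx, w dy) * (dx, dy, dz, dy) */
        c0 = _mm_add_ps(c0, _mm_mul_ps(_mm_shuffle_ps(wd, wd, _MM_SHUFFLE(1, 0, 0, 0)),
                                       _mm_shuffle_ps(d, d, _MM_SHUFFLE(1, 2, 1, 0))));
        /* (w dy, w dz, 0, 0) * (dz, dz, 0, 0) */
        c1 = _mm_add_ps(c1, _mm_mul_ps(_mm_shuffle_ps(wd, wd, _MM_SHUFFLE(3, 3, 2, 1)),
                                       _mm_shuffle_ps(d, d, _MM_SHUFFLE(3, 3, 2, 2))));
    }

    float lo[4], hi[4];
    _mm_storeu_ps(lo, c0);
    _mm_storeu_ps(hi, c1);
    out[0] = lo[0]; out[1] = lo[1]; out[2] = lo[2]; out[3] = lo[3];
    out[4] = hi[0]; out[5] = hi[1];
}
```

For the two points (1, 2, 3) and (3, 2, 1) with unit weights, the centroid is (2, 2, 2) and the result is xx = 2, xz = -2, zz = 2, with the other entries zero.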

A simpler workaround is to reduce the x87 floating-point precision. That seems to work, at least with this particular code, where there are no transcendental functions.

I'm not gonna do anything to fix this on my side, but applications can set the floating-point flags themselves:

_controlfp(_PC_24, _MCW_PC);

Let me know if that works for you.
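A slightly fuller version of the `_controlfp` call, sketched under a few assumptions: the helper names are hypothetical; the change is applied only to 32-bit x86 builds, since the x64 compilers use SSE for this code anyway and, as far as I know, Microsoft's CRT documentation says changing the precision-control mask is not supported on x64; and the GCC/glibc branch, which uses `<fpu_control.h>`, is an addition not discussed in this thread. The previous control word is saved so it can be restored after compression. Precision control shortens only the significand, not the exponent range, so results can still differ for values near overflow or underflow.

```c
#include <float.h>    /* _controlfp, _PC_24, _MCW_PC on MSVC */
#include <stdlib.h>   /* pulls in <features.h> on glibc, defining __GLIBC__ */

#if defined(__GLIBC__) && defined(__i386__)
#include <fpu_control.h>
#endif

/* Hypothetical helper: switch the x87 FPU to 24-bit (single) precision on
   32-bit x86 builds so x87 code rounds like SSE single-precision code.
   Saves the previous control word in *saved and returns 1 if the setting
   was changed, 0 on targets where it does not apply (e.g. x64). */
int set_x87_single_precision(unsigned int *saved)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    *saved = _controlfp(0, 0);        /* mask 0: read without changing */
    _controlfp(_PC_24, _MCW_PC);      /* the call suggested above */
    return 1;
#elif defined(__GLIBC__) && defined(__i386__)
    fpu_control_t cw;
    _FPU_GETCW(cw);
    *saved = cw;
    cw = (fpu_control_t)((cw & ~_FPU_EXTENDED) | _FPU_SINGLE);
    _FPU_SETCW(cw);
    return 1;
#else
    (void)saved;
    return 0;                         /* SSE-only target: nothing to do */
#endif
}

/* Restore the precision-control setting saved by set_x87_single_precision. */
void restore_x87_precision(unsigned int saved)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    _controlfp(saved, _MCW_PC);       /* restore only the precision bits */
#elif defined(__GLIBC__) && defined(__i386__)
    fpu_control_t cw = (fpu_control_t)saved;
    _FPU_SETCW(cw);
#else
    (void)saved;
#endif
}
```

An application would call `set_x87_single_precision` before compressing and `restore_x87_precision` afterwards, so the rest of the program keeps its usual x87 precision.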

Original comment by [email protected] on 19 Jun 2008 at 9:30

  • Changed state: WontFix


GoogleCodeExporter commented on July 30, 2024
I've added a wiki page explaining the issue and the workaround:

http://code.google.com/p/nvidia-texture-tools/wiki/CompressionDifferences

Original comment by [email protected] on 19 Jun 2008 at 9:50


GoogleCodeExporter commented on July 30, 2024
Thanks, Ignacio, for tracking this one down. Much like you, I thought that the regular 32-bit version also used only SSE instructions in the implementation, but it seems that there were a few regular scalar floating-point instructions.

No worries on my part, we're all 64-bit here :)

Original comment by [email protected] on 20 Jun 2008 at 5:46
