Comments (10)
Jim, the CUDA compiler produces slightly different code when compiling for 32
or 64 bit targets. My first guess is that what you are seeing are just small
floating point differences. The 64 bit target supports 64 bit pointers, which
may result in larger register space requirements, and slightly different
optimization strategies. I'll try to confirm this later tonight or tomorrow.
Original comment by [email protected]
on 26 May 2008 at 9:01
from nvidia-texture-tools.
Oh, I'm sorry Ignacio, I didn't make it clear that I don't have CUDA enabled
for either of the executables (or so I think, I might have made an error). This
should be for the straight up CPU implementation. And in some cases the
differences are quite noticeable.
Original comment by [email protected]
on 26 May 2008 at 9:03
from nvidia-texture-tools.
Oh, I see. I would still assume it's just floating point differences. The
compiler has twice as many SSE registers when targeting x64. So, it could lay
down the expressions in a slightly different way. However, it could be
something else. I remember Simon mentioned a similar issue on some targets. Let
me check with him.
Original comment by [email protected]
on 26 May 2008 at 9:20
from nvidia-texture-tools.
The issue I discovered was that RCPSS is implemented differently on Intel and
AMD hardware, which could result in different encodings on these two platforms.
A workaround is to mask off some of the low bits of the estimate, which I'm
considering for the next squish release.
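A minimal sketch of what masking off the low bits of an estimate could look
like. The helper name mask_low_mantissa_bits and the choice of how many bits
to clear are assumptions for illustration only; this is not code from squish.
Note that two estimates that straddle a mask boundary would still end up
different, so this narrows the problem rather than guaranteeing identical
output.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of squish): clears the lowest `bits` bits of
// the 23-bit mantissa of a 32-bit float. Expects 0 <= bits <= 23.
inline float mask_low_mantissa_bits(float value, int bits) {
    std::uint32_t raw;
    std::memcpy(&raw, &value, sizeof raw);   // reinterpret the float's bits
    const std::uint32_t keep = ~((std::uint32_t{1} << bits) - 1u);
    raw &= keep;                             // zero the low mantissa bits
    std::memcpy(&value, &raw, sizeof value);
    return value;
}
```

With 12 bits cleared, two estimates that differ only in their last few
mantissa bits usually collapse to the same value before they are used any
further.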
I can't see how a larger register file would alter the results (thankfully SSE
registers have no hidden bits like FPU ones), so I assume there must be some
instruction differences between the two builds. Has anyone compared the asm for
the inner loops between platforms?
Original comment by [email protected]
on 29 May 2008 at 12:25
from nvidia-texture-tools.
I don't know about MSVC, but gcc produces very different code when compiling
for x86 and x64 targets. Last time I checked NVTT was a bit faster in 64 bit
mode. There might be other differences, but the most important one is the
doubled register count.
I'll have a closer look at the code over the weekend.
Original comment by [email protected]
on 29 May 2008 at 7:15
from nvidia-texture-tools.
I have not checked the output assembly, although as I think I saw in the squish
library, most of the calculations are done through SSE, right? It was my
understanding that the SSE stuff was pretty much identical across the two modes
(32/64 bit). It's not as if I'm actually switching processors, it's the same
machine I'm running the different executables on... so I don't think it's some
instructions that are different. The larger register file should only matter if
we are compiling the library with the unsafe math transformations enabled (I'm
not sure if we are) but under just normal ANSI rules, there should be no
differences from the compiler in the emitted code, regardless of the larger
register file... or so I think :)
I'm very curious if you find anything Ignacio, as this really gives me a
slightly uneasy feeling as we're transitioning from 32-bit to exclusive
64-bit...
Original comment by [email protected]
on 31 May 2008 at 6:35
from nvidia-texture-tools.
Yes, the code is compiled with the "precise" floating point model.
I had a closer look at the asm code for 32 and 64 targets, and while
instructions are scheduled very differently, I analyzed a few expressions, and
they seem to be coded the same way.
I guess I'll have to debug it side by side in order to find out where the
computations diverge. I'll let you know if I find anything.
Original comment by [email protected]
on 19 Jun 2008 at 1:19
- Changed state: Accepted
from nvidia-texture-tools.
Ok, I've located the problem. The function ComputePrincipleComponent produces
slightly different results in 64 and 32 bit targets.
This function uses standard floating point arithmetic (no SSE intrinsics). The
32 bit compiler produces code that uses x87 instructions, even when SSE2 is
enabled. On the other hand, the 64 bit compiler always uses SSE instructions.
This obviously produces different results, because the x87 unit evaluates
intermediates at a higher precision than 32 bit SSE floats.
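To illustrate why the two code paths can diverge, here is a small portable
sketch. It does not use x87 or SSE directly: it only emulates the effect by
rounding to float after every addition in one function and keeping a wider
accumulator in the other. The function names are made up for this example and
are not taken from squish or NVTT.

```cpp
#include <cassert>

// Rounds the running total to a 32-bit float after every addition, which is
// roughly how scalar SSE code behaves for float arithmetic.
inline float sum_rounded_each_step(const float* values, int count) {
    assert(count >= 0);
    volatile float total = 0.0f;  // volatile forces a store, hence a rounding
    for (int i = 0; i < count; ++i) {
        total = total + values[i];
    }
    return total;
}

// Keeps the running total in a wider type and rounds to float only once at
// the end, loosely modelling x87 code that holds intermediates in registers.
inline float sum_rounded_once(const float* values, int count) {
    assert(count >= 0);
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += values[i];
    }
    return static_cast<float>(total);
}
```

Feeding both functions 1.0f followed by several terms too small to change a
float total on their own (for example 1.0f / 33554432.0f) gives 1.0f from the
first and a slightly larger value from the second.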
There are several possible workarounds. The ideal solution would be to
vectorize the functions ComputePrincipleComponent and
ComputeWeightedCovariance. This is not too hard, and would produce even
slightly results.
A simpler workaround is to reduce the x87 floating point precision. That seems
to work, at least with this particular code, where there are no transcendental
functions.
I'm not gonna do anything to fix this on my side, but applications can set the
floating point flags themselves:
_controlfp(_PC_24, _MCW_PC);
Let me know if that works for you.
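For applications that prefer not to change the precision globally, one option
is to scope the change around the compression call and restore the previous
setting afterwards. The sketch below assumes a 32 bit MSVC build; on every
other target it compiles to a no-op. ScopedX87SinglePrecision and
with_reduced_x87_precision are hypothetical names for this example, not part
of NVTT.

```cpp
#include <utility>
#if defined(_MSC_VER) && defined(_M_IX86)
#include <float.h>
#endif

// Hypothetical RAII helper: lowers the x87 precision control to 24 bits for
// the lifetime of the object, then restores the previous precision setting.
class ScopedX87SinglePrecision {
public:
    ScopedX87SinglePrecision() {
#if defined(_MSC_VER) && defined(_M_IX86)
        saved_ = _controlfp(0, 0);       // a zero mask only reads the word
        _controlfp(_PC_24, _MCW_PC);     // same call as the workaround above
#endif
    }
    ~ScopedX87SinglePrecision() {
#if defined(_MSC_VER) && defined(_M_IX86)
        _controlfp(saved_, _MCW_PC);     // restore only the precision bits
#endif
    }
    ScopedX87SinglePrecision(const ScopedX87SinglePrecision&) = delete;
    ScopedX87SinglePrecision& operator=(const ScopedX87SinglePrecision&) = delete;

private:
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned int saved_ = 0;
#endif
};

// Runs `work` with reduced x87 precision and returns whatever it returns.
template <typename Callable>
auto with_reduced_x87_precision(Callable&& work)
    -> decltype(std::forward<Callable>(work)()) {
    ScopedX87SinglePrecision guard;
    return std::forward<Callable>(work)();
}
```

The application's own compression call would go inside the callable passed to
with_reduced_x87_precision, so the rest of the program keeps its usual
floating point settings.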
Original comment by [email protected]
on 19 Jun 2008 at 9:30
- Changed state: WontFix
from nvidia-texture-tools.
I've added a wiki page explaining the issue and the workaround:
http://code.google.com/p/nvidia-texture-tools/wiki/CompressionDifferences
Original comment by [email protected]
on 19 Jun 2008 at 9:50
from nvidia-texture-tools.
Thanks Ignacio for tracking this one down. Much like you I thought that the
regular 32 bit version also only had SSE instructions in the implementation,
but it seems that there were a few regular scalar floating point instructions.
No worries on my part, we're all 64 bit here :)
Original comment by [email protected]
on 20 Jun 2008 at 5:46
from nvidia-texture-tools.