
Comments (13)

disheng222 commented on June 9, 2024

from sz.

disheng222 commented on June 9, 2024

bd4 commented on June 9, 2024

The issues with small data (#9) are fixed, but this issue remains.

disheng222 commented on June 9, 2024

bd4 commented on June 9, 2024

I'm still wrapping my head around the fp representation issues that lead to this - is there a way to define the possible error more precisely, e.g. in terms of the bound and machine epsilon?
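
One way to put a number on it: before SZ ever runs, a decimal value near x already carries up to half an ulp (unit in the last place) of representation error at x's magnitude. A quick Python sketch (the helper name and the normal-range assumption are mine):

```python
import math

def float32_ulp(x):
    # Spacing between adjacent single-precision floats at the
    # magnitude of x (assumes x is in the normal float32 range).
    e = math.floor(math.log2(abs(x)))
    return 2.0 ** (e - 23)  # float32 has a 23-bit mantissa

# Near 128, adjacent float32 values are ~1.5e-5 apart, so any real
# number in that range picks up as much as ~7.6e-6 of representation
# error before compression even starts.
print(float32_ulp(128.178324))  # 1.52587890625e-05
```

So a plausible hedge is that the observed error can approach the requested bound plus roughly half an ulp at the data's magnitude, though that is a guess about the mechanism rather than a statement from the SZ authors.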

disheng222 commented on June 9, 2024

bd4 commented on June 9, 2024

I found that example confusing - is the idea that an internal floating point operation done by SZ (using single precision float) could produce a value of ~128.178324 decimal, which would get rounded to 01000011 00000000 00101101 10100111 (closer to 128.178329 decimal) to fit the result in the float? I don't see how 'the original data value' could be 128.178324, from the perspective of SZ, because SZ takes an array of floats that have already been represented in IEEE 754. The initial error between a supposed mathematically precise data value and the IEEE 754 representation happens before and independently from SZ compression. My test is comparing the input array of floats to the output array of floats, so it's only measuring the error introduced by the SZ round trip.

Regardless of the exact mechanism, I think the more practical question of concern to users is this: given a certain absolute or relative error bound and the IEEE 754 precision (float or double), what can be said about the error characteristics of this implementation of SZ? All the examples I have seen where it exceeds the absolute bound are very small, in the absolute sense. It seems possible that the absolute error could be higher (just based on the exponent/position of decimal), but maybe the relative error is still bounded?
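
A round-trip check along those lines, measuring both absolute and relative error, might look like this (plain-Python sketch; the function name is mine, not part of SZ's API):

```python
def roundtrip_errors(original, decompressed):
    # Maximum absolute and relative error introduced by a
    # compress/decompress round trip, skipping zeros for the
    # relative measure.
    max_abs = max(abs(o - d) for o, d in zip(original, decompressed))
    max_rel = max(
        (abs(o - d) / abs(o) for o, d in zip(original, decompressed) if o != 0),
        default=0.0,
    )
    return max_abs, max_rel

# Synthetic example: one value shifted by roughly the float32
# rounding magnitude discussed in this thread.
orig = [128.178324, 0.001, 42.0]
deco = [128.178329, 0.001, 42.0]
max_abs, max_rel = roundtrip_errors(orig, deco)
```

If the relative error stays bounded while the absolute error occasionally does not, that would support the reading that the overshoot scales with the exponent of the data.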

disheng222 commented on June 9, 2024

bd4 commented on June 9, 2024

> As you know, during the compression, SZ transforms the floating point data (such as 128.178324) to bytes, and the floating point data will be reconstructed by the bytes during the decompression step. What is interesting is that in the decompression step, I observed that the floating-point data may not be exactly the same number as the original one. I checked the binary format, and finally found that issue: 128.178329 and 128.178324 share the same binary representation, so we can't say that this is due to bugs in SZ. Any other lossy compressors may have the same issue.

Maybe I am not understanding the context, but saying that two numbers have the same representation seems misleading to me. Instead I think of numbers as being rounded to fit into a floating point representation. When performing floating point operations, the result must be rounded at the end to fit back into the representation. I'm trying to understand what step(s) of the compression are causing rounding that leads to the loss of the absolute error bound guarantee. Or if there is some byte manipulation that happens later that introduces the error in another way. And if there is still some guarantee that can be made, like 2 times the absolute error bound.
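
For what it's worth, the two decimal literals in question really do round to the same single-precision pattern, which is easy to confirm in Python:

```python
import struct

def float32_bits(x):
    # Round x to IEEE 754 single precision and show the raw bits.
    packed = struct.pack('>f', x)
    return ' '.join(f'{b:08b}' for b in packed)

a = float32_bits(128.178324)
b = float32_bits(128.178329)
print(a)       # 01000011 00000000 00101101 10100111
print(a == b)  # True: both decimals round to the same float32
```

So both statements are compatible: the two decimals are distinct real numbers that round to one representable float, and the open question is which step in SZ produces a value that then has to be rounded.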

Looking at the code, the curve fitting prediction is done using floating point operations. That prediction will be rounded, but I think it is checked against the absolute error bound in the same way I am checking it at the end? Using a floating point operation to compute the difference also has rounding, but it should be consistent between what my test does and what the code does, so I don't see how that can lead to extra error outside the strict absolute bound. I'm getting the error on the linear data though, which should be using this part of the code and not the unpredictable value compression.

jychoi-hpc commented on June 9, 2024

I have tried his test code and have a similar concern. Even with 64-bit doubles, the absolute error is considerably larger than the standard double machine epsilon (DBL_EPSILON: 2.22045e-16, defined in <float.h>), i.e., 0.0000100000033854 > DBL_EPSILON.

However, it would be better if SZ could guarantee a strict absolute error bound. What if SZ took a more conservative approach? For example, if a user asks for an absolute error bound of x, SZ could use x - e (machine epsilon) internally to strictly guarantee the result regardless of rounding errors.
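
That idea might be sketched like this (hypothetical helper, not SZ code; note that DBL_EPSILON is relative to 1.0, so scaling it by the data's magnitude is my addition to the proposal):

```python
import sys

def conservative_abs_bound(requested_bound, max_abs_value):
    # Shrink the user's requested absolute error bound by one ulp at
    # the data's magnitude, so that rounding inside the compressor
    # cannot push the final round-trip error past what was requested.
    slack = sys.float_info.epsilon * max_abs_value
    return requested_bound - slack

# For data with magnitudes up to ~128 and a requested bound of 1e-5:
tight = conservative_abs_bound(1e-5, 128.178329)  # slightly below 1e-5
```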

disheng222 commented on June 9, 2024

bd4 commented on June 9, 2024

I set a conditional breakpoint and stepped through the compression of the number that ends up being out of bounds:
[screenshot: sz-error-exceeds-bound]

Since diff is clearly greater than realPrecision, I don't understand why the code can't detect this and fall back to using compressSingleDoubleValue. What is the idea behind itvNum and intvCapacity?
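
For background on what those variables appear to control: SZ's linear-scaling quantization maps the prediction residual onto intervals of width 2*errorBound, and residuals that fall outside the interval capacity go down the lossless "unpredictable" path. A generic sketch (my reading of the scheme, not SZ's actual code):

```python
def quantize_residual(value, pred, eb, intv_capacity=65536):
    # Map the residual to a quantization interval index; indices that
    # do not fit in the capacity fall back to storing the value
    # exactly (the "unpredictable" path).
    index = round((value - pred) / (2 * eb))
    if abs(index) >= intv_capacity // 2:
        return None, value              # fall back: store losslessly
    recon = pred + 2 * eb * index       # what decompression rebuilds
    return index, recon

# In exact arithmetic |recon - value| <= eb, but both the division and
# the reconstruction round in floating point, so recon can land a hair
# more than eb away from value -- which could explain a bound overshoot
# even when no explicit diff-vs-realPrecision check fails.
```

If that reading is right, a final diff check against realPrecision before emitting the interval index (falling back to compressSingleDoubleValue on failure) would restore the strict bound at some cost in compression ratio.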

disheng222 commented on June 9, 2024
