error larger than abs error param on linear data,about szcompressor/sz

disheng222 commented on June 9, 2024

Hi Bryce, I can reproduce the issue on my computer. I realized that SZ has a bug in this particular testing. When the data are completely random, the prediction is completely wrong for each data point, and this may leads to only 1 node in the Huffman tree, and it seems that my code didn't deal with this case well. I need to have a weekly conf. call right now, and I will try to fix it thereafter. Thanks for your patience. Best, Sheng 2017-02-17 12:51 GMT-06:00 Bryce Allen <[email protected]>:

…

I'm not 100% sure this isn't an error in my test code, but I'm seeing absolute error that is larger than the abs error param I passed. I've only seen this happen when using certain sizes and small error bounds. For example: ./sztest-static 10000 1000 -5 size 10000x1000, 80000000 bytes [SZ] Reading SZ configuration file (sz.config) ... == LIN == out_size = 99809 (0.0012) out = 0x2104080 abs bound = 0.0000100000000000 roundtrip differs ERR: data error out of range: -0.0000100000033854 — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#10>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJW1ScKwadwZQEKYF6-6IpuaW6XVXXcNks5rdexIgaJpZM4MEnlr> .

from sz.

disheng222 commented on June 9, 2024

Hi Bryce, The bug is fixed. Please give it another try. :) Should you have more problems/questions, feel free to let me know. Thanks. Best, Sheng 2017-02-17 14:58 GMT-06:00 Sheng Di <[email protected]>:

…

Hi Bryce, I can reproduce the issue on my computer. I realized that SZ has a bug in this particular testing. When the data are completely random, the prediction is completely wrong for each data point, and this may leads to only 1 node in the Huffman tree, and it seems that my code didn't deal with this case well. I need to have a weekly conf. call right now, and I will try to fix it thereafter. Thanks for your patience. Best, Sheng 2017-02-17 12:51 GMT-06:00 Bryce Allen ***@***.***>: > I'm not 100% sure this isn't an error in my test code, but I'm seeing > absolute error that is larger than the abs error param I passed. I've only > seen this happen when using certain sizes and small error bounds. > > For example: > > ./sztest-static 10000 1000 -5 > size 10000x1000, 80000000 bytes > [SZ] Reading SZ configuration file (sz.config) ... > > == LIN == > out_size = 99809 (0.0012) > out = 0x2104080 > abs bound = 0.0000100000000000 > roundtrip differs > ERR: data error out of range: -0.0000100000033854 > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly, view it on GitHub > <#10>, or mute the thread > <https://github.com/notifications/unsubscribe-auth/AJW1ScKwadwZQEKYF6-6IpuaW6XVXXcNks5rdexIgaJpZM4MEnlr> > . >

from sz.

bd4 commented on June 9, 2024

The issues with small data (#9) are fixed, but this issue remains.

from sz.

disheng222 commented on June 9, 2024

Hi Bryce, I guess you mean some errors may exceed the error bound slightly, right? For example, -0.0000100000033854 vs. 1E-5. This issue is explained in the trouble-shooting section of the user-guide.pdf. Some times, this is because of the machine representation. So, this may not be a big problem for users, IMHO. :P Thanks. Best, Sheng 2017-02-20 10:03 GMT-06:00 Bryce Allen <[email protected]>:

…

The issues with small data (#9 <#9>) are fixed, but this issue remains. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#10 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJW1SVQ0rPyMW7cjR6VtyZwunVYARVNWks5rebk2gaJpZM4MEnlr> .

from sz.

bd4 commented on June 9, 2024

I'm still wrapping my head around the fp representation issues that lead to this - is there a way to define the possible error more precisely, e.g. in terms of the bound and machine epsilon?

from sz.

disheng222 commented on June 9, 2024

1. Why would the maximum decompression error be slightly larger than the specified error bound in some cases? Answer: This is due to the machine representation of the floating-point data. Here is one example. Suppose the error bound is set to 1E-5 and the original data value is 128.178314, and the decompressed value would be 128.178324 according to our compression method, which satisfies the error bound 1E-5. However, 128.178324 and 128.178329 has exactly the same IEEE 754 representation: 01000011 00000000 00101101 10100111. So, the decompressed value may be represented as 128.178329 instead of 128.178324. In this situation, the decompression error is 1.5E-5, which is larger than the error bound 1E-5. Hope the above answer is helpful. Best, Sheng 2017-02-21 15:40 GMT-06:00 Bryce Allen <[email protected]>:

…

I'm still wrapping my head around the fp representation issues that lead to this - is there a way to define the possible error more precisely, e.g. in terms of the bound and machine epsilon? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#10 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJW1SSLskLMXc-kkxniT_J4Y88GxJqJPks5re1nmgaJpZM4MEnlr> .

from sz.

bd4 commented on June 9, 2024

I found that example confusing - is the idea that an internal floating point operation done by SZ (using single precision float) could produce a value of ~128.178324 decimal, which would get rounded to 01000011 00000000 00101101 10100111 (closer to 128.178329 decimal) to fit the result in the float? I don't see how 'the original data value' could be 128.178324, from the perspective of SZ, because SZ takes an array of floats that have already been represented in IEEE 754. The initial error between a supposed mathematically precise data value and the IEEE 754 representation happens before and independently from SZ compression. My test is comparing the input array of floats to the output array of floats, so it's only measuring the error introduced by the SZ round trip.

Regardless of the exact mechanism, I think the more practical question of concern to users is this: given a certain absolute or relative error bound and the IEEE 754 precision (float or double), what can be said about the error characteristics of this implementation of SZ? All the examples I have seen where it exceeds the absolute bound are very small, in the absolute sense. It seems possible that the absolute error could be higher (just based on the exponent/position of decimal), but maybe the relative error is still bounded?

from sz.

disheng222 commented on June 9, 2024

2017-02-22 12:02 GMT-06:00 Bryce Allen <[email protected]>:

I found that example confusing - is the idea that an internal floating point operation done by SZ (using single precision float) could produce a value of ~128.178324 decimal, which would get rounded to 01000011 00000000 00101101 10100111 (closer to 128.178329 decimal) to fit the result in the float? I don't see how 'the original data value' could be 128.178324, from the perspective of SZ, because SZ takes an array of floats that have already been represented in IEEE 754.

As you know, during the compression, SZ transforms the floating point data (such as 128.178324) to bytes, and the floating point data will be reconstructed by the bytes during the decompression step. What is interesting is that in the decompression step, I observed that the floating-point data may not be exactly the same number as the original one. I checked the binary format, and finally found that issue: 128.178329 and 128.178324 share the same binary representation, so we can't say that this is due to bugs in SZ. Any other lossy compressors may have the same issue. You can check the binary representation of the two numbers using this website: http://www.binaryconvert.com/convert_float.html

The initial error between a supposed mathematically precise data value and the IEEE 754 representation happens before and independently from SZ compression. My test is comparing the input array of floats to the output array of floats, so it's only measuring the error introduced by the SZ round trip. Regardless of the exact mechanism, I think the more practical question of concern to users is this: given a certain absolute or relative error bound and the IEEE 754 precision (float or double), what can be said about the error characteristics of this implementation of SZ? All the examples I have seen where it exceeds the absolute bound are very small, in the absolute sense. It seems possible that the absolute error could be higher (just based on the exponent/position of decimal), but maybe the relative error is still bounded?

Not very sure if I understand your question. Let me try to answer it as follows. Absolute error are relative error actually are of no difference. In fact, for relative error, SZ needs to compute the absolute error based on the value range and the relative error bound ratio (such as 0.001) set by users. In this sense, they are same with respect to SZ. SZ just try to guarantee the "maximum errors" of the data points be under the absolute error bound specified by the user. The distribution of errors is uniform.

…

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <#10 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJW1SRB8xEIdO5WgebNztFK3_rcpP7ZMks5rfHgggaJpZM4MEnlr> .

from sz.

bd4 commented on June 9, 2024

As you know, during the compression, SZ transforms the floating point data
(such as 128.178324) to bytes, and the floating point data will be
reconstructed by the bytes during the decompression step. What is
interesting is that in the decompression step, I observed that the
floating-point data may not be exactly the same number as the original one.
I checked the binary format, and finally found that issue: 128.178329 and
128.178324 share the same binary representation, so we can't say that this
is due to bugs in SZ. Any other lossy compressors may have the same issue.

Maybe I am not understanding the context, but saying that two numbers have the same representation seems misleading to me. Instead I think of numbers as being rounded to fit into a floating point representation. When performing floating point operations, the result must be rounded at the end to fit back into the representation. I'm trying to understand what step(s) of the compression are causing rounding that leads to the loss of the absolute error bound guarantee. Or if there is some byte manipulation that happens later that introduces the error in another way. And if there is still some guarantee that can be made, like 2 times the absolute error bound.

Looking at the code, the curve fitting prediction is done using floating point operations. That prediction will be rounded, but I think it is checked against the absolute error bound in the same way I am checking it at the end? Using a floating point operation to compute the difference also has rounding, but it should be consistent between what my test does and what the code does, so I don't see how that can lead to extra error outside the strict absolute bound. I'm getting the error on the linear data though, which should be using this part of the code and not the unpredictable value compression.

from sz.

jychoi-hpc commented on June 9, 2024

I have tried his test code and am getting similar concern. Even with 64-bit doubles, the abs error is quite larger than the standard double machine epsilone (DBL_EPSILON: 2.22045e-16, defined in <float.h>). I.e, 0.0000100000033854 > DBL_EPSILON

However, it would be better if SZ can guarantee strict ABS error bound. What if SZ take more conservative approach? For an example, if an user ask x ABS error bound, SZ take x - e (machine epsilon) internally to strictly guarantee the result regardless of the rounding errors or etc.

from sz.

disheng222 commented on June 9, 2024

2017-02-23 16:33 GMT-06:00 Jong Choi <[email protected]>:

I have tried his test code and am getting similar concern. Even with 64-bit doubles, the abs error is quite larger than the standard double machine epsilone (DBL_EPSILON: 2.22045e-16, defined in <float.h>). I.e, 0.0000100000033854 > DBL_EPSILON However, it would be better if SZ can guarantee strict ABS error bound. What if SZ take more conservative approach? For an example, if an user ask x ABS error bound, SZ take x - e (machine epsilon) internally to strictly guarantee the result no matter what the rounding errors or etc.

Yes, this is a solution I used to consider, but honestly I prefer to leaving this to users, and instead I clarrified why the situation may happen in the user guide. Note that it's hard for the developer to get to know an appropriate 'e' for different cases. Moreover, we observe that the breaking-error-bound case happens rarely and also slightly if it happens. Also note that some situations are actually still inevitable: e.g., as for the exemple I gave in the user-guide, SZ already saved all the 32 bits for the data in that case. In fact, loss of information is inevistable in other lossy compressors. For example, ZFP has a step that aligns all the expeonent in each block and this will lead the compression errors exceed the bound when the data values change largely in small regions.

…

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <#10 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJW1SU14xGEc312WKtJCJehk5O-_TJLvks5rfglFgaJpZM4MEnlr> .

from sz.

bd4 commented on June 9, 2024

I set a conditional breakpoint and stepped through the compression of the number that ends up being out of bounds:

Since diff is clearly greater than realPrecision, I don't understand why the code can't detect this and fall back to using compressSingleDoubleValue. What is the idea behind itvNum and intvCapacity?

from sz.

disheng222 commented on June 9, 2024

Hi Bryce, Sorry for late reply, because I was too busy this morning. Note that pred2D is not the decompressed value. Basically, SZ predicts each data point by using its neighbors. As you can see here, for 2D data, it uses three neighboring points (on the left side and upper side). For each predicted value, SZ will check the prediction accuracy. It will use a kind of quantization method to encode the prediction. That is, we build some equal-sized bins (the bin size is twice of error bound) around the predicted value and check which bin the true value will fall in. For example, if the original data value is 1.25 and the predicted value is 1.0, and the error bound was set to 0.1. Then, we know that the true value is in the bin: [1.0+0.1*2,1.0+0.1*3], that is, the bin # 3. Then, we'll encode the number 3 instead. So, in principle, it's impossible to break the required error bound in the decompression. Note that in the prediction step, we used "decompressed data" to predict each data point instead of the "original data" in order to guarantee the compression errors in the bound. 2017-02-27 11:18 GMT-06:00 Bryce Allen <[email protected]>:

…

I set a conditional breakpoint and stepped through the compression of the number that ends up being out of bounds: [image: sz-error-exceeds-bound] <https://cloud.githubusercontent.com/assets/825586/23371534/ffae2bb8-fcdd-11e6-8d77-2015a5138b8c.png> Since diff is clearly greater than realPrecision, I don't understand why the code can't detect this and fall back to using compressSingleDoubleValue. What is the idea behind itvNum and intvCapacity ? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#10 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJW1ScSS01qVleADQ0s9xWmVJ7OuVhgAks5rgwV5gaJpZM4MEnlr> .

from sz.

error larger than abs error param on linear data about sz HOT 13 CLOSED

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent