Giter Site home page Giter Site logo

Comments (27)

claudeha avatar claudeha commented on September 1, 2024

(animation is smooth when I replicate the zoom animation maths in fqs)

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

have you tried...

#define USE_DOUBLE
#include "MathUtils.frag"
#include "Complex.frag"
#include "Progressive2D-4.frag"

dvec3 color(dvec2 p)
{ ...

will come back to this after some sleep...

from fragm.

claudeha avatar claudeha commented on September 1, 2024

just tried it - the animation is still broken (it has smooth parts, interspersed with step-changes/jumps that shouldn't be there)

from fragm.

claudeha avatar claudeha commented on September 1, 2024

Diagnosed the issue: double precision exp is totally broken (except between 0.5 and 1.0). Test frag (all options of Test Function slider should display a straight line on the positive X side):

#version 450 compatibility

#include "MathUtils.frag"
#include "Progressive2D.frag"

#define USE_DOUBLE
#define log dlog
#define exp dexp
#include "Complex.frag"
#undef log
#undef exp

#group Test
// 0 = identity; 1 = double exp; 2 = double log
uniform int Function; slider[0,0,2]

vec3 color(vec2 q)
{
  float logq = log(q.x);
  float expq = exp(q.x);
  float explogq = float(dexp(double(logq)));
  float logexpq = float(dlog(double(expq)));
  float testfid = q.x < q.y ? 1.0 : 0.0;
  float testdexp = explogq < q.y ? 1.0 : 0.0;
  float testdlog = logexpq < q.y ? 1.0 : 0.0;
  return vec3(vec3(testfid, testdexp, testdlog)[Function]);
}

from fragm.

claudeha avatar claudeha commented on September 1, 2024

Found the problems!

bool invert_answer = true;

bool invert_answer = true;

should instead be initialized as:

bool invert_answer = false;

and

answer *= exp_approx(x);

should be

answer *= exp_approx(f);

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

Bravo!

the note in Complex.frag for exp_approx says...

/* Approximation of f(x) = exp(x)
 * on interval [ 0, 0.5 ]
 * with a polynomial of degree 10.
 */

should the interval be larger 0.0 - 1.0 ?

I also see in my local copy of Complex.frag...

// Fixme pow(n) pow(n,n)

so no surprise, it still needs some tweaking

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

a better (more precise) power function derived from https://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/

// requires #extension GL_ARB_gpu_shader_int64 : enable
double pow(double a, double b) {

        bool ltz = b<0;
	if(ltz) b = abs(b);
	
	// put unpacked double bits into long int
	uvec2 unpacked = unpackDouble2x32(a);
	int64_t tmp = int64_t(unpacked.y) << 32 + unpacked.x;

	double r = 1.0;
	int exp = int(b);
	
	// use the IEEE 754 trick for the fraction of the exponent
	int64_t tmp2 = int64_t((b - exp) * (tmp - 4606921280493453312L)) + 4606921280493453312L;
	unpacked.y = uint(tmp2 >> 32);
	unpacked.x = uint(tmp2 - (int64_t(unpacked.y) << 32));
	
	// exponentiation by squaring
	while (exp != 0) {
		if ((exp & 1) != 0) {
			r *= a;
		}
		a *= a;
		exp >>= 1;
	}
	
	r *= packDouble2x32(unpacked);
	return ltz ? 1.0/r : r;

}

Old pow()
old-pow
New (this) pow()
new-pow

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

derived from https://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/

this one doesn't require int64_t so doesn't need #extension GL_ARB_gpu_shader_int64 : enable and should be faster, 5.44861 x faster than builtin pow() on CPU and without the less than zero test it is 5.8948 x faster on my xenon 4 core. the version above, using int64_t is only 1.2 x faster than builtin.

seems to generate a nice smooth image like the second one

double pow(double a, double b) {
	bool ltz = b<0;
	if(ltz) b = abs(b);
	// put unpacked double bits into long int
	uvec2 unpacked = unpackDouble2x32(a);
	double r = 1.0;
	int exp = int(b);
	// use the IEEE 754 trick for the fraction of the exponent
	unpacked.y = int((b - exp) * (unpacked.y - 1072632447) + 1072632447);
	unpacked.x = 0;
	// exponentiation by squaring
	while (exp != 0) {
		if ((exp & 1) != 0) r *= a;
		a *= a;
		exp >>= 1;
	}
	r *= packDouble2x32(unpacked);
	return ltz ? 1.0/r : r;
}

I'll see if I can find something better for log() and exp()

from fragm.

claudeha avatar claudeha commented on September 1, 2024

unpacked.x = 0 -> only accurate to 53-32=21 bits, afaict?

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

that separates the int and fraction parts, on the link mentioned 23 bits is the portion of interest afaikt

so the one from 2007 using int64_t retains better accuracy? need to do some testing on that to see exactly what the accuracy is, but it is better than it was :D there's another version with adjustable accuracy...

do either of these have problems with Mesa?

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

after some tests, replacing log() and exp() with CORDIC and other versions and getting the same bug I went back to the originals and just looked at it trying to figure out where the bug was, test, test, test and find it might be a GPU ROUNDING vs doubles thing... try adding x += 4.94065645841247E-308LF; as the first line in log() function of Complex.frag and run your test with pow(a,b) { return exp(log(a)*b); };

On my system it seems to render properly :D both the pow() function posted above using int64_t and the one using exp(log(a)*b) generate the same image w/o the funny truncating of the bow and skewing of the masts.

the value 4.94065645841247E-324 is double epsilon but too small, still shows the bug, so I find that 4.94065645841247E-308 is the smallest number to add to a that gets rid of the bug.

speculation: somewhere in the code a value is handled or passed from a float type or maybe compiler optimiszation, thus loosing some precision, while the Complex.frag functions are OK, I do recall a lot of testing when this was originally put together on FF to make sure we had enough precision and accuracy.

from fragm.

claudeha avatar claudeha commented on September 1, 2024

I guess the e-324 is denormalized/subnormal, wouldn't be surprised if GPU doesn't support that part of IEEE maths. But, I can't test now...

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

you are correct, exp() is terrible :( (need to use scroll bar at the bottom to see second diffs column)

Tabulated column is a precalculated target value
Built-in column is calculated with functions defined in <math.h>
TEST column is functions from Complex.frag -> C99
N is iterations to use in trig functions (does not apply to all)
the Difference columns are Tabulated minus Calculated

note: the log function has the same differences as the builtin function so that's good?

02 February 2019 06:44:16 PM


TEST001: Compute the cosine

       ANGLE    N       Cos(A)            Cos(A)             Difference       Cos(A)           Difference
                        Tabulated         Built-in                            TEST

           0    18                 1                 1                 0                 1                 0
  0.26179939    18      0.9659258263      0.9659258263                 0      0.9659258263                 0
         0.5    18      0.8775825619      0.8775825619                 0      0.8775825619                 0
  0.52359878    18      0.8660254038      0.8660254038                 0      0.8660254038                 0
  0.78539816    18      0.7071067812      0.7071067812                 0      0.7071067812                 0
           1    18      0.5403023059      0.5403023059                 0      0.5403023059                 0
   1.0471976    18               0.5               0.5      1.110223e-16               0.5                 0
   1.5707963    18                 0   6.123233996e-17     -6.123234e-17                 0                 0
           2    18     -0.4161468365     -0.4161468365                 0     -0.4161468365    -2.7755576e-16
           3    18     -0.9899924966     -0.9899924966                 0     -0.9899924966                 0
   3.1415927    18                -1                -1                 0                -1                 0
           4    18     -0.6536436209     -0.6536436209                 0     -0.6536436209                 0
           5    18      0.2836621855      0.2836621855                 0      0.2836621855    -3.8857806e-16

    Builtin Duration uSec 26    TEST Duration uSec 2


TEST002: Compute the sine

       ANGLE     N         Sin(A)             Sin(A)         Difference       Sin(A)           Difference
                           Tabulated          Built-in                        TEST

           0    18                 0                 0                 0                 0                 0
  0.26179939    18      0.2588190451      0.2588190451                 0      0.2588190451                 0
         0.5    18      0.4794255386      0.4794255386                 0      0.4794255386                 0
  0.52359878    18               0.5               0.5                 0               0.5                 0
  0.78539816    18      0.7071067812      0.7071067812                 0      0.7071067812                 0
           1    18      0.8414709848      0.8414709848                 0      0.8414709848                 0
   1.0471976    18      0.8660254038      0.8660254038                 0      0.8660254038                 0
   1.5707963    18                 1                 1                 0                 1      1.110223e-16
           2    18      0.9092974268      0.9092974268                 0      0.9092974268                 0
           3    18      0.1411200081      0.1411200081                 0      0.1411200081    -4.7184479e-16
   3.1415927    18   1.224646799e-16   1.224646799e-16                 0                 0     1.2246468e-16
           4    18     -0.7568024953     -0.7568024953                 0     -0.7568024953    -6.6613381e-16
           5    18     -0.9589242747     -0.9589242747                 0     -0.9589242747     1.4432899e-15

    Builtin Duration uSec 2    TEST Duration uSec 5


TEST003: Compute the arctangent of Y/X

        X             Y          N        ArcTan(Y/X)      ArcTan(Y/X)        Difference      ArcTan(Y/X)     Difference
                                          Tabulated        Built-in                           TEST

  -0.039866397            -0    18                 0                 0                 0                 0                 0
    0.33745059   0.084362648    18      0.2449786631      0.2449786631                 0      0.2449786631                 0
   -0.60703773   -0.20234591    18      0.3217505544      0.3217505544                 0      0.3217505544                 0
   -0.10716652  -0.053583262    18       0.463647609       0.463647609                 0       0.463647609                 0
   -0.44668236   -0.44668236    18      0.7853981634      0.7853981634                 0      0.7853981634     9.1952002e-12
    0.79869587     1.5973917    18       1.107148718       1.107148718                 0       1.107148718     -2.220446e-16
    0.39160358     1.1748107    18       1.249045772       1.249045772                 0       1.249045772                 0
    0.56534511     2.2613804    18       1.325817664       1.325817664                 0       1.325817664                 0
   -0.37139404    -1.8569702    18       1.373400767       1.373400767                 0       1.373400767                 0
   -0.46661976    -4.6661976    18       1.471127674       1.471127674                 0       1.471127674                 0
    0.71481238     14.296248    18       1.520837931       1.520837931                 0       1.520837931                 0

    Builtin Duration uSec 5    TEST Duration uSec 1


TEST004: Compute the tangent of THETA

         THETA     N        Tan(THETA)        Tan(THETA)        Difference      Tan(THETA)       Difference
                            Tabulated         Built-in                          TEST

             0    18                 0                 0                 0                 0                 0
    0.26179939    18      0.2679491924      0.2679491924    -5.5511151e-17      0.2679491924    -5.5511151e-17
           0.5    18      0.5463024898      0.5463024898                 0      0.5463024898                 0
    0.52359878    18      0.5773502692      0.5773502692     -1.110223e-16      0.5773502692     -1.110223e-16
    0.78539816    18                 1                 1      1.110223e-16                 1      1.110223e-16
             1    18       1.557407725       1.557407725                 0       1.557407725      2.220446e-16
     1.0471976    18       1.732050808       1.732050808    -4.4408921e-16       1.732050808     -2.220446e-16
     1.3089969    18       3.732050808       3.732050808    -4.4408921e-16       3.732050808    -4.4408921e-16
     1.4398966    18       7.595754113       7.595754113    -5.3290705e-15       7.595754113    -7.9936058e-15
     1.5053465    18       15.25705169       15.25705169    -3.5527137e-15       15.25705169    -1.0658141e-14
             2    18      -2.185039863      -2.185039863                 0      -2.185039863     1.3322676e-15
             3    18     -0.1425465431     -0.1425465431                 0     -0.1425465431     4.7184479e-16
     3.1415927    18                 0  -1.224646799e-16     1.2246468e-16                -0                 0
             4    18       1.157821282       1.157821282                 0       1.157821282      1.110223e-15
             5    18      -3.380515006      -3.380515006                 0      -3.380515006     4.4408921e-16

    Builtin Duration uSec 113    TEST Duration uSec 8


TEST005: Compute the exponential function

           X     N         Exp(X)            Exp(X)           Difference          Exp(X)          Difference
                           Tabulated         Built-in                             TEST

         -10    18   4.539992976e-05   4.539992976e-05                 0   4.539992976e-05    -2.0328791e-20
          -5    18    0.006737946999    0.006737946999                 0    0.006737946999    -1.7347235e-18
          -1    18      0.3678794412      0.3678794412                 0      0.3678794412                 0
           0    18                 1                 1                 0                 1                 0
       1e-08    18        1.00000001        1.00000001                 0        1.00000001                 0
      0.0001    18       1.000100005       1.000100005                 0       1.000100005                 0
       0.001    18         1.0010005         1.0010005                 0         1.0010005                 0
        0.01    18       1.010050167       1.010050167                 0       1.010050167                 0
         0.1    18       1.105170918       1.105170918                 0       1.105170918                 0
         0.2    18       1.221402758       1.221402758                 0       1.221402758                 0
         0.3    18       1.349858808       1.349858808                 0       1.349858808                 0
         0.4    18       1.491824698       1.491824698                 0       1.491824698                 0
         0.5    18       1.648721271       1.648721271                 0       1.648721271                 0
         0.6    18         1.8221188         1.8221188                 0       1.502083012        0.32003579
         0.7    18       2.013752707       2.013752707                 0       1.660058461        0.35369425
         0.8    18       2.225540928       2.225540928    -4.4408921e-16       1.834648334        0.39089259
         0.9    18       2.459603111       2.459603111                 0       2.027599983        0.43200313
           1    18       2.718281828       2.718281828                 0       2.718281828                 0
           2    18       7.389056099       7.389056099                 0       7.389056099     8.8817842e-16
   3.1415927    18       23.14069263       23.14069263     3.5527137e-15       23.14069263     1.0658141e-14
           5    18       148.4131591       148.4131591                 0       148.4131591     2.8421709e-14
          10    18       22026.46579       22026.46579                 0       22026.46579     1.0913936e-11
          20    18       485165195.4       485165195.4                 0       485165195.4     4.1723251e-07
          40    18   2.353852668e+17   2.353852668e+17                 0   2.353852668e+17               384

    Builtin Duration uSec 6    TEST Duration uSec 4

TEST006: Compute the natural logarithm

             X     N         Ln(X)              Ln(X)          Difference            Ln(X)         Difference
                             Tabulated          Built-in                             TEST

         1e-05    18      -11.51292546      -11.51292546                 0      -11.51292546                 0
          0.01    18      -4.605170186      -4.605170186    -8.8817842e-16      -4.605170186                 0
           0.1    18      -2.302585093      -2.302585093    -4.4408921e-16      -2.302585093    -4.4408921e-16
           0.2    18      -1.609437912      -1.609437912                 0      -1.609437912                 0
           0.3    18      -1.203972804      -1.203972804      2.220446e-16      -1.203972804      2.220446e-16
           0.4    18     -0.9162907319     -0.9162907319     -1.110223e-16     -0.9162907319     -1.110223e-16
           0.5    18     -0.6931471806     -0.6931471806                 0     -0.6931471806                 0
           0.6    18     -0.5108256238     -0.5108256238                 0     -0.5108256238                 0
           0.7    18     -0.3566749439     -0.3566749439     5.5511151e-17     -0.3566749439     5.5511151e-17
           0.8    18     -0.2231435513     -0.2231435513    -5.5511151e-17     -0.2231435513    -5.5511151e-17
           0.9    18     -0.1053605157     -0.1053605157    -1.3877788e-17     -0.1053605157    -1.3877788e-17
             1    18                 0                 0                 0                 0                 0
             2    18      0.6931471806      0.6931471806                 0      0.6931471806                 0
             3    18       1.098612289       1.098612289                 0       1.098612289      2.220446e-16
       3.14159    18       1.144729886       1.144729886                 0       1.144729886                 0
             5    18       1.609437912       1.609437912                 0       1.609437912                 0
            10    18       2.302585093       2.302585093                 0       2.302585093                 0
            20    18       2.995732274       2.995732274                 0       2.995732274                 0
           100    18       4.605170186       4.605170186                 0       4.605170186                 0
   1.23457e+08    18       18.63140177       18.63140177                 0       18.63140177                 0

    Builtin Duration uSec 5    TEST Duration uSec 8


  Normal end of execution.

from fragm.

claudeha avatar claudeha commented on September 1, 2024

musl libm https://git.musl-libc.org/cgit/musl/tree/src/math may be a source of accurate implementations? license is MIT. I looked at converting exp() but it needs int64 and has a huge table with hex-float literals (which afaik are not supported by GLSL).

from fragm.

claudeha avatar claudeha commented on September 1, 2024

Took a couple of hours but I managed to port musl's double exp(double) to GLSL and it seems to work ok. Needs some minor cleanups to avoid polluting the namespace so much, but more practically, there are some issues with overloading inbuilt functions, eg: double sin(double x) { return double(sin(float(x))); } gives an error about static recursion in my driver, instead of using inbuilt float sin(float). So maybe it would be better to define our own functions, perhaps like:

float Sin(float x) {return sin(x); }
double Sin(double x) { return /* computed via musl port to GLSL */; }

Exp.frag.txt
Exp-Test.frag.txt

from fragm.

claudeha avatar claudeha commented on September 1, 2024

https://bugs.freedesktop.org/show_bug.cgi?id=31744#c9 a.k.a. overloading builtins is a world of pain in GLSL

from fragm.

claudeha avatar claudeha commented on September 1, 2024

I think this kind of thing would be necessary for every math function:

float builtin_sin(float x) { return sin(x); } // save original sin
float sin(float x) { return builtin_sin(x); } // redefine sin
double sin(double x) { return ...; } // overload sin

And the fact that it's not actually recursive is highly non-obvious. To make matters worse, one also needs to save and redefine all the builtin overloads (vec2, vec3, ...) as well as just the float versions...

EDIT:

float builtin_sin(float x) { return sin(x); }
vec2 builtin_sin(vec2 x) { return sin(x); }
vec3 builtin_sin(vec3 x) { return sin(x); }
vec4 builtin_sin(vec4 x) { return sin(x); }

float sin(float x) { return builtin_sin(x); }
vec2 sin(vec2 x) { return builtin_sin(x); }
vec3 sin(vec3 x) { return builtin_sin(x); }
vec4 sin(vec4 x) { return builtin_sin(x); }

double sin(double x) { return double(sin(float(x))); } // the real implementation
dvec2 sin(dvec2 x) { return dvec2(sin(x.x), sin(x.y)); }
dvec3 sin(dvec3 x) { return dvec3(sin(x.x), sin(x.y), sin(x.z)); }
dvec4 sin(dvec4 x) { return dvec4(sin(x.x), sin(x.y), sin(x.z), sin(x.w)); }

not so bad perhaps after all...

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

it looks like a lot of valuable fragment program buffer space...

from fragm.

claudeha avatar claudeha commented on September 1, 2024

I ported the following from musl libm to double precision GLSL:

double scalbn(double x, int y);
double exp(double x);
double log(double x);
double sin(double x);
double cos(double x);
double tan(double x);
double atan(double x);

some of these do have large tables which are undesirable for GPU as you point out (I didn't think of that before...)

BUT: there were still precision problems in DoubleTest.frag, which I narrowed down to the builtin double sqrt(double) being imprecise! This fixed it finally for me:

// Hardware sqrt improved by the Babylonian algorithm (Newton Raphson)
double sqrt(double m)
{
  double z = _builtin_sqrt(m); // as defined by the overloading pattern described in earlier comment
  z = (z + m / z) / 2.0LF;
  z = (z + m / z) / 2.0LF;
  z = (z + m / z) / 2.0LF;
  z = (z + m / z) / 2.0LF;
  return z;
}

Will push code tomorrow, need to sleep now...

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

just rendering a few test images...
claudes-double-test-pre-thumb

from fragm.

claudeha avatar claudeha commented on September 1, 2024

built in square root:
double-test-builtin-sqrt-zoomout-1

my improved square root (same location)
double-test-good-sqrt-zoomout-1

no changes to the maths in Complex.frag. looking much better now

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

the image I posted above is pre frag changes, this one is post changes same location same iteration count but this one rendered very much faster, like float speed? haven't tested zooming or anything else, just time for a couple of renders this evening.
claudes-double-test-post-thumb

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

Really nice work claude!!! I start seeing blocky pixels at around Zoom=23919999977203.91015625

edit:zoom was clamped in my first test

from fragm.

claudeha avatar claudeha commented on September 1, 2024

the Zoom at which blocky pixels start appearing is dependent on image size

from fragm.

claudeha avatar claudeha commented on September 1, 2024

#info BUG: animation is not smooth (issue in double precision log or exp inside cPow?)

the orginal frag at the top of this thread now animates smoothly, so I guess this can be closed now?

from fragm.

3Dickulus avatar 3Dickulus commented on September 1, 2024

if you're happy with it then yes, I do have some concerns about the overloading, does it add bytes to the binary prog? how much? if it all gets sorted out and optimized by the compiler down to the essentials then I guess the overall impact is trivial.

from fragm.

claudeha avatar claudeha commented on September 1, 2024

I can't check (no access to any NVIDIA with GL4). The pass-through overloads should evaporate if the compiler is any good, and the improved sqrt etc would also disappear if unused but are probably necessary for correctness if used, and they aren't very big anyway I hope.

from fragm.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.