Giter Site home page Giter Site logo

Comments (13)

chmike avatar chmike commented on August 19, 2024

Thank you for testing out my code and reporting an issue.

0x61a80000 is a negative number since the bit 31 is 1. The computation is indeed overflowing but that is normal since we can only compute square roots of positive numbers.

Do you need to compute the square root of an unsigned floating point ?

from fpsqrt.

apodtele avatar apodtele commented on August 19, 2024

It is positive. The bit 31 is 0. The limit seems to be at 0x4fffffff, or 20479.99998

sqrt_fx16_16_to_fx16_16( 0x50000000 ) returns 0x800000,, or sqrt(20480) = 128, instead of 143.1084.

from fpsqrt.

apodtele avatar apodtele commented on August 19, 2024

It is weird at 0x64000000 good answers return again.

from fpsqrt.

chmike avatar chmike commented on August 19, 2024

You are right, sorry. 0x61A80000 is a positive number, the bit 31 is zero.

You also found a problem. I ran a brute force test by computing all square root values from 0 to 0x7FFFFFFF and indeed I get invalid values when 0x40000000 is reached. Every value smaller than that yields a valid value.

Here is the code I used in case we might need it later. You may start the loop a bit before 0x40000000 to check.

    // -- test computation
    for (i = 0; i < 0x7FFFFFFF; i++) {
        int32_t x = i, y = sqrt_fx16_16_to_fx16_16(x), e;
        double xf, yf, ef;
        xf = (double)i/k;
        yf = (double)y/k;
        ef = sqrt(xf);
        e = (int32_t)(ef*k);

        if ((i&0xFFFFF) == 0) {
            printf("0x%08X -------------\n", i);
        }
        if (e != y) {
            printf("x: 0x%08X (%f) y: 0x%08X (%f) e: 0x%08X (%f)\n", x, xf, y, yf, e, ef);
        }
    }

from fpsqrt.

apodtele avatar apodtele commented on August 19, 2024

r << 1 overflows. So this can be fixed by scaling r and everything else down 2-fold. Perhaps there is a better solution.

from fpsqrt.

chmike avatar chmike commented on August 19, 2024

I came to the same observation. We are missing just one bit.

There is no problem when using int64_t variables, or by using the 64bit integer sqrt_i64() function with the following function:

fx16_16_t sqrt_fx16_16_to_fx16_16(fx16_16_t v) {
    return (fx16_16_t)sqrt_i64((int64_t)v << 16);
}

Your suggestion works but is a hack and is slightly less performant. I'll update the code.

from fpsqrt.

chmike avatar chmike commented on August 19, 2024

Repo has been updated. Tell me if it works for you ? The tests have been updated to test all integers in the range 0 to 0x7FFFFFF included.

from fpsqrt.

apodtele avatar apodtele commented on August 19, 2024

Thanks. Another idea is to make a copy of r, say s, instead of of doubling it; then use comparison r - b + s >= q; then r -= t/2., which is fine because t is even.

from fpsqrt.

chmike avatar chmike commented on August 19, 2024

I didn't understand. Could you send me a PR ?

from fpsqrt.

apodtele avatar apodtele commented on August 19, 2024

Never mind that. I am pretty satisfied with this:

    uint32_t t, q, b, r;

    r = (uint32_t)v >> 1;
    q = ( v & 1 ) << 15;

    b = 0x20000000;
    do
    {
        t = q + b;
        if( r >= t )
        {
            r -= t;
            q = t + b; // equivalent to q += 2*b
        }
        b >>= 1;
        r <<= 1;
    }
    while( b > 0x10 );

    return ( q + 0x40 ) >> 7;

except the obvious failure with v = 1, which can be looked up instead. Yes. there is a loss of precision, which is unavoidable for large arguments anyway. Or, perhaps, q can be initialized more cleverly to recover that bit.

Corrected: initialized q so that it works well for for small odd v = {1, 3, 5, ...}.

Edited: added one extra cycle and rounded the return value.

Note that it seems to work for full range of 32-bit unsigned inputs including bit_31 = 1.

from fpsqrt.

apodtele avatar apodtele commented on August 19, 2024

FYI, this is what was adopted in FreeType, which is thoroughly fixed-point,
https://gitlab.freedesktop.org/freetype/freetype/-/blob/c4073d82517eff48458e166a6edfb0618b221a4d/src/base/ftcalc.c#L916-L1000

It is used on special occasions only.

from fpsqrt.

chmike avatar chmike commented on August 19, 2024

Repo has been updated. Tell me if it works for you ? The tests have been updated to test all integers in the range 0 to 0x7FFFFFF included.

from fpsqrt.

chmike avatar chmike commented on August 19, 2024

If speed is important as it is probably the case for FreeType than it could be interesting to remove the if statement to benefit from the CPU pipelining.

The instructions

if( r >= t ) {     
    r -= t;        
    q += b;        
}

may be replaced by

m = (uint32_t)(r < t) - 1; // m is 0xFFFFFFFF when r >= t , 0 otherwise 
r -= t & m;
q |= b & m;

Unwinding the loop is another way to increase speed. It is then possible to add a special handling to avoid the overflow.
b is a uint32_t with a single bit set. It's the bit we consider testing and the + may be replaced by a | (binary or). It may be faster on some CPU.

Unfortunately, I don't have time to test and measure its performance.

from fpsqrt.

Related Issues (2)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.