libcg / bfp Goto Github PK

Beyond Floating Point - Posit C/C++ implementation

License: MIT License

Makefile 1.71% C++ 60.80% C 37.49%

gustafson ieee754 posit unum

bfp's Issues

The posit implementation seems to return a NaN

I am confused by this introduction in the README:

there is no "Not-a-Number" (NaN) value

and this line:

bfp/lib/posit.cpp

Line 135 in c813ea5

return nan();

which returns a posit number that is a NaN. Isn't this a contradiction?

I think this should be clarified in the README.

Incorrect handling of partial exponent fields

Using Posit(8, 2):

{8, 2} 01111010 (3) -> +11110 10 = 16384
x2
{8, 2} 01111011 (3) -> +11110 11 = 32768
x2
{8, 2} 01111100 (4) -> +111110 0 = 65536
x2
{8, 2} 01111101 (4) -> +111110 1 = 131072
x8
{8, 2} 01111110 (5) -> +1111110 = 1.04858e+06
x16
{8, 2} 01111111 (6) -> +1111111 = 1.67772e+07

Evidently, the value for 01111101 is incorrect. It should be 262144, so that the ratios between adjacent values would be x2 x2 x4 x4 x16, with gradual logarithmic spacing.

In other words, partial exponent fields represent the high bits of the full exponent field of width "es".
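The rule above can be sketched as a small standalone decoder. This is only an illustration, not the bfp implementation: `decode_posit` is a hypothetical helper, it handles words of at most 32 bits, and it maps the single exceptional pattern to NaN purely as a placeholder.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of bfp): decode an nbits-wide posit pattern.
double decode_posit(uint32_t bits, unsigned nbits, unsigned es) {
    assert(nbits >= 2 && nbits <= 32 && es <= 4);
    const uint32_t mask = (nbits == 32) ? 0xFFFFFFFFu : ((1u << nbits) - 1u);
    const uint32_t sign_bit = 1u << (nbits - 1);
    bits &= mask;
    if (bits == 0) return 0.0;
    if (bits == sign_bit) return std::nan("");  // exceptional pattern 100...0 (placeholder)
    const bool negative = (bits & sign_bit) != 0;
    if (negative) bits = (~bits + 1u) & mask;   // two's complement gives the magnitude

    // Regime: a run of identical bits after the sign, ended by the opposite
    // bit or by the end of the word.
    int pos = static_cast<int>(nbits) - 2;
    const uint32_t run_bit = (bits >> pos) & 1u;
    int run = 0;
    while (pos >= 0 && ((bits >> pos) & 1u) == run_bit) { ++run; --pos; }
    const int k = run_bit ? run - 1 : -run;
    --pos;  // step over the terminating bit, if there was one

    // Exponent: read up to es bits. Bits cut off by the end of the word are
    // the LOW bits and count as zero, so the bits present are the HIGH bits.
    int exponent = 0;
    for (unsigned i = 0; i < es; ++i) {
        exponent <<= 1;
        if (pos >= 0) { exponent |= static_cast<int>((bits >> pos) & 1u); --pos; }
    }

    // Fraction: whatever is left, with an implicit leading 1.
    const int frac_bits = (pos >= 0) ? pos + 1 : 0;
    const uint32_t frac = (frac_bits > 0) ? (bits & ((1u << frac_bits) - 1u)) : 0u;
    const double significand = 1.0 + std::ldexp(static_cast<double>(frac), -frac_bits);
    const double magnitude = std::ldexp(significand, k * (1 << es) + exponent);
    return negative ? -magnitude : magnitude;
}
```

Under these assumptions, `decode_posit(0x7D, 8, 2)` evaluates to 262144: the regime gives k = 4, and the lone exponent bit 1 is read as the high bit of a 2-bit exponent (binary 10).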

Implement rounding

Posits round to nearest even (least significant bit) and don't overflow to infinity or underflow to zero.
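As a rough sketch of the two rules in this issue, the integer step of the rounding could look like the following. Both function names are hypothetical and not taken from bfp, and the sketch assumes the caller has already built an over-long positive bit pattern and handles an exact zero separately.

```cpp
#include <cassert>
#include <cstdint>

// Drop the `drop` low bits of `value`, rounding to nearest, ties to even
// (i.e. ties go to the result whose least significant bit is 0).
uint64_t round_nearest_even(uint64_t value, unsigned drop) {
    assert(drop < 64);
    if (drop == 0) return value;
    const uint64_t kept = value >> drop;
    const uint64_t rem  = value & ((uint64_t{1} << drop) - 1);
    const uint64_t half = uint64_t{1} << (drop - 1);
    if (rem > half || (rem == half && (kept & 1u))) return kept + 1;
    return kept;
}

// Clamp a rounded positive pattern so a nonzero value never becomes zero
// (pattern 0) and never passes maxpos (pattern 0111...1).
uint32_t saturate_positive(uint64_t pattern, unsigned nbits) {
    assert(nbits >= 2 && nbits <= 32);
    const uint64_t minpos = 1;
    const uint64_t maxpos = (uint64_t{1} << (nbits - 1)) - 1;
    if (pattern < minpos) return static_cast<uint32_t>(minpos);
    if (pattern > maxpos) return static_cast<uint32_t>(maxpos);
    return static_cast<uint32_t>(pattern);
}
```

For example, dropping two bits from binary 1010 is a tie and stays at 10, while binary 1110 is a tie that rounds up to 100.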

Is there a way yet to convert ints and floats to Posits?

I would like to use Posits in a language I'm working on. But I'm stumped at the simplest hurdle: initializing a variable.

Bug when adding numbers with large difference

Sorry to bother you again. I think I found a bug when adding numbers that are about 10 orders of magnitude apart:
double a = 1.2e4;
double b = -2.5e-6;
double c;
Posit pa = Posit(W, es);
Posit pb = Posit(W, es);
Posit pc = Posit(W, es);
pa.set(a);
pb.set(b);
pc = pa + pb;
c = a + b;
pa.print();
pb.print();
pc.print();
printf("pa: %f, pb: %.7f, pc: %f\n", pa.getDouble(), pb.getDouble(), pc.getDouble());
printf("a: %f, b: %.7f, c: %f\n", a, b, c);

Output for 32,3 and 32,2:
{32, 3} 01101010111011100000000000000000 -> +110 101 0111011100000000000000000 = 12000
{32, 3} 11110010101100000111010010101000 -> -0001 101 010011111000101101011000 = -2.5e-06
{32, 3} 01100100011101110100101010000000 -> +110 010 0011101110100101010000000 = 1262.58
pa: 12000.000000, pb: -0.0000025, pc: 1262.582031
a: 12000.000000, b: -0.0000025, c: 11999.999997

{32, 2} 01111001011101110000000000000000 -> +11110 01 011101110000000000000000 = 12000
{32, 2} 11111101010110000011101001010100 -> -000001 01 01001111100010110101100 = -2.5e-06
{32, 2} 01110100011101110100101010000000 -> +1110 10 0011101110100101010000000 = 1262.58
pa: 12000.000000, pb: -0.0000025, pc: 1262.582031
a: 12000.000000, b: -0.0000025, c: 11999.999997

Documentation

If you could add some simple documentation on how to use this library that would be great.

Which parameters best approximate the capabilities of float and double?

I'm very interested in learning to use Posits for calculations, to get more precision and accuracy.

I'm a newbie at numerical analysis and numerical methods.

What do the numbers mean for Posit ( 5, 1 )?

I thought 5 might mean the total number of bits in the Posit number and 1 the number of bits for the exponent. Is this correct? If so, how does it determine the number of bits for the fraction and the regime?

So for instance, I would like to try it out with this calculation, the same as in one of the Posit reference articles:

#include <cstdio>
#include <iostream>

#include "posit.h"

#define PREC float

class Posit32 : public Posit
{
public:

Posit32( PREC value_ ) :
  Posit( 32, 3 )
{
  set( value_ );
}

~Posit32( )
{
}

Posit32 operator + ( const Posit32 &rhs_ )
{
Posit32 r( rhs_ );
r.add( *this );
return r;
}

Posit32 operator * ( const Posit32 &rhs_ )
{
Posit32 r( rhs_ );
r.mul( *this );
return r;
}
void PrintStream( std::ostream & os_ ) const
{
os_ << getFloat( );
}

protected:

private:
};

std::ostream & operator << ( std::ostream &os_, const Posit32 &p_ )
{
p_.PrintStream( os_ );
return os_;
}

void test( );

int main(int argc, char *argv[])
{
auto p = Posit( 5, 1 );

for (unsigned i = 0; i < (unsigned)(1 << p.nbits()); i++) {
    p.setBits(i);
    p.print();
}

test( );

return 0;

}

void test( )
{
Posit32 a [ 4 ] = { 3.2e7, 1, -1, 8.0e7 };
Posit32 b [ 4 ] = { 4.0e7, 1, -1, -1.6e7 };
Posit32 c = ( a[ 0 ] * b[ 0 ] ) +
( a[ 1 ] * b[ 1 ] ) +
( a[ 2 ] * b[ 2 ] ) +
( a[ 3 ] * b[ 3 ] );

std::cout << "Calculation Result: " << c << std::endl;
}

My Result comes out to be:
Calculation Result: -1.6e+07

The correct answer is 2.

What am I doing wrong? (Probably lots, I'm just a newbie)

Thanks
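The expected value of 2 for the dot product above can be checked independently of any posit library. The sketch below is only a reference computation: `dot` and `dot_exact` are hypothetical helpers, and the integer version works here only because every operand is a whole number and every partial result fits in 64 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Left-to-right dot product in a chosen floating-point type.
template <typename T>
T dot(const T* a, const T* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    T sum = T(0);
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Exact integer reference for whole-number inputs.
int64_t dot_exact(const int64_t* a, const int64_t* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

With `double` inputs every intermediate value stays below 2^53, so the left-to-right sum is exact and equals 2. With `float`, the products near 1.28e15 need more than 24 significant bits, so the two small terms are typically absorbed before the final cancellation. Getting 2 from a 32-bit format therefore needs either wider intermediates or an exact accumulator.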

Add a way to extend and shrink posits

It would be nice to be able to convert a {nbits=32, es=2} posit to {nbits=5, es=1}, for example.

Add C implementation and layer a C++ binding on top of that

Required to make it usable with embedded projects. For now it's better to keep C++ to facilitate hacking until we get something that works.

Implement addition

This gives us subtraction for free since we can easily negate.

Implement for 2-bit posits
Implement for 3-bit posits (regime bits)
Implement for 4-bit posits (exponent bits)
Implement for 5-bit posits (fraction bits)
Implement for n-bit posits
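Negation is cheap because a posit is negated by taking the two's complement of its bit pattern, so a - b can be computed as a + (-b). A minimal sketch, with `negate_posit_bits` as a hypothetical helper that is not part of bfp and assumes words of at most 32 bits:

```cpp
#include <cassert>
#include <cstdint>

// Two's complement within an nbits-wide word. Zero (000...0) and the
// exceptional pattern (100...0) map to themselves.
uint32_t negate_posit_bits(uint32_t bits, unsigned nbits) {
    assert(nbits >= 2 && nbits <= 32);
    const uint32_t mask = (nbits == 32) ? 0xFFFFFFFFu : ((1u << nbits) - 1u);
    return (~bits + 1u) & mask;
}
```

For an 8-bit posit, the pattern 01000000 (the value 1) becomes 11000000 (the value -1), and applying the function twice returns the original pattern.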

Max posit width

What is the maximum posit width? I tested successfully with 32 bits (es=2), but for 33 bits, it doesn't work. I see POSIT_WIDTH = 32 in posit_types.h. And Posit::mBits is of type POSIT_UTYPE, which is uint32_t.
I tried small changes in posit_types.h, but that doesn't fix it. Do you know if I can change to 32+ bits easily?

macOS compile nit

gcc (actually clang) rejected the C++ option when compiling the C source:
$ make
g++ -o lib/posit.o -std=c++11 -Ilib -Itest -O2 -Wall -g -c lib/posit.cpp
gcc -o lib/pack.o -std=c++11 -Ilib -Itest -O2 -Wall -g -c lib/pack.c
error: invalid argument '-std=c++11' not allowed with 'C/ObjC'

diff --git a/Makefile b/Makefile
index d9514a3..0a30df8 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
CC = gcc
CXX = g++
-CFLAGS = -std=c++11 -Ilib -Itest -O2 -Wall -g
-CXXFLAGS = $(CFLAGS)
+CFLAGS = -Ilib -Itest -O2 -Wall -g
+CXXFLAGS = $(CFLAGS) -std=c++11

LIB_TARGET = lib/libbfp.a

Convert from and to float

That would be a nice first step.

Add muParser example

Once we get a base implementation we should be able to wire muParser with bfp to get an interactive shell.

Implement division

As pointed out by @leobru, posits don't have exact reciprocals unlike unums, therefore dividing by multiplying with the inverse will not be as precise as dividing directly. We need to implement it properly.

Fixed size fast paths

Moving forward, it would be interesting to add fixed-width 8/16/32/64-bit Posits. This trades a bit of flexibility for less complexity and greater performance.

Implement multiplication

Implement for 2-bit posits
Implement for 3-bit posits (regime bits)
Implement for 4-bit posits (exponent bits)
Implement for 5-bit posits (fraction bits)
Implement for n-bit posits

Reciprocal is very wrong

auto three = Posit(32, 1);
three.set(3.0);
auto one_third = three.rec();
three.print();
one_third.print();

Prints

{32, 1} 01011000000000000000000000000000 (0) -> +10 1 1000000000000000000000000000 = 3
{32, 1} 00101000000000000000000000000000 (-1) -> +01 0 1000000000000000000000000000 = 0.375

OH MY GOD!

Where does the idea come from that computing a reciprocal doesn't require changing the fraction part?
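The point can be illustrated with ordinary doubles by splitting a value into a significand in [1, 2) and a power-of-two scale. This is a sketch under assumptions: all names are hypothetical, and `reciprocal_keep_fraction` only models the behavior described in this issue (negate the scale, leave the fraction alone), not the actual bfp code.

```cpp
#include <cassert>
#include <cmath>

// value = significand * 2^exponent, with significand in [1, 2).
struct Parts { double significand; int exponent; };

Parts split(double x) {
    assert(x > 0.0 && std::isfinite(x));
    int e = 0;
    const double m = std::frexp(x, &e);  // x = m * 2^e with m in [0.5, 1)
    return {m * 2.0, e - 1};
}

// Models the reported behavior: only the scale changes.
double reciprocal_keep_fraction(double x) {
    const Parts p = split(x);
    if (p.significand == 1.0) return std::ldexp(1.0, -p.exponent);  // powers of two
    return std::ldexp(p.significand, -p.exponent - 1);
}

// 1 / (m * 2^e) = (2 / m) * 2^(-e-1), and 2/m lies in (1, 2) for m in (1, 2),
// so the fraction has to change unless m is exactly 1.
double reciprocal_new_fraction(double x) {
    const Parts p = split(x);
    if (p.significand == 1.0) return std::ldexp(1.0, -p.exponent);
    return std::ldexp(2.0 / p.significand, -p.exponent - 1);
}
```

For 3 = 1.5 * 2^1, keeping the fraction gives 1.5 * 2^-2 = 0.375, the value printed above, while recomputing it gives about 1.3333 * 2^-2, which is 1/3 to double precision.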

Log and Exp Support

Hi, I'm wondering whether log and exp will be supported for Posit.
If I want to implement those myself, are there any official definitions? I checked the PDF links in the README, but those only contain the specification of simple binary ops.
The paper also evaluates the range of exact values for log/exp ops, but I failed to find how log and exp are defined for posits.

Maxpos value

The library does not expose minpos/maxpos values directly, so I wrote some code:

	int W = 32, es = 2;
	Posit p = Posit(W, es);
	unsigned long t = 1;
	p.setBits(t);
	p.print();
	t = 1UL << (W-1);
	p.setBits(~t);
	p.print();

Output
{32, 2} 00000000000000000000000000000001 -> +0000000000000000000000000000001 = 7.52316e-37
{32, 2} 01111111111111111111111111111111 -> +1111111111111111111111111111111 = 8.50706e+37

Evaluating "{:.5E}".format((2**2)**(2*30)) in Python gives 1.32923E+36, a factor of 64 lower than the maxpos printed above. SoftPosit (https://gitlab.com/cerlane/SoftPosit) also returns this value as maxpos.

The same code for es=3 prints
{32, 3} 00000000000000000000000000000001 -> +0000000000000000000000000000001 = 5.6598e-73
{32, 3} 01111111111111111111111111111111 -> +1111111111111111111111111111111 = 7.23701e+75
The online references give 6e-73 to 2e72 as the dynamic range for W=32, es=3, so maxpos seems off again.
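The expected limits follow from the usual posit definitions, useed = 2^(2^es) and maxpos = useed^(nbits-2), with minpos = 1/maxpos. A small sketch for cross-checking the printed values (the function names are hypothetical and not part of bfp):

```cpp
#include <cassert>
#include <cmath>

// log2(maxpos) = (nbits - 2) * 2^es
int maxpos_log2(unsigned nbits, unsigned es) {
    assert(nbits >= 2 && nbits <= 64 && es <= 4);
    return static_cast<int>((nbits - 2) << es);
}

double maxpos(unsigned nbits, unsigned es) {
    return std::ldexp(1.0, maxpos_log2(nbits, es));
}

double minpos(unsigned nbits, unsigned es) {
    return std::ldexp(1.0, -maxpos_log2(nbits, es));
}
```

For {32, 2} this gives maxpos = 2^120 (about 1.32923e+36) and minpos = 2^-120 (about 7.52316e-37), so the printed minpos matches while the printed maxpos of 8.50706e+37 is 2^126. For {32, 3} the formula gives 2^240 (about 1.77e+72), whereas the printed 7.23701e+75 is 2^252.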
