The fpu from dawsonjon

How much time does it require to divide two numbers?

I am asking because I am using Quartus II, and its waveform viewer has a 1 microsecond limit, so I cannot observe any output beyond 1 microsecond. Should division take longer than 1 microsecond?

Managing simulation time for divider circuit

This divider code works for us, and its output appears at 1165 ns. However, our requirement is below 500 ns.
Can you suggest ways to reduce this time to 500 ns?
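As a rough sanity check, the time to a result is the cycle count multiplied by the clock period. The sketch below assumes a 10 ns clock (as in the `always #5 clk=~clk;` line of the testbench posted in another issue) and a latency of about 116 cycles, inferred from the reported 1165 ns. Both figures are assumptions, not measured properties of the divider.

```python
def latency_ns(cycles, clock_period_ns):
    """Time from accepting the operands to presenting the result."""
    return cycles * clock_period_ns

def max_period_ns(cycles, budget_ns):
    """Longest clock period that still meets the latency budget."""
    return budget_ns / cycles

ASSUMED_CYCLES = 116   # assumption: roughly 1165 ns at a 10 ns clock

print(latency_ns(ASSUMED_CYCLES, 10))       # latency with a 10 ns clock
print(max_period_ns(ASSUMED_CYCLES, 500))   # period needed for a 500 ns budget
```

Under these assumptions the cycle count is fixed by the state machine, so only a faster clock (a period a little over 4 ns) or a different divider architecture would bring the result below 500 ns. Simulator settings would not change it.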

Error(xxxxxxxxxxxxxxxxxxxxxxx) in Division output using the following testbench

testbench:-
`timescale 1ns / 1ps

module Divider_tb;

reg clk, rst;
reg [31:0] input_a;
reg input_a_stb;
reg [31:0] input_b;
reg input_b_stb;
reg output_z_ack;
//reg s_input_a_ack,s_input_b_ack;

wire input_a_ack;
wire input_b_ack;
wire [31:0] output_z;
wire output_z_stb;

Flaoting_32_Divider uut(

    .input_a(input_a),
    .input_b(input_b),
    .input_a_stb(input_a_stb),
    .input_b_stb(input_b_stb),
    //.s_input_a_ack(s_input_a_ack),
    //.s_input_b_ack(s_input_b_ack),
    .output_z_ack(output_z_ack),
    .clk(clk),
    .rst(rst),
   
    .output_z(output_z),
    .output_z_stb(output_z_stb),
    .input_a_ack(input_a_ack),
    .input_b_ack(input_b_ack)
    );

 always #5 clk=~clk;

 initial begin
 
         clk= 1'b0;
       
        end

initial
begin

rst=1'b1;
input_a_stb=1'b1;
input_b_stb=1'b1;
output_z_ack=1'b0;
//s_input_a_ack=1'b1;
//s_input_b_ack=1'b1;

#1 rst=1'b0;
#2 input_a=32'b01000010101101101011000000000000;

#1 input_b=32'b00111110000101000000000000000000;
//#2 s_input_a_ack=1'b1;
//#5 s_input_b_ack=1'b0;
end

initial
begin
$monitor("time=",$time,"input_a =%b,input_b=%b,output_z=%b",input_a,input_b,output_z);
end

endmodule

output:-
time= 0
input_a=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
input_b=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
time= 3
input_a =01000010101101101011000000000000,
input_b=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
time= 4
input_a=01000010101101101011000000000000,
input_b=00111110000101000000000000000000,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
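
One possible cause, sketched under the assumption that the divider samples `rst` only on rising clock edges (a synchronous reset): the testbench drops `rst` at 1 ns, but a clock that starts low and toggles every 5 ns has its first rising edge at 5 ns. In that case the reset is never seen and the internal state stays at x. The helper names below are illustrative only.

```python
def rising_edges(half_period_ns, until_ns):
    """Rising-edge times of a clock that starts low and toggles every half period."""
    return list(range(half_period_ns, until_ns + 1, 2 * half_period_ns))

def reset_is_sampled(reset_release_ns, half_period_ns):
    """True if at least one rising edge occurs while reset is still high."""
    return any(t < reset_release_ns
               for t in rising_edges(half_period_ns, reset_release_ns))

print(reset_is_sampled(1, 5))    # reset released at 1 ns, as in the testbench
print(reset_is_sampled(20, 5))   # reset held for two full clock periods
```

If that assumption holds, holding `rst` high for a couple of clock periods (for example 20 ns instead of 1 ns) would be the first thing to try.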

a question about multiplication and division algorithms

Thanks for sharing the code.
Can you tell me exactly which algorithms are used for multiplication and division in the code? I was curious.
Single-precision division takes more than 110 cycles, so I guess these may not be the algorithms used in real processors?
(I could not find a contact email, so I thought of raising an issue.)
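
A cycle count of that order is what a bit-serial shift-and-subtract (restoring) divider would produce, since it resolves one quotient bit per iteration. The following is a generic sketch of that technique on unsigned integers, not a confirmed transcription of the repository's divider. The function name and the 24-bit width in the example are illustrative assumptions.

```python
def restoring_divide(dividend, divisor, bits):
    """Return (quotient, remainder) of unsigned division, one quotient bit per step."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Subtract only when the result would not go negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

print(restoring_divide(100, 7, 24))
```

Hardware that spends a few clock cycles on each such step, plus unpacking, normalisation and rounding, can easily reach a hundred cycles or more. High-performance processors typically use faster schemes such as SRT division or Newton-Raphson and Goldschmidt iteration.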

Add comparison operators

Add 32 and 64 bits comparison operators, more exactly:

ModuleNotFoundError: No module named 'streams'

:~$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import chips.api.api
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/naga/.local/lib/python3.6/site-packages/chips/__init__.py", line 3, in <module>
    import streams, sinks, process, instruction
ModuleNotFoundError: No module named 'streams'
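
The failing line, `import streams, sinks, process, instruction`, looks like a Python 2 style implicit relative import. Python 3 resolves such a name only as a top-level module, so a sibling module inside the package is not found. The sketch below reproduces the effect with two throwaway packages. The package and module names are hypothetical and unrelated to the real `chips` code.

```python
import importlib
import os
import sys
import tempfile

def try_import(init_source, pkg_name):
    """Create a package with the given __init__.py and report whether it imports."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, pkg_name)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
        handle.write(init_source)
    # Sibling module that the package tries to import.
    with open(os.path.join(pkg_dir, "streams_demo.py"), "w") as handle:
        handle.write("NAME = 'streams_demo'\n")
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        importlib.import_module(pkg_name)
        return "ok"
    except ModuleNotFoundError as exc:
        return str(exc)
    finally:
        sys.path.remove(root)

print(try_import("import streams_demo\n", "py2_style_pkg"))         # implicit relative import
print(try_import("from . import streams_demo\n", "py3_style_pkg"))  # explicit relative import
```

If that is the cause here, the installed `chips` release predates Python 3 support, and the options would be to run it under Python 2 or to use a version that has been ported to Python 3.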

A + -A = +=0 bug

Although fixed in the single fp codebase, this bug still exists in the double codebase. The fix that is in place ensures that A + -A works when A is positive, but not when A is negative (e.g. 0xff80000000000000 + 0x7f80000000000000).
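
A reference result for this case can be computed with the host's own double-precision arithmetic. Under IEEE 754 round-to-nearest, adding a finite value to its negation gives +0 regardless of operand order. The helper names below are illustrative.

```python
import struct

def bits_to_double(bits):
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def double_to_bits(value):
    """Return the 64-bit pattern of an IEEE 754 double."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

a = bits_to_double(0xFF80000000000000)   # negative operand from the report
b = bits_to_double(0x7F80000000000000)   # same magnitude, positive
result_bits = double_to_bits(a + b)
print(a, b, hex(result_bits))
```

Note that, read as doubles, these two patterns are finite values of magnitude 2^1017 rather than infinities, so a zero result is expected.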

missing resp_z

File "./run_test.py", line 60, in run_test
stim_z = open("resp_z");
IOError: [Errno 2] No such file or directory: 'resp_z'

multiplier mismatch

Hi, dawsonjon

I ran C-to-RTL formal verification on the multiplier and found a mismatch.

input:
a : 32'h7f80_0000 (infinity)
b : 32'h0

output:
C implementation : 32'hFFC00000 (NAN)
RTL implementation : 32'h7f80_0000 (infinity)

I googled the IEEE 754 standard and found this table :

Table 4. Multiplication of operands.

[ref: http://techdocs.altium.com/display/FPGA/IEEE+754+Standard+-+Overview]

It seems that multiplying infinity by zero should give NaN.
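
IEEE 754 treats infinity times zero as an invalid operation whose default result is a quiet NaN. A quick host-side check, using the two input patterns from this report, is sketched below. The helper name is illustrative, and the sign and payload of the NaN are not compared.

```python
import math
import struct

def bits_to_float32(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

a = bits_to_float32(0x7F800000)   # +infinity
b = bits_to_float32(0x00000000)   # +0.0
product = a * b
print(math.isinf(a), math.isnan(product))
```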

Add unary operators

Square root would be more difficult, although I've found several (naïve?) implementations that we could use.

License?

What's the license of the project? Judging by copying.txt, the text seems very similar to the MIT license. Could that be it?

g++ command update

g++ -o test test.cpp

multiplier slow converge

Hi, Jon

I just found another interesting trace in which the multiplier is stuck in state normalize_2 for a long time.

input:
a : 32'h5e_c72e (denormal)
b : 32'h1f_ddeb (denormal)
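
A rough estimate of why this could take long, under several assumptions: that denormal inputs are normalised before the multiply, that the product exponent starts at `a_e + b_e + 1` (as quoted in another issue), and that the normalize_2 state shifts the mantissa right by one bit per clock until the exponent reaches -126. None of this is confirmed against the RTL, and the function names are illustrative.

```python
def decode_float32(bits):
    """Split a 32-bit pattern into (sign, exponent field, fraction)."""
    sign = bits >> 31
    exponent_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent_field, fraction

def normalised_exponent(bits):
    """Unbiased exponent once the leading 1 of a non-zero value is in place."""
    _, exponent_field, fraction = decode_float32(bits)
    if exponent_field != 0:
        return exponent_field - 127
    if fraction == 0:
        raise ValueError("zero has no normalised exponent")
    # A denormal equals fraction * 2**-149, so its leading bit sets the exponent.
    return fraction.bit_length() - 1 - 149

def estimated_shift_steps(a_bits, b_bits):
    """Right shifts needed to raise the product exponent to -126 (assumed one per clock)."""
    z_e = normalised_exponent(a_bits) + normalised_exponent(b_bits) + 1
    return max(0, -126 - z_e)

print(estimated_shift_steps(0x005EC72E, 0x001FDDEB))
```

If the assumptions hold, the state machine is not stuck but is walking the exponent up one step per clock, and the final result underflows to zero.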

I have uploaded the testbench here.

Regarding Multiplication and exponent

Sir, I was going through your code for the IEEE 754 floating-point multiplier. I noticed that you used z_e <= a_e + b_e + 1; product <= a_m * b_m * 4; but I was a bit confused, as I think the code should be z_e <= a_e + b_e + 127; product <= a_m * b_m;. Your version gives the correct answer but mine does not. Can you explain the logic behind it?
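
One reading that makes the quoted lines consistent, offered as an assumption rather than the author's explanation: `a_e` and `b_e` already have the bias of 127 removed when the operands are unpacked, so no 127 appears in the sum. The `* 4` aligns the product so that its top 24 bits can be taken as the result mantissa, and the `+ 1` assumes the mantissa product lies in [2, 4), with a later normalisation step undoing it when the product is in [1, 2). The sketch below follows that reading for positive normal inputs and truncates instead of rounding. The function names and the exact bit slices are assumptions.

```python
import struct

def unpack_float32(value):
    """Split a normal float32 into (unbiased exponent, 24-bit mantissa with hidden 1)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    exponent = ((bits >> 23) & 0xFF) - 127      # bias removed at unpack time
    mantissa = (bits & 0x7FFFFF) | 0x800000     # restore the hidden leading 1
    return exponent, mantissa

def multiply_positive_normals(a, b):
    """Multiply two positive normal float32 values, truncating the low bits."""
    a_e, a_m = unpack_float32(a)
    b_e, b_m = unpack_float32(b)
    z_e = a_e + b_e + 1               # assumes the mantissa product is in [2, 4)
    product = a_m * b_m * 4           # 48-bit product moved up by two bits
    z_m = product >> 26               # top 24 bits; lower bits dropped (no rounding)
    if (z_m & 0x800000) == 0:         # product was in [1, 2): shift left, undo the +1
        z_m = (product >> 25) & 0xFFFFFF
        z_e -= 1
    return z_m * 2.0 ** (z_e - 23)

print(multiply_positive_normals(1.5, 1.5))
print(multiply_positive_normals(3.0, 0.5))
```

With biased exponent fields the sum would instead need 127 subtracted once, which may be why the `+ 127` variant gives wrong answers.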

int_to_float returning 0 on certain inputs

When the module is given the input 32'b00010101010101100011010111001010,
it outputs 0 or something close to zero (e.g. 10^-9).
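
The expected bit pattern for a given integer can be computed on the host for comparison. This is a minimal sketch that relies on Python's `struct` module to round to single precision. The helper names are illustrative.

```python
import struct

def int_to_float32_bits(n):
    """IEEE 754 single-precision bit pattern of a signed 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", float(n)))[0]

def float32_bits_to_value(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

n = 0b00010101010101100011010111001010
expected = int_to_float32_bits(n)
print(n, format(expected, "032b"), float32_bits_to_value(expected))
```

The input is roughly 3.58e8 as an integer, so the correct output is nowhere near zero. If the same 32 input bits were read as a float rather than an integer, they would denote a tiny value on the order of 1e-26, which may be worth ruling out when reading the waveform.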

double_multiplier.v Line185

Why is the *4 needed? (product <= a_m * b_m * 4;)
1.a_m * 1.b_m is 106 bits wide, so why are 108 bits needed?

Add extra operations

min
max
copysign

Merge `get_a` and `put_z` states

It looks like the get_a and put_z states could be merged into a single state without too much hassle, saving some cycles: a new request could be accepted at the same time a result is delivered, so that requests overlap at that stage. What do you think?

Unsigned integer to real

It seems the conversions between integers and real numbers only support signed integers. How could unsigned integers and long numbers be converted to and from real numbers?

$fopenr update

$fopenr("stim_a"); can be changed to $fopen("stim_a", "r")

Double multiplier rounding error

double_multiplier.v:228
"if (z_m == 53'hffffff) begin"
should be
"if (z_m == 53'h1fffffffffffff) begin"
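
The comparison presumably detects a mantissa that is all ones and therefore overflows when rounded up. The all-ones value for a given width is easy to check, and the sketch below suggests the 24-bit constant was carried over from the single-precision code. The names are illustrative.

```python
def all_ones(width):
    """Largest value that fits in `width` bits."""
    return (1 << width) - 1

SINGLE_MANTISSA_BITS = 24   # hidden bit plus 23 fraction bits
DOUBLE_MANTISSA_BITS = 53   # hidden bit plus 52 fraction bits

print(hex(all_ones(SINGLE_MANTISSA_BITS)))
print(hex(all_ones(DOUBLE_MANTISSA_BITS)))
```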

Implement `fpu` module

Create a simple fpu module that hosts all the other components and can be used as a black box. It can be just a wrapper over the other components, routing the a, b and z data wires and their handshake signals according to an op selector, almost like a "kitchen sink" example of how to use the components.

In a future iteration, it might be nice to create a more advanced module that allows different operations to execute at the same time to increase performance, using a FIFO or similar control to guarantee the order of execution. That could also be done in an independent project.

Pipelined Design?

Hi, Jon.
Thanks for your open-source FPU design. Amazing!
Having read the Verilog code of the computation units, I was impressed by how the computational flow is driven by a finite state machine.
However, I am wondering how to pipeline your FPU design in order to improve its throughput. The problem has puzzled me for almost half a month. I want to figure out how to insert pipeline stages into a finite state machine. Is it feasible? I would appreciate your help. Thank you.

Divider Explanation

Hi! I am a student and I stumbled on your work trying to understand how a floating point divider works but my module is something like this:

where the exception code is:

So I was wondering how your divider.v relates to this module. I simply want to understand how your code works and the meaning of signals such as input_a_stb and input_a_ack.

Looking forward to a reply!

64 bits?

Could it be possible to expand the data size to 64 bits, maybe by using a flag? I'm doing a Verilog implementation of WebAssembly and the spec needs to do 64 bits floating point operations...
