Vitis HLS Library for FINN
This repo contains the Vitis HLS C++ library for the hardware acceleration of Quantized Neural Networks (QNNs) using FINN.
For more information, please refer to the documentation available here.
Home Page: https://xilinx.github.io/finn/
License: BSD 3-Clause "New" or "Revised" License
Hi, I wanted to ask if 1D convolutions are supported in FINN-hlslib. If not,
I get the following messages:
Compiling ../../../../conv3_tb.cpp in debug mode
../../../../conv3_tb.cpp:51:21: fatal error: memdata.h: No such file or directory
Hello, I'm new to Vivado HLS, I run a simple C++ project and when I simulate, I get errors like these:
INFO: [SIM 2] *************** CSIM start ***************
INFO: [SIM 4] CSIM will launch GCC as the compiler.
ERR: [SIM 100] CSim failed with errors.
INFO: [SIM 3] *************** CSIM finish ***************
The console tab shows something like this:
"source /*path*/solution1/csim.tcl"
invoked from within
"hls::main /*path*/solution1/csim.tcl"
("uplevel" body line 1)
invoked from within
"uplevel 1 hls::main {*}$args"
(procedure "hls_proc" line 5)
invoked from within
"hls_proc $argv"
Finished C simulation.
How can I fix this? Thank you so much.
Feature request: add an optional per-channel negation mask applied before the thresholding activation. If the mask entry for the current channel is 0, compare the accumulator and threshold as-is; if it is 1, negate the accumulator before the threshold comparison (or, equivalently, invert the comparison operator). The mask can be static (compile-time).
This is useful for streamlining networks with multi-bit weights. When streamlining a network, it is not always possible to flip the weight signs of the preceding matmul/conv layer to get rid of a negative multiply op -- for instance, if the preceding matmul weights are 2-bit and contain a -2 value, flipping the sign yields 2, which is not representable as a 2-bit signed integer. Other solutions are narrow-range weights (which may decrease accuracy slightly) or increasing the bitwidth of the weights (which increases the compute and memory cost).
Hi authors,
For the file tb/swg_tb.cpp: when I change STRIDE from 2 to 1 in tb/input_gen.h, I get this error:
ERROR: Expected 72 actual 60
oy= 0 ox= 0 ky= 0 kx= 0
WARNING: Hls::stream 'output_stream' contains leftover data, which may result in RTL simulation hanging.
WARNING: Hls::stream 'input_stream' contains leftover data, which may result in RTL simulation hanging.
@E Simulation failed: Function 'main' returns nonzero value '1'.
After some debugging, I could see that the error occurs when the counter reaches 44.
Thanks!
No matching function for call to
Matrix_Vector_Activate_Batch<MatrixW, MatrixH, SIMD, PE, TSrcI, TDstI, TWeightI>
(static_cast<hls::stream<ap_uint<SIMD*TSrcI::width>>&>(wa_in),
static_cast<hls::stream<ap_uint<PE*TDstI::width>>&> (wa_out),
weights, activation, reps, r);
the only possible candidate is
template<
unsigned MatrixW, unsigned MatrixH, unsigned SIMD, unsigned PE, unsigned MMV,
typename TSrcI = Identity, typename TDstI = Identity, typename TWeightI = Identity,
typename TI, typename TO, typename TW, typename TA, typename R
>
void Matrix_Vector_Activate_Batch(hls::stream<TI> &in,
hls::stream<TO> &out,
TW const &weights,
TA const &activation,
int const reps,
R const &r) {
Can I add hardware IP into FINN by using Vivado HLS? My understanding of Vivado HLS is that you write C code, synthesize it, and then combine it with the Vivado PS block to generate a bitstream. The tutorial seems to use Python to generate custom IP, and there appear to be limits on generating custom IP. Am I right? If I generate a bitstream, how can I deploy it on an FPGA using PYNQ?
Can we use ap_fixed datatypes for the weight, input, and output streams? By default, the function only accepts ap_uint.
In the 1D SWU implementation (ConvolutionInputGenerator_1D_lowbuffer), RTL simulation does not finish depending on how the code is written. More specifically, the bug can be narrowed down to the following block. If
if(re) {
out.write(buffer[rp]);
if(++offset == WINDOW_SIZE){
offset = 0;
rp += SIMD_COUNT * (Stride_x - 1);
}
if(++rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
if(++ocnt == WINDOW_SIZE) ocnt = 0;
}
is rewritten as:
if(re) {
out.write(buffer[rp]);
if(++offset == WINDOW_SIZE){
offset = 0;
rp += 1 + SIMD_COUNT * (Stride_x - 1);
if(rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
}
else{ // Explicit else-block required to work around bug in RTL simulation
if(++rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
}
if(++ocnt == WINDOW_SIZE) ocnt = 0;
}
the bug is resolved. Both blocks are functionally equivalent, but the latter one introduces additional code overhead.
The main question is: what causes the HLS bug and what would be a cleaner way of resolving it? How to obtain additional debug information?
In order to reproduce the bug:
python setup.py test --addopts "-k test_fpgadataflow_slidingwindow_1d"
Currently finn-hlslib only has support for adding two streams. A model I am working on uses a Sub on two streams. This should not be hard to implement, since Add is a pretty simple block and only requires a small change to become a Sub. Should I go about this by adding something to the AddStreams template to change the sign, or should I add a whole new set of functions for the subtraction?
Hello, my model contains torch.topk and torch.transpose operators. Can I use FINN?
Test results (test_conv3) are not accurate for SIMD*WIDTH greater than 64 (in memdata.h). Anything greater than 64 bits is treated as zero, with the following warning:
integer constant is too large for its type
For example:
FixedPointWeights<3,ap_fixed<32,4>,1,1>, last weight is zero
FixedPointWeights<3,ap_uint<32>,1,1>, last weight is zero
FixedPointWeights<68,ap_uint<1>,1,1>, last 4 weights are zero
When using the Matrix_Vector_Activate_Stream_Batch function from mvau.hpp the weights and input values are not correctly sliced. Please find attached an example of a failing testcase. Maybe there is an error in the setting of the template parameter. The inputs and weights are saved in .npy files and read out using functions from npy2apintstream.hpp.
The W1 type should be ap_int<WIDTH> here: https://github.com/Xilinx/finn-hlslib/blob/master/tb/conv3_tb.cpp#L89
Hi,
I am trying to run the testbench test_conv3.tcl, and an error is generated. The top function is Testbench_conv from the file conv_top.cpp.
Can someone help me solve this bug?
ERROR: [HLS 214-134] Pointer to pointer is not supported (tb/streamtools.h:960:0)
ERROR: [HLS 214-134] in function 'WidthAdjustedOutputStream<32u, 256u, 8u>::WidthAdjustedOutputStream(hls::stream<ap_uint<256>, 0>&, unsigned int)': Pointer to pointer is not supported (tb/streamtools.h:960:87)
ERROR: [HLS 214-134] in function 'Testbench_conv(hls::stream<ap_uint<32>, 0>&, hls::stream<ap_uint<256>, 0>&, unsigned int)': Pointer to pointer is not supported (tb/convlayer.h:123:59)
ERROR: [HLS 214-135] Syn check fail!
Thank you!
Hi,
What vivado_hls version is used for testing the code?
/T
In the LabelSelect_Batch op I see that Out_T is templated, probably to allow a smaller output bandwidth (e.g. 16-bit ints are enough to represent 1000 classes), which is good. However, the actual output dtype from the layer always seems to be 32 bits:
template<
// tensor size parameters
unsigned int NumClasses,
unsigned int PECount,
unsigned int NumTop,
typename In_T,
typename Out_T>
void LabelSelect_Batch(stream<ap_uint<PECount * In_T::width> > & in,
stream<ap_uint<32> > & out, const unsigned int numReps) {
and the output isn't packed but is written directly to the output stream:
for(unsigned int topx = 0; topx < NumTop; topx++){
out.write(toplabels[NumTop - topx - 1]);
}
Would it make sense to change the output stream type to stream<Out_T>? Or is there some special reason to cast it to a 32-bit uint first?
Hello,
I've been working on support for rectangular inputs for the UpsampleNearestNeighbour recently, and while doing some tests I encountered an issue that also exists in the original square implementation.
Basically, if the scale factor is equal to the padding, you will encounter a read-while-empty error (for example, if IFMDim = 3 and OFMDim = 8, you have a scale factor of 2 and also a padding of 2).
The error message is the following:
ERROR [HLS SIM]: an hls::stream is read while empty, which may result in RTL simulation hanging. If this is not expected, execute C simulation in debug mode in the GUI and examine the source code location of the blocked hls::stream::read() call to debug. If this is expected, add -DALLOW_EMPTY_HLS_STREAM_READS to -cflags to turn this error into a warning and allow empty hls::stream reads to return the default value for the data type.
I found that it comes from the way the count_row variable is used (cf. lines 114 to 116 in commit 80bc6f0).
There are several ways to solve this bug in terms of C++ code, but I am unsure which one would make the most sense with HLS synthesis in mind. What do you think?