
xilinx / finn-hlslib

169 stars · 15 watchers · 62 forks · 542 KB

Vitis HLS Library for FINN

Home Page: https://xilinx.github.io/finn/

License: BSD 3-Clause "New" or "Revised" License

C++ 80.98% C 2.95% Python 4.69% Tcl 11.05% Shell 0.33%

finn-hlslib's Introduction

finn-hlslib

This repo contains the Vitis HLS C++ library for the hardware acceleration of Quantized Neural Networks (QNNs) using FINN.

For more information please refer to the documentation available here.

finn-hlslib's People

Contributors

arkhodamoradi, auphelia, dependabot[bot], erlingrj, fccm219, fpjentzsch, giuliogamba, heborras, maltanar, mariodruiz, mmrahorovic, preusser, quetric, rgb000000, timoteogb, tobi-alonso


finn-hlslib's Issues

Conv1D support

Hi, I wanted to ask if 1D convolutions are supported in finn-hlslib. If not:

  1. Could we treat the 1D data as "images", with tricks like padding, to get it to work with the existing layer?
  2. Are there any resources that would help me get started with a custom implementation of Conv1D?

Thanks.
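For option 1, the standard convolution output-size formula shows why a length-W sequence can be treated as a W x 1 "image": the height dimension simply stays 1. A plain standalone C++ sketch (illustrative only, not finn-hlslib code):

```cpp
// Output dimension of a convolution along one axis:
// out = (in + 2*pad - kernel) / stride + 1
constexpr unsigned conv_out_dim(unsigned in, unsigned kernel,
                                unsigned stride, unsigned pad) {
    return (in + 2 * pad - kernel) / stride + 1;
}

// A Conv1D with kernel K over a length-W input maps to a 2D conv with a
// (K, 1) kernel over a (W, 1) "image": the width axis behaves as usual,
// while the height axis computes (1 + 2*0 - 1) / stride + 1 == 1.
```

Whether the existing 2D sliding-window layer accepts degenerate height-1 parameters in practice is exactly what the question asks; the arithmetic itself at least does not rule it out.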

fatal error: memdata.h

messages:

Compiling ../../../../conv3_tb.cpp in debug mode
../../../../conv3_tb.cpp:51:21: fatal error: memdata.h:

ERROR: [SIM 211-100] Csim failed with error

Hello, I'm new to Vivado HLS. I'm running a simple C++ project, and when I simulate I get errors like these:

INFO: [SIM 2] *************** CSIM start ***************
INFO: [SIM 4] CSIM will launch GCC as the compiler.
ERR: [SIM 100] CSim failed with errors.
INFO: [SIM 3] *************** CSIM finish ***************

The console tab shows something like this:

"source /*path*/solution1/csim.tcl"
    invoked from within
"hls::main /*path*/solution1/csim.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 hls::main {*}$args"
    (procedure "hls_proc" line 5)
    invoked from within
"hls_proc $argv"
Finished C simulation.

How can I fix this? Thank you so much.

Add optional per-channel negation prior to thresholding

Feature request: add an optional per-channel negation mask prior to performing the thresholding activation. If the mask entry for the current channel is 0, compare the accumulator and threshold as-is. If the mask entry is 1, negate the accumulator prior to the threshold comparison (or, equivalently, invert the comparison operator). The mask can be static (compile-time).

This is useful for streamlining networks with multi-bit weights. When streamlining a network, it's not always possible to flip the weight signs of the preceding matmul/conv layer to get rid of a negative multiply op -- for instance, if the preceding matmul weights are 2-bit and contain a -2 value, flipping the sign yields +2, which is not representable as a 2-bit signed integer. Other solutions are using narrow-range weights (which may decrease accuracy slightly) or increasing the bitwidth of the weights (which increases the compute and memory cost).
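A plain C++ sketch of the requested behavior (the names and the single-threshold comparison are simplifications for illustration, not the finn-hlslib thresholding code, which compares against multiple thresholds per channel):

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch: a static per-channel negation mask applied before a
// single-threshold comparison. Mask entry 0 -> compare as-is; mask entry
// 1 -> negate the accumulator first (equivalently, invert the comparator).
template <unsigned NumChannels>
std::array<int, NumChannels> threshold_with_negation(
    const std::array<int32_t, NumChannels> &acc,        // accumulators
    const std::array<int32_t, NumChannels> &threshold,  // per-channel thresholds
    const std::array<bool, NumChannels> &negate)        // static negation mask
{
    std::array<int, NumChannels> out{};
    for (unsigned c = 0; c < NumChannels; c++) {
        int32_t a = negate[c] ? -acc[c] : acc[c];
        out[c] = (a >= threshold[c]) ? 1 : 0;
    }
    return out;
}
```

Since the mask is compile-time static, an HLS implementation could fold the negation into the threshold unit with no extra latency.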

Sliding Window testbench not working with stride = 1

Hi authors,
In the file tb/swg_tb.cpp, when I change STRIDE from 2 to 1 in tb/input_gen.h, I get this error:

ERROR:  Expected 72 actual 60
oy= 0 ox= 0 ky= 0 kx= 0
WARNING: Hls::stream 'output_stream' contains leftover data, which may result in RTL simulation hanging.
WARNING: Hls::stream 'input_stream' contains leftover data, which may result in RTL simulation hanging.
@E Simulation failed: Function 'main' returns nonzero value '1'.

After some debugging, I could see that the error is thrown when the counter reaches 44.

Thanks!

No matching function call

No matching function for call to

Matrix_Vector_Activate_Batch<MatrixW, MatrixH, SIMD, PE, TSrcI, TDstI, TWeightI>
    (static_cast<hls::stream<ap_uint<SIMD*TSrcI::width>>&>(wa_in),
     static_cast<hls::stream<ap_uint<PE*TDstI::width>>&>  (wa_out),
     weights, activation, reps, r);

the only possible candidate is

template<
  unsigned MatrixW, unsigned MatrixH, unsigned SIMD, unsigned PE, unsigned MMV, 
  typename TSrcI = Identity, typename TDstI = Identity, typename TWeightI = Identity,
  typename TI, typename TO, typename TW, typename TA, typename R
>
void Matrix_Vector_Activate_Batch(hls::stream<TI> &in,
				  hls::stream<TO> &out,
				  TW  const &weights,
				  TA  const &activation,
				  int const  reps,
				  R const &r) {
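One plausible cause, judging from the candidate shown above: the call supplies seven explicit template arguments, but the candidate's fifth template parameter is the non-type `unsigned MMV`, so `TSrcI` ends up in the `MMV` slot and substitution fails. A standalone sketch of the pattern (not finn-hlslib code):

```cpp
struct Identity {};  // stand-in for the finn-hlslib Identity type

// Same parameter layout as the candidate: a non-type MMV sits between the
// size parameters and the defaulted type parameters, so explicit argument
// lists written without it shift every later argument into the wrong slot.
template <unsigned MatrixW, unsigned MatrixH, unsigned SIMD, unsigned PE,
          unsigned MMV, typename TSrcI = Identity, typename TDstI = Identity>
constexpr unsigned selected_mmv() { return MMV; }

// selected_mmv<16, 16, 2, 2, Identity>();  // error: type where unsigned expected
constexpr unsigned m = selected_mmv<16, 16, 2, 2, 1, Identity, Identity>();
```

If that is indeed the mismatch, inserting an explicit MMV value (e.g. 1) as the fifth template argument in the original call should make the candidate viable.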

add hardware IP

Can I add a hardware IP into FINN using Vivado HLS? My understanding of Vivado HLS is that you write C code, synthesize it, and then combine it with the Vivado PS block to generate a bitstream. The tutorial seems to use Python to generate custom IP, and there is a limit to generating custom IP that way. Am I right? If I generate a bitstream, how can I deploy it on the FPGA with PYNQ?

RTL simulation not finishing

Description

In the 1D SWU implementation (ConvolutionInputGenerator_1D_lowbuffer), RTL simulation does not finish depending on how the code is written. More specifically, the bug can be narrowed down to the following block. If

if(re) {
    out.write(buffer[rp]);
    if(++offset == WINDOW_SIZE){
        offset = 0;
        rp += SIMD_COUNT * (Stride_x - 1);
    }
    if(++rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    if(++ocnt == WINDOW_SIZE)  ocnt = 0;
}

is rewritten as:

if(re) {
    out.write(buffer[rp]);
    if(++offset == WINDOW_SIZE){
        offset = 0;
        rp += 1 + SIMD_COUNT * (Stride_x - 1);
        if(rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    }
    else{ // Explicit else-block required to work around bug in RTL simulation
        if(++rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    }
    if(++ocnt == WINDOW_SIZE)  ocnt = 0;
}

the bug is resolved. Both blocks are functionally equivalent, but the latter introduces additional code overhead.
The main question is: what causes the HLS bug, and what would be a cleaner way of resolving it? How can additional debug information be obtained?
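Outside of HLS, the functional equivalence of the two index-update variants can be checked with a small plain-C++ harness (the constants below are arbitrary stand-ins, not the real template parameters):

```cpp
#include <vector>

// Runs one of the two variants for a fixed number of cycles and records the
// read-pointer sequence; reads.push_back(rp) stands in for out.write(buffer[rp]).
std::vector<unsigned> run(bool explicit_else) {
    const unsigned WINDOW_SIZE = 4, BUFFER_SIZE = 10;
    const unsigned SIMD_COUNT = 2, Stride_x = 2;
    unsigned offset = 0, rp = 0, ocnt = 0;
    std::vector<unsigned> reads;
    for (int cycle = 0; cycle < 200; cycle++) {
        reads.push_back(rp);
        if (!explicit_else) {  // original variant
            if (++offset == WINDOW_SIZE) {
                offset = 0;
                rp += SIMD_COUNT * (Stride_x - 1);
            }
            if (++rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
        } else {               // rewritten variant with explicit else-block
            if (++offset == WINDOW_SIZE) {
                offset = 0;
                rp += 1 + SIMD_COUNT * (Stride_x - 1);
                if (rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
            } else {
                if (++rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
            }
        }
        if (++ocnt == WINDOW_SIZE) ocnt = 0;
    }
    return reads;
}
```

Both variants produce identical read-pointer traces in C, which supports the conclusion that the divergence seen in RTL simulation is a tool issue rather than a functional one.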

Reproducing

In order to reproduce the bug:

  • Clone the FINN repository.
  • Modify the unit test case in <FINN>/tests/fpgadataflow/test_fpgadataflow_convinputgenerator1d.py by setting 'ifm_ch' to 1, 'stride' to [2, 1], 'exec_mode' to 'rtlsim', 'simd' to 1, 'dw' to 0, 'flip' to False, and 'parallel_window' to False.
  • Start a Docker container.
  • Modify the corresponding block of code in ConvolutionInputGenerator_1D_lowbuffer in /slidingwindow.h.
  • Run: python setup.py test --addopts "-k test_fpgadataflow_slidingwindow_1d"

Subtract Streams Support

Currently finn-hlslib only has support for adding two streams. A model I am working on uses a Sub on two streams. This should not be hard to implement, since Add is a pretty simple block and only requires a small change to become a Sub. Should I go about this by adding something to the AddStreams template to change the sign, or should I add a whole new set of functions for the subtraction?
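One way to avoid cloning the whole function set is to make the elementwise operation itself a template parameter. A hypothetical sketch, not the finn-hlslib API (plain C++ with std::queue standing in for hls::stream, and without the per-channel bit packing the real AddStreams does):

```cpp
#include <queue>

// Generic elementwise binary op over two streams: add and sub become two
// instantiations of the same template instead of two separate functions.
template <typename T, typename Op>
void ElementwiseStreams(std::queue<T> &in1, std::queue<T> &in2,
                        std::queue<T> &out, unsigned numTotal, Op op) {
    for (unsigned i = 0; i < numTotal; i++) {
        T a = in1.front(); in1.pop();  // read one element from each input
        T b = in2.front(); in2.pop();
        out.push(op(a, b));            // apply add, sub, or any binary op
    }
}
```

An add is then `ElementwiseStreams(a, b, o, n, [](int x, int y){ return x + y; })` and a sub swaps in `x - y`; applied to the real library, the same idea would keep AddStreams' packing/unpacking intact and vary only the operator.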

Test results for WIDTH*SIMD greater than 64 are not accurate

Test results (test_conv3) are not accurate for SIMD*WIDTH greater than 64 (in memdata.h). Anything greater than 64 bits will be treated as zero, with the following warning:

integer constant is too large for its type

For example:
FixedPointWeights<3,ap_fixed<32,4>,1,1>, last weight is zero
FixedPointWeights<3,ap_uint<32>,1,1>, last weight is zero
FixedPointWeights<68,ap_uint<1>,1,1>, last 4 weights are zero
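The warning points at a C++ language limit rather than at the test itself: integer literals cannot exceed 64 bits, so any weight constant wider than that truncates. Wider values must be assembled from parts (ap_uint also offers string-based constructors for this). A standalone illustration, assuming a GCC/Clang toolchain for the `__int128` extension:

```cpp
#include <cstdint>

// unsigned __int128 is a GCC/Clang extension, used here purely to show the
// principle: a value wider than 64 bits built from two 64-bit halves,
// since no single integer literal can express it.
using u128 = unsigned __int128;

constexpr u128 wide_const(uint64_t hi, uint64_t lo) {
    return (u128(hi) << 64) | lo;  // 128-bit value from two 64-bit halves
}
```

In memdata.h terms, this suggests emitting weights wider than 64 bits either as string-constructed ap_uint values or as shifted-and-ORed 64-bit chunks, rather than as single literals.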

New MVA Stream Unit with int2 inputs

When using the Matrix_Vector_Activate_Stream_Batch function from mvau.hpp, the weights and input values are not sliced correctly. Please find attached an example of a failing test case. Maybe there is an error in the setting of the template parameters. The inputs and weights are saved in .npy files and read out using functions from npy2apintstream.hpp.

code_gen_npysim_MVASU.zip

Testbench_conv Synthesis error

Hi,
I am trying to run the testbench test_conv3.tcl and an error is generated. The top function is Testbench_conv from the file conv_top.cpp.

Can someone help me to solve this bug?

ERROR: [HLS 214-134] Pointer to pointer is not supported (tb/streamtools.h:960:0)
ERROR: [HLS 214-134] in function 'WidthAdjustedOutputStream<32u, 256u, 8u>::WidthAdjustedOutputStream(hls::stream<ap_uint<256>, 0>&, unsigned int)': Pointer to pointer is not supported (tb/streamtools.h:960:87)
ERROR: [HLS 214-134] in function 'Testbench_conv(hls::stream<ap_uint<32>, 0>&, hls::stream<ap_uint<256>, 0>&, unsigned int)': Pointer to pointer is not supported (tb/convlayer.h:123:59)
ERROR: [HLS 214-135] Syn check fail!
Thank you!

LabelSelect_Batch stream output type

In the LabelSelect_Batch op, I see that Out_T is templated, probably to allow a smaller output bandwidth (e.g. 16-bit ints are enough to represent 1000 classes), which is good, but the actual output dtype from the layer always seems to be 32 bits:

template<
    // tensor size parameters
    unsigned int NumClasses,
    unsigned int PECount,
    unsigned int NumTop,
    typename In_T,
    typename Out_T>
void LabelSelect_Batch(stream<ap_uint<PECount * In_T::width> > & in,
        stream<ap_uint<32> > & out, const unsigned int numReps) {

and the output isn't packed but is written directly to the output stream:

for(unsigned int topx = 0; topx < NumTop; topx++){
    out.write(toplabels[NumTop - topx - 1]);
}

Would it make sense to change the output stream type to stream<Out_T>? Or is there some special reason to cast to a 32-bit uint first?
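A plain C++ sketch of the proposed change (std::queue standing in for hls::stream, names simplified from the real LabelSelect_Batch; this is an illustration of the interface change, not the library code):

```cpp
#include <queue>
#include <vector>
#include <cstdint>

// The output stream is typed on Out_T instead of being fixed to a 32-bit
// uint, so e.g. uint16_t suffices for up to 65536 class labels.
template <typename Out_T>
void write_top_labels(std::queue<Out_T> &out,
                      const std::vector<Out_T> &toplabels,
                      unsigned NumTop) {
    // same reverse-order write as the original loop, best label first
    for (unsigned topx = 0; topx < NumTop; topx++)
        out.push(toplabels[NumTop - topx - 1]);
}
```

Since Out_T is already a template parameter of LabelSelect_Batch, the change would only touch the stream declaration, though it would break existing callers that instantiate the output stream as ap_uint<32>.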

UpsampleNearestNeighbour bug when Padding=scale_factor

Hello,

I've been working on a support of rectangular inputs for the UpsampleNearestNeighbour recently, and while doing some tests I encountered an issue that also exists in the original square implementation.
Basically, if you have a scale factor that is equal to the padding, you will encounter a read-while-empty error (for example, if IFMDim = 3 and OFMDim = 8, you have a scale factor of 2 and also a padding of 2).
The error message is the following:
ERROR [HLS SIM]: an hls::stream is read while empty, which may result in RTL simulation hanging. If this is not expected, execute C simulation in debug mode in the GUI and examine the source code location of the blocked hls::stream::read() call to debug. If this is expected, add -DALLOW_EMPTY_HLS_STREAM_READS to -cflags to turn this error into a warning and allow empty hls::stream reads to return the default value for the data type.

I found that it comes from the way the count_row variable is used (cf. finn-hlslib/upsample.hpp, lines 114 to 116 at commit 80bc6f0):

count_row++;
if (count_row > scale_factor)
    count_row = 1;

If you're in the situation where scale_factor == Padding (which translates to OFMDim/IFMDim == OFMDim%IFMDim), count_row will reach a value equal to scale_factor one time too many, which will issue a read on an already-empty buffer.

There are several ways to solve this bug in terms of C++ code, but I am unsure which one would make the most sense with HLS synthesis in mind. What do you think?
