Vitis HLS Library for FINN
This repo contains the Vitis HLS C++ library for the hardware acceleration of Quantized Neural Networks (QNNs) using FINN.
For more information, please refer to the documentation available here.
Home Page: https://xilinx.github.io/finn/
License: BSD 3-Clause "New" or "Revised" License
Hi, I wanted to ask if 1D convolutions are supported in FINN-hlslib. If not,
I get the following messages:
Compiling ../../../../conv3_tb.cpp in debug mode
../../../../conv3_tb.cpp:51:21: fatal error: memdata.h: No such file or directory
Hello, I'm new to Vivado HLS, I run a simple C++ project and when I simulate, I get errors like these:
INFO: [SIM 2] *************** CSIM start ***************
INFO: [SIM 4] CSIM will launch GCC as the compiler.
ERR: [SIM 100] CSim failed with errors.
INFO: [SIM 3] *************** CSIM finish ***************
The console tab shows something like this:
"source /*path*/solution1/csim.tcl"
invoked from within
"hls::main /*path*/solution1/csim.tcl"
("uplevel" body line 1)
invoked from within
"uplevel 1 hls::main {*}$args"
(procedure "hls_proc" line 5)
invoked from within
"hls_proc $argv"
Finished C simulation.
How can I fix this? Thank you so much.
Feature request: add an optional per-channel negation mask applied before the thresholding activation. If the mask entry for the current channel is 0, compare the accumulator and threshold as-is; if it is 1, negate the accumulator before the threshold comparison (or, equivalently, invert the comparison operator). The mask can be static (compile-time).
This is useful for streamlining networks with multi-bit weights. When streamlining a network, it is not always possible to flip the weight signs of the preceding matmul/conv layer to get rid of a negative multiply op -- for instance, if the preceding matmul weights are 2-bit and contain a -2 value, flipping the sign yields 2, which is not representable as a 2-bit signed integer. Other solutions are narrow-range weights (which may decrease accuracy slightly) or increasing the bitwidth of the weights (which increases the compute and memory cost).
Hi authors,
For the file tb/swg_tb.cpp: when I change STRIDE from 2 to 1 in tb/input_gen.h, I get this error:
ERROR: Expected 72 actual 60
oy= 0 ox= 0 ky= 0 kx= 0
WARNING: Hls::stream 'output_stream' contains leftover data, which may result in RTL simulation hanging.
WARNING: Hls::stream 'input_stream' contains leftover data, which may result in RTL simulation hanging.
@E Simulation failed: Function 'main' returns nonzero value '1'.
After some debugging, I could see that the error occurs when the counter reaches 44.
Thanks!
No matching function for call to
Matrix_Vector_Activate_Batch<MatrixW, MatrixH, SIMD, PE, TSrcI, TDstI, TWeightI>
(static_cast<hls::stream<ap_uint<SIMD*TSrcI::width>>&>(wa_in),
static_cast<hls::stream<ap_uint<PE*TDstI::width>>&> (wa_out),
weights, activation, reps, r);
the only possible candidate is
template<
unsigned MatrixW, unsigned MatrixH, unsigned SIMD, unsigned PE, unsigned MMV,
typename TSrcI = Identity, typename TDstI = Identity, typename TWeightI = Identity,
typename TI, typename TO, typename TW, typename TA, typename R
>
void Matrix_Vector_Activate_Batch(hls::stream<TI> &in,
hls::stream<TO> &out,
TW const &weights,
TA const &activation,
int const reps,
R const &r) {
Can I add hardware IP into FINN by using Vivado HLS? My understanding of Vivado HLS is that you write C code, synthesize it, and then combine it with the Vivado PS block to generate a bitstream. The tutorial seems to use Python to generate custom IP, and there appear to be limits on generating custom IP. Am I right? If I generate a bitstream, how can I deploy it on an FPGA using PYNQ?
Can we use ap_fixed datatypes for the weight, input, and output streams? By default, the function only accepts ap_uint.
In the 1D SWU implementation (ConvolutionInputGenerator_1D_lowbuffer), RTL simulation does not finish depending on how the code is written. More specifically, the bug can be narrowed down to the following block. If
if(re) {
out.write(buffer[rp]);
if(++offset == WINDOW_SIZE){
offset = 0;
rp += SIMD_COUNT * (Stride_x - 1);
}
if(++rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
if(++ocnt == WINDOW_SIZE) ocnt = 0;
}
is rewritten as:
if(re) {
out.write(buffer[rp]);
if(++offset == WINDOW_SIZE){
offset = 0;
rp += 1 + SIMD_COUNT * (Stride_x - 1);
if(rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
}
else{ // Explicit else-block required to work around bug in RTL simulation
if(++rp >= BUFFER_SIZE) rp -= BUFFER_SIZE;
}
if(++ocnt == WINDOW_SIZE) ocnt = 0;
}
the bug is resolved. Both blocks are functionally equivalent, but the latter one introduces additional code overhead.
The main question is: what causes the HLS bug and what would be a cleaner way of resolving it? How to obtain additional debug information?
In order to reproduce the bug:
python setup.py test --addopts "-k test_fpgadataflow_slidingwindow_1d"
Currently finn-hlslib only has support for adding two streams. A model I am working on uses a Sub on two streams. This should not be hard to implement, since Add is a pretty simple block and only requires a small change to become a Sub. Should I go about this by adding something to the AddStreams template to change the sign, or should I add a whole new set of functions for the subtraction?
Hello, my model contains torch.topk and torch.transpose operators. Can I use FINN?
Test results (test_conv3) are not accurate for SIMD*WIDTH greater than 64 (in memdata.h). Anything greater than 64 bits is treated as zero, with the following warning:
integer constant is too large for its type
For example:
FixedPointWeights<3,ap_fixed<32,4>,1,1>, last weight is zero
FixedPointWeights<3,ap_uint<32>,1,1>, last weight is zero
FixedPointWeights<68,ap_uint<1>,1,1>, last 4 weights are zero
When using the Matrix_Vector_Activate_Stream_Batch function from mvau.hpp the weights and input values are not correctly sliced. Please find attached an example of a failing testcase. Maybe there is an error in the setting of the template parameter. The inputs and weights are saved in .npy files and read out using functions from npy2apintstream.hpp.
The W1 type should be ap_int<WIDTH> here: https://github.com/Xilinx/finn-hlslib/blob/master/tb/conv3_tb.cpp#L89
Hi,
I am trying to run the testbench test_conv3.tcl, and an error is generated. The top function is Testbench_conv from the file conv_top.cpp.
Can someone help me solve this bug?
ERROR: [HLS 214-134] Pointer to pointer is not supported (tb/streamtools.h:960:0)
ERROR: [HLS 214-134] in function 'WidthAdjustedOutputStream<32u, 256u, 8u>::WidthAdjustedOutputStream(hls::stream<ap_uint<256>, 0>&, unsigned int)': Pointer to pointer is not supported (tb/streamtools.h:960:87)
ERROR: [HLS 214-134] in function 'Testbench_conv(hls::stream<ap_uint<32>, 0>&, hls::stream<ap_uint<256>, 0>&, unsigned int)': Pointer to pointer is not supported (tb/convlayer.h:123:59)
ERROR: [HLS 214-135] Syn check fail!
Thank you!
Hi,
What vivado_hls version is used for testing the code?
/T
In the LabelSelect_Batch op I see that Out_T is templated, probably to allow a smaller output bandwidth (e.g. 16-bit ints are enough to represent 1000 classes), which is good. However, the actual output dtype from the layer always seems to be 32 bits:
template<
// tensor size parameters
unsigned int NumClasses,
unsigned int PECount,
unsigned int NumTop,
typename In_T,
typename Out_T>
void LabelSelect_Batch(stream<ap_uint<PECount * In_T::width> > & in,
stream<ap_uint<32> > & out, const unsigned int numReps) {
and the output isn't packed but is written directly to the output stream:
for(unsigned int topx = 0; topx < NumTop; topx++){
out.write(toplabels[NumTop - topx - 1]);
}
Would it make sense to change the output stream type to stream<Out_T>? Or is there some special reason to cast it to a 32-bit uint first?
Hello,
I've been working on support for rectangular inputs for the UpsampleNearestNeighbour recently, and while doing some tests I encountered an issue that also exists in the original square implementation.
Basically, if the scale factor is equal to the padding, you will encounter a read-while-empty error (for example, if IFMDim = 3 and OFMDim = 8, you have a scale factor of 2 and also a padding of 2).
The error message is the following:
ERROR [HLS SIM]: an hls::stream is read while empty, which may result in RTL simulation hanging. If this is not expected, execute C simulation in debug mode in the GUI and examine the source code location of the blocked hls::stream::read() call to debug. If this is expected, add -DALLOW_EMPTY_HLS_STREAM_READS to -cflags to turn this error into a warning and allow empty hls::stream reads to return the default value for the data type.
I found that it comes from the way the count_row variable is used (cf. lines 114 to 116 in commit 80bc6f0).
There are several ways to solve this bug in terms of C++ code, but I am unsure which one would make the most sense with HLS synthesis in mind. What do you think?