vivado-hls-nota

The following tables and figures are taken from Xilinx official documentation:

HLS Optimization Methodology

Source: UG1197 Figure 4-5

Vivado HLS Design Flow

Source: UG902 Figure 4

Vivado HLS Pragmas By Type

Type Attributes
Kernel Optimization
  • pragma HLS allocation
  • pragma HLS expression_balance
  • pragma HLS latency
  • pragma HLS reset
  • pragma HLS resource
  • pragma HLS stable
Function Inlining
  • pragma HLS inline
  • pragma HLS function_instantiate
Interface Synthesis
  • pragma HLS interface
Task-level Pipeline
  • pragma HLS dataflow
  • pragma HLS stream
Pipeline
  • pragma HLS pipeline
  • pragma HLS occurrence
Loop Unrolling
  • pragma HLS unroll
  • pragma HLS dependence
Loop Optimization
  • pragma HLS loop_flatten
  • pragma HLS loop_merge
  • pragma HLS loop_tripcount
Array Optimization
  • pragma HLS array_map
  • pragma HLS array_partition
  • pragma HLS array_reshape
Structure Packing
  • pragma HLS data_pack
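
As a rough illustration of where a few of these pragmas are placed in source code, the sketch below applies loop_tripcount, pipeline, array_partition, and unroll to two small functions. The function names `dot` and `sum_squares4`, the array sizes, and the chosen pragma options are assumptions made for this example and are not taken from the Xilinx documentation. A standard C++ compiler ignores the pragmas, so the code also runs as plain software.

```cpp
#include <cassert>

const int MAX_N = 64;  // hypothetical upper bound on the vector length
const int TAPS = 4;    // hypothetical fixed buffer size

// Sum of products over the first n elements (n is a run-time value).
int dot(const int a[MAX_N], const int b[MAX_N], int n) {
    assert(n >= 1 && n <= MAX_N);  // range check on the loop bound
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        // The loop bound is variable, so state the range to assume when
        // latency is reported (this does not change the generated logic).
#pragma HLS loop_tripcount min=1 max=64
        // Request that a new iteration starts every clock cycle.
#pragma HLS pipeline II=1
        acc += a[i] * b[i];
    }
    return acc;
}

// Sum of squares of a fixed number of inputs.
int sum_squares4(const int in[TAPS]) {
    int buf[TAPS];
    // Split the buffer into separate registers so that every element
    // can be accessed in the same clock cycle.
#pragma HLS array_partition variable=buf complete
    for (int i = 0; i < TAPS; ++i) {
        // Replicate the loop body TAPS times instead of iterating.
#pragma HLS unroll
        buf[i] = in[i] * in[i];
    }
    int acc = 0;
    for (int i = 0; i < TAPS; ++i) {
#pragma HLS unroll
        acc += buf[i];
    }
    return acc;
}
```

Pipelining keeps a single copy of the loop body and overlaps iterations, whereas unrolling creates several copies of the body; partitioning the array is what lets the unrolled copies read their data at the same time.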

Vivado HLS Optimization Directives

Source: UG902 Table 11

[Figure: Loop pipelining]
[Figure: Dataflow optimization]
[Figure: Array reshaping]
[Figure: Array partitioning]

Source: SDAccel Development Environment Help

Vivado HLS Configurations

Source: UG902 Table 12

C++ Arbitrary Precision Integer Types

The header file ap_int.h defines the following arbitrary precision integer data types:

  • ap_int<W>
  • ap_uint<W>

where W is the number of bits. For example, ap_int<8> represents an 8-bit signed integer data type; ap_uint<234> represents a 234-bit unsigned integer type.
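
To make the effect of the bit width concrete, the sketch below models in plain C++ what value results when a number is stored in a W-bit integer with wrap-around. It is not the ap_int.h implementation; the helper names `wrap_unsigned` and `wrap_signed` are hypothetical, and the model is limited to widths up to 62 bits so that it fits in 64-bit arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Value obtained when v is stored in a W-bit unsigned integer:
// only the low W bits are kept.
uint64_t wrap_unsigned(uint64_t v, int W) {
    assert(W >= 1 && W <= 62);  // limit of this model, not of ap_uint
    return v & ((uint64_t(1) << W) - 1);
}

// Value obtained when v is stored in a W-bit signed integer:
// the low W bits are kept and interpreted as two's complement.
int64_t wrap_signed(int64_t v, int W) {
    assert(W >= 1 && W <= 62);  // limit of this model, not of ap_int
    const uint64_t low = wrap_unsigned(static_cast<uint64_t>(v), W);
    const uint64_t sign_bit = uint64_t(1) << (W - 1);
    if (low & sign_bit) {
        // Sign bit set: the value is low - 2^W.
        return static_cast<int64_t>(low) - (int64_t(1) << W);
    }
    return static_cast<int64_t>(low);
}
```

For W = 8 the signed range is -128 to 127, so `wrap_signed(128, 8)` gives -128 and `wrap_unsigned(256, 8)` gives 0.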

C++ Arbitrary Precision Fixed-Point Types

The header file ap_fixed.h defines the following arbitrary precision fixed-point data types:

  • ap_fixed<W,I,Q,O,N>
  • ap_ufixed<W,I,Q,O,N>

where W is the total number of bits, I is the number of integer bits, W-I is the number of fractional bits, Q specifies the quantization (rounding) mode, and O and N specify the overflow behavior. For example, ap_fixed<6,3> represents a 6-bit signed value with 3 integer bits and 3 fractional bits, where the MSB is the sign bit (weight -2^2), followed by bits with weights 2^1, 2^0, 2^-1, 2^-2, and 2^-3. ap_ufixed<10,8> represents a 10-bit unsigned value with 8 integer bits and 2 fractional bits.

Identifier Description
W Word length in bits.
I The number of bits used to represent the integer value (the number of bits above the binary point).
Q Quantization mode: dictates the behavior when greater precision is generated than can be defined by the smallest fractional bit in the variable used to store the result.
  • AP_RND: Rounding to plus infinity.
  • AP_RND_ZERO: Rounding to zero.
  • AP_RND_MIN_INF: Rounding to minus infinity.
  • AP_RND_INF: Rounding to infinity.
  • AP_RND_CONV: Convergent rounding.
  • AP_TRN: Truncation to minus infinity (default).
  • AP_TRN_ZERO: Truncation to zero.
O Overflow mode: dictates the behavior when more bits are generated than the variable to store the result contains.
  • AP_SAT: Saturation.
  • AP_SAT_ZERO: Saturation to zero.
  • AP_SAT_SYM: Symmetrical saturation.
  • AP_WRAP: Wrap around (default).
  • AP_WRAP_SM: Sign magnitude wrap around.
N The number of saturation bits used in wrap-around overflow modes. The default value is zero.
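
The quantization and overflow modes are easier to compare with numbers. The following plain C++ sketch models a signed fixed-point value with W total bits and I integer bits for two quantization modes (AP_TRN and AP_RND) and two overflow modes (AP_WRAP and AP_SAT). It is not the ap_fixed.h implementation: the function `to_fixed` and the enum names are hypothetical stand-ins, and the other modes and the N parameter are not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class Quant { Trn, Rnd };      // stand-ins for AP_TRN and AP_RND
enum class Overflow { Wrap, Sat };  // stand-ins for AP_WRAP and AP_SAT

// Value that results from storing x in a signed fixed-point format
// with W total bits and I integer bits (W - I fractional bits).
double to_fixed(double x, int W, int I,
                Quant q = Quant::Trn, Overflow o = Overflow::Wrap) {
    assert(W >= 1 && W <= 32 && I >= 0 && I <= W);  // limits of this model
    const double scale = std::ldexp(1.0, W - I);    // 2^(W-I)
    const double scaled = x * scale;
    assert(std::fabs(scaled) < 1e15);               // keep the cast below safe

    // Quantization: drop the bits below the smallest fractional bit.
    int64_t raw = static_cast<int64_t>(
        q == Quant::Rnd ? std::floor(scaled + 0.5)  // round to plus infinity
                        : std::floor(scaled));      // truncate to minus infinity

    // Overflow: bring the result back into the W-bit signed range.
    const int64_t lo = -(int64_t(1) << (W - 1));
    const int64_t hi = (int64_t(1) << (W - 1)) - 1;
    if (o == Overflow::Sat) {
        if (raw < lo) raw = lo;
        if (raw > hi) raw = hi;
    } else {
        const int64_t span = int64_t(1) << W;
        raw = ((raw - lo) % span + span) % span + lo;  // two's complement wrap
    }
    return static_cast<double>(raw) / scale;
}
```

With the ap_fixed<6,3> format used above (range -4 to 3.875 in steps of 0.125), storing 1.2 gives 1.125 with truncation and 1.25 with rounding, while storing 5.0 gives 3.875 with saturation and -3.0 with wrap-around.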

Vivado HLS limitations

  • For C and C++ designs only a single clock is supported. The same clock is applied to all functions in the design.
  • When using Stacked Silicon Interconnect (SSI) technology devices, it is important to ensure that the logic created by Vivado HLS fits within a single Super Logic Region (SLR).

Vivado HLS examples

FPGA resources

Look Up Table (LUT)

The LUTs can be configured as a 6-input LUT with one output or two 5-input LUTs with separate outputs but common addresses or logic inputs. Eight 6-input LUTs and their sixteen storage elements, as well as the multiplexers and arithmetic carry logic, form a slice.

Flip Flop (FF)

DSP Slice

Basic DSP48E2 Functionality

Source: UG579 Figure 1-1

Detailed DSP48E2 Functionality

Source: UG579 Figure 2-1

DSP48E2 Slice Primitive

Source: UG579 Figure 3-1

The DSP48E2 slice consists of a 27-bit pre-adder, a 27 x 18 multiplier, and a second-stage adder/subtracter/logic unit that produces a 48-bit output. If the multiplier is not used, the DSP slice can also be used as a full 48-bit adder/subtracter or as an AND/OR/NOT/NAND/NOR/XOR/XNOR logic unit. The slice also includes a pattern detector that provides support for convergent rounding, overflow/underflow detection, and counter auto-reset.

The typical use of the slice is to calculate P = (D ± A) * B + C. If the multiplier is not used, A and B can be concatenated as A:B to calculate P = A:B + C. Multiple DSP slices can be cascaded to perform accumulation PCOUT = (D ± A) * B + PCIN.

The A, B, C, D input ports have the following bit widths:

Port Bit Width Description
A 30 A[26:0] is the A input of the multiplier or the pre-adder. A[29:0] are the upper bits of the A:B concatenated input.
B 18 The B input of the multiplier. B[17:0] are the lower bits of the A:B concatenated input.
C 48 The C input to the second-stage adder/subtracter, pattern detector, or logic function.
D 27 The D input to the pre-adder or alternative input to the multiplier.

The P, PATTERNDETECT, and PATTERNBDETECT output ports have the following bit widths:

Port Bit Width Description
P 48 The P output from the second-stage adder/subtracter or logic function.
PATTERNBDETECT 1 Match indicator between P[47:0] and the complement of the 48-bit pattern.
PATTERNDETECT 1 Match indicator between P[47:0] and the 48-bit pattern.
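
The arithmetic P = (D ± A) * B + C and the port widths listed above can be sketched as a small behavioral model in plain C++. This is only an approximation under stated assumptions: the function names `sign_extend` and `dsp48e2_mac` are hypothetical, pipeline registers and the pattern detector are ignored, and the pre-adder and the final adder are assumed to wrap in two's complement at 27 and 48 bits.

```cpp
#include <cassert>
#include <cstdint>

// Interpret the low `bits` bits of v as a two's complement number.
int64_t sign_extend(int64_t v, int bits) {
    assert(bits >= 1 && bits <= 62);  // limit of this model
    const uint64_t mask = (uint64_t(1) << bits) - 1;
    const uint64_t low = static_cast<uint64_t>(v) & mask;
    const uint64_t sign_bit = uint64_t(1) << (bits - 1);
    return (low & sign_bit)
               ? static_cast<int64_t>(low) - (int64_t(1) << bits)
               : static_cast<int64_t>(low);
}

// Behavioral sketch of P = (D +/- A) * B + C.
int64_t dsp48e2_mac(int64_t a, int64_t b, int64_t c, int64_t d,
                    bool subtract = false) {
    const int64_t a27 = sign_extend(a, 27);  // A[26:0] feeds the pre-adder
    const int64_t b18 = sign_extend(b, 18);  // 18-bit multiplier input
    const int64_t c48 = sign_extend(c, 48);  // 48-bit second-stage input
    const int64_t d27 = sign_extend(d, 27);  // 27-bit pre-adder input

    // 27-bit pre-adder (assumed to wrap on overflow).
    const int64_t pre = sign_extend(subtract ? d27 - a27 : d27 + a27, 27);
    // 27 x 18 multiplier: the product needs at most 45 bits.
    const int64_t product = pre * b18;
    // 48-bit second-stage adder.
    return sign_extend(product + c48, 48);
}
```

Passing the result of one call as the `c` argument of the next call mimics the cascaded accumulation PCOUT = (D ± A) * B + PCIN.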

The DSP slices in the same column can be cascaded to form accumulators, adders, counters, and other more sophisticated operations. The ability is provided by the cascade input ports (ACIN, BCIN, PCIN, CARRYCASCIN, and MULTSIGNIN) and the cascade output ports (ACOUT, BCOUT, PCOUT, CARRYCASCOUT, and MULTSIGNOUT).

Number of DSP slices on Xilinx FPGAs:

Device # of DSPs
Kintex-7 325T 840
Virtex-7 690T 3,600
Kintex UltraScale KU115 5,520
Virtex UltraScale+ VU9P 6,840
Virtex UltraScale+ VU13P 12,288

Note that Kintex-7 and Virtex-7 FPGAs have DSP48E1 slices, whereas Kintex UltraScale and Virtex UltraScale+ FPGAs have DSP48E2 slices.

Block RAM

HLS considers one block RAM to be 18K bits. A block RAM has two ports which can each be 1, 2, 4, 9, or 18 bits wide (with depths of 16K, 8K, 4K, 2K, and 1K respectively).

Ultra RAM

Each UltraRAM stores 4096*72 bits (288K bits), which is 16 times the size of an 18K-bit block RAM. The port width is always 72 bits.
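
The capacity figures for block RAM and UltraRAM can be checked with a few lines of arithmetic. The sketch below is plain C++ with hypothetical names (`BRAM18_BITS`, `URAM_BITS`, `bram18_depth`, `bram18_usable_bits`); it only encodes the width and depth pairs quoted above and says nothing about how HLS actually maps an array to memories.

```cpp
#include <cassert>

const long BRAM18_BITS = 18L * 1024;  // one block RAM as counted by HLS
const long URAM_BITS = 4096L * 72;    // one UltraRAM

// Depth of an 18K block RAM port for the port widths listed above.
// Returns 0 for a width that is not in the list.
int bram18_depth(int width) {
    assert(width > 0);
    switch (width) {
        case 1:  return 16 * 1024;
        case 2:  return 8 * 1024;
        case 4:  return 4 * 1024;
        case 9:  return 2 * 1024;
        case 18: return 1024;
        default: return 0;
    }
}

// Number of bits that can be addressed through a port of this width.
long bram18_usable_bits(int width) {
    return static_cast<long>(width) * bram18_depth(width);
}
```

The 9-bit and 18-bit configurations reach the full 18,432 bits, while the 1-, 2-, and 4-bit configurations address 16,384 bits, and `URAM_BITS / BRAM18_BITS` evaluates to 16.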

FPGA design considerations

  • Resource utilization
  • Design performance
  • Power consumption
  • Software runtime
  • Debugging capability
  • Portability

FPGA performance metrics

Area : Amount of hardware resources required to implement the design based on the resources available in the FPGA, including look-up tables (LUTs), registers, block RAMs, and DSP48s.

Latency : Number of clock cycles required for the function to compute all output values.

Initiation interval (II) : Number of clock cycles before the function can accept new input data.

Loop iteration latency : Number of clock cycles it takes to complete one iteration of the loop.

Loop initiation interval : Number of clock cycles before the next iteration of the loop starts to process data.

Loop latency : Number of cycles to execute all iterations of the loop.
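
The loop metrics are related by simple arithmetic, which the sketch below spells out in plain C++. The function names are hypothetical, and the formulas are the usual first-order estimates: without pipelining, iterations run back to back, and with pipelining a new iteration starts every II cycles. The numbers in an actual HLS report may differ by a few cycles for loop entry and exit.

```cpp
#include <cassert>

// Loop latency when iterations run one after another.
long loop_latency_sequential(long trip_count, long iteration_latency) {
    assert(trip_count >= 1 && iteration_latency >= 1);
    return trip_count * iteration_latency;
}

// Loop latency when the loop is pipelined with initiation interval ii:
// the first iteration takes iteration_latency cycles, and each of the
// remaining iterations finishes ii cycles after the previous one.
long loop_latency_pipelined(long trip_count, long iteration_latency, long ii) {
    assert(trip_count >= 1 && iteration_latency >= 1 && ii >= 1);
    return iteration_latency + ii * (trip_count - 1);
}
```

For a loop with 100 iterations and an iteration latency of 4 cycles, the estimate is 400 cycles without pipelining, 103 cycles with II = 1, and 202 cycles with II = 2.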
