fserre / sgen

SGen is a generator capable of producing efficient hardware designs operating on streaming datasets. “Streaming” means that the dataset is divided into several chunks that are processed during several cycles, thus allowing a reduced use of resources. The size of these chunks is referred to as the streaming width. SGen outputs a Verilog file that can be used for FPGAs.

Home Page: https://acl.inf.ethz.ch/research/hardware

License: GNU General Public License v3.0

Scala 27.22% VHDL 72.78%
scala3 dft fft verilog streaming

sgen's Introduction

SGen

SGen is a generator capable of producing efficient hardware designs for a variety of signal processing transforms. These designs operate on streaming data, meaning that the dataset is divided into several chunks that are processed during several cycles, thus allowing a reduced use of resources. The size of these chunks is called the streaming width. As an example, the figures below represent three discrete Fourier transforms on 8 elements, with a streaming width of 8 (no streaming), 4 and 2.

[Figures: Iterative Cooley-Tukey FFT on 8 points; the same FFT streamed with a streaming width of 4; the same FFT streamed with a streaming width of 2.]

The generator outputs a Verilog file that can be used for FPGAs.

  • A technical overview and an interface to download various generated designs is available here.
  • Feel free to report any bug, issue, or suggestion to François Serre.

Quick Start

The easiest way to use SGen is by using SBT. The following commands, in a Windows or Linux console, will generate a streaming Walsh-Hadamard transform on 8 points:

git clone https://github.com/fserre/sgen.git
cd sgen
sbt "run -n 3 wht"

The following section describes the different parameters that can be used.

Command-line interface

An SGen command line consists of a list of parameters followed by the desired transform.

Parameters

The following parameters can be used:

  • -n n Logarithm of the size of the transform. As an example, -n 3 means that the transform operates on 2^3=8 points. This parameter is required.
  • -k k Logarithm of the streaming width of the implementation. -k 2 means that the resulting implementation will have 2^2=4 input ports and 4 output ports, and will perform a transform every 2^(n-k) cycles. If this parameter is not specified, the implementation is not folded, i.e. it has one input port and one output port for each data point, and performs one transform per cycle.
  • -r r Logarithm of the radix (for DFTs and WHTs). This parameter specifies the size of the base transform used in the algorithm. r must divide n and, for compact designs (dftcompact and whtcompact), be smaller than or equal to k. It is ignored by transforms that do not require it (permutations). If this parameter is not specified, the highest possible radix is used.
  • -o filename Name of the output file.
  • -benchmark Adds a benchmark module in the generated design.
  • -rtlgraph Produces a DOT graph representing the design.
  • -dualramcontrol Uses dual-control for memory (read and write addresses are computed independently). This option yields designs that use more resources than with single RAM control (default), but that have more flexible timing constraints (see the description in generated files). This option is automatically enabled for compact designs.
  • -singleported Uses single-ported memory (read and write addresses are the same). This option has the same constraints as single RAM control (default), but may have a higher latency.
  • -zip Creates a zip file containing the design and its dependencies (e.g. FloPoCo modules).
  • -hw repr Hardware arithmetic representation of the input data. repr can be one of the following:
    • char Signed integer of 8 bits. Equivalent of signed 8.
    • short Signed integer of 16 bits. Equivalent of signed 16.
    • int Signed integer of 32 bits. Equivalent of signed 32.
    • long Signed integer of 64 bits. Equivalent of signed 64.
    • uchar Unsigned integer of 8 bits. Equivalent of unsigned 8.
    • ushort Unsigned integer of 16 bits. Equivalent of unsigned 16.
    • uint Unsigned integer of 32 bits. Equivalent of unsigned 32.
    • ulong Unsigned integer of 64 bits. Equivalent of unsigned 64.
    • float Single precision floating point (32 bits). Equivalent of ieee754 8 23.
    • double Double precision floating point (64 bits). Equivalent of ieee754 11 52.
    • half Half precision floating point (16 bits). Equivalent of ieee754 5 10.
    • minifloat Minifloat of 8 bits. Equivalent of ieee754 4 3.
    • bfloat16 bfloat16 floating point (16 bits). Equivalent of ieee754 8 7.
    • unsigned size Unsigned integer of size bits.
    • signed size Signed integer of size bits. Equivalent of fixedpoint size 0.
    • fixedpoint integer fractional Signed fixed-point representation with an integer part of integer bits and a fractional part of fractional bits.
    • flopoco wE wF FloPoCo floating point representation with an exponent size of wE bits, and a mantissa of wF bits. The resulting design will depend on the corresponding FloPoCo generated arithmetic operators, which must be placed in the flopoco folder. If the corresponding VHDL file is not present, SGen provides the command line to generate it. Custom options (e.g. frequency or target) can be used.
    • ieee754 wE wF IEEE754 floating point representation with an exponent size of wE bits, and a mantissa of wF bits. Arithmetic operations are performed using FloPoCo. Note that, unless otherwise specified when generating FloPoCo operators, denormal numbers are flushed to zero.
    • complex repr Cartesian complex number. Represented by the concatenation of its coordinates, each represented using repr arithmetic representation.
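
The relationships between -n, -k and -r described above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration and is not part of SGen. The function names `stream_shape` and `radix_is_valid` are hypothetical, and the accepted range 0 <= k <= n is an assumption.

```python
from typing import Dict, Optional


def stream_shape(n: int, k: Optional[int] = None) -> Dict[str, int]:
    """Derive sizes from the log2 parameters -n and -k (hypothetical helper)."""
    if k is None:
        k = n  # -k not specified: the design is not folded
    if not 0 <= k <= n:  # assumed valid range, not taken from SGen itself
        raise ValueError("expected 0 <= k <= n")
    return {
        "points": 1 << n,                      # transform size, 2^n
        "ports": 1 << k,                       # input (and output) ports, 2^k
        "cycles_per_transform": 1 << (n - k),  # one transform every 2^(n-k) cycles
    }


def radix_is_valid(n: int, r: int, k: Optional[int] = None,
                   compact: bool = False) -> bool:
    """Check the documented constraints on -r (hypothetical helper)."""
    if r < 1 or n % r != 0:  # r must divide n
        return False
    if compact:              # compact designs additionally need r <= k
        return r <= (n if k is None else k)
    return True


print(stream_shape(5, 2))
print(radix_is_valid(10, 5, k=3, compact=True))
```

For instance, -n 5 -k 2 corresponds to 32 points on 4 ports, with one transform every 8 cycles.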

Transforms

Supported transforms are the following:

Streaming linear permutations

Linear permutations can be implemented using the lp command:

# generates a bit-reversal permutation on 32 points, streamed on 2^2=4 ports.
sbt "run -n 5 -k 2 lp bitrev"

# generates a streaming datapath that performs a bit-reversal permutation on 8 points on the first dataset, and a "half-reversal" on the second dataset on 2 ports
sbt "run -n 3 -k 1 lp bitrev 100110111"

The command lp takes as parameter the invertible bit-matrix representing the linear permutation (see this publication for details) in row-major order. Alternatively, the matrix can be replaced by the keyword bitrev or identity.

Several bit-matrices can be listed (separated by a space) to generate a datapath performing several permutations. In this case, the first permutation will be applied to the first dataset entering, the second one to the second dataset, and so on.

The resulting implementation supports full throughput, meaning that no delay is required between two datasets.
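
To make the bit-matrix notation more concrete, here is a small Python sketch of how a row-major bit-matrix string can be read as a permutation of indices. It is not SGen's implementation. It assumes that the index bits form a column vector with the most significant bit first, and that element i is sent to position P·i over GF(2). The conventions actually used by SGen are defined in the publication mentioned above and may differ. The helper names are hypothetical.

```python
from typing import List


def parse_bit_matrix(bits: str, n: int) -> List[List[int]]:
    """Parse an n x n bit-matrix given in row-major order, e.g. '100110111'."""
    if len(bits) != n * n or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of n*n characters, each 0 or 1")
    return [[int(bits[row * n + col]) for col in range(n)] for row in range(n)]


def apply_bit_matrix(matrix: List[List[int]], index: int) -> int:
    """Multiply the matrix by the bit vector of index over GF(2)."""
    n = len(matrix)
    # Assumed convention: most significant bit first.
    in_bits = [(index >> (n - 1 - col)) & 1 for col in range(n)]
    result = 0
    for row in matrix:
        bit = 0
        for m, b in zip(row, in_bits):
            bit ^= m & b  # GF(2): AND is multiplication, XOR is addition
        result = (result << 1) | bit
    return result


def permutation(matrix: List[List[int]]) -> List[int]:
    """List the image of every index 0 .. 2^n - 1."""
    return [apply_bit_matrix(matrix, i) for i in range(1 << len(matrix))]


# Bit reversal on 8 points is the anti-diagonal matrix.
bitrev3 = parse_bit_matrix("001010100", 3)
print(permutation(bitrev3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Under these assumptions, the matrix 100110111 from the example above also yields a permutation of 0..7, which reflects the requirement that the matrix be invertible.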

Fourier Transforms (full throughput)

Fourier transforms (with full-throughput, i.e. without delay between datasets) can be generated using the dft command:

# generates a Fourier transform on 16 points, streamed on 4 ports, with a complex fixed-point datatype with an integer part of 8 bits and a fractional part of 8 bits.
sbt "run -n 4 -k 2 -hw complex fixedpoint 8 8 dft"
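
According to the parameter list, fixedpoint 8 8 denotes a signed representation with 8 integer bits and 8 fractional bits, 16 bits in total per coordinate. The Python sketch below illustrates the resulting range and resolution. It is only an illustration: the helper names are hypothetical, and the rounding and saturation behaviour are assumptions, since the arithmetic actually generated by SGen may round or overflow differently.

```python
def to_fixed(value: float, integer: int, fractional: int) -> int:
    """Quantize value to a signed fixed-point word of integer + fractional bits."""
    total = integer + fractional
    raw = round(value * (1 << fractional))  # assumed: round to nearest
    lowest = -(1 << (total - 1))
    highest = (1 << (total - 1)) - 1
    return max(lowest, min(highest, raw))   # assumed: saturate on overflow


def from_fixed(raw: int, fractional: int) -> float:
    """Convert a raw fixed-point word back to a real value."""
    return raw / (1 << fractional)


print(to_fixed(1.5, 8, 8))                       # 384
print(from_fixed(to_fixed(1.5, 8, 8), 8))        # 1.5
print(from_fixed(1, 8))                          # 0.00390625, the resolution
print(from_fixed(to_fixed(1000.0, 8, 8), 8))     # 127.99609375, the largest value
```

With this format, values smaller than about 0.002 in magnitude quantize to zero, and values outside roughly [-128, 128) cannot be represented. This is worth keeping in mind when choosing a representation for large transforms.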

Fourier Transforms (compact design)

Fourier transforms (with an architecture that reuses several times the same hardware) can be generated using the dftcompact command:

# generates a Fourier transform on 1024 points, streamed on 8 ports, with a complex fixed-point datatype with an integer part of 8 bits and a fractional part of 8 bits.
sbt "run -n 10 -k 3 -hw complex fixedpoint 8 8 dftcompact"

RAM control

In the case of a streaming design (n > k), memory modules may need to be used. In this case, SGen lets you choose the control strategy used for these modules:

  • Dual RAM control: read and write addresses are computed independently. This offers the highest flexibility (a dataset can be input at any time after the previous one), but it uses more resources. It is automatically used for compact designs (dftcompact), and can be enabled for other designs using the -dualramcontrol parameter.
  • Single RAM control: the write address is the same as the read address, delayed by a constant time. This uses fewer resources but is less flexible: a dataset must be input either immediately after the previous one, or only once the previous dataset has been completely output. This is the default mode (except for compact designs).
  • Single-ported RAM: write and read addresses are the same. This has the same constraints as Single RAM control, but may have a higher latency.

sgen's Issues

Bad Coding Style

Bad coding style of combinatorial logic using non-blocking assignment

DFT generation with 1 sample per cycle fails

I changed the project version from 1.5.5 to 1.8.0 to get around a Java issue like this related to sbt not starting:
sbt/sbt#6925

Now the generator runs, but not if I select n=6, k=0 (1 sample per cycle). With n=6, k=1 it works fine. Or with n=6, k=6 it also works fine. See below for the NoSuchElementException:

sbt "run -n 6 -k 0 -hw complex fixedpoint 16 0 -dualramcontrol -o sgen_dft.v dft"
[info] welcome to sbt 1.8.0 (Homebrew Java 19.0.1)
[info] loading project definition from /Users/<user>/git/SGen/project
[info] loading settings for project root from build.sbt ...
[info] set current project to SGen (in build file:/Users/<user>/git/SGen/)
[info] running (fork) Main -n 6 -k 0 -hw complex fixedpoint 16 0 -dualramcontrol -o sgen_dft.v dft
[info]    _____ ______          SGen v.0.2 - A Generator of Streaming Hardware
[info]   / ___// ____/__  ____  Department of Computer Science, ETH Zurich, Switzerland
[info]   \__ \/ / __/ _ \/ __ \
[info]  ___/ / /_/ /  __/ / / / Copyright (C) 2020-2021 François Serre ([email protected])
[info] /____/\____/\___/_/ /_/  https://github.com/fserre/sgen
[info] This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it and to modify it under the terms of the GNU GPLv3+ <http://gnu.org/licenses/gpl.html>.
[error] Exception in thread "main" java.util.NoSuchElementException: empty.head
[error] 	at scala.collection.immutable.Vector.head(Vector.scala:277)
[error] 	at Main$.r$1(Main.scala:62)
[error] 	at Main$.main(Main.scala:153)
[error] 	at Main.main(Main.scala)
[error] Nonzero exit code returned from runner: 1
[error] (Compile / run) Nonzero exit code returned from runner: 1
[error] Total time: 1 s, completed Dec 12, 2022, 3:17:50 PM

Huge Memory Use

When used for an FFT with n = 10 and k = 7, with a complex char data type, the memory usage exceeds 128 GB. Is it a feature?
...
[info] running main -n 10 -k 7 -r 1 -hw complex fixedpoint 4 4 dft
...
[error] (run-main-0) java.lang.IllegalArgumentException: requirement failed: ArrayDeque too big - cannot allocate ArrayDeque of length 1073741824
...

Add a i_valid for slow inputs (high system_clk/sample_clk ratios)

Having to keep feeding the core k samples every cycle wastes extra BRAM or other resources when the input data rate is much slower than the system clock (a case dftcompact/burst is perfect for). The same applies to the output: downstream blocks likely can't keep up with k samples every clock cycle. A valid signal would greatly improve usability, and a ready signal could also help with signaling the end of a dataset.

Mismatch between expected output and actual output when running testbench

Hi there,
I was running the testbench for the design of 65536-point FFT with 256 streaming width. In terms of fixed-point representation, I set 8-bit integer and 8-bit fraction. Under this setting, I encountered the case that the actual output is mismatched with the expected output. I have seen the output of some points are really close to the expected output, while some have a large gap. I would like to know if this case is normal and if this is due to the precision loss of fixed-point representation. Below, I capture part of the testbench output. I would really appreciate a quick response.

[image]

Request for testbench

Hi,

I'm a hardware designer looking for an available FFT core in Verilog or VHDL, and I found your great contribution here. But this repo does not seem to be complete. There is no testbench for the generated FFT core, nor any other program for testing it. The README is not enough to document the whole program.

Maybe the author could add some testbench or some documents to make the project more complete.
