Giter Site home page Giter Site logo

gsmecher / minimax Goto Github PK

View Code? Open in Web Editor NEW
200.0 10.0 13.0 253 KB

Minimax: a Compressed-First, Microcoded RISC-V CPU

License: BSD 3-Clause "New" or "Revised" License

Assembly 10.63% VHDL 4.17% Tcl 0.68% Makefile 1.64% C 11.46% Shell 1.25% Python 7.66% Verilog 60.82% Dockerfile 1.70%

minimax's Introduction

Minimax: a Compressed-First, Microcoded RISC-V CPU

CI status

What?

RISC-V's compressed instruction (RVC) extension is intended as an add-on to the regular, 32-bit instruction set, not a replacement or competitor. Its designers designed RVC instructions to be expanded into regular 32-bit RV32I equivalents via a pre-decoder.

What happens if we explicitly architect a RISC-V CPU to execute RVC instructions, and "mop up" any RV32I instructions that aren't convenient via a microcode layer? What architectural optimizations are unlocked as a result?

"Minimax" is an experimental RISC-V implementation intended to establish if an RVC-optimized CPU is, in practice, any simpler than an ordinary RV32I core with pre-decoder. While it passes a modest test suite, you should not use it without caution. (There are a large number of excellent, open source, "little" RISC-V implementations you should probably use reach for first.)

Originally, we included both Verilog and VHDL implementations. Sadly, the VHDL implementation has been retired.

In short:

  • RV32C (compressed) instructions are first-class and execute at 1 clock per instruction. (Exceptions: branches have a 2-cycle "taken" penalty.)

  • All RV32I instructions are emulated in microcode, using the instructions above.

This is distinct from (all?) other RV32C-capable RISC-V cores, because it really is architected for compressed first. This is not how the compressed ISA was intended to be implemented.

Why?

A compressed-first RISC-V architecture unlocks the following:

  • 1 clock per instruction (CPI) using a 2-port register file. RVC instructions have only 1 rd and 1 rs field. A 2-port register file maps cleanly into a single RAM64X1D per bit.

  • A simplified 16-bit instruction path without alignment considerations. The processor is a modified Harvard architecture, with a separate 16-bit instruction bus intended to connect to a second port of the instruction memory. On Xilinx, the asymmetric ports (16-bit instruction, 32-bit data) are reconciled using an asymmetric block RAM primitive. As a result, we don't have to worry about a 32-bit instruction split across two 32-bit words.

Why is this desirable?

  • Compilers (GCC, LLVM) are learning to prefer RVC instructions when optimizing for size. This means compiled code (with appropriate optimization settings) plays to Minimax's performance sweet-spot, preferring direct instructions to microcoded instructions. (see e.g. https://muxup.com/2022q3/whats-new-for-risc-v-in-llvm-15)

  • RVC instructions nearly double code density, which pay for the cost of microcode ROM when compared against a minimalist RV32I implementation.

  • It's not quite the smallest RVC implementation (SERV is smaller), but it is likely much faster with the appropriate compiler settings, and slightly less unorthodox in implementation.

What's awkward?

  • RVC decoding is definitely uglier than regular RV32I. I expect this ugliness is better masked when RVC instructions are decoded to RV32I and executed as "regular" 32-bit instructions.

How?

What's the design like?

  • Three-stage pipeline (fetch, decode, and everything-else). There is a corresponding 2-cycle penalty on taken branches.

  • Several "extension instructions" that use the non-standard extension space reserved in C.SLLI. This space allows us to add "fused" instructions accessible only in microcode, that perform the following:

    • "Thunk" from microcode back to standard code,
    • Move data from "user" registers into "microcode" registers and back again.

    These instructions are only part of the microcode - you are not required to build an unusual toolchain to use Minimax.

Performance

Resource Usage

The following statistics were collected using an Arty A7 (35T) as an execution target. This FPGA uses LUT6s.

Resource usage (excluding ROM and peripherals; KU060; 12-bit PC):

  • Minimax: 191 FFs, 507 CLB LUTs

Compare to:

  • PicoRV32: 483 FFs, 782 LUTs ("small", RV32I only)
  • FemtoRV32 186 FFs, 411 LUTs ("quark", RV32I only)
  • SERV: 312 FFs, 182 LUTs (no CSR or timer; RV32I only)
  • PicoBlaze: 82 FFs, 103 LUTs

Minimax is competitive, even against RV32I-only cores. When comparing against RV32IC implementations, it does better:

  • SERV: 303 FFs, 336 LUTs (no CSR or timer; RV32IC)
  • PicoRV32: 518 FFs, 1085 LUTs (RV32IC)

It is difficult to gather defensible benchmarks: please treat these as approximate, and let me know if they are inaccurate.

Fmax

Minimax meets timing closure at 100 MHz on an Artix A7 FPGA (-1 speed grade; xc7a35tcsg324-1).

How To

Regression Tests

The following command line snippet:

$ cd minimax/tests
$ run

...will run regression tests using the cSail framework. (The regression tests take several minutes to complete; the initial run is even slower since it involves a git checkout of the regression tests themselves.)

After completion, you should see something like the following:

INFO | TEST NAME                                          : COMMIT ID                                : STATUS
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cadd-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/caddi-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/caddi16sp-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/caddi4spn-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cand-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/candi-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cbeqz-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cbnez-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cj-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cjal-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cjalr-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cjr-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cli-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/clui-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/clw-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/clwsp-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cmv-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cnop-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cor-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cslli-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/csrai-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/csrli-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/csub-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/csw-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cswsp-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/C/src/cxor-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/addi-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/and-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/andi-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/auipc-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/beq-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bge-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bgeu-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/blt-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bltu-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bne-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/fence-01.S : 81c7a2b769baa2f33f40bc5455299b1362b5d125 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jal-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jalr-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lb-align-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lbu-align-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lh-align-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lhu-align-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lui-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lw-align-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/misalign1-jalr-01.S : 0c4cdffe19b1a48d9fec8590c8817af2ff924a37 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/or-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/ori-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sb-align-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sh-align-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sll-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slli-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slt-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slti-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltiu-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltu-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sra-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srai-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srl-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srli-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sub-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sw-align-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xor-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | /minimax/test/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xori-01.S : b91f98f3a0e908bad4680c2e3901fbc24b63a563 : Passed
INFO | Test report generated at /minimax/test/riscof_work/report.html.

Example Bitstream

The following command line snippet:

$ source /path/to/Vivado/settings.sh
$ cd minimax/tcl
$ ./arty_a7.tcl

...will create a "blinker" project using a Minimax core in Vivado. You should be able to click "generate bitstream" and produce a blinking light using an Arty A35T board.

Contributing

Minimax needs the following:

  • Zicsr and interrupt support

I am happy to collaborate and/or provide mentorship on this or any other Minimax-related project. Comments and PRs always welcome.

Graeme Smecher [email protected]

minimax's People

Contributors

gsmecher avatar xobs avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

minimax's Issues

Implement Zc* support

The RISC-V code size reduction group (https://github.com/riscv/riscv-code-size-reduction) has nearly completed their work. If/when this gets rolled into the accepted RISC-V specifications, Minimax benefits in a few ways:

  • Better compliance: RVC-compliant half/byte writes (c.sb/c.sh) plug a conspicuous implementation gap. Without them, there's no way to emulate RV32I half or byte-wide writes without hardware support. They are currently stubbed out and unsupported in Minimax, which creates an easy footgun for compiled code. With c.sb/c.sh, these writes become just another emulated instruction.
  • Better code density: even if the new instructions are emulated (e.g. cm.push/cm.pop/cm.popret), they still decrease code size - and where they are emulated, performance with a single instruction is much better than performance with several.
  • Better performance: emulated (32-bit) instructions are 40-400x slower than native (16-bit, RVC) instructions. The new instructions are (excepting table jump, and possibly excepting multiple-register pushes/pops) inexpensive in logic and worth implementing directly.
  • More applications: the compressed multiply instruction (c.mul) is a fascinating opportunity to make Minimax into a tiny signal-processing engine. (Unfortunately, it'd be restricted to ~16-bit precision.) This is something it flatly can't do now, even with an emulated 32-bit multiply instruction.

Fix byte / half-word writes

The following RV32I STORE instructions flatly don't work:

  • SB: store byte
  • SH: store half-word

They can't, right now, because there is no way in RVC to express writes smaller than a word. It would take a custom opcode and some RTL to fix this.

The Zcb proposal (https://github.com/riscv/riscv-code-size-reduction) looks like a perfect opportunity to fix this in a hopefully-to-be-standardized way. We get the following 16-bit instructions:

  • c.sb
  • c.sh

and can emulate the 32-bit equivalents using these primitives, for very little RTL cost.

Consider taping out your CPU for free using Google's open MPW program

It would be awesome to see how this CPU would work in ASIC form and Google is offering free tape outs to open source silicon projects on SkyWater's 130nm and GlobalFoundries 180nm MCU process technologies. You can even use an entirely open source toolchain!

I feel like this MCU could be an ideal fit for the GF180MCU process technology which people currently use for low cost 8bit MCU designs.

Find out more at https://developers.google.com/silicon and https://efablesa.com and https://opensource.googleblog.com/2022/08/GlobalFoundries-joins-Googles-open-source-silicon-initiative.html

You can also join the https://open-source-silicon.dev slack workspace to find other people doing cool things in the open source ASIC space.

Use RISCOF regression tests

The "blessed" regression test framework seems to be riscof:

https://github.com/riscv-software-src/riscof

We currently have a small number of spot-tests, but could benefit from the whole framework.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.