kuznia-rdzeni / coreblocks Goto Github PK

View Code? Open in Web Editor NEW

33.0 6.0 13.0 35.21 MB

RISC-V out-of-order core for education and research purposes

Home Page: https://kuznia-rdzeni.github.io/coreblocks/

License: BSD 3-Clause "New" or "Revised" License

Python 98.34% Shell 0.69% Dockerfile 0.30% Assembly 0.51% Makefile 0.17%

risc-v riscv

coreblocks's Introduction

Coreblocks

Coreblocks is an experimental, modular out-of-order RISC-V core generator implemented in Amaranth. Its design goals are:

Simplicity. Coreblocks is an academic project, accessible to students. It should be suitable for teaching essentials of out-of-order architectures.
Modularity. We want to be able to easily experiment with the core by adding, replacing and modifying modules without changing the source too much. For this goal, we designed a transaction system inspired by Bluespec.
Fine-grained testing. Outside of the integration tests for the full core, modules are tested individually. This is to support an agile style of development.

In the future, we would like to achieve the following goals:

Performance (up to a point, on FPGAs). We would like Coreblocks not to be too sluggish, without compromising the simplicity goal. We don't wish to compete with high performance cores like BOOM though.
Wide(r) RISC-V support. Currently, we are focusing on getting the support for the core RV32I ISA right, but the ambitious long term plan is to be able to run full operating systems (e.g. Linux) on the core.

State of the project

The core currently supports the full RV32I instruction set and several extensions, including M (multiplication and division) and C (compressed instructions). Exceptions and some of machine-mode CSRs are supported, the support for interrupts is currently rudimentary and incompatible with the RISC-V spec. Coreblocks can be used with LiteX (currently using a patched version).

The transaction system we use as the foundation for the core is well-tested and usable. We plan to make it available as a separate Python package.

Documentation

The documentation for our project is automatically generated using Sphinx.

Resource usage and maximum clock frequency is automatically measured and recorded.

Contributing

Set up the development environment following the project documetation.

External contributors are welcome to submit pull requests for simple contributions directly. For larger changes, please discuss your plans with us through the issues page or the discussions page first. This way, you can ensure that the contribution fits the project and will be merged sooner.

License

This project is three-clause BSD licensed.

coreblocks's People

Contributors

Stargazers

Watchers

Forkers

mfkiwl tilk jumideluxe piotro888 durchbruchswagen arusekk kindlermikolaj makz00 lekcyjna123 kristopher38 xthaid qbojj julianvolodia

coreblocks's Issues

Reducing the docker image size

The Docker image for synthesis (created in #108) is currently very big (719 MB). A large part of this size is: new packages (347 MB) and nextpnr build (309 MB). Both of these can be made smaller:

Downloaded .debs can be removed by calling apt-get clean after installing.
The nextpnr build directory can be cleaned after installing.

Some smaller image size gains may also be possible.

Add pep8 naming check

To keep naming style uniform, it should be checked in CI. It's possible that ‘pep8-naming‘ or some other plugin to flake could help.

Functional unit for Zbs extension

The Zbs extension (which is a part of a larger bit manipulation specification) includes bit set, clear, invert and set instructions, which are very useful on microcontrollers.

https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf

Fetch unit with jump detection

First approach to implementing jumps will involve stopping the fetch unit when a conditional jump is encountered. This will avoid dealing with speculation until we are ready to implement it. Restarting the jump unit will be performed by the unit which handles computing the target jump address.

Dumb load/store unit

This task consists of implementing a load/store unit that can only do serialized memory accesses, integrating it with the core and connecting it to an external memory in the tests.

Add support for M extension to decoder

In order to perform MULDIV operation we need first to implement decoding those kind of instructions.

Functional unit for division (M extension)

To support the M extension, we would need to have a functional unit that processes divisions. Division implementation can be multicycle non-pipelined.

Transaction/method call graph visualization

It would be nice to have an automatically generated graph showing the high level structure of the full core in graphical form. Possible implementation could grab the necessary information from the TransactionManager and visualize it using graphviz or something similar.

Add docstrings to graph generation

MR #105 have no doc strings. There should be added in new review.

Freeze package versions for repeatable testing

Used versions of dependent packages should be explicit so that the test results are repeatable both on CI and local.

Functional unit for multiplication (M extension)

To support the M extension, we would need to have a functional unit that processes multiplications. Multiplication implementation should probably be pipelined.

Wishbone slave memory

Currently we don't have a wishbone slave implementation, which is required if we want to perform any memory accesses. I've researched our options with regards to fixing this and they're as follows (in no particular order):

amaranth-soc, particularly with this PR. Written in amaranth, implements wishbone interface and some handy features like memory maps that we won't need for now, but the PR specifically implements a bridge between "SRAM bus" (bus with raw memory signals like data, addr, w_en, r_en etc.) and wishbone. Minimal glue-code would be needed to get it running (or at least I think so from a brief look at the source).
litex's version of wishbone SRAM. Written in migen, so it would need to be ported over (without bursting it should be minimal effort?) to amaranth since I'm told that it can't directly interface with migen (not without dumping either of them into verilog first). Choosing this would probably mean writing a few extra tests in addition to the ones which we would port over from litex, since as far as I can see they only test bursting.
Rolling our own wishbone slave - looking at the PR mentioned in 1., specifically here, and litex's version here (where the actual transaction happens) makes me think implementing wishbone memory wishbone slave isn't as hard as it first seemed, especially when we're not doing anything fancy like supporting multiple memory segments or bursting.

I looked at using cocotb as was first suggested. Main problem I have with using it is that it works on verilog. so we'd need to dump our design to it first and test on that. There's integration with amaranth but it dumps to verilog anyway. We'd also lose the ability to do in-python debugging and possibly debugging with vcd dumps would be harder since we would be operating on amaranth-generated signal names. We should not pursue this route.

I'd like to hear your thoughts on this, especially from @tilk. I think bringing code mentioned in 1. is a bit of an overkill for the features we need, and we're capable of doing some blend of options 2. and 3. - writing our own slave while referencing litex's code in the process.

Load/store implementation

The CPU needs to implement load/store instructions. A load/store unit needs to implemented.

A simple implementation might just implement memory accesses in-order, without any optimizations. With speculative execution, care must be taken to only perform stores after retirement.

Backend implementation

The job of the backend is to receive results from the FU and propagate them to the relevant modules.

For this task, the form of the "Tomasulo bus" needs to be decided. Should it be just a signal, which every relevant module monitors in each cycle? Or should it use methods, like everything else, and therefore allow the bus to stall the backend?

Possibility of more generic test code

See for test_mul_unit.py in #114. There is preatty generic implementation of test pattern:

generate random input in queue
push one by one input from queue to unit
check if output from unit is the same as precomputed during generation

Please analyse possibility of making this code common and rewiriting other tests to use this new common code.

Multiple reservation stations

The core currently has only one reservation station - for the ALU unit. We would like to have multiple RS for different kinds of operations. This task involves modifying the scheduler so that it can select one of multiple RSs, and correctly connecting multiple RSs to the announcement unit.

This is needed to implement loads/stores, CSRs, etc.

Implement filter for methods

Idea from @piotro888.

Document high level structure of core

It would be nice to have documentation, which present all blocks and parameters from high level PoV, so that it would be easy to track how data flow look like.

Graph visualization in CI

Graphs from #105 for the master branch should be generated automatically in CI and deployed to Github Pages, so that they are easy to access.

Generating documetation from Python code

It would be useful to have code documentation in an accessible form. This could be automatically generated in Github Actions. I suggest using Sphinx and looking for ready-made Sphinx Github Actions.

Synthesis in CI

We would like to automatically synthesize our design in CI to measure performance metrics - e.g. used FPGA resources and maximum operating frequency. Initial target should probably be some ECP5 board, because of the availability of an open source toolchain.

Method priorities

RV32I calculator tests

The official RISC-V tests require jumps and memory access to work. As we want to implement these features later, some other method of testing the minimal working design will be needed.

Scheduler implementation

The scheduler's job is to receive decoded instructions, allocate physical registers for them, allocate a ROB entry and send them to RS.

This task requires some development of transactional interfaces to ROB, free-RF list, and possibly RS.

Pipelined multiplier

The task is to extend the multiplication support from #114 by a pipelined multiplier. The module should:

Be able to use the multiply-and-add capability of FPGA DSP elements (with configurable width).
Accept one multiplication on each clock cycle.
Have latency dependent on the DSP element width.

Interrupt implementation

A functional processor should be able to process external interrupts. As interrupts happen asynchronously, the processor needs to have correct precise exception handling: instructions in flight should be aborted.

This depends on working jump support.

Mypy: stubbing Amaranth

We need to have Amaranth stubbed so that we can check our code with mypy.

Document OneHotSwitch

The OneHotSwitch function is undocumented, which may make it hard to use by new members.

Functional unit(s) for Zbb extension

The Zbb extension (which is a part of a larger bit manipulation specification) includes the following kinds of instructions: bit counting, byte inversion, maximum/minimum, rotation, zero-extend and sign-extend. This task involves learning about these instructions, proposing an implementation plan, and implementation.

It's possible that this should result in multiple FUs, as some of these instructions don't share a lot of logic. The unit which does rotations might also do bit shifts, which could then be removed from the ALU (#173).

https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf

Simple generation of local documentation

There is no script which allows simple, effortless generation of documentation from the development environment. It should be just as simple as lint.sh and run_tests. Currently, the simplest way I know of is:

DOCS_DIR=docs BUILD_DIR=build ci/build-docs.sh

And it's not perfect, as it creates generated files in build and docs which are not in .gitignore. Cleaning these generated files is also not simple.

Scheduler name collisions

Currently there are three concepts we call schedulers:

a hardware module which has an input signal representing a set of requests and an output signal representing which request is granted, eg. amaranth.lib.scheduler.RoundRobin and coreblocks.transactions._utils.Scheduler (a one-hot version of the former).
a mechanism of executing transactions in our transaction framework. Here, multiple transactions (which request execution/whose execution condition is satisfied) can be granted at once. There were proposals to rename these to arbiters (and I have a git branch which implements that).
a middle stage of our processor. I can't find a precedent for this kind of name. It appears that this part is in-order and out-of-order scheduling first appears in the wakeup/select logic, which was apparently placed in the execution stage. I think this also should be named differently.

Integer functional unit implementation

We need an integer functional unit so that RV32I code can be run.

Improved, more readable graph visualization

Implement decoding exotic extensions

https://five-embeddev.com/riscv-isa-manual/latest/naming.html

Look at: from future import annotatons (PEP 563)

Could make sense in the implementation of transactions:

https://peps.python.org/pep-0563/

Initial processor documentation

The proposed workings of our design should be documented to make the work of implementation easier. The implementation should focus on the high level view of the design:

data structures and communication formats used in the design,
functional blocks, their connections with other functional blocks and data structures,
which updates occur at what stage of processing.

It would also be helpful to try to capture some low-level details, including the methods exposed by various blocks.

The documentation should be updated to reflect the actual shape of the design as implementation progresses, and our various misconceptions we made in initial stages are corrected.

Tests for the transaction framework

The transaction subsystem needs to be tested. Tests should cover various usage patterns, including explicit and implicit conflicts.

This issue depends on the testing solution for AdapterTrans which is being developed in #9.

Missing tests for RF

RF has been merged lately (together with rest of the scheduler) but I left out tests for the RF (to speed up merging and unblock other work). I won't be available for the next two weeks to finish that, so if anyone is willing to write them during that time it'd be much appreciated. Otherwise I'll get this done after the exam session.

Testing only retains VCD dump from the last test

Running run_tests.sh currently results in only one VCD dump being retained, because the name is hardcoded so it gets overwritten on each test when running multiple of them.
I have a few suggestions regarding dumps generation:

We need some way of uniquely identifying a test - to be able to generate a unique name for the dump file. This isn't made easy by the fact that a single unit test class might have multiple tests, generating multiple dumps. I'm not sure which route to take - we might consider manually uniquely naming each test or name them based on some sequential number (one sequence per class) and class name, i.e. "SchedulerTest-2', kinda like parametrize does it.
By default dumps should only be generated on failing tests to not locally clutter the main directory (run_tests.sh should take a parameter if someone wanted to override this behavior)
The script should also take some parameters to specify which test(s) we want to run, if the user only wants to run specific one(s).

Make a logo

Coreblocks doesn't have a logo. But logos are cool, they make a project more recognizable, so it would be nice to have one!

CSR extension

tell us something about CSR
prepare implementation proposition for CSR
implement CSR

Retirement implementation

The job of this module is to update the relevant data structures to retire the instructions in order. This frees various physical resources of the processor, so that they can be used again in the scheduler.

Fusing lower and upper multiplication into single operation

According to RISC-V documentation:

If both the high and low bits of the same product
are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1,
rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2).
Microarchitectures can then fuse these into a single multiply operation instead of performing two
separate multiplies.

Good idea would be to implement this optimization in our mul_unit.py. It should be a quite easy task good for getting familiar with Amaranth.

'assign' doesn't work with data from Arrays

A bunch of manual assignments in reservation station implementation (https://github.com/kuznia-rdzeni/coreblocks/blob/master/coreblocks/structs_common/rs.py#L78) could be made simpler by the use of assign from our utils lib. However implementation of assign assumes both lhs and rhs are records, however this is not the case with values we get from indexing arrays https://github.com/kuznia-rdzeni/coreblocks/blob/master/coreblocks/structs_common/rs.py#L75
The returned value is not a record but some (proxy (array ...)) object (I didn't investigate further). assign should be fixed to allow it to work with values from indexed arrays as well.

Include fetch unit into the core

Currently, the full core does not include the fetch unit. This needs to change in order to have jump support (#95). The tests for the full core need to be changed so that, instead of pushing instructions manually into the core, they are present in a "mocked" memory which is accessed by the fetch unit.

Figure out how to run tests under Python debugger

Currently to debug tests we can use a combination of looking at traces and printing values but we all know the comfort of printf debugging. We should figure out which command to run test of a given name under a python debugger (also in vscode?).

Frontend implementation

The job of this block is to fetch instructions, decode them and pass them for processing. For the "calculator" milestone, this module will have limited functionality, as there are no jumps, exceptions, interrupts etc. to be handled in some way.

Fetching should be done through the Wishbone bus; the Wishbone master should expose a method for this purpose, which needs to be called from the fetch stage. The Wishbone master should be external, so that it could be switched to e.g. an AXI master.

The decoded instruction should be passed to an external method, which - in the complete design - will push it to a FIFO.

Jump implementation

The CPU needs to implement jump instructions. This involves the following changes:

Implementing a jump unit for handling various types of jump instructions.
Modifying instruction dispatch so that jump instructions are handled by the jump unit.
Modifying the fetch unit so that it can start fetching instructions from a new address.

A simplest implementation might be non-speculative, with the fetch unit stopping after encountering a jump instruction. A speculative implementation requires canceling instructions in flight.

This task can probably be broken into smaller sub-tasks.

Fix CI action for synthesis/benchmark

The synthesis/benchmark action fails on master with following:

Error: Command 'git' failed with args '-c user.name=github-action-benchmark -c [email protected] -c http.https://github.com/.extraheader= fetch ***github.com/kuznia-rdzeni/coreblocks.git gh-pages:gh-pages': fatal: detected dubious ownership in repository at '/__w/coreblocks/coreblocks'
To add an exception for this directory, call:

	git config --global --add safe.directory /__w/coreblocks/coreblocks
: Error: The process '/usr/bin/git' failed with exit code 128

This is probably because running stuff in Docker messes up file ownerships. This needs to be addressed.

Unused imports check in CI

Several people had unused imports found in CR. These could be detected automatically. I believe this could be done by Pyright or some other tool.