openhwgroup / cva5

The CORE-V CVA5 is an application-class, 5-stage RISC-V CPU specifically targeting FPGA implementations.

License: Apache License 2.0

SystemVerilog 91.20% Tcl 3.10% C++ 4.74% Makefile 0.44% Python 0.52%

cva5's Introduction

CVA5

CVA5 is a 32-bit RISC-V processor designed for FPGAs supporting the Multiply/Divide and Double-precision Floating-Point extensions (RV32IMD). The processor is written in SystemVerilog and has been designed to be both highly extensible and highly configurable.

The CVA5 is derived from the Taiga Project from Simon Fraser University.

The pipeline has been designed to support parallel, variable-latency execution units and to readily support the inclusion of new execution units.

Documentation and Project Setup

For up-to-date documentation, as well as an automated build environment setup, refer to the Taiga Project.

License

CVA5 is licensed under the Solderpad License, Version 2.1 ( http://solderpad.org/licenses/SHL-2.1/ ). Solderpad is an extension of the Apache License, and many contributions to CVA5 were made under Apache Version 2.0 ( https://www.apache.org/licenses/LICENSE-2.0 )

Examples

A Zedboard configuration is provided under the examples directory, along with tools for running stand-alone applications and providing application-level simulation of the system. (See the README in the zedboard directory for details.)

Publications

E. Matthews, A. Lu, Z. Fang and L. Shannon, "Rethinking Integer Divider Design for FPGA-Based Soft-Processors," 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, CA, USA, 2019, pp. 289-297. doi: https://doi.org/10.1109/FCCM.2019.00046

E. Matthews, Z. Aguila and L. Shannon, "Evaluating the Performance Efficiency of a Soft-Processor, Variable-Length, Parallel-Execution-Unit Architecture for FPGAs Using the RISC-V ISA," 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Boulder, CO, 2018, pp. 1-8. doi: https://doi.org/10.1109/FCCM.2018.00010

E. Matthews and L. Shannon, "TAIGA: A new RISC-V soft-processor framework enabling high performance CPU architectural features," 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 2017. https://doi.org/10.23919/FPL.2017.8056766

cva5's Issues

Interest in Flopoco-based FPU, Instruction invalidation on self-modifying code or Verilator Improvements?

In my fork, I have added some functionality to CVA5. I have a significantly modified environment in which I test, so cannot simply issue pull-requests right now.
Is there interest in somehow getting some or all of these changes upstream? And if so, which?

The main additions are:

- A Flopoco-based FPU with customizable pipeline latencies across 4 CVA5-pipeline modules and an FP-RegFile. The FPU supports all single-precision operations, but no subnormals, rounding modes, exceptions or FPU-CSR registers yet. Since Flopoco generates VHDL code, there is a Verilator DPI-C implementation for all of those that is a drop-in replacement for the actual Flopoco implementations. It would require additional work to regenerate matching Flopoco implementations from a user-supplied Flopoco binary instead of the pre-generated files I use in my own git submodule, but the Verilator implementation runs out of the box. FP latencies are not yet configurable from the CPU_CONFIG structure, but could easily be, as they are already parameterized inside the pipelines.
  - The FP RF is a separate instance of the existing RF (now more parameterizable), with 64 physical registers that are also handled by the Renamer. They are 2 bits wider than GP regs to match Flopoco's wider format. This also allows for cheaper 3r1w ports independent of the GP RF (although the FP-MAC implementation does not use the 3rd operand simultaneously. Enhanced decode & issue stages could make do with only 2 read ports). I have not investigated the synthesis impact of using a shared pool of physical registers to avoid allocating 2x 64 registers or to mitigate the need for some separate infrastructure in the renamer.
- Optional (build-time and runtime, controlled via CSR) instruction invalidation for all data writes. The instruction cache and branch predictor were not kept coherent with data when that was changed, so bootloader functionality was problematic. I have not kept up to date with what CVA5 can do in this regard out of the box (there seems to be some early-branch-flush feature to at least handle this in the predictor). This invalidation can slow down the processor a bit, as each write is signaled to a configurable number of FIFOs to check for needed invalidation. The invalidation is off by default and needs to be enabled via a custom CSR register for as long as overwriting existing instructions is possible.
- I have reworked the Verilator implementation with new command-line options/parsing and features. Among them:
  - Extensible. I can have my own build with a different top-level file with more ports by switching out one C file and reusing all the rest
  - can terminate on reaching infinite loops (optional) or a user-exit magic-nop
  - configurable stall limit (RT-OS with RFI instructions will hit the hard-coded limit very fast)
  - UART redirectable to file, including inputs. Can be used with socat to simulate bootloaders communicating over UART with the actual host-side loader tool
  - new combined format for memory contents. I have devised a text format that lists an arbitrary number of binary files, each with offsets and ranges from which actual memory contents and reference contents can be loaded. Verilator can initialize both local memory and DRAM from this format. It is essentially a dumbed-down, human-readable ELF header table, which means that, in most cases, my tool (written in Kotlin, complex, reads and understands ELF files) just generates this index file, but the actual memory contents are read from the original ELF binary. Additional contents can easily be mixed in or overlaid. Since the format is sparse and supports zero-initializing, it can save a lot of space compared to the existing hex files.
  - local memory and DRAM can be initialized separately, even from the existing hw_init hex formats
  - use FST format instead of VCD (but configurable at build-time). Much faster and more space-saving
  - out-of-tree: I now build Verilator with CMake, which builds faster and is more comfortable, which is also where the Flopoco source files are integrated right now
- Out-of-tree/WIP: Zephyr port, intended to build a multi-threaded application that manages many things, including bootloading via an additional UART port (supports user mode and UART, but works around the lack of RISC-V PMP, so user mode is not actually isolated by any means)

Physical address space, starting at 0x00 in M Mode, fail at first control flow

Idea:
I would like to execute bare metal instructions from instruction_local_memory in the range from
0x00000000 to 0x0000FFFF
and access data_local_memory in the range from
0x00010000 to 0x0001FFFF
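
The intended layout can be written down as a small address-map check, which is handy when comparing against addresses seen in a waveform. This is only a sketch: the region names and the helper functions are illustrative and are not taken from the CVA5 sources; only the two address ranges come from the description above.

```python
# Address ranges quoted in the issue; the dictionary and helper names are hypothetical.
REGIONS = {
    "instruction_local_memory": (0x0000_0000, 0x0000_FFFF),
    "data_local_memory": (0x0001_0000, 0x0001_FFFF),
}


def region_of(addr):
    """Return the name of the region containing addr, or None if it is unmapped."""
    for name, (lo, hi) in REGIONS.items():
        if lo <= addr <= hi:
            return name
    return None


def check_access(addr, size=4):
    """True if a naturally aligned access of `size` bytes stays inside one region."""
    first = region_of(addr)
    last = region_of(addr + size - 1)
    return first is not None and first == last and addr % size == 0


if __name__ == "__main__":
    for addr in (0x0000_0084, 0x0001_FFFC, 0x0002_0000):
        print(f"{addr:#010x}: {region_of(addr)}")
```

With this map, a stack pointer initialised to 0x00020000 sits one byte past the end of data_local_memory, so only a pre-decrementing push (for example to 0x0001FFFC) lands inside the region.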

Problems:
After activating only INCLUDE_M_MODE and deactivating INCLUDE_S_MODE and INCLUDE_U_MODE, cva5 will not execute even a single instruction.

When activating all three modes, it will execute the first straight-line code instructions correctly, but will fail at the first control flow instruction (JAL).

   0:	00000093          	li	ra,0
   4:	00000113          	li	sp,0
   8:	00000193          	li	gp,0
   c:	00000213          	li	tp,0
  10:	00000293          	li	t0,0
  14:	00000313          	li	t1,0
  18:	00000393          	li	t2,0
  1c:	00000413          	li	s0,0
  20:	00000493          	li	s1,0
  24:	00000513          	li	a0,0
  28:	00000593          	li	a1,0
  2c:	00000613          	li	a2,0
  30:	00000693          	li	a3,0
  34:	00000713          	li	a4,0
  38:	00000793          	li	a5,0
  3c:	00000813          	li	a6,0
  40:	00000893          	li	a7,0
  44:	00000913          	li	s2,0
  48:	00000993          	li	s3,0
  4c:	00000a13          	li	s4,0
  50:	00000a93          	li	s5,0
  54:	00000b13          	li	s6,0
  58:	00000b93          	li	s7,0
  5c:	00000c13          	li	s8,0
  60:	00000c93          	li	s9,0
  64:	00000d13          	li	s10,0
  68:	00000d93          	li	s11,0
  6c:	00000e13          	li	t3,0
  70:	00000e93          	li	t4,0
  74:	00000f13          	li	t5,0
  78:	00000f93          	li	t6,0
  7c:	00020137          	lui	sp,0x20 <--- correct
  80:	00010113          	mv	sp,sp
  84:	29c000ef          	jal	ra,320 <main> <--- Fail here
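
To rule out a bad encoding, the two interesting words from the listing can be decoded by hand. The sketch below follows the standard RV32I J-type and U-type immediate layouts; the function names are made up for this example.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def decode_jal(word, pc):
    """Return (rd, target) for a JAL instruction word located at address pc."""
    assert word & 0x7F == 0x6F, "not a JAL opcode"
    rd = (word >> 7) & 0x1F
    imm = (
        (((word >> 31) & 0x1) << 20)      # imm[20]
        | (((word >> 21) & 0x3FF) << 1)   # imm[10:1]
        | (((word >> 20) & 0x1) << 11)    # imm[11]
        | (((word >> 12) & 0xFF) << 12)   # imm[19:12]
    )
    return rd, (pc + sign_extend(imm, 21)) & 0xFFFF_FFFF


def decode_lui(word):
    """Return (rd, value) for a LUI instruction word."""
    assert word & 0x7F == 0x37, "not a LUI opcode"
    return (word >> 7) & 0x1F, word & 0xFFFF_F000


if __name__ == "__main__":
    rd, target = decode_jal(0x29C000EF, 0x84)
    print(f"jal: rd=x{rd} target={target:#x}")
    rd, value = decode_lui(0x00020137)
    print(f"lui: rd=x{rd} value={value:#x}")
```

Both words decode as the disassembly claims: the JAL writes x1 (ra) and targets 0x320, and the LUI loads 0x20000 into x2 (sp). The encoding itself is therefore not the problem.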

I debugged and simulated cva5 in questasim for two days. I double-checked:

Memory Sizes
Memory Layout
Stack Pointer
Valid execution for the sequential non-control flow instructions.

My questions:

Does cva5 still expect virtual instruction addresses at 0x8xxxxxx (e.g. related to an MMU)?
I cannot see a data local memory write operation (stack push) after the JAL. So my next guess would be a data memory layout (address) problem. How can I debug this more effectively? :)

Interrupts can lead to MEPC inconsistent with register state / actually retired operations or pointing to entirely illegal instructions

When an interrupt, such as a timer interrupt, occurs, the captured MEPC does not always reflect the correct address at which to continue execution after the interrupt finishes. This can randomly lead to significant corruption and crashes.

I have observed the following 3 inconsistencies:

CSR modifications that were already started will always finish, yet MEPC can point to the CSR operation (i.e. it is treated as not executed, and execution resumes there after the interrupt).
MEPC can sometimes capture a normal (register-write) instruction that still retires (when the retiring of said instruction falls into the same cycle as the capture). As with CSR modifications, if the respective operation uses the modified output as input (addi a0, a0, 1), erroneous re-execution leads to broken state, i.e. MEPC is inconsistent with the register state.
Addresses of prefetched instructions that are never executed, and are possibly illegal, can be captured.
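
The second case can be illustrated with a toy model. This is not CVA5's retire or gc_unit logic, just a minimal sketch of what happens architecturally when MEPC names an instruction that has already retired; all names are invented for the example.

```python
def addi_a0_1(regs):
    """Stand-in for 'addi a0, a0, 1': its output register is also its input."""
    regs["a0"] += 1


def run(program, regs, interrupt_at=None, buggy=False):
    """Execute a list of instruction callables with at most one interrupt.

    The interrupt arrives in the cycle in which instruction `interrupt_at`
    retires. A correct core resumes at the next instruction; the buggy
    variant captures the PC of the instruction that just retired.
    """
    pc = 0
    pending = interrupt_at
    while pc < len(program):
        program[pc](regs)                  # the instruction retires
        if pending == pc:
            mepc = pc if buggy else pc + 1
            pending = None
            pc = mepc                      # handler body omitted; mret jumps to mepc
            continue
        pc += 1
    return regs


if __name__ == "__main__":
    print(run([addi_a0_1], {"a0": 0}, interrupt_at=0, buggy=False))
    print(run([addi_a0_1], {"a0": 0}, interrupt_at=0, buggy=True))
```

In the buggy variant the increment is applied twice, which is the kind of silent register corruption described above. An instruction such as `li a0, 5` would hide the problem, because re-executing it is harmless.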

Example for the 3rd case:

jr [label]
[illegal instruction]

The illegal instruction will be fetched and maybe decoded before CVA5 executes the jump and flushes. The gc_unit does not prevent the address from being captured into MEPC when an interrupt hits at the right time.

I have fixes for all 3 cases applied in my fork of CVA5, but my fork includes other changes, such as an FPU, a reworked Verilator implementation and, for example, a modified timer interrupt input that matches the RV-Privileged spec (#20).
I have also written a test program that recreates all 3 cases for my fork and specific CVA5 configuration, using a timer implemented according to the RV-Privileged spec. The benchmark arms the timer and expects it to request an interrupt at a specific cycle (and therefore instruction in the benchmark) that is still influenced by the branch predictor (I am using my production CVA5 config to run it).

I am also happy to share said benchmark and timer, but I have only tested them in my environment, which uses my own Picolibc build, runtime/board support library, CMake configuration, new binary input format and conversion tool for the verilator-simulator. So said test-code would probably need to be refactored into a test with specific, more consistent processor config and to only use the vanilla code base.

To outline the scope of fixes I have applied:

decouple MEPC capture from interrupt_taken signal
delay MEPC capture depending on whether an instruction is currently being retired or a CSR modification is ongoing
changed oldest_pc from metadata_id block to next_retiring_insn_pc which applies branch_flush events and updates itself to not include illegal or unreachable addresses.

This was the simplest solution to fix the corruption and is probably not the most efficient way to implement interrupts. Looking at the timings, it seems that every instruction that was already issued pre-interrupt could just retire normally instead of being rolled back (unlike when handling jumps/branches). Since we have to wait for all issued ops to retire anyway, it is wasteful to throw that state away (and it can also take more cycles). Then some of my modifications would no longer be needed. But I found it too difficult to understand all the ways in which gc_unit needs to stay synchronized with other parts of the core to attempt that modification myself, and I expected this refactoring to be more work.

I am afraid I currently do not have the time to rebase my changes and test them without my code base so that I could provide a simple, conflict-free and tested patch set.

Cache architecture of the CVA5 processor?

Hello,

We are working on integrating two cores of your processor into one multiprocessor system.

We have some questions.

Does your core include caches that are write-through or write-back?

Cache architecture of the previous version (Taiga)

Hello,

In the paper on your old version (Taiga), the description of the Taiga configuration ........ does not mention a specific type of cache architecture.
I have found that it is write-through, but in a new paper for CVA5 you say that the updated version is write-back.

Could you tell me whether you have changed the architecture? I first worked on the old version.

Hardware Setup

Hi, I am doing a project with this core on a Zedboard but ran into a problem with the Hardware Setup provided on the Taiga Wiki page. When I download the bitstream to the Zedboard, it only writes out 2 characters. Could you provide any help with that? Thank you!

AXI DDR

Hi!

I'm wondering if the AXI DDR simulation supports write and read bursts.

Thanks.

State of AMO support

The documentation says the RISC-V A-extension for atomic instructions is supported, but now that I am attempting to use it, it seems broken.
How tested / working is the AMO support right now?

A first look at the code seems like it cannot work, but I have not tested this in-depth and could be wrong on some points.

Issues I have found thus far:

AMO-signals are passed combinatorially from the decoder (even bypassing the issue stage) to the DCache (through ls_unit), where they are fused with whatever address and data that is just arriving out of the load-store-queue. I do not see how this ensures that the address matches the AMO-signals
ls_offset is not 0 for AMO ops in decoder. ls_unit simply adds the offset to the base address for all inputs, even though AMO ops have no offset operand and reuse the same instruction bits for opcode-purposes
AMO operations (like amoadd) are treated as loads and go through the load-queue, not the store-queue. So they could be issued before the respective instruction is retired, unlike normal store operations. Seems wrong. I imagine the read-modify-write & swap operations may need to go through both queues
since AMO ops go only through the load-queue which does not have a data-field, the data_in provided to cache / memory does not match (from the store-queue, might be random/invalid). The data_in is used for the new data in case of amoswap or as an operand in case of amoadd and similar.
local memory ignores any AMO-signals silently.
- While that memory should only be for a single core and thus does not need atomic primitives the same way as main memory, this would require code to wrap every atomic primitive with range checks to switch to a non-primitive implementation for local memory. Adding this should be easy, though, as there is only one source of atomic operations for local memory. The primitives would still provide a performance benefit over locking interrupts and doing the read-modify-write transaction with non-atomic instructions, and would still be helpful for atomic operations in a multi-threaded environment.
clear_reservation input into dcache is documented to be used to reset reservation state on exceptions, but hardwired to 0
why is store-forwarding not used for AMO operations? It should provide the same benefit as it does for store-operations when rs2 is used by the operation
why do the sc_success & sc_complete signals not go through the l1_request & response interfaces? Do synthesis tools not optimize this properly when AMO is disabled?
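
The ls_offset point can be made concrete by encoding a few AMO instructions and reading back the bits an I-type load would use as its offset. The field layout below is the standard RISC-V AMO format; that the decoder effectively extracts bits 31:20 as the offset is an assumption based on the description above, and the helper names are invented.

```python
AMO_OPCODE = 0b0101111  # RISC-V AMO major opcode
FUNCT3_W = 0b010        # 32-bit (word) AMOs
FUNCT5 = {"amoadd.w": 0b00000, "amoswap.w": 0b00001, "amomin.w": 0b10000}


def encode_amo(name, rd, rs1, rs2, aq=0, rl=0):
    """Encode a word-sized AMO: funct5 | aq | rl | rs2 | rs1 | funct3 | rd | opcode."""
    return (
        (FUNCT5[name] << 27) | (aq << 26) | (rl << 25) | (rs2 << 20)
        | (rs1 << 15) | (FUNCT3_W << 12) | (rd << 7) | AMO_OPCODE
    )


def naive_i_type_offset(word):
    """Sign-extended bits 31:20, i.e. what a load-style offset extraction yields."""
    imm = word >> 20
    return imm - 0x1000 if imm & 0x800 else imm


if __name__ == "__main__":
    # a0 = x10, a1 = x11, a2 = x12
    for name in FUNCT5:
        word = encode_amo(name, rd=10, rs1=12, rs2=11)
        print(f"{name:10s} {word:#010x} bogus offset = {naive_i_type_offset(word)}")
```

Because bits 31:20 hold funct5, aq, rl and rs2, the bogus offset depends on the operation and on the register number of rs2 (11, 139 and -2037 in the three cases here), so the offset has to be forced to 0 for AMO ops.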

Am I correct on all of those points? Are there more problems I have not even realized?
Is there a design-intent behind the current state that I do not yet see and it is just unfinished or is it so out-of-date it should not be considered at all?

Edit:

It also seems that AMO operations are not marked as using rd, so the result is discarded. Simply changing this, though, causes the retire logic to deadlock, since it waits for a writeback, but the store will never start as it is waiting for the retire signal. I am guessing that not marking the use of rd and writing it anyway worked before the renamer was added.

Regarding Simulation of 'make run-example-c-project-verilator' in GitLab for CVA-5

Respected sir
I trust this email finds you well. My name is Tanishq.S, and I am a student from PES University, India. I am reaching out along with my team as we are currently engaged in a Capstone project focusing on the analysis of the CVA-5 processor's performance. After installing the toolchain and running the make target "make run-example-c-project-verilator", I got 9 errors and 5 warnings:

{Please see the Error obtained in the Screenshots attached to this ISSUE}

Your assistance in these matters would be invaluable to our project, and we are eager to learn from your expertise. We are grateful for any guidance you can provide and assure you that your support will not be forgotten.

Thank you for considering our request. We look forward to your response.

Best regards,

Tanishq.S
PES UNIVERSITY , INDIA
+91 8050375861

ERROR OBTAINED :
tanishq@Tanishq:~/Documents/CVA-5/taiga-project$ make run-example-c-project-verilator

mkdir -p /home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/build

verilator --cc --exe --Mdir /home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/build -DENABLE_SIMULATION_ASSERTIONS --assert \

-o cva5-sim \

-Wno-LITENDIAN -Wno-SYMRSVDWORD --CFLAGS "-g0 -O3 -std=c++14 -march=native -DDDR_SIZE=(long)4*(long)1073741824 -DPAGE_SIZE=(2*1024) -DMAX_INFLIGHT_RD_REQ=8 -DMAX_INFLIGHT_WR_REQ=8 -DMIN_DELAY_RD=1 -DMAX_DELAY_RD=1 -DMIN_DELAY_WR=1 -DMAX_DELAY_WR=1 -DDELAY_SEED=867583" \

/home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/CVA5Tracer.cc /home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/SimMem.cc /home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/cva5_sim.cc /home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/AXI_DDR_simulation/axi_ddr_sim.cc /home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/AXI_DDR_simulation/ddr_page.cc \

/home/tanishq/Documents/CVA-5/taiga-project/cva5/core/cva5_config.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/riscv_types.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/csr_types.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/cva5_types.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/l2_arbiter/l2_config_and_types.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/l2_arbiter/l2_interfaces.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/l2_arbiter/l2_external_interfaces.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/local_memory/local_memory_interface.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/local_memory/local_mem.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/interfaces.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/external_interfaces.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/lutrams/lutram_1w_1r.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/lutrams/lutram_1w_mr.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/lfsr.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/csr_unit.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/branch_comparator.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/branch_unit.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/barrel_shifter.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/alu_unit.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/axi_master.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/avalon_master.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/wishbone_master.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/axi_to_arb.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/one_hot_occupancy.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/binary_occupancy.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/cva5_fifo.sv 
/home/tanishq/Documents/CVA-5/taiga-project/cva5/core/shift_counter.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/priority_encoder.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/set_clr_reg_with_rst.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/intel/intel_byte_enable_ram.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/xilinx/xilinx_byte_enable_ram.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/byte_en_BRAM.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/one_hot_to_integer.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/cycler.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/tag_bank.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/dbram.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/ddata_bank.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/dtag_banks.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/amo_alu.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/dcache.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/addr_hash.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/load_queue.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/store_queue.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/load_store_queue.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/load_store_unit.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/ibram.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/itag_banks.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/icache.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/clz.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/div_core.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/div_unit.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/tlb_lut_ram.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/mmu.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/mul_unit.sv 
/home/tanishq/Documents/CVA-5/taiga-project/cva5/core/l1_arbiter.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/ras.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/branch_predictor_ram.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/branch_predictor.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/fetch.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/illegal_instruction_checker.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/decode_and_issue.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/register_free_list.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/renamer.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/register_bank.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/register_file.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/writeback.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/placer_randomizer.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/l2_arbiter/l2_fifo.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/l2_arbiter/l2_reservation_logic.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/l2_arbiter/l2_round_robin.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/l2_arbiter/l2_arbiter.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/toggle_memory.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/toggle_memory_set.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/instruction_metadata_and_id_management.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/cva5.sv /home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/cva5_sim.sv --top-module cva5_sim

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/writeback.sv:78:89: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                            : ... In instance cva5_sim.cpu.writeback_block

78 | assign unit_instruction_id[i][j] = unit_wb[CUMULATIVE_NUM_UNITS[i] + j].id;

  |                                                                                         ^~

%Warning-WIDTH: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/writeback.sv:78:50: Operator ASSIGNW expects 3 bits on the Assign RHS, but Assign RHS's CONST '1'h0' generates 1 bits.

                                                                                    : ... In instance cva5_sim.cpu.writeback_block

78 | assign unit_instruction_id[i][j] = unit_wb[CUMULATIVE_NUM_UNITS[i] + j].id;

  |                                                  ^

            ... Use "/* verilator lint_off WIDTH */" and lint_on around source to disable this message.

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/writeback.sv:79:79: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                            : ... In instance cva5_sim.cpu.writeback_block

79 | assign unit_done[i][j] = unit_wb[CUMULATIVE_NUM_UNITS[i] + j].done;

  |                                                                               ^~~~

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/writeback.sv:80:61: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                            : ... In instance cva5_sim.cpu.writeback_block

80 | assign unit_wb[CUMULATIVE_NUM_UNITS[i] + j].ack = unit_ack[i][j];

  |                                                             ^~~

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/writeback.sv:90:77: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                            : ... In instance cva5_sim.cpu.writeback_block

90 | assign unit_rd[i][j] = unit_wb[CUMULATIVE_NUM_UNITS[i] + j].rd;

  |                                                                             ^~

%Warning-WIDTH: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/writeback.sv:90:38: Operator ASSIGNW expects 32 bits on the Assign RHS, but Assign RHS's CONST '1'h0' generates 1 bits.

                                                                                    : ... In instance cva5_sim.cpu.writeback_block

90 | assign unit_rd[i][j] = unit_wb[CUMULATIVE_NUM_UNITS[i] + j].rd;

  |                                      ^

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv:238:52: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                           : ... In instance cva5_sim.cpu.gc_unit_block

238 | assign exception_pending[i] = exception[i].valid;

  |                                                    ^~~~~

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv:239:49: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                           : ... In instance cva5_sim.cpu.gc_unit_block

239 | assign exception_code[i] = exception[i].code;

  |                                                 ^~~~

%Warning-WIDTH: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv:239:34: Operator ASSIGNW expects 5 bits on the Assign RHS, but Assign RHS's CONST '1'h0' generates 1 bits.

                                                                                   : ... In instance cva5_sim.cpu.gc_unit_block

239 | assign exception_code[i] = exception[i].code;

  |                                  ^

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv:240:47: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                           : ... In instance cva5_sim.cpu.gc_unit_block

240 | assign exception_id[i] = exception[i].id;

  |                                               ^~

%Warning-WIDTH: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv:240:32: Operator ASSIGNW expects 3 bits on the Assign RHS, but Assign RHS's CONST '1'h0' generates 1 bits.

                                                                                   : ... In instance cva5_sim.cpu.gc_unit_block

240 | assign exception_id[i] = exception[i].id;

  |                                ^

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv:241:49: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                           : ... In instance cva5_sim.cpu.gc_unit_block

241 | assign exception_tval[i] = exception[i].tval;

  |                                                 ^~~~

%Warning-WIDTH: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv:241:34: Operator ASSIGNW expects 32 bits on the Assign RHS, but Assign RHS's CONST '1'h0' generates 1 bits.

                                                                                   : ... In instance cva5_sim.cpu.gc_unit_block

241 | assign exception_tval[i] = exception[i].tval;

  |                                  ^

%Error: /home/tanishq/Documents/CVA-5/taiga-project/cva5/core/gc_unit.sv:242:29: Member selection of non-struct/union object 'ARRAYSEL' which is a 'IFACEREFDTYPE'

                                                                           : ... In instance cva5_sim.cpu.gc_unit_block

242 | assign exception[i].ack = exception_ack;

  |                             ^~~

%Error: Exiting due to 9 error(s), 5 warning(s)

make: *** [/home/tanishq/Documents/CVA-5/taiga-project/cva5/tools/cva5.mak:90: /home/tanishq/Documents/CVA-5/taiga-project/cva5/test_benches/verilator/build/cva5-sim] Error 1

tanishq@Tanishq:~/Documents/CVA-5/taiga-project$
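
A log like this is easier to triage once the diagnostics are grouped by file. The following is a small sketch that relies only on the line shape visible above (`%Error: path:line:col: message` and `%Warning-CODE: path:line:col: message`); the function name and the shortened sample paths are made up.

```python
import re
from collections import Counter

# Matches e.g. "%Error: /p/file.sv:78:89: msg" and "%Warning-WIDTH: /p/file.sv:78:50: msg".
DIAG = re.compile(r"^%(Error|Warning)(?:-(\w+))?: (\S+):(\d+):(\d+): (.*)$")


def summarize(log_text):
    """Count diagnostics per (severity, source file name) in a Verilator log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DIAG.match(line.strip())
        if match:
            severity, _code, path, _line, _col, _message = match.groups()
            counts[(severity, path.rsplit("/", 1)[-1])] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "%Error: /x/core/writeback.sv:78:89: Member selection of non-struct/union object",
        "%Warning-WIDTH: /x/core/writeback.sv:78:50: Operator ASSIGNW expects 3 bits",
        "%Error: /x/core/gc_unit.sv:238:52: Member selection of non-struct/union object",
        "%Error: Exiting due to 9 error(s), 5 warning(s)",
    ])
    for key, count in sorted(summarize(sample).items()):
        print(key, count)
```

The closing summary line is skipped because it carries no file position. In the log above, every error is the same "Member selection of non-struct/union object" message, raised on an indexed element of an interface array in writeback.sv and gc_unit.sv, and the WIDTH warnings point at the same assignments.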

Cached memory interface: AXI wstrb not stable if AW handshake completes first

CVA5's dcache memory interface (via l1_arbiter, l2_arbiter, axi_to_arb) starts each single-beat AXI AW handshake at the same time as its only W handshake. The byte-enable value be (used for wstrb) is stored alongside the address in the l2_request_t structure, and the FIFO entry is popped as soon as the AW handshake is complete. However, wstrb belongs to the W handshake, alongside wdata.

Scenario: Downstream bus component sets wready only after having accepted an AW handshake. CVA5 will pop from the request FIFO, and wstrb will change to whatever is in the next valid or invalid FIFO element, even though the W handshake is still active.

Effect: Depends on the bus components. In our case (using only Xilinx IP downstream), the presence or absence of an AXI crossbar appears to determine whether wstrb is sampled in the first W handshake cycle, where the AW handshake is still valid and the fault would be hidden, or in a later cycle. In the latter case, bytes may unintentionally be masked on or off, so data in memory could become inconsistent.

(Suggested) solution: Move be/wstrb to the same FIFO as the write data. I'm preparing a pull request but still doing some validation of the patches for upstream.
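
The scenario can be reproduced with a cycle-level toy model of the two FIFOs. This is only a sketch of the behaviour described above, not a model of the actual axi_to_arb RTL; the function and its arguments are invented, and `None` stands for "whatever an empty FIFO happens to output".

```python
from collections import deque


def simulate(requests, awready, wready, fixed):
    """Return the (wdata, wstrb) pairs seen by the slave at each W handshake.

    requests: list of (addr, be, data) single-beat writes.
    awready/wready: per-cycle ready values driven by the downstream component.
    fixed=False: wstrb is taken from the head of the request (address) FIFO.
    fixed=True:  wstrb is stored in the same FIFO as the write data.
    """
    req_fifo = deque((addr, be) for addr, be, _ in requests)
    data_fifo = deque((data, be) for _, be, data in requests)
    observed = []
    for aw_rdy, w_rdy in zip(awready, wready):
        # Outputs for this cycle come from the current FIFO heads.
        awvalid = bool(req_fifo)
        wvalid = bool(data_fifo)
        if fixed:
            wstrb = data_fifo[0][1] if data_fifo else None
        else:
            wstrb = req_fifo[0][1] if req_fifo else None
        # Handshakes complete at the clock edge; pops are visible next cycle.
        if wvalid and w_rdy:
            observed.append((data_fifo[0][0], wstrb))
            data_fifo.popleft()
        if awvalid and aw_rdy:
            req_fifo.popleft()
    return observed


if __name__ == "__main__":
    writes = [(0x100, 0b0001, 0xAA), (0x104, 0b1100, 0xBB)]
    late_w = dict(awready=[True, False, True, False], wready=[False, True, False, True])
    print("original:", simulate(writes, fixed=False, **late_w))
    print("fixed:   ", simulate(writes, fixed=True, **late_w))
```

With wready asserted one cycle after each AW handshake, the original arrangement pairs the first write's data with the second write's byte enables and the second write's data with an undefined value, while keeping be next to the data yields the intended pairs. If the slave is ready on both channels in the same cycle, both variants agree, which matches the observation that the fault can stay hidden.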

Peripheral Reads do not wait for retiring, can cause state corruption

Just like normal memory loads, peripheral loads are in their own load queue and execute in order with respect to each other, but they do not wait for their originating instruction to be retired. This means that CVA5 can sometimes execute them speculatively and then try to roll back or invalidate the results when the speculation was wrong or a flush occurs.
But peripheral loads can have destructive side effects, as is the case, for example, with the UART peripheral in taiga-project, so this can lead to state corruption. This happens when the read of the UART RX register was done speculatively and is later reverted, or when an interrupt (with the current interrupt logic) hits and discards all results from already-issued instructions.

Yet the state of the UART peripheral will not be reverted, so one received byte is lost: a read of the RX register causes the peripheral to advance its state, forget the current byte, and move on to the next byte.
-> All peripheral accesses must be treated like memory stores: they must wait for the actual retirement of their instruction before being executed.
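
The byte loss can be illustrated with a small behavioral model, again in Python and not taken from CVA5 or taiga-project. `UartRx` and `run_loads` are hypothetical names, and the sketch assumes a flushed load is simply dropped and a later load reads the register again.

```python
from collections import deque


class UartRx:
    """Toy peripheral whose RX register read pops the received byte."""

    def __init__(self, received):
        self.fifo = deque(received)

    def read_rx(self):
        # Destructive read: the byte is gone from the peripheral afterwards.
        return self.fifo.popleft() if self.fifo else None


def run_loads(uart, loads, wait_for_retire):
    """Execute a sequence of RX loads and return the bytes software observes.

    loads: one bool per load, True if the instruction retires,
           False if it is flushed (mis-speculation or interrupt).
    """
    observed = []
    for retires in loads:
        if wait_for_retire:
            # The access is issued only once the instruction is known to retire.
            if retires:
                observed.append(uart.read_rx())
        else:
            # The access is issued speculatively; on a flush the result is
            # dropped, but the peripheral has already advanced.
            value = uart.read_rx()
            if retires:
                observed.append(value)
    return observed


pattern = [False, True, True]  # first load is flushed, the next two retire
print(run_loads(UartRx(b"abc"), pattern, wait_for_retire=False))
print(run_loads(UartRx(b"abc"), pattern, wait_for_retire=True))
```

In the speculative case the first received byte never reaches software, while waiting for retirement delivers the bytes in order starting from the first.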

How to use JTAG on an AMD FPGA

Hi,
I'd like to use the FPGA JTAG port to debug this core, and I see the JTAG-related modules in the code.
Is there an example project that shows how to connect the BSCAN primitive of the AMD FPGA to this core?

Clarification Regarding Simulation of CVA-5 Processor

Respected sir
Subject: Request for Assistance - CVA-5 Processor Simulation

I trust this email finds you well. My name is Tanishq.S, and I am a student from PES University, India. I am reaching out along with my team as we are currently engaged in a Capstone project focusing on the analysis of the CVA-5 processor's performance. To apply our concepts we need to analyze all the internal signals of the processor, so we chose to simulate in Xilinx Vivado, but we could not get the simulation to run successfully.

I have a few specific queries that I believe your expertise could help address:

Design File (cva5.sv):
- Could you please provide clarity on the necessary files in the GitHub repository required for the design file (cva5.sv) {Top Module}? We are particularly interested in identifying the key dependencies and components essential for the successful execution of the top module.
TestBench File (cva5_tb.sv):
- Similarly, we would appreciate guidance on the required files in the GitHub repository for the testbench file (cva5_tb.sv){Top Module}. Understanding the dependencies and necessary components for the top module in the testbench would greatly assist us in overcoming our current simulation challenges.
Providing Our Own C File:
- Our intention is to incorporate our own C file into the simulation for a more targeted analysis of the processor's performance. Could you please guide us on the process of providing our C file to the processor for analysis of the output obtained through the simulation window?

Regarding the testbench file, we observed that a default memory file path was used: "/home/ematthew/Research/RISCV/software/riscv-tools/riscv-tests/benchmarks/dhrystone.riscv.hw_init." Is this where we should specify the path to our own C file for analysis, or is there a different procedure we should follow?

In addition to the above, could you please explain the purpose and use of "parameter cpu_config_t CONFIG = EXAMPLE_CONFIG", which appears in some files in the GitHub repository?

Your assistance in these matters would be invaluable to our project, and we are eager to learn from your expertise. We are grateful for any guidance you can provide and assure you that your support will not be forgotten.

Thank you for considering our request. We look forward to your response.

Best regards,

Tanishq.S
PES UNIVERSITY , INDIA

cva5 simulation using questasim

Hi,
Thanks for your efforts.

I tried to run test_benches/cva5_tb.sv after creating a QuestaSim makefile, but I found a lot of X propagation that caused the simulation to fail. I tried adding a reset condition to all always_ff blocks in the code, but I still see X propagation.