openhwgroup / programs

Documentation for the OpenHW Group's set of CORE-V RISC-V cores

License: Other

programs's Introduction

programs

Program and project level documentation for all of OpenHW Group's projects including CORE-V RISC-V cores.

Directory contents...

TWG

Technical Working Group meeting minutes and presentations.

TG

Task Group monthly reports, meeting minutes and presentations. This comprises

cores-task-group

hw-task-group

verification-task-group

Currently the sw-task-group is tracked in the core-v-sw repo.

OpenHW-Project-Descriptions-and-Plans

This contains one directory per OpenHW project. Each directory should contain gate materials for the project (Project Concept, Project Launch, Plan Approve, and Project Freeze). Readers can use this information to see all OpenHW projects at a glance.

OpenHW-dashboard

This contains a markdown file with project dashboard information for OpenHW. This file is linked from the OpenHW website.

process

This contains OpenHW process and template documents, such as gate templates and RTL Freeze template.

Attendance-tracking

This contains attendance tracking for all OpenHW meetings.

Issues and Troubleshooting

If you find any problems or issues with the structure or content of this repo, please check out the issue tracker and create a new issue if your problem is not yet tracked.

programs's Issues

Documentation for pv.cplxconj.b?

The simulation-time disassembler disassembles a certain instruction to pv.cplxconj.b x12,x3,x12

Is the disassembly wrong, or is this instruction undocumented? (There is documentation for pv.cplxconj, but nothing with a .b suffix; also, the documented instruction does not have 3 register operands.)

Multiply-Accumulate right shift notation incorrect

In the document at the link below:
https://core-v-docs-verif-strat.readthedocs.io/projects/cv32e40p_um/en/latest/instruction_set_extensions.html#multiply-accumulate

In section Pulp Instruction Set Extensions > Multiply-Accumulate: for all unsigned instructions that include a right shift, the note specifies a logical right shift, but the equation uses three arrows (>>>) instead of two (>>). This is confusing, because the equation as written denotes an arithmetic right shift. I know this is nitpicking, but the notes for instructions may be ignored by some users. Testing shows that the RTL performs a logical right shift for these instructions. The example below was copied (with minor formatting for readability) from the documentation at the link above:

p.muluN rD, rs1, rs2, Is3
rD[31:0] = (Zext(rs1[15:0]) * Zext(rs2[15:0])) >>> Is3 Note: Logical shift right

the second line of which could be simplified to:
rD[31:0] = (Zext(rs1[15:0]) * Zext(rs2[15:0])) >> Is3

Please note this affects a total of 8 instructions:
p.muluN
p.mulhhuN
p.muluRN
p.mulhhuRN
p.macuN
p.machhuN
p.macuRN
p.machhuRN
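
The difference between the two shift flavours can be sketched in Python. This is only an illustration, assuming that `>>>` in the manual means arithmetic shift and `>>` means logical shift; the helper names (`srl32`, `sra32`, `p_muluN`) are hypothetical and not taken from the RTL or the manual.

```python
MASK32 = 0xFFFFFFFF

def zext16(x):
    """Zero-extend the low 16 bits of a register value."""
    return x & 0xFFFF

def srl32(x, sh):
    """Logical shift right on a 32-bit value: zeros enter from the left."""
    return (x & MASK32) >> sh

def sra32(x, sh):
    """Arithmetic shift right on a 32-bit value: the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32
    return (x >> sh) & MASK32

def p_muluN(rs1, rs2, is3):
    """Sketch of p.muluN as the issue reports the RTL behaves (logical shift)."""
    return srl32(zext16(rs1) * zext16(rs2), is3)

# 0xFFFF * 0xFFFF has bit 31 set, so the two shifts disagree on the upper bits.
product = zext16(0xFFFF) * zext16(0xFFFF)
logical = p_muluN(0xFFFF, 0xFFFF, 4)
arithmetic = sra32(product, 4)
```

The two results only differ when bit 31 of the unsigned product is set, which is probably why the notation problem is easy to overlook.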

Vplan for XPULP SIMD Extended Instructions

XPULP SIMD instructions have been categorized as an "Unconditional Required" feature for CV32E40P.

These instructions are documented in Section 14.5 of the RI5CY User Manual. A couple of things to be aware of:

It's possible that the RI5CY User Manual may be transferred to the OpenHW Group and be re-branded as the CV32E40P User Manual during the time this Vplan is captured.
There is a high probability that the User Manual does not contain sufficient information to complete the SIMD Vplan. In these instances, create a GitHub issue with a documentation label and assign it to Davide Schiavone.

Please use the Vplan template and review the HOWTO document prior to starting this task.

The Vplan spreadsheet shall be committed to the XPULP Instruction Extensions directory.

Wrong description of CSR Privilege Level

The field description of the Privilege Level CSR in the documentation is not correctly indented and contains descriptions of multiple registers.

last instruction of HW loop body is skipped

HW loop #0 is configured to run 10 times over the following 2 instructions: "jal ra, _some_function" and "add x20, x20, x21".
The jal instruction jumps to the address labelled "_some_function" and stores the NPC value in register ra. In our case NPC should point to the instruction "add x20, x20, x21". The label "_some_function" marks a block of code with dummy calculations that ends with the instruction "jalr ra", which jumps to the address stored in register ra.
So we can predict that the HW loop should execute the following sequence 10 times: jump to "_some_function", perform the dummy instructions, jump back to the loop, and execute the add instruction.

After jumping back to the HW loop, the last instruction "add x20, x20, x21" was skipped.

Source code of the test:

_start_jmp:
j _start #8000 0080
_start:

li x20, 1
li x21, 1

lp.starti x0, __hw_loop_start
lp.endi  x0, __hw_loop_end
lp.counti x0, 10

__hw_loop_start: jal ra, _some_function

add x21, x21, x20

__hw_loop_end: add x20, x20, x21

j _start_jmp

_some_function:
nop
nop
nop
nop
nop
nop
jalr ra

To simplify debugging, I will add a few lines from the disassembly file.

"_some function" block of code:

8000009a <_some_function>:
_some_function():
8000009a: 0001 nop
8000009c: 0001 nop
8000009e: 0001 nop
800000a0: 0001 nop
800000a2: 0001 nop
800000a4: 0001 nop
800000a6: 9082 jalr ra

HW loop initialization with body:
80000086: 0060007b lp.starti x0,80000092 <__hw_loop_start>
8000008a: 0060107b lp.endi x0,80000096 <__hw_loop_end>
8000008e: 00a0307b lp.counti x0,10

80000092 <__hw_loop_start>:
__hw_loop_start():
80000092: 008000ef jal ra,8000009a <_some_function>

80000096 <__hw_loop_end>:
__hw_loop_end():
80000096: 9a56 add s4,s4,s5

In the waveform we can see that after jumping back to the HW loop body, counter was decremented and next executed instruction has PC 0x80000092, but expected PC should be 0x80000096. HW loop counter should be decremented after execution of the add instruction.

Error message from the scoreboard:
UVM_ERROR /proj/dbm10l/denisg/ws_dbm10l_test/blk/ri5cy/verif/classes/pulpino_scoreboard_c.sv(323) @ 2963000: uvm_test_top.pp_env.pp_scoreboard [SPIKE_DBG] PC compare mismatch. in the spike 0xffffffff80000096, in the design: 0x80000092

To confirm that the instruction was not executed, I added registers x20 and x21 to the waveform. They stay constant at 1 until the end of the simulation, so the add instruction that would modify them was not executed.

Bit width discrepancy in multiply-accumulate documentation

In the documentation at the below link:
https://core-v-docs-verif-strat.readthedocs.io/projects/cv32e40p_um/en/latest/instruction_set_extensions.html#multiply-accumulate

In section Pulp Instruction Set Extensions > Multiply-Accumulate, the instructions that use the upper half of the input registers are documented with the bit range [31:15], when the range should be [31:16]. For example, for instruction p.muls, the description of its functionality reads:
rD[31:0] = Sext(rs1[15:0]) * Sext(rs2[15:0])

while instruction p.mulhhs (its counterpart that uses the upper half) reads:
rD[31:0] = Sext(rs1[31:15]) * Sext(rs2[31:15])

This is incorrect for p.mulhhs, as testing with the RTL shows that the bit range used is [31:16]. This affects the descriptions of 10 instructions in total:
p.mulhhs, p.mulhhsN, p.mulhhsRN, p.mulhhu, p.mulhhuN, p.mulhhuRN, p.machhsN, p.machhsRN, p.machhuN, and p.machhuRN
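
To make the off-by-one concrete, here is a small Python sketch that extracts both bit ranges. It assumes the behaviour reported above (upper half is [31:16]); the helper names `bits`, `sext` and `p_mulhhs` are illustrative only.

```python
def bits(reg, hi, lo):
    """Return reg[hi:lo] as an unsigned integer."""
    return (reg >> lo) & ((1 << (hi - lo + 1)) - 1)

def sext(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    if value >> (width - 1):
        value -= 1 << width
    return value

def p_mulhhs(rs1, rs2):
    """Sketch of p.mulhhs using the [31:16] range the issue reports for the RTL."""
    a = sext(bits(rs1, 31, 16), 16)
    b = sext(bits(rs2, 31, 16), 16)
    return (a * b) & 0xFFFFFFFF

reg = 0x12348000
upper_half = bits(reg, 31, 16)   # 16 bits wide, the upper halfword
documented = bits(reg, 31, 15)   # 17 bits wide, also picks up bit 15
```

The [31:15] slice is 17 bits wide, so it cannot be one half of a 32-bit register, which supports reading the documented range as a typo.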

Move HWLoop CSRs into the CSRs section of the doc

Currently the CSRs of the HWLoop are described in the HWLoop section.
Move them to the CSRs section and just add a reference to them in the HWLoop section.

Multiple Issues of DPC in table on CS registers

In the table of control_status_registers there are 2 DPC instances,

one at address 0x7B1 and one at 0x7AA

Update IRQ mode description

Task Title

Update Interrupt description and related CSR

Background information

If the changeset is merged: openhwgroup/cv32e40p#299,
the RI5CY User Manual (https://github.com/pulp-platform/riscv/blob/master/doc/user_manual.doc) needs to be modified, especially sections 10.7, 10.8, 10.9 and 12.1.

Additional context

Reference: openhwgroup/cv32e40p#299

Pinout: Replace core_id_i and cluster_id_i by hart_id_i

Currently the core_id_i and cluster_id_i can both be read out from the 32-bit mhartid CSR where they are packed as follows: {21'b0, cluster_id_i[5:0], 1'b0, core_id_i[3:0]}. There really is no need for enforcing semantics on these bits inside the CV32E40P (that can be left to be defined by the integrator). In Ibex a more general purpose solution is used (without losing any functionality) where the 32-bit mhartid CSR simply reflects the 32-bit hart_id_i input signal.

The original behavior of RI5CY can be achieved by simply integrating CV32E40P with the following hookup:

.hart_id_i ({21'b0, cluster_id[5:0], 1'b0, core_id[3:0]}),
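
The packing used in that hookup can be checked with a short Python sketch. It only mirrors the concatenation {21'b0, cluster_id[5:0], 1'b0, core_id[3:0]} given above; the function name `pack_hart_id` is hypothetical.

```python
def pack_hart_id(cluster_id, core_id):
    """Build the 32-bit value {21'b0, cluster_id[5:0], 1'b0, core_id[3:0]}.

    Bit layout (assumed from the concatenation order, MSB first):
      [31:11] zero, [10:5] cluster_id, [4] zero, [3:0] core_id
    """
    if not 0 <= cluster_id < (1 << 6):
        raise ValueError("cluster_id must fit in 6 bits")
    if not 0 <= core_id < (1 << 4):
        raise ValueError("core_id must fit in 4 bits")
    return (cluster_id << 5) | core_id

def unpack_hart_id(hart_id):
    """Recover (cluster_id, core_id) from a value packed as above."""
    return (hart_id >> 5) & 0x3F, hart_id & 0xF
```

For example, `pack_hart_id(1, 2)` gives 0x22, and `unpack_hart_id(0x22)` returns `(1, 2)` again.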

Vplan for Debug

Capture the Verification Plan for the Debug feature of the CV32E40P.

At this time the RI5CY User Manual does not contain sufficient information to capture a Vplan for all of these Features, but it is believed that the RISC-V External Debug Support Version 0.13.2 specification from the RISC-V foundation plus the debug-system from pulp-platform should be sufficient. If this is not the case, and additional information is required, create a documentation issue and assign it to Davide Schiavone.

Background information

Please review the VPLAN HOWTO document prior to starting this task.

Document Template

Verification Plan template

Document Location

Please commit the Vplan spreadsheet to the Verification Plan Debug-Trace directory.

Pinout: Rename test_en_i to scan_cg_en_i

The name test_en_i is a bit misleading. Normally 'scan test enable' is kept separate from 'scan clock gate enable'. The test_en_i signal is used to enable clock gates during scan, so it would be better to use a more clearly named scan_cg_en_i for this. The proposal is to perform this name change all the way down and into instantiated clock gates.

spec change - restrict HW loop with only one instruction

Need to state in the spec that a HW loop with only one instruction is restricted.

RI5CY: User Manual v4.2-14.5.1 error in complex imaginary multiplication pv.cplxmul.i.*

If using the complex convention where rD[15:0] is the real part and rD[31:16] is the imaginary part:

pv.cplxmul.i.{/,div2,div4,div8} is wrong
- Current: rD[15:0] = (rs1[0]*rs2[1] + rs1[1]*rs2[0])>>{15,16,17,18} and rD[31:16] = rD[31:16]
- Should be?: rD[31:16] = (rs1[0]*rs2[1] + rs1[1]*rs2[0])>>{15,16,17,18} and rD[15:0] = rD[15:0]
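
The proposed behaviour can be sketched in Python as follows. This is one reading of the "Should be?" line, not a confirmed specification: it assumes signed 16-bit lanes, lane 0 in bits [15:0] (real part), lane 1 in bits [31:16] (imaginary part), and an arithmetic shift. The names `lane` and `pv_cplxmul_i` are illustrative.

```python
def lane(reg, i):
    """Return 16-bit lane i of a 32-bit register as a signed integer."""
    value = (reg >> (16 * i)) & 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def pv_cplxmul_i(rd, rs1, rs2, shift=15):
    """Sketch: write the imaginary product to rD[31:16], keep rD[15:0]."""
    imag = (lane(rs1, 0) * lane(rs2, 1) + lane(rs1, 1) * lane(rs2, 0)) >> shift
    return ((imag & 0xFFFF) << 16) | (rd & 0xFFFF)

# Q15 example: (0.5 + 0j) * (0 + 0.5j) = 0 + 0.25j
rs1 = 0x00004000   # real = 0x4000 (0.5), imag = 0
rs2 = 0x40000000   # real = 0, imag = 0x4000 (0.5)
result = pv_cplxmul_i(0x0000ABCD, rs1, rs2)
```

With shift values 16, 17 and 18 the same function would model the div2, div4 and div8 variants under the same assumptions.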

Vplan for Custom Circuitry

For CV32E40P Custom Circuitry is defined by the CV32E40 Features and Parameters document as:

PULP Cluster Interface
Sleep Interface
PULP Zfinx

At this time the RI5CY User Manual does not contain sufficient information to capture a Vplan for all of these Features, so this task will not be assigned until the CV32E40P User Manual is available.

Background information

Please refer to the VPLAN HOWTO document prior to starting this task.

Document Template

Verification Plan template

Location for Completed Documentation

Commit the Vplan spreadsheet to the custom_circuitry directory.

Additional context

Note that the Cluster Interface and PULP Zfinx Features have been categorized as "Tentative" features for CV32E40P. Recall from the CV32E40 Features and Parameters document that tentative features are considered low priority and are allowed to be partially verified (they are the first features for which verification can be skipped if so demanded by the schedule). The absence setting of a tentative feature does need to be verified within the schedule.

Nevertheless, it is important to capture the Cluster Interface Vplan early so that we can make an early determination about whether this Feature will be fully verified in time for first silicon.

RI5CY: User Manual v4.2-14.5.1 pv.subrotmj.* instruction undefined

There is no description of what the pv.subrotmj.* instruction does. From another document on the web, it seems to be the "subtraction of two 16-bit complex numbers with post rotation by -j", or

rD[15:0 ] = rA[31:16] - rB[31:16]
rD[31:16] = -(rA[15:0 ] - rB[15:0])
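
The guessed semantics can be written out as a Python sketch. It follows the two pseudo-code lines above and additionally assumes signed 16-bit lanes with wrap-around and real part in bits [15:0]; none of this is confirmed by the manual, and the function names are hypothetical.

```python
def lane(reg, i):
    """Return 16-bit lane i of a 32-bit register as a signed integer."""
    value = (reg >> (16 * i)) & 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def pv_subrotmj(ra, rb):
    """Sketch of (A - B) * (-j) on packed 16-bit complex numbers."""
    low = lane(ra, 1) - lane(rb, 1)        # rD[15:0]  = rA[31:16] - rB[31:16]
    high = -(lane(ra, 0) - lane(rb, 0))    # rD[31:16] = -(rA[15:0] - rB[15:0])
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)

# A = 5 + 7j, B = 2 + 3j, so A - B = 3 + 4j and (A - B) * (-j) = 4 - 3j
result = pv_subrotmj(0x00070005, 0x00030002)
```

Multiplying by -j maps (re, im) to (im, -re), which matches the two assignments in the guessed pseudo code.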

Please clarify

Non-Maskable Interrupt

Looking at the RTL, I see a top-level signal irq_nmi_i, but I do not see any description of NMI in the manual. In particular: when it is asserted, does the core react to this interrupt, and if so, what is the vector address for NMI?
thx
Lee

Which Instruction Fetch does CV32E40P and CV32E40 use?

Section 2 of the RI5CY user manual states:

There are two prefetch flavors available:
* 32-Bit word prefetcher. It stores the fetched words in a FIFO with three entries.
* 128-Bit cache line prefetcher. It stores one 128-bit wide cache line plus 32-bit to allow for cross-cache line misaligned instructions.

Several questions arise:

Are these mutually exclusive logic modules? If so:

How are they selected at simulation compile-time and synthesis time?
Which of these are being used by CV32E40P and CV32E40?

If these are not mutually exclusive, how does the prefetch unit logic determine which will be used to prefetch from the cache and why?

Pinout: Add dm_exception_addr_i pin

The proposal is to add a dm_exception_addr_i[31:0] pin.

This will provide an address to jump to when an exception occurs when executing code during debug mode.

The reasons for this addition are as follows:

We want to use a pin as opposed to a parameter to match how boot_addr_i and dm_halt_addr_i are implemented
This will address the issue openhwgroup/cv32e40p#185, where exceptions are not handled correctly during debug mode
This solution is the same as seen on the Ibex core

RI5CY: User Manual v4.2-14.5.1 pv.*.div2, pv.*.div4 and pv.*.div8 unclear

The pv.*.div2, pv.*.div4 and pv.*.div8 instructions cannot be understood from their descriptions.
Current descriptions do use the *.h appendix to indicate operations are on 16-bit operands. Also, rs2 is referred to in section 14.5.2 but not in section 14.5.1. Finally, it is unclear why the "0xFFFF" mask may be needed and where it is stored to be used (is it hard-coded?)

For example:

pv.add.div2 rD, rs1, rs2
- rD[0] = (rs1[0] + rs2[0])/2
- rD[1] = (rs1[1] + rs2[1])/2
pv.add.div4 rD, rs1, rs2
- rD[0] = (rs1[0] + rs2[0])/4
- rD[1] = (rs1[1] + rs2[1])/4
pv.add.div8 rD, rs1, rs2
- rD[0] = (rs1[0] + rs2[0])/8
- rD[1] = (rs1[1] + rs2[1])/8
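
One possible reading of these examples is sketched below in Python. It assumes two signed 16-bit lanes, that the division is an arithmetic right shift (rounding toward minus infinity), and that each result is truncated to 16 bits; the manual does not confirm any of these points, and `pv_add_div` is a hypothetical name.

```python
def lanes16(reg):
    """Split a 32-bit register into two signed 16-bit lanes [lane0, lane1]."""
    out = []
    for i in range(2):
        value = (reg >> (16 * i)) & 0xFFFF
        out.append(value - 0x10000 if value & 0x8000 else value)
    return out

def pv_add_div(rs1, rs2, shift):
    """Sketch of pv.add.div2/div4/div8 with shift = 1, 2 or 3."""
    result = 0
    for i, (a, b) in enumerate(zip(lanes16(rs1), lanes16(rs2))):
        result |= (((a + b) >> shift) & 0xFFFF) << (16 * i)
    return result

div2 = pv_add_div(0x00060004, 0x00020004, 1)   # lanes: (4+4)/2, (6+2)/2
div4 = pv_add_div(0x00060004, 0x00020004, 2)   # lanes: (4+4)/4, (6+2)/4
```

Whether the intermediate sum keeps a 17th bit before the shift is exactly the kind of detail the issue asks the manual to spell out; the sketch keeps it.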

RI5CY: User Manual v4.2: p.clip* instructions imply Is2 unsigned rs1 signed for all

RI5CY: User Manual v4.2 section 14.3.3 does not clarify which arguments are signed and which are unsigned.

From the pseudo code, it seems that Is2 is always unsigned and rs1 is always signed.

Clarification or annotation may be needed. Please clarify.

Pinout: Replace DM_HALTADDRESS parameter by dm_halt_addr_i pin

Proposal is to replace the DM_HALTADDRESS parameter by a dm_halt_addr_i pin.

The reasons for this change are as follows:

Treat the debug halt address in the same way as boot_addr_i (which is also a pin and not a parameter)
Increase coverage related to this functionality (it is easier to change the pins value than to change the parameter value for (formal) verification)

Cycle counts for multiply and divide not correct in documentation

The documentation (multiply_accumulate.rst) states the following:

The multiplications with upper-word result (MSP of 32-bit x 32-bit
multiplication), take 4 cycles to compute. The division and remainder
instructions take between 2 and 32 cycles. The number of cycles depends
on the operand values.

The above is not correct for mulh*, div* and rem* instructions.

mul instructions take 1 cycle (based on the RTL FSM)
mulh, mulhsu, mulhu take 5 cycles (based on the RTL FSM)

For div, divu, rem and remu instructions I am not sure what the range is, but these instructions can certainly take 31, 32, 34 or 35 cycles (counts based on simulation, not on RTL code analysis).

So the remaining question is: what is the cycle count range for div, divu, rem and remu instructions, and is it easy to explain the rules depending on the operand values?

The multiply_accumulate.rst file will be removed, so please don't update that file, but answer the question in this ticket (I am adding a cycle count table in pipeline.rst).

RI5CY: User Manual v4.2-14.5.1 missing information for pv.shuffle2.* instruction

The pv.shuffle2.h and pv.shuffle2.b instructions assign parts of rs1 to rD, but the instruction description doesn't indicate which part.
For example:

pv.shuffle2.h:
- rD[31:16] = ((rs2[17] == 1) ? rs1 [missing information here]
pv.shuffle2.b:
- rD[31:24] = ((rs2[26] == 1) ? rs1 [missing information here]

Typo: RI5CY User Manual v4.2, description of p.add* instructions

@davideschiavone , in RI5CY user manual:
https://github.com/pulp-platform/riscv/blob/master/doc/user_manual.doc

In section 14.3.3 General ALU Operations, the description of p.addN has following note:
- Note: Arithmetic shift right. Setting Is3 to 2 replaces former p.avg
In section 14.3.3 General ALU Operations, the description of p.adduN has following note:
- Note: Logical shift right. Setting Is3 to 2 replaces former p.avg
In section 14.3.3 General ALU Operations, the description of p.addRN has following note:
- Note: Arithmetic shift right.
In section 14.3.3 General ALU Operations, the description of p.adduRN has following note:
- Note: Logical shift right.

It seems that Is3 should be replaced by 2^(Is3-1) like this:

p.addN		rD = (rs1 + rs2) >>> 2^(Is3-1)
p.adduN		rD = (rs1 + rs2) >> 2^(Is3-1)
p.addRN		rD = (rs1 + rs2 + 2^(Is3-1)) >>> 2^(Is3-1)
p.adduRN		rD = (rs1 + rs2 + 2^(Is3-1)) >> 2^(Is3-1)

In section 14.3.3 General ALU Operations, the description of p.addNr has following note:
- Note: Arithmetic shift right.
In section 14.3.3 General ALU Operations, the description of p.adduNr has following note:
- Note: Logical shift right.
In section 14.3.3 General ALU Operations, the description of p.addRNr has following note:
- Note: Arithmetic shift right.
In section 14.3.3 General ALU Operations, the description of p.adduRNr has following note:
- Note: Logical shift right.

It seems that rs2[4:0] should be replaced by 2^(rs2[4:0]-1) like this :

p.addNr		rD = (rD + rs1) >>> 2^(rs2[4:0]-1)
p.adduNr		rD = (rD + rs1) >> 2^(rs2[4:0]-1)
p.addRNr		rD = (rD + rs1 + 2^(rs2[4:0]-1)) >>> 2^(rs2[4:0]-1)
p.adduRNr		rD = (rD + rs1 + 2^(rs2[4:0]-1)) >> 2^(rs2[4:0]-1)

Alfredo

Remove contribution table from cv32e40p doc

The table listing all the changes is no longer useful, as the documentation is now text based and its history can be found in the git commits.

We want to remove it.

Complete APU description in cv32e40p doc

The APU interface description is not complete.
Complete it by describing the meaning of the signals and what happens when the FPU is not used.

Vplan for RVI

Capture the Verification Plan for both the RVI Instruction Bus Interface and RVI Data Bus Interface of the CV32E40P.

At this time the RI5CY User Manual does not contain sufficient information to capture a Vplan for either of these Features, so this task is a placeholder for now. An update to the user manual is expected soon.

Background information

Please review the VPLAN HOWTO document prior to starting this task.

Document Template

Verification Plan template

Location for Completed Documentation

Commit the Vplan spreadsheet to the Debug & Trace directory.

Comments on [...]core-v-docs/tree/master/verif/README.md

Type

Other: feedback on current file and suggestions for edits

Location of Issue

https://github.com/openhwgroup/core-v-docs/tree/master/verif/README.md

Additional context

[Requirement Location]

It would be preferable to have table with names, URL and format used to identify requirement location in tables, to help when combining and sorting spreadsheets when needed

[Feature]

It may be preferable to use the section header's name in the reference document. This will help when navigating across documents and test plans

[Sub Feature]

This should be mandatory, using the instruction pseudo code as the sub feature, e.g. in the example of the ADDI instruction above:
addi rD, rs1, Imm

[Feature Description]

The description should use the Sub Feature pseudo code terms when describing the feature and explain the terms from that perspective. It may help to adopt the de facto description of an instruction.
The four initial columns of the Test Plan may show the ADDI instruction in this way:

Requirement Location	Feature	Sub-Feature	Feature Description
RISC-V Unprivileged ISA Ver. 20190608-Base-Ratified 2.4	Integer Computational Instructions	addi rD, rs1, Imm	rD receives the addition of rs1 and Imm

Read the Docs "Control and Status Registers" table has errors

The rendering of the "Control and Status Registers" table in Read the Docs and its source file in the repository have errors

Memory range assigned twice:

CSR Address	Hex	Name	Acc.	Description
01	11	10	110xxx	0x7B0-0x7B7
--	--	--	--	--
01	11	10	110000	0x7B0
01	11	10	110001	0x7B1
01	11	10	110010	0x7B2
01	11	10	110011	0x7B3
--	--	--	--	--
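
A quick way to spot such overlaps is to decode the bit-field columns back into addresses. The Python sketch below assumes the first four columns are the CSR address bits [11:10], [9:8], [7:6] and [5:0] (as in the RISC-V privileged specification's CSR tables) and that `x` marks a don't-care bit; the function names are hypothetical.

```python
from itertools import product

def expand(pattern):
    """Expand a bit pattern such as '110xxx' into every concrete bit string."""
    slots = ["01" if ch == "x" else ch for ch in pattern]
    return ["".join(choice) for choice in product(*slots)]

def csr_addresses(row):
    """Return the set of CSR addresses covered by one row of bit-field columns."""
    return {int(bit_string, 2) for bit_string in expand("".join(row))}

def overlaps(rows):
    """List (address, first_row_index, clashing_row_index) for repeated addresses."""
    seen = {}
    clashes = []
    for index, row in enumerate(rows):
        for addr in sorted(csr_addresses(row)):
            if addr in seen:
                clashes.append((hex(addr), seen[addr], index))
            else:
                seen[addr] = index
    return clashes

# Rows copied from the table excerpt above (bit-field columns only).
rows = [
    ("01", "11", "10", "110xxx"),
    ("01", "11", "10", "110000"),
    ("01", "11", "10", "110001"),
    ("01", "11", "10", "110010"),
    ("01", "11", "10", "110011"),
]
clashes = overlaps(rows)
```

Here every individual row clashes with the 0x7B0-0x7B7 range row, which is the double assignment the issue describes.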

Update Interrupt Documentation on CV32E40P

Interrupts changed on CV32E40P and they are based on the CLINT spec.
The implementation is based on the Ibex core and it has been extended with 32 extra
fast custom interrupts.

Module: Rename riscv_core to cv32e40p_core

The currently used module names (and file names) are not future proof. They are a bit generic and will lead to Verilog naming conflicts once we introduce multiple CV* cores. The proposal is to adopt a naming scheme similar to that of the Ibex project (where each module name is prefixed with ibex_).

This task is to rename the module from riscv_core to cv32e40p_core and the associated file from riscv_core.sv to cv32e40p_core.sv.

(All other files/modules need to be renamed as well, but that will be done as a separate step as it does not impact the documentation and verification in the same manner).

How to handle new interrupt inputs

The CV32E40P has a number of new interrupt inputs:

Is it safe to tie all these off to 'b0 until such time as the testbenches can do something interesting with them?

Related question: when/where will the function of these new inputs be documented?

Conditionally exclude User mode related content from User Manual

The CV32E40P User Manual contains many references to User mode whereas the core only implements Machine mode.

In particular, the Control and Status Registers chapter contains much User mode related content, e.g. USTATUS, UEPC, UTVEC, UCAUSE, etc., as well as references to PULP_SECURE.

Also, section 10.2 lists register USTATUS; it is not listed in the full CSR listing (section 10) and should no longer be present, as User mode is not present. The same remark applies to UTVEC, UEPC and UCAUSE.

Content should be made conditional (in similar manner as was done for PMP) so that it can be included again for future cores that actually do have User mode.

Vplan for CV32E40P Interrupts

Capture the Verification Plan for both the CLINT and CLINT extension of the CV32E40P.

At this time the RI5CY User Manual does not contain sufficient information to capture a Vplan for all of these Features, so this task will not be assigned until the CV32E40P User Manual is available.

Background information

For Verification Plan related tasks, reference the VPLAN HOWTO document prior to starting this task.

Document Template

Verification Plan template

Document Location

Please commit the Vplan spreadsheet to the Verification Plan interrupts directory.

Too much whitespace in the RI5CY block diagram

Hi Davide. I just merged your pull-request on this repo (#2 (comment)). From a revision-control perspective it went perfectly - thanks!

The block diagram has a lot of white-space that pushes the text off the bottom of the screen. The text is still there, but it is pushed off screen. Can the white-space be cut off the block diagram?

Document p.elw instruction (encoding, behavior)

The p.elw instruction is present in CV32E40P, but neither its operation nor its encoding is mentioned in the User Manual.

Same remark for variants of this instruction: E.g. post increment or not.

Hardware loop documentation unclear

Chapter 7 PULP Hardware Loop Extensions of the RI5CY User Manual states:

HWLoop body must contain at least 3 instructions

and also

Note that the minimum loop size is two instructions and the last instruction cannot be any jump or branch instruction.

I assume that the following:

Start address of an HWLoop must be aligned

should be:

Start address of an HWLoop must be word aligned

Could you please comment on the minimum loop size and alignment restriction?

Add code example for HWLoop

An example of how to use the HWLoop will be added in the documentation

RI5CY: User Manual v4.2: p.clipr instructions opcode seems wrong

RI5CY: User Manual v4.2 section 14.3.3 appears to show inaccurate p.clipr opcode:

Mnemonic	Description
p.clipr rD, rs1, rs2	if rs1 <= -(rs2+1), rD = -(rs2+1), else if rs1 >=rs2, rD = rs2, else rD = rs1

Is it indicating that the value stored in rs2 should be inverted before checking if rs1 <= -(rs2+1)?

Shouldn't the opcode show rs2 to be signed, and the opcode be something like rs1 <= Sext(rs2+1), rD = Sext(rs2+1)?
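
For reference, the pseudo code from the table can be transcribed literally into Python. This sketch treats rs1 and rs2 as plain signed integers, which is an assumption; it is meant to show what the documented text says, not what the RTL does.

```python
def p_clipr(rs1, rs2):
    """Literal transcription of the documented p.clipr pseudo code."""
    lower = -(rs2 + 1)
    if rs1 <= lower:
        return lower
    if rs1 >= rs2:
        return rs2
    return rs1

below = p_clipr(-100, 7)    # clipped to -(7 + 1)
above = p_clipr(100, 7)     # clipped to 7
inside = p_clipr(3, 7)      # already inside the range
negative_bound = p_clipr(0, -5)
```

With a non-negative rs2 the result is clamped to the range [-(rs2+1), rs2]. With a negative rs2 the lower bound ends up above the upper bound (for rs2 = -5 the value 0 is "clipped" to 4), which illustrates why the signedness of rs2 needs to be stated.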

Complete CV32E40P sections of the Verification Strategy

The CV32E40P sections of the CV32E Verification Strategy need to be completed so that individual ACs can:

Add testcases to either the core or uvme_cv32 verification environments.
Add and integrate new components into uvme_cv32.
Run and debug simulations.

RI5CY: User Manual v4.2-14.5.1 Vectorial ALU Operations table missing information

The Description for the Mnemonics is missing information needed to understand how the instructions operate.
For example, the range and meaning of the indexing variable "i" are not defined; the operand use of pv.*.sc.* is not clearly described; carry/borrow handling is not described; and the use of the "0xFFFF" mask is undefined in all cases.
For example:

pv.add.h is understood as:
- rD[31-16] = rs1[31-16] + rs2[31-16]
- rD[15-0] = rs1[15-0] + rs2[15-0]
Is pv.add.sc.h understood as?
- rD[31-16] = rs1[31-16] + Zext(rs2[15-0])
- rD[15-0] = rs1[15-0] + Zext(rs2[15-0])
Is pv.add.sci.h understood as?
- rD[31-16] = rs1[31-16] + Zext(Imm6[5-0])
- rD[15-0] = rs1[15-0] + Zext(Imm6[5-0])
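
The three readings above can be expressed as a Python sketch. It assumes that each 16-bit lane wraps around independently with no carry into the neighbouring lane, and that the scalar or immediate is zero-extended and replicated into both lanes; these are the questioner's guesses, not confirmed behaviour, and the function names are hypothetical.

```python
def pv_add_h(rs1, rs2):
    """Lane-wise 16-bit add: no carry propagates from lane 0 into lane 1."""
    low = ((rs1 & 0xFFFF) + (rs2 & 0xFFFF)) & 0xFFFF
    high = (((rs1 >> 16) & 0xFFFF) + ((rs2 >> 16) & 0xFFFF)) & 0xFFFF
    return (high << 16) | low

def pv_add_sc_h(rs1, rs2):
    """Scalar variant: rs2[15:0] is added to both lanes of rs1."""
    scalar = rs2 & 0xFFFF
    return pv_add_h(rs1, (scalar << 16) | scalar)

def pv_add_sci_h(rs1, imm6):
    """Immediate variant: the zero-extended 6-bit immediate is added to both lanes."""
    scalar = imm6 & 0x3F
    return pv_add_h(rs1, (scalar << 16) | scalar)

wrapped = pv_add_h(0x0001FFFF, 0x00010001)   # lane 0 wraps to 0, lane 1 = 2
```

If the immediate were sign-extended instead of zero-extended, only `pv_add_sci_h` would change, which is one of the points the issue asks to have clarified.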

Pinout: Document PULP Cluster pins

The PULP Cluster functionality is not currently described in the User Manual. I added the core-v-docs/cores/cv32e40p/user_manual/source/pulp_cluster.rst file to start the documentation for this feature (a pull request will be issued soon). I think that the description is not yet good enough to really know how to use clock_en_i and core_busy_o. For example, I would like to state something like the following: 'clock_en_i is allowed to be set to 0 when core_busy_o = 0'. I wonder whether that is really the intention of this interface. The difficulty is in the reset/initial behavior of core_busy_o (it is 0 during reset and for a couple of cycles after reset).

@davideschiavone, can you please comment on how you expect clock_en_i to be used during reset plus the initial cycles of CV32E40P while core_busy_o is still 0? Davide, can you also please review the text I wrote so far (based on info from openhwgroup/cv32e40p#338), provide some text for the description of the core_busy_o signal in the table, and give your opinion on whether core_busy_o should be ignored when PULP_CLUSTER = 0 (as I wrote in my text)?

In case core_busy_o is only useful for PULP_CLUSTER = 1, should we maybe rename clock_en_i and core_busy_o to pulp_clock_en_i and pulp_core_busy_o respectively to highlight that these signals are part of the PULP Cluster interface?

FP_FMA instantiation

Hi
In the RTL code, you integrated your private FMA.
But in the user manual, you mentioned that the FP-FMA is currently only supported through a Synopsys Design Ware instantiation, or a Xilinx block for FPGA targets.
Could you please help confirm which IP should be instantiated?
Thanks

Table formatting

The table formatting in the Verification Strategy (see https://core-v-docs-verif-strat.readthedocs.io/en/latest/sim_tests.html#virtual-peripherals) is hard to read. The content does not wrap and requires using the scroll bar. The PDF is also not formatted cleanly.

This formatting may possibly be fixed by overriding the theme, as seen in Ibex's conf.py

html_static_path = ['_static']

html_context = {
    'css_files' : [
        '_static/theme_overrides.css', # Fix wide tables in RTD theme
        ],
    }
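
As an alternative sketch, Sphinx 1.8 and later provide a dedicated `html_css_files` setting in conf.py for registering extra stylesheets. The fragment below assumes the same `theme_overrides.css` file name as above and that the file is placed in the `_static` directory; whether it fixes the table wrapping still depends on the CSS rules in that file.

```python
# conf.py fragment (sketch): register an extra stylesheet with Sphinx >= 1.8.
# Paths in html_css_files are resolved relative to the html_static_path entries.
html_static_path = ['_static']
html_css_files = ['theme_overrides.css']
```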

Add an Integration chapter documenting all parameters and all pins

The CV32E40P User Manual does not have an Integration chapter as is present in e.g. the Ibex User Manual.

Task is to add an Integration chapter documenting all parameters and all pins. The chapter will refer to other sections for the description of groups of pins (e.g. the instruction fetch interface or the interrupt interface). As a side effect of this task pin names and parameters might slightly change if agreed so in the sub-tasks. No new functionality will be added, but there will be a proposal to no longer support NMI (see TBD for details). So, in this case the proposed documentation changes (if accepted) will drive RTL changes.

Here is the list of tasks that must be closed before this task can be closed:

#46 (Pinout: rename test_en_i to scan_cg_en_i). Status: Closed.
#47 (Pinout: Replace core_id_i and cluster_id_i by hart_id_i). Status: Closed.
#48 (Pinout: Document Auxiliary Processing Unit (APU) interface). Status: Further updates required by @davideschiavone .
#49 (Pinout: Document PULP Cluster pins). Status: Initial pull request made in #55. Content is to be reviewed and pin names are under discussion. Agreed to change the pin name of core_clock_en_i to pulp_core_clock_en_i. The 'busy' and 'sleep pin' will get consolidated into 1 pin (@Silabs-ArjanB will write a documentation proposal for this)
#50 (Pinout: Document IRQ interface). Status: Ongoing discussion in Mattermost.
#51 (Module: Rename riscv_core to cv32e40p_core). Status: Closed.
#52 (Pinout: Replace DM_HALTADDRESS parameter by dm_halt_addr_i pin): Status: Closed.
#58 (Pinout: Document usage of fregfile_disable_i). Status: Closed
#62 (Pinout: Add dm_exception_addr_i pin). Status: Closed

Here is a list of RTL pull requests that must be closed before this task can be closed (more might get added if the above proposals get accepted):

openhwgroup/cv32e40p#341

Note: The documentation for the instruction and load-store-unit interfaces has already been updated earlier. Its pull request #37 is awaiting approval. It is kept out of this ticket because the related RTL changes are relatively large and we do not want to delay the adoption of the above (relatively low impact) documentation/RTL updates.

Pinout: Document IRQ interface

As part of finishing the interrupt (and exception) related documentation, I propose that we do the following:

Remove NMI functionality (see below for the reasoning)
Unify irq_fast_i and irq_fastx_i (see below for the reasoning)
Make irq_id_o[5:0] and irq_ack_o behave the same for all interrupts (see below for the reasoning)

NMI

The proposal is to remove the NMI (irq_nmi_i) functionality from the RTL and replace it by a regular fast interrupt (irq_fast_i[15]). There are several reasons for this:

Existing bugs, e.g. openhwgroup/cv32e40p#336 and openhwgroup/cv32e40p#323
NMI will be standardized in the official RISC-V specification and will surely be incompatible with our implementation (inherited from Ibex)
Ibex implemented additional (non-standard) CSRs for backing up mstatus.MPP, mstatus.MPIE and mcause, whereas we do not have these features.

Unify irq_fast_i and irq_fastx_i

The proposal is to unify irq_fast_i[14:0] and irq_fastx_i[31:0] into irq_fast_i[47:0] (also replacing NMI by irq_fast_i[15] as described above), to rename MIEX/MIPX to MIE1/MIP1 respectively, and to replace MTVECX by MTVEC. The reasons are as follows:

The user does not care about the differentiation between irq_fast_i and irq_fastx_i (they are all 'fast interrupts')
For the same reason all interrupts should use MTVEC as their vector base address (the irq_fastx_i interrupts should not be treated as special with their own MTVECX base register).
The current MTVECX base register adds to the (user perceived) complexity and does not really scale if we add other modes in future chips (we would get MTVEC, MTVECX, UTVEC, UTVECX, etc.).
Unifying them allows for a straightforward width increase in future projects.
Renaming MIEX/MIPX to MIE1/MIP1 allows for straightforward fast interrupt addition in future projects (we would just add MIE2/MIP2, etc., not MIEXX/MIPXX).
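
The proposed unification can be sketched as a simple index mapping. The sketch below assumes that irq_fast_i[14:0] keeps its bit positions, that the former NMI becomes bit 15, and that irq_fastx_i[31:0] lands on bits [47:16]. The last point is inferred from the widths (15 + 1 + 32 = 48) and is not stated explicitly in the proposal. The function name unified_index is hypothetical.

```python
OLD_FAST_WIDTH = 15    # irq_fast_i[14:0]
NMI_INDEX = 15         # irq_nmi_i is replaced by irq_fast_i[15]
OLD_FASTX_WIDTH = 32   # irq_fastx_i[31:0]
UNIFIED_WIDTH = 48     # proposed irq_fast_i[47:0]


def unified_index(source: str, bit: int = 0) -> int:
    """Return the bit position in the proposed irq_fast_i[47:0].

    Assumption: irq_fastx_i[n] maps to irq_fast_i[16 + n].
    """
    if source == "irq_fast_i":
        if not 0 <= bit < OLD_FAST_WIDTH:
            raise ValueError("irq_fast_i bit out of range")
        return bit
    if source == "irq_nmi_i":
        return NMI_INDEX
    if source == "irq_fastx_i":
        if not 0 <= bit < OLD_FASTX_WIDTH:
            raise ValueError("irq_fastx_i bit out of range")
        return NMI_INDEX + 1 + bit
    raise ValueError(f"unknown interrupt source: {source}")


# Every old line gets exactly one position in the unified vector.
all_indices = (
    [unified_index("irq_fast_i", b) for b in range(OLD_FAST_WIDTH)]
    + [unified_index("irq_nmi_i")]
    + [unified_index("irq_fastx_i", b) for b in range(OLD_FASTX_WIDTH)]
)
```

Under these assumptions the old lines cover the unified vector without gaps or overlaps, which is what makes a later width increase a matter of extending a single range.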

irq_id_o and irq_ack_o

irq_id_o and irq_ack_o currently do not apply to all interrupts, i.e. they are not used for irq_fastx_i[31:0]. As indicated, the idea is to treat all interrupts in the same manner, and the proposal therefore is to extend irq_id_o by 1 MSB and also signal irq_ack_o for what are now called the irq_fastx_i interrupts. The issue was first raised in openhwgroup/cv32e40p#243.
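
The need for the extra MSB follows from simple arithmetic: a 5-bit identifier can distinguish at most 32 lines, while 48 unified fast interrupt lines need at least 6 bits. The sketch below only illustrates that width calculation. It makes no claim about the actual encoding of irq_id_o, and the helper name id_bits is hypothetical.

```python
def id_bits(num_lines: int) -> int:
    """Minimum number of bits needed to give each of num_lines a distinct ID."""
    if num_lines < 1:
        raise ValueError("num_lines must be positive")
    return max(1, (num_lines - 1).bit_length())


bits_for_32 = id_bits(32)  # what a 5-bit irq_id_o can cover
bits_for_48 = id_bits(48)  # the proposed irq_fast_i[47:0]
```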

A pull request describing all above changes is ready to be made.

This task also relates to the following (let's address all these interrupt related things at once; I will set up a meeting to discuss)

Vplan for RV32F single-precision floating-point instructions

RV32F single-precision floating-point instructions have been categorized as a "Tentative" feature for CV32E40P. Recall from the CV32E40 Features and Parameter that a tentative feature is considered low priority and is allowed to be partially verified (tentative features are the first ones for which verification can be skipped if the schedule demands it). The configuration in which a tentative feature is absent does still need to be verified within the schedule.

It's important to capture the RV32F Vplan early so that we can make an early determination about whether this feature will be fully verified in time for first silicon.

Ideally, whoever captures this Vplan would have previous experience with IEEE-754.

Please use the Vplan template and review the HOWTO document prior to starting this task.

The Vplan spreadsheet shall be committed to the Base Instruction Set directory.

Pinout: Document usage of fregfile_disable_i

The usage of fregfile_disable_i is neither documented nor clear. It might not even be needed anymore.

A relevant part of the code related to this is the following from https://github.com/openhwgroup/cv32e40p/blob/master/rtl/riscv_id_stage.sv:

assign fregfile_ena = FPU && !PULP_ZFINX ? ~fregfile_disable_i : '0;

The task is to figure out how this pin is supposed to be used and then to either document it or remove it (if the use case is no longer valid).
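
To make the quoted assignment easier to reason about, here is a small Python model of it. It assumes fregfile_disable_i is a single-bit signal and relies on the SystemVerilog precedence rule that && binds tighter than ?:, so the condition is (FPU && !PULP_ZFINX). The function names are hypothetical and the model says nothing about the intended use case of the pin.

```python
from itertools import product


def fregfile_ena(fpu: bool, pulp_zfinx: bool, fregfile_disable_i: bool) -> bool:
    """Model of: FPU && !PULP_ZFINX ? ~fregfile_disable_i : '0

    Assumes fregfile_disable_i is one bit wide.
    """
    if fpu and not pulp_zfinx:
        return not fregfile_disable_i
    return False


def truth_table():
    """All eight input combinations with the resulting enable value."""
    return [
        (fpu, zfinx, dis, fregfile_ena(fpu, zfinx, dis))
        for fpu, zfinx, dis in product([False, True], repeat=3)
    ]


for fpu, zfinx, dis, ena in truth_table():
    print(f"FPU={fpu!s:5} PULP_ZFINX={zfinx!s:5} disable={dis!s:5} -> ena={ena}")
```

In this model the pin only has an effect when FPU is set and PULP_ZFINX is not. In every other configuration the floating-point register file enable is 0 regardless of the pin.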

Bit Manipulation: what happens when indexes are out-of-bounds?

This issue relates to Section 14.3.1 "Bit Manipulation Operations" of the RI5CY User Manual. Most of the bit manipulation instructions have two operands to select a slice from rs1. For obvious reasons the sum of these operands must be less than or equal to 32.

What happens if the sum is greater than 32?
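
The constraint in question can be written down as a small check. This sketch only expresses the rule stated above (the sum of the two slice operands must not exceed 32). It does not describe what the hardware does when the rule is violated, which is exactly the open question. The names slice_in_bounds, operand_a and operand_b are hypothetical.

```python
REGISTER_WIDTH = 32  # width of rs1 in bits


def slice_in_bounds(operand_a: int, operand_b: int) -> bool:
    """True if the two slice-selecting operands satisfy a + b <= 32."""
    if operand_a < 0 or operand_b < 0:
        raise ValueError("operands must be non-negative")
    return operand_a + operand_b <= REGISTER_WIDTH


in_bounds_example = slice_in_bounds(8, 24)      # sum is exactly 32
out_of_bounds_example = slice_in_bounds(16, 24)  # sum is 40
```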
