riscv / riscv-isa-manual

RISC-V Instruction Set Manual

Home Page: https://riscv.org/

License: Creative Commons Attribution 4.0 International

Makefile 5.07% TeX 78.57% Assembly 16.36%

riscv-isa-manual's Issues

Reserve mtvec[1:0], stvec[1:0] for future vectored interrupt/exception extension

Not especially important for Unix-like systems, so probably only an optional M-mode feature.

Setting LSB of mtvec could enable vectored interrupt mode. Could place interrupt vectors below mtvec, and the single trap vector at mtvec. That is, if it's an interrupt, jump to {mtvec[(XLEN-1) : (log2(XLEN)+2)], (~cause << 2)}.
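The address formula above can be sketched as follows. This is only one reading of the proposal: the function name, the `is_interrupt` flag, and the choice to send exceptions to mtvec with its two low bits cleared are assumptions made for illustration, not part of the issue text.

```python
def vectored_trap_target(mtvec: int, cause: int, is_interrupt: bool, xlen: int = 32) -> int:
    """Model of {mtvec[XLEN-1 : log2(XLEN)+2], (~cause << 2)} for interrupts."""
    xlen_mask = (1 << xlen) - 1
    if not is_interrupt:
        # Assumption: exceptions use the single trap vector, with the
        # reserved low two bits of mtvec masked off.
        return mtvec & ~0b11 & xlen_mask
    cause_bits = xlen.bit_length() - 1           # log2(XLEN) for power-of-two XLEN
    low_bits = cause_bits + 2                    # bits replaced by the vector offset
    base = mtvec & ~((1 << low_bits) - 1) & xlen_mask
    inverted_cause = ~cause & ((1 << cause_bits) - 1)
    return base | (inverted_cause << 2)
```

Under this reading, each interrupt cause gets a 4-byte slot (room for one jump instruction), and because the cause is inverted, cause 0 lands in the highest slot of the block.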

FMV.X.D and FSD on single-precision values

I believe the following statements in the spec are necessary but insufficient:

If a floating-point register holds a single-precision value, it is guaranteed that a FSD of that register will place a value into memory that when reloaded with a FLD will recreate the original single-precision value in a register. The data format that is stored in memory is undefined beyond having this property.

If the last value written to the source floating-point register was a single-precision floating-point value, then the value returned by FMV.X.D is undefined beyond having the property that moving the value back to a floating-point register will recreate the original single-precision value.

FSD and FMV.X.D should be defined to create the same implementation-defined values as each other, and FLD and FMV.D.X should restore them equivalently. In particular, FSD followed by LD and FMV.D.X should properly recreate the single-precision value, as should FMV.X.D followed by SD and FLD.

Revisit C spec and push forward from v1.9 -> 2.0

Not a priority right now, but has to also advance to ratified for RV32 and RV64 this year.

Document device tree binding

We need something somewhere that defines the allowed properties for RISC-V harts, PLICs, and any other node types we create.

Another issue is that some resources are potentially shared between M-mode and S-mode; e.g. a UART could be documented in the device tree, but also used to implement SBI_CONSOLE_PUTCHAR; S-mode payloads need to know this and not try to use both at the same time. There's probably an established solution to this somewhere.

Basic virtualization leaves no way to inject "external" interrupts?

You can inject a trap by fiddling with pc/sepc/secause/stval, but there's no way to make sip.SEIP be true without cooperation from the PLIC, which makes guests that use WFI-loops tricky to support without falling back to classical virtualization whenever there is a pending interrupt?

Codify notion of M-mode profiles

Some U-mode features are not available in some M-mode-only implementations, like the cycle/time/instret CSRs and misaligned memory accesses. Emulation software (and software that doesn't want to rely on emulation software) needs to know what the hardware actually provides. Define one or more standard M-mode profiles to simplify software targeting.

Ordering of A/D bits to PTE reads

Overloading SFENCE.VM for this purpose has obvious downsides.

Permit PTE A/D bits to be updated by either software or hardware?

It's easy to spec that it can happen either way, and keeps simple implementations simple.

Classify and advertise optional feature under standard extensions

There are several features in the spec which are optional (or so I understand). Some are visible to user mode, although I am referring here to all privilege modes. For example, the performance monitor counters are optional, and so is their number; there are many of them, so they may be expensive in some implementations. We now have standard extensions for user-mode feature groups, and there is an enumeration scheme in the misa register to tell software what is supported and what is not. I suggest:

Classify optional features as "extensions", e.g. a perfmon extension.
For features related to software development (perfmon, debug) there may be a benefit in allowing more flexibility than with functional features (such as floating point). For example, an implementation may support perfmon with a limited number of counters. So there should be a way to tell software how much of such an extension is actually supported.
Allow user mode to read the feature advertisement (misa).

Inconsistency on the Stack Pointer Alignment Specification

I've been studying "The RISC-V Instruction Set Manual Version 2.1", and find an inconsistency.
It is in Chapter 20 Calling Convention. It seems that the chapter is not included in this repository, but I could not find any other places. Let me know if there is a better place.

On P.109 the first paragraph says;

... the stack pointer is always kept 16-byte aligned.

On the other hand "20.3 Soft-Float Calling Convention" says;

... and the stack discipline is the same except that the stack is only kept aligned to XLEN/8-byte boundaries ...

And the footnote says;

The reduced stack alignment saves space in the memory-constrained systems that
might commonly use soft floating-point.

The first paragraph on P.109 should be;

... the stack pointer is always kept 16-byte aligned unless the soft-float calling convention is used (see 20.3).

This is a trivial issue, but the real problem is in the definition of the C.ADDI16SP instruction, which is also meant for memory-constrained systems.
It assumes the stack pointer is kept 16-byte aligned and is useless if the stack pointer is not 16-byte aligned.

There should be some description about this restriction in the description of the instruction on page 86.

And the footnote on the page;

In the standard RISC-V calling convention, the stack pointer sp is always 16-byte aligned.

also should be fixed.

Move VM configuration into SPTBR

For RV32:
sptbr[31] = vm disabled/Sv32
sptbr[30:22] = ASID
sptbr[21:0] = PPN

For RV64:
sptbr[63:61] = vm disabled/reserved/reserved/reserved/Sv39/Sv48/Sv57/Sv64
sptbr[60:45] = ASID
sptbr[44:38] = reserved
sptbr[37:0] = PPN
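As an illustration of the proposed layout, a field decoder might look like the sketch below. It assumes the mode names are numbered from 0 in the order they are listed above; the function and table names are hypothetical.

```python
# Assumption: mode encodings are assigned in the order listed in the proposal.
RV32_MODES = ["vm disabled", "Sv32"]
RV64_MODES = ["vm disabled", "reserved", "reserved", "reserved",
              "Sv39", "Sv48", "Sv57", "Sv64"]

def decode_sptbr(value: int, xlen: int) -> dict:
    """Split a proposed-format sptbr value into mode, ASID and PPN."""
    if xlen == 32:
        return {
            "mode": RV32_MODES[(value >> 31) & 0x1],   # sptbr[31]
            "asid": (value >> 22) & 0x1FF,             # sptbr[30:22]
            "ppn": value & 0x3FFFFF,                   # sptbr[21:0]
        }
    if xlen == 64:
        return {
            "mode": RV64_MODES[(value >> 61) & 0x7],   # sptbr[63:61]
            "asid": (value >> 45) & 0xFFFF,            # sptbr[60:45]
            "ppn": value & ((1 << 38) - 1),            # sptbr[37:0]; [44:38] reserved
        }
    raise ValueError("only RV32 and RV64 layouts are given in the proposal")
```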

mideleg/medeleg should not exist if S-mode (or U-mode + N extension) is not implemented

Similarly, sideleg/sedeleg should not exist if U-mode + N extension are not implemented.

Global trap-enables instead of interrupt-enables

(We haven't decided whether we want to follow through with this proposal.)

Currently, if supervisor takes an exception delegated to supervisor, it always is delegated to supervisor, even recursively. This makes it hard to debug double faults (or handle them specially in the monitor).

The proposal, from Andrew Lutomirski, is to change sstatus.SIE to sstatus.STE (supervisor trap enable). As before, when STE=0, interrupts are disabled; additionally, no exceptions will be delegated down to S. So a recursive exception will go to H-mode or M-mode, even if that exception was delegated to S-mode.
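A minimal sketch of the resulting delegation decision is shown below. It assumes only M, S and U modes (H-mode is left out), and it assumes STE, like SIE, is only consulted while running in S-mode; neither assumption is stated in the proposal.

```python
def trap_target(current_mode: str, delegated_to_s: bool, ste: bool) -> str:
    """Return the mode that handles an exception under the STE proposal."""
    if current_mode == "M" or not delegated_to_s:
        return "M"                      # never delegated downward from M
    if current_mode == "S" and not ste:
        return "M"                      # recursive fault while traps are disabled
    return "S"                          # normal delegated case
```

Since trap entry would clear STE just as it clears SIE today, an exception taken inside the S-mode handler before it re-enables traps would reach M-mode under this model.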

OTOH, it really is possible to do this in software already; some OSes expect the notion of a global interrupt-enable; and it is less desirable for M-mode than for other modes.

Support delegation of machine timer & software interrupts to S-mode

...to support bare-metal OS without SBI.

Clarify that virtual and physical address space are circular

There's been some confusion about this on the list in the past, but I think we intend that:

load/store address generation overflows silently
PC incrementing overflows silently
*(uint32_t*)(uintptr_t)-1 is treated like any other page-spanning access, and will (possibly after fixup) generate accesses to the first and last pages of memory if both are accessible
the same rules apply in M-mode or with paging disabled, although in practice the circle is likely to be broken by a PMA or PMP boundary
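The wraparound in the third bullet can be made concrete with a small model. The 4 KiB page size and the helper names are assumptions for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages

def bytes_accessed(addr: int, size: int, xlen: int) -> list:
    """Byte addresses touched by an access, wrapping modulo 2**xlen."""
    mask = (1 << xlen) - 1
    return [(addr + i) & mask for i in range(size)]

def pages_touched(addr: int, size: int, xlen: int) -> list:
    """Page numbers touched by the access, in access order, without repeats."""
    pages = []
    for byte_addr in bytes_accessed(addr, size, xlen):
        page = byte_addr >> PAGE_SHIFT
        if page not in pages:
            pages.append(page)
    return pages
```

For a 4-byte access at address -1 on RV32, this model yields one byte in the last page followed by three bytes in page 0.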

Should base field in misa be renamed to MXL for consistency?

Reconsider SFENCE.VM/ASIDs

Presently, SFENCE.VM takes its ASID argument from the sptbr. Since any PTE the sptbr points to can be speculatively loaded into the TLB, you don't want to just modify sptbr.asid arbitrarily. So, in an OS like Linux, you might flush entry X in ASID Y as follows: (recall ASID 0 is the global ASID, and assume the kernel maintains a page table that consists only of its global mappings)

tmp <- sptbr
sptbr <- asid = Y, ppn = global page table
sfence.vm X
sptbr <- asid = tmp.asid, ppn = global page table
sptbr <- tmp

It's not ideal for simple implementations that always flush the TLB on sptbr writes, and it imposes the (minor) constraint that this code sequence is mapped into all address spaces. And ASID 0 being special is an annoying quirk.

Here is an alternative proposal:

ASID 0 is no longer special.
SFENCE.VM takes two arguments: rs1=addr, rs2=asid.
When rs1=x0 and rs2=x0, it orders prior stores with all subsequent translations.
When rs1=x0 and rs2!=x0, it orders prior stores to non-Global PTEs with subsequent translations for the specified ASID.
When rs1!=x0 and rs2=x0, it orders prior stores to PTEs corresponding to virtual address rs1 with all subsequent translations.
When rs1!=x0 and rs2!=x0, it orders prior stores to non-Global PTEs corresponding to virtual address rs1 with all subsequent translations for the specified ASID.

This requires us to use funct3=4 in the SYSTEM opcode, since funct3=0 has opcode bits within rs2.
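The four cases of the alternative proposal can be sketched as a filter over a toy TLB. This models the ordering requirement conservatively as invalidation, which is an assumption; `TlbEntry` and `sfence_vm` are hypothetical names, and `None` stands for the register operand being x0 (so an ASID value of 0 is an ordinary ASID).

```python
from typing import List, NamedTuple, Optional

class TlbEntry(NamedTuple):
    asid: int
    vpn: int
    is_global: bool

def sfence_vm(tlb: List[TlbEntry], vpn: Optional[int] = None,
              asid: Optional[int] = None) -> List[TlbEntry]:
    """Return the entries that survive; vpn=None is rs1=x0, asid=None is rs2=x0."""
    def must_flush(entry: TlbEntry) -> bool:
        if vpn is not None and entry.vpn != vpn:
            return False                # address given and it does not match
        if asid is not None:
            # ASID given: only non-global entries of that ASID are affected.
            return not entry.is_global and entry.asid == asid
        return True                     # no ASID given: all ASIDs and globals
    return [entry for entry in tlb if not must_flush(entry)]
```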

Guidance requested: instruction for spin-wait loops

Linux is presently using div a5,a5,zero for cpu_relax (actually the register is variable but it winds up being a5 in 80% of sites); glibc and Go currently generate no instructions for atomic_spin_nop and runtime.procyield.

Add rationale for divide-by-zero choice

Shouldn't be undefined.
-1 was value seen in other simple implementations and drops out of simple hardware implementations.
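One way to see the "drops out" claim is a textbook unsigned restoring divider with no special case for a zero divisor. The sketch below is an illustrative model, not a description of any particular implementation.

```python
def restoring_divide(dividend: int, divisor: int, width: int):
    """Bit-serial unsigned restoring division; returns (quotient, remainder)."""
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        # Shift in the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:        # always true when divisor == 0
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder
```

With a zero divisor the comparison succeeds on every step, so the quotient comes out as all ones (-1 when read as signed) and the remainder equals the dividend.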

rename ptbr to atp?

problem: *ptbr has "r" on end for register, which is not convention (except legacy fcsr), also contains more than a page table pointer, and not all uses will include a page table pointer.
proposal: *atp for "address translation and protection"

ECALL and EBREAK should be documented somewhere

The user spec supports multiple privileged architectures so it is vague about what actually happens. The standard privileged architecture needs to define the behavior of ECALL and EBREAK in terms of trap handling.

Specify PMPs

Current plan is to mirror breakpoint design. Include feature to protect M-mode from itself, and a lock bit that makes a PMP read-only until next reset.

mhcounteren -> mcounteren; mucounteren -> scounteren

Like the interrupt-delegation CSRs, the counter-enable CSRs should belong to the privilege mode above the mode they control. mcounteren should control whether any lower mode can use the counters; scounteren should control whether user mode can use them. If a bit is clear in mcounteren, it should appear to be hard-wired to 0 in scounteren.
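The proposed relationship between the two registers could be modeled as below. The function names and the string mode labels are assumptions for illustration.

```python
def scounteren_view(mcounteren: int, scounteren: int) -> int:
    """Value S-mode reads: bits clear in mcounteren appear hard-wired to 0."""
    return scounteren & mcounteren

def counter_accessible(mode: str, index: int, mcounteren: int, scounteren: int) -> bool:
    """Whether the counter at bit position `index` is usable from `mode`."""
    bit = 1 << index
    if mode == "M":
        return True
    if mode == "S":
        return bool(mcounteren & bit)
    if mode == "U":
        return bool(scounteren_view(mcounteren, scounteren) & bit)
    raise ValueError("mode must be 'M', 'S' or 'U'")
```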

Make WFI M-mode-only, or add an mstatus bit to trap WFI

Locking idioms are incomplete

The user spec provides this guidance:

We recommend the use of the AMO Swap idiom shown above for both lock acquire and release
to simplify the implementation of speculative lock elision [25].

I've been puzzling over this for months and I don't currently think it's possible to implement pthread_mutex_lock using a single amoswap.w.aq and amoswap.w.rl in the uncontended path while supporting existing futex/WaitOnAddress APIs, PTHREAD_PROCESS_SHARED, and fairness.

(In particular, if a mutex is released with a single AMOSWAP, then it goes directly from "contended" to "unlocked", which allows another process to acquire the mutex before the wait list can be processed.)

precise traps on misaligned accesses for PMAs without misaligned accesses in M-mode

Replace RISC-V specific Vendor ID in mvendorid with JEDEC manufacturer ID

This would avoid duplication of effort, given that chip makers already need/have a JEDEC code. It would also free Foundation from managing the vendor ID list. Registering a JEDEC code costs $500, and can be used for all chips from a manufacturer. mvendorid can always return 0 if no code available.

Remove H-mode for now

Add commentary explaining removal of Mbb

Between PMPs and PIC, many use cases are covered.

Worth mentioning that mstatus space is reserved for that purpose.

Sign extension of shamt for C.SLLI/C.SRLI/C.SRAI in RV128

In both V2.1 and V2.2 the C.SRLI and C.SRAI shamt are sign extended in RV128.

Furthermore, the shift amount is sign-extended for RV128C, and so the legal shift amounts are 1–31, 64, and 96–127. C.SRLI expands into srli rd′, rd′, shamt[5:0], except for RV128C with shamt=0, which expands to srli rd′, rd′, 64

Please confirm that the sign extension is only to the rightmost 7 bits of the immediate in the RV128 SRLI and SRAI.

C.SLLI does not mention the sign extension, but only the 0 as 64 for RV128.

For RV128C, a shift amount of zero is used to encode a shift of 64. C.SLLI expands into slli rd, rd, shamt[5:0], except for RV128C with shamt=0, which expands to slli rd, rd, 64.

Please confirm that shamt for RV128 C.SLLI is intended to function as for RV128 C.SRLI/C.SRAI.
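The reading of the C.SRLI/C.SRAI wording being asked about can be sketched as follows: the 6-bit field is sign-extended to 7 bits, with zero remapped to 64. Whether the same applies to C.SLLI is exactly the open question, so the sketch models only the quoted C.SRLI/C.SRAI text; the function name is hypothetical.

```python
def rv128c_shift_amount(shamt6: int) -> int:
    """Effective RV128C shift amount for a 6-bit shamt field."""
    if not 0 <= shamt6 < 64:
        raise ValueError("shamt field is 6 bits")
    if shamt6 == 0:
        return 64                       # zero encodes a shift of 64
    if shamt6 & 0x20:
        return shamt6 | 0x40            # copy bit 5 into bit 6 (7-bit sign extension)
    return shamt6
```

Enumerating all 64 encodings under this model gives exactly 1 to 31, 64, and 96 to 127, matching the legal amounts listed in the quoted text.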

(Clifford's query about C.ADDI and rd/rs1 == 0 got me looking at the compressed instructions again)

C nits

No indication in the tables of which immediates are sign extended (could use simm/nzsimm to make the table self-contained)
LI is falsely listed as rs1/rd
SLLI is falsely listed as rd only
ADD should be rs1/rd

Document `N' extension

MXR for PMPs, not just paging?

MXR is currently defined to affect only paging. I think we need to define MXR to affect PMPs as well. If a PMP marks a region as U-mode execute-only, it is currently impossible to use the MPRV mechanism alone to emulate an instruction in that region.

sstatus/mstatus SPV bit

Even though we won't spec the Extended Virtualization scheme just yet, we should add one tiny piece of it to mstatus/sstatus now. This bit is necessary to emulate Extended Virtualization on systems that don't have it:

SPV ("Supervisor Previous Virtualization mode")

If S mode is not supported, this bit is a read-only zero.

Bit SPV affects the behavior of an SRET instruction.  If extended
virtualization is not supported, and SPV = 1, SRET causes an invalid
instruction trap.  If extended virtualization is supported, the
effect of SRET when SPV = 1 is not yet documented.

(Bit SPV is required to exist even when extended virtualization is
not supported so that the behavior of extended virtualization may be
emulated through software if desired.)

Interrupt/Exception Priorities

Revisit, and improve explanation of fixed interrupt/exception priority ordering when multiple events are ready to cause a trap.

SXL/UXL vs. suisa/msisa

Document 1.10 SBI

Since the old page-injection SBI was documented here, the new one is probably in scope as well. Preferably also document the basic chainloading protocol (a0, a1, and sptbr) supported by Linux (although Linux has very specific rules about the FDT layout that are probably out of scope for an ISA manual), and pin down a numeric value for ENOSYS instead of picking up whatever newlib defines.

(Would a PR draft be useful?)

What is the motivating use case for writable misa?

After the discussion on riscvarchive/riscv-qemu#59 I realize I don't know myself.

Feedback Comments for C spec

P.86 Integer Register-Immediate Operations

First, it was not clear to me why dest==0 is allowed for C.ADDI even though it is not allowed for C.ADDIW.
If I understand correctly, it is for C.NOP, which is encoded as c.addi x0, 0. I think it would be better to describe this here.

CS format instructions on P.88

These 6 arithmetic instructions, C.AND, C.OR, ... C.SUBW, are categorized in the CS (Store) format.
But the format is different from any other formats (See Table14.1).
I think we should give them a unique format name, for example CA (Arithmetic or ALU) format.

Sv48 cannot be levelled

VM migration across heterogeneous machines is typically done by preventing VMs that need to be migrated from seeing any feature which is not available in the entire migration pool (e.g. "minimal CPUID" in #30 (comment)). In general, this works for features where support is explicitly indicated, and does not work for features where support is probed.

Fortunately, most WARL fields in the privileged architecture are in M-mode CSRs and M-mode is not intended for nonclassical virtualization. There are however two exceptional S-mode WARL fields which could cause migration trouble:

MODE in SATP: if the cluster contains some Sv39 hw and some Sv48 hw, there is no way to force OSes to use Sv39 in VMs that will be migrated
UXL in sstatus

Should we remove CSR shadow register convention?

Pro:

we expand available CSR address space
reduce number of register names in document
some registers might have more complex behaviors under new virtualization scheme

Con:

need more decoding to determine permissions

Misaligned loads/stores and LR/SC on I/O regions should generate PMA violation (access) exceptions

Clarify stval for misaligned accesses which cause page faults

Suppose page 0x80 is valid, page 0x81 is invalid, and the following is executed:

lui a0, 0x81
lw a0, -1(a0)

Does stval get written with the access address 0x80fff, the start address of the faulting page 0x81000, or an unspecified faulting address within the access 0x81000 - 0x81002 ? riscvemu, qemu, and bbl all do the third option, and I don't think anything else makes sense, but it should be specified.
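The three candidates can be computed with a small model of the access. The 4 KiB page size follows from the example; the function name and the returned keys are hypothetical.

```python
PAGE_SIZE = 0x1000

def stval_candidates(base: int, offset: int, size: int, invalid_pages: set):
    """Return the three candidate stval values, or None if nothing faults."""
    access = base + offset
    touched = range(access, access + size)
    faulting = [a for a in touched if a // PAGE_SIZE in invalid_pages]
    if not faulting:
        return None
    return {
        "access_address": access,
        "faulting_page_start": (faulting[0] // PAGE_SIZE) * PAGE_SIZE,
        "faulting_bytes": (faulting[0], faulting[-1]),  # inclusive range
    }
```

For the sequence above, `lui a0, 0x81` leaves 0x81000 in a0, so the call is `stval_candidates(0x81 << 12, -1, 4, {0x81})`.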

( @s-macke brought this to my attention )

Document pseudoinstruction behavior of add, slt, etc in §20

riscvarchive/riscv-binutils-gdb#79

Consider adding mbadinst CSR

If omitting features like misaligned memory accesses and certain CSRs is going to be the most common implementation strategy, it may be worth provisioning an optional mbadinst register that holds the instruction word that most recently caused an illegal instruction trap. This saves several instructions and a likely D$ miss in the emulation path.

A similar template in H/S modes could improve virtualization performance for the same reason.

If the instruction is >XLEN bits, the register would only reflect the LSBs.

(Implementation note: if illegal instructions are detected early & in-order in the pipeline, it's not necessary to pipeline the whole instruction word down; one microarchitectural shadow copy suffices.)

Inconsistency between vl table and text

The current V draft states "The {\tt vl} register is updated with the minimum of AVL and MVL". The table just before this statement obviously shows slightly more complex rules.

does rem*w sign extend result on divide by zero?

From John Chen email on Jan 6, 2017:
"REMW and REMUW instructions are only valid for RV64, and provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW sign-extend the 32-bit result to 64 bits."
and for divide by zero
"The remainder of division by zero equals the dividend."
The two descriptions above seem to conflict.
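One reading that reconciles the two quoted sentences is that "the dividend" means the 32-bit dividend REMW actually operates on, that is, the low 32 bits of rs1, which is then sign-extended like any other REMW result. The sketch below models that reading for RV64; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def remw(rs1: int, rs2: int) -> int:
    """Model of RV64 REMW returning the 64-bit register value."""
    a, b = to_signed32(rs1), to_signed32(rs2)
    if b == 0:
        result = a                      # remainder of division by zero = dividend
    else:
        quotient = abs(a) // abs(b)     # truncate toward zero
        if (a < 0) != (b < 0):
            quotient = -quotient
        result = a - quotient * b
    return result & MASK64              # two's complement, i.e. sign-extended to 64 bits
```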

mepc, sepc, mtval, stval should be WARL values

They don't need to be a full XLEN bits.

mepc and sepc only need to hold physical/virtual addresses. To distinguish valid addresses from invalid ones, one additional bit may be necessary--e.g., for Sv39, a 40-bit sepc is necessary to distinguish that bits 63:39 weren't copies of bit 38.
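To illustrate why one extra bit is enough for Sv39, the sketch below stores a 64-bit address in 40 bits and sign-extends it on read. The specific encoding (bit 39 set to the inverse of bit 38 for non-canonical addresses) is an assumption made up for this example, not something the issue specifies.

```python
VA_BITS = 39
MASK64 = (1 << 64) - 1

def is_canonical(va: int) -> bool:
    """True if bits 63:39 are all copies of bit 38."""
    upper = va >> (VA_BITS - 1)                       # bits 63:38
    return upper == 0 or upper == (1 << (64 - VA_BITS + 1)) - 1

def compress(va: int) -> int:
    """Hypothetical 40-bit storage format for sepc under Sv39."""
    low = va & ((1 << VA_BITS) - 1)
    bit38 = (va >> (VA_BITS - 1)) & 1
    extra = bit38 if is_canonical(va) else bit38 ^ 1  # mismatch marks "invalid"
    return low | (extra << VA_BITS)

def expand(stored: int) -> int:
    """Read back: sign-extend the stored value from bit 39."""
    if (stored >> VA_BITS) & 1:
        return (stored | (MASK64 << (VA_BITS + 1))) & MASK64
    return stored
```

In this model a valid address reads back unchanged, while an invalid one reads back as some other invalid address, which is the kind of behavior a WARL definition would allow.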

mtval and stval need to be at least as wide, but may also be wider to support holding 48/64-bit instructions, if that feature is implemented.

This is kind of awkward to spec, but it's a lot of flops, so it's worth saving them.

Behavior of WFI when interrupts are implicitly disabled by privilege level

If a hart executes a WFI in M-mode with mstatus.MIE = mstatus.SIE = 1 and a pending S-mode interrupt, is the WFI required to continue immediately (because there is a pending interrupt), or block (because interrupts are enabled but none can be taken)? After reading the description a couple times I think it's the former but it could be clearer.