Comments (7)

kito-cheng commented on July 18, 2024

Proposal for type system and API:

Vector Types for Fractional LMUL:

v{TYPE}{SEW}m{LMUL}_t

  • Type = vint | vuint | vfloat
  • SEW = 8 | 16 | 32 | 64
  • LMUL = f8 | f4 | f2 | 1 | 2 | 4 | 8
  • e.g.
    • vint32mf2_t for LMUL=1/2 SEW=32

Changes:

  • Add f2, f4 and f8 to LMUL to reflect type system change.
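
The naming scheme above can be sketched as a small helper. This is only an illustration of the proposed pattern: the function name `vector_type_name` and the lookup table are hypothetical, and the helper accepts every combination the grammar allows without deciding whether a given type actually exists in a toolchain.

```python
from fractions import Fraction

# Suffix spelling from the proposal: fractional LMUL is written f8 / f4 / f2.
LMUL_SUFFIX = {
    Fraction(1, 8): "f8", Fraction(1, 4): "f4", Fraction(1, 2): "f2",
    Fraction(1): "1", Fraction(2): "2", Fraction(4): "4", Fraction(8): "8",
}
BASE_TYPES = ("vint", "vuint", "vfloat")
SEWS = (8, 16, 32, 64)


def vector_type_name(base, sew, lmul):
    """Build a name of the form v{TYPE}{SEW}m{LMUL}_t (hypothetical helper)."""
    if base not in BASE_TYPES:
        raise ValueError(f"unknown base type: {base}")
    if sew not in SEWS:
        raise ValueError(f"unsupported SEW: {sew}")
    lmul = Fraction(lmul)
    if lmul not in LMUL_SUFFIX:
        raise ValueError(f"unsupported LMUL: {lmul}")
    return f"{base}{sew}m{LMUL_SUFFIX[lmul]}_t"


print(vector_type_name("vint", 32, Fraction(1, 2)))  # the example from the list above
```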

Vector Tuple Types for Fractional LMUL:

v{TYPE}{SEW}m{LMUL}x{NF}_t

  • Type = vint | vuint | vfloat
  • SEW = 8 | 16 | 32 | 64
  • LMUL = f8 | f4 | f2 | 1 | 2 | 4 | 8
  • NF = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
  • LMUL x NF ≤ 8
    • constrained by HW.
  • e.g.
    • Add f2, f4 and f8 to LMUL to reflect type system change.
    • Update the constraint, because LMUL < 1 still occupies one vector register.

Changes:

  • Add mf[2|4|8] to LMUL to reflect type system change.
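
A rough sketch of the tuple-type constraint follows. It rests on two assumptions that are not spelled out above: that a fractional-LMUL field is counted as one whole register (one reading of the "still occupies one vector register" note), and that the limit is inclusive, as in the v-spec rule EMUL * NFIELDS ≤ 8. The helper names are hypothetical.

```python
from fractions import Fraction

LMUL_SUFFIX = {
    Fraction(1, 8): "f8", Fraction(1, 4): "f4", Fraction(1, 2): "f2",
    Fraction(1): "1", Fraction(2): "2", Fraction(4): "4", Fraction(8): "8",
}


def registers_per_field(lmul):
    # Assumption: a field with LMUL < 1 still occupies one whole register.
    return max(Fraction(lmul), Fraction(1))


def tuple_is_legal(lmul, nf, limit=8):
    # Assumption: inclusive limit, NF restricted to the 1..8 range listed above.
    return 1 <= nf <= 8 and registers_per_field(lmul) * nf <= limit


def tuple_type_name(base, sew, lmul, nf):
    """Build a name of the form v{TYPE}{SEW}m{LMUL}x{NF}_t (hypothetical helper)."""
    if not tuple_is_legal(lmul, nf):
        raise ValueError(f"LMUL={lmul}, NF={nf} exceeds the register budget")
    return f"{base}{sew}m{LMUL_SUFFIX[Fraction(lmul)]}x{nf}_t"


# Enumerate every (LMUL, NF) pair this reading of the constraint accepts.
legal = [(l, nf) for l in LMUL_SUFFIX for nf in range(1, 9) if tuple_is_legal(l, nf)]
```

Under this reading every fractional LMUL can be combined with any NF up to 8, while LMUL=2 stops at NF=4 and LMUL=8 only allows NF=1.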

Changes to Intrinsic API Naming Rules:

INTRINSIC ::= MNEMONIC '_' RET_TYPE
MNEMONIC ::= Instruction name in v-ext specification. Replace '.' with '_'.
RET_TYPE ::= SEW LMUL
SEW ::= ( i8 | i16 | i32 | i64 | u8 | u16 | u32 | u64 | f16 | f32 | f64 )
LMUL ::= ( mf8 | mf4 | mf2 | m1 | m2 | m4 | m8 )

Changes:

  • Add mf[2|4|8] to LMUL to reflect type system change.
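
The naming grammar above can be exercised with a short sketch. The function `intrinsic_name` is a hypothetical helper written only to illustrate the rule; it builds the name string and says nothing about whether a compiler provides that intrinsic.

```python
SEW_TAGS = ("i8", "i16", "i32", "i64",
            "u8", "u16", "u32", "u64",
            "f16", "f32", "f64")
LMUL_TAGS = ("mf8", "mf4", "mf2", "m1", "m2", "m4", "m8")


def intrinsic_name(instruction, sew_tag, lmul_tag):
    """INTRINSIC ::= MNEMONIC '_' RET_TYPE, with RET_TYPE ::= SEW LMUL."""
    if sew_tag not in SEW_TAGS:
        raise ValueError(f"unknown SEW tag: {sew_tag}")
    if lmul_tag not in LMUL_TAGS:
        raise ValueError(f"unknown LMUL tag: {lmul_tag}")
    # MNEMONIC is the instruction name with '.' replaced by '_'.
    mnemonic = instruction.replace(".", "_")
    return f"{mnemonic}_{sew_tag}{lmul_tag}"


print(intrinsic_name("vadd.vv", "i32", "mf2"))
```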

Issue for Fractional LMUL

  • Unlike integer (non-fractional) LMUL, some fractional LMUL configurations will raise an illegal instruction exception on certain HW configurations.
    • From section 3.3.2, Vector Register Grouping (vlmul[2:0]), of the v-spec: "Implementations must support fractional LMUL settings for LMUL ≥ SEW/ELEN, for the ELEN value at LMUL=1, which ensures there is space to store at least one element. An attempt to set an unsupported SEW and LMUL configuration sets the vill bit in vtype."
    • e.g. vint64mf4 (SEW=64, LMUL=1/4) is not supported on HW with ELEN=64, VLEN=128.
    • According to the spec, HW with ELEN=64, VLEN=256 might not support vint64mf4 (SEW=64, LMUL=1/4) either.
    • Possible solution:
      • Add compiler option to assume the minimal VLEN on the target machine.
        • e.g. -mmin-vlen=256
      • Add compiler option to assume the minimal SEW on the target machine.
        • e.g. -mmin-sew=128
      • Add compiler option to enable certain fractional LMUL type.
        • e.g. -mflmul=all, -mflmul=no, -mflmul=64mf8, -mflmul=64mf4
      • Record the minimal VLEN requirement, or the list of fractional LMUL types used, in an ELF attribute.
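
To make the two examples concrete, here is a minimal sketch of the quoted rule. The helper names `must_support` and `has_room` are hypothetical. The first applies the LMUL ≥ SEW/ELEN inequality from the quote (plus the assumption that SEW may not exceed ELEN); the second only checks whether VLEN * LMUL bits can hold one element.

```python
from fractions import Fraction


def must_support(sew, lmul, elen):
    """True if the quoted rule obliges an implementation to support SEW/LMUL."""
    # Assumption: SEW larger than ELEN is never supported.
    return sew <= elen and Fraction(lmul) >= Fraction(sew, elen)


def has_room(sew, lmul, vlen):
    """True if VLEN * LMUL bits are enough to hold at least one element."""
    return vlen * Fraction(lmul) >= sew


mf4 = Fraction(1, 4)

# vint64mf4 on ELEN=64, VLEN=128: not required, and no room for one element.
print(must_support(64, mf4, elen=64), has_room(64, mf4, vlen=128))

# vint64mf4 on ELEN=64, VLEN=256: one element would fit, but support is
# still not guaranteed by the rule, which is the portability problem.
print(must_support(64, mf4, elen=64), has_room(64, mf4, vlen=256))
```

The gap between the two functions is the reason a compiler cannot decide from VLEN alone whether a fractional LMUL type is usable, which motivates the options listed above.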

from rvv-intrinsic-doc.

rdolbeau commented on July 18, 2024

Seems OK to me; I like the idea of extending ELF for such requirements. It might be useful for extensions in general (i.e. have an ELF attribute for V and some properties of V, but also for B, ...).

Hsiangkai commented on July 18, 2024

It looks good to me.

David-Horner commented on July 18, 2024

@kito-cheng What is NF? it is not immediately apparent from

Fractional LMUL is not the only disruptive change.

LMUL no longer stripes vertically, SLEN determines a horizontal interleave instead.
As a result

  • the poor man's 128 shuffle when SLEN=64 no longer works.
  • if SLEN=1/2 * VLEN then all even elements are clustered (consecutively stored) together in low bytes of each physical register and all odd elements are clustered in the upper 1/2 bytes of the register.
  • thus if vl < VLMAX * LMUL there is a gap (tail) in the middle as well as at the end of the (last) physical register.
  • if SLEN=1/4 * VLEN the interleave is by 4, with clustering of modulo 4 elements, and if vl < VLMAX * LMUL there are 4 gaps: one at the end and 3 in the middle.
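
The clustering described in these bullets can be pictured with a toy model. It is only a sketch of the description above for a single physical register, not of the spec text: the function `register_layout` and its parameters are hypothetical, `sections` stands for VLEN/SLEN, and element i is assumed to go to section i mod sections.

```python
def register_layout(vl, vlmax, sections):
    """Return the element index held in each slot of one register, low to high.

    Unused (tail) slots are reported as None.
    """
    if vlmax % sections != 0 or vl > vlmax:
        raise ValueError("vlmax must be a multiple of sections and vl <= vlmax")
    per_section = vlmax // sections
    layout = [None] * vlmax
    for i in range(vl):
        section = i % sections   # which SLEN-wide section the element lands in
        slot = i // sections     # position inside that section
        layout[section * per_section + slot] = i
    return layout


# SLEN = VLEN/2: even elements in the low half, odd elements in the high half,
# with one gap in the middle and one at the end when vl < VLMAX.
print(register_layout(vl=6, vlmax=8, sections=2))

# SLEN = VLEN/4: interleave by 4, leaving three gaps in the middle and one at the end.
print(register_layout(vl=4, vlmax=8, sections=4))
```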

It is no longer just the element order that gets shuffled, and no longer only when LMUL>1.
Instead, even at LMUL=1, a different element length affects element content.

The element length and VLEN/SLEN determine the alignment structure.
Thus if VLEN/SLEN > 1, component bytes of elements are no longer in in-memory order.
Load VLMAX bytes into a register, then the half-words read from the register will have every other byte from memory in their upper and lower halves. The same kind of story holds for words: none of their bytes will be from consecutive locations in memory.

The good and the bad of this is that most initial implementations are expected to have VLEN=SLEN.
Those that do have SLEN<VLEN may well jump to SLEN=1/4 VLEN or 1/8th, as it is expected that only the higher performance larger VLEN will need to limit SLEN due to wiring issues.
So SLEN = 1/2 VLEN, which is a nice match for register-pair processing (e.g. complex numbers), is going to be rare.

But all the code needs to accommodate the in-register format not matching in-memory.
There are suggestions on how to mitigate this in hardware.
These intrinsics should be prepared to differentiate between in-memory-order agnostic and reliant structures/operations. The good news is that most operations are in-memory-order agnostic. e.g. all single width arithmetic. Even most mixed width operations are not going to care. But any sub-element component manipulation will need to be aware and careful.

Finally, I have noticed discussions about matching of masks under a given element length and LMUL with another element length or LMUL. Given that it is a definite concern and apparently at least moderately frequent in real code situations, you should know of a proposal for mask support that is ordinal based. Regardless of Element Length or LMUL the nth mask bit applies to the nth vector element. In all cases, a single bit is used to store the mask value. The issue is #448 in the riscv/riscv-v-spec github.

David-Horner commented on July 18, 2024

Well, not so finally apparently.

Another thing to mention:

Because LMUL no longer does vertical striping, but horizontal interleave, each physical register has the same characteristics. Physical registers are filled consecutively. This means the register grouping by powers of 2 is no longer a constraint. So, LMUL can take on all values between 1 and 8. This is good for intrinsics that can use a value of say 6, freeing up a register pair for two mask registers or a further m2 variable. A second vsetvl[i] instruction with a limiting AVL is necessary (currently), but as mentioned elsewhere in these comments it can be a low cost operation and the tradeoff is definitely worth it in some scenarios.
(I will also be proposing an option to set LMUL to 3, 5, 6 or 7, based on ideas in riscv/riscv-v-spec github issue #418, although that targeted the 0.8 structure.)

kito-cheng commented on July 18, 2024

@kito-cheng What is NF? it is not immediately apparent from

NF means NFIELDS, the term used by segment load/store. Vector tuple types are used by the segment load/store intrinsic API.

You can see this issue for more detail: https://github.com/sifive/rvv-intrinsic-doc/issues/11

eopXD commented on July 18, 2024

Fractional LMUL is now defined and implemented in the RVV intrinsics. Closing this issue.
