Introduction: In order to support segment load/store, I would li

RFC: New API and types for supporting segment load/store. about rvv-intrinsic-doc HOT 15 CLOSED

riscv-non-isa commented on July 18, 2024

RFC: New API and types for supporting segment load/store.

from rvv-intrinsic-doc.

Comments (15)

rdolbeau commented on July 18, 2024

@knightsifive FFTW3 is peculiar, as it forces you to fit into the macro-based SIMD model (one of the reasons it's my go-to test... baptism by fire). I do have DFTs and FFTs using Zvlsseg. I also have an FFTW3 port using it, but it won't compile as I need to fit two registers in a single datatype - so I need "vfloat64m1x2_t", or an equivalent struct or array :-(

The bad news is that while that "split format" works fine with AVX-512, the resulting codelets won't be used by FFTW3 because they perform worse: too much spilling compared to the 'interleaved' version...

Edit: typo

ebahapo commented on July 18, 2024

Hopefully, the new V types will be based on aggregates.

For example, vint32m2x3_t would be defined as:

typedef vint32m2_t vint32m2x3_t[3];

kito-cheng commented on July 18, 2024

Basing them on aggregates or array types sounds good to me; that would mean we could use subscripts to access elements, or initialize the tuple like an array, which makes the code clearer, as below:

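typedef vint32m2_t vint32m2x3_t[3];
vint32m2x3_t vt;
vint32m2_t va;

vt[0] = va;   // write one member of the tuple
va = vt[0];   // read one member back

vint32m2x3_t vt2 = {va, va, va};   // array-style initialization of the whole tuple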

My only concern is that this means we must either allow users to declare arrays and/or structs with vector-type elements, or disallow that for users while still letting the compiler use it internally; we haven't discussed that part much yet.

Some references:

SVE disallows declaring arrays of scalable types.
SVE disallows user-declared structs with scalable-type fields, though IIRC this was allowed in an earlier ACLE implementation.
SVE only allows accessing values of scalable vector tuple types through intrinsics.

Note: I list what SVE does not because I think we should do the same as SVE, but only as a reference for the discussion.
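The subscript and brace-initialization syntax proposed in this comment can be tried out in plain C by letting an ordinary fixed-size struct stand in for the sizeless vector type. A minimal sketch, assuming a hypothetical `fake_vint32m2_t` with four `int32_t` lanes and a hypothetical helper `sum_lane`; real `vint32m2_t` values have no compile-time size, so this only models the syntax, not codegen or the ABI:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sizeless vint32m2_t: a fixed 4-lane vector.
   It only models the proposed syntax; real RVV types have no fixed size. */
typedef struct {
    int32_t lane[4];
} fake_vint32m2_t;

/* Tuple of three vectors as a plain array type, as proposed above. */
typedef fake_vint32m2_t fake_vint32m2x3_t[3];

/* Hypothetical helper: adds lane i of all three tuple members.
   The array parameter decays to a pointer to the first member. */
static int32_t sum_lane(const fake_vint32m2_t tuple[3], size_t i) {
    return tuple[0].lane[i] + tuple[1].lane[i] + tuple[2].lane[i];
}
```

With this model, `fake_vint32m2x3_t vt = {va, va, va};` followed by `vt[1].lane[0] = 10;` is ordinary C, and `sum_lane(vt, 0)` reads lane 0 of all three members.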

rdolbeau commented on July 18, 2024

@kito-cheng I don't think SVE 'disallows' it. It's just not supported yet... to be confirmed.

kito-cheng commented on July 18, 2024

@rdolbeau yeah, let me check the ACLE/SVE spec again; I only checked their open-source GCC implementation. IIRC they allow declaring structs with scalable vector types, with limitations: one is that all fields must be scalable vector types if there is at least one scalable vector field. Anyway, I'll update after checking the spec.

rdolbeau commented on July 18, 2024

@kito-cheng I have a ticket open with Arm on the subject ;-), as it would be useful in some cases. An example for RVV (less so for SVE, which has hardware support for complex arithmetic, so interleaved is fine there):

struct {
  vfloat64m1_t real;
  vfloat64m1_t imaginary;
};

RVV has terrible support for interleaved complex (i.e. 'real' in even lanes, 'imaginary' in odd lanes, or 'vector of struct') as in-register data manipulation relies on the very generic 'vrgather'. It's much more efficient to use split complex (two different registers, or 'struct of vector') ... but then some software infrastructure will require a single data construct to hold the complex values, hence the need for either 'struct' or 'array' of vectors...

Of course, the split format requires 2x the registers (and 2x the parallelism) which creates additional spilling but that's a different issue :-(

Edit: just prettier code
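To make the layout trade-off concrete, here is a scalar sketch in plain C (no RVV intrinsics; the function names are illustrative) of complex multiplication in both formats. In the split format every output element reads only the same index from each input array, which maps onto ordinary lane-wise vector instructions; in the interleaved format each output slot also needs its neighbor's value, which in a vector implementation is the adjacent-lane swap that, per the comment above, RVV would have to express with vrgather:

```c
#include <stddef.h>

/* Split ("struct of vectors") layout: real and imaginary parts live in
   separate arrays, so output i reads only index i of each input array. */
static void cmul_split(const double *are, const double *aim,
                       const double *bre, const double *bim,
                       double *cre, double *cim, size_t n) {
    for (size_t i = 0; i < n; i++) {
        cre[i] = are[i] * bre[i] - aim[i] * bim[i];
        cim[i] = are[i] * bim[i] + aim[i] * bre[i];
    }
}

/* Interleaved ("vector of structs") layout: real parts in even slots,
   imaginary parts in odd slots. Slot 2*i needs slot 2*i+1 and vice versa,
   i.e. a swap of adjacent lanes in a vector implementation. */
static void cmul_interleaved(const double *a, const double *b,
                             double *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double ar = a[2 * i], ai = a[2 * i + 1];
        double br = b[2 * i], bi = b[2 * i + 1];
        c[2 * i]     = ar * br - ai * bi;
        c[2 * i + 1] = ar * bi + ai * br;
    }
}
```

The split version does the same arithmetic with no data movement between slots, at the cost of holding two arrays (or two registers) per operand, which is the register-pressure issue mentioned above.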

kito-cheng commented on July 18, 2024

The latest ACLE for SVE I could find on their website says that sizeless/scalable vector types can't be used in arrays, structs, unions, or classes:

https://developer.arm.com/docs/100987/latest/arm-c-language-extensions-for-sve

Page 18.
Sizeless types may not be used in the following situations:
...
• as the type of an array element;
...
...
In all other respects, sizeless types have the same restrictions as the standard-defined incomplete types.
This specifically includes (but is not limited to) the following:
• Members of unions, structures and classes cannot have sizeless type.
...

kito-cheng commented on July 18, 2024

@rdolbeau
I thought segment load/store in RISC-V was supposed to provide a more powerful way to deal with those issues?

This struct is equivalent to vfloat64m1x2_t in this proposal, and vseg_load_f64m1x2 (vlseg2e.v) can load a vector of structs; might that be what you want?

struct {
  vfloat64m1_t real;
  vfloat64m1_t imaginary;
};
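For readers unfamiliar with segment loads, their semantics can be modeled in scalar C: a unit-stride segment load with NF fields reads vl consecutive NF-element records from memory and places field f of record i into element i of the f-th destination vector, and the segment store does the reverse. The sketch below is a reference model only; `seg_load` and `seg_store` are hypothetical names, not intrinsics from the proposal, and plain arrays stand in for vector registers:

```c
#include <stddef.h>

/* Reference model of a unit-stride segment load (e.g. vlseg2 for nf == 2):
   record i occupies base[i*nf .. i*nf + nf - 1]; field f of record i is
   written to element i of destination "register" fields[f]. */
static void seg_load(const double *base, size_t nf, size_t vl,
                     double *fields[]) {
    for (size_t i = 0; i < vl; i++)
        for (size_t f = 0; f < nf; f++)
            fields[f][i] = base[i * nf + f];
}

/* Reference model of the matching segment store: re-interleaves the fields
   back into consecutive records in memory. */
static void seg_store(double *base, size_t nf, size_t vl,
                      double *const fields[]) {
    for (size_t i = 0; i < vl; i++)
        for (size_t f = 0; f < nf; f++)
            base[i * nf + f] = fields[f][i];
}
```

With nf = 2 and interleaved complex data in memory, seg_load yields exactly the split real/imaginary layout, and seg_store restores the interleaved layout, so a load/compute/store round trip on split data needs no in-register permutation.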

rdolbeau commented on July 18, 2024

@kito-cheng Yes, but Zvlsseg is an extension, so it might not be available.

And yes, the structure is the same; it's a use case that justifies having this ability :-) - and ideally not restricted to vfloat64m1x2_t but potentially more general.

Indeed, this example would be well served by vfloat64m1x2_t if you can then access the two halves of the structure/array independently for further per-member computations:

vfloat64m1x2_t conjugate(const vfloat64m1x2_t input, const size_t requested_vl) {
  vfloat64m1x2_t result = input;
  // [0] is real, [1] is imaginary; vfneg (not the integer vneg) negates floating-point lanes
  result[1] = vfneg_v_f64m1(result[1], requested_vl);
  return result;
}

Edit: add the requested_vl in the code to be coherent with my own argument in #8 ;-)
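The conjugate example needs more than subscripting: it also initializes one tuple from another (`result = input`) and returns it by value, which a plain C array typedef such as `typedef vint32m2_t vint32m2x3_t[3];` cannot support, since C arrays cannot be initialized from another array, assigned, or returned. A minimal sketch of the same function using a hypothetical struct-based stand-in; `fake_vfloat64m1x2_t`, the fixed `LANES = 4`, and the name `conjugate_model` are all assumptions for illustration, not proposal API:

```c
#include <stddef.h>

#define LANES 4 /* hypothetical fixed lane count standing in for VLMAX */

/* Hypothetical stand-in for vfloat64m1x2_t: a struct wrapping two
   fixed-size "vectors", so it can be copied and returned by value. */
typedef struct {
    double v[2][LANES]; /* v[0] = real parts, v[1] = imaginary parts */
} fake_vfloat64m1x2_t;

/* Mirrors conjugate() above: copy the tuple, then negate the imaginary
   member in the first requested_vl lanes. Lanes at or past requested_vl
   keep their input values in this model; real RVV tail handling depends
   on the tail policy in effect. */
static fake_vfloat64m1x2_t conjugate_model(fake_vfloat64m1x2_t input,
                                           size_t requested_vl) {
    fake_vfloat64m1x2_t result = input;
    for (size_t i = 0; i < requested_vl && i < LANES; i++)
        result.v[1][i] = -result.v[1][i];
    return result;
}
```

Because the tuple is a struct here, copying and returning it are ordinary C, while each member is still reachable by index for per-member computations.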

nick-knight commented on July 18, 2024

"@kito-cheng Yes, but Zvlsseg is an extension, so it might not be available."

... says the guy proposing all the SLEN-dependent hacks :P

But seriously, note the following commentary in the V-extension spec:

Note: This set of instructions is intended to be included in the base "V" extension.
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#78-vector-loadstore-segment-instructions-zvlsseg

Obviously this is non-normative text, and the V-ext has not yet been ratified. However, my understanding is that this commentary is indicative of the task group's momentum and current thinking. I've been fighting hard for this, since last July, for exactly the application we're discussing. And for what it's worth, all of SiFive's vector cores support Zvlsseg.

rdolbeau commented on July 18, 2024

@knightsifive Good, because Zvlsseg is a good (great) thing I believe, and as LMUL already requires some level of 'striping'/'interleaving' ability (at SLEN), I would imagine it's not exceedingly difficult to also do it at SEW (not a hardware implementation guy, so I might be wrong).

nick-knight commented on July 18, 2024

@rdolbeau when I first learned (December 10, 2019) that your FFTW port wasn't using segmented loads and stores, I cried.

To my understanding, the real challenge for Zvlsseg implementation is handling all the odd cases (e.g., RGB values) plus the fact that the "register groups" need not be aligned to powers of two. (LMUL groups are powers of two, and aligned commensurately.)
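The register-allocation side of that point can be illustrated with a little arithmetic. This sketch assumes the segment rules as written in the V-extension spec (check them against the spec version in use): an access with NF fields at integer LMUL occupies NF register groups of LMUL registers each, field f starting at vd + f*LMUL; NF*LMUL must not exceed 8, vd must be aligned to LMUL, and no register number may go past v31. For RGB (NF = 3) the total register count is a multiple of 3, so it is never a power of two. The helper names `seg_field_reg` and `seg_access_ok` are hypothetical:

```c
#include <stdbool.h>

/* First register of field f for a segment access starting at vd
   (integer LMUL >= 1 assumed; fractional LMUL uses one register per field). */
static unsigned seg_field_reg(unsigned vd, unsigned lmul, unsigned f) {
    return vd + f * lmul;
}

/* Legality check modeled on the assumed rules: 2 <= nf <= 8,
   LMUL in {1, 2, 4, 8}, vd aligned to LMUL, nf * lmul <= 8,
   and the last register used (vd + nf*lmul - 1) no higher than v31. */
static bool seg_access_ok(unsigned vd, unsigned lmul, unsigned nf) {
    bool lmul_ok = lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8;
    return nf >= 2 && nf <= 8 && lmul_ok && vd % lmul == 0 &&
           nf * lmul <= 8 && vd + nf * lmul <= 32;
}
```

For example, an RGB access at LMUL = 1 may start at v1 and occupy v1..v3, and at LMUL = 2 it may start at v2 and occupy v2..v7: each field group is LMUL-aligned, but the whole block is not a power-of-two-sized, power-of-two-aligned group.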

kito-cheng commented on July 18, 2024

Status update:

I've started implementing this on the GCC side, and plan to implement the vector tuple type as a primitive type in the first stage, then extend it to an aggregate or array type in the next stage.

I think the back-end implementation should be similar no matter whether we define the vector tuple type as a primitive, an aggregate, or an array; the only difference is what syntax/operators we need to support in the front end.

Hsiangkai commented on July 18, 2024

We could discuss how to represent the tuple types and related C operators in another new issue.

kito-cheng commented on July 18, 2024

https://github.com/sifive/rvv-intrinsic-doc/issues/17 for further discussion on the vector tuple type
