Comments (15)
@knightsifive FFTW3 is peculiar, as it forces you to fit in the macro-based SIMD model (one of the reasons it's my go-to test... baptism by fire). I do have DFTs and FFTs using Zvlsseg. I also have an FFTW3 using it, but it won't compile as I need to fit two registers in a single datatype - so I need "vfloat64m1x2_t", or an equivalent struct or array :-(
The bad news is that while doing that "split format" works fine with AVX-512, the resulting codelets won't be used by FFTW3 because they are lower-performing: too much spilling compared to the 'interleaved' version...
Edit: typo
from rvv-intrinsic-doc.
Hopefully, the new V types would be based on aggregates.
Thus, vint32m2x3_t would be defined as:
typedef vint32m2_t vint32m2x3_t[3];
Basing the tuple types on aggregates or array types sounds good to me. It means we could use subscripts to access elements, or initialize the tuple like an array, which makes the code clearer, as below:
typedef vint32m2_t vint32m2x3_t[3];
vint32m2x3_t vt;
vint32m2_t va;
vt[0] = va;
va = vt[0];
vint32m2x3_t vt2 = {va, va, va};
My only concern is that this means we must either allow users to declare arrays and/or structs with vector types, or disallow that for users while letting the compiler use it internally. That is the part we haven't discussed much yet.
Some reference here:
- SVE disallows declaring arrays of scalable types.
- SVE disallows user-declared structs with scalable-type fields, but IIRC this was allowed in an earlier ACLE implementation.
- SVE only allows intrinsics to be used to access values in a scalable vector tuple type.
Note: I list what SVE does for reference in this discussion, not because I think we should do the same as SVE.
@kito-cheng I don't think SVE 'disallow'. It's just not supported yet... to be confirmed.
@rdolbeau yeah, let me check the ACLE/SVE spec again. I only checked their open source GCC implementation. IIRC they allow declaring a struct with scalable vector types, with limitations; one limitation is that all fields must be scalable vector types if there is at least one scalable vector field. Anyway, I'll update after checking the spec.
@kito-cheng I've a ticket open with Arm on the subject ;-), as it would be useful in some cases. An example for RVV (less so SVE as it has HW support for complex so interleaved is fine):
struct {
vfloat64m1_t real;
vfloat64m1_t imaginary;
};
RVV has terrible support for interleaved complex (i.e. 'real' in even lanes, 'imaginary' in odd lanes, or 'vector of struct') as in-register data manipulation relies on the very generic 'vrgather'. It's much more efficient to use split complex (two different registers, or 'struct of vector') ... but then some software infrastructure will require a single data construct to hold the complex values, hence the need for either 'struct' or 'array' of vectors...
Of course, the split format requires 2x the registers (and 2x the parallelism) which creates additional spilling but that's a different issue :-(
Edit:just prettier code
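To make the interleaved vs. split distinction concrete, here is a minimal scalar sketch in plain C (no RVV intrinsics; the function names `cmul_interleaved` and `cmul_split` are hypothetical and only for illustration). It shows the same complex multiply on both layouts. In the split form every loop body touches the same index in every array, which is what would map onto plain element-wise vector operations; in the interleaved form real and imaginary parts sit in neighbouring slots, which is what would need in-register shuffling once vectorized.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved layout ("vector of struct"): re0, im0, re1, im1, ...
 * Each complex value occupies two adjacent slots of one array. */
void cmul_interleaved(const double *a, const double *b, double *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++) {
        double ar = a[2 * i], ai = a[2 * i + 1];
        double br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;  /* real part */
        out[2 * i + 1] = ar * bi + ai * br;  /* imaginary part */
    }
}

/* Split layout ("struct of vector"): separate real and imaginary arrays.
 * Every statement uses index i in every array, so it is purely element-wise. */
void cmul_split(const double *ar, const double *ai,
                const double *br, const double *bi,
                double *out_r, double *out_i, size_t n)
{
    assert(ar && ai && br && bi && out_r && out_i);
    for (size_t i = 0; i < n; i++) {
        out_r[i] = ar[i] * br[i] - ai[i] * bi[i];
        out_i[i] = ar[i] * bi[i] + ai[i] * br[i];
    }
}
```

Note that the split version needs six arrays live at once instead of three, which is the scalar analogue of the extra register pressure mentioned above.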
This is the latest ACLE for SVE that I could find on their website. Their spec says sizeless / scalable vectors can't be used in an array, struct, union or class:
https://developer.arm.com/docs/100987/latest/arm-c-language-extensions-for-sve
Page 18.
Sizeless types may not be used in the following situations:
...
• as the type of an array element;
...
...
In all other respects, sizeless types have the same restrictions as the standard-defined incomplete types.
This specifically includes (but is not limited to) the following:
• Members of unions, structures and classes cannot have sizeless type.
...
@rdolbeau
I thought segment load/store in RISC-V should provide a more powerful capability to deal with this issue?
This struct is equivalent to vfloat64m1x2_t in this proposal, and vseg_load_f64m1x2 (vlseg2e.v) can load a vector of struct, which might be what you want?
struct {
vfloat64m1_t real;
vfloat64m1_t imaginary;
};
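As a rough illustration of what a segment load does to the data, the sketch below is a scalar model in plain C, not the intrinsic itself. The name `segment_load_model` and its signature are hypothetical. Memory holds `n` segments of `nf` contiguous fields, and field `f` of each segment ends up in its own destination array, so interleaved complex data in memory arrives in registers already in split form.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of an nf-field segment load (hypothetical helper).
 * mem holds n segments, each made of nf contiguous elements.
 * Field f of segment i is written to fields[f][i]. */
void segment_load_model(const double *mem, size_t nf, size_t n,
                        double *const *fields)
{
    assert(mem && fields);
    assert(nf >= 2 && nf <= 8);  /* segment instructions cover 2 to 8 fields */
    for (size_t i = 0; i < n; i++) {
        for (size_t f = 0; f < nf; f++) {
            fields[f][i] = mem[i * nf + f];
        }
    }
}
```

With `nf = 2` this models the complex case (real parts in one destination, imaginary parts in the other); `nf = 3` models a non-power-of-two case such as RGB triples.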
@kito-cheng Yes, but Zvlsseg is an extension so might not be available.
And yes, the structure is the same, it's a use case to justify we do want to have this ability :-) - and ideally not restricted to vfloat64m1x2_t but potentially more general
Indeed, this example would be well served by vfloat64m1x2_t if you can then access the two halves of the structure/array independently for further per-member computations:
vfloat64m1x2_t conjugate(const vfloat64m1x2_t input, const int requested_vl) {
    vfloat64m1x2_t result = input;
    result[1] = vneg_v_f64m1(result[1], requested_vl); // [0] is real, [1] is imaginary
    return result;
}
Edit: add the requested_vl in the code to be coherent with my own argument in #8 ;-)
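One detail relevant to the aggregate-vs-array question: in standard C an array type cannot be assigned as a whole or used as a function return type, whereas a struct can. So a tuple defined as a plain array typedef would need special compiler treatment for a by-value signature like the one above, while a struct-based definition gets it for free. The sketch below illustrates this with ordinary fixed-size arrays standing in for vector registers. All names (`MODEL_VL`, `model_vec`, `model_vec_x2`, `model_neg`, `model_conjugate`) are hypothetical, and the model simply leaves lanes at or beyond `vl` unchanged, which is a simplification and not a statement about RVV tail behaviour.

```c
#include <assert.h>
#include <stddef.h>

enum { MODEL_VL = 4 };  /* hypothetical fixed lane count for the model */

/* Stand-in for one vector register (e.g. vfloat64m1_t). */
typedef struct { double lane[MODEL_VL]; } model_vec;

/* Stand-in for a two-register tuple (e.g. vfloat64m1x2_t).
 * Wrapping the array in a struct makes it copyable and returnable. */
typedef struct { model_vec v[2]; } model_vec_x2;

/* Negate the first vl lanes; the remaining lanes are left as they were. */
static model_vec model_neg(model_vec x, size_t vl)
{
    assert(vl <= MODEL_VL);
    for (size_t i = 0; i < vl; i++) {
        x.lane[i] = -x.lane[i];
    }
    return x;
}

/* v[0] is real, v[1] is imaginary, mirroring the example above. */
model_vec_x2 model_conjugate(model_vec_x2 input, size_t vl)
{
    model_vec_x2 result = input;              /* whole-tuple copy */
    result.v[1] = model_neg(result.v[1], vl); /* per-member update */
    return result;                            /* returned by value */
}
```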
> @kito-cheng Yes, but Zvlsseg is an extension so might not be available.
... says the guy proposing all the SLEN-dependent hacks :P
But seriously, note the following commentary in the V-extension spec:
Note | This set of instructions is intended to be included in the base "V" extension.
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#78-vector-loadstore-segment-instructions-zvlsseg
Obviously this is non-normative text, and the V-ext has not yet been ratified. However, my understanding is that this commentary is indicative of the task group's momentum and current thinking. I've been fighting hard for this, since last July, for exactly the application we're discussing. And for what it's worth, all of SiFive's vector cores support Zvlsseg.
@knightsifive Good, because Zvlsseg is a good (great) thing I believe, and as LMUL already requires some level of 'striping'/'interleaving' ability (at SLEN), I would imagine it's not exceedingly difficult to also do it at SEW (not a hardware implementation guy, so I might be wrong).
@rdolbeau when first I learned (December 10, 2019) that your FFTW port wasn't using segmented loads and stores, I cried.
To my understanding, the real challenge for Zvlsseg implementation is handling all the odd cases (e.g., RGB values) plus the fact that the "register groups" need not be aligned to powers of two. (LMUL groups are powers of two, and aligned commensurately.)
Status update:
I have started implementing this on the GCC side. The plan is to implement the vector tuple type as a primitive type in the first stage, and extend it to use an aggregate or array in the next stage.
I think the back-end implementation should be similar no matter whether we define the vector tuple type as a primitive, aggregate or array; the only difference is what kind of syntax / operators we need to support in the front-end.
We could discuss how to represent the tuple types and related C operators in another new issue.
https://github.com/sifive/rvv-intrinsic-doc/issues/17 for further discussion on vector tuple type