Comments (4)
Is the initialization of src_vec missing from your C code? It seems to be initialized with a vmv.v.i in the assembly. It could be initialized with vmv.s.x instead, since only element 0 is used.
The compiler's vsetvli insertion pass could be taught that vmv.s.x doesn't care about LMUL and can share the same vsetvli as the vfredosum. I believe this is being worked on in the LLVM compiler.
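A minimal sketch of the suggestion above, assuming the explicit (non-overloaded) intrinsic names from rvv-intrinsic-doc. The helper name `sum_of_squares` is hypothetical, and the scalar fallback is only there so the snippet also builds without the V extension. The reduction reads only element 0 of its scalar operand, so the seed is written once with a single-element move before the loop and then carried across iterations. Whether the compiler merges the surrounding vsetvli instructions depends on the compiler and its version.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical helper: ordered sum of x[i] * x[i] for i in [0, n). */
float sum_of_squares(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
#if defined(__riscv_vector)
    /* Only element 0 of the scalar operand is read by the reduction,
       so a single-element move (vl = 1) is enough to seed it. */
    vfloat32m1_t acc = __riscv_vfmv_s_f_f32m1(0.0f, 1);
    for (size_t vl; n > 0; n -= vl, x += vl) {
        vl = __riscv_vsetvl_e32m4(n);                       /* LMUL = 4 */
        vfloat32m4_t v  = __riscv_vle32_v_f32m4(x, vl);
        vfloat32m4_t sq = __riscv_vfmul_vv_f32m4(v, v, vl);
        /* acc[0] = acc[0] + sq[0] + ... + sq[vl - 1], in element order.
           The tail-undisturbed form keeps the m1 destination explicit. */
        acc = __riscv_vfredosum_vs_f32m4_f32m1_tu(acc, sq, acc, vl);
    }
    return __riscv_vfmv_f_s_f32m1_f32(acc);
#else
    /* Scalar fallback: same summation order as the ordered reduction. */
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * x[i];
    return acc;
#endif
}
```

Calling `sum_of_squares(x, 64)` on a 64-element array gives one way to compare the generated vsetvli sequence across compilers.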
from rvv-intrinsic-doc.
After adding the initialization of src_vec with __riscv_vfmv_s_tu, three vsetvli instructions still remain:
#include <stddef.h>
#include <riscv_vector.h>

__attribute__((noinline)) static void foo_vec(float *r, const float *x)
{
    int i, k;
    vfloat32m4_t x_vec;
    vfloat32m4_t x_forward_vec;
    vfloat32m4_t temp_vec;
    /**
     * m1 is required here to comply with the reduction intrinsic
     */
    vfloat32m1_t dst_vec;
    vfloat32m1_t src_vec;
    float result = 0.0f;
    float shift_prev = 0.0f;
    size_t n = 64;
    for (size_t vl; n > 0; n -= vl) {
        vl = __riscv_vsetvl_e32m4(n); // LMUL=4
        x_vec = __riscv_vle32_v_f32m4(&x[0], vl);
        x_forward_vec = __riscv_vle32_v_f32m4(&x[0], vl);
        temp_vec = __riscv_vfmul_vv_f32m4(x_vec, x_forward_vec, vl);
        /**
         * m1 is required here to comply with the reduction intrinsic
         */
        // vfloat32m1_t __riscv_vfmv_s_tu(vfloat32m1_t vd, float rs1, size_t vl);
        src_vec = __riscv_vfmv_s_tu(src_vec, 0.0f, vl); // initialize src_vec
        // dst_vec = __riscv_vfmv_s_f_f32m1_tu(dst_vec, 0.0f, vl); // clear for vfredosum
        dst_vec = __riscv_vfmv_s_tu(dst_vec, 0.0f, vl); // clear for vfredosum
        dst_vec = __riscv_vfredosum_tu(dst_vec, temp_vec, src_vec, vl);
        r[0] = __riscv_vfmv_f_s_f32m1_f32(dst_vec);
    }
}
00000000800000f0 <foo_vec.constprop.0>:
    800000f0:  04000713  li           a4,64
    800000f4:  82018693  add          a3,gp,-2016   # 8000c020 <foo_x>
    800000f8:  0c0777d7  vsetvli      a5,a4,e8,m1,ta,ma
    800000fc:  0907f057  vsetvli      zero,a5,e32,m1,tu,ma
    80000100:  42006157  vmv.s.x      v2,zero
    80000104:  420060d7  vmv.s.x      v1,zero
    80000108:  0927f057  vsetvli      zero,a5,e32,m4,tu,ma
    8000010c:  0206e207  vle32.v      v4,(a3)
    80000110:  92421257  vfmul.vv     v4,v4,v4
    80000114:  0e4110d7  vfredosum.vs v1,v4,v2
    80000118:  421017d7  vfmv.f.s     fa5,v1
    8000011c:  40f6a027  fsw          fa5,1024(a3)
    80000120:  8f1d      sub          a4,a4,a5
    80000122:  fb79      bnez         a4,800000f8 <foo_vec.constprop.0+0x8>
    80000124:  8082      ret
No, we don't need such an intrinsic.
This is a compiler issue.
I have confirmed there is a regression in GCC:
https://godbolt.org/z/vocK8cee4
GCC 13 is able to generate optimal vsetvls, whereas GCC trunk doesn't.
I have filed a PR to recover the performance:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112776
Bug fixed by https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638850.html
Thanks very much!