Giter Site home page Giter Site logo

[Requirement]: The RISC-V RVV vector intrinsic must include support for vector groups in the __riscv_vfredosum function about rvv-intrinsic-doc HOT 4 CLOSED

Popebl avatar Popebl commented on July 19, 2024
[Requirement]: The RISC-V RVV vector intrinsic must include support for vector groups in the __riscv_vfredosum function

from rvv-intrinsic-doc.

Comments (4)

topperc avatar topperc commented on July 19, 2024

Is the initialization of src_vec missing from your C code. It seems to be initialized with a vmv.v.i in the assembly. It could be initialized with vmv.s.x since only element 0 is used.

The compiler's vsetvli insertion pass could be taught that vmv.s.x doesn't care about lmul and can share the same vsetvli as the vredosum. I believe this is being worked on in the LLVM compiler.

from rvv-intrinsic-doc.

Popebl avatar Popebl commented on July 19, 2024

After adding the initial src_vec using __riscv_vfmv_s_tu, there are still three vsetvli instructions remaining.

__attribute__((noinline)) static void foo_vec(float *r, const float *x)
{
int i, k;

vfloat32m4_t x_vec;
vfloat32m4_t x_forward_vec;
vfloat32m4_t temp_vec;

/**
 * I have to use m1 to complicat intrisic
 */
vfloat32m1_t dst_vec;
vfloat32m1_t src_vec;

float result = 0.0f;
float shift_prev = 0.0f;

size_t n = 64;
for(size_t vl; n>0; n -=vl){
    vl = __riscv_vsetvl_e32m4(n); //LMUL=4
    x_vec = __riscv_vle32_v_f32m4(&x[0], vl);
    x_forward_vec = __riscv_vle32_v_f32m4(&x[0], vl);
    temp_vec = __riscv_vfmul_vv_f32m4(x_vec, x_forward_vec, vl);

    /**
     * I have to use m1 to complicat intrisic
     */
    //vfloat32m1_t __riscv_vfmv_s_tu(vfloat32m1_t vd, float rs1, size_t vl);
    src_vec = __riscv_vfmv_s_tu(src_vec, 0.0f, vl);                 //initial src_vec
    //dst_vec = __riscv_vfmv_s_f_f32m1_tu(dst_vec, 0.0f, vl);         //clean for vfredosum
    dst_vec = __riscv_vfmv_s_tu(dst_vec, 0.0f, vl);         //clean for vfredosum
    dst_vec = __riscv_vfredosum_tu(dst_vec, temp_vec, src_vec, vl);

    r[0] = __riscv_vfmv_f_s_f32m1_f32(dst_vec);

   }
}
00000000800000f0 <foo_vec.constprop.0>:
    800000f0:   04000713                li      a4,64
    800000f4:   82018693                add     a3,gp,-2016 # 8000c020 <foo_x>
    800000f8:   0c0777d7                vsetvli a5,a4,e8,m1,ta,ma
    800000fc:   0907f057                vsetvli zero,a5,e32,m1,tu,ma
    80000100:   42006157                vmv.s.x v2,zero
    80000104:   420060d7                vmv.s.x v1,zero
    80000108:   0927f057                vsetvli zero,a5,e32,m4,tu,ma
    8000010c:   0206e207                vle32.v v4,(a3)
    80000110:   92421257                vfmul.vv        v4,v4,v4
    80000114:   0e4110d7                vfredosum.vs    v1,v4,v2
    80000118:   421017d7                vfmv.f.s        fa5,v1
    8000011c:   40f6a027                fsw     fa5,1024(a3)
    80000120:   8f1d                    sub     a4,a4,a5
    80000122:   fb79                    bnez    a4,800000f8 <foo_vec.constprop.0+0x8>
    80000124:   8082                    ret

from rvv-intrinsic-doc.

zhongjuzhe avatar zhongjuzhe commented on July 19, 2024

No, we don't need such intrinsic.
This is compiler issue.

I have confirm there is a regression in GCC:
https://godbolt.org/z/vocK8cee4

GCC-13 is able to generate optimal vsetvls, wheras GCC trunk doesn't.
I have file a PR for it to recover back the performance:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112776

from rvv-intrinsic-doc.

Popebl avatar Popebl commented on July 19, 2024

Bug fixed by https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638850.html
Thanks very much!

from rvv-intrinsic-doc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.