Giter Site home page Giter Site logo

Comments (11)

rdolbeau avatar rdolbeau commented on July 18, 2024

'native', pre-defined tuple simply needs to exist (for things like Zvlsseg, etc.) and have accessors so they can be (de)constructed; the ability to access/update with either struct-like or array-like syntax falls under the "good-to-have" from my point-of-view - I could live without that if it's too difficult to implement. If it's doable and you want an opinion between the options, mine is 'whichever is easier to implement' :-) (I'm guessing array).

Used-defined arrays with vector element would be good for some algorithms (e.g. storing locally a copy of data for some block-based algorithms like they have in video processing). Arm might consider that for SVE - I've put a request for it and they didn't say 'no' straight away ;-)

User-defined structure with vector element would be nice, but maybe too difficult to implement (alignment rules are going to be hell...) and not worth the effort. I don't think Arm will do that for SVE.

In your examples, when you write vint32m2x3_t, we agree that means 6 architected registers ? (m2 means LMUL=2 so 2, the x3 is a tuple of 3, so 6 in all), and that the data- layout will be SLEN-dependent because LMUL>1? Whereas vfloat64m1x3_t would be just 3 registers with a implementation-independent data layout?

from rvv-intrinsic-doc.

kito-cheng avatar kito-cheng commented on July 18, 2024

@rdolbeau thanks your feedback, and yes, I am seeking feedback between those options.

Honestly I didn't have idea about the implementation cost amount those options yet. So personally I am not rush to decide which approach other than first option, just collect more feedback at this stage.

In your examples, when you write vint32m2x3_t, we agree that means 6 architected registers ? (m2 means LMUL=2 so 2, the x3 is a tuple of 3, so 6 in all), and that the data- layout will be SLEN-dependent because LMUL>1?  Whereas vfloat64m1x3_t would be just 3 registers with a implementation-independent data layout?

The data layout of vector tuple type same as the vector type version but repeat NF times, e.g. vint32m2x3_t has same data layout as vint32m2_t but repeat 3 times, and with extra register allocation constraint that need consecutive registers.

Conceptually vint32m2x3_t is equivalent to vint32m2_t[3].

So yes, vint32m2x3_t is SLEN-dependent if LMUL > 1, and vfloat64m1x3_t is implementation-independent data layout.

from rvv-intrinsic-doc.

rofirrim avatar rofirrim commented on July 18, 2024

I think it might be slightly better to use structs instead of arrays.

C and C++ don't have values of array type and as such they can't be returned from a function call.

That would be the case of a load from Zvlsseg, in which a tuple of registers vfloat64m1x3_t could be returned from a call to a hypothetical vlseg_v_f64m1x3.

vfloat64m1x3_t vlseg3e_v_f64m1x3(const double *base);

Arguments don't need to have structs (we could always flatten them) but for consistency I'd expect the store look like this.

void vsseg3e_v_f64m1x3(const double *base, vfloat64m1x3_t);

In that sense that vfloat64m1x3_t would behave like a struct with fields (say) v0, v1, v2.

vfloat64m1x3_t vt;
vt.v0 = ...;
... = vt.v2

If we focus only on intrinsic functionality, it is unclear to me we need to allow users defining their own structs or array types with RVV vectors in them.

So my stance now would be like @kito-cheng 1 (a primitive type) above plus as much behaviour of 2 that makes sense for it.

In that sense I'd be inclined to do something like Arm's ACLE for SVE ( https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_00_en.pdf ).

Note that in general (page 13)

Members of unions, structures and classes cannot have sizeless type.

sizeless is Arm's term for we don't necessarily know the size of the object at compile time.

But then in page 14

Each type svBASExN_t is sizeless and contains a sequence of N svBASE_ts. The individual vectors are members with names v0, v1, and so on. For example, svfloat64x4_t contains four svfloat64_t vectors with the names v0, v1, v2 and v3.

Nothing seems to prevent making those types svBASExN_t primitive. Arm's implementation in their Arm Compiler for HPC exposes that detail in the headers via a __sizeless_struct syntax but this seems an implementation detail to me.

typedef __sizeless_struct { svfloat64_t v0, v1, v2; } svfloat64x3_t;

from rvv-intrinsic-doc.

kito-cheng avatar kito-cheng commented on July 18, 2024

C and C++ don't have values of array type and as such they can't be returned from a function call.
Good point, sounds like array is not an option.

Each type svBASExN_t is sizeless and contains a sequence of N svBASE_ts. The individual vectors are members with names v0, v1, and so on. For example, svfloat64x4_t contains four svfloat64_t vectors with the names v0, v1, v2 and v3.

This paragraph seems gone in later version, svBASExN_t can't access via v0..vn now.
https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_04_en.pdf


For some implementation detail for SVE GCC 10, they allow struct-style initialization, but other operation must call intrinsic function:

  svfloat64_t s64;
  svfloat64x3_t s64x3 = {s64, s64, s64}; // Creation.
  s64x3 = svcreate3(s64, s64, s64); // Creation.
  s64x3 = svset3(s64x3, 1, s64); // Insertion
  s64 = svget3(s64x3, 2); // Extraction

from rvv-intrinsic-doc.

rofirrim avatar rofirrim commented on July 18, 2024

This paragraph seems gone in later version, svBASExN_t can't access via v0..vn now.
https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_04_en.pdf

Thanks @kito-cheng I wasn't aware of the new version.

For some implementation detail for SVE GCC 10, they allow struct-style initialization, but other operation must call intrinsic function:

  svfloat64_t s64;
  svfloat64x3_t s64x3 = {s64, s64, s64}; // Creation.
  s64x3 = svcreate3(s64, s64, s64); // Creation.
  s64x3 = svset3(s64x3, 1, s64); // Insertion
  s64 = svget3(s64x3, 2); // Extraction

This is a reasonable alternative to using structs.

from rvv-intrinsic-doc.

haolongzhangm avatar haolongzhangm commented on July 18, 2024

@kito-cheng I test "Primitive style" way with vcreate_*
after objdump, we can find vcreate_ will lead to use more vector register and use extra “vmv” instruct , which will lead to poor performance

#include "riscv_vector.h"                                                                                                                                                                                     
#include <cstdio>                                                                                                                                                                                             
                                                                                                                                                                                                              
int main() {                                                                                                                                                                                                  
                                                                                                                                                                                                              
  float src[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};                                                                                                                                    
                                                                                                                                                                                                              
  vfloat32m1_t t0, t1, t2, t3, ret0, ret1;                                                                                                                                                                    
                                                                                                                                                                                                              
  t0 = vle32_v_f32m1(src, 4);                                                                                                                                                                                 
  t1 = vle32_v_f32m1(src + 4, 4);
  t2 = vle32_v_f32m1(src + 8, 4);
  t3 = vle32_v_f32m1(src + 12, 4);

  ret0 = vfadd_vv_f32m1(t0, t1, 4);
  ret1 = vfadd_vv_f32m1(t2, t3, 4);
   
  float dst [8] = {0};

  vse32_v_f32m1(dst, ret0, 4);
  vse32_v_f32m1(dst + 4, ret1, 4);

  for (size_t i = 0; i < 8; i++) {
      printf("%f ", dst[i]);
  }

  printf("\n");


  return 0;
}


   10508:       0087f7d7                vsetvli a5,a5,e32,m1,d1
   1050c:       181c                    addi    a5,sp,48
   1050e:       0207f207                vle.v   v4,(a5)
   10512:       1004                    addi    s1,sp,32
   10514:       009c                    addi    a5,sp,64
   10516:       0207f087                vle.v   v1,(a5)
   1051a:       0204f107                vle.v   v2,(s1)
   1051e:       089c                    addi    a5,sp,80
   10520:       0207f187                vle.v   v3,(a5)
   10524:       02221157                vfadd.vv        v2,v2,v4
   10528:       021190d7                vfadd.vv        v1,v1,v3
   1052c:       e802                    sd      zero,16(sp)
   1052e:       ec02                    sd      zero,24(sp)
   10530:       e002                    sd      zero,0(sp)
   10532:       e402                    sd      zero,8(sp)
   10534:       081c                    addi    a5,sp,16
   10536:       840a                    mv      s0,sp
   10538:       6941                    lui     s2,0x10
   1053a:       02017127                vse.v   v2,(sp)
   1053e:       0207f0a7                vse.v   v1,(a5)
   10542:       00042787                flw     fa5,0(s0) # ffffffffffffd000 <__global_pointer$+0xfffffffffffea800>
   10546:       67090513                addi    a0,s2,1648 # 10670 <__libc_csu_fini+0x4>
   1054a:       420787d3                fcvt.d.s        fa5,fa5
   1054e:       0411                    addi    s0,s0,4
   10550:       e20785d3                fmv.x.d a1,fa5
   10554:       f6dff0ef                jal     ra,104c0 <printf@plt>
   10558:       fe8495e3                bne     s1,s0,10542 <main+0x72>
   1055c:       4529                    li      a0,10
   1055e:       f53ff0ef                jal     ra,104b0 <putchar@plt>
   10562:       70e6                    ld      ra,120(sp)
   10564:       7446                    ld      s0,112(sp)
   10566:       74a6                    ld      s1,104(sp)
   10568:       7906                    ld      s2,96(sp)
   1056a:       4501                    li      a0,0
   1056c:       6109                    addi    sp



use vcreate_
int main() {

  float src[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};

  //vfloat32m1_t t0, t1, t2, t3, ret0, ret1;

  vfloat32m1x4_t src4;// = vcreate_f32m1x4(t0, t1, t2, t3);
  vfloat32m1x2_t dst2;// = vcreate_f32m1x2(ret0, ret1);

  src4 = vset_f32m1x4(src4, 0, vle32_v_f32m1(src, 4));
  src4 = vset_f32m1x4(src4, 1, vle32_v_f32m1(src + 4, 4));
  src4 = vset_f32m1x4(src4, 2, vle32_v_f32m1(src + 8, 4));
  src4 = vset_f32m1x4(src4, 3, vle32_v_f32m1(src + 12, 4));

  dst2 = vset_f32m1x2(dst2, 0, vfadd_vv_f32m1(vget_f32m1x4_f32m1(src4, 0), vget_f32m1x4_f32m1(src4, 1), 4));
  dst2 = vset_f32m1x2(dst2, 1, vfadd_vv_f32m1(vget_f32m1x4_f32m1(src4, 2), vget_f32m1x4_f32m1(src4, 3), 4));
  
  float dst [8] = {0};

  vse32_v_f32m1(dst, vget_f32m1x2_f32m1(dst2, 0), 4);
  vse32_v_f32m1(dst + 4, vget_f32m1x2_f32m1(dst2, 1), 4);

  for (size_t i = 0; i < 8; i++) {
      printf("%f ", dst[i]);
  }

  printf("\n");


  return 0;
}

10508:       0087f7d7                vsetvli a5,a5,e32,m1,d1                                                                                                                                               
   1050c:       1004                    addi    s1,sp,32                                                                                                                                                      
   1050e:       0204f087                vle.v   v1,(s1)                                                                                                                                                       
   10512:       c2002f73                csrr    t5,vl                                                                                                                                                         
   10516:       c2102ff3                csrr    t6,vtype                                                                                                                                                      
   1051a:       00807057                vsetvli zero,zero,e32,m1,d1                                                                                                                                           
   1051e:       5e008257                vmv.v.v v4,v1                                                                                                                                                         
   10522:       81ff7057                vsetvl  zero,t5,t6                                                                                                                                                    
   10526:       32002057                vmv.x.s zero,v0                                                                                                                                                       
   1052a:       181c                    addi    a5,sp,48                                                                                                                                                      
   1052c:       0207f087                vle.v   v1,(a5)                                                                                                                                                       
   10530:       c2002f73                csrr    t5,vl                                                                                                                                                         
   10534:       c2102ff3                csrr    t6,vtype                                                                                                                                                      
   10538:       00807057                vsetvli zero,zero,e32,m1,d1                                                                                                                                           
   1053c:       5e0082d7                vmv.v.v v5,v1                                                                                                                                                         
   10540:       81ff7057                vsetvl  zero,t5,t6                                                                                                                                                    
   10544:       32002057                vmv.x.s zero,v0                                                                                                                                                       
   10548:       009c                    addi    a5,sp,64                                                                                                                                                      
   1054a:       0207f087                vle.v   v1,(a5)                                                                                                                                                       
   1054e:       c2002f73                csrr    t5,vl                                                                                                                                                         
   10552:       c2102ff3                csrr    t6,vtype                                                                                                                                                      
   10556:       00807057                vsetvli zero,zero,e32,m1,d1                                                                                                                                           
   1055a:       5e008357                vmv.v.v v6,v1                                                                                                                                                         
   1055e:       81ff7057                vsetvl  zero,t5,t6                                                                                                                                                    
   10562:       32002057                vmv.x.s zero,v0                                                                                                                                                       
   10566:       089c                    addi    a5,sp,80
   10568:       0207f087                vle.v   v1,(a5)
   1056c:       e802                    sd      zero,16(sp)
   1056e:       c2002f73                csrr    t5,vl
   10572:       c2102ff3                csrr    t6,vtype
   10576:       00807057                vsetvli zero,zero,e32,m1,d1
   1057a:       5e0083d7                vmv.v.v v7,v1
   1057e:       81ff7057                vsetvl  zero,t5,t6
   10582:       32002057                vmv.x.s zero,v0
   10586:       ec02                    sd      zero,24(sp)
   10588:       c2002f73                csrr    t5,vl
   1058c:       c2102ff3                csrr    t6,vtype
   10590:       00807057                vsetvli zero,zero,e32,m1,d1
   10594:       5e020457                vmv.v.v v8,v4
   10598:       81ff7057                vsetvl  zero,t5,t6
   1059c:       32002057                vmv.x.s zero,v0
   105a0:       c2002f73                csrr    t5,vl
   105a4:       c2102ff3                csrr    t6,vtype
   105a8:       00807057                vsetvli zero,zero,e32,m1,d1
   105ac:       5e0284d7                vmv.v.v v9,v5
   105b0:       81ff7057                vsetvl  zero,t5,t6
   105b4:       32002057                vmv.x.s zero,v0
   105b8:       c2002f73                csrr    t5,vl
 105bc:       c2102ff3                csrr    t6,vtype                                                                                                                                                      
   105c0:       00807057                vsetvli zero,zero,e32,m1,d1                                                                                                                                           
   105c4:       5e0300d7                vmv.v.v v1,v6
   105c8:       81ff7057                vsetvl  zero,t5,t6
   105cc:       32002057                vmv.x.s zero,v0
   105d0:       c2002f73                csrr    t5,vl
   105d4:       c2102ff3                csrr    t6,vtype
   105d8:       00807057                vsetvli zero,zero,e32,m1,d1
   105dc:       5e0382d7                vmv.v.v v5,v7
   105e0:       81ff7057                vsetvl  zero,t5,t6
   105e4:       32002057                vmv.x.s zero,v0
   105e8:       081c                    addi    a5,sp,16
   105ea:       840a                    mv      s0,sp
   105ec:       6941                    lui     s2,0x10
   105ee:       02849257                vfadd.vv        v4,v8,v9
   105f2:       021290d7                vfadd.vv        v1,v1,v5
   105f6:       e002                    sd      zero,0(sp)
   105f8:       e402                    sd      zero,8(sp)
   105fa:       c2002f73                csrr    t5,vl
   105fe:       c2102ff3                csrr    t6,vtype
   10602:       00807057                vsetvli zero,zero,e32,m1,d1
   10606:       5e020157                vmv.v.v v2,v4
   1060a:       81ff7057                vsetvl  zero,t5,t6
   1060e:       32002057                vmv.x.s zero,v0
   10612:       c2002f73                csrr    t5,vl
   10616:       c2102ff3                csrr    t6,vtype
   1061a:       00807057                vsetvli zero,zero,e32,m1,d1
   1061e:       5e0081d7                vmv.v.v v3,v1
   10622:       81ff7057                vsetvl  zero,t5,t6
   10626:       32002057                vmv.x.s zero,v0
   1062a:       c2002f73                csrr    t5,vl
   1062e:       c2102ff3                csrr    t6,vtype
   10632:       00807057                vsetvli zero,zero,e32,m1,d1
   10636:       5e010257                vmv.v.v v4,v2
   1063a:       81ff7057                vsetvl  zero,t5,t6
   1063e:       32002057                vmv.x.s zero,v0
   10642:       c2002f73                csrr    t5,vl
   10646:       c2102ff3                csrr    t6,vtype
   1064a:       00807057                vsetvli zero,zero,e32,m1,d1
   1064e:       5e0180d7                vmv.v.v v1,v3
   10652:       81ff7057                vsetvl  zero,t5,t6
   10656:       32002057                vmv.x.s zero,v0
   1065a:       02017227                vse.v   v4,(sp)
   1065e:       0207f0a7                vse.v   v1,(a5)
   10662:       00042787                flw     fa5,0(s0) # ffffffffffffd000 <__global_pointer$+0xfffffffffffea800>
   10666:       79090513                addi    a0,s2,1936 # 10790 <__libc_csu_fini+0x4>
   1066a:       420787d3                fcvt.d.s        fa5,fa5
   1066e:       0411                    addi    s0,s0,4
   10670:       e20785d3                fmv.x.d a1,fa5
   10674:       e4dff0ef                jal     ra,104c0 <printf@plt>

from rvv-intrinsic-doc.

haolongzhangm avatar haolongzhangm commented on July 18, 2024

sometimes need must use array vector type for easy coding

so now ,what is the best solution for declaration rvv array like
vfloat32m1_t src4[4]
what`s more
vfloat32m1_t src4[4][2]

at the same, do not import use more instruct

I build args:
riscv64-unknown-linux-gnu-g++ -march=rv64gcv0p7 -mabi=lp64d

from rvv-intrinsic-doc.

haolongzhangm avatar haolongzhangm commented on July 18, 2024

i test -march=rv64gcv with do not have issue, but use -march=rv64gcv0p7 have "more move instruct" issue

BUT, there are so many board only support v0p7 now, so any possible fix this issue on v0p7

from rvv-intrinsic-doc.

eopXD avatar eopXD commented on July 18, 2024

Closing this issue and redirecting to #139. The question essentially boils down to how do we enable sizeless struct in the compiler implementation.

from rvv-intrinsic-doc.

kito-cheng avatar kito-cheng commented on July 18, 2024

i test -march=rv64gcv with do not have issue, but use -march=rv64gcv0p7 have "more move instruct" issue

BUT, there are so many board only support v0p7 now, so any possible fix this issue on v0p7

That sounds like a T-head toolchain issue rather than intrinsic interface issue, I would suggest you could report that to T-head directly.

from rvv-intrinsic-doc.

haolongzhangm avatar haolongzhangm commented on July 18, 2024

riscv-collab/riscv-gnu-toolchain#1106

from rvv-intrinsic-doc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.