

topperc commented on August 17, 2024

There's a typo on this line:

vuint32m4_t b8s_extended = __riscv_vzext_vf4_u32m4(a8s, vl);

The argument should be b8s, not a8s.

Compiling with -Wall does give a warning for b8s being unused.

from rvv-intrinsic-doc.
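The effect of that typo is easy to see against a plain scalar loop. A minimal sketch in portable C, where dot_u8 and dot_u8_typo are hypothetical helpers rather than anything from the thread: with a8s passed to both zero-extends, the vector loop effectively computes the sum of a[i] * a[i] and never uses the loaded b values, which is why -Wall reports b8s as unused.

```c
#include <stddef.h>

/* Hypothetical scalar reference: sum of a[i] * b[i] over len bytes. */
int dot_u8(const unsigned char *a, const unsigned char *b, int len) {
  int sum = 0;
  for (int i = 0; i < len; ++i)
    sum += (int)a[i] * (int)b[i];
  return sum;
}

/* Hypothetical model of the typo: both operands are widened from a,
   so b never contributes to the result. */
int dot_u8_typo(const unsigned char *a, const unsigned char *b, int len) {
  (void)b; /* mirrors the unused b8s that -Wall warns about */
  int sum = 0;
  for (int i = 0; i < len; ++i)
    sum += (int)a[i] * (int)a[i];
  return sum;
}
```

For a = {1, 2, 3} and b = {4, 5, 6}, the reference gives 32 while the typo version gives 14.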

tomhepworth commented on August 17, 2024

I hang my head in shame.

Thanks so much!


topperc commented on August 17, 2024

Which compiler are you using? What execution environment are you using?

The __riscv_vmacc_vv_i32m4 in the loop needs to be the tail-undisturbed version, __riscv_vmacc_vv_i32m4_tu, so that on the last iteration, where vl can be smaller than on earlier iterations, the elements above vl keep the partial sums accumulated in previous iterations.
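A scalar model of the strip-mined loop makes the tail-policy point concrete. This is a simplified simulation, not RVV code: MODEL_VLMAX = 8, the helper model_mac_lanes, and the choice to model a tail-agnostic write as filling the tail with all 1s are assumptions. The vector spec lets a tail-agnostic instruction either leave tail elements unchanged or overwrite them with all 1s; the model picks the second option so the difference shows up.

```c
#include <stdbool.h>

#define MODEL_VLMAX 8 /* assumed lane count for e32m4; real hardware varies */

/* Scalar model of the vmacc accumulation loop over len products.
   Each pass uses vl = min(k, MODEL_VLMAX) (assumed vsetvl behavior).
   tail_undisturbed = true models __riscv_vmacc_vv_i32m4_tu: lanes >= vl
   keep their old values. false models a tail-agnostic result that fills
   lanes >= vl with all 1s (-1 as a 32-bit int). */
void model_mac_lanes(const int *x, const int *y, int len,
                     bool tail_undisturbed, int acc[MODEL_VLMAX]) {
  for (int i = 0; i < MODEL_VLMAX; ++i)
    acc[i] = 0; /* like __riscv_vmv_v_x_i32m4(0, vlmax) */
  for (int k = len; k > 0;) {
    int vl = k < MODEL_VLMAX ? k : MODEL_VLMAX;
    for (int i = 0; i < vl; ++i)
      acc[i] += x[i] * y[i]; /* active (body) elements */
    if (!tail_undisturbed)
      for (int i = vl; i < MODEL_VLMAX; ++i)
        acc[i] = -1; /* agnostic tail overwritten with all 1s */
    x += vl;
    y += vl;
    k -= vl;
  }
}
```

With len = 10 and all-ones inputs, the tail-undisturbed run ends with lanes {2, 2, 1, 1, 1, 1, 1, 1}, which sum to 10. The tail-agnostic run ends with {2, 2, -1, -1, -1, -1, -1, -1}: the partial sums from the first iteration in lanes 2 through 7 are lost.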

The __riscv_vredsum_vs_i32m4_i32m1 after the loop should use __riscv_vsetvl_e32m4(len) as its vl to handle the case where len is less than vlmax for e32m4.

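The reduction-length advice can be illustrated with a similar scalar model. Suppose len = 3 on a machine where VLMAX for e32m4 is 8 (an assumed value), and lanes 3 through 7 of the accumulator hold all-1s values left by a tail-agnostic instruction. Reducing over vsetvl(len) = 3 lanes gives 4 + 10 + 18 = 32, while reducing over all 8 lanes also adds five stray -1 values and gives 27. model_vsetvl and model_redsum are hypothetical helpers, not intrinsics.

```c
#define MODEL_VLMAX 8 /* assumed lane count for e32m4; real hardware varies */

/* Assumed vsetvl behavior: vl = min(avl, VLMAX). The spec guarantees
   vl = avl whenever avl <= VLMAX, which is the case illustrated here. */
int model_vsetvl(int avl) { return avl < MODEL_VLMAX ? avl : MODEL_VLMAX; }

/* Scalar model of vredsum.vs: init plus the first vl lanes of acc. */
int model_redsum(const int *acc, int vl, int init) {
  int sum = init;
  for (int i = 0; i < vl; ++i)
    sum += acc[i];
  return sum;
}
```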

tomhepworth commented on August 17, 2024

Thanks for the fast response! :) I am compiling with LLVM Clang 17 and executing with spike --isa rv64gcv pk ...

I'm not sure I follow why it needs to be tail-undisturbed. This is the first time I've used vector intrinsics like this, so apologies if I am missing something obvious.

The updated function below still has the same issue:

int byte_mac_vec(unsigned char *a, unsigned char *b, int len) {
  size_t vlmax = __riscv_vsetvlmax_e8m1();
  vint32m4_t vec_s = __riscv_vmv_v_x_i32m4(0, vlmax);
  vint32m1_t vec_zero = __riscv_vmv_v_x_i32m1(0, vlmax);
  int k = len;
  for (size_t vl; k > 0; k -= vl, a += vl, b += vl) {
    vl = __riscv_vsetvl_e8m1(k);
   
    vuint8m1_t a8s = __riscv_vle8_v_u8m1(a, vl);
    vuint8m1_t b8s = __riscv_vle8_v_u8m1(b, vl);
    vuint32m4_t a8s_extended = __riscv_vzext_vf4_u32m4(a8s, vl);
    vuint32m4_t b8s_extended = __riscv_vzext_vf4_u32m4(a8s, vl);
    
    vint32m4_t a8s_as_i32 = __riscv_vreinterpret_v_u32m4_i32m4(a8s_extended);
    vint32m4_t b8s_as_i32 = __riscv_vreinterpret_v_u32m4_i32m4(b8s_extended);

    vec_s = __riscv_vmacc_vv_i32m4_tu(vec_s, a8s_as_i32, b8s_as_i32, vl);
  }
  
  vint32m1_t vec_sum = __riscv_vredsum_vs_i32m4_i32m1(vec_s, vec_zero, __riscv_vsetvl_e32m4(len));
  int sum = __riscv_vmv_x_s_i32m1_i32(vec_sum);

  return sum;
}

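Applying the b8s fix to the updated function, while keeping the _tu accumulate and the __riscv_vsetvl_e32m4(len) reduction length, gives something along the lines of the sketch below. It has not been run on Spike here. A few details are additions, not from the thread: const pointer parameters, an early return for len <= 0, __riscv_vmv_s_x_i32m1 to build the scalar operand of the reduction (only its element 0 is read), and a guard on the __riscv_vector macro (assumed to be defined by GCC and Clang when the V extension is enabled) with a plain scalar fallback so the file also builds on other hosts.

```c
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Sum of a[i] * b[i] over two unsigned byte arrays.
   The 32-bit accumulator can overflow for very long inputs. */
int byte_mac_vec(const unsigned char *a, const unsigned char *b, int len) {
  if (len <= 0)
    return 0;
#if defined(__riscv_vector)
  /* e8m1 and e32m4 share the same SEW/LMUL ratio, hence the same VLMAX. */
  size_t vlmax = __riscv_vsetvlmax_e32m4();
  vint32m4_t vec_s = __riscv_vmv_v_x_i32m4(0, vlmax);
  vint32m1_t vec_zero = __riscv_vmv_s_x_i32m1(0, 1); /* element 0 = 0 */
  for (size_t k = (size_t)len, vl; k > 0; k -= vl, a += vl, b += vl) {
    vl = __riscv_vsetvl_e8m1(k);
    vuint8m1_t a8s = __riscv_vle8_v_u8m1(a, vl);
    vuint8m1_t b8s = __riscv_vle8_v_u8m1(b, vl);
    vuint32m4_t a32 = __riscv_vzext_vf4_u32m4(a8s, vl);
    vuint32m4_t b32 = __riscv_vzext_vf4_u32m4(b8s, vl); /* b8s, not a8s */
    /* _tu keeps lanes >= vl intact, preserving earlier partial sums. */
    vec_s = __riscv_vmacc_vv_i32m4_tu(vec_s,
                                      __riscv_vreinterpret_v_u32m4_i32m4(a32),
                                      __riscv_vreinterpret_v_u32m4_i32m4(b32),
                                      vl);
  }
  /* Reduce only over the lanes the loop can have written. */
  vint32m1_t vec_sum = __riscv_vredsum_vs_i32m4_i32m1(
      vec_s, vec_zero, __riscv_vsetvl_e32m4((size_t)len));
  return __riscv_vmv_x_s_i32m1_i32(vec_sum);
#else
  /* Scalar fallback for targets without the V extension. */
  int sum = 0;
  for (int i = 0; i < len; ++i)
    sum += a[i] * b[i];
  return sum;
#endif
}
```

On a host without the V extension only the scalar branch is compiled, so that path checks the expected results, not the intrinsic code itself.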
