nimsimd's Issues

little typo in intrinsics import signature

Hi guzba, thanks for the work.

There is probably a typo in ./src/nimsimd/sse2.nim, line 269.
I believe the SSE2 import that casts a single-precision vector to an integer vector:
func mm_castps_si128*(a: M128d): M128i {.importc: "_mm_castps_si128".}

should instead receive a single-precision vector, like so:
func mm_castps_si128*(a: M128): M128i {.importc: "_mm_castps_si128".}

The cast from double-precision to M128i, mm_castpd_si128, is correct, but mm_castps_si128() currently has exactly the same signature.
The latest Intel Intrinsics Guide (Dec. 2023) confirms this. I fixed it in my local nimble directory and it works.

greets Andreas

Can't compile due to missing compiler flag

Compilation fails with the error shown in the screenshot below:
Screenshot_8

It can be fixed by passing the --passC:-msse2 flag to the compiler (credit to leorize for finding that out).
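
To avoid typing the flag on every build, it can also live in the project's configuration. A minimal sketch, assuming a config.nims file next to the main module and a gcc or clang backend (the file placement is an assumption; only the flag itself comes from this issue):

```nim
# config.nims (NimScript), evaluated by the Nim compiler before the build.
# Equivalent to passing --passC:-msse2 on the command line.
# Assumption: the C backend is gcc or clang; MSVC does not accept this flag.
switch("passC", "-msse2")
```

The same effect can be had per invocation with nim c --passC:-msse2 yourfile.nim.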

Future of Nim-simd

Hello Ryan,

First of all, a BIG THX for this lib. After some video lectures I started with SIMD stuff and found nimsimd a perfect start. AVX512 is missing, but state-of-the-art AVX2 surely has the broadest audience; even on my side I'm not AVX512-ready yet :)
But looking at what others achieve with SIMD (Prof. Lemire & friends), one might get the impression that AVX512 is 'just around the corner' :)
Naturally I started writing wrappers around the horrible intrinsics, trying to improve my own and the user experience. I will do some examples, e.g. a sorting network, string conversion, base64 en-/decoding, string lookup etc.
Now I came across Agner Fog's C++ Vector Class Library (VCL). From its description I learned that it does sophisticated macro/template work so that one can use e.g. AVX512 even though the actual hardware does not support it. That made me think: shall I try a Nim wrapper for VCL and make that lib more accessible through Nim, or continue to work with nim-simd? Or would it make more sense to do exactly the same with Nim macro magic, i.e. a VCL rewrite (which would be very much above my pay grade)?
So what are your plans with nim-simd, and can you maybe find the time to take a look at VCL?
I could use some advice on how to make something that encourages more Nim users to enter the SIMD journey and make this world a more efficient place 8=)).
Fun aside, I believe there is much to gain for Nim in terms of attractiveness for new users if they can try out a technology like SIMD, since Python and even JavaScript have started adopting it. Maybe even the std-lib could gain here and there?

beats & greats, Andreas

As a side note: beef already provided a concept & macro to check array alignment, and he expressed some interest in SIMD-accelerated string lookups...
And I have some good material (papers/videos/GitHub code etc.) in my Zotero, which might fit well in the nim-simd wiki?

bug: mm256_castsi256_ps returns M128 instead of M256

diff --git a/src/nimsimd/avx.nim b/src/nimsimd/avx.nim
index ed9dc1b..53796e8 100644
--- a/src/nimsimd/avx.nim
+++ b/src/nimsimd/avx.nim
@@ -63,7 +63,7 @@ func mm256_castsi128_si256*(a: M128i): M256i {.importc: "_mm256_castsi128_si256"
 
 func mm256_castsi256_pd*(a: M256i): M128d {.importc: "_mm256_castsi256_pd".}
 
-func mm256_castsi256_ps*(a: M256i): M128 {.importc: "_mm256_castsi256_ps".}
+func mm256_castsi256_ps*(a: M256i): M256 {.importc: "_mm256_castsi256_ps".}
 
 func mm256_castsi256_si128*(a: M256i): M128i {.importc: "_mm256_castsi256_si128".}

Two more glitches in avx.nim

Hello guzba,

In avx.nim, lines 226-230:

226 func mm_permutevar_pd*(a: M128d, b: M128i): M256d {.importc: "_mm_permutevar_pd".}
...
230 func mm_permutevar_pd*(a: M128d, b: M128i): M256d {.importc: "_mm_permutevar_pd".}

AFAIK these should return the same vector types that they receive, i.e. 128-bit double-/single-precision vectors:

func mm_permutevar_pd*(a: M128d, b: M128i): M128d {.importc: "_mm_permutevar_pd".}
...
func mm_permutevar_ps*(a: M128, b: M128i): M128 {.importc: "_mm_permutevar_ps".}

and if you don't mind - sure you will :) - one could fix the terrible Intel-naming just a bit by adding the missing ::

# Reinterpret the integer lanes as floats, permute, then reinterpret back.
func mm_permutevar_epi32*(a, mask: M128i): M128i =
  mm_castps_si128(
    mm_permutevar_ps(mm_castsi128_ps(a), mask)
  )
func mm_permutevar_epi64*(a, mask: M128i): M128i =
  mm_castpd_si128(
    mm_permutevar_pd(mm_castsi128_pd(a), mask)
  )

Everybody has to add them anyway, after getting trapped once... The same could/should be done for the 256-bit vectors.
And don't get me wrong here - I just suggest adding what Intel has left out but everybody expects to find, while staying with the Intel wording. Strictly speaking, permutevar_<type> is a permute operation only in the loose sense that many operations permute a vector; in this case it is really a shuffle operation.
Maybe one could add a common_avx.nim that adds those missing functions to make the intrinsics a bit more consistent?

just my 20ct, greets Andreas

we are gettin' closer to nimsimd v2 :)

Aligned alloc _mm_malloc/free() would be nice to have

Hi again,

_mm_malloc() and _mm_free() should be in sse.nim, which is missing.
I'd say this would be a desirable feature, as it would make aligned allocations explicit. And easier and cleaner, too.

What do you think ?

regards, Andreas

--
I found the include for prefetch in sse2.nim and added the desired functions there for testing:

func mm_malloc*(size: int, align: int): pointer {.importc: "_mm_malloc".}
func mm_free*(pt: pointer) {.importc: "_mm_free".}
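
For illustration, this is roughly how such bindings could be used on their own. It is only a sketch: the standalone declarations below are hypothetical (they are not nimsimd's API), they use csize_t rather than int to match the C size_t parameters, and they assume gcc or clang on an x86/x86-64 target, where <xmmintrin.h> provides _mm_malloc and _mm_free.

```nim
# Hypothetical standalone bindings, declared here so the example is self-contained.
proc mm_malloc(size, align: csize_t): pointer {.importc: "_mm_malloc", header: "<xmmintrin.h>".}
proc mm_free(p: pointer) {.importc: "_mm_free", header: "<xmmintrin.h>".}

const
  count = 64      # number of float32 elements to allocate
  alignment = 32  # bytes; matches the width of a 256-bit AVX register

let buf = mm_malloc(csize_t(count * sizeof(float32)), csize_t(alignment))
doAssert buf != nil
# The returned address is a multiple of the requested alignment.
doAssert cast[uint](buf) mod uint(alignment) == 0
# Memory from _mm_malloc must be released with _mm_free, not with free/dealloc.
mm_free(buf)
```

Making the alignment an explicit argument is the point of the request: the caller can see at the allocation site that the buffer is safe for aligned loads and stores.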

M1 Mac runtimecheck Errors

With the following code:

import nimsimd/runtimecheck

let
  cpuHasAvx* = checkInstructionSets({AVX})
  cpuHasAvx2* = checkInstructionSets({AVX, AVX2})

This fails to compile:

/Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected]:130:10: error: invalid output constraint '=a' in asm
        :"=a"(eaxr), "=b"(ebxr), "=c"(ecxr), "=d"(edxr)
         ^
1 error generated.
Error: execution of an external compiler program 'clang -c  -w -ferror-limit=3 -pthread   -I/Users/miguelmartin/.choosenim/toolchains/nim-2.0.0/lib -I/Users/miguelmartin/repos/nim_playground/nimsimd_test/src -o /Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected] /Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected]' failed with exit code: 1

System Details:

  • OS: MacOS 13.5
  • CPU: M1 Max
  • RAM: 64GB

Compiler version: 2.0
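
The '=a' constraint in the error names the x86 EAX register, which suggests that the module's CPUID inline assembly is being compiled for the arm64 target. A possible caller-side workaround, sketched under the assumption that reporting "no AVX" on non-x86 targets is acceptable for the application:

```nim
when defined(amd64) or defined(i386):
  # x86 targets: the CPUID-based runtime check can be compiled and used.
  import nimsimd/runtimecheck

  let
    cpuHasAvx* = checkInstructionSets({AVX})
    cpuHasAvx2* = checkInstructionSets({AVX, AVX2})
else:
  # Non-x86 targets (e.g. Apple M1, arm64): AVX does not exist here,
  # so skip the import entirely and report false.
  let
    cpuHasAvx* = false
    cpuHasAvx2* = false
```

This keeps the rest of the program unchanged: code that branches on cpuHasAvx2 simply takes its scalar fallback path on the M1.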

mm256_permute4x64_pd missing from avx2.nim

Hi guzba,

I may have found another glitch: mm256_permute4x64_pd for double-precision should be in avx2.nim. mm256_permute4x64_epi64 plus a cast will do exactly the same, though, so maybe you left it out intentionally?

regards, andreas
