guzba / nimsimd
Pleasant Nim bindings for SIMD instruction sets.
License: MIT License
With the following code:

import nimsimd/runtimecheck

let
  cpuHasAvx* = checkInstructionSets({AVX})
  cpuHasAvx2* = checkInstructionSets({AVX, AVX2})

this fails to compile:
/Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected]:130:10: error: invalid output constraint '=a' in asm
:"=a"(eaxr), "=b"(ebxr), "=c"(ecxr), "=d"(edxr)
^
1 error generated.
Error: execution of an external compiler program 'clang -c -w -ferror-limit=3 -pthread -I/Users/miguelmartin/.choosenim/toolchains/nim-2.0.0/lib -I/Users/miguelmartin/repos/nim_playground/nimsimd_test/src -o /Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected] /Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected]' failed with exit code: 1
System Details:
Compiler version: 2.0
Hello Ryan,
first of all, a big thank-you for this library. After some video lectures I started with SIMD and found nimsimd a perfect starting point. AVX-512 is missing, but state-of-the-art AVX2 surely has the broadest audience; even on my side I'm not AVX-512-ready yet :)
But looking at what others achieve with SIMD (looking at Prof. Lemire and friends), one might get the impression that AVX-512 is just around the corner :)
Naturally I started writing wrappers around the horrible intrinsics, trying to improve my own and the user experience. I will do some examples, e.g. a sorting network, string conversion, base64 encoding/decoding, string lookup, etc.
Now I came across Agner Fog's C++ Vector Class Library (VCL). From its description I learned that it does sophisticated macro/template work so that one can use, e.g., AVX-512 even though the actual hardware does not support it. That made me think: shall I try a Nim wrapper for VCL and make that library more accessible through Nim, or continue to work with nimsimd? Or would it make more sense to do the same with Nim macro magic, i.e. a VCL rewrite (which would be well above my pay grade)?
So what are your plans for nimsimd, and maybe you can find the time to take a look at VCL?
I could use some advice on how to make something that encourages more Nim users to start the SIMD journey and make this world a more efficient place 8=)).
Fun aside, I believe there is much for Nim to gain in attractiveness for new users by being able to try out a technology like SIMD, since Python and even JavaScript have started adopting it. And maybe even the standard library could gain here and there?
As a side note: beef already provided a concept and macro to check array alignment, and he expressed some interest in SIMD-accelerated string lookups...
And I have some good material (papers/videos/GitHub code/etc.) in my Zotero, which might fit well in the nimsimd wiki?
Hi guzba,
maybe I found another glitch: mm256_permute4x64_pd for double precision should be in avx2.nim. Though mm256_permute4x64_epi64 plus a cast will do exactly the same, so maybe you left it out intentionally?
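For reference, a hedged sketch of that cast workaround, assuming the usual nimsimd cast intrinsics from avx.nim and mm256_permute4x64_epi64 from avx2.nim; the helper name is hypothetical, not part of nimsimd:

```nim
import nimsimd/avx, nimsimd/avx2

# Hypothetical helper: emulate _mm256_permute4x64_pd by reinterpreting
# the doubles as 64-bit integer lanes, permuting those, and
# reinterpreting back. The casts touch no bits, so the result is the
# same as the native double-precision permute.
func mm256_permute4x64_pd_fallback(a: M256d, imm8: int32): M256d =
  mm256_castsi256_pd(mm256_permute4x64_epi64(mm256_castpd_si256(a), imm8))
```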
regards, andreas
diff --git a/src/nimsimd/avx.nim b/src/nimsimd/avx.nim
index ed9dc1b..53796e8 100644
--- a/src/nimsimd/avx.nim
+++ b/src/nimsimd/avx.nim
@@ -63,7 +63,7 @@ func mm256_castsi128_si256*(a: M128i): M256i {.importc: "_mm256_castsi128_si256"
func mm256_castsi256_pd*(a: M256i): M128d {.importc: "_mm256_castsi256_pd".}
-func mm256_castsi256_ps*(a: M256i): M128 {.importc: "_mm256_castsi256_ps".}
+func mm256_castsi256_ps*(a: M256i): M256 {.importc: "_mm256_castsi256_ps".}
func mm256_castsi256_si128*(a: M256i): M128i {.importc: "_mm256_castsi256_si128".}
Hi guzba, thx for the work,
probable typo in ./src/nimsimd/sse2.nim, line 269.
I believe the SSE2 import to cast single precision to int:

func mm_castps_si128*(a: M128d): M128i {.importc: "_mm_castps_si128".}

should instead receive a single-precision vector, like so:

func mm_castps_si128*(a: M128): M128i {.importc: "_mm_castps_si128".}

The cast from double precision, mm_castpd_si128, to M128i is correct and currently has the exact same signature as mm_castps_si128().
The latest Intel Intrinsics Guide (Dec. 2023) confirms this. I fixed it in the local nimble dir and it works.
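A minimal sketch of why the parameter type matters, assuming mm_set1_ps is also available from the nimsimd bindings: the cast reinterprets bits, it does not convert values.

```nim
import nimsimd/sse2

# With the corrected M128 signature, a single-precision vector can be
# reinterpreted as integer lanes without any value conversion:
let ones = mm_set1_ps(1.0'f32)    # four float32 lanes, each 1.0
let bits = mm_castps_si128(ones)  # each 32-bit lane holds the float's bits
# With the M128d parameter, this call would be a type error in Nim.
```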
greets Andreas
Hello guzba,
in avx.nim, lines 226-230:

226 func mm_permutevar_pd*(a: M128d, b: M128i): M256d {.importc: "_mm_permutevar_pd".}
...
230 func mm_permutevar_ps*(a: M128, b: M128i): M256 {.importc: "_mm_permutevar_ps".}

AFAIK these should return the same vector types that they received, i.e. double/single-precision vectors of 128 bits:

func mm_permutevar_pd*(a: M128d, b: M128i): M128d {.importc: "_mm_permutevar_pd".}
...
func mm_permutevar_ps*(a: M128, b: M128i): M128 {.importc: "_mm_permutevar_ps".}
And if you don't mind (sure you will :)), one could fix the terrible Intel naming just a bit by adding the missing:

func mm_permutevar_epi32*(a, mask: M128i): M128i =
  mm_castps_si128(mm_permutevar_ps(mm_castsi128_ps(a), mask))

func mm_permutevar_epi64*(a, mask: M128i): M128i =
  mm_castpd_si128(mm_permutevar_pd(mm_castsi128_pd(a), mask))

since everybody has to add them anyway, after getting trapped once... The same could/should be done for the 256-bit-sized vectors.
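If wrappers like these were added, usage could look like the following hedged sketch (mm_permutevar_epi32 is the suggested wrapper, not an existing nimsimd function; per the Intel docs, _mm_permutevar_ps selects each result lane by the low bits of the matching mask lane):

```nim
import nimsimd/avx, nimsimd/sse2

# Reverse the four 32-bit lanes of a vector with a per-lane index mask.
# Note: mm_set_epi32 takes its arguments from the highest lane down.
let v   = mm_set_epi32(3, 2, 1, 0)  # lanes (low..high): 0, 1, 2, 3
let idx = mm_set_epi32(0, 1, 2, 3)  # pick source lanes 3, 2, 1, 0
let rev = mm_permutevar_epi32(v, idx)  # lanes (low..high): 3, 2, 1, 0
```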
And don't get me wrong here: I just suggest adding what Intel has left out but everybody expects to find, while staying with the Intel wording. Actually, a permutevar_<type> is a permute operation; well, many operations permute a vector. In this case it is a shuffle operation.
Maybe one could add a common_avx.nim that adds those missing functions, to make the intrinsics a bit more consistent?
we are gettin' closer to nimsimd v2 :)
cpuid in runtimecheck.nim fails to compile when compiled with --cc:vcc, with the following error: "undeclared field: 'eax'" (referring to "result.eax.addr").
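One possible direction, sketched with hedging: MSVC does not accept GCC-style inline assembly, so under --cc:vcc the query could go through the __cpuid intrinsic from <intrin.h> instead. The Nim names below are illustrative, not nimsimd's actual API.

```nim
when defined(vcc):
  # MSVC's __cpuid fills an array of four int32s: EAX, EBX, ECX, EDX.
  proc msvcCpuid(cpuInfo: ptr array[4, int32], functionId: int32)
    {.importc: "__cpuid", header: "intrin.h".}

  # Illustrative wrapper returning the four registers as named fields.
  proc cpuid(leaf: int32): tuple[eax, ebx, ecx, edx: int32] =
    var info: array[4, int32]
    msvcCpuid(info.addr, leaf)
    (info[0], info[1], info[2], info[3])
```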
Hi again,
_mm_malloc() and _mm_free() should be in sse.nim, which is missing.
I'd say this would be a desirable feature, as it would make aligned allocations explicit? And easier and cleaner, too.
What do you think?
regards, Andreas
--
I found the include for prefetch in sse2.nim and added the desired bindings for testing:

func mm_malloc*(size, align: int): pointer {.importc: "_mm_malloc".}
func mm_free*(p: pointer) {.importc: "_mm_free".}
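A hedged usage sketch for those two bindings: allocate 32-byte-aligned storage suitable for AVX loads, then release it with the matching free.

```nim
# Allocate space for four float64 lanes, aligned to 32 bytes for AVX.
let p = mm_malloc(4 * sizeof(float64), 32)
doAssert cast[uint](p) mod 32 == 0  # _mm_malloc guarantees the alignment
let lanes = cast[ptr UncheckedArray[float64]](p)
lanes[0] = 1.0
mm_free(p)  # must be paired with mm_malloc, not Nim's dealloc
```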