nimsimd's People

Contributors

edisile, guzba, nimaoth, simonkrauter

nimsimd's Issues

M1 Mac runtimecheck Errors

With the following code:

import nimsimd/runtimecheck

let
  cpuHasAvx* = checkInstructionSets({AVX})
  cpuHasAvx2* = checkInstructionSets({AVX, AVX2})

This fails to compile:

/Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected]:130:10: error: invalid output constraint '=a' in asm
        :"=a"(eaxr), "=b"(ebxr), "=c"(ecxr), "=d"(edxr)
         ^
1 error generated.
Error: execution of an external compiler program 'clang -c  -w -ferror-limit=3 -pthread   -I/Users/miguelmartin/.choosenim/toolchains/nim-2.0.0/lib -I/Users/miguelmartin/repos/nim_playground/nimsimd_test/src -o /Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected] /Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected]' failed with exit code: 1

System Details:

  • OS: macOS 13.5
  • CPU: M1 Max
  • RAM: 64GB

Compiler version: 2.0
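
The `=a`, `=b`, `=c` and `=d` output constraints name the x86 registers EAX, EBX, ECX and EDX used by the `cpuid` instruction, so clang rejects them when it targets the arm64 M1. One possible workaround, sketched below, is to call `checkInstructionSets` only on x86 targets and fall back to `false` everywhere else. The `when` guard and the fallback constants are assumptions for illustration, not part of nimsimd:

```nim
# Only pull in the cpuid-based check on x86 targets (assumed workaround).
when defined(amd64) or defined(i386):
  import nimsimd/runtimecheck

  let
    cpuHasAvx* = checkInstructionSets({AVX})
    cpuHasAvx2* = checkInstructionSets({AVX, AVX2})
else:
  # Non-x86 CPUs such as the M1 have no AVX, so report false without running cpuid.
  const
    cpuHasAvx* = false
    cpuHasAvx2* = false
```

Code that branches on `cpuHasAvx2` then takes its scalar path on Apple Silicon instead of failing at compile time.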

Future of Nim-simd

Hello Ryan,

first of all - a BIG THX for this lib. After some video lectures I started with SIMD stuff and found nimsimd a perfect start. AVX512 is missing, but state-of-the-art AVX2 surely has the broadest audience - even on my side I'm not AVX512 ready, yet :)
But looking at what others achieve with SIMD - looking at Prof. Lemire & friends - one might get the impression that AVX512 is 'just around the corner' :)
Naturally I started writing wrappers around the horrible intrinsics, trying to improve my own and the user experience. I will do some examples, e.g. a sorting network, string conversion, base64 en-/decoding, string lookup etc.
Now I came across Agner Fog's C++ Vector Class Library (VCL). From its description I learned that it does sophisticated macro/template stuff so that one can use e.g. AVX512 even though the actual hardware does not support it. And that made me think - "Shall I try a Nim wrapper for VCL?" and then make that lib more accessible through Nim, or continue to work with nim-simd? Or would it make more sense to do exactly the same with Nim macro magic => a VCL rewrite (which would be very much above my pay grade)?
So what are your plans with nim-simd, and maybe you can find the time to take a look at VCL?
I could use some advice on how to make something that encourages more Nim users to enter the SIMD journey and make this world a more efficient place 8=)).
Fun aside, I believe there is much to gain for Nim in terms of attractiveness for new users being able to try out a technology like SIMD - since Python and even JavaScript started adopting it. But maybe even the stdlib could gain here and there?

beats & greats, Andreas

As a side note : beef already provided a concept & macro to check array-alignment - and he expressed some interest in SIMD-accelerated string lookups...
And i have some good material (papers/videos/gh-code/etc.) in my Zotero, which might fit well in the nim-simd-wiki ?

mm256_permute4x64_pd missing from avx2.nim

Hi guzba,

I may have found another glitch - mm256_permute4x64_pd for double-precision should be in avx2.nim. Though mm256_permute4x64_epi64 plus a cast will do exactly the same - so maybe you left it out intentionally?

regards, andreas

bug: mm256_castsi256_ps returns M128 instead of M256

diff --git a/src/nimsimd/avx.nim b/src/nimsimd/avx.nim
index ed9dc1b..53796e8 100644
--- a/src/nimsimd/avx.nim
+++ b/src/nimsimd/avx.nim
@@ -63,7 +63,7 @@ func mm256_castsi128_si256*(a: M128i): M256i {.importc: "_mm256_castsi128_si256"
 
 func mm256_castsi256_pd*(a: M256i): M128d {.importc: "_mm256_castsi256_pd".}
 
-func mm256_castsi256_ps*(a: M256i): M128 {.importc: "_mm256_castsi256_ps".}
+func mm256_castsi256_ps*(a: M256i): M256 {.importc: "_mm256_castsi256_ps".}
 
 func mm256_castsi256_si128*(a: M256i): M128i {.importc: "_mm256_castsi256_si128".}

little typo in intrinsics import signature

Hi guzba, thx for the work,

there is probably a typo in ./src/nimsimd/sse2.nim, line 269.
I believe the SSE2 import that casts a single-precision vector to an integer vector:
func mm_castps_si128*(a: M128d): M128i {.importc: "_mm_castps_si128".}

should instead receive a single-precision vector, like so:
func mm_castps_si128*(a: M128): M128i {.importc: "_mm_castps_si128".}

The cast from double-precision mm_castpd_si128 to M128i is correct and currently has the exact same signature as mm_castps_si128().
Latest Intel Intrinsics Guide/dec.2023 confirms this. I fixed it in the local nimble-dir and it works.

greets Andreas
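
To illustrate why the parameter type matters: the cast only reinterprets the 128 bits, so with an `M128` parameter a single-precision vector can be passed directly and its raw bit pattern read back. This is a minimal sketch assuming GCC or clang on x86-64. The declarations below are local stand-ins written for this example, not copied from nimsimd:

```nim
# Local stand-ins for the vector types and intrinsics (assumed for illustration).
type
  M128 {.importc: "__m128", header: "<emmintrin.h>", bycopy.} = object
  M128i {.importc: "__m128i", header: "<emmintrin.h>", bycopy.} = object

func mm_set1_ps(a: float32): M128 {.importc: "_mm_set1_ps", header: "<emmintrin.h>".}
# Corrected signature: takes a single-precision vector (M128), not M128d.
func mm_castps_si128(a: M128): M128i {.importc: "_mm_castps_si128", header: "<emmintrin.h>".}
func mm_cvtsi128_si32(a: M128i): int32 {.importc: "_mm_cvtsi128_si32", header: "<emmintrin.h>".}

let ones = mm_set1_ps(1.0'f32)
# Reinterpret the float lanes as integers and read the lowest 32 bits.
let bits = mm_cvtsi128_si32(mm_castps_si128(ones))
doAssert bits == 0x3F800000'i32 # IEEE 754 bit pattern of 1.0'f32
```

With the `M128d` parameter, the call `mm_castps_si128(ones)` would be rejected by Nim's type checker.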

Two more glitches in avx.nim

Hello guzba,

in avx.nim lines 226-230 ::

226 func mm_permutevar_pd*(a: M128d, b: M128i): M256d {.importc: "_mm_permutevar_pd".}
...
230 func mm_permutevar_pd*(a: M128d, b: M128i): M256d {.importc: "_mm_permutevar_pd".}

AFAIK they should return the same vector types that they receive, i.e. 128-bit double- and single-precision vectors:

func mm_permutevar_pd*(a: M128d, b: M128i): M128d {.importc: "_mm_permutevar_pd".}
...
func mm_permutevar_ps*(a: M128, b: M128i): M128 {.importc: "_mm_permutevar_ps".}

and if you don't mind - sure you will :) - one could fix the terrible Intel naming just a bit by adding the missing ::

func mm_permutevar_epi32*(a, mask: M128i): M128i =
  # reinterpret as float lanes, permute, then cast back to integers
  mm_castps_si128(
    mm_permutevar_ps(mm_castsi128_ps(a), mask)
  )
func mm_permutevar_epi64*(a, mask: M128i): M128i =
  # same idea with double-precision lanes
  mm_castpd_si128(
    mm_permutevar_pd(mm_castsi128_pd(a), mask)
  )

Since everybody has to add them anyway - after getting trapped once... The same could/should be done for the 256-bit vectors.
And don't get me wrong here - I just suggest adding what Intel has left out but everybody expects to find, while staying with the Intel wording. Actually a permutevar_<type> is a permute operation - well, many operations permute a vector. In this case it is a shuffle operation.
Maybe one could add a common_avx.nim that adds those missing functions to make the intrinsics a bit more consistent?

just my 20ct, greets Andreas

we are gettin' closer to nimsimd v2 :)

Aligned alloc _mm_malloc/free() would be nice to have

Hi again,

_mm_malloc() and _mm_free() belong in sse.nim, but they are missing.
I'd say this would be a desirable feature, as it would make aligned allocations explicit. And easier and cleaner, too.

What do you think ?

regards, Andreas

--
I found the include for prefetch in sse2.nim and added the desired functions there for testing:

func mm_malloc*( size: int, align: int) :pointer {.importc: "_mm_malloc".}
func mm_free*( pt :pointer ) {.importc: "_mm_free".}
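
A small usage sketch of such bindings, assuming GCC or clang on x86 (where `xmmintrin.h` pulls in `_mm_malloc` and `_mm_free`). The names `mmMalloc` and `mmFree` and the `csize_t` parameters are my own choices for this example, mirroring the C prototype `void *_mm_malloc(size_t size, size_t align)`:

```nim
# Hypothetical local bindings; declared as procs because they have side effects.
proc mmMalloc(size, align: csize_t): pointer {.importc: "_mm_malloc", header: "<xmmintrin.h>".}
proc mmFree(p: pointer) {.importc: "_mm_free", header: "<xmmintrin.h>".}

# Room for eight float32 values, aligned to 32 bytes as AVX aligned loads require.
let buf = mmMalloc(csize_t(8 * sizeof(float32)), 32)
doAssert buf != nil
doAssert (cast[uint](buf) and 31'u) == 0'u # address is a multiple of 32

# Memory from _mm_malloc must be released with _mm_free, not with dealloc.
mmFree(buf)
```

The buffer lives outside Nim's memory management, so it has to be freed manually.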

Can't compile due to missing compiler flag

So, basically, compiling fails with the error shown in this screenshot:
[image: Screenshot_8]

This can be fixed by passing the --passC:-msse2 flag to the compiler (credit to leorize for finding that out).
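
As an alternative to typing the flag on every build, the same option can be set from the source file with Nim's `passC` pragma. A small sketch, assuming GCC or clang as the backend compiler (the `when` guard is my addition):

```nim
# Equivalent to passing --passC:-msse2 on the command line.
when defined(gcc) or defined(clang):
  {.passC: "-msse2".}
```

The guard keeps the GCC-style option away from other backends such as MSVC, which does not accept it.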
