guzba / nimsimd
Pleasant Nim bindings for SIMD instruction sets.
License: MIT License
With the following code:

import nimsimd/runtimecheck

let
  cpuHasAvx* = checkInstructionSets({AVX})
  cpuHasAvx2* = checkInstructionSets({AVX, AVX2})

this fails to compile:
/Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected]:130:10: error: invalid output constraint '=a' in asm
:"=a"(eaxr), "=b"(ebxr), "=c"(ecxr), "=d"(edxr)
^
1 error generated.
Error: execution of an external compiler program 'clang -c -w -ferror-limit=3 -pthread -I/Users/miguelmartin/.choosenim/toolchains/nim-2.0.0/lib -I/Users/miguelmartin/repos/nim_playground/nimsimd_test/src -o /Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected] /Users/miguelmartin/.cache/nim/nimsimd_test_d/@m..@s..@s..@[email protected]@[email protected]@[email protected]' failed with exit code: 1
System Details:
Compiler version: 2.0
Hello Ryan,
first of all, a big thank-you for this library. After some video lectures I started with SIMD and found nimsimd a perfect starting point. AVX-512 is missing, but state-of-the-art AVX2 surely has the broadest audience; even on my side I'm not AVX-512-ready yet :)
But looking at what others achieve with SIMD (looking at Prof. Lemire and friends), one might get the impression that AVX-512 is just around the corner :)
Naturally I started writing wrappers around the horrible intrinsics, trying to improve my own and the user experience. I will do some examples, e.g. a sorting network, string conversion, base64 encoding/decoding, string lookup, etc.
Now I came across Agner Fog's C++ Vector Class Library (VCL). From its description I learned that it does sophisticated macro/template work so that one can use, e.g., AVX-512 even though the actual hardware does not support it. That made me think: shall I try a Nim wrapper for VCL and make that library more accessible through Nim, or continue to work with nimsimd? Or would it make more sense to do the same with Nim macro magic, i.e. a VCL rewrite (which would be well above my pay grade)?
So what are your plans for nimsimd, and maybe you can find the time to take a look at VCL?
I could use some advice on how to make something that encourages more Nim users to start the SIMD journey and make this world a more efficient place 8=)).
Fun aside, I believe there is much for Nim to gain in attractiveness for new users by being able to try out a technology like SIMD, since Python and even JavaScript have started adopting it. And maybe even the standard library could gain here and there?
As a side note: beef already provided a concept and macro to check array alignment, and he expressed some interest in SIMD-accelerated string lookups...
And I have some good material (papers/videos/GitHub code/etc.) in my Zotero, which might fit well in the nimsimd wiki?
Hi guzba,
maybe I found another glitch: mm256_permute4x64_pd for double precision should be in avx2.nim. Though mm256_permute4x64_epi64 plus a cast will do exactly the same, so maybe you left it out intentionally?
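For reference, a hedged sketch of that cast workaround, assuming the usual nimsimd cast intrinsics from avx.nim and mm256_permute4x64_epi64 from avx2.nim; the helper name is hypothetical, not part of nimsimd:

```nim
import nimsimd/avx, nimsimd/avx2

# Hypothetical helper: emulate _mm256_permute4x64_pd by reinterpreting
# the doubles as 64-bit integer lanes, permuting those, and
# reinterpreting back. The casts touch no bits, so the result is the
# same as the native double-precision permute.
func mm256_permute4x64_pd_fallback(a: M256d, imm8: int32): M256d =
  mm256_castsi256_pd(mm256_permute4x64_epi64(mm256_castpd_si256(a), imm8))
```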
regards, andreas
diff --git a/src/nimsimd/avx.nim b/src/nimsimd/avx.nim
index ed9dc1b..53796e8 100644
--- a/src/nimsimd/avx.nim
+++ b/src/nimsimd/avx.nim
@@ -63,7 +63,7 @@ func mm256_castsi128_si256*(a: M128i): M256i {.importc: "_mm256_castsi128_si256"
func mm256_castsi256_pd*(a: M256i): M128d {.importc: "_mm256_castsi256_pd".}
-func mm256_castsi256_ps*(a: M256i): M128 {.importc: "_mm256_castsi256_ps".}
+func mm256_castsi256_ps*(a: M256i): M256 {.importc: "_mm256_castsi256_ps".}
func mm256_castsi256_si128*(a: M256i): M128i {.importc: "_mm256_castsi256_si128".}
Hi guzba, thx for the work,
probable typo in ./src/nimsimd/sse2.nim, line 269.
I believe the SSE2 import to cast single precision to int:

func mm_castps_si128*(a: M128d): M128i {.importc: "_mm_castps_si128".}

should instead receive a single-precision vector, like so:

func mm_castps_si128*(a: M128): M128i {.importc: "_mm_castps_si128".}

The cast from double precision, mm_castpd_si128, to M128i is correct and currently has the exact same signature as mm_castps_si128().
The latest Intel Intrinsics Guide (Dec. 2023) confirms this. I fixed it in the local nimble dir and it works.
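A minimal sketch of why the parameter type matters, assuming mm_set1_ps is also available from the nimsimd bindings: the cast reinterprets bits, it does not convert values.

```nim
import nimsimd/sse2

# With the corrected M128 signature, a single-precision vector can be
# reinterpreted as integer lanes without any value conversion:
let ones = mm_set1_ps(1.0'f32)    # four float32 lanes, each 1.0
let bits = mm_castps_si128(ones)  # each 32-bit lane holds the float's bits
# With the M128d parameter, this call would be a type error in Nim.
```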
greets Andreas
Hello guzba,
in avx.nim, lines 226-230:

226 func mm_permutevar_pd*(a: M128d, b: M128i): M256d {.importc: "_mm_permutevar_pd".}
...
230 func mm_permutevar_ps*(a: M128, b: M128i): M256 {.importc: "_mm_permutevar_ps".}

AFAIK these should return the same vector types that they received, i.e. double/single-precision vectors of 128 bits:

func mm_permutevar_pd*(a: M128d, b: M128i): M128d {.importc: "_mm_permutevar_pd".}
...
func mm_permutevar_ps*(a: M128, b: M128i): M128 {.importc: "_mm_permutevar_ps".}
And if you don't mind (sure you will :)), one could fix the terrible Intel naming just a bit by adding the missing:

func mm_permutevar_epi32*(a, mask: M128i): M128i =
  mm_castps_si128(mm_permutevar_ps(mm_castsi128_ps(a), mask))

func mm_permutevar_epi64*(a, mask: M128i): M128i =
  mm_castpd_si128(mm_permutevar_pd(mm_castsi128_pd(a), mask))

since everybody has to add them anyway, after getting trapped once... The same could/should be done for the 256-bit-sized vectors.
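If wrappers like these were added, usage could look like the following hedged sketch (mm_permutevar_epi32 is the suggested wrapper, not an existing nimsimd function; per the Intel docs, _mm_permutevar_ps selects each result lane by the low bits of the matching mask lane):

```nim
import nimsimd/avx, nimsimd/sse2

# Reverse the four 32-bit lanes of a vector with a per-lane index mask.
# Note: mm_set_epi32 takes its arguments from the highest lane down.
let v   = mm_set_epi32(3, 2, 1, 0)  # lanes (low..high): 0, 1, 2, 3
let idx = mm_set_epi32(0, 1, 2, 3)  # pick source lanes 3, 2, 1, 0
let rev = mm_permutevar_epi32(v, idx)  # lanes (low..high): 3, 2, 1, 0
```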
And don't get me wrong here: I just suggest adding what Intel has left out but everybody expects to find, while staying with the Intel wording. Actually, a permutevar_<type> is a permute operation; well, many operations permute a vector. In this case it is a shuffle operation.
Maybe one could add a common_avx.nim that adds those missing functions, to make the intrinsics a bit more consistent?
we are gettin' closer to nimsimd v2 :)
cpuid in runtimecheck.nim fails to compile when compiled with --cc:vcc, with the following error: "undeclared field: 'eax'" (referring to "result.eax.addr").
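One possible direction, sketched with hedging: MSVC does not accept GCC-style inline assembly, so under --cc:vcc the query could go through the __cpuid intrinsic from <intrin.h> instead. The Nim names below are illustrative, not nimsimd's actual API.

```nim
when defined(vcc):
  # MSVC's __cpuid fills an array of four int32s: EAX, EBX, ECX, EDX.
  proc msvcCpuid(cpuInfo: ptr array[4, int32], functionId: int32)
    {.importc: "__cpuid", header: "intrin.h".}

  # Illustrative wrapper returning the four registers as named fields.
  proc cpuid(leaf: int32): tuple[eax, ebx, ecx, edx: int32] =
    var info: array[4, int32]
    msvcCpuid(info.addr, leaf)
    (info[0], info[1], info[2], info[3])
```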
Hi again,
_mm_malloc() and _mm_free() should be in sse.nim, which is missing.
I'd say this would be a desirable feature, as it would make aligned allocations explicit? And easier and cleaner, too.
What do you think?
regards, Andreas
--
I found the include for prefetch in sse2.nim and added the desired bindings for testing:

func mm_malloc*(size, align: int): pointer {.importc: "_mm_malloc".}
func mm_free*(p: pointer) {.importc: "_mm_free".}
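A hedged usage sketch for those two bindings: allocate 32-byte-aligned storage suitable for AVX loads, then release it with the matching free.

```nim
# Allocate space for four float64 lanes, aligned to 32 bytes for AVX.
let p = mm_malloc(4 * sizeof(float64), 32)
doAssert cast[uint](p) mod 32 == 0  # _mm_malloc guarantees the alignment
let lanes = cast[ptr UncheckedArray[float64]](p)
lanes[0] = 1.0
mm_free(p)  # must be paired with mm_malloc, not Nim's dealloc
```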