simdxorshift's Introduction

SIMDxorshift

Xorshift generators are a family of pseudo-random number generators (PRNGs) invented by Marsaglia.

https://en.wikipedia.org/wiki/Xorshift

We present a vectorized version of xorshift128+, a popular random-number generator that is part of this family. It is written in C. The implementation uses Intel's SIMD instructions and is based on Vigna's original (pure C) implementation.

As a random number generator, xorshift128+ is not very strong. It fails statistical tests such as BigCrush. It should never be used alone in applications where the quality of the random numbers matters a great deal. However, when you just want fast and "good enough" random numbers, it should do well.
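For orientation, the scalar generator that gets vectorized can be sketched as follows. This is a minimal sketch rather than the library's source: the names xs128p_state and xs128p_next are made up for illustration, and the shift constants (23, 18, 5) are the ones that appear in the AVX routine quoted in the issues further down this page.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only (not the library's API). */
typedef struct {
    uint64_t part1;
    uint64_t part2;
} xs128p_state;

/* One step of scalar xorshift128+; the state must not be all zero. */
static uint64_t xs128p_next(xs128p_state *st) {
    uint64_t s1 = st->part1;
    const uint64_t s0 = st->part2;
    st->part1 = s0;                                /* old part2 becomes part1 */
    s1 ^= s1 << 23;                                /* mix the old part1 */
    st->part2 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);  /* new part2 */
    return st->part2 + s0;                         /* the "+" in xorshift128+ */
}
```

With part1 = 324 and part2 = 4444, the first value returned is 0x00000000a200496e, which agrees with the xorshift128plus() output reported in the issues below.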

Since speed is the primary benefit of xorshift128+, it is tempting to accelerate it further using vector instructions.

This library is used by the Yandex ClickHouse high-performance data engine.

Prerequisite

You should have a recent Intel processor (Haswell or better). If you bought your PC before everyone on Earth had a smartphone, it is probably too old a PC. Please upgrade.

Your compiler supports C99, right? C99 stands for 1999. That was almost 20 years ago.

Code sample

#include "simdxorshift128plus.h"

// create a new key
avx_xorshift128plus_key_t mykey;
avx_xorshift128plus_init(324,4444,&mykey); // values 324, 4444 are arbitrary, must be non-zero

// generate 32 random bytes, do this as many times as you want
__m256i randomstuff = avx_xorshift128plus(&mykey);

Usage

$ make
$ ./fillarray
Generating 5000 32-bit random numbers
Time reported in number of cycles per array element.
We store values to an array of size = 19 kB.

We just generate the random numbers:
populateRandom_xorshift128plus(prec, size):  3.63 cycles per operation
populateRandom_avx_xorshift128plus(prec, size):  2.21 cycles per operation
populateRandom_avx_xorshift128plus_two(prec, size):  1.88 cycles per operation
populateRandom_avx512_xorshift128plus_two(prec, size):  1.47 cycles per operation

(Tests on a Skylake-X processor.)

Shallow analysis

SIMD random-number generation is roughly twice as fast as plain C random-number generation (3.63 versus 1.88 cycles per operation for the AVX2 variant in the run above). However, for algorithms such as random shuffling, the benefit of faster random-number generation is smaller because other bottlenecks arise.

For the most part, however, applying SIMD instructions to random-number generation is "free" if the CPU supports them.
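To illustrate where the generator sits inside a shuffle, here is a minimal scalar sketch of a Fisher-Yates shuffle. It is not the benchmark code from this repository: rng_state, rng_next, bounded and shuffle_u32 are hypothetical names, and the multiply-and-shift range reduction is an assumption chosen for brevity (it is slightly biased). Each iteration does one random draw and one swap at a data-dependent address, so the swap can plausibly dominate once the draw is cheap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar xorshift128+ state and step, for illustration only. */
typedef struct {
    uint64_t part1;
    uint64_t part2;
} rng_state;

static uint64_t rng_next(rng_state *st) {
    uint64_t s1 = st->part1;
    const uint64_t s0 = st->part2;
    st->part1 = s0;
    s1 ^= s1 << 23;
    st->part2 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);
    return st->part2 + s0;
}

/* Map the upper 32 random bits into [0, bound) by multiply-and-shift.
   This is slightly biased; it is used here only to keep the sketch short. */
static uint32_t bounded(rng_state *st, uint32_t bound) {
    const uint32_t r = (uint32_t)(rng_next(st) >> 32);
    return (uint32_t)(((uint64_t)r * bound) >> 32);
}

/* Fisher-Yates shuffle: one bounded random index and one swap per element. */
static void shuffle_u32(uint32_t *a, uint32_t n, rng_state *st) {
    for (uint32_t i = n; i > 1; i--) {
        const uint32_t j = bounded(st, i); /* j is in [0, i) */
        const uint32_t tmp = a[i - 1];
        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

A caller would fill an array, seed an rng_state with two non-zero values, and call shuffle_u32(array, length, &state).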

Related work

Reference

Vigna's xorshift128+ implementation http://xorshift.di.unimi.it/xorshift128plus.c

simdxorshift's People

Contributors

gareth-j, lemire, ofrei, theironborn, travisdowns

simdxorshift's Issues

Doesn't compile on many gcc versions

After adding the AES stuff, it fails to compile on many gcc versions:

cc -fPIC -std=c99 -O3 -mavx2  -march=native -Wall -Wextra -pedantic -Wshadow -o fillarray ./benchmark/fillarray.c simdxorshift128plus.o xorshift128plus.o -Iinclude -flto
In file included from ./benchmark/fillarray.c:7:0:
include/simdaesdragontamer.h: In function ‘aesdragontamer_r’:
include/simdaesdragontamer.h:25:10: warning: implicit declaration of function ‘_mm256_set_m128i’ [-Wimplicit-function-declaration]
   return _mm256_set_m128i(penultimate1,penultimate2);
          ^
include/simdaesdragontamer.h:25:10: error: incompatible types when returning type ‘int’ but ‘__m256i {aka __vector(4) long long int}’ was expected
include/simdaesdragontamer.h:26:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^

This is because _mm256_set_m128i was not available in gcc until recently.

distribution of random

I was wondering if any work has been done to document the distribution of this SIMD random generator?

Does avx_xorshift128plus() generate the same sequence as the original xorshift128plus.c?

Hi @lemire, thank you for the useful code!

I have a naive question about avx_xorshift128plus(). Does it generate the same sequence as the original xorshift128plus.c?

I assumed that if I gave the same key1 and key2 to xorshift128plus_init() and avx_xorshift128plus_init(), then xorshift128plus() and the lowest 64-bit lane of avx_xorshift128plus() would generate the same sequence.

I tried the following code:

#include <stdio.h>
#include <inttypes.h>
#include "simdxorshift128plus.h"
#include "xorshift128plus.h"

int main() {
    const uint64_t key1 = 324;
    const uint64_t key2 = 4444;
    {
        printf("xorshift128plus()\n");
        xorshift128plus_key_t key;
        xorshift128plus_init(key1, key2, &key);
        for(int i = 0; i < 4; ++i) {
            const uint64_t v = xorshift128plus(&key);
            printf("i=%d, %016" PRIx64 "\n", i, v);
        }
    }
    {
        printf("avx_xorshift128plus()\n");
        avx_xorshift128plus_key_t key;
        avx_xorshift128plus_init(key1, key2, &key);
        for(int i = 0; i < 4; ++i) {
            const __m256i v = avx_xorshift128plus(&key);
            union U {
                __m256i u256;
                uint64_t u64[4];
            } u;
            u.u256 = v;
            printf("i=%d, ", i);
            printf("%016" PRIx64 ", ", u.u64[0]);
            printf("%016" PRIx64 ", ", u.u64[1]);
            printf("%016" PRIx64 ", ", u.u64[2]);
            printf("%016" PRIx64 "\n", u.u64[3]);
        }
    }
    return 0;
}

But the above code outputs the following sequences:

xorshift128plus()
i=0, 00000000a200496e
i=1, 00000008ab123b20
i=2, 00510008ab6f84d2
i=3, 04a80008ad7c8f04

avx_xorshift128plus()
i=0, 00000008ae023c66, 3869ed045ddae7e2, 6b826340a2f2b209, 85f43fbcf04fb5f8
i=1, 04570008ae3986a2, 88ec543d470b3c74, 9f18502273d1ffd7, e29ddbac3683852a
i=2, 0479a2b80b222569, 1ec3fe354c920644, 10e17fd02388f80d, dd1e114ed0ef2edc
i=3, 5c271a30e80b38a5, 17f21edf8a60db14, e653258226804f7e, a9a53a8a8735680d

I've checked the implementation of avx_xorshift128plus():

__m256i avx_xorshift128plus(avx_xorshift128plus_key_t *key) {
    __m256i s1 = key->part1;
    const __m256i s0 = key->part2;
    key->part1 = key->part2;
    s1 = _mm256_xor_si256(key->part2, _mm256_slli_epi64(key->part2, 23));
    key->part2 = _mm256_xor_si256(
        _mm256_xor_si256(_mm256_xor_si256(s1, s0),
                         _mm256_srli_epi64(s1, 18)),
        _mm256_srli_epi64(s0, 5));
    return _mm256_add_epi64(key->part2, s0);
}

I think line #53 of the source file (the assignment to s1) makes the difference.

When I replace this line using the following patch,

-	s1 = _mm256_xor_si256(key->part2, _mm256_slli_epi64(key->part2, 23));
+	s1 = _mm256_xor_si256(s1, _mm256_slli_epi64(s1, 23));

it generates the following sequence, whose lowest lane matches the original xorshift128plus().

avx_xorshift128plus()
i=0, 00000000a200496e, 1e0ecd55a5429b74, 9c9b64fc8ff5e750, 36c5602a44649eda
i=1, 00000008ab123b20, c97ee0e41d03b58b, cdc45acd51153a37, e84499a5edefc404
i=2, 00510008ab6f84d2, 79d2b441a896dbcd, b2fbc4c20cfe92e0, c87f6e0cd13b8045
i=3, 04a80008ad7c8f04, 6b5a1f5ddf11088e, 7f5591e4c866a113, b6067b6a4c632679
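The difference can be reproduced without AVX by modelling a single 64-bit lane in scalar C. This is only a sketch under stated assumptions: lane_state, lane_next_posted and lane_next_patched are hypothetical names, and the lowest lane is assumed to be seeded directly with key1 and key2, which is consistent with the outputs listed above.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 64-bit lane; not library code. */
typedef struct {
    uint64_t part1;
    uint64_t part2;
} lane_state;

/* Mirrors the routine as posted: s1 is rebuilt from part2,
   so the old part1 is never read. */
static uint64_t lane_next_posted(lane_state *st) {
    const uint64_t s0 = st->part2;
    st->part1 = s0;
    const uint64_t s1 = s0 ^ (s0 << 23);
    st->part2 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);
    return st->part2 + s0;
}

/* Mirrors the patched line: s1 is derived from the old part1,
   as in the scalar xorshift128+ step. */
static uint64_t lane_next_patched(lane_state *st) {
    uint64_t s1 = st->part1;
    const uint64_t s0 = st->part2;
    st->part1 = s0;
    s1 ^= s1 << 23;
    st->part2 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);
    return st->part2 + s0;
}
```

Starting from part1 = 324 and part2 = 4444, the first call to lane_next_posted returns 0x00000008ae023c66 and the first call to lane_next_patched returns 0x00000000a200496e, matching the two i=0 values in the lowest lane above. Note that in the posted variant the new part2 depends only on the old part2.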

My questions are:

  • (Q.1) Is this change intentional?

  • (Q.2) Does the output of avx_xorshift128plus() have the same quality of randomness as the original xorshift128plus.c?
