Blake3 for scala

This is a highly optimized BLAKE3 implementation for Scala, Scala.js, and Scala Native, without any dependencies. It has a constant memory footprint (about 5 KB) that does not depend on the size of the hashed data or the size of the output hash.

If you're looking for the fastest possible hash function for Scala.js, I suggest using this one instead of SHA, because this implementation uses only 32-bit numbers, which are natively supported by JS.

You can add it to your sbt build as:

libraryDependencies += "pt.kcry" %%% "blake3" % "x.x.x"

The latest version is

The API is pretty simple:

scala> import pt.kcry.blake3.Blake3

scala> Blake3.newHasher().update("Some string").doneHex(64)
val res1: String = 2e5524f3481046587080604ae4b4ceb44b721f3964ce0764627dee2c171de4c2

scala> Blake3.newDeriveKeyHasher("whats the Elvish word for friend").update("Some string").doneHex(64)
val res2: String = c2e79fe73dde16a13b4aa5a947b0e9cd7277ea8e68da250759de3ae62372b340

scala> Blake3.newKeyedHasher("whats the Elvish word for friend").update("Some string").doneHex(64)
val res3: String = 79943402309f9bb05338193f21fb57d98ab848bdcac67e5e097340f116ff90ba

scala> Blake3.hex("Some string", 64)
val res4: String = 2e5524f3481046587080604ae4b4ceb44b721f3964ce0764627dee2c171de4c2

scala> Blake3.bigInt("Some string", 32)
val res5: BigInt = 777331955

scala>

Hasher.update mutates the hasher, whereas Hasher.done does not.
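To make that distinction concrete, here is a minimal sketch that uses only calls shown in the REPL session above (newHasher, update, doneHex). The object and method names are invented for illustration, and the sketch assumes that calling a done method more than once on the same hasher is allowed, which is what the statement above implies.

```scala
import pt.kcry.blake3.Blake3

// Illustrative wrapper; only the Blake3 calls come from the library.
object UpdateVsDone {
  // Feeds the input in two pieces, then reads the hash twice.
  def run(): (String, String) = {
    val hasher = Blake3.newHasher()
    hasher.update("Some ")           // each update changes the hasher's state
    hasher.update("string")
    val first = hasher.doneHex(64)   // reads the hash; assumed not to change the state
    val second = hasher.doneHex(64)  // so a second read should give the same string
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val (first, second) = run()
    println(s"first:  $first")
    println(s"second: $second")
    println(s"equal:  ${first == second}")
  }
}
```

Because update returns the hasher, the two update calls could also be chained as in the REPL examples.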

Hasher.update supports different kinds of input: a byte array, part of a byte array, a single byte, or a string, and many others such as InputStream or ByteBuffer.

Hasher.done supports different kinds of output:

done(out: Array[Byte]) to fill the whole provided array;
done(out: Array[Byte], offset: Int, len: Int) to fill the specified part of the provided array;
done(out: OutputStream, len: Int) to write to the specified OutputStream;
done(out: ByteBuffer) to fill the specified ByteBuffer;
done() that returns a single-byte hash value;
doneShort(), doneInt() and doneLong() that return a single short, int or long hash value;
doneBigInt(bitLength: Int) that returns a positive BigInt with the specified length in bits;
doneHex(resultLength: Int) that returns a hex-encoded string with the specified output length in characters;
doneBaseXXX(len: Int) that returns a string in the BaseXXX encoding defined in RFC 4648, without padding;
doneXor(...) that applies the hash to an existing value via XOR;
doneCallBack(...) and doneXorCallBack(...) that invoke a callback for each produced byte.
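As a rough illustration of a few of these output forms, the sketch below fills a whole array, fills part of a larger buffer, and reads the same hash as hex. It uses only the signatures listed above. The object and method names are hypothetical, and the sketch assumes that each call produces output starting from the beginning of the hash stream.

```scala
import pt.kcry.blake3.Blake3

// Illustrative wrapper; only the Blake3 calls come from the library.
object OutputForms {
  def run(): (Array[Byte], Array[Byte], String) = {
    val hasher = Blake3.newHasher().update("Some string")

    // Fill a whole 32-byte array with hash output.
    val full = new Array[Byte](32)
    hasher.done(full)

    // Fill only 16 bytes of a larger buffer, starting at offset 8.
    val buffer = new Array[Byte](32)
    hasher.done(buffer, 8, 16)

    // 64 hex characters encode 32 bytes of output.
    val hex = hasher.doneHex(64)
    (full, buffer, hex)
  }

  def main(args: Array[String]): Unit = {
    val (full, buffer, hex) = run()
    println(full.map(b => f"${b & 0xff}%02x").mkString)
    println(buffer.map(b => f"${b & 0xff}%02x").mkString)
    println(hex)
  }
}
```

Writing into a caller-supplied array or buffer avoids allocating a result object per hash, which fits the constant memory footprint mentioned at the top.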

This implementation is thread-safe, so you can use it in a multithreaded environment. However, it doesn't currently include any multithreading optimizations.

As a baseline for benchmarks, I've used the original C version c-0.3.7 via the JNI interface that was implemented as part of BLAKE3jni.

All benchmarks were performed on two machines:

Zulu11.56+19-CA (build 11.0.15+10-LTS) on an Intel® Core™ i7-8700B, with AVX2 assembly optimization inside the baseline,
Zulu11.56+19-CA (build 11.0.15+10-LTS) on an Apple M1, without any assembly optimization inside the baseline.

Short summary:

it is about 4 times slower than the AVX2 assembly version via JNI, which is expected,
it is about 20% slower than the original C version via JNI,
it has a constant memory footprint (yeah, no GC on hashing!),
increasing the output hash size has the same performance impact as hashing more input.

The full results are available:

for Intel® Core™ i7-8700B at jmh-result.intel.json or via Intel @ JMH Visualizer.
for Apple M1 at jmh-result.m1.json or via M1 @ JMH Visualizer.

Chunking data results in different hash to using the data non-chunked

I have test data based on the official test vectors where I have a 64 byte array with each byte in the array having the value of its index. i.e. my input is an Array[Byte] with value data = [0, 1, 2, …, 63]

Running Blake3.newHasher().update(data).doneHex(16) results in 4eed7141ea4a5cd4 matching the expected result.

However, if I instead split the data into two 32-byte chunks, data1 = [0, 1, …, 31] and data2 = [32, 33, …, 63], and run Blake3.newHasher().update(data1).update(data2).doneHex(16), I would expect the same output but instead get cdc46473e43a732a.

Weirdly, splitting a 63-byte array into 32 and 31 bytes and doing the same produces the correct hash, so I'm not sure if there's something funky going on that I'm missing.

The same can also be observed for other sized arrays of data, for example 128, 1024, 2048 when split in two.

I wrote the following test to show this in action, although I apologise for the code being in Kotlin, I'm not a Scala developer (yet)!

class Testing {
    @Test
    fun sixtyThree() {
        val hasher1 = Blake3.newHasher()
        hasher1.update(ByteArray(63) { it.toByte() })
        val expected = hasher1.doneHex(16)

        val hasher2 = Blake3.newHasher()
        hasher2.update(ByteArray(32) { it.toByte() })
        hasher2.update(ByteArray(31) { (it + 32).toByte() })
        val actual = hasher2.doneHex(16)

        // works as expected
        assertEquals(expected, actual)
    }

    @Test
    fun sixtyFour() {
        val hasher1 = Blake3.newHasher()
        hasher1.update(ByteArray(64) { it.toByte() })
        val expected = hasher1.doneHex(16)

        val hasher2 = Blake3.newHasher()
        hasher2.update(ByteArray(32) { it.toByte() })
        hasher2.update(ByteArray(32) { (it + 32).toByte() })
        val actual = hasher2.doneHex(16)

        // fails
        assertEquals(expected, actual)
    }
}
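For Scala readers, a rough equivalent of the test above might look like the following. It uses plain comparisons instead of a test framework. The object and method names are hypothetical, and only the Blake3 calls already shown in this report are used. It also covers the 128, 1024 and 2048 byte cases mentioned earlier, each split in half.

```scala
import pt.kcry.blake3.Blake3

// Illustrative reproduction helper; names are not part of the library.
object ChunkingCheck {
  // Hashes `size` bytes (0, 1, 2, ...) in one update, then again as two
  // updates split at `splitAt`, and returns both hex strings.
  def hashes(size: Int, splitAt: Int): (String, String) = {
    val data = Array.tabulate[Byte](size)(_.toByte)
    val (head, tail) = data.splitAt(splitAt)
    val whole = Blake3.newHasher().update(data).doneHex(16)
    val chunked = Blake3.newHasher().update(head).update(tail).doneHex(16)
    (whole, chunked)
  }

  def main(args: Array[String]): Unit = {
    val cases = Seq((63, 32), (64, 32), (128, 64), (1024, 512), (2048, 1024))
    for ((size, splitAt) <- cases) {
      val (whole, chunked) = hashes(size, splitAt)
      println(s"size=$size splitAt=$splitAt whole=$whole chunked=$chunked match=${whole == chunked}")
    }
  }
}
```

A correct implementation should report a match for every row, since the hash depends only on the concatenated input and not on how it was split across update calls.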
