
kcrypt / scala-blake3

Stars: 25, watchers: 2, forks: 6, size: 3.65 MB

This is a highly optimized blake3 implementation for scala, scala-js and scala-native

License: Other

Scala 99.43% Shell 0.05% Python 0.52%
scala scala-js scalajs scala-native blake3

scala-blake3's Introduction

Blake3 for scala

This is a highly optimized blake3 implementation for scala, scala-js, and scala-native, without any dependencies. It has a constant memory footprint (about 5 KB) that does not depend on the size of the hashed data or the size of the output hash.

If you're looking for the fastest possible hash function for scala.js, I suggest using this one instead of SHA, because this implementation uses only 32-bit numbers, which JS supports natively.
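Here is the 32-bit point in concrete form. The mixing step G from the BLAKE3 specification uses only 32-bit additions, XORs, and fixed rotations by 16, 12, 8, and 7 bits. A Scala Int already wraps modulo 2^32. Scala.js keeps Int in native JS numbers but has to emulate 64-bit Long in software. The standalone sketch below is for illustration only. It is not taken from this library, and the name GSketch is hypothetical.

```scala
object GSketch {
  // One application of the BLAKE3 G function to the state words (a, b, c, d),
  // mixing in the message words mx and my. Int addition wraps modulo 2^32,
  // which is exactly the arithmetic the specification requires.
  def g(a0: Int, b0: Int, c0: Int, d0: Int, mx: Int, my: Int): (Int, Int, Int, Int) = {
    var a = a0; var b = b0; var c = c0; var d = d0
    a = a + b + mx
    d = Integer.rotateRight(d ^ a, 16)
    c = c + d
    b = Integer.rotateRight(b ^ c, 12)
    a = a + b + my
    d = Integer.rotateRight(d ^ a, 8)
    c = c + d
    b = Integer.rotateRight(b ^ c, 7)
    (a, b, c, d)
  }
}
```

In BLAKE3 each round applies G eight times: four times on the columns of the state, then four times on the diagonals. The compression function runs seven such rounds.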

To use it, add the following dependency to your build:

libraryDependencies += "pt.kcry" %%% "blake3" % "x.x.x"

The latest version is published on Maven Central.

The API is pretty simple:

scala> import pt.kcry.blake3.Blake3

scala> Blake3.newHasher().update("Some string").doneHex(64)
val res1: String = 2e5524f3481046587080604ae4b4ceb44b721f3964ce0764627dee2c171de4c2

scala> Blake3.newDeriveKeyHasher("whats the Elvish word for friend").update("Some string").doneHex(64)
val res2: String = c2e79fe73dde16a13b4aa5a947b0e9cd7277ea8e68da250759de3ae62372b340

scala> Blake3.newKeyedHasher("whats the Elvish word for friend").update("Some string").doneHex(64)
val res3: String = 79943402309f9bb05338193f21fb57d98ab848bdcac67e5e097340f116ff90ba

scala> Blake3.hex("Some string", 64)
val res4: String = 2e5524f3481046587080604ae4b4ceb44b721f3964ce0764627dee2c171de4c2

scala> Blake3.bigInt("Some string", 32)
val res5: BigInt = 777331955


Hasher.update mutates the hasher's state, but Hasher.done does not. You can therefore call done more than once and still add more input afterwards.

Hasher.update accepts different kinds of input: a byte array, part of a byte array, a single byte, or a string. It also accepts many others, such as OutputStream or ByteBuffer.

Hasher.done supports different kinds of output:

  • done(out: Array[Byte]) fills the whole provided array;
  • done(out: Array[Byte], offset: Int, len: Int) fills the specified part of the provided array;
  • done(out: OutputStream, len: Int) writes len bytes to the provided OutputStream;
  • done(out: ByteBuffer) fills the provided ByteBuffer;
  • done() returns a single-byte hash value;
  • doneShort(), doneInt(), and doneLong() return a single Short, Int, or Long hash value;
  • doneBigInt(bitLength: Int) returns a positive BigInt with the specified length in bits;
  • doneHex(resultLength: Int) returns a hex-encoded string whose length is the specified number of characters;
  • doneBaseXXX(len: Int) returns a string in the baseXXX encoding defined in RFC 4648, without padding;
  • doneXor(...) applies the hash to an existing value via XOR;
  • doneCallBack(...) and doneXorCallBack(...) invoke a callback for each produced byte.
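Several of the done variants above are different encodings of the same output bytes. The standalone sketch below takes an arbitrary byte array as a stand-in for hash output and produces four encodings from it:

  • a hex string cut to a given number of characters;
  • unpadded RFC 4648 base64;
  • a non-negative BigInt limited to a given bit length;
  • an in-place XOR.

It uses only the JDK and the Scala standard library, and the helper names are hypothetical. How the library itself chooses bytes for doneBigInt or truncates doneHex is an assumption here, not something stated above.

```scala
import java.util.Base64

object OutputEncodingSketch {
  // Lowercase hex, two characters per byte, cut to `chars` characters.
  def hex(bytes: Array[Byte], chars: Int): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString.take(chars)

  // RFC 4648 base64 with the trailing '=' padding omitted.
  def base64NoPad(bytes: Array[Byte]): String =
    Base64.getEncoder().withoutPadding().encodeToString(bytes)

  // Non-negative BigInt from the first ceil(bits / 8) bytes (big-endian),
  // masked so that it never has more than `bits` bits.
  def positiveBigInt(bytes: Array[Byte], bits: Int): BigInt = {
    val needed = (bits + 7) / 8
    BigInt(1, bytes.take(needed)) & ((BigInt(1) << bits) - 1)
  }

  // XORs `mask` into `target` in place; applying the same mask twice
  // restores the original contents.
  def xorInto(target: Array[Byte], mask: Array[Byte]): Unit = {
    require(mask.length >= target.length, "mask is shorter than target")
    for (i <- target.indices) target(i) = (target(i) ^ mask(i)).toByte
  }
}
```

Note that the hex length is counted in characters, so 64 characters, as in the doneHex(64) calls above, corresponds to 32 bytes of output.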

This implementation is thread-safe and can be used in a multithreaded environment. However, it does not currently include any multithreading optimizations.

As a baseline for the benchmarks, I used the original C implementation (c-0.3.7) through the JNI interface implemented as part of BLAKE3jni.

All benchmarks were performed on two machines:

  • Zulu11.56+19-CA (build 11.0.15+10-LTS) on an Intel® Core™ i7-8700B, with AVX2 assembly optimizations in the baseline;
  • Zulu11.56+19-CA (build 11.0.15+10-LTS) on an Apple M1, without any assembly optimizations in the baseline.

Short summary:

  • it is about 4 times slower than the AVX2 assembly version via JNI, which is expected;
  • it is about 20% slower than the original C version via JNI;
  • it has a constant memory footprint (yeah, no GC on hashing!);
  • increasing the output hash size has about the same performance impact as hashing the same number of input bytes.

The full version of the results is available as:

scala-blake3's People

Contributors

catap, scala-steward


scala-blake3's Issues

CI fails on unit tests

Local root 1dd6492 fails with:

[info] All possibly bytes where inputLen
[info] - when 1024
[info] - when 2048
[info] - when 16665 *** FAILED ***
[info]   "...ddd2840c53e45c2a20aa[]" was not equal to "...ddd2840c53e45c2a20aa[f0ed94]" (TestVector.scala:191)
...
[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] 	ky.korins.blake3.AdditionalTestVectorsTest
[info] Fast optimizing /home/runner/work/scala-blake3/scala-blake3/js/target/scala-2.12/blake3-test-fastopt

`ArrayIndexOutOfBoundsException` on JVM

The stack trace:

java.lang.ArrayIndexOutOfBoundsException: Index 37 out of bounds for length 37
	at ky.korins.blake3.Output.rootBytes(Output.scala:81)
	at ky.korins.blake3.HasherImpl.done(HasherImpl.scala:202)
...

It happens when rootBytes(...) is called with:

  • out as new Array[Byte](37);
  • off as 0;
  • len as 37.

The failure happens inside this match:

        lim - pos match {
          case 1 =>

          case 2 =>

          case 3 =>

          case _ =>
        }

In the case _ => branch, after pos += 1, it fails on out(pos).
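The sketch below shows a hypothetical loop with the same shape as the match quoted above. It is not the library's Output.rootBytes, and the names FillSketch, fill, and words are illustrative. The loop writes little-endian 32-bit words into out between off and off + len, and it picks how much of each word to write from the number of bytes that remain.

The property to check in such code is that the case _ branch runs only when at least four bytes remain, so every index it touches stays below lim. Suppose instead that a branch advanced pos and then indexed out(pos) without re-checking it against lim. When len equals out.length, that branch would throw exactly this kind of out-of-bounds error.

```scala
object FillSketch {
  // Hypothetical loop: writes the little-endian bytes of `words` into
  // out(off until off + len), using the number of remaining bytes to
  // decide how much of the current word to write.
  def fill(words: Array[Int], out: Array[Byte], off: Int, len: Int): Unit = {
    require(off >= 0 && len >= 0 && off + len <= out.length, "range outside out")
    require(words.length * 4 >= len, "not enough words for len bytes")
    val lim = off + len
    var pos = off
    var i = 0
    while (pos < lim) {
      val w = words(i)
      i += 1
      (lim - pos) match {
        case 1 =>
          out(pos) = w.toByte
          pos += 1
        case 2 =>
          out(pos) = w.toByte
          out(pos + 1) = (w >>> 8).toByte
          pos += 2
        case 3 =>
          out(pos) = w.toByte
          out(pos + 1) = (w >>> 8).toByte
          out(pos + 2) = (w >>> 16).toByte
          pos += 3
        case _ =>
          // At least 4 bytes remain here, so pos + 3 < lim <= out.length.
          out(pos) = w.toByte
          out(pos + 1) = (w >>> 8).toByte
          out(pos + 2) = (w >>> 16).toByte
          out(pos + 3) = (w >>> 24).toByte
          pos += 4
      }
    }
  }
}
```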

ENV:

  • scala-blake3: 2.8.0
  • scala: 2.13.6
  • JVM: OpenJDK 64-Bit Server VM Temurin-11.0.13+8 (build 11.0.13+8, mixed mode)

It seems like a rare GC bug.

Chunking data results in different hash to using the data non-chunked

I have test data based on the official test vectors: a 64-byte array in which each byte holds the value of its index, i.e. my input is an Array[Byte] with value data = [0, 1, 2, …, 63].

Running Blake3.newHasher().update(data).doneHex(16) results in 4eed7141ea4a5cd4 matching the expected result.

However, suppose I instead split the data into two 32-byte chunks, data1 = [0, 1, …, 31] and data2 = [32, 33, …, 63], and run Blake3.newHasher().update(data1).update(data2).doneHex(16). I would expect the same output, but instead I get cdc46473e43a732a.

Weirdly, if I split a 63-byte array into 32 and 31 bytes and do the same, I get the correct hash. So I'm not sure whether something funky is going on that I'm missing.

The same can also be observed for arrays of other sizes when they are split in two, for example 128, 1024, and 2048 bytes.

I wrote the following test to show this in action. Apologies for the code being in Kotlin, I'm not a Scala developer (yet)!

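import kotlin.test.Test
import kotlin.test.assertEquals
import pt.kcry.blake3.Blake3 // package from the README; older releases used ky.korins.blake3

class Testing {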
    @Test
    fun sixtyThree() {
        val hasher1 = Blake3.newHasher()
        hasher1.update(ByteArray(63) { it.toByte() })
        val expected = hasher1.doneHex(16)

        val hasher2 = Blake3.newHasher()
        hasher2.update(ByteArray(32) { it.toByte() })
        hasher2.update(ByteArray(31) { (it + 32).toByte() })
        val actual = hasher2.doneHex(16)

        // works as expected
        assertEquals(expected, actual)
    }

    @Test
    fun sixtyFour() {
        val hasher1 = Blake3.newHasher()
        hasher1.update(ByteArray(64) { it.toByte() })
        val expected = hasher1.doneHex(16)

        val hasher2 = Blake3.newHasher()
        hasher2.update(ByteArray(32) { it.toByte() })
        hasher2.update(ByteArray(32) { (it + 32).toByte() })
        val actual = hasher2.doneHex(16)

        // fails
        assertEquals(expected, actual)
    }
}
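A check like the one in this issue can be extended to every split point. The sketch below is library-agnostic. It takes any one-shot hash function and any two-part incremental hash function, and builds input with the repeating 0, 1, …, 250 byte pattern used by the official BLAKE3 test vectors. It then returns every split point where the two results disagree.

The BLAKE3 library is not part of the JDK, so the usage below plugs in java.security.MessageDigest with SHA-256 as a stand-in. The names SplitCheck, testInput, and mismatchedSplits are hypothetical.

```scala
import java.security.MessageDigest
import java.util.Arrays

object SplitCheck {
  // Input pattern used by the official BLAKE3 test vectors: 0, 1, ..., 250, 0, 1, ...
  def testInput(len: Int): Array[Byte] = Array.tabulate(len)(i => (i % 251).toByte)

  // Every split point k where hashing input(0 until k) and then input(k until n)
  // gives a different digest than hashing the whole input in one call.
  def mismatchedSplits(
      input: Array[Byte],
      oneShot: Array[Byte] => Array[Byte],
      twoPart: (Array[Byte], Array[Byte]) => Array[Byte]
  ): Seq[Int] = {
    val expected = oneShot(input)
    (0 to input.length).filterNot { k =>
      val (left, right) = input.splitAt(k)
      Arrays.equals(expected, twoPart(left, right))
    }
  }

  // Stand-ins built on the JDK's SHA-256; these are NOT BLAKE3.
  def sha256(bytes: Array[Byte]): Array[Byte] =
    MessageDigest.getInstance("SHA-256").digest(bytes)

  def sha256TwoPart(left: Array[Byte], right: Array[Byte]): Array[Byte] = {
    val md = MessageDigest.getInstance("SHA-256")
    md.update(left)
    md.update(right)
    md.digest()
  }
}
```

To exercise the library itself, you would replace the two stand-ins with functions that call Blake3.newHasher(), make one or two update calls, and finish with a done variant such as done(out: Array[Byte]). That wiring is not shown here, because the package name depends on the library version in your build.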
