xxhash's Introduction

xxhash

xxhash is a Go implementation of the 64-bit xxHash algorithm, XXH64. This is a high-quality hashing algorithm that is much faster than anything in the Go standard library.

This package provides a straightforward API:

func Sum64(b []byte) uint64
func Sum64String(s string) uint64
type Digest struct{ ... }
    func New() *Digest

The Digest type implements hash.Hash64. Its key methods are:

func (*Digest) Write([]byte) (int, error)
func (*Digest) WriteString(string) (int, error)
func (*Digest) Sum64() uint64

The package is written with optimized pure Go and also contains even faster assembly implementations for amd64 and arm64. If desired, the purego build tag opts into using the Go code even on those architectures.
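To make the algorithm concrete, here is a minimal, unoptimized one-shot sketch of XXH64 in plain Go. It was written from the published xxHash algorithm, not taken from this package, which uses a more heavily optimized Go version plus assembly. The names sum64, round, and mergeRound are illustrative only, and the seed parameter mirrors the seed accepted by the reference C implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// XXH64 prime constants from the xxHash specification.
const (
	prime1 uint64 = 0x9E3779B185EBCA87
	prime2 uint64 = 0xC2B2AE3D27D4EB4F
	prime3 uint64 = 0x165667B19E3779F9
	prime4 uint64 = 0x85EBCA77C2B2AE63
	prime5 uint64 = 0x27D4EB2F165667C5
)

// round mixes one 8-byte little-endian word into a lane accumulator.
func round(acc, input uint64) uint64 {
	acc += input * prime2
	acc = bits.RotateLeft64(acc, 31)
	return acc * prime1
}

// mergeRound folds a finished lane accumulator into the running hash.
func mergeRound(h, v uint64) uint64 {
	h ^= round(0, v)
	return h*prime1 + prime4
}

// sum64 computes XXH64 of b with the given seed, one shot, no streaming.
func sum64(b []byte, seed uint64) uint64 {
	n := len(b)
	var h uint64
	if n >= 32 {
		// Four parallel lanes, each consuming 8 bytes of every 32-byte stripe.
		v1 := seed + prime1 + prime2
		v2 := seed + prime2
		v3 := seed
		v4 := seed - prime1
		for len(b) >= 32 {
			v1 = round(v1, binary.LittleEndian.Uint64(b[0:8]))
			v2 = round(v2, binary.LittleEndian.Uint64(b[8:16]))
			v3 = round(v3, binary.LittleEndian.Uint64(b[16:24]))
			v4 = round(v4, binary.LittleEndian.Uint64(b[24:32]))
			b = b[32:]
		}
		h = bits.RotateLeft64(v1, 1) + bits.RotateLeft64(v2, 7) +
			bits.RotateLeft64(v3, 12) + bits.RotateLeft64(v4, 18)
		h = mergeRound(h, v1)
		h = mergeRound(h, v2)
		h = mergeRound(h, v3)
		h = mergeRound(h, v4)
	} else {
		h = seed + prime5
	}
	h += uint64(n)

	// Remaining tail: 8-byte words, then at most one 4-byte word, then bytes.
	for ; len(b) >= 8; b = b[8:] {
		h ^= round(0, binary.LittleEndian.Uint64(b[:8]))
		h = bits.RotateLeft64(h, 27)*prime1 + prime4
	}
	if len(b) >= 4 {
		h ^= uint64(binary.LittleEndian.Uint32(b[:4])) * prime1
		h = bits.RotateLeft64(h, 23)*prime2 + prime3
		b = b[4:]
	}
	for _, c := range b {
		h ^= uint64(c) * prime5
		h = bits.RotateLeft64(h, 11) * prime1
	}

	// Final avalanche spreads every input bit across the 64-bit result.
	h ^= h >> 33
	h *= prime2
	h ^= h >> 29
	h *= prime3
	h ^= h >> 32
	return h
}

func main() {
	fmt.Printf("%016x\n", sum64([]byte("abc"), 0))
}
```

With seed 0 this sketch should produce the same values as Sum64, but it has not been tuned or benchmarked and will be much slower than the package.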

Compatibility

This package is in a module and the latest code is in version 2 of the module. You need a version of Go with at least "minimal module compatibility" to use github.com/cespare/xxhash/v2:

  • 1.9.7+ for Go 1.9
  • 1.10.3+ for Go 1.10
  • Go 1.11 or later

I recommend using the latest release of Go.

Benchmarks

Here are some quick benchmarks comparing the pure-Go and assembly implementations of Sum64.

input size   purego      asm
4 B          1.3 GB/s    1.2 GB/s
16 B         2.9 GB/s    3.5 GB/s
100 B        6.9 GB/s    8.1 GB/s
4 KB         11.7 GB/s   16.7 GB/s
10 MB        12.0 GB/s   17.3 GB/s

These numbers were generated on Ubuntu 20.04 with an Intel Xeon Platinum 8252C CPU using the following commands under Go 1.19.2:

benchstat <(go test -tags purego -benchtime 500ms -count 15 -bench 'Sum64$')
benchstat <(go test -benchtime 500ms -count 15 -bench 'Sum64$')

Projects using this package

xxhash's People

Contributors

aleksi, cespare, davecheney, deckarep, greatroar, jooola, kataras, nagesh4193, ongardie-ebay, rfyiamcool, valyala, xstrom

xxhash's Issues

missing function

Hello,
When I try to test cmd/influxd/run in the github.com/influxdata/influxdb project at version 1.4.2, it fails with:

github.com/cespare/xxhash

/usr/share/gocode/src/github.com/cespare/xxhash/xxhash_amd64.go:10: missing function body for "Sum64"
/usr/share/gocode/src/github.com/cespare/xxhash/xxhash_amd64.go:12: missing function body for "writeBlocks"

Thank you for your help

Release xxhash v2

We'll have a minor backwards incompatibility so we should bump the major version.

Reconsider the API around seeds

In #77 I'm adding support for seeds in a way that's backward compatible with the v2 API. For v3, we should think through whether we want to support seeds a little more deeply. For example:

  • Should New and Reset just take seeds rather than having separate WithSeed variants?
  • Should the seed be stored in the Digest so that a Reset uses the same seed value? (This somewhat conflicts with the previous item.) If we do this, the seed also needs to be serialized as part of MarshalBinary.
  • Should the one-shot hash functions (Sum64/Sum64String) accept seeds, either by adding seed parameters or Sum64WithSeed variants?

These questions also interact with #65.

v2 module indirectly imports v1 module

github.com/cespare/xxhash/v2 currently requires the github.com/cespare/xxhash (v1) module:

$ go mod why github.com/cespare/xxhash
# github.com/cespare/xxhash
example/import
github.com/cespare/xxhash/v2
github.com/cespare/xxhash/v2.test
github.com/OneOfOne/xxhash
github.com/OneOfOne/xxhash.test
github.com/cespare/xxhash

A later version of github.com/OneOfOne/xxhash (v1.2.5) has moved its benchmarks to a separate sub-module, which prevents the indirect import of github.com/cespare/xxhash.

Please update your github.com/OneOfOne/xxhash dependency -- this will remove the unnecessary github.com/cespare/xxhash module reference.

Also, please consider moving your benchmarks to a sub-module since most users of your module won't be interested in the alternate modules used for benchmarking.

checksum "drift"

Hi
For the same file (XML), I get two different xxHash checksums from v1 and v2.

I am using os.Open to open the file and this is how I get and return the checksum:

h := xxhash.New()
if _, err := io.Copy(h, file); err != nil {
	log.Fatal(err)
}
r = fmt.Sprintf("%X", h.Sum(nil)) // Convert into Base16-Uppercase (QuickHash Default)

Is this wrong?

checksum mismatch

github.com/prometheus/client_golang/prometheus imports

    github.com/cespare/xxhash/v2: github.com/cespare/xxhash/v2@…: verifying module: checksum mismatch

    downloaded: h1:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=

    sum.golang.org: h1:YRXhKfTDauu4ajMg1TPgFO5jnlC2HCbmLXMcTG5cbYE=

Remove special appengine/safe code support

Appengine supports unsafe since they rolled out their new runtime a couple of years ago.

I don't think we should need any special mention of appengine now, or any build tag configuration which avoids unsafe.

Remove amd64 assembly if it isn't faster than generated Go code

Last I looked at the benchmarks (several Go releases ago), the compiler had closed a lot of ground and the amd64 assembly version was only a little faster. We should check again, and more comprehensively (i.e. against a wider array of amd64 CPUs). If the asm version has no comprehensive advantage on the latest Go version, we should delete the assembly code and stick to pure Go.

If we did that, it would be in a new major version of the package.

If the hand-rolled assembly still has an advantage, consider filing compiler bugs for any obvious deficiencies in the generated code.

Make it impossible to accidentally use an uninitialized Digest

I noticed a sharp edge of the API. It's tempting (especially if xxhash.Digest is embedded in a larger structure) to do this:

var d xxhash.Digest
... d.Write(...) ...
h := d.Sum64()

But this is broken: the zero value of Digest is not usable. You must call Reset first.

We should fix this, either by making the incorrect usage crash or by automatically calling Reset when an uninitialized Digest is used. Hopefully the branch is very predictable and doesn't add much cost.
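The "automatically call Reset" option can be sketched with a hypothetical, cut-down state type. This is not xxhash.Digest: it models only the four XXH64 lane accumulators, to show why the all-zero value is the wrong starting state (for seed 0, three of the four accumulators start non-zero) and where the extra, highly predictable branch would sit. The names lanes, reset, and ensure are illustrative.

```go
package main

import "fmt"

// Two of the XXH64 primes, as given in the xxHash specification.
const (
	prime1 uint64 = 0x9E3779B185EBCA87
	prime2 uint64 = 0xC2B2AE3D27D4EB4F
)

// lanes is a hypothetical model of an XXH64 streaming state: just the four
// lane accumulators plus an "initialized" flag.
type lanes struct {
	inited         bool
	v1, v2, v3, v4 uint64
}

// reset puts the accumulators in their starting state for seed 0.
// seed is a variable, so these sums wrap at run time instead of
// overflowing as compile-time constants.
func (l *lanes) reset() {
	var seed uint64
	l.v1 = seed + prime1 + prime2
	l.v2 = seed + prime2
	l.v3 = seed
	l.v4 = seed - prime1
	l.inited = true
}

// ensure is the extra branch the issue proposes: Write, Sum64 and friends
// would call it first, so a zero-value state initializes itself once.
func (l *lanes) ensure() {
	if !l.inited {
		l.reset()
	}
}

func main() {
	var embedded lanes // e.g. a field in a larger struct, never explicitly reset
	fmt.Println(embedded.v1 == 0)
	embedded.ensure()

	var explicit lanes
	explicit.reset()
	fmt.Println(embedded == explicit)
}
```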

Update benchmark

First, I'd like to say this is absolutely brilliant.

I updated my xxhash a little bit. I was wondering if you could try it again (or maybe show me how you ran the benchmarks, and I'll put up a PR with updated numbers).

collision likelihood?

Thanks for this awesome library. I'm thinking of using xxhash/v2 to generate primary keys based on the contents of data, but I was wondering about the likelihood of collisions, assuming large inputs (i.e., 40+ bytes). Thanks for any insights you can provide.

Rename Digest to Hash

Digest is not a good name. Normally "digest" refers to the output of the hash function.

In v3 we could consider renaming it; perhaps to Hash.

Typical code probably doesn't refer to Digest by name and doesn't have to change:

d := xxhash.New()
... call d.Write() ...
x := d.Sum64()

[INVALID] Windows binary

Dear Caleb,
Could you be so kind as to generate an .exe for those of us who are mere Windows users without a compiler?

incompatibility with Golang v1.9

I faced this error after updating golang from 1.8 to 1.9. I have deleted all installed packages just in case but this error still exists.

package math/bits: unrecognized import path "math/bits" (import path does not begin with hostname)

file: github.com/cespare/xxhash/rotate19.go

// +build go1.9

package xxhash

import "math/bits"

func rol1(x uint64) uint64  { return bits.RotateLeft64(x, 1) }
func rol7(x uint64) uint64  { return bits.RotateLeft64(x, 7) }
func rol11(x uint64) uint64 { return bits.RotateLeft64(x, 11) }
func rol12(x uint64) uint64 { return bits.RotateLeft64(x, 12) }
func rol18(x uint64) uint64 { return bits.RotateLeft64(x, 18) }
func rol23(x uint64) uint64 { return bits.RotateLeft64(x, 23) }
func rol27(x uint64) uint64 { return bits.RotateLeft64(x, 27) }
func rol31(x uint64) uint64 { return bits.RotateLeft64(x, 31) }

Any suggestions?

Add Copy method

The standard way to copy a hash is via (Un)MarshalBinary, which xxhash supports. But that's both inconvenient and inefficient for a small, fast hash like xxhash. A Copy method would be simple and fast:

func (d *Digest) Copy() *Digest

seed not implemented?

Reading the code, I did not see an implementation of the seed.
Is there a particular reason the seed is omitted, or did I misread the code?

reduce allocations when using New()

Hey Caleb,

Thanks for the library, and we should catch up soon. Anyway, we were benchmarking the following sequence:

h := cespare.New()
h.Write(p)
h.Write(o)
h.Sum64()

and found that this causes 1 memory allocation. We then changed New() from returning a hash.Hash64 to returning a *xxh. This removes the memory allocation and saves about 25ns per op on my MacBook Pro. Would you be open to that? It probably makes sense to make the xxh struct public at that point.

Second and pending the above, one of our key use cases involves hashing a uint64 followed by a string. It'd be useful to have a WriteString on *xxh that would avoid a memory allocation (analogous to Sum64String).

Thanks,
Diego

cespare/xxhash/v2 was deleted

@cespare we hit the following error when building our project with glide: [ERROR] Error scanning github.com/cespare/xxhash/v2: cannot find package "." in: /home/chamith/.glide/cache/src/https-github.com-cespare-xxhash/v2

cannot find package "github.com/cespare/xxhash/v2"

hi,

First, sorry for my English :)

I am relatively new to Go programming, but I want to dig into it. I'm trying to get the following code to work because I'm working with Go and go-ethereum.

But I always get the error below.

Despite intensive Google searching (and my meager English), I could not find a solution. Unfortunately, I don't know anybody who knows Go.

Maybe you can help me.

Many thanks

--Error--
src\github.com\VictoriaMetrics\fastcache\bigcache.go:7:2: cannot find package "github.com/cespare/xxhash/v2" in any of: C:\Go\src\github.com\cespare\xxhash\v2 (from $GOROOT) C:\Users\go\src\github.com\cespare\xxhash\v2 (from $GOPATH)
--Code--
package main

import (
	"context"
	"fmt"
	"log"
	"math"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("https://mainnet.infura.io")
	if err != nil {
		log.Fatal(err)
	}

	account := common.HexToAddress("0x71c7656ec7ab88b098defb751b7401b5f6d8976f")
	balance, err := client.BalanceAt(context.Background(), account, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(balance) // 25893180161173005034

	blockNumber := big.NewInt(5532993)
	balanceAt, err := client.BalanceAt(context.Background(), account, blockNumber)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(balanceAt) // 25729324269165216042

	fbalance := new(big.Float)
	fbalance.SetString(balanceAt.String())
	ethValue := new(big.Float).Quo(fbalance, big.NewFloat(math.Pow10(18)))
	fmt.Println(ethValue) // 25.729324269165216041

	pendingBalance, err := client.PendingBalanceAt(context.Background(), account)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(pendingBalance) // 25729324269165216042
}

go sum mismatch

while attempting to use the Prometheus client that depends on this package:
github.com/prometheus/client_golang v1.11.0
doing a go get returns the following error.

verifying github.com/cespare/xxhash/v2@…: checksum mismatch
downloaded: h1:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
go.sum: h1:6MnRN8NT7+YBpUIWxHtefFZOKTAPgGjpQSxqLNn0+qY=

SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.
The bits may have been replaced on the origin server, or an attacker may
have intercepted the download attempt.

This was working last week; now it appears the checksum has changed, which breaks my builds that use this package.

My GOSUMDB environment variable is set to:
GOSUMDB="sum.golang.org"

Please document which xxHash variant is used

Because the README mentions several common hash functions, such as XXH3, it's easy to assume this repo implements XXH3.

It doesn't. Please document which xxHash variant it implements; many people may not be able to tell from the code.

I figured it out from the closed issues.

Get a 16- or 32-byte hash from some input

I want to replace SHA-256 hashing in my program. I don't need a cryptographically strong hash function, but I do need a hash that is 16 or 32 bytes long. How can I do that?
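XXH64, the only variant this package implements, yields 8 bytes, so on its own it cannot supply a 16- or 32-byte digest. If only the standard library is available, one option for a 16-byte non-cryptographic hash is 128-bit FNV-1a from hash/fnv. This swaps in a different algorithm that processes one byte at a time, so whether it is actually faster than SHA-256 on a given machine is worth measuring. sum128 is a hypothetical helper.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sum128 returns a 16-byte, non-cryptographic digest of data using 128-bit
// FNV-1a from the standard library. This is not xxHash.
func sum128(data []byte) [16]byte {
	h := fnv.New128a()
	h.Write(data) // hash.Hash's Write never returns an error
	var out [16]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	d := sum128([]byte("hello"))
	fmt.Printf("%x\n", d[:])
}
```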

Option for seed value?

Hi there,

I notice that in the canonical C implementation, as well as in most libraries that wrap it, x32 and x64 digests accept a "seed" value. Have you given any thought to including one in your Digest API?

I personally would find it nice; my use case here is ensuring consistent digests across platforms and languages, and while I can do so today, it's at the expense of making every other platform a little more complicated (by adding a manual few "seed" bytes to the start of each digest, for example).

TL;DR:
In most xxhash libs I've seen so far, I can do something like d.Reset(seed). It'd be nice if this library allowed that option, too. Would you be open to this kind of change?

unused consts

xxhash.go:28:2: `prime3v` is unused
	prime3v = prime3
	
xxhash.go:30:2: `prime5v` is unused
	prime5v = prime5
	
xxhash.go:27:2: `prime2v` is unused
	prime2v = prime2

missing v2 visibility

I'm trying to build with govendor, and github.com/prometheus/client_golang references xxhash/v2, specifically github.com/cespare/xxhash/v2 v2.1.0. Is that version available for build purposes?

unnecessary conversion

xxhsum/xxhsum.go:19:52: unnecessary conversion
	if len(os.Args) < 2 || len(os.Args) == 2 && string(os.Args[1]) == "-" {
xxhash_test.go:247:17: unnecessary conversion
			return uint64(h.Sum64())
xxhash_test.go:252:17: unnecessary conversion
			return uint64(h.Sum64())