colinmarc / cdb
A native golang implementation of cdb (http://cr.yp.to/cdb.html)
License: MIT License
Similar to https://github.com/pcarrier/cdb64?
Hi!
My company requires that I list all third-party libraries and their respective licences that I use in my code. Would you mind adding a licence to this repo so that I can continue using it? Thanks!
I'd just like to know roughly how fast it is.
And how does this compare to https://github.com/jbarham/go-cdb ?
I recompiled with Go 1.17 on an iMac M1 and am reading a previously created cdb database.
The iterator works fine: I see all keys/values.
Get, however, does not work. It stops at
if slotHash == 0 {
What is wrong?
Currently, when one calls cdb.Get(key), if the caller has the key on the stack, like key := [4]byte{0, 1, 2, 3}, the Go compiler decides that the value "escapes" and thus must be allocated on the heap. (The reason is that cdb.hash uses that key value, and an arbitrary implementation might decide to keep a slice of the key.)
This can be spotted with the following simple test:
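package cdb_test

import (
	"testing"

	"github.com/colinmarc/cdb"
	"github.com/stretchr/testify/require"
)

// cdb.OpenMmap presumably comes from the mmap patch referred to in this issue.
func BenchmarkStackEscapeYes(b *testing.B) {
	db, err := cdb.OpenMmap("./test/test.cdb")
	require.NoError(b, err)
	require.NotNil(b, db)
	for i := 0; i < b.N; i++ {
		keyOnStack := [2]byte{'X', byte(i)} // escapes to the heap via db.Get
		keySlice := keyOnStack[:]
		db.Get(keySlice)
	}
}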
If compiled with go test -gcflags -m, one can see the message ./noescape_test.go:32:3: moved to heap: keyOnStack.
The following two patches solve this issue:
- a NoEscapeBytes function based on what Go's runtime does internally (https://github.com/golang/go/blob/ecb2f231fa41b581319505139f8d5ac779763bee/src/runtime/stubs.go#L172-L181);
- a Get function that uses NoEscapeBytes, which tricks the Go compiler into not requiring the key to be heap-allocated.
The improvement is roughly 14 ns per call (23.12 ns/op vs. 9.368 ns/op below), which, when also using the mmap patch sent earlier, is more than half of the total runtime.
BenchmarkStackEscapeNo-4 1000000000 9.368 ns/op 0 B/op 0 allocs/op
BenchmarkStackEscapeYes-4 1000000000 23.12 ns/op 2 B/op 1 allocs/op
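A minimal sketch of the technique, assuming Go 1.17+ (for unsafe.Slice). The noescape helper copies the pattern from the runtime file linked above; NoEscapeBytes as written here and the stand-in hash variable are illustrative and not necessarily what the patch contains. Escape-analysis details can vary between Go versions, so the effect should be confirmed with -gcflags -m.

```go
package main

import (
	"fmt"
	"unsafe"
)

// noescape hides p from escape analysis, mirroring the helper in Go's
// runtime/stubs.go. XOR with 0 is a no-op at run time, but because the
// result is computed from a uintptr variable, the compiler no longer
// tracks it back to p.
//
//go:nosplit
func noescape(p unsafe.Pointer) unsafe.Pointer {
	x := uintptr(p)
	return unsafe.Pointer(x ^ 0)
}

// NoEscapeBytes returns a slice aliasing b that the compiler does not treat
// as escaping. It is only safe if the callee never retains the slice.
func NoEscapeBytes(b []byte) []byte {
	if len(b) == 0 {
		return nil
	}
	p := noescape(unsafe.Pointer(&b[0]))
	return unsafe.Slice((*byte)(p), len(b))
}

// hash is a hypothetical stand-in for a pluggable hasher called through a
// function value; the compiler cannot prove such a call keeps no reference
// to its argument, so keys passed to it directly would escape.
var hash = func(b []byte) uint32 {
	var h uint32
	for _, c := range b {
		h += uint32(c)
	}
	return h
}

func main() {
	key := [2]byte{'X', 7} // expected to stay on the stack thanks to NoEscapeBytes
	fmt.Println(hash(NoEscapeBytes(key[:]))) // 95
}
```

The trick is only sound because the hasher does not keep the slice after returning; if it did, the slice would point into a dead stack frame.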
Low overhead: A database uses 2048 bytes, plus 24 bytes per record, plus the space for keys and data.
By my own calculation, each record needs four numbers: len(key), len(value), the hash value, and the entry offset. Each is 4 bytes, so the total overhead should be 16 bytes per record. Where do the 24 bytes come from?
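The 24 bytes can be accounted for from the format description at http://cr.yp.to/cdb.html: each record starts with two 4-byte lengths, and each hash-table slot holds a 4-byte hash and a 4-byte record position. The remaining 8 bytes come from cdbmake sizing each hash table at twice its number of entries, so every record accounts for two 8-byte slots and half of all slots stay empty. The sketch below does the arithmetic; cdbSize is a hypothetical helper, and it assumes the writer follows cdbmake's doubling convention.

```go
package main

import "fmt"

const (
	headerBytes    = 256 * 8 // 256 (table position, slot count) pairs of uint32
	recordHeader   = 8       // key length + value length, uint32 each
	slotBytes      = 8       // (hash, record position), uint32 each
	slotsPerRecord = 2       // assumption: tables have twice as many slots as entries
)

// cdbSize returns the expected file size for records with the given
// key and value lengths.
func cdbSize(keyLens, valLens []int) int {
	size := headerBytes
	for i := range keyLens {
		size += recordHeader + keyLens[i] + valLens[i] + slotsPerRecord*slotBytes
	}
	return size
}

func main() {
	fmt.Println(recordHeader + slotsPerRecord*slotBytes) // 24
	// Two records ("alpha"/"1" and "beta"/"22"): 2048 + 2*24 + 12 bytes.
	fmt.Println(cdbSize([]int{5, 4}, []int{1, 2})) // 2108
}
```

So the 16 bytes in the question are the record header plus one occupied slot; the extra 8 bytes per record are the empty slots that keep probe sequences short.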
Currently the CDB reader uses io.ReaderAt, which issues a read system call (or equivalent) for each read operation. An alternative would be to use mmap-ed memory to access the backing file.
In my experiments (see the patches below), using mmap yields a roughly 35x performance improvement:
BenchmarkGet-4 13902710 2571 ns/op 28 B/op 2 allocs/op
BenchmarkGetMmap-4 484713384 73.96 ns/op 0 B/op 0 allocs/op
The following patches provide such support; if they are accepted, I can submit a pull request:
- a []byte buffer that backs the CDB reader; such a buffer can be either the result of an mmap or simply the entire file read into memory;
- an io.Closer to be called on cdb.Close (used to munmap the memory if needed);
- mmap support.
The only semantic change is the following:
- the []byte value returned by Get should not be written to by the application, as it is actually a slice of the mmap-ed file;
- the []byte value can't be used after cdb.Close has been called.
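To illustrate why the returned value becomes read-only and tied to the mapping, here is a minimal sketch of a zero-copy lookup over an in-memory cdb image, following the format at http://cr.yp.to/cdb.html (little-endian uint32s, the hash seeded with 5381, linear probing, and a slot with position 0 marking an empty slot). The build and get helpers are hypothetical and are not colinmarc/cdb's implementation; get assumes a well-formed image and does no bounds checking.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// cdbHash is the cdb hash: h = ((h << 5) + h) ^ c, starting at 5381.
func cdbHash(key []byte) uint32 {
	h := uint32(5381)
	for _, c := range key {
		h = ((h << 5) + h) ^ uint32(c)
	}
	return h
}

func appendU32(b []byte, v uint32) []byte {
	var tmp [4]byte
	binary.LittleEndian.PutUint32(tmp[:], v)
	return append(b, tmp[:]...)
}

// build creates an in-memory cdb image (demo only): header, records, then
// 256 hash tables, each with twice as many slots as entries.
func build(pairs [][2]string) []byte {
	type slot struct{ hash, pos uint32 }
	var buckets [256][]slot
	buf := make([]byte, 2048)
	for _, kv := range pairs {
		h := cdbHash([]byte(kv[0]))
		buckets[h&0xff] = append(buckets[h&0xff], slot{h, uint32(len(buf))})
		buf = appendU32(buf, uint32(len(kv[0])))
		buf = appendU32(buf, uint32(len(kv[1])))
		buf = append(buf, kv[0]...)
		buf = append(buf, kv[1]...)
	}
	for i, bucket := range buckets {
		n := uint32(2 * len(bucket))
		binary.LittleEndian.PutUint32(buf[i*8:], uint32(len(buf)))
		binary.LittleEndian.PutUint32(buf[i*8+4:], n)
		table := make([]slot, n)
		for _, s := range bucket {
			j := (s.hash >> 8) % n
			for table[j].pos != 0 { // linear probing
				j = (j + 1) % n
			}
			table[j] = s
		}
		for _, s := range table {
			buf = appendU32(buf, s.hash)
			buf = appendU32(buf, s.pos)
		}
	}
	return buf
}

// get looks up key without copying: the result aliases db.
func get(db, key []byte) ([]byte, bool) {
	le := binary.LittleEndian
	h := cdbHash(key)
	tpos := le.Uint32(db[(h&0xff)*8:])
	n := le.Uint32(db[(h&0xff)*8+4:])
	if n == 0 {
		return nil, false
	}
	j := (h >> 8) % n
	for i := uint32(0); i < n; i++ {
		slot := db[tpos+j*8:]
		pos := le.Uint32(slot[4:])
		if pos == 0 {
			return nil, false // empty slot: the key is absent
		}
		if le.Uint32(slot) == h {
			klen, dlen := le.Uint32(db[pos:]), le.Uint32(db[pos+4:])
			start := pos + 8 + klen
			if bytes.Equal(db[pos+8:start], key) {
				return db[start : start+dlen : start+dlen], true
			}
		}
		j = (j + 1) % n
	}
	return nil, false
}

func main() {
	db := build([][2]string{{"alpha", "1"}, {"beta", "22"}})
	v, ok := get(db, []byte("beta"))
	fmt.Println(len(db), string(v), ok) // 2108 22 true
}
```

Because get returns a subslice of db, writing to the result would modify the underlying buffer (or crash, if it is mapped read-only), and it becomes invalid once the memory is unmapped. The full slice expression caps the result's capacity, so an append by the caller reallocates instead of overwriting the next record.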