cdb's People

Contributors

colinmarc

cdb's Issues

Licence

Hi!

My company requires that I list all third-party libraries and their respective licences that I use in my code. Would you mind adding a licence to this repo so that I can continue using it? Thanks!

The hash is not finding the keys anymore

I recompiled with Go 1.17 on an M1 iMac.

I am reading a previously created cdb database.

The iterator works fine: I can see all the keys and values.

`Get`, however, does not work. It stops at
`if slotHash == 0 {`

What is wrong?
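For reference, here is a Go sketch of the lookup arithmetic from D. J. Bernstein's cdb format description, assuming the file follows that format. The hash starts at 5381 and folds in each byte as `h = ((h << 5) + h) ^ c` in 32-bit arithmetic; `h % 256` selects one of the 256 tables listed in the 2048-byte header, and `(h >> 8) % slots` selects the first slot to probe in that table. The spec marks an empty slot by a position of 0 (writers zero the whole slot, so its hash also reads as 0), and reaching one ends the search. Since the iterator shows the keys are present, stopping at an empty slot suggests the hash computed at lookup time differs from the one used when the file was written; comparing `cdbHash` of a known key against the stored slot values is one way to check. The `slot` type, `tableAndStart`, `probe`, and the `keyMatches` callback are hypothetical names, not this library's API.

```go
package main

import "fmt"

// cdbHash is the hash from the cdb format description:
// start at 5381, then for each byte h = ((h << 5) + h) ^ c, in 32-bit arithmetic.
func cdbHash(key []byte) uint32 {
	h := uint32(5381)
	for _, c := range key {
		h = ((h << 5) + h) ^ uint32(c)
	}
	return h
}

// slot mirrors one 8-byte hash-table slot: the key's hash and the record position.
type slot struct {
	hash uint32
	pos  uint32
}

// tableAndStart returns which of the 256 tables a hash belongs to and the
// slot at which probing starts in a table with numSlots slots.
func tableAndStart(h, numSlots uint32) (table, start uint32) {
	return h % 256, (h >> 8) % numSlots
}

// probe walks one table linearly from the start slot. An empty slot
// (position 0) means the key is not in the table. keyMatches is a
// hypothetical callback that compares the key stored at pos.
func probe(slots []slot, h uint32, keyMatches func(pos uint32) bool) (uint32, bool) {
	n := uint32(len(slots))
	if n == 0 {
		return 0, false
	}
	_, start := tableAndStart(h, n)
	for i := uint32(0); i < n; i++ {
		s := slots[(start+i)%n]
		if s.pos == 0 {
			return 0, false // empty slot: stop searching
		}
		if s.hash == h && keyMatches(s.pos) {
			return s.pos, true
		}
	}
	return 0, false
}

func main() {
	key := []byte("a")
	h := cdbHash(key)
	table, start := tableAndStart(h, 2)
	slots := make([]slot, 2) // a two-slot table holding one record
	slots[start] = slot{hash: h, pos: 2048}
	pos, ok := probe(slots, h, func(p uint32) bool { return p == 2048 })
	fmt.Println(h, table, pos, ok) // prints: 177604 196 2048 true
}
```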

Make sure that the `key` argument on `Get` doesn't need to be allocated on the heap

Currently, when one calls `cdb.Get(key)` with a key that lives on the caller's stack, such as `key := [4]byte{0, 1, 2, 3}`, the Go compiler decides that the value "escapes" and must therefore be allocated on the heap. (The reason is that `cdb.hash` uses the key, and an arbitrary implementation might keep a slice of it.)

This can be seen with the following simple benchmark:

```go
func BenchmarkStackEscapeYes(b *testing.B) {
	db, err := cdb.OpenMmap("./test/test.cdb")
	require.NoError(b, err)
	require.NotNil(b, db)

	for i := 0; i < b.N; i++ {
		// The key starts out on the stack; escape analysis decides whether it stays there.
		keyOnStack := [2]byte{'X', byte(i)}
		keySlice := keyOnStack[:]
		db.Get(keySlice)
	}
}
```

When compiled with `go test -gcflags -m`, the compiler reports `./noescape_test.go:32:3: moved to heap: keyOnStack`.

The following two patches solve this issue:

The performance improvement is about 10 ns per call, which, combined with the mmap patch sent earlier, represents 50% of the total runtime.

```
BenchmarkStackEscapeNo-4        1000000000               9.368 ns/op           0 B/op          0 allocs/op
BenchmarkStackEscapeYes-4       1000000000              23.12 ns/op            2 B/op          1 allocs/op
```
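One way to keep the key on the stack, sketched here as an assumption and not necessarily what the two patches do, is to compute the cdb hash in a concrete function instead of through a `hash.Hash32` value chosen at run time. When `Write` is called through an interface the compiler cannot resolve, escape analysis must assume the implementation might retain the slice, so the parameter is reported as leaking; a plain function that only reads the bytes lets the compiler prove the key does not escape. `cdbHasher`, `reader`, `newHash`, `hashViaInterface`, and `hashDirect` are hypothetical names. Building the sketch with `go build -gcflags -m` should show the difference, although the exact diagnostic wording varies between Go versions.

```go
package main

import (
	"fmt"
	"hash"
)

// cdbHasher wraps the cdb hash in the standard hash.Hash32 interface,
// standing in for a reader that accepts a pluggable hash.
type cdbHasher struct{ h uint32 }

func newCDBHasher() hash.Hash32 { return &cdbHasher{h: 5381} }

func (d *cdbHasher) Write(p []byte) (int, error) {
	for _, c := range p {
		d.h = ((d.h << 5) + d.h) ^ uint32(c)
	}
	return len(p), nil
}

func (d *cdbHasher) Sum(b []byte) []byte {
	return append(b, byte(d.h>>24), byte(d.h>>16), byte(d.h>>8), byte(d.h))
}
func (d *cdbHasher) Reset()         { d.h = 5381 }
func (d *cdbHasher) Size() int      { return 4 }
func (d *cdbHasher) BlockSize() int { return 1 }
func (d *cdbHasher) Sum32() uint32  { return d.h }

// reader is a hypothetical stand-in for a cdb reader with a pluggable hash.
type reader struct {
	newHash func() hash.Hash32
}

// hashViaInterface passes key to an interface method the compiler cannot
// see through, so escape analysis must assume Write may retain it.
func (r *reader) hashViaInterface(key []byte) uint32 {
	h := r.newHash()
	h.Write(key)
	return h.Sum32()
}

// hashDirect only reads key inside a concrete function, so key need not escape.
func hashDirect(key []byte) uint32 {
	h := uint32(5381)
	for _, c := range key {
		h = ((h << 5) + h) ^ uint32(c)
	}
	return h
}

func main() {
	keyOnStack := [2]byte{'X', 1}
	r := &reader{newHash: newCDBHasher}
	// Both paths compute the same hash; only their escape behavior differs.
	fmt.Println(r.hashViaInterface(keyOnStack[:]) == hashDirect(keyOnStack[:])) // prints: true
}
```

The trade-off is flexibility: a fixed hash function cannot be swapped by callers, but since the cdb format fixes the hash anyway, little is lost.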

Why are there 24 bytes of overhead per record?

Low overhead: A database uses 2048 bytes, plus 24 bytes per record, plus the space for keys and data.

By my own calculation, each record needs four numbers: len(key), len(value), the hash value, and the entry offset. Each is 4 bytes, for a total of 16 bytes of overhead per record. Where do the 24 bytes come from?
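The 16-byte estimate counts one hash-table slot per record, but `cdbmake` sizes each hash table at twice the number of records it holds, so the tables stay at most half full and probe sequences stay short. Each record therefore costs 8 bytes for its key and data lengths plus two 8-byte slots (hash and position): 8 + 2 × 8 = 24 bytes. The following sketch turns that accounting into a size calculation; `record` and `dbSize` are illustrative names, and it assumes the standard layout with no padding.

```go
package main

import "fmt"

// Layout constants from the cdb format description (assumed standard layout).
const (
	headerBytes    = 256 * 8 // 256 (position, length) pairs pointing at the hash tables
	lengthBytes    = 4 + 4   // key length and data length stored before each record
	slotBytes      = 4 + 4   // one hash-table slot: hash value and record position
	slotsPerRecord = 2       // each table gets twice as many slots as records
)

// record is an illustrative key/value pair.
type record struct {
	key, value []byte
}

// dbSize returns the file size a writer following that layout would produce.
func dbSize(records []record) int {
	size := headerBytes
	for _, r := range records {
		size += lengthBytes + slotsPerRecord*slotBytes + len(r.key) + len(r.value)
	}
	return size
}

func main() {
	fmt.Println(lengthBytes + slotsPerRecord*slotBytes)                   // prints: 24
	fmt.Println(dbSize([]record{{key: []byte("a"), value: []byte("b")}})) // prints: 2074
}
```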

Add support for an `mmap`-backed CDB reader (yielding a 35× improvement)

Currently the CDB reader uses an io.ReaderAt, which issues a read system call (or equivalent) for every read operation. An alternative is to use mmap-ed memory to access the backing file.

In my experiments (see the patches below), using mmap yields a 35× performance improvement:

```
BenchmarkGet-4          13902710              2571 ns/op              28 B/op          2 allocs/op
BenchmarkGetMmap-4      484713384               73.96 ns/op            0 B/op          0 allocs/op
```

The following patches provide such support. If they are accepted, I can submit a pull request:

  • cipriancraciun@6c948e5 -- adds support for backing the CDB reader with a []byte buffer; the buffer can be either the result of an mmap or simply the entire file read into memory;
  • cipriancraciun@0b1bb73 -- adds support for a custom io.Closer to be called on cdb.Close (it will be used to munmap the memory if needed);
  • cipriancraciun@924564b -- adds the actual mmap support.

The only semantic changes are the following:

  • the []byte value returned by Get should not be written to by the application, since it is actually a slice of the mmap-ed file;
  • the []byte value cannot be used after cdb.Close has been called;
  • if this is not acceptable and one wants to keep the current (undocumented) semantics, one could clone the slice, incurring an extra allocation that my patch avoids.
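The aliasing behind these rules can be shown with an ordinary slice standing in for the mapped file (`getView` and `getCopy` are hypothetical helpers, not part of the library). A value returned as a sub-slice shares memory with its backing buffer, so writes show through; with a read-only mapping such a write would instead fault and crash the program. Cloning restores the old "caller owns the result" semantics at the cost of one allocation per call.

```go
package main

import "fmt"

// getView returns a sub-slice of backing without copying, like a Get that
// returns a slice of the mapped file. The capacity is capped so appends
// by the caller cannot overwrite neighboring bytes.
func getView(backing []byte, off, n int) []byte {
	return backing[off : off+n : off+n]
}

// getCopy clones the value so the caller owns it, at the cost of an allocation.
func getCopy(backing []byte, off, n int) []byte {
	return append([]byte(nil), backing[off:off+n]...)
}

func main() {
	backing := []byte("keyvalue")
	view := getView(backing, 3, 5)
	owned := getCopy(backing, 3, 5)
	view[0] = 'V' // writes through to backing; owned is unaffected
	fmt.Println(string(backing), string(view), string(owned)) // prints: keyValue Value value
}
```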
