mtbl's Introduction

mtbl: immutable sorted string table library

Introduction

mtbl is a C library implementation of the Sorted String Table (SSTable) data structure, based on the SSTable implementation in the open source Google LevelDB library. An SSTable is a file containing an immutable mapping of keys to values. Keys are stored in sorted order, with an index at the end of the file allowing keys to be located quickly.

mtbl is not a database library. It does not provide an updateable key-value data store, but rather exposes primitives for creating, searching and merging SSTable files. Unlike databases which use the SSTable data structure internally as part of their data store, management of SSTable files -- creation, merging, deletion, combining of search results from multiple SSTables -- is left to the discretion of the mtbl library user.

mtbl SSTable files consist of a sequence of data blocks containing sorted key-value pairs, where keys and values are arbitrary byte arrays. Data blocks are optionally compressed using the zlib, LZ4, zstd, or Snappy compression algorithms. The data blocks are followed by an index block, allowing for fast searches over the keyspace.

The basic mtbl interface is the writer, which receives a sequence of key-value pairs in sorted order with no duplicate keys, and writes them to data blocks in the SSTable output file. An index containing offsets to data blocks and the last key in each data block is buffered in memory until the writer object is closed, at which point the index is written to the end of the SSTable file. This allows SSTable files to be written in a single pass with sequential I/O operations only.
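The writer's input contract (sorted order, no duplicate keys) can be illustrated with a small self-contained sketch. It assumes keys are compared byte-wise, with a shorter key sorting before any longer key that it prefixes. The names `struct key`, `key_compare` and `keys_strictly_increasing` are hypothetical helpers for illustration and are not part of the mtbl API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key type for illustration: an arbitrary byte array. */
struct key {
    const uint8_t *data;
    size_t len;
};

/* Byte-wise comparison; on a shared prefix, the shorter key sorts first. */
static int key_compare(const struct key *a, const struct key *b)
{
    size_t n = a->len < b->len ? a->len : b->len;
    int c = (n > 0) ? memcmp(a->data, b->data, n) : 0;
    if (c != 0)
        return c;
    if (a->len == b->len)
        return 0;
    return a->len < b->len ? -1 : 1;
}

/* Returns 1 if every key is strictly greater than its predecessor. */
static int keys_strictly_increasing(const struct key *keys, size_t count)
{
    assert(keys != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (key_compare(&keys[i - 1], &keys[i]) >= 0)
            return 0; /* out of order, or a duplicate key */
    }
    return 1;
}
```

A caller could run a check like this over a batch before handing it to a writer. Input that fails the check is the kind of data the sorter interface is meant to handle instead.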

Once written, SSTable files can be searched using the mtbl reader interface. Searches can retrieve key-value pairs based on an exact key match, a key prefix match, or a key range. Results are retrieved using a simple iterator interface.

The mtbl library also provides two utility interfaces which facilitate a sort-and-merge workflow for bulk data loading. The sorter interface receives arbitrarily ordered key-value pairs and provides them in sorted order, buffering to disk as needed. The merger interface reads from multiple SSTables simultaneously and provides the key-value pairs from the combined inputs in sorted order. Since mtbl does not allow duplicate keys in an SSTable file, both the sorter and merger interfaces require a caller-provided merge function which will be called to merge multiple values for the same key. These interfaces also make use of sequential I/O operations only.
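As a rough illustration of what a caller-provided merge function might look like, the sketch below sums two values stored under the same key. The parameter list is modeled on the `mtbl_merge_func` callback type as I understand it and should be checked against mtbl.h before use. The value format (an 8-byte little-endian counter) and the helper names `load_le64`, `store_le64` and `merge_sum` are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode a 64-bit little-endian counter (hypothetical value format). */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Encode a 64-bit little-endian counter. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Merge two values stored under the same key by summing them.
 * The merged value is heap-allocated; the caller is assumed to free it.
 * On allocation failure, *merged_val is left NULL.
 */
static void merge_sum(void *clos,
                      const uint8_t *key, size_t len_key,
                      const uint8_t *val0, size_t len_val0,
                      const uint8_t *val1, size_t len_val1,
                      uint8_t **merged_val, size_t *len_merged_val)
{
    (void)clos;
    (void)key;
    (void)len_key;
    assert(len_val0 == 8 && len_val1 == 8);
    *len_merged_val = 0;
    *merged_val = malloc(8);
    if (*merged_val == NULL)
        return;
    store_le64(*merged_val, load_le64(val0) + load_le64(val1));
    *len_merged_val = 8;
}
```

Summation is only one possible policy. A merge function could just as well keep the newer value or concatenate both, as long as it returns a single value for the key.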

The mtbl file format was changed in version 1.0.0. Older versions cannot read the new file format, but newer versions can read both formats.

mtbl's People

Contributors

alesage, cmikk, djw1149, dvladi77, edmonds, fd00, hstern, kelvinatorr, mcrawforddt, mschiffm, reedjc, shw700, skempdt, snarkmaster

mtbl's Issues

Implementing multi-threaded sort

I was entertaining the idea of using multiple mtbl_sorter objects, one per thread, to get my key sorting done faster. The general plan was to manually coordinate a merge binary tree, with the number of active threads decreasing by a constant factor (2 or 3) every time.

This seems like a good idea to try, since my machines have lots of cores. One catch is that an mtbl_sorter does not export a mtbl_source interface, which means that you cannot compose the output of mtbl_sorters with mtbl_merger without dumping it to disk -- which would kind of defeat the point.

This is obviously outside what mtbl was designed for initially, but I thought you might be interested to know.

For now, I'm going with sorting and merging vectors using the standard library, and feeding the sorted values to mtbl.

May I move trailer_read and trailer_write into the public API?

It's really useful to be able to query t.bytes_keys, t.bytes_values, since this lets me pre-allocate memory when I want to load the whole mtbl file into RAM.

Specifically, I just moved the struct & trailer_read / trailer_write into mtbl.h, and prepended mtbl_ to their names.

Amalgamated sources

Hello,

I am currently working on Rust bindings for the mtbl library. A binding library already exists, but it requires the mtbl library to be installed by an external package manager. I would like to ship the sources directly inside the crate itself, so that there are no binding compatibility issues or missing dependencies.

Is there a way to get the amalgamated sources, so that I can ship them with the crate? Is the tarball in the releases what I want?

I also saw that the mtbl library requires several compression libraries to compile (lz4, zstd, zlib, snappy), even if the user only wants to use one of them. Is there a way to compile with only one, or none, of the compression features?

Thank you for your work!

mtbl_verify periodic progress always 0 blocks / 0%

The mtbl_verify periodic progress always reads "0 out of ... blocks (0.00%)", which doesn't make sense.

I ran it on 36 files with real data. Verification reported OK, but the progress was always zero.

The manpage is too brief about it.

fixed size decompression buffer

For compressed blocks, get_block() in mtbl/reader.c allocates a fixed-size buffer to decompress into. This can cause decompression to fail if the data block contains highly compressible data, because the decompressed output can be larger than the buffer.
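One possible way around a fixed-size output buffer is to grow the buffer and retry. The sketch below is not mtbl's code: it uses a toy run-length decoder (`rle_decode`, hypothetical) as a stand-in for a real decompressor, and `decode_with_growth` (also hypothetical) doubles the buffer up to a hard cap until the output fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Stand-in "decompressor": run-length decoding of (count, byte) pairs.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int rle_decode(const uint8_t *in, size_t in_len,
                      uint8_t *out, size_t out_cap, size_t *out_len)
{
    size_t pos = 0;
    for (size_t i = 0; i + 1 < in_len; i += 2) {
        size_t run = in[i];
        if (run > out_cap - pos)
            return -1;
        for (size_t j = 0; j < run; j++)
            out[pos++] = in[i + 1];
    }
    *out_len = pos;
    return 0;
}

/*
 * Start from an initial guess and double the buffer until decoding fits
 * or max_cap has been tried. Returns a malloc'd buffer, or NULL on failure.
 */
static uint8_t *decode_with_growth(const uint8_t *in, size_t in_len,
                                   size_t initial_cap, size_t max_cap,
                                   size_t *out_len)
{
    assert(initial_cap > 0 && initial_cap <= max_cap);
    for (size_t cap = initial_cap; ; ) {
        uint8_t *buf = malloc(cap);
        if (buf == NULL)
            return NULL;
        if (rle_decode(in, in_len, buf, cap, out_len) == 0)
            return buf;
        free(buf);
        if (cap == max_cap)
            return NULL; /* even the largest allowed buffer was too small */
        cap = (cap > max_cap / 2) ? max_cap : cap * 2;
    }
}
```

The hard cap keeps a corrupt or hostile block from forcing an unbounded allocation. Formats that record the uncompressed size could avoid the retry loop altogether by allocating the exact size up front.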

Question about the minimum block length

While re-implementing MTBL in Rust, I found a strange hard-coded value: the minimum block length. My program crashed when I kept using the 4 fixed-length 32-bit integers (16 bytes), so I computed the value myself and found that, now that MTBL V2 uses varints, the minimum block length is 13 bytes.

I may be missing something.

mtbl/mtbl/reader.c

Lines 171 to 178 in bbfd07d

/**
* Sanitize the index block offset.
* We calculate the maximum possible index block offset for this file to
* be the total size of the file (r->len_data) minus the length of the
* metadata block (MTBL_METADATA_SIZE) minus the length of the minimum
* sized block, which requires 4 fixed-length 32-bit integers (16 bytes).
*/
const uint64_t max_index_block_offset = r->len_data - MTBL_METADATA_SIZE - 16;

compressed block (8)
block length varint encoded (1)
crc (4)
bytes written (13)
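The arithmetic in this breakdown can be reproduced with a short sketch. It assumes a base-128 varint (7 payload bits per byte) for the length prefix and a 4-byte CRC, as listed above. The names `varint_length` and `block_record_size` are hypothetical, and the sketch only adds up sizes; it says nothing about the order of the fields in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode v as a base-128 varint. */
static size_t varint_length(uint64_t v)
{
    size_t len = 1;
    while (v >= 128) {
        v >>= 7;
        len++;
    }
    return len;
}

/* Bytes written for one block: varint length prefix + CRC + block bytes. */
static size_t block_record_size(size_t block_len)
{
    const size_t crc_len = 4; /* 32-bit CRC, as in the breakdown above */
    return varint_length(block_len) + crc_len + block_len;
}
```

With an 8-byte block this gives 1 + 4 + 8 = 13 bytes, whereas a fixed 32-bit length field would give 4 + 4 + 8 = 16, which matches the constant in reader.c.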

Maybe an overflow bug in the bytes_shortest_separator function

} else if (diff_index < min_length - sizeof(uint16_t)) {

I found out that min_length can be smaller than sizeof(uint16_t), in which case the unsigned subtraction wraps around to a very large value. I fixed this bug by using a saturating_sub in my Rust port, but I think a simple rewrite can do the job in C too.

else if (min_length >= sizeof(uint16_t) && diff_index < min_length - sizeof(uint16_t))
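The difference between the two conditions can be shown in isolation. This is a minimal sketch that extracts only the comparison; the function names `below_limit_unchecked` and `below_limit_guarded` are hypothetical and the rest of bytes_shortest_separator is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original form: the subtraction wraps when min_length < sizeof(uint16_t). */
static int below_limit_unchecked(size_t diff_index, size_t min_length)
{
    return diff_index < min_length - sizeof(uint16_t);
}

/* Guarded form from the proposed fix: test the size before subtracting. */
static int below_limit_guarded(size_t diff_index, size_t min_length)
{
    return min_length >= sizeof(uint16_t) &&
           diff_index < min_length - sizeof(uint16_t);
}
```

With min_length = 1, the unchecked form computes 1 - 2 in size_t arithmetic, which wraps to SIZE_MAX, so the comparison is true for any realistic diff_index. The guarded form returns false in that case and agrees with the unchecked form whenever min_length is at least 2.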

Now that I have found and fixed the bug on my side, I can go back to playing Hades 😄

Should I contribute a C++11 RAII wrapper for MTBL objects?

I've found this wrapper useful:

namespace mtbl {

/**
 * RAII wrappers for mtbl_* objects. Think of these as std::unique_ptr with
 * custom constructors and deleters.  Sample usage:
 *
 *   mtbl::SorterOptions sopt;
 *   mtbl_sorter_options_set_max_memory(sopt(), 1 << 30);  // 1GB
 *   mtbl::Sorter sorter(sopt());  // Does *not* take ownership of sopt.
 */
template <
  typename PtrT,
  typename InitFuncT,
  InitFuncT initFunc,
  typename DestroyFuncT,
  DestroyFuncT destroyFunc
>
class MtblPtr : boost::noncopyable {
public:
  using type = MtblPtr<PtrT, InitFuncT, initFunc, DestroyFuncT, destroyFunc>;

  template <typename... Args>
  MtblPtr(Args&&... args) : p_(initFunc(std::forward<Args>(args)...)) {
    if (p_ == nullptr) {
      throw std::bad_alloc();
    }
  }  
  ~MtblPtr() { reset(nullptr); }

  // Movable
  MtblPtr(type&& m) : p_(m.p_) { m.p_ = nullptr; }
  type& operator=(type&& m) {
    reset(m.p_);
    m.p_ = nullptr;
    return *this;  // A non-void function must return a value
  }

  PtrT operator()() { return p_; }

  PtrT release() {
    auto res = p_;
    p_ = nullptr; 
    return res;   
  }

  void reset(PtrT ptr) {
    if (p_ != nullptr) {  // Redundant, since mtbl checks this too
      destroyFunc(&p_);
    }
    p_ = ptr;
  }

private:
  PtrT p_;
};

// These were made by searching for '_init(' in mtbl.h:

typedef MtblPtr<
  mtbl_iter*,   
  decltype(&mtbl_iter_init),
  &mtbl_iter_init,  
  decltype(&mtbl_iter_destroy),
  &mtbl_iter_destroy 
> Iter;

typedef MtblPtr<
  mtbl_source*, 
  decltype(&mtbl_source_init),
  &mtbl_source_init,  
  decltype(&mtbl_source_destroy),
  &mtbl_source_destroy 
> Source;

...

Are you interested in including this as, e.g. mtbl.hpp?

Test suite doesn't compile when built from source tarball

Hi,

It looks like the mtbl 1.0.0 tarball doesn't include all the files needed to build the test suite:

edmonds@chase{0}:/tmp$ wget -nv https://dl.farsightsecurity.com/dist/mtbl/mtbl-1.0.0.tar.gz
2017-03-26 21:14:37 URL:https://dl.farsightsecurity.com/dist/mtbl/mtbl-1.0.0.tar.gz [414863/414863] -> "mtbl-1.0.0.tar.gz" [1]
edmonds@chase{0}:/tmp$ tar xf mtbl-1.0.0.tar.gz 
edmonds@chase{0}:/tmp$ cd mtbl-1.0.0
edmonds@chase{0}:/tmp/mtbl-1.0.0$ ./configure && make -j8 && make -j8 check
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking for gcc option to accept ISO C99... none needed
checking for gcc option to accept ISO Standard C... (cached) none needed
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h usability... no
checking minix/config.h presence... no
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... no
checking whether make supports nested variables... (cached) yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to convert x86_64-unknown-linux-gnu file names to x86_64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for mt... mt
checking if mt is a manifest tool... no
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for liblz4... yes
checking for LZ4_compress_HC in -llz4... yes
checking for libzstd... yes
checking whether byte ordering is bigendian... no
checking for mkstemp... yes
checking for posix_madvise... yes
checking for madvise... yes
checking sys/endian.h usability... no
checking sys/endian.h presence... no
checking for sys/endian.h... no
checking endian.h usability... yes
checking endian.h presence... yes
checking for endian.h... yes
checking snappy-c.h usability... yes
checking snappy-c.h presence... yes
checking for snappy-c.h... yes
checking for snappy_compress in -lsnappy... yes
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking for deflate in -lz... yes
checking for library containing dlopen... -ldl
checking for library containing clock_gettime... none required
checking for clock_gettime... yes
checking for a2x... /usr/bin/a2x
checking if LD -Wl,--version-script works... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating mtbl/libmtbl.pc
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing libtool commands

    mtbl 1.0.0

        compiler:               gcc
        cflags:                 -g -O2
        ldflags:                
        libs:                   -ldl -lz -lsnappy 

        prefix:                 /usr/local
        sysconfdir:             ${prefix}/etc
        libdir:                 ${exec_prefix}/lib
        includedir:             ${prefix}/include
        pkgconfigdir:           ${libdir}/pkgconfig

        bigendian:              no

        building manpage docs:  yes (asciidoc available)

make  all-am
make[1]: Entering directory '/tmp/mtbl-1.0.0'
  CC       libmy/crc32c-sse42.lo
  CC       libmy/crc32c.lo
  CC       libmy/crc32c-slicing.lo
  CC       libmy/heap.lo
  CC       libmy/my_fileset.lo
  CC       src/mtbl_dump.o
  CC       src/mtbl_info.o
  CC       src/mtbl_verify.o
  CC       src/mtbl_merge.o
  CC       mtbl/block.lo
  CC       mtbl/block_builder.lo
  CC       mtbl/compression.lo
  CC       mtbl/crc32c_wrap.lo
  CC       mtbl/fileset.lo
  CC       mtbl/fixed.lo
  CC       mtbl/iter.lo
  CC       mtbl/merger.lo
  CC       mtbl/reader.lo
  CC       mtbl/sorter.lo
  CC       mtbl/source.lo
  CC       mtbl/metadata.lo
  CC       mtbl/varint.lo
  CC       mtbl/writer.lo
  CCLD     mtbl/libmtbl.la
ar: `u' modifier ignored since `D' is the default (see `U')
  CCLD     src/mtbl_dump
  CCLD     src/mtbl_info
  CCLD     src/mtbl_verify
  CCLD     src/mtbl_merge
make[1]: Leaving directory '/tmp/mtbl-1.0.0'
make  t/test-block_builder t/test-crc32c t/test-fileset-partition t/test-fixed t/test-metadata t/test-varint t/test-vector t/test-iter-seek t/test-compression
make[1]: Entering directory '/tmp/mtbl-1.0.0'
  CC       t/test-block_builder.o
  CC       t/test-crc32c.o
  CC       t/test-fileset-partition.o
  CC       t/test-fixed.o
  CC       t/test-metadata.o
  CC       t/test-varint.o
  CC       t/test-vector.o
  CC       t/test-iter-seek.o
t/test-metadata.c:11:33: fatal error: ../libmy/b64_encode.c: No such file or directory
 #include "../libmy/b64_encode.c"
                                 ^
compilation terminated.
Makefile:1071: recipe for target 't/test-metadata.o' failed
make[1]: *** [t/test-metadata.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory '/tmp/mtbl-1.0.0'
Makefile:1748: recipe for target 'check-am' failed
make: *** [check-am] Error 2

Did the release for 1.6.0 change?

When Homebrew first added this release it had the checksum
6563ddf1c7d9973efa7c58033fd339e68e19be69a234fa5a25448871704942df, but now the download checksums as 18ed7c9bb8b5ae71decd47dc7c882d1d6a63b61fd679a65b5f526fc241fd76f2. Is something malicious going on, or was the release retagged? The git manual says re-tagging is "the insane thing" to do.

mtbl doesn't build on OS X

OS X lacks POSIX real-time clock _POSIX_TIMERS stuff. Therefore, this happens:

$ make
/Applications/Xcode.app/Contents/Developer/usr/bin/make  all-am
  CC       mtbl/fileset.lo
mtbl/fileset.c:189:8: warning: implicit declaration of function 'clock_gettime'
      is invalid in C99 [-Wimplicit-function-declaration]
        res = clock_gettime(CLOCK_MONOTONIC, &now);
              ^
mtbl/fileset.c:189:22: error: use of undeclared identifier 'CLOCK_MONOTONIC'
        res = clock_gettime(CLOCK_MONOTONIC, &now);
                            ^
1 warning and 1 error generated.
make[1]: *** [mtbl/fileset.lo] Error 1
make: *** [all] Error 2

Two solutions:

  • The function in question, mtbl_fileset_reload(), only requires timer resolution at the level of Unix epoch seconds. We could substitute gettimeofday() and wrap the whole stanza in conditional #if statements to achieve the same functionality.
  • Fold in the ported real-time clock stuff as per libnmsg.
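The first option could look roughly like the sketch below. It is not the code in mtbl/fileset.c: the helper name `get_time_seconds` is hypothetical, and the sketch assumes that a platform without POSIX timers simply does not define CLOCK_MONOTONIC, so the preprocessor can pick gettimeofday() instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Fill *now with a timestamp of at least one-second resolution.
 * Uses clock_gettime(CLOCK_MONOTONIC) where the platform defines it,
 * otherwise falls back to gettimeofday(). Returns 0 on success, -1 on error.
 */
static int get_time_seconds(struct timespec *now)
{
    assert(now != NULL);
#if defined(CLOCK_MONOTONIC)
    return clock_gettime(CLOCK_MONOTONIC, now);
#else
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    now->tv_sec = tv.tv_sec;
    now->tv_nsec = (long)tv.tv_usec * 1000L;
    return 0;
#endif
}
```

One trade-off to keep in mind: gettimeofday() reports wall-clock time, which can jump when the system clock is adjusted, whereas the monotonic clock cannot. For a reload interval measured in whole seconds that may be acceptable.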

This raises another question: if libmtbl only needs second-level timer granularity, why use the POSIX-only real-time clock stuff in the first place?

This appears to be the only gating issue in getting mtbl ported to OS X.

Expose trailer information

It would be helpful if there were public accessor functions for the fields defined in the trailer struct.

54aaf51/mtbl/mtbl-private.h#L106

mtbl_info uses the private API to get at this

mtbl_verify periodic progress chopped at terminal width

mtbl_verify truncates the periodic progress line to the terminal width. (I had the standard 80 characters, so the output ended with "block", without the "s" and the percentage.)

As a suggestion: add an option to always output it fully even if output is not the terminal
(like in a pipe) as someone may want to save that output or (in my case) want to see the output not cropped.
Or just let the long line wrap.

Test suite failures on 32-bit architectures

Hi,

The mtbl 1.1.1 test suite fails on 32-bit architectures (in Debian, armel, armhf, i386, mips) but not 64-bit architectures (in Debian, amd64, arm64, ppc64el, s390x). E.g., when built on i386:

==================================
   mtbl 1.1.1: ./test-suite.log
==================================

# TOTAL: 13
# PASS:  11
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: t/test-iter-seek
======================

*** stack smashing detected ***: /<<PKGBUILDDIR>>/t/.libs/test-iter-seek terminated
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x6738a)[0xf756638a]
/lib/i386-linux-gnu/libc.so.6(__fortify_fail+0x37)[0xf75f6ec7]
/lib/i386-linux-gnu/libc.so.6(+0xf7e88)[0xf75f6e88]
/<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1(+0xb274)[0xf76ee274]
/<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1(+0x7df3)[0xf76eadf3]
/<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1(+0x7e7f)[0xf76eae7f]
/<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1(+0x802d)[0xf76eb02d]
/<<PKGBUILDDIR>>/t/.libs/test-iter-seek(+0xe3b)[0x565cae3b]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf6)[0xf7517286]
/<<PKGBUILDDIR>>/t/.libs/test-iter-seek(+0x1168)[0x565cb168]
======= Memory map: ========
565ca000-565cd000 r-xp 00000000 00:24 76003452                           /<<PKGBUILDDIR>>/t/.libs/test-iter-seek
565cd000-565ce000 r--p 00002000 00:24 76003452                           /<<PKGBUILDDIR>>/t/.libs/test-iter-seek
565ce000-565cf000 rw-p 00003000 00:24 76003452                           /<<PKGBUILDDIR>>/t/.libs/test-iter-seek
57e04000-57e29000 rw-p 00000000 00:00 0                                  [heap]
f7295000-f7297000 rw-p 00000000 00:00 0 
f7297000-f72b2000 r-xp 00000000 00:24 75965782                           /lib/i386-linux-gnu/libgcc_s.so.1
f72b2000-f72b3000 r--p 0001a000 00:24 75965782                           /lib/i386-linux-gnu/libgcc_s.so.1
f72b3000-f72b4000 rw-p 0001b000 00:24 75965782                           /lib/i386-linux-gnu/libgcc_s.so.1
f72b4000-f7307000 r-xp 00000000 00:24 75965773                           /lib/i386-linux-gnu/libm-2.24.so
f7307000-f7308000 r--p 00052000 00:24 75965773                           /lib/i386-linux-gnu/libm-2.24.so
f7308000-f7309000 rw-p 00053000 00:24 75965773                           /lib/i386-linux-gnu/libm-2.24.so
f7309000-f730b000 rw-p 00000000 00:00 0 
f730b000-f747d000 r-xp 00000000 00:24 75954814                           /usr/lib/i386-linux-gnu/libstdc++.so.6.0.24
f747d000-f747e000 ---p 00172000 00:24 75954814                           /usr/lib/i386-linux-gnu/libstdc++.so.6.0.24
f747e000-f7484000 r--p 00172000 00:24 75954814                           /usr/lib/i386-linux-gnu/libstdc++.so.6.0.24
f7484000-f7485000 rw-p 00178000 00:24 75954814                           /usr/lib/i386-linux-gnu/libstdc++.so.6.0.24
f7485000-f7488000 rw-p 00000000 00:00 0 
f7488000-f74e7000 r-xp 00000000 00:24 75996231                           /usr/lib/i386-linux-gnu/libzstd.so.1.3.1
f74e7000-f74e8000 r--p 0005e000 00:24 75996231                           /usr/lib/i386-linux-gnu/libzstd.so.1.3.1
f74e8000-f74e9000 rw-p 0005f000 00:24 75996231                           /usr/lib/i386-linux-gnu/libzstd.so.1.3.1
f74e9000-f74fd000 r-xp 00000000 00:24 75992954                           /usr/lib/i386-linux-gnu/liblz4.so.1.8.0
f74fd000-f74fe000 r--p 00013000 00:24 75992954                           /usr/lib/i386-linux-gnu/liblz4.so.1.8.0
f74fe000-f74ff000 rw-p 00014000 00:24 75992954                           /usr/lib/i386-linux-gnu/liblz4.so.1.8.0
f74ff000-f76b0000 r-xp 00000000 00:24 75965777                           /lib/i386-linux-gnu/libc-2.24.so
f76b0000-f76b2000 r--p 001b0000 00:24 75965777                           /lib/i386-linux-gnu/libc-2.24.so
f76b2000-f76b3000 rw-p 001b2000 00:24 75965777                           /lib/i386-linux-gnu/libc-2.24.so
f76b3000-f76b6000 rw-p 00000000 00:00 0 
f76b6000-f76bd000 r-xp 00000000 00:24 75996184                           /usr/lib/i386-linux-gnu/libsnappy.so.1.1.7
f76bd000-f76be000 r--p 00006000 00:24 75996184                           /usr/lib/i386-linux-gnu/libsnappy.so.1.1.7
f76be000-f76bf000 rw-p 00007000 00:24 75996184                           /usr/lib/i386-linux-gnu/libsnappy.so.1.1.7
f76bf000-f76d8000 r-xp 00000000 00:24 75965702                           /lib/i386-linux-gnu/libz.so.1.2.8
f76d8000-f76d9000 r--p 00018000 00:24 75965702                           /lib/i386-linux-gnu/libz.so.1.2.8
f76d9000-f76da000 rw-p 00019000 00:24 75965702                           /lib/i386-linux-gnu/libz.so.1.2.8
f76da000-f76dd000 r-xp 00000000 00:24 75965774                           /lib/i386-linux-gnu/libdl-2.24.so
f76dd000-f76de000 r--p 00002000 00:24 75965774                           /lib/i386-linux-gnu/libdl-2.24.so
f76de000-f76df000 rw-p 00003000 00:24 75965774                           /lib/i386-linux-gnu/libdl-2.24.so
f76e1000-f76e2000 rw-p 00000000 00:00 0 
f76e2000-f76e3000 r--p 00000000 00:24 76002957                           /tmp/tmpfbmLnC9 (deleted)
f76e3000-f76f6000 r-xp 00000000 00:24 76000230                           /<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1.0.1
f76f6000-f76f7000 r--p 00012000 00:24 76000230                           /<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1.0.1
f76f7000-f76f8000 rw-p 00013000 00:24 76000230                           /<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1.0.1
f76f8000-f76fb000 rw-p 00000000 00:00 0 
f76fb000-f76fd000 r--p 00000000 00:00 0                                  [vvar]
f76fd000-f76ff000 r-xp 00000000 00:00 0                                  [vdso]
f76ff000-f7722000 r-xp 00000000 00:24 75965781                           /lib/i386-linux-gnu/ld-2.24.so
f7722000-f7723000 r--p 00022000 00:24 75965781                           /lib/i386-linux-gnu/ld-2.24.so
f7723000-f7724000 rw-p 00023000 00:24 75965781                           /lib/i386-linux-gnu/ld-2.24.so
ffa2d000-ffa4e000 rw-p 00000000 00:00 0                                  [stack]
FAIL t/test-iter-seek (exit status: 134)

FAIL: t/test-compression.sh
===========================

./t/test-compression.sh: Testing compression type none
*** stack smashing detected ***: /<<PKGBUILDDIR>>/t/.libs/test-compression terminated
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x6738a)[0xf75e138a]
/lib/i386-linux-gnu/libc.so.6(__fortify_fail+0x37)[0xf7671ec7]
/lib/i386-linux-gnu/libc.so.6(+0xf7e88)[0xf7671e88]
/<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1(+0xb274)[0xf7769274]
/<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1(+0x7df3)[0xf7765df3]
/<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1(+0x7e7f)[0xf7765e7f]
/<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1(+0x802d)[0xf776602d]
/<<PKGBUILDDIR>>/t/.libs/test-compression(+0xf93)[0x565f3f93]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf6)[0xf7592286]
/<<PKGBUILDDIR>>/t/.libs/test-compression(+0x11c6)[0x565f41c6]
======= Memory map: ========
565f3000-565f5000 r-xp 00000000 00:24 76001085                           /<<PKGBUILDDIR>>/t/.libs/test-compression
565f5000-565f6000 r--p 00001000 00:24 76001085                           /<<PKGBUILDDIR>>/t/.libs/test-compression
565f6000-565f7000 rw-p 00002000 00:24 76001085                           /<<PKGBUILDDIR>>/t/.libs/test-compression
56904000-56931000 rw-p 00000000 00:00 0                                  [heap]
f7306000-f7310000 r--p 00000000 00:24 76003526                           /<<PKGBUILDDIR>>/.mtbl.test-compression.29135.Q32KNa (deleted)
f7310000-f7312000 rw-p 00000000 00:00 0 
f7312000-f732d000 r-xp 00000000 00:24 75965782                           /lib/i386-linux-gnu/libgcc_s.so.1
f732d000-f732e000 r--p 0001a000 00:24 75965782                           /lib/i386-linux-gnu/libgcc_s.so.1
f732e000-f732f000 rw-p 0001b000 00:24 75965782                           /lib/i386-linux-gnu/libgcc_s.so.1
f732f000-f7382000 r-xp 00000000 00:24 75965773                           /lib/i386-linux-gnu/libm-2.24.so
f7382000-f7383000 r--p 00052000 00:24 75965773                           /lib/i386-linux-gnu/libm-2.24.so
f7383000-f7384000 rw-p 00053000 00:24 75965773                           /lib/i386-linux-gnu/libm-2.24.so
f7384000-f7386000 rw-p 00000000 00:00 0 
f7386000-f74f8000 r-xp 00000000 00:24 75954814                           /usr/lib/i386-linux-gnu/libstdc++.so.6.0.24
f74f8000-f74f9000 ---p 00172000 00:24 75954814                           /usr/lib/i386-linux-gnu/libstdc++.so.6.0.24
f74f9000-f74ff000 r--p 00172000 00:24 75954814                           /usr/lib/i386-linux-gnu/libstdc++.so.6.0.24
f74ff000-f7500000 rw-p 00178000 00:24 75954814                           /usr/lib/i386-linux-gnu/libstdc++.so.6.0.24
f7500000-f7503000 rw-p 00000000 00:00 0 
f7503000-f7562000 r-xp 00000000 00:24 75996231                           /usr/lib/i386-linux-gnu/libzstd.so.1.3.1
f7562000-f7563000 r--p 0005e000 00:24 75996231                           /usr/lib/i386-linux-gnu/libzstd.so.1.3.1
f7563000-f7564000 rw-p 0005f000 00:24 75996231                           /usr/lib/i386-linux-gnu/libzstd.so.1.3.1
f7564000-f7578000 r-xp 00000000 00:24 75992954                           /usr/lib/i386-linux-gnu/liblz4.so.1.8.0
f7578000-f7579000 r--p 00013000 00:24 75992954                           /usr/lib/i386-linux-gnu/liblz4.so.1.8.0
f7579000-f757a000 rw-p 00014000 00:24 75992954                           /usr/lib/i386-linux-gnu/liblz4.so.1.8.0
f757a000-f772b000 r-xp 00000000 00:24 75965777                           /lib/i386-linux-gnu/libc-2.24.so
f772b000-f772d000 r--p 001b0000 00:24 75965777                           /lib/i386-linux-gnu/libc-2.24.so
f772d000-f772e000 rw-p 001b2000 00:24 75965777                           /lib/i386-linux-gnu/libc-2.24.so
f772e000-f7731000 rw-p 00000000 00:00 0 
f7731000-f7738000 r-xp 00000000 00:24 75996184                           /usr/lib/i386-linux-gnu/libsnappy.so.1.1.7
f7738000-f7739000 r--p 00006000 00:24 75996184                           /usr/lib/i386-linux-gnu/libsnappy.so.1.1.7
f7739000-f773a000 rw-p 00007000 00:24 75996184                           /usr/lib/i386-linux-gnu/libsnappy.so.1.1.7
f773a000-f7753000 r-xp 00000000 00:24 75965702                           /lib/i386-linux-gnu/libz.so.1.2.8
f7753000-f7754000 r--p 00018000 00:24 75965702                           /lib/i386-linux-gnu/libz.so.1.2.8
f7754000-f7755000 rw-p 00019000 00:24 75965702                           /lib/i386-linux-gnu/libz.so.1.2.8
f7755000-f7758000 r-xp 00000000 00:24 75965774                           /lib/i386-linux-gnu/libdl-2.24.so
f7758000-f7759000 r--p 00002000 00:24 75965774                           /lib/i386-linux-gnu/libdl-2.24.so
f7759000-f775a000 rw-p 00003000 00:24 75965774                           /lib/i386-linux-gnu/libdl-2.24.so
f775d000-f775e000 rw-p 00000000 00:00 0 
f775e000-f7771000 r-xp 00000000 00:24 76000230                           /<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1.0.1
f7771000-f7772000 r--p 00012000 00:24 76000230                           /<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1.0.1
f7772000-f7773000 rw-p 00013000 00:24 76000230                           /<<PKGBUILDDIR>>/mtbl/.libs/libmtbl.so.1.0.1
f7773000-f7776000 rw-p 00000000 00:00 0 
f7776000-f7778000 r--p 00000000 00:00 0                                  [vvar]
f7778000-f777a000 r-xp 00000000 00:00 0                                  [vdso]
f777a000-f779d000 r-xp 00000000 00:24 75965781                           /lib/i386-linux-gnu/ld-2.24.so
f779d000-f779e000 r--p 00022000 00:24 75965781                           /lib/i386-linux-gnu/ld-2.24.so
f779e000-f779f000 rw-p 00023000 00:24 75965781                           /lib/i386-linux-gnu/ld-2.24.so
ffcd1000-ffcf2000 rw-p 00000000 00:00 0                                  [stack]
Aborted
FAIL t/test-compression.sh (exit status: 134)

============================================================================
Testsuite summary for mtbl 1.1.1
============================================================================
# TOTAL: 13
# PASS:  11
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0
============================================================================
See ./test-suite.log
Please report to https://github.com/farsightsec/mtbl/issues
============================================================================

See https://buildd.debian.org/status/fetch.php?pkg=mtbl&arch=i386&ver=1.1.1-1&stamp=1508708425&raw=1 for the full build log.

How to best do random access with MTBL files?

The use case is "given a key prefix, sample a random key having that prefix".

The old proposal

On IRC, I pitched the following scheme to @edmonds --

  1. Prepend an 8-byte "autoincrement key" to each actual key (the example below shortens it to 2 bytes).
  2. Allow querying keys with a "skip prefix" option.

For example, here are the keys -- the brackets are for ease of understanding:

[0000][123456]
[0001][373737]
[0002][37deadbe]
[0003][37ffffffff00000ba7]
[0004][ffffffffffffffff]

Let's say that you want a random key with the prefix 37. This can be done in 3 lookups:

  1. Ignoring the first 2 bytes of each key, look up the first key with prefix 37 -- mtbl_source_get_strip_prefix(src, 2, "37", 2). Make a note of the first 2 bytes of the resulting key.
  2. Do the same to find the first key not having the prefix 37 -- mtbl_source_get_strip_prefix(src, 2, "38", 2). Take the first two bytes of the key, and subtract 1.
  3. Generate a random number between the values from steps 1 and 2, inclusive. Look up the key whose first 2 bytes equal the result, for example with a prefix query such as mtbl_source_get_prefix(src, "\x00\x02", 2).
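
The three lookups above can be sketched without mtbl at all. The code below is only an illustration under stated assumptions: the five example keys sit in an in-memory sorted array, lower_bound_stripped() stands in for the proposed (not existing) mtbl_source_get_strip_prefix, and every name in it is hypothetical. It also assumes the last byte of the prefix is not 0xff, so that "37" can be bumped to "38" for the second lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ID_LEN 2 /* counter width used in the example; the proposal itself says 8 */

/* One stored key: big-endian counter, then the actual key bytes. */
struct entry {
    const uint8_t *key;
    size_t len;
};

/* The five example keys, written out as raw bytes. */
static const uint8_t K0[] = {0x00, 0x00, 0x12, 0x34, 0x56};
static const uint8_t K1[] = {0x00, 0x01, 0x37, 0x37, 0x37};
static const uint8_t K2[] = {0x00, 0x02, 0x37, 0xde, 0xad, 0xbe};
static const uint8_t K3[] = {0x00, 0x03, 0x37, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x0b, 0xa7};
static const uint8_t K4[] = {0x00, 0x04, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
static const struct entry EXAMPLE[] = {
    {K0, sizeof K0}, {K1, sizeof K1}, {K2, sizeof K2}, {K3, sizeof K3}, {K4, sizeof K4},
};
#define EXAMPLE_COUNT (sizeof EXAMPLE / sizeof EXAMPLE[0])

/* Decode the big-endian counter at the front of a stored key. */
static unsigned read_id(const struct entry *e)
{
    return ((unsigned)e->key[0] << 8) | e->key[1];
}

/* Compare the actual key (counter skipped) with a probe, bytewise. */
static int cmp_stripped(const struct entry *e, const uint8_t *probe, size_t probe_len)
{
    size_t n = e->len - ID_LEN;
    size_t m = n < probe_len ? n : probe_len;
    int c = memcmp(e->key + ID_LEN, probe, m);
    if (c != 0)
        return c;
    return (n > probe_len) - (n < probe_len);
}

/* Index of the first entry whose actual key is >= probe, or count if none.
 * Binary search is valid because counter order equals actual-key order. */
static size_t lower_bound_stripped(const struct entry *tbl, size_t count,
                                   const uint8_t *probe, size_t probe_len)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cmp_stripped(&tbl[mid], probe, probe_len) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Steps 1 and 2: inclusive counter range of keys with the given prefix.
 * Returns 0 if no key has the prefix. If nothing sorts after the prefix,
 * the counter of the last key is used (a case the steps above leave open). */
static int prefix_id_range(const struct entry *tbl, size_t count,
                           const uint8_t *prefix, size_t prefix_len,
                           unsigned *first_id, unsigned *last_id)
{
    uint8_t next[16];
    assert(prefix_len > 0 && prefix_len <= sizeof next);
    assert(prefix[prefix_len - 1] != 0xff);
    memcpy(next, prefix, prefix_len);
    next[prefix_len - 1]++; /* "37" becomes "38" */

    size_t a = lower_bound_stripped(tbl, count, prefix, prefix_len);
    size_t b = lower_bound_stripped(tbl, count, next, prefix_len);
    if (a == b)
        return 0;
    *first_id = read_id(&tbl[a]);
    *last_id = b < count ? read_id(&tbl[b]) - 1 : read_id(&tbl[count - 1]);
    return 1;
}

/* Step 3: pick a counter in [first_id, last_id]. rand() with modulo is
 * slightly biased; good enough for a sketch. */
static unsigned sample_id(unsigned first_id, unsigned last_id)
{
    assert(first_id <= last_id);
    return first_id + (unsigned)rand() % (last_id - first_id + 1);
}
```

For the prefix 37, prefix_id_range() yields the counters 1 through 3, and sample_id() then picks one of them; the final lookup by counter is an ordinary prefix query on the first 2 bytes.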

The new suggestion

I ran the above scheme by @tudor, who pointed out that it has a few problems:

  1. It effectively breaks prefix compression, which can be a substantial size / perf hit.
  2. Its lookup cost is logarithmic. I was willing to accept that, but @tudor had a better fix.

His suggestion was to extend the MTBL API so that one can quickly seek to a specific key. This would take the shape of something like (restartable block offset, key index from that offset). I haven't yet checked if this can be boiled down to a single number, but that's beside the point.

Then, I can store (e.g. in the "foreign prefix" section of the MTBL file) an uncompressed list of key addresses, in order -- a single seek to base_offset + autoincrement_id tells you how to find the actual key.

In order to look up autoincrement_id from the key, it's easy enough to append the autoincrement ID to the key. Then, prefix compression works well, and lookups on the original keys work well.
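
To make the address-table idea concrete, here is a rough sketch. Everything in it is an assumption for illustration, not part of the MTBL API or file format: the struct key_addr layout, the 12-byte big-endian record, and the helper names are all made up. It also reads "base_offset + autoincrement_id" as "base_offset + autoincrement_id * record size", since the records are fixed-width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_LEN 8           /* autoincrement ID appended to each key */
#define ADDR_RECORD_LEN 12 /* hypothetical: 8-byte block offset + 4-byte key index */

/* Hypothetical address of one key: a restartable block offset plus the
 * key's index counted from that offset. */
struct key_addr {
    uint64_t block_offset;
    uint32_t key_index;
};

/* Write the low n bytes of v in big-endian order. */
static void put_be(uint8_t *out, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Read n big-endian bytes into an integer. */
static uint64_t get_be(const uint8_t *in, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | in[i];
    return v;
}

/* Serialize one address into a fixed-width, uncompressed record. */
static void addr_encode(uint8_t *out, struct key_addr a)
{
    put_be(out, a.block_offset, 8);
    put_be(out + 8, a.key_index, 4);
}

/* Parse one fixed-width record back into an address. */
static struct key_addr addr_decode(const uint8_t *in)
{
    struct key_addr a;
    a.block_offset = get_be(in, 8);
    a.key_index = (uint32_t)get_be(in + 8, 4);
    return a;
}

/* Fetch the address for a given autoincrement ID with one bounds-checked
 * read at base_offset + id * ADDR_RECORD_LEN. Returns 0 if out of range. */
static int addr_lookup(const uint8_t *blob, size_t blob_len, size_t base_offset,
                       uint64_t id, struct key_addr *out)
{
    if (base_offset > blob_len)
        return 0;
    size_t avail = (blob_len - base_offset) / ADDR_RECORD_LEN;
    if (id >= avail)
        return 0;
    *out = addr_decode(blob + base_offset + (size_t)id * ADDR_RECORD_LEN);
    return 1;
}

/* Split a stored key into the original key and its trailing ID.
 * Because the ID comes last, stored keys still sort by the original key. */
static uint64_t key_trailing_id(const uint8_t *key, size_t len, size_t *orig_len)
{
    assert(len >= ID_LEN);
    *orig_len = len - ID_LEN;
    return get_be(key + *orig_len, ID_LEN);
}
```

With this layout, sampling a key with a given prefix would take two ordinary lookups on the original keys to learn the first and last IDs, then one addr_lookup() for a random ID in between. How the resulting address is turned back into an iterator position is exactly the API extension being proposed.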

Conclusion

It seems like adding mtbl_source_get_strip_prefix is not the best option, and the better option is to allow efficient serialization / deserialization of iterators. If there are no objections, I might try to prototype this.

Thoughts?
