Comments (21)
Hi, snarkmaster:
First off, `struct trailer` is intentionally part of the internal implementation details of the library. I would not want to move it directly into the public API, because exposing C structs in a library is very bad for ABI forwards/backwards compatibility. (E.g., if `struct trailer` were part of the ABI, we could not add or remove a member of the structure without an ABI break.)
Secondly, can you explain what you mean by pre-allocating memory for the whole MTBL file? Since we use `mmap()` to read from the file, if the file is smaller than physical memory, it ought to end up in the page cache anyway, depending on your read pattern.
And the `bytes_keys` and `bytes_values` figures are not the figures you need anyway if you wanted to pre-load the file into memory. These are actually only used for statistical purposes -- they track the sums of the byte lengths of all keys/values in the file, and don't take into account the overhead consumed by the MTBL format encoding, nor the bytes saved by key prefix compression.
from mtbl.
If you just want to preload the whole MTBL file into memory before reading from it, though, I think you could do this by `open()`ing the file yourself and calling `posix_fadvise(..., 0, 0, POSIX_FADV_WILLNEED);` on the file descriptor. Then pass the file descriptor to `mtbl_reader_init_fd()`.
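That suggestion can be sketched as follows (the `open_preloaded` helper name is my own; the `open()` + `posix_fadvise()` sequence comes from the comment above, and the `mtbl_reader_init_fd()` hand-off is shown only as a comment):

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open a file read-only and ask the kernel to
// prefetch its entire contents into the page cache.
// Returns the open fd, or -1 on failure.
int open_preloaded(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    // len == 0 means "through the end of the file".
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) != 0) {
        close(fd);
        return -1;
    }
    // The caller would then pass fd to mtbl_reader_init_fd().
    return fd;
}
```

Note that `POSIX_FADV_WILLNEED` is only a hint; the kernel may prefetch lazily or not at all.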
Oh, and if you are using data block compression, of course the `bytes_keys` and `bytes_values` figures don't take that into account either. So they are really only useful as metrics about your source data and don't describe the size of anything in the on-disk file.
Thanks for the quick reply, @edmonds!
I'm going to make an in-memory version of the file for really fast access, probably one of these two:

```cpp
std::vector<std::pair<std::string, Value>>
std::map<std::string, Value>
```
I want to use arena-style allocation for my keys and values (because otherwise alloc/dealloc tends to dominate the construction/destruction cost with small keys and values), so reading the trailer gives me a very good idea of how big the arena needs to be.
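A minimal sketch of that arena scheme (my own illustration, not part of mtbl): size one buffer up front from the trailer stats, roughly `bytes_keys + bytes_values` plus some slack for per-entry overhead, then bump-allocate copies of the keys and values into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Minimal bump-allocation arena. The capacity would come from the
// trailer statistics (bytes_keys + bytes_values, plus slack, since
// those counters ignore any per-entry overhead you add yourself).
class Arena {
public:
    explicit Arena(std::size_t capacity) : buf_(capacity) {}

    // Copy bytes into the arena; returns a view into arena-owned storage.
    std::string_view intern(std::string_view s) {
        assert(used_ + s.size() <= buf_.size() && "arena undersized");
        char *dst = buf_.data() + used_;
        std::memcpy(dst, s.data(), s.size());
        used_ += s.size();
        return {dst, s.size()};
    }

    std::size_t used() const { return used_; }

private:
    std::vector<char> buf_;  // one allocation for all keys and values
    std::size_t used_ = 0;
};
```

Everything is freed at once when the arena is destroyed, which is what avoids the per-entry alloc/dealloc cost mentioned above.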
(On an unrelated note, thanks a lot for making this library -- I've been doing happy dances ever since I found it. It fits my problem like a tailor-made glove.)
Also, I totally agree that my proposed change is a terrible kludge -- it was the fastest way for me to get unblocked in my experiments, but I'm happy to use a better API! :D
Ah, OK, I understand your use case much better now. Essentially you want to build a complete cache of all the key-value entries in an MTBL file? Then, yes, `bytes_keys` and `bytes_values` will indeed give you a pretty good idea of how much to pre-allocate (maybe along with `count_entries`, too), and my previous advice was a bit off the mark.
I think what we could do is introduce getters for all the trailer fields (ugh, all 9 of them), which would avoid embedding a C struct into the API/ABI. That way we could even rewrite the `mtbl_info` utility to use only the public API instead of what it does currently (https://github.com/farsightsec/mtbl/blob/tags/v0.7.0/src/mtbl_info.c#L28), which is just cheating.
Yeah, that sounds good.
I'm happy to send you a patch for the code part. Could you do the man page doc part? I'm not familiar with that toolchain, so that's probably 10x faster for you :) My guess at the man page process: edit the *.txt files by copy-pasting existing definitions, install asciidoc, run asciidoc, troubleshoot.
For the API, how about this:
```c
uint64_t mtbl_reader_get_bytes_keys(struct mtbl_reader *r);
/* etc., 8 more times */
```
Hm, I'm not sure about adding the getters directly to the `mtbl_reader` interface, and we would want to make sure failure can be signalled to the caller (e.g., suppose we rev the trailer format in the future and add more fields; we'd want to be able to signal the absence of a field when reading a file in the previous format).
Maybe we turn `struct trailer` into `struct mtbl_trailer` and export it opaquely in the public API. Something like:
```c
struct mtbl_trailer;
...
const struct mtbl_trailer *
mtbl_reader_get_trailer(struct mtbl_reader *);
...
mtbl_res
mtbl_trailer_get_bytes_keys(const struct mtbl_trailer *, uint64_t *val);
/* ...more getters... */
```
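To make the shape of that proposal concrete, here is a self-contained mock of the pattern (names prefixed `demo_` are mine, not mtbl's): an opaque handle plus getters that return a status code, so a reader of an older-format file can report a missing field instead of a garbage value.

```cpp
#include <cstdint>

// Mock status code, mirroring the shape of mtbl_res.
enum demo_res { demo_res_failure, demo_res_success };

// In a real library this struct would be opaque to callers;
// it is visible here only so the example is self-contained.
struct demo_trailer {
    uint64_t bytes_keys;
    bool has_bytes_keys;  // false for a file in an older format
};

// Getter with an out-parameter: the return value signals presence.
demo_res demo_trailer_get_bytes_keys(const demo_trailer *t, uint64_t *val) {
    if (t == nullptr || !t->has_bytes_keys)
        return demo_res_failure;
    *val = t->bytes_keys;
    return demo_res_success;
}
```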
Do you expect to ever delete any of the existing 9 fields? If not, this seems somewhat unnecessarily heavy. Instead of:
```c
uint64_t bytes_keys;
struct mtbl_reader *r;
...
bytes_keys = mtbl_reader_get_bytes_keys(r);
```
People would have to write:
```c
uint64_t bytes_keys;
const struct mtbl_trailer *t;
struct mtbl_reader *r;
...
t = mtbl_reader_get_trailer(r);
if (mtbl_trailer_get_bytes_keys(t, &bytes_keys) != mtbl_res_success) {
    fprintf(stderr, "DOOM AND DEATH");
    return false;
}
```
As a compromise, you could add the heavier style of API for any new fields you add, and use the lighter API for the existing ones.
FWIW, I'm basically done with the patch for the API I proposed, so will send it upstream.
Hm, OK, you've convinced me, it is a little heavy. I'd be fine with returning the primitive values directly, i.e.:

```c
uint64_t
mtbl_trailer_get_...(const struct mtbl_trailer *);
```
BTW, it's great to hear that `mtbl` works well for your application! Though, I'm kind of intrigued by the fact that your MTBL files are small enough to comfortably fit into physical memory but your access pattern needs a high-performance cache. Most of the data sets that I use `mtbl` on have sparse access patterns and are much larger than physical memory.
Any details about your application that you can share? (If it's open/public, maybe I should start a "List of projects using mtbl" in the README...)
Sent a pull request: #5
I just read your recent comment. If you hate the fact that I'm getting the values directly from the reader, I'll change that.
The application isn't currently public, but it probably will be. I'll post an update if/when that happens.
The files are indeed smaller than RAM in most cases. The reason that MTBL is a good fit is because I have hybrid, competing demands:
- High-throughput random access against the data. I benchmarked on-SSD `mtbl` with a dataset of about 200 MB / 4 million keys, and I was getting a setup cost of about 400 usec to open the file and 2-4 usec per key, depending on how many random accesses I did. That's pretty good, but way worse than an in-memory index, and bad enough to want to avoid.
- Interactive / one-off point access: I want a simple command-line tool to quickly peek at a few keys or a range of keys. In this case, a non-indexed persistent solution (e.g. serializing a map or vector of strings to disk) is pretty slow. You end up waiting O(several seconds) just to open the file. `mtbl` is done in a few milliseconds.
- The possibility of having a performant index that does not fit in memory is attractive, since it makes this solution pretty future-proof.

FWIW, it was a bit hard to find `mtbl`; I was looking for SSTable-like storage, and it seems like you're the only game in town (LevelDB / RocksDB don't really have usable public APIs for their SSTables).
If I'm exporting `mtbl_trailer` opaquely, what about renaming `mtbl_trailer` to `mtbl_metadata`, since the former is pretty unclear (i.e. it's an implementation detail that is pretty irrelevant to users)?
Also, do you want a separate manpage for `mtbl_metadata`?
Cool. Some comments:
- Does your startup latency include verifying the checksum on the index? The default is to verify, but it can be disabled with `mtbl_reader_options_set_verify_checksums()`. (Also, if you do leave it on, `mtbl` >= 0.4 takes advantage of the `CRC32` instruction on SSE4.2 CPUs, which ought to be much faster than the previous software-only CRC32C implementation, if you have such a CPU.) Without verifying the checksum, the overhead of opening an `mtbl_reader` ought to be little more than the cost of the `mmap()`.
- What compression level are you using? For random access, you end up having to decompress a whole data block at a time for each lookup (there's no data block cache). `zlib` is much slower than `snappy`, and `snappy` is a bit slower than uncompressed.
- Similarly, what data block size are you using? Smaller block sizes result in bigger files (and bigger indexes), but require reading/decompressing less data at a time in order to perform a lookup. I think by default you get 8 KB blocks and `zlib` compression. Maybe ~1-4 KB blocks and no compression (or `snappy`) would perform better if you can spare the memory and the slightly larger index. But, regardless, you still incur the cost of searching the index and then seeking within the data block, so maybe a cache is still better for your application.
- A command-line `mtbl_lookup` debugging tool to go along with `mtbl_dump` would be kind of cool.
Hm, let's keep the name `mtbl_trailer`, even though it's not particularly meaningful, and yeah, it would need a separate manpage. BTW, thanks for going to the trouble of contributing this upstream!
Hmm, I actually did the rename before you commented, and it looks pretty good.
About to send you a pull request with a new `mtbl_metadata` interface. Take a look, and if you don't like it, I'll revert the rename, which is just a couple of minutes of work.
Thanks for the comments:
- I think it's verifying checksums. The startup time is good enough already, but I'll try without. It should be using the SSE4.2 instruction (Facebook's open-source `folly` provides this in C++, so I'm aware of it).
- My file is uncompressed, for the reason you mentioned. It's on my TODO list to try `snappy`.
- Thanks, I'll try smaller block sizes!

If I were to make a contributable lookup tool, I'd send a pull request for sure, but I think it's unlikely for these reasons:
- My tool has a ton of app-specific logic, since `mtbl` is just part of the story.
- It uses C++11 somewhat heavily.

So, if you find yourself with extra cycles, you should go for it :)
Let me know what you think of the rename: #6
Some comments:
- There is no option to turn off index CRC32C. There is only an option to enable/disable block CRC32C, and it's off by default. The perf impact of enabling block CRC32C on my machine was negligible (it's lost in the noise).
- Minor nit: CRC32 checking is implemented as an assert(), which means that with `NDEBUG` defined, the check would be silently skipped. Fortunately, `mtbl` is not built with `NDEBUG`.
- Reducing the block size from 8 KB to 1 KB modestly speeds up the random fetches, and also reduces the variance. The data size increases 5%. I'm keeping the new setting.
- With 1 KB blocks, turning on `snappy` results in a 15% slowdown, but the file size is halved for my data. I might keep this.
- In my 4-million-row table of 50-byte keys and 16-byte values, `std::vector` is barely faster, and `std::map` is a bit slower than `mtbl`. Here's a chart of "number of random key queries" vs. "how much slower is `mtbl` than `std::lower_bound` on `std::vector`". This is for "fadvise WILLNEED, 1 KB blocks, no compression", which is the fastest configuration I've found for `mtbl` (although it's not very sensitive):

Notice that 7 outliers exceeding a 4x slowdown are not shown.

#leaveittommap, #inmmapwetrust

In other words, unless people are super-sensitive to latency outliers, need the last 10% of performance, or expect extreme memory pressure, they probably should not bother with an explicit memory cache for their MTBL files.
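For reference, the in-memory baseline from the comparison above (a sorted `std::vector` searched with `std::lower_bound`) can be sketched like this; the `Entry` and `lookup` names are my own:

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;

// Point lookup via binary search over entries kept sorted by key.
// This is the std::lower_bound-on-std::vector variant from the chart.
std::optional<std::string> lookup(const std::vector<Entry> &sorted,
                                  const std::string &key) {
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const Entry &e, const std::string &k) { return e.first < k; });
    if (it != sorted.end() && it->first == key)
        return it->second;
    return std::nullopt;
}
```

Building the vector requires one pass over the file plus a sort (or no sort at all, since MTBL iterators already yield keys in order).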
I think this issue can be closed now, due to the new `metadata` interface on `next`?
Certainly, thanks!