Comments (21)
Hi, snarkmaster:
First off, `struct trailer` is intentionally part of the internal implementation details of the library. I would not want to move it directly into the public API, because exposing C structs in a library is very bad for ABI forwards/backwards compatibility. (E.g., if `struct trailer` were part of the ABI, we could not add or remove a member of the structure without an ABI break.)
Secondly, can you explain what you mean by pre-allocating memory for the whole MTBL file? Since we use `mmap()` to read from the file, if the file is smaller than physical memory, it ought to end up in the page cache anyway, depending on your read pattern.
And the `bytes_keys` and `bytes_values` figures are not the figures you need anyway if you wanted to pre-load the file into memory. These are actually only used for statistical purposes -- they track the sums of the byte lengths of all keys/values in the file, and don't take into account the overhead consumed by the MTBL format encoding, nor the bytes saved by key prefix compression.
from mtbl.
If you just want to preload the whole MTBL file into memory before reading from it, though, I think you could do this by `open()`ing the file yourself and calling `posix_fadvise(..., 0, 0, POSIX_FADV_WILLNEED);` on the file descriptor. Then pass the file descriptor to `mtbl_reader_init_fd()`.
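That suggestion can be sketched as follows (the `open_preloaded` helper name is my own; the `open()` + `posix_fadvise()` sequence comes from the comment above, and the `mtbl_reader_init_fd()` hand-off is shown only as a comment):

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open a file read-only and ask the kernel to
// prefetch its entire contents into the page cache.
// Returns the open fd, or -1 on failure.
int open_preloaded(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    // len == 0 means "through the end of the file".
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) != 0) {
        close(fd);
        return -1;
    }
    // The caller would then pass fd to mtbl_reader_init_fd().
    return fd;
}
```

Note that `POSIX_FADV_WILLNEED` is only a hint; the kernel may prefetch lazily or not at all.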
Oh, and if you are using data block compression, of course the `bytes_keys` and `bytes_values` figures don't take that into account either. So they are really only useful as metrics about your source data and don't describe the size of anything in the on-disk file.
Thanks for the quick reply, @edmonds!
I'm going to make an in-memory version of the file for really fast access, probably one of these two:

```cpp
std::vector<std::pair<std::string, Value>>
std::map<std::string, Value>
```
I want to use arena-style allocation for my keys and values (because otherwise alloc/dealloc tends to dominate the construction/destruction cost with small keys and values), so reading the trailer gives me a very good idea of how big the arena needs to be.
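A minimal sketch of that arena scheme (my own illustration, not part of mtbl): size one buffer up front from the trailer stats, roughly `bytes_keys + bytes_values` plus some slack for per-entry overhead, then bump-allocate copies of the keys and values into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Minimal bump-allocation arena. The capacity would come from the
// trailer statistics (bytes_keys + bytes_values, plus slack, since
// those counters ignore any per-entry overhead you add yourself).
class Arena {
public:
    explicit Arena(std::size_t capacity) : buf_(capacity) {}

    // Copy bytes into the arena; returns a view into arena-owned storage.
    std::string_view intern(std::string_view s) {
        assert(used_ + s.size() <= buf_.size() && "arena undersized");
        char *dst = buf_.data() + used_;
        std::memcpy(dst, s.data(), s.size());
        used_ += s.size();
        return {dst, s.size()};
    }

    std::size_t used() const { return used_; }

private:
    std::vector<char> buf_;  // one allocation for all keys and values
    std::size_t used_ = 0;
};
```

Everything is freed at once when the arena is destroyed, which is what avoids the per-entry alloc/dealloc cost mentioned above.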
(On an unrelated note, thanks a lot for making this library -- I've been doing happy dances ever since I found it. It fits my problem like a tailor-made glove.)
Also, I totally agree that my proposed change is a terrible kludge -- it was the fastest way for me to get unblocked in my experiments, but I'm happy to use a better API! :D
Ah, OK, I understand your use case much better now. Essentially you want to build a complete cache of all the key-value entries in an MTBL file? Then, yes, `bytes_keys` and `bytes_values` will indeed give you a pretty good idea of how much to pre-allocate (maybe along with `count_entries`, too), and my previous advice was a bit off the mark.
I think what we could do is introduce getters for all the trailer fields (ugh, all 9 of them), which would avoid embedding a C struct into the API/ABI. That way we could even rewrite the `mtbl_info` utility to use only the public API instead of what it does currently (https://github.com/farsightsec/mtbl/blob/tags/v0.7.0/src/mtbl_info.c#L28), which is just cheating.
Yeah, that sounds good.
I'm happy to send you a patch for the code part. Could you do the man page doc part? I'm not familiar with that toolchain, so that's probably 10x faster for you :) My guess at the man page process: edit the *.txt files by copy-pasting existing definitions, install asciidoc, run asciidoc, troubleshoot.
For the API, how about this:
```c
uint64_t mtbl_reader_get_bytes_keys(struct mtbl_reader *r);
/* etc., 8 more times */
```
Hm, I'm not sure about adding the getters directly to the `mtbl_reader` interface, and we would want to make sure failure can be signalled to the caller (e.g., suppose we rev the trailer format in the future and add more fields; we'd want to be able to signal the absence of a field when reading a file in the previous format).
Maybe we turn `struct trailer` into `struct mtbl_trailer` and export it opaquely in the public API. Something like:
```c
struct mtbl_trailer;
...
const struct mtbl_trailer *
mtbl_reader_get_trailer(struct mtbl_reader *);
...
mtbl_res
mtbl_trailer_get_bytes_keys(const struct mtbl_trailer *, uint64_t *val);
/* ...more getters... */
```
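To make the shape of that proposal concrete, here is a self-contained mock of the pattern (names prefixed `demo_` are mine, not mtbl's): an opaque handle plus getters that return a status code, so a reader of an older-format file can report a missing field instead of a garbage value.

```cpp
#include <cstdint>

// Mock status code, mirroring the shape of mtbl_res.
enum demo_res { demo_res_failure, demo_res_success };

// In a real library this struct would be opaque to callers;
// it is visible here only so the example is self-contained.
struct demo_trailer {
    uint64_t bytes_keys;
    bool has_bytes_keys;  // false for a file in an older format
};

// Getter with an out-parameter: the return value signals presence.
demo_res demo_trailer_get_bytes_keys(const demo_trailer *t, uint64_t *val) {
    if (t == nullptr || !t->has_bytes_keys)
        return demo_res_failure;
    *val = t->bytes_keys;
    return demo_res_success;
}
```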
Do you expect to ever delete any of the existing 9 fields? If not, this seems somewhat unnecessarily heavy. Instead of:
```c
uint64_t bytes_keys;
struct mtbl_reader *r;
...
bytes_keys = mtbl_reader_get_bytes_keys(r);
```
People would have to write:
```c
uint64_t bytes_keys;
const struct mtbl_trailer *t;
struct mtbl_reader *r;
...
t = mtbl_reader_get_trailer(r);
if (mtbl_trailer_get_bytes_keys(t, &bytes_keys) != mtbl_res_success) {
    fprintf(stderr, "DOOM AND DEATH");
    return false;
}
```
As a compromise, you could add the heavier style of API for any new fields you add, and use the lighter API for the existing ones.
FWIW, I'm basically done with the patch for the API I proposed, so will send it upstream.
Hm, OK, you've convinced me, it is a little heavy. I'd be fine with returning the primitive values directly, i.e.:

```c
uint64_t
mtbl_trailer_get_...(const struct mtbl_trailer *);
```
BTW, it's great to hear that `mtbl` works well for your application! Though, I'm kind of intrigued by the fact that your MTBL files are small enough to comfortably fit into physical memory but your access pattern needs a high-performance cache. Most of the data sets that I use `mtbl` on have sparse access patterns and are much larger than physical memory.
Any details about your application that you can share? (If it's open/public, maybe I should start a "List of projects using mtbl" in the README...)
Sent a pull request: #5
I just read your recent comment. If you hate the fact that I'm getting the values directly from the reader, I'll change that.
The application isn't currently public, but it probably will be. I'll post an update if/when that happens.
The files are indeed smaller than RAM in most cases. The reason that MTBL is a good fit is because I have hybrid, competing demands:
- High-throughput random access against the data. I benchmarked on-SSD `mtbl` with a dataset of about 200 MB / 4 million keys, and I was getting a setup cost of about 400 usec to open the file and 2-4 usec per key, depending on how many random accesses I did. That's pretty good, but way worse than an in-memory index, and bad enough to want to avoid.
- Interactive / one-off point access: I want a simple command-line tool to quickly peek at a few keys or a range of keys. In this case, a non-indexed persistent solution (e.g. serializing a map or vector of strings to disk) is pretty slow. You end up waiting O(several seconds) just to open the file. `mtbl` is done in a few milliseconds.
- The possibility of having a performant index that does not fit in memory is attractive, since it makes this solution pretty future-proof.

FWIW, it was a bit hard to find `mtbl`; I was looking for SSTable-like storage, and it seems like you're the only game in town (LevelDB / RocksDB don't really have usable public APIs for their SSTables).
If I'm exporting `mtbl_trailer` opaquely, what about renaming `mtbl_trailer` to `mtbl_metadata`, since the former is pretty unclear (i.e. it's an implementation detail that is pretty irrelevant to users)?
Also, do you want a separate manpage for `mtbl_metadata`?
Cool. Some comments:
- Does your startup latency include verifying the checksum on the index? The default is to verify, but it can be disabled with `mtbl_reader_options_set_verify_checksums()`. (Also, if you do leave it on, `mtbl` >= 0.4 takes advantage of the `CRC32` instruction on SSE4.2 CPUs, which ought to be much faster than the previous software-only CRC32C implementation, if you have such a CPU.) Without verifying the checksum, the overhead of opening an `mtbl_reader` ought to be little more than the cost of the `mmap()`.
- What compression level are you using? For random access, you end up having to decompress a whole data block at a time for each lookup (there's no data block cache). `zlib` is much slower than `snappy`, and `snappy` is a bit slower than uncompressed.
- Similarly, what data block size are you using? Smaller block sizes result in bigger files (and bigger indexes), but require reading/decompressing less data at a time in order to perform a lookup. I think by default you get 8 KB blocks and `zlib` compression. Maybe ~1-4 KB blocks and no compression (or `snappy`) would perform better if you can spare the memory and the slightly larger index. But, regardless, you still incur the cost of searching the index and then seeking within the data block, so maybe a cache is still better for your application.
- A command-line `mtbl_lookup` debugging tool to go along with `mtbl_dump` would be kind of cool.
Hm, let's keep the name `mtbl_trailer`, even though it's not particularly meaningful, and yeah, it would need a separate manpage. BTW, thanks for going to the trouble of contributing this upstream!
Hmm, I actually did the rename before you commented, and it looks pretty good.
About to send you a pull request with a new `mtbl_metadata` interface. Take a look, and if you don't like it, I'll revert the rename, which is just a couple of minutes of work.
Thanks for the comments:
- I think it's verifying checksums. The startup time is good enough already, but I'll try without. It should be using the SSE4.2 instruction (Facebook's open-source `folly` provides this in C++, so I'm aware of it).
- My file is uncompressed, for the reason you mentioned. It's on my TODO list to try `snappy`.
- Thanks, I'll try smaller block sizes!

If I were to make a contributable lookup tool, I'd send a pull request for sure, but I think it's unlikely for these reasons:
- My tool has a ton of app-specific logic, since `mtbl` is just part of the story.
- It uses C++11 somewhat heavily.

So, if you find yourself with extra cycles, you should go for it :)
Let me know what you think of the rename: #6
Some comments:
- There is no option to turn off index CRC32C. There is only an option to enable/disable block CRC32C, and it's off by default. The perf impact of enabling block CRC32C on my machine was negligible (it's lost in the noise).
- Minor nit: CRC32 checking is implemented as an assert(), which means that with `NDEBUG` defined, the check would be silently skipped. Fortunately, `mtbl` is not built with `NDEBUG`.
- Reducing the block size from 8 KB to 1 KB modestly speeds up the random fetches, and also reduces the variance. The data size increases 5%. I'm keeping the new setting.
- With 1 KB blocks, turning on `snappy` results in a 15% slowdown, but the file size is halved for my data. I might keep this.
- In my 4-million-row table of 50-byte keys and 16-byte values, `std::vector` is barely faster, and `std::map` is a bit slower than `mtbl`. Here's a chart of "number of random key queries" vs. "how much slower is `mtbl` than `std::lower_bound` on `std::vector`". This is for "fadvise WILLNEED, 1 KB blocks, no compression", which is the fastest configuration I've found for `mtbl` (although it's not very sensitive):

Notice that 7 outliers exceeding a 4x slowdown are not shown.

#leaveittommap, #inmmapwetrust

In other words, unless people are super-sensitive to latency outliers, need the last 10% of performance, or expect extreme memory pressure, they probably should not bother with an explicit memory cache for their MTBL files.
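For reference, the in-memory baseline from the comparison above (a sorted `std::vector` searched with `std::lower_bound`) can be sketched like this; the `Entry` and `lookup` names are my own:

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;

// Point lookup via binary search over entries kept sorted by key.
// This is the std::lower_bound-on-std::vector variant from the chart.
std::optional<std::string> lookup(const std::vector<Entry> &sorted,
                                  const std::string &key) {
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const Entry &e, const std::string &k) { return e.first < k; });
    if (it != sorted.end() && it->first == key)
        return it->second;
    return std::nullopt;
}
```

Building the vector requires one pass over the file plus a sort (or no sort at all, since MTBL iterators already yield keys in order).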
I think this issue can be closed now, due to the new `metadata` interface on `next`?
Certainly, thanks!