
node-lmdb's Introduction

node-lmdb

This is a node.js binding for LMDB, an extremely fast and lightweight transactional key-value store database.

Donate

About

About this module

The aim of this node module is to provide bindings so that people can use LMDB from their node applications. The goal is a simple and clean API that is on par with the LMDB API but applies JavaScript patterns and naming conventions as much as possible, so that it feels familiar to JavaScript developers.

We support zero-copy retrieval of string and binary values. Binary values are operated on via the Node.js Buffer API.

About LMDB

Here are the main highlights of LMDB; for more, visit https://www.symas.com/lmdb :)

  • Key-value store, NoSQL
  • In-process, no need to squeeze your data through a socket
  • Support for transactions and multiple databases in the same environment
  • Support for multi-threaded and multi-process use
  • Zero-copy lookup (memory map)
  • Crash-proof design

Supported platforms

  • Tested and works on Linux (author uses Fedora)
  • Tested and works on Mac OS X
  • Tested and works on Windows

License info

The node-lmdb code is licensed to you under the terms of the MIT license. LMDB itself is licensed under its own OpenLDAP public license (which is similarly permissive).

Usage

Introduction

Step 0: require the module

Just like with any other node module, the first step is to require() the module.

var lmdb = require('node-lmdb');

Step 1: create an environment

Env represents a database environment. You can create one with the new operator and after that, you must open it before you can use it. open() accepts an object literal in which you can specify the configuration options for the environment.

var env = new lmdb.Env();
env.open({
    path: __dirname + "/mydata",
    mapSize: 2*1024*1024*1024, // maximum database size
    maxDbs: 3
});

Close the environment when you no longer need it.

env.close();

Step 2: open one or more databases

An environment (Env) can contain one or more databases. Open a database with env.openDbi() which takes an object literal with which you can configure your database.

var dbi = env.openDbi({
    name: "myPrettyDatabase",
    create: true // will create if database did not exist
})

Close the database when you no longer need it.

dbi.close();

Step 3: use transactions

The basic unit of work in LMDB is a transaction, which is called Txn for short. Here is how you operate with your data. Every piece of data in LMDB is referred to by a key. You can use the methods getString(), getBinary(), getNumber() and getBoolean() to retrieve something, putString(), putBinary(), putNumber() and putBoolean() to store something and del() to delete something.

IMPORTANT: always close your transactions with abort() or commit() when you are done with them.

var txn = env.beginTxn();
var value = txn.getString(dbi, 1);

console.log(value);

if (value === null) {
    txn.putString(dbi, 1, "Hello world!");
}
else {
    txn.del(dbi, 1);
}

txn.putString(dbi, 2, "Yes, it's this simple!");
txn.commit();

Asynchronous batched operations

You can batch together a set of operations to be processed asynchronously with node-lmdb. Committing multiple operations at once can improve performance, and a batch of operations using a sync transaction (slower, but crash-proof) can be efficiently delegated to an asynchronous thread. In addition, writes can be made conditional by specifying a required value that must match for the operation to be performed, which allows deterministic atomic writes based on prior state. The batchWrite method accepts an array of write operation requests, where each operation is an object or an array. If it is an object, the supported properties are:

  • db (required) - The database to write to
  • key (required) - The key to write
  • value (optional) - If specified, this is the value to put into the entry. If absent or undefined, the operation is a delete and will remove this key. This should be a binary/buffer value.
  • ifValue (optional) - If specified, the write operation (put or delete) will only be performed if the provided ifValue matches the existing value for this entry. This should be a binary/buffer value.
  • ifExactMatch (optional) - If set to true, the conditional write requires that ifValue match the existing value exactly, byte for byte and in length. By default, ifValue may be a prefix: only the bytes in ifValue need to match (for example, if ifValue is Buffer.from([5, 2]), the conditional write is performed if the existing value starts with 5, 2).
  • ifKey (optional) - If specified, indicates the key to use for matching the conditional value. By default, the key used to match ifValue is the same key as the write operation.
  • ifDB (optional) - If specified, indicates the db to use for matching the conditional value. By default, the db used to match ifValue is the same db as the write operation.

If the write operation is specified with an array, the supported forms are:

  • A three-element array for putting data: [db, key, value] (where value is a binary/buffer)
  • A two-element array for deleting data: [db, key]
  • A four-element array for conditionally putting or deleting data: [db, key, value, ifValue] (where value and ifValue are as specified in the object definition)

When batchWrite is called, node-lmdb asynchronously creates a new write transaction, executes all the operations in the provided array (except any conditional writes whose condition failed), and commits the transaction if there were no errors. For conditional writes, if the condition did not match, the write is skipped, but the transaction is still committed. If any errors occur, the transaction is aborted. The entire transaction is created by node-lmdb and executed in a separate thread, and the callback function is called once the transaction has finished. Note that an explicit write transaction in the main JS thread can block, or be blocked by, the asynchronous transaction. For example:

env.batchWrite([
    [dbi, key1, Buffer.from("Hello")], // put in key 1
    [dbi, key2, Buffer.from("World")], // put in key 2
    [dbi, key3], // delete any entry from key 3 (can also use null as value to indicate delete)
    [dbi, key4, valuePlusOne, oldValue] // you could atomically increment by specifying the required previous state
], options, (error, results) => {
    if (error) {
        console.error(error);
    } else {
        // operations finished and transaction was committed
        let didWriteToKey4Succeed = results[3] === 0
    }
})

The callback function will either be called with an error in the first argument, or with an array in the second argument containing the results of the operations. The results array has the same length as the array of write operations, with one-to-one correspondence by position, and each value in it will be one of:

  • 0 - the operation was successfully written
  • 1 - the condition was not met (can only happen if a condition was provided)
  • 2 - attempted to delete a non-existent key (can only happen if ignoreNotFound is enabled)

The options include all the flags from put options, and this optional property:

  • progress - If provided, this should be a function that is called to report the progress of the write operations. It receives the results array, with completion values filled in for completed operations and undefined in the element positions of operations that have not yet completed. Progress events are best-effort in node; the write operations are performed in a separate thread, and progress events occur if and when node's event queue is free to run them (they are not guaranteed to fire if the main thread is busy). See the sketch below.
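
For illustration, here is a minimal sketch of passing a progress callback, assuming (as described above) that it receives the partially filled results array; dbi, key1 and key2 are hypothetical:

env.batchWrite([
    [dbi, key1, Buffer.from("Hello")],
    [dbi, key2, Buffer.from("World")]
], {
    progress: function (results) {
        // Entries that are still undefined have not been processed yet (best-effort reporting)
        var done = results.filter(function (r) { return r !== undefined; }).length;
        console.log("operations completed so far:", done);
    }
}, function (error, results) {
    if (error) {
        console.error(error);
    }
});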

Basic concepts

LMDB has four different entities:

  • Env represents a full database environment. The same environment can be used by multiple processes, but a particular Env object must be used by one process only. You can operate with the same environment from multiple threads.
  • Dbi represents a sub-database which belongs to a database environment. The same environment can contain either multiple named databases (if you specify a string name) or an unnamed database (if you specify null instead of a name).
  • Txn represents a transaction. Multiple threads can open transactions for the same Env, but a particular Txn object must only be accessed by one thread, and only one Txn object can be used on a thread at a time. (NOTE: the noTls option in the environment changes this behaviour for read-only transactions, so that a thread can create any number of read-only transactions and any number of threads can access the same read-only transaction.) Note that only one write transaction can be open in an environment at any given time; env.beginTxn() will simply block until the previous one is either commit()ted or abort()ed. See the read-only sketch after this list.
  • Cursor objects can be used to iterate through multiple keys in the same database.
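
As a sketch of the read-only case mentioned above (assuming env and dbi are already open), a read-only transaction does not block the single writer and can simply be aborted when done:

var readTxn = env.beginTxn({ readOnly: true });
var value = readTxn.getString(dbi, 1);
console.log(value);
readTxn.abort(); // read-only transactions have nothing to commit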

Here is how you use LMDB in a typical scenario:

  • You create an Env and open() it with the desired configuration options.
  • You open a Dbi by calling env.openDbi() and passing the database configuration options.
  • Now you can create Txns with env.beginTxn() and operate on the database through a transaction by calling txn.getString(), txn.putString() etc.
  • When you are done, you should either abort() or commit() your transactions and close() your databases and environment.
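
Putting the steps above together, here is a minimal end-to-end sketch using the same configuration values as the earlier snippets:

var lmdb = require('node-lmdb');

var env = new lmdb.Env();
env.open({
    path: __dirname + "/mydata",
    mapSize: 2*1024*1024*1024, // maximum database size
    maxDbs: 3
});
var dbi = env.openDbi({ name: "myPrettyDatabase", create: true });

var txn = env.beginTxn();
txn.putString(dbi, "greeting", "Hello world!");
console.log(txn.getString(dbi, "greeting")); // prints "Hello world!"
txn.commit();

dbi.close();
env.close();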

Example iteration over a database with a Cursor:

var cursor = new lmdb.Cursor(txn, dbi);

for (var found = cursor.goToFirst(); found !== null; found = cursor.goToNext()) {
    // Here 'found' contains the key, and you can get the data with eg. getCurrentString/getCurrentBinary etc.
    // ...
}

The cursor goTo methods (goToFirst, goToNext, etc.) will return the current key. When an item is not found, null is returned. Beware that the key itself could be a falsy JavaScript value, so you need to explicitly check against null with the !== operator in your loops.
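
For example, here is a sketch of reading each value inside such a loop; getCurrentString also accepts a callback that receives the key and the value, which is used here:

var cursor = new lmdb.Cursor(txn, dbi);
for (var found = cursor.goToFirst(); found !== null; found = cursor.goToNext()) {
    cursor.getCurrentString(function (key, value) {
        console.log(key, "=", value);
    });
}
cursor.close();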

Data Types in node-lmdb

LMDB is very simple and fast. node-lmdb provides functionality close to the native C API, expressed through a natural JavaScript API. To make simple things simple, node-lmdb defaults to presenting keys and values in LMDB as strings. For convenience, number, boolean and Buffer values are also supported.

The simplest way to store complex data types (such as objects) is to use JSON.stringify before putting it into the database and JSON.parse when you retrieve the data.
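
For example, a minimal sketch of such a JSON round trip (the key name is hypothetical):

var obj = { name: "node-lmdb", fast: true };
txn.putString(dbi, "myObject", JSON.stringify(obj));
var restored = JSON.parse(txn.getString(dbi, "myObject"));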

For more complex use cases, access to keys and values as binary (the node.js Buffer type) is provided. In LMDB itself, keys (with one exception) and values are simply binary sequences of bytes. You can retrieve a key or value from an LMDB database as binary even if it was written as a string. The same does not apply in reverse! Using binary access also allows interoperation with LMDB databases created by, or shared with, applications that use data serialisation formats other than UTF-16 strings (including, in particular, strings in other encodings such as UTF-8).
See our chapter Working with strings for more details.

Keys

  • Unsigned 32-bit integers: The one exception in LMDB's representation of keys is an optimisation for fixed-length keys. node-lmdb exposes this for one particular fixed-length type: unsigned 32-bit integers. To use this optimisation, specify keyIsUint32: true to openDbi (see the sketch after this list). Because the keyIsUint32: true option is passed through to LMDB and stored in the LMDB metadata for the database, a database created with this option set cannot be accessed without setting this option, and vice versa.
  • Buffers: If you pass keyIsBuffer: true, you can work with node Buffer instances as keys.
  • Strings: This is the default. You can also use keyIsString: true.
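
A minimal sketch of the unsigned 32-bit key optimisation (the database name is hypothetical):

var intDbi = env.openDbi({
    name: "countersDb", // hypothetical database name
    create: true,
    keyIsUint32: true // stored in the LMDB metadata; must be used consistently from now on
});
var txn = env.beginTxn();
txn.putString(intDbi, 42, "value stored under an unsigned 32-bit integer key");
console.log(txn.getString(intDbi, 42));
txn.commit();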

When using a cursor, keys are read from the database, so it is necessary to specify how they should be returned. The most direct mapping from the LMDB C API is a node.js Buffer (binary), but it is often more convenient to return the key as a string, so that is the default.

You can specify the key type when you open a database:

dbi = env.openDbi({
    // ... etc.
    keyIsBuffer: true
});

When working with transactions, you can override the key type passed to openDbi by providing options to put, get and del functions.
For example:

var buffer = new Buffer('48656c6c6f2c20776f726c6421', 'hex');
var key = new Buffer('key2');
txn.putBinary(dbi, key, buffer, { keyIsBuffer: true });
var data = txn.getBinary(dbi, key, { keyIsBuffer: true });
data.should.deep.equal(buffer);
txn.del(dbi, key, { keyIsBuffer: true });

Finally, when working with cursors, you can override the key type by passing similar options as the 3rd argument of the Cursor constructor:

cursor = new lmdb.Cursor(txn, dbi, { keyIsBuffer: true });

Examples

You can find some examples in the source tree. There are some basic examples, and I intend to create some advanced ones too.

The basic examples we currently have:

  • examples/1-env.js - shows basic usage of Env, Dbi and Txn operating on string values
  • examples/2-datatypes.js - shows how to use various data types for your data
  • examples/3-multiple-transactions.js - shows how LMDB will behave if you operate with multiple transactions
  • examples/4-cursors.js - shows how to work with cursors on a basic database
  • examples/5-dupsort.js - shows how to use a dupSort database with cursors
  • examples/6-asyncio.js - shows how to use the fastest (but also most dangerous) way for async IO
  • examples/7-largedb.js - shows how to work with an insanely large database
  • examples/8-multiple-cursors-single-transactions.js - shows how to use multiple cursors with a single transaction
  • examples/9-unnamed-db.js - shows how to use an unnamed database
  • examples/10-binkeycursors.js - shows how to work with cursors on a database with binary keys

Advanced examples:

  • examples/advanced1-indexing.js - this is a module pattern example which demonstrates the implementation of a search engine prototype
  • More will come later, so don't forget to check back!

Caveats

Unsafe Get Methods

Because of the nature of LMDB, the data returned by txn.getStringUnsafe(), txn.getBinaryUnsafe(), cursor.getCurrentStringUnsafe() and cursor.getCurrentBinaryUnsafe() is only valid until the next put operation or the end of the transaction. Also, with Node 14+, you must detach the buffer after using it by calling env.detachBuffer(buffer). This must be done before accessing the same entry again (or V8 will crash). If you need to use the data later, use the txn.getBinary(), txn.getString(), cursor.getCurrentBinary() and cursor.getCurrentString() methods instead. For most usage, the gain from avoiding the copy in the unsafe methods is so small as to be negligible, so the unsafe methods should generally be avoided.
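
A minimal sketch of the pattern described above (the processing function is hypothetical):

var unsafeBuf = txn.getBinaryUnsafe(dbi, key);
handleImmediately(unsafeBuf); // hypothetical function: use the data before any put or the end of the transaction
env.detachBuffer(unsafeBuf);  // required on Node 14+ before this entry is accessed again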

Working with strings

Strings can come from many different places and can have many different encodings. In the JavaScript world (and therefore the node.js world) strings are encoded in UTF-16, so every string stored with node-lmdb is also encoded in UTF-16 internally. This means that the string API (getString, putString, etc.) will only work with UTF-16 encoded strings.

If you only use strings that come from JavaScript code or other code that is a “good node citizen”, you never have to worry about encoding.

How to use other encodings

This has come up many times in discussions, so here is a way to use other encodings supported by node.js. You can use Buffers with node-lmdb, which are a very friendly way to work with binary data. They also come in handy when you store strings in your database with encodings other than UTF-16.

You can, for example, read a UTF-8 string as a buffer, and then use Buffer's toString method and specify the encoding:

// Get stored data as Buffer
var buf = txn.getBinary(dbi, key);
// Use the Buffer toString API to convert from UTF-8 to a JavaScript string
var str = buf.toString('utf8');
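
Conversely, here is a minimal sketch of storing a JavaScript string as UTF-8 by encoding it into a Buffer first:

// Encode the string as UTF-8 and store it as binary
var utf8Buf = Buffer.from('Hello, world!', 'utf8');
txn.putBinary(dbi, key, utf8Buf);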


Storing UTF-16 strings as Buffers

While node.js doesn't require UTF-16 strings to be zero-terminated, node-lmdb automatically and transparently zero-terminates every string internally. As a user, this shouldn't concern you, but if you write a string using the Buffer API and then read it back as a string, you are in for a nasty surprise.

However, it will work correctly if you manually add the terminating zero to your buffer.

Conceptually, something like this will work:

// The string we want to store using a buffer
var expectedString = 'Hello world!';

// node-lmdb internally stores a terminating zero, so we need to manually emulate that here
// NOTE: this would NEVER work without 'utf16le'!
var buf = Buffer.from(expectedString + '\0', 'utf16le');

// Store data as binary
txn.putBinary(dbi, key, buf);
      
// Retrieve same data as string and check
var data3 = txn.getString(dbi, key);

// At this point, data3 is equal to expectedString

Build Options

A few LMDB options are available at build time and can be specified as options to npm install (which can also be set in your package.json install script):

  • npm install --use_vl32=true - enables LMDB's VL32 mode when running on a 32-bit architecture, which adds support for large (multi-GB) databases on 32-bit architectures.
  • npm install --use_fixed_size=true - enables LMDB's fixed-size option when running on Windows, which causes Windows to allocate the full file size needed for the memory-mapped allocation. The default behaviour of dynamically growing the file size with the allocated memory map, while convenient, uses a non-standard Windows API and can cause significant performance degradation; the fixed-size option gives much more stable performance on Windows (consider using lmdb-store on top of node-lmdb for automated memory-map growth).

On MacOS, there is a default limit of 10 robust locked semaphores, which limits the number of open write transactions (if you have more than 10 db environments with a write transaction). If you need more concurrent write transactions, you can increase your maximum semaphore undo count by setting kern.sysv.semmnu on your local computer, or you can build with POSIX semaphores using npm install --use_posix_semaphores=true. However, POSIX semaphores are not robust semaphores, which means that if you are running multiple processes and one crashes in the midst of a transaction, it may block other processes from starting a transaction on that environment. Alternatively, try to minimize overlapping transactions and/or reduce the number of db environments (and use more databases within each environment).

Limitations of node-lmdb

  • Fixed address map (called MDB_FIXEDMAP in C) features are not exposed by this binding because they are highly experimental
  • There is no option to specify a custom key comparison method, so if the order of traversal is important, the key must be constructed so as to be correctly ordered under lexicographical comparison of its binary byte sequence (LMDB's default comparison method); a key-construction sketch follows this list. While LMDB itself does allow custom comparisons, exposing this through a language binding is not recommended by LMDB's author. The validity of the database depends on a consistent key comparison function, so it is not appropriate to use this customisation except in very specialised use cases - exposing this customisation point would encourage misuse and potential database corruption. In any case, LMDB performance is very sensitive to comparison performance, and many of the advantages of using LMDB would be lost were a complex (and non-native) comparison function used.
  • Not all functions are wrapped by the binding yet. If there's one that you would like to see, drop me a line.
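
As an illustration of the key-construction point above, here is a minimal sketch (not part of node-lmdb itself) that makes numeric keys sort correctly under byte-wise comparison by encoding them as fixed-width big-endian buffers; it assumes a database opened with keyIsBuffer: true:

function numberToKey(n) {
    var buf = Buffer.alloc(8);
    buf.writeBigUInt64BE(BigInt(n)); // fixed width + big-endian, so byte order matches numeric order
    return buf;
}

txn.putString(dbi, numberToKey(7), "seven", { keyIsBuffer: true });
txn.putString(dbi, numberToKey(42), "forty-two", { keyIsBuffer: true });
// A cursor over this database now visits 7 before 42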

Contributing

If you find problems with this module, open an issue on GitHub. Also feel free to send me pull requests. Contributions are more than welcome! :)

Building node-lmdb

LMDB is bundled in node-lmdb so you can simply build this module using node-gyp.

# Install node-gyp globally (needs admin permissions)
npm -g install node-gyp

# Clone node-lmdb
git clone git@github.com:Venemo/node-lmdb.git

# Go to node-lmdb directory
cd node-lmdb

# First, you need to download all the dependencies
npm install

# Once you have all the dependencies, the build is this simple
node-gyp configure
node-gyp build

Building node-lmdb on Windows

Windows isn't such a great platform for native node addons, but it can be made to work. See this very informative thread: nodejs/node-gyp#629

  1. Install latest .NET Framework (v4.6.2 at the time of writing)
  2. Install latest node.js (v7.9.0 at the time of writing).
  3. This is Windows. Reboot.
  4. Now open a node.js command prompt as administrator and run the following commands.
    NOTE: these commands WILL take a LOT of time. Please be patient.
npm -g install windows-build-tools
npm -g install node-gyp
npm -g install mocha
npm config set msvs_version 2015 --global

After this, close the command prompt and open a new one (so that changes to PATH and whatever else can take proper effect). At this point you should have all the necessary junk for Windows to be able to handle the build. (You won't need to run node as administrator anymore.) Note that windows-build-tools will silently fail to install if you don't have the .NET Framework installed on your machine.

  5. Add python2 to PATH. Note that windows-build-tools installed python2 (v2.7.x) for you already, so the easiest way is to use "Change installation" in the Control Panel, select "Change" and then "Add python.exe to PATH".
  6. This is Windows. Reboot again just to be sure.

Congrats! Now you can work with native node.js modules.

When you are building node-lmdb for the first time, you need to install node-lmdb's dependencies with npm install:

cd node-lmdb
npm install

Note that npm install will also attempt to build the module. However, once you have all the dependencies, you only need to do the following for a build:

cd node-lmdb
node-gyp configure
node-gyp build

Managing the LMDB dependency

# Adding upstream LMDB as remote
git remote add lmdb https://git.openldap.org/openldap/openldap.git
# Fetch new remote
git fetch lmdb
# Adding the subtree (when it's not there yet)
git subtree add  --prefix=dependencies/lmdb lmdb mdb.master --squash
# Updating the subtree (when already added)
git subtree pull --prefix=dependencies/lmdb lmdb mdb.master --squash

Developer FAQ

How fast is this stuff?

LMDB is one of the fastest databases on the planet, because it's in-process and zero-copy, which means it runs within your app, and not somewhere else, so it doesn't push your data through sockets and can retrieve your data without copying it in memory.

We don't have any benchmarks for node-lmdb, but you can enjoy a detailed benchmark of LMDB here: http://symas.com/mdb/microbench/. Obviously, the V8 wrapper will have some negative impact on performance, but I wouldn't expect a significant difference.

Why is the code so ugly?

Unfortunately, writing C++ addons to Node.js (and V8) requires a special pattern (as described in their docs) which most developers might find ugly. Fortunately, we've done this work for you so you can enjoy LMDB without the need to code C++.

How does this module work?

It glues together LMDB and Node.js with a native Node.js addon that wraps the LMDB C API.

Zero-copy is implemented for string and binary values via a V8 custom external string resource and the Node.js Buffer class.

How did you do it?

These are the places I got my knowledge when developing node-lmdb:

Acknowledgements

Below you can find a list of people who have contributed (in alphabetical order). Big thank you to everybody!
(NOTE: if you think your name should be here, but isn't, please contact the author.)

  • @aholstenson (Andreas Holstenson)
  • @antoinevw
  • @b-ono
  • @braydonf (Braydon Fuller)
  • @da77a
  • @erichocean (Erich Ocean)
  • @jahewson (John Hewson)
  • @jeffesquivels (Jeffrey Esquivel S.)
  • @justmoon (Stefan Thomas)
  • @kriszyp (Kris Zyp)
  • @Matt-Esch
  • @oliverzy (Oliver Zhou)
  • @paberr (Pascal Berrang)
  • @rneilson (Raymond Neilson)

Support

node-lmdb is licensed to you under the terms of the MIT license, which means it comes with no warranty by default.

However,

  • LMDB: Symas (the authors of LMDB) offers commercial support of LMDB.
  • node-lmdb: If you have urgent issues with node-lmdb or would like to get support, you can contact @Venemo (the node-lmdb author).

You can also consider donating to support node-lmdb development:

Donate


node-lmdb's Issues

cursor.getCurrentString never called

Hey, I'm playing around with your cursor example and noticed that cursor.getCurrentString is never called and therefore the while loop never stops when you run it against an empty database.

callbacks supported?

Hi,

Trying to figure out how this works.. I was expecting to see callbacks, for instance when getting data txn.getString(dbi, 'abc', function(data) { ... })

... but instead there are synchronous calls everywhere?

Not async

Hi,

It appears that none of the functions are async based, I'd have at least expected the open/close calls to be async. Can you confirm?

MDB_READERS_FULL - maxReaders defaulting to wrong value?

When running in multi-process mode I am getting the error MDB_READERS_FULL.

Looking through the code env.cpp seems to default it to 1

    // Parse the maxDbs option
    rc = applyUint32Setting<unsigned>(&mdb_env_set_maxreaders, ew->env, options, 1, "maxReaders");

however, mdb.c seems to default it to 126

    /** Number of slots in the reader table.
     *  This value was chosen somewhat arbitrarily. 126 readers plus a
     *  couple mutexes fit exactly into 8KB on my development machine.
     *  Applications should set the table size using #mdb_env_set_maxreaders().
     */
#define DEFAULT_READERS 126

also, adding maxReaders:126 to the env.options seems to fix it.
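
A minimal sketch of the workaround mentioned above (the path and other options are hypothetical):

env.open({
    path: "./mydata",
    maxDbs: 3,
    maxReaders: 126 // explicitly set the value LMDB itself would default to
});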

utf8 keys rather than utf16?

Is there any reason why keys cannot be stored in UTF8 format rather than UTF16?

Under the lmdb hood they're just a bunch of bytes and as most keys tend to be Latin using 2 bytes per char seems pretty wasteful.

If you have 30M+ keys (which we do) it can add up to much larger databases which need to be paged in and out of memory.

Support node.js Buffer as key

Hi,

Currently this module only supports string/uint as key, it's limited. For example, I would like to use float/double as key and keep them sorted.

bytewise module can serialize any structures into buffer including float/double.

Build / Install

Do you have build instructions? I can't seem to install this with NPM.

Kudos on bringing LMDB to Node.

Opening second readOnly transaction throws "MDB_BAD_RSLOT"

I am trying to do multiple concurrent reads on my environment, however if I try to open a second transaction with the readOnly flag this error gets thrown

Error: MDB_BAD_RSLOT: Invalid reuse of reader locktable slot

When opening both transactions without the readOnly flag the process simply freezes on the second call to txn.beginTxn()

The only way two concurrent transactions can exist is when one is opened with the readOnly flag and the other one without; however, opening any other transaction on top of those two will lead to one of the errors mentioned above.

The behaviour can easily be reproduced in the example example3-multiple-transactions.js by either removing the readOnly flag from the txn.beginTxn() call on line 28 or adding it to the call on line 33 (which of course will lead to an error in line 35, but it nonetheless shows the erroneous behaviour)

I would expect an error from opening a second transaction without the readOnly flag, as two write transactions are not allowed, but readOnly transactions should not be limited in number, as the lmdb documentation states:

One simultaneous write transaction is allowed, however there is no limit on the number of read transactions even when a write transaction exists.

Node.js 6.x problems

All the examples show lots of warnings like this

(node) v8::ObjectTemplate::Set() with non-primitive values is deprecated
(node) and will stop working in the next major release.

followed by huge stack traces. When will node 6.x get support?

String transactions copy/destructor

If you need to use the data later, you will have to copy it for yourself.

How do you recommend copying the strings?

I was hoping there would be a transactional method that would create a transaction, read a string and then close the transaction in the external string resource destructor.

make module avaliable through npm

The module does not seem to be available through npm which is rather inconvenient.

Search for lmdb on npmjs gives something leveldb-related instead.

Update: just saw in an older closed issue you don't consider the module to be "ready". Would still be good to add it which would allow folks like me to do some testing more easily.

Require statement include direct path to bindings

Requiring node-lmdb currently requires the full path:

var lmdb = require('node-lmdb/build/Release/node-lmdb');

When switching to bindings that are compiled with the debug flag, the require statement needs to be updated to:

var lmdb = require('node-lmdb/build/Debug/node-lmdb');

This could be handled automatically with the bindings module or prebuild, making it compatible with both via:

var lmdb = require('node-lmdb');

Cursor is leaking memory

Hey,
I've discovered a memory leak in the lmdb.Cursor and updated the cursor example to reproduce the leak outside of my application.
After that I used the Instruments Memory Leak template and attached it to the running node process to dig a bit deeper, and saw an obvious and continuously growing number of CustomExternalStringResource instances.
I hope the following screenshots can help to fix this leak and make this lmdb module ready for long-term, heavy usage.

cheers

(Screenshots attached: 2014-08-18 at 4:36 AM and 4:37 AM.)

Test on Mac

Test if node-lmdb can be built on Mac OS X and test it by running the example code in the repository.

Build with node-gyp build

In the readme it says to build with:

node-gyp configure
make -C build -j4

But this really should be:

node-gyp configure
node-gyp build

What are the -C and -j4 flags doing?

Encoding problem? utf8 string written and retrieved results in half a string padded with zeroes

(let me start by saying I'm confused. I've been using node-lmdb for several weeks apparently without a problem. As far as I can tell, the issue I'm having occurs only on my (development) Mac, and my (production) debian machine works fine)

    var fs = require("fs"),
        lmdb = require('node-lmdb');

    var env = new lmdb.Env();
    env.open({
        path: "./lmdb",
        mapSize: 2*1024*1024*1024, // maximum database size
        maxDbs: 2
    });
    var dbi = env.openDbi({
        name: "moindb",
        create: true
    })

    var nm='name',
        s1 = "This is an ordinary string";
    fs.writeFileSync('1st', s1, {encoding:'utf8'});
    var txnw = env.beginTxn();
    txnw.putString(dbi,nm,s1);
    txnw.commit();
    var txnr = env.beginTxn({ readOnly: true }),
        s2 = txnr.getString(dbi,nm);
    txnr.commit();
    fs.writeFileSync('2nd', s2, {encoding:'utf8'});

The output in file 1st is: This is an ordinary string, but in file 2nd: T�h�i�s� �i�s� �a�n� �o�r�. That file has the same length, with invisible characters between the visible ones. In another editor it looks like: T^@h^@i...

Seems like an encoding issue, or does this have to do with the dire warning in the README: 'Because of the nature of LMDB, the data returned by txn.getString() and txn.getBinary() is only valid until the next put operation or the end of the transaction.'

Any suggestion on how to get around this?

Test on Windows

Test if node-lmdb can be built on Windows and test it by running the example code in the repository.

Core dumps when dbi functions are assigned to a local, then called

If I take e.g. dbi.close (or seemingly dbi.anything, really) and assign it to a local variable, calling that local variable crashes node-lmdb with a cryptic error.

Here's a test-case:

var lmdb = require("node-lmdb");
var env = new lmdb.Env();

console.log("Opening env");
env.open({
    "path"    : "./testdb",
    "max-dbs" : 1
});

console.log("Opening dbi");
dbi = env.openDbi({
    name : "test1",
    create : true
});

/*
// Works:
console.log("Closing dbi");
dbi.close();
*/

// Dies:
console.log("Assigning");
var close = dbi.close;

console.log("Closing dbi");
close(); // <-- Here

This is the complete output, running the above example from zsh with node on a 64-bit Linux system:

Opening env
Opening dbi
Assigning
Closing dbi
node: /home/aku/.node-gyp/0.10.22/src/node_object_wrap.h:61: static T* node::ObjectWrap::Unwrap(v8::Handle<v8::Object>) [with T = DbiWrap]: Assertion `handle->InternalFieldCount() > 0' failed.
zsh: abort (core dumped)  node lmdb-closure-bug-testcase.js

JS functions are first class, so this is a sensible operation in JS-land, but it seems to confuse the underlying C++.

I'm on 7a12bd3 (currently newest master) and node -v is v0.10.24.

Callback interface is expensive

A callback-based interface adds a significant amount of overhead. The synchronous methods should ideally return values instead of using a callback.

mojibake string keys

I'm writing data to LMDB with another program. the keys are utf8 strings (in my case, really just ascii) and the values are snappy-encoded json.

I'm trying to get a node script to scan and process the same database, but the strings come out garbled. The values in the database can be read and decoded correctly with cursor.getCurrentBinary(), but the keys are incorrectly assumed to be utf-16 strings.

Could some option be exposed to set the key encoding?

edit: there is a manual workaround using buffers.

cursor.getCurrentBinary(function(k, v) {
    var kutf8 = new Buffer(k, 'utf-16le').toString('utf-8');
    // ...
});

How to list dbis?

Thanks for this great lmdb binding. I was not able to find how to list current dbis / subdatabases in an environment. Sorry if this is obvious I am new to lmdb.

Zero-copy strings lead to segmentation fault after closing the environment

I recently ran into an issue where NodeJS reproducible crashed with an Segmentation fault error when trying to access data read from an node-lmdb environment.

The scenario was this: Small LMDB environment, 3 test entries keyIsUint32 was set to false, UTF8 string with about 12 characters where used as keys. Data was binary buffer.

The data was read with a cursor using getCurrentBinary and was stored in a Map using the key returned by the cursor as key and the binary buffer as data.

After all the data was read, the cursor, txn, dbi and environment were properly closed and the Map was returned to the calling code.

However upon accessing the keys of the map (mind not the data, only the key), the node process would quit with a segmentation fault error.
I debugged the node process with lldb and found that there was an attempt to read invalid memory, causing an EXC_BAD_ACCESS error.

I then realised that this must be due to the zero-copy feature V8 uses for strings, which means that the string of the key actually resides in the memory used by LMDB and only a reference is passed to JavaScript. Therefore the reference points to invalid memory once the environment is closed and LMDB frees the memory.

The solution therefore is quite simple: copy the content of the key to a new memory address by calling new Buffer(key) and then make that new Buffer a string again.
After doing so, the string can safely be accessed.
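
A minimal sketch of that copy, assuming key holds the zero-copy string (Buffer.from is the modern replacement for new Buffer):

// Force a copy of the zero-copy key string before the environment is closed
var safeKey = Buffer.from(key).toString();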

I have a few questions about this:

  1. Is the above conclusion correct and would you suggest another solution?
  2. Why does this not affect the binary buffer read from LMDB? The README states that both, string and binary are provided via zero-copy, but accessing the binary buffer does not cause the segmentation fault.

I would greatly appreciate if this was mentioned clearly in the README and/or the examples as this might be very troublesome to debug for most.
If you want, I'll prepare a PR with an updated README and examples.

PS: This behaviour can easily be reproduced with example 4-cursors.js by storing the key and value in printFunc in a new Map and logging this Map after env.close() was called

Error building on Raspberry Pi (and a solution)

I tried to build node-lmdb on a Raspberry Pi, following your instructions. But unfortunately the build failed. After some google-ing around I found that I could get it working by opening the file “binding.gyp”; in the fragment

"cflags": [
"-fPIC",
"-fvisibility-inlines-hidden",
"-O3",
"-std=c++11"
]
I had to change “-std=c++11” to “-std=c++0x”. After that it built without further errors.

Now I don't pretend to understand why this works (or why the build failed in the first place). But perhaps this information is useful to you in further developing node-lmdb. This build error was the only error I got; until now node-lmdb has been working flawlessly for me (see: http://raspberryadventures.in/index.php/installing-and-using-lmdb-on-the-raspberry-pi/)

Proper use of Ref and Unref

This is a followup to the discussion at #20 - I've finally figured out the real reason behind the problem.

The cursors example does something like this:

var dbi1 = env.openDbi({ name: "hello", create: true });
dbi1.drop();
var dbi2 = env.openDbi({ name: "hello", create: true });
// use dbi2

The real issue is NOT that the V8 GC ate dbi2 but that it (correctly) destroyed dbi1 which in turn called mdb_dbi_close on the MDB_dbi, while that MDB_dbi was still in use by dbi2.

The correct solution to this problem:

  • Never call mdb_dbi_close as it is not required (confirmed by LMDB authors); thus eliminating the above issue

To take correct care of the GC:

  • DbiWrap needs to Ref and Unref the EnvWrap it uses
  • TxnWrap needs to Ref and Unref the EnvWrap it uses
  • CursorWrap needs to Ref and Unref the DbiWrap and TxnWrap it uses

C++11 doesn't compile on Centos GCC 4.4

Cannot get the LMDB to compile. here are my server details:

We are running:
CentOS 6.5 x86_64
gcc version 4.4.7
node 0.10.28

and here is the failed build log

root@dev01-ident node-lmdb]# node-gyp build
gyp info it worked if it ends with ok
gyp info using [email protected]
gyp info using [email protected] | linux | x64
gyp info spawn make
gyp info spawn args [ 'BUILDTYPE=Release', '-C', 'build' ]
make: Entering directory `/root/node-lmdb/build'
 CC(target) Release/obj.target/node-lmdb/libraries/liblmdb/mdb.o
cc1: warning: command line option "-std=c++0x" is valid for C++/ObjC++ but not for C
cc1: warning: command line option "-fvisibility-inlines-hidden" is valid for C++/ObjC++ but not for C
 CC(target) Release/obj.target/node-lmdb/libraries/liblmdb/midl.o
cc1: warning: command line option "-std=c++0x" is valid for C++/ObjC++ but not for C
cc1: warning: command line option "-fvisibility-inlines-hidden" is valid for C++/ObjC++ but not for C
 CXX(target) Release/obj.target/node-lmdb/src/node-lmdb.o
 CXX(target) Release/obj.target/node-lmdb/src/env.o
../src/env.cpp: In constructor ‘EnvWrap::EnvWrap()’:
../src/env.cpp:42: error: ‘nullptr’ was not declared in this scope
../src/env.cpp: In static member function ‘static v8::Handle<v8::Value> EnvWrap::open(const v8::Arguments&)’:
../src/env.cpp:146: error: ‘nullptr’ was not declared in this scope
../src/env.cpp: In static member function ‘static v8::Handle<v8::Value> EnvWrap::close(const v8::Arguments&)’:
../src/env.cpp:166: error: ‘nullptr’ was not declared in this scope
../src/env.cpp: In static member function ‘static v8::Handle<v8::Value> EnvWrap::sync(const v8::Arguments&)’:
../src/env.cpp:209: error: expected primary-expression before ‘[’ token
../src/env.cpp:209: error: expected primary-expression before ‘]’ token
../src/env.cpp:209: error: expected primary-expression before ‘*’ token
../src/env.cpp:209: error: ‘request’ was not declared in this scope
../src/env.cpp:209: error: expected unqualified-id before ‘void’
../src/env.cpp:213: error: expected primary-expression before ‘[’ token
../src/env.cpp:213: error: expected primary-expression before ‘]’ token
../src/env.cpp:213: error: expected primary-expression before ‘*’ token
../src/env.cpp:213: error: expected primary-expression before ‘int’
../src/env.cpp:213: error: expected unqualified-id before ‘void’
make: *** [Release/obj.target/node-lmdb/src/env.o] Error 1
make: Leaving directory `/root/node-lmdb/build'
gyp ERR! build error 
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack     at ChildProcess.onExit (/usr/lib/node_modules/node-gyp/lib/build.js:267:23)
gyp ERR! stack     at ChildProcess.EventEmitter.emit (events.js:98:17)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (child_process.js:807:12)
gyp ERR! System Linux 2.6.32-431.17.1.el6.x86_64
gyp ERR! command "node" "/usr/bin/node-gyp" "build"
gyp ERR! cwd /root/node-lmdb
gyp ERR! node -v v0.10.28
gyp ERR! node-gyp -v v0.13.1
gyp ERR! not ok

I ran it by my server host technician, and he did not fair any better.

valToBinary() versus valToString()

I've been messing around with "String::NewExternal(new CustomExternalStringResource(&data))" versus "Buffer::New((char*)data.mv_data, data.mv_size, [](char*, void*) -> void { /* Don't need to do anything here, because the data belongs to LMDB anyway */ }, NULL)->handle_" and have come to the conclusion that with 4KB data the former is about four times faster. It's as if Buffer::New() is not doing the zero copy... Have you done any performance tests? -- Simon

node v.0.11.x support

Heyhey,
we stumbled across an error when trying to build the module for node v.0.11:

../src/node-lmdb.h:58:30: error: expected class name
class EnvWrap : public node::ObjectWrap {

This seems to be due to the api changes for native modules when changing from v.0.10.x to v.0.11.x. Do you plan to port node-lmdb to v.0.11 in the very near future? This would be great!
Maybe this could help: nan

Thanks a lot and cheers!

Test coverage

While there are many great examples, there isn't automated test coverage. Many of the examples could be converted into tests that could be run with continuous integration.

License?

There doesn't seem to be a license file or information for node-lmdb (though I see one for lmdb itself I think). Can you please provide one for node-lmdb?

Thanks

Handling two write transactions attempted

As there can only be one write transaction at a time, attempting to begin two write transactions on the same environment will lock up the process in a way the code cannot exit from without being terminated externally.

var txn1 = env.beginTxn();
var txn2 = env.beginTxn();

A few ideas:

  • throw an error on the second attempt
  • return null
  • asynchronous callback option that could wait until a write transaction could be obtained

first example does not work

Hi,

I'm trying to run the first example but it fails, see output:

node v0.8.25:

node example1-env.js

module.js:485
process.dlopen(filename, module.exports);
^
Error: /home/awel/dev/lmdb/node_modules/node-lmdb/build/Release/node-lmdb.node: undefined symbol: init
at Object.Module._extensions..node (module.js:485:11)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Module.require (module.js:362:17)
at require (module.js:378:17)
at Object. (/home/awel/dev/lmdb/node_modules/node-lmdb/example1-env.js:3:12)
at Module._compile (module.js:449:26)
at Object.Module._extensions..js (module.js:467:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)

node v0.10.12:

node example1-env.js
Current lmdb version is { versionString: 'MDB 0.9.7: (January 10, 2013)',
major: 0,
minor: 9,
patch: 7 }

/home/awel/dev/lmdb/node_modules/node-lmdb/example1-env.js:12
env.open({
^
Error: No such file or directory
at Object. (/home/awel/dev/lmdb/node_modules/node-lmdb/example1-env.js:12:5)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:901:3

both under Linux Mint 15

Any thoughts?

Publish v0.4.0

v0.3.0 is the latest published version, v0.4.0 patches a memory leak

Could not build on OS X

$ node-gyp build
gyp info it worked if it ends with ok
gyp info using [email protected]
gyp info using [email protected] | darwin | x64
gyp WARN download NVM_NODEJS_ORG_MIRROR is deprecated and will be removed in node-gyp v4, please use NODEJS_ORG_MIRROR
gyp info spawn make
gyp info spawn args [ 'BUILDTYPE=Release', '-C', 'build' ]
CC(target) Release/obj.target/node-lmdb/libraries/liblmdb/mdb.o
../libraries/liblmdb/mdb.c:9671:46: warning: data argument not used by format string [-Wformat-extra-args]
(int)mr[i].mr_pid, (size_t)mr[i].mr_tid, txnid);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
__builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
^
1 warning generated.
CC(target) Release/obj.target/node-lmdb/libraries/liblmdb/midl.o
CXX(target) Release/obj.target/node-lmdb/src/node-lmdb.o
CXX(target) Release/obj.target/node-lmdb/src/env.o
../src/env.cpp:199:23: error: no matching constructor for initialization of 'Nan::Callback'
d->callback = new Nan::Callback(callback);
^ ~~~~~~~~
../node_modules/nan/nan.h:1361:12: note: candidate constructor not viable: no known conversion from 'Handle<v8::Function>' to 'const v8::Local<v8::Function>' for 1st argument
explicit Callback(const v8::Local<v8::Function> &fn) {
^
../node_modules/nan/nan.h:1439:33: note: candidate constructor not viable: no known conversion from 'Handle<v8::Function>' to 'const Nan::Callback' for 1st argument
NAN_DISALLOW_ASSIGN_COPY_MOVE(Callback)
^
../node_modules/nan/nan.h:129:23: note: expanded from macro 'NAN_DISALLOW_ASSIGN_COPY_MOVE'
NAN_DISALLOW_COPY(CLASS) \
^
../node_modules/nan/nan.h:105:35: note: expanded from macro 'NAN_DISALLOW_COPY'
# define NAN_DISALLOW_COPY(CLASS) CLASS(const CLASS&) = delete;
^
../node_modules/nan/nan.h:1439:33: note: candidate constructor not viable: no known conversion from 'Handle<v8::Function>' to 'Nan::Callback' for 1st argument
NAN_DISALLOW_ASSIGN_COPY_MOVE(Callback)
^
../node_modules/nan/nan.h:130:23: note: expanded from macro 'NAN_DISALLOW_ASSIGN_COPY_MOVE'
NAN_DISALLOW_MOVE(CLASS)
^
../node_modules/nan/nan.h:107:5: note: expanded from macro 'NAN_DISALLOW_MOVE'
CLASS(CLASS&&) = delete; /* NOLINT(build/c++11) */ \
^
../node_modules/nan/nan.h:1355:3: note: candidate constructor not viable: requires 0 arguments, but 1 was provided
Callback() {
^
../src/env.cpp:220:22: error: no matching member function for call to 'Call'
d->callback->Call(argc, argv);
~~~~~~~~~~~~~^~~~
../node_modules/nan/nan.h:1429:3: note: candidate function not viable: no known conversion from 'Handle<v8::Value> [1]' to 'v8::Local<v8::Value> *' for 2nd argument
Call(int argc, v8::Local<v8::Value> argv[]) const {
^
../node_modules/nan/nan.h:1417:3: note: candidate function not viable: requires 3 arguments, but 2 were provided
Call(v8::Local<v8::Object> target
^
2 errors generated.
make: *** [Release/obj.target/node-lmdb/src/env.o] Error 1

sync version of cursor getters

Currently the cursor get functions getCurrentString, getCurrentNumber, getCurrentBinary and getCurrentBoolean return the value asynchronously.

Would it be possible to add versions that return the value synchronously in a similar way to the general get functions as it would make iterator code much less complex?

get current cursor as binary

lmdb dbs created by other modules (e.g. rvagg) store their cursors as binary data.

For our importing script I was attempting to get around this by casting the string cursors from node-lmdb to binary and then to utf8, this works some of the time but always leads to corruption.

A simple function to getTheCurrentCursorAsBinary would seem to get around this.

We hacked together a fix as such and it's working for us so far.

Reliable way to copy string?

As you have said in doc:

the data returned by txn.getString() and txn.getBinary() is only valid until the next put operation or the end of the transaction. If you need to use the data later, you will have to copy it for yourself.

What is the most reliable way to copy strings before closing environment and use that string later?

I have tested that ""+string and string.substr(0) would not work and (" "+string).slice(1) would work.

Add support for mapSize larger than 2^32 - 1

mapSize settings larger than 2^32 - 1 don't work and are silently ignored.
However, file sizes of 80GB and more seem to be supported by LMDB.

Below is example code that triggers the error on Linux and OSX.

var lmdb = require('node-lmdb');

var env = new lmdb.Env();
env.open({ path: '.', mapSize: 16 * 1024 * 1024 * 1024 });
var dbi = env.openDbi({ name: 'test', create: true });

try {
  while (true) {
    var txn = env.beginTxn();
    txn.putString(dbi, randomString(128), randomString(512));
    txn.commit();
  }
}
catch (error) {
  console.log('database size', require('fs').statSync('data.mdb').size / 1024 / 1024, 'MB');
  console.log(error);
}

function randomString(length) {
  var result = '';
  while (length-- > 0)
    result += String.fromCharCode(97 + Math.floor(Math.random() * 26));
  return result;
}

It gives the following output:

database size 9.99609375 MB
[Error: MDB_MAP_FULL: Environment mapsize limit reached]

Incorrect `dupFixed` argument, always true with `dupSort`

While doing testing, I've noticed that dupFixed is always being set to true when dupSort is specified. For example:

var testDbi = env.openDbi({
  name: 'test',
  create: true,
  dupSort: true
});

And then putting two values with different sizes:

var value1 = new Buffer(new Array(8));
var value2 = new Buffer(new Array(4));
txn.putBinary(testDbi, key, value1);
txn.putBinary(testDbi, key, value2);

Will give the error:

Error: MDB_BAD_VALSIZE: Unsupported size of key/DB name/data, or wrong DUPFIXED size

From reading documentation the size of the value should only need to be fixed when MDB_DUPFIXED is specified.
