lucaong / cubdb

Elixir embedded key/value database

License: Apache License 2.0

Elixir 99.58% Makefile 0.42%

database embedded elixir key-value key-value-store atomic-transactions nerves acid mvcc

cubdb's Introduction

CubDB is an embedded key-value database for the Elixir language. It is designed for robustness, and for minimal need of resources.

Head to the API reference for usage details, or read the Frequently Asked Questions and the How To section for more information.

Features

Both keys and values can be any Elixir (or Erlang) term.
Basic get, put, and delete operations, selection of ranges of entries sorted by key with select.
Atomic, Consistent, Isolated, Durable (ACID) transactions.
Multi-version concurrency control (MVCC), allowing concurrent read operations that neither block writes nor are blocked by them.
Unexpected shutdowns or crashes won't corrupt the database or break atomicity of transactions.
Manual or automatic compaction to reclaim disk space.

To ensure consistency, performance, and robustness to data corruption, the CubDB database file uses an append-only, immutable B-tree data structure. Entries are never changed in place, and read operations are performed on zero-cost immutable snapshots.

Usage

Start CubDB by specifying a directory for its database file (it will be created if it does not exist):

{:ok, db} = CubDB.start_link(data_dir: "my/data/directory")

Important

Avoid starting multiple CubDB processes on the same data directory. Only one CubDB process should use a specific data directory at any time.

get, put, and delete operations work as you probably expect:

CubDB.put(db, :foo, "some value")
#=> :ok

CubDB.get(db, :foo)
#=> "some value"

CubDB.delete(db, :foo)
#=> :ok

CubDB.get(db, :foo)
#=> nil

Multiple operations can be performed atomically with the transaction function and the CubDB.Tx module:

# Swapping `:a` and `:b` atomically:
CubDB.transaction(db, fn tx ->
  a = CubDB.Tx.get(tx, :a)
  b = CubDB.Tx.get(tx, :b)

  tx = CubDB.Tx.put(tx, :a, b)
  tx = CubDB.Tx.put(tx, :b, a)

  {:commit, tx, :ok}
end)
#=> :ok

Alternatively, it is possible to use put_multi, delete_multi, and the other [...]_multi functions, which also guarantee atomicity:

CubDB.put_multi(db, [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8])
#=> :ok

Ranges of entries sorted by key are retrieved using select:

CubDB.select(db, min_key: :b, max_key: :e) |> Enum.to_list()
#=> [b: 2, c: 3, d: 4, e: 5]

The select function can select entries in normal or reverse order, and returns a lazy stream, so one can use functions in the Stream and Enum modules to map, filter, and transform the result, only fetching from the database the relevant entries:

# Integer.is_even/1 is a macro, so the Integer module must be required first:
require Integer

# Take the sum of the last 3 even values:
CubDB.select(db, reverse: true) # select entries in reverse order
|> Stream.map(fn {_key, value} -> value end) # discard the key and keep only the value
|> Stream.filter(fn value -> is_integer(value) && Integer.is_even(value) end) # filter only even integers
|> Stream.take(3) # take the first 3 values
|> Enum.sum() # sum the values
#=> 18

Read-only snapshots are useful when one needs to perform several reads or selects, ensuring isolation from concurrent writes, but without blocking them. When nothing needs to be written, using a snapshot is preferable to using a transaction, because it will not block writes.

Snapshots come at no cost: nothing is actually copied or written on disk or in memory, apart from some small internal bookkeeping. After obtaining a snapshot with with_snapshot, one can read from it using the functions in the CubDB.Snapshot module:

# the key of y depends on the value of x, so we ensure consistency by getting
# both entries from the same snapshot, isolating from the effects of concurrent
# writes
{x, y} = CubDB.with_snapshot(db, fn snap ->
  x = CubDB.Snapshot.get(snap, :x)
  y = CubDB.Snapshot.get(snap, x)

  {x, y}
end)

The functions that read multiple entries like get_multi, select, etc. are internally using a snapshot, so they always ensure consistency and isolation from concurrent writes, implementing multi version concurrency control (MVCC).

For more details, read the API documentation.

Installation

CubDB can be installed by adding :cubdb to your list of dependencies in mix.exs:

def deps do
  [
    {:cubdb, "~> 2.0.2"}
  ]
end

Acknowledgement

The file data structure used by CubDB is inspired by CouchDB. A big thanks goes to the CouchDB maintainers for the readable codebase and extensive documentation.

Copyright and License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

cubdb's Issues

transactions

Hi,

Is there a way to run a transaction where, instead of using put_multi, I could issue many put calls?

Thank you.
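For reference, the README's transaction function together with CubDB.Tx.put appears to cover this case. The following is a minimal sketch, assuming CubDB 2.0.x is installed through Mix.install and using a throwaway temporary directory (both are choices made for this example only):

```elixir
Mix.install([{:cubdb, "~> 2.0.2"}])

# Throwaway data directory, used only for this example
data_dir =
  Path.join(System.tmp_dir!(), "cubdb_tx_example_#{System.unique_integer([:positive])}")

{:ok, db} = CubDB.start_link(data_dir: data_dir)

# Several individual puts, committed atomically as one transaction
result =
  CubDB.transaction(db, fn tx ->
    tx = CubDB.Tx.put(tx, :a, 1)
    tx = CubDB.Tx.put(tx, :b, 2)
    tx = CubDB.Tx.put(tx, :c, 3)

    {:commit, tx, :ok}
  end)
```

None of the writes is visible to readers until the function returns its {:commit, tx, result} tuple.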

Telemetry integration

Would be cool if this library exposed metrics with telemetry.

I can give that integration a go if you think it is a worthwhile idea :)

Support prefix for files

Have a config option that is a prefix, this way you can have multiple .cub files living in the same directory.

(I'm just tossing ideas here, feel free to disregard)

Selecting the last entry doesn't work

I'm trying to use the select function to take the last entry, but it isn't working as I expected.

Here is my test:

iex> CubDB.put(CubMyDB, {"cls", <<246, 85>>, 1}, "um")   
:ok
iex> CubDB.put(CubMyDB, {"cls", <<246, 85>>, 2}, "dois")
:ok
iex> CubDB.put(CubMyDB, {"cls", <<246, 85>>, 3}, "tres") 
:ok

If I select after that, it works:

iex> CubDB.select(CubMyDB, reverse: true) |> Enum.take(1)
[{{"cls", <<246, 85>>, 3}, "tres"}]

But now I'll insert two more with another key:

iex> CubDB.put(CubMyDB, {"cls", <<52, 31>>, 1}, "new one")
:ok
iex> CubDB.put(CubMyDB, {"cls", <<52, 31>>, 2}, "new two") 
:ok

Ok, let's try to take the last one.
The wrong result:

iex> CubDB.select(CubMyDB, reverse: true) |> Enum.take(1)
[{{"cls", <<246, 85>>, 3}, "tres"}]

If I list all entries in reverse order:

iex> CubDB.select(CubMyDB, reverse: true) |> Enum.to_list
[
  {{"cls", <<246, 85>>, 3}, "tres"},
  {{"cls", <<246, 85>>, 2}, "dois"},
  {{"cls", <<246, 85>>, 1}, "um"},
  {{"cls", <<52, 31>>, 2}, "new two"},
  {{"cls", <<52, 31>>, 1}, "new one"}
]
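The listing above is consistent with Erlang term ordering, where binaries are compared byte by byte, so <<52, 31>> sorts before <<246, 85>> regardless of insertion time. The sketch below reproduces the ordering with plain Elixir terms (no database involved) and shows one hypothetical way to bound a lookup to a single prefix, assuming the third tuple element is always a non-negative integer:

```elixir
keys = [
  {"cls", <<246, 85>>, 1},
  {"cls", <<246, 85>>, 2},
  {"cls", <<246, 85>>, 3},
  {"cls", <<52, 31>>, 1},
  {"cls", <<52, 31>>, 2}
]

# Binaries compare byte by byte: 52 < 246, so the newer keys sort first
binary_comes_first = <<52, 31>> < <<246, 85>>

# Descending term order, the same order a reverse iteration by key follows
reversed = Enum.sort(keys, :desc)
greatest = hd(reversed)

# Hypothetical bounds for one prefix: integers sort before atoms,
# so `nil` is greater than any integer in the third position
prefix_min = {"cls", <<52, 31>>, 0}
prefix_max = {"cls", <<52, 31>>, nil}

last_for_prefix =
  keys
  |> Enum.filter(fn key -> key >= prefix_min and key <= prefix_max end)
  |> Enum.max()
```

The same bounds could presumably be passed as min_key and max_key to select together with reverse: true, so that only the entries for one prefix are visited.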

Feature Request: `CubDB.number_of_writes/1`

In the same vein as CubDB.dirt_factor/1, I would like to propose a public CubDB.number_of_writes/1 function, so that we could manually trigger CubDB.compact/1 based on these two factors.

Thank you.
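As an illustration of the manual trigger described here, the sketch below uses only CubDB.dirt_factor/1 and CubDB.compact/1, both named in this request. The ManualCompaction module and the 0.25 default threshold are hypothetical, and the dirt factor is assumed to be a number between 0 and 1:

```elixir
Mix.install([{:cubdb, "~> 2.0.2"}])

defmodule ManualCompaction do
  # Hypothetical helper: compacts only when the dirt factor reaches the threshold.
  # Returns whatever CubDB.compact/1 returns, or :skipped.
  def maybe_compact(db, threshold \\ 0.25) do
    if CubDB.dirt_factor(db) >= threshold do
      CubDB.compact(db)
    else
      :skipped
    end
  end
end

data_dir =
  Path.join(System.tmp_dir!(), "cubdb_compact_example_#{System.unique_integer([:positive])}")

{:ok, db} = CubDB.start_link(data_dir: data_dir)
:ok = CubDB.put(db, :foo, "bar")

# A threshold above 1 can never be reached, so nothing is compacted here
outcome = ManualCompaction.maybe_compact(db, 2.0)
```

A write counter, as proposed, would be a second input to the same kind of check.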

Support wildcard matching

Whenever I look at CubDB.select it feels a little strange how much I need to know about sort order to be able to select, for example, all values for {:some_key, _}. I'm wondering if this couldn't be simplified.
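To make the sort-order knowledge concrete, here is a small sketch in plain Elixir (no database involved) of how a {:some_key, _} range can be expressed with two bounds. It assumes the second tuple element is a non-negative integer, and relies on the term ordering rule that numbers sort before atoms, so nil works as an upper bound:

```elixir
keys = [
  {:other, 7},
  {:some_key, 0},
  {:some_key, 42},
  {:some_key, 1_000_000},
  {:zzz, 1}
]

# Bounds for "every {:some_key, integer}" under Erlang term ordering
min_key = {:some_key, 0}
max_key = {:some_key, nil}

matching = Enum.filter(keys, fn key -> key >= min_key and key <= max_key end)
```

The same two terms would be the values given to the min_key and max_key options of select.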

How to select only keys without values?

Hi @lucaong, let me start by saying that this is not a bug, just a question which could turn into a feature request.

As mentioned in issue #67 , I'm working with large amounts of data in CubDB. Sometimes for a single key, a value can have a size of 10 megabytes or more. (Whether or not this is a good idea in itself is a good question, but out of scope for this issue ;) )

In order to do a periodic cleanup of stale data, what I want to do is list all keys currently present in my CubDB database. I noticed that this is taking quite long (a couple of seconds at least) even for a database with just 30 records (each 10MB+ in size).

My conclusion is that the reason for this is that CubDB has no way of listing only the key part of a record – the value is always loaded, too.

I've been hacking around a bit and notice that the %CubDB.Btree{} struct does seem to have the keys present. Example with some random UUID keys:

%CubDB.Btree{
  root: {:l,
   [
     {"0687853d-0b06-4651-a5cc-855f0e8966b4", 328157201},
     {"0e1f82b8-b727-4300-86de-c37368e72e4b", 315502609},
     {"0e96af62-620b-4313-be63-b377499c7545", 256754705},
     {"27bcd34c-3523-44b0-bb82-8294e7493ae4", 436354065},
     {"430b3c18-9cd0-4682-8a23-8df8ed9252c5", 72614929},
     {"459b937a-3308-40a9-8c8a-7639214e97d3", 380009489},
     {"48a72808-52a3-41f7-ad8c-afa5799631db", 122859537},
     {"4a8af0be-9344-4cf9-9154-28c3f2e630fd", 97831953},
     {"5a87d5ba-9d4d-407d-a7e6-107a977fe5b1", 275397649},
     {"6ee65e3c-50b8-46d3-b0f6-2b7380ebdb30", 296584209},
     {"766025b6-4df0-4b60-b02a-ce52e2cc5823", 339930129},
     {"7f459820-0eaf-421c-80b1-d84d6ccc743c", 398071825},
     {"80daa26a-b5b5-402b-af82-f3eaf714045a", 170184721},
     {"85087a92-f7a2-4198-b98e-b361ded90e70", 240407569},
     {"8a7ce722-c5d4-4303-93f7-0776533b4765", 210489361},
     {"a405a605-0f68-45d0-a3c1-5e580162b346", 123390993},
     {"ae21499d-28b8-4433-8d61-005eff594908", 148443153},
     {"af2dd94f-4090-4dbf-bd87-208e7a959d99", 455530513},
     {"b034d60d-8a66-4eb2-9015-964a7413b063", 122861585},
     {"b7874dd2-057e-4aba-aecc-1cf53b85b624", 360783889},
     {"ccd09d7a-94fd-4f0f-87bd-a0e1925b2ffa", 229527569},
     {"d4559626-1bda-4ad5-9735-1ac07dd97e51", 417723409},
     {"d6569e73-9c3e-490b-872f-e6b6475e1ffc", 474360849},
     {"ea125de7-e837-4ab6-8db7-6d14dbaeecb5", 47621137},
     {"ee62ed22-8c22-4cb6-b377-68827cbcfbed", 192839697},
     {"f35a6177-2e55-460f-bdef-76707e0e8b67", 1038},
     {"f7d3c685-b8a2-45c8-9757-09e2b5bec8a1", 24699921},
     {"f9493336-0761-4d43-acc6-fe3e9d43946d", 123109393}
   ]},
  root_loc: 492884358,
  size: 28,
  dirt: 42,
  store: %CubDB.Store.File{
    # ... omitted
  },
  capacity: 32
}

While I can take this data out of a Btree struct myself, it feels a bit hackish to use this internal data structure just to get a list of keys without having to load hundreds of megabytes of data into memory.

Is it correct that there is currently no public API (e.g. CubDB.keys/2, or CubDB.keys/3 for :min_key and :max_key support) available to list only the record keys without loading all values?

CubDB 0.17.0 crash under load {:error, :enoent}

Hi,

Really awesome looking project! We're evaluating it as an embedded storage solution and my coworker logged this
error when testing it (I couldn't reproduce it myself). Assuming it's a timing issue with the underlying file?

Steps to potentially reproduce:

{:ok, db} = CubDB.start_link(data_dir: "goof/", auto_compact: true)

Works

1..100 |> Enum.each(fn x -> CubDB.put(db, "k#{x}", "The value is #{x}") end)

Works

10000..20000 |> Enum.each(fn x -> CubDB.put(db, "k#{x}", "The value is #{x}") end)

It then throws an error attempting to rename the file :ok = File.rename(file_path, new_path)
when it is apparently not found (either something removed it out of order -- or the OS is acting funny).

[error] GenServer #PID<0.1606.0> terminating
** (MatchError) no match of right hand side value: {:error, :enoent}
    (cubdb 0.17.0) lib/cubdb.ex:1022: CubDB.finalize_compaction/1
    (cubdb 0.17.0) lib/cubdb.ex:939: CubDB.handle_info/2
    (stdlib 3.12.1) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib 3.12.1) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib 3.12.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:catch_up, %CubDB.Btree{capacity: 32, dirt: 9903, root: {:b, [{:foo, 10999555}, {"k14136", 13286632}]}, root_loc: 13287063, size: 10102, store: %CubDB.Store.File{file_path: "goof/2.compact", pid: #PID<0.2266.0>}}, %CubDB.Btree{capacity: 32, dirt: 10003, root: {:b, [{:foo, 15659273}, {"k14088", 18968629}]}, root_loc: 18969060, size: 10102, store: %CubDB.Store.File{file_path: "goof/1.cub", pid: #PID<0.2038.0>}}}
** (EXIT from #PID<0.1160.0>) shell process exited with reason: an exception was raised:
    ** (MatchError) no match of right hand side value: {:error, :enoent}
        (cubdb 0.17.0) lib/cubdb.ex:1022: CubDB.finalize_compaction/1
        (cubdb 0.17.0) lib/cubdb.ex:939: CubDB.handle_info/2
        (stdlib 3.12.1) gen_server.erl:637: :gen_server.try_dispatch/4
        (stdlib 3.12.1) gen_server.erl:711: :gen_server.handle_msg/6
        (stdlib 3.12.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Thanks!

How to remove all records in a database?

Hey there.

I'm going to use cubdb to store temporary logs in my app. My app has to reset the database from time to time by user request. I didn't find a way how to reset the database. Should I remove (or rename) database dir, or is there a more native way to achieve this in one operation?
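One option worth checking against the API reference for the installed version is CubDB.clear/1, which, to the best of this note's knowledge, is available in recent releases and deletes all entries in one operation. A minimal sketch, assuming CubDB 2.0.x and a throwaway temporary directory:

```elixir
Mix.install([{:cubdb, "~> 2.0.2"}])

data_dir =
  Path.join(System.tmp_dir!(), "cubdb_clear_example_#{System.unique_integer([:positive])}")

{:ok, db} = CubDB.start_link(data_dir: data_dir)
:ok = CubDB.put_multi(db, a: 1, b: 2, c: 3)

size_before = CubDB.size(db)

# Assumption: clear/1 exists in the installed CubDB version
:ok = CubDB.clear(db)

size_after = CubDB.size(db)
```

Because the file is append-only, disk space is expected to be reclaimed only after a later compaction.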

Sort order for select?

Hey there 👋

It looks like there is no way to specify an order_by query in a select operation.

Is this a limitation of the B-tree that backs CubDB and therefore something that would be impossible to do in an efficient way?
Or would it theoretically be possible to support taking entries in an ordered way efficiently?

At the moment, what would be the best way to take the first n entries ordered by some value?

Would keeping a sorted, size limited list as an accumulator in the reduce of the select be better than just getting all entries and then Enum.sort_by(..) |> Enum.take(n) them at the end?
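The bounded-accumulator idea from the last question can be sketched as follows. TopN is a hypothetical helper, and a plain keyword list stands in for the entries that a select would yield. Memory stays proportional to n rather than to the total number of entries, although every entry still has to be visited:

```elixir
defmodule TopN do
  # Returns the n entries with the smallest `sort_fun` result,
  # keeping at most n + 1 entries in memory at any time.
  def take_by(enumerable, n, sort_fun) do
    Enum.reduce(enumerable, [], fn entry, acc ->
      [entry | acc]
      |> Enum.sort_by(sort_fun)
      |> Enum.take(n)
    end)
  end
end

# Stand-in for the {key, value} entries of a select
entries = [a: 5, b: 1, c: 4, d: 2, e: 3]

smallest_two = TopN.take_by(entries, 2, fn {_key, value} -> value end)
```

Sorting everything and then taking n gives the same result; the difference is only in how much is held in memory at once.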

Errant files in db folder crash application permanently

I copied a version of the database file that had an example of an application bug I've been chasing in it, for future analysis, but didn't change the folder immediately.

cp 4750.cub blah.cub

As soon as the next compaction started, the entire app crashed and couldn't restart, dying with:

** (ArgumentError) argument error
        :erlang.binary_to_integer("blah", 16)
        (cubdb 1.0.0-rc.9) lib/cubdb.ex:807: CubDB.file_name_to_n/1

The app immediately hit max restart intensity and died completely.

I thought I remembered a function that was supposed to weed out non-hex filenames, and I found it: CubDB.cubdb_file?/1, but quickly realized that the regex in it is wrong:

/[\da-fA-F]+/ will match anything that has at least one hex character in it. ('b' or 'a' from 'blah' above)
/^[\da-fA-F]+$/ will match any hex-only string, which is what I think you wanted.

I've got a couple tests and a fix ready for you, I'll submit a PR in just a bit.
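The difference between the two patterns can be checked in isolation. This sketch applies them to bare file names without the extension, which is an assumption about how the check is used rather than a copy of the library code:

```elixir
unanchored = ~r/[\da-fA-F]+/
anchored = ~r/^[\da-fA-F]+$/

# "blah" contains the hex characters "b" and "a", so the unanchored pattern matches
unanchored_blah = Regex.match?(unanchored, "blah")

# The anchored pattern requires every character to be a hex digit
anchored_blah = Regex.match?(anchored, "blah")
anchored_4750 = Regex.match?(anchored, "4750")
anchored_1a = Regex.match?(anchored, "1A")
```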

Timeout on put_multi

Got hit by this error and there is no way to provide a larger timeout.

GenServer.call(TransactionWatcher.Storage, {:get_and_update_multi, [], #Function<15.65360478/1 in CubDB.put_multi/2>}, 5000)

11GB file with 30M entries... I know it might not be designed for this kind of data :)

Re-opening a file fails when it was closed during compaction

Hi, another edge case when running many CubDB instances back to back:

There is a CubDB.stop() happening just before this, but the error message seems to indicate that on the next CubDB.start_link() the old compaction process still has a lock on the name?!

I tried adding halt_compaction before the stop:

CubDB.halt_compaction(db)
CubDB.stop(db)

but that didn't help either.

11:03:36.326 [error] GenServer #PID<0.266.0> terminating
** (ArgumentError) file "tmp/test_file_detsplus_bench_large_write_Elixir.CubDBWrap/1.compact" is already in use by another CubDB.Store.File
(cubdb 2.0.0) lib/cubdb/store/file.ex:43: CubDB.Store.File.ensure_exclusive_access!/1
(cubdb 2.0.0) lib/cubdb/store/file.ex:34: CubDB.Store.File.init/1
(elixir 1.13.4) lib/agent/server.ex:8: Agent.Server.init/1
(stdlib 3.15) gen_server.erl:427: :gen_server.init_it/2
(stdlib 3.15) gen_server.erl:394: :gen_server.init_it/6
(stdlib 3.15) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: {:EXIT, #PID<0.263.0>, {%ArgumentError{message: "file "tmp/test_file_detsplus_bench_large_write_Elixir.CubDBWrap/1.compact" is already in use by another CubDB.Store.File"}, [{CubDB.Store.File, :ensure_exclusive_access!, 1, [file: 'lib/cubdb/store/file.ex', line: 43]}, {CubDB.Store.File, :init, 1, [file: 'lib/cubdb/store/file.ex', line: 34]}, {Agent.Server, :init, 1, [file: 'lib/agent/server.ex', line: 8]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 427]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 394]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 226]}]}}
State: %DynamicSupervisor{args: {{:temporary, 5000}, []}, children: %{}, extra_arguments: [], max_children: :infinity, max_restarts: 3, max_seconds: 5, mod: Task.Supervisor, name: {#PID<0.266.0>, Task.Supervisor}, restarts: [], strategy: :one_for_one}

streams & change notifications

Have you any thoughts on adding a lazy stream interface, or a way to allow processes to subscribe to changed entries?

Compression support?

Do you think we can somehow support some kind of data compression?

My guess is that compressing the whole database is not possible, since it would pretty much break efficient querying and the append logic CubDB uses.

But I was wondering if we can add an option to compress the value part of a key/value when CubDB uses :erlang.term_to_binary() by piping it to :zlib.gzip().

Of course, if the value being stored is small, running :zlib.gzip() will actually increase the final byte size of the data stored, so this option would probably be turned off by default.

The advantage comes when you want to store huge data as the value, this can make the value stored way smaller than default (which is one of the use cases I want to use CubDB with).

I can try creating a PR if you like the idea.

Best regards, Eduardo.
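The trade-off described in this proposal can be tried outside the database. The sketch below is not a CubDB option; CompressedValue is a hypothetical wrapper that an application could apply to values before put and after get:

```elixir
defmodule CompressedValue do
  # Serialize a term and gzip the resulting binary
  def encode(term), do: term |> :erlang.term_to_binary() |> :zlib.gzip()

  # Reverse of encode/1
  def decode(binary), do: binary |> :zlib.gunzip() |> :erlang.binary_to_term()
end

# A large, repetitive value compresses well
big = String.duplicate("cubdb ", 10_000)
big_raw_size = byte_size(:erlang.term_to_binary(big))
big_gzip_size = byte_size(CompressedValue.encode(big))

# A tiny value grows, because the gzip header and trailer outweigh the payload
small_raw_size = byte_size(:erlang.term_to_binary(1))
small_gzip_size = byte_size(CompressedValue.encode(1))
```

:erlang.term_to_binary/2 also accepts a :compressed option, which may be a simpler alternative to piping through :zlib.gzip/1.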

Spaces in string values not returned via select

Enclosed is one way of handling output involving string values with spaces:
Suppose "entries" contains first and last names (see example). In order to select these, store the strings with underscore separators, and then use select/3 with pipe, map, and Regex.replace to substitute spaces for the underscores.

Example:

entries = [
  {{:names, 0}, "fn0_ln0"},
  {{:names, 1}, "fn1_ln1"},
  {{:names, 2}, "fn2_ln2"}
]

def selectEntries do
  {:ok, db} = CubDB.start_link("test.db")

  CubDB.select(db,
    min_key: {:names, 0},
    max_key: {:names, 2},
    pipe: [
      map: fn {_key, value} ->
        Regex.replace(~r{[_]}, value, " ") # replace "_" with a space
      end
    ]
  )
end

Disk usage increase after moving data

Hi, not entirely sure if this is a bug, but it felt weird, so here's a report. Feel free to dismiss!

I've been using cubdb as a simple HTTP response cache for a while. In my development database, I had ± 10K records with 2.5GB disk space in use (measured using du -sh on the CubDB data directory).

Due to a data structure change, I've been copying all data to be stored under a different key within the same CubDB database. Afterwards, the old data has been removed, which means that the net amount of data should not change significantly.

I noticed, however, that the database is now using 22GB (!) of disk space, while the total number of records (measured using CubDB.size/1) is as expected, which means the expected data deletion did take place. Manually running CubDB.compact/1 does not change the disk space used.

As I did not expect this disk usage increase, I'm suspecting some kind of bug, but I'm not sure.

One thing that could be helpful: I did not measure the number of .cub files in use before the migration, but afterwards these are the current files with their respective file size:

3.4G  14.cub
3.4G  15.cub
3.2G  16.cub
3.3G  17.cub
3.2G  18.cub
2.7G  19.cub
2.4G  1A.cub

Please let me know if any other information is needed, I'm happy to help!

Export/Import

What would be the best way to export all data and later import it again?
I am working on a nerves app, and would like to give my user the opportunity to make backups of the data.

I am currently using this for the backup:
CubDB.select(:db) |> :erlang.term_to_binary()

But I do not find how to re-import this.
I thought about emptying the db and then putting all data back, but there seems to be no easy way to empty the db either.

Any help would be appreciated.
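One detail in the snippet above: select returns a lazy stream, so serializing it directly does not capture the entries themselves. A minimal sketch of a round trip, assuming CubDB 2.0.x and two throwaway temporary directories, materializes the entries with Enum.to_list first and restores them with put_multi:

```elixir
Mix.install([{:cubdb, "~> 2.0.2"}])

suffix = System.unique_integer([:positive])
source_dir = Path.join(System.tmp_dir!(), "cubdb_export_source_#{suffix}")
target_dir = Path.join(System.tmp_dir!(), "cubdb_export_target_#{suffix}")

{:ok, source} = CubDB.start_link(data_dir: source_dir)
:ok = CubDB.put_multi(source, a: 1, b: 2, c: 3)

# Export: turn the lazy stream into a list of {key, value} tuples, then serialize
backup =
  source
  |> CubDB.select()
  |> Enum.to_list()
  |> :erlang.term_to_binary()

# Import: deserialize and write all entries atomically into an empty database.
# Only deserialize backups from a trusted source.
{:ok, target} = CubDB.start_link(data_dir: target_dir)
:ok = CubDB.put_multi(target, :erlang.binary_to_term(backup))

restored = target |> CubDB.select() |> Enum.to_list()
```

This holds the whole data set in memory at once, so it is only a reasonable approach for databases that fit comfortably in RAM.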

Bug: Error on CubDB startup

Suddenly I saw CubDB had crashed this morning, this was the error returned to me. After I cleaned the cubdb files in the data_dir this error was gone. I wasn't able to download the invalid file, so I'm not giving you much to work with I'm afraid.
If this happens again I will try to fetch the invalid db file and post it here.

  {:error,
  {%ArgumentError{
    message: "errors were found at the given arguments:\n\n  * 1st argument: invalid external representation of a term\n"
  },
  [
    {CubDB.Store.CubDB.Store.File, :raise_if_error, 1,
     [
       file: 'lib/cubdb/store/file.ex',
       line: 156,
       error_info: %{module: Exception}
     ]},
    {CubDB.Btree, :new, 2, [file: 'lib/cubdb/btree.ex', line: 64]},
    {CubDB, :init, 1, [file: 'lib/cubdb.ex', line: 1232]},
    {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 851]},
    {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 814]},
    {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}
  ]}}
  ** (EXIT from #PID<0.4989.0>) shell process exited with reason: an exception was raised:
    ** (ArgumentError) errors were found at the given arguments:

  * 1st argument: invalid external representation of a term

        (cubdb 2.0.2) lib/cubdb/store/file.ex:156: CubDB.Store.CubDB.Store.File.raise_if_error/1
        (cubdb 2.0.2) lib/cubdb/btree.ex:64: CubDB.Btree.new/2
        (cubdb 2.0.2) lib/cubdb.ex:1232: CubDB.init/1
        (stdlib 4.1) gen_server.erl:851: :gen_server.init_it/2
        (stdlib 4.1) gen_server.erl:814: :gen_server.init_it/6
        (stdlib 4.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

no function clause error

Hi,
I use cubdb 0.13.0 to log a few metrics of a website test with hound. I just encountered the following error on trying to read the metrics with:

iex(7)> {:ok, db} = CubDB.start_link("/opt/pisa_load_test")
{:ok, #PID<0.238.0>}
iex(8)> CubDB.get(db, :load_test)

The error:

    16:15:28.668 [error] Task #PID<0.242.0> started from #PID<0.238.0> terminating
** (FunctionClauseError) no function clause matching in CubDB.Btree.lookup_leaf/4
    (cubdb) lib/cubdb/btree.ex:316: CubDB.Btree.lookup_leaf({:v, [%{action: :long_running_request, date: ~U[2019-09-30 13:57:17.803671Z], duration: 442, env: :test, request_count: 120}, %{action: :mailbox, date: ~U[2019-09-30 13:49:55.251067Z], duration: nil, env: :test, request_count: 100}, %{action: :login, date: ~U[2019-09-30 13:49:47.146515Z], duration: 5, env: :test, request_count: 81}, %{date: ~U[2019-09-20 08:46:37.453455Z], env: :int, results: [%{login: %{duration: 9, request_count: 78}, long_running_request: %{duration: 23, request_count: 104}}]}, %{date: ~U[2019-09-20 08:12:58.613462Z], env: :int, results: [%{login: %{duration: 11, request_count: 79}, long_running_request: %{duration: 22, request_count: 106}}]}, %{date: ~U[2019-09-19 14:29:40.151420Z], env: :test, results: [%{login: %{duration: 6, request_count: 81}, long_running_request: %{duration: 34, request_count: 109}}]}, %{date: ~U[2019-09-19 14:27:07.536142Z], env: :test, results: [%{login: %{duration: 6, request_count: 83}, long_running_request: %{duration: 34, request_count: 110}}]}]}, %CubDB.Store.File{file_path: "/opt/pisa_load_test/0.cub", pid: #PID<0.239.0>}, :load_test, [])
    (cubdb) lib/cubdb/btree.ex:106: CubDB.Btree.fetch/2
    (cubdb) lib/cubdb/reader.ex:31: CubDB.Reader.run/4
    (elixir) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Function: &CubDB.Reader.run/4
    Args: [{#PID<0.211.0>, #Reference<0.1265108072.3161718785.70643>}, #PID<0.238.0>, %CubDB.Btree{capacity: 32, dirt: 14, root: {:v, [%{action: :long_running_request, date: ~U[2019-09-30 13:57:17.803671Z], duration: 442, env: :test, request_count: 120}, %{action: :mailbox, date: ~U[2019-09-30 13:49:55.251067Z], duration: nil, env: :test, request_count: 100}, %{action: :login, date: ~U[2019-09-30 13:49:47.146515Z], duration: 5, env: :test, request_count: 81}, %{date: ~U[2019-09-20 08:46:37.453455Z], env: :int, results: [%{login: %{duration: 9, request_count: 78}, long_running_request: %{duration: 23, request_count: 104}}]}, %{date: ~U[2019-09-20 08:12:58.613462Z], env: :int, results: [%{login: %{duration: 11, request_count: 79}, long_running_request: %{duration: 22, request_count: 106}}]}, %{date: ~U[2019-09-19 14:29:40.151420Z], env: :test, results: [%{login: %{duration: 6, request_count: 81}, long_running_request: %{duration: 34, request_count: 109}}]}, %{date: ~U[2019-09-19 14:27:07.536142Z], env: :test, results: [%{login: %{duration: 6, request_count: 83}, long_running_request: %{duration: 34, request_count: 110}}]}]}, root_loc: 12305, size: 0, store: %CubDB.Store.File{file_path: "/opt/pisa_load_test/0.cub", pid: #PID<0.239.0>}}, {:get, :load_test, nil}]
** (EXIT from #PID<0.211.0>) shell process exited with reason: an exception was raised:
    ** (FunctionClauseError) no function clause matching in CubDB.Btree.lookup_leaf/4
        (cubdb) lib/cubdb/btree.ex:316: CubDB.Btree.lookup_leaf({:v, [%{action: :long_running_request, date: ~U[2019-09-30 13:57:17.803671Z], duration: 442, env: :test, request_count: 120}, %{action: :mailbox, date: ~U[2019-09-30 13:49:55.251067Z], duration: nil, env: :test, request_count: 100}, %{action: :login, date: ~U[2019-09-30 13:49:47.146515Z], duration: 5, env: :test, request_count: 81}, %{date: ~U[2019-09-20 08:46:37.453455Z], env: :int, results: [%{login: %{duration: 9, request_count: 78}, long_running_request: %{duration: 23, request_count: 104}}]}, %{date: ~U[2019-09-20 08:12:58.613462Z], env: :int, results: [%{login: %{duration: 11, request_count: 79}, long_running_request: %{duration: 22, request_count: 106}}]}, %{date: ~U[2019-09-19 14:29:40.151420Z], env: :test, results: [%{login: %{duration: 6, request_count: 81}, long_running_request: %{duration: 34, request_count: 109}}]}, %{date: ~U[2019-09-19 14:27:07.536142Z], env: :test, results: [%{login: %{duration: 6, request_count: 83}, long_running_request: %{duration: 34, request_count: 110}}]}]}, %CubDB.Store.File{file_path: "/opt/pisa_load_test/0.cub", pid: #PID<0.239.0>}, :load_test, [])
        (cubdb) lib/cubdb/btree.ex:106: CubDB.Btree.fetch/2
        (cubdb) lib/cubdb/reader.ex:31: CubDB.Reader.run/4
        (elixir) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
        (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

At the same time, the program testing a website might write data. Somehow the database got corrupted. Do you have any idea how this could have happened?

Compacting DB hangs indefinitely when done manually

Hi @lucaong
I noticed that compacting the DB with CubDB.compact/1 hangs indefinitely.
You can see how the .cub file changes over time, while the .compact file reached a certain size and never changes from there.

$ ls -al
-rw-r--r-- 1 xxxx xxxx 78604311 Aug  3 06:11 12.cub
-rw-r--r-- 1 xxxx xxxx  1965076 Aug  3 06:06 15.compact

$ ls -al
-rw-r--r-- 1 xxxx xxxx 85419031 Aug  3 06:14 12.cub
-rw-r--r-- 1 xxxx xxxx  1965076 Aug  3 06:06 15.compact

iex> CubDB.compacting?(:the_db)
true

Let me know if you need any more information. I could create a sample project if needed.

Cheers

Better error messages

Hi @lucaong
Thanks for CubDB, it is awesome on Nerves.

Here is what I get when hitting limit of FAT32 formatted usb drive.

2022-12-30 18:08:03.356000 [error] GenServer Chat.Db.MainDb.QueueWriter terminating
** (UndefinedFunctionError) function :efbig.exception/1 is undefined (module :efbig is not available)
    :efbig.exception([])
    (cubdb 2.0.1) lib/cubdb/store/file.ex:153: CubDB.Store.CubDB.Store.File.raise_if_error/1
    (cubdb 2.0.1) lib/cubdb/btree.ex:471: anonymous fn/2 in CubDB.Btree.store_nodes/2
    (elixir 1.14.1) lib/enum.ex:1658: Enum."-map/2-lists^map/1-0-"/2
    (cubdb 2.0.1) lib/cubdb/btree.ex:461: CubDB.Btree.build_up/6
    (cubdb 2.0.1) lib/cubdb/btree.ex:299: CubDB.Btree.insert_terminal_node/4
    (cubdb 2.0.1) lib/cubdb/transaction.ex:228: CubDB.Tx.put/3
    (chat 0.1.0) lib/chat/db/queue_writer.ex:136: anonymous fn/2 in Chat.Db.QueueWriter.db_write/2
Last message: {:"$gen_cast", {:write, [{{:file_chunk, "8f417917-4f7e-4e1d-9283-1ff9585d4999", 20971520, 31457279}, <<212, 108, 227, 95, 116, 106, 124, 96, 64, 180, 45, 210, 120, 69, 60, 72, 155, 253, 163, 152, 202, 21, 122, 2, 107, 16, 158, 142, 124, 191, 37, 122, 130, 57, 112, 4, 91, 40, 210, 77, 164, 214, 41, ...>>}]}}

This error message leads nowhere since :efbig module does not exist in Erlang.
It is actually a POSIX error returned by :file module.

Having it like {:file, :efbig} would be less misguiding 😉
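The stack trace suggests the bare atom :efbig was passed to raise, which makes Elixir look for an exception module with that name. As a sketch of one possible alternative (not the library's implementation), a POSIX error atom can be wrapped in Elixir's built-in File.Error, whose message is produced with :file.format_error/1. The PosixError module and the path are hypothetical:

```elixir
defmodule PosixError do
  # Hypothetical helper: turns {:error, posix_atom} into a descriptive File.Error
  def raise_if_error({:error, reason}, path) when is_atom(reason) do
    raise File.Error, reason: reason, action: "write to file", path: path
  end

  # Anything else is passed through untouched
  def raise_if_error(other, _path), do: other
end

message =
  try do
    PosixError.raise_if_error({:error, :efbig}, "data/0.cub")
  rescue
    error in File.Error -> Exception.message(error)
  end

# Human-readable description of the POSIX code, as a string
description = :efbig |> :file.format_error() |> List.to_string()
```

The resulting message names the action, the path, and the POSIX reason, which points directly at the file system limit.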

get_and_update/3 writing to disk when new value is same as old

Thanks for the v1.0.0-rc.5 release. I saw that get_and_update/3 is now not supposed to write to disk unless the new value is different from the stored one, so my little multi-website frontend project's DB module's store function has gone from:

def store(website, key, value) do
  case value != CubDB.get(__MODULE__, {website, key}) do
    true -> CubDB.put(__MODULE__, {website, key}, value)
    false -> :ok
  end
end

to:

def store(website, key, value) do
  CubDB.get_and_update(__MODULE__, {website, key}, fn old_value -> {old_value, value} end)
end

However, I'm finding that the file on disk is written to each time it's run with the same value.

I added some inspect outputting to check my new data is the same as the stored value:

def store(website, key, value) when is_website(website) do
  CubDB.get_and_update(__MODULE__, {website, key}, fn old_value ->
    IO.inspect old_value
    IO.inspect value
    IO.inspect old_value == value
    {old_value, value}
  end)
end

%{
  "pdMinimumCharge" => #Decimal<10>,
  "pdOneOffSurcharge" => #Decimal<10>,
  "pdServiceItems" => [
    %{
      "details" => "Up to 3 panes in each.",
      "image" => %{
        "url" => "https://cms-assets.doliver.dev/file/cms-assets-dev/pure-drips/service-items/_200xAUTO_crop_center-center_none/IMGP2125.jpg"
      },
      "itemName" => "Standard/Bay windows",
      "price" => #Decimal<2>
    },
    %{
      "details" => "4 or more panes in each.",
      "image" => nil,
      "itemName" => "Bay windows",
      "price" => #Decimal<4>
    },
    %{
      "details" => nil,
      "image" => nil,
      "itemName" => "Third storey/Skylights",
      "price" => #Decimal<3>
    },
    %{
      "details" => "E.g., front door, back door.",
      "image" => nil,
      "itemName" => "Single doors",
      "price" => #Decimal<2>
    },
    %{
      "details" => nil,
      "image" => nil,
      "itemName" => "French door/Patio door",
      "price" => #Decimal<4>
    },
    %{
      "details" => nil,
      "image" => nil,
      "itemName" => "Bi-fold door panes",
      "price" => #Decimal<2>
    }
  ]
}
%{
  "pdMinimumCharge" => #Decimal<10>,
  "pdOneOffSurcharge" => #Decimal<10>,
  "pdServiceItems" => [
    %{
      "details" => "Up to 3 panes in each.",
      "image" => %{
        "url" => "https://cms-assets.doliver.dev/file/cms-assets-dev/pure-drips/service-items/_200xAUTO_crop_center-center_none/IMGP2125.jpg"
      },
      "itemName" => "Standard/Bay windows",
      "price" => #Decimal<2>
    },
    %{
      "details" => "4 or more panes in each.",
      "image" => nil,
      "itemName" => "Bay windows",
      "price" => #Decimal<4>
    },
    %{
      "details" => nil,
      "image" => nil,
      "itemName" => "Third storey/Skylights",
      "price" => #Decimal<3>
    },
    %{
      "details" => "E.g., front door, back door.",
      "image" => nil,
      "itemName" => "Single doors",
      "price" => #Decimal<2>
    },
    %{
      "details" => nil,
      "image" => nil,
      "itemName" => "French door/Patio door",
      "price" => #Decimal<4>
    },
    %{
      "details" => nil,
      "image" => nil,
      "itemName" => "Bi-fold door panes",
      "price" => #Decimal<2>
    }
  ]
}
true

Have I misunderstood?

Atomic gets that depend on each other

TLDR what I want to do is something like this, but atomic:

k2 = CubDB.get(db, {:index, k1})
v = CubDB.get(db, {:stuff, k2})

In other words, I'm taking the value of one key and using it as part of another key, which doesn't seem to be possible with current APIs. Now for my use case the atomicity isn't essential, so I could just use the double gets. However, some support for this would be nice to have regardless...
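One way to run both reads against the same consistent view is a snapshot. This is a minimal sketch, assuming CubDB 2.x, where `CubDB.with_snapshot/2` and `CubDB.Snapshot.get/3` are available. The data directory and the sample keys are made up for illustration:

```elixir
Mix.install([{:cubdb, "~> 2.0"}])

# Hypothetical throwaway directory, used only for this example.
data_dir =
  Path.join(System.tmp_dir!(), "cubdb_snapshot_example_#{System.unique_integer([:positive])}")

{:ok, db} = CubDB.start_link(data_dir: data_dir)

# Sample data: an index entry whose value is part of another key.
:ok = CubDB.put(db, {:index, :k1}, :k2)
:ok = CubDB.put(db, {:stuff, :k2}, "payload")

# Both reads go through the same immutable snapshot, so a write that
# lands between the first and the second get is not observed.
v =
  CubDB.with_snapshot(db, fn snap ->
    k2 = CubDB.Snapshot.get(snap, {:index, :k1})
    CubDB.Snapshot.get(snap, {:stuff, k2})
  end)

IO.inspect(v)
```

If the dependent read has to be followed by a write in the same atomic step, `CubDB.transaction/2` with `CubDB.Tx.get/3` is probably the better fit, since a snapshot is read-only.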

start_link/1 with all options

The child_spec supervisor notation {CubDB, opts} is normalized to a call to start_link(opts). CubDB, though, receives additional options as additional parameters.

Running out of disk space

Probably not too relevant but I've received this arithmetic error when running out of disk space:

10:44:30.212 [error] Process #PID<0.253.0> raised an exception
** (ArithmeticError) bad argument in arithmetic expression
    (cubdb 2.0.0) lib/cubdb/store/file.ex:110: CubDB.Store.CubDB.Store.File.get_node/2
    (cubdb 2.0.0) lib/cubdb/btree.ex:398: CubDB.Btree.lookup_leaf/4
    (cubdb 2.0.0) lib/cubdb/btree.ex:293: CubDB.Btree.insert_terminal_node/4
    (cubdb 2.0.0) lib/cubdb/transaction.ex:228: CubDB.Tx.put/3 
    (cubdb 2.0.0) lib/cubdb.ex:754: anonymous fn/3 in CubDB.put/3
    (cubdb 2.0.0) lib/cubdb.ex:671: CubDB.transaction/2
    (dets_plus 2.1.1) bench/bench.ex:125: anonymous fn/5 in DetsPlus.Bench.large_write_test/2
    (elixir 1.13.4) lib/enum.ex:4136: Enum.reduce_range/5
    (dets_plus 2.1.1) bench/bench.ex:124: DetsPlus.Bench.large_write_test/2
    (stdlib 3.15) timer.erl:181: :timer.tc/2
    (dets_plus 2.1.1) bench/bench.ex:194: DetsPlus.Bench.tc/2
    (dets_plus 2.1.1) bench/bench.ex:174: anonymous fn/6 in DetsPlus.Bench.run/3
    (elixir 1.13.4) lib/enum.ex:4136: Enum.reduce_range/5
    (dets_plus 2.1.1) bench/bench.ex:168: anonymous fn/7 in DetsPlus.Bench.run/3
    (elixir 1.13.4) lib/enum.ex:2396: Enum."-reduce/3-lists^foldl/2-0-"/3
    (dets_plus 2.1.1) bench/bench.ex:165: DetsPlus.Bench.run/3
    (elixir 1.13.4) lib/enum.ex:937: Enum."-each/2-lists^foreach/1-0-"/2

A small doc spell mistake

How to sort uuid or string?

Hi, thanks for making this cool library.
I wonder how to model data from another server in which the ID is UUID/Hashid (string, non-integer).
I was thinking about modeling the key of the data by

{:message, uuid}

But then I get stuck on how to sort the data.
For an integer id, it would be like

CubDB.select(db, min_key: {:message, 0}, max_key: {:message, nil})

But how about a UUID/hashid (string)? Any help would be appreciated.
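For string IDs, the same range trick relies on Erlang term ordering, in which binaries are compared byte by byte. The sketch below uses plain Elixir comparisons rather than CubDB itself, on the assumption that CubDB sorts keys by that same ordering. The short IDs are made up, and `<<255>>` as an upper bound assumes the IDs are valid UTF-8 strings:

```elixir
# Keys as they might be stored: {collection, id}, where id is a string.
keys = [
  {:message, "b3f1"},
  {:user, "x"},
  {:message, "0a9c"},
  {:message, "ffff"}
]

# "" is the smallest binary. <<255>> sorts after every valid UTF-8 string,
# because the byte 0xFF never occurs in valid UTF-8.
min_key = {:message, ""}
max_key = {:message, <<255>>}

in_range =
  keys
  |> Enum.sort()
  |> Enum.filter(fn key -> key >= min_key and key <= max_key end)

IO.inspect(in_range)
```

The same two bounds could then be passed as `min_key:` and `max_key:` to `select`, in place of `{:message, 0}` and `{:message, nil}`.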

CRDTs and CubDB?

I'm working on a mobile Nerves app that will experience periods of disconnection. Disconnected users will create/update/delete records, then re-sync upon connection.

I think this would be a good use-case for CRDTs. I'm curious if anyone has used CubDB with CRDTs, or if people can suggest other sync-strategies for disconnected updates.

CubDB crash with :error, :emfile

My app updates the same 3 keys in CubDB at 1Hz.

Every 4.5 hours of normal running (exactly 16,435 seconds in several instances), I get a crash in CubDB, which supervision restarts:

14:32:12.527 [error] Process #PID<0.2058.0> terminating
** (MatchError) no match of right hand side value: {:error, :emfile}
    (cubdb 1.0.0-rc.5) lib/cubdb/clean_up.ex:42: CubDB.CleanUp.handle_cast/2
    (stdlib 3.11.1) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib 3.11.1) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib 3.11.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Initial Call: CubDB.CleanUp.init/1
Ancestors: [:db, MyApp.Supervisor, #PID<0.2054.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.2056.0>]
Dictionary: []
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 27
Reductions: 30001910
Neighbours:
    #PID<0.32690.3>
        Initial Call: anonymous fn/0 in CubDB.Store.File.create/1
        Current Call: :gen_server.loop/7
        Ancestors: [:db, MyApp.Supervisor, #PID<0.2054.0>]
        Message Queue Length: 0
        Links: [#PID<0.2056.0>]
        Trapping Exits: false
        Status: :waiting
        Heap Size: 1598
        Stack Size: 11
        Reductions: 1500
        Current Stacktrace:
            (stdlib 3.11.1) gen_server.erl:394: :gen_server.loop/7
            (stdlib 3.11.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
    #PID<0.32695.3>
        Initial Call: anonymous fn/0 in CubDB.Store.File.create/1
        Current Call: :gen_server.loop/7
        Ancestors: [:db, MyApp.Supervisor, #PID<0.2054.0>]
        Message Queue Length: 0
        Links: [#PID<0.2056.0>]
        Trapping Exits: false
        Status: :waiting
        Heap Size: 987
        Stack Size: 11
        Reductions: 52624
        Current Stacktrace:
            (stdlib 3.11.1) gen_server.erl:394: :gen_server.loop/7
            (stdlib 3.11.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
    #PID<0.32733.0>
        Initial Call: anonymous fn/0 in CubDB.Store.File.create/1
        Current Call: :gen_server.loop/7
        Ancestors: [:db, MyApp.Supervisor, #PID<0.2054.0>]
        Message Queue Length: 0
        Links: [#PID<0.2056.0>]
        Trapping Exits: false
        Status: :waiting
        Heap Size: 610
        Stack Size: 11
        Reductions: 52413
        Current Stacktrace:
            (stdlib 3.11.1) gen_server.erl:394: :gen_server.loop/7
            (stdlib 3.11.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
        ...

The Neighbours list continues and is eventually truncated by the logging, but it includes many copies of the 3 listed above.

When I look in /proc//fd, I see MANY broken symlinks to deleted CubDB files. The last few look like this (2 valid, 1 referencing an old DB version; there are hundreds more of the broken ones):

lrwx------. 1 64 Feb 8 07:28 986 -> /home//<project_name>/_build/prod/rel/<project_name>/dev_db/1B06.cub
lrwx------. 1 64 Feb 8 07:28 987 -> /home//<project_name>/_build/prod/rel/<project_name>/dev_db/1B06.cub
lrwx------. 1 64 Feb 8 07:28 99 -> /home//<project_name>/_build/prod/rel/<project_name>/dev_db/194B.cub (deleted)

Further digging reveals that 2 new fds are created every 30-35 seconds, which corresponds with the timing I see above for exhausting the system limit for FDs.
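The descriptor check described above can also be done from inside the VM. This is a rough sketch for Linux only. `FdCheck` is a hypothetical helper module, not part of CubDB, and it assumes deleted files show up with a ` (deleted)` suffix on the symlink target, as in the listing above:

```elixir
defmodule FdCheck do
  # Hypothetical helper, Linux only: returns the targets of the symlinks
  # under /proc/<os_pid>/fd, or [] when that directory cannot be listed.
  def open_targets(os_pid \\ System.pid()) do
    dir = "/proc/#{os_pid}/fd"

    case File.ls(dir) do
      {:ok, fds} ->
        # Entries that cannot be read as symlinks are skipped.
        for fd <- fds,
            {:ok, target} <- [File.read_link(Path.join(dir, fd))],
            do: target

      {:error, _reason} ->
        []
    end
  end

  # Counts targets that point at a CubDB file that has already been deleted.
  def deleted_cub_count(targets) do
    Enum.count(targets, &String.ends_with?(&1, ".cub (deleted)"))
  end
end

FdCheck.open_targets()
|> FdCheck.deleted_cub_count()
|> IO.inspect(label: "deleted .cub descriptors")
```

Logging this count periodically would show whether the number of stale descriptors grows at the rate observed above.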

My best guess, from a quick look at the code, is that the actor that holds state for CubDB.Store.File isn't getting cleaned up after its version of the DB is out of date.

I'm going to continue investigating and will update with what I find.

cubdb crashes during update

Hi, I am using CubDB as an embedded store for time-series statistics. The CubDB process crashes randomly with errors like this:

2021-02-19 08:18:23.055 [error] GenServer #PID<0.2284.0> terminating
** (stop) exited in: GenServer.call(#PID<0.2285.0>, {:update, #Function<0.77404542/1 in CubDB.Store.CubDB.Store.File.close/1>}, 5000)
    ** (EXIT) time out
    (elixir 1.11.2) lib/gen_server.ex:1027: GenServer.call/3
    (cubdb 1.0.0-rc.7) lib/cubdb/store/file.ex:106: CubDB.Store.CubDB.Store.File.close/1
    (stdlib 3.13) gen_server.erl:718: :gen_server.try_terminate/3
    (stdlib 3.13) gen_server.erl:903: :gen_server.terminate/10
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from Tg.Service.BackgroundSrv): {:put_and_delete_multi, [{{"UserStatM", "[email protected]", "potential", 1613719092948961}, %{cat_ids: ["chatting", "social_network"], created_at: 1613719092948961, domain: "discord.com", filter_id: nil, n_occurs: 1, path: "", scheme: "", type: "dns", url: "discord.com", user_id: "[email protected]", user_ip: "149.224.20.202", verdict: "potential"}}], []}
State: %CubDB.State{auto_compact: {100, 0.25}, auto_file_sync: true, btree: %CubDB.Btree{capacity: 32, dirt: 202, root: {:b, [<key1>, <key2>,…]}, root_loc: 1005235, size: 655, store: %CubDB.Store.File{file_path: "/path/to/db_folder/8.cub", pid: #PID<0.2285.0>}}, catch_up: nil, clean_up: #PID<0.2286.0>, clean_up_pending: false, compactor: nil, data_dir: "/path/to/db_folder", readers: %{}, subs: [], task_supervisor: #PID<0.2287.0>}
Client Tg.Service.BackgroundSrv is alive

    (stdlib 3.13) gen.erl:208: :gen.do_call/4
    (elixir 1.11.2) lib/gen_server.ex:1024: GenServer.call/3
    (elixir 1.11.2) lib/enum.ex:792: anonymous fn/3 in Enum.each/2
    (stdlib 3.13) maps.erl:233: :maps.fold_1/3
    (elixir 1.11.2) lib/enum.ex:2197: Enum.each/2
    (firegate_stat 0.1.0) lib/fg/stat/buffer.ex:107: Fg.Stat.Buffer.flush_stat_points/1
    (exservice 0.1.0) lib/tg/service/background_srv.ex:136: Tg.Service.BackgroundSrv.run_job/3
    (exservice 0.1.0) lib/tg/service/background_srv.ex:83: Tg.Service.BackgroundSrv.handle_info/2
2021-02-19 08:18:23.064 [error] GenServer #PID<0.2287.0> terminating
** (stop) exited in: GenServer.call(#PID<0.2285.0>, {:update, #Function<0.77404542/1 in CubDB.Store.CubDB.Store.File.close/1>}, 5000)
    ** (EXIT) time out
Last message: {:EXIT, #PID<0.2284.0>, {:timeout, {GenServer, :call, [#PID<0.2285.0>, {:update, #Function<0.77404542/1 in CubDB.Store.CubDB.Store.File.close/1>}, 5000]}}}
State: %DynamicSupervisor{args: {{:temporary, 5000}, []}, children: %{}, extra_arguments: [], max_children: :infinity, max_restarts: 3, max_seconds: 5, mod: Task.Supervisor, name: {#PID<0.2287.0>, Task.Supervisor}, restarts: [], strategy: :one_for_one}
2021-02-19 08:18:23.065 [error] GenServer Tg.Service.BackgroundSrv terminating
** (stop) exited in: GenServer.call(#PID<0.2284.0>, {:put_and_delete_multi, [{<key>, <value>}], []}, :infinity)
    ** (EXIT) exited in: GenServer.call(#PID<0.2285.0>, {:update, #Function<0.77404542/1 in CubDB.Store.CubDB.Store.File.close/1>}, 5000)
        ** (EXIT) time out
    (elixir 1.11.2) lib/gen_server.ex:1027: GenServer.call/3
    (elixir 1.11.2) lib/enum.ex:792: anonymous fn/3 in Enum.each/2
    (stdlib 3.13) maps.erl:233: :maps.fold_1/3
    (elixir 1.11.2) lib/enum.ex:2197: Enum.each/2
    (firegate_stat 0.1.0) lib/fg/stat/buffer.ex:107: Fg.Stat.Buffer.flush_stat_points/1
    (exservice 0.1.0) lib/tg/service/background_srv.ex:136: Tg.Service.BackgroundSrv.run_job/3
    (exservice 0.1.0) lib/tg/service/background_srv.ex:83: Tg.Service.BackgroundSrv.handle_info/2
    (stdlib 3.13) gen_server.erl:680: :gen_server.try_dispatch/4
Last message: {:run_every, "vMt5TEFnzDX2NIHIVN", 1000, [report_to: Fg.Stat.Buffer, action_id: :flush_result]}
State: %Tg.Service.BackgroundSrv.State{jobs: %{"vMt5TEFnzDX2NIHIVN" => #Function<3.23596830/0 in Fg.Stat.Buffer.handle_info/2>}, pid: Tg.Service.BackgroundSrv}
2021-02-19 08:18:23.069 [info] Tg.Service.BackgroundSrv started.
2021-02-19 08:18:23.067 [error] Ranch protocol #PID<0.2383.0> of listener Fg.Stat.Server (connection #PID<0.2382.0>, stream id 1) terminated
an exception was raised:
    ** (ErlangError) Erlang error: {{:timeout, {GenServer, :call, [#PID<0.2285.0>, {:update, #Function<0.77404542/1 in CubDB.Store.CubDB.Store.File.close/1>}, 5000]}}, {GenServer, :call, [#PID<0.2284.0>, {:read, {:select, [min_key: {"UserStatM", "[email protected]", "intercept", 1613114299016432}, max_key: {"UserStatM", "[email protected]", "intercept", 1613719099016483}, pipe: [filter: #Function<12.108285932/1 in Fg.Stat.Api.get_user_stats/2>, map: #Function<13.108285932/1 in Fg.Stat.Api.get_user_stats/2>], reduce: {%{}, #Function<7.108285932/2 in Fg.Stat.Api.handle_req/5>}]}, :infinity}, :infinity]}}
        (elixir 1.11.2) lib/gen_server.ex:1027: GenServer.call/3
        (firegate_stat 0.1.0) lib/fg/stat/api.ex:154: anonymous fn/5 in Fg.Stat.Api.get_user_stats/2
        (firegate_stat 0.1.0) lib/fg/stat/server.ex:2: Fg.Stat.Server.init/2
        (cowboy 2.8.0) /home/ct/Projects/TruongGroup/firegate/firegate_stat/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2
        (cowboy 2.8.0) /home/ct/Projects/TruongGroup/firegate/firegate_stat/deps/cowboy/src/cowboy_stream_h.erl:300: :cowboy_stream_h.execute/3
        (cowboy 2.8.0) /home/ct/Projects/TruongGroup/firegate/firegate_stat/deps/cowboy/src/cowboy_stream_h.erl:291: :cowboy_stream_h.request_process/3
        (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
2021-02-19 08:18:23.094 [error] Ranch protocol #PID<0.2387.0> of listener Fg.Stat.Server (connection #PID<0.2382.0>, stream id 2) terminated
an exception was raised:
    ** (ErlangError) Erlang error: {:noproc, {GenServer, :call, [#PID<0.2284.0>, {:read, {:select, [min_key: {"UserStatM", "[email protected]", "intercept", 1613114303076681}, max_key: {"UserStatM", "[email protected]", "intercept", 1613719103076729}, pipe: [filter: #Function<12.108285932/1 in Fg.Stat.Api.get_user_stats/2>, map: #Function<13.108285932/1 in Fg.Stat.Api.get_user_stats/2>], reduce: {%{}, #Function<6.108285932/2 in Fg.Stat.Api.handle_req/5>}]}, :infinity}, :infinity]}}

Here is how I started CubDB:

{:ok, pid} = CubDB.start_link(
  data_dir: "/path/to/db_folder",
  auto_compact: true
)

Btw, I am not sure how to refer to a CubDB process started in a supervision tree, because the API for calling CubDB is

CubDB.put_multi(<PID>, <data>)

instead of

CubDB.put_multi(:some_atom_id, <data>)
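A possible way to get the second form is to register the process under a name. This is a minimal sketch, assuming a CubDB version whose `start_link/1` accepts a `name` option in its keyword list and forwards it to `GenServer` (CubDB 2.x is assumed here). `MyApp.DB` and the data directory are hypothetical:

```elixir
Mix.install([{:cubdb, "~> 2.0"}])

# Hypothetical throwaway directory and process name, for illustration only.
data_dir =
  Path.join(System.tmp_dir!(), "cubdb_named_example_#{System.unique_integer([:positive])}")

children = [
  # All options, including :name, go in a single keyword list.
  {CubDB, [data_dir: data_dir, auto_compact: true, name: MyApp.DB]}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# The registered name can be used wherever a pid is expected.
:ok = CubDB.put_multi(MyApp.DB, [{:a, 1}, {:b, 2}])
IO.inspect(CubDB.get(MyApp.DB, :a))
```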

I am sorry that some details in the error messages have been concealed due to confidentiality.

Thank you very much for your advice!

Corrupt data file after {:error, :enomem}

I just ran into this error. It looks like my app ran out of memory (for some as yet unknown reason), which caused it to crash. When I try to start the application again, CubDB cannot start.

Initial stacktrace:

16:40:11.775 [error] Task :cubdb started from #PID<0.3750.0> terminating
 ** (WithClauseError) no with clause matching: {:error, :enomem}
     (cubdb 0.17.0) lib/cubdb/store/file.ex:94: CubDB.Store.CubDB.Store.File.get_node/2
     (cubdb 0.17.0) lib/cubdb/btree.ex:337: CubDB.Btree.lookup_leaf/4
     (cubdb 0.17.0) lib/cubdb/btree.ex:106: CubDB.Btree.fetch/2
     (cubdb 0.17.0) lib/cubdb/reader.ex:31: CubDB.Reader.run/4
     (elixir 1.10.2) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
     (stdlib 3.11.2) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Error when starting again

** (Mix) Could not start application myapp: MyApp.Application.start(:normal, []) returned an error: shutdown: failed to start child: CubDB
    ** (EXIT) an exception was raised:
        ** (ArgumentError) argument error
            (cubdb) lib/cubdb/store/file.ex:94: CubDB.Store.CubDB.Store.File.get_node/2
            (cubdb) lib/cubdb/btree.ex:64: CubDB.Btree.new/2
            (cubdb) lib/cubdb.ex:786: CubDB.init/1
            (stdlib) gen_server.erl:374: :gen_server.init_it/2
            (stdlib) gen_server.erl:342: :gen_server.init_it/6
            (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

It seems the data file has been corrupted. Any ideas on how to get back to a working state without data loss?

CubDB crashing when under heavy load

Hello, in my system, there is a step where I can do a backfill of some data which triggers a heavy load of data being added to CubDB in sequence.

Whenever I run this, it doesn't take long for CubDB to crash with one of the following errors:

21:59:21.270 [error] #PID<0.20761.3> 
↳ Task ChartDataStore started from #PID<0.20761.3> terminating
** (ArgumentError) CubDB.Btree.Diff can only be created from Btree sharing the same store
    (cubdb 1.0.0-rc.3) lib/cubdb/btree/diff.ex:23: CubDB.Btree.Diff.new/2
    (cubdb 1.0.0-rc.3) lib/cubdb/catch_up.ex:30: CubDB.CatchUp.catch_up/3
    (cubdb 1.0.0-rc.3) lib/cubdb/catch_up.ex:25: CubDB.CatchUp.run/4
    (elixir 1.10.3) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Function: &CubDB.CatchUp.run/4
    Args: [#PID<0.1085.0>, %CubDB.Btree{capacity: 32, dirt: 4983, root: {:b, [{#Reference<0.1680950116.2897215489.63666>, 34686206}, {#Reference<0.1680950116.2897215490.52850>, 34724602}, {#Reference<0.1680950116.2897215491.69273>, 34732122}, {#Reference<0.1680950116.2897215492.66431>, 34647029}, {#Reference<0.1680950116.2897215493.49555>, 34580948}, {#Reference<0.1680950116.2897215494.53090>, 34739581}, {#Reference<0.1680950116.2897215495.57143>, 34480389}, {#Reference<0.1680950116.2897215496.51278>, 34594663}, {#Reference<0.1680950116.2897215497.66685>, 34747154}, {#Reference<0.1680950116.2897215498.50918>, 34754668}, {#Reference<0.1680950116.2897215499.47614>, 23960951}, {#Reference<0.1680950116.2897215499.111489>, 34762331}]}, root_loc: 34763693, size: 5083, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/1.compact", pid: #PID<0.11381.0>}}, %CubDB.Btree{capacity: 32, dirt: 5083, root: {:b, [{#Reference<0.1680950116.2897215489.63666>, 37433148}, {#Reference<0.1680950116.2897215490.57710>, 37466355}, {#Reference<0.1680950116.2897215491.71489>, 37490202}, {#Reference<0.1680950116.2897215492.68894>, 37382964}, {#Reference<0.1680950116.2897215493.52301>, 37341219}, {#Reference<0.1680950116.2897215494.55098>, 37449686}, {#Reference<0.1680950116.2897215495.59217>, 37194153}, {#Reference<0.1680950116.2897215496.52447>, 37316715}, {#Reference<0.1680950116.2897215497.67787>, 37474377}, {#Reference<0.1680950116.2897215498.51040>, 37457638}, {#Reference<0.1680950116.2897215499.50688>, 26328362}, {#Reference<0.1680950116.2897215499.114166>, 37482154}]}, root_loc: 37491270, size: 5083, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/0.cub", pid: #PID<0.1086.0>}}, %CubDB.Btree{capacity: 32, dirt: 4957, root: {:b, [{#Reference<0.1680950116.2897215489.63666>, 34770944}, {#Reference<0.1680950116.2897215490.57532>, 34708580}, {#Reference<0.1680950116.2897215491.71430>, 34738840}, {#Reference<0.1680950116.2897215492.88501>, 34661574}, 
{#Reference<0.1680950116.2897215493.91491>, 34595318}, {#Reference<0.1680950116.2897215494.76776>, 34723961}, {#Reference<0.1680950116.2897215495.55915>, 34478481}, {#Reference<0.1680950116.2897215496.51183>, 34762857}, {#Reference<0.1680950116.2897215497.68116>, 34746498}, {#Reference<0.1680950116.2897215498.53319>, 34731222}, {#Reference<0.1680950116.2897215499.50095>, 24128183}, {#Reference<0.1680950116.2897215499.113642>, 34753867}]}, root_loc: 34772139, size: 5085, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/2.cub", pid: #PID<0.20758.3>}}]
 
21:59:21.270 [error] #PID<0.20761.3> :proc_lib "crash_report/4" "proc_lib.erl" 525 
↳ Process #PID<0.20761.3> terminating
** (ArgumentError) CubDB.Btree.Diff can only be created from Btree sharing the same store
    (cubdb 1.0.0-rc.3) lib/cubdb/btree/diff.ex:23: CubDB.Btree.Diff.new/2
    (cubdb 1.0.0-rc.3) lib/cubdb/catch_up.ex:30: CubDB.CatchUp.catch_up/3
    (cubdb 1.0.0-rc.3) lib/cubdb/catch_up.ex:25: CubDB.CatchUp.run/4
    (elixir 1.10.3) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Initial Call: CubDB.CatchUp.run/4
Ancestors: [#PID<0.1088.0>, ChartDataStore, Alert.Supervisor, #PID<0.1060.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.1088.0>]
Dictionary: ["$callers": [#PID<0.1085.0>]]
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 28
Reductions: 15611

22:09:55.544 [error] #PID<0.6342.2> 
↳ Task ChartDataStore started from #PID<0.6342.2> terminating
** (stop) exited in: GenServer.call(#PID<0.16818.1>, {:get, #Function<2.122651076/1 in CubDB.Store.CubDB.Store.File.get_node/2>}, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.10.3) lib/gen_server.ex:1023: GenServer.call/3
    (cubdb 1.0.0-rc.3) lib/cubdb/store/file.ex:90: CubDB.Store.CubDB.Store.File.get_node/2
    (cubdb 1.0.0-rc.3) lib/cubdb/btree.ex:548: anonymous fn/2 in Enumerable.CubDB.Btree.get_children/2
    (elixir 1.10.3) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.10.3) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (cubdb 1.0.0-rc.3) lib/cubdb/btree.ex:547: Enumerable.CubDB.Btree.get_children/2
    (cubdb 1.0.0-rc.3) lib/cubdb/btree/enumerable.ex:57: CubDB.Btree.Enumerable.next/3
    (cubdb 1.0.0-rc.3) lib/cubdb/btree/enumerable.ex:43: CubDB.Btree.Enumerable.do_reduce/5
    (elixir 1.10.3) lib/enum.ex:3383: Enum.reduce/3
    (cubdb 1.0.0-rc.3) lib/cubdb/btree.ex:88: CubDB.Btree.load/3
    (cubdb 1.0.0-rc.3) lib/cubdb/compactor.ex:24: CubDB.Compactor.run/3
    (elixir 1.10.3) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Function: &CubDB.Compactor.run/3
    Args: [#PID<0.1072.0>, %CubDB.Btree{capacity: 32, dirt: 1218, root: {:b, [{#Reference<0.97821014.1288437761.154545>, 19692398}, {#Reference<0.97821014.1288437762.185281>, 19542772}, {#Reference<0.97821014.1288437763.195732>, 19660766}, {#Reference<0.97821014.1288437765.160023>, 19630266}, {#Reference<0.97821014.1288437767.159770>, 19668938}, {#Reference<0.97821014.1288437768.195912>, 19685024}, {#Reference<0.97821014.1288437769.192515>, 19700492}]}, root_loc: 19701266, size: 3468, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/8.cub", pid: #PID<0.16818.1>}}, %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/A.compact", pid: #PID<0.6341.2>}]

22:09:55.549 [error] #PID<0.1072.0> :gen_server "error_info/7" "gen_server.erl" 934 
↳ GenServer ChartDataStore terminating
** (MatchError) no match of right hand side value: {:error, :enoent}
    (cubdb 1.0.0-rc.3) lib/cubdb.ex:1111: CubDB.finalize_compaction/1
    (cubdb 1.0.0-rc.3) lib/cubdb.ex:970: CubDB.handle_info/2
    (stdlib 3.13) gen_server.erl:680: :gen_server.try_dispatch/4
    (stdlib 3.13) gen_server.erl:756: :gen_server.handle_msg/6
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: {:catch_up, %CubDB.Btree{capacity: 32, dirt: 98, root: {:b, [{#Reference<0.97821014.1288437761.154545>, 16516069}, {#Reference<0.97821014.1288437762.167599>, 16020197}, {#Reference<0.97821014.1288437763.196716>, 16101484}, {#Reference<0.97821014.1288437765.165635>, 16523000}, {#Reference<0.97821014.1288437766.200918>, 16544685}, {#Reference<0.97821014.1288437768.165598>, 16551612}]}, root_loc: 16552890, size: 3474, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/9.compact", pid: #PID<0.4407.2>}}, %CubDB.Btree{capacity: 32, dirt: 1224, root: {:b, [{#Reference<0.97821014.1288437761.154545>, 19715988}, {#Reference<0.97821014.1288437762.185281>, 19542772}, {#Reference<0.97821014.1288437763.195732>, 19660766}, {#Reference<0.97821014.1288437765.160023>, 19730970}, {#Reference<0.97821014.1288437767.159770>, 19746863}, {#Reference<0.97821014.1288437768.195912>, 19723994}, {#Reference<0.97821014.1288437769.192515>, 19739410}]}, root_loc: 19747804, size: 3474, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/8.cub", pid: #PID<0.16818.1>}}}
State: %CubDB.State{auto_compact: {100, 0.25}, auto_file_sync: true, btree: %CubDB.Btree{capacity: 32, dirt: 1224, root: {:b, [{#Reference<0.97821014.1288437761.154545>, 19715988}, {#Reference<0.97821014.1288437762.185281>, 19542772}, {#Reference<0.97821014.1288437763.195732>, 19660766}, {#Reference<0.97821014.1288437765.160023>, 19730970}, {#Reference<0.97821014.1288437767.159770>, 19746863}, {#Reference<0.97821014.1288437768.195912>, 19723994}, {#Reference<0.97821014.1288437769.192515>, 19739410}]}, root_loc: 19747804, size: 3474, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/8.cub", pid: #PID<0.16818.1>}}, catch_up: #PID<0.6483.2>, clean_up: #PID<0.1074.0>, clean_up_pending: false, compactor: #PID<0.6342.2>, data_dir: ".cubdb/alert/chart_data", readers: %{}, subs: [], task_supervisor: #PID<0.1075.0>}

I didn't test it too much, but disabling compaction seems to fix it.
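As a workaround along those lines, automatic compaction could be paused only for the duration of the backfill. This is a sketch, assuming CubDB 2.x and its `set_auto_compact/2` and `compact/1` functions. The data directory and the backfill data are made up, and it does not address the underlying crash:

```elixir
Mix.install([{:cubdb, "~> 2.0"}])

# Hypothetical throwaway directory, used only for this example.
data_dir =
  Path.join(System.tmp_dir!(), "cubdb_backfill_example_#{System.unique_integer([:positive])}")

{:ok, db} = CubDB.start_link(data_dir: data_dir)

# Pause automatic compaction while the bulk load is running.
:ok = CubDB.set_auto_compact(db, false)

# Made-up backfill data, written in batches of 100 entries.
1..1_000
|> Enum.chunk_every(100)
|> Enum.each(fn batch ->
  :ok = CubDB.put_multi(db, Enum.map(batch, fn i -> {{:point, i}, i * 2} end))
end)

# Trigger one compaction after the load, then restore automatic compaction.
CubDB.compact(db)
:ok = CubDB.set_auto_compact(db, true)

IO.inspect(CubDB.size(db))
```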

Allow different serialization mechanisms via plug-ins

It is not guaranteed that serialization with :erlang.term_to_binary and deserialization with :erlang.binary_to_term work correctly across different OTP versions; see ERL-431 for an example.

I suggest addressing this issue by introducing a Serializer behaviour, with the current implementation as the default that ships with the library. There should be an option for users to provide an alternative implementation of the Serializer. I don't have much knowledge on this topic, but a quick internet search suggests that msgpax or some BSON implementation could be good candidates.
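The suggestion could look roughly like the sketch below. The module names `Example.Serializer` and `Example.Serializer.ErlangTerm` are hypothetical and not part of CubDB. The default implementation simply wraps the two `:erlang` functions mentioned above:

```elixir
defmodule Example.Serializer do
  # Hypothetical behaviour; CubDB does not define this module.
  @callback serialize(term()) :: binary()
  @callback deserialize(binary()) :: term()
end

defmodule Example.Serializer.ErlangTerm do
  # Default implementation mirroring the current approach.
  @behaviour Example.Serializer

  @impl true
  def serialize(term), do: :erlang.term_to_binary(term)

  @impl true
  def deserialize(binary), do: :erlang.binary_to_term(binary)
end

# A store would call whichever module the user configured.
serializer = Example.Serializer.ErlangTerm
value = %{"price" => 2, tags: [:a, :b]}

round_trip = value |> serializer.serialize() |> serializer.deserialize()
IO.inspect(round_trip == value)
```

An alternative implementation would only need to provide the same two callbacks.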

The API of inclusive/exclusive ranges in select/3 is a bit problematic

Currently, whether min_key: and max_key: are inclusive or exclusive is controlled by wrapping the key you want to match on as {your_key, :excluded}.

I think this interface could be improved, since it is ambiguous when someone wants to use a tuple such as {:key, :excluded} as an actual key. Also, it fails silently (not matching the keys it was expected to match) if someone mistypes :excluded.

I think a better approach would be to have extra options to toggle between inclusive and exclusive, for instance named min_key_inclusive: true, or min_key_settings: [exclusive: true].
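To illustrate the proposal, here is a sketch of a range check in which inclusivity is a separate option. `RangeSketch` is a hypothetical helper, not CubDB code. It uses the option name `min_key_inclusive` suggested above and adds a matching `max_key_inclusive` by analogy. It assumes Elixir 1.13 or later for `Keyword.validate!/2`, which rejects mistyped option names instead of ignoring them:

```elixir
defmodule RangeSketch do
  # Hypothetical helper: bounds are plain keys, and inclusivity is a
  # separate boolean option that defaults to true.
  def in_range?(key, opts) do
    # Raises ArgumentError on unknown (for example mistyped) options.
    opts =
      Keyword.validate!(opts, [
        :min_key,
        :max_key,
        min_key_inclusive: true,
        max_key_inclusive: true
      ])

    above_min?(key, opts) and below_max?(key, opts)
  end

  defp above_min?(key, opts) do
    case Keyword.fetch(opts, :min_key) do
      {:ok, bound} -> if opts[:min_key_inclusive], do: key >= bound, else: key > bound
      :error -> true
    end
  end

  defp below_max?(key, opts) do
    case Keyword.fetch(opts, :max_key) do
      {:ok, bound} -> if opts[:max_key_inclusive], do: key <= bound, else: key < bound
      :error -> true
    end
  end
end

keys = [{:key, :a}, {:key, :excluded}, {:key, :z}]

# {:key, :excluded} is an ordinary key here, usable as an inclusive bound.
inclusive =
  Enum.filter(keys, fn key ->
    RangeSketch.in_range?(key, min_key: {:key, :excluded})
  end)

# The same bound, made exclusive through the option instead of a wrapper.
exclusive =
  Enum.filter(keys, fn key ->
    RangeSketch.in_range?(key, min_key: {:key, :excluded}, min_key_inclusive: false)
  end)

IO.inspect({inclusive, exclusive})
```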