Comments (2)
Hi @dch,
the select/3 function uses lazy streams internally, so if you do, for example, something like:
# Integer.is_even/1 is a macro, so the Integer module must be required first
require Integer
CubDB.select(db, pipe: [
  map: fn {_key, value} -> value end,
  filter: fn x -> Integer.is_even(x) end,
  map: fn x -> x * 2 end,
  take: 3
])
The transformations above are executed as a lazy stream. In particular, because of the final take: 3, reading from disk stops as soon as three entries have made it through the pipeline, so the rest of the database is never read or processed.
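To illustrate the laziness with plain Elixir streams (this is not CubDB itself, just a sketch of the same pipeline), the snippet below runs equivalent transformations over an in-memory list and counts how many entries are actually pulled from the source. The entries list and the counter are assumptions made up for the illustration.

```elixir
require Integer

# Hypothetical stand-in for entries stored on disk: {key, value} pairs.
entries = for i <- 1..1_000, do: {i, i}

# Counter incremented every time an entry is pulled from the source.
counter = :counters.new(1, [])

result =
  entries
  |> Stream.each(fn _entry -> :counters.add(counter, 1, 1) end)
  |> Stream.map(fn {_key, value} -> value end)
  |> Stream.filter(fn x -> Integer.is_even(x) end)
  |> Stream.map(fn x -> x * 2 end)
  |> Enum.take(3)

read = :counters.get(counter, 1)

IO.inspect(result, label: "result")
IO.inspect(read, label: "entries pulled from the source")
```

The pipeline halts as soon as the third result is produced, so the number of entries pulled stays far below the 1,000 available.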
It would be nice to have an interface such as:
# Note: this is NOT the way CubDB actually works, just illustrating a point
CubDB.select() |> Stream.map(fn x -> ... end) |> Stream.filter(...) |> ...
But, unfortunately, it would be tricky and error-prone. The issue is the following: remember that CubDB allows concurrent reads and writes, so you can be executing a long-running select while some other process is performing writes. This works because the select runs against a zero-cost immutable snapshot: it basically "sees" the database frozen in the state it was in when the select started. Eventually, when a compaction operation runs, it will clean up and remove the old un-compacted database file, but it can only do so when no read operation is still "seeing" the old file; otherwise it would pull the file out from under the reader. For this reason, CubDB has to internally keep track of all readers and of which point in the database history each one is seeing.
The way the select/3 API is designed allows CubDB to perform this internal bookkeeping without user intervention. If the API were something like the fake example above instead, one would have to manually "check in" before reading and "check out" after finishing processing the stream. If a user forgot to "check out", compaction operations would be blocked indefinitely (or, alternatively, if compaction were allowed to run anyway, it could break slow readers).
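As a rough sketch of why a callback-style API makes this bookkeeping automatic, consider the toy tracker below. It is NOT how CubDB is actually implemented: the ReaderTracker module, with_reader/2 and safe_to_clean_up?/1 are hypothetical names invented for the illustration. The point is only that when the library runs the user's function itself, it can pair every "check in" with a "check out", even if the function raises.

```elixir
defmodule ReaderTracker do
  # Hypothetical illustration only; NOT CubDB's actual internals.
  use Agent

  # The state is the set of readers currently checked in.
  def start_link(_opts \\ []) do
    Agent.start_link(fn -> MapSet.new() end)
  end

  # Runs `fun` as a tracked reader. The caller never checks in or out
  # explicitly: the `after` clause guarantees the check-out happens.
  def with_reader(tracker, fun) do
    ref = make_ref()
    Agent.update(tracker, &MapSet.put(&1, ref))

    try do
      fun.()
    after
      Agent.update(tracker, &MapSet.delete(&1, ref))
    end
  end

  # Old files may only be removed when no reader is checked in.
  def safe_to_clean_up?(tracker) do
    Agent.get(tracker, &(MapSet.size(&1) == 0))
  end
end

{:ok, tracker} = ReaderTracker.start_link()

during_read =
  ReaderTracker.with_reader(tracker, fn ->
    ReaderTracker.safe_to_clean_up?(tracker)
  end)

after_read = ReaderTracker.safe_to_clean_up?(tracker)

IO.inspect({during_read, after_read})
```

With an API that hands a bare stream back to the caller, there would be no equivalent of the `after` clause: the library could not know when the caller is done consuming it.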
In conclusion, performance-wise, select/3 is already using lazy streams under the hood, so it minimizes disk operations. It would be nice to have a stream-based API, but that would cause the problems described above.
Regarding allowing processes to subscribe to changes, I am curious: what would be your idea? It sounds like an interesting option, maybe better implemented as a library on top of CubDB.
from cubdb.
Closing this for now as there is no response. Feel free to comment on it and I will reopen if necessary.