
dlsnode's People

Contributors

geod24, iain-buclaw-sociomantic, joseph-wakeling-frequenz, leandro-lucarella-sociomantic, mathias-baumann-sociomantic, mathias-lang-sociomantic, nemanja-boric-sociomantic

dlsnode's Issues

Add tests for checkpoint service

The checkpoint service is a feature that must be protected against regressions. There are several tests that can be added, some of which are very easy and some more complicated to implement.

Easy:

  • Write into multiple channels and expect all buckets to appear in the checkpoint file exactly once (see the sketch below)
  • Confirm that the checkpoint file doesn't exist after a clean exit of the node
  • Start the node with existing data and a checkpoint file and expect the buckets to be truncated at the right spots

Medium:

  • Write into the channels data spread over more than <"number of cached files" constant> buckets, so that some buckets get closed; confirm that they appear immediately after opening the new channel, and after that expect that they are not found there

Hard:

  • Open new buckets while the node is inside a checkpoint dump cycle, so that the checkpoints are scheduled to be dumped at the end of the cycle
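
A purely hypothetical D sketch of the first easy check ("each bucket appears exactly once in the checkpoint file"); the three helper functions are stubs standing in for whatever the real test harness provides, not actual APIs:

// Hypothetical sketch -- the helpers below are placeholders, not real APIs.
void writeRecords ( string channel, size_t count ) { /* stub */ }
void triggerCheckpointDump ( ) { /* stub */ }
string[] readCheckpointBucketPaths ( string checkpoint_path ) { return null; } // stub

void checkCheckpointUniqueness ( string[] channels )
{
    foreach (channel; channels)
        writeRecords(channel, 1_000);      // write some records per channel

    triggerCheckpointDump();               // force a checkpoint dump

    // Count how many times each bucket path is listed in the checkpoint file
    size_t[string] seen;
    foreach (bucket_path; readCheckpointBucketPaths("checkpoint.dat"))
    {
        if (auto count = bucket_path in seen)
            ++(*count);
        else
            seen[bucket_path] = 1;
    }

    foreach (bucket_path, count; seen)
        assert(count == 1, bucket_path ~ " appears more than once in the checkpoint file");
}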

Add systemd support

Systemd support should be added in 3 stages:

  • Create the unit file and add it to the package (when available); see the sketch below
  • Deploy the unit file (deploying the package when available)
  • Switch the servers running the app to systemd
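
For the first step, a minimal sketch of what such a unit file could look like (all paths and the command line are placeholders, not the actual packaging layout):

[Unit]
Description=DLS node
After=network.target

[Service]
Type=simple
# Placeholder binary path and arguments; adjust to the packaged layout.
ExecStart=/usr/sbin/dlsnode --config /etc/dlsnode/config.ini
WorkingDirectory=/srv/dlsnode
Restart=on-failure
User=dlsnode
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Once the file is installed under /etc/systemd/system/ (or shipped by the package), the switch-over on a server is a matter of systemctl enable --now dlsnode.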

Cache FileSystem layout and iterate over it

A PR from July 2017 has been lurking unmerged in the old private dlsnode repo. Here's Nemanja's description of it:


This is a WIP patch, with lots of work still to be done in terms of tidying up code/fixing up commits (I want to start having CI support for the final touches), so it's not yet ready for general review.

The patch itself introduces four main things:

  • A B-Tree data structure, managed on glibc's heap, and the auxiliary tools for making that possible
  • A FileSystemCache used to build the initial view of the file system and track its changes. It uses the B-Tree for storing the filesystem data
  • A FileSystemLayout which uses the range primitives and iterates over the files in the cache for the given range.
  • A StorageEngine/StorageEngineStepIterator which now builds and uses the cache to do the iteration, instead of the old directory iteration/stat method.

I downloaded the patch of the PR and tried to apply it to this repo, but it failed:

git apply 288.patch
error: patch failed: src/dlsnode/storage/BufferedBucketOutput.d:337
error: src/dlsnode/storage/BufferedBucketOutput.d: patch does not apply
error: patch failed: src/dlsnode/storage/StorageEngine.d:197
error: src/dlsnode/storage/StorageEngine.d: patch does not apply
error: patch failed: src/dlsnode/storage/BucketFile.d:18
error: src/dlsnode/storage/BucketFile.d: patch does not apply
error: patch failed: src/dlsnode/storage/iterator/StorageEngineStepIterator.d:41
error: src/dlsnode/storage/iterator/StorageEngineStepIterator.d: patch does not apply

I don't have time now to look into applying this properly, so I'll just upload the patch here for posterity:
288.patch.txt

Nemanja said that this is a useful PR that was tested but wasn't merged because we planned to install the DLS nodes on servers with SSDs (an alternative way of speeding up file access). He also mentioned that the BTree implementation in the patch was merged to ocean: sociomantic-tsunami/ocean#210.

Add DLS tests for more realistic scenarios

There should be a set of DLS tests covering real-world scenarios: multiple writers, simultaneous reads/writes, many combinations of iterations/writers, etc. In the current test suite there's no way, for example, to ignore the last chunk of the data (thereby forcing the node to flush), and no way to trigger fiber race conditions (as there's only one active fiber at a time).

Crash in Neo & AsyncIO

Two nodes crashed with the following backtrace:

(gdb) bt
#0  0x000000000087ddea in dlsnode.util.aio.internal.AioScheduler.AioScheduler.handle_(ulong).__foreachbody3764(ref dlsnode.util.aio.internal.JobQueue.Job*) (this=0x7ffc1f1f3810, 
    __applyArg0=0x7ffc1f1f3810) at ./src/dlsnode/util/aio/internal/AioScheduler.d:199
#1  0x000000000087e047 in swarm.neo.util.TreeQueue.TreeQueue!(dlsnode.util.aio.internal.JobQueue.Job*).TreeQueue.opApply(int(ref dlsnode.util.aio.internal.JobQueue.Job*) delegate).__dgliteral499(ref ulong) (this=0x7ffc1f1f38a0, value_=0x7ffc1f1f38a0) at ./submodules/swarm/src/swarm/neo/util/TreeQueue.d:101
#2  0x000000000082ac49 in swarm.neo.util.TreeQueue.TreeQueueCore.opApply(int(ref ulong) delegate) (this=0x0, dg=...) at ./submodules/swarm/src/swarm/neo/util/TreeQueue.d:464
#3  0x000000000087e01a in swarm.neo.util.TreeQueue.TreeQueue!(dlsnode.util.aio.internal.JobQueue.Job*).TreeQueue.opApply(int(ref dlsnode.util.aio.internal.JobQueue.Job*) delegate) (this=0x7f20d0759888, dg=...) at ./submodules/swarm/src/swarm/neo/util/TreeQueue.d:98
#4  0x000000000087dd78 in dlsnode.util.aio.internal.AioScheduler.AioScheduler.handle_(ulong) (this=0x7ffc1f1f3910, n=140720830626064)
    at ./src/dlsnode/util/aio/internal/AioScheduler.d:196
#5  0x00000000007c9ea4 in ocean.io.select.client.SelectEvent.ISelectEvent.handle(ocean.sys.Epoll.epoll_event_t.Event) (this=0x0, event=1)
    at ./submodules/ocean/src/ocean/io/select/client/SelectEvent.d:147
#6  0x000000000086cf0f in ocean.io.select.selector.SelectedKeysHandler.SelectedKeysHandler.handleSelectedKey(ocean.sys.Epoll.epoll_event_t, bool(Exception) delegate) (
    this=0x7ffc1f1f39a0, unhandled_exception_hook=..., key=...) at ./submodules/ocean/src/ocean/io/select/selector/SelectedKeysHandler.d:170
#7  0x000000000086ce89 in ocean.io.select.selector.SelectedKeysHandler.SelectedKeysHandler.opCall(ocean.sys.Epoll.epoll_event_t[], bool(Exception) delegate) (this=0x0, 
    unhandled_exception_hook=..., selected_set=...) at ./submodules/ocean/src/ocean/io/select/selector/SelectedKeysHandler.d:134
#8  0x00000000008714f8 in ocean.io.select.EpollSelectDispatcher.EpollSelectDispatcher.select(bool) (this=0x7f20d073c780, exit_asap=false)
    at ./submodules/ocean/src/ocean/io/select/EpollSelectDispatcher.d:836
#9  0x000000000087134f in ocean.io.select.EpollSelectDispatcher.EpollSelectDispatcher.eventLoop(bool() delegate, bool(Exception) delegate) (this=0x7f20d073c780, 
    unhandled_exception_hook=..., select_cycle_hook=...) at ./submodules/ocean/src/ocean/io/select/EpollSelectDispatcher.d:749
#10 0x000000000072f4e9 in dlsnode.main.DlsNodeServer.run(ocean.text.Arguments.Arguments, ocean.util.config.ConfigParser.ConfigParser) (this=0x0, config=0x7ffc1f1f3c30, 
    args=0x7ffc1f1f3c30) at src/dlsnode/main.d:355
#11 0x00000000007ba82b in ocean.util.app.DaemonApp.DaemonApp.run(char[][]) (this=0x0, args=...) at ./submodules/ocean/src/ocean/util/app/DaemonApp.d:538
#12 0x0000000000856c5d in ocean.util.app.Application.Application.main(char[][]) (this=0x7f20d0732400, args=...) at ./submodules/ocean/src/ocean/util/app/Application.d:260
#13 0x000000000072ee3c in D main (cl_args=...) at src/dlsnode/main.d:98
(gdb) p job
$2 = (dlsnode.util.aio.internal.JobQueue.Job *) 0x0
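
Frame #0 together with the p job output shows the foreach body over the queued Job pointers being entered with a null pointer. As a purely illustrative sketch (only the Job type and its module come from the backtrace; the function and variable names below are invented), a guard at that spot would turn the segfault into an observable condition:

import dlsnode.util.aio.internal.JobQueue : Job;

// Hypothetical stand-in for the loop in AioScheduler.handle_ (frame #0 above).
void handleReadyJobs ( Job*[] ready_jobs )
{
    foreach (ref Job* job; ready_jobs)
    {
        // gdb shows `job == null` at the crash site; skip (or assert/log)
        // rather than dereference it, so the root cause can be traced.
        if (job is null)
            continue;

        // ... existing completion handling of the job ...
    }
}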

Difference in storage engine performance for Neo on cold data

During the recent deployment of a test application we noticed a slowdown when using the Neo protocol, which we managed to pin down to an apparent difference in the node's performance when the page cache is empty. When the requested data is already paged into memory, the difference matches what we saw when we did the dry-run client tests.

  • The results with the data paged in:

Legacy:

Timing ---
Summary: Reading time:                                 60.85
Summary: Total dls time:                               60.85
Summary: Sorting time:                                 0.00
Summary: Reduction time:                               8.35
Summary: Writing time:                                 0.00
Summary: Copying time:                                 0.00
Summary: Total time:                                   69.20

Neo:

Timing ---
Summary: Reading time:                                 64.93
Summary: Total dls time:                               64.93
Summary: Sorting time:                                 0.00
Summary: Reduction time:                               8.13
Summary: Writing time:                                 0.00
Summary: Copying time:                                 0.00
Summary: Total time:                                   73.06

  • The results with the data not in cache:

Legacy:

Summary: Reading time:                                 62.08
Summary: Total dls time:                               62.08
Summary: Sorting time:                                 0.00
Summary: Reduction time:                               8.49
Summary: Writing time:                                 0.00
Summary: Copying time:                                 0.00
Summary: Total time:                                   70.57

Neo:

Timing ---
Summary: Reading time:                                 81.66
Summary: Total dls time:                               81.66
Summary: Sorting time:                                 0.00
Summary: Reduction time:                               8.42
Summary: Writing time:                                 0.00
Summary: Copying time:                                 0.00
Summary: Total time:                                   90.09

The results are consistent across multiple testing rounds.
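
For reference, one common way to force the cold-cache case between rounds is to drop the page cache on the test host (requires root; this is the standard Linux procfs knob, nothing DLS-specific):

# flush dirty pages, then drop the page cache plus dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches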

This shows that there's no unexpected difference in the Neo protocol overhead: when serving data that's already in the page cache, Neo achieves approximately the same performance. This is good news, since we applied all the tricks we could think of in the swarm and dlsproto implementations. The only thing we never tested nor optimized is the dlsnode storage engine's Neo path, which differs from the legacy one.

Evaluate `errors=remount-ro` mount option for DLS

In case of an underlying IO error, all bets are off. It might be useful to assume that not all servers will encounter an IO error at the same time, so we can let the filesystem remount itself read-only on error, still providing read access to consumers while redirecting writers elsewhere.
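
For illustration, the option is set per filesystem, e.g. in /etc/fstab (device and mount point here are placeholders for the DLS data partition):

# On an IO error the kernel remounts the filesystem read-only instead of
# continuing to write to it.
/dev/sdb1  /srv/dlsnode/data  ext4  defaults,errors=remount-ro  0  2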

Remove StorageProtocolLegacy from dlsnode

During the transitional period to storage protocol V1, we had to support the legacy storage protocol (the protocol which supported buckets without a bucket header) for as long as we had data stored like that. The need for that support is long gone and we can remove all references to it.
