pmwkaa / sophia
Modern transactional key-value/row storage library.
Home Page: http://sophia.systems
License: Other
For some reason the compiled libraries don't have matching symbols. I'm not familiar with compiling shared and static libraries or I'd send a PR.
objdump -D libsophia.a | grep "<sp_recover>:" -
# Outputs the line with the sp_recover function symbol.
objdump -D libsophia.so.1.1 | grep "<sp_recover>:" -
# Nothing is outputted.
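One possible explanation, offered as an assumption rather than a diagnosis: the build log further down this page compiles with -fvisibility=hidden and strips the shared object, so a function that is not explicitly marked as exported would keep its symbol in the static archive but lose it in the .so. A minimal sketch of that mechanism with GCC/Clang attributes follows; DEMO_API, demo_internal and demo_exported are hypothetical names, not Sophia's.

```c
/* Sketch: symbol visibility in a shared library (GCC/Clang).
 * To observe the effect, build as a shared object, e.g.
 *   cc -fPIC -fvisibility=hidden -shared demo.c -o libdemo.so
 *   nm -D --defined-only libdemo.so
 * demo_exported should appear in the dynamic symbol table,
 * demo_internal should not. All names here are hypothetical. */
#define DEMO_API __attribute__((visibility("default")))

/* Hidden under -fvisibility=hidden: callable inside the library,
 * but not exported to programs that link against the .so. */
int demo_internal(int x) { return x + 1; }

/* Explicitly exported: stays visible even with -fvisibility=hidden. */
DEMO_API int demo_exported(int x) { return demo_internal(x) * 2; }
```

If sp_recover is meant to be an internal function, its absence from the shared library would be expected under this assumption.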
I've asked @miyucy about his ruby bindings https://github.com/miyucy/sophia-ruby and they seem to be ready according to this issue miyucy/sophia-ruby#1 (comment)
Would it be possible to put that on sphia.org?
Hi, Dmitry.
Similarly http://sphia.org/examples.html#cursor:
int main(void) {
int rc = 1;
void *env = sp_env();
assert(env);
void *ctl = sp_ctl(env);
assert(ctl);
rc = sp_set(ctl, "sophia.path", "./storage");
assert(0 == rc);
rc = sp_set(ctl, "db", "test");
assert(0 == rc);
rc = sp_set(ctl, "db.test.index.cmp", "u32");
assert(0 == rc); /* 21 */
rc = sp_open(env);
assert(0 == rc);
rc = sp_destroy(env);
assert(0 == rc);
return 0;
}
/* a.out: cmp-test.c:21: main: Assertion `0 == rc' failed. */
But this works (similarly to https://github.com/pmwkaa/sophia/blob/v1.2.2/test/ddl.test.c#L26):
int main(void) {
int rc = 1;
void *env = sp_env();
assert(env);
void *ctl = sp_ctl(env);
assert(ctl);
rc = sp_set(ctl, "sophia.path", "./storage");
assert(0 == rc);
rc = sp_set(ctl, "db", "test");
assert(0 == rc);
rc = sp_set(ctl, "db.test.index.cmp", "u32", NULL);
assert(0 == rc);
rc = sp_open(env);
assert(0 == rc);
rc = sp_destroy(env);
assert(0 == rc);
return 0;
}
Is it a bug or not?
Fix sophia source code or examples, please.
Thanks.
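A possible explanation, assuming sp_set is declared as a variadic function (which the two call shapes above suggest): a variadic callee cannot tell how many arguments were actually passed, so if it reads one more argument than the caller supplied, it picks up an indeterminate value. The sketch below uses a hypothetical set_option and is not Sophia's implementation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variadic setter: every call must pass a value string AND
 * an extra pointer argument (NULL when unused). The callee reads both
 * with va_arg, so omitting the second one is undefined behavior. */
static int set_option(const char *path, ...)
{
    va_list args;
    va_start(args, path);
    const char *value = va_arg(args, const char *);
    void *extra = va_arg(args, void *);   /* read unconditionally */
    va_end(args);

    if (path == NULL || value == NULL)
        return -1;
    /* In this sketch the "u32" comparator rejects any extra argument. */
    if (strcmp(value, "u32") == 0 && extra != NULL)
        return -1;
    return 0;
}
```

Calling set_option("db.test.index.cmp", "u32") without the trailing NULL would still compile, but the callee would read an indeterminate extra argument, which would match the difference between the two programs above.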
Is Sophia iOS/Mac compatible? It would be a great addition to Cocoa development.
6. start step (2) untile there is no updates left
s/untile/until
As seen here: http://sphia.org/architecture.html
Hi guys,
this is a wonderful library :)
Are there any plans to make it work on Windows?
I get a seg fault on my computer during a heavy load when a merge happens. I have a small piece of code that simulates my use-case that causes it to happen often (more than 50% on my machine). It doesn't happen 100% so I suspect it's a threading issue of sorts. I can't make it happen using forced merges either which perhaps further strengthens the case for a threading issue. Maybe I'm using the library incorrectly?
struct Coord
{
int X;
int Y;
int Z;
};
int randnext(int min, int max) {
return rand() % (max - min) + min;
}
static char** GetArrays()
{
char** a = new char*[10]; // array of 10 buffer pointers
a[0] = new char[8]; // an "empty" array - would store the size and such only
*(int*)a[0] = 8; // embed the size as the first int in the array so we don't have to track in a different way
for (int i = 1; i < 10; ++i) {
int size = randnext(100, 5000);
a[i] = new char[size];
*(int*)a[i] = size; // embed size - see above
}
return a;
}
static Coord* GetOffsets()
{
Coord* a = new Coord[10];
a->X = 0;
a->Y = 0;
a->Z = 0;
for (int i = 1; i < 10; ++i) {
a[i].X = rand();
a[i].Y = rand();
a[i].Z = rand();
}
return a;
}
typedef void * env;
typedef void * db;
void error(void * obj, const char * error) {
std::cout << "Exception " << error << ": " << sp_error(obj) << std::endl;
exit(-1);
}
int main() {
const int tcount = 100000;
const int mergeWatermark = 10;
const bool forceMerge = false;
int rc;
srand(1);
char** a = GetArrays();
Coord* o = GetOffsets();
void * env = sp_env();
rc = sp_ctl(env, SPDIR, SPO_CREAT | SPO_RDWR, "data");
if (rc == -1)
error(env, "setting directory");
if (forceMerge) {
rc = sp_ctl(env, SPMERGE, 0);
if (rc == -1)
error(env, "disabling merge");
}
rc = sp_ctl(env, SPMERGEWM, mergeWatermark);
if (rc == -1)
error(env, "setting watermark");
void * db = sp_open(env);
if (!db)
error(env, "opening db"); // db is NULL here, so report the error through env
int areaSize = 0;
Coord coord;
Coord off;
for (int i = 0; i < tcount; ++i) {
if (areaSize <= 0) {
while (areaSize == 0) {
areaSize = randnext(500, 30000);
}
off = o[randnext(0, 10)];
}
--areaSize;
// choose a coord
coord.X = randnext(0, 42) + off.X;
coord.Y = randnext(0, 42) + off.Y;
coord.Z = randnext(0, 42) + off.Z;
void* readData;
size_t readSize;
int rc = sp_get(db, &coord, sizeof(Coord), &readData, &readSize);
if (rc == -1)
error(db, "getting data");
if (!rc) {
char * data = a[randnext(0, 10)];
rc = sp_set(db, &coord, sizeof(Coord), data, *(int*)data);
if (rc != 0)
error(db, "setting data");
}
else {
free(readData);
}
if (forceMerge && i != 0 && i % mergeWatermark == 0) {
rc = sp_ctl(db, SPMERGEFORCE);
if (rc == -1)
error(db, "force merge");
}
}
sp_destroy(db);
sp_destroy(env);
}
make -j8 doesn't work
roman@work:/data/work/tarantool/gh-161$ make -C sophia/db -j8
make: Entering directory `/data/work/tarantool/gh-161/third_party/sophia'
make[1]: Entering directory `/data/work/tarantool/gh-161/third_party/sophia/db'
rm -f file.o crc.o e.o i.o cat.o rep.o util.o sp.o recover.o merge.o gc.o cursor.o libsophia.a
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c file.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c crc.c
rm -f libsophia.so.1.1 libsophia.so.1 libsophia.so
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c e.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c i.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c cat.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c rep.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c util.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c sp.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c recover.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c merge.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c gc.c
cc -I. -pthread -std=c99 -pedantic -Wextra -Wall -fPIC -fvisibility=hidden -O2 -DNDEBUG -c cursor.c
ar crus libsophia.a file.o crc.o e.o i.o cat.o rep.o util.o sp.o recover.o merge.o gc.o cursor.o
ld file.o crc.o e.o i.o cat.o rep.o util.o sp.o recover.o merge.o gc.o cursor.o -shared -soname libsophia.so.1 -o libsophia.so.1.1
ln -s libsophia.so.1.1 libsophia.so.1
ln -s libsophia.so.1.1 libsophia.so
strip --strip-unneeded libsophia.so
make[1]: Leaving directory `/data/work/tarantool/gh-161/third_party/sophia/db'
make[1]: Entering directory `/data/work/tarantool/gh-161/third_party/sophia/test'
rm -f -f common.o common recover.o recover merge.o merge
rm -f -f crash.o crash i.o i concurrent.o concurrent
rm -f -f transaction.o transaction limit.o limit issues.o issues
make[1]: Leaving directory `/data/work/tarantool/gh-161/third_party/sophia/test'
make: Leaving directory `/data/work/tarantool/gh-161/third_party/sophia'
Optimize cases when key is a part of a document (value)
Hi there :-) When doing sp_get, sophia allocates memory by herself and then one needs to take care of freeing it. That's ok, but since the allocator is customizable, it would be nice to have an sp_free_value (or something similar) function which would use the allocator to free the value.
Thoughts?
PS: I might be able to provide some code for this, not sure how much time I'll have this week though.
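To make the proposal concrete, here is a minimal sketch of what such a helper could look like. Everything in it is an assumption for illustration: the realloc-style callback signature, the demo_env type and the demo_get_value / demo_free_value names are hypothetical and not part of Sophia's API.

```c
#include <stdlib.h>

/* Hypothetical allocator callback: behaves like realloc(3) and frees
 * the block when size is 0. Not necessarily Sophia's real signature. */
typedef void *(*alloc_fn)(void *ptr, size_t size, void *arg);

typedef struct {
    alloc_fn fn;     /* custom allocator */
    void    *arg;    /* user argument passed back to the allocator */
    size_t   live;   /* number of blocks currently allocated */
} demo_env;

/* Example custom allocator that counts live blocks. */
static void *counting_alloc(void *ptr, size_t size, void *arg)
{
    demo_env *env = arg;
    if (size == 0) {
        if (ptr != NULL) { free(ptr); env->live--; }
        return NULL;
    }
    if (ptr == NULL)
        env->live++;
    return realloc(ptr, size);
}

/* Stand-in for a get: the value is allocated through the allocator. */
static void *demo_get_value(demo_env *env, size_t size)
{
    return env->fn(NULL, size, env->arg);
}

/* The proposed helper: release a value with the allocator that made it. */
static void demo_free_value(demo_env *env, void *value)
{
    if (value != NULL)
        env->fn(value, 0, env->arg);
}
```

With a helper like this, callers would not need to know whether the value came from malloc(3) or from a custom allocator.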
http://sphia.org/architecture.html says
"primary alghorithmical constraints" instead of "primary algorithmic constraints"
The home page says "primary algorithmical constraints" instead of "primary algorithmic constraints"
The current names are slightly confusing. Is "sphia" the community's name, while "sophia" is only for the main repo? Why is it sphia.org when the <title> reads "sophia - a modern embedded key-value database"?
I think that it would make sense to start stabilizing on one name while we still can. Maybe move to "sphia" entirely. Or keep "sophia" for everything and move to sophia-db.org.
Not that big of a deal, just a little confusing at times. If you do choose to stabilize, my vote is on keeping everything "sophia".
The source code of the function sp_iworldcmp in file i.c is the following:
static inline int
sp_iworldcmp(spi i, char *rkey, int size)
{
register spipage *last = i->i[i->icount-1];
register int l =
i->cmp(i->i[0]->i[0]->key,
i->i[0]->i[0]->size, rkey, size, i->cmparg);
register int r =
i->cmp(last->i[last->count-1]->key,
last->i[last->count-1]->size,
rkey, size, i->cmparg);
/* inside index range */
if (l <= 0 && r >= 0)
return 0;
/* key > index min */
if (l == -1)
return -1;
/* key < index max */
assert(r == 1);
return 1;
}
About this code block:
/* inside index range */
if (l <= 0 && r >= 0)
return 0;
/* key > index min */
if (l == -1)
return -1;
/* key < index max */
assert(r == 1);
return 1;
The first if statement is not satisfied when (l <= 0 && r < 0) or (l > 0 && r >= 0).
Then (l <= 0 && r > 0), i.e. min key <= rkey and max key > rkey, looks like a contradiction?
The case (l > 0 && r >= 0) is similar.
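One way to read the three branches, assuming the comparator returns -1, 0 or 1 and the index minimum does not exceed its maximum, is sketched below. It is a simplified integer model with hypothetical names (cmp_int, world_cmp), not the code from i.c.

```c
#include <assert.h>

/* Three-way comparison returning -1, 0 or 1. */
static int cmp_int(int a, int b) { return (a > b) - (a < b); }

/* Simplified model of the range check over integers [min, max],
 * assuming min <= max. Returns 0 if key is inside the range,
 * -1 if key lies above max, and 1 if key lies below min. */
static int world_cmp(int min, int max, int key)
{
    int l = cmp_int(min, key);  /* l <= 0  means  min <= key */
    int r = cmp_int(max, key);  /* r >= 0  means  max >= key */

    if (l <= 0 && r >= 0)
        return 0;               /* min <= key <= max */
    if (l == -1)
        return -1;              /* min < key and r < 0, so max < key */
    assert(r == 1);             /* l == 1: key < min <= max, so r == 1 */
    return 1;
}
```

Under these assumptions the second branch is only reached when the first test failed because r < 0, so l == -1 there means the key is above the whole index, and the final assert holds because a key below min is also below max.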
The following short program deadlocks: it opens a cursor and then tries to read a value without going through the cursor.
Just out of curiosity, I ran the test suite with make VALGRIND='valgrind --leak-check=full' test. Everything looks good excluding the issues tests:
valgrind --leak-check=full ./issues
==16091== Memcheck, a memory error detector
==16091== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==16091== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==16091== Command: ./issues
==16091==
gh_5: ok
gh_29: ok
gh_37: ok
==16091==
==16091== HEAP SUMMARY:
==16091== in use at exit: 90 bytes in 3 blocks
==16091== total heap usage: 100,981 allocs, 100,978 frees, 103,079,510 bytes allocated
==16091==
==16091== 15 bytes in 1 blocks are definitely lost in loss record 1 of 3
==16091== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16091== by 0x40D953: sp_vnewh (in /home/stephen/sophia/test/issues)
==16091== by 0x40DA20: sp_vdupref (in /home/stephen/sophia/test/issues)
==16091== by 0x407A32: sp_merge (in /home/stephen/sophia/test/issues)
==16091== by 0x402617: merger (in /home/stephen/sophia/test/issues)
==16091== by 0x4E39E99: start_thread (pthread_create.c:308)
==16091==
==16091== 15 bytes in 1 blocks are definitely lost in loss record 2 of 3
==16091== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16091== by 0x40D9C4: sp_vdup (in /home/stephen/sophia/test/issues)
==16091== by 0x40DA38: sp_vdupref (in /home/stephen/sophia/test/issues)
==16091== by 0x407A5F: sp_merge (in /home/stephen/sophia/test/issues)
==16091== by 0x402617: merger (in /home/stephen/sophia/test/issues)
==16091== by 0x4E39E99: start_thread (pthread_create.c:308)
==16091==
==16091== 60 bytes in 1 blocks are definitely lost in loss record 3 of 3
==16091== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16091== by 0x40D664: sp_pagenew (in /home/stephen/sophia/test/issues)
==16091== by 0x4079F5: sp_merge (in /home/stephen/sophia/test/issues)
==16091== by 0x402617: merger (in /home/stephen/sophia/test/issues)
==16091== by 0x4E39E99: start_thread (pthread_create.c:308)
==16091==
==16091== LEAK SUMMARY:
==16091== definitely lost: 90 bytes in 3 blocks
==16091== indirectly lost: 0 bytes in 0 blocks
==16091== possibly lost: 0 bytes in 0 blocks
==16091== still reachable: 0 bytes in 0 blocks
==16091== suppressed: 0 bytes in 0 blocks
==16091==
==16091== For counts of detected and suppressed errors, rerun with: -v
==16091== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 2 from 2)
Super excited for the new release! I've started playing with it, but can't get the "set example" to work.
I've tried on both OSX and Ubuntu.
Makefile
SOPHIA_DIR = deps/sophia/sophia
SOPHIA_SRC = $(wildcard $(SOPHIA_DIR)/*/*.c $(SOPHIA_DIR)/*/*.h)
SOPHIA_CFLAGS = -std=c99 -pedantic -Wall -Wextra -pthread
CFLAGS = -std=c99 -Wextra -Wall -pthread
test: sophia.o test.o
$(CC) $(CFLAGS) $^ -o $@
test.o: test.c sophia.h
$(CC) $(CFLAGS) -c $< -o $@
sophia.o: sophia.c
$(CC) $(SOPHIA_CFLAGS) -c $< -o $@
sophia.h: deps/sophia/sophia/sophia/sophia.h
cp $< $@
sophia.c: $(SOPHIA_SRC)
$(SOPHIA_DIR)/build $(SOPHIA_DIR) $@
clean:
rm -f test.o test
.PHONY: clean
test.c
#include <stdlib.h>
#include <stdio.h>
#include "sophia.h"
int main(){
/* create sophia environment */
void *env = sp_env();
void *ctl = sp_ctl(env);
sp_set(ctl, "sophia.path", "./storage");
/* create database */
sp_set(ctl, "db", "test");
void *db = sp_get(ctl, "db.test");
int rc = sp_open(env);
if (rc == -1) {
fprintf(stderr, "error!\n");
return 1;
}
/* insert */
char key[] = "hello";
char value[] = "world";
void *o = sp_object(env, db);
sp_set(o, "key", key, sizeof(key));
sp_set(o, "value", value, sizeof(value));
rc = sp_set(db, o);
if (rc == -1) {
fprintf(stderr, "error!\n");
return 1;
}
/* replace */
sp_set(o, "key", key, sizeof(key));
sp_set(o, "value", value, sizeof(value));
rc = sp_set(db, o);
if (rc == -1) {
fprintf(stderr, "error!\n");
return 1;
}
/* finish */
rc = sp_destroy(env);
if (rc == -1) {
fprintf(stderr, "error!\n");
return 1;
}
return 0;
}
output
$ make
deps/sophia/sophia/build deps/sophia/sophia sophia.c
fatal: Needed a single revision
cc -std=c99 -pedantic -Wall -Wextra -pthread -c sophia.c -o sophia.o
[ . . . ]
36 warnings generated.
cp deps/sophia/sophia/sophia/sophia.h sophia.h
cc -std=c99 -Wextra -Wall -pthread -c test.c -o test.o
cc -std=c99 -Wextra -Wall -pthread sophia.o test.o -o test
$ ./test
Segmentation fault: 11
I have seen that no loop is used around a call of the function "pthread_cond_wait".
Would you like to reuse anything from my message on the topic "spurious wakeup"?
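For reference, the usual pattern is to guard the wait with a predicate and re-check it in a loop, since pthread_cond_wait may return without a matching signal. A minimal sketch follows; demo_task and its functions are hypothetical names and are not taken from task.h.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical task slot; names are illustrative only. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;   /* the predicate the waiter cares about */
} demo_task;

/* Block until the predicate is true. The while loop re-checks it after
 * every wakeup, so a spurious wakeup simply goes back to waiting. */
static void demo_task_wait(demo_task *t)
{
    pthread_mutex_lock(&t->lock);
    while (!t->ready)
        pthread_cond_wait(&t->cond, &t->lock);
    t->ready = false;        /* consume the wakeup */
    pthread_mutex_unlock(&t->lock);
}

/* Set the predicate under the lock, then signal the waiter. */
static void demo_task_wakeup(demo_task *t)
{
    pthread_mutex_lock(&t->lock);
    t->ready = true;
    pthread_cond_signal(&t->cond);
    pthread_mutex_unlock(&t->lock);
}
```

Because the predicate is stored in the structure, a wakeup that arrives before the waiter starts waiting is not lost either.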
Same software as #53
violino:/home/software/leveldb> ./db_bench_sophia --benchmarks=fillseqbatch
Sophia: version 1.1
Date: Wed Jun 25 08:36:57 2014
CPU: 4 * Intel(R) Core(TM)2 Extreme CPU Q9300 @ 2.53GHz
CPUCache: 6144 KB
Keys: 16 bytes each
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
WARNING: Optimization is disabled: benchmarks unnecessarily slow
fillseqbatch : 2.285 micros/op 437600 ops/sec; 48.4 MB/s
130992 /tmp/test1/dbbench_sph-1
130992 /tmp/test1
violino:/home/software/leveldb> gdb --args ./db_bench_sophia --benchmarks=readwhilewriting --use_existing_db=1 --threads=4
GNU gdb (GDB) 7.5.91.20130417-cvs-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/software/leveldb/db_bench_sophia...done.
(gdb) run
Starting program: /home/software/leveldb/db_bench_sophia --benchmarks=readwhilewriting --use_existing_db=1 --threads=4
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Sophia: version 1.1
Date: Wed Jun 25 08:37:15 2014
CPU: 4 * Intel(R) Core(TM)2 Extreme CPU Q9300 @ 2.53GHz
CPUCache: 6144 KB
Keys: 16 bytes each
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
WARNING: Optimization is disabled: benchmarks unnecessarily slow
[New Thread 0x7fffefb74700 (LWP 16237)]
[New Thread 0x7fffef373700 (LWP 16238)]
[New Thread 0x7fffeeb72700 (LWP 16239)]
[New Thread 0x7fffee371700 (LWP 16240)]
[New Thread 0x7fffedb70700 (LWP 16241)]
[New Thread 0x7fffed36f700 (LWP 16242)]
... finished 100 ops
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffee371700 (LWP 16240)]
0x00000000004195f0 in sp_iminof (i=0x629200, p=0x7fffe0000980,
rkey=0x7fffee370d40 "0000000000632053", size=16, idx=0x7fffee370b70) at i.c:103
103 p->i[mid]->size, rkey, size, i->cmparg);
(gdb) info thr
Id Target Id Frame
7 Thread 0x7fffed36f700 (LWP 16242) "db_bench_sophia" 0x00007ffff70978dd in nanosleep ()
at ../sysdeps/unix/syscall-template.S:81
6 Thread 0x7fffedb70700 (LWP 16241) "db_bench_sophia" 0x000000000041ab0d in cmpkey (
c=0x629248, p=0x1421070, rkey=0x7fffedb6fd40, size=16) at cat.c:116
Thread 7 (Thread 0x7fffed36f700 (LWP 16242)):
#0 0x00007ffff70978dd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff70c9524 in usleep (useconds=)
at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x00000000004157c9 in sp_lock (l=0x62d346 "") at ./lock.h:41
#3 0x00000000004174d5 in sp_match (s=0x629130, k=0x7fffed36ed40, ksize=16, v=0x7fffed36ece8,
vsize=0x7fffed36ecf0) at cursor.c:497
#4 0x0000000000410091 in sp_get (o=0x629130, k=0x7fffed36ed40, ksize=16, v=0x7fffed36ece8,
vsize=0x7fffed36ecf0) at sp.c:720
#5 0x0000000000404eba in leveldb::Benchmark::ReadRandom (this=0x7fffffffe4a0, thread=0x142ec80)
at doc/bench/db_bench_sophia.cc:835
#6 0x0000000000404fd5 in leveldb::Benchmark::ReadWhileWriting (this=0x7fffffffe4a0,
thread=0x142ec80) at doc/bench/db_bench_sophia.cc:849
#7 0x00000000004078bc in leveldb::Benchmark::ThreadBody (v=0x635850)
at doc/bench/db_bench_sophia.cc:657
#8 0x0000000000407eca in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#9 0x00007ffff73a6f8e in start_thread (arg=0x7fffed36f700) at pthread_create.c:311
#10 0x00007ffff70d0a0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 6 (Thread 0x7fffedb70700 (LWP 16241)):
#0 0x000000000041ab0d in cmpkey (c=0x629248, p=0x1421070, rkey=0x7fffedb6fd40, size=16)
at cat.c:116
#1 0x000000000041abda in sp_catfind (c=0x629248, rkey=0x7fffedb6fd40 "0000000000467841", size=16,
index=0x7fffedb6fc24) at cat.c:135
#2 0x00000000004175dc in sp_match (s=0x629130, k=0x7fffedb6fd40, ksize=16, v=0x7fffedb6fce8,
vsize=0x7fffedb6fcf0) at cursor.c:517
#3 0x0000000000410091 in sp_get (o=0x629130, k=0x7fffedb6fd40, ksize=16, v=0x7fffedb6fce8,
vsize=0x7fffedb6fcf0) at sp.c:720
#4 0x0000000000404eba in leveldb::Benchmark::ReadRandom (this=0x7fffffffe4a0, thread=0x636c60)
at doc/bench/db_bench_sophia.cc:835
#5 0x0000000000404fd5 in leveldb::Benchmark::ReadWhileWriting (this=0x7fffffffe4a0,
thread=0x636c60) at doc/bench/db_bench_sophia.cc:849
#6 0x00000000004078bc in leveldb::Benchmark::ThreadBody (v=0x635828)
at doc/bench/db_bench_sophia.cc:657
#7 0x0000000000407eca in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#8 0x00007ffff73a6f8e in start_thread (arg=0x7fffedb70700) at pthread_create.c:311
#9 0x00007ffff70d0a0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 5 (Thread 0x7fffee371700 (LWP 16240)):
#0 0x00000000004195f0 in sp_iminof (i=0x629200, p=0x7fffe0000980,
rkey=0x7fffee370d40 "0000000000632053", size=16, idx=0x7fffee370b70) at i.c:103
#1 0x000000000041a028 in sp_igetraw (i=0x629200, rkey=0x7fffee370d40 "0000000000632053", size=16)
at i.c:291
#2 0x0000000000417368 in sp_matchi (s=0x629130, i=0x629200, key=0x7fffee370d40, size=16,
v=0x7fffee370ce8, vsize=0x7fffee370cf0) at cursor.c:467
#3 0x00000000004174ac in sp_match (s=0x629130, k=0x7fffee370d40, ksize=16, v=0x7fffee370ce8,
vsize=0x7fffee370cf0) at cursor.c:492
#4 0x0000000000410091 in sp_get (o=0x629130, k=0x7fffee370d40, ksize=16, v=0x7fffee370ce8,
vsize=0x7fffee370cf0) at sp.c:720
#5 0x0000000000404eba in leveldb::Benchmark::ReadRandom (this=0x7fffffffe4a0, thread=0x6365c0)
at doc/bench/db_bench_sophia.cc:835
#6 0x0000000000404fd5 in leveldb::Benchmark::ReadWhileWriting (this=0x7fffffffe4a0,
thread=0x6365c0) at doc/bench/db_bench_sophia.cc:849
#7 0x00000000004078bc in leveldb::Benchmark::ThreadBody (v=0x635800)
at doc/bench/db_bench_sophia.cc:657
#8 0x0000000000407eca in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#9 0x00007ffff73a6f8e in start_thread (arg=0x7fffee371700) at pthread_create.c:311
#10 0x00007ffff70d0a0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 4 (Thread 0x7fffeeb72700 (LWP 16239)):
#0 0x00007ffff70978dd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff70c9524 in usleep (useconds=)
at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x00000000004157c9 in sp_lock (l=0x62d346 "") at ./lock.h:41
#3 0x00000000004174d5 in sp_match (s=0x629130, k=0x7fffeeb71d40, ksize=16, v=0x7fffeeb71ce8,
vsize=0x7fffeeb71cf0) at cursor.c:497
#4 0x0000000000410091 in sp_get (o=0x629130, k=0x7fffeeb71d40, ksize=16, v=0x7fffeeb71ce8,
vsize=0x7fffeeb71cf0) at sp.c:720
#5 0x0000000000404eba in leveldb::Benchmark::ReadRandom (this=0x7fffffffe4a0, thread=0x635f20)
at doc/bench/db_bench_sophia.cc:835
#6 0x0000000000404fd5 in leveldb::Benchmark::ReadWhileWriting (this=0x7fffffffe4a0,
thread=0x635f20) at doc/bench/db_bench_sophia.cc:849
#7 0x00000000004078bc in leveldb::Benchmark::ThreadBody (v=0x6357d8)
at doc/bench/db_bench_sophia.cc:657
#8 0x0000000000407eca in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#9 0x00007ffff73a6f8e in start_thread (arg=0x7fffeeb72700) at pthread_create.c:311
#10 0x00007ffff70d0a0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 3 (Thread 0x7fffef373700 (LWP 16238)):
#0 0x00007ffff70978dd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff70c9524 in usleep (useconds=)
at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x000000000040c9f3 in sp_lock (l=0x62d346 "") at ./lock.h:41
#3 0x000000000040f341 in sp_commit (o=0x629130) at sp.c:489
#4 0x00000000004053b5 in leveldb::Benchmark::BGWriter (this=0x7fffffffe4a0, thread=0x635880)
at doc/bench/db_bench_sophia.cc:903
#5 0x0000000000404fea in leveldb::Benchmark::ReadWhileWriting (this=0x7fffffffe4a0,
thread=0x635880) at doc/bench/db_bench_sophia.cc:851
#6 0x00000000004078bc in leveldb::Benchmark::ThreadBody (v=0x6357b0)
at doc/bench/db_bench_sophia.cc:657
#7 0x0000000000407eca in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#8 0x00007ffff73a6f8e in start_thread (arg=0x7fffef373700) at pthread_create.c:311
#9 0x00007ffff70d0a0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 2 (Thread 0x7fffefb74700 (LWP 16237)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000040d599 in sp_taskwait (t=0x62d280) at ./task.h:58
#2 0x000000000040e8f6 in merger (arg=0x62d280) at sp.c:287
#3 0x00007ffff73a6f8e in start_thread (arg=0x7fffefb74700) at pthread_create.c:311
#4 0x00007ffff70d0a0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 1 (Thread 0x7ffff7fc6740 (LWP 16231)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000040bf0d in leveldb::port::CondVar::Wait() ()
#2 0x0000000000404648 in leveldb::Benchmark::RunBenchmark (this=0x7fffffffe4a0, n=5, name=...,
method=
(void (leveldb::Benchmark::*)(leveldb::Benchmark * const, leveldb::(anonymous namespace)::ThreadState *)) 0x404fa8 <leveldb::Benchmark::ReadWhileWriting(leveldb::(anonymous namespace)::ThreadState*)>) at doc/bench/db_bench_sophia.cc:695
#3 0x0000000000407653 in leveldb::Benchmark::Run (this=0x7fffffffe4a0)
at doc/bench/db_bench_sophia.cc:621
#4 0x0000000000405b15 in main (argc=4, argv=0x7fffffffe5c8) at doc/bench/db_bench_sophia.cc:982
(gdb)
I'm curious when the sophia development branch will be merged into master. It is looking great !
I was browsing through the headers and wondering: why hide core types like spenv instead of exposing them in sophia.h?
Would love to add this as another back-end to LevelUP where a ton of custom-database work is going on in Node.js-land but the only thing you're missing that we really need for our primitives is an atomic batch operation. For both set & delete in a single batch, all-fail or all-succeed.
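To illustrate the requested semantics only, here is a toy sketch of an all-or-nothing batch over an in-memory store. All types and names (toy_store, batch_op, toy_batch_apply) are hypothetical; it says nothing about how Sophia would implement this, and it ignores durability and concurrency.

```c
#include <stdbool.h>
#include <stddef.h>

#define STORE_SLOTS 8

/* Toy in-memory store: slot i holds a value when used[i] is true. */
typedef struct {
    bool used[STORE_SLOTS];
    int  value[STORE_SLOTS];
} toy_store;

typedef enum { OP_SET, OP_DELETE } op_kind;

typedef struct {
    op_kind kind;
    size_t  key;    /* slot index */
    int     value;  /* ignored for OP_DELETE */
} batch_op;

/* Apply every operation to a scratch copy and publish the copy only if
 * all operations succeeded. Returns 0 on success, or -1 if the batch was
 * rejected, in which case the store is left untouched. */
static int toy_batch_apply(toy_store *s, const batch_op *ops, size_t n)
{
    toy_store scratch = *s;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].key >= STORE_SLOTS)
            return -1;                       /* invalid key: all-fail */
        if (ops[i].kind == OP_SET) {
            scratch.used[ops[i].key] = true;
            scratch.value[ops[i].key] = ops[i].value;
        } else {
            scratch.used[ops[i].key] = false;
        }
    }
    *s = scratch;                            /* all-succeed */
    return 0;
}
```

The point of the sketch is the contract: a batch mixing sets and deletes either changes the store completely or not at all.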
I have looked at a few source files for your current software. I have noticed that some checks for return codes are missing.
Would you like to add more error handling for return values from functions like the following?
(2nd paragraph):
It is written entirely in C and can be used in C/C++ applications or by any interpreted language if the appropriate library bindings are available.
(3rd paragraph):
Sophia supports the traditional set/get/delete operations semantic plus range queries by using cursors. Database API has been designed to be as clean as possible and easy to use.
(4th paragraph):
All sophia API declarations are stored in a separate include file: sophia.h
(1st paragraph):
All functions return either 0 on success, or -1 on error. The only exception is functions that return a pointer. In that case NULL indicates an error.
(3rd paragraph):
Any error description can be accessed through sp_error(3) function.
Exception is sp_env(3) function, error on which indicates out-of-memory condition (I don't get it, make it clearer?).
(4th paragraph):
All created objects must be freed by sp_destroy(3) function. The only exception is the sp_get(3) function, which allocates the returned value via malloc(3) (and should be freed using free(3) accordingly) or using the allocator-free function specified in sp_ctl(3).
(1st paragraph):
To open or create a database the sp_open(3) function must be used with a configuration object (environment) previously created using sp_env(3). The sp_env(3) function must be initialized using sp_ctl(3). Before opening, a database directory must be specified by SPDIR.
(2nd paragraph):
To create a database directory the SPO_CREAT flag must be passed to the sp_ctl(SPDIR) function.
(3rd paragraph):
The sp_open(3) function allocates a database handle and does the main job of creating a new or recovering an existing database.
(1st paragraph):
There are a number of operations available to write or read data. All functions accept first arguments as key and key size.
(4th paragraph):
sp_get(3) allocates a value, which should be freed afterwards using the free(3) function (or by allocator specific function passed to sp_ctl(SPALLOC)).
(5th paragraph):
Please note that (no comma) it is much faster to use range queries for sequential key reads.
(1st paragraph):
It is possible to do range queries using cursors.
(2nd paragraph):
To create a cursor the sp_cursor(3) function should be used. Cursor creation function allows to specify a query order and a start key.
(4th paragraph):
If no key is specified (no comma) whole key range will be traversed.
(7th paragraph):
Cursor objects should be freed using the sp_destroy(3) function after a usage.
(2nd paragraph):
All times for common operations, such as set (write), get (read) and whole database traversal have been measured for Sophia and LevelDB 1.13.0 using default configurations to make it as fair as possible. LevelDB data compression has been turned off. Tests are separate by key order which is sequential and random.
I see man pages on sphia.org but they're not with the source?
$ uname -a
Linux uklt 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:12 UTC 2014 i686 i686 i686 GNU/Linux
$ make
[ . . . ]
rt/sr_ctl.o: In function `sr_ctldump':
/home/stephen/sophia/rt/sr_ctl.c:146: undefined reference to `__stack_chk_fail_local'
mvcc/sm.o: In function `sm_prepare':
/home/stephen/sophia/mvcc/sm.c:132: undefined reference to `__stack_chk_fail_local'
mvcc/sm.o: In function `sm_commit':
/home/stephen/sophia/mvcc/sm.c:161: undefined reference to `__stack_chk_fail_local'
mvcc/sm.o: In function `sm_rollback':
/home/stephen/sophia/mvcc/sm.c:187: undefined reference to `__stack_chk_fail_local'
mvcc/sm_deadlock.o: In function `sm_deadlock_in':
/home/stephen/sophia/mvcc/sm_deadlock.c:41: undefined reference to `__stack_chk_fail_local'
mvcc/sm_deadlock.o:/home/stephen/sophia/mvcc/sm_deadlock.c:77: more undefined references to `__stack_chk_fail_local' follow
make: *** [dynamic_build] Error 1
full output available here
sp_delete returns 0 when given a non-existing key:
$ rm -fr ./db
$ cc -pthread -lsophia delete.c -o test
$ ./test
Assertion failed: (-1 == sp_delete(db, &key, sizeof(key))), function main, file delete.c, line 18.
Abort trap: 6
delete.c
#include <sophia/sp.h>
#include <assert.h>
int
main(void) {
void *env = NULL;
void *db = NULL;
env = sp_env();
if (NULL == env) goto fail;
if (-1 == sp_ctl(env, SPDIR, SPO_CREAT|SPO_RDWR, "./db")) goto fail;
if (NULL == (db = sp_open(env))) goto fail;
int key = 1234;
assert(-1 == sp_delete(db, &key, sizeof(key)));
sp_destroy(db); /* close the database before its environment */
sp_destroy(env);
return 0;
fail:
if (db) sp_destroy(db);
if (env) sp_destroy(env);
return 1;
}
It currently says:
https://github.com/jwerle/sphia
and it should now be:
https://github.com/sphia/sphia
thanks ! =)
Please feel free to add your list of features that you think are missing from the current release.
Let me start:
For the next release I was planning to add mostly optimization fixes (the database can be even faster than it is right now):
Using rev 493d4ba
Happens pretty consistently. Test program source is here https://github.com/hyc/leveldb/tree/benches/doc/bench
violino:/home/software/leveldb> gdb db_bench_sophia
GNU gdb (GDB) 7.5.91.20130417-cvs-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/software/leveldb/db_bench_sophia...done.
(gdb) run
Starting program: /home/software/leveldb/db_bench_sophia
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Sophia: version 1.1
Date: Wed Jun 25 08:27:46 2014
CPU: 4 * Intel(R) Core(TM)2 Extreme CPU Q9300 @ 2.53GHz
CPUCache: 6144 KB
Keys: 16 bytes each
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
WARNING: Optimization is disabled: benchmarks unnecessarily slow
[New Thread 0x7ffff6fd5700 (LWP 16023)]
[New Thread 0x7ffff67d4700 (LWP 16024)]
fillrandsync : 12.553 micros/op 79662 ops/sec; 8.8 MB/s (1000 ops)
[Thread 0x7ffff67d4700 (LWP 16024) exited]
132 /tmp/test1/dbbench_sph-1
132 /tmp/test1
[Thread 0x7ffff6fd5700 (LWP 16023) exited]
[New Thread 0x7ffff6fd5700 (LWP 16028)]
[New Thread 0x7ffff5fd3700 (LWP 16029)]
fillrandom : 5.952 micros/op 167997 ops/sec; 18.6 MB/s
[Thread 0x7ffff5fd3700 (LWP 16029) exited]
150144 /tmp/test1/dbbench_sph-2
150144 /tmp/test1
[Thread 0x7ffff6fd5700 (LWP 16028) exited]
[New Thread 0x7ffff6fd5700 (LWP 16033)]
[New Thread 0x7ffff57d2700 (LWP 16034)]
fillrandbatch : 3.222 micros/op 310360 ops/sec; 34.3 MB/s
[Thread 0x7ffff57d2700 (LWP 16034) exited]
155956 /tmp/test1/dbbench_sph-3
155956 /tmp/test1
[Thread 0x7ffff6fd5700 (LWP 16033) exited]
[New Thread 0x7ffff6fd5700 (LWP 16039)]
[New Thread 0x7ffff4fd1700 (LWP 16040)]
fillseqsync : 12.770 micros/op 78308 ops/sec; 8.7 MB/s (1000 ops)
[Thread 0x7ffff4fd1700 (LWP 16040) exited]
132 /tmp/test1/dbbench_sph-4
132 /tmp/test1
[Thread 0x7ffff6fd5700 (LWP 16039) exited]
[New Thread 0x7ffff6fd5700 (LWP 16044)]
[New Thread 0x7fffeffff700 (LWP 16045)]
fillseq : 4.804 micros/op 208164 ops/sec; 23.0 MB/s
[Thread 0x7fffeffff700 (LWP 16045) exited]
130856 /tmp/test1/dbbench_sph-5
130856 /tmp/test1
[Thread 0x7ffff6fd5700 (LWP 16044) exited]
[New Thread 0x7ffff6fd5700 (LWP 16050)]
[New Thread 0x7fffef7fe700 (LWP 16051)]
fillseqbatch : 2.276 micros/op 439402 ops/sec; 48.6 MB/s
[Thread 0x7fffef7fe700 (LWP 16051) exited]
130992 /tmp/test1/dbbench_sph-6
130992 /tmp/test1
[New Thread 0x7fffec963700 (LWP 16053)]
overwrite : 6.335 micros/op 157848 ops/sec; 17.5 MB/s
[Thread 0x7fffec963700 (LWP 16053) exited]
239348 /tmp/test1/dbbench_sph-6
239348 /tmp/test1
[New Thread 0x7fffeeffd700 (LWP 16055)]
readrandom : 3.568 micros/op 280256 ops/sec; (1000000 of 1000000 found)
[Thread 0x7fffeeffd700 (LWP 16055) exited]
[New Thread 0x7fffee7fc700 (LWP 16056)]
db_bench_sophia: cursor.c:283: sp_next: Assertion `c->r.v.vh == c->pv' failed.
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffee7fc700 (LWP 16056)]
0x00007ffff700d037 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff700d037 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff7010698 in __GI_abort () at abort.c:90
#2 0x00007ffff7005e03 in __assert_fail_base (
fmt=0x7ffff715d158 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x41e293 "c->r.v.vh == c->pv", file=file@entry=0x41e242 "cursor.c",
line=line@entry=283, function=function@entry=0x41e2bf <__PRETTY_FUNCTION__.5459> "sp_next")
at assert.c:92
#3 0x00007ffff7005eb2 in __GI___assert_fail (assertion=0x41e293 "c->r.v.vh == c->pv",
file=0x41e242 "cursor.c", line=283, function=0x41e2bf <__PRETTY_FUNCTION__.5459> "sp_next")
at assert.c:101
#4 0x0000000000416b57 in sp_next (c=0x7fffe8652c00) at cursor.c:283
#5 0x00000000004172fa in sp_iterate (c=0x7fffe8652c00) at cursor.c:453
#6 0x0000000000410234 in sp_fetch (o=0x7fffe8652c00) at sp.c:748
#7 0x0000000000404d99 in leveldb::Benchmark::ReadSequential (this=0x7fffffffe500, thread=0x62d510)
at doc/bench/db_bench_sophia.cc:809
#8 0x00000000004078bc in leveldb::Benchmark::ThreadBody (v=0x628e50)
at doc/bench/db_bench_sophia.cc:657
#9 0x0000000000407eca in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#10 0x00007ffff73a6f8e in start_thread (arg=0x7fffee7fc700) at pthread_create.c:311
#11 0x00007ffff70d0a0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) frame 7
#7 0x0000000000404d99 in leveldb::Benchmark::ReadSequential (this=0x7fffffffe500, thread=0x62d510)
at doc/bench/db_bench_sophia.cc:809
809 while (sp_fetch(cursor) == 1) {
(gdb) p bytes
$1 = 928
(gdb) p *thread
$2 = {tid = 0, rand = {seed_ = 1000}, stats = {id_ = 0, start_ = 1403710092992610,
finish_ = 1403710092992610, seconds_ = 0, done_ = 8, last_report_done_ = 0,
next_report_ = 100, bytes_ = 0, last_op_finish_ = 1403710092990744,
last_report_finish_ = 1403710092992610, hist_ = {min_ = 9.9999999999999997e+199, max_ = 0,
num_ = 0, sum_ = 0, sum_squares_ = 0, static kBucketLimit = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200,
250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1200, 1400, 1600, 1800, 2000,
2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10000, 12000, 14000, 16000,
18000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 60000, 70000, 80000, 90000,
100000, 120000, 140000, 160000, 180000, 200000, 250000, 300000, 350000, 400000, 450000,
500000, 600000, 700000, 800000, 900000, 1000000, 1200000, 1400000, 1600000, 1800000,
2000000, 2500000, 3000000, 3500000, 4000000, 4500000, 5000000, 6000000, 7000000, 8000000,
9000000, 10000000, 12000000, 14000000, 16000000, 18000000, 20000000, 25000000, 30000000,
35000000, 40000000, 45000000, 50000000, 60000000, 70000000, 80000000, 90000000, 100000000,
120000000, 140000000, 160000000, 180000000, 200000000, 250000000, 300000000, 350000000,
400000000, 450000000, 500000000, 600000000, 700000000, 800000000, 900000000, 1000000000,
1200000000, 1400000000, 1600000000, 1800000000, 2000000000, 2500000000, 3000000000,
3500000000, 4000000000, 4500000000, 5000000000, 6000000000, 7000000000, 8000000000,
9000000000, 9.9999999999999997e+199}, buckets_ = {0 <repeats 154 times>}}, message_ = {
static npos = ,
_M_dataplus = {std::allocator = {<__gnu_cxx::new_allocator> = {}, }, _M_p = 0x627558 <ZNSs4_Rep20_S_empty_rep_storageE@@GLIBCXX_3.4+24> ""}},
exclude_from_merge = false}, shared = 0x7fffffffe0d0}
(gdb)
It seems the build recently started to fail with ld: unknown option: -shared.
OS:
GCC:
Removing -shared results in ld: unknown option: -soname.
Removing -soname * results in ld: warning: -macosx_version_min not specified, assuming 10.8, followed by: Undefined symbols for architecture x86_64: …
What is the roadmap for sophia?
Hello,
Sophia seems to be highly performant (:
I have always dreamed about a smart and simple embedded database that could be replicated among multiple servers very easily, with just one parameter on the command line.
Let's say we have a router, the master, that all applications hit. This master "routes" all queries depending on who has the data (something like shard keys in MongoDB: http://docs.mongodb.org/manual/core/sharding-shard-key/). Something very simple, with a strict convention and without configuration.
This master knows where the other databases are (through automatic synchronization with something like https://github.com/strongloop/sc-discovery, or we pass it a list of IP addresses).
That would be just awesome, but as I said, it's a dream ;)
The formal question is: is there a way to replicate a Sophia db?
Thanks
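A minimal sketch of the routing idea, assuming the router simply hashes the key and picks one of N servers. The fnv1a and route names are hypothetical and nothing here is part of Sophia; FNV-1a is used only because it is a small, well-known hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of an arbitrary byte string. */
uint32_t fnv1a(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical router rule: the same key always maps to the same server
 * index in the range [0, nservers). */
size_t route(const void *key, size_t len, size_t nservers)
{
    assert(nservers > 0);
    return fnv1a(key, len) % nservers;
}
```

Plain modulo routing remaps most keys whenever the number of servers changes, so a real deployment would more likely use something like consistent hashing.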
I made an attempt to fix some of the grammar/spelling issues in your architecture documentation. You should get a native speaker to proof-read your stuff before releasing it. Otherwise you're just shooting yourself (and your own work) in the foot.
PREFACE
Sophia (adj. sophisticated) database and its architecture was born as a result of research and rethinking primary algorithmic constraints associated with a popular Log-file based data structure, such as LSM-tree, its variations based on Fractional Cascading ideas and a B-Tree. (this sentence is ungrammatical and hard to make sense of)
Those limitations are slow reads, meta-data bloat and unexpected latency hops.
Most log-based databases (or LSM-alike specifically) tend to organize their file storage as a collection of sorted files which are periodically merged. Thus, without using some sort of key filtering scheme (like bloom-filter), to find a key, it has to traverse all files which can take up to O(files_count * log(file_size)) in the worst case.
Sophia was specifically designed to improve the situation and get fast reads while still benefiting from an append-only design.
DESIGN
Sophia's architecture combines a region in-memory index with an in-memory key index.
A region index is represented as an ordered range of regions with their min and max keys and a latest on-disk reference. Regions never overlap.
These regions have the same semantical meaning as the B-Tree pages, but are designed differently. They do not have a tree structure or any internal page-to-page relationships and thus no meta-data overhead (specifically to append-only B-Tree).
A single region on-disk holds keys with values. And as a B-tree page, region has its maximum key count. Regions are uniquely identified by region id number, by which they can be tracked in the future.
A key index is very similar to LSM zero-level (memtable), but has a different key lifecycle. All modifications first get into there (needs rephrasing) and hold until they will be explicitly removed by merger.
The database update lifecycle is organized in terms of epochs. Epoch lifetime is determined in terms of key updates. When the update counter reaches an/the epoch's watermark number then the Rotation event happens.
Each epoch, depending on its state, is associated with a single log file or database file. Before getting added to the in-memory index, modifications are first written to the epoch's write-ahead log.
On each rotation event:
a. current epoch, which is called 'live', is marked as 'transfer' and a new 'live' epoch is created (new log file)
b. create new and swap current in-memory key index
c. merger thread is being woken up
The merger thread is the core part that is responsible for region merging and garbage collecting of old regions and older epochs. On wakeup, the merger thread iterates through list of epochs marked as 'transfer' and starts the merge procedure.
The merge procedure has the following steps:
1. create new database file for the latest 'transfer' epoch
2. fetch any keys from the in-memory index that are associated with a single destination region
3. for each fetched key and origin region start the merge and write a new region to the database file
4. on each completed region (current merged key count is less or equal to max region key count):
a. allocate new split region for region index, set min and max
b. first region always has id of origin destination region
c. link region and schedule for future commit
5. on origin region update complete:
a. update destination region index file reference to the current epoch and insert split regions
b. remove keys from key index
6. start step (2) as long as (sure you didn't mean until?) there are no updates left
7. start garbage collector
8. database synced with disk and if everything went well, remove all 'transfer' epochs (log files) and gc'ed databases
9. free index
The garbage collector has a simple design.
All that is needed is to track an/the epoch's total region count and a count of transferred regions during merge procedure. Thus, if some older epoch database has fewer than 70% (or any other changeable factor) live regions they just get copied to current epoch database file and the old one is being deleted.
On database recovery Sophia tracks and builds an index of pages from the youngest epochs (biggest numbers) down to the oldest. Log files are being replayed and epochs are marked as 'transfer'.
Sophia has been evaluated as having the following complexity (in terms of disk accesses):
set: worst case is an O(1) append-only key write + in-memory index insert
get: worst case is an O(1) random region read, which itself does amortized O(log region_size) key compares + in-memory key index search + in-memory region search
range: range queries are very fast due to the fact that each iteration needs to compare no more than two keys without searching them, and access through mmaped database. Roughly complexity can be equally evaluated as having to sequentially read a mmaped file.
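To make the region index concrete, here is a minimal sketch in C of how an in-memory lookup over non-overlapping regions could work. The struct region layout and the region_find and demo_regions names are assumptions made for illustration, not Sophia's internals, and keys are compared as plain C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical region descriptor (illustrative names only). */
struct region {
    const char *min;   /* smallest key stored in the region */
    const char *max;   /* largest key stored in the region */
    unsigned id;       /* region id number */
};

/* Binary search over regions sorted by min key. Because regions never
 * overlap, at most one region can cover a given key.
 * Returns the index of the covering region, or -1 if there is none. */
int region_find(const struct region *r, size_t n, const char *key)
{
    assert(key != NULL && (r != NULL || n == 0));
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(key, r[mid].min) < 0)
            hi = mid;                  /* key sorts before this region */
        else if (strcmp(key, r[mid].max) > 0)
            lo = mid + 1;              /* key sorts after this region */
        else
            return (int)mid;           /* min <= key <= max */
    }
    return -1;
}

/* Three sample regions, sorted and non-overlapping. */
const struct region demo_regions[] = {
    { "a", "f", 1 }, { "g", "m", 2 }, { "n", "z", 3 },
};
```

The search itself never touches the disk; only the region it selects would have to be read, which is consistent with the single random region read quoted for get.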
Hi, Dmitry!
Have a question.
#include <stdio.h>
#include "sophia.h"
void x(void) {
void *env = sp_env();
void *ctl = sp_ctl(env);
sp_set(ctl, "sophia.path", "./storage");
sp_set(ctl, "db", "x"); /* "x" database */
void *dbx = sp_get(ctl, "db.x");
sp_open(env);
void *o = NULL;
char key[] = "foo";
char val[] = "bar";
o = sp_object(dbx);
sp_set(o, "key", key, sizeof(key));
sp_set(o, "value", val, sizeof(val));
sp_set(dbx, o);
sp_destroy(env);
}
void y(void) {
void *env = sp_env();
void *ctl = sp_ctl(env);
sp_set(ctl, "sophia.path", "./storage");
sp_set(ctl, "db", "y"); /* "y" database */
void *dby = sp_get(ctl, "db.y");
sp_open(env);
void *o = NULL;
char key[] = "foo";
o = sp_object(dby);
sp_set(o, "key", key, sizeof(key));
void *result = sp_get(dby, o);
if (result) {
char *value = sp_get(result, "value", NULL);
printf("%s\n", value);
sp_destroy(result);
}
sp_destroy(env);
}
int main(void) {
x();
y();
}
/* prints bar */
Why?
Shared logs directory between databases?
I do not understand how you can set different logs directories for different databases in the same environment context.
Is it possible to work with multiple databases in the same environment context?
Thanks.
The example leaks 48 bytes (see this valgrind report):
#include <sophia.h>
#include <stdio.h>
int
main(void) {
char *path = "/tmp/foodb";
void *env = NULL;
void *db = NULL;
int rc = 0;
if (!(env = sp_env())) {
fprintf(stderr, "Failed to allocate environment\n");
return 1;
}
rc = sp_ctl(env, SPDIR, SPO_CREAT|SPO_RDWR, path);
if (-1 == rc) {
sp_destroy(env);
fprintf(stderr, "Failed to set database path\n");
return 1;
}
rc = sp_ctl(env, SPGC, 1);
if (-1 == rc) {
sp_destroy(env);
fprintf(stderr, "Failed to enable garbage collector\n");
return 1;
}
if (!(db = sp_open(env))) {
sp_destroy(env);
fprintf(stderr, "Failed to open database\n");
return 1;
}
printf("Opened database %s\n", path);
sp_destroy(db);
sp_destroy(env);
return 0;
}
Is something wrong with the example, or is there a leak in Sophia?
Although it is not a feature that is needed right away, peer to peer gossip-style replication would be a great addition to SophiaDB.
One of the best examples of replicated data I've seen is the great scuttlebutt project by dominictarr.
The key aspects of a replication would be implementing scuttlebutt style handshake and then syncing data, then replicating real time changes.
In terms of binary data, we could possibly use msgpack for serialisation?
Again, replication is not a feature that is needed right away, but it would be a great addition to Sophia in the long term, and would go along nicely with hot-backup support and secondary indexes
Any thoughts?
It looks like the --strip-unneeded option is not supported.
Could it be replaced by something like:
strip -u -r -x libsophia.so.1.2.2
p.s.
Just for future reference, the strip command comes to Mac OS X with "binutils" (brew).
It would be nice to have parameter names in sophia.h instead of a blob of "void*, void**, const void*". In fact, the man pages have them, so I'm not sure why they've been omitted.
Hi! Sophia is stated to have wrapper-friendly API, but sp_get on ctl object is not wrapper-friendly in any way: it just returns a value of mixed type as void* pointer with no type information. So script language wrapper libraries must know about all possible call keys and do an additional check to cast the result correctly.
I see there is the so_ctlserialize function. It seems like a good solution, because its result is returned in a structure with type information.
Can you add access to this function to simplify wrapper development?
Dmitry,
add link to cl-sophia, please.
I will write tests in the near future.
Make a request to Quicklisp (CL package manager) after writing tests.
Advanced Sophia features will be added gradually.
Thanks.
I'm not sure how viable it is with the current implementation, but I would love it if we could name transactions.
For example:
int rc = sp_begin(db, "mytransaction");
if (0 != rc) {
fprintf(stderr, "failed to create transaction\n");
exit(rc);
}
sp_set(...);
sp_commit(db);
Then, later on:
sp_rollback(db, "mytransaction");
All of your functions accept void* parameters for their opaque datatypes. This is very error prone, because the compiler can't tell me (for example) that I'm passing a cursor where an env is expected. For that matter, I could accidentally pass a pointer to anything where a cursor is expected.
Consider declaring "struct sp_cursor;" in the .h file. It also makes the code a little more self-documenting.
This change would be totally binary compatible, and even source compatible in C (where implicit conversions from void* are allowed).
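A minimal sketch of the suggestion, using made-up demo_env and demo_cursor types rather than Sophia's real ones. Every name below is hypothetical; the point is only that forward-declared struct types keep the layout hidden while letting the compiler diagnose a mixed-up handle.

```c
#include <assert.h>
#include <stdlib.h>

/* "Header" part: forward declarations only, so the layouts stay opaque. */
struct demo_env;
struct demo_cursor;

struct demo_env    *demo_env_new(void);
struct demo_cursor *demo_cursor_open(struct demo_env *env);
int                 demo_cursor_next(struct demo_cursor *c);
void                demo_cursor_free(struct demo_cursor *c);
void                demo_env_free(struct demo_env *env);

/* "Implementation" part: would normally live in a .c file. */
struct demo_env    { int nkeys; };
struct demo_cursor { struct demo_env *env; int pos; };

struct demo_env *demo_env_new(void)
{
    struct demo_env *env = malloc(sizeof *env);
    if (env)
        env->nkeys = 3;            /* pretend the store holds three keys */
    return env;
}

struct demo_cursor *demo_cursor_open(struct demo_env *env)
{
    struct demo_cursor *c = malloc(sizeof *c);
    if (c) {
        c->env = env;
        c->pos = 0;
    }
    return c;
}

/* Returns 1 while keys remain, 0 once the cursor is exhausted. */
int demo_cursor_next(struct demo_cursor *c)
{
    assert(c != NULL && c->env != NULL);
    if (c->pos >= c->env->nkeys)
        return 0;
    c->pos++;
    return 1;
}

void demo_cursor_free(struct demo_cursor *c) { free(c); }
void demo_env_free(struct demo_env *env)     { free(env); }

/* Walks a cursor to the end and returns the number of keys visited. */
int demo_count_keys(void)
{
    struct demo_env *env = demo_env_new();
    struct demo_cursor *c = env ? demo_cursor_open(env) : NULL;
    int n = 0;
    if (c) {
        while (demo_cursor_next(c))
            n++;
        /* demo_cursor_next(env); would be diagnosed by the compiler:
         * struct demo_env * is not compatible with struct demo_cursor *. */
    }
    demo_cursor_free(c);
    demo_env_free(env);
    return n;
}
```

With void* parameters the commented-out call would compile silently; with distinct struct pointer types it is reported at compile time.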
I just read the docs and found no information on whether Sophia can store a list of values as the value of a key. sp_set can be used for such cases, provided that the values are stored in a serialized format. As a result, if sp_set is used from multiple processes or threads, only the last value will be kept; the other values are overwritten.
Is there any plan to support atomically appending a value to (or removing a value from) a list identified by a key?
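For illustration, a minimal sketch of what "serialized format" could mean here: a list packed into one value buffer, with an append done as read, modify, write. The length-prefixed encoding and the list_append and list_count names are assumptions, not anything Sophia defines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding: each item is a 4-byte length (host byte order)
 * followed by that many bytes. */

/* Appends one item to an encoded list. Returns the (possibly moved)
 * buffer and updates *size, or returns NULL and leaves the old buffer
 * untouched if allocation fails. */
unsigned char *list_append(unsigned char *buf, size_t *size,
                           const void *item, uint32_t len)
{
    assert(size != NULL && (item != NULL || len == 0));
    unsigned char *p = realloc(buf, *size + sizeof len + len);
    if (p == NULL)
        return NULL;
    memcpy(p + *size, &len, sizeof len);
    if (len)
        memcpy(p + *size + sizeof len, item, len);
    *size += sizeof len + len;
    return p;
}

/* Counts the items in an encoded list; returns -1 if it is malformed. */
int list_count(const unsigned char *buf, size_t size)
{
    size_t off = 0;
    int n = 0;
    while (off < size) {
        uint32_t len;
        if (size - off < sizeof len)
            return -1;             /* truncated length prefix */
        memcpy(&len, buf + off, sizeof len);
        off += sizeof len;
        if (size - off < len)
            return -1;             /* truncated item body */
        off += len;
        n++;
    }
    return n;
}
```

Two writers that each read the old buffer, append, and write the result back would still overwrite one another, which is exactly the lost update described above; the read and the write would have to happen under a lock or inside one transaction.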
Would appreciate this to be included as well, rather than just the lib.