codenotary / immudb
immudb - immutable database based on zero trust, SQL/Key-Value/Document model, tamperproof, data change history
Home Page: https://immudb.io
License: Other
If a client connects to several servers, then it needs separate root management for each server.
Tags and annotations indexed for high-performance queries
If immudb receives a kill signal or crashes badly, it can happen that merkle tree elements are not flushed to badger.
As a consequence, the badger index will be greater than the merkle tree one.
The reason is that treeLayerWidth https://github.com/codenotary/immudb/blob/master/pkg/store/treestore.go#L57
is used to set the tree width (t.w) by counting the tree nodes in each layer.
But the tree nodes that correspond to tree store values at t-x are not present.
Entering this state causes an infinite loop:
func (t *treeStore) WaitUntil(index uint64) {
	for {
		t.RLock()
		if t.w >= index+1 {
			t.RUnlock()
			return
		}
		t.RUnlock()
		time.Sleep(time.Microsecond)
	}
}
In order to avoid this, all badger elements that are not properly linked with merkle tree entries should be truncated at db startup. Or should we rebuild them?
Requirements: https://github.com/codenotary/immudb/issues/39
The current implementation copies values when fetching data.
We should refactor it in a way that no values are copied.
JWT inside gRPC
https://grpc.io/docs/guides/auth/#extending-grpc-to-support-other-authentication-mechanisms
manage the token expiration error
Add to immugw
mTLS should be available for all client communications to immudb server.
The immuclient connection flow is a mess.
The custom connectWithRetry is unnecessary; grpc provides a native implementation: https://godoc.org/google.golang.org/grpc#WaitForReady
The root service should receive the server's uuid identifier, but with the current flow in client.go it is too hard to provide it.
A refactor is needed.
Split each command present in immuclient.go to a single file
It should not be possible to edit the DB files. We need a lock on them.
We also need a security mechanism to detect that data is not lost. Check with mmap. Also be aware of NFS.
scan should have a catch-all argument to list all keys:
scan *
we should not have the possibility to restore; otherwise we are not tamperproof
Currently the immugw rest proxy exposes a simplified api that wraps immudb's.
All safe methods exposed by immugw (SafeSet, SafeGet, SafeReference, SafeZAdd) do not require the root and index, because this data is managed internally.
The purpose of this task is to create a golang SDK offering the same capabilities, but in pure Go.
It would be fine for immugw to be upgraded to use the new SDK.
In order to do this you could use a new .proto schema that extends https://github.com/codenotary/immudb/blob/master/pkg/api/schema/schema.proto
Ex:
the current SafeGet service in immudb schema.proto:
message SafeGetOptions {
  Key key = 1;
  Index rootIndex = 2;
}
rpc SafeGet(SafeGetOptions) returns (SafeItem) {
  option (google.api.http) = {
    post: "/v1/immurestproxy/item/safe/get"
    body: "*"
  };
};
the new SafeGet in immu.proto:
rpc SafeGet(Key) returns (VerifiedItem) {};
e.g. VerifiedItem contains the item and a bool in which we put the verification result.
Flow:
consistency command should be named ‘check-consistency’
All binaries should have man pages: immu, immud and immugw.
the client should be able to see its current commitment hash and index. This can be further used for the consistency command.
command name: get-client-commitment
I should be able to run the following command:
./immu --help zscan
and get detailed help about the command.
When I type a command in a wrong way I should get correct information. For example, as it is now:
immudb git:(master) ✗ ./immu zscan
Error: accepts 1 arg(s), received 0
Usage:
immu zscan set [flags]
Aliases:
zscan, zscn
Flags:
-a, --address string bind address (default "127.0.0.1")
-h, --help help for zscan
-p, --port int port number (default 3322)
From this I understand that zscan runs alone with no arguments and some flags: immu zscan set [flags], which is wrong.
Of course, this should be done for all commands.
Manage the immugw connection on immudb shutdown, and proper shutdown of immugw itself.
Currently, recent data is in memory. We need to flush more often to disk (for every insert).
The default port for immudb server should be 3322.
after make is successful, make should say:
"Build successful, now you can make the manuals or check the status of the database with immuadmin".
We should have a config file to specify at least:
There is currently no rule or protocol to manage meaningful data in the communication between immudb and clients.
This task aims to set up a first iteration providing a structured value with 2 properties, timestamp and payload, set up by clients and managed by immudb.
Like other databases (e.g. mongodb), our strategy will be to build structured values client side. This will be done inside our drivers (SDKs) or immugw.
This gives us 3 advantages:
1) immudb will remain simple. It will deal only with a raw byte stream as value, so we will not have to modify too much inside the core.
2) Performance will not decrease and memory usage will not increase, because we do not have to open the value to save the timestamp server side. In case somebody starts sending us huge messages, this will be a great advantage.
3) The timestamp will be tamperproof, because it is inserted inside the value and it is used to produce the hash for the merkle tree.
Main task
To deal with a raw structure server side, we would like to use google.protobuf.Any as the type of value inside the KeyValue message:
message KeyValue {
  bytes key = 1;
  bytes value = 2;
}
will become
message KeyValue {
  bytes key = 1;
  google.protobuf.Any value = 2;
}
In another value.proto file we define the message structure:
message StructuredValue {
  uint64 timestamp = 1;
  bytes payload = 2;
}
After that you will have to do a small refactor inside immudb. The inner type of Any is []byte, so you shouldn't have any problems, because the old value is of the same type.
Client side
Client side, the implementation will be based on the following pseudo code:
sv = new StructuredValue
sv.timestamp = getOsTimestamp
sv.payload = payloadByteArray
Any.pack(sv)
safesendObject = {
  "kv": {
    "key": "string",
    "value": sv
  },
  "rootIndex": {
    "index": "..."
  }
}
client.safeSend(safesendObject)
Please add a test that proves that an element can be correctly deserialized server side.
This is needed because we will need to deserialize messages in future tasks (auth, crypto signatures, audit etc.).
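The requested round trip can be sketched with the standard library alone, using JSON as a stand-in encoding (the real implementation wraps the message in google.protobuf.Any; the `pack`/`unpack` names below are illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// StructuredValue mirrors the proposed value.proto message:
// a client-set timestamp plus the raw payload.
type StructuredValue struct {
	Timestamp uint64 `json:"timestamp"`
	Payload   []byte `json:"payload"`
}

// pack serializes the structured value into the opaque []byte that
// immudb stores; unpack is the server-side deserialization.
func pack(sv StructuredValue) ([]byte, error) { return json.Marshal(sv) }

func unpack(b []byte) (StructuredValue, error) {
	var sv StructuredValue
	err := json.Unmarshal(b, &sv)
	return sv, err
}

func main() {
	in := StructuredValue{Timestamp: uint64(time.Now().Unix()), Payload: []byte("foobar")}
	raw, _ := pack(in)    // client side: build the opaque value
	out, _ := unpack(raw) // server side: deserialize and inspect
	fmt.Println(string(out.Payload), out.Timestamp == in.Timestamp)
}
```

The server-side test would assert exactly this: that the timestamp and payload survive serialization through the opaque value intact.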
Implementation in Immugw
Immugw is an intelligent gateway in front of immudb. It acts as a proxy, but it simplifies the safeSet and safeGet calls by providing the root index autonomously and, after retrieving data, it performs all the verification needed to ensure that immudb has not been tampered with. It can run on the same machine the client runs on, on a third machine between the client and immudb, or on the same machine as immudb (with lower security).
We need to provide the timestamp generation inside immugw for safeset, set and batch set. The client consuming the immugw safeSet endpoints is completely unaware of the generation.
In the safeget, get, scan and history methods the structured value has to be returned exploded in json.
Please provide unit tests and api tests on the safeGet and safeSet methods.
Implementation in immu client
The same approach has to be taken for the immu client methods get, set, scan and history. These methods have to return the timestamp in get, scan and history, and set up the structured value in set.
Implementation in JS-SDK, Python-SDK, .Net-SDK and JAVA-SDK
Please take the same approach for the safeSet and safeGet methods in our SDKs.
Useful tips.
To rebuild schema.pb.go and schema.pb.gw.go:
make build/codegen
If you would like to easily test immugw and immud, take a look at this repo -> https://github.com/mmeloni/immudbdemo
Please use Conventional Commits -> feat(pkg/subPackage...):{commit message}
The output of the command line should be properly aligned. For example, the colons are not aligned.
These are the standard locations for binaries and configs:
IMMU_HOME=/usr/share/immudb
LOG_DIR=/var/log/immudb (immuclient.log, immudb.log, immugw.log)
DATA_DIR=/var/lib/immudb
CONF_DIR=/etc/immudb/(immudb.ini,immugw.ini,immuclient.ini)
PID_FILE_DIR=/var/run/immudb/(immudb.pid,immugw.pid)
Although badger does not use functional options, immudb can use them.
So, I strongly suggest making everything uniform using the functional options pattern.
cc @ameingast
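For reference, the functional options pattern referred to above looks like this in Go (the `Server`/`Option` names are illustrative, not the actual immudb API):

```go
package main

import "fmt"

// Server holds configuration that callers tune via options.
type Server struct {
	addr string
	port int
}

// Option mutates a Server during construction.
type Option func(*Server)

// WithAddr and WithPort are self-documenting knobs; new options can be
// added later without breaking existing NewServer call sites.
func WithAddr(a string) Option { return func(s *Server) { s.addr = a } }
func WithPort(p int) Option    { return func(s *Server) { s.port = p } }

// NewServer applies the given options over sensible defaults.
func NewServer(opts ...Option) *Server {
	s := &Server{addr: "127.0.0.1", port: 3322} // defaults per the issues above
	for _, o := range opts {
		o(s)
	}
	return s
}

func main() {
	s := NewServer(WithPort(9999))
	fmt.Printf("%s:%d\n", s.addr, s.port) // 127.0.0.1:9999
}
```

The main design benefit is that adding a new configuration knob never changes the constructor signature, which keeps the API stable across packages.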
We should disable building the immuclient in the default Make target. This should build separately with: make immuclient
The reason is that the merkle tree is stored in RAM and it flushes elements to badger at shutdown or every N set operations.
In case an unexpected shutdown occurs, it can happen that there are more badger value elements than merkle tree leaves.
To resync the situation we have to loop over all merkle tree leaves in insertion order. In this way, all elements in badger not correctly linked with the merkle tree can be truncated.
But in order to do this we need an insertion-sorted index.
To realize the index without performance degradation we need to write the key (reference) on every merkle tree leaf. In other words, we can build a custom insertion-order index for badger internally, on top of the merkle tree.
In this way we get the index for free.
It's mandatory to get this index before going to production, to avoid a complex migration routine.
Implementation details:
t -> a merkle tree node stored on badger, composed of a key and a value.
  Key: tsprefix | level | insertion index
  Value: a hash
tl -> leaves: merkle tree nodes stored on badger
  Key: tsprefix | level | insertion index
  Value: a hash of the index and of the kv element
kv -> values stored on badger
  Key: key
  Value: value
        hash0
       /     \
      /       \
     /         \
    g           c
   / \         / \
tl-a tl-b  tl-d tl-e
  |    |     |    |
 kv1  kv2   kv3  kv4
In order to build the insertion-order index I will append the key (kv) to the leaf value, which is a hash and has a fixed size.
So in the end leaf tl-d, for example, will be:
  key: lvl-0 | index 3
  val: hash(kv3.index, kv3.val) | kv3.key
The flow to make a lookup, e.g. immudb.getItemByIndex(3):
  tl-d = getMerkleTreeLeaf(3)
  val = tl-d.getValue()
  reference = split(val) -> here we split the hash from kv3.key
  item = lookup(reference)
Currently, in order to perform a safe insert operation (including proof verification), immudb requires 3 different subsequent API calls (see the sequence diagram of the SET operation).
We can provide a single API call that performs all steps and returns the proofs directly.
Proofs must still be checked by the API consumer.
When using two (or more) concurrent writers to the immugw, one writer runs as expected, but the other will eventually fail when it comes to concurrent requests. Setup is from https://github.com/mmeloni/immudbdemo .
Small python script to reproduce:
#!/usr/bin/python3
import requests
import base64
import sys

IMMUGW = "http://127.0.0.1:8081/v1/immurestproxy/"

i = 1
while True:
    key = sys.argv[1] + str(i)
    print("key is %s" % (key,))
    value = "foobar"
    result = requests.post(IMMUGW + "item/safe",
                           json={"kv": {"key": base64.b64encode(key.encode()).decode(),
                                        "value": base64.b64encode(value.encode()).decode()}})
    if result.status_code != 200:
        print("post to immugw result code %d" % (result.status_code))
        print("post to immugw result text %s" % (result.text))
        print("i was %d" % (i))
        break
    i = i + 1
Start the script in a first shell window with parameter A:
$ ./immudb_test.py A
key is A1
key is A2
key is A3
key is A4
key is A5
key is A6
...
Start the script in a second shell window with parameter B:
$ ./immudb_test.py B
key is B1
key is B2
key is B3
key is B4
key is B5
key is B6
key is B7
key is B8
key is B9
key is B10
key is B11
key is B12
key is B13
key is B14
key is B15
key is B16
key is B17
post to immugw result code 400
post to immugw result text {"error":"invalid root index","code":3,"message":"invalid root index"}
i was 17
$
The counter varies, and so does which of A or B "wins".
we should have the following names for the executables: immuclient , immudb and immugw
We need a utility that populates the database with a specified number of entries. It should take the number of records to be written and should return the amount of time required to populate, plus an OK confirmation.
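A minimal shape for such a tool, measuring wall-clock time over an abstract write interface (the `Setter` interface and `populate` helper are hypothetical stand-ins for the real immudb client):

```go
package main

import (
	"fmt"
	"time"
)

// Setter abstracts the write path so the tool can target the real
// immudb client or, as here, an in-memory stand-in.
type Setter interface {
	Set(key, value []byte) error
}

type memStore map[string][]byte

func (m memStore) Set(k, v []byte) error { m[string(k)] = v; return nil }

// populate writes n generated entries and returns the elapsed time.
func populate(s Setter, n int) (time.Duration, error) {
	start := time.Now()
	for i := 0; i < n; i++ {
		k := []byte(fmt.Sprintf("key-%d", i))
		if err := s.Set(k, []byte("value")); err != nil {
			return 0, err
		}
	}
	return time.Since(start), nil
}

func main() {
	m := memStore{}
	d, err := populate(m, 1000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("OK: wrote %d entries in %s\n", len(m), d)
}
```

Taking the record count as a parameter and printing elapsed time plus an OK line matches the requirement above; the real tool would swap `memStore` for the immudb client connection.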
the file 000000.vlog inside of the data directory is 2GB in size from the start and doesn't seem to be updated
➤ ./immu get nonexistingkey -s
index: 0
key: nonexistingkey
value:
hash:
time: 1970-01-01 02:00:00 +0200 EET
➤ ./immu safeget nonexistingkey -s
index: 0
key: nonexistingkey
value:
hash:
time: 1970-01-01 02:00:00 +0200 EET
In order to avoid fake results on index-with-key lookups:
A signature (PKI) provided by the client can become part of the insertion process.
It can be encapsulated into the item's VALUE.
The implementation of this feature would require:
Results:
Bind pieces of data to their respective identities
Identity authentication
immu command should provide a status:
Hint: badger exposes Prometheus metrics, we can use them as well.
If .root is not present, return a warning to advise users that a new root has been fetched from the server.
This means that there was no verification check against the previous history.
In immugw or the cli, using a mix of safe and unsafe methods causes an invalid root error.
If the database has been tampered with, we must show at what index it happened.