codenotary / immudb

immudb - immutable database based on zero trust, SQL/Key-Value/Document model, tamperproof, data change history

Home Page: https://immudb.io

License: Other

Go 97.50% Dockerfile 0.21% Makefile 0.27% JavaScript 0.03% Shell 1.31% Python 0.14% Roff 0.08% Yacc 0.37% HTML 0.05% Smarty 0.05%
key-value immutable merkle-tree go database tamperproof verification immutable-database compliance pci-dss

immudb's People

Contributors

ameingast, dependabot[bot], dmacvicar, gjergj, jeroiraz, jespino, joe-dz, joe-mann, juneezee, leogr, mangalaman93, marcosanchotene, marcosquesada, mertakman, mmeloni, moshix, nikitasmall, nowikens, padurean, pascaldekloe, razikus, simonelazzaris, snyk-bot, tauu, testwill, tomekkolo, trwnh, vchain-us-mgmt, vchaindz, zaza81

immudb's Issues

Immudb stuck if PKILL signal is received

If immudb receives a PKILL or crashes badly, Merkle tree elements may not have been flushed to badger.
As a result, the badger index ends up greater than the Merkle tree one.
The reason is that treeLayerWidth (https://github.com/codenotary/immudb/blob/master/pkg/store/treestore.go#L57)
sets the tree width (t.w) by counting the tree nodes in each layer,
but the tree nodes that correspond to the treeStoreValue at t-x are not present.

In this state the following function loops forever:

func (t *treeStore) WaitUntil(index uint64) {
	for {
		t.RLock()
		if t.w >= index+1 {
			t.RUnlock()
			return
		}
		t.RUnlock()
		time.Sleep(time.Microsecond)
	}
}

To avoid this, all badger elements that are not properly linked to a Merkle tree entry should be truncated at DB startup. Or should we rebuild them?
Requirements: https://github.com/codenotary/immudb/issues/39

zero-copy

The current implementation copies values when fetching data.
We should refactor it in a way that no values are copied.

Refactor and add mTLS to client

mTLS should be available for all client communications to immudb server.

The immuclient connection flow is a mess.

A refactor is needed.

Split each command present in immuclient.go into its own file.

lock on the db files

It should not be possible to edit the DB files. We need a lock on them.

We also need a security mechanism to detect that data is not lost. Check with mmap. Also be aware of NFS.

disable restore

We should not have the possibility to restore; otherwise we are not tamperproof.

Golang SDK immudb

Currently the immugw REST proxy exposes a simplified API that wraps the immudb one.
None of the safe methods exposed by immugw (SafeSet, SafeGet, SafeReference, SafeZAdd) requires the root and index, because this data is managed internally.
The purpose of this task is to create a Golang SDK that offers the same capabilities in pure Go.
immugw can then be upgraded to use the new SDK.
To do this you could use a new .proto schema that extends https://github.com/codenotary/immudb/blob/master/pkg/api/schema/schema.proto
Example:
The current SafeGet service in the immudb schema.proto:

message SafeGetOptions {
	Key key = 1;
	Index rootIndex = 2;
}
rpc SafeGet(SafeGetOptions) returns (SafeItem){
		option (google.api.http) = {
            post: "/v1/immurestproxy/item/safe/get"
            body: "*"
        };
	};

New SafeGet in immu.proto:

rpc SafeGet(Key) returns (VerifiedItem){};

E.g. VerifiedItem contains the item and a bool that holds the verification result.

Flow:

  • The SDK intercepts SafeGet
  • retrieves rootIndex from the client.RootService package
  • forwards the call to immudb (with SafeGet(SafeGetOptions))
  • runs the checks on the result
  • returns the VerifiedItem to the user
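
The flow above can be sketched as follows. Everything here is an assumption made for illustration: the field layout of the messages is simplified, RootService and Server are hypothetical interfaces, and the verification function is a placeholder for the real proof checks.

```go
package main

import (
	"errors"
	"fmt"
)

// SafeGetOptions and SafeItem loosely mirror the schema.proto messages;
// the field layout is simplified and hypothetical.
type SafeGetOptions struct {
	Key       []byte
	RootIndex uint64
}

type SafeItem struct {
	Key, Value []byte
	ProvenAt   uint64 // stand-in for the proof data returned by immudb
}

// VerifiedItem is the item plus a bool holding the verification result.
type VerifiedItem struct {
	Key, Value []byte
	Verified   bool
}

// RootService and Server are hypothetical interfaces for the two dependencies.
type RootService interface {
	CurrentRootIndex() (uint64, error)
}

type Server interface {
	SafeGet(opts SafeGetOptions) (*SafeItem, error)
}

type SDK struct {
	roots  RootService
	server Server
	verify func(item *SafeItem, rootIndex uint64) bool
}

// SafeGet hides the root index from the caller: fetch it, forward, check, return.
func (s *SDK) SafeGet(key []byte) (*VerifiedItem, error) {
	rootIndex, err := s.roots.CurrentRootIndex()
	if err != nil {
		return nil, fmt.Errorf("root index: %w", err)
	}
	item, err := s.server.SafeGet(SafeGetOptions{Key: key, RootIndex: rootIndex})
	if err != nil {
		return nil, fmt.Errorf("safe get: %w", err)
	}
	return &VerifiedItem{Key: item.Key, Value: item.Value, Verified: s.verify(item, rootIndex)}, nil
}

// In-memory fakes so the sketch runs without a server.
type fakeRoots struct{ index uint64 }

func (f fakeRoots) CurrentRootIndex() (uint64, error) { return f.index, nil }

type fakeServer struct{ data map[string][]byte }

func (f fakeServer) SafeGet(opts SafeGetOptions) (*SafeItem, error) {
	v, ok := f.data[string(opts.Key)]
	if !ok {
		return nil, errors.New("key not found")
	}
	return &SafeItem{Key: opts.Key, Value: v, ProvenAt: opts.RootIndex}, nil
}

// checkRoot is a placeholder check, not a real Merkle proof verification.
func checkRoot(item *SafeItem, rootIndex uint64) bool { return item.ProvenAt == rootIndex }

func main() {
	sdk := &SDK{
		roots:  fakeRoots{index: 7},
		server: fakeServer{data: map[string][]byte{"k": []byte("v")}},
		verify: checkRoot,
	}
	item, err := sdk.SafeGet([]byte("k"))
	fmt.Println(item, err)
}
```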

help for immu client

I should be able to run the following command:

./immu --help zscan

and get detailed help about the command.

When I type a command incorrectly I should get correct information. For example, as it is now:

immudb git:(master) ✗ ./immu zscan
Error: accepts 1 arg(s), received 0
Usage:
  immu zscan set [flags]
Aliases:
  zscan, zscn
Flags:
  -a, --address string   bind address (default "127.0.0.1")
  -h, --help             help for zscan
  -p, --port int         port number (default 3322)

From this I understand that zscan is run on its own, with no argument and some flags (immu zscan set [flags]), which is wrong.

Of course, this should be done for all commands.

flush often to disk

Currently, recent data is in memory. We need to flush more often to disk (for every insert).

default port

The default port for immudb server should be 3322.

instructions after make

After make is successful, it should say:
"Build successful, now you can make the manuals or check the status of the database with immuadmin".

config file

We should have a config file to specify at least:

  • port
  • data location
  • listening address

Structured value

There is currently no rule or protocol for managing meaningful data in the communication between immudb and its clients.
This task aims to set up a first iteration that provides a structured value with two properties, timestamp and payload, set by clients and managed by immudb.

Like other databases (e.g. MongoDB), our strategy is to build structured values client side. This will be done inside our drivers (SDKs) or immugw.
This gives us three advantages:
1) immudb remains simple. It only deals with a raw byte stream as the value, so we do not have to modify much inside the core.
2) Performance will not decrease and memory usage will not increase, because we don't have to open the value to save the timestamp server side. If somebody starts sending us huge messages, this will be a great advantage.
3) The timestamp will be tamperproof because it is inserted inside the value, which is used to produce the hash for the Merkle tree.

Main task
To deal with raw structure server side we would like to use google.protobuf.Any as type of value inside KeyValue message

message KeyValue {
	bytes key = 1;
	bytes value = 2;
}

will be

message KeyValue {
	bytes key = 1;
	google.protobuf.Any value = 2;
}

In another value.proto file we define the message structure

message StructuredValue {
	uint64 timestamp = 1;
	bytes payload = 2;
}

After that you will have to do a small refactor inside immudb. The inner value of Any is a []byte, so you shouldn't have any problems because the old value is of the same type.

Client side
Client side the implementation will be based on following pseudo code:

sv = new StructuredValue
sv.timestamp = getOsTimestamp
sv.payload = payloadByteArray
Any.pack(sv)
safesendObject = {
  "kv": {
    "key": "string",
    "value": sv
  },
  "rootIndex": {
    "index": "..."
  }
}
client.safeSend(safesendObject)
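
To make the idea concrete, here is a small sketch of wrapping a payload together with a client-side timestamp into a single byte value and reading it back. It is not the protobuf approach described above: to stay self-contained it uses a hand-rolled layout (8-byte big-endian timestamp followed by the payload) in place of google.protobuf.Any, and the Marshal and Unmarshal names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

// StructuredValue mirrors the message above: a timestamp plus an opaque payload.
type StructuredValue struct {
	Timestamp uint64
	Payload   []byte
}

// Marshal encodes the value as 8 bytes of big-endian timestamp followed by the payload.
// This layout is an assumption made for the sketch, not the protobuf wire format.
func (sv StructuredValue) Marshal() []byte {
	out := make([]byte, 8+len(sv.Payload))
	binary.BigEndian.PutUint64(out[:8], sv.Timestamp)
	copy(out[8:], sv.Payload)
	return out
}

// Unmarshal is the server-side counterpart: it splits the raw value back into its parts.
func Unmarshal(raw []byte) (StructuredValue, error) {
	if len(raw) < 8 {
		return StructuredValue{}, errors.New("value too short to hold a timestamp")
	}
	return StructuredValue{
		Timestamp: binary.BigEndian.Uint64(raw[:8]),
		Payload:   append([]byte(nil), raw[8:]...), // copy so the caller owns the payload
	}, nil
}

func main() {
	sv := StructuredValue{Timestamp: uint64(time.Now().Unix()), Payload: []byte("hello")}
	raw := sv.Marshal() // this is what would be sent as the KeyValue value
	got, err := Unmarshal(raw)
	fmt.Println(got.Timestamp == sv.Timestamp, string(got.Payload), err)
}
```

Because the timestamp travels inside the value, it is covered by whatever hash is computed over that value.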

Please add a test that proves that an element can be correctly deserialized server side.
This is needed because we will have to deserialize messages in future tasks (auth, crypto signatures, audit, etc.).

Implementation in Immugw
Immugw is an intelligent gateway in front of immudb. It acts as a proxy, but it simplifies safeSet and safeGet calls by providing the root index on its own and, after retrieving data, it performs all the verification needed to ensure that immudb has not been tampered with. It can run on the same machine as the client, on a third machine between the client and immudb, or on the same machine as immudb (with lower security).
We need to generate the timestamp inside immugw for safeset, set and batch set. A client consuming the immugw safeSet endpoints is completely unaware of the generation.
In the safeget, get, scan and history methods the structured value has to be returned expanded as JSON.
Please provide unit tests and API tests for the safeGet and safeSet methods.

Implementation in immu client
The same approach has to be applied to the immu client for the get, set, scan and history methods. These methods have to return the timestamp in set, scan and history, and build the structured value in set.

Implementation in JS-SDK, Python-SDK, .Net-SDK and JAVA-SDK
Please provide the same approach for the safeSet and Get methods in our SDKs.
Provide integration tests as well.

Useful tips.

To rebuild schema.pb.go and schema.pb.gw.go:

make build/codegen

If you would like to test immugw and immud easily, take a look at this repo -> https://github.com/mmeloni/immudbdemo

Please use Conventional Commits -> feat(pkg/subPackage...):{commit message}

locations

These are the standard locations for binaries and configs:

IMMU_HOME=/usr/share/immudb
LOG_DIR=/var/log/immudb (immuclient.log, immudb.log, immugw.log)
DATA_DIR=/var/lib/immudb
CONF_DIR=/etc/immudb/(immudb.ini,immugw.ini,immuclient.ini)
PID_FILE_DIR=/var/run/immudb/(immudb.pid,immugw.pid)

Insertion order index

The Merkle tree is stored in RAM and flushes its elements to badger at shutdown or every N set operations.
After an unexpected shutdown there may be more badger value elements than Merkle tree leaves.
To resynchronize, we have to loop over all Merkle tree leaves in insertion order. In this way all elements in badger that are not correctly linked with the Merkle tree can be truncated.
But in order to do this we need an insertion-sorted index.
To build the index without performance degradation we need to write the key (reference) on every Merkle tree element. In other words, we can build a custom insertion order index for badger internally, on top of the Merkle tree.
In this way we get the index for free.
It is mandatory to have this index before going to production, to avoid a complex migration routine.

Implementation details:

t  -> A merkle tree node stored on badger, composed of a key and a value
          Key: tsprefix | level | insertion index
          Value: a hash
tl -> A merkle tree leaf node stored on badger
          Key: tsprefix | level | insertion index
          Value: hash of the index key and of the kv element
kv -> Values stored on badger
          Key: key
          Value: value

             hash0     
            /     \            
          /         \           
        /             \          
      g               c       
   /      \          /     \      
tl-a    tl-b      tl-d   tl-e
  |        |        |      |    
kv1       kv2      kv3    kv4

To build the insertion order index I will append the key (of the kv) to the leaf value, which is a hash and has a fixed size.

So in the end leaf tl-d, for example, will be:

key: lvl-0 | index 3
val: hash(kv3.index, kv3.val) | kv3.key

Flow for a lookup, e.g. immudb.getItemByIndex(3):

tl-d = getMerkleTreeLeaf(3)
val = tl-d.getValue()
reference = split(val) -> here we split the hash from kv3.key
item = lookup(reference)

Simplified SET/GET with proofs

Currently, in order to perform a safe insert operation (including proof verification), ImmuDB requires 3 separate consecutive API calls (see the sequence diagram of the SET operation).

We can provide a single API call that performs all steps and returns the proofs directly.

Proofs must be still checked by the API consumer.

Safe Append

immugw concurrent clients

When using two (or more) concurrent writers to the immugw, one writer runs as expected, but the other will eventually fail when it comes to concurrent requests. Setup is from https://github.com/mmeloni/immudbdemo .

Small python script to reproduce:

#!/usr/bin/python3
# Writes keys <prefix>1, <prefix>2, ... through immugw until a request fails.
import base64
import sys

import requests

IMMUGW = "http://127.0.0.1:8081/v1/immurestproxy/"


def b64(text):
    # b64encode returns bytes; decode to str so the value is JSON serializable.
    return base64.b64encode(text.encode()).decode()


i = 1
while True:
    key = sys.argv[1] + str(i)
    print("key is %s" % (key,))
    value = "foobar"
    result = requests.post(IMMUGW + "item/safe",
                           json={"kv": {"key": b64(key), "value": b64(value)}})
    if result.status_code != 200:
        print("post to immugw result code %d" % (result.status_code))
        print("post to immugw result text %s" % (result.text))
        print("i was %d" % (i))
        break
    i = i + 1

Start the script in a first shell window with parameter A:

$ ./immudb_test.py A
key is A1
key is A2
key is A3
key is A4
key is A5
key is A6
...

Start the script in a second shell window with parameter B:

$ ./immudb_test.py B
key is B1
key is B2
key is B3
key is B4
key is B5
key is B6
key is B7
key is B8
key is B9
key is B10
key is B11
key is B12
key is B13
key is B14
key is B15
key is B16
key is B17
post to immugw result code 400
post to immugw result text {"error":"invalid root index","code":3,"message":"invalid root index"}
i was 17
$

The counter varies, as does which of A or B "wins".

immupopulate

We need a utility that populates the database with a specified number of entries. It should take the number of records to write and return the time required to populate, plus an OK confirmation.
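
A rough sketch of such a utility follows. The Setter interface and the in-memory memDB are hypothetical stand-ins for a real immudb client, and the key and value format is arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// Setter is a hypothetical minimal interface for whatever client performs the writes.
type Setter interface {
	Set(key, value []byte) error
}

// memDB is an in-memory stand-in so the sketch runs without a server.
type memDB struct{ data map[string][]byte }

func (m *memDB) Set(key, value []byte) error {
	m.data[string(key)] = append([]byte(nil), value...)
	return nil
}

// populate writes n generated entries and returns how long the writes took.
func populate(db Setter, n int) (time.Duration, error) {
	start := time.Now()
	for i := 0; i < n; i++ {
		key := []byte(fmt.Sprintf("key-%d", i))
		value := []byte(fmt.Sprintf("value-%d", i))
		if err := db.Set(key, value); err != nil {
			return time.Since(start), fmt.Errorf("entry %d: %w", i, err)
		}
	}
	return time.Since(start), nil
}

func main() {
	db := &memDB{data: map[string][]byte{}}
	elapsed, err := populate(db, 1000)
	if err != nil {
		fmt.Println("FAILED:", err)
		return
	}
	fmt.Printf("OK: wrote %d entries in %s\n", len(db.data), elapsed)
}
```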

Cryptographic signatures

A signature (PKI) provided by the client can become part of the insertion process.

It can be encapsulated into the item’s VALUE.
The implementation of this feature would require:

  • some additions to the gRPC protocol
  • a structuring layer (to encode the signature next to the internal entry’s value)
  • formal validation check on write operations

Results:

  • Bind pieces of data to their respective identities
  • Identity authentication

status command

The immu command should provide a status:

  • healthcheck
  • at least the current (last) statistic data

Hint: badger exposes Prometheus metrics, we can use them as well.

tampered index

If the database has been tampered with, we must show at what index it happened.
