philippgille / gokv

Simple key-value store abstraction and implementations for Go (Redis, Consul, etcd, bbolt, BadgerDB, LevelDB, Memcached, DynamoDB, S3, PostgreSQL, MongoDB, CockroachDB and many more)

License: Mozilla Public License 2.0

Go 98.54% PowerShell 0.58% Shell 0.88%
go golang key-value key-value-store library package abstraction simple redis bolt

gokv's Issues

Evaluate using wire for dependency injection

I haven't worked with wire yet, and at first glance it doesn't look like it would make gokv easier to use than it already is (for example, it looks like code generation is involved). But go-cloud uses wire, and go-cloud is similar to gokv (at least in its goal of offering an abstraction layer for cloud storage).

Also, GitHub user @gedw99 suggested using wire in #72.

Add gokv.Store implementation for Alibaba Cloud Table Store

Next to AWS, Azure and GCP, Alibaba Cloud is a big cloud provider, especially in Asia.

We should implement one of their database services as key-value store as well.

Their "Table Store" seems to fit the bill:

Multi user

Embedded stores and non-embedded stores (like MySQL) are run differently.

I had an idea to make switching agnostic, so that BoltDB or Badger can be used from many processes, by wrapping access with a connection pooler.

Also, using this lib with the Google go-cloud project and its wire IoC mechanism would make this much cleaner too.

Add implementation for MySQL

While PostgreSQL is more popular among Gophers and maybe generally among projects with higher requirements (performance, features), MySQL is still the most popular open source relational database (management system).

It's SQL, so not a key-value store, but that doesn't keep us from creating a table like Item with a k text column as primary key and v blob column, or something like that.

It might be of use for people who already run MySQL and want to use gokv for simple key-value storage.

Also, TiDB is compatible with the MySQL protocol, so as long as there aren't any major differences (some required client-side configuration, for example) and it works, this would be a plus (TiDB is a popular "NewSQL" database).

Queries

GraphQL means you don't need a query language, because GraphQL is an agnostic one.

Just write non-join queries in each GraphQL resolver. Good match for KV stores!

Seems like a perfect match for this project.

It may be that these basic queries can be done agnostically in the resolvers too, at runtime, using reflection. Storm led the way there. I don't think it's that huge a task if things are restricted like Storm does.

If a developer does not want the performance penalty of reflection-based query builders, they don't have to use them. In GraphQL the resolvers are very simple, so this developer choice has no side effects either.

https://github.com/asdine/storm

In summary, the two together get us an agnostic query layer.

Add implementation for Azure Table Storage

There's another ticket for Azure Cosmos DB: #41.

Azure Cosmos DB offers multiple APIs and because the Table Storage API is advertised as the proper one for key-value pairs, that's what we'll use.

So when implementing a Table Storage client anyway, then maybe let's start with this one and then reuse it in the Cosmos DB implementation.

Also, the "Azure SDK for Go" doesn't seem to have gotten much focus on the Table Storage part and there's no specific client for Cosmos DB, so it should definitely work for Table Storage itself, but maybe there are issues when using it for Cosmos DB.

Add implementation for Memcached

Memcached is meant to be used as a cache and not as a persistent key-value store, but some people might still like to use the simple gokv interface with Memcached, or maybe they have configured persistence (even though Memcached discourages this).

Also, some other databases are compatible with the Memcached protocol, for example Couchbase (see here) and Apache Ignite (see here).

Add gokv.Store implementation for OrientDB

OrientDB claims to be the fastest graph database (explicitly claiming to be faster than Neo4j). It's also a multi-model DB, with the website saying that it supports key-value pairs as well. Maybe just with a document that only has a single value, similar to MongoDB and ArangoDB. I haven't gone through the documentation to find this out yet.

BoltClient tests can't be executed repeatedly on the same machine

generateRandomTempDbPath() generates new numbers when called multiple times within the same process, but when called in a new process it starts with the same number. This leads to the same DB being used when executing the tests again, which makes one of the tests fail: it expects no result for a given key, but a result is found (from a previous test run).
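One possible way around the repeated path, sketched under assumptions: instead of deriving the path from a random number, let the OS hand out a uniquely named directory. The function name `newTempDBPath`, the directory prefix and the file name `bolt.db` are hypothetical and not taken from the gokv code base.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// newTempDBPath is a hypothetical replacement for generateRandomTempDbPath.
// It asks the OS for a fresh, uniquely named directory and returns a DB file
// path inside it, so two test runs do not end up sharing a database file.
func newTempDBPath() (string, error) {
	dir, err := os.MkdirTemp("", "gokv-bolt-test-")
	if err != nil {
		return "", err
	}
	return filepath.Join(dir, "bolt.db"), nil
}

func main() {
	p1, err := newTempDBPath()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(filepath.Dir(p1))

	p2, err := newTempDBPath()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(filepath.Dir(p2))

	fmt.Println("paths differ:", p1 != p2)
}
```

Removing the directory at the end of a test also avoids leaving stale databases behind.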

Evaluate adding gokv.Store implementation for AWS S3

AWS S3 is made to store files and it might not support strong consistency (every read after a write contains the written data), but we marshal to []byte anyway and it might be one of the cheapest cloud storage solutions.

Also, S3 is not only for AWS. Many other cloud providers and also self-hosted open source products support the S3 protocol.

Add Close() method to interface and all implementations

While some implementations don't need a close method, many do, and to make the latter ones properly work, the Close() method must be added to the interface, so that package creators who use a gokv.Store can properly close it, no matter what the package users pass as implementation.

Document for each package how important the call is. In some cases a developer uses gokv not to satisfy a gokv.Store parameter but just to have an easy way to interact with some key-value storage. He's then in full control, and if he uses an implementation that doesn't need to be closed, then he should know that he doesn't need to call Close().
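A minimal sketch of the idea, not the actual gokv interface: the method set shown here is reduced to `Set` and `Close` for brevity, and `memStore` and `saveGreeting` are hypothetical names. It shows how code that receives any store can close it without knowing the concrete implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a reduced, illustrative interface that includes Close.
type Store interface {
	Set(k string, v interface{}) error
	Close() error
}

// memStore is a toy in-memory implementation. It would not strictly need
// Close, but implements it so that it satisfies the interface.
type memStore struct {
	m      map[string]interface{}
	closed bool
}

func newMemStore() *memStore {
	return &memStore{m: make(map[string]interface{})}
}

func (s *memStore) Set(k string, v interface{}) error {
	if s.closed {
		return errors.New("store is closed")
	}
	s.m[k] = v
	return nil
}

func (s *memStore) Close() error {
	s.closed = true
	return nil
}

// saveGreeting stands in for package code that is handed an arbitrary Store
// and closes it when done, whatever the implementation is.
func saveGreeting(s Store) (err error) {
	defer func() {
		// Report the Close error only if nothing failed earlier.
		if cerr := s.Close(); err == nil {
			err = cerr
		}
	}()
	return s.Set("greeting", "hello")
}

func main() {
	s := newMemStore()
	fmt.Println("error:", saveGreeting(s))
	fmt.Println("closed:", s.closed)
}
```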

Errors occur when reading a value from the store implementation for bbolt / Bolt DB

The recent Travis CI builds have some errors:
Example 1 from https://travis-ci.org/philippgille/gokv/builds/445281693:

=== RUN   TestStoreConcurrent
--- FAIL: TestStoreConcurrent (3.38s)
	test.go:74: invalid character '\x00' looking for beginning of value
	test.go:74: invalid character '\x00' looking for beginning of value
	test.go:74: invalid character '\x00' looking for beginning of value
FAIL
coverage: 81.1% of statements
FAIL	github.com/philippgille/gokv/bolt	3.420s

Example 2 from https://travis-ci.org/philippgille/gokv/builds/445303063:

=== RUN   TestStoreConcurrent
unexpected fault address 0x7f361ca600ef
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f361ca600ef pc=0x5c2444]
goroutine 25 [running]:
runtime.throw(0x627b33, 0x5)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/panic.go:619 +0x81 fp=0xc420053c48 sp=0xc420053c28 pc=0x453861
runtime.sigpanic()
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/signal_unix.go:395 +0x211 fp=0xc420053c98 sp=0xc420053c48 pc=0x468cf1
encoding/json.checkValid(0x7f361ca600ef, 0xa, 0xa, 0xc420192260, 0x631b60, 0xc420053d70)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/encoding/json/scanner.go:27 +0x144 fp=0xc420053d00 sp=0xc420053c98 pc=0x5c2444
encoding/json.Unmarshal(0x7f361ca600ef, 0xa, 0xa, 0x5e34e0, 0xc42002c860, 0xc4200c5500, 0x0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/encoding/json/decode.go:102 +0xbd fp=0xc420053d70 sp=0xc420053d00 pc=0x5addfd
github.com/philippgille/gokv/util.FromJSON(0x7f361ca600ef, 0xa, 0xa, 0x5e34e0, 0xc42002c860, 0x0, 0xc42002c860)
	/home/travis/gopath/src/github.com/philippgille/gokv/util/util.go:12 +0x65 fp=0xc420053dc8 sp=0xc420053d70 pc=0x5c8e25
github.com/philippgille/gokv/bolt.Store.Get(0xc4200e0000, 0x627f2d, 0x7, 0x631832, 0x1, 0x5e34e0, 0xc42002c860, 0xc420053ee8, 0x437b08, 0x10)
	/home/travis/gopath/src/github.com/philippgille/gokv/bolt/bolt.go:55 +0x203 fp=0xc420053e80 sp=0xc420053dc8 pc=0x5c9423
github.com/philippgille/gokv/bolt.(*Store).Get(0xc4200f21e0, 0x631832, 0x1, 0x5e34e0, 0xc42002c860, 0x0, 0x0, 0x0)
	<autogenerated>:1 +0xb3 fp=0xc420053ef8 sp=0xc420053e80 pc=0x5ca403
github.com/philippgille/gokv/test.InteractWithStore(0x646980, 0xc4200f21e0, 0x631832, 0x1, 0xc4200d6000, 0xc4200da130)
	/home/travis/gopath/src/github.com/philippgille/gokv/test/test.go:72 +0x297 fp=0xc420053fb0 sp=0xc420053ef8 pc=0x5cc9a7
runtime.goexit()
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc420053fb8 sp=0xc420053fb0 pc=0x4837f1
created by github.com/philippgille/gokv/bolt_test.TestStoreConcurrent
	/home/travis/gopath/src/github.com/philippgille/gokv/bolt/bolt_test.go:44 +0x241

[...]

goroutine 244 [runnable]:
github.com/philippgille/gokv/test.InteractWithStore(0x646980, 0xc42000de20, 0xc420018bc9, 0x3, 0xc4200d6000, 0xc4200da130)
	/home/travis/gopath/src/github.com/philippgille/gokv/test/test.go:58
created by github.com/philippgille/gokv/bolt_test.TestStoreConcurrent
	/home/travis/gopath/src/github.com/philippgille/gokv/bolt/bolt_test.go:44 +0x241
FAIL	github.com/philippgille/gokv/bolt	0.089s

Example 3 from https://travis-ci.org/philippgille/gokv/builds/445310684:

=== RUN   TestStoreConcurrent
--- FAIL: TestStoreConcurrent (2.76s)
	test.go:74: invalid character '\x01' looking for beginning of value
FAIL
coverage: 81.1% of statements
FAIL	github.com/philippgille/gokv/bolt	2.791s

While reading about BadgerDB as one of the next gokv.Store implementations I read their warning about the validity of data from within transactions, which is:

Please note that values returned from Get() are only valid while the transaction is open. If you need to use a value outside of the transaction then you must use copy() to copy it to another byte slice.

And I remembered that I read something similar regarding bbolt, which is written in their GoDocs: https://godoc.org/go.etcd.io/bbolt#hdr-Caveats (or to have a future working link, here).

To quote:

Keys and values retrieved from the database are only valid for the life of the transaction. When used outside the transaction, these byte slices can point to different data or can point to invalid memory which will cause a panic.

Yet, the current implementation in gokv doesn't take this into account:

var data []byte
c.db.View(func(tx *bolt.Tx) error {
	b := tx.Bucket([]byte(c.bucketName))
	data = b.Get([]byte(k))
	// [...]
	return nil
})
// ... continue to work with data

So this could very well be the reason for the errors we see in the Travis CI log.

This needs to be fixed as soon as possible!
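The usual remedy for this caveat is to copy the bytes while the transaction is still open. The sketch below does not use bbolt at all: `fakeDB` and its `view` method are hypothetical stand-ins that only mimic the "valid during the callback" behaviour, so that the difference between keeping the slice and copying it can be run with the standard library alone.

```go
package main

import "fmt"

// fakeDB is a stand-in for a store whose read buffers are only valid inside
// a transaction callback. It is NOT bbolt; it only mimics the caveat.
type fakeDB struct {
	stored []byte
}

// view hands the callback a slice that is valid only during the call and
// overwrites the underlying memory afterwards.
func (db *fakeDB) view(fn func(val []byte)) {
	scratch := make([]byte, len(db.stored))
	copy(scratch, db.stored)
	fn(scratch)
	for i := range scratch {
		scratch[i] = 0 // simulate the memory being reused
	}
}

// getUnsafe keeps the slice it was handed, like the snippet above does.
func getUnsafe(db *fakeDB) string {
	var data []byte
	db.view(func(val []byte) {
		data = val
	})
	return string(data)
}

// getSafe copies the bytes before the callback returns.
func getSafe(db *fakeDB) string {
	var data []byte
	db.view(func(val []byte) {
		data = make([]byte, len(val))
		copy(data, val)
	})
	return string(data)
}

func main() {
	db := &fakeDB{stored: []byte(`{"Foo":"bar"}`)}
	fmt.Printf("unsafe: %q\n", getUnsafe(db))
	fmt.Printf("safe:   %q\n", getSafe(db))
}
```

In this simulation the unsafe variant ends up with zero bytes, which resembles the `invalid character '\x00'` errors in the logs; with a real memory-mapped database the outcome can also be a crash, as in Example 2.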

Evaluate adding a gokv.Store implementation for ElasticSearch

As with some other implementations, ElasticSearch is not meant to be used as simple key-value store. It's for uploading and then indexing data, that can later be searched. But the data are (usually?) JSON documents and there's an API for PUT ("Index"), GET and DELETE via an ID, which serves as key. So 1) it's probably usable as key-value storage, and 2) as mentioned, like with other gokv.Store implementations, if users are already running an ElasticSearch cluster, why not use it, instead of having to set up and administrate another service like Redis?

So, check out if ElasticSearch's PUT, GET and DELETE APIs are actually usable for our purpose.

Add implementation for MongoDB

MongoDB is not a dedicated key-value store, but it's probably the most popular NoSQL database, so in many projects there's already a running instance, so instead of forcing developers in those projects to set up and administrate another database, it makes sense to utilize what's already there.

Instead of storing the value for the key, a wrapping type probably needs to be created which contains the key as _id and the value as value attribute. The type could be called KVpair, goKVpair or something similar.

There's no official Go SDK for MongoDB, but the official documentation recommends a fork of an open source project:

Add more gokv.Store implementations

Instead of creating a new issue as soon as someone thinks a new implementation makes sense, let's use this ticket for collecting ideas which implementations could make sense in the future.

Only when getting more serious about a specific implementation and starting to more thoroughly evaluate the key-value store / DB, a new issue specifically for that store should be created.


Add gokv.Store implementation for local file storage

In some cases it might be useful to store each key-value pair as a separate local file. The key is the filename, the value is the file content.

  • Key must be escaped
  • Concurrent use of the store must be possible
    • Keep one lock object per key
    • Lock for each file access

Make file ending optional in file implementation

When using the file package, files are stored as <key>.json or <key>.gob. But in case the client switches back and forth between the marshal formats, while using the same keys, this can lead to redundant and/or stale data.
Redundant: Save "a":"x" as JSON, then as gob
Stale: Save "a":"x" as JSON, then change it to "a":"y" and save as gob, then switch back to JSON and load "a" -> results in "x", which is old

Add implementation for Consul

Consul is probably mostly known for being a service registry in a microservice deployment, but it's explicitly advertised as key-value store as well. And because it's one of the most popular service registries this means it's already running in a lot of deployments and software developers might prefer to use their existing infrastructure instead of having to set up something new like Redis.

So gokv should add an implementation for Consul as key-value store.

Move every store implementation into its own package

When using go get github.com/philippgille/gokv, currently all its dependencies are downloaded as well, independent of the store implementation.

When each implementation is in its own package, this shouldn't be the case anymore, making initial go gets, as well as CI builds and Docker image builds much faster, and Docker images smaller.

Add benchmark for comparing all store implementations

Some people already have a strong preference for a specific store (for example because they're already running Redis for their web service, so they don't want to set up and manage anything else), but others are open to any new key-value store as long as its performance is great.

We should add a benchmark that compares the different gokv.Store implementations with each other. This will especially help in deciding between read vs write optimized stores, embedded vs remote stores and self-hosted vs cloud-hosted stores.

Add option to keep JSON readable for some implementations

Currently, when JSON is used as MarshalFormat, it will always be converted to []byte. This makes the result unreadable for a human, unless interpreted as / converted back to string. So for example when using DynamoDB, and a struct is marshalled into a JSON {"Foo":"bar"}, saved to DynamoDB with gokv, and then you look at the value via the AWS console, it's just eyJGb28iOiJiYXIifQ==, the Base64 encoding of the JSON string.

That's because in the dynamodb implementation we use an awsdynamodb.AttributeValue for the value where we assign the value to a field B, which is for []byte, and has the Base64 encoding described in the comment:

// An attribute of type Binary. For example:
//
// "B": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk"
//
// B is automatically base64 encoded/decoded by the SDK.
B []byte `type:"blob"`

In this case, we could look at the option provided by the package user and instead of B, use S and assign the plain JSON string to it.

This should also be done in a similar way for other implementations. It only makes sense for some of them though!
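To make the difference concrete, here is a small standard-library sketch that does not use the AWS SDK. The helper names `asBinary` and `asString` are hypothetical; they only mimic what a reader would see for a value stored in a binary attribute versus a string attribute.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

type foo struct {
	Foo string
}

// asBinary mimics how a []byte attribute is displayed: as Base64 text.
func asBinary(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(b), nil
}

// asString mimics storing the JSON as a plain string attribute instead.
func asString(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	v := foo{Foo: "bar"}

	bin, err := asBinary(v)
	if err != nil {
		panic(err)
	}
	str, err := asString(v)
	if err != nil {
		panic(err)
	}

	fmt.Println(bin) // eyJGb28iOiJiYXIifQ==
	fmt.Println(str) // {"Foo":"bar"}
}
```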

Add implementation for AWS DynamoDB

So far all gokv.Storage implementations were for self-hosted open source databases. Some projects have all their infrastructure in AWS, and instead of starting and administrating another couple of EC2 instances for a custom database installation they probably prefer to use the database-as-a-service offers by AWS.

There's SimpleDB, for which I'll probably create another issue in the future, but DynamoDB is pushed a lot by AWS and seems to be favored (by AWS at least), so the support, documentation etc. might be better, and customers' willingness to adopt it is likely to be higher.

Support Go modules

Go 1.13 will use Go modules by default, so all Go packages should be updated to support Go modules. This shouldn't only be done for the sake of adhering to the best practices / up-to-date tools in the Go ecosystem; it's actually useful, too. Especially because there are so many gokv subpackages, almost all of them have dependencies on third-party packages, and currently no dependency is pinned to a specific version in any way. Go modules allow pinning dependencies to specific versions, without the need to vendor their source code, but with checksums to make sure the dependencies haven't been tampered with when fetching them anew.

Useful links:

Add gokv.Store implementation for CockroachDB

Implement method to get all existing key-value pairs in a store

As mentioned in the README already, something like List(interface{}) error or GetAll(interface{}) error should be implemented.

Example of how a user could pass a slice which the List method then populates with values:

package main

import (
	"encoding/json"
	"fmt"
)

type foo struct {
	Bar string
}

// myFunc populates the slice that vals points to
func myFunc(vals interface{}) {
	j := []byte(`[{"Bar":"baz1"},{"Bar":"baz2"}]`)

	err := json.Unmarshal(j, vals)
	if err != nil {
		panic(err)
	}
}

func main() {
	fmt.Println("Hello world!")

	vals := make([]foo, 0)
	myFunc(&vals)
	fmt.Println("vals:")
	for _, v := range vals {
		fmt.Printf("%+v\n", v)
	}
}

Output:

Hello world!
vals:
{Bar:baz1}
{Bar:baz2}

Clean up stores after test

The DBs and data created during the tests are not cleaned up properly. Especially in the badgerdb package, when a store is created, it creates a 2 GB file on the filesystem. When running all tests this means that about 10 GB of space is filled up. It's not only about these files, though: when using a client for one of the DB servers, the tests should also have a "tear down" phase where created data is deleted, so the DBs can be used further, instead of having to create a new Docker container for example.

For BadgerDB, one of the requirements was the ability to Close() the store, so the file handle is released and the file can be removed. Close() was recently implemented (#36).

Add gokv.Store implementation for Apache Ignite

Apache Ignite seems to be one of the most popular multi-model open source databases. It has a key-value store mode, which seems to be meant to be used as cache, but Apache Ignite seems to be doing everything in-memory first, and then use their "durable memory" or "persistence" components to achieve durability.

The key-value store mode is JCache compliant, see:

Is Ignite a key-value store?

Yes. Ignite provides a feature rich key-value API, that is JCache (JSR-107) compliant and supports Java, C++, and .NET.

And: https://ignite.apache.org/use-cases/database/key-value-store.html

The latter link includes the following bullet point regarding the JCache specification:

  • Pluggable Persistence

So this seems to be the optimal way to use Ignite, but on the other hand there don't seem to be any Go packages for JCache. But then again, Ignite supports the Redis protocol (see here), has its own binary protocol (see here) and even a REST API (see here).

(Un-)marshal to/from gobs as alternative to JSON

Currently all gokv.Store implementations in this repo (un-)marshal to/from JSON. This is nice because:

  1. In case a distributed store is used, other applications can easily work with the same data (e.g. a Java web service can get a value from Redis and deal with the JSON with a simple JSON library)
  2. When you want to examine some values manually you can use some client (like some web admin dashboard for Redis for example) and see and understand the data without having to decode anything

But:

  1. The marshalled JSON data is probably bigger than gob in its size
  2. According to the gob documentation the (un-)marshalling to/from gob is extremely fast, so using gob should improve the (un-)marshalling performance

So: Implement gob as alternative to JSON for (un-)marshalling for all currently existing gokv.Store implementations in this repo. Make this optional (via the Options struct in each implementation package)!
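A rough sketch of how such a switch could look, using only the standard library. The names `MarshalFormat`, `JSON` and `Gob` and the helper functions are assumptions for illustration and are not claimed to match what gokv ends up using.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// MarshalFormat selects the encoding; an Options struct could carry it.
type MarshalFormat int

const (
	JSON MarshalFormat = iota
	Gob
)

type foo struct {
	Bar string
}

// marshal encodes v in the requested format.
func marshal(v interface{}, f MarshalFormat) ([]byte, error) {
	switch f {
	case JSON:
		return json.Marshal(v)
	case Gob:
		var buf bytes.Buffer
		if err := gob.NewEncoder(&buf).Encode(v); err != nil {
			return nil, err
		}
		return buf.Bytes(), nil
	default:
		return nil, fmt.Errorf("unknown marshal format %d", f)
	}
}

// unmarshal decodes data into v, which must be a pointer.
func unmarshal(data []byte, v interface{}, f MarshalFormat) error {
	switch f {
	case JSON:
		return json.Unmarshal(data, v)
	case Gob:
		return gob.NewDecoder(bytes.NewReader(data)).Decode(v)
	default:
		return fmt.Errorf("unknown marshal format %d", f)
	}
}

func main() {
	for _, f := range []MarshalFormat{JSON, Gob} {
		data, err := marshal(foo{Bar: "baz"}, f)
		if err != nil {
			panic(err)
		}
		var out foo
		if err := unmarshal(data, &out, f); err != nil {
			panic(err)
		}
		fmt.Printf("format %d: %+v\n", f, out)
	}
}
```

Whether gob is actually smaller or faster than JSON depends on the data, so the claims above are best checked with a benchmark on representative values.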

The issue already existed back when gokv was still part of ln-paywall as package storage, so maybe have a look at the issue created for that repo as well: philippgille/ln-paywall#32

Create CLI to examine existing key-value pairs

gokv will mostly be used within CLIs or web services, so the developer can't easily examine existing key-value pairs without writing his own mini CLI with similar code that he uses in the actual project. There are of course management dashboards for most key-value stores, similar to phpMyAdmin for MySQL, but this requires additional setup and is often overkill for developers who just want to have a quick look at a value for a given key.

The CLI should be usable like this: gokv get "someKey123"
The configuration (which store implementation, URL / passwords (depending on implementation) etc.) should be located in a gokv.yml or something similar. Maybe use a library like viper to make this as flexible as possible.

The code should be located in a directory called "gokv" within this gokv repository, so that when installing the CLI via go get it can be called as gokv.

Add implementation for Google Cloud Datastore

Amazon DynamoDB and Azure Table Storage are already supported, now GCP is missing for support of the "big three" cloud providers.

Note: Cloud Firestore (not Firebase) has a "Datastore mode" which makes it compatible with the Datastore API. Firestore might supersede Datastore in the future. But it's currently marked as beta:

Add implementation for Azure Cosmos DB

Azure Cosmos DB is described as a "multi model" database, supporting the MongoDB API, Cassandra API, SQL queries, Gremlin (graph), and Azure Table Storage API.

Cosmos DB is the counterpart to AWS DynamoDB, but the "multi model" seems to be unique.

Microsoft describes Azure Table Storage as "NoSQL key-value store", so maybe that's the best API to work with (see here).

General info:

Implementation info:

Add gokv.Store implementation for Hazelcast

Hazelcast is advertised as an IMDG (in-memory data grid), but at its core it is an in-memory cache. We should add support for it to have another distributed cache option next to Memcached.

The supported data types are not key-value directly, but instead a single distributed map can be used for storing key-value pairs.
