
Federated MySQL-compatible proxy to Elasticsearch, Mongo, Cassandra, BigTable, Google Datastore

License: MIT License

Go 96.98% Makefile 0.03% Yacc 1.92% Shell 1.05% Dockerfile 0.02%
database go golang elasticsearch query-engine sql google-datastore mongo sql-query mysql-protocol

dataux's Introduction

SQL Query Proxy to Elasticsearch, Mongo, Kubernetes, BigTable, etc.

Unify disparate data sources and files into a single federated view of your data, and query it with SQL, without copying anything into a data warehouse.

MySQL-compatible federated query engine for Elasticsearch, Mongo, Google Datastore, Cassandra, Google BigTable, Kubernetes, and file-based sources. The engine hosts a MySQL-protocol listener and rewrites SQL queries into each backend's native query form (Elasticsearch, Mongo, Cassandra, Kubernetes REST API, BigTable). It works by implementing a full relational-algebra distributed execution engine to run SQL queries and poly-fill features missing from the underlying sources. So a backend key-value store such as Cassandra gains complete WHERE-clause support, aggregate functions, etc.

Most similar to prestodb, but written in Go and focused on making it easy to add custom data sources as well as REST API sources.

Storage Sources

Features

  • Distributed Run queries across multiple servers.
  • Hackable Sources Very easy to add a new source for your custom data, files, JSON, CSV, or storage.
  • Hackable Functions Add custom Go functions to extend the SQL language.
  • Joins Join across heterogeneous sources.
  • Frontends Currently only the MySQL protocol is supported; frontends are pluggable, and RethinkDB (for a real-time API) is planned.
  • Backends Elasticsearch, Google Datastore, Mongo, Cassandra, BigTable, and Kubernetes are currently implemented. CSV and JSON files, and custom formats (protobuf), are in progress.
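As a rough sketch of what "Hackable Sources" means in practice, here is a minimal, hypothetical in-memory source. The interface and type names below are illustrative only — they do not match the actual dataux/qlbridge interfaces — but they show the shape of the contract: enumerate tables, open one, iterate rows.

```go
package main

import "fmt"

// Source and RowIter are hypothetical stand-ins for the real source
// interfaces: a source lists its tables and returns a row iterator.
type Source interface {
	Tables() []string
	Open(table string) (RowIter, error)
}

type RowIter interface {
	Next() (map[string]interface{}, bool)
}

// memSource serves rows held in memory, e.g. parsed from a CSV file.
type memSource struct {
	tables map[string][]map[string]interface{}
}

func (s *memSource) Tables() []string {
	names := make([]string, 0, len(s.tables))
	for name := range s.tables {
		names = append(names, name)
	}
	return names
}

func (s *memSource) Open(table string) (RowIter, error) {
	rows, ok := s.tables[table]
	if !ok {
		return nil, fmt.Errorf("no such table %q", table)
	}
	return &sliceIter{rows: rows}, nil
}

type sliceIter struct {
	rows []map[string]interface{}
	i    int
}

func (it *sliceIter) Next() (map[string]interface{}, bool) {
	if it.i >= len(it.rows) {
		return nil, false
	}
	row := it.rows[it.i]
	it.i++
	return row, true
}

func main() {
	src := &memSource{tables: map[string][]map[string]interface{}{
		"players": {
			{"name": "Hank Aaron", "hr": 755},
			{"name": "Babe Ruth", "hr": 714},
		},
	}}
	iter, _ := src.Open("players")
	n := 0
	for _, ok := iter.Next(); ok; _, ok = iter.Next() {
		n++
	}
	fmt.Println("rows:", n) // rows: 2
}
```

The real contract lives in qlbridge's schema package and carries more (schema introspection, filtering hints); this only illustrates why adding a custom source is a small amount of code.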

Status

  • NOT production ready. Currently supports a few non-critical use-cases (ad-hoc queries, a support tool) in production.

Try it Out

These examples will:

  1. Create a CSV database of baseball data from http://seanlahman.com/baseball-archive/statistics/
  2. Connect to Google BigQuery public datasets (you will need a project, but the free quota will probably keep it free).
# download files to local /tmp
mkdir -p /tmp/baseball
cd /tmp/baseball
curl -Ls http://seanlahman.com/files/database/baseballdatabank-2017.1.zip > bball.zip
unzip bball.zip

mv baseball*/core/*.csv .
rm bball.zip
rm -rf baseballdatabank-*

# run a docker container locally
docker run -e "LOGGING=debug" --rm -it -p 4000:4000 \
  -v /tmp/baseball:/tmp/baseball \
  gcr.io/dataux-io/dataux:latest

In another console, open a MySQL client:

# connect to the docker container you just started
mysql -h 127.0.0.1 -P4000


-- Now create a new Source
CREATE source baseball WITH {
  "type":"cloudstore", 
  "schema":"baseball", 
  "settings" : {
     "type": "localfs",
     "format": "csv",
     "path": "baseball/",
     "localpath": "/tmp"
  }
};

show databases;

use baseball;

show tables;

describe appearances;

select count(*) from appearances;

select * from appearances limit 10;

BigQuery Example

# assuming you are running locally; if you are instead in Google Cloud or
# Google Container Engine, you don't need the credentials or the volume mount
docker run -e "GOOGLE_APPLICATION_CREDENTIALS=/.config/gcloud/application_default_credentials.json" \
  -e "LOGGING=debug" \
  --rm -it \
  -p 4000:4000 \
  -v ~/.config/gcloud:/.config/gcloud \
  gcr.io/dataux-io/dataux:latest

# now that dataux is running use mysql-client to connect
mysql -h 127.0.0.1 -P 4000

Now run some queries:

-- add a bigquery datasource
CREATE source `datauxtest` WITH {
    "type":"bigquery",
    "schema":"bqsf_bikes",
    "table_aliases" : {
       "bikeshare_stations" : "bigquery-public-data:san_francisco.bikeshare_stations"
    },
    "settings" : {
      "billing_project" : "your-google-cloud-project",
      "data_project" : "bigquery-public-data",
      "dataset" : "san_francisco"
    }
};

use bqsf_bikes;

show tables;

describe film_locations;

select * from film_locations limit 10;

Hacking

For now, the goal is to allow this to be used as a library, so the vendor directory is not checked in. Use the Docker containers or dep for now.

# run dep ensure
dep ensure -v 

Related Projects, Database Proxies & Multi-Data QL

  • Data Accessibility Making it easier to query, access, share, and use data. Protocol shifting (for accessibility). Sharing/replication between db types.
  • Scalability/Sharding Implement sharding, connection sharing.

| Name | Scaling | Ease of Access (sql, etc) | Comments |
| --- | --- | --- | --- |
| Vitess | Y | | for scaling (sharding), very mature |
| twemproxy | Y | | for scaling memcache |
| Couchbase N1QL | Y | Y | sql interface to couchbase k/v (and full-text index) |
| prestodb | | Y | query front end to multiple backends, distributed |
| cratedb | Y | Y | all-in-one db, not a proxy, sql to es |
| codis | Y | | for scaling redis |
| MariaDB MaxScale | Y | | for scaling mysql/mariadb (sharding), mature |
| Netflix Dynomite | Y | | not really sql, just multi-store k/v |
| redishappy | Y | | for scaling redis, haproxy |
| mixer | Y | | simple mysql sharding |

We use more and more databases, flat files, message queues, etc. For databases, the primary reader/writer is fine, but secondary readers (e.g. investigating ad-hoc issues) mean we might be accessing and learning many different query languages.

Credit to mixer; the MySQL connection pieces were derived from it (which was in turn forked from vitess).

Inspiration/Other works

In Internet architectures, data systems are typically categorized into source-of-truth systems that serve as primary stores for the user-generated writes, and derived data stores or indexes which serve reads and other complex queries. The data in these secondary stores is often derived from the primary data through custom transformations, sometimes involving complex processing driven by business logic. Similarly data in caching tiers is derived from reads against the primary data store, but needs to get invalidated or refreshed when the primary data gets mutated. A fundamental requirement emerging from these kinds of data architectures is the need to reliably capture, flow and process primary data changes.

from Databus

Building

I plan on getting the vendor directory checked in soon so the build will work. However, I am currently trying to figure out how to organize packages to allow use as both a library and a daemon (see how minimal main.go is, to encourage your own builtins and datasources).

# for just docker

# ensure /vendor has correct versions
dep ensure -update 

# build binary
./.build

# build docker

docker build -t gcr.io/dataux-io/dataux:v0.15.1 .

dataux's People

Contributors

araddon, mdmarek, pataquets


dataux's Issues

Schema content Meta-Analysis for planner & typing (csv/json)

Understand the contents of underlying data sources via inspection, to feed into planners and schema. Deep understanding of data types, volatility, cardinality, and mutability are the decisive factors in guiding good schema design, optimizations, and usage.

  • create a datastore of schema info and variables, just like mysql etc. Be able to utilize an in-mem, file, or persistent store. Depends on araddon/qlbridge#32
  • library utils for inspection of types in underlying csv/json sources to do type detection
  • cardinality for planners (per column)
  • mutability
    • is this row mutable? Read-only rows (that never get updated) can be reflected into read-only analytical stores. Also, on-disk scannable.
    • volatility: how often does it change? For low-cardinality, non-volatile columns (often enums) it might make sense to store those in memory.
  • Table info: row count, metrics (total byte size, writes/hour/day, reads/day/size, avg row size in bytes).
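As a rough illustration (not the actual dataux code), per-column cardinality over a sampled set of rows can be estimated by counting distinct values — one of the stats a planner can use to pick join order or decide which columns to hold in memory:

```go
package main

import "fmt"

// columnCardinality counts distinct values per column over a row sample.
// Illustrative sketch only; a real planner would use a sketch structure
// (e.g. HyperLogLog) for large samples.
func columnCardinality(rows []map[string]string) map[string]int {
	distinct := map[string]map[string]struct{}{}
	for _, row := range rows {
		for col, val := range row {
			if distinct[col] == nil {
				distinct[col] = map[string]struct{}{}
			}
			distinct[col][val] = struct{}{}
		}
	}
	card := map[string]int{}
	for col, vals := range distinct {
		card[col] = len(vals)
	}
	return card
}

func main() {
	sample := []map[string]string{
		{"team": "ATL", "player": "aaronha01"},
		{"team": "ATL", "player": "niekrph01"},
		{"team": "NYA", "player": "ruthba01"},
	}
	fmt.Println(columnCardinality(sample)["team"]) // 2
}
```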

File Source Improvements

Files don't reload after the schema initially loads, so new files don't get found. Additionally, it's not practical to even try to hold them all in memory. Allow a smarter lazy-load, LRU, and/or source-backed-source.

may depend on araddon/qlbridge#128

  • file-list filtering still isn't working
  • ensure we can query using date-range partitions to ONLY look at certain files.

Distributed Planner

Build a multi-node distributed query planner.

  • depends on araddon/qlbridge#65
  • 1 Nats Source
  • 2 Nats sink
  • 3 Distributed Worker Actors on Grid
  • 4 Partitionable Sources
  • 5 Serialization of sql request/plan
  • 6 Distributed Query Planner
  • unit-tests with 2 nodes and partitionable mongo


MySQL backtick quoting

Hello!

Thanks for this great project. I have been experimenting with it quite a bit, hoping to get an existing DB-based dashboard working with Google BigQuery. The dashboard is written using a SQL framework, and the queries generated always have fields and table names quoted, like this:

select title, release_year AS year, `locations` from `film_locations` limit 10;

This is valid in MySQL but not in BigQuery. We need to somehow remove the backtick quotes, or replace them with BigQuery quoting (which is [field_quoted]).

2017/08/17 03:37:11.874755 resultreader.go:123: 0xc42072cd20 bq limit: 10 sel:SELECT title, release_year AS year, `locations` FROM [bigquery-public-data:san_francisco.film_locations] LIMIT 10
2017/08/17 03:37:12.438817 resultreader.go:148: could not run {Location: "query"; Message: "Field '`locations`' not found in table 'bigquery-public-data:san_francisco.film_locations'; did you mean 'locations'?"; Reason: "invalidQuery"}
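One possible direction, sketched here (not the project's actual fix): strip the backtick quoting from the SQL text before handing it to BigQuery Legacy SQL. A real implementation would rewrite identifiers in the parsed AST rather than the raw string, since this naive version would also mangle backticks inside string literals:

```go
package main

import (
	"fmt"
	"strings"
)

// stripBackticks removes MySQL backtick identifier quoting, which
// BigQuery Legacy SQL rejects (it quotes with [brackets] instead).
// Naive: operates on the raw text, not the parsed AST.
func stripBackticks(sql string) string {
	var b strings.Builder
	for _, r := range sql {
		if r != '`' {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	q := "select title, release_year AS year, `locations` from `film_locations` limit 10"
	fmt.Println(stripBackticks(q))
	// select title, release_year AS year, locations from film_locations limit 10
}
```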

BigQuery source

Implement backend source for big-query

  • source implementation, schema discovery.
  • support CREATE SOURCE statement
  • basic queries.

Can't run with source.

Finally, I found this awesome project.

I need an Elasticsearch SQL proxy, and I had a trial with your binary; it's nearly perfect. But I have an issue when querying ES-style table names.

Our Elasticsearch index is named like "logstash-nginxlog-2017.10", and that doesn't work in a dataux SQL query:

mysql> select ip from logstash-nginxlog-2017.10;
ERROR 1105 (HY000): QLBridge.plan:  No datasource found

I think this may be caused by the dot in the name, so I want to fix it. But I can't build from source, even though I'm sure I already ran go get first.

Please give me a hint. I want to contribute more to the ES module.

$ go run main.go
# cloud.google.com/go/longrunning/autogen
../gopath/src/cloud.google.com/go/longrunning/autogen/doc.go:30:11: undefined: metadata.FromOutgoingContext
../gopath/src/cloud.google.com/go/longrunning/autogen/doc.go:33:9: undefined: metadata.NewOutgoingContext
# cloud.google.com/go/datastore
../gopath/src/cloud.google.com/go/datastore/client.go:101:8: undefined: metadata.NewOutgoingContext
# github.com/coreos/etcd/clientv3
../gopath/src/github.com/coreos/etcd/clientv3/logger.go:26:13: undefined: grpclog.LoggerV2

$ go env
GOARCH="amd64"
GOBIN="/wwwroot/gopath//pkg"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/wwwroot/gopath/"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.9/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.9/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/fm/29zg12712ql3y0l0j3t9_mkw0000gn/T/go-build650601297=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

Cassandra backend

Cassandra backend

  • core data source scanner etc for vanilla rows
    • core connection and schema discovery
    • show tables, describe tables
    • predicate push down for where clause if possible for partition-key
    • projection rewrite
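The partition-key push-down bullet can be sketched as follows (illustrative only, not the actual dataux planner): equality predicates on partition-key columns go into the CQL WHERE clause, while everything else stays behind as an engine-side poly-fill filter:

```go
package main

import (
	"fmt"
	"strings"
)

// splitPredicates divides equality filters into those Cassandra can
// evaluate (on partition-key columns) and those the engine must apply
// itself. Illustrative: a real implementation would use bound
// placeholders, not string interpolation, and handle more operators.
func splitPredicates(eq map[string]string, partitionKeys []string) (pushed, local []string) {
	keySet := map[string]bool{}
	for _, k := range partitionKeys {
		keySet[k] = true
	}
	for col, val := range eq {
		pred := fmt.Sprintf("%s = '%s'", col, val)
		if keySet[col] {
			pushed = append(pushed, pred)
		} else {
			local = append(local, pred)
		}
	}
	return pushed, local
}

func main() {
	pushed, local := splitPredicates(
		map[string]string{"playerid": "ruthba01", "team": "NYA"},
		[]string{"playerid"},
	)
	fmt.Println("CQL WHERE:", strings.Join(pushed, " AND "))   // CQL WHERE: playerid = 'ruthba01'
	fmt.Println("local filter:", strings.Join(local, " AND ")) // local filter: team = 'NYA'
}
```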

BigQuery Standard SQL

I have just realized that the legacy_syntax option has not been implemented. I think it makes sense to implement, and maybe even default to, Standard SQL, since Standard SQL is much more similar to MySQL. It has proper support for DISTINCT and DATE types.

To implement Standard SQL, we need to apply it on the query:

        // Use standard SQL syntax for queries.
        // See: https://cloud.google.com/bigquery/sql-reference/
        query.QueryConfig.UseStandardSQL = true

And then update the quote character to ` instead of []. [1] (May need to reconsider the changes made in #55.)

Test case:

MySQL :

select `name` from `bikeshare_stations` LIMIT 1;

BigQuery Legacy SQL:

select [name] from [bigquery-public-data:san_francisco.bikeshare_stations] LIMIT 1;

BigQuery Standard SQL:

select `name` from `bigquery-public-data.san_francisco.bikeshare_stations` LIMIT 1;

Docs

[1] : https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#syntax_differences

RethinkDB support

Hi,

I've just found this project, which is really cool. I like the possibilities it gives to let old applications interact with a modern nosql database instead of postgresql or mysql.

I was thinking about RethinkDB support. Is that anything you guys are looking into?

/Tommy

K8 Backend

Be able to query Kubernetes as if it were a MySQL server. Allow easier consumption of info in Kubernetes, such as lists of pods, services, and IPs, by making it easy to run MySQL queries.

Phase 1 #31

  • initial source with discovery/schema
  • pods
  • services
  • nodes

Phase 2

  • Choose proxy/discovery method to the api for kube (ie, kubectl proxy, etc) via config
  • push-down of where clause to api.ListOptions
  • think my api 1.4 is wrong, versionless?
  • endpoints
  • deployments
  • event
  • persistentvolumes
  • resourcequota
  • replicationcontroller
  • secret
  • projection not working on select yy(creationtimestamp) from pods because of lack of polyfill on projected cols

MongoDB Features

Additional features for mongo.

  • polyfill group-by for non-agg funcs
  • usage as go sql driver
  • extract generator
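The group-by poly-fill bullet amounts to aggregating in the engine when the backend can't do it for us; a minimal sketch (illustrative, not the dataux implementation) counting rows per group key:

```go
package main

import "fmt"

// groupCount groups streamed rows by key in the engine and counts
// per group — the poly-fill path when the backend has no GROUP BY.
func groupCount(rows []map[string]interface{}, key string) map[interface{}]int {
	counts := map[interface{}]int{}
	for _, row := range rows {
		counts[row[key]]++
	}
	return counts
}

func main() {
	rows := []map[string]interface{}{
		{"team": "ATL"}, {"team": "ATL"}, {"team": "NYA"},
	}
	fmt.Println(groupCount(rows, "team")["ATL"]) // 2
}
```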

convert grid to v3

Refactor to use the new grid.v3 pkg instead of v2.

  • Refactor the Source/Sink library to use named mailboxes checked out from a pool, and that are pre-provisioned
  • ensure mongo test runs

Cannot build from source

I want to create a build-from-source Docker image to make it easier to start hacking.
My Dockerfile is:

FROM golang

COPY . /go/src/github.com/dataux/dataux/
WORKDIR /go/src/github.com/dataux/dataux/

RUN \
  go build

When I run docker build -t dataux ., it always fails this way:

backends/bigquery/resultreader.go:7:2: cannot find package "cloud.google.com/go/bigquery" in any of:
	/usr/local/go/src/cloud.google.com/go/bigquery (from $GOROOT)
	/go/src/cloud.google.com/go/bigquery (from $GOPATH)
backends/bigtable/resultreader.go:10:2: cannot find package "cloud.google.com/go/bigtable" in any of:
	/usr/local/go/src/cloud.google.com/go/bigtable (from $GOROOT)
	/go/src/cloud.google.com/go/bigtable (from $GOPATH)
backends/bigquery/resultreader.go:8:2: cannot find package "cloud.google.com/go/civil" in any of:
	/usr/local/go/src/cloud.google.com/go/civil (from $GOROOT)
	/go/src/cloud.google.com/go/civil (from $GOPATH)
backends/datastore/datasource.go:13:2: cannot find package "cloud.google.com/go/datastore" in any of:
	/usr/local/go/src/cloud.google.com/go/datastore (from $GOROOT)
	/go/src/cloud.google.com/go/datastore (from $GOPATH)
backends/bigtable/resultreader.go:11:2: cannot find package "github.com/araddon/dateparse" in any of:
	/usr/local/go/src/github.com/araddon/dateparse (from $GOROOT)
	/go/src/github.com/araddon/dateparse (from $GOPATH)
main.go:27:2: cannot find package "github.com/araddon/gou" in any of:
	/usr/local/go/src/github.com/araddon/gou (from $GOROOT)
	/go/src/github.com/araddon/gou (from $GOPATH)
backends/bigquery/resultreader.go:13:2: cannot find package "github.com/araddon/qlbridge/datasource" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/datasource (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/datasource (from $GOPATH)
main.go:14:2: cannot find package "github.com/araddon/qlbridge/datasource/files" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/datasource/files (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/datasource/files (from $GOPATH)
backends/bigquery/resultreader.go:14:2: cannot find package "github.com/araddon/qlbridge/exec" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/exec (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/exec (from $GOPATH)
backends/bigquery/sql_to_bq.go:19:2: cannot find package "github.com/araddon/qlbridge/expr" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/expr (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/expr (from $GOPATH)
vendored/mixer/proxy/handler_sharded.go:17:2: cannot find package "github.com/araddon/qlbridge/expr/builtins" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/expr/builtins (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/expr/builtins (from $GOPATH)
backends/bigtable/sql_to_bt.go:18:2: cannot find package "github.com/araddon/qlbridge/lex" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/lex (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/lex (from $GOPATH)
backends/bigquery/sql_to_bq.go:20:2: cannot find package "github.com/araddon/qlbridge/plan" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/plan (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/plan (from $GOPATH)
backends/bigquery/resultreader.go:15:2: cannot find package "github.com/araddon/qlbridge/rel" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/rel (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/rel (from $GOPATH)
backends/bigquery/source.go:19:2: cannot find package "github.com/araddon/qlbridge/schema" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/schema (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/schema (from $GOPATH)
backends/bigquery/resultreader.go:16:2: cannot find package "github.com/araddon/qlbridge/value" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/value (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/value (from $GOPATH)
backends/bigtable/sql_to_bt.go:23:2: cannot find package "github.com/araddon/qlbridge/vm" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/vm (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/vm (from $GOPATH)
planner/plan_server.go:14:2: cannot find package "github.com/coreos/etcd/clientv3" in any of:
	/usr/local/go/src/github.com/coreos/etcd/clientv3 (from $GOROOT)
	/go/src/github.com/coreos/etcd/clientv3 (from $GOPATH)
backends/cassandra/source.go:11:2: cannot find package "github.com/gocql/gocql" in any of:
	/usr/local/go/src/github.com/gocql/gocql (from $GOROOT)
	/go/src/github.com/gocql/gocql (from $GOPATH)
planner/msgs.pb.go:18:8: cannot find package "github.com/golang/protobuf/proto" in any of:
	/usr/local/go/src/github.com/golang/protobuf/proto (from $GOROOT)
	/go/src/github.com/golang/protobuf/proto (from $GOPATH)
backends/cassandra/source.go:12:2: cannot find package "github.com/hailocab/go-hostpool" in any of:
	/usr/local/go/src/github.com/hailocab/go-hostpool (from $GOROOT)
	/go/src/github.com/hailocab/go-hostpool (from $GOPATH)
vendored/mixer/proxy/handler_sharded.go:15:2: cannot find package "github.com/kr/pretty" in any of:
	/usr/local/go/src/github.com/kr/pretty (from $GOROOT)
	/go/src/github.com/kr/pretty (from $GOPATH)
models/config.go:8:2: cannot find package "github.com/lytics/confl" in any of:
	/usr/local/go/src/github.com/lytics/confl (from $GOROOT)
	/go/src/github.com/lytics/confl (from $GOPATH)
planner/sqlactor.go:11:2: cannot find package "github.com/lytics/dfa" in any of:
	/usr/local/go/src/github.com/lytics/dfa (from $GOROOT)
	/go/src/github.com/lytics/dfa (from $GOPATH)
backends/lytics/resultreader.go:9:2: cannot find package "github.com/lytics/go-lytics" in any of:
	/usr/local/go/src/github.com/lytics/go-lytics (from $GOROOT)
	/go/src/github.com/lytics/go-lytics (from $GOPATH)
planner/leader.go:10:2: cannot find package "github.com/lytics/grid/grid.v3" in any of:
	/usr/local/go/src/github.com/lytics/grid/grid.v3 (from $GOROOT)
	/go/src/github.com/lytics/grid/grid.v3 (from $GOPATH)
planner/plan_server.go:16:2: cannot find package "github.com/sony/sonyflake" in any of:
	/usr/local/go/src/github.com/sony/sonyflake (from $GOROOT)
	/go/src/github.com/sony/sonyflake (from $GOPATH)
backends/bigquery/resultreader.go:10:2: cannot find package "golang.org/x/net/context" in any of:
	/usr/local/go/src/golang.org/x/net/context (from $GOROOT)
	/go/src/golang.org/x/net/context (from $GOPATH)
backends/datastore/datasource.go:16:2: cannot find package "golang.org/x/oauth2/google" in any of:
	/usr/local/go/src/golang.org/x/oauth2/google (from $GOROOT)
	/go/src/golang.org/x/oauth2/google (from $GOPATH)
backends/datastore/datasource.go:17:2: cannot find package "golang.org/x/oauth2/jwt" in any of:
	/usr/local/go/src/golang.org/x/oauth2/jwt (from $GOROOT)
	/go/src/golang.org/x/oauth2/jwt (from $GOPATH)
backends/bigquery/resultreader.go:11:2: cannot find package "google.golang.org/api/iterator" in any of:
	/usr/local/go/src/google.golang.org/api/iterator (from $GOROOT)
	/go/src/google.golang.org/api/iterator (from $GOPATH)
backends/datastore/datasource.go:19:2: cannot find package "google.golang.org/api/option" in any of:
	/usr/local/go/src/google.golang.org/api/option (from $GOROOT)
	/go/src/google.golang.org/api/option (from $GOPATH)
backends/mongo/mgo_results.go:9:2: cannot find package "gopkg.in/mgo.v2" in any of:
	/usr/local/go/src/gopkg.in/mgo.v2 (from $GOROOT)
	/go/src/gopkg.in/mgo.v2 (from $GOPATH)
backends/mongo/mgo_results.go:10:2: cannot find package "gopkg.in/mgo.v2/bson" in any of:
	/usr/local/go/src/gopkg.in/mgo.v2/bson (from $GOROOT)
	/go/src/gopkg.in/mgo.v2/bson (from $GOPATH)
backends/kubernetes/client.go:14:2: cannot find package "k8s.io/apimachinery/pkg/apis/meta/v1" in any of:
	/usr/local/go/src/k8s.io/apimachinery/pkg/apis/meta/v1 (from $GOROOT)
	/go/src/k8s.io/apimachinery/pkg/apis/meta/v1 (from $GOPATH)
backends/kubernetes/source.go:15:2: cannot find package "k8s.io/client-go/kubernetes" in any of:
	/usr/local/go/src/k8s.io/client-go/kubernetes (from $GOROOT)
	/go/src/k8s.io/client-go/kubernetes (from $GOPATH)
backends/kubernetes/client.go:15:2: cannot find package "k8s.io/client-go/pkg/api/v1" in any of:
	/usr/local/go/src/k8s.io/client-go/pkg/api/v1 (from $GOROOT)
	/go/src/k8s.io/client-go/pkg/api/v1 (from $GOPATH)
backends/kubernetes/source.go:16:2: cannot find package "k8s.io/client-go/rest" in any of:
	/usr/local/go/src/k8s.io/client-go/rest (from $GOROOT)
	/go/src/k8s.io/client-go/rest (from $GOPATH)
backends/kubernetes/source.go:17:2: cannot find package "k8s.io/client-go/tools/clientcmd" in any of:
	/usr/local/go/src/k8s.io/client-go/tools/clientcmd (from $GOROOT)
	/go/src/k8s.io/client-go/tools/clientcmd (from $GOPATH)

I haven't found any build specific info elsewhere in the repo.

Also, a build-from-source Dockerfile would allow setting up an Automated Build, a very convenient way of providing both tagged and daily builds automatically on each push, with the added benefit of better auditability and traceability of the source used to build the Docker image. For some people, images pushed in a 'black box' way are a deal-breaker for security reasons. I would happily send a PR if that is deemed a worthwhile improvement on the current Dockerfile.

Thanks in advance.

Implement Google Datastore Backend

Create a dataux Backend for Google Datastore

  • Datasource implementation
    • Tables() get list of available table names
    • scan some set of rows to do type inspection.
    • implement/expose this scan/introspect to refresh
    • insert
    • delete
    • update
    • better key, ie creation of google datastore key with knowledge of fields
      • way to identify the primary key field? and utilize it in inserts
      • where ancestor is expressed re columns (id, aid) (multi column index? araddon/qlbridge#32)
      • some way to implement ancestors
  • ToSql converter w tests
    • Where, poly-fill filtering
    • Sort
    • projection
  • Type Features
    • implement support for Scan() - for json the rest of the entity

MySQL session died after BQ backend error

When a query passes MySQL validation and then fails at the BigQuery backend, the server throws an error (while the client receives an empty result set) and apparently breaks the session. The client must re-connect.

Below is an example: the "DISTINCT" keyword is valid in MySQL, but not supported by BigQuery (Legacy SQL).

MySQL client:

mysql> select distinct name from stats limit 1;
Empty set (0.60 sec)

mysql> select name from stats limit 1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> 

Server log:

2017/08/22 02:55:08.964618 mysql_handler.go:170: 0 0xc4203b7500 handleQuery: select distinct name from stats limit 1
2017/08/22 02:55:08.964800 resultreader.go:124: 0xc420155a40 bq limit: 1 sel:SELECT DISTINCT name FROM [project_id:dataset_id.stats] LIMIT 1
2017/08/22 02:55:09.572053 resultreader.go:152: could not run {Location: "query"; Message: "syntax error at: 1.1 - 1.31. SELECT DISTINCT is currently not supported. Please use GROUP BY instead to get the same effect."; Reason: "invalidQuery"}
2017/08/22 02:55:09.572083 resultreader.go:97: nice, finalize ResultReader out: 0xc4205d3ec0  row ct 0
2017/08/22 02:55:09.572103 task_sequential.go:152: *bigquery.ResultReader.Run() errored {Location: "query"; Message: "syntax error at: 1.1 - 1.31. SELECT DISTINCT is currently not supported. Please use GROUP BY instead to get the same effect."; Reason: "invalidQuery"}
2017/08/22 02:55:09.572157 mysql_handler.go:279: error on Query.Run(): {Location: "query"; Message: "syntax error at: 1.1 - 1.31. SELECT DISTINCT is currently not supported. Please use GROUP BY instead to get the same effect."; Reason: "invalidQuery"}
2017/08/22 02:55:09.572170 mysql_handler.go:287: completed in 607.52944ms   ns: 607529719
2017/08/22 02:55:09.572188 conn.go:129: got error on handle {Location: "query"; Message: "syntax error at: 1.1 - 1.31. SELECT DISTINCT is currently not supported. Please use GROUP BY instead to get the same effect."; Reason: "invalidQuery"}
2017/08/22 02:55:17.168556 mysql_handler.go:170: 0 0xc4203b7500 handleQuery: select name from stats limit 1
2017/08/22 02:55:17.168754 resultreader.go:124: 0xc42086e780 bq limit: 1 sel:SELECT name FROM [project_id:dataset_id.stats] LIMIT 1
2017/08/22 02:55:19.422496 resultreader.go:195: finished query, took: 2.253726159s for 1 rows
2017/08/22 02:55:19.422525 resultreader.go:97: nice, finalize ResultReader out: 0xc420783200  row ct 1
2017/08/22 02:55:19.422588 conn_writer.go:145: Could not write mysql out? size=40, err=connection was bad
2017/08/22 02:55:19.422605 mysql_handler.go:287: completed in 2.254027393s   ns: 2254027597

Create an official, Automated Build image on Docker Hub

Docker Hub allows you to create Automated Builds from source: https://docs.docker.com/docker-hub/builds/
It would add another packaging/distribution/installation method, with builds triggered automatically on each commit. It also allows creating different image tags from git tags & branches.
Also, documentation could easily include a canonical docker run statement to quickly spin up a Dataux instance with a single command.

By building the image via an AB, you give the resulting image verifiability and auditability, and the build is fully automatic. You can have the latest image tag built from HEAD and individual image tags from git's release tags.
Some people avoid non-verifiable (manually uploaded) images for security & traceability reasons.

Docker search command clearly displays AB when listing images (mine is shown):

$ docker search dataux
NAME                      DESCRIPTION     STARS     OFFICIAL   AUTOMATED
pataquets/dataux          dataux          0                    [OK]

Just a free Docker Hub account and a quick setup would do. Ping me if you need help.

k8s backend vendor issues

As of 10/13/2017, I cannot get the Kubernetes backend to compile. There seem to be unresolvable vendor dependency conflicts between k8s & etcd.

File Api Source

For file sources, list their files as a source table.

  • list files for source
  • append values to this table based on info in folder, meta-data

Easier startup: Default/Zero config and CREATE SOURCE

Make it easier to try out dataux and get started.

  • pre-built binaries, instructions, container image
    • kube deployment/service
  • make config-file editing a thing of the past. Instead allow the MySQL CREATE SOURCE command to be the equivalent of a conf file.

refs: #38 make persistent schema source

  • build script for docker file to create gcr.io image
  • kube deployment & service yaml
  • sniff kube if inside cluster
  • allow create to define kube location
  • commands in readme to run via kubectl
  • commands in readme for download/run
  • get someone else to try it, to verify it's easy

update Google Cloud API client import paths and more

The Google Cloud API client libraries for Go are making some breaking changes:

  • The import paths are changing from google.golang.org/cloud/... to
    cloud.google.com/go/.... For example, if your code imports the BigQuery client
    it currently reads
    import "google.golang.org/cloud/bigquery"
    It should be changed to
    import "cloud.google.com/go/bigquery"
  • Client options are also moving, from google.golang.org/cloud to
    google.golang.org/api/option. Two have also been renamed:
    • WithBaseGRPC is now WithGRPCConn
    • WithBaseHTTP is now WithHTTPClient
  • The cloud.WithContext and cloud.NewContext methods are gone, as are the
    deprecated pubsub and container functions that required them. Use the Client
    methods of these packages instead.

You should make these changes before September 12, 2016, when the packages at
google.golang.org/cloud will go away.

File Reading (csv, json, proto) sources

Be able to query files on disk, GCS, or S3. Support CSV, JSON, or custom (protobuf) formats.

  • Tables() get list of tables (folders)
  • load-test data
  • Table(name) ->
  • Open(table)
  • csv/json/proto/custom parsers interface
    • parsers
    • introspection
  • partitioning (done)
  • limit

Other optional

  • max byte size
  • pre-fetch? cache?

BigTable backend

Backend source for google big-table.

Phase I

  • plain vanilla selects

Phase II

  • schema discovery to inspect columns and give them data types for describe.
  • way to define keys, composite keys (similar to Cassandra). Must modify the AST to contain this info; how to express it?
    • keyFunc()
    • utilize new CREATE statement in qlbridge?

Cassandra backend part 2

  • read partition info from schema and re-use in dataux
  • support detecting json inside columns, and exploding those
  • pluggable row parsers, because contents of columns may be protobuf etc. Possibly same/similar to files?

Persistent Source & Schema Storage

Instead of keeping all source information (i.e., which data sources, their connection info, tables) in either 1) config files or 2) memory only, create storage for it. Ideally, long term, this schema storage is Raft-based for real-time consistency.

  • Allow the storage of schema to use any of the source-backed-source concepts

def.RawData undefined (type *grid.ActorDef has no field or method RawData) (solved; workaround for others until resolved)

I had to patch with https://github.com/lytics/grid/pull/66/files

go get -u github.com/dataux/dataux
# github.com/dataux/dataux/planner
go/src/github.com/dataux/dataux/planner/server.go:57: def.RawData undefined (type *grid.ActorDef has no field or method RawData)
go/src/github.com/dataux/dataux/planner/sqlactor.go:122: m.def.RawData undefined (type *grid.ActorDef has no field or method RawData)
