
Federated MySQL-compatible proxy to Elasticsearch, Mongo, Cassandra, BigTable, Google Datastore

License: MIT License

Go 96.98% Makefile 0.03% Yacc 1.92% Shell 1.05% Dockerfile 0.02%
database go golang elasticsearch query-engine sql google-datastore mongo sql-query mysql-protocol

dataux's Introduction

SQL Query Proxy to Elasticsearch, Mongo, Kubernetes, BigTable, etc.

Unify disparate data sources and files into a single federated view of your data, and query it with SQL, without copying anything into a data warehouse.

MySQL-compatible federated query engine for Elasticsearch, Mongo, Google Datastore, Cassandra, Google BigTable, Kubernetes, and file-based sources. The engine hosts a MySQL-protocol listener and rewrites SQL queries into each backend's native query form (Elasticsearch, Mongo, Cassandra, Kubernetes REST API, BigTable). It works by implementing a full relational-algebra distributed execution engine to run SQL queries and poly-fill features missing from the underlying sources. So a backend key-value store such as Cassandra gains complete WHERE-clause support, aggregate functions, etc.

Most similar to prestodb, but written in Go and focused on making it easy to add custom data sources as well as REST API sources.

Storage Sources

Features

  • Distributed Run queries across multiple servers.
  • Hackable Sources Very easy to add a new source for your custom data, files, JSON, CSV, or storage.
  • Hackable Functions Add custom Go functions to extend the SQL language.
  • Joins Join across heterogeneous sources.
  • Frontends Currently only the MySQL protocol is supported; frontends are pluggable, and RethinkDB (for a real-time API) is planned.
  • Backends Elasticsearch, Google Datastore, Mongo, Cassandra, BigTable, and Kubernetes are currently implemented. CSV and JSON files, and custom formats (protobuf), are in progress.
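As a rough sketch of what "Hackable Sources" means in practice, here is a minimal, hypothetical in-memory source. The interface and type names below are illustrative only — they do not match the actual dataux/qlbridge interfaces — but they show the shape of the contract: enumerate tables, open one, iterate rows.

```go
package main

import "fmt"

// Source and RowIter are hypothetical stand-ins for the real source
// interfaces: a source lists its tables and returns a row iterator.
type Source interface {
	Tables() []string
	Open(table string) (RowIter, error)
}

type RowIter interface {
	Next() (map[string]interface{}, bool)
}

// memSource serves rows held in memory, e.g. parsed from a CSV file.
type memSource struct {
	tables map[string][]map[string]interface{}
}

func (s *memSource) Tables() []string {
	names := make([]string, 0, len(s.tables))
	for name := range s.tables {
		names = append(names, name)
	}
	return names
}

func (s *memSource) Open(table string) (RowIter, error) {
	rows, ok := s.tables[table]
	if !ok {
		return nil, fmt.Errorf("no such table %q", table)
	}
	return &sliceIter{rows: rows}, nil
}

type sliceIter struct {
	rows []map[string]interface{}
	i    int
}

func (it *sliceIter) Next() (map[string]interface{}, bool) {
	if it.i >= len(it.rows) {
		return nil, false
	}
	row := it.rows[it.i]
	it.i++
	return row, true
}

func main() {
	src := &memSource{tables: map[string][]map[string]interface{}{
		"players": {
			{"name": "Hank Aaron", "hr": 755},
			{"name": "Babe Ruth", "hr": 714},
		},
	}}
	iter, _ := src.Open("players")
	n := 0
	for _, ok := iter.Next(); ok; _, ok = iter.Next() {
		n++
	}
	fmt.Println("rows:", n) // rows: 2
}
```

The real contract lives in qlbridge's schema package and carries more (schema introspection, filtering hints); this only illustrates why adding a custom source is a small amount of code.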

Status

  • NOT production ready. Currently supports a few non-critical use-cases (ad-hoc queries, a support tool) in production.

Try it Out

These examples will:

  1. Create a CSV database of baseball data from http://seanlahman.com/baseball-archive/statistics/
  2. Connect to Google BigQuery public datasets (you will need a project, but the free quota will probably keep it free).
# download files to local /tmp
mkdir -p /tmp/baseball
cd /tmp/baseball
curl -Ls http://seanlahman.com/files/database/baseballdatabank-2017.1.zip > bball.zip
unzip bball.zip

mv baseball*/core/*.csv .
rm bball.zip
rm -rf baseballdatabank-*

# run a docker container locally
docker run -e "LOGGING=debug" --rm -it -p 4000:4000 \
  -v /tmp/baseball:/tmp/baseball \
  gcr.io/dataux-io/dataux:latest

In another console, open a MySQL client:

# connect to the docker container you just started
mysql -h 127.0.0.1 -P4000


-- Now create a new Source
CREATE source baseball WITH {
  "type":"cloudstore", 
  "schema":"baseball", 
  "settings" : {
     "type": "localfs",
     "format": "csv",
     "path": "baseball/",
     "localpath": "/tmp"
  }
};

show databases;

use baseball;

show tables;

describe appearances;

select count(*) from appearances;

select * from appearances limit 10;

BigQuery Example

# assuming you are running locally; if you are instead in Google Cloud or
# Google Container Engine, you don't need the credentials or the volume mount
docker run -e "GOOGLE_APPLICATION_CREDENTIALS=/.config/gcloud/application_default_credentials.json" \
  -e "LOGGING=debug" \
  --rm -it \
  -p 4000:4000 \
  -v ~/.config/gcloud:/.config/gcloud \
  gcr.io/dataux-io/dataux:latest

# now that dataux is running use mysql-client to connect
mysql -h 127.0.0.1 -P 4000

Now run some queries:

-- add a bigquery datasource
CREATE source `datauxtest` WITH {
    "type":"bigquery",
    "schema":"bqsf_bikes",
    "table_aliases" : {
       "bikeshare_stations" : "bigquery-public-data:san_francisco.bikeshare_stations"
    },
    "settings" : {
      "billing_project" : "your-google-cloud-project",
      "data_project" : "bigquery-public-data",
      "dataset" : "san_francisco"
    }
};

use bqsf_bikes;

show tables;

describe film_locations;

select * from film_locations limit 10;

Hacking

For now, the goal is to allow this to be used as a library, so the vendor directory is not checked in. Use the Docker containers or dep for now.

# run dep ensure
dep ensure -v 

Related Projects, Database Proxies & Multi-Data QL

  • Data Accessibility Making it easier to query, access, share, and use data. Protocol shifting (for accessibility). Sharing/replication between db types.
  • Scalability/Sharding Implement sharding, connection sharing.

| Name | Scaling | Ease of Access (sql, etc) | Comments |
| --- | --- | --- | --- |
| Vitess | Y | | for scaling (sharding), very mature |
| twemproxy | Y | | for scaling memcache |
| Couchbase N1QL | Y | Y | sql interface to couchbase k/v (and full-text index) |
| prestodb | | Y | query front end to multiple backends, distributed |
| cratedb | Y | Y | all-in-one db, not a proxy, sql to es |
| codis | Y | | for scaling redis |
| MariaDB MaxScale | Y | | for scaling mysql/mariadb (sharding), mature |
| Netflix Dynomite | Y | | not really sql, just multi-store k/v |
| redishappy | Y | | for scaling redis, haproxy |
| mixer | Y | | simple mysql sharding |

We use more and more databases, flat files, message queues, etc. For databases, the primary reader/writer is fine, but secondary readers (e.g. investigating ad-hoc issues) mean we might be accessing and learning many different query languages.

Credit to mixer; the MySQL connection pieces were derived from it (which was in turn forked from vitess).

Inspiration/Other works

In Internet architectures, data systems are typically categorized into source-of-truth systems that serve as primary stores for the user-generated writes, and derived data stores or indexes which serve reads and other complex queries. The data in these secondary stores is often derived from the primary data through custom transformations, sometimes involving complex processing driven by business logic. Similarly data in caching tiers is derived from reads against the primary data store, but needs to get invalidated or refreshed when the primary data gets mutated. A fundamental requirement emerging from these kinds of data architectures is the need to reliably capture, flow and process primary data changes.

from Databus

Building

I plan on getting the vendor directory checked in soon so the build will work. However, I am currently trying to figure out how to organize packages to allow use as both a library and a daemon (see how minimal main.go is, to encourage your own builtins and datasources).

# for just docker

# ensure /vendor has correct versions
dep ensure -update 

# build binary
./.build

# build docker

docker build -t gcr.io/dataux-io/dataux:v0.15.1 .

dataux's People

Contributors

araddon, mdmarek, pataquets


dataux's Issues

Schema content Meta-Analysis for planner & typing (csv/json)

Understand the contents of underlying data sources via inspection, to feed into planners and schema. Deep understanding of data types, volatility, cardinality, and mutability are the decisive factors in guiding good schema design, optimizations, and usage.

  • create a datastore of schema info and variables, just like mysql etc. Be able to utilize an in-mem, file, or persistent store. Depends on araddon/qlbridge#32
  • library utils for inspection of types in underlying csv/json sources to do type detection
  • cardinality for planners (per column)
  • mutability
    • is this row mutable? Read-only rows (that never get updated) can be reflected into read-only analytical stores. Also, on-disk scannable.
    • volatility: how often does it change? For low-cardinality, non-volatile columns (often enums) it might make sense to store those in memory.
  • Table info: row count, metrics (total byte size, writes/hour/day, reads/day/size, avg row size in bytes).
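As a rough illustration (not the actual dataux code), per-column cardinality over a sampled set of rows can be estimated by counting distinct values — one of the stats a planner can use to pick join order or decide which columns to hold in memory:

```go
package main

import "fmt"

// columnCardinality counts distinct values per column over a row sample.
// Illustrative sketch only; a real planner would use a sketch structure
// (e.g. HyperLogLog) for large samples.
func columnCardinality(rows []map[string]string) map[string]int {
	distinct := map[string]map[string]struct{}{}
	for _, row := range rows {
		for col, val := range row {
			if distinct[col] == nil {
				distinct[col] = map[string]struct{}{}
			}
			distinct[col][val] = struct{}{}
		}
	}
	card := map[string]int{}
	for col, vals := range distinct {
		card[col] = len(vals)
	}
	return card
}

func main() {
	sample := []map[string]string{
		{"team": "ATL", "player": "aaronha01"},
		{"team": "ATL", "player": "niekrph01"},
		{"team": "NYA", "player": "ruthba01"},
	}
	fmt.Println(columnCardinality(sample)["team"]) // 2
}
```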

File Source Improvements

Files don't reload after the schema initially loads, so new files don't get found. Additionally, it's not practical to even try to hold them all in memory. Allow a smarter lazy-load, LRU, and/or source-backed-source.

may depend on araddon/qlbridge#128

  • file-list filtering still isn't working
  • ensure we can query using date-range partitions to ONLY look at certain files.

Distributed Planner

Build a multi-node distributed query planner.

  • depends on araddon/qlbridge#65
  • 1 Nats Source
  • 2 Nats sink
  • 3 Distributed Worker Actors on Grid
  • 4 Partitionable Sources
  • 5 Serialization of sql request/plan
  • 6 Distributed Query Planner
  • unit-tests with 2 nodes and partitionable mongo


MySQL backtick quoting

Hello!

Thanks for this great project. I have been experimenting with it quite a bit, hoping to get an existing DB-based dashboard working with Google BigQuery. The dashboard is written using a SQL framework, and the queries generated always have fields and table names quoted, like this:

select title, release_year AS year, `locations` from `film_locations` limit 10;

This is valid in MySQL but not in BigQuery. We need to somehow remove the backtick quotes, or replace them with BigQuery quoting (which is [field_quoted]).

2017/08/17 03:37:11.874755 resultreader.go:123: 0xc42072cd20 bq limit: 10 sel:SELECT title, release_year AS year, `locations` FROM [bigquery-public-data:san_francisco.film_locations] LIMIT 10
2017/08/17 03:37:12.438817 resultreader.go:148: could not run {Location: "query"; Message: "Field '`locations`' not found in table 'bigquery-public-data:san_francisco.film_locations'; did you mean 'locations'?"; Reason: "invalidQuery"}
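One possible direction, sketched here (not the project's actual fix): strip the backtick quoting from the SQL text before handing it to BigQuery Legacy SQL. A real implementation would rewrite identifiers in the parsed AST rather than the raw string, since this naive version would also mangle backticks inside string literals:

```go
package main

import (
	"fmt"
	"strings"
)

// stripBackticks removes MySQL backtick identifier quoting, which
// BigQuery Legacy SQL rejects (it quotes with [brackets] instead).
// Naive: operates on the raw text, not the parsed AST.
func stripBackticks(sql string) string {
	var b strings.Builder
	for _, r := range sql {
		if r != '`' {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	q := "select title, release_year AS year, `locations` from `film_locations` limit 10"
	fmt.Println(stripBackticks(q))
	// select title, release_year AS year, locations from film_locations limit 10
}
```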

BigQuery source

Implement backend source for big-query

  • source implementation, schema discovery.
  • support CREATE SOURCE statement
  • basic queries.

Can't run with source.

Finally, I found this awesome project.

I need an Elasticsearch SQL proxy, and I had a trial with your binary; it's nearly perfect. But I have an issue when querying ES-style table names.

Our Elasticsearch index is named like "logstash-nginxlog-2017.10", and that doesn't work in a dataux SQL query:

mysql> select ip from logstash-nginxlog-2017.10;
ERROR 1105 (HY000): QLBridge.plan:  No datasource found

I think this may be caused by the dot in the name, so I want to fix it. But I can't build from source, even though I'm sure I already ran go get first.

Please give me a hint. I want to contribute more to the ES module.

$ go run main.go
# cloud.google.com/go/longrunning/autogen
../gopath/src/cloud.google.com/go/longrunning/autogen/doc.go:30:11: undefined: metadata.FromOutgoingContext
../gopath/src/cloud.google.com/go/longrunning/autogen/doc.go:33:9: undefined: metadata.NewOutgoingContext
# cloud.google.com/go/datastore
../gopath/src/cloud.google.com/go/datastore/client.go:101:8: undefined: metadata.NewOutgoingContext
# github.com/coreos/etcd/clientv3
../gopath/src/github.com/coreos/etcd/clientv3/logger.go:26:13: undefined: grpclog.LoggerV2

$ go env
GOARCH="amd64"
GOBIN="/wwwroot/gopath//pkg"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/wwwroot/gopath/"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.9/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.9/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/fm/29zg12712ql3y0l0j3t9_mkw0000gn/T/go-build650601297=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

Cassandra backend

Cassandra backend

  • core data source scanner etc for vanilla rows
    • core connection and schema discovery
    • show tables, describe tables
    • predicate push down for where clause if possible for partition-key
    • projection rewrite
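The partition-key push-down bullet can be sketched as follows (illustrative only, not the actual dataux planner): equality predicates on partition-key columns go into the CQL WHERE clause, while everything else stays behind as an engine-side poly-fill filter:

```go
package main

import (
	"fmt"
	"strings"
)

// splitPredicates divides equality filters into those Cassandra can
// evaluate (on partition-key columns) and those the engine must apply
// itself. Illustrative: a real implementation would use bound
// placeholders, not string interpolation, and handle more operators.
func splitPredicates(eq map[string]string, partitionKeys []string) (pushed, local []string) {
	keySet := map[string]bool{}
	for _, k := range partitionKeys {
		keySet[k] = true
	}
	for col, val := range eq {
		pred := fmt.Sprintf("%s = '%s'", col, val)
		if keySet[col] {
			pushed = append(pushed, pred)
		} else {
			local = append(local, pred)
		}
	}
	return pushed, local
}

func main() {
	pushed, local := splitPredicates(
		map[string]string{"playerid": "ruthba01", "team": "NYA"},
		[]string{"playerid"},
	)
	fmt.Println("CQL WHERE:", strings.Join(pushed, " AND "))   // CQL WHERE: playerid = 'ruthba01'
	fmt.Println("local filter:", strings.Join(local, " AND ")) // local filter: team = 'NYA'
}
```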

BigQuery Standard SQL

I have just realized that the legacy_syntax option has not been implemented. I think it makes sense to implement, and maybe even default to, Standard SQL, since Standard SQL is much more similar to MySQL. It has proper support for DISTINCT and DATE types.

To implement Standard SQL, we need to apply it on the query:

        // Use standard SQL syntax for queries.
        // See: https://cloud.google.com/bigquery/sql-reference/
        query.QueryConfig.UseStandardSQL = true

And then update the quote character to ` instead of []. [1] (May need to reconsider the changes made in #55.)

Test case:

MySQL :

select `name` from `bikeshare_stations` LIMIT 1;

BigQuery Legacy SQL:

select [name] from [bigquery-public-data:san_francisco.bikeshare_stations] LIMIT 1;

BigQuery Standard SQL:

select `name` from `bigquery-public-data.san_francisco.bikeshare_stations` LIMIT 1;

Docs

[1] : https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#syntax_differences

RethinkDB support

Hi,

I've just found this project, which is really cool. I like the possibilities it gives to let old applications interact with a modern nosql database instead of postgresql or mysql.

I was thinking about RethinkDB support. Is that anything you guys are looking into?

/Tommy

K8 Backend

Be able to query Kubernetes as if it were a MySQL server. Allow easier consumption of info in Kubernetes, such as lists of pods, services, and IPs, by making it easy to run MySQL queries.

Phase 1 #31

  • initial source with discovery/schema
  • pods
  • services
  • nodes

Phase 2

  • Choose proxy/discovery method to the api for kube (ie, kubectl proxy, etc) via config
  • push-down of where clause to api.ListOptions
  • think my api 1.4 is wrong, versionless?
  • endpoints
  • deployments
  • event
  • persistentvolumes
  • resourcequota
  • replicationcontroller
  • secret
  • projection not working on select yy(creationtimestamp) from pods because of lack of polyfill on projected cols

MongoDB Features

Additional features for mongo.

  • polyfill group-by for non-agg funcs
  • usage as go sql driver
  • extract generator
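The group-by poly-fill bullet amounts to aggregating in the engine when the backend can't do it for us; a minimal sketch (illustrative, not the dataux implementation) counting rows per group key:

```go
package main

import "fmt"

// groupCount groups streamed rows by key in the engine and counts
// per group — the poly-fill path when the backend has no GROUP BY.
func groupCount(rows []map[string]interface{}, key string) map[interface{}]int {
	counts := map[interface{}]int{}
	for _, row := range rows {
		counts[row[key]]++
	}
	return counts
}

func main() {
	rows := []map[string]interface{}{
		{"team": "ATL"}, {"team": "ATL"}, {"team": "NYA"},
	}
	fmt.Println(groupCount(rows, "team")["ATL"]) // 2
}
```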

convert grid to v3

Refactor to use the new grid.v3 pkg instead of v2.

  • Refactor the Source/Sink library to use named mailboxes checked out from a pool, and that are pre-provisioned
  • ensure mongo test runs

Cannot build from source

I want to create a build-from-source Docker image to make it easier to start hacking.
My Dockerfile is:

FROM golang

COPY . /go/src/github.com/dataux/dataux/
WORKDIR /go/src/github.com/dataux/dataux/

RUN \
  go build

When I run docker build -t dataux ., it always fails this way:

backends/bigquery/resultreader.go:7:2: cannot find package "cloud.google.com/go/bigquery" in any of:
	/usr/local/go/src/cloud.google.com/go/bigquery (from $GOROOT)
	/go/src/cloud.google.com/go/bigquery (from $GOPATH)
backends/bigtable/resultreader.go:10:2: cannot find package "cloud.google.com/go/bigtable" in any of:
	/usr/local/go/src/cloud.google.com/go/bigtable (from $GOROOT)
	/go/src/cloud.google.com/go/bigtable (from $GOPATH)
backends/bigquery/resultreader.go:8:2: cannot find package "cloud.google.com/go/civil" in any of:
	/usr/local/go/src/cloud.google.com/go/civil (from $GOROOT)
	/go/src/cloud.google.com/go/civil (from $GOPATH)
backends/datastore/datasource.go:13:2: cannot find package "cloud.google.com/go/datastore" in any of:
	/usr/local/go/src/cloud.google.com/go/datastore (from $GOROOT)
	/go/src/cloud.google.com/go/datastore (from $GOPATH)
backends/bigtable/resultreader.go:11:2: cannot find package "github.com/araddon/dateparse" in any of:
	/usr/local/go/src/github.com/araddon/dateparse (from $GOROOT)
	/go/src/github.com/araddon/dateparse (from $GOPATH)
main.go:27:2: cannot find package "github.com/araddon/gou" in any of:
	/usr/local/go/src/github.com/araddon/gou (from $GOROOT)
	/go/src/github.com/araddon/gou (from $GOPATH)
backends/bigquery/resultreader.go:13:2: cannot find package "github.com/araddon/qlbridge/datasource" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/datasource (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/datasource (from $GOPATH)
main.go:14:2: cannot find package "github.com/araddon/qlbridge/datasource/files" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/datasource/files (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/datasource/files (from $GOPATH)
backends/bigquery/resultreader.go:14:2: cannot find package "github.com/araddon/qlbridge/exec" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/exec (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/exec (from $GOPATH)
backends/bigquery/sql_to_bq.go:19:2: cannot find package "github.com/araddon/qlbridge/expr" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/expr (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/expr (from $GOPATH)
vendored/mixer/proxy/handler_sharded.go:17:2: cannot find package "github.com/araddon/qlbridge/expr/builtins" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/expr/builtins (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/expr/builtins (from $GOPATH)
backends/bigtable/sql_to_bt.go:18:2: cannot find package "github.com/araddon/qlbridge/lex" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/lex (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/lex (from $GOPATH)
backends/bigquery/sql_to_bq.go:20:2: cannot find package "github.com/araddon/qlbridge/plan" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/plan (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/plan (from $GOPATH)
backends/bigquery/resultreader.go:15:2: cannot find package "github.com/araddon/qlbridge/rel" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/rel (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/rel (from $GOPATH)
backends/bigquery/source.go:19:2: cannot find package "github.com/araddon/qlbridge/schema" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/schema (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/schema (from $GOPATH)
backends/bigquery/resultreader.go:16:2: cannot find package "github.com/araddon/qlbridge/value" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/value (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/value (from $GOPATH)
backends/bigtable/sql_to_bt.go:23:2: cannot find package "github.com/araddon/qlbridge/vm" in any of:
	/usr/local/go/src/github.com/araddon/qlbridge/vm (from $GOROOT)
	/go/src/github.com/araddon/qlbridge/vm (from $GOPATH)
planner/plan_server.go:14:2: cannot find package "github.com/coreos/etcd/clientv3" in any of:
	/usr/local/go/src/github.com/coreos/etcd/clientv3 (from $GOROOT)
	/go/src/github.com/coreos/etcd/clientv3 (from $GOPATH)
backends/cassandra/source.go:11:2: cannot find package "github.com/gocql/gocql" in any of:
	/usr/local/go/src/github.com/gocql/gocql (from $GOROOT)
	/go/src/github.com/gocql/gocql (from $GOPATH)
planner/msgs.pb.go:18:8: cannot find package "github.com/golang/protobuf/proto" in any of:
	/usr/local/go/src/github.com/golang/protobuf/proto (from $GOROOT)
	/go/src/github.com/golang/protobuf/proto (from $GOPATH)
backends/cassandra/source.go:12:2: cannot find package "github.com/hailocab/go-hostpool" in any of:
	/usr/local/go/src/github.com/hailocab/go-hostpool (from $GOROOT)
	/go/src/github.com/hailocab/go-hostpool (from $GOPATH)
vendored/mixer/proxy/handler_sharded.go:15:2: cannot find package "github.com/kr/pretty" in any of:
	/usr/local/go/src/github.com/kr/pretty (from $GOROOT)
	/go/src/github.com/kr/pretty (from $GOPATH)
models/config.go:8:2: cannot find package "github.com/lytics/confl" in any of:
	/usr/local/go/src/github.com/lytics/confl (from $GOROOT)
	/go/src/github.com/lytics/confl (from $GOPATH)
planner/sqlactor.go:11:2: cannot find package "github.com/lytics/dfa" in any of:
	/usr/local/go/src/github.com/lytics/dfa (from $GOROOT)
	/go/src/github.com/lytics/dfa (from $GOPATH)
backends/lytics/resultreader.go:9:2: cannot find package "github.com/lytics/go-lytics" in any of:
	/usr/local/go/src/github.com/lytics/go-lytics (from $GOROOT)
	/go/src/github.com/lytics/go-lytics (from $GOPATH)
planner/leader.go:10:2: cannot find package "github.com/lytics/grid/grid.v3" in any of:
	/usr/local/go/src/github.com/lytics/grid/grid.v3 (from $GOROOT)
	/go/src/github.com/lytics/grid/grid.v3 (from $GOPATH)
planner/plan_server.go:16:2: cannot find package "github.com/sony/sonyflake" in any of:
	/usr/local/go/src/github.com/sony/sonyflake (from $GOROOT)
	/go/src/github.com/sony/sonyflake (from $GOPATH)
backends/bigquery/resultreader.go:10:2: cannot find package "golang.org/x/net/context" in any of:
	/usr/local/go/src/golang.org/x/net/context (from $GOROOT)
	/go/src/golang.org/x/net/context (from $GOPATH)
backends/datastore/datasource.go:16:2: cannot find package "golang.org/x/oauth2/google" in any of:
	/usr/local/go/src/golang.org/x/oauth2/google (from $GOROOT)
	/go/src/golang.org/x/oauth2/google (from $GOPATH)
backends/datastore/datasource.go:17:2: cannot find package "golang.org/x/oauth2/jwt" in any of:
	/usr/local/go/src/golang.org/x/oauth2/jwt (from $GOROOT)
	/go/src/golang.org/x/oauth2/jwt (from $GOPATH)
backends/bigquery/resultreader.go:11:2: cannot find package "google.golang.org/api/iterator" in any of:
	/usr/local/go/src/google.golang.org/api/iterator (from $GOROOT)
	/go/src/google.golang.org/api/iterator (from $GOPATH)
backends/datastore/datasource.go:19:2: cannot find package "google.golang.org/api/option" in any of:
	/usr/local/go/src/google.golang.org/api/option (from $GOROOT)
	/go/src/google.golang.org/api/option (from $GOPATH)
backends/mongo/mgo_results.go:9:2: cannot find package "gopkg.in/mgo.v2" in any of:
	/usr/local/go/src/gopkg.in/mgo.v2 (from $GOROOT)
	/go/src/gopkg.in/mgo.v2 (from $GOPATH)
backends/mongo/mgo_results.go:10:2: cannot find package "gopkg.in/mgo.v2/bson" in any of:
	/usr/local/go/src/gopkg.in/mgo.v2/bson (from $GOROOT)
	/go/src/gopkg.in/mgo.v2/bson (from $GOPATH)
backends/kubernetes/client.go:14:2: cannot find package "k8s.io/apimachinery/pkg/apis/meta/v1" in any of:
	/usr/local/go/src/k8s.io/apimachinery/pkg/apis/meta/v1 (from $GOROOT)
	/go/src/k8s.io/apimachinery/pkg/apis/meta/v1 (from $GOPATH)
backends/kubernetes/source.go:15:2: cannot find package "k8s.io/client-go/kubernetes" in any of:
	/usr/local/go/src/k8s.io/client-go/kubernetes (from $GOROOT)
	/go/src/k8s.io/client-go/kubernetes (from $GOPATH)
backends/kubernetes/client.go:15:2: cannot find package "k8s.io/client-go/pkg/api/v1" in any of:
	/usr/local/go/src/k8s.io/client-go/pkg/api/v1 (from $GOROOT)
	/go/src/k8s.io/client-go/pkg/api/v1 (from $GOPATH)
backends/kubernetes/source.go:16:2: cannot find package "k8s.io/client-go/rest" in any of:
	/usr/local/go/src/k8s.io/client-go/rest (from $GOROOT)
	/go/src/k8s.io/client-go/rest (from $GOPATH)
backends/kubernetes/source.go:17:2: cannot find package "k8s.io/client-go/tools/clientcmd" in any of:
	/usr/local/go/src/k8s.io/client-go/tools/clientcmd (from $GOROOT)
	/go/src/k8s.io/client-go/tools/clientcmd (from $GOPATH)

I haven't found any build specific info elsewhere in the repo.

Also, a build-from-source Dockerfile would allow setting up an Automated Build, a very convenient way of providing both tagged and daily builds automatically on each push, with the added benefit of better auditability and traceability of the source used to build the Docker image. For some people, images pushed in a 'black box' way are a deal-breaker for security reasons. I would happily send a PR if that is deemed a worthwhile improvement on the current Dockerfile.

Thanks in advance.

Implement Google Datastore Backend

Create a dataux Backend for Google Datastore

  • Datasource implementation
    • Tables() get list of available table names
    • scan some set of rows to do type inspection.
    • implement/expose this scan/introspect to refresh
    • insert
    • delete
    • update
    • better key, ie creation of google datastore key with knowledge of fields
      • way to identify the primary key field? and utilize it in inserts
      • where ancestor is expressed re columns (id, aid) (multi column index? araddon/qlbridge#32)
      • some way to implement ancestors
  • ToSql converter w tests
    • Where, poly-fill filtering
    • Sort
    • projection
  • Type Features
    • implement support for Scan() - for json the rest of the entity

MySQL session died after BQ backend error

When a query passes MySQL validation and then fails at the BigQuery backend, the server throws an error (while the client receives an empty result set) and apparently breaks the session. The client must re-connect.

Below is an example: the "DISTINCT" keyword is valid in MySQL, but not supported by BigQuery (Legacy SQL).

MySQL client:

mysql> select distinct name from stats limit 1;
Empty set (0.60 sec)

mysql> select name from stats limit 1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> 

Server log:

2017/08/22 02:55:08.964618 mysql_handler.go:170: 0 0xc4203b7500 handleQuery: select distinct name from stats limit 1
2017/08/22 02:55:08.964800 resultreader.go:124: 0xc420155a40 bq limit: 1 sel:SELECT DISTINCT name FROM [project_id:dataset_id.stats] LIMIT 1
2017/08/22 02:55:09.572053 resultreader.go:152: could not run {Location: "query"; Message: "syntax error at: 1.1 - 1.31. SELECT DISTINCT is currently not supported. Please use GROUP BY instead to get the same effect."; Reason: "invalidQuery"}
2017/08/22 02:55:09.572083 resultreader.go:97: nice, finalize ResultReader out: 0xc4205d3ec0  row ct 0
2017/08/22 02:55:09.572103 task_sequential.go:152: *bigquery.ResultReader.Run() errored {Location: "query"; Message: "syntax error at: 1.1 - 1.31. SELECT DISTINCT is currently not supported. Please use GROUP BY instead to get the same effect."; Reason: "invalidQuery"}
2017/08/22 02:55:09.572157 mysql_handler.go:279: error on Query.Run(): {Location: "query"; Message: "syntax error at: 1.1 - 1.31. SELECT DISTINCT is currently not supported. Please use GROUP BY instead to get the same effect."; Reason: "invalidQuery"}
2017/08/22 02:55:09.572170 mysql_handler.go:287: completed in 607.52944ms   ns: 607529719
2017/08/22 02:55:09.572188 conn.go:129: got error on handle {Location: "query"; Message: "syntax error at: 1.1 - 1.31. SELECT DISTINCT is currently not supported. Please use GROUP BY instead to get the same effect."; Reason: "invalidQuery"}
2017/08/22 02:55:17.168556 mysql_handler.go:170: 0 0xc4203b7500 handleQuery: select name from stats limit 1
2017/08/22 02:55:17.168754 resultreader.go:124: 0xc42086e780 bq limit: 1 sel:SELECT name FROM [project_id:dataset_id.stats] LIMIT 1
2017/08/22 02:55:19.422496 resultreader.go:195: finished query, took: 2.253726159s for 1 rows
2017/08/22 02:55:19.422525 resultreader.go:97: nice, finalize ResultReader out: 0xc420783200  row ct 1
2017/08/22 02:55:19.422588 conn_writer.go:145: Could not write mysql out? size=40, err=connection was bad
2017/08/22 02:55:19.422605 mysql_handler.go:287: completed in 2.254027393s   ns: 2254027597

Create an official, Automated Build image on Docker Hub

Docker Hub allows you to create Automated Builds from source: https://docs.docker.com/docker-hub/builds/
It would add another packaging/distribution/installation method, with builds triggered automatically on each commit. It also allows creating different image tags from git tags & branches.
Also, documentation could easily include a canonical docker run statement to quickly spin up a Dataux instance with a single command.

By building the image via an AB, you give the resulting image verifiability and auditability, and the build is fully automatic. You can have the latest image tag built from HEAD and individual image tags from git's release tags.
Some people avoid non-verifiable (manually uploaded) images for security & traceability reasons.

Docker search command clearly displays AB when listing images (mine is shown):

$ docker search dataux
NAME                      DESCRIPTION     STARS     OFFICIAL   AUTOMATED
pataquets/dataux          dataux          0                    [OK]

Just a free Docker Hub account and a quick setup would do. Ping me if you need help.

k8s backend vendor issues

As of 10/13/2017, I cannot get the Kubernetes backend to compile. There seem to be unresolvable vendor dependency conflicts between k8s & etcd.

File Api Source

For file sources, list their files as a source table.

  • list files for source
  • append values to this table based on info in folder, meta-data

Easier startup: Default/Zero config and CREATE SOURCE

Make it easier to try out dataux and get started.

  • pre-built binaries, instructions, container image
    • kube deployment/service
  • make config-file editing a thing of the past. Instead allow the MySQL CREATE SOURCE command to be the equivalent of a conf file.

refs: #38 make persistent schema source

  • build script for docker file to create gcr.io image
  • kube deployment & service yaml
  • sniff kube if inside cluster
  • allow create to define kube location
  • commands in readme to run via kubectl
  • commands in readme for download/run
  • get someone else to try it, to verify it's easy

update Google Cloud API client import paths and more

The Google Cloud API client libraries for Go are making some breaking changes:

  • The import paths are changing from google.golang.org/cloud/... to
    cloud.google.com/go/.... For example, if your code imports the BigQuery client
    it currently reads
    import "google.golang.org/cloud/bigquery"
    It should be changed to
    import "cloud.google.com/go/bigquery"
  • Client options are also moving, from google.golang.org/cloud to
    google.golang.org/api/option. Two have also been renamed:
    • WithBaseGRPC is now WithGRPCConn
    • WithBaseHTTP is now WithHTTPClient
  • The cloud.WithContext and cloud.NewContext methods are gone, as are the
    deprecated pubsub and container functions that required them. Use the Client
    methods of these packages instead.

You should make these changes before September 12, 2016, when the packages at
google.golang.org/cloud will go away.

File Reading (csv, json, proto) sources

Be able to query files on disk, GCS, or S3. Support CSV, JSON, or custom (protobuf) formats.

  • Tables() get list of tables (folders)
  • load-test data
  • Table(name) ->
  • Open(table)
  • csv/json/proto/custom parsers interface
    • parsers
    • introspection
  • partitioning (done)
  • limit

Other optional

  • max byte size
  • pre-fetch? cache?

BigTable backend

Backend source for google big-table.

Phase I

  • plain vanilla selects

Phase II

  • schema discovery to inspect columns and give them data types for describe.
  • way to define keys, composite keys (similar to Cassandra). Must modify the AST to contain this info; how to express it?
    • keyFunc()
    • utilize new CREATE statement in qlbridge?

Cassandra backend part 2

  • read partition info from schema and re-use in dataux
  • support detecting json inside columns, and exploding those
  • pluggable row parsers, because contents of columns may be protobuf etc. Possibly same/similar to files?

Persistent Source & Schema Storage

Instead of keeping all source information (i.e., which data sources, their connection info, tables) in either 1) config files or 2) memory only, create storage for it. Ideally, long term, this schema storage is Raft-based for real-time consistency.

  • Allow the storage of schema to use any of the source-backed-source concepts

def.RawData undefined (type *grid.ActorDef has no field or method RawData) (solved; workaround for others until resolved)

I had to patch with https://github.com/lytics/grid/pull/66/files

go get -u github.com/dataux/dataux
# github.com/dataux/dataux/planner
go/src/github.com/dataux/dataux/planner/server.go:57: def.RawData undefined (type *grid.ActorDef has no field or method RawData)
go/src/github.com/dataux/dataux/planner/sqlactor.go:122: m.def.RawData undefined (type *grid.ActorDef has no field or method RawData)
