Giter Site home page Giter Site logo

clickhouse's Introduction

Clickhouse Build Status Code Climate Test Coverage Gem Gem

A Ruby database driver for ClickHouse (also Clickhouse CLI and web GUI included).

Introduction

ClickHouse is a high-performance column-oriented database management system developed by Yandex which operates Russia's most popular search engine.

ClickHouse manages extremely large volumes of data in a stable and sustainable manner. It currently powers Yandex.Metrica, world’s second largest web analytics platform, with over 13 trillion database records and over 20 billion events a day, generating customized reports on-the-fly, directly from non-aggregated data. This system was successfully implemented at CERN’s LHCb experiment to store and process metadata on 10bn events with over 1000 attributes per event registered in 2011.

On June 15th 2016, Yandex open-sourced their awesome project giving the community a powerful asset which can compete with the big players like Google BigQuery and Amazon Redshift with an important advantage: the client can use ClickHouse in its infrastructure and does not have to pay for the cloud (read more).

Using the Sinatra-based Clickhouse client web GUI

Why use the HTTP interface and not the TCP interface?

Well, the developers of ClickHouse themselves discourage using the TCP interface.

TCP transport is more specific, we don't want to expose details. Despite we have full compatibility of protocol of different versions of client and server, we want to keep the ability to "break" it for very old clients. And that protocol is not too clean to make a specification.

Why use the JSONCompact format and not the native format?

Despite of it being the most efficient format, using the native format is also discouraged by the ClickHouse developers.

The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is "columnar" - it doesn't convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.

You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It doesn't make sense to work with this format yourself.

Installation

Run the following command to install Clickhouse:

$ gem install "clickhouse"

Usage

Quick start

Require the Clickhouse gem.

require "clickhouse"

Setup the logging output.

require "logger"
Clickhouse.logger = Logger.new(STDOUT)

Establish the connection with the ClickHouse server (using the default config).

Clickhouse.establish_connection
=> true

List databases and tables.

Clickhouse.connection.databases
I, [2016-10-17T22:54:26.587401 #81829]  INFO -- :
  SQL (64.0ms)  SHOW DATABASES;
=> ["default", "system"]

Clickhouse.connection.tables
I, [2016-10-17T22:54:51.454012 #81829]  INFO -- :
  SQL (61.7ms)  SHOW TABLES;
=> []

Create tables.

Clickhouse.connection.create_table("events") do |t|
  t.fixed_string :id, 16
  t.uint16       :year
  t.date         :date
  t.date_time    :time
  t.string       :event
  t.uint32       :user_id
  t.float32      :revenue
  t.engine       "MergeTree(date, (year, date), 8192)"
end
=> true

Clickhouse.connection.query "DESCRIBE TABLE events" # or Clickhouse.connection.describe_table "events"
=> #<Clickhouse::Connection::Query::ResultSet:0x007fa9ac137010
 @names=["name", "type", "default_type", "default_expression"],
 @rows=
  [["id", "FixedString(16)", nil, nil],
   ["year", "UInt16", nil, nil],
   ["date", "Date", nil, nil],
   ["time", "DateTime", nil, nil],
   ["event", "String", nil, nil],
   ["user_id", "UInt32", nil, nil],
   ["revenue", "Float32", nil, nil]],
 @types=["String", "String", "String", "String"]>

Check if table exists.

Clickhouse.connection.exists_table("events")
=> true

Insert data.

Clickhouse.connection.insert_rows(events, :names => %w(id year date time event user_id revenue)) do |rows|
  rows << [
    "d91d1c90",
    2016,
    "2016-10-17",
    "2016-10-17 23:14:28",
    "click",
    1982,
    0.18
  ]
  rows << [
    "d91d2294",
    2016,
    "2016-10-17",
    "2016-10-17 23:14:41",
    "click",
    1947,
    0.203
  ]
end
=> true

Query data.

Clickhouse.connection.count :from => "events"
I, [2016-10-17T23:19:45.592602 #82196]  INFO -- :
  SQL (65.4ms)  SELECT COUNT(*)
FROM events;
=> 2

Clickhouse.connection.select_row :select => "COUNT(*), year, date, avg(revenue)", :from => "events", :group => "year, date"
I, [2016-10-17T23:22:47.340232 #82196]  INFO -- :
  SQL (67.7ms)  SELECT COUNT(*), year, date, avg(revenue)
FROM events
GROUP BY year, date;
=> [2, 2016, #<Date: 2016-10-17 ((2457679j,0s,0n),+0s,2299161j)>, 0.1915000081062317]

Connecting to a cluster

To connect to a cluster you only need to specify the URLs of the cluster servers in :urls of the configuration and that is it! The API of using Clickhouse stays the same.

Clickhouse.establish_connection urls: %w(http://192.168.99.100:32809 http://192.168.99.100:32812 http://192.168.99.100:32815)
=> true

Clickhouse.connection.tables
I, [2016-10-21T11:56:47.375772 #63374]  INFO -- :
  SQL (6.2ms)  SHOW TABLES;
=> ["events"]

In case of a connection dropping out, Clickhouse will retry the request with another connection. The failed connection will also be removed from the connection pool.

Clickhouse.establish_connection urls: %w(http://192.168.99.100:32809 http://192.168.99.100:1 http://192.168.99.100:32815)
=> true

Clickhouse.connection.pond.available.collect(&:url)
=> ["http://192.168.99.100:1", "http://192.168.99.100:32815", "http://192.168.99.100:32809"]

Clickhouse.connection.tables
I, [2016-10-21T12:11:55.974573 #63527]  INFO -- :
  SQL (7.1ms)  SHOW TABLES;
=> ["events"]

Clickhouse.connection.pond.available.collect(&:url)
=> ["http://192.168.99.100:32809", "http://192.168.99.100:32815"]

If all the connections failed, it will just return nil.

Check out the tests

To see what more the Clickhouse gem has to offer, please take a look at the unit tests ( test/unit/connection/test_query.rb for instance).

Using the Sinatra-based browser GUI and Thor-based CLI

As of Clickhouse v0.1.8, the gem is provided with both a Sinatra-based GUI and a Thor-based CLI. Starting either of them is very easy:

  • clickhouse s localhost:8123 - (the s stands for server as we know from rails s)
  • clickhouse c localhost:8123 - (the c stands for console as we know from rails c)

Multiple connections should be passed comma separated:

clickhouse s https://myserver.com:8123,https://myserver.com:8124

Use clickhouse help to:

$ clickhouse help server
Usage:
  clickhouse server [HOSTS]

Options:
  -p, [--port=N]
                             # Default: 1982
  -u, [--username=USERNAME]
  -P, [--password=PASSWORD]

Start a Sinatra server as ClickHouse client (HOSTS should be comma separated URIs)

Using the console

As you probably already noticed, the Clickhouse repo is provided with a script/console file which you can use for development / testing purposes. Please note that you need to have a ClickHouse server running.

Running a ClickHouse server on your Mac or Windows computer

Despite that the ClickHouse build is not intended to work on Mac OS X or Windows (only x86_64 with SSE 4.2 is supported), you can still run a ClickHouse server instance on both the operating systems using the ClickHouse Server Docker Image hosted on [https://hub.docker.com/](Docker Hub).

The installation process is just a matter of two simple steps:

Et voilà! Your ClickHouse server instance is up and running locally. Please make sure to use the proper IP address and port to connect with. You can find it at the container details within Kitematic (it is the Access URL corresponded with the 8123/tcp Docker port).

Example

$ script/console
Loading Clickhouse development environment (0.1.1)
[1] pry(main)> connect! host: "192.168.99.100", port: 32770
=> true
[2] pry(main)> conn.databases
I, [2016-10-19T20:54:53.081388 #29847]  INFO -- :
  SQL (3.1ms)  SHOW DATABASES;
=> ["default", "system"]
[3] pry(main)>

Testing

Run the following command for testing:

$ rake

You can also run a single test file:

$ ruby test/unit/connection/test_query.rb

Contact me

For support, remarks and requests, please mail me at [email protected].

License

Copyright (c) 2016 Paul Engel, released under the MIT license

http://github.com/archan937http://twitter.com/archan937[email protected]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

clickhouse's People

Contributors

archan937 avatar prikha avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

clickhouse's Issues

Long-lasting queries in cluster may result in Clickhouse server bloat

When performing heavy Clickhouse request it may take too much time to finish. Sometimes it may even exceed HTTPClient connection timeout. Here is what happens in details:

  1. We run some long-lasting query on Clickhouse cluster connecting in a way like this: Clickhouse.establish_connection(urls: ['host1.lvh.me:8123', 'host2.lvh.me:8123'])
  2. Clickhouse server starts working hard on query we issued, but it takes too long.
  3. Clickhouse::Connection::Client#request receives Faraday::TimeoutError exception.
  4. Clickhouse::Cluster#method_missing retries the same request on another Clickhouse host from pond's pool with the same result. If servers are under really heavy load and request lasts long enough, we will run out of servers in the pool and... come to the first one again which is still working over the heavy query!
  5. Go to 1.

As a result each Clickhouse server in cluster runs the same query over and over again increasing the load.

This bloat ends by reaching maximum number of simultaneous queries, when Clickhouse refuses to take another query.

Establishing connection with username & password

Is there a way to establish a connection to Clickhouse having your username & password set? I tried both
http://username:mypass@myhost:8123
and
http://myhost:8123?user=username&password=mypass
and neither works.

Bug in CSV encoding strings with doble quotes at the beginning

If you try to insert_rows with string value like:

''The String'' with continuation

(there is 2 single quotes there)
Default Ruby CSV gem encode this string without " quotes, because there isn't any characters required to be encoded. But Clickhouse consider this 2 single quotes as field quotation mark and trigger an error: "comma required before with continuation"

Now there is no way to change CSV generation mode without patching the gem. (Adding :force_quote=>true solves the problem). So the problem is quite unpleasant. Probable you should add CSV options to insert_rows() arguments.

Sessions are not yet supported?

session_id parameter for the connection can be used to setup a session with Clickhouse

One application is using TEMPORARY TABLE

Inserting a subset of columns doesn't work

If I have a table with two columns, “foo” and “bar”, I cannot use insert_rows to insert a record with only “foo”, because insert_rows doesn't actually use the “rows” parameter. It should be something like

INSERT INTO #{table} (#{rows.join(",")}) ...

Allow specifying timeout

It seems like there's no way to pass config to underlying Faraday adapters.

Right now we have to do this ugly hack:

 module Clickhouse
    class Connection
      module Client
        def client
          @client ||= Faraday.new(url: url) do |f|
            f.adapter :net_http do |http|
              http.read_timeout = 300
            end
          end
        end
      end
    end
  end

Is there anyway to pass this config to the client? Or perhaps we should make 300 seconds the default timeout, since that's the default in Clickhouse server as well?

Filtering not working properly

While in clickhouse console:

SELECT count()
FROM tracks
WHERE date = '2019-07-09'

┌─count()─┐
│ 4162001 │
└─────────┘

And Rails log shows

DEBUG -- : SELECT count() FROM tracks WHERE date = '2019-07-09'
DEBUG -- : [{"count()"=>0}]

The gem seems to be abandoned, but I'll drop this here anyway as a warning.

String array insert not working

Clickhouse.connection.create_table("events") do |t|
  t.fixed_string :id, 16
  t.array       :array_data, 'String'
  t.date        :date
  t.engine       "MergeTree(date, (date), 8192)"
end
Clickhouse.connection.insert_rows("events", :names => %w(id array_data date)) do |rows|
  rows << ["d91d1c90", ["1", "2"], "2016-10-17"]
end
Clickhouse::QueryError: Code: 26, e.displayText() = DB::Exception: Cannot parse quoted string: expected opening quote: 

Row 1:
Column 0,   name: id,         type: FixedString(16), parsed text: "d91d1c90"
Column 1,   name: array_data, type: Array(String),   parsed text: <EMPTY>ERROR

It happens because CSV generate array with double quotes, and Clickhouse cant understand this format

\"[\"1\", \"2\"]\"

Worked example with single quotes:

Clickhouse.connection.execute("INSERT INTO events FORMAT CSV \"d91d1c90\",\"['1','2']\",\"2016-10-17\"")

Default value for columns

Does this gem support defining default value as what is descried in clickhouse documentation metioned bellow?

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = engine

I saw Clickhouse::Connection::Query::Table#to_sql and think it is not supported.

Long queries cause error 400 due to usage of GET request

Long SQL queries cause clickhouse to respond 400 on a valid query
I finally had to switch to POST request in this way:

def query_long part1, part2
data = JSON.parse Clickhouse.connection.execute(part1, part2 + " FORMAT JSONCompact")
names = data["meta"].collect{|column| column["name"]}
types = data["meta"].collect{|column| column["type"]}
Clickhouse::Connection::Query::ResultSet.new data["data"], names, types
end

This would be cool if GEM will make this decision automatically

Password required for user default

When I run this command Clickhouse.connection.databases it gives an error

Clickhouse::QueryError: Got status 401 (expected 200): Code: 194, e.displayText() = DB::Exception: Password required for user default (version 19.17.6.36 (official build))

how can I give password?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.