
node-clickhouse's Introduction

Simple and powerful interface for ClickHouse

npm install @apla/clickhouse

Synopsis

const ClickHouse = require('@apla/clickhouse')
const ch = new ClickHouse({ host, port, user, password })

const stream = ch.query("SELECT 1", (err, data) => {})
stream.pipe(process.stdout)

// promise interface, not recommended for selects
// (requires 'util.promisify' for node < 8, Promise shim for node < 4)
await ch.querying("CREATE DATABASE test")

Examples: see the Examples section below.

API

new ClickHouse(options: Options)

Options

Option                Required  Default      Description
host                  yes                    Host to connect to.
user                                         Authentication user.
password                                     Authentication password.
path (pathname)                 /            Pathname of the ClickHouse server.
port                            8123         Server port number.
protocol                        'http:'      'https:' or 'http:'.
dataObjects                     false        By default (false), you'll receive an array of values
                                             for each row. With dataObjects: true, every row becomes
                                             an object of the form { fieldName: fieldValue, … }.
                                             Alias for format: 'JSON'.
format                          JSONCompact  Adds a FORMAT statement to the query if it did not have
                                             one. Specifies the format of selected or inserted data.
                                             See "Formats for input and output data" for possible
                                             values.
queryOptions                                 Object; can contain any ClickHouse option from Settings,
                                             Restrictions and Permissions. See the example below.
readonly                        false        Tells the driver to send the query with the HTTP GET
                                             method. Same as the readonly=1 setting.
timeout, headers,                            Any http.request or https.request options are also
agent, localAddress,                         available.
servername, etc…
Options example:
const ch = new ClickHouse({
  host: "clickhouse.msk",
  dataObjects: true,
  readonly: true,
  queryOptions: {
    profile: "web",
    database: "test",
  },
})

clickHouse.query(query, [options], [callback])

Sends a query statement to a server.

query: string

SQL query statement.

options: Options

The same Options, excluding connection options.

callback: (error, result) => void

Will always be called upon completion.

Returns: DuplexStream

It supports .pipe to process records.
You should have at least one error handler listening, either via the query callback or via the stream 'error' event.

Stream event  Description
'error'       Query execution finished with an error. If you have both a query
              callback and a stream 'error' listener, you'll get the error
              notification in both listeners.
'metadata'    Column information has been parsed.
'data'        A row is available.
'end'         The entire response has been processed. The query callback is
              always called, regardless of whether there is an 'end' listener.

You should always listen to the 'data' event together with the 'end' event:
"The 'end' event will not be emitted unless the data is completely consumed."
If you don't need to handle the 'data' event, prefer the callback or Promise interface.
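
For instance, a minimal sketch of wiring up both error-notification paths, the callback and the 'error' event:

const stream = ch.query('SELECT 1', (err) => {
  if (err) console.error('callback error:', err.message)
})
stream.on('error', (err) => console.error('stream error:', err.message))
stream.on('data', (row) => console.log(row))
stream.on('end', () => console.log('query finished'))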
stream.supplemental

After the response is processed, you can read supplemental response data from it, such as the row count.

See the "Selecting large dataset" example below.

clickHouse.ping(callback)

Sends an empty query. Doesn't require authorization.

callback: (error, result) => void

Will be called upon completion.


Promise interface

Promise interface is not recommended for INSERT and SELECT queries.

  • INSERT can't bulk-load data with the promise interface.
  • SELECT will collect the entire query result in memory. See the Memory size section.

With the promise interface, the query result is parsed synchronously. This means that a large query result:

  • Will synchronously block the JS thread/event loop.
  • May lead to memory leaks in your app due to peak GC loads.

Use it only for queries where the resulting data size is known and extremely small.
Good use cases are DESCRIBE TABLE or EXISTS TABLE.
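
For instance (a minimal sketch; test.foo is a hypothetical table):

// a small, fixed-size result is a good fit for the promise interface
const { data } = await ch.querying('EXISTS TABLE test.foo')
// data is e.g. [ [ 1 ] ] in the default JSONCompact format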

clickHouse.querying(query, [options])

Similar to ch.query(query), but collects the entire response in memory and resolves with the complete query result.
See the Memory size section.

options: Options

The same Options, excluding connection options.

Returns: Promise

Will be resolved with the entire query result.

See the promise interface example below.

clickHouse.pinging()

Promise interface for .ping.

Returns: Promise

How it works

Bulk data loading with INSERT statements

INSERT can be used for bulk data loading. There are two formats easily implementable with JavaScript: CSV and TabSeparated (TSV).

CSV is useful for loading from a file: you can read the file contents and .pipe them into ClickHouse.
To activate CSV parsing, set the format driver option or the query's FORMAT statement to CSV:

var csvStream = fs.createReadStream('data.csv')
var clickhouseStream = ch.query(statement, { format: 'CSV' })

csvStream.pipe(clickhouseStream)

TSV is useful for loading from a file and for bulk loading from external sources, such as other databases. Only \\, \t and \n need to be escaped in strings; numbers, nulls, booleans and Date objects need some minor processing. You can send prepared TSV data as strings (a line ending will be appended automatically), as Buffers (always passed as is), or as Arrays of fields.

Internally, every field is converted to a format ClickHouse can accept, then escaped and joined with the delimiter for the particular format. If you ever need to store rows (as arrays) and send preformatted data, you can do that too.
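
A rough sketch of that kind of conversion (an illustration only, not the driver's actual code):

// Hypothetical illustration of TSV field encoding; not the driver's actual code.
function encodeTSVField (value) {
  if (value === null || value === undefined) return '\\N'
  if (value instanceof Date) {
    // 'YYYY-MM-DD hh:mm:ss', a form ClickHouse accepts for DateTime
    return value.toISOString().replace('T', ' ').substr(0, 19)
  }
  if (typeof value === 'boolean') return value ? '1' : '0'
  if (typeof value === 'number') return String(value)
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/\t/g, '\\t')
    .replace(/\n/g, '\\n')
}

// rows stored as arrays can then be joined with the TSV delimiter
const line = [1, 2.22, 'text with\ttab', new Date()].map(encodeTSVField).join('\t')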

ClickHouse also supports the JSONEachRow format, which can be useful for inserting JavaScript objects if you have such a recordset.

const stream = ch.query(statement, { format: 'JSONEachRow' })

stream.write(object) // write as many times as needed
stream.end() // and don't forget to finish the insert query

Memory size

You can read all the records into memory in a single call like this:

var ch = new ClickHouse({ host: host, port: port })
ch.querying("SELECT number FROM system.numbers LIMIT 10", (err, result) => {
  // result will contain all the data you need
})

In this case the whole JSON response from the server is read into memory and then parsed, hogging your memory and CPU. The default parser instead parses the server response line by line and emits events. This is slower, but much more memory- and CPU-efficient for larger datasets.
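
A sketch of the streaming counterpart of the same call, using the events documented above, which avoids buffering the whole response:

const stream = ch.query('SELECT number FROM system.numbers LIMIT 10')
stream.on('data', (row) => { /* handle one row at a time */ })
stream.on('error', (err) => console.error(err))
stream.on('end', () => console.log('all rows consumed'))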


Examples

Selecting with stream:

const readableStream = ch.query(
  'SELECT * FROM system.contributors FORMAT JSONEachRow',
  (err, result) => {},
)
const writableStream = fs.createWriteStream('./contributors.json')
readableStream.pipe(writableStream)

Inserting with stream:

const readableStream = fs.createReadStream('./x.csv')
const writableStream = ch.query('INSERT INTO table FORMAT CSV', (err, result) => {})
readableStream.pipe(writableStream)

Insert single row of data:

const ch = new ClickHouse(options)
const writableStream = ch.query(`INSERT INTO table FORMAT TSV`, (err) => {
  if (err) {
    console.error(err)
  }
  console.log('Insert complete!')
})

// data will be formatted for you
writableStream.write([1, 2.22, "erbgwerg", new Date()])

// prepare data yourself
writableStream.write("1\t2.22\terbgwerg\t2017-07-17 17:17:17")

writableStream.end()

Selecting large dataset:

const ch = new ClickHouse(options)
// it is better to use the stream interface to fetch SELECT results
const stream = ch.query("SELECT * FROM system.numbers LIMIT 10000000")

stream.on('metadata', (columns) => { /* do something with column list */ })

let rows = []
stream.on('data', (row) => rows.push(row))

stream.on('error', (err) => { /* handle error */ })

stream.on('end', () => {
  console.log(
    rows.length,
    stream.supplemental.rows,
    stream.supplemental.rows_before_limit_at_least, // at least this many rows would match without the LIMIT
  )
})

Inserting large dataset:

const ch = new ClickHouse(options)
// insert from file
const tsvStream = fs.createReadStream('data.tsv')
const clickhouseStream = ch.query('INSERT INTO table FORMAT TSV')

tsvStream.pipe(clickhouseStream)

Settings for connection:

const ch = new ClickHouse({
  host: 'clickhouse.msk',
  queryOptions: {
    database: "test",
    profile: "web",
    readonly: 2,
    force_index_by_date: 1,
    max_rows_to_read: 10 * 1e6,
  },
})

Settings for query:

const ch = new ClickHouse({ host: 'clickhouse.msk' })
const stream = ch.query('INSERT INTO table FORMAT TSV', {
  queryOptions: {
    database: "test",
    insert_quorum: 2,
  },
})

Promise interface:

const ch = new ClickHouse(options)
// Check connection to the server. Doesn't require authorization.
await ch.pinging()
const { data } = await ch.querying("SELECT 1")
// [ [ 1 ] ]
const { data: columns } = await ch.querying("DESCRIBE TABLE system.numbers", { dataObjects: true })
// [ { name: 'number', type: 'UInt64', default_type: '', default_expression: '' } ]


node-clickhouse's Issues

The last blank is trimmed, so how to fix this error?

error:
Error: Unknown format CSV1550220733

Code

let statement = 'INSERT INTO testTable FORMAT CSV ';
let csvStream = fs.createReadStream('csv.csv');
let clickhouseStream = ch.query(statement, { inputFormat: 'CSV' });
csvStream.pipe(clickhouseStream);
clickhouseStream.on('end', function () {
  ac(file_name + '->success');
});
clickhouseStream.on('error', function (err) {
  console.log(err);
  ac(file_name + '->false');
});

csv file

1550220733,22222,33333

Basic Connection with clickhouse

Hey apla,
I'm having trouble connecting to ClickHouse from Node.js. I'm working with your sample code, and some questions popped up about the ClickHouse options. I just want to know whether this options format is correct?
Thank you a lot for this module!

var ClickHouse = require('@apla/clickhouse');
var options = {
  host: '160.xx.xx.xxx:8123',
  auth: {
    user: 'abc123',
    password: 'paswword123'
  },
};
var ch = new ClickHouse(options);
var stream = ch.query("SELECT * from xxx limit 1");

// or collect records yourself
var rows = [];

stream.on('metadata', function (columns) {
  // do something with column list
});
I got an error about

first argument must be a string buffer arraybuffer array or array-like object
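
For reference, the options table in this README suggests passing host and port as separate options, with user and password at the top level rather than nested under auth. A sketch of that shape:

// options shaped per the README's documented option names (sketch)
var ch = new ClickHouse({
  host: '160.xx.xx.xxx',
  port: 8123,
  user: 'abc123',
  password: 'paswword123',
});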

Error: Chunk size is too large (version 21.3.2.5 (official build))

We are using node-clickhouse and creating batches of insert values like

INSERT INTO .... values (), (), ()...

and so on.

After a certain batch size, the database starts to complain with the following error

Error: Chunk size is too large (version 21.3.2.5 (official build))
code: 307,
scope: 'DB::Exception:'

While researching more on this problem, I found the following thread

ClickHouse/ClickHouse#21907

which seemed to suggest it could be related to chunk headers. Can submit a script to try and reproduce the issue.

Thanks

stream.on('error', ...) is not being fired

I am trying to gracefully handle stream errors, but neither stream.on('error', ...) nor wrapping the stream in a try/catch actually works. Is there any particular way to gracefully handle the errors without having the app crash?

connect/disconnect or how to know if connection is alive

Should I create ch on every query, or can I reuse one instance of ch over time?

const ch = new ClickHouse({ host: HOST, post: PORT, user: USER, password: PASSWORD });

If I shouldn't, how can I know whether the connection is alive, and how do I disconnect from CH?

Automated codestyle checks

To make it easier to submit PRs and keep the codestyle consistent, I propose to add:

  • eslint with a preferred config
  • a precommit hook that checks codestyle (I recommend husky + lint-staged)
  • CI checks (may be implemented by running eslint on travis-ci)

One good and not too strict config which I respect is eslint-config-standard:
https://standardjs.com/#standardjs--the-rules

Introducing hard codestyle conventions requires mostly-automatic code refactoring, which will produce a large but mechanical one-time diff.

@apla I'll make a PR, but I want to hear your opinion about all of the above.

Termination with comments in query and .query method

Run this snippet:

const ClickHouse = require('@apla/clickhouse');

const c = new ClickHouse({
  host,
  port,
  auth
});

const q = `select count() from table
-- ololololoolololololoolololololol
`
const stream = c.query(q);

stream.on('metadata', function (metadata) {
  console.log('metadata', metadata);
});

stream.on('data', function (data) {
  console.log('data!', data);
});

stream.on('error', err => {
  console.log('error', err);
});

stream.on('end', () => {
  console.log('end');
});

And I receive nothing. The stream was closed and the script terminated. No events were fired. However, if I remove the comment from the query, it works properly.

I looked at tcpdump and saw that the request was sent to the server. After sending, we got a 'close' event from http.response inside the client's code, then a 'response' event, and 'close' again.

I tried sending the request with http://ui.tabix.io/ and everything was fine. That is why I'm posting here: it looks like a client bug, not something related to server settings.

anyLast returns string

I have a field called duration which is UInt64. If I run a SELECT query with anyLast(duration), it returns a string. I expect a number instead of a string here. Why does this happen?

Backslash escapes not working properly

The library has a problem with escaping backslashes

In code we have:
v.replace (/\\/g, '\\')

It means: replace each single backslash with a single backslash, because JS has the following rules for backslashes in strings and regular expressions:

\ — always escapes the next character
\\ — always means one backslash

So, to replace one backslash with two backslashes, we need to double it in the second parameter of the replace function:

v.replace (/\\/g, '\\') -> v.replace (/\\/g, '\\\\')

Exposing the "finalized" option

My problem is executing a query where an INSERT is performed from aggregated data from another table, for example:

    INSERT INTO temp_regs_${sessionID}
    SELECT *
    FROM (
        SELECT min(date) as day, user_id
        FROM runs
        GROUP BY user_id
    ) as first_regs
    WHERE day BETWEEN ${from} AND ${to}    
    GROUP BY day, user_id

Such a query will never end if I'm using the promise interface, because it waits for the stream.end() call. I looked into the source and found the line https://github.com/apla/node-clickhouse/blob/master/src/clickhouse.js#L289 where the request is marked as not finalized. I really want to stick with the promises API, because it allows me to use async/await syntax, but right now that's not possible for a query like the above.

For now I came up with a custom wrapper like this:

  function queryFinalized(sql, options) {
    return new Promise((resolve, reject) => {
      const callback = (error, data) => {
        if (error) reject(error);
        else resolve(data);
      };

      const stream = ctx.clickhouse.query(sql, options, callback);

      stream.end();
    });
  }

But maybe it's better to expose the finalized flag in the request options?

result value type is wrong

hi

when the result includes an Int type, the value will be a string, not a number

are there any options to handle it?

thanks
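
One likely explanation: ClickHouse's JSON output formats quote 64-bit integers by default, since they can exceed JavaScript's safe integer range. If that's the cause, the relevant server setting could presumably be passed through queryOptions; a sketch:

// sketch: disable quoting of 64-bit integers in JSON output
// (output_format_json_quote_64bit_integers is a ClickHouse setting;
// values above Number.MAX_SAFE_INTEGER will then lose precision)
const stream = ch.query('SELECT anyLast(duration) FROM table FORMAT JSON', {
  queryOptions: { output_format_json_quote_64bit_integers: 0 },
})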

Importing TSV to ClickHouse returns ERROR: garbage after Nullable(Int64)

This code processes a simple TSV and breaks when Int is used instead of the String data type.

  • Error summary: parsed text: <EMPTY>ERROR: garbage after Nullable(Int64): "NULL<LINE FEED>112<TAB>S"
  • Failing TSV line: the second

the create table schema

 CREATE TABLE bucket_8.migrar
              (customerNumber Nullable(Int16),customerName Nullable(String),contactLastName Nullable(String),contactFirstName Nullable(String),contactName Nullable(String),phone Nullable(String),addressLine1 Nullable(String),addressLine2 Nullable(String),address Nullable(String),city Nullable(String),state Nullable(String),postalCode Nullable(String),country Nullable(String),salesRepEmployeeNumber Nullable(Int16),creditLimit Nullable(Decimal(12,4)),casa Nullable(Int64)) ENGINE = MergeTree
              ORDER BY tuple()

code to do the magic

 private async bringToClickHouse(taskData, destination) {
        const tsvStream = createReadStream(destination);
        const clickhouseStream = click.query(
            `INSERT INTO default.customers`, {
                    format: 'TSV',
                    queryOptions: {
                        input_format_tsv_empty_as_default: 1,
                        input_format_null_as_default: 1,
                        input_format_skip_unknown_fields: 1,
                    },
                });

        tsvStream.pipe(clickhouseStream);

        return new Promise((resolve, reject) => {
            tsvStream.on('error', err => {
                reject(err);
            });
            tsvStream.on('finish', res => {
                resolve(res);
            });
        });
    }

the TSV

103	Atelier graphique	Schmitt	Carine 	Carine  Schmitt	40.32.2555	54, rue Royale	NULL	NULL	Nantes	NULL	44000	France	1370	21000.00	NULL
112	Signal Gift Stores	King	Jean	Jean King	7025551838	8489 Strong St.	NULL	NULL	Las Vegas	NV	83030	USA	1166	71800.00	NULL
114	Australian Collectors, Co.	Ferguson	Peter	Peter Ferguson	03 9520 4555	636 St Kilda Road	Level 3	636 St Kilda Road - Level 3	Melbourne	Victoria	3004	Australia	1611	117300.00	NULL
119	La Rochelle Gifts	Labrune	Janine 	Janine  Labrune	40.67.8555	67, rue des Cinquante Otages	NULL	NULL	Nantes	NULL	44000	France	1370	118200.00	NULL
121	Baane Mini Imports	Bergulfsen	Jonas 	Jonas  Bergulfsen	07-98 9555	Erling Skakkes gate 78	NULL	NULL	Stavern	NULL	4110	Norway	1504	81700.00	NULL
124	Mini Gifts Distributors Ltd.	Nelson	Susan	Susan Nelson	4155551450	5677 Strong St.	NULL	NULL	San Rafael	CA	97562	USA	1165	210500.00	NULL
125	Havel & Zbyszek Co	Piestrzeniewicz	Zbyszek 	Zbyszek  Piestrzeniewicz	(26) 642-7555	ul. Filtrowa 68	NULL	NULL	Warszawa	NULL	01-012	Poland	NULL	0.00	NULL
128	Blauer See Auto, Co.	Keitel	Roland	Roland Keitel	+49 69 66 90 2555	Lyonerstr. 34	NULL	NULL	Frankfurt	NULL	60528	Germany	1504	59700.00	NULL

the helpful but not so loved error

  var fields = new Error (e.toString ('utf8'));
              ^
Error: Cannot parse input: expected \n before: NULL\n112\tSignal Gift Stores\tKing\tJean\tJean King\t7025551838\t8489 Strong St.\tNULL\tNULL\tLas Vegas\tNV\t83030\tUSA\t1166\t71800.00\tNULL\n114\tAustralian Collectors, Co.\tFe: (at row 1)Row 1:Column 0,   name: customerNumber,         type: Nullable(Int16),          parsed text: "103"Column 1,   name: customerName,           type: Nullable(String),         parsed text: "Atelier graphique"Column 2,   name: contactLastName,        type: Nullable(String),         parsed text: "Schmitt"Column 3,   name: contactFirstName,       type: Nullable(String),         parsed text: "Carine "Column 4,   name: contactName,            type: Nullable(String),         parsed text: "Carine  Schmitt"Column 5,   name: phone,                  type: Nullable(String),         parsed text: "40.32.2555"Column 6,   name: addressLine1,           type: Nullable(String),         parsed text: "54, rue Royale"Column 7,   name: addressLine2,           type: Nullable(String),         parsed text: "NULL"Column 8,   name: address,                type: Nullable(String),         parsed text: "NULL"Column 9,   name: city,                   type: Nullable(String),         parsed text: "Nantes"Column 10,  name: state,                  type: Nullable(String),         parsed text: "NULL"Column 11,  name: postalCode,             type: Nullable(String),         parsed text: "44000"Column 12,  name: country,                type: Nullable(String),         parsed text: "France"Column 13,  name: salesRepEmployeeNumber, type: Nullable(Int16),          parsed text: "1370"Column 14,  name: creditLimit,            type: Nullable(Decimal(12, 4)), parsed text: "21000.00"Column 15,  name: casa,                   type: Nullable(Int64),          parsed text: <EMPTY>ERROR: garbage after Nullable(Int64): "NULL<LINE FEED>112<TAB>S" (version 20.3.7.46 (official build))

No error handling

For example, any ECONNREFUSED causes an application crash, e.g. if the CH server is down.

const clickhouse = new ClickHouse(config.clickhouse);
clickhouse.query('select 1', (...args) => console.log(args)); // <= crash here
clickhouse.ping((...args) => console.log(args)); // <= crash here
events.js:137
      throw er; // Unhandled 'error' event
      ^

Error: connect ECONNREFUSED 127.0.0.1:8123
    at Object._errnoException (util.js:1003:13)
    at _exceptionWithHostPort (util.js:1024:20)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1195:14)

Therefore, it's not possible to use this lib in production. There is PR #3 about it, but it looks like it has been ignored. Is this repo alive?

Stream transfer data

Hello dear support. I am working with ClickHouse on Node.js and using your driver.

As far as I can see, the stream transmits only one row at a time. Transmitting 1.5 million rows takes 25 sec.
It would be very good if the stream could receive a bulk of rows at a time.

Could you tell me whether it is a ClickHouse configuration problem or a driver problem?

Error: Cannot read from istream at offset 0

I run an INSERT INTO MY_TABLE query with the JSONEachRow format. My data has one billion records; I splice and write up to 1000 at a time, and starting from around record 79k I get the error Error: Cannot read from istream at offset 0.
My query: "INSERT INTO " + table + " FORMAT JSONEachRow " + data.

logs
Log error on the server.

Please help with my issue, thanks.

convert timestamp to DateTime64 fails

Need your help, thank you.
I created a table with this SQL:

CREATE TABLE dt
(
    `timestamp` DateTime64(3, 'Europe/Moscow'),
    `event_id` UInt8
)
ENGINE = TinyLog;

But when I insert data into the table with

const writableStream = ch.query(`INSERT INTO dt FORMAT TSV`, {format: "TSV"}, (err) => {
  if (err) {
    console.error(err)
  }
  console.log('Insert complete!')
})
writableStream.write(["1546300800012", 4]);
writableStream.end();

it returns an error:

{ Error: Cannot parse input: expected '\t' before: '012\t4\n': (at row 1): Row 1:Column 0,   name: timestamp, type: DateTime64(3, 'Europe/Moscow'), parsed text: "1546300800"ERROR: garbage after DateTime64(3, 'Europe/Moscow'): "012<TAB>4<LINE FEED>" (version 21.2.5.5 (official build))
    at parseError (/home/work/src/node_modules/@apla/clickhouse/src/parse-error.js:2:15)
    at errorHandler (/home/work/src/node_modules/@apla/clickhouse/src/clickhouse.js:29:13)
    at IncomingMessage.<anonymous> (/home/meilin/src/node_modules/@apla/clickhouse/src/clickhouse.js:97:11)
    at IncomingMessage.emit (events.js:203:15)
    at endReadableNT (_stream_readable.js:1145:12)
    at process._tickCallback (internal/process/next_tick.js:63:19) code: 27, scope: 'DB::ParsingException:' }

But if I modify the query statement to

writableStream.write(["1546300800", 4])

It will be OK.
But in fact, the timestamp column in my table is DateTime64(3, "xxx"); it should accept a timestamp of length 13.

Has anybody encountered this problem?

Query with SQL comment prefix reports an error

For example, this SQL:

--admin
SELECT
date_time, app_id FROM basic.basic_day_all limit 2

executing it returns undefined.

Looking at the query source code, the problem is this code:

if (chQuery.match (/^(?:SELECT|WITH|SHOW|DESC|DESCRIBE|EXISTS\s+TABLE)/i)) {
	if (!options.format)
		options.format = options.dataObjects ? 'JSON' : 'JSONCompact';
} else if (chQuery.match (/^INSERT/i)) {
      //
} 

I hope the author can help fix it, thanks~

Format option is not correctly applied for SELECT query with WITH clause

Example:

ClickHouse({format: "JSON"})
Query:

WITH sum(bytes) as s
SELECT
    formatReadableSize(s),
    table
FROM system.parts
GROUP BY table
ORDER BY s

Current behaviour:

Execution of the query produces result formatted as plaintext.

Expected behaviour:

Execution of the query produces result formatted JSON.

Workaround:

Explicitly add FORMAT JSON to the query.

WITH sum(bytes) as s
SELECT
    formatReadableSize(s),
    table
FROM system.parts
GROUP BY table
ORDER BY s
FORMAT JSON

Can't get a query result from ClickHouse using JavaScript

I'm just getting acquainted with both CH and JS; well, that's how it turned out.. ((
CH is connected via @apla.
I send the simplest queries to an existing table roughly like this:
this.clickHouse.query ("SELECT count() FROM gStat.cand001")

The output is a stream, as planned... by the developers.
But where my result (the count) is, I don't understand :(
How do I get it?
Thanks
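
Per the README above, rows arrive via the stream's 'data' event; for a tiny result like count(), the promise interface is the simplest option. A sketch:

// the count arrives as the single row of the result
const { data } = await ch.querying('SELECT count() FROM gStat.cand001')
console.log(data) // e.g. [ [ 42 ] ] in the default JSONCompact format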

Stream large data

Topic open to community of this module

This topic is just for a discussion about the stream flow, so we users can better understand the way this plugin handles streams with ClickHouse.

Consider an amount of data that takes 10 GB. Making a read stream and then streaming it is quite easy with node-clickhouse. About that:

  1. Can I assume that this flow does not overload the ClickHouse database?
  2. Looking at the documentation, what's the difference between these two, and is there a difference in terms of performance?

A. Insert with stream

const writableStream = ch.query('INSERT INTO table FORMAT CSV', (err, result) => {})

B. Insert large data (without callback)

const clickhouseStream = ch.query('INSERT INTO table FORMAT TSV')
tsvStream.pipe(clickhouseStream)
  3. I read the ClickHouse docs. This setting makes things go right when set well. How can I use insert_quorum to make stream writes faster, considering a single server (without replicas)?

  4. With the node-clickhouse WriteStream, do I have to make my code take care of garbage collection, i.e. make use of pause/resume/drain?

Connection and Loadbalancing to multiple clickhouse hosts

We have a ClickHouse cluster used for replication, and we want to connect to all the hosts and ensure load balancing across healthy hosts, or automatic failover.
Is this something that is scheduled to be implemented in the near future? Are you open to pull requests?

Query progress

Is it possible to get information about query progress, in order to provide feedback like clickhouse-client does?

How to insert type 'Date'?

hey,

I'm trying to insert the ClickHouse type Date from a js Date().
I tried new Date().toISOString().substr(0,10), which worked fine, but reading it back returns a String.

What's the proper way of doing this?
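
Per the TSV section of the README above, the driver converts js Date objects itself when rows are written as arrays; a sketch:

// the driver formats Date objects for TSV when a row is written as an array
const stream = ch.query('INSERT INTO table FORMAT TSV', (err) => {
  if (err) console.error(err)
})
stream.write([new Date()])
stream.end()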

ON CLUSTER syntax makes the query execution hang

I've set up multiple 'replicas' and 'shards' with docker compose. If you use the ClickHouse 'ON CLUSTER' syntax, ClickHouse sends back plain text (not JSON) containing information about the replicas. The library expects JSON but does not handle this gracefully. The result is that the response never completes, since no 'end' or 'error' is emitted.

What clickhouse sends back:
clickhouse-shard1-replica1 9000 0 3 0
clickhouse-shard2-replica1 9000 0 2 0
clickhouse-shard2-replica2 9000 0 1 0
clickhouse-shard1-replica2 9000 0 0 0

In order to emit an end and just make it work (without providing results), I assume you need to add a stream.push(null) in clickhouse.js at line 160. I don't know how you want to tackle something like that, though.

Cannot use JSONEachRow format

I was trying to send an INSERT query with the JSONEachRow format, and all of my attempts failed.

While debugging I found two problems:

The line

if (!options.omitFormat && formatMatch) {
overrides the options.format = "JSONEachRow" I have explicitly set. Is that on purpose? If so, why?

The regexp

var formatRegexp = /FORMAT\s+(BlockTabSeparated|CSV|CSVWithNames|JSON|JSONCompact|JSONEachRow|Native|Null|Pretty|PrettyCompact|PrettyCompactMonoBlock|PrettyNoEscapes|PrettyCompactNoEscapes|PrettySpaceNoEscapes|PrettySpace|RowBinary|TabSeparated|TabSeparatedRaw|TabSeparatedWithNames|TabSeparatedWithNamesAndTypes|TSKV|Values|Vertical|XML)/i;
should have a \b on each side of the parenthesized group, otherwise e.g. JSON is matched instead of JSONEachRow.

So:

 var formatRegexp = /FORMAT\s+\b(BlockTabSeparated|CSV|CSVWithNames|JSON|JSONCompact|JSONEachRow|Native|Null|Pretty|PrettyCompact|PrettyCompactMonoBlock|PrettyNoEscapes|PrettyCompactNoEscapes|PrettySpaceNoEscapes|PrettySpace|RowBinary|TabSeparated|TabSeparatedRaw|TabSeparatedWithNames|TabSeparatedWithNamesAndTypes|TSKV|Values|Vertical|XML)\b/i; 

So options.format ended up being set to JSON whatever I did.

Suggest separating write and read in module clickhouse.js

In a real-world production scenario, we've found the current implementation of httpRequest in clickhouse.js to be a bit too complicated: it handles both reads (SELECT) and writes (INSERT). That makes the httpRequest part too hard to extend and brings some limitations, e.g.:

  1. No way to buffer the write response by setting wait_end_of_query. This causes the whole system to fail to handle ClickHouse server data-parsing errors.
  2. No way to read back response.headers['X-ClickHouse-Summary'] without hacking into the source code of this module.
  3. No way to fetch X-ClickHouse-Progress updates when reading back a large chunk of data.

Thus, I'd like to suggest separating write and read in module clickhouse.js, given their significantly different behaviours.

Cannot read property 'elapsed' of undefined

When doing a simple SELECT * FROM db.table, the above error is raised. I've checked the ClickHouse error logs and see no errors there, so it must be an error from the node-clickhouse package.

How to join tables by range of number

Hi, is there a way to do this kind of join in Clickhouse?

select * from tableA a
left join tableB b on a.mynumber between b.rangefrom and b.rangeto

Thanks.

handling data fails when using `with totals` in a query

hi

It fails when using with totals in a query

version v1.6.2

sql

 select sum(1) from tables group by name with totals

query options

{
 dataObjects: true,
 syncParser: true
}

output

Unknown error field: SyntaxError: Unexpected token C in JSON at position 2478

[2020-04-13 09:48:58.949 +0800] ERROR (6707 on psnails-MacBook-Pro.local):
    err: {
      "type": "Error",
      "message": "SyntaxError: Unexpected token C in JSON at position 2478",
      "stack":
          Error: SyntaxError: Unexpected token C in JSON at position 2478
              at parseError (/Users/psnail/Documents/thunder/projects/abacus/abacus-merchant-desktop/node_modules/@apla/clickhouse/src/parse-error.js:2:15)
              at errorHandler (/Users/psnail/Documents/thunder/projects/abacus/abacus-merchant-desktop/node_modules/@apla/clickhouse/src/clickhouse.js:29:13)
              at IncomingMessage.<anonymous> (/Users/psnail/Documents/thunder/projects/abacus/abacus-merchant-desktop/node_modules/@apla/clickhouse/src/clickhouse.js:160:12)
              at IncomingMessage.emit (events.js:323:22)
              at endReadableNT (_stream_readable.js:1204:12)
              at processTicksAndRejections (internal/process/task_queues.js:84:21)
    }

is it a bug, or is there an option to handle it?

thanks

node-clickhouse fails silently when used with "pipeline".

It's a common thing when dealing with streams to wrap them with a pipeline, like so:

const { pipeline } = require('stream');
const { promisify } = require('util');
const ClickHouse = require('@apla/clickhouse');

const ch = new ClickHouse({ host, port, user, password });

let input = SOME_INPUT_STREAM;
let output = ch.query("SOME_QUERY");

await promisify(pipeline)([
    input,
    output
]);

There is a big issue here when using @apla/clickhouse, because in case of an error the output stream will emit the finish event before the error event, so the promise will resolve (i.e. it will not reject).

So basically, using @apla/clickhouse with pipeline causes errors to fail silently!

To fix this issue, the stream created by the @apla/clickhouse query function should finish only after having errored, I guess.
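
A possible interim workaround (a sketch under the constraints described above, not a fix in the library) is to race the pipeline promise against an explicit 'error' listener on the output stream:

// reject explicitly on the output stream's 'error' event,
// since 'finish' may fire before 'error' with this driver
function run (input, output) {
  return Promise.race([
    promisify(pipeline)(input, output),
    new Promise((_, reject) => output.on('error', reject)),
  ])
}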

result is empty when using union all sql

hi

I execute this sql

select a from table1
union all 
select a from table2

the result will be empty, and there is no meta field in the result

but if I execute this SQL

select * from (
  select a from table1
  union all 
  select a from table2
)

it works

Is this a bug, or am I handling something wrong?

thanks

Unable to parse CSV

I am testing out bulk CSV inserts.

CREATE TABLE ch_test (
  `id` UInt64,
  `trans_date` DateTime,
  `sign` Int8
) ENGINE = CollapsingMergeTree(sign)

CSV Data:

1,"2019-04-17 04:15:41",1
2,"2019-04-18 04:15:41",1
3,"2019-04-19 04:15:41",1

The following is the error I am getting:
Error: Cannot parse input: expected \t before: ,"2019-04-17 04:15:41",1\n2,"2019-04-18 04:15:41",1\n3,"2019-04-19 04:15:41",1\n: (at row 1)Row 1:Column 0, name: id, type: UInt64, parsed text: "1"ERROR: garbage after UInt64: ",<DOUBLE QUOTE>2019-04-"(version 19.13.3.26 (official build))

Code snippet:

let statement = 'INSERT INTO ch_test';
var csvStream = fs.createReadStream (csvPath);
var clickhouseStream = ch.query (statement, {inputFormat: 'CSV'});
csvStream.pipe (clickhouseStream);

Why is the library forcing me to use \t (TSV)?
Any way to resolve this using CSV?

Define Enum type with xml config

Hello!

I would like to define Enum8 as an attribute type. I've tried this markup:

<attribute>
  <name>any_name</name>
  <type>Enum8('first'=1,'second'=2)</type>
  <null_value/>
</attribute>

but got an error:

Error: Unknown type Enum8('first'=1,'second'=2)

How can I define an Enum type in the xml config? I haven't found the appropriate docs.

Thanks!

ClickHouse auth problem

When I use a password containing '@' or '#', I get an auth failure with "wrong password". Maybe the password doesn't need encodeURIComponent?

Error: socket hang up

When I try to insert a very big file (23 GB) using the code below, it always returns an error:

{ Error: socket hang up
    at createHangUpError (_http_client.js:331:15)
    at Socket.socketOnEnd (_http_client.js:423:23)
    at emitNone (events.js:111:20)
    at Socket.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9) code: 'ECONNRESET' }

My insert code:

var tsvStream = fs.createReadStream ('data.tsv');  // file size of data.tsv is 23 GB
var clickhouseStream = clickHouse.query ('INSERT INTO test_table', {inputFormat: 'TSV'});
tsvStream.pipe (clickhouseStream);

How do I resolve this issue?
Also, I wonder: is there throttling in streaming inserts?

'end' event is not firing

Hello guys. I'm trying to implement an app that inserts into ClickHouse using your library.

chStream = ch.query('INSERT INTO megas.xovis_events', { format: 'JSONEachRow' })
chStream.on('error', err => {
     console.log(err)
     this.nackAll()
     chStream.end()
})
chStream.on('end', e => {
     this.ackAll()
     console.log('100 batch insertion done ' + new Date())
})
chStream.write(message)
chStream.end()

So after I end the stream manually, the 'end' event is not fired for some reason. Do you have any ideas? I can provide any additional info you need.

How to pass the argument

clickhouse-client --input_format_allow_errors_num=1
--input_format_allow_errors_ratio=0.1
--query="INSERT INTO test.formats_test FORMAT TSV"

how to pass the --input_format_allow_errors_ratio
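
Judging by the "Settings for query" example earlier in this README, such settings would presumably be passed through queryOptions; a sketch:

// sketch: per-query ClickHouse settings via queryOptions
const stream = ch.query('INSERT INTO test.formats_test FORMAT TSV', {
  queryOptions: {
    input_format_allow_errors_num: 1,
    input_format_allow_errors_ratio: 0.1,
  },
})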

Deprecate useless `format` option

The problem:

ClickHouse understands the FORMAT statement, which is accepted for both read (SELECT/DESCRIBE/SHOW/etc.) and write (INSERT) kinds of queries.

That means the format public option is redundant, because we'll always have to parse the FORMAT value from the query anyway.

For now, we have several issues related to it:

  • Hard to understand what to use:
    1. ch.query('INSERT INTO table FORMAT CSV') as described in CH docs

       Has bug #25

    2. ch.query ('INSERT INTO testTable', {format: 'CSV'}) as supported by node-clickhouse

       Not even described in the readme

    3. ch.query ('INSERT INTO testTable FORMAT CSV', {format: 'CSV'}) as a panic solution

       No comments

    4. ch.query ('INSERT INTO testTable', {inputFormat: 'CSV'}) as described by the node-clickhouse readme

       Not working

  • We have a bug #25: the query isn't separated from the data when using ch.query('INSERT INTO table FORMAT CSV') (fix: #39)
  • Our readme refers to the inputFormat option, which isn't implemented.

Proposed solution:

  • Mark the format option as deprecated, and remove it from the public options later
  • Improve handling of the FORMAT statement
  • Get rid of the unimplemented inputFormat references (it's useless too)

@apla do you agree with this solution?
I can add a deprecation warning to master for the current 1.x releases, and create a 2.0.0 branch with the deprecation.

Is there a way to run a SELECT with external data through Apla?

Hi,

I'm trying to encode a query that uses external data. Here's an example: 'values' is external data and must be supplied as form data.

SELECT number FROM numbers(20) WHERE number IN values

It's pretty easy to do this with node-fetch or similar request libraries: you just put the query and values_structure parameters in the URL and send the file as form data. Here's an example using node-fetch.

// Nab required modules. 
const fetch = require('node-fetch');
const fs = require('fs');
const FormData = require('form-data');

// Pull in a file with the external values. 
const formData = new FormData();
formData.append('values', fs.createReadStream('values.tsv'));

// Build the query URL.
const query = 'SELECT number FROM numbers(20) WHERE number IN values'
const values_struct = 'value Int32'
const url = "http://localhost:8123?query=" + query + "&values_structure=" + values_struct;

fetch(url, {
    method: 'POST',
    body: formData
})
  .then(response => response.text())
  .then(text => console.log(text))
  .catch(err => console.log(err))

Is there a way to do this in Apla? It seems Apla wants to post the query via the request body, which conflicts with using form data for external data files. Thanks!
