
clickhouse4j's Introduction

Clickhouse4j - a lighter and faster alternative to the official ClickHouse JDBC driver


The main differences between this and the official driver are:

  • Removed the Guava, Jackson, and Apache HttpClient dependencies;
  • Smaller size: an 850 KB jar vs. the original driver's 5.6 MB (about 7x smaller);
  • A number of micro-optimizations (for example, batch inserts are now 40% faster);
  • Added CopyManager;
  • Support for JSON and JSONCompact selects;
  • Compiled against Java 8, plus many other improvements.

Usage

<dependency>
    <groupId>cc.blynk.clickhouse</groupId>
    <artifactId>clickhouse4j</artifactId>
    <version>1.4.4</version>
</dependency>
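Once the jar is on the classpath, the driver is typically self-registering via the JDBC service loader, so plain `java.sql` calls are enough. A minimal usage sketch; the host, port, database, and query below are placeholder assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Clickhouse4jExample {

    // Builds a URL in the jdbc:clickhouse://<host>:<port>/<database> form.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:clickhouse://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = buildUrl("localhost", 8123, "test"); // placeholder host/db
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        } catch (Exception e) {
            // Without a running ClickHouse (or the driver on the classpath)
            // this fails; the shape of the calls is what matters here.
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```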

CopyManager usage

CopyManager is a utility class that helps read and write query results from/to a file, stream, or reader.

Select from DB to File
String query = "SELECT * from copy_manager_test.my_table FORMAT CSVWithNames";
Path outputFile = ...;

try (CopyManager copyManager = CopyManagerFactory.create(dataSource)) {
    copyManager.copyFromDb(query, outputFile);
}
//outputFile now has all the data and headers from the copy_manager_test DB and my_table table
Select from DB to File with prepared statement
try (Connection connection = dataSource.getConnection();
     PreparedStatement ps = connection.prepareStatement(sql);
     CopyManager copyManager = CopyManagerFactory.create(connection)) {
        ps.setLong(1, id);
        copyManager.copyFromDb(ps, outputStream);
}
Insert from File to DB
String query = "INSERT INTO copy_manager_test.my_table FORMAT CSV";
Path inputFile = ...;

try (CopyManager copyManager = CopyManagerFactory.create(dataSource)) {
    copyManager.copyToDb(query, inputFile);
}

//DB copy_manager_test and my_table table now has all csv data from the inputFile
Select as JSON
ResultSet rs = connection.createStatement().executeQuery("SELECT * FROM test.my_table FORMAT JSON");
if (rs.next()) {
    return rs.getString("json");
}

//response example:

{
	"meta":
	[
		{
			"name": "created",
			"type": "DateTime"
		},
		{
			"name": "value",
			"type": "Int32"
		}
	],

	"data":
	[
		{
			"created": "2019-11-17 11:31:22",
			"value": 1
		},
		{
			"created": "2019-11-17 11:31:22",
			"value": 2
		}
	],

	"rows": 2,

	"statistics":
	{
		"elapsed": 0.000312306,
		"rows_read": 2,
		"bytes_read": 16
	}
}

Migration from the official driver

All you need to do is replace:

ru.yandex.clickhouse.ClickHouseDriver with cc.blynk.clickhouse.ClickHouseDriver

URL syntax: jdbc:clickhouse://<host>:<port>[/<database>], e.g. jdbc:clickhouse://localhost:8123/test

JDBC Driver Class: cc.blynk.clickhouse.ClickHouseDriver

Additionally, if you have several ClickHouse instances, you can use BalancedClickhouseDataSource.

Build requirements

Building the JDBC client requires JDK 1.8 or higher.

Compiling with maven

mvn package -DskipTests=true

To build a jar with dependencies, use:

mvn package assembly:single -DskipTests=true


clickhouse4j's Issues

Add cache for the queries

A typical JDBC driver has an embedded LRU cache for queries, so in most apps each query is parsed once and then reused. That would make further parser optimizations unnecessary.

Writing data to a local table works fine, but writing to a distributed table fails

Error output:

scala> :paste
// Entering paste mode (ctrl-D to finish)

      lv1.repartition(30)
          .write
          .mode(SaveMode.Append)
          .format("jdbc")
          .option("url", ckUrlDa)
          .option("driver", driverClass4j)
          //.option("numPartitions", "1")
          .option("dbtable", "t_md_page_path_di_rep")
          .option("user", userMe)
          .option("password", passwordMe)
          .option("fetchsize", fetchSize)
          .option("batchsize", batchSize)
          .save()

// Exiting paste mode, now interpreting.


scala> :paste
// Entering paste mode (ctrl-D to finish)

      lv1.repartition(30)
          .write
          .mode(SaveMode.Append)
          .format("jdbc")
          .option("url", ckUrlDa)
          .option("driver", driverClass4j)
          //.option("numPartitions", "1")
          .option("dbtable", "t_md_page_path_di_dis")
          .option("user", userMe)
          .option("password", passwordMe)
          .option("fetchsize", fetchSize)
          .option("batchsize", batchSize)
          .save()

// Exiting paste mode, now interpreting.

cc.blynk.clickhouse.except.ClickHouseException: ClickHouse exception, code: 62, host: null, port: 0; Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 159: NOT NULL, "node_depth" INTEGER NOT NULL, "page_id_lv1" TEXT , "page_id_lv2" TEXT NOT NULL, "page_id_lv3" TEXT NOT NULL, "page_id_lv4" TEXT NOT NULL, "page_id_lv. Expected one of: DEFAULT, MATERIALIZED, ALIAS, COMMENT, CODEC, TTL, token, ClosingRoundBracket, Comma (version 19.15.3.6 (official build))

  at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:61)
  at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:31)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.checkForErrorAndThrow(DefaultHttpConnector.java:296)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.sendPostRequest(DefaultHttpConnector.java:175)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:62)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:74)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.sendRequest(ClickHouseStatementImpl.java:674)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeUpdate(ClickHouseStatementImpl.java:198)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:692)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:89)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:218)
  ... 59 elided
Caused by: java.lang.Throwable: Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 159: NOT NULL, "node_depth" INTEGER NOT NULL, "page_id_lv1" TEXT , "page_id_lv2" TEXT NOT NULL, "page_id_lv3" TEXT NOT NULL, "page_id_lv4" TEXT NOT NULL, "page_id_lv. Expected one of: DEFAULT, MATERIALIZED, ALIAS, COMMENT, CODEC, TTL, token, ClosingRoundBracket, Comma (version 19.15.3.6 (official build))

  at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:56)
  ... 70 more

Allow downloading CSV files?

Current flow:

select x, y from z;
var entry = new EntryDTO(x, y);
file.write(entry.toCSV())

It would be nice to have some analog of the CopyManager present in the Postgres driver, to download files directly instead of creating intermediate objects.

How can I implement batch insert?

I am new to ClickHouse and clickhouse4j, and I can't find any helpful documentation about batch inserts.

I think more example code would be very helpful.
Thanks.
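clickhouse4j supports the standard JDBC batch API (the driver notes above cite 40% faster batch inserts). A minimal sketch, where the URL and the test.my_table(created, value) table are placeholder assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertExample {

    // Hypothetical table and columns, for illustration only.
    static final String INSERT_SQL =
            "INSERT INTO test.my_table (created, value) VALUES (?, ?)";

    public static void main(String[] args) {
        try (Connection connection = DriverManager.getConnection(
                     "jdbc:clickhouse://localhost:8123/test"); // placeholder URL
             PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            for (int i = 0; i < 1000; i++) {
                ps.setString(1, "2019-11-17 11:31:22");
                ps.setInt(2, i);
                ps.addBatch();      // accumulate rows client-side
            }
            ps.executeBatch();      // send all accumulated rows at once
        } catch (Exception e) {
            // Fails without a running server; shown for the API shape.
            System.err.println("Insert failed: " + e.getMessage());
        }
    }
}
```

Accumulating many rows and sending them in one executeBatch call is generally much faster than row-by-row executeUpdate, since ClickHouse is optimized for large inserts.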

Improve CopyManager usability

It would be nice if CopyManager could accept a Statement or PreparedStatement.
Right now I have to use it like this:

copyManager.copyFromDb(((ClickHousePreparedStatement) ps).asSql(), zipOutputStream);

CopyManager should extend Closeable

Current approach:

try (Connection connection = dataSource.getConnection();
    OutputStream outputStream = Files.newOutputStream(outputFile, TRUNCATE_EXISTING)) {
    CopyManager copyManager = CopyManagerFactory.create(connection); //lightweight
    copyManager.copyFromDb(query, outputStream);
}

Could be improved with:

try ( CopyManager copyManager = CopyManagerFactory.create(dataSource.getConnection()); 
      OutputStream outputStream = Files.newOutputStream(outputFile, TRUNCATE_EXISTING)) {
    copyManager.copyFromDb(query, outputStream);
}

Client service uses all ports

Hi,
We are testing ClickHouse with your connector and you did a great job with it. But we have a problem: when we run our service in AWS as a Docker container and try to insert a lot of records (2 billion), the connector gradually uses up all the ports in the container and never releases them.

I tried closing every connection in a try/catch and getting a new connection for every batch insert. I also tried getting a new connection only when the previous one was closed; that was better because the service used fewer ports, but it was still thousands of ports.

...
        try (final Connection connection = dataSource.getConnection();
             final PreparedStatement preparedStatement = connection.prepareStatement(
                     ActivitySqlGenerator.createInsertPreparedStatement(getDbAndTableName(), QuoteType.BACK_QUOTE))) {
         connection.setAutoCommit(false);

        for (Activity activity: activities) {
            ActivitySqlGenerator.bindInsert(1, preparedStatement, activity);
            preparedStatement.addBatch();
        }

        preparedStatement.executeBatch();
        preparedStatement.clearBatch();
        connection.commit();

        } catch (Exception e) {
            logger.info("Failed to bulk insert.", e);
        }
...
    public ClickHouseConnection getConnection() throws SQLException {
        if (connection == null || connection.isClosed()) {
            this.connection = dataSource.getConnection();
        }
        return this.connection;
    }

...
        try
        {
            final Connection connection = this.getConnection();
            final PreparedStatement preparedStatement = connection.prepareStatement(
                    ActivitySqlGenerator.createInsertPreparedStatement(getDbAndTableName(), QuoteType.BACK_QUOTE));

        connection.setAutoCommit(false);

        for (Activity activity: activities) {
            ActivitySqlGenerator.bindInsert(1, preparedStatement, activity);
            preparedStatement.addBatch();
        }

        preparedStatement.executeBatch();
        preparedStatement.clearBatch();
        connection.commit();

        } catch (Exception e) {
            logger.info("Failed to bulk insert.", e);
        }
...

Can someone help? Is this a known problem, or am I doing something wrong?
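One likely culprit is the second variant above: its PreparedStatement (and the cached connection) is never closed, so the underlying socket can linger until it is garbage collected. try-with-resources guarantees close() runs even when the body throws, which this database-free sketch with a stand-in resource demonstrates:

```java
public class TryWithResourcesDemo {

    // A stand-in for a Connection/PreparedStatement that records close().
    static class TrackedResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Opens a resource in try-with-resources, fails mid-body, and
    // returns the resource so the caller can inspect its state.
    static TrackedResource useAndFail() {
        TrackedResource r = new TrackedResource();
        try (TrackedResource resource = r) {
            throw new RuntimeException("simulated insert failure");
        } catch (RuntimeException ignored) {
            // By the time we get here, close() has already run.
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(useAndFail().closed); // true
    }
}
```

Applying the same pattern to the second snippet (declaring both the connection and the prepared statement inside the try header) should release each port deterministically instead of leaving it to the GC.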

Remove CSV Manager

Right now it only makes things more complex (javadocs, examples, tests). At the moment we don't need this specialization.

How should I handle null values from Spark?

Caused by: java.lang.Throwable: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \t before: \\N\t1\t50\t\\N\t\\N\t\\N\t\\N\t9139\t6916\t9118\t482856\n2019-10-03\tv X21UD A\t1\t3.4.3.3\t10\t\\N\t1\t52\t\\N\t\\N\t\\N\t\\N\t195\t136\t180\t1261641\n2019-10-03\tv Y66i A\t1\t2.8.1.1\t15\t\\N\t1\t: (at row 1)

It seems that clickhouse4j wrote the Spark null values as \\N.

Cannot parse quoted string

I want to insert an Array(String) value into a table, but it fails:

Cannot parse quoted string: expected opening quote: (at row 1)
['001|002|003','001|002|003']

But I can insert it successfully via the CLI.

Implement dynamic IN CLAUSE

Problem:

String query = "... where id in (?)";

statement.setObject(1, listOfIds);

Result:

String query = "... where id in ([1,2,3,4])";

Should be:

String query = "... where id in (1,2,3,4)";
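Until dynamic IN expansion exists, a common workaround is to generate one placeholder per element and bind each value individually. A sketch with a hypothetical placeholders helper; the URL and table are placeholder assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Arrays;
import java.util.List;

public class InClauseExample {

    // Expands to "?,?,?" for n = 3, so each id binds to its own parameter.
    static String placeholders(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(',');
            sb.append('?');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L);
        String query = "SELECT * FROM test.my_table WHERE id IN ("
                + placeholders(ids.size()) + ")";
        try (Connection connection = DriverManager.getConnection(
                     "jdbc:clickhouse://localhost:8123/test"); // placeholder URL
             PreparedStatement ps = connection.prepareStatement(query)) {
            for (int i = 0; i < ids.size(); i++) {
                ps.setLong(i + 1, ids.get(i)); // JDBC parameters are 1-based
            }
            // ps.executeQuery() ...
        } catch (Exception e) {
            // Fails without a running server; shown for the API shape.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```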

How to handle a large query result?

My query result is huge, and I get a java.lang.OutOfMemoryError. How can I solve it?

java.lang.OutOfMemoryError
  at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
  at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
  at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
  at cc.blynk.clickhouse.util.guava.StreamUtils.copy(StreamUtils.java:70)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.sendPostRequest(DefaultHttpConnector.java:182)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:62)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:74)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.sendRequest(ClickHouseStatementImpl.java:674)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:168)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:150)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:137)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:88)
  ... 59 elided

Thanks very much.
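Since the driver buffers the whole HTTP response in memory (note StreamUtils.copy into a ByteArrayOutputStream in the trace above), one workaround is to page through the result with LIMIT/OFFSET so each request stays small; CopyManager.copyFromDb may also help by writing the response to a file or stream instead of a ResultSet. A paging sketch with a placeholder URL, table, and page size:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PagedQueryExample {

    // Appends a LIMIT/OFFSET window to a base query.
    static String page(String baseQuery, int limit, long offset) {
        return baseQuery + " LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // An ORDER BY keeps the paging deterministic.
        String base = "SELECT * FROM test.my_table ORDER BY created"; // placeholder
        try (Connection connection = DriverManager.getConnection(
                     "jdbc:clickhouse://localhost:8123/test")) {      // placeholder URL
            int limit = 100_000;
            long offset = 0;
            while (true) {
                try (Statement st = connection.createStatement();
                     ResultSet rs = st.executeQuery(page(base, limit, offset))) {
                    int rows = 0;
                    while (rs.next()) {
                        rows++; // process the row here
                    }
                    if (rows < limit) {
                        break;  // short page means we reached the end
                    }
                    offset += limit;
                }
            }
        } catch (Exception e) {
            // Fails without a running server; shown for the API shape.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```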
