
clickhouse4j's Introduction

Clickhouse4j - a lighter and faster alternative to the official ClickHouse JDBC driver


The main differences between this and the official driver are:

  • Removed the Guava, Jackson, and Apache HttpClient dependencies;
  • Smaller size: an 850 KB jar vs. the original driver's 5.6 MB (about 7x smaller);
  • A number of micro-optimizations (for example, batch inserts are now 40% faster);
  • Added CopyManager;
  • Support for JSON and JSONCompact selects;
  • Compiled against Java 8, plus many other improvements.

Usage

<dependency>
    <groupId>cc.blynk.clickhouse</groupId>
    <artifactId>clickhouse4j</artifactId>
    <version>1.4.4</version>
</dependency>
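Once the jar is on the classpath, the driver is typically self-registering via the JDBC service loader, so plain `java.sql` calls are enough. A minimal usage sketch; the host, port, database, and query below are placeholder assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Clickhouse4jExample {

    // Builds a URL in the jdbc:clickhouse://<host>:<port>/<database> form.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:clickhouse://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = buildUrl("localhost", 8123, "test"); // placeholder host/db
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        } catch (Exception e) {
            // Without a running ClickHouse (or the driver on the classpath)
            // this fails; the shape of the calls is what matters here.
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```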

CopyManager usage

CopyManager is a utility class that helps read and write query results from/to a file, stream, or reader.

Select from DB to File
String query = "SELECT * from copy_manager_test.my_table FORMAT CSVWithNames";
Path outputFile = ...;

try (CopyManager copyManager = CopyManagerFactory.create(dataSource)) {
    copyManager.copyFromDb(query, outputFile);
}
//outputFile now has all the data and headers from the copy_manager_test DB and my_table table
Select from DB to File with prepared statement
try (Connection connection = dataSource.getConnection();
     PreparedStatement ps = connection.prepareStatement(sql);
     CopyManager copyManager = CopyManagerFactory.create(connection)) {
        ps.setLong(1, id);
        copyManager.copyFromDb(ps, outputStream);
}
Insert from File to DB
String query = "INSERT INTO copy_manager_test.my_table FORMAT CSV";
Path inputFile = ...;

try (CopyManager copyManager = CopyManagerFactory.create(dataSource)) {
    copyManager.copyToDb(query, inputFile);
}

//DB copy_manager_test and my_table table now has all csv data from the inputFile
Select as JSON
ResultSet rs = connection.createStatement().executeQuery("SELECT * FROM test.my_table FORMAT JSON");
if (rs.next()) {
    return rs.getString("json");
}

//response example:

{
	"meta":
	[
		{
			"name": "created",
			"type": "DateTime"
		},
		{
			"name": "value",
			"type": "Int32"
		}
	],

	"data":
	[
		{
			"created": "2019-11-17 11:31:22",
			"value": 1
		},
		{
			"created": "2019-11-17 11:31:22",
			"value": 2
		}
	],

	"rows": 2,

	"statistics":
	{
		"elapsed": 0.000312306,
		"rows_read": 2,
		"bytes_read": 16
	}
}

Migration from the official driver

All you need to do is replace:

ru.yandex.clickhouse.ClickHouseDriver with cc.blynk.clickhouse.ClickHouseDriver

URL syntax: jdbc:clickhouse://<host>:<port>[/<database>], e.g. jdbc:clickhouse://localhost:8123/test

JDBC Driver Class: cc.blynk.clickhouse.ClickHouseDriver

Additionally, if you have several ClickHouse instances, you can use BalancedClickhouseDataSource.

Build requirements

Building the JDBC client requires JDK 1.8 or higher.

Compiling with maven

mvn package -DskipTests=true

To build a jar with dependencies, use:

mvn package assembly:single -DskipTests=true


clickhouse4j's Issues

Add cache for the queries

A typical JDBC driver has an embedded LRU cache for queries, so in most apps each query is parsed once and then reused. That would make further parser optimizations unnecessary.

Writing data to a local table works fine, but writing to a distributed table fails

Error output:

scala> :paste
// Entering paste mode (ctrl-D to finish)

      lv1.repartition(30)
          .write
          .mode(SaveMode.Append)
          .format("jdbc")
          .option("url", ckUrlDa)
          .option("driver", driverClass4j)
          //.option("numPartitions", "1")
          .option("dbtable", "t_md_page_path_di_rep")
          .option("user", userMe)
          .option("password", passwordMe)
          .option("fetchsize", fetchSize)
          .option("batchsize", batchSize)
          .save()

// Exiting paste mode, now interpreting.


scala> :paste
// Entering paste mode (ctrl-D to finish)

      lv1.repartition(30)
          .write
          .mode(SaveMode.Append)
          .format("jdbc")
          .option("url", ckUrlDa)
          .option("driver", driverClass4j)
          //.option("numPartitions", "1")
          .option("dbtable", "t_md_page_path_di_dis")
          .option("user", userMe)
          .option("password", passwordMe)
          .option("fetchsize", fetchSize)
          .option("batchsize", batchSize)
          .save()

// Exiting paste mode, now interpreting.

cc.blynk.clickhouse.except.ClickHouseException: ClickHouse exception, code: 62, host: null, port: 0; Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 159: NOT NULL, "node_depth" INTEGER NOT NULL, "page_id_lv1" TEXT , "page_id_lv2" TEXT NOT NULL, "page_id_lv3" TEXT NOT NULL, "page_id_lv4" TEXT NOT NULL, "page_id_lv. Expected one of: DEFAULT, MATERIALIZED, ALIAS, COMMENT, CODEC, TTL, token, ClosingRoundBracket, Comma (version 19.15.3.6 (official build))

  at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:61)
  at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:31)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.checkForErrorAndThrow(DefaultHttpConnector.java:296)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.sendPostRequest(DefaultHttpConnector.java:175)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:62)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:74)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.sendRequest(ClickHouseStatementImpl.java:674)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeUpdate(ClickHouseStatementImpl.java:198)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:692)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:89)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:218)
  ... 59 elided
Caused by: java.lang.Throwable: Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 159: NOT NULL, "node_depth" INTEGER NOT NULL, "page_id_lv1" TEXT , "page_id_lv2" TEXT NOT NULL, "page_id_lv3" TEXT NOT NULL, "page_id_lv4" TEXT NOT NULL, "page_id_lv. Expected one of: DEFAULT, MATERIALIZED, ALIAS, COMMENT, CODEC, TTL, token, ClosingRoundBracket, Comma (version 19.15.3.6 (official build))

  at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:56)
  ... 70 more

Allow downloading CSV files?

Current flow:

select x, y from z;
var entry = new EntryDTO(x, y);
file.write(entry.toCSV())

It would be nice to have some analog of the CopyManager present in the Postgres driver, to download files directly instead of creating intermediate objects.

How can I implement batch insert?

I am new to ClickHouse and clickhouse4j, and I can't find any helpful documentation about batch inserts.

I think more example code would be very helpful.
Thanks.
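clickhouse4j supports the standard JDBC batch API (the driver notes above cite 40% faster batch inserts). A minimal sketch, where the URL and the test.my_table(created, value) table are placeholder assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertExample {

    // Hypothetical table and columns, for illustration only.
    static final String INSERT_SQL =
            "INSERT INTO test.my_table (created, value) VALUES (?, ?)";

    public static void main(String[] args) {
        try (Connection connection = DriverManager.getConnection(
                     "jdbc:clickhouse://localhost:8123/test"); // placeholder URL
             PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            for (int i = 0; i < 1000; i++) {
                ps.setString(1, "2019-11-17 11:31:22");
                ps.setInt(2, i);
                ps.addBatch();      // accumulate rows client-side
            }
            ps.executeBatch();      // send all accumulated rows at once
        } catch (Exception e) {
            // Fails without a running server; shown for the API shape.
            System.err.println("Insert failed: " + e.getMessage());
        }
    }
}
```

Accumulating many rows and sending them in one executeBatch call is generally much faster than row-by-row executeUpdate, since ClickHouse is optimized for large inserts.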

Improve CopyManager usability

It would be nice if CopyManager could accept a Statement or PreparedStatement.
Right now I have to use it like this:

copyManager.copyFromDb(((ClickHousePreparedStatement) ps).asSql(), zipOutputStream);

CopyManager should extend Closeable

Current approach:

try (Connection connection = dataSource.getConnection();
    OutputStream outputStream = Files.newOutputStream(outputFile, TRUNCATE_EXISTING)) {
    CopyManager copyManager = CopyManagerFactory.create(connection); //lightweight
    copyManager.copyFromDb(query, outputStream);
}

Could be improved with:

try ( CopyManager copyManager = CopyManagerFactory.create(dataSource.getConnection()); 
      OutputStream outputStream = Files.newOutputStream(outputFile, TRUNCATE_EXISTING)) {
    copyManager.copyFromDb(query, outputStream);
}

Client service uses all ports

Hi,
We are testing ClickHouse with your connector and you did a great job with it. But we have a problem: when we run our service in AWS as a Docker container and try to insert a lot of records (2 billion), the connector gradually uses up all the ports in the container and never releases them.

I tried closing every connection in a try/catch and getting a new connection for every batch insert. I also tried getting a new connection only when the previous one was closed; that was better because the service used fewer ports, but it was still thousands of ports.

...
        try (final Connection connection = dataSource.getConnection();
             final PreparedStatement preparedStatement = connection.prepareStatement(
                     ActivitySqlGenerator.createInsertPreparedStatement(getDbAndTableName(), QuoteType.BACK_QUOTE))) {
         connection.setAutoCommit(false);

        for (Activity activity: activities) {
            ActivitySqlGenerator.bindInsert(1, preparedStatement, activity);
            preparedStatement.addBatch();
        }

        preparedStatement.executeBatch();
        preparedStatement.clearBatch();
        connection.commit();

        } catch (Exception e) {
            logger.info("Failed to bulk insert.", e);
        }
...
    public ClickHouseConnection getConnection() throws SQLException {
        if (connection == null || connection.isClosed()) {
            this.connection = dataSource.getConnection();
        }
        return this.connection;
    }

...
        try
        {
            final Connection connection = this.getConnection();
            final PreparedStatement preparedStatement = connection.prepareStatement(
                    ActivitySqlGenerator.createInsertPreparedStatement(getDbAndTableName(), QuoteType.BACK_QUOTE));

        connection.setAutoCommit(false);

        for (Activity activity: activities) {
            ActivitySqlGenerator.bindInsert(1, preparedStatement, activity);
            preparedStatement.addBatch();
        }

        preparedStatement.executeBatch();
        preparedStatement.clearBatch();
        connection.commit();

        } catch (Exception e) {
            logger.info("Failed to bulk insert.", e);
        }
...

Can someone help? Is this a known problem, or am I doing something wrong?
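One likely culprit is the second variant above: its PreparedStatement (and the cached connection) is never closed, so the underlying socket can linger until it is garbage collected. try-with-resources guarantees close() runs even when the body throws, which this database-free sketch with a stand-in resource demonstrates:

```java
public class TryWithResourcesDemo {

    // A stand-in for a Connection/PreparedStatement that records close().
    static class TrackedResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Opens a resource in try-with-resources, fails mid-body, and
    // returns the resource so the caller can inspect its state.
    static TrackedResource useAndFail() {
        TrackedResource r = new TrackedResource();
        try (TrackedResource resource = r) {
            throw new RuntimeException("simulated insert failure");
        } catch (RuntimeException ignored) {
            // By the time we get here, close() has already run.
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(useAndFail().closed); // true
    }
}
```

Applying the same pattern to the second snippet (declaring both the connection and the prepared statement inside the try header) should release each port deterministically instead of leaving it to the GC.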

Remove CSV Manager

Right now it only makes things more complex (javadocs, examples, tests). At the moment we don't need this specialization.

How should I handle null values from Spark?

Caused by: java.lang.Throwable: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \t before: \\N\t1\t50\t\\N\t\\N\t\\N\t\\N\t9139\t6916\t9118\t482856\n2019-10-03\tv X21UD A\t1\t3.4.3.3\t10\t\\N\t1\t52\t\\N\t\\N\t\\N\t\\N\t195\t136\t180\t1261641\n2019-10-03\tv Y66i A\t1\t2.8.1.1\t15\t\\N\t1\t: (at row 1)

It seems that clickhouse4j wrote the Spark null values as \\N.

Cannot parse quoted string

I want to insert an Array(String) value into a table, but it fails:

Cannot parse quoted string: expected opening quote: (at row 1)
['001|002|003','001|002|003']

But I can insert it successfully via the CLI.

Implement dynamic IN CLAUSE

Problem:

String query = "... where id in (?)";

statement.setObject(1, listOfIds);

Result:

String query = "... where id in ([1,2,3,4])";

Should be:

String query = "... where id in (1,2,3,4)";
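Until dynamic IN expansion exists, a common workaround is to generate one placeholder per element and bind each value individually. A sketch with a hypothetical placeholders helper; the URL and table are placeholder assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Arrays;
import java.util.List;

public class InClauseExample {

    // Expands to "?,?,?" for n = 3, so each id binds to its own parameter.
    static String placeholders(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(',');
            sb.append('?');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L);
        String query = "SELECT * FROM test.my_table WHERE id IN ("
                + placeholders(ids.size()) + ")";
        try (Connection connection = DriverManager.getConnection(
                     "jdbc:clickhouse://localhost:8123/test"); // placeholder URL
             PreparedStatement ps = connection.prepareStatement(query)) {
            for (int i = 0; i < ids.size(); i++) {
                ps.setLong(i + 1, ids.get(i)); // JDBC parameters are 1-based
            }
            // ps.executeQuery() ...
        } catch (Exception e) {
            // Fails without a running server; shown for the API shape.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```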

How to handle a large query result?

My query result is huge, and I get a java.lang.OutOfMemoryError. How can I solve it?

java.lang.OutOfMemoryError
  at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
  at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
  at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
  at cc.blynk.clickhouse.util.guava.StreamUtils.copy(StreamUtils.java:70)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.sendPostRequest(DefaultHttpConnector.java:182)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:62)
  at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:74)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.sendRequest(ClickHouseStatementImpl.java:674)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:168)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:150)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:137)
  at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:88)
  ... 59 elided

Thanks very much.
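Since the driver buffers the whole HTTP response in memory (note StreamUtils.copy into a ByteArrayOutputStream in the trace above), one workaround is to page through the result with LIMIT/OFFSET so each request stays small; CopyManager.copyFromDb may also help by writing the response to a file or stream instead of a ResultSet. A paging sketch with a placeholder URL, table, and page size:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PagedQueryExample {

    // Appends a LIMIT/OFFSET window to a base query.
    static String page(String baseQuery, int limit, long offset) {
        return baseQuery + " LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // An ORDER BY keeps the paging deterministic.
        String base = "SELECT * FROM test.my_table ORDER BY created"; // placeholder
        try (Connection connection = DriverManager.getConnection(
                     "jdbc:clickhouse://localhost:8123/test")) {      // placeholder URL
            int limit = 100_000;
            long offset = 0;
            while (true) {
                try (Statement st = connection.createStatement();
                     ResultSet rs = st.executeQuery(page(base, limit, offset))) {
                    int rows = 0;
                    while (rs.next()) {
                        rows++; // process the row here
                    }
                    if (rows < limit) {
                        break;  // short page means we reached the end
                    }
                    offset += limit;
                }
            }
        } catch (Exception e) {
            // Fails without a running server; shown for the API shape.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```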
