
hetu-core's Introduction

Overview

openLooKeng is a drop-in engine that enables in-situ analytics on any data, anywhere, including geographically remote data sources. It provides a global view of all of your data via its SQL 2003 interface. With high availability, auto-scaling, and built-in caching and indexing support, openLooKeng is ready for enterprise workloads that require reliability.

The goal of openLooKeng is to support data exploration, ad hoc queries, and batch processing with near-real-time latency ranging from 100+ ms to minutes, without moving your data around. openLooKeng also supports hierarchical deployments that enable geographically remote openLooKeng clusters to participate in the same query. With its cross-region query plan optimization capability, queries involving remote data can achieve close to "local" performance.

Application Scenarios

Cross-Source Heterogeneous Query Scenario

Data management systems such as RDBMS (MySQL, Oracle) and NoSQL (HBase, ES, Kafka) are widely used across customers' application systems. As data volumes grow and data management matures, customers gradually build data warehouses based on Hive or MPPDB. These data storage systems are often isolated from each other, resulting in independent data islands. Data analysts often suffer from the following problems:

  1. Without knowing where the data you need resides or how to access it, you cannot build new service models on top of massive data.
  2. Querying different data sources requires different connection modes or clients and different SQL dialects. These differences add learning costs and complicate application development logic.
  3. Unless the data is first aggregated in one place, federated queries across different systems cannot be performed.

openLooKeng can be used to run federated queries across data sources such as RDBMS, NoSQL, Hive, and MPPDB data warehouses. With the cross-source heterogeneous query capability of openLooKeng, data analysts can quickly analyze massive data.
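
For example, a single SQL statement can join tables that live in different systems. The sketch below is illustrative only: the mysql and hive catalog names, schemas, and tables are hypothetical and assume the corresponding connectors are configured.

SELECT o.order_id, o.amount, c.customer_name
FROM hive.sales.orders o              -- fact table in the Hive warehouse
JOIN mysql.crm.customers c            -- dimension table in MySQL
  ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2021-01-01';

openLooKeng resolves each catalog prefix to its connector, so the analyst writes one SQL dialect regardless of where the data resides.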

Cross-domain and cross-DC query

In a two-level or multi-level data center scenario, for example a province-city data center or a headquarters-branch data center, users often need to query data from both the provincial (headquarters) and municipal (branch) data centers. The bottleneck of such cross-domain queries is the network between the data centers (insufficient bandwidth, long delays, packet loss), which leads to long query latency and unstable performance. openLooKeng provides a cross-domain, cross-DC solution designed for these queries: openLooKeng clusters are deployed in multiple DCs. After the openLooKeng cluster in DC2 completes its computation, it transmits the result over the network to the openLooKeng cluster in DC1, which completes the aggregation. Because only computed results are transmitted between openLooKeng clusters, network problems caused by insufficient bandwidth and packet loss are avoided, which solves the cross-domain query problem to some extent.

Separated Storage & Compute

openLooKeng itself does not have a storage engine, but it can query data stored in different data sources. It is therefore a typical storage-compute separated system, which makes it easy to scale computing and storage resources independently. This architecture suits dynamically expanding clusters and enables quick elastic scaling of resources.

Quick Data Exploration

Customers hold large amounts of data. To use it, they usually build a dedicated data warehouse, which incurs extra labor costs for maintenance plus the time cost of ETL. For customers who need to explore data quickly but do not want to build a dedicated data warehouse, replicating and loading the data into a warehouse is time-consuming and labor-intensive. Instead, openLooKeng can use standard SQL to define a virtual data mart and connect to each data source through its cross-source heterogeneous query capability. The analysis tasks users want to explore can then be defined at the semantic layer of this virtual data mart. With openLooKeng's data virtualization capability, customers can quickly build exploration and analysis services over multiple data sources without constructing complex, dedicated data warehouses.
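
As a minimal sketch of the idea (assuming a virtual data mart catalog named vdm is configured; the schema, view, and source table names are hypothetical), the semantic layer is ordinary SQL views defined over the connected sources:

CREATE SCHEMA vdm.exploration;
CREATE VIEW vdm.exploration.customer_orders AS
SELECT c.customer_id, c.customer_name, o.order_id, o.amount
FROM mysql.crm.customers c
JOIN hive.sales.orders o ON o.customer_id = c.customer_id;
-- analysts then explore the mart as if it were a local table:
SELECT * FROM vdm.exploration.customer_orders LIMIT 10;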

Key Features

Cross Data Center Connector

A dedicated connector connects directly to another openLooKeng cluster to enable collaboration among multiple DCs. The key technologies are as follows:

  1. Parallel data access: Data sources are accessed concurrently to improve efficiency, and clients can fetch data from the server concurrently to accelerate data retrieval.
  2. Data compression: Data is compressed with the GZIP algorithm before serialization for transmission, reducing the amount of data sent over the network.
  3. Cross-DC dynamic filtering: Data is filtered to reduce the amount pulled from the remote end, protecting network stability and improving query efficiency.
  4. High availability: openLooKeng supports active-active (AA) coordinators, so a proxy (for example, Nginx) can load-balance requests among coordinators to achieve high availability. If one coordinator fails, the availability of the entire cluster is not affected.

Dynamic Filtering

In multi-table join scenarios with low correlation, most probe-side rows are read only to be filtered out because they do not match the join condition, causing unnecessary join computation, I/O reads, and network transfer. Dynamic filtering generates filter conditions at query runtime from the join condition and the data read from the build-side table, and applies them in the table-scan phase of the probe-side table. This reduces the volume of probe-side data participating in the join, effectively cutting network transfer and improving performance by about 30%.
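
To make the mechanism concrete, consider this hypothetical star-schema query, where fact_sales is the large probe-side table and dim_store the small build-side table:

SELECT s.sale_id, s.amount
FROM fact_sales s
JOIN dim_store d ON s.store_id = d.store_id
WHERE d.region = 'EU';
-- Only a few stores match region = 'EU'. Dynamic filtering collects their
-- store_id values while the hash table is built and pushes that set into the
-- scan of fact_sales, so non-matching rows are never read, transferred, or
-- probed against the join.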

Index

openLooKeng improves query efficiency by creating indexes over existing data and storing the indexes outside the data source. Bitmap is a bitmap index that records binary information; it suits AND operations and can be queried quickly via a dictionary. Bloom uses a bit array to represent a set and can quickly determine whether a value exists in the set (only equality predicates are supported). Min-max records the maximum and minimum values in a file and suits predicates with greater-than or less-than conditions.
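
A hedged sketch of index creation follows; the exact DDL and supported properties are defined in the openLooKeng heuristic index documentation, and the table and column names here are hypothetical:

-- bloom index for equality lookups on a high-cardinality column
CREATE INDEX idx_orders_custkey USING bloom ON hive.sales.orders (custkey);
-- min-max index for range predicates
CREATE INDEX idx_orders_price USING minmax ON hive.sales.orders (totalprice);
-- a later query such as
SELECT * FROM hive.sales.orders WHERE custkey = 1234;
-- can skip files or stripes that the index proves cannot contain matches.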

Cache

openLooKeng maintains multiple caches, such as a metadata cache, an execution plan cache, and an ORC row data cache, to improve query performance.

Metadata Cache

All connectors that use JDBC connections cache metadata during the first query to improve the performance of subsequent queries.

ORC Row Data Cache

For ORC files, the ORC row data cache provides an efficient way to cache frequently accessed data and reduce query latency. Administrators can create a cache for specific tables and partitions.
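
A hedged sketch of how an administrator might define such a cache (this assumes the data cache feature is enabled; the table and partition predicate are hypothetical, and the exact statements are described in the openLooKeng documentation):

CACHE TABLE hive.sales.orders WHERE order_date = date '2021-06-01';
-- SHOW CACHE lists the cached predicates; DROP CACHE removes one.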

Execution Plan Cache

The execution plan is cached after the first query rather than discarded after each query request, thereby reducing the preprocessing time and resources required for subsequent queries. By caching these plans, the time-consuming plan-generation step can be skipped.

Obtaining the Design Document

Click here to obtain the design document.

Next Steps

Developer Guide

hetu-core's People

Contributors

andy12383, armlly, ayushgupta3, chenpingzeng, danielscorpian, debasatwa29, doubledue, everypp, farhan3, gitzhangjingfang, haochending, ikishore, it-is-a-robot, jiwenyu0531, lizheng920625, neerajunnikrishnan, nitin-kashyap, peiwangdb, qiaominna, rajeevrastogi, sandeepkampati, sandygao, shuaiwang999, sumanth43, sundarannamalai, vinayakumarb, vincent-weng, wulinzi312, x26guo, yuhaijun87

hetu-core's Issues

The worker page information in the Web interface cannot be loaded correctly

Software Environment:

  • OpenLooKeng version (source or binary):
    1.2.0
  • OS platform & distribution (eg., Linux Ubuntu 16.04):
    CentOS Linux release 7.6.1810 (Core)
  • Java version:
    java version "1.8.0_291"

Describe the current behavior

The information in the Worker interface does not load correctly, as shown in the attached screenshot.

The corresponding log information is as follows:
2021-05-30T15:49:16.933+0800 WARN http-worker-143 io.prestosql.server.ThrowableMapper Request failed for /v1/worker/c261ff75-b470-4eaa-ae5b-08eb125d3617/status java.nio.channels.AsynchronousCloseException at org.eclipse.jetty.client.util.InputStreamResponseListener$Input.close(InputStreamResponseListener.java:364) at java.io.FilterInputStream.close(FilterInputStream.java:181) at io.airlift.http.client.jetty.JettyHttpClient.execute(JettyHttpClient.java:536) at io.prestosql.server.WorkerResource.proxyJsonResponse(WorkerResource.java:83) at io.prestosql.server.WorkerResource.getStatus(WorkerResource.java:59) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)

Describe the expected behavior

Requesting the worker node status should return the status information correctly.

Steps to reproduce the issue

1. Log in to the hetu web interface
2. Click the "Home" tab, then click the "Run" button in the right window
3. Click the "Query History" tab, then select one of the query IDs in the right window
4. On the page that pops up, select the Host link on the Tasks tab

Related log/screenshots

Special notes for this issue

Add reverse function for varbinary

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Describe the expected behavior

Support reverse function for varbinary

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

spelling correction in tpch.md

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior: spelling mistake in the word "algorithm" in tpch.md

Describe the expected behavior: corrected spelling of the word "algorithm" in tpch.md

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Case and Spelling Corrections in Documents

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Case & spelling mistakes in vdm.md, table-pushdown.md, state-store.md

Describe the expected behavior

Corrected case & spelling mistakes in vdm.md, table-pushdown.md, state-store.md

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Spelling correction in hetu-docs/en/develop/connectors.md

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Describe the expected behavior

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Spell fix in types.md

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

unix epoch

Describe the expected behavior

Unix epoch

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Grammatical correction in vdm.md

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

A missing "and" in a sentence.

Describe the expected behavior

Added the missing "and".

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

English language correction in spill.md file

Software Environment:

  • OpenLooKeng version (source or binary): Latest master

  • OS platform & distribution (eg., Linux Ubuntu 16.04): NA

  • Java version:

Describe the current behavior

Describe the expected behavior

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

openLooKeng CLI querying a MySQL table reports a NullPointerException in the log file

Software Environment:

  • OpenLooKeng version (source or binary):
    Deployed via the official online deployment; the directory /opt/openlookeng/hetu-server-1.0.1/ exists, so it is presumably the 1.0.1 binary
  • OS platform & distribution (eg., Linux Ubuntu 16.04):
    Linux linvm 5.4.0-72-generic #80~18.04.1-Ubuntu SMP Mon Apr 12 23:26:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Java version:
    openjdk version "1.8.0_222"
    OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10)
    OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode)
  • MySQL version:
    mysqld Ver 5.7.33-0ubuntu0.18.04.1 for Linux on x86_64 ((Ubuntu))

Describe the current behavior

openLooKeng CLI runs the command: select * from mysql.pml.user;
It gives the right output, but the log file, /home/openlkadmin/.openlkadmin/logs/server.log, reports a NullPointerException.

Describe the expected behavior

The result is returned normally, and the log file should not report java.lang.NullPointerException.

Steps to reproduce the issue

  1. cat /opt/openlookeng/hetu-server-1.0.1/etc/catalog/mysql.properties
    connector.name=mysql
    connection-url=jdbc:mysql://localhost:3306
    connection-user=root
    connection-password=root

  2. Construct the mysql table:

DROP DATABASE IF EXISTS pml;
create database pml charset utf8;
use pml;
create table user(`Id` int not null AUTO_INCREMENT primary key, `Name` varchar(20) NOT NULL, `FacePath` varchar(4096) NOT NULL);
insert into user (Id,Name,FacePath) values (1001,'张三','/home/openlkadmin/team-1269929257/data/luoxiang.jpg');
insert into user (Id,Name,FacePath) values (1002,'李四','/home/openlkadmin/team-1269929257/data/luoyonghao.jpg');

  3. bash /opt/openlookeng/bin/auxiliary_tools/launcher.sh restart
  4. bash /opt/openlookeng/bin/openlk-cli
    lk> select * from mysql.pml.user;

Related log/screenshots

2021-04-23T16:43:20.189+0800	INFO	Query-20210423_084320_00008_enn83-613	stderr	Fri Apr 23 16:43:20 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2021-04-23T16:43:20.198+0800	INFO	Query-20210423_084320_00008_enn83-613	stderr	Fri Apr 23 16:43:20 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2021-04-23T16:43:20.251+0800	INFO	20210423_084320_00008_enn83.1.0-0-101	stderr	Fri Apr 23 16:43:20 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2021-04-23T16:43:20.298+0800	ERROR	20210423_084320_00008_enn83.1.0-0-101	io.prestosql.operator.Driver	Error closing operator 0 for task 20210423_084320_00008_enn83.1.0
java.lang.NullPointerException
	at com.mysql.jdbc.MysqlIO.clearInputStream(MysqlIO.java:899)
	at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:172)
	at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:6680)
	at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:851)
	at io.prestosql.plugin.jdbc.JdbcRecordCursor.close(JdbcRecordCursor.java:234)
	at io.prestosql.spi.connector.RecordPageSource.close(RecordPageSource.java:76)
	at io.prestosql.operator.TableScanOperator.finish(TableScanOperator.java:241)
	at io.prestosql.operator.TableScanOperator.close(TableScanOperator.java:230)
	at io.prestosql.operator.Driver.closeAndDestroyOperators(Driver.java:546)
	at io.prestosql.operator.Driver.processInternal(Driver.java:406)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_1_0_1____20210423_024616_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


2021-04-23T16:43:20.349+0800	INFO	dispatcher-query-22	io.prestosql.event.QueryMonitor	TIMELINE: Query 20210423_084320_00008_enn83 :: Transaction:[28fe24a8-e216-4003-b106-0c27d272fae5] :: elapsed 152ms :: planning 16ms :: waiting 23ms :: scheduling 40ms :: running 69ms :: finishing 27ms :: begin 2021-04-23T16:43:20.186+08:00 :: end 2021-04-23T16:43:20.338+08:00

Special notes for this issue

The result is correct, but the log reports an exception; this also occurs on openEuler.

spelling correction in kafka-tutorial.md

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Spelling mistake in the word "milliseconds" in kafka-tutorial.md

Describe the expected behavior

Corrected the spelling of "milliseconds" in kafka-tutorial.md

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Spelling correction in /hetu-docs/en/migration/from-hive.md file

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Spelling mistake in from-hive.md file

Describe the expected behavior

Corrected spelling mistake

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Spelling mistakes in hana.md

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):Linux

  • Java version: 1.8

Describe the current behavior

Spelling mistakes in hana.md

Describe the expected behavior

  • "throught" should be "through"
  • "Detabase" should be "Database"
  • "preject" should be "project"

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

spell fix in faq.md

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

On line 27, 'commuity' is used

Describe the expected behavior

It should be 'community'

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Support for Doris DB

Currently, our data analysis system is built on Doris DB, and we hope OLK can support Doris so that we can extend it to more analysis scenarios.

Enhancement: Add class loader isolation for connector instances with dynamic load catalogs

Software Environment:

  • OpenLooKeng version (source or binary):
    latest version
  • OS platform & distribution (eg., Linux Ubuntu 16.04):
    linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64 GNU/Linux
  • Java version:
    java version "1.8.0_162"
    Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
    Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)

Describe the current behavior

The class loader is the same for different connector instances.
Configuration (for Kerberos, LDAP, S3, etc.) leaks between two catalogs that use the same connector.
For more information: trinodb/trino#1711

Describe the expected behavior

Expect different class loaders for connector instances.

Special notes for this issue

When trying to add class loaders to isolate connector instances, dynamically adding/deleting catalogs works fine.
However, dynamically updating catalogs requires clearing expired class loaders in TypeDeserializer from JacksonModule.
Is there an elegant way to clear them and reload the updated class loaders?

grammar correction hetu-docs/en/develop/functions.md

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Spelling mistake in the word "combine" in hetu-docs/en/develop/functions.md

Describe the expected behavior

Corrected the spelling of "combine" in hetu-docs/en/develop/functions.md

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Inconsistent error reporting when schema does not exist

Software Environment:

  • OpenLooKeng version (source or binary):
    master

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version: all

Describe the current behavior

Describe the expected behavior

A specific error code must be reported when the schema does not exist

Related log/screenshots

NA

Special notes for this issue

jdbc connector does not support deletes and updates

The JDBC connector does not support deletes and updates. So far, no version with this functionality is available on the Internet.

What should I do if I want to support this feature?
If someone has already implemented this functionality, could you provide a code reference?

System.md spell error

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Doesn/'t

Describe the expected behavior

Doesn't

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Integer overflow happens when the date is beyond the representable range

Software Environment:

  • OpenLooKeng version 1.3.0:

  • OS platform & distribution Linux Ubuntu 20.04:

  • Java version:

Describe the current behavior

Integer overflow occurs when the date is beyond the representable range.

Describe the expected behavior

It should throw an ArithmeticException in case of integer overflow.

Steps to reproduce the issue

  1. Select date '5881580-07-12';

Related log/screenshots

Special notes for this issue

Kafka-Tutorial Spell Error

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Check for milliseconds spelling.

Describe the expected behavior

The spelling of "milliseconds" is corrected as per the dictionary.

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Add support for large number format function

Software Environment:

  • OpenLooKeng version (source or binary):1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Need to support a large number format function

Describe the expected behavior

Added support for a large number format function

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

For the window functions, OLK1.3.0 is 20%-50% slower than expected

Software Environment:

  • OpenLooKeng version (source or binary):1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):Ubuntu 20.04

  • Java version: 8 / 11

Describe the current behavior

When we run SQL with window functions, we find it is slower than expected.
Take the function rank() as an example:

select
brand_id
,category_id
,channel
,sub_channel
,uuid
,if(rank() over(partition by brand_id,uuid order by transaction_date)=1,'AA','BB') as user_group
,if(rank() over(partition by brand_id,category_id,channel,sub_channel,uuid order by transaction_date)=1,'AA','BB') as b_user_group
,if(rank() over(partition by category_id,channel,sub_channel,uuid order by transaction_date)=1,'CC','DD') as c_user_group
,if(rank() over(partition by brand_id,channel,sub_channel,uuid order by transaction_date)=1,'AA','BB') as bc_user_group
,transaction_date
from test.db_order_detail_by_day_item
where sub_channel='ASDFSDFSDF';

We find it is slower than trinodb 360.
Besides rank(), the other window functions should perhaps also be verified for performance.

Describe the expected behavior

OLK could incorporate the latest TrinoDB improvements to window functions.

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Gitee Issue: I4CVQX

bug: new queries cannot be submitted when one coordinator disconnects in a specific case

Software Environment:

  • OpenLooKeng version (source or binary):
    latest version
  • OS platform & distribution (eg., Linux Ubuntu 16.04):
    linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64 GNU/Linux
  • Java version:
    java version "1.8.0_162"
    Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
    Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)

Describe the current behavior

After submitting a big query that brings down one coordinator, new small queries cannot be submitted successfully.

Describe the expected behavior

The failure of one coordinator should not affect other new queries.

Steps to reproduce the issue

  1. Set node-scheduler.include-coordinator=true; you should also set experimental.reserved-pool-enabled=false, otherwise the coordinator may not go down.
  2. Submit a big query that brings down one coordinator directly, or run "kill -9 prestoServerPID" on the coordinator processing the query if you are sure it does not have enough memory for new queries.
  3. Submit new queries to other coordinators.

Related log/screenshots

lk> select 1;

Query 20210428_035158_00002_4tzge, QUEUED, 0 nodes, 0 splits

Special notes for this issue

Why can no new query be submitted?
The related bug: when a new query is submitted, group.canRunMore() always returns false,
so the new query is always queued.

Why does group.canRunMore() always return false?
Because the query stats in Hazelcast are not updated when the big query brings down one coordinator.
In fact, the client sees the server as gone after the big query is submitted, but the query status in Hazelcast stays "running", which affects canRunMore for other new queries.

Add random function taking range min, max

Software Environment:

  • OpenLooKeng version (source or binary):
    1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Add random function taking range min, max

Describe the expected behavior

Added random function taking range min, max

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Language Update

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Language is not updated

Describe the expected behavior

Language can be updated

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

olk fails to select Array type in Elasticsearch

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04): Linux

  • Java version:

Describe the current behavior

  1. Create an index in Elasticsearch 7.10 containing array data (like "songs" in the screenshot).
  2. Select from the index above; olk throws an error (see the attached screenshot).

Describe the expected behavior

Selecting array data should work.

Steps to reproduce the issue

  1. Do this in the Kibana UI at http://localhost:5601/app/dev_tools#/console (see screenshot).
  2. Now run the select query in olk (see screenshot).

Related log/screenshots

Special notes for this issue

Support Hive tables with customized delimiters

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Failed to query Hive tables with customized delimiters.
How can openLooKeng support this scenario?

Describe the expected behavior

openLooKeng should support querying Hive tables with customized delimiters.

Steps to reproduce the issue

1. Configure hive-site.xml to support customized delimiters
2. In beeline, create a Hive table with customized delimiters
3. Load a data file with customized delimiters into the Hive table
4. The query succeeds in beeline
5. The query fails in openLooKeng

Related log/screenshots

Special notes for this issue

Spelling correction in spill.md

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Spelling mistake in the word "cumulated"

Describe the expected behavior

Corrected the spelling mistake of "cumulated" to "accumulated" in the spill.md file

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

spelling correction in elasticsearch.md

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior: spelling mistake in the word "openLooKeng" in elasticsearch.md

Describe the expected behavior: corrected spelling of the word "openLooKeng" in elasticsearch.md

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Spelling mistake in spill.md

Software Environment:

  • OpenLooKeng version (source or binary): Latest master

  • OS platform & distribution (eg., Linux Ubuntu 16.04): NA.

  • Java version:

Describe the current behavior

Describe the expected behavior

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Docker Build Failed

Software Environment:

  • OpenLooKeng source code hetu-core:

  • OS platform & distribution Linux Ubuntu 20.04

Expected behavior

The Docker image should build successfully

Steps to reproduce the issue

1. Change your terminal to the root user using the sudo -i command.
2. Navigate to the hetu-core directory.
3. Run docker build -f ./docker/Dockerfile -t hetucore:latest "docker"
4. It takes some time; at the 8th step it shows:
"COPY failed: file not found in build context or excluded by .dockerignore: stat hetu-server-: file does not exist"

Related log/screenshots

I'm attaching a screenshot of my terminal.

The execution time printed from CLI is not correct

Software Environment:

  • OpenLooKeng version (source or binary):
    OLK 1.2.0
  • OS platform & distribution (eg., Linux Ubuntu 16.04):
    ALL
  • Java version:

Describe the current behavior

The execution time printed by the olk 1.2.0 CLI is not the same as trino 358.

Describe the expected behavior

The CLI output should be the same.

Steps to reproduce the issue

  1. Execute the same query. Here is the info printed by olk 1.2.0 and trinodb 358:

olk1.2.0

lk:tpcds_orc_hive_1000> select count(1) from web_sales where ws_sold_date_sk=2452594;
_col0
791846
(1 row)

Query 20210701_014752_00011_aa5tu, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
0:00 [792K rows, 16.5KB] [2.58M rows/s, 53.8KB/s]

trino358

trino:tpcds_orc_hive_1000> select count(1) from web_sales where ws_sold_date_sk=2452594;
_col0
791846
(1 row)

Query 20210701_014504_00013_fg5by, FINISHED, 2 nodes
Splits: 35 total, 35 done (100.00%)
0.21 [792K rows, 537B] [3.7M rows/s, 2.45KB/s]

By checking the execution info on the native page, we can see that:

  • In both the olk and trinodb native pages, the execution time is 0.21
  • The execution time printed by the olk 1.2.0 CLI is not correct

Related log/screenshots

Special notes for this issue

Spelling errors in tpch.md

Software Environment:

  • OpenLooKeng version (source or binary): Latest master

  • OS platform & distribution (eg., Linux Ubuntu 16.04): ALL

  • Java version:

Describe the current behavior

NA

Describe the expected behavior

NA

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Creating a carbondata table from a mysql query (CREATE TABLE AS) does not work

Software Environment:

  • OpenLooKeng version (source or binary):
    010

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

1.8

Describe the expected behavior

Create a carbondata table from a mysql query. Does openlookeng support such an operation?

Steps to reproduce the issue

1. Register a mysql data source and a carbondata source with openlookeng
2. Run a query like 'create TABLE carbondata.default.table as select something from mysql.schema.table'
3. Run the query with the JDBC driver. My test throws this exception: Exception in thread "main" java.sql.SQLException: Query failed (#20200827_065416_00000_vkdav): org.apache.hadoop.io.ByteWritable cannot be cast to org.apache.hadoop.hive.serde2.io.ByteWritable

Related log/screenshots

Special notes for this issue

Does openlookeng support MySQL 8.0?

There is a mysql plugin folder in /opt/openlookeng/data/plugin/, but it seems to support only MySQL 5.x. Can the driver be replaced manually, or is there another way to make openlookeng support MySQL 8.x?

Support the Dameng (DM) database

Because of domestic localization requirements, the Dameng database is used more and more in the government and enterprise sectors; we hope it can be supported.

Dynamic Filtering Issue when ORC Predicate Pushdown is enabled

Software Environment:

  • OpenLooKeng version (source or binary):1.4.0-RC5

  • OS platform & distribution (eg., Linux Ubuntu 16.04): Linux

  • Java version:

Describe the current behavior

While executing the same query, dynamic filters are created when orc_predicate_pushdown is disabled, but are not created when orc_predicate_pushdown is enabled.

Describe the expected behavior

Dynamic filters should be created in both cases, with orc_predicate_pushdown enabled and disabled.

Steps to reproduce the issue

1. Set enable-dynamic-filtering = true and hive.orc-predicate-pushdown-enabled=false
2. Execute tpcds Q05.sql
3. Set enable-dynamic-filtering = true and hive.orc-predicate-pushdown-enabled=true
4. Execute tpcds Q05.sql
5. The difference can be seen in the live plans of the two executions

Related log/screenshots

Special notes for this issue

How to improve the concurrency of the hetu-opengauss Connector

The hetu-opengauss connector is developed on top of the PostgreSQL connector (that is, the JDBC connector), so by default there is only one split when reading data. In scenarios that fetch large amounts of data from gaussdb, reading over the JDBC protocol therefore becomes a performance bottleneck. Is there any way to optimize this?

Language Improvement

Describe the current behavior

Formatting was not proper

Describe the expected behavior

Formatting needs to be improved

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

[Doc] Spelling correction in elasticsearch.md: change to openLooKeng instead of opneLooKeng

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

The spelling is wrong in elasticsearch.md: 'opneLooKeng'

Describe the expected behavior

Spelling correction in elasticsearch.md: change to openLooKeng instead of opneLooKeng

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Add Support for Show Views

Software Environment:

  • OpenLooKeng version (source or binary): 1.3.0

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Add Support for Show Views

Describe the expected behavior

Added Support for Show Views

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

Integer overflow when parsing dates beyond representable range

Software Environment:

  • OpenLooKeng version (source or binary):

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

Describe the expected behavior

Steps to reproduce the issue

Related log/screenshots

Special notes for this issue

hetu-hazelcast depends on com.sun:tools:jar, which was removed in JDK 9

Software Environment:

  • OpenLooKeng version (source or binary):
    source

  • OS platform & distribution (eg., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

I used mvn clean install -DskipTest to build hetu-core with JDK 14, and it broke with a dependency error:

[ERROR] Failed to execute goal org.codehaus.mojo:aspectj-maven-plugin:1.11:compile (default) on project hetu-hazelcast: Execution default of goal org.codehaus.mojo:aspectj-maven-plugin:1.11:compile failed: Plugin org.codehaus.mojo:aspectj-maven-plugin:1.11 or one of its dependencies could not be resolved: Could not find artifact com.sun:tools:jar:14.0.1 at specified path {JAVA_HOME}../lib/tools.jar -> [Help 1]

Describe the expected behavior

Successfully build the project with JDK 14

Steps to reproduce the issue

  1. git clone https://github.com/openlookeng/hetu-core.git
  2. cd hetu-core
  3. mvn clean install -DskipTest

Related log/screenshots

Special notes for this issue
