
License: Apache License 2.0


presto-hbase-connector's Introduction

Analysys Presto-HBase-Connector

The component is implemented based on the Presto Connector interface specification and adds the ability to query HBase from Presto.

Our performance is 10 to 100 times faster than other open source versions of the HBase Connector.

Performance Comparison

Environment  Detail
Data Size    The event table contains 5 million records with 90 fields
Workers      3
Hardware     16 logical cores, 64 GB memory (16 GB each for Presto and HBase), 2 x 4 TB hard disks

[Chart: analysys-hb-performance.png, performance comparison]

Details: https://github.com/analysys/public-docs/blob/master/Attachment5(Presto-HBase-Connector-PerformanceTesting).xlsx

Function Point Comparison

Function Analysys Others
Salted Table SUPPORTED UNSUPPORTED
Scan By StartKey & EndKey SUPPORTED UNSUPPORTED
Batch Gets SUPPORTED UNSUPPORTED
Predicate Pushdown (Filter) SUPPORTED UNSUPPORTED
ClientSideScan SUPPORTED SUPPORTED
Insert SUPPORTED SUPPORTED
Delete SUPPORTED SUPPORTED
Create Table SUPPORT LATER SUPPORTED

Environment

  1. Mac OS X or Linux
  2. Java 8u161+, 64-bit.
  3. Maven 3.3.9+
  4. PrestoSql 315+

Building

mvn clean package

Deploying

1.hbase.properties

Create the hbase.properties file under the {Presto_Config_Dir}/catalog directory. Synchronize it to all worker nodes after configuration.

The following is a relatively simple configuration sample for your reference:

connector.name=hbase
zookeeper-quorum=localhost:2181
zookeeper-client-port=2181
zookeeper-znode-parent=/hbase
hbase-cluster-distributed=true
presto-server-port=8285
random-schedule-redundant-split=false
meta-dir=/etc/presto/chbase

Parameters:

  • connector.name

       This parameter must be set to hbase.
    
  • zookeeper-quorum

       Please refer to hbase.zookeeper.quorum of HBase API.
    
  • zookeeper-client-port

       Please refer to hbase.zookeeper.property.clientPort of HBase API.
    
  • hbase-cluster-distributed

       Please refer to hbase.cluster.distributed of HBase API.
    
  • presto-workers-name

       Hostnames of the Presto workers, separated by commas (,).
       If the split-remove-accessible parameter is set to false, this parameter can be left unset.
    
  • presto-server-port

       Please refer to http-server.http.port in {Presto_Config_Dir}/config.properties.
    
  • random-schedule-redundant-split

       By default, Presto dispatches splits to the available workers in order, assigning them in turn.
       With multi-table queries, this makes it likely that the leftover splits are always dispatched to the workers at the beginning of the available worker list.
       Setting this parameter to true dispatches those leftover splits to randomly chosen workers instead.
    
  • meta-dir

       The directory where the HBase table metadata information is stored.
    
  • zookeeper-znode-parent

       Please refer to zookeeper.znode.parent in hbase-site.xml.
    
  • enable-clientSide-scan

       Whether to enable HBase's ClientSide query mode. The default is false.
    
  • clientside-querymode-tablenames

       The names of the tables that are queried using ClientSide mode, with multiple tables separated by commas (,).
    
2.namespace

After configuring hbase.properties, we need to create the HBase namespace directory structure under the {meta-dir} directory. Notes:

  • The {meta-dir} directory is used to store metadata information for tables.
  • Under this directory, first create the directory with the namespace name of HBase.
  • Each table will have a separate JSON file, named {table name}.json, to hold its table structure information. We can configure metadata information such as table fields in this JSON file.
  • Tables of different namespaces are stored in their respective namespace directories.

Example:

--meta-dir:
	--namespace_a:
		table_a1.json
		table_a2.json
	--namespace_b:
		table_b1.json
		table_b2.json
	--default:
		table_c.json

This example defines five tables:

namespace_a:table_a1, namespace_a:table_a2, namespace_b:table_b1, namespace_b:table_b2, and default:table_c (saved in the default namespace)

3.Table Structure JSON

After the namespace directory is created, we need to configure the table structure JSON file. Here are the properties in the JSON file:

Attribute Detail
tableName Table name.
schemaName Namespace.
rowKeyFormat Which fields the RowKey is composed of, in order, separated by commas.
rowKeySeparator The delimiter between the fields that make up the RowKey, which is \001 by default.
rowKeyFirstCharRange If the RowKey is hashed, you can specify the value range of its first character; the connector then scans with multiple concurrent splits, which can dramatically improve performance. The range may cover a~z, A~Z and 0~9, with commas between segments, such as a~b,D~K,3~5 or 3~5,c~f.
describe Comment of the table.
columns The column definitions (see below).

Columns Json:

Attribute Detail
family Family name.
columnName Column name.
isRowKey Whether this column is the RowKey.
type Column type (case insensitive): string, int, bigint, double, boolean (stored as int, 0 for false, 1 for true), array<string>.
comment Column comment.

Description: When isRowKey is true, the RowKey of the table is abstracted into a concrete field. Whether querying, writing, or performing any other operation, it looks no different from a normal field on the surface, except that underneath it carries the special meaning of being the table's row key.

The RowKey field must be of type VARCHAR.

Example:

{
  "tableName": "t_event_test",
  "schemaName": "db_test",
  "rowKeyFormat": "xwhat,xwho",
  "describe": "Table for test!",
  "rowKeySeparator": "-",
  "rowKeyFirstCharRange": "a~z,0~9",
  "columns": [{
    "family": "",
    "columnName": "rowkey",
    "comment": "The RowKey column of table!",
    "type": "varchar",
    "isRowKey": true
  }, {
    "family": "f",
    "columnName": "xwho",
    "comment": "Column for test!",
    "type": "varchar",
    "isRowKey": false
  }, {
    "family": "f",
    "columnName": "ds",
    "comment": "Column for test!",
    "type": "varchar",
    "isRowKey": false
  }]
}

Find the directory corresponding to the table's namespace under {meta-dir}, and follow the instructions above to create a JSON file named after the table.

4.Build Jar

  • After completing the above steps, we need to compile the component JAR package:

// download source code
// using maven to build
mvn clean package

5.Deploy Component Jar

Create the plugin directory hbase (directory name can be set arbitrarily) in the {plugin.dir} directory.

Copy presto0.20-hbase-{version.num}.jar into this directory and synchronize it to all worker nodes.

6.Restart Presto Cluster

Insert

Write operations are supported in dev_0.1.1.

The write operation requires the user to specify the row_key for the data, either as a field concatenation or as a fixed value. As follows:

insert into hbase.db_test.test_event(row_key, xwho, distinct_id, ds, xwhen, xwhat, attri_1) select '01-test_rowkey' as row_key, xwho, distinct_id, ds, xwhen, xwhat, attri_1 from hbase.db_test.test_event_v2 where xwhen=1562057346821;

insert into hbase.db_test.test_event(row_key, xwho, distinct_id, ds, xwhen, xwhat, attri_1) select concat('01-', xwho, '-', xwhat, '-', xwhen) as row_key, xwho, distinct_id, ds, xwhen, xwhat, attri_1 from hbase.db_test.test_event_v2 where xwhat='login';
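
For reference, each inserted row is roughly equivalent to an HBase Put keyed by the chosen row_key. The following is a minimal sketch using the plain HBase client API (the row key and column values are hypothetical, and this is not the connector's internal code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PutSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("db_test:test_event"))) {
            // row_key built the same way as in the second SQL above: '01-' + xwho + '-' + xwhat + '-' + xwhen
            Put put = new Put(Bytes.toBytes("01-drew-login-1562057346821"));
            // every non-RowKey column is written into its configured column family (assumed to be "f" here)
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("xwho"), Bytes.toBytes("drew"));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("xwhat"), Bytes.toBytes("login"));
            table.put(put);
        }
    }
}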

Delete

Deletion is supported in meta_0.1.1. The delete operation does not require the user to specify the row_key value in the SQL, but the table being operated on must have the row_key field defined in its metadata JSON file. When the connector filters out the data to be deleted, it reads each row's row_key and then deletes the corresponding rows by those row_key values.

Example:

delete from hbase.db_test.test_event where xwhen >= 1562139516028;
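
Under the hood, this roughly corresponds to collecting the matching row keys during the scan and issuing HBase Delete operations for them. A minimal sketch with the plain HBase client API (the row keys below are hypothetical, and this is not the connector's internal code):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("db_test:test_event"))) {
            // row keys collected while scanning the rows that match "xwhen >= 1562139516028"
            List<Delete> deletes = new ArrayList<>();
            deletes.add(new Delete(Bytes.toBytes("01-drew-login-1562139516028")));
            deletes.add(new Delete(Bytes.toBytes("01-george-login-1562139516099")));
            table.delete(deletes); // batched delete by row key
        }
    }
}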

Query Optimization

1.Salted Table

A salt value is a random but reproducible prefix added to each RowKey. It spreads the data across multiple regions, so queries can be scanned concurrently by multiple threads. In Presto, this mechanism is used to divide the data into multiple splits that run concurrently. In practice, using salt values can improve performance by dozens of times.

Using salt values with this component requires the following properties to be set in the JSON file (a simplified sketch of the resulting key-range expansion appears after this list):

  • seperateSaltPart

    Whether the RowKey uses an independent salt value as its prefix. Set this parameter to true if the RowKey begins with a separate salt section followed by {rowKeySeparator}. Starting with version 0.1.5, the salt part may only consist of the characters a~z, A~Z and 0~9.
    
  • rowKeyFirstCharRange

    When the first character of the RowKey is hashed according to MD5 or some other algorithm, this property can be used to indicate the value range of that first character. The Connector generates multiple splits based on this range. For the time being, the range of the first character only supports a~z, A~Z and 0~9, with commas between segments, like:

    a~b,D~K,3~5
    3~5,c~f
    A~Z,p~u
    

    When the property is configured as a~b,D~F,6~8, 8 startKey and endKey pairs are generated in turn. Each pair of startKey and endKey will produce one split for data scanning, like:

    (a,a|)
    (b,b|)
    (D,D|)
    (E,E|)
    (F,F|)
    (6,6|)
    (7,7|)
    (8,8|)
    

    Sometimes, if too many splits would be generated, the ranges are merged automatically to avoid the performance degradation caused by an excessive number of splits, like:

    (a,b|)
    (D,F|)
    (6,8|)
    
  • rowKeySeparator

    The delimiter between the fields that make up the RowKey, which is \001 by default.
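
To make the range handling concrete, here is a small standalone sketch that expands a rowKeyFirstCharRange value such as a~b,D~F,6~8 into (startKey, endKey) pairs and also shows the merged form. It mirrors the behavior described above, but it is a simplified illustration, not the connector's actual implementation:

import java.util.ArrayList;
import java.util.List;

public class FirstCharRangeSketch {

    // Expand "a~b,D~F,6~8" into one (startKey, endKey) pair per character: (a,a|), (b,b|), ...
    static List<String[]> expand(String firstCharRange) {
        List<String[]> pairs = new ArrayList<>();
        for (String part : firstCharRange.split(",")) {
            char from = part.charAt(0);
            char to = part.charAt(part.length() - 1);
            for (char c = from; c <= to; c++) {
                pairs.add(new String[] {String.valueOf(c), c + "|"});
            }
        }
        return pairs;
    }

    // Merge each configured range into a single pair, e.g. (a,b|), (D,F|), (6,8|),
    // which is what happens when too many splits would otherwise be generated.
    static List<String[]> merge(String firstCharRange) {
        List<String[]> pairs = new ArrayList<>();
        for (String part : firstCharRange.split(",")) {
            char from = part.charAt(0);
            char to = part.charAt(part.length() - 1);
            pairs.add(new String[] {String.valueOf(from), to + "|"});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints the eight expanded pairs, then the three merged ranges shown above
        expand("a~b,D~F,6~8").forEach(p -> System.out.println("(" + p[0] + "," + p[1] + ")"));
        merge("a~b,D~F,6~8").forEach(p -> System.out.println("(" + p[0] + "," + p[1] + ")"));
    }
}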
    
2.Concatenate StartKey and EndKey according to the composition of the RowKey

This means concatenating the query's StartKey and EndKey based on which fields the RowKey consists of and the predicates of the current query.

For example, when RowKey is composed as follows:

xwhat-xwho

sqls:

select xwhat, xwho, date, xwhen from t_event_test where xwhat='login' and xwho in ('drew', 'george');

This generates the following two StartKey and EndKey pairs:

(login-drew, login-drew|)
(login-george, login-george|)

To implement such a query optimization mechanism, we need to configure the following two parameters:

  • rowKeyFormat

       Defines which fields the RowKey consists of, in order. So in this case, it should be configured as "xwhat,xwho".
    
  • rowKeySeparator

       The delimiter between the different components of the RowKey, which is \001 by default. So in this case, it should be configured as "-".
    

Also, if you want to see exactly which splits a SQL statement produced, you can set the log level to INFO and check server.log.
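
The sketch below shows, in plain Java, how the StartKey/EndKey pairs of the example above can be derived from the rowKeyFormat order, the rowKeySeparator and the predicate values. It only illustrates the idea and is not the connector's code:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyRangeSketch {
    public static void main(String[] args) {
        String separator = "-";                                // rowKeySeparator
        String xwhat = "login";                                // xwhat = 'login'
        List<String> xwhos = Arrays.asList("drew", "george");  // xwho in ('drew', 'george')

        List<String[]> ranges = new ArrayList<>();
        for (String xwho : xwhos) {
            String startKey = xwhat + separator + xwho; // fields joined in rowKeyFormat order
            String endKey = startKey + "|";             // '|' sorts after the key characters, bounding the scan
            ranges.add(new String[] {startKey, endKey});
        }
        // prints (login-drew, login-drew|) and (login-george, login-george|)
        ranges.forEach(p -> System.out.println("(" + p[0] + ", " + p[1] + ")"));
    }
}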

3.Batch Get

Batch get is the HBase API pattern of wrapping the row keys to be queried into a List<Get> and then requesting that list to fetch the data.

This query mode is very convenient to use: you can put the RowKey values directly into the SQL as equality predicates.

select * from t_event_test where rk in ('rk1', 'rk2', 'rk3');

When the system resolves the predicates, it decides whether this query pattern will be executed based on whether the field name matches the RowKey field.

To use this query pattern, you must mark the RowKey field with isRowKey in the table's JSON file.

Note: Because the RowKey field we defined is a virtual field, only equality (and IN) queries against it make logical sense.
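
For reference, the IN query above maps to HBase's batch get API roughly as in the following sketch (plain HBase client API, hypothetical table and row keys, not the connector's internal code):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchGetSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("t_event_test"))) {
            // one Get per row key from the IN list: rk in ('rk1', 'rk2', 'rk3')
            List<Get> gets = new ArrayList<>();
            for (String rk : new String[] {"rk1", "rk2", "rk3"}) {
                gets.add(new Get(Bytes.toBytes(rk)));
            }
            Result[] results = table.get(gets); // one batched request instead of three separate lookups
            for (Result r : results) {
                if (!r.isEmpty()) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}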

4.ClientSideRegionScanner

ClientSideRegionScanner is a Scanner introduced in HBase 0.96. It scans the files on HDFS directly on the client side without sending requests to the RegionServer, which would otherwise scan the HDFS files itself. This reduces the load on the RegionServers, and queries are not affected even when a RegionServer is unavailable. At the same time, because HDFS is read directly, on a cluster with a reasonably balanced load the local-read strategy avoids a lot of network traffic.

Here is a comparison of the component's performance using ClientSideRegionScanner versus the ordinary RegionScanner. Most queries showed an improvement of 30% or more, especially full table scans:

[Chart: ClientSide&NormalScanner.png, ClientSide vs. normal RegionScanner performance]

Details: https://github.com/analysys/public-docs/blob/master/Attachment6(Presto-HBase-Connector-PerformanceTesting-ClientSide).xlsx

Using ClientSide queries requires the following three parameters:

  • hbase-rootdir

    This parameter is consistent with hbase.rootdir in hbase-site.xml.

  • enable-clientSide-scan

    Whether ClientSide query mode is enabled.

  • clientside-querymode-tablenames

    Define which tables need ClientSide queries, with commas between table names, for example:

    namespace_a:table_a,namespace_a:table_b,namespace_b:table_c
    

    If all tables use ClientSide queries, you can configure it to *.

In addition to the above three parameters, the two Hadoop configuration files from the running environment, core-site.xml and hdfs-site.xml, need to be copied into the project's src/main/resources directory when packaging.

Note that ClientSideRegionScanner queries rely on a Snapshot; so that each query sees the latest data, a Snapshot is automatically created for every query with the following naming rule:

"ss-" + schemaName + "." + tableName + "-" + System.nanoTime()

HBase supports a maximum of 65,536 snapshots, so it is a good idea to periodically clean out expired snapshots when using ClientSideRegionScanner.
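
A simple cleanup job could, for example, delete connector-created snapshots (their names start with "ss-") older than some cutoff. The sketch below assumes an HBase 2.x client, where Admin.listSnapshots() returns org.apache.hadoop.hbase.client.SnapshotDescription; on a 1.x client the snapshot descriptor class differs, so treat this only as an illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.SnapshotDescription;

public class SnapshotCleanupSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            long cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000; // older than one day
            for (SnapshotDescription sd : admin.listSnapshots()) {
                // only touch snapshots created by the connector; their names start with "ss-"
                if (sd.getName().startsWith("ss-") && sd.getCreationTime() < cutoff) {
                    admin.deleteSnapshot(sd.getName());
                }
            }
        }
    }
}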

Problem Solving

1.How to support ClientSideRegionScanner queries on Snappy-compressed HBase tables?

You need to solve the following problems:

1) Cannot find SnappyCodec

This is due to the lack of hadoop-common-2.7.3.jar in Presto's classpath. Since our Presto is built with Ambari, we need to copy this JAR package to the directory /usr/lib/presto/lib.

2) SnappyCodec could not be converted to CompressionCodec

Presto loads plugins with a custom PluginClassLoader, while SnappyCodec was loaded by the AppClassLoader. Because the two class loaders have no parent-child relationship, the classes they load are not compatible with each other.

This problem was solved by modifying the code in hbase-common-1.1.2.1.jar so that SnappyCodec is loaded with the PluginClassLoader. The class to modify is org.apache.hadoop.hbase.io.compress.Compression in the hbase-common module, as follows:

  /**
   * Returns the classloader to load the Codec class from.
   */
  private static ClassLoader getClassLoaderForCodec() {
    /* before modify:
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    if (cl == null) {
      cl = Compression.class.getClassLoader();
    }*/
    // after modify:
    ClassLoader cl = Compression.class.getClassLoader();
    if (cl == null) {
      cl = Thread.currentThread().getContextClassLoader();
    }
    
    if (cl == null) {
      cl = ClassLoader.getSystemClassLoader();
    }
    if (cl == null) {
      throw new RuntimeException("A ClassLoader to load the Codec could not be determined");
    }
    return cl;
  }

Re-install the modified code into the local Maven repository and repackage the component JAR.

3) java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

This means the JVM needs Hadoop's native Snappy library. In Presto's jvm.config, you can add the following configuration:

-Djava.library.path={path to hadoop native lib}

4) java.io.IOException: java.lang.NoSuchMethodError: com.google.common.base.Objects.toStringHelper(Ljava/lang/Object;)Lcom/google/common/base/Objects$ToStringHelper;

This is because guava 20.0+ removed the inner class ToStringHelper and the toStringHelper methods from com.google.common.base.Objects.

We can add the removed code from the older Guava version back into the newer Guava source, recompile and update guava-24.1-jre.jar in the Maven repository, and then rebuild the presto-hbase JAR.

Then upload guava-24.1-jre.jar to the lib directory of each Presto worker.

Or use Maven's shade plugin to resolve these conflicts.

5) Cannot find constructor of Stopwatch

Change the constructor of com.google.common.base.Stopwatch from protected to public, and rebuild.

Or use Maven's shade plugin to resolve these conflicts.

6)Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=presto, access=WRITE, inode="/apps/hbase/data/data/db_moredatatest/multifamily_90cq_7cf_500w_snappy2/dee9b34c5cd8ee34f74ff5fc5446432a/.tmp":hbase:hdfs:drwxr-xr-x

The permissions are insufficient. When the presto user accesses the HBase data directory during a query, it needs to be granted read permission on that directory.

2.How to debug ClientSideRegionScanner scanning a Snappy-compressed HBase table?

You need to solve the following problems:

1) cannot find class CanUnbuff

add dependency below in module presto-hbase-connector:

<dependency>
	<groupId>com.facebook.presto.hadoop</groupId>
	<artifactId>hadoop-apache2</artifactId>
	<version>2.7.4-1</version>
</dependency>
2) Use the hbase-shaded-client and hbase-shaded-server dependencies.
3) Following the section "SnappyCodec could not be converted to CompressionCodec" above, re-install the modified artifacts into the Maven repository. The three modules hbase-shaded-client, hbase-shaded-server and hbase-common must be rebuilt.
4) Add -Djava.library.path to the VM options of the PrestoServer run configuration (Run -> Edit Configurations). java.library.path is the path to Hadoop's native Snappy library.

Releases Notes

1. meta-0.1.1
  • Support ClientSide Query.
2. meta-0.1.2
  • Support insert and delete.
  • Fix failure when query table in default namespace using ClientSide.
  • Change enable-clientSide-scan to false by default. Change hbase-rootdir to nullable.
  • Add param zookeeper-znode-parent.
3. meta-0.1.3
  • Migrate connector to PrestoSql-315.
  • Provide a usable version based on PrestoDB-0.221. Branch name is dev_prestodb-0.221_0.1.2.
  • Fix document.
4. meta-0.1.4
  • Migrate connector API to non-legacy version.
5. meta-0.1.5
  • Readjust the split logic: remove the parameter rowKeySaltUpperAndLower and replace it with rowKeyFirstCharRange. This way, even if the RowKey has no salt part and no predicates are available to splice a StartKey, as long as the first character of the RowKey is hashed we can still generate multiple splits to increase query parallelism.

presto-hbase-connector's People

Contributors

crossoverrr, thomasperkins1123, toriycn


presto-hbase-connector's Issues

Support Protobuf, Avro and other HBase value formats

Using Protobuf or Avro in HBase values is a common use case. If I want to submit a PR implementing parsing of Protobuf/Avro values, do you have any suggestions on how to get started?

JDK version

Does the JDK need to be upgraded to 11?
Caused by: org.codehaus.plexus.compiler.CompilerException: invalid target release: 11
at org.codehaus.plexus.compiler.javac.JavaxToolsCompiler.compileInProcess (JavaxToolsCompiler.java:191)

Cannot query int, bigint, double numeric fields

Hello, I've run into a problem.
When querying int, bigint, double and other numeric fields of a table, the result is always 0 rows; only varchar columns display normally.
For a table with numeric fields, select * from table returns 0 rows, while select count(1) from table works fine.

Add an offline HFile reading mode

  1. The current hbase connector goes through the RegionServer RPC handlers, so Presto queries can affect existing online workloads.
  2. It would be good to add a way of reading HFiles offline to avoid consuming the online handlers.

java.lang.IllegalArgumentException: No factory for connector hbase

When starting Presto:
-- Loading plugin /data/presto/data1/plugin/hbase --
No service providers of type com.facebook.presto.spi.Plugin
Then the console reports:
java.lang.IllegalArgumentException: No factory for connector hbase
What could be the cause of this? Is the JAR package not taking effect?

Error after configuring hbase.properties

2020-01-02T20:42:09.519+0800 ERROR main com.facebook.presto.server.PrestoServer No factory for connector hbase
java.lang.IllegalArgumentException: No factory for connector hbase
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:194)
at com.
What causes this? I have tried several times and it still fails.

The Presto table does not expose HBase's timestamp information by default

I see this example in the documentation:
delete from hbase.db_test.test_event where xwhen >= 1562139516028;
Is xwhen a mapping of the HBase timestamp? There doesn't seem to be any related information in the code.
In this example from the documentation, was the xwhen column created manually in HBase?

Version compatibility issue

Hello, my configuration doesn't seem to work. After running the bin/launcher run command, the problem is as follows:
java.util.ServiceConfigurationError: com.facebook.presto.spi.Plugin: Provider com.analysys.presto.connector.hbase.frame.HBasePlugin not a subtype
prestodb version: 0.228
presto-hbase-connector version: meta-0.1.3

If you intend to contribute code, please make sure your commits will count you as a contributor!

Before submitting a PR, please read the following link to make sure your commits will count you as a contributor:
https://docs.github.com/en/github/setting-up-and-managing-your-github-profile/why-are-my-contributions-not-showing-up-on-my-profile
There is also a Chinese version here:
https://segmentfault.com/a/1190000004318632

Please be sure to confirm the following two points:

  • Make sure your PR is submitted against the master branch.
  • Make sure your email address is associated with your GitHub account. You can check with the git log command in your local repo and confirm that your commits show the correct email address.

Table has no columns:

Query 20190318_114214_00006_ij286 failed: Table has no columns: hbase:HBaseTableHandle{schemaTableName=default.test4}
Hello, I created a test4 table in HBase's default namespace and configured test4.json according to the documentation, but queries fail with the error above. Where did I go wrong?

  • test4.json configuration
{
  "tableName": "test4",
  "schemaName": "default",
  "rowKeyFormat": "name,age",
  "rowKeySaltUpperAndLower": "0,29",
  "describe": "Table for test!",
  "columns": [{
    "family": "cf",
    "columnName": "rowkey",
    "comment": "Column for test4!",
    "type": "varchar",
    "isRowKey": true
  }, {
    "family": "cf",
    "columnName": "name",
    "comment": "Column for test4!",
    "type": "varchar",
    "isRowKey": false
  }, {
    "family": "cf",
    "columnName": "age",
    "comment": "Column for test4!",
    "type": "int",
    "isRowKey": false
  }],
  "rowKeySeparator": "_"
}
  • hbase.properties configuration
connector.name=hbase
zookeeper-quorum=172.16.1.243:2181,172.16.1.244:2181,172.16.1.245:2181
zookeeper-client-port=2181
hbase-cluster-distributed=true
presto-server-port=8090
random-schedule-redundant-split=false
meta-dir=/usr/local/meta/chbase
  • The HBase table contains the following
hbase(main):018:0> scan 'test4'
ROW                                              COLUMN+CELL                                                                                                                                 
 miao_30                                         column=cf:age, timestamp=1552471555481, value=30                                                                                            
 miao_30                                         column=cf:name, timestamp=1552471544975, value=miao                                                                                         
1 row(s) in 0.0360 seconds

JSON configs added under a namespace only take effect after a cluster restart

HBase is column-oriented storage; if a few fields are added in HBase, the corresponding JSON must be modified as well, and Presto has to be restarted for the change to take effect.
Likewise, if many new tables are added in HBase, Presto also needs a restart after the table info is added under the corresponding namespace.
This is not only an issue for the hbase-connector; other connectors have it too. Could table configuration changes take effect without restarting Presto? Adding new tables is a very common operation.

Does Presto's hbase plugin directory only need the single compiled JAR?

Presto version 0.221. I put the compiled JAR into the plugin/hbase directory and configured the HBase settings. Running show tables; works fine, but running a select such as select name from demo; fails with:
2019-06-28T17:59:05.865+0800 DEBUG query-execution-12 com.facebook.presto.execution.QueryStateMachine Query 20190628_095905_00002_x8g6u is FAILED
2019-06-28T17:59:05.865+0800 DEBUG Query-20190628_095905_00002_x8g6u-257 com.facebook.presto.execution.QueryStateMachine Query 20190628_095905_00002_x8g6u failed
java.lang.AbstractMethodError
at com.facebook.presto.split.SplitManager.getSplits(SplitManager.java:87)
at com.facebook.presto.split.CloseableSplitSourceProvider.getSplits(CloseableSplitSourceProvider.java:51)
at com.facebook.presto.sql.planner.SplitSourceFactory$Visitor.lambda$visitTableScan$0(SplitSourceFactory.java:126)
at com.facebook.presto.sql.planner.SplitSourceFactory$Visitor.visitTableScan(SplitSourceFactory.java:131)
at com.facebook.presto.sql.planner.SplitSourceFactory$Visitor.visitTableScan(SplitSourceFactory.java:102)
at com.facebook.presto.sql.planner.plan.TableScanNode.accept(TableScanNode.java:185)
at com.facebook.presto.sql.planner.SplitSourceFactory$Visitor.visitAggregation(SplitSourceFactory.java:222)
at com.facebook.presto.sql.planner.SplitSourceFactory$Visitor.visitAggregation(SplitSourceFactory.java:102)
at com.facebook.presto.sql.planner.plan.AggregationNode.accept(AggregationNode.java:199)
at com.facebook.presto.sql.planner.plan.InternalPlanNode.accept(InternalPlanNode.java:31)
at com.facebook.presto.sql.planner.SplitSourceFactory.createSplitSources(SplitSourceFactory.java:84)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.createStreamingLinkedStages(SqlQueryScheduler.java:411)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.createStreamingLinkedStages(SqlQueryScheduler.java:506)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.createStages(SqlQueryScheduler.java:327)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.(SqlQueryScheduler.java:207)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.createSqlQueryScheduler(SqlQueryScheduler.java:154)
at com.facebook.presto.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:490)
at com.facebook.presto.execution.SqlQueryExecution.startExecution(SqlQueryExecution.java:359)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Besides the compiled packages, are any other JAR packages needed?

Tables whose HBase table name contains uppercase letters throw a table-not-found exception

Hello!
When the table name contains uppercase letters, the following exception occurs; the actual table name is ApiMetaData. Are table names uniformly converted to lowercase when they are looked up?
2020-07-03T10:35:39.795+0800 ERROR Query-20200703_023539_00006_7pa7n-534 com.analysys.presto.connector.hbase.connection.HBaseClientManager apimetadata
org.apache.hadoop.hbase.TableNotFoundException: apimetadata
at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:458)
at com.analysys.presto.connector.hbase.connection.HBaseClientManager.getTable(HBaseClientManager.java:108)
at com.analysys.presto.connector.hbase.meta.HBaseMetadata.getTableMetadata(HBaseMetadata.java:96)
at com.analysys.presto.connector.hbase.meta.HBaseMetadata.getTableMetadata(HBaseMetadata.java:89)
at io.prestosql.metadata.MetadataManager.getTableMetadata(MetadataManager.java:450)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:933)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:272)
at io.prestosql.sql.tree.Table.accept(Table.java:53)
at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:286)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:1879)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:1040)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:272)
at io.prestosql.sql.tree.QuerySpecification.accept(QuerySpecification.java:144)
at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:286)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:296)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:742)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:272)
at io.prestosql.sql.tree.Query.accept(Query.java:107)
at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:286)
at io.prestosql.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:264)
at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:76)
at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:68)
at io.prestosql.execution.SqlQueryExecution.(SqlQueryExecution.java:184)
at io.prestosql.execution.SqlQueryExecution.(SqlQueryExecution.java:92)
at io.prestosql.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:706)
at io.prestosql.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:118)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Configuration fails; is it related to the version?

presto> select xwhat, xwho from default.employ;
Query 20191122_055951_00000_nyi4c failed: com.analysys.presto.connector.hbase.schedule.HBaseSplitManager.getSplits(Lcom/facebook/presto/spi/connector/ConnectorTransactionHandle;Lcom/facebook/presto/spi/ConnectorSession;Lcom/facebook/presto/spi/ConnectorTableLayoutHandle;Lcom/facebook/presto/spi/connector/ConnectorSplitManager$SplitSchedulingContext;)Lcom/facebook/presto/spi/ConnectorSplitSource;

Table structure directory:
default
employ.json
{
"tableName": "employ",
"schemaName": "default",
"rowKeyFormat": "xwhat,xwho",
"rowKeySaltUpperAndLower": "0,2",
"describe": "Table for test!",
"columns": [{
"family": "",
"columnName": "rowkey",
"comment": "Column for test!",
"type": "varchar",
"isRowKey": true
}, {
"family": "base",
"columnName": "xwhat",
"comment": "Column for test!",
"type": "varchar",
"isRowKey": false
}, {
"family": "base",
"columnName": "xwho",
"comment": "Column for test!",
"type": "varchar",
"isRowKey": false
}],
"rowKeySeparator": "-"
}

presto-service won't start

2019-06-28T18:09:40.392+0800 INFO main Bootstrap zookeeper-znode-parent null -- n/a --
2019-06-28T18:09:40.556+0800 ERROR main com.analysys.presto.connector.hbase.frame.HBaseConnectorFactory Unable to create injector, see the following errors:

  1. Error: Invalid configuration property zookeeper-znode-parent: may not be null (for class com.analysys.presto.connector.hbase.meta.HBaseConfig.zookeeperZnodeParent)

1 error
com.google.inject.CreationException: Unable to create injector, see the following errors:

  1. Error: Invalid configuration property zookeeper-znode-parent: may not be null (for class com.analysys.presto.connector.hbase.meta.HBaseConfig.zookeeperZnodeParent)

1 error
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:543)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:159)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
at com.google.inject.Guice.createInjector(Guice.java:87)
at io.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:241)
at com.analysys.presto.connector.hbase.frame.HBaseConnectorFactory.create(HBaseConnectorFactory.java:56)
at com.facebook.presto.connector.ConnectorManager.createConnector(ConnectorManager.java:323)
at com.facebook.presto.connector.ConnectorManager.addCatalogConnector(ConnectorManager.java:195)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:187)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:173)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:96)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:74)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:131)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:73)

2019-06-28T18:09:40.557+0800 ERROR main com.facebook.presto.server.PrestoServer connector is null
java.lang.NullPointerException: connector is null
at java.util.Objects.requireNonNull(Objects.java:228)
at com.facebook.presto.connector.ConnectorManager$MaterializedConnector.(ConnectorManager.java:348)
at com.facebook.presto.connector.ConnectorManager.addCatalogConnector(ConnectorManager.java:195)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:187)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:173)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:96)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:74)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:131)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:73)

2019-06-28T18:09:40.557+0800 INFO Thread-94 io.airlift.bootstrap.LifeCycleManager Life cycle st

List table issue in presto hbase

Hi,

I'm trying to query a Presto HBase table and am facing the issue below. I have configured all the schemas and put the compiled jar into the presto/plugin folder, but I still get an internal server error when running show tables. Kindly help me out with this issue.

2020-12-07T11:52:30.364Z	ERROR	remote-task-callback-5	io.prestosql.execution.StageStateMachine	Stage 20201207_115229_00003_tdxze.2 failed
java.lang.NullPointerException
	at com.analysys.presto.connector.hbase.meta.HBaseMetadata.listTables(HBaseMetadata.java:111)
	at io.prestosql.metadata.MetadataManager.listTables(MetadataManager.java:573)
	at io.prestosql.metadata.MetadataListing.listTables(MetadataListing.java:93)
	at io.prestosql.connector.informationschema.InformationSchemaPageSource.addTablesRecords(InformationSchemaPageSource.java:281)
	at io.prestosql.connector.informationschema.InformationSchemaPageSource.buildPages(InformationSchemaPageSource.java:219)
	at io.prestosql.connector.informationschema.InformationSchemaPageSource.getNextPage(InformationSchemaPageSource.java:183)
	at io.prestosql.operator.ScanFilterAndProjectOperator$ConnectorPageSourceToPages.process(ScanFilterAndProjectOperator.java:376)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.prestosql.operator.WorkProcessorUtils$YieldingProcess.process(WorkProcessorUtils.java:181)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
	at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:149)
	at io.prestosql.operator.Driver.processInternal(Driver.java:379)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_347____20201207_115115_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)


Trino support

Seems the connector doesn't work with Trino 356:

2022-01-07T17:34:04.826+0800	INFO	main	io.trino.server.PluginManager	-- Loading plugin /home/work/apps/tools/presto-server/trino-server-356/plugin/hbase --
2022-01-07T17:34:04.870+0800	ERROR	main	io.trino.server.Server	No service providers of type io.trino.spi.Plugin
java.lang.IllegalStateException: No service providers of type io.trino.spi.Plugin
	at com.google.common.base.Preconditions.checkState(Preconditions.java:591)
	at io.trino.server.PluginManager.loadPlugin(PluginManager.java:139)
	at io.trino.server.PluginManager.loadPlugin(PluginManager.java:129)
	at io.trino.server.ServerPluginsProvider.loadPlugins(ServerPluginsProvider.java:48)
	at io.trino.server.PluginManager.loadPlugins(PluginManager.java:110)
	at io.trino.server.Server.doStart(Server.java:121)
	at io.trino.server.Server.lambda$start$0(Server.java:77)
	at io.trino.$gen.Trino_356____20220107_093347_1.run(Unknown Source)
	at io.trino.server.Server.start(Server.java:77)
	at io.trino.server.TrinoServer.main(TrinoServer.java:38)


2022-01-07T17:34:04.873+0800	INFO	Thread-88	io.airlift.bootstrap.LifeCycleManager	JVM is shutting down, cleaning up

Enabling ClientSideScan throws exceptions about Cannot construct instance of `org.apache.hadoop.hbase.client.RegionInfo`

I think this is caused by regionInfo not being serializable in the Split, so how do you run this feature?

Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of org.apache.hadoop.hbase.client.RegionInfo (no Creators, like default construct, exist): abstract types either need to be mapped to concrete types, have custom deserializer, or contain additional type information
at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 3011] (through reference chain: io.prestosql.server.TaskUpdateRequest["sources"]->java.util.ArrayList[0]->io.prestosql.execution.TaskSource["splits"]->java.util.HashSet[0]->io.prestosql.execution.ScheduledSplit["split"]->io.prestosql.metadata.Split["connectorSplit"]->com.analysys.presto.connector.hbase.schedule.HBaseSplit["regionInfo"])
at com.fasterxml.jackson.databind.exc.InvalidDefinitionException.from(InvalidDefinitionException.java:67)
at com.fasterxml.jackson.databind.DeserializationContext.reportBadDefinition(DeserializationContext.java:1589)
at com.fasterxml.jackson.databind.DeserializationContext.handleMissingInstantiator(DeserializationContext.java:1055)
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserialize(AbstractDeserializer.java:265)
at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:530)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:528)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:417)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1287)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:194)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:130)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:97)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromAny(AsPropertyTypeDeserializer.java:193)
at io.prestosql.metadata.AbstractTypedJacksonModule$InternalTypeDeserializer.deserialize(AbstractTypedJacksonModule.java:87)
at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:530)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:528)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:417)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1287)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:159)
at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:530)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:528)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:417)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1287)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:159)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:286)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:245)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:27)
at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:530)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:528)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:417)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1287)
