I really like this concept, but it's critical for us to be able to create permanent ta

Support Hive about filodb HOT 12 CLOSED

filodb commented on June 27, 2024

Support Hive

from filodb.

Comments (12)

velvia commented on June 27, 2024

@rstrickland welcome! Would you mind painting a picture of your setup? Hive metastore is part of your Hadoop installation? Or DSE? I don't quite understand - table meta stored in Hive, but actual tables stored in C*? Thanks!

rstrickland commented on June 27, 2024

We use a centralized Hive metastore that's shared by multiple Spark and EMR clusters that all serve different purposes. The data is stored in multiple places, including Cassandra. However, with the lack of a good open source Cassandra-Hive driver we've had to resort to creating temp tables every time we want to get at Cassandra data. It would be awesome if Filo supported legitimate Hive tables so we could bypass this step.

velvia commented on June 27, 2024

Got it. Would it be fine to do this through Spark, i.e. you would not use the filo-cli but would instead use the Spark API to create tables, etc.? (Spark can already connect to the Hive metastore.)

rstrickland commented on June 27, 2024

We could, but we do have BI tools that use Hive proper (i.e. not the Spark SQL thrift server). Ideally it would be great if that would work as well, but I know that's a bigger effort.

velvia commented on June 27, 2024

Okay, I think I understand now. You are looking for a proper FiloDB driver for Hive that lets you query FiloDB from Hive itself.

velvia commented on June 27, 2024

@rstrickland ok, so to break this up into two steps:

1. Have a Hive driver (like DSE's) that automatically lets you query tables from Spark without having to do a CREATE EXTERNAL TABLE.
2. Actually support queries directly from Hive without Spark. Hmmmm.... I think this involves yucky input formats and Hive SerDes, etc.

velvia commented on June 27, 2024

Ok, I scoped out the work for Hive metastore support of FiloDB tables for querying in Spark. Spark has a HiveMetastoreCatalog class with a createDataSourceTable method. So one possibility is that when the FiloDB daemon/library spins up, it automatically resolves differences between FiloDB tables and the Hive catalog. The sync could in theory also happen whenever a user requests tables or a schema, but that would require a custom Hive plugin in Spark. Need to think about how to automate the syncing.
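
A minimal sketch of the startup reconciliation described above, assuming the daemon can list FiloDB datasets and the catalog entries it manages, each with a simple column/type schema. The function `plan_catalog_sync` and the `SyncPlan` type are hypothetical names; actually applying the plan (for example through Spark's catalog) is not shown. It also assumes the catalog listing is already restricted to tables FiloDB owns, so stale entries can be dropped without touching other tables in a shared metastore.

```python
# Hypothetical sketch: compute what to change so the Hive catalog matches FiloDB.
# Assumption: `catalog` lists only tables owned by FiloDB (e.g. one dedicated
# database), so dropping stale entries never touches unrelated shared tables.
from typing import Dict, List, NamedTuple, Tuple

Schema = Tuple[Tuple[str, str], ...]  # ((column name, column type), ...)


class SyncPlan(NamedTuple):
    to_register: List[str]  # datasets in FiloDB with no catalog entry yet
    to_update: List[str]    # present in both, but the schemas differ
    to_drop: List[str]      # catalog entries whose dataset no longer exists


def plan_catalog_sync(filodb: Dict[str, Schema], catalog: Dict[str, Schema]) -> SyncPlan:
    """Diff FiloDB datasets against catalog entries by name and schema."""
    filo_names, cat_names = set(filodb), set(catalog)
    return SyncPlan(
        to_register=sorted(filo_names - cat_names),
        to_update=sorted(n for n in filo_names & cat_names if filodb[n] != catalog[n]),
        to_drop=sorted(cat_names - filo_names),
    )


if __name__ == "__main__":
    datasets = {"gdelt": (("id", "bigint"), ("actor", "string"))}
    catalog = {"gdelt": (("id", "bigint"),), "old_events": (("ts", "timestamp"),)}
    print(plan_catalog_sync(datasets, catalog))
    # SyncPlan(to_register=[], to_update=['gdelt'], to_drop=['old_events'])
```

Because the plan depends only on the two listings, computing it has no side effects and can be rerun on every startup.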

@rstrickland it appears that in Hive you have to register a table either as Hive-supported (i.e. using Hadoop InputFormats) or as non-Hive-supported (for Spark datasources, for example). Thus we might need some hack for namespacing the tables. What do you think?
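
One way to approach the namespacing question, sketched under assumptions: Spark records a datasource table's provider in the Hive table property `spark.sql.sources.provider`, while tables defined directly through Hive InputFormats/SerDes normally lack it. The sketch below classifies catalog entries on that basis and keeps sync-managed FiloDB tables in a dedicated database. The entry layout (plain dicts), the provider string `filodb.spark`, the database name `filodb`, and treating a `hive` provider value as Hive-native are all assumptions for illustration.

```python
# Hypothetical sketch: separate Hive-native tables from Spark datasource tables
# in a shared metastore, and pick out the ones a FiloDB sync would manage.
# Assumptions: entries are dicts with "db", "name" and "properties"; FiloDB
# tables use provider "filodb.spark" and live in a dedicated "filodb" database.

PROVIDER_KEY = "spark.sql.sources.provider"  # table property Spark sets on datasource tables
FILODB_PROVIDER = "filodb.spark"             # assumed provider name
FILODB_DB = "filodb"                         # assumed dedicated database (namespace)


def classify(entry: dict) -> str:
    """Return 'hive', 'filodb_managed', or 'datasource' for one catalog entry."""
    provider = entry.get("properties", {}).get(PROVIDER_KEY)
    if provider is None or provider.lower() == "hive":
        # No provider (or an explicit 'hive' provider): treat as a Hive-native
        # InputFormat/SerDe table that Hive itself can query.
        return "hive"
    if provider == FILODB_PROVIDER and entry.get("db") == FILODB_DB:
        # FiloDB datasource table inside the reserved namespace: owned by the sync.
        return "filodb_managed"
    # Any other Spark datasource table (Parquet, a FiloDB table outside the
    # reserved database, etc.) is left alone.
    return "datasource"


def qualified_name(dataset: str) -> str:
    """Name under which a FiloDB dataset would be registered, e.g. 'filodb.gdelt'."""
    return f"{FILODB_DB}.{dataset}"


if __name__ == "__main__":
    entries = [
        {"db": "default", "name": "events", "properties": {}},
        {"db": "filodb", "name": "gdelt", "properties": {PROVIDER_KEY: FILODB_PROVIDER}},
        {"db": "default", "name": "logs", "properties": {PROVIDER_KEY: "parquet"}},
    ]
    for e in entries:
        print(f"{e['db']}.{e['name']}: {classify(e)}")
```

Keeping managed tables in their own database means a name such as `events` can exist both as a Hive-native table in `default` and as a FiloDB table in `filodb` without colliding.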

rstrickland commented on June 27, 2024

So are you suggesting creating temp tables on startup that would be accessible via Spark only? I think the only way to get bona fide Hive support is to create a Hive SerDe. If I'm understanding correctly, your solution would solve the issue of having to recreate temp tables on startup for Spark or Spark SQL jobs, but would not allow for Hive queries via a BI tool.

velvia commented on June 27, 2024

@rstrickland you are right, the above proposed solution would not enable true Hive-only queries, though you can still connect BI tools to Spark SQL / Thrift server via the JDBC/ODBC drivers. The SerDe/InputFormats required for true Hive-only operation would come as a second step.

Would you guys be willing to test out the Spark-only solution, before the full Hive solution comes? What would the timeframe look like?

rstrickland commented on June 27, 2024

We would definitely test it whenever it's ready.

velvia commented on June 27, 2024

@rstrickland check out #63 and LMK if this is roughly what you guys are looking for as a first step. Would like some feedback first.

Thanks!

velvia commented on June 27, 2024

So the initial support has been merged. Let's close this and open a new ticket for any issues or changes desired.
