Support Hive (filodb issue, closed)

filodb avatar filodb commented on June 27, 2024
Support Hive

from filodb.

Comments (12)

velvia avatar velvia commented on June 27, 2024

@rstrickland welcome! Would you mind painting a picture of your setup? Hive metastore is part of your Hadoop installation? Or DSE? I don't quite understand - table meta stored in Hive, but actual tables stored in C*? Thanks!

rstrickland avatar rstrickland commented on June 27, 2024

We use a centralized Hive metastore that's shared by multiple Spark and EMR clusters that all serve different purposes. The data is stored in multiple places, including Cassandra. However, with the lack of a good open source Cassandra-Hive driver we've had to resort to creating temp tables every time we want to get at Cassandra data. It would be awesome if Filo supported legitimate Hive tables so we could bypass this step.
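For illustration, the per-session step described above might look like the following sketch. It only builds the Spark SQL statement text, so it runs without a cluster. The `CREATE TEMPORARY TABLE ... USING ... OPTIONS` form is the Spark 1.x data source syntax; the table name, keyspace, and the helper `temp_table_statement` are hypothetical, and the provider string assumes the Spark Cassandra Connector.

```python
def temp_table_statement(table: str, provider: str, options: dict) -> str:
    """Build a Spark SQL statement that registers a session-scoped temp table.

    The table only lives for the current Spark SQL session, which is why it
    has to be recreated on every startup.
    """
    # Sort the options so the generated statement is deterministic.
    opts = ", ".join(f'{key} "{value}"' for key, value in sorted(options.items()))
    return f"CREATE TEMPORARY TABLE {table} USING {provider} OPTIONS ({opts})"


# Hypothetical keyspace and table names, for illustration only.
stmt = temp_table_statement(
    "events",
    "org.apache.spark.sql.cassandra",
    {"keyspace": "analytics", "table": "events"},
)
print(stmt)
```

Every new session would have to run a statement like this for each Cassandra table before any query could reference it, which is the step a persistent metastore entry would remove.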

velvia avatar velvia commented on June 27, 2024

Got it. Would it be fine to do this through Spark? That is, you would not use the filo-cli but would use the Spark API to create tables etc. (since Spark can already connect to the Hive metastore).

rstrickland avatar rstrickland commented on June 27, 2024

We could, but we do have BI tools that use Hive proper (i.e. not the Spark SQL thrift server). Ideally it would be great if that would work as well, but I know that's a bigger effort.

velvia avatar velvia commented on June 27, 2024

Okay, I think I understand now. You are looking for a proper FiloDB driver for Hive that lets you query FiloDB from Hive itself.

velvia avatar velvia commented on June 27, 2024

@rstrickland ok so to break this up into two steps:

  1. Have a Hive driver (like DSE's) that automatically lets you query tables from Spark without having to do a CREATE EXTERNAL TABLE
  2. Actually support queries directly from Hive without Spark. Hmmmm.... I think this involves yucky input formats and Hive SerDes, etc.

velvia avatar velvia commented on June 27, 2024

OK, I scoped out the work for Hive metastore support of FiloDB tables for querying in Spark. Spark has a HiveMetastoreCatalog class with a createDataSourceTable method. So one possibility is that when the FiloDB daemon/library spins up, it automatically resolves differences between FiloDB tables and the Hive catalog. In theory the sync could also happen when a user requests tables or a schema, but that would require a custom Hive plugin in Spark. I need to think about how to automate the syncing.

@rstrickland it appears that in Hive you have to register a table either as Hive-supported (i.e. using Hadoop InputFormats) or as non-Hive-supported (for Spark data sources, for example). So there might need to be some hack for namespacing the tables. What do you think?
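As a rough sketch of the startup sync idea above, the reconciliation can be viewed as a set difference between FiloDB datasets and the tables already registered in the metastore. Everything here is an assumption for illustration: the `filo_` prefix stands in for the namespacing hack, and `plan_sync` and `SyncPlan` are hypothetical names, not FiloDB or Spark APIs. The sketch only computes what would need to change; actually registering a table would go through Spark.

```python
from typing import NamedTuple, Set


class SyncPlan(NamedTuple):
    to_register: Set[str]  # metastore entries that are missing
    to_drop: Set[str]      # stale entries with no backing FiloDB dataset


# Hypothetical prefix used to tell FiloDB-backed entries apart from
# ordinary Hive tables in the shared metastore.
PREFIX = "filo_"


def plan_sync(filodb_datasets: Set[str],
              metastore_tables: Set[str],
              prefix: str = PREFIX) -> SyncPlan:
    """Compute which metastore entries to add or remove at startup."""
    wanted = {prefix + name for name in filodb_datasets}
    # Only entries carrying the prefix are considered managed by the sync;
    # all other metastore tables are left untouched.
    managed = {t for t in metastore_tables if t.startswith(prefix)}
    return SyncPlan(to_register=wanted - managed, to_drop=managed - wanted)


# Hypothetical dataset and table names.
plan = plan_sync({"gdelt", "events"}, {"filo_gdelt", "filo_old", "sales"})
print(sorted(plan.to_register), sorted(plan.to_drop))
```

Restricting the sync to prefixed names is one way to keep it from touching tables that other clusters registered in the same shared metastore.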

rstrickland avatar rstrickland commented on June 27, 2024

So are you suggesting creating temp tables on startup that would be accessible via Spark only? I think the only way to get bona fide Hive support is to create a Hive SerDe. Your solution (if I'm understanding correctly) would solve the issue of having to recreate temp tables on startup for Spark or Spark SQL jobs, but would not allow for Hive queries via a BI tool.

velvia avatar velvia commented on June 27, 2024

@rstrickland you are right, the above proposed solution would not enable true Hive-only queries, though you can still connect BI tools to Spark SQL / Thrift server via the JDBC/ODBC drivers. The SerDe/InputFormats required for true Hive-only operation would come as a second step.

Would you guys be willing to test out the Spark-only solution, before the full Hive solution comes? What would the timeframe look like?

rstrickland avatar rstrickland commented on June 27, 2024

We would definitely test it whenever it's ready.

velvia avatar velvia commented on June 27, 2024

@rstrickland check out #63 and LMK if this is roughly what you guys are looking for as a first step. Would like some feedback first.

Thanks!

velvia avatar velvia commented on June 27, 2024

So the initial support has been merged. Let's close this and open a new ticket for any issues or changes desired.
