
pinot-dbapi's Issues

Yank the 5.1.0 release?

I'm presuming this release should've been 0.5.1. Can it be yanked from PyPI?

As long as it stays up, all releases below major version 5 will be seen as outdated.

TIMESTAMP datatype causing failure in Superset

When trying to add a Pinot table to Superset, I get the following error:

```
ERROR:root:'timestamp'
Traceback (most recent call last):
  File "/app/superset/lib/python3.8/site-packages/flask_appbuilder/api/__init__.py", line 84, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/superset/superset/views/base_api.py", line 80, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/app/superset/superset/superset/utils/core.py", line 1368, in time_function
    response = func(*args, **kwargs)
  File "/app/superset/superset/superset/utils/log.py", line 224, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/superset/superset/datasets/api.py", line 236, in post
    new_model = CreateDatasetCommand(g.user, item).run()
  File "/app/superset/superset/superset/datasets/commands/create.py", line 47, in run
    self.validate()
  File "/app/superset/superset/superset/datasets/commands/create.py", line 87, in validate
    if database and not DatasetDAO.validate_table_exists(
  File "/app/superset/superset/superset/datasets/dao.py", line 81, in validate_table_exists
    database.get_table(table_name, schema=schema)
  File "/app/superset/superset/superset/models/core.py", line 603, in get_table
    return Table(
  File "<string>", line 2, in __new__
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
    return fn(*args, **kwargs)
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 560, in __new__
    metadata._remove_table(name, schema)
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 555, in __new__
    table._init(name, metadata, *args, **kw)
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 644, in _init
    self._autoload(
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 667, in _autoload
    autoload_with.run_callable(
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2212, in run_callable
    return conn.run_callable(callable_, *args, **kwargs)
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1653, in run_callable
    return callable_(self, *args, **kwargs)
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 469, in reflecttable
    return insp.reflecttable(
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 664, in reflecttable
    for col_d in self.get_columns(
  File "/app/superset/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 390, in get_columns
    col_defs = self.dialect.get_columns(
  File "/app/superset/lib/python3.8/site-packages/pinotdb-0.3.6-py3.8.egg/pinotdb/sqlalchemy.py", line 390, in get_columns
    columns = [
  File "/app/superset/lib/python3.8/site-packages/pinotdb-0.3.6-py3.8.egg/pinotdb/sqlalchemy.py", line 393, in <listcomp>
    "type": get_type(spec["dataType"], spec.get("fieldSize")),
  File "/app/superset/lib/python3.8/site-packages/pinotdb-0.3.6-py3.8.egg/pinotdb/sqlalchemy.py", line 458, in get_type
    return type_map[data_type.lower()]
KeyError: 'timestamp'
```

It looks like

def get_type(data_type, field_size):

is missing a mapping for the timestamp data type (see the sketch after the version details below).

Pinot version: 0.8.0
Table Type: Realtime
Superset version: 1.1.0
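
A minimal sketch of the kind of fix, assuming get_type() looks up lowercase Pinot type names in a type_map dict in pinotdb/sqlalchemy.py (the surrounding entries here are illustrative; only the timestamp line is the point):

```python
from sqlalchemy import types

# Illustrative excerpt of the type_map consulted by get_type(); the
# "timestamp" entry is the missing mapping behind KeyError: 'timestamp'.
type_map = {
    "string": types.String,
    "int": types.Integer,
    "long": types.BigInteger,
    "float": types.Float,
    "double": types.Float,
    "timestamp": types.TIMESTAMP,  # proposed addition
}

def get_type(data_type, field_size):
    return type_map[data_type.lower()]
```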

Password logged when initializing dialect

Related to this snippet:

```python
if "password" in kwargs:
    kwargs["password"] = self._password = kwargs.pop("password")
if "database" in kwargs:
    kwargs["database"] = self._database = kwargs.pop("database")
kwargs["debug"] = self._debug = bool(kwargs.get("debug", False))
kwargs["verify_ssl"] = self._verify_ssl = (
    str(kwargs.get("verify_ssl", "true")).lower() in ['true']
)
logger.info(
    "Updated pinot dialect args from %s: %s and %s",
    kwargs,
    self._controller,
    self._debug,
)
```


If a password is provided, this statement will log the plaintext value. Could we update the log statement to remove or mask the password value?

Example log message

2024-03-25 16:16:24,916:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': '<host>', 'port': 443, 'path': 'query/sql', 'scheme': 'https', 'verify_ssl': True, 'username': 'username', 'password': 'password', 'debug': False}
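
A sketch of one way to avoid this, masking the password in a copy of kwargs before logging (the "****" mask and dict-copy approach are illustrative, not the project's actual fix):

```python
# Log a redacted copy of kwargs so credentials never reach the log.
safe_kwargs = {k: ("****" if k == "password" else v) for k, v in kwargs.items()}
logger.info(
    "Updated pinot dialect args from %s: %s and %s",
    safe_kwargs,
    self._controller,
    self._debug,
)
```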

Missing ciso8601 in pinotdb 0.4.7

Airflow canary builds started to fail after the last pinotdb release:

Example here:
https://github.com/apache/airflow/actions/runs/3133775825/jobs/5087788714#step:10:9254

It seems that the recent release has a hidden dependency on the ciso8601 package, but it is missing from install_requires.

  ______ ERROR collecting tests/providers/apache/pinot/hooks/test_pinot.py _______
  ImportError while importing test module '/opt/airflow/tests/providers/apache/pinot/hooks/test_pinot.py'.
  Hint: make sure your test modules/packages have valid Python names.
  Traceback:
  /usr/local/lib/python3.8/importlib/__init__.py:127: in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
  tests/providers/apache/pinot/hooks/test_pinot.py:29: in <module>
      from airflow.providers.apache.pinot.hooks.pinot import PinotAdminHook, PinotDbApiHook
  airflow/providers/apache/pinot/hooks/pinot.py:24: in <module>
      from pinotdb import connect
  /usr/local/lib/python3.8/site-packages/pinotdb/__init__.py:1: in <module>
      from pinotdb.db import connect, connect_async
  /usr/local/lib/python3.8/site-packages/pinotdb/db.py:2: in <module>
      import ciso8601
  E   ModuleNotFoundError: No module named 'ciso8601'
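
A sketch of the packaging fix, assuming dependencies are declared via install_requires in setup.py (the other setup() arguments are placeholders):

```python
from setuptools import setup, find_packages

setup(
    name="pinotdb",
    packages=find_packages(),
    install_requires=[
        # ...existing dependencies...
        "ciso8601",  # declare the module imported by pinotdb/db.py
    ],
)
```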

Enhancement: Parse string results

Proposition: Enhancement

In some cases, the Pinot Broker REST API returns string values [e.g. "Infinity"] as results (as part of rows) when the value cannot be represented as a JSON "number".

To improve predictability from the Python client, should we parse these output strings as a Python-native numeric type?

Simply parsing with float should cover many cases and allow for more "natural" handling of Pinot's results from Python.

Should we apply a similar parsing approach to other types? (Which ones?)
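
A sketch of the proposed behavior; conveniently, Python's float() already accepts the spellings "Infinity", "-Infinity", and "NaN" that the broker falls back to:

```python
def parse_numeric(value):
    """Best-effort conversion of string cells to native Python floats."""
    if isinstance(value, str):
        try:
            # float() natively understands "Infinity", "-Infinity", "NaN".
            return float(value)
        except ValueError:
            return value  # leave genuinely non-numeric strings untouched
    return value

assert parse_numeric("Infinity") == float("inf")
assert parse_numeric("hello") == "hello"
```

Whether this should apply only to columns whose Pinot data type is numeric (rather than to every string cell) is part of the open question above.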

Pinot BIG_DECIMAL columns are not accessible via SQLAlchemy

When retrieving Pinot table metadata with SQLAlchemy, the Pinot BIG_DECIMAL type is not currently supported. This prevents Apache Superset from creating datasets for any Pinot tables that contain BIG_DECIMAL columns. The following code triggers this:

```python
from sqlalchemy import *
from sqlalchemy.engine import create_engine

engine = create_engine("pinot+http://<broker_host>:<broker_port>/query/sql?controller=http://<controller_host>:<controller_port>/")
conn = engine.connect()
metadata = MetaData()
t = Table("mytable", metadata, autoload_with=conn)
```

This results in the following error:

Traceback (most recent call last):
  File "/home/keri/testbed/test.py", line 23, in <module>
    t = Table("arm_commission", metadata, autoload_with=conn)
  File "<string>", line 2, in __new__
  File "/path/to/sqlalchemy/util/deprecations.py", line 277, in warned
    return fn(*args, **kwargs)  # type: ignore[no-any-return]
  File "/path/to/sqlalchemy/sql/schema.py", line 432, in __new__
    return cls._new(*args, **kw)
  File "/path/to/sqlalchemy/sql/schema.py", line 486, in _new
    with util.safe_reraise():
  File "/path/to/sqlalchemy/util/langhelpers.py", line 147, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/path/to/sqlalchemy/sql/schema.py", line 482, in _new
    table.__init__(name, metadata, *args, _no_init=False, **kw)
  File "/path/to/sqlalchemy/sql/schema.py", line 858, in __init__
    self._autoload(
  File "/path/to/sqlalchemy/sql/schema.py", line 890, in _autoload
    conn_insp.reflect_table(
  File "/path/to/sqlalchemy/engine/reflection.py", line 1535, in reflect_table
    _reflect_info = self._get_reflection_info(
  File "/path/to/sqlalchemy/engine/reflection.py", line 2014, in _get_reflection_info
    columns=run(
  File "/path/to/sqlalchemy/engine/reflection.py", line 2000, in run
    res = meth(filter_names=_fn, **kw)
  File "/path/to/sqlalchemy/engine/reflection.py", line 928, in get_multi_columns
    table_col_defs = dict(
  File "/path/to/sqlalchemy/engine/default.py", line 917, in _default_multi_reflect
    single_tbl_method(
  File "/path/to/pinotdb/sqlalchemy.py", line 252, in get_columns
    columns = [
  File "/path/to/pinotdb/sqlalchemy.py", line 255, in <listcomp>
    "type": get_type(spec["dataType"], spec.get("fieldSize")),
  File "/path/to/pinotdb/sqlalchemy.py", line 329, in get_type
    return type_map[data_type.lower()]
KeyError: 'big_decimal'
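
A sketch of the likely fix, analogous to the timestamp issue above: add a big_decimal entry to the type_map consulted by get_type() (mapping it to SQLAlchemy's Numeric is an illustrative choice):

```python
from sqlalchemy import types

# Hypothetical addition to the existing type_map in pinotdb/sqlalchemy.py:
type_map["big_decimal"] = types.Numeric
```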

Asyncio support

I think everything is in the title. I'm not an expert in asyncio (at all), but I can tell that, as a user, being able to call Pinot asynchronously is a big plus that can make or break some applications.
For example, having to query Pinot N times takes approximately N times longer today [synchronous] than using asynchronous calls, which easily turns realtime queries into not-so-realtime responses.

I'll look closer into it, but any help and ideas are appreciated!
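
A sketch of what async usage could look like, loosely modeled on the connect_async entry point that later pinotdb versions export (see the import in the ciso8601 traceback above); the exact signatures here are assumptions:

```python
import asyncio
from pinotdb import connect_async  # assumed async counterpart to connect()

async def run_queries(queries):
    conn = await connect_async(host="localhost", port=8099,
                               path="/query/sql", scheme="http")
    cursors = [conn.cursor() for _ in queries]
    # Fire all N queries concurrently instead of paying N sequential
    # round trips to the broker.
    await asyncio.gather(*(c.execute(q) for c, q in zip(cursors, queries)))
    return [c.fetchall() for c in cursors]

results = asyncio.run(run_queries(["SELECT COUNT(*) FROM baseballStats"]))
```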

Support gRPC + streaming for fetching records directly from Pinot servers in Pinot dbapi client

We have a use case that requires fetching 100K-1M filtered records directly from Pinot servers with minimal performance hit. Each record has between 5 and 10 columns. We noticed that fetching 500K records through the default path (Pinot servers -> Pinot broker -> client) is a challenge for brokers.

One reason is that the Pinot dbapi client uses HTTP/JSON communication, which is inefficient for large result sets. The Pinot connectors for Presto and Spark fetch large result sets directly from Pinot servers using a more efficient communication method: gRPC + streaming. This method has less impact on Pinot servers and allows fetching larger result sets quickly.

Can you add gRPC + streaming support to the Pinot Python client?

[More details]
We noticed high CPU utilization on Pinot brokers. The following chart shows that Pinot brokers spend most of their time on the Reduce operation. Please note that the queries in question are simple SELECT + WHERE queries (no aggregations, no group by, and no joins).

Reduce operation: Time spent by broker in combining query results from multiple servers.

Broker Avg. P99 reduce operation: [chart omitted]

To summarize the above chart, the broker spends:

  • between 1s and up to 3.5s combining response for ApplicationStage queries.
  • between 1s and up to 4.5s combining response for ApplicationMilestone queries.
  • up to 1s combining response for ATSApplicant queries.

💡 The chart explains where 1s and up to 3s-4s of ApplicationStage and ApplicationMilestone query time is spent (the broker combining responses and serializing them into JSON before responding to the Reports Pinot client).

Update httpx dependency to allow `0.24.1`

0.24 was released two months ago and includes a ton of additional logging improvements. I'd love to turn these on, but I can't because pinotdb restricts httpx to >=0.23.0,<0.24.0.

Generally, I don't think using ^ is a good idea in any package. It's much better to use >=, because that leaves the responsibility to the client to find breaking changes when upgrading and either resolve them or force a lower version. Library developers shouldn't be responsible for this.

https://hynek.me/articles/semver-will-not-save-you/

expose more cursor.stats

Follow-up discussion from #87.

In Apache Pinot, the query response contains many statistics that are useful for debugging and tracing the performance of a query (see: https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-format).

In #87 we added the total execution time, but we want to expose more, similar to what the REST API returns.
Propose having a stats field on the cursor that contains a dict of the k-v stats listed in the documentation.
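
A sketch of the proposed API from the caller's side; the stats attribute and its keys are hypothetical until implemented, mirroring the broker response fields:

```python
curs = conn.cursor()
curs.execute("SELECT COUNT(*) FROM baseballStats")

# Hypothetical: a dict populated from the broker's JSON response stats.
print(curs.stats.get("timeUsedMs"))
print(curs.stats.get("numDocsScanned"))
print(curs.stats.get("numServersQueried"))
```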

create a release pipeline

We've seen two issues about the release package not containing necessary requirements/dependencies (#47 and #32).
We should create a release pipeline alongside the CI pipeline.

Allow timeout on connect execute

We recently switched from requests to httpx. However, the default 5s timeout configuration might not work for many longer Pinot queries.

Propose

Propose having a **kwargs passthrough from the execute API down to httpx, so that additional config can also be added in the future.

Caveat

The timeout should be synchronized between the query parameter and the HTTP timeout, so a util might be needed in the future to unify the query param and HTTP params.
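
A sketch of the proposed passthrough, mirroring the shape of the existing execute() (see the session.post call in the AttributeError traceback further down this page); the plumbing is illustrative:

```python
import httpx

class Cursor:
    # Sketch: forward unknown keyword arguments straight to httpx.
    def execute(self, operation, parameters=None, **httpx_kwargs):
        query = self.finalize_query_payload(operation, parameters)
        # e.g. timeout=httpx.Timeout(30.0) overrides the 5s default
        # for a long-running Pinot query.
        r = self.session.post(self.url, json=query, **httpx_kwargs)
        return self.normalize_query_response(query, r)
```

A caller could then write curs.execute(sql, timeout=httpx.Timeout(30.0)) without pinotdb having to enumerate every HTTP option up front.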

Reuse HTTP / TCP connection between calls (`execute`)

Currently, every call (Cursor.execute) to Pinot establishes a new HTTP connection.
This quickly adds latency whenever "many" queries are executed.
I've measured somewhere around ~20ms of overhead per call, which, multiplied by the N requests needed to serve a more complex business query, quickly adds significant delay.

Using requests.Session seems like the standard way of enabling connection reuse and improving performance in those cases.

Should it be done at the Cursor or Connection level?

Handling this at the Cursor level seems like the safest and most future-proof approach: it keeps the API predictable (no state shared between cursors) while still addressing the performance issue.

Is there another alternative worth exploring?
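
A sketch of the Session idea at the Cursor level (names and plumbing illustrative): each cursor owns one pooled requests.Session, so HTTP keep-alive amortizes the TCP/TLS handshake across execute() calls:

```python
import requests

class Cursor:
    def __init__(self, url):
        self.url = url
        # One pooled session per cursor: keep-alive reuses the TCP (and
        # TLS) connection instead of paying ~20ms setup on every query.
        self.session = requests.Session()

    def execute(self, sql):
        return self.session.post(self.url, json={"sql": sql}).json()

    def close(self):
        self.session.close()
```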

Inconsistent connect behavior when using context manager

Somewhat confusingly, when calling connect() within a context manager we get a Cursor, but when not using a context manager we get a Connection object. This contrasts with the behavior of other DB clients like sqlite3, SQLAlchemy, etc. and seems inconsistent. Curious why we're not just returning self in __enter__().

```python
def connect(*args, **kwargs):
    """
    Constructor for creating a connection to the database.
    >>> conn = connect('localhost', 8099)
    >>> curs = conn.cursor()
    """
    return Connection(*args, **kwargs)
```

pinot-dbapi/pinotdb/db.py

Lines 164 to 168 in dd8809c

```python
def __enter__(self):
    return self.cursor()

def __exit__(self, *exc):
    self.close()
```
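
For comparison, the conventional behavior the issue is asking about would be a sketch like:

```python
def __enter__(self):
    # Return the Connection itself, matching sqlite3 and friends;
    # callers then create cursors explicitly.
    return self

def __exit__(self, *exc):
    self.close()
```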

Move to Pinot SQL Endpoint

Plan to add support for the new Pinot SQL endpoint and upgrade the lib version to 0.3.x.
The new lib drops support for Pinot versions < 0.3.0, which lack the SQL endpoint.
Users on Pinot versions < 0.3.0 should remain on lib version 0.2.x.

Unable to use Multi Stage Engine for Apache Pinot on Superset

I would like to enable the multi-stage engine for Apache Pinot on Superset. As per the pinotdb docs, it is possible to enable the multi-stage engine by providing connect_args to SQLAlchemy's create_engine method:

```python
from sqlalchemy.engine import create_engine

engine = create_engine(
    "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
    connect_args={"useMultistageEngine": "true"}
)
```

I have tried adding connect_args to Engine Parameters in the Superset UI, but the multi-stage engine is still not enabled.


Expected results

Multi-stage queries that run fine using pinotdb in Python directly should run in Superset as well.

Actual results

Multi-stage queries do not run in Superset.

Screenshots

Here is one example: DISTINCT * syntax runs fine in Python with multistage enabled. It also runs fine in the Pinot UI Query Console. It does not work in Superset.


Environment

superset version: 3.0.0
pinotdb==5.1.0
psycopg2-binary==2.9.9
pinot version: release-1.0.0

Additional context

I have deployed Superset on EKS using a Helm chart.
The same issue was raised on Superset: apache/superset#25627

Clean up PY2 stuff and improve code structure

Lots of Python 2 __future__ imports are still in pinot-dbapi.

Several cleanups:

  • clean up py2 stuff
  • add typing
  • split code in db.py
    • type and column processing into a separate file
    • query response processing / data manipulation into a separate file

It seems like we still have lots to do to support more SQLAlchemy features, but we will leave that to separate issues; potentially we can leverage the type / column / schema / data utils split out from db.py.

SQLAlchemy integration improvements

Hi folks!

First of all: amazing project you guys have! I'm thrilled about my ride so far, Pinot is pretty awesome!

I just wanted to leave some words here, though, about mixing SQLAlchemy and Pinot. I've just started to use it in a project, and I struggled quite a bit with this integration because I thought I would be able to just use the ORM right out of the box (I know, it's kinda stupid of me to even consider putting "ORM" and "OLAP" together, but having a representational model class in Python is very helpful).

Is there any ongoing effort to make this integration easier to set up? I'm considering contributing to this part in the future, but I don't want to step on anyone's toes or hinder anyone's work in this direction...

Thanks!

Make release a github action

Ideally we should make the release a GitHub Action so people can pick the right version and kick off the release process.

Update doc with the correct key to use multistage engine

Following the docs, we were not able to get the multi-stage engine working with SQLAlchemy when using useMultistageEngine.

Example

To pass in additional query parameters (such as `useMultistageEngine=true`) you may pass

pinot-dbapi/README.md

Lines 56 to 58 in 4d3ce83

```python
curs.execute("select * from airlineStats air limit 10", queryOptions="useMultistageEngine=true")
```

pinot-dbapi/README.md

Lines 108 to 111 in 4d3ce83

```python
# engine = create_engine(
#     "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
#     connect_args={"useMultistageEngine": "true"}
# )
```


Could we update the doc to use the correct key that enables the multi-stage engine?

It works when we use use_multistage_engine instead.
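
A sketch of the configuration that worked for us, with the key swapped to use_multistage_engine:

```python
from sqlalchemy.engine import create_engine

# use_multistage_engine (not useMultistageEngine) is the key that took effect:
engine = create_engine(
    "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
    connect_args={"use_multistage_engine": "true"},
)
```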

Expose query response statistics

Expose query response stats in pinot-dbapi including:

  • NumServersQueried
  • NumServersResponded
  • NumSegmentsQueried
  • NumSegmentsProcessed
  • NumSegmentsMatched
  • NumConsumingSegmentsQueried
  • NumDocsScanned
  • NumEntriesScannedInFilter
  • NumEntriesScannedPostFilter
  • NumGroupsLimitReached
  • TotalDocs
  • TimeUsedMs
  • MinConsumingFreshnessTimeMs
  • NumSegmentsPrunedByBroker

Can we also expose all broker metrics here:
https://docs.pinot.apache.org/configuration-reference/monitoring-metrics#pinot-broker

Improve schemas and tables list

PinotDialect currently gets schemas and tables from the controller, but there are a couple issues with this:

  • In Pinot, schema refers to the table column definition, but in the context of SQLAlchemy schema generally refers to a directory / namespace of tables (e.g. in Postgres). This is confusing because in tools such as Superset you can select a schema then a table but with Pinot, they are more or less the same thing.

Query failed when '%' shows up in the literal

Python treats % as a special character, so if there is a % in the query literal, users need to write the query with %%.

sample query:

SELECT DATETIMECONVERT(metricTime, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES'),
       AVG(metricValue) AS "AVG_1"
FROM metric_v6.metric_v6
WHERE metricTime >= 1621555200000
  AND metricTime < 1622160000000
  AND metric = 'CECCP%'
GROUP BY DATETIMECONVERT(metricTime, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES')
LIMIT 10000

It works when changing 'CECCP%' to 'CECCP%%'.
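
A sketch of the workaround from Python, doubling the % so the DB-API parameter handling does not try to interpret it:

```python
from pinotdb import connect

conn = connect(host="localhost", port=8099, path="/query/sql", scheme="http")
curs = conn.cursor()

# 'CECCP%%' survives pyformat-style interpolation as the literal 'CECCP%'.
curs.execute(
    "SELECT AVG(metricValue) FROM metric_v6 WHERE metric = 'CECCP%%' LIMIT 10"
)
```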

Add username and password option for broker

I'm using HTTP basic auth for my brokers but not my controller. I'm thinking of adding arguments so that, similar to how a DB username/password can be passed in, the broker username and password can be passed in like:
pinot://localhost:8099/query/sql?controller=localhost:9000&username=<username>&password=<password>. Thoughts on this?

'bool' object has no attribute 'lower'

A simple test with version 0.3.10 on Superset, or even a simple demo script against my cluster, currently results in the following error:

```
Traceback (most recent call last):
  File "/home/marko/projects/ASW/files/pinot-dbapi/examples/pinot-live.py", line 6, in <module>
    engine = create_engine(
  File "<string>", line 2, in create_engine
  File "/usr/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py", line 309, in warned
    return fn(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 576, in create_engine
    (cargs, cparams) = dialect.create_connect_args(u)
  File "/home/marko/.local/lib/python3.10/site-packages/pinotdb/sqlalchemy.py", line 203, in create_connect_args
    kwargs = self.update_from_kwargs(kwargs)
  File "/home/marko/.local/lib/python3.10/site-packages/pinotdb/sqlalchemy.py", line 172, in update_from_kwargs
    kwargs["verify_ssl"] = self._verify_ssl = (kwargs.get("verify_ssl", "true").lower() in ['true'])
AttributeError: 'bool' object has no attribute 'lower'
```

I'm using a simple Pinot cluster based on Docker Swarm; the code that produces this error is the following:

```python
# Query pinot.live with sqlalchemy
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine(
    "pinot+http://pinot-broker.studiobot:8099/query/sql?controller=http://pinot-controller.studiobot:9020/"
)

meetupRsvp = Table("meetupRsvp_REALTIME", MetaData(bind=engine), autoload=True)
print(f"\nSending Count() SQL to Pinot")
query = select([func.count("*")], from_obj=meetupRsvp)
print(engine.execute(query).scalar())
```

Please fix.
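
A sketch of the fix: coerce the value to str before lowercasing so both bool and string inputs work (this matches the str(...) call visible in the dialect snippet quoted in the password-logging issue above):

```python
# Accept verify_ssl passed as either a bool or a "true"/"false" string.
raw = kwargs.get("verify_ssl", "true")
kwargs["verify_ssl"] = self._verify_ssl = str(raw).lower() in ["true"]
```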

How does pinotdb handle Reserved words?

Hi,
I'm trying to integrate Pinot with Superset and am blocked by querying columns named the same as reserved words.
Pinot SQL has some reserved words, like 'time', 'timestamp', etc. A query using such columns unquoted causes a parser error. So how do we let Superset quote such columns using pinotdb?

I tried setting variables of PinotEngineSpec in Superset, but it took no effect. On the other hand, I found that pydruid and kylinpy (https://github.com/Kyligence/kylinpy/blob/master/kylinpy/sqla_dialect.py#L24) handle reserved words.

So could pinotdb handle reserved words like pydruid and kylinpy do?
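
A sketch of the kylinpy-style approach applied to the Pinot dialect: extend the dialect's IdentifierPreparer with Pinot's reserved words so SQLAlchemy quotes them automatically (the word list here is an illustrative subset):

```python
from sqlalchemy.sql import compiler

class PinotIdentifierPreparer(compiler.IdentifierPreparer):
    # Illustrative subset; the real list would come from Pinot's SQL grammar.
    reserved_words = compiler.IdentifierPreparer.reserved_words | {
        "time", "timestamp", "date",
    }

# Then, on the dialect class:
#     preparer = PinotIdentifierPreparer
```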

AttributeError: 'NoneType' object has no attribute '_username'

Hey guys,

First off, thank you very much for this amazing library. It's great to have a way to query Pinot using Python.

I have been having some trouble querying my database, and I keep getting this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [74], in <cell line: 1>()
----> 1 curs.execute('SELECT * FROM baseballStats')
      2 for row in curs:
      3     print(row)

File ~\Anaconda3\envs\pinotdb\lib\site-packages\pinotdb\db.py:56, in check_closed.<locals>.g(self, *args, **kwargs)
     54 if self.closed:
     55     raise exceptions.Error(f"{self.__class__.__name__} already closed")
---> 56 return f(self, *args, **kwargs)

File ~\Anaconda3\envs\pinotdb\lib\site-packages\pinotdb\db.py:448, in Cursor.execute(self, operation, parameters)
    441 @check_closed
    442 def execute(self, operation, parameters=None):
    443     query = self.finalize_query_payload(operation, parameters)
    445     r = self.session.post(
    446         self.url,
    447         json=query,
--> 448         auth=(self.auth._username, self.auth._password))
    449     return self.normalize_query_response(query, r)

AttributeError: 'NoneType' object has no attribute '_username'

Additionally, I also see this error in my terminal,
'_xsrf' argument missing from POST

What exactly is the issue here?

This is simply what my code looks like,

from pinotdb import connect

conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
curs = conn.cursor()

curs.execute('SELECT * FROM baseballStats')
for row in curs:
    print(row)

Note: I've tried a few other ports as well, including 9000.

I've followed the quick start instructions in the Pinot documentation and run the following commands to get my server running,

docker pull apachepinot/pinot:latest

docker run -p 9000:9000 pinot:latest QuickStart -type batch

I can access the server through the web UI just fine.

Do I need to provide a username and password? If so, what is the default username and password that I should include here?

By the way, I am using Python 3.10.4 and pinotdb 0.4.1.
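
For what it's worth, a sketch of a defensive fix in the client, based on the execute() body shown in the traceback: only pass auth when credentials were actually configured (self.auth is None on an anonymous quick-start cluster):

```python
# Only send basic auth when credentials were supplied.
auth = (
    (self.auth._username, self.auth._password)
    if self.auth is not None
    else None
)
r = self.session.post(self.url, json=query, auth=auth)
```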

Split extra header on first "="

The extra header key value pairs should be split on the first "=" to allow for values containing "=" characters.

This code on line 333:

k, v = header.split("=")

should be:

k, v = header.split("=", 1)

Otherwise there is an error:

>>> curs = conn.cursor()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/pinotdb/db.py", line 57, in g
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pinotdb/db.py", line 185, in cursor
    cursor = Cursor(*self._args, **self._kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pinotdb/db.py", line 333, in __init__
    k, v = header.split("=")
    ^^^^
ValueError: too many values to unpack (expected 2)

Table names seem to get dropped from joins etc.

While working on some queries for my team I noticed that table names weren't present in my join statements and occasionally this was causing issues with ambiguous column names.

I started to look through the code and think the issue is in some old code in the visit_column function. I've opened PR #97 which then uncovered what I suspect is another bug involving types. I can open a separate issue/PR for that if you prefer.

I understand PR #97 may not be the right solution, I admit the code is not entirely clear to me even after looking at the commit history of the relevant lines. I would be happy to submit a better solution but may need some help understanding the implementation.

Create test suite and CI pipelines

Currently pinot-dbapi has no tests or CI pipelines.

Propose creating a unit-test suite:

  • unit tests comparing sqlalchemy usage vs. direct query usage.
  • executed against a Pinot quickstart hybrid table started via docker run (GitHub events? or a periodically generated real-time stream)
  • CI will start with the Docker Pinot; local development requires either a minikube or starting a local Pinot (will improve test/context.py)

Python type of Timestamp and Boolean schema fields

My Pinot table schema has fields of type Timestamp and Boolean (I believe these types were introduced after Pinot v0.7.1).
When querying them using this Python client, the returned values for both are of Python type str. More intuitive Python datatypes for them, IMO, would be datetime and bool. Can this be supported by this client?
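
A sketch of the kind of client-side conversion being requested, assuming the declared Pinot data type is known per column and that Pinot serializes these values as ISO-8601 timestamps and "true"/"false" strings (ciso8601 is already a pinotdb dependency per the Airflow issue above):

```python
import ciso8601

def convert_cell(value, pinot_type):
    """Map Pinot TIMESTAMP/BOOLEAN string cells to native Python types."""
    if pinot_type == "TIMESTAMP":
        return ciso8601.parse_datetime(value)  # str -> datetime.datetime
    if pinot_type == "BOOLEAN":
        return value == "true"  # assumes "true"/"false" serialization
    return value
```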
