sqlalchemy-solr's Introduction

Apache Solr dialect for SQLAlchemy

A SQLAlchemy dialect for Apache Solr.

Requirements

The dialect is compatible with Solr version 6.0 and higher.

Installation

The package can be installed either from PyPI or from the source code.

From the Python Package Index (PyPI)

pip install sqlalchemy-solr

Latest from Source Code

pip install git+https://github.com/aadel/sqlalchemy-solr

Usage

To connect to Solr with SQLAlchemy, the following URL pattern can be used:

solr://<username>:<password>@<host>:<port>/solr/<collection>[?parameter=value]

Note: port 8983 is used when the port is omitted from the URL.
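
For example, a connection URL for a hypothetical books collection on a host named solr, using basic authentication, could look like:

solr://admin:secret@solr:8983/solr/books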

Authentication

Basic Authentication

Basic authentication credentials can be supplied in the connection URL as shown in the URL pattern above.

JWT Authentication

A JWT authentication token can be supplied as a token URL parameter, for example:

solr://<host>:<port>/solr/<collection>?token=<token_value>
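
As an illustration, the sketch below creates an engine for a hypothetical books collection using a placeholder token value; the token is passed verbatim through the token query parameter described above:

from sqlalchemy import create_engine

# Placeholder JWT value; in practice this would come from your identity provider.
jwt_token = "<token_value>"

engine = create_engine(f"solr://solr:8983/solr/books?token={jwt_token}")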

Additional Parameters:

If HTTPS is enabled, the following parameters can be supplied (an example URL follows the list):

  1. use_ssl: a boolean parameter; when set to true, an HTTPS connection is used. The default value is false.
  2. verify_ssl: a boolean parameter that controls whether the SSL certificate is verified. The default value is true.
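
For example, connecting over HTTPS while skipping certificate verification (e.g., for a self-signed certificate) could look like:

solr://admin:secret@solr:8983/solr/books?use_ssl=true&verify_ssl=false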

Aliases

Aliases are supported as tables whose columns are the union of all the underlying collections' fields. For example, if an alias collection_alias has linked member collections, the following query is valid:

SELECT f1, f2, f3 FROM collection_alias

where f1, f2, and f3 are defined in the linked collections.
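
In ORM terms, an alias can be mapped like an ordinary collection. The following is a minimal sketch assuming the hypothetical collection_alias above with string-typed fields f1, f2, and f3:

from sqlalchemy import Column, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class AliasDoc(Base):
    # Maps the alias like a regular collection; each column must exist in
    # at least one of the linked member collections.
    __tablename__ = "collection_alias"

    f1 = Column('f1', String, primary_key=True)
    f2 = Column('f2', String)
    f3 = Column('f3', String)

engine = create_engine('solr://solr:8983/solr/collection_alias')

with Session(engine) as session:
    for doc in session.scalars(select(AliasDoc).limit(10)):
        print(doc.f1, doc.f2, doc.f3)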

Time Filters

Time filter predicates are transformed to Solr syntax when ORM mode is used and the Solr datasource release is lower than 9. For Solr 9 and above, time filter transformations are handled by Solr itself, without driver intervention. Open- and close-ended date ranges are supported.

Multi-valued Fields

Multi-valued fields are mapped to SQL arrays of a specific scalar type. For example:

phones = Column('PHONE_ss', ARRAY(String))
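
A fuller sketch, assuming a hypothetical contacts collection with an id_i key field and a multi-valued PHONE_ss field:

from sqlalchemy import ARRAY, Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Contact(Base):
    __tablename__ = "contacts"  # hypothetical collection

    id = Column('id_i', Integer, primary_key=True)
    # Multi-valued Solr field mapped to an array of strings
    phones = Column('PHONE_ss', ARRAY(String))

engine = create_engine('solr://solr:8983/solr/contacts')

with Session(engine) as session:
    for contact in session.scalars(select(Contact).limit(10)):
        print(contact.phones)  # a Python list of phone numbers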

SQL Compilation Caching

The dialect supports caching by leveraging SQLAlchemy's SQL compilation caching capabilities, including query caching.
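
Caching is transparent to application code; the size of SQLAlchemy's compilation cache can be tuned through the standard query_cache_size argument of create_engine. A sketch, with a hypothetical collection name:

from sqlalchemy import create_engine

# query_cache_size controls the size of SQLAlchemy's SQL compilation cache
# (the SQLAlchemy default is 500).
engine = create_engine(
    'solr://solr:8983/solr/examples_books',
    query_cache_size=1200,
)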

Schema

If the supplied ORM query explicitly refers to a schema, the schema is filtered out before query execution.
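
For instance, a model that declares a schema still results in a plain collection reference, because the schema prefix is removed before the query reaches Solr (a sketch; the schema name "default" is only illustrative):

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Book(Base):
    __tablename__ = "books"
    # Accepted by the ORM, but filtered out by the dialect before execution,
    # so the emitted query refers to the books collection only.
    __table_args__ = {"schema": "default"}

    id = Column('index_i', Integer, primary_key=True)
    name = Column('book_name_t', String)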

Basic Example

The following example illustrates the basic usage in a Python project:

from sqlalchemy import create_engine, text

engine = create_engine('solr://solr:8983/solr/examples_books')

with engine.connect() as connection:
    result = connection.execute(text("SELECT sequence_i, genre_s FROM examples_books"))
    for row in result:
        print("Sequence: {}, Genre: {}".format(row['sequence_i'], row['genre_s']))

ORM Example

from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Book(Base):
    __tablename__ = "books"

    id = Column('index_i', Integer, primary_key=True)
    name = Column('book_name_t', String)
    publishing_year = Column('publishing_year_i', Integer)

    def __repr__(self):
        return f"Book(id={self.id!r}, name={self.name!r}, publishing_year={self.publishing_year!r})"

engine = create_engine('solr://solr:8983/solr/books')

with Session(engine) as session:
    stmt = select(Book).where(Book.publishing_year.in_([2014, 2015])).limit(10)

    for book in session.scalars(stmt):
        print(book)

where index_i, book_name_t, and publishing_year_i are fields of the books collection.

Time Filters Example

from sqlalchemy import Column, Float, Integer, String, and_, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class SalesHistory(Base):
    __tablename__ = "sales"

    order_number = Column('order_number_i', Integer, primary_key=True)
    price_each = Column('price_each_f', Float)
    status = Column('status_s', String)
    order_date = Column('order_date_dt', Integer)

    def __repr__(self):
        return f"SalesHistory(order number={self.order_number!r}, status={self.status!r}, price={self.price_each}, order date={self.order_date!r})"

engine = create_engine('solr://solr:8983/solr/sales')

with Session(engine) as session:
    stmt = select(SalesHistory) \
        .where(and_(SalesHistory.order_date >= "2024-01-01 00:00:00", SalesHistory.order_date < "2024-02-01 00:00:00")) \
        .order_by(SalesHistory.price_each.asc()) \
        .limit(10)

    for row in session.scalars(stmt):
        print(row)

where order_number_i, price_each_f, status_s, and order_date_dt are fields of the sales collection.

In the above example, date predicates are transformed to [2024-01-01T00:00:00Z TO 2024-02-01T00:00:00Z}.

Open-ended date ranges are supported. For example:

.where(SalesHistory.order_date >= "2024-01-01 00:00:00")

translates to [2024-01-01T00:00:00Z TO *]

Compatibility

Feature 6.0 6.5 6.6 7.x 8.x 9.x
Aliases
Built-in date range compilation
SELECT expression statements
SQL compilation caching

Use Cases

Apache Superset

To connect Apache Superset to a Solr datasource, add the package to Superset's requirements, then create a database connection using the URL pattern shown above.
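
For example, with Superset's Docker setup this typically amounts to listing the package in the local requirements file and entering the dialect URL as the SQLAlchemy URI (the file path and connection values below are illustrative):

# docker/requirements-local.txt
sqlalchemy-solr

# SQLAlchemy URI entered in Superset's database connection form:
solr://admin:secret@solr:8983/solr/examples_books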

Testing with Apache Superset

Requirements

  • A running Solr instance with Parallel SQL support
  • A Superset instance up and running with this package installed
  • pytest >= 7.4.4 installed on the testing machine

Procedure

  1. Change conftest.py as appropriate
  2. Run pytest

Resources

  1. SQLAlchemy documentation
  2. SQLAlchemy dialects
  3. PEP 249 – Python Database API Specification v2.0

sqlalchemy-solr's Issues

Support for JWT Auth Token

Hi,

I'm reaching out regarding an issue with sqlalchemy-solr. Currently, only basic authentication is supported. The request is to add JWT auth token support when connecting to Solr.

Current connection:
solr://username:XXXXXXXXXX@host:8983/solr/local-enterpriseSearch

Desired connection:
solr://172.20.20.244:8983/solr/local-enterpriseSearch?token=13224212....

Thanks!

Contributing to the source code

@aadel

I want to know how I can contribute to the source code. Some improvements would help make the library more capable.

I have made some changes and will wait to hear back from you.

How to handle multivalued Solr fields?

If a query returns values from a multivalued column, this error occurs:

TypeError: <class 'list'> is not convertible to datetime

and is swallowed up by this except statement:

except Exception as ex:
    logging.error(Cursor.mf.format("Error in Cursor.parse_column_types", str(ex)))

The code can be made to return something more useful by changing this section:

try:
    df[column] = to_datetime(df[column])
    types.append("timestamp")
except ValueError:
    types.append("varchar")

So that the except statement catches (ValueError, TypeError) rather than just ValueError.
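
That is, something along these lines (a sketch of the proposed change):

try:
    df[column] = to_datetime(df[column])
    types.append("timestamp")
except (ValueError, TypeError):
    types.append("varchar")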

The code then returns the array as a string, e.g. [value,value]. This is better than crashing, but doesn't seem ideal. Is there a good way to handle Solr multivalued fields so the resulting list()s are handled as ARRAY[string] or whatever makes sense in SQLAlchemy?

Support Superset 2.0

Currently, this package is tested with Superset 0.37.0; however, Superset 2.0 has been released. This issue is for supporting Superset 2.0.

Error when configuring filters

Thanks for this module! I've been trying it out with Apache Superset 0.37.2 and this error cropped up when trying to set up a filter:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 747, in __getattr__
    return getattr(self.comparator, key)
AttributeError: 'Comparator' object has no attribute 'text'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/superset/viz.py", line 503, in get_df_payload
    df = self.get_df(query_obj)
  File "/usr/local/lib/python3.7/site-packages/superset/viz.py", line 258, in get_df
    self.results = self.datasource.query(query_obj)
  File "/usr/local/lib/python3.7/site-packages/superset/connectors/sqla/models.py", line 1193, in query
    query_str_ext = self.get_query_str_extended(query_obj)
  File "/usr/local/lib/python3.7/site-packages/superset/connectors/sqla/models.py", line 679, in get_query_str_extended
    sql = self.database.compile_sqla_query(sqlaq.sqla_query)
  File "/usr/local/lib/python3.7/site-packages/superset/models/core.py", line 413, in compile_sqla_query
    sql = str(qry.compile(engine, compile_kwargs={"literal_binds": True}))
  File "<string>", line 1, in <lambda>
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 481, in compile
    return self._compiler(dialect, bind=bind, **kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 487, in _compiler
    return dialect.statement_compiler(dialect, self, **kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 592, in __init__
    Compiled.__init__(self, dialect, statement, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 322, in __init__
    self.string = self.process(self.statement, **compile_kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 352, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 96, in _compiler_dispatch
    return meth(self, **kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2164, in visit_select
    text, select, inner_columns, froms, byfrom, kwargs
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/compiler.py", line 2263, in _compose_select_body
    t = select._whereclause._compiler_dispatch(self, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 96, in _compiler_dispatch
    return meth(self, **kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy_solr-0.1.dev0-py3.7.egg/sqlalchemy_solr/base.py", line 111, in visit_clauselist
    parser.parse(c.right.text)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 758, in __getattr__
    replace_context=err,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
AttributeError: Neither 'BindParameter' object nor 'Comparator' object has an attribute 'text'

Is filtering supported?

Thanks for your time.

Schema 'default' not found

Hi there,

I'm running into a problem setting up a chart for my solr collection where it seems to have problems "finding" the default schema:

WARNING:superset.connectors.sqla.models:Query SELECT COUNT(*) AS `count`
FROM `default`.mycollection
ORDER BY `count` DESC
LIMIT 100 on schema default failed
Traceback (most recent call last):
  File "/app/superset/connectors/sqla/models.py", line 1863, in query
    df = self.database.get_df(sql, self.schema, mutator=assign_column_label)
  File "/app/superset/models/core.py", line 446, in get_df
    self.db_engine_spec.execute(cursor, sqls[-1])
  File "/app/superset/db_engine_specs/base.py", line 1260, in execute
    raise cls.get_dbapi_mapped_exception(ex)
  File "/app/superset/db_engine_specs/base.py", line 1258, in execute
    cursor.execute(query)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy_solr/solrdbapi/_solrdbapi.py", line 72, in func_wrapper
    return func(self, *args, **kwargs)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy_solr/solrdbapi/_solrdbapi.py", line 142, in execute
    raise Exception(rows[0]["EXCEPTION"])
Exception: Failed to execute sqlQuery 'SELECT COUNT(*) AS `count`
FROM `default`.mycollection
ORDER BY `count` DESC
LIMIT 100' against JDBC connection 'jdbc:calcitesolr:'.
Caused by: Error while executing SQL "SELECT COUNT(*) AS `count`
FROM `default`.mycollection
ORDER BY `count` DESC
LIMIT 100": From line 2, column 6 to line 2, column 20: Object 'default' not found

My understanding is that Solr doesn't really have a schema in the same way SQL does, so there's only a default option available. I was able to set up and connect the datasource just fine, and the dataset also has no problems, but when I try to create a chart I get this error.

I'm running this through docker-compose from the repository, using the non-dev compose file, after setting up requirements-local.txt.
Am I missing something with my setup? Of course, let me know if there's any more info I can provide.

Thanks!

Querying User-defined Field Types in Solr give KeyError

Hi,
It seems there is an issue when a field that is not of a built-in type is queried.
Error message:
Error in SolrReflect.reflect_column_types
  File "........./solr_reflect.py", line 62, in reflect_column_types
    "type"
KeyError:

Remove semicolon from query

Solr does not accept SQL queries terminated by a semicolon (;). To ensure compatibility with other packages that generate terminated SQL queries, a trailing semicolon should be removed before sending the query to Solr.

As a quick fix, I added the line query = query.replace(';', '') to substitute_in_query in the Cursor class.
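
A slightly more conservative variant of that quick fix would strip only a trailing semicolon rather than every semicolon in the statement (a sketch, not project code):

query = query.strip().rstrip(';')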

Issue while creating chart for Solr

The query below is generated by Superset and causes errors. After fetching data, columns and types are prepared in the SolrTableReflection class.

However, the table-extraction logic extracts the alias name ['virtual_table'] and then calls the /admin/luke API with that name, which fails because there is no collection named virtual_table.

The problematic query is:

SELECT pd AS pd,
       pn AS pn
FROM
  (SELECT productId AS pd,
          productName AS pn
   FROM enterpriseSearch) AS virtual_table
LIMIT 10;

For collections containing dashes "-", SQL statements return a JSON decoder error

Hi,
I tried using sqlalchemy-solr with Superset version 0.999.0dev against several collections containing dashes "-"; the SELECT statements return a JSON decoder error.
Through Wireshark I can see Solr replied with a JSON response, but it looks like:
"
{\n
"result-set":{\n
"docs":[{\n
"field1":"value1"}\n
,{\n
"EOF":true,\n
"RESPONSE_TIME":100}]}}
\n
"
The last \n is most probably due to the Wireshark display.

I tried the equivalent GET URL in the browser and I could see the expected normal query result:
http://host:port/solr/collection-1/sql?stmt=SELECT+field1+from+%60collection-1%60+limit+1

%60 is the URL-encoded backtick used to quote the collection name collection-1, which contains a dash "-".
