arrow-odbc-py's Issues

Memory requirements for PostgreSQL

Thank you for this awesome library! I have been testing it with smaller Postgres tables (~1.5 GB) and everything worked well and fast. But when I tried to make it work with a bigger table, the process is always killed by the OOM killer:

System: Docker image with allocated 10GB RAM. Debian GNU/Linux 11 (bullseye). Python 3.9.13.
Arrow ODBC version: 0.2.0
ODBC Driver: PostgreSQL Unicode (from odbc-postgresql)
Table size: ~6.8GB

I have run the following code:

from arrow_odbc import read_arrow_batches_from_odbc

connection_string = "Driver={PostgreSQL Unicode};Server=xxxxx;Port=5432;Database=xxxx;"

reader = read_arrow_batches_from_odbc(
    query=f"SELECT * FROM xxx",
    connection_string=connection_string,
    batch_size=10,
    max_binary_size=1000,
    max_text_size=1000,
    user="xxxx",
    password="xxxx",
    falliable_allocations=False,
)

i = 1
for batch in reader:
    print(i)
    i += 1

When I run it, the memory usage grows constantly until the OOM killer kills the process. I have also tried to play with different values for falliable_allocations, batch_size, max_binary_size, and max_text_size, but nothing made the script finish. I would appreciate any hint.

Thanks a lot for your work!
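One thing that may be worth trying (an assumption on my part, not something confirmed in this thread): the PostgreSQL ODBC driver buffers the entire result set in driver memory unless its UseDeclareFetch option is enabled, so the memory growth may happen inside the driver rather than in arrow-odbc. A minimal sketch of the connection string change, keeping the placeholders from above:

from arrow_odbc import read_arrow_batches_from_odbc

# UseDeclareFetch=1 makes psqlODBC use a server-side cursor and fetch rows in
# chunks of `Fetch` rows instead of materializing the whole result set. These
# are psqlODBC connection-string options, not arrow-odbc parameters.
connection_string = (
    "Driver={PostgreSQL Unicode};Server=xxxxx;Port=5432;Database=xxxx;"
    "UseDeclareFetch=1;Fetch=10000;"
)

reader = read_arrow_batches_from_odbc(
    query="SELECT * FROM xxx",
    connection_string=connection_string,
    batch_size=10000,
    user="xxxx",
    password="xxxx",
)

for batch in reader:
    pass  # process each batch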

Database error message not in stack trace

I discovered this at the same time as
pacman82/arrow-odbc#63

DB2 on IBM i error messages and status codes aren't making it into the Python stack trace. I redacted some connection details below.

from arrow_odbc import read_arrow_batches_from_odbc, insert_into_table
import getpass
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

dev_libl = ",REDACTEDLIB1,REDACTEDLIB2,REDACTEDLIB3"
driver = "{IBM i Access ODBC Driver 64-bit}"
dev_host="<redacted>"
user = getpass.getuser()
pwd = getpass.getpass()

# https://www.ibm.com/docs/en/i/7.4?topic=details-connection-string-keywords
dev_conn_str = f"DRIVER={driver};SYSTEM={dev_host};CMT=0;NAM=1;DBQ={dev_libl};UID={user};PWD={pwd}"

def write_to_db():
    arrow_table = pa.Table.from_arrays((np.arange(100), np.random.rand(100)), names=('row_num', 'rand_float'))
    print(arrow_table.schema)
    print(arrow_table.shape)
    reader = pa.RecordBatchReader.from_batches(arrow_table.schema, arrow_table.to_batches())
    insert_into_table(
        connection_string=dev_conn_str,
        chunk_size=10000,
        table="TSTPANDAS",
        reader=reader
    )

write_to_db()

I get this error

$ python arrow_odbc_test_minimal.py 
Password: 
row_num: int64
rand_float: double
(100, 2)
Traceback (most recent call last):
  File "/data/hub/rcoleman12/ic-ml-hard-part-adds/arrow_odbc_test_minimal.py", line 28, in <module>
    write_to_db()
  File "/data/hub/rcoleman12/ic-ml-hard-part-adds/arrow_odbc_test_minimal.py", line 21, in write_to_db
    insert_into_table(
  File "/data/hub/rcoleman12/ic-ml-hard-part-adds/.venv/lib/python3.9/site-packages/arrow_odbc/writer.py", line 140, in insert_into_table
    raise_on_error(error)
  File "/data/hub/rcoleman12/ic-ml-hard-part-adds/.venv/lib/python3.9/site-packages/arrow_odbc/error.py", line 30, in raise_on_error
    raise Error(error_out)
arrow_odbc.error.Error: An error occurred preparing SQL statement: INSERT INTO TSTPANDAS (row_num, rand_float) VALUES (?, ?);

I know from the other linked issue that the cause is the ODBC driver rejecting the semicolon at the end of the INSERT statement. So, I'm missing the database error message that pyodbc wraps like this:

pyodbc.ProgrammingError: ('42000', '[42000] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0104 - Token ; was not valid. Valid tokens: <END-OF-STATEMENT>. (-104) (SQLPrepare)')

insert_into_table error: [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Incorrect syntax near 'Audit'

Hi.
I can successfully read data from an Azure SQL Server database using read_arrow_batches_from_odbc.
If I use insert_into_table with the same connection string, I get this error:
Error: Failure to execute the sql statement, sending the data to the database.
ODBC emitted an error calling 'SQLExecute':
State: 42000, Native error: 102, Message: [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Incorrect syntax near 'Audit'.

def dataframe_to_table(df):
    table = pa.Table.from_pandas(df)
    reader = pa.RecordBatchReader.from_batches(table.schema, table.to_batches())
    insert_into_table(
        connection_string=connection_string,
        chunk_size=1000,
        table="tmp_test",
        reader=reader,
    )

dataframe_to_table(to_format_db)

Is there a way to debug this to find out if the error is on my side?
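One way to get more detail (following what is done in other reports here) is to enable arrow-odbc's log output, which prints ODBC diagnostic records and the reported column types to stderr while the insert runs. A minimal sketch, assuming log level 3 gives the debug output quoted elsewhere in these issues:

import arrow_odbc

# Print ODBC diagnostics and the reported column types to stderr; the
# offending column usually shows up just before the error.
arrow_odbc.log_to_stderr(3)

dataframe_to_table(to_format_db)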

Unexpected batch size of 32,767 instead of 10,000,000

I'm running

    reader = read_arrow_batches_from_odbc(
        query=QUERY,
        connection_string=CONNECTION_STRING,
        batch_size=10000000,
    )

When I enumerate over the BatchReader

    for i, batch in enumerate(reader):
        print(f"Batch Size {batch_size}")
        print(f"In loop reading arrow batches, i:{i}")
        # print(f"batch schema:{batch.schema}")
        print(f"NUM ROWS: {batch.num_rows}")#32767
        print(f"num_columns: {batch.num_columns}")#9
        # print(f"columns: {batch.columns}")# a list of columns and their data
        print(f"get_total_buffer_size: {batch.get_total_buffer_size()}")#4770907
        print(f"nbytes: {batch.nbytes}")#4770891
        # print(f"schema: {batch.schema}")

I get the following output:

Batch Size 10000000
In loop reading arrow batches, i:0
NUM ROWS: 32767
num_columns: 9
get_total_buffer_size: 4770907
nbytes: 4770891

I would have expected num_rows to be 10 million.

If we use a smaller batch size, say 10, it works:

Batch Size 10
In loop reading arrow batches, i:4
NUM ROWS: 10
num_columns: 9
get_total_buffer_size: 1468

If we try the edge case of batch size + 1 (32,768), it is missing one expected row:

Batch Size 32768
In loop reading arrow batches, i:2
NUM ROWS: 32767
num_columns: 9
get_total_buffer_size: 4749999
nbytes: 4749983

I'm curious how I can debug this and figure out what's causing this limit. Does this work with large batches, or do you recommend another tool for large batches? Thank you for your time and consideration.

I've also tried changing the query to be just 1 column and I still get the same result

    QUERY = f"SELECT 'A' as the_letter_A FROM tbl"

I'm also wondering whether 1.456 GB of RAM usage would be a problem or not, as that's what I calculated if the reader were actually reading 10M records.
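For reference, the 1.456 GB figure can be reproduced from the numbers above; a small sketch of the arithmetic, using the observed per-row buffer size as an approximation:

observed_bytes = 4_770_907   # get_total_buffer_size for the 32,767-row batch above
observed_rows = 32_767
bytes_per_row = observed_bytes / observed_rows   # ~145.6 bytes per row

requested_rows = 10_000_000
estimated_batch_ram = bytes_per_row * requested_rows
print(f"{estimated_batch_ram / 1e9:.3f} GB")     # ~1.456 GB for a single transit buffer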

Add option to use 64 bits for all floats? (difference in behavior compared to pyodbc)

I noticed I had some Float64 values coming through in Float32 precision and narrowed it down to the wrong precision/scale being reported by the ODBC driver.

With this test query:

arrow_odbc.log_to_stderr(3)
...
query = """select 
cast(123456789.123456789 as decimal(38, 20)) as original_decimal, 
cast('double', cast(123456789.123456789 as decimal(38, 20))) as cast_to_double
"""

reader = arrow_odbc.read_arrow_batches_from_odbc(
    connection_string="DSN=...", 
    query=query
)
t = pa.Table.from_batches(reader, reader.schema)
display(t)

(it's not related to the syntax used here - same thing happens with different variants and double-precision columns in tables)

I see these types in the arrow-odbc debug output:

DEBUG - ODBC Environment created.
DEBUG - SQLAllocHandle allocated connection (Dbc) handle '0x1fc1c1c9b00'
WARN - State: 01000, Native error: 0, Message: [Microsoft][ODBC Driver Manager] The driver doesn't support the version of ODBC behavior that the application requested (see SQLSetEnvAttr).
DEBUG - Database managment system name as reported by ODBC: PostgreSQL
DEBUG - ODBC driver reported for column 0. Relational type: Numeric { precision: 38, scale: 20 }; Nullability: Nullable; Name: 'original_decimal';
DEBUG - ODBC driver reported for column 1. Relational type: Float { precision: 17 }; Nullability: Nullable; Name: 'cast_to_double';
INFO - Column 'original_decimal'
Bytes used per row: 49
INFO - Column 'cast_to_double'
Bytes used per row: 12
INFO - Total memory usage per row for single transit buffer: 61

It looks like the precision indicated by their driver is wrong - it should be more like 53 for a double instead of the 17 we're getting.

I get this result, with 123456790 as the result of Float32 conversion:

pyarrow.Table
original_decimal: decimal128(38, 20)
cast_to_double: float
----
original_decimal: [[123456789.12345678900000000000]]
cast_to_double: [[123456790]]

If I provide a schema, it works:

schema = pa.schema([
    ('original_decimal', pa.decimal128(38, 20)),
    ('cast_to_double', 'double')
])
reader = arrow_odbc.read_arrow_batches_from_odbc(
    connection_string="DSN=...",
    query=query,
    schema=schema
)
t = pa.Table.from_batches(reader, schema)
display(t)
INFO - Column 'original_decimal'
Bytes used per row: 49
INFO - Column 'cast_to_double'
Bytes used per row: 16
INFO - Total memory usage per row for single transit buffer: 65
pyarrow.Table
original_decimal: decimal128(38, 20)
to_double: double
----
original_decimal: [[123456789.12345678900000000000]]
to_double: [[123456789.12345679]]

The surprising part here is pyodbc works:

with pyodbc.connect(...) as conn:
    with conn.cursor() as cursor:
        rs = cursor.execute(query).fetchall()
for c in cursor.description:
    print(c)
print(rs)
('original_decimal', <class 'decimal.Decimal'>, None, 38, 38, 20, True)
('cast_to_double', <class 'float'>, None, 17, 17, 0, True)
[(Decimal('123456789.12345678900000000000'), 123456789.12345679)]

This is happening with the Denodo ODBC driver, which they say is based on the Postgres ODBC driver 09.05. I suspect the same thing probably happens with other databases. I'm guessing this kind of issue is widespread enough that pyodbc just ignores the float precision indicated by ODBC and always uses 64 bits.

To fix: maybe add an option to always use 64-bit for floats as a workaround?
(could consider changing the default behavior to do the same thing as pyodbc, but that could break things and would be less efficient for other drivers that do things correctly here)
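Until such an option exists, a possible workaround sketch (assuming it is acceptable to execute the query twice; "DSN=..." and query are the placeholders used in the report) is to derive an override schema from the reported one, promoting every float32 column to float64, and then re-read with that schema as in the example above:

import pyarrow as pa
import arrow_odbc

# First pass: only used to discover the schema the driver reports.
probe = arrow_odbc.read_arrow_batches_from_odbc(connection_string="DSN=...", query=query)

# Promote every float32 column to float64, keep everything else unchanged.
fields = [
    pa.field(f.name, pa.float64(), nullable=f.nullable) if f.type == pa.float32() else f
    for f in probe.schema
]
schema = pa.schema(fields)

# Second pass: fetch the data with the corrected schema, as in the workaround above.
reader = arrow_odbc.read_arrow_batches_from_odbc(
    connection_string="DSN=...", query=query, schema=schema
)
t = pa.Table.from_batches(reader, schema)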

Db2 VARCHAR column with empty string causes panic

This is a new issue based on comments in #68

Given a Db2 table with a column that has empty strings
And the column does not allow NULLs
When I try to read the table into a pyarrow record batch
Then it should be mapped correctly
But it currently causes a panic

> evac odbc sql -c 'SELECT NUM FROM MYSCHEMA.MYTABLE' -v
DEBUG - ODBC Environment created.
DEBUG - SQLAllocHandle allocated connection (Dbc) handle '0x21976e0'
DEBUG - Database managment system name as reported by ODBC: DB2
DEBUG - ODBC driver reported for column 0. Relational type: Varchar { length: Some(20) }; Nullability: NoNulls; Name: 'NUM';
DEBUG - SQLColAttribute called with attribute 'ConciseType' for column '1' reported 12.
DEBUG - SQLColAttribute called with attribute 'DisplaySize' for column '1' reported 20.
DEBUG - Relational type of column 0: Varchar { length: Some(20) }
INFO - Column 'NUM'
Bytes used per row: 89
INFO - Total memory usage per row for single transit buffer: 89
thread '<unnamed>' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-odbc-6.1.0/src/reader/to_record_batch.rs:103:85:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Column 'NUM' is declared as non-nullable but contains null values")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted

Table definition

CREATE TABLE "MYSCHEMA"."MYTABLE" (
		  "ID" INTEGER NOT NULL WITH DEFAULT  , 
		  "NUM" VARCHAR(20 OCTETS) NOT NULL WITH DEFAULT  , 
		  "DTE" DATE NOT NULL WITH DEFAULT '9999-09-09'
) ;

Type Image leads to memory error

Hi there

I have a SQL Server column of type image. SQL Server reports MaxLength = 2147483647. This results in the following error in arrow-odbc-py:
memory allocation of 107374182350000 bytes failed

Somehow I cannot even catch this error. I guess you are using some kind of buffer that grows too large with such a high maximum length. I propose imposing a maximum length and throwing a recoverable error, or really fixing it by not using a fixed-size buffer.
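A possible workaround sketch, assuming the image column is reported as a binary type so that the existing max_binary_size option applies (column, table, and DSN names are placeholders): capping the per-value buffer keeps the allocation bounded, at the cost of truncating values longer than the cap.

from arrow_odbc import read_arrow_batches_from_odbc

reader = read_arrow_batches_from_odbc(
    query="SELECT image_col FROM my_table",  # placeholder
    connection_string="DSN=...",             # placeholder
    batch_size=1000,
    # Upper bound (in bytes) for binary columns; prevents sizing the transit
    # buffer from the 2 GiB maximum length the driver reports.
    max_binary_size=1_048_576,
)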

how to deal with varchar(max) columns in mssql

Hi, I am using polars==0.19.7, which now includes ODBC support through arrow-odbc-py (arrow-odbc==1.2.8).

When running the code (see the example below), arrow-odbc raises an error.

import polars as pl

USERNM = ''
PWD = ''
DBNAME = ''
HOST = ''
PORT = ''

CONN = f"Driver={{ODBC Driver 17 for SQL Server}};Server={HOST};Port={PORT};Database={DBNAME};Uid={USERNM};Pwd={PWD}"

df = pl.read_database(
    connection=CONN,
    query="SELECT varchar_max_col FROM [dbo].[tablname]",
)

with the error being:

arrow_odbc.error.Error: There is a problem with the SQL type of the column with name: varchar_max_col and index 0:
ODBC reported a size of '0' for the column. This might indicate that the driver cannot specify a sensible upper bound for the column. E.g. for cases like VARCHAR(max). Try casting the column into a type with a sensible upper bound. The type of the column causing this error is Varchar { length: 0 }.

I can easily resolve this by editing the query to

df = pl.read_database(
    connection=CONN,
    query="SELECT CAST(varchar_max_col AS VARCHAR(100)) AS varchar_max_col FROM [dbo].[tablname]",
)
which then resolves the issue (or I could change the column type in the database, but that is not something you always want to or can do).

However, as varchar(max) columns still occur frequently in databases, I was wondering if there could be native support for this in arrow-odbc? In other words, it would catch varchar(max) columns and return them without throwing an error.

I hope this is the right place to ask the question, because I am not sure if this is arrow-odbc related or ODBC driver related...
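For what it is worth, one thing that may be worth trying (an assumption, not verified here) is calling arrow-odbc directly and setting max_text_size, which gives the reader an explicit upper bound for text columns; whether it is honored for columns the driver reports as size 0 depends on the arrow-odbc version, so the CAST shown above remains the reliable workaround.

from arrow_odbc import read_arrow_batches_from_odbc

reader = read_arrow_batches_from_odbc(
    query="SELECT varchar_max_col FROM [dbo].[tablname]",
    connection_string=CONN,
    batch_size=10_000,
    # Explicit upper bound (in characters) for text columns such as VARCHAR(max).
    max_text_size=4000,
)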

Utf8Error

Got the following error when extracting a table from an MS SQL Server Database:

thread '' panicked at 'ODBC column had been expected to return valid utf8, but did not.: Utf8Error { valid_up_to: 78, error_len: None }', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-odbc-0.6.1/src/column_strategy/text.rs:95:26
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)

The datatype from the column that produced the error is nvarchar(40). Strangely, it works when I convert the column to nvarchar(42).

OS: Ubuntu 20.04
Driver: ODBC Driver 17 for SQL Server
Source: MS SQL Server 2019
Python Version: 3.9

insert_into_table fails

Hi Markus again ;)

I try to just read a parquet file and call insert_into_table with the most recent pyarrow version:

from pyarrow import Table
import pyarrow.parquet as pq
from arrow_odbc import insert_into_table


reader: Table = pq.read_table(tf)
insert_into_table(reader, chunk_size, table, connection_string)

I get this error:
AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute '_export_to_c'
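The error suggests that insert_into_table received a pyarrow.Table where it expects a RecordBatchReader; a corrected sketch (following the wrapping used in the other reports above, with tf, chunk_size, table, and connection_string as the placeholders from the snippet):

import pyarrow as pa
import pyarrow.parquet as pq
from arrow_odbc import insert_into_table

arrow_table = pq.read_table(tf)
reader = pa.RecordBatchReader.from_batches(arrow_table.schema, arrow_table.to_batches())
insert_into_table(
    reader=reader,
    chunk_size=chunk_size,
    table=table,
    connection_string=connection_string,
)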

Make ODBC diagnostics available from Python

arrow-odbc-py builds on top of the arrow-odbc Rust crate. This crate allows for linking up a log backend in order to make ODBC diagnostic records available. A behavior it inherits from the odbc-api crate.

Currently only the last diagnostic record associated with the last error is forwarded to Python as part of an exception. Warnings are not visible to Python users. Neither is any record but the first one associated with an error, in case there are several.

Maybe we could provide a backend for log based on the Python logger?

Error message with SQL Server

Hi there!

After months of using this lib in production, we hit an error we do not quite understand.

External error: The number of diagnostic records returned by ODBC seems to exceed `32,767` This means not all (diagnostic) records could be inspected. This in turn may be problematic if invariants rely on checking that certain errors did not occurr. Usually this many warnings are only generated, if one or more warnings per row is generated, and then there are many rows. Maybe try fewer rows, or fix the cause of some of these warnings/errors?

Do you know what that means? :)

Fetching query results concurrently

I was wondering if there is any way to use more CPU cores for fetching query results.

If you have a very fast database, fetching results can be bottlenecked by the single thread that is used by arrow-odbc. One way to do it is to manually partition the query and spawn multiple instances of arrow-odbc but that adds complexity and might not be as efficient as fetching concurrently.

In turbodbc you can use use_async_io which is nice but also limited to a 2x speedup.
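For reference, the library already offers a limited form of this: the reader's fetch_concurrently() method (mentioned in a benchmark issue further down) moves fetching onto a separate system thread, so database round trips overlap with the Python-side processing. A minimal sketch; it double-buffers rather than parallelizing across cores:

from arrow_odbc import read_arrow_batches_from_odbc

reader = read_arrow_batches_from_odbc(
    query="SELECT * FROM my_table",   # placeholder
    connection_string="DSN=...",      # placeholder
    batch_size=100_000,
)
# Fetch the next batch on a background thread while the current one is processed.
reader.fetch_concurrently()

for batch in reader:
    ...  # process each pyarrow.RecordBatch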

Explicit connection reuse

It would be convenient to be able to re-use specific database connections for explicitly sequential tasks. For instance, declare a temporary table, insert some rows from an Arrow table, and join against that session table in a subsequent select.

Things I have tried unsuccessfully:

  • enable_odbc_connection_pooling(), as mentioned in #43
  • decorate connect.py:connect_to_database() with @functools.cache()
    • (yes I knew it was a bad idea, thought it was worth a try)

I assume there are Rust ownership issues that make this difficult, but it would be amazing for ETL jobs which compute intermediate data sets. Right now I'm creating tables dynamically and dropping them after.

Cannot set connection timeout

Hi there!

I'm trying to use arrow-odbc to connect to an Azure SQL database which seems to run into a timeout when connecting. It would be great if I could set the connection timeout somehow, but that's not possible through the connection string. It must be done via an ODBC attribute according to the MSFT docs. Would it be possible to expose an API that allows setting the connection timeout, or more generic ODBC attributes, when calling read_arrow_batches_from_odbc?

DB2: Invalid attribute value, function: "SQLSetEnvAttr"

Hi, thanks a lot for this awesome lib. I am having a problem connecting to DB2. I have set up a container (based on public.ecr.aws/lambda/python:3.10) where I installed the DB2 ODBC client (with ibm_db). If I try isql or pyodbc, everything works fine:

# isql -k -v "DSN=DB2;database=xxx;hostname=xxx;port=50001;uid=xxx;pwd=xxx"
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+

or

>>> import pyodbc
>>> con = pyodbc.connect("DSN=DB2;database=xxx;hostname=xxx;port=50001;uid=xxx;pwd=xxx")
... querying works fine

But when I run arrow-odbc:

from arrow_odbc import read_arrow_batches_from_odbc
read_arrow_batches_from_odbc("SELECT * FROM xxx", connection_string="DSN=DB2;database=xxx;hostname=xxx;port=50001;uid=xxx;pwd=xxx", batch_size=10000)

I get the following error:

thread '<unnamed>' panicked at src/lib.rs:24:54:
called `Result::unwrap()` on an `Err` value: Diagnostics { record: State: S1009, Native error: 0, Message: [unixODBC][Driver Manager]Invalid attribute value, function: "SQLSetEnvAttr" }
stack backtrace:
   0:       0x40097ff1f0 - std::backtrace_rs::backtrace::libunwind::trace::he43a6a3949163f8c
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:       0x40097ff1f0 - std::backtrace_rs::backtrace::trace_unsynchronized::h50db52ca99f692e7
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:       0x40097ff1f0 - std::sys_common::backtrace::_print_fmt::hd37d595f2ceb2d3c
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:67:5
   3:       0x40097ff1f0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h678bbcf9da6d7d75
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:44:22
   4:       0x40097acfac - core::fmt::rt::Argument::fmt::h3a159adc080a6fc9
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/rt.rs:138:9
   5:       0x40097acfac - core::fmt::write::hb8eaf5a8e45a738e
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/mod.rs:1094:21
   6:       0x40097d3c1d - std::io::Write::write_fmt::h9663fe36b2ee08f9
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:1714:15
   7:       0x40098007ae - std::sys_common::backtrace::_print::hcd4834796ee88ad2
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:47:5
   8:       0x40098007ae - std::sys_common::backtrace::print::h1360e9450e4f922a
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:34:9
   9:       0x4009800390 - std::panicking::default_hook::{{closure}}::h2609fa95cd5ab1f4
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:270:22
  10:       0x40098013b1 - std::panicking::default_hook::h6d75f5747cab6e8d
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:290:9
  11:       0x40098013b1 - std::panicking::rust_panic_with_hook::h57e78470c47c84de
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:707:13
  12:       0x4009800e82 - std::panicking::begin_panic_handler::{{closure}}::h3dfd2453cf356ecb
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:599:13
  13:       0x4009800de6 - std::sys_common::backtrace::__rust_end_short_backtrace::hdb177d43678e4d7e
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:170:18
  14:       0x4009800dd1 - rust_begin_unwind
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
  15:       0x400971c162 - core::panicking::panic_fmt::hd1e971d8d7c78e0e
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
  16:       0x400971c599 - core::result::unwrap_failed::hccb456d39e9c31fc
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1652:5
  17:       0x400971b3d2 - std::sys_common::once::futex::Once::call::h892ebff3825980de
  18:       0x400978aba5 - arrow_odbc_connect_with_connection_string
  19:       0x40096de052 - ffi_call_unix64
  20:       0x40096dd12c - ffi_call_int
  21:       0x40096db01f - cdata_call
                               at /project/src/c/_cffi_backend.c:3201:5
  22:       0x4000b8efa1 - _PyObject_MakeTpCall
                               at /var/python-src/Objects/call.c:215
  23:       0x4000c05b4e - _PyObject_VectorcallTstate
                               at /var/python-src/./Include/cpython/abstract.h:112
  24:       0x4000c05b4e - PyObject_Vectorcall
                               at /var/python-src/./Include/cpython/abstract.h:123
  25:       0x4000c05b4e - call_function
                               at /var/python-src/Python/ceval.c:5893
  26:       0x4000c05b4e - _PyEval_EvalFrameDefault
                               at /var/python-src/Python/ceval.c:4181
  27:       0x4000bfe212 - _PyEval_EvalFrame
                               at /var/python-src/./Include/internal/pycore_ceval.h:46
  28:       0x4000bfe212 - _PyEval_Vector
                               at /var/python-src/Python/ceval.c:5067
  29:       0x4000bff3d6 - _PyObject_VectorcallTstate
                               at /var/python-src/./Include/cpython/abstract.h:114
  30:       0x4000bff3d6 - PyObject_Vectorcall
                               at /var/python-src/./Include/cpython/abstract.h:123
  31:       0x4000bff3d6 - call_function
                               at /var/python-src/Python/ceval.c:5893
  32:       0x4000bff3d6 - _PyEval_EvalFrameDefault
                               at /var/python-src/Python/ceval.c:4213
  33:       0x4000bfe212 - _PyEval_EvalFrame
                               at /var/python-src/./Include/internal/pycore_ceval.h:46
  34:       0x4000bfe212 - _PyEval_Vector
                               at /var/python-src/Python/ceval.c:5067
  35:       0x4000c00365 - _PyObject_VectorcallTstate
                               at /var/python-src/./Include/cpython/abstract.h:114
  36:       0x4000c00365 - PyObject_Vectorcall
                               at /var/python-src/./Include/cpython/abstract.h:123
  37:       0x4000c00365 - call_function
                               at /var/python-src/Python/ceval.c:5893
  38:       0x4000c00365 - _PyEval_EvalFrameDefault
                               at /var/python-src/Python/ceval.c:4231
  39:       0x4000bfe212 - _PyEval_EvalFrame
                               at /var/python-src/./Include/internal/pycore_ceval.h:46
  40:       0x4000bfe212 - _PyEval_Vector
                               at /var/python-src/Python/ceval.c:5067
  41:       0x4000bfe03d - PyEval_EvalCode
                               at /var/python-src/Python/ceval.c:1134
  42:       0x4000c9920d - run_eval_code_obj
                               at /var/python-src/Python/pythonrun.c:1291
  43:       0x4000c9919c - run_mod
                               at /var/python-src/Python/pythonrun.c:1312
  44:       0x4000b2e87d - PyRun_InteractiveOneObjectEx
                               at /var/python-src/Python/pythonrun.c:277
  45:       0x4000b2ea35 - _PyRun_InteractiveLoopObject
                               at /var/python-src/Python/pythonrun.c:148
  46:       0x4000b2e4e7 - _PyRun_AnyFileObject
                               at /var/python-src/Python/pythonrun.c:84
  47:       0x4000b2eb30 - PyRun_AnyFileExFlags
                               at /var/python-src/Python/pythonrun.c:116
  48:       0x4000b3b091 - pymain_run_stdin
                               at /var/python-src/Modules/main.c:502
  49:       0x4000b3b091 - pymain_run_python
                               at /var/python-src/Modules/main.c:590
  50:       0x4000b3b091 - Py_RunMain
                               at /var/python-src/Modules/main.c:666
  51:       0x4000ca0a49 - Py_BytesMain
                               at /var/python-src/Modules/main.c:720
  52:       0x4001bb313a - __libc_start_main
  53:           0x40064a - _start
  54:                0x0 - <unknown>

Any idea what can be wrong?

Settings

Python: 3.10.13
arrow-odbc-py: 1.3.0
unixODBC: 2.3.1

/etc/odbcinst.ini

[DB2 Driver]
Description = DB2 Driver
Driver      = /var/lang/lib/python3.10/site-packages/clidriver/lib/libdb2.so
FileUsage   = 1
DontDLClose = 1

/etc/odbc.ini

[DB2]
Driver = DB2 Driver

Problem with Varchar(max)

Got an error when trying to extract a table with varchar(max). However, the error message was not that clear:

arrow_odbc.error.Error: ODBC reported a display size of 0.

Assume it is related to this issue:
pacman82/odbc2parquet#59

I could work around it by converting the column to varchar(8000).

OS: Ubuntu 20.04
Driver: ODBC Driver 17 for SQL Server
Source: MS SQL Server 2019
Python Version: 3.9

Increase default `max_bytes_per_batch`

The current value is 2 MiB, which IMO is very low. Can we safely assume that a typical user has at least 512 MiB to spare? Then we could make the batch size 256 MiB (to accommodate double buffering). Probably 256 MiB is not Pareto optimal anymore, but how about 64 MiB? Turbodbc uses a default batch size of 20 MiB.
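For context, the limit can already be raised per call; a minimal sketch, assuming the max_bytes_per_batch keyword of read_arrow_batches_from_odbc named in the title (query and DSN are placeholders):

from arrow_odbc import read_arrow_batches_from_odbc

reader = read_arrow_batches_from_odbc(
    query="SELECT * FROM my_table",     # placeholder
    connection_string="DSN=...",        # placeholder
    batch_size=65_536,
    # Allow each transit buffer to grow up to 64 MiB instead of the 2 MiB default.
    max_bytes_per_batch=64 * 1024 * 1024,
)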

Add packet size connection attribute to read_arrow_batches_from_odbc & insert_into_table

The ODBC spec provides a connection attribute to control the size of the network packet. The attribute is called SQL_ATTR_PACKET_SIZE. Db2 doesn't support the spec version but has a custom attribute called SQL_ATTR_FET_BUF_SIZE that achieves a similar result. The attribute must be set after the handle has been allocated but before the network connection is established.

The integer values of the attributes are:

  • SQL_ATTR_PACKET_SIZE = 112
  • SQL_ATTR_FET_BUF_SIZE = 3001

I've seen a ~2.5x increase in throughput using this attribute over a slow connection (1.6 MB/s -> 4.2 MB/s).
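For comparison, pyodbc (used elsewhere in these issues) already allows setting such attributes before the connection is established via its attrs_before argument; a minimal sketch with placeholder values, which might serve as a model for the requested option:

import pyodbc

SQL_ATTR_PACKET_SIZE = 112     # ODBC spec attribute
SQL_ATTR_FET_BUF_SIZE = 3001   # Db2-specific attribute

# attrs_before applies the attributes after the handle is allocated but before
# the network connection is established, which is what these attributes require.
conn = pyodbc.connect(
    "DSN=DB2;UID=user;PWD=secret",                 # placeholder connection string
    attrs_before={SQL_ATTR_FET_BUF_SIZE: 65536},   # example value, not tuned
)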

insert_into_table fails

Hi there!

I'm facing an issue when using insert_into_table. The only error message I get is this one: Failure to execute the sql statement, sending the data to the database. I use MS Azure SQL Server as the target.

I use insert_into_table for a number of tables; it works for some, but for others it does not. Therefore I think it might be related to unsupported data types or wrong mappings.

Is there a way to get a more detailed error message?

Data Types I use that might be exotic:
timestamp[us] -> datetime2
bool -> bit
float -> double

Regards,
Adrian

Column Name UTF8 Error

Hello,

While reading data from an MS SQL database, an error is thrown that can't be caught.

arrow-odbc-4.1.1/src/schema.rs:50:14:
Column name must be representable in utf8: FromUtf8Error { bytes: [66, 117, 99, 104, 117, 110, 103, 115, 115, 99, 104, 108, 252, 115, 115, 101, 108], error: Utf8Error { valid_up_to: 12, error_len: Some(1) } }
stack backtrace:
   0:        0x11a50c138 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3c8b3da4c3ca3a14
   1:        0x11a4c95d0 - core::fmt::write::hfb70cbdb2260ac51
   2:        0x11a502454 - std::io::Write::write_fmt::hc13c5ba5d088bd95
   3:        0x11a50bedc - std::sys_common::backtrace::print::hb0798cc2b68a4b36
   4:        0x11a4fd454 - std::panicking::default_hook::{{closure}}::hb6b09a0c32b10ee5
   5:        0x11a4fe244 - std::panicking::rust_panic_with_hook::h491fddbcf07c6736
   6:        0x11a50c48c - std::panicking::begin_panic_handler::{{closure}}::ha2bc72305b00ceb6
   7:        0x11a50c3f8 - std::sys_common::backtrace::__rust_end_short_backtrace::hb9c89d964676cd3d
   8:        0x11a4fd8d4 - _rust_begin_unwind
   9:        0x11a515198 - core::panicking::panic_fmt::h5de4b603c189570c
  10:        0x11a515484 - core::result::unwrap_failed::h6c2edae44e6d47ca
  11:        0x11a4a75e8 - arrow_odbc::reader::odbc_reader::OdbcReaderBuilder::build::h36ee0dd824483534
  12:        0x11a4afa08 - _arrow_odbc_reader_make
  13:        0x19dada050 - <unknown>
  14:        0x19dae2adc - <unknown>
  15:        0x119d85850 - _cdata_call
  16:        0x103371de4 - __PyObject_MakeTpCall
  17:        0x103451704 - __PyEval_EvalFrameDefault
  18:        0x103448a8c - __PyEval_Vector
  19:        0x10337215c - __PyVectorcall_Call
  20:        0x103453240 - __PyEval_EvalFrameDefault
  21:        0x10338b1fc - _gen_send_ex2
  22:        0x103389d58 - _gen_iternext
  23:        0x10344b740 - __PyEval_EvalFrameDefault
  24:        0x10338b1fc - _gen_send_ex2
  25:        0x103389d58 - _gen_iternext
  26:        0x103450a68 - __PyEval_EvalFrameDefault
  27:        0x10338b1fc - _gen_send_ex2
  28:        0x103389d58 - _gen_iternext
  29:        0x10337f8d4 - _enum_next
  30:        0x103450a68 - __PyEval_EvalFrameDefault
  31:        0x10344895c - _PyEval_EvalCode
  32:        0x10344548c - _builtin_exec
  33:        0x1033bc66c - _cfunction_vectorcall_FASTCALL_KEYWORDS
  34:        0x103372284 - _PyObject_Vectorcall
  35:        0x103451704 - __PyEval_EvalFrameDefault
  36:        0x10344895c - _PyEval_EvalCode
  37:        0x10349bdfc - _run_mod
  38:        0x10349a2f0 - __PyRun_SimpleFileObject
  39:        0x103499d78 - __PyRun_AnyFileObject
  40:        0x1034b91c4 - _Py_RunMain
  41:        0x1034b9574 - _pymain_main
  42:        0x1034b9614 - _Py_BytesMain

It seems the column names contain German umlauts (äöü), which shouldn't be a problem as they are UTF-8 and other tools read them just fine.

Please let me know if I should open the issue in the upstream repo.

Any help would be appreciated!

Thank you!

Specify minimum pyarrow version.

I tried to install in an environment that was using pyarrow==6.0.0

cannot import name 'RecordBatchReader' from 'pyarrow'

It looks like pyarrow==8.0.0 added RecordBatchReader, but pip didn't know about that minimum version. Could you add the minimum pyarrow version to pyproject.toml?

Strange issue with strange characters on linux

with this code:

from arrow_odbc import read_arrow_batches_from_odbc

reader = read_arrow_batches_from_odbc(
    query=sql,
    connection_string=self.connection_string,
    max_binary_size=20000,
    max_text_size=20000,
)
print(sql)
print(reader.schema)

I get this output:


SELECT [User - iD] AS [User_-_iD], [FirstName] AS [FirstName], [LastName] AS [LastName], [Age] AS [Age], [companyid] AS [companyid], CAST([time stämp] AS BIGINT) AS [time_stämp], CAST(GETUTCDATE() AS datetime2(6)) AS __timestamp, CAST(0 AS BIT) AS __is_deleted, CAST(1 AS BIT) AS __is_full_load FROM [dbo].[user]

User_-_iD: int64 not null
FirstName: string
LastName: string
Age: decimal128(15, 3)
companyid: string not null
time_stäm: int64
__timestamp: timestamp[us]
__is_deleted: bool
__is_full_load: bool

Please note the name of the [time_stämp] column in the schema: it comes back truncated as time_stäm.

Length indicator must be non-negative.: TryFromIntError()

Hi, I am using arrow-odbc-py with Db2. I have a table with a column zip that is VARCHAR(16). I am using this code to run it:

    result = read_arrow_batches_from_odbc(
        query=sql,
        connection_string=conn,
        batch_size=batch_size,
        max_text_size=4000,
    )

And I am getting this error:
(error screenshot attached, showing the "Length indicator must be non-negative.: TryFromIntError()" message from the title)

It seems the problem occurs when the column contains null values.

Any idea how I can deal with it? Thanks a lot!

Allow persisting connection

I'd like to have a connect method for the DB connection and handle connections on our side. It looks straightforward to implement; I would also be happy to do a PR.

Non-string parameters cause an error.

I noticed that read_arrow_batches_from_odbc only supports string parameters. There's some overhead if the database has to cast types when binding parameters, so I think it would be best to support Python ints, floats, and dates.

I know the ODBC standard supports it:
https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/binding-parameter-markers?view=sql-server-ver16

If this is a lot of work, I think you should add a more descriptive error message stating that only string query parameters are allowed. I assumed there was a bug in how arrow-odbc was communicating with the database drivers for DB2 for IBM i, and I wasted time trying to figure that out. (Edit: until I actually looked at the source code for to_bytes_and_len.)

If query_params is a list of integers in this code:

     reader = read_arrow_batches_from_odbc(
         query=query_sql,
         connection_string=job_config.dsn,
         batch_size=job_config.query_chunksize_small,
         parameters=query_params,
     )

I get this error:

INFO icmladds.db_util: Query with arrow-odbc
Traceback (most recent call last):
  File "/opt/queue/code/icmladds/main.py", line 43, in <module>
    main()
  File "/opt/queue/code/icmladds/main.py", line 29, in main
    load_feature_data.main()
  File "/opt/queue/code/icmladds/icmladds/batch/load_feature_data.py", line 606, in main
    store_demand_totals_curr = store_demand_totals(conn, curr_yr, curr_mo)
  File "/opt/queue/code/icmladds/icmladds/batch/load_feature_data.py", line 386, in store_demand_totals
    store_dmd_totals_curr_mo = db_util.query(
  File "/opt/queue/code/icmladds/icmladds/db_util.py", line 40, in query
    df = query_arrow_odbc(query_sql, query_params)
  File "/opt/queue/code/icmladds/icmladds/db_util.py", line 18, in query_arrow_odbc
    reader = read_arrow_batches_from_odbc(
  File "/opt/queue/code/icmladds/lib/python3.9/site-packages/arrow_odbc/reader.py", line 289, in read_arrow_batches_from_odbc
    encoded_parameters = [to_bytes_and_len(p) for p in parameters]
  File "/opt/queue/code/icmladds/lib/python3.9/site-packages/arrow_odbc/reader.py", line 289, in <listcomp>
    encoded_parameters = [to_bytes_and_len(p) for p in parameters]
  File "/opt/queue/code/icmladds/lib/python3.9/site-packages/arrow_odbc/connect.py", line 15, in to_bytes_and_len
    value_bytes = value.encode("utf-8")
AttributeError: 'int' object has no attribute 'encode'
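Until non-string parameters are supported, a minimal workaround sketch is to stringify the parameters before the call and let the database cast them back (with the binding overhead noted above); the query and connection string below are placeholders:

from arrow_odbc import read_arrow_batches_from_odbc

query_sql = "SELECT * FROM sales WHERE yr = ? AND mo = ?"  # placeholder query
query_params = [2023, 7]                                   # integer parameters

reader = read_arrow_batches_from_odbc(
    query=query_sql,
    connection_string="DSN=...",   # placeholder
    batch_size=100_000,
    # to_bytes_and_len() only handles str, so convert every parameter first.
    parameters=[str(p) for p in query_params],
)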

Support reading datetime2 as timestamp[ms]

I have a column that is datetime2(prec=27, scale=7, length=8). However all values are actually dates:

-- Result: 0
SELECT COUNT(*)
FROM ...
WHERE (DATEDIFF_BIG(nanosecond, '1800-01-01', col) % 86400000000000) > 0

If you have dates outside of the range of timestamp[ns], loading data with arrow-odbc-py fails.

I wonder if and how arrow-odbc-py could automatically use a timestamp type with larger range than timestamp[ns] in this case.

For example, we could add an option to allow truncation of timestamps.
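Until then, a possible workaround sketch, assuming the values really carry no sub-second precision: cast the column to a lower-precision datetime2 in the query, so the driver reports millisecond rather than nanosecond precision and the out-of-range timestamp[ns] values never arise (see also the nanosecond-precision issue further down). Table and column names are placeholders:

from arrow_odbc import read_arrow_batches_from_odbc

# datetime2(3) should be reported with millisecond precision, which arrow-odbc
# can map without going through timestamp[ns].
query = "SELECT CAST(col AS datetime2(3)) AS col FROM my_table"

reader = read_arrow_batches_from_odbc(
    query=query,
    connection_string="DSN=...",  # placeholder
)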

A value (at least one) is too large to be written into the allocated buffer without truncation

When a query against Databricks returns a string column containing values longer than 256 characters, the following error is returned:

arrow_odbc.error.Error: External error: A value (at least one) is too large to be written into the allocated buffer without truncation.

This error occurs even when setting max_text_size to a size larger than the string data in the column.

reader = read_arrow_batches_from_odbc(
    query=sql_query,
    connection_string=self.odbc_connection_string,
    batch_size=batch_size,
    max_text_size=1024000,
)

The ODBC driver is the Simba Spark ODBC Driver.

odbc_connection_string = f"""
Driver=Simba Spark ODBC Driver;
HOST={dbx_hostname};
PORT=443;
Schema=default;
SparkServerType=3;
AuthMech=3;
UID=token;
PWD={dbx_token};
ThriftTransport=2;
SSL=1;
HTTPPath={dbx_endpoint};
"""

Timestamp with nanoprecision results in arrow error

Export of a datetime2(7) column with this value: "8900-12-31 00:00:00.0000000" results in the following error: pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: -2639760884514619392.

If the value is cast to a datetime2(3) no error and getting the data from parquet looks ok.
However if the value is cast to datetime2(4) no errors are reported but the value stored in parquet now is: "1886-05-08 05:05:15.485381".

Not sure if it's an error in arrow-odbc or an error in arrow.

I've used pyarrow version 12.0.1 and arrow-odbc: 1.2.0

Cannot install `arrow-odbc` on Apple Silicon Macs

Not sure how much effort would be required to build suitable wheels for distribution (I believe GitHub does offer the necessary runners these days?), but it would be great to be able to install on modern Macs without having to compile locally (which I tried, but temporarily gave up on as it was getting a little non-trivial).

pip will attempt to compile it on download, but fails to resolve the relevant paths for linking (assuming you have already downloaded the necessary libraries), etc...😅

turbodbc is faster at downloading

In my benchmarks, it seems like Turbodbc with use_async_io=True is 20–30% faster than arrow-odbc-py with fetch_concurrently().

I haven't done any profiling on this yet.

Support : multiple result set

We have stored procedures like this one:

create procedure simpleexample
@docno varchar(10)
as
begin
Select * from dbo.documentheader where docno = @docno
Select * from dbo.documentitems where docno = @docno
end

In fact, we have many similar stored procedures, e.g. invoice, salesorder, shiporder

Is there any sample code for returning multiple result sets from a stored procedure using arrow-odbc-py?
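I am not aware of sample code in the docs; the sketch below assumes (and this is an assumption on my part, please check your version's API) that the reader exposes a more_results() method for advancing to the next result set, as recent arrow-odbc-py versions do:

import pyarrow as pa
from arrow_odbc import read_arrow_batches_from_odbc

reader = read_arrow_batches_from_odbc(
    query="EXEC simpleexample @docno = ?",
    connection_string="DSN=...",   # placeholder
    parameters=["4711"],           # placeholder document number
    batch_size=10_000,
)

# First result set: document header.
header = pa.Table.from_batches(reader, reader.schema)

# Assumed API: advance the reader to the second result set of the procedure.
if reader.more_results():
    items = pa.Table.from_batches(reader, reader.schema)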

Trusted Connection Error

Not able to specify Trusted_Connection=Yes or Trusted_Connection=True in the connection string.
Getting the below error when a trusted connection is passed in the connection string.
(error screenshot attached, not reproduced here)

Error on arrow_odbc_reader_next

I get an error when reading with arrow_odbc. Here's the Rust backtrace (for what it's worth, the NulError byte payload decodes to the same "number of diagnostic records returned by ODBC seems to exceed `32,767`" message reported in another issue above, with embedded NUL bytes that break the conversion of the error text):

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: NulError(533, [69, 120, 116, 101, 114, 110, 97, 108, 32, 101, 114, 114, 111, 114, 58, 32, 84, 104, 101, 32, 110, 117, 109, 98, 101, 114, 32, 111, 102, 32, 100, 105, 97, 103, 110, 111, 115, 116, 105, 99, 32, 114, 101, 99, 111, 114, 100, 115, 32, 114, 101, 116, 117, 114, 110, 101, 100, 32, 98, 121, 32, 79, 68, 66, 67, 32, 115, 101, 101, 109, 115, 32, 116, 111, 32, 101, 120, 99, 101, 101, 100, 32, 96, 51, 50, 44, 55, 54, 55, 96, 32, 84, 104, 105, 115, 32, 109, 101, 97, 110, 115, 32, 110, 111, 116, 10, 32, 32, 32, 32, 32, 32, 32, 32, 97, 108, 108, 32, 40, 100, 105, 97, 103, 110, 111, 115, 116, 105, 99, 41, 32, 114, 101, 99, 111, 114, 100, 115, 32, 99, 111, 117, 108, 100, 32, 98, 101, 32, 105, 110, 115, 112, 101, 99, 116, 101, 100, 46, 32, 83, 97, 100, 108, 121, 32, 116, 104, 105, 115, 32, 112, 114, 101, 118, 101, 110, 116, 115, 32, 99, 104, 101, 99, 107, 105, 110, 103, 32, 102, 111, 114, 32, 116, 114, 117, 110, 99, 97, 116, 101, 100, 10, 32, 32, 32, 32, 32, 32, 32, 32, 118, 97, 108, 117, 101, 115, 32, 97, 110, 100, 32, 112, 117, 116, 115, 32, 121, 111, 117, 32, 97, 116, 32, 114, 105, 115, 107, 32, 111, 102, 32, 115, 105, 108, 101, 110, 116, 108, 121, 32, 108, 111, 111, 115, 105, 110, 103, 32, 100, 97, 116, 97, 46, 32, 85, 115, 117, 97, 108, 108, 121, 32, 116, 104, 105, 115, 32, 109, 97, 110, 121, 32, 119, 97, 114, 110, 105, 110, 103, 115, 32, 97, 114, 101, 32, 111, 110, 108, 121, 10, 32, 32, 32, 32, 32, 32, 32, 32, 103, 101, 110, 101, 114, 97, 116, 101, 100, 44, 32, 105, 102, 32, 111, 110, 101, 32, 111, 114, 32, 109, 111, 114, 101, 32, 119, 97, 114, 110, 105, 110, 103, 115, 32, 112, 101, 114, 32, 114, 111, 119, 32, 105, 115, 32, 103, 101, 110, 101, 114, 97, 116, 101, 100, 44, 32, 97, 110, 100, 32,
116, 104, 101, 110, 32, 116, 104, 101, 114, 101, 32, 97, 114, 101, 32, 109, 97, 110, 121, 32, 114, 111, 119, 115, 46, 32, 77, 97, 121, 98, 101, 10, 32, 32, 32, 32, 32, 32, 32, 32, 116, 114, 121, 32, 102, 101, 119, 101, 114, 32, 114, 111, 119, 115, 44, 32, 111, 114, 32, 102, 105, 120, 32, 116, 104, 101, 32, 99, 97, 117, 115, 101, 32, 111, 102, 32, 115, 111, 109, 101, 32, 111, 102, 32, 116, 104, 101, 115, 101, 32, 119, 97, 114, 110, 105, 110, 103,
115, 47, 101, 114, 114, 111, 114, 115, 63, 32, 79, 110, 101, 32, 111, 102, 32, 116, 104, 101, 115, 101, 32, 100, 105, 97, 103, 110, 111, 115, 116, 105, 99, 10, 32, 32, 32, 32, 32, 32, 32, 32, 114, 101, 99, 111, 114, 100, 115, 32, 99, 111, 110, 116, 97, 105, 110, 115, 58, 10, 83, 116, 97, 116, 101, 58, 32, 0, 0, 0, 0, 0, 44, 32, 78, 97, 116, 105, 118, 101, 32, 101, 114, 114, 111, 114, 58, 32, 48, 44, 32, 77, 101, 115, 115, 97, 103, 101, 58, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46])', src\error.rs:12:43
stack backtrace:
   0:     0x7ff84e89bd8b - arrow_odbc_writer_flush
   1:     0x7ff84e872a8b - arrow_odbc_writer_flush
   2:     0x7ff84e8954e1 - arrow_odbc_writer_flush
   3:     0x7ff84e89db45 - arrow_odbc_writer_flush
   4:     0x7ff84e89d7eb - arrow_odbc_writer_flush
   5:     0x7ff84e89e0e3 - arrow_odbc_writer_flush
   6:     0x7ff84e89dcf5 - arrow_odbc_writer_flush
   7:     0x7ff84e89dc3f - arrow_odbc_writer_flush
   8:     0x7ff84e89dc14 - arrow_odbc_writer_flush
   9:     0x7ff84e8a0d25 - arrow_odbc_writer_flush
  10:     0x7ff84e8a1003 - arrow_odbc_writer_flush
  11:     0x7ff84e854b26 - arrow_odbc_connect_with_connection_string
  12:     0x7ff84e8593da - arrow_odbc_reader_next
  13:     0x7ff8515210f3 - <unknown>
  14:     0x7ff85153ac70 - PyInit__cffi_backend
  15:     0x7ff85152748f - <unknown>
  16:     0x7ff8594f866c - PyObject_MakeTpCall
  17:     0x7ff85960db2f - PyObject_CallMethod_SizeT
  18:     0x7ff859711a2f - PyEval_ThreadsInitialized
  19:     0x7ff859712972 - Py_FatalError_TstateNULL
  20:     0x7ff85967f54f - Py_gitversion
  21:     0x7ff85954f624 - PyFunction_Vectorcall
  22:     0x7ff85953bb27 - PyLong_Copy
  23:     0x7ff859539f10 - PyNumber_AsSsize_t
  24:     0x7ff859510586 - PyUnicode_IsPrintable
  25:     0x7ff8595511b9 - PyEval_EvalFrameDefault
  26:     0x7ff85954dd13 - PyObject_GC_Del
  27:     0x7ff85954f707 - PyFunction_Vectorcall
  28:     0x7ff85960daf4 - PyObject_CallMethod_SizeT
  29:     0x7ff859711a2f - PyEval_ThreadsInitialized
  30:     0x7ff859712972 - Py_FatalError_TstateNULL
  31:     0x7ff85968005e - Py_gitversion
  32:     0x7ff85954dd13 - PyObject_GC_Del
  33:     0x7ff85954f707 - PyFunction_Vectorcall
  34:     0x7ff85960daf4 - PyObject_CallMethod_SizeT
  35:     0x7ff859711a2f - PyEval_ThreadsInitialized
  36:     0x7ff859712972 - Py_FatalError_TstateNULL
  37:     0x7ff85967f9dc - Py_gitversion
  38:     0x7ff85954dd13 - PyObject_GC_Del
  39:     0x7ff85954f707 - PyFunction_Vectorcall
  40:     0x7ff85960daf4 - PyObject_CallMethod_SizeT
  41:     0x7ff859711a2f - PyEval_ThreadsInitialized
  42:     0x7ff859712972 - Py_FatalError_TstateNULL
  43:     0x7ff859680711 - Py_gitversion
  44:     0x7ff85954f624 - PyFunction_Vectorcall
  45:     0x7ff85960daf4 - PyObject_CallMethod_SizeT
  46:     0x7ff859711a2f - PyEval_ThreadsInitialized
  47:     0x7ff859712972 - Py_FatalError_TstateNULL
  48:     0x7ff85968005e - Py_gitversion
  49:     0x7ff85954dd13 - PyObject_GC_Del
  50:     0x7ff8594fafe5 - PyEval_EvalCodeWithName
  51:     0x7ff85952815b - PyEval_EvalCodeEx
  52:     0x7ff8595280b9 - PyEval_EvalCode
  53:     0x7ff859527f8a - PyFuture_FromASTObject
  54:     0x7ff859527e93 - PyFuture_FromASTObject
  55:     0x7ff85957317b - PyObject_GetBuffer
  56:     0x7ff85960daf4 - PyObject_CallMethod_SizeT
  57:     0x7ff859711a2f - PyEval_ThreadsInitialized
  58:     0x7ff859712a2a - Py_FatalError_TstateNULL
  59:     0x7ff85968005e - Py_gitversion
  60:     0x7ff85954dd13 - PyObject_GC_Del
  61:     0x7ff85954f707 - PyFunction_Vectorcall
  62:     0x7ff85960daf4 - PyObject_CallMethod_SizeT
  63:     0x7ff859711a2f - PyEval_ThreadsInitialized
  64:     0x7ff859712972 - Py_FatalError_TstateNULL
  65:     0x7ff85968005e - Py_gitversion
  66:     0x7ff85954dd13 - PyObject_GC_Del
  67:     0x7ff85954f707 - PyFunction_Vectorcall
  68:     0x7ff85960daf4 - PyObject_CallMethod_SizeT
  69:     0x7ff859711a2f - PyEval_ThreadsInitialized
  70:     0x7ff859712972 - Py_FatalError_TstateNULL
  71:     0x7ff859680711 - Py_gitversion
  72:     0x7ff85954dd13 - PyObject_GC_Del
  73:     0x7ff85954f707 - PyFunction_Vectorcall
  74:     0x7ff85960daf4 - PyObject_CallMethod_SizeT
  75:     0x7ff859711a2f - PyEval_ThreadsInitialized
  76:     0x7ff859712972 - Py_FatalError_TstateNULL
  77:     0x7ff859680711 - Py_gitversion
  78:     0x7ff859552e31 - PyEval_EvalFrameDefault
  79:     0x7ff85954f624 - PyFunction_Vectorcall
  80:     0x7ff859552660 - PyEval_EvalFrameDefault
  81:     0x7ff85954dd13 - PyObject_GC_Del
  82:     0x7ff8594fafe5 - PyEval_EvalCodeWithName
  83:     0x7ff85952815b - PyEval_EvalCodeEx
  84:     0x7ff8595280b9 - PyEval_EvalCode
  85:     0x7ff859527f8a - PyFuture_FromASTObject
  86:     0x7ff859527e93 - PyFuture_FromASTObject
  87:     0x7ff859551bb0 - PyEval_EvalFrameDefault
  88:     0x7ff85954dd13 - PyObject_GC_Del
  89:     0x7ff85955455e - PyEval_EvalFrameDefault
  90:     0x7ff85954dd13 - PyObject_GC_Del
  91:     0x7ff85954f707 - PyFunction_Vectorcall
  92:     0x7ff85953aa14 - PyVectorcall_Call
  93:     0x7ff85953a80f - PyObject_Call
  94:     0x7ff8595e7ff6 - PyPickleBuffer_GetBuffer
  95:     0x7ff8595a3fff - Py_RunMain
  96:     0x7ff8595a3ed1 - Py_RunMain
  97:     0x7ff859636419 - Py_Main
  98:     0x7ff71f461254 - <unknown>
  99:     0x7ff8a8ef84d4 - BaseThreadInitThunk
 100:     0x7ff8ab051791 - RtlUserThreadStart
