
odbc2parquet's Introduction

ODBC to Parquet


A command line tool to query an ODBC data source and write the result into a parquet file.

  • Small memory footprint. Only holds one batch at a time in memory.
  • Fast. Makes efficient use of ODBC bulk reads to lower IO overhead.
  • Flexible. Query any ODBC data source you have a driver for. MySQL, MS SQL, Excel, ...

Mapping of types in queries

The tool queries the ODBC data source for type information and maps it to a parquet type as follows:

ODBC SQL Type              Parquet Type
Decimal(p < 39, s)         Decimal(p, s)
Numeric(p < 39, s)         Decimal(p, s)
Bit                        Boolean
Double                     Double
Real                       Float
Float(p: 0..24)            Float
Float(p >= 25)             Double
Tiny Integer Signed        Int8 Signed
Tiny Integer Unsigned      Int8 Unsigned
Small Integer              Int16
Integer                    Int32
Big Int                    Int64
Date                       Date
Time(p: 0..3) *            Time Milliseconds
Time(p: 4..6) *            Time Microseconds
Time(p: 7..9) *            Time Nanoseconds
Timestamp(p: 0..3)         Timestamp Milliseconds
Timestamp(p: 4..6)         Timestamp Microseconds
Timestamp(p >= 7)          Timestamp Nanoseconds
Datetimeoffset(p: 0..3)    Timestamp Milliseconds (UTC)
Datetimeoffset(p: 4..6)    Timestamp Microseconds (UTC)
Datetimeoffset(p >= 7)     Timestamp Nanoseconds (UTC)
Varbinary                  Byte Array
Long Varbinary             Byte Array
Binary                     Fixed Length Byte Array
All others                 Utf8 Byte Array

p is short for precision. s is short for scale. Intervals are inclusive.

  * Time is only supported for Microsoft SQL Server.
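For example, assuming a hypothetical Birthdays table with columns name VARCHAR(40), weight DECIMAL(10, 2) and born DATETIME2(3), the query below would yield a parquet file with Utf8 Byte Array, Decimal(10, 2) and Timestamp Milliseconds columns, per the table above:

odbc2parquet query \
--connection-string "<connection string>" \
out.par \
"SELECT name, weight, born FROM Birthdays"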

Installation

Prerequisites

To work with this tool you need an ODBC driver manager and an ODBC driver for the data source you want to access.

Windows

An ODBC driver manager is already preinstalled on Windows. So are the ODBC Data Sources (64-bit) and ODBC Data Sources (32-bit) apps, which you can use to discover which drivers are already available on your system.

Linux

This tool links against libodbc.so both at runtime and during build. To get it you should install unixODBC, which you can do using your system's package manager. On Ubuntu, run:

sudo apt install unixodbc-dev

OS-X

This tool links against libodbc.so both at runtime and during build. To get it you should install unixODBC. I recommend the Homebrew package manager, which allows you to install it using:

brew install unixodbc

Via scoop package manager

If you have scoop package manager installed (Windows only), you can install this with:

scoop install odbc2parquet

Download binary from GitHub

https://github.com/pacman82/odbc2parquet/releases/latest

Note: Download the 32-bit version if you want to connect to data sources using 32-bit drivers, and the 64-bit version if you want to connect via 64-bit drivers. It won't work vice versa.

Via Cargo

If you have a Rust toolchain installed, you can install this tool via cargo.

cargo install odbc2parquet

You can install cargo via https://rustup.rs/.

Usage

Use odbc2parquet --help to see all commands.

Query

Use odbc2parquet help query to see all options related to fetching data.

Query using connection string

odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
out.par  \
"SELECT * FROM Birthdays"

Query using data source name

odbc2parquet query \
--dsn my_db \
--password "<YourStrong@Passw0rd>" \
--user "SA" \
out.par \
"SELECT * FROM Birthdays"

Use parameters in query

odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
out.par  \
"SELECT * FROM Birthdays WHERE year > ? and year < ?" \
1990 2010

List available ODBC drivers

odbc2parquet list-drivers

List available ODBC data sources

odbc2parquet list-data-sources

Inserting data into a database

odbc2parquet insert \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
input.par \
MyTable

Use odbc2parquet help insert to see all options related to inserting data.

Links

Thanks to @samaguire there is a script for PowerShell users which helps you download a bunch of tables to a folder: https://github.com/samaguire/odbc2parquet-PSscripts

odbc2parquet's People

Contributors

alippai, dependabot-preview[bot], dependabot[bot], dtolnay, holgerpeters, igitur, pacman82

odbc2parquet's Issues

Example PS Script

I recently wrote a PowerShell script to extract some tables. Thought I would share a 'generic' version that extracts the tables from a localhost copy of the ContosoRetailDW db after pulling the latest version of the tool from GitHub. It may be handy for others and I thought you could include it in your ReadMe/Wiki.

ContosoRetailDW_CreateParquetFiles.ps1.txt

Grt Prj BTW!

Support MSSQL data type TIME

Currently the data type TIME from MSSQL is exported as BYTE_ARRAY, UTF8, String:

Column description from parquet-tools:

############ Column(OrderTime) ############
name: OrderTime
path: OrderTime
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: ZSTD (space_saved: 62%)

I would have expected a parquet TIME type in logical_type/converted_type.

Support UTF-16LE encoding

Hey @pacman82,

I really like your tool, it's easy to use and fast for exporting data out of MSSQL. For me there is one issue though: I am using MSSQL Server 2016 which uses UTF-16LE encoding by default. This results in special characters being replaced by the Unicode Replacement Character in the target parquet file, which makes recovery impossible. Could you maybe add an option for supporting UTF-16LE encoding? I think also just a flag to write strings directly as varbinary in the parquet file could do the trick for me.

Thank you, cheers,
Clemens

Apache Arrow support, and other questions

Hi Markus, just discovered this through looking at public crates that depend on Parquet.

I'm one of the Arrow committers, working on the Rust implementation of Arrow and Parquet.
We recently received a PR that does a lot of refactoring on the Parquet crate, and will very likely come as a breaking change in the next 2-4 months (release 3.0).
If you have some time, could you please look at building your crate with the changes at (apache/arrow#8300 (comment)), and provide some feedback if possible.

We are busy with Parquet <> Arrow IO on https://github.com/apache/arrow/tree/rust-parquet-arrow-writer, which we're also hoping to be ready by 3.0.
Out of curiosity, how easy/hard would it be to extend odbc2parquet to odbc > arrow, with Parquet then effectively coming for 'free', similar to turbodbc?

We are currently lacking DB access in Arrow, and that potentially keeps the door closed for people who'd like to optimise Db <> File IO in Rust. I think we could benefit from a lot of allocation improvements in Arrow, so having ODBC to Arrow as an option would help with ODBC <> CSV and other future formats.

Thanks 😃

Compression SNAPPY not possible since version 0.13.2

When I create a parquet file with compression snappy, the compression is ZSTD even though I defined it with --column-compression-default snappy.

This seems to be a bug introduced in a newer version: in version 0.8.0 this worked, but with version 0.13.2 it does not work anymore.
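For reference, an invocation of the shape below exhibits the problem; the connection string, output name, and query are placeholders:

odbc2parquet query \
--connection-string "<connection string>" \
--column-compression-default snappy \
out.par \
"SELECT * FROM MyTable"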

Memory allocation error due to Longtext column

Hey, not entirely sure this issue will make sense, but I'm an absolute parquet noob and found your tool as a way of dumping stuff from a MariaDB to Parquet to provide this to other folks.

I'm encountering a memory allocation error

# odbc2parquet -vvv query --connection-string "Driver={MySQL ODBC 8.0 Unicode Driver};Server=${MARIADB_HOST};Database=${DATABASE};Uid=${MARIADB_USER};Pwd=${MARIADB_PASS};" --batch-size 100000 --batches-per-file 100 "/tmp/${TABLE}.par" "SELECT * FROM ${TABLE}"
2021-03-19T12:45:21+00:00 - DEBUG - ODBC Environment created.
2021-03-19T12:45:21+00:00 - INFO - Batch size set to 100000
2021-03-19T12:45:21+00:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [114, 101, 115, 101, 108, 108, 101, 114, 95, 105, 100], data_type: Integer, nullability: Nullable }
2021-03-19T12:45:21+00:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: I32 }
[...]
2021-03-19T12:45:21+00:00 - DEBUG - ODBC column description for column 182: ColumnDescription { name: [101, 120, 116, 114, 97, 95, 97, 116, 116, 114, 105, 98, 117, 116, 101, 115], data_type: Other { data_type: SqlDataType(-10), column_size: 65535, decimal_digits: 0 }, nullability: Nullable }
2021-03-19T12:45:21+00:00 - DEBUG - ODBC buffer description for column 182: BufferDescription { nullable: true, kind: Text { max_str_len: 21845 } }
memory allocation of 143165576600000 bytes failed
Aborted (core dumped)

which I'm fairly certain should be connected to the following

2021-03-19T12:45:21+00:00 - DEBUG - ODBC column description for column 166: ColumnDescription { name: [112, 114, 105, 118, 97, 99, 121, 95, 112, 111, 108, 105, 99, 121], data_type: Other { data_type: SqlDataType(-10), column_size: 4294967295, decimal_digits: 0 }, nullability: Nullable }
2021-03-19T12:45:21+00:00 - DEBUG - ODBC buffer description for column 166: BufferDescription { nullable: true, kind: Text { max_str_len: 1431655765 } }

which in MySQL is this

| privacy_policy                        | longtext                                                           | YES  |     | NULL                |                |

The factor between the column's max_str_len and the memory allocation is a bit more than 100000, so this appears too connected to be random to me.
I have no influence over the source data, so I will not be able to convince anyone to change the type of this field from LONGTEXT to something more reasonable. The largest entry in this column is 366211 characters, so there's definitely no data in there that would require a memory allocation of 143TB.

I'm not entirely sure why this happens though, hence this issue.
The maximum length for an entry in a LONGTEXT column is 4.3GB, which again, none of the entries are even close to reaching, but no one will be touching this. But how could this lead to a memory allocation of a bit more than 100000 times that?
I'm guessing the allocation happens somewhere around https://github.com/pacman82/odbc2parquet/blob/master/src/query.rs#L417-L428 given that this is a field of type other? The entire loop runs through though as you can see above. The table in question has 182 columns and we see column/buffer descriptions for every column. The memory allocation error happens after that.

Do you have any ideas of what could be done about this? Would be really nice to dump this data into Parquet but with it randomly crashing right now I'm entirely at a loss :)
I'm running odbc2parquet 0.5.9 installed via cargo on debian buster.
If I can provide any more data that could help here I'm completely up for that!
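One possible user-side workaround, sketched under the assumption that the driver honours the narrower type, is to bound the column length in the query itself so the per-row buffer stays small; 400000 covers the largest entry mentioned above, and the table name remains the placeholder from the original command:

odbc2parquet query \
--connection-string "Driver={MySQL ODBC 8.0 Unicode Driver};Server=${MARIADB_HOST};Database=${DATABASE};Uid=${MARIADB_USER};Pwd=${MARIADB_PASS};" \
"/tmp/${TABLE}.par" \
"SELECT reseller_id, CAST(privacy_policy AS CHAR(400000)) AS privacy_policy FROM ${TABLE}"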

Error reading Bit / Boolean columns

Hi,

I am testing this tool and it has been great for most of our table extractions.
However, I am encountering errors while trying to read columns with Bit data type.

odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};Server=<...>;database=<...>;UID=<...>;PWD=<...>;" Pessoas.parquet "SELECT HabilitadoRiscoSacado_pes FROM Pessoas"
2021-07-30T18:03:35-03:00 - DEBUG - ODBC Environment created.
2021-07-30T18:03:36-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '<...>'.
2021-07-30T18:03:36-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-30T18:03:36-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-30T18:03:36-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-30T18:03:36-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-30T18:03:36-03:00 - INFO - Batch size set to 31580641
2021-07-30T18:03:37-03:00 - INFO - Fetched batch 1 with 83817 rows.
2021-07-30T18:03:37-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Error: EOF: unable to put boolean value

If I cast it as a Tinyint it works. Do you have any idea of why this is happening?

Thanks,
René
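For reference, the cast workaround mentioned in the report would look roughly like this (a sketch reusing the query from above):

odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=<...>;database=<...>;UID=<...>;PWD=<...>;" \
Pessoas.parquet \
"SELECT CAST(HabilitadoRiscoSacado_pes AS TINYINT) AS HabilitadoRiscoSacado_pes FROM Pessoas"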

Quick benchmark

With default options I made the following measurements using a 10m row table with few int32, int64, double, date, timestamp columns (so no string or other exotic type).

tool           filetype    output size   execution
MySQL          txt         1.5 GB        47 seconds
odbc2parquet   parquet     344 MB file   54 seconds
turbodbc       pandas DF   (in-memory)   54 seconds

IDK whether this is expected, but thought you might find it useful.

ODBC driver indicated must indicate string length correctly.: FromBytesWithNulError { kind: InteriorNul(38) }

Hi,

First of all, thank you for such a nice tool!
I'm trying to use it to dump the whole database from the MS SQL 15 Server, but unfortunately I'm getting some error on the biggest table there. I tried to split it into smaller chunks and use different drivers, but nothing actually worked.

thread 'main' panicked at 'ODBC driver indicated must indicate string length correctly.: FromBytesWithNulError { kind: InteriorNul(38) }', C:\Users\runneradmin\.cargo\registry\src\github.com-1ecc6299db9ec823\odbc-api-0.14.0\src\buffers\text_column.rs:56:22

I tested it on a couple of other tables and everything worked. Do you have any idea what could cause that?

Getting the SQL statement from stdin

I would like to have an option to pass the SQL statement to the odbc2parquet query command via stdin, either via an extra option (e.g. --query-from-stdin) or by passing - as the SQL query statement.

Proposed shell parameter design

Option 1 by passing -

cat my_sql_query.sql \
  | odbc2parquet query -c "<connection_string>" my-output.parquet -

Option 2 by using a new parameter --query-from-stdin

cat my_sql_query.sql \
  | odbc2parquet query -c "<connection_string>" --query-from-stdin my-output.parquet
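Until either option exists, a plain shell workaround is command substitution, since the query is just a positional argument (a sketch):

odbc2parquet query -c "<connection_string>" my-output.parquet "$(cat my_sql_query.sql)"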

Prompt user for password

Prompt for the password if the user does not provide one when connecting via DSN on the command line. Consider using https://crates.io/crates/indicatif to handle the user dialog.

How do we want to treat passwords in connection strings? Or rather connection strings without one.

String data, right truncation

2020-12-03T13:05:09+00:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Changed database context to 'master'.
2020-12-03T13:05:09+00:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Changed language setting to us_english.
2020-12-03T13:05:09+00:00 - INFO - Batch size set to 100000
2020-12-03T13:05:09+00:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [73, 100], data_type: Numeric { precision: 15, scale: 1 }, nullable: Nullable }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Text { max_str_len: 17 } }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC column description for column 2: ColumnDescription { name: [70, 105, 114, 115, 116, 78, 97, 109, 101], data_type: Other { data_type: SqlDataType(-9), column_size: 40, decimal_digits: 0 }, 
nullable: NoNulls }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC buffer description for column 2: BufferDescription { nullable: true, kind: Text { max_str_len: 40 } }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC column description for column 3: ColumnDescription { name: [76, 97, 115, 116, 78, 97, 109, 101], data_type: Other { data_type: SqlDataType(-9), column_size: 40, decimal_digits: 0 }, 
nullable: NoNulls }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC buffer description for column 3: BufferDescription { nullable: true, kind: Text { max_str_len: 40 } }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC column description for column 4: ColumnDescription { name: [67, 105, 116, 121], data_type: Other { data_type: SqlDataType(-9), column_size: 40, decimal_digits: 0 }, nullable: Nullable }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC buffer description for column 4: BufferDescription { nullable: true, kind: Text { max_str_len: 40 } }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC column description for column 5: ColumnDescription { name: [67, 111, 117, 110, 116, 114, 121], data_type: Other { data_type: SqlDataType(-9), column_size: 40, decimal_digits: 0 }, 
nullable: Nullable }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC buffer description for column 5: BufferDescription { nullable: true, kind: Text { max_str_len: 40 } }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC column description for column 6: ColumnDescription { name: [80, 104, 111, 110, 101], data_type: Other { data_type: SqlDataType(-9), column_size: 20, decimal_digits: 0 }, nullable: 
Nullable }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC buffer description for column 6: BufferDescription { nullable: true, kind: Text { max_str_len: 20 } }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC column description for column 7: ColumnDescription { name: [71, 101, 111, 109, 101, 116, 114, 121], data_type: Other { data_type: SqlDataType(-151), column_size: 0, decimal_digits: 0 }, 
nullable: Nullable }
2020-12-03T13:05:09+00:00 - DEBUG - ODBC buffer description for column 7: BufferDescription { nullable: true, kind: Text { max_str_len: 0 } }
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation

Then I get around 300,000 errors like this:

2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation
2020-12-03T13:05:45+00:00 - WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation

My guess is that it has to do with conversion of the MSSQL data type Geography to Utf8 Byte Array?

Originally posted by @Htorne in #16 (comment)
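If the Geography column is indeed the culprit, one hedged workaround is to convert it to well-known text on the server side, so the driver can report a real string length; the table name below is a placeholder:

odbc2parquet query \
--connection-string "<connection string>" \
out.par \
"SELECT Id, FirstName, LastName, City, Country, Phone, Geometry.STAsText() AS Geometry FROM MyTable"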

Option to not generate file if row count is 0

Hi Markus,

Is there a way to prevent the output file from being generated if the returned number of rows from a query is 0?

I saw in the code that the file is not generated if the query doesn't return any result at all. However, as soon as a schema is returned, a file is generated - even if no rows are returned.

Unfortunately I don't have permission to modify the views / stored procedures to do this kind of check at the data source level.

Best regards,
Sašo

Use zero-prefixed digits in multi-file output to be sort-by-filename friendly

Currently, --batch-size-row produces file output like foo_1.par foo_2.par foo_10.par foo_100.par.
However, this is not sort-by-filename friendly.

Because in most string sorts, foo_9.par is larger than foo_10.par, since they compare char by char.

So if someone wants to load the parquet files back in order, they have to sort by time, or rename them somehow.

A general solution is to prefix the number with zeros, to a fixed width, for example:
foo_00009.par foo_00090.par
so they sort correctly as strings. Maybe a width of 7 is big enough?

The split command on Linux uses a suffix like foo.aaaaa foo.aaaab: https://man7.org/linux/man-pages/man1/split.1.html
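Until the tool pads the numbers itself, a small shell sketch can rename existing output; it assumes bash, the foo_<n>.par pattern from above, and that the files are not yet zero-padded:

for f in foo_*.par; do
    n="${f#foo_}"          # strip the prefix ...
    n="${n%.par}"          # ... and the extension, leaving the number
    mv "$f" "$(printf 'foo_%07d.par' "$n")"   # re-emit with width-7 zero padding
done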

Setup nightly toolchain in dev container

Right now the dev container comes out of the box with a stable toolchain, so contributors must manually install nightly. Would be nice if we could do this during Docker setup.

Converted type not written to output file for timestamps without timezone

When I connect to a SQL Server database and query data with data type datetime or datetime2, it gets converted to parquet data type BIGINT. But I would expect data type TIMESTAMP (as shown here).

It looks like a custom implementation needs to be done for these two types as well. According to Data Type Support for ODBC Date and Time Improvements, the SQL_TYPE_TIMESTAMP/SQL_TIMESTAMP struct types should be available to read the datetime information.

This issue is similar to #249

Allow for splitting files into several smaller ones (for downloading larger files)

Users should be able to pass an option with --query which allows them to control an upper limit of row groups (or rows?) for the output file. If the limit is reached, a new file with almost the same name, but with the suffix _n, is created, n being a number which is incremented with each new file created. This allows for opting into creating reasonably large files instead of very large files.

Enable parquet compresssion options

It appears the Rust parquet library supports compression, but it isn't used during query export. It would be useful to be able to specify the parquet compression type / options via the CLI. I was experimenting with using this to export a large amount of data and the parquet files were many times larger than doing the same "manually" (e.g. Python) with compression.

Spark import error - not supporting FIXED_LEN_BYTE_ARRAY

Great tool btw. Exactly what I was after.

Though I am struggling to use the results in Spark.

Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: FIXED_LEN_BYTE_ARRAY

Is there a way to adjust the type mapping in any way? Or anything you can think of to help out here?
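Per the mapping table above, FIXED_LEN_BYTE_ARRAY comes from Binary columns. One workaround to try, assuming SQL Server and placeholder column/table names, is to cast such columns to Varbinary so they map to a plain Byte Array instead:

odbc2parquet query \
--connection-string "<connection string>" \
out.par \
"SELECT CAST(my_binary_col AS VARBINARY(MAX)) AS my_binary_col FROM MyTable"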

Invalid SQL Data Type Reported on SQLFetch (Oracle 19c on Windows 10 64Bit)

Hi,

I am attempting to use odbc2parquet to export a large Oracle 19c history table (millions of records) with the definition below, and odbc2parquet (version 0.6.21) reports the following error:

2022-02-01T16:37:55-05:00 - WARN - State: HY004, Native error: 0, Message: [Oracle][ODBC]Invalid SQL data type <-25>.
Error: ODBC emitted an error calling 'SQLFetch':
State: HY004, Native error: 0, Message: [Oracle][ODBC]Invalid SQL data type <-25>.                          

None of the columns appear to have unsupported datatypes given that they are just numbers and varchars. Any assistance you can provide with this issue would be greatly appreciated. Thanks!

FCIHISTV
-----------
ACC_RN                            NUMBER(10)   
CIHIS_COMP                        NUMBER(2)    
CIHIS_APPLICATION                 NUMBER(2)    
CIHIS_CUST                        NUMBER(12)   
CIHIS_DATE                        NUMBER(8)    
CIHIS_SEQ                         NUMBER(7)    
CIHIS_ELEMENT                     NUMBER(4)    
CIHIS_TYPE_CODE                   NUMBER(4)    
CIHIS_READ_SEQ                    NUMBER(4)    
CIHIS_TRANS_TYPE                  NUMBER(2)    
CIHIS_BILL_DATE                   NUMBER(9)    
CIHIS_AMT                         NUMBER(11,2) 
CIHIS_FACTOR                      NUMBER(9,6)  
CIHIS_MANUAL                      NUMBER(2)    
CIHIS_ADJDT                       NUMBER(9)    
CIHIS_TYPE                        NUMBER(2)    
CIHIS_COUNT                       NUMBER(7)    
CIHIS_UT_TAX1_AMT                 NUMBER(11,2) 
CIHIS_UT_TAX2_AMT                 NUMBER(11,2) 
CIHIS_BANK                        NUMBER(4)    
CIHIS_RATE                        NUMBER(4)    
CIHIS_TAX_CD                      NUMBER(2)    
CIHIS_TAX_TYPE                    NUMBER(2)    
CIHIS_MTAX_KIND                   NUMBER(2)    
CIHIS_BDOWN                       NUMBER(2)    
CIHIS_BUDGET_CD                   NUMBER(2)    
CIHIS_FROM_DATE                   NUMBER(9)    
CIHIS_WO_NUM                      NUMBER(10)   
CIHIS_CLASS_TYPE                  NUMBER(4)    
CIHIS_CLASS_NUM                   NUMBER(4)    
CIHIS_PRO                         NUMBER(2)    
CIHIS_POSTMARK_DT                 NUMBER(9)    
CIHIS_TX_CODE                     NUMBER(2)    
CIHIS_DELINQ_FLAG                 VARCHAR2(1)  
CIHIS_DISPUTE_FLAG                VARCHAR2(1)  
CIHIS_ACTION                      NUMBER(2)    
CIHIS_DESC                        VARCHAR2(20) 
CIHIS_USE                         NUMBER(9)    
CIHIS_ACCT_TP                     VARCHAR2(1)  
CIHIS_ALT_CUST                    NUMBER(12)   
CIHIS_TAXABLE_SALES               NUMBER(11,2) 
CIHIS_METERED_ITEM_RATE           NUMBER(4)    
CIHIS_NONTAXABLE_SALES            NUMBER(11,2) 
CIHIS_TAX_INSTALLMENT             NUMBER(2)    
CIHIS_TSS_BOD                     NUMBER(11,2) 
CIHIS_MIN_ADD                     NUMBER(11,2) 
FILLER_CD1                        NUMBER(2)    
FILLER_CD2                        NUMBER(2)    
CIHIS_OYR                         NUMBER(4)    
CIHIS_TO_DATE                     NUMBER(9)    
CIHIS_ITEM_SEQ                    NUMBER(7)    
CIHIS_PPAC_TYPE                   NUMBER(4)    
CIHIS_DEFER_DUE_DT                NUMBER(9)    
CIHIS_SERV_VOLT                   NUMBER(9)    
CIHIS_SERVICE                     NUMBER(4)    
CIHIS_BUDPAY                      NUMBER(11,2) 
CIHIS_BILL_CONTRACT               VARCHAR2(10) 
CIHIS_BLOCK_AMT1                  NUMBER(11,2) 
CIHIS_BLOCK_USE1                  NUMBER(11)   
CIHIS_BLOCK_AMT2                  NUMBER(11,2) 
CIHIS_BLOCK_USE2                  NUMBER(11)   
CIHIS_CRB_PCT                     NUMBER(9,6)  
CIHIS_CRB_NUM                     NUMBER(2)    
CIHIS_BALANCE_AMOUNT              NUMBER(11,2) 
CIHIS_WORKORDER_APPLICATION       NUMBER(2)    
CIHIS_FIXEDID                     NUMBER(10)  

odbc2parquet command and output

E:\history>odbc2parquet -vvv query  --column-compression-default gzip -c "Driver={Oracle in instantclient_19_13};Dbq=WMPTST;Uid=redacted;Pwd=redacted;" test.parquet  "select * from cis.fcihistv"
2022-02-01T16:37:53-05:00 - DEBUG - ODBC Environment created.
2022-02-01T16:37:53-05:00 - WARN - State: 01000, Native error: 0, Message: [Microsoft][ODBC Driver Manager] The driver doesn't support the version of ODBC behavior that the application requested (see SQLSetEnvAttr).
2022-02-01T16:37:53-05:00 - WARN - State: 01004, Native error: 0, Message: [Oracle][ODBC]String data, right truncated.
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [65, 67, 67, 95, 82, 78], data_type: Decimal { precision: 10, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: I64 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 2: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 67, 79, 77, 80], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 2: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 3: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 65, 80, 80, 76, 73, 67, 65, 84, 73, 79, 78], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 3: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 4: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 67, 85, 83, 84], data_type: Decimal { precision: 12, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 4: BufferDescription { nullable: true, kind: I64 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 5: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 68, 65, 84, 69], data_type: Decimal { precision: 8, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 5: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 6: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 83, 69, 81], data_type: Decimal { precision: 7, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 6: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 7: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 69, 76, 69, 77, 69, 78, 84], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 7: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 8: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 89, 80, 69, 95, 67, 79, 68, 69], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 8: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 9: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 82, 69, 65, 68, 95, 83, 69, 81], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 9: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 10: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 82, 65, 78, 83, 95, 84, 89, 80, 69], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 10: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 11: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 73, 76, 76, 95, 68, 65, 84, 69], data_type: Decimal { precision: 9, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 11: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 12: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 65, 77, 84], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 12: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 13: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 70, 65, 67, 84, 79, 82], data_type: Decimal { precision: 9, scale: 6 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 13: BufferDescription { nullable: true, kind: Text { max_str_len: 11 } }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 14: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 77, 65, 78, 85, 65, 76], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 14: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 15: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 65, 68, 74, 68, 84], data_type: Decimal { precision: 9, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 15: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 16: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 89, 80, 69], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 16: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 17: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 67, 79, 85, 78, 84], data_type: Decimal { precision: 7, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC buffer description for column 17: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:53-05:00 - DEBUG - ODBC column description for column 18: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 85, 84, 95, 84, 65, 88, 49, 95, 65, 77, 84], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 18: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 19: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 85, 84, 95, 84, 65, 88, 50, 95, 65, 77, 84], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 19: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 20: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 65, 78, 75], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 20: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 21: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 82, 65, 84, 69], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 21: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 22: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 65, 88, 95, 67, 68], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 22: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 23: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 65, 88, 95, 84, 89, 80, 69], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 23: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 24: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 77, 84, 65, 88, 95, 75, 73, 78, 68], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 24: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 25: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 68, 79, 87, 78], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 25: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 26: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 85, 68, 71, 69, 84, 95, 67, 68], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 26: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 27: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 70, 82, 79, 77, 95, 68, 65, 84, 69], data_type: Decimal { precision: 9, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 27: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 28: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 87, 79, 95, 78, 85, 77], data_type: Decimal { precision: 10, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 28: BufferDescription { nullable: true, kind: I64 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 29: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 67, 76, 65, 83, 83, 95, 84, 89, 80, 69], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 29: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 30: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 67, 76, 65, 83, 83, 95, 78, 85, 77], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 30: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 31: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 80, 82, 79], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 31: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 32: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 80, 79, 83, 84, 77, 65, 82, 75, 95, 68, 84], data_type: Decimal { precision: 9, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 32: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 33: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 88, 95, 67, 79, 68, 69], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 33: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 34: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 68, 69, 76, 73, 78, 81, 95, 70, 76, 65, 71], data_type: Varchar { length: 1 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 34: BufferDescription { nullable: true, kind: WText { max_str_len: 2 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 35: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 68, 73, 83, 80, 85, 84, 69, 95, 70, 76, 65, 71], data_type: Varchar { length: 1 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 35: BufferDescription { nullable: true, kind: WText { max_str_len: 2 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 36: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 65, 67, 84, 73, 79, 78], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 36: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 37: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 68, 69, 83, 67], data_type: Varchar { length: 20 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 37: BufferDescription { nullable: true, kind: WText { max_str_len: 40 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 38: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 85, 83, 69], data_type: Decimal { precision: 9, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 38: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 39: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 65, 67, 67, 84, 95, 84, 80], data_type: Varchar { length: 1 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 39: BufferDescription { nullable: true, kind: WText { max_str_len: 2 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 40: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 65, 76, 84, 95, 67, 85, 83, 84], data_type: Decimal { precision: 12, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 40: BufferDescription { nullable: true, kind: I64 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 41: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 65, 88, 65, 66, 76, 69, 95, 83, 65, 76, 69, 83], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 41: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 42: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 77, 69, 84, 69, 82, 69, 68, 95, 73, 84, 69, 77, 95, 82, 65, 84, 69], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 42: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 43: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 78, 79, 78, 84, 65, 88, 65, 66, 76, 69, 95, 83, 65, 76, 69, 83], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 43: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 44: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 65, 88, 95, 73, 78, 83, 84, 65, 76, 76, 77, 69, 78, 84], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 44: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 45: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 83, 83, 95, 66, 79, 68], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 45: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 46: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 77, 73, 78, 95, 65, 68, 68], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 46: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 47: ColumnDescription { name: [70, 73, 76, 76, 69, 82, 95, 67, 68, 49], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 47: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 48: ColumnDescription { name: [70, 73, 76, 76, 69, 82, 95, 67, 68, 50], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 48: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 49: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 79, 89, 82], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 49: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 50: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 84, 79, 95, 68, 65, 84, 69], data_type: Decimal { precision: 9, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 50: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 51: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 73, 84, 69, 77, 95, 83, 69, 81], data_type: Decimal { precision: 7, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 51: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 52: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 80, 80, 65, 67, 95, 84, 89, 80, 69], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 52: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 53: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 68, 69, 70, 69, 82, 95, 68, 85, 69, 95, 68, 84], data_type: Decimal { precision: 9, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 53: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 54: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 83, 69, 82, 86, 95, 86, 79, 76, 84], data_type: Decimal { precision: 9, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 54: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 55: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 83, 69, 82, 86, 73, 67, 69], data_type: Decimal { precision: 4, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 55: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 56: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 85, 68, 80, 65, 89], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 56: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 57: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 73, 76, 76, 95, 67, 79, 78, 84, 82, 65, 67, 84], data_type: Varchar { length: 10 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 57: BufferDescription { nullable: true, kind: WText { max_str_len: 20 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 58: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 76, 79, 67, 75, 95, 65, 77, 84, 49], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 58: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 59: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 76, 79, 67, 75, 95, 85, 83, 69, 49], data_type: Decimal { precision: 11, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 59: BufferDescription { nullable: true, kind: I64 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 60: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 76, 79, 67, 75, 95, 65, 77, 84, 50], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 60: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 61: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 76, 79, 67, 75, 95, 85, 83, 69, 50], data_type: Decimal { precision: 11, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 61: BufferDescription { nullable: true, kind: I64 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 62: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 67, 82, 66, 95, 80, 67, 84], data_type: Decimal { precision: 9, scale: 6 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 62: BufferDescription { nullable: true, kind: Text { max_str_len: 11 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 63: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 67, 82, 66, 95, 78, 85, 77], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 63: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 64: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 66, 65, 76, 65, 78, 67, 69, 95, 65, 77, 79, 85, 78, 84], data_type: Decimal { precision: 11, scale: 2 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 64: BufferDescription { nullable: true, kind: Text { max_str_len: 13 } }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 65: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 87, 79, 82, 75, 79, 82, 68, 69, 82, 95, 65, 80, 80, 76, 73, 67, 65, 84, 73, 79, 78], data_type: Decimal { precision: 2, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 65: BufferDescription { nullable: true, kind: I32 }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC column description for column 66: ColumnDescription { name: [67, 73, 72, 73, 83, 95, 70, 73, 88, 69, 68, 73, 68], data_type: Decimal { precision: 10, scale: 0 }, nullability: Nullable }
2022-02-01T16:37:54-05:00 - DEBUG - ODBC buffer description for column 66: BufferDescription { nullable: true, kind: I64 }
2022-02-01T16:37:54-05:00 - INFO - Memory usage per row is 1127 bytes. This excludes memory directly allocated by the ODBC driver.
2022-02-01T16:37:54-05:00 - INFO - Batch size set to 65535 rows.
2022-02-01T16:37:55-05:00 - WARN - State: HY004, Native error: 0, Message: [Oracle][ODBC]Invalid SQL data type <-25>.
Error: ODBC emitted an error calling 'SQLFetch':
State: HY004, Native error: 0, Message: [Oracle][ODBC]Invalid SQL data type <-25>.

Warnings shown when quiet flag

I am mostly running odbc2parquet from PowerShell.
I noticed that I always get warnings when running, which PowerShell interprets as errors. To fix this I tried using the -q flag.
However, warnings are still shown.

Is it possible to introduce a flag so only errors are displayed?

Feature Request - File Size Limit for Partitioned Parquet

I've noticed that when producing partitioned files from large datasets, the output file sizes are inconsistent when specifying batch_size_mib and batches_per_file. It'd be a very helpful feature if a new parameter could be added so the file size can be capped while still allowing the batches to self-configure as they do.

I'd imagine it'd look something like file_size_limit_mib

Thanks for all that you do! Let me know your thoughts or if I missed something.

Flag to support legacy converted types

Inspired by #284 and #285.

The switch to logical types allowed odbc2parquet to select better-fitting types in many situations. However, if logical types are not supported by the next step in the data pipeline, restricting odbc2parquet to only choose schemas which can be expressed with the legacy converted types may be preferable.

Ping @leo-schick.

See also: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#deprecated-timestamp-convertedtype

Build release assets for linux as well

Release assets are only created for OS X and Windows. I would like to see Linux here as well:

Currently I install it on Linux via cargo install odbc2parquet, but that is heavyweight when installing odbc2parquet inside a Docker image.

Issue with MySQL JSON columns

I am trying to export a table which has a json column type. However, it seems like the tool is unable to handle this currently.

The debug output of the column in question is as follows:

2023-02-02T10:37:12+01:00 - DEBUG - ODBC column description for column 5: ColumnDescription { name: [<removed>], data_type: LongVarchar { length: 4294967295 }, nullability: Nullable }
2023-02-02T10:37:12+01:00 - DEBUG - ODBC buffer description for column 5: Text { max_str_len: 4294967295 }

As you can see, it assumes a very large string length for the column. When it then calculates the memory usage per row, it gets a very large value. The memory limit option does not help, as a single row is already in the gigabyte range. When using a row limit, it tries to allocate huge buffers, which also fails. For example with 1000 rows:

2023-02-02T10:37:12+01:00 - INFO - Memory usage per row is 4294970028 bytes. This excludes memory directly allocated by the ODBC driver.
2023-02-02T10:37:12+01:00 - INFO - Batch size set to 1000 rows.
memory allocation of 4294967296000 bytes failed

Is there a workaround for that, or is this simply a limitation of parquet itself?
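A possible workaround, in the same spirit as the LONGTEXT issue above, is to bound the string length in the query, assuming no entry exceeds the chosen limit; the column and table names here are placeholders:

odbc2parquet query \
--connection-string "<connection string>" \
out.par \
"SELECT CAST(my_json_col AS CHAR(65535)) AS my_json_col FROM my_table"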

Parquet Column Encoding

Is there a way to define column encoding for the parquet format? For example, DELTA_BINARY_PACKED

Return table from stored procedure not available

I have a stored procedure that returns a table; this is the error received from odbc2parquet:

Query came back empty (not even a schema has been returned). No file has been created.

Workaround: have the stored procedure write to a temp table and then use odbc2parquet to read that temp table (sketched below).
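Sketched out, that workaround amounts to two steps; the procedure and staging table names are hypothetical, and a regular staging table stands in for the temp table, since a true #temp table would not survive across the two connections:

# Step 1 (from any SQL client): EXEC dbo.MyProc, which writes its result into dbo.MyStagingTable
# Step 2: export the staging table
odbc2parquet query -c "<connection_string>" out.par "SELECT * FROM dbo.MyStagingTable"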

Thread 'main' panicked at 'assertion failed: rec_number > 0'

The program works fine, but if I put it to work on a 60+ GB data set it crashes.

Here is command:
.\odbc2parquet-win64.exe query --connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=username;PWD=secretcode;" output.parquet "SELECT [Id],[FirstName],[LastName],[City],[Country],[Phone],[Geometry] FROM database.[dbo].[Customer] where TenantId = 001 and AreaName = 'area'"


The error message refers to "C:\Users\runneradmin" which does not exist <-- is the runneradmin a placeholder?

Same command works if I address e.g. TenantId = 002 which contains a smaller data set.

thanks :)

export in chunks?

Hi!

First of all - thank you for this great tool. :)

Is there a way to make the export in chunks?

I would like to define a maximum amount of rows per file, so that the exported data is written into multiple files.
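The query subcommand already exposes options that split output across multiple files, as seen in the command quoted in the LONGTEXT issue above; a sketch follows (flag names as used there, values illustrative, so each file would hold at most 10 x 100000 rows; newer versions may name the flags differently, e.g. --batch-size-row):

odbc2parquet query \
--connection-string "<connection string>" \
--batch-size 100000 \
--batches-per-file 10 \
out.par \
"SELECT * FROM MyTable"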

Likely not writing parquet correctly

While going through the code, I noticed that we write the values and definitions to same-length buffers, as per below:

self.def_levels[row_index] = def;
self.values_i32[row_index] = value;

I think this is incorrect, as we should only write values if the definition > 0 (or 1 in our case with primitives).
This works out such that vec![None, Some(35), None] ends up with:

let defs = vec![0, 1, 0];
let vals = vec![35];

IIRC, the parquet crate will only complain if fewer values are written than there are valid definitions, so the above might not surface itself as an error.

I tried to write a fairly large table that I know to be very sparse, and I reached a panic:

thread 'main' panicked at 'assertion failed: self.data.is_some()', parquet-1.0.1\src\data_type.rs:143:9

I have a suspicion that the panic is related to the above potential issue, but I'll have a look at it over the weekend or next week.

Make it possible for odbc2parquet to stream its output to standard out.

This feature is desirable because users do not always want the output stored in their local filesystem. Streaming it to stdout would allow other tools to pick up the output and store it, e.g. in a remote object store.

Open question: How should the user indicate on the command line that the output is supposed to go to stdout? I'd rather have it explicit.
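
One conceivable convention, borrowed from other Unix tools, is to accept - as the output argument. The following is only a sketch of that proposed interface, not current behavior; the S3 bucket is a placeholder:

odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
- \
"SELECT * FROM Birthdays" \
| aws s3 cp - s3://my-bucket/birthdays.par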

Publish on homebrew

Never met a data scientist who used a MacBook without Homebrew. Maybe we should publish this tool there.

Automatic change of batch size when memory error occurs

I use odbc2parquet to export a whole database of over 100 tables at once, running up to 8 or more odbc2parquet processes in parallel. Sometimes the jobs steal memory from each other and I run into a memory error (e.g. memory allocation of 262205535 bytes failed).

I cannot predict how much memory is available when I start a job, and the tables vary both in row size and in number of rows. Using a fixed batch size (option --batch-size) is, I think, a bad solution because I have no clue about the design of the table when I run the command.

I think it would be great if odbc2parquet simply tried a lower batch size when the required memory is not available. Something like "when it fails, brute force until you get through"; a wrapper sketch follows below.

Similar issue:
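
Until such a fallback is built in, a shell wrapper can approximate the behavior by retrying with successively smaller --batch-size values. This is only a sketch; the connection string, query, and batch sizes are placeholders:

# Retry the export with a smaller batch size whenever odbc2parquet
# aborts, e.g. because a memory allocation failed.
for batch_size in 100000 10000 1000; do
    if odbc2parquet query \
        --batch-size "${batch_size}" \
        --connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
        out.par \
        "SELECT * FROM Birthdays"
    then
        break  # success, stop retrying
    fi
done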

BiDirectional

Are there any aspirations to make this tool bidirectional, so you can import from Parquet into SQL?
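
The help output quoted later in this document does list an insert subcommand ("Read the content of a parquet and insert it into a table"), which suggests the import direction exists. A hypothetical invocation, with the argument order assumed, might be:

odbc2parquet insert \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
input.par \
Birthdays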

Add shell completions

clap seems to be able to autogenerate shell completions; it would be nice if we could provide them to users.
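
A completions subcommand already appears in the help output quoted later in this document; wiring it up could look roughly like this (the shell argument and target path are assumptions):

odbc2parquet completions bash > ~/.local/share/bash-completion/completions/odbc2parquet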

More debug messages for unixODBC issues

On one of my CentOS 7 machines I have a unixODBC issue which I can't figure out how to solve.

odbcinst -j works and loads the correct ODBCSYSINI env variable. My unixODBC version is 2.3.1.
isql -k ... works and successfully connects.
odbc2parquet throws unixODBC error S1009, Invalid attribute value (native error 0).
Even odbc2parquet list-drivers throws the same error, so it's not about the DSN.

The message ODBC Environment created is also printed.

Would it be possible to add some more details to the error message? (I tried -v, -vvv, etc.)

Add flag to disable coloring

Currently the CLI output is always colored, which is nice, but it sometimes looks a bit ugly when viewed through a console which doesn't support coloring (e.g. Jenkins). I would like an option to disable console coloring, similar to terraform plan -no-color.

setup types for a particular column

How can I tell odbc2parquet, with SQL or some other flag, which particular Parquet type I want for a column?

For example, this is my query for replicating the data:

SELECT
    CAST(offer_id AS CHAR)            AS offer_id,
    CAST(clinic_id AS SIGNED)         AS clinic_id,
    CAST(operation_id AS SIGNED)      AS operation_id,
    CAST(illness_id AS SIGNED)        AS illness_id,
    CAST(diagnostic_id AS SIGNED)     AS diagnostic_id,
    CAST(score AS CHAR)               AS score,
    CAST(created_at AS DATETIME)      AS created_at,
    CAST(type AS CHAR)                AS type,
    CAST(clinic AS CHAR)              AS clinic,
    CAST(operation AS CHAR)           AS operation,
    CAST(illness AS CHAR)             AS illness,
    CAST(diagnostic AS CHAR)          AS diagnostic,
    CAST(tag_new AS SIGNED)           AS tag_new,
    CAST(tags AS CHAR)                AS tags,
    CAST(views AS SIGNED)             AS views,
    CAST(clicks AS SIGNED)            AS clicks,
    CAST(leads AS SIGNED)             AS leads,
    CAST(conversion_rate AS DECIMAL)  AS conversion_rate,
    CAST(ctr AS DECIMAL)              AS ctr,
    CAST(ctr_weight AS DECIMAL)       AS ctr_weight
FROM offers_clinics

Then in the Parquet viewer in PyCharm I see the following (using the parquet/avro viewer):

  "type" : "record",
  "name" : "schema",
  "fields" : [ {
    "name" : "offer_id",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "clinic_id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "operation_id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "illness_id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "diagnostic_id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "score",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "created_at",
    "type" : [ "null", {
      "type" : "long",
      "logicalType" : "timestamp-millis"
    } ],
    "default" : null
  }, {
    "name" : "type",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "clinic",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "operation",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "illness",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "diagnostic",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "tag_new",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "tags",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "views",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "clicks",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "leads",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "conversion_rate",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "ctr",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "ctr_weight",
    "type" : [ "null", "long" ],
    "default" : null
  } ]
}

I'm expecting the last three columns (conversion_rate, ctr, ctr_weight) to come out as DOUBLE or NUMERIC instead of INTEGER or LONG.
Maybe you can point me to the relevant place in the docs, or maybe you have some workarounds.
Thanks in advance.

Version of binary: odbc2parquet 0.9.5
Database: MySQL
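
A likely explanation, based on general MySQL semantics rather than anything confirmed for this exact setup: CAST(x AS DECIMAL) without arguments means DECIMAL(10,0), and per the type mapping above a decimal with scale 0 yields a Parquet decimal that viewers render as an integer/long. Spelling out precision and scale explicitly should give the expected numeric types; the chosen (10,4) is only an example:

odbc2parquet query \
--connection-string "Driver={MySQL ODBC 8.0 Unicode Driver};Server=localhost;Database=my_db;" \
out.par \
"SELECT CAST(conversion_rate AS DECIMAL(10,4)) AS conversion_rate, CAST(ctr AS DECIMAL(10,4)) AS ctr, CAST(ctr_weight AS DECIMAL(10,4)) AS ctr_weight FROM offers_clinics"

On MySQL 8.0.17 or later, CAST(x AS DOUBLE) is another option if a floating point type is preferred.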

Problem with varchar(MAX)

Hi Markus,

I'm still using this nice tool and so far everything has worked as expected. But recently I encountered an edge case when trying to dump a column from MS SQL that was created using varchar(MAX).

This leads to a warning (only visible when using -vvv): WARN - State: 01004, Native error: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation. The parquet file generated for this table contains an empty string instead of the proper value.
During debugging, I found that the ODBC buffer description has max_str_len set to 0, which is probably the root cause of this behavior.

2021-05-10T17:34:23+00:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [116, 111, 107, 101, 110], data_type: Varchar { length: 0 }, nullability: Nullable }
2021-05-10T17:34:23+00:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: WText { max_str_len: 0 } }

I think there are two problems here. One is that the truncation happens on varchar(MAX) at all (I don't know if there is anything you can do about it). The second is that this is a silent warning which corrupts the parquet files (shouldn't we have some flag to be stricter about dumping data 1 to 1 without any transformation?). A query-side workaround is sketched below.
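
As a stop-gap, and analogous to the JSON case earlier in this list, the column can be cast to a bounded length in the query so the driver reports a usable max_str_len. The length of 4000 is an arbitrary example and silently truncates longer values; the table name is hypothetical, while the column name token is taken from the debug output above:

odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
out.par \
"SELECT CAST(token AS VARCHAR(4000)) AS token FROM my_table"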

Cheers,
Adrian

Support for data type timezone conversion to UTC

In T-SQL/SQL Server, the datetimeoffset type, which is practically a datetime with a time zone offset, is currently converted into a parquet string. I don't like that because the stored data is quite large and sorts poorly.
Postgres has a similar data type called timestamp with time zone.

Unfortunately, Parquet does not support a logical data type which carries a time zone, but there are several considerations on how to deal with different time zones, see here: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp One suggestion is to convert everything to UTC.

I would like the option to tell odbc2parquet that all datetimeoffset/timestamp with time zone data types shall be converted into a UTC timestamp in the parquet export instead of into a parquet string. I suggest adding this as an additional parameter.

The benefit of this is that filtering or sorting on a parquet timestamp is much faster than on a parquet string.
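
As a query-side workaround in the meantime, T-SQL can shift a datetimeoffset to UTC and strip the offset before odbc2parquet sees it, so the column arrives as a plain timestamp. AT TIME ZONE 'UTC' rebases the value to a +00:00 offset, and the cast to DATETIME2(3) drops the offset; the column and table names are hypothetical:

odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
out.par \
"SELECT CAST(created_at AT TIME ZONE 'UTC' AS DATETIME2(3)) AS created_at_utc FROM my_table"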

Reserved Column Names not Supported

According to the MySQL documentation, when dealing with reserved words as column names, you can enclose the name in backticks (`) or provide fully qualified column names in the form of <table>.<column>. Would it be possible to integrate this into this tool? I could do a PR for that, but I can only test it on MySQL, and I am also not too familiar with Rust. As a test, I renamed the columns using Python from <column> to <table>.<column>, which then imported successfully. An example of the quoting is shown below.
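
For reference, this is what backtick quoting looks like when passed through odbc2parquet. Single quotes keep the backticks away from the shell, and the table and column names are hypothetical:

odbc2parquet query \
--connection-string "Driver={MySQL ODBC 8.0 Unicode Driver};Server=localhost;Database=my_db;" \
out.par \
'SELECT `order`, `group` FROM my_table'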

Enable compression options as argument

Hello,

I was excited to use your utility to export some Microsoft Access tables to parquet, but unfortunately I discovered that no compression appears to be applied, so the file size is much larger than I anticipated. I am not sure if compression is something you are considering adding, but it would be great if it could be implemented and the command line accepted parameters for compression options; a sketch follows below.

Regards,
Chris Whelan
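
To make the request concrete, a compression parameter could look something like the line below. The flag name and accepted values are hypothetical, sketched only to illustrate the proposal, and the Access database path is a placeholder:

odbc2parquet query \
--column-compression-default gzip \
--connection-string "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\data\my.accdb;" \
out.par \
"SELECT * FROM MyTable"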

Attempting to list drivers using odbc2parquet 0.6.27 on Oracle Linux 7.8, after compiling it on the same OS, fails with State: S1009, Native error: 0, Message: [unixODBC][Driver Manager]Invalid attribute value

[svrvdef@tmwenqtrn utils]$ ./odbc2parquet
odbc2parquet 0.6.27
Query an ODBC data source and store the result in a Parquet file

USAGE:
    odbc2parquet [OPTIONS] <SUBCOMMAND>

OPTIONS:
    -h, --help       Print help information
    -q, --quiet      Only print errors to standard error stream. Suppresses warnings and all other
                     log levels independent of the verbose mode
    -v, --verbose    Verbose mode (-v, -vv, -vvv, etc.)
    -V, --version    Print version information

SUBCOMMANDS:
    completions          Generate shell completions
    help                 Print this message or the help of the given subcommand(s)
    insert               Read the content of a parquet and insert it into a table
    list-data-sources    List preconfigured data sources. Useful to find data source name to
                         connect to database
    list-drivers         List available drivers and their attributes
    query                Query a data source and write the result as parquet

[svrvdef@tmwenqtrn utils]$ ./odbc2parquet list-drivers
2022-03-24T16:18:54-07:00 - WARN - State: S1009, Native error: 0, Message: [unixODBC][Driver Manager]Invalid attribute value
Error: ODBC emitted an error calling 'SQLSetEnvAttr':
State: S1009, Native error: 0, Message: [unixODBC][Driver Manager]Invalid attribute value
