When I connect to a SQL Server database and query data with data type <code class="not

Hello <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-ur

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Converted type not written to output file for timestamps without timezone about odbc2parquet HOT 9 CLOSED

pacman82 commented on May 30, 2024

Converted type not written to output file for timestamps without timezone

from odbc2parquet.

Comments (9)

pacman82 commented on May 30, 2024

Hello @leo-schick ,

I am missing something here. Sure the physical type for timstamps is INT64, but its logical type should be TIMESTAMP. There is even a test to prove it.

See:

odbc2parquet/tests/lib.rs

Line 686 in aee2ca5

    
           parquet_schema_out(out_str).stdout(contains("OPTIONAL INT64 a (TIMESTAMP(MICROS,false));"));

The test uses datetime2 and not datetime. Did you experience the issue with datetime?

Best, Markus

from odbc2parquet.

leo-schick commented on May 30, 2024

I experienced the issue with both...

from odbc2parquet.

leo-schick commented on May 30, 2024

Used odbc2parquet version: 0.13.2

Probably the issue is not the logical type but that Azure Synapse (which I use to read the parquet files in this sample) does not support the logical type but use the converted type (legacy).

I inspected now the parquet files with parquet-tools:

MSSQL column with type `date`

############ Column(OrderDate) ############
name: OrderDate
path: OrderDate
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Date
converted_type (legacy): DATE
compression: ZSTD (space_saved: 18%)

✔️ Works fine with Azure Synapse, using data type date.

MSSQL column with type `datetime`

############ Column(CreatedDateTime) ############
name: CreatedDateTime
path: CreatedDateTime
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=false, timeUnit=milliseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): NONE
compression: ZSTD (space_saved: 30%)

🛑 Does not work with Azure Synapse. I must read column with data type bigint. datetime and datetime2 fails.

MSSQL column with type `datetime2(7)`

############ Column(CreatedDateTime) ############
name: CreatedDateTime
path: CreatedDateTime
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): NONE
compression: ZSTD (space_saved: 15%)

🛑 Does not work with Azure Synapse. I must read column with data type bigint. datetime and datetime2 fails.

MSSQL column `datetimeoffset`

############ Column(OrderTimestamp) ############
name: OrderTimestamp
path: OrderTimestamp
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=true, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): TIMESTAMP_MICROS
compression: ZSTD (space_saved: 20%)

✔️ Works fine with Azure Synapse when I use data type datetime2.

It looks to me that Azure Synapse interprets not the logical_type but the converted_type which is probably deprecated. Would it be possible to add the converted_type schema information to the datetime and datetime2 data types as well?

from odbc2parquet.

leo-schick commented on May 30, 2024

Note: I found the official documentation which as a note about the deprecated converted_type:

Despite there is no exact corresponding ConvertedType for local timestamp semantic, in order to support forward compatibility with those libraries, which annotated their local timestamps with legacy TIMESTAMP_MICROS and TIMESTAMP_MILLIS annotation, Parquet writer implementation must annotate local timestamps with legacy annotations too, as shown below.

Backward compatibility:

ConvertedType LogicalType

TIME_MILLIS TimeType (isAdjustedToUTC = true, unit = MILLIS)

TIME_MICROS TimeType (isAdjustedToUTC = true, unit = MICROS)

Forward compatibility:

LogicalType ConvertedType

TimeType isAdjustedToUTC = true unit = MILLIS TIME_MILLIS

unit = MICROS TIME_MICROS

unit = NANOS -

isAdjustedToUTC = false unit = MILLIS TIME_MILLIS

unit = MICROS TIME_MICROS

unit = NANOS -

ConvertedType	LogicalType
TIME_MILLIS	TimeType (isAdjustedToUTC = true, unit = MILLIS)
TIME_MICROS	TimeType (isAdjustedToUTC = true, unit = MICROS)

LogicalType	ConvertedType
TimeType	isAdjustedToUTC = true	unit = MILLIS	TIME_MILLIS
unit = MICROS	TIME_MICROS
unit = NANOS	-
isAdjustedToUTC = false	unit = MILLIS	TIME_MILLIS
unit = MICROS	TIME_MICROS
unit = NANOS	-

from odbc2parquet.

pacman82 commented on May 30, 2024

Yeah, we should do that. I wonder however if this is not better fixed upstream in the parquet crate. Actually I assumed it would already do that. Maybe I am mistaken and there is a good reason for it not to do it, but asking can not hurt.

from odbc2parquet.

pacman82 commented on May 30, 2024

Hi @leo-schick ,

thanks again for your thorough Bug report. My current understanding is that by setting the logical type, the converted type should be implied. So I opened an Upstream ticket. As a workaround I tried to set the converted type manually. but it seems to be ignored then I tested with parquet-tools. I currently can not think of another quick workaround, so for the moment I am waiting for feedback on the upstream ticket. Also to verify that my understanding is correct.

If the situtation does persist, I could try introducing a flag to instruct odbc2parquet to set the converted instead of the logical type. Currently my spare time is very rare, so the best course for now I'd say is to move forward with the upstream ticket.

Best Markus

from odbc2parquet.

pacman82 commented on May 30, 2024

It seems the missing converted type is due to odbc2parquet using INT64 instead of INT32 for Milliseconds precision timestamps. However switching to INT32 the code now fails if setting the logic type. So, still trying to figure this thing out, but working on it.

from odbc2parquet.

pacman82 commented on May 30, 2024

Hi @leo-schick ,

as stated in the conclusion of apache/arrow-rs#3017 the parquet crate (and therefore also odbc2parquet) do emit a converted type. The bug seems to be in the readers which are based on the C++ implemantion (like pyarrow). Please open a bug ticket there.

Cheers, Markus

from odbc2parquet.

pacman82 commented on May 30, 2024

Also closing this issue, as the converted type is written into the output by odbc2parquet. Fault seems to lie with readers. Please open a bug there, if you need these to work together with odbc2parquet.

from odbc2parquet.

Converted type not written to output file for timestamps without timezone about odbc2parquet HOT 9 CLOSED

Comments (9)

MSSQL column with type `date`

MSSQL column with type `datetime`

MSSQL column with type `datetime2(7)`

MSSQL column `datetimeoffset`

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Comments (9)

MSSQL column with type date

MSSQL column with type datetime

MSSQL column with type datetime2(7)

MSSQL column datetimeoffset

Related Issues (20)

Recommend Projects

Recommend Topics

Recommend Org

MSSQL column with type `date`

MSSQL column with type `datetime`

MSSQL column with type `datetime2(7)`

MSSQL column `datetimeoffset`