Giter Site home page Giter Site logo

Comments (6)

pacman82 avatar pacman82 commented on May 31, 2024

Hello René,

thanks for taking the time to file a bug report. Based on the error message and the log I would say the bug happens then writing to parquet, rather than reading from the database. I've written a minimal test case, but sadly it failed to reproduce the issue.

I setup an empty table with a bit column, and inserted five values into it.

setup_empty_table(&conn, table_name, &["BIT"]).unwrap();
let insert = format!(
        "INSERT INTO {}
        (a)
        VALUES
        (0), (1), (NULL), (1), (0);",
        table_name
    );

Yet odbc2parquet runs fine and produces a parquet file for which parquet read shows the following output.

let expected_values = "\
        {a: false}\n\
        {a: true}\n\
        {a: null}\n\
        {a: true}\n\
        {a: false}\n\
    ";

I am happy to see that you use a Microsoft SQL Database. Same as I use for the test setup for this tool. Yet my minimal mock data fails to reproduce the issue. I wonder if we need all of your 83817 rows to reproduce the issue. What would happen if you limit the query to only return part of the data? (Syntax for MSSQL can be found is this stack overflow issue: https://stackoverflow.com/questions/13220743/implement-paging-skip-take-functionality-with-this-query).

Your help in figuring this out would be greatly appreciated.

Cheers, Markus

from odbc2parquet.

reneoctavio avatar reneoctavio commented on May 31, 2024

Hi Markus, thanks for the quick response.
It seems like it fails when it tries to fetch more than 4096 rows.

Fetching 4096 rows

odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT TOP(4096) HabilitadoRiscoSacado_pes FROM Pessoas"
2021-07-31T09:10:19-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:10:20-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:10:20-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:10:20-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:10:20-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:10:20-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:10:20-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:10:20-03:00 - INFO - Fetched batch 1 with 4096 rows.
2021-07-31T09:10:20-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.

Fetching 4097 rows

odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT TOP(4097) HabilitadoRiscoSacado_pes FROM Pessoas"
2021-07-31T09:10:14-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:10:15-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:10:15-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:10:15-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:10:15-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:10:15-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:10:15-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:10:15-03:00 - INFO - Fetched batch 1 with 4097 rows.
2021-07-31T09:10:15-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Error: EOF: unable to put boolean value

Fetching 4096 rows offsetting 4096

odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT HabilitadoRiscoSacado_pes FROM Pessoas ORDER BY HabilitadoRiscoSacado_pes OFFSET 4096 ROWS FETCH NEXT 4096 ROWS ONLY"
2021-07-31T09:12:36-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:12:37-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:12:37-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:12:37-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:12:37-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:12:37-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:12:37-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:12:37-03:00 - INFO - Fetched batch 1 with 4096 rows.
2021-07-31T09:12:37-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.

Fetching 4097 rows offsetting 4096

odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};Server=172.20.16.4;---" Pessoas.parquet "SELECT HabilitadoRiscoSacado_pes FROM Pessoas ORDER BY HabilitadoRiscoSacado_pes OFFSET 4096 ROWS FETCH NEXT 4097 ROWS ONLY"
2021-07-31T09:12:40-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:12:41-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:12:41-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:12:41-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:12:41-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:12:41-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:12:41-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:12:41-03:00 - INFO - Fetched batch 1 with 4097 rows.
2021-07-31T09:12:41-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Error: EOF: unable to put boolean value

from odbc2parquet.

pacman82 avatar pacman82 commented on May 31, 2024

Hello René,

now there is a smoking gun. Thanks for running the experiment. I'm currently very busy moving. I'll try to reproduce the issue during train rides. In the meantime you may try to query all the data but set the --batch-size-row option to 4096. Could be a workaround.

Cheers, Markus

from odbc2parquet.

pacman82 avatar pacman82 commented on May 31, 2024

Have a nice reproducing test, now. Thanks, I'll keep you in the loop.

Cheers, Markus

from odbc2parquet.

pacman82 avatar pacman82 commented on May 31, 2024

Should work now in 0.6.8. Tests pass fine. I misunderstood what the write batch size property in the parquet library meant.

from odbc2parquet.

reneoctavio avatar reneoctavio commented on May 31, 2024

Thanks Markus, working great now!

from odbc2parquet.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.