Comments (6)
Hello René,
thanks for taking the time to file a bug report. Based on the error message and the log I would say the bug happens then writing to parquet, rather than reading from the database. I've written a minimal test case, but sadly it failed to reproduce the issue.
I setup an empty table with a bit column, and inserted five values into it.
setup_empty_table(&conn, table_name, &["BIT"]).unwrap();
let insert = format!(
"INSERT INTO {}
(a)
VALUES
(0), (1), (NULL), (1), (0);",
table_name
);
Yet odbc2parquet
runs fine and produces a parquet file for which parquet read shows the following output.
let expected_values = "\
{a: false}\n\
{a: true}\n\
{a: null}\n\
{a: true}\n\
{a: false}\n\
";
I am happy to see that you use a Microsoft SQL Database. Same as I use for the test setup for this tool. Yet my minimal mock data fails to reproduce the issue. I wonder if we need all of your 83817 rows to reproduce the issue. What would happen if you limit the query to only return part of the data? (Syntax for MSSQL can be found is this stack overflow issue: https://stackoverflow.com/questions/13220743/implement-paging-skip-take-functionality-with-this-query).
Your help in figuring this out would be greatly appreciated.
Cheers, Markus
from odbc2parquet.
Hi Markus, thanks for the quick response.
It seems like it fails when it tries to fetch more than 4096 rows.
Fetching 4096 rows
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT TOP(4096) HabilitadoRiscoSacado_pes FROM Pessoas"
2021-07-31T09:10:19-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:10:20-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:10:20-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:10:20-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:10:20-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:10:20-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:10:20-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:10:20-03:00 - INFO - Fetched batch 1 with 4096 rows.
2021-07-31T09:10:20-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Fetching 4097 rows
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT TOP(4097) HabilitadoRiscoSacado_pes FROM Pessoas"
2021-07-31T09:10:14-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:10:15-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:10:15-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:10:15-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:10:15-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:10:15-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:10:15-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:10:15-03:00 - INFO - Fetched batch 1 with 4097 rows.
2021-07-31T09:10:15-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Error: EOF: unable to put boolean value
Fetching 4096 rows offsetting 4096
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};---" Pessoas.parquet "SELECT HabilitadoRiscoSacado_pes FROM Pessoas ORDER BY HabilitadoRiscoSacado_pes OFFSET 4096 ROWS FETCH NEXT 4096 ROWS ONLY"
2021-07-31T09:12:36-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:12:37-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:12:37-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:12:37-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:12:37-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:12:37-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:12:37-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:12:37-03:00 - INFO - Fetched batch 1 with 4096 rows.
2021-07-31T09:12:37-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Fetching 4097 rows offsetting 4096
odbc2parquet -vvv query --connection-string "Driver={ODBC Driver 17 for SQL Server};Server=172.20.16.4;---" Pessoas.parquet "SELECT HabilitadoRiscoSacado_pes FROM Pessoas ORDER BY HabilitadoRiscoSacado_pes OFFSET 4096 ROWS FETCH NEXT 4097 ROWS ONLY"
2021-07-31T09:12:40-03:00 - DEBUG - ODBC Environment created.
2021-07-31T09:12:41-03:00 - WARN - State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Contexto do banco de dados alterado para '---'.
2021-07-31T09:12:41-03:00 - WARN - State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Definição do idioma alterada para Português (Brasil).
2021-07-31T09:12:41-03:00 - DEBUG - ODBC column description for column 1: ColumnDescription { name: [72, 97, 98, 105, 108, 105, 116, 97, 100, 111, 82, 105, 115, 99, 111, 83, 97, 99, 97, 100, 111, 95, 112, 101, 115], data_type: Bit, nullability: NoNulls }
2021-07-31T09:12:41-03:00 - DEBUG - ODBC buffer description for column 1: BufferDescription { nullable: true, kind: Bit }
2021-07-31T09:12:41-03:00 - INFO - Memory usage per row is 68 bytes. This excludes memory directly allocated by the ODBC driver.
2021-07-31T09:12:41-03:00 - INFO - Batch size set to 31580641
2021-07-31T09:12:41-03:00 - INFO - Fetched batch 1 with 4097 rows.
2021-07-31T09:12:41-03:00 - DEBUG - Writing column with index 0 and name 'HabilitadoRiscoSacado_pes'.
Error: EOF: unable to put boolean value
from odbc2parquet.
Hello René,
now there is a smoking gun. Thanks for running the experiment. I'm currently very busy moving. I'll try to reproduce the issue during train rides. In the meantime you may try to query all the data but set the --batch-size-row
option to 4096. Could be a workaround.
Cheers, Markus
from odbc2parquet.
Have a nice reproducing test, now. Thanks, I'll keep you in the loop.
Cheers, Markus
from odbc2parquet.
Should work now in 0.6.8
. Tests pass fine. I misunderstood what the write batch size property in the parquet library meant.
from odbc2parquet.
Thanks Markus, working great now!
from odbc2parquet.
Related Issues (20)
- Option to not generate file if row count is 0 HOT 4
- setup types for particular column HOT 2
- Issue with MySQL JSON columns HOT 8
- Reserved Column Names not Supported HOT 1
- Feature Request - Support column encryption in the generated parquet file HOT 4
- JobName as .sql file in config file HOT 4
- Parquet format version support HOT 9
- Feature suggestion: connect to URL `postgresql://username:pass@host/database` HOT 1
- What permissions are needed? - State: 42501, Native error: 1, Message: ERROR: permission denied HOT 4
- StarRocks parquet file import of parquet file generated by odbc2parquet fails with encoding error HOT 11
- Memory allocation with column-length-limit HOT 11
- Build for alpine HOT 8
- file-size-threshold generates wrong size files HOT 1
- --no-empty-file option doesn't work properly when row-groups-per-file should devide result into few files HOT 6
- MSSQL nvarchar - missing column in output file HOT 2
- Feature request: Progress bar for full table copies HOT 6
- Data source must return valid UTF16 in wide character buffer: Utf16Error HOT 4
- Write statistics HOT 14
- Make zstd the default compression HOT 4
- Build release assets for Ubuntu ARM64 as well HOT 11
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from odbc2parquet.