Comments (6)
@Htorne Closing this issue. I do not know the exact type of the column which caused you the warnings, but I know I fixed something ;-)
from odbc2parquet.
@Htorne, we should find out why this is happening. For one, truncated strings could mean a loss of data. For another, transferring that many diagnostic records from server to client takes a long time. How long did it take for you to execute this query?
Since the messages generated by the driver do not contain a column number, we could find out which column it is by selecting only one column at a time.
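A minimal sketch of that approach in Python: build one query per column, then run each one separately and watch which export produces the truncation warnings. The table and column names are placeholders and are not taken from the reported schema.

```python
def single_column_queries(table: str, columns: list[str]) -> list[str]:
    """Build one SELECT statement per column so each can be exported on its own."""
    # Names are inserted as given; quote them yourself if they need quoting.
    return [f"SELECT {col} FROM {table}" for col in columns]


# Hypothetical table and column names, for illustration only.
for query in single_column_queries("my_table", ["id", "name", "shape"]):
    print(query)
```

Each printed statement can then be passed to the export tool as its own query.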
Let's see what types we have here:
- Numeric with scale 1. Wanted to map these properly to decimal anyway
- -9: These seem to be wide character columns on the database, likely UCS-2. I don't have a test for these yet. Maybe the driver reports the length of these columns in characters rather than in bytes. A UTF-8 character can take up to 4 bytes of space, so something might be going on here. I can write a test for this though.
- -151: I'd have to look this up. Could you tell me what the type of the last column is?
Column length seems to be reported in characters, rather than in bytes of the target encoding.
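To illustrate why that matters, here is a small Python sketch. It compares the character count of a string with its UTF-8 byte count and computes a conservative buffer size. The function name and the factor of 4 bytes per character are assumptions made for this example and are not taken from the odbc2parquet source.

```python
def utf8_buffer_size(reported_chars: int, max_bytes_per_char: int = 4) -> int:
    """Conservative byte size for a column whose length is reported in characters."""
    return reported_chars * max_bytes_per_char


text = "Grüße"
char_count = len(text)                  # length in characters (code points)
byte_count = len(text.encode("utf-8"))  # length in bytes once encoded as UTF-8

print(char_count, byte_count)
print(utf8_buffer_size(char_count))
```

A buffer sized by the character count alone would be too small for this string, because "ü" and "ß" each need two bytes in UTF-8.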
Your suggestion that it is the geometry column is most likely right. As is visible in the log messages you posted, the driver reported a size of 0, which is the MS SQL Server specific way of indicating that the user-defined type is unlimited in size. The previous version of odbc2parquet took the zero literally, however, and allocated a buffer which could not fit a single character per row. There is now a new version, 0.5.0, which will log only one warning about ignoring this column.
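The behaviour described above can be pictured with the following Python sketch. It is an illustration only and not the actual odbc2parquet code: the function name, the warning text, and the choice to return `None` for a skipped column are all assumptions.

```python
import logging
from typing import Optional


def buffer_length_for(column_name: str, reported_length: int) -> Optional[int]:
    """Return the buffer length to allocate, or None if the column is skipped."""
    if reported_length == 0:
        # A reported size of 0 is treated as "unbounded" rather than as "empty".
        logging.warning("Ignoring column '%s': driver reported size 0", column_name)
        return None
    return reported_length


# Hypothetical column names, for illustration only.
print(buffer_length_for("name", 50))
print(buffer_length_for("shape", 0))
```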
There are other improvements which might be interesting to you. The decimal column you have should now be properly mapped to decimal, as support has been extended beyond decimals with scale 0. There is also extra space allocated for text columns, but I doubt you've hit any issues there.
If my local test setup is any indication, the new version might run significantly faster, as the server no longer needs to generate that many diagnostic messages.
If ignoring the "strange" column is not good enough for your use case, please open a new issue stating the exact column type and use case.