Giter Site home page Giter Site logo

Comments (8)

pacman82 avatar pacman82 commented on May 30, 2024

Hello @theOvercloud ,

thanks for the excelletly written issue. To my own suprise (unlike odbcsv) odbc2parquet does not offer a flag yet to limit the size of an indivdual column. You are unlikely the first person to run into this issue, but you are kind of the first to ask for it.

In terms of workarounds. Could you try to cast the JSON column into a VARCHAR(4096) column or the like in order to limit the size? You would need to of course apply some kind of domain knowledege how big your JSONS actually get, in order to avoid truncation.

This is more a limitation of ODBC rather than parquet. ODBC requires you for bulk (i.e. fast) fetching of rows to allocate the buffer before hand. Yet at that point in time you do not know the length of the actual values (otherwise they would be already fetched) so you have to assume the largest possible element there is.

Hope this answer is as helpful to you as your issue is to me. Otherwise feel free to ask again.

Best, Markus

from odbc2parquet.

theOvercloud avatar theOvercloud commented on May 30, 2024

Hallo Markus,

thanks a lot for your fast feedback! Good idea! I will certainly try that out. Hopefully the conversion does not introduces issues when inserting the data later into an other database. I will report back:)

Cheers,
Yanick

from odbc2parquet.

theOvercloud avatar theOvercloud commented on May 30, 2024

I was able to get it running using your suggestion. Thanks!

However, i noticed that the driver will cache all the batches before passing it to Rust. You can see that when looking at the network traffic and comparing it to the log outputs of the script. So i would still run into memory issues. The fix was to pass the driver argument NO_CACHE=1; (mysql docu) to the connection string. With that, the batches were directly passed to Rust and it worked as expected. Can you see the same behavior? Would it make sense to add this to the documentation or even add this argument by default? I am using the MySQL ODBC 8.0 Unicode Driver on Ubuntu.

from odbc2parquet.

theOvercloud avatar theOvercloud commented on May 30, 2024

Short update on the JSON front: As I said above, i was able to successfully export the table using the cast( <column_name> as char(..)) as <column_name>. However, when importing i had the following error:
2023-02-08T15:57:24+01:00 - WARN - State: HY000, Native error: 3144, Message: [MySQL][ODBC 8.0(w) Driver][mysqld-8.0.31-0ubuntu0.22.04.1]Cannot create a JSON value from a string with CHARACTER SET 'binary'.
I was able to fix that by setting the charset when connecting to the database via the initstmt. My full connection string now looks like this for inserts:

connection_string="Driver={MySQL ODBC 8.0 Unicode Driver};initstmt={set charset utf8mb4;};charset=utf8mb4;Server=$db_ip;Port=$db_port;UID=$username;PWD=$password;NO_CACHE=1;FORWARD_CURSOR=1;COMPRESSED_PROTO=1;"

from odbc2parquet.

pacman82 avatar pacman82 commented on May 30, 2024

Hello Yanick,

thanks for your feedback. I am a bit underwater at the moment, otherwise I would have already implemented an option to limit the column size for querying. I'll leave this issue open at least until I can do that.

I am happy it works now for you, and there is some value in persisting that knowledge. However, a lot of your issues are specific to MySQL, I am a bit hesitant to driver specific stuff, as I want for the most part to sit firmly on top of ODBC.

How about an odbc2parquet wiki here at GitHub with driver specific issues as a first step?

Thanks again for all your work and your feedback, I want to answer in greater detail, but I am pretty spend today.

Cheers, Markus

from odbc2parquet.

pacman82 avatar pacman82 commented on May 30, 2024

FYI: A new flag column-length-limit has just been added to the new version odbc2parquet 0.15.2.

from odbc2parquet.

theOvercloud avatar theOvercloud commented on May 30, 2024

Hi Markus,

Same here;) Always too much on the plate...

FYI: A new flag column-length-limit has just been added to the new version odbc2parquet 0.15.2.

I was not expecting that feature so soon, especially as you are doing this for free! Thanks a lot for the great tool and your work!

How about an odbc2parquet wiki here at GitHub with driver specific issues as a first step?

I agree. Would be great to have a knowledge base on that.

I think you could close the issue if you want.

Cheers,
Yanick

from odbc2parquet.

pacman82 avatar pacman82 commented on May 30, 2024

Hello Yanick,

thanks for the feedback. I have created a minimal homepage for the odbc2parquet wiki. Feel free to create a page to share you knowledge about odbc2parquet and MySQL JSON there. I did not use GitHub wiki before, so tell me if I missed something in case you wanna give it a spin.

Closing this issue.

Cheers, Markus

from odbc2parquet.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.