
Comments (4)

pacman82 avatar pacman82 commented on August 29, 2024 1

Hello @ddresslerlegalplans ,

thanks for opening this issue. I updated the parameter description of batch size to be (hopefully) more helpful:

The maximum number of rows within each batch. Please note that the actual batch
size is up to the ODBC driver of your database. This parameter primarily influences the size
of the buffers the ODBC driver is supposed to fill with data, yet it is up to the driver how
many values it fills in one go. Also note that the primary use case of batching is to reduce
IO overhead, so even if you fetch millions of rows, a batch size of 100 or 1000 may be
entirely reasonable. Larger batches trade memory for speed, but with diminishing returns.
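
To make the memory-for-speed tradeoff concrete, here is a rough back-of-the-envelope sketch. The 512 bytes per row is an assumed value for illustration only; the real figure depends on your schema and the largest possible value in each column:

```python
# Rough estimate of the ODBC transfer buffer size: the buffer must hold
# batch_size rows, each sized for the *largest possible* value per column.
ASSUMED_MAX_ROW_BYTES = 512  # hypothetical row width, depends on your schema


def buffer_bytes(batch_size, max_row_bytes=ASSUMED_MAX_ROW_BYTES):
    """Approximate transfer buffer size in bytes for a given batch size."""
    return batch_size * max_row_bytes


# A batch size of 1000 needs about half a megabyte of buffer space,
# while buffering 10 million rows at once would need several gigabytes.
print(buffer_bytes(1_000))       # 512000
print(buffer_bytes(10_000_000))  # 5120000000
```

Under these assumptions, going from a batch size of 1000 to 10 million multiplies memory use by 10,000 while the per-batch IO overhead was already negligible.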

Without knowing which database and driver you use, I have a hypothesis about where the limit of 32767 comes from. 32767 is 2^15 - 1, the largest number you can represent with a signed 16-bit integer. So my guess is that your driver uses a signed 16-bit integer to represent the batch size. It actually behaves quite nicely; I have seen drivers simply overflow and wreak havoc if the batch size becomes too large. Independent of the driver implementation, as I understand the ODBC standard the driver has the final say about batch size, so you cannot rely on batches having a certain length.
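
The 16-bit hypothesis is easy to check in Python with the standard `struct` module: 32767 is the largest value that fits into a signed 16-bit integer, and packing anything bigger fails:

```python
import struct

# Largest value representable by a signed 16-bit integer: 2^15 - 1.
print(2**15 - 1)  # 32767

# Packing 32767 as a signed 16-bit integer ('h') works fine ...
struct.pack("<h", 32767)

# ... but 32768 no longer fits and raises struct.error.
try:
    struct.pack("<h", 32768)
except struct.error as exc:
    print("overflow:", exc)
```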

I'm also wondering whether 1.456GB of RAM usage will be a problem or not, as that's what I calculated if the reader was actually reading 10M records

That's the secondary use case of batching: not having to store everything at once. The primary idea is to avoid fetching each row individually, which saves network IO. I am guessing here, but it seems you want to produce one single large Arrow array holding all the data. Even so, you should fetch in batches and concatenate afterwards. String data especially is usually much smaller once it is in the Arrow array, because the buffers ODBC uses for transfer must always account for the largest possible values, not just the actual values in your database. So even in that use case, fetching with a batch size of just 100 or 1000 is more reasonable than fetching with a batch size of several million.

Cheers, Markus

from arrow-odbc-py.

pacman82 avatar pacman82 commented on August 29, 2024

I am closing this issue, as there is nothing actionable. Apart from improving the documentation, which has already happened.

Cheers, Markus


ddresslerlegalplans avatar ddresslerlegalplans commented on August 29, 2024

Thanks Markus! After changing the batch size to 32767 it appears to be running correctly. I appreciate the thorough explanation. Have a nice day.

Cheers!


pacman82 avatar pacman82 commented on August 29, 2024

Hi @ddresslerlegalplans ,

happy to hear it works for you now. This makes me think, though: maybe I should add a helper function which does the batching and concatenation into a single Arrow table itself?

Thanks for reporting back.

Cheers, Markus

