Giter Site home page Giter Site logo

Importing the local .tsv.bz2 file via unark() to a remote AWS RDS MySQL Database is taking an unusual amount of time about arkdb HOT 9 CLOSED

GitHunter0 avatar GitHunter0 commented on July 19, 2024
Importing the local .tsv.bz2 file via unark() to a remote AWS RDS MySQL Database is taking an unusual amount of time

from arkdb.

Comments (9)

GitHunter0 avatar GitHunter0 commented on July 19, 2024 1

DBI::dbWriteTable() is now using LOAD DATA LOCAL INFILE as a backend and working perfectly for remote MySQL databases, as discussed here r-dbi/RMariaDB#265 .

And as I suspected, arkdb is now working flawlessly as well. The only requirement is to use this version r-dbi/RMariaDB#267, however they will probably soon merge to the main branch and upload to CRAN.

@cboettig , thanks again!

from arkdb.

cboettig avatar cboettig commented on July 19, 2024

I don't have an AWS RDS MySQL db handy to test unfortunately. You might simplify things further by (a) testing with an uncompressed file and (b) testing with a trivially small dataset such that you can run profvis() or equivalent profiling to see where the slowest steps are.

It looks like you alternative method may not be entirely analogous though, it seems you are just writing df[1,] there?

Also sounds like you suspect part of the issue is related specifically to RMariaDB -- is there a part that you think is specifically arkdb-related? (arkdb really doesn't do much beyond just calling dbWriteTable in chunks; and this data is small enough to write in a single chunk, so like you I suspect the performance issue must be something specific to MariaDB or the R driver thereof. Honestly I have almost never touched MySQL since discovering duckdb and arrow....).

from arkdb.

GitHunter0 avatar GitHunter0 commented on July 19, 2024

Thank you very much for your feedback, @cboettig .

Some very small datasets worked but way slower than LOAD LOCAL DATA INFILE.

I used df[1,] just to make a boilerplate for the table since LOAD LOCAL DATA INFILE requires a preexisting table.

I agree with you that the problem is probably solely related to RMariaDB , I will try to work the issue there.

from arkdb.

GitHunter0 avatar GitHunter0 commented on July 19, 2024

Hey @cboettig , r-dbi/RMariaDB#265 was solved.
So I will just leave you a suggestion if I may.
Adding an option to use LOAD LOCAL DATA INFILE method as unark() backend would solve this freezing and slowness issue when dealing with remote MySQL databases.
Thank you

from arkdb.

cboettig avatar cboettig commented on July 19, 2024

Thanks for the heads up! A PR to add this for MySQL-type remotes would be welcome.

from arkdb.

GitHunter0 avatar GitHunter0 commented on July 19, 2024

@cboettig, soon I will take a look in the inner workings of arkdb to see if I can do that, thank you!

from arkdb.

GitHunter0 avatar GitHunter0 commented on July 19, 2024

@cboettig, soon I will take a look in the inner workings of arkdb to see if I can do that, thank you!

For what've seen the rightful solution would be to add LOAD DATA LOCAL INFILE method to DBI::dbWriteTable() backend, so I will discuss this possibility with the DBI contributors.

from arkdb.

cboettig avatar cboettig commented on July 19, 2024

makes total sense, thanks!

from arkdb.

cboettig avatar cboettig commented on July 19, 2024

awesome, thanks for reporting this!

from arkdb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.