jonkeane commented on August 22, 2024

Any opinions on whether we:

  • just mention outright that we are using a mish-mash of methods (we have 3 methods: arrow::open_dataset(), curl::multi_download() and download.file())
  • change the one optional curl::multi_download() call (the Seattle CSV file) to download.file()
  • change the various download.file() calls to curl::multi_download() (reduces to 2 methods)

IMHO the second or third, maybe leaning towards the second since it's base R and means you don't need to also ask folks to install curl (surely they will have it anyway, but it's still one less thing to depend on). But the third would reduce the number of variants in cases where they "aren't" "necessary".
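A minimal sketch of what the second option could look like, using only base R's download.file(). The helper name download_if_missing() and its skip-if-already-present behaviour are illustrative assumptions, not code from the workshop materials, and the URL and path in the usage comment are placeholders.

```r
# Hypothetical helper (not from the workshop materials): fetch each URL with
# base R's download.file(), skipping any destination file that already exists.
download_if_missing <- function(urls, destfiles) {
  stopifnot(length(urls) == length(destfiles))
  for (i in seq_along(urls)) {
    if (!file.exists(destfiles[i])) {
      # Create the target directory if it does not exist yet
      dir.create(dirname(destfiles[i]), recursive = TRUE, showWarnings = FALSE)
      # mode = "wb" keeps binary files (e.g. Parquet) intact on Windows
      download.file(urls[i], destfiles[i], mode = "wb", quiet = TRUE)
    }
  }
  # Report which destination files are now present
  invisible(file.exists(destfiles))
}

# Usage with a placeholder URL and path:
# download_if_missing("https://example.com/seattle.csv", "data/seattle.csv")
```

Looping over the files keeps the helper dependency-free, which is the main argument for option two; the trade-off is that downloads run one at a time rather than concurrently.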

from arrow.

stephhazlitt commented on August 22, 2024

Thanks @jonkeane, excellent feedback.

For open_dataset(), I don't think that is a 12.0.1 vs 13.0.0 issue, but IIRC you can get that strange error when you don't have dplyr loaded. Can you try this:

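library(arrow)
library(dplyr) # needed so that filter() below dispatches to dplyr

data_path <- here::here("data/nyc-taxi") # Or set your own preferred path

# Read the taxi data straight from S3, keep 2012-2021, and write it locally
# as a dataset partitioned by year and month (Parquet is the default format)
open_dataset("s3://voltrondata-labs-datasets/nyc-taxi") |>
    filter(year %in% 2012:2021) |>
    write_dataset(data_path, partitioning = c("year", "month"))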
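A likely (but unconfirmed in this thread) explanation for the strange error: in a session where dplyr is not attached, a bare filter() call resolves to stats::filter(), the time-series filter, rather than the dplyr verb that arrow provides methods for. The small check below is a sketch for spotting that situation; the variable name filter_source is illustrative.

```r
# Find which package the `filter` visible from the global environment comes
# from. Without library(dplyr) this is "stats" (stats::filter, a time-series
# filter), not the dplyr verb that arrow provides methods for.
filter_source <- environmentName(
  environment(get("filter", envir = globalenv(), mode = "function"))
)

if (filter_source != "dplyr") {
  message("filter() currently comes from '", filter_source,
          "'; call library(dplyr) before piping an arrow Dataset into filter().")
}
```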

jonkeane commented on August 22, 2024

For open_dataset(), I don't think that is a 12.0.1 vs 13.0.0 issue, but IIRC you can get that strange error when you don't have dplyr loaded. Can you try this: [...]

That does seem to be working now, yeah

stephhazlitt commented on August 22, 2024

Excellent point about the mish-mash of methods. Of course, the reason is either an intentional one (providing a couple of options for getting the big data, including one that uses arrow so we can talk about using arrow with cloud storage) or a legacy one (because that is the way R4DS gets the data 😆).

Any opinions on whether we:

  • just mention outright that we are using a mish-mash of methods (we have 3 methods: arrow::open_dataset(), curl::multi_download() and download.file())
  • change the one optional curl::multi_download() call (the Seattle CSV file) to download.file()
  • change the various download.file() calls to curl::multi_download() (reduces to 2 methods)

stephhazlitt commented on August 22, 2024

For open_dataset(), I don't think that is a 12.0.1 vs 13.0.0 issue, but IIRC you can get that strange error when you don't have dplyr loaded. Can you try this: [...]

That does seem to be working now, yeah

OK, so I will add library(dplyr) in then. Great catch.

stephhazlitt commented on August 22, 2024

Updating with feedback WIP: #3

thisisnic commented on August 22, 2024

Closing this now since I just merged the PR, but let me know if we've missed anything. Thanks for testing these out @jonkeane and for making the updates @stephhazlitt!
