
Comments (8)

piccolbo commented on August 22, 2024

If I remember correctly, I went with read.table because it is more flexible, but I
can see your point about it creating the wrong assumption. If you read
the Wikipedia entry for CSV, I think the naming is sensible, and I am not
sure I always want to follow all the quirks of some arbitrary definitions
in R. The formats read by read.table, read.csv and read.csv2 are all CSVs.
Isn't that just bad naming in the standard functions? And is this worth a
backward-incompatible change?
On Jun 4, 2013 5:46 PM, "Jamie F Olson" wrote:

Since the rmr2 format is referred to as "csv", shouldn't it actually call
read.csv so that it has the expected default parameters? Of particular
importance is comment.char = "", which I spent a surprising amount of
time debugging before I finally noticed that rmr actually calls read.table.
I think it specifies somewhere in the documentation that read.table is
being called, but at least I still found it surprising that it's not
calling read.csv.



from rmr2.
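To make the default that caused the surprise concrete, here is a small base-R sketch (it does not touch rmr2). It shows the relevant defaults of read.table, read.csv and read.csv2, then parses a made-up two-row sample whose second field contains "#"; the variable names and data are illustrative only.

```r
# Defaults of the three base R readers mentioned in this thread.
formals(read.table)$comment.char  # "#"
formals(read.csv)$comment.char    # ""
formals(read.csv2)$sep            # ";"  (read.csv2 also uses dec = ",")

# Made-up sample: the tag "C#" contains the default comment character.
txt <- "id,tag\n1,C#\n2,plain"

# read.table: "#" starts a comment, so the rest of the field is dropped.
rt <- read.table(text = txt, sep = ",", header = TRUE,
                 stringsAsFactors = FALSE)

# read.csv: comment.char = "", so "#" is kept as ordinary data.
rc <- read.csv(text = txt, stringsAsFactors = FALSE)

rt$tag  # "C"  "plain"
rc$tag  # "C#" "plain"
```

No error or warning is raised in the read.table case; the truncated value simply comes back as "C", which is why this kind of problem can take a while to track down.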

jamiefolson commented on August 22, 2024

Yeah, csv isn't really a standard and there are wide variations on how people parse "csv" files.

One option would be to simply default to comment.char = "" but that could be even more confusing since then you're not consistent with any of the read.* functions. Maybe a new input format consistent with the hive/pig defaults (e.g. sep="\001",comment.char = "",quote="")?


piccolbo commented on August 22, 2024

How is the rmr csv format not consistent with read.table? A new input
format to import from Hive/Pig sounds like a great idea, independent of the
original subject here.

(Replying to Jamie F Olson, Tue, Jun 11, 2013 at 8:14 AM.)


jamiefolson commented on August 22, 2024

I meant that you're currently completely consistent with read.table, but that the read.table default comment.char="#" leads to surprises. If you changed only that default, you would perhaps be less surprising to people wanting to parse "csv" files, but more confusing to experts, since you would no longer be consistent with read.table.

I'm currently using sep="\001",comment.char = "",colClasses="character",fill=TRUE,flush=TRUE,quote="",... for importing hive/pig data:

  make.input.format("csv","text",sep=sep,comment.char = comment.char,
                    colClasses=colClasses,
                    fill=fill,flush=flush,quote=quote,...)
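A local sketch of how those parameters behave, calling base read.table directly on made-up Ctrl-A-delimited lines instead of going through rmr2. It assumes, as noted earlier in the thread, that rmr2's "csv" format passes these arguments on to read.table; the sample data is hypothetical.

```r
# Made-up records in Hive's default text layout: fields separated by
# Ctrl-A ("\001"), one record per line. The values contain "#", an
# apostrophe and double quotes, which read.table's defaults would mangle.
lines <- c(paste("1", "it's #1", "a \"quoted\" word", sep = "\001"),
           paste("2", "plain",   "x",                 sep = "\001"))

# The parameters from the comment above, passed straight to read.table.
df <- read.table(text = lines, sep = "\001", comment.char = "",
                 colClasses = "character", fill = TRUE, flush = TRUE,
                 quote = "")

df$V2  # "it's #1" "plain"
df$V3  # "a \"quoted\" word" "x"
```

With comment.char = "" and quote = "", every character between two Ctrl-A separators is kept verbatim, and colClasses = "character" stops read.table from guessing column types.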


piccolbo commented on August 22, 2024

And what do you need to do, if anything, in Hive and Pig?

(Replying to Jamie F Olson, Mon, Jun 17, 2013 at 7:25 AM.)


jamiefolson commented on August 22, 2024

Those parameters should be consistent with the default format for
both Hive and Pig (ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\n').

Jamie Olson

(Replying to Antonio Piccolboni, Mon, Jun 17, 2013 at 11:34 AM.)


piccolbo commented on August 22, 2024

I am implementing this for 2.3.0, and I was wondering why you added the ... to the make.input.format call. Of course, that's not correct R, but I was wondering if you meant that I should accept additional arguments. More generally, should I make the Pig/Hive format fixed, or are some variations useful?


jamiefolson commented on August 22, 2024

I just accepted additional arguments, assuming that I'd find more things I'd want to configure. I think a couple of options that might depend on circumstances are stringsAsFactors and strip.white.
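One way to picture that design, sketched with a hypothetical function name (read_hive_text is not part of rmr2 or base R) and base read.table standing in for rmr2's reader: the Hive/Pig defaults become named arguments, and ... forwards anything else, such as strip.white, unchanged.

```r
# Hypothetical wrapper (not an rmr2 API): Hive/Pig defaults as named
# arguments, with ... forwarded to read.table so callers can add
# options like strip.white without the wrapper listing them.
read_hive_text <- function(text, sep = "\001", comment.char = "",
                           colClasses = "character", fill = TRUE,
                           flush = TRUE, quote = "", ...) {
  read.table(text = text, sep = sep, comment.char = comment.char,
             colClasses = colClasses, fill = fill, flush = flush,
             quote = quote, ...)
}

line <- paste(" padded ", "x", sep = "\001")
read_hive_text(line)$V1                      # " padded "
read_hive_text(line, strip.white = TRUE)$V1  # "padded"
```

In this sketch colClasses defaults to "character", and read.table lets colClasses override stringsAsFactors, so a caller who wants factors would also have to pass a different colClasses.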
