
Comments (5)

doobwa commented on August 20, 2024

I'm curious about how to go about this. In the following it seems that + precedes : in the order of operations for Expr objects (which of course is incorrect for the model notation).

julia> f = Formula(:(y ~ x1 + x1:x2))
Formula([y],[:(+(x1,x1),x2)])

julia> f.rhs[1].args
2-element Any Array:
 +(x1,x1)
 x2      

Doesn't this make it harder to use the : notation without changing Expr objects?
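The grouping can be checked directly on a quoted expression, without going through Formula. This is a minimal sketch, assuming a recent Julia in which binary operators parse to :call expressions (the printed form in the session above comes from an older version and will look different):

```julia
# Quote only the right-hand side; no DataFrames types are needed.
rhs = :(x1 + x1:x2)

# The outermost call is the range operator `:`, so `+` was grouped first.
@assert rhs.head == :call
@assert rhs.args[1] == :(:)
@assert rhs.args[2] == :(x1 + x1)   # left operand of `:`
@assert rhs.args[3] == :x2          # right operand of `:`
```

So the parser reads the expression as (x1 + x1):x2, not as x1 + (x1:x2), which is what the model notation would need.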

from dataframes.jl.

tshort commented on August 20, 2024

Maybe we'll have to change operators. :: looks like it might work. So would & and %. Here's a list of operators ordered by precedence (lowest first) from julia-parser.scm:

(define ops-by-prec
  '#((= := += -= *= /= //= .//= .*= ./= |\\=| |.\\=| ^= .^= %= |\|=| &= $= => <<= >>= >>>= ~ |.+=| |.-=|)
     (?)
     (|\|\||)
     (&&)
     ; note: there are some strange-looking things in here because
     ; the way the lexer works, every prefix of an operator must also
     ; be an operator.
     (<- -- -->)
     (> < >= <= == === != |.>| |.<| |.>=| |.<=| |.==| |.!=| |.=| |.!| |<:| |>:|)
     (: |..|)
     (+ - |.+| |.-| |\|| $)
     (<< >> >>>)
     (* / |./| % & |.*| |\\| |.\\|)
     (// .//)
     (^ |.^|)
     (|::|)
     (|.|)))
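Going by the table, & and % sit in the same group as *, and :: sits higher still, so all three should bind tighter than +. A quick check of that reading, assuming a recent Julia parser (where, as far as I know, these operators are still grouped the same way):

```julia
# `&` groups before `+`, so the interaction stays a single term.
amp = :(x1 + x1 & x2)
@assert amp.args[1] == :+
@assert amp.args[3] == :(x1 & x2)

# `%` behaves the same way.
pct = :(x1 + x1 % x2)
@assert pct.args[1] == :+
@assert pct.args[3] == :(x1 % x2)

# `::` also binds tighter than `+`, but it parses to its own
# expression head (a type assertion) rather than to a call.
dcolon = :(x1 + x1::x2)
@assert dcolon.args[1] == :+
@assert dcolon.args[3].head == :(::)
```

With any of these, each top-level argument of + is a whole term, so a formula parser could walk the arguments without regrouping anything.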


HarlanH commented on August 20, 2024

(Tom, think you hit the close button by mistake! A bit of a GitHub UI quirk...)

I concur. I think we should go with & instead of :, as in y ~ 1 + x + x&y. There are also those redundant formula features I never use, like subtracting a predictor (y ~ 1 + x * y - y) and whatnot. I don't really care if we support those or not. I'd also prefer we stick with 0 + to remove the intercept term, and not support - 1, which I find harder to read.


doobwa commented on August 20, 2024

There is something to be said for supporting R's syntax: it's been around long enough for people to be familiar with it, and the Python people are starting to use it as well. Would this be possible if we instead parsed strings? Now that I've said that, though, it doesn't seem worth it.

On the other hand, the number of operations we're talking about is pretty minimal, so people will just need to look up Julia's way of doing it. One direction I think would be cool: extend this notation to also include namespaces of features a la Vowpal Wabbit's sparse format. For example, if you have a sparse, bag-of-words representation for a text document, all of these features could be under the words namespace. If you also have a categorical variable for day of week, then y ~ words * day would create interaction terms between all the word features and the day feature.
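To make the namespace idea concrete, here is a rough sketch of how a namespace pair might expand into interaction terms. Everything in it is hypothetical: the namespaces dictionary, the interactions helper, and the feature names are made up for illustration, and it uses the & operator proposed earlier in the thread. It only builds the pairwise interactions, not any main effects.

```julia
# Hypothetical namespaces: a name standing for a group of feature columns.
namespaces = Dict(
    :words => [:cat, :dog],
    :day   => [:day],
)

# Build every pairwise interaction term between two namespaces
# as a quoted `a & b` expression.
function interactions(a::Symbol, b::Symbol, ns::Dict{Symbol,Vector{Symbol}})
    return [Expr(:call, :&, x, y) for x in ns[a] for y in ns[b]]
end

terms = interactions(:words, :day, namespaces)
@assert terms == [:(cat & day), :(dog & day)]
```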


HarlanH commented on August 20, 2024

Yeah, I don't think a single-character change is a big deal here, and using Julia's parser seems a big enough win that I think we should stick with it.

As for namespaces (cool -- I need to actually try VW out sometime!), we'd need a way to define them separate from the formula. Would we want to include something like "colname groups" in the DataFrame? So, you'd somehow define "dims" to be a colname group for "height", "width", and "depth", then you could use "dims" instead of a list of those three column names? That could be useful for other things too. df["dims"] becomes a shorthand for df[["height", "width", "depth"]], and df["predictors"] and df["response"] seem natural things to define, too. So you could then call lm(:(response ~ predictors + covariants), df) or something. That's fairly awesome. I'm going to spin off an issue!
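A rough sketch of what the lookup behind "colname groups" might look like, independent of any DataFrame type. The groups dictionary and the resolve helper are hypothetical names invented for this illustration, not part of an existing API:

```julia
# Hypothetical colname groups, kept separate from the data itself.
groups = Dict("dims" => ["height", "width", "depth"])

# Resolve a name to a list of columns: a group name expands to its
# members, and anything else is treated as a plain column name.
resolve(name::String, groups::Dict{String,Vector{String}}) =
    get(groups, name, [name])

@assert resolve("dims", groups) == ["height", "width", "depth"]
@assert resolve("height", groups) == ["height"]
```

Indexing with df["dims"] could then presumably call something like resolve first and fall through to the ordinary multi-column lookup.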

