Comments (7)

simonpcouch commented on August 28, 2024

Thanks for bringing this up! It may be a good idea to import tidyselect and use proper data-masking here.

from infer.

sda030 commented on August 28, 2024

Actually, you are already importing rlang (with data masking) and tidyselect through dplyr, so there are no new dependencies really. And yes, data masking (.data[["age"]]) is perhaps more meaningful, as you only allow single arguments.

simonpcouch commented on August 28, 2024

Yeah, not worried about dependency heaviness for this one!

EDIT: Ha, you're ahead of me on tidyselection lifecycle! Didn't realize .data had been deprecated in tidyselection.
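
For readers following along, here is a minimal sketch of the distinction being discussed. It assumes dplyr (1.0 or later) is installed. The toy data frame `df` and the variable `col` are made up for illustration and are not from infer. The `.data[[...]]` pronoun belongs to data-masking verbs, while `all_of()` is the tidyselect way to select columns by a character vector.

```r
library(dplyr)

# Hypothetical toy data, not from infer
df <- data.frame(age = c(21, 35, 48), year = c(1994, 1998, 2014))
col <- "age"

# Data masking: refer to a column by string through the .data pronoun
masked <- summarise(df, mean_age = mean(.data[[col]]))

# Tidy selection: select by string with all_of() rather than .data
selected <- select(df, all_of(col))

print(masked)
print(selected)
```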

simonpcouch commented on August 28, 2024

As I work through this, noting-to-self a few oddities of the current all.vars() solution to column selection via formula. Note that:

library(rlang)

f_rhs(college ~ "age")
#> [1] "age"
all.vars(f_rhs(college ~ "age"))
#> character(0)

As a result:

library(infer)

# "age" is not symbolic
specify(gss, college ~ "age")
#> Error in `specify()`:
#> ! The explanatory should be a bare variable name (not a string in quotation marks).
#> Backtrace:
#>     ▆
#>  1. └─infer::specify(gss, college ~ "age")
#>  2.   └─infer:::parse_variables(x, formula, response, explanatory)
#>  3.     └─infer:::check_var_correct(x, "explanatory", call = call)
#>  4.       └─rlang::abort(...)

# `"age" + nonexistent_column` is symbolic, but the helper
# can't handle multiple explanatory variables
specify(gss, college ~ "age" + nonexistent_column)
#> Error in `specify()`:
#> ! The explanatory variable `+` cannot be found in this dataframe.
#> • The explanatory variable `age` cannot be found in this dataframe.
#> • The explanatory variable `nonexistent_column` cannot be found in this dataframe.
#> Backtrace:
#>     ▆
#>  1. └─infer::specify(gss, college ~ "age" + nonexistent_column)
#>  2.   └─infer:::parse_variables(x, formula, response, explanatory)
#>  3.     └─infer:::check_var_correct(x, "explanatory", call = call)
#>  4.       └─rlang::abort(...)

# doesn't trigger that error, though, with at least one
# valid column name and no invalid symbolics, since
# all.vars(RHS) == "year"
specify(gss, college ~ "age" + year)
#> Response: college (factor)
#> Explanatory: year (numeric)
#> # A tibble: 500 × 2
#>    college    year
#>    <fct>     <dbl>
#>  1 degree     2014
#>  2 no degree  1994
#>  3 degree     1998
#>  4 no degree  1996
#>  5 degree     1994
#>  6 no degree  1996
#>  7 no degree  1990
#>  8 degree     2016
#>  9 degree     2000
#> 10 no degree  1998
#> # ℹ 490 more rows

specify(gss, college ~ "age" + year + nonexistent_column)
#> Error in `specify()`:
#> ! The explanatory variable `+` cannot be found in this dataframe.
#> • The explanatory variable `"age" + year` cannot be found in this dataframe.
#> • The explanatory variable `nonexistent_column` cannot be found in this dataframe.
#> Backtrace:
#>     ▆
#>  1. └─infer::specify(gss, college ~ "age" + year + nonexistent_column)
#>  2.   └─infer:::parse_variables(x, formula, response, explanatory)
#>  3.     └─infer:::check_var_correct(x, "explanatory", call = call)
#>  4.       └─rlang::abort(...)

Created on 2023-05-24 with reprex v2.0.2
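
The reason `all.vars()` drops `"age"` can be seen with base R alone. This is a small sketch, not infer's internal code: a quoted column name parses as a character constant rather than a symbol, and `all.vars()` collects symbols only.

```r
# Same shape as the right-hand side of college ~ "age" + year
rhs <- quote("age" + year)

# A quoted column name is a character constant, not a symbol
is.symbol(quote(age))     # TRUE
is.symbol("age")          # FALSE
is.character(rhs[[2]])    # TRUE: the first operand of `+` is the string "age"

# all.vars() collects symbols only, so the string is silently dropped
vars <- all.vars(rhs)     # "year"

# all.names() additionally reports the function `+`
nms <- all.names(rhs)     # "+" "year"
```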

simonpcouch commented on August 28, 2024

The only other established functionality for column selection via formula in tidymodels that I'm aware of is in recipes. It errors (via base R terms()) in all of the above cases:

recipes:::get_rhs_vars(college ~ "age", infer::gss)
#> Error in terms.formula(formula, data = data): invalid model formula in ExtractVars
recipes:::get_rhs_vars(college ~ "age" + nonexistent_column, infer::gss)
#> Error in terms.formula(formula, data = data): invalid model formula in ExtractVars
recipes:::get_rhs_vars(college ~ "age" + year, infer::gss)
#> Error in terms.formula(formula, data = data): invalid model formula in ExtractVars
recipes:::get_rhs_vars(college ~ "age" + year + nonexistent_column, infer::gss)
#> Error in terms.formula(formula, data = data): invalid model formula in ExtractVars

Created on 2023-05-24 with reprex v2.0.2

This feels to me like a possible argument for not transitioning to tidyselect under the hood in infer, as there's not a well-defined tidyselect procedure for formulas, and trying to write one would either require inconsistency with tidyselect (not allowing strings) or with base R (and thus tidymodels).
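
To make the "not allowing strings" option concrete, here is a rough sketch in base R of what such a check could look like. Both `find_strings()` and `check_rhs_symbolic()` are hypothetical helpers written for illustration. They are not part of infer, recipes, or tidyselect, and this is only one possible approach.

```r
# Hypothetical helper (not part of infer, recipes, or tidyselect):
# recursively collect string constants found in an expression.
find_strings <- function(expr) {
  if (is.character(expr)) {
    return(expr)
  }
  if (is.call(expr)) {
    # Skip the function itself (element 1) and inspect the arguments
    found <- lapply(as.list(expr)[-1], find_strings)
    return(as.character(unlist(found, use.names = FALSE)))
  }
  character(0)
}

# Hypothetical checker: error if the right-hand side contains strings,
# otherwise return the symbolic variable names
check_rhs_symbolic <- function(formula) {
  rhs <- formula[[length(formula)]]
  strings <- find_strings(rhs)
  if (length(strings) > 0) {
    stop(
      "Formula terms must be bare variable names, not strings: ",
      paste0('"', strings, '"', collapse = ", "),
      call. = FALSE
    )
  }
  invisible(all.vars(rhs))
}

ok <- check_rhs_symbolic(college ~ age + year)
bad <- tryCatch(
  check_rhs_symbolic(college ~ "age" + year),
  error = conditionMessage
)
```

A check like this rejects every formula that contains a string, which lines up with the base R `terms()` behavior shown above but departs from tidyselect, where strings are accepted.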

simonpcouch commented on August 28, 2024

After stewing with this for a while longer, I think the possibility of specify()ing via formula indeed means consistent tidyselection with infer is not well-defined. I appreciate you raising this issue, and believe we ought to revisit if at some point there is a proper spec for the interaction between tidyselect and formulae.

github-actions commented on August 28, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
