
agua's People

Contributors

gvelasq, hfrick, qiushiyan, simonpcouch, topepo

agua's Issues

Release agua 0.1.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Check if any deprecation processes should be advanced, as described in Gradual deprecation
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::cloud_check()
  • Update cran-comments.md
  • git push
  • Draft blog post
  • Slack link to draft blog in #open-source-comms

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • usethis::use_news_md()
  • git push
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

dials activation values

Since h2o uses different names for activation functions than tidymodels does, we can:

  • take the values that are consistent with current tidymodels engines (e.g. "tanh") and translate them inside of h2o_train_mlp() to be what h2o expects (e.g. "Tanh").
  • Also, we could expand what dials has as possible values to include others in h2o that tidymodels does not currently have (or just fail if the value is not in our current list).
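A minimal sketch of the first option, combined with the "just fail" behavior from the second, might look like this. `translate_activation()` is a hypothetical helper, not an agua or dials function. Its lookup table covers only two values whose h2o spellings ("Tanh", "Rectifier") are valid `activation` values for h2o.deeplearning(), and any other value raises an error.

```r
# Hypothetical helper (not part of agua or dials): translate tidymodels-style
# activation names into the spellings used by h2o::h2o.deeplearning().
translate_activation <- function(activation) {
  lookup <- c(
    tanh = "Tanh",      # same function, different capitalization in h2o
    relu = "Rectifier"  # h2o calls ReLU "Rectifier"
  )
  unknown <- setdiff(activation, names(lookup))
  if (length(unknown) > 0) {
    # Fail early instead of passing an unknown value through to h2o.
    stop(
      "Activation value(s) not supported by the h2o engine: ",
      paste(unknown, collapse = ", "),
      call. = FALSE
    )
  }
  unname(lookup[activation])
}

translate_activation("tanh")
#> [1] "Tanh"
```

Expanding dials' list of values would then mostly mean adding entries to the lookup table.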

breaking change in upcoming tune release

In the tune release following 1.2.1, the split argument of tune's .catch_and_log() will be renamed to split_labels, and it will take labels(split) rather than split. agua passes that argument in only one place:

agua/R/tune.R

Lines 105 to 111 in 6a742f6

workflow <- tune::.catch_and_log(
  .expr = workflows::.fit_pre(workflow, training_frame),
  control,
  split,
  iter_msg_preprocessor,
  notes = out_notes
)

...so agua can pass the appropriate value conditional on tune's package version.

Related to tidymodels/tune#909.

The noted release is probably at least a couple months out, so this can be ignored for now.
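One way to branch on tune's version is sketched below. `split_arg_for_tune()` is a hypothetical helper. It assumes that every tune version newer than 1.2.1 expects labels(split), as this issue describes, and that agua keeps passing the value positionally, so the argument rename itself does not matter.

```r
# Hypothetical helper: choose what to pass to tune::.catch_and_log() based on
# the installed tune version. Versions after 1.2.1 (including development
# versions such as 1.2.1.9000) are assumed to expect labels(split).
split_arg_for_tune <- function(split,
                               tune_version = utils::packageVersion("tune")) {
  if (tune_version > "1.2.1") {
    labels(split)
  } else {
    split
  }
}
```

In the call above, the `split` line would then become `split_arg_for_tune(split)`.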

setup parsnip engine docs

For all models, not just those with engines in parsnip, we create engine-specific documentation in parsnip/man/rmd. Details are here. We should add docs for the h2o engines.

use h2o::with_no_h2o_progress

In the very latest h2o version, this function is documented and exported. Using it should suppress the progress bars and some other output.

edit: @ledell @tomasfryda Was the function supposed to be exported (since it is documented)?

use `parallelism` in `h2o.grid()`

We need to find a way to specify parallelism in h2o.grid() and allow parallel model building. One possible solution is to use control_grid(parallel_over) and add a condition for it here.

@topepo

allow other preprocessors

This line prevents h2o engines from being tuned unless the preprocessor is a recipe. There are two other types of preprocessors (formulas and variables), so we should generalize this. There's probably code in tune that does this already.

use pkgdown

Once the repo is public, let's use usethis::use_pkgdown(). I already made a CNAME entry so we should be able to use agua.tidymodels.org.

Release agua 0.1.0

First release:

Prepare for release:

  • git pull
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • git push
  • Draft blog post
  • Slack link to draft blog in #open-source-comms

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • usethis::use_news_md()
  • git push
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Unused argument error while tuning

The problem

I cannot seem to tune with H2O; I keep getting an "unused argument" error. I was following an example that used keras (which works fine), and since I only switched the engine from keras to h2o, I expected it to work as well. It didn't.

To track it down, I ran the code from https://agua.tidymodels.org/articles/tune.html, which is given below:

copied R code

library(tidymodels)
library(agua)
library(ggplot2)
theme_set(theme_bw())
doParallel::registerDoParallel()
h2o_start()
data(ames)

set.seed(4595)
data_split <- ames %>%
  mutate(Sale_Price = log10(Sale_Price)) %>%
  initial_split(strata = Sale_Price)
ames_train <- training(data_split)
ames_test <- testing(data_split)
cv_splits <- vfold_cv(ames_train, v = 10, strata = Sale_Price)

ames_rec <-
  recipe(Sale_Price ~ Gr_Liv_Area + Longitude + Latitude, data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>%
  step_ns(Longitude, deg_free = tune("long df")) %>%
  step_ns(Latitude, deg_free = tune("lat df"))

lm_mod <- linear_reg(penalty = tune()) %>%
  set_engine("h2o")

lm_wflow <- workflow() %>%
  add_model(lm_mod) %>%
  add_recipe(ames_rec)

grid <- lm_wflow %>%
  extract_parameter_set_dials() %>%
  grid_regular(levels = 5)

ames_res <- tune_grid(
  lm_wflow,
  resamples = cv_splits,
  grid = grid,
  control = control_grid(save_pred = TRUE,
    backend_options = agua_backend_options(parallelism = 5))
)

ames_res

The output is :

Tuning results

10-fold cross-validation using stratification

There were issues with some computations:

  • Error(s) x10: Error in fn(...): unused arguments (metrics_info = list(c("rmse", "rsq"), c("minimiz...

Run show_notes(.Last.tune.result) for more information.

Follow up on the suggestion

If I run show_notes(.Last.tune.result) as suggested, this is the output:

unique notes:
────────────────────────────────────────────────────────────────────────────────
Error in fn(...): unused arguments (metrics_info = list(c("rmse", "rsq"), c("minimize", "maximize"), c("numeric", "numeric")), list(c("penalty", "deg_free", "deg_free"), c("penalty", "long df", "lat df"), c("model_spec", "recipe", "recipe"), c("linear_reg", "step_ns", "step_ns"), c("main", "ns_PCP7q", "ns_33KwK"), list(list("double", list(-10, 0), c(TRUE, TRUE), list("log-10", function (x)
log(x, base), function (x)
base^x, function (x, n = n_default)
{
raw_rng <- suppressWarnings(range(x, na.rm = TRUE))
if (any(!is.finite(raw_rng))) {
return(numeric())
}
rng <- log(raw_rng, base = base)
min <- floor(rng[1])
max <- ceiling(rng[2])
if (max == min) {
return(base^min)
}
by <- floor((max - min)/n) + 1
breaks <- base^seq(min, max, by = by)
relevant_breaks <- base^rng[1] <= breaks & breaks <= base^rng[2]
if (sum(relevant_breaks) >= (n - 2)) {
return(breaks)
}
while (by > 1) {
by <- by - 1
breaks <- base^seq(min, max, by = by)
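For readers unfamiliar with the message: R raises "unused argument" when a function is called with an argument that is not among its formals. The sketch below reproduces it with made-up functions. It is not agua's or tune's code, and it is only an assumption that the error above comes from a caller passing a newer argument (metrics_info) to a function written for an older interface.

```r
# Made-up function written against an older interface.
old_iter <- function(resamples, grid) {
  length(grid)
}

# Made-up function that tolerates extra arguments via `...`.
tolerant_iter <- function(resamples, grid, ...) {
  length(grid)
}

# A newer caller passes an extra named argument.
msg <- tryCatch(
  old_iter(resamples = 1, grid = 1:3, metrics_info = list()),
  error = function(e) conditionMessage(e)
)
print(msg)  # reports the unused `metrics_info` argument

# The tolerant version silently ignores the extra argument.
tolerant_iter(resamples = 1, grid = 1:3, metrics_info = list())
```

Accepting `...` only hides a mismatch like this; keeping the two packages' versions in sync is the more direct fix.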

Error for `h2o_start()` without java installed

When I run h2o_start() without things installed/configured correctly, I do get

> h2o_start()
The operation couldn’t be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java.

but then it just hangs there. It would be nice if that threw an error instead.
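A pre-flight check along these lines could turn the hang into an error. `check_java()` is hypothetical and not part of agua. It uses base R's Sys.which() and system2() and assumes that a broken setup makes `java -version` exit with a non-zero status. On macOS, a `java` stub can be on the PATH even when no runtime is installed, so checking the path alone is not enough.

```r
# Hypothetical pre-flight check (not part of agua): fail fast if no usable
# Java runtime is found, instead of letting the H2O startup hang.
check_java <- function(java = "java") {
  path <- Sys.which(java)
  if (!nzchar(path)) {
    stop("No Java runtime found on the PATH; H2O requires Java.", call. = FALSE)
  }
  # `java -version` prints to stderr, so capture both streams.
  out <- suppressWarnings(
    system2(path, "-version", stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop("`java -version` failed:\n", paste(out, collapse = "\n"), call. = FALSE)
  }
  invisible(path)
}

# Possible usage before starting the server:
# check_java()
# agua::h2o_start()
```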

[New Functionality]: Add explainability/interpretability functions from h2o.

Hi,

Thanks for bringing h2o capabilities to tidymodels!

h2o already includes various functions to help with model interpretation/explainability for binary classification and regression models:

  • h2o.shap_summary_plot()
  • h2o.shap_explain_row_plot()
  • h2o.pd_multi_plot()
  • h2o.pd_plot()
  • h2o.ice_plot()

These functions can also be applied to an h2o.automl() object.

All the available h2o functionality is documented here

Thanks!
Carlos.

auto_ml() model type

We'd need to add a model definition to parsnip (with a default engine of h2o) and add the rest in agua.

Not sure what the main arguments should be (max number of models?).

Release agua 0.1.4

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Bump required R version in DESCRIPTION to 4.0
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::cloud_check()
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

case weights

The h2o functions take weights through the argument weights_column, which is described as "Column with observation weights".
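Because weights_column names a column rather than taking a vector, the case weights would need to travel inside the training data. A minimal sketch follows. `add_weights_column()` and the ".case_weights" column name are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: store case weights as a column of the training data
# and return the column name to pass as `weights_column`.
add_weights_column <- function(data, weights, name = ".case_weights") {
  if (length(weights) != nrow(data)) {
    stop("`weights` must have one value per row of `data`.", call. = FALSE)
  }
  if (name %in% names(data)) {
    stop("Column `", name, "` already exists in `data`.", call. = FALSE)
  }
  data[[name]] <- as.numeric(weights)
  list(data = data, weights_column = name)
}

# Possible usage with h2o (requires a running H2O server):
# prep <- add_weights_column(mtcars, rep(1, nrow(mtcars)))
# h2o::h2o.glm(x = c("wt", "cyl"), y = "mpg",
#              training_frame = h2o::as.h2o(prep$data),
#              weights_column = prep$weights_column)
```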

on.exit() for h2o tuning module

At the top of the iteration function we should run h2o.no_progress() and then use an on.exit() to:

  • run h2o.show_progress()
  • run a function that removes the model ids that were created.
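The pattern can be sketched in base R with the three steps passed in as functions. In agua they could be h2o::h2o.no_progress(), h2o::h2o.show_progress(), and a yet-to-be-written function that deletes the created models; here all names are placeholders, and `run_tuning_iteration()` is hypothetical. The on.exit() call is registered before any fitting, so the cleanup also runs when fitting fails.

```r
# Hypothetical sketch of the on.exit() pattern for one tuning iteration.
run_tuning_iteration <- function(fit_models, hide_progress, show_progress,
                                 remove_models) {
  hide_progress()
  created_ids <- character()
  # Runs on normal return and on error; uses whatever ids were recorded.
  on.exit({
    show_progress()
    remove_models(created_ids)
  }, add = TRUE)
  res <- fit_models()
  created_ids <- res$model_ids
  res$metrics
}
```

One limitation of this sketch: if fitting fails partway through, models created before the error are not recorded, so they would not be removed.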

external parallel processing

h2o parallelizes internally by multithreading the training of an individual model.

We could also use R's external parallelization (via foreach or futures) to send more models to the h2o server at the same time.

We could also use both approaches.

Right now, when using multicore, it just works. For PSOCK clusters, it does not: it fails with an error saying that it cannot find the h2o server.

Can we create a helper that sets up PSOCK clusters so that we can use them? We would need to experiment to find out what the worker processes are missing. It might be as simple as loading the h2o package in each.
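The experiment could start from something like the following, which uses only the base parallel package. `attach_packages()` and `setup_psock_workers()` are hypothetical helpers. Whether attaching h2o is enough, or whether each worker must also connect to the running server (for example, via h2o::h2o.init()), is exactly the open question above, and this sketch does not answer it.

```r
library(parallel)

# Hypothetical worker-side helper: attach each package on the worker and
# return the worker's search path so the caller can confirm it worked.
attach_packages <- function(pkgs) {
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  search()
}

# Hypothetical helper: prepare every worker in a PSOCK cluster.
setup_psock_workers <- function(cl, packages) {
  parallel::clusterCall(cl, attach_packages, packages)
}

# For agua, `packages` would presumably include "h2o":
# cl <- parallel::makePSOCKcluster(2)
# setup_psock_workers(cl, "h2o")
```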

todos after h2o August release

A reminder of todos after h2o's CRAN release (3.36.1.2):

  • change tune functions to use strategy = 'Sequential', and also remove this line from tune

  • update relevant parts in the vignette discussing parallel processing with h2o.grid

  • add tuning benchmark

  • use new progress functions

  • display threshold for classification models (if available)

  • discuss if explainability functions in #31 should be added

Cannot run tuning example

I am unable to run the code from the model tuning vignette here. When I do, tune_grid() fails with the following error:

 Error in get(x, envir = ns, inherits = FALSE) : 
object 'tune_grid_loop_iter_h2o' not found
7.
get(x, envir = ns, inherits = FALSE)
6.
utils::getFromNamespace(x = "tune_grid_loop_iter_h2o", ns = "agua")
5.
fn_tune_grid_loop(resamples, grid, workflow, metrics, control, 
rng)
4.
tune_grid_loop(resamples = resamples, grid = grid, workflow = workflow, 
metrics = metrics, control = control, rng = rng)
3.
tune_grid_workflow(object, resamples = resamples, grid = grid, 
metrics = metrics, pset = param_info, control = control)
2.
tune_grid.workflow(lm_wflow, resamples = cv_splits, grid = grid, 
control = control_grid(save_pred = TRUE))
1.
tune_grid(lm_wflow, resamples = cv_splits, grid = grid, control = control_grid(save_pred = TRUE))

Any thoughts?

Interaction terms are ignored

The training wrapper functions (e.g., h2o_train_glm) do not receive interaction terms from the model formula. In the example below, the fitted model uses only wt and cyl, without wt:cyl:

library(agua)
#> Loading required package: parsnip
h2o_start()

linear_mod <- linear_reg(penalty = 0.1) |> 
  set_engine("h2o") %>% 
  fit(mpg ~ wt * cyl, data = mtcars)

linear_mod$fit@parameters$x
#> [1] "wt"  "cyl"

Created on 2022-06-22 by the reprex package (v2.0.1)
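One possible direction, sketched with base R only: expand the formula with model.matrix() so the interaction becomes an explicit column before the data reach h2o. `expand_predictors()` is a hypothetical helper. It drops the intercept column on the assumption that the h2o model fits its own, and it applies make.names() because a name such as "wt:cyl" may be awkward to use as a column name downstream.

```r
# Hypothetical helper: turn formula terms (including interactions) into
# ordinary predictor columns.
expand_predictors <- function(formula, data) {
  mm <- model.matrix(formula, data = data)
  # Assume the downstream model adds its own intercept.
  mm <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
  # "wt:cyl" becomes "wt.cyl".
  colnames(mm) <- make.names(colnames(mm))
  as.data.frame(mm)
}

x <- expand_predictors(mpg ~ wt * cyl, mtcars)
names(x)
```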

Internal functions used in tune_grid_loop_iter_h2o

Internal functions used in tune_grid_loop_iter_h2o that may need to be exported or carried to agua:

setup for parallel processing

  • tune:::load_namespace

finalize and fit workflows when looping over parameters

  • tune:::catch_and_log
  • tune:::forge_from_workflow
  • workflows:::.fit_pre

formatting functions for predictions

compute metrics

  • tune::outcome_names
  • tune:::estimate_metrics

Error segfault

I get the following error when fitting a drf model:
Warning: stack imbalance in 'as.environment', 249 then 246
*** caught segfault ***
*** caught segfault ***
*** caught segfault ***
address 0x64209498, cause 'memory not mapped'
*** caught segfault ***
*** caught segfault ***
address 0x64209498, cause 'memory not mapped'
*** caught segfault ***
address 0x64209498, cause 'memory not mapped'
address 0x7fcfd18a4e7a, cause 'invalid permissions'
*** caught segfault ***
address 0x64209498, cause 'memory not mapped'
