Support pattern in `tar_map()` (tarchetypes issue, closed)

ropensci commented on June 11, 2024

Support pattern in `tar_map()`

from tarchetypes.

Comments (12)

wlandau commented on June 11, 2024

I do not think there is a need. tar_map() accepts fully-formed target objects with patterns of their own, and static branching modifies those patterns as needed. Users should set pattern beforehand on a target-by-target basis.

library(targets)
tar_script({
  library(tarchetypes)
  templates <- list(
    tar_target(x, f(a)),
    tar_target(y, f(x), pattern = map(x)),
    tar_target(z, g(y), pattern = map(y))
  )
  targets <- tar_map(values = list(a = c(1, 2)), templates)
  tar_pipeline(targets)
})

tar_manifest()
#> # A tibble: 6 x 3
#>   name  command pattern 
#>   <chr> <chr>   <chr>   
#> 1 z_1   g(y_1)  map(y_1)
#> 2 z_2   g(y_2)  map(y_2)
#> 3 y_1   f(x_1)  map(x_1)
#> 4 y_2   f(x_2)  map(x_2)
#> 5 x_1   f(1)    <NA>    
#> 6 x_2   f(2)    <NA>

tar_visnetwork(targets_only = TRUE)

Created on 2020-12-29 by the reprex package (v0.3.0)

psychelzh commented on June 11, 2024

Thanks, that is very clean. But I have a case where the pattern of tar_map() has to be dynamic.

Specifically, I have data from many psychological tests, each of which has to be preprocessed by a different function, so my pipeline depends on both the data and the functions. targets does not treat functions as something to branch over dynamically, so I build static branches over all of these functions. However, the data have to be dynamic because they cannot be fetched easily, so fetching them requires a target in the pipeline. Ideally, tar_map() would then branch over this dynamic target.

So I am looking forward to seeing pattern supported in tar_map(), although I can use tar_target_raw() to build the targets iteratively myself. Or is there a better method?
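
For illustration, the tar_target_raw() approach mentioned above might look roughly like the sketch below. The function names prep_a and prep_b and the upstream target name dataset are assumptions made up for this example; they do not come from the thread.

```r
library(targets)

# Hypothetical preprocessing function names, e.g. read from a configuration file.
prep_fun_names <- c("prep_a", "prep_b")

# Build one target per function name. Each target branches dynamically over
# an upstream target that is assumed to be called `dataset`.
prep_targets <- lapply(prep_fun_names, function(fun_name) {
  tar_target_raw(
    name = paste0("indices_", fun_name),
    # Construct the call prep_a(dataset), prep_b(dataset), ... from the string.
    command = as.call(list(as.symbol(fun_name), quote(dataset))),
    # Dynamic branching pattern, supplied as a language object.
    pattern = quote(map(dataset))
  )
})
```

The resulting list could then be placed in the pipeline alongside the target that produces dataset.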

wlandau commented on June 11, 2024

I suggest a different way to express the pipeline, either with static branching to split the data before preprocessing or dynamic branching to map over a list of functions. Sketch of the latter:

tar_pipeline(
  tar_target(
    dataset,
    download_dataset() %>%
      dplyr::group_by(...) %>%
      targets::tar_group(),
    iteration = "group" 
  ),
  tar_target(
    functions,
    create_function_list_from_dataset(dataset),
    iteration = "list"
  ),
  tar_target(
    preprocessed,
    functions(dataset),
    pattern = map(functions, dataset)
  )
)

pattern in tar_map() would be incompatible with the design of targets and tarchetypes, and it would risk enabling suboptimal usage habits.

psychelzh commented on June 11, 2024

Maybe the challenge here is tracking changes to the functions. I have tried this method before, and I could not make it work when there is a dynamic branch for the functions.

wlandau commented on June 11, 2024

The function list should always rerun (which is quick), but only the affected branches should rerun after that (see the evidence below). So I think the workaround could work.

library(targets)

tar_script({
  options(crayon.enabled = FALSE)
  f <- function() {
    c(f = "f_current")
  }
  g <- function() {
    c(g = "g_old")
  }
  tar_pipeline(
    tar_target(
      functions,
      list(f = f, g = g),
      iteration = "list"
    ),
    tar_target(
      output,
      functions(),
      pattern = map(functions)
    )
  )
})

tar_make()
#> * run target functions
#> * run branch output_e1755eb6
#> * run branch output_51320a50

tar_read(output)
#>           f           g 
#> "f_current"     "g_old"

tar_script({
  options(crayon.enabled = FALSE)
  f <- function() {
    c(f = "f_current")
  }
  g <- function() {
    c(g = "g_new")
  }
  tar_pipeline(
    tar_target(
      functions,
      list(f = f, g = g),
      iteration = "list"
    ),
    tar_target(
      output,
      functions(),
      pattern = map(functions)
    )
  )
})

tar_make()
#> * run target functions
#> v skip branch output_e1755eb6
#> * run branch output_6dbb0c9e

tar_read(output)
#>           f           g 
#> "f_current"     "g_new"

Created on 2020-12-30 by the reprex package (v0.3.0)

One note: functions have fickle internals that change the first few times they are called, which changes the hash. This was a problem in drake when users returned functions from targets: ropensci/drake#345. Because of the improvements in targets, I think this is unlikely to affect you (unless you call tar_make(callr_function = NULL) and manually run the functions repeatedly in the same R session) but I have not spent much time exploring it personally.

psychelzh commented on June 11, 2024

Thank you very much! I understand what you mean now; maybe I need to refactor my pipeline a little. I will post my solution later, which uses tar_target_raw() instead.

psychelzh commented on June 11, 2024

@wlandau Sorry to disturb you about this issue again. I just have no idea how to create the functions list from a configuration file which stores the function names as characters.

I have tried getFromNamespace() or get(), which worked but it will always rerun if the function is from a package and the package updates. When the function is in the global environment, there is nothing unexpected.

I also tried rlang::sym(), but that did not work either: a symbol is simply different from a call.
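
As a rough base R sketch of the get() and getFromNamespace() lookups described above: the function names here (mean, median) are only stand-ins for whatever the configuration file would contain.

```r
# Function names as they might appear in a configuration file (illustrative).
fun_names <- c("mean", "median")

# Look up each name and keep the names on the resulting list.
functions <- lapply(
  stats::setNames(fun_names, fun_names),
  function(fun_name) get(fun_name, mode = "function")
)

# A function from a specific package can also be fetched explicitly.
median_fun <- utils::getFromNamespace("median", "stats")

functions$mean(c(1, 2, 3))
median_fun(c(1, 5, 9))
```

Both lookups return the function objects themselves rather than symbols that refer to them.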

wlandau commented on June 11, 2024

Quoting psychelzh: "I just have no idea how to create the functions list from a configuration file which stores the function names as characters."

You can use tidy evaluation to insert symbols into a target's command. That should allow you to start from a character vector.

library(targets)
tar_script({
  library(rlang)
  library(targets)
  fns <- c(a = "a", b = "b")
  list(
    tar_target(functions, list(!!!syms(fns)))
  )
})

tar_manifest()
#> # A tibble: 1 x 3
#>   name      command            pattern
#>   <chr>     <chr>              <chr>  
#> 1 functions list(a = a, b = b) <NA>

Created on 2021-03-14 by the reprex package (v1.0.0)
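
The mechanism behind the command shown in the manifest can be reproduced with rlang alone. The sketch below assumes two placeholder functions, a() and b(), defined only for this example; tar_target() applies the same kind of tidy evaluation to its command when the tidy_eval option is enabled, which is the default.

```r
library(rlang)

# Function names stored as strings; a() and b() are illustrative placeholders.
fns <- c(a = "a", b = "b")
a <- function() "result of a"
b <- function() "result of b"

# syms() converts the strings to symbols; !!! splices them into the call.
command <- expr(list(!!!syms(fns)))
deparse(command)
#> [1] "list(a = a, b = b)"

# Evaluating the expression looks up the symbols and returns the functions.
functions <- eval(command)
functions$a()
#> [1] "result of a"
```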

Quoting psychelzh: "I have tried getFromNamespace() or get(), which worked but it will always rerun if the function is from a package and the package updates."

To track functions from a package and the inner nested functions they call, list the package name in the imports option of tar_option_set().
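
A minimal sketch of such a target script follows. The package name mypackage and its function prep_fun() are hypothetical and stand in for whichever package holds the preprocessing functions.

```r
# _targets.R (sketch)
library(targets)

# "mypackage" is a hypothetical package that exports prep_fun().
tar_option_set(
  packages = "mypackage", # load the package when targets run
  imports = "mypackage"   # track the objects in the package namespace
)

list(
  tar_target(dataset, data.frame(x = 1:3)),
  tar_target(result, prep_fun(dataset))
)
```

With this setup, targets should treat the objects in the package namespace much like objects defined in the global environment when deciding which targets are out of date.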

psychelzh commented on June 11, 2024

Thank you very much for your response, and sorry for the late feedback. This method requires that fns is not generated dynamically, so it is not very compatible with the previous suggestion of create_function_list_from_dataset().

wlandau commented on June 11, 2024

If that's the case, then we are back to static branching over functions and dynamically branching over data, which is the better option anyway when it can be done.

library(targets)
tar_script({
  options(tidyverse.quiet = TRUE)
  library(rlang)
  library(targets)
  library(tarchetypes)
  library(tidyverse)
  values <- tribble(
    ~f,   ~g,   ~name,
    "f1", "g1", "function_set1",
    "f2", "g2", "function_set2",
  ) %>%
    mutate(f = syms(f), g = syms(g))
  tar_map(
    values = values,
    names = name,
    tar_target(x, f()),
    tar_target(y, g(x), pattern = map(x))
  )
})

tar_manifest()
#> # A tibble: 4 x 3
#>   name            command             pattern             
#>   <chr>           <chr>               <chr>               
#> 1 x_function_set1 f1()                <NA>                
#> 2 x_function_set2 f2()                <NA>                
#> 3 y_function_set1 g1(x_function_set1) map(x_function_set1)
#> 4 y_function_set2 g2(x_function_set2) map(x_function_set2)

tar_visnetwork(targets_only = TRUE)

Created on 2021-03-17 by the reprex package (v1.0.0)

psychelzh commented on June 11, 2024

Thank you for your quick response. I had been thinking along the same lines as your last post; some trade-off seems unavoidable in this case. I will try this method.

psychelzh commented on June 11, 2024

The final version I used is as follows, which is sufficient for my use and very clear to me. Thank you very much for your help!

list(
  ...,
  targets_indices <- tar_map(
    values = tarflow.iquizoo::game_info %>%
      group_by(prep_fun_name) %>%
      summarise(game_ids = list(game_id), .groups = "drop") %>%
      mutate(prep_fun = syms(prep_fun_name)),
    names = prep_fun_name,
    tar_target(data_prep, data %>% filter(game_id %in% game_ids)),
    tar_target(indices, tarflow.iquizoo::calc_indices(data_prep, prep_fun))
  ),
  tar_combine(game_indices, targets_indices[[2]], format = "fst_tbl")
)
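
To make the values argument above more concrete, the sketch below builds the same kind of table from a small made-up stand-in for tarflow.iquizoo::game_info. The game IDs and function names are invented for illustration.

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for tarflow.iquizoo::game_info.
game_info <- tibble(
  game_id = c(1, 2, 3),
  prep_fun_name = c("prep_a", "prep_a", "prep_b")
)

# One row per preprocessing function: the IDs of the games it handles are
# collected in a list column, and the function name becomes a symbol.
values <- game_info %>%
  group_by(prep_fun_name) %>%
  summarise(game_ids = list(game_id), .groups = "drop") %>%
  mutate(prep_fun = syms(prep_fun_name))

values
```

Each row of values then yields one pair of static branches in tar_map(), with game_ids and prep_fun substituted into the target commands.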
