Comments (12)

wlandau commented on June 11, 2024

I do not think there is a need. tar_map() accepts fully-formed target objects with patterns of their own, and static branching modifies those patterns as needed. Users should set pattern beforehand on a target-by-target basis.

library(targets)
tar_script({
  library(tarchetypes)
  templates <- list(
    tar_target(x, f(a)),
    tar_target(y, f(x), pattern = map(x)),
    tar_target(z, g(y), pattern = map(y))
  )
  targets <- tar_map(values = list(a = c(1, 2)), templates)
  tar_pipeline(targets)
})

tar_manifest()
#> # A tibble: 6 x 3
#>   name  command pattern 
#>   <chr> <chr>   <chr>   
#> 1 z_1   g(y_1)  map(y_1)
#> 2 z_2   g(y_2)  map(y_2)
#> 3 y_1   f(x_1)  map(x_1)
#> 4 y_2   f(x_2)  map(x_2)
#> 5 x_1   f(1)    <NA>    
#> 6 x_2   f(2)    <NA>

tar_visnetwork(targets_only = TRUE)

Created on 2020-12-29 by the reprex package (v0.3.0)

from tarchetypes.

psychelzh commented on June 11, 2024

Thanks, that is very clean. But I have a case where the pattern in tar_map() has to be dynamic.

Specifically, I have data from many psychological tests, each of which has to be preprocessed by a different function, so my pipeline depends on both the data and the functions. targets does not treat functions as something to branch over dynamically, so I build static branches over all of these functions. However, the data have to be dynamic because they cannot be fetched easily, so fetching them requires a target in the pipeline. Ideally, tar_map() would then branch over this dynamic target.

So I am looking forward to seeing pattern supported in tar_map(), although I can use tar_target_raw() to build the targets iteratively myself. Are there better methods?

wlandau commented on June 11, 2024

I suggest a different way to express the pipeline, either with static branching to split the data before preprocessing or dynamic branching to map over a list of functions. Sketch of the latter:

tar_pipeline(
  tar_target(
    dataset,
    download_dataset() %>%
      dplyr::group_by(...) %>%
      targets::tar_group(),
    iteration = "group" 
  ),
  tar_target(
    functions,
    create_function_list_from_dataset(dataset),
    iteration = "list"
  ),
  tar_target(
    preprocessed,
    functions(dataset),
    pattern = map(functions, dataset)
  )
)

pattern in tar_map() would be incompatible with the design of targets and tarchetypes, and it would risk enabling suboptimal usage habits.

psychelzh commented on June 11, 2024

Maybe the challenge here is tracking changes to the functions. I have tried this method before, and I could not make it work when the functions are in a dynamic branch.

wlandau commented on June 11, 2024

The function list should always rerun (quick) but only the affected branches should rerun after that (see evidence below). So I think the workaround could work.

library(targets)

tar_script({
  options(crayon.enabled = FALSE)
  f <- function() {
    c(f = "f_current")
  }
  g <- function() {
    c(g = "g_old")
  }
  tar_pipeline(
    tar_target(
      functions,
      list(f = f, g = g),
      iteration = "list"
    ),
    tar_target(
      output,
      functions(),
      pattern = map(functions)
    )
  )
})

tar_make()
#> * run target functions
#> * run branch output_e1755eb6
#> * run branch output_51320a50

tar_read(output)
#>           f           g 
#> "f_current"     "g_old"

tar_script({
  options(crayon.enabled = FALSE)
  f <- function() {
    c(f = "f_current")
  }
  g <- function() {
    c(g = "g_new")
  }
  tar_pipeline(
    tar_target(
      functions,
      list(f = f, g = g),
      iteration = "list"
    ),
    tar_target(
      output,
      functions(),
      pattern = map(functions)
    )
  )
})

tar_make()
#> * run target functions
#> v skip branch output_e1755eb6
#> * run branch output_6dbb0c9e

tar_read(output)
#>           f           g 
#> "f_current"     "g_new"

Created on 2020-12-30 by the reprex package (v0.3.0)

One note: functions have fickle internals that change the first few times they are called, which changes the hash. This was a problem in drake when users returned functions from targets: ropensci/drake#345. Because of the improvements in targets, I think this is unlikely to affect you (unless you call tar_make(callr_function = NULL) and manually run the functions repeatedly in the same R session) but I have not spent much time exploring it personally.

psychelzh commented on June 11, 2024

Thank you very much! I see what you mean; maybe I need to refactor my pipeline a little. I will post my solution later, which uses tar_target_raw() instead.

psychelzh commented on June 11, 2024

@wlandau Sorry to disturb you about this issue again. I just have no idea how to create the functions list from a configuration file which stores the function names as characters.

I have tried getFromNamespace() or get(), which worked but it will always rerun if the function is from a package and the package updates. When the function is in the global environment, there is nothing unexpected.

I also tried rlang::sym(), but it did not work: a symbol is not the same as a call.

wlandau commented on June 11, 2024

> I just have no idea how to create the functions list from a configuration file which stores the function names as characters.

You can use tidy evaluation to insert symbols into a target's command. That should allow you to start from a character vector.

library(targets)
tar_script({
  library(rlang)
  library(targets)
  fns <- c(a = "a", b = "b")
  list(
    tar_target(functions, list(!!!syms(fns)))
  )
})

tar_manifest()
#> # A tibble: 1 x 3
#>   name      command            pattern
#>   <chr>     <chr>              <chr>  
#> 1 functions list(a = a, b = b) <NA>

Created on 2021-03-14 by the reprex package (v1.0.0)
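
As a rough illustration of what list(!!!syms(fns)) expands to, here is a base R sketch that builds the same kind of command without rlang or targets. The function names mean and median are placeholders chosen for illustration, not names from the original configuration file.

```r
# Function names as they might appear in a configuration file (hypothetical values).
fns <- c(a = "mean", b = "median")

# Convert each name to a symbol; lapply() keeps the names a and b.
symbols <- lapply(fns, as.symbol)

# Splice the symbols into a call to list(), similar in spirit to list(!!!syms(fns)).
command <- as.call(c(as.symbol("list"), symbols))
print(command)

# Evaluating the call looks the functions up by name and returns them in a list.
functions <- eval(command)
functions$a(c(1, 2, 3))
```

The point of the sketch is that the command contains the function names as symbols rather than as strings, which is what lets static code analysis see them as dependencies of the target.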

> I have tried getFromNamespace() or get(), which worked but it will always rerun if the function is from a package and the package updates.

To track functions from a package and the inner nested functions they call, list the package name in the imports option as described here.
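
A minimal sketch of that setup, assuming the preprocessing functions live in a package called mypackage and that one of them is named prep_fun_a (both names are hypothetical). It relies on the imports argument of tar_option_set().

```r
# _targets.R
library(targets)

# Track functions and objects defined in the package so that targets which use
# them are invalidated when the package code changes.
# "mypackage" is a placeholder for the package that holds the functions.
tar_option_set(
  packages = "mypackage",
  imports = "mypackage"
)

list(
  # prep_fun_a() is a hypothetical function exported by mypackage.
  tar_target(result, prep_fun_a())
)
```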

psychelzh commented on June 11, 2024

Thank you very much for your response, and sorry for the late feedback. This method requires that fns is not dynamically generated, so it is not compatible with the earlier suggestion of create_function_list_from_dataset().

wlandau commented on June 11, 2024

If that's the case, then we are back to static branching over functions and dynamically branching over data, which is the better option anyway when it can be done.

library(targets)
tar_script({
  options(tidyverse.quiet = TRUE)
  library(rlang)
  library(targets)
  library(tarchetypes)
  library(tidyverse)
  values <- tribble(
    ~f,   ~g,   ~name,
    "f1", "g1", "function_set1",
    "f2", "g2", "function_set2",
  ) %>%
    mutate(f = syms(f), g = syms(g))
  tar_map(
    values = values,
    names = name,
    tar_target(x, f()),
    tar_target(y, g(x), pattern = map(x))
  )
})

tar_manifest()
#> # A tibble: 4 x 3
#>   name            command             pattern             
#>   <chr>           <chr>               <chr>               
#> 1 x_function_set1 f1()                <NA>                
#> 2 x_function_set2 f2()                <NA>                
#> 3 y_function_set1 g1(x_function_set1) map(x_function_set1)
#> 4 y_function_set2 g2(x_function_set2) map(x_function_set2)

tar_visnetwork(targets_only = TRUE)

Created on 2021-03-17 by the reprex package (v1.0.0)

psychelzh commented on June 11, 2024

Thank you for your quick response. I was thinking along the same lines as your last post, so there has to be some trade-off in this case. I will try this method.

psychelzh commented on June 11, 2024

The final version I used is as follows (original link), which is sufficient for my use and very clear to me. Thank you very much for your help!

list(
  ...,
  targets_indices <- tar_map(
    values = tarflow.iquizoo::game_info %>%
      group_by(prep_fun_name) %>%
      summarise(game_ids = list(game_id), .groups = "drop") %>%
      mutate(prep_fun = syms(prep_fun_name)),
    names = prep_fun_name,
    tar_target(data_prep, data %>% filter(game_id %in% game_ids)),
    tar_target(indices, tarflow.iquizoo::calc_indices(data_prep, prep_fun))
  ),
  tar_combine(game_indices, targets_indices[[2]], format = "fst_tbl")
)
