ropensci / tarchetypes

Archetypes for targets and pipelines

Home Page: https://docs.ropensci.org/tarchetypes

License: Other

tarchetypes

The tarchetypes R package is a collection of target and pipeline archetypes for the targets package. These archetypes express complicated pipelines with concise syntax, which enhances readability and thus reproducibility. Archetypes are possible because of the flexible metaprogramming capabilities of targets. In targets, one can define a target as an object outside the central pipeline, and the tar_target_raw() function completely avoids non-standard evaluation. That means anyone can write their own niche interfaces for specialized projects. tarchetypes aims to include the most common and versatile archetypes and usage patterns.
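
As a minimal sketch of such a niche interface, the hypothetical function tar_column_summary() below (it is not part of tarchetypes) takes plain character strings, builds its command with base R's bquote(), and passes the result to targets::tar_target_raw(), so no non-standard evaluation is involved. It assumes only that targets is installed; the target name "x_summary", the upstream target "dataset", and the column "x" are illustrative.

```r
library(targets)

# Hypothetical archetype (not a tarchetypes function): a target that
# computes summary() of one column of an upstream target.
tar_column_summary <- function(name, data, column) {
  # bquote() splices the upstream target's symbol and the column name
  # into a quoted command, e.g. summary(dataset[["x"]]).
  command <- bquote(summary(.(as.symbol(data))[[.(column)]]))
  # tar_target_raw() takes the name as a string and the command as
  # an already-quoted expression.
  tar_target_raw(name = name, command = command)
}

# Usage: every argument is an ordinary character string.
summary_target <- tar_column_summary("x_summary", data = "dataset", column = "x")
print(summary_target)
```

Because the generated command is an ordinary R expression that mentions the symbol dataset, targets should be able to detect dataset as an upstream dependency once the target is placed in a pipeline.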

Grouped data frames

tarchetypes has functions for easy dynamic branching over subsets of data frames:

  • tar_group_by(): define row groups using dplyr::group_by() semantics.
  • tar_group_select(): define row groups using tidyselect semantics.
  • tar_group_count(): define a given number of row groups.
  • tar_group_size(): define row groups of a given size.

If you define a target with one of these functions, all downstream dynamic targets will automatically branch over the row groups.

# _targets.R file:
library(targets)
library(tarchetypes)
produce_data <- function() {
  expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
}
list(
  tar_group_by(data, produce_data(), var1, var2),
  tar_target(group, data, pattern = map(data))
)
# R console:
library(targets)
tar_make()
#> ▶ dispatched target data
#> ● completed target data [0.007 seconds]
#> ▶ dispatched branch group_b3d7d010
#> ● completed branch group_b3d7d010 [0 seconds]
#> ▶ dispatched branch group_6a76c5c0
#> ● completed branch group_6a76c5c0 [0 seconds]
#> ▶ dispatched branch group_164b16bf
#> ● completed branch group_164b16bf [0 seconds]
#> ▶ dispatched branch group_f5aae602
#> ● completed branch group_f5aae602 [0 seconds]
#> ● completed pattern group
#> ▶ completed pipeline [0.104 seconds]

# First row group:
tar_read(group, branches = 1)
#> # A tibble: 3 × 4
#>   var1  var2    rep tar_group
#>   <fct> <fct> <dbl>     <int>
#> 1 a     c         1         1
#> 2 a     c         2         1
#> 3 a     c         3         1

# Second row group:
tar_read(group, branches = 2)
#> # A tibble: 3 × 4
#>   var1  var2    rep tar_group
#>   <fct> <fct> <dbl>     <int>
#> 1 a     d         1         2
#> 2 a     d         2         2
#> 3 a     d         3         2
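
The tar_group column in the output above can be approximated by hand. The sketch below assumes dplyr, targets, and tarchetypes are installed. It groups the same data with dplyr::group_by() and calls targets::tar_group(), which replaces the dplyr grouping with an integer tar_group column; in a pipeline, such a hand-grouped target also needs iteration = "group". It then defines, without running, targets that use tar_group_count() and tar_group_size(); the names data_manual, data_count, data_size, and group_count are hypothetical.

```r
library(dplyr)
library(targets)
library(tarchetypes)

produce_data <- function() {
  expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
}

# Group with dplyr, then let targets::tar_group() add an integer
# tar_group column (one value per row group).
grouped <- produce_data() %>%
  group_by(var1, var2) %>%
  tar_group()
print(table(grouped$tar_group))

# A hand-written grouped target: iteration = "group" lets downstream
# map() patterns branch over the row groups.
manual <- tar_target(
  data_manual,
  produce_data() %>% group_by(var1, var2) %>% tar_group(),
  iteration = "group"
)

# The count- and size-based archetypes (defined here, not run).
pipeline <- list(
  tar_group_count(data_count, produce_data(), count = 2),
  tar_group_size(data_size, produce_data(), size = 4),
  tar_target(group_count, data_count, pattern = map(data_count))
)
```

The 12 rows of produce_data() split into four groups of three by var1 and var2. By the same arithmetic, count = 2 should give two row groups and size = 4 should give three groups of four rows.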

Literate programming

Consider the following R Markdown report.

---
title: report
output: html_document
---

```{r}
library(targets)
tar_read(dataset)
```

We want to define a target to render the report. Because the report calls tar_read(dataset), this target needs to depend on dataset. Without tarchetypes, setting up the pipeline correctly is cumbersome.

# _targets.R
library(targets)
list(
  tar_target(dataset, data.frame(x = letters)),
  tar_target(
    report, {
      # Explicitly mention the symbol `dataset`.
      list(dataset)
      # Return relative paths to keep the project portable.
      fs::path_rel(
        # Need to return/track all input/output files.
        c( 
          rmarkdown::render(
            input = "report.Rmd",
            # Always run from the project root
            # so the report can find _targets/.
            knit_root_dir = getwd(),
            quiet = TRUE
          ),
          "report.Rmd"
        )
      )
    },
    # Track the input and output files.
    format = "file",
    # Avoid building small reports on HPC.
    deployment = "main"
  )
)

With tarchetypes, we can simplify the pipeline with the tar_render() archetype.

# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_target(dataset, data.frame(x = letters)),
  tar_render(report, "report.Rmd")
)

Above, tar_render() scans code chunks for mentions of targets in tar_load() and tar_read(), and it enforces the dependency relationships it finds. In our case, it reads report.Rmd and then forces report to depend on dataset. That way, tar_make() always processes dataset before report, and it automatically reruns report.Rmd whenever dataset changes.
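
The targets package exports tar_knitr_deps(), which lists the targets referenced in a report's tar_load() and tar_read() calls, so the same kind of scan can be checked directly. The sketch below writes the example report to a temporary file and inspects it; it assumes targets and knitr are installed, and the temporary file stands in for report.Rmd.

```r
library(targets)

# Write a copy of the example report to a temporary .Rmd file.
path <- tempfile(fileext = ".Rmd")
writeLines(
  c(
    "---",
    "title: report",
    "output: html_document",
    "---",
    "",
    "```{r}",
    "library(targets)",
    "tar_read(dataset)",
    "```"
  ),
  path
)

# List the targets mentioned in tar_load()/tar_read() calls in the chunks.
deps <- tar_knitr_deps(path)
print(deps)
```

Running this on the real report.Rmd is a quick way to confirm which targets a report will be scheduled after.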

Alternative pipeline syntax

tar_plan() is a drop-in replacement for drake_plan() in the targets ecosystem. It lets users write targets as name/command pairs without having to call tar_target().

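# _targets.R
library(targets)
library(tarchetypes)
# Packages used by the commands below.
tar_option_set(packages = c("biglm", "dplyr", "readr", "tidyr"))
# create_plot() is a user-defined function, e.g. sourced from a file in R/.
tar_plan(
  # tar_file() already tracks the path as a file (format = "file").
  tar_file(raw_data_file, "data/raw_data.csv"),
  # Simple drake-like syntax:
  raw_data = read_csv(raw_data_file, col_types = cols()),
  data = raw_data %>%
    mutate(Ozone = replace_na(Ozone, mean(Ozone, na.rm = TRUE))),
  hist = create_plot(data),
  fit = biglm(Ozone ~ Wind + Temp, data),
  # Needs tar_render() because it is a target archetype:
  tar_render(report, "report.Rmd")
)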
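
To see the correspondence on a toy scale, the sketch below writes the same two targets with tar_plan() and with explicit tar_target() calls, then compares the target names each version defines. It assumes targets and tarchetypes are installed; x and y are hypothetical target names, and reading target$settings$name relies on an internal field of target objects.

```r
library(targets)
library(tarchetypes)

# drake-style name = command pairs...
plan_short <- tar_plan(
  x = 1,
  y = x + 1
)

# ...describe the same targets as these explicit tar_target() calls.
plan_long <- list(
  tar_target(x, 1),
  tar_target(y, x + 1)
)

# Extract target names (assumes the internal settings$name field).
target_names <- function(plan) {
  unname(vapply(plan, function(target) target$settings$name, character(1)))
}
print(target_names(plan_short))
print(target_names(plan_long))
```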

Installation

  • Release, from CRAN: install.packages("tarchetypes")
  • Development, from GitHub: remotes::install_github("ropensci/tarchetypes")
  • Development, from rOpenSci: install.packages("tarchetypes", repos = "https://dev.ropensci.org")

Documentation

For specific documentation on tarchetypes, including the help files of all user-side functions, please visit the reference website (https://docs.ropensci.org/tarchetypes). For documentation on targets in general, please visit the targets reference website. Many of the linked resources use tarchetypes functions such as tar_render().

Help

Please read the help guide to learn how best to ask for help using targets and tarchetypes.

Code of conduct

Please note that this package is released with a Contributor Code of Conduct.

Citation

citation("tarchetypes")
#> To cite tarchetypes in publications use:
#> 
#>   William Michael Landau (2021). tarchetypes: Archetypes for Targets.
#>   https://docs.ropensci.org/tarchetypes/,
#>   https://github.com/ropensci/tarchetypes.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {tarchetypes: Archetypes for Targets},
#>     author = {William Michael Landau},
#>     year = {2021},
#>     note = {{https://docs.ropensci.org/tarchetypes/, https://github.com/ropensci/tarchetypes}},
#>   }
