
Time series analysis in the `tidyverse`

Home Page: https://business-science.github.io/timetk/



timetk for R

R-CMD-check CRAN_Status_Badge codecov

Making time series analysis in R easier.

Mission: To make time series analysis in R easier, faster, and more enjoyable.

Installation

Download the development version with latest features:

remotes::install_github("business-science/timetk")

Or, download CRAN approved version:

install.packages("timetk")

Package Functionality

There are many R packages for working with time series data. Here’s how timetk compares to the “tidy” time series R packages for data visualization, wrangling, and feature engineering (those that leverage data frames or tibbles).

Tasks compared across timetk, tsibble, feasts, and tibbletime (retired):

  • Structure
    • Data structure: timetk uses the tibble (tbl); tsibble and feasts use the tsibble (tbl_ts); tibbletime uses the tbl_time
  • Visualization
    • Interactive plots (plotly)
    • Static plots (ggplot)
    • Time series
    • Correlation, seasonality
  • Data Wrangling
    • Time-based summarization
    • Time-based filtering
    • Padding gaps
    • Low- to high-frequency conversion
    • Imputation
    • Sliding / rolling
  • Machine Learning
    • Time series machine learning
    • Anomaly detection
    • Clustering
  • Feature Engineering (recipes)
    • Date feature engineering
    • Holiday feature engineering
    • Fourier series
    • Smoothing & rolling
    • Padding
    • Imputation
  • Cross Validation (rsample)
    • Time series cross validation
    • Time series CV plan visualization
  • More Awesomeness
    • Making time series (intelligently)
    • Handling holidays & weekends
    • Class conversion
    • Automatic frequency & trend detection

Getting Started

Summary

Timetk is an amazing package that is part of the modeltime ecosystem for time series analysis and forecasting. The forecasting system is extensive, and it can take a long time to learn:

  • Many algorithms
  • Ensembling and Resampling
  • Machine Learning
  • Deep Learning
  • Scalable Modeling: 10,000+ time series

You’re probably thinking, “How am I ever going to learn time series forecasting?” Here’s the solution that will save you years of struggling.

Take the High-Performance Forecasting Course

Become the forecasting expert for your organization

High-Performance Time Series Forecasting Course


Time Series is Changing

Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.

High-Performance Forecasting Systems will save companies by improving accuracy and scalability. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).

How to Learn High-Performance Time Series Forecasting

I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. You will learn:

  • Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
  • Deep Learning with GluonTS (Competition Winners)
  • Time Series Preprocessing, Noise Reduction, & Anomaly Detection
  • Feature engineering using lagged variables & external regressors
  • Hyperparameter Tuning
  • Time series cross-validation
  • Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
  • Scalable Forecasting - Forecast 1000+ time series in parallel
  • and more.

Become the Time Series Expert for your organization.


Take the High-Performance Time Series Forecasting Course

Acknowledgements

The timetk package wouldn’t be possible without other amazing time series packages.

  • stats - Basically every timetk function that uses a period (frequency) argument owes it to ts().
    • plot_acf_diagnostics(): Leverages stats::acf(), stats::pacf() & stats::ccf()
    • plot_stl_diagnostics(): Leverages stats::stl()
  • lubridate: timetk makes heavy use of floor_date(), ceiling_date(), and duration() for “time-based phrases”.
    • Add and Subtract Time (%+time% & %-time%): "2012-01-01" %+time% "1 month 4 days" uses lubridate to intelligently offset the day
  • xts: Used to calculate periodicity and fast lag automation.
  • forecast (retired): Possibly my favorite R package of all time. It’s based on ts, and its successor is the tidyverts (fable, tsibble, feasts, and fabletools).
    • The ts_impute_vec() function for low-level vectorized imputation using STL + Linear Interpolation uses na.interp() under the hood.
    • The ts_clean_vec() function for low-level vectorized outlier cleaning using STL + Linear Interpolation uses tsclean() under the hood.
    • The Box Cox transformation auto_lambda() uses BoxCox.lambda().
  • tibbletime (retired): While timetk does not import tibbletime, it uses much of the innovative functionality to interpret time-based phrases:
    • tk_make_timeseries() - Extends seq.Date() and seq.POSIXt() using a simple phrase like “2012-02” to populate the entire time series from start to finish in February 2012.
    • filter_by_time(), between_time() - Uses innovative endpoint detection from phrases like “2012”
    • slidify() is basically rollify() using slider (see below).
  • slider: A powerful R package that provides a purrr-syntax for complex rolling (sliding) calculations.
    • slidify() uses slider::pslide under the hood.
    • slidify_vec() uses slider::slide_vec() for simple vectorized rolls (slides).
  • padr: Used for padding time series from low frequency to high frequency and filling in gaps.
    • The pad_by_time() function is a wrapper for padr::pad().
    • See the step_ts_pad() to apply padding as a preprocessing recipe!
  • TSstudio: This is the best interactive time series visualization tool out there. It leverages the ts system, which is the same system the forecast R package uses. A ton of inspiration for visuals came from using TSstudio.
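
As a concrete illustration of the rolling functionality mentioned above, here is a sketch of a grouped rolling mean via slidify(). It assumes the FANG daily stock dataset bundled with timetk; the column names come from that dataset.

```r
library(timetk)
library(dplyr)

# slidify() turns any function into a rolling version of itself
roll_mean_5 <- slidify(mean, .period = 5, .align = "right", .partial = FALSE)

# Apply per stock symbol; the first 4 rows of each group are NA
FANG %>%
  group_by(symbol) %>%
  mutate(adjusted_roll5 = roll_mean_5(adjusted)) %>%
  ungroup()
```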

timetk's People

Contributors

emilhvitfeldt, jarodmeng, joelgombin, jorane, karina2808, mdancho84, mitokic, olivroy, realauggieheschmeyer, romainfrancois, samuelmacedo83, tbradley1013, tonyk7440, topepo, tylergrantsmith, vspinu


timetk's Issues

tk_make_future_timeseries does not correctly handle timezones.

Hi,

I'm trying to use an index based on POSIXct with the timezone set to a non-UTC zone. Unfortunately, tk_make_future_timeseries() does not handle the creation of future time series correctly when the timezone is set to something other than UTC.

The first example is based on the one given for tk_make_future_timeseries(), and it works as expected.

idx <- c("2015-04-01 00:00:00",
         "2015-04-01 01:00:00",
         "2015-04-01 02:00:00") %>%
  ymd_hms()
idx
idx %>%
  tk_make_future_timeseries(n_future = 3)

[1] "2015-04-01 03:00:00 UTC" "2015-04-01 04:00:00 UTC" "2015-04-01 05:00:00 UTC"

However, once we set a time zone for the idx, things get weird...

idx <- c("2015-04-05 00:00:00",
         "2015-04-05 01:00:00",
         "2015-04-05 02:00:00") %>%
  ymd_hms(tz = 'Africa/Bujumbura')
idx
idx %>%
  tk_make_future_timeseries(n_future = 3)

[1] "2015-04-05 01:00:00 CAT" "2015-04-05 02:00:00 CAT" "2015-04-05 03:00:00 CAT"
Here, the Bujumbura future time series starts in the middle of the idx time series.

idx <- c("2015-04-01 00:00:00",
         "2015-04-01 01:00:00",
         "2015-04-01 02:00:00") %>%
  ymd_hms(tz = 'Hongkong')
idx
idx %>%
  tk_make_future_timeseries(n_future = 3)

[1] "2015-03-31 19:00:00 HKT" "2015-03-31 20:00:00 HKT" "2015-03-31 21:00:00 HKT"

The Hong Kong future time series starts before the idx time series starts, which is not expected.

An (inefficient?) workaround is to convert the POSIXct values to numeric, then back again after calling tk_make_future_timeseries():

idx <- c("2015-04-01 00:00:00",
         "2015-04-01 01:00:00",
         "2015-04-01 02:00:00") %>%
  ymd_hms(tz = 'Hongkong')
idx

idx.future <- as.numeric(idx) %>%
  tk_make_future_timeseries(n_future = 3)

as.POSIXct(idx.future, origin = '1970-01-01 00:00:00', tz = 'Hongkong')

"2015-04-01 03:00:00 HKT" "2015-04-01 04:00:00 HKT" "2015-04-01 05:00:00 HKT"
Hence we get the expected future time series.

I hope this can be fixed as timetk looks to be a very useful toolbox that helps with many of the compatibility issues between the Pantheon of time series packages.

Strange result from tk_make_timeseries() using by = "8 day" as an argument

Hello Matt Dancho

I am exploring tk_make_timeseries() in version 2.3.0 of timetk, and I get the following result from tk_make_timeseries(start_date = "2011-01-01", by = "8 day", length_out = 10). My session information is below:

library(timetk)
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=es_CO.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=es_CO.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=es_CO.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] timetk_2.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] zoo_1.8-8          tidyselect_1.1.0   xfun_0.17          purrr_0.3.4       
#>  [5] listenv_0.8.0      splines_4.0.2      lattice_0.20-41    colorspace_1.4-1  
#>  [9] vctrs_0.3.4        generics_0.0.2     htmltools_0.5.0    rsample_0.0.8     
#> [13] yaml_2.2.1         survival_3.1-12    prodlim_2019.11.13 rlang_0.4.7       
#> [17] pillar_1.4.6       glue_1.4.2         withr_2.3.0        lifecycle_0.2.0   
#> [21] lava_1.6.8         stringr_1.4.0      timeDate_3043.102  munsell_0.5.0     
#> [25] gtable_0.3.0       future_1.19.1      recipes_0.1.13     codetools_0.2-16  
#> [29] evaluate_0.14      knitr_1.30         parallel_4.0.2     class_7.3-17      
#> [33] highr_0.8          furrr_0.1.0        xts_0.12.1         Rcpp_1.0.5        
#> [37] scales_1.1.1       ipred_0.9-9        ggplot2_3.3.2      digest_0.6.25     
#> [41] stringi_1.5.3      dplyr_1.0.2        grid_4.0.2         tools_4.0.2       
#> [45] magrittr_1.5       tibble_3.0.3       crayon_1.3.4       tidyr_1.1.2       
#> [49] pkgconfig_2.0.3    MASS_7.3-53        ellipsis_0.3.1     Matrix_1.2-18     
#> [53] lubridate_1.7.9    gower_0.2.2        rmarkdown_2.3      R6_2.4.1          
#> [57] globals_0.13.0     rpart_4.1-15       nnet_7.3-14        compiler_4.0.2

tk_make_timeseries(start_date = "2011-01-01",
                   by         = "8 day",
                   length_out = 10)
#>  [1] "2011-01-01" "2010-12-31" "2010-12-31" "2010-12-31" "2011-01-31"
#>  [6] "2011-01-31" "2011-01-31" "2011-01-31" "2011-02-28" "2011-02-28"

Created on 2020-09-28 by the reprex package (v0.3.0)

The time stamps jump backward from 2011-01-01 to 2010-12-31 and then repeat.

However, I don't have the same problem with by = "1 day", by = "2 day", ..., by = "7 day". For example, with tk_make_timeseries(start_date = "2011-01-01", by = "7 day", length_out = 10):

library(timetk)

tk_make_timeseries(start_date = "2011-01-01",
                   by         = "7 day",
                   length_out = 10)
#>  [1] "2011-01-01" "2011-01-08" "2011-01-15" "2011-01-22" "2011-01-29"
#>  [6] "2011-02-05" "2011-02-12" "2011-02-19" "2011-02-26" "2011-03-05"

Created on 2020-09-28 by the reprex package (v0.3.0)

Here I get the correct time stamps.
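
Until the bug is fixed, a base-R workaround (a sketch, not part of the timetk API) is to build the index directly with seq.Date(), which handles an 8-day step correctly:

```r
# Base-R equivalent of
# tk_make_timeseries(start_date = "2011-01-01", by = "8 day", length_out = 10)
seq(as.Date("2011-01-01"), by = "8 days", length.out = 10)
#>  [1] "2011-01-01" "2011-01-09" "2011-01-17" "2011-01-25" "2011-02-02"
#>  [6] "2011-02-10" "2011-02-18" "2011-02-26" "2011-03-06" "2011-03-14"
```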

plot_anomaly_diagnostics() does not know how to convert "x" to class "Date", even though the column is already a Date

I am trying to use plot_anomaly_diagnostics() and get the following error:

frequency = 12 observations per 1 year
trend = 12 observations per 1 year
Error in as.Date.default(x, origin = "1970-01-01", tz = scale$timezone) : 
  do not know how to convert 'x' to class “Date”

Yet when I do class(query$date_col) I get "Date" returned.

zip of rds file attached.

R version 3.6.3
Session Info:

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidyquant_1.0.1            quantmod_0.4.17            TTR_0.23-6                
 [4] PerformanceAnalytics_2.0.4 xts_0.12-0                 zoo_1.8-8                 
 [7] janitor_2.0.1              DBI_1.1.0                  odbc_1.2.2                
[10] timetk_2.2.0               lubridate_1.7.9            forcats_0.5.0             
[13] stringr_1.4.0              readr_1.3.1                tidyverse_1.3.0           
[16] modeltime_0.0.2            yardstick_0.0.7            workflows_0.1.2           
[19] tune_0.1.1                 tidyr_1.1.0                tibble_3.0.3              
[22] rsample_0.0.7              recipes_0.1.13             purrr_0.3.4               
[25] parsnip_0.1.2              modeldata_0.0.2            infer_0.5.3               
[28] ggplot2_3.3.2              dplyr_1.0.0                dials_0.0.8               
[31] scales_1.1.1               broom_0.7.0                tidymodels_0.1.1          

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1   ellipsis_0.3.1     class_7.3-15       snakecase_0.11.0  
 [5] fs_1.4.2           rstudioapi_0.11    listenv_0.8.0      furrr_0.1.0       
 [9] bit64_0.9-7.1      prodlim_2019.11.13 fansi_0.4.1        xml2_1.3.2        
[13] codetools_0.2-16   splines_3.6.3      jsonlite_1.7.0     pROC_1.16.2       
[17] packrat_0.5.0      dbplyr_1.4.4       anomalize_0.2.1    compiler_3.6.3    
[21] httr_1.4.1         backports_1.1.8    lazyeval_0.2.2     assertthat_0.2.1  
[25] Matrix_1.2-18      cli_2.0.2          htmltools_0.5.0    tools_3.6.3       
[29] gtable_0.3.0       glue_1.4.1         fastmatch_1.1-0    Rcpp_1.0.5        
[33] cellranger_1.1.0   DiceDesign_1.8-1   vctrs_0.3.2        crosstalk_1.1.0.1 
[37] iterators_1.0.12   timeDate_3043.102  gower_0.2.2        globals_0.12.5    
[41] rvest_0.3.5        lifecycle_0.2.0    pacman_0.5.1       future_1.18.0     
[45] MASS_7.3-51.5      ipred_0.9-9        install.load_1.2.3 hms_0.5.3         
[49] parallel_3.6.3     yaml_2.2.1         curl_4.3           gridExtra_2.3     
[53] rpart_4.1-15       stringi_1.4.6      foreach_1.5.0      checkmate_2.0.0   
[57] lhs_1.0.2          lava_1.6.7         rlang_0.4.7        pkgconfig_2.0.3   
[61] lattice_0.20-41    labeling_0.3       htmlwidgets_1.5.1  bit_1.1-15.2      
[65] tidyselect_1.1.0   plyr_1.8.6         magrittr_1.5       R6_2.4.1          
[69] generics_0.0.2     pillar_1.4.6       haven_2.3.1        withr_2.2.0       
[73] survival_3.2-3     nnet_7.3-14        tibbletime_0.1.5   modelr_0.1.8      
[77] crayon_1.3.4       utf8_1.1.4         Quandl_2.10.0      xgboost_1.1.1.1   
[81] plotly_4.9.2.1     grid_3.6.3         readxl_1.3.1       data.table_1.12.8 
[85] blob_1.2.1         reprex_0.3.0       digest_0.6.25      GPfit_1.0-8       
[89] munsell_0.5.0      viridisLite_0.3.0  quadprog_1.5-8 

When I run the following I get what is expected:

query %>%
  arrange(date_col) %>%
  time_decompose(excess_days, method = "twitter") %>%
  anomalize(remainder, method = "gesd") %>%
  time_recompose() %>%
  plot_anomaly_decomposition() +
  ggtitle("Anomaly Decomposition - IP Discharges") +
  labs(
    x = ""
    , y = ""
    , subtitle = "Using Twitters' S-H-ESD algorithm"
  )

grouped_tbl.zip

tk_get_timeseries_signature() returning wrong number of rows

I am running the FANG script and it is throwing an error because new_data is only generating 173 records while actual_future has 252. Can you explain how you made this script run? Thanks.

new_data <- train %>%
tk_index() %>%
tk_make_future_timeseries(n_future = 252, skip_values = holidays, inspect_weekdays = TRUE) %>%
tk_get_timeseries_signature()

pred_lm <- predict(fit_lm, newdata = new_data)

actual_future <- actual_future %>%
add_column(yhat = pred_lm)
Error: .data must have 252 rows, not 173

Error in time_series_cv

time_series_cv() returns an error after successfully creating the CV object. The error received is:

Error in [.tbl_df(x, is.finite(x <- as.numeric(x))) : 'list' object cannot be coerced to type 'double'

The error is thrown when I try to print the object in the console; View() shows an empty object with only headers. However, I can access all the splits using the analysis() and assessment() methods, and I can also use the object for tuning.

error message in tk_ts

I get the following error message when I call the tk_ts() function:

tk_ts(select=roubo_veiculo,frequency=12,start=2003)

Error in tk_xts_.data.frame(ret, select = select, silent = silent) :
No date or date-time column found. Object must contain an unambiguous date or date-time column

I thought that there was no need to specify a date column.
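
For reference, tk_ts() expects a data frame as its first argument, containing a date (or date-time) column alongside the value columns. A minimal sketch with made-up monthly data (the roubo_veiculo column name is taken from the issue):

```r
library(timetk)

# tk_ts() needs the data as its first argument, with an explicit date column
df <- data.frame(
  date          = seq(as.Date("2003-01-01"), by = "month", length.out = 24),
  roubo_veiculo = rnorm(24)
)

tk_ts(df, select = roubo_veiculo, start = 2003, frequency = 12)
```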

tk_ts() does not accept a variable for the select parameter

I've written a function that processes data in the mutate/map manner using an inner function.
While doing so, I realized that tk_ts() can't handle a variable for the select argument.
Here is an MWE that shows the broken function which I would like to work, and a working function which can't be parameterized:

library(timetk)
library(dplyr)
library(purrr)
library(lubridate)

DataComputationBroken <- function(data, month.per.year, smoothing.value.name) {
  data %>%
    mutate(timeseries = map(
      .x = dates.and.values,
      .f = function(x) {
        first.real.date <- min(x$Date)
        tk_ts(x,
          select = smoothing.value.name,
          start = c(year(first.real.date), month(first.real.date)),
          frequency = month.per.year
        )
      }
    ))
}

DataComputationOK <- function(data, month.per.year) {
  data %>%
    mutate(timeseries = map(
      .x = dates.and.values,
      .f = function(x) {
        first.real.date <- min(x$Date)
        tk_ts(x,
          select = "Val1",
          start = c(year(first.real.date), month(first.real.date)),
          frequency = month.per.year
        )
      }
    ))
}

data.1 <- tibble(
  Date = seq(as.Date("2001-01-01"),
    as.Date("2001-12-01"),
    by = "1 month"
  ),
  Val1 = 1:12,
  Val2 = 2:13
)

data.2 <- tibble(
  Date = seq(as.Date("2002-01-01"),
    as.Date("2002-12-01"),
    by = "1 month"
  ),
  Val1 = 1:12,
  Val2 = 2:13
)

data <- tibble(dates.and.values = list(data.1, data.2))

DataComputationOK(data, 12)
DataComputationBroken(data, 12, "Val1")

Is there a way to make DataComputationBroken work?
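
One possible workaround, assuming the standard-evaluation variant tk_ts_() accepts a character column name (as the other tk_*_() functions do), is to pass the name as a string:

```r
library(timetk)
library(dplyr)
library(purrr)
library(lubridate)

# Hypothetical fix: the SE variant tk_ts_() takes the column name as a string
DataComputationFixed <- function(data, month.per.year, smoothing.value.name) {
  data %>%
    mutate(timeseries = map(
      .x = dates.and.values,
      .f = function(x) {
        first.real.date <- min(x$Date)
        tk_ts_(x,
          select = smoothing.value.name,  # e.g. "Val1", passed as a string
          start = c(year(first.real.date), month(first.real.date)),
          frequency = month.per.year
        )
      }
    ))
}
```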

plot_anomaly_diagnostics Error in as.POSIXct.default but data is POSIXct

Session Info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] anomalize_0.2.1            tidyquant_1.0.1            quantmod_0.4.17           
 [4] TTR_0.23-6                 PerformanceAnalytics_2.0.4 xts_0.12-0                
 [7] zoo_1.8-8                  janitor_2.0.1              DBI_1.1.0                 
[10] odbc_1.2.2                 timetk_2.2.0               lubridate_1.7.9           
[13] forcats_0.5.0              stringr_1.4.0              readr_1.3.1               
[16] tidyverse_1.3.0            modeltime_0.0.2            yardstick_0.0.7           
[19] workflows_0.1.2            tune_0.1.1                 tidyr_1.1.0               
[22] tibble_3.0.3               rsample_0.0.7              recipes_0.1.13            
[25] purrr_0.3.4                parsnip_0.1.2              modeldata_0.0.2           
[28] infer_0.5.3                ggplot2_3.3.2              dplyr_1.0.0               
[31] dials_0.0.8                scales_1.1.1               broom_0.7.0               
[34] tidymodels_0.1.1           pacman_0.5.1              

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1   ellipsis_0.3.1     class_7.3-15       snakecase_0.11.0  
 [5] fs_1.4.2           rstudioapi_0.11    listenv_0.8.0      furrr_0.1.0       
 [9] bit64_0.9-7.1      prodlim_2019.11.13 fansi_0.4.1        xml2_1.3.2        
[13] codetools_0.2-16   splines_3.6.3      jsonlite_1.7.0     pROC_1.16.2       
[17] packrat_0.5.0      dbplyr_1.4.4       compiler_3.6.3     httr_1.4.1        
[21] backports_1.1.8    lazyeval_0.2.2     assertthat_0.2.1   Matrix_1.2-18     
[25] cli_2.0.2          htmltools_0.5.0    tools_3.6.3        gtable_0.3.0      
[29] glue_1.4.1         Rcpp_1.0.5         cellranger_1.1.0   DiceDesign_1.8-1  
[33] vctrs_0.3.2        iterators_1.0.12   timeDate_3043.102  gower_0.2.2       
[37] globals_0.12.5     rvest_0.3.5        lifecycle_0.2.0    future_1.18.0     
[41] MASS_7.3-51.5      ipred_0.9-9        hms_0.5.3          parallel_3.6.3    
[45] curl_4.3           padr_0.5.2         rpart_4.1-15       stringi_1.4.6     
[49] foreach_1.5.0      lhs_1.0.2          lava_1.6.7         rlang_0.4.7       
[53] pkgconfig_2.0.3    lattice_0.20-41    labeling_0.3       htmlwidgets_1.5.1 
[57] bit_1.1-15.2       tidyselect_1.1.0   plyr_1.8.6         magrittr_1.5      
[61] R6_2.4.1           generics_0.0.2     pillar_1.4.6       haven_2.3.1       
[65] withr_2.2.0        survival_3.2-3     nnet_7.3-14        tibbletime_0.1.5  
[69] modelr_0.1.8       crayon_1.3.4       Quandl_2.10.0      plotly_4.9.2.1    
[73] grid_3.6.3         readxl_1.3.1       data.table_1.12.8  blob_1.2.1        
[77] reprex_0.3.0       digest_0.6.25      GPfit_1.0-8        munsell_0.5.0     
[81] viridisLite_0.3.0  quadprog_1.5-8 

I have data that is grouped by hour. When I run the following I get POSIXct:

> str(query)
tibble [92,016 x 2] (S3: tbl_df/tbl/data.frame)
 $ date_col: POSIXct[1:92016], format: "2010-01-01 00:00:00" "2010-01-01 01:00:00" "2010-01-01 02:00:00" ...
 $ value   : num [1:92016] 5 11 5 4 2 7 4 2 5 14 ...

I am trying to run plot_anomaly_diagnostics and get the following error:

frequency = 24 observations per 1 day
trend = 336 observations per 14 days
Error in as.POSIXct.default(x, origin = "1970-01-01", tz = scale$timezone) : 
  do not know how to convert 'x' to class “POSIXct”

Here is what I run:

plot_anomaly_diagnostics(
  .data = filter_by_time(
    .data = query
    , .date_var = date_col
    , .start_date = (end_date - dhours(30*24))
  )
  , .date_var = date_col
  , .value = value
  , .title = "Anomaly Diagnostics Last 30 Days"
)

When I run filter_by_time() %>% str() I get POSIXct so I am a bit confused.

> filter_by_time(
+     .data = query
+     , .date_var = date_col
+     , .start_date = (end_date - dhours(30*24))
+   ) %>%
+     str()
tibble [721 x 2] (S3: tbl_df/tbl/data.frame)
 $ date_col: POSIXct[1:721], format: "2020-05-31 23:00:00" "2020-06-01 00:00:00" "2020-06-01 01:00:00" ...
 $ value   : num [1:721] 7 2 4 1 0 0 1 3 5 4 ...
>

When I run the following I get what I expect:

filter_by_time(
  .data = query
  , .date_var = date_col
  , .start_date = CEILING_MONTH(end_date - dhours(30*24))
) %>%
  time_decompose(value) %>%
  anomalize(remainder) %>%
  plot_anomaly_decomposition() +
  labs("")

Data is attached.

hourly_data.zip

Conflict in tk_xts when time variable is called `time`

It gives an error:

Error in xts(x = list(obsvalue = c(10.61, 10.57, 10.55, 10.47, 10.41, : order.by requires an appropriate time-based object

Reproducible example:

library(tidyverse)
library(tidyquant) 
library(ecb) 
library(tibbletime)
library(timetk) 

unemp=get_data("STS.M.I8.S.UNEH.RTT000.4.000") %>% 
  mutate(time=convert_dates(obstime)) %>%         
  as_tbl_time(index=time)
unemp %>% tk_xts()

If the time variable is called something different, like month, it works fine.

The code also worked fine using timekit.
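
Until the conflict is fixed, a simple workaround is to rename the index column before coercing. A sketch with made-up data (assumes dplyr is loaded):

```r
library(dplyr)
library(timetk)

df <- data.frame(
  time     = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
  obsvalue = rnorm(6)
)

# Renaming `time` to a non-conflicting name avoids the order.by error
df %>%
  rename(date = time) %>%
  tk_xts(date_var = date)
```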

step_fourier returns all NaN

Although it worked a month or two ago, step_fourier is now giving me NaNs for everything.

After pulling the source code and debugging, I think the issue arises when the scale is inferred:

date_to_seq_scale_factor <- function(idx) {
  tk_get_timeseries_summary(idx) %>% dplyr::pull(diff.median)
}

since tk_get_timeseries_summary() returns a diff.median of zero. This is because I'm using panel data, not a single time series. My guess is that the sort order of the data is being changed by some upstream process (possibly, but not necessarily, something in timetk). When the data is sorted by the time index rather than by the unit and then the time index, most consecutive diffs are zero, so the inferred scale factor is zero.

I don't know whether there is anything you can do about it. Perhaps allow the user to define the scale, or just document this pitfall when using non-univariate time series data.
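
A defensive sketch for panel data: re-sort by unit and then by the time index before the recipe infers the scale, so the median diff of the date column reflects the true sampling interval instead of zero. The id/date column names here are hypothetical:

```r
library(dplyr)

# Hypothetical panel: two units observed monthly on the same dates
panel <- tibble(
  id    = rep(c("a", "b"), each = 12),
  date  = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 12), 2),
  value = rnorm(24)
)

# Sorted by date alone, duplicated dates make the median diff zero
panel %>% arrange(date) %>% summarise(m = median(diff(as.numeric(date))))

# Sorted by unit and then date, the median diff is the monthly interval
panel %>% arrange(id, date) %>% summarise(m = median(diff(as.numeric(date))))
```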

Problem with plot_seasonal_diagnostics

I copied your example of plot_seasonal_diagnostics but it gives the following error message:

taylor_30_min %>%

  • plot_seasonal_diagnostics(date, value, .interactive = FALSE)
    

Error in if (facet_groups == "") facet_groups <- "." :
argument is of length zero

Compatibility with stats::stl(): only univariate series are allowed

I got a problem while working with tk_ts() and stl() together.

# Original ts works fine
invisible(stl(AirPassengers, s.window = "periodic"))
# Convert to tibble
ts_tbl <- timetk::tk_tbl(AirPassengers)
ts <- timetk::tk_ts(ts_tbl, select = "value", 
              start = 1949, 
              end = 1960, 
              frequency = 12)
stl(ts, s.window = "periodic")
Error in stl(ts, s.window = "periodic") : 
  only univariate series are allowed

Based on this, stl() raises the error from here:

 if (is.matrix(x)) 
        stop("only univariate series are allowed")

And from ?is.matrix

is.matrix returns TRUE if x is a vector and has a "dim" attribute of length 2 and FALSE otherwise. Note that a data.frame is not a matrix by this test. The function is generic: you can write methods to handle specific classes of objects, see InternalMethods.
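
A workaround: tk_ts() returns a one-column matrix ts, and subsetting with [, 1] yields the univariate series that stl() expects:

```r
ts_tbl <- timetk::tk_tbl(AirPassengers)
x <- timetk::tk_ts(ts_tbl, select = "value",
                   start = 1949, frequency = 12)

# x is a 1-column matrix ts; x[, 1] drops to a univariate ts
stl(x[, 1], s.window = "periodic")
```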

Error on coercion from data.table to xts

I have a data.table called dt with a POSIXct column called timestamp.

> tk_xts(dt, date_var = timestamp)

gives the error: Error in !syms : invalid argument type. I tracked it down to line 11 of tk_xts_.data.frame where this call is made:

ret <- dplyr::select_if(ret, is.numeric)
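
Until data.table inputs are handled, converting to a tibble (or plain data.frame) first sidesteps the select_if() failure. A sketch with made-up hourly data:

```r
library(data.table)
library(timetk)

dt <- data.table(
  timestamp = seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 24),
  value     = rnorm(24)
)

# Coerce to a plain tibble before handing off to tk_xts()
tk_xts(tibble::as_tibble(dt), date_var = timestamp)
```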

White Noise Bounding

Hey Matt,

I know that you based a lot of the functionality within timetk and modeltime on the forecast package (and I can only assume its spiritual successors, fable and co.). One of the most useful features, in my opinion, of those packages is the white noise bounding bars present in the ACF and PACF plots. You can see an example of them in this section from Forecasting. I find them extremely useful not only for seeing if a time series is stationary but also for determining which lags might serve as useful features. I noticed, however, that plot_acf_diagnostics doesn't include these bounding bars. I have two questions about this:

  1. Is there a reason you chose not to recreate this functionality within timetk?
  2. If there isn't a reason, would you be okay with me opening a PR to add that functionality? My thought with this would be to recreate the blue, dashed line look from fable but make them an optional argument (ex. .white_noise_bars = FALSE) within plot_acf_diagnostics.

What are your thoughts?

Thank you.

pad_by_time passing start_val and end_val for regular grouped datasets

Hi Matt,

Super nice package once again! I was working with pad_by_time(), and my goal was to create a grouped regular time series such that each group has exactly the same number of rows (= intervals). The only way I could figure it out was to pass start_val = min(data$date) and end_val = max(data$date) directly to padr::pad(). So maybe it would be nice to have these parameters available to the user in pad_by_time().

Alternatively, perhaps auto-creating this grouped regular time series using a separate parameter (for example .regular = T) could do the job for the user.

Thanks,

Bob
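For readers hitting the same issue, here is a base-R sketch of what a global start_val/end_val would accomplish: expand every group to the full date range, then join the observed values back (the column names below are hypothetical):

```r
df <- data.frame(
  id    = rep(c("a", "b"), times = c(3, 2)),
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                    "2020-01-02", "2020-01-04")),
  value = 1:5
)

# Global range, equivalent to start_val = min(date), end_val = max(date)
full_dates <- seq(min(df$date), max(df$date), by = "day")

# Cross every group with the full grid, then left-join the observed values
grid   <- expand.grid(id = unique(df$id), date = full_dates)
padded <- merge(grid, df, by = c("id", "date"), all.x = TRUE)

table(padded$id)  # every group now has the same number of rows
```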

Installing timetk via CRAN vs. MRAN

I am trying to install timetk from the Microsoft R Open (version 3.5.2) distribution (https://mran.microsoft.com). Unfortunately, install.packages("timetk") installs package version 0.1.1.1, which is not usable. It does not include most of the package's functions. Checking (https://mran.microsoft.com) it says that package version 1.0.0.0 should be available, which is the same as is available via CRAN. So, I am not really sure what is happening.

My company only allows me to install packages via (https://mran.microsoft.com). So, I cannot install it via CRAN or from the Git repository. Could any of the contributors please check the version available via Microsoft R Open and if necessary update it?

Build errored with stringi_1.1.7

Hi!

I'm trying to push a newer stringi version to CRAN. It's available at:

install_github("gagolews/stringi")

Even though the changes between v1.1.6 and v1.1.7 are very cosmetic (mods to the C/C++ code -- removed #pragmas suppressing some warnings and changing the order of the parameters to some class constructor affecting the Windows-only build), I got a response from the CRAN team that your package errored during check:

Package: timetk
Check: tests
New result: ERROR
    Running ‘testthat.R’ [4s/38s]
  Running the tests in ‘tests/testthat.R’ failed.
  Complete output:
    > library(testthat)
    > library(timetk)
    >
    > test_check("timetk")
    ── 1. Error: (unknown) (@test_tk_index.R#9) ───────────────────────────────────
    order.by requires an appropriate time-based object
    1: tk_xts(AAPL_tbl, select = -date, date_var = date) at testthat/test_tk_index.R:9
    2: tk_xts_(data = data, select = select, date_var = date_var, silent = silent, ...)
    3: tk_xts_.default(data = data, select = select, date_var = date_var, silent = silent,
           ...)
    4: xts::xts(data, ...)
    5: stop("order.by requires an appropriate time-based object")

    ── 2. Failure: tbl tot tbl test returns tibble with correct rows and columns. (@
    nrow(test_tbl_1) not equal to 504.
    1/1 mismatches
    [1] 1 - 504 == -503

    ── 3. Failure: tbl tot tbl test returns tibble with correct rows and columns. (@
    ncol(test_tbl_1) not equal to 7.
    1/1 mismatches
    [1] 1 - 7 == -6

    ── 4. Failure: tbl tot tbl test returns tibble with correct rows and columns. (@
    colnames(test_tbl_1)[[1]] not equal to "date".
    1/1 mismatches
    x[1]: "data"
    y[1]: "date"

    ── 5. Error: (unknown) (@test_tk_tbl.R#21) ────────────────────────────────────
    order.by requires an appropriate time-based object
    1: tk_xts(AAPL_tbl, select = -date, date_var = date) at testthat/test_tk_tbl.R:21
    2: tk_xts_(data = data, select = select, date_var = date_var, silent = silent, ...)
    3: tk_xts_.default(data = data, select = select, date_var = date_var, silent = silent,
           ...)
    4: xts::xts(data, ...)
    5: stop("order.by requires an appropriate time-based object")

    ══ testthat results ═══════════════════════════════════════════════════════════
    OK: 176 SKIPPED: 0 FAILED: 5
    1. Error: (unknown) (@test_tk_index.R#9)
    2. Failure: tbl tot tbl test returns tibble with correct rows and columns. (@test_tk_tbl.R#14)
    3. Failure: tbl tot tbl test returns tibble with correct rows and columns. (@test_tk_tbl.R#15)
    4. Failure: tbl tot tbl test returns tibble with correct rows and columns. (@test_tk_tbl.R#16)
    5. Error: (unknown) (@test_tk_tbl.R#21)

    Error: testthat unit tests failed
    Execution halted

Is the error likely to be on the stringi side?

Cheers!

tk_ts causes deprecated warning

Hello,
since the dplyr 0.7.0 update the timetk::tk_ts function causes a deprecated warning.
Here is a minimal working example:

timetk::tk_ts(data.frame(
  A = 1:4,
  B = c(
    as.Date("1990-01-10"),
    as.Date("1990-02-10"),
    as.Date("1990-03-10"),
    as.Date("1990-04-10")
  )
),
select = "A"
)

which results in the actual result and:

Warning:
`select_()` is deprecated as of dplyr 0.7.0.
Please use `select()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.

Like suggested I ran lifecycle::last_warnings() which resulted in:

<deprecated>
message: `select_()` is deprecated as of dplyr 0.7.0.
Please use `select()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
backtrace:
 1. timetk::tk_ts(...)
 3. timetk::tk_ts_.data.frame(...)
 7. timetk:::tk_xts_.data.frame(ret, select = select, silent = silent)
 8. dplyr::select_(data, select)
 9. dplyr:::lazy_deprec("select", hint = FALSE)

So it seems that this is the deprecated function.
Is this hard to fix? It would be lovely if you could use the new function and thus remove the warning.
Thank you very much :-)
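For maintainers facing the same migration, one common pattern is to replace the string-based select_() call with select() plus a tidyselect helper. A minimal sketch (the wrapper name select_cols() is hypothetical, for illustration only):

```r
library(dplyr)

# Old, deprecated form: dplyr::select_(data, select)
# Tidy-eval replacement that still accepts a character vector of column names:
select_cols <- function(data, select) {
  dplyr::select(data, dplyr::all_of(select))
}

select_cols(data.frame(A = 1:4, B = letters[1:4]), "A")
```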

ymd vs POSIXlt

from the "working with time series" vignette:

The index gives the user a lot of information in a simple timestamp. Consider the datetime “2016-01-01 00:00:00”. From this timestamp, we can decompose the datetime to get the signature, which consists of the year, quarter, month, day, day of year, day of month, hour, minute, and second of the occurrence of a single observation.

Actually, if I saw a 00:00:00 with that much precision, I would assume the time is actually unknown and that I am really working with date-only (lubridate::ymd) data, which is more common in R datasets than a precise UTC value.

Install on docker image rocker/tidyverse:3.5.2

When executing the docker build command:
* installing *source* package ‘timetk’ ...
** package ‘timetk’ successfully unpacked and MD5 sums checked
** R
** inst
** byte-compile and prepare package for lazy loading
gives this error message:
Error : package ‘lattice’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

Possible work around?

timetk::diff_vec doesn't handle ts

Perhaps this isn't so much a bug or even something that needs to be fixed, but I was looking at #54 and noticed that the way the tibble was constructed can unintentionally cause problems.

airpass <- tibble::tibble(x = AirPassengers)
timetk::tk_augment_differences(airpass, x)
#> Error: Problem with `mutate()` input `x_lag1_diff1`.
#> x diff_vec: No method for class ts.
#> i Input `x_lag1_diff1` is `timetk::diff_vec(...)`.

It seems like this should work as you can very well diff a ts object.
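Until ts support lands, a simple workaround (a sketch) is to strip the ts class when building the tibble, since the numeric method can then dispatch:

```r
# AirPassengers is a ts; as.numeric() drops the ts class so that numeric
# methods can dispatch instead of failing with "No method for class ts"
airpass <- data.frame(x = as.numeric(AirPassengers))
inherits(airpass$x, "ts")  # FALSE

# timetk::tk_augment_differences(airpass, x)  # should now work
```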

timetk holidays and weekdays

Matt;
I apologize for the confusion, so let's go back to the original question. The problem that I am having is with holidays and weekends. Please try to run all the code below to see if you get error messages.
After running this:
new_data <- train %>%
tk_index() %>%
tk_make_future_timeseries(n_future = 252, skip_values = holidays, inspect_weekdays = TRUE) %>%
tk_get_timeseries_signature()
new_data has only 173 records because holidays and weekends have been removed, but actual_future still has 252, so they cannot be combined with cbind.

RUN FULL SCRIPT BELOW
library(timetk)
library(tidyquant)

FB_tbl <- FANG %>%
filter(symbol == "FB") %>%
select(date, volume)
FB_tbl

Everything before 2016 will be used for training (2013-2015 data)

train <- FB_tbl %>%
filter(date < ymd("2016-01-01"))
dim(train)

Everything in 2016 will be used for comparing the output

actual_future <- FB_tbl %>%
filter(date >= ymd("2016-01-01"))
dim(actual_future)

train <- tk_augment_timeseries_signature(train)

train

fit_lm <- lm(volume ~ ., data = train[,-1])
summary(fit_lm)

US trading holidays in 2016

holidays <- c("2016-01-01", "2016-01-18", "2016-02-15", "2016-03-25", "2016-05-30",
"2016-07-04", "2016-09-05", "2016-11-24", "2016-12-23", "2016-12-26",
"2016-12-30") %>%
ymd()

Build new data for prediction: 3 Steps

new_data <- train %>%
tk_index() %>%
tk_make_future_timeseries(n_future = 252, skip_values = holidays, inspect_weekdays = TRUE) %>%
tk_get_timeseries_signature()

New data should look like this

new_data

Prediction using a linear model, fit_lm, on future index time series signature

pred_lm <- predict(fit_lm, newdata = new_data)
length(pred_lm)

Add predicted values to actuals data

actual_future <- actual_future %>%
add_column(yhat = pred_lm)
actual_future
###### I GET AN ERROR HERE BECAUSE HOLIDAYS AND WEEKENDS HAVE NOT BEEN REMOVED FROM actual_future, so actual_future has 252 records while new_data has only 173.
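One way to resolve the mismatch (a sketch against the objects defined in the script above, not a tested fix) is to filter the actuals down to the dates that the generated future index actually contains before attaching predictions:

```r
# Keep only the actual rows whose dates appear in the generated future index
actual_future_aligned <- actual_future[actual_future$date %in% new_data$index, ]

# Lengths now match, so predictions can be attached
# actual_future_aligned$yhat <- pred_lm
```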

anomaly detection

Implement a function to detect anomalies in data, similar to Twitter's AnomalyDetection package. Simplify output to return only outliers as tidy output.

timetk::pad_by_time() include an option .direction = c("down", "up", "downup", "updown") like in tidyr::fill()

Dear Matt Dancho

Is there a possibility to include in timetk::pad_by_time() an option .direction = c("down", "up", "downup", "updown") like in tidyr::fill()?

Here is an example where I found it would be useful where I need to use timetk::pad_by_time() plus tidyr::fill(). It would be useful to just use timetk::pad_by_time():

# LIBRARIES ----

# Core
library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(timetk)

# Import
library(jsonlite)
#> 
#> Attaching package: 'jsonlite'
#> The following object is masked from 'package:purrr':
#> 
#>     flatten

# DATA -----
# * Importing ----
json_file <- "https://www.datos.gov.co/resource/mcec-87by.json"
exchange_rate_cop_usd <- jsonlite::fromJSON(txt = json_file)

# * Tidying
exchange_rate_cop_usd_tbl <- exchange_rate_cop_usd %>%
    # Changing data structure 
    tidyr::as_tibble() %>%
    # Using appropriate name variables
    purrr::set_names(nm = c("value", "units", "initial_date", "final_date")) %>%
    # Specifying the correct units
    dplyr::mutate(units        = "COP/USD",
                  initial_date = lubridate::parse_date_time(initial_date, 
                                                            orders = "YmdHMS", 
                                                            tz     = "America/Bogota"),
                  final_date   = lubridate::parse_date_time(final_date, 
                                                          orders = "YmdHMS", 
                                                          tz     = "America/Bogota")) %>% 
    # Organizing columns
    dplyr::select(initial_date, final_date, value, units)

    # Clean data
exchange_rate_cop_usd_tbl %>% 
    # selecting relevant variables
    select(initial_date, value, units) %>% 
    # Using timetk::pad_by_time() plus tidyr::fill()
    # where I didn't find this option directly in timetk::pad_by_time()
    timetk::pad_by_time(.date_var  = initial_date,
                        .by        = "1 day", 
                        .pad_value = NA) %>% 
    tidyr::fill(value, units, .direction = "down")
#> # A tibble: 1,541 x 3
#>    initial_date        value   units  
#>    <dttm>              <chr>   <chr>  
#>  1 2016-07-02 00:00:00 2914.38 COP/USD
#>  2 2016-07-03 00:00:00 2914.38 COP/USD
#>  3 2016-07-04 00:00:00 2914.38 COP/USD
#>  4 2016-07-05 00:00:00 2914.38 COP/USD
#>  5 2016-07-06 00:00:00 2966.87 COP/USD
#>  6 2016-07-07 00:00:00 3003.20 COP/USD
#>  7 2016-07-08 00:00:00 2986.49 COP/USD
#>  8 2016-07-09 00:00:00 2952.64 COP/USD
#>  9 2016-07-10 00:00:00 2952.64 COP/USD
#> 10 2016-07-11 00:00:00 2952.64 COP/USD
#> # … with 1,531 more rows

Created on 2020-09-20 by the reprex package (v0.3.0)

Issue: Backticked date columns in tibble failing during coercion

library(timekit)
library(tidyquant)

tib <- tibble(
    `date column` = seq.Date(from = as.Date("2017-01-01"), by = "day", length.out = 10),
    `my value` = 1:10
    )
tib
#> # A tibble: 10 x 2
#>    `date column` `my value`
#>           <date>      <int>
#>  1    2017-01-01          1
#>  2    2017-01-02          2
#>  3    2017-01-03          3
#>  4    2017-01-04          4
#>  5    2017-01-05          5
#>  6    2017-01-06          6
#>  7    2017-01-07          7
#>  8    2017-01-08          8
#>  9    2017-01-09          9
#> 10    2017-01-10         10

tk_xts(tib)
#> Warning in tk_xts_.data.frame(data = data, select = select, date_var =
#> date_var, : Non-numeric columns being dropped: date column
#> Error in parse(text = x): <text>:1:6: unexpected symbol
#> 1: date column
#>          ^
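A workaround (sketch) is to give the columns syntactic names before coercion; base R's make.names() does this mechanically:

```r
tib <- data.frame(
  check.names = FALSE,
  `date column` = seq(as.Date("2017-01-01"), by = "day", length.out = 10),
  `my value`    = 1:10
)

names(tib) <- make.names(names(tib))  # "date column" -> "date.column", etc.
names(tib)

# tk_xts(tib, date_var = date.column)  # should now coerce without parse errors
```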

Error in tk_get_timeseries_summary

tk_get_timeseries_summary returns an error

library(tidyquant)
library(timetk)
library(tidyverse)

data(FANG, package = "tibbletime")

# Works with time-based tibbles
FB_tbl <- FANG %>% filter(symbol == "FB")
FB_idx <- tk_index(FB_tbl)

tk_get_timeseries_summary(FB_idx)
Error in dimnames(x) <- dnx : 'dimnames' applied to non-array

The issue originates from the get_timeseries_summary_date function when trying to tidy a summary table object using broom::tidy. This might be a result of broom::tidy.table now being deprecated.

A working fix is to substitute the broom::tidy function with as.list.

As I stumbled upon this issue when using the anomalize package, I am not sure what kind of output the broom::tidy.table would provide if it worked as intended. Using the suggested fix, the output would look like this:

# A tibble: 1 x 12
  n.obs start      end        units scale tzone diff.Min. `diff.1st Qu.` diff.Median
  <int> <date>     <date>     <chr> <chr> <chr>     <dbl>          <dbl>       <dbl>
1  1008 2013-01-02 2016-12-30 days  day   UTC       86400          86400       86400
# … with 3 more variables: diff.Mean <dbl>, `diff.3rd Qu.` <dbl>, diff.Max. <dbl>
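To illustrate why as.list() works as a drop-in here: the object being tidied is a summaryDefault table, which as.list() converts into a named list that coerces cleanly to a one-row data frame (the values below are illustrative stand-ins for the index-diff summary):

```r
# A stand-in for the index-diff summary built inside the function
idx_diff <- c(86400, 86400, 86400)
s <- summary(idx_diff)

as.list(s)                  # named list: Min., 1st Qu., Median, Mean, ...
as.data.frame(as.list(s))   # one-row data frame, no broom required
```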

Overlap between tibbletime, timetk and tsibble

Hi!
tibbles are extended to deal with time in three packages (that I know of!): timetk, tibbletime and tsibble. I'm still trying to understand which package is best suited to use for what, and I find they have quite an overlap.

I know the former two packages are both created by you, have you considered joining forces with tsibble's author to create a single, complete package?

Thanks for the nice packages!

plot_seasonal_diagnostics errors on ungrouped tibble

taylor_30_min %>%
  plot_seasonal_diagnostics(date, value, .interactive = FALSE)

Returns Error in if (facet_groups == "") facet_groups <- "." : argument is of length zero.

Suggests that the function is not compatible with ungrouped dataframes.

The following modification produces the expected result:

taylor_30_min %>%
  mutate(id = "id") %>%
  group_by(id) %>%
  plot_seasonal_diagnostics(date, value, .interactive = FALSE)

week.iso and year.iso

Hello,

Thanks for the timekit package. This is really useful.

I need to aggregate TS on a weekly basis - I would like to use week.iso but also year.iso. Below is a simple R script to explain my problem:

library(dplyr)
library(timekit)
tk_get_timeseries_signature(as.Date(c("2014-12-28", "2014-12-29", 
    "2014-12-30", "2014-12-31"))) %>% 
  as.data.frame()
       index  index.num  diff year half quarter month month.xts month.lbl day hour minute second hour12
1 2014-12-28 1419724800    NA 2014    2       4    12        11  December  28    0      0      0      0
2 2014-12-29 1419811200 86400 2014    2       4    12        11  December  29    0      0      0      0
3 2014-12-30 1419897600 86400 2014    2       4    12        11  December  30    0      0      0      0
4 2014-12-31 1419984000 86400 2014    2       4    12        11  December  31    0      0      0      0
  am.pm wday wday.xts  wday.lbl mday qday yday mweek week week.iso week2 week3 week4 mday7
1     1    1        0    Sunday   28   89  362     4   52       52     0     1     0     5
2     1    2        1    Monday   29   90  363     5   52        1     0     1     0     5
3     1    3        2   Tuesday   30   91  364     5   52        1     0     1     0     5
4     1    4        3 Wednesday   31   92  365     5   53        1     1     2     1     5

Aggregating with week.iso and year as grouping variables will not work as 2014-12-31 is in week 1 of year 2015.

Is there a way to get a year.iso variable?
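Until such a column exists, base R can derive it directly: the strptime format %G is the ISO 8601 week-based year that matches the %V ISO week (lubridate::isoyear() is an equivalent, if lubridate is available):

```r
dates <- as.Date(c("2014-12-28", "2014-12-29", "2014-12-30", "2014-12-31"))

# %V = ISO week number, %G = the matching ISO week-based year
data.frame(
  date     = dates,
  week.iso = as.integer(format(dates, "%V")),
  year.iso = as.integer(format(dates, "%G"))
)
# 2014-12-31 falls in ISO week 1 of ISO year 2015, so grouping by
# (year.iso, week.iso) keeps that week together
```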

step_fourier and tk_augment_fourier possible inconsistency

I'm using timetk to create fourier terms on monthly data. I'm able to use tk_augment_fourier() for certain period and K but get an error when using step_fourier().

Here is a little reproducible example.

library(tidyverse)
library(timetk)
library(tidymodels)
    

airpass <- AirPassengers %>% as_tibble() %>% add_column(date = seq.Date(from = as.Date("1949-01-01"),
                                                                        to   = as.Date("1960-12-01"),
                                                                        by   = "month"))
# Works
airpass %>% 
    tk_augment_fourier(date, .periods = c(3, 5, 8), .K = 4) %>% 
    plot_time_series_regression(
        date,
        .formula = x ~ . -date
    )

# Get an error
recipe_spec_base <- recipe(x ~ ., data = airpass) %>% 
    step_fourier(date, period = c(3, 5, 8), K = 4)
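Independent of where the inconsistency lies, it can help to know what a Fourier feature is numerically: order k for period p is the pair sin(2πkt/p), cos(2πkt/p) over a numeric time index t. A minimal base-R sketch (make_fourier() below is a hypothetical helper, not a timetk function):

```r
make_fourier <- function(t, period, K) {
  out <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period),
          cos(2 * pi * k * t / period))
  })
  do.call(cbind, out)  # length(t) rows, 2 * K columns
}

X <- make_fourier(1:24, period = 12, K = 2)  # monthly data, yearly season
dim(X)  # 24 rows, 4 columns
```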

timetk::tk_index() does not return correct time-based index if timezone is used.

Hi again,

After further playing around with timetk, I found another problem where tk_index() does not return the correct time-based index.

library(timetk)

# Create some hourly time series data.
data_tbl <- tibble::tibble(
  date = seq(from = as.POSIXct("2018-01-01 00:00:00", tz='Europe/Berlin'), by = 60*60, length.out = 5),
  x    = rnorm(5) * 10,
  y    = 5:1
)

The data_tbl date field is correct and tk_index() returns the correct time-base index from data_tbl

data_tbl
# # A tibble: 5 x 3
# date                     x     y
# <dttm>               <dbl> <int>
#   1 2018-01-01 00:00:00 -19.0      5
# 2 2018-01-01 01:00:00  13.0      4
# 3 2018-01-01 02:00:00 -23.9      3
# 4 2018-01-01 03:00:00 -14.8      2
# 5 2018-01-01 04:00:00   2.31     1

data_tbl$date
# [1] "2018-01-01 00:00:00 CET" "2018-01-01 01:00:00 CET" "2018-01-01 02:00:00 CET" "2018-01-01 03:00:00 CET" "2018-01-01 04:00:00 CET"

as.numeric(data_tbl$date)
# [1] 1514761200 1514764800 1514768400 1514772000 1514775600

tk_index(data_tbl) # Returns time-based index vector, consistent with previous results.
# [1] "2018-01-01 00:00:00 CET" "2018-01-01 01:00:00 CET" "2018-01-01 02:00:00 CET" "2018-01-01 03:00:00 CET" "2018-01-01 04:00:00 CET"

However, the tk_index() problem occurs after coercing data_tbl to a ts object using tk_ts().

data_ts <- tk_ts(data_tbl)
data_ts
# Time Series:
#   Start = 1 
# End = 5 
# Frequency = 1 
# x y
# 1 -19.018495 5
# 2  13.040283 4
# 3 -23.884440 3
# 4 -14.828830 2
# 5   2.310638 1
# attr(,"index")
# [1] 1514761200 1514764800 1514768400 1514772000 1514775600
# attr(,"index")attr(,"tzone")
# [1] Europe/Berlin
# attr(,"index")attr(,"tclass")
# [1] POSIXct  POSIXt
# attr(,".indexCLASS")
# [1] POSIXct  POSIXt
# attr(,"tclass")
# [1] POSIXct  POSIXt
# attr(,".indexTZ")
# [1] Europe/Berlin
# attr(,"tzone")
# [1] Europe/Berlin

tk_index returns the correct regularized index.

tk_index(data_ts, timetk_idx = FALSE) # Returns regularized index
# [1] 1 2 3 4 5

However, the time-based index is incorrect.

tk_index(data_ts, timetk_idx = TRUE)  # Returns original time-based index vector
# [1] "2017-12-31 23:00:00 CET" "2018-01-01 00:00:00 CET" "2018-01-01 01:00:00 CET" "2018-01-01 02:00:00 CET" "2018-01-01 03:00:00 CET"
# Warning message:
#   In check_tzones(e1, e2) : 'tzone' attributes are inconsistent

The same problem occurs when coercing data_tbl to an XTS object. Here we can see the XTS object has the correct time-based index.

# The xts object preservers the correct datetime stamps, but tk_index() does not return the correct datetime stamps.
data_xts <- tk_xts(data_tbl)
data_xts
#                              x y
# 2018-01-01 00:00:00 -19.018495 5
# 2018-01-01 01:00:00  13.040283 4
# 2018-01-01 02:00:00 -23.884440 3
# 2018-01-01 03:00:00 -14.828830 2
# 2018-01-01 04:00:00   2.310638 1

Get time-based index from xts object using index(), which returns the correct value.

zoo::index(data_xts)
# [1] "2018-01-01 00:00:00 CET" "2018-01-01 01:00:00 CET" "2018-01-01 02:00:00 CET" "2018-01-01 03:00:00 CET" "2018-01-01 04:00:00 CET"

As you can see below, tk_index() returns the wrong time-based index.

tk_index(data_xts, timetk_idx = FALSE) # Returns regularized index
# [1] "2017-12-31 23:00:00 CET" "2018-01-01 00:00:00 CET" "2018-01-01 01:00:00 CET" "2018-01-01 02:00:00 CET" "2018-01-01 03:00:00 CET"
# Warning message:
#   In check_tzones(e1, e2) : 'tzone' attributes are inconsistent

tk_index(data_xts, timetk_idx = TRUE)  # Returns original time-based index vector
# [1] "2017-12-31 23:00:00 CET" "2018-01-01 00:00:00 CET" "2018-01-01 01:00:00 CET" "2018-01-01 02:00:00 CET" "2018-01-01 03:00:00 CET"
# Warning message:
  # In check_tzones(e1, e2) : 'tzone' attributes are inconsistent

I hope this helps.
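A small base-R check (illustrative, using the first index value from the attribute output above) suggests the stored numeric index itself is the correct instant, and the one-hour shift appears when the index is re-converted without the stored tzone:

```r
idx_num <- 1514761200  # first value of attr(data_ts, "index") above

# Restoring with the stored tzone reproduces the original timestamp
as.POSIXct(idx_num, origin = "1970-01-01", tz = "Europe/Berlin")
# 2018-01-01 00:00:00 CET, as expected

# Restoring in a different zone shows the same instant shifted by one hour
as.POSIXct(idx_num, origin = "1970-01-01", tz = "UTC")
# 2017-12-31 23:00:00 UTC
```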

Possibility to include timetk::mutate_by_time()

Hello Matt Dancho

In your package timetk, your functions filter_by_time() and summarize_by_time() are incredibly useful! I want to know if in the future you could consider including a function timetk::group_by_time() with an argument to group by year, quarter, month, week, day, or any other meaningful unit. I know that we can use dplyr::group_by, but the problem is that you must split a date column into its separate components to perform dplyr::group_by, adding more lines of code.

Best wishes and thank you for building the packages timetk and modeltime
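For context, a base-R sketch of what such a group_by_time() could do internally: truncate the date to the requested period and group on the truncated column (here flooring to month; the approach generalizes to other units):

```r
df <- data.frame(
  date  = as.Date("2020-01-01") + 0:59,  # 60 consecutive days
  value = 1:60
)

# Floor each date to the first of its month, then aggregate
df$month <- as.Date(format(df$date, "%Y-%m-01"))
aggregate(value ~ month, data = df, FUN = sum)
```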

CRAN Check Failure for Upcoming broom Release

Hi there! The broom dev team just ran reverse dependency checks on the upcoming broom 0.7.0 release and found new errors/test failures for the CRAN version of this package. I've pasted the results below, which seem to result from our decision to no longer export the tidy.table() method in favor of tibble::as_tibble().

  • checking examples ... ERROR

    ...
    Business Science offers a 1-hour course - Learning Lab #9: Performance Analysis & Portfolio Optimization with tidyquant!
    [39m[34m</> Learn more at: https://university.business-science.io/p/learning-labs-pro </>[39m
    
    Attaching package: ‘tidyquant’
    
    The following objects are masked from ‘package:timetk’:
    
        summarise_by_time, summarize_by_time
    
    > library(timetk)
    > 
    > # Filter values in January 1st through end of February, 2013
    > FANG %>%
    +     group_by(symbol) %>%
    +     filter_by_time(date, "start", "2013-02") %>%
    +     plot_time_series(date, adjusted, .facet_ncol = 2, .interactive = FALSE)
    Warning: 'tidy.table' is deprecated.
    See help("Deprecated")
    Error in dimnames(x) <- dnx : 'dimnames' applied to non-array
    Calls: %>% ... eval -> eval -> data.frame -> do.call -> provideDimnames
    Execution halted
    
  • checking tests ...

     ERROR
    Running the tests in ‘tests/testthat.R’ failed.
    Last 13 lines of output:
      ══ testthat results  ═══════════════════════════════════════════════════════════
      [ OK: 233 | SKIPPED: 0 | WARNINGS: 7 | FAILED: 10 ]
      1.  Error: tk_get_timeseries_summary(datetime) test returns correct format. (@test_tk_get_timeseries.R#79) 
      2.  Error: tk_get_timeseries_summary(date) test returns correct format. (@test_tk_get_timeseries.R#91) 
      3.  Error: tk_get_timeseries_summary(yearmon) test returns correct format. (@test_tk_get_timeseries.R#103) 
      4.  Error: tk_get_timeseries_summary(yearqtr) test returns correct format. (@test_tk_get_timeseries.R#116) 
      5.  Error: tk_make_future_timeseries(datetime) test returns correct format. (@test_tk_make_future_timeseries.R#12) 
      6.  Error: tk_make_future_timeseries(date) test returns correct format. (@test_tk_make_future_timeseries.R#53) 
      7.  Error: tk_make_future_timeseries(predict_every_two) test returns correct format. (@test_tk_make_future_timeseries.R#309) 
      8.  Error: tk_make_future_timeseries(predict_every_three) test returns correct format. (@test_tk_make_future_timeseries.R#348) 
      9.  Error: tk_make_future_timeseries(predict_every_four) test returns correct format. (@test_tk_make_future_timeseries.R#386) 
      10. Error: tk_make_future_timeseries(predict_random) test returns correct format. (@test_tk_make_future_timeseries.R#430) 
      
      Error: testthat unit tests failed
      Execution halted
    

We hope to submit this new version of the package to CRAN in the coming weeks. If you encounter any problems fixing these issues, please feel free to reach out!🙂

Inconsistent output length with tk_make_future_timeseries

So this package is amazing, thank you! But its most useful feature is a little broken. Suppose I want to make a forecast for the next 60 business days. If I understand correctly, I should use this function:

idx.pred <- tk_make_future_timeseries(index(df.train), 60, inspect_weekdays = TRUE)

The problem is that depending on where the weekends fall in my out of sample data, the tk_make_future_timeseries() call returns a different length vector. Rather than returning the next sixty business days, it looks like it's taking the calendar for the next sixty days and dropping the weekends. This is a disaster, because pred.xts <- xts(pred, idx.pred) fails with an error about series needing to be the same length as their dates.
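Until the behavior changes, a base-R sketch that always yields exactly N business days: generate a generous calendar span, drop weekends, and keep the first N (holidays could be dropped the same way with a skip vector). The last_date value below is hypothetical:

```r
last_date <- as.Date("2021-03-01")  # hypothetical end of the training index
n_future  <- 60

candidates <- last_date + 1:(n_future * 3)                    # generous span
business   <- candidates[!format(candidates, "%u") %in% c("6", "7")]
idx_pred   <- head(business, n_future)

length(idx_pred)  # always 60, wherever the weekends fall
```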

Error in dirname(to) : chemin trop long ("path too long")

Hey Matt,
I was trying your lab 36 and when I was trying to use timetk to plot a time series smoothed by fixed period (say 90 days like you did) I was having an error while executing these lines:

data_formatted %>% summarise_by_time(date_time, .by = "day", global_active_power = sum(global_active_power)) %>% plot_time_series(date_time, global_active_power, .smooth_period = "90 days")

can't use tk_index and sw_augment and sw_tidy_decomp

Hello

I am making a custom anomaly detection by feeling how much actual falls off projection. As such I want to return the residuals to the original time series. I can do this with the model, but not with the components.

dailytimeseries <- tk_ts(originaldata, start = 1)

TBATS <- tbats(dailytimeseries)

tbats_decomp <- sw_tidy_decomp(TBATS)

tk_index(tbats_decomp, timetk_idx = TRUE)

will not return the original time series.

Clean up dependencies?

Hello!

A bit of a random thought, and I may be completely missing something. But it looks to me like timetk has a bunch of dependencies in DESCRIPTION that are probably not needed?

timetk/DESCRIPTION

Lines 22 to 47 in 3cde451

Imports:
devtools (>= 1.12.0),
dplyr (>= 0.7.0),
forecast (>= 0.8.0),
lazyeval (>= 0.2.0),
lubridate (>= 1.6.0),
padr (>= 0.3.0),
purrr (>= 0.2.2),
readr (>= 1.0.0),
stringi (>= 1.1.5),
tibble (>= 1.2),
tidyr (>= 0.6.1),
xts (>= 0.9-7),
zoo (>= 1.7-14)
Suggests:
broom,
forcats,
knitr,
rmarkdown,
robets,
scales,
stringr,
testthat,
tidyverse,
tidyquant,
timeSeries

Contrasted with:

timetk/NAMESPACE

Lines 97 to 98 in 3cde451

importFrom(dplyr,"%>%")
importFrom(xts,xts)

I wanted to ask why some of these dependencies were there and if you would be amenable to a PR that tries to strip some of them out? If this is a convenience to the end-user, is this something that could be remedied with documentation?

(i.e. devtools was a dependency that seemed to be unnecessary, just from my quick perusing). Again, I am completely open to the fact that I may be missing your hard dependence on these packages elsewhere in the repo. Thanks!

Cole

EDIT: I did some searching of the other packages, and functions are declared with pkg::fun, etc. and just not expressly imported. The reason for my request started with me wondering why devtools was a run-time dependency for one of my apps 😄

Error: .onLoad failed in loadNamespace() for 'slider', details: call: fun(libname, pkgname) error: function 'exp_vec_restore' not provided by package 'vctrs'

Hi Colleagues,

Trying to migrate my code to time_tk 2.0 replacing the tk_augment_roll_apply by slidify. However I am getting the following error:

> test <- test %>% tk_augment_slidify(.value = snsr_val_clean_lag1_diff1, .period = c(4,12,18), .partial = "right", .f = AVERAGE)
Error: .onLoad failed in loadNamespace() for 'slider', details:
  call: fun(libname, pkgname)
  error: function 'exp_vec_restore' not provided by package 'vctrs'

Session info attached:

sessionInfo()

R version 4.0.0 (2020-04-24) 
Platform: x86_64-pc-linux-gnu (64-bit) 
Running under: Ubuntu 16.04.6 LTS  
Matrix products: 
default BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0 
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0  
locale:  
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8     [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C         

attached base packages: 

[1] stats     graphics  grDevices utils     datasets  methods   base       

other attached packages: 

[1] vctrs_0.3.1                tidyquant_1.0.0            quantmod_0.4.17            TTR_0.23-6                 PerformanceAnalytics_2.0.4  [6] paws_0.1.8                 feather_0.3.5              xts_0.12-0                 zoo_1.8-8                  DescTools_0.99.35          [11] data.table_1.12.8          optparse_1.6.6             furrr_0.1.0                future_1.17.0              imputeTS_3.0               [16] timetk_2.0.0               feasts_0.1.3               fable_0.2.1                fabletools_0.2.0.9000      tsibble_0.9.1              [21] forcats_0.5.0              stringr_1.4.0              dplyr_1.0.0                purrr_0.3.4                readr_1.3.1                [26] tidyr_1.1.0                tibble_3.0.1               ggplot2_3.3.2              tidyverse_1.3.0            aws.s3_0.3.21              [31] drake_7.12.2               tictoc_1.0                 ConfigParser_1.0.0         R6_2.4.1                   ini_0.3.1                  [36] DBI_1.1.0                  odbc_1.2.2                 lubridate_1.7.8             

loaded via a namespace (and not attached):

  [1] readxl_1.3.1  backports_1.1.6  igraph_1.2.5  paws.common_0.3.1  splines_4.0.0
  [6] storr_1.2.1  listenv_0.8.0  digest_0.6.25  rsconnect_0.8.16  fansi_0.4.1
 [11] magrittr_1.5  base64url_1.4  aws.signature_0.5.2  recipes_0.1.12  globals_0.12.5
 [16] modelr_0.1.7  gower_0.2.1  askpass_1.1  anytime_0.3.7  forecast_8.12
 [21] tseries_0.10-47  prettyunits_1.1.1  colorspace_1.4-1  blob_1.2.1  rvest_0.3.5
 [26] warp_0.1.0  haven_2.2.0  crayon_1.3.4  jsonlite_1.6.1  progressr_0.6.0
 [31] survival_3.1-12  glue_1.4.1  gtable_0.3.0  ipred_0.9-9  distributional_0.1.0
 [36] Quandl_2.10.0  scales_1.1.0  stinepack_1.4  mvtnorm_1.1-0  Rcpp_1.0.4.6
 [41] progress_1.2.2  bit_1.1-15.2  txtq_0.2.0  lava_1.6.7  prodlim_2019.11.13
 [46] httr_1.4.1  getopt_1.20.3  ellipsis_0.3.0  pkgconfig_2.0.3  farver_2.0.3
 [51] nnet_7.3-14  dbplyr_1.4.3  utf8_1.1.4  tidyselect_1.1.0  labeling_0.3
 [56] rlang_0.4.6  munsell_0.5.0  cellranger_1.1.0  tools_4.0.0  cli_2.0.2
 [61] generics_0.0.2  broom_0.5.6  bit64_0.9-7  fs_1.4.1  packrat_0.5.0
 [66] nlme_3.1-147  xml2_1.3.2  compiler_4.0.0  rstudioapi_0.11  filelock_1.0.2
 [71] curl_4.3  testthat_2.3.2  reprex_0.3.0  stringi_1.4.6  desc_1.2.0
 [76] lattice_0.20-41  Matrix_1.2-18  urca_1.3-0  pillar_1.4.4  aws.ec2metadata_0.2.0
 [81] lifecycle_0.2.0  lmtest_0.9-37  sessioninfo_1.1.1  codetools_0.2-16  boot_1.3-25
 [86] paws.machine.learning_0.1.8  MASS_7.3-51.6  assertthat_0.2.1  pkgload_1.0.2  openssl_1.4.1
 [91] rprojroot_1.3-2  withr_2.2.0  fracdiff_1.5-1  mgcv_1.8-31  expm_0.999-4
 [96] parallel_4.0.0  hms_0.5.3  quadprog_1.5-8  grid_4.0.0  rpart_4.1-15
[101] timeDate_3043.102  class_7.3-17  base64enc_0.1-3

Any clue of what is going on?

BR
/Edgar

tk_index warning

Hi;
I just downloaded the latest release of timetk and I am getting a warning when using the tk_index function. It still works, but I was wondering if there is a workaround, because eventually it will turn into an error. Thanks

idx <- tk_index(pond)
Warning message:
'xts::indexTZ' is deprecated.
Use 'tzone' instead.
See help("Deprecated") and help("xts-deprecated").
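A likely explanation (an assumption, since the warning comes from inside tk_index rather than your own code) is that timetk is still calling the deprecated `xts::indexTZ()` internally; xts replaced it with `tzone()`, so upgrading timetk should make the warning go away. If you query the index timezone yourself, the replacement looks like this sketch:

```r
library(xts)

x <- xts(1:3, order.by = as.Date("2020-01-01") + 0:2)

# Deprecated accessor (emits the warning shown above):
# indexTZ(x)

# Current replacement:
tzone(x)
```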

Error in if (.interactive) { : argument is not interpretable as logical

I get this error after updating timetk and restarting RStudio (macOS 10.14.6):

library(tidyverse)
library(tidyquant)
library(timetk)
library(purrr)
library(dplyr)

walmart_sales_weekly %>%
  select(id, Date, Weekly_Sales, Temperature, Fuel_Price) %>%
  group_by(id) %>%
  plot_acf_diagnostics(
    Date, Weekly_Sales,        # ACF & PACF
    Temperature, Fuel_Price,   # CCFs
    .lags        = "3 months", # 3 months of weekly lags
    .interactive = interactive
  )
    

Error in if (.interactive) { : argument is not interpretable as logical
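A plausible cause: in the call above, `.interactive = interactive` passes the bare name `interactive`, which R resolves to base R's `interactive()` *function*, and `if (.interactive)` cannot coerce a function to a logical. Tutorials often define `interactive <- FALSE` at the top of the script, so copying the call without that line triggers exactly this error. A sketch of the corrected call, passing an actual logical:

```r
library(dplyr)
library(timetk)

walmart_sales_weekly %>%
  select(id, Date, Weekly_Sales, Temperature, Fuel_Price) %>%
  group_by(id) %>%
  plot_acf_diagnostics(
    Date, Weekly_Sales,        # ACF & PACF
    Temperature, Fuel_Price,   # CCFs
    .lags        = "3 months", # 3 months of weekly lags
    .interactive = FALSE       # must be TRUE/FALSE, not the bare name `interactive`
  )
```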

Plot Anomaly Diagnostics

Hi, I just updated to R version 4.0.0, but I encountered the following error while running the # 2.0 Anomaly Detection line:

bike_sharing_daily %>%
plot_anomaly_diagnostics(
dteday,
log(cnt)
)
Error in plot_anomaly_diagnostics(., dteday, log(cnt)) :
  could not find function "plot_anomaly_diagnostics"

Any advice on this?
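A "could not find function" error here usually means the installed timetk predates the release that introduced `plot_anomaly_diagnostics()` (the R 4.0.0 upgrade itself does not update packages; they need to be reinstalled for a new major R version). A hedged fix, assuming an outdated installation is the cause:

```r
# Check which timetk version is installed
packageVersion("timetk")

# Reinstall / upgrade (packages must be reinstalled after a major R upgrade)
install.packages("timetk")

library(timetk)
```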

tk_ts automatic start date detection

Hi Mat, first of all I would like to start this post by thanking you for this amazing package!

The idea is that tk_ts would automatically detect the start date when converting a tbl to a ts object.

Let me show you a reproducible example:

# Create dates
d1 <- seq.Date(as.Date("2016-01-01"), length.out = 3, by = "months")
d2 <- seq.Date(as.Date("2017-01-01"), length.out = 3, by = "months")

# Data frame
df <- tibble(
  id    = c(1, 1, 1, 2, 2, 2),
  date  = c(d1, d2),
  value = c(10, 20, 30, 10, 20, 30)
)

# A tibble: 6 x 3
     id       date value
  <dbl>     <date> <dbl>
1     1 2016-01-01    10
2     1 2016-02-01    20
3     1 2016-03-01    30
4     2 2017-01-01    10
5     2 2017-02-01    20
6     2 2017-03-01    30

Each series starts at a different date, so we cannot use tk_ts directly:

a <- df %>% 
  group_by(id) %>% 
  nest() %>%
  mutate(data_ts = map(data, tk_ts, frequency = 12))

> a$data_ts
[[1]]
  Jan Feb Mar
1  10  20  30

[[2]]
  Jan Feb Mar
1  10  20  30

So, I have developed a workaround that helps me with this step and might be useful to incorporate into the tk_ts function.

# Helper function; assumes dates are ordered.
# Requires lubridate for year() and month().
library(lubridate)

to_ts <- function(data, freq = 12L) {
  dates <- data$date
  value <- data$value
  ts(value, start = c(year(dates)[1], month(dates)[1]), frequency = freq)
}

b <- df %>% 
  group_by(id) %>% 
  nest() %>%
  mutate(data_ts = map(data, to_ts))

> b$data_ts
[[1]]
     Jan Feb Mar
2016  10  20  30

[[2]]
     Jan Feb Mar
2017  10  20  30

Hope that helps!
Cheers

tk_augment_lags() function is not working with grouped data

I tried the documentation example of creating lagged variables within groups of data, but it does not work as expected.

library(tidyverse)
library(timetk)

m4_monthly %>%
  group_by(id) %>%
  tk_augment_lags(value, .lags = 1)

The lag value of the first data point of id = M2 is the last value of id = M1. I don't think this should be the behavior: it should be an NA value, as it already is for the first data point of id = M1.
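Until the grouped behavior of `tk_augment_lags()` is fixed, a workaround (a sketch, assuming only dplyr; the column name `value_lag1` is illustrative) is to compute the lag inside the groups with `mutate()` + `lag()`, which never crosses group boundaries:

```r
library(dplyr)
library(timetk)

m4_monthly %>%
  group_by(id) %>%
  mutate(value_lag1 = lag(value, n = 1)) %>%  # first row of each group gets NA
  ungroup()
```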
