The seasonalbook.content from christophsax

seasonalbook.content's Issues

Book Outline

@jlivsey also move the discussion here.

I put the book outline in a separate file, and added some stuff to it: https://github.com/christophsax/x13book/blob/master/proposal/outline.md

It may also make sense to focus on the outline first. If that is clear, the other questions are basically answered.

Some ideas:

I think we could build the book around a cheat sheet with references to the chapters. Probably would make sense to have an idea of the cheat sheet at an early stage.
As discussed before, each section discusses a specific practical problem. We could also add a 'Case Study' box with a more challenging question to each section. Ramadan was the only thing that came to my mind, but I am sure we will find other great examples. I remember one of the Eurostat people talking about a Tunisian time series where they had strong tourism-driven Easter effects along with Ramadan. I think it would be much more interesting to see these cases instead of AirPassengers all along.
So we could have one (or a few) interesting standard series (e.g. US Unemployment) which we use throughout the main text, and use the exotic series in the case study boxes only.

Daily seas adj paper: remaining tasks

Small collection of good daily example series, perhaps one weekly, with data description, that are part of the package.
Discuss criteria for method evaluation: What, besides OOS forecast evaluation, could be useful?
Define adjustment methods to be discussed and evaluated: Which additional methods should we show, which ones should we drop?
If 1 to 3 are done, complete evaluation and discussion.
Clean up description, examples of individual methods
Elaborate on 'additional topics'
clean up, polish, discuss

Add "Brockwell and Davis" to bib

In X11 chp: "Details of this estimation can be found in (Brockwell and Davis)" with no reference

Revise chapter: Seasonal Breaks

Census / FRED time series DB

A database from which we can draw our examples. We should have a list or a nested tibble with a few thousand series, so we can pick interesting ones.

I guess FRED would be more interesting, because we can include international series: https://fred.stlouisfed.org/categories

Did we have something on that? How could we come up with a list of 1000 or so broadly mixed IDs?

How to address a very basic question: which tools are at hand to check whether a series is a candidate for SA?

Maybe an idea for an exercise to consolidate a fundamental question that might otherwise be forgotten.
I realized this myself when I started calling seas() like a monkey, without even asking myself whether what I was trying to do made sense.
I don't know where this would fit best, but it is surely part of the "getting started" section.
Before starting with seas(), which tools can I use to explore a series, especially in cases where the seasonal pattern is not evident from eyeballing?
It would be nice to have a small dedicated section listing all the tools needed to examine a series before starting to model:

plot(ts)
plot(decompose(ts))
plot(stl(ts, s.window = "periodic"))
monthplot(ts)
acf(ts)
qs(seas(ts))
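
Alongside the visual checks above, a numeric check can help when the pattern is hard to see. The following is only a minimal sketch and not part of the book or of seasonal: the helper name `seasonal_dummy_test` is hypothetical, and it assumes a strictly positive monthly or quarterly ts. It regresses log growth rates on period dummies and reports the F-test against a constant-only model.

```r
# Hypothetical helper (not from seasonal): F-test for stable seasonality.
seasonal_dummy_test <- function(x) {
  stopifnot(is.ts(x), frequency(x) > 1, all(x > 0))
  dx <- as.numeric(diff(log(x)))               # log growth rates remove most of the trend
  period <- factor(as.integer(cycle(x))[-1])   # month or quarter of each growth rate
  fit_null <- lm(dx ~ 1)                       # no seasonality
  fit_seas <- lm(dx ~ period)                  # one mean per month or quarter
  tab <- anova(fit_null, fit_seas)
  c(f_stat = tab$F[2], p_value = tab[["Pr(>F)"]][2])
}

res <- seasonal_dummy_test(AirPassengers)
print(res)
```

A small p-value suggests that a stable seasonal pattern is present. The test says nothing about moving seasonality, so it complements rather than replaces qs(seas(ts)).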

Migrate a few cbar.sa ts objects into seasonalbook

If the source of the book is made open, cbar.sa would be needed to replicate some of the examples. If that is not going to happen, a few ts objects should be made part of the seasonalbook pkg.

Add graphics for trading days regressor

Showing the percentage of the holidays in a given month and the positive or negative effect of the coefficient.
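
For the trading-day part of such a graphic, the raw ingredient is the number of times each weekday occurs in a month. The sketch below uses base R only; the date range is an arbitrary assumption chosen for illustration, and the contrast against Sunday follows the usual form of trading-day regressors rather than anything specified in this issue.

```r
# Count how often each weekday occurs in each month (base R only).
days <- seq(as.Date("2022-07-01"), as.Date("2022-09-30"), by = "day")
wday_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# POSIXlt$wday is locale-independent: 0 = Sunday, ..., 6 = Saturday
wday <- factor(as.POSIXlt(days)$wday, levels = 0:6, labels = wday_labels)
month <- format(days, "%Y-%m")
counts <- table(month, wday)

# Six contrasts: each weekday count minus the Sunday count of the same month
td_contrasts <- counts[, wday_labels[-1]] - counts[, "Sun"]

print(counts)
barplot(t(counts), beside = TRUE, legend.text = TRUE,
        xlab = "Month", ylab = "Number of days")
```

Multiplying each contrast by its estimated coefficient would give the positive or negative contribution per month that the graphic is meant to show.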

Use budg_exp for a case study on seasonal breaks

show regressors
if the F-test changes by a lot when the regressor is added, we have a seasonal break
would a model.span work? ARIMA is fitted only on the second half of the series
series.modelspan (less strict) and series.span (more strict)
compare ARIMA parameters for both spans: use series.span before 2016, then model.span for the second half
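
The F-test idea above can be sketched on simulated data. This is an illustration under stated assumptions, not the budg_exp analysis: the series, the break date (half-way through), and both seasonal patterns are made up, and plain lm() stands in for the regARIMA model.

```r
set.seed(1)
n_years <- 20
month_idx <- rep(1:12, times = n_years)
after <- rep(c(0, 1), each = 12 * n_years / 2)   # 1 for observations after the break

# Two different seasonal patterns, each summing to zero over the year
pattern_before <- sin(2 * pi * (1:12) / 12)
pattern_after <- 2 * cos(2 * pi * (1:12) / 12)
y <- ifelse(after == 1, pattern_after[month_idx], pattern_before[month_idx]) +
  rnorm(length(month_idx), sd = 0.3)

month <- factor(month_idx)
fit_stable <- lm(y ~ month)                 # one seasonal pattern for the whole span
fit_break <- lm(y ~ month + month:after)    # extra seasonal regressors after the break

# F-test: do the post-break seasonal regressors improve the fit?
break_test <- anova(fit_stable, fit_break)
print(break_test)
```

A large F statistic for the added regressors points to a seasonal break; on real data the same comparison would be made with the regressors passed to seas().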

Case study: Seasonal break

We discussed a good example last time. Which one was it?

@andreranza Could you add a section to chapter 11?

An example for a case study is given in: 31-holidays.html#case-study-azerbaijani-retail-sales

Real-world transaction data

Taking this here from an e-mail.
This shows real transaction data at a daily frequency. We can see relevant peaks on Wed-Tue. Saturdays and Sundays are missing.

Would it make sense to add this to the trading days chp as part of the introduction? It would make clear that having more or fewer of these days in a month might have an impact on the final adjustment.
Would it make sense to make a rough estimate of the impact under different scenarios?

library(ggplot2)

trans_df <- 
  tibble::tibble(
    value_date = as.Date(c(
      "2022-07-01", "2022-07-04", "2022-07-05", "2022-07-06", "2022-07-07",
      "2022-07-08", "2022-07-13", "2022-07-14", "2022-07-15", "2022-07-18",
      "2022-07-19", "2022-07-20", "2022-07-21", "2022-07-22", "2022-07-25",
      "2022-07-26", "2022-07-27", "2022-07-28", "2022-07-29", "2022-08-01",
      "2022-08-02", "2022-08-03", "2022-08-04", "2022-08-05", "2022-08-08",
      "2022-08-09", "2022-08-10", "2022-08-11", "2022-08-12", "2022-08-15",
      "2022-08-16", "2022-08-17", "2022-08-18", "2022-08-19", "2022-08-22",
      "2022-08-23", "2022-08-24", "2022-08-25", "2022-08-26", "2022-08-29",
      "2022-08-30", "2022-08-31", "2022-09-01", "2022-09-02", "2022-09-05",
      "2022-09-06", "2022-09-07", "2022-09-08", "2022-09-09", "2022-09-12",
      "2022-09-13", "2022-09-14", "2022-09-15", "2022-09-16", "2022-09-19",
      "2022-09-20", "2022-09-21", "2022-09-22", "2022-09-23", "2022-09-26",
      "2022-09-27", "2022-09-28", "2022-09-29", "2022-09-30"
    )),
    amount = c(
      30293627, 29684302, 18354636, 35183606, 9582177, 14897203, 1069765, 43384861,
      39524015, 86802632, 6614005, 17761052, 8276755, 21152542, 16291322, 7772429,
      20393879, 5318004, 11859141, 23864014, 12376091, 279657, 3614375, 19932634,
      586553, 33552104, 33483075, 8039351, 6629124, 14459099, 161360799, 26694683,
      12940912, 32584637, 23363951, 23351179, 9331648, 5428734, 10499438, 7540658,
      6299517, 25319539, 8195559, 57542928, 31303483, 269582871, 18277812, 39795633,
      36711, 16797389, 20370978, 22465056, 43725288, 73089307, 36169962, 93165179,
      107295392, 69236616, 19283321, 42972503, 61393684, 329600009, 78949695,
      119115555
    ),
    day = ordered(
      c(
        "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Wed", "Thu", "Fri", "Mon", "Tue",
        "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed",
        "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu",
        "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri",
        "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon",
        "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri"
      ),
      levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
    ),
  )

trans_df |>
  ggplot(aes(x = value_date, y = amount)) +
  geom_line() +
  scale_x_date(date_breaks = 'day', date_labels = '%Y-%m-%d, %a') +
  theme_minimal() +
  theme(
    panel.grid.minor.x = element_blank(),
    axis.text.x = element_text(angle = 90, size = rel(0.6), vjust = -0.01)
  )

Created on 2024-01-17 with reprex v2.0.2

Harmonize tables layout

Are all the tables shown in the same way? If not, harmonize them.

Revise chapter: Annual constraining

Revise chapter: Quality Measures

Series frequency for book

@christophsax - Do you plan to discuss only monthly and quarterly series in the book? This isn't crucial to decide right now, but some aspects of the software differ if we assume only monthly/quarterly data. For example, transform.aicdiff has a default of -2 for monthly/quarterly series but 0 for anything else.

Improve outlier tables layout

Each chp. discussing arguments should have a table with associated options.

Show:

what are the important options (only relevant and important)
description possibly made easier compared to the X13 manual and more compact
each option has an associated example of a seas() call

Draft a PR for outliers; then, if it works well, do the same for irregular holidays.

Gentle seasonal adjustment

If in doubt:

don't pick short seasonal filter
don't choose an exotic arima
set options so that you stay "moderate"
might suggest that SEATS gives a smoother adjustment?
what if you adjust a rnorm?
...

So that you don't mess up your data.

What justifications?
What is the reason not to do it?
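
The "what if you adjust a rnorm?" question can be tried directly. The sketch below is an assumption-laden illustration: it uses base R's decompose() rather than seas(), and the seed and series length are arbitrary. It shows that a mechanical decomposition returns a non-zero seasonal component even for pure white noise.

```r
set.seed(42)
# White noise dressed up as a monthly series: no seasonality by construction
noise <- ts(rnorm(120), start = c(2010, 1), frequency = 12)

dec <- decompose(noise)          # classical additive decomposition
print(round(dec$figure, 2))      # twelve estimated "seasonal" effects

adjusted <- noise - dec$seasonal # "adjusted" series that differs from the input
plot(cbind(noise, adjusted), main = "Adjusting white noise")
```

The estimated effects are pure estimation noise, which is one argument for checking whether a series is seasonal before adjusting it.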

Frequency Domain

Can we visualize a sum of sines and cosines that represents a model / series?

Simplify? Perhaps keep in optional boxes?
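
One possible visualization, sketched under assumptions: the choice of AirPassengers growth rates and of three harmonics is arbitrary, and ordinary least squares is used only to find the weights of the sines and cosines.

```r
x <- as.numeric(diff(log(AirPassengers)))   # monthly log growth rates
tt <- seq_along(x)
n_harmonics <- 3                            # assumption: first three seasonal frequencies

# One cosine and one sine column per harmonic of the 12-month cycle
harmonics <- do.call(cbind, lapply(seq_len(n_harmonics), function(k) {
  cbind(cos(2 * pi * k * tt / 12), sin(2 * pi * k * tt / 12))
}))
colnames(harmonics) <- paste0(c("cos", "sin"), rep(seq_len(n_harmonics), each = 2))

fit <- lm(x ~ harmonics)                    # weights of the sines and cosines

plot(tt, x, type = "l", col = "grey50",
     xlab = "Month index", ylab = "diff(log(AirPassengers))")
lines(tt, fitted(fit), col = "red", lwd = 2)  # the fitted sum of sines and cosines
```

Raising n_harmonics to 5 and adding a single cosine at k = 6 would reproduce a full set of monthly dummies, which could be a useful point for an optional box.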

Minimum viable product

In my opinion, we still need the following. FYI @jlivsey

Consistent use of case studies

Use of AirPassengers in main text, provide at least one case study per chapter. For MVP, we don't need a case study everywhere, but handling it consistently would be good.
Currently missing in
- regARIMA Model
- X11
- SEATS
- Indirect vs direct adjustment
- Annual constraining
- Quality measures
- Revisions

Consistent use of Spec Arg overviews

Only show 'important' spec args
How to display rarely used spec args? Not at all? Or in an appendix at the end of a chapter?
Define principles and implement for one chapter, e.g., https://christophsax.github.io/seasonalbook.content/22-transform.html#transform-options
Separate data from content. All data should be stored, e.g., in a YAML or CSV file, and book tables should be generated from it.
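
A minimal sketch of what separating data from content could look like. The column names (`arg`, `description`, `important`) and the inline CSV are hypothetical; in the book the CSV would live in its own file, and the descriptions here are short paraphrases, not the manual's wording.

```r
# Hypothetical CSV content; in practice this would be read from a file.
spec_csv <- '
arg,description,important
transform.function,"Transformation of the series (e.g. auto, log, none)",TRUE
transform.aicdiff,"AIC difference threshold used by the automatic transformation test",TRUE
transform.adjust,"Prior adjustment for length of month, length of quarter, or leap year",FALSE
'
spec_args <- read.csv(text = spec_csv, stringsAsFactors = FALSE)

# Main-text table: only the args flagged as important
main_table <- spec_args[spec_args$important, c("arg", "description")]

# Render with knitr if available, otherwise fall back to a plain print
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(main_table, row.names = FALSE))
} else {
  print(main_table, row.names = FALSE)
}
```

The rows with important = FALSE could feed an appendix table from the same file, which would settle the display question per chapter without duplicating content.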

Consistent Use of Boxes

Define in introduction (I think we did that at some point, but I cannot find it)
- Frequency Domain
- Case Study
- ...

Consistent Use of Exercises

All chapters should have at least 4 exercises

Prettify chapters

No code overload
Reasonable paragraph length
Language
Use of udg()

Smooth

1 Introduction

Rocky

15 Quality measures
16 Revisions

Unclear weekday regressors output

When to adjust?

Saying is opinionated
Cite central banks opinions

level shifts
temporal shifts
temporary change
...

Found those figures in a PDF file.

History chapter needs rework

Broad issue but something to keep in mind.

Inconsistent conclusion from AirPassengers model output interpretation

Surprisingly, the 8-day Easter model has a lower AICc than the one-day model. This opens the question of why the one-day model has been chosen in the beginning.

Actually, it is the opposite.
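
The comparison can be rechecked with a short script. This is a sketch, not the chapter's code: fixing the transformation and the ARIMA model and switching off the AIC tests and outlier detection are assumptions made here so that the two AICc values differ only in the Easter regressor. The guard simply skips the example when seasonal is not installed.

```r
if (requireNamespace("seasonal", quietly = TRUE)) {
  library(seasonal)

  # Same settings in both models; only the Easter window differs
  m1 <- seas(AirPassengers,
             regression.variables = c("td1coef", "easter[1]"),
             regression.aictest = NULL,
             arima.model = "(0 1 1)(0 1 1)",
             transform.function = "log",
             outlier = NULL)
  m8 <- seas(AirPassengers,
             regression.variables = c("td1coef", "easter[8]"),
             regression.aictest = NULL,
             arima.model = "(0 1 1)(0 1 1)",
             transform.function = "log",
             outlier = NULL)

  # udg() extracts diagnostics from the X-13 output; "aicc" is the corrected AIC
  aicc <- c(easter1 = unname(udg(m1, "aicc")),
            easter8 = unname(udg(m8, "aicc")))
  print(aicc)   # the lower value indicates the preferred Easter window
}
```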

JL's Census Windows PC

My Windows PC complains about the file tree containing file names with a backslash, i.e. //summarys.html

As a workaround, I followed this git issue

Disable core.protectNTFS:

git config --global core.protectNTFS false
