seasonalbook.content's People

Contributors

andreranza, christophsax, jlivsey

seasonalbook.content's Issues

Book Outline

@jlivsey let's also move the discussion here.

I put the book outline in a separate file, and added some stuff to it: https://github.com/christophsax/x13book/blob/master/proposal/outline.md

It may also make sense to focus on the outline first. If that is clear, the other questions are basically answered.

Some ideas:

  • I think we could build the book around a cheat sheet with references to the chapters. Probably would make sense to have an idea of the cheat sheet at an early stage.

  • As discussed before, each section discusses a specific practical problem. We could also add a 'Case Study' box with a more challenging question to each section. Ramadan was the only thing that came to my mind, but I am sure we will find other great examples. I remember one of the Eurostat people talking about a Tunisian time series where they had strong tourism-driven Easter effects along with Ramadan. I think it would be much more interesting to see these cases instead of AirPassengers all along.

  • So we could have one (or a few) interesting standard series (e.g. US Unemployment) which we use throughout the main text, and use the exotic series in the case study boxes only.

Daily seas adj paper: remaining tasks

  • Small collection of good daily example series, perhaps one weekly, with data description, that are part of the package.

  • Discuss criteria for method evaluation: What, besides OOS forecast evaluation, could be useful?

  • Define adjustment methods to be discussed and evaluated: Which additional methods should we show, which ones should we drop?

  • If 1 to 3 are done, complete evaluation and discussion.

  • Clean up description, examples of individual methods

  • Elaborate on 'additional topics'

  • clean up, polish, discuss

Census / FRED time series DB

A database from which we can draw our examples. We should have a list or a nested tibble with a few thousand series, so we can pick interesting ones.

I guess FRED would be more interesting, because we can include international series: https://fred.stlouisfed.org/categories

Did we have something on that? How could we come up with a list of 1000 or so broadly mixed IDs?

How to address a very basic question: which tools are at hand to check whether a series is a candidate for SA

Maybe an idea for an exercise to consolidate a fundamental question that might otherwise be forgotten.
I noticed it myself when I started calling seas() like a monkey, without even asking whether what I was trying to do made sense.
I don't know where this would fit best, but it is surely part of the "getting started" section.
Before starting with seas(), which tools can I use to explore a series, especially when the seasonal pattern is not evident from eyeballing?
It would be nice to have a small dedicated section listing all the tools needed to squeeze a series before starting to model:

  • plot(ts)
  • plot(decompose(ts))
  • plot(stl(ts, s.window = "periodic"))
  • monthplot(ts)
  • acf(ts)
  • qs(seas(ts))
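
The checks listed above could be run in one go roughly as follows. This is only a sketch: AirPassengers stands in for whatever series is being examined, the ACF is taken on the differenced log series (an assumption, so that the trend does not dominate), and the qs() step is skipped when the seasonal package is not installed.

```r
# Exploratory checks before calling seas(); AirPassengers is a stand-in series
x <- AirPassengers

plot(x)                                  # raw series: trend and seasonal swings
plot(decompose(x))                       # classical decomposition
plot(stl(x, s.window = "periodic"))      # loess-based decomposition
monthplot(x)                             # one sub-series per calendar month
acf(x)                                   # autocorrelation of the raw series

# ACF of the differenced log series; for a ts object, lags are in years,
# so element 13 corresponds to a lag of 12 months
seasonal_acf <- acf(diff(log(x)), lag.max = 24, plot = FALSE)
print(seasonal_acf$acf[13])

# QS test for seasonality, only if the seasonal package is available
if (requireNamespace("seasonal", quietly = TRUE)) {
  print(seasonal::qs(seasonal::seas(x)))
}
```

A large positive autocorrelation at the seasonal lag is one informal hint that the series is a candidate for adjustment; the QS statistics give a more formal answer.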

Migrate a few cbar.sa ts objects into seasonalbook

If the source of the book is made open, cbar.sa would be needed to replicate some of the examples. If that is not going to happen, a few ts objects should be made part of the seasonalbook pkg.

Use budg_exp for a case study on seasonal break

  • show regressors
  • if the F-test changes by a lot when adding the regressor, we have a seasonal break
  • would a model.span work? ARIMA is then fitted only on the second half of the series
  • series.modelspan (less strict) and series.span (more strict)
  • compare ARIMA parameters across both model spans: use series.span before 2016, then model.span for the second half
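
The idea of comparing ARIMA parameters across spans can be sketched in base R. Note the assumptions: this uses stats::arima() with an airline model as a stand-in for the X-13 span options, AirPassengers replaces budg_exp (which is not available here), and the split at the end of 1954 is arbitrary. The helper fit_airline() is hypothetical and only exists for this illustration.

```r
# Sketch: fit the same airline model on the full series and on two sub-spans,
# then compare the estimated MA parameters (base R stand-in, not X-13)
x <- log(AirPassengers)

# Hypothetical helper: ARIMA(0,1,1)(0,1,1)12
fit_airline <- function(y) {
  arima(y, order = c(0, 1, 1),
        seasonal = list(order = c(0, 1, 1), period = 12))
}

first_half  <- window(x, end = c(1954, 12))    # arbitrary split point
second_half <- window(x, start = c(1955, 1))

coefs <- rbind(
  full        = coef(fit_airline(x)),
  first_half  = coef(fit_airline(first_half)),
  second_half = coef(fit_airline(second_half))
)
print(round(coefs, 3))
```

If the seasonal MA parameter differs strongly between the two halves, that would be one informal indication that the seasonal pattern has changed.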

Case study: Seasonal break

We discussed a good example last time. Which one was it?

@andreranza Could you add a section to chapter 11?

An example for a case study is given in: 31-holidays.html#case-study-azerbaijani-retail-sales

Real-world transaction data

Taking this here from an e-mail.
This shows real transaction data at a daily frequency. We can see notable peaks on Wed and Tue. Sun and Sat are missing.

Would it make sense to add this to the trading days chapter as part of the introduction? It would make clear that a month containing more or fewer of these days might affect the final adjustment.
Would it make sense to make a rough estimate of the impact under different scenarios?

library(ggplot2)

trans_df <- 
  tibble::tibble(
    value_date = as.Date(c(
      "2022-07-01", "2022-07-04", "2022-07-05", "2022-07-06", "2022-07-07",
      "2022-07-08", "2022-07-13", "2022-07-14", "2022-07-15", "2022-07-18",
      "2022-07-19", "2022-07-20", "2022-07-21", "2022-07-22", "2022-07-25",
      "2022-07-26", "2022-07-27", "2022-07-28", "2022-07-29", "2022-08-01",
      "2022-08-02", "2022-08-03", "2022-08-04", "2022-08-05", "2022-08-08",
      "2022-08-09", "2022-08-10", "2022-08-11", "2022-08-12", "2022-08-15",
      "2022-08-16", "2022-08-17", "2022-08-18", "2022-08-19", "2022-08-22",
      "2022-08-23", "2022-08-24", "2022-08-25", "2022-08-26", "2022-08-29",
      "2022-08-30", "2022-08-31", "2022-09-01", "2022-09-02", "2022-09-05",
      "2022-09-06", "2022-09-07", "2022-09-08", "2022-09-09", "2022-09-12",
      "2022-09-13", "2022-09-14", "2022-09-15", "2022-09-16", "2022-09-19",
      "2022-09-20", "2022-09-21", "2022-09-22", "2022-09-23", "2022-09-26",
      "2022-09-27", "2022-09-28", "2022-09-29", "2022-09-30"
    )),
    amount = c(
      30293627, 29684302, 18354636, 35183606, 9582177, 14897203, 1069765, 43384861,
      39524015, 86802632, 6614005, 17761052, 8276755, 21152542, 16291322, 7772429,
      20393879, 5318004, 11859141, 23864014, 12376091, 279657, 3614375, 19932634,
      586553, 33552104, 33483075, 8039351, 6629124, 14459099, 161360799, 26694683,
      12940912, 32584637, 23363951, 23351179, 9331648, 5428734, 10499438, 7540658,
      6299517, 25319539, 8195559, 57542928, 31303483, 269582871, 18277812, 39795633,
      36711, 16797389, 20370978, 22465056, 43725288, 73089307, 36169962, 93165179,
      107295392, 69236616, 19283321, 42972503, 61393684, 329600009, 78949695,
      119115555
    ),
    day = ordered(
      c(
        "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Wed", "Thu", "Fri", "Mon", "Tue",
        "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed",
        "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu",
        "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri",
        "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon",
        "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri"
      ),
      levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
    ),
  )

trans_df |>
  ggplot(aes(x = value_date, y = amount)) +
  geom_line() +
  scale_x_date(date_breaks = 'day', date_labels = '%Y-%m-%d, %a') +
  theme_minimal() +
  theme(
    panel.grid.minor.x = element_blank(),
    axis.text.x = element_text(angle = 90, size = rel(0.6), vjust = -0.01)
  )

Created on 2024-01-17 with reprex v2.0.2
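
To make the trading day argument concrete, one could count how often each weekday occurs in each month of the sample period. The following is a minimal base R sketch; the variable names are made up for illustration, and it only counts calendar days without estimating any effect.

```r
# Count how often each weekday occurs per month, July to September 2022
days <- seq(as.Date("2022-07-01"), as.Date("2022-09-30"), by = "day")

wday_levels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
# as.POSIXlt()$wday is locale-independent: 0 = Sunday, ..., 6 = Saturday
wday  <- factor(wday_levels[as.POSIXlt(days)$wday + 1], levels = wday_levels)
month <- format(days, "%Y-%m")

td_counts <- table(month, wday)
print(td_counts)
```

Months differ in which weekdays occur five times instead of four, so if most of the transaction volume falls on particular weekdays, the monthly totals shift with the calendar alone.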

Series frequency for book

@christophsax - Do you plan to discuss only monthly and quarterly series in the book? This isn't crucial to decide right now, but some aspects of the software are different if we assume only monthly/quarterly data. For example, transform.aicdiff has a default of -2 for monthly/quarterly but 0 for anything else.

Improve outlier tables layout

Each chapter discussing arguments should have a table with the associated options.

Show:

  • what are the important options (only relevant and important)
  • description possibly made easier compared to the X13 manual and more compact
  • each option has an associated example of a seas() call

Draft a PR for the outliers chapter; then, if it works well, do the same for irregular holidays.

Gentle seasonal adjustment

If in doubt:

  • don't pick short seasonal filter
  • don't choose an exotic arima
  • set options so that you stay "moderate"
  • might suggest that SEATS gives a smoother adjustment?
  • what if you adjust a rnorm?
  • ...

So that you don't mess up your data.

  • What justifications?
  • What is the reason not to do it?
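
The "adjust a rnorm" question can be tried out quickly. The sketch below is an assumption-laden stand-in: it uses base R's stl() rather than seas(), and a simulated white-noise series with an arbitrary seed and length. It shows that a decomposition will still attribute some variance to a "seasonal" component even when there is none.

```r
# Sketch: what happens when pure noise is "seasonally adjusted"?
set.seed(1)                                   # arbitrary seed
noise <- ts(rnorm(120), start = c(2010, 1), frequency = 12)

fit <- stl(noise, s.window = "periodic")      # stand-in for seas()
seasonal_part <- fit$time.series[, "seasonal"]
adjusted <- noise - seasonal_part

# Share of variance that the decomposition attributes to seasonality
print(var(seasonal_part) / var(noise))

plot(cbind(noise, seasonal_part, adjusted), main = "Adjusting white noise")
```

Any nonzero seasonal share here is spurious by construction, which is one argument for checking for seasonality first and for staying "moderate" when in doubt.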

Frequency Domain

Can we visualize a sum of sines and cosines that represents a model / series?

Simplify? Perhaps keep in optional boxes?
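
One possible visualization, as a rough sketch only: build a monthly seasonal pattern from the first three harmonics and plot the components next to their sum. The amplitudes and the number of harmonics are made-up values chosen for illustration, not estimates from any series.

```r
# Sketch: a seasonal pattern as a sum of sines and cosines (monthly data)
n <- 48
t <- seq_len(n)

amp_cos <- c(2, 1, 0.5)       # hypothetical cosine amplitudes
amp_sin <- c(1, 0.5, 0.25)    # hypothetical sine amplitudes

# Column k holds the k-th harmonic with frequency 2 * pi * k / 12
harmonics <- sapply(1:3, function(k) {
  amp_cos[k] * cos(2 * pi * k * t / 12) + amp_sin[k] * sin(2 * pi * k * t / 12)
})
seasonal_sum <- ts(rowSums(harmonics), start = c(2020, 1), frequency = 12)

op <- par(mfrow = c(2, 1), mar = c(3, 4, 2, 1))
matplot(t, harmonics, type = "l", lty = 1,
        ylab = "harmonic", main = "Individual harmonics")
plot(seasonal_sum, ylab = "sum", main = "Sum of harmonics")
par(op)
```

The sum repeats every 12 months and adds up to zero over a full year, which is what one would expect from a stable seasonal component.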

Minimum viable product

In my opinion, we still need the following. FYI @jlivsey

Consistent use of case studies

  • Use of AirPassengers in main text, provide at least one case study per chapter. For MVP, we don't need a case study everywhere, but handling it consistently would be good.
  • Currently missing in
    • regARIMA Model
    • X11
    • SEATS
    • Indirect vs direct adjustment
    • Annual constraining
    • Quality measures
    • Revisions

Consistent use of Spec Arg overviews

Consistent Use of Boxes

  • Define in introduction (I think we did that at some point, but I cannot find it)
    • Frequency Domain
    • Case Study
    • ...

Consistent Use of Exercises

  • All chapters should have at least 4 exercises

Prettify chapters

  • No code overload
  • Reasonable paragraph length
  • Language
  • Use of udg()

Smooth

  • 1 Introduction

Rocky

  • 15 Quality measures
  • 16 Revisions

JL's Census Windows PC

My Windows PC complains that the file tree contains file names with a backslash, i.e. //summarys.html

As a workaround, I followed this git issue

Disable core.protectNTFS:

git config --global core.protectNTFS false
