touringplans's Introduction

touringplans

The goal of touringplans is to provide access to Disney World Ride Wait Time Datasets curated by the TouringPlans.com team.

Installation

You can install the development version of touringplans with:

devtools::install_github("LucyMcGowan/touringplans")

You can find a list of all data sets along with variable information on the touringplans package website

Example

The touringplans_2018 data frame contains wait time data aggregated by hour for 14 attractions, along with some park-level daily metadata. The example below demonstrates how wait times by day are associated with the Ticket Season. The example below uses the tidyverse package to assist with data wrangling.

library(touringplans)
library(tidyverse)
touringplans_2018 %>%
  count(attraction_name)
#> # A tibble: 14 × 2
#>    attraction_name                                           n
#>    <chr>                                                 <int>
#>  1 Alien Swirling Saucers                                 2718
#>  2 Avatar Flight of Passage                               5147
#>  3 DINOSAUR                                               4884
#>  4 Expedition Everest - Legend of the Forbidden Mountain  4950
#>  5 Kilimanjaro Safaris                                    4887
#>  6 Na'vi River Journey                                    5073
#>  7 Pirates of the Caribbean                               5168
#>  8 Rock 'n' Roller Coaster Starring Aerosmith             5062
#>  9 Seven Dwarfs Mine Train                                5622
#> 10 Slinky Dog Dash                                        2691
#> 11 Soarin' Around the World                               5203
#> 12 Spaceship Earth                                        5078
#> 13 Splash Mountain                                        4964
#> 14 Toy Story Mania!                                       5045

We can aggregate the hourly posted wait time data into an avarage wait time by day for each ride.

agg_2018 <- touringplans_2018 %>%
  group_by(park_date, attraction_name, park_ticket_season) %>%
  summarise(average_diff = mean(wait_minutes_posted_avg - wait_minutes_actual_avg, na.rm = TRUE), .groups = "drop") %>%
  filter(average_diff > -300) # remove weird data points (more on this later!)

On average, Disney over predicts wait times by 15 and a half minutes per day on the 14 rides included in this dataset during peak season, around 11 and a half minutes per day during regular season, and around 13 minutes per day during value season.

lm(average_diff ~ park_ticket_season, data = agg_2018) %>%
  summary()
#> 
#> Call:
#> lm(formula = average_diff ~ park_ticket_season, data = agg_2018)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -82.401  -8.734  -3.066   5.487 198.772 
#> 
#> Coefficients:
#>                           Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                15.6420     0.6006  26.045  < 2e-16 ***
#> park_ticket_seasonregular  -3.9076     0.7074  -5.524 3.53e-08 ***
#> park_ticket_seasonvalue    -2.4144     0.8378  -2.882  0.00397 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 17.57 on 3967 degrees of freedom
#> Multiple R-squared:  0.007723,   Adjusted R-squared:  0.007223 
#> F-statistic: 15.44 on 2 and 3967 DF,  p-value: 2.095e-07

ggplot(agg_2018, aes(x = park_date, y = average_diff)) +
  geom_point(aes(color = park_ticket_season)) +
  geom_line() +
  geom_hline(yintercept = 0, lty = 2) +
  facet_wrap(~ attraction_name, ncol = 2) + 
  labs(y = "Average difference in posted and actual wait time",
       color = "Ticket Season")

We can see that there are some attractions that this holds true for moreso than others.

library(ggridges)

ggplot(agg_2018, aes(x = average_diff, y = park_ticket_season, fill = park_ticket_season)) +
  geom_density_ridges() +
  geom_vline(xintercept = 0, lty = 2) +
  xlim(c(-30, 50)) +
  facet_wrap(~ attraction_name, ncol = 3) + 
  labs(x = "Average difference in posted and actual wait time",
       fill = "Ticket Season",
       y = "Ticket Season")

touringplans's People

Contributors

Stargazers

Watchers

touringplans's Issues

Some days in `touringplans_2018` have different values of `park_extra_magic_morning` by hour

I expected that the value of park_extra_magic_morning would be the same for every hour of every day, such that distinct(park_date, park_extra_magic_morning) would be 365. However, some days appear to have different values of park_extra_magic_morning depending on the hour. That doesn't seem right, but is there a reason we might expect that?

library(dplyr, warn.conflicts = FALSE)
library(touringplans)
distinct_dates <- touringplans_2018 |> 
  select(park_date, wait_hour, park_extra_magic_morning) |> 
  distinct(park_date, park_extra_magic_morning, .keep_all = TRUE)

nrow(distinct_dates)
#> [1] 662

distinct_dates |> 
  group_by(park_date) |> 
  summarize(n = n()) |> 
  filter(n > 1) |> 
  nrow()
#> [1] 297

distinct_dates |> 
  # example of date with conflicting EMM
  filter(park_date == "2018-01-02")
#> # A tibble: 2 × 3
#>   park_date  wait_hour park_extra_magic_morning
#>   <date>         <int>                    <dbl>
#> 1 2018-01-02         7                        0
#> 2 2018-01-02         8                        1

^{Created on 2024-05-17 with reprex v2.1.0}

Make column names ala Emily

https://emilyriederer.netlify.app/post/column-name-contracts

add wdwmeantemp to touringplans_2018

times showing as seconds without hms loaded

as discussed in r-causal/causal-inference-in-R#198 by @tgerke

It seems that without library(hms) (or library(tidyverse)) that hms time objects are showing as seconds.

touringplans::parks_metadata_raw |> dplyr::select(mkclose)
#> # A tibble: 2,079 × 1
#>    mkclose   
#>    <hms>     
#>  1 90000 secs
#>  2 90000 secs
#>  3 90000 secs
#>  4 86400 secs
#>  5 82800 secs
#>  6 75600 secs
#>  7 75600 secs
#>  8 75600 secs
#>  9 75600 secs
#> 10 82800 secs
#> # ℹ 2,069 more rows

library(hms)

touringplans::parks_metadata_raw |> dplyr::select(mkclose)
#> # A tibble: 2,079 × 1
#>    mkclose
#>    <time> 
#>  1 25:00  
#>  2 25:00  
#>  3 25:00  
#>  4 24:00  
#>  5 23:00  
#>  6 21:00  
#>  7 21:00  
#>  8 21:00  
#>  9 21:00  
#> 10 23:00  
#> # ℹ 2,069 more rows