tidymodels / workshops

Website and materials for tidymodels workshops

Home Page: https://workshops.tidymodels.org

License: Creative Commons Attribution Share Alike 4.0 International

HTML 5.29% CSS 8.26% R 0.92% SCSS 0.54% JavaScript 84.98%


workshops

This repo contains tutorial materials for machine learning with tidymodels.

Organization

This repo is organized into directories:

  • slides/ has Quarto files for the latest version of our slides.
  • classwork/ contains Quarto files prepared for you to work along with the slides.
  • archive/ is the location for older versions of this workshop.

Code of Conduct

Please note that the workshops project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

CC BY-SA 4.0

Archiving Notes

To archive previous workshop notes:

  • Make a subdirectory in archive/ called YYYY-MM-workshop-name.
  • Copy the contents of slides/ into archive/YYYY-MM-workshop-name.
  • Copy the contents of classwork/ into archive/YYYY-MM-workshop-name.
  • Copy index.qmd into archive/YYYY-MM-workshop-name.
  • In index.qmd, remove slides/ from links to slides.
  • In _quarto.yml:
    • add an entry "archive/YYYY-MM-workshop-name/*qmd" under render.
    • add an entry "archive/YYYY-MM-workshop-name/classwork/*qmd" under resources.
  • In archive/YYYY-MM-workshop-name/, add a _metadata.yml file with the contents:
      execute:
        freeze: true
  • In the command line, run quarto render archive/YYYY-MM-workshop-name. This will regenerate the workshop slides under docs/archive/YYYY-MM-workshop-name.
  • Check that:
    • Running quarto render didn't change any files in docs/ outside of docs/archive/.
    • The generated slides are added to _freeze/archive/YYYY-MM-workshop-name rather than in archive/YYYY-MM-workshop-name.
    • The generated slides work (specifically, that filepaths to figures function correctly).
  • In index.qmd, add an entry in H2 "Past workshops" like [M YYYY](archive/YYYY-MM-workshop-name/) in workshop-name
  • If you are adding slides in a language other than English, update the navbar link in _quarto.yml.

Once the above changes are merged to main, make a GitHub Release noting the big-picture changes since the previous iteration of the workshop.
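The file-copying steps above could be scripted roughly as follows. This is a minimal base R sketch, not something that ships with the repo: the helper name archive_workshop() and its arguments are hypothetical, and keeping classwork/ as a subdirectory is an assumption inferred from the "archive/YYYY-MM-workshop-name/classwork/*qmd" resources entry. Editing index.qmd and _quarto.yml and running quarto render remain manual steps.

```r
# Hypothetical helper: the name and arguments are illustrative only.
archive_workshop <- function(root, name) {
  dest <- file.path(root, "archive", name)
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)

  # Copy the contents of slides/ directly into the archive directory.
  slide_files <- list.files(file.path(root, "slides"), full.names = TRUE)
  file.copy(slide_files, dest, recursive = TRUE)

  # Assumption: classwork/ is kept as a subdirectory of the archive.
  file.copy(file.path(root, "classwork"), dest, recursive = TRUE)

  # Copy the landing page.
  file.copy(file.path(root, "index.qmd"), dest)

  # Write the freeze setting described in the steps above.
  writeLines(c("execute:", "  freeze: true"), file.path(dest, "_metadata.yml"))

  invisible(dest)
}

# Demo in a throwaway directory so that no real checkout is touched.
root <- file.path(tempdir(), "workshops-demo")
dir.create(file.path(root, "slides"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "classwork"), showWarnings = FALSE)
file.create(file.path(root, "slides", "01-introduction.qmd"))
file.create(file.path(root, "classwork", "01-classwork.qmd"))
file.create(file.path(root, "index.qmd"))

dest <- archive_workshop(root, "2023-07-example")
list.files(dest, recursive = TRUE, all.files = TRUE)
```

Running the helper against a temporary directory first is a cheap way to check the resulting layout before touching the real archive/ directory.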

workshops's People

Contributors

davisvaughan, dgrtwo, edgararuiz, emilhvitfeldt, fvd, hfrick, juliasilge, simonpcouch, topepo


workshops's Issues

Deck 4 - Embarrassingly parallel

Let's edit this slide, or maybe make two, to show parallel processing for Windows as well as Mac. (Linux? Probably those people already know how to do this on their computer. 😆)
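One cross-platform option such a slide could show is a socket (PSOCK) cluster from base R's parallel package, which starts fresh R worker processes and therefore behaves the same on Windows, macOS, and Linux. The sketch below only demonstrates the cluster itself, with an arbitrary worker count of 2; how the workshop registers a parallel backend for the tidymodels tuning functions is not shown here.

```r
library(parallel)

# PSOCK clusters launch separate R processes, so they do not rely on
# forking (which is unavailable on Windows).
cl <- makePSOCKcluster(2)

# Each element is processed independently: an embarrassingly parallel task.
squares <- parLapply(cl, 1:4, function(i) i^2)

# Always release the workers when finished.
stopCluster(cl)

unlist(squares)
```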

prep and bake

There were a lot of questions about how to see the results of the recipe (around slide 25 of 05). It might be helpful to show a slide about prep() and bake(new_data = NULL).
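A slide along these lines might look like the sketch below. It uses the built-in mtcars data and a single step_normalize() step purely for illustration; the workshop's own recipe and data are different.

```r
library(recipes)

# An illustrative recipe: centre and scale every numeric predictor.
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# prep() estimates the required statistics (here, means and standard
# deviations) from the training data stored in the recipe.
prepped <- prep(rec)

# bake(new_data = NULL) returns the processed training set.
baked <- bake(prepped, new_data = NULL)

head(baked)
```

Passing a data frame as new_data instead of NULL applies the same estimated transformations to other data, such as a test set.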

distance annotation is missing

05 slide 46 has an annotation pointing to https://workshops.tidymodels.org/slides/annotations.html#distance, but there isn't a distance annotation on that page.

Move info on rank deficient fit earlier

Folks want to understand what the rank deficient fit means the first time they see it, so we should move that annotation up to the first fit_resamples() in Deck 5.
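For the annotation, a small base R illustration of rank deficiency might help. The sketch below uses simulated data (not the workshop data) in which one predictor is an exact multiple of another, so the linear model cannot estimate separate coefficients for both.

```r
set.seed(1)

d <- data.frame(x1 = rnorm(20))
d$x2 <- 2 * d$x1               # x2 carries no information beyond x1
d$y <- 1 + d$x1 + rnorm(20)

fit <- lm(y ~ x1 + x2, data = d)

# lm() cannot estimate the redundant coefficient and reports it as NA.
coef(fit)

# The rank of the model matrix is smaller than the number of coefficients.
fit$rank
length(coef(fit))
```

Predicting from a fit like this triggers a rank-deficiency warning, which is what attendees see during resampling.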

2023 - conf - describing taxi "pre-processing"

The mutate(month = factor(...)) and drop_na() steps were both described as no-no's when showing the slide. It might be worth thinking about 1) whether we want to do both of these (perhaps in the internals of data_taxi()?) and/or 2) how we might describe when and why this is fine for the purposes of the workshop.

Shrink height in `hexes()`

If we shrink the height= modifier in hexes() from 1.16 to 1.10, I think the spacing between the hexes and the first element is better. This bugs me and @juliasilge 😛

We'd have to rerender the full site

Before

[Screenshot]

After

[Screenshot]

Prep at least a `.qmd` for end of first day

If we get through the first day really quickly, we will want some back-up content. Let's prep at least an .Rmd file to walk through more content. First idea: using the tree frogs data to introduce stacks

Getting Help slide seems confusing

I'm not sure I understand the difference between the first two bullets. Plus, don't we have sticky notes? Should we refer to those instead?

[Screenshot]

2023 conf - whole game slides

This is more of a personal opinion

I think it would be neat if the 3 models (diamonds) were contained in a container to show it is "a pool of possible models"

Deck 3 - Speaker note on how workflows/hardhat handles levels better than `model.matrix()`

Slide 24

It would be nice to have a speaker note for this bullet that says exactly how workflows is better, so we don't have to think about it while we are up there.

For the speaker notes, hardhat::scream() does these two nice things:

  • Enforces that new levels are not allowed at prediction time (this is an optional check that can be turned off)
  • Restores missing levels that were present at fit time, but happen to be missing at prediction time (like, if your "new" data just doesn't have an instance of that level)
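The second behaviour could be illustrated in the speaker notes with a sketch like the one below. The toy data frames are made up for illustration, and the prototype is taken to be a zero-row slice of the training data.

```r
library(hardhat)

# Training data with three factor levels.
train <- data.frame(f = factor(c("a", "b", "c")))

# Prototype: a zero-row slice that remembers column types and levels.
ptype <- train[0, , drop = FALSE]

# "New" data that happens to contain no instance of level "c".
new_data <- data.frame(f = factor(c("a", "b")))
levels(new_data$f)

# scream() casts the new data to the prototype, restoring the level
# that was present at fit time.
checked <- scream(new_data, ptype)
levels(checked$f)
```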

[Screenshot]

Deck 4 - Include histogram/density chart of outcome when talking about stratification

[Screenshot]

When we talk about stratification here for the first time on slide 35, I feel like it would be useful to have an image of the outcome handy so we can talk about how stratification preserves the outcome distribution in each split

Something like:

ggplot(tree_frogs) + geom_histogram(aes(latency))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Add 404 page

Since we are archiving slides over time, we are bound to run into people using dead links. A 404 page pointing to the archive should fix part of this problem

using modeldatatoo

pins is a tricky install for some folks with locked-down laptops, and modeldatatoo needs it for the data_taxi() backend.

There may be ways to get around that in modeldatatoo, but also maybe another argument for urging folks to transition to the cloud instance 🙈

Have an intuitive visual description of what degrees of freedom means

In the tuning slides, I think a visual depiction would help learners grasp what the degrees of freedom means for a model.

Here are two ideas:

Include this image from your book as a slide (with a talk track about "how bendy" it is):

image

Visualize how the effect of distance to goal is nonlinear, with a plot like this one:

library(tidyverse)

# Distance from each shot to the goal
example_data <- nhl_train %>% 
  mutate(distance = sqrt((89 - abs(coord_x))^2 + abs(coord_y)^2))

# Share of shots on goal within each distance bucket
example_data %>%
  group_by(distance = cut(distance, c(0, seq(10, 60, 5), 100))) %>%
  summarize(pct_on_goal = mean(on_goal == "yes"), n = n()) %>%
  mutate(distance = fct_recode(distance, "<10" = "(0,10]", ">60" = "(60,100]")) %>%
  ggplot(aes(distance, pct_on_goal)) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_line(group = 1, linewidth = 2) +
  scale_y_continuous(labels = scales::percent) +
  expand_limits(y = 0) +
  labs(x = "Distance to goal (bucketed)",
       y = "% of shots in this bucket that are on goal")

image

cc @juliasilge

Deck 3 - Model explanations

I think my biggest meta comment about deck 3 is that I feel like we don't explain the models we are talking about.

We said that our prereq for this workshop is basic tidyverse knowledge, so I don't think we can assume people know how rpart works.

Like, in this slide we fit an rpart model and then the next few slides use predict() on it and show some of the rpart plotting methods, but I don't see a place where we really stop and discuss what this kind of model does

[Screenshot]

day 1 - consider how much to peek at test data

re: the slide where we print out taxi_test

I find it weird we are like "don't look at the testing data", and then we turn around and look at it 😆

Maybe we should add a "don't try this at home" for this slide? 😄

Originally posted by @EmilHvitfeldt in #108 (comment)

Should we just print out the dim()? Or even just say it's a data frame?

2023 - conf - Update references to "tomorrow" to "Advanced tidymodels"

We sometimes refer to "day 2" or "tomorrow". For the workshops in Chicago, this should be updated to refer to "Advanced tidymodels". This issue is to keep track of such references, which we leave in for NYR but need to update afterwards.

Deck 4 has the following on random forests:

Often works well without tuning hyperparameters (more on this tomorrow!), as long as there are enough trees

choosing the package to install remotes

All three users who had issues during the first classwork ran into problems because of pak installation or credential setup. For at least two of them, switching to devtools::install_github() fixed the issue without having to go down the gitcreds rabbit hole.
