datacamp / tidymetrics

Dimensional modeling done the tidy way!

License: Other

R 100.00%

tidymetrics's People

carlganz machow dgrtwo rakshitb j450h1 metabdel verrah pursuitofdatascience mdip32

tidymetrics's Issues

Add support for a metric store

Add S3 generics and functions for local and S3 storage.

metric_read
metric_write
metric_delete
metrics_list
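A minimal sketch of how the four functions above could be laid out as S3 generics with a local-directory backend. Everything here is hypothetical: `local_store()`, the `metric_path()` helper, and the choice of `.rds` files are assumptions for illustration, not part of tidymetrics. A backend for Amazon S3 would add methods for its own class in the same way.

```r
# Hypothetical sketch: none of these functions exist in tidymetrics.

# Constructor for a store backed by a local directory.
local_store <- function(path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  structure(list(path = path), class = c("local_store", "metric_store"))
}

# Generics named after the functions listed in the issue.
metric_write  <- function(store, metric, name, ...) UseMethod("metric_write")
metric_read   <- function(store, name, ...) UseMethod("metric_read")
metric_delete <- function(store, name, ...) UseMethod("metric_delete")
metrics_list  <- function(store, ...) UseMethod("metrics_list")

# Helper: map a metric name to an .rds file inside the store directory.
metric_path <- function(store, name) {
  file.path(store$path, paste0(name, ".rds"))
}

metric_write.local_store <- function(store, metric, name, ...) {
  saveRDS(metric, metric_path(store, name))
  invisible(name)
}

metric_read.local_store <- function(store, name, ...) {
  readRDS(metric_path(store, name))
}

metric_delete.local_store <- function(store, name, ...) {
  invisible(file.remove(metric_path(store, name)))
}

metrics_list.local_store <- function(store, ...) {
  sub("\\.rds$", "", list.files(store$path, pattern = "\\.rds$"))
}

# Usage: write one metric to a fresh temporary store and list it.
store <- local_store(tempfile("metric-store-"))
metric_write(store, data.frame(date = Sys.Date(), value = 1), "demo_metric")
metrics_list(store)
```

Dispatching on the store class keeps calling code identical across backends, which is the main reason to prefer generics over separate `read_local()` / `read_s3()` style functions.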

cross_by_periods behavior

Hi, thanks for creating this great package. I have a question about the behavior of cross_by_periods. In the sample data below, the max date is 3/21/2020:

library(dplyr)
library(tidymetrics)

# Ten consecutive days ending on 2020-03-21
df <- tibble::tibble(
  date = structure(c(18333, 18334, 18335, 18336, 18337, 18338, 18339,
    18340, 18341, 18342), class = "Date"),
  count = c(8, 23, 38, 64, 97, 118, 156, 229, 314, 426)
)

# Rolling 7-day mean of count
df %>%
  cross_by_periods("day", windows = 7) %>%
  summarise(roll = mean(count))

Two questions:

The function seems to add days depending on the window you specify (i.e., the max date in the data is 3/21, but the function adds days until 3/27 for the rolling_7d calculation). Is this the desired behavior?
The function starts a 7-day rolling window from the first day (instead of the 7th). Is it possible to adjust this?

Thanks for your time.
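For comparison, here is a hand-rolled sketch in base R of the behavior the question seems to be asking for: a trailing 7-day mean that only reports dates present in the data and only once a full window is available. This is not how tidymetrics implements cross_by_periods; the `in_window()` helper and the `n_days` column are names made up for this illustration.

```r
# Same sample data as in the issue.
df <- data.frame(
  date = structure(c(18333, 18334, 18335, 18336, 18337, 18338, 18339,
    18340, 18341, 18342), class = "Date"),
  count = c(8, 23, 38, 64, 97, 118, 156, 229, 314, 426)
)

window <- 7

# TRUE for observations that fall in the `window` days ending on `end`.
in_window <- function(end) df$date > end - window & df$date <= end

idx <- seq_along(df$date)
rolling <- data.frame(
  date = df$date,
  # Number of observed days inside each trailing window.
  n_days = vapply(idx, function(i) sum(in_window(df$date[i])), integer(1)),
  # Mean of count over each trailing window.
  roll = vapply(idx, function(i) mean(df$count[in_window(df$date[i])]), numeric(1))
)

# Keep only window end dates that have a complete 7-day history.
rolling <- rolling[rolling$n_days == window, ]
rolling
```

With ten days of data, this leaves four rows, and the last window ends on the last observed date rather than extending past it.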

add postgres unit tests

Hey @ramnathv, following up from pairing: I opened a PR with the code I was using to test tidymetrics against postgres. Let me know if there are any adjustments that would be useful!

Right now it's running well against a local db (spun up using the included docker-compose.yml file), but it would need a couple of tweaks to run on Travis.

I set it up to work with a subset of flights data, and copied one of the existing tests to work against postgres.

A couple of things to note:

I set the port to 5433 locally (since my system-wide postgres uses the default), and to the default 5432 on Travis.
I think there is a slight issue with the calculation of calendar dates (copied from datacamp's data-pipeline-views), causing the test to fail...

The datacamp code sets the date here to "2012-12-31", but the test expects "2013-01-01".
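One way a one-day gap like this can arise is when a date is floored to the start of its calendar week. This is an assumption, since the issue does not say which period the failing test uses. The sketch below uses a hypothetical `floor_to_monday()` helper to show that 2013-01-01 floors to 2012-12-31 when weeks start on Monday.

```r
# Hypothetical helper: floor a date to the Monday that starts its week.
floor_to_monday <- function(d) {
  weekday <- as.integer(format(d, "%u"))  # 1 = Monday, ..., 7 = Sunday
  d - (weekday - 1)
}

first_day <- as.Date("2013-01-01")  # a Tuesday
floor_to_monday(first_day)          # the Monday of that week: 2012-12-31
```

If that is the cause, the fix is a matter of agreeing on whether a period is labeled by its first calendar day or by the first day present in the data.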

Explore the use of pins package for flexible metric stores

The rstudio/pins package has support for local, S3, and a bunch of other data stores.
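A rough sketch of what a pins-backed store could look like, assuming the pins 1.0+ API (`board_temp()`, `pin_write()`, `pin_read()`, `pin_list()`, `pin_delete()`). The metric here is a plain data frame with made-up values, not a real tidymetrics object.

```r
library(pins)

# A temporary local board. board_folder() or board_s3() could be swapped in
# without changing the calls below.
board <- board_temp()

# Stand-in for a metric; the column names are illustrative only.
metric <- data.frame(
  date = as.Date("2020-03-21"),
  period = "day",
  value = 426
)

# Write, list, read back, and delete the metric.
pin_write(board, metric, name = "demo_metric", type = "rds")
pin_list(board)
restored <- pin_read(board, "demo_metric")
pin_delete(board, "demo_metric")
```

Because the board object carries the storage details, the same four calls cover each backend pins supports.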

Misprint in function description

The description of cross_by_dimensions has a misprint in the word "All" (an extra letter "l"):

replaces the value of the column with the word "Alll"

create_metrics should preserve dimension factor levels in the metadata

This allows more programmatic (as opposed to manual) customization of factor levels when saving metrics.
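As an illustration of the idea, the levels of any factor-typed dimension columns can be collected into a named list that could then be stored alongside the other metadata. `dimension_levels()` is a hypothetical helper, not a tidymetrics function, and where the result would live in the metadata is left open.

```r
# Hypothetical helper: collect factor levels for the given dimension columns.
dimension_levels <- function(tbl, dimensions) {
  cols <- tbl[dimensions]
  is_fct <- vapply(cols, is.factor, logical(1))
  # Non-factor dimensions are skipped; they have no levels to preserve.
  lapply(cols[is_fct], levels)
}

tbl <- data.frame(
  plan = factor(c("free", "pro", "All"), levels = c("All", "free", "pro")),
  country = c("US", "BE", "All"),
  value = 1:3,
  stringsAsFactors = FALSE
)

levels_meta <- dimension_levels(tbl, c("plan", "country"))
levels_meta
```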

Make create_metrics less opinionated

I was using tidymetrics in a screencast and I noticed how much more opinionated it is than it needs to be. This makes it difficult to get someone up and running.

I'd be very happy to implement this myself but wanted to run the approach by you @ramnathv

Current interface

Right now, create_metrics() requires the following in the YAML header:

name, which it then splits into three parts (because the first is generally metrics_), and the second and third turn into category and subcategory
owner
metrics, with title and description for each metric
dimensions, with title and description for each dimension

Proposal

I'm proposing a new interface. First, all the metadata is optional, so that if you run create_metrics on a table with a date column you'll get something right away.

category (optional)
subcategory (even more optional)
owner (optional)
metrics (optional): If this doesn't have anything in it, the titles will be the metric IDs, and the descriptions could be blank.
dimensions (optional): If this doesn't have anything in it, the titles will be the dimension IDs, and the descriptions could be blank.

(For backward compatibility we could maybe allow a name that gets split up into category/subcategory, but I'm not even sure about that).

How we'd handle this in shinymetrics is an open question. If a description is NA, it could show no description at all, or could say something like "To fill in a description, add description: to the metric's metadata" or whatever.

Note that this would make the metric full IDs less strict; they wouldn't always be category_subcategory_prefix_metric, they might just be category_prefix_metric or just prefix_metric. But I think it's worth it to have people get up and running with a metric really quickly.
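The defaulting rules in the proposal could be sketched roughly as follows. Both helpers are hypothetical and only illustrate the fallback logic: a missing title falls back to the ID, a missing description becomes NA, and the full ID is built from whichever of category and subcategory are supplied.

```r
# Fallback operator (also in base R from 4.4; redefining it here is harmless).
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical helper: fill in title/description for each metric or dimension ID.
fill_field_metadata <- function(ids, metadata = NULL) {
  filled <- lapply(ids, function(id) {
    entry <- metadata[[id]]
    list(
      title = entry$title %||% id,
      description = entry$description %||% NA_character_
    )
  })
  setNames(filled, ids)
}

# Hypothetical helper: build a full metric ID from the optional parts.
# `metric` is the metric column name, e.g. "nb_volume".
metric_full_id <- function(metric, category = NULL, subcategory = NULL) {
  paste(c(category, subcategory, metric), collapse = "_")
}

fill_field_metadata(c("nb_volume", "usd_close"),
                    list(nb_volume = list(title = "Volume")))
metric_full_id("nb_volume", "stock", "prices")
metric_full_id("nb_volume")
```

The last two calls show the less strict IDs the proposal mentions: the same function yields "stock_prices_nb_volume" with full metadata and plain "nb_volume" with none.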

check for missing dimensions

When making a metric with create_metrics(), it should check whether the documentation for any dimension is missing (similar to how it checks for missing metric documentation).
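A minimal sketch of such a check, assuming the metadata is a list parsed from the YAML header with a `dimensions` entry keyed by dimension name. `check_dimension_docs()` is a hypothetical name, and emitting a warning rather than an error is a choice made here, not something the issue specifies.

```r
# Hypothetical helper: warn about dimension columns with no metadata entry.
check_dimension_docs <- function(dimension_cols, metadata) {
  missing_docs <- setdiff(dimension_cols, names(metadata$dimensions))
  if (length(missing_docs) > 0) {
    warning(
      "Missing documentation for dimension(s): ",
      paste(missing_docs, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(missing_docs)
}

metadata <- list(
  dimensions = list(
    symbol = list(title = "Stock", description = "Stock symbol")
  )
)

# "country" has no entry in the metadata, so it is reported.
undocumented <- suppressWarnings(
  check_dimension_docs(c("symbol", "country"), metadata)
)
undocumented
```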

When there's only one dimension, the All tab doesn't appear

Is this intentional?

Reproducible example. YAML header:

---
name: metrics_stock_prices
owner: drob
metrics:
  usd_close:
    title: Closing Price
    description: Close price, in USD, at the end of this time period.
  nb_volume:
    title: Volume
    description: Number of shares traded
dimensions:
  symbol:
    title: Stock
    description: Stock symbol
---

Code:

library(dplyr)
library(tidymetrics)
library(shinymetrics)
library(tidyquant)
stocks <- tq_get(c("AAPL", "GOOG"))

stocks_summarized <- stocks %>%
  cross_by_dimensions(symbol) %>%
  cross_by_periods(c("day", "week")) %>%
  summarize(nb_volume = sum(volume),
            usd_close = last(close))

m <- create_metrics(stocks_summarized)

preview_metric(m$stock_prices_nb_volume)

Result:

Cross by dimensions should be able to calculate 1-depth rather than all combinations

This makes the size of the intermediate table with k dimensions linear in k rather than 2^k.

library(dplyr)

# Full cross: every combination of "All" and actual values for wt and mpg
cross1 <- bind_rows(mutate(mtcars, wt = "All"), mtcars %>% mutate(wt = as.character(wt)))
result_full <- bind_rows(mutate(cross1, mpg = "All"), cross1 %>% mutate(mpg = as.character(mpg)))

# Reduced cross, as sketched in the issue
cross1 <- bind_rows(mutate(mtcars, wt = "All"), mtcars %>% mutate(wt = as.character(wt))) %>% mutate(mpg = as.character(mpg))
cross2 <- bind_rows(cross1, mtcars %>% mutate(mpg = as.character(mpg)) %>% mutate(wt = as.character(wt)))

# Ideal interface something like
cross_by_dimension(mtcars, depth = NULL)
cross_by_dimension(mtcars, depth = 1)
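One possible reading of depth = 1 is that each output block keeps at most one dimension at its actual values and sets all the others to "All". That gives k + 1 copies of the table for k dimensions instead of 2^k. The sketch below implements that reading with dplyr. `cross_depth_one()` is a hypothetical function, not the tidymetrics implementation, and the interpretation of "depth" is an assumption.

```r
library(dplyr)

# Hypothetical helper: cross dimensions so that at most one is not "All".
cross_depth_one <- function(tbl, dims) {
  # Convert dimension columns to character so "All" can sit next to real values.
  tbl <- mutate(tbl, across(all_of(dims), as.character))

  # Block 0: every dimension set to "All" (the overall total).
  all_block <- mutate(tbl, across(all_of(dims), ~ "All"))

  # Blocks 1..k: keep one dimension and set every other one to "All".
  one_dim_blocks <- lapply(dims, function(keep) {
    mutate(tbl, across(all_of(setdiff(dims, keep)), ~ "All"))
  })

  bind_rows(all_block, one_dim_blocks)
}

crossed <- cross_depth_one(mtcars, c("cyl", "gear"))

# 3 blocks of 32 rows; a full cross of two dimensions would have 4 blocks.
nrow(crossed)
```

The trade-off is that combinations of two or more dimensions can no longer be filtered together downstream, which is the price of the linear table size.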
