datacamp / tidymetrics
Dimensional modeling done the tidy way!
License: Other
Add S3 generics and functions for local and S3 storage.
metric_read
metric_write
metric_delete
metrics_list
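A minimal sketch of what these four generics could look like for local storage, assuming a hypothetical `local_store` class and hypothetical argument names (`store`, `metric`, `name`); none of this is confirmed tidymetrics API. Metrics are saved as `.rds` files in a folder, and a method for an Amazon S3 bucket would presumably follow the same shape.

```r
# Hypothetical sketch: the `local_store` class and all signatures are assumptions.
local_store <- function(path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  structure(list(path = path), class = "local_store")
}

# S3 generics dispatching on the storage backend
metric_write  <- function(store, metric, name, ...) UseMethod("metric_write")
metric_read   <- function(store, name, ...) UseMethod("metric_read")
metric_delete <- function(store, name, ...) UseMethod("metric_delete")
metrics_list  <- function(store, ...) UseMethod("metrics_list")

# Helper: path of the .rds file backing a metric
metric_file <- function(store, name) {
  file.path(store$path, paste0(name, ".rds"))
}

metric_write.local_store <- function(store, metric, name, ...) {
  saveRDS(metric, metric_file(store, name))
  invisible(name)
}

metric_read.local_store <- function(store, name, ...) {
  readRDS(metric_file(store, name))
}

metric_delete.local_store <- function(store, name, ...) {
  invisible(file.remove(metric_file(store, name)))
}

metrics_list.local_store <- function(store, ...) {
  # Strip the extension so callers see metric names, not file names
  sub("\\.rds$", "", list.files(store$path, pattern = "\\.rds$"))
}

store <- local_store(file.path(tempdir(), "metrics"))
metric_write(store, data.frame(date = Sys.Date(), value = 1), "stock_prices_nb_volume")
metrics_list(store)
```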
Hi - thanks for creating this great package. I have a question on the behavior of cross_by_periods. In the sample data below, the max date is 3/21/2020.
library(dplyr)
library(tidymetrics)

df <- tibble::tibble(
  date = structure(c(18333, 18334, 18335, 18336, 18337, 18338, 18339,
                     18340, 18341, 18342), class = "Date"),
  count = c(8, 23, 38, 64, 97, 118, 156, 229, 314, 426)
)

df %>%
  cross_by_periods("day", windows = 7) %>%
  summarise(roll = mean(count))
Two questions:
1. The function seems to add days depending on the window you specify (i.e. the max date in the data is 3/21, but the function adds days until 3/27 for the rolling_7d calculation). Is this the desired behavior?
2. The function starts a 7-day rolling window from the first day (instead of the 7th). Is it possible to adjust this?
Thanks for your time.
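For comparison, here is a sketch of the behavior the second question asks for, written in base R outside tidymetrics: a trailing 7-day mean that is only reported once a full window is available and never extends past the last observed date. It uses `stats::filter()` and is not a description of how cross_by_periods works internally.

```r
# Same data as above: 2020-03-12 through 2020-03-21
df <- data.frame(
  date = as.Date("2020-03-12") + 0:9,
  count = c(8, 23, 38, 64, 97, 118, 156, 229, 314, 426)
)

window <- 7

# sides = 1 averages the current day and the (window - 1) days before it;
# the first (window - 1) positions have no full window and come back as NA.
df$roll_7d <- as.numeric(stats::filter(df$count, rep(1 / window, window), sides = 1))

# Keep only the days with a complete 7-day window
complete <- df[!is.na(df$roll_7d), ]
complete
```

With this data the first complete window ends on 2020-03-18, so `complete` has four rows and none later than 2020-03-21.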
Hey @ramnathv, following up from pairing: I opened a PR with the code I was using to test tidymetrics against Postgres. Let me know if there are any adjustments that would be useful!
Right now it's running well against a local database (spun up using the included docker-compose.yml file), but it would need a couple of tweaks to run on Travis.
I set it up to work with a subset of the flights data, and copied one of the existing tests to work against Postgres.
A couple of things to note:
The datacamp code sets the date here to "2012-12-31", but the test expects "2013-01-01".
The rstudio/pins package has support for local, S3, and a bunch of other data stores.
In the description of the cross_by_dimensions function, the word "All" is misspelled with an extra "l":
replaces the value of the column with the word "Alll"
This allows more programmatic (as opposed to manual) customization of factor levels when saving metrics.
I was using tidymetrics in a screencast and I noticed how much more opinionated it is than it needs to be. This makes it difficult to get someone up and running.
I'd be very happy to implement this myself but wanted to run the approach by you, @ramnathv.
Right now, create_metrics() requires the following in the YAML header:
name, which it then splits into three parts (because the first is generally metrics_), and the second and third turn into category and subcategory
owner
metrics, with title and description for each metric
dimensions, with title and description for each dimension
I'm proposing a new interface. First, all the metadata is optional, so that if you run create_metrics on a table with a date column you'll get something right away.
category (optional)
subcategory (even more optional)
owner (optional)
metrics (optional): If this doesn't have anything in it, the titles will be the metric IDs, and the descriptions could be blank.
dimensions (optional): If this doesn't have anything in it, the titles will be the metric IDs, and the descriptions could be blank.
(For backward compatibility we could maybe allow a name that gets split up into category/subcategory, but I'm not even sure about that.)
How we'd handle this in shinymetrics is an open question. If a description is NA, it could show no description at all, or could say something like "To fill in a description, add description: to the metric's metadata" or whatever.
Note that this would make the metric full IDs less strict; they wouldn't always be category_subcategory_prefix_metric, they might just be category_prefix_metric or just prefix_metric. But I think it's worth it to have people get up and running with a metric really quickly.
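To make the proposal concrete, here is a rough sketch of how optional metadata could feed into the full metric ID and the default title. The helpers `split_legacy_name()`, `build_metric_id()` and `metric_title()` are hypothetical names made up for illustration; they are not tidymetrics functions.

```r
# Hypothetical helpers for illustration only; not part of tidymetrics.

# Current behavior as described above: name is split into three parts,
# the first is generally "metrics", the other two become category/subcategory.
split_legacy_name <- function(name) {
  parts <- strsplit(name, "_", fixed = TRUE)[[1]]
  list(category = parts[2], subcategory = parts[3])
}

# Proposed behavior: category and subcategory are optional.
# c() silently drops NULLs, so missing parts just disappear from the ID.
build_metric_id <- function(metric, category = NULL, subcategory = NULL) {
  paste(c(category, subcategory, metric), collapse = "_")
}

# Fall back to the metric ID when no title is given in the metadata.
metric_title <- function(metric, metadata = list()) {
  title <- metadata$metrics[[metric]]$title
  if (is.null(title)) metric else title
}

legacy <- split_legacy_name("metrics_stock_prices")
build_metric_id("nb_volume", legacy$category, legacy$subcategory)
build_metric_id("nb_volume", category = "stock")
build_metric_id("nb_volume")
metric_title("nb_volume")
```

The three `build_metric_id()` calls give `stock_prices_nb_volume`, `stock_nb_volume` and `nb_volume`, which is the "less strict" ID scheme described above.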
When making a metric with create_metrics(), it should check whether the documentation on dimensions is missing (similar to how it checks for missing metric documentation).
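Something along these lines might work, as a sketch only. `check_dimension_docs()` is a hypothetical helper, and it assumes the metadata has already been parsed from the YAML header into a nested list with a `dimensions` entry.

```r
# Hypothetical helper; assumes `metadata` is the parsed YAML header as a list.
check_dimension_docs <- function(dimension_names, metadata) {
  # Dimensions present in the data but absent from the metadata
  undocumented <- setdiff(dimension_names, names(metadata$dimensions))
  if (length(undocumented) > 0) {
    warning(
      "Missing documentation for dimension(s): ",
      paste(undocumented, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(undocumented)
}

metadata <- list(
  dimensions = list(
    symbol = list(title = "Stock", description = "Stock symbol")
  )
)

# Every dimension is documented here, so this call stays silent
check_dimension_docs("symbol", metadata)
```

Passing a dimension that is not in the metadata would raise a warning naming it, rather than stopping, so existing metrics would keep working.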
Is this intentional?
Reproducible example. YAML header:
---
name: metrics_stock_prices
owner: drob
metrics:
  usd_close:
    title: Closing Price
    description: Close price, in USD, at the end of this time period.
  nb_volume:
    title: Volume
    description: Number of shares traded
dimensions:
  symbol:
    title: Stock
    description: Stock symbol
---
Code:
library(dplyr)
library(tidymetrics)
library(shinymetrics)
library(tidyquant)

stocks <- tq_get(c("AAPL", "GOOG"))

stocks_summarized <- stocks %>%
  cross_by_dimensions(symbol) %>%
  cross_by_periods(c("day", "week")) %>%
  summarize(nb_volume = sum(volume),
            usd_close = last(close))

m <- create_metrics(stocks_summarized)
preview_metric(m$stock_prices_nb_volume)
Result:
This makes the size of the intermediate table with k dimensions linear in k rather than 2^k.
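# Full cross: every combination of "All" and the original values (4x the rows for 2 dimensions)
cross1 <- bind_rows(mutate(mtcars, wt = "All"), mtcars %>% mutate(wt = as.character(wt)))
result_full <- bind_rows(mutate(cross1, mpg = "All"), cross1 %>% mutate(mpg = as.character(mpg)))
# Reduced version as written: only wt gets an "All" copy, mpg is just converted to character,
# and the original rows are then appended once more
cross1 <- bind_rows(mutate(mtcars, wt = "All"), mtcars %>% mutate(wt = as.character(wt))) %>% mutate(mpg = as.character(mpg))
cross2 <- bind_rows(cross1, mtcars %>% mutate(mpg = as.character(mpg)) %>% mutate(wt = as.character(wt)))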
# Ideal interface something like
cross_by_dimension(mtcars, depth = NULL)
cross_by_dimension(mtcars, depth = 1)
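To illustrate the row counts involved, here is a base R sketch comparing a full cross with one possible reading of the reduced version, in which each copy of the table sets a single dimension to "All". Both `cross_full()` and `cross_one_at_a_time()` are hypothetical helpers, and the reduced variant is an assumption about what a limited depth could mean, not the package's implementation.

```r
# Hypothetical helpers for illustration; not the tidymetrics implementation.

# Full cross: each dimension doubles the table, giving n * 2^k rows.
cross_full <- function(tbl, dims) {
  rownames(tbl) <- NULL
  for (d in dims) {
    tbl[[d]] <- as.character(tbl[[d]])
    all_rows <- tbl
    all_rows[[d]] <- "All"
    tbl <- rbind(all_rows, tbl)
  }
  tbl
}

# Assumed reduced variant: the original rows plus one copy per dimension
# with only that dimension set to "All", giving n * (k + 1) rows.
cross_one_at_a_time <- function(tbl, dims) {
  rownames(tbl) <- NULL
  for (d in dims) tbl[[d]] <- as.character(tbl[[d]])
  copies <- lapply(dims, function(d) {
    out <- tbl
    out[[d]] <- "All"
    out
  })
  do.call(rbind, c(copies, list(tbl)))
}

dims <- c("cyl", "gear", "carb")
n_full <- nrow(cross_full(mtcars, dims))             # 32 * 2^3 rows
n_reduced <- nrow(cross_one_at_a_time(mtcars, dims)) # 32 * (3 + 1) rows
c(full = n_full, reduced = n_reduced)
```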