frankportman / bayesAB
bayesAB: Fast Bayesian Methods for A/B Testing
Home Page: http://frankportman.github.io/bayesAB/
License: Other
Hi
We are working on the next ggplot2 release, and bayesAB fails its unit tests because a new deprecation warning in ggplot2 breaks the silent expectations in bayesAB/tests/testthat/test-generics.R (lines 39 to 63 in 9db6f46).
We plan to submit on May 31st and hope you are able to fix these small issues beforehand.
This would cover the single-variable case, where all you want is to sample from the posterior and get a parameter estimate. Is this outside the scope of the package, or does it break its current purpose?
Is it possible to run bayesTest in parallel?
I am looking for a way to run 1M permutations faster.
I think the expected loss for choosing A over B, in this case, should be the percentage change of A over B when B is better than A. Should the formula be loss = -f(A_samples, B_samples) instead of loss = f(B_samples, A_samples)?
Line 19 in 9db6f46
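To make the difference between the two candidate formulas concrete, here is a small worked example in R. It assumes f(a, b) = (a - b) / b, the default quoted elsewhere in these issues, and uses a single made-up pair of posterior draws (0.5 and 0.6 are illustrative values, not package output).

```r
# Assumed default percent-change function: (a - b) / b
f <- function(a, b) (a - b) / b

# Hypothetical single posterior draws where B is better than A
A_sample <- 0.5
B_sample <- 0.6

# Current formula: change of B relative to A
loss_current <- f(B_sample, A_sample)    # (0.6 - 0.5) / 0.5, about 0.2

# Proposed formula: negated change of A relative to B
loss_proposed <- -f(A_sample, B_sample)  # -(0.5 - 0.6) / 0.6, about 0.167
```

Both are positive when B beats A, but they divide by different baselines (A versus B), so the two formulas are not interchangeable.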
It's currently labeled A_data and B_data, until there are more than 2 columns, then it breaks. Need an elegant way of doing this.
Can you explain how this function works? When I try running it on two dummy vectors, I get an Inf.
Do A_samples and B_samples encode the entire data vectors? Is this the squared loss? Can you explain the logic behind this implementation? What does loss do if A_samples and B_samples contain 0s?
I highlighted the unclear code snippets.
```r
set.seed(1)
a <- rbinom(100, 1, .5)
b <- rbinom(100, 1, .6)

coalesce <- function(n) ifelse(is.na(n) | is.nan(n), 0, n)

getPostError <- function(A_samples, B_samples, f = function(a, b) (a - b) / b) {
  BoverA <- B_samples > A_samples
  loss <- f(B_samples, A_samples)
  coalesce(mean(BoverA) * mean(loss[BoverA]))
}

getPostError(b, a)
```
How does summary(bayesTest(a, b, priors = c('alpha' = 1, 'beta' = 1), n_samples = 1e5, distribution = 'bernoulli')) reach a posterior loss of 0.260855?
Thank you very much for making this clear.
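One way to see where the Inf comes from is to compare raw 0/1 data with posterior draws. The sketch below assumes that getPostError is meant to receive posterior samples, and that a Beta(1, 1) prior with Bernoulli data gives a Beta(1 + successes, 1 + failures) posterior (the standard conjugate update). Whether bayesTest does exactly this internally is an assumption, and the names A_post and B_post are hypothetical.

```r
set.seed(1)
a <- rbinom(100, 1, .5)
b <- rbinom(100, 1, .6)

# Helpers copied from the question above
coalesce <- function(n) ifelse(is.na(n) | is.nan(n), 0, n)
getPostError <- function(A_samples, B_samples, f = function(a, b) (a - b) / b) {
  BoverA <- B_samples > A_samples
  loss <- f(B_samples, A_samples)
  coalesce(mean(BoverA) * mean(loss[BoverA]))
}

# Raw 0/1 data: wherever b = 1 and a = 0 the ratio is 1 / 0 = Inf,
# and coalesce() only replaces NA/NaN, so the Inf propagates.
raw_loss <- getPostError(a, b)

# Assumed conjugate update: Beta(1, 1) prior + Bernoulli likelihood
n_samples <- 1e5
A_post <- rbeta(n_samples, 1 + sum(a), 1 + length(a) - sum(a))
B_post <- rbeta(n_samples, 1 + sum(b), 1 + length(b) - sum(b))

# Posterior draws lie strictly between 0 and 1, so the ratio stays finite
post_loss <- getPostError(A_post, B_post)
```

Under these assumptions the loss is the probability that B beats A times the average relative gap in those draws, so it is not a squared loss.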
Both summary and plot generics take 'percentLift' as an argument. Does it make more sense for the API to plot the summary object or the base bayesTest object? @JesseKolb
We're in the process of preparing a ggplot2 release. As part of the release process, we run the R CMD check on packages that use ggplot2 to make sure we don't accidentally break code for downstream packages.
In running the R CMD check on bayesAB, we identified the following issue:
...
$Probability
[1] 0.04541
--------------------------------------------
Credible Interval on (A - B) / B for interval length(s) (0.9) :
$Probability
5% 95%
-0.355943366 -0.006198834
--------------------------------------------
Posterior Expected Loss for choosing B over A:
$Probability
[1] 0.2601664
> plot(AB1)
Error: Either ymin or ymax must be given as an aesthetic.
Execution halted
These failures are because bayesAB is using geom_ribbon() without layer data. The behaviour of layers when "setting" an aesthetic (outside aes()) with length > 1 is not defined and may change in the future (in the case of geom_ribbon(), we now require that at least one of ymin or ymax is mapped to prevent bugs from occurring).
To fix this error, we suggest using a geom_*() function with its own data. Note that using .data$col_name within aes() is the preferred way to avoid CMD check issues about undefined variables when mapping columns (make sure you include #' @importFrom rlang .data in at least one roxygen documentation block to avoid an error about the name .data being undefined).
ggplot2::ggplot() +
  ggplot2::geom_ribbon(
    ggplot2::aes(x = .data$x, ymax = .data$ymax),
    data = data.frame(x = 1:5, ymax = c(0, 1, 3, 1, 0)),
    ymin = 0
  )
Created on 2019-05-09 by the reprex package (v0.2.1)
We hope to release the new version of ggplot2 in the next two weeks, at which point you will get a note from CRAN that your package checks are failing. Let me know if I can help!
Next release I want to have both the current setup (input A_data, B_data directly) and summarized data (A_sum, A_length, B_sum, B_length). This should not complicate the architecture if done correctly.
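As a rough illustration of why summarized input can sit alongside raw input, consider the Bernoulli case with a Beta prior: the posterior depends on the data only through the number of successes and the number of trials. The sketch below assumes that conjugate setup. The helper posterior_params is hypothetical and not part of bayesAB; A_sum and A_length follow the names proposed above.

```r
# Hypothetical helper: Beta posterior parameters for Bernoulli data,
# given only the success count and the number of trials.
posterior_params <- function(successes, n, alpha = 1, beta = 1) {
  c(alpha = alpha + successes, beta = beta + n - successes)
}

set.seed(42)
A_data <- rbinom(250, 1, 0.25)  # illustrative raw 0/1 data

# Path 1: start from the raw vector
from_raw <- posterior_params(sum(A_data), length(A_data))

# Path 2: start from the summarized inputs
A_sum <- sum(A_data)
A_length <- length(A_data)
from_summary <- posterior_params(A_sum, A_length)

# Both paths lead to the same posterior parameters
same_posterior <- identical(from_raw, from_summary)
```

A design along these lines would let the raw-data interface reduce to the summarized one as a first step, so only one code path needs to do the posterior update. Distributions whose sufficient statistics are not just a sum and a count would need additional summarized inputs.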
Hi
We are preparing the next release of ggplot2, and our reverse dependency checks show that your package is failing with the new version. Looking into it, we see that this is due to changes in the warnings and errors thrown by ggplot2 that you test for in your package.
You can install the release candidate of ggplot2 using devtools::install_github('tidyverse/[email protected]')
to test this out.
We plan to submit ggplot2 by the end of October and hope you can have a fix ready before then.
Kind regards
Thomas
Your code does not correspond to the update rule in https://en.wikipedia.org/wiki/Conjugate_prior, which is the same as https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf under a different parameterization.
Another problem: when sampling from the gamma distribution, the default arguments have rate as the third argument instead of shape. You would need to name your shape argument.
Great package overall. Thanks for the great work.
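For reference on the argument-order point, base R's signature is rgamma(n, shape, rate = 1, scale = 1/rate), so a second positional value is read as shape and a third as rate. The sketch below uses illustrative parameter values only (they are not taken from bayesAB) to show how naming the arguments removes the ambiguity.

```r
set.seed(123)
n <- 1e5

# Named arguments: shape = 2, rate = 4, so the mean is shape / rate = 0.5
by_name <- rgamma(n, shape = 2, rate = 4)

# Positional arguments: 4 is read as shape and 2 as rate, so the mean is 2
by_position <- rgamma(n, 4, 2)

# A scale parameterization has to be named; scale = 1 / rate
by_scale <- rgamma(n, shape = 2, scale = 1 / 4)

means <- c(by_name = mean(by_name),
           by_position = mean(by_position),
           by_scale = mean(by_scale))
```

Swapping the positional values changes the distribution without any warning, which is why naming shape, rate, or scale at every call site is the safer habit.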
I think these args should be named lists/vectors in order to not be ambiguous.
Alternatively, we can make them constant (scalar) for a single invocation of plot/summary. However this is less flexible.
Hi!
Since a lot of A/B tests involve more than 2 groups, have you thought about adding support for more than 2 groups? Or do you advise using the current structure and just comparing all groups to A?
Thank you for creating the package.
Best,
Miha