ismayc commented on July 4, 2024

@rudeboybert I'd love to figure out a good way to implement this, but it seems like a really tricky task. To program it, I think you'd first need to check whether obs_stat falls outside the range of stat (just a simple if clause). If it does, you just do the usual binning as you wish.

(The tricky part) If it doesn't, how could we set up the bins so that they

  1. start at the minimum value of stat,
  2. have an edge at (a) -obs_stat (or obs_stat itself if obs_stat is negative),
  3. have the appropriate number of bins between (a) and (b) obs_stat (or -obs_stat if obs_stat is negative),
  4. and still go up to the maximum value of stat?
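One way to think about steps 1 to 4: a single equal-width grid can put both -obs_stat and obs_stat on bin edges only if the distance between them, 2|obs_stat|, is a whole multiple of the bin width. The following is a minimal sketch under that assumption. It picks the width from that constraint (so the width is only approximately diff(range)/bins) and then extends the grid to cover the data. `symmetric_breaks` is a hypothetical helper, not part of infer or ggplot2.

```r
# Hypothetical helper (not part of infer or ggplot2): equal-width bin breaks
# in which both -|obs_stat| and +|obs_stat| land exactly on bin edges.
# Assumes `stat` has some spread (diff(range(stat)) > 0).
symmetric_breaks <- function(stat, obs_stat, bins = 30) {
  a <- abs(obs_stat)
  rng <- range(stat, -a, a)          # cover the data and both cut points
  target_width <- diff(rng) / bins   # width we would use without the constraint
  if (a == 0) {
    width <- target_width            # only one cut point (0) to hit
  } else {
    # A whole number k of bins between -a and a, so 2a is a multiple of width
    k <- max(1, round(2 * a / target_width))
    width <- 2 * a / k
  }
  # Integer offsets from the anchor -a: offset 0 is -a and, when a > 0,
  # offset k is +a; the range of offsets is widened to cover min and max of rng
  i_lo <- floor((rng[1] + a) / width)
  i_hi <- ceiling((rng[2] + a) / width)
  -a + width * (i_lo:i_hi)
}
```

The result can be passed straight to base R, e.g. `hist(stat, breaks = symmetric_breaks(stat, obs_stat))`, or to the `breaks` argument of ggplot2's `geom_histogram()`. The trade-off is that the bin count is only approximately `bins`.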

from infer.

ismayc commented on July 4, 2024

It seems like the internal ggplot2 function bin_breaks_bins() provides some guidance, but it still seems really, really tricky:

bin_breaks_bins <- function(x_range, bins = 30, center = NULL,
                            boundary = NULL, closed = c("right", "left")) {
  stopifnot(length(x_range) == 2)
  
  bins <- as.integer(bins)
  if (bins < 1) {
    stop("Need at least one bin.", call. = FALSE)
  } else if (bins == 1) {
    width <- diff(x_range)
    boundary <- x_range[1]
  } else {
    width <- (x_range[2] - x_range[1]) / (bins - 1)
  }
  
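  # bin_breaks_width() is another internal ggplot2 helper; its `boundary`
  # argument fixes where the grid of bin edges is anchored
  bin_breaks_width(x_range, width, boundary = boundary, center = center,
                   closed = closed)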
}
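In the ggplot2 code above, it is the `boundary` passed on to bin_breaks_width() that decides where the grid of edges sits. The following simplified sketch of that idea was written from scratch for illustration. It is not ggplot2's actual code, and `anchored_breaks` is a hypothetical name. Every break is `boundary` plus a whole number of widths, so setting `boundary = obs_stat` puts an edge at the observed statistic.

```r
# Simplified, hypothetical re-implementation of the idea behind a `boundary`
# argument: every break is boundary + (integer) * width.
anchored_breaks <- function(x_range, width, boundary) {
  stopifnot(length(x_range) == 2, width > 0)
  # Whole numbers of widths from the anchor out to each end of the range
  shift_lo <- floor((x_range[1] - boundary) / width)
  shift_hi <- ceiling((x_range[2] - boundary) / width)
  if (shift_hi == shift_lo) shift_hi <- shift_lo + 1  # always at least one bin
  boundary + width * (shift_lo:shift_hi)
}
```

For example, `anchored_breaks(range(stat), width = 0.5, boundary = obs_stat)`. In ggplot2 itself, `geom_histogram(binwidth = ..., boundary = obs_stat)` exposes the same knob. With a single anchor, -obs_stat also lands on an edge only when 2|obs_stat| happens to be a multiple of the width.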

mine-cetinkaya-rundel commented on July 4, 2024

Another option would be to do something like this:

[Screenshot, 2017-11-07]

Ours would obviously be based on the simulated null rather than the theoretical one (so there would be a histogram underneath instead of a normal curve), but we would just shade beyond the observed statistic over the whole plot instead of within the bins. Perhaps not as pretty, but a simpler solution.
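With this approach, the shaded region is everything beyond the observed statistic (and beyond its mirror image for a two-sided test). The p-value is then just the proportion of simulated statistics in that region, no matter how the bins fall. The following is a minimal sketch. It assumes the null distribution is centered at `center` (0 by default) and uses a hypothetical helper name, `shaded_p_value`. The two-sided rule shown, mirroring obs_stat around the center, is one common convention and not necessarily what infer itself does.

```r
# Hypothetical helper: proportion of simulated statistics in the shaded
# region(s) beyond obs_stat. `center` is an assumed null center.
shaded_p_value <- function(stat, obs_stat, direction = c("right", "left", "both"),
                           center = 0) {
  direction <- match.arg(direction)
  switch(direction,
    right = mean(stat >= obs_stat),
    left  = mean(stat <= obs_stat),
    both  = {
      # Mirror obs_stat around the assumed center and shade both tails
      d <- abs(obs_stat - center)
      mean(stat >= center + d) + mean(stat <= center - d)
    }
  )
}
```

The same cut points (obs_stat, and `2 * center - obs_stat` for the mirror) are what a plot would use as the edges of the shaded rectangles.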

ismayc commented on July 4, 2024

@rudeboybert: Do you think having a bin cut in half at the observed statistic would be OK for students? I think this is much better than having some weird shading on the bin in which the observed stat falls.

@mine-cetinkaya-rundel I like this option much, much better than the current shading scheme! I think it's much easier to decipher and really makes it clear what the p-value corresponds to.

@andrewpbray @beanumber @hardin47 Any opposition to this shading scheme?

ismayc commented on July 4, 2024

Any thoughts on colors? I prefer green to red when working with p-values, since I want students to think of green as "GO!" when looking at the p-value, but I'm open to using red as Mine did above. Here are some examples of what the shading looks like for the three types of hypothesis test directions, with the density overlay:

[Images rplot1, rplot2, rplot3: shading for the three test directions, with density overlay]

mine-cetinkaya-rundel commented on July 4, 2024

Do we want/need the density overlay? I think not for simulation-based inference, since we calculate the p-value directly from the histogram.

For the last figure, I purposefully had the bold line on one side only to indicate the location of the observed sample statistic. I'm not wedded to that idea, but I thought it was important to show that the other side is just a mirror image, not based on an observed sample statistic.

As for the line cutting through the bin -- I guess it's a bit confusing, though less so than the bin itself being multiple colored.

ismayc commented on July 4, 2024

We don't have to have the density overlay, but it is an option in visualize() and that's what I was currently testing there. I do agree that I like to have ONLY the observed stat come through as a bolded vertical line. Here is an example without the density overlay (using -obs_stat from the previous plots as the observed statistic).

[Image rplot4: shading without the density overlay]

Agreed that the line cutting through the bin is far easier to explain and this solution cuts down on the number of lines of code in the visualize() function by a lot. I think the tradeoff is well worth it here.

rpruim commented on July 4, 2024

This is a little bit tricky to do "right" -- and you may have to decide which elements of "right" matter most to you.

I wrote code in mosaic to do multi-colored bins in lattice histograms, but in general it isn't really a very good idea since it can violate the "area = probability" maxim that defines histograms and density curves. Here's why: Suppose you have a bin over the interval [a, b] and a cutpoint k in that interval. It is not necessarily the case that P(a <= X <= k) / P(a <= X <= b) = (k-a) / (b-a).
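A tiny worked example of that inequality, using made-up values purely for illustration: five simulated statistics fall in the bin [0, 1], bunched toward its left edge, and the cutpoint is k = 0.5.

```r
# Made-up simulated statistics that all fall in one bin [a, b] = [0, 1],
# bunched toward the left edge
in_bin <- c(0.05, 0.10, 0.15, 0.20, 0.90)
a <- 0; b <- 1
k <- 0.5                              # cutpoint inside the bin, e.g. an observed statistic

actual_share <- mean(in_bin <= k)     # P(a <= X <= k) / P(a <= X <= b) within this bin
linear_share <- (k - a) / (b - a)     # share implied by splitting the bar's area at k
c(actual = actual_share, linear = linear_share)
```

Splitting the bar's area at k shades half of it (0.5), while 4 of the 5 values in the bin (0.8) actually lie at or below k. So the shaded area misstates the probability.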

So you really should make sure that the test statistic falls on the edge of a bin to maintain the area = probability identity.

On the other hand, if you like "nice" bins and want all bins to be the same width, this can be a problem. I would suggest creating a custom bar plot that splits the bar(s) containing the test statistic (and its negative) and calculates separate heights for each of the two "halves". The result will be a true histogram that accurately represents probability with area, but it will have 2 (4) bins that are narrower than the others.
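The following is a minimal sketch of that custom bar plot's data. It inserts -|obs_stat| and +|obs_stat| into an existing set of equal-width breaks, then scales each bar as count / (n × width), so each bar's area equals the share of simulated statistics in it. `split_bin_density` is a hypothetical helper. The sketch assumes `breaks` covers every value of `stat`, and the bins are right-closed, (a, b], as in `hist()`'s default.

```r
# Hypothetical helper: density-scaled bars for a histogram whose bins are
# split at -|obs_stat| and +|obs_stat|. Assumes `breaks` covers all of `stat`.
split_bin_density <- function(stat, breaks, obs_stat) {
  cuts <- c(-abs(obs_stat), abs(obs_stat))
  cuts <- cuts[cuts > min(breaks) & cuts < max(breaks)]  # only interior cuts
  new_breaks <- sort(unique(c(breaks, cuts)))            # split the affected bins
  # Right-closed bins (a, b]; the first bin also includes its left edge
  bin <- cut(stat, new_breaks, include.lowest = TRUE, labels = FALSE)
  counts <- tabulate(bin, nbins = length(new_breaks) - 1)
  widths <- diff(new_breaks)
  data.frame(
    left = head(new_breaks, -1),
    right = tail(new_breaks, -1),
    density = counts / (length(stat) * widths)  # bar area = share of stat
  )
}
```

The resulting data frame can be drawn as rectangles, e.g. with ggplot2's `geom_rect(aes(xmin = left, xmax = right, ymin = 0, ymax = density))`, filling the bars beyond ±|obs_stat| differently. Because heights are densities, the narrower split bars still have the correct areas.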

I also prefer density histograms, because if you start splitting bins (or have unequal widths for any other reason), then "count" doesn't really make sense for the y-scale anymore. If you want to stick with counts, then you really should have all bins the same width.

PS. I haven't mentioned the sticky issues of floating point arithmetic, fuzzy computation, and data resolution.
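As one illustration of the floating point issue: in double precision, 0.7 - 0.4 is stored as 0.29999999999999993, which is slightly less than 0.3. A simulated statistic that should tie the observed one can therefore be dropped from "at least as extreme". The following sketch shows one possible hedge, comparing with a small tolerance. The helper name `p_right_tol` and the default tolerance `sqrt(.Machine$double.eps)` are assumptions chosen for illustration. The tolerance will also admit values that are genuinely (but negligibly) smaller than obs_stat.

```r
# Hypothetical helper: right-tail proportion that treats values within a
# small relative tolerance of obs_stat as ties (i.e. at least as extreme).
p_right_tol <- function(stat, obs_stat, tol = sqrt(.Machine$double.eps)) {
  mean(stat >= obs_stat - tol * max(1, abs(obs_stat)))
}

stat <- c(0.7 - 0.4, 0.1, 0.5)  # first value is 0.3 mathematically
obs_stat <- 0.3
mean(stat >= obs_stat)          # naive comparison misses the first value
p_right_tol(stat, obs_stat)     # tolerant comparison counts it
```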

PPS. I've also used the simple trick of overlaying a semi-transparent rectangle. It is easy to implement (see mosaic::statTally()), but it has some of the same issues as the histogram and can slightly distort the representation of the p-value if test statistic values are not uniformly distributed within histogram bins.

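library(mosaic)                                      # provides statTally()
x <- c(10, 18, 9, 15)                                # counts in four cells
rdata <- rmultinom(999, sum(x), prob = rep(.25, 4))  # 999 draws under equal cell probabilities
statTally(x, rdata, fun = max, binwidth = 1)         # unusual test statistic: the largest count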

[Image: statTally() output]

Now I have to decide whether I want to go back and fix these issues in statTally(). (Thanks ;-)

PPPS. I think the vertical line should stop at the x-axis, like the bins and the rectangle do.

rudeboybert commented on July 4, 2024

I definitely prefer the vertical splitting of bars to the horizontal splitting of bars as a short-term minimum viable solution, since a little supplementary explanation will get the correct point across. But in the longer term, once the bigger fish are fried, an appropriate binning scheme will yield graphics that are stand-alone and self-explanatory.

mine-cetinkaya-rundel commented on July 4, 2024

@rudeboybert Would this require having bins of varying widths? If so, I can see it being even more confusing. Alternatively, it would require somewhat non-standard bin widths/ticks, correct? Not sure if either of these is a huge improvement, but we could always prototype and see. I guess I'm just having difficulty picturing what they would look like.

mine-cetinkaya-rundel commented on July 4, 2024

Also on the issue of colors -- I am completely impartial, but previously I had used red for p-values (red -> reject) and green for confidence intervals (green -> contain). I think we can go with either though.

beanumber commented on July 4, 2024

Not red and green on the same plot though, right?

https://en.wikipedia.org/wiki/Color_blindness#Red.E2.80.93green_color_blindness

ismayc commented on July 4, 2024

For sure! I will be nice to any color-blind folks in any vignettes/examples.

mine-cetinkaya-rundel commented on July 4, 2024

@beanumber No, not on the same plot. I didn't think we would show p-value and confidence interval on the same plot anyway.

github-actions commented on July 4, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
