
Comments (5)

ranshadmi commented on May 29, 2024

Are you familiar with statsmodels.stats.power.TTestIndPower.solve_power()?
I realize that it is probably just a fancy wrapper around a simple t-test, but given effect_size, alpha, power and the ratio between the two group sizes, it outputs the "required" number of samples. That's what I'm using now, and I was wondering how I can replace it with your function.
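To make concrete what kind of question such a solver answers, here is a minimal sketch using the textbook normal approximation for a two-sided, two-sample test. It is not the statsmodels implementation (which works with the t distribution and should therefore return a slightly larger number); the function name approx_nobs1 is made up for this illustration, and ratio is assumed to mean the size of the second group divided by the size of the first.

```python
from statistics import NormalDist


def approx_nobs1(effect_size, alpha=0.05, power=0.8, ratio=1.0):
    """Approximate size of the first group (hypothetical helper, normal approximation).

    ratio is assumed to be nobs2 / nobs1.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return (1 + 1 / ratio) * ((z_alpha + z_power) / effect_size) ** 2


# Medium effect size, 5% significance level, 80% power, equally sized groups.
print(approx_nobs1(0.5))  # a bit less than 63 samples per group
```

A larger second group (ratio > 1) lowers the number required for the first group, which is the trade-off the ratio argument expresses.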

from deep-significance.

Kaleidophon commented on May 29, 2024

Hey @ranshadmi!

Thank you for your interest. Let me try to explain the relationship between sample size and the result better.
First of all, the results of the test are non-deterministic, since they are based on a bootstrapping procedure. When you are playing around with the test, you can use the seed argument to fix the randomness.

Secondly, the final test score depends on two different parts: the extent of the violation of the stochastic order based on the original two samples of scores, and an adjustment based on bootstrapped score samples (see eq. 3 in the paper). The violation of the stochastic order is calculated from the cumulative distribution functions (CDFs) of the two distributions of model scores. Since we do not have access to the true distributions, the empirical CDFs are used (see implementation here). When you only have a single score for one of the samples, the empirical CDF for that algorithm will essentially be a step function with a single step - you can imagine that this will not be very informative for the test. In addition, the second term includes a correction term for the sample sizes, which can compensate for this lack to some degree, and another variance term based on bootstrapped score samples. But since bootstrapping a single score will always lead to the same result, this will also not be very informative.
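Both points can be seen in a few lines of plain Python. This is only a sketch and not the package's code: the helper ecdf is written for this illustration, and the scores [8] and [8, 10] are the ones from the examples discussed in this thread.

```python
import random


def ecdf(scores):
    """Return the empirical CDF of a list of scores as a function (illustrative helper)."""
    ordered = sorted(scores)

    def cdf(x):
        # Fraction of observed scores that are less than or equal to x.
        return sum(1 for score in ordered if score <= x) / len(ordered)

    return cdf


single = ecdf([8])
print([single(x) for x in (7, 8, 9)])  # one jump from 0 to 1 at x = 8

pair = ecdf([8, 10])
print([pair(x) for x in (7, 8, 9, 10)])  # two smaller jumps

# Resampling a single score with replacement can only ever reproduce that score.
rng = random.Random(0)
distinct_resamples = {tuple(rng.choices([8], k=1)) for _ in range(1000)}
print(distinct_resamples)
```

With only one score there is exactly one possible bootstrap sample, so a variance term computed from the resamples carries no information.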

The bootstrapping is used to produce an upper bound on the test result. In that sense, the difference between your examples with [8] and [8, 10] is expected: Adding 10 does decrease the violation of the stochastic order, and adding another score sample makes the upper bound tighter. However, you are correct that this sample size is very low, and any conclusions from any statistical test with such a low sample size should be taken with the appropriate grain of salt.

For this purpose, the package supplies two more functions: With aso_uncertainty_reduction(), you can compute the factor by which the uncertainty about the true test result decreases when you add more scores. With bootstrap_power_analysis(), you can determine the statistical power of your sample, i.e. the complement of the Type II error (false negative) rate. Ideally, the power should be around 0.8. These tools are meant to help you decide how many scores to collect, but the rule of thumb always remains: The more, the better.
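The general idea behind a bootstrapped power estimate can be sketched with the standard library alone. This is an illustration under stated assumptions, not the implementation of bootstrap_power_analysis(): the function names, the one-sided permutation test used as the significance test, and the example scores are all chosen for this sketch.

```python
import random
from statistics import fmean


def permutation_p_value(a, b, rng, num_permutations=200):
    """One-sided permutation test for mean(a) > mean(b) (illustrative test choice)."""
    observed = fmean(a) - fmean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        diff = fmean(pooled[:len(a)]) - fmean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    # Add-one correction so that the p-value is never exactly zero.
    return (hits + 1) / (num_permutations + 1)


def bootstrap_power(a, b, alpha=0.05, num_bootstrap=100, seed=0):
    """Fraction of bootstrap resamples in which the test rejects at level alpha."""
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    rejections = 0
    for _ in range(num_bootstrap):
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        if permutation_p_value(resampled_a, resampled_b, rng) < alpha:
            rejections += 1
    return rejections / num_bootstrap


# Made-up scores: ten clearly separated runs per algorithm versus one run each.
many_a = [0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89]
many_b = [0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59]
print(bootstrap_power(many_a, many_b))  # high power
print(bootstrap_power([0.8], [0.5]))    # a single score per algorithm cannot reach significance
```

The second call shows why very small samples are a problem: with one score per algorithm the permutation test has no way of producing a small p-value, so the estimated power stays at zero no matter how large the difference is.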

Hope that helped and let me know if you have any further questions!

ranshadmi commented on May 29, 2024

Thanks for your elaborate reply!
I still have some questions...

As for aso_uncertainty_reduction() - how should I interpret the returned number? What does an "uncertainty reduction" of 1.1547005383792515 actually mean?
Should I just try a few pairs of values until I see some kind of saturation in the returned value? For example, 1 (old) -> 3 (new) will return a large number, while 3 -> 5 will return a small number - from that I can conclude that 3 is a good-enough number?

Kaleidophon commented on May 29, 2024

No problem! The number you get from aso_uncertainty_reduction() is the factor by which the amount of uncertainty around the estimate of the degree of violation of the stochastic order decreases as the sample size grows. This is based on Theorem 2.4 of del Barrio et al. (2017). You can see that the estimate of the violation (the term with F_n, G_m) approaches the true value (e_W2(F, G)) at a rate of sqrt(mn/(m + n)). Thus, adding more samples has diminishing returns. However, keep in mind that these are relative numbers - there is no good way (or at least none that I know of) to estimate by how much the current estimate is off. Thus, the function is supposed to help you decide whether adding more scores of algorithm A or B is more useful in reducing the uncertainty and making the test more accurate - "the more samples, the better" still holds.
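As a small sketch of how such a factor follows from the rate sqrt(mn/(m + n)): the ratio of the rate at the new sample sizes to the rate at the old ones gives a relative reduction. The function names below are made up for this illustration, and the argument order is an assumption, not the package's signature.

```python
import math


def convergence_rate(m, n):
    """Rate sqrt(mn / (m + n)) for sample sizes m and n."""
    return math.sqrt(m * n / (m + n))


def uncertainty_reduction(m_old, n_old, m_new, n_new):
    """Hypothetical helper: ratio of the new rate to the old rate."""
    return convergence_rate(m_new, n_new) / convergence_rate(m_old, n_old)


# One extra score for the first algorithm, starting from one score each:
print(uncertainty_reduction(1, 1, 2, 1))  # sqrt(4/3), roughly 1.1547

# Diminishing returns when both samples grow together:
print(uncertainty_reduction(1, 1, 3, 3))  # sqrt(3), roughly 1.73
print(uncertainty_reduction(3, 3, 5, 5))  # sqrt(5/3), roughly 1.29

# Which algorithm to run again? Compare adding one score to either sample:
print(uncertainty_reduction(5, 2, 6, 2), uncertainty_reduction(5, 2, 5, 3))
```

In the last line the smaller sample benefits more from the extra score, which is the kind of comparison the factor is meant for. Since the numbers are relative, a value close to 1 only says that further scores help little compared to the current state, not that the current estimate is already accurate.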

Kaleidophon commented on May 29, 2024

No, it looks really cool! I don't think you really need to replace that with mine, it seems more versatile. Mine just computes the power, and you can use an arbitrary test to do that, but that is pretty much it compared to the one you provided. Thanks for sharing!
