kaleidophon / deep-significance

Enabling easy statistical significance testing for deep neural networks.

Home Page: https://deep-significance.rtfd.io/en/latest/

License: GNU General Public License v3.0

Python 67.18% Shell 0.16% Jupyter Notebook 32.65%
significance-testing deep-learning dl hypothesis-testing hypothesis-tests statistical-significance statistical-significance-test machine-learning ml deeplearning

deep-significance's People

Contributors

kaleidophon, rtmdrr

deep-significance's Issues

Impact of sample size

Hi,

I'm doing some (very) small-scale experimentation with your package and there's something I'm not clear about: how do the sample sizes affect the statistical significance returned by your function (min_eps)?
I don't mean that as a general question, but rather as one about the correct usage of your package. For example, why would aso([8], [5, 5, 8, 7, 8]) return 0.0858? Can I really conclude that algorithm 1 is better than algorithm 2 based on a single sample from algorithm 1's results?
Another example would be aso([10, 8], [5, 5, 8, 7, 8]) -> 0.0367. Again, I'm not convinced I can conclude that algorithm 1 is better than algorithm 2 based on such a small sample of results. I would expect to be "asked" to run more experiments and provide more results for the statistical test.

So in short, I'm asking whether your function takes the sample sizes into account when it calculates the significance score (min_eps). Am I missing something here? If I am, please feel free to correct me.

Any clarification would be appreciated, thanks!
Ran
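
To make the question concrete, here is a minimal sketch of an uncorrected violation ratio computed from the empirical quantile functions of the two score lists. The names empirical_quantile and violation_ratio are hypothetical and this is not the package's code. As far as I understand, the value returned by aso additionally includes a bootstrap-based correction, so the numbers below need not match the ones quoted above. The sketch shows that the raw ratio alone does not reflect how few scores were supplied.

```python
import math


def empirical_quantile(sorted_scores, t):
    """Generalized inverse of the empirical CDF for 0 < t < 1."""
    n = len(sorted_scores)
    index = min(n - 1, max(0, math.ceil(n * t) - 1))
    return sorted_scores[index]


def violation_ratio(scores_a, scores_b, steps=1000):
    """Share of the squared quantile gap on which B's quantile exceeds A's.

    0.0 means the empirical quantiles of A are never below those of B;
    1.0 means the opposite. No correction for sample size is applied.
    """
    a = sorted(scores_a)
    b = sorted(scores_b)
    total = 0.0
    violation = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps  # midpoint of each quantile bin
        diff = empirical_quantile(a, t) - empirical_quantile(b, t)
        squared = diff * diff
        total += squared
        if diff < 0:  # B is ahead of A at this quantile
            violation += squared
    if total == 0.0:
        return 0.5  # identical quantile functions; a convention chosen for this sketch
    return violation / total


print(violation_ratio([8], [5, 5, 8, 7, 8]))  # 0.0
print(violation_ratio([5, 5, 8, 7, 8], [8]))  # 1.0
```

A single score of 8 already gives a raw ratio of 0.0 here, which is why any sensitivity to sample size has to come from a correction term or from running more experiments.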

Sample-level random seed test

Hi,

First of all, thanks a lot for your work. It is exactly what I was looking for :)

I wondered if it is possible to compare multiple runs of the two models, A and B, at the sample level rather than at the score level. Let's say you trained each model five times with different random seeds. Does it make sense to test each run of A against each run of B and then average all the epsilons?
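
The procedure described above could be sketched as follows. This only makes the proposal concrete; whether averaging the epsilons is statistically sound is the open question. The helper average_pairwise_epsilon and the stand-in test function are hypothetical; in practice test_fn would be a call to the package's test on two lists of sample-level scores.

```python
from itertools import product
from statistics import mean


def average_pairwise_epsilon(runs_a, runs_b, test_fn):
    """Apply test_fn to every (run of A, run of B) pair and average the results.

    runs_a and runs_b are lists of runs; each run is a list of
    sample-level scores obtained with one random seed.
    """
    epsilons = [test_fn(a, b) for a, b in product(runs_a, runs_b)]
    return mean(epsilons), epsilons


def placeholder_test(scores_a, scores_b):
    """Stand-in for a real significance test, used only to run the sketch."""
    return 0.0 if mean(scores_a) > mean(scores_b) else 1.0


runs_a = [[1, 2], [3, 4]]  # two seeds of model A (made-up scores)
runs_b = [[2, 3]]          # one seed of model B (made-up scores)
average, per_pair = average_pairwise_epsilon(runs_a, runs_b, placeholder_test)
print(average, per_pair)   # 0.5 [1.0, 0.0]
```

With five seeds per model this yields 25 pairwise tests, so a correction for multiple comparisons may also be worth considering.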

Doubt on how to use ASO

@Kaleidophon Good evening. Sorry, I'm not sure I understand how the ASO function works.
For example, if I run:
"
from deepsig import aso

my_model_scores = scores_AUROC_Resnet
baseline_scores = scores_AUROC_Mobilenet

min_eps = aso(my_model_scores, baseline_scores, seed=seed, show_progress=False, confidence_level=0.95)"
From what I understand, min_eps should be the upper bound on the amount of violation of the stochastic order.
What I don't understand is how the samples F* and G* are drawn. The original paper says inverse transform sampling is used, whereas, as far as I understand, equation (3) of the paper in this repository says these samples are obtained by bootstrapping. Do these mean the same thing? Are the samples drawn at random via inverse transform sampling, or does the bootstrapping involve a constant similar to the one used in power analysis? Maybe I'm just confusing terms and they mean the same thing.
Thank you in advance and have a good evening.
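
On the terminology: drawing a uniform random number and passing it through the empirical quantile function returns one of the observed scores, each with equal probability. That is the same as resampling the scores with replacement, which is bootstrapping. The sketch below illustrates this equivalence with made-up AUROC values. The function names are hypothetical, and the thread does not confirm that the package is implemented this way.

```python
import random


def inverse_transform_draw(sorted_scores, u):
    """Map u in [0, 1) to an observed score via the empirical quantile function."""
    n = len(sorted_scores)
    # int(u * n) selects each of the n order statistics with probability 1/n.
    return sorted_scores[min(int(u * n), n - 1)]


def resample(scores, size, rng):
    """Inverse transform sampling from the empirical distribution of scores."""
    ordered = sorted(scores)
    return [inverse_transform_draw(ordered, rng.random()) for _ in range(size)]


scores = [0.81, 0.79, 0.84, 0.80, 0.83]  # made-up values for illustration
rng = random.Random(0)
draws = resample(scores, 1000, rng)

# Every draw is one of the original scores, as in a bootstrap resample.
print(all(d in scores for d in draws))  # True
```

Under this reading no extra constant is needed for the sampling step itself: each draw is simply an observed score chosen uniformly at random.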

Misaligned diagonals for DataFrame

Hi,

Love this repo, thanks for doing this!

Issue

I have a small issue with the new DataFrame feature that you implemented. The diagonal of the result is misaligned: one model compared to itself gets a value close to stochastic dominance.

Reproduce Issue

I have the following dictionary:
d = {'x': array([59.13, 58.03, 59.18, 58.78, 58.5 ]), 'y': array([58.13, 59.19, 59.94, 60.08, 59.85]), 'z': array([58.77, 58.86, 59.58, 59.59, 59.64]), 'w': array([58.16, 58.49, 59.87, 58.94, 58.96])}

I use the following line of code:
print(multi_aso(d, confidence_level=0.05, return_df=True))

I get the following result:

          x         y         z    w
x  1.000000  1.000000  0.202027  0.0
y  1.000000  0.101093  0.000000  0.0
z  0.202027  0.000000  1.000000  0.0
w  0.000000  0.000000  0.000000  1.0

I think the diagonal entry for the (y, y) pair is incorrect.

Thanks for reading!
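
A small check like the one below can flag such entries automatically. It assumes that every diagonal entry should be 1.0, which is inferred only from the (x, x), (z, z) and (w, w) entries in the output above. The helper name suspicious_diagonal_entries is hypothetical, and the matrix is copied from the printed result rather than recomputed.

```python
def suspicious_diagonal_entries(labels, rows, expected=1.0, tol=1e-9):
    """Return (label, value) pairs whose diagonal entry differs from expected."""
    return [
        (label, rows[i][i])
        for i, label in enumerate(labels)
        if abs(rows[i][i] - expected) > tol
    ]


labels = ["x", "y", "z", "w"]
rows = [
    [1.000000, 1.000000, 0.202027, 0.0],
    [1.000000, 0.101093, 0.000000, 0.0],
    [0.202027, 0.000000, 1.000000, 0.0],
    [0.000000, 0.000000, 0.000000, 1.0],
]

print(suspicious_diagonal_entries(labels, rows))  # [('y', 0.101093)]
```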
