
Comments (2)

vnmabus commented on June 23, 2024

First, if you are obtaining such high p-values for clearly distinct distributions, maybe there is a bug in the code, or maybe you are calling the method with the wrong parameters, because that should not happen. Can you provide an example of how you are using the method?

As for the explanation and understanding, the complete procedure is explained in the original article of Székely and Rizzo.

I will summarize the method:

  1. The null hypothesis is that the two samples have the same distribution. The alternative hypothesis is that the distribution is different (it does not matter how).
  2. In the article they prove that the expected energy statistic (energy_test_statistic in the code) between two samples converges if the samples have the same distribution, but tends to infinity as the sample sizes grow if they have different distributions.
  3. So, we will discard the null hypothesis if the energy statistic is "too high". But how do we measure whether it is "too high"? Because our samples have a finite size, the statistic will not be near infinity.
  4. Here is where we use the idea of a permutation test. Essentially, under the null hypothesis, all the observations come from the same distribution. Thus, if we permute the observations, so that now some observations may switch to a different sample, under the null hypothesis, the energy statistic would be similar to the original one: there is no reason for the original one to be special.
  5. Under the alternative hypothesis, however, the samples obtained from a permutation both come from a common distribution, which is a mixture of the original distributions of each sample, whereas the original samples each had a different distribution. Thus, the original statistic is expected to be larger in this case than the statistics obtained from the permutations.
  6. Thus, we can perform a lot of random permutations (the number of permutations is the parameter num_resamples). We then compare the statistics obtained with the original one, obtaining the proportion of statistics larger than the original. This proportion is the estimated p-value.
  7. Under the alternative hypothesis, this p-value should be very small, as the statistic should be more extreme for the original data. Under the null hypothesis, the original statistic is not special in any way, so this p-value would be distributed (approximately) uniformly between 0 and 1. The probability that this p-value is less than α is then α. Thus, if we discard the null hypothesis when the p-value is less than 0.05, we will wrongly discard the null hypothesis about one time in 20.
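The steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the dcor implementation: the function names (`mean_abs_diff`, `energy_statistic`, `permutation_p_value`) are hypothetical, the samples are assumed to be one-dimensional, and the `+1` in the p-value is a common convention that I am assuming here rather than something taken from the library.

```python
import random


def mean_abs_diff(a, b):
    """Average of |u - v| over all pairs (u, v) with u in a and v in b."""
    return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))


def energy_statistic(x, y):
    """Two-sample energy statistic for 1-D samples (illustrative version)."""
    n, m = len(x), len(y)
    energy_distance = (
        2 * mean_abs_diff(x, y) - mean_abs_diff(x, x) - mean_abs_diff(y, y)
    )
    # Scaling by n*m/(n+m) makes the statistic grow with the sample size
    # when the two distributions differ (step 2).
    return n * m / (n + m) * energy_distance


def permutation_p_value(x, y, num_resamples=200, seed=0):
    """Estimate the p-value by random permutations (steps 4 to 6)."""
    rng = random.Random(seed)
    observed = energy_statistic(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    at_least_as_large = 0
    for _ in range(num_resamples):
        # Shuffling lets observations switch between the two samples.
        rng.shuffle(pooled)
        permuted = energy_statistic(pooled[:n], pooled[n:])
        if permuted >= observed:
            at_least_as_large += 1
    # Assumption: count the observed statistic as one of the resamples,
    # so the estimated p-value is never exactly zero.
    return (at_least_as_large + 1) / (num_resamples + 1)


rng = random.Random(42)
same_a = [rng.gauss(0, 1) for _ in range(40)]
same_b = [rng.gauss(0, 1) for _ in range(40)]
shifted = [rng.gauss(3, 1) for _ in range(40)]

p_same = permutation_p_value(same_a, same_b)   # null hypothesis holds
p_diff = permutation_p_value(same_a, shifted)  # clearly different samples
print(p_same, p_diff)
```

With clearly different samples such as `same_a` and `shifted`, the estimated p-value should come out very small, while for two samples from the same distribution it can land anywhere between 0 and 1. If you get the opposite with your data, that points to a problem in how the test is being called.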

from dcor.

vnmabus commented on June 23, 2024

I will close this as there is no answer from @srujan741.

