hopkins's People

Contributors

kwstat

hopkins's Issues

NaN or Error in numeric(3L^d) : vector size cannot be infinite, when using the hopkins method

I don't know whether I am doing something wrong but here's what's happening:

I have a data frame in the format shown below. It holds features extracted from 15 images of a single class (1024 dimensions).

[1] "Number of columns:"
[1] 1024
[1] "Data frame:"
# A tibble: 15 × 1,024
        n0      n1     n2     n3    n4      n5      n6       n7        n8    n9
     <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl>   <dbl>    <dbl>     <dbl> <dbl>
 1  0.160   0.0716 -0.259 0.176  0.517 -0.0688 -0.199   0.0999   0.156    0.384
 2 -0.100  -0.111  -0.294 0.305  0.373 -0.227  -0.130   0.553    0.128    0.313
 3  0.0758  0.0861 -0.276 0.196  0.595 -0.232  -0.0155 -0.0915  -0.000393 0.333
 4  0.164   0.0172 -0.189 0.173  0.354 -0.296  -0.0317  0.0504  -0.0319   0.355
 5  0.163  -0.107  -0.330 0.124  0.542 -0.296  -0.141  -0.00439 -0.0609   0.255
 6  0.296   0.0430 -0.400 0.186  0.606 -0.0735 -0.186   0.0813   0.206    0.465
 7  0.180  -0.0658 -0.266 0.193  0.344 -0.111   0.0569 -0.0170   0.105    0.356
 8  0.175   0.0847 -0.329 0.233  0.535 -0.180  -0.121  -0.0474   0.00945  0.400
 9  0.143   0.0531 -0.116 0.183  0.615 -0.246  -0.171   0.103   -0.0468   0.294
10  0.163  -0.121  -0.335 0.0410 0.802 -0.342  -0.0733 -0.149    0.0699   0.147
11  0.182   0.122  -0.264 0.239  0.571 -0.0713 -0.170  -0.0525   0.0392   0.313
12  0.290  -0.233  -0.283 0.115  0.508 -0.461  -0.0274 -0.194   -0.0963   0.272
13  0.154  -0.0282 -0.264 0.278  0.540 -0.0221 -0.225   0.141    0.205    0.293
14  0.134   0.132  -0.391 0.229  0.414 -0.172  -0.0504  0.295    0.226    0.277
15  0.107   0.0469 -0.235 0.157  0.590 -0.129  -0.0529  0.160    0.102    0.193

I then ran hopkins as shown in the documentation:
hopkins(test, m=2)
This yields NaN when run as above, or "Error in numeric(3L^d) : vector size cannot be infinite" when using torus geometry.
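
Both symptoms are consistent with double-precision limits at d = 1024. The following is a minimal Python sketch of that arithmetic, not the package's code: the distance 0.3 is a made-up value, the helper `ratio` is hypothetical, and the link to the package internals is inferred only from the `numeric(3L^d)` text in the error message.

```python
import math
import sys

d = 1024  # number of columns reported in the post

# 3**d as an exact integer is far above the largest double (about 1.8e308),
# so a double-precision 3^d can only be represented as infinity.
exceeds_double = 3 ** d > sys.float_info.max

try:
    three_pow_d = 3.0 ** d
except OverflowError:
    # Python raises here; R returns Inf for the same operation instead.
    three_pow_d = math.inf

# A nearest-neighbour distance below 1 (0.3 is an assumed value) underflows
# to exactly 0.0 when raised to the power d.
dist_pow_d = 0.3 ** d

def ratio(num, den):
    """Divide like IEEE 754 arithmetic: 0/0 gives NaN instead of raising."""
    if den == 0.0:
        return math.nan if num == 0.0 else math.copysign(math.inf, num)
    return num / den

# If every powered distance underflows, a ratio of sums becomes 0/0.
stat = ratio(dist_pow_d, dist_pow_d + dist_pow_d)

print(exceeds_double, three_pow_d, dist_pow_d, stat)
```

Under these assumptions, an infinite vector length would explain the torus error, and 0/0 would explain the NaN.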

Another problem occurs when setting the number of samples equal to the number of rows (m = 100%, i.e., 15), which outputs: m must be no larger than num of samples (but m is actually equal to, not greater than, the number of samples).

Issues with extremely high-dimensional data

By this definition, is the Hopkins statistic inapplicable to extremely high-dimensional data, e.g. D = 4000+? The result is either 0 or Not a Number (Inf / Inf).

Or if this is used:
stat = 1 / (1 + sum(dwx^d) / sum( dux^d ) )
This will result in either 0 or 1. (I'm not a native English speaker, so I didn't understand what 'it is not for our test cases' in the code comments means. Why is this formula not used, or not usable?)
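
One way to evaluate the formula quoted above without overflow or underflow is to work with logarithms of the distances. The following Python sketch assumes `dux` and `dwx` are lists of positive nearest-neighbour distances, as in that formula. The function names and sample values are hypothetical, and this is not how the package computes the statistic.

```python
import math

def log_sum_pow(dists, d):
    """Return log(sum(x**d for x in dists)) without ever forming x**d."""
    logs = [d * math.log(x) for x in dists]
    top = max(logs)
    # Shift by the largest term so every exponent is <= 0 (log-sum-exp trick).
    return top + math.log(sum(math.exp(v - top) for v in logs))

def hopkins_log_space(dux, dwx, d):
    """Evaluate 1 / (1 + sum(dwx^d) / sum(dux^d)) using logarithms."""
    diff = log_sum_pow(dwx, d) - log_sum_pow(dux, d)
    if diff > 700.0:  # math.exp(diff) would overflow; the statistic is ~0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

# Low-dimensional sanity check: 1 / (1 + 16/9) = 0.36
low_dim = hopkins_log_space([3.0], [4.0], 2)

# Identical distances give exactly 0.5, even at d = 4000.
balanced = hopkins_log_space([0.3], [0.3], 4000)

# A difference of about 3% in one distance is enough to saturate at d = 4000.
saturated = hopkins_log_space([0.31, 0.30, 0.29], [0.30, 0.30, 0.30], 4000)

print(low_dim, balanced, saturated)
```

The result is always finite here, but with d in the thousands it still sits near 0 or 1 whenever the largest distances differ even slightly, because the ratio of sums behaves like (max dwx / max dux)^d. Working in log space removes the NaN, not the saturation.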

(I'm trying to check whether there is a clustering tendency in a questionnaire with 102 items. Since the items are the objects being clustered, the 'dimension' here is the 4000+ subjects who answered the questionnaire.)

So would it be fine to repeat the process, say 10000 times, and count the 1s and 0s?
Or is there another clustering tendency index suited to this situation?
