hopkins issues from kwstat

NaN or Error in numeric(3L^d) : vector size cannot be infinite, when using the hopkins method

I don't know whether I am doing something wrong but here's what's happening:

I have a data frame in the format shown below. It holds features extracted from 15 images of one class (1024 dimensions).

[1] "Number of columns:"
[1] 1024
[1] "Data frame:"
# A tibble: 15 × 1,024
        n0      n1     n2     n3    n4      n5      n6       n7        n8    n9
     <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl>   <dbl>    <dbl>     <dbl> <dbl>
 1  0.160   0.0716 -0.259 0.176  0.517 -0.0688 -0.199   0.0999   0.156    0.384
 2 -0.100  -0.111  -0.294 0.305  0.373 -0.227  -0.130   0.553    0.128    0.313
 3  0.0758  0.0861 -0.276 0.196  0.595 -0.232  -0.0155 -0.0915  -0.000393 0.333
 4  0.164   0.0172 -0.189 0.173  0.354 -0.296  -0.0317  0.0504  -0.0319   0.355
 5  0.163  -0.107  -0.330 0.124  0.542 -0.296  -0.141  -0.00439 -0.0609   0.255
 6  0.296   0.0430 -0.400 0.186  0.606 -0.0735 -0.186   0.0813   0.206    0.465
 7  0.180  -0.0658 -0.266 0.193  0.344 -0.111   0.0569 -0.0170   0.105    0.356
 8  0.175   0.0847 -0.329 0.233  0.535 -0.180  -0.121  -0.0474   0.00945  0.400
 9  0.143   0.0531 -0.116 0.183  0.615 -0.246  -0.171   0.103   -0.0468   0.294
10  0.163  -0.121  -0.335 0.0410 0.802 -0.342  -0.0733 -0.149    0.0699   0.147
11  0.182   0.122  -0.264 0.239  0.571 -0.0713 -0.170  -0.0525   0.0392   0.313
12  0.290  -0.233  -0.283 0.115  0.508 -0.461  -0.0274 -0.194   -0.0963   0.272
13  0.154  -0.0282 -0.264 0.278  0.540 -0.0221 -0.225   0.141    0.205    0.293
14  0.134   0.132  -0.391 0.229  0.414 -0.172  -0.0504  0.295    0.226    0.277
15  0.107   0.0469 -0.235 0.157  0.590 -0.129  -0.0529  0.160    0.102    0.193

I then tried to run hopkins as shown in the documentation:
hopkins(test, m=2)
This yields either NaN when run as above, or Error in numeric(3L^d) : vector size cannot be infinite when using the torus geometry.

Another problem occurs when setting the number of samples equal to the number of rows (m = 100%, i.e., 15), which outputs: m must be no larger than num of samples (but m is actually equal to, not greater than, the number of samples).
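
For what it's worth, the numeric(3L^d) error looks consistent with plain double-precision overflow. If d is the number of columns (an assumption; the post does not say how d was set), then 3^1024 is roughly 10^488, far above the largest finite double (about 1.8e308), so the requested vector length would be infinite. The sketch below checks that arithmetic in Python rather than R, purely as an illustration. power_as_double is a hypothetical helper, not part of the hopkins package.

```python
import math
import sys


def power_as_double(base, d):
    """Return base**d as a double, or inf when it exceeds the double range.

    Hypothetical helper for illustration only; it mimics how a double-precision
    power saturates to Inf instead of raising an error.
    """
    try:
        return math.pow(base, d)
    except OverflowError:
        return math.inf


d = 1024  # assumed: d equals the number of columns in the data frame
copies = power_as_double(3.0, d)
digits = d * math.log10(3)  # base-10 exponent of 3^d, computed without overflow

print(copies)                # inf
print(digits)                # base-10 exponent, well above 308
print(sys.float_info.max)    # largest finite double
```

Under that assumption, any d above roughly 646 would push 3^d past the double range, so the torus option is only practical for a handful of dimensions in any case.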

Issues about extremely high dimension data

By this definition, is the Hopkins statistic not applicable to extremely high-dimensional data, such as D = 4000+? The result is either 0 or Not a Number (Inf / Inf).

Or, if this is used:
stat = 1 / (1 + sum(dwx^d) / sum( dux^d ) )
This will result in either 0 or 1. (I'm not a native English speaker, so I don't understand what 'it is not for our test cases' in the annotations means. Why is this formula not used, or not usable?)

(I'm trying to check whether there is a clustering tendency in a questionnaire with 102 items. Since the items are what is being clustered, the 'dimension' here is the 4000+ subjects who answered the questionnaire.)

So would it be fine to repeat the process, say 10000 times, and count the 1s and 0s?
Or is there another clustering tendency index suited to this situation?
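
A possible way to see why the formula stat = 1 / (1 + sum(dwx^d) / sum(dux^d)) collapses to 0 or 1: the d-th powers can be summed in log space, which avoids the Inf / Inf problem, but for very large d each sum is dominated by its single largest distance, so the ratio is still pushed to an extreme. The Python sketch below is only an illustration of that arithmetic. The function names and the sample distances are made up, distances are assumed strictly positive, and this is not how the hopkins package computes its result.

```python
import math


def log_sum_pow(dists, d):
    """Return log(sum(x**d for x in dists)) without forming x**d directly.

    Assumes every distance is strictly positive (log(0) is undefined).
    """
    logs = [d * math.log(x) for x in dists]
    top = max(logs)
    # Factor out the largest term so every exponent is <= 0.
    return top + math.log(sum(math.exp(v - top) for v in logs))


def hopkins_from_distances(dux, dwx, d):
    """Evaluate 1 / (1 + sum(dwx^d) / sum(dux^d)) in log space."""
    diff = log_sum_pow(dwx, d) - log_sum_pow(dux, d)
    if diff > 700:
        # exp(diff) would overflow; the statistic is 0 to double precision.
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


# Made-up nearest-neighbour distances, for illustration only.
dux = [2.0, 2.1]  # from random points to the data
dwx = [1.9, 2.0]  # from sampled data points to the data

print(hopkins_from_distances(dux, dwx, 2))     # moderate value, close to 0.5
print(hopkins_from_distances(dux, dwx, 4000))  # saturates at 1.0
```

With the same distances, d = 2 gives a value near 0.5 while d = 4000 saturates, which suggests the 0-or-1 behaviour comes from the exponent itself rather than only from floating-point overflow.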
