Hi folks, I get the following error when I try MCA. My data frame is 200 × 3.

IndexError in fs_r about mca HOT 9 CLOSED

grahnavard commented on June 11, 2024

IndexError in fs_r

Comments (9)

esafak commented on June 11, 2024

Please upload the data too; as little as needed for us to reproduce the error.

grahnavard commented on June 11, 2024

Here are the data I used, along with the code I wrote to call MCA. What happens if I have only one column? What I am interested in is getting the first dimension for all individuals as a representation of my data. Is there a way to get one dimension with high explained variance?
code:

data="""X   Y   Z
5   6   2
12  0   4
12  8   13
1   5   13
4   10  7
9   1   5
7   5   8
8   6   9
2   5   3
9   2   6
3   1   0
7   0   5
10  11  8
6   7   8
1   11  8
3   11  0
11  8   4
8   7   6
7   13  1
0   12  10
6   10  1
13  12  6
3   12  3
8   5   0
6   4   9
13  2   2
12  9   1
2   11  10
4   11  11
3   9   9
11  3   12
9   4   0
4   3   7
11  5   2
6   13  12
1   8   5
2   10  13
6   3   0
2   4   13
1   1   7
5   12  0
2   7   5
10  7   12
1   10  8
3   3   2
2   2   4
4   7   10
0   9   4
8   0   8
5   1   1
7   10  5
9   7   3
10  13  7
3   6   4
13  6   5
3   4   3
2   5   6
7   7   8
0   11  9
4   6   7
1   10  3
10  8   3
2   0   12
12  13  3
12  8   8
13  10  7
5   1   11
12  1   11
1   3   10
8   5   3
5   4   7
7   5   13
13  1   6
13  2   1
11  13  12
9   6   4
6   0   4
12  2   12
7   13  10
8   6   12
5   11  3
6   8   7
11  0   11
13  10  10
5   12  0
13  3   11
9   10  4
5   3   7
4   12  1
1   6   10
7   4   1
6   9   7
10  10  11
12  0   9
2   0   13
10  2   7
7   3   2
9   10  9
10  13  9
9   2   6
11  12  7
2   12  5
9   11  9
3   0   1
0   12  0
6   0   9
3   6   5
0   7   2
8   6   0
6   1   13
7   11  1
10  10  12
4   11  11
11  7   10
0   2   9
3   5   3
10  3   9
13  12  0
10  8   8
10  8   3
0   1   9
4   8   5
8   4   9
8   5   6
7   9   13
10  2   7
13  3   11
9   12  6
5   12  0
5   6   5
11  4   7
0   6   3
13  3   1
6   4   5
12  8   8
4   2   3
2   9   0
1   4   10
9   8   6
3   3   2
5   0   5
2   8   12
0   7   11
6   11  10
3   2   8
10  13  1
3   0   0
5   9   11
9   11  6
9   12  3
10  2   13
7   4   5
12  13  12
12  7   12
11  1   4
12  11  13
8   9   2
10  9   8
12  10  1
7   7   10
0   3   12
1   6   11
4   1   2
0   2   4
7   13  2
9   0   2
4   5   11
8   0   6
1   3   8
4   12  4
2   5   1
8   1   0
11  8   10
1   9   4
0   13  2
4   10  8
0   7   5
0   0   2
1   9   12
12  4   6
2   9   1
6   7   6
3   13  12
3   2   0
0   13  6
5   1   4
8   6   10
13  1   10
11  5   13
6   8   13
11  9   11
7   11  13
4   3   11
5   13  13
11  7   2
13  5   10
8   4   3
13  9   4
11  4   9
1   10  7"""
import pandas, mca, io
X = mca.MCA(pandas.read_csv(io.StringIO(data), sep='\t'),
    benzecri=True, TOL=1e-4, cols=None, ncols=None)
print(X.fs_r())

esafak commented on June 11, 2024

What's going on in your situation is that Benzecri correction is eliminating all your eigenvalues:

> print(mca.MCA(pandas.read_csv(io.StringIO(data), sep='\t'), benzecri=True).E)

array([0, 0, 0])

The problem goes away if you simply do not use Benzecri correction. Does this satisfy your concern, or do you think the package should be acting otherwise?

grahnavard commented on June 11, 2024

Thank you for your quick response. I still get the same error. Do you mean I should use benzecri=False? In either case it gives me the same error. Also, in your answer, what is the array whose elements are all zero?

esafak commented on June 11, 2024

If you set benzecri=False, fs_r() runs on the above data without error. If it doesn't, you might have to install the latest copy from GitHub rather than PyPI. The array in my previous answer holds the Benzecri-corrected eigenvalues. Recall that Benzecri correction involves thresholding the eigenvalues (cf. equation 7). In your case all the eigenvalues are less than the reciprocal of the number of dimensions (i.e., 1/3), hence they are mapped to zero:

> print(mca.MCA(pandas.read_csv(io.StringIO(data), sep='\s', header=None), benzecri=True).s**2)

array([  1.44950626e-01,   1.32977051e-01,   3.88439951e-31])
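The thresholding can be illustrated with a minimal sketch. It assumes the usual textbook form of the Benzecri correction (eigenvalues at or below 1/K are set to zero, the rest are rescaled) and is not taken from the mca package's source; `benzecri_correct` is a hypothetical helper name.

```python
import numpy as np

def benzecri_correct(eigenvalues, n_columns):
    """Textbook Benzecri correction (assumed form, not the mca package's code).

    Eigenvalues <= 1/K map to zero; the rest become
    ((K / (K - 1)) * (lambda - 1/K)) ** 2, where K is the number of columns.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    k = float(n_columns)
    threshold = 1.0 / k
    corrected = np.zeros_like(lam)
    keep = lam > threshold
    corrected[keep] = (k / (k - 1.0) * (lam[keep] - threshold)) ** 2
    return corrected

# Squared singular values quoted above, with K = 3 columns.
eigenvalues = [1.44950626e-01, 1.32977051e-01, 3.88439951e-31]
corrected = benzecri_correct(eigenvalues, n_columns=3)
print(corrected)  # every value is below 1/3, so nothing survives
```

With no nonzero eigenvalue left, there are no factor scores to return, which is consistent with the IndexError reported here.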

grahnavard commented on June 11, 2024

In fact, I was using the latest version from GitHub; the PyPI version doesn't have mca.MCA() and has to be called as mca.mca() instead. I have now re-installed it from PyPI and it works as you described. Do we have "expl_var" in the PyPI version? What happens if we have just one or two columns? I often get this error with the PyPI version:
  File "/Library/Python/2.7/site-packages/mca-1.0-py2.7.egg/mca.py", line 52, in __init__
    self.P, self.s, self.Q = scipy.linalg.svd(_mul(self.D_r, Z_c, self.D_c))
  File "/Library/Python/2.7/site-packages/scipy-0.15.1-py2.7-macosx-10.9-intel.egg/scipy/linalg/decomp_svd.py", line 88, in svd
    a1 = asarray_chkfinite(a)
  File "/Library/Python/2.7/site-packages/numpy/lib/function_base.py", line 613, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
Data:
0 1 2 3
0 0 0 0 0
1 4 4 2 3
2 2 1 1 1
3 0 3 2 1
4 3 1 4 2
5 1 0 0 0
6 2 2 2 2
7 4 2 4 3
8 4 3 4 5
9 6 4 6 4
10 6 1 5 4
11 0 0 0 0
12 5 6 6 5
13 4 2 1 1
14 6 5 6 6
15 3 4 3 4
16 6 4 3 4
17 2 1 1 0
18 1 5 2 2
19 0 0 0 0
20 6 5 6 5
21 5 6 6 6
22 3 6 6 5
23 3 0 2 1
24 0 2 0 1
25 5 1 4 1
26 1 1 1 2
27 0 0 0 0
28 2 6 4 5
29 2 0 1 0
30 5 6 5 6
31 4 3 4 3
32 1 3 4 3
33 4 6 6 6
34 6 6 5 6
35 5 4 3 3
36 3 2 2 2
37 6 4 5 5
38 5 5 5 5
39 5 3 5 3
40 2 5 3 4
41 3 3 3 4
42 0 3 1 2
43 3 5 3 4
44 1 4 3 6
45 1 2 2 3
46 2 1 1 2
47 1 2 0 1
48 4 5 5 6
49 0 0 0 0

Thank you.

esafak commented on June 11, 2024

Since the last pypi release, one contributor renamed mca to MCA, while another introduced the expl_var method.
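Code that has to run against either release could look up whichever name is available. The following is only a sketch: `resolve_mca_constructor` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for the installed module, used so the example runs without the package.

```python
from types import SimpleNamespace

def resolve_mca_constructor(module):
    """Return module.MCA if it exists, otherwise module.mca (hypothetical helper)."""
    for name in ("MCA", "mca"):
        constructor = getattr(module, name, None)
        if callable(constructor):
            return constructor
    raise AttributeError("module exposes neither MCA nor mca")

# Stand-ins for the two releases; real code would pass the imported mca module.
old_release = SimpleNamespace(mca=lambda df, **kwargs: ("old", df))
new_release = SimpleNamespace(MCA=lambda df, **kwargs: ("new", df))

print(resolve_mca_constructor(old_release)("data")[0])
print(resolve_mca_constructor(new_release)("data")[0])
```

In real use one would write `import mca` and call `resolve_mca_constructor(mca)`; whether the two releases accept the same keyword arguments is not something this thread confirms.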

It makes no sense to use MCA with one/two-dimensional data; it's a dimensionality reduction method, and you have nothing to reduce.

grahnavard commented on June 11, 2024

With one column, I believe the method should return the original data instead of raising an error, and with two columns it should still work. As the data above shows, the error I got can also happen with more than two columns.

esafak commented on June 11, 2024

In this case the problem is the all-zero rows, which cause division by zero during the calculation of the normalization factor D_r. My suggested remedy is to drop them:

data = ... # the 49-line string from your last post
newdf = pandas.read_csv(io.StringIO(data), sep='\s', index_col=0)
mca.MCA(newdf[newdf.sum(axis=1) != 0], benzecri=False).fs_r()

(Benzecri correction fails for the same reason as before.)
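To see why an all-zero row is a problem, here is a small sketch on toy data. It assumes the row normalization is one over the square root of each row sum, as in standard correspondence analysis; the package's exact computation of D_r may differ.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration); the first row is all zeros.
df = pd.DataFrame({"a": [0, 4, 2, 0], "b": [0, 4, 1, 3], "c": [0, 2, 1, 2]})

row_sums = df.sum(axis=1).to_numpy(dtype=float)
with np.errstate(divide="ignore"):
    # Assumed normalization: 1 / sqrt(row sum). A zero row sum gives inf.
    row_scale = 1.0 / np.sqrt(row_sums)

has_bad_scale = not np.isfinite(row_scale).all()
print(has_bad_scale)  # True: this is what makes the SVD input non-finite

# Dropping the all-zero rows, as suggested above, keeps every factor finite.
clean = df[df.sum(axis=1) != 0]
clean_scale = 1.0 / np.sqrt(clean.sum(axis=1).to_numpy(dtype=float))
print(np.isfinite(clean_scale).all())  # True
```

Checking `df.sum(axis=1) != 0` before calling the library is a cheap guard against the "array must not contain infs or NaNs" error.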
