Comments (4)
There is no set limit; it depends on your computer's memory. That's an interesting line for it to trip up at; I would have expected it to fail before it got there, if at all. Does your dataframe have categorical variables with high cardinality? Is subsampling an option? I can try it on my computer if you are allowed to share the data.
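Both questions can be checked quickly before fitting. Below is a minimal pandas sketch. It assumes every column is categorical, so each observed level becomes one indicator column. The helper names `mca_input_size` and `subsample` are hypothetical and are not part of mca or prince.

```python
import pandas as pd

def mca_input_size(df: pd.DataFrame) -> tuple:
    """Return (rows, indicator columns) of the one-hot matrix MCA would work on.

    Assumes every column is categorical: one indicator column per observed level
    (NaN is not counted, matching the default of pd.get_dummies).
    """
    return len(df), int(df.nunique().sum())

def subsample(df: pd.DataFrame, n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Random subset of rows, drawn without replacement, for a cheaper exploratory fit."""
    return df.sample(n=min(n_rows, len(df)), random_state=seed)

# Usage with a toy frame
df = pd.DataFrame({"color": ["red", "blue", "red", "green"],
                   "size": ["S", "M", "M", "L"]})
print(mca_input_size(df))     # (4, 6)
print(len(subsample(df, 2)))  # 2
```

High cardinality inflates the number of indicator columns. The number of rows matters even more for any implementation that builds a dense n × n row-mass matrix, because that memory grows quadratically: halving the rows cuts the matrix to a quarter of its size.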
@esafak Sorry, I cannot share the data. I have already converted my categorical data to binary indicator columns with OneHotEncoder.
@michelleowen Hi, have you solved this problem?
I get the same error. I wrote some demo code that reproduces it:

```python
import numpy as np
import pandas as pd
import prince  # the prince version in the traceback below takes the dataframe in MCA's constructor

n_rows, n_cols = 1244210, 20
# 20 columns, each filled at random with the values 1 or 2, over ~1.24 million rows
_temp_df = pd.DataFrame(
    data=np.random.choice([1, 2], size=(n_rows, n_cols)),
    columns=range(n_cols),
)
mac_result = prince.MCA(_temp_df, n_components=2)
```
```
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-85-38ea16b0891a> in <module>()
----> 1 mac_result = prince.MCA(_temp_df, n_components=2)

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/mca.py in __init__(self, dataframe, n_components, use_benzecri_rates, plotter)
     43             dataframe=pd.get_dummies(dataframe),
     44             n_components=n_components,
---> 45             plotter=plotter
     46         )
     47

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in __init__(self, dataframe, n_components, plotter)
     26         self._set_plotter(plotter_name=plotter)
     27
---> 28         self._compute_svd()
     29
     30     def _compute_svd(self):

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in _compute_svd(self)
     29
     30     def _compute_svd(self):
---> 31         self.svd = SVD(X=self.standardized_residuals, k=self.n_components)
     32
     33     def _set_plotter(self, plotter_name):

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in standardized_residuals(self)
    123         """
    124         residuals = (self.P - self.expected_frequencies).values
--> 125         return self.row_masses.dot(residuals).dot(self.column_masses)
    126
    127     @property

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in row_masses(self)
     99         represents the weight of the matching row; the non-diagonal cells are equal to 0.
    100         """
--> 101         return np.diag(1 / np.sqrt(self.row_sums))
    102
    103     @property

/home/libertatis/anaconda3/lib/python3.6/site-packages/numpy/lib/twodim_base.py in diag(v, k)
    247     if len(s) == 1:
    248         n = s[0]+abs(k)
--> 249         res = zeros((n, n), v.dtype)
    250         if k >= 0:
    251             i = k

MemoryError:
```
The failure happens at line 249 of numpy's `twodim_base.py`. There `n = 1244210`, the number of rows in the data, so `res = zeros((n, n), v.dtype)` tries to allocate an n × n float64 matrix of roughly 12.4 TB, far more than a typical machine's memory. The diagonal matrix comes from `row_masses` in prince's `ca.py`, which calls `np.diag` on a vector with one entry per row. I'm really stuck on this. PCA on the same data runs without error, but PCA is not suitable for categorical variables.
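The size estimate follows from the shape alone: a dense n × n float64 matrix takes 8·n² bytes. The sketch below computes this without allocating anything. `dense_diag_nbytes` is a hypothetical helper name.

```python
import numpy as np

def dense_diag_nbytes(n, dtype=np.float64):
    """Bytes np.diag would allocate for the (n, n) result built from a length-n vector."""
    return n * n * np.dtype(dtype).itemsize

n_rows = 1244210  # number of rows reported in the traceback
needed = dense_diag_nbytes(n_rows)
print(f"{needed:,} bytes (~{needed / 1e12:.1f} TB)")
# prints "12,384,468,192,800 bytes (~12.4 TB)"

# Sanity check on a tiny input: np.diag really does allocate n * n entries.
small = np.diag(np.ones(1000))
assert small.nbytes == dense_diag_nbytes(1000)
```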
MCA and PCA are similar dimensionality-reduction methods, so I think there must be a better way to implement MCA that avoids this problem. However, I can't fix it myself because I'm not an expert in this area.
The problem is forming the large dense diagonal matrices. To alleviate it, we can either use a sparse representation or avoid forming the matrices altogether using BLAS/LAPACK. I could not find a diagonal-matrix multiplication routine for the latter, so I went the sparse-matrix route.
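Here is a minimal NumPy/SciPy sketch of the sparse-matrix route. It is an illustration, not the actual patch. It assumes the standard correspondence-analysis definitions: P = N / N.sum(), row and column masses r and c are the row and column sums of P, and the standardized residuals are D_r^(-1/2) (P − r cᵀ) D_c^(-1/2). prince's internals may differ in detail. The function names are hypothetical. A dense reference is compared against `scipy.sparse.diags`. A broadcasting variant is also included. It never forms the diagonal matrices, which is one way to get the "avoid forming them altogether" option without a dedicated BLAS routine.

```python
import numpy as np
import scipy.sparse as sp

def _masses_and_residuals(N):
    """Correspondence matrix P, row/column masses r and c, and residuals P - r c^T."""
    P = N / N.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    return r, c, P - np.outer(r, c)

def standardized_residuals_dense(N):
    """Reference version: np.diag builds n x n and p x p matrices (O(n^2) memory)."""
    r, c, residuals = _masses_and_residuals(N)
    return np.diag(1 / np.sqrt(r)) @ residuals @ np.diag(1 / np.sqrt(c))

def standardized_residuals_sparse(N):
    """Sparse diagonals: only the n + p diagonal entries are stored."""
    r, c, residuals = _masses_and_residuals(N)
    D_r = sp.diags(1 / np.sqrt(r))
    D_c = sp.diags(1 / np.sqrt(c))
    # sparse @ dense gives a dense ndarray; asarray guards against matrix subclasses
    return np.asarray(D_r @ residuals @ D_c)

def standardized_residuals_broadcast(N):
    """No diagonal matrices at all: scale rows and columns by broadcasting."""
    r, c, residuals = _masses_and_residuals(N)
    return residuals / np.sqrt(r)[:, None] / np.sqrt(c)[None, :]

# Sanity check on a small, strictly positive count matrix: all three versions agree.
rng = np.random.default_rng(0)
N = rng.integers(1, 5, size=(50, 6)).astype(float)
S = standardized_residuals_dense(N)
assert np.allclose(S, standardized_residuals_sparse(N))
assert np.allclose(S, standardized_residuals_broadcast(N))
```

The broadcasting form is equivalent because multiplying by a diagonal matrix on the left scales rows and multiplying on the right scales columns. In every variant the n × p residual matrix is still dense, but its size grows linearly with the number of rows instead of quadratically.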