Comments (4)

esafak commented on June 11, 2024

There is no set limit; it depends on your computer. That's an interesting line for it to trip up at. I would have expected it to fail before it got there, if at all. Does your dataframe have categorical variables with high cardinality? Is subsampling an option? I can try it on my computer if you are allowed to share the data.
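
For anyone who wants to answer the two questions above on their own data, here is a minimal sketch using pandas. The helper names `cardinality_report` and `subsample` are hypothetical and only for illustration; they are not part of the mca library.

```python
import pandas as pd


def cardinality_report(df):
    """Return the number of distinct values per column, largest first."""
    return df.nunique().sort_values(ascending=False)


def subsample(df, n_rows, seed=0):
    """Return a random subset of at most n_rows rows, drawn without replacement."""
    return df.sample(n=min(n_rows, len(df)), random_state=seed)


if __name__ == "__main__":
    # Toy data, made up for this example.
    df = pd.DataFrame({
        "color": ["red", "blue", "red", "green"],
        "size": ["S", "S", "M", "S"],
    })
    print(cardinality_report(df))
    print(subsample(df, n_rows=2))
```

Columns with very many distinct values expand into many indicator columns, and subsampling rows reduces the row count that drives the memory use.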

from mca.

michelleowen commented on June 11, 2024

@esafak Sorry, I cannot share the data. I have already converted my categorical data to binary columns with OneHotEncoder.

GoingMyWay commented on June 11, 2024

@michelleowen Hi, have you solved this problem?

I get the same error. I wrote some demo code that reproduces the issue:

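import numpy as np
import pandas as pd
import prince
import tqdm

# Build 20 columns of 1,244,210 random values drawn from {1, 2}.
_temp_data = []
for _ in tqdm.tqdm_notebook(range(20)):
    _temp_data.append(np.random.choice([1, 2], size=1244210))

_temp_df = pd.DataFrame(data=np.array(_temp_data).T, columns=range(20))

# In this prince version the constructor runs the SVD immediately (see traceback).
mac_result = prince.MCA(_temp_df, n_components=2)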
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-85-38ea16b0891a> in <module>()
----> 1 mac_result = prince.MCA(_temp_df, n_components=2)

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/mca.py in __init__(self, dataframe, n_components, use_benzecri_rates, plotter)
     43             dataframe=pd.get_dummies(dataframe),
     44             n_components=n_components,
---> 45             plotter=plotter
     46         )
     47 

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in __init__(self, dataframe, n_components, plotter)
     26         self._set_plotter(plotter_name=plotter)
     27 
---> 28         self._compute_svd()
     29 
     30     def _compute_svd(self):

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in _compute_svd(self)
     29 
     30     def _compute_svd(self):
---> 31         self.svd = SVD(X=self.standardized_residuals, k=self.n_components)
     32 
     33     def _set_plotter(self, plotter_name):

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in standardized_residuals(self)
    123         """
    124         residuals = (self.P - self.expected_frequencies).values
--> 125         return self.row_masses.dot(residuals).dot(self.column_masses)
    126 
    127     @property

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in row_masses(self)
     99             represents the weight of the matching row; the non-diagonal cells are equal to 0.
    100         """
--> 101         return np.diag(1 / np.sqrt(self.row_sums))
    102 
    103     @property

/home/libertatis/anaconda3/lib/python3.6/site-packages/numpy/lib/twodim_base.py in diag(v, k)
    247     if len(s) == 1:
    248         n = s[0]+abs(k)
--> 249         res = zeros((n, n), v.dtype)
    250         if k >= 0:
    251             i = k

MemoryError: 

At line 249 of numpy's twodim_base.py, n = 1244210, which is the number of rows in the data, so res = zeros((n, n), v.dtype) tries to allocate a dense n x n matrix. That is about 1.5 x 10^12 entries, or roughly 12 TB assuming 8-byte float64 values, which far exceeds the memory of a typical machine. I am really stuck on this issue. PCA runs on the same data without error, but PCA is not suitable for categorical variables.
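
To make the size of that allocation concrete, here is a small back-of-the-envelope sketch. The helper name `dense_diag_bytes` is hypothetical, and the float64 dtype is an assumption based on `1 / np.sqrt(...)` producing floating-point values.

```python
import numpy as np


def dense_diag_bytes(n, dtype=np.float64):
    """Bytes needed for the dense n x n array that np.diag allocates for a length-n vector."""
    return n * n * np.dtype(dtype).itemsize


if __name__ == "__main__":
    n_rows = 1244210  # number of rows in the demo dataframe
    print(f"{dense_diag_bytes(n_rows) / 1e12:.1f} TB")  # prints 12.4 TB
```

Only the n diagonal entries are non-zero, so almost all of that memory would hold zeros.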

MCA and PCA are similar dimensionality-reduction algorithms, so I think there must be a better way to implement MCA that avoids this issue. However, I can't fix it myself because I am not an expert in this area.

esafak commented on June 11, 2024

The problem is the formation of the large diagonal matrices. To alleviate the problem we can either use a sparse representation, or avoid forming the matrices altogether using BLAS/LAPACK. I could not find a diagonal matrix multiplication routine to do the latter, so I went the sparse matrix route.
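
As a rough illustration of the idea, and not the actual change made to the library, the sketch below scales a residual matrix by inverse square roots of row and column sums in two ways. The function names `scale_sparse` and `scale_broadcast` are hypothetical, and the inputs are assumed to be plain NumPy arrays. The first uses `scipy.sparse.diags`, which stores only the n diagonal values. The second uses NumPy broadcasting as a stand-in for the "avoid forming the matrices" option; it is not a BLAS/LAPACK routine.

```python
import numpy as np
from scipy import sparse


def scale_sparse(residuals, row_sums, col_sums):
    """Compute D_r @ residuals @ D_c using sparse diagonal matrices."""
    d_r = sparse.diags(1.0 / np.sqrt(row_sums))  # n x n, but only n stored values
    d_c = sparse.diags(1.0 / np.sqrt(col_sums))  # m x m, only m stored values
    left = d_r.dot(residuals)                    # sparse.dot(dense) gives a dense array
    # D_c is symmetric, so A @ D_c == (D_c @ A.T).T
    return d_c.dot(left.T).T


def scale_broadcast(residuals, row_sums, col_sums):
    """Same result without building any diagonal matrix at all."""
    return residuals / np.sqrt(row_sums)[:, None] / np.sqrt(col_sums)[None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    residuals = rng.normal(size=(6, 4))
    row_sums = rng.uniform(1.0, 2.0, size=6)
    col_sums = rng.uniform(1.0, 2.0, size=4)
    a = scale_sparse(residuals, row_sums, col_sums)
    b = scale_broadcast(residuals, row_sums, col_sums)
    print(np.allclose(a, b))
```

Both versions need memory proportional to the data itself rather than to the square of the number of rows.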
