Comments (7)

jbiggsets avatar jbiggsets commented on June 7, 2024 3

> I like the option of giving the users the choice. Is that a lot of work?

No, I think it should be pretty easy to implement. I can work on it with this next release, if we're aiming for mid-April.

from factor_analyzer.

jbiggsets avatar jbiggsets commented on June 7, 2024 2

Thanks for submitting this issue.

The way we're calculating the principal factors currently is using the randomized_svd() function from scikit-learn. This is a lot more efficient and resolves some of the intrinsic sign indeterminacy problems with SVD. However, it has the drawback that the number of factors is limited to K, where K = min(M, N). In your case, the number of rows is less than the number of columns, so it's constraining K to the number of rows.
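To make the K = min(M, N) limit concrete, here is a minimal sketch. It uses NumPy's `np.linalg.svd` as a stand-in (an assumption on my part; the package itself calls randomized_svd()), and the helper name `count_singular_values` is made up for illustration. The 20 x 250 shape mirrors the "fewer rows than columns" situation described above.

```python
import numpy as np


def count_singular_values(n_rows, n_cols, seed=0):
    """Return how many singular values an (n_rows x n_cols) matrix yields.

    Hypothetical helper for illustration only; not part of factor_analyzer.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_rows, n_cols))
    # compute_uv=False returns only the vector of singular values
    s = np.linalg.svd(X, compute_uv=False)
    return s.shape[0]


# 20 rows and 250 columns: only min(20, 250) singular values exist
print(count_singular_values(20, 250))  # 20
```

Whichever dimension is smaller caps the number of singular values, and therefore the number of factors that can be extracted from the decomposition.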

An alternative is to use scipy's svd() function with full_matrices=True (the default). The drawback here is that it's less efficient with larger data sets and, at least as far as I understand, does not have an easy way to resolve the sign indeterminacy the way randomized_svd() does when flip_sign=True. (If I'm wrong about that or you're aware of a work-around here, please let me know!)
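One possible work-around for the sign indeterminacy is to apply a fixed sign convention after the decomposition. The sketch below is an assumption-laden illustration, not the package's method: it uses `np.linalg.svd` (which takes the same `full_matrices` argument as scipy's svd()), and the helper name `svd_with_fixed_signs` is hypothetical. The convention is to flip each singular-vector pair so that the largest-magnitude entry of the left vector is positive, which, as far as I can tell, is the same idea as the `svd_flip` helper in `sklearn.utils.extmath`.

```python
import numpy as np


def svd_with_fixed_signs(X):
    """Full SVD with a deterministic sign convention (illustrative helper)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=True)
    k = s.shape[0]  # still min(M, N) singular values, even with full matrices
    cols = np.arange(k)
    # Row index of the largest-magnitude entry in each of the first k columns of U
    max_rows = np.argmax(np.abs(U[:, :k]), axis=0)
    signs = np.sign(U[max_rows, cols])
    signs[signs == 0] = 1.0  # guard against an all-zero column
    # Flip column i of U and row i of Vt together so the product is unchanged
    U[:, :k] *= signs
    Vt[:k, :] *= signs[:, np.newaxis]
    return U, s, Vt


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 250))
U, s, Vt = svd_with_fixed_signs(X)
# The flipped factors still reconstruct X
print(np.allclose((U[:, :20] * s) @ Vt[:20, :], X))
```

Because each flip is applied to a left vector and its matching right vector at the same time, the reconstruction is unaffected; only the arbitrary sign choice is pinned down.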

Assuming I'm not missing anything, one way to resolve this might just be to implement both options and give users the ability to decide? @desilinguist, if you have thoughts, let me know.

jbiggsets avatar jbiggsets commented on June 7, 2024 2

The more I look into this, the more I think that we shouldn't actually permit this use case, i.e., requesting more factors than min(n_samples, n_features). I still think we can allow users to choose between randomized_svd() and scipy.linalg.svd(), but scikit-learn's PCA() estimator also prevents users from running PCA when the number of components is set higher than min(n_samples, n_features).

import numpy as np
from sklearn.decomposition import PCA

# 20 samples x 250 features, so min(n_samples, n_features) = 20
X = np.random.randn(5000).reshape(20, 250)

# Request one more component than that limit
n = 21
for solver in ['full', 'randomized', 'arpack']:
    try:
        pca = PCA(n_components=n, svd_solver=solver)
        pca.fit(X)
    except ValueError as error:
        print(error)

Output:

n_components=21 must be between 0 and min(n_samples, n_features)=20 with svd_solver='full'
n_components=21 must be between 1 and min(n_samples, n_features)=20 with svd_solver='randomized'
n_components=21 must be between 1 and min(n_samples, n_features)=20 with svd_solver='arpack'

jbiggsets avatar jbiggsets commented on June 7, 2024 2

> Yeah, let's follow their lead and do the same then?

Yeah, I think that makes the most sense.
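For what it's worth, a check along these lines could look roughly like the sketch below. The function name `check_n_factors` and its arguments are hypothetical and chosen for illustration; this is not the package's actual API, and the error message just imitates the scikit-learn wording shown above.

```python
def check_n_factors(n_factors, n_samples, n_features):
    """Raise ValueError if n_factors exceeds min(n_samples, n_features).

    Hypothetical validation helper; names are assumptions for illustration.
    """
    limit = min(n_samples, n_features)
    if not 1 <= n_factors <= limit:
        raise ValueError(
            f"n_factors={n_factors} must be between 1 and "
            f"min(n_samples, n_features)={limit}"
        )
    return n_factors


# 20 rows x 250 columns: 20 factors is allowed, 21 is rejected
check_n_factors(20, n_samples=20, n_features=250)
try:
    check_n_factors(21, n_samples=20, n_features=250)
except ValueError as error:
    print(error)
```

Running the check before calling either SVD routine would give the same error regardless of which method the user picks.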

desilinguist avatar desilinguist commented on June 7, 2024 1

I like the option of giving the users the choice. Is that a lot of work?

desilinguist avatar desilinguist commented on June 7, 2024

Fantastic! Let's go for it.

desilinguist avatar desilinguist commented on June 7, 2024

Yeah, let's follow their lead and do the same then?
