educationaltestingservice / factor_analyzer

A Python module to perform exploratory & confirmatory factor analyses.

License: GNU General Public License v2.0

Python 91.05% Jupyter Notebook 7.04% R 1.91%
factor-analysis efa python cfa


factor_analyzer's Issues

Methods to derive loadings

Hi, thanks a lot for providing this factor analysis package.

I am a student taking a multivariate analysis class, and I have learned three methods for deriving the loading matrix:

  1. Principal component method (ref : https://aaronschlegel.me/factor-analysis-principal-component-method-r.html)
  2. Principal factor method (ref : https://rpubs.com/aaronsc32/factor-analysis-principal-factor-method)
  3. Maximum likelihood method

I can see that the 'ml' method corresponds to the maximum likelihood method, and from the source code I guess the 'principal' method corresponds to the principal component method (please correct me if I am wrong). However, I cannot tell which method corresponds to the principal factor method; please let me know if possible.

Thanks a lot.
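For reference, the extraction method in the current API is selected via the method parameter. A minimal sketch of switching between them; whether 'principal' matches the textbook principal component or principal factor method is exactly the question above:

import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.RandomState(0)
data = rng.randn(100, 6)  # toy data: 100 observations, 6 variables

# method can be 'minres' (minimum residual, the default),
# 'ml' (maximum likelihood), or 'principal'
fa = FactorAnalyzer(n_factors=2, rotation=None, method='minres')
fa.fit(data)
print(fa.loadings_)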

Possibility to choose the correlation

Would it be possible to give the factor analysis a parameter for defining the correlation method used when the correlation matrix is calculated? Pandas' corr() function alone offers three methods (Pearson, Spearman, and Kendall) to choose from.
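In the meantime, one possible workaround is to compute the correlation matrix yourself and pass it in with the is_corr_matrix flag used elsewhere in this tracker; a minimal sketch:

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(100, 5), columns=list('abcde'))

# Compute the correlation matrix with the method you want...
corr = df.corr(method='spearman')  # or 'pearson', 'kendall'

# ...and hand it to FactorAnalyzer directly.
fa = FactorAnalyzer(n_factors=2, rotation='varimax', is_corr_matrix=True)
fa.fit(corr.values)
print(fa.loadings_)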

Can you share the formula and its derivation?

Hello, I'm studying your factor analysis module, but I don't know which formulas you use. Would it be convenient for you to share the basic derivation? I want to know how it differs from SPSS.
Thank you!

aic_ and bic_ output NaN

Hi,

I ran a CFA, but the aic_ and bic_ outputs are both NaN. Is there an example of computing AIC and BIC? And what might be the reason I get NaN?
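For reference, a minimal sketch of reading these attributes (the file name and model dictionary are placeholders). Since AIC and BIC are computed from the model log-likelihood, a failed or degenerate optimization that produces a NaN log-likelihood is one plausible cause of NaN values:

import pandas as pd
from factor_analyzer import (ConfirmatoryFactorAnalyzer,
                             ModelSpecificationParser)

df = pd.read_csv('my_data.csv')  # placeholder data set
model_dict = {'F1': ['x1', 'x2', 'x3'],  # hypothetical two-factor model
              'F2': ['x4', 'x5', 'x6']}
spec = ModelSpecificationParser.parse_model_specification_from_dict(df, model_dict)

cfa = ConfirmatoryFactorAnalyzer(spec, disp=False)
cfa.fit(df.values)
print(cfa.log_likelihood_, cfa.aic_, cfa.bic_)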

Factor score computation

Thanks for this package; I use it for my data analysis and it's friendly to use! In my opinion, the last step of a factor analysis is computing the factor scores, and this could be added!

ddof biased

First, thanks a lot for this module, much more convenient than the function in sklearn.

The covariance calculation (and hence the correlation) is biased, because the parameter is set to ddof=0:

r = np.cov(x, rowvar=False, ddof=0)

Any mathematical reason?
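A quick illustration of the difference; note that the n/(n-1) factor cancels when the covariance is normalized to a correlation, which may bear on the question:

import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(10, 3)

biased = np.cov(x, rowvar=False, ddof=0)    # divides by n
unbiased = np.cov(x, rowvar=False, ddof=1)  # divides by n - 1 (default)

# The two differ by a constant factor of n / (n - 1)...
print(np.allclose(biased * 10 / 9, unbiased))  # True

# ...which cancels when the covariance is normalized to a correlation,
# so the correlation matrix itself is the same either way.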

Migrate CI builds from Travis CI to Gitlab CI

Since Travis CI no longer seems to offer free open-source credits, we need to move to GitLab CI, which works with GitHub repos.

Here is the process to follow:

  1. Follow the steps here to set up a mirror repository on GitLab used only for CI/CD purposes.

  2. Go to the newly mirrored project on GitLab. Go to Settings > Integration. Choose "Slack notifications". Uncheck all checkboxes except "Enable Integration - Active" and "Pipeline". Paste the Slack webhook URL in the "Webhook" field. Uncheck "Notify only broken pipelines" and change "Branches to be notified" dropdown to "All Branches". Click the "Save Changes" button.

  3. In this Github repository, go to Settings > Branches > Branch Protection Rule. Edit the rule for the main branch. Look in the Status Checks section, uncheck "Travis CI - Branch" and "Travis CI - Pull Request" and hit the Save button at the bottom.

  4. Create a new branch with the same changes as shown in this SKLL PR. DO NOT SUBMIT A PULL REQUEST YET.

  5. Once the GitLab CI builds pass for this new branch, go back to the status checks from Step 3. You should now see a "ci/gitlab/gitlab.com" item. Enable it so that it becomes a required check, and then click the Save button again.

  6. Now submit a PR with this new branch.

Factor Loading - possible error

In lines 194-195 the eigenvalues are sorted but the eigenvectors are not:

    values = sorted(values, reverse=True)[:n_factors]
    vectors = vectors[:, :n_factors]

The factor loadings are then calculated in line 200:

        loadings = np.dot(vectors,
                          np.diag(np.sqrt(values)))

I think the vectors need to be sorted in the same order as the values; otherwise you are not multiplying the correct eigenvalues with the correct eigenvectors.
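A self-contained sketch of the suggested fix; note that np.linalg.eigh returns eigenvalues in ascending order, so truncating unsorted vectors keeps the smallest components:

import numpy as np

corr_mtx = np.array([[1.0, 0.6, 0.3],
                     [0.6, 1.0, 0.2],
                     [0.3, 0.2, 1.0]])
n_factors = 2

values, vectors = np.linalg.eigh(corr_mtx)

# Sort BOTH eigenvalues and eigenvectors in the same descending order,
# then truncate; this keeps each column paired with its own eigenvalue.
order = np.argsort(values)[::-1]
values = values[order][:n_factors]
vectors = vectors[:, order][:, :n_factors]

loadings = vectors @ np.diag(np.sqrt(values))
print(loadings)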

Bug when using oblimin rotation

Hi!

First of all, thanks for the awesome package; I've found it very useful. But there is something I cannot do: use the gamma parameter for the oblimin rotation.

When I execute FactorAnalyzer(n_factors=x[0], rotation=x[1], method=x[2], rotation_kwargs={'gamma': 0.5}, use_smc=True), the error raised is the following:

ValueError: Input must be 1- or 2-d.

It appears to come from here:

X = np.diag(1, p) - np.dot(np.zeros((p, p)), X)

To be completely honest, I don't know how to fix it, but I think what is needed is to implement X as in quartimin, minus

[ Math from here ]

KMO value is NaN

I tried to calculate the KMO values for 4 data sets as follows:

kmo_all1, kmo_total1 = calculate_kmo(df1)
kmo_all2, kmo_total2 = calculate_kmo(df2)
kmo_all3, kmo_total3 = calculate_kmo(df3)
kmo_all4, kmo_total4 = calculate_kmo(df4)

where df1, df2, df3, and df4 are 4 similar data sets.
I got kmo_total1, kmo_total3, and kmo_total4 higher than 0.85, while kmo_total2 is NaN. Why? I understand that KMO should be between 0 and 1. Incidentally, in kmo_all2 all of the values are NaN as well. How could that happen?

Does this support mixed data? i.e Binary valued vs numerical data

Hi jbiggsets,

Thanks for the package; it has been very helpful for my work. The lack of good statistical packages in Python compared to R is a bit of a showstopper, so this has indeed been a gem.

While using the module, I'm curious whether it can handle mixed data types as inputs, for example survey variables that can be either binary-valued or numerical.
Thanks!
Thusitha

Add structure matrices for oblique rotations

We should add the structure matrices for oblique rotations. With oblique rotations, the structure matrix contains the correlations between the factors and the variables. With orthogonal rotations this is not needed, because the structure matrix is identical to the loading matrix.
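With pattern loadings L and factor correlations Phi, the structure matrix is S = L Phi. A minimal sketch, assuming the fitted attributes loadings_ and phi_ hold the rotated pattern matrix and the factor correlation matrix:

import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.RandomState(0)
data = rng.randn(200, 6)

fa = FactorAnalyzer(n_factors=2, rotation='oblimin')  # oblique rotation
fa.fit(data)

# structure = pattern loadings @ factor correlations (S = L Phi);
# phi_ is the assumed attribute name for the factor correlation matrix.
structure = fa.loadings_ @ fa.phi_
print(structure)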

Clarification needed - How is the factor order determined?

When using the analyze function and retrieving the loadings dataframe; how is the ordering of the factor columns determined? (factor1, factor2, etc...)

I've been trying to get as close as possible to the way SAS does factor analysis (knowing that factor_analyzer is limited to 'ml' and MINRES while SAS uses PCA), and for a particular dataset (https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv) I got very close results in terms of the values in the loadings matrix, but found that factors 2 and 3 were swapped relative to SAS' output. I'd like to know how the ordering of the factor columns is determined, in order to account for that swap. I'm attaching the results from SAS and from this package in a colored Excel file to illustrate the behavior I'm referring to.

I would really appreciate some insight into this matter.
Thanks in advance.

corr_bh.xlsx

Very strange result from varimax FA

I'm making a web version of Kelly's rep test, in which we have a table of ratings of people on user-generated features, and we extract factors from this table with fa.transform(df). The resulting table always has values typical of a normal distribution (from about -3 to 3). But in one case I received a very strange result (see the attached image). I checked: without rotation the table looks typical, so it seems there is some error (or maybe not all tables are suitable for varimax rotation? but I also checked with sklearn's FA package, and it gave a normal result with varimax). Can you please check this case? Of course, I can send the table in any format.

[attached image]

GridSearchCV FactorAnalyzer error

Hi Jeremy,
How are you?
I ran into an error I cannot solve, and I can't find any clue on the internet. I tried to integrate a FactorAnalyzer into an sklearn GridSearchCV. As far as I know, since it has fit and transform methods it should work, shouldn't it? But unfortunately I'm running into the following error:

RuntimeError: Cannot clone object FactorAnalyzer(rotation_kwargs={}), as the constructor either does not set or modifies parameter impute

I tried to set impute directly in the constructor, and I also tried adding it to the param_grid dict, but neither works; same error. Is this a known issue? Should it work at all? Is there a workaround you know of? Maybe it has something to do with this answer?

Thanks a million in advance for your kind help!

Here is my code for reference:

ppln_pca_param_grid = {'decisiontreeclassifier__max_depth': np.arange(2, 15),
                   'decisiontreeclassifier__min_samples_leaf': np.arange(5,10),
                   'decisiontreeclassifier__ccp_alpha': np.arange(0., 0.003, 0.0003).tolist(),            
                   'decisiontreeclassifier__criterion': ['gini', 'entropy'],
                   'smote__sampling_strategy': np.arange(0.3, 1., 0.1).tolist(),
                   'factoranalyzer__n_components':[2, 3, 4, 5],
                   'factoranalyzer__impute':['median'],
                   'factoranalyzer__rotation':['varimax']                      
                    }

scaler = StandardScaler()
resampler = SMOTE(random_state=123)
fa = FactorAnalyzer()
dt = DecisionTreeClassifier(random_state=123)

imba_pca_pipeline = make_pipeline(resampler, scaler, fa, dt)

gscv_pca_imba_f1 = GridSearchCV(imba_pca_pipeline, ppln_pca_param_grid, scoring='f1', cv=10, n_jobs = 4, verbose = 2)
gscv_pca_imba_f1.fit(train_df_pca[features], train_df_pca['hit'])
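For context (not a confirmed diagnosis of this exact failure), sklearn's clone() rebuilds an estimator from get_params() and then checks that every constructor argument was stored unchanged; an __init__ that rewrites a parameter (for example, turning impute=None into a default value) triggers exactly this error. A minimal sketch of a clone-friendly estimator:

from sklearn.base import BaseEstimator, TransformerMixin

class CloneFriendly(BaseEstimator, TransformerMixin):
    def __init__(self, impute='median'):
        self.impute = impute  # store verbatim; no validation or rewriting here

    def fit(self, X, y=None):
        # defer any normalization of `impute` to fit time
        return self

    def transform(self, X):
        return X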

Results not the same as in R psych package.

The testing data is:

x1 x2 x3 x4 x5
0.2 3 21.5 13.6 7.3
0.5 8.4 33.5 12.1 9.8
1.3 13.6 49.4 13.6 5.2
1.3 11.6 47.5 13 4.9
2.4 17.7 45.2 12.5 3.4
0.5 12.7 50.3 10.1 5.1

In Python:

from factor_analyzer import FactorAnalyzer
fa=FactorAnalyzer()
fa.analyze(data,2,rotation='varimax')
fa.loadings

Then I got:
Factor1 Factor2
x1 0.958472 0.288052
x2 0.943449 -0.311609
x3 0.747067 -0.520007
x4 -0.016515 0.720617
x5 -0.791465 0.072079

In R:

b= read_excel("b.xlsx")
fa=principal(b,2,rotate='varimax')
fa$loadings  # view the loadings matrix

However, I got:
Loadings:
RC1 RC2
x1 0.917 0.261
x2 0.941 -0.247
x3 0.802 -0.460
x4 0.974
x5 -0.885

Loadings change relative position to each other after rotation

Describe the bug
I'm trying to orthogonally rotate loadings obtained from a PCA using the Rotator class; however, the loadings change relative position to each other after rotation. This is more evident when more components are kept before rotation; when only the two largest components are kept, the result almost matches the original data.

To Reproduce
Steps to reproduce the behavior:

  1. Download the data: https://we.tl/t-B3BJz8Gorz

  2. Run the following script:
from factor_analyzer import Rotator
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Read the loadings from the csv
df = pd.read_csv('loadings.csv', index_col=0)

# Plot the initial loadings
print('Original loadings')
df.plot(x=0, y=1, kind='scatter')
plt.axhline(0,color='black')
plt.axvline(0,color='black')
plt.xlim(-1,1)
plt.ylim(-1,1)
plt.show()

L = df.to_numpy()

# Loop over the number of components and remove the last each time

for _ in range(L.shape[1]-1):
    print(L.shape)

    rotator = Rotator(method='varimax', normalize=False)

    rotated_loadings = pd.DataFrame(rotator.fit_transform(L))

    rotated_loadings[[0,1]].plot(x=0, y=1, kind='scatter')
    plt.axhline(0,color='black')
    plt.axvline(0,color='black')
    plt.xlim(-1,1)
    plt.ylim(-1,1)
    plt.show()
    
    # Delete the smallest component
    L = np.delete(L, L.shape[1]-1, 1)

Expected behavior
I did not expect the relative positions of the loadings to change, nor for the number of components to have an impact on this. Maybe I have misunderstood varimax rotation, as I expected it only to rotate the axes and not to change the positions of the loadings relative to each other, or I'm encountering some kind of bug.
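For what it's worth, an orthogonal rotation preserves the row configuration in the full factor space; what changes is the 2-D projection onto the first two rotated axes, and the fitted rotation itself depends on how many columns are kept. A quick self-contained check (rotation_ is the assumed attribute name for the rotation matrix on Rotator):

import numpy as np
from factor_analyzer import Rotator

rng = np.random.RandomState(0)
L = rng.randn(20, 4)  # stand-in loading matrix

rotator = Rotator(method='varimax', normalize=False)
rotated = rotator.fit_transform(L)

T = rotator.rotation_
print(np.allclose(T.T @ T, np.eye(T.shape[1])))  # True: T is orthogonal
print(np.allclose(L @ T, rotated))               # True: rotated = L @ T

# Because T is orthogonal, distances between the rows of L are unchanged
# in the full 4-D space; a scatter of only the first two rotated columns
# is a projection that depends on all columns, which is why dropping
# components before rotating changes the picture.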

Loadings Different (negative) vs SPSS/R Psych Lib results

Based on a correlation matrix, the results calculated with factor_analyzer differ from those from SPSS: they seem to be multiplied by -1. The communalities are more or less equal.

Example Matrix below (12x12):

1.00 | 0.53 | 0.26 | 0.14 | 0.18 | 0.24 | 0.24 | 0.22 | 0.20 | 0.21 | 0.21 | 0.36
0.53 | 1.00 | 0.33 | 0.34 | 0.39 | 0.51 | 0.50 | 0.42 | 0.27 | 0.43 | 0.35 | 0.52
0.26 | 0.33 | 1.00 | 0.22 | 0.28 | 0.24 | 0.27 | 0.28 | 0.09 | 0.16 | 0.03 | 0.18
0.14 | 0.34 | 0.22 | 1.00 | 0.56 | 0.47 | 0.49 | 0.34 | 0.28 | 0.37 | 0.27 | 0.29
0.18 | 0.39 | 0.28 | 0.56 | 1.00 | 0.55 | 0.59 | 0.49 | 0.25 | 0.43 | 0.30 | 0.40
0.24 | 0.51 | 0.24 | 0.47 | 0.55 | 1.00 | 0.80 | 0.55 | 0.30 | 0.51 | 0.49 | 0.55
0.24 | 0.50 | 0.27 | 0.49 | 0.59 | 0.80 | 1.00 | 0.56 | 0.31 | 0.58 | 0.50 | 0.56
0.22 | 0.42 | 0.28 | 0.34 | 0.49 | 0.55 | 0.56 | 1.00 | 0.27 | 0.37 | 0.32 | 0.42
0.20 | 0.27 | 0.09 | 0.28 | 0.25 | 0.30 | 0.31 | 0.27 | 1.00 | 0.55 | 0.28 | 0.29
0.21 | 0.43 | 0.16 | 0.37 | 0.43 | 0.51 | 0.58 | 0.37 | 0.55 | 1.00 | 0.52 | 0.51
0.21 | 0.35 | 0.03 | 0.27 | 0.30 | 0.49 | 0.50 | 0.32 | 0.28 | 0.52 | 1.00 | 0.55
0.36 | 0.52 | 0.18 | 0.29 | 0.40 | 0.55 | 0.56 | 0.42 | 0.29 | 0.51 | 0.55 | 1.00

Code:

fa = FactorAnalyzer(method='minres', n_factors=1, rotation=None, is_corr_matrix=True, bounds=(0.005, 1))
fa.fit(fa_df)
print(fa.loadings_)

Result:
[[-0.3883726 ]
[-0.66186571]
[-0.32924641]
[-0.55939523]
[-0.66587561]
[-0.81117504]
[-0.8457027 ]
[-0.6317671 ]
[-0.44712954]
[-0.69460134]
[-0.58560195]
[-0.69916123]]

SPSS produces almost the same result, except each value is multiplied by -1 (i.e., positive/absolute values), while get_communalities() returns mostly the same values ("mostly" because SPSS rounds values on display).

Any idea what I am missing, or what the issue is?

Thanks
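For the record, factor loadings are only identified up to a sign flip per factor, so a column multiplied by -1 represents the same solution. A small hypothetical helper (not part of the package) to match SPSS's mostly-positive convention:

import numpy as np

def align_signs(loadings):
    """Flip each factor's sign so its largest-magnitude loading is
    positive, a common display convention; the fit is identical because
    loadings are only determined up to a sign per factor."""
    loadings = np.asarray(loadings, dtype=float)
    idx = np.argmax(np.abs(loadings), axis=0)
    signs = np.sign(loadings[idx, np.arange(loadings.shape[1])])
    signs[signs == 0] = 1.0
    return loadings * signs

# e.g. align_signs(fa.loadings_) should match the sign convention above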

Implement ten Berge factor scores

Is your feature request related to a problem? Please describe.
Nope!

Describe the solution you'd like
There are multiple ways to calculate factor scores. Currently, factor_analyzer implements the regression-based Thurstone method; however, the ten Berge method does a better job of preserving the original correlations.

Describe alternatives you've considered
The alternative is not to implement this.

Additional context
Here is a preliminary implementation:

https://stackoverflow.com/questions/67856186/correct-way-to-calculate-correlations-between-factors/67895893#67895893

ImportError: No module named utils

How to resolve this issue?

from factor_analyzer import FactorAnalyzer

     19 from sklearn.base import BaseEstimator, TransformerMixin
     20
---> 21 from factor_analyzer.utils import (corr,
     22                                    impute_values,
     23                                    partial_correlations,

ImportError: No module named utils

Remove dependency on `sklearn`

The factor_analyzer package only uses sklearn to perform linear regression and scaling. These should both be easy to accomplish without adding such a weighty dependency.
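A sketch of the two numpy equivalents, assuming plain standardization and ordinary least squares are all that is needed:

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = rng.randn(50)

# StandardScaler replacement: center and scale each column.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# LinearRegression replacement: ordinary least squares with an intercept.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)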

Issue about Bartlett's sphericity test

def calculate_bartlett_sphericity(x):
    """
    Test the hypothesis that the correlation matrix
    is equal to the identity matrix.

    H0: The matrix of population correlations is equal to I.
    H1: The matrix of population correlations is not equal to I.

    The formula for Bartlett's sphericity test is:

    .. math:: -1 * (n - 1 - ((2p + 5) / 6)) * ln(det(R))

    where det(R) is the determinant of the correlation matrix,
    and p is the number of variables.

    Parameters
    ----------
    x : array-like
        The array from which to calculate sphericity.

    Returns
    -------
    statistic : float
        The chi-square value.
    p_value : float
        The associated p-value for the test.
    """
    n, p = x.shape
    x_corr = corr(x)

    corr_det = np.linalg.det(x_corr)
    statistic = -np.log(corr_det) * (n - 1 - (2 * p + 5) / 6)
    degrees_of_freedom = p * (p - 1) / 2
    p_value = chi2.sf(statistic, degrees_of_freedom)
    return statistic, p_value

My English is not good: the line p_value = chi2.pdf(statistic, degrees_of_freedom) should be changed to use sf instead. Thanks to the original author; starred!

calculate_bartlett_sphericity error

The function calculate_bartlett_sphericity(x) produces the following warning:

/usr/local/lib/python3.6/dist-packages/factor_analyzer/factor_analyzer.py:118: RuntimeWarning: invalid value encountered in log
statistic = -np.log(corr_det) * (n - 1 - (2 * p + 5) / 6)

I verified that, with my data:

np.linalg.det(x_corr) = -3.545142402311294e-53

so the problem is taking the log of a negative number. Is there a problem with the test's formula?
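The determinant of a valid correlation matrix is non-negative, so a tiny negative value signals numerical rank deficiency (e.g. collinear or duplicated variables) rather than a flaw in the test formula. A sketch of a safer check using np.linalg.slogdet, which separates the sign from the log-magnitude:

import numpy as np

# A nearly singular correlation matrix: the determinant is ~2e-16 here,
# and for a truly rank-deficient matrix rounding error can even make
# np.linalg.det return a slightly negative number, whose log is nan.
x_corr = np.array([[1.0, 1.0 - 1e-16],
                   [1.0 - 1e-16, 1.0]])

sign, logdet = np.linalg.slogdet(x_corr)
if sign <= 0 or not np.isfinite(logdet):
    raise ValueError('correlation matrix is singular; '
                     'check for collinear variables')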

A problem

I am a beginner. I use Python 3.7 and entered the following code:
import pandas as pd
import numpy as np
from pandas import DataFrame,Series
from factor_analyzer import FactorAnalyzer

datafile = 'data.xlsx'
data = pd.read_excel(datafile)
data = data.fillna(0)  # fill missing values with 0

fa = FactorAnalyzer()
fa.analyze(data, 5, rotation=None)  # fix the number of common factors at 5
print("Communalities:\n", fa.get_communalities())
print("\nComponent matrix:\n", fa.loadings)
var = fa.get_factor_variance()  # contribution rates
print("\nTotal variance explained (contribution rates):\n", var)

fa_score = fa.get_scores(data)  # factor scores
fa_score.head()

# multiply each factor by its contribution rate and divide by the total
# contribution rate to get the intermediate factor scores
a = (fa.get_scores(data) * var.values[1]) / var.values[-1][-1]

# sum the intermediate factor scores to get the composite score
a['score'] = a.apply(lambda x: x.sum(), axis=1)

Running it gives this error:
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\leave.py", line 11, in
fa.analyze(data, 5, rotation=None)  # fix the number of common factors at 5
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\factor_analyzer\factor_analyzer.py", line 745, in analyze
method)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\factor_analyzer\factor_analyzer.py", line 580, in fit_factor_analysis
smc_mtx = self.smc(data).values
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\factor_analyzer\factor_analyzer.py", line 482, in smc
corr_inv = sp.linalg.inv(corr)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\scipy\linalg\basic.py", line 977, in inv
'getrf|getri' % -info)
ValueError: illegal value in 4-th argument of internal getrf|getri

How can I solve this? Help me, thank you.

Bug in `get_eigenvalues()`

There is a minor bug in the get_eigenvalues() method. The diagonal of the original correlation matrix can sometimes be permanently filled with the communality estimates. We should be copying the original correlation matrix before we fill the diagonal, since np.fill_diagonal() modifies things in place.

This can only happen after the factor model is estimated, so it's not a huge deal, but if we use the saved correlations matrix after calling get_eigenvalues(), it will be wrong.
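A minimal self-contained sketch of the proposed fix:

import numpy as np

corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
communalities = np.array([0.8, 0.7])

# np.fill_diagonal(corr, communalities) would overwrite `corr` in place
# and permanently lose the original diagonal; filling a copy instead
# leaves the saved correlation matrix intact.
corr_copy = corr.copy()
np.fill_diagonal(corr_copy, communalities)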

Add Geomin rotation

I'm doing work in python for which I'd like to use geomin rotation. I see that geomin isn't implemented in factor_analyzer. It looks straightforward to implement and I was considering doing so.

I think I'd have to create a _geomin_obj function in rotator.py that gives the gradient and criterion for geomin. The equations for this are in "Gradient Projection Algorithms and Software for Arbitrary Rotation Criteria in Factor Analysis", which R's GPArotation package references, so writing that code should be easy. Given a geomin objective, I believe Rotator should be able to use the generic _oblique rotation function?

The Rotator class would also need an epsilon parameter.

Any thoughts on this, and can you confirm my understanding?
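A preliminary sketch of the criterion and gradient in the Bernaards and Jennrich gradient projection form referenced above; _geomin_obj is the proposed (not yet existing) function name, and epsilon would come from the new Rotator parameter:

import numpy as np

def _geomin_obj(loadings, epsilon=0.01):
    """Geomin criterion and gradient for gradient projection rotation."""
    p, k = loadings.shape
    L2 = loadings ** 2 + epsilon
    # row-wise geometric mean of the (squared + epsilon) loadings
    pro = np.exp(np.log(L2).sum(axis=1) / k)
    criterion = pro.sum()
    gradient = (2.0 / k) * (loadings / L2) * pro[:, None]
    return gradient, criterion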

Add new factor rotations

There are several common factor rotations missing from factor_analyzer. It would be nice to add the following:

  • oblimax
  • quartimax
  • oblimin
  • quartimin
  • equamax

Incorporate CFA

This is a pretty big enhancement, but it may be useful to add CFA as well. Will look into how difficult this might be.

Equamax rotation giving similar results to quartimax instead

Describe the bug
Using equamax rotation gives the same results as quartimax to around 12 decimal places.

Quartimax results match with SPSS but not the equamax ones.

To Reproduce

import numpy as np
from factor_analyzer import FactorAnalyzer

data = np.random.randn(10,10)

equamax = FactorAnalyzer(n_factors=6, rotation='equamax', method='principal')
equamax.fit(data)
quartimax = FactorAnalyzer(n_factors=6, rotation='quartimax', method='principal')
quartimax.fit(data)

DECIMALS = 12

equal_loadings = np.array_equal(equamax.loadings_.round(DECIMALS), quartimax.loadings_.round(DECIMALS))
equal_rot_matrix = np.array_equal(equamax.rotation_matrix_.round(DECIMALS), quartimax.rotation_matrix_.round(DECIMALS))

print(equal_loadings, equal_rot_matrix)

This prints True True, but the loadings and rotation matrices should be different.

Expected behavior
The equamax and quartimax results should be different from each other AND closely match with SPSS.

If you try the example above but replace 'quartimax' with 'varimax' the results are different.

Desktop (please complete the following information):
Windows 10 x64, Python 3.9.7, factor_analyzer 0.3.2

Make issue template

We should make an issue template, telling users what information we need when they file an issue.

This should include details, such as:

  • What are you trying to do?
  • What have you tried so far?
  • What is the exact error you are getting?
  • Can you share your data set (or a sample), so that we can try to replicate the issue?

The optimization routine failed to converge: b'ABNORMAL_TERMINATION_IN_LNSRCH'

Hi, I ran cfa.fit() on a 33x119 dataset and got this error:

/Users/*/opt/anaconda3/lib/python3.7/site-packages/factor_analyzer/confirmatory_factor_analyzer.py:733: UserWarning: The optimization routine failed to converge: b'ABNORMAL_TERMINATION_IN_LNSRCH'
  'to converge: {}'.format(str(res.message)))
ConfirmatoryFactorAnalyzer(bounds=None, disp=True, impute='median',
                           is_cov_matrix=False, max_iter=200, n_obs=33,
                           specification=<factor_analyzer.confirmatory_factor_analyzer.ModelSpecification object at 0x1a1c2a7410>,
                           tol=None)

Here's the code I used:

model_dict = {"set_shifting": cued_cols+predict_cols,\ #15+15 columns
              "working_memory": DF_cols+nback_cols,\ #15+15 columns
              "attention": flanker_cols+shape_cols,\ #15+15 columns
              "response_inhibition": GNG_cols+stop_cols} #15+14 columns
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(concat_mainDVs, model_dict)
cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=True)
cfa.fit(concat_mainDVs.to_numpy())

Add weights attribute

Hi,

I would like to suggest a new attribute for the factor_analyzer.py module. Some applications require access to the weights (B) used to calculate F = Z · B in the transform method.

As an example, I leave a paper with such use (Table 9: Component score coefficient matrix):
https://www.researchgate.net/publication/227440018_Constructing_an_alternative_dollar_index_to_gauge_the_movements_in_currency_markets

Would it be possible to add a new attribute to access them? Maybe something like:

def __init__(self, ...):
    # ...
    self.weights_ = None


def transform(self, X):
    # ...
    try:
        self.weights_ = np.linalg.solve(self.corr_, structure)
    except Exception as error:
        warnings.warn('Unable to calculate the factor score weights; '
                            'factor loadings used instead: {}'.format(error))
        self.weights_ = self.loadings_
    # ...

# # # #
fa = FactorAnalyzer(...)
fa.fit(my_data)
fa.transform(my_data)
my_weights = fa.weights_

Thanks in advance

Add "Reorder factors after rotation" option to analyze(...) method

See issue #23 for context:

when SAS performs the varimax rotation on the factor loading matrix, it is re-ordering the columns according to the amount of variance accounted for by each factor after rotation. The factor_analyzer program is not doing that. It is maintaining the original order of the unrotated factor loading matrix after the rotation is performed.

This would allow the method to replicate SAS' behavior regarding column order in its output.
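A minimal sketch of the reordering itself, as a hypothetical helper not yet in the package:

import numpy as np

def reorder_by_variance(loadings):
    """Return the loading matrix with factor columns sorted so that the
    factor accounting for the most variance (largest sum of squared
    loadings) comes first, matching SAS's post-rotation convention."""
    variance = (loadings ** 2).sum(axis=0)
    order = np.argsort(variance)[::-1]
    return loadings[:, order]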

Fix rotation matrices

In R's fa(), the rotation matrices are inverted and transposed after the fact. We do not currently do this in factor_analyzer, so the rotation matrix results are not consistent. To address #12, this needs to be fixed.

Getting an error when trying to fit factor loadings to original data

Describe the bug
I'm doing factor analysis on a matrix with dimensions 17x153, so I have more columns than rows. Normally I would expect a PCA to return the same number of components as variables, but in my case it only returns 17. I get the correct factor loadings for the 17 components; however, when I try to apply the fit to the original data, in order to see how my original observations are distributed over the components, I get the warning below, and values that do not correspond with my tests.

C:\Users\UserName\anaconda3\envs\DS_env\lib\site-packages\factor_analyzer\factor_analyzer.py:740: UserWarning: Unable to calculate the factor score weights; factor loadings used instead: Singular matrix
  'factor loadings used instead: {}'.format(error))

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox

Additional context
Python version: 3.7.8

Scaling constants can differ between analyze() and get_scores()

Hi!
I'm a data scientist in Japan, and I'm really happy to be able to do factor analysis with rotation in Python.
However, one day I found the following problem.

When we try to calculate the factor score from data:

This may cause a problem when the data passed to analyze() and get_scores() are not the same.
(I think this situation is not so rare.)

I'd be really happy if this problem were fixed.
Thank you 😄
