
statsmodels / statsmodels


Statsmodels: statistical modeling and econometrics in Python

Home Page: http://www.statsmodels.org/devel/

License: BSD 3-Clause "New" or "Revised" License

Python 92.79% Assembly 0.06% AGS Script 2.89% HTML 0.93% R 0.67% Stata 0.32% C 0.01% MATLAB 0.63% Shell 0.16% Batchfile 0.01% Fortran 0.11% Cython 1.43%
python statistics econometrics data-analysis generalized-linear-models timeseries-analysis regression-models count-model data-science forecasting

statsmodels's Introduction


About statsmodels

statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation and inference for statistical models.

Documentation

The documentation for the latest release is at

https://www.statsmodels.org/stable/

The documentation for the development version is at

https://www.statsmodels.org/dev/

Recent improvements are highlighted in the release notes

https://www.statsmodels.org/stable/release/

Backups of documentation are available at https://statsmodels.github.io/stable/ and https://statsmodels.github.io/dev/.

Main Features

  • Linear regression models:
    • Ordinary least squares
    • Generalized least squares
    • Weighted least squares
    • Least squares with autoregressive errors
    • Quantile regression
    • Recursive least squares
  • Mixed Linear Model with mixed effects and variance components
  • GLM: Generalized linear models with support for all of the one-parameter exponential family distributions
  • Bayesian Mixed GLM for Binomial and Poisson
  • GEE: Generalized Estimating Equations for one-way clustered or longitudinal data
  • Discrete models:
    • Logit and Probit
    • Multinomial logit (MNLogit)
    • Poisson and Generalized Poisson regression
    • Negative Binomial regression
    • Zero-Inflated Count models
  • RLM: Robust linear models with support for several M-estimators.
  • Time Series Analysis: models for time series analysis
    • Complete StateSpace modeling framework
      • Seasonal ARIMA and ARIMAX models
      • VARMA and VARMAX models
      • Dynamic Factor models
      • Unobserved Component models
    • Markov switching models (MSAR), also known as Hidden Markov Models (HMM)
    • Univariate time series analysis: AR, ARIMA
    • Vector autoregressive models, VAR and structural VAR
    • Vector error correction model, VECM
    • Exponential smoothing, Holt-Winters
    • Hypothesis tests for time series: unit root, cointegration and others
    • Descriptive statistics and process models for time series analysis
  • Survival analysis:
    • Proportional hazards regression (Cox models)
    • Survivor function estimation (Kaplan-Meier)
    • Cumulative incidence function estimation
  • Multivariate:
    • Principal Component Analysis with missing data
    • Factor Analysis with rotation
    • MANOVA
    • Canonical Correlation
  • Nonparametric statistics: Univariate and multivariate kernel density estimators
  • Datasets: Datasets used for examples and in testing
  • Statistics: a wide range of statistical tests
    • diagnostics and specification tests
    • goodness-of-fit and normality tests
    • functions for multiple testing
    • various additional statistical tests
  • Imputation with MICE, regression on order statistic and Gaussian imputation
  • Mediation analysis
  • Graphics includes plot functions for visual analysis of data and model results
  • I/O
    • Tools for reading Stata .dta files, but pandas has a more recent version
    • Table output to ascii, latex, and html
  • Miscellaneous models
  • Sandbox: statsmodels contains a sandbox folder with code in various stages of development and testing that is not considered "production ready". This includes, among others:
    • Generalized method of moments (GMM) estimators
    • Kernel regression
    • Various extensions to scipy.stats.distributions
    • Panel data models
    • Information theoretic measures

How to get it

The main branch on GitHub has the most up-to-date code

https://www.github.com/statsmodels/statsmodels

Source downloads of release tags are available on GitHub

https://github.com/statsmodels/statsmodels/tags

Binaries and source distributions are available from PyPI

https://pypi.org/project/statsmodels/

Binaries can be installed in Anaconda

conda install statsmodels

Getting the latest code

Installing the most recent nightly wheel

The most recent nightly wheel can be installed using pip.

python -m pip install -i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple statsmodels --upgrade --use-deprecated=legacy-resolver

Installing from sources

See INSTALL.txt for requirements or see the documentation

https://statsmodels.github.io/dev/install.html

Contributing

Contributions in any form are welcome, including:

  • Documentation improvements
  • Additional tests
  • New features to existing models
  • New models

See https://www.statsmodels.org/stable/dev/test_notes for instructions on installing statsmodels in editable mode.

License

Modified BSD (3-clause)

Discussion and Development

Discussions take place on the mailing list

https://groups.google.com/group/pystatsmodels

and in the issue tracker. We are very interested in feedback about usability and suggestions for improvements.

Bug Reports

Bug reports can be submitted to the issue tracker at

https://github.com/statsmodels/statsmodels/issues

statsmodels's People

Contributors

alexbrc, bartbkr, bashtage, chadfulton, enricogiampieri, evgenyzhurko, fperez, j-grana6, jarrodmillman, jbrockmendel, josef-pkt, jseabold, kshedden, langmore, matthew-brett, phobson, rgommers, s-scherrer, tatamiya, thequackdaddy, tomaugspurger, tupui, tvanzyl, tyleha, vincentarelbundock, vincentdavis, wesm, yarikoptic, yl565, yogabonito


statsmodels's Issues

adfuller problem in autolag

Original Launchpad bug 661793: https://bugs.launchpad.net/statsmodels/+bug/661793
Reported by: josef-pktd (joep).

Running tsa.stattools as a script shows that the call to autolag uses variables that are not defined.

Trying to fix it blindly doesn't work:

    from var import AR  #move to top ?
    fullRHS = xdall   #JP: just guessing
    lagstart = 1
    icbest, bestlag = _autolag(AR, xdshort, fullRHS, lagstart,
            maxlag, autolag)

raise ValueError("Exogenous variables are not supported for AR.")

ValueError: Exogenous variables are not supported for AR.

Using

    icbest, bestlag = _autolag(AR, xdshort, None, lagstart,
            maxlag, autolag)

I get

    results[lag] = mod_instance.fit(*fitargs, **{maxlag:lag})

TypeError: fit() keywords must be strings

and adfuller should be renamed.
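The last TypeError is a plain Python issue rather than anything specific to AR: in **{maxlag: lag} the dict key is the value of the variable maxlag (presumably an integer here), not the string "maxlag", and keyword-argument unpacking only accepts string keys. A minimal sketch with a hypothetical stand-in fit function (not the statsmodels method):

```python
def fit(maxlag=None):
    """Hypothetical stand-in for a model's fit(); it just echoes maxlag."""
    return maxlag


maxlag = 4  # an integer, like the maximum lag in the autolag loop
lag = 2

# Buggy form: the dict key is the integer 4, so ** unpacking raises TypeError.
error_message = None
try:
    fit(**{maxlag: lag})
except TypeError as exc:
    error_message = str(exc)

# Fixed form: quote the keyword name so the key is the string "maxlag".
result = fit(**{"maxlag": lag})
```

Passing the keyword directly, fit(maxlag=lag), is the more idiomatic fix when the name is fixed.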

Convergence Failure with Newton, Bug in discrete GLM Poisson model

Original Launchpad bug 673197: https://bugs.launchpad.net/statsmodels/+bug/673197
Reported by: hanno-spreeuw (Hanno Starling).

updated diagnosis:
Newton, the default optimizer, does not converge to the correct solution. I didn't look at the details, but I guess the step size selection is not robust. The gradient in the example is very large. The example converges with Nelder-Mead.

original description:
The attachment has three columns with discrete data. Try scikits.statsmodels.discretemod.Poisson on the data.
The residuals of the middle column will all be negative, which is wrong.
This is because the slope is very close to zero which causes some bug in the algorithm.
The first and last column give good results.
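The updated diagnosis suspects non-robust step size selection in Newton's method. A common remedy is step halving: accept the full Newton step only if it does not lower the log-likelihood, otherwise halve it and try again. The sketch below applies this to a plain Poisson regression with a log link; it is an illustration under those assumptions, not the optimizer statsmodels uses, and poisson_newton is a hypothetical name.

```python
import numpy as np


def poisson_loglike(params, X, y):
    """Poisson log-likelihood without the constant -sum(log(y!))."""
    eta = X @ params
    return float(np.sum(y * eta - np.exp(eta)))


def poisson_newton(X, y, maxiter=100, tol=1e-10):
    """Newton-Raphson for Poisson regression with step halving."""
    params = np.zeros(X.shape[1])
    llf = poisson_loglike(params, X, y)
    for _ in range(maxiter):
        mu = np.exp(X @ params)
        score = X.T @ (y - mu)                 # gradient of the log-likelihood
        hessian = -(X.T * mu) @ X              # -X' diag(mu) X
        step = np.linalg.solve(hessian, -score)  # full Newton (ascent) direction
        step_size = 1.0
        while True:
            candidate = params + step_size * step
            new_llf = poisson_loglike(candidate, X, y)
            if new_llf >= llf or step_size < 1e-8:
                break
            step_size /= 2.0                   # overshot: halve the step
        if new_llf < llf:
            break                              # no improving step; keep params
        converged = new_llf - llf < tol
        params, llf = candidate, new_llf
        if converged:
            break
    return params
```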

numdifftools dependency

Original Launchpad bug 653902: https://bugs.launchpad.net/statsmodels/+bug/653902
Reported by: vincent-vincentdavis (Vincent Davis).

statsmodels/__init__.py imports tsa, which then raises an exception from statsmodels/tsa/var.py: raise Warning("You need to install numdifftools to try out the AR model").
Should numdifftools be a dependency for all of statsmodels?
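One common way to keep a dependency such as numdifftools optional is to attempt the import once and fail only when a function that needs it is actually called, so that importing the package itself never breaks. A minimal sketch; optional_import and numerical_hessian are hypothetical names, while numdifftools.Hessian(func)(x) is numdifftools' own Hessian interface.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


numdifftools = optional_import("numdifftools")


def numerical_hessian(func, x):
    """Hypothetical helper that needs numdifftools only when it is called."""
    if numdifftools is None:
        raise ImportError("numdifftools is required for numerical_hessian")
    return numdifftools.Hessian(func)(x)
```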

add hasconstant indicator for R-squared and df calculations

Original Launchpad bug 574004: https://bugs.launchpad.net/statsmodels/+bug/574004
Reported by: josef-pktd (joep).

see also Bug #440151: fvalue and mse_model are -inf if only 1 regressor

The R-squared and df calculations assume that there is a constant among the regressors.

todo: add a hasconstant indicator and adjust calculations for df and R-squared

Note: pandas recently made a similar change.

R-squared is not really well defined without a constant, and there are several competing definitions, but for the simple case of regression without a constant we should use the uncentered total sum of squares instead of the mean-corrected one (open question).

I would keep the R-squared with constant for transformed regressors that lose the constant in the regression (for example, after a transformation for heteroscedasticity).
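The two definitions under discussion differ only in the total sum of squares: with a constant, TSS is centered, sum((y - mean(y))**2); without one, the uncentered sum(y**2) keeps R-squared between 0 and 1 for least squares. A sketch of a has-constant check driving that choice; has_constant and rsquared are hypothetical helpers, and treating any nonzero constant column as "the constant" is an assumption.

```python
import numpy as np


def has_constant(exog):
    """True if some column of exog is constant and nonzero."""
    exog = np.asarray(exog, dtype=float)
    if exog.ndim == 1:
        exog = exog[:, None]
    col_range = exog.max(axis=0) - exog.min(axis=0)
    return bool(np.any((col_range == 0) & (exog[0] != 0)))


def rsquared(endog, exog, resid):
    """R-squared with centered TSS if exog has a constant, else uncentered."""
    endog = np.asarray(endog, dtype=float)
    ssr = np.sum(np.asarray(resid, dtype=float) ** 2)
    if has_constant(exog):
        tss = np.sum((endog - endog.mean()) ** 2)
    else:
        tss = np.sum(endog ** 2)
    return 1.0 - ssr / tss
```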

perfect fit (in rlm)

Original Launchpad bug 790770: https://bugs.launchpad.net/statsmodels/+bug/790770
Reported by: josef-pktd (joep).

Look at what happens in the models and model results when we have a perfect fit, and decide what to do in these cases.
The example is for rlm, but there might be similar problems elsewhere.

Mailing list, 2011-05-31:
for rlm, a zero residual no longer causes an exception, but a perfect fit can produce NaNs in some results (scale=0 and 0/0 division)

import numpy as np
import scikits.statsmodels.api as sm
x = np.array([0, 0.96, 2.18])
levels = np.array([2.8, 2.8, 2.8])
# prepend=False puts the constant in the last column, matching the params order below
res1 = sm.RLM(endog=levels, exog=sm.add_constant(x, prepend=False), M=sm.robust.norms.HuberT()).fit()
print(res1.params)
print(res1.sresid)
print(res1.scale)

>>> print(res1.params)
[  2.77555756e-17   2.80000000e+00]
>>> print(res1.sresid)
[ nan  nan  nan]
>>> print(res1.scale)
0.0
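The NaNs come from standardizing residuals by a scale of zero: in this exact fit every residual is (numerically) zero, so resid / scale evaluates 0/0. One possible convention, sketched with a hypothetical helper and not necessarily what statsmodels adopted, maps zero residuals to 0 and nonzero ones to plus or minus infinity when scale == 0.

```python
import numpy as np


def standardized_resid(resid, scale):
    """Divide residuals by scale, handling an exact fit (scale == 0) explicitly."""
    resid = np.asarray(resid, dtype=float)
    if scale == 0:
        # Zero residuals stay 0; any nonzero residual is infinitely far out.
        out = np.zeros_like(resid)
        nonzero = resid != 0
        out[nonzero] = np.sign(resid[nonzero]) * np.inf
        return out
    return resid / scale
```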

arma singular matrix

Original Launchpad bug 800012: https://bugs.launchpad.net/statsmodels/+bug/800012
Reported by: josef-pktd (joep).

Singular matrix with a zero-autocorrelation time series in ARMA?

import numpy as np
import scikits.statsmodels.api as sm

exog = sm.add_constant(np.random.randn(100, 2), prepend=True)
endog = exog.sum(1) + 0.2 * np.random.randn(100)

modarma = sm.tsa.ARMA(endog)
resarma = modarma.fit(order=(1,1))

Optimization terminated successfully.
         Current function value: 6.341930
         Iterations 12
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sm_overview.py", line 36, in <module>
    resarma = modarma.fit(order=(1,1))
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 375, in fit
    bounds=bounds, iprint=disp)
  File "C:\Python26\lib\site-packages\scipy\optimize\lbfgsb.py", line 196, in fmin_l_bfgs_b
    f, g = func_and_grad(x)
  File "C:\Python26\lib\site-packages\scipy\optimize\lbfgsb.py", line 142, in func_and_grad
    f = func(x, *args)
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 343, in <lambda>
    loglike = lambda params: -self.loglike_kalman(params)
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 203, in loglike_kalman
    return KalmanFilter.loglike(params, self)
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\kalmanf\kalmanfilter.py", line 632, in loglike
    Q_0 = dot(inv(identity(m**2)-kron(T_mat,T_mat)),
  File "C:\Python26\lib\site-packages\numpy\linalg\linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
  File "C:\Python26\lib\site-packages\numpy\linalg\linalg.py", line 328, in solve
    raise LinAlgError, 'Singular matrix'
numpy.linalg.linalg.LinAlgError: Singular matrix
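The failing line computes the unconditional covariance of the initial state. For a stationary transition matrix T it solves Q0 = T Q0 T' + RQR', written as vec(Q0) = (I - kron(T, T))^{-1} vec(RQR'). The matrix I - kron(T, T) is singular whenever the product of two eigenvalues of T equals 1, for example with a unit root in the AR part reached during optimization. The traceback cuts the line off after the inverse, so the right-hand side below is our assumption based on the standard stationary initialization; stationary_state_cov is a hypothetical name, and it uses np.linalg.solve instead of inv.

```python
import numpy as np


def stationary_state_cov(T, RQR):
    """Solve Q0 = T Q0 T' + RQR for the stationary state covariance.

    Uses column-major vec: vec(Q0) = (I - kron(T, T))^{-1} vec(RQR).
    np.linalg.solve raises LinAlgError if I - kron(T, T) is singular.
    """
    T = np.asarray(T, dtype=float)
    m = T.shape[0]
    lhs = np.eye(m * m) - np.kron(T, T)
    rhs = np.asarray(RQR, dtype=float).ravel(order="F")
    vec_q0 = np.linalg.solve(lhs, rhs)
    return vec_q0.reshape((m, m), order="F")
```

For an AR(1) with coefficient 0.5 and unit innovation variance this gives 1 / (1 - 0.25) = 4/3, while a coefficient of exactly 1 reproduces the singular-matrix error.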

IndentationError in generalized_linear_model.py and test_mvt_pdf AssertionError

There is an IndentationError in genmod/generalized_linear_model.py; line 917 has to be commented out.

Also, running the tests I got four warnings and one failure (besides the failures caused by the IndentationError):

In [2]: scikits.statsmodels.test()
Running unit tests for scikits.statsmodels
NumPy version 1.6.1
NumPy is installed in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy
Python version 2.7.2 (default, Aug 23 2011, 11:38:07) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
nose version 1.0.0
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scikits.statsmodels-0.3.0-py2.7.egg/scikits/statsmodels/tools/tools.py:256: FutureWarning: The default of `prepend` will be changed to True in the next release, use explicit prepend
  "next release, use explicit prepend", FutureWarning)
S............................................................................S.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................S.......S.........S.......S........S.........S.......S........S............../opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scikits.statsmodels-0.3.0-py2.7.egg/scikits/statsmodels/robust/scale.py:176: RuntimeWarning: divide by zero encountered in divide
  subset = np.less_equal(np.fabs((a - mu)/scale), self.c)
.........../opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py:3834: RuntimeWarning: overflow encountered in double_scalars
  Px /= sqrt(r*pi)*(1+(x**2)/r)**((r+1)/2)
..F.....Warning: The algorithm does not converge.  Roundoff error is detected
  in the extrapolation table.  It is assumed that the requested tolerance
  cannot be achieved, and that the returned result (if full_output = 1) is 
  the best which can be obtained.
................................................................................................/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scikits.statsmodels-0.3.0-py2.7.egg/scikits/statsmodels/tools/tools.py:310: RuntimeWarning: divide by zero encountered in divide
  return np.where(test, 0, 1. / X)
.............................................SSSSS...................................................................................................................................................................................................................................................................................................................................................................................................................
======================================================================
FAIL: scikits.statsmodels.sandbox.distributions.tests.test_multivariate.TestMVDistributions.test_mvt_pdf
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scikits.statsmodels-0.3.0-py2.7.egg/scikits/statsmodels/sandbox/distributions/tests/test_multivariate.py", line 161, in test_mvt_pdf
    decimal=18)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 452, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 800, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 18 decimals

(mismatch 66.6666666667%)
 x: array([ 0.00072768,  0.00099806,  0.00276614])
 y: array([ 0.00072768,  0.00099806,  0.00276614])

----------------------------------------------------------------------
Ran 1176 tests in 108.226s

FAILED (SKIP=15, failures=1)
Out[2]: <nose.result.TextTestResult run=1176 errors=0 failures=1>
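The printed arrays look identical because they are rounded. assert_almost_equal with decimal=18 requires an absolute difference below 1.5e-18, and for values around 1e-3 adjacent doubles are only about 1e-19 to 4e-19 apart, so a few units of round-off from a different order of operations already fail. A sketch comparing that check with a relative tolerance; the perturbed array is illustrative, not the actual test data, and the helper names are hypothetical.

```python
import numpy as np


def agrees_to_decimals(actual, desired, decimal=18):
    """True if np.testing.assert_almost_equal passes at this decimal."""
    try:
        np.testing.assert_almost_equal(actual, desired, decimal=decimal)
    except AssertionError:
        return False
    return True


def agrees_relatively(actual, desired, rtol=1e-12):
    """True if the arrays agree to relative tolerance rtol."""
    try:
        np.testing.assert_allclose(actual, desired, rtol=rtol, atol=0)
    except AssertionError:
        return False
    return True


# Values as printed in the failure; the tiny relative perturbation stands in
# for round-off from computing the pdf in a different way.
desired = np.array([0.00072768, 0.00099806, 0.00276614])
actual = desired * (1.0 + 1e-14)
```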

make link argument in Family._setlink more flexible

Original Launchpad bug 422327: https://bugs.launchpad.net/statsmodels/+bug/422327
Reported by: josef-pktd (joep).

setlink checks the link both in general (is it a Link instance?) and against the specific links allowed for the family, and raises a ValueError:

if not isinstance(link, L.Link):
validlink = link in self.links

In particular, this affects the use of Power with different values (reported by PierreGM).

Proposal:

Convert the exception to a warning: since it's up to the user to decide which links to use, using the "wrong" links might not make much statistical sense, but it shouldn't break the functioning of the model.

Check for specific links by their name attribute.

Allow user-specified links, not just links derived from the predefined links, e.g. an extension to the Box-Cox transformation.
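A minimal sketch of the proposal, using hypothetical classes rather than the statsmodels link hierarchy: links are looked up by their name attribute, a non-standard link triggers a UserWarning instead of a ValueError, and any object that carries a name is accepted.

```python
import warnings


class Link:
    """Hypothetical base class for link functions."""
    name = "link"


class Log(Link):
    name = "log"


class Power(Link):
    name = "power"

    def __init__(self, power=1.0):
        self.power = power


class PoissonLikeFamily:
    """Hypothetical family: warns, rather than raises, on a non-standard link."""
    standard_link_names = ("log", "identity", "sqrt")

    def __init__(self, link):
        self._setlink(link)

    def _setlink(self, link):
        # Check by name, so user-defined links need not subclass Link.
        name = getattr(link, "name", type(link).__name__)
        if name not in self.standard_link_names:
            warnings.warn(
                f"link {name!r} is not a standard link for this family; "
                "results may be hard to interpret",
                UserWarning,
            )
        self.link = link
```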

make all-pdf failure

Original Launchpad bug 532870: https://bugs.launchpad.net/statsmodels/+bug/532870
Reported by: nwagner.

[2]
Chapter 1.
[3] [4]
Chapter 2.
(/usr/share/texmf/tex/latex/psnfss/ts1ptm.fd) [5] [6] [7] [8]
Underfull \hbox (badness 10000) in paragraph at lines 518--521
\T1/ptm/m/n/10 Biopython is a set of tools for bi-o-log-i-cal com-pu-ta-tion []
[]http://biopython.org/wiki/Main_Page[][] Li-cense:
[9] [10] [11] [12] [13] [14]
Overfull \hbox (22.60109pt too wide) in paragraph at lines 1125--1126
[][]
[15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25]

! LaTeX Error: Something's wrong--perhaps a missing \item.

See the LaTeX manual or LaTeX Companion for explanation.
Type H for immediate help.
...

l.2347 \end{tabulary}

?
! Missing \endgroup inserted.

\endgroup
l.2347 \end{tabulary}

I am using


revno: 1955
committer: joep
branch nick: statsmodels_trunk3
timestamp: Mon 2010-02-15 12:54:04 -0500
message:
small changes to install notes, add sandbox/tests folder to manifest.in

on openSUSE 11.2

arma with exog

Original Launchpad bug 800011: https://bugs.launchpad.net/statsmodels/+bug/800011
Reported by: josef-pktd (joep).

Is this a bug?

import numpy as np
import scikits.statsmodels.api as sm

exog = sm.add_constant(np.random.randn(100, 2), prepend=True)
endog = exog.sum(1) + 0.2 * np.random.randn(100)
modarma = sm.tsa.ARMA(endog, exog)
resarma = modarma.fit(order=(1,1))

Optimization terminated successfully.
         Current function value: 5.533772
         Iterations 12
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sm_overview.py", line 36, in <module>
    resarma = modarma.fit(order=(1,1))
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 359, in fit
    start_params = self._fit_start_params((k_ar,k_ma,k))
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 76, in _fit_start_params
    start_params[:k] = ols_params
ValueError: shape mismatch: objects cannot be broadcast to a single shape

Logit and glm don't raise warning for perfect separation case

Original Launchpad bug 562376: https://bugs.launchpad.net/statsmodels/+bug/562376
Reported by: josef-pktd (joep).

see thread "discretemod.Logit vs glm" 2010-04-13

If a logit model has perfect separation, then the MLE does not exist and the optimizer cannot converge.

sm.GLM(Endog, Exog, family = sm.family.Binomial())

produces a division-by-zero warning and NaNs in some results.

sm.discretemod.Logit(Endog, Exog)

gives no indication that the optimization stopped because maxiter was reached rather than because it converged.

As SAS does, we should check convergence and whether the resulting estimate indicates perfect separation, and then issue a warning and add a flag to the results.
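A sketch of two checks such a warning could use, with hypothetical helper names: a fit-based flag (all fitted probabilities numerically equal to the observed 0/1 outcomes) and, for a single regressor, a direct test for complete separation. Both are simplified heuristics; quasi-complete separation, or separation by a combination of regressors, needs more work.

```python
import numpy as np


def looks_perfectly_separated(y, fitted_prob, atol=1e-8):
    """Flag fits whose probabilities all sit at the observed 0/1 outcomes."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(fitted_prob, dtype=float)
    return bool(np.all(np.abs(p - y) < atol))


def separated_by_single_regressor(x, y):
    """Complete separation on one regressor: the x values for y == 0 lie
    strictly on one side of all x values for y == 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    x0, x1 = x[y == 0], x[y == 1]
    return bool(x0.max() < x1.min() or x1.max() < x0.min())
```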

OLS fails with 1 exogenous variable

Original Launchpad bug 618283: https://bugs.launchpad.net/statsmodels/+bug/618283
Reported by: changshe (Chang She).

In [1]: import numpy as np

In [2]: import numpy.random as rand

In [3]: x, y = rand.normal(0.5, 1.3, 1200), rand.normal(1.3, 2.5, 1200)

In [4]: from scikits.statsmodels.regression import OLS

In [5]: OLS(y, x).fit()

ValueError Traceback (most recent call last)

H:\Workspace\Python\src in ()

Q:\GAARD\Prod\Apps\PythonRuntime\1.9\lib\site-packages\scikits.statsmodels-0.1.0b1-py2.5\statsmodels\regression.pyc in fit(self)
226 #TODO: add a full_output keyword so that only light results needed for
227 # IRLS are calculated?
--> 228 beta = np.dot(self.pinv_wexog, self.wendog)
229 # should this use lstsq instead?
230 # worth a comparison at least...though this is readable

ValueError: matrices are not aligned

In [6]: %debug

q:\gaard\prod\apps\pythonruntime\1.6\lib\site-packages\scikits.statsmodels-0.1.0b1-py2.5.egg\scikits\statsmodels\regression.py(228)fit()
227 # IRLS are calculated?
--> 228 beta = np.dot(self.pinv_wexog, self.wendog)
229 # should this use lstsq instead?

ipdb> self.pinv_wexog.shape
(1200, 1)
ipdb> self.wendog.shape
(1200,)
ipdb>
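The shapes above suggest that the 1-D exog of length 1200 was treated as a 1 x 1200 row, whose pseudoinverse is 1200 x 1, instead of as an n x 1 column. Reshaping a 1-D regressor into a column first gives the expected (1,) coefficient vector. A minimal sketch with a hypothetical ols_params helper, not the statsmodels code path:

```python
import numpy as np


def ols_params(endog, exog):
    """Least-squares coefficients; a 1-D exog is treated as one regressor."""
    endog = np.asarray(endog, dtype=float)
    exog = np.asarray(exog, dtype=float)
    if exog.ndim == 1:
        exog = exog[:, None]                   # shape (n,) -> (n, 1)
    return np.linalg.pinv(exog) @ endog        # pinv(exog) has shape (k, n)
```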

dev first guess returns nan for Poisson family GLM

Original Launchpad bug 603306: https://bugs.launchpad.net/statsmodels/+bug/603306
Reported by: amcmorl (amcmorl).

A Poisson family GLM fit to data with 0s in the endog variable fails with the error:

"ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported."

Problem can be reproduced with following code:

import numpy as np
import scikits.statsmodels as sm

size = 100000  # an integer; current NumPy rejects a float size in np.random.rand
nbeta = 3

beta = np.random.rand(nbeta)
x = np.random.rand(size, nbeta)
y = np.random.poisson(np.dot(x, beta))

fam = sm.families.Poisson()
glm = sm.GLM(y,x, family=fam)
res = glm.fit()

If family=fam is left out of GLM constructor, then glm.fit() works as expected.
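One plausible source of a nan in the first deviance evaluation is the term y * log(y / mu) at y = 0, which evaluates as 0 * (-inf) = nan unless the convention 0 * log(0) = 0 is applied explicitly. We have not confirmed that this is the cause in this report; the sketch only shows the convention, with hypothetical helper names. scipy.special.xlogy implements the same convention for x * log(y).

```python
import numpy as np


def y_log_y_over_mu(y, mu):
    """Return y * log(y / mu), using the convention 0 * log(0) = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)           # avoid log(0) where y == 0
    return np.where(y > 0, y * np.log(safe_y / mu), 0.0)


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu))."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return 2.0 * float(np.sum(y_log_y_over_mu(y, mu) - (y - mu)))
```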

kernel density

Original Launchpad bug 717511: https://bugs.launchpad.net/statsmodels/+bug/717511
Reported by: josef-pktd (joep).

I need a KDE estimator for quickly checking residuals. A univariate one is enough for diagnostic checks on residuals, but I need to be able to experiment with the bandwidth interactively.

Where is it?

Get the subclass of scipy.stats.gaussian_kde with a bandwidth option into statsmodels, or change the class in scipy.
sandbox/nonparametric needs testing.
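A minimal univariate Gaussian KDE with an explicit bandwidth argument, suitable for interactive bandwidth experiments on residuals. The function names are hypothetical, and Silverman's rule of thumb is included only as a starting value to adjust by hand. Current SciPy versions also accept a bw_method argument in scipy.stats.gaussian_kde.

```python
import numpy as np


def gaussian_kde_1d(data, grid, bandwidth):
    """Univariate Gaussian kernel density estimate at the grid points."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (data.size * bandwidth)


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.349) * n**(-1/5)."""
    data = np.asarray(data, dtype=float)
    iqr = np.percentile(data, 75) - np.percentile(data, 25)
    sigma = min(np.std(data, ddof=1), iqr / 1.349)
    return 0.9 * sigma * data.size ** (-0.2)
```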

GLS.fit() doesn't attach results to self

Original Launchpad bug 428895: https://bugs.launchpad.net/statsmodels/+bug/428895
Reported by: josef-pktd (joep).

GLS.fit() doesn't attach results to self; as a consequence, predict fails because of missing results.
GLS.results works correctly.

Shall we downgrade the fit() method and recommend the use of GLS.results instead, or keep duplicate functionality between fit (method) and results (property) ?

res = sm.OLS(y, X).fit()
res.model.predict(X)
Traceback (most recent call last):
  File "c:\josef\eclipsegworkspace\statsmodels_trunk2\scikits\statsmodels\regression.py", line 270, in predict
    raise ValueError, "If the model has not been fit, then you must specify the params argument."
ValueError: If the model has not been fit, then you must specify the params argument.

GenericLikelihoodModel and fittedvalues

Original Launchpad bug 717510: https://bugs.launchpad.net/statsmodels/+bug/717510
Reported by: josef-pktd (joep).

GenericLikelihoodModel and its results do not define fittedvalues

Check what minimal methods a subclass has to implement, so we can have some fittedvalues. The meaning of fittedvalues is not obvious in some non-linear or non-normal models.

Is defining a predict method in the model subclass enough? Is there some generic support?

Example: a Markov switching model with AR terms in each state. Estimation with GenericLikelihoodModel seems to work fine, although it's slow, but how do I get predictions, fitted values and residuals?

glm Binomial loglike 0log0 error

Original Launchpad bug 680077: https://bugs.launchpad.net/statsmodels/+bug/680077
Reported by: josef-pktd (joep).

reported by Sol 2010-11-22

sm.GLM(y.astype(float),
sm.add_constant(XYP.astype(float), prepend=True),
family=sm.families.Binomial()).fit()
Traceback (most recent call last):
File "<pyshell#37>", line 3, in
family=sm.families.Binomial()).fit()
File "c:\josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\glm.py", line 388, in fit
returned a nan. This could be a boundary problem and should be reported."
ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.
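As the title says, the problem is 0 log 0: in the Bernoulli log-likelihood, y * log(mu) with y = 0 and mu = 0 (or (1 - y) * log(1 - mu) with y = 1 and mu = 1) evaluates as 0 * (-inf) = nan. A sketch using a hypothetical xlogy helper with the convention 0 * log(0) = 0; scipy.special.xlogy provides the same behavior.

```python
import numpy as np


def xlogy(x, y):
    """Elementwise x * log(y) with the convention 0 * log(0) = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    safe_y = np.where(x == 0, 1.0, y)          # log is not taken at 0 where x == 0
    return np.where(x == 0, 0.0, x * np.log(safe_y))


def bernoulli_loglike(y, mu):
    """Bernoulli log-likelihood sum(y * log(mu) + (1 - y) * log(1 - mu))."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.sum(xlogy(y, mu) + xlogy(1.0 - y, 1.0 - mu)))
```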

Newey-West sandwich covariance is missing

Original Launchpad bug 430788: https://bugs.launchpad.net/statsmodels/+bug/430788
Reported by: josef-pktd (joep).

As far as I have seen, the current sandwich estimators only correct for heteroscedasticity but not for autocorrelation. Adding for example Newey-West sandwich covariance would be useful for data with a time dimension.

But I haven't checked whether the current sandwich estimators also work to some extent for autocorrelation.
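A sketch of the Newey-West (HAC) covariance for OLS with Bartlett weights w_l = 1 - l / (L + 1): the heteroscedasticity-only (HC0) middle matrix sum(u_t^2 x_t x_t') gets additional weighted lagged cross-products, so with maxlags=0 it reduces to HC0. newey_west_cov is a hypothetical name, and the choice of lag length is left to the caller.

```python
import numpy as np


def newey_west_cov(X, resid, maxlags):
    """Newey-West HAC covariance of OLS coefficients with a Bartlett kernel."""
    X = np.asarray(X, dtype=float)
    u = np.asarray(resid, dtype=float)
    xu = X * u[:, None]                        # score contributions x_t * u_t
    S = xu.T @ xu                              # lag 0: sum_t u_t^2 x_t x_t'
    for lag in range(1, maxlags + 1):
        weight = 1.0 - lag / (maxlags + 1.0)   # Bartlett weight
        gamma = xu[lag:].T @ xu[:-lag]         # sum_t (x_t u_t)(x_{t-lag} u_{t-lag})'
        S += weight * (gamma + gamma.T)
    xtx_inv = np.linalg.inv(X.T @ X)
    return xtx_inv @ S @ xtx_inv
```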

t_test, f_test, model.py for normal instead of t-distribution

Original Launchpad bug 647777: https://bugs.launchpad.net/statsmodels/+bug/647777
Reported by: josef-pktd (joep).

models.py LikelihoodModelResults

t_test assumes that the test statistic is t-distributed. If we only look at the asymptotic normal distribution, then we don't need df_residual.

Currently this is mainly a thought that needs checking.

model.py provides most of the generic methods that can be used with MLE models based on the asymptotic normal distribution.
See also the new SAS procedure PROC PLM.

Essentially, only params and cov_params are needed for the normal approximation.

I haven't thought yet about the degrees of freedom in f_test (the number of restrictions is fine, but I don't know whether we need df_resid).

conf_int is already overridden by some subclasses to use the normal instead of the t-distribution.
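As the notes say, the normal approximation needs only params and cov_params: z = beta / se, and the two-sided p-value is 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), with no df_resid involved. A standard-library sketch with a hypothetical z_test helper:

```python
import math


def z_test(params, cov_params):
    """z statistics and two-sided normal p-values for each parameter."""
    results = []
    for i, beta in enumerate(params):
        se = math.sqrt(cov_params[i][i])
        z = beta / se
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
        results.append((z, p_value))
    return results
```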

rpy dependency in tests

Original Launchpad bug 422330: https://bugs.launchpad.net/statsmodels/+bug/422330
Reported by: josef-pktd (joep).

class TestWLS in test_regression uses RModel in its __init__, which raises a NameError when rpy is missing: if rpy is not found, RModel is not imported and therefore not available.

Solution:
Move the call to RModel to the setup method, as we did for all other test classes.

robust covariance for tstat, ttest, ftest,

Original Launchpad bug 526164: https://bugs.launchpad.net/statsmodels/+bug/526164
Reported by: josef-pktd (joep).

(I haven't checked the source; this is from memory.)

We provide sandwich estimators, but they cannot be used directly for testing.

Enhancement proposal:
Find a way to set the parameter covariance matrix used by the various test measures, t(), f_test, t_test and other result statistics that depend on the parameter covariance matrix or standard errors.

For example, an option to set use_cov=HC0, or use_cov=NW once we also have Newey-West.

Not clear: as a new attribute or as a keyword option to t_test and f_test.

Naming of "fittedvalues" inconsistent

Original Launchpad bug 504782: https://bugs.launchpad.net/statsmodels/+bug/504782
Reported by: adrian-schlatter (Adrian Schlatter).

The results objects returned by OLS(...).fit() and RLM(...).fit() use different names for the fitted values.

OLS uses ".fittedvalues" while RLM uses ".fitted_values" (note the underscore).

I expect the naming to be the same in both objects.

Code to reproduce the bug:

import numpy as np
import scikits.statsmodels as sm

nsample=100
x = np.linspace(0,10, nsample)
X = sm.tools.add_constant(x)
beta = np.array([1, 0.1])
y = np.dot(X, beta) + np.random.normal(size=nsample)

resOLS = sm.OLS(y, X).fit()
resRLM = sm.RLM(y, X).fit()

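# positions in dir() depend on the version; check the attribute names directly
print(hasattr(resOLS, "fittedvalues"), hasattr(resOLS, "fitted_values"))
print(hasattr(resRLM, "fittedvalues"), hasattr(resRLM, "fitted_values"))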

design cannot be n x 1

Original Launchpad bug 434407: https://bugs.launchpad.net/statsmodels/+bug/434407
Reported by: jsseabold (Skipper Seabold).

Just so I remember: GLS does not currently work for an n x 1 design array.

import scikits.statsmodels as sm
data = sm.datasets.longley.Load()
data.exog = sm.add_constant(data.exog)
ols_res = sm.OLS(data.endog, data.exog).fit()
res = ols_res.resid
res_regression = sm.OLS(res[1:], res[:-1]).fit()  # regress the residuals on their first lag

ValueError: matrices are not aligned

The one time I tried to work on this, it took a little more attention than I expected or had time for.

UnboundLocalError: local variable 'wls_results' referenced before assignment (glm.py, 386)

Original Launchpad bug 422216: https://bugs.launchpad.net/statsmodels/+bug/422216
Reported by: david-warde-farley (David Warde-Farley).

It says that binomial family models can take 2D responses (presumably for multinomial regression?). Neither appears to work; am I doing something wrong?

Exactly what it says above.

In [141]: Y = zeros((700, 3))

In [142]: X = randn(700, 10)

In [143]: Y[arange(700),random_integers(3,size=700)-1] = 1

In [144]: a = sm.GLM(Y, X, family=sm.family.Binomial())

In [145]: a.fit()

UnboundLocalError Traceback (most recent call last)

/Users/dwf/ in ()

/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/scikits.statsmodels-0.1.0b1-py2.5.egg/scikits/statsmodels/glm.pyc in fit(self, maxiter, method, tol, data_weights, scale)
384 self.iteration += 1
385 self.mu = mu
--> 386 glm_results = GLMResults(self, wls_results.params,
387 wls_results.normalized_cov_params, self.scale)
388 glm_results.bse = np.sqrt(np.diag(wls_results.cov_params(\

UnboundLocalError: local variable 'wls_results' referenced before assignment

In [146]: a = sm.GLM(Y[:,:-1], X, family=sm.family.Binomial())

In [147]: a.fit()

UnboundLocalError Traceback (most recent call last)

/Users/dwf/ in ()

/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/scikits.statsmodels-0.1.0b1-py2.5.egg/scikits/statsmodels/glm.pyc in fit(self, maxiter, method, tol, data_weights, scale)
384 self.iteration += 1
385 self.mu = mu
--> 386 glm_results = GLMResults(self, wls_results.params,
387 wls_results.normalized_cov_params, self.scale)
388 glm_results.bse = np.sqrt(np.diag(wls_results.cov_params(\

UnboundLocalError: local variable 'wls_results' referenced before assignment
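The UnboundLocalError means wls_results was never assigned before line 386, presumably because the fitting loop never ran its body. Separately, as far as I know, current statsmodels reads a two-column Binomial endog as (successes, failures) per row rather than as multinomial categories. A one-hot three-column Y like the one above is a multinomial problem, and statsmodels has MNLogit for that. The NumPy sketch below fits a logit-link binomial GLM by IRLS under the two-column convention and initializes the estimate before the loop, so a result always exists. binomial_logit_irls is hypothetical, not statsmodels code.

```python
import numpy as np


def binomial_logit_irls(endog, exog, maxiter=100, tol=1e-10):
    """Fit a logit-link binomial GLM by IRLS (illustrative sketch only)."""
    endog = np.asarray(endog, dtype=float)
    exog = np.asarray(exog, dtype=float)
    if endog.ndim == 2:
        if endog.shape[1] != 2:
            raise ValueError("2-D endog must have two columns: (successes, failures)")
        n_trials = endog.sum(axis=1)
        y = endog[:, 0] / n_trials  # observed success proportion per row
    else:
        n_trials = np.ones_like(endog)
        y = endog
    # Start from a defined estimate so a result exists even if no iteration runs
    params = np.zeros(exog.shape[1])
    for _ in range(maxiter):
        eta = exog @ params
        mu = 1.0 / (1.0 + np.exp(-eta))
        var = mu * (1.0 - mu)
        z = eta + (y - mu) / var        # working response for the logit link
        sw = np.sqrt(n_trials * var)    # square roots of the IRLS weights
        new_params, *_ = np.linalg.lstsq(exog * sw[:, None], z * sw, rcond=None)
        converged = np.max(np.abs(new_params - params)) < tol
        params = new_params
        if converged:
            break
    return params


# Two groups: 3 successes / 7 failures, then 6 successes / 4 failures
exog = np.array([[1.0, 0.0], [1.0, 1.0]])
endog = np.array([[3.0, 7.0], [6.0, 4.0]])
params = binomial_logit_irls(endog, exog)
print(params)  # close to [log(3/7), log(3.5)]
```

With this grouped data the model is saturated, so the estimates should match logit(0.3) = log(3/7) and logit(0.6) - logit(0.3) = log(3.5).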

tsa.stattools.acf confint needs checking and tests

Original Launchpad bug 668759: https://bugs.launchpad.net/statsmodels/+bug/668759
Reported by: josef-pktd (joep).

The confint example in the file for tsa.stattools.acf didn't work; it raised a shape-mismatch exception.

I fixed the example so that it looks right to me, but this needs verification, and there appear to be no tests for it. The fix is currently only in my branch.

I didn't see the standard errors/variances themselves returned anywhere.

Maybe it would be better now to turn acf with statistics into a class with lazily evaluated (cached) attributes, instead of many options controlling what is returned.
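A minimal sketch of that class-based design, using functools.cached_property (Python 3.8+) so each statistic is computed once, on first access. ACFResults is a hypothetical name, not statsmodels API, and the variance uses Bartlett's approximation var(r_k) ≈ (1 + 2 Σ_{j<k} r_j²) / n, which is an assumption about the intended confint formula. The standard errors are exposed directly as bse, and confint always has shape (nlags + 1, 2).

```python
from functools import cached_property
from statistics import NormalDist

import numpy as np


class ACFResults:
    """Hypothetical ACF results object: each statistic is computed on first access."""

    def __init__(self, x, nlags=10, alpha=0.05):
        self.x = np.asarray(x, dtype=float)
        self.nobs = len(self.x)
        self.nlags = nlags
        self.alpha = alpha

    @cached_property
    def acf(self):
        # Lagged cross-products of the demeaned series, divided by the lag-0 sum
        xd = self.x - self.x.mean()
        acov = np.array([xd[: self.nobs - k] @ xd[k:] for k in range(self.nlags + 1)])
        return acov / acov[0]

    @cached_property
    def varacf(self):
        # Bartlett: var(r_k) ~ (1 + 2 * sum_{j=1}^{k-1} r_j**2) / nobs; r_0 = 1 exactly
        var = np.ones(self.nlags + 1) / self.nobs
        var[0] = 0.0
        var[2:] *= 1 + 2 * np.cumsum(self.acf[1:-1] ** 2)
        return var

    @cached_property
    def bse(self):
        return np.sqrt(self.varacf)

    @cached_property
    def confint(self):
        # Shape (nlags + 1, 2): lower and upper bound for each lag
        q = NormalDist().inv_cdf(1 - self.alpha / 2)
        return np.column_stack([self.acf - q * self.bse, self.acf + q * self.bse])


rng = np.random.default_rng(0)
res = ACFResults(rng.standard_normal(200), nlags=5)
print(res.confint.shape)  # (6, 2)
```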

genfromdta() fails under Win7

Original Launchpad bug 602426: https://bugs.launchpad.net/statsmodels/+bug/602426
Reported by: kfor (Kyle Foreman).

genfromdta() (in both statsmodels release 0.2.0 and the trunk as of July 6, 2010) fails when reading files under Windows 7. The same script with the same data file works fine on Linux. The test file I've been using is attached; the code (for release 0.2.0) is below:

import scikits.statsmodels
scikits.statsmodels.lib.io.genfromdta('./genfromdta_test.dta')

error: unpack requires a string argument of length 4 (line 349 of io.pyc)
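The report does not identify the cause. One common source of this exact struct error on Windows, offered here only as an assumption, is reading a binary file in text mode: newline translation or an early end-of-file at byte 0x1A can leave struct.unpack with fewer than 4 bytes. A sketch of the safe pattern in Python 3, with a hypothetical read_int32 helper:

```python
import os
import struct
import tempfile


def read_int32(fh, byteorder="<"):
    """Read one 4-byte signed integer from a file opened in binary mode."""
    buf = fh.read(4)
    if len(buf) != 4:
        # Fail with a clearer message than struct.error
        raise EOFError(f"expected 4 bytes, got {len(buf)}")
    return struct.unpack(byteorder + "i", buf)[0]


# Bytes that Windows text-mode I/O could mangle: CR (0x0D), LF (0x0A), 0x1A
payload = struct.pack("<i", 0x1A0A0D00)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name
try:
    with open(path, "rb") as fh:  # "rb", not "r": no newline or EOF translation
        value = read_int32(fh)
finally:
    os.remove(path)
print(hex(value))  # 0x1a0a0d00
```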

Replace mutables used as defaults in functions with placeholder defaults

Original Launchpad bug 562874: https://bugs.launchpad.net/statsmodels/+bug/562874
Reported by: m-j-a-crowe (Mike Crowe).

There are places where we have class constructor calls in function defaults, e.g.

def __init__(self, endog, exog, family=family.Gaussian()): 

The Gaussian constructor is called when the def statement executes (i.e., when the module is imported), not when the function is called; therefore, all calls to __init__ that use the default share the same instance of the default object.

If we want the default evaluated when the function is called, we need something like:

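def __init__(self, endog, exog, family=None):
    if family is None:
        # the parameter shadows the family module, so refer to the module
        # under another name here (e.g. imported as families)
        family = families.Gaussian()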

or

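def __init__(self, endog, exog, family=families.Default()):
    # families: the family module under another name, since the parameter shadows it;
    # Default is a placeholder marker class, so test its type rather than use `=`
    if isinstance(family, families.Default):
        family = families.Gaussian()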
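A small, self-contained demonstration of the difference, using a hypothetical Gaussian stand-in class that carries mutable state. When the default is built in the def line, a change made through one instance is visible from another; with the None placeholder, each call gets its own object.

```python
class Gaussian:
    """Hypothetical stand-in for a family object that carries mutable state."""

    def __init__(self):
        self.calls = 0


class SharedDefault:
    def __init__(self, family=Gaussian()):  # default built once, when `def` runs
        self.family = family


class FreshDefault:
    def __init__(self, family=None):
        if family is None:
            family = Gaussian()  # default built on every call
        self.family = family


a, b = SharedDefault(), SharedDefault()
a.family.calls += 1
print(a.family is b.family, b.family.calls)  # True 1

c, d = FreshDefault(), FreshDefault()
c.family.calls += 1
print(c.family is d.family, d.family.calls)  # False 0
```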

before release: test with older versions of numpy, scipy and with EPD

Original Launchpad bug 706053: https://bugs.launchpad.net/statsmodels/+bug/706053
Reported by: josef-pktd (joep).

Wes on mailing list 2011-01-21

test suite hangs with bfgs in EPD, numpy 1.4 scipy 0.8

latest 0.3-devel revision 2105

Hanging here in the call to fit:

class TestLogitBFGS(CheckModelResults, CheckMargEff):
    @classmethod
    def setupClass(cls):
        data = sm.datasets.spector.load()
        data.exog = sm.add_constant(data.exog)
        res2 = Spector()
        res2.logit()
        cls.res2 = res2
        cls.res1 = Logit(data.endog, data.exog).fit(method="bfgs",
                                                    disp=0)

t_test enhancement

Original Launchpad bug 647774: https://bugs.launchpad.net/statsmodels/+bug/647774
Reported by: josef-pktd (joep).

model.t_test only allows testing whether a linear function of the parameters equals zero; we also want to test, e.g., coefficient = 1.

This might be as easy as adding "- rhs_array" in models.py, but we need to check the required dimension of rhs_array: because we allow simultaneous t-tests, rhs need not be a scalar.

_effect = np.dot(r_matrix, self.params) - rhs_array

The analogous change for f_test has already been made.
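A minimal NumPy sketch of the proposed change, assuming q (playing the role of rhs_array) is either a scalar or has one entry per row of r_matrix. t_test_sketch is a hypothetical helper, not the models.py code.

```python
import numpy as np


def t_test_sketch(params, cov_params, r_matrix, q=0.0):
    """t statistics for H0: r_matrix @ params = q, one row per hypothesis."""
    params = np.asarray(params, dtype=float)
    cov_params = np.asarray(cov_params, dtype=float)
    r = np.atleast_2d(np.asarray(r_matrix, dtype=float))
    q = np.asarray(q, dtype=float)
    effect = r @ params
    # q may be a scalar (broadcast to every row) or one value per row of r_matrix
    if q.ndim > 0 and q.shape != effect.shape:
        raise ValueError(f"q must be scalar or have shape {effect.shape}, got {q.shape}")
    effect = effect - q
    sd = np.sqrt(np.diag(r @ cov_params @ r.T))
    return effect / sd


params = np.array([1.5, 2.0])
cov = np.diag([0.25, 1.0])
print(t_test_sketch(params, cov, np.eye(2), q=[1.0, 0.0]))  # [1. 2.]
print(t_test_sketch(params, cov, [[1.0, 0.0]], q=1.0))       # [1.]
```

A scalar q broadcasts over all hypotheses, so q = 0 keeps the old behaviour as the default.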

inconsistent signatures in predict

Original Launchpad bug 677914: https://bugs.launchpad.net/statsmodels/+bug/677914
Reported by: josef-pktd (joep).

model.Model.predict takes design as its argument, whereas regression.GLS takes exog and params=None.

Calculations in RegressionResults require params and pass them when calling predict.

Example: NonlinearLS subclasses Model and uses RegressionResults; accessing bse or cov_params raises an exception in wresid:
TypeError: predict() takes exactly 2 arguments (3 given)

I will adjust the signature in model.Model.
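A minimal sketch of one consistent convention, assuming every model's predict takes (params, exog=None) so that shared results code can call model.predict(self.params) on any subclass. The classes are hypothetical stand-ins, not the statsmodels hierarchy.

```python
import numpy as np


class Model:
    """Hypothetical base model: every predict has signature (params, exog=None)."""

    def __init__(self, endog, exog):
        self.endog = np.asarray(endog, dtype=float)
        self.exog = np.asarray(exog, dtype=float)

    def predict(self, params, exog=None):
        if exog is None:
            exog = self.exog  # default to in-sample fitted values
        return exog @ np.asarray(params, dtype=float)


class ExpModel(Model):
    """Nonlinear subclass: overrides predict but keeps the same signature."""

    def predict(self, params, exog=None):
        return np.exp(super().predict(params, exog))


class Results:
    """Shared results code that relies only on the common predict signature."""

    def __init__(self, model, params):
        self.model = model
        self.params = np.asarray(params, dtype=float)

    @property
    def wresid(self):
        return self.model.endog - self.model.predict(self.params)


X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 3.0])
print(Results(Model(y, X), [1.0, 2.0]).wresid)     # [0. 0.]
print(Results(ExpModel(y, X), [0.0, 0.0]).wresid)  # [0. 2.]
```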

incomplete installation

Original Launchpad bug 765712: https://bugs.launchpad.net/statsmodels/+bug/765712
Reported by: rosanna-smith (rosanna smith).

This is probably not a bug, but I don't know where else to ask.

When I installed scikits.statsmodels, it appeared to install successfully, with this message:

easy_install scikits.statsmodels
Searching for scikits.statsmodels
Best match: scikits.statsmodels 0.2.0
Processing scikits.statsmodels-0.2.0-py2.6.egg
scikits.statsmodels 0.2.0 is already the active version in easy-install.pth

Using /Library/Python/2.6/site-packages/scikits.statsmodels-0.2.0-py2.6.egg
Processing dependencies for scikits.statsmodels
Finished processing dependencies for scikits.statsmodels

But when I try to run code such as:

import scikits.statsmodels as sm

I get this message:

ImportError: No module named scikits.statsmodels

Can you point me to documentation that shows how to install scikits.statsmodels properly?
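easy_install reports the egg under /Library/Python/2.6/site-packages. One likely explanation, an assumption since the report does not say how the code is run, is that the failing import runs under a different Python interpreter whose search path does not include that directory. A small standard-library check for Python 3, with a hypothetical module_location helper, prints which interpreter is running and where, if anywhere, the package would be imported from:

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a module would be imported from, or None if not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "scikits") is itself missing
        return None
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("scikits.statsmodels:", module_location("scikits.statsmodels"))
for entry in sys.path:
    print("  search path:", entry)
```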

check for possible 0*log(0)

Original Launchpad bug 681444: https://bugs.launchpad.net/statsmodels/+bug/681444
Reported by: josef-pktd (joep).

discretemod still has several cases that could end up with 0*log(0).

Poisson is protected with np.clip.

Many other logs are unprotected. I don't think we have test cases for them, and I don't know how close to zero or one the values can get in different cases.

exog*beta is a float and might not get close enough to zero to create a problem (?)
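For logit, exog*beta itself is harmless, but the probability 1/(1 + exp(-x'b)) rounds to exactly 1.0 in float64 once x'b exceeds roughly 37. Then log(1 - p) is log(0), and with y = 1 the term (1 - y) * log(1 - p) becomes 0 * -inf = nan. The sketch compares a naive log-likelihood with one that computes log(p) and log(1 - p) directly via np.logaddexp. That is a swapped-in technique, not what discretemod currently does; np.clip, as used for Poisson, is the other common guard.

```python
import numpy as np


def logit_loglike_naive(y, xb):
    p = 1.0 / (1.0 + np.exp(-xb))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


def logit_loglike_stable(y, xb):
    # log(p) = -log(1 + exp(-xb)) and log(1 - p) = -log(1 + exp(xb)),
    # both evaluated without forming p, so log(0) cannot occur
    return np.sum(-y * np.logaddexp(0.0, -xb) - (1 - y) * np.logaddexp(0.0, xb))


y = np.array([1.0])
xb = np.array([40.0])  # 1 / (1 + exp(-40)) rounds to exactly 1.0 in float64
with np.errstate(divide="ignore", invalid="ignore"):
    print(logit_loglike_naive(y, xb))  # nan: 0 * log(0) = 0 * -inf
print(logit_loglike_stable(y, xb))     # about -4.2e-18
```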

discretemod predict typo

Original Launchpad bug 659923: https://bugs.launchpad.net/statsmodels/+bug/659923
Reported by: josef-pktd (joep).

In a code path that is not exercised by the tests, there is a typo: .results instead of _results.

>>> logit_mod.predict(linear=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\discretemod.py", line 162, in predict
    return np.dot(exog, self.results.params)
AttributeError: 'Logit' object has no attribute 'results'
>>>

add compare methods to Results

Original Launchpad bug 677947: https://bugs.launchpad.net/statsmodels/+bug/677947
Reported by: josef-pktd (joep).

See the thread "Comparing models", starting Oct 17, 2010.

Proposal by Thomas Wiecki.

http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental-gsoc/revision/2186

For now I made two methods out of it, compare_f_test and compare_lr_test, both attached to the regression results. I think compare_lr_test will move higher up in the class hierarchy so that it is also available for maximum likelihood models.

I like the more explicit names, but shorter names might be possible.

In the simple example, compare_f_test produces exactly the same result as f_test with the corresponding restriction matrix, as it should for linear models. The likelihood ratio test results are close to the f_test results, but I don't have an exact test for it yet.
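A NumPy sketch of both statistics for nested OLS models, useful for checking the claims above. The F statistic from the two fits matches the Wald F computed from the restriction matrix, and the Gaussian LR statistic is a monotone function of F, LR = n log(1 + q F / (n - k)), which explains why the two are close but not equal. The function names are hypothetical stand-ins, not the methods in the linked revision.

```python
import numpy as np


def ols_ssr(y, X):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid


def compare_f(ssr_full, ssr_restr, df_resid_full, df_diff):
    # F = [(SSR_r - SSR_f) / q] / [SSR_f / (n - k)]
    return ((ssr_restr - ssr_full) / df_diff) / (ssr_full / df_resid_full)


def compare_lr(ssr_full, ssr_restr, nobs):
    # -2 * (llf_r - llf_f) for Gaussian OLS with MLE variance SSR / n
    return nobs * np.log(ssr_restr / ssr_full)


def wald_f(y, X, R):
    """F statistic for H0: R @ beta = 0, computed from the full model only."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = ols_ssr(y, X) / (n - k)
    rb = R @ beta
    middle = R @ np.linalg.inv(X.T @ X) @ R.T
    return rb @ np.linalg.solve(middle, rb) / (R.shape[0] * s2)


rng = np.random.default_rng(12345)
n = 50
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(n)

ssr_full = ols_ssr(y, X)
ssr_restr = ols_ssr(y, X[:, :2])  # restricted model drops the last regressor
R = np.array([[0.0, 0.0, 1.0]])   # same restriction: last coefficient = 0

f_compare = compare_f(ssr_full, ssr_restr, n - 3, 1)
f_wald = wald_f(y, X, R)
lr = compare_lr(ssr_full, ssr_restr, n)
print(np.isclose(f_compare, f_wald))                       # True
print(np.isclose(lr, n * np.log1p(f_compare / (n - 3))))   # True
```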
