Statsmodels: statistical modeling and econometrics in Python

Home Page: http://www.statsmodels.org/devel/

License: BSD 3-Clause "New" or "Revised" License

statsmodels's Introduction

About statsmodels

statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation and inference for statistical models.

Documentation

The documentation for the latest release is at

https://www.statsmodels.org/stable/

The documentation for the development version is at

https://www.statsmodels.org/dev/

Recent improvements are highlighted in the release notes

https://www.statsmodels.org/stable/release/

Backups of documentation are available at https://statsmodels.github.io/stable/ and https://statsmodels.github.io/dev/.

Main Features

Linear regression models:
- Ordinary least squares
- Generalized least squares
- Weighted least squares
- Least squares with autoregressive errors
- Quantile regression
- Recursive least squares
Mixed Linear Model with mixed effects and variance components
GLM: Generalized linear models with support for all of the one-parameter exponential family distributions
Bayesian Mixed GLM for Binomial and Poisson
GEE: Generalized Estimating Equations for one-way clustered or longitudinal data
Discrete models:
- Logit and Probit
- Multinomial logit (MNLogit)
- Poisson and Generalized Poisson regression
- Negative Binomial regression
- Zero-Inflated Count models
RLM: Robust linear models with support for several M-estimators.
Time Series Analysis: models for time series analysis
- Complete StateSpace modeling framework
  - Seasonal ARIMA and ARIMAX models
  - VARMA and VARMAX models
  - Dynamic Factor models
  - Unobserved Component models
- Markov switching models (MSAR), also known as Hidden Markov Models (HMM)
- Univariate time series analysis: AR, ARIMA
- Vector autoregressive models, VAR and structural VAR
- Vector error correction model, VECM
- Exponential smoothing, Holt-Winters
- Hypothesis tests for time series: unit root, cointegration and others
- Descriptive statistics and process models for time series analysis
Survival analysis:
- Proportional hazards regression (Cox models)
- Survivor function estimation (Kaplan-Meier)
- Cumulative incidence function estimation
Multivariate:
- Principal Component Analysis with missing data
- Factor Analysis with rotation
- MANOVA
- Canonical Correlation
Nonparametric statistics: Univariate and multivariate kernel density estimators
Datasets: Datasets used for examples and in testing
Statistics: a wide range of statistical tests
- diagnostics and specification tests
- goodness-of-fit and normality tests
- functions for multiple testing
- various additional statistical tests
Imputation with MICE, regression on order statistic and Gaussian imputation
Mediation analysis
Graphics includes plot functions for visual analysis of data and model results
I/O
- Tools for reading Stata .dta files, but pandas has a more recent version
- Table output to ascii, latex, and html
Miscellaneous models
Sandbox: statsmodels contains a sandbox folder with code in various stages of development and testing which is not considered "production ready". This covers among others
- Generalized method of moments (GMM) estimators
- Kernel regression
- Various extensions to scipy.stats.distributions
- Panel data models
- Information theoretic measures

How to get it

The main branch on GitHub has the most up-to-date code

https://www.github.com/statsmodels/statsmodels

Source downloads of release tags are available on GitHub

https://github.com/statsmodels/statsmodels/tags

Binaries and source distributions are available from PyPI

https://pypi.org/project/statsmodels/

Binaries can be installed in Anaconda

conda install statsmodels

Getting the latest code

Installing the most recent nightly wheel

The most recent nightly wheel can be installed using pip.

python -m pip install -i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple statsmodels --upgrade --use-deprecated=legacy-resolver

Installing from sources

See INSTALL.txt for requirements or see the documentation

https://statsmodels.github.io/dev/install.html

Contributing

Contributions in any form are welcome, including:

Documentation improvements
Additional tests
New features to existing models
New models

See

https://www.statsmodels.org/stable/dev/test_notes

for instructions on installing statsmodels in editable mode.

License

Modified BSD (3-clause)

Discussion and Development

Discussions take place on the mailing list

https://groups.google.com/group/pystatsmodels

and in the issue tracker. We are very interested in feedback about usability and suggestions for improvements.

Bug Reports

Bug reports can be submitted to the issue tracker at

https://github.com/statsmodels/statsmodels/issues

statsmodels's People

Contributors

josef-pkt jseabold wesm bartbkr chrisjordansquire neurodebian matthew-brett dieterv77 smc77 sgenoud jeffhsu3 taliastocks rgommers zed takluyver slojo404 arokem crp jwkvam bcui6611 divyanshubandil j-grana6 changhiskhan gpanterov bendmorris pprett andreas-h awblocker westurner langmore jbpoline edtenerife invinciblejha phobson guyrt sjsrey eph dmcdougall jonathan-taylor ucyixl escheffel virgilefritsch timmie dengemann pdevlieger zhisheng r0k3 joaonatali yangls06 ballacky13 danielballan code-fish greedo jacoblsmith mrtomato8 enricogiampieri ronncc daniel-b-smith jonbaer wabu solvi jmralves pyzen spencerogden dvabhishek toobaz mdelaurentis joonro yiwang djmarais zed9 rogerlew tyleha ankit-maverick chadfulton anamp jarrodmillman jburroni quoctran adam-m-mcelhinney mattions joehooper avishaylivne itsbanderson padarn jankatins rcdenne adriangebe rc djsutherland fnbellomo ev-br ajmarks thobardin tomekla bavardage tanecho0025 jsmonteiro devkhokhar lucciano

statsmodels's Issues

adfuller problem in autolag

Original Launchpad bug 661793: https://bugs.launchpad.net/statsmodels/+bug/661793
Reported by: josef-pktd (joep).

running tsa.stattools as a script shows that the call to autolag uses variables that are not defined

trying to fix it blindly doesn't work

    from var import AR  #move to top ?
    fullRHS = xdall   #JP: just guessing
    lagstart = 1
    icbest, bestlag = _autolag(AR, xdshort, fullRHS, lagstart,
            maxlag, autolag)

raise ValueError("Exogenous variables are not supported for AR.")

ValueError: Exogenous variables are not supported for AR.

using
icbest, bestlag = _autolag(AR, xdshort, None, lagstart,
maxlag, autolag)

I get

results[lag] = mod_instance.fit(*fitargs, **{maxlag:lag})

TypeError: fit() keywords must be strings
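The TypeError is consistent with `{maxlag:lag}` using the value of the variable `maxlag` as the dictionary key instead of the string "maxlag". A minimal sketch of the difference, where `fit` is a hypothetical stand-in and not the actual model method:

```python
def fit(maxlag=None):
    """Hypothetical stand-in for a model's fit method."""
    return maxlag


maxlag = 4  # an integer, as in the autolag loop
lag = 2

try:
    fit(**{maxlag: lag})        # the key is the integer 4, not the name "maxlag"
    error_message = None
except TypeError as exc:
    error_message = str(exc)

result = fit(**{"maxlag": lag})  # the key is the keyword name as a string
```

With the string key, `lag` is passed through as the `maxlag` keyword.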

and adfuller should be renamed

Convergence Failure with Newton, Bug in discrete GLM Poisson model

Original Launchpad bug 673197: https://bugs.launchpad.net/statsmodels/+bug/673197
Reported by: hanno-spreeuw (Hanno Starling).

updated diagnosis:
Newton, the default optimizer, does not converge to the correct solution. I didn't look at the details but I guess the stepsize selection is not robust. The gradient in the example is very large. The example converges with Nelder-Mead.

original description:
The attachment has three columns with discrete data. Try scikits.statsmodels.discretemod.Poisson on the data.
The residuals of the middle column will all be negative, which is wrong.
This is because the slope is very close to zero which causes some bug in the algorithm.
The first and last column give good results.

Reminder that chain_dot has two typos.

Original Launchpad bug 571457: https://bugs.launchpad.net/statsmodels/+bug/571457
Reported by: jsseabold (Skipper Seabold).

sm.tools.chain_dot has a typo in the docs for the creation of the C matrix. Also matrices was not changed to arrs when it was moved over. This needs to be fixed (and add a test).
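For reference, a helper of this kind can be sketched with `functools.reduce`. This is an illustration of the idea, assuming the argument is called `arrs` as mentioned above; it is not a copy of the function in the package.

```python
from functools import reduce

import numpy as np


def chain_dot(*arrs):
    """Return the dot product of the given arrays, evaluated left to right."""
    return reduce(np.dot, arrs)


A = np.arange(1, 13).reshape(3, 4)
B = np.arange(3, 15).reshape(4, 3)
C = np.arange(5, 8).reshape(3, 1)
product = chain_dot(A, B, C)   # same as A.dot(B).dot(C)
```

A test for the real function could compare its result against the nested `dot` calls in the same way.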

numdifftools dependency

Original Launchpad bug 653902: https://bugs.launchpad.net/statsmodels/+bug/653902
Reported by: vincent-vincentdavis (Vincent Davis).

statsmodels/__init__.py imports tsa, which then raises an exception from statsmodels/tsa/var.py: "raise Warning("You need to install numdifftools to try out the AR model")"
Should numdifftools be a dependency for all of statsmodels?

add hasconstant indicator for R-squared and df calculations

Original Launchpad bug 574004: https://bugs.launchpad.net/statsmodels/+bug/574004
Reported by: josef-pktd (joep).

see also Bug #440151: fvalue and mse_model are -inf if only 1 regressor

The R-squared and df calculations assume that there is a constant among the regressors.

todo: add a hasconstant indicator and adjust calculations for df and R-squared

Notes: pandas recently made a similar change

R-squared is not really well defined without constant and there are several competing definitions, but for the simple case of regression without constant we should use the total sum of squares instead of mean corrected ???

I would keep the R-squared with constant for transformed regressors that lose the constant in the regression (for example, when correcting for heteroscedasticity).
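The two candidate definitions can be compared in a few lines of NumPy. This is only a sketch: the `hasconstant` check is an assumed heuristic (a nonzero column with zero range), and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, size=(50, 1))         # single regressor, no constant column
y = 2.0 * x[:, 0] + rng.normal(size=50)

beta, *_ = np.linalg.lstsq(x, y, rcond=None)
ssr = np.sum((y - x @ beta) ** 2)

# Assumed heuristic: a constant is present if some nonzero column never varies.
hasconstant = bool(np.any((np.ptp(x, axis=0) == 0) & (x[0] != 0)))

tss_centered = np.sum((y - y.mean()) ** 2)    # mean-corrected total sum of squares
tss_uncentered = np.sum(y ** 2)               # total sum of squares without centering

rsquared_centered = 1.0 - ssr / tss_centered
rsquared_uncentered = 1.0 - ssr / tss_uncentered
rsquared = rsquared_centered if hasconstant else rsquared_uncentered
```

Without a constant, the uncentered version always lies between 0 and 1, while the centered version can become negative.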

perfect fit (in rlm)

Original Launchpad bug 790770: https://bugs.launchpad.net/statsmodels/+bug/790770
Reported by: josef-pktd (joep).

look at what happens in the models and model results when we have a perfect fit, and decide what to do in these cases
example for rlm, but there might be similar problems in other cases

mailinglist 2011-05-31
for rlm having a zero residual doesn't cause an exception anymore, but having a perfect fit can produce nans in some results (scale=0 and 0/0 division)

import numpy as np
import scikits.statsmodels.api as sm
x = np.array([0, 0.96, 2.18])
levels = np.array([2.8, 2.8, 2.8])
res1 = sm.RLM(endog=levels,exog=sm.add_constant(x),M=sm.robust.norms.HuberT()).fit()
print res1.params
print res1.sresid
print res1.scale

>>> print res1.params
[  2.77555756e-17   2.80000000e+00]
>>> print res1.sresid
[ nan  nan  nan]
>>> print res1.scale
0.0
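The nan values can be reproduced without RLM: with a perfect fit every residual is zero, a MAD-type scale estimate is zero, and the scaled residuals become 0/0. A sketch in plain NumPy, where the guard at the end is just one possible choice and not the library's behavior:

```python
import numpy as np

resid = np.zeros(3)                        # perfect fit: every residual is zero
# MAD scale estimate with the usual normal-consistency constant
scale = np.median(np.abs(resid - np.median(resid))) / 0.6745

with np.errstate(divide="ignore", invalid="ignore"):
    sresid = resid / scale                 # 0 / 0 -> nan

# One possible guard (an assumption): report zero scaled residuals when scale is 0.
safe_sresid = np.zeros_like(resid) if scale == 0 else resid / scale
```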

sphinx incorrectly renders cache decorated results

Original Launchpad bug 498024: https://bugs.launchpad.net/statsmodels/+bug/498024
Reported by: jsseabold (Skipper Seabold).

All of the results properties do not correctly render in the docs now that all of the results are cached. See for example GLMResults. Not sure yet if just adding docstrings to the properties would fix this.
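A caching descriptor can carry the docstring of the function it wraps, which is the precondition for documentation tools to find it. The sketch below uses a hypothetical `cached_value` class and `ExampleResults` class, not the decorator used in the package; whether Sphinx then renders the text as desired is untested here.

```python
class cached_value:
    """Non-data descriptor that computes a value once and keeps the docstring."""

    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__       # expose the wrapped function's docstring

    def __get__(self, obj, objtype=None):
        if obj is None:                   # accessed on the class, e.g. by a doc tool
            return self
        value = self.func(obj)
        obj.__dict__[self.func.__name__] = value   # later lookups skip the descriptor
        return value


class ExampleResults:
    """Hypothetical results class."""

    calls = 0

    @cached_value
    def llf(self):
        """Log-likelihood of the fitted model."""
        ExampleResults.calls += 1
        return -12.5


res = ExampleResults()
first, second = res.llf, res.llf          # the second access uses the cached value
```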

arma singular matrix

Original Launchpad bug 800012: https://bugs.launchpad.net/statsmodels/+bug/800012
Reported by: josef-pktd (joep).

import numpy as np
import scikits.statsmodels.api as sm

singular matrix with zero autocorrelation timeseries in ARMA ?

exog = sm.add_constant(np.random.randn(100, 2), prepend=True)
endog = exog.sum(1) + 0.2 * np.random.randn(100)

modarma = sm.tsa.ARMA(endog)
resarma = modarma.fit(order=(1,1))

Optimization terminated successfully.
Current function value: 6.341930
Iterations 12
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sm_overview.py", line 36, in <module>
    resarma = modarma.fit(order=(1,1))
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 375, in fit
    bounds=bounds, iprint=disp)
  File "C:\Python26\lib\site-packages\scipy\optimize\lbfgsb.py", line 196, in fmin_l_bfgs_b
    f, g = func_and_grad(x)
  File "C:\Python26\lib\site-packages\scipy\optimize\lbfgsb.py", line 142, in func_and_grad
    f = func(x, args)
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 343, in <lambda>
    loglike = lambda params: -self.loglike_kalman(params)
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 203, in loglike_kalman
    return KalmanFilter.loglike(params, self)
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\kalmanf\kalmanfilter.py", line 632, in loglike
    Q_0 = dot(inv(identity(m*2)-kron(T_mat,T_mat)),
  File "C:\Python26\lib\site-packages\numpy\linalg\linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
  File "C:\Python26\lib\site-packages\numpy\linalg\linalg.py", line 328, in solve
    raise LinAlgError, 'Singular matrix'
numpy.linalg.linalg.LinAlgError: Singular matrix
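The failing line inverts a matrix of the form I - kron(T, T). As a general illustration (not a diagnosis of this particular run), that matrix is singular whenever the product of two eigenvalues of T equals 1, for example when T has an eigenvalue of exactly 1. The function name and the matrices below are made up for the sketch, and the identity is sized m**2 to match kron(T, T).

```python
import numpy as np


def initial_cov_system(T_mat):
    """Return I - kron(T, T), the matrix that has to be inverted."""
    m = T_mat.shape[0]
    return np.identity(m ** 2) - np.kron(T_mat, T_mat)


T_unit = np.array([[1.0, 0.0], [0.0, 0.5]])     # eigenvalue 1 -> singular system
T_stable = np.array([[0.5, 0.0], [0.0, 0.2]])   # all eigenvalues inside the unit circle

try:
    np.linalg.inv(initial_cov_system(T_unit))
    unit_root_fails = False
except np.linalg.LinAlgError:
    unit_root_fails = True

stable_inverse = np.linalg.inv(initial_cov_system(T_stable))
```

If the optimizer visits parameters on that boundary, the likelihood evaluation fails in this way unless the parameters are constrained or the exception is handled.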

scikits.statsmodels is not released yet

Original Launchpad bug 417221: https://bugs.launchpad.net/statsmodels/+bug/417221
Reported by: josef-pktd (joep).

there is no scikits.statsmodels on pypi

IndentationError in generalized_linear_model.py and test_mvt_pdf AssertionError

There is an IndentationError in genmod/generalized_linear_model.py, line 917 has to be commented out.

Also running the tests I got four warnings and one fail (besides the fails from the IndentationError):

In [2]: scikits.statsmodels.test()
Running unit tests for scikits.statsmodels
NumPy version 1.6.1
NumPy is installed in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy
Python version 2.7.2 (default, Aug 23 2011, 11:38:07) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
nose version 1.0.0
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scikits.statsmodels-0.3.0-py2.7.egg/scikits/statsmodels/tools/tools.py:256: FutureWarning: The default of `prepend` will be changed to True in the next release, use explicit prepend
  "next release, use explicit prepend", FutureWarning)
S............................................................................S.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................S.......S.........S.......S........S.........S.......S........S............../opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scikits.statsmodels-0.3.0-py2.7.egg/scikits/statsmodels/robust/scale.py:176: RuntimeWarning: divide by zero encountered in divide
  subset = np.less_equal(np.fabs((a - mu)/scale), self.c)
.........../opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py:3834: RuntimeWarning: overflow encountered in double_scalars
  Px /= sqrt(r*pi)*(1+(x**2)/r)**((r+1)/2)
..F.....Warning: The algorithm does not converge.  Roundoff error is detected
  in the extrapolation table.  It is assumed that the requested tolerance
  cannot be achieved, and that the returned result (if full_output = 1) is 
  the best which can be obtained.
................................................................................................/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scikits.statsmodels-0.3.0-py2.7.egg/scikits/statsmodels/tools/tools.py:310: RuntimeWarning: divide by zero encountered in divide
  return np.where(test, 0, 1. / X)
.............................................SSSSS...................................................................................................................................................................................................................................................................................................................................................................................................................
======================================================================
FAIL: scikits.statsmodels.sandbox.distributions.tests.test_multivariate.TestMVDistributions.test_mvt_pdf
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scikits.statsmodels-0.3.0-py2.7.egg/scikits/statsmodels/sandbox/distributions/tests/test_multivariate.py", line 161, in test_mvt_pdf
    decimal=18)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 452, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 800, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 18 decimals

(mismatch 66.6666666667%)
 x: array([ 0.00072768,  0.00099806,  0.00276614])
 y: array([ 0.00072768,  0.00099806,  0.00276614])

----------------------------------------------------------------------
Ran 1176 tests in 108.226s

FAILED (SKIP=15, failures=1)
Out[2]: <nose.result.TextTestResult run=1176 errors=0 failures=1>

make link argument in Family._setlink more flexible

Original Launchpad bug 422327: https://bugs.launchpad.net/statsmodels/+bug/422327
Reported by: josef-pktd (joep).

setlink checks for class instance, both in general and for specific instances and raises a ValueError

if not isinstance(link, L.Link):
validlink = link in self.links

especially this includes the use of Power with different values (reported by PierreGM)

Proposal:

convert Exception to warning: since it's up to the user to decide which links to use, using the "wrong" links might not make much statistical sense but shouldn't break the functioning of the model.

make check for specific links by name attribute

allow for user specified links, not just links that are derived from the predefined links, e.g. extension to BoxCox transformation.
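The proposal can be sketched as follows. All classes here (`Link`, `Log`, `BoxCox`, `Family`) and the `valid_link_names` attribute are hypothetical illustrations of checking by name and warning instead of raising; they are not the package's classes.

```python
import warnings


class Link:
    """Hypothetical base class for link functions."""

    name = "link"


class Log(Link):
    name = "log"


class BoxCox(Link):
    """User-defined link that is not among the predefined ones."""

    name = "boxcox"


class Family:
    """Hypothetical family that validates links by name and only warns."""

    valid_link_names = ("log", "identity")

    def __init__(self, link):
        link_name = getattr(link, "name", None)
        if link_name not in self.valid_link_names:
            warnings.warn(
                "link %r is not among the usual links for this family" % link_name,
                UserWarning,
            )
        self.link = link


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fam_log = Family(Log())         # usual link, no warning
    fam_boxcox = Family(BoxCox())   # unusual link: warns, but the family is created
```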

summary method in glm and rlm

Original Launchpad bug 528757: https://bugs.launchpad.net/statsmodels/+bug/528757
Reported by: josef-pktd (joep).

summary method is currently missing in glm and rlm

Skipper's proposal is to use sandbox/output.py for new style of producing printable output

test with different dtypes, was: family.Binomial integer division

Original Launchpad bug 675631: https://bugs.launchpad.net/statsmodels/+bug/675631
Reported by: josef-pktd (joep).

family.Binomial.initialize uses integer division, that produces an exception when endog is an integer.

fixed in my branch josef-gsoc and will merge to devel

return y*1./self.n

we need tests for dtypes of inputs. I don't think we have many checks for these

error case reported by Sol on mailinglist 2010-11-15
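A small sketch of the problem and of a dtype check. In Python 3 the `/` operator is already true division, so `//` is used below to mimic what Python 2 did for integers; the helper `check_dtypes` is a made-up name for illustration.

```python
import numpy as np

y = np.array([1, 2, 3])        # integer endog
n = 4                          # integer number of trials

floor_result = y // n          # integer division, as Python 2 "/" did for ints
float_result = y * 1. / n      # the fix quoted above: force floating point


def check_dtypes(func, data, dtypes=(np.int32, np.int64, np.float32, np.float64)):
    """Run `func` on the same data in several dtypes and return the results."""
    return [func(data.astype(dt)) for dt in dtypes]


results = check_dtypes(lambda arr: arr * 1. / n, y)
```

A test for input dtypes could assert that every entry of `results` agrees with the floating point result.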

make all-pdf failure

Original Launchpad bug 532870: https://bugs.launchpad.net/statsmodels/+bug/532870
Reported by: nwagner ([email protected],de).

[2]
Chapter 1.
[3] [4]
Chapter 2.
(/usr/share/texmf/tex/latex/psnfss/ts1ptm.fd) [5] [6] [7] [8]
Underfull \hbox (badness 10000) in paragraph at lines 518--521
\T1/ptm/m/n/10 Biopython is a set of tools for bi-o-log-i-cal com-pu-ta-tion []
[]http://biopython.org/wiki/Main_Page[][] Li-cense:
[9] [10] [11] [12] [13] [14]
Overfull \hbox (22.60109pt too wide) in paragraph at lines 1125--1126
[][]
[15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25]

! LaTeX Error: Something's wrong--perhaps a missing \item.

See the LaTeX manual or LaTeX Companion for explanation.
Type H for immediate help.
...

l.2347 \end{tabulary}

?
! Missing \endgroup inserted.

\endgroup
l.2347 \end{tabulary}

I am using

revno: 1955
committer: joep [email protected]
branch nick: statsmodels_trunk3
timestamp: Mon 2010-02-15 12:54:04 -0500
message:
small changes to install notes, add sandbox/tests folder to manifest.in

on opensuse11.2

arma with exog

Original Launchpad bug 800011: https://bugs.launchpad.net/statsmodels/+bug/800011
Reported by: josef-pktd (joep).

is this a bug?

import numpy as np
import scikits.statsmodels.api as sm

exog = sm.add_constant(np.random.randn(100, 2), prepend=True)
endog = exog.sum(1) + 0.2 * np.random.randn(100)
modarma = sm.tsa.ARMA(endog, exog)
resarma = modarma.fit(order=(1,1))

Optimization terminated successfully.
         Current function value: 5.533772
         Iterations 12
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sm_overview.py", line 36, in <module>
    resarma = modarma.fit(order=(1,1))
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 359, in fit
    start_params = self._fit_start_params((k_ar,k_ma,k))
  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\tsa\arima_model.py", line 76, in _fit_start_params
    start_params[:k] = ols_params
ValueError: shape mismatch: objects cannot be broadcast to a single shape

Logit and glm don't raise warning for perfect separation case

Original Launchpad bug 562376: https://bugs.launchpad.net/statsmodels/+bug/562376
Reported by: josef-pktd (joep).

see thread "discretemod.Logit vs glm" 2010-04-13

If a logit model has perfect separation then the MLE does not exist or converge

sm.GLM(Endog, Exog, family = sm.family.Binomial())

has a zero division warning and some nans in some results

sm.discretemod.Logit(Endog, Exog)

gives no indication that the optimization stopped because of maxiter and not because of convergence

As in SAS, we should check convergence and whether the resulting estimate indicates perfect separation, and then issue a warning and add a flag to the results.
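A simple version of such a check looks at the fitted linear predictor and tests whether one threshold splits the two outcomes. This is a sketch under that assumption, with a made-up function name; it is not the check SAS performs and it only detects complete separation along the fitted direction.

```python
import numpy as np


def is_perfectly_separated(endog, linpred):
    """True when a threshold on `linpred` splits endog == 0 from endog == 1."""
    endog = np.asarray(endog)
    linpred = np.asarray(linpred, dtype=float)
    zeros = linpred[endog == 0]
    ones = linpred[endog == 1]
    if zeros.size == 0 or ones.size == 0:
        return True                # degenerate case: only one outcome is present
    return bool(zeros.max() < ones.min() or ones.max() < zeros.min())


y = np.array([0, 0, 1, 1])
separated = is_perfectly_separated(y, np.array([-2.0, -1.0, 1.0, 3.0]))
overlapping = is_perfectly_separated(y, np.array([-2.0, 1.5, 1.0, 3.0]))
```

The result could be stored as a flag on the results and used to issue the warning.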

figure out how to do internal link between rst files in sphinx

Original Launchpad bug 561604: https://bugs.launchpad.net/statsmodels/+bug/561604
Reported by: josef-pktd (joep).

in docs/index.rst

Development: See our `developer's page <developernotes.html>`_.

currently directly refers to the html page.

I don't think this link works with htmlhelp or pdf rendering.

This needs to be replaced with correct internal links, but I don't know yet what the format is.

OLS fails with 1 exogenous variable

Original Launchpad bug 618283: https://bugs.launchpad.net/statsmodels/+bug/618283
Reported by: changshe (Chang She).

In [1]: import numpy as np

In [2]: import numpy.random as rand

In [3]: x, y = rand.normal(0.5, 1.3, 1200), rand.normal(1.3, 2.5, 1200)

In [4]: from scikits.statsmodels.regression import OLS

In [5]: OLS(y, x).fit()

ValueError Traceback (most recent call last)

H:\Workspace\Python\src in ()

Q:\GAARD\Prod\Apps\PythonRuntime\1.9\lib\site-packages\scikits.statsmodels-0.1.0b1-py2.5
\statsmodels\regression.pyc in fit(self)
226 #TODO: add a full_output keyword so that only light results needed for
227 # IRLS are calculated?
--> 228 beta = np.dot(self.pinv_wexog, self.wendog)
229 # should this use lstsq instead?
230 # worth a comparison at least...though this is readable

ValueError: matrices are not aligned

In [6]: %debug

q:\gaard\prod\apps\pythonruntime\1.6\lib\site-packages\scikits.statsmodels-0.1.0b1-py2.5.egg\scikits\statsmodels\regression.py(228)fit()
227 # IRLS are calculated?
--> 228 beta = np.dot(self.pinv_wexog, self.wendog)
229 # should this use lstsq instead?

ipdb> self.pinv_wexog.shape
(1200, 1)
ipdb> self.wendog.shape
(1200,)
ipdb>
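The shapes printed in the debugger, (1200, 1) and (1200,), cannot be multiplied with `np.dot`. One possible fix, sketched here outside the model class, is to turn a 1-D design into a column before taking the pseudo-inverse; the variable names are chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.3, 1200)
y = rng.normal(1.3, 2.5, 1200)

exog = x[:, None] if x.ndim == 1 else x   # (1200,) -> (1200, 1)
pinv_exog = np.linalg.pinv(exog)          # shape (1, 1200)
beta = pinv_exog @ y                      # shape (1,)
```

For a single regressor without a constant, `beta[0]` equals `x @ y / (x @ x)`.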

dev first guess returns nan for Poisson family GLM

Original Launchpad bug 603306: https://bugs.launchpad.net/statsmodels/+bug/603306
Reported by: amcmorl (amcmorl).

Poisson family GLM fit to data with 0s in endog variable fails with error:

"ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported."

Problem can be reproduced with following code:

import numpy as np
import scikits.statsmodels as sm

size = 1e5
nbeta = 3

beta = np.random.rand(nbeta)
x = np.random.rand(size, nbeta)
y = np.random.poisson(np.dot(x, beta))

fam = sm.families.Poisson()
glm = sm.GLM(y,x, family=fam)
res = glm.fit()

If family=fam is left out of GLM constructor, then glm.fit() works as expected.
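One plausible source of the nan is the term y * log(y / mu) in the Poisson deviance, which evaluates to 0 * log(0) when y is 0. The sketch below shows the effect and a guarded version that uses the convention 0 * log(0) = 0; the function names are made up and this is not the package's code.

```python
import numpy as np


def poisson_deviance_naive(y, mu):
    """Deviance computed directly; yields nan when y contains zeros."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return 2.0 * np.sum(y * np.log(y / mu) - (y - mu))


def poisson_deviance_guarded(y, mu):
    """Same formula, with the convention 0 * log(0) = 0."""
    ratio = np.where(y > 0, y / mu, 1.0)   # log(1) = 0 where y == 0
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))


y = np.array([0.0, 1.0, 3.0])
mu = np.array([0.5, 1.0, 2.0])
naive = poisson_deviance_naive(y, mu)
guarded = poisson_deviance_guarded(y, mu)
```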

kernel density

Original Launchpad bug 717511: https://bugs.launchpad.net/statsmodels/+bug/717511
Reported by: josef-pktd (joep).

I need a kde estimator for quick check of residuals, univariate is enough for diagnostic checks on residuals, but I need to be able to experiment with the bandwidth interactively.

Where is it?

Get the subclass of scipy.stats.gaussian_kde with bandwidth option into statsmodels, or change class in scipy.
sandbox/nonparametric needs testing
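Until that is available, a univariate Gaussian kernel density estimate with an explicit bandwidth fits in a few lines of NumPy. This is a minimal sketch with a made-up function name, not the scipy or sandbox implementation.

```python
import numpy as np


def gaussian_kde_1d(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `data` on `grid`."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (data.size * bandwidth)


rng = np.random.default_rng(0)
resid = rng.normal(size=200)               # stand-in for model residuals
grid = np.linspace(-8.0, 8.0, 801)
densities = {bw: gaussian_kde_1d(resid, grid, bw) for bw in (0.2, 0.5, 1.0)}
```

Each entry of `densities` can be plotted against `grid` to compare bandwidths interactively.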

GLS.fit() doesn't attach results to self

Original Launchpad bug 428895: https://bugs.launchpad.net/statsmodels/+bug/428895
Reported by: josef-pktd (joep).

GLS.fit() doesn't attach results to self, as a consequence predict fails because of missing results
GLS.results works correctly

Shall we downgrade the fit() method and recommend the use of GLS.results instead, or keep duplicate functionality between fit (method) and results (property) ?

res = sm.OLS(y, X).fit()
res.model.predict(X)
Traceback (most recent call last):
File "c:\josef\eclipsegworkspace\statsmodels_trunk2\scikits\statsmodels\regression.py", line 270, in predict
raise ValueError, "If the model has not been fit, then you must specify the params argument."
ValueError: If the model has not been fit, then you must specify the params argument.

GenericLikelihoodModel and fittedvalues

Original Launchpad bug 717510: https://bugs.launchpad.net/statsmodels/+bug/717510
Reported by: josef-pktd (joep).

GenericLikelihoodModel and its results do not define fittedvalues

Check what minimal methods a subclass has to implement, so we can have some fittedvalues. The meaning of fittedvalues is not obvious in some non-linear or non-normal models.

Is defining a predict method in the model subclass enough? Is there some generic support?

example Markov switching model with AR terms in each state: estimation with GenericLikelihoodModel seems to work fine although it's slow, but how do I get prediction, fitted values and residuals?

Discrete model results don't have any residuals.

Original Launchpad bug 584064: https://bugs.launchpad.net/statsmodels/+bug/584064
Reported by: jsseabold (Skipper Seabold).

That's it. Note to self.

glm Binomial loglike 0log0 error

Original Launchpad bug 680077: https://bugs.launchpad.net/statsmodels/+bug/680077
Reported by: josef-pktd (joep).

reported by Sol 2010-11-22

sm.GLM(y.astype(float),
sm.add_constant(XYP.astype(float), prepend=True),
family=sm.families.Binomial()).fit()
Traceback (most recent call last):
File "<pyshell#37>", line 3, in
family=sm.families.Binomial()).fit()
File "c:\josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\glm.py", line 388, in fit
returned a nan. This could be a boundary problem and should be reported."
ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.

bsejac raises linalg error singular matrix

Original Launchpad bug 703401: https://bugs.launchpad.net/statsmodels/+bug/703401
Reported by: josef-pktd (joep).

after refactoring changes bsejac raises linalg error singular matrix

found when correcting ex_arma.py
http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental-gsoc/revision/2230

investigate whether it is a bug and write some regression tests so we know when refactoring breaks things

asanyarray coercion in Model class breaks sparse matrices

Original Launchpad bug 521229: https://bugs.launchpad.net/statsmodels/+bug/521229
Reported by: jsseabold (Skipper Seabold).

When we coerce with asanyarray in Model, it doesn't work with sparse matrices. It's a simple fix to check for sparse, but no time now. Otherwise, we can pass in block diagonal sparse matrices to GLS etc., if we want to and there's no other option for stacking (not convinced of this yet).

Newey-West sandwich covariance is missing

Original Launchpad bug 430788: https://bugs.launchpad.net/statsmodels/+bug/430788
Reported by: josef-pktd (joep).

As far as I have seen, the current sandwich estimators only correct for heteroscedasticity but not for autocorrelation. Adding for example Newey-West sandwich covariance would be useful for data with a time dimension.

But I haven't checked whether the current sandwich estimators work to some extent also for autocorrelation.
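For orientation, a Newey-West type covariance with Bartlett weights can be sketched in NumPy as below. The function name and arguments are chosen for this sketch, and no small-sample correction is applied; with `nlags=0` it reduces to the heteroscedasticity-only (HC0) sandwich.

```python
import numpy as np


def newey_west_cov(exog, resid, nlags):
    """HAC covariance of OLS parameters with Bartlett kernel weights."""
    exog = np.asarray(exog, dtype=float)
    resid = np.asarray(resid, dtype=float)
    xe = exog * resid[:, None]                # rows are x_t * e_t
    meat = xe.T @ xe                          # lag 0 term
    for lag in range(1, nlags + 1):
        weight = 1.0 - lag / (nlags + 1.0)    # Bartlett weight
        gamma = xe[lag:].T @ xe[:-lag]        # sum of (x_t e_t)(x_{t-lag} e_{t-lag})'
        meat += weight * (gamma + gamma.T)
    bread = np.linalg.inv(exog.T @ exog)
    return bread @ meat @ bread


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=100)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
cov_hc0 = newey_west_cov(X, e, nlags=0)
cov_nw = newey_west_cov(X, e, nlags=4)
```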

RLM Results do not use lazy evaluation

Original Launchpad bug 584065: https://bugs.launchpad.net/statsmodels/+bug/584065
Reported by: jsseabold (Skipper Seabold).

Fixed in my branch now.

tsa.stattools.acovf is slow and is missing nobs options

Original Launchpad bug 661222: https://bugs.launchpad.net/statsmodels/+bug/661222
Reported by: josef-pktd (joep).

I was trying an example from nitime

X.shape
(20480,)
sm.tsa.stattools.acovf(X)[:order+1] #slow

sm.tsa.stattools.acovf(X, fft=True)[:order+1] #fast

I think it would be good to add an "auto" option that picks the fastest way of calculating the acovf

Also, code completion doesn't show an option to get only a small number of acovs
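Both computations, together with an option to return only the first few lags, can be sketched as follows. This is a stand-alone illustration, not the code in tsa.stattools; the `nlags` argument is the assumed name for the missing option, and the biased estimator (division by the number of observations) is used.

```python
import numpy as np


def acovf(x, nlags=None, fft=False):
    """Biased autocovariance estimates for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    n = x.size
    nlags = n - 1 if nlags is None else nlags
    xd = x - x.mean()
    if fft:
        size = 2 ** int(np.ceil(np.log2(2 * n - 1)))   # zero-pad to avoid wrap-around
        spectrum = np.fft.rfft(xd, size)
        full = np.fft.irfft(spectrum * np.conj(spectrum), size)
        return full[: nlags + 1] / n
    # direct computation, only for the requested lags
    return np.array([xd[: n - k] @ xd[k:] for k in range(nlags + 1)]) / n


rng = np.random.default_rng(0)
X = rng.normal(size=2048)
order = 10
direct = acovf(X, nlags=order)
via_fft = acovf(X, nlags=order, fft=True)
```

An "auto" option could choose between the two branches based on the series length and the number of requested lags.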

t_test, f_test, model.py for normal instead of t-distribution

Original Launchpad bug 647777: https://bugs.launchpad.net/statsmodels/+bug/647777
Reported by: josef-pktd (joep).

models.py LikelihoodModelResults

t_test assumes that test statistic is t-distributed. If we only look at the asymptotic normal distribution, then we don't need df_residual.

Currently mainly a thought that needs checking.

model.py provides most of the generic methods that can be used with MLE models that are based on the asymptotic normal distribution.
see also SAS new proc PLM

essentially only params and cov_params are needed for the normal approximation.

I haven't thought yet about the dof in f_test (the number of restrictions is ok, but I don't know if we need df_resid).

conf_int is already overwritten by some subclasses to use the normal instead of t-distribution
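A test based on the normal approximation indeed needs only params and cov_params. The sketch below uses a made-up function `z_test` for a single linear restriction and the standard library's `math.erfc` for the two-sided normal p-value; the numbers are invented for the example.

```python
import math

import numpy as np


def z_test(params, cov_params, r_matrix, q=0.0):
    """Wald z statistic and two-sided normal p-value for r_matrix @ params = q."""
    r = np.asarray(r_matrix, dtype=float)
    effect = r @ np.asarray(params, dtype=float) - q
    sd = math.sqrt(r @ np.asarray(cov_params, dtype=float) @ r)
    z = effect / sd
    pvalue = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, pvalue


params = np.array([1.0, 0.98])
cov_params = np.diag([0.04, 0.25])
z, pvalue = z_test(params, cov_params, [0.0, 1.0])   # tests the second parameter
```

No residual degrees of freedom enter the calculation, in contrast to the t-distribution version.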

rpy dependency in tests

Original Launchpad bug 422330: https://bugs.launchpad.net/statsmodels/+bug/422330
Reported by: josef-pktd (joep).

class TestWLS in test_regression uses RModel in the init, which raises a NameError when rpy is missing. If rpy is found missing, then RModel is not imported and therefore not available.

Solution:
Move the call to RModel to the setup method, as we did for all other test classes.

missing files in binary distribution

Original Launchpad bug 522304: https://bugs.launchpad.net/statsmodels/+bug/522304
Reported by: josef-pktd (joep).

"setup.py build bdist_egg" works to build an egg for binary distribution

With the current setup.py, sandbox/tests and docs/build are not included in the egg.

setup.py needs adjustments when we distribute eggs and bdist

robust covariance for tstat, ttest, ftest,

Original Launchpad bug 526164: https://bugs.launchpad.net/statsmodels/+bug/526164
Reported by: josef-pktd (joep).

(I haven't checked the source, just from memory)

We provide sandwich estimators, but they cannot directly be used for testing.

enhancement proposal:
find a way to allow setting the used parameter covariance matrix for the various test measures, t(), f_test, t_test and other result statistics that depend on the parameter covariance matrix or standard errors.

e.g. option to set use_cov = HC0 or used_cov=NW once we have Newey-West also.

not clear: as new attribute or as a keyword option to t_test, f_test.

Naming of "fittedvalues" inconsistent

Original Launchpad bug 504782: https://bugs.launchpad.net/statsmodels/+bug/504782
Reported by: adrian-schlatter (Adrian Schlatter).

The fit() objects returned by OLS(...).fit() and RLM(...).fit() differ in naming for fitted values.

OLS uses ".fittedvalues" while RLM uses ".fitted_values" (note the underscore).

I expect the naming to be the same in both objects.

Code to reproduce the bug:

import numpy as np
import scikits.statsmodels as sm

nsample=100
x = np.linspace(0,10, nsample)
X = sm.tools.add_constant(x)
beta = np.array([1, 0.1])
y = np.dot(X, beta) + np.random.normal(size=nsample)

resOLS = sm.OLS(y, X).fit()
resRLM = sm.RLM(y, X).fit()

dir(resOLS)[39]
dir(resRLM)[28]

design cannot be n x 1

Original Launchpad bug 434407: https://bugs.launchpad.net/statsmodels/+bug/434407
Reported by: jsseabold (Skipper Seabold).

Just so I remember: GLS does not currently work for an n x 1 design array.

import scikits.statsmodels as sm
data = sm.datasets.longley.Load()
data.exog = sm.add_constant(data.exog)
ols_res = sm.OLS(data.endog, data.exog).fit()
res = ols_res.resid
res_regression = sm.OLS(res[1:], res[:-1]).fit()

ValueError: matrices are not aligned

The one time I tried to work on this, it took a little more attention than I expected or had time for.

UnboundLocalError: local variable 'wls_results' referenced before assignment (glm.py, 386)

Original Launchpad bug 422216: https://bugs.launchpad.net/statsmodels/+bug/422216
Reported by: david-warde-farley (David Warde-Farley).

It says that binomial family models can take 2D responses (presumably for multinomial regression?). Neither appears to work; am I doing something wrong?

Exactly what it says above.

In [141]: Y = zeros((700, 3))

In [142]: X = randn(700, 10)

In [143]: Y[arange(700),random_integers(3,size=700)-1] = 1

In [144]: a = sm.GLM(Y, X, family=sm.family.Binomial())

In [145]: a.fit()

UnboundLocalError Traceback (most recent call last)

/Users/dwf/ in ()

/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/scikits.statsmodels-0.1.0b1-py2.5.egg/scikits/statsmodels/glm.pyc in fit(self, maxiter, method, tol, data_weights, scale)
384 self.iteration += 1
385 self.mu = mu
--> 386 glm_results = GLMResults(self, wls_results.params,
387 wls_results.normalized_cov_params, self.scale)
388 glm_results.bse = np.sqrt(np.diag(wls_results.cov_params(\

UnboundLocalError: local variable 'wls_results' referenced before assignment

In [146]: a = sm.GLM(Y[:,:-1], X, family=sm.family.Binomial())

In [147]: a.fit()

UnboundLocalError Traceback (most recent call last)

/Users/dwf/ in ()

UnboundLocalError: local variable 'wls_results' referenced before assignment

tsa.stattools.acf confint needs checking and tests

Original Launchpad bug 668759: https://bugs.launchpad.net/statsmodels/+bug/668759
Reported by: josef-pktd (joep).

the confint example in the file for tsa.stattools.acf didn't work, shape mismatch exception

I fixed the example in the way that looked right to me, but this needs verification, and it looks like there are no tests for it. The fix is currently in my branch only.

I didn't see that the standard errors/variance themselves are returned anywhere.

Maybe it would be better now to turn acf with statistics into a class, with lazily evaluated (cached attributes) instead of many options for various returns.
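A minimal sketch of what such a class might look like, assuming Python 3.8+ for functools.cached_property. The class name ACF and its attributes are hypothetical, and the standard errors use Bartlett's large-sample formula, which may or may not match what the confint example intended.

```python
from functools import cached_property

import numpy as np


class ACF:
    """Autocorrelation with lazily evaluated, cached statistics (hypothetical design)."""

    def __init__(self, x, nlags=10):
        self.x = np.asarray(x, dtype=float)
        self.nlags = nlags

    @cached_property
    def acf(self):
        # Sample autocorrelations r_0, ..., r_nlags.
        xd = self.x - self.x.mean()
        denom = xd @ xd
        lags = range(1, self.nlags + 1)
        return np.array([1.0] + [xd[k:] @ xd[:-k] / denom for k in lags])

    @cached_property
    def se(self):
        # Bartlett: var(r_k) is about (1 + 2 * sum_{j<k} r_j**2) / n, var(r_0) = 0.
        n = len(self.x)
        r = self.acf
        var = np.ones(self.nlags + 1) / n
        var[0] = 0.0
        var[2:] *= 1 + 2 * np.cumsum(r[1:-1] ** 2)
        return np.sqrt(var)

    def confint(self, z=1.96):
        # Rows are (lower, upper) bounds; z=1.96 is roughly a 95% interval.
        return np.column_stack([self.acf - z * self.se, self.acf + z * self.se])


rng = np.random.default_rng(1)
a = ACF(rng.normal(size=500), nlags=5)
bounds = a.confint()
```

With this layout the standard errors are available as an attribute, and nothing is computed until it is first accessed.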

fitted values for glm are not correct

Original Launchpad bug 430471: https://bugs.launchpad.net/statsmodels/+bug/430471
Reported by: jsseabold (Skipper Seabold).

They should be mu, which calls the particular family.fitted rather than the dot product (exog,params). This should be fixed and tested in my next commit.
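The difference between the two quantities can be seen in a small sketch with a logit link. The function logit_inverse and the numbers are made up for illustration and are not the family.fitted implementation.

```python
import numpy as np


def logit_inverse(eta):
    """Inverse logit link: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))


# Made-up design and coefficients.
exog = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 3.0]])
params = np.array([0.5, 1.0])

linear_predictor = exog @ params      # what dot(exog, params) returns
mu = logit_inverse(linear_predictor)  # what the fitted values should be
```

Only for the identity link do the two coincide.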

genfromdta() fails under Win7

Original Launchpad bug 602426: https://bugs.launchpad.net/statsmodels/+bug/602426
Reported by: kfor (Kyle Foreman).

genfromdta() (in both statsmodels release 0.2.0 and the trunk as of July 6, 2010) fails when reading in files under Windows 7. The same script with the same data file works fine on Linux. The test file I've been using is attached; the code (for release 0.2.0) is below:

import scikits.statsmodels
scikits.statsmodels.lib.io.genfromdta('./genfromdta_test.dta')

error: unpack requires a string argument of length 4 (line 349 of io.pyc)

iolib, SimpleTable is not in docs

Original Launchpad bug 800009: https://bugs.launchpad.net/statsmodels/+bug/800009
Reported by: josef-pktd (joep).

I didn't find it in my .chm which is a few commits behind

nose dependency

Original Launchpad bug 558474: https://bugs.launchpad.net/statsmodels/+bug/558474
Reported by: jsseabold (Skipper Seabold).

We had a nose dependency in stattools. Fix committed to trunk.

Replace mutables used as defaults in functions with placeholder defaults

Original Launchpad bug 562874: https://bugs.launchpad.net/statsmodels/+bug/562874
Reported by: m-j-a-crowe (Mike Crowe).

There are places where we have got class constructor calls in the
function defaults,

e.g.

def __init__(self, endog, exog, family=family.Gaussian()):

The Gaussian constructor will be called when the module is imported, not when the function is called, therefore all calls to init that use the default will be sharing the same instance of the default object.

If we want the default evaluated when the function is called, then we need something like:

def __init__(self, endog, exog, family=None):
    if family is None:
        family = family.Gaussian()

or, with a placeholder default:

def __init__(self, endog, exog, family=family.Default()):
    if family == family.Default():
        family = family.Gaussian()
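The sharing behaviour is easy to demonstrate in isolation. The classes below are hypothetical stand-ins, not the statsmodels family classes. Note also that in the snippets above the parameter name family shadows the module of the same name, so a real fix would need a different name for one of the two.

```python
class Gaussian:
    """Hypothetical stand-in for a family class; counts how often it is constructed."""

    instances = 0

    def __init__(self):
        Gaussian.instances += 1


class ModelShared:
    # The default is evaluated once, when the def statement runs.
    def __init__(self, family=Gaussian()):
        self.family = family


class ModelFresh:
    # None is a placeholder; the real default is created on each call.
    def __init__(self, family=None):
        if family is None:
            family = Gaussian()
        self.family = family


a, b = ModelShared(), ModelShared()
c, d = ModelFresh(), ModelFresh()

shared_same = a.family is b.family  # both use the single default instance
fresh_same = c.family is d.family   # each call built its own instance
```

Sharing is harmless as long as the default object is never mutated, but any state stored on it would leak between models.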

before release: test with older versions of numpy, scipy and with EPD

Original Launchpad bug 706053: https://bugs.launchpad.net/statsmodels/+bug/706053
Reported by: josef-pktd (joep).

Wes on mailing list 2011-01-21

test suite hangs with bfgs in EPD, numpy 1.4 scipy 0.8

latest 0.3-devel revision 2105

Hanging here in the call to fit:

class TestLogitBFGS(CheckModelResults, CheckMargEff):
    @classmethod
    def setupClass(cls):
        data = sm.datasets.spector.load()
        data.exog = sm.add_constant(data.exog)
        res2 = Spector()
        res2.logit()
        cls.res2 = res2
        cls.res1 = Logit(data.endog, data.exog).fit(method="bfgs",
                                                    disp=0)

docstring in sandbox functions are incomplete or messy

Original Launchpad bug 522249: https://bugs.launchpad.net/statsmodels/+bug/522249
Reported by: josef-pktd (joep).

several of the functions in the sandbox that are added to the sphinx docs have incomplete docstrings or docstrings with developer notes.

At least the docstrings exposed in sphinx should be cleaned up and missing ones added.

t_test enhancement

Original Launchpad bug 647774: https://bugs.launchpad.net/statsmodels/+bug/647774
Reported by: josef-pktd (joep).

model.t_test only allows testing whether a linear function of the parameters equals zero; we also want to test, e.g., coefficient = 1.

might be as easy as adding "- rhs_array" in models.py but we need to test the required dimension of rhs_array, because we allow for simultaneous t_test, so rhs doesn't need to be scalar

_effect = np.dot(r_matrix, self.params) - rhs_array

the analogous change for f_test has been already made
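A minimal NumPy sketch of the idea, including the dimension check on the right-hand side. The function name t_test_effect and its signature are hypothetical, not the actual t_test code.

```python
import numpy as np


def t_test_effect(r_matrix, params, cov_params, rhs=0.0):
    """t statistics for H0: r_matrix @ params == rhs (hypothetical helper).

    rhs may be a scalar or have one entry per restriction; any other
    shape makes np.broadcast_to raise a ValueError.
    """
    r_matrix = np.atleast_2d(np.asarray(r_matrix, dtype=float))
    rhs = np.broadcast_to(np.asarray(rhs, dtype=float), (r_matrix.shape[0],))
    effect = r_matrix @ params - rhs
    sd = np.sqrt(np.diag(r_matrix @ cov_params @ r_matrix.T))
    return effect / sd


# Made-up estimates: test coefficient 0 == 1 and coefficient 1 == 0 at once.
params = np.array([1.2, 0.5])
cov_params = np.diag([0.04, 0.01])
tvals = t_test_effect(np.eye(2), params, cov_params, rhs=[1.0, 0.0])
```

Each row of r_matrix is treated as a separate test, so rhs needs one value per row.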

inconsistent signatures in predict

Original Launchpad bug 677914: https://bugs.launchpad.net/statsmodels/+bug/677914
Reported by: josef-pktd (joep).

model.Model.predict has design as argument
regression.GLS has exog and params=None as arguments

calculations in RegressionResults require/use params when calling predict

example:
NonlinearLS subclasses Model and uses RegressionResults, calling bse, or cov_params raises exception in wresid
TypeError: predict() takes exactly 2 arguments (3 given)

I will adjust the signature in model.Model

incomplete installation

Original Launchpad bug 765712: https://bugs.launchpad.net/statsmodels/+bug/765712
Reported by: rosanna-smith (rosanna smith).

this is probably not a bug, but i don't know where else to ask.

when i installed scikits.statsmodels, it appeared to install, with this message:

easy_install scikits.statsmodels
Searching for scikits.statsmodels
Best match: scikits.statsmodels 0.2.0
Processing scikits.statsmodels-0.2.0-py2.6.egg
scikits.statsmodels 0.2.0 is already the active version in easy-install.pth

Using /Library/Python/2.6/site-packages/scikits.statsmodels-0.2.0-py2.6.egg
Processing dependencies for scikits.statsmodels
Finished processing dependencies for scikits.statsmodels

but when i try to run code such as:

import scikits.statsmodels as sm

i get this message:

ImportError: No module named scikits.statsmodels

Can you direct me to documentation to show how to properly install scikits.statsmodels?

check for possible 0*log(0)

Original Launchpad bug 681444: https://bugs.launchpad.net/statsmodels/+bug/681444
Reported by: josef-pktd (joep).

discretemod still has several cases that could possibly end up with 0*log(0)

Poisson is protected with np.clip

Many other logs are unprotected. I don't think we have test cases for this, and I don't know how close to zero or one it is possible to get in different cases.

exog*beta is a float type and might not get close enough to zero to create a problem (?)
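The problem and the clipping workaround can be reproduced in a few lines. The clip bounds used here (machine epsilon) are an assumption and are not necessarily the bounds used in discretemod.

```python
import numpy as np

# Probabilities that hit the boundary exactly, with matching 0/1 outcomes.
p = np.array([0.0, 0.5, 1.0])
y = np.array([0.0, 1.0, 1.0])

# Unprotected: log(0) is -inf and 0 * -inf is nan.
with np.errstate(divide="ignore", invalid="ignore"):
    unprotected = y * np.log(p)

# Protected: keep p strictly inside (0, 1) before taking the log.
eps = np.finfo(float).eps
protected = y * np.log(np.clip(p, eps, 1 - eps))
```

A single nan in a log-likelihood term makes the whole sum nan, which is why the unprotected cases matter even if they are rare.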

discretemod predict typo

Original Launchpad bug 659923: https://bugs.launchpad.net/statsmodels/+bug/659923
Reported by: josef-pktd (joep).

In a branch that is not exercised by the tests there is a typo: .results instead of _results

>>> logit_mod.predict(linear=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\discretemod.py", line 162, in predict
    return np.dot(exog, self.results.params)
AttributeError: 'Logit' object has no attribute 'results'
>>>

add compare methods to Results

Original Launchpad bug 677947: https://bugs.launchpad.net/statsmodels/+bug/677947
Reported by: josef-pktd (joep).

see thread starting Oct17 2010, Comparing models

proposal by Thomas Wiecki

http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental-gsoc/revision/2186

For now I made two methods out of it, compare_f_test and
compare_lr_test, both attached to regression results.
I think, compare_lr_test will go higher up in the class hierarchy so
that it is also available for maximum likelihood models.

I like the more explicit names, but shorter names might be possible

In the simple example, compare_f_test produces exactly the same result
as f_test with the corresponding restriction matrix, as it should for
linear models. Likelihood Ratio test results are close to f_test but I
don't have an exact test for it yet.
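The equivalence mentioned above can be checked numerically with plain NumPy. This is only a sketch on simulated data with a hypothetical helper ols, not the compare_f_test implementation. For a Gaussian linear model the likelihood ratio statistic reduces to n * log(ssr_restricted / ssr_full), which gives an exact relation to the F statistic.

```python
import numpy as np

# Simulated data where the third regressor has no effect (illustrative only).
rng = np.random.default_rng(12345)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)


def ols(y, X):
    """Return OLS estimates and the residual sum of squares (hypothetical helper)."""
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ params
    return params, resid @ resid


params_full, ssr_full = ols(y, X)
_, ssr_restr = ols(y, X[:, :2])  # restricted model drops the last column

q = 1                       # number of restrictions
df_resid = n - X.shape[1]   # residual dof of the full model

# F statistic from comparing the two fitted models.
f_compare = ((ssr_restr - ssr_full) / q) / (ssr_full / df_resid)

# Wald-type F statistic with the restriction matrix R @ params = 0.
R = np.array([[0.0, 0.0, 1.0]])
cov = ssr_full / df_resid * np.linalg.inv(X.T @ X)
Rb = R @ params_full
f_wald = float(Rb @ np.linalg.solve(R @ cov @ R.T, Rb)) / q

# Likelihood ratio statistic for the Gaussian linear model.
lr_stat = n * np.log(ssr_restr / ssr_full)
```

Since ssr_restr / ssr_full equals 1 + q * F / df_resid, the identity lr_stat = n * log(1 + q * f_compare / df_resid) could serve as an exact check for the likelihood ratio statistic in the linear case.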