
fixedeffectmodel's Introduction

FixedEffectModel: A Python Package for Linear Model with High Dimensional Fixed Effects.


FixedEffectModel is a Python package designed and built by the Kuaishou DA ecology group. It estimates the class of linear models used with panel data, that is, data that combine time-series and cross-sectional observations.

Main Features

  • Linear model
  • Linear model with high dimensional fixed effects
  • Difference-in-difference model with parallel checking plot
  • Instrumental variable model
  • Robust/white standard error
  • Multi-way cluster standard error
  • Instrumental variable model tests, including the weak IV test (Cragg-Donald statistic + Stock and Yogo critical values), the over-identification test (Sargan/Basmann test), and the endogeneity test (Durbin test)

For instrumental variable models, we currently provide only the two-stage least squares estimator and report the second-stage regression result. Our next release will include the GMM method and robust standard errors based on GMM.

Installation

Install this package directly from PyPI

$ pip install FixedEffectModel

Getting started

This very simple case study is designed to get you up and running quickly with FixedEffectModel. We will show the steps needed.

Loading modules and functions

After installing FixedEffectModel and its dependencies, we load a few modules and functions:

import numpy as np
import pandas as pd


from fixedeffect.iv import iv2sls, ivgmm, ivtest
from fixedeffect.fe import fixedeffect, did, getfe
from fixedeffect.utils.panel_dgp import gen_data

gen_data is the function we use to simulate data.

Data

We use a simulated dataset with 100 cross-sectional units and 10 time units.

N = 100
T = 10
beta = [-3,1,2,3,4]
ate = 1
exp_date = 5
df = gen_data(N, T, beta, ate, exp_date)

In the simulated dataset above, "beta" holds the true coefficients, "ate" is the true treatment effect, and "exp_date" is the start date of the experiment.
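To make the roles of the cross-sectional units, time units, treatment effect and experiment start date concrete, here is a minimal sketch of a panel simulator written with the standard library only. It is a hypothetical data-generating process for illustration: the function name simulate_panel, the coefficient on x_1, the trend and the noise levels are all assumptions, and it is not the implementation of gen_data.

```python
import random


def simulate_panel(n_units, n_periods, ate, exp_date, seed=0):
    """Return one row (dict) per (id, time) pair.

    Hypothetical DGP for illustration only; not the code behind gen_data.
    """
    rng = random.Random(seed)
    rows = []
    for unit in range(n_units):
        unit_effect = rng.gauss(0, 1)          # time-invariant unit effect
        treated = 1 if unit % 2 == 0 else 0    # every other unit is treated
        for t in range(1, n_periods + 1):
            post = 1 if t >= exp_date else 0   # 1 once the experiment has started
            x_1 = rng.gauss(0, 1)
            y = (1.0 * x_1 + unit_effect + 0.1 * t
                 + ate * treated * post + rng.gauss(0, 0.1))
            rows.append({"id": unit, "time": t, "x_1": x_1,
                         "treatment": treated, "y": y})
    return rows


rows = simulate_panel(n_units=100, n_periods=10, ate=1, exp_date=5)
print(len(rows))  # one row per unit and period
```

Each row carries both an id and a time value, which is the long format that the formulas in the following sections refer to.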

Model fit and summary

Instrumental variables estimation

We include two functions, "iv2sls" and "ivgmm", for instrumental variable regression.

iv2sls

This function returns two-stage least squares estimation results. Define y as the dependent variable, x_1 as the exogenous variable, x_2 as the endogenous variable, and x_3 and x_4 as the instrumental variables; id and time are the cross-sectional ID and the time ID. An IV two-way fixed effect model estimated by two-stage least squares is obtained with:

formula = 'y ~ x_1|id+time|0|(x_2~x_3+x_4)'
model_iv2sls = iv2sls(data_df = df,
                      formula = formula)
result = model_iv2sls.fit()
result.summary()

or

exog_x = ['x_1']
endog_x = ['x_2']
iv = ['x_3','x_4']
y = ['y']

model_iv2sls = iv2sls(data_df = df,
                      dependent = y,
                      exog_x = exog_x,
                      endog_x = endog_x,
                      category = ['id','time'],
                      iv = iv)

result = model_iv2sls.fit()
result.summary()
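Judging from the examples in this document, the formula string has four slots separated by "|": dependent ~ exogenous variables, fixed effect (category) variables, cluster variables, and (endogenous ~ instruments), with "0" marking an empty slot. The sketch below splits such a string into the keyword arguments used in the second call. The helper split_formula is hypothetical and written only for illustration; it is not the package's parser.

```python
def split_formula(formula):
    """Split 'dep ~ exog | category | cluster | (endog ~ iv)' into named lists.

    Hypothetical helper for illustration; not the package's own parser.
    """
    def names(text):
        # '0' (or an empty string) marks an empty slot
        return [v.strip() for v in text.split("+") if v.strip() not in ("", "0")]

    head, category, cluster, iv_part = [p.strip() for p in formula.split("|")]
    dependent, exog = head.split("~")
    parts = {
        "dependent": names(dependent),
        "exog_x": names(exog),
        "category": names(category),
        "cluster": names(cluster),
        "endog_x": [],
        "iv": [],
    }
    if iv_part != "0":
        endog, iv = iv_part.strip("()").split("~")
        parts["endog_x"] = names(endog)
        parts["iv"] = names(iv)
    return parts


parts = split_formula("y ~ x_1|id+time|0|(x_2~x_3+x_4)")
print(parts)
```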

The two grammars above yield identical results. We provide specification tests for IV models:

ivtest(result)

Three tests are included: the weak IV test (Cragg-Donald statistic + Stock and Yogo critical values), the over-identification test (Sargan/Basmann test), and the endogeneity test (Durbin test).
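As a reminder of what two-stage least squares does, the following standard-library sketch handles the simplest case: one endogenous regressor, one instrument, and no fixed effects. The function names and the toy data are assumptions made for illustration, and the sketch does not compute the corrected second-stage standard errors that a full implementation needs.

```python
def ols_slope_intercept(x, y):
    """Simple OLS of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def two_stage_least_squares(y, x_endog, z):
    """2SLS with one endogenous regressor and one instrument."""
    # Stage 1: project the endogenous regressor on the instrument.
    gamma, const = ols_slope_intercept(z, x_endog)
    x_hat = [const + gamma * zi for zi in z]
    # Stage 2: regress the outcome on the fitted values.
    return ols_slope_intercept(x_hat, y)


# Toy data: u is an unobserved confounder, uncorrelated with z in this sample.
z = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]   # true coefficient on x is 2

ols_slope, _ = ols_slope_intercept(x, y)
iv_slope, _ = two_stage_least_squares(y, x, z)
print(ols_slope, iv_slope)  # OLS absorbs the confounder; 2SLS recovers 2
```

Because u enters both x and y, plain OLS is biased upward here, while the instrument z moves x without moving u.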

ivgmm

This function returns one-step GMM estimation results. With the same variable definitions, estimation is achieved by:

formula = 'y ~ x_1|id+time|0|(x_2~x_3+x_4)'

model_ivgmm = ivgmm(data_df = df,
                    formula = formula)
result = model_ivgmm.fit()
result.summary()

or

exog_x = ['x_1']
endog_x = ['x_2']
iv = ['x_3','x_4']
y = ['y']

model_ivgmm = ivgmm(data_df = df,
                    dependent = y,
                    exog_x = exog_x,
                    endog_x = endog_x,
                    category = ['id','time'],
                    iv = iv)

result = model_ivgmm.fit()
result.summary()

Fixed Effect Model

This function returns fixed effect model estimation results. Define y as the dependent variable and x_1 as the independent variable; id and time are the cross-sectional ID and the time ID. The following code estimates a two-way fixed effect model with two-way clustered standard errors:

formula = 'y ~ x_1|id+time|id+time|0'

model_fe = fixedeffect(data_df = df,
                       formula = formula,
                       no_print=True)
result = model_fe.fit()
result.summary()

or

exog_x = ['x_1']
y = ['y']
category = ['id','time']
cluster = ['id','time']


model_fe = fixedeffect(data_df = df,
                      dependent = y,
                      exog_x = exog_x,
                      category = category,
                      cluster = cluster)

result = model_fe.fit()
result.summary()
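For intuition about what a two-way fixed effect estimate represents, the sketch below applies the textbook within transformation to a small balanced panel: subtract unit means and time means, add back the grand mean, then run OLS on the transformed data. This closed form is valid only for balanced panels, and it is an illustration rather than the package's algorithm, which may use a different method for unbalanced or high dimensional cases. The function name and toy data are assumptions.

```python
from collections import defaultdict


def demean_two_way(values, ids, times):
    """Two-way within transformation for a balanced panel."""
    grand = sum(values) / len(values)
    by_id, by_time = defaultdict(list), defaultdict(list)
    for v, i, t in zip(values, ids, times):
        by_id[i].append(v)
        by_time[t].append(v)
    id_mean = {k: sum(v) / len(v) for k, v in by_id.items()}
    time_mean = {k: sum(v) / len(v) for k, v in by_time.items()}
    return [v - id_mean[i] - time_mean[t] + grand
            for v, i, t in zip(values, ids, times)]


# Balanced 3 x 3 panel: y = 2 * x + unit effect + time effect (no noise).
unit_effect = {1: 10.0, 2: 20.0, 3: 30.0}
time_effect = {1: 1.0, 2: 2.0, 3: 3.0}
ids, times, x, y = [], [], [], []
for i in (1, 2, 3):
    for t in (1, 2, 3):
        ids.append(i)
        times.append(t)
        x.append(float(i * t))
        y.append(2.0 * i * t + unit_effect[i] + time_effect[t])

x_dm = demean_two_way(x, ids, times)
y_dm = demean_two_way(y, ids, times)
beta_hat = sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)
print(beta_hat)  # slope on x after removing both sets of effects
```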

Difference in Difference

DID is simply a specific type of fixed effect model. We provide a did function to simplify the estimation process. The regular DID estimation is achieved with the following command:

formula = 'y ~ 0|0|0|0'

model_did = did(data_df = df,
                formula = formula,
                treatment = ['treatment'],
                csid = ['id'],
                tsid = ['time'],
                exp_date = 2)
result = model_did.fit()
result.summary()

"exp_date" is the first date of the experiment, and "treatment" is the column name of the treatment variable. This command estimates the regular DID specification with a treatment group effect (group_effect = 'treatment', the default).

We also provide DID with individual effect:

formula = 'y ~ 0|0|0|0'

model_did = did(data_df = df,
                formula = formula,
                treatment = ['treatment'],
                group_effect='individual',
                csid = ['id'],
                tsid = ['time'],
                exp_date = 2)
result = model_did.fit()
result.summary()

The command above estimates the DID specification with individual effects (group_effect = 'individual').
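In its simplest two-group, two-period reading, the regular DID estimate is the change in the treated group's mean outcome minus the change in the control group's mean outcome. The sketch below computes that quantity from four group means, treating periods with time >= exp_date as post-experiment. The function did_estimate and the toy data are assumptions for illustration; the did class fits a regression and also reports standard errors, which this sketch does not.

```python
def did_estimate(rows, exp_date):
    """Difference-in-differences from the four (treatment, post) group means."""
    sums = {}
    for r in rows:
        post = 1 if r["time"] >= exp_date else 0
        key = (r["treatment"], post)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + r["y"], count + 1)
    mean = {k: total / count for k, (total, count) in sums.items()}
    treated_change = mean[(1, 1)] - mean[(1, 0)]
    control_change = mean[(0, 1)] - mean[(0, 0)]
    return treated_change - control_change


# Toy panel: 4 units, 4 periods, experiment starts at time 3, true effect 1.
rows = []
for unit in range(4):
    treated = 1 if unit < 2 else 0
    for t in range(1, 5):
        post = 1 if t >= 3 else 0
        rows.append({"id": unit, "time": t, "treatment": treated,
                     "y": 5.0 + 2.0 * treated + 0.5 * t + 1.0 * treated * post})

effect = did_estimate(rows, exp_date=3)
print(effect)  # common trend and group level difference cancel out
```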

Main Functions

The main functions you can call are:

| Function name | Description | Usage |
|---|---|---|
| fixedeffect | define class for fixed effect estimation | fixedeffect(data_df = None, dependent = None, exog_x = None, category = None, cluster = None, formula = None, robust = False, noint = False, c_method = 'cgm', psdef = True) |
| iv2sls | define class for 2sls estimation | iv2sls(data_df = None, dependent = None, exog_x = None, endog_x = None, iv = None, category = None, cluster = None, formula = None, robust = False, noint = False) |
| ivgmm | define class for gmm estimation | ivgmm(data_df = None, dependent = None, exog_x = None, endog_x = None, iv = None, category = None, cluster = None, formula = None, robust = False, noint = False) |
| did | define class for did estimation | did(data_df = None, dependent = None, exog_x = None, treatment = None, csid = None, tsid = None, exp_date = None, group_effect = 'treatment', cluster = None, formula = None, robust = False, noint = False, c_method = 'cgm', psdef = True) |
| model.fit | fit pre-defined models | result = model.fit() |
| result.summary | result.object | result.summary() |
| fit_multi_model | fit multiple models | models = [model, model_did, model_iv2sls], fit_multi_model(models) |
| getfe | get fixed effects | getfe(result) |
| ivtest | get iv post estimation tests results | ivtest(result) |

fixedeffect

Provide results for a fixed effect model:

model = fixedeffect (data_df = None, dependent = None, exog_x = None, category = None, cluster = None, formula = None, robust = False, noint = False, c_method = 'cgm', psdef = True)

| Input parameters | Type | Description |
|---|---|---|
| data_df | pandas dataframe | Dataframe with relevant data. |
| dependent | list | List object of dependent variables |
| exog_x | list | List object of independent variables |
| category | list, default [] | List object of category variables, i.e., fixed effects |
| cluster | list, default [] | List object of cluster variables, i.e., the cluster level of your standard error |
| formula | string, default None | Formula used to parse grammar. |
| robust | bool, default False | Whether or not to calculate df-adjusted White standard error (HC1) |
| noint | bool, default False | Whether or not to generate an intercept |
| c_method | str, default 'cgm' | Method to calculate multi-cluster standard error. Possible choices are 'cgm' and 'cgm2'. |
| psdef | bool, default True | If True, replace negative eigenvalues of the variance matrix with 0 (only in multi-way cluster variance) |

Return an object of results:

| Attribute | Description |
|---|---|
| params | Estimated coefficients |
| df | Degrees of freedom |
| bse | Standard error |
| variance_matrix | Coefficients' variance-covariance matrix |
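The robust option above refers to the df-adjusted White standard error (HC1). As a sketch of what that quantity is, the code below computes it for the slope of a simple regression with an intercept: the HC0 sandwich variance scaled by n / (n - k), with k = 2 estimated parameters. The function name and data are assumptions, and this is the textbook formula rather than code taken from the package.

```python
import math


def ols_slope_with_hc1(x, y):
    """Slope of y on x (with intercept) and its HC1 standard error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0: sandwich variance using squared residuals observation by observation.
    hc0 = sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2
    # HC1: degrees-of-freedom adjustment, two parameters estimated.
    hc1 = hc0 * n / (n - 2)
    return slope, math.sqrt(hc1)


slope, se_hc1 = ols_slope_with_hc1([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0])
print(slope, se_hc1)
```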

iv2sls/ivgmm

model = iv2sls (data_df = None, dependent = None, exog_x = None, endog_x = None, iv = None, category = None, cluster = None, formula = None, robust = False, noint = False)

model = ivgmm (data_df = None, dependent = None, exog_x = None, endog_x = None, iv = None, category = None, cluster = None, formula = None, robust = False, noint = False)

| Input parameters | Type | Description |
|---|---|---|
| data_df | pandas dataframe | Dataframe with relevant data. |
| dependent | list | List object of dependent variables |
| exog_x | list | List object of exogenous variables |
| endog_x | list | List object of endogenous variables |
| iv | list | List object of instrumental variables |
| category | list, default [] | List object of category variables, i.e., fixed effects |
| formula | string, default None | Formula used to parse grammar. |
| robust | bool, default False | Whether or not to calculate df-adjusted White standard error (HC1) |
| noint | bool, default False | Whether or not to generate an intercept |

Return the same object of results as fixedeffect does.

We also provide a two-step GMM estimator if you set the option "gmm2=True". The variance-covariance matrix reported depends on the estimator and on the standard error option:

  • "ivgmm", the one-step GMM estimator, with one of the following variance-covariance matrices:

    • Unadjusted.
    • Heteroskedasticity robust.
    • Cluster.
  • "ivgmm" with "gmm2=True", the two-step GMM estimator, with one of the following variance-covariance matrices:

    • Unadjusted.
    • Heteroskedasticity robust, using the diagonal matrix generated from the residuals of the two-step GMM.
    • Cluster.

DID

model = did (data_df = None, dependent = None, exog_x = None, treatment = None, csid = None, tsid = None, exp_date = None, group_effect = 'treatment', cluster = None, formula = None, robust = False, noint = False, c_method = 'cgm', psdef = True)

| Input parameters | Type | Description |
|---|---|---|
| data_df | pandas dataframe | Dataframe with relevant data. |
| dependent | list | List object of dependent variables |
| exog_x | list | List object of independent variables |
| treatment | list | List object of treatment variables |
| csid | list | List object of cross sectional id variables |
| tsid | list | List object of time variables |
| exp_date | string | Experiment start date |
| group_effect | string, default 'treatment' | Either 'treatment' or 'individual' |
| cluster | list, default [] | List object of cluster variables, i.e., the cluster level of your standard error |
| formula | string, default None | Formula used to parse grammar. |
| robust | bool, default False | Whether or not to calculate df-adjusted White standard error (HC1) |
| noint | bool, default False | Whether or not to generate an intercept |
| c_method | str, default 'cgm' | Method to calculate multi-cluster standard error. Possible choices are 'cgm' and 'cgm2'. |
| psdef | bool, default True | If True, replace negative eigenvalues of the variance matrix with 0 (only in multi-way cluster variance) |

Return the same object of results as fixedeffect does.
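The cluster and psdef options are easier to read with the standard two-way clustering formula of Cameron, Gelbach and Miller in mind: add the variance clustered on the first dimension to the variance clustered on the second, then subtract the variance clustered on their intersection. The sketch below applies it to the single coefficient of a no-intercept regression, where the psdef step reduces to flooring the scalar at zero. It assumes this formula is what 'cgm' refers to, omits small-sample corrections, and uses hypothetical function names; it is not the package's code.

```python
from collections import defaultdict


def cluster_variance(scores, labels, sxx):
    """One-way cluster-robust variance of a single coefficient."""
    totals = defaultdict(float)
    for s, g in zip(scores, labels):
        totals[g] += s            # sum the scores x_i * e_i within each cluster
    return sum(t * t for t in totals.values()) / sxx ** 2


def two_way_cluster_variance(x, resid, ids, times, psdef=True):
    """V_id + V_time - V_(id and time), optionally floored at zero."""
    sxx = sum(xi * xi for xi in x)
    scores = [xi * ei for xi, ei in zip(x, resid)]
    pairs = list(zip(ids, times))  # intersection of the two cluster dimensions
    v = (cluster_variance(scores, ids, sxx)
         + cluster_variance(scores, times, sxx)
         - cluster_variance(scores, pairs, sxx))
    return max(v, 0.0) if psdef else v


x = [1.0, 1.0, 1.0, 1.0]
resid = [1.0, -1.0, -1.0, 1.0]
ids = [0, 0, 1, 1]
times = [0, 1, 0, 1]
raw = two_way_cluster_variance(x, resid, ids, times, psdef=False)
fixed = two_way_cluster_variance(x, resid, ids, times, psdef=True)
print(raw, fixed)  # the raw combination can be negative; the floored one cannot
```

The toy data are chosen so that the raw inclusion-exclusion result is negative, which is the situation the psdef option is meant to handle.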

fit_multi_model

This function returns the results of multiple models fitted on one dataframe. When analyzing large and complicated datasets, we usually have several candidate model specifications. This function makes it easy to compare the results of the different models.

| Input parameters | Type | Description |
|---|---|---|
| data_df | pandas dataframe | Dataframe with relevant data |
| models | list, default [] | List of models |
| table_header | str, default None | Title of summary table |
Return a summary table of results of the different models.

getfe

This function is used to get fixed effect.

| Input parameters | Type | Description |
|---|---|---|
| result | object | Output object of the fixedeffect function |
| epsilon | double, default 1e-8 | Tolerance for projection |
| normalize | bool, default False | Whether or not to normalize fixed effects. |
| category_input | list, default [] | List of category variables to calculate fixed effect. |

Return a summary table of the fixed effect estimates and their standard errors.
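For a single set of fixed effects, the idea of recovering the effects after estimation can be sketched directly: once the slope is known, each unit's effect is the within-unit average of y minus the fitted slope part. The helper below is a hypothetical illustration of that one-way case and does not compute standard errors. With two or more sets of fixed effects the levels are identified only up to a normalization, and an iterative method is needed instead.

```python
from collections import defaultdict


def recover_unit_effects(y, x, ids, beta):
    """Average y - beta * x within each unit (one-way fixed effects only)."""
    partial = defaultdict(list)
    for yi, xi, g in zip(y, x, ids):
        partial[g].append(yi - beta * xi)
    return {g: sum(v) / len(v) for g, v in partial.items()}


# y = 2 * x + unit effect, with effects a = 10 and b = 20 (no noise).
ids = ["a", "a", "b", "b"]
x = [1.0, 2.0, 3.0, 5.0]
y = [12.0, 14.0, 26.0, 30.0]
effects = recover_unit_effects(y, x, ids, beta=2.0)
print(effects)
```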

ivtest

This function is used to obtain iv test result.

| Input parameters | Type | Description |
|---|---|---|
| result | object | Output object of the ivgmm/iv2sls function |

Return a test result table of iv tests.
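To give a feel for the weak IV test, recall that with a single endogenous regressor the Cragg-Donald statistic reduces to the first-stage F statistic for the excluded instruments. The sketch below computes that F statistic for one instrument with an intercept, using F = R^2 / (1 - R^2) * (n - 2). The function name and data are assumptions; ivtest itself also reports critical values and the other tests, which are not reproduced here.

```python
def first_stage_f(x_endog, z):
    """First-stage F statistic for one instrument (regression with intercept)."""
    n = len(z)
    z_bar, x_bar = sum(z) / n, sum(x_endog) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    sxx = sum((xi - x_bar) ** 2 for xi in x_endog)
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x_endog))
    r2 = szx ** 2 / (szz * sxx)          # first-stage R squared
    return r2 / (1.0 - r2) * (n - 2)     # one restriction, n - 2 residual df


z = [-1.0, -1.0, 1.0, 1.0]
x_endog = [-2.0, 0.0, 0.0, 2.0]
f_stat = first_stage_f(x_endog, z)
print(f_stat)
```

A common rule of thumb treats a first-stage F below about 10 as a sign of weak instruments; the Stock and Yogo critical values give a more formal threshold.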

Example

# need to install from kuaishou product base
import numpy as np
import pandas as pd
from fixedeffect.iv import iv2sls, ivgmm, ivtest
from fixedeffect.fe import fixedeffect, did, getfe
from fixedeffect.utils.panel_dgp import gen_data

N = 100
T = 10
beta = [-3,1,2,3,4]
ate = 1
exp_date = 5

#generate sample data
df = gen_data(N, T, beta, ate, exp_date)

#------------------------------#
#define instrumental variable model
# iv2sls 
formula = 'y ~ x_1|id+time|0|(x_2~x_3+x_4)'
model_iv2sls = iv2sls(data_df = df,
                      formula = formula)
result = model_iv2sls.fit()
result.summary()

# ivgmm 
formula = 'y ~ x_1|id|0|(x_2~x_3+x_4)'

model_ivgmm = ivgmm(data_df = df,
                    formula = formula)
result = model_ivgmm.fit()
result.summary()

# obtain iv test results
ivtest(result)

#------------------------------#

#define fixed effect model
exog_x = ['x_1']
y = ['y']
category = ['id','time']
cluster = ['id','time']


model_fe = fixedeffect(data_df = df,
                      dependent = y,
                      exog_x = exog_x,
                      category = category,
                      cluster = cluster)

result = model_fe.fit()
result.summary()

#obtain fixed effect 
getfe(result)

#------------------------------#
#define DID model
formula = 'y ~ 0|0|0|0'

model_did = did(data_df = df,
                formula = formula,
                treatment = ['treatment'],
                csid = ['id'],
                tsid = ['time'],
                exp_date=2)
result = model_did.fit()
result.summary()

Requirements

  • Python 3.6+
  • Pandas and its dependencies (Numpy, etc.)
  • Scipy and its dependencies
  • statsmodels and its dependencies
  • networkx

Citation

If you use FixedEffectModel in your research, please cite us as follows:

Kuaishou DA Ecology. FixedEffectModel: A Python Package for Linear Model with High Dimensional Fixed Effects. https://github.com/ksecology/FixedEffectModel, 2020. Version 0.x

BibTex:

@misc{FixedEffectModel,
  author={Kuaishou DA Ecology},
  title={{FixedEffectModel}: {A Python Package for Linear Model with High Dimensional Fixed Effects}},
  howpublished={https://github.com/ksecology/FixedEffectModel},
  note={Version 0.x},
  year={2020}
}

Feedback

This package welcomes feedback. If you have any additional questions or comments, please contact [email protected].

Reference

[1] Simen Gaure (2019). lfe: Linear Group Fixed Effects. R package version 2.8-5.1. URL: https://www.rdocumentation.org/packages/lfe/versions/2.8-5.1

[2] A Colin Cameron and Douglas L Miller. A practitioner’s guide to cluster-robust inference. Journal of human resources, 50(2):317–372, 2015.

[3] Simen Gaure. Ols with multiple high dimensional category variables. Computational Statistics & Data Analysis, 66:8–18, 2013.

[4] Douglas L Miller, A Colin Cameron, and Jonah Gelbach. Robust inference with multi-way clustering. Technical report, Working Paper, 2009.

[5] Jeffrey M Wooldridge. Econometric analysis of cross section and panel data. MIT press, 2010.


fixedeffectmodel's Issues

iv2sls() error

Hello,

When I tried to run the sample code, I ran into the following error.

Is my pandas version not correct for this package to run? Thanks!


summary() not working

Hello, I've been trying to run an iv2sls regression with fixed effects using your library, but I can't get the summary of the results. I even tried it with the data from your example and it doesn't work.

If you could please check this out I would really appreciate it. It would be really useful to be able to get the results. This seems to be the only library with decent outputs for this kind of regression in Python. Thanks.

'result.summary()' does not work

I have made some exercises of the examples, but I found that the code 'result.summary()' in each example gives some warnings.
Please help me. thx

Question: Partialout

Hi
I was looking into the command and the help file. I couldn't find whether there is an option to simply obtain the demeaned versions of the data.
For example, in Stata you could use the user written command hdfe. In Julia you have partialout as part of the package FixedEffectModel. In R, you have demean, part of fixest.
Is there a similar option with this package for python?
Thank you
Fernando

Error prevents code to run properly

The code seems to be broken when transforming estimates to strings for visualization into table. OLSFixed.py line 213:

        params_data = lzip(["%#6.5f" % float(params[i]) for i in exog_len],
                           ["%#6.5f" % float(std_err[i]) for i in exog_len],
                           ["%#6.4f" % float(tstat[i]) for i in exog_len],
                           ["%#6.4f" % float(prob_stat[i]) for i in exog_len],
                           ["%#6.4f" % float(conf_int[0][i]) for i in exog_len],
                           ["%#6.4f" % float(conf_int[1][i]) for i in exog_len])

I needed to cast the various parameters to float for the code to work again. I am using Python 3.9.6 on macOS.

Fixed Effect Regression does not work

As far as I can tell I am doing everything correctly, but the package does not work as it should. I am using the right parameters. Please have a look at the code and the error message in the attached image.

Thanks & regards
Kaleem

fixedeffect() error

Hello, I have been using the fixedeffect function, but there is an error when I include two category variables (userid and time).

It shows 'NameError: Total sum of square equal 0, program quit.'

The function works well when I include only one of these two category variables.
Would you please check this out? thanks a lot.


re: replacing NaN with zeros

Any thoughts on omitting the preprocessing step in which NaNs are set to zero? Choosing how to handle missing data is a non-trivial modeling choice and having it happen automatically might cause problems for the analyst. My preference would be to drop any row with missings, but throwing an error is also a fine way to go. Thanks!
