
plm's Introduction

The plm Package - Linear Models and Tests for Panel Data


About

plm is a package for panel data econometrics for the R statistical computing environment. The package includes functions for model estimation, testing, robust covariance matrix estimation, panel data manipulation and information. It was first published on CRAN in 2006.

Be sure to read the NEWS on CRAN for any changes in new releases (new features, bugfixes, other improvements, ...).

Non-exhaustive function overview (a minimal usage sketch follows the list):

  • Functions to estimate models:

    • plm: panel data estimators (within/fixed effects, random effects, between, first-difference, nested random effects), incl. instrumental-variable estimation techniques (IV) and Hausman-Taylor-style models,
    • pgmm: generalized method of moments (GMM) estimation for panel data,
    • pggls: estimation of general feasible generalized least squares models,
    • pmg: mean groups (MG), demeaned MG and common correlated effects (CCEMG) estimators,
    • pcce: estimators for common correlated effects mean groups (CCEMG) and pooled (CCEP) for panel data with common factors,
    • pvcm: variable coefficients models,
    • pldv: panel estimators for limited dependent variables.
  • Testing functions:

    • model specification (phtest, pFtest, pooltest, plmtest, pwaldtest, piest, aneweytest, mtest, sargan),
    • serial correlation (pbgtest, pwfdtest, pbnftest, pdwtest, pwartest, pbsytest, pbltest),
    • cross-sectional dependence (pcdtest),
    • panel unit root (purtest, cipstest, phansitest),
    • panel Granger (non-)causality (pgrangertest).
  • Robust covariance matrix estimators (incl. various weighting schemes for small sample adjustment):

    • vcovHC: Arellano (1987), White (1980),
    • vcovBK: Beck and Katz (1995) (PCSE),
    • vcovNW: Newey and West (1987),
    • vcovDC: double-clustering robust (Thompson (2011), Cameron et al. (2011)),
    • vcovSCC: Driscoll and Kraay (1998).
  • An enhanced data frame, called pdata.frame, to deal with data sets for which observations are identified by a combination of two indexes.

  • Panel data transformation functions (e.g., Within, Between, between, lag, lead, diff).

  • Other functions relating to panel data sets, e.g.:

    • checks for panel data dimensions (individual, time, group) and balancedness (pdim),
    • checks for panel balancedness (is.pbalanced) and consecutiveness (regularity) (is.pconsecutive) as well as functions to change data to conform to these properties (make.pbalanced, make.pconsecutive),
    • measures for unbalancedness of data (punbalancedness) (Ahrens/Pincus (1981)).
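
As a quick orientation, a minimal, hedged sketch of a typical workflow (within estimation on the built-in Grunfeld data, then a robust covariance matrix in the summary):

library(plm)
data("Grunfeld", package = "plm")

# fixed-effects (within) estimation
gi <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")
summary(gi)

# the same summary with Arellano-type robust standard errors
summary(gi, vcov = vcovHC(gi, method = "arellano"))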

Installation

To install the released version from CRAN:

install.packages("plm")

The package's CRAN website is https://cran.r-project.org/package=plm.

The development of package plm takes place on GitHub at https://github.com/ycroissant/plm. To install the development version from GitHub, use, e.g.:

# install.packages("remotes") # uncomment if package 'remotes' is not installed
remotes::install_github("ycroissant/plm")

Documentation

Package plm comes with documentation: Besides the usual help pages for each function, the vignettes provide a gentle introduction to the package and some functions. Vignettes are available at the package's CRAN website https://cran.r-project.org/package=plm and can be browsed from within R by browseVignettes("plm").

New package users are advised to start with the first vignette Panel data econometrics in R: the plm package for an overview of the package. A more in-depth treatment of estimation of error component models and instrumental variable models is in the second vignette Estimation of error component models with the plm function.

Further, many textbooks treat package plm and/or use it in their examples:

  • Croissant/Millo, Panel Data Econometrics with R, 2019, John Wiley & Sons, Hoboken.

  • Kleiber/Zeileis, Applied Econometrics with R, 2008, Springer, New York. Esp. chapter 3.6.

  • Hanck/Arnold/Gerber/Schmelzer, Econometrics with R, online book https://www.econometrics-with-r.org/. Esp. chapter 10.

  • Heiss, Using R for Introductory Econometrics, 2nd edition, 2020, Independent Publishing, Düsseldorf, also available online at http://www.urfie.net/. A companion book using R to Wooldridge, Introductory Econometrics, esp. chapters 13-14.

plm's People

Contributors

michaelchirico, tappek, ycroissant


plm's Issues

Suggestion: use collapse::rsplit/gsplit to speed up other code

Hi Kevin, having {collapse} as a hard dependency provides many opportunities for further drop-in performance improvements. One particularly significant one that came to mind would be to use the faster split functions: rsplit() is a recursive version of split() that, with arguments drop = FALSE and flatten = TRUE, works just like split(). There is also a barebones function called gsplit(), which is even faster, as it only works for vectors and by default does not save the names.

library(collapse)
library(microbenchmark)

v = wlddev$PCGDP
f = wlddev$iso3c
g = GRP(f)

# Vector
microbenchmark(gsplit(v, g), rsplit(v, g), rsplit(v, f), split(v, f))
#> Unit: microseconds
#>          expr     min       lq     mean   median       uq      max neval cld
#>  gsplit(v, g)  85.680 116.0250 151.1944 126.5120 166.6745  537.730   100 a  
#>  rsplit(v, g)  94.160 124.9500 165.5100 136.9995 189.4330  580.123   100 a  
#>  rsplit(v, f) 195.458 227.3640 322.0670 242.9830 270.2040 4041.218   100  b 
#>   split(v, f) 346.289 390.9145 475.0852 403.4090 489.3115 1404.788   100   c

# Data Frame
microbenchmark(rsplit(wlddev, g), rsplit(wlddev, f), split(wlddev, f))
#> Unit: milliseconds
#>               expr       min        lq      mean    median        uq        max neval cld
#>  rsplit(wlddev, g)  1.242354  1.395417  1.999074  1.533084  2.075499  11.572538   100  a 
#>  rsplit(wlddev, f)  1.318662  1.480204  1.915902  1.617872  2.362436   4.278622   100  a 
#>   split(wlddev, f) 36.952416 42.953552 52.031342 50.696393 58.052336 106.491162   100   b

Created on 2022-09-13 by the reprex package (v0.3.0)

Problem with singular covariance matrix

There is a problem with the covariance matrix when the regressors' value levels differ greatly from one another.
The problem appears in several recent versions of the plm package.

The code below works in some older package versions (there is no error when running it with plm version 2.2-2) but does not work in recent versions, including 2.6-0.

example.data <- read.csv2("regressors.csv")

example.data$Date <- as.Date(example.data$Date)
example.data.plm <- plm::pdata.frame(example.data, index = c("Panel.units", "Date"))

formula <-  as.formula(paste("var1 ~", paste(colnames(example.data)[-c(1:3)], collapse = "+")))

model.obj <- plm::plm(formula, example.data.plm, model = "within")

model.summary.lst <- summary(model.obj)

regressors.csv

problem with summary.pgmm

When trying to summarise results from a pgmm object there is the following error message:
Error in t(y) %*% x : non-conformable arguments

Same problem is reported at:
https://stackoverflow.com/questions/10138633/pgmm-from-plm-package-gives-error-for-summary

It only happens when T < 5. The problem seems to be that summary() calculates the mtest() autocorrelation test for orders 1 and 2, which is not possible when T = 3 or T = 4 (since it computes both, a summary can only be obtained for T >= 5).

Solution provided in the link results in yet another error:
Error in printCoefmat(x$coefficients, digits = digits) :
'x' must be coefficient matrix/data frame

This is probably due to the coefficients attribute of a pgmm object being a list, which makes printCoefmat() fail. Including some kind of if statement in the summary function for pgmm may solve the issue.
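
Until then, a hedged workaround is to bypass summary() and call the components directly, requesting only the autocorrelation test order the panel length allows:

# sketch, assuming 'gmm_model' is a hypothetical pgmm fit on a short panel
coef(gmm_model)               # coefficients without the full summary
mtest(gmm_model, order = 1L)  # only the m(1) test; skip order 2 when T < 5
sargan(gmm_model)             # overidentification test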

Error in is.recursive(.$object) && !is.primitive(.$object) && n > 0 : 'length = 2' in coercion to 'logical(1)'

I keep getting the same error message when I use the View() command on my plm models; summary() works. The same error message appears when I do "Run examples":

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state","year"))
View(zz)
Error in is.recursive(.$object) && !is.primitive(.$object) && n > 0 :
'length = 2' in coercion to 'logical(1)'

I am using R 4.3.3 and all packages are updated.

`detect.lindep.data.frame()` does not pass on the `suppressPrint` parameter

Hi there:

If I am not mistaken, detect.lindep.data.frame() does not pass on the suppressPrint parameter to detect.lindep.matrix(). At least for me, this causes the parameter to have no effect when calling detect.lindep() with a data frame as the first argument and suppressPrint = TRUE. The same seems to be the case for detect.lindep.plm() (untested).

I am talking about lines 205 and 215 of detect_lin_dep_alias.R. Please feel free to correct me if I am off.

Thanks for maintaining this great package!

Joachim

Output interpretation summary(<plm_object>): T = 4-10

I'm trying to estimate a simple model, but I don't know how to interpret T = 4-10. The time span is 2012-2021. Can somebody help?

Oneway (individual) effect Within Model

Call:
plm(formula = gdp_log ~ sum_log, data = pdata, model = "within")

Unbalanced Panel: n = 166, T = 4-10, N = 1644

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.4469225 -0.0386242  0.0063501  0.0452923  0.5530283 

Coefficients:
         Estimate Std. Error t-value  Pr(>|t|)    
sum_log 0.2814691  0.0095728  29.403 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    17.006
Residual Sum of Squares: 10.727
R-Squared:      0.36922
Adj. R-Squared: 0.29833
F-statistic: 864.541 on 1 and 1477 DF, p-value: < 2.22e-16

adding columns to pdata.frame

Suggest adding transform, with, and cbind methods. Currently none of these work:

library(plm)
E <- pdata.frame(EmplUK)

transform(E, lag_emp = lag(emp)) # does not use lag.pseries; returns data.frame, not pdata.frame
transform(E, lag_emp = lag(E$emp))  # returns data.frame, not pdata.frame.  Seemingly redundant E$ must be used

with(E, cbind(E, lag_emp = lag(emp)) ) # does not use lag.pseries; returns data.frame, not pdata.frame
with(E, cbind(E, lag_emp = lag(E$emp))) # returns data.frame, not pdata.frame.  Was not able to make use of  with.

This one does work, but it is not the best style in R and does not fit in with pipes; E2 had to be written three times to avoid overwriting E. (A pipe-friendly sketch follows below.)

E2 <- E
E2$lag_emp <- lag(E2$emp)
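
A pipe-friendly helper in the same spirit, as a hedged sketch (the name add_lag is hypothetical, not part of plm); it keeps the pdata.frame class because the assignment happens on the pdata.frame itself:

add_lag <- function(pdf, col, k = 1L) {
  # lag() dispatches to lag.pseries because pdf[[col]] is a pseries
  pdf[[paste0("lag", k, "_", col)]] <- lag(pdf[[col]], k)
  pdf
}

E |> add_lag("emp") |> head()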

lag much bigger than T in the pgmm example

If I run this code, there are 11 time periods in DemocracyIncome:

data("DemocracyIncome", package = "pder")
length(unique(DemocracyIncome$year))

[1] 11

and 7 in DemocracyIncome25:

data("DemocracyIncome25", package = "pder")
length(unique(DemocracyIncome25$year))
[1] 7

I don't understand what lag(democracy, 2:99) means in the pgmm examples. How is it possible to have a model with 98 lags when there are only 7-11 time periods? Where does the 2:99 come from?

Some examples:

diff25 <- pgmm(democracy ~ lag(democracy) + lag(income) |
    lag(democracy, 2:99) + lag(income, 2:99),
    DemocracyIncome25, model = "twosteps")
diff25coll <- pgmm(democracy ~ lag(democracy) + lag(income) |
    lag(democracy, 2:99) + lag(income, 2:99),
    DemocracyIncome, index = c("country", "year"),
    model = "twosteps", effect = "twoways", subset = sample == 1,
    collapse = TRUE)

Predict method for pggls

Hi, I got this message while trying to predict from the FGLS (pggls) model:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('pggls', 'panelmodel')"

How can I go about predicting with the FGLS model?

Clean-up: remove deprecated class `pFormula`

Currently, the only package on CRAN still using pFormula is cquad (version 2.2).

Clean-up: remove the deprecated class pFormula once package cquad has switched to the current plm facilities without pFormula. The maintainer and developer of cquad were notified and provided with a patch in Jan 2022. The development repository of cquad incl. this patch is https://github.com/fravale/cquad_dev/.

How to predict, using plm()?

Hi team, thanks for a great package!
I checked all possible resources, but do not see how to use predict() or prediction() with plm.

Currently I am trying:

predictions <- predict(plmFit1, panel_df_test)

and get

Error in crossprod(beta, t(X)) : non-conformable arguments
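
Until predict() handles this, a hedged manual route is to rebuild the model matrix from the new data and multiply by the coefficients (plmFit1 and panel_df_test are the names from the question; x1 and x2 stand in for the actual regressors):

# sketch for a pooling-type model; a within model would additionally
# need the individual effects from fixef() added back
X <- model.matrix(~ x1 + x2, data = panel_df_test)
predictions <- drop(X %*% coef(plmFit1)[colnames(X)])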

Difference between fitted() and predict()

I find it surprising that fitted() and predict() (without the 'newdata' argument) return different values.

Does fitted() return the fitted values of the within-transformed model, while predict() returns values corresponding to the untransformed dependent variable?

I suggest making fitted() and predict() consistent with each other (or at least clearly describing the difference between the two methods in the documentation of fitted.plm() and predict.plm()).

The following code illustrates the difference for fixed-effects and random effects estimations:

library(plm)
data("Grunfeld", package = "plm")
Grunfeld <- pdata.frame( Grunfeld )

# fit a fixed effect model
fit.fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
all.equal( predict( fit.fe ), fitted( fit.fe ), check.attributes = FALSE )

# fit a random effects model
fit.re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")
all.equal( predict( fit.re ), fitted( fit.re ), check.attributes = FALSE )

Furthermore, for fixed-effects estimations, the predicted values (but not the fitted values) plus the residuals equal the dependent variable, while for random-effects estimations this holds for neither the predicted nor the fitted values:

all.equal( predict( fit.fe ) + residuals( fit.fe ), Grunfeld$inv )
all.equal( fitted( fit.fe ) + residuals( fit.fe ), Grunfeld$inv,
  check.attributes = FALSE)

all.equal( predict( fit.re ) + residuals( fit.re ), Grunfeld$inv,
  check.attributes = FALSE )
all.equal( fitted( fit.re ) + residuals( fit.re ), Grunfeld$inv,
  check.attributes = FALSE)

I suggest making predict.plm() and fitted.plm() more consistent with each other and across estimation methods (e.g., fixed effects, random effects), or at least clearly describing the differences between fitted.plm() and predict.plm(), and between fixed-effects and random-effects estimations, in the documentation of both methods.

Empty model problem and how to see the time variation in the plm package?

Hi,

In my input data, I have 32 time periods and thousands of individual study areas, but the independent variables are the same data repeated 32 times, so only the dependent variable varies over time.
In my case, I have two questions when I apply the plm package:

  1. When I set model="random" or model="within", I get the following error, but when I set model="between", it runs well. Why is that?
    Error in plm.fit(data, model = models[1L], effect = effect) : empty model
  2. Is there any way to see the time variation through the plm package, or should I use some other method? I noticed the pvar() function to check whether time variation exists, but is there any possibility to see it in more detail?

Thanks!

predict.plm for fd models

Currently predict.plm for "fd" models appears to be returning first difference predictions, rather than predictions on the original scale.

It would be helpful if there were an option (perhaps the default?) to return the values on the original scale, as well as support for 'se.fit' and 'interval' arguments like predict.lm.

Feature Request - Hausman Taylor with time effects

Currently, adding time effects to a Hausman-Taylor model estimation requires adding the time dummies to the model and treating those dummies as instruments (a hedged sketch of this workaround follows the formulas below).

In my current understanding, this case could be optimized by using the two-way fixed effects transformation and slightly modifying the code.

The calculation of the weight $\theta$ used for the WLS transformation would remain the same, because the one-way within model with time dummies and the two-way within model are equivalent.

The W2SLS would be of the form:

$$ \widehat{\Omega}^{-1 / 2}W_T y_{i t}=\widehat{\Omega}^{-1 / 2}W_T X_{i t} \beta+\widehat{\Omega}^{-1 / 2}W_T Z_i \gamma+\widehat{\Omega}^{-1 / 2}W_T u_{i t} $$

and estimated using the same set of instruments as in the one-way case:

$$Z = [W_NX, B_NX_1 , Z_1]$$ for random.method == "ht".
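
For reference, a hedged sketch of the current workaround described above, i.e., adding time dummies and listing them among the instruments (using the plm::Wages data; the time index created by index = 595 is named "time"):

ht.tw <- plm(lwage ~ wks + south + smsa + married + exp + I(exp^2) +
               bluecol + ind + union + sex + black + ed + factor(time) |
               bluecol + south + smsa + ind + sex + black + factor(time) |
               wks + married + union + exp + I(exp^2),
             data = Wages, index = 595, random.method = "ht",
             model = "random", inst.method = "baltagi")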

HT model - intercept mismatch with respect to Stata

Hi,

I have noticed that the intercept estimated by plm's Hausman-Taylor (1981) model differs from the one reported by Stata's xthtaylor command.
Is this behaviour intentional?

Reproducible Example:

library(broom) 
library(plm) # v2.6-2

ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) +bluecol + ind + union + sex + black + ed |
  bluecol + south + smsa + ind + sex + black | 
  wks + married + union + exp + I(exp ^ 2), 
  data = Wages, index = 595, random.method = "ht", 
  model = "random", inst.method = "baltagi"
)

print(tidy(ht), n=13)
# A tibble: 13 × 5
   term         estimate std.error statistic  p.value
   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
 1 (Intercept)  3.38     0.372         9.09  1.00e-19
 2 wks          0.000837 0.000600      1.40  1.63e- 1
 3 south        0.00744  0.0320        0.233 8.16e- 1
 4 smsa        -0.0418   0.0190       -2.21  2.73e- 2
 5 married     -0.0299   0.0190       -1.57  1.16e- 1
 6 exp          0.113    0.00247      45.8   0       
 7 I(exp^2)    -0.000419 0.0000546    -7.67  1.70e-14
 8 bluecol     -0.0207   0.0138       -1.50  1.33e- 1
 9 ind          0.0136   0.0152        0.893 3.72e- 1
10 union        0.0328   0.0149        2.20  2.79e- 2
11 sex         -0.131    0.127        -1.03  3.01e- 1
12 black       -0.286    0.156        -1.84  6.65e- 2
13 ed           0.138    0.0212        6.49  8.47e-11

Intercept ≃ 3.38

And in Stata (v17):

 xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endo(ed wks ms union exp exp2) constant(ed, fem, blk)

/*
------------------------------------------------------------------------------
       lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
TVexogenous  |
       south |   .0074398    .031955     0.23   0.816    -.0551908    .0700705
        smsa |  -.0418334   .0189581    -2.21   0.027    -.0789906   -.0046761
         occ |  -.0207047   .0137809    -1.50   0.133    -.0477149    .0063055
         ind |   .0136039   .0152374     0.89   0.372    -.0162608    .0434686
TVendogenous |
         wks |   .0008374   .0005997     1.40   0.163    -.0003381    .0020129
          ms |  -.0298508     .01898    -1.57   0.116    -.0670508    .0073493
       union |   .0327714   .0149084     2.20   0.028     .0035514    .0619914
         exp |   .1131328    .002471    45.79   0.000     .1082898    .1179758
        exp2 |  -.0004189   .0000546    -7.67   0.000    -.0005259   -.0003119
TIexogenous  |
         fem |  -.1309236    .126659    -1.03   0.301    -.3791707    .1173234
         blk |  -.2857479   .1557019    -1.84   0.066    -.5909179    .0194221
TIendogenous |
          ed |    .137944   .0212485     6.49   0.000     .0962977    .1795902
             |
       _cons |   2.912726   .2836522    10.27   0.000     2.356778    3.468674
*/

Intercept ≃ 2.91

`I()` in LHS

I() in the LHS can lead to an error; see the reproducible example below.

data(Produc)
fm1 <-   diff(log(gsp))  ~ log(pcap) + log(pc) + log(emp)
fm2 <- I(diff(log(gsp))) ~ log(pcap) + log(pc) + log(emp)

plm(fm1, Produc) # runs
plm(fm2, Produc) # errors
## plm version 2.2-5
# Error in is.pbalanced.default(x) :
#   argument "y" is missing, with no default

## plm version 2.6-0
# Error in fwithin.default(x, g = effect, w = NULL, na.rm = na.rm) :
#   length(g) must match nrow(X)

fd model does not report F-Statistic

Other models do report the F-statistic at the end. Apparently, it was reported before (here is a post from 2020). The NEWS file does not seem to show a change in this regard. An MRE is below.

data("Wages", package = "plm")
df <- pdata.frame(Wages, index=595)
fd <- plm(data=df,lwage ~ ed + exp,model = "fd",effect = "individual")
summary(fd)
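
A hedged workaround in the meantime: the joint significance test can be computed directly with pwaldtest(), which offers an F-test variant:

pwaldtest(fd, test = "F")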

How to count the number of total instruments (pgmm)

How is it possible to display the instrument count in pgmm? I want to know how many instruments are generated in total in my model, and see how many instruments are removed by the collapse option (vs. not using it). I tried this with my model named p2s, but do not fully trust the resulting number. The Sargan test displays df = 40, and my code below lands at 65 instruments:

p2s <- pgmm(roa ~ lag(roa, 1:2) + lag(gov_log, 0:2) + lag(size, 0:2) + lag(growth, 0:2) + lag(lev_log, 0:2) + lag(capexp_log, 0:2)
    | lag(roa, 2:99) + lag(gov_log, 1:99) + lag(size, 1:99) + lag(growth, 2:99) + lag(lev_log, 1:99) + lag(capexp_log, 2:99),
    data = dat, model = "twostep", collapse = TRUE, transformation = "ld", effect = "twoways", index = c("id", "zeit"))

# run the Sargan test
sargan_test <- sargan(p2s)

# number of estimated parameters (number of coefficients)
num_parameters <- length(coef(p2s))

# number of instruments = Sargan test degrees of freedom + number of estimated parameters
num_instruments <- sargan_test$parameter + num_parameters

# display the instrument count
cat("Number of instruments:", num_instruments, "\n")

Residuals are automatically sorted according to the values of the group variable

I am using the State Traffic Fatality Data Set from Chapter 10 of Stock and Watson's Introduction to Econometrics (3rd edition), which can be found here. In my dataset, the group variable state is not sorted:

> Fatalities2$state
  [1] al al al al al al al az az az az az az az ar ar ar ar ar ar ar ca ca ca ca ca ca ca co co co co co co co ct ct ct ct ct ct ct de de de de de de de fl fl fl fl fl fl fl ga
 [58] ga ga ga ga ga ga id id id id id id id il il il il il il il in in in in in in in ia ia ia ia ia ia ia ks ks ks ks ks ks ks ky ky ky ky ky ky ky la la la la la la la me me
[115] me me me me me md md md md md md md ma ma ma ma ma ma ma mi mi mi mi mi mi mi mn mn mn mn mn mn mn ms ms ms ms ms ms ms mo mo mo mo mo mo mo mt mt mt mt mt mt mt ne ne ne
[172] ne ne ne ne nv nv nv nv nv nv nv nh nh nh nh nh nh nh nj nj nj nj nj nj nj nm nm nm nm nm nm nm ny ny ny ny ny ny ny nc nc nc nc nc nc nc nd nd nd nd nd nd nd oh oh oh oh
[229] oh oh oh ok ok ok ok ok ok ok or or or or or or or pa pa pa pa pa pa pa ri ri ri ri ri ri ri sc sc sc sc sc sc sc sd sd sd sd sd sd sd tn tn tn tn tn tn tn tx tx tx tx tx
[286] tx tx ut ut ut ut ut ut ut vt vt vt vt vt vt vt va va va va va va va wa wa wa wa wa wa wa wv wv wv wv wv wv wv wi wi wi wi wi wi wi wy wy wy wy wy wy wy
Levels: al ar az ca co ct de fl ga ia id il in ks ky la ma md me mi mn mo ms mt nc nd ne nh nj nm nv ny oh ok or pa ri sc sd tn tx ut va vt wa wi wv wy

For example, az is in front of ar, which does not follow alphabetical order. It is more obvious if we check the numeric values of state:

> as.numeric(Fatalities2$state)
  [1]  1  1  1  1  1  1  1  3  3  3  3  3  3  3  2  2  2  2  2  2  2  4  4  4  4  4  4  4  5  5  5  5  5  5  5  6  6  6  6  6  6  6  7  7  7  7  7  7  7  8  8  8  8  8  8  8  9
 [58]  9  9  9  9  9  9 11 11 11 11 11 11 11 12 12 12 12 12 12 12 13 13 13 13 13 13 13 10 10 10 10 10 10 10 14 14 14 14 14 14 14 15 15 15 15 15 15 15 16 16 16 16 16 16 16 19 19
[115] 19 19 19 19 19 18 18 18 18 18 18 18 17 17 17 17 17 17 17 20 20 20 20 20 20 20 21 21 21 21 21 21 21 23 23 23 23 23 23 23 22 22 22 22 22 22 22 24 24 24 24 24 24 24 27 27 27
[172] 27 27 27 27 31 31 31 31 31 31 31 28 28 28 28 28 28 28 29 29 29 29 29 29 29 30 30 30 30 30 30 30 32 32 32 32 32 32 32 25 25 25 25 25 25 25 26 26 26 26 26 26 26 33 33 33 33
[229] 33 33 33 34 34 34 34 34 34 34 35 35 35 35 35 35 35 36 36 36 36 36 36 36 37 37 37 37 37 37 37 38 38 38 38 38 38 38 39 39 39 39 39 39 39 40 40 40 40 40 40 40 41 41 41 41 41
[286] 41 41 42 42 42 42 42 42 42 44 44 44 44 44 44 44 43 43 43 43 43 43 43 45 45 45 45 45 45 45 47 47 47 47 47 47 47 46 46 46 46 46 46 46 48 48 48 48 48 48 48

plm() gives the correct regression results

> my_model_panel2 <- plm(mrall ~ beertax, 
+                       data = Fatalities2,
+                       index = c("state", "year"), 
+                       model = "within")
> summary(my_model_panel2)
Oneway (individual) effect Within Model

Call:
plm(formula = mrall ~ beertax, data = Fatalities2, model = "within", 
    index = c("state", "year"))

Balanced Panel: n = 48, T = 7, N = 336

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.5869619 -0.0828376 -0.0012701  0.0795454  0.8977960 

Coefficients:
        Estimate Std. Error t-value Pr(>|t|)    
beertax -0.65587    0.18785 -3.4915 0.000556 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    10.785
Residual Sum of Squares: 10.345
R-Squared:      0.040745
Adj. R-Squared: -0.11969
F-statistic: 12.1904 on 1 and 287 DF, p-value: 0.00055597

However, if we calculate the fitted values using

> Fatalities2$mrall_hat <- Fatalities2$mrall - my_model_panel2$residuals

and plot the regression lines using

> library(ggplot2)
> ggplot(Fatalities2) +
+   geom_point(aes(y = mrall, x = beertax, color = state), size = 2, show.legend = FALSE) + 
+   geom_line(aes(y = mrall_hat, x = beertax, color = state), show.legend = FALSE) 

we get
[figure omitted: the plotted regression lines clearly do not match the data points]
Clearly, the fitted values are not correct. The reason is that my_model_panel2$residuals is sorted according to the values of state. For example, the residuals for state == "az" are put in my_model_panel2$residuals[15:21], but they are supposed to be in my_model_panel2$residuals[8:14] to match the order of the other variables.

If we sort it back according to the order of state using

> myorder <- seq(1,nrow(Fatalities2))[order(Fatalities2$state)]
> residuals_reordered <- my_model_panel2$residuals[order(myorder)]

and then calculate the new fitted values and plot them, we have
[figure omitted: the regression lines now align with the data points]
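
An alternative, hedged sketch that aligns the residuals in one step: plm sorts observations by (id, time), so assigning through that permutation puts each residual back on its original row:

ord <- order(Fatalities2$state, Fatalities2$year)
Fatalities2$resid_aligned <- NA_real_
Fatalities2$resid_aligned[ord] <- as.numeric(my_model_panel2$residuals)
Fatalities2$mrall_hat2 <- Fatalities2$mrall - Fatalities2$resid_aligned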

does pgmm support categorical variables?

I am trying to run a model like this (a minimal example with fewer variables), where A and B are categorical variables:

fit<- pgmm(Y~lag(Y, 1)+lag(A, 1)+ B | lag(Y, 1:3) ,
                    data = dati_long_positive, 
                    index = c("id","year"),
                    model = "twostep",
                    effect="individual"
)

Are expressions like "+lag(A, 1)" or "+ B" supported if A and B are categorical variables? How are categorical variables handled in pgmm?

I have noticed that if I use A and B as factors, my model does not run, while if I expand them into dummies with model.matrix it works, but I am not sure whether that makes sense.
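
A hedged sketch of that dummy expansion (dati_long_positive and the variables A and B are the names from the question):

# expand the factors into 0/1 dummy columns with syntactic names
dmy <- model.matrix(~ A + B, data = dati_long_positive)[, -1, drop = FALSE]
colnames(dmy) <- make.names(colnames(dmy))
dat2 <- cbind(dati_long_positive, dmy)
# then list the dummy columns explicitly in the pgmm formula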

plm IV regression does not respect suppressed intercept in instrument matrix

plm's IV estimation does not respect a suppressed intercept in the instrument matrix. A reproducible example is below.

library("plm")
library("ivreg")
library("AER")


data("SeatBelt", package = "pder")
SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban)))
pSB <- pdata.frame(SeatBelt)

formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp
formiv2 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp + 0

plm(formiv1, SeatBelt, model = "pooling")
#> 
#> Model Formula: occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + 
#>     dsp
#> 
#> Coefficients:
#>   (Intercept)    log(usage) log(percapin) 
#>       7.56412       0.17686      -1.17226
AER::ivreg(formiv1, data = SeatBelt)
#> 
#> Call:
#> AER::ivreg(formula = formiv1, data = SeatBelt)
#> 
#> Coefficients:
#>   (Intercept)     log(usage)  log(percapin)  
#>        7.5641         0.1769        -1.1723
ivreg::ivreg(formiv1, data = SeatBelt)
#> 
#> Call:
#> ivreg::ivreg(formula = formiv1, data = SeatBelt)
#> 
#> Coefficients:
#>   (Intercept)     log(usage)  log(percapin)  
#>        7.5641         0.1769        -1.1723

plm(formiv2, SeatBelt, model = "pooling")
#> 
#> Model Formula: occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + 
#>     dsp + 0
#> 
#> Coefficients:
#>   (Intercept)    log(usage) log(percapin) 
#>       7.56412       0.17686      -1.17226
AER::ivreg(formiv2, data = SeatBelt)
#> 
#> Call:
#> AER::ivreg(formula = formiv2, data = SeatBelt)
#> 
#> Coefficients:
#>   (Intercept)     log(usage)  log(percapin)  
#>       0.57379       -0.02646       -0.47789
ivreg::ivreg(formiv2, data = SeatBelt)
#> 
#> Call:
#> ivreg::ivreg(formula = formiv2, data = SeatBelt)
#> 
#> Coefficients:
#>   (Intercept)     log(usage)  log(percapin)  
#>       0.57379       -0.02646       -0.47789

pmg slow

I found the pmg function to be really slow, even for medium-sized data. Have you considered enhancing pmg or the other functions in plm with C++ code, as the fixest package does?

Non-standard names in formula evaluation

Hi,

I have noticed that backticks do not work in plm formulas. I think supporting them would be useful for people who want to create two-stage models without transforming the data twice.

Example:

library(plm)
data("Grunfeld", package="plm")
model1 = plm(inv ~ value + capital + factor(year),
              data = Grunfeld, model = "pooling")
model1
# model results
# ...

plm(inv ~ value + capital + `factor(year)`, data=model.frame(model1), model="within", effect="individual")
# Error in eval(predvars, data, env) : object 'factor(year)' not found
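
Until backticks are supported, a hedged workaround is to give the transformed column a syntactic name before the second stage:

mf <- model.frame(model1)
names(mf) <- make.names(names(mf))  # "factor(year)" becomes "factor.year."
plm(inv ~ value + capital + factor.year., data = mf,
    model = "within", effect = "individual")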

Estimating a Two-way Random Effect Model Using lmer and plm

I am using lmer to estimate a two-way random effect model like this: y ~ x1 + x2 + (1|country) + (1|year).

When I use plm(y ~ x1 + x2, index = c("country", "year"), model = "random", effect = "twoways") I get totally different estimates. Here is a reproducible example:

if (!require("pacman")) install.packages("pacman")

pacman::p_load(stargazer, lme4,  plm)

library(gapminder)

fit.lmm <- lmer( log(lifeExp) ~ log(pop) + log(gdpPercap) + 
(1|year) + (1|country), data = gapminder)

fit.plm <- plm( log(lifeExp) ~ log(pop) + log(gdpPercap), data = 
gapminder, index = c("country","year"), model="random", effect = 
"twoways")

stargazer(fit.lmm, fit.plm, type="text")

Error: "Lapack routine dgesv: system is exactly singular" with random.method = "swar"

I have posted a StackOverflow example of an error that occurs with model = "random" and random.method = "swar" (but not with other random.method arguments):

# data provided at end of post

library(plm)
plm(y ~ x + factor(year), index = "panel", model = "random", random.method = "swar", data = df)

# Error in solve.default(crossprod(ZBeta)) : 
#  Lapack routine dgesv: system is exactly singular: U[15,15] = 0

The above fails with the data frame copied at the end of this issue. The issue seems to be that a singular matrix is being sent to Lapack for solving, although I do not understand why that is the case.

Changing the random.method to anything but "swar" solves the issue, so the problem seems to be located there.

Trying the same thing in Stata, using the same (Swamy-Arora) estimator, does not return an error.
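
Until the root cause is found, a hedged workaround based on the observation above that only "swar" is affected:

# any other variance-components estimator runs on the same data, e.g.:
plm(y ~ x + factor(year), index = "panel", model = "random",
    random.method = "walhus", data = df)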

df <- structure(list(y = c(0.32, 0.51, 0.26, 0.99, 0.59, 0.43, 0.6, 
0.86, 1, 0.97, 0.89, 0.63, 0.55, 0.58, 0.26, 0.69, 0.87, 0.17, 
0.09, 0, 0.87, 0.39, 0.36, 0.73, 0.13, 0.61, 0.36, 0.64, 0.72, 
0.95, 0.8, 0.96, 0.32, 0.91, 0.77, 0.14, 0.37, 0.57, 0.81, 0.98, 
0.5, 0.23, 0.8, 0.04, 0.84, 0.12, 0.56, 0.22, 0.49, 0.65, 0.59, 
0.98, 0.71, 0.58, 0.75, 0.77, 0.49, 0.72, 0.29, 0.2, 0.67, 0.06, 
0.36, 0.44, 0.65, 0.29, 0.85, 0.75, 0.2, 0.44, 0.7, 0.54, 0.19, 
0.47, 0.83, 0.47, 0.23, 0.43, 0.6, 0.48, 0.63, 0.95, 1, 0.46, 
0.28, 0.88, 0.82, 0.71, 0.57, 0.25, 0.78, 0.07, 0.45, 0.7, 0.08, 
0.2, 0.5, 0.13, 0.56, 0.12, 0.08, 0.29, 0.89, 0.37, 0.96, 0.83, 
0.81, 0.02, 0.96, 0.83, 0.51, 0.04, 0.04, 0.06, 0.44, 0.61, 0.99, 
0.83, 0.31, 0.82, 0.12, 0.18, 0.89, 0.23, 0.46, 0.73, 0.76, 0.49, 
0.32, 0.87, 0.11, 0.01, 0.96, 0.86, 0.91, 0.68, 0.8, 0.63, 0.94, 
1, 0.59, 0.5, 0.01, 0.48, 0.86, 0.92, 0.07, 0.15, 0.07, 0.33, 
0.6, 0.52, 0.12, 0.59, 0.56, 0.56, 0.55, 0.18, 0.11, 0.16, 0.27, 
0.06, 0.62, 0.34, 0.69, 0.87, 0.32, 0.31, 0.1, 0.44, 0.99, 0.96, 
0.72, 0.19, 0.81), x = c(0.25, 0.41, 0.55, 0.77, 0.95, 0.2, 0.36, 
0.58, 0.27, 0.56, 0.53, 0.88, 0.55, 0.43, 0.19, 0.54, 0.2, 0.37, 
0.18, 0.09, 0.26, 0.15, 0.75, 0.08, 0.55, 0.06, 0.23, 0.9, 0.12, 
0.51, 0.58, 0.54, 0.88, 0.24, 0.9, 0.85, 0.32, 0.43, 0.66, 0.12, 
0.09, 0.75, 0.5, 0.11, 0.07, 0.04, 0.6, 0.96, 0.39, 0.61, 0.23, 
0.28, 0.45, 0.55, 0.52, 0.99, 0.96, 0.64, 0.31, 0.47, 0.01, 0.56, 
0.7, 0.88, 0.13, 0.87, 0.2, 0.62, 0.42, 0.85, 0.5, 0.22, 0.52, 
0.15, 0.31, 0.23, 0.09, 0.76, 0.56, 0.29, 0.42, 0.87, 0.75, 0.78, 
0.67, 0.94, 0.69, 0.74, 0.07, 0.22, 0.47, 0.52, 0.85, 0.28, 0.47, 
0.39, 0.34, 0.94, 0.14, 0.5, 0.16, 0.2, 0.22, 0.71, 0.66, 0.68, 
0.54, 0.24, 0.04, 0.1, 0.44, 0.54, 0.23, 0.53, 0.24, 0.14, 0.99, 
0.18, 0.93, 0.99, 0.49, 0.39, 0.78, 0.41, 0.31, 0.11, 0.75, 0.59, 
0.85, 0.31, 0.8, 0.21, 0.67, 0.31, 0.21, 0.88, 0.84, 0.32, 0.36, 
0.89, 0.4, 0.82, 0.54, 0.18, 0.4, 0.71, 0.28, 0.83, 0.78, 0.07, 
0.93, 0.47, 0.44, 0.49, 0.71, 0.69, 0, 0.47, 0.72, 0.06, 0.13, 
0.65, 0.12, 0.26, 0.67, 0.8, 0.4, 0.82, 0.22, 0.16, 0.32, 0.01, 
0.53, 0.26, 0.99), panel = c("p1", "p1", "p1", "p1", "p1", "p1", 
"p2", "p2", "p2", "p2", "p2", "p2", "p2", "p2", "p2", "p3", "p3", 
"p3", "p3", "p4", "p4", "p4", "p4", "p5", "p5", "p5", "p5", "p6", 
"p6", "p6", "p6", "p6", "p6", "p6", "p6", "p6", "p7", "p7", "p7", 
"p7", "p7", "p7", "p7", "p7", "p7", "p8", "p8", "p8", "p8", "p8", 
"p8", "p8", "p8", "p9", "p9", "p9", "p9", "p10", "p10", "p10", 
"p10", "p10", "p10", "p11", "p11", "p11", "p11", "p11", "p11", 
"p11", "p11", "p11", "p12", "p12", "p12", "p12", "p12", "p12", 
"p12", "p12", "p12", "p13", "p13", "p13", "p13", "p13", "p13", 
"p13", "p13", "p13", "p14", "p14", "p14", "p15", "p15", "p15", 
"p15", "p16", "p16", "p16", "p16", "p16", "p16", "p16", "p16", 
"p16", "p17", "p17", "p17", "p17", "p17", "p17", "p17", "p17", 
"p17", "p18", "p18", "p18", "p18", "p19", "p19", "p19", "p19", 
"p19", "p19", "p19", "p19", "p19", "p20", "p20", "p20", "p20", 
"p21", "p21", "p21", "p21", "p22", "p22", "p22", "p22", "p22", 
"p22", "p22", "p22", "p22", "p23", "p23", "p23", "p23", "p24", 
"p24", "p24", "p24", "p24", "p24", "p24", "p24", "p25", "p25", 
"p25", "p25", "p26", "p26", "p26", "p26", "p27", "p27", "p27", 
"p27", "p28", "p28", "p28", "p28", "p28", "p28"), year = c(8, 
9, 10, 12, 14, 15, 1, 3, 5, 6, 9, 10, 12, 14, 15, 11, 12, 14, 
15, 10, 12, 14, 15, 10, 12, 14, 15, 1, 3, 5, 6, 9, 10, 12, 14, 
15, 1, 3, 5, 6, 9, 10, 12, 14, 15, 4, 5, 6, 9, 10, 12, 14, 15, 
10, 12, 14, 15, 8, 9, 10, 12, 14, 15, 1, 3, 5, 6, 9, 10, 12, 
14, 15, 1, 3, 5, 6, 9, 10, 12, 14, 15, 2, 3, 5, 6, 9, 10, 12, 
14, 15, 13, 14, 15, 10, 12, 14, 15, 1, 3, 5, 6, 9, 10, 12, 14, 
15, 1, 3, 5, 6, 9, 10, 12, 14, 15, 10, 12, 14, 15, 1, 3, 5, 6, 
9, 10, 12, 14, 15, 10, 12, 14, 15, 10, 12, 14, 15, 1, 3, 5, 6, 
9, 10, 12, 14, 15, 10, 12, 14, 15, 4, 5, 6, 9, 10, 12, 14, 15, 
11, 12, 14, 15, 10, 12, 14, 15, 10, 12, 14, 15, 7, 9, 10, 12, 
14, 15)), row.names = c(NA, -175L), class = "data.frame")

IV models with a single regressor don't work with fancy covariance matrices

Consider a (modified) example from the help:

zz <- plm(log(gsp) ~ log(pcap) | log(pc) + log(emp),
	     data = Produc, index = c("state","year"),
	     model = "within")

When you type summary(zz) you get:

> summary(zz)
Oneway (individual) effect Within Model
Instrumental variable estimation

Call:
plm(formula = log(gsp) ~ log(pcap) | log(pc) + log(emp), data = Produc, 
    model = "within", index = c("state", "year"))

Balanced Panel: n = 48, T = 17, N = 816

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.3779409 -0.0561652 -0.0055046  0.0440775  0.4590412 

Coefficients:
          Estimate Std. Error z-value  Pr(>|z|)    
log(pcap) 1.528846   0.035191  43.444 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    18.941
Residual Sum of Squares: 7.0211
R-Squared:      0.69103
Adj. R-Squared: 0.67169
Chisq: 1887.37 on 1 DF, p-value: < 2.22e-16

Now, what about Driscoll-Kraay standard errors?

> summary(zz, vcov = vcovSCC)
Error in demX[tind[[i]], , drop = FALSE] : incorrect number of dimensions

I believe the problem is in the definition of vcovG, where the variable demX is defined for models with instrumental variables. If the model matrix has just one column (as in the above example), demX is a column vector, not a matrix. I've found that if you wrap that line in as.matrix, everything else works.
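
A small runnable illustration of that failure mode and the suggested as.matrix wrap (plain R, not plm internals):

m <- matrix(rnorm(6), ncol = 1)
v <- m[, 1]                         # single-column subsetting drops to a vector
# v[1:2, , drop = FALSE]            # would error: incorrect number of dimensions
as.matrix(v)[1:2, , drop = FALSE]   # as.matrix restores the dimensions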

Fixef Function with Instruments

Hello,

The fixef function gives the following error when instruments are used in plm():

Error in tcrossprod(coef(object), as.matrix(object$model[, -1, drop = FALSE])) :
non-conformable arguments

I believe this is because the object 'as.matrix(object$model[, -1, drop = FALSE])' has the instruments in it. I think this wasn't an issue before March.

Best,

Clustered standard errors in dynamic panel estimated with GMM

Panel data are often grouped. Not controlling for potential dependency within the group often leads to biased standard errors. Thus, it would be very useful if the plm package made it possible to cluster standard errors when a dynamic panel is estimated by the widely used GMM, especially the Arellano/Bond (1991) and Blundell/Bond (1998) approaches.

External instruments in pgmm()

Hi there,

My question is about using external, "IV-like" instruments in pgmm(). Is such a feature implemented?

For example:

mod <- pgmm(w ~ lag(w, 1) + lag(k,1) | lag(w, 2:99) | lag(k,1) , effect = "twoways", model = "twostep", data = data)

Let's say I want to instrument "k" with an external instrument "d" that is uncorrelated with the idiosyncratic error term. Is it possible to do this with pgmm(), and if so, is it done in the same way as in plm()?:

mod <- pgmm(w ~ lag(w, 1) + lag(k,1) | lag(w, 2:99) | lag(d,1) , effect = "twoways", model = "twostep", data = data)

Thank you very much in advance, and I hope the question is not out of place here. I have tried to find a solution in "Panel data econometrics in R" (the plm vignette) and on Stack Overflow, but was not successful. Hopefully I have not missed an obvious answer ...

Best,
Julian

Support `na.action` argument

Currently, the na.action argument is (silently) ignored. (Of course, NA dropping works.)

Adding support for it would have many benefits in the presence of NAs, e.g., making it easier to add residuals back to the original data. Also, sandwich::vcovBS on plm objects estimated from a data frame with dropped NAs seems to require it via its use of expand.model.frame.

An old branch was once started on R-Forge and has the below information:
https://r-forge.r-project.org/scm/viewvc.php/branches/exp_na.action/?root=plm

################# First take on na.action for plm objects ####################

  • plm, residuals.plm: element "na.action" is now in plm objects. This enables extraction of residuals
    padded to match the original data in presence of NAs when model is estimated with
    the na.action argument set to, e.g. na.exclude. Example:
    plm_object <- plm(..., data = data, na.action = na.exclude)
    residuals(plm_object) # will have the length to match number of rows of data

  • many more files changed to accommodate the change (but likely not all)

        Caution, developers:
        ====================
        This means that, when referring to residuals, care must be taken to use plm_object$residuals
        and not residuals(plm_object) (or the shortcut resid(plm_object)) to get the residual vector in the correct
        length and without NA padding. This is also how it is done in standard R's lm().
        Many functions (statistical tests, summary statistics) need the residuals (sum of squared residuals, number of residuals).
        These functions might fail if they do not refer to the residuals via plm_object$residuals (think of sum() with NA values
        and without na.rm = TRUE), or they might not fail while relying on the wrong residual vector (especially so for length(residuals(plm_object))).
        
        In this experimental branch, an attempt was made to make the necessary changes for some functions, but likely not all.
    

vcovHC throws "Error in tind[[i]] : subscript out of bounds" only when model is fitted by first-differences

I am pretty used to using R's plm package to fit models with panel data, including first-difference models. Today, however, I faced a strange error thrown when passing the results of a plm model fitted with first differences to plm's vcovHC() function to estimate robust variance-covariance matrices:

Error in tind[[i]] : subscript out of bounds

The source code of the plm package shows that the tind mentioned in the error is an internal time index. That is the only clue we get.

Because I am still trying to understand what leads to this issue, I can't generate simulated data to provide a more general minimal reproducible example. But I can share a modified excerpt of my data and code that, when run, will throw the error.

The data is in the file soquestion.csv.

The code:

library(plm)
df <- read.csv("soquestion.csv")
pmodel <- plm(y ~ x1 + x2, data=df, index=c("group", "time"),
                model="fd", effect="individual")
vcovHC(pmodel)

This should throw the error. I have checked many things in the data, like whether the class of the time variable is correct and whether it has NAs, among other things. Nothing noticeable. But it is worth stressing that this only happens with model="fd". If one tries any other option for this parameter, like "within" or "random", then the vcovHC command works as intended and without errors.
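
A hedged diagnostic worth running on such data: the fd transformation differences along the time index, so gaps in the time dimension are a plausible trigger, and plm ships helpers to check and repair consecutiveness:

# check whether each group's periods are consecutive; repair if not
is.pconsecutive(df, index = c("group", "time"))
df2 <- make.pconsecutive(df, index = c("group", "time"))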

Help with weights for instrumental variable regression

It looks like weights are not yet implemented for instrumental variable regressions. I'm getting the error

argument 'weights' not yet implemented for instrumental variable models

How hard would this be to implement? I can try to make a PR.

Problems with 'weights' argument when using 'within' estimator

Consider an example from the help:

> set.seed(10)
> w=runif(200)
> w=(w)^-6
> grun.fe_w <- plm(inv~value+capital-1, data = Grunfeld, model = "within", weights = w)
> grun.fe_lm_w <- lm(inv~value+capital+factor(firm)-1, data = Grunfeld, weights = w)
> rbind(coefficients(grun.fe_w),coefficients(grun.fe_lm_w)[1:2])
           value     capital
[1,] -0.02815926 0.002911782
[2,] -0.02697912 0.093432287

As far as I'm concerned, the coefficients should be equal, as the estimator used to compute them is the same.
The fixed effects differ too:

> rbind(fixef(grun.fe_w),coefficients(grun.fe_lm_w)[3:12])

            1        2        3         4        5        6        7        8        9       10
[1,] 728.1698 465.1416 155.7911 105.29074 66.90317 66.93043 50.89642 61.53446 50.41692 5.064283
[2,] 755.1351 463.0828 141.8153  57.60098 19.10493 30.22032 20.47606 54.01505 29.12005 4.797652

In addition, the residuals method provides strange results.

> summary(as.numeric(residuals(grun.fe_w)))

      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-2127699.0     -493.3      -13.6    -5014.3      110.9  1659085.3 

If we fit the model ourselves, we obtain something which seems to be OK.

> y_hat_plm <- Grunfeld$value*coefficients(grun.fe_w)[1] + Grunfeld$capital*coefficients(grun.fe_w)[2] +
+   as.numeric(fixef(grun.fe_w)[Grunfeld$firm])
> plot(y_hat_plm, Grunfeld$inv, cex=w*1e-8)
> abline(0, 1, col = 'red', lty = 'dashed', lwd = 2)

[figure omitted: fitted vs. observed values for the weighted plm within model]

But compared with the model obtained with lm, the results seem to be worse.

> plot(as.numeric(fitted(grun.fe_lm_w)), Grunfeld$inv, cex=w*1e-8)
> abline(0, 1, col = 'red', lty = 'dashed', lwd = 2)

[figure omitted: fitted vs. observed values for the weighted lm model]

automatically handle missing values in pmg

I noticed that I need to remove all missing values when using plm::pmg. Most other econometrics packages handle missing values automatically. Would you add this feature in the future? Thanks.

Handling of missing data in method "fd" / time-wise first differences

Greetings,

It appears that method "fd" will calculate a first difference even when data points for previous time periods are missing. I noticed this when trying to get R and Stata to produce the same results for a model with first differences, where plm with the fd method reported more cases used in the estimation.

Specifically, if data points in the middle of a panel series are missing, method fd will still construct a first difference by going back to the most recent period in which there is data, rather than dropping the case.

I illustrate in the attached code using the EmplUK dataset. I set specific data points to missing. I then construct the first differences manually. Then, I compare fd using the source data in levels to method random using the first differenced data.

Kind regards,

Jon Hanson

library(plm)

data("EmplUK" , package = "plm")
Em <- pdata.frame(EmplUK, index = c("firm", "year"))

# make selected missing data NOT in first year
Em$capital[Em$firm == 1 & Em$year == 1981] <- NA
Em$capital[Em$firm == 2 & Em$year == 1981] <- NA
Em$capital[Em$firm == 2 & Em$year == 1982] <- NA

# make first differences directly
Em$d.emp <- diff(log(Em$emp))
Em$d.wage <- diff(log(Em$wage))
Em$d.capital <- diff(log(Em$capital))

# estimate model with "fd"
mod.fd <- plm(log(emp) ~ log(wage) + log(capital), data = Em, model = "fd")
summary(mod.fd)
length(mod.fd$residuals)

# estimate model with first differences; edited based on response from tappek
mod.alt <- plm(d.emp ~ d.wage + d.capital, data = Em, model = "pooling")
summary(mod.alt)
length(mod.alt$residuals)

lag function

The lag function no longer loads with the command plm::lag as it did in previous versions.
To reproduce the error, it is sufficient to execute plm::lag in the console.
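
A hedged workaround (details depend on the plm version): the panel lag is an S3 method for the base generic, so calling lag() on a pseries with plm attached still dispatches correctly:

library(plm)
E <- pdata.frame(EmplUK)
head(lag(E$emp))  # dispatches to the pseries method even without plm::lag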

Result of cipstest / lags = 0L

Hello,
My question is about Pesaran's (2007) CIPS test. In R, it is available via the cipstest command. I was trying to regenerate the critical values of this test and took the test statistic via the cipstest command. However, my critical values are always smaller than the original ones.

I wanted to be sure about my data-generating process, so I wrote code in GAUSS and got the same results as the original paper.
After that, using the same data set, model, and lag length, R's cipstest gives different results from EViews, Stata (pescadf), and GAUSS (tspdlib library), all of which provide the same result. I wonder why this difference exists.

Thank you,

Formula Expansions by . (dot) in plm Estimation Commands

From Twitter: Is there any reason why, in #R with the plm command for a fixed-effects regression, I cannot use the usual "y ~ ., data = df, ..." notation in the formula? It only works if I explicitly give it "y ~ x1 + x2 + ...", which is impracticable if you have 440 variables...

PS: It's a matter of passing the data frame to the data argument of terms.formula, e.g., terms(mpg ~ ., data = mtcars) does the job.
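
A hedged sketch of that fix, expanding the dot against the data before handing the formula to plm (df, id, and year are hypothetical):

f <- formula(terms(y ~ ., data = df))  # expands "." to all other columns
fit <- plm(f, data = df, model = "within", index = c("id", "year"))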

dplyr "compatibility" (was: Results sensitive to data ordering)

Hello,

I've noticed that the fixed effects results from plm are sensitive to how the data is ordered.

The fixed-effects regression provides the expected results when the data is sorted by unit-time (unit first, then time), but provides different (wrong) coefficients when the data is sorted by time-unit (time first, then unit).

I think it would make a great package even better if the results were not sensitive to how the data is sorted, or if, at the very least, an error message alerted researchers that their data is not pre-arranged as unit-year.

Thanks!
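
Until then, a hedged defensive step: sort by unit and time yourself, or build a pdata.frame first, which establishes the canonical unit-time ordering (the column names here are hypothetical):

df  <- df[order(df$unit, df$time), ]               # explicit sort, or ...
pdf <- pdata.frame(df, index = c("unit", "time"))  # ... let plm order the data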

Stata's 'weakivtest' with plm?

Hi,

I was wondering whether plm has a built-in command for calculating the 'effective F statistic' for weak-instrument tests. In Stata, the user-written command 'weakivtest' calculates the effective F; see: https://journals.sagepub.com/doi/pdf/10.1177/1536867X1501500113

The test was developed by Montiel Olea and Pflueger (2013): https://www.carolinpflueger.com/MOP_FINAL_May14.pdf

I couldn't find anything related to this test in the vignette, so thought I'd ask here.

Thank you for your time!

Insufficient Number of Instruments Error When Doing IV Regression with Unbalanced Panel Even Though the Regression Runs Fine in Stata

I am attempting to run the following IV regression on an unbalanced panel dataset. The variables TOTAL_E, TOURISM_5KM_SUM and TOURISM_10KM_SUM are endogenous, while HHSIZE2 and SEX are exogenous explanatory variables, and Z_b_100, tourism_5KM_ZSCORE and tourism_10KM_ZSCORE are instruments. I want to include household characteristics as controls in this regression. I do so and make sure to include them in the formula with the IVs, but every time I still get the insufficient-number-of-instruments error. I am using a random effects model since my instrument is time-invariant, so I don't believe that is the issue. Based on this question, https://stackoverflow.com/questions/56672684/error-insufficient-number-of-instruments-when-running-plm-iv-regression, I am supposed to have 2 instruments for every endogenous variable, but that isn't possible in my case, so I am a bit stuck. The regression runs fine in Stata; I will add the Stata output below. Any advice would be appreciated!

random <- plm(asinh(AECAPITA) ~ asinh(TOTAL_E) + asinh(TOURISM_5KM_SUM) + HHSIZE2 + SEX | Z_b_100 + HHSIZE2 + tourism_10KM_ZSCORE + SEX, data = df, index = c("UNIQUE_HH_ID"), model = "random")


Error in plm.fit(data, model = models[1L], effect = effect) : 
  insufficient number of instruments
In addition: Warning message:
In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")



. gen log_total_e_10 = log(TOURISM_10KM_SUM)
(8,229 missing values generated)

. do "C:\Users\mcket\AppData\Local\Temp\STD728_000000.tmp"

. . xtivreg AE_CAPITA_US SEX HHSIZE2 ( TOTAL_E TOURISM_10KM_SUM = Z_b_100 tourism_10KM_ZSCORE), re

G2SLS random-effects IV regression              Number of obs     =     11,374
Group variable: UNIQUE_HH_ID                    Number of groups  =      5,293

R-squared:                                      Obs per group:
     Within  = 0.0044                                         min =          1
     Between = 0.0546                                         avg =        2.1
     Overall = 0.0338                                         max =         12

                                                Wald chi2(4)      =    1011.76
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

----------------------------------------------------------------------------------
    AE_CAPITA_US | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
         TOTAL_E |  -.0088191   .0015754    -5.60   0.000    -.0119068   -.0057314
TOURISM_10KM_SUM |   .0010611   .0003845     2.76   0.006     .0003076    .0018147
             SEX |   247.1207   67.98776     3.63   0.000     113.8672    380.3743
         HHSIZE2 |  -490.9899   15.70356   -31.27   0.000    -521.7683   -460.2115
           _cons |   5172.889   126.4606    40.91   0.000     4925.031    5420.747
-----------------+----------------------------------------------------------------
         sigma_u |          0
         sigma_e |  73042.599
             rho |          0   (fraction of variance due to u_i)
----------------------------------------------------------------------------------
Endogenous: TOTAL_E TOURISM_10KM_SUM
Exogenous:  SEX HHSIZE2 Z_b_100 tourism_10KM_ZSCORE




