
Comments (3)

tyleransom commented on July 22, 2024

Nice work! I agree that the SE difference is likely due to a degrees-of-freedom discrepancy.

If it's all right with you, I'll close this issue. I believe that the way I present it in the lab is consistent with how it's written in Wooldridge's book. While this implementation is not perfectly correct with respect to the FWL theorem, I believe it's more accessible to introductory students.

Feel free to re-open the issue if you have more questions or proposed corrections. Thanks!

from econometricslabs.

tyleransom commented on July 22, 2024

Hi, thanks for bringing this to my attention. You are correct that the FWL theorem regresses the residualized y on the residualized x (see the second equation on the Wikipedia page), whereas the example in Lab 4 residualizes only the x.
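For reference, here is one way to write the result being discussed, in the matrix notation used later in this thread. This is my own summary: $X_1$ holds the controls (including the constant), $X_2$ the regressor of interest, and $M_1$ the residual-maker matrix of $X_1$.

$$M_1 = I - X_1(X'_1X_1)^{-1}X'_1, \qquad \hat\beta_2 = \left((M_1X_2)'(M_1X_2)\right)^{-1}(M_1X_2)'(M_1y)$$

FWL also says that the residuals from regressing $M_1y$ on $M_1X_2$ equal those of the full regression of $y$ on $X_1$ and $X_2$. The Lab 4 example instead regresses $y$ itself on $M_1X_2$, which yields the same coefficient but different residuals.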

Let's see how this differs from the example in my lab using R:

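```r
# load packages and the data
library(tidyverse)     # for %>% and as_tibble()
library(modelsummary)  # regression tables (not needed for the output below)
df <- mtcars %>% as_tibble()
```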

Baseline regression on the mtcars dataset, where the coefficient on cyl is the one of interest:

```r
summary(lm(mpg ~ cyl + disp + hp, data = df))
```

```
Call:
lm(formula = mpg ~ cyl + disp + hp, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0889 -2.0845 -0.7745  1.3972  6.9183 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 34.18492    2.59078  13.195 1.54e-13 ***
cyl         -1.22742    0.79728  -1.540   0.1349    
disp        -0.01884    0.01040  -1.811   0.0809 .  
hp          -0.01468    0.01465  -1.002   0.3250    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 28 degrees of freedom
Multiple R-squared:  0.7679,	Adjusted R-squared:  0.743 
F-statistic: 30.88 on 3 and 28 DF,  p-value: 5.054e-09
```

Now residualize cyl only:

```r
# regress cyl on the other controls and keep the residuals
est1 <- lm(cyl ~ disp + hp, data = df)
summary(lm(mpg ~ est1$residuals, data = df))
```

```
Call:
lm(formula = mpg ~ est1$residuals, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.8351  -3.5281  -0.5277   1.8950  13.7914 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      20.091      1.072  18.735   <2e-16 ***
est1$residuals   -1.227      1.583  -0.775    0.444    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.066 on 30 degrees of freedom
Multiple R-squared:  0.01965,	Adjusted R-squared:  -0.01303 
F-statistic: 0.6012 on 1 and 30 DF,  p-value: 0.4442
```

Now residualize both mpg and cyl:

```r
# regress mpg on the same controls and keep the residuals
est2 <- lm(mpg ~ disp + hp, data = df)
summary(lm(est2$residuals ~ est1$residuals, data = df))
```

```
Call:
lm(formula = est2$residuals ~ est1$residuals, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0889 -2.0845 -0.7745  1.3972  6.9183 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     1.695e-16  5.218e-01   0.000    1.000
est1$residuals -1.227e+00  7.702e-01  -1.594    0.122

Residual standard error: 2.952 on 30 degrees of freedom
Multiple R-squared:  0.07804,	Adjusted R-squared:  0.04731 
F-statistic: 2.539 on 1 and 30 DF,  p-value: 0.1215
```

In the end, all three regressions give the same coefficient estimate on cyl (-1.227) but different standard errors (0.797 in the full regression, 1.583 when only cyl is residualized, and 0.770 when both mpg and cyl are residualized). I'll have to think more about why this is the case, but for now it seems that nothing is incorrect. But please engage further if you disagree or if something else is unclear! Thanks again for bringing this up.
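The comparison can also be collected into one table. This is a minimal sketch using base R and the built-in mtcars data instead of the tidyverse `df`. The object names (`full`, `cs`, `fwl`, `r_cyl`, `r_mpg`, `results`) are my own, chosen for illustration.

```r
# Sketch using base R and the built-in mtcars data (no tidyverse needed).
full  <- lm(mpg ~ cyl + disp + hp, data = mtcars)
r_cyl <- resid(lm(cyl ~ disp + hp, data = mtcars))  # cyl with disp, hp partialled out
r_mpg <- resid(lm(mpg ~ disp + hp, data = mtcars))  # mpg with disp, hp partialled out

cs  <- lm(mpg ~ r_cyl, data = mtcars)  # lab/cheatsheet version: only cyl residualized
fwl <- lm(r_mpg ~ r_cyl)               # FWL version: both sides residualized

# Estimate and standard error on cyl (or residualized cyl) in each model
cols <- c("Estimate", "Std. Error")
results <- rbind(
  full = coef(summary(full))["cyl", cols],
  cs   = coef(summary(cs))["r_cyl", cols],
  fwl  = coef(summary(fwl))["r_cyl", cols]
)
print(results)
```

The printed table should reproduce the numbers in the three outputs above: the same estimate of -1.227 in every row, with standard errors of about 0.797, 1.583 and 0.770.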


pollytatouin commented on July 22, 2024

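To explain why the coefficient values are the same, let $X_1$ collect the controls (the constant, disp and hp), $X_2$ the regressor of interest (cyl), and $M_1 = I - X_1(X'_1X_1)^{-1}X'_1$ the residual-maker matrix of $X_1$. Then I get: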

$$\beta_{2FWL} = (X'_2M'_1M_1X_2)^{-1}X'_2M_1Y = (X'_2M'_1M_1X_2)^{-1}X'_2M'_1M_1Y$$

by the symmetry and idempotency of $M_1$ ($M'_1M_1 = M_1$).

The cheatsheet (CS) model instead implies this regression, in which Y is not residualized:

$$Y = M_1X_2\beta_2 + U$$

$$ \min_{\beta_2} (Y-M_1X_2\beta_2)'(Y-M_1X_2\beta_2) = \min_{\beta_2} \left(Y'Y - Y'M_1X_2\beta_2 - \beta'_2X'_2M'_1Y + \beta'_2X'_2M'_1M_1X_2\beta_2\right) $$

First-order condition:

$$ -2X'_2M'_1Y + 2X'_2M'_1M_1X_2\beta_2 = 0 $$

$$ \beta_{2CS} = (X'_2M'_1M_1X_2)^{-1}X'_2M'_1Y = (X'_2M'_1M_1X_2)^{-1}X'_2M'_1M_1Y $$

Again, the symmetry and idempotency of $M_1$ bring back all the $M_1$'s, so $\beta_{2CS}$ and $\beta_{2FWL}$ are the same expression.
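The two closed-form expressions can be checked numerically on the mtcars example. This is a minimal sketch in base R, assuming $X_1$ = (constant, disp, hp) and $X_2$ = cyl as in the regressions earlier in the thread. The variable names (`X1`, `X2`, `M1`, `A`, `beta_fwl`, `beta_cs`) are my own.

```r
# Sketch: evaluate the FWL and cheatsheet (CS) matrix formulas directly.
# X1 = controls including the constant, X2 = cyl (the regressor of interest).
X1 <- cbind(1, mtcars$disp, mtcars$hp)
X2 <- matrix(mtcars$cyl)
Y  <- mtcars$mpg
n  <- nrow(mtcars)

# Residual-maker matrix M1 = I - X1 (X1'X1)^{-1} X1'
M1 <- diag(n) - X1 %*% solve(crossprod(X1)) %*% t(X1)

# Symmetry and idempotency together give M1' M1 = M1
sym_ok  <- isTRUE(all.equal(M1, t(M1)))
idem_ok <- isTRUE(all.equal(M1 %*% M1, M1))

A <- t(X2) %*% t(M1) %*% M1 %*% X2                          # X2' M1' M1 X2 (1 x 1)
beta_fwl <- drop(solve(A, t(X2) %*% t(M1) %*% M1 %*% Y))  # Y residualized
beta_cs  <- drop(solve(A, t(X2) %*% t(M1) %*% Y))         # Y not residualized

beta_full <- coef(lm(mpg ~ cyl + disp + hp, data = mtcars))[["cyl"]]
print(c(full = beta_full, fwl = beta_fwl, cs = beta_cs))
```

All three printed values should agree with the cyl estimate of -1.22742 from the full regression. The extra $M_1$ in `beta_fwl` makes no numerical difference, which is the idempotency argument above.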

For the variance, though, I haven't managed to explain the difference mathematically yet. But the regression outputs show that the FWL approach gives an SE (0.770) much closer to the full-regression value (0.797) than the CS approach does (1.583). I suspect the remaining gap between the full-regression SE and the FWL SE comes from incorrect degrees of freedom (the second-step regression doesn't know that other parameters were estimated in a first step).
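One way to test this suspicion is sketched below. This reasoning is my own and is not established elsewhere in the thread. FWL implies that the second-step residuals are identical to the full-regression residuals (compare the residual quantiles in the first and third outputs), and both SEs divide the same residual sum of squares by the same $X'_2M_1X_2$. The only difference is then the residual degrees of freedom, 30 versus 28, so the FWL SE times $\sqrt{30/28}$ should recover the full SE. In the CS regression, Y is not residualized, so its residuals also contain the variation in mpg explained by disp and hp. That piece is orthogonal to the FWL residuals, so the sums of squares should simply add up. The code uses base R and mtcars. The helper names (`se`, `rss`, `ess_controls`, and so on) are hypothetical.

```r
# Sketch: check the degrees-of-freedom explanation for the SE gap and
# decompose the extra residual variation in the cheatsheet (CS) regression.
full  <- lm(mpg ~ cyl + disp + hp, data = mtcars)
aux_x <- lm(cyl ~ disp + hp, data = mtcars)   # partial disp, hp out of cyl
aux_y <- lm(mpg ~ disp + hp, data = mtcars)   # partial disp, hp out of mpg
r_cyl <- resid(aux_x)
r_mpg <- resid(aux_y)
fwl   <- lm(r_mpg ~ r_cyl)                    # both sides residualized
cs    <- lm(mpg ~ r_cyl, data = mtcars)       # only cyl residualized

se  <- function(fit, term) coef(summary(fit))[term, "Std. Error"]
rss <- function(fit) sum(resid(fit)^2)

# 1) FWL and the full regression share the same residuals ...
same_resid <- isTRUE(all.equal(resid(fwl), resid(full)))
# ... so their SEs differ only through the residual df (30 vs 28)
df_ratio   <- df.residual(fwl) / df.residual(full)
se_rescale <- se(fwl, "r_cyl") * sqrt(df_ratio)

# 2) CS residuals = FWL residuals + (fitted mpg from disp, hp minus mean mpg);
#    the two pieces are orthogonal, so their sums of squares add
ess_controls <- sum((fitted(aux_y) - mean(mtcars$mpg))^2)

cat("full SE:", se(full, "cyl"), " rescaled FWL SE:", se_rescale, "\n")
cat("RSS_CS:", rss(cs), " RSS_FWL + ESS_controls:", rss(fwl) + ess_controls, "\n")
```

With the numbers above, 0.7702 × sqrt(30/28) ≈ 0.797, which matches the full-regression SE. The much larger CS SE comes from its residual sum of squares, which is inflated by the part of mpg that the controls explain.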
