Update:
I found a way to pull the coefficients for the model. I still can't pull the correct coefficients using the method I originally asked about.
Method that works, in case it helps someone else reading this:
- Variables in the model:
lmmovies$namesx[lmBPM$bestmodel + 1]
(there are multiple ways of getting these)
- Coefficients for the variables:
lmmovies$mle[lmBPM$best]
To confirm I had pulled the correct information, I used this information to predict the first record, and it matched the fitted value from the model for that record.
- Observations for each variable (factors converted to 1/0 by the model) for the first record:
lmmovies$X[1, ]
- Because the model is centered, the sample mean for each variable is needed:
lmmovies$mean.x
- Model shrinkage is also needed, if shrinkage isn't equal to 1:
lmmovies$shrinkage[lmBPM$best]
There's some matching up/data prep to be done to get ready to do the math, which I'm not including here because I'm now working on a different model than the one shown below and the code wouldn't work.
For each variable, calculate: a = (coefficient * shrinkage) * (observation - sample mean)
Then, Y = Intercept + sum(a)
Original post:
I am using the BAS package to do Bayesian multiple regression modeling for the Coursera course from Duke University, "Bayesian Statistics."
I am experiencing what I think is a bug. When I pull the mean values for the coefficients for the BPM, I have been assuming that:
- there will be non-zero values for the variables included in the BPM model's best.vars list.
- there will be zero values for the variables not included in that list.
That isn't what I'm experiencing, however.
I'm using BAS version 1.5.5 and statsr version 0.2.1 in R version 4.0.2. My OS is Mac OSX Catalina 10.15.5.
Here's sample code that reproduces what I'm seeing.
library(dplyr)
library(statsr)
library(BAS)
load("movies.Rdata")
# data set for modeling
dsmodeling <- movies %>%
select(audience_score, genre, thtr_rel_year,
critics_score, best_pic_nom, best_pic_win, best_actor_win,
best_actress_win, best_dir_win)
# bas object
lmmovies <- bas.lm(
formula = audience_score ~ ., data = dsmodeling, prior = "BIC", modelprior = uniform())
# pred.bas object
lmBPM = predict(lmmovies, estimator = "BPM", se.fit = TRUE)
##### Comparing the bas and pred.bas objects
### pred.bas object
# BPM model: its model #
vmodel = lmBPM$best
vmodel
# explanatory variable names
lmBPM$best.vars
# intercept & explanatory variable #s
lmBPM$bestmodel
### bas object
# intercept & explanatory variable #s for [vmodel]th model
# these match what I get from lmBPM$bestmodel
lmmovies$which[vmodel]
### get mean values of the coefficients for the [vmodel]th model
coefBPM <- coef(lmmovies)$conditionalmeans[vmodel,]
coefBPM <- as.data.frame(coefBPM)
colnames(coefBPM) <- "Posterior Mean Coefficient"
coefBPM
I noticed the same behavior in the example of this procedure given in the textbook for this course, the version last built on 2020-06-01, on pp. 191-192. On p. 191, the list of variables for the BPM is pulled. On p. 192, the coefficients are pulled. As an example of what I mean, the variable Pop has a non-zero coefficient, however, it is not in the list of coefficients. The variable M, on the other hand, is on the list of coefficients but has a zero coefficient.