masurp / specr Goto Github PK
View Code? Open in Web Editor NEWConducting and Visualizing Specification Curve Analyses
Home Page: https://masurp.github.io/specr
License: GNU General Public License v3.0
Conducting and Visualizing Specification Curve Analyses
Home Page: https://masurp.github.io/specr
License: GNU General Public License v3.0
Still room for improvement in this regard.
At the moment, we use tidyr::broom
to summarize the model statistics, which is a good starting point. However, it would be great if this would be the standard procedure and run_specs()
would further allow to customize the output.
Currently, run_specs only uses glm()
to estimate the relevant statistics. It should be possible to run any type of model by providing a "model function", e.g.;
mymodel <- function(df, formula){
glm(formula, data = df, family = "binomial")
}
to run_specs()
or even several different model functions.
Hi there,
I am trying to run specifications for all possible combinations of covariates. Currently I only get specifications for each individual covariate and then an additional one that has all covariates. I would like to look at all possible subsets as well.
As a workaround, I have generated all combinations and pasted them into model formula strings using helper code I wrote myself (e.g.: 'cov1 + cov2 + cov3', 'cov1 + cov2', 'cov1 + cov2', etc.). When passing this to the covariates arguments it seems to calculate the respective models. However, there are two issues: 1) It crashes once I use a high number of model specifications (e.g. >4000). 2) In the plot it doesn't correctly show which covariates were used. Instead, it always just colors the "all_covariates" option.
e.g.
covariates <- c("airport_dist", "conservative", 'male', 'age', 'popdens',
'manufact', 'tourism', 'academics', 'medinc', 'healthcare',
'y_centroid_county', 'evanrate')
spec_results <- run_specs(df = df_us_slope_prev,
y = c("onset_prev"),
x = c("pers_e"),
model = c("lm"),
controls = covariates)
dim(spec_results)
gives me dimensions of 14x12 but it shoould be 4095x12 if all combinations were considered.
Guidance would be greatly appreciated!
I'm using fixest::feols to run a fixed effects regression, but the "no covariates" row doesn't show up.
library(dplyr)
library(fixest)
library(specr)
test_formula <- function(formula,data) {
formula <-as.formula(paste0(formula, "|group1")) # fixed effects for group1
feols(formula,data)
}
results <- run_specs(
df = example_data,
y = c("y1", "y2"),
x = c("x1", "x2"),
model = 'test_formula',
controls = c("c1", "c2")
)
plot_specs(results)
I notice that it works if the regression formula includes a variable before the fixed effects section:
test_formula2 <- function(formula,data) {
formula <-as.formula(paste0(formula, "+ c3 |group1")) # control for c3 in all specifications
feols(formula,data)
}
results2 <- run_specs(
df = dplyr::mutate(example_data, c3 = runif(dplyr::n())),
y = c("y1", "y2"),
x = c("x1", "x2"),
model = 'test_formula2',
controls = c("c1", "c2")
)
plot_specs(results2)
For my analysis, there's a certain control variable which I want to be included in all specifications.
This is very easy to do after the fact by filtering for only rows where it appears:
specr_results %>%
filter(str_detect(string = controls, pattern = fixed("important_control_var")))
When specr_results
is small, this is ok. However, when specr_results
is large, making it actually estimate the models without this crucial control variable is redundant. I thought to use setup_specs
to setup my estimation tibble, filter that and then run_specs
on that, thus reducing computation time immensely.
However, it seems that setup_specs
is for illustration purposes only(?). There's no way that I see to pass on a setup_specs
tibble to run_specs
.
If I'm right, I suggest that you add an option to run_specs
to get a setup_specs
object as input OR to get x,y, controls etc.
P.S
Thanks for a great package!
Hi!
I work with panel data and have therefore specified a few different custom functions using the PLM package. I input these as the models in run_specs() and it works just fine.
Example:
within.plm.func <- function(formula, data) {
plm(formula = formula,
data = data,
model = "within")
}
I am, however, interested in running a negative binomial panel data analysis, and for that I need the PGLM package where I can specify the family = negbin, also.
Example: within.pglm.func <- function(formula, data) {
pglm(formula = formula,
data = data,
model = "within",
family = negbin)
}
When I use this function in run_specs() I get the following error:
Error: Problem with `mutate()` input `coefs`.
x No tidy method recognized for this list.
i Input `coefs` is `map(.data$res, broom::tidy, conf.int = TRUE, conf.level = conf.level)`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Der var 50 eller flere advarsler (brug warnings() for at se den første 50).
I have tried to install the development package, but the error persists.
If you know what causes it or how to fix it, I would greatly appreciate any input.
Best regards,
Toric
When the control set is a singleton, run_specs() outputs duplicates.
library(dplyr)
results <- run_specs(
df = example_data,
y = c("y1"),
x = c("x1"),
model = c("lm"),
controls = c("c1")
# controls = c("c1","c2")
)
plot_specs(results,choices=c("x","y","controls"))
distinct(results,across(x:subsets)) # grab distinct rows
My guess is that it's doing the combination category (ie, c1+c1), which is the same as c1 by itself.
If you define multiple subset variables, you should get separate specs for (a) none, (b) individual subsets, and (c) all combinations of subsets.
Hi,
Thanks for the wounderful specr package - I really love the plot and its compatibility with custom model functions so I am very much looking forward to using specr in my forthcoming papers!
However, I am currently running into problems when specifying my custom model function. When trying to run a specification analysis with an underlying spatial econometrics model from the splm packe, the run_spec() function returns the following error message:
Error (Test.R#216): Problem with 'mutate()' input 'res'.
x $ operator is invalid for atomic vectors
i Input 'res' is 'map2(.data$model, formula, ~do.call(.x, list(data = df, formula = .y)))'.
Here a reproducible example:
library(plm)
library(splm)
library(specr)
data(Produc, package = "plm")
data(usaww, package = "splm")
spml_specr <- function (formula, data,...){
spml(formula = formula, data = data, listw=mat2listw(usaww),
effect = "twoways", model = "within")
}
results <- run_specs(df=Produc,
y=c("gsp", "log(gsp)"),
x= c("pcap","log(pcap)"),
model = "spml_specr",
controls = c("log(pc)","log(emp)","unemp"))
In my debugging attempts, I have already specifyied a costum tidy function and I also re-installed the specr package with all dependencies. However, the error persists. I really would like to adjust my costum model function spml_specr() to connect spml() with specr. However, I just can't figure out the reason for the error message. The run_specr() function is impressive, but I am not so familiar with pipe operators and the map() functions to be able to trace the origin of the error.
Can anybody please give me a hint? I would appreciate it very much. Thanks a lot!
Thank you once again for building this great package. I have read the Alternative way to visualize specification results section, but I want to know how to change the colours in type (i.e. curve, results, and samplesizes) as part of the 'standard' output plot.
For curve and results, I did + scale_color_manual(values = c("#HEXCODE"))
which worked great. However, for samplesize, although it looks like a geom_bar, I am not sure why scale_fill_manual (or equally scale_color_manual) isn't working? It just returns the default grey colour.
Currently, run_specs
does not appear to support specification of models with random effects (i.e. glmer
or lmer
models). I note that since #3, run_specs
supports customisable glm
s, but would love to see a random effects parameter added to run_specs
, so e.g. the following usage would be possible:
my_glmer <- function(formula,data){
glmer(formula, data = data, family = binomial)
}
results <- run_specs(df = example_data,
y = c("y1", "y2"),
x = c("x1", "x2"),
model = c("lm"),
controls = c("c1", "c2"),
random_groups = c("group1"),
random_variance_components = c("1","x1","c1"),
subsets = list(group1 = unique(example_data$group1),
group2 = unique(example_data$group2)))
where random_groups
specified the level 2 grouping variables, and random_variance_components
specified whether to specify randomly varying intercepts only, or slopes also for listed variables.
When running the example:
results<- run_specs(df = example_data,
y = c("y1", "y2"),
x = c("x1", "x2"),
model = c("lm", "lm_gauss"),
controls = c("c1", "c2"),
subsets = list(group1 = unique(example_data$group1),
group2 = unique(example_data$group2)))
I get the following error:
Error in get(as.character(..2)) : object '1' not found
This occurs with my own code and with the example in the vignette.
I have a model called PQP
This code:
run_specs(df = db$db1,
y = c("y"),
x = c("x_000"),
model = "PQP"
) -> a
runs with no errors.
Now I enlist PQP
in the list models
.
This code:
run_specs(df = db$db1,
y = c("y"),
x = c("x_000"),
model = "models$PQP"
) -> a
gives back this error:
Error in `dplyr::mutate()`:
! Problem while computing `res = map2(.data$model, formula,
~do.call(.x, list(data = df, formula = .y)))`.
Caused by error in `models$PQP`:
! could not find function "models$PQP"
UPDATE:
TimTeaFan provided a debug:
get_model <- function(x) {
x_str <- str2lang(x)
if (is.name(x_str)) {
return(x)
} else if (is.call(x_str)) {
eval(x_str)
}
}
Then specr:::run_spec
is corrected like this:
specs %>%
dplyr::mutate(formula = pmap(.,
specr:::create_formula)
) %>% tidyr::unnest(formula) %>%
dplyr::mutate(res = map2(model,
formula,
~ do.call(get_model(.x), list(data = df, formula = .y))))
Is there a way to specify the inclusion of weights in the specifications?
I'm replying on cross-national survey data (European Social Survey) where combined post-stratifcation and country population weight is required to produce representative results.
Thank you for your work on this package and sharing it with everyone.
I'm running in to an issue where I have a long list of control variables, which causes the output of plot_specs()
to be cut off and uninterpretable. See reprex below, but imagine that there at least 20 more covariates after the 3 listed. Would it be possible to rename this long label for controls as simply "all covariates", to correspond with the label "no covariates"?
library(tidyverse)
library(specr)
results_test <-
run_specs(iris,
x = c("Sepal.Length"),
y = c("Sepal.Width"),
controls = c("Petal.Length", "Petal.Width", "Species"),
model = "lm")
plot_specs(results_test)
Created on 2022-08-15 by the reprex package (v2.0.1)
Hi there,
some authors suggest to run an additional inference statistical test on the distribution of effect sizes and test statistics across specification curves. E.g. mean/median effect size different from zero, share of significant results higher than what would be expected under H0, average test statistic different from what would be expected under H0.
Source: Simonsohn, Uri, Joseph P. Simmons, and Leif D. Nelson. “Specification Curve Analysis.” Nature Human Behaviour 4, no. 11 (November 2020): 1208–14. https://doi.org/10.1038/s41562-020-0912-z.
What is your opinion on this? Would it make sense to implement (some of) these tests in specr? Can you suggest any current workarounds? Your response would be greatly appreciated.
Best wishes,
Heinrich
run_specs()
currently uses only one core. In the long run, we should aim at making it as fast as possible (given the large numbers of specifications that one usually wants to estimate).
I want to use weights and clustering using fixest::feols, but this uses a comma in the formula, which seems to break specr.
library(dplyr)
library(fixest)
example_data <- example_data %>%
mutate(weight = runif(n()))
feols(y1~x1+x2 | group2, weights=~weight, data=example_data)
feols(y1~x1+x2 | group2, cluster=~group1, data=example_data)
feols(y1~x1+x2 | group2, cluster=~group1, weights=~weight, data=example_data) # these work
fe <- function(formula,data){
formula <- as.formula(paste0(formula, "|group2"))
feols(formula,data)
}
cluster_weight <- function(formula,data) {
formula <- as.formula(paste0(formula, "|group2,cluster=~group1,weights=~weight"))
feols(formula,data)
}
results_fe <- run_specs(df = example_data,
y = c("y1", "y2"),
x = c("x1", "x2"),
model = c("fe"),
controls = c("c1", "c2"))
plot_specs(results_fe) # this works
results_cw <- run_specs(df = example_data,
y = c("y1", "y2"),
x = c("x1", "x2"),
model = c("cluster_weight"),
controls = c("c1", "c2"))
Hello,
I'm running specr with the np package and encountering an issue.
Here's the code
kde.reg <- function(formula, data){
bw <- npregbw(formula=formula,
data = data,
nmulti = 5,
ckertype = "epanechnikov", ukertype = "liracine",
regtype="lc", bwmethod="cv.aic")
npreg(bw)
}
test_specs <- setup(data = test_df,
y = c("y1", "y2"),
x = c("x1", "x2", "x3"),
model = "kde.reg"
)
summary(test_specs, rows = 50)
test_results <- specr(test_specs)
I've attached a snippet of the code error below for context. I'm familiar with the error message and have solved it before by selecting the data for the explanatory variables. However, in specr, selecting inside of the model function causes errors of its own. Would this mean that specr isn't compatible with np or is there a way around this issue?
UPDATE: I've narrowed down the issue. In spec.R, at line 182, when it calls kde.reg
the formula it passed in (formula = structure("y1 ~ x2 + 1", class = c("glue", "character"))
) seems to be causing an error. I recreated the error by calling the npregbw
with this formula format. When I replaced it with y1 ~ x2 + 1
as a standalone formula, the error was resolved. Even when including the structure(list...))
argument that spec.R
creates. Although, I'm still unclear on how to resolve this issue.
UPDATE AGAIN: I was able to resolve the error by defining formula like formula=as.formula(formula)
, since specr defines it as a character when passed into the function. However, I'm running into a tidy error now. Working with tidy on that, but won't delete this for now.
Hello,
Incredible tool -- thank you very much.
A question about interactions. I am having trouble with specifying an interaction of a control with an x variable. If I include the "x_var:control1" in the controls specification, it removes either that specific x_var or the specific controls (even the non-interactive "control1"). It looks like I can include it in the add to formula, but then it includes it across all specifications. Is there some way to specify this in the specs()?
Thank you. If this is not clear, I can make try to make a reproducible example.
The run_specs()
function should include some sort of progress visualization. Otherwise, one does not know whether the function is working or not.... ;)
Thanks for this great package! I understand that the charts are based on ggplot2 defaults - but would it be possible to shift the choices to be displayed together with their levels on the left? Given how closely they belong together, that would seem to make it much more readable, as in the example below (unfortunately created with base R plots, code here?
This is probably a dumb question but how does specr handle dummy coded variable for regression analyses. For example, let's say I dummy coded ethnicity into several new variables and the biggest group as the reference category (coded as 0). How does specr
treat this when treating it as a covariate.
df <- df %>%
mutate(
ethnicity_white_european_heritage = ifelse(ethnicg == 1, 1, 0),
ethnicity_black_caribbean_heritage = ifelse(ethnicg == 2, 1, 0),
ethnicity_black_african_heritage = ifelse(ethnicg == 3, 1, 0),
ethnicity_any_other_ethnic_minority = ifelse(ethnicg == 4, 1, 0),
ethnicity_indian = ifelse(ethnicg == 5, 1, 0),
ethnicity_pakistani = ifelse(ethnicg == 6, 1, 0),
ethnicity_bangladeshi = ifelse(ethnicg == 7, 1, 0),
ethnicity_mixed_race = ifelse(ethnicg == 8, 1, 0)
)
There's no scenario when you would only consider 1/2/3/4 of the n number of new ethnicity variables on their own as covariates. All the dummy coded variables would have to be input together right?
Hi!
I am interested in running a specification curve based on a multilevel model in which I use the within centered x variable as well as the cluster mean. Therefore I would like for specr to treat my two variables: within_centered and cluster_mean as x´s that are always present together.
Looking at the response to issue 11 I figured that the below code might do the trick.
results <- run_specs(df = centered,
y = c("y"),
x = c("within_centered + cluster_mean"),
model = c("lmer_ri_1"),
controls = c(var1, var2, var3),
keep.results = T)
The results of this code is however an empty dataframe.
Do you know why? And is it possible to do what I am intending?
Best regards,
Toric
speccurve for Stata allows you to order the plot by specification, instead of coefficient size:
This would be a nice feature to have.
Hi there,
When I try to run through the analysis in the specr package using the example_data
or my own data, everything is fine until I try the run_specs
command, in which I get the following error message:
> results <- run_specs(df = example_data,
+ y = c("y1", "y2"),
+ x = c("x1", "x2"),
+ model = c("lm", "lm_gauss"),
+ controls = c("c1", "c2"),
+ subsets = list(group1 = unique(example_data$group1),
+ group2 = unique(example_data$group2)))
Error: Problem with `mutate()` input `..1`.
✖ Column `obs` not found in `.data`
ℹ Input `..1` is `.data$obs`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/dplyr_error>
Problem with `mutate()` input `..1`.
✖ Column `obs` not found in `.data`
ℹ Input `..1` is `.data$obs`.
Backtrace:
1. specr::run_specs(...)
33. dplyr:::stop_error_data_pronoun_not_found(...)
34. dplyr:::stop_dplyr(index, dots, fn, problem = msg, .dot_data = TRUE)
>
If you have any idea why this error is coming up or how to solve the issue that would be greatly appreciated. Thanks!
Erin
When I'm considering which control variables to add, I'm often considering one set or another, not one variable at a time. Would you consider a syntax that supports sets of controls?
Another approach would be to consider every possible subset of controls
, but the number of possible combinations increases pretty fast.
Here's an example:
library(specr)
# Currently, this code considers models with c1, c2, and c3 individually,
# no controls, and all three together.
results <- run_specs(
df = dplyr::mutate(example_data, c3=runif(dplyr::n())),
y = c("y1", "y2"),
x = c("x1", "x2"),
model = c("lm"),
controls = c("c1", "c2", "c3")
)
dplyr::distinct(results, controls)
#> # A tibble: 5 x 1
#> controls
#> <chr>
#> 1 c1 + c2 + c3
#> 2 c1
#> 3 c2
#> 4 c3
#> 5 no covariates
# But what if these models with only one control variable are
# not in my set of "all plausible specifications"?
# Maybe I would consider c1 + c2 or c1 + c3.
results2 <- run_specs(
df = dplyr::mutate(example_data, c3=runif(dplyr::n())),
y = c("y1", "y2"),
x = c("x1", "x2"),
model = c("lm"),
controls = list(c("c1", "c2"), c("c1", "c3"))
)
#> Error in model.frame.default(formula = "y1 ~ x1 + c(\"c1\", \"c2\") + c(\"c1\", \"c3\")", : variable lengths differ (found for 'c("c1", "c2")')
As an alternative visualization/analysis, provide variance components for the specs and the variability of a predefined parameter.
Hi there,
I am trying to run a specification curve analysis on a predefined list of specifications that I generate using my own code. This list is a subset of the combinations of a very large set of variables. Theoretically, I could run analyses for my list of specifications by running all combinations and then subsetting the output to include only the relevant specifications, but I think that this is computationally prohibitive. Ideally, I would like to use a dataset of the same format as the output of the setup_specs() function as input for the run_specs() function. This would provide more control with respect to the list of specifications that are run. Is this currently possible or should I fork and modify the code?
I am thinking about something like this:
df_specs <- my_own_setup_specs(...)
run_specs(df_specs)
Guidance would be much appreciated!
Not sure if this is truly an issue or just a limitation. I can't seem to run a specr analysis with independent variables that are categorical. Tried using as.factor and as.character, none works. I tried two different datasets to be sure. Is it a limitation or am I missing something?
Hello, love the package so far - thanks for your work on it!
I want to add an interaction term to my specification curve analysis. How do I do this?
My current set-up is as follows.
setup(data = dat,
x = c("lonely_mostly_or_not", "hinctnta"),
y = c("mean_p_eff_cgg_dpr_z", "mean_p_eff_dpr_z"),
model = c("lmer"),
controls = c("agea", "icgndra", "jbexpnt", "health_problems"),
add_to_formula = "(1 | nuts1)")
I want to test the interaction between lonely_mostly_or_not
and hinctnta
. I have tried specifying it as both an x
variable (lonely_mostly_or_not * hnctnta
) and I've tried to add it to the add_to_formula
argument (lonely_mostly_or_not * hnctnta + (1 | nuts1)
). Neither seemed to work, and I didn't see anything about this in the documentation (but I am probably missing something obvious!).
Thanks for your help!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.