Comments (19)
Thanks you very much. This is great!
It works perfectly now!
from logitr.
This type of message usually means there is something wrong with the obsID
variable. That variable must be a continuously increasing numeric variable identifying the rows of each choice observation, e.g., 1,1,1,2,2,2,3,3,3, etc. Anything else will cause errors, e.g., 1,1,1,2,2,2,1,1,1,3,3,3 will cause an error because the 1 is repeated later. Can you double check your tskID
variable?
from logitr.
So this is what the data looks like.
I checked whether there is some error (more or less than two observations per choice task), but it appears to be correct.
> dat_transformed %>%
+ group_by(tskID) %>%
+ filter(n()>2) %>%summarise(n=n())
# A tibble: 0 × 2
# … with 2 variables: tskID <dbl>, n <int>
Another hint that may help understand if I am making an error, or if there is a problem with the command:
if I estimate the model with vcov=FALSE
the model is correctly computed (though not displayed)
mnl <- logitr(
+ data = dat_transformed,
+ outcome = "choice",
+ obsID = "tskID",
+ pars = c("wg",myvars),
+ modelSpace = "pref",
+ clusterID = "wrkid",
+ panelID = "wrkid",
+ numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
Iterations: 100
Cores: 7
Done!
> summary(mnl)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
However, if I try to include the vcov matrix in the model it gives an error while calculating the model.
> mnl_vcov <- logitr(
+ data = dat_transformed,
+ outcome = "choice",
+ obsID = "tskID",
+ pars = c("wg",myvars),
+ modelSpace = "pref",
+ clusterID = "wrkid",
+ panelID = "wrkid",
+ vcov = T,
+ numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
Iterations: 100
Cores: 7
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
Do you still think is something that is related to the variable identifying the rows of each choice observation?
from logitr.
Okay so the vcov
part makes sense in terms of identifying the source of the error. It must be in computing the variance-covariance matrix, which also makes sense that the error happens when you call summary()
because that calls vcov()
internally to compute the standard errors.
Can you do one more check - comment out the clusterID
part and see if it still errors. This will help me understand if the error is coming from the computation of clustered standard errors or if it's something else.
from logitr.
Indeed you are correct. If I remove the clustered standard errors the command works fine! It is clearly there the issue!
from logitr.
Okay so that means this has to do with how the clustered errors are being computed. You should be able to estimate the model (with clustered errors) without any errors if you have vcov = FALSE
(the default). Then if you call vcov(model)
you'll get the error. Can you do that and post the message?
from logitr.
Apologies for the delaied reply. Here is the results and the error
> mnl_vcov <- logitr(
+ data = dat_transformed,
+ outcome = "choice",
+ obsID = "tskID",
+ pars = c("wg",myvars),
+ modelSpace = "pref",
+ clusterID = "wrkid",
+ panelID = "wrkid",
+ numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
Iterations: 100
Cores: 7
Done!
> vcov(mnl_vcov)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
To not however that I have run the model using the robust standard errors and I get the same error.
Is this of any help?
from logitr.
Any chance this is also related to the data errors in #30? It looks like it might be based on the errors you're getting.
from logitr.
Unfortunately, I don't think so, I believe it has something to do with the SEs, because on this data the model works fine until I do not include robust or clustered standard errors. So I am not sure what to make of it.
Any suggestion of things I could try out?
from logitr.
The issue is from computing robust standard errors. When doing this, the gradient is computed multiple times according to each cluster defined by the clusterID
variable (in your example, wrkid
). Perhaps there is something strange about that variable?
from logitr.
I believe this issue may also be related to the same bug in #30. This (I think) is resolved with the latest fixes in v0.7.1.
from logitr.
Dear jhelvy,
thank you very much for all your work and for the numerous improvements to the package you made.
I just wanted to let you know that even with the v0.7.1 I have this problem.
If I estimate a model thus and I call for the summary I get the following error: Error in rowsum.default(expV, group = obsID, reorder = FALSE) : incorrect length for 'group'
mnl_pref <- logitr::logitr(
data = df,
outcome = "choice",
obsID = "tskID",
pars = c("cost",myvars),
panelID = "wrkid",
robust = T,
numMultiStarts = 100)
Setting clusterID to 'wrkid' since robust == TRUE and a panelID is provided
Running multistart...
Iterations: 100
Cores: 7
Done!
summary(mnl_pref)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) : incorrect length for 'group'
Conversely, if I comment out the robust option errors the summary function works well and I obtain the results.
I am not sure what is happening but I still believe there is an issue with the clustered standard errors.
In any case thanks a lot for the great progress.
from logitr.
Thanks for noting this - it's probably still a small bug somewhere in how I wrote the summary function when using clustered errors.
Could you try to make a similar example that creates this error using one of the package data sets, like the yogurt
data? That would help a lot in identifying the source of the issue.
from logitr.
I tried playing around with it but I was not able to produce the error I have with the real data
Compared to yogurt, my data is different because the obsID has gaps, and because not all of the ids have more than a task assigned.
data("yogurt")
random_binary <- function(n, p){
# p is the proportion of 1s
x <- c(rep(1, times=n * p), rep(0, times=n * (1 - p)))
x[sample(length(x))] # or sample(x)
}
yogurt<- yogurt %>% mutate(ran1=random_binary(nrow(yogurt),0.5),
ran2=random_binary(nrow(yogurt),0.5),
ran3=random_binary(nrow(yogurt),0.5),
ran4=random_binary(nrow(yogurt),0.5),
ran5=random_binary(nrow(yogurt),0.5))
y1 <- yogurt %>% group_by(id) %>% slice(1:4)
y2 <- yogurt %>% group_by(id) %>% slice(5:8) %>% ungroup() %>% slice(1:200)
yog <- full_join(y1,y2) %>% arrange(obsID)
head(yog,15) # this is what my data looks like
# A tibble: 15 × 12
# Groups: id [2]
id obsID alt choice price feat brand ran1 ran2 ran3 ran4 ran5
1 1 1 1 0 8.1 0 dannon 1 0 1 1 1
2 1 1 2 0 6.10 0 hiland 1 0 0 1 1
3 1 1 3 1 7.90 0 weight 1 1 0 0 0
4 1 1 4 0 10.8 0 yoplait 1 1 0 1 1
5 1 2 1 1 9.80 0 dannon 1 0 0 0 0
6 1 2 2 0 6.40 0 hiland 1 0 1 1 1
7 1 2 3 0 7.50 0 weight 0 1 0 0 1
8 1 2 4 0 10.8 0 yoplait 1 0 1 1 1
9 2 9 1 0 9.80 0 dannon 1 0 0 1 1
10 2 9 2 0 5.00 0 hiland 0 0 1 1 0
11 2 9 3 0 7.90 0 weight 0 1 0 0 1
12 2 9 4 1 10.8 0 yoplait 0 0 1 0 1
13 2 10 1 0 9.80 0 dannon 0 1 1 0 0
14 2 10 2 0 5.00 0 hiland 0 1 0 0 0
15 2 10 3 0 7.90 0 weight 1 0 0 0 1
However estimating a model with robust standard errors on this data does not give errors.
mnl_pref <- logitr::logitr(
data = yog,
outcome = "choice",
obsID = "obsID",
pars = c("price", "feat", "brand","ran1","ran2","ran3","ran4","ran5"),
panelID = "id",
robust = T,
clusterID = "id",
numMultiStarts = 100)
summary(mnl_pref)
Note that I tried to re-create the obsID variable for my real data to make it increase incrementally without gaps and I still experience the same error.
Note also that I obtain the same error in the estimation phase if I include vcov=T
option
from logitr.
So I figured out how to create a small dataset that reproduces the same error
First create the following data
data <- structure(list(id = structure(c(1, 1, 8, 8, 12, 12, 13, 13,
13, 13, 14, 14, 16, 16, 42, 42, 47, 47, 51, 51, 53, 53, 55, 55,
57, 57, 62, 62, 63, 63, 65, 65, 69, 69, 71, 71, 77, 77, 78, 78,
79, 79, 83, 83, 84, 84, 90, 90, 95, 95, 95, 95, 96, 96, 100,
100, 102, 102, 104, 104, 107, 107, 108, 108, 108, 108, 112, 112,
113, 113, 114, 114, 118, 118, 130, 130, 132, 132, 139, 139, 139,
139, 141, 141, 146, 146, 147, 147, 147, 147, 153, 153, 167, 167,
169, 169, 170, 170, 176, 176), format.stata = "%12.0g"), tskID = structure(c(2,
2, 23, 23, 35, 35, 37, 37, 39, 39, 40, 40, 48, 48, 126, 126,
139, 139, 153, 153, 157, 157, 163, 163, 171, 171, 186, 186, 188,
188, 193, 193, 205, 205, 213, 213, 230, 230, 233, 233, 235, 235,
248, 248, 251, 251, 268, 268, 284, 284, 285, 285, 287, 287, 298,
298, 306, 306, 311, 311, 320, 320, 322, 322, 323, 323, 335, 335,
339, 339, 341, 341, 352, 352, 388, 388, 395, 395, 415, 415, 416,
416, 421, 421, 438, 438, 439, 439, 440, 440, 459, 459, 501, 501,
506, 506, 508, 508, 528, 528), format.stata = "%10.0g"), choice = structure(c(1,
0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0,
1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1,
0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0), format.stata = "%10.0g"),
cost = structure(c(16.3, 15.6, 15.4, 15.1, 16.5, 16.4, 16.7,
15.6, 15.8, 16.5, 15.5, 15.9, 15.7, 16.4, 15.1, 15.9, 16.6,
15, 16.1, 16.3, 15.7, 17.9, 16.2, 15.3, 16.4, 16.5, 17.1,
17.3, 15.2, 16.1, 16.5, 16.9, 15, 16.8, 16.5, 17.4, 15.2,
15, 16.4, 15.4, 16.8, 16.1, 16.1, 15.4, 16.8, 17.8, 16.6,
17.6, 15.6, 15.7, 16.3, 16.3, 16, 17.3, 17.4, 17.9, 15, 17.6,
15.4, 16.2, 15.6, 17.4, 17.8, 15.7, 17.2, 17.1, 15.9, 17.5,
17, 16.4, 15.2, 16.3, 15.3, 15.2, 16.3, 15.6, 15.2, 15.7,
15.5, 15.7, 16.2, 15.6, 17.1, 16.5, 15.1, 16.8, 16.6, 16.3,
15.3, 15.9, 16.1, 16.3, 17.9, 15.9, 15.4, 16.1, 16.4, 15,
17.7, 15.8)),
var1 = structure(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0,
0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1)),
var2 = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0,
1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1)),
var3 = structure(c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0))), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
Second try to estimate the a MNL with robust SEs.
mnl_pref <- logitr::logitr(
data = data,
outcome = "choice",
obsID = "tskID",
pars = c("cost","var1","var2","var3"),
panelID = "id",
robust = T,
numMultiStarts = 100)
Setting clusterID to 'id' since robust == TRUE and a panelID is provided
Running multistart...
Iterations: 100
Cores: 7
Done!
summary(mnl_pref)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
```
The command correctly estimates it, but it encounters problems with displaying the summary.
Note that if you comment out the robust option the summary function works correctly!
Is there something I can do to overcome the problem?
from logitr.
I am almost certain that the issue is in how the ID variables are set. As it is currently written, the package needs IDs that are in sequentially increasing order, like 1,1,2,2,3,3, etc. The gaps in the IDs are probably the issue. You could test this by taking your demo dataset and over-writing the IDs with a sequentially ID, then see if it works. If that works, then we at least know the source of the problem.
This shouldn't be something that happens though in terms of how I would want the package to work. As long as the IDs are unique, being in a sequential order shouldn't matter, e.g. 1,1,8,8,13,13, should work just the same. So if that is indeed the problem, then I'll try to add a fix to make the package work even if the IDs have gaps.
from logitr.
I tried to follow your suggestion, but unfortunately it still gives the same error
data<- data %>%
dplyr::group_by(id) %>%
dplyr::mutate(group_id = cur_group_id())
mnl_pref <- logitr::logitr(
data = data,
outcome = "choice",
obsID = "tskID",
pars = c("cost","var1","var2","var3"),
panelID = "group_id",
robust = T,
numMultiStarts = 100)
summary(mnl_pref)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
Is it possible that it also relates to the fact that some individuals perform a different number of tasks?
from logitr.
I believe I have found the source of the issue. I am working to add better checks to all of the variable IDs while also addressing this bug.
from logitr.
Okay, I think this is now corrected. Can you install the latest version (0.7.2) from Github and test it for me?
# install.packages("remotes")
remotes::install_github("jhelvy/logitr")
The issue has persisted this whole time and it simply had to do with the particular case where a cluster had only one observation. In these instances, when computing the clustered standard errors the data had only one row and was no longer being stored as a matrix, so I had to coerce it back to a matrix for the math operations to correctly execute.
from logitr.
Related Issues (20)
- Incorporate proper panel log-likelihood calculations HOT 3
- Predicted probability labeling HOT 3
- accessing intercept value HOT 1
- Missmatch of clustered standard errors between logitr and Stata clogit HOT 18
- Feature enhancement: exporting to latex logitr results HOT 11
- Using logitr to estimate heterogeneity in customers time-preferences HOT 4
- Error in X_chosen[data$obsID, ] : subscript out of bounds HOT 11
- Release logitr 0.6.0 HOT 1
- Release logitr 0.7.0
- Predicting out-of-sample tasks of known individuals - Mixed logit on panel data HOT 3
- Release logitr 0.8.0
- Predicted names don't match HOT 12
- Release logitr 1.0.0
- Is there a requirement of the amount of the data for the logitR function to predict? HOT 3
- effects coding HOT 4
- Usage help HOT 2
- Individual part-worth estimates HOT 2
- Inconsistent Estimator? HOT 29
- Improve obsID error messaging
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from logitr.