Giter Site home page Giter Site logo

Comments (19)

Ales-G avatar Ales-G commented on June 18, 2024 1

Thanks you very much. This is great!
It works perfectly now!

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

This type of message usually means there is something wrong with the obsID variable. That variable must be a continuously increasing numeric variable identifying the rows of each choice observation, e.g., 1,1,1,2,2,2,3,3,3, etc. Anything else will cause errors, e.g., 1,1,1,2,2,2,1,1,1,3,3,3 will cause an error because the 1 is repeated later. Can you double check your tskID variable?

from logitr.

Ales-G avatar Ales-G commented on June 18, 2024

So this is what the data looks like.
Screenshot 2022-05-19 at 11 41 26

I checked whether there is some error (more or less than two observations per choice task), but it appears to be correct.

> dat_transformed %>% 
+   group_by(tskID) %>% 
+   filter(n()>2) %>%summarise(n=n())
# A tibble: 0 × 2
# … with 2 variables: tskID <dbl>, n <int>

Another hint that may help understand if I am making an error, or if there is a problem with the command:

if I estimate the model with vcov=FALSE the model is correctly computed (though not displayed)

mnl <- logitr(
+   data           = dat_transformed,
+   outcome        = "choice",
+   obsID          = "tskID",
+   pars           = c("wg",myvars),
+   modelSpace     = "pref",
+   clusterID      = "wrkid",
+   panelID        = "wrkid",
+   numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
  Iterations: 100
  Cores: 7
Done!
> summary(mnl)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) : 
  incorrect length for 'group'

However, if I try to include the vcov matrix in the model it gives an error while calculating the model.

> mnl_vcov <- logitr(
+   data           = dat_transformed,
+   outcome        = "choice",
+   obsID          = "tskID",
+   pars           = c("wg",myvars),
+   modelSpace     = "pref",
+   clusterID      = "wrkid",
+   panelID        = "wrkid",
+   vcov           = T,
+   numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
  Iterations: 100
  Cores: 7
Error in rowsum.default(expV, group = obsID, reorder = FALSE) : 
  incorrect length for 'group'

Do you still think is something that is related to the variable identifying the rows of each choice observation?

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

Okay so the vcov part makes sense in terms of identifying the source of the error. It must be in computing the variance-covariance matrix, which also makes sense that the error happens when you call summary() because that calls vcov() internally to compute the standard errors.

Can you do one more check - comment out the clusterID part and see if it still errors. This will help me understand if the error is coming from the computation of clustered standard errors or if it's something else.

from logitr.

Ales-G avatar Ales-G commented on June 18, 2024

Indeed you are correct. If I remove the clustered standard errors the command works fine! It is clearly there the issue!

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

Okay so that means this has to do with how the clustered errors are being computed. You should be able to estimate the model (with clustered errors) without any errors if you have vcov = FALSE (the default). Then if you call vcov(model) you'll get the error. Can you do that and post the message?

from logitr.

Ales-G avatar Ales-G commented on June 18, 2024

Apologies for the delaied reply. Here is the results and the error

> mnl_vcov <- logitr(
+    data           = dat_transformed,
+      outcome        = "choice",
+      obsID          = "tskID",
+      pars           = c("wg",myvars),
+      modelSpace     = "pref",
+      clusterID      = "wrkid",
+      panelID        = "wrkid",
+      numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
  Iterations: 100
  Cores: 7
Done!
> vcov(mnl_vcov)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) : 
  incorrect length for 'group'

To not however that I have run the model using the robust standard errors and I get the same error.

Is this of any help?

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

Any chance this is also related to the data errors in #30? It looks like it might be based on the errors you're getting.

from logitr.

Ales-G avatar Ales-G commented on June 18, 2024

Unfortunately, I don't think so, I believe it has something to do with the SEs, because on this data the model works fine until I do not include robust or clustered standard errors. So I am not sure what to make of it.
Any suggestion of things I could try out?

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

The issue is from computing robust standard errors. When doing this, the gradient is computed multiple times according to each cluster defined by the clusterID variable (in your example, wrkid). Perhaps there is something strange about that variable?

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

I believe this issue may also be related to the same bug in #30. This (I think) is resolved with the latest fixes in v0.7.1.

from logitr.

Ales-G avatar Ales-G commented on June 18, 2024

Dear jhelvy,
thank you very much for all your work and for the numerous improvements to the package you made.
I just wanted to let you know that even with the v0.7.1 I have this problem.
If I estimate a model thus and I call for the summary I get the following error: Error in rowsum.default(expV, group = obsID, reorder = FALSE) : incorrect length for 'group'

mnl_pref <- logitr::logitr(
  data           = df,
  outcome        = "choice",
  obsID          = "tskID",
  pars           = c("cost",myvars),
  panelID        = "wrkid",
  robust         = T,
  numMultiStarts = 100)

Setting clusterID to 'wrkid' since robust == TRUE and a panelID is provided
Running multistart...
  Iterations: 100
  Cores: 7
Done!

summary(mnl_pref)

Error in rowsum.default(expV, group = obsID, reorder = FALSE) :  incorrect length for 'group'

Conversely, if I comment out the robust option errors the summary function works well and I obtain the results.
I am not sure what is happening but I still believe there is an issue with the clustered standard errors.

In any case thanks a lot for the great progress.

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

Thanks for noting this - it's probably still a small bug somewhere in how I wrote the summary function when using clustered errors.

Could you try to make a similar example that creates this error using one of the package data sets, like the yogurt data? That would help a lot in identifying the source of the issue.

from logitr.

Ales-G avatar Ales-G commented on June 18, 2024

I tried playing around with it but I was not able to produce the error I have with the real data
Compared to yogurt, my data is different because the obsID has gaps, and because not all of the ids have more than a task assigned.

data("yogurt")
random_binary <- function(n, p){
  # p is the proportion of 1s
  x <- c(rep(1, times=n * p), rep(0, times=n * (1 - p)))
  x[sample(length(x))] # or sample(x)
}
yogurt<- yogurt %>% mutate(ran1=random_binary(nrow(yogurt),0.5),
                           ran2=random_binary(nrow(yogurt),0.5),
                           ran3=random_binary(nrow(yogurt),0.5),
                           ran4=random_binary(nrow(yogurt),0.5),
                           ran5=random_binary(nrow(yogurt),0.5))

y1 <- yogurt %>% group_by(id) %>% slice(1:4)
y2 <- yogurt %>% group_by(id) %>% slice(5:8) %>% ungroup() %>% slice(1:200)
yog <- full_join(y1,y2) %>% arrange(obsID)


head(yog,15) # this is what my data looks like 
# A tibble: 15 × 12
# Groups:   id [2]
id obsID   alt choice price  feat brand    ran1  ran2  ran3  ran4  ran5
 1     1     1     1      0  8.1      0 dannon      1     0     1     1     1
 2     1     1     2      0  6.10     0 hiland      1     0     0     1     1
 3     1     1     3      1  7.90     0 weight      1     1     0     0     0
 4     1     1     4      0 10.8      0 yoplait     1     1     0     1     1
 5     1     2     1      1  9.80     0 dannon      1     0     0     0     0
 6     1     2     2      0  6.40     0 hiland      1     0     1     1     1
 7     1     2     3      0  7.50     0 weight      0     1     0     0     1
 8     1     2     4      0 10.8      0 yoplait     1     0     1     1     1
 9     2     9     1      0  9.80     0 dannon      1     0     0     1     1
10     2     9     2      0  5.00     0 hiland      0     0     1     1     0
11     2     9     3      0  7.90     0 weight      0     1     0     0     1
12     2     9     4      1 10.8      0 yoplait     0     0     1     0     1
13     2    10     1      0  9.80     0 dannon      0     1     1     0     0
14     2    10     2      0  5.00     0 hiland      0     1     0     0     0
15     2    10     3      0  7.90     0 weight      1     0     0     0     1

However estimating a model with robust standard errors on this data does not give errors.

mnl_pref <- logitr::logitr(
  data           = yog,
  outcome        = "choice",
  obsID          = "obsID",
  pars           = c("price", "feat", "brand","ran1","ran2","ran3","ran4","ran5"),
  panelID        = "id",
  robust         = T,
  clusterID      = "id",
  numMultiStarts = 100)
summary(mnl_pref)

Note that I tried to re-create the obsID variable for my real data to make it increase incrementally without gaps and I still experience the same error.

Note also that I obtain the same error in the estimation phase if I include vcov=T option

from logitr.

Ales-G avatar Ales-G commented on June 18, 2024

So I figured out how to create a small dataset that reproduces the same error

First create the following data

data <- structure(list(id = structure(c(1, 1, 8, 8, 12, 12, 13, 13, 
                                         13, 13, 14, 14, 16, 16, 42, 42, 47, 47, 51, 51, 53, 53, 55, 55, 
                                         57, 57, 62, 62, 63, 63, 65, 65, 69, 69, 71, 71, 77, 77, 78, 78, 
                                         79, 79, 83, 83, 84, 84, 90, 90, 95, 95, 95, 95, 96, 96, 100, 
                                         100, 102, 102, 104, 104, 107, 107, 108, 108, 108, 108, 112, 112, 
                                         113, 113, 114, 114, 118, 118, 130, 130, 132, 132, 139, 139, 139, 
                                         139, 141, 141, 146, 146, 147, 147, 147, 147, 153, 153, 167, 167, 
                                         169, 169, 170, 170, 176, 176), format.stata = "%12.0g"), tskID = structure(c(2, 
                                                                                                                      2, 23, 23, 35, 35, 37, 37, 39, 39, 40, 40, 48, 48, 126, 126, 
                                                                                                                      139, 139, 153, 153, 157, 157, 163, 163, 171, 171, 186, 186, 188, 
                                                                                                                      188, 193, 193, 205, 205, 213, 213, 230, 230, 233, 233, 235, 235, 
                                                                                                                      248, 248, 251, 251, 268, 268, 284, 284, 285, 285, 287, 287, 298, 
                                                                                                                      298, 306, 306, 311, 311, 320, 320, 322, 322, 323, 323, 335, 335, 
                                                                                                                      339, 339, 341, 341, 352, 352, 388, 388, 395, 395, 415, 415, 416, 
                                                                                                                      416, 421, 421, 438, 438, 439, 439, 440, 440, 459, 459, 501, 501, 
                                                                                                                      506, 506, 508, 508, 528, 528), format.stata = "%10.0g"), choice = structure(c(1, 
                                                                                                                                                                                                    0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 
                                                                                                                                                                                                    1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 
                                                                                                                                                                                                    1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 
                                                                                                                                                                                                    1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 
                                                                                                                                                                                                    0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0), format.stata = "%10.0g"), 
                     cost = structure(c(16.3, 15.6, 15.4, 15.1, 16.5, 16.4, 16.7, 
                                      15.6, 15.8, 16.5, 15.5, 15.9, 15.7, 16.4, 15.1, 15.9, 16.6, 
                                      15, 16.1, 16.3, 15.7, 17.9, 16.2, 15.3, 16.4, 16.5, 17.1, 
                                      17.3, 15.2, 16.1, 16.5, 16.9, 15, 16.8, 16.5, 17.4, 15.2, 
                                      15, 16.4, 15.4, 16.8, 16.1, 16.1, 15.4, 16.8, 17.8, 16.6, 
                                      17.6, 15.6, 15.7, 16.3, 16.3, 16, 17.3, 17.4, 17.9, 15, 17.6, 
                                      15.4, 16.2, 15.6, 17.4, 17.8, 15.7, 17.2, 17.1, 15.9, 17.5, 
                                      17, 16.4, 15.2, 16.3, 15.3, 15.2, 16.3, 15.6, 15.2, 15.7, 
                                      15.5, 15.7, 16.2, 15.6, 17.1, 16.5, 15.1, 16.8, 16.6, 16.3, 
                                      15.3, 15.9, 16.1, 16.3, 17.9, 15.9, 15.4, 16.1, 16.4, 15, 
                                      17.7, 15.8)), 
                     var1 = structure(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 
                                            1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 
                                            0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 
                                            0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 
                                            1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 
                                            1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1)), 
                     var2 = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                           0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 
                                           0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 
                                           0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 
                                           1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 
                                           0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1)), 
                     var3 = structure(c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 
                                           1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 
                                           0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 
                                           0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
                                           0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 
                                           0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0))), row.names = c(NA, 
                                                                                                                                                      -100L), class = c("tbl_df", "tbl", "data.frame"))

Second try to estimate the a MNL with robust SEs.

mnl_pref <- logitr::logitr(
  data           = data,
  outcome        = "choice",
  obsID          = "tskID",
  pars           = c("cost","var1","var2","var3"),
  panelID        = "id",
  robust         = T,
  numMultiStarts = 100)
Setting clusterID to 'id' since robust == TRUE and a panelID is provided
Running multistart...
  Iterations: 100
  Cores: 7
Done!
summary(mnl_pref)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) : 
  incorrect length for 'group'
```
The command correctly estimates it, but it encounters problems with displaying the summary. 
Note that if you comment out the robust option the summary function works correctly!


Is there something I can do to overcome the problem?

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

I am almost certain that the issue is in how the ID variables are set. As it is currently written, the package needs IDs that are in sequentially increasing order, like 1,1,2,2,3,3, etc. The gaps in the IDs are probably the issue. You could test this by taking your demo dataset and over-writing the IDs with a sequentially ID, then see if it works. If that works, then we at least know the source of the problem.

This shouldn't be something that happens though in terms of how I would want the package to work. As long as the IDs are unique, being in a sequential order shouldn't matter, e.g. 1,1,8,8,13,13, should work just the same. So if that is indeed the problem, then I'll try to add a fix to make the package work even if the IDs have gaps.

from logitr.

Ales-G avatar Ales-G commented on June 18, 2024

I tried to follow your suggestion, but unfortunately it still gives the same error

data<- data %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(group_id = cur_group_id())


mnl_pref <- logitr::logitr(
  data           = data,
  outcome        = "choice",
  obsID          = "tskID",
  pars           = c("cost","var1","var2","var3"),
  panelID        = "group_id",
  robust         = T,
  numMultiStarts = 100)
summary(mnl_pref)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) : 
  incorrect length for 'group'

Is it possible that it also relates to the fact that some individuals perform a different number of tasks?

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

I believe I have found the source of the issue. I am working to add better checks to all of the variable IDs while also addressing this bug.

from logitr.

jhelvy avatar jhelvy commented on June 18, 2024

Okay, I think this is now corrected. Can you install the latest version (0.7.2) from Github and test it for me?

# install.packages("remotes")
remotes::install_github("jhelvy/logitr")

The issue has persisted this whole time and it simply had to do with the particular case where a cluster had only one observation. In these instances, when computing the clustered standard errors the data had only one row and was no longer being stored as a matrix, so I had to coerce it back to a matrix for the math operations to correctly execute.

from logitr.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.