chjackson / msm Goto Github PK

View Code? Open in Web Editor NEW

56.0 56.0 18.0 10.95 MB

The msm R package for continuous-time multi-state modelling of panel data

Home Page: https://chjackson.github.io/msm/

R 83.09% C 13.94% TeX 1.95% PostScript 1.02%

msm's Issues

Analysing multicentric panel data with msm

sugested tag -> questions

Hi,

We planed to use msm package to infer anal HPV infection and clearance rates from panel data coming from different study/center.
The idea is to conduct a kind of meta-analysis of infection and clearance rates.
Centers are very heterogeneous in term of number of patients and follow-up duration.

Is there a smart way to handle such data within msm framework?
Would you sugest to add a 'center' covariate in the model?
Would it be better to analyse centers independently and conduct a classical rate meta-analysis in a second time?

Thanks a lot for this excellent package and the kind assistance you provide.

With best regards,
Damien

`ematrix.msm` returns `NULL` on `cav`-example

Hey! I'm not sure this is supposed to happen:

library(msm)
Q <- rbind (c(0, 0.25, 0, 0.25),
            c(0.166, 0, 0.166, 0.166),
            c(0, 0.25, 0, 0.25),
            c(0, 0, 0, 0))
cav.msm <- msm::msm(state ~ years,
               subject = PTNUM,
               covariates = ~sex,
               data = cav,
               qmatrix = Q,
               deathexact = 4)
msm::ematrix.msm(cav.msm)
#> NULL

^{Created on 2021-09-13 by the reprex package (v2.0.1)}

Source tree has build artifacts

There are .o and .so binary build artifacts present in the source tree that seem to be from the initial import of the CRAN sources into this repository.

These files prevent R CMD INSTALL from building the library with the system compiler and libraries and fails the install sanity checks in mysterious ways on machines sufficiently different from the machine the binaries were built on.

HMM model constructors seem to only allow literal constraints

The following code produces an error:

lapply(c(1,2,3), function(y) hmmNorm(mean = y, sd = 0.5))

This sort of construction would be quite common when using the msm package with a variable number of states.

Derive transition probabilities analytically

I would like to find Transition Probability Matrix (TPM) analytically from the generator matrix. It seems it does not work, it seems it computes TPM for t=1.

GG= matrix(c(-2,1,1,1,-2,1,1,1,-2),nrow=3,ncol=3,byrow = T)
MatrixExp(GG,method ="analytic")

It does not give me the analytical form based on exponential function, like Mathematica. Please take a look at the Mathematica example http://reference.wolfram.com/language/ref/ContinuousMarkovProcess.html.

constraint statement does not work for factor variables.

Hi Chris,

Thanks you so much for assisting me with setting up the constraint matrix. But I have a new problem on this matrix.

sex is a covariate in my mode.
I have 8 levels in the Q-matrix, some tranisitions not allowed are 2-8, 7-1, 8-1, 8-2, 8-3.

Option 1: Sex is a factor variable coded "Female", "Male"

sex.msm <- msm(state ~ Obstime, subject = subj, data = DF,
obstype = 2,
gen.inits=init,
qmatrix = Q,
method = "BFGS",
opt.method = "optim",
covariates = ~ sex,
constraint=list(sex=c( 1, 1, 1, 1, 1, 1, 1,
-1, 2, 2, 2, 2, 2,
-1, -2, 3, 3, 3, 3, 3,
-1, -2, -3, 4, 4, 4, 4,
-1, -2, -3, -4, 5, 5, 5,
-1, -2, -3, -4, -5, 6, 6,
-2, -3, -4, -5, -6, 7,
-4, -5, -6, -7)),
control = list (fnscale=95161, trace = 2, REPORT = 1,maxit=15000, reltol=1e-8))
Error in msm.check.constraint(constraint, mm) :
Covariate "sex" in constraint statement not in model.
In addition: Warning message:
In if (gen.inits) { :
the condition has length > 1 and only the first element will be used

**Option 2 : Sex is a numeric variable having values 0, 1. Sex is not Not a factor No errors whatsoever.

DF$sex <-as.numeric(DF$sex)
DF$sex[DF$sex==1]=0
DF$sex[DF$sex==2]=1

sex.msm <- msm(state ~ Obstime, subject = subj, data = DF,
obstype = 2,
gen.inits=init,
qmatrix = Q,
method = "BFGS",
opt.method = "optim",
covariates = ~ sex,
constraint=list(sex=c( 1, 1, 1, 1, 1, 1, 1,
-1, 2, 2, 2, 2, 2,
-1, -2, 3, 3, 3, 3, 3,
-1, -2, -3, 4, 4, 4, 4,
-1, -2, -3, -4, 5, 5, 5,
-1, -2, -3, -4, -5, 6, 6,
-2, -3, -4, -5, -6, 7,
-4, -5, -6, -7)),
control = list (fnscale=95161, trace = 2, REPORT = 1,maxit=15000, reltol=1e-8))
initial value 1.000000
final value 0.999298
converged
Used 33 function and 31 gradient evaluations

NB: Option 2 does work with no errors. My worries are as follows:

I have a factor variable called DMT with foour levels "DMT cat 1", "DMT cat 2", "DMT cat 3", and "DMT cat 4", where "DMT cat 4" is the reference level. This becomes an issue converting it to a numeric variables with values 1,2,3, 4, because the you cannot get separate hazards for level 1, 2, and 3 when using the constraint statement. However, using DMT as factor with 4 levels without the constraint statement produces hazards for 3 levels.

Can you sove this problem for me please? would be very much happy.

Thanks.

NA values to rtnorm

I noted something about NA values in your TODO file, but not specifically for this little quirk.

rtnorm(1, NA)

runs forever rather than complaining about an NA.

I couldn't work out why my code took all night to run and still made no progress - turns out I was indexing x[i] where i > length(x). Ideally msm would complain about missing values and exit.
Cheers!

HMM censoring, initprobs, and viterbi.msm pstates

This issue is related to #40. I'm using an HMM that requires censoring, known states at arbitrary times, and initial probabilities, all of which vary between two groups. I'm evaluating the output of viterbi.msm, in particular the vstate.* columns, and I find that changing initprobs changes result AIC but has no effect on pstate.V1 or pstate.V2. Perhaps I'm confused, but these results are contrary to my expectation based on reading help(msm).

For example, in the example below, I've changed initprob between two values (see arg commented with ##?) and compared results:

Full example w/two options for initprobs:

library(msm)
set.seed(8)

## SI model
pars <- within(list(), {
    npair = 50
    ## transition rate
    q.i = 0.05
    prob=list(S=0.2, I=0.8)
    nstate=length(prob)
    tmax=20
})

subjects <- expand.grid(
    pair = 1:pars$npair,
    group = factor(c('A','B'))
)

## simulate time to infection
subjects <- within(subjects, {
    id <- paste(pair, group, sep='.')
    infect <- rexp(nrow(subjects), pars$q.i)
})

## regular observation schedule
obs.time = seq(from=0, to=pars$tmax, by=5)

## input to msm: test results conditioned on infection status
panel <- lapply(1:nrow(subjects), function(ii) {
    x <- subjects[ii,]
    ## subject state at time t
    ## F = state 1, T = state 2
    state = (obs.time > x$infect)
    ## observation process conditioned on state
    obs = rbinom(length(state), 1, 
        ifelse(state, pars$prob$I, pars$prob$S)
    )
    data.frame(id=x$id, group=x$group, time=obs.time, state=state, obs=obs)
})
panel <- do.call(rbind, panel)

## censoring: 
## group A starts in known state 1
panel <- within(panel, {
    obs[(group=='A' & time==0)] <- -1
    ## group B starts in unknown state, not observed
    obs[(group=='B' & time==0)] <- -2
})

## initial conditions / observation model
config <- list(
    qmat = matrix(byrow=T, nrow=pars$nstate, c(
        ## S-I
        0, pars$q.i*1.2,
        0, 0
    )),
    hmod = list(
        S=hmmBinom(size=1, prob=0.1),
        I=hmmBinom(size=1, prob=0.9)
    )
)

result <- msm(data=panel,
    obs ~ time,
    subject=id,
    censor = c(-1, -2),
    ## group A: start in state 1 
    ## group B: uninformative starting state
    censor.states=list(c(1),c(1,2)),
    obstrue=ifelse(obs<0, 1, NA),
    ## initprobs are ignored?
    initprobs=c(0.9, 0.1),
    ##? AIC changes, but no difference in pstate
    #initprobs=c(0.1, 0.9),
    qmatrix=config$qmat,
    hmodel=config$hmod,
    fixedpars=T,
)

pred <- merge(
    panel, 
    as.data.frame(viterbi.msm(result)),
    by.x=c('id', 'time'), by.y=c('subject', 'time')
)

I've run the above with the two different values of initprobs shown above:

## test 1: initprobs=c(0.9, 0.1)
aa <- subset(pred, time==0)
## test 2: initprobs=c(0.1, 0.9) 
bb <- subset(pred, time==0)
## Returns TRUE
allequal(aa, bb)

My expectation is that initprobs should effect the "uniformative" censored initial observations of group B, but the output of viterbi.msm indicates no change in the predicted initial state residency probabilities of this group. In fact, pred$pstate.V1 in this example is identically 1 in all cases, which seems incorrect, especially for test 1 above.

Disagreement in MLE between self implementation and msm

My self-implemented optim() method is returning maximum likelihood estimates which disagree with msm(). See my code below.

Perhaps I'm misinterpreting the documentation, but shouldn't the likelihood be just composed of elements of the transition probability matrix P(t) (where P(t) = exp(Qt))?

library(dplyr)
library(msm)
library(expm)

msm.data <- tibble(time = c(0, 1, 2, 3, 4, 5), states = c(1, 2, 1, 2, 2, 2), subject_id = c(1, 1, 1, 1, 1, 1))

# create a function which will be used to maximise the likelihood
log_lik <- function(par){

  # define a = q_1 and b = q_2
  a <- par[1]
  b <- par[2]

  # define transition intensity matrix
  q <- rbind(
    c(-a, a), 
    c(b, -b)
  )
  
  # compute the transition probabilities
 # by taking the matrix exponential 
  p <- expm(q)
  
  # trasition: 1 -> 2
  L1 <- p[1, 2]

  # trasition: 2 -> 1
  L2 <- p[2, 1]

  # trasition: 1 -> 2
  L3 <- p[1, 2]

  # trasition: 2 -> 2
  L4 <- p[2, 2]

  # trasition: 2 -> 2
  L5 <- p[2, 2]

  L <- log(L1)+log(L2)+log(L3)+log(L4)+log(L5)

  # use the negative since we want the maximum
  return(-2*L)
}
# estimate using self defined likelihood
optim.fit <- optim(par = c(0.1, 0.1), fn = log_lik, method = "BFGS", control = list(reltol = 1e-8, maxit = 100000))

# define the initial transition intensity matrix
q <- rbind(c(0, 0.1),
           c(0.1, 0))

msm.fit <- msm(states ~ time, exacttimes = T, data = msm.data, qmatrix = q, control = list(reltol = 1e-8, maxit = 100000))


optim.fit
msm.fit

Error in Pmat[x[1], x[2]] : out of bounds

Dear Dr. Jackson,

I am currently trying to fit a model using the msm() command and stand stumbled upon an error:

Error in Pmat[x[1], x[2]] : out of bounds

I fitted a similar model on a previous dataset and it was working fine. I searched for the error message in the msm() code but without success.

My code looks like this :

msm_model <- msm(data = database,
                 subject = id,
                 formula = state ~ age, 
                 covariates = list("1-2" = ~ sex + education,
                                   "1-3" = ~ sex + education + age,
                                   "1-4" = ~ sex + education,
                                   "2-1" = ~ sex + education,
                                   "2-3" = ~ sex + education + age,
                                   "2-4" = ~ sex + education,
                                   "3-4" = ~ sex + education),
                 qmatrix = trans_mat,
                 exacttimes = F,
                 deathexact = c(4),
                 control=list(fnscale = 37000),
                 gen.inits = T
                 )
msm_model

Any idea where the problem lies?

Thank for your help.

R msm package does not generate estimates

I am trying to use the MSM R-package to estimate a continuous-time hidden Markov model. I do not know why my code does not show the estimates and confidence intervals for the transition intensities and the hidden Markov model parameters when I print the msm object.

I am attaching the data file and the code I am using. I am brand new to HMM. Any help is sincerely appreciated!

Data: https://drive.google.com/file/d/1qirk7fpCmFvQ6seI3iXKJl-M9ppd50vr/view?usp=sharing

library (tidyLPA)
library (dplyr)

three.q<-rbind(c(0.3,0.2,0.1),c(0.25,0.5,0.25),c(0.1,0.5,0.25))

hmodel1<-list(hmmNorm(mean = 0.5, sd=0.5),hmmNorm(mean = 20.8, sd=25.1),hmmNorm(mean = 304.9, sd=203.3))

msm<-msm(Y~Day, subject=ID,data=HMMDatav0,qmatrix=three.q,hmodel =hmodel1, 
         hcovariates = list (~X, ~X, ~X), method="CG", 
         fixedpars=TRUE, control=list(fnscale=4000))

msm

The results only show the value of -2 * log-likelihood. Are there any issues with my code, or the properties of my data?

Expose likelihood function

Currently the only way to obtain the likelihood is through the use of logLik.msm, which only extracts the maximum value.
Would it be possible to add a way to calculate the likelihood for a given set of parameters and a given set of data?

pearson.msm incorrect number of dimensions

pearson.msm(sp.msm, timegroups=2, transitions=c(1,2,3,4,5,6))
Imputing sampling times after deaths...
Error in dist[1, ] : incorrect number of dimensions

Viterbi pstate calculations for HMM with initprobs

I'm a fairly new user to msm package but I've spent the past few days trying to understand the posterior probability (pstate) output from viterbi.msm(). I've gone through the mathematics of the Viterbi algorithm. I expect the posterior probability of the first observation per subject to depend on the probability of that outcome as long as initprobs argument is provided to msm and not equal to c(1,0,...,0).

Here is a toy example showing posterior probabilities (pstate values) from viterbi.msm().

set.seed(123)

sim <- data.frame(time = rep(1:5,50), Id = rep(1:50,each=5))

initialstates <- sample(size = 50, 1:2, prob=c(.5,.5), replace=TRUE)
T <- matrix(c(.8,.1,.2,.9), nrow=2)
states <- matrix(0, nrow=50, ncol = 5)

for(i in 1:50){
  states[i,1] <-  initialstates[i]
  for(j in 2:5){
    states[i,j] <-  sample(size=1,1:2,prob=T[states[i,j-1],],replace=TRUE)
  }
}
  
sim$states <- as.vector(t(states))
m <- c(0,10)
sim$y <- rnorm(50*5,mean = m[sim$states], sd = 1)


hmodel1 <-  list (hmmNorm(mean = 1, sd = 1), hmmNorm(mean= 11, sd = 2))

mod1 <- msm(y ~ time, subject = Id, data = sim, qmatrix = rbind( c(0, 0.01), c(0.01,0)),hmodel=hmodel1)
viterbi.msm(mod1) %>% filter(time == 1) # What you expect given constrained to start in state 1
viterbi.msm(mod1) %>% filter(time == 2) # What you expect based on outcome values



mod2 <- msm(y ~ time, subject = Id, data = sim, qmatrix = rbind( c(0, 0.01), c(0.01,0)),hmodel=hmodel1,initprobs=c(.1,.9))
viterbi.msm(mod2) %>% filter(time == 1) # Issue: Posterior probability should involve the probability of the first outcome value
viterbi.msm(mod2) %>% filter(time == 2) # What you expect based on outcome values

I'm not an expert in C, but I believe there is a bug in the Viterbi function in the lik.c file in the pfwd calculation when i=0 at the beginning of the function and when obs = 0 at the end of the Viterbi() function. The outcome value is calculated from GetCensored but there is no call toGetOutcomeProb to be incorporated into lvold where the comments say use initprobs.

Saving msm model without underlying data

Is there a way to save a trained msm model without underlying data which was used for the training?

P.S. Thank you for your quick reply to my last question. :)

simmulti.msm fails with hmmMV in hmodel

I'm preparing an MWE using hmmMV for issue #46, and I get an error when I use hmmMV in simmulti.msm(..., hmodel=list()). I have a working example of msm using real data and a formulation of hmodel similar to what's shown below. I've looked over help("simmulti.msm"), and I don't see any specific mention of hmmMV. Is this use-case supported? Or perhaps I'm making a silly mistake here?

library(msm)
set.seed(8)

pars <- within(list(), {
    npair = 20
    deltat=5
    tmax=20
    ## transition rate
    q = 0.05
})

qmat <- rbind(
    c(0, pars$q),
    c(0, 0)
)

## data input to simmulti.msm
sim.grid <- with(pars,
    expand.grid(time=seq(0,tmax,by=deltat), subject=1:npair)
)

sim.dat <- simmulti.msm(sim.grid, qmat,
    hmodel=list(
        S=hmmMV(
            A=hmmBinom(size=1, prob=0.1),
            B=hmmBinom(size=1, prob=0.2)
        ),
        I=hmmMV(
            ## sim uses larger size here
            A=hmmBinom(size=2, prob=0.8),
            B=hmmBinom(size=1, prob=0.6)
        )
    )
)

Result:

Error in if (!(hmodel[[i]] == identity && (length(hmodel[[i]]) ==  :
  missing value where TRUE/FALSE needed

Combining multiple plot.prevalence.msm plots

Hi,

First of all - great package and page. I'm relatively new to R, and was trying to combine several prevalence plots in 1 plot (depending on subset and covariate value), but am having a hard time changing the plot.prevalence.msm() function to allow this, as I don't have the underlying code for observed.msm() and expected.msm(). I was thinking, is there an easier way of combining the plots, for example similar to how this is done in ggplot2 ?

Just to make it more concrete, I am using a two-state model without absorbing state. I have specified pci cut-offs and 1 covariate with 6 levels, let's call the covariate district. I would like to plot the observed and expected prevalence over time per district.

The msm object I specified accordingly:

twoway.q <- rbind(
  c(-0.24, 0.24), 
  c(0.1, 0.1))

districts.msm.pci <- msm(state ~ td1jan, 
                       subject = id,
                       data = helius.inc, 
                       qmatrix = twoway.q, 
                       obstyp = 1, 
                       pci=c(0.5,0.75,1),
                       covariates = ~ district)

The plot for one of the districts would be:

plot.prevalence.msm(districts.msm.pci, mintime=0, maxtime=2,
                      xlab="Years since 1 jan 2020", ylab="Prevalence (%)",
                      subset = id.districtS,
                      covariates = list(district="South"))

Thanks a lot in advance!

Specifying covariates for qmatrix.msm and ppass.msm in models with piecewise constant intensities

Hello,

I am running into a strange problem when fitting models with a time constant binary covariate as well as piecewise constant intensities. When I set the value of the covariate, e.g., in qmatrix.msm, the intensities for some transitions can possibly be either both greater or less than the intensity for the mean value of the covariate. This results in very strange estimates of first passage probabilities and other quantities. I think I must be doing something wrong in my function calls?

As an minimal working example, I've modified the sex-adjusted cav model from the msm manual and inserted a changepoint at 10 years. The transition intensities for males and females are both lower than the transition intensity at the mean value of sex, as are the first passage probabilities.

~~Edit: Also applies to total length of stay.~~

Thanks!
Jon

library(msm)

# intensity matrix from the msm manual
Q <- rbind ( c(0, 0.25, 0, 0.25),
             c(0.166, 0, 0.166, 0.166),
             c(0, 0.25, 0, 0.25),
             c(0, 0, 0, 0))

# model with sex + piecewise constant intensities and a changepoint at 10 years
cavsex.msm <- msm( state ~ years, subject=PTNUM, data = cav,
                   qmatrix = Q, deathexact = 4, covariates = ~ sex, pci = 10)

# get rate matrices - transition intensity from 2-3 at the mean value of sex is greater than when sex = 0 or 1
qmatrix.msm(cavsex.msm)
qmatrix.msm(cavsex.msm, covariates = list(sex = 0))
qmatrix.msm(cavsex.msm, covariates = list(sex = 1))

# similarly, the first passage probability for 2-3 is much higher when using the average value of sex
ppass.msm(cavsex.msm)
ppass.msm(cavsex.msm, covariates = list(sex = 0), tot = 10)
ppass.msm(cavsex.msm, covariates = list(sex = 1), tot = 10)

Missing digits=digits in call to print

The call to print.msm.est contains the code print(unclassify(X)), which is missing the argument digits=digits.

There may be other instances where this argument is missing; this is the case that is affecting me.

error in print method for hmmMV

Creating a hmmMVdist object to pass to the hmodel argument in msm for a multivariate HMM will return an error if printed in the console. Looks like an issue with print.hmmdist. For example:

 hmmMVobj <- hmmMV(hmmNorm(mean=1, sd=3),  hmmNorm(mean=0, sd=1) )
 hmmMVobj

##>Hidden Markov model distribution
##>
##>Error in if (x$label == "categorical") paste("P(", seq(x$pars[1]), ")",  : 
##>  argument is of length zero

Cheers

Allow functions in state ~ time formula

Currently as.numeric(state) ~ as.numeric(time) doesn't work in the "formula" argument to msm(). Should either support this or give a comprehensible error message.

log(hazard ratios) and standard errors?

Hello!

I am using the msm package to model a population of hospitalized patients whose outcomes are either death or discharge. I have a covariate x for which the msm output provides the hazard ratio and 95% CI. Is there a way to obtain the standard error for the hazard ratio?
What I am really looking for is, Is there a way to obtain the log(hazard ratio) and the standard error for the log(hazard ratio) ?
Thanks for your help!!

Using inverse probability weight in MSM models

Hi Dr.Jackson,

I wonder if it's possible to use inverse probability weight in "msm"? If yes, how? There is no "weight" argument in msm, right? I am thinking of using inverse probability weight to deal with missing and dependent censoring.

I appreciate any help you can offer!

Comparing a multivariate HMM and univariate HMM for binary classification

I have trained three HMMs to do binary classification, all with the same data. Now, one HMM is multivariate (uses two variables of the same data) and I have trained it only with data from one class. For data from this same class, I have trained a univariate HMM (using one variable of the two taken in the multivariate HMM). In addition, for the other class, I have used a univariate HMM only.

Now, I want to compare whether the final classifier (which is formed by two HMMs, one for each class) using the multivariate HMM and the other univariate HMM is better than using the two univariate HMMs.

The issue is that for the multivariate HMM I get much higher likelihoods than for the univariate HMMs in the test set, so it seems that you cannot compare in such a simple way a multivariate HMM and a univariate HMM.

So, I would like to know exactly how the msm package calculates the likelihood in the multivariate HMMs, in case I need to apply any changes to the likelihood to compare it with the univariate HMMs.

HR p-values

Hi,

is there a way to compute p-values associated with hazard ratios of a model? (do HRs differ from 1?)

Best regards

Add electronic health records to panel data?

Hi Dr. Jackson,

When we model panel data with "MSM" and assume arbitrary observation times (exacttimes=F), is it worth adding available electronic health records to the panel data? For instance, adding national electronic health records and information from certificates of cause of death to cohort data collected regularly to identify as many disease cases as possible and also "impute" missing disease states due to dropout and death?

If it is beneficial to add those electronic health records, should they be added to the data as additional observations in practice?

Thanks!

Xin

Convert hmodel.object so it can be passed to hmodel argument in msm

For HMMs, is there a function that takes a hmodel.object (or a msm.object) and returns a list of hmmdist objects that can be passed to the hmodel argument in a new msm() call? For instance, it could be used in the following way:

# create first HMM
fit1 <- msm( y ~ time, subject = ptnum, data = example.df, 
                    qmatrix = rbind( c(0, 0.25, 0), c(0, 0, 0.2), c(0, 0, 0)),
                    hmodel = list (hmmNorm(mean=90, sd=8), hmmNorm(mean=70, sd=8), hmmIdent(-9)) 
           )

# extract hmodel.object from fit1
hmodel.fit1 <- hmodel.msm(fit1$hmodel)

# create second HMM with initial emission distribution paramaters from fit1
fit2 <- msm(y ~ time, subject = ptnum, data = example.df, 
                    qmatrix = rbind( c(0, 0.25, 0.1), c(0.1, 0, 0.2), c(0, 0, 0)),
                    hmodel = hmodel.fit1
           )

I've built a function that does this for a very specific multivariate categorical HMM but it would be useful to have a generic function that does this for any given hmodel.object.

Has this been considered? If not, could you provide any pointers on how best this might be achieved?

Cheers

Error with "impossible for given initial state probabilities and outcome model"

Hello Dr. Jackson,

Thank you very much for your excellent work for MSM; I'm a new user of this package and I have faced an issue that I didn't know how to solve it. I wanted to fit a misclassification model with msm of the data about a progressive disease.
However, it resulted in a few errors, for example, "initial value in 'vmmin' is not finite" or "Optimisation has probably not converged to the maximum likelihood - Hessian is not positive definite."

Please could you advise?

Many thanks

R-data and code.zip

Error in msm.form.dq(qmodel, qcmodel, pars, p, mm.cov) : reached CPU time limit

Any suggestions on how to fix this or why this is happening in the first place would be appreciated. Sometimes the models converge but then I get this error and the object fails to save.

I've tried:
setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
setSessionLimit(cpu = Inf, elapsed = Inf, transient = FALSE)

and neither of them fix the error.

initial value in 'vmmin' is not finite

Hi,

With the following data I get the error 'initial value in 'vmmin' is not finite' and with the following Q of allowed transitions.

Q <- rbind(c(0,1,0,1),
c(0,0,1,1),
c(0,1,0,1),
c(0,0,0,0))
partdata.zip

With a less restrictive Q I get the 'liklihood overflow error' however a less restrictive model isn't clinicly accurate.

Please could you advise?

Many thanks

Error in if (gen.inits) { : argument is not interpretable as logical with hmmMV

Hello,

I am trying to fit a model with two continuous gaussian outcomes from a HMM. I am following the Example 2 in help(hmmMV) but replacing hmmBinom by hmmNorm.

Here is the code and the data file for which I get this error message:

In addition: Warning message:
In if (gen.inits) { :
  the condition has length > 1 and only the first element will be used

Thank you in advance for any insight!

read.table(file = "df_example_msm.txt", sep ="\t")
head(df)
#           X1          X2      time X4 ID
# 1 0.00042458 -0.04983110 20.018889  C  1
# 2 0.00060557  0.01430985 20.027222  C  1
# 3 0.00181100  0.00416775 20.275000  C  1
# 4 0.01989629 -0.00678782  6.779167  C  2
# 5 0.01989629 -0.00678782  7.413056  C  2
# 6 0.02091527 -0.01099738  7.443333  C  2

df$obs <- cbind(obs1=df$X1, obs2=df$X2)
k=2 #nb. of states
q <- matrix(1, ncol=k, nrow=k)     #specifying allowed transitions. Also if gen.inits=T, used for initial value (excep for diagonal values)
m1=mean(df$X1,na.rm=T)
s1=sd(df$X1,na.rm=T)
m2=mean(df$X2,na.rm=T)
s2=sd(df$X2,na.rm=T)

mod <- msm(obs ~ time, subject=ID, data=df, qmatrix=q,
           hmodel <- list(hmmMV(hmmNorm(mean=m1, sd=s1),hmmNorm(mean=m2, sd=s2)),
                          hmmMV(hmmNorm(mean=m1, sd=s1),hmmNorm(mean=m2, sd=s2))),
           control=list(maxit=10000))

df_example_msm.txt

Predicting HMM state probabilities with new data

Say we have an hmm, fitted using msm on olddata:

fittedhmm <- msm(response ~ time, subject = subject, data = olddata, ...)

I'd like to be able to run the Viterbi algorithm with the fittedhmm model, but to predict state probabilities for a new dataset, newdata, rather than olddata. There's two ways I can think of doing this.

Method 1

Fit a hmm model with msm():

fittedhmm <- msm(response ~ time, subject = subject, data = olddata, ...)

Create a dummy, unfitted msm object whose initial parameters are taken from fittedhmm but are NOT updated to maximise likelihood wrt newdata:

dummyhmm <- msm(response ~ time, subject = suject, data = newdata, 
                                  hmodel = hlist(fittedhmm),
                                  qmatrix = qmatrix(fittedhmm), 
                                  fit = FALSE, 
                                  ...)

where hlist() takes the hmodel.object from fittedhmm and converts it to the correct format (see #17 (comment)). The fit=FALSE is the way to prevent the optimsation routine from running, so the model keeps the initial values from fittedhmm. Then simply pass this object to viterbi.msm():

newpredictions <- viterbi.msm(dummyhmm)

Method 2

This is neater. Modify viterbi.msm() directly so that it has a data argument, to be used like this:

newpredictions <- viterbi.msm(x = fittedmsm, data = newdata)

Looking at the viterbi.msm() internals, this would mean creating an object from newdata that is the same as the output of msm:::expand.data() (which is then passed to Ccall.msm(params, do.what = "viterbi", ...) I'm not sure how easy this will be.

Fudge

A simple fudge is to take create a dummy hmm that will convert newdata as appropriate (but the fitted parameters are ignored) and run the appropriate parts of viterbi.msm() with fittedhmm and the dummy hmm, like this:

library('msm')

## Simulate data from a Markov model
nsubj <- 30; nobspt <- 5
sim.df <- data.frame(subject = rep(1:nsubj, each=nobspt),
                     time = seq(0, 20, length=nobspt))
two.q <- rbind(c(-0.1, 0.1), c(0, 0))

set.seed(123)
dat <- simmulti.msm(sim.df[,1:2], qmatrix=two.q, drop.absorb=FALSE)
dat$obs1[dat$state==1] <- rbinom(sum(dat$state==1), 40, 0.1)
dat$obs1[dat$state==2] <- rbinom(sum(dat$state==2), 40, 0.5)

newdat <- simmulti.msm(sim.df[,1:2], qmatrix=two.q, drop.absorb=FALSE)
newdat$obs1[newdat$state==1] <- rbinom(sum(newdat$state==1), 40, 0.1)
newdat$obs1[newdat$state==2] <- rbinom(sum(newdat$state==2), 40, 0.5)
newdat<-subset(newdat, subject<16) #just to check it works on different sized data

## Fitted model
fittedmsm <- msm(obs1 ~ time, subject=subject, data=dat, qmatrix=two.q,
                 hmodel = list(hmmBinom(size=40, prob=0.2),
                                hmmBinom(size=40, prob=0.2))
                )

##Dummy model to process newdata in the right way (fitted parameters are ignored)
dummymsm <- msm(obs1 ~ time, subject=subject, data=newdat, qmatrix=two.q,
                hmodel = list(hmmBinom(size=40, prob=0.2),
                              hmmBinom(size=40, prob=0.2))
               )


# modified version of viterbi.msm()

viterbiout <- msm:::Ccall.msm(fittedmsm$paramdata$opt$par,
                              do.what = "viterbi",
                              msm:::expand.data(dummymsm),
                              fittedmsm$qmodel, fittedmsm$qcmodel, fittedmsm$cmodel, fittedmsm$hmodel, fittedmsm$paramdata
                             )

stateprobs <- data.frame(subject = dummymsm$data$mf$"(subject)",
                         time = dummymsm$data$mf$"(time)",
                         observed = dummymsm$data$mf$"(state)",
                         fitted = viterbiout[[1]]+1,
                         pstate = viterbiout[[2]]
                        )

head(stateprobs)

#>  subject time observed fitted     pstate.1  pstate.2
#>1       1    0        5      1 1.000000e+00 0.0000000
#>2       1    5       18      2 6.994981e-08 0.9999999
#>3       1   10       19      2 5.396904e-16 1.0000000
#>4       1   15       23      2 6.162963e-28 1.0000000
#>5       1   20       23      2 1.607795e-39 1.0000000
#>6       2    0        3      1 1.000000e+00 0.0000000

This example demonstrates the desired output but I'd obviously want to stripe away all the unneccesary parts for the conversion of newdata, rather than passing it through msm every time.

I'll probably try to put something together for method 2. If you have any advice before I get started I'd be very happy to hear it.

Cheers!

Issue with bootstrap: "state variable is character"?

Good morning,
when running this code:

`library("msm")

data_test <- read.delim("~/data_test.txt")
Q <- matrix(0.1, nrow = 12, ncol = 12)

model.msm1 <- msm( GROUP ~ TIME, subject = id, data = data_test,
qmatrix = Q, gen.inits=FALSE, control = list(fnscale = 500000000, maxit = 500), exacttimes=TRUE)

pmat <- pmatrix.msm(model.msm1, t=1, ci = "bootstrap" )
`

I am returned the following error: "Error in msm(formula = GROUP ~ TIME, subject = id, data = structure(list( :
state variable is character, should be numeric". The issue is specific to the bootstrap option and persists even if i try using the boots.msm function

the data i'm using is
https://github.com/LeoCi1907/myrep/blob/main/data_test.zip

Am I missing something obvious? The thing is, it has no issues with msm() as long as bootstrapping is not involved. Any support is appreciated.
Thank you

For HMM, viterbi pstate probabilities ignore obstrue

The following is based on my reading of the obstrue section of help("msm") and previous discussion of obstrue in Issues #40 and #41. Apologies in advance if I've misinterpreted "true state" here.

This simplified example looks at obstrue in a HMM without censoring. I expect obstrue to indicate the true hidden state here, and for this to be reflected in the pstate matrix from viterbi.msm. In this examlpe, I see that fitted always equals obstrue (when provided), but the relevant pstate seem to ignore the information in obstrue.

 library(msm)
set.seed(8)

pars <- within(list(), {
    npair = 20
    ## transition rate
    q = 0.05
    prob=list(S=0.2, I=0.8)
    deltat=5
    tmax=20
})

## Full example
sim.grid <- with(pars,
    expand.grid(time=seq(0,tmax,by=deltat), subject=1:npair)
)

## initial conditions / observation model
config <- list(
    qmat = rbind(
        c(0, pars$q),
        c(0, 0)
    ),
    hmod = list(
        S=hmmBinom(size=1, prob=0.2),
        I=hmmBinom(size=1, prob=0.8)
    )
)

sim.dat <- simmulti.msm(sim.grid, 
    config$qmat, drop.absorb=F,
    hmodel=list(
        S=hmmBinom(size=1, prob=0.2),
        ## sim uses larger size
        I=hmmBinom(size=2, prob=0.8)
    )
)

panel <- subset(sim.dat, select=c(subject, time, obs))
panel <- within(panel, {
    ## force state 2
    obstrue <- ifelse(obs==2, 2, NA)
    obs <- ifelse(obs>1, 1, 0)
})

result <- msm(data=panel,
    obs ~ time, subject=subject,
    obstrue=obstrue,
    qmatrix=config$qmat,
    hmodel=config$hmod,
    fixedpars=T
)

pred <- merge(
    panel,
    viterbi.msm(result),
    by=c('subject', 'time'), sort=F
)

## I expect pstate.2==1 when obstrue==2 
expect <- subset(pred, obstrue==2)

broom::tidy method for msm models

Hello,

thanks a lot for your package. Did you ever consider providing tidiers for msm models, allowing to easy use the results for producing graphs or table?

It would be great to have a tidy method for msm models. cf. https://broom.tidyverse.org/articles/external-tidiers.html

Ideally, there should be an option to precise that should be returned:

a data frame of all hazard ratios (one row per HR)
a data frame of all prevalence (one row per status and per time)

Markov property

Hello Dr. Jackson,
Thank you very much for your excellent work for MSM; I'm a new user of this package and I would like to know if the MSM package provides the function to check the Markov property. if not, how we can be sure that the Markov property holds? if there are other packages that can implement this function.

Interaction between time and covariate

Hello,

I have another question that we can't seem to solve ourselves, but are sure there is a way.. We have fitted a two-state time-inhomogeneous model with an absorbing state and would like to test for and model the interaction between our covariates and the time variable:

twoway.q <- rbind(
  c(-0.24, 0.24), 
  c(0, 0.1))

districts.msm.pci <- msm(state ~ td1jan, 
                       subject = id,
                       data = helius.inc, 
                       qmatrix = twoway.q, 
                       obstyp = 1, 
                       pci=c(0.5,0.75,1),
                       covariates = ~ district + sex + age)

Now, the manual (p.47) states that "this facility does not support interactions between time and other covariates. Such models need to be specified by hand, using a state variable with censored observations inserted. Note that the data component of the msm object returned from a call to msm with pci supplied contains the states with inserted censored observations and time period indicators. These can be used to construct such models".

From our districts.msm.pci object, I can see the created data frame in this list, including the (pci.imp) double and timeperiod factor. Now, how can I get from here to the interaction model? I tried using this data frame as the input for a new msm object, but am getting errors for the state variable, which now contains three states while my original model only had two. I tried using (obstrue) as the state variable instead while specifying the option obstyp==2, but then I get a lot of errors about the data being inconsistent with the transition matrix..

I have tried finding a documented example of a msm model with a similar interaction, but could not find one..

Thanks a lot in advance!

Problem with bootstrapping: Observations within subjects ... are not ordered by time

data_test.txt

Using the attached data I can successfully run an msm model:

Q <- rbind(c(0, 0, 0.25, 0, 0.25), c(0, 0, 0.25, 0, 0.25), c(0.125, 0.125, 0, 0.125, 0.125), c(0, 0, 0.25, 0, 0.25), c(0, 0, 0, 0, 0))

model1 <- msm(state ~ time, subject = id, data = data, qmatrix = Q, gen.inits = T)

But when I try to perform any function involving bootstrapping, I get the following error message:

Error in msm.check.times(mf$"(time)", mf$"(subject)", mf$"(state)") :
Observations within subjects 49, 111, 240 and others are not ordered by time

I've tried

msm.check.times(data$time, data$id, data$state)

using the function found on GitHub and there doesn't appear to be a problem with the data. I've also ordered the dataframe according to:

data <- data[order(data$id, data$time),]

This has really got me stuck. Any help greatly appreciated.

MSM with covariates :

Hello Dr. Jackson,
Thank you very much for your excellent work for MSM; I'm a new user of this package and I have faced an issue that I didn't know how to understand. I want to evaluate the effect of variates on partial transitions, such as,
I get the hazard ratios and confidence intervals of age on the first transition, as well as the age and gender on the second transition by code: covariates = list("1-2" = ~ age, "2-3" = ~ age + gender). However, when I add a new transition in the above code: covariates = list("1-2" = ~ age, "2-3" = ~ age + gender, "2-4"= ~ age + gender+education), I will obtain a new value of the hazard ratios and confidence intervals of age on transition "1-2" , and age, gender on transition "2-3". How should I understand the difference between the two values? If I only want to examine the effect of age, gender on transition "2-3", "2-4", Can I ignore the effect of variables on the transition "1-2", code like this: covariates = list( "2-3" = ~ age + gender+education, "2-4"= ~ age + gender+education)?

BLAS/LAPACK routine 'DGEBAL' gave error code -3

Error in balance(baP$z, "S") :
BLAS/LAPACK routine 'DGEBAL' gave error code -3
Calls: pmatrix.msm ... MatrixExp -> -> expm.Higham08 -> balance
Execution halted

I got the following error when I tried to use pmatrix.msm() on a model that I fit with specified values for all covariates. An initial sweep on Google didn't return too many results. I'm not sure exactly what the error means or how to avoid it.

Many thanks in advance for all your help,

Matt

Is there a way to constrained the effect of a covariate to be the multiplication of 2 others HR?

Hello,

A situation where there are 3 possible statuses (A, B and C) which are ordered: to go from A to C, you need to go through B.
We know exact transition time (i.e. obstype = 2).
Several individuals have transitioned from A to C at the same time (but conceptually, we can simply assume that they transitioned from A to B and from B to C at the same moment).

If I allow only A->B and B->C transitions in qmatrix, I'm ending up with infinite values during the optimisation process and the model is not able to converge.

An option would be therefore to allow A->B, B->C and A->C transitions in the model. In the model, I have a covariate (for example Age). I would like to constrain the associated hazard ratio for "A->C" transition or HR(A->C) to be equal to the multiplication of HR(A->B) by HR(B->C), i.e. to consider that the effect of age on transitioning from A to C is equal to the cumulative effect of age when transitioning from A to B and then from B to C.

Is there a way to specify such constraints in msm?

In all cases, thanks in advance for your help.

continuous-time vs. discrete-time models

Hi Dr.Jackson,

I read your answer to the previous issue, "numerical overflow in calculating likelihood #5". If I understand correctly, if we specify "exacttimes=T", we fit discrete-time models. I wonder, in practice, when a continuous-time model should be used and when a discrete-time model should be used? Should the time interval between two observations not be too long to be able to fit a continuous-time model?

Thanks a lot!

Xin

Perason test broken by qcut

Hello Dr. Jackson,

Thank you very much for crafting MSM; I've been reading your papers and using this package for a new project and really appreciate all of your work. Today I've run into a funky issue.

On an earlier version of my data which did not have follow-up for everyone, the Pearson test ran without issue. Today I loaded up my new data and it reported

Error in pearson.msm(mc_msm, timegroups = 2) : Remove any absorbing-absorbing transitions from data and refit model

I understood the error; a few individuals who had died at timepoint=3 were included with timepoint=4 observations, but again simply logged as having died. I removed them and re-ran the model fitting procedure without issue. However, when I attempted to rerun the Pearson-like test pearson.msm(mc_msm, timegroups=3) I received the following error:

Error in seq.default(0, 1, 1/n) : argument "n" is missing, with no default

It works for timegroups=2, but not 3. I spent a few hours debugging and I believe I have figured out the problem: in removing the absorbing observations I have shifted the distribution of the observation times (since all patients have timepoints 1 through 4). In doing so qcut(md$time, 3) in the Pearson code goes from returning a vector with levels of 1, 2, 3 to a vector with levels 1, (1, 2], [2, 3). In doing so the rest of the grouping is broken because a the group defined by (1, 2) is empty.

Having discovered the etiology I've no clue how to fix it without changing the underlying code. I was curious if you have ever encountered this before (and could suggest a remedy I may have missed, e.g. maybe my model is just being fit wrong). If not, the only solution I can think of would be to allow the Pearson function to allow as input a manual set of quantiles. I don't think this would be hard to do and if that is the only route I'll submit a pull request and try and do it.

Cheers,
Chris

Change reference value of covariates that are factor type

Good night,

How can I change the reference value it takes from the covariates that are factors?

MSM with covariate: Error in omit | (nobspt == 1) : non-conformable arrays

Hello,

I am trying to fit a multistate model with a covariate and am getting the following error:

Error in omit | (nobspt == 1) : non-conformable arrays

Not really sure what in my model setup would lead msm to enter this function, but the error appears to be thrown on line 25 of na.find.msmdata and arises from the omit object being a matrix, in my case of dimension 15k x 1, and nobspt being a table. Perhaps this is an R 4.0+ issue and these two objects were previously conformable?

Thanks,
Jon

Never pass a tibble to msm2Surv...

Hi! Thanks for this wonderful package. Im just writing in to say I spent the better part of the last three days attempting to use msm2Surv while passing a tibble as the data argument. This results in the from and to fields of this line:

## Data frame of observed events ev <- data.frame(id=data[!lpt,subject], from=data[!lpt,state],to=data[!fpt,state], Tstart=data[!lpt,time], Tstop=data[!fpt,time], time=data[!fpt,time]-data[!lpt,time], status=rep(1,nev))

Overwriting an unused state field that ultimatly gets lost as well. This results in everything under that line breaking.

Now, this is not a bug, a dataframe is requested by the function and thats what one should pass it. But most of us I believe use tibbles and data.frames as almost interchangeable: Not so!

ALSO: your follow up time should be ordered before feeding it to this function.

Thanks again!

HMM censoring and obstrue

I'm working on an infectious disease project that uses a HMM and uses censoring and known states at arbitrary times. I'm struggling to understand how obstrue works in this case. After looking closely at help(msm) for argument obstrue, my impression was that A) I needed obstrue==1 for each censored observation, and that B) I could force a censored observation into a given state by providing a single state in the appropriate entry of censor.states. My experience with msm suggests that I'm misinterpreting something.

I've found that I can use censoring to force a given state if I:

Do not provide obstrue
include a vector of length>1 in censor.states, e.g., censor.states=list(known=c(1,1), other=c(1,2))

Notably, censor.states=list(known=c(1), other=c(1,2)) does not behave as expected / as above (but still overrides provided initprobs, though that's a separate issue).

Below is a full example with four possible behaviors (see args commented with ##?): two different censor.states, each with and without obstrue. I'm struggling to understand the behavior of each option, and which is appropriate for my use-case.

Any advice would be appreciated!

library(msm)
library(data.table)
set.seed(8)

## msm docs: 
# * obstrue=1 required for censor effects (add note to censor?)
# * censor.state list: length 1 fails without error

## SI model
pars <- within(list(), {
    npair = 50
    q.i = 0.05
    prob=list(S=0.2, I=0.8)
    nstate=length(prob)
    tmax=20
})

subjects <- expand.grid(
    pair = 1:pars$npair,
    group = factor(c('A','B'))
)

## simulate time to infection
subjects <- within(subjects, {
    id <- paste(pair, group, sep='.')
    infect <- rexp(nrow(subjects), pars$q.i)
})
subjects <- data.table(subjects)

## regular observation schedule
obs.time = seq(from=0, to=pars$tmax, by=5)
#obs.time = seq(from=0, to=max(subjects$infect)+10, by=5)

## input to msm: test results conditioned on infection status
panel <- subjects[, by=.(id, group), j={
    time = obs.time
    ## F = S, T = I
    state = (obs.time > infect)
    ## observation process conditioned on state
    obs = rbinom(length(state), 1, 
        ifelse(state, pars$prob$I, pars$prob$S)
    )
    .(time, state, obs)
}]

## censoring: 
## group A starts in known state 1
panel[(group=='A' & time==0)]$obs <- -1
## group B starts in unknown state
panel[(group=='B' & time==0)]$obs <- -2
## Both have unknown observattion 
panel[(time==10)]$obs <- -2

## initial conditions
config <- list(
    qmat = matrix(byrow=T, nrow=.nstate, c(
        ## S-I
        0, pars$q.i*1.2,
        0, 0
    )),
    ## observation model 
    hmod = list(
        S=hmmBinom(size=1, prob=0.1),
        I=hmmBinom(size=1, prob=0.9)
    )
)

result <- msm(data=panel,
    obs ~ time,
    subject=id,
    censor = c(-1, -2),
    ## Informative group-A init 
    ## works as intended
    censor.states=list(c(1,1),c(1,2)),
    ##? behavior differs from above
    # censor.states=list(c(1),c(1,2)),
    ##? how does this work?
    # obstrue = ifelse(obs<0, 1, NA),
    initprobs=c(0.6, 0.4),
    qmatrix=config$qmat,
    hmodel=config$hmod,
    fixedpars=T
)

pred <- merge(
    panel, 
    as.data.frame(viterbi.msm(result)),
    by.x=c('id', 'time'), by.y=c('subject', 'time')
)

numerical overflow in calculating likelihood

sppb.msm <-msm(state_1 ~ days, subject = symbol, data = data,

           qmatrix = qmat0, death = 3, method = "BFGS",

           control = list(fnscale = 4000, maxit = 10000),

```
           covariates = ~ PB)
```

Error in Ccall.msm(params, do.what = "lik", ...) :
numerical overflow in calculating likelihood

numerical overflow in calculating likelihood

I want to raise this issue once again, since there was no answer

> Q <- rbind(c(0, 0.25, 0, 0.25),
           c(0.166, 0, 0.166, 0.166),
           c(0, 0.25, 0, 0.25),
           c(0, 0, 0, 0))
> cavsex.msm <- msm(state ~ years, subject=PTNUM, data=cav, qmatrix=Q, covariates = ~age)

Error in Ccall.msm(params, do.what = "lik", ...) : 
  numerical overflow in calculating likelihood

Why is this not working?

Interpretation of transition intensity

Hi Dr.Jackson,

I've been reading your articles about the "msm" package and I understand that the transition intensity is similar to the concept hazard in the survival analysis framework. However, I am not entirely sure, if the transition intensity can be interpreted as cause-specific hazard, as opposed to subdistribution hazard, in the competing risk framework.

Could you please explain a bit more how the transition intensity can be interpreted and how it corresponds to hazard in the survival analysis framework?

Thanks a lot!

Xin

chjackson / msm Goto Github PK

msm's Issues

sugested tag -> questions

Recommend Projects

Recommend Topics

Recommend Org