chjackson / msm Goto Github PK
View Code? Open in Web Editor NEWThe msm R package for continuous-time multi-state modelling of panel data
Home Page: https://chjackson.github.io/msm/
The msm R package for continuous-time multi-state modelling of panel data
Home Page: https://chjackson.github.io/msm/
Hi,
We planed to use msm package to infer anal HPV infection and clearance rates from panel data coming from different study/center.
The idea is to conduct a kind of meta-analysis of infection and clearance rates.
Centers are very heterogeneous in term of number of patients and follow-up duration.
Is there a smart way to handle such data within msm framework?
Would you sugest to add a 'center' covariate in the model?
Would it be better to analyse centers independently and conduct a classical rate meta-analysis in a second time?
Thanks a lot for this excellent package and the kind assistance you provide.
With best regards,
Damien
Hey! I'm not sure this is supposed to happen:
library(msm)
Q <- rbind (c(0, 0.25, 0, 0.25),
c(0.166, 0, 0.166, 0.166),
c(0, 0.25, 0, 0.25),
c(0, 0, 0, 0))
cav.msm <- msm::msm(state ~ years,
subject = PTNUM,
covariates = ~sex,
data = cav,
qmatrix = Q,
deathexact = 4)
msm::ematrix.msm(cav.msm)
#> NULL
Created on 2021-09-13 by the reprex package (v2.0.1)
There are .o
and .so
binary build artifacts present in the source tree that seem to be from the initial import of the CRAN sources into this repository.
These files prevent R CMD INSTALL
from building the library with the system compiler and libraries and fails the install sanity checks in mysterious ways on machines sufficiently different from the machine the binaries were built on.
The following code produces an error:
lapply(c(1,2,3), function(y) hmmNorm(mean = y, sd = 0.5))
This sort of construction would be quite common when using the msm package with a variable number of states.
I would like to find Transition Probability Matrix (TPM) analytically from the generator matrix. It seems it does not work, it seems it computes TPM for t=1.
GG= matrix(c(-2,1,1,1,-2,1,1,1,-2),nrow=3,ncol=3,byrow = T)
MatrixExp(GG,method ="analytic")
It does not give me the analytical form based on exponential function, like Mathematica. Please take a look at the Mathematica example http://reference.wolfram.com/language/ref/ContinuousMarkovProcess.html.
Hi Chris,
Thanks you so much for assisting me with setting up the constraint matrix. But I have a new problem on this matrix.
sex is a covariate in my mode.
I have 8 levels in the Q-matrix, some tranisitions not allowed are 2-8, 7-1, 8-1, 8-2, 8-3.
Option 1: Sex is a factor variable coded "Female", "Male"
sex.msm <- msm(state ~ Obstime, subject = subj, data = DF,
obstype = 2,
gen.inits=init,
qmatrix = Q,
method = "BFGS",
opt.method = "optim",
covariates = ~ sex,
constraint=list(sex=c( 1, 1, 1, 1, 1, 1, 1,
-1, 2, 2, 2, 2, 2,
-1, -2, 3, 3, 3, 3, 3,
-1, -2, -3, 4, 4, 4, 4,
-1, -2, -3, -4, 5, 5, 5,
-1, -2, -3, -4, -5, 6, 6,
-2, -3, -4, -5, -6, 7,
-4, -5, -6, -7)),
control = list (fnscale=95161, trace = 2, REPORT = 1,maxit=15000, reltol=1e-8))
Error in msm.check.constraint(constraint, mm) :
Covariate "sex" in constraint statement not in model.
In addition: Warning message:
In if (gen.inits) { :
the condition has length > 1 and only the first element will be used
**Option 2 : Sex is a numeric variable having values 0, 1. Sex is not Not a factor No errors whatsoever.
DF$sex <-as.numeric(DF$sex)
DF$sex[DF$sex==1]=0
DF$sex[DF$sex==2]=1
sex.msm <- msm(state ~ Obstime, subject = subj, data = DF,
obstype = 2,
gen.inits=init,
qmatrix = Q,
method = "BFGS",
opt.method = "optim",
covariates = ~ sex,
constraint=list(sex=c( 1, 1, 1, 1, 1, 1, 1,
-1, 2, 2, 2, 2, 2,
-1, -2, 3, 3, 3, 3, 3,
-1, -2, -3, 4, 4, 4, 4,
-1, -2, -3, -4, 5, 5, 5,
-1, -2, -3, -4, -5, 6, 6,
-2, -3, -4, -5, -6, 7,
-4, -5, -6, -7)),
control = list (fnscale=95161, trace = 2, REPORT = 1,maxit=15000, reltol=1e-8))
initial value 1.000000
final value 0.999298
converged
Used 33 function and 31 gradient evaluations
NB: Option 2 does work with no errors. My worries are as follows:
I have a factor variable called DMT with foour levels "DMT cat 1", "DMT cat 2", "DMT cat 3", and "DMT cat 4", where "DMT cat 4" is the reference level. This becomes an issue converting it to a numeric variables with values 1,2,3, 4, because the you cannot get separate hazards for level 1, 2, and 3 when using the constraint statement. However, using DMT as factor with 4 levels without the constraint statement produces hazards for 3 levels.
Can you sove this problem for me please? would be very much happy.
Thanks.
I noted something about NA
values in your TODO file, but not specifically for this little quirk.
rtnorm(1, NA)
runs forever rather than complaining about an NA
.
I couldn't work out why my code took all night to run and still made no progress - turns out I was indexing x[i]
where i > length(x)
. Ideally msm
would complain about missing values and exit.
Cheers!
This issue is related to #40. I'm using an HMM that requires censoring, known states at arbitrary times, and initial probabilities, all of which vary between two groups. I'm evaluating the output of viterbi.msm, in particular the vstate.*
columns, and I find that changing initprobs
changes result AIC but has no effect on pstate.V1
or pstate.V2
. Perhaps I'm confused, but these results are contrary to my expectation based on reading help(msm)
.
For example, in the example below, I've changed initprob
between two values (see arg commented with ##?
) and compared results:
Full example w/two options for initprobs:
library(msm)
set.seed(8)
## SI model
pars <- within(list(), {
npair = 50
## transition rate
q.i = 0.05
prob=list(S=0.2, I=0.8)
nstate=length(prob)
tmax=20
})
subjects <- expand.grid(
pair = 1:pars$npair,
group = factor(c('A','B'))
)
## simulate time to infection
subjects <- within(subjects, {
id <- paste(pair, group, sep='.')
infect <- rexp(nrow(subjects), pars$q.i)
})
## regular observation schedule
obs.time = seq(from=0, to=pars$tmax, by=5)
## input to msm: test results conditioned on infection status
panel <- lapply(1:nrow(subjects), function(ii) {
x <- subjects[ii,]
## subject state at time t
## F = state 1, T = state 2
state = (obs.time > x$infect)
## observation process conditioned on state
obs = rbinom(length(state), 1,
ifelse(state, pars$prob$I, pars$prob$S)
)
data.frame(id=x$id, group=x$group, time=obs.time, state=state, obs=obs)
})
panel <- do.call(rbind, panel)
## censoring:
## group A starts in known state 1
panel <- within(panel, {
obs[(group=='A' & time==0)] <- -1
## group B starts in unknown state, not observed
obs[(group=='B' & time==0)] <- -2
})
## initial conditions / observation model
config <- list(
qmat = matrix(byrow=T, nrow=pars$nstate, c(
## S-I
0, pars$q.i*1.2,
0, 0
)),
hmod = list(
S=hmmBinom(size=1, prob=0.1),
I=hmmBinom(size=1, prob=0.9)
)
)
result <- msm(data=panel,
obs ~ time,
subject=id,
censor = c(-1, -2),
## group A: start in state 1
## group B: uninformative starting state
censor.states=list(c(1),c(1,2)),
obstrue=ifelse(obs<0, 1, NA),
## initprobs are ignored?
initprobs=c(0.9, 0.1),
##? AIC changes, but no difference in pstate
#initprobs=c(0.1, 0.9),
qmatrix=config$qmat,
hmodel=config$hmod,
fixedpars=T,
)
pred <- merge(
panel,
as.data.frame(viterbi.msm(result)),
by.x=c('id', 'time'), by.y=c('subject', 'time')
)
I've run the above with the two different values of initprobs
shown above:
## test 1: initprobs=c(0.9, 0.1)
aa <- subset(pred, time==0)
## test 2: initprobs=c(0.1, 0.9)
bb <- subset(pred, time==0)
## Returns TRUE
allequal(aa, bb)
My expectation is that initprobs
should effect the "uniformative" censored initial observations of group B, but the output of viterbi.msm
indicates no change in the predicted initial state residency probabilities of this group. In fact, pred$pstate.V1
in this example is identically 1 in all cases, which seems incorrect, especially for test 1 above.
My self-implemented optim() method is returning maximum likelihood estimates which disagree with msm(). See my code below.
Perhaps I'm misinterpreting the documentation, but shouldn't the likelihood be just composed of elements of the transition probability matrix P(t) (where P(t) = exp(Qt))?
library(dplyr)
library(msm)
library(expm)
msm.data <- tibble(time = c(0, 1, 2, 3, 4, 5), states = c(1, 2, 1, 2, 2, 2), subject_id = c(1, 1, 1, 1, 1, 1))
# create a function which will be used to maximise the likelihood
log_lik <- function(par){
# define a = q_1 and b = q_2
a <- par[1]
b <- par[2]
# define transition intensity matrix
q <- rbind(
c(-a, a),
c(b, -b)
)
# compute the transition probabilities
# by taking the matrix exponential
p <- expm(q)
# trasition: 1 -> 2
L1 <- p[1, 2]
# trasition: 2 -> 1
L2 <- p[2, 1]
# trasition: 1 -> 2
L3 <- p[1, 2]
# trasition: 2 -> 2
L4 <- p[2, 2]
# trasition: 2 -> 2
L5 <- p[2, 2]
L <- log(L1)+log(L2)+log(L3)+log(L4)+log(L5)
# use the negative since we want the maximum
return(-2*L)
}
# estimate using self defined likelihood
optim.fit <- optim(par = c(0.1, 0.1), fn = log_lik, method = "BFGS", control = list(reltol = 1e-8, maxit = 100000))
# define the initial transition intensity matrix
q <- rbind(c(0, 0.1),
c(0.1, 0))
msm.fit <- msm(states ~ time, exacttimes = T, data = msm.data, qmatrix = q, control = list(reltol = 1e-8, maxit = 100000))
optim.fit
msm.fit
Dear Dr. Jackson,
I am currently trying to fit a model using the msm() command and stand stumbled upon an error:
Error in Pmat[x[1], x[2]] : out of bounds
I fitted a similar model on a previous dataset and it was working fine. I searched for the error message in the msm() code but without success.
My code looks like this :
msm_model <- msm(data = database,
subject = id,
formula = state ~ age,
covariates = list("1-2" = ~ sex + education,
"1-3" = ~ sex + education + age,
"1-4" = ~ sex + education,
"2-1" = ~ sex + education,
"2-3" = ~ sex + education + age,
"2-4" = ~ sex + education,
"3-4" = ~ sex + education),
qmatrix = trans_mat,
exacttimes = F,
deathexact = c(4),
control=list(fnscale = 37000),
gen.inits = T
)
msm_model
Any idea where the problem lies?
Thank for your help.
I am trying to use the MSM R-package to estimate a continuous-time hidden Markov model. I do not know why my code does not show the estimates and confidence intervals for the transition intensities and the hidden Markov model parameters when I print the msm object.
I am attaching the data file and the code I am using. I am brand new to HMM. Any help is sincerely appreciated!
Data: https://drive.google.com/file/d/1qirk7fpCmFvQ6seI3iXKJl-M9ppd50vr/view?usp=sharing
library (tidyLPA)
library (dplyr)
three.q<-rbind(c(0.3,0.2,0.1),c(0.25,0.5,0.25),c(0.1,0.5,0.25))
hmodel1<-list(hmmNorm(mean = 0.5, sd=0.5),hmmNorm(mean = 20.8, sd=25.1),hmmNorm(mean = 304.9, sd=203.3))
msm<-msm(Y~Day, subject=ID,data=HMMDatav0,qmatrix=three.q,hmodel =hmodel1,
hcovariates = list (~X, ~X, ~X), method="CG",
fixedpars=TRUE, control=list(fnscale=4000))
msm
The results only show the value of -2 * log-likelihood. Are there any issues with my code, or the properties of my data?
Currently the only way to obtain the likelihood is through the use of logLik.msm, which only extracts the maximum value.
Would it be possible to add a way to calculate the likelihood for a given set of parameters and a given set of data?
pearson.msm(sp.msm, timegroups=2, transitions=c(1,2,3,4,5,6))
Imputing sampling times after deaths...
Error in dist[1, ] : incorrect number of dimensions
I'm a fairly new user to msm
package but I've spent the past few days trying to understand the posterior probability (pstate
) output from viterbi.msm()
. I've gone through the mathematics of the Viterbi algorithm. I expect the posterior probability of the first observation per subject to depend on the probability of that outcome as long as initprobs
argument is provided to msm
and not equal to c(1,0,...,0)
.
Here is a toy example showing posterior probabilities (pstate
values) from viterbi.msm()
.
set.seed(123)
sim <- data.frame(time = rep(1:5,50), Id = rep(1:50,each=5))
initialstates <- sample(size = 50, 1:2, prob=c(.5,.5), replace=TRUE)
T <- matrix(c(.8,.1,.2,.9), nrow=2)
states <- matrix(0, nrow=50, ncol = 5)
for(i in 1:50){
states[i,1] <- initialstates[i]
for(j in 2:5){
states[i,j] <- sample(size=1,1:2,prob=T[states[i,j-1],],replace=TRUE)
}
}
sim$states <- as.vector(t(states))
m <- c(0,10)
sim$y <- rnorm(50*5,mean = m[sim$states], sd = 1)
hmodel1 <- list (hmmNorm(mean = 1, sd = 1), hmmNorm(mean= 11, sd = 2))
mod1 <- msm(y ~ time, subject = Id, data = sim, qmatrix = rbind( c(0, 0.01), c(0.01,0)),hmodel=hmodel1)
viterbi.msm(mod1) %>% filter(time == 1) # What you expect given constrained to start in state 1
viterbi.msm(mod1) %>% filter(time == 2) # What you expect based on outcome values
mod2 <- msm(y ~ time, subject = Id, data = sim, qmatrix = rbind( c(0, 0.01), c(0.01,0)),hmodel=hmodel1,initprobs=c(.1,.9))
viterbi.msm(mod2) %>% filter(time == 1) # Issue: Posterior probability should involve the probability of the first outcome value
viterbi.msm(mod2) %>% filter(time == 2) # What you expect based on outcome values
I'm not an expert in C, but I believe there is a bug in the Viterbi
function in the lik.c
file in the pfwd
calculation when i
=0 at the beginning of the function and when obs
= 0 at the end of the Viterbi()
function. The outcome value is calculated from GetCensored
but there is no call toGetOutcomeProb
to be incorporated into lvold
where the comments say use initprobs.
Is there a way to save a trained msm model without underlying data which was used for the training?
P.S. Thank you for your quick reply to my last question. :)
I'm preparing an MWE using hmmMV for issue #46, and I get an error when I use hmmMV
in simmulti.msm(..., hmodel=list())
. I have a working example of msm
using real data and a formulation of hmodel
similar to what's shown below. I've looked over help("simmulti.msm")
, and I don't see any specific mention of hmmMV
. Is this use-case supported? Or perhaps I'm making a silly mistake here?
library(msm)
set.seed(8)
pars <- within(list(), {
npair = 20
deltat=5
tmax=20
## transition rate
q = 0.05
})
qmat <- rbind(
c(0, pars$q),
c(0, 0)
)
## data input to simmulti.msm
sim.grid <- with(pars,
expand.grid(time=seq(0,tmax,by=deltat), subject=1:npair)
)
sim.dat <- simmulti.msm(sim.grid, qmat,
hmodel=list(
S=hmmMV(
A=hmmBinom(size=1, prob=0.1),
B=hmmBinom(size=1, prob=0.2)
),
I=hmmMV(
## sim uses larger size here
A=hmmBinom(size=2, prob=0.8),
B=hmmBinom(size=1, prob=0.6)
)
)
)
Result:
Error in if (!(hmodel[[i]] == identity && (length(hmodel[[i]]) == :
missing value where TRUE/FALSE needed
Hi,
First of all - great package and page. I'm relatively new to R, and was trying to combine several prevalence plots in 1 plot (depending on subset and covariate value), but am having a hard time changing the plot.prevalence.msm()
function to allow this, as I don't have the underlying code for observed.msm()
and expected.msm()
. I was thinking, is there an easier way of combining the plots, for example similar to how this is done in ggplot2
?
Just to make it more concrete, I am using a two-state model without absorbing state. I have specified pci
cut-offs and 1 covariate with 6 levels, let's call the covariate district. I would like to plot the observed and expected prevalence over time per district.
The msm
object I specified accordingly:
twoway.q <- rbind(
c(-0.24, 0.24),
c(0.1, 0.1))
districts.msm.pci <- msm(state ~ td1jan,
subject = id,
data = helius.inc,
qmatrix = twoway.q,
obstyp = 1,
pci=c(0.5,0.75,1),
covariates = ~ district)
The plot for one of the districts would be:
plot.prevalence.msm(districts.msm.pci, mintime=0, maxtime=2,
xlab="Years since 1 jan 2020", ylab="Prevalence (%)",
subset = id.districtS,
covariates = list(district="South"))
Thanks a lot in advance!
Hello,
I am running into a strange problem when fitting models with a time constant binary covariate as well as piecewise constant intensities. When I set the value of the covariate, e.g., in qmatrix.msm, the intensities for some transitions can possibly be either both greater or less than the intensity for the mean value of the covariate. This results in very strange estimates of first passage probabilities and other quantities. I think I must be doing something wrong in my function calls?
As an minimal working example, I've modified the sex-adjusted cav model from the msm manual and inserted a changepoint at 10 years. The transition intensities for males and females are both lower than the transition intensity at the mean value of sex, as are the first passage probabilities.
Edit: Also applies to total length of stay.
Thanks!
Jon
library(msm)
# intensity matrix from the msm manual
Q <- rbind ( c(0, 0.25, 0, 0.25),
c(0.166, 0, 0.166, 0.166),
c(0, 0.25, 0, 0.25),
c(0, 0, 0, 0))
# model with sex + piecewise constant intensities and a changepoint at 10 years
cavsex.msm <- msm( state ~ years, subject=PTNUM, data = cav,
qmatrix = Q, deathexact = 4, covariates = ~ sex, pci = 10)
# get rate matrices - transition intensity from 2-3 at the mean value of sex is greater than when sex = 0 or 1
qmatrix.msm(cavsex.msm)
qmatrix.msm(cavsex.msm, covariates = list(sex = 0))
qmatrix.msm(cavsex.msm, covariates = list(sex = 1))
# similarly, the first passage probability for 2-3 is much higher when using the average value of sex
ppass.msm(cavsex.msm)
ppass.msm(cavsex.msm, covariates = list(sex = 0), tot = 10)
ppass.msm(cavsex.msm, covariates = list(sex = 1), tot = 10)
The call to print.msm.est
contains the code print(unclassify(X))
, which is missing the argument digits=digits
.
There may be other instances where this argument is missing; this is the case that is affecting me.
Creating a hmmMVdist
object to pass to the hmodel
argument in msm
for a multivariate HMM will return an error if printed in the console. Looks like an issue with print.hmmdist
. For example:
hmmMVobj <- hmmMV(hmmNorm(mean=1, sd=3), hmmNorm(mean=0, sd=1) )
hmmMVobj
##>Hidden Markov model distribution
##>
##>Error in if (x$label == "categorical") paste("P(", seq(x$pars[1]), ")", :
##> argument is of length zero
Cheers
Currently as.numeric(state) ~ as.numeric(time) doesn't work in the "formula" argument to msm(). Should either support this or give a comprehensible error message.
Hello!
I am using the msm package to model a population of hospitalized patients whose outcomes are either death or discharge. I have a covariate x for which the msm output provides the hazard ratio and 95% CI. Is there a way to obtain the standard error for the hazard ratio?
What I am really looking for is, Is there a way to obtain the log(hazard ratio) and the standard error for the log(hazard ratio) ?
Thanks for your help!!
Hi Dr.Jackson,
I wonder if it's possible to use inverse probability weight in "msm"? If yes, how? There is no "weight" argument in msm, right? I am thinking of using inverse probability weight to deal with missing and dependent censoring.
I appreciate any help you can offer!
I have trained three HMMs to do binary classification, all with the same data. Now, one HMM is multivariate (uses two variables of the same data) and I have trained it only with data from one class. For data from this same class, I have trained a univariate HMM (using one variable of the two taken in the multivariate HMM). In addition, for the other class, I have used a univariate HMM only.
Now, I want to compare whether the final classifier (which is formed by two HMMs, one for each class) using the multivariate HMM and the other univariate HMM is better than using the two univariate HMMs.
The issue is that for the multivariate HMM I get much higher likelihoods than for the univariate HMMs in the test set, so it seems that you cannot compare in such a simple way a multivariate HMM and a univariate HMM.
So, I would like to know exactly how the msm package calculates the likelihood in the multivariate HMMs, in case I need to apply any changes to the likelihood to compare it with the univariate HMMs.
Hi,
is there a way to compute p-values associated with hazard ratios of a model? (do HRs differ from 1?)
Best regards
Hi Dr. Jackson,
When we model panel data with "MSM" and assume arbitrary observation times (exacttimes=F), is it worth adding available electronic health records to the panel data? For instance, adding national electronic health records and information from certificates of cause of death to cohort data collected regularly to identify as many disease cases as possible and also "impute" missing disease states due to dropout and death?
If it is beneficial to add those electronic health records, should they be added to the data as additional observations in practice?
Thanks!
Xin
For HMMs, is there a function that takes a hmodel.object
(or a msm.object
) and returns a list of hmmdist
objects that can be passed to the hmodel
argument in a new msm()
call? For instance, it could be used in the following way:
# create first HMM
fit1 <- msm( y ~ time, subject = ptnum, data = example.df,
qmatrix = rbind( c(0, 0.25, 0), c(0, 0, 0.2), c(0, 0, 0)),
hmodel = list (hmmNorm(mean=90, sd=8), hmmNorm(mean=70, sd=8), hmmIdent(-9))
)
# extract hmodel.object from fit1
hmodel.fit1 <- hmodel.msm(fit1$hmodel)
# create second HMM with initial emission distribution paramaters from fit1
fit2 <- msm(y ~ time, subject = ptnum, data = example.df,
qmatrix = rbind( c(0, 0.25, 0.1), c(0.1, 0, 0.2), c(0, 0, 0)),
hmodel = hmodel.fit1
)
I've built a function that does this for a very specific multivariate categorical HMM but it would be useful to have a generic function that does this for any given hmodel.object
.
Has this been considered? If not, could you provide any pointers on how best this might be achieved?
Cheers
Hello Dr. Jackson,
Thank you very much for your excellent work for MSM; I'm a new user of this package and I have faced an issue that I didn't know how to solve it. I wanted to fit a misclassification model with msm of the data about a progressive disease.
However, it resulted in a few errors, for example, "initial value in 'vmmin' is not finite" or "Optimisation has probably not converged to the maximum likelihood - Hessian is not positive definite."
Please could you advise?
Many thanks
Any suggestions on how to fix this or why this is happening in the first place would be appreciated. Sometimes the models converge but then I get this error and the object fails to save.
I've tried:
setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
setSessionLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
and neither of them fix the error.
Hi,
With the following data I get the error 'initial value in 'vmmin' is not finite' and with the following Q of allowed transitions.
Q <- rbind(c(0,1,0,1),
c(0,0,1,1),
c(0,1,0,1),
c(0,0,0,0))
partdata.zip
With a less restrictive Q I get the 'liklihood overflow error' however a less restrictive model isn't clinicly accurate.
Please could you advise?
Many thanks
Hello,
I am trying to fit a model with two continuous gaussian outcomes from a HMM. I am following the Example 2 in help(hmmMV) but replacing hmmBinom by hmmNorm.
Here is the code and the data file for which I get this error message:
In addition: Warning message:
In if (gen.inits) { :
the condition has length > 1 and only the first element will be used
Thank you in advance for any insight!
read.table(file = "df_example_msm.txt", sep ="\t")
head(df)
# X1 X2 time X4 ID
# 1 0.00042458 -0.04983110 20.018889 C 1
# 2 0.00060557 0.01430985 20.027222 C 1
# 3 0.00181100 0.00416775 20.275000 C 1
# 4 0.01989629 -0.00678782 6.779167 C 2
# 5 0.01989629 -0.00678782 7.413056 C 2
# 6 0.02091527 -0.01099738 7.443333 C 2
df$obs <- cbind(obs1=df$X1, obs2=df$X2)
k=2 #nb. of states
q <- matrix(1, ncol=k, nrow=k) #specifying allowed transitions. Also if gen.inits=T, used for initial value (excep for diagonal values)
m1=mean(df$X1,na.rm=T)
s1=sd(df$X1,na.rm=T)
m2=mean(df$X2,na.rm=T)
s2=sd(df$X2,na.rm=T)
mod <- msm(obs ~ time, subject=ID, data=df, qmatrix=q,
hmodel <- list(hmmMV(hmmNorm(mean=m1, sd=s1),hmmNorm(mean=m2, sd=s2)),
hmmMV(hmmNorm(mean=m1, sd=s1),hmmNorm(mean=m2, sd=s2))),
control=list(maxit=10000))
Say we have an hmm, fitted using msm
on olddata
:
fittedhmm <- msm(response ~ time, subject = subject, data = olddata, ...)
I'd like to be able to run the Viterbi algorithm with the fittedhmm
model, but to predict state probabilities for a new dataset, newdata
, rather than olddata
. There's two ways I can think of doing this.
Method 1
Fit a hmm model with msm()
:
fittedhmm <- msm(response ~ time, subject = subject, data = olddata, ...)
Create a dummy, unfitted msm
object whose initial parameters are taken from fittedhmm
but are NOT updated to maximise likelihood wrt newdata
:
dummyhmm <- msm(response ~ time, subject = suject, data = newdata,
hmodel = hlist(fittedhmm),
qmatrix = qmatrix(fittedhmm),
fit = FALSE,
...)
where hlist()
takes the hmodel.object
from fittedhmm
and converts it to the correct format (see #17 (comment)). The fit=FALSE
is the way to prevent the optimsation routine from running, so the model keeps the initial values from fittedhmm
. Then simply pass this object to viterbi.msm()
:
newpredictions <- viterbi.msm(dummyhmm)
Method 2
This is neater. Modify viterbi.msm()
directly so that it has a data
argument, to be used like this:
newpredictions <- viterbi.msm(x = fittedmsm, data = newdata)
Looking at the viterbi.msm()
internals, this would mean creating an object from newdata
that is the same as the output of msm:::expand.data()
(which is then passed to Ccall.msm(params, do.what = "viterbi", ...)
I'm not sure how easy this will be.
Fudge
A simple fudge is to take create a dummy hmm that will convert newdata
as appropriate (but the fitted parameters are ignored) and run the appropriate parts of viterbi.msm()
with fittedhmm
and the dummy hmm, like this:
library('msm')
## Simulate data from a Markov model
nsubj <- 30; nobspt <- 5
sim.df <- data.frame(subject = rep(1:nsubj, each=nobspt),
time = seq(0, 20, length=nobspt))
two.q <- rbind(c(-0.1, 0.1), c(0, 0))
set.seed(123)
dat <- simmulti.msm(sim.df[,1:2], qmatrix=two.q, drop.absorb=FALSE)
dat$obs1[dat$state==1] <- rbinom(sum(dat$state==1), 40, 0.1)
dat$obs1[dat$state==2] <- rbinom(sum(dat$state==2), 40, 0.5)
newdat <- simmulti.msm(sim.df[,1:2], qmatrix=two.q, drop.absorb=FALSE)
newdat$obs1[newdat$state==1] <- rbinom(sum(newdat$state==1), 40, 0.1)
newdat$obs1[newdat$state==2] <- rbinom(sum(newdat$state==2), 40, 0.5)
newdat<-subset(newdat, subject<16) #just to check it works on different sized data
## Fitted model
fittedmsm <- msm(obs1 ~ time, subject=subject, data=dat, qmatrix=two.q,
hmodel = list(hmmBinom(size=40, prob=0.2),
hmmBinom(size=40, prob=0.2))
)
##Dummy model to process newdata in the right way (fitted parameters are ignored)
dummymsm <- msm(obs1 ~ time, subject=subject, data=newdat, qmatrix=two.q,
hmodel = list(hmmBinom(size=40, prob=0.2),
hmmBinom(size=40, prob=0.2))
)
# modified version of viterbi.msm()
viterbiout <- msm:::Ccall.msm(fittedmsm$paramdata$opt$par,
do.what = "viterbi",
msm:::expand.data(dummymsm),
fittedmsm$qmodel, fittedmsm$qcmodel, fittedmsm$cmodel, fittedmsm$hmodel, fittedmsm$paramdata
)
stateprobs <- data.frame(subject = dummymsm$data$mf$"(subject)",
time = dummymsm$data$mf$"(time)",
observed = dummymsm$data$mf$"(state)",
fitted = viterbiout[[1]]+1,
pstate = viterbiout[[2]]
)
head(stateprobs)
#> subject time observed fitted pstate.1 pstate.2
#>1 1 0 5 1 1.000000e+00 0.0000000
#>2 1 5 18 2 6.994981e-08 0.9999999
#>3 1 10 19 2 5.396904e-16 1.0000000
#>4 1 15 23 2 6.162963e-28 1.0000000
#>5 1 20 23 2 1.607795e-39 1.0000000
#>6 2 0 3 1 1.000000e+00 0.0000000
This example demonstrates the desired output but I'd obviously want to stripe away all the unneccesary parts for the conversion of newdata
, rather than passing it through msm
every time.
I'll probably try to put something together for method 2. If you have any advice before I get started I'd be very happy to hear it.
Cheers!
Good morning,
when running this code:
`library("msm")
data_test <- read.delim("~/data_test.txt")
Q <- matrix(0.1, nrow = 12, ncol = 12)
model.msm1 <- msm( GROUP ~ TIME, subject = id, data = data_test,
qmatrix = Q, gen.inits=FALSE, control = list(fnscale = 500000000, maxit = 500), exacttimes=TRUE)
pmat <- pmatrix.msm(model.msm1, t=1, ci = "bootstrap" )
`
I am returned the following error: "Error in msm(formula = GROUP ~ TIME, subject = id, data = structure(list( :
state variable is character, should be numeric". The issue is specific to the bootstrap option and persists even if i try using the boots.msm function
the data i'm using is
https://github.com/LeoCi1907/myrep/blob/main/data_test.zip
Am I missing something obvious? The thing is, it has no issues with msm() as long as bootstrapping is not involved. Any support is appreciated.
Thank you
The following is based on my reading of the obstrue
section of help("msm")
and previous discussion of obstrue
in Issues #40 and #41. Apologies in advance if I've misinterpreted "true state" here.
This simplified example looks at obstrue
in a HMM without censoring. I expect obstrue
to indicate the true hidden state here, and for this to be reflected in the pstate
matrix from viterbi.msm
. In this examlpe, I see that fitted
always equals obstrue
(when provided), but the relevant pstate
seem to ignore the information in obstrue
.
library(msm)
set.seed(8)
pars <- within(list(), {
npair = 20
## transition rate
q = 0.05
prob=list(S=0.2, I=0.8)
deltat=5
tmax=20
})
## Full example
sim.grid <- with(pars,
expand.grid(time=seq(0,tmax,by=deltat), subject=1:npair)
)
## initial conditions / observation model
config <- list(
qmat = rbind(
c(0, pars$q),
c(0, 0)
),
hmod = list(
S=hmmBinom(size=1, prob=0.2),
I=hmmBinom(size=1, prob=0.8)
)
)
sim.dat <- simmulti.msm(sim.grid,
config$qmat, drop.absorb=F,
hmodel=list(
S=hmmBinom(size=1, prob=0.2),
## sim uses larger size
I=hmmBinom(size=2, prob=0.8)
)
)
panel <- subset(sim.dat, select=c(subject, time, obs))
panel <- within(panel, {
## force state 2
obstrue <- ifelse(obs==2, 2, NA)
obs <- ifelse(obs>1, 1, 0)
})
result <- msm(data=panel,
obs ~ time, subject=subject,
obstrue=obstrue,
qmatrix=config$qmat,
hmodel=config$hmod,
fixedpars=T
)
pred <- merge(
panel,
viterbi.msm(result),
by=c('subject', 'time'), sort=F
)
## I expect pstate.2==1 when obstrue==2
expect <- subset(pred, obstrue==2)
Hello,
thanks a lot for your package. Did you ever consider providing tidiers for msm models, allowing to easy use the results for producing graphs or table?
It would be great to have a tidy
method for msm models. cf. https://broom.tidyverse.org/articles/external-tidiers.html
Ideally, there should be an option to precise that should be returned:
Hello Dr. Jackson,
Thank you very much for your excellent work for MSM; I'm a new user of this package and I would like to know if the MSM package provides the function to check the Markov property. if not, how we can be sure that the Markov property holds? if there are other packages that can implement this function.
Hello,
I have another question that we can't seem to solve ourselves, but are sure there is a way.. We have fitted a two-state time-inhomogeneous model with an absorbing state and would like to test for and model the interaction between our covariates and the time variable:
twoway.q <- rbind(
c(-0.24, 0.24),
c(0, 0.1))
districts.msm.pci <- msm(state ~ td1jan,
subject = id,
data = helius.inc,
qmatrix = twoway.q,
obstyp = 1,
pci=c(0.5,0.75,1),
covariates = ~ district + sex + age)
Now, the manual (p.47) states that "this facility does not support interactions between time and other covariates. Such models need to be specified by hand, using a state variable with censored observations inserted. Note that the data component of the msm object returned from a call to msm with pci supplied contains the states with inserted censored observations and time period indicators. These can be used to construct such models".
From our districts.msm.pci
object, I can see the created data frame in this list, including the (pci.imp)
double and timeperiod
factor. Now, how can I get from here to the interaction model? I tried using this data frame as the input for a new msm object, but am getting errors for the state variable, which now contains three states while my original model only had two. I tried using (obstrue)
as the state variable instead while specifying the option obstyp==2
, but then I get a lot of errors about the data being inconsistent with the transition matrix..
I have tried finding a documented example of a msm model with a similar interaction, but could not find one..
Thanks a lot in advance!
Using the attached data I can successfully run an msm model:
Q <- rbind(c(0, 0, 0.25, 0, 0.25), c(0, 0, 0.25, 0, 0.25), c(0.125, 0.125, 0, 0.125, 0.125), c(0, 0, 0.25, 0, 0.25), c(0, 0, 0, 0, 0))
model1 <- msm(state ~ time, subject = id, data = data, qmatrix = Q, gen.inits = T)
But when I try to perform any function involving bootstrapping, I get the following error message:
Error in msm.check.times(mf$"(time)", mf$"(subject)", mf$"(state)") :
Observations within subjects 49, 111, 240 and others are not ordered by time
I've tried
msm.check.times(data$time, data$id, data$state)
using the function found on GitHub and there doesn't appear to be a problem with the data. I've also ordered the dataframe according to:
data <- data[order(data$id, data$time),]
This has really got me stuck. Any help greatly appreciated.
Hello Dr. Jackson,
Thank you very much for your excellent work for MSM; I'm a new user of this package and I have faced an issue that I didn't know how to understand. I want to evaluate the effect of variates on partial transitions, such as,
I get the hazard ratios and confidence intervals of age on the first transition, as well as the age and gender on the second transition by code: covariates = list("1-2" = ~ age, "2-3" = ~ age + gender). However, when I add a new transition in the above code: covariates = list("1-2" = ~ age, "2-3" = ~ age + gender, "2-4"= ~ age + gender+education), I will obtain a new value of the hazard ratios and confidence intervals of age on transition "1-2" , and age, gender on transition "2-3". How should I understand the difference between the two values? If I only want to examine the effect of age, gender on transition "2-3", "2-4", Can I ignore the effect of variables on the transition "1-2", code like this: covariates = list( "2-3" = ~ age + gender+education, "2-4"= ~ age + gender+education)?
Error in balance(baP$z, "S") :
BLAS/LAPACK routine 'DGEBAL' gave error code -3
Calls: pmatrix.msm ... MatrixExp -> -> expm.Higham08 -> balance
Execution halted
I got the following error when I tried to use pmatrix.msm() on a model that I fit with specified values for all covariates. An initial sweep on Google didn't return too many results. I'm not sure exactly what the error means or how to avoid it.
Many thanks in advance for all your help,
Matt
Hello,
obstype = 2
).If I allow only A->B and B->C transitions in qmatrix
, I'm ending up with infinite values during the optimisation process and the model is not able to converge.
An option would be therefore to allow A->B, B->C and A->C transitions in the model. In the model, I have a covariate (for example Age). I would like to constrain the associated hazard ratio for "A->C" transition or HR(A->C) to be equal to the multiplication of HR(A->B) by HR(B->C), i.e. to consider that the effect of age on transitioning from A to C is equal to the cumulative effect of age when transitioning from A to B and then from B to C.
Is there a way to specify such constraints in msm
?
In all cases, thanks in advance for your help.
Hi Dr.Jackson,
I read your answer to the previous issue, "numerical overflow in calculating likelihood #5". If I understand correctly, if we specify "exacttimes=T", we fit discrete-time models. I wonder, in practice, when a continuous-time model should be used and when a discrete-time model should be used? Should the time interval between two observations not be too long to be able to fit a continuous-time model?
Thanks a lot!
Xin
Hello Dr. Jackson,
Thank you very much for crafting MSM; I've been reading your papers and using this package for a new project and really appreciate all of your work. Today I've run into a funky issue.
On an earlier version of my data which did not have follow-up for everyone, the Pearson test ran without issue. Today I loaded up my new data and it reported
Error in pearson.msm(mc_msm, timegroups = 2) : Remove any absorbing-absorbing transitions from data and refit model
I understood the error; a few individuals who had died at timepoint=3 were included with timepoint=4 observations, but again simply logged as having died. I removed them and re-ran the model fitting procedure without issue. However, when I attempted to rerun the Pearson-like test pearson.msm(mc_msm, timegroups=3)
I received the following error:
Error in seq.default(0, 1, 1/n) : argument "n" is missing, with no default
It works for timegroups=2, but not 3. I spent a few hours debugging and I believe I have figured out the problem: in removing the absorbing observations I have shifted the distribution of the observation times (since all patients have timepoints 1 through 4). In doing so qcut(md$time, 3) in the Pearson code goes from returning a vector with levels of 1, 2, 3 to a vector with levels 1, (1, 2], [2, 3). In doing so the rest of the grouping is broken because a the group defined by (1, 2) is empty.
Having discovered the etiology I've no clue how to fix it without changing the underlying code. I was curious if you have ever encountered this before (and could suggest a remedy I may have missed, e.g. maybe my model is just being fit wrong). If not, the only solution I can think of would be to allow the Pearson function to allow as input a manual set of quantiles. I don't think this would be hard to do and if that is the only route I'll submit a pull request and try and do it.
Cheers,
Chris
Good night,
How can I change the reference value it takes from the covariates that are factors?
Hello,
I am trying to fit a multistate model with a covariate and am getting the following error:
Error in omit | (nobspt == 1) : non-conformable arrays
Not really sure what in my model setup would lead msm to enter this function, but the error appears to be thrown on line 25 of na.find.msmdata
and arises from the omit object being a matrix, in my case of dimension 15k x 1, and nobspt being a table. Perhaps this is an R 4.0+ issue and these two objects were previously conformable?
Thanks,
Jon
Hi! Thanks for this wonderful package. Im just writing in to say I spent the better part of the last three days attempting to use msm2Surv while passing a tibble as the data argument. This results in the from and to fields of this line:
## Data frame of observed events ev <- data.frame(id=data[!lpt,subject], from=data[!lpt,state],to=data[!fpt,state], Tstart=data[!lpt,time], Tstop=data[!fpt,time], time=data[!fpt,time]-data[!lpt,time], status=rep(1,nev))
Overwriting an unused state field that ultimatly gets lost as well. This results in everything under that line breaking.
Now, this is not a bug, a dataframe is requested by the function and thats what one should pass it. But most of us I believe use tibbles and data.frames as almost interchangeable: Not so!
ALSO: your follow up time should be ordered before feeding it to this function.
Thanks again!
I'm working on an infectious disease project that uses a HMM and uses censoring and known states at arbitrary times. I'm struggling to understand how obstrue
works in this case. After looking closely at help(msm)
for argument obstrue
, my impression was that A) I needed obstrue==1 for each censored observation, and that B) I could force a censored observation into a given state by providing a single state in the appropriate entry of censor.states
. My experience with msm suggests that I'm misinterpreting something.
I've found that I can use censoring to force a given state if I:
obstrue
censor.states=list(known=c(1,1), other=c(1,2))
Notably, censor.states=list(known=c(1), other=c(1,2))
does not behave as expected / as above (but still overrides provided initprobs, though that's a separate issue).
Below is a full example with four possible behaviors (see args commented with ##?
): two different censor.states
, each with and without obstrue
. I'm struggling to understand the behavior of each option, and which is appropriate for my use-case.
Any advice would be appreciated!
library(msm)
library(data.table)
set.seed(8)
## msm docs:
# * obstrue=1 required for censor effects (add note to censor?)
# * censor.state list: length 1 fails without error
## SI model
pars <- within(list(), {
npair = 50
q.i = 0.05
prob=list(S=0.2, I=0.8)
nstate=length(prob)
tmax=20
})
subjects <- expand.grid(
pair = 1:pars$npair,
group = factor(c('A','B'))
)
## simulate time to infection
subjects <- within(subjects, {
id <- paste(pair, group, sep='.')
infect <- rexp(nrow(subjects), pars$q.i)
})
subjects <- data.table(subjects)
## regular observation schedule
obs.time = seq(from=0, to=pars$tmax, by=5)
#obs.time = seq(from=0, to=max(subjects$infect)+10, by=5)
## input to msm: test results conditioned on infection status
panel <- subjects[, by=.(id, group), j={
time = obs.time
## F = S, T = I
state = (obs.time > infect)
## observation process conditioned on state
obs = rbinom(length(state), 1,
ifelse(state, pars$prob$I, pars$prob$S)
)
.(time, state, obs)
}]
## censoring:
## group A starts in known state 1
panel[(group=='A' & time==0)]$obs <- -1
## group B starts in unknown state
panel[(group=='B' & time==0)]$obs <- -2
## Both have unknown observattion
panel[(time==10)]$obs <- -2
## initial conditions
config <- list(
qmat = matrix(byrow=T, nrow=.nstate, c(
## S-I
0, pars$q.i*1.2,
0, 0
)),
## observation model
hmod = list(
S=hmmBinom(size=1, prob=0.1),
I=hmmBinom(size=1, prob=0.9)
)
)
result <- msm(data=panel,
obs ~ time,
subject=id,
censor = c(-1, -2),
## Informative group-A init
## works as intended
censor.states=list(c(1,1),c(1,2)),
##? behavior differs from above
# censor.states=list(c(1),c(1,2)),
##? how does this work?
# obstrue = ifelse(obs<0, 1, NA),
initprobs=c(0.6, 0.4),
qmatrix=config$qmat,
hmodel=config$hmod,
fixedpars=T
)
pred <- merge(
panel,
as.data.frame(viterbi.msm(result)),
by.x=c('id', 'time'), by.y=c('subject', 'time')
)
sppb.msm <-msm(state_1 ~ days, subject = symbol, data = data,
qmatrix = qmat0, death = 3, method = "BFGS",
control = list(fnscale = 4000, maxit = 10000),
covariates = ~ PB)
Error in Ccall.msm(params, do.what = "lik", ...) :
numerical overflow in calculating likelihood
I want to raise this issue once again, since there was no answer
> Q <- rbind(c(0, 0.25, 0, 0.25),
c(0.166, 0, 0.166, 0.166),
c(0, 0.25, 0, 0.25),
c(0, 0, 0, 0))
> cavsex.msm <- msm(state ~ years, subject=PTNUM, data=cav, qmatrix=Q, covariates = ~age)
Error in Ccall.msm(params, do.what = "lik", ...) :
numerical overflow in calculating likelihood
Why is this not working?
Hi Dr.Jackson,
I've been reading your articles about the "msm" package and I understand that the transition intensity is similar to the concept hazard in the survival analysis framework. However, I am not entirely sure, if the transition intensity can be interpreted as cause-specific hazard, as opposed to subdistribution hazard, in the competing risk framework.
Could you please explain a bit more how the transition intensity can be interpreted and how it corresponds to hazard in the survival analysis framework?
Thanks a lot!
Xin
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.