The main functions are patp_test() and get_data(). get_data() is a function that creates an example dataset.
patp_test() requires the R packages survival and matrixStats to be installed and loaded. get_data() requires the R packages truncdist and extraDistr to be installed and loaded.
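Before loading anything, it can help to check which of these dependencies are missing. The following is a minimal sketch using base R's requireNamespace(); the helper name missing_pkgs is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): return the packages in
# `pkgs` that are not currently installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages named in this README as requirements
needed <- c("survival", "matrixStats", "truncdist", "extraDistr")
to_install <- missing_pkgs(needed)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

Once nothing is reported as missing, the packages can be attached with library() as usual.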
The input data need to be a data frame in the long format generated by get_data(). The data frame should contain the following variables:
id: variable name that identifies the individual observations.
cid: variable name that identifies the clusters.
from: the state of the process at Tstart. The possible values are 1,…,k.
Tstart: starting time of the interval in the record.
Tstop: ending time of the interval in the record.
trans: an integer that uniquely identifies the transition.
status: indicator variable. If status=1, the corresponding transition has been observed.
group: variable name of the binary grouping variable.
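The variable list above can be turned into a quick sanity check before running a test. This is only a sketch: the helper check_long_format is hypothetical, the toy data are made up, and the rule that Tstart must be smaller than Tstop is an assumption about interval records rather than something the package documents.

```r
# Hypothetical helper (not part of the package): stop with an informative
# message if any of the variables listed above is missing from `dat`.
check_long_format <- function(dat) {
  required <- c("id", "cid", "from", "Tstart", "Tstop",
                "trans", "status", "group")
  absent <- setdiff(required, names(dat))
  if (length(absent) > 0) {
    stop("Missing variables: ", paste(absent, collapse = ", "))
  }
  # Assumption: each record describes an interval of positive length
  if (any(dat$Tstop <= dat$Tstart)) {
    stop("Every record needs Tstart < Tstop")
  }
  # The grouping variable is described as binary
  if (length(unique(dat$group)) > 2) {
    stop("group must take at most two distinct values")
  }
  invisible(TRUE)
}

# Toy long-format data, made up for illustration only
toy <- data.frame(
  id = c(1, 1, 2), cid = c(1, 1, 1),
  from = c(1, 2, 1), Tstart = c(0, 0.2, 0),
  Tstop = c(0.2, 0.5, 0.4), trans = c(1, 3, 2),
  status = c(1, 1, 1), group = c(0, 0, 0)
)
check_long_format(toy)
```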
The function sopt_test() calculates the p-value for the comparison of the population-averaged transition probability Pr(X(t) = j | X(s) = h) between two groups, using a linear test or a Kolmogorov–Smirnov test. The function has the following arguments:
data: a data.frame in the long format that follows the get_data() requirements.
tmat: a logical matrix of the transitions between states of the process, where TRUE marks a possible transition and FALSE an impossible one.
id: variable name that identifies the individual observations.
cid: variable name that identifies the clusters.
group: variable name of the binary grouping variable.
j: the state j in Pr(X(t) = j | X(s) = h).
B: number of nonparametric cluster bootstrap replications. The default value is 1000.
method: “linear” or “KS”.
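To illustrate what a single nonparametric cluster bootstrap replication involves, the sketch below resamples whole clusters with replacement. It is not the package's internal code: the helper cluster_boot and the toy data are hypothetical, and individual ids are left unchanged for simplicity.

```r
# Hypothetical helper (not the package's internal code): draw one
# nonparametric cluster bootstrap sample from long-format data.
cluster_boot <- function(dat, cid = "cid") {
  clusters <- unique(dat[[cid]])
  # Sample whole clusters with replacement
  drawn <- clusters[sample.int(length(clusters), replace = TRUE)]
  # Stack the records of each drawn cluster, relabelling clusters so that
  # repeated draws of the same cluster count as distinct clusters
  pieces <- lapply(seq_along(drawn), function(b) {
    piece <- dat[dat[[cid]] == drawn[b], , drop = FALSE]
    piece[[cid]] <- b
    piece
  })
  do.call(rbind, pieces)
}

# Toy data, made up for illustration: 3 clusters with 2 records each
toy <- data.frame(cid = rep(1:3, each = 2), id = 1:6, group = rep(0:1, 3))
set.seed(1)
boot <- cluster_boot(toy)
length(unique(boot$cid))  # number of clusters in the bootstrap sample
```

Resampling clusters rather than individual records keeps the within-cluster dependence intact in every replication.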
The artificial dataset contains clustered observations from an illness-death process with recovery (the transition from illness back to health is allowed). The matrix tmatrix of possible transitions looks as follows.
tmatrix <- trans(state_names = c("health", "illness", "death"),
                 from = c(1, 1, 2, 2),
                 to = c(2, 3, 3, 1))
tmatrix
##         health illness death
## health   FALSE    TRUE  TRUE
## illness   TRUE   FALSE  TRUE
## death    FALSE   FALSE FALSE
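To make the structure of this matrix explicit, the sketch below rebuilds the same logical matrix in base R without the package's trans() function. Rows are the "from" states and columns the "to" states. The object name tmat is chosen here for illustration only.

```r
# Base-R sketch of the same logical matrix; this does not use the
# package's trans() function.
states <- c("health", "illness", "death")
from <- c(1, 1, 2, 2)
to   <- c(2, 3, 3, 1)

tmat <- matrix(FALSE, nrow = 3, ncol = 3,
               dimnames = list(states, states))
tmat[cbind(from, to)] <- TRUE  # rows are "from" states, columns are "to" states

tmat
which(rowSums(tmat) == 0)  # states with no outgoing transitions (absorbing)
```

In this matrix the illness row has TRUE in the health column, so a transition from illness back to health is possible, while death has no outgoing transitions.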
The following example data has 10 clusters:
##   cid id    Tstart     Tstop from to Z R group
## 1   1  1 0.0000000 0.1315225    1  2 0 1     0
## 2   1  1 0.1315225 0.3154674    2  3 0 1     0
## 3   1  2 0.0000000 0.4189244    1  3 0 1     0
## 4   1  3 0.0000000 0.4867005    1  2 0 1     0
## 5   1  3 0.4867005 0.9478138    2  1 0 1     0
## 6   1  3 0.9478138 0.9860794    1  3 0 1     0
A two-sample comparison of the transition probability P(X(t) = 2 | X(0) = 1) between the groups defined by the variable group can be performed as follows.
For the linear test:
set.seed(1234)
sopt_test(data = tdat, tmat = tmatrix, cid = "cid",
          id = "id", group = "group", j = 2, B = 1000,
          method = "linear")
## p-value at State2
## 0.1369707
For the Kolmogorov–Smirnov test:
set.seed(1234)
sopt_test(data = tdat, tmat = tmatrix, cid = "cid",
          id = "id", group = "group", j = 2, B = 1000,
          method = "KS")
## p-value at State2
## 0.251
It is recommended to use at least 1000 cluster bootstrap replications when performing two-sample hypothesis testing.
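One way to see why more replications help is to look at the Monte Carlo error of a p-value estimated as a proportion over B replications. The sketch below uses the usual binomial approximation. Whether the package computes its p-values in exactly this way is an assumption, and the helper mc_se is hypothetical.

```r
# Approximate Monte Carlo standard error of a p-value estimated as a
# proportion over B replications (binomial approximation; an assumption
# about how the p-value is obtained).
mc_se <- function(p, B) sqrt(p * (1 - p) / B)

# Standard error near the 0.05 threshold for several values of B
B_values <- c(100, 500, 1000, 5000)
data.frame(B = B_values, se = mc_se(0.05, B_values))
```

Under this approximation the error shrinks with the square root of B, so a tenfold increase in replications reduces it by a factor of about 3.2.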