A previous version with only Gaussian cost function is available here.
gfpop Vignette
Vincent Runge
LaMME, Evry University
August 22, 2019 (version 2)
The Graph-constrained Change-point problem
General description
gfpop
is an R
package developed to complete parametric change-point detection in univariate time series constrained to a graph structure. Constraints are imposed to the sequence of inferred parameters (most of the time a mean parameter) of consecutive segments and related to the direction and/or the magnitude of the mean changes.
Change-point detection is performed using the functional pruning optimal partitioning method (fpop) based on an exact dynamic programming algorithm. Cf paper. On optimal multiple changepoint algorithms for large data. This algorithmic strategy can be seen as an extented Viterbi algorithm for a Hidden Markov Model with continuous parameter states. A presentation of the generalized fpop (gfpop) algorithm is available in the paper A log-linear time algorithm for constrained changepoint detection.
In the main gfpop
function the user chooses a global parametric model for the change-point problem to solve (change in "mean", "variance", "exp", "poisson" or "negbin" distribution in type
variable) and a graph
structure. To run the function the user also gives some data
to segment potentially associated with a weights
vector of same length.
gfpop <- function(data, mygraph, type = "mean", weights = NULL)
The algorithm of our package is designed to consider a large variety of constraints. These constraints are modelled by a graph. At each time a number of states is accessible, these states are the nodes of the graph. Possible transitions between states at time and are represented by the edges of the graph. Each edge is associated to main elements: a constraint (for example ), some parameters associated to the constraint, a penalty (possibly null) and robust parameters to use Huber or biweight losses only on the considered edge.
In addition to the edge definition, the nodes can be constrained to a range of values (means) in the inference. We also can specify starting and ending nodes. In our implementation the transitions are not time-dependent except for starting and ending steps.
The edges of the graph can be of type "null", "std", "up", "down" or "abs" with the following meaning:
- "null" edge : there is no change-point introduced. We stay on the same segment. "null" corresponds to the constraint . The value does not change (or we have an exponential decay if );
- "std" edge : no contraint, the next segment can have any mean;
- "up" edge : the next segment has a greater mean (we can also force the size of the gap to be greater than a minimal value). "up" corresponds to the constraint ;
- "down" edge : the next segment has a lower mean (wan also can force the size of the gap to be greater that a minimal value). "down" corresponds to the constraint ;
- "abs" edge : the absolute value of the difference of two consecutive means is greater than a given parameter.
A nonnegative internal parameter can thus be associated to an edge (in "up", "down" and "abs""). The "null" edge refers to an absence of change-point. This edge can be used between different states to constraint segment lengths. Thus our package includes the segment neighborhood algorithm for which the number of change-points is fixed. More details on graph construction are given in the section Graph construction.
The non-constrained changepoint detection problem
The change-point vector defines the segments , with fixed bounds and . We use the set to define the non-constrained minimal global cost given by
where is a penalty parameter and is the minimal cost over the segment . The penalty is understood as an additional cost when introducing a new segment. The argminimum of this quantity gives a vector containing the last position of each segment (if we do not consider ). The quantity is the solution of the non-constrained optimal partitioning method.
In our setting, the cost is the result of the minimization of a cost function with additive property:
with the argminimum defining the infered mean of the i+1-th segment with . Additivity of the cost (the decomposition) is guaranteed as we will use costs deriving from a likelihood. has in our package possible basis decompositions:
We can associate a current mean to each data point and we write a cost on a segment as a result of a constrained minimization:
so that we get another description of the objective function :
From this expression, it will be possible to impose some constraints on the vector of means .
Graph-constrained problem
is the indicator function. This previous approach can be generalized to complex constraints on consecutive means using a graph structure. We define the transition graph as a directed acyclic graph with the following properties:
-
Nodes are indexed by time and state. for node with time and state . The states are elements of the set ;
-
All the nodes have time in except for the unique starting node and the unique ending node , where denotes an undefinite state;
-
Edges are transitions between "time-consecutive" vertices of type and which gives us the edge for ;
-
Each edge is associated to a function constraining consecutive means and by the indicator function (for example ) and a penalty .
The next table summarizes all the possible constraints encoded into the gfpop
package.
constraints | , |
---|---|
null | |
std | |
up | |
down | |
abs |
We define a path of the graph as a collection of vertices with and and for and . Morever, the path is made of edges denoted with . A vector verifies the path if for all , we have (valid constraint). We write to say that the vector verifies the path . The formulation of our graph-constrained problem is then the following:
In the gfpop
package, the edges for are not time-dependent. In this case, we redefine a graph by the label of its vertices and its set of oriented edges with defining an edge going from node to node with indicator function and penalty .
In most applications, the set of edges always contains edges of type for all with indicator function as it corresponds to an absence of constraint on segment lengths.
The main algorithm of the package, the gfpop
function, returns the change-point vector defined as , with the argminimum of in and . we also give the possibility to restrict the set of valid paths by imposing a starting and/or an ending ,node and contraint the range of infered means, replacing by in the definition of .
Quick Start
we present a basic use of the main functions of the gfpop
package. More details about optional arguments are given in later sections.
We install the package from Github:
#devtools::install_github("vrunge/gfpop")
library(gfpop)
We simulate some univariate gaussian data (n = 1000
points) with relative change-point positions 0.1, 0.3, 0.5, 0.8, 1
and means 1, 2, 1, 3, 1
with a variance equal to 1
.
n <- 1000
myData <- dataGenerator(n, c(0.1,0.3,0.5,0.8,1), c(1,2,1,3,1), sigma = 1)
We define the graph of constraints to use for the dynamic programming algorithm. A simple case is the up-down constraint with a penalty here equal to a classic 2 log(n)
.
myGraph <- graph(penalty = 2*log(n), type = "updown")
The gfpop function gives the result of the segmentation using myData
and myGraph
as parameters. We choose a gaussian cost.
gfpop(data = myData, mygraph = myGraph, type = "mean")
## changepoints
## [1] 100 300 500 799 1000
##
## states
## [1] "Dw" "Up" "Dw" "Up" "Dw"
##
## forced
## [1] 0 0 0 0
##
## parameters
## [1] 1.031384 2.130883 1.047181 2.961262 1.005459
##
## globalCost
## [1] 984.3485
##
## attr(,"class")
## [1] "gfpop"
The vector changepoints
gives the last index of each segment. It always ends with the length of the vector vectData
.
The vector states
contains the states in which lies each mean. The length of this vector is the same as the length of changepoint
.
The vector forced
is a boolean vector. A forced element means that two consecutive means have been forced to satisfy the constraint. For example, the "up" edge with parameter is forced if .
The vector parameters
contains the inferred means/parameters of the successive segments.
The number globalCost
is equal to , the overall cost of the segmented data.
Some examples
Isotonic regression
The isotonic regression infers a sequence of nondecreasing means.
n <- 1000
mydata <- dataGenerator(n, c(0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1), c(0, 0.5, 1, 1.5, 2, 2.5, 3), sigma = 1)
myGraphIso <- graph(penalty = 2*log(n), type = "isotonic")
gfpop(data = mydata, mygraph = myGraphIso, type = "mean")
## changepoints
## [1] 211 383 713 1000
##
## states
## [1] "Iso" "Iso" "Iso" "Iso"
##
## forced
## [1] 0 0 0
##
## parameters
## [1] 0.2898816 1.2191021 2.1403436 2.8936415
##
## globalCost
## [1] 1025.616
##
## attr(,"class")
## [1] "gfpop"
In this example, we use in gfpop
function a robust biweight gaussian cost with K = 1
and the min
parameter in order to infer means greater than 0.5
.
Fixed number of change-points
This algorithm is called segment neighborhood in the change-point litterature. In this example, we fixed the number of segments at with an isotonic constraint. The graph contains two "up" edges with no cycling.
n <- 1000
mydata <- dataGenerator(n, c(0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1), c(0, 0.5, 1, 1.5, 2, 2.5, 3), sigma = 1)
beta <- 0
myGraph <- graph(
Edge(0, 1,"up", beta),
Edge(1, 2, "up", beta),
Edge(0, 0, "null"),
Edge(1, 1, "null"),
Edge(2, 2, "null"),
StartEnd(start = 0, end = 2))
gfpop(data = mydata, mygraph = myGraph, type = "mean")
## changepoints
## [1] 298 600 1000
##
## states
## [1] "0" "1" "2"
##
## forced
## [1] 0 0
##
## parameters
## [1] 0.5195712 1.7562499 2.6745127
##
## globalCost
## [1] 1061.791
##
## attr(,"class")
## [1] "gfpop"
Robust up-down with constrained starting and ending states
In presence of outliers we need a robust loss (biweight). We can also force the starting and ending state and a minimal gap between the means (here equal to 1
)
n <- 1000
mydata <- dataGenerator(n, c(0.1,0.3,0.5,0.8,1), c(0,1,0,1,0), sigma = 1) + 5*(rbinom(n, 1, 0.05)) - 5*(rbinom(n, 1, 0.05))
beta <- 2*log(n)
myGraph <- graph(
Edge(0, 1, "up", penalty = beta, gap = 1, K = 3),
Edge(1, 0, "down", penalty = beta, gap = 1, K = 3),
Edge(0, 0, "null"),
Edge(1, 1, "null"),
StartEnd(start = 0, end = 0))
gfpop(data = mydata, mygraph = myGraph, type = "mean")
## changepoints
## [1] 138 305 493 818 1000
##
## states
## [1] "0" "1" "0" "1" "0"
##
## forced
## [1] 1 1 0 1
##
## parameters
## [1] 0.01850678 1.01850678 0.01850678 1.02272647 0.02272647
##
## globalCost
## [1] 1079.697
##
## attr(,"class")
## [1] "gfpop"
If we skip all these constraints and use a standard fpop algorithm, the result is the following
myGraphStd <- graph(penalty = 2*log(n), type = "std")
gfpop(data = mydata, mygraph = myGraphStd, type = "mean")
## changepoints
## [1] 17 18 31 32 33 48 51 52 65 66 88 89 111 112 129 130 149 150 174 176 194 195 205 206 220 221 227 229
## [29] 284 285 340 341 358 359 407 408 424 425 428 442 449 450 473 474 479 480 486 487 510 512 549 550 551 565 566 570
## [57] 571 574 590 591 595 596 613 614 627 629 639 640 701 702 778 779 796 797 818 847 848 873 875 878 882 928 929 933
## [85] 934 953 955 957 958 962 963 1000
## states
## [1] "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std"
## [24] "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std"
## [47] "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std"
## [70] "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std" "Std"
##
## forced
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ## 0 0
## [72] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## parameters
## [1] 0.37883666 -5.60737627 0.14265729 -4.92600986 5.76837573 0.46081049 -2.07025038 6.85184085 -0.08186373 6.07962475 0.16536163
## [12] -6.72115878 0.11779333 -6.03851292 0.38897505 6.36318506 1.22328631 -4.84795400 1.37996121 4.93788071 0.28409636 6.30089144
## [23] 0.79739606 -5.77855389 1.43330020 6.66018030 0.79843808 -4.43936467 1.17189869 7.11750913 0.30445508 5.63498352 0.07145084
## [34] -6.13223532 -0.10524605 6.53335020 0.29761434 4.42059093 -3.52301587 -0.41869828 1.60922633 -6.59086009 0.07051877 4.87531990
## [45] -2.14802188 7.64909748 -0.03687840 -7.05437136 0.62557676 5.83147112 1.26743138 6.64006018 -4.30260774 0.94530626 -4.62237889
## [56] 1.24394216 8.62245556 -1.51272772 1.14038368 6.62490433 0.97797273 6.91505783 0.45741709 -5.00510212 0.94368809 -4.22010595
## [67] 0.82243663 6.88636862 1.06272055 -4.38864051 0.97078589 -4.92447920 1.46891367 -3.97447863 1.27138347 -0.25178268 6.50770084
## [78] 0.66843446 4.86486110 -0.36798139 -3.52991185 0.28266676 -5.37296951 1.26764318 6.32132912 -0.29253887 5.10483206 -0.24247171
## [89] 7.11097890 -0.62541613 -5.62987116 0.27378843
##
## globalCost
## [1] 1747.779
##
## attr(,"class")
## [1] "gfpop"
abs edge
With a unique "abs" edge, we impose a difference between the means of size at least 1.
n <- 10000
mydata <- dataGenerator(n, c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1), c(0, 1, 0, 2, 1, 2, 0, 1, 0, 1), sigma = 0.5)
beta <- 2*log(n)
myGraph <- graph(
Edge(0, 0,"abs", penalty = beta, gap = 1),
Edge(0, 0,"null"))
gfpop(data = mydata, mygraph = myGraph, type = "mean")
## changepoints
## [1] 1000 1999 3000 4000 5000 6000 7000 8002 9000 10000
##
## states
## [1] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
##
## forced
## [1] 0 1 0 0 1 0 1 1 0
##
## parameters
## [1] 0.008077166 1.023073852 0.023073852 1.994958839 0.990916305 1.990916305 0.001495140 1.001495140 0.001495140 1.024786144
##
## globalCost
## [1] 2494.115
##
## attr(,"class")
## [1] "gfpop"
Notice that some of the edges are forced, the vector forced
contains non-zero values.
Exponential decay
The null edge corresponds to an exponential decay state if its parameter is not equal to 1.
n <- 1000
mydata <- dataGenerator(n, c(0.2, 0.5, 0.8, 1), c(5, 10, 15, 20), sigma = 1, gamma = 0.966)
beta <- 2*log(n)
myGraphDecay <- graph(
Edge(0, 0, "up", penalty = beta),
Edge(0, 0, "null", 0, decay = 0.966)
)
g <- gfpop(data = mydata, mygraph = myGraphDecay, type = "mean")
g
## changepoints
## [1] 200 500 800 1000
##
## states
## [1] "0" "0" "0" "0"
##
## forced
## [1] 0 0 0
##
## parameters
## [1] 0.0052340381 0.0003157659 0.0004575503 0.0198921241
##
## globalCost
## [1] 935.4619
##
## attr(,"class")
## [1] "gfpop"
and we plot the result
gamma <- 0.966
len <- diff(c(0, g<img src="/tex/46024bfe5198e5e1ff626011648c9edf.svg?invert_in_darkmode&sanitize=true" align=middle width=558.8873268pt height=24.65753399999998pt/>parameters[i]*c(1, cumprod(rep(1/gamma,len[i]-1))))}
signal <- rev(signal)
ylimits <- c(min(mydata), max(mydata))
plot(mydata, type ='p', pch ='+', ylim = ylimits)
par(new = TRUE)
plot(signal, type ='l', col = 4, ylim = ylimits, lwd = 3)
Graph construction
In the gfpop
package, graphs are represented by a dataframe with 9 features and build with the R functions Edge
, Node
, StartEnd
and graph
.
emptyGraph <- graph()
emptyGraph
## [1] state1 state2 type penalty parameter K a min max
## <0 rows> (or 0-length row.names)
state1
is the starting node of an edge, state2
its ending node. type
is one of the available edge type ("null", "std", "up", "down", "abs"). penalty
is a nonnegative parameter: the additional cost to consider when we move within the graph using a edge (or stay on the same node). parameter
is annother nonnegative parameter, a characteristics of the edge, depending of its type (it is a decay if type is "null" and a gap otherwise). K
and a
are robust parameters. min
and max
are used to constrain the rang of value for the node parameter.
We add edges into a graph as follows
myGraph <- graph(
Edge("E1", "E1", "null"),
Edge("E1", "E2", "down", 3.1415, gap = 1.5)
)
myGraph
## state1 state2 type parameter penalty K a min max
## 1 E1 E1 null 1.0 0 Inf Inf NA NA
## 2 E1 E2 down 1.5 0 Inf Inf NA NA
we can only add edges to this dataframe using the object Edge
.
The graph can contain information on the starting and/or ending edge to use with the StartEnd
function.
beta <- 2 * log(1000)
myGraph <- graph(
Edge("Dw", "Dw", "null"),
Edge("Up", "Up", "null"),
Edge("Dw", "Up", "up", penalty = beta, gap = 1),
Edge("Dw", "Dw", "down", penalty = beta),
Edge("Up", "Dw", "down", penalty = beta),
StartEnd(start = "Dw", end = "Dw"))
myGraph
## state1 state2 type parameter penalty K a min max
## 1 Dw Dw null 1 0.00000 Inf Inf NA NA
## 2 Up Up null 1 0.00000 Inf Inf NA NA
## 3 Dw Up up 1 13.81551 Inf Inf NA NA
## 4 Dw Dw down 0 13.81551 Inf Inf NA NA
## 5 Up Dw down 0 13.81551 Inf Inf NA NA
## 6 Dw <NA> start NA NA NA NA NA NA
## 7 Dw <NA> end NA NA NA NA NA NA
Some graphs are often used: they are defined by default in the graph
function. To use these graphs, we specify a string type
equal to "std", "isotonic", "updown" or "relevant".
For example,
myGraphIso <- graph(penalty = 12, type = "isotonic")
myGraphIso
## state1 state2 type parameter penalty K a min max
## 1 Iso Iso null 1 0 Inf Inf NA NA
## 2 Iso Iso up 0 12 Inf Inf NA NA
The function Node
can be used to restrict the range of value for parameter associated to a node (called also a vertex). For example the following graph is an isotonic graph with inferred parameters between 0 et 1 only.
myGraph <- graph(
Edge("Up", "Up", "up", penalty = 3.1415),
Edge("Up", "Up"),
Node("Up", min = 0, max = 1)
)
myGraph
## state1 state2 type parameter penalty K a min max
## 1 Up Up up 0 3.1415 Inf Inf NA NA
## 2 Up Up null 1 0.0000 Inf Inf NA NA
## 3 Up Up node NA NA NA NA 0 1
Supplementary R functions
Data generator function
the dataGenerator
function is used to simulate data-points from a distribution of type
equal to “mean”, “poisson”, “exp”, “variance” or “negbin”. Standard deviation parameter sigma
and decay gamma
are specific to the Gaussian mean model. size
is linked to the R “rnbinom” function from R stats package.
Standard deviation estimation
We often need to estimate the standard deviation from the observed data to normalize the data or choose the edge penalties. The sdDiff
returns such an estimation with the default HALL method [Hall et al., 1990] well suited for time series with change-points.
More on the gfpop function and its C++ structure
To be done.