vrunge / gfpop Goto Github PK
View Code? Open in Web Editor NEWLicense: Other
License: Other
Previously, I used "type="gauss"" in gfpop function, and now, I am using "type="mean"". However, I got different values for the cost function. Could you please help me to figure out why the value of globalcost is different in two versions of the package?
I tried to calculate the cost function for one example;
Calculate.cost <- function(data ,changepoints,mean,penalty){
mean.vector <- rep(mean, diff(append(0,changepoints)))
sum.vector <- sum((data-mean.vector)^2)
cost <- sum.vector + penalty*(length(changepoints)-1)
return(cost)
}
n <- 1000
myData <- dataGenerator(n, c(0.1,0.3,0.5,0.8,1), c(1,2,1,3,1), sigma = 1)
myGraph <- graph(penalty = 2*log(n), type = "updown")
fit <- gfpop(data = myData, mygraph = myGraph, type = "mean")
Calculate.cost(data=myData, changepoints=fit$changepoints, mean=fit$parameters, penalty= 2*log(n))
and got
> Calculate.cost(data=myData, changepoints=fit$changepoints, mean=fit$parameters, penalty= 2*log(n))
[1] 1062.506
> fit
$changepoints
[1] 100 289 501 800 1000
$states
[1] "Dw" "Up" "Dw" "Up" "Dw"
$forced
[1] 0 0 0 0
$parameters
[1] 1.0702011 2.0248682 1.1097585 3.0439367 0.9613254
$globalCost
[1] 1021.059
attr(,"class")
[1] "gfpop"
When I installed this package by using remotes::install_github(repo="vrunge/gfpop")
, and it shows me an error
ListPiece.cpp: In member function 'void ListPiece::test()':
ListPiece.cpp:731:48: error: 'pow10' was not declared in this scope
if(fabs(rightEval - leftEval) > pow10(-10))
^
make: *** [C:/PROGRA~1/R/R-36~1.1/etc/i386/Makeconf:215: ListPiece.o] Error 1
ERROR: compilation failed for package 'gfpop'
* removing 'C:/Users/af2329/Documents/R/win-library/3.6/gfpop'
* restoring previous 'C:/Users/af2329/Documents/R/win-library/3.6/gfpop'
(converted from warning) installation of package ‘C:/Users/af2329/AppData/Local/Temp/RtmpK27q1r/file2454140535f4/gfpop_0.2.2.tar.gz’ had non-zero exit status
I checked the dependencies, and they should be all good. Could you help me fix this issue?
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] remotes_2.1.0 devtools_2.2.1 usethis_1.5.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 rstudioapi_0.10 magrittr_1.5 pkgload_1.0.2 R6_2.4.1 rlang_0.4.2
[7] fansi_0.4.0 tools_3.6.1 pkgbuild_1.0.6 sessioninfo_1.1.1 cli_2.0.0 withr_2.1.2
[13] ellipsis_0.3.0 assertthat_0.2.1 digest_0.6.23 rprojroot_1.3-2 crayon_1.3.4 processx_3.4.1
[19] callr_3.4.0 fs_1.3.1 ps_1.3.0 curl_4.3 testthat_2.3.1 memoise_1.1.0
[25] glue_1.3.1 compiler_3.6.1 desc_1.2.0 backports_1.1.5 prettyunits_1.0.2
Hi Vincent, and CC @tdhock
These two bits of code should produce the exact same thing, right?
std_default <- gfpop::graph(type = "std")
and
std_custom <- gfpop::graph(
gfpop::Edge(state1 = "Std", state2 = "Std", type = "null", gap = 1, penalty = 0, K = Inf, a = 0),
gfpop::Edge(state1 = "Std", state2 = "Std", type = "std", gap = 0, penalty = 0, K = Inf, a = 0)
)
But testthat
doesn't agree, because the min
and max
columns of the std_default
are numeric, but they are logical in std_custom
.
> library(testthat)
> expect_equal(std_default, std_custom)
# Error: `std_default` not equal to `std_custom`.
# Component “min”: Modes: numeric, logical
# Component “min”: target is numeric, current is logical
# Component “max”: Modes: numeric, logical
# Component “max”: target is numeric, current is logical
Do you agree that these should be equal? If so, I think it's as easy as just casting min
and max
as numeric in the Edge()
function. So, changing this line:
Line 39 in e756a95
To:
data.frame(state1, state2, type, parameter, penalty, K, a,
min = as.numeric(NA), max = as.numeric(NA), stringsAsFactors = FALSE)
If I do that on my end and then name it Edge_new()
, test_that seems happy:
std_custom_revised <- gfpop::graph(
Edge_new(state1 = 'Std', state2 = 'Std', type = 'null', gap = 1, penalty = 0, K = Inf, a = 0),
Edge_new(state1 = 'Std', state2 = 'Std', type = 'std', gap = 0, penalty = 0, K = Inf, a = 0)
)
> expect_equal(std_default, std_custom_revised)
>
Hi I try to define a graph to find change points of the ECG signal.
The data I used and the code will be attached.
This is my code :
library(gfpop)
data_ecg <- read.csv("Test1.csv")
myGraph <- graph( type ="updown",
Edge("b", "P", "up", 8000000, gap=2000),
Edge("P", "x1", "down", 0 , gap=500),
Edge("x1", "Q", "down", 0, gap=0),
Edge("Q", "R", "up", 8000000, gap=2000),
Edge("R", "S", "down", 0, gap=5000),
Edge("S", "o1", "up", 0, gap=2000),
Edge("o1", "T", "up", 0, gap=1000),
Edge("T", "o2", "down", 0, gap=0),
Edge("o2", "o3", "down", 0, gap=0),
Edge("o3", "b", "down", 0, gap=0))
gfpop(data = data_ecg, mygraph = myGraph, type = "mean")
The Error I come across is :
Error in explore(mynewgraph) : Not all path have a recursive edge
I expected that the gfpop would run without error, but I did not find this error instructive to solve the issue, can you help me with resolving it?
Data link: https://drive.google.com/drive/folders/1K4Ru4y-1skY49cmUh9T5yWxGejIp-Ff3?usp=sharing
Today I imported a .csv file specifying a constraint graph and passed it through gfpop::graph()
. I tried to use that graph to run gfpop, but got an error that was a bit confusing:
Error in x[[jj]][iseq] <- vjj : replacement has length zero
The problem is that, when I import a graph from file, I need to set stringsAsFactors
to false. I saw the same problem and solution with read.csv
and data.table::fread
.
I worked through that issue here in gfpop-gui and @tdhock suggested posting here.
Example:
# This graph works perfectly
iso_graph <- gfpop::graph(penalty = 15, type = "isotonic")
data <- gfpop::dataGenerator(50, changepoints = c(1), parameters = c(1))
gfpop::gfpop(data = data, mygraph = iso_graph, type = "mean")$changepoints
# [1] 50
# If I export and re-import the graph, it gives a confusing error
write.csv(iso_graph, "graphtest.csv", row.names = FALSE)
iso_graph_import <- gfpop::graph(read.csv("graphtest.csv"))
gfpop::gfpop(data = gfpop_data$data, mygraph = iso_graph_import, type = "mean")$changepoints
# Error in x[[jj]][iseq] <- vjj : replacement has length zero
# Specifying that strings shouldn't be factors fixes the problem
iso_graph_import <- gfpop::graph(read.csv("graphtest.csv", stringsAsFactors = FALSE))
gfpop::gfpop(data = data, mygraph = iso_graph_import, type = "mean")$changepoints
# [1] 50
hey @vrunge
DANGER : code broken in August in preparation for big update!
suggestion: rather than always pushing commits to master, I would suggest creating a new branch and pushing to that, in order to leave master in a working state. https://guides.github.com/introduction/flow/
after you test your new commits (maybe using CI, e.g. https://docs.travis-ci.com/user/languages/r/ ) then you would merge them into master, using a pull request https://help.github.com/en/articles/about-pull-requests
Hi @vrunge I think your code is great, and it would be even better if it was easy to understand the internals. When you get a chance, can you please write a README / vignette about how the C++ code is organized?
Thanks
Hi @vrunge my PhD student @AtiyehFtn is running gfpop (see precise version in install_github line below) and running out of error because of a memory leak. I investigated on my linux machine using valgrind and I observe a memory leak:
tdhock@recycled:~/R/gfpop-bug$ R --vanilla -d valgrind < gfpop.R
==7799== Memcheck, a memory error detector
==7799== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7799== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==7799== Command: /home/tdhock/lib/R/bin/exec/R --vanilla
==7799==
R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: i686-pc-linux-gnu (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> set.seed(1)
> x <- rnorm(100)
> g <- gfpop::graph(type="std")
> w <- rep(1, length(x))
> fit <- gfpop::gfpop(x, w, g)
> print(fit)
$changepoints
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[20] 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
[39] 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
[58] 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77
[77] 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
[96] 97 98 99 100
$states
integer(0)
$forced
integer(0)
$means
[1] 0.183643324 -0.835628612 1.595280802 0.329507772 -0.820468384
[6] 0.487429052 0.738324705 0.575781352 -0.305388387 1.511781168
[11] 0.389843236 -0.621240581 -2.214699887 1.124930918 -0.044933609
[16] -0.016190263 0.943836211 0.821221195 0.593901321 0.918977372
[21] 0.782136301 0.074564983 -1.989351696 0.619825748 -0.056128740
[26] -0.155795507 -1.470752384 -0.478150055 0.417941560 1.358679552
[31] -0.102787727 0.387671612 -0.053805041 -1.377059557 -0.414994563
[36] -0.394289954 -0.059313397 1.100025372 0.763175748 -0.164523596
[41] -0.253361680 0.696963375 0.556663199 -0.688755695 -0.707495157
[46] 0.364581962 0.768532925 -0.112346212 0.881107726 0.398105880
[51] -0.612026393 0.341119691 -1.129363096 1.433023702 1.980399899
[56] -0.367221476 -1.044134626 0.569719627 -0.135054604 2.401617761
[61] -0.039240003 0.689739362 0.028002159 -0.743273209 0.188792300
[66] -1.804958629 1.465554862 0.153253338 2.172611670 0.475509529
[71] -0.709946431 0.610726353 -0.934097632 -1.253633400 0.291446236
[76] -0.443291873 0.001105352 0.074341324 -0.589520946 -0.568668733
[81] -0.135178615 1.178086997 -1.523566800 0.593946188 0.332950371
[86] 1.063099837 -0.304183924 0.370018810 0.267098791 -0.542520031
[91] 1.207867806 1.160402616 0.700213650 1.586833455 0.558486426
[96] -1.276592208 -0.573265414 -1.224612615 -0.473400636
$cost
[1] 0
attr(,"class")
[1] "gfpop"
> devtools::session_info()
Session info ------------------------------------------------------------------
setting value
version R version 3.6.0 (2019-04-26)
system i686, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
tz America/Phoenix
date 2019-10-24
Packages ----------------------------------------------------------------------
package * version date source
base * 3.6.0 2019-05-15 local
compiler 3.6.0 2019-05-15 local
datasets * 3.6.0 2019-05-15 local
devtools 1.13.5 2018-02-18 CRAN (R 3.5.0)
digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
gfpop 0.1.3 2019-10-24 local
graphics * 3.6.0 2019-05-15 local
grDevices * 3.6.0 2019-05-15 local
memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
methods * 3.6.0 2019-05-15 local
Rcpp 1.0.1 2019-03-17 CRAN (R 3.6.0)
stats * 3.6.0 2019-05-15 local
utils * 3.6.0 2019-05-15 local
withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
>
==7799==
==7799== HEAP SUMMARY:
==7799== in use at exit: 34,689,521 bytes in 15,335 blocks
==7799== total heap usage: 38,721 allocs, 23,386 frees, 79,644,002 bytes allocated
==7799==
==7799== 56 bytes in 1 blocks are definitely lost in loss record 46 of 2,078
==7799== at 0x4830FF5: operator new(unsigned int) (/home/tdhock/R/valgrind-3.14.0/coregrind/m_replacemalloc/vg_replace_malloc.c:328)
==7799== by 0xBA731DA: Omega::Omega(Graph, Bound, Robust) (/home/tdhock/R/gfpop/src/Omega.cpp:28)
==7799== by 0xBA85171: gfpopTransfer(Rcpp::Vector<14, Rcpp::PreserveStorage>, Rcpp::Vector<14, Rcpp::PreserveStorage>, Rcpp::DataFrame_Impl<Rcpp::PreserveStorage>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, double, double, double, double) (/home/tdhock/R/gfpop/src/main.cpp:90)
==7799== by 0xBA7B99E: _gfpop_gfpopTransfer (/home/tdhock/R/gfpop/src/RcppExports.cpp:22)
==7799== by 0x4902968: R_doDotCall (/home/tdhock/R/R-3.6.0/src/main/dotcode.c:586)
==7799== by 0x493DDC3: bcEval (/home/tdhock/R/R-3.6.0/src/main/eval.c:7283)
==7799== by 0x494775D: Rf_eval (/home/tdhock/R/R-3.6.0/src/main/eval.c:620)
==7799== by 0x49495D7: R_execClosure (/home/tdhock/R/R-3.6.0/src/main/eval.c:1780)
==7799== by 0x494A500: Rf_applyClosure (/home/tdhock/R/R-3.6.0/src/main/eval.c:1706)
==7799== by 0x493E844: bcEval (/home/tdhock/R/R-3.6.0/src/main/eval.c:6733)
==7799== by 0x494775D: Rf_eval (/home/tdhock/R/R-3.6.0/src/main/eval.c:620)
==7799== by 0x49495D7: R_execClosure (/home/tdhock/R/R-3.6.0/src/main/eval.c:1780)
==7799==
==7799== 56 bytes in 1 blocks are definitely lost in loss record 47 of 2,078
==7799== at 0x4830FF5: operator new(unsigned int) (/home/tdhock/R/valgrind-3.14.0/coregrind/m_replacemalloc/vg_replace_malloc.c:328)
==7799== by 0xBA7798C: Piece::operator_std_min_argmin(int, int&, double&, Bound const&) (/home/tdhock/R/gfpop/src/Piece.cpp:647)
==7799== by 0xBA7169E: Omega::fpop1d_graph_std(Data const&) (/home/tdhock/R/gfpop/src/Omega.cpp:278)
==7799== by 0xBA8525A: gfpopTransfer(Rcpp::Vector<14, Rcpp::PreserveStorage>, Rcpp::Vector<14, Rcpp::PreserveStorage>, Rcpp::DataFrame_Impl<Rcpp::PreserveStorage>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, double, double, double, double) (/home/tdhock/R/gfpop/src/main.cpp:93)
==7799== by 0xBA7B99E: _gfpop_gfpopTransfer (/home/tdhock/R/gfpop/src/RcppExports.cpp:22)
==7799== by 0x4902968: R_doDotCall (/home/tdhock/R/R-3.6.0/src/main/dotcode.c:586)
==7799== by 0x493DDC3: bcEval (/home/tdhock/R/R-3.6.0/src/main/eval.c:7283)
==7799== by 0x494775D: Rf_eval (/home/tdhock/R/R-3.6.0/src/main/eval.c:620)
==7799== by 0x49495D7: R_execClosure (/home/tdhock/R/R-3.6.0/src/main/eval.c:1780)
==7799== by 0x494A500: Rf_applyClosure (/home/tdhock/R/R-3.6.0/src/main/eval.c:1706)
==7799== by 0x493E844: bcEval (/home/tdhock/R/R-3.6.0/src/main/eval.c:6733)
==7799== by 0x494775D: Rf_eval (/home/tdhock/R/R-3.6.0/src/main/eval.c:620)
==7799==
==7799== 56 bytes in 1 blocks are definitely lost in loss record 48 of 2,078
==7799== at 0x4830FF5: operator new(unsigned int) (/home/tdhock/R/valgrind-3.14.0/coregrind/m_replacemalloc/vg_replace_malloc.c:328)
==7799== by 0xBA796E5: Piece::min_function(Piece*, double) (/home/tdhock/R/gfpop/src/Piece.cpp:1001)
==7799== by 0xBA715EC: Omega::fpop1d_graph_std(Data const&) (/home/tdhock/R/gfpop/src/Omega.cpp:284)
==7799== by 0xBA8525A: gfpopTransfer(Rcpp::Vector<14, Rcpp::PreserveStorage>, Rcpp::Vector<14, Rcpp::PreserveStorage>, Rcpp::DataFrame_Impl<Rcpp::PreserveStorage>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, double, double, double, double) (/home/tdhock/R/gfpop/src/main.cpp:93)
==7799== by 0xBA7B99E: _gfpop_gfpopTransfer (/home/tdhock/R/gfpop/src/RcppExports.cpp:22)
==7799== by 0x4902968: R_doDotCall (/home/tdhock/R/R-3.6.0/src/main/dotcode.c:586)
==7799== by 0x493DDC3: bcEval (/home/tdhock/R/R-3.6.0/src/main/eval.c:7283)
==7799== by 0x494775D: Rf_eval (/home/tdhock/R/R-3.6.0/src/main/eval.c:620)
==7799== by 0x49495D7: R_execClosure (/home/tdhock/R/R-3.6.0/src/main/eval.c:1780)
==7799== by 0x494A500: Rf_applyClosure (/home/tdhock/R/R-3.6.0/src/main/eval.c:1706)
==7799== by 0x493E844: bcEval (/home/tdhock/R/R-3.6.0/src/main/eval.c:6733)
==7799== by 0x494775D: Rf_eval (/home/tdhock/R/R-3.6.0/src/main/eval.c:620)
==7799==
==7799== LEAK SUMMARY:
==7799== definitely lost: 168 bytes in 3 blocks
==7799== indirectly lost: 0 bytes in 0 blocks
==7799== possibly lost: 0 bytes in 0 blocks
==7799== still reachable: 34,689,353 bytes in 15,332 blocks
==7799== of which reachable via heuristic:
==7799== newarray : 3,880 bytes in 1 blocks
==7799== suppressed: 0 bytes in 0 blocks
==7799== Reachable blocks (those to which a pointer was found) are not shown.
==7799== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==7799==
==7799== For counts of detected and suppressed errors, rerun with: -v
==7799== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
tdhock@recycled:~/R/gfpop-bug$
it says there is a "new" on this line ==7799== by 0xBA731DA: Omega::Omega(Graph, Bound, Robust) (/home/tdhock/R/gfpop/src/Omega.cpp:28) that has no corresponding "delete" -- can you please investigate/fix?
Hi @vrunge!
My name is Diego (@diego-urgell), and I am developing BinSeg, a changepoint analysis package for Google Summer of Code, using the Binary Segmentation algorithm. One of the objectives of this package is to support several distributions, among them Negative Binomial. The gfpop
package also implements it, and I wonder if you could please help me out with an issue I found regarding the cost function. I would really appreciate it 😄.
My current implementation of the cost function is the following:
double costFunction(int start, int end){
double lSum = this -> summaryStatistics -> getLinearSum(start, end);
double mean = this -> summaryStatistics -> getMean(start, end);
double varN = this -> summaryStatistics -> getVarianceN(start, end, false);
int N = end - start + 1;
if (varN <= 0) return INFINITY;
double var = varN/N;
double r_dispersion = ceil(fabs(pow(mean, 2)/(var-mean)));
double p_success = mean/var;
return (lSum * log(1-p_success) + N * r_dispersion * log(p_success));
}
As you can see, on every possible segment I compute the overdispersion parameter r, and then the probability of success p by using the following parameterization that depends on the mean and the variance (Lindén and Mäntyniemi, 2011):
Then, I calculate the cost using the equation described by Cleynen et al (2011) (the one used in Segmentator3IsBack
).
I tested the results against the gfpop
package. Even though both packages produces similar changepoints (not always the same, but that is expected given the algorithms), there are some differences regarding the overall cost and the parameter estimation that troubled me.
The first thing I found is that the overdispersion parameter is computed at the beginning of the algorithm. Windows of size 100 are set, and the dispersion parameter is computed for each of them. Then, the "average over dispersion" is computed and the whole signal is divided by this value. Why did you prefer to do this, instead of computing an over dispersion parameter for each segment?
Also, the overall cost of the model computed by gfpop
is different from the one computed by my implementation on BinSeg
. I believe you are also optimizing the log likelihood, so I wonder why they differ. Could you please share with me your cost function, and the equations you are using to approximate the probability of success parameter?
Thank you so much for your time!!
Cleynen, A., Koskas, M., Lebarbier, E. et al. Segmentor3IsBack: an R package for the fast and exact segmentation of Seq-data. Algorithms Mol Biol 9, 6 (2014). https://doi.org/10.1186/1748-7188-9-6
Lindén, A., & Mäntyniemi, S. (2011). Using the negative binomial distribution to model overdispersion in ecological count data. Ecology, 92(7), 1414-1421. doi:10.1890/10-1831.1
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.