hsloot / rmo Goto Github PK
View Code? Open in Web Editor NEWAn R package for Marshall--Olkin distributions and copulas.
Home Page: https://hsloot.github.io/rmo/
License: GNU General Public License v3.0
An R package for Marshall--Olkin distributions and copulas.
Home Page: https://hsloot.github.io/rmo/
License: GNU General Public License v3.0
After moving to a more sophisticated C++ backend, many parameter-checks are performed there and the original assertions became redundant. They should be removed and the error messages of the backend should be reviewed.
The functions valueOf,
uexIntensities,
exIntensities, and
exQMatrixdo not pass-through further arguments to the
integrate` routine. This should be changed.
scr/jump_generators.cpp
to expose all jump generators.The implementation of the compound Poisson LFM does not properly take into account the drift. In particular, it is not considered that the compound Poisson subordinator, simulated in sample_cpp
, can drift over multiple barrier_values
before the next waiting_time
/killing_time
is attained.
The test concept is based on using the function expect_equal_sampling_result
to easily compare to sampling algorithms against each other for a given parameter-set, a number of simulations, and dimension.
It was already evident, that testing the sample_cpp
function in this concept was not straightforward. Thus a change might be necessary.
The function is_within(i, j)
produces wrong results for very high values of j>2^31-1
, see also #35.
The reason for this is undocumented, implementation-dependent, or maybe just unknown behaviour (to me) how conversion is implemented in Rcpp
and c++
. R
prevents integer overflow, by conversing an integer higher than 2^31-1
to a double
. However, if a (theoretical) integer between 2^31-1
(32bit) and 2^63-1
(64bit) is passed from R
to an Rcpp
-function taking a long int
there might be some loss due to previous internal conversions to a double
. See also https://stackoverflow.com/a/1848762 for a discussion on the largest representable integer in a 64bit double
.
library(rmo)
rmo:::is_within(1L, 2L^55L+1L) ## FALSE, should be TRUE
#> [1] FALSE
Created on 2020-02-16 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.2 (2019-12-12)
#> os macOS Catalina 10.15.3
#> system x86_64, darwin19.2.0
#> ui unknown
#> language (EN)
#> collate de_DE.UTF-8
#> ctype de_DE.UTF-8
#> tz Europe/Berlin
#> date 2020-02-16
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
#> callr 3.4.1 2020-01-24 [1] CRAN (R 3.6.2)
#> cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.1)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.1)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.1)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.1)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
#> knitr 1.28 2020-02-06 [1] CRAN (R 3.6.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.1)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.2)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.1)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> rlang 0.4.4 2020-01-28 [1] CRAN (R 3.6.2)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
#> rmo * 0.2.0.0000 2020-02-10 [1] local
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.5 2020-01-11 [1] CRAN (R 3.6.1)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.1)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.1)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2)
#>
#> [1] /usr/local/lib/R/3.6/site-library
#> [2] /usr/local/Cellar/r/3.6.2/lib/R/library
It would be good a have the most recent benchmarks included on the website. For this reason, we have to write our own plotting routine such that these display only tagged commits and are separated by their algorithm.
As of now, all test are run only for specific parameter configurations. To ensure that all algorithms are properly working, multiple input combinations should be tested.
The test implementations of the simulation algorithms should be refactored into separate functions which are stored in helper__*.R
files in the tests/testthat
directory.
Then, tests should be always run with various different input configurations.
Perhaps, one can use the MonteCarlo
-package to facilitate these tests.
For more information on test setup, see the setup and teardown section of https://www.tidyverse.org/blog/2017/12/testthat-2-0-0.
We should switch the CI to Github actions. Look at these instructions to see how it can be done: https://ropenscilabs.github.io/actions_sandbox/.
For some boundary cases, the numerical integration might report divergence, even if the integral theoretically should converge (see reprex). This behaviour should somehow be avoided. Possible solutions:
All implemented numerical integrations should be checked if they have singularities close to zero (or infinity).
Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.
d <- 125
lambda <- 1
alpha <- 0.5
x0 <- 5e-4
intensities <- lambda * rmo::intensities_pareto(d, alpha, x0)
#> Error in integrate(f = function(u) {: the integral is probably divergent
Created on 2020-06-30 by the reprex package (v0.3.0)
The simulation tests, which test if a C++ implementation based on R's RNG and univariate sampling algorithms yield the same result as a naive R implementation, currently use separate C++ functions following the naming convention rtest__*
. It might be more clear to mock the production R-functions.
Use mockery's stub
function similarly to
## in tests/testthat/test_esm-model.R
...
test_that("ESM implementation for d = 2", {
stub(rmo_esm, 'Rcpp__rmo_esm', rtest__rmo_esm)
...
expect_equal_rn_generation(
"rmo_esm", "test__rmo_esm_bivariate",
args, n, use_seed)
}
...
As of now, rmo_lfm_cpp
requires that is_positive_number(rate)
is TRUE
. This is too strong since it would be valid to choose e.g.
rate=0, rate_killing=1, rate_drift=0 ## comonotone case
rate=0, rate_killing=0, rate_drift=1 ## independence case
The "production" sampling algorithms should be subject to statistical unit tests. A general framework for this is outlined in https://github.com/hsloot/rmo/blob/master/other/integration-test.md.
Create a test utility class for providing a Kolmogorov-Smirnov or a chi-squared test (for discrete distributions). For MO distributions, subsequently, test the (appropriately) scaled joined minimum of each observation-row against a unit exponential distribution. For univariate distributions, test against the theoretical distribution. Boost/random
uses a similar test-framework with a test-failure if p<0.01.
At the moment, the valueOf
-function uses signatures c("BernsteinFunction", "numeric", "integer")
. This leads to impractical consequences. E.g. the third argument must be an actual integer and not and integerish double. Also for some applications a complex argument might be appropriate and for some Bernstein functions, the implementation might be able to process complex input.
Only dispatch on first argument and argument check the rest.
The rmo_esm
sampling algorithm does not properly deal with zero rates. By convention, a rate of zero should correspond to Inf
, but rexp
produces NA
. Problem is also present in the corresponding test.
library(rmo)
rmo_esm(1, 2, c(1, 1, 0))
#> Warning in rexp(1, intensities[[j]]): NAs produziert
#> [,1] [,2]
#> [1,] NaN NaN
Created on 2019-11-14 by the reprex package (v0.3.0)
In Sec. 6.3 of the following reference, it is described how an alpha-stable subordinator can be approximated by a CPP with Pareto-type jumps and a drift.
Fernández Loroño, Lexuri. "Selected topics in financial engineering: first-exit times and dependence structures of Marshall-Olkin Kind." (2015).
It would be good to implement
Pareto
which implements the sampling algorithm via the probability integral transform.For the implementation of a Lévy-frailty sampling algorithm for compound Poisson subordinators, we have to decide which families should be considered for the jump distributions.
Furthermore, let me know if you have an idea for a flexible and easy way to provide as many jump distributions as possible.
Lévy-fraily model: https://hsloot.github.io/rmo/articles/The-Levy-frailty-model.html.
Use the checkmate
package for simple assert statements (such as value is scalar and integerish).
A clear and concise description of what you want to happen.
It was discovered in the integration test that the representation of the distribution via intensities might lead to numerical problems for large dimension parameters and certain parametrisations.
rmo_ex_arnold
and large d.rmo_lfm_cpp
sometimes produces ties for large d.library(rmo)
n <- 1e4L
alpha <- 0.4
bf <- AlphaStableBernsteinFunction(alpha=alpha)
d <- 10L
valueOf(bf, d, 0L)
#> [1] 2.511886
ex_intensities <- ex_intensities_alpha_stable(d, alpha=alpha)
scale <- sum(ex_intensities * sapply(1:d, function(x) choose(d, x)))
## Kolmogorov-Smirnov test should not reject
ks.test(scale * apply(rmo_ex_arnold(n, d, ex_intensities), 1, min), pexp)
#>
#> One-sample Kolmogorov-Smirnov test
#>
#> data: scale * apply(rmo_ex_arnold(n, d, ex_intensities), 1, min)
#> D = 0.0069895, p-value = 0.713
#> alternative hypothesis: two-sided
## Should be very similar
all.equal(scale, valueOf(bf, d, 0L))
#> [1] "Mean relative difference: 2.618647e-07"
d <- 125L
valueOf(bf, d, 0L)
#> [1] 6.898648
ex_intensities <- ex_intensities_alpha_stable(d, alpha=alpha)
scale <- sum(ex_intensities * sapply(1:d, function(x) choose(d, x)))
## Kolmogorov-Smirnov test should not reject
ks.test(scale * apply(rmo_ex_arnold(n, d, ex_intensities), 1, min), pexp)
#>
#> One-sample Kolmogorov-Smirnov test
#>
#> data: scale * apply(rmo_ex_arnold(n, d, ex_intensities), 1, min)
#> D = 0.067735, p-value < 2.2e-16
#> alternative hypothesis: two-sided
## Should be very similar
all.equal(scale, valueOf(bf, d, 0L))
#> [1] "Mean relative difference: 0.0002861992"
Created on 2020-09-19 by the reprex package (v0.3.0)
library(rmo)
n <- 1e4L
d <- 125L
set.seed(1623L)
length(unique(apply(rmo_lfm_cpp(n, d, 0.1, 0.2, 0.4, "rexp", list("rate" = 0.6)), 1, min)))
#> [1] 9998
Created on 2020-09-19 by the reprex package (v0.3.0)
Rcpp
implementationR
implementationR
implementationReview the implementation of the exogenous shock model in Rcpp
:
Lines 289 to 310 in cf85b82
Lines 213 to 283 in cf85b82
Review, simplify, and document the test-implementations in R
:
rmo/tests/testthat/helper__lfm_cpp.R
Lines 186 to 203 in cf85b82
rmo/tests/testthat/helper__lfm_cpp.R
Lines 208 to 258 in cf85b82
rmo/tests/testthat/helper__lfm_cpp.R
Lines 148 to 160 in cf85b82
rmo/tests/testthat/helper__lfm_cpp.R
Lines 166 to 178 in cf85b82
rmo/tests/testthat/helper__lfm_cpp.R
Lines 6 to 14 in cf85b82
rmo/tests/testthat/helper__lfm_cpp.R
Lines 19 to 76 in cf85b82
rmo/tests/testthat/helper__lfm_cpp.R
Lines 81 to 138 in cf85b82
Since we use a limited range of Rcpp's functionality, we might benefit from switching to cpp11 (see https://github.com/r-lib/cpp11). However, some of the currently exported functions and tests might not work and have to be changed. The main benefit would probably be reduced compilation time.
Currently, the LFM outsources the generation of the whole CPP history which is required to a dedicated method. This implies overhead and produces more problems (e.g. more overhead by re-allocation or the requirement to calculate the Laplace transform of the jump distribution for estimating the required capacity, see #49).
lintr
code-checkingThe R
-package lintr
allows to define static code tests to ensure certain code standards and styles are fulfilled.
lintr
setup according to code styleThe "production" sampling algorithms should be subject to statistical unit tests. A general framework for this is outlined in https://github.com/hsloot/rmo/blob/master/other/integration-test.md.
Create a test utility class for providing a Kolmogorov-Smirnov or a chi-squared test (for discrete distributions). For MO distributions, subsequently, test the (appropriately) scaled joined minimum of each observation-row against a unit exponential distribution. For univariate distributions, test against the theoretical distribution. Boost/random
uses a similar test-framework with a test-failure if p<0.01.
The current implementations of the sampling algorithms, rmo_esm
, rmo_arnold
, and rmo_ex_arnold
, do not perform any argument checks. To avoid any potential problems, the algorithms should check the input arguments for proper type and range as well as matching dimensions.
As argument checks are only performed once in the beginning, code readability instead of speed should be the main concern. As these checks are only performed once for all simulations, it is more important that the assertions are readable than that they are fast.
For rmo_esm
, rmo_arnold
, and rmo_ex_arnold
it has to be checked that the number of sample n
and the dimension d
are counting numbers:
assert_that(is.count(n), is.count(d))
For rmo_esm
and rmo_arnold
, it has to be checked that intensities
is a numeric, non-negative vector of length 2^d-1
and that the marginal rates are strictly positive:
assert_that(is.numeric(intensities), length(intensities) == 2^d-1, all(intensities >= 0))
marginal_intensities <- numeric(d)
for (i in 1:d) {
for (j in :1:(2^d-1)) {
if (is_within(i, j) { ## use `is_within` from sets.R
marginal_intensities[i] <- marginal_intensities[i] + intensities[j]
}
}
}
assert_that(all(marginal_intensities > 0)) ## could also raise a warning instead of an error
For rmo_ex_arnold
, it has to be checked that ex_intensities
is a numeric, non-negative random vector with lentgh d
and that the marginal rates are strictly positive:
assert_that(is.numeric(ex_intensites), all(ex_intensities >= 0), length(ex_intensities) == d)
marginal_intensity <- sum(vapply(0:(d-1), function(y) choose(d-1, y) * ex_intensities[[y+1]], FUN.VALUE=0.5))
assert_that(marginal_intensity > 0)
A comparison of multiple packages to implement assertions can be found here: https://cran.r-project.org/web/packages/checkr/vignettes/assertive-programming.html.
Add an implementation of the modified Arnold model for the exchangeable Marshall-Olkin distribution in plain R
.
The following algorithm is given in section 3.2.2 in Mai, J. F., & Scherer, M. (2017). Simulating Copulas.
An implementation that I found in my old files, but which has to be tested and documented is listed below. It is based on the parametrised Marshall-Olkin distribution.
rmo_exArnold <- function(n, a) {
t(vapply(1:n, FUN = function(x) rmo_exArnold_(a), FUN.VALUE = 0.5, USE.NAMES = FALSE))
}
rmo_exArnold_ <- function(a, shuffle=TRUE) {
d <- length(a)
value <- numeric(d)
ex_intensities <- vapply(1:d,
function(x) sum(vapply(1:x, function(y) (-1)^(y-1) * choose(x-1, y-1) * a[[d-x+y]],
FUN.VALUE = 0.5 )), FUN.VALUE = 0.5)
transition_probabilities <- vapply(1:d, function(x) choose(d, x), FUN.VALUE = 0.5) *
ex_intensities
total_intensity <- sum(transition_probabilities)
transition_probabilities <- transition_probabilities / total_intensity
epsilon <- rexp(1, total_intensity)
affected <- sample.int(d, 1, replace = TRUE, transition_probabilities)
Y <- rep(epsilon, affected)
if (d == affected)
return(Y)
value <- c(Y, epsilon + rmo_exArnold_(a[d-affected], shuffle=FALSE))
if (shuffle) {
perm <- sample.int(d, d, replace = FALSE)
value <- value[perm]
}
value
}
The implementation based on the reparametrisation seems to be overly complicated. Consider the following alternative:
rmo_exArnold <- function(n, ex_intensities) {
d <- length(ex_intensities)
out <- matrix(0, nrow=n, ncol=d)
for (i in 1:n) {
value <- rmo_exArnold_sorted(ex_intensities)
perm <- sample.int(d, d, replace = FALSE)
out[i, ] <- value[perm]
}
out
}
rmo_exArnold_sorted <- function(ex_intensities) {
d <- length(ex_intensities)
transition_probabilities <- vapply(1:d, function(x) choose(d, x), FUN.VALUE = 0.5) *
ex_intensities
total_intensity <- sum(transition_probabilities)
transition_probabilities <- transition_probabilities / total_intensity
epsilon <- rexp(1, total_intensity)
affected <- sample.int(d, 1, replace = TRUE, transition_probabilities)
Y <- rep(epsilon, affected)
if (d == affected)
return(Y)
for (i in 1:affected) { ## if possible, replace with binomial expression
ex_intensities <- ex_intensities[1:(d-i)] + ex_intensities[2:(d-i+1)]
}
c(Y, epsilon + rmo_exArnold_sorted(ex_intensities))
}
Further aspects:
do_random(n, FUN, arg_list)
which calls FUN
with arg_list
n
times, ensures the results are vectors of length d
and stores the result in a n x d
matrix.No additional context.
Rcpp
implementationR
implementationR
implementationReview the implementation of the exogenous shock model in Rcpp
:
Lines 44 to 67 in cf85b82
Review, simplify, and document the test-implementations in R
:
rmo/tests/testthat/helper__esm.R
Lines 24 to 39 in cf85b82
rmo/tests/testthat/helper__esm.R
Lines 6 to 16 in cf85b82
For a good collaboration, the background material must be accessible to all possible collaborators.
Write a section on
Mai, J. F., & Scherer, M. (2017). Simulating Copulas.
Currently, the guidelines on collaborations for this project are not very specific.
README.Rmd
for contributingCompositions of Bernstein functions are again Bernstein functions. This holds in particular for the inner functions being a linear function. This is easy to implement and will allow a larger variety of Bernstein functions to be created.
A possible implementation
## allClass-S4.R
CompositScaledBernsteinFunction <- setClass("CompositScaledBernsteinFunction",
contains = "BernsteinFunction",
slots = c(cscale = "numeric", original = "BernsteinFunction"))
## allValidity-S4.R
setValidity("CompositScaledBernsteinFunction",
function(object) {
qassert(object@cscale, "N1(0,)")
invisible(TRUE)
})
## allInitialize-S4.R
setMethod("initialize", "CompositScaledBernsteinFunction",
function(.Object, cscale = 1, original = AlgebraicBernsteinFunction2()) { # nolint
.Object@cscale <- cscale
.Object@original <- original
validObject(.Object)
invisible(.Object)
})
## valueOf-S4.R
setMethod("valueOf", "CompositScaledBernsteinFunction",
function(object, x, difference_order = 0L, cscale = 1, n = 1, k = 0, ...,
method = c("default", "levy", "stieltjes"),
tolerance = .Machine$double.eps^0.5) {
method <- match.arg(method)
if (!"default" == method) {
return(callNextMethod())
}
assert(combine = "or",
check_numeric(x, lower = 0, min.len = 1L, any.missing = FALSE),
check_complex(x, min.len = 1L, any.missing = FALSE))
qassert(difference_order, "X1[0,)")
qassert(cscale, "N1(0,)")
qassert(n, "X1(0,)")
qassert(k, "N1[0,)")
valueOf(object@original, x, difference_order, cscale*object@cscale, n, k, ...)
})
As a showcase, the following shows how this can be used to create the InverseGaussianBernsteinFunction
based on the Algebraic Bernstein function no. 2.
## allClass-S4.R
AlgebraicBernsteinFunction2 <- setClass("AlgebraicBernsteinFunction2",
contains = "CompleteBernsteinFunction",
slots = c(alpha = "numeric"))
## allValidity-S4.R
setValidity("AlgebraicBernsteinFunction2",
function(object) {
qassert(object@alpha, "N1(0,)")
invisible(TRUE)
})
## allInitialize-S4.R
setMethod("initialize", "AlgebraicBernsteinFunction2",
function(.Object, alpha = log2(2 - 0.5)) { # nolint
.Object@alpha <- alpha
validObject(.Object)
invisible(.Object)
})
## levyDensity-S4.R
setMethod("levyDensity", "AlgebraicBernsteinFunction2",
function(object) {
structure(
function(x) {
object@alpha / gamma(1 - object@alpha) * x ^ (-1 - object@alpha) * exp(-x)
},
lower = 0, upper = Inf, type = "continuous"
)
})
## stieltjesDensity-S4.R
setMethod("stieltjesDensity", "AlgebraicBernsteinFunction2",
function(object) {
structure(
function(x) {
sin(object@alpha * pi) / pi * (x - 1) ^ (object@alpha - 1) / x
},
lower = 1, upper = Inf, type = "continuous"
)
})
eta <- 2
valueOf(InverseGaussianBernsteinFunction(eta = eta), seq(0, 2, by = 0.25), 0L, cscale = 1)
valueOf(
ScaledBernsteinFunction(
scale = eta,
original = CompositScaledBernsteinFunction(
cscale = 2 / eta^2,
original = AlgebraicBernsteinFunction2(
alpha = 0.5
)
)
),
seq(0, 2, by = 0.25))
The custom assertions should be as simple as possible, follow the single responsibility principle, have descriptive names. The error messages should be informative and the corresponding variables should follow a naming convention.
is_positive_number
: Tests if a number is scalar and positive; not contained in assertthat-packageis_nonnegative_number
: Tests if a number is scalar and non-negative; not contained in assertthat-packageis_dimension
: Tests if a number is a natural number larger than 1is_32bit_complient_dimension
: Tests if a dimension for a Marshall--Olkin distribution is 32-bit compliant, i.e. if the number of distribution parameters will not exceed the 32-bit limit.is_mo_parameter
: Tests if a parameter vector of shock rates is a valid Marshall--Olkin parameter vector, i.e. if it has the correct length, all elements are non-negative numbers, and if the resulting marginal rates are positive.is_ex_mo_parameter
: Tests if a parameter vector of shock rates is a valid Marshall--Olkin parameter vector, i.e. if it has the correct length, all elements are non-negative numbers, and if the resulting marginal rate is positive.is_lfm_cpp_param
: Tests if a parameter list is a valid parameterisation of a Levy-frailty Marshall--Olkin model, i.e. if the rate, drift, and killing parameters are non-negative numbers and sum up to a positive number and if jump distribution is correctly parametrised.is_rjump_name
: Tests if the given name is a valid method to simulate jumps; jump distributions are currently limited to exponential and fixed jump distributions.is_rjump_param
: Tests if the given parameter list is valid for the given method name by performing a test run (warning: this might change the RNG state).At the moment, we are able to calculate the exchangeable intensities either directly or via their integral representation. For the main application, the input often might be a scaled version thereof (multiplied by a binomial coefficient).
Generalize the valueOf
-function such that the controlled integration error is calculated for the scaled value (binomial coefficient goes under the integral).
The function rmo_lfm_cpp
produces Inf
-values for the example code for the independence case. Also, the last two examples are marked with NOT RUN
. This should be removed.
library(rmo)
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 14393)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
#> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
#> [5] LC_TIME=German_Germany.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rmo_0.1.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.6.1 assertthat_0.2.1 magrittr_1.5 tools_3.6.1
#> [5] htmltools_0.4.0 yaml_2.2.0 Rcpp_1.0.3 stringi_1.4.3
#> [9] rmarkdown_1.17 highr_0.8 knitr_1.26 stringr_1.4.0
#> [13] xfun_0.11 digest_0.6.22 rlang_0.4.1 evaluate_0.14
rmo_lfm_cpp(10L, 2L, 0, 0, 1, "rposval", list("value"=1)) ## independence
#> [,1] [,2]
#> [1,] 1.1298525 Inf
#> [2,] Inf 0.32693263
#> [3,] Inf 0.09800674
#> [4,] 0.0532895 Inf
#> [5,] Inf 0.02729041
#> [6,] 0.5211671 Inf
#> [7,] 0.8131072 Inf
#> [8,] Inf 0.21877149
#> [9,] Inf 0.22773527
#> [10,] Inf 0.19788030
Created on 2019-11-27 by the reprex package (v0.3.0)
At the moment all sampling routines are implemented in the src
directory for the RRNGPolicy
. All core functionalities should be implemented as MultivariateGenerator
s.
Rcpp
implementationR
implementationR
implementationReview the implementation of the exogenous shock model in Rcpp
:
Lines 171 to 189 in cf85b82
Review, simplify, and document the test-implementations in R
:
rmo/tests/testthat/helper__rmo_esm_cuadras_auge.R
Lines 22 to 31 in cf85b82
rmo/tests/testthat/helper__rmo_esm_cuadras_auge.R
Lines 5 to 15 in cf85b82
The exchangeable Markov model is currently parametrized with the exchangeable intensities
Rewrite the algorithm such that the parameter ex_intensity
has the interpretation of the scaled exchangeable intensity.
Here are three mathematically equivalent ways to calculate the intensity matrix
qmatrix1 <- function(bf, d) {
ex_intensities <- sapply(1:d, function(i) valueOf(bf, d-i, i))
out <- matrix(0, nrow = d+1, ncol = d+1)
for (i in 0:d) {
if (i < d) {
for (j in (i+1):d) {
out[1+i, 1+j] <- rmo:::multiply_binomial_coefficient(
sum(sapply(
0:i,
function(k) {
rmo:::multiply_binomial_coefficient(ex_intensities[k + j-i], i, k)
})), d-i, j-i)
}
out[1+i, 1+i] <- -sum(out[1+i, ])
}
}
out
}
qmatrix2 <- function(bf, d) {
out <- matrix(0, nrow = d+1, ncol = d+1)
for (i in 0:d) {
if (i < d) {
for (j in (i+1):d) {
out[1+i, 1+j] <- valueOf(bf, d-j, j-i, n=d-i, k=j-i)
}
out[1+i, 1+i] <- -valueOf(bf, d-i, n = d-i, k = j-i)
}
}
out
}
qmatrix3 <- function(bf, d) {
ex_intensities <- sapply(
1:d,
function(i) {
valueOf(bf, d-i, i, n = d, k = i)
})
out <- matrix(0, nrow = d+1, ncol = d+1)
out[1, -1] <- ex_intensities
out[1, 1] <- -sum(out[1, -1])
for (i in 1:d) {
if (i < d) {
for (j in (i+1):d) {
out[1+i, 1+j] <- (d-j+1)/(d-i+1) * out[i, j] + (j+1-i)/(d-i+1) * out[i, 1+j]
}
out[1+i, 1+i] <- -sum(out[1+i, ])
}
}
out
}
Rcpp
implementationR
implementationR
implementationReview the implementation of the Arnold model
in Rcpp
:
Lines 73 to 108 in cf85b82
Review, simplify, and document the test-implementations in R
:
rmo/tests/testthat/helper__arnold.R
Lines 36 to 62 in cf85b82
rmo/tests/testthat/helper__arnold.R
Lines 6 to 28 in cf85b82
Add an implementation of the Arnold model in sample.R
in plain R
.
The algorithm for the Arnold model is described in section 3.1.2 in Mai, J. F., & Scherer, M. (2017). Simulating Copulas.
An implementation that I found in my old files, but which has to be tested and documented is the following:
rmo_Arnold <- function(n, d, intensities) {
total_intensity <- sum(intensities)
transition_probs <- intensities / total_intensity
out <- matrix(nrow=n, ncol=d)
for (k in 1:n) {
destroyed <- logical(d)
value <- numeric(d)
while(!all(destroyed)) {
W <- rexp(1, total_intensity)
Y <- sample.int(n = 2^d-1, size=1, prob = transition_probs)
for (i in 1:d) {
if (!destroyed[[i]]) {
if (is_within(i, Y)) { ## uses the fn. `is_within` (implemented in `sets.R`)
destroyed[[i]] <- TRUE
}
value[[i]] <- value[[i]] + W
}
}
}
out[k, ] <- value
}
out
}
This algorithm uses the function is_within
which is implemented in sets.R
. For the reference:
is_within <- function(i, j) {
count <- 1
while (j > 0) {
if (1 == (j %% 2) && count == i)
return(TRUE)
j <- j %/% 2
count <- count + 1
}
return(FALSE)
}
Ultimately, a C
or Rcpp
implementation of the algorithm would be preferable, but for the beginning, a plain R
implementation should be sufficient. Especially, because this implementation can later be used for testing if a reimplementation in C
or Rcpp
is done.
Ultimately, if a reimplementation in C
or Rcpp
is done, one has to carefully think on the algorithms which are used such that a comparison with the R
implementation for testing is possible. Particularly, the sample.int
function might be a candidate for review.
No additional context.
The documentation of the package-manual is missing.
Create a fileR/rmo.R
With a short description of the package and its main applications.
At the moment, most of the parameter checks are implemented in R
using assertthat
. A consequence of that is a performance hit - in particular, if large parameters are involved. It would be good to either incorporate parameter checks directly in the support library or at least rewrite some of the tests in C++.
The sorting methods mo::utils::reverse_sort
and mo::utils::sort_index
are not tested at the moment.
At the moment, the package does not provide any functions to generate Marshall--Olkin parameter. for high dimensional models.
Provide helper functions to generate various types of Marshall-Olkin distribution parameters. These could first be used internally to improve and extend testing.
The Alias method is an extremely efficient method to sample from finite distributions with non-uniform probabilities. It is described in https://en.wikipedia.org/wiki/Alias_method. It is also the preferred method in R to sample from a non-uniform discrete distribution with many elements, see https://github.com/wch/r-source/blob/6a11175ce919e7f8105e9b629511c180bac7d504/src/main/random.c#L346.
Rcpp
implementationR
implementationReview the implementation of the MO parameter mapping via the is_within
function in Rcpp
:
Lines 11 to 24 in cf85b82
The scale parameter lambda
from the PoissonBernsteinFunction
is redundant since the following Bernstein Functions are equivalent:
PoissonBernsteinFunction(lambda = lambda, eta = eta)
ScaledBernsteinFunction(scale = lambda, original = PoissonBernsteinFunction(eta = eta))
Removing lambda
will not remove any feature, but will make the user interface more consistent, since most Bernstein function families have a single parameter.
While unit tests are already extensive, an integration test of some sort is missing. This could be a Monte-Carlo simulation, which produces estimators for intensities or for shock arrival probabilities.
The Cuadras-Augé family is described on https://hsloot.github.io/rmo/articles/Cuadras-Auge-distributions.html. For this family, all but the individual shocks and the global shock are almost surely infinite and can be ignored in the shock model. Also, the Arnold model and exchangeable Arnold model can be implemented more efficiently. This allows a very efficient implementation of the shock model algorithm.
Reimplement the shock model algorithm, the Arnold model, and the exchangeable Arnold model for this class in a more efficient way.
As of now, I am not sure what the best names for the parameters as well as the specialised algorithms could be. Suggestions are very welcome.
The package should provide S4 methods to create either an intensities
vector, an ex_intensities
vector or the Markov intensity matrix qmatrix
.
## valueOf-S4.R
setGeneric("intensities",
function(object, d, ...) {
standardGeneric("intensities")
})
setGeneric("exIntensities",
function(object, d, ...) {
standardGeneric("exIntensities")
})
setGeneric("exQMatrix",
function(object, d, ...) {
standardGeneric("exQMatrix")
})
setMethod("intensities", "BernsteinFunction",
function(object, d, ...) {
tmp <- sapply(1:d, function(i) valueOf(object, d-i, i, ...))
out <- numeric(2^d-1)
for (j in seq_along(out)) {
count <- 0
for (i in 1:d) {
count <- count + Rcpp__is_within(i, j)
}
out[j] <- tmp[count]
}
out
})
setMethod("exIntensities", "BernsteinFunction",
function(object, d, ...) {
sapply(1:d, function(i) valueOf(object, d-i, i, n = d, k = i))
})
setMethod("exQMatrix", "BernsteinFunction",
function(object, d, ...) {
out <- matrix(0, nrow = d+1, ncol = d+1)
out[1, -1] <- exIntensities(object, d, ...)
for (i in 1:d) {
if (i < d) {
for (j in (i+1):d) {
out[1+i, 1+j] <- (d-j+1)/(d-i+1) * out[i, j] + (j+1-i) / (d-i+1) * out[i,j+1]
}
}
}
diag(out) <- -apply(out, 1, sum)
out
})
An article for the binary representation should be written which allows understanding the internal representation of Marshall--Olkin parameters better.
Rcpp
implementationR
implementationR
implementationReview the implementation of the exogenous shock model in Rcpp
:
Lines 114 to 164 in cf85b82
Review, simplify, and document the test-implementations in R
:
rmo/tests/testthat/helper__ex_arnold.R
Lines 76 to 96 in cf85b82
rmo/tests/testthat/helper__ex_arnold.R
Lines 101 to 114 in cf85b82
rmo/tests/testthat/helper__ex_arnold.R
Lines 40 to 68 in cf85b82
rmo/tests/testthat/helper__ex_arnold.R
Lines 6 to 32 in cf85b82
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.