
get's Introduction

GET: Global envelopes

https://cran.r-project.org/package=GET

The R package GET provides global envelopes which can be used for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot), for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional ANOVA, functional GLM, n-sample test of correspondence of distribution functions), and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction).

The development version

The GitHub repository holds a copy of the current development version of the contributed R package GET.

This development version is at least as recent as the official release of GET on the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/package=GET

Where is the official release?

For the most recent official release of GET, see https://cran.r-project.org/package=GET

Installation

Installing the official release

To install the official release of GET from CRAN, start R and type

install.packages('GET')

Installing the development version

The easiest way to install the GET package from GitHub is through the remotes package. Start R and type:

require(remotes)
install_github('myllym/GET')

If you do not have the R library remotes installed, install it first by running

install.packages("remotes")

After installation, load GET into R and open the main help page, which describes the functions of the package:

require(GET)
help('GET-package')

If you also want the vignettes to work, install the packages listed in the 'Suggests' field, make sure MiKTeX is installed on your computer, and install the package with

install_github('myllym/GET', build_vignettes = TRUE)

Vignettes

The package contains four vignettes. The GET vignette describes the package in general. It is available by starting R and typing

library("GET")
vignette("GET")

This vignette corresponds to Myllymäki and Mrkvička (2023).

The package also provides a vignette on global envelopes for point pattern analyses, which is available by starting R and typing

library("GET")
vignette("pointpatterns")

The third vignette describes and provides code for the examples of Mrkvička and Myllymäki (2023) using the false discovery rate (FDR) envelopes. It is available by starting R and typing

library("GET")
vignette("FDRenvelopes")

Finally, the fourth vignette, available by

library("GET")
vignette("HotSpots")

shows how the methodology proposed by Mrkvička et al. (2023b) for detecting hotspots on a linear network can be performed using GET.

All vignettes are also available at the package webpage https://cran.r-project.org/package=GET

Branches

Currently two branches are provided in the development version. The main branch of GET is called master.

The other branches are called FDR and quantileregression. The FDR branch also includes the experimental FDR envelopes tested in Mrkvička and Myllymäki (2023). The main branch includes the FDR envelopes which were found to have good performance in Mrkvička and Myllymäki (2023).

We note that the quantileregression branch, which included the implementation of the global quantile regression proposed in Mrkvička et al. (2023a), was recently merged into master.

References

To cite GET in publications use

Myllymäki, M. and Mrkvička, T. (2023). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME] https://doi.org/10.48550/arXiv.1911.06583

Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79: 381-404. doi: 10.1111/rssb.12172 http://dx.doi.org/10.1111/rssb.12172 (You can find the preprint version of the article here: http://arxiv.org/abs/1307.0239v4)

and a suitable selection of:

Myllymäki, M., Grabarnik, P., Seijo, H., and Stoyan, D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11: 19-34. https://doi.org/10.1016/j.spasta.2014.11.004 (You can find the preprint version of the article here: http://arxiv.org/abs/1306.1028)

Mrkvička, T., Soubeyrand, S., Myllymäki, M., Grabarnik, P., and Hahn, U. (2016). Monte Carlo testing in spatial statistics, with applications to spatial residuals. Spatial Statistics 18, Part A: 40-53. https://doi.org/10.1016/j.spasta.2016.04.005

Mrkvička, T., Myllymäki, M. and Hahn, U. (2017). Multiple Monte Carlo testing, with applications in spatial point processes. Statistics and Computing 27 (5): 1239-1255. https://doi.org/10.1007/s11222-016-9683-9

Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020). A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56 (3), 432-458. http://doi.org/10.14736/kyb-2020-3-0432

Myllymäki, M., Kuronen, M. and Mrkvička, T. (2020). Testing global and local dependence of point patterns on covariates in parametric models. Spatial Statistics 42, 100436. https://doi.org/10.1016/j.spasta.2020.100436

Mrkvička, T., Roskovec, T. and Rost, M. (2021). A nonparametric graphical tests of significance in functional GLM. Methodology and Computing in Applied Probability 23, 593-612. https://doi.org/10.1007/s11009-019-09756-y

Dai, W., Athanasiadis, S. and Mrkvička, T. (2022). A new functional clustering method with combined dissimilarity sources and graphical interpretation. Intech open. https://doi.org/10.5772/intechopen.100124

Dvořák, J. and Mrkvička, T. (2022). Graphical tests of independence for general distributions. Computational Statistics 37, 671-699. https://doi.org/10.1007/s00180-021-01134-y

Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022). New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. https://doi.org/10.1002/sim.9236

Mrkvička, T. and Myllymäki, M. (2023). False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7

Mrkvička, T., Konstantinou, K., Kuronen, M. and Myllymäki, M. (2023a). Global quantile regression. arXiv:2309.04746 [stat.ME] https://doi.org/10.48550/arXiv.2309.04746

Mrkvička, T., Kraft, S., Blažek, V. and Myllymäki, M. (2023b). Hotspot detection on a linear network in the presence of covariates: a case study on road crash data. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4627591

get's People

Contributors

mikkoku, myllym, olivroy


get's Issues

central_region always uses mean and never median as central function (at least in some cases).

When I set the central argument of central_region to "median", it still uses the mean as the central function. I believe this is because the argument central of get_T_0 is not specified when it is called inside individual_central_region.

T_0 <- get_T_0(curve_set)

More details: my actual call looks like central_region(my_curve_set, type="erl", coverage=.8, central = "median").
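To make the expectation concrete, here is a minimal base R sketch of what choosing between the two central functions would mean for a set of curves. The helper `central_curve` and the matrix layout (rows are argument values, columns are curves) are assumptions made for illustration; this is not GET's internal code.

```r
# Hypothetical helper: pointwise central function of a set of curves.
# funcs: numeric matrix, one row per argument value, one column per curve.
central_curve <- function(funcs, central = c("mean", "median")) {
  central <- match.arg(central)
  if (central == "mean") rowMeans(funcs) else apply(funcs, 1, stats::median)
}

# Three curves evaluated at two argument values; the third curve is an outlier.
funcs <- cbind(c(1, 2), c(2, 3), c(30, 40))
central_curve(funcs, "mean")    # 11 15
central_curve(funcs, "median")  # 2 3
```

With an outlying curve the two choices differ clearly, which is why silently falling back to the mean would matter.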

Plotting of 2d envelopes broken for non-square matrices

I get an

Error in spatstat::im(x$obs < x$lo, xcol = x$rx, yrow = x$ry) : 
  Length of xcol does not match ncol(mat)

This is due to spatstat::im and spatstat::as.im treating their x- and y-arguments differently. Plots that use the latter are produced first and show up, but the code halts when the former is called.

Code to reproduce:

obs <- matrix(runif(110), 10, 11)
sims <- array(runif(110*12), c(12,10,11))
cr <- global_envelope_test_2d(obs, sims, 1:10, 1:11)
plot(cr)

spatstat::plot.im(spatstat::im(x$obs < x$lo, xcol=x$rx, yrow=x$ry),
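The error message indicates that im expects xcol to have one entry per column of the matrix. The following base R sketch only checks that dimension convention for the example above. `dims_match_im` is a hypothetical helper, the matching rule for yrow is an assumption made by analogy, and rx and ry stand for the x- and y-coordinates passed to global_envelope_test_2d. The sketch does not call spatstat.

```r
# Hypothetical helper mirroring the check behind the error message:
# xcol must have length ncol(mat); yrow is assumed to match nrow(mat).
dims_match_im <- function(mat, xcol, yrow) {
  length(xcol) == ncol(mat) && length(yrow) == nrow(mat)
}

obs <- matrix(runif(110), 10, 11)  # 10 rows, 11 columns, as in the report
rx <- 1:10
ry <- 1:11

dims_match_im(obs, xcol = rx, yrow = ry)     # FALSE: 10 x-values, 11 columns
dims_match_im(t(obs), xcol = rx, yrow = ry)  # TRUE after transposing
```

For a square matrix both calls return TRUE, which would explain why the problem only shows up for non-square input. Whether transposing is the right fix inside GET depends on how the package interprets rows and columns.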

Gets a nonzero liberal p-value when a zero is expected

When using 'rank_envelope' where the data curve has rank 1, I would expect to get a liberal p-value of 0 (according to expression (10) in 'Global envelope tests for spatial processes' (2017)). However, the p-interval returned from 'rank_envelope' gives a liberal p-value of 1/nsim instead.
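A small base R sketch of this expectation, assuming the p-interval is computed from the extreme ranks R_1, ..., R_(s+1) (data curve first) as the proportion of ranks strictly below R_1 (liberal) and the proportion at or below R_1 (conservative). This reading of expression (10) and the helper name `p_interval` are assumptions; the code is not taken from GET.

```r
# Hypothetical helper: liberal and conservative p-values from extreme ranks.
# ranks[1] belongs to the data curve, the rest to the simulated curves.
p_interval <- function(ranks) {
  c(liberal = mean(ranks < ranks[1]),
    conservative = mean(ranks <= ranks[1]))
}

# Data curve has rank 1, tied with two of the 9 simulations.
ranks <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5)
p_interval(ranks)  # liberal = 0, conservative = 0.3
```

Under this definition no rank can be strictly smaller than 1, so the liberal p-value is 0 whenever the data curve has rank 1.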

nestedness checking and equivalent interactions when omitting a main effect

When testing the significance of the main effect of a variable that also participates in interactions, R seems to canonicalize the names of the interaction terms differently, causing the nestedness checks done by GET to fail:

library('GET')
library('fda.usc')

Y <- fdata(matrix(rnorm(96 * 120), 120, 96))
R <- fdata(matrix(rnorm(96 * 120), 120, 96))
S <- fdata(matrix(rnorm(96 * 120), 120, 96))
T <- fdata(matrix(rnorm(96 * 120), 120, 96))

data <- data.frame(V=as.factor(runif(120) > 0.5))
result <- graph.flm(50,
                    Y ~ V + R + S + T + V:S + V:T,
                    Y ~ R + S + T + V:S + V:T,
                    factors=data,
                    contrasts=TRUE,
                    curve_sets=list(Y=Y, R=R, S=S, T=T))

results in

Error in check_isnested(formula.full, formula.reduced): The reduced model includes some extra variables, not in the full model.
Traceback:

1. graph.flm(50, Y ~ V + R + S + T + V:S + V:T, Y ~ R + S + T + 
 .     V:S + V:T, factors = data, contrasts = TRUE, savefuns = TRUE, 
 .     curve_sets = list(Y = Y, Z = Z))
2. flm.checks(nsim = nsim, formula.full = formula.full, formula.reduced = formula.reduced, 
 .     curve_sets = curve_sets, factors = factors, fast = fast)
3. check_isnested(formula.full, formula.reduced)
4. stop("The reduced model includes some extra variables, not in the full model.")

This appears to be happening because the labels for the terms in the formula are V:S and V:T in the first formula but are computed as S:V and T:V in the reduced formula. A workaround for this is to add V to the full formula last.
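The label difference can be inspected with base R alone, without fitting anything. The sketch below prints the term labels of both formulas and compares them after sorting the variables inside each interaction. `canon_labels` is a hypothetical helper written for this illustration, not a function of GET, and it is only one possible way to make such a comparison order-insensitive.

```r
# Term labels as R computes them for the two formulas in the report.
full    <- Y ~ V + R + S + T + V:S + V:T
reduced <- Y ~ R + S + T + V:S + V:T
labels_full    <- attr(terms(full), "term.labels")
labels_reduced <- attr(terms(reduced), "term.labels")

# Hypothetical helper: sort the variables inside each interaction label,
# so that "V:S" and "S:V" compare as equal.
canon_labels <- function(labels) {
  vapply(strsplit(labels, ":", fixed = TRUE),
         function(parts) paste(sort(parts), collapse = ":"),
         character(1))
}

all(labels_reduced %in% labels_full)                              # FALSE
all(canon_labels(labels_reduced) %in% canon_labels(labels_full))  # TRUE
```

The first comparison fails because the reduced formula yields S:V and T:V, while the second treats the two spellings of each interaction as the same term.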

fBoxplot: error if length(outliers_id) == 1 (line 85)

https://github.com/myllym/GET/blob/master/R/fboxplot.r

I tracked the error down to line 85 of fBoxplot:
outliers <- funcs[,outliers_id]

Error in `colnames<-`(`*tmp*`, value = 15L) :
attempt to set 'colnames' on an object with less than two dimensions

If length(outliers_id) == 1, then outliers becomes a vector, because R drops the matrix dimension by default when a single column is selected. Line 87 then tries to set colnames:
colnames(outliers) <- outliers_id

A solution would be
outliers <- funcs[,outliers_id, drop = FALSE]

See for instance
https://stackoverflow.com/questions/47754509/how-to-use-drop-f-correctly-in-r-to-preserve-matrix-structure-when-subsetting

There may be other issues like this? Brian
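The behaviour behind this report can be reproduced in base R with a small stand-in matrix. The names funcs and outliers_id follow the snippet above, but the data are made up for illustration.

```r
# A small matrix standing in for funcs: 3 argument values, 4 curves.
funcs <- matrix(1:12, nrow = 3, ncol = 4)
outliers_id <- 2L  # a single outlier

# Default subsetting drops the column dimension and returns a plain vector.
dropped <- funcs[, outliers_id]
is.null(dim(dropped))  # TRUE

# With drop = FALSE the result stays a 3 x 1 matrix, so colnames<- works.
outliers <- funcs[, outliers_id, drop = FALSE]
colnames(outliers) <- outliers_id
dim(outliers)       # 3 1
colnames(outliers)  # "2"
```

The same pattern applies to any matrix subsetting where the index can have length one.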

graph.flm with functional covariate and contrasts=TRUE fails with error

Hi again,

I get a strange error message from the following example:

library('GET')
library('fda.usc')

Y <- fdata(matrix(rnorm(96 * 120), 120, 96))
Z <- fdata(matrix(rnorm(96 * 120), 120, 96))
data <- data.frame(V=as.factor(runif(120) > 0.5))
result <- graph.flm(50,
                    Y ~ V + Z,
                    Y ~ Z,
                    factors=data,
                    contrasts=TRUE,
                    curve_sets=list(Y=Y, Z=Z))

The error I get (with GET 0.1-8) is

Error in (function (Y, dfs, formula.full, nameinteresting, ...) : 
The option 'contrasts' only valid for discrete factors with at least two levels.
...

but as you can see from the formula, the variable of interest is a discrete factor with 2 levels 🙂

I think this might be a bug? I wasn't able to find anything in the documentation that suggests that this is disallowed.

Thanks!

Two-level, one-way contrast and ordering of levels

Hi,

I am evaluating some data that contains two-level factors, and ran a graph.flm() test with contrasts=TRUE. The result is roughly what I expect, having previously analyzed this data with the fda package (I am switching to GET in order to do tests that control for covariates). However, the curve is upside down.

My factor has a natural treatment/no-treatment interpretation (labeled 1 and 0 in my dataframe, respectively) so I expected the "baseline" to be subtracted from the treatment when looking at the contrast, but the opposite is being done. Is there any way to force the ordering of the contrast? I have tried relabeling my factor in case it was alphabetical, but without success.

I have resorted to modifying the result object by multiplying obs, central, lo and hi by -1 and then swapping lo and hi, which seems to give me what I would expect, and I think should be correct. Do you have another recommendation?
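The workaround described above can be written as a small helper. This is a sketch on a plain list with the four fields mentioned in the text; `flip_contrast` and the toy values are hypothetical, and a real result object may carry further fields or attributes that this sketch leaves untouched.

```r
# Hypothetical stand-in for the result: a list with the four fields named above.
res <- list(obs = c(-1, -2, -3), central = c(0, 0, 0),
            lo = c(-2, -2, -2), hi = c(1, 1, 1))

# Hypothetical helper: reverse the direction of the contrast.
flip_contrast <- function(x) {
  flipped <- x
  flipped$obs     <- -x$obs
  flipped$central <- -x$central
  flipped$lo      <- -x$hi  # negating swaps the roles of the bounds
  flipped$hi      <- -x$lo
  flipped
}

flipped <- flip_contrast(res)
all(flipped$lo <= flipped$hi)  # TRUE
```

Negating alone would leave the lower bound above the upper bound, which is why the two bounds are exchanged in the same step.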

Also, this was made slightly more confusing by the fact that with only a single variable omitted in the reduced model, the resulting graph is not titled, so it's not possible to determine which way around the contrast was done without running a second test with more categorical factors omitted, in which case the subplots are labeled and I was able to confirm my suspicion. Is there a way to force a plot label when there is only one plot, or otherwise determine which direction the contrast was done?

Thank you in advance and thank you for making this software available!
