Giter Site home page Giter Site logo

happma / pseudorank Goto Github PK

View Code? Open in Web Editor NEW
2.0 1.0 0.0 78 KB

R Package providing functions to calculate pseudo-ranks and pseudo-rank based nonparametric test statistics.

R 83.08% C++ 16.92%
r rank pseudo-rank nonparametric cpp rstats rank-tests nonparametric-statistics trend-test pseudo-ranks

pseudorank's Introduction

pseudorank 1.0.2

CRANstatus Travis-CI Build Status Build status

This R package provides a function written in C++ to calculate pseudo-ranks in R and some rank statistics which can opionally use pseudo-ranks instead of ranks. For a definition and discussion of pseudo-ranks, see for example

Brunner, E., Bathke, A.C., and Konietschke, F. (2018a). Rank- and Pseudo-Rank Procedures for Independent Observations in Factorial Designs - Using R and SAS. Springer Series in Statistics, Springer, Heidelberg. ISBN: 978-3-030-02912-8.

Happ M, Zimmermann G, Brunner E, Bathke AC (2020). Pseudo-Ranks: How to Calculate Them Efficiently in R. Journal of Statistical Software, Code Snippets, 95(1), 1-22. doi: 10.18637/jss.v095.c01 (URL:https://doi.org/10.18637/jss.v095.c01).

To install the current development version from github:

## install devtools package
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# install package
devtools::install_github("happma/pseudorank")
library(pseudorank)

Calculating Pseudo-Ranks

The function 'pseudorank' can either be used with data.frames or with vectors. Please note that when using a data.frame only one grouping factor can be used.

# some example data
df <- data.frame(data =rnorm(100),group = c(rep(1,40),rep(2,40),rep(3,20)))
df$group <- as.factor(df$group)

# two ways to calculate pseudo-ranks
# Variant 1: use a vector for the data and a group vector
pseudorank(df$data, df$group)

# Variant 2: use a formual object, Note that only one group factor can be used
# that is, in data~group*group2 only 'group' will be used
pseudorank(data ~ group, df)

Similarly to the 'rank' function from base R, there are several different methods which can be used to handle ties in the data. Most nonparametric tests rely on so-called mid-(pseudo)-ranks, these are just the average of minimum and maximum (pseudo)-ranks.

# some example data
df <- data.frame(data =rnorm(100),group = c(rep(1,40),rep(2,40),rep(3,20)))
df$group <- as.factor(df$group)

# min and max pseudo-ranks
m <- pseudorank(df$data ,df$group, ties.method = "min")
M <- pseudorank(df$data ,df$group, ties.method = "max")

mid <- pseudorank(df$data ,df$group, ties.method = "average") # identical to (m+M)/2

In case of missing values there are three options to choose from. These are the same as for the function 'rank' or 'sort' from base R. It is recommended to use 'na.last = NA' to remove the NAs. If the NAs are kept, they can either be put at the beginning or the end of your data, then the pseudo-ranks from those NAs depend on the order they appear in the data. The order does not matter only if the groups containing missing values have the same sample size. See the following R Code for an example of this problem where observation 1 and 4 are interchanged. Here, the pseudo-ranks for those two observations are different, all other pseudo-ranks remain unchanged.

data = c(NA,7,1,NA,3,3,5.5,6,NA, 3, 1)
group =  as.factor(c(1,1,1,2,2,2,3,3,3,1,3))

# Variant 1: keep NAs and put them last
pseudorank(data, group, na.last = TRUE)

# we change the order of observation 1 and 4 (both NAs)
group =  as.factor(c(2,1,1,1,2,2,3,3,3,1,3))
pseudorank(data, group, na.last = TRUE)

# Variant 2: keep NAs and put them first
pseudorank(data, group, na.last = FALSE)

# Variant 3: remove Nas (recommended)
pseudorank(data, group, na.last = NA)

Hettmansperger-Norton Test for Patterned Alternatives in k-Sample Problems

The test implemented in this package uses pseudo-ranks instead of ranks. This is mainly due to paradoxical results caused by ranks. See

Brunner, E., Konietschke, F., Bathke, A. C., & Pauly, M. (2018). Ranks and Pseudo-Ranks-Paradoxical Results of Rank Tests. arXiv preprint arXiv:1802.05650.

for a discussion of this problem.

# create some data, please note that the group factor needs to be ordered
df <- data.frame(data = c(rnorm(40, 3, 1), rnorm(40, 2, 1), rnorm(20, 1, 1)),
  group = c(rep(1,40),rep(2,40),rep(3,20)))
df$group <- factor(df$group, ordered = TRUE)

# you can either test for a decreasing, increasing or custom trend
hettmansperger_norton_test(df$data, df$group, alternative="decreasing")
hettmansperger_norton_test(df$data, df$group, alternative="increasing")
hettmansperger_norton_test(df$data, df$group, alternative="custom", trend = c(1, 3, 2))

Kruskal-Wallis Test with Pseudo-Ranks

The Kruskal-Wallis test implemented in this package can use pseudo-ranks, if the argument 'pseudoranks = TRUE' is used.

# create some artificial data
x = c(1, 1, 1, 1, 2, 3, 4, 5, 6)
grp = as.factor(c('A','A','B','B','B','D','D','D','D'))
df = data.frame(x = x, grp = grp)

kruskal_wallis_test(x~grp, data=df, pseudoranks=TRUE)

pseudorank's People

Contributors

happma avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.