Writing Your Own Functions
Nick Salkowski
March 16, 2017
Code Reuse
- Copying and pasting code leads to errors
- Every time you use your code in a new spot, you might need to tweak some of it -- you'll forget to tweak something important
- Every time you make your code better (update or fix a bug), you'll need to update it everywhere -- you'll forget to update some spot
Functions solve these problems
- Changing function arguments is a lot more straightforward than searching and replacing to tweak code
- Updates / Fixes get applied every time the function is used
Learn from the MONOLITH!
Creating your own function
Functions are objects (like almost everything in R). You can create one with the function() function:
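outlier <- function(x) {
  x_median <- median(x, na.rm = TRUE)        # robust center
  x_mad <- mad(x, na.rm = TRUE)              # robust spread (scaled MAD)
  out_log <- abs(x - x_median) / x_mad > 4   # TRUE if more than 4 robust SDs from the median
  out_index <- which(out_log)                # positions of the flagged values
  return(out_index)
}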
Simulate
test <- c(rpois(98, lambda = 25),
          rpois(2, lambda = 625))
hist(test)
Test!
outlier(test)
## [1] 99 100
test[outlier(test)]
## [1] 631 610
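To see what the criterion is doing, here is the same arithmetic worked by hand on a small made-up vector (the values are illustrative, not taken from the simulation). Note that mad() multiplies the raw median absolute deviation by 1.4826 by default, so the result is on the scale of a standard deviation for normal data.

```r
# Hypothetical data: six ordinary values and one suspicious one
x <- c(10, 12, 11, 13, 12, 11, 50)

x_median <- median(x)                    # 12
x_mad    <- mad(x)                       # 1.4826 * median(abs(x - 12)) = 1.4826 * 1
score    <- abs(x - x_median) / x_mad    # distance from the median in robust SDs

round(score, 2)
which(score > 4)                         # only the 7th value is flagged
```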
Arguments
Let's modify the function to control our outlier criterion:
outlier <- function(x, crit = 4, na.rm = TRUE) {
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_log <- abs(x - x_median) / x_mad > crit
  out_index <- which(out_log)
  return(out_index)
}
Test!
outlier(test)
## [1] 99 100
test[outlier(test)]
## [1] 631 610
outlier(test, crit = 3)
## [1] 26 57 99 100
test[outlier(test, 3)]
## [1] 41 40 631 610
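The calls above mix two styles: crit = 3 matches the argument by name, while the bare 3 matches it by position. A small sketch with a made-up function (scale_by and its arguments are hypothetical) shows how names, positions, and defaults interact:

```r
# scale_by() is a hypothetical helper used only for illustration
scale_by <- function(x, times = 2, offset = 0) {
  x * times + offset
}

scale_by(1:3)                   # defaults: 2 4 6
scale_by(1:3, 10)               # positional: times = 10
scale_by(1:3, offset = 1)       # named: keep the default times, set offset
scale_by(offset = 1, x = 1:3)   # named arguments can come in any order
```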
Complex Returns
Sometimes you want to return more than one thing. Lists are vectors of things, so return a list!
outlier <- function(x, crit = 4, na.rm = TRUE) {
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_log <- abs(x - x_median) / x_mad > crit
  out_index <- which(out_log)
  return(list(value = x[out_index],
              index = out_index))
}
outlier(test)
## $value
## [1] 631 610
##
## $index
## [1] 99 100
Really Complex Returns
return(list(value = x[out_index],
            index = out_index,
            criteria = list(median = x_median,
                            mad_sd = x_mad)))
The list . . . it's full of lists!
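Pulling values back out of a nested list means applying $ or [[ ]] more than once. A minimal sketch, using a hand-built list with the same shape as the return value above (the numbers in criteria are made up):

```r
# A stand-in for the nested return value; the numbers are illustrative
res <- list(value = c(631, 610),
            index = c(99L, 100L),
            criteria = list(median = 25, mad_sd = 4.4478))

res$value                        # first-level element
res$criteria$median              # element of the inner list
res[["criteria"]][["mad_sd"]]    # same idea with [[ ]]
str(res)                         # compact overview of the whole structure
```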
Add Some Argument Checks
outlier <- function(x, crit = 4, na.rm = TRUE) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  if (all(is.na(x))) {
    stop("x values are all NA")
  }
  if (!is.numeric(crit)) {
    stop("crit must be numeric")
  }
  if (length(crit) > 1) {
    crit <- crit[1]
    warning("length(crit) > 1, only the first element was used")
  }
  if (is.na(crit)) {
    warning("crit value is NA")
  } else if (crit < 0) {
    crit <- abs(crit)
    warning("crit < 0, abs(crit) used instead")
  }
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_log <- abs(x - x_median) / x_mad > crit
  out_index <- which(out_log)
  return(list(value = x[out_index],
              index = out_index))
}
Test!
outlier(letters)
## Error in outlier(letters): x must be numeric
outlier(test, c(4, 2))
## Warning in outlier(test, c(4, 2)): length(crit) > 1, only the first element
## was used
## $value
## [1] 631 610
##
## $index
## [1] 99 100
Test!
outlier(rep(as.integer(NA), 50))
## Error in outlier(rep(as.integer(NA), 50)): x values are all NA
outlier(test, -4)
## Warning in outlier(test, -4): crit < 0, abs(crit) used instead
## $value
## [1] 631 610
##
## $index
## [1] 99 100
Test!
outlier(test, TRUE)
## Error in outlier(test, TRUE): crit must be numeric
outlier(test, as.integer(NA))
## Warning in outlier(test, as.integer(NA)): crit value is NA
## $value
## integer(0)
##
## $index
## integer(0)
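When a hard error is acceptable for every failed check, base R's stopifnot() is a more compact way to write the same kind of guard. This is only a sketch of an alternative (check_crit is a hypothetical helper). It trades the custom messages and the gentler warnings above for brevity.

```r
# check_crit() is a hypothetical helper, not part of the function above
check_crit <- function(crit) {
  stopifnot(is.numeric(crit),
            length(crit) == 1,
            !is.na(crit),
            crit >= 0)
  crit
}

check_crit(4)                                   # passes and returns 4
res <- try(check_crit("four"), silent = TRUE)   # fails the first condition
inherits(res, "try-error")                      # TRUE: the check stopped the call
```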
Document!
roxygen2-style -- ready for inclusion in a package!
#' Get Robustly Identified Outliers
#'
#' @param x a numeric vector of data
#' @param crit a nonnegative numeric value -- the outlier cutoff, in robust standard deviations
#' @param na.rm logical -- passed to median() and mad()
#'
#' @details Calculates the median of x and the robustly estimated
#' standard deviation of x, using the mad() function. If the
#' absolute difference between the median and a value of x is
#' greater than crit robust standard deviations, then the value is
#' considered an outlier.
#'
#' @return a list with two elements: value is a numeric vector of
#' outliers, and index is an integer vector of outlier indices
#' @export
outlier <- function(x, crit = 4, na.rm = TRUE) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  if (all(is.na(x))) {
    stop("x values are all NA")
  }
  if (!is.numeric(crit)) {
    stop("crit must be numeric")
  }
  if (length(crit) > 1) {
    crit <- crit[1]
    warning("length(crit) > 1, only the first element was used")
  }
  if (is.na(crit)) {
    warning("crit value is NA")
  } else if (crit < 0) {
    crit <- abs(crit)
    warning("crit < 0, abs(crit) used instead")
  }
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_log <- abs(x - x_median) / x_mad > crit
  out_index <- which(out_log)
  return(list(value = x[out_index],
              index = out_index))
}
Default Returns
If you don't use the return() function, your function will return the result of the last statement.
add_five <- function(x) {
  x + 5
}
add_five(pi)
## [1] 8.141593
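The "last statement" can be an if/else, in which case the value of whichever branch runs is returned. An explicit return() is still useful for leaving a function early. A small sketch (sign_label is a made-up name):

```r
# sign_label() is a hypothetical example function
sign_label <- function(x) {
  if (is.na(x)) {
    return("missing")      # early exit: nothing below runs
  }
  if (x < 0) {
    "negative"             # the value of the branch that runs ...
  } else {
    "non-negative"         # ... becomes the value of the function
  }
}

sign_label(-2)
sign_label(3)
sign_label(NA)
```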
Invisible Returns
Use the invisible() function, obviously:
add_five <- function(x) {
  invisible(x + 5)
}
add_five(pi)
z <- add_five(sqrt(2))
z
## [1] 6.414214
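The value is still returned; only the automatic printing is switched off. Wrapping the call in parentheses prints it anyway, and base R's withVisible() reports the visibility flag directly. A short sketch, repeating the definition so that it runs on its own:

```r
add_five <- function(x) {
  invisible(x + 5)
}

add_five(1)                      # prints nothing at the console
(add_five(1))                    # parentheses force printing
v <- withVisible(add_five(1))    # list with the value and its visibility flag
v$value
v$visible
```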
Ellipses
You can use ellipses to pass unspecified arguments to functions within your function:
star <- function(x, y, points, radius, ...) {
  old_par <- par(mar = c(0, 0, 0, 0))
  on.exit(par(old_par))
  symbols(x = x, y = y,
          stars = matrix(rep(radius * c(1, 0.5), points), nrow = 1),
          inches = FALSE, ...)
}
star(0, 0, 8, 0.75, bg = "steelblue", lwd = 4, fg = "navy")
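The same pattern works outside of graphics. In this sketch, center() is a hypothetical wrapper that hands anything it does not recognize to mean(), so callers can use mean()'s own trim and na.rm arguments without the wrapper having to name them:

```r
# center() is a made-up wrapper; trim and na.rm belong to mean()
center <- function(x, ...) {
  x - mean(x, ...)
}

center(c(1, 2, 3))                     # plain mean is 2
center(c(1, NA, 3), na.rm = TRUE)      # na.rm is passed through to mean()
center(c(1, 2, 3, 100), trim = 0.25)   # centered on the trimmed mean, 2.5
```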
Dropping the Braces
If your function code is really short, you can skip the braces, and it will still work:
add_pi <- function(x) x + pi
add_pi(cos(pi/2))
## [1] 3.141593
But it is hard to write robust, well-documented functions this way.
Ephemeral Functions
Sometimes you don't write a function to last, and a sloppy function is good enough.
sapply(mtcars, FUN = function(x) median(abs(x - median(x))))
##      mpg      cyl     disp       hp     drat       wt     qsec       vs       am
##   3.6500   2.0000  94.7500  52.0000   0.4750   0.5175   0.9550   0.0000   0.0000
##     gear     carb
##   1.0000   1.0000
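sapply() also passes extra arguments through to the function it applies, so an anonymous function is not always needed. The throwaway function above computes the unscaled median absolute deviation, which mad() also returns when its constant argument is set to 1. A sketch comparing the two:

```r
# Throwaway anonymous function, as in the example above
by_hand <- sapply(mtcars, FUN = function(x) median(abs(x - median(x))))

# Named function plus an extra argument; constant = 1 drops the 1.4826 scaling
built_in <- sapply(mtcars, FUN = mad, constant = 1)

all.equal(by_hand, built_in)
```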
Watch out!
If you don't pay attention, your function will do things that you don't want. Try to keep your functions narrowly focused:
- It is easier to write a small function that does one simple thing than a big function that does something complex
- So, if you have a complex task, write a bunch of simple functions first, then put them together (perhaps in a bigger function)
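As a sketch of that advice, the outlier logic used throughout these slides could be split into two small pieces: one that computes a robust score and one that applies a cutoff. The names robust_score and find_outliers are made up for this illustration.

```r
# Step 1: how many robust SDs is each value from the median?
robust_score <- function(x, na.rm = TRUE) {
  abs(x - median(x, na.rm = na.rm)) / mad(x, na.rm = na.rm)
}

# Step 2: which scores exceed the cutoff?
find_outliers <- function(x, crit = 4, na.rm = TRUE) {
  which(robust_score(x, na.rm = na.rm) > crit)
}

x <- c(10, 12, 11, 13, 12, 11, 50)   # made-up data
find_outliers(x)                      # flags the 7th value
```

Each piece can be tested on its own, and robust_score() can be reused wherever a different cutoff rule is needed.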
Environments
R objects are organized into environments. Usually your code runs in the global environment. Your function can reference objects in the global environment . . .
z <- 5
double_z <- function() {
  return(z * 2)
}
double_z()
## [1] 10
but it is usually better to pass objects as arguments.
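One reason to prefer arguments: a function that reads a global gives different answers whenever that global changes, and nothing in the call shows why. A minimal sketch (the names rate, with_global, and with_arg are hypothetical):

```r
rate <- 2
with_global <- function(x) x * rate          # silently depends on the global 'rate'
with_arg    <- function(x, rate) x * rate    # the dependency is visible in the call

first  <- with_global(10)    # uses rate = 2
rate   <- 3                  # the global is changed somewhere else
second <- with_global(10)    # same call, different answer

c(first, second)
with_arg(10, rate = 2)       # unaffected by the global
```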
Function Environments
But, your function code is evaluated in its own environment -- so objects that you create or modify generally have no effect on objects in the global environment:
z <- 5
double_z <- function() {
  z <- z * 2
  return(z)
}
double_z()
## [1] 10
z
## [1] 5
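You can check this from inside a function. In the sketch below, where_am_i() is a hypothetical function that compares its own environment with the global one and lists the objects it can see locally:

```r
# where_am_i() is a hypothetical function used only for illustration
where_am_i <- function() {
  local_value <- 1                      # exists only for the duration of this call
  list(in_global = identical(environment(), globalenv()),
       local_names = ls())              # objects in the function's own environment
}

res <- where_am_i()
res$in_global      # FALSE: the body did not run in the global environment
res$local_names    # "local_value"
```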
Black Magic
Modifying or creating objects outside the function environment is black magic, and should be avoided...
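For reference, the usual tool for this kind of black magic is the superassignment operator <<-, which assigns in an enclosing environment instead of the function's own. The sketch below (with made-up names) shows why it is risky: calling the function changes an object that the caller never passed in.

```r
total <- 0

# add_to_total() is a hypothetical function that modifies 'total' from the inside
add_to_total <- function(x) {
  total <<- total + x    # reaches outside the function environment
  invisible(NULL)
}

add_to_total(5)
add_to_total(5)
total                    # changed as a side effect, with no assignment in sight
```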
The Obvious Alternative
Assign the function result to an object in the global environment:
z <- 5
double_it <- function(x) {
  return(x * 2)
}
z <- double_it(z)
z
## [1] 10