Writing Your Own Functions

Nick Salkowski
March 16, 2017

Code Reuse

  • Copying and pasting code leads to errors
    • Every time you use your code in a new spot, you might need to tweak some of it -- you'll forget to tweak something important
    • Every time you make your code better (update or fix a bug), you'll need to update it everywhere -- you'll forget to update some spot

Don't be like this ape-guy...

Functions solve these problems

  • Changing function arguments is a lot more straightforward than searching and replacing to tweak code
  • Updates / Fixes get applied every time the function is used

Learn from the MONOLITH!

Creating your own function

Functions are objects (like almost everything in R). You can create one with the function() function:

outlier <- function(x) {
  x_median <- median(x, na.rm = TRUE)
  x_mad <- mad(x, na.rm = TRUE)
  out_log <- abs(x - x_median) / x_mad > 4
  out_index <- which(out_log)
  return(out_index)
}

Simulate

test <- c(rpois(98, lambda = 25),
          rpois(2, lambda = 625))
hist(test)

Test!

outlier(test)
## [1]  99 100
test[outlier(test)]
## [1] 631 610

Lookin' good.

Arguments

Let's modify the function to control our outlier criterion:

outlier <- function(x, crit = 4, na.rm = TRUE) {
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_log <- abs(x - x_median) / x_mad > crit
  out_index <- which(out_log)
  return(out_index)
}

Test!

outlier(test)
## [1]  99 100
test[outlier(test)]
## [1] 631 610
outlier(test, crit = 3)
## [1]  26  57  99 100
test[outlier(test, 3)]
## [1]  41  40 631 610
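
The calls above mix positional matching (outlier(test, 3)) and named matching (crit = 3). Here is a minimal sketch of how R matches arguments, using a made-up helper called scale_by() whose name and parameters (mult, offset) are purely illustrative:

```r
# scale_by() is a hypothetical helper used only to show argument matching.
scale_by <- function(x, mult = 2, offset = 0) {
  x * mult + offset
}

a <- scale_by(1:3)                  # both defaults used
b <- scale_by(1:3, 10)              # second position is matched to mult
c_named <- scale_by(1:3, offset = 1)    # naming offset lets you skip mult
d <- scale_by(offset = 1, x = 1:3)  # named arguments can come in any order
```

Naming arguments after the first one or two tends to make calls easier to read, and it keeps them working if the argument order changes later.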

Complex Returns

Sometimes you want to return more than one thing. Lists are vectors of things, so return a list!

outlier <- function(x, crit = 4, na.rm = TRUE) {
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_log <- abs(x - x_median) / x_mad > crit
  out_index <- which(out_log)
  return(list(value = x[out_index],
              index = out_index))
}
outlier(test)
## $value
## [1] 631 610
## 
## $index
## [1]  99 100

Really Complex Returns

return(list(value = x[out_index],
            index = out_index,
            criteria = list(median = x_median,
                            mad_sd = x_mad)))

The list . . . it's full of lists!
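
To see what the caller does with a nested list like that, here is a sketch that puts the return statement above into the earlier function body and then pulls pieces out with $ and [[ ]]. The small vector x is made up for illustration, so the result is easy to check by hand:

```r
# Same body as the earlier outlier(), returning the nested list.
outlier <- function(x, crit = 4, na.rm = TRUE) {
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_index <- which(abs(x - x_median) / x_mad > crit)
  return(list(value = x[out_index],
              index = out_index,
              criteria = list(median = x_median,
                              mad_sd = x_mad)))
}

# Made-up data: the median is 12 and only the last value is extreme.
x <- c(10, 12, 11, 13, 12, 11, 100)
res <- outlier(x)

res$value                      # the outlying values
res$criteria$median            # chain $ to reach into the inner list
res[["criteria"]][["mad_sd"]]  # [[ ]] does the same thing by name
```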

Add Some Argument Checks

outlier <- function(x, crit = 4, na.rm = TRUE) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  if (all(is.na(x))) {
    stop("x values are all NA")
  }
  if (!is.numeric(crit)) {
    stop("crit must be numeric")
  }
  if (length(crit) > 1) {
    crit <- crit[1]
    warning("length(crit) > 1, only the first element was used")
  }
  if (is.na(crit)) {
    warning("crit value is NA")
  } else if (crit < 0) {
    crit <- abs(crit)
    warning("crit < 0, abs(crit) used instead")
  }
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_log <- abs(x - x_median) / x_mad > crit
  out_index <- which(out_log)
  return(list(value = x[out_index],
              index = out_index))
}

Test!

outlier(letters)
## Error in outlier(letters): x must be numeric
outlier(test, c(4, 2))
## Warning in outlier(test, c(4, 2)): length(crit) > 1, only the first element
## was used
## $value
## [1] 631 610
## 
## $index
## [1]  99 100

Test!

outlier(rep(as.integer(NA), 50))
## Error in outlier(rep(as.integer(NA), 50)): x values are all NA
outlier(test, -4)
## Warning in outlier(test, -4): crit < 0, abs(crit) used instead
## $value
## [1] 631 610
## 
## $index
## [1]  99 100

Test!

outlier(test, TRUE)
## Error in outlier(test, TRUE): crit must be numeric
outlier(test, as.integer(NA))
## Warning in outlier(test, as.integer(NA)): crit value is NA
## $value
## integer(0)
## 
## $index
## integer(0)
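
Errors and warnings are also something the calling code can react to. The sketch below uses a cut-down, hypothetical check_crit() that copies two of the checks above, then shows tryCatch() and suppressWarnings() from base R on the caller's side:

```r
# check_crit() is a hypothetical stand-in for the crit checks in outlier().
check_crit <- function(crit) {
  if (!is.numeric(crit)) {
    stop("crit must be numeric")
  }
  if (crit < 0) {
    warning("crit < 0, abs(crit) used instead")
    crit <- abs(crit)
  }
  crit
}

# tryCatch() stops at the first matching condition and returns the handler's value.
err_msg <- tryCatch(check_crit("a"),
                    error = function(e) conditionMessage(e))
warn_msg <- tryCatch(check_crit(-4),
                     warning = function(w) conditionMessage(w))

# suppressWarnings() silences the warning and lets the call finish.
val <- suppressWarnings(check_crit(-4))
```

Note the difference: with tryCatch() the function never finishes after the warning, while suppressWarnings() lets it run to the end and return 4.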

Document!

roxygen2-style -- ready for inclusion in a package!

#' Get Robustly Identified Outliers
#'
#' @param x a numeric vector of data
#' @param crit a nonnegative numeric value
#' @param na.rm logical -- passed to median() and mad()
#' 
#' @details Calculates the median of x and the robustly estimated 
#' standard deviation of x, using the mad() function.  If the 
#' absolute difference between the median and a value of x is 
#' greater than crit robust standard deviations, then the value is 
#' considered an outlier.
#'
#' @return a list with two elements: value is a numeric vector of
#' outliers, and index is an integer vector of outlier indices
#' @export

outlier <- function(x, crit = 4, na.rm = TRUE) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  if (all(is.na(x))) {
    stop("x values are all NA")
  }
  if (!is.numeric(crit)) {
    stop("crit must be numeric")
  }
  if (length(crit) > 1) {
    crit <- crit[1]
    warning("length(crit) > 1, only the first element was used")
  }
  if (is.na(crit)) {
    warning("crit value is NA")
  } else if (crit < 0) {
    crit <- abs(crit)
    warning("crit < 0, abs(crit) used instead")
  }
  x_median <- median(x, na.rm = na.rm)
  x_mad <- mad(x, na.rm = na.rm)
  out_log <- abs(x - x_median) / x_mad > crit
  out_index <- which(out_log)
  return(list(value = x[out_index],
              index = out_index))
}

Default Returns

If you don't use the return() function, your function will return the value of the last expression it evaluates.

add_five <- function(x) {
  x + 5
}
add_five(pi)
## [1] 8.141593
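
The two styles can be mixed. In this sketch (safe_sqrt() is a made-up example, not from the talk), an explicit return() handles the early exit and the final expression is returned implicitly:

```r
# safe_sqrt() is a hypothetical example function.
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA_real_)  # leaving early needs an explicit return()
  }
  sqrt(x)             # last expression evaluated, so it is the return value
}

safe_sqrt(9)
safe_sqrt(-1)
```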

Invisible Returns

Use the invisible() function to return a value without printing it at the console:

add_five <- function(x) {
  invisible(x + 5)
}
add_five(pi)
z <- add_five(sqrt(2))
z
## [1] 6.414214
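
The value is still there, it just isn't auto-printed. A small sketch of two base R ways to confirm that: wrapping the call in parentheses, and withVisible():

```r
add_five <- function(x) {
  invisible(x + 5)
}

add_five(1)     # nothing is printed
(add_five(1))   # parentheses force the result to print

# withVisible() reports both the value and whether it would auto-print.
res <- withVisible(add_five(1))
res$value
res$visible
```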

Ellipses

You can use an ellipsis (...) to pass unspecified arguments to functions within your function:

star <- function(x, y, points, radius, ...) {
  old_par <- par(mar = c(0, 0, 0, 0))
  on.exit(par(old_par))
  symbols(x = x, y = y, 
          stars = matrix(rep(radius * c(1, 0.5), points), nrow = 1), 
          inches = FALSE, ...)
}
star(0, 0, 8, 0.75, bg = "steelblue", lwd = 4, fg = "navy")
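
The same trick works without any plotting. A minimal sketch, assuming a made-up helper called center() that forwards whatever lands in ... to mean():

```r
# center() is a hypothetical helper; anything in ... is handed to mean().
center <- function(x, ...) {
  x - mean(x, ...)
}

x <- c(1, 2, NA, 3)
no_extra <- center(x)                # mean(x) is NA, so every result is NA
with_na_rm <- center(x, na.rm = TRUE)  # na.rm travels through ... to mean()
```

The function never mentions na.rm itself, so any other argument that mean() accepts (trim, for example) can be passed the same way.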

Dropping the Braces

If your function code is really short, you can skip the braces, and it will still work:

add_pi <- function(x) x + pi
add_pi(cos(pi/2))
## [1] 3.141593

But it is hard to write robust, well-documented functions this way.

Ephemeral Functions

Sometimes you don't write a function to last, and a sloppy function is good enough.

sapply(mtcars, FUN = function(x) median(abs(x - median(x))))
##     mpg     cyl    disp      hp    drat      wt    qsec      vs      am 
##  3.6500  2.0000 94.7500 52.0000  0.4750  0.5175  0.9550  0.0000  0.0000 
##    gear    carb 
##  1.0000  1.0000
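
When a throwaway function like this gets reused, it is worth checking it against something known. Here is a sketch that runs the same anonymous function through vapply() (which also checks that each result is a single number) and compares it with base R's mad() using constant = 1:

```r
# The same ephemeral function, but vapply() enforces one numeric value per column.
by_hand <- vapply(mtcars,
                  FUN = function(x) median(abs(x - median(x))),
                  FUN.VALUE = numeric(1))

# mad() with constant = 1 computes the same unscaled median absolute deviation.
built_in <- vapply(mtcars, FUN = mad, FUN.VALUE = numeric(1), constant = 1)

all.equal(by_hand, built_in)
```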

Watch out!

If you don't pay attention, your function will do things that you don't want. Try to keep your functions narrowly focused:

  • It is easier to write a small function that does one simple thing than a big function that does something complex
  • So, if you have a complex task, write a bunch of simple functions first, then put them together (perhaps in a bigger function)
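
As one possible illustration, the outlier logic from earlier could be split into two small pieces and a thin wrapper. The names robust_z(), flag_extreme(), and outlier_index() are made up for this sketch, and the test vector is invented:

```r
# Step 1: robust z-scores (hypothetical helper).
robust_z <- function(x, na.rm = TRUE) {
  (x - median(x, na.rm = na.rm)) / mad(x, na.rm = na.rm)
}

# Step 2: indices of scores beyond the criterion (hypothetical helper).
flag_extreme <- function(z, crit = 4) {
  which(abs(z) > crit)
}

# The bigger function just glues the simple ones together.
outlier_index <- function(x, crit = 4, na.rm = TRUE) {
  flag_extreme(robust_z(x, na.rm = na.rm), crit = crit)
}

x <- c(10, 12, 11, 13, 12, 11, 100)
outlier_index(x)
```

Each piece can be tested on its own, which makes it easier to find out which step is misbehaving.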

Watch out for the unexpected!

Environments

R objects are organized into environments. Usually your code runs in the global environment. Your function can reference objects in the global environment . . .

z <- 5
double_z <- function() {
  return(z * 2)
}
double_z()
## [1] 10

but it is usually better to pass objects as arguments.
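
Here is a short sketch of why: a function that reads z from the global environment gives different answers to the same call, depending on what happened to z in between.

```r
z <- 5
double_z <- function() {
  return(z * 2)   # z is looked up outside the function each time it runs
}

first <- double_z()

z <- 100          # somebody changes z somewhere else in the script
second <- double_z()

c(first, second)  # same call, different results
```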

Function Environments

But, your function code is evaluated in its own environment -- so objects that you create or modify generally have no effect on objects in the global environment:

z <- 5
double_z <- function() {
  z <- z * 2
  return(z)
}
double_z()
## [1] 10
z
## [1] 5

Black Magic

Modifying or creating objects outside the function environment is black magic, and should be avoided...

because it's real.
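
For the curious, one form the black magic can take is the superassignment operator <<-, which assigns to a variable outside the function. The sketch below (bump_magic() is a made-up name) is shown only so you can recognize it, not as a recommendation:

```r
counter <- 0

# bump_magic() is a hypothetical example of what to avoid.
bump_magic <- function() {
  counter <<- counter + 1   # <<- modifies counter outside the function
  invisible(NULL)
}

bump_magic()
bump_magic()
counter   # changed, even though nothing was assigned at the call site
```

Nothing at the call site hints that counter changes, which is what makes this kind of code hard to debug.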

The Obvious Alternative

Assign the function result to an object in the global environment

z <- 5
double_it <- function(x) {
  return(x * 2)
}
z <- double_it(z)
z
## [1] 10

Any Questions?

https://github.com/NickSalkowski/Writing_Functions_20170316
