Giter Site home page Giter Site logo

sqlutils's Introduction

The sqlutils Package

The sqlutils package provides a set of utility functions to help manage a library of structured query language (SQL) files. The package can be installed from Github using the devtools package.

require(devtools)
install_github('sqlutils', 'jbryer')

The sqlutils package provides functions to document, cache, and execute SQL queries. The location of the SQL files is determined by the sqlPaths() function. This function behaves in a manner consistent with the .libPaths() function. By default, a single path will be defined being the data directory where the sqlutils package is installed.

> sqlPaths()
[1] "/Users/jbryer/R/sqlutils/data"

Additional search paths can be added using sqlPaths('/Path/To/SQL/Files'). By convention, sqlutils will work with any plain text files with a .sql file extention in any of the directories returned from sqlPaths(). In the case of multiple files with the same name, first one wins.

In addition to working with a library (directory) of SQL files, sqlutils recognizes roxygen2 style documentation. The StudentsInRange script (located in the data directory of the installed package), exemplifies how to create a SQL query with two parameters as well as how to define those parameters and provide default values. Default values are used when the user fails to supply values within the execQuery or cacheQuery functions (described in detail bellow). The available documenations tages are:

  • @param paramName - This provides a description of the parameter.
  • @default paramName - This defines the default value. This can be any valid R statement.
  • @return columnName - Provides documentation for any returned columns.

The contents of the StudentsInRange query follows:

#' Students enrolled within the given date range.
#' 
#' @param startDate the start of the date range to return students.
#' @default startDate format(Sys.Date(), '%Y-01-01')
#' @param endDate the end of the date range to return students.
#' @default endDate format(Sys.Date(), '%Y-%m-%d')
#' @return CreatedDate the date the row was added to the warehouse data.
#' @return StudentId the student id.
SELECT * 
FROM students 
WHERE CreatedDate >= ':startDate:' AND CreatedDate <= ':endDate:'

It should be noted that parameters are replaced just before executing the query and must be contained with a pair of colons (:) and be valid R object names (i.e. not start with a number, contain spaces, or special characters).

We can now retrieve the documentation from within R using the sqldoc command.

> sqldoc('StudentsInRange')
Students enrolled within the given date range.
Parameters:
     param                                            desc                        default default.val
 startDate the start of the date range to return students. format(Sys.Date(), '%Y-01-01')  2012-01-01
   endDate   the end of the date range to return students. format(Sys.Date(), '%Y-%m-%d')  2012-11-19
Returns (note that this list may not be complete):
    variable                                              desc
 CreatedDate the date the row was added to the warehouse data.
   StudentId                                   the student id.

The required parameters can also be retrieved using the getParameters function.

> getParameters('StudentsInRange')
[1] "startDate" "endDate"

In the case there are no parameters, an empty character vector is returned.

> getParameters('StudentSummary')
character(0)

A list of all available queries is returned using the getQueries() function.

> getQueries()
 [1] "StudentsInRange" "StudentSummary" 

There are two functions available to execute queries, execQuery and cacheQuery. The former will send the SQL query to the database upon every execution. The latter however, maintains a local cached version (as a CSV or Rda file) of the resulting data frame. Specifically, the function creates a unique filename based upon the query name and parameters (see getCacheFilename function; this can also be overwritten using the filename parameter). If that file exists in specified directory (the current working directory by default), then it reads the file from disk and returns that. If the file does not exist, then execQuery is called, the result data frame saved to disk, and then the data frame is returned. The following complete example loads the students data frame from the retention package, saves it to a SQLite database, and executes the two included queries.

> require(RSQLite)
> sqlfile <- paste(system.file(package='sqlutils'), '/db/students.db', sep='')
> m <- dbDriver("SQLite")
> conn <- dbConnect(m, dbname=sqlfile)
> q1 <- execQuery('StudentSummary', connection=conn)
> head(q1)
  CreatedDate count
1  2002-07-15  8365
2  2002-08-15  8251
3  2002-09-15  8259
4  2002-10-15  8258
5  2002-11-15  8151
6  2002-12-15  8415

Supported databases

The sqlutils package supports database access using the RODBC, RSQLite, RPostgreSQL, and RMySQL packages using an S3 generic function call called sqlexec based upon the class of the connection parameter. For example, create a new database connection for connections of class foo, the following provides the skeleton of the function to implement:

sqlexec.foo <- function(connection, sql, ...) {
	#Database implementation here.
	#The ... will be passed through from the execQuery call. 
}

sqlutils's People

Contributors

jbryer avatar compstats avatar

Watchers

Keith Twombley avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.