Giter Site home page Giter Site logo

queryparser's Introduction

queryparser

CRAN status Travis build status AppVeyor build status Codecov test coverage

queryparser translates SQL queries into lists of unevaluated R expressions.

⚠️ Most R users should not directly use queryparser. Instead, use it through tidyquery.

For an introduction to tidyquery and queryparser, watch the recording of the talk “Bridging the Gap between SQL and R” from rstudio::conf(2020).

Installation

Install the released version of queryparser from CRAN with:

install.packages("queryparser")

Or install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("ianmcook/queryparser")

Usage

Call the function parse_query(), passing a SELECT statement enclosed in quotes as the first argument:

library(queryparser)

parse_query("SELECT DISTINCT carrier FROM flights WHERE dest = 'HNL'")
#> $select
#> $select[[1]]
#> carrier
#> 
#> attr(,"distinct")
#> [1] TRUE
#> 
#> $from
#> $from[[1]]
#> flights
#> 
#> 
#> $where
#> $where[[1]]
#> dest == "HNL"

Queries can include the clauses SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT:

parse_query(
" SELECT origin, dest,
    COUNT(flight) AS num_flts,
    round(SUM(seats)) AS num_seats,
    round(AVG(arr_delay)) AS avg_delay
  FROM flights f LEFT OUTER JOIN planes p
    ON f.tailnum = p.tailnum
  WHERE distance BETWEEN 200 AND 300
    AND air_time IS NOT NULL
  GROUP BY origin, dest
  HAVING num_flts > 3000
  ORDER BY num_seats DESC, avg_delay ASC
  LIMIT 2;"
)
#> $select
#> $select[[1]]
#> origin
#> 
#> $select[[2]]
#> dest
#> 
#> $select$num_flts
#> sum(!is.na(flight), na.rm = TRUE)
#> 
#> $select$num_seats
#> round(sum(seats, na.rm = TRUE))
#> 
#> $select$avg_delay
#> round(mean(arr_delay, na.rm = TRUE))
#> 
#> attr(,"aggregate")
#>                      num_flts num_seats avg_delay 
#>     FALSE     FALSE      TRUE      TRUE      TRUE 
#> 
#> $from
#> $from$f
#> flights
#> 
#> $from$p
#> planes
#> 
#> attr(,"join_types")
#> [1] "left outer join"
#> attr(,"join_conditions")
#> attr(,"join_conditions")[[1]]
#> f.tailnum == p.tailnum
#> 
#> 
#> $where
#> $where[[1]]
#> (distance >= 200 & distance <= 300) & !is.na(air_time)
#> 
#> 
#> $group_by
#> $group_by[[1]]
#> origin
#> 
#> $group_by[[2]]
#> dest
#> 
#> 
#> $having
#> $having[[1]]
#> num_flts > 3000
#> 
#> 
#> $order_by
#> $order_by[[1]]
#> -xtfrm(num_seats)
#> 
#> $order_by[[2]]
#> avg_delay
#> 
#> attr(,"aggregate")
#> [1] FALSE FALSE
#> 
#> $limit
#> $limit[[1]]
#> [1] 2
#> 
#> 
#> attr(,"aggregate")
#> [1] TRUE

Set the argument tidyverse to TRUE to use functions from tidyverse packages including dplyr, stringr, and lubridate in the R expressions:

parse_query("SELECT COUNT(*) AS n FROM t WHERE x BETWEEN y AND z ORDER BY n DESC", tidyverse = TRUE)
#> $select
#> $select$n
#> dplyr::n()
#> 
#> attr(,"aggregate")
#>    n 
#> TRUE 
#> 
#> $from
#> $from[[1]]
#> t
#> 
#> 
#> $where
#> $where[[1]]
#> dplyr::between(x, y, z)
#> 
#> 
#> $order_by
#> $order_by[[1]]
#> dplyr::desc(n)
#> 
#> attr(,"aggregate")
#> [1] FALSE
#> 
#> attr(,"aggregate")
#> [1] TRUE

queryparser will translate only explicitly allowed functions and operators, preventing injection of malicious code:

parse_query("SELECT x FROM y WHERE system('rm -rf /')")
#> Error: Unrecognized function or operator: system

Current Limitations

queryparser does not currently support:

  • Subqueries
  • Unions
  • SQL-89-style (implicit) join notation
  • The WITH clause (common table expressions)
  • OVER expressions (window or analytic functions)
  • Some SQL functions and operators

queryparser currently has the following known limitations:

  • Some SQL expressions will translate only when tidyverse is set to TRUE. An example of this is COUNT(DISTINCT ...) expressions with multiple arguments.
  • When logical operators (such as IS NULL) have unparenthesized expressions as their operands, R will interpret the resulting code using a different order of operations than a SQL engine would. When using an expression as the operand to a logical operator, always enclose the expression in parentheses.
  • The error messages that occur when attempting to parse invalid or unrecognized SQL are often non-informative.

Non-Goals

queryparser is not intended to:

  • Translate other types of SQL statements (such as INSERT or UPDATE)
  • Customize translations for specific SQL dialects
  • Fully validate the syntax of the SELECT statements passed to it
  • Efficiently process large batches of queries
  • Facilitate the analysis of queries (for example, to identify patterns)

Related Work

The sqlparseR package (CRAN) provides a wrapper around the Python module sqlparse.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.