Comments (14)
@edgararuiz commented on Sep 6, 2017, 12:46 AM UTC:
Hi @pssguy , it looks like this is a nuance of the queries MSSQL accepts. In order for it to work, the query from test1
that originally translates to this:
# <SQL>
# SELECT "Sepal.Length" AS "Sepal.Length"
# FROM (SELECT *
# FROM "iris"
# ORDER BY "Species") "omlcgfsrjt"
Should actually translate to this:
# <SQL>
# SELECT "Sepal.Length" AS "Sepal.Length"
# FROM "iris"
# ORDER BY "Species"
We will need to figure a way to optimize this query.
from dbplyr.
@pssguy commented on Sep 7, 2017, 10:48 PM UTC:
@edgararuiz Thanks. Not just optimizing, of course. Currently an error is thrown
from dbplyr.
@edgararuiz commented on Sep 7, 2017, 11:17 PM UTC:
Right, optimize was probably the wrong word choice, I meant we'll need to figure out a way to merge the two SQL query layers to prevent the error from happening.
from dbplyr.
@imanuelcostigan commented on Oct 18, 2017, 9:04 PM UTC:
@edgararuiz you may want to look at RSQLServer's sql_select()
method which deals with a number of SQL Server's SELECT
idiosyncracies.
from dbplyr.
@edgararuiz commented on Oct 19, 2017, 12:43 AM UTC:
Thanks @imanuelcostigan ! I'll take a look
from dbplyr.
@hadley commented on Oct 23, 2017, 4:41 PM UTC:
@edgararuiz do you want to work on a PR for this issue?
from dbplyr.
@edgararuiz commented on Oct 23, 2017, 4:51 PM UTC:
Yes, I'll be happy to
from dbplyr.
@hadley commented on Nov 2, 2017, 8:47 PM UTC:
Minimal reprex:
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mf <- copy_to(con, data.frame(x = 1:5, y = 5:1), name = "test")
mf %>%
arrange(x) %>%
select(y) %>%
show_query()
#> <SQL>
#> SELECT `y`
#> FROM (SELECT *
#> FROM `test`
#> ORDER BY `x`)
DBI::dbGetQuery(con, "SELECT y FROM test ORDER BY x")
#> y
#> 1 5
#> 2 4
#> 3 3
#> 4 2
#> 5 1
Ideally this would only generate one query because conceptually the select happens after the arrange.
I think that implies we can fix this issue by reordering select_query_clauses()
, which currently looks like this:
present <- c(
where = length(x$where) > 0,
group_by = length(x$group_by) > 0,
having = length(x$having) > 0,
select = !identical(x$select, sql("*")),
distinct = x$distinct,
order_by = length(x$order_by) > 0,
limit = !is.null(x$limit)
)
Currently select
comes before arrange
when really it should come afterwards.
And indeed if we move select
to the end then we get:
SELECT `y`
FROM `test`
ORDER BY `x`
It remains to consider if this is actually correct - i.e. are there situations when this change would yield invalid SQL
from dbplyr.
@hadley commented on Nov 2, 2017, 9:13 PM UTC:
Ah I think the problem with performing this optimisation is this query:
memdb_frame(x = 1:2) %>%
arrange(x) %>%
mutate(x = -x)
This should return c(-1, -2)
, but if we collapse the query as described above we generate:
SELECT -`x` AS `x`
FROM `gatlqlicge`
ORDER BY `x`
which yields c(-2, -1)
because the ORDER BY clause uses aliases defined in SELECT (as described in https://sqlbolt.com/lesson/select_queries_order_of_execution). This means that this optimisation is not possible in general.
But this is a mutate()
and the motivation issue is a select()
. Can we perform the optimisation at a higher level? I think the answer is no, because select()
s can rename variables and this SQL would still be incorrect:
memdb_frame(x = 1:2, y = 3:2) %>%
arrange(x) %>%
select(x = y) %>%
show_query()
#> SELECT `y` AS `x`
#> FROM `bmvfznmfws`
#> ORDER BY `x`
from dbplyr.
@hadley commented on Nov 2, 2017, 9:18 PM UTC:
But maybe we can just do the optimisation when the select
doesn't create any aliases
from dbplyr.
@hadley commented on Nov 2, 2017, 9:33 PM UTC:
I tried that and couldn't get it to work π’
from dbplyr.
Reprex of original problem:
library(DBI)
library(dplyr)
con <- dbConnect(odbc::odbc(), "SQL Server", database = "airontime")
x <- tbl(con, "airlines")
x %>%
arrange(carrier) %>%
select(name) %>%
head()
#> Error in new_result(connection@ptr, statement) :
#> nanodbc/nanodbc.cpp:1344: HY000: The ORDER BY clause is invalid in views, inline functions,
#> derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is
#> also specified., Statement(s) could not be prepared.,
Because this is the generated SQL:
SELECT TOP 6 "name" FROM
(SELECT * FROM "airlines" ORDER BY "carrier") "jmausnbmvw"
so another option might be to try and push TOP
down into the lower level.
from dbplyr.
Note that the problem also occurs for other queries:
x %>%
arrange(carrier) %>%
select(name) %>%
mutate(name = substr(name, 1, 1)) %>%
collect()
i.e. it's the ORDER BY
in the subquery that's the problem, not the TOP
.
from dbplyr.
Having read through the MS SQL docs, I don't think there's anything the dbplyr can do about this β you just need to make sure that arrange()
is always the last step in the pipe.
It would be nice if we could give a better error message here, but there's no easy way to do it, and given that no one else has commented on this issue, it seems unlikely to be a common problem.
from dbplyr.
Related Issues (20)
- Incorrect translation of dates with Oracle ODBC. HOT 1
- Pre-release revdeps HOT 2
- Helper to get `last_sql()`
- str_detect() and str_replace() not treating period (.) detection same in snowflake translation HOT 3
- `paste0()` incorrectly uses a `sep` argument
- Snowflake translation error: dropped filter with `anti_join` HOT 3
- Filter by a column in another table HOT 5
- Release dbplyr 2.5.0
- rstudio autocomplete slows down from 2.3.4 -> 2.4.0
- Breaking changes in dbplyr 2.5.0 HOT 9
- `distinct()` in Databricks/SparkSQL causes "arrange()... __row_num_*" error
- `head()` modifies `SELECT *`
- Export `check_na_rm()`
- rows_upsert worked in 2.4.0, broke with 2.5.0 HOT 2
- `sql` in `in_catalog` breaks with `filter` HOT 1
- Regression in window_order variable argument parsing for 2.5.0
- Unable to use quantile() in mutate() in DuckDB HOT 4
- Possible inconsistency in translation of stringr::str_like() with respect to ignore_case argument HOT 2
- postition of head in pipeline no longer leading to different sql HOT 3
- rows_delete fails due to key constraints: Canβt Modify Database Table
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dbplyr.