Comments (15)
@edgararuiz commented on Sep 4, 2017, 6:17 PM UTC:
Hi @emilyriederer , does a regular using forward slash by itself works in you environment:
data <- mutate(data, z = x / 5)
from dbplyr.
@emilyriederer commented on Sep 15, 2017, 1:29 AM UTC:
Hi @edgararuiz -- unfortunately not.
It might work just as happenstance due to integer division, but the SQL translation also turns 5 into a floating point:
SELECT "x", "x" / 5.0 AS "z" FROM "data_db"
from dbplyr.
@edgararuiz commented on Sep 15, 2017, 1:31 AM UTC:
Ok, can we try appending an "L" to the right of 5?
data <- mutate(data, z = x / 5L)
from dbplyr.
@hadley commented on Oct 23, 2017, 4:43 PM UTC:
Minimal reprex
dbplyr::translate_sql(x %/% 5)
#> <SQL> "x" / 5.0
from dbplyr.
@hadley commented on Oct 23, 2017, 4:46 PM UTC:
To make equivalent to 1 %/% 0.2
will probably need to implement via modulo arithmetic.
x == (x %/% m) * m + (x %% m)
x %/% m == (x - (x %% m)) / m
from dbplyr.
@hadley commented on Oct 23, 2017, 10:35 PM UTC:
But this needs to be thought through correctly - I have a vague recollection that negative values might cause issues.
from dbplyr.
@hadley commented on Oct 24, 2017, 10:11 PM UTC:
sql_int_div <- function() {
function(x, m) {
build_sql("((", x, " - (", x, " % ", m, ")) / ", m, ")")
}
}
That definition gets us pretty close, but it turns out that %%
in R and %
in SQL have slightly different semantics when the signs are different:
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
df <- tibble(
x = c(10, 10, -10, -10),
y = c(3, -3, 3, -3)
)
df %>% mutate(x %% y, x %/% y)
#> # A tibble: 4 x 4
#> x y `x%%y` `x%/%y`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 10 3 1 3
#> 2 10 -3 -2 -4
#> 3 -10 3 2 -4
#> 4 -10 -3 -1 3
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mf <- con %>% copy_to(df)
mf %>% mutate(x %% y, x %/% y)
#> # Source: lazy query [?? x 4]
#> # Database: sqlite 3.19.3 [:memory:]
#> x y `x%%y` `x%/%y`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 10 3 1 3
#> 2 10 -3 1 -3
#> 3 -10 3 -1 -3
#> 4 -10 -3 -1 3
I'm not sure how to handle this :(
from dbplyr.
@alex-gable commented on Jan 26, 2018, 8:44 AM UTC:
This is a result of C98 and beyond (and by extension SQLite) using truncated division where the modulo operator takes the sign of the dividend, and R using the mathematically preferred floored division with the modulo sign taking the sign of the divisor.
Quite frankly, C(SQLite) and R are doing fundamentally different arithmetic. There's some fascinating reading on the subject here and of course an abridged version on Wikipedia. This will likely also vary across SQL dialects, which makes it more difficult to pin down in a unified way.
Given the above, I'm not sure it is reasonable to expect equivalent output in every language. Python covered some of the complexities in this discussion in PEP-228: Reworking Python's Numeric Model and PEP-238: Changing the Division Operator, highlighting that this is not just a "dplyr issue", but rather a significantly larger architectural decision in R and computer arithmetic itself.
@emilyriederer, In the case of RedShift, a python udf could be constructed in your database leveraging numpy to replicate the output from R.
from dbplyr.
@hadley commented on Jan 26, 2018, 1:37 PM UTC:
@alex-gable thanks for that awesome summary of the problem!
from dbplyr.
@hadley commented on Jun 7, 2018, 11:37 PM UTC:
I think the best way to handle this is simple to document it.
from dbplyr.
@emilyriederer commented on Jun 9, 2018, 6:51 PM UTC:
Wow - thank you all for the very helpful, detailed responses. All of the context here is fascinating. I'm embarrassed to discover that I completely "went dark" on this thread. Somehow, I'm not getting notifications but luckily spotted this atop the new GitHub feed. Thanks again!
from dbplyr.
I was able to force non-integer operation by adding 0.0
to one of the variables. In contrast, as.numeric
seemed to have no effect:
library(dplyr)
data <- data.frame(x = as.integer(1:5), y = as.integer(5:1))
data_db <- copy_to(dbplyr::src_memdb(), data, "data_db")
mutate(data_db, z = as.numeric(x)/y)
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.22.0 [:memory:]
#> x y z
#> <int> <int> <int>
#> 1 1 5 0
#> 2 2 4 0
#> 3 3 3 1
#> 4 4 2 2
#> 5 5 1 5
mutate(data_db, z = (x + 0)/y)
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.22.0 [:memory:]
#> x y z
#> <int> <int> <dbl>
#> 1 1 5 0.2
#> 2 2 4 0.5
#> 3 3 3 1
#> 4 4 2 2
#> 5 5 1 5
from dbplyr.
Oh but there’s a DIV function we could use instead
from dbplyr.
- SQL server: need to cast (https://stackoverflow.com/questions/3443672)
- mysql:
DIV
https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#operator_div; but example has different results to R,c(5 %/% 2, -5 %/% 2, 5 %/% -2, -5 %/% 2))
->c(2, -3, -3, -3)
- postgres:
/
truncates.
So I think overall this is just too complicated, and not worth providing a translation. I'll just make %/%
return a clean error (since otherwise it's translated to /
, which isn't correct)`
from dbplyr.
FYI, @hadley, it seems that the wanted results can be achieved by using FLOOR(x/y)
for the integer division and by x-y*FLOOR(x/y)
for the modulo.
from dbplyr.
Related Issues (20)
- Breaking changes in dbplyr 2.5.0 HOT 9
- `distinct()` in Databricks/SparkSQL causes "arrange()... __row_num_*" error
- `head()` modifies `SELECT *`
- Export `check_na_rm()`
- rows_upsert worked in 2.4.0, broke with 2.5.0 HOT 2
- `sql` in `in_catalog` breaks with `filter` HOT 1
- Regression in window_order variable argument parsing for 2.5.0
- Unable to use quantile() in mutate() in DuckDB HOT 4
- Possible inconsistency in translation of stringr::str_like() with respect to ignore_case argument HOT 2
- postition of head in pipeline no longer leading to different sql HOT 3
- rows_delete fails due to key constraints: Can’t Modify Database Table
- dbplyr `across()` behavior differs from dplyr in a grouped context HOT 1
- Adding translations for clock::date_count_between()
- `rand_expr` is undocumented
- overflow error when counting very large tables in sql server
- Unusual behaviour of dplyr::pull() HOT 2
- Make compute docs easier to understand
- Supporting persisted tables for Spark SQL backend
- MSSQL slice_sample() translation always returns the same rows
- na_matches="na" coerces inequality and overlap joins to equality joins
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dbplyr.