<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

dbplyr does not translate base R integer division %/% about dbplyr HOT 15 CLOSED

tidyverse commented on May 29, 2024

dbplyr does not translate base R integer division %/%

from dbplyr.

Comments (15)

commented on May 29, 2024

@edgararuiz commented on Sep 4, 2017, 6:17 PM UTC:

Hi @emilyriederer , does a regular using forward slash by itself works in you environment:

data <- mutate(data, z = x / 5)

from dbplyr.

commented on May 29, 2024

@emilyriederer commented on Sep 15, 2017, 1:29 AM UTC:

Hi @edgararuiz -- unfortunately not.

It might work just as happenstance due to integer division, but the SQL translation also turns 5 into a floating point:

SELECT "x", "x" / 5.0 AS "z" FROM "data_db"

from dbplyr.

commented on May 29, 2024

@edgararuiz commented on Sep 15, 2017, 1:31 AM UTC:

Ok, can we try appending an "L" to the right of 5?

data <- mutate(data, z = x / 5L)

from dbplyr.

commented on May 29, 2024

@hadley commented on Oct 23, 2017, 4:43 PM UTC:

Minimal reprex

dbplyr::translate_sql(x %/% 5)
#> <SQL> "x" / 5.0

from dbplyr.

commented on May 29, 2024

@hadley commented on Oct 23, 2017, 4:46 PM UTC:

To make equivalent to 1 %/% 0.2 will probably need to implement via modulo arithmetic.

x == (x %/% m) * m + (x %% m)  
x %/% m == (x - (x %% m)) / m

from dbplyr.

commented on May 29, 2024

@hadley commented on Oct 23, 2017, 10:35 PM UTC:

But this needs to be thought through correctly - I have a vague recollection that negative values might cause issues.

from dbplyr.

commented on May 29, 2024

@hadley commented on Oct 24, 2017, 10:11 PM UTC:

sql_int_div <- function() {
  function(x, m) {
    build_sql("((", x, " - (", x, " % ", m, ")) / ", m, ")")
  }
}

That definition gets us pretty close, but it turns out that %% in R and % in SQL have slightly different semantics when the signs are different:

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

df <- tibble(
  x = c(10, 10, -10, -10), 
  y = c(3, -3, 3, -3)
)
df %>% mutate(x %% y, x %/% y)
#> # A tibble: 4 x 4
#>       x     y `x%%y` `x%/%y`
#>   <dbl> <dbl>  <dbl>   <dbl>
#> 1    10     3      1       3
#> 2    10    -3     -2      -4
#> 3   -10     3      2      -4
#> 4   -10    -3     -1       3

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mf <- con %>% copy_to(df)
mf %>% mutate(x %% y, x %/% y)
#> # Source:   lazy query [?? x 4]
#> # Database: sqlite 3.19.3 [:memory:]
#>       x     y `x%%y` `x%/%y`
#>   <dbl> <dbl>  <dbl>   <dbl>
#> 1    10     3      1       3
#> 2    10    -3      1      -3
#> 3   -10     3     -1      -3
#> 4   -10    -3     -1       3

I'm not sure how to handle this :(

from dbplyr.

commented on May 29, 2024

@alex-gable commented on Jan 26, 2018, 8:44 AM UTC:

This is a result of C98 and beyond (and by extension SQLite) using truncated division where the modulo operator takes the sign of the dividend, and R using the mathematically preferred floored division with the modulo sign taking the sign of the divisor.

Quite frankly, C(SQLite) and R are doing fundamentally different arithmetic. There's some fascinating reading on the subject here and of course an abridged version on Wikipedia. This will likely also vary across SQL dialects, which makes it more difficult to pin down in a unified way.

Given the above, I'm not sure it is reasonable to expect equivalent output in every language. Python covered some of the complexities in this discussion in PEP-228: Reworking Python's Numeric Model and PEP-238: Changing the Division Operator, highlighting that this is not just a "dplyr issue", but rather a significantly larger architectural decision in R and computer arithmetic itself.

@emilyriederer, In the case of RedShift, a python udf could be constructed in your database leveraging numpy to replicate the output from R.

from dbplyr.

commented on May 29, 2024

@hadley commented on Jan 26, 2018, 1:37 PM UTC:

@alex-gable thanks for that awesome summary of the problem!

from dbplyr.

commented on May 29, 2024

@hadley commented on Jun 7, 2018, 11:37 PM UTC:

I think the best way to handle this is simple to document it.

from dbplyr.

commented on May 29, 2024

@emilyriederer commented on Jun 9, 2018, 6:51 PM UTC:

Wow - thank you all for the very helpful, detailed responses. All of the context here is fascinating. I'm embarrassed to discover that I completely "went dark" on this thread. Somehow, I'm not getting notifications but luckily spotted this atop the new GitHub feed. Thanks again!

from dbplyr.

jsta commented on May 29, 2024

I was able to force non-integer operation by adding 0.0 to one of the variables. In contrast, as.numeric seemed to have no effect:

library(dplyr)

data <- data.frame(x = as.integer(1:5), y = as.integer(5:1))
data_db <- copy_to(dbplyr::src_memdb(), data, "data_db")

mutate(data_db, z = as.numeric(x)/y)
#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.22.0 [:memory:]
#>       x     y     z
#>   <int> <int> <int>
#> 1     1     5     0
#> 2     2     4     0
#> 3     3     3     1
#> 4     4     2     2
#> 5     5     1     5
mutate(data_db, z = (x + 0)/y)
#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.22.0 [:memory:]
#>       x     y     z
#>   <int> <int> <dbl>
#> 1     1     5   0.2
#> 2     2     4   0.5
#> 3     3     3   1  
#> 4     4     2   2  
#> 5     5     1   5

from dbplyr.

hadley commented on May 29, 2024

Oh but there’s a DIV function we could use instead

from dbplyr.

hadley commented on May 29, 2024

SQL server: need to cast (https://stackoverflow.com/questions/3443672)
mysql: DIV https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#operator_div; but example has different results to R, c(5 %/% 2, -5 %/% 2, 5 %/% -2, -5 %/% 2)) -> c(2, -3, -3, -3)
postgres: / truncates.

So I think overall this is just too complicated, and not worth providing a translation. I'll just make %/% return a clean error (since otherwise it's translated to /, which isn't correct)`

from dbplyr.

rsund commented on May 29, 2024

FYI, @hadley, it seems that the wanted results can be achieved by using FLOOR(x/y) for the integer division and by x-y*FLOOR(x/y) for the modulo.

from dbplyr.

dbplyr does not translate base R integer division %/% about dbplyr HOT 15 CLOSED

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent