Giter Site home page Giter Site logo

Comments (11)

 avatar commented on June 11, 2024

@krlmlr commented on Dec 27, 2017, 8:41 AM UTC:

Thanks. I see your point, it seems that warning before executing the query will be difficult. We should be working on assigning unique column names, perhaps with tibble::repair_names() tibble::set_tidy_names().

from dbplyr.

 avatar commented on June 11, 2024

@bkkkk commented on Dec 28, 2017, 2:09 AM UTC:

I suspected it would be, anyways I think that having the names repaired probably fits better with other dplyr data ingestion / tibble creation workflows.

from dbplyr.

 avatar commented on June 11, 2024

@edgararuiz commented on Dec 28, 2017, 2:07 PM UTC:

Hi, if you use the native dbplyr join command, it will add a .x and .y suffix to your results automatically so you won't encounter that issue, here is an example:

library(dplyr)
library(DBI)
library(tibble)
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")

mtcars %>%
  rowid_to_column() %>%
  dbWriteTable(con, "mtcars", .)

tbl(con, "mtcars") %>%
  inner_join(tbl(con, "mtcars"), by = "rowid" ) %>%
  collect() %>%
  View()

dbDisconnect(con)

I would encourage you to try something like this:

tbl(con, "consumers") %>%
  select(main.id,
         main.phone,
         others.phone) %>%
  inner_join(tbl(con, "consumers", by = c("others.id" = "main.id"))) %>%
  collect()

from dbplyr.

 avatar commented on June 11, 2024

@bkkkk commented on Jan 2, 2018, 11:26 PM UTC:

@edgararuiz thanks for the suggestion but we use SQL queries for interoperability and it lets non-R users review my code. It's also not limited to joins, my example uses a join because the code I was working on before I posted this issue had a join.

The issue will also arise with the following query, you'll get 2 columns named date_trunc, and 2 named sum:

SELECT
   date_trunc('month', SOME_DATE) :: DATE,
   date_trunc('day', SOME_OTHER_DATE) :: DATE,
   sum(SOME_AMOUNT),
   sum(SOME_OTHER_AMOUNT)
FROM
   SOME_TABLE
GROUP BY 1, 2

This issue is a dplyr-specific issue that makes SQL workflows just a little more annoying for no particularly good reason, especially given that other parts of the tidyverse, like your example above, cleans up column names to avoid duplicates.

from dbplyr.

 avatar commented on June 11, 2024

@edgararuiz commented on Jan 3, 2018, 12:17 AM UTC:

Ok, for that, I'd use the x and y suffix in the next command, and then use show_query() to display the resulting SQL so that others can check my work. I'm sure I'm missing something, maybe if you can provide an example of how it would work with a local data.frame would be great!

> library(dplyr)
> library(DBI)
> library(tibble)
> con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
> 
> mtcars %>%
+   rowid_to_column() %>%
+   dbWriteTable(con, "mtcars", .)
> 
> new <- tbl(con, "mtcars") %>%
+   inner_join(tbl(con, "mtcars"), by = "rowid" ) %>%
+   group_by(am.x, am.y) %>%
+   summarise(sum(wt.x),
+             sum(wt.y)) 
> head(new)
# Source: lazy query [?? x 4]
# Database: sqlite 3.19.3 []
# Groups: am.x
   am.x  am.y `sum(wt.x)` `sum(wt.y)`
  <dbl> <dbl>       <dbl>       <dbl>
1  0     0           71.6        71.6
2  1.00  1.00        31.3        31.3
> 
> show_query(new)
<SQL>
SELECT `am.x`, `am.y`, SUM(`wt.x`) AS `sum(wt.x)`, SUM(`wt.y`) AS `sum(wt.y)`
FROM (SELECT `TBL_LEFT`.`rowid` AS `rowid`, `TBL_LEFT`.`mpg` AS `mpg.x`, `TBL_LEFT`.`cyl` AS `cyl.x`, `TBL_LEFT`.`disp` AS `disp.x`, `TBL_LEFT`.`hp` AS `hp.x`, `TBL_LEFT`.`drat` AS `drat.x`, `TBL_LEFT`.`wt` AS `wt.x`, `TBL_LEFT`.`qsec` AS `qsec.x`, `TBL_LEFT`.`vs` AS `vs.x`, `TBL_LEFT`.`am` AS `am.x`, `TBL_LEFT`.`gear` AS `gear.x`, `TBL_LEFT`.`carb` AS `carb.x`, `TBL_RIGHT`.`mpg` AS `mpg.y`, `TBL_RIGHT`.`cyl` AS `cyl.y`, `TBL_RIGHT`.`disp` AS `disp.y`, `TBL_RIGHT`.`hp` AS `hp.y`, `TBL_RIGHT`.`drat` AS `drat.y`, `TBL_RIGHT`.`wt` AS `wt.y`, `TBL_RIGHT`.`qsec` AS `qsec.y`, `TBL_RIGHT`.`vs` AS `vs.y`, `TBL_RIGHT`.`am` AS `am.y`, `TBL_RIGHT`.`gear` AS `gear.y`, `TBL_RIGHT`.`carb` AS `carb.y`
  FROM `mtcars` AS `TBL_LEFT`
  INNER JOIN `mtcars` AS `TBL_RIGHT`
  ON (`TBL_LEFT`.`rowid` = `TBL_RIGHT`.`rowid`)
)
GROUP BY `am.x`, `am.y`

from dbplyr.

 avatar commented on June 11, 2024

@bkkkk commented on Jan 3, 2018, 5:00 AM UTC:

These queries form part of very large shiny applications and other automated reports which are all in github, where the code reviews happen. Using show_query is simply not practical. In addition, and no offense meant at all, but the result of show_query are not as easy to read as a hand-written queries.

This is about a small ease of use improvement when using tbl directly with SQL queries. I think @krlmlr has a sensible approach, instead of dumping the results of a potentially long running query, warn and repair the names.

You can reproduce this with a local data frame but the truth is that the situation is totally artificial since you'd be building the data frame by hand and therefore this is a mistake you've made:

data_frame(x = c(1, 2), x = c(1, 6))

or it's the result of an incorrect call to add_column like this:

data_frame(x = c(1, 2), y = c(1, 6)) %>%
   add_column(x = c(1, 2))

Which actually returns an error as it should and prevents you from overwriting data when you might not intend to.

When using PostgreSQL clients, it's perfectly fine to not name columns and have those auto-generated column names be duplicated. Of course it's bad data-hygiene so when using an SQL query within a tbl call it's correct to not allow this situation, but it's better (IMHO) to have the names tidied and build the data frame than dump the data and waste a lot of processing time (on the DB side) for a column naming issue.

Using dplyr verbs instead doesn't really change anything it just side-steps the issue.

from dbplyr.

 avatar commented on June 11, 2024

@krlmlr commented on Jan 17, 2018, 9:06 AM UTC:

@edgararuiz: Would you agree to just call tibble::set_tidy_names() in collect()?

from dbplyr.

 avatar commented on June 11, 2024

@edgararuiz commented on Jan 17, 2018, 1:43 PM UTC:

Sure, that sounds fine. Thanks!

from dbplyr.

 avatar commented on June 11, 2024

@hadley commented on May 20, 2018, 2:42 PM UTC:

Minimal reprex

library(dplyr, warn.conflicts = FALSE)
mf <- dbplyr::memdb_frame(
  x = c(1,2,2),
  y = c(3,5,NA),
  z = c(NA,'a','b')
)

mf1 <- mf %>% select(x, x = y)
mf1
#> Error: Column `x` must have a unique name

Fixing this requires a call to tibble::set_tidy_names() in collect.tbl_sql() , prior to grouped_df()

from dbplyr.

hadley avatar hadley commented on June 11, 2024

This is now resolved automatically by the new dplyr duplicated column name rules:

library(dplyr, warn.conflicts = FALSE)
mf <- dbplyr::memdb_frame(
  x = c(1,2,2),
  y = c(3,5,NA),
  z = c(NA,'a','b')
)

mf1 <- mf %>% select(x, x = y)
mf1
#> # Source:   lazy query [?? x 2]
#> # Database: sqlite 3.22.0 [:memory:]
#>       x  x..2
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2     5
#> 3     2    NA

Created on 2019-01-02 by the reprex package (v0.2.1)

from dbplyr.

bkkkk avatar bkkkk commented on June 11, 2024

This is awesome thank you!

from dbplyr.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.