r-dbi / bigrquery
An interface to Google's BigQuery from R.
Home Page: https://bigrquery.r-dbi.org
License: Other
Initial state: Connection from bigrquery works, no service token set
Revoke access via https://myaccount.google.com/security#connectedapps
Eventually, I'm seeing the following as a result from query_exec():
Auto-refreshing stale OAuth token.
No encoding supplied: defaulting to UTF-8.
Error: Invalid Credentials
In addition: Warning message:
Unable to refresh token
9: stop(out$err$message, call. = FALSE) at request.r#67
8: process_request(req) at request.r#34
7: bq_post(url, body) at jobs.r#60
6: insert_query_job(query, project, destination_table = destination_table,
default_dataset = default_dataset) at query.r#32
5: query_exec(sql, project = project) at oauth.R#5
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("~/git/R/bigrquery/oauth.R", echo = TRUE)
The second call of query_exec() finally opens the authentication page in the browser
insert_upload_job() asserts that values is a data frame.
Putting bigrquery into a shiny app running on shiny server returns the error "interactive() is not TRUE".
There is already this open issue but I think this one is slightly different for two reasons.
First, the error message returned is different from the error message given by httr::oauth_listener when the session is not interactive.
Second, I've attempted to override both base::interactive and httr::oauth_listener following directions I got on StackOverflow, and neither has done anything, though I suspect the directions might be wrong.
Per @hadley, he was looking for a way to get the result schema for a particular query prior to executing it.
Unfortunately the BigQuery API does not support this today. Per advice from @fhoffa:
http://stackoverflow.com/questions/26265362/bigrquery-error-400-on-authorization
I'm running this code:
devtools::install_github("assertthat")
devtools::install_github("hadley/bigrquery")
library(bigrquery)
project <- "four-iq" # put your projectID here
sql <- 'SELECT year, month, day, weight_pounds FROM [publicdata:samples.natality] LIMIT 5;'
data <- query_exec(sql, project = project)
Then I get to a page with this URL:
The page says:
400. That's an error.
Error: invalid_scope. Some requested scopes were invalid: {invalid=[c("https://www.googleapis.com/auth/bigquery", "https://www.googleapis.com/auth/cloud-platform")]}
Request details: response_type=code scope=c("https://www.googleapis.com/auth/bigquery", "https://www.googleapis.com/auth/cloud-platform") redirect_uri=http://localhost:18491/custom/OAuth/cred state=PEzwlZnwc1 client_id=465736758727.apps.googleusercontent.com
The URL seems to be the problem, so I tried to manually fix it; then I was able to authorize the app.
Then I get this error inside RStudio:
Error in config(token = token) : Unknown RCurl options: token
Any idea how to get bigrquery working?
At least delete and insert, to support one-off datasets for testing.
https://cloud.google.com/bigquery/docs/reference/v2/datasets
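A minimal sketch of what such helpers could look like, wrapping the datasets.insert and datasets.delete endpoints (the names insert_dataset()/delete_dataset() are assumptions, not confirmed API):
library(bigrquery)
insert_dataset("my-project", "test_scratch")   # create a throwaway dataset (assumed helper)
# ... run one-off tests against it ...
delete_dataset("my-project", "test_scratch")   # clean up afterwards (assumed helper)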
Since BigQuery seems to like using Unix timestamps in milliseconds (at least that's how we set it up, and many of the BigQuery time functions deal with msecs), and since R integers are limited to 32 bits (and it's not 1970 anymore), I'm wondering if changing the conversion in extract_data to use as.numeric would be a good idea.
Part of the problem here is due to an inconsistency with how as.integer works. The man page states that:
The answer will be ‘NA’ unless the coercion succeeds. Real
values larger in modulus than the largest integer are coerced to
‘NA’ (unlike S which gives the most extreme integer of the same
sign).
But it turns out that as.integer('1414040723000') returns 2^31 - 1 instead of NA (applying as.numeric first returns NA).
It seems like this is a problematic failure mode - if you see a column of NAs, you realize something failed, whereas a column of maxint is less clearly wrong.
Of course there are lots of workarounds - I was just surprised by this behavior.
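A small demonstration of the behaviours described above (results as reported; exact behaviour may vary by R version):
as.integer("1414040723000")               # reported to return .Machine$integer.max, not NA
as.integer(as.numeric("1414040723000"))   # NA, with a coercion warning
as.numeric("1414040723000") / 1000        # 1414040723: the msec value fits exactly in a double
as.POSIXct(1414040723, origin = "1970-01-01", tz = "UTC")  # back to a readable timestamp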
Hello, and thanks for developing bigrquery; it's been a fantastic help. I'm having an issue extracting data from a column containing comma-separated data:
mutate(df , PL_REF = as.integer(str_extract(N2_PL,"([0-9]+),[0-9]+,[0-9]+")))
Source: bigquery [201573431837:bamsurgeon]
From: variants3 [?? x 31]
Filter: call_method != "jointtest"
Error: Encountered " <STRING_LITERAL> "\'([0-9]+),[0-9]+,[0-9]+\' "" at line 1, column 368.
Was expecting:
")" ...
query invalidQuery. Encountered " <STRING_LITERAL> "\'([0-9]+),[0-9]+,[0-9]+\' "" at line 1, column 368.
Was expecting:
")" ...
I'm able to execute a working query directly in BigQuery, as follows:
SELECT INTEGER(PL_REF) AS PL_REF, INTEGER(PL_HET) AS PL_HET, POW(INTEGER(PL_ALT),10) AS PL_ALT FROM
(SELECT
REGEXP_EXTRACT([N2_PL], '([0-9]+),[0-9]+,[0-9]+') AS [PL_REF],
REGEXP_EXTRACT([N2_PL], '[0-9]+,([0-9]+),[0-9]+') AS [PL_HET],
REGEXP_EXTRACT([N2_PL], '[0-9]+,[0-9]+,([0-9]+)') AS [PL_ALT]
FROM [bamsurgeon.variants3])
LIMIT 10
Also note the power function for the third variable (PL_ALT); I was only able to execute this directly as well. From what I have seen, bigrquery doesn't recognize ** or ^, but it does appear to recognize POW, although without a successful conversion to integer it will not work.
Thanks!
> query_exec("SELECT date('2015-01-01') as a", "bigrquery-1168") %>% dput
structure(list(a = "2015-01-01"), .Names = "a", row.names = 1L, class = "data.frame")
DBItest: "^data_date_.*"
length(list_tables(project, dataset, max_results = 1001))
[1] 1000
I have over 3000 tables in the dataset and could really use all of their names.
project_id vs. project
dataset_id vs. dataset
To me, the _id suffix looks unnecessary, but stripping it everywhere might create incompatibilities.
How do I clear the wrong credential so that the query will re-prompt me to enter the right one? I tried restarting my AWS instance and my computer; neither seems to clear the cached credential. Many thanks!
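A minimal sketch, assuming a bigrquery version that caches its OAuth token in httr's default .httr-oauth file in the working directory:
file.remove(".httr-oauth")   # delete the cached token (location is an assumption)
# the next query_exec() call should reopen the browser and prompt for credentials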
The new OAuth2-related improvements in httr have changed how tokens need to be passed around; in particular, we need to pass in a token object (so it can be refreshed) instead of just a signature.
I'm starting to do this now -- opening this issue for tracking.
Should check that on Windows (which doesn't use UTF-8), standardise_csv() converts to UTF-8 on write and maintains UTF-8 encoding status on read.
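A rough sketch of the round trip in question (standardise_csv() is internal, so the file-based check below is only an assumption of what such a test could look like):
x <- "fa\u00e7ade"                       # non-ASCII string, marked as UTF-8 in R
tmp <- tempfile(fileext = ".csv")
writeLines(x, tmp, useBytes = TRUE)      # write the UTF-8 bytes as-is, even on Windows
y <- readLines(tmp, encoding = "UTF-8")  # re-mark the bytes as UTF-8 on read
identical(x, y)                          # should be TRUE on both platforms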
When result sets are especially large, BigQuery returns an error ("Response too large to return.").
This is easily handled by specifying a couple of options in the query body (allowLargeResults = TRUE and a destination table).
I'm happy to submit a PR for it, but didn't know if anyone else was running into the limitation with larger queries.
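For reference, the relevant job configuration on the jobs.insert request looks roughly like this (the project/dataset/table names are hypothetical):
body <- list(
  configuration = list(
    query = list(
      query = sql,
      allowLargeResults = TRUE,
      destinationTable = list(
        projectId = "my-project",
        datasetId = "scratch",
        tableId = "big_result"
      )
    )
  )
)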
Using devtools v1.6 produces the error below on a Mac when installing via devtools::install_github("bigquery"):
"Error in username %||% getOption("github.user") %||% stop("Unknown username.") :
Unknown username."
On a PC, it results in the following warning:
'Username parameter is deprecated. Please use /bigrquery'
Linux command-line R with bigrquery returns "Error:", i.e. an error with no details. Does anyone have any idea what I might be doing wrong? I just ran an update for all installed packages. Could this be an issue with a new package?
When connecting via the RStudio Server version of this package I've found a problem:
the first request is required to allow access to BigQuery in the browser, but after giving permission, the browser redirects to localhost:1410 with a 400 error. The R session is open on server 192.241. ..., but RStudio listens on port 1410 without any reaction to the permission that was given.
The same problem arises when I'm trying to set_access_cred.
Is this problem caused by Google API, or server configs, or R-package configs?
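One workaround worth trying on a headless or remote server: httr supports out-of-band (copy/paste) authentication instead of the localhost redirect. Whether it resolves this particular setup is untested:
options(httr_oob_default = TRUE)  # paste the auth code into the R console instead
# re-run the query; the browser now shows a code rather than redirecting to localhost:1410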
It seems I recently updated some packages in RStudio, and now I always get this error:
> library(bigrquery)
> billing_project <- "341409650721" # put your project number here
> sql <- "SELECT year, month, day, weight_pounds FROM natality LIMIT 5"
> query_exec("publicdata", "samples", sql, billing = billing_project)
Error: Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0:
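The argument order changed between bigrquery versions: the call above uses the old query_exec(project, dataset, query, billing) form. With newer versions, something along these lines should work (the fully qualified table name is an assumption):
sql <- "SELECT year, month, day, weight_pounds FROM [publicdata:samples.natality] LIMIT 5"
query_exec(sql, project = "341409650721")  # project is now the billing project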
After several tests, I found that when I upload a data.table with insert_upload_job(), if the cell in the last row and last column is empty (NA in R), I get this error:
Running query: RUNNING 21.5s
Error: Too many errors encountered. Limit is: 0.
File: 0 / Line:28 / Field:1 invalid. Bad character (ASCII 0) encountered: field starts with: <U+003c><U+003e>
invalid. Too many errors encountered. Limit is: 0.
Note: line 28 is the last line, which is an empty line.
Here are the last several rows of the data table that didn't pass (copied from the temporary .csv file generated):
"2092558",1,20000,1,1410739200,1410739200,,,
"2099381",1,25000,1,1410739200,1410739200,,,
"2100055",1,10000,1,1410739200,1410739200,,,
"2104395",1,70000,1,1410652800,1410652800,,,
"2106301",1,15000,1,1410739200,1410739200,1,15000,425
"2113586",1,20000,1,1410739200,1410739200,,,
"2117141",1,50000,1,1410739200,1410739200,,,
"2118506",1,10000,1,1410739200,1410739200,,,
Here are two examples that passed:
For this one, I added a new Date column as the last column:
"2099381",1,25000,1,1410739200,1410739200,,,,1410912000
"2100055",1,10000,1,1410739200,1410739200,,,,1410912000
"2104395",1,70000,1,1410652800,1410652800,,,,1410912000
"2106301",1,15000,1,1410739200,1410739200,1,15000,425,1410912000
"2113586",1,20000,1,1410739200,1410739200,,,,1410912000
"2117141",1,50000,1,1410739200,1410739200,,,,1410912000
"2118506",1,10000,1,1410739200,1410739200,,,,1410912000
I noted that the error came only from the last empty line, so I removed the last rows with NA values. It worked (the last empty line was still there):
"2089603",1,22000,1,1410739200,1410739200,,,
"2090306",1,25000,1,1410652800,1410652800,,,
"2092558",1,20000,1,1410739200,1410739200,,,
"2099381",1,25000,1,1410739200,1410739200,,,
"2100055",1,10000,1,1410739200,1410739200,,,
"2104395",1,70000,1,1410652800,1410652800,,,
"2106301",1,15000,1,1410739200,1410739200,1,15000,425
Large integers currently cannot be returned:
> query_exec("SELECT 10000000000", "bigrquery-1168")
f0_
1 NA
Warning message:
NAs introduced by coercion to integer range
> query_exec("SELECT 10000000000.1", "bigrquery-1168")
f0_
1 1e+10
https://github.com/rstats-db/bigrquery/pull/77/files#r50803431. Search the code for ".*/datasets/
To avoid wasting time/money
I am trying to load a CSV file in R then upload it to BigQuery.
The original data in this column is:
V.O.I "LOVASOA AROVANA"
After loading it into R, the string displays as:
[1] "V.O.I \"LOVASOA AROVANA\""
Then, when I try to upload this table to BigQuery, I get this error for that field:
File: 0 / Line:151377 / Field:3 invalid. Data between close double quote (") and field separator: field starts with: <U+003c>V.O.I \<U+003e>
Is it possible to handle the double quotes in bigrquery automatically, or should I do some workaround to solve this issue?
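For what it's worth, the usual CSV convention is to escape an embedded double quote by doubling it, which write.csv does by default (qmethod = "double"); a quick local check:
df <- data.frame(name = 'V.O.I "LOVASOA AROVANA"', stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE, qmethod = "double")
readLines(tmp)[2]  # "V.O.I ""LOVASOA AROVANA""" -- quotes doubled, not backslash-escaped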
DBItest: "roundtrip_timestamp"
1. Failure (at ~/git/R/DBItest/R/test_sql.R#566): DBItest: roundtrip_timestamp
tbl_in is not identical to tbl_out[order(tbl_out$id), ]. Differences:
Objects equal but not identical
Error: Failures detected.
In addition: Warning messages:
1: In check_tzones(target, current) : 'tzone' attributes are inconsistent
2: In check_tzones(target, current) : 'tzone' attributes are inconsistent
Execution halted
I want to perform a query and grab a large amount of data from one table, then add it to another table that already contains data using R.
So if I select a table that already exists I get "Error: Already Exists: Table ic..."
In BigQuery itself, what is happening is that we are appending to a table that already exists.
Can we add this functionality to bigrquery's query_exec? Is it available in the BigQuery API? I imagine it gets complicated when the tables are different.
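The REST API supports this via the writeDisposition field on the query configuration; later bigrquery versions expose something similar, roughly as below (the write_disposition argument name is an assumption about the version in use):
query_exec(sql, project = project,
           destination_table = "mydataset.mytable",  # hypothetical destination
           write_disposition = "WRITE_APPEND")       # append instead of failing on an existing table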
According to the function code, allowLargeResults is set to TRUE when the destination_table option is used. However, I've run a query with destination_table set and I still get the following error:
Error: Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more details, see https://cloud.google.com/bigquery/querying-data#largequeryresults
responseTooLarge. Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more details, see https://cloud.google.com/bigquery/querying-data#largequeryresults
Any ideas why?
The README says that we need to do
devtools::install_github("bigrquery")
We actually need to do
devtools::install_github("hadley/bigrquery")
cheers,
I am making a Shiny app that queries a database on BigQuery and am having trouble getting credentials for other users. When I run query_exec from R, it asks if I want to authorize and then opens a web tab for me to click the Allow button. When I try to run it from the Shiny server with someone else's project ID, it throws an "Access Denied" error.
How do I automate this using the get_access_cred function?
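A minimal sketch, assuming a bigrquery version that provides set_service_token() and a service-account JSON key downloaded from the Google Cloud console (the path is hypothetical):
library(bigrquery)
set_service_token("/srv/shiny-server/keys/service-account.json")  # non-interactive auth
# subsequent query_exec() calls authenticate as the service account; no browser needed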
Dear hadley,
I am trying to reproduce the example in the README.md with a dataset of mine, but authentication first completes successfully and then fails with an "Invalid access credentials have been reset" error. This happens with both Google's sample data and mine.
Any hints? Thanks!
Giacecco
> library(bigrquery)
> query_exec("<project name>", "<dataset name>", sql, billing = billing_project)
Loading required package: Rook
Loading required package: tools
Loading required package: brew
Waiting for authentication in browser...
Authentication complete.
Loading required package: rjson
Error: Invalid access credentials have been reset. Please try again.
>
First, I am a beginner with R. I get an error when I run the following from RStudio (it doesn't happen in R directly):
billing_project <- "" # put your project number here
sql <- "SELECT * FROM [table] LIMIT 5"
query_exec("[project]", "[dataset]", sql, billing = billing_project)
I get the following error after accepting the request for permission via the browser:
"localhost 1410 custom OAuth cred"
What code do I need to add to get around this issue?
has_project <- function() {
env <- Sys.getenv("BIGRQUERY_PROJECT")
if (env == "") {
message("No default billing project found. Either set `billing` argument ",
"in individual calls, or set BIGRQUERY_PROJECT environment variable")
FALSE
} else {
TRUE
}
}
if (has_project()) {
sql <- "SELECT year, month, day, weight_pounds FROM natality LIMIT 5"
query_exec("publicdata", "samples", sql)
}
# Or even make billing = default_billing()
# And then the examples could look like this:
sql <- "SELECT year, month, day, weight_pounds FROM natality LIMIT 5"
query_exec("publicdata", "samples", sql)
The BigQuery API just changed; now, when paging through the results of a tabledata list call, the pageToken is no longer returned if there are no more pages to read. This breaks the list_tabledata() function in tabledata.r.
Craig Citro and I are looking into a fix for this (shouldn't be too bad).
Hi,
The error:
"Error in rbind(deparse.level, ...) :
les nombres de colonnes des arguments ne correspondent pas" (~column number mismatch).
I ran the inner steps of the function. The line causing trouble is:
"do.call("rbind", lapply(data, as.data.frame, row.names = 1L))"
The data is a list of vectors of different lengths, so the list can't be bound to produce a data frame.
This line has no impact on the expected output of list_tables. Would it be possible to remove it?
Best
This should mostly involve providing methods for DBI generics. This will make dplyr implementation easy as it will be able to use more default methods.
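A rough sketch of what the DBI-facing interface could look like once those methods exist (the bigquery() driver constructor and its argument names are assumptions):
library(DBI)
con <- dbConnect(bigrquery::bigquery(),
                 project = "publicdata", dataset = "samples",
                 billing = "my-billing-project")
dbListTables(con)
dbGetQuery(con, "SELECT year, month, day FROM natality LIMIT 5")
dbDisconnect(con)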
Hi,
I am brand new to the BigQuery and R world. I am trying to access some sample data from BigQuery within R. I followed the examples here and some others online, but got this error while trying to execute the query inside R. Could you please share ideas on what could be wrong? Here is the query I am trying to run:
data1<-query_exec(sql,project,dataset)
If there is any tutorial on how to use bigrquery that you can share, that would be great!
Thanks!
From what I can tell, bigrquery auth doesn't store the oauth2 refresh token, just the access token. The access token is only valid for a brief amount of time, which means you need to re-authorize periodically. If you store the refresh token, you shouldn't have to re-authorize.
For server-side implementations where browser confirmation is not possible, server-side auth can be really handy, as described below, using a client ID and a keystore:
https://developers.google.com/bigquery/authorization#service-accounts-server
Hey there Hadley,
Another way to use BigQuery without going through Google's web app is by using bq.
With bq, you basically save a local file, ~/.bigquery.v2.token, which in the latest version (this doesn't seem to be documented on Google's side) is nothing more than a JSON configuration file containing your OAuth2 credentials (tokens, user-agent, etc.).
The second file of interest that one needs when using bq is ~/.bigqueryrc, which contains the default project id (the billing_id for query_exec(...)) and the path to the aforementioned token file.
Now, when those files are present, would you be interested in seeing the web authentication replaced by them? The flow would be quite simple (a rough R sketch follows):
If ~/.bigqueryrc exists:
    use its contents for authentication
    If authentication fails:
        fall back to web-browser authentication
Else:
    default to web-browser authentication
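A rough R sketch of that flow (the JSON structure of the token file is as described above; the jsonlite usage is an assumption):
token_path <- path.expand("~/.bigquery.v2.token")
if (file.exists(token_path)) {
  creds <- jsonlite::fromJSON(token_path)  # OAuth2 tokens, user-agent, etc.
  # build an httr token from creds; if authentication fails, fall back to the browser flow
} else {
  # default: existing web-browser authentication
}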
I've run the following code:
sql<-"SELECT visitorId FROM [table]"
project<-"project-name"
query_exec(sql, project)
and received the following error:
Error in token$sign : $ operator is invalid for atomic vectors
Would anyone be able to shed some light on what's causing this?
Thanks
I am using bigrquery version 0.1.0.9000. I found two issues:
Fixes #45.
Hello,
The recent update of httr to 1.0.0 switched httr to curl instead of RCurl.
This update breaks basic functionality of bigrquery, at the very least with respect to the tbl function. To confirm, please update to httr 1.0.0 and run the following:
library(dplyr)
library(bigrquery)
BILLING_PROJECT <- "<YOUR_BILLING_PROJECT>"
BILLING_DATASET <- "<DATASET_INSIDE_THAT_PROJECT>"
bq <- src_bigquery(BILLING_PROJECT, BILLING_DATASET)
# Upload the test data frame to your project
df <- data.frame(a = 1:100, b = 101:200)
httr_issue <- copy_to(bq, df, name = "httr_issue")
httr_issue2 <- bq %>% tbl("httr_issue")
which will produce the following error if httr is version 1.0.0:
Error in curl::curl_escape(names(url$query)) :
`url` must be a character vector.
and will work correctly if httr is 0.6.1.
The issue is caused by a difference in the behavior of curl escaping with NULLs for curl vs. RCurl: curl::curl_escape(NULL) results in an error, while RCurl::curlEscape(NULL) produces an empty vector.
> curl::curl_escape(NULL)
Error in curl::curl_escape(NULL) :
`url` must be a character vector.
> RCurl::curlEscape(NULL)
character(0)
As of now, the quick fix is to downgrade httr back to v0.6.1:
devtools::install_github("hadley/httr@v0.6.1")
Hi. I am not sure if this has been reported; I have not seen this error among the closed or open issues, so I will post it.
Here is sample data that produces the error. I get the same error when using my own data, but I can't post that because it is from a private Google database with billing information, etc., so please see this sample dataset. Thank you in advance!
library(bigrquery)
# trying Google sample datasets
bq_db = src_bigquery('publicdata', 'samples')
bq_db
% src: bigquery [publicdata/samples]
% tbls: github_nested, github_timeline, gsod, natality, shakespeare, trigrams, wikipedia
d = tbl(bq_db, 'github_nested')
% Error in UseMethod("sql_select") :
% no applicable method for 'sql_select' applied to an object of class "bigquery"
nrow(query_exec("SELECT * FROM [publicdata:samples.wikipedia] limit 300000", project, max_pages = Inf))
This doesn't get me 300,000 rows; it seems that max_pages = Inf doesn't work.
I have been getting this issue since today.
Hi,
I try to run Rscript file.r on my Ubuntu machine, with the following code in the first few lines of the file:
library(bigrquery)
project <- "project"
sql <- "SELECT * FROM [publicdata:samples] LIMIT 250"
query_exec(sql, project = project)
If I do this in the R shell it all works, but when running the script I get this error back:
Error in stopifnot(is(token, "refClass"), is.oauth_endpoint(endpoint), :
could not find function "is"
Calls: query_exec ... get_access_cred -> oauth2.0_token -> new_token -> stopifnot
Execution halted
How can I make it work in a script?
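A likely fix: Rscript does not attach the methods package by default, and is() lives in methods, which matches the "could not find function "is"" error. Loading it explicitly at the top of the script should help:
library(methods)    # Rscript does not attach this by default; provides is()
library(bigrquery)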