daroczig / botor
Reticulate wrapper on 'boto3' with convenient helper functions -- aka "boto fo(u)r R"
Home Page: https://daroczig.github.io/botor
First of all, great idea to create an R wrapper for boto3 to communicate directly with S3 from within scripts!
I have found that the s3_ls function does not work in version 0.3.0 with R 4.0.3 on Windows 10 (also tested in Linux environments, with the same result). The error message is:
Error in data.frame(bucket_name = uri_parts$bucket_name, key = object$data$Key, :
arguments imply differing number of rows: 1, 0
We have encountered a situation where line 274 (in commit 652996b) fails with:
Error in data.frame(bucket_name = uri_parts$bucket_name, key = object$data$Key, :
arguments imply differing number of rows: 1, 0
I ran browser() in the above function and discovered that $Owner was not defined in objects[[1]]$meta$__dict__. Commenting out the above line (L274) solved the issue. I did not create the object in question, so I'm not sure whether it's possible to create an object without an owner or whether the credentials I'm using lack permission to see who the owner is. (All of the other metadata referenced in that code chunk is present.)
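The error message itself comes from base R: data.frame() refuses to combine a length-one argument with a zero-length one (NULL arguments, by contrast, are silently dropped). A minimal sketch of a defensive pattern, assuming a missing field can simply be reported as NA; na_if_empty() and the metadata list below are hypothetical and not taken from botor's source:

```r
# Hypothetical helper (not part of botor): replace a missing (NULL) or
# zero-length field with NA so data.frame() always gets one value per row.
na_if_empty <- function(x) {
  if (length(x) == 0) NA else x
}

# Made-up object metadata in which the Owner field is absent
meta <- list(Key = "path/to/file.csv", Size = 42, Owner = NULL)

# By contrast, data.frame(bucket_name = "mybucket", owner = character(0))
# fails with "arguments imply differing number of rows: 1, 0"
row <- data.frame(
  bucket_name = "mybucket",
  key = na_if_empty(meta$Key),
  size = na_if_empty(meta$Size),
  owner = na_if_empty(meta$Owner$DisplayName)  # NULL -> NA
)
print(row)
```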
It seems that reading an xlsx file with the openxlsx package doesn't work:
botor::s3_read("s3://mybucket/example_file.xlsx", fun = openxlsx::read.xlsx)
results in Error: openxlsx can only read .xlsx files
I can confirm that reading the file locally (openxlsx::read.xlsx("example_file.xlsx")) works fine, and readxl's read_xlsx also works with botor. I would probably prefer readxl over openxlsx anyway, but it would be useful to understand why these two don't work together.
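One plausible explanation, not verified against botor's source: openxlsx::read.xlsx() appears to validate the file extension of the path it receives, so if s3_read() downloads the object to a temporary file without the original .xlsx suffix, openxlsx rejects it, while readxl's read_xlsx() apparently does not insist on the suffix. Assuming that is the cause, a workaround is to wrap fun so it first copies the file to a path with the expected extension. with_extension() is a hypothetical helper:

```r
# Hypothetical wrapper (not part of botor): hand `fun` a copy of the file
# whose name ends in `ext`, for readers that validate the file extension.
with_extension <- function(fun, ext) {
  function(file, ...) {
    tmp <- tempfile(fileext = ext)
    on.exit(unlink(tmp), add = TRUE)  # remove the copy afterwards
    if (!file.copy(file, tmp)) stop("could not copy ", file, " to ", tmp)
    fun(tmp, ...)
  }
}

# Intended usage (requires botor, openxlsx and AWS access):
# botor::s3_read("s3://mybucket/example_file.xlsx",
#                fun = with_extension(openxlsx::read.xlsx, ".xlsx"))

# Offline check with a stand-in reader that only accepts .xlsx paths
strict_reader <- function(path) {
  if (!grepl("\\.xlsx$", path)) stop("can only read .xlsx files")
  readLines(path)
}
src <- tempfile()  # no extension, like an anonymous download target
writeLines("hello", src)
result <- with_extension(strict_reader, ".xlsx")(src)
```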
When I am working with boto3, I can connect to AWS by running aws sso login in the terminal.
However, when I try the same with botor, it does not work.
Example
# Go through the sso process via browser
system("aws sso login")
s3 <- botor::botor_client("s3")
pr <- s3$list_objects_v2(Bucket = "<BUCKET>", Prefix = "<PREFIX>", Delimiter = "/")
Error: botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
Actual behavior: s3_ls("s3://my-bucket") returns NULL even though my-bucket is a valid bucket (i.e. s3_ls("s3://my-bucket/") works).
My expectation would be that this would either work (returning a data frame of objects) or raise an error (similar to when the regex matching fails).
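Until s3_ls() handles bucket-only URIs itself, a caller-side workaround is to append the trailing slash before calling it. A minimal sketch; ensure_bucket_slash() is hypothetical and assumes bucket-only URIs have the form s3://bucket-name with nothing after the bucket:

```r
# Hypothetical helper (not part of botor): turn "s3://bucket" into
# "s3://bucket/" and leave URIs that already contain a path untouched.
ensure_bucket_slash <- function(uri) {
  bucket_only <- grepl("^s3://[^/]+$", uri)
  ifelse(bucket_only, paste0(uri, "/"), uri)
}

# Usage (requires botor and AWS access):
# botor::s3_ls(ensure_bucket_slash("s3://my-bucket"))
```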
Hi all,
The helper read and write functions are really helpful when working with AWS S3 from R. However, there is a problem when reading or writing a large number of files/R objects to AWS S3. If the user has AWS KMS enabled on their S3 bucket, then AWS KMS is called each time s3_read/s3_write is used. This means KMS could be called hundreds of times if s3_read/s3_write is put in a loop. If a boto_session or s3_client could be passed as a parameter, the same session could be reused across multiple calls to s3_read/s3_write. This should reduce the AWS KMS cost dramatically.
# proposed s3_read signature with a new s3_client argument
s3_read(uri, fun, ..., extract = c("none", "gzip", "bzip2", "xz"), s3_client = s3())
# Example code in practice:
library(botor)
library(data.table)
# create the S3 client once
client <- s3()
# list the files to be read
s3_files <- s3_ls("s3://mybucket/my_files/")
# reuse the same S3 client for every s3_read call
df <- rbindlist(lapply(s3_files$uri, s3_read, fun = fread, s3_client = client))
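The core of this request (build the client once, then reuse it) can be sketched in plain R with a closure that calls an expensive factory only on first use. cached() is hypothetical; in practice the factory might be function() botor::botor_client("s3"), and whether reusing a client actually reduces KMS calls is the issue author's expectation, not something this sketch verifies:

```r
# Hypothetical helper: wrap a factory so it runs at most once and the
# same object is returned on every later call.
cached <- function(factory) {
  value <- NULL
  created <- FALSE
  function() {
    if (!created) {
      value <<- factory()
      created <<- TRUE
    }
    value
  }
}

# e.g. get_s3 <- cached(function() botor::botor_client("s3"))

# Offline demonstration with a counting stand-in for an S3 client factory
n_created <- 0
get_client <- cached(function() {
  n_created <<- n_created + 1
  structure(list(id = n_created), class = "fake_client")
})

clients <- lapply(1:5, function(i) get_client())
```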
This would be SUPER helpful for S3-compatible object stores.
Hi,
Thanks a lot for this great library.
It works fine, but I am not sure how to handle the following case: uploading a file to S3 with KMS key encryption. For example, in Python with boto3 I would do the following:
s3.Bucket('test-bucket').upload_file('test.txt', 'experiments/test.txt', ExtraArgs={"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "alias/test-key" })
Can this be done already? Or would it be possible to add?
Thank you very much.
Best regards
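Since botor exposes the underlying boto3 client, one possible route (a sketch, not verified against botor's own helpers) is to call boto3's upload_file() directly; reticulate converts a named R list into a Python dict for ExtraArgs. This assumes botor::botor_client("s3") returns a boto3 S3 client; kms_extra_args() and upload_with_kms() are hypothetical helpers, and the bucket, key and alias/test-key values come from the Python example above:

```r
# Hypothetical helpers (not part of botor). ExtraArgs mirrors the Python
# example: server-side encryption with a KMS key.
kms_extra_args <- function(key_id) {
  list(ServerSideEncryption = "aws:kms", SSEKMSKeyId = key_id)
}

# Assumes botor::botor_client("s3") returns a boto3 S3 client. The default
# is only evaluated when `client` is not supplied, so merely defining this
# function does not contact AWS.
upload_with_kms <- function(file, bucket, key, key_id,
                            client = botor::botor_client("s3")) {
  client$upload_file(
    Filename = file,
    Bucket = bucket,
    Key = key,
    ExtraArgs = kms_extra_args(key_id)
  )
}

# Usage (requires botor and AWS access):
# upload_with_kms("test.txt", "test-bucket", "experiments/test.txt",
#                 key_id = "alias/test-key")
```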
Option to log S3 paths when running s3_read/s3_write etc.
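A caller-side sketch of the requested logging, for functions whose first argument is the URI (such as s3_read(uri, fun, ...); s3_write takes the URI in a later position, so it would need a different wrapper). with_uri_logging() is hypothetical and uses base R message(), not any logging facility botor may use internally:

```r
# Hypothetical wrapper (not part of botor): log the S3 URI, with a
# timestamp, before delegating to a function whose first argument is the URI.
with_uri_logging <- function(f, label = deparse(substitute(f))) {
  force(label)
  function(uri, ...) {
    message(sprintf("[%s] %s: %s", format(Sys.time()), label, uri))
    f(uri, ...)
  }
}

# Usage (requires botor and AWS access):
# logged_read <- with_uri_logging(botor::s3_read)
# logged_read("s3://mybucket/my_files/a.csv", fun = read.csv)
```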