I've had some time to think about this and put some code down. Interacting with S3 requires (at least) three pieces of information:
- Object key, analogous to a file path
- Bucket name
- Configuration, including credentials
The first two are easy. The configuration (possibly inspired by boto's `Session.client`) is more involved. Refer to . It optionally specifies the following configuration fields, some of which are considered secrets:
On my machine, with my straightforward AWS configuration, I can get away with using the default `paws::s3()` configuration, probably because I've run `aws configure` in a terminal. So we do have a sensible default value.
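As a minimal sketch of how the three pieces fit together, here is a hypothetical helper `s3_download()` (not an existing tarchetypes or paws function). It assumes paws is installed and that credentials come from the default AWS sources, and it treats `config` as the paws S3 client object, matching the `config = paws::s3()` default proposed below.

```r
# Hypothetical helper, not part of tarchetypes or paws.
# key:    object key (analogous to a file path)
# bucket: bucket name
# config: a paws S3 client; the default picks up credentials from the usual
#         AWS sources, e.g. the files written by `aws configure`.
s3_download <- function(key, bucket, path = basename(key), config = paws::s3()) {
  # get_object() returns a list whose Body element holds the object's raw bytes.
  obj <- config$get_object(Bucket = bucket, Key = key)
  writeBin(obj$Body, path)
  # Return the local path so the result can be tracked as a file target.
  path
}
```

Because `config` is just an object with a `get_object()` method, a stub list can stand in for it when testing without AWS access.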
So I'm expecting two target archetypes:
tar_s3_get_object(key, bucket, config = paws::s3(), <other_targets_args>, ...)
tar_s3_push_object(key, bucket, config = paws::s3(), <other_targets_args>, ...)
Then, to avoid re-downloading, use either the ETag/hash or the last-modified date. These checks are actually performed server-side, so there's no need for `targets` to compare hashes.
The intent here is that `...` is passed to paws so that users can deal with things like server-side encryption without it weighing down the `targets` syntax. I'll get some draft code together, but I fully expect this to take a few rounds of iteration.
from tarchetypes.
Thanks for starting on this!
I agree that most of the config should happen up front and not encumber the target's interface.
After looking at your proposal, I see a couple of alternative routes. Not sure which one I like more yet.
- Just get and push existing files. This is how I read your proposal. Maybe I am missing an implicit argument for the target's command.
- For the "push" archetype, act more like `tar_target(format = "file")`, which runs customizable R code that returns a file path:
tar_s3_push_object(name, command, key, bucket, config = paws::s3(), ...)
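A rough sketch of the second route, assuming targets and paws are installed: the hypothetical factory `tar_s3_push_object_sketch()` (not a real tarchetypes function) wraps the user's command in `targets::tar_target_raw()` with `format = "file"`, uploads the returned file with paws' `put_object()`, and returns the path. For simplicity it builds the S3 client inside the command instead of taking a `config` argument; that is an assumption of this sketch and means only the default credential chain is used.

```r
# Hypothetical helper: builds the command expression for the push target.
# The user's command must return a single local file path.
s3_push_command <- function(command, key, bucket) {
  substitute(
    {
      path <- command
      # Read the file's bytes and upload them under the given key.
      body <- readBin(path, what = "raw", n = file.size(path))
      paws::s3()$put_object(Body = body, Bucket = bucket, Key = key)
      # Return the path so targets tracks the local file.
      path
    },
    list(command = command, key = key, bucket = bucket)
  )
}

# Hypothetical target factory (sketch only, not the tarchetypes API).
tar_s3_push_object_sketch <- function(name, command, key, bucket, ...) {
  targets::tar_target_raw(
    name = deparse(substitute(name)),
    command = s3_push_command(substitute(command), key, bucket),
    format = "file",
    ...
  )
}
```

Keeping the expression builder separate from the factory makes the generated command easy to inspect without touching S3.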
Concerns with my approach:
- How do CRAN feel about using the `system()` command like this?
- Does the AWS CLI approach work on Windows?
- How do we handle the authentication side of things (e.g. server-side encryption) when uploading data to S3? Downloading from S3 is generally simpler than uploading.
These issues may be largely resolved if we wrap a fully featured AWS API like the `paws` package. This has the added bonus of opening up the possibility of other integrations with AWS besides S3.
I'm a bit busy at the moment, despite being in strict lockdown, but I can try to have a look if you'd like?
Yeah, I was hoping to use `paws`. If you have time, I would appreciate help and input. I plan to learn `paws` for ropensci/targets#152, but you have a huge head start.
Probably best to build this on top of `tar_change_raw()` like in #9. Another thought is to use #9 somehow, like Miles mentioned here.
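One way the `tar_change()` idea could look, as a sketch under stated assumptions: a hypothetical helper `s3_etag()` (not part of tarchetypes or paws) asks S3 for an object's ETag via paws' `head_object()`, which fetches metadata only, and that value becomes the `change` argument. Note this has targets compare stored ETags rather than relying on a server-side conditional request.

```r
# Hypothetical helper, not part of tarchetypes or paws: returns the ETag that
# S3 reports for an object without downloading the object body.
# `config` is a paws S3 client, as in the proposed archetypes.
s3_etag <- function(key, bucket, config = paws::s3()) {
  # head_object() requests only the object's metadata.
  meta <- config$head_object(Bucket = bucket, Key = key)
  meta$ETag
}

# Possible use in _targets.R, assuming s3_etag() is defined in the pipeline's
# functions (bucket, key, and file names are placeholders):
# tarchetypes::tar_change(
#   raw_data,
#   {
#     obj <- paws::s3()$get_object(Bucket = "my-bucket", Key = "data/raw.csv")
#     writeBin(obj$Body, "raw.csv")
#     "raw.csv"
#   },
#   change = s3_etag("data/raw.csv", "my-bucket"),
#   format = "file"
# )
```

Since the upstream change target in `tar_change()` always runs, the cheap `head_object()` request happens on every `tar_make()`, while the download reruns only when the ETag differs.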
I think I may still be in a `drake` state of mind, trying to replicate `file_in()` and `file_out()`. I'll try to get a better understanding of the equivalent `targets` concepts.
Yeah, with `targets`, all files are dynamic (e.g. `tar_target(format = "file")`).
On reflection, I would actually prefer ropensci/targets#154 if it works out. I think S3 will be more seamless and efficient that way.
Let's go with ropensci/targets#176 instead. I think it's as seamless as Metaflow.