
sixtyfour's Issues

Temporary accounts for testing?

It'd be ideal if there were a way to spin up a temporary top-level AWS account to run a test suite, then clean it up afterwards, instead of messing with our real account.

Unit tests will likely use cached fixtures anyway, though, so CI won't hit our real buckets.

Allow for managing separate client objects?

Right now on the dbs branch we allow only one client instance for each of Redshift and RDS.

A new client for either of those services is only needed when credentials differ; if credentials don't change, the same client can be reused.

How to do this?

  1. Tell users to open separate R sessions, one for each set of unique credentials.
  2. Require passing a client object into each function that interacts with Redshift/RDS (see the sketch below).
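
To make option 2 concrete, here's a rough sketch; the function name, argument, and profile are hypothetical, not settled API:

# hypothetical API: each function takes a `client` argument, created once
# per set of credentials and reused across calls
aws_db_cluster_list <- function(client = paws::redshift()) {
  client$describe_clusters()
}

# two clients with different credentials can then coexist in one session;
# the profile name here is made up
client_b <- paws::redshift(
  config = list(credentials = list(profile = "other-team"))
)
aws_db_cluster_list(client = client_b)

The default argument keeps the common single-credential case simple while still letting users juggle multiple clients.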

Tags and Users

Can sixtyfour::aws_users() include a list column called Tags that contains data frames with columns Key and Value? Same for aws_user()? See the sketch below for the shape I have in mind.
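
For example, the desired shape might look like this (made-up values):

library(tibble)

# hypothetical desired output shape for aws_users()
tibble(
  UserName = c("jane", "sally"),
  Tags = list(
    data.frame(Key = c("team", "role"), Value = c("eng", "admin")),
    data.frame(Key = "team", Value = "sci")
  )
)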

Dealing with credentials

I was looking at addressing this comment about checking an env var, and then realized that paws has a number of different ways to find user auth details, so we can't just look for env vars.

However, we still need access to some credentials for our own use within this package, e.g., the comment linked above, where we want to get the AWS region the user has set in their creds.

We can get at the access key and secret key by calling this anonymous function from the credentials provider chain, but that's hacky for sure, and it doesn't expose the AWS region either:

s3 <- paws::s3()
# reach into the client's internals and call the second credentials
# provider in the chain to extract the resolved credentials
s3$.internal$config$credentials$provider[[2]]()

Perhaps there's a way in paws to fetch user creds and I just haven't found it yet.
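
In the meantime, a stopgap could wrap the hack above in one place so it's easy to swap out later. A minimal sketch; the helper name is hypothetical and the field names of the provider's return value are assumptions:

# hypothetical stopgap helper; it relies on paws internals, so it may
# break across paws versions, and the field names on `creds` below are
# assumptions about the provider's return value
aws_creds <- function() {
  s3 <- paws::s3()
  creds <- s3$.internal$config$credentials$provider[[2]]()
  list(
    access_key_id = creds$access_key_id,
    secret_access_key = creds$secret_access_key,
    # region isn't in the provider output, so fall back to the usual env vars
    region = Sys.getenv("AWS_REGION", Sys.getenv("AWS_DEFAULT_REGION"))
  )
}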

Permissions framework?

Thinking about this from the perspective of an image from the YouTube video Sean shared (screenshot, 2023-11-08 at 9:13 AM).

Here's what I'm thinking:

  • suite of fxns for users (already in the works) - aws_user*/aws_users*
  • suite of fxns for groups - aws_group*/aws_groups*
  • suite of fxns for roles - aws_role*/aws_roles*
  • suite of fxns for policies - aws_policy*/aws_policies* - some of these fxns are used for attaching policies to users, groups, and roles

so in the end we could have a workflow like:

# in each case below aws_policy_attach determines from its input whether
# it's a group, role, or user, and prefixes the policy with `arn:aws:iam::aws:policy`
aws_group_create("testers") %>% aws_policy_attach("ReadOnlyAccess")
aws_role_create("ReadOnlyRole") %>% aws_policy_attach("ReadOnlyAccess")
aws_user_create("jane") %>% aws_policy_attach("AdministratorAccess")

# or if already created, then:
aws_role("ReadOnlyRole") %>% aws_policy_attach("ReadOnlyAccess")

Another example:

aws_group_add_users(group = "testers", 
  aws_user_create("jane"),
  aws_user_create("sally"),
  aws_user_create("susy")
)

@seankross feedback plz

Bucket (and file?) policies

I currently don't have permission to modify bucket ACLs, so I can't test and make sure that aws_bucket_acl_modify works.

Perhaps with the new test AWS account I'll be able to test this.

Database functionality

What kinds of functions do we want to provide?

  • Set up a simple database; make sane choices for users, with the assumption that users' needs are basic data science tasks
  • Change permissions for a database, table, or row
  • Fetch an auth token to use to connect to a database, e.g. (see also the fuller sketch after this code):
rds <- paws::rds()
token <- rds$build_auth_token(endpoint, region, user)
# then token passed to DBI::dbConnect()
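
To flesh that out, a minimal sketch of the full flow, assuming an RDS Postgres instance and the RPostgres driver; the endpoint, region, user, and database name are all made up:

library(DBI)

# hypothetical endpoint, region, user, and database name
endpoint <- "mydb.abc123.us-east-1.rds.amazonaws.com"
rds <- paws::rds()
token <- rds$build_auth_token(paste0(endpoint, ":5432"), "us-east-1", "jane")

# the token serves as the password for an IAM-authenticated connection;
# IAM auth requires SSL
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  host = endpoint,
  port = 5432,
  user = "jane",
  password = token,
  dbname = "mydb",
  sslmode = "require"
)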

Things we will leave as an exercise for the user

Everything else, but we SHOULD document in vignettes how to do certain things:

  • Table management
  • Simple queries

"cookbook" docs

I'm thinking we should have a set of docs like these as a baseline, and then more of a "cookbook" set of docs based on the interactions that we find people using the most, but that's down the line.

Originally posted by @seankross in #25 (review)

Add tests

  • http request caching: looks like paws is using httr under the hood (https://cran.r-project.org/web/packages/paws.common/index.html) - we should be able to use vcr to cache requests, speed up tests, hide secrets, etc. (see the sketch after this list)
  • be very sure no secrets are in fixtures
  • aws env vars need to be present within this github repo, but they don't need to be real
  • what else?
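
A rough sketch of what a cassette-backed test could look like; aws_buckets() is a hypothetical sixtyfour function, and the fixture directory is an assumption:

library(testthat)

vcr::vcr_configure(
  dir = "tests/fixtures",
  # double-check that no secrets end up in fixtures
  filter_sensitive_data = list(
    "<AWS_ACCESS_KEY_ID>" = Sys.getenv("AWS_ACCESS_KEY_ID"),
    "<AWS_SECRET_ACCESS_KEY>" = Sys.getenv("AWS_SECRET_ACCESS_KEY")
  )
)

test_that("aws_buckets() returns bucket info", {
  vcr::use_cassette("aws_buckets", {
    res <- aws_buckets()  # hypothetical sixtyfour function
    expect_s3_class(res, "tbl")
  })
})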

Billing improvements

user stories

  • people want to be able to interact with their data via dplyr, etc. - do we already allow that? Do we document how to do that well enough? (see the sketch below)
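
For reference, a minimal sketch of the dplyr workflow this implies, assuming a DBI connection `con` like the one sketched in the database issue above; the table and column names are made up:

library(dplyr)

# `con` is a DBI connection, e.g. from the IAM-token flow sketched above;
# dbplyr translates the pipeline to SQL and collect() runs it
sales <- tbl(con, "sales")
sales %>%
  filter(year == 2023) %>%
  summarise(total = sum(amount, na.rm = TRUE)) %>%
  collect()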

How to handle data from paws functions

The results of paws calls are generally named lists, sometimes nested. What should we do to these data before giving them back to users:

  1. Just give them what we get back from paws - a named list, in most cases
  2. Coerce named lists to an S3 object that summarizes the list and hides the full list underneath (see the sketch after this list)?
  3. Coerce named lists to a vctrs object that summarizes the list and hides the full list underneath?
  4. Coerce named lists to tibbles?
  5. Something else?
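
To make option 2 concrete, a rough sketch (all names hypothetical): keep the full paws response alongside a summary, and give the result a class with a compact print method.

# hypothetical constructor: keep the full paws response, print a summary
as_sixtyfour_result <- function(x, summary_fields) {
  structure(
    list(summary = x[summary_fields], full = x),
    class = "sixtyfour_result"
  )
}

print.sixtyfour_result <- function(x, ...) {
  utils::str(x$summary, max.level = 1)
  cat("# full paws response available in $full\n")
  invisible(x)
}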

There are other cases where what we return is more clear-cut, e.g., a fxn that checks whether a bucket exists gives back a boolean.
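
A minimal sketch of such a predicate, assuming paws' head_bucket errors when the bucket is missing or inaccessible:

# minimal sketch: TRUE if the bucket can be reached, FALSE otherwise
aws_bucket_exists <- function(bucket) {
  s3 <- paws::s3()
  res <- tryCatch(s3$head_bucket(Bucket = bucket), error = function(e) e)
  !inherits(res, "error")
}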

@seankross thoughts?

Stumbling blocks and avoiding them

The list (add new ones as needed) and how to help users avoid them:

  • authentication
  • leaving an expensive process running
  • messing up permissions

How to do error handling with paws?

Some error messages I've seen thus far in paws are not going to be useful to the typical sixtyfour user, e.g.:

Below, users probably won't be familiar with HTTP status codes. I know that a 404 means not found, so I can intuit that something wasn't found - but was it the bucket or the key? Beyond that, the error message is not useful to the user at all.

aws_file_attr(bucket = "s64-test-2", key = "doesntexist")
#> Error: SerializationError (HTTP 404). failed to read from query HTTP response body

Though sometimes the error messages are good:

aws_bucket_list_objects(bucket="s64-test-211")
#> Error: NoSuchBucket (HTTP 404). The specified bucket does not exist

And another good error message:

desc_file <- file.path(system.file(), "DESCRIPTION")
aws_file_upload(bucket = "not-a-bucket", path = desc_file)
#> Error: BucketAlreadyExists (HTTP 409). The requested bucket name is not available. 
#> The bucket namespace is shared by all users of the system. Please select a different name and try again.
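
One possible direction, as a rough sketch rather than a settled design: wrap each paws call and rethrow with a friendlier message (the helper name and message text are made up):

# hypothetical helper: run a paws call, rethrow with a friendlier message
# while keeping the raw AWS error for debugging
with_aws_errors <- function(expr, friendly) {
  tryCatch(expr, error = function(e) {
    stop(
      sprintf("%s\n  (AWS said: %s)", friendly, conditionMessage(e)),
      call. = FALSE
    )
  })
}

# e.g., inside aws_file_attr():
# with_aws_errors(
#   s3$head_object(Bucket = bucket, Key = key),
#   friendly = sprintf("Could not find '%s' in bucket '%s'", key, bucket)
# )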

@seankross just a placeholder to maybe deal with this, any thoughts welcome

Add Getting Started vignette

  • Ideally this uses our new testing AWS account, so it's easier to build this and other vignettes without worrying about the prod account
