
Comments (17)

blowmage commented on July 26, 2024

@jgeewax Who does this need to be assigned to?

jgeewax commented on July 26, 2024

Assigning to me to find the right person. Should have someone soon.

jgeewax commented on July 26, 2024

/cc @thobrla

jgeewax commented on July 26, 2024

/cc @Capstan

Capstan commented on July 26, 2024

Setting some expectations:

  • I'm a lead in GCS, not a Ruby maven. Bear with my n00bitidity.
  • I am likely to have time for this first on Monday, 5/11.

blowmage commented on July 26, 2024

Great! Please let me know if you have any questions.

Capstan commented on July 26, 2024

From reading the Gcloud::Storage docs:

  • Why name the class File for a GCS object? Isn't that confusing vis-à-vis the core File class? Java uses StorageObject to differentiate from Object, and that seems useful here too, to differentiate from Ruby's Object.
  • file.delete() permanently deletes the file only if versioning is not on. Otherwise, it'll create an archive version, accessible only by generation.
  • file.download() – nice use of verification!
  • Is there an IO::generic_readable or IO::generic_writable accessor planned?
  • file.copy() – why not just have it take another Storage::File object?
  • file.signed_url() – what does this do?
  • bucket.create_file()
    • s/265/256/
    • I think we might want to default to a larger chunk size for performance, maybe 2MB. Will clients be able to handle that? We advise that you keep the chunk size as large as possible.
    • What are the options? Can you override what the file system guesses the Content-Type is?
  • bucket.default_acl() – prefer default_object_acl. Or is the idea that this is the ACL for "contained" things?
  • bucket.files()
    • does this do pagination for you under the covers?
    • what are the criteria? Does the consumer know/care that you are doing client-side filtering?
  • bucket.find_file() – can this find non-existent objects, as in one you're about to create by using an IO::generic_writable? Or does it only refer to extant objects?
  • Buckets are missing some misc. config options, like setting lifecycle configuration, website configuration, versioning.
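
To make the surface concrete, here is a sketch of how these calls read together. The method names come from the docs above, but the constructor, argument names, and option names are my assumptions, not published signatures:

```ruby
require "gcloud/storage"

# Assumed setup; the actual constructor may differ.
storage = Gcloud.storage "my-project"
bucket  = storage.find_bucket "my-bucket"

# create_file with the options discussed above (chunk_size and an
# explicit Content-Type override are assumed option names).
file = bucket.create_file "local/path.txt", "remote/path.txt",
                          chunk_size: 2 * 1024 * 1024,
                          content_type: "text/plain"

file.download "local/copy.txt"   # verifies checksums, per the docs

# Presumably copy() takes strings today; the question above is whether
# it should take another Storage::File instead.
file.copy "other-bucket", "remote/copy.txt"

bucket.files.each { |f| puts f.name }  # pagination behavior unclear, per above

file.delete   # archives rather than deletes when versioning is on
```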

Capstan commented on July 26, 2024

How does a gcloud-ruby consumer add their own application (or tool) name/version to the User-Agent header? Presumably it should look something like MyWebsite/1.0 gcloud-ruby/0.1.0 google-api-ruby-client/0.8.6. Or perhaps the last is subsumed by the second-to-last, if you tie releases to specific underlying clients.
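
For illustration, a sketch of what a consumer-facing hook might look like; the user_agent_prefix option is hypothetical, not an existing gcloud-ruby setting:

```ruby
# Hypothetical option, shown only to make the question concrete.
storage = Gcloud.storage "my-project", user_agent_prefix: "MyWebsite/1.0"

# The library would then emit product tokens in order, e.g.:
#   User-Agent: MyWebsite/1.0 gcloud-ruby/0.1.0 google-api-ruby-client/0.8.6
```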

aozarov commented on July 26, 2024

Is there a guideline/recommendation from the GCS team about when non-resumable writes are preferred and when resumable ones are? I found one example, https://cloud.google.com/storage/docs/json_api/v1/objects/insert, which suggests non-resumable uploads for small files (the example uses ~2MB) and resumable uploads otherwise.
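
That rule of thumb would reduce to something like the following sketch; the 2 MB cutoff is only the docs' example, and resumable: is an assumed option name:

```ruby
RESUMABLE_THRESHOLD = 2 * 1024 * 1024  # ~2 MB, from the JSON API docs' example

# Sketch: pick the upload style by size (the option name is an assumption).
def upload(bucket, path, name)
  if File.size(path) <= RESUMABLE_THRESHOLD
    bucket.create_file path, name                   # simple upload
  else
    bucket.create_file path, name, resumable: true  # resumable upload
  end
end
```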

aozarov commented on July 26, 2024

Also, reading the chunking reference, it looks like chunking is discouraged ("This is not the preferred approach since there are performance costs associated with the additional requests, and it is generally not needed."). I am surprised by that, as it was not my impression/experience when working on the appengine_gcs_client (even taking the AE 10MB upload / 32MB download limits into account).
I guess the only alternative to chunking that would make resumable writes meaningful is to query upon failure and continue the writes from that point. If so, is it guaranteed that every write sent to the service before the failed write will be available? Otherwise I am not sure how much written data the client would need to keep in order to recover from a failure (and retry) transparently.

thobrla commented on July 26, 2024

Querying upon failure and continuing is desirable with seekable data; for non-seekable data, chunking+buffering is necessary.

blowmage commented on July 26, 2024

Family emergency. I'll try to respond later tonight or over the weekend.

Capstan commented on July 26, 2024

Chunking is not discouraged per se, in that it solves two problems:

  1. GAE has per-HTTP-request size limits.
  2. For being able to retire the client-side write buffer, esp. when the client itself is being streamed data, knowing the committed point is useful. If you get a 308 Resume Incomplete response, you will get the number of bytes stored so far server-side and can retire the buffer.*

Aside from those two things, it is inefficient in that it requires re-establishing an HTTP session for every chunk, so reducing that overhead by increasing the chunk size is preferable (a single chunk is obviously best). You might be able to, in parallel, request the upload status to see how much the server has committed, and retire the buffer that way, but that will be a conservative number and not necessarily strongly consistent with an ongoing upload session.

As to aozarov@'s question, the canonical definition of what has been committed is what is returned from a chunk write in the Range header, so yes, every write previously acknowledged with 308 Resume Incomplete is committed. The client should get a 400-level error if they try to commit a partial chunk that is not at the end of the file.

*There is the possibility that some later issue could make the upload session unresumable, e.g., an MD5 mismatch, which would then still abort the whole upload. For true safety, a client would have to buffer the entire amount until the final 201 Created.
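
To make the 308 flow concrete, here is a minimal sketch of the chunked-upload loop against the JSON API's resumable protocol. It assumes the session URI has already been obtained from the initial uploadType=resumable request, and it elides auth and retries:

```ruby
require "net/http"
require "uri"

CHUNK = 2 * 1024 * 1024  # multiples of 256 KB; larger is better, per above

# io: a seekable source; total: full object size in bytes.
def upload_chunks(session_uri, io, total)
  uri    = URI(session_uri)
  offset = 0
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    while offset < total
      chunk = io.read(CHUNK)
      last  = offset + chunk.bytesize - 1
      req   = Net::HTTP::Put.new(uri)
      req["Content-Range"] = "bytes #{offset}-#{last}/#{total}"
      req.body = chunk
      res = http.request(req)
      case res.code.to_i
      when 308  # Resume Incomplete: Range: bytes=0-N means N+1 bytes committed
        committed = res["Range"][/\d+\z/].to_i + 1
        io.seek(committed)   # the buffer up to `committed` can now be retired
        offset = committed
      when 200, 201          # final chunk accepted; object created
        return res
      else
        raise "upload failed: HTTP #{res.code}"
      end
    end
  end
end
```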

Capstan commented on July 26, 2024

The break-even point for resumable vs. non-resumable is how much latency the creation of the resumable session incurs vs. the throughput and quality of your network connection to Google. The fatter and more consistent the pipe, the bigger the object needs to be to make the second round trip worthwhile, since retransmitting the data is likely to be as fast, if not faster, in the event of an error. Certainly, in aggregate, given low error rates, uploading many small objects will be faster if you simply retry from scratch on failure.
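
As a back-of-the-envelope model of that break-even (my numbers, purely illustrative, not official guidance): resumable pays roughly one extra round trip up front, while non-resumable pays a full retransmit with some failure probability.

```ruby
rtt       = 0.05     # 50 ms to create the resumable session (assumed)
bandwidth = 12.5e6   # ~100 Mbit/s pipe, in bytes/sec (assumed)
p_error   = 0.001    # probability an upload fails and must be retried (assumed)

# Equate the extra round trip against the expected retransmit cost,
# rtt = p_error * (size / bandwidth), and solve for size:
break_even = rtt * bandwidth / p_error
puts "break-even ≈ #{(break_even / 1e6).round} MB"  # => 625 MB with these numbers
```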

Capstan commented on July 26, 2024

file.download() needs help. Downloading a very large object directly will cause you to OOM, if not risk having your connection broken. Retrying by re-downloading the whole object is not ideal. Does Ruby allow IO-based HTTP responses, or must everything fit in memory? In the latter case, you'll want to choose an appropriate chunk size and download in ranges (request Range: bytes=0-chunksize; the server answers with Content-Range: bytes 0-chunksize/length), repeatedly appending to your file. You will want to pin the download to a specific object generation or risk getting mixed data. If there is an error, you know the last offset you reached and can continue from there.

Advanced forms could parallelize downloads, but I don't know how relevant that is.
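
A sketch of that ranged-download loop against the JSON API (auth elided; the URL shape and generation pin follow the public object-media endpoint):

```ruby
require "net/http"
require "uri"

CHUNK = 2 * 1024 * 1024  # bytes per range request

# Sketch: stream an object to disk in ranges, pinned to one generation so a
# concurrent overwrite cannot mix data. On error, resume from `offset`.
def download(bucket, object, generation, dest)
  uri = URI("https://www.googleapis.com/storage/v1/b/#{bucket}/o/" \
            "#{URI.encode_www_form_component(object)}" \
            "?alt=media&generation=#{generation}")
  offset = 0
  File.open(dest, "wb") do |out|
    Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      loop do
        req = Net::HTTP::Get.new(uri)
        req["Range"] = "bytes=#{offset}-#{offset + CHUNK - 1}"
        res = http.request(req)
        out.write(res.body)
        offset += res.body.bytesize
        # Content-Range: bytes start-end/total says how much remains.
        break if offset >= res["Content-Range"][/\d+\z/].to_i
      end
    end
  end
end
```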

Capstan commented on July 26, 2024

It seems like google-api-ruby-client's media.rb is missing download support, so perhaps we should come up with what is appropriate for that level of abstraction and reuse it here.

blowmage commented on July 26, 2024

@Capstan Apologies for the delay. As you said, we are at the mercy of google-api-ruby-client here. Google API Client is built on a library named Faraday, and Faraday does not support streaming downloads. Ruby's stdlib HTTP library does support streaming downloads, but users may configure any number of alternate providers, for any number of justifiable reasons.

I'll follow up with @remi and see if there is anything we can do to avoid OOM situations when downloading very large files.
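
For reference, the stdlib pattern that avoids buffering the whole body, which the Faraday stack cannot currently expose (placeholder URL; real use would go through the authenticated client):

```ruby
require "net/http"
require "uri"

uri = URI("https://example.com/large-object")  # placeholder
Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  http.request(Net::HTTP::Get.new(uri)) do |response|
    File.open("large-object", "wb") do |out|
      # read_body yields the body in segments as they arrive, so the
      # whole object never has to fit in memory.
      response.read_body { |segment| out.write(segment) }
    end
  end
end
```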
