Comments (5)
@ludwigschubert Under the covers, Blob.exists() makes a GET request to the blob's resource URL and converts a 404 response into False and a 2xx response into True. This implementation does not fit well with the current Batch design, which fakes the responses inside the context manager and then applies the sub-responses to the individual targets at exit. See the comment in Blob.exists.
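To make the mismatch concrete, here is a minimal, self-contained sketch of a deferred batch. FakeBatch, exists_direct, and exists_in_batch are hypothetical names invented for this illustration; they are not part of google-cloud-storage and only mimic the behaviour described above (placeholders inside the context, real statuses applied at exit).

```python
class FakeBatch:
    """Hypothetical stand-in for a batch: queues requests, answers them at exit."""

    def __init__(self, existing_names):
        self._existing = set(existing_names)  # objects that "exist" on the server
        self._queued = []                     # (name, placeholder) pairs
        self.status_codes = {}                # filled in only at exit

    def request(self, name):
        # Inside the context no real call is made; a placeholder is returned.
        placeholder = {"status": None}
        self._queued.append((name, placeholder))
        return placeholder

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only now are the sub-responses known and applied to the placeholders.
        for name, placeholder in self._queued:
            status = 200 if name in self._existing else 404
            placeholder["status"] = status
            self.status_codes[name] = status
        return False


def exists_direct(name, existing_names):
    """Non-batched check: the status is known immediately."""
    status = 200 if name in existing_names else 404
    return status != 404


def exists_in_batch(batch, name):
    """Batched check: the status is still unknown when we must return."""
    placeholder = batch.request(name)
    # A 404 cannot be converted into False here, because there is no response yet.
    return placeholder["status"]


with FakeBatch({"a.txt"}) as batch:
    inside = [exists_in_batch(batch, n) for n in ("a.txt", "missing.txt")]
```

In this toy model, every value collected inside the context is still a placeholder, while the per-object statuses only become available on the batch after the with block has exited.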
from python-storage.
To be clear: when this was moved to "feature request" by @frankyn, the implied statement is:
there is no way to use blob.exists() within a batch context
Is this correct?
Thanks!
For @ludwigschubert and those who arrive via Google.
I found a simple fix for this: wrap each call to blob.exists() in a try/except.
import multiprocessing

def to_parallelize_catch_exceptions(f, on_error_value, *args):
    # Call f and fall back to on_error_value if it raises.
    try:
        return f(*args)
    except Exception:
        return on_error_value

def parallelize(blobs, function, args_lists=None, on_error_value=None, n_threads=50):
    # Build one argument tuple per blob, appending any extra per-blob arguments.
    if args_lists is None:
        args = [(function, on_error_value, blob) for blob in blobs]
    else:
        args = [(function, on_error_value, blob, *args_lists[i]) for i, blob in enumerate(blobs)]
    with multiprocessing.Pool(min(n_threads, len(blobs))) as p:
        values = p.starmap(to_parallelize_catch_exceptions, args)
    return values

def blob_exists(blob):
    # Return the result so the caller actually receives True/False.
    return blob.exists()

blobs = [....]
with client.batch():
    parallelize(blobs, blob_exists, on_error_value=False)
Of course, there are two layers of abstraction above which are not expressly necessary, but I've copy-pasted my internal code that handles the parallelization for all the cloud client functions. Feel free to comment or DM me for more information as necessary. Cheers.
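For readers who only need the existence check, a thread-based variant may be simpler, since the calls are I/O bound and threads avoid having to pickle the arguments for worker processes. The following is a sketch under assumptions, not the code from the comment above: safe_call, parallel_exists, and StubBlob are hypothetical names, and StubBlob merely stands in for a real blob so the example runs without credentials.

```python
from concurrent.futures import ThreadPoolExecutor


def safe_call(func, on_error_value, *args):
    """Call func(*args); return on_error_value if it raises."""
    try:
        return func(*args)
    except Exception:
        return on_error_value


def parallel_exists(blobs, n_threads=50):
    """Return one boolean per blob, in input order."""
    if not blobs:
        return []
    with ThreadPoolExecutor(max_workers=min(n_threads, len(blobs))) as pool:
        return list(pool.map(lambda b: safe_call(b.exists, False), blobs))


class StubBlob:
    """Hypothetical stand-in for a blob, used only to exercise the helper."""

    def __init__(self, present, fail=False):
        self._present = present
        self._fail = fail

    def exists(self):
        if self._fail:
            raise RuntimeError("simulated request failure")
        return self._present


results = parallel_exists([StubBlob(True), StubBlob(False), StubBlob(True, fail=True)])
```

Note that treating every exception as "does not exist" also hides permission and network errors, so the fallback value should be chosen with that in mind.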
Possible implementation at googleapis/google-cloud-python#8618
Thank you folks for providing the above workarounds!
The current batch design does not support library methods whose return values depend on the response payload. In this case, the error handling and conversion in blob.exists() is not fully supported in a Batch context.
However, note that a new raise_exception flag has been added to Batch via #1043 (pending release in v2.10.0). Setting raise_exception=False allows all exceptions to be included in the list of return responses. Although not recommended, it is now possible to get a list of 404 responses by calling blob.exists() with storage_client.batch(raise_exception=False).
In addition, we've also added clarifications on the limited Batch support in the Python client; see details in #1045.
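Once a list of sub-response status codes is in hand, turning it back into existence flags is straightforward. The sketch below assumes you already have the status codes in the same order as the object names; how they are collected from the batch is not shown in this thread, and existence_from_status and summarize are hypothetical helper names, not library functions.

```python
def existence_from_status(status_code):
    """Map an HTTP status code to True/False, or None if it says neither."""
    if status_code == 404:
        return False
    if 200 <= status_code < 300:
        return True
    return None  # e.g. 403 or 503: existence cannot be inferred


def summarize(names, status_codes):
    """Pair each object name with the interpretation of its sub-response status."""
    return {name: existence_from_status(code) for name, code in zip(names, status_codes)}


summary = summarize(["a.txt", "missing.txt", "locked.txt"], [200, 404, 403])
```

Returning None for other statuses keeps errors such as 403 from being silently reported as "missing".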