Comments (10)
@mcsimps2 Thanks for the quick feedback, it seems that the fix could be pretty straightforward. It might even make sense to rename `timeout` to `total_timeout` to better distinguish it from the `timeout` passed to the `requests` transport...
The issue can probably also be reproduced by artificially throttling one's internet connection, but if a big file is needed, let me know. Or helping to verify a patch as part of a PR review when one is ready, that would also be cool!
from python-storage.
I also realize that it would be wise to split such large files into a multipart upload, but the issue still remains, especially on slow internet connections.
@plamut, PTAL
(disclaimer: speaking off the top of my head, I would need to check the code to confirm)
Not a definite answer, but the motivation for adding a `TimeoutGuard` was to support customizable total timeouts for the underlying operations, when these operations can potentially involve multiple requests.
For example, sometimes there is a `Future` object sitting on top that wants to time out after X seconds, even though each individual underlying request would take less than X seconds. This was needed to support some use case scenarios in BigQuery.
With that said, it is possible that this addition broke the expected semantics in other use cases, e.g. where `timeout` is expected to represent a per-request timeout.
If there is no explicit timeout given (`timeout=None`), then the `TimeoutGuard` should act as a no-op. I think the following paragraph is the key:
> File uploads for a couple minutes, exceeding the default timeout of 60/61 seconds (...) that the TimeGuard uses.
IIRC, this is what happens: the library user provides no explicit timeout, but unlike some other libraries, this library also defines a default timeout `(60, 61)`, which is passed to the `TimeoutGuard`. However, that timeout is probably meant as a timeout for each `requests` request, and not the total timeout. As the `requests` docs explain:
> timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
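To make the distinction concrete, here is a minimal standalone sketch (a toy stand-in, not the actual google-auth or python-storage code) contrasting a per-read transport timeout with a wall-clock guard of the kind `TimeoutGuard` implements:

```python
import time

class WallClockGuard:
    """Toy stand-in for google-auth's TimeoutGuard (illustrative only).

    Unlike a requests-style timeout, which limits inactivity per socket
    read, this bounds the *total* elapsed time of the guarded block.
    With timeout=None it is a no-op and never raises.
    """

    def __init__(self, timeout):
        self._timeout = timeout

    def __enter__(self):
        self._deadline = (
            None if self._timeout is None else time.monotonic() + self._timeout
        )
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._deadline is not None and time.monotonic() > self._deadline:
            raise TimeoutError("total allowed time exceeded")
        return False

# A steady transfer never trips a per-read timeout, but a short wall-clock
# budget fails it even though every individual chunk was fast:
with WallClockGuard(timeout=None):      # no-op: slow-but-steady uploads are safe
    time.sleep(0.05)

try:
    with WallClockGuard(timeout=0.01):  # wall clock shorter than the transfer
        time.sleep(0.05)
except TimeoutError as exc:
    print("guard fired:", exc)          # -> guard fired: total allowed time exceeded
```

A large upload that keeps sending bytes stays under the per-read limit indefinitely, but any fixed wall-clock budget will eventually fail it, which matches the behavior reported in this issue.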
@mcsimps2 Does this sound plausible? From the issue description it seems to me that your application does not use explicit timeouts, is that correct?
(in which case the reported timeout error is indeed a bug - thanks for reporting it!)
Update: Classified as P1, as this can break large file uploads, yet no obvious and reasonable workaround exists (monkeypatching does not count).
Ah, that does make sense - I see how that's useful if you have multiple requests going on and need to cancel an operation mid-way through the sequence of requests if the "clock" timeout was exceeded.
You're correct, I do not set any explicit timeouts when calling the code, so any non-null timeout passed to the `TimeoutGuard` must be coming from a library default.
> However, that timeout is probably meant as a timeout for each requests request, and not the total timeout.
As a quick check (and for debugging purposes only), I can verify that setting the timeout passed to `TimeoutGuard` to `None`, while still passing the connect and read timeouts to `requests` via the `timeout` parameter, does indeed stop this issue from happening in the `AuthorizedSession.request` function and allows me to successfully complete a large/slow upload operation.
So it seems that, perhaps, specifying a separate timeout to pass to `TimeoutGuard`, in addition to the one we want to pass to the `requests` library, would remediate this problem, although that might be a naive fix since I have only looked at a very limited portion of the codebase, and it would be one more method argument to worry about.
I would be happy to run any tests that would be helpful since I have access to big files and a slow network.
@mcsimps2 FYI, I opened a pull request in the `google-auth` library.
The `timeout` parameter no longer affects the `TimeoutGuard`; instead, a separate `max_allowed_time` parameter has been added to control it. Its value is `None` by default, making the `TimeoutGuard` a no-op by default as well.
If you wish, you can already try it out by using the `google-auth` version from the PR branch, but otherwise there will probably be a new release soon anyway, and the Storage lib will just need a version bump of its `google-auth` dependency.
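In toy form, the separation looks roughly like this (the parameter names mirror the ones discussed here, but the implementation below is purely illustrative, not the actual google-auth code):

```python
import time

def guarded_request(send, timeout=(60, 61), max_allowed_time=None):
    """Illustrative stand-in for the separated-timeout design.

    timeout          -> forwarded to the transport as the per-request
                        (connect, read) timeouts
    max_allowed_time -> total wall-clock budget for the whole operation;
                        None (the default) disables the check entirely
    """
    start = time.monotonic()
    response = send(timeout=timeout)  # only the transport timeout applies here
    if max_allowed_time is not None:
        if time.monotonic() - start > max_allowed_time:
            raise TimeoutError("max_allowed_time exceeded")
    return response

# With the default max_allowed_time=None, a slow multi-minute upload is
# bounded only by the per-read transport timeout:
print(guarded_request(lambda timeout: "uploaded"))  # -> uploaded
```

The key design point is that the wall-clock guard is now opt-in, so callers who never set it keep the plain `requests` semantics quoted above.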
Awesome, thank you for pushing this forward so quickly, much appreciated! This will help a lot; I'm looking forward to the next release.
@mcsimps2 The most recent google-auth release, `google-auth==1.11.0`, contains the change that separates transport timeouts from the total wall-clock timeout. The latter is `None` by default, meaning that `TimeoutGuard` should not raise unnecessary Timeout errors anymore.
Until a new `storage` lib release is made, you could resolve this issue by manually pinning `google-auth` to the above-mentioned version.
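For example, the pin could be expressed in a requirements file along these lines (a hypothetical fragment, assuming a pip-based setup):

```
# requirements.txt
google-cloud-storage          # current release, whose own floor predates the fix
google-auth>=1.11.0           # pin the fixed version explicitly for now
```

Once the storage library bumps its own `google-auth` floor, the explicit pin can be dropped.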
Also, thanks for the very good and detailed report, it made diagnosing the issue much faster. 👍
@plamut is all of the work done for this (minus a release)?
@crwilcox Yes, the bug no longer occurs if `google-auth>=1.11.0` is used (the bugfix landed there a while ago).