Comments (6)
@waahm7 - the problem is not easily reproducible; it seems to be timing-related. So far, after tens of thousands of batch jobs, we have data only for this one instance. (It may have happened more often in production, since batch jobs are retried on failure.)
With regard to the CRT logs, we log at WARN level to stderr; the only log entry besides the error trace above was this one:
[ERROR] 2023-02-15 16:39:26.623 S3MetaRequest [140468667021056] id=0x7fc125951d00 Meta request cannot recover from error 14343 (Invalid response status from request). (request=0x7fc15e00a600, response status=400)
terminate called after throwing an instance of 'av::CheckException'
what(): Check failure at perception/s2a/dataset_extraction/data_extraction_module.cc:574:
Expected: 'x is ok', with x := 'output_blob_->close()' [av::status::Status]
x = PutObject() failed
where: cloud/aws/s3/s3_streambuf.cc:93
extra: s3://perception-prod-training-data/opt/a831200c/s2a/2023-02-15-bless-collect_dking_updateOverlapFeb10_latestIssues/test/36c6f8923fe514d6b5a28ac5dbdea034.rats: HTTP response code: 400
Resolved remote host IP address:
Request ID: 2BAQ8WRZGGH3PPG1
[... rest as above]
The following summarizes a multi-week effort to narrow down the cause of the problem:
- The CompleteMultipartUpload that failed with the above error had the following data:
opt/a831200c/s2a/2023-02-15-bless-collect_dking_updateOverlapFeb10_latestIssues/test/36c6f8923fe514d6b5a28ac5dbdea034.rats
Object Size: -1 Bytes
Part Count: 1-2
Multipart Upload ID: bSMw15PyZS8eNZgoSxZ.qdZw4Awfm4beQGf9ERvdc8izMN5yFd6quULmunaMpJ8gUh2ZAjjidDlVr97lOBlSwhssgFeb_Zsb5CHN5fPYX6qHLjyRm7uxJZtnPerZGhbwsBU5xNStVDsjlr.Qnhe18s387J.7VybyeA2H9yrUnfk-
- According to the Part Count field, 2 parts were expected. However, only one part was uploaded: the CompleteMultipartUpload with ID 2BAQ8WRZGGH3PPG1 was preceded by a single UploadPart with ID 2BAR2P5XFDM71V14 on the same TCP connection. There was no trace of another UploadPart call.
- The S3 key of the failed multipart upload did not show up in list-multipart-uploads or in list-parts.
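The mismatch above (Part Count implies 2 parts, but only one UploadPart was seen on the wire) is the kind of condition a client could detect locally before sending CompleteMultipartUpload. A minimal sketch, assuming a hypothetical PartTracker class that is not part of aws-c-s3 or its API: each UploadPart completion records its ETag under a mutex, and completion is only permitted once ETags for all expected parts are present.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper (not part of aws-c-s3): collects the ETag from each
// UploadPart response and only reports "ready" once every expected part has one.
class PartTracker {
public:
    explicit PartTracker(std::size_t expected_parts) : expected_parts_(expected_parts) {
        assert(expected_parts_ > 0);  // a multipart upload has at least one part
    }

    // May be called concurrently from different UploadPart completion callbacks.
    // Part numbers are 1-based, as in the S3 API; out-of-range numbers are rejected.
    bool record(std::size_t part_number, std::string etag) {
        if (part_number == 0 || part_number > expected_parts_) {
            return false;
        }
        std::lock_guard<std::mutex> lock(mutex_);
        etags_[part_number] = std::move(etag);
        return true;
    }

    // Returns the ETags ordered by part number only when all expected parts are
    // present; std::nullopt means CompleteMultipartUpload must not be sent yet.
    std::optional<std::map<std::size_t, std::string>> ready_to_complete() const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (etags_.size() != expected_parts_) {
            return std::nullopt;
        }
        return etags_;
    }

private:
    const std::size_t expected_parts_;
    mutable std::mutex mutex_;
    std::map<std::size_t, std::string> etags_;
};
```

Because keys are restricted to 1..expected_parts, a map of size expected_parts necessarily holds every part. Making "complete" a query that can fail, rather than something triggered by a counter reaching a value, turns a missing ETag into a local error instead of a request sent to S3 with a partial part list.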
Since the CompleteMultipartUpload request must list, in its XML request body, the ETag values returned by the individual UploadPart responses, the cause may be one of the following:
- both UploadPart calls completed, but the second one failed after the API response had returned the ETag (seems unlikely); or
- due to some kind of race condition or unfortunate timing, the CompleteMultipartUpload "thought" there was only 1 ETag/UploadPart present instead of 2, and thus tried to complete the multipart upload with a single UploadPart.
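To make the second hypothesis concrete, here is a sketch of how a CompleteMultipartUpload body is assembled from UploadPart ETags, with a guard that refuses to build it when fewer ETags than expected are present or a part number is skipped. This is illustrative only: build_complete_body and xml_escape are hypothetical helpers, not aws-c-s3 functions (aws-c-s3 builds this body internally in C). The element names (CompleteMultipartUpload, Part, PartNumber, ETag) and the namespace follow the S3 API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Escapes the characters that may not appear verbatim in XML text content.
std::string xml_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default: out += c; break;
        }
    }
    return out;
}

// Hypothetical helper (not an aws-c-s3 function): builds a CompleteMultipartUpload
// body from ETags keyed by 1-based part number. std::map iterates in ascending key
// order, matching the ascending part order S3 expects. Throws instead of
// silently completing with fewer parts than expected.
std::string build_complete_body(const std::map<std::size_t, std::string>& etags,
                                std::size_t expected_parts) {
    if (etags.size() != expected_parts) {
        throw std::logic_error("only " + std::to_string(etags.size()) + " of " +
                               std::to_string(expected_parts) + " ETags present");
    }
    std::string body =
        "<CompleteMultipartUpload xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">";
    std::size_t next = 1;
    for (const auto& [part_number, etag] : etags) {
        // Part numbers must run 1, 2, ..., expected_parts with no gaps.
        if (part_number != next) {
            throw std::logic_error("missing ETag for part " + std::to_string(next));
        }
        ++next;
        body += "<Part><PartNumber>" + std::to_string(part_number) + "</PartNumber>";
        body += "<ETag>" + xml_escape(etag) + "</ETag></Part>";
    }
    body += "</CompleteMultipartUpload>";
    return body;
}
```

With this guard, the situation described above (one ETag collected while two parts were expected) would throw before any request is sent, rather than producing a body that lists only part 1.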
from aws-c-s3.
Hi,
Thank you for reporting the issue. Could you please attach the full CRT logs and provide some reproduction steps? Thank you!
Thank you for the details. Is the request getting paused and resumed later?
No, it does not use aws_s3_meta_request_pause.
@grrtrr there have been some updates since this issue was opened. Are you still running into this with the latest version of this repo?
@jmklix what are the updates and how do they fix the issue described here? In particular, have test cases been added to ensure the condition does not happen?