Comments (7)

aajtodd avatar aajtodd commented on August 12, 2024

Thanks for the issue.

we run into frequent, unhandled networking exceptions when talking to just about any AWS service, and especially S3.

I'd be interested in understanding what types of exceptions you are seeing. Are they all the same as the one you included in this issue (e.g. java.net.SocketException: Broken pipe (Write failed)) or are there others? Do they differ by service?


To consider java.net.SocketException retryable.

The issue is that whether a network exception can be retried generically, for any operation, depends on whether the operation is idempotent. Take the example given: you hit something like a closed socket (perhaps the server closed it, or there was some other network issue). What if this failure happened after the server modified state? How do we know it's safe to retry at that point?

The answer is we don't (unless the operation is marked as @idempotent). We don't special-case idempotent operations yet; that would be a good addition/feature request.

It's possible we may be able to special case S3's retry policy to be more relaxed but again we'd need a good way to classify which networking errors are retryable.
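To make the idempotency argument concrete, here is a minimal, self-contained sketch. It is not SDK code: the Operation class, its idempotent flag, and the operation names are hypothetical stand-ins for whatever the service model would provide. It only shows the decision rule described above, retrying a network failure when repeating the request cannot apply a change twice.

```kotlin
import java.io.IOException
import java.net.SocketException

// Hypothetical model of an operation; this is NOT an SDK type.
data class Operation(val name: String, val idempotent: Boolean)

// Retry a network-level failure only if re-sending the request is safe,
// i.e. the operation is idempotent. Non-network errors are not handled here.
fun shouldRetry(op: Operation, error: Throwable): Boolean =
    error is IOException && op.idempotent

fun main() {
    // SocketException is a subclass of IOException.
    val brokenPipe = SocketException("Broken pipe (Write failed)")

    // Hypothetical operations, named only for illustration.
    val read = Operation("ReadThing", idempotent = true)
    val create = Operation("CreateThing", idempotent = false)

    println(shouldRetry(read, brokenPipe))   // true
    println(shouldRetry(create, brokenPipe)) // false
}
```

The sketch deliberately refuses to retry the non-idempotent operation even though the error is identical, because the client cannot tell whether the server acted on the request before the connection failed.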


In the meantime you can always customize the retry policy for the SDK to retry whatever kind of error you want. This doesn't require changing the RetryStrategy, just the policy for what errors are considered retryable.

e.g.

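import aws.sdk.kotlin.runtime.http.retries.AwsDefaultRetryPolicy
import aws.sdk.kotlin.services.s3.S3Client
import aws.smithy.kotlin.runtime.retries.policy.RetryDirective
import aws.smithy.kotlin.runtime.retries.policy.RetryErrorType
import aws.smithy.kotlin.runtime.retries.policy.RetryPolicy
import java.io.IOException
import kotlinx.coroutines.runBlocking

// Treats any IOException as a transient, retryable error and defers
// every other result to the SDK's default policy.
class CustomRetryPolicy : RetryPolicy<Any?> {
    override fun evaluate(result: Result<Any?>): RetryDirective {
        return when (result.exceptionOrNull()) {
            is IOException -> RetryDirective.RetryError(RetryErrorType.Transient)
            else -> AwsDefaultRetryPolicy.evaluate(result)
        }
    }
}

fun main(): Unit = runBlocking {

    val s3 = S3Client.fromEnvironment {
        retryPolicy = CustomRetryPolicy()
    }

    // ...
}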

Note: You'll need a dependency on aws.sdk.kotlin:aws-http:<sdk-version> to access AwsDefaultRetryPolicy

from smithy-kotlin.

berlix avatar berlix commented on August 12, 2024

Thank you for your response. I see the problem with non-idempotent requests and that not all errors (or all kinds of SocketException) should be considered retryable.

And thanks for the hint about the RetryPolicy, we'll try that.

I'd be interested in understanding what types of exceptions you are seeing

I don't have good stats right now, because we also use some older SDK versions in some services, but I believe that from the recent versions, the SocketException with Broken pipe (Write failed) is the one we see the most by far, mostly from S3 and sometimes from CloudWatch (PutMetrics). I see you already use some hacks where you check the exception's message to determine retryability - it seems to me that "Write failed" does qualify for retryability, so one idea could be to add that as a discriminator.
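As a rough illustration of the message-based discriminator idea, the sketch below walks an exception's cause chain and looks for a SocketException whose message contains "Write failed". The function name isLikelyWriteFailure is made up for this example and is not part of the SDK. It also assumes the JVM keeps producing that message text, which is not a stable API and may differ between platforms and Java versions.

```kotlin
import java.net.SocketException

// Hypothetical helper, not an SDK function.
// Walks the cause chain because the socket error may be wrapped in another exception.
fun isLikelyWriteFailure(error: Throwable): Boolean =
    generateSequence(error) { it.cause }
        .take(16) // bound the walk in case of a cyclic cause chain
        .any { it is SocketException && it.message?.contains("Write failed") == true }

fun main() {
    val wrapped = RuntimeException(
        "request failed",
        SocketException("Broken pipe (Write failed)"),
    )
    println(isLikelyWriteFailure(wrapped))                              // true
    println(isLikelyWriteFailure(SocketException("Connection reset"))) // false
}
```

A check like this could be used as one branch of a custom RetryPolicy, but string matching on exception messages is brittle, so it is best treated as a stopgap.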

One thing we also see regularly, but I suppose it's unrelated and just a bug in OkHttp, is IllegalStateException: Unbalanced enter/exit from inside OkHttp. That was why we tried switching to the CRT client. Unfortunately, the connection-related exception from the CRT client (as seen in my initial comment) doesn't seem to be very specific: it doesn't say whether it was a write or a read that failed.

ianbotsf avatar ianbotsf commented on August 12, 2024

Hi @berlix, are you still encountering socket exceptions when using OkHttp with more recent versions of the SDK?

madisp avatar madisp commented on August 12, 2024

Googling aws-kotlin-sdk "Unbalanced enter/exit" brought me here :)

@ianbotsf I'm getting Caused by: java.lang.IllegalStateException: Unbalanced enter/exit when spawning multiple SQS consumers, test: https://gist.github.com/madisp/c3c1e04ad0ef7c34ed66b5545606c846. Seems to reliably happen on my local m1 macbook when there are 10 or more parallel consumers.

(this is on aws.sdk.kotlin:sqs-jvm:1.1.10)

This might be a test coroutines issue though?

ianbotsf avatar ianbotsf commented on August 12, 2024

Thanks for the test code @madisp, but I cannot reproduce the failure on my local Windows machine or my cloud EC2 Linux machine. It's likely that parallelism and timing make this difficult to reproduce across different execution environments.

I see that OkHttp recently fixed an unbalanced enter/exit exception caused by rapid release/re-acquisition of connections (square/okhttp#7381). If you have stable repro code, it would be interesting to see if it still fails on OkHttp 5.0.0-alpha.12 (smithy-kotlin currently uses 5.0.0-alpha.11). If not then we can prioritize upgrading the OkHttp dependency to the latest alpha version.

madisp avatar madisp commented on August 12, 2024

Yup, still getting it with 5.0.0-alpha.12. One thing that makes it easier to reproduce is increasing the number of MESSAGES and CONSUMERS - can you try raising these to 1000 and 100 respectively?

berlix avatar berlix commented on August 12, 2024

Apologies for the late response. We haven't observed that specific exception ("Broken pipe") with SDK versions 1.0.13 or 1.1.13. We do still regularly get the following exception, at least when downloading objects from S3 - no clue if it's related:

aws.smithy.kotlin.runtime.http.HttpException: java.io.IOException: unexpected end of stream on https://s3.eu-west-1.amazonaws.com/...
    at aws.smithy.kotlin.runtime.http.engine.okhttp.OkHttpEngine.roundTrip(OkHttpEngine.kt:158)
    at aws.smithy.kotlin.runtime.http.engine.okhttp.OkHttpEngine$roundTrip$1.invokeSuspend(OkHttpEngine.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: java.io.IOException: unexpected end of stream on https://s3.eu-west-1.amazonaws.com/...
    at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:209)
    at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:111)
    at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:95)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:84)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:65)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at aws.smithy.kotlin.runtime.http.engine.okhttp.MetricsInterceptor.intercept(MetricsInterceptor.kt:30)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:205)
    at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:537)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.EOFException: \n not found: limit=0 content=…
    at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:335)
    at okhttp3.internal.http1.HeadersReader.readLine(HeadersReader.kt:29)
    at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:179)
    ... 18 common frames omitted    
