Giter Site home page Giter Site logo

Comments (5)

BenHenning avatar BenHenning commented on August 16, 2024

I could be wrong, but is there actually a way to mark this test as flaky in code? Bazel does technically support a way by auto-rerunning flaky tests, but I think it'd be better just to fix the test, instead, since we don't want to encourage checking in flaky code (where we can avoid it).

Currently running 100x runs to see how flaky the test is.

from oppia-android.

BenHenning avatar BenHenning commented on August 16, 2024

With 100x runs, I saw 19 failures:

  • testLoggingInterceptor_makeNetworkCall_succeeds 14/19 times (14% failure rate)
  • testLoggingInterceptor_makeNetworkCallWithInvalidUrl_failsAndCompletes 5/19 times (5% failure rate)

Edit: Also, here's the command run:

bazel test --runs_per_test=100 //data:src/test/java/org/oppia/android/data/backends/gae/NetworkLoggingInterceptorTest

Unfortunately, running the tests above in isolation doesn't seem to actually catch any flakes, so these flakes are seemingly caused by state interaction between the tests of the suite.

Specific failures that we're seeing for each:

testLoggingInterceptor_makeNetworkCall_succeeds:

testLoggingInterceptor_makeNetworkCall_succeeds(org.oppia.android.data.backends.gae.NetworkLoggingInterceptorTest)
java.lang.IllegalStateException: This job has not completed yet
        at kotlinx.coroutines.JobSupport.getCompletedInternal$kotlinx_coroutines_core(JobSupport.kt:1199)
        at kotlinx.coroutines.DeferredCoroutine.getCompleted(Builders.common.kt:100)
        at org.oppia.android.data.backends.gae.NetworkLoggingInterceptorTest.testLoggingInterceptor_makeNetworkCall_succeeds(NetworkLoggingInterceptorTest.kt:99)

testLoggingInterceptor_makeNetworkCallWithInvalidUrl_failsAndCompletes:

testLoggingInterceptor_makeNetworkCallWithInvalidUrl_failsAndCompletes(org.oppia.android.data.backends.gae.NetworkLoggingInterceptorTest)
java.lang.IllegalStateException: This job has not completed yet
        at kotlinx.coroutines.JobSupport.getCompletedInternal$kotlinx_coroutines_core(JobSupport.kt:1199)
        at kotlinx.coroutines.DeferredCoroutine.getCompleted(Builders.common.kt:100)
        at org.oppia.android.data.backends.gae.NetworkLoggingInterceptorTest.testLoggingInterceptor_makeNetworkCallWithInvalidUrl_failsAndCompletes(NetworkLoggingInterceptorTest.kt:123)

from oppia-android.

BenHenning avatar BenHenning commented on August 16, 2024

So the ultimate problem for both tests seems to be this line:

    val firstRequest = firstRequestsDeferred.getCompleted().single()

This was caused indirectly by #5402 since it fixed the tests to properly verify what they're trying to verify.

Note that #5402 also made similar changes to ConsoleLoggerTest. I verified that this test, fortunately, doesn't have any flakes per the following run:

bazel test --runs_per_test=100 //utility/src/test/java/org/oppia/android/util/logging:ConsoleLoggerTest

However, this is probably a false result since, per the previous comment, it seems that multiple tests need to cooperate in order to trigger the failure.

from oppia-android.

BenHenning avatar BenHenning commented on August 16, 2024

I've added a bunch of println instrumentation lines to better understand what's happening, and disabled the testLoggingInterceptor_makeCrashingNetworkCall_failsAndCompletes test just to make sure it isn't impacting the flakiness (and to reduce noise).

See https://gist.github.com/BenHenning/473e76df316acdfc99c609c363db155f for the instrumentation used for these results.

Here's the debug output when testLoggingInterceptor_makeNetworkCall_succeeds fails:

@@@@@@ [B] pre mockWebServer.enqueue(mockResponse)
@@@@@@ [B] pre request execute
@@@@@@ NetworkLoggingInterceptor.intercept()
@@@@@@ intercept() pre-launch success emit launch
@@@@@@ intercept() pre-launch fail 1 emit launch
@@@@@@ intercept() top-level is done
@@@@@@ [B] pre coroutines sync
@@@@@@ [B] pre networkLoggingInterceptor.logNetworkCallFlow.take(1).toList()
@@@@@@ [B] pre networkLoggingInterceptor.logFailedNetworkCallFlow.take(1).toList()
@@@@@@ intercept() pre actual success emit()
@@@@@@ intercept() pre actual fail 1 emit()
@@@@@@ [B] post networkLoggingInterceptor.logNetworkCallFlow.take(1).toList()
@@@@@@ [B] post networkLoggingInterceptor.logFailedNetworkCallFlow.take(1).toList()
@@@@@@ intercept() post actual fail 1 emit()
@@@@@@ intercept() post actual success emit()
@@@@@@ [B] pre test coroutine sync
@@@@@@ [B] pre firstRequestsDeferred get completed
@@@@@@ [B] pre firstFailingRequestsDeferred get completed
@@@@@@ [B] post all getCompleted()s
.@@@@@@ [A] pre mockWebServer.enqueue(MockResponse().setBody(testResponseBody))
@@@@@@ [A] pre request execute
@@@@@@ NetworkLoggingInterceptor.intercept()
@@@@@@ intercept() pre-launch success emit launch
@@@@@@ intercept() top-level is done
@@@@@@ [A] pre coroutines sync
@@@@@@ intercept() pre actual success emit()
@@@@@@ [A] pre networkLoggingInterceptor.logNetworkCallFlow.take(1).toList()
@@@@@@ intercept() post actual success emit()
@@@@@@ [A] pre test coroutine sync
@@@@@@ [A] pre get completed
E
Time: 14.058
There was 1 failure:
1) testLoggingInterceptor_makeNetworkCall_succeeds(org.oppia.android.data.backends.gae.NetworkLoggingInterceptorTest)
java.lang.IllegalStateException: This job has not completed yet
        at kotlinx.coroutines.JobSupport.getCompletedInternal$kotlinx_coroutines_core(JobSupport.kt:1199)
        at kotlinx.coroutines.DeferredCoroutine.getCompleted(Builders.common.kt:100)
        at org.oppia.android.data.backends.gae.NetworkLoggingInterceptorTest.testLoggingInterceptor_makeNetworkCall_succeeds(NetworkLoggingInterceptorTest.kt:108)

FAILURES!!!
Tests run: 2,  Failures: 1

Here's the debug output when testLoggingInterceptor_makeNetworkCall_succeeds passes:

@@@@@@ [B] pre mockWebServer.enqueue(mockResponse)
@@@@@@ [B] pre request execute
@@@@@@ NetworkLoggingInterceptor.intercept()
@@@@@@ intercept() pre-launch success emit launch
@@@@@@ intercept() pre-launch fail 1 emit launch
@@@@@@ intercept() top-level is done
@@@@@@ [B] pre coroutines sync
@@@@@@ [B] pre networkLoggingInterceptor.logNetworkCallFlow.take(1).toList()
@@@@@@ [B] pre networkLoggingInterceptor.logFailedNetworkCallFlow.take(1).toList()
@@@@@@ intercept() pre actual success emit()
@@@@@@ intercept() pre actual fail 1 emit()
@@@@@@ [B] post networkLoggingInterceptor.logFailedNetworkCallFlow.take(1).toList()
@@@@@@ [B] post networkLoggingInterceptor.logNetworkCallFlow.take(1).toList()
@@@@@@ intercept() post actual fail 1 emit()
@@@@@@ intercept() post actual success emit()
@@@@@@ [B] pre test coroutine sync
@@@@@@ [B] pre firstRequestsDeferred get completed
@@@@@@ [B] pre firstFailingRequestsDeferred get completed
@@@@@@ [B] post all getCompleted()s
.@@@@@@ [A] pre mockWebServer.enqueue(MockResponse().setBody(testResponseBody))
@@@@@@ [A] pre request execute
@@@@@@ NetworkLoggingInterceptor.intercept()
@@@@@@ intercept() pre-launch success emit launch
@@@@@@ intercept() top-level is done
@@@@@@ [A] pre coroutines sync
@@@@@@ intercept() pre actual success emit()
@@@@@@ [A] pre networkLoggingInterceptor.logNetworkCallFlow.take(1).toList()
@@@@@@ [A] post networkLoggingInterceptor.logNetworkCallFlow.take(1).toList()
@@@@@@ intercept() post actual success emit()
@@@@@@ [A] pre test coroutine sync
@@@@@@ [A] pre get completed
@@@@@@ [A] post get completed

Time: 4.972

OK (2 tests)

The diff is interesting:

image

Specific takeaways:

  • logNetworkCallFlow and logFailedNetworkCallFlow are processed in reverse order in testLoggingInterceptor_makeNetworkCallWithInvalidUrl_failsAndCompletes in the passing case.
  • Fetching networkLoggingInterceptor.logNetworkCallFlow in testLoggingInterceptor_makeNetworkCall_succeeds actually finishes (plus getCompleted finishes--obviously that fails for the failure case since it's the actual exception being thrown).

I'm not certain the order difference matters here, and we're otherwise not getting interesting information on why the flow isn't completing.

from oppia-android.

BenHenning avatar BenHenning commented on August 16, 2024

Ah, I think I figured it out. Per https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines.flow/-mutable-shared-flow.html MutableSharedFlow has no replay caching by default which makes it time sensitive to subscriptions. We can see in the above logs that emit() is actually being called before we try to consume the flow which means there's a data race between starting to observe the flow and a value actually being delivered (events delivered will only block if there's another event being delivered, not if there zero subscribers--those are simply lost).

Adding another additional synchronization barrier seems to fix the problem by ensuring flow consumption happens before emit() is called.

from oppia-android.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.