Comments (21)

hoerup commented on July 18, 2024

Have you tried with Microsoft.Data.SqlClient 5.1.0-preview2 ?

https://github.com/dotnet/SqlClient/releases/tag/v5.1.0-preview2

OskarKlintrot commented on July 18, 2024

No, I haven't. For some reason we don't lose locks in our test env, but we do in our acceptance test env (and production, which the graph is from). I'll deploy the two worst offenders with the preview version to the acceptance test env and we'll see in a few days if it makes any difference.

OskarKlintrot commented on July 18, 2024

I noticed something really interesting/weird. Two thirds of the lock losses for the two services that lose the most locks occur around midnight. In one env it's 23:45-00:00 and in another it's 00:00-00:05. I have no clue if it's related to this issue though, probably not. I wonder if Azure is doing something around midnight...

OskarKlintrot commented on July 18, 2024

If the connection is broken, I would expect disposing the handle to throw an exception since it will try to execute the release sproc and that will fail. Are you seeing any exceptions there?

I made a mistake and didn't realize that an exception could be thrown from there. I accidentally suppress any exceptions thrown during dispose. I'll have to fix that and deploy the services.
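
Roughly the change I need to make, as a sketch (handle, _logger and _jobName are from our own wrapper code):

try
{
    // Previously this was inside a catch-all that swallowed everything,
    // so release failures from the handle never showed up in the logs.
    await handle.DisposeAsync();
}
catch (Exception ex)
{
    _logger.LogError(ex, "Releasing lock for job {Job Name} failed", _jobName);
}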

Are you able to repro this at all locally? I assume not since you said just in acceptance test...

Unfortunately, no. Maybe if I leave a job running for a few days on a computer, but I hope I can get some exceptions that make sense from production instead.

Is it possible for you to try and repro this without subscribing to HandleLostToken? I want to try and isolate whether this is an issue that is being caused by handle monitoring or whether it is being detected by handle monitoring.

Sure, I think I could just use .WasLost() instead so I can still log when the lock has been lost. I'll start with logging the exceptions to get a picture of what is happening before I change anything.
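
For reference, .WasLost() is just our own helper, not part of the library; roughly this sketch:

using Medallion.Threading;

public static class DistributedLockHandleExtensions
{
    // Checks the handle's HandleLostToken directly instead of registering a callback on it.
    public static bool WasLost(this IDistributedSynchronizationHandle handle)
        => handle.HandleLostToken.IsCancellationRequested;
}

The job loop can then check handle.WasLost() between runs and just log when it returns true.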

OskarKlintrot commented on July 18, 2024

Sorry, got buried in other work. Closing this for now and tracking the issue here. Might re-open this issue if the Microsoft.Data.SqlClient team doesn't think it has anything to do with them.

madelson commented on July 18, 2024

Can you share more detail?

  • Confirm that you are using the latest DistributedLock.SqlServer
  • How are you creating the lock?
  • How are you acquiring the lock?
  • How are you detecting lost locks? HandleLostToken I assume?
  • What exceptions are you seeing?

OskarKlintrot commented on July 18, 2024

Confirm that you are using the latest DistributedLock.SqlServer
Yes, version 1.0.2.

How are you creating the lock?
How are you acquiring the lock?

private async Task<IDistributedSynchronizationHandle?> GetHandleAsync(CancellationToken cancellationToken)
{
    using var _ = _logger.BeginScope(_state);

    while (!cancellationToken.IsCancellationRequested)
    {
        var failure = true;

        try
        {
            _logger.LogInformation("Trying to acquire lock for {Job Name}", _jobName);

            var handle = await _distributedLockProvider.AcquireLockAsync(
                _lockName,
                timeout: null,
                cancellationToken: cancellationToken);

            failure = false;

            return handle;
        }
        // Multiple catch clauses
        finally
        {
            if (failure)
            {
                await Delay.TryWait(TimeSpan.FromSeconds(5), cancellationToken);
            }
        }
    }

    return null;
}

How are you detecting lost locks? HandleLostToken I assume?
That's correct:

cancellationTokenRegistration = handle.HandleLostToken.Register(() =>
{
    // HandleLostToken.Register is too slow to use for anything other than logging.
    _logger.LogError("Lost lock for job {Job Name}.", _jobName);

    _telemetryClient.TrackEvent(new($"Lost lock for {_jobName}")
    {
        Properties =
        {
            { "Job Name", _jobName },
            { "Lock State", "Lost" }
        },
    });
});

What exceptions are you seeing?
I can't find any logged exceptions. I don't think we are suppressing any. I think we might be, actually, doh... Once you've got the handle, are there any exceptions being thrown?

I made this repo for a previous bug but that's pretty much still how we are using the library.

madelson commented on July 18, 2024

Once you got the handle, is there any exceptions being thrown?

If the connection is broken, I would expect disposing the handle to throw an exception since it will try to execute the release sproc and that will fail. Are you seeing any exceptions there?

In one env it's 23:45-00:00 and in another it's 00:00-00:05

Are you able to repro this at all locally? I assume not since you said just in acceptance test...

How are you detecting lost locks? HandleLostToken I assume?
That's correct:

Is it possible for you to try and repro this without subscribing to HandleLostToken? I want to try and isolate whether this is an issue that is being caused by handle monitoring or whether it is being detected by handle monitoring.

madelson commented on July 18, 2024

Sounds good @OskarKlintrot. FWIW I ran the unit test suite with the 5.0.1 upgrade and everything passes, so no luck there.

Another thing that would be interesting to test on your end is to try different versions of Microsoft.Data.SqlClient and see which one introduced the behavior.

There are a lot of different suggestions flying around here so I thought it might make sense to track with a list:

  • Try latest prerelease version of Microsoft.Data.SqlClient
  • Try different versions of Microsoft.Data.SqlClient to see which one causes the issue
  • Log exceptions from Dispose() and report back
  • Try removing HandleLostToken subscription (but still logging Dispose exceptions to get a delayed sense of lock losses)
  • Try to create local repro (perhaps based on this one).

hoerup commented on July 18, 2024

@OskarKlintrot We did see some weird SQL connectivity issues in one of our microservices after upgrading to .NET 7.
We have just upgraded that service to the newly released Microsoft.Data.SqlClient 5.1.0, and initial tests indicate that the issue is gone.

OskarKlintrot commented on July 18, 2024

  • Try latest prerelease version of Microsoft.Data.SqlClient

I let it run for two days and on day 2 the behaviour came back. So no luck there, I would say.

  • Try different versions of Microsoft.Data.SqlClient to see which one causes the issue

Since I can't go any lower than the version required by EF Core 7, I don't think this is feasible, unfortunately.

  • Log exceptions from Dispose() and report back

There don't seem to be any exceptions from Dispose(). On the same instance that lost its locks, and at the same time, I now see this exception:

BeginExecuteNonQuery requires an open and available Connection. The connection's current state is closed.

Stack trace:

System.InvalidOperationException:
   at Microsoft.Data.SqlClient.SqlCommand.ValidateCommand (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQuery (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.SqlClient.SqlCommand.BeginExecuteNonQueryInternal (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.SqlClient.SqlCommand.BeginExecuteNonQueryAsync (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncImpl (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQueryAsync (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Medallion.Threading.Internal.Data.DatabaseCommand+<ExecuteAsync>d__12`1.MoveNext (DistributedLock.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=12bc08512096ade0: /_/DistributedLock.Core/Internal/Data/DatabaseCommand.cs:151)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1+ConfiguredValueTaskAwaiter.GetResult (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Medallion.Threading.SqlServer.SqlApplicationLock+<ExecuteReleaseCommandAsync>d__16.MoveNext (DistributedLock.SqlServer, Version=1.0.0.0, Culture=neutral, PublicKeyToken=12bc08512096ade0: /_/DistributedLock.SqlServer/SqlApplicationLock.cs:71)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Medallion.Threading.Internal.Data.MultiplexedConnectionLock+<ReleaseAsync>d__12`1.MoveNext (DistributedLock.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=12bc08512096ade0: /_/DistributedLock.Core/Internal/Data/MultiplexedConnectionLock.cs:159)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Medallion.Threading.Internal.Data.MultiplexedConnectionLock+<ReleaseAsync>d__12`1.MoveNext (DistributedLock.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=12bc08512096ade0: /_/DistributedLock.Core/Internal/Data/MultiplexedConnectionLock.cs:173)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at My code+<ExecuteAsync>d__9.MoveNext (My Assembly, Version=7.1.2.0, Culture=neutral, PublicKeyToken=null: /_/Hosting/My Assembly/Lock/NonConcurrent/NonConcurrentLock.cs:115)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at My Code+<ExecuteContinuously>d__15.MoveNext (My Assembly, Version=7.1.2.0, Culture=neutral, PublicKeyToken=null: /_/Hosting/My Assembly/CronJob.cs:324)
  • Try removing HandleLostToken subscription (but still logging Dispose exceptions to get a delayed sense of lock losses)

Do you still think this is worth doing? (see below)

  • Try to create local repro (perhaps based on this one).

On hold (see below)


Given that the exceptions (there are some others as well, from another occasion, all related to DB connection issues; Azure admits the DB became unavailable at that time) point to some hiccup with the DB connection, I doubt that this issue has anything to do with DistributedLock; it's rather Microsoft.Data.SqlClient and Azure SQL Server. Do you (@hoerup and @madelson) agree with me? If so, I think we can close this issue. I can always re-open it if I find something new.

OskarKlintrot commented on July 18, 2024

@OskarKlintrot We did see some weird SQL connectivity issues in one of our microservices after upgrading to .NET 7. We have just upgraded that service to the newly released Microsoft.Data.SqlClient 5.1.0, and initial tests indicate that the issue is gone.

@hoerup Oh, that's very interesting! Are you also using Azure SQL perhaps? When I upgraded to the latest preview everything seemed fine for a day until the issues came back. Please let me know if it works for you! If it does, we will probably also upgrade to 5.1.0.

OskarKlintrot commented on July 18, 2024

This issue seems somewhat similar to what I am seeing; not sure if it's related. Is there any retry logic in DistributedLock.SqlServer, btw?

hoerup commented on July 18, 2024

@OskarKlintrot We did see some weird SQL connectivity issues in one of our microservices after upgrading to .NET 7. We have just upgraded that service to the newly released Microsoft.Data.SqlClient 5.1.0, and initial tests indicate that the issue is gone.

@hoerup Oh, that's very interesting! Are you also using Azure SQL perhaps? When I upgraded to the latest preview everything seemed fine for a day until the issues came back. Please let me know if it works for you! If it does, we will probably also upgrade to 5.1.0.

I might have been too fast there - just saw some connection errors again :(

We are using SQL Server on-prem.

OskarKlintrot commented on July 18, 2024

I might have been too fast there - just saw some connection errors again :(

Too bad!

we are using SQL server on-prem

Sounds like the issue is with Microsoft.Data.SqlClient then. Are you seeing the issues from places other than DistributedLock.SqlServer as well? We do; we see them from EF as well.

madelson commented on July 18, 2024

Is there any retry-logic in DistributedLock.SqlServer btw?

There isn't any retry logic. In particular, once we grab the lock, if the connection dies we can't really retry at that point. Even if we reacquired the lock, someone else could still have held it during the blip, so we want to report it as a lost handle.
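
To make that concrete, here's a rough sketch (not library code; RunWhileHoldingLockAsync and runJobAsync are placeholder names) of the pattern we expect callers to follow: tie the protected work to HandleLostToken and let the caller decide whether to re-acquire after a loss:

using System;
using System.Threading;
using System.Threading.Tasks;
using Medallion.Threading;

public static class LockedJobRunner
{
    // Sketch: run the protected work while holding the lock. If the handle is
    // lost, the linked token cancels the work; the caller can then decide
    // whether to re-acquire and run again.
    public static async Task RunWhileHoldingLockAsync(
        IDistributedLockProvider lockProvider,
        string lockName,
        Func<CancellationToken, Task> runJobAsync,
        CancellationToken cancellationToken)
    {
        await using var handle = await lockProvider.AcquireLockAsync(
            lockName, cancellationToken: cancellationToken);

        // Link the job's token to HandleLostToken so the work observes a lost lock.
        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
            cancellationToken, handle.HandleLostToken);

        await runJobAsync(linkedCts.Token);
    }
}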

I doubt that this issue has anything to do with DistributedLock but rather Microsoft.Data.SqlClient and Azure SQL Server

we see them from EF as well.

@OskarKlintrot that does seem likely if you're encountering a spike across uses of Microsoft.Data.SqlClient and not just in DistributedLock.

As a next step, I think it makes sense to report an issue on the Microsoft.Data.SqlClient GitHub and link it here. Do you agree? Then we can close this for now while that plays out. The issue you already linked does seem related, but it was closed without action. If we were able to pin down the appearance of the problem to a specific release of the client, that might help it gain traction, as would creating a standalone repro that can run against Azure.

madelson commented on July 18, 2024

@OskarKlintrot any updates here? Did you end up filing an issue with Microsoft?

OskarKlintrot commented on July 18, 2024

Still working on it, got sidetracked with a man cold :)

madelson commented on July 18, 2024

@OskarKlintrot sorry to hear that; hope you feel better soon :-)

madelson commented on July 18, 2024

Thanks!

JRahnama commented on July 18, 2024

@hoerup will you be able to provide a minimal sample that could repro the issue?
