Giter Site home page Giter Site logo

Comments (11)

yurishkuro avatar yurishkuro commented on May 25, 2024

Usually we use sync in-memory reporter in tests. Is there a reason a real async reporter is used here?

from jaeger-client-cpp.

mdouaihy avatar mdouaihy commented on May 25, 2024

I cannot tell. Maybe @isaachier has more info about that.

Nevertheless, I think that there is a problem here even if, in real life, replacing the global tracer is not a good practice.

from jaeger-client-cpp.

isaachier avatar isaachier commented on May 25, 2024

This is a legitimate bug. There are circular references here. As the test completes, the reporter emits the last span over UDP and immediately destroys it. Since the span holds a reference to the tracer, the tracer cannot be released until the last span is destroyed. The tracer maintains ownership of the reporter, so the reporter is not destroyed until the tracer is destroyed. The call stack is something like this (sorted chronologically top to bottom):

RemoteReporter::sweepQueue (emit and destroy queued spans)
Span::~Span() (destroy span)
Tracer::~Tracer() (destroy tracer)
RemoteReporter::~RemoteReporter() (destroy reporter, invokes RemoteReporter::close)

One thing to keep in mind is that the reporter's mutex is supposed to guarantee that close cannot happen during a sweepQueue call. On Windows, for some reason, the mutex is recursive and thus the destructor completes successfully. This is not correct behavior and should be fixed. On Mac/Linux, the mutex is not recursive, which results in a deadlock. Either way, there is a reference cycle here.

The reference from span to tracer must remain as is. It allows the destructor to automatically handle span reporting correctly (only destroyed once all spans from the process are complete).

Maybe the solution could be to avoid calling the span destructor in the sweepQueue method. This is done implicitly with pop_front. I'm not sure what the alternative would look like. Perhaps some sort of ring buffer.

Update: It seems like the Windows mutex doesn't allow recursive locking, but it checks for it and throws an exception. That is OK with me.

from jaeger-client-cpp.

mdouaihy avatar mdouaihy commented on May 25, 2024

Here, we are in a case of a circular dependency. The behavior is not as intended neither on Linux nor on Windows. Herb Sutter presented a study about such use case: deffered_ptr (https://github.com/hsutter/gcpp)

I am not sure to understand you last phrase about Windows. The mutex under Windows are only recursive, hence a thread can lock twice the same mutex (Holding two locks on the mutex in the same time).

You are saying that the span must hold a shared_ptr to the tracer in order to be reported when destroyed. But since we know that the Trace will wait to flush all spans (so its lifetime goes beyond the spans' one), a raw pointer would be sufficient in this case, no?

from jaeger-client-cpp.

isaachier avatar isaachier commented on May 25, 2024

I am not sure to understand you last phrase about Windows. The mutex under Windows are only recursive, hence a thread can lock twice the same mutex (Holding two locks on the mutex in the same time).

I do not believe that std::mutex is recursive on Windows. See https://msdn.microsoft.com/en-us/library/hh921467.aspx. For a recursive mutex, you would have to use std::recursive_mutex.

You are saying that the span must hold a shared_ptr to the tracer in order to be reported when destroyed. But since we know that the Trace will wait to flush all spans (so its lifetime goes beyond the spans' one), a raw pointer would be sufficient in this case, no?

The OpenTracing API takes advantage of the reference count in a number of ways. One situation involves the freeing of a dynamically loaded library handle. The tracer has no inherent logic to "wait for all spans." It just waits until the tracer destructor is called, which is controlled by the reference count. Using a raw pointer would break a lot of things.

from jaeger-client-cpp.

mdouaihy avatar mdouaihy commented on May 25, 2024

According to the latest MSVC documentation found at https://docs.microsoft.com/en-us/cpp/standard-library/mutex-class-stl?view=vs-2017, the behavior is undefined if we try to lock on a mutex that we already hold. The only way to have the stack I added to the issue is if the lock was reentrant. The same thread was able to acquire the lock twice.

As for the tracer reference in the spans, I am not saying that the tracer should wait for the span. But, practically, it is: the tracer dtor closes the reporter which flush its spans. This is true in case of already reported spans. The question is: what behavior do we want in case of not yet reported spans?

if we really need the shared_ptr to the tracer, then we should find a way to request tracer destruction instead of destroying it when the last span is destroyed. A solution can be to keep a reference to the tracer in the thread that will act as a GC in this case.

from jaeger-client-cpp.

isaachier avatar isaachier commented on May 25, 2024

@mdouaihy whether or not it is undefined behavior, the stack trace you provided shows that MSVC is throwing an exception when it detects recursive locking. For reference, here is the symbol jaegertracing::reporters::RemoteReporter::close'::1'::catch$2() Line 77 C++ (note catch as in try/catch).

As per the solution, it is a little tricky. If we use a shared_ptr in the reporter to keep the tracer active, then the tracer will never be destroyed (tracer never closes reporter, so reference count is always at least 1). Using a weak_ptr would not seem to help either. I'm starting to think we cannot destroy the spans unless we are ready to close the reporter. The reporter sweepQueue thread is not a good place for destroying the spans. I'm going to have to rethink the design of the queue here to avoid deallocating spans in the background thread.

from jaeger-client-cpp.

isaachier avatar isaachier commented on May 25, 2024

I posted a question on StackOverflow to see if anyone has ideas on how to solve this: https://stackoverflow.com/questions/52330499/thread-safe-reference-counted-queue-c.

from jaeger-client-cpp.

isaachier avatar isaachier commented on May 25, 2024

@rnburn can you please take a look at this. Not sure if it affects opentracing-cpp, but it could if I remove shared_ptr from spans to tracer in favor of a global singleton.

from jaeger-client-cpp.

mdouaihy avatar mdouaihy commented on May 25, 2024

hi @isaachier, what about reseting the shared ptr from the reported spans?

The current shared_ptr ensures that a span created and not yet reported will be even if the tracer was destroyed.
Once it's reported, we no more have this constraint. We can even create a new span type that does not have this shared_ptr and that is used in the reporters and senders. This might be also a better solution because we have to implement a valid accessor of the tracer in the span as per the OpenTracing API.

What do you think?

from jaeger-client-cpp.

isaachier avatar isaachier commented on May 25, 2024

Hey @mdouaihy, I like the thought, but I believe we'd have the same issue if we call reset on the last reported span from within the sweepQueue method. The call to reset would still invoke the destructor of the reporter from within one of its methods.

from jaeger-client-cpp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.