Comments (11)
Usually we use sync in-memory reporter in tests. Is there a reason a real async reporter is used here?
from jaeger-client-cpp.
I cannot tell. Maybe @isaachier has more info about that.
Nevertheless, I think that there is a problem here even if, in real life, replacing the global tracer is not a good practice.
from jaeger-client-cpp.
This is a legitimate bug. There are circular references here. As the test completes, the reporter emits the last span over UDP and immediately destroys it. Since the span holds a reference to the tracer, the tracer cannot be released until the last span is destroyed. The tracer maintains ownership of the reporter, so the reporter is not destroyed until the tracer is destroyed. The call stack is something like this (sorted chronologically top to bottom):
RemoteReporter::sweepQueue (emit and destroy queued spans)
Span::~Span() (destroy span)
Tracer::~Tracer() (destroy tracer)
RemoteReporter::~RemoteReporter() (destroy reporter, invokes RemoteReporter::close)
One thing to keep in mind is that the reporter's mutex is supposed to guarantee that close
cannot happen during a sweepQueue
call. On Windows, for some reason, the mutex is recursive and thus the destructor completes successfully. This is not correct behavior and should be fixed. On Mac/Linux, the mutex is not recursive, which results in a deadlock. Either way, there is a reference cycle here.
The reference from span to tracer must remain as is. It allows the destructor to automatically handle span reporting correctly (only destroyed once all spans from the process are complete).
Maybe the solution could be to avoid calling the span destructor in the sweepQueue
method. This is done implicitly with pop_front
. I'm not sure what the alternative would look like. Perhaps some sort of ring buffer.
Update: It seems like the Windows mutex doesn't allow recursive locking, but it checks for it and throws an exception. That is OK with me.
from jaeger-client-cpp.
Here, we are in a case of a circular dependency. The behavior is not as intended neither on Linux nor on Windows. Herb Sutter presented a study about such use case: deffered_ptr (https://github.com/hsutter/gcpp)
I am not sure to understand you last phrase about Windows. The mutex under Windows are only recursive, hence a thread can lock twice the same mutex (Holding two locks on the mutex in the same time).
You are saying that the span must hold a shared_ptr to the tracer in order to be reported when destroyed. But since we know that the Trace will wait to flush all spans (so its lifetime goes beyond the spans' one), a raw pointer would be sufficient in this case, no?
from jaeger-client-cpp.
I am not sure to understand you last phrase about Windows. The mutex under Windows are only recursive, hence a thread can lock twice the same mutex (Holding two locks on the mutex in the same time).
I do not believe that std::mutex
is recursive on Windows. See https://msdn.microsoft.com/en-us/library/hh921467.aspx. For a recursive mutex, you would have to use std::recursive_mutex
.
You are saying that the span must hold a shared_ptr to the tracer in order to be reported when destroyed. But since we know that the Trace will wait to flush all spans (so its lifetime goes beyond the spans' one), a raw pointer would be sufficient in this case, no?
The OpenTracing API takes advantage of the reference count in a number of ways. One situation involves the freeing of a dynamically loaded library handle. The tracer has no inherent logic to "wait for all spans." It just waits until the tracer destructor is called, which is controlled by the reference count. Using a raw pointer would break a lot of things.
from jaeger-client-cpp.
According to the latest MSVC documentation found at https://docs.microsoft.com/en-us/cpp/standard-library/mutex-class-stl?view=vs-2017, the behavior is undefined if we try to lock on a mutex that we already hold. The only way to have the stack I added to the issue is if the lock was reentrant. The same thread was able to acquire the lock twice.
As for the tracer reference in the spans, I am not saying that the tracer should wait for the span. But, practically, it is: the tracer dtor closes the reporter which flush its spans. This is true in case of already reported spans. The question is: what behavior do we want in case of not yet reported spans?
if we really need the shared_ptr to the tracer, then we should find a way to request tracer destruction instead of destroying it when the last span is destroyed. A solution can be to keep a reference to the tracer in the thread that will act as a GC in this case.
from jaeger-client-cpp.
@mdouaihy whether or not it is undefined behavior, the stack trace you provided shows that MSVC is throwing an exception when it detects recursive locking. For reference, here is the symbol jaegertracing::reporters::RemoteReporter::close'::1'::catch$2() Line 77 C++
(note catch
as in try
/catch
).
As per the solution, it is a little tricky. If we use a shared_ptr
in the reporter to keep the tracer active, then the tracer will never be destroyed (tracer never closes reporter, so reference count is always at least 1). Using a weak_ptr
would not seem to help either. I'm starting to think we cannot destroy the spans unless we are ready to close the reporter. The reporter sweepQueue
thread is not a good place for destroying the spans. I'm going to have to rethink the design of the queue here to avoid deallocating spans in the background thread.
from jaeger-client-cpp.
I posted a question on StackOverflow to see if anyone has ideas on how to solve this: https://stackoverflow.com/questions/52330499/thread-safe-reference-counted-queue-c.
from jaeger-client-cpp.
@rnburn can you please take a look at this. Not sure if it affects opentracing-cpp, but it could if I remove shared_ptr
from spans to tracer in favor of a global singleton.
from jaeger-client-cpp.
hi @isaachier, what about reseting the shared ptr from the reported spans?
The current shared_ptr ensures that a span created and not yet reported will be even if the tracer was destroyed.
Once it's reported, we no more have this constraint. We can even create a new span type that does not have this shared_ptr and that is used in the reporters and senders. This might be also a better solution because we have to implement a valid accessor of the tracer in the span as per the OpenTracing API.
What do you think?
from jaeger-client-cpp.
Hey @mdouaihy, I like the thought, but I believe we'd have the same issue if we call reset on the last reported span from within the sweepQueue
method. The call to reset
would still invoke the destructor of the reporter from within one of its methods.
from jaeger-client-cpp.
Related Issues (20)
- Provide access to context HTTP Header name
- Jaeger 32 bit C/C++ client required
- QUESTION: Can not use with hunter manager
- how to release version to real scene
- Unit test: Socket.testFailBind fails when run as "root" HOT 6
- Can we please ditch hunter HOT 1
- Enable ppc64le arch for jaeger-client-cpp
- Unable to build windows in a pipeline.
- Add support for adding tracer tags via config yaml HOT 1
- My jaeger version is 1.23, and the jaeger-collector component has enabled --sampling.strategies-reload-interval, but the collected logs show that the sampling strategy is still old HOT 4
- Envoy - failed to load dynamic library opentracing versions are incomptable
- `RemoteReporter::report()` is blocked
- Default queue size & buffer flush interval are not used when reading configuration from YAML
- Issue facing while trying to build jaeger-client-cpp
- Compilation issue on Ubuntu 20.04 HOT 2
- When the sampling type is rate, the number of actual results is equal to the number of threads multiplied by the configured rate parameter
- warning const' was hidden [-Woverloaded-virtual] when using jaeger client with opentracing libraries
- Enablement TravisCI integration for IBM ppc64le architecture HOT 1
- Deprecate Jaeger C++ SDK HOT 3
- [Bug]: W3C Propagation is wrongfully encoding debug span flag into traceparent HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from jaeger-client-cpp.