cactusdynamics / cactus-rt

A C++ framework for programming real-time applications

License: Mozilla Public License 2.0

CMake 7.07% C++ 90.15% C 0.23% Dockerfile 0.46% Shell 1.05% Makefile 1.04%

cactus-rt's Issues

Provide default signal handling in cactus_rt

SIGTERM and SIGINT should cause all threads registered under App to gracefully stop by default
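
One possible shape for this default, as a minimal sketch: a handler that only sets a lock-free atomic flag, which the threads registered under App would poll and then wind down. The names `InstallDefaultSignalHandlers`, `StopRequested`, and the `sketch` namespace are hypothetical and not part of the cactus_rt API.

```cpp
#include <atomic>
#include <csignal>

namespace sketch {  // hypothetical namespace, not cactus_rt

// Flag polled by application threads. A lock-free atomic is safe to write
// from a signal handler.
inline std::atomic<bool> stop_requested{false};
static_assert(std::atomic<bool>::is_always_lock_free,
              "the signal handler requires a lock-free flag");

// The handler does nothing except record that a stop was requested.
inline void HandleStopSignal(int /*signum*/) {
  stop_requested.store(true, std::memory_order_relaxed);
}

// Installs the handler for SIGINT and SIGTERM.
// Returns false if either installation fails.
inline bool InstallDefaultSignalHandlers() {
  return std::signal(SIGINT, HandleStopSignal) != SIG_ERR &&
         std::signal(SIGTERM, HandleStopSignal) != SIG_ERR;
}

// Threads would check this once per loop iteration and exit when it is true.
inline bool StopRequested() {
  return stop_requested.load(std::memory_order_relaxed);
}

}  // namespace sketch
```

An alternative design would block the signals in every thread and handle them in a dedicated non-RT thread via `sigwait`, which keeps signal delivery away from the RT threads entirely.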

Intern string to save space in trace packets

Hook up category to track event packets

Verify that tracing system doesn't lose messages on graceful shutdown

std::thread parity

Should consider making cactus_rt::Thread equivalent to std::thread.

joinable()
std::terminate on destruction?

The framework should be able to automatically trace regions of an application run, similar to Golang's trace region API: https://pkg.go.dev/runtime/trace#hdr-User_annotation. The application can then leverage the same system to create even more detailed traces.

One possible way to do this is via LTTng-UST; another is via Perfetto. Both should be lock-free and constant time. However, Perfetto uses string interning internally, and it's unclear how that affects the worst-case runtime of the trace API calls.

Make cactus_rt::tracing usable without App and create example for it

Tracing should handle cases where threads are stopped

Investigate if we can use weak_ptr in the App's cache of thread_tracers and deal with cleaning up of thread_tracers when a thread winds down.
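
A minimal sketch of what the `weak_ptr` approach could look like. `ThreadTracer` here is an empty stand-in and `TracerRegistry` is a hypothetical name for the App's cache; the real types in cactus_rt will differ. The cache holds only weak references, so a tracer is destroyed when its thread releases it, and expired entries are pruned the next time the cache is walked.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct ThreadTracer {};  // stand-in for the real tracer type

// Hypothetical cache of tracers that does not keep them alive.
class TracerRegistry {
 public:
  void Register(const std::shared_ptr<ThreadTracer>& tracer) {
    std::scoped_lock lock(mutex_);
    tracers_.push_back(tracer);  // stored as weak_ptr
  }

  // Returns the tracers that are still alive and erases expired entries.
  std::vector<std::shared_ptr<ThreadTracer>> LiveTracers() {
    std::scoped_lock lock(mutex_);
    std::vector<std::shared_ptr<ThreadTracer>> live;
    for (auto it = tracers_.begin(); it != tracers_.end();) {
      if (auto tracer = it->lock()) {
        live.push_back(std::move(tracer));
        ++it;
      } else {
        it = tracers_.erase(it);  // owning thread has wound down
      }
    }
    return live;
  }

  // Number of cached entries, including any not yet pruned.
  std::size_t CachedCount() {
    std::scoped_lock lock(mutex_);
    return tracers_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<std::weak_ptr<ThreadTracer>> tracers_;
};
```

The mutex makes this suitable only for the non-RT side (registration and aggregation); it is not meant to be called from the RT loop.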

TraceAggregator should place events in a buffer to maximize the data pop rate from the queue

Re-add sleep and busy wait for Fifo scheduler type

Validate tracing counter events

Refactor `TraceAggregator::Run`

Reduce duplication of logic during the loop and during the flush before exit.
Check for dropped message counts and investigate Perfetto data packets for appropriate signals to emit.
Consider using a buffer (maybe after benchmarking is set up)
Better handle write errors (both inside Run and outside Run in places like RegisterThreadTracer and RegisterSink)

Measure memory allocations in RT

RT code should avoid memory allocations. We should provide a way to measure this with the framework and certain debug flags.
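
One way this could be measured in a debug build, as a sketch: replace the global `operator new` with a version that bumps a counter, then compare the counter before and after the RT section. The `sketch` namespace and `AllocationScope` are hypothetical names. This only sees allocations that go through the plain `operator new`; direct `malloc` calls and over-aligned allocations are not counted.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace sketch {  // hypothetical namespace, not cactus_rt

// Total number of calls to the replaced operator new.
inline std::atomic<std::size_t> allocation_count{0};

// Counts the allocations made since this object was constructed.
class AllocationScope {
 public:
  AllocationScope()
      : start_(allocation_count.load(std::memory_order_relaxed)) {}

  std::size_t Count() const {
    return allocation_count.load(std::memory_order_relaxed) - start_;
  }

 private:
  std::size_t start_;
};

}  // namespace sketch

// Replacement for the global allocation function. operator new[] forwards
// to this by default, so array allocations are counted as well.
void* operator new(std::size_t size) {
  sketch::allocation_count.fetch_add(1, std::memory_order_relaxed);
  if (void* p = std::malloc(size == 0 ? 1 : size)) {
    return p;
  }
  throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t /*size*/) noexcept { std::free(p); }
```

A thread could create an `AllocationScope` before entering its loop and report a non-zero `Count()` afterwards. A per-thread counter would be needed to attribute allocations to a specific RT thread.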

Emit instant event when overrun is detected in Loop

Decide on what to do with exceptions

Merge Thread and BaseThread

Deal with `Loop`s that over-run their deadlines

Currently, CyclicThread doesn't know whether Loop has overrun its deadline. It naively continues to sleep and perform the next calculation. Find out how to deal with Loop functions that overrun the deadline. Perhaps there are some solutions in the literature; see https://ojs.dagstuhl.de/index.php/lites/article/view/LITES-v004-i001-a002 for an example.
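
As a starting point, here is a sketch of one simple policy: detect the overrun after Loop returns and skip the period boundaries that were missed, so the thread re-aligns with its original schedule instead of running back-to-back iterations. This is only one option among several, and `NextWakeup` and `WakeupDecision` are hypothetical names, not CyclicThread members. Times are nanoseconds since an arbitrary epoch.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using std::chrono::nanoseconds;

struct WakeupDecision {
  nanoseconds next_wakeup;       // when the thread should wake up next
  std::int64_t skipped_periods;  // 0 means Loop finished on time
};

// `scheduled` is the wakeup time of the iteration that just finished,
// `now` is the time at which Loop returned.
inline WakeupDecision NextWakeup(nanoseconds scheduled, nanoseconds period,
                                 nanoseconds now) {
  assert(period > nanoseconds::zero());
  const nanoseconds next = scheduled + period;
  if (now <= next) {
    return {next, 0};  // no overrun: keep the regular schedule
  }
  // Overrun: skip to the first period boundary that is not in the past.
  const nanoseconds late = now - next;
  const std::int64_t skipped = (late + period - nanoseconds{1}) / period;
  return {next + skipped * period, skipped};
}
```

A non-zero `skipped_periods` is also a natural place to emit an instant trace event or increment an overrun counter.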

Setup Doxygen

quill thread should be stopped and flushed on exit

Misc fixes for tracing

Investigate TraceSpan move operation correctness
Verify protobuf version and shutdown protobuf library at stop
Fix documentation so that images and links show up properly in Doxygen

Builtin data logging with MPMC queue

The message passing example shows an example of passing data around via boost::lockfree::spsc_queue, which is fine as a demo. It would be more flexible to provide a built-in way to log a data struct directly via an MPMC queue.

One idea is to use iceoryx, which can even allow us to pass the data to another process. However, there are currently some latency issues with the latest release of iceoryx (which have since been fixed in production). It is also relatively complex to set up due to the requirement to start RouDi, its service discovery layer.

Another possible library is https://github.com/cameron314/concurrentqueue.

Ideally, the thread has an API that allows the user to directly log data to disk without blocking. The serialization format can be pluggable, but we can have some "sane" defaults (like MCAP, CSV).
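
To make the queue part concrete, here is a sketch of a bounded MPMC queue that follows Dmitry Vyukov's well-known design, where each cell carries a sequence number. It is not an existing cactus_rt class, and `BoundedMpmcQueue` is a hypothetical name. Push and pop never enter the kernel and fail immediately when the queue is full or empty, but under contention they retry a compare-and-swap, so the worst-case time would still need to be measured before using this from an RT thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

template <typename T>
class BoundedMpmcQueue {
 public:
  // `capacity` must be a power of two and at least 2.
  explicit BoundedMpmcQueue(std::size_t capacity)
      : cells_(new Cell[capacity]), mask_(capacity - 1) {
    assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    for (std::size_t i = 0; i < capacity; ++i) {
      cells_[i].sequence.store(i, std::memory_order_relaxed);
    }
  }

  // Returns false without blocking if the queue is full.
  bool TryPush(const T& value) {
    std::size_t pos = enqueue_pos_.load(std::memory_order_relaxed);
    for (;;) {
      Cell& cell = cells_[pos & mask_];
      const std::size_t seq = cell.sequence.load(std::memory_order_acquire);
      const auto diff =
          static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos);
      if (diff == 0) {  // cell is free for this position: try to claim it
        if (enqueue_pos_.compare_exchange_weak(pos, pos + 1,
                                               std::memory_order_relaxed)) {
          cell.data = value;
          cell.sequence.store(pos + 1, std::memory_order_release);
          return true;
        }
      } else if (diff < 0) {
        return false;  // full
      } else {
        pos = enqueue_pos_.load(std::memory_order_relaxed);
      }
    }
  }

  // Returns false without blocking if the queue is empty.
  bool TryPop(T& out) {
    std::size_t pos = dequeue_pos_.load(std::memory_order_relaxed);
    for (;;) {
      Cell& cell = cells_[pos & mask_];
      const std::size_t seq = cell.sequence.load(std::memory_order_acquire);
      const auto diff =
          static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos + 1);
      if (diff == 0) {  // cell holds data for this position: try to claim it
        if (dequeue_pos_.compare_exchange_weak(pos, pos + 1,
                                               std::memory_order_relaxed)) {
          out = cell.data;
          cell.sequence.store(pos + mask_ + 1, std::memory_order_release);
          return true;
        }
      } else if (diff < 0) {
        return false;  // empty
      } else {
        pos = dequeue_pos_.load(std::memory_order_relaxed);
      }
    }
  }

 private:
  struct Cell {
    std::atomic<std::size_t> sequence;
    T data{};
  };

  std::unique_ptr<Cell[]> cells_;
  const std::size_t mask_;
  std::atomic<std::size_t> enqueue_pos_{0};
  std::atomic<std::size_t> dequeue_pos_{0};
};
```

In the logging scenario described above, RT threads would call `TryPush` with a plain data struct, and a non-RT writer thread would drain the queue with `TryPop` and serialize to the chosen format.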

Remove latency tracking and/or use HDR histogram to track latency

Write tests for tracing system

Investigate built-in way to log in real time without blocking

A possible reference on how to do this: https://news.ycombinator.com/item?id=7815443

Bad variant access in simple example

On graceful termination of examples, the following error occurs:

terminate called after throwing an instance of 'std::bad_variant_access'
  what():  std::get: wrong index for variant

Setup CPU affinity for TraceAggregator

Compile-disable of tracing

Benchmark tracing performance

- Emit latency (min, avg, max, p99) and throughput (max emit per second).
- TraceAggregator throughput: need to understand the impact of serializing the data and also file write performance.
  - Can use a "test sink" that doesn't write to file to test for CPU overhead.
- Data rate: need to know the data size per event, which can be used to extrapolate IO rates when writing data to sinks. Compare results with when string interning is implemented.
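
For the emit latency numbers, a small helper along these lines could summarize the collected samples. This is a sketch with hypothetical names (`LatencyStats`, `Summarize`); p99 uses the nearest-rank definition, computed with integer arithmetic to avoid floating-point rounding in the rank.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencyStats {
  double min;
  double avg;
  double max;
  double p99;
};

// Summarizes latency samples (any consistent unit, e.g. microseconds).
// The vector is taken by value because it is sorted internally.
inline LatencyStats Summarize(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
  // Nearest-rank percentile: the ceil(0.99 * n)-th smallest sample (1-based).
  const std::size_t rank = (99 * n + 99) / 100;
  return {samples.front(), sum / static_cast<double>(n), samples.back(),
          samples[rank - 1]};
}
```

The samples should be recorded into a preallocated buffer during the benchmark and summarized afterwards, so that the measurement itself does not allocate on the hot path.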

Built-in tracing mega issue

Tracing is an important aspect of developing real-time applications, as it allows the developer to identify long-running code blocks. This involves two components: a real-time trace collection system and an offline trace analysis/visualization system. The idea is to integrate trace collection into cactus_rt such that the program is automatically traced during development (either for the entire duration of the run, or started/stopped dynamically via an external signal). The cactus_rt framework should also allow the program to be traced during production runs, should the user opt to do so. If the performance impact of trace event emission is low and the number of emissions is kept reasonable, there's no reason why tracing can't be done continuously while the program is running, to gain better insight into the program under production conditions.

A trace analysis system that includes Gantt-chart-style visualization should be available for the tracing data. More complex analysis, such as running SQL queries over the trace data, would also be useful.

A bonus feature would be to pass log messages out of the RT thread and be able to format and print them in a separate thread/process.

Perfetto

Perfetto is a Google-developed tracing tool with three major components: (1) the tracing SDK, (2) the trace processor, and (3) the trace visualizer. The tracing SDK enables application-specific traces by passing the trace data quickly out of the application process into a tracing service, which can then record the data into a file. It also has the ability to record the data directly in process, via a separate thread. The trace processor allows users to run SQL queries on an existing trace file, which can simplify the trace analysis. The trace visualizer is a web UI that allows for visualization of the trace data in a Gantt-chart-style view, and also provides an interface for interactive SQL execution.

On paper, this checks all the boxes. My understanding of how it works, based on this document, is as follows:

When trace events are emitted, the SDK grabs a free page in a shared memory buffer and serializes the protobuf-encoded message into it (via a specialized protozero library that has very low overhead).
An async IPC gets sent to the tracing service, which instructs the tracing service to copy the shared memory buffer into its own buffer (central buffer) and mark the shared memory buffer as free again for reuse.
From the central buffer, the data is written either periodically to disk, or written at the end of the program, depending on the configuration.

However, a careful reading of the documentation and a quick look through the code base show that the emission of trace events is not real-time safe. Specifically, the documentation states:

At some point one of the set_int_val() calls will hit the slow-path and acquire a new buffer. The overall idea is having a serialization mechanism that is extremely lightweight most of the times and that requires some extra function calls when buffer boundary, so that their [time] cost gets amortized across all trace events.

In the context of the overall Perfetto tracing use case, the slow-path involves grabbing a process-local mutex and finding the next free chunk in the shared memory buffer. Hence writes are lock-free as long as they happen within the thread-local chunk and require a critical section to acquire a new chunk once every 4KB-32KB (depending on the trace configuration).

My understanding is that this occurs during the shared memory buffer write. If a trace event is emitted from the RT thread at the same time as from a non-RT thread, and the slow path is triggered (because the trace packet crosses the buffer boundary), priority inversion could occur, which can result in unbounded latency. Further, the documentation suggests that memory allocation occurs in the slow path (not 100% sure on this, though), which can also cause problems for real-time code.
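
The concern can be illustrated with a toy model of the fast path and slow path. This is not Perfetto's code: `ChunkedWriter` and its members are hypothetical, and the model only tracks byte counts. The point is that most reservations touch only thread-local state, while crossing a chunk boundary takes a process-wide mutex that a lower-priority thread may be holding.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Toy model of a per-thread writer that fills fixed-size chunks.
class ChunkedWriter {
 public:
  ChunkedWriter(std::mutex& pool_mutex, std::size_t chunk_size)
      : pool_mutex_(pool_mutex), chunk_size_(chunk_size), used_(chunk_size) {}

  // Reserves `size` bytes for one event.
  // Returns true if the slow path (mutex acquisition) was taken.
  bool Reserve(std::size_t size) {
    assert(size <= chunk_size_);
    if (used_ + size <= chunk_size_) {
      used_ += size;  // fast path: thread-local bookkeeping only
      return false;
    }
    // Slow path: an RT thread can block here while a non-RT thread holds
    // the mutex, which is where priority inversion becomes possible.
    std::scoped_lock lock(pool_mutex_);
    ++chunks_acquired_;
    used_ = size;
    return true;
  }

  std::size_t chunks_acquired() const { return chunks_acquired_; }

 private:
  std::mutex& pool_mutex_;
  const std::size_t chunk_size_;
  std::size_t used_;  // starts "full" so the first event acquires a chunk
  std::size_t chunks_acquired_ = 0;
};
```

With 4096-byte chunks and 1000-byte events, roughly every fourth reservation in this model takes the mutex, so the cost is amortized on average but the worst case per call is not bounded.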

Thus, Perfetto is not suitable for real-time production tracing. However, it's possible we can still use Perfetto to trace in development, and use a compile time flag to disable tracing for release builds.

Even though the Perfetto tracing SDK is unusable in real-time, we might still be able to use the trace processor and visualizer components, if we can emit a Perfetto-compatible data file with a custom tracing solution, perhaps based on LTTng. Since the Perfetto trace processor also takes the Chromium trace JSON format, we can maybe emit that as well.
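
For the Chromium trace JSON option, a complete ("X" phase) event in the Trace Event Format is a JSON object with `name`, `cat`, `ph`, `ts`, `dur`, `pid`, and `tid` fields, with timestamps in microseconds. A sketch of a formatter, where `CompleteEvent`, `EscapeJson`, and `ToTraceEventJson` are hypothetical names:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// One duration slice, e.g. a single Loop iteration.
struct CompleteEvent {
  std::string name;
  std::string category;
  std::int64_t ts_us;   // start timestamp in microseconds
  std::int64_t dur_us;  // duration in microseconds
  int pid;
  int tid;
};

// Escapes quotes, backslashes, and control characters for a JSON string.
inline std::string EscapeJson(const std::string& in) {
  std::string out;
  for (const char c : in) {
    const auto uc = static_cast<unsigned char>(c);
    if (c == '"' || c == '\\') {
      out += '\\';
      out += c;
    } else if (uc < 0x20) {
      char buf[8];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(uc));
      out += buf;
    } else {
      out += c;
    }
  }
  return out;
}

// Formats one event as a Trace Event Format object with phase "X".
inline std::string ToTraceEventJson(const CompleteEvent& e) {
  return "{\"name\":\"" + EscapeJson(e.name) + "\",\"cat\":\"" +
         EscapeJson(e.category) + "\",\"ph\":\"X\",\"ts\":" +
         std::to_string(e.ts_us) + ",\"dur\":" + std::to_string(e.dur_us) +
         ",\"pid\":" + std::to_string(e.pid) +
         ",\"tid\":" + std::to_string(e.tid) + "}";
}
```

A trace file would then be a JSON array of such objects. The string building allocates, so this belongs in the non-RT aggregation thread rather than in the RT emit path.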

Also, the Perfetto tracing SDK can't pass log messages (by default), but it can emit counter information, which can be plotted in the UI.

LTTng

TBD.

Move name out of AppConfig and ThreadConfig and into the constructor

Revisit default heap reservation

Currently we reserve 512MB of heap on startup. Is this necessary? This is not that useful because the default allocator is not O(1), and RT code shouldn't allocate on the heap anyway.
