
tokio-console

API Documentation(main) MIT licensed Build Status Discord chat

Chat | API Documentation (main branch)

what's all this, then?

this repository contains an implementation of TurboWish/tokio-console, a diagnostics and debugging tool for asynchronous Rust programs. the diagnostic toolkit consists of multiple components:

  • a wire protocol for streaming diagnostic data from instrumented applications to diagnostic tools. the wire format is defined using gRPC and protocol buffers, for efficient transport on the wire and interoperability between different implementations of data producers and consumers.

    the console-api crate contains generated code for this wire format for projects using the tonic gRPC implementation. additionally, projects using other gRPC code generators (including those in other languages!) can depend on the protobuf definitions themselves.

  • instrumentation for collecting diagnostic data from a process and exposing it over the wire format. the console-subscriber crate in this repository contains an implementation of the instrumentation-side API as a tracing-subscriber Layer, for projects using Tokio and tracing.

  • tools for displaying and exploring diagnostic data, implemented as gRPC clients using the console wire protocol. the tokio-console crate implements an interactive command-line tool that consumes this data, but other implementations, such as graphical or web-based tools, are also possible.

extremely cool and amazing screenshots

wow! whoa! it's like top(1) for tasks!

task list view

viewing details for a single task:

task details view

on the shoulders of giants

the console is part of a much larger effort to improve debugging tooling for async Rust. a 2019 Google Summer of Code project by Matthias Prechtl (@matprec) implemented an initial prototype, with a focus on interactive log viewing. more recently, both the Tokio team and the async foundations working group have made diagnostics and debugging tools a priority for async Rust in 2021 and beyond. in particular, a series of blog posts by @pnkfelix lay out much of the vision that this project seeks to eventually implement.

furthermore, we're indebted to our antecedents in other programming languages and environments for inspiration. this includes tools and systems such as pprof, Unix top(1) and htop(1), Xcode's Instruments, and many others.

using it

instrumenting your program

to instrument an application using Tokio, add a dependency on the console-subscriber crate, and add this one-liner to the top of your main function:

console_subscriber::init();

notes:

  • in order to collect task data from Tokio, the tokio_unstable cfg must be enabled. for example, you could build your project with

    RUSTFLAGS="--cfg tokio_unstable" cargo build

    or add the following to your .cargo/config.toml file:

    [build]
    rustflags = ["--cfg", "tokio_unstable"]

    For more information on the appropriate location of your .cargo/config.toml file, especially when using workspaces, see the console-subscriber readme.

  • the tokio and runtime tracing targets must be enabled at the TRACE level.

running the console

to run the console command-line tool, install tokio-console from crates.io

cargo install --locked tokio-console

and run locally

tokio-console

alternative method: run the tool from a local checkout of this repository

$ cargo run

by default, this will attempt to connect to an instrumented application running on localhost on port 6669. if the application is running somewhere else, or is serving the console endpoint on a different port, a target address can be passed as an argument to the console (either as an <IP>:<PORT> or <DNS_NAME>:<PORT>). for example:

cargo run -- http://my.great.console.app.local:5555

The console command-line tool supports a number of additional flags to configure its behavior. The help command will print a list of supported command-line flags and arguments:

$ tokio-console --help
The Tokio console: a debugger for async Rust.

Usage: tokio-console[EXE] [OPTIONS] [TARGET_ADDR] [COMMAND]

Commands:
  gen-config
          Generate a `console.toml` config file with the default
          configuration values, overridden by any provided command-line
          arguments
  gen-completion
          Generate shell completions
  help
          Print this message or the help of the given subcommand(s)

Arguments:
  [TARGET_ADDR]
          The address of a console-enabled process to connect to.
          
          This may be an IP address and port, or a DNS name.
          
          On Unix platforms, this may also be a URI with the `file`
          scheme that specifies the path to a Unix domain socket, as in
          `file://localhost/path/to/socket`.
          
          [default: http://127.0.0.1:6669]

Options:
      --log <LOG_FILTER>
          Log level filter for the console's internal diagnostics.
          
          Logs are written to a new file at the path given by the
          `--log-dir` argument (or its default value), or to the system
          journal if `systemd-journald` support is enabled.
          
          If this is set to 'off' or is not set, no logs will be
          written.
          
          [default: off]
          
          [env: RUST_LOG=]

  -W, --warn <WARNINGS>...
          Enable lint warnings.
          
          This is a comma-separated list of warnings to enable.
          
          Each warning is specified by its name, which is one of:
          
          * `self-wakes` -- Warns when a task wakes itself more than a
          certain percentage of its total wakeups. Default percentage is
          50%.
          
          * `lost-waker` -- Warns when a task is dropped without being
          woken.
          
          * `never-yielded` -- Warns when a task has never yielded.
          
          [default: self-wakes lost-waker never-yielded]
          [possible values: self-wakes, lost-waker, never-yielded]

  -A, --allow <ALLOW_WARNINGS>...
          Allow lint warnings.
          
          This is a comma-separated list of warnings to allow.
          
          Each warning is specified by its name, which is one of:
          
          * `self-wakes` -- Warns when a task wakes itself more than a
          certain percentage of its total wakeups. Default percentage is
          50%.
          
          * `lost-waker` -- Warns when a task is dropped without being
          woken.
          
          * `never-yielded` -- Warns when a task has never yielded.
          
          If this is set to `all`, all warnings are allowed.
          
          [possible values: all, self-wakes, lost-waker, never-yielded]

      --log-dir <LOG_DIRECTORY>
          Path to a directory to write the console's internal logs to.
          
          [default: /tmp/tokio-console/logs]

      --lang <LANG>
          Overrides the terminal's default language
          
          [env: LANG=en_US.UTF-8]

      --ascii-only <ASCII_ONLY>
          Explicitly use only ASCII characters
          
          [possible values: true, false]

      --no-colors
          Disable ANSI colors entirely

      --colorterm <truecolor>
          Overrides the value of the `COLORTERM` environment variable.
          
          If this is set to `24bit` or `truecolor`, 24-bit RGB color
          support will be enabled.
          
          [env: COLORTERM=truecolor]
          [possible values: 24bit, truecolor]

      --palette <PALETTE>
          Explicitly set which color palette to use
          
          [possible values: 8, 16, 256, all, off]

      --no-duration-colors <COLOR_DURATIONS>
          Disable color-coding for duration units
          
          [possible values: true, false]

      --no-terminated-colors <COLOR_TERMINATED>
          Disable color-coding for terminated tasks
          
          [possible values: true, false]

      --retain-for <RETAIN_FOR>
          How long to continue displaying completed tasks and dropped
          resources after they have been closed.
          
          This accepts either a duration, parsed as a combination of
          time spans (such as `5days 2min 2s`), or `none` to disable
          removing completed tasks and dropped resources.
          
          Each time span is an integer number followed by a suffix.
          Supported suffixes are:
          
          * `nsec`, `ns` -- nanoseconds
          
          * `usec`, `us` -- microseconds
          
          * `msec`, `ms` -- milliseconds
          
          * `seconds`, `second`, `sec`, `s`
          
          * `minutes`, `minute`, `min`, `m`
          
          * `hours`, `hour`, `hr`, `h`
          
          * `days`, `day`, `d`
          
          * `weeks`, `week`, `w`
          
          * `months`, `month`, `M` -- defined as 30.44 days
          
          * `years`, `year`, `y` -- defined as 365.25 days
          
          [default: 6s]

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

for development

the console-subscriber/examples directory contains some potentially useful tools:

  • app.rs: a very simple example program that spawns a bunch of tasks in a loop forever
  • dump.rs: a simple CLI program that dumps the data stream from a Tasks server

Examples can be executed with:

cargo run --example $name


console's Issues

console: figure out how parent/child spans should be displayed

While #9 is more focused on the data collection side of things, this issue is concerned with how parent/child relationships should be rendered in the console. Possible approaches:

  • @hawkw suggests that something like htop's tree mode would be useful.
  • being able to dynamically highlight an individual task and view its tree would be also helpful, as @tobz suggested.

console: task view: show poll time percentiles

This depends on #36. Once the subscriber is sending the poll times as a histogram, the task instance view (the one viewing just a single task in more detail) should include the percentiles of the poll times, at least p50 and p99 (perhaps p25 and p90 are also useful?).
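As a rough sketch of what the detail view might compute once the histogram lands, here is nearest-rank percentile selection over plain sorted samples (the `percentile` helper is hypothetical; a real implementation would query quantiles from the histogram the subscriber streams):

```rust
use std::time::Duration;

/// Nearest-rank percentile over a sorted slice of poll-time samples.
/// Sorted samples stand in for the real histogram here.
fn percentile(sorted: &[Duration], pct: f64) -> Option<Duration> {
    if sorted.is_empty() {
        return None;
    }
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted.get(rank.saturating_sub(1)).copied()
}

fn main() {
    let mut polls: Vec<Duration> = (1..=100).map(Duration::from_micros).collect();
    polls.sort();
    assert_eq!(percentile(&polls, 50.0), Some(Duration::from_micros(50)));
    assert_eq!(percentile(&polls, 99.0), Some(Duration::from_micros(99)));
}
```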

Instrument memory allocations

Instrument memory allocations and associate them to a span. It is unclear exactly how to do this yet, but it is worth experimentation.

Data to capture:

  • Number of allocations
  • Number of releases
  • Allocation sizes
  • Total amount of memory allocated.

Standardize on span naming

The prototype currently just looks for a span named tokio::task::spawn, but if we want it to work for any "runtime"-like thingy, we should probably have a more generic name. It still needs to be unique enough that a user would be very unlikely to create a span with the same name.

Besides task spawning, we also need to do this for wakers and resources too.

Pausing the console should pause expiring stopped tasks

If a user pauses the console, stopped tasks should not continue to expire and disappear while paused. They currently do, which makes it hard to inspect a stopped task unless you scroll to it super quickly.

The fix is simply to check whether the current temporality is paused when ticking retention for the tasks.
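A minimal sketch of that check, using illustrative types rather than the console's actual ones:

```rust
use std::time::{Duration, Instant};

/// UI temporality, mirroring the console's live/paused states
/// (illustrative, not the actual console types).
#[derive(PartialEq)]
enum Temporality {
    Live,
    Paused,
}

struct Task {
    completed_at: Option<Instant>,
}

/// Expire completed tasks older than `retain_for`, but only while live:
/// when paused, nothing should tick away out from under the user.
fn retain_tasks(tasks: &mut Vec<Task>, temporality: &Temporality, retain_for: Duration, now: Instant) {
    if *temporality == Temporality::Paused {
        return;
    }
    tasks.retain(|t| match t.completed_at {
        Some(done) => now.duration_since(done) <= retain_for,
        None => true, // running tasks are always kept
    });
}

fn main() {
    let start = Instant::now();
    let now = start + Duration::from_secs(20);
    let mk = |age| Task { completed_at: Some(now - Duration::from_secs(age)) };

    let mut live = vec![mk(10), mk(1)];
    retain_tasks(&mut live, &Temporality::Live, Duration::from_secs(6), now);
    assert_eq!(live.len(), 1); // the 10s-old task expired

    let mut paused = vec![mk(10), mk(1)];
    retain_tasks(&mut paused, &Temporality::Paused, Duration::from_secs(6), now);
    assert_eq!(paused.len(), 2); // nothing expired while paused
}
```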

First class notion of runtimes

A Tokio application is capable of having multiple independent runtimes. I believe this may be the case for other runtime implementations as well (e.g., I believe rayon allows creating multiple separate threadpools). This may also be useful when a crate uses multiple async runtimes provided by different libraries (e.g. running async-std's global runtime in a background thread).

Currently, the console has no way of being aware of this --- we just see a lot of tasks, without any way of determining which runtime a task is running on. It would be nice to be able to tag tasks with runtimes. In the future, when we add runtime-global stats/metrics, we will also want a way to uniquely identify runtimes.

As a first pass, we could consider adding a field to the task spans in Tokio with some kind of runtime ID (could just increment an integer every time Runtime::new is called), and add that to the task spans when spawning. This would at least let users distinguish between runtimes by looking at the task's fields. Tokio could also add a new (unstable) builder method to allow naming the runtime itself, as well as naming its worker threads.

Building on that, we could make the console subscriber actually aware of this field, and allow associating tasks with runtimes. We could consider adding a new RPC to list runtimes and/or get per-runtime stats.
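The first-pass ID scheme can be sketched with a process-wide counter (the `next_runtime_id` helper is hypothetical, not an actual Tokio API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Every `Runtime::new` call would take the next ID and record it on the
// task spans it creates, so tasks can be tagged with their runtime.
static NEXT_RUNTIME_ID: AtomicU64 = AtomicU64::new(1);

fn next_runtime_id() -> u64 {
    NEXT_RUNTIME_ID.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    // Two runtimes constructed in the same process get distinct tags,
    // so their tasks can be told apart in the console.
    let runtime_a = next_runtime_id();
    let runtime_b = next_runtime_id();
    assert!(runtime_b > runtime_a);
}
```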

Instrument Tokio Resources

Part of #39

Each "resource" in Tokio should be instrumented with a tracing span that is created when the resource is constructed. The span should be entered on the relevant poll_* methods of that resource (perhaps with an inner span identifying that specific method/action). The resources would probably have spans with different metadata, so the optimization used in the subscriber to store the metadata pointer may not work. The span name should be unique enough, but probably also the same among all the resources, with the specific kind being a structured field, so that adding new resource types doesn't require modifying the Subscriber to support them. It also allows other runtimes or engines to define their own resource types unknown to us.

Once those details are proposed, we could start with a simpler one, like timers, to prove it out (and easily view them with the example app in this repo, which is based around timers).

Create a view of Resources

As we instrument resources (sockets, files, timers, etc) (#39), we want a view to visualize them, seeing related attributes and states, such as what task they currently "belong" to.

Replay a recording

Now that there is an option to record all events to a file (#84), the next step is to update the console application to be able to read and replay that recording as if it were live. (The linked section has more details.) I imagine the way to do this would be to make the console use the console_subscriber library as a dependency.

  1. Implement parsing of the recording.
  2. When parsing an event's timestamp, the parsing task would tokio::time::sleep(timestamp - last_timestamp).await, so as to simulate the correct timing.
  3. As events are read back, simulate the emitting of the tracing events, but without actually using tracing. The events could be converted into the console_subscriber::Event enum and then sent to the aggregator.
  4. This would assume then that the console_subscriber::Aggregator server has been spawned. While this isn't strictly necessary, since within the same process we could skip the gRPC step, it might be easier to start with.
  5. The console UI would show REPLAY (${recordingName/ID}) instead of CONNECTED (127.0.0.1:3339).
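The timing arithmetic in step 2 can be sketched without the runtime; in the real console each computed delta would be handed to `tokio::time::sleep` (the `replay_delays` helper is illustrative):

```rust
use std::time::Duration;

/// Given the timestamps recorded for each event, compute how long the
/// replay task should sleep before emitting each subsequent one.
/// Out-of-order timestamps yield a zero delay.
fn replay_delays(timestamps: &[Duration]) -> Vec<Duration> {
    timestamps
        .windows(2)
        .map(|w| w[1].checked_sub(w[0]).unwrap_or(Duration::ZERO))
        .collect()
}

fn main() {
    let recorded = [0u64, 100, 250, 250].map(Duration::from_millis);
    let delays = replay_delays(&recorded);
    assert_eq!(
        delays,
        vec![
            Duration::from_millis(100),
            Duration::from_millis(150),
            Duration::ZERO, // identical timestamps: emit immediately
        ]
    );
}
```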

The trait bound `TasksRequest: prost::message::Message` is not satisfied

I'm on a Mac M1 and fixed #62 in my local fork of prost.

I then hacked tonic to point to my local fork of prost, and my local fork of console to point to the local fork of tonic. Still, I'm getting this build error.

What's the correct way to set up for development?

Do I edit Cargo.toml all over the place and edit the dependencies, then clean it all up before packaging?

   Compiling hyper-timeout v0.4.1
   Compiling tonic v0.5.0 (/Users/joelr/Work/Rust/tonic/tonic)
error[E0277]: the trait bound `TasksRequest: prost::message::Message` is not satisfied
   --> /Users/joelr/Work/Rust/console/target/debug/build/console-api-d4acf0bef599edb6/out/rs.tokio.console.tasks.rs:157:2649
    |
157 | ...ks") ; self . inner . server_streaming (request . into_request () , path , codec) . await } pub async fn watch_task_details (& mut sel...
    |                          ^^^^^^^^^^^^^^^^ the trait `prost::message::Message` is not implemented for `TasksRequest`
    |
    = note: required because of the requirements on the impl of `Codec` for `ProstCodec<TasksRequest, _>`

subscriber: record a histogram of wake-to-poll times

A way for someone to better understand how the runtime itself is behaving is to record a histogram of how long it takes a task to be polled after it has been woken. We could either record this information per task, or as a whole. I think as a whole is probably sufficient, and would mean less data needs to be sent per task (see #47 (comment)). But if there's a very helpful use case for knowing this data per task, we could do that.

Meta: Warnings

The statistics of tasks shown in the console are already an awesome tool to help understand what your tasks are doing. With the info shown, it's possible to diagnose when some tasks are exhibiting bad behavior. However, knowing what to look for is a skill in itself. Since we (the maintainers) already know some things to look for, we can codify them into warnings that the Console explicitly points out.

This proposes adding a "warnings" system that:

  • Builds a framework for defining warnings, in the spirit of the way rustc defines compiler lints.
  • When a warning is detected on a task, a symbol will appear in the new "warnings" column for the task's row.
  • When viewing the details of that task, a list of the warnings detected and words explaining them will be shown.
  • Perhaps an additional view of "only warnings" found by the Console.

(Should these be "lints" instead of "warnings", since they may have different "levels"? For instance, some might be serious no matter what, while others may just be hints.)

Here are some examples of warnings that we can show with a smaller amount of effort. It would also be wonderful to eventually include more complex ones, like deadlock detection, but that will require better instrumented resources (see #39) and more developer time.

Poll time over some maximum

If the time to poll a task is over some maximum (10us?), flag the task with a warning. The task is either hitting some blocking code (even if sporadically), or is otherwise doing too much per poll.

This maximum should probably be configurable as a CLI arg to the console.

(This is only reliable if the app is compiled in --release mode. It'd help if the subscriber could somehow detect if it's been compiled with optimizations...)

Lost Waker

If after a poll, the waker count is 0, and a wake isn't scheduled, then the task exhibits a "lost waker" problem.

It could be that the task is "part of" a bigger one, thanks to select or join, thus masking the issue, but it's still worth showing the warning label.

Wake-to-Poll time over some maximum

This might be useful to diagnose the system in general. For instance, assuming the "poll time maximum" warning isn't being triggered, it could show a system that has too many tasks and is starting to degrade in performance.

Task Panicked

A task panicking is always exceptional, and should show a warning.
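The simpler lints above could be codified as plain predicates over per-task stats. A minimal sketch, with illustrative stat fields and warning names rather than the subscriber's actual types:

```rust
use std::time::Duration;

/// Per-task stats of the kind the subscriber already tracks
/// (field names here are illustrative).
struct TaskStats {
    max_poll_time: Duration,
    waker_count: usize,
    woken: bool,
    completed: bool,
    panicked: bool,
}

#[derive(Debug, PartialEq)]
enum Warning {
    SlowPoll,
    LostWaker,
    Panicked,
}

/// Evaluate the lints described above against one task's stats.
fn check(stats: &TaskStats, max_poll: Duration) -> Vec<Warning> {
    let mut warnings = Vec::new();
    if stats.max_poll_time > max_poll {
        warnings.push(Warning::SlowPoll);
    }
    // "Lost waker": still running, no wakers left, no wake scheduled.
    if !stats.completed && stats.waker_count == 0 && !stats.woken {
        warnings.push(Warning::LostWaker);
    }
    if stats.panicked {
        warnings.push(Warning::Panicked);
    }
    warnings
}

fn main() {
    let stuck = TaskStats {
        max_poll_time: Duration::from_micros(3),
        waker_count: 0,
        woken: false,
        completed: false,
        panicked: false,
    };
    assert_eq!(check(&stuck, Duration::from_micros(10)), vec![Warning::LostWaker]);
}
```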

Potential deadlock when using `console_subscriber`

Platform

x86_64 GNU/Linux

Description

Spawning multiple instrumented futures along with the console_subscriber consistently results in the program hanging, with 0% CPU usage – looks like a deadlock. It could of course be that I'm doing something completely wrong!

We're spawning multiple tasks in the same location, with the following snippet:

#[track_caller]
fn spawn<W, G, F>(&mut self, g: G)
where
    W: Worker<Self>,
    G: FnOnce(oneshot::Receiver<()>) -> F,
    F: Future<Output = ()> + Send + 'static,
{
    let (tx, rx) = oneshot::channel();
    let future = g(rx);

    let caller = Location::caller();
    let span = tracing::info_span!(
        target: "tokio::task",
        "task",
        file = caller.file(),
        line = caller.line(),
    );

    let task = tokio::spawn(future.instrument(span));

    self.tasks
        .entry(TypeId::of::<W>())
        .or_default()
        .push((tx, Box::new(task)));
}

The following code is used to create the subscriber. If the TasksLayer is never constructed, and the layer never passed to the tracing_subscriber registry, the code runs as expected:

let (layer, server) = console_subscriber::TasksLayer::new();

let filter = tracing_subscriber::EnvFilter::from_default_env()
    .add_directive(tracing::Level::INFO.into())
    .add_directive("tokio=info".parse().unwrap());

tracing_subscriber::registry()
    .with(tracing_subscriber::fmt::layer())
    .with(filter)
    .with(layer)
    .init();

let serve = tokio::spawn(async move { server.serve().await });

// ----------------

let _ = tokio::try_join!(tokio::spawn(node.run()), serve);

[Question] How to attach custom info to task?

We have a wrapper for tokio::spawn, and most of our code uses this wrapper instead of vanilla tokio::spawn; this makes the console's spawn_location always the same.

Is there some way to attach extra info to the task's key-values? I could use record on the span, but it seems TasksLayer doesn't implement the record method.

Pretty task IDs

Currently, task IDs can be big:

We could use an AtomicUsize to generate incrementing task IDs that start off shorter.
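A sketch of that approach, handing out small display IDs in first-seen order (shown single-threaded for simplicity; the real counter would be atomic, and `PrettyIds` is an illustrative name):

```rust
use std::collections::HashMap;

/// Map big, sparse span IDs to small sequential display IDs.
#[derive(Default)]
struct PrettyIds {
    next: u64,
    ids: HashMap<u64, u64>,
}

impl PrettyIds {
    fn get(&mut self, span_id: u64) -> u64 {
        let next = &mut self.next;
        // First sighting of a span ID takes the next small number;
        // later lookups return the same one.
        *self.ids.entry(span_id).or_insert_with(|| {
            *next += 1;
            *next
        })
    }
}

fn main() {
    let mut pretty = PrettyIds::default();
    assert_eq!(pretty.get(281_474_976_710_657), 1);
    assert_eq!(pretty.get(281_474_976_710_699), 2);
    assert_eq!(pretty.get(281_474_976_710_657), 1); // stable on repeat lookups
}
```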

Meta: Record, replay, rewind

The current console allows a nice live view of the application that it is monitoring. But sometimes things happen quickly, and a user may not have been able to react to inspect the data they wanted. We want to be able to go back in time, and have the Console show those events again, even being able to pause.

We've written a proper design document exploring this feature set here: https://hackmd.io/6xPTpWK5RGO_eGygI9QEPA?view

Actions based on the design document:

  • #85
  • #84
  • #96
  • Interactive rewind (step back)
  • Watch a value for a change or to equal an expression

console: sort task list

right now, in the top view, it's not possible to sort the tasks by anything --- they are stored in a hashmap, and rendered in iteration order.

we should probably find a nice way to make them sortable, and (eventually) let the user control which key they're sorted by, like in real top(1). it's useful to store the tasks in a hashmap for applying updates, since we can index a particular task and update its stats, or quickly remove them. ideally, we should find a way to sort them without having to copy from the hashmap into a vec every time we render the list of tasks.

one thing we could do is make the actual task data refcounted, and then construct btrees for every sortable parameter as different sorted views of the task list, so that every sort is ready to go when the user selects it...
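a sketch of that refcounted-plus-btree idea, with illustrative types rather than the console's actual ones:

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

/// Refcounted task data shared between the main lookup map and one
/// btree per sortable column, so re-sorting never copies the tasks.
struct Task {
    id: u64,
    total_polls: u64,
}

#[derive(Default)]
struct SortedView {
    // Keyed by (sort key, task id) so tasks with equal keys don't collide.
    by_polls: BTreeMap<(u64, u64), Rc<Task>>,
}

impl SortedView {
    fn insert(&mut self, task: Rc<Task>) {
        self.by_polls.insert((task.total_polls, task.id), task);
    }

    /// Task IDs in ascending poll-count order, ready whenever the user
    /// selects this sort key.
    fn ids_by_polls(&self) -> Vec<u64> {
        self.by_polls.values().map(|t| t.id).collect()
    }
}

fn main() {
    let mut view = SortedView::default();
    view.insert(Rc::new(Task { id: 1, total_polls: 500 }));
    view.insert(Rc::new(Task { id: 2, total_polls: 12 }));
    view.insert(Rc::new(Task { id: 3, total_polls: 500 }));
    assert_eq!(view.ids_by_polls(), vec![2, 1, 3]);
}
```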

console code assumes `sorted_tasks` is positive

I've been having problems connecting to my instrumented app, or getting the instrumentation working. That's not directly what this bug is about.

As a consequence of my problems, I'm hitting arithmetic overflow panics from the console, e.g.:

connection: http://127.0.0.1:6669/ (CONNECTED)
The application panicked (crashed).
Message:  attempt to subtract with overflow
Location: console/src/view/tasks.rs:264

I got the above pretty reliably by opening the console (via cargo run, a debug build, so arithmetic overflow is checked), it failing to connect, and I, while sitting there, hit the down arrow and then the up arrow. (Down down and up up also do the trick; I suspect any pair will suffice.)

I looked up the relevant code, and yep, I can believe that panics:

if i >= self.sorted_tasks.len() - 1 {

Overall I'm seeing evidence that the code thinks there should be some invariant connecting the indexes traversed by self.table_state.selected() to the length of sorted_tasks. But it also seems like when there are no tasks, that invariant does not hold.
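A minimal sketch of a fix, replacing the underflowing subtraction with `saturating_sub` plus an explicit empty-list check (illustrative, not the actual view/tasks.rs code):

```rust
/// Move the selection down one row, wrapping at the end of the list.
/// `len.saturating_sub(1)` avoids the debug-mode underflow panic that
/// `len() - 1` hits when the task list is empty.
fn next_selection(current: usize, len: usize) -> usize {
    if len == 0 {
        return 0; // nothing to select
    }
    if current >= len.saturating_sub(1) {
        0 // wrap around to the top
    } else {
        current + 1
    }
}

fn main() {
    assert_eq!(next_selection(0, 0), 0); // empty list: no panic
    assert_eq!(next_selection(2, 3), 0); // wrap at the end
    assert_eq!(next_selection(0, 3), 1);
}
```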

console: handle disconnection more gracefully

right now, if the console is connected to a server, and the server disconnects abruptly, we immediately crash with a big error message:

:; cargo run
   Compiling console-subscriber v0.1.0 (/home/eliza/Code/consolation/console-subscriber)
    Finished dev [unoptimized + debuginfo] target(s) in 0.69s
     Running `target/debug/console`
using default address (http://127.0.0.1:6669)
Error:
   0: status: Unknown, message: "h2 protocol error: broken pipe", details: [], metadata: MetadataMap { headers: {} }

Location:
   console/src/main.rs:52

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 5 frames hidden ⋮
   6: console::main::{{closure}}::h0f02eae7993e30df
      at /home/eliza/Code/consolation/console/src/main.rs:52
   7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h89be4d09d64d3ea1
      at /home/eliza/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:80
   8: tokio::park::thread::CachedParkThread::block_on::{{closure}}::h8f240001a5901c1b
      at /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/park/thread.rs:263
   9: tokio::coop::with_budget::{{closure}}::h3c95765523460448
      at /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/coop.rs:106
  10: std::thread::local::LocalKey<T>::try_with::hab4856318f2607b4
      at /home/eliza/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:272
  11: std::thread::local::LocalKey<T>::with::h3d4d9732c38fcac6
      at /home/eliza/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:248
  12: tokio::coop::with_budget::h54da439ec06757dd
      at /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/coop.rs:99
  13: tokio::coop::budget::h9ae2f64aafef4b99
      at /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/coop.rs:76
  14: tokio::park::thread::CachedParkThread::block_on::h045a83290eb58c9b
      at /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/park/thread.rs:263
  15: tokio::runtime::enter::Enter::block_on::hdddd13bfd62fbbcb
      at /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/runtime/enter.rs:151
  16: tokio::runtime::thread_pool::ThreadPool::block_on::h7920c3eacc3339a7
      at /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/runtime/thread_pool/mod.rs:71
  17: tokio::runtime::Runtime::block_on::h678a88f6df65c487
      at /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/runtime/mod.rs:452
  18: console::main::hdbffd1488c60b5c6
      at /home/eliza/Code/consolation/console/src/main.rs:17
  19: core::ops::function::FnOnce::call_once::hf4e9e56d9a430675
      at /home/eliza/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227
                                ⋮ 11 frames hidden ⋮

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
Run with RUST_BACKTRACE=full to include source snippets.

(this is after i ctrl-c'd the example app).

We should probably handle connection errors a little more gracefully --- if the user is debugging something when the app crashes or there's a network issue, it's almost certainly better to continue showing the last state and add a message saying "hey, the connection to the remote ended abruptly, here's why".

Eventually, it would also be good to try to reconnect with a backoff period, so that we can recover from transient network issues.

license

Is there an intended license for this repo? I wasn't able to find one listed, but wanted to check before looking further into it.

Great readme and asciinema; already looks like an exciting project!

Name or classify tasks spawned by the subscriber runtime

The subscriber starts its own runtime and server, and those tasks are instrumented and shown in the Console too. That is actually useful in some situations. But it also can be distracting. It would help if we could name or classify any tasks in that runtime to help a user viewing the Console, so they understand "oh, those aren't my tasks".

wish: more distinguished units on numbers

It's great that we're using units on our values so that we don't have to fill up too many columns when people will only want 2 or 3 sig figs of info per value.

But: I worry that it's too easy to overlook the difference when you see "21ms 34μs" in a column, or even "21s 3ms".

What are good ways to mitigate this? Main ideas I can think of:

  1. When sorting by column, add blank lines between rows when the unit changes. (The main drawbacks here: it wastes space on aforementioned blank lines. Also, it only works for sorting by the column where confusion occurs, which means people can still overlook it when it happens in an unsorted column.)
  2. Use colors to distinguish the units. I.e. color s with red, ms with yellow, μs with green.

Build fails on Mac M1

   Compiling h2 v0.3.3
error: failed to run custom build command for `console-api v0.1.0 (/Users/joelr/Work/Rust/console/console-api)`

Caused by:
  process didn't exit successfully: `/Users/joelr/Work/Rust/console/target/debug/build/console-api-509b6171e0106590/build-script-build` (exit status: 1)
  --- stderr
  Error: Custom { kind: Other, error: "failed to invoke protoc (hint: https://docs.rs/prost-build/#sourcing-protoc): Bad CPU type in executable (os error 86)" }
warning: build failed, waiting for other jobs to finish...
error: build failed

UI design proposal

Hi everyone,

I've got an initial redesign prototype here: https://github.com/pcwalton/turbowish-mocks

Screen Shot 2021-05-12 at 3 27 45 PM

The layout is done using the stretch implementation of flexbox, because I needed more power than the tui interface to Cassowary provided. Emphasis has been placed on ease of use and discoverability, with "obvious" keyboard focus and mouse support intended. The design is responsive, in that it scales down fine to narrow 80-column terminals. The custom widgets are designed in a reusable way—when it's ready, I want to just be able to copy and paste this code in instead of rewriting it.

My next goals are:

  • Iterate on feedback!

  • Update to match the current tokio-console/turbowish functionality.

  • Have a fallback to standard Unicode with common characters when the appropriate icon fonts (nerd fonts/powerline fonts) aren't installed. I think it should look fine without them: for the most part, some of the icons will just disappear and/or become text.

  • Get this integrated with the app, once folks are satisfied with it.

Feedback is very welcome! This is early and I'm extremely open to suggestions and comments.

subscriber: capture parent contexts

when a new task is spawned, the TasksLayer in console-subscriber should capture the parent span context in which the task was spawned. we'll need to send the parent spans' data to the client as well, if we haven't already.

subscriber/tasks proto: capture spans *inside* spawned tasks

Right now, the TasksLayer in console-subscriber only captures the span attached to every spawned task by Tokio's own tracing integration. These spans are useful, but they lack the user-generated data provided by the user's own spans. If the user instruments the future passed to tokio::spawn, for example, we should be able to recognize that their span is also part of the task, and record its data as well.

The primary motivation for implementing a feature like this is that most of the mocks and designs that people have come up with so far show user-provided names and fields for tasks in the console. However, we can't capture that if we are only looking at the per-task spans generated by the runtime (e.g. tokio's task span).

Need some way to enable "tokio=trace" filter for the console Layer only

First off...this is super cool, thanks for creating it!

The requirement to enable trace-level events for tokio to use this console means that other layers also receive those events, which makes a fmt Layer pretty unusable without out-of-band filtering.

Per tokio-rs/tracing#302, it looks like per-layer filtering does not yet exist. Before I'd discovered that I'd tried combining separate EnvFilters with a fmt Layer and a console Layer using Layer::and_then, but this clearly doesn't work as expected.

Maybe there's another workaround I haven't thought of?

subscriber: record and send a histogram of Task poll times

We currently send the total elapsed time a task spends inside its poll function, but it'd be helpful to include more statistics. This would let us see average poll time, making it easier to identify tasks that are "blocking" too long in their poll functions, as well as tasks whose polls only sometimes take too long. We could eventually use this data in the console to show a warning for tasks with poll times that are too high.

Using a histogram to record this seems like a good balance, since a simple average can lose the spikiness of some polls. I imagine we can use something like hdrhistogram.
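As a std-only sketch of the idea (in practice we'd likely use the hdrhistogram crate; the power-of-two bucket scheme here is purely illustrative):

```rust
use std::time::Duration;

// Toy poll-time histogram: bucket i counts polls in [2^i, 2^(i+1)) µs.
// A stand-in for hdrhistogram, just to show how spikes survive.
struct PollHistogram {
    buckets: [u64; 32],
}

impl PollHistogram {
    fn new() -> Self {
        Self { buckets: [0; 32] }
    }

    fn record(&mut self, poll_time: Duration) {
        let micros = poll_time.as_micros().max(1) as u64;
        let idx = (63 - micros.leading_zeros()) as usize; // floor(log2)
        self.buckets[idx.min(31)] += 1;
    }

    /// Upper bound (in µs) of the bucket containing the p-th percentile.
    fn percentile(&self, p: f64) -> u64 {
        let total: u64 = self.buckets.iter().sum();
        let target = (total as f64 * p).ceil() as u64;
        let mut seen = 0u64;
        for (i, count) in self.buckets.iter().enumerate() {
            seen += count;
            if seen >= target {
                return 1 << (i + 1);
            }
        }
        0
    }
}

fn main() {
    let mut h = PollHistogram::new();
    for _ in 0..99 {
        h.record(Duration::from_micros(10)); // fast polls
    }
    h.record(Duration::from_millis(50)); // one slow outlier
    // the outlier shows up in the tail, where an average would hide it
    println!("p50 ≤ {}µs, p99.9 ≤ {}µs", h.percentile(0.5), h.percentile(0.999));
}
```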

console freezes in `CONNECTING` state when there are too many tasks

the console freezes in the CONNECTING state when connected to this application:

use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    console_subscriber::init();

    let futures = (0..10/* A smaller number like 5 was Ok */)
        .map(|_| {
            tokio::spawn(async {
                loop {
                    tokio::time::sleep(Duration::from_millis(
                        10, /* A larger number like 1000 was Ok */
                    ))
                    .await;
                }
            })
        })
        .collect::<Vec<_>>();

    for f in futures {
        f.await?;
    }

    Ok(())
}

image

As noted in the comments above, it's Ok if the number of spawned futures is small or the sleep duration is large.

subscriber: emit data about Waker events

Next step after #37.

We want to detect a few things with regards to the Wakers related to a Task:

  • Time from wake to poll
  • If poll returns Pending but the Waker wasn't cloned.
    • While not foolproof, this could be a good-enough heuristic to detect a "forgot to wake".
  • If wake was called from a different thread, or same thread.
  • If a wake is called from within poll
    • While not wrong, could be a sign to help someone notice a task that "busy loops".

Blocking task considered idle

The following program will spawn a task that is blocking the thread, preventing shutdown of the runtime. However, if you connect a console to the program, then the task is listed as idle.

use std::time::Duration;

#[tokio::main]
async fn main() {
    use tracing_subscriber::{prelude::*, EnvFilter};
    let (layer, server) = console_subscriber::TasksLayer::new();
    let filter = EnvFilter::from_default_env().add_directive("tokio=trace".parse().unwrap());

    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(filter)
        .with(layer)
        .init();

    std::thread::spawn(move || {
        // Use extra runtime for tokio-console server to avoid it being killed
        // when main returns.
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        rt.block_on(server.serve()).unwrap();
    });

    tokio::spawn(async move {
        for _ in 0..100 {
            println!("hello");
            std::thread::sleep(Duration::from_millis(500));
        }
    });

    tokio::time::sleep(Duration::from_secs(1)).await;
    println!("exit from main");
}

console: filter tasks list

It would be nice to be able to filter which tasks are displayed in the tasks list.

Some ways we might want to filter the tasks list:

  • By name (with a regex?)
  • By target (with a regex?)
  • By fields:
    • by the presence of a field with a given name
    • by the value of a field with a given name (regex?)

Since we would want to allow filtering on different columns in the task list, we would need some UI for selecting which column is being filtered on, as well as for writing the actual filter. Here's a simple proposal:

  • When the user presses a particular key, a text entry box is opened to type in a filter pattern
  • Using the arrow keys in the filter entry mode selects which column is filtered on, rather than which column sorts the task list
  • pressing some other key (enter?) adds the filter, and returns the user to the mode where the arrow keys select the column to sort by

Questions:

  • Do we want to support multiple filters? We probably do, but that complicates the UI
    • Opening the filter entry box again would then create a new filter, rather than editing the current filter
    • There would need to be some way to edit an existing filter as well
    • Perhaps we should add a UI box that lists all the current filters?
    • Some kind of command for clearing all filters
  • How to combine patterns in filters? e.g. users might want to have OR filters or AND filters...

Most of these questions are things we can punt on for an MVP implementation, especially if we only allow a single filter at a time. But, we'll want to figure them out eventually.
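To make the data model concrete, here's a std-only sketch of applying a single filter to the task list (hypothetical types; substring matching stands in for the regex support discussed above):

```rust
// Hypothetical task row mirroring the columns we'd filter on.
struct TaskRow {
    name: String,
    target: String,
    fields: Vec<(String, String)>,
}

// One filter per column, per the list above. A real implementation
// would likely use the regex crate instead of `contains`.
enum Filter {
    Name(String),
    Target(String),
    FieldPresent(String),
    FieldValue(String, String),
}

impl Filter {
    fn matches(&self, task: &TaskRow) -> bool {
        match self {
            Filter::Name(pat) => task.name.contains(pat),
            Filter::Target(pat) => task.target.contains(pat),
            Filter::FieldPresent(key) => task.fields.iter().any(|(k, _)| k == key),
            Filter::FieldValue(key, pat) => {
                task.fields.iter().any(|(k, v)| k == key && v.contains(pat))
            }
        }
    }
}

fn main() {
    let tasks = vec![
        TaskRow { name: "accept_loop".into(), target: "myapp::server".into(), fields: vec![] },
        TaskRow {
            name: "worker".into(),
            target: "myapp::pool".into(),
            fields: vec![("id".into(), "3".into())],
        },
    ];
    // show only tasks that have an "id" field
    let filter = Filter::FieldPresent("id".into());
    let visible: Vec<&TaskRow> = tasks.iter().filter(|t| filter.matches(t)).collect();
    println!("{} task(s) match", visible.len()); // 1 task(s) match
}
```

Multiple filters would then just be a `Vec<Filter>` combined with `all` (AND) or `any` (OR), which is where the combination question above comes in.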

UI inspiration:

  • htop (which only allows filtering on one column, the process's name):
    image
  • atuin history search:
    image

Work out-of-the-box

Hey,

This looks absolutely amazing and I definitely want to use this on all my apps. However, I feel like this should be a tool that works out of the box, i.e., any asynchronous code (tokio::spawn, etc.) should be displayed without having to create tasks.

This is my personal opinion, but I'm open to suggestions, and discussions!

subscriber: provide a simpler 'init' function

The current way to set up the subscriber in an app is flexible, but complex. It'd be useful to have a simple console_subscriber::init() function that does the most common thing for you. Notably:

  • Builds the TasksLayer with defaults.
  • Inits a tracing registry and adds the layer.
  • Starts a new thread to run a single-threaded Tokio runtime.
  • Spawns the server portion of the task layer into that runtime.

Truncate certain fields in the UI.

Certain fields that are displayed by the UI -- currently/specifically spawn.location -- can be exceedingly long to the point where most of the value is effectively noise, or even worse, it pushes other fields out of the viewport entirely.

There should be a way to configure some sort of transformation for field data such that we can describe exactly how we want that field to look in the main UI view, vs the raw value itself.

As an example, right now spawn.location shows the file/line combination for where a task was spawned from. When the spawn location is in a file relative to the binary being instrumented, you typically end up with a relatively short string, such as src/foo/bar/mod.rs:32. When the spawn location is in another crate, however, such as a dependency pulled in by Cargo, you end up with the absolute path, such as /Users/tobz/.cargo/github.com-d121af1a167ef/hyper-0.14.11/src/common/exec.rs:12. Half of that path is effectively useless: all we really need in order to know it came from Hyper is to see hyper-0.14.11/src/common/exec.rs:12.

As an example of how we might deal with this field specifically, the proposed transformation code may split the path such that it looks for common ancestors -- such as .cargo -- and removes everything before the crate name. It could, alternatively or in addition, also collapse the string in a manner similar to Java's compressed classpaths in stack traces: changing org.foo.bar.something.MyClass to o.f.b.something.MyClass, where we might collapse the aforementioned absolute path to /U/t/./g/h/src/common/exec.rs.
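As a sketch of the first transformation (hypothetical helper; this assumes the layout from the example above, where the crate directory is the second path component after `.cargo`):

```rust
// Hypothetical helper: strip everything up to and including the Cargo
// registry hash directory, keeping "<crate>-<version>/..." onward.
// Paths without a ".cargo" ancestor are returned unchanged.
fn shorten_location(loc: &str) -> String {
    let parts: Vec<&str> = loc.split('/').collect();
    match parts.iter().position(|p| *p == ".cargo") {
        // skip ".cargo" itself and the registry hash directory after it
        Some(i) if parts.len() > i + 2 => parts[i + 2..].join("/"),
        _ => loc.to_string(),
    }
}

fn main() {
    let long = "/Users/tobz/.cargo/github.com-d121af1a167ef/hyper-0.14.11/src/common/exec.rs:12";
    println!("{}", shorten_location(long)); // hyper-0.14.11/src/common/exec.rs:12
}
```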

Instrument block_on

The task used for block_on, both when using #[tokio::main] and when using Runtime::block_on or Handle::block_on, isn't instrumented and thus isn't shown in the console. It'd be good to do so.

We could either consider it a form of spawn, reusing the runtime::spawn span and setting kind=block_on or kind=main; or we could call it runtime::block_on, but then we'd need to add support to the console to look for that new span.

Meta: Resources

This is more of a meta issue about the concept of Resources. The console will likely have two main concepts, Tasks and Resources (there could probably be more, smaller ones). Besides knowing about all the tasks that have been spawned in the runtime, it's also helpful to understand what resources each task is interacting with, waiting on, moving around, and dropping. Resources could be sockets, files, timers, channels, mutexes, semaphores, and the like.

When inspecting a task in more detail, we want to be able to view what resources the task has been touching, which it is waiting on, and, if possible, its relationship with another task that could be unblocked by this one acting on the resource (sending on a channel, unlocking a mutex, etc).

We also would like a resources view, similar to the tasks view, which can show us a list of the resources that currently are "alive", what their state is, and what tasks are waiting on them. And then just like tasks, a way to inspect an individual resource for more detail.

This will require a few steps (hence why this is a meta issue):

  • Defining the names of the spans and/or structured fields to identify resources. (tokio-rs/tokio#3954)
  • Instrumenting Tokio's "resource" types to create a span when constructed, and to enter the span in their respective poll_* methods. (#40)
  • Collecting this data in the Subscriber.
  • #63

console: aggregate metrics from the datastream

it would be cool if the console could also aggregate metrics from the incoming data and compute percentiles of stats like scheduler latency, active tasks, and poll/idle durations over a rolling time window. we could display histograms or sparklines in the UI, as well as just displaying numeric values; tui has sparklines, bar charts and line charts that might be useful...
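A std-only sketch of a rolling window over one such metric (hypothetical; a real implementation would likely key the window by time rather than by sample count, and reuse the histogram work discussed elsewhere):

```rust
use std::collections::VecDeque;

// Toy rolling window of the last N samples of some metric (e.g. poll
// duration in µs), with naive percentile computation by sorting.
struct RollingWindow {
    samples: VecDeque<u64>,
    capacity: usize,
}

impl RollingWindow {
    fn new(capacity: usize) -> Self {
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    fn push(&mut self, v: u64) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front(); // drop the oldest sample
        }
        self.samples.push_back(v);
    }

    fn percentile(&self, p: f64) -> Option<u64> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted: Vec<u64> = self.samples.iter().copied().collect();
        sorted.sort_unstable();
        let rank = ((sorted.len() as f64 - 1.0) * p).round() as usize;
        Some(sorted[rank])
    }
}

fn main() {
    let mut w = RollingWindow::new(100);
    for v in [5, 7, 9, 1000] {
        w.push(v);
    }
    println!("p50 = {:?}", w.percentile(0.5)); // p50 = Some(9)
}
```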

Consider option to disable expiring of stopped tasks

In applications with short-lived tasks, it may be desirable to always see all finished tasks, or at least to be able to request this when running the console app. (Does this seem like a feature anyone else would want?)

To do this, we would add another command line option to the console in console/src/config.rs, set the value in console/src/tasks.rs, and check it before calling retain.
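The check could look something like this (hypothetical types loosely mirroring the retain call in console/src/tasks.rs; `None` models the new "never expire" option):

```rust
use std::time::{Duration, Instant};

// Hypothetical task row: `completed_at` is set once the task finishes.
struct TaskRow {
    completed_at: Option<Instant>,
}

// Drop completed tasks once they've been finished longer than
// `retain_for`; `None` means the new option is set and nothing expires.
fn prune(tasks: &mut Vec<TaskRow>, retain_for: Option<Duration>, now: Instant) {
    if let Some(retain_for) = retain_for {
        tasks.retain(|t| match t.completed_at {
            Some(done) => now.duration_since(done) < retain_for,
            None => true, // still running
        });
    }
}

fn main() {
    let now = Instant::now();
    let mut tasks = vec![
        TaskRow { completed_at: Some(now) }, // finished task
        TaskRow { completed_at: None },      // running task
    ];
    prune(&mut tasks, Some(Duration::ZERO), now);
    println!("{} task(s) left", tasks.len()); // 1 task(s) left
}
```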

suggestion: timeline views

A helpful capability in diagnostic tools like this is to be able to switch, while exploring the data, between an aggregate view (top, histogram, etc.) and a sequential/timeline view that shows, for example, the scheduling events of {all,selected} {tasks,resources}, possibly with the option to interleave trace messages or similar user event annotations.

Often there's a bit of back and forth between these, like: "find the problem task in the aggregate view, switch to a timeline of that task's events, find the problem resource, switch to a timeline of that resource, switch to a timeline of all tasks contending on that resource".

Instrument Tokio Wakers

We want to detect a few things with regards to the Wakers related to a Task:

  • Time from wake to poll
  • If poll returns Pending but the Waker wasn't cloned.
    • While not foolproof, this could be a good-enough heuristic to detect a "forgot to wake".
  • If wake was called from a different thread, or same thread.
  • If a wake is called from within poll
    • While not wrong, could be a sign to help someone notice a task that "busy loops".

So, we'll need to instrument Tokio's wakers: emitting events for wake, wake_by_ref, clone, and drop. A couple notes about implementing:

  • These are tracing events, not spans.
  • We should pick some unique names, so that the subscriber can identify them.
  • We need a way to map a waker to the task ID.
    • The current "task ID" used by the subscriber is the span ID of the instrumented spawn.
    • We could possibly have Tokio's runtime record an actual task ID once the task is stored in the scheduler, something that the waker can access too.
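To illustrate just the event side, here's a std-only toy waker whose vtable counts clone and wake calls (global counters stand in for tracing events here; a real implementation would emit events carrying the task ID instead):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{RawWaker, RawWakerVTable, Waker};

// Stand-ins for the clone/wake tracing events described above.
static CLONES: AtomicUsize = AtomicUsize::new(0);
static WAKES: AtomicUsize = AtomicUsize::new(0);

fn raw() -> RawWaker {
    RawWaker::new(std::ptr::null(), &VTABLE)
}

unsafe fn vt_clone(_: *const ()) -> RawWaker {
    CLONES.fetch_add(1, Ordering::Relaxed);
    raw()
}

unsafe fn vt_wake(_: *const ()) {
    WAKES.fetch_add(1, Ordering::Relaxed);
}

unsafe fn vt_drop(_: *const ()) {}

// clone / wake / wake_by_ref / drop, in vtable slot order.
static VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_wake, vt_wake, vt_drop);

fn counted_waker() -> Waker {
    // Sound here: the data pointer is unused and drop is a no-op.
    unsafe { Waker::from_raw(raw()) }
}

fn main() {
    let waker = counted_waker();
    let _cloned = waker.clone();
    waker.wake_by_ref();
    // "poll returned Pending but clones == 0" would be the
    // forgot-to-wake heuristic from the list above.
    println!(
        "clones = {}, wakes = {}",
        CLONES.load(Ordering::Relaxed),
        WAKES.load(Ordering::Relaxed)
    );
}
```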

Sortable/prioritizable task fields in UI.

Currently, task fields are implicitly sorted in ascending alphabetical order. This means that, for example, you might see these fields from left to right: kind, spawn.location, and task.name. When these fields are relatively short and concise, they are all visible to the user and things look relatively nice. However, when some of these fields are longer -- right now, specifically, spawn.location -- they can push other fields far off to the side, and a more meaningful field might be pushed off the viewport entirely.

Empirically, this can be seen in pnkfelix's demo of the console: https://www.youtube.com/watch?v=JGCewPUvF70. The task.name field is often pushed entirely out of the viewport, or truncated, due to the length of spawn.location.

It would be useful to be able to either provide some sort of extra sorting or prioritization capabilities such that more "useful" fields, like an assigned task name, could come before spawn.location, or so a less important field could come later, and so on.
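One simple shape for this (hypothetical priority table; the field names are taken from the examples above):

```rust
// Hypothetical priority table: lower rank = shown further left.
// Unknown fields sort after the known ones, alphabetically.
fn field_priority(name: &str) -> usize {
    match name {
        "task.name" => 0,
        "kind" => 1,
        "spawn.location" => 2,
        _ => 3,
    }
}

fn main() {
    let mut fields = vec!["spawn.location", "task.name", "zzz.custom", "kind"];
    // sort by (priority, name) so ties fall back to alphabetical order
    fields.sort_by_key(|f| (field_priority(f), *f));
    println!("{:?}", fields); // ["task.name", "kind", "spawn.location", "zzz.custom"]
}
```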
