Comments (10)
So I've got this mostly done, but I've hit a rather fundamental issue with how nextest-runner captures output: it is split into two separate buffers for stdout and stderr. This makes it impossible to write the captured output to the JSON object the way libtest does.
Concretely, let's say we have this test:
#[test]
fn fails() {
    println!("this is stdout");
    eprintln!("this is stderr");
    panic!("fiddlesticks");
}
The libtest output with `cargo +nightly test --no-fail-fast -- -Z unstable-options --format json --report-time --ensure-time` for the test is
{ "type": "test", "name": "fails", "event": "failed", "exec_time": 0.009125365, "stdout": "this is stdout\nthis is stderr\nthread 'fails' panicked at src/lib.rs:34:5:\nfiddlesticks\nstack backtrace:\n 0: rust_begin_unwind\n at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/std/src/panicking.rs:597:5\n 1: core::panicking::panic_fmt\n at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/panicking.rs:72:14\n 2: myplayground::fails\n at ./src/lib.rs:34:5\n 3: myplayground::fails::{{closure}}\n at ./src/lib.rs:30:11\n 4: core::ops::function::FnOnce::call_once\n at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/ops/function.rs:250:5\n 5: core::ops::function::FnOnce::call_once\n at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
The `stdout` field is a misnomer: it is really all captured output, from both stdout and stderr, and crucially, it is the output as it would appear on a terminal, i.e. in the order it was written.
I think the proper way to fix this would be to have a capture structure in nextest-runner that keeps a single buffer containing the combination of both stdout and stderr, plus an array recording, for each output block, its byte range, the stream it came from, and a timestamp (or duration from the start of capture). That way the output can be captured faithfully for this case (and others?), with simple helpers to reconstruct the individual streams as they are currently used in nextest. This shouldn't be much of a problem, since the streams are currently just byte buffers, and the places that interpret them as UTF-8 strings already take the hit of (potentially) allocating by using the `_lossy` forms of byte buffer -> UTF-8 string conversion. Having the timing information for each individual block (usually but not always a single line) would also make it quite a bit easier to implement a future message format that includes such timing information as structured data.
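A minimal sketch of the capture structure described above (the names `CombinedCapture`, `push`, `combined`, and `only` are invented for illustration and are not nextest-runner's actual API):

```rust
use std::ops::Range;
use std::time::Duration;

/// Which stream a captured chunk came from.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Stream {
    Stdout,
    Stderr,
}

/// One combined buffer plus, for each output block, its byte range,
/// source stream, and offset from the start of capture.
struct CombinedCapture {
    buf: Vec<u8>,
    blocks: Vec<(Stream, Range<usize>, Duration)>,
}

impl CombinedCapture {
    fn new() -> Self {
        Self { buf: Vec::new(), blocks: Vec::new() }
    }

    /// Append a chunk read from one of the child's streams.
    fn push(&mut self, stream: Stream, data: &[u8], at: Duration) {
        let start = self.buf.len();
        self.buf.extend_from_slice(data);
        self.blocks.push((stream, start..self.buf.len(), at));
    }

    /// The faithful, terminal-order output (what libtest's `stdout` field carries).
    fn combined(&self) -> &[u8] {
        &self.buf
    }

    /// Reconstruct a single stream, as the existing two-buffer code expects.
    fn only(&self, which: Stream) -> Vec<u8> {
        self.blocks
            .iter()
            .filter(|(s, _, _)| *s == which)
            .flat_map(|(_, r, _)| self.buf[r.clone()].iter().copied())
            .collect()
    }
}
```

Reconstructing either stream is then a filter over the block index, so existing consumers of the separate buffers keep working.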
I can submit what I have now in a PR first, or I could do the described work in a separate PR first? Let me know how you want me to proceed.
from nextest.
I've opened #1088 which adds support for interleaved test output that can have each stream recovered separately to satisfy all of the current code.
This is something we'd still like to do; it's just work that needs to be done.
Help wanted: If you're interested in tackling this project, here's the general set of things we need to do:
- Define a format with data structures. The format should be carefully designed to allow future extensibility (e.g. enums should be open universes for forward compatibility).
- Add that format to nextest-metadata.
- Make nextest-runner produce output in that format.
- Carefully work through backwards compatibility concerns: if we need to make a change in the future, how would it be done?
- Add tests for the format: we can probably migrate the nextest integration tests to the new format.
- Add tests for forward compatibility as well -- need to make sure that future additions don't break the format.
- Write documentation for the format.
- Stretch goal: Make nextest-metadata able to execute cargo nextest run with that format.
I'd estimate this to be around 80 hours of work for a newcomer to the project -- it's a chunky, nontrivial project but totally doable for someone new and interested.
We're interested in doing this work so that we can feed structured test results to this. (Note that while it supports JUnit, which nextest can already produce, that option has limitations that make it less appealing.)
Before I start work on this, I wanted to run my idea for the initial implementation by you first, before I do work that then gets rejected.
Basically the idea would be to have `--message-format libtest-json --format-version <n>`, where n is a serially incrementing number (the same `--format-version <n>` option already exists for cargo metadata), that follows the current structured output of nightly, so that users can specify a particular revision of the format for stability even in the face of breaking changes. This would allow tools that understand the (current) nightly output to be integrated with nextest, similar to the rationale for supporting JUnit.
This adds a maintenance burden, as breaking changes to the nightly output would need to be reflected in a new format version in nextest, but I haven't actually looked at how often the format has changed historically to know whether this is a major concern. It is mitigated by the ability to pin to a specific version, so IMO it's not a huge deal, especially if the code is structured the right way, making it fairly easy for contributors who need the latest and greatest format output to add support for it without too much trouble.
This would also allow nextest to define its own format separately that would be unaffected by libtest decisions.
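The version-pinning idea could be sketched as a simple dispatch from the serial number to a known format revision (the type and function names here are invented for illustration, not an existing nextest API):

```rust
/// A known revision of the emulated libtest JSON output.
#[derive(Debug, PartialEq, Eq)]
enum LibtestJsonVersion {
    /// The nightly libtest format as it stands today.
    V1,
}

/// Map a `--format-version <n>` value to a revision, rejecting numbers
/// this build doesn't know about so consumers fail loudly instead of
/// silently getting a different format.
fn parse_format_version(n: u32) -> Result<LibtestJsonVersion, String> {
    match n {
        1 => Ok(LibtestJsonVersion::V1),
        other => Err(format!("unknown libtest-json format version: {other}")),
    }
}
```

Adding support for a future libtest change would then be a new enum variant and match arm, leaving older pinned versions untouched.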
We're interested in doing this work so that we can feed structured test results to this. (Note that while it supports JUnit, which nextest can already produce, that option has limitations that make it less appealing.)
Yay, super excited about this!
Could you document the limitations of JUnit? It wasn't entirely clear to me what they were from reading the documentation.
For reference, buildkite uses this test collector.
Basically the idea would be to have a --message-format libtest-json --format-version where n is a serial incrementing number (the same --format-version option already exists for cargo metadata).
The idea of supporting libtest JSON is definitely interesting. Is this an accurate read of the format?
- My immediate concern is that it doesn't look like there's any space for the name of the test binary (what gets reflected in the testsuite in JUnit), which I think is important. Looking at this definition, buildkite doesn't appear to be able to take in the name of the binary either.
- There doesn't appear to be support for retries, timeouts, slow tests or other similar events that are a part of nextest but not of cargo test.
So what this suggests is that there have to be changes made to the buildkite collector, and honestly to anything else that consumes libtest JSON events, to support nextest.
But at that point why don't we just define our own format?
- We can draw inspiration from the libtest format, but there are so many additional features that nextest supports that aren't representable in the libtest format.
- It also seems like writing a buildkite test collector isn't very difficult -- maybe there should just be a separate buildkite test collector for nextest.
Some other thoughts if we go down the libtest-json emulation path:
- Users may want to mix Cargo JSON output with nextest JSON output, so we should have a reason field with the value nextest-libtest-json attached to each message.
- We also need to be able to say "output messages to stdout while rendering human-readable messages to stderr". With cargo that's done with --message-format json-render-diagnostics, and we'll want to support something similar.

--format-version sounds good, though I'd prefer something like 0.1 to indicate that this is not yet a stable format.
I think before we decide on the right path here, I'd like to see a more fleshed out proposal with how to handle retries and timeouts. I'd like to understand the limits of what can be supported within the libtest format.
From here:
While most test frameworks have a built-in JUnit XML export feature, these JUnit reports do not provide detailed span information. Therefore, features in Test Analytics that depend on span information aren't available when using JUnit as a data source. If you need span information, consider using the JSON import API instead.
The idea of supporting libtest JSON is definitely interesting. Is rust-lang/rust#49359 (comment) an accurate read of the format?
Yes, this is the output from rustc 1.75.0-nightly (cd674d617 2023-10-24) for a portion of our tests which, at least according to the linked comment, is unchanged since 2019.
Running unittests src/lib.rs (target/debug/deps/ark_module_publish-3a31a00a8f43e692)
{ "type": "suite", "event": "started", "test_count": 0 }
{ "type": "suite", "event": "ok", "passed": 0, "failed": 0, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 0.000055475 }
Running tests/publish.rs (target/debug/deps/publish-f5e75470c143959e)
{ "type": "suite", "event": "started", "test_count": 1 }
{ "type": "test", "event": "started", "name": "test_publish" }
{ "type": "test", "name": "test_publish", "event": "ok" }
{ "type": "suite", "event": "ok", "passed": 1, "failed": 0, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 0.000870104 }
Running tests/transforms.rs (target/debug/deps/transforms-4566ecd1e222c87c)
{ "type": "suite", "event": "started", "test_count": 3 }
{ "type": "test", "event": "started", "name": "adds_sections" }
{ "type": "test", "event": "started", "name": "reads_rustc" }
{ "type": "test", "event": "started", "name": "strips" }
{ "type": "test", "name": "strips", "event": "failed", "stdout": "thread 'strips' panicked at components/module-publish/tests/transforms.rs:70:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n 0: rust_begin_unwind\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n 1: core::panicking::panic_fmt\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n 2: core::result::unwrap_failed\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n 3: core::result::Result<T,E>::expect\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n 4: transforms::strips\n at ./tests/transforms.rs:70:20\n 5: transforms::strips::{{closure}}\n at ./tests/transforms.rs:64:12\n 6: core::ops::function::FnOnce::call_once\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n 7: core::ops::function::FnOnce::call_once\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "test", "name": "adds_sections", "event": "failed", "stdout": "thread 'adds_sections' panicked at components/module-publish/tests/transforms.rs:23:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n 0: rust_begin_unwind\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n 1: core::panicking::panic_fmt\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n 2: core::result::unwrap_failed\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n 3: core::result::Result<T,E>::expect\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n 4: transforms::adds_sections\n at ./tests/transforms.rs:23:20\n 5: transforms::adds_sections::{{closure}}\n at ./tests/transforms.rs:22:19\n 6: core::ops::function::FnOnce::call_once\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n 7: core::ops::function::FnOnce::call_once\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "test", "name": "reads_rustc", "event": "failed", "stdout": "thread 'reads_rustc' panicked at components/module-publish/tests/transforms.rs:124:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n 0: rust_begin_unwind\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n 1: core::panicking::panic_fmt\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n 2: core::result::unwrap_failed\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n 3: core::result::Result<T,E>::expect\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n 4: transforms::reads_rustc\n at ./tests/transforms.rs:124:20\n 5: transforms::reads_rustc::{{closure}}\n at ./tests/transforms.rs:118:17\n 6: core::ops::function::FnOnce::call_once\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n 7: core::ops::function::FnOnce::call_once\n at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "suite", "event": "failed", "passed": 0, "failed": 3, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 2.413460761 }
My immediate concern is that it doesn't look like there's any space for the name of the test binary (what gets reflected in the testsuite in JUnit), which I think is important. Looking at this definition, buildkite doesn't appear to be able to take in the name of the binary either.
You are correct; my guess is the libtest output is more focused on the single-crate use case, but it obviously loses relevant information inside a large workspace with hundreds or thousands of tests spread across multiple crates with n test binaries. That being said, I think the libtest output can still be replicated: the name field of the event could include the crate + test binary name as a prefix, which output consumers can then parse out. But I definitely think this is one place where nextest could improve on libtest for its own output format (and maybe even inform libtest itself).
There doesn't appear to be support for retries, timeouts, slow tests or other similar events that are a part of nextest but not of cargo test.
Also correct. I think this could be fine by adding a "nextest" subobject to the event for this kind of additional information outside the scope of libtest, so that consumers can either ignore it if they don't understand it, or take advantage of it if they do. That being said, in the retries case it might need to be handled explicitly by consumers that have logic which fails if it encounters the same test multiple times (though I suppose if nextest already mutates the test name by adding the crate name/test binary as stated above, then adding something like #<n> as a suffix to ensure uniqueness would be "fine").
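As a purely illustrative sketch, not an existing format, an event with such a subobject might look like this (the nextest field names here are invented):

```json
{ "type": "test", "name": "transforms$strips", "event": "failed", "exec_time": 1.25, "stdout": "...", "nextest": { "binary": "transforms", "retry": 2, "slow": true } }
```

A consumer that doesn't know about the nextest key parses the rest of the event exactly as it would libtest's output.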
But at that point why don't we just define our own format?
- We can draw inspiration from the libtest format, but there are so many additional features that nextest supports that aren't representable in the libtest format.
- It also seems like writing a buildkite test collector isn't very difficult -- maybe there should just be a separate buildkite test collector for nextest.
My reason for suggesting libtest-compatible output was that projects using other tooling that used to (or still does) depend on that format, which was usable on stable Rust for years before 1.70.0 made it unstable, could (hopefully) drop in cargo-nextest seamlessly without too much disruption. But I don't know how many people would actually find that useful.
We will write our own buildkite test collector regardless; its support for the libtest output was just what made me think of other tooling in the wild that might get that seamless interop with cargo-nextest if it supported the same output.
I think before we decide on the right path here, I'd like to see a more fleshed out proposal with how to handle retries and timeouts. I'd like to understand the limits of what can be supported within the libtest format.
I think the easiest way for me to understand all of this would be to do a quick and dirty implementation (+ tests) in both nextest and a consumer, so there's a concrete thing to talk about, as I find that much easier. Is that OK?
From here:
Hmm, I might be misunderstanding but how would you get the span information out of a Rust test? It seems like to do that you'd have to have tests produce the span information in a way that nextest can understand. This is certainly possible to do in principle but the protocol for it doesn't exist yet.
That being said, I think the libtest output can still be replicated, just that the name field of the event could include the crate + test binary name as a prefix to the name
I think this is fine; we could use a separator like $ to split up the binary and test names.
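The separator scheme could look something like this (helper names invented for illustration; any escaping of $ inside test names is ignored in this sketch):

```rust
/// Encode the binary name into the libtest `name` field as
/// "<binary>$<test>", per the separator idea above.
fn encode_name(binary: &str, test: &str) -> String {
    format!("{binary}${test}")
}

/// Split a name back apart; names without a separator pass through
/// unchanged, so plain libtest output still parses.
fn decode_name(name: &str) -> (Option<&str>, &str) {
    match name.split_once('$') {
        Some((binary, test)) => (Some(binary), test),
        None => (None, name),
    }
}
```

Consumers that don't care about the binary can just use the whole name as an opaque identifier.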
Also correct, I think this could be fine by adding a "nextest" subobject in the event for this kind of additional information outside of the scope of libtest so that consumers can either ignore it if they don't understand it, or take advantage of it they do.
I like having the nextest subobject with more information attached to each object. Retries are a little tricky, but if we're doing libtest emulation, using #n is fine.
We will write our own buildkite test collector regardless, its support for the libtest output was just what made me think of other tooling in the wild that might be able to get that seamless interop with cargo-nextest if it supported the same output.
I'm curious how buildkite supports retries. They have a section on flaky tests but I didn't see anything in the JSON output which indicated support for that.
I think the easiest thing for me to understand all of this would be just to do a quick and dirty implementation (+ tests) in both nextest as well as a consumer to actually have a concrete thing to talk about, as I find that much easier, if that's ok?
Sure, that sounds good.
Perhaps you could use a single pipe and set both stdout and stderr to write into it when JSON output is required? That would be much simpler.
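A stand-alone sketch of the single-sink idea, using a shared file description in place of a pipe since std has no portable pipe constructor (nextest's actual capture works differently, and note this approach gives up the ability to recover the two streams separately):

```rust
use std::fs::File;
use std::io::Read;
use std::process::{Command, Stdio};

/// Point a child's stdout and stderr at the same underlying file
/// description, so its output lands in the order it was written.
fn run_combined(shell_cmd: &str, log_path: &str) -> std::io::Result<String> {
    let out = File::create(log_path)?;
    let err = out.try_clone()?; // duplicated handle shares one file offset
    Command::new("sh")
        .args(["-c", shell_cmd])
        .stdout(Stdio::from(out))
        .stderr(Stdio::from(err))
        .status()?;
    let mut combined = String::new();
    File::open(log_path)?.read_to_string(&mut combined)?;
    Ok(combined)
}
```

Because both handles share one file offset, interleaving matches write order, which is exactly what the libtest `stdout` field expects.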
I've opened #1086 as a draft, will create a helper for doing combined output in the same branch but can split it out into a separate PR if needed later.