Some suggestions about xunit-performance (closed)

kouvel commented on June 2, 2024
Some suggestions

Comments (6)

kouvel commented on June 2, 2024

CC @adamsitnik

adamsitnik commented on June 2, 2024

@kouvel here is how to get this with BenchmarkDotNet:

In BDN we have a concept called Jobs, which you use to configure benchmarks. You can define a Job with attributes, with our fluent API, or with our object-initialization API.

1. Provide a way for a test to run for a fixed amount of time rather than for a fixed number of iterations

You can specify the number of iterations and the number of invocations per iteration, or use our heuristics to get stable results without providing any values (the default mode). You can also specify the minimum iteration time, which the heuristics will respect.

DefaultConfig.Instance
    .With(Job.Default
        .WithMinIterationTime(TimeInterval.FromSeconds(1))); // e.g. at least 1 second per iteration (the original snippet omitted the value)
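
The same kind of Job configuration can also be written with attributes or with the object-initialization style mentioned above. A minimal sketch for completeness; the SimpleJob parameter names and the Run/Accuracy characteristic properties shown here may differ slightly between BenchmarkDotNet versions:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Horology;
using BenchmarkDotNet.Jobs;

// Attribute style: pin the run counts directly on the benchmark class.
[SimpleJob(launchCount: 1, warmupCount: 5, targetCount: 20)]
public class AttributeStyleBenchmarks
{
    [Benchmark]
    public int Sum()
    {
        int sum = 0;
        for (int i = 0; i < 1000; i++) sum += i;
        return sum;
    }
}

// Object-initialization style: build a Job by setting its characteristics.
public static class JobFactory
{
    public static Job Create() => new Job
    {
        Run = { LaunchCount = 1, WarmupCount = 5, TargetCount = 20 },
        Accuracy = { MinIterationTime = TimeInterval.FromMilliseconds(250) }
    };
}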

2. Provide another option to just have the test do the measurement and report a result as a time or score,

What do you mean by score? If you mark one of the Benchmarks/Jobs as the baseline, then all the other benchmarks are scaled against it:

public class BaselineExample
{
    // BaseTime is not defined in the original snippet; a 100 ms sleep is assumed here.
    private const int BaseTime = 100;

    [Benchmark(Baseline = true)]
    public void Baseline() => Thread.Sleep(BaseTime);

    [Benchmark]
    public void Slow() => Thread.Sleep(BaseTime * 2);

    [Benchmark]
    public void Fast() => Thread.Sleep(BaseTime / 2);
}

You can set the time units in the following way:

var config = ManualConfig.Create(config: DefaultConfig.Instance);
config.Set(new Reports.SummaryStyle
{
	PrintUnitsInHeader = true,
	PrintUnitsInContent = false,
	TimeUnit = TimeUnit.Microsecond,
	SizeUnit = Columns.SizeUnit.B
});

2. …for some extra flexibility including nontrivial setup/teardown and custom warmup/test. Utilities for taking measurements and doing calculations may be provided in libraries.

We support GlobalSetup/GlobalCleanup (executed once) and IterationSetup/IterationCleanup (executed for every iteration); see the docs:

using System;
using BenchmarkDotNet.Attributes;

public class SetupAndCleanupExample
{
  private int setupCounter;
  private int cleanupCounter;

  [IterationSetup]
  public void IterationSetup() => Console.WriteLine("// " + "IterationSetup" + " (" + ++setupCounter + ")");

  [IterationCleanup]
  public void IterationCleanup() => Console.WriteLine("// " + "IterationCleanup" + " (" + ++cleanupCounter + ")");

  [GlobalSetup]
  public void GlobalSetup() => Console.WriteLine("// " + "GlobalSetup");

  [GlobalCleanup]
  public void GlobalCleanup() => Console.WriteLine("// " + "GlobalCleanup");

  [Benchmark]
  public void Benchmark() => Console.WriteLine("// " + "Benchmark");
}

3. Not sure if there is a way to disable ETW event collection.

We have some ETW diagnosers: Inlining, HardwareCounters, and TailCall. They are not enabled by default. Moreover, diagnosers that introduce overhead run the benchmarks once more, gather their data, and exclude those runs from the timing results (they are skewed by the overhead).
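
Opting in is an explicit configuration step. A minimal sketch, assuming the InliningDiagnoser type from the BenchmarkDotNet.Diagnostics.Windows package, the older With(...) config extension, and a placeholder MyBenchmarks class:

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnostics.Windows;
using BenchmarkDotNet.Running;

public static class EtwOptInDemo
{
    public static void Main()
    {
        // Without the .With(...) call below, no ETW diagnoser runs and no ETW events are collected.
        var config = ManualConfig.Create(DefaultConfig.Instance)
            .With(new InliningDiagnoser());

        BenchmarkRunner.Run<MyBenchmarks>(config); // MyBenchmarks is a placeholder benchmark class
    }
}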

Btw, I am sure it's possible today with xunit-performance too.

4. Provide error in % (maybe standard error) so that it's easy to tell at a glance which tests were noisy

BenchmarkDotNet provides:

  • Min, Lower Fence, Q1, Median, Mean, Q3, Upper Fence, Max, Interquartile Range, Outliers
  • Standard Error, Variance, Standard Deviation
  • Skewness, Kurtosis
  • Confidence Interval (Mean, Error, Level, Margin, Lower, Upper)
  • Percentiles (P0, P25, P50, P67, P80, P85, P90, P95, P100)

It also removes outliers by default.

@AndreyAkinshin (the main author of BDN) has a PhD and a huge interest in statistics ;)

5. Provide the ability to run multiple iterations of the test. Even better would be a way to specify a minimum and maximum number of iterations over the whole test and a target standard error %, and have the harness run iterations until the error is below the target

That's our default mode (we run the benchmarks until our heuristic is happy with the results). You can configure the accuracy by using some of the Job extension methods; a sketch follows the list:

  • WithMaxRelativeError
  • WithMaxAbsoluteError
  • WithMinIterationTime
  • WithMinInvokeCount
  • WithEvaluateOverhead
  • WithRemoveOutliers
  • WithAnalyzeLaunchVariance
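
A minimal sketch of combining a few of these, assuming the older With(...) config extension, the BenchmarkDotNet.Horology time types, and a placeholder MyBenchmarks class; the 5% target error and 250 ms minimum iteration time are illustrative values, not defaults:

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Horology;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

public static class AccuracyDemo
{
    public static void Main()
    {
        var job = Job.Default
            .WithMaxRelativeError(0.05)                               // keep iterating until the relative error drops below 5%
            .WithMinIterationTime(TimeInterval.FromMilliseconds(250)) // each iteration should take at least 250 ms
            .WithRemoveOutliers(true);                                // drop statistical outliers before reporting

        BenchmarkRunner.Run<MyBenchmarks>(DefaultConfig.Instance.With(job));
    }
}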

6. I assume a GC.Collect is done between test invocations,

BenchmarkDotNet forces GC.Collect + GC.WaitForPendingFinalizers + GC.Collect for every iteration. This behavior can be disabled by calling Job.WithGcForce(false), as sketched below.
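
A minimal sketch of turning the forced collections off for a job (again assuming the older With(...) config extension and a placeholder MyBenchmarks class):

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

public static class GcForceDemo
{
    public static void Main()
    {
        // Skip the forced GC.Collect + GC.WaitForPendingFinalizers between iterations.
        var job = Job.Default.WithGcForce(false);

        BenchmarkRunner.Run<MyBenchmarks>(DefaultConfig.Instance.With(job));
    }
}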

Moreover, by default we run every benchmark in a new, dedicated process, so the self-tuning nature of the GC does not affect the final results, and the order in which benchmarks execute (or any other side effect) does not matter thanks to the process isolation.

@kouvel please let me know if you have some more questions!

kouvel commented on June 2, 2024

Sounds very interesting, thanks for the info!

kouvel commented on June 2, 2024

What do you mean by score?

I meant some opaque value where higher is better. It could just be iterations per unit of time, or sometimes a test may measure several things and produce one value by weighting those measurements. For instance, in a test there can be two types of operations happening at the same time (the intent is to test both together), but the perf of one may be more important than the other. When testing a reader-writer lock with many readers and few writers, we'd want to see that readers are getting most of the locks but that writers are also making progress.

If you mark one of the Benchmarks/Jobs as the baseline then all other benchmarks are going to be scaled

This could be useful, but I was hoping for some more flexibility than that example shows. Probably there's already a way, though.

adamsitnik commented on June 2, 2024

I meant some opaque value where higher is better. It could just be iterations per unit of time

We display Operations/s by default.
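
If the Op/s column is not part of the summary in a given setup, it can also be requested explicitly. A minimal sketch, assuming the StatisticColumn.OperationsPerSecond column, the older With(...) config extension, and a placeholder MyBenchmarks class:

using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;

public static class OpsColumnDemo
{
    public static void Main()
    {
        // Add an explicit "Op/s" column to the summary table.
        var config = ManualConfig.Create(DefaultConfig.Instance)
            .With(StatisticColumn.OperationsPerSecond);

        BenchmarkRunner.Run<MyBenchmarks>(config);
    }
}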

kouvel commented on June 2, 2024

I see. Another thought: could a test provide multiple results from one measurement sequence? In the RWLock test above, for example, it could produce readers/s and writers/s that could be tracked independently instead of being combined into a single score. Maybe they could appear as child tests of the parent.
