hdrhistogram / hdrhistogram.net

The .NET port of HdrHistogram

License: Other

hdrhistogram.net's Introduction

HdrHistogram

A High Dynamic Range (HDR) Histogram

What is it

HdrHistogram.NET is the official port of the Java HdrHistogram library. All official implementations of HdrHistogram can be found at https://github.com/HdrHistogram

Why would I use it?

You would use it to efficiently capture a large number of response-time measurements.

Often when measuring response times, one could make the common mistake of reporting only the mean value or the 90th percentile. Gil Tene (the original author of the Java HdrHistogram) illustrates in numerous presentations (such as here and here) why this is a mistake. Instead, you want to collect all of the data and then be able to report across the full range of measurements.

How would I use it?

The library is available as a package from NuGet as HdrHistogram.

Generally you want to record the response time of a given function of your software at the finest accuracy available. To do this, your code might look something like this:

Declare the Histogram

// A Histogram covering the range from ~466 nanoseconds to 1 hour (3,600,000,000,000 ns) with a resolution of 3 significant figures:
var histogram = new LongHistogram(TimeStamp.Hours(1), 3);

Record your measurements

Next you would record your measurements. The System.Diagnostics.Stopwatch.GetTimestamp() method provides the most accurate way to record the elapsed time an action took to run. By measuring the difference of the timestamp values before and after the action to measure, we can get the most accurate recording of elapsed time available on the .NET platform.

long startTimestamp = Stopwatch.GetTimestamp();
//Execute some action to be measured
long elapsed = Stopwatch.GetTimestamp() - startTimestamp;
histogram.RecordValue(elapsed);

Output the results.

Once you have recorded all of your data, you are able to present that data based on a highly dynamic range of buckets. We are not interested in all the values, but just enough of the values to get a picture of our system's performance. To do this we want to generate a percentile distribution, with exponentially increasing fidelity.

Here we show an example of writing to the Console.

var writer = new StringWriter();
var scalingRatio = OutputScalingFactor.TimeStampToMicroseconds;
histogram.OutputPercentileDistribution(
  writer,
  outputValueUnitScalingRatio: scalingRatio);
Console.WriteLine(writer.ToString());
//Or simply write directly to the Console output stream
//histogram.OutputPercentileDistribution(
//  Console.Out,
//  outputValueUnitScalingRatio: scalingRatio);

Would produce output similar to:

       Value     Percentile TotalCount 1/(1-Percentile)

       0.285 0.000000000000          1           1.00
       0.448 0.100000000000       3535           1.11
       0.466 0.200000000000       7100           1.25
       0.497 0.300000000000      10504           1.43
       0.523 0.400000000000      14046           1.67
       0.535 0.500000000000      17644           2.00
       0.541 0.550000000000      19466           2.22
       0.547 0.600000000000      21134           2.50
       0.555 0.650000000000      22898           2.86
       0.567 0.700000000000      24513           3.33
       0.594 0.750000000000      26260           4.00
       0.609 0.775000000000      27129           4.44
       0.627 0.800000000000      28005           5.00
       0.642 0.825000000000      28939           5.71
       0.660 0.850000000000      29793           6.67
       0.680 0.875000000000      30649           8.00
       0.687 0.887500000000      31095           8.89
       0.693 0.900000000000      31550          10.00
       0.698 0.912500000000      31992          11.43
       0.703 0.925000000000      32415          13.33
       0.710 0.937500000000      32880          16.00
       0.713 0.943750000000      33080          17.78
       0.717 0.950000000000      33277          20.00
       0.721 0.956250000000      33476          22.86
       0.727 0.962500000000      33710          26.67
       0.736 0.968750000000      33925          32.00
       0.741 0.971875000000      34023          35.56
       0.748 0.975000000000      34141          40.00
       0.757 0.978125000000      34249          45.71
       0.768 0.981250000000      34352          53.33
       0.786 0.984375000000      34459          64.00
       0.803 0.985937500000      34515          71.11
       0.815 0.987500000000      34567          80.00
       0.838 0.989062500000      34622          91.43
       0.869 0.990625000000      34676         106.67
       1.045 0.992187500000      34731         128.00
       1.815 0.992968750000      34759         142.22
       1.943 0.993750000000      34786         160.00
       1.989 0.994531250000      34813         182.86
       2.038 0.995312500000      34841         213.33
       2.087 0.996093750000      34868         256.00
       2.127 0.996484375000      34881         284.44
       2.161 0.996875000000      34895         320.00
       2.225 0.997265625000      34909         365.71
       2.355 0.997656250000      34922         426.67
       2.539 0.998046875000      34936         512.00
       2.601 0.998242187500      34943         568.89
       2.653 0.998437500000      34950         640.00
       2.689 0.998632812500      34957         731.43
       2.755 0.998828125000      34964         853.33
       2.801 0.999023437500      34970        1024.00
       2.827 0.999121093750      34974        1137.78
       2.847 0.999218750000      34977        1280.00
       2.889 0.999316406250      34982        1462.86
       2.947 0.999414062500      34984        1706.67
       2.979 0.999511718750      34987        2048.00
       3.015 0.999560546875      34989        2275.56
       3.131 0.999609375000      34991        2560.00
       3.267 0.999658203125      34993        2925.71
       3.397 0.999707031250      34994        3413.33
       3.627 0.999755859375      34996        4096.00
       3.845 0.999780273438      34997        4551.11
       3.995 0.999804687500      34998        5120.00
       4.299 0.999829101563      34999        5851.43
       4.299 0.999853515625      34999        6826.67
       4.839 0.999877929688      35000        8192.00
      10.039 0.999890136719      35001        9102.22
      10.039 0.999902343750      35001       10240.00
      11.911 0.999914550781      35002       11702.86
      11.911 0.999926757813      35002       13653.33
      11.911 0.999938964844      35002       16384.00
      15.367 0.999945068359      35003       18204.44
      15.367 0.999951171875      35003       20480.00
      15.367 0.999957275391      35003       23405.71
      15.367 0.999963378906      35003       27306.67
      15.367 0.999969482422      35003       32768.00
    2543.615 0.999972534180      35004       36408.89
    2543.615 1.000000000000      35004
#[Mean    =        0.633, StdDeviation   =       13.588]
#[Max     =     2541.568, Total count    =        35004]
#[Buckets =           21, SubBuckets     =         2048]

Note that in the example above a value for the optional parameter outputValueUnitScalingRatio is provided. If you record elapsed time using the suggested method with Stopwatch.GetTimestamp(), then you will have recorded values in a non-standard unit of time. Instead of paying the cost of converting recorded values at the time of recording, record raw values. Use the helper methods to convert recorded values to standard units at output time, when performance is less critical.
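To make the unit conversion concrete, here is a minimal sketch that converts raw Stopwatch.GetTimestamp() differences into standard units using Stopwatch.Frequency, without depending on the library. The class and method names (TimestampScaling, ToMicroseconds, ToMilliseconds, Measure) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of HdrHistogram: converts raw Stopwatch
// timestamp differences into standard time units at reporting time.
public static class TimestampScaling
{
    // Stopwatch.Frequency is the number of timestamp units per second.
    public static double ToMicroseconds(long timestampDelta) =>
        timestampDelta * 1_000_000.0 / Stopwatch.Frequency;

    public static double ToMilliseconds(long timestampDelta) =>
        timestampDelta * 1_000.0 / Stopwatch.Frequency;

    // Times an action and returns the raw, unconverted timestamp difference.
    public static long Measure(Action action)
    {
        long start = Stopwatch.GetTimestamp();
        action();
        return Stopwatch.GetTimestamp() - start;
    }
}
```

The raw value returned by Measure is what you would record. The conversion methods would only be applied when producing a report.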

Example of reporting results as a chart

You can also have HdrHistogram output the results in a file format that can be charted. This is especially useful when comparing measurements.

First you will need to create the file to be used as an input for the chart.

using (var writer = new StreamWriter("HistogramResults.hgrm"))
{
	histogram.OutputPercentileDistribution(writer);
}

The data can then be plotted to visualize the percentile distribution of your results. Multiple files can be plotted in the same chart, allowing effective visual comparison of your results. You can use either

the online tool - http://hdrhistogram.github.io/HdrHistogram/plotFiles.html
the local tool - .\GoogleChartsExample\plotFiles.html

If you use the local tool, there are example result files in the .\GoogleChartsExample directory. The tool also allows you to export to png.

So what is so special about this way of recording response times?

the recording itself is low latency
it has a tiny footprint, because it stores only a dynamic range of buckets and counts
it produces the reports you actually want

Full code example

This code sample shows a recording of the time taken to execute a ping request. We execute and record this in a loop.

// A Histogram covering the range from ~466 nanoseconds to 1 hour (3,600,000,000,000 ns) with a resolution of 3 significant figures:
var histogram = new LongHistogram(TimeStamp.Hours(1), 3);
using (var ping = new System.Net.NetworkInformation.Ping())
{
	for (int i = 0; i < 100; i++)
	{
		long startTimestamp = Stopwatch.GetTimestamp();
		//Execute our action we want to record.
		ping.Send("www.github.com");
		long elapsed = Stopwatch.GetTimestamp() - startTimestamp;
		histogram.RecordValue(elapsed);
	}
}
//Output the percentile distribution of our results to the Console with values presented in Milliseconds
histogram.OutputPercentileDistribution(
	printStream: Console.Out,
	percentileTicksPerHalfDistance: 3,
	outputValueUnitScalingRatio: OutputScalingFactor.TimeStampToMilliseconds);

output:

       Value     Percentile TotalCount 1/(1-Percentile)

      79.360 0.000000000000          1           1.00
      80.435 0.166666666667         17           1.20
      80.896 0.333333333333         36           1.50
      81.050 0.500000000000         52           2.00
      81.152 0.583333333333         59           2.40
      81.254 0.666666666667         70           3.00
      81.357 0.750000000000         76           4.00
      81.459 0.791666666667         86           4.80
      81.459 0.833333333333         86           6.00
      81.510 0.875000000000         93           8.00
      81.510 0.895833333333         93           9.60
      81.510 0.916666666667         93          12.00
      81.562 0.937500000000         94          16.00
      81.613 0.947916666667         98          19.20
      81.613 0.958333333333         98          24.00
      81.613 0.968750000000         98          32.00
      81.613 0.973958333333         98          38.40
      81.613 0.979166666667         98          48.00
      81.664 0.984375000000         99          64.00
      81.664 0.986979166667         99          76.80
      81.664 0.989583333333         99          96.00
      86.067 0.992187500000        100         128.00
      86.067 1.000000000000        100
#[Mean    =       80.964, StdDeviation   =        0.746]
#[Max     =       86.067, Total count    =          100]
#[Buckets =           26, SubBuckets     =         2048]

How would I contribute to this project?

We welcome pull requests! If you do choose to contribute, please first raise an issue so we are not caught off guard by the pull request. Next, please ensure that your PR (Pull Request) has a comment in it describing what it achieves and the issues that it closes. Ideally, if it is fixing an issue or a bug, there would be a Unit Test proving the fix and a reference to the Issues in the PR comments.

HdrHistogram Details

An HdrHistogram supports the recording and analyzing of sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.

For example, a Histogram could be configured to track the counts of observed integer values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude. This Histogram would still maintain a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).

The HdrHistogram package includes the LongHistogram implementation, which tracks value counts in long fields, and is expected to be the commonly used Histogram form. IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial. Performance impacts should be measured prior to choosing one over the other in the name of optimization.

HdrHistogram is designed for recording histograms of value measurements in latency and performance sensitive applications. Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs. That is, 1,000,000,000 (1 billion) recordings can be made at a total cost of around 3 seconds on modern hardware. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.

A combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it's calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded data information is kept in high resolution allows for accurate post-recording analysis with low [and ultimately configurable] loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data values samples.

A common use example of HdrHistogram would be to record response times in units of microseconds across a dynamic range stretching from 1 usec to over an hour, with a good enough resolution to support later performing post-recording analysis on the collected data. Analysis can include computing, examining, and reporting of distribution by percentiles, linear or logarithmic value buckets, mean and standard deviation, or by any other means that can be easily added by using the various iteration techniques supported by the Histogram. In order to facilitate the accuracy needed for various post-recording analysis techniques, this example can maintain a resolution of ~1 usec or better for times ranging to ~2 msec in magnitude, while at the same time maintaining a resolution of ~1 msec or better for times ranging to ~2 sec, and a resolution of ~1 second or better for values up to 2,000 seconds. This sort of example resolution can be thought of as "always accurate to 3 decimal points." Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000, and a numberOfSignificantValueDigits of 3, and would occupy a fixed, unchanging memory footprint of around 185KB (see "Footprint estimation" below).

Histogram variants and internal representation

The HdrHistogram package includes multiple implementations of the HistogramBase class:

LongHistogram, which is the commonly used Histogram form and tracks value counts in long fields.
IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial (e.g. systems where tens of thousands of in-memory histogram are being tracked).
SynchronizedHistogram (see 'Synchronization and concurrent access' below)

Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating point number representation: an exponent and a (non-normalized) mantissa are used to support a wide dynamic range at a high but varying (by exponent value) resolution. Histograms use exponentially increasing bucket value ranges (the parallel of the exponent portion of a floating point number), with each bucket containing a fixed number of linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point number). Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range, and numberOfSignificantValueDigits controlling resolution.
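The exponent/mantissa analogy can be sketched in a few lines. The model below is a simplification that assumes 3 significant digits (2,048 sub-buckets) and a lowest discernible value of 1. The class and its members are hypothetical names for illustration and are not the library's API. It requires .NET Core 3.0 or later for System.Numerics.BitOperations.

```csharp
using System;
using System.Numerics;

// Illustrative model of the bucket layout for 3 significant digits and a
// lowest discernible value of 1. Names are hypothetical, not library API.
public static class BucketModel
{
    // 2 * 10^3 = 2000, rounded up to the next power of two.
    public const int SubBucketCount = 2048;
    private const int SubBucketHalfCountMagnitude = 10; // log2(2048 / 2)

    // The "exponent": which power-of-two scale the value falls into.
    public static int BucketIndex(long value)
    {
        // OR-ing with (SubBucketCount - 1) makes every value below 2048 land in bucket 0.
        int bitLength = 64 - BitOperations.LeadingZeroCount((ulong)(value | (SubBucketCount - 1)));
        return bitLength - (SubBucketHalfCountMagnitude + 1);
    }

    // The "mantissa": the linear slot within that bucket.
    public static int SubBucketIndex(long value) =>
        (int)(value >> BucketIndex(value));

    // Width of one slot at this value, i.e. the recording resolution.
    public static long ResolutionAt(long value) => 1L << BucketIndex(value);
}
```

Under these assumptions, values below 2,048 are stored with a resolution of 1, while 3,600,000,000 falls in bucket 21 and is stored with a resolution of 2,097,152 units. For microsecond values that is roughly 2.1 seconds at the one-hour mark, which is within 0.1% of the value.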

Synchronization and concurrent access

In the interest of keeping value recording cost to a minimum, the commonly used LongHistogram class and its IntHistogram and ShortHistogram variants are NOT internally synchronized, and do NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries against Histogram objects should either take care to externally synchronize and/or order their access, or use the SynchronizedHistogram variant. It is worth mentioning that since Histogram objects are additive, it is common practice to use per-thread, non-synchronized histograms for the recording fast path, and "flipping" the actively recorded-to histogram (usually with some non-locking variants on the fast path) and having a summary/reporting thread perform histogram aggregation math across time and/or threads.
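The additive property mentioned above can be illustrated without the library. In this sketch, plain long[] count arrays stand in for per-thread histograms. The class name and parameters are hypothetical, and the "flipping" of active histograms is deliberately left out.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for per-thread histograms: each worker owns a private counts
// array, so the recording path needs no locks or atomic operations.
public static class PerThreadCounts
{
    public static long[] RecordAndAggregate(int workers, int samplesPerWorker, int slots)
    {
        var perWorker = new long[workers][];

        Parallel.For(0, workers, w =>
        {
            var counts = new long[slots];      // private to this worker
            for (int i = 0; i < samplesPerWorker; i++)
            {
                counts[i % slots]++;           // unsynchronized fast path
            }
            perWorker[w] = counts;             // published once recording is done
        });

        // Aggregation step: counts are additive, so summing slot by slot
        // yields the same totals as a single shared histogram would.
        var total = new long[slots];
        foreach (var counts in perWorker)
        {
            for (int s = 0; s < slots; s++) total[s] += counts[s];
        }
        return total;
    }
}
```

Because Parallel.For returns only after every worker has finished, the aggregation loop reads the per-worker arrays after all writes have completed.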

Iteration

Histograms support multiple convenient forms of iterating through the histogram data set, including linear, logarithmic, and percentile iteration mechanisms, as well as means for iterating through each recorded value or each possible value level. The iteration mechanisms are accessible through the HistogramData available through getHistogramData(). Iteration mechanisms all provide HistogramIterationValue data points along the histogram's iterated data set.

Recorded values are available as instance methods:

RecordedValues: An IEnumerable<HistogramIterationValue> through the histogram using a RecordedValuesEnumerable/RecordedValuesEnumerator
AllValues: An IEnumerable<HistogramIterationValue> through the histogram using an AllValueEnumerable/AllValuesEnumerator

All others are available for the default (corrected) histogram data set via the following extension methods:

Percentiles: An IEnumerable<HistogramIterationValue> through the histogram using a PercentileEnumerable/PercentileEnumerator
LinearBucketValues: An IEnumerable<HistogramIterationValue> through the histogram using a LinearBucketEnumerable/LinearEnumerator
LogarithmicBucketValues: An IEnumerable<HistogramIterationValue> through the histogram using a LogarithmicBucketEnumerable/LogarithmicEnumerator

Iteration is typically done with a for-each loop statement. E.g.:

 foreach (var v in histogram.Percentiles(ticksPerHalfDistance))
 {
     ...
 }

 foreach (var v in histogram.LinearBucketValues(unitsPerBucket))
 {
     ...
 }

These enumerators are optimised for fast forward readonly "hosepipe" usage. They are low allocation and may reuse objects internally to keep allocations low and thus reduce garbage collection/memory pressure.
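The percentile levels in the sample output earlier in this document step by 10% up to 50%, then by 5% up to 75%, then by 2.5%, and so on. The sketch below shows one way such levels can be generated from a ticks-per-half-distance setting. Whether the library uses exactly this formula is an assumption, and the class and method names are hypothetical. Math.Log2 requires .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of how percentile "ticks" can be spaced so that each
// halving of the remaining distance to 100% gets the same number of steps.
public static class PercentileTicks
{
    public static List<double> Levels(int ticksPerHalfDistance, int count)
    {
        var levels = new List<double>(count);
        double level = 0.0;
        while (levels.Count < count && level < 100.0)
        {
            levels.Add(level);
            // Number of halvings already passed: 0 below 50%, 1 below 75%, ...
            int halvings = (int)Math.Floor(Math.Log2(100.0 / (100.0 - level)));
            // The number of steps per 100% doubles with every halving.
            double ticks = ticksPerHalfDistance * Math.Pow(2, halvings + 1);
            level += 100.0 / ticks;
        }
        return levels;
    }
}
```

With 5 ticks per half distance, the first levels are 0, 10, 20, 30, 40, 50, 55, 60, 65, 70, 75 and 77.5, which is the spacing of the Percentile column in that output.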

Equivalent Values and value ranges

Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. HdrHistogram provides methods for

determining the lowest and highest equivalent values for any given value,
determining whether two values are equivalent,
finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting).
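As a rough model of what "equivalent" means, the sketch below assumes 3 significant digits (2,048 sub-buckets) and a lowest discernible value of 1, and derives the equivalent range by discarding the low-order bits that the histogram cannot distinguish. The class and its methods are hypothetical stand-ins, not the library's own methods. It requires .NET Core 3.0 or later for System.Numerics.BitOperations.

```csharp
using System;
using System.Numerics;

// Illustrative only: models equivalence for 3 significant digits with a
// lowest discernible value of 1. These are not the library's methods.
public static class EquivalentValues
{
    private const long SubBucketMask = 2047; // sub-bucket count (2048) minus 1

    // Number of low-order bits that are discarded when the value is recorded.
    private static int Shift(long value) =>
        64 - BitOperations.LeadingZeroCount((ulong)(value | SubBucketMask)) - 11;

    public static long Lowest(long value) => (value >> Shift(value)) << Shift(value);

    public static long Highest(long value) => Lowest(value) + (1L << Shift(value)) - 1;

    public static bool AreEquivalent(long a, long b) => Lowest(a) == Lowest(b);

    // First value that is counted separately from the given one.
    public static long NextNonEquivalent(long value) => Highest(value) + 1;
}
```

In this model the values 5000 through 5003 are all equivalent, while 1500 is equivalent only to itself because values below 2,048 are stored at single-unit resolution.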

Corrected vs. Raw value recording calls

To support a common use case in which histogram values are used to track response time distribution, Histogram supports the recording of corrected values through the RecordValueWithExpectedInterval(long, long) variant. This value recording form is useful in [common latency measurement] scenarios where response times may exceed the expected interval between issuing requests, leading to "dropped" response time measurements that would typically correlate with "bad" results.

When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter, recorded histogram data will reflect an appropriate number of additional values, linearly decreasing in steps of expectedIntervalBetweenValueSamples, down to the last value that would still be higher than expectedIntervalBetweenValueSamples.

To illustrate why this corrective behavior is critically needed in order to accurately represent value distribution when large value measurements may lead to missed samples, imagine a system for which response time samples are taken once every 10 msec to characterize response time distribution. The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample showing a 1 msec response time value. The hypothetical system then encounters a 100 sec pause during which only a single sample is recorded (with a 100 second value). The raw data histogram collected for such a hypothetical system (over the 200 second scenario above) would show ~99.99% of results at 1 msec or below, which is obviously "not right". The same histogram, corrected with the knowledge of an expectedIntervalBetweenValueSamples of 10 msec, will correctly represent the response time distribution. Only ~50% of results will be at 1 msec or below, with the remaining 50% coming from the auto-generated value records covering the missing increments spread between 10 msec and 100 sec.

Data sets recorded with and without an expectedIntervalBetweenValueSamples parameter will differ only if at least one value recorded with the RecordValue(..) method was greater than its associated expectedIntervalBetweenValueSamples parameter. Data sets recorded with an expectedIntervalBetweenValueSamples parameter will be identical to ones recorded without it if all values recorded via the RecordValue(..) calls were smaller than their associated (and optional) expectedIntervalBetweenValueSamples parameters.

When used for response time characterization, the recording with the optional expectedIntervalBetweenValueSamples parameter will tend to produce data sets that would much more accurately reflect the response time distribution that a random, uncoordinated request would have experienced.
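The correction described in this section can be sketched as a small function that expands one recorded value into the value itself plus the synthetic values that stand in for the samples that were missed. This is a sketch rather than the library's implementation. The names are hypothetical, and whether the final step includes a value exactly equal to the expected interval is an assumption made here.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the expected-interval correction; not the library's implementation.
public static class ExpectedIntervalCorrection
{
    // Returns the recorded value followed by the synthetic values that stand in
    // for samples that could not be taken while the slow operation was running.
    public static List<long> Expand(long value, long expectedInterval)
    {
        var values = new List<long> { value };
        if (expectedInterval <= 0) return values;

        // Step down from the recorded value in units of the expected interval.
        for (long missing = value - expectedInterval;
             missing >= expectedInterval;
             missing -= expectedInterval)
        {
            values.Add(missing);
        }
        return values;
    }
}
```

For the scenario above, working in milliseconds, Expand(100000, 10) yields 10,000 values ranging from 100,000 down to 10. Together with the 10,000 samples of 1 msec, half of all recorded values are then at 1 msec, matching the ~50% figure. A value below the expected interval, such as Expand(1, 10), is returned unchanged.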

Footprint estimation

Due to its dynamic range representation, Histogram is relatively efficient in memory space requirements given the accuracy and dynamic range it covers. Still, it is useful to be able to estimate the memory footprint involved for a given highestTrackableValue and numberOfSignificantValueDigits combination. Beyond a relatively small fixed-size footprint used for internal fields and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up by its data value recording counts array. The total footprint can be conservatively estimated by:

 largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits);
 subBucketSize = RoundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);

 expectedHistogramFootprintInBytes = 512 +
      ({primitive type size} / 2) *
      (Log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *
      subBucketSize;

A conservative (high) estimate of a Histogram's footprint in bytes is available via the GetEstimatedFootprintInBytes() method.
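As a worked example, the estimate formula above can be transcribed directly into code and evaluated for the histogram used throughout this document (highestTrackableValue of 3,600,000,000, 3 significant digits, 8-byte long counts). The class and method names are hypothetical. Math.Log2 requires .NET Core 3.0 or later.

```csharp
using System;

// Direct transcription of the footprint estimate; names are hypothetical.
public static class FootprintEstimate
{
    public static long InBytes(long highestTrackableValue, int significantDigits, int primitiveSizeInBytes)
    {
        long largestValueWithSingleUnitResolution = 2 * (long)Math.Pow(10, significantDigits);

        // Round up to the nearest power of two.
        long subBucketSize = 1;
        while (subBucketSize < largestValueWithSingleUnitResolution) subBucketSize <<= 1;

        int log2RoundedUp = (int)Math.Ceiling(Math.Log2((double)highestTrackableValue / subBucketSize));

        return 512 + (primitiveSizeInBytes / 2) * (long)(log2RoundedUp + 2) * subBucketSize;
    }
}
```

For these inputs, subBucketSize is 2,048 and the rounded-up logarithm is 21, giving 512 + 4 * 23 * 2,048 = 188,928 bytes, or about 184.5 KB. That is in line with the "around 185KB" figure quoted earlier. Using 4-byte int counts instead gives 94,720 bytes.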

Terminology

Latency : The time that something is latent, i.e. not being processed. This may be due to being in a queue.
Service Time : The time taken to actually service a request.
Response time : The sum of the latency and the service time. e.g. the time your request was queued, plus the time it took to process.

References (see also)

How NOT to Measure Latency Gil Tene - QCon 2013
Understanding Latency Gil Tene - React San Francisco 2014
Designing for Performance Martin Thompson - GOTO Chicago 2015
https://en.wikipedia.org/wiki/Response_time_(technology)

hdrhistogram.net's Issues

Benchmark the Library

The unofficial port at [https://github.com/LeeCampbell/HdrHistogram.NET] had some benchmarks that were used to compare it to the existing official implementation and could be used to track performance over versions.

It seems sensible to investigate BenchmarkDotNet as a standard way to measure performance. It should also reduce the amount of code to maintain in this repository.

Add Tag support to Log format

https://gitter.im/HdrHistogram/HdrHistogram?at=572fa2a7f9a53a60793cd710

Create documentation standards

Adoption of a library can be greatly improved with quality documentation.

Providing guidelines on how to create consistent documentation can reduce rework and improve the feeling of a quality repository.

example of a set of guidelines : http://dotnet.github.io/orleans/Documentation-Guidelines

Is HistogramLogWriter blocking on purpose?

In order to persist data to an external system over time, we spin up a Timer that will run this:
HistogramLogWriter.Append(someStream)

This code is wrapped in a "PeriodicTask" that has a "Stop" method and a CancellationToken.
The Stream is done through this API:

Is it intentional that the Async method is not used, with the possibility of passing a cancellation token so that everything could be stopped "gracefully"?

For example, if the accumulated data is around 500 MB and you are trying to shut down the app, this would probably not go well with blocking code.

Document Unit Tests and Benchmarks

The readme currently doesn't have the basic details on how to run tests!

It should say something like

dotnet test .\HdrHistogram.UnitTests\HdrHistogram.UnitTests.csproj -v=q -c=Release

.NET Lib to create charts

Currently I believe that the only way for a .NET program to render the captured histograms into a chart is via the web project http://hdrhistogram.github.io/HdrHistogram/plotFiles.html.

It would be nice if this was ported to a .NET process (using the Drawing or WPF libs) to generate this stuff on the fly.

Create extreme packages

Create extreme packages that are designed for extreme performance cases

only have a single implementation of a histogram defined
remove the base class
only implement the interface
sealed class with no inheritance
x64 and x86 (32bit) releases.
no support for synchronization.

e.g. HdrHistogramLongx64.nupkg, HdrHistogramIntx64.nupkg, HdrHistogramShortx86.nupkg

Test to see if they provide significant improvements in

footprint (dll/nuget package)
throughput

These would be targets for .NET platforms that either require extreme throughput (MMOG, Trading, etc.) or require a very small footprint (UWA, Raspberry Pi, etc.)

Create CI build

Create an automated build that

compiles the code for each platform
runs all the tests
packages and deploys

Consider using AppVeyor?

Invalid .hgrm output produced

The logic for `IsLastValue()` is incorrect and can flag multiple values as the last value.

This causes the incorrect output shown below (note multiple lines with the last column missing):

       Value     Percentile TotalCount 1/(1-Percentile)

       1.000 0.000000000000    7604459           1.00
       1.000 0.100000000000    7604459           1.11
       1.000 0.200000000000    7604459           1.25
       1.000 0.300000000000    7604459           1.43
       1.000 0.400000000000    7604459           1.67
       1.000 0.500000000000    7604459           2.00
       1.000 0.550000000000    7604459           2.22
       1.000 0.600000000000    7604459           2.50
<SEVERAL ROWS REMOVED FOR BREVITY>
     383.000 0.999998283386    9999983      582542.22
     453.000 0.999998474121    9999985      655360.00
     511.000 0.999998664856    9999987      748982.86
     537.000 0.999998855591    9999990      873813.33
     672.000 0.999999046326    9999991
     777.000 0.999999141693    9999992
   18143.000 0.999999237061    9999993
  208127.000 0.999999332428    9999994
  224639.000 0.999999427795    9999995
  229759.000 0.999999523163    9999996
  229759.000 0.999999570847    9999996
  230271.000 0.999999618530    9999997
  230271.000 0.999999666214    9999997
  258943.000 0.999999713898    9999998
  258943.000 0.999999761581    9999998
  258943.000 0.999999785423    9999998
  275711.000 0.999999809265    9999999
  275711.000 0.999999833107    9999999
  275711.000 0.999999856949    9999999
  275711.000 0.999999880791    9999999
  275711.000 0.999999892712    9999999
  282111.000 0.999999904633   10000000
  282111.000 1.000000000000   10000000
#[Mean    =        1.714, StdDeviation   =      205.505]
#[Max     =   282111.000, Total count    =     10000000]
#[Buckets =           14, SubBuckets     =         2048]

I'm pretty sure the fix is to make IsLastValue look like this (i.e using double.Epsilon):

public bool IsLastValue()
{
    return Math.Abs(PercentileLevelIteratedTo - 100.0D) < double.Epsilon;
}

Allow thread safe writing/recording

Either create a thread safe implementation of each Histogram (16/32/64 bit) or provide a generic wrapper to allow synchronized access to them

Thread safe writes
Thread safe reads (via a recorder)

Update home page/Readme.md to have less detail

Instead of having a dense detailed homepage, link to the content in the wiki

Move Detailed description to wiki
Update code sample to a factory
Reduce the sample output content size

Lzct SSE instruction when available

At the heart of the hot path we currently have a manual method for finding the leading zero count.
This is used to identify the correct bucket to assign a recorded value.
Frustratingly, this is supported as an intrinsic instruction on most modern CPU architectures.

The code is found in Bitwise.NumberOfLeadingZeros, and has been isolated with the intent that it can be a single place to refactor/optimize if the opportunity arises.

Follow this .NET core issue for progress/resolution

dotnet/corefx#2209
possibly moved from coreFx to CoreCLR https://github.com/dotnet/coreclr/issues/8089
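For context, .NET Core 3.0 and later expose System.Numerics.BitOperations.LeadingZeroCount, which the runtime can map to a hardware instruction where one is available. The sketch below contrasts a portable manual count with that call. The LeadingZeros class and its method names are hypothetical, and the manual version is not a copy of the Bitwise.NumberOfLeadingZeros code mentioned above.

```csharp
using System;
using System.Numerics;

// Hypothetical comparison of a manual leading-zero count with the
// BitOperations API available from .NET Core 3.0 onwards.
public static class LeadingZeros
{
    // Portable fallback: binary search for the highest set bit.
    public static int Manual(long value)
    {
        if (value == 0) return 64;
        ulong v = (ulong)value;
        int n = 0;
        if ((v & 0xFFFFFFFF00000000UL) == 0) { n += 32; v <<= 32; }
        if ((v & 0xFFFF000000000000UL) == 0) { n += 16; v <<= 16; }
        if ((v & 0xFF00000000000000UL) == 0) { n += 8; v <<= 8; }
        if ((v & 0xF000000000000000UL) == 0) { n += 4; v <<= 4; }
        if ((v & 0xC000000000000000UL) == 0) { n += 2; v <<= 2; }
        if ((v & 0x8000000000000000UL) == 0) { n += 1; }
        return n;
    }

    // Delegates to the runtime, which may use a hardware instruction
    // on CPUs that provide one.
    public static int Intrinsic(long value) =>
        BitOperations.LeadingZeroCount((ulong)value);
}
```

Both methods should return the same result for any input, so a benchmark comparing the two would show whether swapping the implementation is worthwhile on a given target.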

Create IntHistogram

A histogram with 32Bit bucket counts

Example of how to Send or save Histograms

This seems like it will become a popular thing and people are already asking for it

https://gitter.im/HdrHistogram/HdrHistogram?at=563f8127c712fe074e4e7101

Correct docs and Scaling regarding Ticks

The comments, documents and scaling helper class all misinterpret the difference between two Stopwatch.GetTimestamp() values as an elapsed period of ticks, i.e. units of 1/10,000,000 of a second.
This assumption was made from simply thinking that TimeSpan.TicksPerSecond related to the tick values returned from Stopwatch.GetTimestamp().
As per documentation

Gets the current number of ticks in the timer mechanism.

However, it is correctly defined* as 1 second = Stopwatch.Frequency;

This means the following helper fields should be added

public static readonly double TimeStampToMicroseconds = Stopwatch.Frequency / (1000d * 1000d);
public static readonly double TimeStampToMilliseconds = Stopwatch.Frequency / 1000d;
public static readonly double TimeStampToSeconds = Stopwatch.Frequency;

Documents referring to ticks should be amended accordingly.

https://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.frequency(v=vs.110).aspx
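To make the intended use of these fields concrete, here is a minimal sketch. It assumes each field is a divisor: the number of Stopwatch timestamp units per microsecond, millisecond, or second. The class name TimestampScalingSketch and the ToMilliseconds helper are hypothetical and only illustrate the arithmetic.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class; the three field names come from the issue text.
public static class TimestampScalingSketch
{
    // Timestamp units per microsecond, millisecond and second respectively.
    public static readonly double TimeStampToMicroseconds = Stopwatch.Frequency / (1000d * 1000d);
    public static readonly double TimeStampToMilliseconds = Stopwatch.Frequency / 1000d;
    public static readonly double TimeStampToSeconds = Stopwatch.Frequency;

    // Converts a difference of two Stopwatch.GetTimestamp() values to milliseconds
    // by dividing by the number of timestamp units in one millisecond.
    public static double ToMilliseconds(long elapsedTimestampUnits)
    {
        return elapsedTimestampUnits / TimeStampToMilliseconds;
    }
}
```

For example, an elapsed value equal to Stopwatch.Frequency represents one second, so ToMilliseconds returns approximately 1000 regardless of the timer resolution of the machine.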

Histogram Factory

I also propose that the current Histogram type be renamed to LongHistogram. The Histogram type can then be used as a factory that can help guide users to which instance they need.

potentially something like

var histogram = Histogram
    .With64bitBucketCount()
    .AsThreadSafe()
    .Create();

GetPercentileOfValue

How hard would it be to add a method that queries the histogram for a percentile given a sample?

If I have a measurement of 300ms, what percentile does this sample fall under?
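The quantity being asked for can be sketched over a plain list of samples: the percentage of recorded values that are at or below the given value. This is only an illustration of the definition, not a proposal for how the histogram would compute it from its buckets; the class and method names are hypothetical. The Java library exposes a comparable query as getPercentileAtOrBelowValue.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration working on raw samples rather than a histogram.
public static class PercentileOfValueSketch
{
    // Returns the percentage (0-100) of samples that are less than or equal to value.
    // An empty sample set is reported as 100, since no sample exceeds the value.
    public static double PercentileAtOrBelow(IReadOnlyList<long> samples, long value)
    {
        if (samples.Count == 0) return 100.0;
        int atOrBelow = samples.Count(s => s <= value);
        return 100.0 * atOrBelow / samples.Count;
    }
}
```

With samples of 100, 200, 300 and 400 ms, a measurement of 300 ms falls at the 75th percentile under this definition.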

Add support for decoding V0 and V1 encodings

Test logs with V2, V1, and V0 encoded histograms are included in the Java repo under test/resources.

.NET Core support

Ideally HdrHistogram should be available on all supported versions of .NET.
Currently this is 4.5.2 & 4.6.1
It may, however, be prudent to wait until .NET Core and its supporting tooling are available before undertaking this task, as it may incur a lot of rework if the goalposts move too much.

Create EBNF definition of the log format

https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form

https://gitter.im/HdrHistogram/HdrHistogram?at=571e8a9d47b4c6480ffa300d

Document/Visualize the internal bucket model

As asked and answered here https://gitter.im/HdrHistogram/HdrHistogram?at=56db5610ddfe3d431627fa97

Benchmarks as part of the build.

Can we use BenchmarkDotNet or NBench to ensure our hot paths stay allocation free and fast?

Examples

I've been using HdrHistogram for a while and have created some C# classes to simplify its usage. They allow me to enable a set of histograms via simple names in an application configuration file. The classes that use the enabled histograms can initialize and use them with only a small amount of code. I also have an F# version that uses F# modules/functions instead of classes. Here's a quick C# example:

// step 1 - initialize the histogram names that will be used in the current run from config file
List<string> names = cfg.getValues("histograms");

foreach (string name in names)
{
    Histograms.Add(name);
    _log.DebugFormat("enabled histogram: {0}", name);
}

// step 2 - initialize an enabled histogram within class it's used in

private static HistogramTimer _hgPostAck = null;
private static bool _hgPostAckEnabled = false;
public static readonly string HG_POSTACK = "postack";

_hgPostAckEnabled = Histograms.isEnabled(HG_POSTACK);

if (_hgPostAckEnabled)
    _hgPostAck = Histograms.makeHistogramTimer(HG_POSTACK, HistogramTimer.NSECS_IN_MIN * 10L, 3, false, 100); // 1 nsec to 10 min, 3 decimal point resolution, don't warmup, only log report every 100 calls

// step 3 - use histogram within class

try
{
    if (_hgPostAckEnabled)
        _hgPostAck.startTimer();

    // do work...
}
finally
{
    if (_hgPostAckEnabled)
    {
        _hgPostAck.recordTime();
        Histograms.logReport(HG_POSTACK, ScaleFactor.MSEC);
    }
}

Let me know if you are interested in these potential contributions. Note that my existing C# and F# code will most likely require some modifications before they can be included.

Peter Santoro

Event tracing for Windows support

It would be nice if there were an out-of-the-box way to use ETW events as input.
I will look into this as I think that things like this should be done out-of-process. As the .NET garbage collector also publishes stats via ETW, it would give a standard way to look at Garbage collection times of programs.

.gitignore does not currently exclude .vs files generated by Visual Studio

Please consider signing your assemblies

Hello,

Could you please consider signing your assemblies so that they can be used in a wider range of projects where strong-name information is required?

Further reading: https://www.pedrolamas.com/2018/09/11/start-strong-naming-your-assemblies/

Create coding standards

To enable the community to contribute to the repository in a consistent and predictable manner, it would be good if there was a set of standards for the project to adopt.

Which style of C# to use (e.g. StyleCop defaults vs Resharper defaults)
Expectation of XML documentation for public API
Test coverage and style expectations.
Platform support that is expected (.NET 2.0-4.5, Mono, Dnxcore etc?)

These expectations should be documented in the wiki

Subtract - additional method rather than issue

Hello,

Would it be possible to add a Subtract function that would subtract one histogram from another? An Add function exists. A Subtract function would also be very helpful.

Apologies if this request is in the wrong place. All help appreciated.

Simon

Non shared values from Enumerables

As identified in #64, it is non-intuitive that values returned from our Enumerable factories are shared and mutable.

Running benchmarks against master and PR #64 shows no significant performance change, so this looks like an unnecessary coding style ported from the Java implementation.

Consistent index error on Windows

When initialising the following LongHistogram

var measurements = new LongHistogram((long) TimeSpan.FromMinutes(15).TotalMilliseconds, 3);

and then in a continuous loop add measurements to it with

measurements.RecordValue(actual - expected);

I consistently get the following exception:

 Index was outside the bounds of the array.
Stack Trace:
   at HdrHistogram.Utilities.Bitwise.Log2(Int32 i)
   at HdrHistogram.Utilities.Bitwise.NumberOfLeadingZeros(Int64 value)
   at HdrHistogram.HistogramBase.RecordSingleValue(Int64 value)

Could this be because actual - expected can be negative – or 0? I had assumed HdrHistogram handled that, but I realise now I might have been incorrect.
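If negative inputs do turn out to be the cause, which the stack trace suggests but this report does not confirm, one way to rule them out is to filter values before they reach the histogram. The sketch below is a hypothetical guard, not part of the library; it takes the recording method as a delegate so it stays self-contained. Whether zero is also affected is not established here, so the sketch forwards zero unchanged.

```csharp
using System;

// Hypothetical guard; the class and method names are illustrative only.
public static class RecordingGuard
{
    // Forwards non-negative values to the supplied recording delegate.
    // Returns false, without recording, when the value is negative.
    public static bool TryRecord(Action<long> record, long value)
    {
        if (value < 0) return false;
        record(value);
        return true;
    }
}
```

Assuming the measurements histogram from the report, the call site would become RecordingGuard.TryRecord(measurements.RecordValue, actual - expected), and the return value can be used to count how many samples were skipped.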

CultureInfo used to create output cannot be controlled resulting in invalid .hgrm format

A line in a .hgrm file should look something like this:

       5.02 0.881250000000        888           8.42

Notice the decimal separator which is a dot.

The OutputPercentileDistribution method will format the data using Thread.CurrentThread.CurrentCulture and in many cultures the decimal separator is a comma and not a dot.

       5,02 0,881250000000        888           8,42

This results in a .hgrm file that cannot be parsed correctly.

To work around this you need to set Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture before calling OutputPercentileDistribution. In many cases you will have to change the culture back after the call to avoid situations where other code depends on the current culture of the thread. In general this is a hack and not a sustainable solution.

However, the decimal should not always be a dot. If the output of OutputPercentileDistribution is intended to be read by a human the current choice of using Thread.CurrentThread.CurrentCulture is the right way of formatting the output.

I suggest that an overload of OutputPercentileDistribution accepting a CultureInfo is created. Writing to a .hgrm file would then require specifying CultureInfo.InvariantCulture.
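The effect of the format provider can be reproduced without the library. The sketch below formats one row with an explicit IFormatProvider; the class name, method name and column widths are assumptions for illustration and do not claim to match the real .hgrm layout. A custom NumberFormatInfo stands in for a comma-decimal culture so the example does not depend on which cultures are installed.

```csharp
using System;
using System.Globalization;

// Hypothetical illustration of culture-sensitive formatting of one output row.
public static class CultureFormattingSketch
{
    // Formats value, percentile, count and 1/(1-percentile) using the supplied provider.
    public static string FormatLine(double value, double percentile, long count, IFormatProvider provider)
    {
        return string.Format(
            provider,
            "{0,12:F2} {1,2:F12} {2,10} {3,14:F2}",
            value, percentile, count, 1 / (1 - percentile));
    }

    public static void Main()
    {
        // Stand-in for a culture whose decimal separator is a comma.
        var commaDecimal = new NumberFormatInfo { NumberDecimalSeparator = "," };

        Console.WriteLine(FormatLine(5.02, 0.88125, 888, CultureInfo.InvariantCulture));
        Console.WriteLine(FormatLine(5.02, 0.88125, 888, commaDecimal));
    }
}
```

The first line uses dots and the second uses commas, which is the difference that breaks parsing. An overload accepting a CultureInfo, as suggested above, would let callers choose between the two without touching Thread.CurrentThread.CurrentCulture.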

Auto release on tag

https://www.appveyor.com/docs/deployment/github/

When we tag the repo, it would be nice if a release was just created

Repo tagged
GitHub "release" created
Nuget package added to the Release
Package published to Nuget.org

Create automated Documentation generation

Could use https://readthedocs.org/ to host the generated docs.

Create a Log writer

Support a log writer that would output V2 encoded histograms to a log file compatible with HistogramLogWriter

Investigate SkinnyHistogram

I am not sure what the Skinny Histogram is and what it could offer the .NET project.

Create ShortHistogram

A histogram with 16Bit bucket counts

Update to Encoding docs

As per https://gitter.im/HdrHistogram/HdrHistogram?at=5731f888b51b0e294850e027

File Format:

the use of "File" is not the best as this construct is not always saved in a file (for example in our app it is stored in a memory buffer and sent over the wire)
add valid values for cookie
add length encoding and valid range

I initially struggled to understand the rationale for this encapsulation; maybe @giltene can provide the history behind it (i.e. why do we have this encapsulation)?

CompressedHeader Format:

the description should specify the byte order for all the 32 and 64 bit fields (A to F)
list of valid cookie values
valid values for B to F

Perhaps a better (final) place for that document is one level up or in a doc repo, since it is language independent. It would then be easier for others to contribute information to it (for example, other people can add info related to encoding size and speed for their language implementation).

Remove usage of "file"
document Cookie
length encoding definition
specify the byte order for compressed header (provide example?)
list valid cookie values
valid values for B to F

Instrument ASP.NET end point package

I assume that there is a simple way to add a middleware/handler/router/filter/thing to ASP.NET to allow HdrHistogram to record the time taken for the request to be processed.

If there is, then this would be good to provide as a separate nuget package that web devs can just add and then wire up in a one-liner.
It should

record the service time,
autorotate instances of HdrHistogram when writing to disk (to target directory)
default to using the endpoint/method/action name as the key for grouping/tagging. (histograms can be merged at later date for higher level aggregates)

Provide a simple entry point for ASP.NET recording

Ideally Web devs should be able to just pull a nuget package and add an attribute to controllers that should be measured.

Potentially an ActionFilterAttribute could be provided in a standalone ASP specific nuget

public class HistogramAttribute : ActionFilterAttribute
{
    private const string StartTimestampKey = "HistogramAttribute.StartTimeStamp";

    public override void OnActionExecuting(ActionExecutingContext context)
    {
        context.HttpContext.Items[StartTimestampKey] = Stopwatch.GetTimestamp();
    }

    public override void OnActionExecuted(ActionExecutedContext context)
    {
        var stopTimestamp = Stopwatch.GetTimestamp();
        var startTimestamp = (long)context.HttpContext.Items[StartTimestampKey];
        var elapsed = stopTimestamp - startTimestamp;
        // Log is assumed to be a histogram or recorder instance available to the attribute.
        Log.RecordValue(elapsed);
    }
}

Ideally this would also integrate with tag, recorder and thread safety support.

Create log reader

Fix line endings

The test files have \n line endings, but currently git applies autocrlf to these files changing their line ending incorrectly to \r\n.
I think the fix is to remove the .gitattributes file.

Auto-sizing

Guidance from Gil:

Auto-sizing is another useful thing… Not having to specify an initial range is useful for lazy folks (who are ok with resizing latencies in the recording path). It is also useful as a way to avoid overflowing wrongly-initial-sized histograms: unexpected large values result in a resize rather than an AIOOB exception. If you are ok with taking the latency hit (and potential mem size hit) for that, it's cleaner to code to.

Create a Recorder

Consider supporting Recorder (which supports multiple concurrent writers), but that will also require a ConcurrentHistogram.

Question: What is the Recorder?

Create LongHistogram

Create the default implementation with signed 64 bit counts.

It should support

auto-sizing
percentile, linear, and log based iteration
encode into the compressed V2 serialized histogram format (which is what the current Java code will encode to, as well as the current C and Python code bases).
decode V2.
SingleWriter i.e. not thread safe for writes.

I also propose that the current Histogram type be renamed to LongHistogram. The Histogram type can then be used as a factory that can help guide users to which instance they need.

potentially something like

var histogram = Histogram
    .With64bitBucketCount()
    .AsThreadSafe()
    .Create();

Record Scope

Add a feature where you can record the scope of a function call by leveraging the using statement in C#.

It could be used as such

using(recorder.RecordScope())
{
    await SomeExpensiveCall();
}

This allows recording of tasks. It also simplifies recording of long statements without having to create lambdas/closures.

It would incur the allocation cost of assigning the IDisposable resource, but in theory if there is a Task being involved, that allocation cost should be dwarfed by the async context switch and work.

Code to be added could be like the following (added to src/HdrHistogram/HistogramExtensions.cs):

/// <summary>
/// Records the time to call dispose on the returned token.
/// This can be useful for timing large blocks of code, or for wrapping around an <c>await</c> clause.
/// </summary>
/// <param name="recorder">The <see cref="IRecorder"/> instance to record the latency in.</param>
/// <returns>Returns a token to be disposed once the scope ends.</returns>
public static IDisposable RecordScope(this IRecorder recorder)
{
    return new Timer(recorder);
}

private sealed class Timer : IDisposable
{
    private readonly IRecorder _recorder;
    private readonly long _start;

    public Timer(IRecorder recorder)
    {
        _recorder = recorder;
        _start = Stopwatch.GetTimestamp();
    }

    public void Dispose()
    {
        var elapsed = Stopwatch.GetTimestamp() - _start;
        _recorder.RecordValue(elapsed);
    }
}

StrongName Assembly

Hi,

Could you provide a strong named version assembly package on nuget?

Thanks.

Rename "header" to envelope in HistogramEncoding

There are cases in the code where we refer to an envelope structure as a header.
This leads you to look for the body/contents, however they are inside the "header".
Thus the header is really an envelope.

Publish the costs to record

Having created benchmarks for each variation of the library, have these published to allow users to make an educated decision about which is the most appropriate way to use the library.

These benchmarks should include a wide variety of CPUs.
Once broad platform support is available, each platform should be included in the results.
