
quanta's Introduction

quanta


quanta is a high-speed timing library, useful for getting the current time very quickly, as well as manipulating it.

code of conduct

NOTE: All conversations and contributions to this project shall adhere to the Code of Conduct.

usage

The API documentation of this library can be found at docs.rs/quanta.

general features

  • count CPU cycles via the Time Stamp Counter (TSC), or
  • get monotonic time, in nanoseconds, based on TSC (or OS fallback)
  • extremely low overhead where possible
  • mockable
  • cross-platform
  • fun, science-y name!
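A minimal usage sketch of the features above, assuming a recent quanta release (the API has shifted slightly across versions); quanta::Instant::now is also available and uses a global clock:

use quanta::Clock;

fn main() {
    // Building a Clock performs TSC calibration (or selects the OS fallback).
    let clock = Clock::new();

    // Monotonic time, backed by the TSC where available.
    let start = clock.now();
    // ... do some work ...
    let elapsed = start.elapsed();
    println!("took {:?}", elapsed);
}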

platform / architecture support

For most major platforms -- Linux, Windows, and macOS -- on processors made around or after 2008, you should have no problems using quanta with full TSC support. quanta will always fall back to the included stdlib timing facilities if TSC support is not present. The biggest caveat, as shown in the compatibility matrix below, is that the TSC is only supported on x86/x86_64 platforms.

| Platform              | stdlib fallback | TSC support? | CI tests? |
|-----------------------|-----------------|--------------|-----------|
| Linux (x86/x86_64)    | Yes             | Yes          | Yes       |
| Linux (MIPS/ARM)      | Yes             | No           | No        |
| Windows (x86/x86_64)  | Yes             | Yes          | Yes       |
| Windows (ARM)         | Yes             | No           | No        |
| macOS (x86/x86_64)    | Yes             | Yes          | Yes       |
| macOS (ARM)           | Yes             | No           | No        |
| iOS (ARM)             | Yes             | No           | No        |

performance

quanta sits neck-and-neck with native OS time facilities: the cost of Clock::now is on par with Instant::now from the stdlib, if not better.

why use this over stdlib?

Beyond having a performance edge in specific situations, the ability to use mocked time makes it easier to actually test that your application is doing the right thing when time is involved.
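For example, a test can drive time forward deterministically with the mocked clock; a rough sketch, assuming the current Clock::mock / Mock::increment API:

use std::time::Duration;
use quanta::Clock;

#[test]
fn entry_expires_after_five_seconds() {
    // The returned Mock handle controls what the paired Clock observes.
    let (clock, mock) = Clock::mock();

    let created_at = clock.now();
    mock.increment(Duration::from_secs(5));

    // No real waiting: the clock jumped forward exactly five seconds.
    let age = clock.now() - created_at;
    assert!(age >= Duration::from_secs(5));
}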

Additionally, and as mentioned in the general features section, quanta provides a safe/thin wrapper over accessing the Time Stamp Counter, which allows measuring cycle counts over short sections of code. This can be relevant/important for accurately measuring performance-critical sections of code.
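A sketch of that raw path (method names follow the current API; older releases exposed start/end pairs instead of raw):

use quanta::Clock;

fn main() {
    let clock = Clock::new();

    // Raw, unscaled counter readings: the cheapest possible measurement.
    let start = clock.raw();
    hot_section();
    let end = clock.raw();

    // Scale the raw delta back into wall-clock units only when reporting.
    println!("hot section took {:?}", clock.delta(start, end));
}

fn hot_section() {
    // Stand-in for the performance-critical code being measured.
    std::hint::black_box((0..10_000u64).sum::<u64>());
}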

alternative crates

  • chrono:
    • based on std::time::SystemTime: non-monotonic reads
    • focused on timezone-based "date/time" measurements, not intervals/elapsed time
    • clock cannot be altered at all (no pause, no discrete updates)
  • time:
    • based on std::time::SystemTime and std::time::Instant:
      • time::Time/time::PrimitiveDateTime use SystemTime: non-monotonic reads
      • time::Instant uses Instant: monotonic reads
    • focused on timezone-based "date/time" measurements, not interval/elapsed time
    • clock cannot be altered at all (no pause, no discrete updates)
  • clock:
    • based on std::time::SystemTime: non-monotonic reads
    • clock can be swapped (trait-based)
    • no free function for acquiring time
  • clocksource:
    • based on TSC w/ OS fallback; non-monotonic reads
    • clock cannot be altered at all (no pause, no discrete updates)
    • depends on unstable asm! macro + feature flag to enable TSC
    • no free function for acquiring time
  • pausable_clock:
    • based on std::time::Instant: monotonic reads
    • clock can be paused (time can be delayed, but not discretely updated)
    • no free function for acquiring time

license

quanta is licensed under the MIT license. (LICENSE or http://opensource.org/licenses/MIT)

quanta's People

Contributors

antifuchs, dependabot-preview[bot], dependabot[bot], fanatid, freemasen, gootorov, hawkw, loyd, luthaf, messense, nazar554, oherrala, paolobarbolini, seeekr, tedbyron, tobz, vargad


quanta's Issues

Check for RDTSCP support doesn't work

Hi, I faced a problem.
My VM had no support for RDTSCP, which resulted in the app crashing without a stack trace:

Main process exited, code=dumped, status=4/ILL

After looking into the core dump, I found that the cause is the dependency on rdtscp:

Program terminated with signal SIGILL, Illegal instruction.
#0  core::core_arch::x86::rdtsc::__rdtscp (aux=<optimized out>) at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/../../stdarch/crates/core_arch/src/x86/rdtsc.rs:49
49	/rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/../../stdarch/crates/core_arch/src/x86/rdtsc.rs: No such file or directory.

This became a problem with the introduction of metrics, which has quanta as a dependency. Enabling RDTSCP on the VM solved the problem, but I think quanta should handle this in a more user-friendly way.

Thank you in advance.
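For reference, the check in question boils down to CPUID leaf 0x80000001, EDX bit 27. An illustrative sketch of that detection (not quanta's internal code):

#[cfg(target_arch = "x86_64")]
fn cpu_supports_rdtscp() -> bool {
    use std::arch::x86_64::__cpuid;

    // Confirm the extended leaf exists before querying it.
    let max_extended_leaf = unsafe { __cpuid(0x8000_0000) }.eax;
    if max_extended_leaf < 0x8000_0001 {
        return false;
    }

    // CPUID.80000001H:EDX bit 27 advertises RDTSCP support.
    let features = unsafe { __cpuid(0x8000_0001) };
    (features.edx >> 27) & 1 == 1
}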

Windows monotonic measurements are slow as heck.

This is the output of the bench tests without asm:

running 13 tests
test bench::time_clocksource_counter       ... bench:     143,028 ns/iter (+/- 29,090)
test bench::time_clocksource_counter_delta ... bench:     293,744 ns/iter (+/- 58,375)
test bench::time_clocksource_time          ... bench:     149,426 ns/iter (+/- 93,349)
test bench::time_clocksource_time_delta    ... bench:     274,217 ns/iter (+/- 53,080)
test bench::time_hotmic_end                ... bench:     112,174 ns/iter (+/- 22,552)
test bench::time_hotmic_now                ... bench:     111,638 ns/iter (+/- 22,125)
test bench::time_hotmic_now_delta          ... bench:     217,484 ns/iter (+/- 37,775)
test bench::time_hotmic_raw                ... bench:     104,440 ns/iter (+/- 24,005)
test bench::time_hotmic_raw_delta          ... bench:     235,843 ns/iter (+/- 40,865)
test bench::time_hotmic_start              ... bench:     109,396 ns/iter (+/- 18,952)
test bench::time_hotmic_start_end_delta    ... bench:     248,919 ns/iter (+/- 348,278)
test bench::time_instant_delta             ... bench:     346,402 ns/iter (+/- 64,994)
test bench::time_instant_now               ... bench:     173,454 ns/iter (+/- 37,758)

With asm enabled, we get the expected performance out of rdtsc, but I really want to figure out if there's a better monotonic measurement on Windows... because it's dirt slow without rdtsc.

Drift from system time

My solution requires a high-performance timestamp (as a Unix epoch offset) with a precision of 10µs.

I recognise that I'm probably approaching this the wrong way, but it seems like Quanta keeps drifting away from the system clock.

The code below creates a t0 starting point from Quanta and a t0_sys starting point from SystemTime, and attempts to calculate the current system time based on those starting points, but the Quanta clock appears to drift further and further from the system time.

Is this a Quanta calibration issue, or is my solution just too naive?

use std::{thread, time::{Duration, SystemTime}};
use quanta::Clock;

fn main() {
    let clock = Clock::new();

    let t0_sys = SystemTime::now();
    let t0 = clock.now();

    loop {
        let now_sys = SystemTime::now();
        let now = clock.now();
        
        let calc = t0_sys + now.duration_since(t0);

        println!(" now = {:?}", now_sys);

        let drift = if calc < now_sys { now_sys.duration_since(calc).unwrap() } else { calc.duration_since(now_sys).unwrap() };
        println!("calc = {:?}  drift = {:?}  total elapsed = {:?}\n", calc, drift, t0.elapsed());
        
        thread::sleep(Duration::from_secs(5));
    }
}

Output (for 9 minutes):

 now = SystemTime { tv_sec: 1707962551, tv_nsec: 719348623 }
calc = SystemTime { tv_sec: 1707962551, tv_nsec: 719348610 }  drift = 13ns  total elapsed = 13.161µs

 now = SystemTime { tv_sec: 1707962556, tv_nsec: 719427601 }
calc = SystemTime { tv_sec: 1707962556, tv_nsec: 719843572 }  drift = 415.971µs  total elapsed = 5.000506946s

 now = SystemTime { tv_sec: 1707962561, tv_nsec: 719520138 }
calc = SystemTime { tv_sec: 1707962561, tv_nsec: 720352061 }  drift = 831.923µs  total elapsed = 10.001013009s

 now = SystemTime { tv_sec: 1707962566, tv_nsec: 719617435 }
calc = SystemTime { tv_sec: 1707962566, tv_nsec: 720865172 }  drift = 1.247737ms  total elapsed = 15.001527575s

 now = SystemTime { tv_sec: 1707962571, tv_nsec: 719717931 }
calc = SystemTime { tv_sec: 1707962571, tv_nsec: 721381604 }  drift = 1.663673ms  total elapsed = 20.002040832s

 now = SystemTime { tv_sec: 1707962576, tv_nsec: 719810160 }
calc = SystemTime { tv_sec: 1707962576, tv_nsec: 721889714 }  drift = 2.079554ms  total elapsed = 25.002547581s

 now = SystemTime { tv_sec: 1707962581, tv_nsec: 719881768 }
calc = SystemTime { tv_sec: 1707962581, tv_nsec: 722377122 }  drift = 2.495354ms  total elapsed = 30.003041242s

 now = SystemTime { tv_sec: 1707962586, tv_nsec: 719979596 }
calc = SystemTime { tv_sec: 1707962586, tv_nsec: 722890534 }  drift = 2.910938ms  total elapsed = 35.003548131s

 now = SystemTime { tv_sec: 1707962591, tv_nsec: 720072435 }
calc = SystemTime { tv_sec: 1707962591, tv_nsec: 723399050 }  drift = 3.326615ms  total elapsed = 40.00405675s

 now = SystemTime { tv_sec: 1707962596, tv_nsec: 720163419 }
calc = SystemTime { tv_sec: 1707962596, tv_nsec: 723905741 }  drift = 3.742322ms  total elapsed = 45.004564026s

 now = SystemTime { tv_sec: 1707962601, tv_nsec: 720250036 }
calc = SystemTime { tv_sec: 1707962601, tv_nsec: 724408080 }  drift = 4.158044ms  total elapsed = 50.00506554s

 now = SystemTime { tv_sec: 1707962606, tv_nsec: 720341067 }
calc = SystemTime { tv_sec: 1707962606, tv_nsec: 724914852 }  drift = 4.573785ms  total elapsed = 55.005572367s

 now = SystemTime { tv_sec: 1707962611, tv_nsec: 720431006 }
calc = SystemTime { tv_sec: 1707962611, tv_nsec: 725420448 }  drift = 4.989442ms  total elapsed = 60.006077376s

 now = SystemTime { tv_sec: 1707962616, tv_nsec: 720520919 }
calc = SystemTime { tv_sec: 1707962616, tv_nsec: 725925852 }  drift = 5.404933ms  total elapsed = 65.006585156s

 now = SystemTime { tv_sec: 1707962621, tv_nsec: 720608273 }
calc = SystemTime { tv_sec: 1707962621, tv_nsec: 726428696 }  drift = 5.820423ms  total elapsed = 70.007086931s

 now = SystemTime { tv_sec: 1707962626, tv_nsec: 720698650 }
calc = SystemTime { tv_sec: 1707962626, tv_nsec: 726934700 }  drift = 6.23605ms  total elapsed = 75.007591894s

 now = SystemTime { tv_sec: 1707962631, tv_nsec: 720787916 }
calc = SystemTime { tv_sec: 1707962631, tv_nsec: 727440931 }  drift = 6.653015ms  total elapsed = 80.008100868s

 now = SystemTime { tv_sec: 1707962636, tv_nsec: 720880483 }
calc = SystemTime { tv_sec: 1707962636, tv_nsec: 727950531 }  drift = 7.070048ms  total elapsed = 85.008609345s

 now = SystemTime { tv_sec: 1707962641, tv_nsec: 720966320 }
calc = SystemTime { tv_sec: 1707962641, tv_nsec: 728453200 }  drift = 7.48688ms  total elapsed = 90.009112827s

 now = SystemTime { tv_sec: 1707962646, tv_nsec: 721043183 }
calc = SystemTime { tv_sec: 1707962646, tv_nsec: 728946270 }  drift = 7.903087ms  total elapsed = 95.009604076s

 now = SystemTime { tv_sec: 1707962651, tv_nsec: 721138216 }
calc = SystemTime { tv_sec: 1707962651, tv_nsec: 729457356 }  drift = 8.31914ms  total elapsed = 100.010114365s

 now = SystemTime { tv_sec: 1707962656, tv_nsec: 721248755 }
calc = SystemTime { tv_sec: 1707962656, tv_nsec: 729983988 }  drift = 8.735233ms  total elapsed = 105.010641908s

 now = SystemTime { tv_sec: 1707962661, tv_nsec: 721338475 }
calc = SystemTime { tv_sec: 1707962661, tv_nsec: 730490203 }  drift = 9.151728ms  total elapsed = 110.01115683s

 now = SystemTime { tv_sec: 1707962666, tv_nsec: 721456914 }
calc = SystemTime { tv_sec: 1707962666, tv_nsec: 731024553 }  drift = 9.567639ms  total elapsed = 115.011682454s

 now = SystemTime { tv_sec: 1707962671, tv_nsec: 721551649 }
calc = SystemTime { tv_sec: 1707962671, tv_nsec: 731535516 }  drift = 9.983867ms  total elapsed = 120.01219395s

 now = SystemTime { tv_sec: 1707962676, tv_nsec: 721664798 }
calc = SystemTime { tv_sec: 1707962676, tv_nsec: 732064866 }  drift = 10.400068ms  total elapsed = 125.012722723s

 now = SystemTime { tv_sec: 1707962681, tv_nsec: 721752590 }
calc = SystemTime { tv_sec: 1707962681, tv_nsec: 732568835 }  drift = 10.816245ms  total elapsed = 130.013226457s

 now = SystemTime { tv_sec: 1707962686, tv_nsec: 721864067 }
calc = SystemTime { tv_sec: 1707962686, tv_nsec: 733096600 }  drift = 11.232533ms  total elapsed = 135.013758803s

 now = SystemTime { tv_sec: 1707962691, tv_nsec: 721962493 }
calc = SystemTime { tv_sec: 1707962691, tv_nsec: 733611001 }  drift = 11.648508ms  total elapsed = 140.014271895s

 now = SystemTime { tv_sec: 1707962696, tv_nsec: 722076686 }
calc = SystemTime { tv_sec: 1707962696, tv_nsec: 734141093 }  drift = 12.064407ms  total elapsed = 145.014801735s

 now = SystemTime { tv_sec: 1707962701, tv_nsec: 722164929 }
calc = SystemTime { tv_sec: 1707962701, tv_nsec: 734645171 }  drift = 12.480242ms  total elapsed = 150.015305174s

 now = SystemTime { tv_sec: 1707962706, tv_nsec: 722260082 }
calc = SystemTime { tv_sec: 1707962706, tv_nsec: 735156154 }  drift = 12.896072ms  total elapsed = 155.01581361s

 now = SystemTime { tv_sec: 1707962711, tv_nsec: 722352244 }
calc = SystemTime { tv_sec: 1707962711, tv_nsec: 735664343 }  drift = 13.312099ms  total elapsed = 160.01632217s

 now = SystemTime { tv_sec: 1707962716, tv_nsec: 722460704 }
calc = SystemTime { tv_sec: 1707962716, tv_nsec: 736188859 }  drift = 13.728155ms  total elapsed = 165.016848541s

 now = SystemTime { tv_sec: 1707962721, tv_nsec: 722546280 }
calc = SystemTime { tv_sec: 1707962721, tv_nsec: 736690460 }  drift = 14.14418ms  total elapsed = 170.017353324s

 now = SystemTime { tv_sec: 1707962726, tv_nsec: 722660744 }
calc = SystemTime { tv_sec: 1707962726, tv_nsec: 737220838 }  drift = 14.560094ms  total elapsed = 175.01788044s

 now = SystemTime { tv_sec: 1707962731, tv_nsec: 722753321 }
calc = SystemTime { tv_sec: 1707962731, tv_nsec: 737729302 }  drift = 14.975981ms  total elapsed = 180.018387307s

 now = SystemTime { tv_sec: 1707962736, tv_nsec: 722861646 }
calc = SystemTime { tv_sec: 1707962736, tv_nsec: 738253540 }  drift = 15.391894ms  total elapsed = 185.018912819s

 now = SystemTime { tv_sec: 1707962741, tv_nsec: 722947428 }
calc = SystemTime { tv_sec: 1707962741, tv_nsec: 738755227 }  drift = 15.807799ms  total elapsed = 190.019413896s

 now = SystemTime { tv_sec: 1707962746, tv_nsec: 723055532 }
calc = SystemTime { tv_sec: 1707962746, tv_nsec: 739279228 }  drift = 16.223696ms  total elapsed = 195.019936533s

 now = SystemTime { tv_sec: 1707962751, tv_nsec: 723162098 }
calc = SystemTime { tv_sec: 1707962751, tv_nsec: 739801828 }  drift = 16.63973ms  total elapsed = 200.020462181s

 now = SystemTime { tv_sec: 1707962756, tv_nsec: 723247690 }
calc = SystemTime { tv_sec: 1707962756, tv_nsec: 740303255 }  drift = 17.055565ms  total elapsed = 205.02096349s

 now = SystemTime { tv_sec: 1707962761, tv_nsec: 723335215 }
calc = SystemTime { tv_sec: 1707962761, tv_nsec: 740806776 }  drift = 17.471561ms  total elapsed = 210.021466438s

 now = SystemTime { tv_sec: 1707962766, tv_nsec: 723443518 }
calc = SystemTime { tv_sec: 1707962766, tv_nsec: 741331079 }  drift = 17.887561ms  total elapsed = 215.021989661s

 now = SystemTime { tv_sec: 1707962771, tv_nsec: 723549737 }
calc = SystemTime { tv_sec: 1707962771, tv_nsec: 741853312 }  drift = 18.303575ms  total elapsed = 220.022510808s

 now = SystemTime { tv_sec: 1707962776, tv_nsec: 723631714 }
calc = SystemTime { tv_sec: 1707962776, tv_nsec: 742351272 }  drift = 18.719558ms  total elapsed = 225.023009199s

 now = SystemTime { tv_sec: 1707962781, tv_nsec: 723719892 }
calc = SystemTime { tv_sec: 1707962781, tv_nsec: 742855697 }  drift = 19.135805ms  total elapsed = 230.023520487s

 now = SystemTime { tv_sec: 1707962786, tv_nsec: 723833750 }
calc = SystemTime { tv_sec: 1707962786, tv_nsec: 743385329 }  drift = 19.551579ms  total elapsed = 235.024042784s

 now = SystemTime { tv_sec: 1707962791, tv_nsec: 723939491 }
calc = SystemTime { tv_sec: 1707962791, tv_nsec: 743907064 }  drift = 19.967573ms  total elapsed = 240.024565823s

 now = SystemTime { tv_sec: 1707962796, tv_nsec: 724045277 }
calc = SystemTime { tv_sec: 1707962796, tv_nsec: 744428766 }  drift = 20.383489ms  total elapsed = 245.025087785s

 now = SystemTime { tv_sec: 1707962801, tv_nsec: 724131037 }
calc = SystemTime { tv_sec: 1707962801, tv_nsec: 744930446 }  drift = 20.799409ms  total elapsed = 250.025587826s

 now = SystemTime { tv_sec: 1707962806, tv_nsec: 724235352 }
calc = SystemTime { tv_sec: 1707962806, tv_nsec: 745450690 }  drift = 21.215338ms  total elapsed = 255.026108662s

 now = SystemTime { tv_sec: 1707962811, tv_nsec: 724339538 }
calc = SystemTime { tv_sec: 1707962811, tv_nsec: 745970845 }  drift = 21.631307ms  total elapsed = 260.026630324s

 now = SystemTime { tv_sec: 1707962816, tv_nsec: 724445564 }
calc = SystemTime { tv_sec: 1707962816, tv_nsec: 746492833 }  drift = 22.047269ms  total elapsed = 265.027150946s

 now = SystemTime { tv_sec: 1707962821, tv_nsec: 724531230 }
calc = SystemTime { tv_sec: 1707962821, tv_nsec: 746994508 }  drift = 22.463278ms  total elapsed = 270.027653986s

 now = SystemTime { tv_sec: 1707962826, tv_nsec: 724636581 }
calc = SystemTime { tv_sec: 1707962826, tv_nsec: 747515816 }  drift = 22.879235ms  total elapsed = 275.02817344s

 now = SystemTime { tv_sec: 1707962831, tv_nsec: 724739638 }
calc = SystemTime { tv_sec: 1707962831, tv_nsec: 748034862 }  drift = 23.295224ms  total elapsed = 280.028693258s

 now = SystemTime { tv_sec: 1707962836, tv_nsec: 724844064 }
calc = SystemTime { tv_sec: 1707962836, tv_nsec: 748555303 }  drift = 23.711239ms  total elapsed = 285.029214313s

 now = SystemTime { tv_sec: 1707962841, tv_nsec: 724932917 }
calc = SystemTime { tv_sec: 1707962841, tv_nsec: 749060269 }  drift = 24.127352ms  total elapsed = 290.029724703s

 now = SystemTime { tv_sec: 1707962846, tv_nsec: 725048188 }
calc = SystemTime { tv_sec: 1707962846, tv_nsec: 749591523 }  drift = 24.543335ms  total elapsed = 295.030253565s

 now = SystemTime { tv_sec: 1707962851, tv_nsec: 725161208 }
calc = SystemTime { tv_sec: 1707962851, tv_nsec: 750120569 }  drift = 24.959361ms  total elapsed = 300.030786318s

 now = SystemTime { tv_sec: 1707962856, tv_nsec: 725277412 }
calc = SystemTime { tv_sec: 1707962856, tv_nsec: 750652741 }  drift = 25.375329ms  total elapsed = 305.031315749s

 now = SystemTime { tv_sec: 1707962861, tv_nsec: 725372624 }
calc = SystemTime { tv_sec: 1707962861, tv_nsec: 751163943 }  drift = 25.791319ms  total elapsed = 310.031826795s

 now = SystemTime { tv_sec: 1707962866, tv_nsec: 725482145 }
calc = SystemTime { tv_sec: 1707962866, tv_nsec: 751689248 }  drift = 26.207103ms  total elapsed = 315.032347362s

 now = SystemTime { tv_sec: 1707962871, tv_nsec: 725584724 }
calc = SystemTime { tv_sec: 1707962871, tv_nsec: 752207880 }  drift = 26.623156ms  total elapsed = 320.032868903s

 now = SystemTime { tv_sec: 1707962876, tv_nsec: 725671064 }
calc = SystemTime { tv_sec: 1707962876, tv_nsec: 752710589 }  drift = 27.039525ms  total elapsed = 325.033372152s

 now = SystemTime { tv_sec: 1707962881, tv_nsec: 725764340 }
calc = SystemTime { tv_sec: 1707962881, tv_nsec: 753219758 }  drift = 27.455418ms  total elapsed = 330.033881672s

 now = SystemTime { tv_sec: 1707962886, tv_nsec: 725871355 }
calc = SystemTime { tv_sec: 1707962886, tv_nsec: 753743370 }  drift = 27.872015ms  total elapsed = 335.034403578s

 now = SystemTime { tv_sec: 1707962891, tv_nsec: 725946554 }
calc = SystemTime { tv_sec: 1707962891, tv_nsec: 754238903 }  drift = 28.292349ms  total elapsed = 340.034901387s

 now = SystemTime { tv_sec: 1707962896, tv_nsec: 726057245 }
calc = SystemTime { tv_sec: 1707962896, tv_nsec: 754768739 }  drift = 28.711494ms  total elapsed = 345.035432365s

 now = SystemTime { tv_sec: 1707962901, tv_nsec: 726150524 }
calc = SystemTime { tv_sec: 1707962901, tv_nsec: 755277930 }  drift = 29.127406ms  total elapsed = 350.035937817s

 now = SystemTime { tv_sec: 1707962906, tv_nsec: 726255630 }
calc = SystemTime { tv_sec: 1707962906, tv_nsec: 755799281 }  drift = 29.543651ms  total elapsed = 355.036461513s

 now = SystemTime { tv_sec: 1707962911, tv_nsec: 726364243 }
calc = SystemTime { tv_sec: 1707962911, tv_nsec: 756324124 }  drift = 29.959881ms  total elapsed = 360.036984313s

 now = SystemTime { tv_sec: 1707962916, tv_nsec: 726441390 }
calc = SystemTime { tv_sec: 1707962916, tv_nsec: 756817667 }  drift = 30.376277ms  total elapsed = 365.037481449s

 now = SystemTime { tv_sec: 1707962921, tv_nsec: 726523866 }
calc = SystemTime { tv_sec: 1707962921, tv_nsec: 757316302 }  drift = 30.792436ms  total elapsed = 370.037977773s

 now = SystemTime { tv_sec: 1707962926, tv_nsec: 726611141 }
calc = SystemTime { tv_sec: 1707962926, tv_nsec: 757819806 }  drift = 31.208665ms  total elapsed = 375.038480758s

 now = SystemTime { tv_sec: 1707962931, tv_nsec: 726717065 }
calc = SystemTime { tv_sec: 1707962931, tv_nsec: 758342011 }  drift = 31.624946ms  total elapsed = 380.039003595s

 now = SystemTime { tv_sec: 1707962936, tv_nsec: 726826375 }
calc = SystemTime { tv_sec: 1707962936, tv_nsec: 758867631 }  drift = 32.041256ms  total elapsed = 385.039530785s

 now = SystemTime { tv_sec: 1707962941, tv_nsec: 726914121 }
calc = SystemTime { tv_sec: 1707962941, tv_nsec: 759371518 }  drift = 32.457397ms  total elapsed = 390.040032136s

 now = SystemTime { tv_sec: 1707962946, tv_nsec: 727016646 }
calc = SystemTime { tv_sec: 1707962946, tv_nsec: 759890241 }  drift = 32.873595ms  total elapsed = 395.0405477s

 now = SystemTime { tv_sec: 1707962951, tv_nsec: 727095395 }
calc = SystemTime { tv_sec: 1707962951, tv_nsec: 760385210 }  drift = 33.289815ms  total elapsed = 400.041043593s

 now = SystemTime { tv_sec: 1707962956, tv_nsec: 727196072 }
calc = SystemTime { tv_sec: 1707962956, tv_nsec: 760902134 }  drift = 33.706062ms  total elapsed = 405.041563365s

 now = SystemTime { tv_sec: 1707962961, tv_nsec: 727282692 }
calc = SystemTime { tv_sec: 1707962961, tv_nsec: 761405032 }  drift = 34.12234ms  total elapsed = 410.042068703s

 now = SystemTime { tv_sec: 1707962966, tv_nsec: 727364747 }
calc = SystemTime { tv_sec: 1707962966, tv_nsec: 761903292 }  drift = 34.538545ms  total elapsed = 415.04256442s

 now = SystemTime { tv_sec: 1707962971, tv_nsec: 727447152 }
calc = SystemTime { tv_sec: 1707962971, tv_nsec: 762401816 }  drift = 34.954664ms  total elapsed = 420.043060108s

 now = SystemTime { tv_sec: 1707962976, tv_nsec: 727542869 }
calc = SystemTime { tv_sec: 1707962976, tv_nsec: 762913801 }  drift = 35.370932ms  total elapsed = 425.043576301s

 now = SystemTime { tv_sec: 1707962981, tv_nsec: 727629679 }
calc = SystemTime { tv_sec: 1707962981, tv_nsec: 763416689 }  drift = 35.78701ms  total elapsed = 430.044077332s

 now = SystemTime { tv_sec: 1707962986, tv_nsec: 727704420 }
calc = SystemTime { tv_sec: 1707962986, tv_nsec: 763907663 }  drift = 36.203243ms  total elapsed = 435.044568185s

 now = SystemTime { tv_sec: 1707962991, tv_nsec: 727787001 }
calc = SystemTime { tv_sec: 1707962991, tv_nsec: 764406485 }  drift = 36.619484ms  total elapsed = 440.045068706s

 now = SystemTime { tv_sec: 1707962996, tv_nsec: 727894123 }
calc = SystemTime { tv_sec: 1707962996, tv_nsec: 764929873 }  drift = 37.03575ms  total elapsed = 445.045593969s

 now = SystemTime { tv_sec: 1707963001, tv_nsec: 727980439 }
calc = SystemTime { tv_sec: 1707963001, tv_nsec: 765432400 }  drift = 37.451961ms  total elapsed = 450.046092337s

 now = SystemTime { tv_sec: 1707963006, tv_nsec: 728080601 }
calc = SystemTime { tv_sec: 1707963006, tv_nsec: 765948898 }  drift = 37.868297ms  total elapsed = 455.046609879s

 now = SystemTime { tv_sec: 1707963011, tv_nsec: 728163762 }
calc = SystemTime { tv_sec: 1707963011, tv_nsec: 766448113 }  drift = 38.284351ms  total elapsed = 460.047105949s

 now = SystemTime { tv_sec: 1707963016, tv_nsec: 728265499 }
calc = SystemTime { tv_sec: 1707963016, tv_nsec: 766966179 }  drift = 38.70068ms  total elapsed = 465.04762976s

 now = SystemTime { tv_sec: 1707963021, tv_nsec: 728349028 }
calc = SystemTime { tv_sec: 1707963021, tv_nsec: 767465873 }  drift = 39.116845ms  total elapsed = 470.048124905s

 now = SystemTime { tv_sec: 1707963026, tv_nsec: 728431554 }
calc = SystemTime { tv_sec: 1707963026, tv_nsec: 767964685 }  drift = 39.533131ms  total elapsed = 475.048627962s

 now = SystemTime { tv_sec: 1707963031, tv_nsec: 728519357 }
calc = SystemTime { tv_sec: 1707963031, tv_nsec: 768468693 }  drift = 39.949336ms  total elapsed = 480.049127778s

 now = SystemTime { tv_sec: 1707963036, tv_nsec: 728617802 }
calc = SystemTime { tv_sec: 1707963036, tv_nsec: 768983387 }  drift = 40.365585ms  total elapsed = 485.049645546s

 now = SystemTime { tv_sec: 1707963041, tv_nsec: 728700575 }
calc = SystemTime { tv_sec: 1707963041, tv_nsec: 769482394 }  drift = 40.781819ms  total elapsed = 490.050143427s

 now = SystemTime { tv_sec: 1707963046, tv_nsec: 728785953 }
calc = SystemTime { tv_sec: 1707963046, tv_nsec: 769983997 }  drift = 41.198044ms  total elapsed = 495.050645064s

 now = SystemTime { tv_sec: 1707963051, tv_nsec: 728867235 }
calc = SystemTime { tv_sec: 1707963051, tv_nsec: 770481686 }  drift = 41.614451ms  total elapsed = 500.051142864s

 now = SystemTime { tv_sec: 1707963056, tv_nsec: 728969667 }
calc = SystemTime { tv_sec: 1707963056, tv_nsec: 771000717 }  drift = 42.03105ms  total elapsed = 505.051662541s

 now = SystemTime { tv_sec: 1707963061, tv_nsec: 729050049 }
calc = SystemTime { tv_sec: 1707963061, tv_nsec: 771497556 }  drift = 42.447507ms  total elapsed = 510.052158226s

 now = SystemTime { tv_sec: 1707963066, tv_nsec: 729146029 }
calc = SystemTime { tv_sec: 1707963066, tv_nsec: 772009898 }  drift = 42.863869ms  total elapsed = 515.052667822s

 now = SystemTime { tv_sec: 1707963071, tv_nsec: 729220911 }
calc = SystemTime { tv_sec: 1707963071, tv_nsec: 772501025 }  drift = 43.280114ms  total elapsed = 520.053158718s

 now = SystemTime { tv_sec: 1707963076, tv_nsec: 729314811 }
calc = SystemTime { tv_sec: 1707963076, tv_nsec: 773011208 }  drift = 43.696397ms  total elapsed = 525.053672783s

 now = SystemTime { tv_sec: 1707963081, tv_nsec: 729396483 }
calc = SystemTime { tv_sec: 1707963081, tv_nsec: 773509236 }  drift = 44.112753ms  total elapsed = 530.054172296s

 now = SystemTime { tv_sec: 1707963086, tv_nsec: 729481277 }
calc = SystemTime { tv_sec: 1707963086, tv_nsec: 774010302 }  drift = 44.529025ms  total elapsed = 535.054669551s

 now = SystemTime { tv_sec: 1707963091, tv_nsec: 729557816 }
calc = SystemTime { tv_sec: 1707963091, tv_nsec: 774503205 }  drift = 44.945389ms  total elapsed = 540.055163203s

Create a torture test application to help validate behavior

As evidenced by #17, the act of using the Time Stamp Counter can be rife with corner cases that make it hard to reason about how intended behavior will diverge from actual behavior. As well, we lack a consistent way to attempt to validate the information we do find: if the Intel SDM tells us something should function a certain way, how do we try and verify that in practice?

Given quanta's overarching goal of providing a safe and fast interface to the TSC/RDTSC, we should create a torture test application that can be used to surface bugs and inconsistencies.

Specifically, given the concern about TSC synchronization between cores, and how that would affect timekeeping math, a good variant of this hypothetical torture test app would be one where multiple threads are taking measurements, yielding, doing something that hopefully gets them rescheduled on different cores, and taking a follow up measurement, looking for warps in time. Potentially even some sort of Dining Philosophers type of thing where values are sent between threads specifically to attempt to exacerbate being scheduled on another core, or another socket.
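A minimal sketch of that kind of torture test (illustrative only; the thread count and iteration count are arbitrary):

use std::thread;
use quanta::Clock;

fn main() {
    let clock = Clock::new();
    let workers: Vec<_> = (0..8)
        .map(|id| {
            let clock = clock.clone();
            thread::spawn(move || {
                let mut warps = 0u64;
                for _ in 0..1_000_000 {
                    let start = clock.now();
                    // Yield in the hope of being rescheduled on another core.
                    thread::yield_now();
                    let end = clock.now();
                    if end < start {
                        warps += 1;
                    }
                }
                (id, warps)
            })
        })
        .collect();

    for worker in workers {
        let (id, warps) = worker.join().unwrap();
        println!("worker {id}: observed {warps} backwards time warps");
    }
}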

Does it make sense to pin `Upkeep` thread?

My knowledge regarding timestamp counters is limited. However, the discussions in #17 and #61 left me wondering - does it make sense to pin the Upkeep thread and prevent it from moving to other cores/sockets?

Since Upkeep can move freely, and it calls Clock::now() periodically, is it possible to get a wildly different raw value after rescheduling, resulting in a less-than-ideal measurement even after scaling? Would pinning improve accuracy (at least theoretically), or is it not a concern for some other reason?

Should quanta read the Time Stamp Counter at all?

In #16, we explored the inherent sources of potential error for quanta, given that it reaches out to the processor directly and carries none of the calibration/edge-case handling that a hardened consumer, such as the OS, would provide.

None of this is to say that we couldn't do some of this work ourselves. Similarly, I don't believe quanta is grossly inaccurate; it shouldn't be. We do have to admit, though, that using quanta as a drop-in replacement for Instant outside of timing extremely hot loops may not be the most correct choice where both high resolution (nanoseconds) and high accuracy are required.

With all of that said, we have some potential options that I've been thinking of, and this is not necessarily an exhaustive list:

  1. we could only ever use the reference clock, and depend on OS-specific guarantees as our base assumptions (this lets us keep mockability, recent time, etc)
  2. we can bake in our own calibration and try to account for all of the edge cases
  3. we could bifurcate now and raw such that we still allow raw measurements, and could attempt to scale them to reference time, but we transform now into the more generalized/stable measurement

My belief is that option 2 is achievable, but it would take a decently large effort to test all of this behavior, read existing sources, distill that into quanta, etc. For example, how do we deal with the potential TSC offset across cores/sockets? Can the same solution for "different cores" be applied to "different sockets"? How do we deal with this across multiple OSes, where I may not have any source to read to understand how they handle TSC synchronization?

Providing as much information as we can to users of the crate might be beneficial enough to let them make a smarter choice about where these pitfalls lie. For example, if we can show that Linux does a damn good job of synchronizing the TSC no matter the processor configuration, then exposing that to the user means they can decide whether to roll with the fast-path TSC or have quanta fall back to OS-based timing.

Long story short, we have some things to explore and think about!

does not build on MIPS

I can't build quanta 0.3.1 with the mipsel-unknown-linux-musl target. Should it still build and fall back to OS facilities?
Is there any workaround to use it with metrics-runtime?

Found this in the docs for std::sync::atomic:

PowerPC and MIPS platforms with 32-bit pointers do not have AtomicU64 or AtomicI64 types.

error[E0432]: unresolved import `std::sync::atomic::AtomicU64`
  --> /cargo/registry/src/github.com-1ecc6299db9ec823/quanta-0.3.1/src/lib.rs:51:14
   |
51 |     atomic::{AtomicU64, Ordering},
   |              ^^^^^^^^^
   |              |
   |              no `AtomicU64` in `sync::atomic`
   |              help: a similar name exists in the module: `AtomicU8`

error[E0432]: unresolved import `std::sync::atomic::AtomicU64`
 --> /cargo/registry/src/github.com-1ecc6299db9ec823/quanta-0.3.1/src/mock.rs:5:18
  |
5 |         atomic::{AtomicU64, Ordering},
  |                  ^^^^^^^^^
  |                  |
  |                  no `AtomicU64` in `sync::atomic`
  |                  help: a similar name exists in the module: `AtomicU8`

error: aborting due to 2 previous errors

Testing for all supported platforms.

We should invest some time in seeing if we can get our test suite to actually run for all platforms that we target/support.

We already target Linux, Windows, and macOS on x86_64, which is good. We should theoretically be able to target WASM/WASI via cargo-bindgen-wasm and cargo-wasi, and I think those could just execute on Linux x86_64 runners.

The trickier bit is if we wanted to test on another CPU architecture, as the free-tier GitHub Actions runners have no ARM/ARM64 support, and setting up our own runners would be difficult/cost-prohibitive... unless there's some sort of third-party runner service I don't know about.

Allow Instant to be used atomically

I have a use case in a rate limiter where I need an atomic instant. There are two solutions that I can think of:

  • Add the ability for the user to convert the instant to and from a raw u64. I think this ability used to exist but was removed (as_unix_duration).
  • Add an AtomicInstant type that takes and returns Instant, with load and set methods, etc.

Currently a user can transmute the Instant into a u64 to emulate the first solution, but that is obviously unsound.
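One interim workaround sketch (not the proposal above; the type and field names here are made up): store raw u64 readings from Clock::raw in an AtomicU64 and convert deltas back with Clock::delta when needed.

use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;
use quanta::Clock;

struct RateLimiterState {
    clock: Clock,
    last_refill_raw: AtomicU64,
}

impl RateLimiterState {
    fn new(clock: Clock) -> Self {
        let now = clock.raw();
        Self { clock, last_refill_raw: AtomicU64::new(now) }
    }

    /// Returns how long it has been since the last refill, and records "now".
    fn elapsed_since_last_refill(&self) -> Duration {
        let now = self.clock.raw();
        let last = self.last_refill_raw.swap(now, Ordering::AcqRel);
        self.clock.delta(last, now)
    }
}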

Unclear how to use Mock from rustdoc

Hi there, I just stumbled over quanta in search of a mockable time source for my crate, https://crates.io/crates/ratelimit_meter - it seems like a great match for what I'm trying to do over there!

Now, I'd like to figure out if I could make a time source for the rate-limiter that can be mocked for tests, and just reading the rustdoc for Mock, I am not sure how to proceed. Do you have an example that I could steal^H^H^H^H^Hreference? (:

Thanks!

Populate recent time once before starting upkeep thread.

During some testing in #92, it was discovered that the initial call to set the recent time can take unexpectedly long, leaving the initial recent time at 0. That, in turn, can lead to the first delta taken with recent time (i.e. let start = Clock::recent(); ...; let delta = start.elapsed()) being unexpectedly large.

We should do a single call to populate the recent time before we spawn the actual upkeep thread, so that we're anchored as closely as we possibly can be to the true time, rather than starting from an initial value of zero.
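For context, a sketch of the affected pattern, using the current Upkeep / Clock::recent API:

use std::time::Duration;
use quanta::{Clock, Upkeep};

fn main() {
    // Keep the handle alive; dropping it stops the upkeep thread.
    let _handle = Upkeep::new(Duration::from_millis(10))
        .start()
        .expect("failed to start upkeep thread");

    let clock = Clock::new();

    // If the cached "recent" value has not been populated yet, this reads as
    // zero, and the delta below comes out unexpectedly large.
    let start = clock.recent();
    std::thread::sleep(Duration::from_millis(50));
    println!("coarse delta: {:?}", start.elapsed());
}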

Adding support for TSC on ARM.

A user recently asked whether quanta has ARM support for the TSC and what it would take to add it. It felt like a good time to open an issue and revisit the question.

At a high level, ARM doesn't have exactly the same instruction for reading it directly, but it has a similar instruction -- mrs -- which can read.. coprocessor registers? And apparently there's a coprocessor register for a counter that works in the same way as the TSC does in x86: cntvct_el0. Additionally, there's also another coprocessor register that apparently holds the true TSC frequency: cntfrq_el0.

Additionally, I had mistakenly thought this was only doable when the process is allowed to run privileged instructions, a la reading certain Intel performance MSRs requiring root on Linux. It turns out that this is not the case.

The major thing we're lacking at the moment is a stable way to run the mrs instruction on ARM. There's no intrinsic for it in core::arch, and even the existing ARM intrinsics are themselves all still unstable. Likewise, asm! for writing assembly directly is also still unstable.

Once either of those approaches becomes stable, we can investigate doing an initial attempt to support a TSC-like mode on ARM.

Rust asm! tracking issue: rust-lang/rust#72016
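For illustration only (asm! has since been stabilized, in Rust 1.59), reading those registers might look roughly like the following; this is not quanta code:

#[cfg(target_arch = "aarch64")]
fn read_virtual_counter() -> (u64, u64) {
    let (ticks, freq): (u64, u64);
    unsafe {
        // cntvct_el0 is the virtual counter; cntfrq_el0 is its frequency in Hz.
        std::arch::asm!("mrs {t}, cntvct_el0", t = out(reg) ticks);
        std::arch::asm!("mrs {f}, cntfrq_el0", f = out(reg) freq);
    }
    (ticks, freq)
}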

Switch to CLOCK_BOOTTIME and friends to improve accuracy.

Per the discussion happening on rust-lang/rust#88714, there's a meaningful difference between CLOCK_BOOTTIME and CLOCK_MONOTONIC when it comes to time across system suspends on Linux. According to the issue, the problem they're trying to solve is that CLOCK_MONOTONIC stops ticking during suspend, while CLOCK_BOOTTIME does not. This raises two problems for quanta:

Monotonic mode

When invariant TSC support is not detected, we fall back to the "monotonic" mode where we query the time directly. This is all fine and good, but we're also querying with CLOCK_MONOTONIC (and similar variants on other platforms), which leaves us open to the exact same problem described in the above issue.

Counter (TSC) mode

While I have not fully traced whether or not this matters, there's a potential reality where CLOCK_MONOTONIC stops ticking during lower CPU power states, such that as we go through the calibration loop, our reference drifts with every iteration. While the invariant TSC should be guaranteed to tick at a constant rate -- recent Intel manuals specifically use the language "The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states." -- this is moot if our initial reference/source calibration is off, as we need that calibration in order to go from TSC cycles to real time units.

At any rate, switching shouldn't do anything but make things more accurate. To reference the issue again, though, there are also some concerns about when support for it was introduced and on which platforms it matters. With that in mind, we likely need to wait for that PR to shake out, so that we have a good example of where we'll need to make our changes.
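For reference, querying CLOCK_BOOTTIME directly looks roughly like this (an illustrative sketch assuming the libc crate, not quanta's implementation):

#[cfg(target_os = "linux")]
fn boottime_nanos() -> u64 {
    let mut ts = libc::timespec { tv_sec: 0, tv_nsec: 0 };
    // SAFETY: clock_gettime only writes into the provided timespec.
    let ret = unsafe { libc::clock_gettime(libc::CLOCK_BOOTTIME, &mut ts) };
    assert_eq!(ret, 0, "clock_gettime(CLOCK_BOOTTIME) failed");
    (ts.tv_sec as u64) * 1_000_000_000 + (ts.tv_nsec as u64)
}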

Always gives the time since last boot

I don't know if I'm misunderstanding something, but Instant::now().as_unix_duration() should return a Duration since the UNIX epoch, right? It always gives me the time since my last boot.

Code:

use quanta::Instant;
fn main() {
    println!("{:#?}", Instant::now().as_unix_duration());
}

Result: a duration roughly equal to the time since the last boot (screenshot omitted).

Environment:
rustc 1.49.0 (e1884a8e3 2020-12-29)
cargo 1.49.0 (d00d64df9 2020-12-05)
5.12.8-arch1-1 ArchLinux
AMD 5950x

Intermittent panic due to overflowing our source calibration denominator.

This was reported in an old issue in the metrics crate -- metrics-rs/metrics#230 -- but essentially the user was hitting the line where we hypothesized that something was amiss if we managed to wrap around when calling u64::next_power_of_two.

Thinking on this some more, one potential explanation is that the user had different TSC offsets on different cores, and somehow they were hitting a case where the source measurement happening in adjust_cal_ratio was actually on a core where the TSC value was smaller than the value taken initially via Counter::now in calibrate.

That would seemingly explain how the end - start calculation could yield a number that would cause next_power_of_two to overflow, and as long as the absolute delta was smaller than (2^64)-(2^63), we'd always end up with a value that would trigger that overflow.

The bigger question might be: why did this user's set-up somehow manage to trigger this behavior, even at the quoted "maybe once out of 20 times" rate, when quanta is used in many applications that never seem to experience this?
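A tiny standalone illustration of the hypothesized failure mode (not quanta's actual code): if the later reading is smaller than the earlier one, the wrapping difference is enormous and next_power_of_two overflows.

fn main() {
    let start: u64 = 1_000_000;
    let end: u64 = 999_000; // end < start, e.g. due to per-core TSC offsets

    // Wrapping subtraction turns a small negative delta into ~2^64.
    let delta = end.wrapping_sub(start);

    // next_power_of_two panics on overflow in debug builds and returns 0 in
    // release builds, since the result would exceed u64::MAX.
    let denominator = delta.next_power_of_two();
    println!("delta = {delta}, denominator = {denominator}");
}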

Upkeep's Error is not accessible

I'm finally getting around to updating governor to use quanta 0.6.0 - unfortunately, it looks like the Error returned when starting an Upkeep is not accessible (it's in a private mod and not pub used in lib.rs), so it can't be used in function signatures.

Could you publicly export it?

Fix benchmark GH action.

Currently, the action is broken (although it worked before? hmmm) with what looks to be some sort of permission error that prevents the action from pushing to GH Pages. This causes the bench steps in CI to always fail, which is annoying on PRs.

Can Clock provide `delta_as_nanos`?

In governor, I am using a raw nanoseconds representation for quanta readings, which means that every time I take a clock reading, quanta converts the nanos value into a std::time::Duration, which I then convert back into nanos. That costs us quite a bit of performance, as well as being so dissatisfyingly close to what I want: you already have the right kind of value, can I just get it without costly conversions? (-:
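To make the round trip concrete, a sketch (delta_as_nanos is the method being requested here, not an existing one):

use quanta::Clock;

fn main() {
    let clock = Clock::new();
    let (start, end) = (clock.raw(), clock.raw());

    // Today: raw readings -> Duration -> back to a nanos value.
    let nanos_via_duration = clock.delta(start, end).as_nanos() as u64;

    // Requested: a direct raw-to-nanos path, e.g.
    // let nanos = clock.delta_as_nanos(start, end);
    println!("{nanos_via_duration} ns");
}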

Attempting to start Upkeep always returns an UpkeepRunning error

Even given the below minimal example:

use quanta::Upkeep;
use std::time::Duration;

fn main() {
    Upkeep::new(Duration::from_millis(250)).start().unwrap();
}

This will fail with:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UpkeepRunning', src/main.rs:5:5

There are no other threads, and I have not started another Upkeep.
Perhaps I am just using it wrong, but there aren't any examples. I apologize if this ticket is erroneous.

This is with quanta 0.6.4, tested on macOS.

Types for measurements?

I'm looking into quanta for a clock implementation in my library, and I'm somewhat confused about how to use it safely, especially with the optimized clock methods that all return u64.

To help me understand the various interconnected pieces, I made the following - maybe it's useful for somebody, or maybe it can be included in quanta as a module?

#![cfg_attr(feature = "cargo-clippy", deny(warnings))]

use std::sync::Arc;
use std::time::Duration;

#[repr(transparent)]
pub struct RawMeasurement(u64);

impl RawMeasurement {
    pub unsafe fn raw_value(&self) -> u64 {
        self.0
    }
}

#[repr(transparent)]
pub struct ReferenceTime(u64);

impl ReferenceTime {
    pub fn to_nanos(&self) -> u64 {
        self.0
    }

    pub fn to_duration(&self) -> Duration {
        Duration::from_nanos(self.0)
    }
}

#[repr(transparent)]
pub struct Clock(quanta::Clock);

impl Clock {
    pub fn new() -> Clock {
        Clock(quanta::Clock::new())
    }

    pub fn mock() -> (Clock, Arc<quanta::Mock>) {
        let (qc, mock) = quanta::Clock::mock();
        (Clock(qc), mock)
    }

    pub fn now(&self) -> ReferenceTime {
        ReferenceTime(self.0.now())
    }

    pub fn start(&self) -> RawMeasurement {
        RawMeasurement(self.0.start())
    }

    pub fn end(&self) -> RawMeasurement {
        RawMeasurement(self.0.end())
    }

    pub fn scaled(&self, raw: RawMeasurement) -> ReferenceTime {
        ReferenceTime(self.0.scaled(raw.0))
    }

    pub fn delta(&self, start: RawMeasurement, end: RawMeasurement) -> ReferenceTime {
        ReferenceTime(self.0.delta(start.0, end.0))
    }

    pub fn recent(&self) -> ReferenceTime {
        ReferenceTime(self.0.recent())
    }
}

Main questions are:

  • Is the type structure above correct?
  • Is that even useful?

Fails to build on x86 without SSE2

FreeBSD, NetBSD, and OpenBSD still support pre-SSE2 CPUs, so the downstream Rust package for 32-bit x86 (aka i386) targets real i686 (aka pentiumpro) or i586 (aka pentium). To reproduce on Linux (via rustup), pass --target i586-unknown-linux-gnu or similar.

$ cargo build
[...]
   Compiling quanta v0.10.0 (/tmp/quanta)
error[E0599]: no function or associated item named `new` found for struct `CpuId` in the current scope
   --> src/lib.rs:558:24
    |
558 |     let cpuid = CpuId::new();
    |                        ^^^ function or associated item not found in `CpuId`

error[E0599]: no function or associated item named `new` found for struct `CpuId` in the current scope
   --> src/lib.rs:566:24
    |
566 |     let cpuid = CpuId::new();
    |                        ^^^ function or associated item not found in `CpuId`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `quanta` due to 2 previous errors
warning: build failed, waiting for other jobs to finish...

Commit for version 0.12.3 it not reachable from main branch

I've just picked up the new version and decided to go check the changelog. I opened https://github.com/metrics-rs/quanta/blob/main/CHANGELOG.md, and the new version was not there!

It was confusing, even worrisome for a second: what if your credentials were stolen or something like that? Can you please move the main branch to 278b3b6 or later? At the moment it is pointing to 8c14d6e.

If anyone wonders whether 0.12.3 is a genuine release: the tag v0.12.3 is present, and the diff between the tarballs for 0.12.2 and 0.12.3 matches the changes in 267effa, 702ea18 and fe3bb3d.

Calibration is broken after switching away from feature flags.

When we switched away from using the tsc feature, we forgot to update Clock::new.

Thus, when using the underlying RDTSC counter mode, we're initializing our Clock with an "identical" calibration, aka no calibration at all.

We could fix this by either replacing the cfg attributes with the equivalent ones used to enable/disable the RDTSC counter code, or we could always just trigger a calibration and call it a day.

Either way, calibration is very broken.

Provide global shared calibration?

In magnet/metered-rs#21, it was brought up that the calibration process for Clock takes one second, which can be a surprise to folks, especially if they're creating brand-new Clock objects very often.

While addressing the calibration time is a separate issue, I believe it's possible for us to provide an out-of-the-box solution to sharing calibration time between instances of Clock when none is provided specifically.

This should be more performant (one calibration vs N) as well as safe: given that configuration is handled at compile-time, all non-mock Clock instances should be able to share the exact same calibration.

CLOCK_REALTIME support?

I'm curious what your thoughts are on adding CLOCK_REALTIME support as a very fast way to get calibrated wall-clock time from the TSC.

My initial use case is on an AMD system where faulty firmware has caused Linux to fall back to the slow HPET, despite it seemingly being possible to get good-enough results from the TSC. But it may be nice just to avoid clock_gettime overhead in other cases too.

I switched the clock myself, and the calibration looks good but maybe I'm missing something.

Clean up our wild use of conditional compilation.

Conditional compilation is a necessity for targeting different platforms, although at this point, we've found ourselves down quite the rabbit hole in terms of how Cargo.toml and src/monotonic.rs look.

We should spend some time trying to clean this up where we can. On the code side, we might be better off splitting out the various architecture-specific implementations of Monotonic into dedicated files. As far as Cargo.toml, that one is dicey, but perhaps there's some magic we can exploit there to do it in a cleaner fashion.

TSC always disabled on AMD

I wanted to use quanta on my AMD Ryzen 4700U laptop running Linux, but it seems that has_constant_or_better_tsc always returns false on AMD hardware because the has_multiple_sockets check is hard-coded to return true. read_cpuid_nonstop_tsc returns the correct result, but that check is never reached.

I've forced constant TSC support and quanta successfully calibrates and seems to be working fine, so the fix should maybe be to trust the result of read_cpuid_nonstop_tsc.

Figure out a way to conditionally enable RDTSC support in clocksource for benches

Right now, we can only (easily) turn on ASM support for RDTSC in quanta by using normal features, but we can't easily also turn it on for clocksource, which makes it harder to do proper side-by-side benching.

We should try and see if there's a way to make this possible... whether it's through Cargo.toml modifications, or some special nightly support in Cargo.

Can `Instant`s be `NonZeroU64`?

Currently, quanta::Instant is represented as a single u64 value. Will Instants ever be 0? If we're reasonably confident that Instants with the value 0 will never be generated, we may want to consider changing the internal representation to NonZeroU64. This will permit Option<Instant> to be niche optimized into a single 64-bit value.

My particular use case for this is that I'd like to be able to store an Instant in an AtomicU64, and some of those instants may be initially unset. With the current quanta API, I can implement this myself using Instant::as_u64, and using 0 as the unset value in my code.

However, my understanding is that, in quanta 1.0, the intention is to make Instant opaque and remove the as_u64 method. This means it will be necessary to switch to crossbeam_utils' AtomicCell type to store Instants atomically. When using AtomicCell with opaque Instant types, there's no way to initialize those cells to an "empty" value. I could use a specific "program start time" Instant as the zero value, but it would have to be passed around to a lot of places, making this code significantly more awkward.

Instead, it would be really nice to be able to use AtomicCell<Option<Instant>> and have it be lock-free on platforms with 64-bit atomics. This would require that Option<Instant> occupy a single 64-bit word, which is only possible if Instant is represented as NonZeroU64.
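For reference, a quick demonstration of the niche optimization in question:

use std::mem::size_of;
use std::num::NonZeroU64;

fn main() {
    // With a NonZeroU64 payload, Option can use 0 as the None niche.
    assert_eq!(size_of::<Option<NonZeroU64>>(), 8);
    // With a plain u64 payload, Option needs a separate discriminant.
    assert_eq!(size_of::<Option<u64>>(), 16);
    println!("Option<NonZeroU64> fits in a single 64-bit word");
}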

Unify global recent / global clock usage.

During the work to expose more free functions on Instant and add mock ability, we created a small rift between Clock and Instant.

Primarily, we have three issues:

  • Instant::recent will fallback to the global clock, while Clock::recent won't
  • Instant::recent is an acquire load, while Clock::recent is a relaxed load
  • quanta::set_recent and Clock::upkeep aren't DRY (small problem, but still)

We should unify these codepaths so that we're limiting the chance for differences in behavior, as well as switching entirely to relaxed loads for "recent" time to ensure maximum performance.

Quanta Clock is off by a factor of 1000 in web browsers

window.performance.now() returns milliseconds. Since Instant is in nanoseconds, the value needs to be multiplied by 1,000,000, not 1,000.

Example to demonstrate the problem:

#[wasm_bindgen]
pub struct TestQuanta
{
    last_time: quanta::Instant
}

#[wasm_bindgen]
impl TestQuanta
{
    #[wasm_bindgen(constructor)]
    pub fn new() -> Self
    {
        Self{last_time: quanta::Instant::now()}
    }

    #[wasm_bindgen(method)]
    pub fn time_since_last_call(&mut self) -> f64
    {
        let now = quanta::Instant::now();
        let elapsed = now.duration_since(self.last_time);
        self.last_time = now;
        elapsed.as_millis() as f64
    }
}
let test_quanta = new TestQuanta();
setInterval(() => console.log(`elapsed: ${test_quanta.time_since_last_call()}ms`), 3000);


Expected: log something around 3000ms, as interval is called every 3000ms.

Speed up calibration?

In magnet/metered-rs#21, it was noted that the calibration of Clock takes one second, which may come as a surprise to users. This is totally fair, especially since it's not documented.

Are there alternative calibration approaches we can take to avoid spending a full second of undocumented time while still achieving an accurate calibration?

One idea: loop until we have a statistically significant number of measurements and the deviation of those measurements has stabilized, falling back to a maximum amount of time spent, at which point we would use the current calibration logic.
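A rough sketch of that adaptive idea (illustrative only; sample_ratio, the thresholds, and the time cap are all hypothetical placeholders):

use std::time::{Duration, Instant};

fn calibrate<F: Fn() -> f64>(sample_ratio: F) -> f64 {
    const MAX_TIME: Duration = Duration::from_secs(1); // current worst case
    const MIN_SAMPLES: usize = 32;
    const STABILITY: f64 = 0.0001; // bail out once the mean moves < 0.01%

    let start = Instant::now();
    let mut mean = sample_ratio();
    let mut samples = 1usize;

    while start.elapsed() < MAX_TIME {
        let next = sample_ratio();
        samples += 1;
        let new_mean = mean + (next - mean) / samples as f64;
        if samples >= MIN_SAMPLES && ((new_mean - mean) / mean).abs() < STABILITY {
            return new_mean;
        }
        mean = new_mean;
    }

    // Fall back to whatever we have once the time budget is exhausted.
    mean
}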

Failed to build version 0.9.1

340 |                 let last = last.fetch_update(|current| Some(current.max(now))).unwrap();
    |                                 ^^^^^^^^^^^^ method not found in `&AtomicCell<u64>`

Crate does not build on Windows

   Compiling quanta v0.1.0
error[E0433]: failed to resolve: use of undeclared type or module `mem`
  --> C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\quanta-0.1.0\src\monotonic.rs:37:24
   |
37 |         let mut freq = mem::uninitialized();
   |                        ^^^ use of undeclared type or module `mem`
error[E0433]: failed to resolve: use of undeclared type or module `mem`
  --> C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\quanta-0.1.0\src\monotonic.rs:38:26
   |
38 |         debug_assert_eq!(mem::align_of::<LARGE_INTEGER>(), 8);
   |                          ^^^ use of undeclared type or module `mem`
error[E0433]: failed to resolve: use of undeclared type or module `profileapi`
  --> C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\quanta-0.1.0\src\monotonic.rs:39:19
   |
39 |         let res = profileapi::QueryPerformanceFrequency(&mut freq);
   |                   ^^^^^^^^^^ use of undeclared type or module `profileapi`
error[E0433]: failed to resolve: use of undeclared type or module `mem`
  --> C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\quanta-0.1.0\src\monotonic.rs:51:24
   |
51 |         let mut lint = mem::uninitialized();
   |                        ^^^ use of undeclared type or module `mem`
error[E0433]: failed to resolve: use of undeclared type or module `mem`
  --> C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\quanta-0.1.0\src\monotonic.rs:52:26
   |
52 |         debug_assert_eq!(mem::align_of::<LARGE_INTEGER>(), 8);
   |                          ^^^ use of undeclared type or module `mem`
error[E0433]: failed to resolve: use of undeclared type or module `profileapi`
  --> C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\quanta-0.1.0\src\monotonic.rs:53:19
   |
53 |         let res = profileapi::QueryPerformanceCounter(&mut lint);
   |                   ^^^^^^^^^^ use of undeclared type or module `profileapi`
error[E0412]: cannot find type `LARGE_INTEGER` in this scope
  --> C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\quanta-0.1.0\src\monotonic.rs:38:42
   |
38 |         debug_assert_eq!(mem::align_of::<LARGE_INTEGER>(), 8);
   |                                          ^^^^^^^^^^^^^ not found in this scope
help: possible candidates are found in other modules, you can import them into scope
   |
1  | use winapi::shared::ntdef::LARGE_INTEGER;
   |
1  | use winapi::um::winnt::LARGE_INTEGER;
   |
error[E0412]: cannot find type `LARGE_INTEGER` in this scope
  --> C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\quanta-0.1.0\src\monotonic.rs:52:42
   |
52 |         debug_assert_eq!(mem::align_of::<LARGE_INTEGER>(), 8);
   |                                          ^^^^^^^^^^^^^ not found in this scope
help: possible candidates are found in other modules, you can import them into scope
   |
1  | use winapi::shared::ntdef::LARGE_INTEGER;
   |
1  | use winapi::um::winnt::LARGE_INTEGER;
   |
error: aborting due to 8 previous errors

Comparing to `minstant`

Hi, I'm one of the authors of minstant, and of minitrace, a fast tracing library used by TiKV that is built on minstant. Similar to quanta, minstant is also based on the TSC. I'm considering migrating minitrace to use quanta, but I have found some blocking problems:

  1. quanta doesn't handle TSC deviation across CPU cores. This problem can occur on some AMD chips. In minstant, calibration is executed on every core, and a correction for each core is calculated once deviation is detected.
  2. The first call takes some time to calibrate the clock. In minstant, rust-ctor helps start the calibration at the start of the process.

Is it possible to fix these two problems?

Does not build on MIPS and ARMv5

MIPS issue: #22 (comment); MIPS seems to work before #52.

ARMv5 (the armv5te-unknown-linux-musleabi target) has a similar issue:

error[E0599]: no method named `fetch_add` found for struct `Arc<AtomicCell<u64>>` in the current scope
  --> /root/.cargo/registry/src/github.com-1ecc6299db9ec823/quanta-0.9.2/src/mock.rs:48:21
   |
48 |         self.offset.fetch_add(amount.into_nanos());
   |                     ^^^^^^^^^ method not found in `Arc<AtomicCell<u64>>`

error[E0599]: no method named `fetch_sub` found for struct `Arc<AtomicCell<u64>>` in the current scope
  --> /root/.cargo/registry/src/github.com-1ecc6299db9ec823/quanta-0.9.2/src/mock.rs:53:21
   |
53 |         self.offset.fetch_sub(amount.into_nanos());
   |                     ^^^^^^^^^ method not found in `Arc<AtomicCell<u64>>`

error: aborting due to 2 previous errors

On macOS arm64, `quanta::Instant::elapsed` method tends to return a shorter duration than `std::time::Instant::elapsed` does

  • quanta versions: v0.12.1, v0.11.1
  • Platform: macOS arm64

On macOS arm64, quanta::Instant::elapsed method tends to return a shorter duration than std::time::Instant::elapsed does.

Steps to reproduce

Run the following program. It measures the duration of a 1-second thread sleep using both std::time::Instant and quanta::Instant.

// Cargo.toml
//
// [dependencies]
// quanta = "0.12.1"

use std::{thread, time::Duration};

fn main() {
    const N: usize = 20;
    measure_elapsed("std", N, elapsed_std);
    measure_elapsed("quanta", N, elapsed_quanta);
}

fn measure_elapsed<F>(ty: &str, repeat: usize, f: F)
where
    F: Fn() -> Duration,
{
    let mut elapsed_sum = 0;

    for i in 1..=repeat {
        let elapsed = f().as_nanos();
        println!("{ty}: {i:02} - {elapsed} ns");
        elapsed_sum += elapsed;
    }

    let elapsed_avg = elapsed_sum / repeat as u128;
    println!("{ty} - avg: {elapsed_avg} ns");
}

fn elapsed_std() -> Duration {
    let start = std::time::Instant::now();
    thread::sleep(Duration::from_secs(1));
    start.elapsed()
}

fn elapsed_quanta() -> Duration {
    let start = quanta::Instant::now();
    thread::sleep(Duration::from_secs(1));
    start.elapsed()
}

Expected result

Both std::time::Instant and quanta::Instant should return durations slightly longer than 1 second, the sleep duration.

Actual results

Rust 1.75.0.

  • On macOS AArch64 (arm64), the quanta::Instant::elapsed method returned durations slightly shorter than 1 second.
  • On Linux AArch64 and x86_64, the quanta::Instant::elapsed method returned durations slightly longer than 1 second.
| OS | CPU arch | Target triple | std::time::Instant | quanta::Instant v0.12.1 | quanta::Instant v0.11.1 |
|----|----------|---------------|--------------------|-------------------------|-------------------------|
| macOS Sonoma 14.2.1 | arm64 (Apple M1) | aarch64-apple-darwin | 1,003,161,220 ns | 987,545,939 ns | 986,960,251 ns |
| Ubuntu 22.04 | AArch64 (Amazon Graviton) | aarch64-unknown-linux-gnu | 1,000,113,459 ns | 1,000,125,420 ns | 1,000,129,295 ns |
| Ubuntu 22.04 | x86_64 | x86_64-unknown-linux-gnu | 1,000,147,215 ns | 1,000,150,965 ns | 1,000,145,745 ns |

Much slower than std version in case of high contention

Hello, thanks for the crate.

We use Instant::now() simultaneously from multiple cores for our metrics. We noticed that Instant::now() consumes a lot of CPU cycles (it's actually the top entry in our perf profile), so we tried replacing it with std::time::Instant, which performs much better.

The problem is that quanta updates one global atomic on every call, which leads to massive contention:

quanta/src/lib.rs

Lines 359 to 361 in 3a7d361

let last = last
.fetch_update(|current| Some(current.max(now)))
.expect("should never return an error");

After removing this code (we don't have problems with decreasing rdtsc values in our case), quanta outperforms std::time::Instant again.

The std version doesn't try to handle violations of monotonicity in now(); instead, it saturates to zero in sub, duration_since, and elapsed. Should quanta follow that approach and avoid contention on the global atomic? Or, at the least, should it store the atomic in TLS? What do you think?

`Clock::new` overhead

Awesome work on this library!

I want to preface this question by acknowledging that anyone using quanta is probably not creating a new clock instance except at the beginning of their program. Also, I'm asking mostly out of curiosity, since this is a pretty minor problem. However, I was somewhat surprised at the relatively long startup time of the first initialization. Is this wholly unavoidable, or is there a possibility of declaring a clock const or static to fold the initialization cost into program startup? Is the time spent recording multiple clock measurements for calibration purposes, or is there actual computational overhead to it?
