
Comments (18)

varac commented on May 19, 2024

Hey, I'm also interested in including CPU, RAM, and I/O usage.
Just curious what the state of this issue is - any progress since Oct 2015?

NickCrews commented on May 19, 2024

I haven't used it, but I imagine that pytest-memray might be a good alternative to profiling memory directly in this pytest extension. This appears to be a tricky task, so maybe leave it to the well-funded experts at Bloomberg to implement it for us, and then we just use two tools, each good at what it does.
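
For reference, a minimal sketch of what using pytest-memray next to pytest-benchmark could look like (assuming pytest-memray's limit_memory marker; the test name and limit here are made up):

# hypothetical example, assuming pytest-memray is installed; run with: pytest --memray
import pytest

@pytest.mark.limit_memory("24 MB")  # fail the test if it allocates more than 24 MB
def test_build_list():
    data = [None] * 1_000_000  # roughly 8 MB of pointers on a 64-bit build
    assert len(data) == 1_000_000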

ionelmc commented on May 19, 2024

I like the idea of measuring different metrics.

Unfortunately, measuring memory use is a bit tricky. Because Python keeps memory internally for performance reasons (e.g. free lists), it doesn't always give memory back to the operating system. There is http://pytracemalloc.readthedocs.org/ for more fine-grained measurements, however.

About the other metrics (page faults, signals, context switches) I'm not really sure. Ideally you wouldn't have any changes in those I guess :)

0 ru_utime time in user mode (float)
1 ru_stime time in system mode (float)

For these you can make a custom timer right now (--benchmark-timer), though only one metric can be shown at a time. Showing aggregate metrics is a big display change; if you could mock up how the results table would look, that would be really great.
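
For example, a minimal sketch of such a custom timer, assuming a POSIX system with the stdlib resource module (the timers.py file and user_time name are just illustrative):

# timers.py - report user-mode CPU time instead of wall-clock time;
# run with: pytest --benchmark-timer timers.user_time
import resource

def user_time():
    # ru_utime is the cumulative user-mode CPU time of this process, in seconds
    return resource.getrusage(resource.RUSAGE_SELF).ru_utime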

anarcat commented on May 19, 2024

I understand this is a big change... with only sys/user/wall clocks, it could look like this:

Wall clock time:

Name (time in us)         Min      Max     Mean  StdDev  Rounds  Iterations
---------------------------------------------------------------------------
test_ssh_cmd          24.3047  73.0390  24.8964  1.0002    6323           1
---------------------------------------------------------------------------

System time:

Name (time in us)         Min      Max     Mean  StdDev  Rounds  Iterations
---------------------------------------------------------------------------
test_ssh_cmd          24.3047  73.0390  24.8964  1.0002    6323           1
---------------------------------------------------------------------------

User time:

Name (time in us)         Min      Max     Mean  StdDev  Rounds  Iterations
---------------------------------------------------------------------------
test_ssh_cmd          24.3047  73.0390  24.8964  1.0002    6323           1
---------------------------------------------------------------------------

i.e. just split into multiple tables?

ionelmc commented on May 19, 2024

That could work. I plan to rework the display logic at some point.

Another required thing is support for aggregates in the timing handling. There's one tricky thing - calibration would only be possible for a single metric. Hmmm ....

anarcat commented on May 19, 2024

On 2015-10-06 11:02:53, Ionel Cristian Mărieș wrote:

Another required thing is support for aggregates in the timing handling. There's one tricky thing - calibration would only be possible for a single metric. Hmmm ....

maybe we could assume the metrics are correlated for the purpose of
calibration?

ionelmc commented on May 19, 2024

Yes, I don't think we have a choice. :)

On Tue, Oct 6, 2015 at 6:17 PM, anarcat [email protected] wrote:

On 2015-10-06 11:02:53, Ionel Cristian Mărieș wrote:

Another required thing is support for aggregates in the timing handling.
There's one tricky thing - calibration would only be possible for a single
metric. Hmmm ....

maybe we could assume the metrics are correlated for the purpose of
calibration?

ionelmc commented on May 19, 2024

We could have this as an option (--benchmark-memprofile + --benchmark-cpuprofile or similar names) or always on (active if the tracemalloc module is available, plus whatever is needed for CPU profiling). The problem with the latter is that the behavior might be surprising (a sudden and unnoticeable loss of functionality if some deps aren't there).

Ideally these stats would be included in the table somehow so you can compare/sort (unlike the --benchmark-cprofile feature, where the profile data is completely separate).

The problem with including them in the timings table is that the avg/mean/iqr/stddev columns would be completely unrelated to these cpu/mem stats.

Also, can we agree that these cpu/mem stats would be consistent and there's no need for calibration (iow: no rounds/iterations)? (iow: the cpu/mem stats for a function would not vary from run to run)

There are some interesting ways to measure mem/cpu stats in https://github.com/haypo/perf - perhaps we can use it.

ofek commented on May 19, 2024

Just FYI, resource from the stdlib isn't available on Windows. A cross-platform way would be:

>>> import os
>>> import psutil
>>> from binary import convert_units
>>>
>>> process = psutil.Process(os.getpid())
>>> convert_units(process.memory_info().rss)
(15.609375, 'MiB')
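
A hypothetical way to wrap this into a pytest-benchmark "timer" (the timers.py file and rss name are just illustrative; note that RSS is a coarse, whole-process measure rather than a per-benchmark one):

# timers.py - cross-platform resident set size "timer" built on psutil;
# run with: pytest --benchmark-timer timers.rss
import os
import psutil

_process = psutil.Process(os.getpid())

def rss():
    # resident set size of the current process, in bytes
    return _process.memory_info().rss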

huonw commented on May 19, 2024

In https://github.com/stellargraph/stellargraph, we're benchmarking/tracking the memory use of particular objects with a technique hacked on top of Python's built-in tracemalloc that creates a clock-like API for memory use: https://github.com/stellargraph/stellargraph/blob/15114f2258a93bc92f28882eea766639f3201ad1/tests/test_utils/alloc.py

Example test using it: https://github.com/stellargraph/stellargraph/blob/15114f2258a93bc92f28882eea766639f3201ad1/tests/core/test_stellargraph.py#L515-L532

@pytest.mark.benchmark(group="StellarGraph creation", timer=snapshot)
# various element counts, to give an indication of the relationship
# between those and memory use (0,0 gives the overhead of the
# StellarGraph object itself, without any data)
@pytest.mark.parametrize("num_nodes,num_edges", [(0, 0), (100, 200), (1000, 5000)])
# features or not, to capture their cost
@pytest.mark.parametrize("feature_size", [None, 100])
def test_allocation_benchmark_creation_from_networkx(
    allocation_benchmark, feature_size, num_nodes, num_edges
):
    g, node_features = example_benchmark_graph(
        feature_size, num_nodes, num_edges, features_in_nodes=True
    )

    def f():
        return StellarGraph(g, node_features=node_features)

    allocation_benchmark(f)

The key parts are the timer=snapshot argument and the allocation_benchmark fixture.
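
As a rough illustration (not the actual StellarGraph implementation, which lives in the linked alloc.py), such a clock-like allocation "timer" could look like this: pytest-benchmark computes end - start, so returning the currently traced size makes that difference the net allocation of the benchmarked code:

import tracemalloc

tracemalloc.start()

def snapshot():
    # read like a clock: the current traced allocation size, in bytes
    current, _peak = tracemalloc.get_traced_memory()
    return current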

maingoh commented on May 19, 2024

It would be a nice feature! For RAM tracking, https://github.com/python-recsys/benchy might be a source of inspiration? It uses the memory_profiler package.

LucaPaterlini commented on May 19, 2024

Hello, has anything happened on adding the memory profile option?

ionelmc commented on May 19, 2024

Not yet. Anyone interested in helping with this?

anarcat commented on May 19, 2024

Note that https://github.com/P403n1x87/pytest-austin is a thing too...

AngryUbuntuNerd commented on May 19, 2024

I think I got this working with a custom timer function:

import tracemalloc

tracemalloc.start()


def malloc():
    size, peak = tracemalloc.get_traced_memory()
    tracemalloc.clear_traces()

    return peak*4  # this factor seems to vary on operating systems

Then call pytest with --benchmark-timer timers.malloc assuming that the above function is in a timers.py file.

For this to work, you have to use pedantic mode and define the rounds yourself. Also, there needs to be one warmup round, as otherwise the first round leads to funny results.
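
A minimal sketch of that setup (the test body and numbers are made up; run with pytest --benchmark-timer timers.malloc as described above):

def build_big_list():
    return [None] * 1_000_000  # roughly 8 MB of pointers on a 64-bit build

def test_alloc(benchmark):
    # pedantic mode with explicit rounds/iterations and one warmup round
    benchmark.pedantic(build_big_list, iterations=1, rounds=5, warmup_rounds=1)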

huonw commented on May 19, 2024

@AngryUbuntuNerd you may be interested in an expansion of #28 (comment) above: since that comment was written, we've added benchmarking of peak memory too. New references:

Unfortunately, I think your approach doesn't quite do the right thing, because it does start = malloc(); <benchmark code>; end = malloc(); return end - start: it records the peak before the benchmark starts, resets it, and then subtracts that first peak from the unrelated peak after running the benchmark. The MallocPeak class above avoids this via a custom __sub__ method.

AngryUbuntuNerd commented on May 19, 2024

@huonw thanks for your solution

I think the tracemalloc way works as well though: the second malloc() will record the memory spike since the last call to clear_traces(). I tested it with a few MB and GB of memory consumption, and different rounds/iterations, and the results match.

huonw commented on May 19, 2024

Yeah, you're right that tracemalloc works really well for this: both solutions are using it. The only difference between them is the return value of the "timer" and how it computes the "duration" via a subtraction between the start "time" and the end "time".

(I think your * 4 factor may be driven by this because I don't think it should be necessary: tracemalloc is returning the exact byte size of the current/peak memory allocations it knows about, so multiplying by some arbitrary factor is... confusing.)

To show this in practice, I wrote a benchmark that uses both the basic timer (without the * 4) and the one from StellarGraph above (that is, a class with a custom __sub__). I've used a setup function with a very high peak (80MB, on a 64-bit computer), followed by a benchmark function with a lower peak (8MB), as an extreme example.

Benchmark code:
# issue_28.py
import pytest
import tracemalloc

tracemalloc.start()

class MallocPeak:
    def __init__(self, peak):
        self.peak = peak

    def __sub__(self, other):
        # ignore other
        return self.peak

def peak_default_sub():
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.clear_traces()
    return peak

def peak_fancy_sub():
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.clear_traces()
    return MallocPeak(peak)

def _setup():
    # setup code with a comically high peak, to make the issue clear
    x = [None] * 10_000_000 # 80MB
    del x

def _func():
    # real code has a lower peak than the setup
    y = [None] * 1_000_000 # 8MB
    del y

@pytest.mark.benchmark(timer=peak_default_sub)
def test_default_sub(benchmark):
    benchmark.pedantic(_func, iterations=1, rounds=1, warmup_rounds=0, setup=_setup)

@pytest.mark.benchmark(timer=peak_fancy_sub)
def test_fancy_sub(benchmark):
    benchmark.pedantic(_func, iterations=1, rounds=1, warmup_rounds=0, setup=_setup)

With that setup, the results of pytest issue_28.py are:

-------------------------------------------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------------------------------------------
Name (time in ns)                              Min                                     Max                                    Mean            StdDev                                  Median               IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_default_sub      -72,167,433,000,000,000.0000 (1.0)      -72,167,433,000,000,000.0000 (1.0)      -72,167,433,000,000,000.0000 (1.0)      0.0000 (1.0)      -72,167,433,000,000,000.0000 (1.0)      0.0000 (1.0)           0;0 -0.0000 (0.11)          1           1
test_fancy_sub          8,000,512,000,000,000.0000 (0.11)       8,000,512,000,000,000.0000 (0.11)       8,000,512,000,000,000.0000 (0.11)     0.0000 (1.0)        8,000,512,000,000,000.0000 (0.11)     0.0000 (1.0)           0;0  0.0000 (1.0)           1           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(Note that it's reporting in "nanoseconds" because of the negative values: all the byte sizes are multiplied by 1_000_000_000.)

In summary:

  • test_default_sub is very incorrect (a negative peak!): it gives a value close to -72MB = 8MB - 80MB, because the timer subtracts the peak of the setup function from the peak of the real benchmark
  • test_fancy_sub is correct: it gives almost exactly 8MB, which is the right answer, given the test function just allocates an 8MB list

This is exaggerated for clarity, but more realistic settings show the error too: for instance, change the setup function to have a peak of 8MB too (test_default_sub: -168KB; test_fancy_sub: 8MB, as above) or don't use a setup function at all (test_default_sub: 7.8MB, test_fancy_sub: 8MB, as above).

Why is this happening? If we unroll the loop for the test_default_sub benchmark and inline the functions, we effectively get:

# do setup
_setup()

# snapshot 'time'/size before
_, peak_before = tracemalloc.get_traced_memory() # peak_before = 80MB
tracemalloc.clear_traces() # reset the peak to 0, but peak_before = 80MB still

# run benchmarked function
_func()

# snapshot 'time'/size after
_, peak_after = tracemalloc.get_traced_memory() # peak_after = 8MB
tracemalloc.clear_traces()

# compute result 'duration'
result = peak_after - peak_before # 8MB - 80MB = -72MB

In particular, the 'before' value needs to be 0, because the clear_traces means that's where we're starting. AFAIK, the best way to achieve this is to use a custom subtraction on the result line, because it's hard to know inside a timer call if it's the 'before' snapshot (to return 0) or the 'after' one (to return the real peak).
