
pytest-benchmark's Introduction

Overview


A pytest fixture for benchmarking code. It will group the tests into rounds that are calibrated to the chosen timer.

See calibration and FAQ.

  • Free software: BSD 2-Clause License

Installation

pip install pytest-benchmark

Documentation

For latest release: pytest-benchmark.readthedocs.org/en/stable.

For master branch (may include documentation fixes): pytest-benchmark.readthedocs.io/en/latest.

Examples

But first, a prologue:

This plugin tightly integrates into pytest. To use this effectively you should know a thing or two about pytest first. Take a look at the introductory material or watch talks.

A few notes:

  • This plugin benchmarks functions and only that. If you want to measure a block of code or a whole program you will need to write a wrapper function (see the sketch after this list).
  • In a test you can only benchmark one function. If you want to benchmark many functions, write more tests or use parametrization.
  • To run the benchmarks you simply use pytest to run your "tests". The plugin will automatically do the benchmarking and generate a result table. Run pytest --help for more details.
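
For the first note, a minimal sketch of such a wrapper function (the squaring loop is just a stand-in for the block of code you actually want to measure):

def run_block():
    # Wrap the block of code you care about in a plain function
    # so the fixture has something to call.
    data = [i * i for i in range(10_000)]
    return sum(data)

def test_block_speed(benchmark):
    result = benchmark(run_block)
    assert result == sum(i * i for i in range(10_000))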

This plugin provides a benchmark fixture. This fixture is a callable object that will benchmark any function passed to it.

Example:

import time

def something(duration=0.000001):
    """
    Function that needs some serious benchmarking.
    """
    time.sleep(duration)
    # You may return anything you want, like the result of a computation
    return 123

def test_my_stuff(benchmark):
    # benchmark something
    result = benchmark(something)

    # Extra code, to verify that the run completed correctly.
    # Sometimes you may want to check the result, fast functions
    # are no good if they return incorrect results :-)
    assert result == 123

You can also pass extra arguments:

def test_my_stuff(benchmark):
    benchmark(time.sleep, 0.02)

Or even keyword arguments (using the something function defined above, since time.sleep does not accept keyword arguments):

def test_my_stuff(benchmark):
    benchmark(something, duration=0.02)

Another pattern seen in the wild, that is not recommended for micro-benchmarks (very fast code) but may be convenient:

def test_my_stuff(benchmark):
    @benchmark
    def something():  # unnecessary function call
        time.sleep(0.000001)

A better way is to just benchmark the final function:

def test_my_stuff(benchmark):
    benchmark(time.sleep, 0.000001)  # way more accurate results!

If you need fine-grained control over how the benchmark is run (a setup function, exact control of iterations and rounds), there's a special mode, pedantic:

def my_special_setup():
    ...

def test_with_setup(benchmark):
    benchmark.pedantic(something, setup=my_special_setup, args=(1, 2, 3), kwargs={'foo': 'bar'}, iterations=10, rounds=100)
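
A variant of the same sketch where the setup builds the arguments for each round, assuming the behaviour that a setup callable returning an (args, kwargs) pair has that pair passed to the target. iterations is left at its default of 1, since the fixture rejects a setup function combined with multiple iterations (see the issue about setup and iterations further below); combine and fresh_args are hypothetical names:

def combine(a, b, c, foo=None):
    # Hypothetical target standing in for your own function.
    return (a + b + c, foo)

def fresh_args():
    # Rebuilt before every round; assumed to be forwarded as combine(1, 2, 3, foo='bar').
    return (1, 2, 3), {'foo': 'bar'}

def test_with_setup_args(benchmark):
    benchmark.pedantic(combine, setup=fresh_args, rounds=100, warmup_rounds=0)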

Screenshots

Normal run:

Screenshot of pytest summary

Compare mode (--benchmark-compare):

Screenshot of pytest summary in compare mode

Histogram (--benchmark-histogram):

Histogram sample

Development

To run all the tests run:

tox

Credits

pytest-benchmark's People

Contributors

angryubuntunerd, antocuni, aphi, asford, cclauss, clarete, dimrozakis, dotlambda, drebs, eumiro, ionelmc, jad-b, krastanov, matthewfeickert, mcsitter, msabramo, nicoulaj, oeuftete, ofek, photonios, popravich, roehling, sbellem, scorphus, stanislavlevin, taupan, thedavecollins, thesamesam, thomaswaldmann, varac


pytest-benchmark's Issues

Use sample code on package page, but got error

I used the sample code on the package page. Below is my error:

[gw1] darwin -- Python 3.4.3 /Users/asoul/.virtualenvs/lms/bin/python3
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/case.py:58: in testPartExecutor
    yield
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/case.py:577: in run
    testMethod()
E   TypeError: test_my_stuff() missing 1 required positional argument: 'benchmark'

What should I do? Thanks :)

Colors appear to be broken

As you can see, the colors appear to be broken on the (*) line: they are solid black, whereas everything else is shaded "correctly". Any ideas?

image

Make test suite fail if benchmark is unsatisfactory

Hello,
I read the documentation and hope I did not miss something completely obvious.

My use case: I would like to use pytest-benchmark for continuous integration. A part of my test suite is actually performance tests, i.e. making sure that my toto function does not take longer than, let's say, 20ms.

I would like the test suite to fail if some modification in the code makes toto() exceed 20ms. I am aware of --benchmark-compare-fail=EXPR, but I think what I am looking for is more specific.

I have no idea what would be the best way to describe this, perhaps:

@pytest.mark.benchmark(fail_at=0.02)
def test_toto(benchmark):
    benchmark(toto)

Or maybe:

def test_toto(benchmark):
    results = benchmark(toto, fail_at=0.02)

Or provide a way for the user to access the results of the benchmark? Before using pytest-benchmark, I would do something like this:

def test_toto(benchmark):
    results = benchmark(toto)
    if results.average_time > 0.02:
        pytest.fail('Exceeding 20ms!')

Would this make sense in pytest-benchmark?
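
In the meantime, a workaround along those lines is possible. This is a sketch, assuming toto is the function under test from above and that the fixture exposes its collected statistics on benchmark.stats after the call; the exact attribute path has varied between releases:

import pytest

def test_toto(benchmark):
    benchmark(toto)
    # Assumed layout: benchmark.stats.stats holds the computed statistics
    # (mean/min/max, in seconds). Adjust the path for your plugin version.
    if benchmark.stats.stats.mean > 0.02:
        pytest.fail('toto() is slower than 20ms on average!')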

Release a new version on pypi?

The current version that's out there has a Python 3 syntax error (except clause) so it's broken and not installable -- might make sense to release a new one since that error has been fixed?

help: estimate BigO for multiple functions?

Hi, I'd like to use this package to estimate the BigO of multiple functions. I wonder what's a practical way to implement it. Currently I can get benchmark statistics for one input size, and I have to manually change the input size and run it again to get a curve of runtime vs. input size (see the parametrization sketch below).

The code is like this:

import numpy as np

size = 100
x = np.random.randn(size)

def test_f1(benchmark):
    benchmark(f1)

def test_f2(benchmark):
    benchmark(f2)

def test_f3(benchmark):
    benchmark(f3)

Thanks.
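
A sketch of one approach: parametrize the test over the input size, so that each size becomes its own benchmark row and the results can be plotted as runtime vs. size. f1 here is a stand-in for the real function under test, taking the array as an argument:

import numpy as np
import pytest

def f1(x):
    # Stand-in for the real function under test.
    return np.sort(x)

@pytest.mark.benchmark(group='f1')
@pytest.mark.parametrize('size', [100, 1_000, 10_000, 100_000])
def test_f1_scaling(benchmark, size):
    x = np.random.randn(size)
    benchmark(f1, x)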

Allow to configure columns reported

Referring again to the output of pytest-benchmark's test suite itself, I'd point out that it is 188 characters wide, which is a lot more than 80 or 100.

In a lot of cases, you would just want some mix of mean/median/min/max; especially so in the manual mode when you know exactly what you are doing.

Like, I could look at the outliers, stddev and IQR while designing the benchmark, but when it's being reported at the end of the test suite run, I'll probably just look at min/mean.

With this in mind, it would be then nice if the report could be configured to include/exclude custom stat columns, like stats=['min', 'mean'], sort='min' (or command line parameters). Maybe the default setting should be not to output absolutely every stat that it computes, too.

Which test to run when packaging for linux distribution

Hi,

I'm packaging pytest-benchmark for nixos.org. Which tests should I run to make sure the packaging is ok? I saw you recommend nox but I would like a test for the current python environment, not for all of them. Moreover, I'm not interested in benchmark tests of pytest-benchmark. Thanks

Accessing results of benchmark to catch performance regressions?

I would like to do something like:

def test_perf(benchmark):
    results = benchmark(some_fn)
    if results.median_ms > 2000:
        raise Exception('Performance regression!')

This is obviously not the current API of benchmark (results is the return of some_fn) -- but can this be done some other way?

Allow to output the resulting table to a file

This would be extremely useful if you run this as a part of continuous integration -- currently, the images can be saved (per each benchmark), but the resulting table itself cannot. Grepping through test logs on a build server is not fun -- would be much nicer if the benchmarks could be pulled out.

If it was possible to dump the results into a file (txt, csv or maybe even a nice formatted html, kind of like coverage does), then the benchmarks could be automatically published on each build as test artifacts.
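
A sketch of the post-processing this would enable, assuming a run saved with --benchmark-json=out.json (available in newer versions of the plugin) and that the file contains a top-level 'benchmarks' list whose entries carry a 'name' and a 'stats' mapping; the exact layout may differ between versions:

import csv
import json

with open('out.json') as f:
    data = json.load(f)

with open('benchmarks.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'min', 'mean', 'max'])
    for bench in data['benchmarks']:   # assumed JSON layout
        stats = bench['stats']
        writer.writerow([bench['name'], stats['min'], stats['mean'], stats['max']])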

Have a way to compare multiple runs

Currently --benchmark-compare takes a single run.

What if it could take multiple, eg: --benchmark-compare=0001,0002,0003?

Also another unfortunate choice was to only compare to runs from the same interpreter. What if we could compare different interpreters, eg: --benchmark-compare=Windows-CPython-3.5-32bit\0001,Windows-CPython-3.5-64bit\0001,Windows-CPython-2.7-32bit\0001?

Disable progress display when there's no TTY

When running benchmarks inside the vim editor, where output goes into a copen window that does not support terminal escape sequences, the output does not look good.

py.test can detect whether it runs in an interactive session or not, and if the environment is non-interactive, all colours and other screen-drawing features are turned off.

It would be nice if pytest-benchmark could support this too.

You can check for interactive session like this:

import sys

if sys.stdin.isatty():
    pass  # interactive, turn on colors
else:
    pass  # non-interactive, turn off colors

Output doesn't fit in terminal

Since I upgraded to the latest beta, the pytest results no longer fit in the terminal. The iterations column wraps around.

benchmark-histogram generates black empty svgs

platform linux2 -- Python 2.7.3, pytest-2.8.2, py-1.4.30, pluggy-0.3.1
benchmark: 3.0.0rc1 (defaults: timer=time.time disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/XXX, inifile: setup.cfg
plugins: benchmark-3.0.0rc1
collected 56 items / 1 errors
pip install --upgrade statistics pytest-benchmark[histogram]

Save raw data

Currently only the relevant statistics are saved to the .json file, but it would be nice to add some --benchmark-full command line option to actually save the elapsed time of each run too. I would like to take these results and plot them using tools other than pygal, so exporting them in some way would be helpful.

Better grouping

  • Group by test's params (eg: fixtures etc)
  • Group by test's name

New options:

  • --benchmark-group-by=params
  • --benchmark-group-by=group (default)
  • --benchmark-group-by=testname

lower level api

I'm currently using your module to benchmark some external API's performance. Currently this is implemented using several coroutines running concurrent parallel tasks, where one coroutine calls some "processor", and this one processor is what should be benchmarked. Using weave here is possible but problematic because of the complex "setup" and concurrency issues. I have a workaround to solve my problem:

import asyncio
from timeit import default_timer


def some_test(benchmark, event_loop):
    benchmark._mode = 'benchmark(...)'
    stats = benchmark._make_stats(1)
    primary_task = event_loop.create_task(
        primary_task_factory(data_source, duration_callback=stats.update))


@asyncio.coroutine
def process_docs(
        docs_iterator, max_frequency, max_amount, processor,
        doc_index_in_args=1,
        duration_callback=None,
        *args, **kwargs):
    # ...
    p_t0 = default_timer()
    processor(*args, **kwargs)
    processor_time = default_timer() - p_t0
    total_processor_time += processor_time
    if duration_callback:
        duration_callback(processor_time)

It would be great to have some "legal" and documented way to do similar things. I had to read the pytest-benchmark source code to do this. I think a public API for similar tasks should be exposed:

def test_something(benchmark):
    duration_handler = benchmark.manual()
    for x in something():
        foo()
        baz()

        duration_handler.start()
        some_processing_to_benchmark()
        duration_handler.end()

Save results to file and compare

16:36 <hpk> ionelmc: what i'd need would be a way to compare against prior benchmarks
16:37 <ionelmc> hpk: that means you'd need a way to measure relative times against a "control benchmark"
16:37 <hpk> ionelmc: i.e. "py.test --bench-compare path-to-old-benchmarkresults"
16:37 <ionelmc> because machines don't have same perf
16:38 <hpk> yes, writing out of results as well as comparing against them and getting errors when they slowed too much
16:38 <hpk> so probably a "py.test --bench-as-control-sample" and "py.test --bench"
16:38 <hpk> (conceptually)
16:38 <ionelmc> hpk: in other words, you'd be comparing percentages, not actual timings
16:39 <hpk> i'd be looking how much a benchmark deviates
16:39 <hpk> would report all deviations and say that >10% slowdown is an error or so
16:39 <ionelmc> nooo, i think you missed my point
16:40 <ionelmc> so, you have a "control test"
16:40 <ionelmc> that does something, whatever, something simple
16:40 <hpk> what i said was not directly related to what you said before -- more what i think would be useful for myself
16:40 <ionelmc> and the other tests compare to that
16:40 <ionelmc> eg: 50% slower than the "control bench"
16:41 <hpk> might be useful for some people, not for me, i guess
16:41 <ionelmc> and in the file you only save percentages (the relative values to the control test)
16:41 <ionelmc> otherwise saving to a file is not practical
16:41 <ionelmc> i'm thinking travis
16:41 <hpk> ah, now i get it
16:41 <ionelmc> i run it locally but travis is gonna be very unpredictable
16:41 <ionelmc> even between runs
16:41 <hpk> i don't know if this could work
16:42 <hpk> but it's an interesting idea
16:42 <ionelmc> so the only reliable thing to compare against is a "control test" that is run in the same session
16:42 <hpk> question is if you can do a control test that makes sense
16:42 <ionelmc> eg, i wanna benchmark instance creation of some objects
16:42 <hpk> and where the relation "realtest versus controltest" is stable
16:42 <hpk> across machines and interpreters
16:43 <hpk> i somehow doubt it
16:43 <ionelmc> and the control is "object()"
16:43 <ionelmc> ofcourse some things will be slower on some interpreters
16:43 <hpk> you need to try and run such things on multiple machines, including travis, to find out if it's viable i guess
16:44 <ionelmc> i think it's best to just have a nice way to look at historical data
16:44 <ionelmc> eg, a separate service that records timings
16:44 <hpk> what i proposed should work without having to figure out control tests but you need a somewhat repeatable environment
16:44 <ionelmc> like coveralls but for benchmarks
16:45 <ionelmc> https://coveralls.io/
16:45 <ionelmc> coveralls integrates well into travis
16:46 <ionelmc> hpk: well, repeatable environments are a luxury
16:47 <ionelmc> with all the crazy virtualization and even crazy cpu scaling (intel turboboost) it's fairly hard
16:47 <hpk> ionelmc: yes -- the other question is if it's possible to count CPU ticks used for a computation rather than time
16:48 <hpk> ionelmc: but it's even hard within the lifetime of one process
16:48 <ionelmc> hmmmm
16:48 <ionelmc> that should work
16:48 <ionelmc> you only need to have the same cpu then
16:49 <hpk> on travis i guess during 60 seconds of a test run you might experience different speeds
16:49 <hpk> so doing a control run first, then benchmarks might or might not be accurate enough
16:49 <ionelmc> wildly different i'd add :)
17:05 <hpk> for me it all becomes only useful with the comparison feature, but then it would be very useful
17:05 <hpk> (i am currently doing some benchmarking but manually, and i'd love to have a more systematic approach)
17:06 <ionelmc> hpk: so you're basically assuming consistent environments, like, you're not going to use it on travis
17:06 <hpk> yes
17:06 <ionelmc> only locally, to get feedback on perf regression
17:07 <hpk> yes, so if pytest had that, prior to pytest-2.7 would check it didn't regress
17:07 <hpk> or even just for a PR
17:07 <ionelmc> yeah, sounds very useful
17:08 <hpk> and then integrate the web code behind http://speed.pypy.org/ :)
17:12 <hpk> i'd be fine with just terminal reporting, already, though :)
17:16 <ionelmc> ok, what would be an error situation
17:17 <ionelmc> compare against minimums
17:17 <ionelmc> what's a good default for error threshold ?
17:30 <hpk> ionelmc: no clue, 10% maybe?

Having a setup function and >1 iterations is not possible

As coded in src/pytest_benchmark/fixture.py:

if iterations > 1 and setup:
    raise ValueError("Can't use more than 1 `iterations` with a `setup` function.")

I have not found any reference to this in documentation. Moreover, the initial pedantic example in documentation suggests using a setup function and 10 iterations, which is impossible.

Show relative differences in the results table

Something like 25% (± 5%) faster.

I'd use these as the basis for implementation:

However, the formulas are designed for a fixed number of rounds, so all tests would need to be run the same number of rounds. This is a bit of a problem because of the --max-time limiting.

Other things to read:

resource (CPU, RAM, I/O) usage tracking

The resource library allows Python processes to track memory usage and similar things.

Forking may be necessary to properly monitor each test individually in this case. A separate data structure should also be built in parallel to the results to track other resources.

There seems to be a way with getrusage(2) to get information about threads as well, but this doesn't seem to be a good idea considering Python's limited support for threads and how the extension is Linux-specific.

I think the following data points could be collected:

0   ru_utime    time in user mode (float)
1   ru_stime    time in system mode (float)
2   ru_maxrss   maximum resident set size
9   ru_inblock  block input operations
10  ru_oublock  block output operations

These could be interesting but may just add too much noise:

3   ru_ixrss    shared memory size
4   ru_idrss    unshared memory size
5   ru_isrss    unshared stack size
6   ru_minflt   page faults not requiring I/O
7   ru_majflt   page faults requiring I/O
8   ru_nswap    number of swap outs
11  ru_msgsnd   messages sent
12  ru_msgrcv   messages received
13  ru_nsignals signals received
14  ru_nvcsw    voluntary context switches
15  ru_nivcsw   involuntary context switches

Basically, this would extend the time metric to be an array of metrics with different units and so on...

would that be useful to others as well?
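
For concreteness, here is what collecting a few of these counters around a single call might look like (a sketch; the resource module is Unix-only, and ru_maxrss is a high-water mark rather than a per-call delta, with platform-dependent units):

import resource

def measure_rusage(fn, *args, **kwargs):
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = fn(*args, **kwargs)
    after = resource.getrusage(resource.RUSAGE_SELF)
    usage = {
        'ru_utime': after.ru_utime - before.ru_utime,        # time in user mode
        'ru_stime': after.ru_stime - before.ru_stime,        # time in system mode
        'ru_maxrss': after.ru_maxrss,                        # maximum resident set size
        'ru_inblock': after.ru_inblock - before.ru_inblock,  # block input operations
        'ru_oublock': after.ru_oublock - before.ru_oublock,  # block output operations
    }
    return result, usage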

Issue with Xdist plugin: impossible to serialize

I have different tests written with pytest-benchmark. I also use the Xdist plugin to distribute my tests across more than one process.
Starting py.test with Xdist gives this output:

py.test -n 2
 ============================= test session starts =============================
platform win32 -- Python 2.7.3 -- py-1.4.26 -- pytest-2.6.4
plugins: benchmark, xdist
gw0 C / gw1 I
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "C:\Python27\lib\site-packages\_pytest\main.py", line 82, in wrap_session
INTERNALERROR>     config.hook.pytest_sessionstart(session=session)
INTERNALERROR>   File "C:\Python27\lib\site-packages\_pytest\core.py", line 413, in __call__
INTERNALERROR>     return self._docall(methods, kwargs)
INTERNALERROR>   File "C:\Python27\lib\site-packages\_pytest\core.py", line 424, in _docall
INTERNALERROR>     res = mc.execute()
INTERNALERROR>   File "C:\Python27\lib\site-packages\_pytest\core.py", line 315, in execute
INTERNALERROR>     res = method(**kwargs)
INTERNALERROR>   File "C:\Python27\lib\site-packages\xdist\dsession.py", line 480, in pytest_sessionstart
INTERNALERROR>     nodes = self.nodemanager.setup_nodes(putevent=self.queue.put)
INTERNALERROR>   File "C:\Python27\lib\site-packages\xdist\slavemanage.py", line 45, in setup_nodes
INTERNALERROR>     nodes.append(self.setup_node(spec, putevent))
INTERNALERROR>   File "C:\Python27\lib\site-packages\xdist\slavemanage.py", line 54, in setup_node
INTERNALERROR>     node.setup()
INTERNALERROR>   File "C:\Python27\lib\site-packages\xdist\slavemanage.py", line 223, in setup
INTERNALERROR>     self.channel.send((self.slaveinput, args, option_dict))
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 681, in send
INTERNALERROR>     self.gateway._send(Message.CHANNEL_DATA, self.id, dumps_internal(item))
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 1285, in dumps_internal
INTERNALERROR>     return _Serializer().save(obj)
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 1303, in save
INTERNALERROR>     self._save(obj)
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 1321, in _save
INTERNALERROR>     dispatch(self, obj)
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 1402, in save_tuple
INTERNALERROR>     self._save(item)
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 1321, in _save
INTERNALERROR>     dispatch(self, obj)
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 1398, in save_dict
INTERNALERROR>     self._write_setitem(key, value)
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 1392, in _write_setitem
INTERNALERROR>     self._save(value)
INTERNALERROR>   File "C:\Python27\lib\site-packages\execnet\gateway_base.py", line 1319, in _save
INTERNALERROR>     raise DumpError("can't serialize %s" % (tp,))
INTERNALERROR> DumpError: can't serialize <class 'pytest_benchmark.plugin.NameWrapper'>

Allow reading config from tox.ini / setup.cfg?

Seeing as there's a growing list of options that barely fits one help screen, maybe it would make sense to think about reading defaults from a config file, the same as most other tools do (tox itself, coverage, pytest, all sorts of linters)?

So instead of passing a bajillion command line arguments, you could just add a section to your tox.ini like so (just making these up):

[benchmark]
columns = min, max
precision = auto
report = term, json
json = foo.json

More detailed params info

Currently, for each benchmark we also save the "param" attribute, which is taken directly from pytest's callspec.id. This works very well if you have only one parametric fixture, but it becomes less useful if you have more. For example, consider this real life example:
https://bitbucket.org/antocuni/capnpy/src/22749ed7be02fc969907391e38cac1fe4f925e18/capnpy/benchmarks/test_benchmarks.py?at=master&fileviewer=file-view-default#test_benchmarks.py-86

Here, both Storage and numeric_types are parametrized; currently, the test names look like this:
test_getattr[instance-int32]
where "instance-int32" is the param. What it would be REALLY useful is to:

  1. save the infos about all params in the json, so I can later filter/aggregate based on those
  2. be able to group by a specific param; e.g., --benchmark-group-by=param:numeric_type (or, even better: --benchmark-group-by=numeric_type)

I started to implement the feature here:
antocuni@3399849

but before going on, I'd like to see what you think. In particular, what do we do with the current "param" attribute? Do we keep it side by side with "params", or remove it completely?

Allow to configure precision or be smarter about it

From pytest-benchmark's test suite:

test_single             1192.0929 
test_setup              1907.3486 
test_args_kwargs        1907.3486 
test_iterations          190.7349  
test_rounds_iterations    95.3674  

The benchmark report is 188 characters wide but I think it contains a lot of noise: the chances are, you wouldn't really care about precision to 4 digits if the number is in the thousands or millions range.

You could do something like this (precision depends on the value):

test_single             1192 
test_setup              1907 
test_args_kwargs        1907 
test_iterations          190  
test_rounds_iterations  95.4  

And maybe add thousands separators as well:

test_single             1,192 
test_setup              1,907 
test_args_kwargs        1,907 
test_iterations          190  
test_rounds_iterations  95.4  

You could have a fixed precision mode (like now, precision == 4), or auto precision mode, which could be formatted like so:

0.00001 : < 1e-4
0.00012 : 0.0001
0.00123 : 0.0012
0.01234 : 0.0123
0.12345 : 0.1234
1.23456 : 1.234
12.3456 : 12.34
123.456 : 123.4
1234.56 : 1,234

(as for the small values, could print them as 0.0000 or indicate explicitly that they fall below precision range)
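
One possible auto-precision formatter along those lines (a sketch; it rounds rather than truncates and flags values below the precision range explicitly):

def format_value(value):
    if value < 1e-4:
        return '< 1e-4'                    # below precision range
    if value >= 1000:
        return format(round(value), ',d')  # 1234.56 -> '1,235'
    if value >= 100:
        return f'{value:.1f}'              # 123.456 -> '123.5'
    if value >= 10:
        return f'{value:.2f}'              # 12.3456 -> '12.35'
    if value >= 1:
        return f'{value:.3f}'              # 1.23456 -> '1.235'
    return f'{value:.4f}'                  # 0.12345 -> '0.1234'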

unusable due to missing statistics

WARNING: Benchmarks are automatically disabled because we could not import statistics

Traceback (most recent call last):
File "/home/stowers/.virtualenvs/opencv3/local/lib/python2.7/site-packages/pytest_benchmark/plugin.py", line 46, in
import statistics
ImportError: No module named statistics

maybe pytest-benchmark should install it if missing?

Possible improvements in graphing

From what I gather and from what I've tried, it's currently possible to render the results for each benchmark (i.e. a row) into an svg file using pygal/pygaljs.

There's a few problems with this:

  • pygal is far from being a conventional graphing library; it's not shipped with conda and is a bit of a pain to build (let alone test); it has heavy deps like cairosvg etc.

  • if you run a parametrized test and want to compare the results (i.e. by putting them in the same benchmark group), the images will still be generated for each separate row, which could be quite meaningless. E.g., if you have a test suite like this:

    @pytest.mark.benchmark(group='read')
    def test_read(benchmark, method):
        benchmark(read(method=method))
    
    @pytest.mark.benchmark(group='write')
    def test_write(benchmark, method):
        benchmark(write(method=method))

    where you want to benchmark different read/write methods and method is a parameterized fixture taking on 10 values, this will generate 20 images. However, it would be nice if it could generate just 2 images where you could compare the times and confidence intervals visually (one per group).

  • I'm fairly certain this could be nicely done e.g. with matplotlib/seaborn (http://stanford.edu/~mwaskom/software/seaborn) -- if you've never used either I could give a hand with this

please give a complete example?

I'm unfamiliar with "py.test" or what a "fixture" is. Reading your docs, there is no complete example of a source file with appropriate command-line. Also, there didn't seem to be any pointers from your documentation to some docs about py.test that might explain. Sure, I can use google to find the info, but maybe it could be more obvious?

(and heck, googling didn't help! I see I'm supposed to run it under py.test, so that's why there's no import in the file... but... ? ... ok a few minutes of experiment, I eventually got it.)

benchmarking side-effectful code

I'm working on benchmarking some functions that modify their input. The input is a list of dictionaries, and the code sorts the list and also adds things to the dictionaries. As a result, the benchmarks are not entirely accurate, because the first iteration modifies the input and the subsequent iterations have much less work to do, since the input is already sorted.

I can "fix" this problem by doing something (expensive) like a deepcopy before the benchmarked function is run; however, this will add to the running time statistics. Any thoughts?

Ops/Sec

Is it possible to display how many benchmarks were run per second?

Marker can't be found on py2?

Was trying to run lazy-object-proxy tests manually using pytest and the latest pytest-benchmark from git, and weirdly enough, it works just fine on Python 3, but doesn't work on Python 2 due to pytest being unable to find the marker. The two environments are pretty much identical aside from the Python version, and the same version of pytest-benchmark is properly installed in both. Any ideas why this could happen?

tests/test_lazy_object_proxy.py:1900: in <module>
    @pytest.mark.benchmark(group="prototypes")
../envs/py2/lib/python2.7/site-packages/_pytest/mark.py:183: in __getattr__
    self._check(name)
../envs/py2/lib/python2.7/site-packages/_pytest/mark.py:198: in _check
    raise AttributeError("%r not a registered marker" % (name,))
E   AttributeError: 'benchmark' not a registered marker
_____________________________________________________ ERROR collecting tests/test_lazy_object_proxy.py ______________________________________________________
tests/test_lazy_object_proxy.py:1900: in <module>
    @pytest.mark.benchmark(group="prototypes")
../envs/py2/lib/python2.7/site-packages/_pytest/mark.py:183: in __getattr__
    self._check(name)
../envs/py2/lib/python2.7/site-packages/_pytest/mark.py:198: in _check
    raise AttributeError("%r not a registered marker" % (name,))
E   AttributeError: 'benchmark' not a registered marker

Error in pygal.graph.box import is_list_like

version: pytest-benchmark-3.0.0
How to reproduce:
py.test --benchmark-histogram

output:

File "/usr/local/lib/python2.7/dist-packages/pytest_benchmark/histogram.py", line 8, in
raise ImportError(exc.args, "Please install pygal and pygaljs or pytest-benchmark[histogram]")
ImportError: (('cannot import name is_list_like',), 'Please install pygal and pygaljs or pytest-benchmark[histogram]')

It seems to be an issue in pygal. I also tried from pygal.graph.box import is_list_like directly, and it raises an error.

What is the proper way to work around it?

pytest-benchmark fails with `setup.py:tests_require`

The following is inside setup.py:

    tests_require=[
        'pytest-benchmark>=3.0',
        'pytest-raisesregexp>=2.1',
        'pytest-cov>=2.2.0',
        'pytest>=2.8.5',
        'webtest>=2.0.20',
        'tox'
    ]

However, running python setup.py test results in the following:

Searching for pytest-benchmark>=3.0
Reading https://pypi.python.org/simple/pytest-benchmark/
Best match: pytest-benchmark 3.0.0
Downloading https://pypi.python.org/packages/source/p/pytest-benchmark/pytest-benchmark-3.0.0.zip#md5=f8ab8e438f039366e3765168ad831b4c
Processing pytest-benchmark-3.0.0.zip
Writing /tmp/easy_install-tcebs675/pytest-benchmark-3.0.0/setup.cfg
Running pytest-benchmark-3.0.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-tcebs675/pytest-benchmark-3.0.0/egg-dist-tmp-8ifylcko
error: Setup script exited with error in pytest-benchmark setup command: Invalid environment marker: python_version < "3.4"

If I install the package first with pip install pytest-benchmark, then the error goes away.

I'm guessing it has something to do with this line.

give example of parametrizing on input size

I'm using pytest-benchmark for the first time to benchmark bidict:
https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py

I'm wondering how using different input sizes affects my benchmarks (it would be cool to be able to generate a graph showing that a particular function is, e.g., quadratic with respect to input size).

This seems like a common use case people might have when benchmarking their code, but I don't see any examples of how to do this in the README, so I'm wondering if I'm missing something. If not, would it be valuable to give an example or two? If I can figure out how to do this, I'd be happy to work up a PR adding an example to your docs if there is interest.

And if you happen to have any other benchmarking advice from looking at what I'm doing above, it'd be much appreciated.

Thanks!

Feature request: number=1 or max_rounds

I realise how much effort has gone into getting a reasonable average benchmark.

However I have just run into a use case where the unit under test must run exactly once.
It's not so much a benchmark as an indicative measurement.
The unit is inserting objects into a database (within a complex sequence), so runs after the first are not representative.

A bit of an edge case I know.

For now I'm using:
t = timeit.timeit(sync_objects, number=1)
assert t < 1
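
A pedantic-mode equivalent of that workaround, keeping the result inside pytest-benchmark's reporting (a sketch; sync_objects is the callable from the snippet above):

def test_sync_objects_once(benchmark):
    # Exactly one timed execution: one round, one iteration, no warmup.
    benchmark.pedantic(sync_objects, rounds=1, iterations=1, warmup_rounds=0)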

Relative benchmarks with manual baseline

Using this example from a previous issue:

@pytest.mark.benchmark(group='read')
def test_read(benchmark, method):
    benchmark(read(method=method))

@pytest.mark.benchmark(group='write')
def test_write(benchmark, method):
    benchmark(write(method=method))

Let's say this outputs:

test_read[method0]    264
test_read[method1]    112
test_read[method2]    274
test_read[method3]    130
test_read[method4]    196

test_write[method0]   333
test_write[method1]   99
test_write[method2]   100
test_write[method3]   98
test_write[method4]   79

What if you wanted to see how different methods perform relative to a baseline method? Would be cool to be able to enable output like this:

test_read[method0][*]   100.0%
test_read[method1]       42.4%
test_read[method2]      103.8%
test_read[method3]       49.2%
test_read[method4]       74.2%

test_write[method0][*]  100.0%
test_write[method1]      29.7%
test_write[method2]      30.0%
test_write[method3]      29.4%
test_write[method4]      23.7%

Implementation-wise, maybe benchmark could accept a boolean argument baseline, which, if set to True, would set a benchmark as baseline for the benchmark group, so the code above could be something like

@pytest.mark.benchmark(group='read')
def test_read(benchmark, method):
    benchmark(read(method=method), baseline=(method == 'method0'))

@pytest.mark.benchmark(group='write')
def test_write(benchmark, method):
    benchmark(write(method=method), baseline=(method == 'method0'))

Perhaps it could be done differently, which is not the point; I just wanted to share the idea.

plot histogram in one graph

Hi, thanks for writing this package. I just started using it and found some useful results. I wonder how I could plot the histograms in one graph? Currently I have 4 svg files, one for each function.

Thanks.

Is it possible to "parametrize" a benchmark?

I want to benchmark different JSON engines serialization/deserialization functions, with different sets of data. More specifically, I'm trying to convert an already existing set of benchmarks to pytest-benchmark.

Here, contenders is a list of tuples (name, serialization_func, deserialization_func):

@pytest.mark.benchmark(group='serialize default_data')
@pytest.mark.parametrize('serializer',
                         [c[1] for c in contenders],
                         ids=[c[0] for c in contenders])
def test_serializer_benchmark(serializer, benchmark):
    benchmark(serializer, default_data)

@pytest.mark.benchmark(group='deserialize default_data')
@pytest.mark.parametrize('serializer,deserializer',
                         [(c[1], c[2]) for c in contenders],
                         ids=[c[0] for c in contenders])
def test_deserialization_benchmark(serializer, deserializer, benchmark):
    data = serializer(default_data)
    benchmark(deserializer, data)

This will produce two distinct benchmark tables, one for the serialization function and one for its counterpart. I can go down the boring way of repeating that pattern for each dataset...

What I'd like to achieve is to factorize that to something like the following (that does not work):

@pytest.mark.parametrize('name,data', [('default data', default_data)])
def test_gen(name, data):
    @pytest.mark.benchmark(group=name + ': serialize')
    @pytest.mark.parametrize('serializer',
                             [c[1] for c in contenders],
                             ids=[c[0] for c in contenders])
    def serializer_benchmark(serializer, benchmark):
        benchmark(serializer, data)

    @pytest.mark.benchmark(group=name + ': deserialize')
    @pytest.mark.parametrize('serializer,deserializer',
                             [(c[1], c[2]) for c in contenders],
                             ids=[c[0] for c in contenders])
    def deserializer_benchmark(serializer, deserializer, benchmark):
        serialized_data = serializer(data)
        benchmark(deserializer, serialized_data)

    yield serializer_benchmark
    yield deserializer_benchmark

That way I could reuse the very same code to create benchmarks against all other sets of data, without repeating the code, simply adding them to the initial parametrize:

@pytest.mark.parametrize('name,data', [('default data', default_data),
                                       ('array 256 doubles', doubles),
                                       ('array 256 unicode', unicode_strings),
                                      ])
def test_gen(name, data):
...

Is there any trick I'm missing?
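
One trick that avoids nested test generation is stacking parametrize decorators, so the data set simply becomes another parameter. A sketch reusing the names from the snippets above; if your plugin version supports grouping by parameter, the report can then be split per data set:

import pytest

@pytest.mark.parametrize('data', [default_data, doubles, unicode_strings],
                         ids=['default data', 'array 256 doubles', 'array 256 unicode'])
@pytest.mark.parametrize('serializer', [c[1] for c in contenders],
                         ids=[c[0] for c in contenders])
def test_serializer_benchmark(serializer, data, benchmark):
    benchmark(serializer, data)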
