sosy-lab / benchexec Goto Github PK

BenchExec: A Framework for Reliable Benchmarking and Resource Measurement

License: Apache License 2.0

Python 89.92% HTML 0.10% Shell 0.56% Gnuplot 0.35% PHP 0.29% TeX 0.52% Roff 0.20% JavaScript 5.82% CSS 0.04% SCSS 1.75% C 0.22% Jinja 0.02% Dockerfile 0.22%

resource-measurement python cgroups linux benchmarking benchmark-framework benchmark

benchexec's Introduction

BenchExec

A Framework for Reliable Benchmarking and Resource Measurement

News and Updates:

BenchExec is part of Google Summer of Code again! If you are interested in being paid by Google for contributing to BenchExec, check our project ideas and instructions and contact us!
BenchExec 3.18 brings support for systems with cgroups v2!
Linux kernel 5.11 finally makes it possible to use all BenchExec features on distributions other than Ubuntu!
We now provide an Ubuntu PPA that makes installing and upgrading BenchExec easier (docs).
An extended version of our paper on BenchExec and its background was published as open access in the journal STTT, you can read Reliable Benchmarking: Requirements and Solutions online. We also provide a set of overview slides.

BenchExec provides three major features:

execution of arbitrary commands with precise and reliable measurement and limitation of resource usage (e.g., CPU time and memory), and isolation against other running processes
an easy way to define benchmarks with specific tool configurations and resource limits, and automatically executing them on large sets of input files
generation of interactive tables and plots for the results

Unlike other benchmarking frameworks, BenchExec is able to reliably measure and limit resource usage of the benchmarked tool even if the latter spawns subprocesses. In order to achieve this, it uses the cgroups feature of the Linux kernel to correctly handle groups of processes. For proper isolation of the benchmarks, it uses (if available) Linux user namespaces and an overlay filesystem to create a container that restricts interference of the executed tool with the benchmarking host. BenchExec is intended for benchmarking non-interactive tools on Linux systems. It measures CPU time, wall time, and memory usage of a tool, and allows to specify limits for these resources. It also allows to limit the CPU cores and (on NUMA systems) memory regions, and the container mode allows to restrict filesystem and network access. In addition to measuring resource usage, BenchExec can verify that the result of the tool was as expected, and extract further statistical data from the output. Results from multiple runs can be combined into CSV and interactive HTML tables, of which the latter provide scatter and quantile plots (have a look at our demo table).

BenchExec works only on Linux and needs a one-time setup of cgroups by the machine's administrator. The actual benchmarking can be done by any user and does not need root access.

BenchExec was originally developed for use with the software verification framework CPAchecker and is now developed as an independent project at the Software Systems Lab of the Ludwig-Maximilians-Universität München (LMU Munich).

Literature

Reliable Benchmarking: Requirements and Solutions, by D. Beyer, S. Löwe, and P. Wendler. International Journal on Software Tools for Technology Transfer (STTT), 21(1):1-29, 2019. doi:10.1007/s10009-017-0469-y. Journal paper about BenchExec (open access, also with a supplementary webpage and overview slides)
CPU Energy Meter: A Tool for Energy-Aware Algorithms Engineering, by D. Beyer and P. Wendler. In Proc. TACAS 2020, part 2, LNCS 12079, pages 126-133, 2020. Springer. doi:10.1007/978-3-030-45237-7_8
Benchmarking and Resource Measurement, by D. Beyer, S. Löwe, and P. Wendler. In Proc. SPIN 2015, LNCS 9232, pages 160-178, 2015. Springer. doi:10.1007/978-3-319-23404-5_12

License and Copyright

BenchExec is licensed under the Apache 2.0 License, copyright Dirk Beyer. Exceptions are some tool-info modules and third-party code that is bundled in the HTML tables, which are available under several other free licenses (cf. folder LICENSES).

Authors

Maintainer: Philipp Wendler

Contributors:

Aditya Arora
Levente Bajczi
Dirk Beyer
Laura Bschor
Thomas Bunk
Montgomery Carter
Andreas Donig
Karlheinz Friedberger
Robin Gloster
Peter Häring
Florian Heck
Hugo
George Karpenkov
Mike Kazantsev
Michael Lachner
Thomas Lemberger
Sebastian Ott
Stefan Löwe
Stephan Lukasczyk
Alexander von Rhein
Alexander Schremmer
Dennis Simon
Andreas Stahlbauer
Thomas Stieglmaier
Martin Yankov
Ilja Zakharov
and lots of more people who integrated tools into BenchExec

Users of BenchExec

BenchExec was successfully used for benchmarking in all instances of the international competitions on Software Verification and Software Testing with a wide variety of benchmarked tools and hundreds of thousands benchmark runs. It is integrated into the cluster-based logic-solving service StarExec (GitHub).

The developers of the following tools use BenchExec:

CPAchecker, also for regression testing
Dartagnan
ESBMC, also for regression testing and even with a GitHub action for BenchExec
SMACK
SMTInterpol
TriCera
Ultimate

If you would like to be listed here, contact us.

benchexec's People

Contributors

Stargazers

Watchers

Forkers

montycarter martin-neuhaeusser petr-muller cheshire tautschnig mchalupa seahorn zmanchun williamerocha mitchellgerrard smackers letonchanh cryingshadow bjowac peterschrammel mutilin michkot martinhruska wwang0857 ollisaarikivi vlstill gildotomasco wanderseeme ultimate-pa hgoes pablo-aledo lequangloc jhensel jwacalex ottseb niyasc anushrijana trucnguyenlam nakwon-lee berryjam vmordan staticafi mllg lembergerth pombredanne rainoftime schrodibear verifierintegerassignment duanzhao-dz hugovk fheck avritichauhan altmattr hussamaa malteschledjewski kalpit4088 rishabhnambiar kumaradilshan nilay117 farian martinspiessl jkeiren xuechaofelix cedricrupb javapathfinder adi-16-7 versokova hainingbaby yinliangze xvitovs1 bitcalc franck44 etichaudhary hbgit sruland martinnowack carolemieux animeshbchowdhury xurongchen cyberdrudge vishalbhosale5 shaobo-he yshalabi kishorekumar1505028 blizzard4591 jansvejda gaybro8777 gernst vaibhavbsharma tudo-aqua raveenkm hoangmle bquiring jacogeld dennissimon rasoolmaghareh omainv brick-tool-dev tewk lachnerm sim642 cajomferro ftsrg priyankadarke donggeliu

benchexec's Issues

html-tables: escaping of links does not work any more

see broken links to logfiles in BuildBot, where # should be replaced through %23.

--offline option does not work for javascript

(or I don't understand it correctly).

The python code says that if "--offline" is passed in, instead of CDN URL, the local URL is used for JS libraries, yet no JS libraries are contained in the repository?..
Should we choose to package some (reasonable?) version of jQuery/etc together with benchexec?

Lazy-load JavaScript libraries where possible

The HTML tables produced by table-generator include several libraries like JQPlot and zip.js that are only sometimes necessary. These could be loaded on demand to reduce the initial loading time of the table.

Users need the way to specify their own wall time limits in benchmarks

At the moment all benchmarks implicitly have the default wall time limit that is 30 seconds greater than a specified CPU time limit. Likely users need/want develop new benchmarks with explicit wall time limits. For old-style benchmarks new versions of BenchExec can issue warnings if wall time limits aren't specified.
BTW, runexec already has an option to specify wall time limits explicitly, so its direct users can use it already.

Report problems during benchmarks

The benchmark scripts should detect if some possible problems occurred during benchmarking and issue a warning to the user.

The following problems could be detected:

swapping: the memory cgroups allow to monitor whether swapping occurs for any of the processes in each cgroup (file memory.stat), so we could detect this both for the benchmarked process and for the whole system.
thermal throttling of the CPU due to overheating: can be detected from the files /sys/devices/system/cpu/cpu*/thermal_throttle/*_throttle_count

Add button to shrink/expand table header in HTML tables

On small monitors, the header of HTML tables takes up a lot of valuable screen space.

We could provide a button that hides most of the header rows, probably all except for the row "run set" and the row with the actual column names.

Implement wall time limit for systems without cpuacct cgroups.

A wall time limit does not actually require the cpuacct cgroup, so the implementation should be made more flexible and allow specifying such a limit even without the cpuacct cgroup.

Score should not be SV-Comp specific

BenchExec should support multiple scoring schemata, and use the appropriate one depending on the property. If there is no scoring schema for a property, the row "Score" should be hidden from the tables.

Allow multiple executions of same command with runexec

To increase confidence in time results, it may sometimes be useful to run the command multiple times, and report all results, as well as mean/median/standard deviation etc. This could be implemented in runexec, similarly to what dumbbench does.

Table-generator fails to format very small values

Table-generator crashes because of the assertion in line 234 of columns.py if the value is very small (< 0.00001) (example file that causes the crash)

Assignment of non-sibling processors to a single task

Hi,

when running a tool which uses multiple cpu cores, BenchExec is assigning sibling cores to a single run even though non-sibling cores are available. Is this an intended behavior?

In particular, I have a benchmark with <benchmark cpuCores="4" threads="1" ...> and the output of BenchExec is

2016-01-21 17:07:28,707 - DEBUG - List of available CPU cores is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23].
2016-01-21 17:07:28,709 - DEBUG - Physical packages of cores are defaultdict(<class 'list'>, {0: [0, 1, 2, 3, 4, 5, 12, 13, 14, 15, 16, 17], 1: [6, 7, 8, 9, 10, 11, 18, 19, 20, 21, 22, 23]}).
2016-01-21 17:07:28,710 - DEBUG - Siblings of cores are {0: [0, 12], 1: [1, 13], 2: [2, 14], 3: [3, 15], 4: [4, 16], 5: [5, 17], 6: [6, 18], 7: [7, 19], 8: [8, 20], 9: [9, 21], 10: [10, 22], 11: [11, 23], 12: [0, 12], 13: [1, 13], 14: [2, 14], 15: [3, 15], 16: [4, 16], 17: [5, 17], 18: [6, 18], 19: [7, 19], 20: [8, 20], 21: [9, 21], 22: [10, 22], 23: [11, 23]}.
2016-01-21 17:07:28,710 - DEBUG - Going to assign at most 1 runs per package, each one using 4 cores and blocking 4 cores on 1 packages.
2016-01-21 17:07:28,711 - DEBUG - Final core assignment: [[0, 1, 12, 13]].

However, cores 0 and 12 are sibling virtual cores mapped to the same physical core (and so are 1 and 13). Shouldn't BenchExec choose a core assignment [[0,1,2,3]], since cores 2 and 3 are available and not siblings of any of used cores?

If this is the case, I think the problem is caused by an assignment on the line 181 of resources.py, where all sibling cores are assigned to be used. More reasonable behavior from my point of view would be to assign only the particular core and block all of its siblings, if there is enough of physical cores available. For example, the following change fixes the problem in my case:

if core not in cores_with_siblings:
    cores.append(core)
    cores_with_siblings.update(c for c in siblings_of_core[core] if not c in used_cores)

-N parameter lead to crash

I found that if I use -N parameter for parallel run, results are much much more different compared to that without -N parameter, and most are crash.

Direct plot with "correct results only" does not work

http://buildbot.sosy-lab.org/results/integration-policyIteration/1767.-r17671%23integration-policyIteration.2015-08-24_1806.results.html
Click on abstraction time -> direct plot -> correct results only.
Maybe I'm doing something wrong, but I always get this result on all result sets I get.

Table Enhancement: Frozen column and row headers

It would be really nice to have tables generated with frozen or fixed column and row headers. With this, the benchmark file row headers on the left would always be visible, in addition to the tool/execution info column headers always being visible on top.

This is by no means a feature I'm waiting on, so for my purposes it is very low priority, and simply an enhancement. I just thought it would be nice, as we've started to generate real results, and I've noticed it's difficult to figure out which benchmark input file is causing an erroneous result when there are several runs displayed in the same table.

I haven't really looked at what's involved, and whether this is an extensive overhaul of table-generator. If it's a big job, don't worry about it...

Derived columns in tables

Provide a way to have "derived" columns in tables, i.e. columns with values that are computed from other column values. First step would be to be able to use only the values of the same run and the same run set as input.

As calculation language we could simply use Python with eval, but this is insecure, so maybe it should be enabled only if a special command-line flag is given. Alternatively we could implement our own expression evaluation language.

This is how it could look like:

<column title="column title" type="derived">calculation code</column>

Alternative names for "derived" are "calculated" or "computed".

Columns should probably be able to be marked as "hidden", which would they do not appear in the tables but are available as input to derived columns.

Upgrade JavaScript libraries

JQuery and JQPlot are required by the table-generator, but are quite old versions.

Incorrect summary stats when modifying status column 'value'.

This is by no means necessary, and is a result of me modifying the output xml files, so I am not sure whether this should actually be changed as I suggest, or if there is perhaps another way to accomplish my goal.

To create an output like the final results on the svcomp webpage, setting the status column value to "witness confirmed", "witness timeout", etc breaks the stat counting on the new, more fine-grained results summary. (See attached image)

It looks like in tablegenerator/init.py, in the get_stats_of_run_set(), table-generator determines whether each item was supposed to be true or false by using the status column directly if CATEGORY is CORRECT, and negating the status column if CATEGORY was WRONG.

As a result, if the status column 'value' isn't in results.py's RESULT_LIST, the result won't count towards any of the four combinations of correct/incorrect true/false.

If expected results were generated using the sourcefile's name and associated property file (using satisfies_property_file, I think), then it wouldn't matter what value is in the status column's value field, and the counting for things like this would work.

Attached is an example of where this is problematic for me. The two benchexec runsets are actually the same run, however, one has had all false files witness checked. Notice that the scores are the same, but correct/incorrect counts are not identical.

Filter for only differing and only equal rows in HTML tables

The "Filter Rows" possibility should be extended to allow easily filtering for rows that have differing values in their status columns, just like the "diff" table that table-generator also produces.

The negation of this filter (only rows where all status values are equal) can be also useful for graphs and statistics and should thus also be added.

permission to add wrapper script for hiprec

Dear Dirk Beyer,
I am Le Quang Loc, from NUS- SIngapore. This year, our team participate SVCOMP with hiprec.

May I ask you the permission to commit a wrapper into the project?

Thank you,
Loc

Table-generator has formatting issues related to shared state

When replacing the usage of concurrent.futures.ProcessPoolExecutor with concurrent.futures.ThreadPoolExecutor or with Util.DummyExecutor, the output of table-generator changes with regard to some formatting of HTML table values.

I do not know which of the two behaviors is the correct one, but clearly this is unintended.
I suspect it is the case because some of the Column objects get modified, and when there are multiple copies of these objects (one in each worker process), the behavior is different from the case where we have shared instances among all worker threads.

Introduce Comparison Column to Output Tables

I'm not sure if this idea fits within the direction you guys would like to take BenchExec, however, I like to get my ideas written down because I forget them 15 minutes later if I don't. If this doesn't fit within the direction of BenchExec, please feel free to close this straight away.

Anyway, it might be useful to be able to compare two executions. What I have envisioned is an additional column for a table with two tool runs. In this additional column, the rows for each benchmark could detail information such as difference in runtime (or speedup as a ratio), memory usage, and discrepancy in execution result (correct vs wrong, etc). In the execution summary at the bottom, different metrics could be used for comparing the two executions as a whole, such as difference in aggregate runtime, aggregate speedup, aggregate accuracy, etc.

In fact, instead of building special tables with this additional column, two columns could be selected for comparison dynamically (possibly similar to a "Compare Items" functionality on an e-commerce website), and the execution comparison column can be displayed with javascript alone, without having to alter the output table from tablegenerator, since all required information is already present in the table. (That is, add the javascript code to compare execution columns to all tables generated by tablegenerator, and then users can activate the functionality by selecting two columns to compare.)

This idea has no urgency - I just thought it would be a nice enhancement.

outputPath & base_dir incorrect for no output file (html to stdout via option "-o -")

When using the "-o -" option, file paths for benchmark source files and run result logfiles are incorrect.

At least for my purposes, all file paths referenced by the html sent to stdout should ALWAYS be relative to the current working directory. I use this "-o -" to build a table on the fly that is served as if the html physically existed in the directory where table-generator was called.

When using the "-o -" option, if paths referenced by the html sent to stdout aren't always relative paths rooted in the current working directory from where table-generator was called, then we need a way to control the base directory for relative paths.

Here's a little more detail regarding the problem. I don't have a patch to propose, since I'm not sure how everyone is using this "-o -" option... I put LOTS of detail here, so only read as much as you need to understand the problem. ;)

All functions mentioned are from ./benchexec/tablegenerator/init.py.

In the function insert_logfile_names(), the path to each log file is added to its corresponding element in the resultFile xml tree. This is done using a relative path rooted in the current working directory where table-generator was executed. (This is good, for my purposes).

However, before table-generator injects things into the template, this base directory gets recalculated. In main(), on line 1242 of ./benchexec/tablegenerator/init.py, the 'outputPath' is changed to be the same directory as all input xml files if all input xml files are in the same directory. If all input xml files are not in the same directory, then 'outputPath' is set to the 'DEFAULT_OUTPUT_PATH', which is 'results/'.

As a result, for the case when input xml files come from different folders, when the relative path is calculated for logfiles (and for original input sourcefiles, for that matter), a "../" is prepended to each file, to step back a level from this "results/" directory. For the case when input xml files come from the same folder, the relative path to sourcefiles and logfiles is adjusted to be relative to this common folder. In either case, the path is not correct to the actual location of the folders, unless the input xml files happened to be in the same relative folder structure expected by the if branch on line 1242.

To make this more concrete, here is my setup.

We have a main 'data' folder. In this folder, we have several folders matching the pattern "exec_". Each of these contains an input.xml file, and a results folder. In the results folder are the output xml files containing results, as well as the logfiles directory (this results folder is generated by BenchExec, so the contents of this folder are as BenchExec creates them). From the main 'data' folder, a script called results.py is executed, which searches through all of the "exec_" folders to find output xml files. results.py calls table-generator from this main 'data' folder, and passes in the output xml files selected by the end user to be included in the table (all output xml files are relative to this 'data' folder). Finally, all input benchmarks are located in data/sv-benchmarks.

Our apache virtual directory (called "dev" at the moment) is rooted at this data folder. The user opens ./results.py, passing in the filter results which determine which output xml files will be included in the table. The resulting html from table-generator needs to reference all files relative to this virtual directory root (in this case, ./data/), or else the correct directory isn't selected. With the current problem, benchmarks end up getting referenced as something like http:///dev/../../sv-benchmarks/ssh/s3_srvr.blast.03_false-unreach-call.i.cil.c. Similarly, the logfile link in the 'status' column gets something like http:///dev/smack.2015-05-28_1417.logfiles/corral_u8.s3_srvr.blast.03_false-unreach-call.i.cil.c.log.

Respectively, these two paths need to be http:///dev/sv-benchmarks/ssh/s3_srvr.blast.03_false-unreach-call.i.cil.c and http:///dev/exec_/results/smack.2015-05-28_1417.logfiles/corral_u8.s3_srvr.blast.03_false-unreach-call.i.cil.c.log

Soft memory limit

Similarly to time limits, BenchExec should support a "soft" memory limit. When it is reached and the kernel notifies us, we should send the TERM signal to the process, increase the memory limit, and let the process continue until it terminates or reaches the hard memory limit.

Support for checking output of SMT solvers

When benchmarking SMT solvers, BenchExec should be able to verify the correct result (sat/unsat) as given by the solver in SMTLib 2 format.

Improper usage of sys.exit()

I noticed that, especially in runexecutor.RunExecutor, sys.exit() is used to handle error cases quite often. It's fine to simply exit the interpreter from code which is only meant to be run standalone but inside code which may be imported this seems to be very improper. Third party code should be given a chance to handle such cases properly by raising something derived from Exception (you could catch SystemExit but this feels just wrong here).

How to add dynamic library?

If my archive has dynamic library that some of my binaries depend on, should I copy the dynamic library into /usr/local/lib in the very beginning of my executable?

Support tables with multiple results for the same file in a single column

For regression tests of a project, it is convenient to have a benchmark definition with multiple run definitions with different configurations. In order to create a table for checking for regressions, it is convenient to put the results for all configurations below each other in a single column, and have multiple columns with results from different revisions.

This is supported by the table-generator with the union tag in the table-definition file, but it does not support this if each of the run definitions is run on the same files (each file belongs to exactly one row in the tables). This should be changed to allow such a table:

	revision 0	revision 1
rundef1.file1	...	...
rundef1.file2	...	...
rundef2.file1	...	...
rundef2.file2	...	...

Unable to install git version

Hi, I'm trying to install the git-version of benchexec with the following command

pip3 install --user git+https://github.com/sosy-lab/benchexec.git

However, I obtain this error, which I don't fully understand:

Downloading/unpacking git+https://github.com/sosy-lab/benchexec.git
  Cloning https://github.com/sosy-lab/benchexec.git to /tmp/pip-0hi2gv95-build
  Running setup.py (path:/tmp/pip-0hi2gv95-build/setup.py) egg_info for package from git+https://github.com/sosy-lab/benchexec.git
    no previously-included directories found matching 'doc/.build'
    warning: build_py: byte-compiling is disabled, skipping.

    warning: install_lib: byte-compiling is disabled, skipping.

    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/tmp/pip-0hi2gv95-build/setup.py", line 87, in <module>
        zip_safe = True,
      File "/usr/lib/python3.4/distutils/core.py", line 108, in setup
        _setup_distribution = dist = klass(attrs)
      File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 239, in __init__
        self.fetch_build_eggs(attrs.pop('setup_requires'))
      File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 264, in fetch_build_eggs
        replace_conflicting=True
      File "/usr/lib/python3/dist-packages/pkg_resources.py", line 620, in resolve
        dist = best[req.key] = env.best_match(req, ws, installer)
      File "/usr/lib/python3/dist-packages/pkg_resources.py", line 858, in best_match
        return self.obtain(req, installer) # try and download/install
      File "/usr/lib/python3/dist-packages/pkg_resources.py", line 870, in obtain
        return installer(requirement)
      File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 314, in fetch_build_egg
        return cmd.easy_install(req)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 616, in easy_install
        return self.install_item(spec, dist.location, tmpdir, deps)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 646, in install_item
        dists = self.install_eggs(spec, download, tmpdir)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 834, in install_eggs
        return self.build_and_install(setup_script, setup_base)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1045, in build_and_install
        eggs.append(self.install_egg(dist.location, setup_base))
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 871, in install_egg
        (os.path.basename(egg_path),os.path.dirname(destination)))
      File "/usr/lib/python3.4/distutils/cmd.py", line 336, in execute
        util.execute(func, args, msg, dry_run=self.dry_run)
      File "/usr/lib/python3.4/distutils/util.py", line 301, in execute
        func(*args)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1131, in unpack_and_compile
        self.byte_compile(to_compile)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1147, in byte_compile
        byte_compile(to_compile, optimize=0, force=1, dry_run=self.dry_run)
      File "/usr/lib/python3.4/distutils/util.py", line 355, in byte_compile
        raise DistutilsByteCompileError('byte-compiling is disabled.')
    distutils.errors.DistutilsByteCompileError: byte-compiling is disabled.
    Complete output from command python setup.py egg_info:
    no previously-included directories found matching 'doc/.build'

warning: build_py: byte-compiling is disabled, skipping.



warning: install_lib: byte-compiling is disabled, skipping.



Traceback (most recent call last):

  File "<string>", line 17, in <module>

  File "/tmp/pip-0hi2gv95-build/setup.py", line 87, in <module>

    zip_safe = True,

  File "/usr/lib/python3.4/distutils/core.py", line 108, in setup

    _setup_distribution = dist = klass(attrs)

  File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 239, in __init__

    self.fetch_build_eggs(attrs.pop('setup_requires'))

  File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 264, in fetch_build_eggs

    replace_conflicting=True

  File "/usr/lib/python3/dist-packages/pkg_resources.py", line 620, in resolve

    dist = best[req.key] = env.best_match(req, ws, installer)

  File "/usr/lib/python3/dist-packages/pkg_resources.py", line 858, in best_match

    return self.obtain(req, installer) # try and download/install

  File "/usr/lib/python3/dist-packages/pkg_resources.py", line 870, in obtain

    return installer(requirement)

  File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 314, in fetch_build_egg

    return cmd.easy_install(req)

  File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 616, in easy_install

    return self.install_item(spec, dist.location, tmpdir, deps)

  File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 646, in install_item

    dists = self.install_eggs(spec, download, tmpdir)

  File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 834, in install_eggs

    return self.build_and_install(setup_script, setup_base)

  File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1045, in build_and_install

    eggs.append(self.install_egg(dist.location, setup_base))

  File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 871, in install_egg

    (os.path.basename(egg_path),os.path.dirname(destination)))

  File "/usr/lib/python3.4/distutils/cmd.py", line 336, in execute

    util.execute(func, args, msg, dry_run=self.dry_run)

  File "/usr/lib/python3.4/distutils/util.py", line 301, in execute

    func(*args)

  File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1131, in unpack_and_compile

    self.byte_compile(to_compile)

  File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1147, in byte_compile

    byte_compile(to_compile, optimize=0, force=1, dry_run=self.dry_run)

  File "/usr/lib/python3.4/distutils/util.py", line 355, in byte_compile

    raise DistutilsByteCompileError('byte-compiling is disabled.')

distutils.errors.DistutilsByteCompileError: byte-compiling is disabled.

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip-0hi2gv95-build
Storing debug log for failure in /home/mint/.pip/pip.log

Can someone help me with this issue?.

Regards.

Generate "max score" value in tables

The table-generator should compute the maximal achievable score for the benchmarks in a given table and add it to the HTML table. To do this, it needs to know the property that was verified for each benchmark, because the score depends on the verdict for the property. Thus BenchExec needs to add the property to the output XML file.

Round numbers in tables to significant digits

All numeric measurement results shown in the HTML tables should be rounded to a given number of significant digits instead of to a fixed number of decimal places, e.g., 12.3 and 1.23 instead of 12.344 and 1.234.

The numbers should stay aligned at the decimal point. This could be done by computing the maximum width of all numbers in a column and filling the remaining places of each numbers with white space. The characters   (FIGURE SPACE) and   (PUNCTUATION SPACE) are the correct whitespace characters for same-width replacements of digits and dots, respectively.

For example, the value 1234.567 could become 1230     (1230 ).

HTML	Shown as
`1230    `	`1230`
` 123    `	`123`
`  12.3  `	`12.3`
`   1.23 `	`1.23`
`    .123`	`.123`

Feature Request - Allow Memory Oversubscription

It would be really nice to be able to oversubscribe memory. I realize this causes results to be unreliable, however, in preparation for svcomp it would be nice to more fully utilize our servers.

For example, we have some machines with 32 cores (64 with HT) and 128GB. To simulate svcomp, we need each concurrent benchmark to have 15GB ram, which means we can only run 8 benchmarks concurrently. However, visually monitoring total system memory usage never really shows more than around 32GB in use at a time.

It would be nice to have a mode for benchexec where memory can be oversubscribed. Since it is unlikely that all concurrent benchmarks will peak their memory consumption at the same time, it would allow us to run more benchmarks concurrently. For example, (on this system) allow 16 concurrent benchmarks to run, each with 15GB of dedicated memory, despite there only being 128GB of physical memory.

If it does happen that all physical memory gets used, it would be fine for my purposes if benchexec just aborted. (This is probably preferable to allowing things to continue while utilizing swap space, since this could introduce timeout results that wouldn't be there without memory oversubscription).

Perhaps this isn't feasible with cgroups, but if the total memory outlay enforcement is in benchexec rather than cgroups, it would be nice to be able to more full utilize the hardware.

To give an idea of my imagined workflow, we would run lots of benchmarks concurrently as we fix bugs in our tool. Each bug fix will warrant re-running benchmarks so we can ensure the bug fix has resolved the issue (and hasn't introduced new bugs). During this time, the performance (runtime, memory consumption) isn't particularly important. This is when oversubscription would be useful.

After we freeze bug fixes to the tool, we would want to rerun things, but more faithfully simulate svcomp. During this time, performance would be relevant, so we wouldn't want to oversubscribe memory.

Closing overlay panes of HTML tables is broken

Since 4ae5dde the overlay panes of HTML tables generated by the table-generator are broken: they cannot be closed anymore, neither by clicking beyond them on the background, neither by opening another pane.

XML files should use a DTD declaration

The example XML files in the documentation do not contain a <DOCTYPE> tag, although there is a DTD available that could be specified. This would allow automatic validation of the files.

To add this, we should first think about the best way to specify DTDs, probably defining a canonical URL for them.

Restrict and document environment variables

There is a paper that shows that the size of the environment variables of a process can have a non-predictable impact on the performance. In order to avoid this and increase the reproducibility, BenchExec should probably restrict the set of environment variables to a minimally necessary set and document their values in the result file.

It should also be considered whether environment variables like LANG, which might control the behavior of many programs, should be fixed to specific values by default.

Support benchmark sets with files with equal names

Currently, each input file in a benchmark set needs to have a unique name, it is not sufficient to have equally-named files in different directories. The reason is that the file name is used for the log-file name. BenchExec should support having files dir1/file.c and dir2/file.c in a single benchmark.
This can be achieved by putting the log files into ...logfiles/dir1/file.c.log and ...logfiles/dir2/file.c.log instead of ...logfiles/file.c.log.

Make tables better usable

Many features in the HTML tables created by table-generator are quite hidden.

There should be a bar at the top of the table (preferably one that does not scroll) with entries for the features "select columns", "filter rows", "quantile plot", and "scatter plot".
Without JavaScript, the bar should be hidden, and it should be printed.

In order to use quantile plots from the table, the quantile-plot window should get a drop-down box for selecting one of the available columns, like the scatter-plot window has.

Improve core assignment with multiple NUMA regions per socket

If a machine has multiple NUMA regions per socket, BenchExec should try to distribute the runs across all NUMA regions of each socket. Currently it first uses the cores that belong to one NUMA region, then those of the second region etc.

Feature request: additional command line options

Hi,
I often find myself editing the XML file repeatedly to turn on/off some options, which gets error prone.
What if we allow to set properties on the command line? E.g. we can say that all options after -- gets passed to the binary, as in e.g. $ benchexec blah.xml -- -setprop optionIWant=True.

In a similar vein, I think it would be very useful to have an option which would allow to set a name for the output .xml file, as again often it is tiresome to create a new config file just to set one option to true.
Something like $ benchexec blah.xml --out=myresults.xml

Thoughts?

Columns with links to output file in HTML tables

Provide a way to add columns to the HTML tables that contain a link to an output file of the tool, if it exists. This could be specified for example like this:

<column title="column title" type="link">file/name.txt</column>

Human-readable memory consumption

Hi,

For benchexec tables I find it very difficult to read memory usage values like "3442581504".
I can only assume it's bytes (units are written in the brackets for time, but not for memUsage!) and I would much rather prefer something like "3.5GB".
Most unix tools have this ability, e.g. check du -h

Add "sudo" functionality to benchmark scripts

Running a benchmarked tool under a different user account than the benchmarking framework prevents the tool from (accidentally) messing with the resource measurements, and is thus (slightly) preferable. It should be relatively easy to add this to the scripts, at least such that it is usable if the administrator configured sudo appropriately.

Update statistics in HTML tables when filtering rows

The statistics in the footer of the HTML tables should get automatically recomputed after the row filter was changed. For this the raw data values need to be added to the HTML, possibly as additional data attributes to the table cell tags.

The statistics computation inside the table-generator should still stay such that the statistics are also available if JavaScript is blocked in the browser, and to not delay loading the table further.

Where to submit xml configuration?

I didn't found information about how to submit the task definition xml? Maybe I missed some information...

add "requiredfiles_per_task" in benchmark.xml (especially for VCloud)

Instead of distributing all "requiredfiles" onto all workers (this runs in O(#files * #workers)!), it would be sufficient to send "requiredfiles_per_task" only to the worker for this task (this should need only O(#files)). This is beneficial for witness-checking, precision-re-usage, and everything related to re-using task-based information from a previous benchmark as input for new benchmark-tasks.

The substitution of the substring "${sourcefile_name}" in the "requiredfiles_per_task"-element has to be evaluated once per task.

This issue has to be implemented in benchexec first, then it can be used by other scripts based on benchexec. This issue also requires to modify the script vcloud.py at CPAchecker to simply write the "requiredfiles_per_task" at the end of the task-lines (just like an additional sourcefile, tab-separated). The VCloud itself does not need to be changed.

Example:
By replacing (1) by (2) the files would only be used for the task itself, because only there the name is matching (instead of the wildcard "*" that matches all files.

(1) <requiredfiles>../../test/results/benchmark.2015-05-27_1105.logfiles/test.*.files/output.txt</requiredfiles>
(2) <requiredfiles_per_task>../../test/results/benchmark.2015-05-27_1105.logfiles/test.${sourcefile_name}.files/output.txt</requiredfiles_per_task>

Use tool-specific code for adding columns to benchmark result

Currently, both benchexec and table-generator support <column> tags which allow to add values from the tool output to the table. However, only benchexec uses a method from the tool-specific wrapper script. table-generator supports this feature only if the output is in the same format as CPAchecker (label: value on stdout). table-generator should use the same methods as benchexec, to allow this feature independent of the benchmarked tool.

tablegenerator: support both table definition and raw result file(s) as arguments

Hi,

Re-using table definitions across different benchexec runs is slightly annoying, because the produced filename keeps changing (post-fixed with the date), and it is not user-friendly to have to edit the table definition each way.
Would be nice if I could instead specify logfile locations using an additional command line argument to table-generator.

Make back button of browser close overlay pane in HTML panels

Some users expect to be able to click on a link (such as the log files in tables) and use the back button to get back to the original page, but because we open the target of the link in an overlay pane with JavaScript, this is not possible. As it is possible to manipulate the back button of the browser through JavaScript, we should do so and make the back button close the overlay pane (or, even better, let the overlay pane show the previous content if there is one).

Compress results

BenchExec results, especially the tool output, can take up a large amount of disk space. Additionally, the large number of small files can make file systems slow.

The following features should be added to avoid these problems:

benchexec should get a parameter --compress-results that compresses the XML result files with BZip2 and puts all log files into a ZIP file.
table-generator should be able to read column values from zipped log files.
serveFileFromZIP.php can be used to make log-file links in HTML table work if they reside on a web server, this should be documented.
For local viewing of tables and log files, some JavaScript should be added to the table that can read a log file from the ZIP archive.

Provide check whether cgroups are available

An automatic script that checks whether cgroups are available, the permissions are correct etc. can be useful. This script should also check whether there is any daemon that interferes with the cgroups of a process (as cgrulesengd may do depending on its configuration) and warn the user if this happens.

Debian package for BenchExec

We should find a way to automatically generate Debian packages for BenchExec in our build.