Here's a long list of what thorough testing support could look like. It is more about tooling support than language features.
1] It should be really easy to write a new test.
TEST()
{
assert(2 + 2 == 4);
}
TEST()
{
assert(2 + 2 == 5);
}
The main point is: one doesn't need to invent a unique name for a test.
The usual alternatives:
@test
void foo()
{
assert(2 + 2 == 4);
}
or:
void test_foo()
{
assert(2 + 2 == 5);
}
give the false perception that a test is a function (it is not; no one calls a test). Inventing a name is hard in large programs, and the IDE browser may be cluttered with these useless names.
Uppercase TEST makes it easy to spot.
2] Tests may have optional parameters, for example a maximum expected time for the test. If it takes longer, there's something wrong and an error could be shown.
Test parameters are not like ordinary function parameters; they should be free form. E.g.:
TEST() // no parameters
{
}
TEST(time < 10 ms) // one parameter
{
...
}
TEST(10 ms < time < 30 ms) // still only one parameter
{
...
}
// two parameters: a name (if we'd like to run the test individually) and a max time, plus some separator
TEST(name = "xyz" | time < 1 s)
{
...
}
// three parameters
TEST( qwerty | asdf | x = true)
{
...
}
There should be several implicit parameters available:
- something like __FILE__/__LINE__ for every test
- how long ago (e.g. in seconds) was the source file with current test modified. Useful when running only tests from recently modified source files.
3] The last component of the test system is the "test runner". It collects all tests, reads their parameters (if any), parses them (giving an error if it doesn't understand them) and then executes all or some of the tests.
There should be a default test runner, which understands a couple of the most common parameters (name, timeout, ...).
Its API could be:
// returns # of executed tests,
uint run-all-tests(bool show_results_dialog_on_success);
uint run-recent-tests(uint recent_in_seconds, bool show_results_dialog_on_success);
The default test runner should be able to verify timeout constraints, if present. A default timeout could be e.g. 1 second.
4] A custom test runner may be used e.g. to run tests that check for performance regressions. These tests would be identified by a parameter, their durations would be stored somewhere, and if one gets worse, the programmer would be notified.
Another use case for a custom test runner is code coverage. Certain tests could be labeled so, and run only when code coverage is needed (because they would take a lot of time).
Another use case is to exclude platform specific tests.
5] It should be up to the programmer to invoke the test runner at the beginning of the application: either the default one (e.g. run-recent-tests) or a custom variant.
There should be no special "test invocation mode" for the compiler. It is a hassle, it complicates things, it is inflexible. Just let the programmer do it explicitly in main, exactly as he wishes.
6] Where should tests be placed? There are several options:
- after the relevant code (e.g. after a function)
- at the end of the source file (this one I do not recommend; it gets messy very fast)
- in a separate file (not to overcrowd the source file)
IMO the programmer should have a choice. The most important small tests could be placed after the functions; long tests checking minutiae could be put into a separate file. Let's call it a companion test file.
A companion test file should have a name similar to its "parent source file" and full access to it, w/o need to import anything. It should behave as if it were copy-pasted at the end of the source file.
A companion test file should define only tests (plus maybe helpers), should not export anything, and should not be importable elsewhere. It should be just a convenient storage for tests, nothing else.
7] Tests should have access to everything. Nothing should be private to them.
8] Since the language plans to have generics, there could be tests for uncompilable code:
// this parameter says the code inside the test must NOT compile
TEST(does-not-compile)
{
...
}
The parameter does-not-compile cannot be combined with any other parameter.
The compiler should make sure the code really doesn't compile, but NOT because of some trivial error, like unbalanced parentheses or an invalid name.
9] If possible, a test should be able to use any part of the project, w/o explicit need to import anything. This is to keep the code clean, not to pollute it because of the tests. If all tests are removed, the rest of the code should not need any modification.
E.g.
TEST()
{
// this should not require an explicit import, if the compiler is able to infer what it is
SomethingFromThisProject x;
assert(foo() == x);
}
I'm not sure whether this is feasible, but it would help a lot.
10] With the usual compilation modes debug and release, it should be possible to use unit tests both in debug and in release (e.g. for performance checks, and to make sure optimization didn't screw something up).
11] All tests should do memory checking. Whatever was allocated within a certain test should also be deallocated right there. I implemented such a feature in a C and C++ testing library, and it does wonders to keep an application free of leaks.
I know there are big complicated tools that try to do the same, but having such a tool inside every test makes it possible to identify leaks very quickly.
Such checking against leaks could be a major "selling point" for the language.
Intentionally leaking tests could be annotated with a parameter.
12] assert could be seen as yet another tool, not just as a function.
When assert(x == y) fails, you may be interested in what the values of x and y were. An advanced assert could help:
assert(x == y | x = %x y = %y, z = %z); // z is an interesting value visible in this scope
Here, if the assert fails, it would print the x, y and z values. The syntax of assert should not be restricted by language rules; whatever helps should be available. E.g. I could imagine something like:
assert(x == y)
{ // code block accompanying the assert
a = x.foo();
b = y.bar();
assert_print(Because x has %a and y has %b it failed here);
}
This feature would help a lot against Heisenbugs.
13] When an assert fires while a test is running, it should show the exact location of the test (not hidden inside a nearly useless long stack trace).
14] There are different kinds of assert.
- The good old ordinary assert. Should be used a lot, should be active in debug mode, and should be compiled away in release mode. Nothing unusual here.
- The assert used in a test, at the top level of such a test:
TEST()
{
assert(2 + 2 == 4);
}
If tests are allowed in release mode, then such asserts should not be compiled away. There are two options:
a) Use a different name (e.g. verify). It would be the same as assert, but allowed only in tests, at the top level, nowhere else. It would be present even in release mode.
b) The compiler should keep asserts in tests (at the top level) intact, even in release mode.
- There could be asserts "just to be sure", placed to notify the user that something impossible actually happened. E.g.:
if (something-impossible) {
assert(false); // cannot happen
return -1;
}
However, when one does truly defensive testing, such situations should be arranged and tested.
But how do we distinguish an intentionally provoked situation from an unexpected error? There are two options:
a) If assert(false) fires, it would check whether a test is running. If it is, it would assume the assert was triggered intentionally and not show the error. If no test is running, an "impossible" bug was spotted: show it.
b) There could be a special form of assert, e.g. impossible_assert(). If invoked during a test, it does nothing; outside of a test it shows an error.
15] The language should standardize how to distinguish between debug and release modes, and between tests compiled in and tests not present. It should avoid the C/C++ mess with NDEBUG, DEBUG, _DEBUG and other inconsistent inventions.
16] Mocking support. The biggest feature I can imagine: the ability to replace specified functions for the duration of a test:
TEST()
{
mock fopen = ... // fopen dummy reimplementation
mock fclose = ...
// code doing a lot of fopen/fclose, but using the mocks instead
}
It could be implemented using function pointers. In release mode without tests there would be no performance penalty at all.
17] Mocking support could be extended even to constants, e.g. a timeout constant, to keep test duration down.
It could also be implemented using function pointers: the constant would become a variable accessed by calling that function pointer.
18] Support for white box testing. Another potentially huge feature.
The problem: I want to be sure that this code invoked exactly 2 allocations and 2 deallocations and does NOT invoke any TCP/IP calls.
This could be handled by making the code very complicated, or by using "tracing" feature proposed here:
http://akkartik.name/post/tracing-tests
Basically, it could work this way:
- Within a test you specify your intent, to watch for certain "traces". Here it would be "allocate", "deallocate", "socket", etc.
- If such a trace happens (either it is logged explicitly somewhere in the code, or implicitly, e.g. on a function call), then the trace would be stored somewhere.
- When the test ends, it goes through stored traces. It makes sure there are 2 "allocate", 2 "deallocate", no "socket" etc. It could check their proper order, it could check parameters (like bytes allocated).
E.g.
TEST()
{
// I'm interested in these calls - record them
register-trace("allocate", "deallocate", "socket");
...
... code to be tested
...
// now I do check whether my expectations were correct
assert(check-for-trace("socket") == false); // no TCP/IP?
// 2 allocations and then 2 deallocations?
assert(check-for-trace("allocate"));
assert(check-for-trace("allocate"));
assert(check-for-trace("allocate") == false); // not 3
assert(check-for-trace("deallocate"));
assert(check-for-trace("deallocate"));
assert(check-for-trace("deallocate") == false);
// all collected traces would be wiped out at the end of the test
}
A trace could be created explicitly:
void* allocate(uint bytes)
{
...
TRACE("allocated %u bytes", bytes);
return p;
}
or implicitly, by the compiler inserting a trace when a function is called. The compiler would know (by checking the register-trace calls within all tests) which functions could possibly be checked and which could not.
void* allocate(uint bytes)
{
...
TRACE("allocate()"); // inserted implicitly by the compiler
...
}
Implicit traces reduce clutter in the code.
The compiler should check the traced strings to make sure there's no typo, and it should really insert the implicit traces (to avoid unneeded clutter in the code).
The whole mechanism should be well optimized, so that it does not slow the tests down too much and the trace data stays compact.
When tests are not running, no traces would be collected; there would be only a small impact on performance.
When tests are not compiled in, there would be no traces and no impact on performance at all.
19] Tests should run strictly serially. Parallel execution is virtually impossible (every single piece of code would need to be guarded against multithreading bugs). If something multithreaded has to be tested, a test should spawn such threads.
Running only recent tests automatically whenever the program is started takes little time and guards against bugs well.
I routinely do:
int main(void) {
#ifdef DEBUG
(void)run-recent-tests(
recent_in_seconds: 60 * 60 /* one hour */,
show_results_dialog_on_success: false);
#endif
...
If someone really, really needs parallel execution of tests, he could write his own test runner and blame himself for the problems.
20] Having a failing test and then continuing with the remaining ones should not be supported. It's a misfeature.
If someone really, really needs this, he could write his own test runner, plus probably overload the assert.
21] There could be support for invoking the tests from foreign C code. This feels to me like a trivial thing.
22] Tests are expected to run at the start of main. However, nothing stops one from invoking some of the tests at any moment, even interactively. This may help to deal with Heisenbugs.
Tests that automatically check for leaks would make this adventure less risky.
Multithreaded applications would need to protect the tests invoked in the middle of application run, but this is unavoidable.
23] People who do not like testing wouldn't be forced to use it; those who want their own unique testing framework could implement one. The ability to overload or overwrite assert would be desirable in this case.
No language known to me supports all these features, not even half of them. Most only pay lip service to testing; a few stopped halfway (e.g. the language D allows easy definition of tests, but supports neither test parameters nor custom test runners).
Some of the proposed features can be implemented in C++ and even in C. It is possible to write:
#ifdef DEBUG
TEST()
{
assert(2 + 2 == 4);
}
TEST()
{
assert(2 + 2 == 5);
}
#endif
It was also possible to implement whitebox testing, but w/o compiler support it gets too clumsy.