Mutation testing

Mutation testing tests the test

Testing is about assessing the quality of a product. Testing itself also has products, for example automated unit tests. Can the products of testing also be tested? Certainly! And they should be tested! But how can you verify whether a test is good? There are multiple ways. A review of the tests is definitely a good quality measure. By executing the tests, can you determine whether the tests are working? And how can you see whether the tests are complete? To some extent, this can be done by mutation testing.

 

Mutation testing alters the original slightly to see whether the tests notice the change

Definition

Mutation testing is a type of testing in which certain statements in the source code are changed (mutated) to check whether the test cases identify the fault introduced this way. It is a way to verify the quality of the test set (instead of the test object).

What is mutation testing?

At its core, mutation testing is the practice of executing a test, or a set of tests, over many versions of the software under test—the so-called mutants. Each version of the software under test has different faults deliberately and programmatically injected. Each testing iteration is conducted on a slightly different version of the software under test, with different faults injected based on heuristics, or "rules of thumb" that correspond to common faults. These versions are referred to as mutants, in the context that they are small variations. The purpose of the testing is usually to determine which faults are detected by the tests. [Smith 2020]

Mutation testing is a useful way to check whether the test set would detect common faults. Where an injected fault goes undetected, it shows where the test set can be improved. It is, however, no replacement for proper test design.

Mutation testing is also known as fault seeding (or, wrongly, error seeding).
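
The programmatic injection of faults described above can be illustrated with a minimal Python sketch. It assumes a single, hypothetical mutation rule (every ">=" becomes ">") and a hypothetical function is_adult; neither comes from the source. Real mutation testing tools apply many such rules and typically create one mutant per changed location.

```python
import ast


class GtEToGt(ast.NodeTransformer):
    """Hypothetical mutation rule: replace every `>=` with `>`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def make_mutant(source):
    """Return the source code of a mutant in which `>=` has become `>`."""
    tree = ast.parse(source)
    mutated = GtEToGt().visit(tree)
    ast.fix_missing_locations(mutated)
    return ast.unparse(mutated)  # ast.unparse requires Python 3.9 or later


# Hypothetical code under test, used only for illustration.
ORIGINAL = "def is_adult(age):\n    return age >= 18\n"
MUTANT = make_mutant(ORIGINAL)
print(MUTANT)
```

The mutant differs from the original only at the boundary value 18, so only a test that exercises that boundary can detect it.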

How does mutation testing work?

Write a piece of code. Write tests for that code. Execute the tests. If all tests pass on this first (unmutated) version of the code, the test set is ready for mutation testing.

Create a mutant of the code and execute the tests. If at least one test fails, the mutant is detected and the tests are adequate for this mutant. If none of the tests fail, the mutant is not detected: the test set is incomplete, and either an extra test must be added or the code must be refactored.
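
The steps above can be sketched as a small loop in Python. This is a simplified illustration, not how any particular tool works: the function is_positive, the two hand-written mutants and the helper names are all hypothetical.

```python
def passes(func, test_cases):
    """Return True if func yields the expected result for every test case."""
    return all(func(arg) == expected for arg, expected in test_cases)


def run_mutation_testing(original, mutants, test_cases):
    """Run the tests against each mutant; return (detected, undetected) names."""
    if not passes(original, test_cases):
        raise ValueError("the tests must pass on the original code first")
    detected, undetected = [], []
    for name, mutant in mutants.items():
        if passes(mutant, test_cases):
            undetected.append(name)  # no test failed: the mutant went unnoticed
        else:
            detected.append(name)  # at least one test failed
    return detected, undetected


# Hypothetical code under test.
def is_positive(n):
    return n > 0


# Hypothetical hand-written mutants of is_positive.
mutants = {
    "> becomes >=": lambda n: n >= 0,
    "> becomes <": lambda n: n < 0,
}

test_cases = [(1, True), (-1, False)]

detected, undetected = run_mutation_testing(is_positive, mutants, test_cases)
print("detected:", detected)
print("not detected:", undetected)
```

With these two test cases the mutant "> becomes >=" is not detected, because no test uses the value 0. Adding the test case (0, False) makes the loop report both mutants as detected.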

As with any activity in DevOps, we strive to automate as much of this process as possible. Good mutation testing tools are available. Keep in mind however that the analysis of mutants that are not detected often requires human action.

Example

Suppose the IT system contains a check that someone must be at least 120 cm tall to be admitted. The example pseudocode is:

IF length ≥ 120 cm
    THEN MOVE "admitted" TO status
    ELSE MOVE "not admitted" TO status
ENDIF

With 2-value boundary value analysis the test cases would be:

     TC1: input – length = 119, expected outcome – "not admitted"
     TC2: input – length = 120, expected outcome – "admitted"

Both test cases will pass for the pseudocode above.

Now the IF statement in the pseudocode is changed by a common mutation (changing "greater than or equal" to "greater than"):

IF length > 120 cm

For this mutant TC1 will still pass, but TC2 will fail because the actual output will be "not admitted". The mutant is detected.

Now the IF statement in the pseudocode is changed by another common mutation (changing "greater than or equal" to "equal"):

IF length = 120 cm

For this mutant TC1 and TC2 will both pass, so the mutant is not detected. Mutation testing has thereby revealed a gap in the test set, which can be closed by adding a third test case:

     TC3: input – length = 121, expected outcome – "admitted"
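
The example can be replayed in Python as a rough check of the reasoning. The function names are hypothetical; the pseudocode and the two mutants are translated by hand.

```python
def admit(length_cm):
    """Original check: admitted from 120 cm upwards."""
    return "admitted" if length_cm >= 120 else "not admitted"


def mutant_gt(length_cm):
    """Mutant 1: >= changed to >."""
    return "admitted" if length_cm > 120 else "not admitted"


def mutant_eq(length_cm):
    """Mutant 2: >= changed to ==."""
    return "admitted" if length_cm == 120 else "not admitted"


TEST_CASES = {
    "TC1": (119, "not admitted"),
    "TC2": (120, "admitted"),
    "TC3": (121, "admitted"),
}


def failing_tests(func, names):
    """Return the names of the test cases that fail for func."""
    return [n for n in names if func(TEST_CASES[n][0]) != TEST_CASES[n][1]]


for label, func in [("original", admit),
                    ("length > 120", mutant_gt),
                    ("length = 120", mutant_eq)]:
    two = failing_tests(func, ["TC1", "TC2"])
    three = failing_tests(func, ["TC1", "TC2", "TC3"])
    print(f"{label}: failing with TC1-TC2 {two}, failing with TC1-TC3 {three}")
```

The first mutant already fails TC2. The second mutant fails no test until TC3 is added, which matches the analysis above.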

How does this relate to test coverage?  

In the case of unit testing, an often-used metric is the coverage of the test (for more information see section "Code coverage"). In the example above, TC1 and TC2 would result in 100% line coverage, 100% statement coverage, 100% branch coverage and 100% decision coverage. All coverage types that are usually measured in unit testing would have indicated the test as OK. Still, a simple mutant already reveals an incompleteness. Code coverage can therefore be a useful indicator, but it is no guarantee of a good test. Mutation testing is a very useful technique for understanding the quality of the test.
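
The difference between coverage and mutation testing can be made concrete with a small sketch. It records by hand which branch of the admission check each test executes, as a simplified stand-in for a real coverage tool; all names in it are hypothetical.

```python
def make_check(condition, executed):
    """Build an admission check that records which branch was executed."""
    def check(length_cm):
        if condition(length_cm):
            executed.add("then")
            return "admitted"
        executed.add("else")
        return "not admitted"
    return check


TEST_CASES = [(119, "not admitted"), (120, "admitted")]  # TC1 and TC2


def run(condition):
    """Run TC1 and TC2; return (all tests passed, fraction of branches hit)."""
    executed = set()
    check = make_check(condition, executed)
    results = [check(length) == expected for length, expected in TEST_CASES]
    return all(results), len(executed) / 2


original = run(lambda length: length >= 120)
mutant = run(lambda length: length == 120)
print("original:", original)
print("mutant (length = 120):", mutant)
```

Both runs pass all tests and execute both branches, so branch coverage cannot tell the original from the mutant. Only a test with a value above 120 makes the difference visible.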

Source:
[Smith 2020] Chasing Mutants (chapter in The Future of Software Quality Assurance), Adam Leon Smith, 2020, Springer, ISBN 978-3-030-29509-7.