killmap is a utility Java program that allows developers to perform
mutation-analysis, whose output can then be used to perform mutation-based
fault localization. Although killmap does not depend on the Defects4J
infrastructure, and can therefore be used by simply executing
java -jar killmap.jar ..., the scripts provided to automatise the
mutation-analysis have been designed and developed to be executed on
Defects4J bugs.
ant -f build.xml jar
java -cp <bin directory with mutated classes>:<classpath of the project under test>:__killmap_directory__/bin/killmap-<version>.jar \
<triggering-tests> \
<relevant-tests> \
<partial-run> 2>err.txt | gzip > matrix.csv.gz
where:
- <triggering-tests> is the path to a file that contains a list of test methods that trigger (expose) the bug (one per row; see the Defects4J documentation for more information).
- <relevant-tests> is the path to a file that contains a list of relevant test classes (one per row; see the Defects4J documentation for more information).
- <partial-run> is the path to a file that contains the output of a previous run of the killmap program (in case it did not finish or was interrupted). This allows killmap to reuse information from a previous run, rather than having to re-run everything from scratch. To this end, <partial-run> can contain any subset of lines from a previous run, in the order they were produced. Some examples:
  - The first time you run the killmap program, i.e., if there was no previous run, /dev/null is a good choice.
  - If the program is interrupted halfway through, you can run
    java -jar ... --partial-output <(zcat matrix.csv.gz) ... > matrix-2.csv.gz
    which will reuse the results in matrix.csv.gz.
  - If you think something went wrong with a particular test-run, you can delete that line from the matrix and re-run the program, passing in that matrix; every test-result from the original run will be reused, except the deleted result, which will be re-calculated.
matrix.csv.gz is a test-outcome matrix which is written to the stdout (where each line represents "the outcome of running a <test> with a <mutant> enabled") and has the following form:
<test case>,<mutant id>,<timeout>,<outcome>,<runtime>,<output hash>,<covered mutants>,<stack trace>
- The mutant id is either a positive integer (i.e., a real mutant id) or 0, meaning no mutant was enabled.
- The timeout is the number of milliseconds allocated for the test case to run.
- The outcome is one of PASS, FAIL, TIMEOUT, or CRASH, describing the general type of outcome.
- The runtime is the number of milliseconds the test case actually took to run.
- The output hash is the concatenation of two SHA-1 hashes: one of whatever the test wrote to stdout, one of whatever it wrote to stderr.
- The covered mutants column is empty unless the mutant id column is 0.
- The stack trace is the thrown exception's stack trace, if any. Leading/trailing whitespace is stripped, and any run of whitespace that includes a newline is replaced by a single space (i.e., \s*\n\s* is replaced with a single space).
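The row format above can be read back with a few lines of Java. The following is only a sketch, not code taken from killmap: the class name MatrixRow and its fields are hypothetical, and it assumes that the first seven columns never contain a comma and that covered mutants are separated by whitespace (as in the sample rows). Because a stack trace may itself contain commas, the split is limited to eight fields so that everything after the seventh comma stays in the last column.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one row of the test-outcome matrix (not part of killmap). */
final class MatrixRow {
    final String testCase;
    final int mutantId;                 // 0 means "no mutant enabled"
    final long timeoutMs;
    final String outcome;               // PASS, FAIL, TIMEOUT or CRASH
    final long runtimeMs;
    final String outputHash;
    final List<Integer> coveredMutants; // empty unless mutantId == 0
    final String stackTrace;

    private MatrixRow(String[] f) {
        testCase = f[0];
        mutantId = Integer.parseInt(f[1]);
        timeoutMs = Long.parseLong(f[2]);
        outcome = f[3];
        runtimeMs = Long.parseLong(f[4]);
        outputHash = f[5];
        coveredMutants = new ArrayList<>();
        // Assumption: mutant ids in this column are separated by whitespace.
        for (String id : f[6].trim().split("\\s+")) {
            if (!id.isEmpty()) {
                coveredMutants.add(Integer.parseInt(id));
            }
        }
        stackTrace = f[7];
    }

    /** Parses one line; the limit of 8 keeps commas inside the stack trace intact. */
    static MatrixRow parse(String line) {
        String[] fields = line.split(",", 8);
        if (fields.length != 8) {
            throw new IllegalArgumentException("expected 8 columns: " + line);
        }
        return new MatrixRow(fields);
    }

    /** Applies the whitespace rule described above to a raw stack trace. */
    static String normalizeStackTrace(String raw) {
        return raw.trim().replaceAll("\\s*\\n\\s*", " ");
    }
}
```

For example, MatrixRow.parse(line).outcome gives the outcome column of a row, and normalizeStackTrace turns a multi-line trace into the single-line form stored in the last column.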
Suppose you want to perform mutation-analysis on a specific Defects4J bug,
e.g., Chart-1. The generate-matrix script is provided to automatise the
analysis, i.e., to check out Chart-1, compile and mutate it, and run the
killmap program. Here is an example of how to run the generate-matrix script:
$ export DEFECTS4J_HOME=__defects4j_directory__
$ mkdir /tmp/killmap_example/
$ bash scripts/generate-matrix.sh \
Chart 1 /tmp/killmap_example/Chart-1b \
/tmp/killmap_example/mutants.txt 2>/tmp/killmap_example/err.txt | gzip > /tmp/killmap_example/matrix.csv.gz
where matrix.csv.gz is a test-outcome matrix which is written to the stdout
and looks like:
org.jfree.chart.renderer.category.junit.AbstractCategoryItemRendererTests#test2947660,0,60000,FAIL,476,da39...709,1 11 12 13 ...,junit.framework.AssertionFailedError: ...
org.jfree.chart.renderer.category.junit.AbstractCategoryItemRendererTests#test2947660,1,952,FAIL,140,da39...709,,junit.framework.AssertionFailedError:
...
If the message "Completed successfully!" appears at the end of the err.txt
file, the mutation-analysis has finished successfully and no further step or
re-execution of the script is required. Otherwise, an incomplete execution
can be resumed by executing the following command:
$ bash scripts/generate-matrix.sh \
--partial-output /tmp/killmap_example/matrix.csv.gz \
Chart 1 /tmp/killmap_example/Chart-1b \
/tmp/killmap_example/mutants.txt 2>/tmp/killmap_example/err.txt | gzip > /tmp/killmap_example/matrix-2.csv.gz
In order to combine both test-outcome matrices (i.e., matrix.csv.gz and
matrix-2.csv.gz) into a single matrix, the killmap-combiner script is
provided and can be executed as:
$ bash scripts/killmap-combiner.sh \
matrix.csv.gz matrix-2.csv.gz \
matrix-complete.csv.gz
killmap.Main determines the outcome of running every test with every (or no)
mutant enabled. It makes use of two optimisations:
- if test T does not cover mutant M, its outcome with M enabled will be the same as its outcome with no mutants enabled;
- if no triggering test changes because of M, the outcomes of passing tests with M enabled are irrelevant.
So killmap.Main only needs to run each test with the mutants that (a) the
test covers and (b) change the behaviour of at least one triggering test. To
do this, it:
- runs each failing test once with no mutants, then once for each mutant it covers, recording which mutants change its behaviour;
- runs each passing test once with no mutants, then once for each mutant it covers if that mutant changed the behaviour of any failing test.
And as it goes, it prints the result of every test-run.
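The two passes described above can be sketched as follows. This is an illustration under stated assumptions, not killmap.Main itself: the Runner interface, the schedule method and the log format are hypothetical, and "changes behaviour" is approximated here as "the outcome differs from the no-mutant run", which may be coarser than what killmap actually compares.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class KillmapScheduleSketch {
    /** Hypothetical stand-in for running one test; killmap delegates this to a worker JVM. */
    interface Runner {
        /** Runs the test with the given mutant enabled (0 = no mutant) and returns its outcome. */
        String run(String test, int mutant);
        /** Mutants covered by the test, as observed during its no-mutant run. */
        Set<Integer> covered(String test);
    }

    /** Returns one "test,mutant,outcome" entry per test-run, in execution order. */
    static List<String> schedule(List<String> failing, List<String> passing, Runner runner) {
        List<String> log = new ArrayList<>();
        Set<Integer> relevant = new HashSet<>(); // mutants that changed some failing test

        // Pass 1: failing tests run with no mutant, then with every mutant they cover.
        for (String test : failing) {
            String baseline = runner.run(test, 0);
            log.add(test + ",0," + baseline);
            for (int mutant : new TreeSet<>(runner.covered(test))) {
                String outcome = runner.run(test, mutant);
                log.add(test + "," + mutant + "," + outcome);
                if (!outcome.equals(baseline)) {
                    relevant.add(mutant); // assumption: a different outcome counts as changed behaviour
                }
            }
        }

        // Pass 2: passing tests run with no mutant, then only with covered, relevant mutants.
        for (String test : passing) {
            log.add(test + ",0," + runner.run(test, 0));
            for (int mutant : new TreeSet<>(runner.covered(test))) {
                if (relevant.contains(mutant)) {
                    log.add(test + "," + mutant + "," + runner.run(test, mutant));
                }
            }
        }
        return log;
    }
}
```

In this sketch a passing test that covers mutants 2 and 3 is run only with mutant 2 if mutant 2 is the only one that altered a failing test, which is where the second optimisation saves work.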
Most importantly, the tests are all run in a subprocess "worker JVM" because tests might do nasty stuff like eat all the memory. From the perspective of the "host JVM" (the main process), running tests looks like this:
- (If necessary) Spawn a worker subprocess by executing java ... killmap.TestRunner ... and listen on a certain port for the worker to connect.
- Over the socket, give the worker a "work order", consisting of a test to run, a mutant to enable, and a timeout.
- Wait for the worker to respond with the test-run's outcome.
If the worker ever fails to respond with an outcome (step 3), the host kills
it and, the next time a test needs to be run, spawns a new worker. All of
this logic lives in the RemoteTestRunner class.
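The host side of this exchange can be sketched as follows. This is not the RemoteTestRunner code: the class HostSketch, the wire format (a UTF string and two ints for the work order, a UTF string for the outcome) and the graceMs parameter are all assumptions made for illustration. It only shows the idea of sending a work order and giving up if no answer arrives in time.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

/** Hypothetical host-side helper; the wire format here is invented for this sketch. */
final class HostSketch {
    /**
     * Sends one work order to an already-connected worker and waits for its outcome.
     * Returns null if the worker does not answer within timeoutMs + graceMs, in which
     * case the caller would kill the worker process and spawn a new one later.
     */
    static String runOnWorker(Socket worker, String test, int mutant,
                              int timeoutMs, int graceMs) throws IOException {
        DataOutputStream out = new DataOutputStream(worker.getOutputStream());
        out.writeUTF(test);      // which test to run
        out.writeInt(mutant);    // which mutant to enable (0 = none)
        out.writeInt(timeoutMs); // how long the test may take
        out.flush();

        // Bound the blocking read so an unresponsive worker cannot stall the host.
        worker.setSoTimeout(timeoutMs + graceMs);
        try {
            return new DataInputStream(worker.getInputStream()).readUTF();
        } catch (IOException e) {
            // Read timed out or the connection broke: treat the worker as lost.
            worker.close();
            return null;
        }
    }
}
```

The socket passed in would come from accepting the worker's connection on the listening port; a null result corresponds to the "kill it and spawn a new worker" case described above.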
From the worker's perspective, running a test looks like this:
- Read a work order from the socket.
- Replace System.out and System.err with dummy streams that can easily eat up infinite amounts of data (because some tests could print infinite amounts of data).
- In a new thread, replace the thread's classloader with a fresh one (to isolate the effects of the impending test-run); enable the given mutant; then run the given test.
- If that thread returns before the timeout expires, take the returned outcome; otherwise, create an outcome meaning "timed out".
- Write that outcome to the socket.
Almost all of that logic lives in the TestRunner class. A little bit lives in
IsolatingClassLoader and DeadEndDigestOutputStream.
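The stream replacement and the timeout handling can be sketched with standard library classes only. This is a simplified illustration, not the TestRunner implementation: WorkerSketch, TestBody and DiscardingStream are hypothetical names, the classloader isolation and mutant activation are left out, and every exception is mapped to FAIL here without trying to detect crashes.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class WorkerSketch {
    /** Hypothetical stand-in for "run the given test". */
    interface TestBody {
        void run() throws Exception;
    }

    /** Discards everything written to it, so unbounded test output cannot exhaust memory. */
    static final class DiscardingStream extends OutputStream {
        @Override public void write(int b) { }
        @Override public void write(byte[] buf, int off, int len) { }
    }

    /** Runs the test in a separate thread and maps the result to an outcome name. */
    static String runWithTimeout(TestBody test, long timeoutMs) {
        PrintStream savedOut = System.out;
        PrintStream savedErr = System.err;
        System.setOut(new PrintStream(new DiscardingStream()));
        System.setErr(new PrintStream(new DiscardingStream()));
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<String> job = () -> { test.run(); return "PASS"; };
            Future<String> result = executor.submit(job);
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT"; // the thread did not return before the timeout expired
        } catch (ExecutionException e) {
            return "FAIL";    // the test completed by throwing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "CRASH";
        } finally {
            executor.shutdownNow(); // interrupts the test thread if it is still running
            System.setOut(savedOut);
            System.setErr(savedErr);
        }
    }
}
```

A test that ignores the interrupt would keep its thread alive in this sketch, which is one reason the real design runs tests in a separate JVM that the host can kill.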
There are four kinds of outcome:
- PASS: the test completed in time, with all assertions passing.
- FAIL: the test completed in time, but by raising an exception rather than passing.
- TIMEOUT: the test didn't complete in time, but the worker JVM was able to terminate it and clean up successfully.
- CRASH: everything else. The test must have done something nasty (e.g., raised an OutOfMemoryError; made the worker completely unresponsive; refused to halt when the thread was interrupted).