Add JUnit/XUnit-formatted output to the lldb test run system
h1. Test Events
h2. Overview
This change introduces a test-event-based mechanism and a way to
format test events to an output file. The short-term motivation is
to produce xUnit/JUnit XML output that CI systems can consume
(Jenkins in my case), which this change supports. The
longer-term motivation is to move away from depending
on stdout/stderr text scraping to get basic stats like test counts.
h2. Test Event Details
Test events capture test lifetime events such as test start and test
completion with a status (success, failure, skip, error, unexpected
success, expected failure). More events may be added in the future
(see the docs on TestResultsFormatter for some features I'd like to
add to provide a richer test reporting experience).
Currently one end-user formatter is provided out of the box:
test_results.XunitFormatter. This class handles writing events to a
JUnit-compliant .xml file, and also supports options to configure how
the user wants xfail and xpass results handled since these are
not within the XUnit vocabulary.
test_results.TestResultsFormatter has class documentation that
further details the way that class processes test events.
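As a rough illustration of the event flow described above, here is a minimal sketch of a formatter that tallies result statuses from events instead of scraping text output. The event dictionary keys ("event", "test_name", "status"), the make_event helper, and the CountingFormatter class are all hypothetical names chosen for this sketch. They are not the actual test_results API, whose class documentation remains the authority.

```python
import collections


def make_event(kind, test_name, status=None):
    """Build a hypothetical test event as a plain dictionary."""
    event = {"event": kind, "test_name": test_name}
    if status is not None:
        event["status"] = status
    return event


class CountingFormatter:
    """Illustrative stand-in for a formatter (not the real base class)."""

    def __init__(self):
        self.counts = collections.Counter()

    def process_event(self, event):
        # Only completion events carry a status worth counting.
        if event["event"] == "test_result":
            self.counts[event["status"]] += 1


formatter = CountingFormatter()
for ev in [
    make_event("test_start", "TestFoo.test_a"),
    make_event("test_result", "TestFoo.test_a", "success"),
    make_event("test_start", "TestFoo.test_b"),
    make_event("test_result", "TestFoo.test_b", "expected_failure"),
]:
    formatter.process_event(ev)
```

With counts gathered this way, basic stats such as the number of successes come straight from the events rather than from parsing stdout/stderr.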
h2. Options to support event-based test results formatting
A new "Test results options" section has been added to the dotest.py
options parsing. The new options are:
--results-file FILENAME
If specified, a TestResultsFormatter will be created and given
the specified file in which to write output. If the
--results-formatter option is not specified, then the default
will be to use the test_results.XunitFormatter class to output
test results in XUnit/JUnit format.
--results-formatter {FORMATTER-NAME}
If specified, contains the [package.]module.class name for a class
that derives from the test_results.TestResultsFormatter class. This
class is only used if either --results-file or --results-port is
specified. The results formatter defaults to the
test_results.XunitFormatter when the --results-file option is
specified, and defaults to test_results.RawPickledFormatter when
--results-port is specified. The latter is used to support collating
inferior dotest test events for the parallel test runner
process/thread and can be ignored for all intents and purposes.
This mechanism allows the community to add other test formatters,
upstream or not, without requiring any changes to the
public top of tree LLDB. Just specify a different name here, and
as long as Python can find it (i.e. package/module names are found
via the normal mechanisms in the PYTHONPATH), the new formatter will
be used.
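One way a [package.]module.class name could be resolved is sketched below with the standard importlib module. The load_class helper is a hypothetical name for this sketch, and the actual lookup code in dotest.py may differ. The demonstration uses a standard-library class because the real formatter modules are only importable from inside an LLDB checkout.

```python
import importlib


def load_class(dotted_name):
    """Resolve '[package.]module.ClassName' to a class object."""
    module_name, _, class_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(
            "expected [package.]module.class, got %r" % dotted_name)
    # import_module honors sys.path, and therefore PYTHONPATH.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for something like "my_package.my_module.MyFormatter".
cls = load_class("collections.OrderedDict")
```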
--results-port PORTNUMBER
If specified, the TestResultsFormatter class is given a socket opened
to IPv4 localhost:PORTNUMBER. Test results are streamed using the
test_results.RawPickledFormatter test formatter by default. Test
inferiors use this side-channel mechanism to get test events to the
parallel test runner reliably without depending on any kind of results
scraping.
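The side channel can be pictured roughly as follows: each event is pickled and written to a socket connected to IPv4 localhost, and the receiving side unpickles it. The 4-byte length prefix, the send_event/recv_event helpers, and the event dictionary are assumptions made for this sketch. The wire format actually used by RawPickledFormatter may be different.

```python
import pickle
import socket
import struct


def send_event(sock, event):
    """Pickle an event and send it with a 4-byte length prefix."""
    payload = pickle.dumps(event)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, count):
    """Read exactly count bytes, or raise if the peer closes early."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise EOFError("socket closed mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_event(sock):
    """Receive one length-prefixed pickled event."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))


# Loopback demonstration: the listener plays the parallel test runner,
# the sender plays a test inferior. Port 0 lets the OS pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

sender = socket.create_connection(("127.0.0.1", port))
receiver, _ = listener.accept()

send_event(sender, {"event": "test_result", "status": "success"})
received = recv_event(receiver)

for s in (sender, receiver, listener):
    s.close()
```

Unpickling is only reasonable here because both ends are trusted processes on the same machine.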
--results-formatter-options="QUOTED-OPTIONS-FOR-FORMATTER"
Each TestResultsFormatter can support its own formatter options. They
are parsed with an argparse.ArgumentParser that is given arguments
provided by this option. To see the options available for a given
TestResultsFormatter, run the command you would want to use in
dotest.py and add --results-formatter-options="--help".
This will print out the arguments supported by the
TestResultsFormatter in force at the time of the call (including the
default formatter choice logic as dictated by specifying
--results-file/--results-port without an explicit
--results-formatter). As is standard when given a --help or -h
argument, the program will not execute further after printing
help/usage info.
e.g. the following command line will pass the --xpass MAPMODE
and --xfail MAPMODE arguments to the test_results.XunitFormatter
(see the XunitFormatter Options section for details on
the options used below).
    python test/dotest.py --executable /some/path/to/lldb \
        --results-file /tmp/test-results.xml \
        --results-formatter-options="--xpass ignore --xfail success"
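The quoted option string in the command above could be handled along these lines: split it as a shell would, then hand the pieces to an argparse.ArgumentParser. The build_parser and parse_formatter_options helpers are hypothetical names for this sketch. The --xpass/--xfail choices and defaults follow the XunitFormatter Options section, but the real parser setup may differ.

```python
import argparse
import shlex

MAP_MODES = ["success", "failure", "ignore", "passthru"]


def build_parser():
    """Build a parser resembling the XunitFormatter options (a sketch)."""
    parser = argparse.ArgumentParser(prog="XunitFormatter-sketch")
    parser.add_argument("--xpass", choices=MAP_MODES, default="failure")
    parser.add_argument("--xfail", choices=MAP_MODES, default="ignore")
    return parser


def parse_formatter_options(quoted_options):
    """Split the quoted option string and parse it."""
    return build_parser().parse_args(shlex.split(quoted_options))


options = parse_formatter_options("--xpass ignore --xfail success")
defaults = parse_formatter_options("")
```

Passing "--help" as the option string makes argparse print usage and exit, which matches the behavior described for --results-formatter-options="--help".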
h2. XunitFormatter Options
By default, no options should be needed to do something reasonable
with any formatter. In the case of XunitFormatter output, there
are some mapping options that have some opinionated defaults:
--xpass MAPMODE
--xfail MAPMODE
MAPMODE can be one of success, failure, ignore, or passthru. This
option is needed because the JUnit output format has no concept of
expected failures and unexpected successes. Thus, to avoid breaking
JUnit/XUnit format readers that validate report files, we need to
decide how to handle xpass and xfail results. This argument
allows overriding the defaults.
The default for unexpected passes (xpass) is to mark them as JUnit
failures. The rationale is that a passing test that is marked as
expecting to fail represents an actionable item:
- It is passing everywhere and is misrepresented. Action:
remove the expected failure decoration.
- It is passing on some architectures. Action: remove the
expected failure decorator for the passing systems, and
mark the rest as expected failure.
- It is a flaky test that is known to sometimes pass or fail, and is
difficult to make work 100% of the time. Still, passing some of the
time is useful info and allows exercising the code path.
See TestResultsFormatter for thoughts on a future feature to aid with
the flaky case. We should have a full flaky test event mechanism that
has a flaky-success and flaky-fail result. We could then collect these
separately from xfail/xpass, and appropriately recognize they are not
expected to pass all the time. Further, we can start tracking failure
rates with an appropriate test results formatter/collection scheme,
and can be alerted if failure rates change for the worse with a given
submission.
For expected failures (xfail), the default is to ignore the result in
xUnit output. This is to prevent the somewhat misleading scenario of
having a failing test counted as an xUnit "success", and to support
seeing passing test numbers increase as XFAIL tests are made to pass
on a given architecture.
The MAPMODE is interpreted as follows:
- ignore: the given xpass/xfail is skipped. Default for --xfail
- success: report as an xUnit success.
- failure: report as an xUnit failure. Default for --xpass
- passthru: a new JUnit/XUnit xml element is added to the testcase
element named 'expected-failure' or 'unexpected-success'.
The passthru option is meant to support schemes where the JUnit/
XUnit report collector is modified to handle these extra test
categories.
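The four MAPMODE values can be sketched as a small function that builds a JUnit testcase element. The make_testcase helper, the "xfail"/"xpass" result labels, and the message attribute text are assumptions for this sketch. Only the 'expected-failure' and 'unexpected-success' element names come from the description above, and the real XunitFormatter may emit additional attributes.

```python
import xml.etree.ElementTree as ET

PASSTHRU_ELEMENT = {
    "xfail": "expected-failure",
    "xpass": "unexpected-success",
}


def make_testcase(name, result, mapmode):
    """Return a <testcase> element, or None when the result is ignored.

    result is "xfail" or "xpass"; mapmode is one of the MAPMODE values.
    """
    if mapmode == "ignore":
        return None
    testcase = ET.Element("testcase", {"name": name})
    if mapmode == "success":
        # A JUnit success is a testcase with no child elements.
        pass
    elif mapmode == "failure":
        ET.SubElement(testcase, "failure",
                      {"message": "mapped from %s" % result})
    elif mapmode == "passthru":
        ET.SubElement(testcase, PASSTHRU_ELEMENT[result])
    else:
        raise ValueError("unknown MAPMODE: %r" % mapmode)
    return testcase


ignored = make_testcase("test_a", "xfail", "ignore")
failed = make_testcase("test_b", "xpass", "failure")
passed_through = make_testcase("test_c", "xpass", "passthru")
succeeded = make_testcase("test_d", "xfail", "success")
```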
h2. Proof of concept - Jenkins Integration
I have been using the script below on a Linux Jenkins bot to build and
collect test results. (The OS X version is a bit different and uses
the Jenkins Xcode Integration plug-in to do the build). The unit
testing results are picked up with the "Publish JUnit test result
report" Post-build Action:
    SOURCE_DIR=$HOME/work/mirror/git/llvm
    BUILD_DIR=$WORKSPACE/build

    # Configure (if not already)
    if [ ! -d $BUILD_DIR ]; then
        mkdir -p $BUILD_DIR
        cd $BUILD_DIR
        CC=/usr/bin/clang-3.6 CXX=/usr/bin/clang++-3.6 cmake \
            -GNinja -DCMAKE_BUILD_TYPE=Debug -Wno-dev $SOURCE_DIR
        RESULT=$?
        if [ $RESULT -ne 0 ]; then
            echo "cmake failed"
            rm -rf $BUILD_DIR
            exit 1
        fi
    fi

    # Build (which will configure again if needed).
    cd $BUILD_DIR
    ninja
    RESULT=$?
    if [ $RESULT -ne 0 ]; then
        echo "build failed"
        exit 2
    fi

    # Setup the test results dir.
    TEST_RESULTS_DIR=$BUILD_DIR/test-results
    mkdir -p $TEST_RESULTS_DIR

    # Clear any existing test traces.
    TRACE_DIR=$BUILD_DIR/lldb-test-traces
    rm -rf $TRACE_DIR

    # Run the tests.
    python -u $SOURCE_DIR/tools/lldb/test/dotest.py \
        --executable $BUILD_DIR/bin/lldb \
        -s $TRACE_DIR \
        --test-runner-name threading \
        --results-file $TEST_RESULTS_DIR/lldb-all-tests.xml
I use another Post-build Action to archive the build/test-results
directory and the build/lldb-test-traces directory so I can
look back at them over time.
h1. Ctrl-C Loop Improvements
The Ctrl-C processing code has been refactored into a helper method
that can handle an arbitrary number of Ctrl-C presses and is shared
between the different test runner strategies. This makes it simpler
to add different behavior for different numbers of Ctrl-C presses
in the future. Note that, as before, the multiprocessing-pool and
threading-pool test runners do not support Ctrl-C. Also, I have
found that, because interrupts cannot be shut off for the
threading-based worker threads, Ctrl-C handling with that runner
is not 100% bulletproof, although it is serviceable.
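A helper that counts Ctrl-C presses and dispatches on the count might look roughly like the sketch below. The CtrlCCounter class and the two example actions are hypothetical and chosen only for illustration. They are not the helper added by this change, and the behaviors shown are not what the test runners actually do on each press.

```python
class CtrlCCounter:
    """Run a different action depending on how many Ctrl-C presses arrived.

    The Nth press runs actions[N-1]; presses beyond the end of the list
    reuse the last action, so any number of presses is handled.
    """

    def __init__(self, actions):
        if not actions:
            raise ValueError("at least one action is required")
        self.actions = list(actions)
        self.count = 0

    def handle(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects of a handler.
        self.count += 1
        index = min(self.count, len(self.actions)) - 1
        self.actions[index](self.count)


log = []
counter = CtrlCCounter([
    lambda n: log.append("press %d: stop scheduling new tests" % n),
    lambda n: log.append("press %d: terminate workers" % n),
])

# A runner would install it with:
#     signal.signal(signal.SIGINT, counter.handle)
# Here the handler is called directly to simulate three presses.
for _ in range(3):
    counter.handle()
```

Keeping the count and the dispatch in one object is what lets several runner strategies share the same handling.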
This was a test spec failure that I found when looking at skipped tests. It is unrelated to the main thrust of this change but it does represent a test that was erroneously being skipped on x86_64 runs.