
Add JUnit/XUnit-formatted output to the lldb test run system
Closed, Public

Authored by tfiala on Sep 12 2015, 2:29 PM.

Details

Reviewers
zturner
clayborg
Summary

Add JUnit/XUnit-formatted output to the lldb test run system

h1. Test Events

h2. Overview

This change introduces a test-event-based mechanism and a way to
format test events to an output file. The short-term motivation was
to support the xUnit/JUnit XML output format consumed by CI systems
(Jenkins in my case), which this change provides. The longer-term
motivation is to move away from depending on stdout/stderr text
scraping to get basic stats such as test counts.

h2. Test Event Details

Test events capture test lifetime events such as test start and test
completion with a status (success, failure, skip, error, unexpected
success, expected failure). More events may be added later (see the
docs on TestResultsFormatter for some features I'd like to see added
to provide a richer test reporting experience).
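As an illustration only, a test event can be modeled as a small dictionary. The key names (`event`, `test_name`, `status`, `timestamp`), the status spellings, and the `make_event` helper below are hypothetical and are not taken from test_results.py.

```python
import time

# The result statuses listed in the summary; these exact spellings are
# assumptions made for this sketch.
VALID_STATUSES = frozenset([
    "success", "failure", "skip", "error",
    "unexpected_success", "expected_failure",
])


def make_event(event_type, test_name, status=None, **extra):
    """Build a hypothetical test event dictionary.

    event_type is "test_start" or "test_result"; result events must
    carry one of VALID_STATUSES. Any extra keyword arguments (a skip
    reason, an exception message, and so on) are copied into the event.
    """
    if event_type not in ("test_start", "test_result"):
        raise ValueError("unknown event type: %r" % (event_type,))
    if event_type == "test_result" and status not in VALID_STATUSES:
        raise ValueError("unknown status: %r" % (status,))
    event = {
        "event": event_type,
        "test_name": test_name,
        "timestamp": time.time(),
    }
    if status is not None:
        event["status"] = status
    event.update(extra)
    return event
```

A start event followed by a result event, e.g. `make_event("test_result", "TestSettings.test_foo", status="skip", skip_reason="not supported on this platform")`, brackets the lifetime of one test method.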

Currently one end-user formatter is provided out of the box:
test_results.XunitFormatter. This class writes events to a
JUnit-compliant .xml file, and also supports options to configure how
xfail and xpass results are handled, since these are not part of the
XUnit vocabulary.
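To make the XUnit mapping concrete, here is a minimal sketch that turns result events into a `<testsuite>` document with the standard `xml.etree.ElementTree` module. It is not the XunitFormatter implementation: the event keys (`test_name`, `status`, `exception_type`, `message`, `skip_reason`) and the `events_to_junit` name are assumptions, and xfail/xpass results are simply dropped here rather than mapped.

```python
import xml.etree.ElementTree as ET


def events_to_junit(events, suite_name="lldb-tests"):
    """Convert result-event dicts into a JUnit-style XML string.

    Each event is assumed to be a dict with "test_name" and "status"
    keys, plus optional "message", "exception_type" and "skip_reason".
    """
    suite = ET.Element("testsuite", name=suite_name)
    counts = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for event in events:
        status = event["status"]
        if status not in ("success", "failure", "error", "skip"):
            # xfail/xpass have no JUnit equivalent; this sketch drops them.
            continue
        # Split "Class.method" into JUnit's classname/name attributes.
        classname, _, name = event["test_name"].rpartition(".")
        case = ET.SubElement(suite, "testcase", classname=classname, name=name)
        counts["tests"] += 1
        if status == "failure":
            counts["failures"] += 1
            ET.SubElement(case, "failure",
                          type=event.get("exception_type", ""),
                          message=event.get("message", ""))
        elif status == "error":
            counts["errors"] += 1
            ET.SubElement(case, "error",
                          type=event.get("exception_type", ""),
                          message=event.get("message", ""))
        elif status == "skip":
            counts["skipped"] += 1
            ET.SubElement(case, "skipped",
                          message=event.get("skip_reason", ""))
    # Summary counts are stored as string attributes on the suite element.
    for key, value in counts.items():
        suite.set(key, str(value))
    return ET.tostring(suite, encoding="unicode")
```

CI tools such as Jenkins read the summary attributes and the per-testcase `<failure>`, `<error>`, and `<skipped>` children, which is why the counts are kept on the `<testsuite>` element.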

test_results.TestResultsFormatter has class documentation that
further details how that class processes test events.

h2. Options to support event-based test results formatting

A new "Test results options" section has been added to the dotest.py
options parsing. The new options are:

--results-file FILENAME

If specified, a TestResultsFormatter will be created and given
the specified file in which to write output. If the
--results-formatter option is not specified, the default
is to use the test_results.XunitFormatter class to output
test results in XUnit/JUnit format.

--results-formatter {FORMATTER-NAME}

If specified, contains the [package.]module.class name for a class
that derives from the test_results.TestResultsFormatter class. This
class is only used if either --results-file or --results-port is
specified. The results formatter defaults to the
test_results.XunitFormatter when the --results-file option is
specified, and defaults to test_results.RawPickledFormatter when
--results-port is specified. The latter is used to support collating
inferior dotest test events for the parallel test runner
process/thread and can be ignored for all intents and purposes.
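The default-selection rule described for --results-formatter can be written down compactly. This is a sketch using the standard argparse module; `build_parser` and `default_formatter_name` are hypothetical helpers, and giving --results-file precedence when both --results-file and --results-port are present is an assumption, since the description above does not cover that case.

```python
import argparse


def build_parser():
    """Sketch of a "Test results options" argument group."""
    parser = argparse.ArgumentParser(prog="dotest.py")
    group = parser.add_argument_group("Test results options")
    group.add_argument("--results-file", metavar="FILENAME")
    group.add_argument("--results-port", metavar="PORTNUMBER", type=int)
    group.add_argument("--results-formatter", metavar="FORMATTER-NAME")
    group.add_argument("--results-formatter-options",
                       metavar="QUOTED-OPTIONS-FOR-FORMATTER")
    return parser


def default_formatter_name(args):
    """Return the formatter class name to use, or None for no formatter."""
    if args.results_file is None and args.results_port is None:
        # A formatter is only used when an output destination is given.
        return None
    if args.results_formatter:
        return args.results_formatter
    # Assumption: --results-file wins if both destinations are given.
    if args.results_file is not None:
        return "test_results.XunitFormatter"
    return "test_results.RawPickledFormatter"
```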

This mechanism allows the community to add other test formatters,
upstream or not, without requiring any changes to top-of-tree LLDB.
Just specify a different name here, and as long as Python can find it
(i.e. the package/module names are found via the normal mechanisms on
the PYTHONPATH), the new formatter will be used.
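Loading a formatter from a [package.]module.class string only needs the standard import machinery. A minimal sketch, assuming a hypothetical `load_class` helper (not necessarily how dotest.py does it); the optional `base_class` check stands in for "derives from test_results.TestResultsFormatter".

```python
import importlib


def load_class(qualified_name, base_class=None):
    """Import "[package.]module.Class" and return the class object.

    If base_class is given, the loaded class must derive from it.
    """
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError("expected [package.]module.class, got %r"
                         % (qualified_name,))
    # Uses the normal import machinery, so PYTHONPATH/sys.path applies.
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if base_class is not None and not issubclass(cls, base_class):
        raise TypeError("%s does not derive from %s"
                        % (qualified_name, base_class.__name__))
    return cls
```

Because only `importlib.import_module` is involved, an out-of-tree formatter works as soon as its package is importable.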

--results-port PORTNUMBER

If specified, the TestResultsFormatter class is given a socket connected
to IPv4 localhost:PORTNUMBER. Test results are streamed using the
test_results.RawPickledFormatter test formatter by default. Test
inferiors use this side-channel mechanism to get test events to the
parallel test runner reliably without depending on any kind of results
scraping.
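The side channel boils down to serializing each event and framing it on a stream socket. The sketch below is not RawPickledFormatter's actual wire format: the 4-byte length prefix and the helper names `send_event`, `recv_event`, and `connect_to_collector` are assumptions for illustration.

```python
import pickle
import socket
import struct

# Assumed framing: 4-byte big-endian payload length, then the pickle.
_HEADER = struct.Struct("!I")


def connect_to_collector(port):
    """Open a TCP connection to the collector on IPv4 localhost."""
    return socket.create_connection(("127.0.0.1", port))


def send_event(sock, event):
    """Pickle one event and send it with a length prefix."""
    payload = pickle.dumps(event)
    sock.sendall(_HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock, count):
    """Read exactly count bytes, or raise EOFError if the peer closes."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise EOFError("socket closed mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_event(sock):
    """Receive and unpickle one length-prefixed event.

    Unpickling is only acceptable because both ends are trusted local
    processes (the runner and its test inferiors).
    """
    (length,) = _HEADER.unpack(_recv_exactly(sock, _HEADER.size))
    return pickle.loads(_recv_exactly(sock, length))
```

Explicit framing matters because TCP is a byte stream: a single `recv` may return part of an event or several events at once.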

--results-formatter-options="QUOTED-OPTIONS-FOR-FORMATTER"

Each TestResultsFormatter can support its own formatter options. They
are parsed with an argparse.ArgumentParser that is given arguments
provided by this option. To see the options available for a given
TestResultsFormatter, run the command you would want to use in
dotest.py and add --results-formatter-options="--help".

This will print out the arguments supported by the
TestResultsFormatter in force at the time of the call (including the
default formatter choice logic as dictated by specifying
--results-file/--results-port without an explicit
--results-formatter). As is standard when given a --help or -h
argument, the program will not execute further after printing
help/usage info.

e.g. the following command line will pass the --xpass MAPMODE
and --xfail MAPMODE arguments to the test_results.XunitFormatter
(see the XunitFormatter Options section for details on
the options used below).

python test/dotest.py --executable /some/path/to/lldb \
--results-file /tmp/test-results.xml \
--results-formatter-options="--xpass ignore --xfail success"
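Formatter-specific options are a second argparse pass over the quoted string. A sketch, assuming a hypothetical `parse_formatter_options` function; the --xpass/--xfail names, MAPMODE choices, and defaults follow the XunitFormatter Options section, but the code is illustrative rather than the formatter's real parser.

```python
import argparse
import shlex

MAPMODES = ("success", "failure", "ignore", "passthru")


def parse_formatter_options(options_string):
    """Parse a --results-formatter-options string for an XUnit-style formatter."""
    parser = argparse.ArgumentParser(prog="XunitFormatter")
    parser.add_argument("--xpass", choices=MAPMODES, default="failure",
                        help="how to report unexpected successes")
    parser.add_argument("--xfail", choices=MAPMODES, default="ignore",
                        help="how to report expected failures")
    # Split the quoted option string the way a POSIX shell would.
    # Passing "" (never None) avoids shlex reading from stdin.
    return parser.parse_args(shlex.split(options_string or ""))
```

As with any argparse parser, an options string of "--help" prints usage and exits, matching the behavior described above.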

h2. XunitFormatter Options

By default, no options should be needed to do something reasonable
with any formatter. In the case of XunitFormatter output, there
are some mapping options that have some opinionated defaults:

--xpass MAPMODE
--xfail MAPMODE

MAPMODE can be one of success, failure, ignore, or passthru. This
option is needed because the JUnit output format has no concept of
expected failures and unexpected successes. Thus, to avoid breaking
JUnit/XUnit format readers that validate report files, we need to
decide how to handle xpass and xfail results. This argument
allows overriding the defaults.

The default for unexpected passes (xpass) is to mark them as JUnit
failures. The rationale is that a passing test that is marked as
expected to fail represents an actionable item:

  1. It is passing everywhere and is misrepresented. Action: remove
     the expected failure decoration.
  2. It is passing on some architectures. Action: remove the expected
     failure decorator for the passing systems, and mark the rest as
     expected failures.
  3. It is a flaky test that is known to sometimes pass or fail, but is
     difficult to make work 100% of the time. Still, passing some of
     the time is useful info and allows exercising the code path.

See TestResultsFormatter for thoughts on a future feature to aid with
the flaky case. We should have a full flaky test event mechanism that
has a flaky-success and flaky-fail result. We could then collect these
separately from xfail/xpass, and appropriately recognize they are not
expected to pass all the time. Further, we can start tracking failure
rates with an appropriate test results formatter/collection scheme,
and can be alerted if failure rates change for the worse with a given
submission.

For expected failures (xfail), the default is to ignore the result in
xUnit output. This prevents the somewhat misleading scenario of a
failing test being counted as an xUnit "success", and lets the number
of passing tests increase as XFAIL tests are made to pass on a given
architecture.

The MAPMODE is interpreted as follows:

  • ignore: the given xpass/xfail is skipped. Default for --xfail.
  • success: report as an xUnit success.
  • failure: report as an xUnit failure. Default for --xpass.
  • passthru: a new JUnit/XUnit XML element named 'expected-failure'
    or 'unexpected-success' is added to the testcase element.

The passthru option is meant to support schemes where the JUnit/
XUnit report collector is modified to handle these extra test
categories.
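The MAPMODE rules above reduce to a small decision function. This is a sketch: `map_result`, the status spellings `unexpected_success` and `expected_failure`, and the `(kind, element)` return convention are assumptions, with defaults matching the documented ones (failure for --xpass, ignore for --xfail).

```python
def map_result(status, xpass_mode="failure", xfail_mode="ignore"):
    """Decide how one test status is reported in XUnit output.

    Returns None when the result should be omitted. Otherwise returns a
    tuple (kind, element): kind is the status to report ("passthru" for
    the passthru mode), and element is the name of the extra child
    element to add for passthru, else None.
    """
    if status == "unexpected_success":
        mode, passthru_tag = xpass_mode, "unexpected-success"
    elif status == "expected_failure":
        mode, passthru_tag = xfail_mode, "expected-failure"
    else:
        # Ordinary statuses already have JUnit equivalents.
        return (status, None)
    if mode == "ignore":
        return None
    if mode in ("success", "failure"):
        return (mode, None)
    if mode == "passthru":
        return ("passthru", passthru_tag)
    raise ValueError("unknown MAPMODE: %r" % (mode,))
```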

h2. Proof of concept - Jenkins Integration

I have been using the script below on a Linux Jenkins bot to build and
collect test results. (The OS X version is a bit different and uses
the Jenkins Xcode Integration plug-in to do the build.) The unit
testing results are picked up with the "Publish JUnit test result
report" Post-build Action:

SOURCE_DIR=$HOME/work/mirror/git/llvm
BUILD_DIR=$WORKSPACE/build

# Configure (if not already)
if [ ! -d "$BUILD_DIR" ]; then
    mkdir -p "$BUILD_DIR"
    cd "$BUILD_DIR" || exit 1
    CC=/usr/bin/clang-3.6 CXX=/usr/bin/clang++-3.6 cmake \
        -GNinja -DCMAKE_BUILD_TYPE=Debug -Wno-dev "$SOURCE_DIR"
    RESULT=$?
    if [ $RESULT -ne 0 ]; then
        echo "cmake failed"
        rm -rf "$BUILD_DIR"
        exit 1
    fi
fi

# Build (which will configure again if needed).
cd "$BUILD_DIR" || exit 2
ninja
RESULT=$?
if [ $RESULT -ne 0 ]; then
    echo "build failed"
    exit 2
fi

# Set up the test results dir.
TEST_RESULTS_DIR=$BUILD_DIR/test-results
mkdir -p "$TEST_RESULTS_DIR"

# Clear any existing test traces.
TRACE_DIR=$BUILD_DIR/lldb-test-traces
rm -rf "$TRACE_DIR"

# Run the tests.
python -u "$SOURCE_DIR/tools/lldb/test/dotest.py" \
    --executable "$BUILD_DIR/bin/lldb" \
    -s "$TRACE_DIR" \
    --test-runner-name threading \
    --results-file "$TEST_RESULTS_DIR/lldb-all-tests.xml"

I use another Post-build Action to archive the build/test-results
directory and the build/lldb-test-traces directory so I can
look back at them over time.

h1. Ctrl-C Loop Improvements

The Ctrl-C processing code has been refactored into a helper method
that can handle an arbitrary number of Ctrl-C presses and is shared
between the different test runner strategies. This simplifies any
future addition of behavior that depends on how many times Ctrl-C
has been received. Note that, as before, the multiprocessing-pool and
threading-pool test runners do not support Ctrl-C. Also, I have
found that the interrupt handling for the threading-based worker
threads is not 100% bulletproof, although it is serviceable.
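A count-based handler is one way to structure logic for an arbitrary number of Ctrl-C presses. The `CtrlCHandler` class and its two callbacks below are hypothetical, not the helper in dosep.py; the sketch relies on the standard `signal` module, which only runs Python-level signal handlers in the main thread.

```python
import signal


class CtrlCHandler(object):
    """Count SIGINT deliveries and escalate the response each time."""

    def __init__(self, on_first, on_repeat):
        self.count = 0
        self.on_first = on_first    # e.g. ask workers to stop gracefully
        self.on_repeat = on_repeat  # e.g. terminate workers forcibly

    def __call__(self, signum, frame):
        self.count += 1
        if self.count == 1:
            self.on_first()
        else:
            # Pass the press count so callers can vary behavior further.
            self.on_repeat(self.count)

    def install(self):
        """Register as the SIGINT handler; returns the previous handler.

        signal.signal() must be called from the main thread.
        """
        return signal.signal(signal.SIGINT, self)
```

Keeping the count in one object shared by every runner strategy is what makes adding a new behavior for, say, the third press a one-line change.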


Event Timeline

tfiala updated this revision to Diff 34632. Sep 12 2015, 2:29 PM
tfiala retitled this revision from to Add JUnit/XUnit-formatted output to the lldb test run system.
tfiala updated this object.
tfiala added reviewers: clayborg, zturner.
tfiala added a subscriber: lldb-commits.

@zturner, if you could check that this runs on Windows, that would be great. (I've been able to test with Linux Jenkins and OS X Jenkins, in addition to hand running).

You should just need to run like this:

python /your/dotest.py {your-normal-options} --results-file {some-path-to-write-xunit-output.xml}

If that doesn't blow up on Windows, great. If it does, let me know where. One module now in use is asyncore; its source and some Stack Overflow answers imply it works on Windows, but that differs from what at least one piece of the official documentation says. If it doesn't work, we'll have to revisit that somehow.

tfiala added inline comments. Sep 12 2015, 2:43 PM
test/settings/TestSettings.py, line 168

This was a test spec failure that I found when looking at skipped tests. It is unrelated to the main thrust of this change but it does represent a test that was erroneously being skipped on x86_64 runs.

A couple notes on this:

  • Failures don't currently capture backtraces. I can add this later, but I am leaving it for another pass. This change is big enough as is. Right now I'm focusing on counts and green/red CI solutions.
  • Tests don't capture testcase-specific stdout/stderr. This is too big a change for right now. We can get there but not until we stop depending on the stdout/stderr going to other places for the whole test run.

Skip reasons are captured, as are bug numbers, exception types, and exception messages, all stored in the relevant JUnit XML attributes.

zturner edited edge metadata. Sep 12 2015, 5:21 PM
zturner added a subscriber: zturner.

I can't check this until Monday. But drive by comment: Why do we need the
MAPMODE? Can't the JUnitFormatter just treat xpasses as whatever it wants
without the help of this option?

Also, I'm not really familiar with JUnit as a general tool. Is there some
reason a one-size-fits-all JSON formatter is insufficient?

Last question: Is the current stdout/stderr formatter re-written as a
second formatter plugin, or not?

tfiala added a comment (edited). Sep 12 2015, 5:35 PM

I can't check this until Monday.

No worries, that is fine.

But drive by comment: Why do we need the
MAPMODE? Can't the JUnitFormatter just treat xpasses as whatever it wants
without the help of this option?

There are several CI solutions that have support for JUnit-style test results. It is more or less the ubiquitous test result format for many commercial and open source CI tools. JUnit output is XML-based. The CI front-ends for builds that read the JUnit/XUnit output will display and call out the number of passes and fails.

With the defaults, I have unexpected successes marked as failures in the JUnit report, and expected failures ignored. I think these are valid (although opinionated) defaults. They satisfy the property that XFAILs are expected, don't indicate successes (so don't get counted as such), and don't generate actionable "red" in the JUnit visualization tools built into build systems such as Jenkins, Bamboo, TeamBuilder, and others that read JUnit output. The options allow somebody to map those values differently. For example, I think it is reasonable to ignore unexpected successes once the truly stable passing ones have been weeded out and you're left with flaky tests that pass a high percentage of the time. (I am a proponent of adding an actual flaky category, with flaky-success and flaky-failure test results, so we can keep those tests running but make no hard assumptions about pass/fail rates other than tracking them to see trends.)

Also, I'm not really familiar with JUnit as a general tool. Is there some
reason a one-size-fits-all JSON formatter is insufficient?

We absolutely can add a JSON formatter.

For existing CI solutions, though, JUnit is already exceptionally well supported, and having that be the format we write to as an option gives us immediate ability to report test results meaningfully in just about every CI build system on the planet.

Last question: Is the current stdout/stderr formatter re-written as a
second formatter plugin, or not?

Not yet. I see that as a future goal for us to take on, though. When we solve that, we'll be able to properly capture per-test-method stdout/stderr in the test event stream, which will further improve JUnit/XUnit consumption, as well as any other output format we care to create or adapt.

clayborg accepted this revision. Sep 14 2015, 9:52 AM
clayborg edited edge metadata.

Looks good.

This revision is now accepted and ready to land. Sep 14 2015, 9:52 AM

@zturner, how does this look on Windows?

If you need a refresh with recent changes (there was at least one outside change to dosep.py), I can update the patch.

tfiala updated this revision to Diff 34779. Sep 14 2015, 10:18 PM
tfiala edited edge metadata.

Updated patch against svn r247664.

Sorry this took so long. Here's my first run:

Traceback (most recent call last):
  File "D:/src/llvm/tools/lldb/test/dotest.py", line 1416, in <module>
    import dosep
  File "D:\src\llvm\tools\lldb\test\dosep.py", line 48, in <module>
    import dotest_channels
  File "D:\src\llvm\tools\lldb\test\dotest_channels.py", line 22, in <module>
    class CollectingReaderChannel(asyncore.file_dispatcher):
AttributeError: 'module' object has no attribute 'file_dispatcher'

I haven't looked into this at all yet, but thought I would post a
preliminary result as soon as I had one. Will look into it now

Sorry this took so long. Here's my first run:

No worries.

Traceback (most recent call last):
  File "D:/src/llvm/tools/lldb/test/dotest.py", line 1416, in <module>
    import dosep
  File "D:\src\llvm\tools\lldb\test\dosep.py", line 48, in <module>
    import dotest_channels
  File "D:\src\llvm\tools\lldb\test\dotest_channels.py", line 22, in <module>
    class CollectingReaderChannel(asyncore.file_dispatcher):
AttributeError: 'module' object has no attribute 'file_dispatcher'

Okay. That's the part where asyncore docs and googling/SO differ on how much (if any) of asyncore works on Windows.

At the moment I can nuke the CollectingReaderChannel. I'll put up a diff. If that works, great. If not, we may still hit another part of asyncore that isn't supported, and that will take longer to work out a non-breaking solution. Back as soon as I have a change to try...

I haven't looked into this at all yet, but thought I would post a
preliminary result as soon as I had one. Will look into it now

If you check the Python docs, it looks like you just can't use
asyncore.file_dispatcher and asyncore.file_wrapper. Everything else seems
OK. Just search the page for "Availability" and the only hits you get are
on those 2 fields, which say they are UNIX-specific.

tfiala updated this revision to Diff 34834. Sep 15 2015, 2:07 PM

Strips out asyncore-based stdout/stderr dotest inferior handling. Unnecessary for the final implementation that moved the socket handling into the main thread/main process of the parallel test runner.

@zturner, can you give this a shot? Thanks!

This appears to work now. Thanks for working on this, I'm glad to see the
test suite finally get some love.

Thanks, Zachary!

I'll get this in now.

tfiala closed this revision. Sep 15 2015, 2:39 PM

Sending test/dosep.py
Sending test/dotest.py
Sending test/dotest_args.py
Adding test/dotest_channels.py
Sending test/settings/TestSettings.py
Adding test/test_results.py
Transmitting file data ......
Committed revision 247722.

I'm glad to see the test suite finally get some love.

Me too! It's about time :-)