This is an archive of the discontinued LLVM Phabricator instance.

Parse dotest.py/dosep.py output, providing general stats and skip reason breakdown by counts.
AbandonedPublic

Authored by tfiala on Aug 27 2015, 4:12 PM.

Details

Reviewers
clayborg
Summary

The new script can be run against output of dotest.py/dosep.py, with the primary benefit of providing a detailed breakdown of the number of test methods that skip for a given reason.

Output looks something like this:

Test Counts


success: 637
unexpected success: 14
failure: 1
expected failure: 41
skipped: 681


Skip Reasons
------------
requires on of darwin, macosx, ios: 520
debugserver tests: 68
skip on linux: 42
dsym tests: 20
benchmarks tests: 13
Skip this long running test: 8
requires Darwin: 2
The 'expect' program cannot be located, skip the test: 1
requires on of windows: 1
skipping because os:None compiler: None None arch: None: 1
-data-evaluate-expression doesn't work on globals: 1
MacOSX doesn't have a default thread name: 1
due to taking too long to complete.: 1
This test is only for LLDB.framework built 64-bit and !lldb.test_remote: 1
requires one of x86-64, i386, i686: 1

Use a flow like this:

cd {your_lldb_source}/test
python dosep.py -s --options "-q --executable /path/to/lldb -A {your_arch} -C {your_compiler_path}" 2>&1 | tee /tmp/test_output.log
python reports/dotest_stats.py -t /tmp/test_output.log
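The tallying a script like this performs can be sketched roughly as below. This is a hedged illustration, not the actual dotest_stats.py source: the `RESULT: detail` line grammar and the `tally` helper are invented for the example, since the real dotest.py/dosep.py output format is not shown here.

```python
# Hypothetical sketch of the kind of tallying dotest_stats.py performs:
# scan a test-run log, count result types, and break skips down by reason.
# The "STATUS: detail" line format is an assumption for illustration only.
import collections
import re

# Matches lines such as "SKIP: requires Darwin" or "PASS: test_foo".
RESULT_RE = re.compile(r"^(PASS|FAIL|SKIP|XFAIL|XPASS):\s*(.*)$")

def tally(lines):
    counts = collections.Counter()
    skip_reasons = collections.Counter()
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if not m:
            continue  # not a result line; ignore interleaved output
        status, detail = m.groups()
        counts[status] += 1
        if status == "SKIP":
            skip_reasons[detail] += 1
    return counts, skip_reasons

log = [
    "PASS: test_breakpoint",
    "SKIP: requires Darwin",
    "SKIP: requires Darwin",
    "SKIP: skip on linux",
    "FAIL: test_watchpoint",
]
counts, reasons = tally(log)
print(counts["PASS"], counts["SKIP"], reasons["requires Darwin"])
```

Printing `skip_reasons.most_common()` would then give the sorted reason/count breakdown shown in the sample output above.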

Diff Detail

Event Timeline

tfiala updated this revision to Diff 33371.Aug 27 2015, 4:12 PM
tfiala retitled this revision from to Parse dotest.py/dosep.py output, providing general stats and skip reason breakdown by counts..
tfiala updated this object.
tfiala added a reviewer: clayborg.
tfiala added a subscriber: lldb-commits.

Note the output is formatted for whitespace and left justified, with the numeric columns all lined up. My verbatim mode above doesn't look like it did what I expected...

And, the file is actually in test/reports/, not reports/...

tfiala updated this revision to Diff 33375.Aug 27 2015, 4:17 PM

Any reason dotest / dosep cannot just print the stats?

labath added a subscriber: labath.Aug 28 2015, 5:27 AM

Same question as Zachary. This sounds like a very useful feature and it would be nice to have it integrated into the current test system, instead of making another layer on top of that (it's bad enough we have dosep and dotest already). Parsing the output like this is likely to break the script due to random changes in the other parts.

Same question as Zachary. This sounds like a very useful feature and it would be nice to have it integrated into the current test system,

Well, I'd be happy to do that in dosep.py. There are a couple of challenges:

  1. It requires information that is only provided when parsable output mode is invoked in dotest.py. So it mandates a particular output style that is not necessarily what everyone wants to see.
  2. Does everyone always want this output from dosep.py? I agree with Zachary's earlier comment that having a bunch of options is more painful than not having them, particularly for things that everyone wants. But this can be verbose.

instead of making another layer on top of that (it's bad enough we have dosep and dotest already).

The existence of those two is at least partly due to the organic nature in which they developed. Some energy could be put into wrapping them together, unless dosep.py has grown wildly since I last touched it.

Parsing the output like this is likely to break the script due to random changes in the other parts.

That probably cuts both ways. Changing output of a test script (the output I'm adding) seems quite plausible to break other scripts that parse the output of the test infrastructure.

All that aside, if we did want to move this more firmly into dotest/dosep, it currently requires the data to be generated from dotest and collated in dosep. So it could change like so:

  1. Dosep.py could be modified to collect the skip reason data if it detects its presence, and can collate and display if the skip reason info is there.
  2. If the data isn't present, dosep.py can just skip presenting the skip reason report piece.
  3. We can add a skip reason report suppression flag that skips printing out the skip reason in dosep.py if for some reason somebody doesn't want to see it. By default it would print skip reason tabulations when available unless this flag was specified. I see that as an optional task pending anybody really wanting/needing it.
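The collate-and-display behavior described above could look roughly like this. This is only a sketch under assumptions: the per-worker dict-of-counts shape and the function names are invented here, not taken from dosep.py.

```python
# Hedged sketch of the proposal above: dosep.py merges per-dotest.py
# skip-reason tallies, and skips the report when no data was produced
# (i.e. parsable output mode was off). The tally shape is an assumption.
import collections

def merge_skip_reasons(per_worker_tallies):
    total = collections.Counter()
    for worker_tally in per_worker_tallies:
        if worker_tally:  # None/empty => this worker emitted no skip data
            total.update(worker_tally)
    return total

def print_skip_report(total):
    if not total:  # no data collected anywhere: print nothing at all
        return
    print("Skip Reasons")
    print("------------")
    for reason, count in total.most_common():
        print("%s: %d" % (reason, count))

workers = [
    {"requires Darwin": 2},
    None,  # a worker run without parsable output mode
    {"requires Darwin": 1, "skip on linux": 4},
]
merged = merge_skip_reasons(workers)
print_skip_report(merged)
```

A suppression flag, as proposed, would simply gate the `print_skip_report` call.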

Then the user requirement becomes "add so and such an option to your dotest.py invocation options and you'll get skip counts in your dosep.py output."

Seem reasonable?

Then the user requirement becomes "add so and such an option to your dotest.py invocation options and you'll get skip counts in your dosep.py output."

And the "add so and such an option" may already be there for many. Not a new dotest.py option, just to be clear. It's whatever turns on parseable output.

instead of making another layer on top of that (it's bad enough we have dosep and dotest already).

The existence of those two is at least partly due to the organic nature in which they developed.

I understand that. I tried to merge them when I initially started on the project, but failed miserably. One day we should still go ahead and do that, but there are more pressing issues now. I am just trying to make sure we don't end up with three things to merge :).

All that aside, if we did want to move this more firmly into dotest/dosep, it currently requires the data to be generated from dotest and collated in dosep. So it could change like so:

  1. Dosep.py could be modified to collect the skip reason data if it detects its presence, and can collate and display if the skip reason info is there.
  2. If the data isn't present, dosep.py can just skip presenting the skip reason report piece.
  3. We can add a skip reason report suppression flag that skips printing out the skip reason in dosep.py if for some reason somebody doesn't want to see it. By default it would print skip reason tabulations when available unless this flag was specified. I see that as an optional task pending anybody really wanting/needing it.

Then the user requirement becomes "add so and such an option to your dotest.py invocation options and you'll get skip counts in your dosep.py output."

Seem reasonable?

Sounds great to me. :)

clayborg requested changes to this revision.Aug 28 2015, 9:18 AM
clayborg edited edge metadata.

I would like to see a few things as long as we are changing things:

  • get rid of dosep.py and just put the functionality into dotest.py if possible, it can just spawn itself in different modes
  • if we can't get rid of dosep.py lets make all options that work for dotest.py work in dosep.py
  • we should never have to launch dosep.py with any arguments in order for it to run correctly
  • add options for formatted (JSON) output and have dosep.py (or dotest.py if we move dosep.py functionality over there) enable that option when spawning subprocesses. We can then parse the output easily in the tool that spawns the subprocesses, and we can expand the format as needed. This would also allow buildbots to translate the results of the testing into their preferred format for correct display in the buildbot web interfaces.
  • all functionality should exist in dotest.py (no external reporting python scripts...)
This revision now requires changes to proceed.Aug 28 2015, 9:18 AM

Seem reasonable?

Sounds great to me. :)

Okay, I'll give that a shot and we can see what that looks like.

I would like to see a few things as long as we are changing things:

  • get rid of dosep.py and just put the functionality into dotest.py if possible, it can just spawn itself in different modes

I'm all for doing that. It's a bit of a bigger job, though. I'd like to tackle that separate from getting the skip reason tabulation in (and I'm happy to get skip tabulation into dosep.py now).

  • if we can't get rid of dosep.py lets make all options that work for dotest.py work in dosep.py

That makes sense. We should be able to build the options for dotest.py without requiring them to be passed through, which would also aid in a transition to a dotest.py-only world.

  • we should never have to launch dosep.py with any arguments in order for it to run correctly

Hmm, would you say that you can currently run dotest.py without any arguments and it does an admirable job? I think at the very least we need to specify the architecture(s) we're testing, and the compiler specification on Linux is pretty critical per some other bugs we've discussed (not resolving symlinks before making decisions based on compiler name, for example). I'd shoot for saying dosep.py should match dotest.py in terms of what it needs as arguments, and independently tackling the items in dotest.py that currently require options to make them smarter. (i.e. the goal is command-line parity between dosep.py and dotest.py, only adding args to dosep.py for additional functionality that is related to the parallel test running).

  • add options for formatted (JSON) output and have dosep.py (or dotest.py if we move dosep.py functionality over there) enable that option when spawning subprocesses. We can then parse the output easily in the tool that spawns the subprocesses, and we can expand the format as needed. This would also allow buildbots to translate the results of the testing into their preferred format for correct display in the buildbot web interfaces.

I like that. This could be done as a separate task.
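The structured-output idea discussed above could work roughly as follows: each child run emits one JSON object per test result, and the parent parses those lines instead of scraping free-form text. This is an illustrative sketch only; the event fields and helper names are invented for the example and are not an existing dotest.py format.

```python
# Illustrative sketch of machine-readable test events: the child process
# emits one JSON object per line, the parent collects them and tolerates
# interleaved non-JSON output. Field names here are assumptions.
import json

def emit_event(status, test, reason=None):
    event = {"status": status, "test": test}
    if reason is not None:
        event["reason"] = reason
    return json.dumps(event)

def parse_events(lines):
    results = []
    for line in lines:
        try:
            results.append(json.loads(line))
        except ValueError:
            pass  # stray stdout from the inferior; ignore it
    return results

stream = [
    emit_event("success", "TestBreakpoint.test_set"),
    "some stray stdout from the inferior",
    emit_event("skip", "TestDarwin.test_dsym", reason="requires Darwin"),
]
events = parse_events(stream)
print(len(events), events[1]["reason"])
```

Because the format is self-describing, buildbots could consume the same event stream and render results however they like, as suggested in the review comment.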

  • all functionality should exist in dotest.py (no external reporting python scripts...)

I'm on board with that.

For now I'd like to just tackle the part of getting the reporting in dosep.py. I'm happy to take a crack at both (1) merging the command line options so dosep.py no longer requires the pass-through dotest.py options, and instead takes the options directly on the dosep.py command line, and (2) trying to eliminate dosep.py altogether.

(And I'd like to see our test harness get some more tests for itself!)

These are great longer-term roadmap items for the test infrastructure IMHO.

-Todd

I just put up http://reviews.llvm.org/D12587 to take care of merging the user experience of dosep.py and dotest.py into dotest.py. I want to knock that and another change out before coming back to the skip reason tallying here.

tfiala abandoned this revision.Oct 10 2015, 9:16 PM

I'm going to redo this some time in the future based on the test event architecture.