This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Parse the dotest output to determine the most appropriate result code
ClosedPublic

Authored by JDevlieghere on Jun 7 2022, 3:28 PM.

Details

Summary

Currently we look for keywords in the dotest.py output to determine the result code. By parsing the dotest output and counting the number of tests that passed, failed, were skipped, etc., we can be more structured in determining the appropriate lit result code. We're still mapping multiple tests to one result code, so some loss of information is inevitable, but I think the new approach is much saner.
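The counting approach described above could be sketched roughly like this. This is a hypothetical simplification, not the patch's actual code: the real lldbtest.py uses lit result-code objects rather than strings, and the exact shape of the dotest output lines differs.

```python
import re
from collections import Counter

# Hypothetical per-method result keywords; the real dotest output and
# lit result codes differ in detail.
RESULT_RE = re.compile(r"^(PASS|FAIL|XFAIL|UNSUPPORTED|UNRESOLVED): ", re.M)

def pick_result(dotest_output):
    # Count how many test methods ended with each result keyword.
    counts = Counter(m.group(1) for m in RESULT_RE.finditer(dotest_output))
    if not counts:
        return "UNRESOLVED"
    # Map many per-method results to one lit result code by picking the
    # keyword that occurred most often (so 3 passes + 4 skips would be
    # reported as UNSUPPORTED).
    code, _ = max(counts.items(), key=lambda kv: kv[1])
    return code
```

For example, `pick_result("PASS: test_a\nFAIL: test_b\nFAIL: test_c\n")` returns `"FAIL"` because failures outnumber passes.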

Diff Detail

Event Timeline

JDevlieghere created this revision.Jun 7 2022, 3:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 3:28 PM
JDevlieghere requested review of this revision.Jun 7 2022, 3:28 PM
mib accepted this revision.Jun 7 2022, 3:35 PM

Makes more sense. LGTM!

This revision is now accepted and ready to land.Jun 7 2022, 3:35 PM
kastiglione accepted this revision.Jun 7 2022, 3:55 PM

I was in the process of writing a comment saying that we could be even more accurate by parsing the number of tests with a particular result code from the dotest output, but that it seemed like overkill. Then I changed my mind mid-sentence and implemented it instead.

mib accepted this revision.Jun 7 2022, 4:13 PM

Even better! LGTM!

JDevlieghere retitled this revision from [lldb] Mark API tests as XFAIL if they have expected failures and no passing tests to [lldb] Parse the dotest output to determine the most appropriate result code.Jun 7 2022, 4:13 PM
JDevlieghere edited the summary of this revision. (Show Details)

Here are the results:

Before:

  Unsupported: 145
  Passed     : 873
  Failed     :   7

After:

  Unsupported      : 167
  Passed           : 834
  Expectedly Failed:  16
  Unresolved       :   1
  Failed           :   7

Note that this patch does change the behavior: passes and failures are no longer prioritized. That means that if a test has 3 passes and 4 skips, it will be reported as unsupported (while currently it's considered a pass). We could special-case PASS and FAIL so that if there's at least one PASS or FAIL, the test is marked as such. That would be more in line with the current state of things.

That means that if you have a test that has 3 passes and 4 skips, it will be reported as unsupported (while currently it's considered a pass).

My initial reaction to this is that I don't love it. I think N passes and M skips is a pass, even if M>N.
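The special case discussed above might look something like the sketch below. This is a hedged illustration, not the patch's actual code; `counts` is a hypothetical dict of per-keyword tallies.

```python
def pick_result_prioritized(counts):
    # Hypothetical special case: a single failure fails the test, and a
    # single pass passes it, regardless of how many methods were skipped.
    if counts.get("FAIL"):
        return "FAIL"
    if counts.get("PASS"):
        return "PASS"
    # Otherwise fall back to whichever result keyword occurred most often.
    code, _ = max(counts.items(), key=lambda kv: kv[1],
                  default=("UNRESOLVED", 0))
    return code
```

With this, 3 passes and 4 skips would still be reported as a pass, matching the reviewer's intuition that N passes and M skips is a pass even if M > N.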

lldb/test/API/lldbtest.py
91

previously, out was also being searched; I take it that's not actually needed?

115

you could use operator.itemgetter(0) instead of the lambda.
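For reference, the two spellings are equivalent (toy data below, not the actual lit_results structure):

```python
import operator

# Toy (count, code) pairs; not the patch's actual data.
pairs = [(3, "PASS"), (1, "FAIL"), (2, "XFAIL")]

by_lambda = sorted(pairs, key=lambda pair: pair[0])
by_getter = sorted(pairs, key=operator.itemgetter(0))
assert by_lambda == by_getter  # both sort by the count in position 0
```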

instead of reducing and returning a single result, can we return the raw counts and then report the totals of the counts?

Mark the test as FAIL or PASS if there's at least one test that respectively failed or passed.

instead of reducing and returning a single result, can we return the raw counts and then report the totals of the counts?

No, the lit test format does not support that.

kastiglione accepted this revision.Jun 8 2022, 8:36 AM
kastiglione added inline comments.
lldb/test/API/lldbtest.py
122

you could alternatively use max(lit_results, key=...)
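That is, something like the following (illustrative data only):

```python
import operator

# Illustrative (count, code) tuples, not the patch's actual data.
lit_results = [(1, "FAIL"), (4, "UNSUPPORTED"), (3, "PASS")]

# Equivalent to sorted(lit_results, key=operator.itemgetter(0))[-1],
# but picks the maximum in one pass without sorting the whole list.
count, code = max(lit_results, key=operator.itemgetter(0))
```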

Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2022, 8:58 AM