Details

Reviewers

yln
jdenny

Commits

rG3b3cdcc7a557: [lit] Remove ANSI control characters from xunit output

Summary

Failing test output sometimes contains control characters like \x1b (e.g.
if there was some -fcolor-diagnostics output) which are not allowed inside
XML files. This causes problems with CI systems: for example, the Jenkins
JUnit XML parser throws an exception when encountering those characters, and
similar problems also occur with GitLab CI.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

arichardson created this revision. Jul 21 2020, 5:20 AM

Herald added a project: Restricted Project. Jul 21 2020, 5:20 AM

Herald added subscribers: llvm-commits, delcypher.

Harbormaster failed remote builds in B65065: Diff 279491! Jul 21 2020, 6:09 AM

fix tests

Harbormaster failed remote builds in B65246: Diff 279863! Jul 22 2020, 9:56 AM

Thanks for adding a test, we definitely want to keep it!

Implementation: do you think that this is the best approach to solving this issue? Have you considered other options?
Is this a new issue or was this caused by my "report generators" refactoring?

yln added inline comments. Jul 22 2020, 12:21 PM

llvm/utils/lit/lit/reports.py
130–138	I just spotted a bug in this existing code: `decode()` returns a new string rather than modifying `output` in place. It should be: `output = output.decode('utf-8', 'ignore')`. Can you check if this would resolve your issue?
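
The bug described in this comment is easy to reproduce in isolation. The following is a minimal sketch, assuming a bytes value standing in for `test.result.output`; the variable names are illustrative and not taken from lit.

```python
# Illustrative stand-in for test output captured as bytes.
raw = b"error: \xff bad byte"

# bytes.decode() does not modify `raw`; it returns a new str.
raw.decode("utf-8", "ignore")  # result is discarded
still_bytes = raw

# Assigning the result is what actually converts the value.
decoded = raw.decode("utf-8", "ignore")

print(type(still_bytes).__name__)
print(decoded)
```

Without the assignment the value stays a bytes object, so later string handling would operate on the undecoded data.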

In D84233#2167650, @yln wrote:

Thanks for adding a test, we definitely want to keep it!

Implementation: do you think that this is the best approach to solving this issue? Have you considered other options?
Is this a new issue or was this caused by my "report generators" refactoring?

No, this issue has been around for a while. I originally added a workaround to our fork in 2018 (https://github.com/CTSRD-CHERI/llvm-project/commit/3179f32c07f224c96d33f485747f508a34106c91).
I'm currently trying to reduce our diff to upstream and clean up local workarounds for upstreaming.

llvm/utils/lit/lit/reports.py
130–138	Good catch. But I don't believe this fixes the issue, since those characters are valid UTF-8 characters, just not ones that the Jenkins JUnit XML parser accepts.
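
The distinction between "valid UTF-8" and "valid XML" can be checked with a short sketch. It uses Python's `xml.etree.ElementTree` as a stand-in for a strict XML 1.0 parser; that is an assumption for illustration, not the parser Jenkins uses.

```python
import xml.etree.ElementTree as ET

text = "a line with \x1b[0mcontrol characters"

# The escape character survives a UTF-8 round trip unchanged,
# so decoding with 'ignore' does not remove it.
round_tripped = text.encode("utf-8").decode("utf-8", "ignore")

# An XML 1.0 parser still rejects the document that contains it.
try:
    ET.fromstring("<failure>" + round_tripped + "</failure>")
    parsed = True
except ET.ParseError:
    parsed = False

print(round_tripped == text)
print(parsed)
```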

Fix missing assignment

Harbormaster failed remote builds in B65348: Diff 280056! Jul 23 2020, 3:09 AM

@arichardson: can you double-check that this workaround is still needed?
Do we understand the semantics of CDATA blocks? I was under the impression we use it here to avoid problems like this.

Anyways, I am fine with this. Adding Joel as a second reviewer to get his feedback before accepting.

@jdenny:
Do you have opinions on this? Is llvm-lit responsible for this (or should the fix be in Jenkins or its configuration)?
Any other reviewers we should include?

yln added a reviewer: jdenny. Jul 23 2020, 11:43 AM

In D84233#2170305, @yln wrote:

@arichardson: can you double-check that this workaround is still needed?
Do we understand the semantics of CDATA blocks? I was under the impression we use it here to avoid problems like this.

Anyways, I am fine with this. Adding Joel as a second reviewer to get his feedback before accepting.

I believe CDATA just avoids the need to escape XML special characters. However, characters below 0x20 (with the exception of \t, \r and \n) are not valid anywhere in the document according to the XML spec (https://www.w3.org/TR/xml/#charsets):
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

XML 1.1 seems to relax that and allow everything except NUL: https://www.w3.org/TR/xml11/#charsets

Maybe specifying version 1.1 for the XUnit output would make the Java parsers happy again, but escaping ANSI control characters might also be useful if you open the report XML file in a text editor.
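
The `Char` production quoted above translates directly into a predicate. The sketch below is illustrative only: `is_xml10_char` and `parses` are hypothetical helper names, and Python's `xml.etree.ElementTree` is assumed as a stand-in for a strict XML 1.0 parser. It also checks the point about CDATA: a CDATA section makes `<` acceptable but does not make a control character acceptable.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(ch):
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses(document):
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(is_xml10_char("\t"), is_xml10_char("\x1b"), is_xml10_char(" "))
print(parses("<f><![CDATA[a < b]]></f>"))
print(parses("<f><![CDATA[a \x1b b]]></f>"))
```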

Thanks for working on this. I've been wanting to see this fixed too. I agree that lit is the culprit.

In D84233#2171536, @arichardson wrote:

Maybe specifying version 1.1 for the XUnit output would make the Java parsers happy again, but escaping ANSI control characters might also be useful if you open the report XML file in a text editor.

I'm not sure which fix is best:

The current patch: If I understand correctly, the current patch's escaping for XML 1.0 is not reversible: the escape sequences are indistinguishable from any existing occurrences of the two-character sequence \x.
Drop control characters instead: That would make the output more readable. It loses even more information, but that information's value is questionable.
Use a reversible escape sequence: This doesn't seem worth it and might make simple output harder to read.
XML 1.1: It looks like that would avoid the above issues (I think it's best to just drop nul bytes). Does anyone know how widely supported XML 1.1 is today? A quick google search turned up info from a decade ago saying it was not widely supported then.

If we cannot come to a conclusion soon, I think the current patch is an improvement, so we should commit it and then consider possible improvements.
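
The trade-off between the first three options can be made concrete. This is a sketch under stated assumptions, not the code from any revision of the patch: the helper names are hypothetical, and the `\xNN` spelling is only one possible escape format.

```python
import re

# Control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), newline (0x0A) and carriage return (0x0D).
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def escape_irreversibly(s):
    # Replace each control character with a printable "\xNN" spelling.
    return _CONTROL.sub(lambda m: "\\x%02x" % ord(m.group()), s)


def drop(s):
    # Remove the control characters entirely.
    return _CONTROL.sub("", s)


def escape_reversibly(s):
    # Double existing backslashes first so "\xNN" is unambiguous afterwards.
    return escape_irreversibly(s.replace("\\", "\\\\"))


real = "\x1b[0m"      # starts with an actual ESC character
literal = "\\x1b[0m"  # starts with the printable characters \, x, 1, b

print(escape_irreversibly(real) == escape_irreversibly(literal))
print(escape_reversibly(real) == escape_reversibly(literal))
print(drop(real))
```

The first comparison is true, which is the ambiguity described above: after irreversible escaping, a reader cannot tell whether the original output contained an ESC character or the literal text `\x1b`.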

llvm/utils/lit/lit/reports.py
72	Please add a comment here citing the relevant part of the XML 1.0 spec.
134	I had similar trouble with Gitlab CI around the end of last year. Maybe the main point this comment should make is that the XML is invalid without this escaping, and then the comment can mention CI failures as a practical consequence.
llvm/utils/lit/tests/shtest-format.py
35	Is this line intentional?

Improve comments
Remove unnecessary line from test

Harbormaster failed remote builds in B65795: Diff 280841! Jul 27 2020, 4:16 AM

Other than the nit I just added about comment duplication, this LGTM. Thanks. However, let's wait a couple of days in case anyone else has an opinion.

I still wonder whether it would be more helpful to drop control characters rather than irreversibly escape them. Dropping them makes the output more readable. Reversibly escaping them makes it possible to feed the text to a tool that can render the control codes. Irreversibly escaping them doesn't achieve either of those advantages, and I'm not sure if it serves a real use case. Any comments on this question?

llvm/utils/lit/lit/reports.py
140	You indeed did what I asked, but I didn't consider that we would end up with duplication of some comments at the caller and callee. I think it would be better to consolidate these comments with the ones you just inserted into the `escape_invalid_xml_chars` definition, and place the consolidated version as header comments on `escape_invalid_xml_chars` and nowhere else.

This revision is now accepted and ready to land. Jul 27 2020, 10:27 AM

In D84233#2176397, @jdenny wrote:

I still wonder whether it would be more helpful to drop control characters rather than irreversibly escape them. Dropping them makes the output more readable. Reversibly escaping them makes it possible to feed the text to a tool that can render the control codes. Irreversibly escaping them doesn't achieve either of those advantages, and I'm not sure if it serves a real use case. Any comments on this question?

I agree with Joel here and would like to see this done as part of this patch.

yln added inline comments. Jul 28 2020, 10:18 AM

llvm/utils/lit/lit/reports.py
76	Should this be a global to avoid building the dict on every call to this function?

arichardson added inline comments. Jul 28 2020, 2:07 PM

llvm/utils/lit/lit/reports.py
76	Yes, this could be called many times so that definitely makes sense. Will update the patch tomorrow.

Remove instead of escaping. This avoids a py2 -> py3 difference
fix comments

arichardson retitled this revision from [lit] Escape ANSI control character in xunit output to [lit] Remove ANSI control character from xunit output. Aug 3 2020, 3:57 AM

arichardson edited the summary of this revision.

fix python2 issue

remove debug code

Harbormaster completed remote builds in B66749: Diff 282570. Aug 3 2020, 4:37 AM

Harbormaster completed remote builds in B66751: Diff 282572. Aug 3 2020, 5:02 AM

Harbormaster completed remote builds in B66750: Diff 282571. Aug 3 2020, 5:12 AM

LGTM with two small nits. Thanks!

llvm/utils/lit/lit/reports.py
71	This is now an `_invalid_xml_chars_dict`
83	Why reassign `s`? Just `return`.

arichardson added inline comments. Aug 3 2020, 11:11 AM

llvm/utils/lit/lit/reports.py
83	Leftover from when I had debug statements there. Will remove before committing.

fix variable name and remove temporary

Harbormaster completed remote builds in B67271: Diff 283515. Aug 6 2020, 1:05 AM

This revision was landed with ongoing or failed builds. Aug 6 2020, 1:17 AM

Closed by commit rG3b3cdcc7a557: [lit] Remove ANSI control characters from xunit output (authored by arichardson).

This revision was automatically updated to reflect the committed changes.

arichardson added a commit: rG3b3cdcc7a557: [lit] Remove ANSI control characters from xunit output.

Diff 283520

llvm/utils/lit/lit/reports.py

Show First 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	def write_results(self, tests, elapsed):

tests_data.append(test_data)		tests_data.append(test_data)

with open(self.output_file, 'w') as file:		with open(self.output_file, 'w') as file:
json.dump(data, file, indent=2, sort_keys=True)		json.dump(data, file, indent=2, sort_keys=True)
file.write('\n')		file.write('\n')


		_invalid_xml_chars_dict = {c: None for c in range(32) if chr(c) not in ('\t', '\n', '\r')}

		def remove_invalid_xml_chars(s):
		# According to the XML 1.0 spec, control characters other than
		# \t,\r, and \n are not permitted anywhere in the document
		# (https://www.w3.org/TR/xml/#charsets) and therefore this function
		# removes them to produce a valid XML document.
		#
		# Note: In XML 1.1 only \0 is illegal (https://www.w3.org/TR/xml11/#charsets)
		# but lit currently produces XML 1.0 output.
		return s.translate(_invalid_xml_chars_dict)


class XunitReport(object):		class XunitReport(object):
def __init__(self, output_file):		def __init__(self, output_file):
self.output_file = output_file		self.output_file = output_file
self.skipped_codes = {lit.Test.EXCLUDED,		self.skipped_codes = {lit.Test.EXCLUDED,
lit.Test.SKIPPED, lit.Test.UNSUPPORTED}		lit.Test.SKIPPED, lit.Test.UNSUPPORTED}

def write_results(self, tests, elapsed):		def write_results(self, tests, elapsed):
tests.sort(key=by_suite_and_test_path)		tests.sort(key=by_suite_and_test_path)
Show All 29 Lines	def _write_test(self, file, test, suite_name):
class_name=quo(class_name), name=quo(name), time=time))		class_name=quo(class_name), name=quo(name), time=time))

if test.isFailure():		if test.isFailure():
file.write('>\n <failure><![CDATA[')		file.write('>\n <failure><![CDATA[')
# In the unlikely case that the output contains the CDATA		# In the unlikely case that the output contains the CDATA
# terminator we wrap it by creating a new CDATA block.		# terminator we wrap it by creating a new CDATA block.
output = test.result.output.replace(']]>', ']]]]><![CDATA[>')		output = test.result.output.replace(']]>', ']]]]><![CDATA[>')
if isinstance(output, bytes):		if isinstance(output, bytes):
output.decode("utf-8", 'ignore')		output = output.decode("utf-8", 'ignore')

		# Failing test output sometimes contains control characters like
		# \x1b (e.g. if there was some -fcolor-diagnostics output) which are
		# not allowed inside XML files.
		# This causes problems with CI systems: for example, the Jenkins
		# JUnit XML will throw an exception when encountering those
		# characters and similar problems also occur with GitLab CI.
		output = remove_invalid_xml_chars(output)
file.write(output)		file.write(output)
file.write(']]></failure>\n</testcase>\n')		file.write(']]></failure>\n</testcase>\n')
elif test.result.code in self.skipped_codes:		elif test.result.code in self.skipped_codes:
reason = self._get_skip_reason(test)		reason = self._get_skip_reason(test)
# file.write(f'>\n <skipped message={quo(reason)}/>\n</testcase>\n')		# file.write(f'>\n <skipped message={quo(reason)}/>\n</testcase>\n')
file.write('>\n <skipped message={reason}/>\n</testcase>\n'.format(		file.write('>\n <skipped message={reason}/>\n</testcase>\n'.format(
reason=quo(reason)))		reason=quo(reason)))
else:		else:
file.write('/>\n')		file.write('/>\n')

▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines
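
The `replace(']]>', ']]]]><![CDATA[>')` call in `_write_test` splits any CDATA terminator in the output across two CDATA sections. The sketch below checks that this round-trips; `wrap_cdata` is a hypothetical helper written for illustration, and Python's `xml.etree.ElementTree` is assumed as the parser.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    # Split every "]]>" so that "]]" ends one CDATA section and ">"
    # begins the next one.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


output = "if (a[b[0]]> 1) fail"
document = "<failure>" + wrap_cdata(output) + "</failure>"

# The parser concatenates adjacent CDATA sections into one text node.
recovered = ET.fromstring(document).text
print(recovered == output)
```

This handles the terminator only. Control characters still have to be removed separately, because they are invalid inside CDATA as well.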

llvm/utils/lit/tests/Inputs/shtest-format/external_shell/fail_with_control_chars.txt

This file was added.

				# Run a command that fails and prints control characters on stdout.
				# This test checks that the xunit output correctly escapes them in the XML.
				#
				# RUN: %{python} %S/write-control-chars.py

llvm/utils/lit/tests/Inputs/shtest-format/external_shell/write-control-chars.py

This file was added.

				#!/usr/bin/env python

				from __future__ import print_function
				import sys

				print("a line with \x1b[2;30;41mcontrol characters\x1b[0m.")
				sys.exit(1)

llvm/utils/lit/tests/shtest-format.py

	Show All 21 Lines

	# CHECK: FAIL: shtest-format :: external_shell/fail_with_bad_encoding.txt			# CHECK: FAIL: shtest-format :: external_shell/fail_with_bad_encoding.txt
	# CHECK-NEXT: * TEST 'shtest-format :: external_shell/fail_with_bad_encoding.txt' FAILED *			# CHECK-NEXT: * TEST 'shtest-format :: external_shell/fail_with_bad_encoding.txt' FAILED *
	# CHECK: Command Output (stdout):			# CHECK: Command Output (stdout):
	# CHECK-NEXT: --			# CHECK-NEXT: --
	# CHECK-NEXT: a line with bad encoding:			# CHECK-NEXT: a line with bad encoding:
	# CHECK: --			# CHECK: --

				# CHECK: FAIL: shtest-format :: external_shell/fail_with_control_chars.txt
				# CHECK-NEXT: * TEST 'shtest-format :: external_shell/fail_with_control_chars.txt' FAILED *
				# CHECK: Command Output (stdout):
				# CHECK-NEXT: --
				# CHECK-NEXT: a line with {{.}}control characters{{.}}.
				# CHECK: --

	# CHECK: PASS: shtest-format :: external_shell/pass.txt			# CHECK: PASS: shtest-format :: external_shell/pass.txt

	# CHECK: FAIL: shtest-format :: fail.txt			# CHECK: FAIL: shtest-format :: fail.txt
	# CHECK-NEXT: * TEST 'shtest-format :: fail.txt' FAILED *			# CHECK-NEXT: * TEST 'shtest-format :: fail.txt' FAILED *
	# CHECK-NEXT: Script:			# CHECK-NEXT: Script:
	# CHECK-NEXT: --			# CHECK-NEXT: --
	# CHECK-NEXT: printf "line 1			# CHECK-NEXT: printf "line 1
	# CHECK-NEXT: false			# CHECK-NEXT: false
	Show All 25 Lines
	# CHECK: XFAIL: shtest-format :: xfail.txt			# CHECK: XFAIL: shtest-format :: xfail.txt
	# CHECK: XPASS: shtest-format :: xpass.txt			# CHECK: XPASS: shtest-format :: xpass.txt
	# CHECK-NEXT: * TEST 'shtest-format :: xpass.txt' FAILED *			# CHECK-NEXT: * TEST 'shtest-format :: xpass.txt' FAILED *
	# CHECK-NEXT: Script			# CHECK-NEXT: Script
	# CHECK-NEXT: --			# CHECK-NEXT: --
	# CHECK-NEXT: true			# CHECK-NEXT: true
	# CHECK-NEXT: --			# CHECK-NEXT: --

	# CHECK: Failed Tests (3)			# CHECK: Failed Tests (4)
	# CHECK: shtest-format :: external_shell/fail.txt			# CHECK: shtest-format :: external_shell/fail.txt
	# CHECK: shtest-format :: external_shell/fail_with_bad_encoding.txt			# CHECK: shtest-format :: external_shell/fail_with_bad_encoding.txt
				# CHECK: shtest-format :: external_shell/fail_with_control_chars.txt
	# CHECK: shtest-format :: fail.txt			# CHECK: shtest-format :: fail.txt

	# CHECK: Unexpectedly Passed Tests (1)			# CHECK: Unexpectedly Passed Tests (1)
	# CHECK: shtest-format :: xpass.txt			# CHECK: shtest-format :: xpass.txt

	# CHECK: Testing Time:			# CHECK: Testing Time:
	# CHECK: Unsupported : 4			# CHECK: Unsupported : 4
	# CHECK: Passed : 6			# CHECK: Passed : 6
	# CHECK: Expectedly Failed : 4			# CHECK: Expectedly Failed : 4
	# CHECK: Unresolved : 3			# CHECK: Unresolved : 3
	# CHECK: Failed : 3			# CHECK: Failed : 4
	# CHECK: Unexpectedly Passed: 1			# CHECK: Unexpectedly Passed: 1


	# XUNIT: <?xml version="1.0" encoding="UTF-8"?>			# XUNIT: <?xml version="1.0" encoding="UTF-8"?>
	# XUNIT-NEXT: <testsuites time="{{[0-9.]+}}">			# XUNIT-NEXT: <testsuites time="{{[0-9.]+}}">
	# XUNIT-NEXT: <testsuite name="shtest-format" tests="21" failures="7" skipped="4">			# XUNIT-NEXT: <testsuite name="shtest-format" tests="22" failures="8" skipped="4">

	# XUNIT: <testcase classname="shtest-format.external_shell" name="fail.txt" time="{{[0-9]+\.[0-9]+}}">			# XUNIT: <testcase classname="shtest-format.external_shell" name="fail.txt" time="{{[0-9]+\.[0-9]+}}">
	# XUNIT-NEXT: <failure{{[ ]*}}>			# XUNIT-NEXT: <failure{{[ ]*}}>
	# XUNIT: </failure>			# XUNIT: </failure>
	# XUNIT-NEXT: </testcase>			# XUNIT-NEXT: </testcase>


	# XUNIT: <testcase classname="shtest-format.external_shell" name="fail_with_bad_encoding.txt" time="{{[0-9]+\.[0-9]+}}">			# XUNIT: <testcase classname="shtest-format.external_shell" name="fail_with_bad_encoding.txt" time="{{[0-9]+\.[0-9]+}}">
	# XUNIT-NEXT: <failure{{[ ]*}}>			# XUNIT-NEXT: <failure{{[ ]*}}>
	# XUNIT: </failure>			# XUNIT: </failure>
	# XUNIT-NEXT: </testcase>			# XUNIT-NEXT: </testcase>

				# XUNIT: <testcase classname="shtest-format.external_shell" name="fail_with_control_chars.txt" time="{{[0-9]+\.[0-9]+}}">
				# XUNIT-NEXT: <failure><![CDATA[Script:
				# XUNIT: Command Output (stdout):
				# XUNIT-NEXT: --
				# XUNIT-NEXT: a line with [2;30;41mcontrol characters[0m.
				# XUNIT: </failure>
				# XUNIT-NEXT: </testcase>

	# XUNIT: <testcase classname="shtest-format.external_shell" name="pass.txt" time="{{[0-9]+\.[0-9]+}}"/>			# XUNIT: <testcase classname="shtest-format.external_shell" name="pass.txt" time="{{[0-9]+\.[0-9]+}}"/>

	# XUNIT: <testcase classname="shtest-format.shtest-format" name="fail.txt" time="{{[0-9]+\.[0-9]+}}">			# XUNIT: <testcase classname="shtest-format.shtest-format" name="fail.txt" time="{{[0-9]+\.[0-9]+}}">
	# XUNIT-NEXT: <failure{{[ ]*}}>			# XUNIT-NEXT: <failure{{[ ]*}}>
	# XUNIT: </failure>			# XUNIT: </failure>
	# XUNIT-NEXT: </testcase>			# XUNIT-NEXT: </testcase>

	# XUNIT: <testcase classname="shtest-format.shtest-format" name="no-test-line.txt" time="{{[0-9]+\.[0-9]+}}">			# XUNIT: <testcase classname="shtest-format.shtest-format" name="no-test-line.txt" time="{{[0-9]+\.[0-9]+}}">
	▲ Show 20 Lines • Show All 50 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[lit] Remove ANSI control character from xunit output
ClosedPublic
