This is an archive of the discontinued LLVM Phabricator instance.

Differential D63254

[lit] Fix UnicodeEncodeError when test commands contain non-ASCII chars
Closed · Public

Authored by mgorny on Jun 13 2019, 5:13 AM.

Details

Reviewers

rnk
ddunbar
ruiu
serge-sans-paille
jmittert

Commits

rG0c28a8f6282f: [lit] Fix UnicodeEncodeError when test commands contain non-ASCII chars
rL363388: [lit] Fix UnicodeEncodeError when test commands contain non-ASCII chars

Summary

Ensure that the bash script written by lit TestRunner is opened with UTF-8
encoding when using Python 3. Otherwise, an attempt to write non-ASCII
characters causes a UnicodeEncodeError. This happened, e.g., with
the following LLD test:

UNRESOLVED: lld :: ELF/format-binary-non-ascii.s (657 of 2119)
******************** TEST 'lld :: ELF/format-binary-non-ascii.s' FAILED ********************
Exception during script execution:
Traceback (most recent call last):
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/worker.py", line 63, in _execute_test
    result = test.config.test_format.execute(test, lit_config)
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/formats/shtest.py", line 25, in execute
    self.execute_external)
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 1644, in executeShTest
    res = _runShTest(test, litConfig, useExternalSh, script, tmpBase)
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 1590, in _runShTest
    res = executeScript(test, litConfig, tmpBase, script, execdir)
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 1157, in executeScript
    f.write('{ ' + '; } &&\n{ '.join(commands) + '; }')
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 274: ordinal not in range(128)
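
The failure can be reproduced outside lit. The sketch below is only an illustration, not lit's code: `write_script` and `demo` are hypothetical helpers, the join expression is copied from the `f.write` call in the traceback, and the explicit `encoding='ascii'` stands in for an environment whose default text encoding is ASCII.

```python
import os
import tempfile


def write_script(path, commands, encoding):
    """Write commands to path using the same join as the traceback above."""
    with open(path, 'w', encoding=encoding) as f:
        f.write('{ ' + '; } &&\n{ '.join(commands) + '; }')


def demo():
    """Return (ascii_failed, utf8_bytes) for a command containing U+00A3."""
    commands = ['echo \u00a3']  # the pound sign is outside the ASCII range
    fd, path = tempfile.mkstemp(suffix='.script')
    os.close(fd)
    try:
        try:
            write_script(path, commands, 'ascii')
            ascii_failed = False
        except UnicodeEncodeError:
            ascii_failed = True
        # With an explicit UTF-8 encoding the same write succeeds.
        write_script(path, commands, 'utf-8')
        with open(path, 'rb') as f:
            utf8_bytes = f.read()
    finally:
        os.remove(path)
    return ascii_failed, utf8_bytes


if __name__ == '__main__':
    print(demo())
```

Passing `encoding` explicitly makes the result independent of the locale the test suite happens to run under.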

Diff Detail

Repository: rL LLVM

Event Timeline

mgorny created this revision. · Jun 13 2019, 5:13 AM

Herald added a project: Restricted Project. · Jun 13 2019, 5:13 AM

Herald added a subscriber: delcypher.

LGTM. Can you add a test case in lit/tests for that?

Here's a proposed test case. I've verified that it failed (became unresolved) without my patch.

lgtm

llvm/utils/lit/lit/TestRunner.py
1139 ↗	(On Diff #204557)	I was thinking perhaps this should be `if` not `elif`, but I've run the ELF LLD test suite successfully on Windows with Python 3 and it works fine as is.

This revision is now accepted and ready to land. · Jun 13 2019, 10:23 AM

mgorny marked an inline comment as done. · Jun 13 2019, 11:36 AM

mgorny added inline comments.

llvm/utils/lit/lit/TestRunner.py
1139 ↗	(On Diff #204557)	This is `elif` because encoding only applies when mode is `w` and not `wb`.
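
To illustrate the point about `w` versus `wb`: Python 3's built-in `open` rejects an `encoding` argument in binary mode with a `ValueError`, which is why the two branches have to be mutually exclusive. A small sketch (`open_error` is a hypothetical helper written for this illustration):

```python
import os
import tempfile


def open_error(mode, **kwargs):
    """Return the exception type open() raises for these arguments, or None."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        try:
            with open(path, mode, **kwargs):
                return None
        except ValueError as exc:
            return type(exc)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(open_error('w', encoding='utf-8'))   # text mode accepts encoding
    print(open_error('wb', encoding='utf-8'))  # binary mode does not
```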

LGTM

llvm/utils/lit/lit/TestRunner.py
1137–1141 ↗	(On Diff #204557)	I'd perhaps write it this way:

    f = None
    if litConfig.isWindows and not isWin32CMDEXE:
        f = open(script, 'wb')
    elif sys.version_info > (3,0):
        f = open(script, 'w', encoding='utf-8')
    else:
        f = open(script, 'w')

so that it is clear what values are passed to `open`, but that's probably my personal preference.

mgorny marked an inline comment as done. · Jun 14 2019, 6:23 AM

mgorny added inline comments.

llvm/utils/lit/lit/TestRunner.py
1137–1141 ↗	(On Diff #204557)	Sure but the code below relies on `mode` being defined :-(. I personally think the whole function here is a horrible hack but I currently don't have a good idea how to write it better. I mean, how does it make sense to have two copies of everything with different string types in order to control endlines as a side effect? Even conditionally putting `\r\n` would be cleaner than that. In fact, maybe I'll rewrite it to do just that…
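
One possible shape for the "conditionally put `\r\n`" idea is sketched below. This is an assumption about what such a rewrite could look like, not the committed code and not necessarily what the author later wrote; `render_script` and `write_script` are hypothetical names. The file is always opened in binary mode, so both the encoding and the line ending are chosen explicitly.

```python
def render_script(commands, use_crlf):
    """Join commands into script bytes with an explicit line ending."""
    eol = '\r\n' if use_crlf else '\n'
    text = '{ ' + ('; } &&' + eol + '{ ').join(commands) + '; }'
    return text.encode('utf-8')


def write_script(path, commands, use_crlf=False):
    """Write the rendered script without any implicit translation."""
    # Binary mode: no newline translation and no locale-dependent encoding.
    with open(path, 'wb') as f:
        f.write(render_script(commands, use_crlf))


if __name__ == '__main__':
    print(render_script(['echo \u00a3', 'true'], use_crlf=False))
```

With this layout there is a single string type throughout, and the line-ending decision is visible at the call site rather than being a side effect of the file mode.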

Closed by commit rL363388: [lit] Fix UnicodeEncodeError when test commands contain non-ASCII chars (authored by mgorny). · Jun 14 2019, 6:28 AM

This revision was automatically updated to reflect the committed changes.

Anyway, committed as-is to resolve the issue at hand and I'll think how to rewrite it best.

@rnk, do you happen to know whether we are using UTF-8 on Windows these days or some system-native encoding?
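
As background for the question above: when `open` is called in text mode without `encoding`, Python 3 uses a locale-dependent default, which the standard library reports via `locale.getpreferredencoding(False)`. The sketch below only shows how to inspect that default and check whether it can represent a given character; both helper names are hypothetical, and the reported value differs between machines.

```python
import locale


def default_text_encoding():
    """Return the encoding open() falls back to when none is given."""
    return locale.getpreferredencoding(False)


def can_encode(text, encoding):
    """Return True if text is representable in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == '__main__':
    enc = default_text_encoding()
    print(enc, can_encode('\u00a3', enc))
```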

Revision Contents

Path                                                                              Size
llvm/trunk/utils/lit/lit/TestRunner.py                                            7 lines
llvm/trunk/utils/lit/tests/Inputs/shtest-format/external_shell/utf8_command.txt   3 lines
llvm/trunk/utils/lit/tests/shtest-format.py                                       4 lines

Diff 204756

llvm/trunk/utils/lit/lit/TestRunner.py

 def executeScript(test, litConfig, tmpBase, commands, cwd):
     bashPath = litConfig.getBashPath()
     isWin32CMDEXE = (litConfig.isWindows and not bashPath)
     script = tmpBase + '.script'
     if isWin32CMDEXE:
         script += '.bat'

     # Write script file
     mode = 'w'
+    open_kwargs = {}
     if litConfig.isWindows and not isWin32CMDEXE:
         mode += 'b' # Avoid CRLFs when writing bash scripts.
-    f = open(script, mode)
+    elif sys.version_info > (3,0):
+        open_kwargs['encoding'] = 'utf-8'
+    f = open(script, mode, **open_kwargs)
     if isWin32CMDEXE:
         for i, ln in enumerate(commands):
             commands[i] = re.sub(kPdbgRegex, "echo '\\1' > nul && ", ln)
         if litConfig.echo_all_commands:
             f.write('@echo on\n')
         else:
             f.write('@echo off\n')
         f.write('\n@if %ERRORLEVEL% NEQ 0 EXIT\n'.join(commands))
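
The effect of the change on the arguments passed to `open` can be checked in isolation. The function below is a standalone sketch that mirrors the branch structure of the patch; `script_open_args` and its two boolean parameters are hypothetical stand-ins for `litConfig.isWindows` and `isWin32CMDEXE`, not part of lit.

```python
import sys


def script_open_args(is_windows, is_win32_cmd_exe):
    """Return (mode, kwargs) selected the same way as in the patch."""
    mode = 'w'
    open_kwargs = {}
    if is_windows and not is_win32_cmd_exe:
        mode += 'b'  # Avoid CRLFs when writing bash scripts.
    elif sys.version_info > (3, 0):
        open_kwargs['encoding'] = 'utf-8'
    return mode, open_kwargs


if __name__ == '__main__':
    for args in [(False, False), (True, False), (True, True)]:
        print(args, script_open_args(*args))
```

Under Python 3, only the bash-on-Windows case keeps the binary mode without an encoding; the other two cases open the script as UTF-8 text.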

llvm/trunk/utils/lit/tests/Inputs/shtest-format/external_shell/utf8_command.txt

+# Run a command including UTF-8 characters.
+#
+# RUN: echo £

llvm/trunk/utils/lit/tests/shtest-format.py

 # CHECK: Unexpected Passing Tests (1)
 # CHECK: shtest-format :: xpass.txt

 # CHECK: Failing Tests (3)
 # CHECK: shtest-format :: external_shell/fail.txt
 # CHECK: shtest-format :: external_shell/fail_with_bad_encoding.txt
 # CHECK: shtest-format :: fail.txt

-# CHECK: Expected Passes : 7
+# CHECK: Expected Passes : 8
 # CHECK: Expected Failures : 4
 # CHECK: Unsupported Tests : 5
 # CHECK: Unresolved Tests : 3
 # CHECK: Unexpected Passes : 1
 # CHECK: Unexpected Failures: 3


 # XUNIT: <?xml version="1.0" encoding="UTF-8" ?>
 # XUNIT-NEXT: <testsuites>
-# XUNIT-NEXT: <testsuite name="shtest-format" tests="23" failures="7" skipped="5">
+# XUNIT-NEXT: <testsuite name="shtest-format" tests="24" failures="7" skipped="5">

 # XUNIT: <testcase classname="shtest-format.shtest-format" name="argv0.txt" time="{{[0-9]+\.[0-9]+}}"/>

 # XUNIT: <testcase classname="shtest-format.external_shell" name="fail.txt" time="{{[0-9]+\.[0-9]+}}">
 # XUNIT-NEXT: <failure{{[ ]*}}>
 # XUNIT: </failure>
 # XUNIT-NEXT: </testcase>