This is an archive of the discontinued LLVM Phabricator instance.


Differential D76955

[lldb/Test] Decode stdout and stderr in case it contains UTF-8
Closed · Public

Authored by JDevlieghere on Mar 27 2020, 2:46 PM.


Details

Reviewers

aprantl
davide
labath

Group Reviewers

Restricted Project

Commits

rG87b6ab2eaffe: [lldb/Test] Decode stdout and stderr in case it contains Unicode.
rG2de52422acf0: [lldb/Test] Decode stdout and stderr in case it contains Unicode.

Summary

Fixes a UnicodeDecodeError when stdout or stderr contain non-ASCII characters.

    output += """Command Output (stdout):\n--\n%s\n--\n""" % (out,)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 721: ordinal not in range(128)
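For context: in Python 2, mixing a byte string that holds non-ASCII bytes with a unicode string makes the interpreter decode the bytes implicitly with the ASCII codec, and that implicit decode is what fails above. Python 3 never decodes implicitly, so the minimal sketch below reproduces the same error by decoding explicitly. The sample output and the `format_stdout` helper are made up for illustration; 0xe2 is the lead byte of many UTF-8 punctuation characters, such as U+2019 (right single quotation mark).

```python
# Python 3 sketch: reproduce the ASCII decode failure from the summary by
# decoding explicitly (Python 2 performed this decode implicitly).
raw_stdout = "it\u2019s done".encode("utf-8")  # b'it\xe2\x80\x99s done'


def format_stdout(out_bytes, encoding):
    """Hypothetical helper: decode captured stdout, then build the report section."""
    text = out_bytes.decode(encoding)
    return """Command Output (stdout):\n--\n%s\n--\n""" % (text,)


caught = None
try:
    format_stdout(raw_stdout, "ascii")
except UnicodeDecodeError as exc:
    caught = exc
    print(exc)  # 'ascii' codec can't decode byte 0xe2 in position 2: ...

# Decoding with the codec that actually produced the bytes succeeds.
report = format_stdout(raw_stdout, "utf-8")
```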

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Use single quotes for consistency

I think this won't work with Python 3, where out and err will already be "decoded" str objects.

Right, I didn't check whether out and err were strings or bytes. I'll see if I can find a better solution or alternatively wrap it in a try-except.

In D76955#1947160, @JDevlieghere wrote:

Right, I didn't check whether out and err were strings or bytes. I'll see if I can find a better solution or alternatively wrap it in a try-except.

It might be simplest to wrap that in if PY2 (when can we drop python2 support again?). I don't believe this can happen with py3 due to the try-catch here.
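A possible alternative to an `if PY2` check is to branch on the type instead and decode only raw bytes. This is a minimal sketch, not code from this revision; `decode_if_bytes` is a hypothetical helper. It behaves the same on both versions because Python 2's `str` is the same type as `bytes`, while text (`unicode` on Python 2, `str` on Python 3) is returned unchanged.

```python
def decode_if_bytes(value, encoding="utf-8"):
    """Hypothetical helper, not part of lit or lldb.

    Raw bytes (Python 2 str / Python 3 bytes) are decoded; text is
    returned as-is, so calling it twice is harmless.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


text = decode_if_bytes(b"caf\xc3\xa9")   # 'café'
same = decode_if_bytes(u"already text")  # returned unchanged
```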

A lot of this stuff could be cleaned up when python2 is gone. Right now it's very confusing to reason about what the type of a thing is in which Python version.

lldb/test/API/lldbtest.py
Inline comment by labath on line 86: The `replace` or `backslashreplace` error strategy might be better, so that the presence and value of the erroneous character are not completely lost.
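The difference between the suggested error handlers can be seen on a byte string that is not valid UTF-8 (the sample input is made up). The default `strict` handler raises, `replace` substitutes U+FFFD, and `backslashreplace` keeps the byte value as an escape. Note that `backslashreplace` is only accepted when decoding on Python 3.5 and later, so on its own it would not help the Python 2 code path.

```python
invalid = b"ok \xff done"  # 0xff can never appear in valid UTF-8

# Default errors="strict": the whole decode fails.
strict_error = None
try:
    invalid.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# "replace": the bad byte becomes U+FFFD, so its value is lost.
replaced = invalid.decode("utf-8", errors="replace")          # 'ok \ufffd done'
# "backslashreplace": the bad byte is kept as a visible escape.
escaped = invalid.decode("utf-8", errors="backslashreplace")  # 'ok \\xff done'
```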

After looking at util.py I'm convinced that this should never happen, unless there's a bug in the to_string implementation in which case we'd need to fix it there.
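The pass-through behavior being discussed can be modeled with the simplified sketch below. It is a hypothetical stand-in written for illustration, not lit's actual `to_string` source: text is returned unchanged and bytes are decoded as UTF-8. On Python 3 the result is always text, but on Python 2 a plain `str` is already bytes and takes the first branch without being decoded.

```python
def to_string_sketch(value):
    """Simplified, hypothetical model of the behavior discussed for lit's to_string."""
    if isinstance(value, str):
        # Python 3: already text. Python 2: str *is* bytes, so any UTF-8
        # bytes it holds are returned here without being decoded.
        return value
    if isinstance(value, bytes):
        # Only reached on Python 3, where bytes and str are distinct types.
        return value.decode("utf-8")
    # Anything else (for example Python 2 unicode) is out of scope here.
    raise TypeError("unsupported type: %r" % (type(value),))


from_text = to_string_sketch("\U0001F600")            # unchanged
from_bytes = to_string_sketch(b"\xf0\x9f\x98\x80")    # decoded to the same text
```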

JDevlieghere updated this revision to Diff 258857. (Apr 20 2020, 3:37 PM)

Lit's to_string just returns its argument unchanged when it's a str instance, and in Python 2 a str can still contain UTF-8-encoded bytes:

>>> foo = "😀"
>>> isinstance(foo, str)
True
>>> foo.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

This seems like the simplest thing to do while we wait for py2 to die.
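The session above is Python 2, where a `str` literal is a byte string and calling `encode` on it first decodes it implicitly with the ASCII codec. Python 3 keeps `bytes` and `str` separate, so the equivalent failure only shows up when UTF-8 bytes are decoded as ASCII explicitly. A small sketch, assuming the Python 2 terminal used UTF-8 so that the literal held the four bytes shown:

```python
# Python 3 sketch of the session above. Assumes the Python 2 terminal was
# UTF-8, so its "😀" literal held these four bytes.
emoji_bytes = "\U0001F600".encode("utf-8")  # b'\xf0\x9f\x98\x80'

ascii_error = None
try:
    emoji_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc
    # Same byte and position as in the Python 2 traceback above.
    print(exc)  # 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

text = emoji_bytes.decode("utf-8")  # decoding with the right codec succeeds
print(len(emoji_bytes), len(text))  # 4 1
```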

This revision is now accepted and ready to land. (Apr 21 2020, 1:17 AM)

Closed by commit rG2de52422acf0: [lldb/Test] Decode stdout and stderr in case it contains Unicode. (authored by JDevlieghere). (Apr 21 2020, 9:10 AM)

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. (Apr 21 2020, 9:10 AM)

Revision Contents

Path                         Size
lldb/test/API/lldbtest.py    5 lines

Diff 259022

lldb/test/API/lldbtest.py

(77 lines not shown; hunk inside def execute(self, test, litConfig):)
             sys.executable.startswith('/usr/bin/')):
             copied_python = os.path.join(builddir, 'copied-system-python')
             if not os.path.isfile(copied_python):
                 import shutil, subprocess
                 python = subprocess.check_output([
                     sys.executable,
                     '-c',
                     'import sys; print(sys.executable)'
                 ]).decode('utf-8').strip()
                 shutil.copy(python, copied_python)
             cmd[0] = copied_python

         if 'lldb-repro-capture' in test.config.available_features or \
                 'lldb-repro-replay' in test.config.available_features:
             reproducer_root = os.path.join(builddir, 'reproducers')
             mkdir_p(reproducer_root)
             reproducer_path = os.path.join(reproducer_root, testFile)
(10 lines not shown; hunk inside def execute(self, test, litConfig):)
                 timeout=litConfig.maxIndividualTestTime)
         except lit.util.ExecuteCommandTimeoutException as e:
             out = e.out
             err = e.err
             exitCode = e.exitCode
             timeoutInfo = 'Reached timeout of {} seconds'.format(
                 litConfig.maxIndividualTestTime)

+        if sys.version_info.major == 2:
+            # In Python 2, string objects can contain Unicode characters.
+            out = out.decode('utf-8')
+            err = err.decode('utf-8')
+
         output = """Script:\n--\n%s\n--\nExit Code: %d\n""" % (
             ' '.join(cmd), exitCode)
         if timeoutInfo is not None:
             output += """Timeout: %s\n""" % (timeoutInfo,)
         output += "\n"

         if out:
             output += """Command Output (stdout):\n--\n%s\n--\n""" % (out,)
(23 more lines not shown)
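To exercise the report-building lines from this diff outside lit, they can be lifted into a small Python 3 sketch. `build_report` and the sample command are hypothetical; the stderr handling sits in the elided part of the diff and is left out, and the `isinstance(out, bytes)` check is a type-based stand-in for the committed `sys.version_info.major == 2` guard. It shows that once `out` is text, non-ASCII output formats without errors.

```python
def build_report(cmd, exit_code, out, timeout_info=None):
    """Hypothetical helper mirroring the visible report lines of execute() (stdout only)."""
    if isinstance(out, bytes):
        # Type-based stand-in for the committed Python 2 decode step.
        out = out.decode("utf-8")
    output = """Script:\n--\n%s\n--\nExit Code: %d\n""" % (
        " ".join(cmd), exit_code)
    if timeout_info is not None:
        output += """Timeout: %s\n""" % (timeout_info,)
    output += "\n"
    if out:
        output += """Command Output (stdout):\n--\n%s\n--\n""" % (out,)
    return output


# Hypothetical command and outputs, for illustration only.
report = build_report(["python", "dotest.py"], 1,
                      "caf\u00e9 \U0001F600".encode("utf-8"))
timed = build_report(["python", "dotest.py"], 0, "",
                     "Reached timeout of 5 seconds")
```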