This is an archive of the discontinued LLVM Phabricator instance.

Differential D76955

[lldb/Test] Decode stdout and stderr in case it contains UTF-8
ClosedPublic

Authored by JDevlieghere on Mar 27 2020, 2:46 PM.

Details

Reviewers

aprantl
davide
labath

Group Reviewers

Restricted Project

Commits

rG87b6ab2eaffe: [lldb/Test] Decode stdout and stderr in case it contains Unicode.
rG2de52422acf0: [lldb/Test] Decode stdout and stderr in case it contains Unicode.

Summary

Fixes a UnicodeDecodeError when stdout or stderr contains non-ASCII characters.

    output += """Command Output (stdout):\n--\n%s\n--\n""" % (out,)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 721: ordinal not in range(128)
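
The traceback above most likely comes from Python 2 implicitly decoding a byte string as ASCII while formatting the message. A minimal sketch of the same failure follows, written for Python 3, where the decode has to be spelled out explicitly. The sample text in `raw` and the helper `try_ascii_decode` are hypothetical and are not taken from the failing test run.

```python
# Hypothetical sample: the UTF-8 encoding of a left curly quote starts with 0xe2,
# the same byte value reported in the traceback.
raw = 'stdout says \u201chello\u201d'.encode('utf-8')


def try_ascii_decode(data):
    """Return the error message raised by an ASCII decode, or None on success."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as error:
        return str(error)
    return None


message = try_ascii_decode(raw)
print(message)

# Decoding the same bytes as UTF-8 succeeds and recovers the original text.
print(raw.decode('utf-8'))
```

Decoding with the encoding the bytes were actually written in avoids the error, which is the idea behind the change under review.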

Event Timeline

Use single quotes for consistency

I think this won't work with Python 3, where out and err will already be "decoded" str objects.

Right, I didn't check whether out and err were strings or bytes. I'll see if I can find a better solution or alternatively wrap it in a try-except.
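
One way to sidestep the strings-versus-bytes question is to decode based on the type of the value rather than on the interpreter version. The sketch below is only an illustration of that idea; `to_text` is a hypothetical helper, not part of lit or lldb, and it is not the approach the revision ended up taking.

```python
def to_text(value, encoding='utf-8'):
    """Decode bytes to text and return anything else unchanged.

    In Python 3 a str passes through untouched. In Python 2, where
    bytes is an alias for str, a str is decoded to unicode.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical inputs: one bytes object and one already-decoded string.
print(to_text(b'caf\xc3\xa9'))
print(to_text('caf\u00e9'))
```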

In D76955#1947160, @JDevlieghere wrote:

Right, I didn't check whether out and err were strings or bytes. I'll see if I can find a better solution or alternatively wrap it in a try-except.

It might be simplest to wrap that in if PY2 (when can we drop python2 support again?). I don't believe this can happen with py3 due to the try-catch here.

A lot of this stuff could be cleaned up when python2 is gone. Right now it's very confusing to reason about what the type of a thing is in which python version.

lldb/test/API/lldbtest.py, line 85:
`replace` or `backslashreplace` error strategy might be better, so that the presence/value of the erroneous character is not completely lost.
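
To make the difference between the error strategies concrete, here is a small sketch comparing them on a byte sequence that is not valid UTF-8. The input is hypothetical, and `ignore` is included only as a contrast. Note that `backslashreplace` works for decoding only on Python 3.5 and later.

```python
# Hypothetical input: ASCII text followed by 0xff, which is never valid in UTF-8.
raw = b'value: \xff'

# The default strategy ('strict') raises on the invalid byte.
strict_failed = False
try:
    raw.decode('utf-8')
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD, so the reader can see that something was wrong.
replaced = raw.decode('utf-8', errors='replace')

# 'backslashreplace' keeps the value of the offending byte as an escape sequence.
escaped = raw.decode('utf-8', errors='backslashreplace')

# 'ignore' drops the byte silently, losing both its presence and its value.
ignored = raw.decode('utf-8', errors='ignore')

print(strict_failed, repr(replaced), repr(escaped), repr(ignored))
```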

After looking at util.py I'm convinced that this should never happen, unless there's a bug in the to_string implementation in which case we'd need to fix it there.

JDevlieghere updated this revision to Diff 258857. Apr 20 2020, 3:37 PM

Lit's to_string will just return the string when it's a str instance, which in Python 2 can still contain UTF-8 characters:

>>> foo = "😀"
>>> isinstance(foo, str)
True
>>> foo.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

This seems like the simplest thing to do while we wait for py2 to die.

This revision is now accepted and ready to land. Apr 21 2020, 1:17 AM

Closed by commit rG2de52422acf0: [lldb/Test] Decode stdout and stderr in case it contains Unicode. (authored by JDevlieghere). Apr 21 2020, 9:10 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Apr 21 2020, 9:10 AM

Revision Contents

Path                         Size
lldb/test/API/lldbtest.py    5 lines

Diff 258857

lldb/test/API/lldbtest.py

Context not available.
     timeoutInfo = 'Reached timeout of {} seconds'.format(
         litConfig.maxIndividualTestTime)

    +if sys.version_info.major == 2:
    +    # In Python 2, string objects can contain Unicode characters.
    +    out = out.decode('utf-8')
    +    err = err.decode('utf-8')
    +
     output = """Script:\n--\n%s\n--\nExit Code: %d\n""" % (
         ' '.join(cmd), exitCode)
     if timeoutInfo is not None:
Context not available.
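
For readers who want to try the guard outside of lldbtest.py, the following is a self-contained sketch built around the lines in the diff. The function names `decode_for_python2` and `format_report`, the sample command, and the stderr label are assumptions made for illustration. Only the version check, the two decode calls, and the "Script" and "stdout" format strings come from this page.

```python
import sys


def decode_for_python2(out, err):
    """Mirror the guard in the diff: decode only when running under Python 2."""
    if sys.version_info.major == 2:
        # In Python 2, string objects can contain Unicode characters.
        out = out.decode('utf-8')
        err = err.decode('utf-8')
    return out, err


def format_report(cmd, exit_code, out, err):
    """Build a report similar in shape to the one assembled in lldbtest.py."""
    out, err = decode_for_python2(out, err)
    report = """Script:\n--\n%s\n--\nExit Code: %d\n""" % (
        ' '.join(cmd), exit_code)
    report += """Command Output (stdout):\n--\n%s\n--\n""" % (out,)
    # Assumption: the stderr label is chosen by analogy with the stdout one.
    report += """Command Output (stderr):\n--\n%s\n--\n""" % (err,)
    return report


# Hypothetical command and output containing a non-ASCII character.
print(format_report(['lldb', '--version'], 0, 'caf\u00e9', ''))
```

Under Python 3 the guard is a no-op, which matches the reviewers' point that out and err are already str objects there.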