This is an archive of the discontinued LLVM Phabricator instance.

[lldb/Test] Decode stdout and stderr in case it contains UTF-8
ClosedPublic

Authored by JDevlieghere on Mar 27 2020, 2:46 PM.

Details

Summary

Fixes a UnicodeDecodeError when stdout or stderr contains non-ASCII characters.

    output += """Command Output (stdout):\n--\n%s\n--\n""" % (out,)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 721: ordinal not in range(128)
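
The traceback above comes from Python 2 implicitly decoding a byte string as ASCII while formatting the report. A minimal sketch of the same failure, written for Python 3 where the decode has to be explicit; the sample bytes and the variable names other than `out` are assumptions, not taken from the test suite:

```python
# Hypothetical stdout bytes: UTF-8 text containing curly quotes (non-ASCII).
out = b"error: \xe2\x80\x98foo\xe2\x80\x99 not found"

# Decoding as ASCII fails on the first byte >= 0x80, as in the traceback.
try:
    out.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding as UTF-8 first makes the value safe to format into the report.
decoded = out.decode("utf-8")
report = "Command Output (stdout):\n--\n%s\n--\n" % (decoded,)
```

Here `ascii_error` carries the same kind of message as the summary, only with a different byte offset.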

Diff Detail

Event Timeline

JDevlieghere created this revision.
JDevlieghere added a reviewer: labath.

Use single quotes for consistency

friss added a subscriber: friss. Mar 27 2020, 3:57 PM

I think this won't work with Python3, where out and err will already be "decoded" str objects

JDevlieghere planned changes to this revision. Mar 27 2020, 3:59 PM

Right, I didn't check whether out and err were strings or bytes. I'll see if I can find a better solution or alternatively wrap it in a try-except.

It might be simplest to wrap that in if PY2 (when can we drop python2 support again?). I don't believe this can happen with py3 due to the try-catch here.

A lot of this stuff could be cleaned up when python2 is gone. Right now it's very confusing to reason about what the type of a thing is in which python version.
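
One way to sidestep the "which type in which version" question is to check the value's type rather than the interpreter version. This is only a sketch: `to_text` is a hypothetical helper, not the change that was committed and not lit's `to_string`.

```python
def to_text(data, encoding="utf-8"):
    """Hypothetical helper: return text whether data arrives as bytes or str.

    On Python 2, bytes is an alias for str, so byte strings get decoded to
    unicode. On Python 3, str is already text and is returned unchanged.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


from_bytes = to_text(b"caf\xc3\xa9")  # UTF-8 bytes for "café"
from_text = to_text(u"caf\u00e9")     # already-decoded text
```

Compared with an `if PY2` guard, this keeps the branch tied to the data itself, at the cost of one `isinstance` check per call.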

lldb/test/API/lldbtest.py
85

replace or backslashreplace error strategy might be better, so that the presence/value of the erroneous character is not completely lost.
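
To illustrate the difference between the error strategies mentioned above, here is a small Python 3 sketch. The sample bytes are made up, and `ignore` is included only for contrast; the thread does not show which strategy the patch used at this line.

```python
# Hypothetical output: valid UTF-8 (a check mark) followed by a stray 0xff byte.
raw = b"ok \xe2\x9c\x93 bad \xff"

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD, so the reader can see something was wrong.
replaced = raw.decode("utf-8", errors="replace")

# "backslashreplace" keeps the byte's value as an escape sequence
# (supported for decoding since Python 3.5).
escaped = raw.decode("utf-8", errors="backslashreplace")
```

With `backslashreplace` the offending byte value survives in the output, which tends to be the most useful form when debugging test logs.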

JDevlieghere abandoned this revision. Apr 2 2020, 8:53 AM

After looking at util.py I'm convinced that this should never happen, unless there's a bug in the to_string implementation in which case we'd need to fix it there.

JDevlieghere added a comment. Edited Apr 20 2020, 3:39 PM

Lit's to_string will just return the string when it's a str instance, which in Python 2 can still contain UTF-8 characters:

>>> foo = "😀"
>>> isinstance(foo, str)
True
>>> foo.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
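
The session above raises a decode error from an encode call because a Python 2 str holds raw bytes, and encoding it first decodes those bytes with the default ASCII codec. A rough Python 3 emulation of that two-step behaviour; `py2_style_encode` is a hypothetical name introduced only for illustration:

```python
# The UTF-8 bytes of U+1F600, i.e. what a Python 2 str literal would hold.
raw = b"\xf0\x9f\x98\x80"


def py2_style_encode(byte_string, encoding="utf-8"):
    """Hypothetical emulation of Python 2 str.encode: an implicit ASCII
    decode followed by the requested encode."""
    return byte_string.decode("ascii").encode(encoding)


try:
    py2_style_encode(raw)
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding explicitly as UTF-8 yields proper text, which round-trips cleanly.
text = raw.decode("utf-8")
round_trip = text.encode("utf-8")
```

The explicit UTF-8 decode is what avoids the implicit ASCII step.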

labath accepted this revision. Apr 21 2020, 1:17 AM

This seems like the simplest thing to do while we wait for py2 to die.

This revision is now accepted and ready to land. Apr 21 2020, 1:17 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Apr 21 2020, 9:10 AM