Fixes a UnicodeDecodeError when stdout or stderr contain non-ASCII characters.

    output += """Command Output (stdout):\n--\n%s\n--\n""" % (out,)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 721: ordinal not in range(128)
Differential D76955
[lldb/Test] Decode stdout and stderr in case it contains UTF-8
Authored by JDevlieghere on Mar 27 2020, 2:46 PM.
Event Timeline

I think this won't work with Python 3, where out and err will already be "decoded" str objects.

Right, I didn't check whether out and err were strings or bytes. I'll see if I can find a better solution, or alternatively wrap it in a try-except.

It might be simplest to wrap that in if PY2 (when can we drop Python 2 support again?). I don't believe this can happen with py3 due to the try-catch here. A lot of this stuff could be cleaned up when Python 2 is gone. Right now it's very confusing to reason about what the type of a thing is in which Python version.
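To illustrate the bytes-versus-str concern raised in the comments above, here is a minimal Python 3 sketch of a type guard. The helper name `to_text`, its `encoding` parameter, and the choice of the `replace` error handler are assumptions made for illustration; this is not lit's `to_string` and not the code from the diff.

```python
def to_text(data, encoding="utf-8"):
    """Hypothetical helper: return text for either bytes or str input."""
    if isinstance(data, bytes):
        # Raw process output: decode it. Undecodable bytes become U+FFFD
        # instead of raising UnicodeDecodeError.
        return data.decode(encoding, errors="replace")
    # Already a str (the usual case in Python 3): return it unchanged.
    # Calling .decode() here would fail, because str has no decode method.
    return data


out = to_text(b"caf\xc3\xa9")   # bytes holding UTF-8 encoded text
err = to_text("already text")   # str passes through untouched

# Same formatting pattern as in the failing line from the summary.
report = "Command Output (stdout):\n--\n%s\n--\n" % (out,)
```

Checking the type before decoding avoids having to know, at the call site, which Python version produced the value.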
Comment Actions After looking at util.py I'm convinced that this should never happen, unless there's a bug in the to_string implementation in which case we'd need to fix it there. Comment Actions Lit's to_string will just return the string when it's a str instance, which in Python can still contain UTF-8 characters: >>> foo = "๐" >>> isinstance(foo, str) True >>> foo.encode('utf-8') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128) |
A replace or backslashreplace error strategy might be better, so that the presence and value of the erroneous character are not completely lost.
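As a rough comparison of the error strategies mentioned in this comment, the Python 3 sketch below decodes the same bytes with three standard error handlers. The sample bytes are made up for illustration, and `ignore` is included only as a contrast; the thread does not say which handler the diff used.

```python
# Sample bytes containing a UTF-8 right single quotation mark (U+2019),
# whose first byte is 0xe2, as in the reported traceback.
raw = b"ok \xe2\x80\x99 done"

# "ignore" silently drops the undecodable bytes.
ignored = raw.decode("ascii", errors="ignore")

# "replace" substitutes U+FFFD for each undecodable byte, so the
# presence of the problem is preserved but not its value.
replaced = raw.decode("ascii", errors="replace")

# "backslashreplace" keeps both presence and value as \xNN escapes
# (supported for decoding since Python 3.5).
escaped = raw.decode("ascii", errors="backslashreplace")

print(repr(ignored))
print(repr(replaced))
print(repr(escaped))
```

With `backslashreplace` the original bytes can still be read off the output, which is useful when the decoded text ends up in a test log.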