This is an archive of the discontinued LLVM Phabricator instance.

Fix llvm-objcopy/ELF/preserve-segment-contents test on UTF-8 locale
Closed, Public

Authored by aganea on Apr 25 2019, 9:12 AM.

Details

Summary

Previously, I had the following error under WSL/Ubuntu 18.04 with Python 3.6:

+ /usr/bin/python3.6 /mnt/f/svn/buildWSL/test/tools/llvm-objcopy/ELF/Output/preserve-segment-contents.test.tmp.py
Traceback (most recent call last):
  File "/mnt/f/svn/buildWSL/test/tools/llvm-objcopy/ELF/Output/preserve-segment-contents.test.tmp.py", line 8, in <module>
    input.write('\xDE\xAD\xBE\xEF')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

Strangely, Python 3.7 on Windows under MINGW32 was fine.

Tested on Python 2.7.16 and 3.7 on Windows. Tested on Python 3.6 on WSL.
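
For context, here is a minimal Python 3 sketch of the failure mode and one way around it. It is not the committed patch: the temporary path and variable names are hypothetical, and the `encoding='ascii'` argument only simulates a locale whose default encoding is ASCII. Writing the string in text mode either fails (ASCII) or re-encodes each character into two bytes (UTF-8), whereas opening the file in binary mode and writing a bytes literal stores the four bytes unchanged regardless of locale.

```python
import os
import tempfile

# Hypothetical scratch file; the real test writes into its own output directory.
path = os.path.join(tempfile.mkdtemp(), 'input.bin')

raw_text = '\xDE\xAD\xBE\xEF'   # a str of four non-ASCII code points
payload = b'\xDE\xAD\xBE\xEF'   # the four raw bytes we actually want on disk

# Text mode with an ASCII encoding (simulating an ASCII locale) raises
# UnicodeEncodeError, as in the traceback above.
try:
    with open(path, 'w', encoding='ascii') as f:
        f.write(raw_text)
    text_mode_failed = False
except UnicodeEncodeError:
    text_mode_failed = True

# Under UTF-8 the write would succeed, but each character becomes two bytes,
# so the file would not contain DE AD BE EF either.
utf8_length = len(raw_text.encode('utf-8'))

# Binary mode bypasses text encoding entirely.
with open(path, 'wb') as f:
    f.write(payload)

with open(path, 'rb') as f:
    written = f.read()

print(text_mode_failed, utf8_length, written)
```

The `b'...'` literal and the `'wb'` mode are also accepted by Python 2.7, though the `encoding` keyword used for the simulation is Python 3 only.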

Diff Detail

Repository
rL LLVM

Event Timeline

aganea created this revision. Apr 25 2019, 9:12 AM

Looks good to me.

Some side notes:

"Strangely, Python 3.7 on Windows under MINGW32 was fine"

Perhaps that is because the default encoding differs between those environments:

https://stackoverflow.com/questions/20923663/unicodeencodeerror-ascii-codec-cant-encode-character-in-position-0-ordinal/20923915#20923915
https://stackoverflow.com/questions/31917595/how-to-write-a-raw-hex-byte-to-stdout-in-python-3

I would also check the other tests for similar places where we write raw bytes in Python.

This revision is now accepted and ready to land. Apr 25 2019, 8:15 PM

Yes, the encoding is different. Running https://gist.github.com/zed/5898423 gives:

Windows 10 cmd shell:                 WSL:
locale(False):     cp1252             locale(False):     UTF-8
device(stdout):    cp850              device(stdout):    UTF-8
stdout.encoding:   utf-8              stdout.encoding:   UTF-8
device(stderr):    cp850              device(stderr):    UTF-8
stderr.encoding:   utf-8              stderr.encoding:   UTF-8
device(stdin):     cp850              device(stdin):     UTF-8
stdin.encoding:    utf-8              stdin.encoding:    UTF-8
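
A report like the one above can be approximated with a few standard-library calls. This is a rough sketch and not the contents of the linked gist: the function name `describe_encodings` and the row labels are assumptions chosen to mirror the table, and values may be `None` when a stream is not attached to a terminal.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the locale, device, and stream encodings into a dict."""
    info = {'locale(False)': locale.getpreferredencoding(False)}
    for name in ('stdout', 'stderr', 'stdin'):
        stream = getattr(sys, name)
        try:
            # os.device_encoding returns None unless the fd is a terminal.
            device = os.device_encoding(stream.fileno())
        except (AttributeError, OSError, ValueError):
            # The stream may have been replaced by an object without a real fd.
            device = None
        info['device(%s)' % name] = device
        info['%s.encoding' % name] = getattr(stream, 'encoding', None)
    return info


info = describe_encodings()
for label, value in info.items():
    print('{:<18} {}'.format(label + ':', value))
```

Comparing this output between two environments shows which encoding a text-mode `open()` would pick up by default in each.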

I've checked clang/llvm/lld/lldb; this seems to be the only place where raw bytes are encoded in Python.

aganea retitled this revision from Fix llvm-objcopy/ELF/preserve-segment-contents test under Python 3.6 to Fix llvm-objcopy/ELF/preserve-segment-contents test on UTF-8 locale. Apr 26 2019, 6:05 AM
This revision was automatically updated to reflect the committed changes.