Differential D56754
Add Support for Creating and Deleting Unicode Files and Directories in Lit
Authored by jmittert on Jan 15 2019, 4:03 PM.
Details
This enables lit to work with Unicode file names via mkdir, rm, and redirection. Lit still uses UTF-8 internally, but converts to UTF-16 on Windows, or just UTF-8 bytes on everything else.
Taking my best guess at a reviewer.
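The conversion rule in the summary can be sketched roughly as follows. This is a minimal Python 3 illustration, not the code from the patch: the helper name `to_fs_path` and its `platform` parameter are hypothetical, and the sketch assumes that handing Python a text string on Windows is what routes the call to the wide-character (UTF-16) file APIs.

```python
import sys


def to_fs_path(path, platform=sys.platform):
    """Hypothetical helper (not from the patch): pick a per-platform path type.

    Input is either text or UTF-8 encoded bytes, matching the summary's
    statement that lit keeps UTF-8 internally.
    """
    # Normalize to text first so both input kinds are handled the same way.
    text = path.decode("utf-8") if isinstance(path, bytes) else path
    if platform == "win32":
        # Assumption: a text string lets Python use the wide-character
        # (UTF-16) Windows file APIs.
        return text
    # Everywhere else, hand the OS the raw UTF-8 bytes.
    return text.encode("utf-8")


print(repr(to_fs_path("中文", platform="linux")))
print(repr(to_fs_path(b"\xe4\xb8\xad\xe6\x96\x87", platform="win32")))
```

The result could then be passed to calls such as `os.mkdir` or `open`, which accept both text and bytes paths.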
Event Timeline
Comment Actions Could you upload this with context? The easiest way is to use arcanist (https://llvm.org/docs/Phabricator.html). Otherwise, use -U9999 when generating the diff to upload to Phabricator.
Comment Actions Ugh, I had broken something somewhere else which caused the tests look like I've now properly fixed the binary mode.
Comment Actions @jmittert I tried the following (simpler) patch on Linux and it seems to work nicely for both Python 2 and Python 3:

Index: lit/TestRunner.py
===================================================================
--- lit/TestRunner.py (revision 353501)
+++ lit/TestRunner.py (working copy)
@@ -345,7 +345,7 @@
         exitCode = 0
         for dir in args:
             if not os.path.isabs(dir):
-                dir = os.path.realpath(os.path.join(cmd_shenv.cwd, dir))
+                dir = os.path.realpath(to_bytes(os.path.join(cmd_shenv.cwd, dir)))
             if parent:
                 lit.util.mkdir_p(dir)
             else:
@@ -599,7 +599,7 @@
         exitCode = 0
         for path in args:
             if not os.path.isabs(path):
-                path = os.path.realpath(os.path.join(cmd_shenv.cwd, path))
+                path = os.path.realpath(to_bytes(os.path.join(cmd_shenv.cwd, path)))
             if force and not os.path.exists(path):
                 continue
             try:
@@ -695,7 +695,7 @@
             else:
                 # Make sure relative paths are relative to the cwd.
                 redir_filename = os.path.join(cmd_shenv.cwd, name)
-            fd = open(redir_filename, mode)
+            fd = open(to_bytes(redir_filename), mode)
             # Workaround a Win32 and/or subprocess bug when appending.
             #
             # FIXME: Actually, this is probably an instance of PR6753.

What do you think of this patch?
Comment Actions
This doesn't work on Windows with Python 2 because to_bytes doesn't convert the bytes to UTF-16. It will work on Python 3 on Windows because py3 strings are already Unicode-aware. Running with Python 2 creates the garbled ä¸æ–‡ directory on Windows because it tries to interpret the UTF-8 as UTF-16. Running with Python 3 properly produces the 中文 directory.

For example, adding a quick test with

# RUN: mkdir -p c:/Users/jmittertreiner/Output/中文

produces

S:\build\Ninja-DebugAssert\llbuild-windows-amd64> dir C:\Users\jmittertreiner\Output\

    Directory: C:\Users\jmittertreiner\Output

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         2/8/2019   9:48 AM                ä¸æ–‡   <-- Running with Python 2
d-----         2/8/2019   9:49 AM                中文     <-- Running with Python 3
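The garbled name in the listing can be reproduced with a short Python 3 sketch. One caveat: the characters shown match the UTF-8 bytes of 中文 read one byte at a time in the Windows-1252 code page. That code page is an assumption made here to reproduce the listing; the thread itself does not name it.

```python
# -*- coding: utf-8 -*-
# Illustration only: reproduce the garbled directory name from the listing.
name = "中文"

# The six UTF-8 bytes that a bytes-based path would carry.
utf8_bytes = name.encode("utf-8")

# Assumption: the bytes are read as single-byte Windows-1252 characters.
garbled = utf8_bytes.decode("cp1252")

# Byte 0xAD maps to U+00AD (soft hyphen), which is normally invisible,
# so drop it to match what the directory listing displays.
visible = garbled.replace("\u00ad", "")

print(utf8_bytes)
print(visible)
```

Decoding the same bytes as UTF-8 instead returns the original 中文, which is the behaviour the Python 3 run shows.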
Comment Actions Rather than using the ambiguous (and not particularly safe)
Comment Actions @jmittert sorry for the long delay, but I'm finally fine with this patch now. I like how it explicitly emphasizes the Windows/Linux difference. However, the patch needs to be rebased against master; can you update it?
That's nice, but now you need to use the b prefix for all strings, and for decoding of the joined command, right?