This is an archive of the discontinued LLVM Phabricator instance.

Differential D37946

[lit] Fix some Python 3 compatibility issues.
Accepted · Public

Authored by zturner on Sep 16 2017, 10:29 AM.

Details

Reviewers

dlj
rnk
modocache

Summary

It seems we already have bots running lit under Python 3, so as I'm making more changes, I need to test under Python 3. Unfortunately, there are some Windows differences that this bot doesn't exercise, and so I can't verify that my changes work without getting the suite Python 3 clean on Windows.

There were two main problems, both having to do with str / bytes compatibility.

On Windows we sometimes re-open stdout as binary instead of text. But this means we have to write bytes instead of str in Py3.

We were using str.decode('string_escape'), but a) string_escape doesn't even exist in Py3 (you have to use unicode_escape), and b) you can't decode a str in Py3, you can only decode a bytes.
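Both problems can be reproduced in isolation under Python 3. The snippet below is only an illustrative sketch, not code from lit: `io.BytesIO` stands in for a stdout that was re-opened in binary mode, and the variable names are made up for the demonstration.

```python
import io

# Problem 1: a stream opened in binary mode accepts bytes, not str.
# io.BytesIO is a stand-in for stdout re-opened with mode + 'b'.
binary_stdout = io.BytesIO()
try:
    binary_stdout.write("hello\n")      # str is rejected on Python 3
    wrote_str = True
except TypeError:
    wrote_str = False
binary_stdout.write("hello\n".encode("utf-8"))  # bytes are accepted

# Problem 2a: Python 3 str objects have no decode() method.
str_has_decode = hasattr("a\\nb", "decode")

# Problem 2b: the 'string_escape' codec does not exist on Python 3.
try:
    b"a\\nb".decode("string_escape")
    string_escape_exists = True
except LookupError:
    string_escape_exists = False

print(wrote_str, str_has_decode, string_escape_exists)
```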

The patch here solves this by:

a) Providing a wrapper that takes a str and encodes it as UTF-8 bytes if and only if we re-opened stdout in binary mode.
b) Making the unescaper normalize to bytes up front, then decode with either string_escape or unicode_escape depending on the Python version. The result is always a str, which can then be passed to the wrapper in a).
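The two parts described in a) and b) can be sketched roughly as follows. This is a simplified approximation rather than the patch itself: `to_bytes` is a hypothetical stand-in for `lit.util.to_bytes`, and `make_encoder` and `unescape` are names invented for this example.

```python
import sys


def to_bytes(s):
    # Hypothetical stand-in for lit.util.to_bytes: pass bytes through,
    # encode str as UTF-8.
    return s if isinstance(s, bytes) else s.encode("utf-8")


def make_encoder(stdout_is_binary):
    # Part a): encode only when stdout was re-opened in binary mode;
    # otherwise hand the str through unchanged.
    return to_bytes if stdout_is_binary else (lambda x: x)


def unescape(arg):
    # Part b): normalize to bytes first, then pick the codec by version.
    arg = to_bytes(arg)
    codec = "string_escape" if sys.version_info < (3, 0) else "unicode_escape"
    return arg.decode(codec)


encode = make_encoder(stdout_is_binary=True)
payload = encode(unescape("foo\\tbar"))  # bytes, ready for a binary stream
```

Because `unescape` returns a str on both Python versions, the caller never has to know whether the stream is text or binary; only the encoder chosen up front differs.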

After this patch there's only one test still failing for me under Python 3, but it's not critical to verifying that my future patches don't break anything.

Event Timeline

zturner created this revision. Sep 16 2017, 10:29 AM

Herald added a reviewer: modocache. Sep 16 2017, 10:29 AM

Going to go ahead and submit this since it unblocks me and I tested in both Py2 and Py3. So consider this a request for optional post-commit review :) (although if it somehow breaks something on the bots I'll go ahead and revert).

dlj added inline comments. Sep 18 2017, 4:17 PM

llvm/utils/lit/lit/TestRunner.py
286	So I have to admit that here, I'm not sure exactly what corner cases are problematic... but would lit.util.to_string give you the correct result?

zturner added inline comments. Sep 18 2017, 7:34 PM

llvm/utils/lit/lit/TestRunner.py
286	No, because `to_string` literally just converts from bytes to string. `string_escape` is kind of a non-standard codec. It literally just adds or removes backslashes as necessary in order to convert between a Python string literal and the actual value. For example, if you encode `a\nb` using `string_escape`, then you're saying "I have the actual string literal `a\nb`, give me something that I could use to write that string literal in Python." So it returns `a\\nb`. On the other hand, if you decode `a\nb` using the `string_escape` codec, you're telling it that the string is already escaped, and you want to know what it would look like if printed. So it returns `a<newline>b`. We need this logic here because we're literally parsing a command line, and we're mimicking the shell's escaping rules. This way a person can write `echo "foo\tbar"` and it will actually print `foo<tab>bar`. (Hopefully I got that explanation right.)
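The encode/decode distinction in the comment above can be tried out directly. The sketch below assumes Python 3, so it uses `unicode_escape` in place of `string_escape`; the variable names are illustrative only.

```python
# Escaped form: four bytes (a, backslash, n, b), as typed on a command line.
escaped = b"a\\nb"

# Decoding interprets the backslash sequence and yields the actual value:
# three characters, with a real newline in the middle.
decoded = escaped.decode("unicode_escape")

# Encoding goes the other way and re-adds the backslash.
reencoded = decoded.encode("unicode_escape")

# The echo case from the comment: a literal backslash-t becomes a tab.
tabbed = b"foo\\tbar".decode("unicode_escape")

print(repr(decoded), repr(reencoded), repr(tabbed))
```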

dlj accepted this revision. Sep 18 2017, 8:17 PM

dlj added inline comments.

llvm/utils/lit/lit/TestRunner.py
286	OK, that makes sense. The to_string conversion doesn't quite do a straightforward conversion, but IIRC returns some sort of a repr-like syntax if the input isn't otherwise-valid UTF-8. So your choice here seems better in this case. I also seem to recall that string_escape only escapes double quotes, not single quotes (since its purpose is to generate legal strings for C). But I don't think that's an issue here.

This revision is now accepted and ready to land. Sep 18 2017, 8:17 PM

jplatte added a subscriber: jplatte. Sep 10 2019, 5:14 AM

Herald added a project: Restricted Project. Sep 10 2019, 5:14 AM

Herald added a subscriber: delcypher.

modocache resigned from this revision. Nov 21 2019, 8:09 PM

Revision Contents

Path	Size
llvm/utils/lit/lit/TestRunner.py	19 lines

Diff 115541

llvm/utils/lit/lit/TestRunner.py

(245 preceding lines not shown)

     stdin, stdout, stderr = processRedirects(cmd, subprocess.PIPE, shenv,
                                              opened_files)
     if stdin != subprocess.PIPE or stderr != subprocess.PIPE:
         raise InternalShellError(
                 cmd, "stdin and stderr redirects not supported for echo")

     # Some tests have un-redirected echo commands to help debug test failures.
     # Buffer our output and return it to the caller.
     is_redirected = True
+    encode = lambda x : x
     if stdout == subprocess.PIPE:
         is_redirected = False
         stdout = StringIO()
     elif kIsWindows:
         # Reopen stdout in binary mode to avoid CRLF translation. The versions
         # of echo we are replacing on Windows all emit plain LF, and the LLVM
         # tests now depend on this.
+        # When we open as binary, however, this also means that we have to write
+        # 'bytes' objects to stdout instead of 'str' objects.
+        encode = lit.util.to_bytes
         stdout = open(stdout.name, stdout.mode + 'b')
         opened_files.append((None, None, stdout, None))

     # Implement echo flags. We only support -e and -n, and not yet in
     # combination. We have to ignore unknown flags, because `echo "-D FOO"`
     # prints the dash.
     args = cmd.args[1:]
     interpret_escapes = False
     write_newline = True
     while len(args) >= 1 and args[0] in ('-e', '-n'):
         flag = args[0]
         args = args[1:]
         if flag == '-e':
             interpret_escapes = True
         elif flag == '-n':
             write_newline = False

     def maybeUnescape(arg):
         if not interpret_escapes:
             return arg
-        # Python string escapes and "echo" escapes are obviously different, but
-        # this should be enough for the LLVM test suite.
-        return arg.decode('string_escape')
+        arg = lit.util.to_bytes(arg)
+        codec = 'string_escape' if sys.version_info < (3,0) else 'unicode_escape'
+        return arg.decode(codec)

     if args:
         for arg in args[:-1]:
-            stdout.write(maybeUnescape(arg))
-            stdout.write(' ')
-        stdout.write(maybeUnescape(args[-1]))
+            stdout.write(encode(maybeUnescape(arg)))
+            stdout.write(encode(' '))
+        stdout.write(encode(maybeUnescape(args[-1])))
     if write_newline:
-        stdout.write('\n')
+        stdout.write(encode('\n'))

     for (name, mode, f, path) in opened_files:
         f.close()

     if not is_redirected:
         return stdout.getvalue()
     return ""

(907 following lines not shown)
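To see how the flag handling and the write path fit together, here is a reduced, self-contained sketch of an echo-like function. It assumes Python 3 only and is not the lit implementation: `builtin_echo` and its `binary` parameter are hypothetical, redirect handling is omitted, and the arguments are joined with `" ".join` instead of the explicit loop used in the diff.

```python
import io


def builtin_echo(args, stream, binary=False):
    """Hypothetical simplified echo supporting leading -e and -n flags."""
    # Encode only when the target stream is binary.
    encode = (lambda s: s.encode("utf-8")) if binary else (lambda s: s)

    interpret_escapes = False
    write_newline = True
    # Consume known leading flags; anything else is printed as-is.
    while args and args[0] in ("-e", "-n"):
        if args[0] == "-e":
            interpret_escapes = True
        else:
            write_newline = False
        args = args[1:]

    def maybe_unescape(arg):
        if not interpret_escapes:
            return arg
        # Python 3 path only: bytes in, str out.
        return arg.encode("utf-8").decode("unicode_escape")

    stream.write(encode(" ".join(maybe_unescape(a) for a in args)))
    if write_newline:
        stream.write(encode("\n"))


text_out = io.StringIO()
builtin_echo(["-e", "foo\\tbar"], text_out)

binary_out = io.BytesIO()
builtin_echo(["-n", "a", "b"], binary_out, binary=True)
```

The same call sites serve both stream types, which is the point of routing every write through a single `encode` callable.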