This is an archive of the discontinued LLVM Phabricator instance.

[lit] Fix some Python 3 compatibility issues.
AcceptedPublic

Authored by zturner on Sep 16 2017, 10:29 AM.

Details

Reviewers
dlj
rnk
modocache
Summary

It seems we already have bots running lit under Python 3, so as I'm making more changes, I need to test under Python 3. Unfortunately, there are some Windows differences that this bot doesn't exercise, and so I can't verify that my changes work without getting the suite Python 3 clean on Windows.

There were two main problems, both having to do with str / bytes compatibility.

  1. On Windows we sometimes re-open stdout as binary instead of text. But this means we have to write bytes instead of str in Py3.
  1. We were using str.decode('string_escape'), but a) string_escape doesn't even exist in Py3 (you have to use unicode_escape), and b) you can't decode a str in Py3, you can only decode a bytes.

The patch here solves this by:

a) providing a wrapper that takes a str and encodes as utf8 bytes if and only if we re-opened stdout in binary mode.
b) Making the unescaper normalize on bytes up front, then decoding as either string_escape or unicode_escape depending on the version. The result is always a str, which can then be passed to the function in a).

After this patch there's only 1 test still failing for me under Python 3, but it's non critical to verifying that my future patches don't break anything.

Diff Detail

Event Timeline

zturner created this revision.Sep 16 2017, 10:29 AM

Going to go ahead and submit this since it unblocks me and I tested in both Py2 and Py3. So consider this a request for optional post-commit review :) (although if it somehow breaks something on the bots I'll go ahead and revert).

dlj added inline comments.Sep 18 2017, 4:17 PM
llvm/utils/lit/lit/TestRunner.py
286

So I have to admit that here, I'm not sure exactly what corner cases are problematic... but would lit.util.to_string give you the correct result?

zturner added inline comments.Sep 18 2017, 7:34 PM
llvm/utils/lit/lit/TestRunner.py
286

No, because to_string literally just converts from bytes to string. string_escape is kind of a non-standard codec. It literally just adds or removes backslashes as necessary in order to convert between a python string literal and the actual value. For example, if you encode a\nb using string_escape, then you're saying "I have the actual string literal a\nb, give me something that I could use to write that string literal in Python. So it returns a\\nb.

On the other hand, if you decode a\nb using the string_escape codec, you're telling it that the string is already escaped, and you want to know what it would look like if printed. So it returns

a
b

We need this logic here because we're literally parsing a command line, and we're mimicing the shell's escaping rules. This way a person can write echo "foo\tbar" and it will actually print foo<tab>bar

(Hopefully I got that explanation right)

dlj accepted this revision.Sep 18 2017, 8:17 PM
dlj added inline comments.
llvm/utils/lit/lit/TestRunner.py
286

OK, that makes sense.

The to_string conversion doesn't quite do a straightforward conversion, but IIRC returns some sort of a repr-like syntax if the input isn't otherwise-valid UTF-8. So your choice here seems better in this case.

I also seem to recall that string_escape only escapes double quotes, not single quotes (since its purpose is to generate legal strings for C). But I don't think that's an issue here.

This revision is now accepted and ready to land.Sep 18 2017, 8:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 10 2019, 5:14 AM
Herald added a subscriber: delcypher. · View Herald Transcript
modocache resigned from this revision.Nov 21 2019, 8:09 PM