This is an archive of the discontinued LLVM Phabricator instance.

RFC [lit] Detect if processes to execute are MSys based, apply custom quoting logic
AbandonedPublic

Authored by mstorsjo on Mar 26 2021, 3:15 AM.

Details

Reviewers
rnk
Summary

There's no single quoting strategy that works for all kinds of inputs
in a consistent way to both regular win32 executables and MSys based
ones. Instead try to detect the target executable kind, and apply the
most appropriate quoting logic.

This is a different alternative to D99330. This requires an extra
python module (from https://github.com/erocarrera/pefile) - I presume
we can't just blindly add such new dependencies, not even if made
conditional to when running on windows. We could bundle it in-tree
though (if that's acceptable license-wise).

I'm not sure if this is the way to go or not, but this contains
examples of what breaks and disqualifies other solution attempts,
and showcases the issue that D99330 works around.

The added testcases fail if executed with msys (e.g. git bash) based
tools today, before fixing the quoting.

With D98859 applied, one can also run the same testcase wrapped in 'not',
with 'not not echo' - with this fix alone the plain 'echo' cases work,
but not when roundtripped via the 'not' tool (which uses
llvm/lib/Support/Windows/Program.inc for quoting arguments, which
uses quoting logic of its own).

Diff Detail

Event Timeline

mstorsjo created this revision.Mar 26 2021, 3:15 AM
mstorsjo requested review of this revision.Mar 26 2021, 3:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2021, 3:15 AM
rnk added inline comments.Apr 5 2021, 1:26 PM
llvm/utils/lit/lit/TestRunner.py
221–226

I want to try and find a way to challenge this finding, it's not consistent with what I found when I looked into this years ago, and, really, I just don't want it to be true, for obvious reasons. :)

What cases require msys globbing? I thought the whole reason that we implemented globbing in the lit shell was because regular win32 tools don't implement globbing.

Is MSys glob support something new, something that came after D26009 (2016)?

232

I think this is the case I encountered when I added this custom quoting logic. The original needquote condition is:
https://github.com/python/cpython/blob/master/Lib/subprocess.py#L568

needquote = (" " in arg) or ("\t" in arg) or not arg

I added a check for ", so that in this case, the input will be "a\"b", and both an msys-no-glob and win32 app will see a"b.

235

Are you sure the msys-noglob case here is correct? That seems wrong...

I have always believed that the correct, safe way to quote anything on Windows is to:

  • surround each argument in double quotes
  • backslash escape every \ and " character

Even though this escapes strictly more than is necessary, you usually get the same results. This is for example the algorithm used to produce clang crash reproducer scripts, which is nice because they usually run in both cmd and bash.

Can we file a bug against msys about this? I feel like the app really should see a"b in this case.

mstorsjo added inline comments.Apr 5 2021, 1:53 PM
llvm/utils/lit/lit/TestRunner.py
221–226

I think you might just have been lucky. Given all the funky char strings used in test command lines, all of them used right now happen to work, so apparently these cases are pretty rare. The one exception seems to be the case in clang-tidy (D99330), where the [ in the argument later breaks \ - and that's just luckily/accidentally masked by being passed via not.exe (with different quoting rules) right now.

It's not so much that anything _requires_ msys globbing - it's just that certain characters trigger that msys codepath. In traditional win32, a wildcard argument doesn't get expanded by the caller but is supposed to be expanded by the callee, but I guess msys steps in and tries to expand these before the application sees them.

https://github.com/msys2/msys2-runtime/blob/msys2-3_2_0-release/winsup/cygwin/dcrt0.cc#L208 is where this happens; any of the chars ?*[\"\'(){} trigger this codepath, and this codepath seems to be mangling some cases of e.g. single backslashes (see all the cases of quoted arguments below, where some of them lose backslashes; either losing a single backslash entirely, or reducing a series of multiple backslashes into fewer).

232

Yup. And my attempt to fix this by using MSYS=noglob surfaced this issue as it broke again.

235

I guess the big question is "what is correct"; it's clearly not what you'd expect to be interpreted given the regular win32 arg quoting rules, no. But then again, msys executables also live in a world with lots of other wild requirements with how things are supposed to behave when invoked from various contexts, so it might come from handling some other msys specific usecase of expanding arguments in a more unix-shell-y way. So not sure if it's intentionally this way, or just ends up this way while trying to fulfill all other requirements.

Your generic windows quoting algorithm is off a bit, as you're only supposed to backslash escape " and backslashes directly preceding a ", but other backslashes are supposed to stay single.

I guess I could file a bug... One problem is where to aim it, as msys is mainly a friendly fork/repackaging of cygwin. I dunno how much it differs from upstream cygwin (I'll need to get a fresh installation of that.) But I guess I could file a bug there, discuss what's expected and what's not, and if possible continue further upstream.

Secondly, which ones of the behaviours should we report as buggy? I guess we could report all of the combinations that do differ from the regular win32 interpretation. If all of them would end up fixed, we wouldn't need to care about anything else than proper win32 quoting. But even by fixing a couple of them, we could find one setup (either quoting just what we need, or quoting excessively, and with or without setting the noglob setting) where things work consistently enough for us to rely on.

rnk added inline comments.Apr 5 2021, 3:12 PM
llvm/utils/lit/lit/TestRunner.py
235

Well, the generic algorithm is meant to work with a wide variety of Unix-y and Windows-y consumers, so it doubles all backslashes. It sounds like the only cases where it doesn't work are either case 6 or related to glob characters.

As for filing a bug, I'm really focused on case 6: if the command line uses double quotes and globbing is off, MSys should try to strictly follow the unquoting rules. Otherwise, what is the correct way to pass arguments with double quotes to an msys-noglob program?

mstorsjo added inline comments.Apr 5 2021, 11:39 PM
llvm/utils/lit/lit/TestRunner.py
235

No, the generic algorithm (that we have today, inherited from the python implementation) doesn't double all backslashes.

This is the highly non-obvious part about windows argument parsing, please reread this bit slowly.

There's two concepts involved; quoting (to make sure arguments aren't split on spaces), and escaping (to disambiguate quotes/backslashes).

On Windows, when parsing a path, doubled backslashes are *only* narrowed down to a single backslash if they're followed by a quote. I.e. if I pass the argument a\\b, the called app gets a\\b, but if I pass the argument a\\"b, the app gets a\"b. So the traditionally safe approach of "let's escape anything that might be ambiguous" doesn't work here, we can escape (double) backslashes only if we're in that condition.

See the code below here in our python quoting implementation, lines 273, 277 and 283 (in the right hand view after applying this patch); first when we get a backslash we don't know if we can/must double it, so we buffer it (273-274), then when we hit a non-backslash char, we either output the buffered backslashes as-is (283) or doubled (276-277).

Now finally, this handling of backslash escaping is (in win32 apps) the same regardless of whether the current argument is enclosed in double quotes or not. On msys however, it's not; adding the double quotes trigger thinning/unescaping of too many cases of double quotes (case 5 vs 2 above).

Now back why I think I'll report all the cases, the msys implementation really is a minefield of various cases that differ from the win32 reference:

  1. You noticed that a plain a"b, which gets escaped into a\"b, doesn't get parsed back into a"b on msys, so you added " to the list of chars that require quoting. This was case 3 in the list above, and quoting makes it case 6 which works fine, so far. This is one bug in msys's parsing, but one which we could live with due to quoting.
  2. I notice that a[b\c loses the single backslash on msys (case 1). My initial reaction is to add \ to the list of characters that trigger quoting the whole string (which makes it case 4), like you did above. This fixes this particular case, but then breaks case 2 which is turned into case 5 above. (We have a lot of cases where there's a number of backslashes e.g. in regexes, where the arguments aren't quoted right now, and quoting them breaks with msys based tools.)
  3. I consider adding MSYS=noglob, which fixes case 1, but breaks case 6.

I realize now I haven't tested only adding [ to the list of chars that trigger quoting the string. That might work for now (I'll check it in a moment), but the underlying issue still remains; there doesn't seem to be a single way to consistently quote arguments so that they get parsed correctly for both proper win32 apps and msys based ones. I.e. that case would break if you'd have an argument consisting of both [ and series of double backslashes (where they're supposed to be escaped for the regex, but msys unescapes one layer too much).

mstorsjo added inline comments.Apr 6 2021, 3:25 AM
llvm/utils/lit/lit/TestRunner.py
235

Extending the check for chars that require quoting the whole string to also include [ actually fixes the clang-tidy case without breaking anything else. But the root cause still stands; if a string that contains double backslashes (like case 2) ends up quoted (into case 5), the MSYS tool breaks it by unescaping the backslashes, so it's a rather brittle setup.

I filed an initial MSYS bug about these issues at https://github.com/msys2/msys2-runtime/issues/36, cc @mati865, but I guess I'll need to continue upstream to cygwin for any actual fix for it.

I saw https://github.com/msys2/msys2-runtime/issues/36 but I have never worked with MSYS2 runtime internals so you probably know more than I do at this point.
https://github.com/dscho who maintains Git for Windows might have some insights.

FWIW, see https://github.com/msys2/msys2-runtime/issues/36#issuecomment-815692656 - Git for Windows has got a check/heuristic for whether a tool to execute is an msys based one, and use a different quoting algorithm for that case, which supports the notion that arguments need to be quoted differently for msys based tools. We've been lucky so far, and with D99938 it seems that we can postpone that issue a bit further.

There's an unmerged patch for upstream cygwin that is supposed to make the argument parsing standards compliant (I haven't tested it myself yet), but as Johannes pointed out in that msys2-runtime issue, fixing the argument parsing properly breaks things for everybody who so far have had to work around the issue, as they quote things in a nonstandard way for the current msys parser. (That also is an argument against going with this patch - if we can avoid doing standards-incompliant special quoting for msys, we've got an easier situation if they fix their behaviour in the future too.)

rnk added inline comments.Apr 8 2021, 2:17 PM
llvm/utils/lit/lit/TestRunner.py
235

On Windows, when parsing a path, doubled backslashes are *only* narrowed down to a single backslash if they're followed by a quote.

Oh yeah, you're right. This approach doesn't work. I was so confident I had to actually a CreateProcess / argv sample program to convince myself.

No, the generic algorithm (that we have today, inherited from the python implementation) doesn't double all backslashes.

The algorithm I'm describing is what we use to make crash reports, not what's used here in lit:
https://github.com/llvm/llvm-project/blob/4fcb25583c3ccbe10c4367d02086269e5fa0bb87/llvm/lib/Support/Program.cpp#L83

In any case, I only imagined that it worked. >_>

Extending the check for chars that require quoting the whole string to also include [ actually fixes the clang-tidy case without breaking anything else.

I think that's my preference, then, for now.

mstorsjo added inline comments.Apr 8 2021, 2:22 PM
llvm/utils/lit/lit/TestRunner.py
235

Oh, right. That quoting that doubles backslashes might end up working in practice if the backslashes are path separators, and interprets it correctly if the string is passed to a bash shell though.

rnk added inline comments.Apr 13 2021, 1:55 PM
llvm/utils/lit/lit/TestRunner.py
235

So, if we add [ to the needquote condition, does that more or less make things work for now?

mstorsjo added inline comments.Apr 13 2021, 2:04 PM
llvm/utils/lit/lit/TestRunner.py
235

Yes, that fixes it for now (that's D99938). That patch does't make things strictly worse than what we have right now anyway.

In the msys2 bug report there's a potentially useful patch referenced (that hasn't yet been merged), but there's at least some hope of getting it to work reliably at some point in the future (after such a fix is in place, and after all relevant build setups are upgraded far enough).

mstorsjo abandoned this revision.Apr 14 2021, 5:38 AM

Not needed for now, handled by D99938 for now.