This is an archive of the discontinued LLVM Phabricator instance.

[lit] Only return a found bash executable on Windows if it can understand Windows paths
ClosedPublic

Authored by gbedwell on Oct 3 2018, 8:16 AM.

Details

Summary

I've been putting up with a couple of failing tests locally for a little while now and finally made the time to investigate the problem:

Failing Tests (2):
    lit :: shtest-format.py
    lit :: shtest-run-at-line.py

This seemed to be happening on my PC only. Digging deeper, I found this error nested deep in the logs:

******************** TEST 'shtest-format :: external_shell/fail.txt' FAILED ********************
Script:
--
: 'RUN: at line 3';   echo "line 1: failed test output on stdout"
: 'RUN: at line 4';   echo "line 2: failed test output on stdout"
: 'RUN: at line 5';   cat "does-not-exist"
--
Exit Code: 127

Command Output (stderr):
--
/bin/bash: E:workupstream-llvmbuild-vs2015-native-ninjautilslittestsInputsshtest-formatexternal_shellOutputfail.txt.script: No such file or directory

--

It turns out that the problem was that WSL installation had placed bash.exe in C:\windows\system32 which was now used in preference to my other version of bash ("C:\Program Files\Git\usr\bin\bash.exe"). The primary difference being that unlike git's bash.exe, WSL's bash.exe can't cope with being invoked with

bash.exe c:\\foo\\script.sh

and would instead need to be invoked with:

bash.exe /mnt/c/foo/script.sh

I've worked around the problem by just changing the order of paths in my environment variable so that git bash gets preference, but to save someone else running into the same thing, here's an attempt at fixing it. It's not the most elegant solution, but seems the least intrusive in terms of changing current behaviour.

An obvious question is whether we should ever be using bash on Windows, considering everything seems to pass without it anyway. Is there a specific use-case for that combination or could we just force it to use 'cmd /c' in TestRunner::executeScript when on Windows for all cases nowadays?

Diff Detail

Repository
rL LLVM

Event Timeline

gbedwell created this revision.Oct 3 2018, 8:16 AM

I personally think we should never use bash on Windows (wsl being the
exception but that won’t identify as Windows anyway). There’s value in
consistency and in my ideal world the lit shell would be feature rich
enough that we wouldn’t have to use bash *anywhere*. In any case right now
the configuration matrix is Windows (lit shell) and non Windows (bash). I
don’t think we should grow this by supporting Windows (bash)

rnk accepted this revision.Oct 3 2018, 2:48 PM

I see, these are the two lit tests that force usage of the external shell. Eventually we should kill off external shell support on Windows and delete the tests and this code, but for now, this workaround seems fine if it fixes the existing tests for you.

This revision is now accepted and ready to land.Oct 3 2018, 2:48 PM
gbedwell updated this revision to Diff 168190.Oct 3 2018, 3:40 PM

Thanks both for reviewing. Given that it appears that there doesn't, as I'd worried, appear to be some special hidden use-case for bash as an external shell on Windows, but is more of a historical artifact, I'm going to propose an alternative approach. To keep the code all in the same review, and as it's in a completely separate file, I've just added it to this review in the same patch instead. To be clear, I'm talking about committing only one of these files, not both.

On balance, I prefer the approach of removing the code path for bash on Windows (i.e. the TestRunner.py change), which reduces our matrix to essentially what @zturner said above and generally seems a bit cleaner.

gbedwell requested review of this revision.Oct 4 2018, 3:59 AM
rnk added a comment.Oct 4 2018, 1:43 PM

Running cmd.exe is probably even less supported than running bash. If someone is accidentally using execute_external=True, I'd prefer it if they kept using bash and not cmd. So, I'd stick with the first patch.

There are only 111 instances of REQUIRES.*shell in the LLVM monorepo today. We should push all the remaining ones to invoke sh -c ... directly for the shell they need, and make the internal lit shell the default lit shell implementation everywhere. It'll simplify lit code and make tests more portable.

To be clear, I never suggested we should treat cmd.exe as a valid choice of
external shell. What I meant was that the two supported configurations
should be:

  1. (non-Windows or WSL) && bashExists: use external shell
  2. everything else: use internal shell

Note that #2 definitely is supported from inside cmd.exe

In D52831#1255765, @rnk wrote:

Running cmd.exe is probably even less supported than running bash. If someone is accidentally using execute_external=True, I'd prefer it if they kept using bash and not cmd. So, I'd stick with the first patch.

I'm fine either way, but my curiosity has gotten the better of me now :)

I only happen to have a usable bash installation on my Windows PC because I have git for windows installed and on my path. Someone who does everything via SVN on Windows likely won't have bash. https://llvm.org/docs/GettingStartedVS.html#software only lists Python and Gnuwin32, which doesn't appear to include a shell, so there's no hard requirement to have bash listed anywhere.

If cmd really is a second class citizen, my concern is that the most likely deciding factor is going to be whether you happen to have git installed or not, hence my preference for the approach that essentially simplifies things to:

execute_external && isWindows -> 'cmd.exe'
execute_external && !isWindows -> getBashPath() or '/bin/sh'

so at least it's less reliant on external factors such as my preferred SCM system.

We used to warn about not finding bash which was removed because it was causing the Windows bots to always show up yellow (see rL228221). If the case of accidentally using execute_external=True is a concern I wonder if there's a benefit to resurrecting the warning only in the case of

execute_external && not litConfig.getBashPath()

as long as it could be done in such a way that those two above tests which explicitly force the external shell wouldn't cause the warning to be triggered every time and thus reintroduce the noise on the bots (I have to confess I've not yet looked to see how this would work).

We should push all the remaining ones to invoke sh -c ... directly for the shell they need, and make the internal lit shell the default lit shell implementation everywhere. It'll simplify lit code and make tests more portable.

Agreed. This is definitely the way forward. I wish I could find enough time to work on this :-\

execute_external && isWindows -> 'cmd.exe'

My suggestion is to make this case assert. It should not ever happen. execute_external should imply not-windows

rnk added a comment.Oct 4 2018, 4:20 PM

@gbedwell I think that, if on Windows, execute_external happens to be True and bash is available, it is more likely that existing lit tests (assuming LLVM is representative) will pass with some random version of bash we find on PATH than with cmd.exe. They are pretty different shells, after all. So, I'm hesitant to prefer cmd over bash if we can find it.

It's crummy that it's sensitive to the environment, of course. Maybe by forcing lit to use cmd.exe and letting the tests fail, more users will realize that they should change their configuration to use execute_external=False.

I guess I don't have a strong opinion one way or the other.

execute_external && isWindows -> 'cmd.exe'

My suggestion is to make this case assert. It should not ever happen. execute_external should imply not-windows

That'd be nice, but we'd have to disable or delete the tests that explicitly exercise execute_external on Windows.

gbedwell updated this revision to Diff 169011.Oct 10 2018, 7:45 AM

Another iteration :)

This is the original solution, but with added warnings, essentially reverting the general behaviour of rL228221 - "Don't warn or note if bash is missing".

Now that we're using the internal shell by default on Windows, the only place the warnings turn up is in log output for those two lit tests that specifically use the external shell, so this shouldn't be a regression in behaviour. If someone is enabling the external shell more widely and doesn't have bash available, it sounds like they'd want to know about it. I've also verified that I don't see any differences in lit output with Ubuntu 18.04.

rnk accepted this revision.Oct 11 2018, 3:37 PM

Sounds good.

This revision is now accepted and ready to land.Oct 11 2018, 3:37 PM

This is not just causing benign test failures, it also causes the entire test suite to complete not run on Python 3 due to some bytes / string issues in the !IsWin32CMDExe codepath. Can we get this submitted?

This revision was automatically updated to reflect the committed changes.