This is an archive of the discontinued LLVM Phabricator instance.

[lit] Support running tests on Windows without GnuWin32.
ClosedPublic

Authored by zturner on Jul 22 2020, 7:14 PM.

Details

Summary
Historically, we have told contributors that GnuWin32 is a pre-requisite
because our tests depend on utilities such as sed, grep, diff, and more.
However, Git on Windows includes versions of these utilities in its
installation.  Furthermore, GnuWin32 has not been updated in many years.
For these reasons, it makes sense to have the ability to run llvm tests
in a way that is both:
  a) Easier on the user (less stuff to install)
  b) More up-to-date (The verions that ship with git are at least as
     new, if not newer, than the versions in GnuWin32.

We add support for this here by attempting to detect where Git is
installed using the Windows registry, confirming the existence of
several common Unix tools, and then adding this location to lit's PATH
environment.

Diff Detail

Event Timeline

zturner created this revision.Jul 22 2020, 7:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2020, 7:14 PM
Herald added a subscriber: delcypher. · View Herald Transcript
amccarth requested changes to this revision.Jul 23 2020, 10:52 AM

I'm okay with the idea. I haven't used GnuWin32 in long while since I already have Git's usr/bin in my PATH. See detailed comments inline.

In the big picture, this unspools like this:

Old code:

if lit_tool_path is an absolute directory
    if it contains all of the tools needed
        append lit_tool_path to the PATH  (other tools may exist earlier in the PATH)
    else
        fail
else if the PATH has a directory that has all of the tools needed
    append that directory to the PATH  (but it's already in the PATH!)
else
    fail

New code:

if lit_tool_path is defined
    if lit_tool_path is an absolute directory
        if it contains all of the tools needed
            append lit_tool_path to the PATH  (other tools may exist earlier in the PATH)
        else if Git/usr/bin exists and contains all the tools needed
            append Git/usr/bin to the PATH  (other tools may exist earlier in the PATH, Git/usr/bin may already be in the PATH)
        else
            fail
    else if the PATH has a directory that has all of the tools needed
        append that directory to the PATH  (but it's already in the PATH!)
    else if Git/usr/bin exists and contains all the tools needed
        append Git/usr/bin to the PATH  (other tools may exist earlier in the PATH)
    else
        fail
else if Git/usr/bin exists and contains all the tools needed
    append Git/usr/bin to the PATH  (other tools may exist earlier in the PATH)
else
    fail

I'm not sure the first branch in the new code adds any value.

I have concerns about both the old and the new code:

  1. We often add a redundant directory to the path.
  2. We append to PATH rather than prepend, which is both right and wrong. It's right in that we don't pre-empt other programs, but it's wrong in that we don't guarantee that we get the versions of the tools we want.
  3. We require all of the tools to come from a single directory. If grep.exe exists in one place in the path but diff.exe is in another, it's a failure. Furthermore, given #2, if we find them both in the same directory, that doesn't mean both (or either) will be invoked from the same directory.

These larger concerns are outside the bounds of this patch, so I won't block on those.

llvm/utils/lit/lit/llvm/config.py
23–24

Not part of your change, but "sane" seems unnecessarily judgmental. Perhaps "necessary" conveys the point.

And then maybe make a variable with the list of tools we need so that it's kind of documented in one place. Something like:

tools_needed = ['cmp.exe', 'grep.exe', 'sed.exe', 'diff.exe', 'echo.exe']

And then we can pass that in to the couple of places we need it.

23–24

BTW, this looks redundant, as it's set below in a cascaded if starting about line 53.

30

Extending the list of tools to check for is great. This would be the first place to use tools_needed as I mentioned in the previous comment.

32

You could also pass tools_needed here.

129

I wonder if it's necessary to search both WOW64 hives explicitly.

  • If lit is a 64-bit process (on 64-bit Windows), then a regular search of HKCU or HKLM would find Git, regardless Git's bitness--it wouldn't have to make two distinct searches. (And, if both Gits were installed, you'd get the 64-bit one with a regular search, but the 32-bit one with this explicit technique.)
  • If lit is a 32-bit process on 32-bit Windows, you're screwed because the WOW nodes won't exist. Unless the lookup ignores those bits...?
  • If lit is a 32-bit process on 64-bit Windows, a regular search wouldn't find a 64-bit Git installation. That's the only case this solves.

Registry searches can take a long time. So let's say we want to optimize for the case where Git is installed; if it isn't installed, we're going to fail anyway because we only got here because the tools don't exist elsewhere.

So I think I'd use masks = [0, winreg.KEY_WOW64_64]. In the common case, we'll find Git on the first try, regardless of its bitness. In the less-common case, we'll make a last ditch attempt which may end up repeating part of a search we've already done.

130

I'd probably switch these to check HKCU first, as that's the one that would usually take precedence, and it's also the one the user has easier control over (if they decide to install Git).

134

Is the inner try-block necessary? If the query below fails without the inner try block, wouldn't the exception just propagate to the outer except: continue?

145

There's already a function that does this part in util.py. I think all you need to do is:

if not lit.util.checkToolsPath(candidate_path, tools_needed):
    continue

(assuming tools_needed was passed in to this function as I suggested earlier).

This revision now requires changes to proceed.Jul 23 2020, 10:52 AM
akhuang added inline comments.Jul 23 2020, 10:55 AM
llvm/utils/lit/lit/llvm/config.py
126

It seems import winreg doesn't work in python 2.7

zturner updated this revision to Diff 282638.Aug 3 2020, 8:49 AM
zturner marked 7 inline comments as done.Aug 3 2020, 8:53 AM

I'm not sure the first branch in the new code adds any value.

The first branch is important because lit_tools_path is up to the individual local config to define. In other words, if the user does not specifically write config.lit_tools_path = Foo/Bar then this will be undefined and they'll get strange errors. In all LLVM cases, the configs do this, and this knowledge has been propagated across all LLVM lit configs by way of copy/paste. But it's a silent / undocumented requirement that caused me some confusion until I tracked it down. So it seems better to be resilient in the case of it not being there.

We often add a redundant directory to the path.

Internally, I believe the method to add a new directory to path will de-duplicate if it's already there.

We append to PATH rather than prepend, which is both right and wrong. It's right in that we don't pre-empt other programs, but it's wrong in that we don't guarantee that we get the versions of the tools we want.

Despite being called append, it actually *does* prepend. I agree this is confusing.

We require all of the tools to come from a single directory. If grep.exe exists in one place in the path but diff.exe is in another, it's a failure. Furthermore, given #2, if we find them both in the same directory, that doesn't mean both (or either) will be invoked from the same directory.

This part is definitely true.

llvm/utils/lit/lit/llvm/config.py
126

It's apparently called _winreg in Python 2.7. I've fixed this.

130

I've left this as is. On my local installation, which I installed using all defaults and without explicitly trying to elevate when double clicking the installer, It's under HKLM.

I don't have comments on the patch itself, but I would like to say that as someone who develops primarily on Windows: thank you, I really appreciate this effort to get away from GnuWin32!

amccarth accepted this revision.Aug 3 2020, 10:23 AM

I have a couple responses, but nothing worth blocking this path for. LGTM, and thanks for doing this!

I'm already working on an update to the LLVM/Windows developer docs to pull the GnuWin requirement along with some other bits.

I'm not sure the first branch in the new code adds any value.

The first branch is important because lit_tools_path is up to the individual local config to define. In other words, if the user does not specifically write config.lit_tools_path = Foo/Bar then this will be undefined and they'll get strange errors.

Ah, I didn't realize that was a bug in the old code.

If there is no config.lit_tools_dir (or if it isn't a directory that contains the tools), then the new code is going to skip checking PATH and just go searching for the Git tools. I guess that's fine, but it might be surprising to someone who still has GnuWin in their path.

llvm/utils/lit/lit/llvm/config.py
130

tl;dr: Not a big deal, since this would be rare and unusual.

In the rare case that one version is installed for all users (HKLM), but another version is installed for this user (HKCU), then ideally the HKCU version would be selected. Searching HKLM won't.

(If you managed to install to HKLM, then the installer process was running as Administrator. Maybe you kicked it off from an already-elevated command console.)

This revision is now accepted and ready to land.Aug 3 2020, 10:23 AM

I went ahead and committed this. Let's see if I break anything, it's been a long time since I've committed a patch and I haven't committed anything since the git switchover :D

llvm/utils/lit/lit/llvm/config.py
130

Ok, you convinced me. I switched the order to check HKCU first.

aganea added a subscriber: aganea.Sep 22 2020, 3:31 PM

I went ahead and committed this. Let's see if I break anything, it's been a long time since I've committed a patch and I haven't committed anything since the git switchover :D

Hey @zturner! I can't find a trace of this patch upstream, has it landed?

That's.... Pretty strange. I was pretty sure I landed it. Feel free to
do this on my behalf. Sorry about that.

This revision was automatically updated to reflect the committed changes.
aganea added a comment.EditedSep 24 2020, 10:01 AM

@zturner I've commited it. It works for most projects (without git or GnuWin32 in the path), although not compiler-rt, that would require some more changes. It seems their setup is different and it doesn't reach the code you've modified above (probably because they don't call lit.llvm.initialize among other things).

rnk added a subscriber: rnk.Oct 1 2020, 11:27 AM

This ended up breaking my windows buildbot:
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/18832

It took me a long time to understand the error messages, but what's going on is that C:\Windows\system32 is earlier in the PATH than git's usr/bin coreutils. lit doesn't do any favors, it doesn't print the PATH, so you get an error message like this:

$ "find" "c:\b\slave\clang-x64-windows-msvc\build\stage1\tools\clang\test\VFS\Output\module_missing_vfs.m.tmp/mcp" "-name" "A-*.pcm"
# command stderr:
File not found - A-*.pcm

If you are really clever, maybe you will realize that this error message comes from Windows find, not coreutils find.

On the bot, it appears to me in regedit that HKEY_LOCAL_MACHINE\SOFTWARE\GitForWindows is present, but the loop doesn't find anything. Any tips for helping it find it?

In D84380#2307030, @rnk wrote:

Figured it out, Python 2 vs 3 bug:
rGd12ae042e17b27ebc8d2b5ae3d8dd5f88384d093

Nice catch. I suspect this was part of the problem we were having setting up the new Win32 bot as well. And it would explain why I couldn't repro it: I don't have Python 2.x installed.

Since this was committed, all of our internal bots (but not the external windows bot!) are hanging when the tests run. I've looked into it some and I noticed a couple of things:

  1. We are *pre*pending to the path, so our existing working setup with gnuwin32 is ignored. I think it would be better to add the paths at the end, so that what already works is not broken.
  2. All of the tools that are found along with git (tar.exe, echo.exe, etc.) never return when called in this setup. I am not sure why this is since we have a fairly new version of git and the tools, so they should work correctly, but they don't. This may be a simple update requirement on our part, but if that's the case, the updated documentation should specify what version of git and the tools is required.
llvm/utils/lit/lit/llvm/config.py
24

Why did you drop this? My understanding is that this is needed in the tests to determine when to run/not run some of them based on the system.

aganea added a comment.Oct 2 2020, 7:39 PM
  1. We are *pre*pending to the path, so our existing working setup with gnuwin32 is ignored. I think it would be better to add the paths at the end, so that what already works is not broken.

It appears the behavior before was to preprend the tools' found path, to the application's %PATH%, to rightly use C:\Program Files\Git\usr\bin\find.exe but not C:\WINDOWS\system32\find.EXE which would appear most certainly first in %PATH%. I think the important change in behavior after this patch is that now we check if lit_tools_dir: (L28) which is empty, and most likely always use the git registry detection. Maybe if L28 wasn't there, things would work correctly if you already had the required tools in %PATH%? Would you possibly try that please?

  1. All of the tools that are found along with git (tar.exe, echo.exe, etc.) never return when called in this setup. I am not sure why this is since we have a fairly new version of git and the tools, so they should work correctly, but they don't. This may be a simple update requirement on our part, but if that's the case, the updated documentation should specify what version of git and the tools is required.

Would it possible to share a bit more about your environement? Which python version? Which git version? Can you share the contents of PATH?
When the tests become stuck on your machine, can you look at the threads with Process Explorer or with a debugger?
Does this happen randomly, or all the time?

llvm/utils/lit/lit/llvm/config.py
24

It was already there, please see L53.

In D84380#2309853, @aganea wrote: Maybe if L28 wasn't there, things would work correctly if you already had the required tools in %PATH%? Would you possibly try that please?

I will try it. I'll let you know if it makes a difference.

Would it possible to share a bit more about your environement? Which python version? Which git version? Can you share the contents of PATH?

It is *mostly* the same as the public buildbot. The only difference is that internally we use VS to build and the buildbot uses ninja. I am not sure why that would cause this though.

Python: 3.6.6
Git: 2.25.0.windows.1
PATH: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\Scripts\;C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\;C:\Windows\system32;C:\Windows;C:\Program Files (x86)\swigwin-3.0.12;C:\Program Files (x86)\GetGnuWin32\gnuwin32\bin;c:\Python27\Scripts\;C:\Program Files\Git\cmd;C:\Program Files\CMake\bin;

When the tests become stuck on your machine, can you look at the threads with Process Explorer or with a debugger?
Does this happen randomly, or all the time?

The tests get stuck every time on all three of the internal bots that we have when looking for tar.exe inside lld/test/lit.cfg.py (see line 104). When I look at the current processes, python.exe has tar.exe as a child and tar.exe (from Git\usr\bin) is not exiting. If I manually kill tar.exe, the tests resume but then every test that uses a tool from Git\usr\bin times out when it reaches the tool. That's as far as I've had time to debug this so far.

llvm/utils/lit/lit/llvm/config.py
24

I didn't think sys.platform == 'win32' is equivalent to platform.system() == 'Windows'. Is it?

In D84380#2309853, @aganea wrote: Maybe if L28 wasn't there, things would work correctly if you already had the required tools in %PATH%? Would you possibly try that please?

I will try it. I'll let you know if it makes a difference.

Yes, that works. We don't end up prepending the lit path and the tests work as before.

zturner added a comment.EditedOct 4 2020, 1:21 PM
In D84380#2309853, @aganea wrote: Maybe if L28 wasn't there, things would work correctly if you already had the required tools in %PATH%? Would you possibly try that please?

I will try it. I'll let you know if it makes a difference.

Yes, that works. We don't end up prepending the lit path and the tests work as before.

For some context, L28 was added because the previous code made the underlying assumption that this variable was defined. Specifically, this:

path = self.lit_config.getToolsPath(lit_tools_dir, config.environment['PATH'], required_tools)

will raise an exception in the case that lit_tools_dir==None. It sounds like you're saying lit_tools_dir = [] in your case. If so, instead of removing this line, can it be changed to if lit_tools_dir != None:?

rnk added a comment.Oct 5 2020, 11:31 AM

Please don't remove the prepending, it is necessary for the gnu tools to beat out C:\Windows\system32. Removing it will break my bot.

At some point, you will need to debug why the MSys git coreutils commands are timing out on your bot, because they are what we recommend. git is obviously required to do LLVM development on Windows, so every developer must already have these tools. We want to standardize on the git tools and remove any mention of gnuwin32 tools from our docs.

In D84380#2309853, @aganea wrote: Maybe if L28 wasn't there, things would work correctly if you already had the required tools in %PATH%? Would you possibly try that please?

I will try it. I'll let you know if it makes a difference.

Yes, that works. We don't end up prepending the lit path and the tests work as before.

For some context, L28 was added because the previous code made the underlying assumption that this variable was defined. Specifically, this:

path = self.lit_config.getToolsPath(lit_tools_dir, config.environment['PATH'], required_tools)

will raise an exception in the case that lit_tools_dir==None. It sounds like you're saying lit_tools_dir = [] in your case. If so, instead of removing this line, can it be changed to if lit_tools_dir != None:?

if lit_tools_dir is not None: works also. @rnk - Does that work on your bot also? It would be ideal to not break either setup until we can figure out why the git tools don't work for everyone.

rnk added a comment.Oct 5 2020, 12:51 PM

if lit_tools_dir is not None: works also. @rnk - Does that work on your bot also? It would be ideal to not break either setup until we can figure out why the git tools don't work for everyone.

I have no idea if lit_tools_dir is defined, but I think the tools are already on PATH, just in the wrong order. If you can make this code check PATH for the tools and then do the preprending, that would work.

hans added a subscriber: hans.Oct 6 2020, 8:54 AM

It seems confusing to me to automagically use tools that are not on PATH (or passed in explicitly as cmake flags or similar).

Can't folks who want to use the git tools put them on their PATH?

After this patch I think it's surprising to users why the git tools are used when e.g. the gnuwin tools are the only ones available on PATH.

It seems confusing to me to automagically use tools that are not on PATH (or passed in explicitly as cmake flags or similar).

Can't folks who want to use the git tools put them on their PATH?

After this patch I think it's surprising to users why the git tools are used when e.g. the gnuwin tools are the only ones available on PATH.

I started D88850 to make the git tools be added to the path only if other tools are not already in the path because they are failing in our setup, so the discussion has moved there. But I agree, letting the users add the tools (of their choosing) to the path themselves seems like the best option.

I'll revert the patch for now.
In the meanwhile, we can still do cmake -DLLVM_LIT_TOOLS_DIR="C:/Program Files/Git/usr/bin" and achieve the same purpose.
There's also an option in the Git installer to add the GNU tools to %PATH% automatically, which works out-of-the-box without this patch, and without GnuWin32:

It seems confusing to me to automagically use tools that are not on PATH (or passed in explicitly as cmake flags or similar).

It's confusing me why builds should be so sensitive to the user's environment. Sometimes what you want or need in your shell is different than what the build needs. We essentially have an high dimensional matrix of build configurations, which is completely unsupportable. This change was a step in the right direction in that it lopped off one of the countless dimensions where things could go wrong by making it easy to use current, supported tools.

aganea added a comment.Oct 6 2020, 1:29 PM

Sometimes what you want or need in your shell is different than what the build needs.

Adrian, are there use cases where the steps in my previous post (https://reviews.llvm.org/D84380#2314959) wouldn't work without this patch, or where auto-detection is preferable? If Git isn't in %PATH%, one could add -DLLVM_LIT_TOOLS_DIR, no? I reckon that's an extra option to provide (as opposed to auto-detection), but that leaves latitude to users?

I think one thing that was missing here is visiblity. If we display the auto-detected path, and tell users by the same occasion that it can be changed through some option, maybe that would be better. One other issue is maybe that auto-detection comes late in the process - when running the tests. Shall we do that in cmake instead? Also, we're mixing a runtime setup (%PATH% or registry detection) and a static setup (LLVM_LIT_TOOLS_DIR).

hans added a comment.Oct 6 2020, 2:56 PM

It seems confusing to me to automagically use tools that are not on PATH (or passed in explicitly as cmake flags or similar).

It's confusing me why builds should be so sensitive to the user's environment. Sometimes what you want or need in your shell is different than what the build needs.

If the build runs in the shell it seems natural to me that it should depend on the shell environment.

We essentially have an high dimensional matrix of build configurations, which is completely unsupportable. This change was a step in the right direction in that it lopped off one of the countless dimensions where things could go wrong by making it easy to use current, supported tools.

To me this seems to have added an extra dimension. Previously, lit would look for the tools on PATH, now it was made to look in two places.

To me this seems to have added an extra dimension. Previously, lit would look for the tools on PATH, now it was made to look in two places.

What I really wanted to do was remove support for using PATH entirely. I was worried about breaking downstream users (and in this case my worry was justified). At the same time, we should not hold the open-source project back because of problems with downstream users' environments. What we should aim for is a completely hermetic environment, which will never be achievable with a dependence on PATH.

By reverting this, we are going to be stuck with a required dependency on a project that has been unmaintained for close to a decade with no path forward. That doesn't seem like the right direction.

aganea added a comment.Oct 6 2020, 5:57 PM

@zturner I didn't mean to shelve this. It seemed that there was no agreement on how things should work, and at the same time I simply wanted to unblock @stella.stamenova. I'll upload a new patch shortly, after fixing the issues.

aganea added a comment.EditedOct 6 2020, 6:07 PM

Also, worth nothing that C:\Program Files\Git\usr\bin is not enough to run the lldb tests, which need make and maybe other things.
I'm not sure if that was part of your original goal, but since clang changes can affect lldb behavior, that means developers would still need some additionnal tools along Git for Windows.

Also, worth nothing that C:\Program Files\Git\usr\bin is not enough to run the lldb tests, which need make and maybe other things.
I'm not sure if that was part of your original goal, but since clang changes can affect lldb behavior, that means developers would still need some additionnal tools along Git for Windows.

One more note: even after installing make with choco install make and adding it to %PATH%, many lldb tests still fail because we call sh F:\llvm-project\build\foo and the backslashes are escaped by sh. We end up with F:lvm-projectuildoo something like this. We could maybe insert cygpath before the calls to sh to fix that in the lldb scripts.

Also, worth nothing that C:\Program Files\Git\usr\bin is not enough to run the lldb tests, which need make and maybe other things.
I'm not sure if that was part of your original goal, but since clang changes can affect lldb behavior, that means developers would still need some additionnal tools along Git for Windows.

One more note: even after installing make with choco install make and adding it to %PATH%, many lldb tests still fail because we call sh F:\llvm-project\build\foo and the backslashes are escaped by sh. We end up with F:lvm-projectuildoo something like this. We could maybe insert cygpath before the calls to sh to fix that in the lldb scripts.

I was under the impression that lit tests used a fake built-in shell on Windows. Under what circumstances are Windows lit tests calling sh.exe?

I ask because I'm still trying to figure out why 899 LLDB tests started failing for me on Friday. (Actually, it was 936 tests, but I fixed 37 of them with a fix to a line that was introduced almost a year ago. I have not idea why the problems just started appearing on the 9th.)

I ask because I'm still trying to figure out why 899 LLDB tests started failing for me on Friday. (Actually, it was 936 tests, but I fixed 37 of them with a fix to a line that was introduced almost a year ago. I have not idea why the problems just started appearing on the 9th.)

@amccarth I am not aware of anything recently that would have caused this, but if you send me some examples, I could have a look and let you know if it rings a bell :)

I ask because I'm still trying to figure out why 899 LLDB tests started failing for me on Friday. (Actually, it was 936 tests, but I fixed 37 of them with a fix to a line that was introduced almost a year ago. I have not idea why the problems just started appearing on the 9th.)

Adrian, can it be rG97e7fbb343e2a4ea913686254105b9965c5468f8 or rG79809f58b02419a5d1bfb6c9a59dbd13cd038c77? What errors do you see? Can you paste the log?

I have no reason to suspect those patches in particular. I did a huge git
bisect, which turned up nothing. Older versions that used to work no
longer worked by the time the bisect worked that far back. That makes me
think it might be incomplete dependency checking: Something goes bad, and,
instead of rebuilding it, the bad result is re-used in each subsequent
build.

The 37 tests that I fixed were due to a lit config template that mangled
one of the zillion Python home environment variables because it failed to
escape the backslashes in a Windows path (which is what made me suspect a
connection to this patch). But I cannot explain why that hasn't been a
problem since last December when the error was committed.

The entire log is over 10,000 lines. In a random sampling, most of the
remaining failures look like FileCheck miscompares, as though the tests are
catching real problems. Other than that, I don't see a pattern. I'll be
working on this today, and, if I'm still stumped, I'll start a separate
thread with examples from the log.