This is an archive of the discontinued LLVM Phabricator instance.

[lit, windows] Fix the search for git tools on Windows to check the path first
AbandonedPublic

Authored by stella.stamenova on Oct 5 2020, 1:27 PM.

Details

Reviewers
rnk
zturner
aganea
Summary

Always check the path first, even if lit_tools_dir is not set or None. This will make sure that existing configurations work and are not overwritten by the git tools. Once we know why git tools don't always work and can be reasonably sure that they will, we can check for git tools first.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2020, 1:27 PM
stella.stamenova requested review of this revision.Oct 5 2020, 1:27 PM
rnk added inline comments.Oct 5 2020, 1:53 PM
llvm/utils/lit/lit/llvm/config.py
26

I thought the implication was that getToolsPath will fail if you pass it None. I thought the suggestion was to make the default value here by [], so it will succeed.

llvm/utils/lit/lit/llvm/config.py
23

@aganea: From the discussion on zturner's change: Do you know if sys.platform == 'win32' is equivalent to platform.system() == 'windows'? If it is, we can clean this up. If it is not, we need to bring back features.add('system-windows'). As far as I can tell, they should be the same, but we are not using them that way.

26

getToolsPath (and other calls in that code path) check for None, so passing None works correctly.

stella.stamenova marked an inline comment as not done.Oct 5 2020, 2:17 PM
aganea added a comment.Oct 5 2020, 2:25 PM

@stella.stamenova Strangely, your initial bug description sounds like a deadlock initiated by tar_version.communicate() in https://github.com/llvm/llvm-project/blob/f7941d98091827b8d0b6fdabb731e38c99f44b13/lld/test/lit.cfg.py#L110
Are the lit tests using python 3.6 and not the 2.7 that you seem to have installed as well? In any case, I wonder if setting stderr=open(os.devnull) at lld/test/lit.cfg.py#L108, would fix the issue? Or perhaps just check the 'Threads' tab in Process Explorer on tar.exe, see why it is blocked.

llvm/utils/lit/lit/llvm/config.py
23

sys.platform == 'win32' should be equivalent to platform.system() == 'windows': https://stackoverflow.com/a/54837707 - were you thinking of an edge case?

stella.stamenova marked an inline comment as done.Oct 5 2020, 3:02 PM
stella.stamenova added inline comments.
llvm/utils/lit/lit/llvm/config.py
23

In my experience, there are frequently edge cases around 32b vs. 64b vs. wow64 and I don't know enough about python's platform and sys.platform to be sure that there isn't one here. Certainly, the code is not written as if the two conditions are equivalent.

I don't know whether Python and our config tools use the terms consistent, but Win32 is the name of the Windows API used on both 32- and 64-bit processors, Win16 was the API used on 16-bit machines, and Win64 is a made-up term that confuses the issues.

I don't know whether Python and our config tools use the terms consistent, but Win32 is the name of the Windows API used on both 32- and 64-bit processors, Win16 was the API used on 16-bit machines, and Win64 is a made-up term that confuses the issues.

Our code uses sys.platform (win32) and platform.system (windows) *both*. My assertion was that they are either the same, or not. If they are the same, we should pick one (in this file, at least) and run with it. If they are different, we should treat them as such and undo the change that to removed calling features.add('system-windows') in one of the cases. I don't know enough about python's APIs to be sure that there isn't an edge case, and I've certainly seen weird edge cases because of naming conventions elsewhere.

zturner added inline comments.Oct 5 2020, 5:18 PM
llvm/utils/lit/lit/llvm/config.py
26

I misspoke. The old code was:

if config.lit_tools_dir:`

which fails if lit_tools_dir is not actually an attribute of config. That was the problem I was trying to solve.

@stella.stamenova Strangely, your initial bug description sounds like a deadlock initiated by tar_version.communicate() in https://github.com/llvm/llvm-project/blob/f7941d98091827b8d0b6fdabb731e38c99f44b13/lld/test/lit.cfg.py#L110
Are the lit tests using python 3.6 and not the 2.7 that you seem to have installed as well? In any case, I wonder if setting stderr=open(os.devnull) at lld/test/lit.cfg.py#L108, would fix the issue? Or perhaps just check the 'Threads' tab in Process Explorer on tar.exe, see why it is blocked.

It does, doesn't it! But it's not. If I manually kill tar.exe, then any of the tests that use the git tools time out and fail as well, because all of the tools hang similarly. I attached a debugger to tar.exe after it hung and it is getting an access violation when trying to call GetCommandLineA, so I ran the tests as a different user and they passed. It looks like the tools that come with git can't run successfully under an account that has restricted permissions while the tools that come with GnuWin32 can (also, any other tools we are using in the tests run just fine under the same restricted account, the git tools are the exception). This makes me question whether using the git tools by default is the right choice - the failure was not obvious and not easy to track down. Granted, it is not one that will likely happen on a developer machine, but anyone who is trying to set up automated testing on Windows might (and likely should) be using a restricted account. If we do decide to proceed with using the git tools by default, this should at least be documented and gnuwin32 should be an alternative. Thoughts?

aganea added inline comments.Oct 6 2020, 12:09 PM
llvm/utils/lit/lit/llvm/config.py
23

By quickly browsing through the Python source code (both 2.7.18 and 3.9.0), it seems sys.platform always returns win32 (see PC/pyconfig.h, #define PLATFORM), although values of win16, dos were historically supported, but they aren't anymore. platform.system() always returns Windows regardless of the CPU architecture or the bitness. We can safely assume they are interchangeable when running on Windows.

@stella.stamenova Strangely, your initial bug description sounds like a deadlock initiated by tar_version.communicate() in https://github.com/llvm/llvm-project/blob/f7941d98091827b8d0b6fdabb731e38c99f44b13/lld/test/lit.cfg.py#L110
Are the lit tests using python 3.6 and not the 2.7 that you seem to have installed as well? In any case, I wonder if setting stderr=open(os.devnull) at lld/test/lit.cfg.py#L108, would fix the issue? Or perhaps just check the 'Threads' tab in Process Explorer on tar.exe, see why it is blocked.

It does, doesn't it! But it's not. If I manually kill tar.exe, then any of the tests that use the git tools time out and fail as well, because all of the tools hang similarly. I attached a debugger to tar.exe after it hung and it is getting an access violation when trying to call GetCommandLineA, so I ran the tests as a different user and they passed. It looks like the tools that come with git can't run successfully under an account that has restricted permissions while the tools that come with GnuWin32 can (also, any other tools we are using in the tests run just fine under the same restricted account, the git tools are the exception). This makes me question whether using the git tools by default is the right choice - the failure was not obvious and not easy to track down. Granted, it is not one that will likely happen on a developer machine, but anyone who is trying to set up automated testing on Windows might (and likely should) be using a restricted account. If we do decide to proceed with using the git tools by default, this should at least be documented and gnuwin32 should be an alternative. Thoughts?

Thanks for investigating! Isn't that related to access rights to the binaries, rather than an issue with the Git tools?

@stella.stamenova Strangely, your initial bug description sounds like a deadlock initiated by tar_version.communicate() in https://github.com/llvm/llvm-project/blob/f7941d98091827b8d0b6fdabb731e38c99f44b13/lld/test/lit.cfg.py#L110
Are the lit tests using python 3.6 and not the 2.7 that you seem to have installed as well? In any case, I wonder if setting stderr=open(os.devnull) at lld/test/lit.cfg.py#L108, would fix the issue? Or perhaps just check the 'Threads' tab in Process Explorer on tar.exe, see why it is blocked.

It does, doesn't it! But it's not. If I manually kill tar.exe, then any of the tests that use the git tools time out and fail as well, because all of the tools hang similarly. I attached a debugger to tar.exe after it hung and it is getting an access violation when trying to call GetCommandLineA, so I ran the tests as a different user and they passed. It looks like the tools that come with git can't run successfully under an account that has restricted permissions while the tools that come with GnuWin32 can (also, any other tools we are using in the tests run just fine under the same restricted account, the git tools are the exception). This makes me question whether using the git tools by default is the right choice - the failure was not obvious and not easy to track down. Granted, it is not one that will likely happen on a developer machine, but anyone who is trying to set up automated testing on Windows might (and likely should) be using a restricted account. If we do decide to proceed with using the git tools by default, this should at least be documented and gnuwin32 should be an alternative. Thoughts?

Thanks for investigating! Isn't that related to access rights to the binaries, rather than an issue with the Git tools?

I haven't had time to dig deeper into this, but I don't think it's an issue of access *to* the binaries. The same process that runs the tests, runs git before that and it succeeds. I suppose the git installation could set different permissions on the different paths in the install, but I checked them and they appear to be uniform.

@stella.stamenova Strangely, your initial bug description sounds like a deadlock initiated by tar_version.communicate() in https://github.com/llvm/llvm-project/blob/f7941d98091827b8d0b6fdabb731e38c99f44b13/lld/test/lit.cfg.py#L110
Are the lit tests using python 3.6 and not the 2.7 that you seem to have installed as well? In any case, I wonder if setting stderr=open(os.devnull) at lld/test/lit.cfg.py#L108, would fix the issue? Or perhaps just check the 'Threads' tab in Process Explorer on tar.exe, see why it is blocked.

It does, doesn't it! But it's not. If I manually kill tar.exe, then any of the tests that use the git tools time out and fail as well, because all of the tools hang similarly. I attached a debugger to tar.exe after it hung and it is getting an access violation when trying to call GetCommandLineA, so I ran the tests as a different user and they passed. It looks like the tools that come with git can't run successfully under an account that has restricted permissions while the tools that come with GnuWin32 can (also, any other tools we are using in the tests run just fine under the same restricted account, the git tools are the exception). This makes me question whether using the git tools by default is the right choice - the failure was not obvious and not easy to track down. Granted, it is not one that will likely happen on a developer machine, but anyone who is trying to set up automated testing on Windows might (and likely should) be using a restricted account. If we do decide to proceed with using the git tools by default, this should at least be documented and gnuwin32 should be an alternative. Thoughts?

Thanks for investigating! Isn't that related to access rights to the binaries, rather than an issue with the Git tools?

I haven't had time to dig deeper into this, but I don't think it's an issue of access *to* the binaries. The same process that runs the tests, runs git before that and it succeeds. I suppose the git installation could set different permissions on the different paths in the install, but I checked them and they appear to be uniform.

Is it worth filing a bug against upstream Git For Windows project about this? Especially if you have an easily reproducible case.

stella.stamenova abandoned this revision.Oct 14 2020, 11:18 AM
aganea added a comment.EditedDec 10 2020, 6:44 PM

I've re-landed @zturner's patch and included this fix into rG0f1f13fcb17fbc8c93d505da989a04ab5cbd9ed3.