This is an archive of the discontinued LLVM Phabricator instance.

[lit] Attempt for fix tests failing because of 'warning: non-portable path to file'
ClosedPublic

Authored by krisb on May 24 2021, 5:46 AM.

Details

Summary

This is an attempt to fix clang test failures due to 'nonportable-include-path'
warnings on Windows when a path to llvm-project's base directory contains some
uppercase letters.

For example, for 'clang/test/PCH/enum.c' test running from 'J:\MyDir\llvm-project'
lit generates the following command for the first RUN line:

"j:\mydir\build\bin\clang.exe" "-cc1" "-internal-isystem" "j:\mydir\build\lib\clang\1.12.0\include" "-nostdsysteminc" "-include" "j:\mydir\llvm-project\clang\test\PCH/enum.h" "-fsyntax-only" "j:\mydir\llvm-project\clang\test\PCH\enum.c"

which produces the following warning causing the test to fail:

<built-in>:1:10: warning: non-portable path to file '"J:\MyDir\llvm-project\clang\test\PCH/enum.h"'; specified path differs in case from file name on disk [-Wnonportable-include-path]
#include "j:\mydir\llvm-project\clang\test\PCH/enum.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         "J:\MyDir\llvm-project\clang\test\PCH/enum.h"
1 warning generated.

The problem caused by two issues:

  • discovery.py loads site config in lower case causing all the paths

based on file and requested within the config file to be in lowercase as well,

  • neither os.path.abspath() nor os.path.realpath() (both used to obtain paths of

config files, sources, object directories, etc) do not return paths in correct
case for Windows (at least consistently for all python versions).

As os.path library doesn't seem to provide any relaible way to restore
case for paths on Windows, this patch proposes to use pathlib.resolve().

pathlib is a part of Python 3.4 [0] while llvm lit requires Python 3.6 [1].

Note: I noticed the same problem reported in D78169, but the suggestion
to use Python 3.8. doesn't seem working (at least for my setup).

[0] https://docs.python.org/3/library/pathlib.html
[1] https://llvm.org/docs/GettingStarted.html

Diff Detail

Event Timeline

krisb created this revision.May 24 2021, 5:46 AM
krisb requested review of this revision.May 24 2021, 5:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 24 2021, 5:46 AM
Meinersbur added a comment.EditedJun 1 2021, 6:59 PM

Besides the switch from os.path to pathlib, is the change to first build the path, then to resolve it (instead of first using abspath to resolve it, then concat the path) intentional?

Could this also be fixed by making discovery.py consistently use the the map value instead its normcase-d key when building paths?

krisb added a comment.Jun 4 2021, 7:12 AM

Besides the switch from os.path to pathlib, is the change to first build the path, then to resolve it (instead of first using abspath to resolve it, then concat the path) intentional?

Oh, surely no. I'll add back resolving first. Thank you for pointing to this!

Could this also be fixed by making discovery.py consistently use the the map value instead its normcase-d key when building paths?

We can do smth like

--- a/llvm/utils/lit/lit/discovery.py
+++ b/llvm/utils/lit/lit/discovery.py
@@ -53,8 +53,7 @@ def getTestSuite(item, litConfig, cache):
         config_map = litConfig.params.get('config_map')
         if config_map:
             cfgpath = os.path.realpath(cfgpath)
-            cfgpath = os.path.normcase(cfgpath)
-            target = config_map.get(cfgpath)
+            target = config_map.get(os.path.normcase(cfgpath))
             if target:
                 cfgpath = target

for discovery.py so that site config will be loaded in the 'original' case (and this seems to be right things to do), but it will not fully fix the problem. Additionally, we can reslove the cfgpath here before loading , smth like

--- a/llvm/utils/lit/lit/discovery.py
+++ b/llvm/utils/lit/lit/discovery.py
@@ -53,8 +53,7 @@ def getTestSuite(item, litConfig, cache):
         config_map = litConfig.params.get('config_map')
         if config_map:
             cfgpath = os.path.realpath(cfgpath)
-            cfgpath = os.path.normcase(cfgpath)
-            target = config_map.get(cfgpath)
+            target = config_map.get(os.path.normcase(cfgpath))
             if target:
                 cfgpath = target
+            cfgpath = str(Path(cfgpath).resolve())

and that seems to do the trick. But I still think that the implementation of path() function should be changed to return real correctly cased paths and do not rely on the case of __file__.

krisb updated this revision to Diff 349866.Jun 4 2021, 7:21 AM
krisb retitled this revision from [lit] Make LLVM_LIT_PATH_FUNCTION to use pathlib to [lit] Attempt for fix tests failing because of 'warning: non-portable path to file'.
krisb edited the summary of this revision. (Show Details)

Resolve __file__ path first, avoid normcasing site config paths in discovery.py.

thakis added a comment.Jun 4 2021, 7:34 AM

Besides the switch from os.path to pathlib, is the change to first build the path, then to resolve it (instead of first using abspath to resolve it, then concat the path) intentional?

Oh, surely no. I'll add back resolving first. Thank you for pointing to this!

Could this also be fixed by making discovery.py consistently use the the map value instead its normcase-d key when building paths?

We can do smth like

--- a/llvm/utils/lit/lit/discovery.py
+++ b/llvm/utils/lit/lit/discovery.py
@@ -53,8 +53,7 @@ def getTestSuite(item, litConfig, cache):
         config_map = litConfig.params.get('config_map')
         if config_map:
             cfgpath = os.path.realpath(cfgpath)
-            cfgpath = os.path.normcase(cfgpath)
-            target = config_map.get(cfgpath)
+            target = config_map.get(os.path.normcase(cfgpath))
             if target:
                 cfgpath = target

for discovery.py so that site config will be loaded in the 'original' case (and this seems to be right things to do), but it will not fully fix the problem.

I had tried that, but it didn't work: http://reviews.llvm.org/rGb9fd375d75d4bbf34453696127854d0192e3ccf6

krisb added a comment.Jun 5 2021, 3:28 AM

Besides the switch from os.path to pathlib, is the change to first build the path, then to resolve it (instead of first using abspath to resolve it, then concat the path) intentional?

Oh, surely no. I'll add back resolving first. Thank you for pointing to this!

Could this also be fixed by making discovery.py consistently use the the map value instead its normcase-d key when building paths?

We can do smth like

--- a/llvm/utils/lit/lit/discovery.py
+++ b/llvm/utils/lit/lit/discovery.py
@@ -53,8 +53,7 @@ def getTestSuite(item, litConfig, cache):
         config_map = litConfig.params.get('config_map')
         if config_map:
             cfgpath = os.path.realpath(cfgpath)
-            cfgpath = os.path.normcase(cfgpath)
-            target = config_map.get(cfgpath)
+            target = config_map.get(os.path.normcase(cfgpath))
             if target:
                 cfgpath = target

for discovery.py so that site config will be loaded in the 'original' case (and this seems to be right things to do), but it will not fully fix the problem.

I had tried that, but it didn't work: http://reviews.llvm.org/rGb9fd375d75d4bbf34453696127854d0192e3ccf6

Yeah, I've seen this patch, but it doesn't seem like the case now.

When the PWD has a lower-case drive letter, like after running cd c:\ with a lower-case "c:" (cmd's default is upper-case, but it takes case-ness from what's passed to cd apparently).

This should be a 'drive letter' specific problem. Since you've already made Wnonportable-include-path tolerant to drive letter case, it seems no longer a problem.

When manually passing an absolute path to llvm-lit with a lower-case drive letter: python bin\llvm-lit.py -sv c:\llvm-project\clang\test\PCH

I guess the issue here is in assert t is None or t == cfgpath statement as t == cfgpath assumption seems not fully correct. cfgpath in this case is a user-specified absolute path, so it may or may not be the same as one that was constructed by os.path.abspath() and added to the map.
If to comment the assert out everything else would work as expected.

On the other side, os.path.abspath() for relative paths bases on cwd and its case, and may return different results for the same file/directory, depending on the case of cwd. os.path.realpath() doesn't seem to work consistently as well, especially for msys/mingw setups. But Path.resolve() always returns the same on-disk path for the specified file or directory (at least in my experiments, but it would be really great if somebody could try this patch and tell if they see some issues with it).

Besides the switch from os.path to pathlib, is the change to first build the path, then to resolve it (instead of first using abspath to resolve it, then concat the path) intentional?

Oh, surely no. I'll add back resolving first. Thank you for pointing to this!

Now it is resolving twice?!?

@thakis To not get lost in nitpicking, I would accept this unless you have objections.

krisb added a comment.Jun 8 2021, 8:41 AM

Besides the switch from os.path to pathlib, is the change to first build the path, then to resolve it (instead of first using abspath to resolve it, then concat the path) intentional?

Oh, surely no. I'll add back resolving first. Thank you for pointing to this!

Now it is resolving twice?!?

I think it would be better to get rid of lots of \..\..\..\..\ when path() is used to construct paths for source, object, and other directories and files within a site config. So this is why I made it resolved twice (the first one for __file__ to remove symlinks and the second one to simplify the resulting path). What do you think? Does this make sense?

Meinersbur accepted this revision.Jun 8 2021, 5:52 PM

Now it is resolving twice?!?

I think it would be better to get rid of lots of \..\..\..\..\ when path() is used to construct paths for source, object, and other directories and files within a site config. So this is why I made it resolved twice (the first one for __file__ to remove symlinks and the second one to simplify the resulting path). What do you think? Does this make sense?

I originally asked whether changing the point where resolve is called is intentional, and this seems to be the intention (although unrelated to this patch). The second call will resolve all the paths of __file__ as well, making the first resolve redundant.

Greenlighting the patch, but please remove the first call to resolve().

This revision is now accepted and ready to land.Jun 8 2021, 5:52 PM
krisb added a comment.Jun 9 2021, 7:16 AM

Now it is resolving twice?!?

I think it would be better to get rid of lots of \..\..\..\..\ when path() is used to construct paths for source, object, and other directories and files within a site config. So this is why I made it resolved twice (the first one for __file__ to remove symlinks and the second one to simplify the resulting path). What do you think? Does this make sense?

I originally asked whether changing the point where resolve is called is intentional, and this seems to be the intention (although unrelated to this patch). The second call will resolve all the paths of __file__ as well, making the first resolve redundant.

Greenlighting the patch, but please remove the first call to resolve().

Then it seems I misinterpreted your comment.
Resolving __file__ first and then concatenating is not the same as concatenating then resolving if we are talking about symlinks.
For example, having

~/workspace/llvm/llvm-project$ ln -s llvm/test/ test

we would get different results

$ python
Python 3.7.10 (default, Feb 20 2021, 21:15:28) 
[GCC 9.3.0] on linux
>>> from pathlib import Path
>>> Path('test').resolve().parent
PosixPath('/home/kbessonova/workspace/llvm/llvm-project/llvm')
>>> Path('test').parent.resolve()
PosixPath('/home/kbessonova/workspace/llvm/llvm-project')

I thought you were talking about this case (for example, if site configs are symlinks themselves or placed under symlinked directory within llvm tree) and it isn't something I was about to change.

The patch in the current form does both at the same time:

Path(__file__).resolve().parent.resolve()

I realize that it is actually different from

Path(__file__).parent.resolve()

but only if __file__ (that is, the site config) itself is a symlink. Which it should not be because the site config is written by CMake which does not create it as a symlink.

I still think this is overdoing the resolving. Please remove one of them. I assume you will want to keep the outer one such that the concatenated part (p) is resolved as well.

krisb updated this revision to Diff 351646.Jun 12 2021, 2:59 AM

Make path() resolving once.

This change has caused some problems for us - resolve doesn't respect the use of subst on windows which we've been using to avoid long file path problems. This is a particular issue for the llvm-lit tests.

@dstuttard, could you please give an example that reproduces the issue?

dstuttard added a comment.EditedSep 27 2021, 4:28 AM

@dstuttard, could you please give an example that reproduces the issue?

I've been trying to recreate this with a vanilla llvm-project checkout and build on windows, but I'm struggling to get it to work. The problem itself shows up in our automated testing which is part of a larger application, so creating something standalone is difficult.

However, if you check out llvm with a long path (say c:\Testing\MyLongPath\MyReallyReallyLongPath\ThisIsVeryLongIndeed\llvm-project) you should see the problem.
Alternatively, if you simply mount a directory as a new drive using subst (subst x: c:\Testing\MyLongPath\MyReallyReallyLongPath\ThisIsVeryLongIndeed) and build/run llvm-lit testing from x:\llvm-project, you'll see that the llvm-lit will use the original c:\..... path rather than the new x: drive

I'll continue to try and create a better example of a reproducer and update the ticket if I get it to work (the main issue is setting up a dev system for windows that works, rather than anything fundamental).