This is an archive of the discontinued LLVM Phabricator instance.

[lit][clang] Avoid realpath on Windows due to MAX_PATH limitations
ClosedPublic

Authored by MrTrillian on Jun 29 2023, 11:29 AM.

Details

Summary

Running lit tests on Windows can fail because its use of os.path.realpath expands substitute drives, which are used to keep paths short and avoid hitting MAX_PATH limitations.

Changes lit logic to:

  • Use os.path.abspath on Windows, where MAX_PATH is a concern that we can work around using substitute drives, which os.path.realpath would resolve.
  • Use os.path.realpath on Unix, where the current directory always has symlinks resolved, so it is impossible to preserve symlinks in the presence of relative paths, and so we must make sure that all code paths use real paths.

Also updates clang's FileManager::getCanonicalName and ExtractAPI code to avoid resolving substitute drives (ie resolving to a path under a different root).

How tested: built with -DLLVM_ENABLE_PROJECTS=clang and built check-all on both Windows

Diff Detail

Event Timeline

MrTrillian created this revision.Jun 29 2023, 11:29 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: delcypher. · View Herald Transcript
MrTrillian requested review of this revision.Jun 29 2023, 11:29 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript

How do I take action on the test failure? It runs without issues on my machine:

tristan@nuc-on-wood:~/project-llvm$ python3 build/bin/llvm-lit -sv --filter test.toy build/tools/mlir
/test/

Testing Time: 0.55s
  Excluded: 1636
  Passed  :    1

And I see that the failure is a timeout so I wonder if it's reliable?

Exit Code: -9
Timeout: Reached timeout of 60 seconds

All premerge build failures seem like flukes.

@rnk , I would appreciate your review on this since you helped with the previous iteration.

aaron.ballman added a subscriber: aaron.ballman.

Adding a few more folks who are interested in lit changes to try to get the review unstuck.

FWIW, I worry about the subtlety of the > change because it's not entirely clear to me when I'd need to use %>t in a test. I worry code reviewers will miss this sort of thing and we'll only find out there's an issue when the test fails for someone with a problematic path. Is there a rule of thumb we should be following for its use?

All premerge build failures seem like flukes.

FWIW, I agree.

MrTrillian added a comment.EditedJul 7 2023, 12:45 PM

Adding a few more folks who are interested in lit changes to try to get the review unstuck.

FWIW, I worry about the subtlety of the > change because it's not entirely clear to me when I'd need to use %>t in a test. I worry code reviewers will miss this sort of thing and we'll only find out there's an issue when the test fails for someone with a problematic path. Is there a rule of thumb we should be following for its use?

Thanks for the extra reviewers!

95% of the %>t are around clang modulemap files, because that code resolves real paths in C++ by design, so I can't avoid it. In fact I should rename PREFIX_EXPANDED to MODULEMAP_PREFIX so it would be much clearer.

There are three cases where I didn't expect to need the expanded paths: relative_include.m, case-insensitive-include-win.c and module-header-mismatches.m. There may be a way to change the clang implementation so it does not need to have expanded paths, but that felt like a different investigation.

I'm happy to consider alternative syntaxes to %>t too.

Adding a few more folks who are interested in lit changes to try to get the review unstuck.

FWIW, I worry about the subtlety of the > change because it's not entirely clear to me when I'd need to use %>t in a test. I worry code reviewers will miss this sort of thing and we'll only find out there's an issue when the test fails for someone with a problematic path. Is there a rule of thumb we should be following for its use?

Thanks for the extra reviewers!

95% of the %>t are around clang modulemap files, because that code resolves real paths in C++ by design, so I can't avoid it. In fact I should rename PREFIX_EXPANDED to MODULEMAP_PREFIX so it would be much clearer.

Okay, if this is mostly specific to module maps, that may resolve most of my concern (we don't add a lot of new tests there, so it's less of a burden for reviewers). CC @ChuanqiXu @Bigcheese @tahonermann to see if there are modules concerns with this.

There are three cases where I didn't expect to need the expanded paths: relative_include.m, case-insensitive-include-win.c and module-header-mismatches.m. There may be a way to change the clang implementation so it does not need to have expanded paths, but that felt like a different investigation.

I'm happy to consider alternative syntaxes to %>t too.

I think the syntax is reasonable enough (at least, I don't have arguably better suggestions), it was more just the "how will I know when to use it?" concerns.

I don't write module map a lot neither. But I am curious where is the definition for the term %>/t? It is indeed a little bit odd at the first look.

95% of the %>t are around clang modulemap files, because that code resolves real paths in C++ by design, so I can't avoid it.

Can you link to evidence that this behavior is by design? It isn't obvious to me why modulemap files would demand different behavior; especially since that would exacerbate MAX_PATH problems.

I'm not fond of the safe_abs_path name; "safe" doesn't communicate anything and the implementation is no more safe than os.path.abspath or os.path.realpath. Suggested alternatives:

  • short_abs_path
  • shortest_abs_path
  • abs_path_no_subst_drive
  • abs_path_preserve_drive

If the %> feature is going to remain (depending on, for example, the answer to @tahonermann's question about modulemap), please:

  1. Add it to the lit documentation at https://llvm.org/docs/TestingGuide.html#substitutions and https://llvm.org/docs/CommandGuide/lit.html#substitutions. (Aside: I wish someone would reconcile those lists.)
  2. Consider whether there's a clearer syntax. For example, we already have %{/t:regex_replacement}. What about %{t:real} and %{/t:real}?
  1. Extend lit's own test suite to cover it.

95% of the %>t are around clang modulemap files, because that code resolves real paths in C++ by design, so I can't avoid it.

Can you link to evidence that this behavior is by design? It isn't obvious to me why modulemap files would demand different behavior; especially since that would exacerbate MAX_PATH problems.

I agree with you. The original rationale seems to be here:
https://github.com/llvm/llvm-project/blob/926f3759ec62a8f170e76a60316cc0bdd9dd2ec9/clang/lib/Lex/HeaderSearch.cpp#L257

cpp
    // To avoid false-negatives, we form as canonical a path as we can, and map
    // to lower-case in case we're on a case-insensitive file system.

And this change has more context: https://reviews.llvm.org/D134923 by @benlangmuir

I'm not fond of the safe_abs_path name; "safe" doesn't communicate anything and the implementation is no more safe than os.path.abspath or os.path.realpath. Suggested alternatives:

  • short_abs_path
  • shortest_abs_path
  • abs_path_no_subst_drive
  • abs_path_preserve_drive

Thanks for the name suggestions, those are better indeed! I will update with the last one.

  • Renamed safe_abs_path to abs_path_preserve_drive @tahonermann
  • Renamed %>t and %>/t to %{t:real} and %{/t:real} @tahonermann
  • Renamed PREFIX_EXPANDED to SUBMODULE_PREFIX @aaron.ballman
  • Documented %{t:real} and %{/t:real} syntaxes @jdenny
  1. Extend lit's own test suite to cover it.

I submitted an update with your suggestions #1 and #2 but this #3 is much more difficult because of the nature of %{t:real}. I'm not sure I can have a source/target path that includes symlinks in a way that allows me to test this, and even more so with substitute drives since we can't know which drive letters are unallocated.

  1. Extend lit's own test suite to cover it.

I submitted an update with your suggestions #1 and #2 but this #3 is much more difficult because of the nature of %{t:real}.

Thanks for doing that. I'll try to review more carefully soon, hopefully early next week.

I'm not sure I can have a source/target path that includes symlinks in a way that allows me to test this, and even more so with substitute drives since we can't know which drive letters are unallocated.

Not being a windows person, it's hard for me to answer about substitute drives, but I understand your concern.

One possible approach is to verify the relationships among the various flavors of a substitution (e.g., %t, %{t:real}, etc.), and hope (or ensure) that some CI configs that run check-lit will use substitute drives for the source and/or build directory. The tests should pass regardless.

A less desirable approach is just to do minimal sanity checks, such as checking that the base file name is the same across all flavors of a substitution.

Of course, the least desirable approach is to depend on other subprojects' test suites to catch lit bugs. It's easier for lit developers to identify lit bugs when check-lit itself shows them, preferably locally but possibly in CI.

Thanks for taking a look at this.

This comment was removed by MrTrillian.

In D154130#4486898, @tahonermann wrote:

95% of the %>t are around clang modulemap files, because that code resolves real paths in C++ by design, so I can't avoid it.

Can you link to evidence that this behavior is by design? It isn't obvious to me why modulemap files would demand different behavior; especially since that would exacerbate MAX_PATH problems.

I agree with you. The original rationale seems to be here:
https://github.com/llvm/llvm-project/blob/926f3759ec62a8f170e76a60316cc0bdd9dd2ec9/clang/lib/Lex/HeaderSearch.cpp#L257

Thank you for finding that. I took a look at the code and it looks to me like it would be safe to change ModuleMap::canonicalizeModuleMapPath() to use a drive preserving canonicalization. I think the worst case impact would be that a module cache lookup would fail thereby leading to a cache miss and rebuild of the target module. That could impose a significant performance and drive space penalty in the event that substitute drive paths are changed, but I would expect such changes to be rare.

I took a look at the code and it looks to me like it would be safe to change ModuleMap::canonicalizeModuleMapPath() to use a drive preserving canonicalization

This sounds reasonable to me (the person who added canonicalizeModuleMapPath), though I am not at all a Windows path expert.

I took a look at the code and it looks to me like it would be safe to change ModuleMap::canonicalizeModuleMapPath() to use a drive preserving canonicalization

Symbolic links can point across drives so I think a drive preserving canonicalization cannot be much more than a conversion to an absolute path -- similar to the now renamed abs_path_preserve_drive I'm adding to lit. I think the logic would have to be split across Windows (only abs) and non-Windows (realpath) again. I also think a cache miss in this case is ok, so I will follow up with a clang change implementing this. Thanks!

That could impose a significant performance and drive space penalty in the event that substitute drive paths are changed, but I would expect such changes to be rare.

I think the more likely case is that one would build their repo under the substitute drive and later build it under the expanded path on the original drive. Though I feel like a lot of caching might then break.

MrTrillian edited the summary of this revision. (Show Details)

Update clang path canonicalization to avoid resolving substitute drives (ie resolving to a path under a different root). This requires much less change to tests.

Undo whitespace changes.

It is unclear to me why/when we would ever want the substitute drive expansion; the modified tests aren't very elucidating. My naive expectation is that we effectively want to restore the Python 3.7 behavior.

clang/test/ExtractAPI/relative_include.m
24 ↗(On Diff #542097)

I'm curious why this test requires the %{/t:real} normalization. Why would it be desirable to expand substitute drives for this test?

clang/test/Lexer/case-insensitive-include-win.c
14

I'm curious why this test requires the %{/t:real} normalization. Why would it be desirable to expand substitute drives for this test?

MrTrillian retitled this revision from [lit] Avoid os.path.realpath on Windows due to MAX_PATH limitations to [lit][clang] Avoid realpath on Windows due to MAX_PATH limitations.
MrTrillian edited the summary of this revision. (Show Details)

It is unclear to me why/when we would ever want the substitute drive expansion; the modified tests aren't very elucidating. My naive expectation is that we effectively want to restore the Python 3.7 behavior.

It took some more investigation to figure out why I still needed realpaths there. In the end:

  • case-insensitive-include.win.c: still needs it, added a comment.
  • relative_module.m: required a fix in ExtractAPIConsumer.cpp to avoid calling getRealPath.
tahonermann added inline comments.Jul 21 2023, 2:36 PM
clang/test/Lexer/case-insensitive-include-win.c
5–9

Hmm, is it really desirable or necessary that the case mismatch logic resolve substituted drives? I wouldn't think so. This seems like another case where our common canonicalization would be desired.

MrTrillian marked an inline comment as done.Jul 21 2023, 4:43 PM
MrTrillian added inline comments.
clang/test/Lexer/case-insensitive-include-win.c
5–9

I think it is necessary as I don't see how else we can retrieve the original casing of the file path. Merely making the path absolute would not do that. Searching for solutions on the internet finds ideas that are worse like converting to a short path then back to a long path or rebuilding the path one component at a time with a series directory listing requests.

MrTrillian marked an inline comment as not done.Jul 24 2023, 6:51 AM
MrTrillian marked an inline comment as done.Jul 24 2023, 7:59 AM

Marking comments as resolved per my reply. I'm not sure if that's best practice! @tahonermann

Looking forward to reviews

Fixed clang-format issues and improved test comment.

Marking comments as resolved per my reply. I'm not sure if that's best practice!

Yes, that is fine.

clang/test/Lexer/case-insensitive-include-win.c
5–9

The warning this test checks for is diagnosed in Preprocessor::HandleHeaderIncludeOrImport(); search for pp_nonportable_path and/or pp_nonportable_system_path. I believe this warning will be issued if any component of the path does not match the case of the include directive. Since the file name differs in case, unless there is a bug in handling of the file name, this test isn't sensitive to case mismatches in %t.

From a user point of view, resolving a substitute drive doesn't seem desirableto me with respect to determining whether a non-portable path is being used; I don't think the behavior should be dependent on whether a substitute drive is being used or what its target is.

Responded to comment.

clang/test/Lexer/case-insensitive-include-win.c
5–9

@tahonermann I think the code is working by design and it would be an unrelated change to modify its logic. See trySimplifyPath in PPDirectives.cpp:

// Below is a best-effort to handle ".." in paths. It is admittedly
// not 100% correct in the presence of symlinks.

      // If these path components differ by more than just case, then we
      // may be looking at symlinked paths. Bail on this diagnostic to avoid
      // noisy false positives.

The test was previously implicitly requiring getting the realpath, and it worked because lit.py's logic of doing that. Now that requirement is explicit in the test.

Ping. Any brave reviewer to help here?

benlangmuir added inline comments.Jul 26 2023, 3:52 PM
clang/lib/Basic/FileManager.cpp
655

8k is a lot of stack space. The only reason this was 4k in the first place is it was originally using char[PATH_MAX] and unix realpath directly. I'd suggest just dropping to 128 per path.

663

Removing .. can change where the path points in the presence of symlinks; is this needed?

clang/lib/ExtractAPI/ExtractAPIConsumer.cpp
190

Why is this change needed?

Reduced path small strings length to 256 (incidentally close to MAX_PATH on Windows), per @benlangmuir 's comment

MrTrillian marked an inline comment as done.Jul 27 2023, 6:41 AM
MrTrillian added inline comments.
clang/lib/Basic/FileManager.cpp
663

Removing .. can change where the path points in the presence of symlinks; is this needed?

@benlangmuir That's true and not ideal, but makeAbsolute will not resolve /./ or /../ in paths, so it's not a canonicalization and some tests were failing because of that. One alternative would be to use makeAbsolute + remove_dots on Windows (where removing dot dots is semantically correct) and getRealPath on Unix, like I do in lit. Suggestions?

clang/lib/ExtractAPI/ExtractAPIConsumer.cpp
190

Why is this change needed?

@benlangmuir We don't want raw getRealPaths on Windows because of substitute drives and MAX_PATH issues. That is the idea behind this diff. If I leave the tryGetRealPathName here, I need to change the relative_include.m test as in this previous diff: https://reviews.llvm.org/D154130?id=539683 , which is undesirable.

benlangmuir added inline comments.Jul 27 2023, 9:28 AM
clang/lib/Basic/FileManager.cpp
663

Wouldn't removing .. have the same issue with symlinks on Windows? I know symlinks are less common there, but it's not clear to me why it would be correct. I guess you could also check if the paths resolve to the same file after removing ..

clang/lib/ExtractAPI/ExtractAPIConsumer.cpp
190

I wonder if we should just remove tryGetRealPathName; it's not actually the real path in many cases. Anyway, not for this patch, your change here seems fine.

llvm/utils/lit/lit/LitConfig.py
192 ↗(On Diff #544743)

Why are you changing abspath (here and elsewhere)? I understand why you are changing realpath -> lit.util.abs_path_preserve_drive, but what's the issue with bare abspath?

llvm/utils/lit/lit/TestRunner.py
1297

This change drops the + ".tmp" that was previously added to %t:regex_replacement and %:t.

MrTrillian marked 4 inline comments as done.

Reverted some os.path.abspath that had been converted to abs_path_preserve_drive

MrTrillian added inline comments.Jul 28 2023, 10:18 AM
clang/lib/Basic/FileManager.cpp
663

@benlangmuir Windows simplifies \..\ in paths before starting to walk the filesystem. https://superuser.com/questions/1502360/different-behavior-of-double-dots-and-symlinks-in-windows-and-unix

If I detected that the path didn't resolve to the same file after removing .., what would I do?

I think the current logic will only end up using the else branch on Windows. At least I can't think of a scenario where the root would change from using realpath on unix.

llvm/utils/lit/lit/TestRunner.py
1297

This change drops the + ".tmp" that was previously added to %t:regex_replacement and %:t.

@benlangmuir Actually not! The path_substitutions list uses ("t", tmpName) which includes the .tmp, instead of tmpBase which doesn't. Looking at the original code, there didn't seem to be a reason to use tmpBase and re-append .tmp instead of using tmpName.

MrTrillian marked an inline comment as done.Jul 28 2023, 10:20 AM
MrTrillian added inline comments.
llvm/utils/lit/lit/LitConfig.py
192 ↗(On Diff #544743)

Why are you changing abspath (here and elsewhere)? I understand why you are changing realpath -> lit.util.abs_path_preserve_drive, but what's the issue with bare abspath?

You're right, this is overzealous. I think it made more sense with an older iteration of this code. I removed it.

benlangmuir accepted this revision.Jul 28 2023, 10:38 AM

LGTM other than my suggestion to add a comment. @tahonermann are all your comments addressed at this point?

clang/lib/Basic/FileManager.cpp
663

Thanks, I wasn't aware of that subtlety! I suggest we add a comment about that here and also mention (or assert) that this is only reachable on Windows.

This revision is now accepted and ready to land.Jul 28 2023, 10:38 AM
MrTrillian marked an inline comment as done.

@benlangmuir I made the root-preserving canonicalization logic Windows-only now that I found how to test for that. It changes the code shape a little but I think it's an improvement.

MrTrillian marked an inline comment as done.Jul 28 2023, 12:56 PM
MrTrillian added inline comments.
clang/lib/Basic/FileManager.cpp
663

Thanks, I wasn't aware of that subtlety! I suggest we add a comment about that here and also mention (or assert) that this is only reachable on Windows.

@benlangmuir Accounted for in lastest revision. The root-preserving canonicalization logic is now Windows-only.

benlangmuir added inline comments.Jul 28 2023, 1:41 PM
clang/lib/Basic/FileManager.cpp
663

Thanks, looks good.

MrTrillian marked 4 inline comments as done.Jul 31 2023, 6:30 AM

Resolved remaining comments addressed with the FileManager.cpp change to branch on the native path format being Windows. This is ready to merge!

This revision was landed with ongoing or failed builds.Aug 1 2023, 11:01 AM
This revision was automatically updated to reflect the committed changes.
RKSimon added a subscriber: RKSimon.Aug 2 2023, 4:38 AM

@MrTrillian This is failing for me with:

C:\LLVM\ninja>ninja check-llvm-codegen-x86
[0/1/0/1] Running lit suite C:/LLVM/llvm-project/llvm/test/CodeGen/X86llvm-lit.py: C:\LLVM\llvm-project\llvm\utils\lit\lit\TestingConfig.py:151: fatal: unable to parse config file 'C:\\LLVM\\llvm-project\\llvm\\test\\lit.cfg.py', traceback: Traceback (most recent call last):
  File "C:\LLVM\llvm-project\llvm\utils\lit\lit\TestingConfig.py", line 139, in load_from_path
    exec(compile(data, path, "exec"), cfg_globals, None)
  File "C:\LLVM\llvm-project\llvm\test\lit.cfg.py", line 21, in <module>
    config.test_format = lit.formats.ShTest(not llvm_config.use_lit_shell)
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'use_lit_shell'
RKSimon added inline comments.Aug 2 2023, 5:29 AM
llvm/utils/lit/lit/discovery.py
60

Found the problem - you have moved the os.path.normpath call inside abs_path_preserve_drive, causing the path canonicalization fails - removing it and going back to os.path.normcase(cfgpath) seems to fix the issue - I'll push my fix shortly.

RKSimon added inline comments.Aug 2 2023, 6:57 AM
llvm/utils/lit/lit/discovery.py
60

5ccfa1568130 fixed my builds but broke this - https://lab.llvm.org/buildbot/#/builders/216/builds/24966

Any suggestions would be appreciated, but I can't think of anything but reverting both commits until we get to the bottom of it.

RKSimon added inline comments.Aug 2 2023, 7:51 AM
llvm/utils/lit/lit/discovery.py
60

Panic over - I think I've fixed this with d6f1880c629d

@tahonermann are all your comments addressed at this point?

Apologies for the late response; I was on vacation last week.

No. I'm still skeptical that there is ever a desire to expand substitute drives.

clang/test/Lexer/case-insensitive-include-win.c
5–9

I'm still not following here. Are you saying that trySimplifyPath() will replace substitute drives? If so, that doesn't sound desirable and I would expect it to be problematic for your use case. I think we should follow this up ... somewhere (now that this review is closed).

MrTrillian added inline comments.Aug 15 2023, 6:30 AM
clang/test/Lexer/case-insensitive-include-win.c
5–9

@tahonermann . trySimplifyPath() does not replace substitute drives. It's a best-effort attempt to see if the included path mismatches the real file path only by case. It explicitly bails out without diagnostics if it finds that the included path has a different shape from the real file path, which will happen if the included path is on a substitute drive. It has to compare with the real file path because this is the only reasonable way to get the original path casing.

It was already the case that this diagnostic bailed out in the presence of symbolic links, so there are no behavioral differences. I needed to update the test because previously lit.py would enforce that %t was a real path, and now it doesn't, which means that we would hit the "bail out" code path and not produce the diagnostic.

tahonermann added inline comments.Aug 15 2023, 11:18 AM
clang/test/Lexer/case-insensitive-include-win.c
5–9

I think trySimplifyPath() is not particularly relevant as it just performs a simple canonicalization (removal of . and .. path components without regard for symlink traversal) and case insensitive comparison with the remaining components with a "real" path that is presumed to already be devoid of such components. It is therefore sensitive to structure, but only for the (non-. and non-..) components present in the (non-real) Components vector; the "real" path may have more leading components (the Components vector may represent a relative path). The presence of a substitute drive in the "real" path won't contribute to a structural difference unless the Components vector is absolute but with a drive other than the substitute drive or if it is relative but starting at a higher directory level than the substitute drive; neither of which should be the case for this test when %t is consistently used.

The only relevant user path provided to Clang is the argument to the /FI argument; previously \\?\%t.dir\FOO.h. In that case, %t should reflect use of a substitute drive. The question I have then is why/where Clang is replacing the substitute drive. I'm guessing (I haven't actually debugged) is that this is happening in FileManager::fillRealPathName(); this function assigns the path returned by FileEntry::tryGetRealPathName(). I'm curious if that only happens if /FI specifies a UNC path; does the test pass if the UNC prefix is dropped and the path with the substitute drive is used? Could you try that, please? /FI %t.dir\FOO.h. If that still passes, then I think there is a question of whether it is desirable for behavior to differ for \\?\ prefixed paths.

dyung added a subscriber: dyung.Sep 18 2023, 12:53 PM

Sorry to comment so late, but I just had time to look into this a little bit. This change broke running lit tests for me on Windows, and I wanted to see if there was a way I could restore it.

For reference, my sources are stored locally at C:\Dev\git\dev, but I until this change, I was able to access them through a file system junction point at C:\src\git\dev and run lit tests using that. But after this change, I can no longer run the lit tests when I am in C:\src\git\dev. My system is setup this way because we have weird A/V scanning rules that only exclude certain named directories, and my original path wasn't one of them, so I used the file system junction point to get my builds excluded while continuing to use the path structure that was familiar to me and that I also use on other systems (mac/linux).

Since this change seems to be targeted towards preventing resolving paths that are greater than MAX_PATH on Windows, could we possibly leave the substitution in, but check if the resulting path is longer than MAX_PATH rather than just excluding it completely?

Specifically, where I am seeing the difference is in the new function abs_path_preserve_drive() in util.py where we are now using os.path.abspath() instead of os.path.realpath(). I'm not completely sure what the effects of such a change might be to the rest of this change, but keeping the old conversion and then checking against MAX_PATH it seems would still accomplish your goal while allowing cases like mine to continue working as they previously did.

Thoughts?