This is an archive of the discontinued LLVM Phabricator instance.

[lit] Improve globbing in Windows with long paths
Needs ReviewPublic

Authored by drodriguez on May 16 2019, 11:40 AM.

Details

Summary

In many Windows API, the maximum path is limited to 260 characters.
However, the same APIs allow up to around 32K characters if using a
special prefix (\\?\) and unicode paths.

The modifications in this commit allows globbing to return results that
will be considered long paths in Windows (the glob pattern doesn't need
to be a long path itself). Sadly, the trick will only work in 2.7.16 and
above because of a bug in Python os.listdir when used in Windows. In
that case, the behaviour should be the same as before.

Includes tests for the new behaviour (which should be no problem in
Linux, but should had failed in Windows before applying this patch).

Diff Detail

Event Timeline

drodriguez created this revision.May 16 2019, 11:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2019, 11:40 AM
compnerd added a reviewer: rnk.

I will hold this for a moment. The review can happen, but I would not merge it directly. I think I would need to return the results with the prefix in the cases I know it is necessary.

thakis added a comment.Jun 5 2019, 8:53 AM

I have two comments on this:

  • On a micro level, If this solves a problem you have, I think it's fine to merge this. It's small and localized.
  • On a macro level, I think trying to handle long paths on Windows is a losing battle that adds lots of complexity* everywhere without a great payoff. I'd recommend to just put your llvm checkout at a short path, and to keep paths in llvm short enough that we never need this.

*: Of many kinds. In this path, the test suite now works differently on 2.7.16+ and later; as you say you'll likely have to leak the UNC-prefixed path out of glob for cases where it was needed and other places might not expect it; many places in lit will learn about this, etc.

llvm/utils/lit/lit/ShCommands.py
67

Should the normpath call happen in the non-use_prefix case too? It seems strange that use_prefix controls more than just prefixing but \\?\. Also, use a raw literal, so you don't have to escape all the backslashes.

llvm/utils/lit/tests/unit/GlobItem.py
68

Won't this fail with older pythons?

drodriguez marked 2 inline comments as done.Jun 5 2019, 1:06 PM

100% agree with you that handling long paths on Windows is a battle that I cannot win. Sadly, even if I can modify a little where my checkout is, and I can reduce that path the best I can, but when CMake and a lot more pieces start creating directories, they will not take care of the length, and they might end up creating very long paths (inside my short path). The problem appeared when a tool started creating a directory (with a long-ish name which cannot be easily modified) inside the temporal directory for a test. Short path + short path + short path ended up being long path.

About returning results with prefixes from my last comment, sadly it seems that it should be dealt with on the test side, since some tools accept the prefixed paths, and some others dislike them.

llvm/utils/lit/lit/ShCommands.py
67

normpath deals with a problem that only happens in Windows. Many tests are written with Unix paths, so they use / instead of \. Windows normally accept both as directory separator, but when the path is prefixed with \\?\, the Unix directory separators become literal /. The normpath takes care of creating the best path for Windows.

It is not technically needed for Unix, and it should not hurt, but I preferred to avoid touching the Unix path, if possible.

If you want I can move it to L59 and apply to every platform.

llvm/utils/lit/tests/unit/GlobItem.py
68

Yes. The test are written with 2.7.16 in mind. If you execute the lit test in a previous version (and in Windows), the test will fail and it will indicate that other test that use lit might also fail.

Would you prefer to skip this test in case the Python version wasn’t enough?