This is an archive of the discontinued LLVM Phabricator instance.

[Windows] Avoid using FileIndex for unique IDs
ClosedPublic

Authored by mstorsjo on Jul 18 2023, 4:30 AM.

Details

Summary

The FileIndex values returned from GetFileInformationByHandle are
considered stable and uniquely identifying a file, as long as the
handle is open. When handles are closed, there are no guarantees
for their stability or uniqueness. On some file systems (such as
NTFS), the indices are documented to be stable even across handles.
But with some file systems, in particular network mounts, file
indices can be reused very soon after handles are closed.

When such file indices are used for LLVM's UniqueID, files are
considered duplicates as soon as the filesystem driver happens to
have used the same file index for the handle used to inspect the
file. This caused widespread, non-obvious (seemingly random)
breakage. This can happen e.g. if running on a directory that is
shared via Remote Desktop or VirtualBox.

To avoid the issue, use a hash of the canonicalized path for the
file as unique identifier, instead of using FileIndex.

This fixes https://github.com/llvm/llvm-project/issues/61401 and
https://github.com/llvm/llvm-project/issues/22079.

Performance-wise, this adds (usually) one extra call to
GetFinalPathNameByHandleW for each call to getStatus(). A test
case such as running clang-scan-deps becomes around 1% slower
with this, which is considered tolerable.

Change the equivalent() function to use getUniqueID instead of
checking individual file_status fields. The
equivalent(Twine,Twine,bool& result) function calls status() on
each path successively, without keeping the file handles open,
which also is prone to such false positives. This also gets rid
of checks of other superfluous fields in the
equivalent(file_status, file_status) function - the unique ID of
a file should be enough (that is what is done for Unix anyway).

This comes with one known caveat: for hardlinks, each name for
the file now gets a different UniqueID, and equivalent() considers
them different. While that's not ideal, occasional false negatives
from equivalent() are usually not as fatal (the cases where we
strictly do need to deduplicate files with different path names are
quite rare) as the issues caused by false positives from
equivalent() (where we'd deduplicate and omit totally distinct files).

The FileIndex is documented to be stable on NTFS though, so ideally
we could maybe have used it in the majority of cases. That would
require a heuristic for whether we can rely on FileIndex or not.
We considered using the existing function is_local_internal for that;
however that caused an unacceptable performance regression
(clang-scan-deps became 38% slower in one test, even more than that
in another test).

Diff Detail

Event Timeline

mstorsjo created this revision.Jul 18 2023, 4:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2023, 4:30 AM
mstorsjo requested review of this revision.Jul 18 2023, 4:30 AM
Herald added a subscriber: wangpc. · View Herald Transcript

@thieta IIRC you had a usecase where you scan through large numbers of files with LLVM code somewhere. Are you able to take this for a spin to make sure it doesn't affect the performance of your usecase too much?

@thieta IIRC you had a usecase where you scan through large numbers of files with LLVM code somewhere. Are you able to take this for a spin to make sure it doesn't affect the performance of your usecase too much?

Will do, but probably not some time real soon since I am on vacation with the family now.

aaron.ballman added inline comments.Aug 7 2023, 11:41 AM
llvm/include/llvm/Support/FileSystem.h
238–239

Some comments explaining that the hash is only expected to be valid if ReliableFileIndex is false may help.

llvm/lib/Support/Windows/Path.inc
741–742
752–755
mstorsjo added inline comments.Aug 7 2023, 12:49 PM
llvm/include/llvm/Support/FileSystem.h
238–239

Yep, that'd be useful, will add that!

llvm/lib/Support/Windows/Path.inc
741–742

Yes, I'd love to do that - but there are gotos that jump past this spot, and the gotos can't cross an initialization.

aaron.ballman added inline comments.Aug 8 2023, 5:07 AM
llvm/lib/Support/Windows/Path.inc
741–742

Ugh... "thanks" goto! :-D

mstorsjo updated this revision to Diff 549818.Aug 14 2023, 1:13 AM

Add a comment clarifying that PathHash is only valid if ReliableFileIndex is false.

aaron.ballman accepted this revision.Aug 14 2023, 6:08 AM

LGTM! I wish there was a reasonable way to test this, but given the subject of the change, I don't think test coverage is practical.

This revision is now accepted and ready to land.Aug 14 2023, 6:08 AM

@thieta IIRC you had a usecase where you scan through large numbers of files with LLVM code somewhere. Are you able to take this for a spin to make sure it doesn't affect the performance of your usecase too much?

Will do, but probably not some time real soon

@thieta Do you have time to try this out in the near future? I'd like to at least know of any potential performance impact before landing this.

(Ideally I'd like to have this in a release, but this is something that definitely should cook in git main for a month or two before considering including in a release. At that point we're probably close to the late 17.0.x, where it's probably too risky to include, so I guess this is practically 18.x material in any case?)

(Ideally I'd like to have this in a release, but this is something that definitely should cook in git main for a month or two before considering including in a release. At that point we're probably close to the late 17.0.x, where it's probably too risky to include, so I guess this is practically 18.x material in any case?)

FWIW, I agree that this kind of change needs a bit longer of bake time to be confident in its correctness, so I think 17.x is unlikely at this point.

@thieta IIRC you had a usecase where you scan through large numbers of files with LLVM code somewhere. Are you able to take this for a spin to make sure it doesn't affect the performance of your usecase too much?

Will do, but probably not some time real soon

@thieta Do you have time to try this out in the near future? I'd like to at least know of any potential performance impact before landing this.

@mstorsjo I think a good test for you would be to run clang-scan-deps on a large CDB .json, multithreaded, and see if there’s a difference before/after patch.

@mstorsjo I think a good test for you would be to run clang-scan-deps on a large CDB .json, multithreaded, and see if there’s a difference before/after patch.

That sounds like a good thing to test. Does anyone have a suitable setup with such files ready that could run the tests? I don't really run Windows (other than VMs for testing things occasionally) or do development in it, I mostly try to cross compile things - and I haven't used clang-scan-deps before either.

@mstorsjo I think a good test for you would be to run clang-scan-deps on a large CDB .json, multithreaded, and see if there’s a difference before/after patch.

That sounds like a good thing to test. Does anyone have a suitable setup with such files ready that could run the tests? I don't really run Windows (other than VMs for testing things occasionally) or do development in it, I mostly try to cross compile things - and I haven't used clang-scan-deps before either.

I set this up and managed to get some numbers now.

As a test subject, I configured a build of llvm+clang, with llvm-mingw built with and without this patch. After configuring the build, I ran tim clang-scan-deps.exe --compilation-database=compile_commands.json > out 2> err (using https://github.com/sgraham/tim). Before the change, it consistently took real: 0m17.156s to run, while after the change it took around real: 0m23.703s. So this task became around 38% slower with this change - that's a notable slowdown.

I tried to look into what is causing so much slowdown. Overall, this patch adds one call to realPathFromHandle (if status was called on a file descriptor without a known path), which calls GetFinalPathNameByHandleW once, or twice if the buffer wasn't large enough. (We could resize the new buffer on line 792 to MAX_PATH to avoid needing to do this twice in most cases.) This incurs almost no extra runtime in my tests (perhaps most calls are with a path?). Then we call is_local_internal, which first calls GetVolumePathNameW followed by GetDriveTypeW. Here the absolute majority of the runtime seems to be in GetVolumePathNameW - by omitting the rest of is_local_internal and just doing one single call to GetVolumePathNameW, the runtime increases from 17 to 23 seconds.

So the calls to GetFinalPathNameByHandleW and GetDriveTypeW are irrelevant for performance here, and GetVolumePathNameW is the only thing that causes all the extra runtime cost added here. I don't see any great ways around that. (One may be tempted to manually reimplement GetVolumePathNameW to just get the first segment of the path - but that breaks the cases where volumes are mounted in the system hierarchy etc.)

With that in mind, are @aaron.ballman and @aganea still ok with going ahead and landing this?

An alternative would be to just skip the call to is_local_internal and entirely switch to the hash based logic and never use FileIndex. I guess that'd mostly work, but the current patch tries to retain the use of FileIndex where reasonable (which probably should work better whenever e.g. symlinks or hardlinks are in use - which aren't common but do exist).

aganea added a comment.EditedAug 18 2023, 4:32 AM

Another option could be to add a debug opt to disable this new code path, and skip all the canonicalization and reliability check. It could be one way or the other; either enabled by default or disabled by default. But probably better to enable it by default (this new code path) and let people add the flag if they need more performance.

Let me do some tests today with our codebase, see how much of an impact there is. I’d like also to see if this impacts overall Clang build times.

Hmm. This level of performance hit is not great - it means I will probably have to carry an internal patch to disable if we don't expose a compile time define or option.

I will try to test this in the coming week. It has just been busy with a lot of other things right now.

aaron.ballman requested changes to this revision.Aug 21 2023, 8:27 AM

Ouch, that is a sizeable compile time performance regression -- I think that's too much expense for everyone on Windows to pay given how often users are likely to hit the issues this solves. Removing my acceptance of the patch while we consider what to do.

This revision now requires changes to proceed.Aug 21 2023, 8:27 AM
aganea added a comment.EditedAug 21 2023, 10:41 AM

Unfortunately in our case the situation is even worse than what Martin was suggesting. My timings are 2x slower after this patch than without. Tested with a stage2 clang-cl built with ThinLTO and all optimizations on.

Build times for our game project, which is based on UE 5.2, on a 32c/64t Threadripper:

  • before patch: 8 min 13 sec
  • after patch: 15 min 20 sec

Same project, clang-scan-deps:

  • before patch: 1 min 48 sec
  • after patch: 3 min 47 sec

Even something as simple as that is slower after this patch:

PS C:\test> cat main.cpp
int main() { return 0; }

PS C:\test> Measure-Command { C:\before_patch\clang-cl.exe /c .\main.cpp }
...(omitted)...
TotalMilliseconds : 40.5117

PS C:\test> Measure-Command { C:\after_patch\clang-cl.exe /c .\main.cpp }
...(omitted)...
TotalMilliseconds : 57.4749

It seems one of the root issues is the call to GetVolumePathNameW inside is_local_internal, which is absolutely terrible. It generates 12 (!!!) separate kernel calls for each folder component in the path. The deeper your files are, the longer it takes to execute.
Also worth noting that Clang calls llvm::sys::status twice per file:

  • once in DiagnoseInputExistence.
  • then in FileSystemStatCache::get.

Also, these status calls are done on every single file or folder that Clang fiddles with - the sysroot folders, output files... Most of the time spent during our build is spent in the kernel (the darker blue part of the attached profile).

If we had a resident daemon process that was caching inputs during the entire build, this patch probably wouldn't be a problem. One option, like I was suggesting, would be to hide this new behavior behind a (disabled) option, and tell users about it. But I'm not sure how useful it would be.

Unfortunately in our case the situation is even worse than what Martin was suggesting. My timings are 2x slower after this patch than without. Tested with a stage2 clang-cl built with ThinLTO and all optimizations on.

Build times for our game project, which is based on UE 5.2, on a 32c/64t Threadripper:

  • before patch: 8 min 13 sec
  • after patch: 15 min 20 sec

Same project, clang-scan-deps:

  • before patch: 1 min 48 sec
  • after patch: 3 min 47 sec

Ouch, that's quite seriously bad. I had expected that the overhead would be noticeable in something like clang-scan-deps, but for actual compilation where most effort is spent on other, actually compute intensive things, that's really quite spectacular.

It seems one of the root issues is the call to GetVolumePathNameW inside is_local_internal, which is absolutely terrible. It generates 12 (!!!) separate kernel calls for each folder component in the path. The deeper your files are, the longer it takes to execute.

Oh, wow, that explains things...

One option like I was suggesting would be to hide this new behavior behind a (disabled) option, and tell users about it. But I'm not sure how useful it would be.

Yeah that's probably not too useful in the long run. When you hit this issue, you get extremely confusing error behaviours, to the point that I believe we can't diagnose and suggest the option to the user. I wouldn't expect any regular user to figure out to enable the option really. So whatever we do, it should probably work pretty much automatically.

You who might have more use of nontrivial Windows build scenarios - how much impact would it be to go all in on the new path canonicalization + hash approach, i.e. ditching the file index entirely? Performance wise I would believe that it would be no significant change to before. The only thing that matters probably is how it behaves wrt symlinks/hardlinks, if those are present and in use.

One option like I was suggesting would be to hide this new behavior behind a (disabled) option, and tell users about it. But I'm not sure how useful it would be.

Yeah that's probably not too useful in the long run. When you hit this issue, you get extremely confusing error behaviours, to the point that I believe we can't diagnose and suggest the option to the user. I wouldn't expect any regular user to figure out to enable the option really. So whatever we do, it should probably work pretty much automatically.

Agreed, I don't think an option really helps all that much.

You who might have more use of nontrivial Windows build scenarios - how much impact would it be to go all in on the new path canonicalization + hash approach, i.e. ditching the file index entirely? Performance wise I would believe that it would be no significant change to before. The only thing that matters probably is how it behaves wrt symlinks/hardlinks, if those are present and in use.

A related question for hashes -- what about case insensitivity of file paths and slash direction? Presumably we want F:\foo\Bar to be the same path as F:/FOO/BAR?

You who might have more use of nontrivial Windows build scenarios - how much impact would it be to go all in on the new path canonicalization + hash approach, i.e. ditching the file index entirely? Performance wise I would believe that it would be no significant change to before. The only thing that matters probably is how it behaves wrt symlinks/hardlinks, if those are present and in use.

A related question for hashes -- what about case insensitivity of file paths and slash direction? Presumably we want F:\foo\Bar to be the same path as F:/FOO/BAR?

Yes, that's something that certainly would come up a lot. The approach taken by this patch is that we run all paths through realPathFromHandle, which calls GetFinalPathNameByHandleW, https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfinalpathnamebyhandlew, which should return a canonicalized path name for this file.

You who might have more use of nontrivial Windows build scenarios - how much impact would it be to go all in on the new path canonicalization + hash approach, i.e. ditching the file index entirely? Performance wise I would believe that it would be no significant change to before. The only thing that matters probably is how it behaves wrt symlinks/hardlinks, if those are present and in use.

A related question for hashes -- what about case insensitivity of file paths and slash direction? Presumably we want F:\foo\Bar to be the same path as F:/FOO/BAR?

Yes, that's something that certainly would come up a lot. The approach taken by this patch is that we run all paths through realPathFromHandle, which calls GetFinalPathNameByHandleW, https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfinalpathnamebyhandlew, which should return a canonicalized path name for this file.

I guess the follow-up then is that we should check how GetFinalPathNameByHandleW behaves wrt potential symlinks and hardlinks - if it canonicalizes them as we'd like, then I guess it should be quite safe to always go with the hashing.

aganea added a comment.EditedAug 22 2023, 7:35 AM

You who might have more use of nontrivial Windows build scenarios - how much impact would it be to go all in on the new path canonicalization + hash approach, i.e. ditching the file index entirely? Performance wise I would believe that it would be no significant change to before. The only thing that matters probably is how it behaves wrt symlinks/hardlinks, if those are present and in use.

I guess we can try it, but we might be trading one problem for another. Fun stuff like this. We would need a rigorous test plan, since this can't be tested automatically. And a strategy for applying the test plan since a single individual might not have access to all hardware resources to execute the test plan (perhaps sharing the testing across several individuals).

I'd like to suggest again the solution I put forward in D146490, which could only be enabled on a case-by-case basis, for short-lived apps like Clang or LLD. Other, or long-lived applications like ClangD would still use the current (unreliable) trunk behavior for IDs. At least that shovels all the OS/drivers problems to the OS, and leaves us with the optimization problem (of open handles), which could be manageable.

We can also try to improve your patch somehow:

It seems one of the root issues is the call to GetVolumePathNameW inside is_local_internal, which is absolutely terrible. It generates 12 (!!!) separate kernel calls for each folder component in the path. The deeper your files are, the longer it takes to execute.

Correction, the issue is with GetDriveTypeW not GetVolumePathNameW. I realized that they do this because of reparse points, and because internally they call other APIs that each opens the path and checks for reparse points, again. For example, I don't understand why internally they call RegSetValueExW? (inside GetDriveTypeW) I wonder if there could be a different way to find whether the file is on a network drive or not.

Also worth noting that Clang calls llvm::sys::status twice per file:

  • once in DiagnoseInputExistence.
  • then in FileSystemStatCache::get.

We could also fix this, which would maybe dampen the impact of this patch.

You who might have more use of nontrivial Windows build scenarios - how much impact would it be to go all in on the new path canonicalization + hash approach, i.e. ditching the file index entirely? Performance wise I would believe that it would be no significant change to before. The only thing that matters probably is how it behaves wrt symlinks/hardlinks, if those are present and in use.

I guess we can try it, but we might be trading one problem for another. Fun stuff like this.

Oh, interesting. However, I don't think that seems like a big blocker for us - if VOLUME_NAME_NT works reliably across file systems, we could just use that. We don't need a filename to display anywhere here, we just need a canonical string representation for a file.

We would need a rigorous test plan, since this can't be tested automatically. And a strategy for applying the test plan since a single individual might not have access to all hardware resources to execute the test plan (perhaps sharing the testing across several individuals).

I don't quite see why such a rigorous test plan would be needed here, compared to how we do most other things? If it seems to work reasonably in the cases we've got access to testing and we don't know of any other specific problematic cases, we could land it, and expect to hear back if it causes something else?

I'd like to suggest again the solution I put forward in D146490, which could only be enabled on a case-by-case basis, for short-lived apps like Clang or LLD. Other, or long-lived applications like ClangD would still use the current (unreliable) trunk behavior for IDs. At least that shovels all the OS/drivers problems to the OS, and leaves us with the optimization problem (of open handles), which could be manageable.

I'm quite hesitant about that solution - it's complex and increases resource consumption - and keeping files open longer than necessary always causes issues, especially on Windows. Admittedly, it's not that big of an issue for short-lived processes. Having separate solutions for short-lived and long-lived processes doesn't feel too great though.

It seems one of the root issues is the call to GetVolumePathNameW inside is_local_internal, which is absolutely terrible. It generates 12 (!!!) separate kernel calls for each folder component in the path. The deeper your files are, the longer it takes to execute.

Correction, the issue is with GetDriveTypeW not GetVolumePathNameW. I realized that they do this because of reparse points, and because internally they call other APIs that each opens the path and checks for reparse points, again. For example, I don't understand why internally they call RegSetValueExW? (inside GetDriveTypeW) I wonder if there could be a different way to find whether the file is on a network drive or not.

Hmm, in my testing before, it was definitely GetVolumePathNameW that was slow.

FWIW, I tested symlinks and hardlinks with Clang with this patch. Symlinks do get resolved to the target file, so the path hashing approach works just as well as the file index there. For hard links, path hashing is inferior to using the file index though - the file index knows that two different paths are the same file, but GetFinalPathNameByHandleW doesn't canonicalize them to one single name (which is the correct behaviour for hard links - all names are equal and independent of each other).

If we consider hard links on Windows something we don't need to cater for, we could use unconditional path hashing, combined with canonicalization with VOLUME_NAME_NT if the default VOLUME_NAME_DOS is problematic, and that should be quite performant.

mstorsjo updated this revision to Diff 555711.Sep 4 2023, 5:14 AM

Changed to always use a hashed path instead of nFileIndexHigh/nFileIndexLow, without any heuristics for when to do this. With this, I'm measuring a slowdown of around 1% in the test with clang-scan-deps.

How does this behave in your testcase @aganea?

I applied the suggestion regarding wonky file system drivers from Rust, using VOLUME_NAME_NT, which works with such drivers too (tested with ImDisk, and confirmed that the default VOLUME_NAME_DOS fails there). We don't need to fall back between or test different formats here, as we're not interested in the actual contents of the canonicalized path (we're not using it for anything); we're just interested in its identity.

For the potential case if GetFinalPathNameByHandleW still fails with VOLUME_NAME_NT, fall back on the old behaviour of using nFileIndexHigh/nFileIndexLow.

The only known drawback in this case is that we won't be able to notice two names pointing at the same hardlinked file - but I believe that's a compromise we can afford to make on Windows (where hardlinks generally are rare).

aganea added a comment.Sep 6 2023, 7:38 AM

How does this behave in your testcase @aganea?

I don't see any difference between this patch and the baseline, so that's good news.

There are two failing tests however (related to hard links):

FAIL: LLVM-Unit :: Support/./SupportTests.exe/40/82 (5085 of 72863)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/40/82' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\git\llvm-project\stage2_rpmalloc\unittests\Support\.\SupportTests.exe-LLVM-Unit-27536-40-82.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=82 GTEST_SHARD_INDEX=40 C:\git\llvm-project\stage2_rpmalloc\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\git\llvm-project\stage2_rpmalloc\unittests\Support\.\SupportTests.exe --gtest_filter=FileSystemTest.Unique
--
Test Directory: C:\Users\AGANEA~1\AppData\Local\Temp\lit-tmp-_2_vvve2\file-system-test-e7decf
C:\git\llvm-project\llvm\unittests\Support\Path.cpp(716): error: Expected equality of these values:
  D2
    Which is: 16-byte object <48-04 40-84 00-00 00-00 C6-FC EA-25 9F-15 C7-9E>
  F1
    Which is: 16-byte object <48-04 40-84 00-00 00-00 E1-BF 0D-B1 58-7A 27-07>

C:\git\llvm-project\llvm\unittests\Support\Path.cpp:716
Expected equality of these values:
  D2
    Which is: 16-byte object <48-04 40-84 00-00 00-00 C6-FC EA-25 9F-15 C7-9E>
  F1
    Which is: 16-byte object <48-04 40-84 00-00 00-00 E1-BF 0D-B1 58-7A 27-07>


********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: LLVM-Unit :: Support/./SupportTests.exe/45/82 (58768 of 72863)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/45/82' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\git\llvm-project\stage2_rpmalloc\unittests\Support\.\SupportTests.exe-LLVM-Unit-27536-45-82.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=82 GTEST_SHARD_INDEX=45 C:\git\llvm-project\stage2_rpmalloc\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\git\llvm-project\stage2_rpmalloc\unittests\Support\.\SupportTests.exe --gtest_filter=FileSystemTest.TempFiles
--
Test Directory: C:\Users\AGANEA~1\AppData\Local\Temp\lit-tmp-_2_vvve2\file-system-test-de39a5
C:\git\llvm-project\llvm\unittests\Support\Path.cpp(914): error: Value of: equal
  Actual: false
Expected: true
C:\git\llvm-project\llvm\unittests\Support\Path.cpp(917): error: Value of: fs::equivalent(A, B)
  Actual: false
Expected: true

C:\git\llvm-project\llvm\unittests\Support\Path.cpp:914
Value of: equal
  Actual: false
Expected: true
C:\git\llvm-project\llvm\unittests\Support\Path.cpp:917
Value of: fs::equivalent(A, B)
  Actual: false
Expected: true


********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM-Unit :: Support/./SupportTests.exe/FileSystemTest/TempFiles
  LLVM-Unit :: Support/./SupportTests.exe/FileSystemTest/Unique


Testing Time: 654.15s
  Skipped          :    56
  Unsupported      :  2452
  Passed           : 90408
  Expectedly Failed:   198
  Failed           :     2
mstorsjo updated this revision to Diff 556237.Sep 8 2023, 1:48 AM

Exclude tests that test equality of hardlinks, making it clearer that this is a known intentional missing feature here.

aganea accepted this revision.Sep 8 2023, 4:05 AM

LGTM, thank you Martin!

LGTM, thank you Martin!

Great that it looks good to you now, thanks!

As @aaron.ballman put in a negative review before, on the version with massive performance issues, do you want to have a look at this again now? If everyone is good with it, I'd consider landing it maybe some time next week. And this is clearly way too disruptive to even consider backporting indeed; having it cook in git main until 18.x seems good.

aaron.ballman accepted this revision.Sep 11 2023, 1:17 PM

LGTM! I am a bit uncomfortable about the hardlink behavior (it seems plausible that some folks may use hardlinks to specify paths to include to things like the Windows SDK, so we could run into them). I think landing this early in the 18.x cycle gives us quite a bit of bake time to see if there's fallout in practice, but we should definitely watch the issues list once this lands and again once we start putting out 18.x rcs.

I think this probably should come with a release note, 1) to talk about the fix and 2) to call out the behavioral change with hardlinks

llvm/unittests/Support/Path.cpp
714 ↗(On Diff #556237)

Would this make sense?

917 ↗(On Diff #556237)

same here as above

This revision is now accepted and ready to land.Sep 11 2023, 1:17 PM
mstorsjo added inline comments.Sep 11 2023, 1:21 PM
llvm/unittests/Support/Path.cpp
714 ↗(On Diff #556237)

I guess that makes sense, although it’s a bit problematic to fix (without trading in the other bugs we’re trying to fix here). I hope that if someone tries to fix it, that they check up on why we ended up here.

LGTM! I am a bit uncomfortable about the hardlink behavior (it seems plausible that some folks may use hardlinks to specify paths to include to things like the Windows SDK, so we could run into them).

Just another note on this; hardlinks generally still work - the kinda unusual scenario that would break is if you refer to the same file with different path names and rely on LLVM to deduplicate those files for you. I honestly believe that case to be kinda rare. The cases where we really want deduplication to happen for files with distinct paths feel kinda rare to me, in general. I.e. I think false negatives for distinct path names aren't that bad. (False negatives for the same path name would be bad though. And with the previous implementation with FileIndex on unusual file systems, that would also be possible.) On the other hand, any case of a false positive for equivalent() for actually distinct files is totally fatal, and produces extremely confusing errors.

I think landing this early in the 18.x cycle gives us quite a bit of bake time to see if there's fallout in practice, but we should definitely watch the issues list once this lands and again once we start putting out 18.x rcs.

I think this probably should come with a release note, 1) to talk about the fix and 2) to call out the behavioral change with hardlinks

Yep - I'm updating the commit message for the final solution we settled on, and adding context about other solutions that were considered, and writing a release note for it, and I'll go ahead and land it later today.

mstorsjo updated this revision to Diff 556534.Sep 12 2023, 1:26 AM

Updated with the suggested FIXME markings, and added a release note for it. (I placed the release note in the section for "Changes to the Windows Target", even if this isn't about code generation for a Windows target, but about LLVM running on a Windows host. Instead of creating a separate section for this, I guessed that this reaches the target audience anyway.)

mstorsjo retitled this revision from [Windows] Avoid using FileIndex for unique IDs on network mounts to [Windows] Avoid using FileIndex for unique IDs.Sep 12 2023, 1:30 AM
mstorsjo edited the summary of this revision. (Show Details)
aaron.ballman added inline comments.Sep 12 2023, 5:18 AM
llvm/docs/ReleaseNotes.rst
123–124 ↗(On Diff #556534)

Please add a link to the issues this resolves (that helps folks connect the dots as to why the changes happened in the first place).

mstorsjo updated this revision to Diff 556554.Sep 12 2023, 5:21 AM
mstorsjo edited the summary of this revision. (Show Details)

Add references to the fixed bugs in the release notes.

This revision was landed with ongoing or failed builds.Sep 12 2023, 12:12 PM
This revision was automatically updated to reflect the committed changes.