This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Use stable keys for CanonicalIncludes mappings
ClosedPublic

Authored by kbobyrev on Apr 4 2022, 7:09 AM.

Details

Summary

This patch switches CanonicalInclude mappings to use llvm::sys::fs::UniqueID for a stable file representation because the FileEntry::getName() results turn out to be changing throughout the lifetime of a program (exposed in D120306). This patch makes it possible for D120306 to be re-landed and increases overall stability.

Diff Detail

Event Timeline

kbobyrev created this revision.Apr 4 2022, 7:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2022, 7:09 AM
kbobyrev requested review of this revision.Apr 4 2022, 7:09 AM
kbobyrev updated this revision to Diff 420175.Apr 4 2022, 7:24 AM

Fix the behavior: get Real Path instead.

This is required for D120306 which was landed but later reverted because of the failing Windows tests (different slash types for filenames requested during preamble parsing and in the main file).

note that you seem to be using auto PublicHeader = CanonIncludes.mapHeader(Entry->getName()); (i.e. getName) in D120306 and not tryGetRealPathName. it might be nice to have a test case here, at least one that only fails/runs on windows

note that you seem to be using auto PublicHeader = CanonIncludes.mapHeader(Entry->getName()); (i.e. getName) in D120306 and not tryGetRealPathName. it might be nice to have a test case here, at least one that only fails/runs on windows

Right! I think I also need to do that in SymbolCollector.cpp.

Right! I think I also need to do that in SymbolCollector.cpp.

Now that you mentioned this too, it feels like the issue is not about being consistent, as I think all of the places that we have today are actually making use of FileEntry::getName as keys to CanonicalIncludes::{addMapping,mapHeader}. Are we sure we're not relying on some sort of canonization of file path (AFAICT, that's not the case the only canonization that's happening is dropping ..) when you use tryGetRealPathName instead of getName?

Right! I think I also need to do that in SymbolCollector.cpp.

Now that you mentioned this too, it feels like the issue is not about being consistent, as I think all of the places that we have today are actually making use of FileEntry::getName as keys to CanonicalIncludes::{addMapping,mapHeader}. Are we sure we're not relying on some sort of canonization of file path (AFAICT, that's not the case the only canonization that's happening is dropping ..) when you use tryGetRealPathName instead of getName?

Right, I think the name of the patch is misleading. "Consistent" means "consistency across different SM/FMs" rather than consistency across the call sites.

Changing the strings here makes sense, but I'm confused by the use of tryGetRealPathName.
Also, we ended up dropping the filename in favor of UniqueID in IncludeStructure, so that's an alternative here.

Yes, I think we need a test here with a scope wide enough that it tests that the callsites of addMapping/getMapping line up.

clang-tools-extra/clangd/index/CanonicalIncludes.cpp
74

offline we had discussed getName rather than tryGetRealPathName instead.

The comment in HeaderStructure says specifically that we avoid tryGetRealPathName because it's not preserved in the preamble.
I suspect that this will give incorrect results in cases involving non-preamble #includes.

In any case we should have a comment somewhere on CanonicalIncludes about how we're identifying files, and why it's stable enough. The lack of such a comment is really what led to the bug on friday :-)

74

tryGetRealPathName can be the empty string, what do we do in such cases?

(In practice I think this is going to happen in the case where you have a non-preamble #include of a header-guarded file which was also part of the preamble)

kbobyrev added inline comments.Apr 4 2022, 8:20 AM
clang-tools-extra/clangd/index/CanonicalIncludes.cpp
74

Oh, okay, I thought tryGetRealPathName would not be empty here :(

The problem with getName is that it turned out to produce the same results that we already get right now through SourceManager.

That leaves us with other ways of canonicalization: either through conversions to the native path or FileManager::getCanonicalName(). Both are expensive and I think the problem is that though the mapping here would be OK to generate in an expensive manner, the callsites would not be very happy with that but I guess it's acceptable.

kbobyrev updated this revision to Diff 420423.Apr 5 2022, 2:38 AM

Switch to stable file UniqueIDs.

kbobyrev marked 2 inline comments as done.Apr 5 2022, 2:38 AM
kbobyrev retitled this revision from [clangd] Use consistent header paths in CanonicalIncludes mappings to [clangd] Use stable keys for CanonicalIncludes mappings.Apr 5 2022, 2:40 AM
kbobyrev edited the summary of this revision. (Show Details)
kbobyrev updated this revision to Diff 420425.Apr 5 2022, 2:45 AM

Rebase on top of main.

kbobyrev updated this revision to Diff 420426.Apr 5 2022, 2:48 AM

Rebase correctly.

sammccall added inline comments.Apr 5 2022, 2:58 AM
clang-tools-extra/clangd/index/CanonicalIncludes.h
39–40

nit: no longer a string-to-string mapping

46

prefer FileEntryRef over FileEntry*, so the name we're looking up is controlled by the caller rather than an ~arbitrary choice in case of symlinks

clang-tools-extra/clangd/unittests/CanonicalIncludesTests.cpp
1

Can you add a test for the new functionality? that the mapping doesn't depend on the name

Use the hard-link function in InMemoryFS to create multiple names with one UID

60

why the pointer? can't this just be FileManager Files(...)

kbobyrev updated this revision to Diff 420462.Apr 5 2022, 5:25 AM
kbobyrev marked 4 inline comments as done.

Address review comments.

sammccall accepted this revision.Apr 5 2022, 5:56 AM

Just nits

clang-tools-extra/clangd/IncludeCleaner.cpp
359 ↗(On Diff #420462)

getLastRef().getName() is just the same as Entry->getName().
We should be using SM.getFileEntryRefForID instead.

(Which is actually still not preserving the FERef under the hood, but will soon...)

clang-tools-extra/clangd/index/CanonicalIncludes.h
40

no const, FileEntryRef is passed by value (and below)

clang-tools-extra/clangd/index/SymbolCollector.cpp
384–385

again, avoid getLastRef

clang-tools-extra/clangd/unittests/CanonicalIncludesTests.cpp
26

no need to take ownership, just take InMemoryFileSystem&?

clang-tools-extra/clangd/unittests/SymbolCollectorTests.cpp
1594

avoid getLastRef

This revision is now accepted and ready to land.Apr 5 2022, 5:56 AM
kbobyrev updated this revision to Diff 420500.Apr 5 2022, 7:23 AM
kbobyrev marked 5 inline comments as done.

Fix the test case.

This revision was landed with ongoing or failed builds.Apr 5 2022, 7:28 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Apr 5 2022, 9:01 AM

Looks like this breaks tests on Windows: http://45.33.8.238/win/55690/step_9.txt

please take a look, etc :)