This patch switches CanonicalInclude mappings to use llvm::sys::fs::UniqueID for a stable file representation because the FileEntry::getName() results turn out to be changing throughout the lifetime of a program (exposed in D120306). This patch makes it possible for D120306 to be re-landed and increases overall stability.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This is required for D120306 which was landed but later reverted because of the failing Windows tests (different slash types for filenames requested during preamble parsing and in the main file).
note that you seem to be using auto PublicHeader = CanonIncludes.mapHeader(Entry->getName()); (i.e. getName) in D120306 and not tryGetRealPathName. it might be nice to have a test case here, at least one that only fails/runs on windows
Right! I think I also need to do that in SymbolCollector.cpp.
Now that you mentioned this too, it feels like the issue is not about being consistent, as I think all of the places that we have today are actually making use of FileEntry::getName as keys to CanonicalIncludes::{addMapping,mapHeader}. Are we sure we're not relying on some sort of canonization of file path (AFAICT, that's not the case the only canonization that's happening is dropping ..) when you use tryGetRealPathName instead of getName?
Right, I think the name of the patch is misleading. "Consistent" means "consistency across different SM/FMs" rather than consistency across the call sites.
Changing the strings here makes sense, but I'm confused by the use of tryGetRealPathName.
Also, we ended up dropping the filename in favor of UniqueID in IncludeStructure, so that's an alternative here.
Yes, I think we need a test here with a scope wide enough that it tests that the callsites of addMapping/getMapping line up.
clang-tools-extra/clangd/index/CanonicalIncludes.cpp | ||
---|---|---|
74 | offline we had discussed getName rather than tryGetRealPathName instead. The comment in HeaderStructure says specifically that we avoid tryGetRealPathName because it's not preserved in the preamble. In any case we should have a comment somewhere on CanonicalIncludes about how we're identifying files, and why it's stable enough. The lack of such a comment is really what led to the bug on friday :-) | |
74 | tryGetRealPathName can be the empty string, what do we do in such cases? (In practice I think this is going to happen in the case where you have a non-preamble #include of a header-guarded file which was also part of the preamble) |
clang-tools-extra/clangd/index/CanonicalIncludes.cpp | ||
---|---|---|
74 | Oh, okay, I thought tryGetRealPathName would not be empty here :( The problem with getName is that it turned out to produce the same results that we already get right now through SourceManager. That leaves us with other ways of canonicalization: either through conversions to the native path or FileManager::getCanonicalName(). Both are expensive and I think the problem is that though the mapping here would be OK to generate in an expensive manner, the callsites would not be very happy with that but I guess it's acceptable. |
clang-tools-extra/clangd/index/CanonicalIncludes.h | ||
---|---|---|
39–40 | nit: no longer a string-to-string mapping | |
46 | prefer FileEntryRef over FileEntry*, so the name we're looking up is controlled by the caller rather than an ~arbitrary choice in case of symlinks | |
clang-tools-extra/clangd/unittests/CanonicalIncludesTests.cpp | ||
1 | Can you add a test for the new functionality? that the mapping doesn't depend on the name Use the hard-link function in InMemoryFS to create multiple names with one UID | |
60 | why the pointer? can't this just be FileManager Files(...) |
Just nits
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
359 | getLastRef().getName() is just the same as Entry->getName(). (Which is actually still not preserving the FERef under the hood, but will soon...) | |
clang-tools-extra/clangd/index/CanonicalIncludes.h | ||
40 | no const, FileEntryRef is passed by value (and below) | |
clang-tools-extra/clangd/index/SymbolCollector.cpp | ||
384–385 | again, avoid getLastRef | |
clang-tools-extra/clangd/unittests/CanonicalIncludesTests.cpp | ||
26 | no need to take ownership, just take InMemoryFileSystem&? | |
clang-tools-extra/clangd/unittests/SymbolCollectorTests.cpp | ||
1594 | avoid getLastRef |
Looks like this breaks tests on Windows: http://45.33.8.238/win/55690/step_9.txt
please take a look, etc :)
getLastRef().getName() is just the same as Entry->getName().
We should be using SM.getFileEntryRefForID instead.
(Which is actually still not preserving the FERef under the hood, but will soon...)