This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Update symbol collector to use include-cleaner.
ClosedPublic

Authored by VitaNuo on Jun 14 2023, 5:21 AM.

Diff Detail

Event Timeline

VitaNuo created this revision.Jun 14 2023, 5:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2023, 5:21 AM
VitaNuo updated this revision to Diff 531275.Jun 14 2023, 5:29 AM

Remove redundant variables.

VitaNuo updated this revision to Diff 531278.Jun 14 2023, 5:33 AM

Remove redundancy.

VitaNuo published this revision for review.Jun 14 2023, 5:38 AM
VitaNuo added a reviewer: kadircet.
kadircet added inline comments.Jun 16 2023, 1:45 AM
clang-tools-extra/clangd/index/SymbolCollector.cpp
413–415

sorry i don't understand what this logic is trying to achieve.

we seem to be prefering "first" exporting header, over the headers that directly provide the symbol. Also comment says it's done for objc, but there's nothing limiting this logic to ObjC. any chance this is unintended?

798

s/ASTCtx->getSourceManager()/SM

883

getIncludeHeader might return an empty string

893

this is using legacy mappings for non-objc symbols too

896

you're iterating over multiple headers, and only keeping references for the last one alive.

instead of storing these in the class state, you can have a llvm::DenseMap<Header, std::string> HeaderToSpelling; inside this function and later on just do:

for(const auto& H: It->second) {
  auto [SpellingIt, Inserted] = HeaderToSpelling.try_emplace(H);
  if(Inserted) {
      switch (...) { /* Update *SpellingIt */ }
  }
  NewSym.IncludeHeaders.push_back(*SpellingIt, 1, Directives);
902–904

we actually only want to use HeaderFileURIs->toURI here and not the getIncludeHeader, because:

  • We want to make use of mapping logic in include-cleaner solely, and we already have all that information in Providers mapping.
  • The extra logic we want to apply for ObjC symbols are already applied before reaching here.

the way we create a FileID here is a little bit iffy, by using toURI, hopefully we can avoid that conversion.

the only remaining issue is, we actually still want to use system header mappings inside CanonicalIncludes until we have better support for them in include-cleaner library itself. So I think we should still call Includes->mapHeader(H.physical()) before using toURI.

clang-tools-extra/clangd/index/SymbolCollector.h
129

unfortunately this alternative is called after we've parsed the file, from clangd/index/FileIndex.cpp, indexSymbols. hence attaching PI to PPCallbacks won't have any effect.

This is working unintentionally because we're still populating CanonicalIncludes with legacy IWYUPragmaHandler in clangd, and using that to perform private -> public mappings.

We should instead propagate the PragmaIncludes we have in ParsedAST and in preamble builds here. You should be able to test the new behaviour clang-tools-extra/clangd/unittests/FileIndexTests.cpp to make sure pragma handling through dynamic indexing code paths keep working as expected using an export pragma.

kadircet added inline comments.Jun 16 2023, 1:45 AM
clang-tools-extra/clangd/index/SymbolCollector.cpp
850

you can take this by value and std::move into the map

VitaNuo updated this revision to Diff 532964.Jun 20 2023, 9:30 AM
VitaNuo marked 8 inline comments as done.

Address review comments.

Thanks for the comments!

clang-tools-extra/clangd/index/SymbolCollector.cpp
413–415

No, this was intended, but possibly I didn't get it right.

comment says it's done for objc, but there's nothing limiting this logic to objc

AFAIU this whole code path will now only be reachable for objc symbols. I have removed pragma includes from the canonical include mapping now, leaving the pragma handling to include cleaner. However, that only triggers for C/C++, so in case we need the export and private pragmas for objc, we need to re-insert the handling for them somewhere.
There is no pre-existing test case for pragmas in objc and there is no usage of these pragmas for objc code in the code base either, so I have no way of knowing if this is actually needed. But I believe you've mentioned in some discussion we should still handle the pragmas for objc.

we seem to be preferring "first" exporting header, over the headers that directly provide the symbol.

Isn't that correct if there's an IWYU export pragma involved? The snippet comes from include_cleaner::findHeaders, with the only difference that it stops at the first exporter (since the canonical include mapping also just stored one mapping for the header).

Let me know how to do it better, or maybe if this is necessary at all. Honestly, I am not sure about this, since, as mentioned, there are no export or private/public pragmas in objc files in the codebase atm.

893

Removed this whole passage.

902–904

Ok SGTM. It seems, though, that we still need the "iffy" FID for the canonical mapping.. Let me know if you know a better way.

clang-tools-extra/clangd/index/SymbolCollector.h
129

Ok thank you. I'm using PragmaIncludes from AST and preamble builds for the dynamic index now.

For the background index, I've finally managed to move pragma recording to the IndexAction now.

I've also removed the redundant (to include cleaner) comment handlers from the AST build and preamble building logic.

Also removed the mapSymbol method from CanonicalIncludes, since it had one usage which should now be covered by the include cleaner, I believe.

kadircet added inline comments.Jun 22 2023, 5:47 AM
clang-tools-extra/clangd/ParsedAST.cpp
665

can you also add a FIXME here saying that we should attach PragmaInclude callbacks to main file ast build too?

this is not a recent regression as we were not doing it properly in previous version either. IWYU handler we attach only cares about private pragmas, which don't mean much inside the main file. but we actually need it for other pragmas like keep/export/no_include etc. (no need to do it in this patch as it's already getting quite big, let's do that as a follow up)

clang-tools-extra/clangd/Preamble.h
121–123

you need to rebase as we had some big changes around this logic recently. we should pass the PragmaIncludes also as a shared_ptr (and store as one inside the preambledata)

clang-tools-extra/clangd/index/CanonicalIncludes.h
34–35

AFAICT, only users of this endpoint were in IWYUCommentHandler (and some tests). we can also get rid of this one now.

More importantly now this turns into a mapper used only by symbolcollector, that can be cheaply created at use time (we just copy around a pointer every time we want to create it). so you can drop CanonicalIncludes from all of the interfaces, including SymbolCollector::Options and create one on demand in HeaderFileURICache. all we need is langopts, and it's available through preprocessor (not necessarily on construction, but at the time when we want to do the mappings).

As you're already touching all of the interfaces that propagate CanonicalIncludes around, hopefully this change should only make things simpler (and you already need to touch them when you rebase), but if that feels like too much churn in this patch feel free to do that in a follow up as well.

35–38

let's mention that this will always return a verbatim spelling, e.g. with quotes or angles.

50–51

you can drop this one too, only addMapping would introduce mappings here, but that's going away too.

clang-tools-extra/clangd/index/SymbolCollector.cpp
255

let's change the interface here to take in a FileEntry instead.

256

nit: early exits

if (!SysHeaderMapping)
  return "";
auto Canonical = ...;
if(Canonical.empty())
  return "";
...
263

we should never hit this code path anymore. so feel free to convert the if above to an assert

413–415

existing IWYUCommentHandler only handled private pragmas, so export pragmas were already dropped on the floor, we shouldn't introduce handling for them in here now. the mappings you perform for private/public pragmas and system headers below is enough.

(P.S. using exporters as "alternatives" vs "sole provider" are quite different, policy in include_cleaner does the former whereas logic in here does the latter)

417–418

let's move this logic into mapCanonical too.

420

nit: if (auto Canonical = ...; !Canonical.empty()) return Canonical;

849

what about merging this one and setIncludeLocation ?

e.g:

void SymbolCollector::setIncludeLocation(const Symbol &IdxS, SourceLocation DefLoc, const include_cleaner::Symbol &Sym) {
if (!Opts.CollectIncludePath ||
      shouldCollectIncludePath(S.SymInfo.Kind) == Symbol::Invalid)
  return;
IncludeFiles[SID] = ...;
SymbolProviders[SID] = headersForSymbol(...);
}
852

because you're taking in Headers as const, this move is not actually moving.

873–874

let's iterate over SymbolProviders instead, as IncludeFiles is a legacy struct now.

873–874

inside of this branch is getting crowded, maybe early exit instead?

const Symbol *S = Symbols.find(SID);
if(!S) continue;
...
887

no need to call insert if we didn't modify IncludeHeaders, so maybe reflow to something like:

if (Directives & Import) {
  if(auto IncHeader = ..; !IncHeader.empty()) {
     auto NewSym = *S;
     NewSym.IncludeHeaders....;
     Symbols.insert(NewSym);
  }
  // FIXME: Use providers from include-cleaner once it's polished for ObjC too.
  continue;
}
893

s/if/assert

909

no need for the getOrCreateFileID after you change parameter for mapCanonical to be a file entry.

911

nit:

if(auto Canonical = mapCanonical(H.physical()); !Canonical.empty())
  SpellingIt->second = Canonical;
913

this should be else if. if we performed a mapping, we no longer care about self-containedness of original header.

914

s/PP->getSourceManager()/SM

917

s/if (SpellingIt->second.empty())/else

clang-tools-extra/clangd/index/SymbolCollector.h
170–172

we seem to be populating this map for all kinds of symbols, as long as shouldCollectIncludePath returns non-invalid. what's up with distinguishing imports from includes?

183

no need to have this in the class state, you can just inline this into ::finish method instead.

187

let's move this closer to the SymbolProviders map

197

we already store this in Opts, can we just use it from there instead of creating an extra name for it?

kadircet added inline comments.Jun 23 2023, 4:37 AM
clang-tools-extra/clangd/index/SymbolCollector.cpp
849

also sorry for missing this so far, but i think we should limit this to first provider initially. as otherwise we'll have providers that just accidentally declare certain symbols and we don't want them to be inserted (unless they're the only provider and end up at the top) just because our ranking heuristics fail or something.

VitaNuo updated this revision to Diff 535740.Jun 29 2023, 5:28 AM
VitaNuo marked 27 inline comments as done.

Address review comments.

Thanks for comments!

clang-tools-extra/clangd/index/CanonicalIncludes.h
34–35

... and create one on demand in HeaderFileURICache. all we need is langopts, and it's available through preprocessor (not necessarily on construction, but at the time when we want to do the mappings

This time sounds a little late, since we don't want to rebuild the system header mapping every time we request a mapping for a particular header.
Cache construction time doesn't work, since it happens before the preprocessor is set in the symbol collector (and we need the preprocessor to retrieve language options).
So far the safe place I've found is right after the preprocessor is set in the symbol collector. Another option is to collect the system header options in the finish() method, but I can't see why we would need to delay until then.

clang-tools-extra/clangd/index/SymbolCollector.cpp
255

I believe it has to be FileEntryRef instead. This is what the CanonicalIncludes::mapHeader expects, and this is also seems right (somewhere in the Clang docs I've read "everything that needs a name should use FileEntryRef", and name is exactly what CanonicalIncludes::mapHeader needs).

For the case of a physical header (i.e., FileEntry instance) I've switched to using getLastRef now. There's also another API FileManager::getFileRef where I could try passing the result of tryGetRealPathName. I am not sure I have enough information to judge if any of these options is distinctly better.

417–418

This would deliver wrong results for C++. We would basically get every IWYU-public header twice: once through the verbatim branch of the include_cleaner results, and then once again when mapping the physical private file to a canonical version (e.g., for system headers).

852

Ok, this is not relevant anymore, but I am not sure I understand why. Am I not allowed to say that I don't need this variable/memory anymore, even though it's const?

873–874

Not sure I'm following. Iterating over SymbolProviders means retrieving include_cleaner::Headers. How would I handle the entire Obj-C part then, without the FID of the include header?

909

Still need the getLastRef or FileManager::getFileRef, if I understand correctly.

clang-tools-extra/clangd/index/SymbolCollector.h
170–172

Yeah that's right. I've been trying to express that (after populating this map) we are not using this FID to compute the IncludeHeaders anymore (only for the ObjC case), but rather use it to determine if the Directives should contain Symbol::Import (ie., if this FID contains Obj-C constructs/imports).

I'll try to be a bit more verbose and precise.

kadircet added inline comments.Jul 10 2023, 7:29 AM
clang-tools-extra/clangd/ClangdServer.cpp
86

nit: PI(std::move(PI))

clang-tools-extra/clangd/TUScheduler.cpp
1085

nit: s/PI/std::move(PI)

clang-tools-extra/clangd/index/CanonicalIncludes.cpp
671–672

we're no longer using any FileEntry pieces of this parameter you can take in just a llvm::StringRef HeaderPath instead.

clang-tools-extra/clangd/index/CanonicalIncludes.h
34–35

This time sounds a little late, since we don't want to rebuild the system header mapping every time we request a mapping for a particular header.

System header mapping is a singleton, we build it statically once on first request throughout the whole process and just use that for the rest of the execution.

clang-tools-extra/clangd/index/FileIndex.cpp
52

again you can have a const include_cleaner::PragmaIncludes & without shared_ptr here.

clang-tools-extra/clangd/index/FileIndex.h
119

you can have a const include_cleaner::PragmaIncludes& PI directly both in here and in indexHeaderSymbols.

after building a preamble we always have a non-null PragmaIncludes and these functions are always executed synchronously, so no need to extend ownership and make the code indirect.

clang-tools-extra/clangd/index/IndexAction.cpp
230–235

you can have a unique_ptr here instead (once we store a non-owning pointer in Opts.PragmaIncludes)

clang-tools-extra/clangd/index/SymbolCollector.cpp
255

yeah getLastRef is fine, i was mostly trying to avoid dodgy call to getOrCreateFileID.

you can also pass in just a llvm::StringRef HeaderPath now, as we're not using anything but file path. which you can retreive with getName from both FileEntry and FileEntryRef

412

s/SM.getFileEntryForID(FID)/FE

797

you can actually change include_cleaner::Macro to have a const IdentifierInfo* instead of the const_cast here. it's mostly an oversight to store a non-const IdentifierInfo in the first place.

841

once we have the optional<Header> as the value type you can also rewrite this as:

auto [It, Inserted] = SymbolProviders.try_emplace(S.ID);
if (Inserted) {
  auto Providers = include_cleaner::headersForSymbol(Sym, SM, Opts.PragmaIncludes.get());
  if(!Providers.empty()) It->second = Providers.front();
}

that way we can get rid of some redundant calls to headersForSymbol

852

std::move is unfortunately a little bit misleading at times. it is just a cast that returns an rvalue reference. hence on its own it doesn't actually move anything.

but it enables picking rvalue versions of constructors/assignment operators. (e.g. Foo(Foo&&) or operator=(Foo&&)) which can be implemented efficiently with the assumption that RHS object is being "moved from" and will no longer be used. that way certain types implement a move semantics by just consuming the RHS object. unfortunately that is not possbile when RHS is const, and expressions use regular copy semantics instead.

873–874

we're joining IncludeFiles and SymbolProviders, as they have the same keys.

right now you're iterating over the keys in IncludeFiles and doing a lookup into SymbolProviders using that key to get providers.
we can also do the iteration over SymbolProviders and do the lookup into IncludeFiles instead to find associated FID.

901

you can move the copy into this branch to make sure we're only doing that when necessary, e.g:

if(auto IncludeHeader = ...) {
  Symbol NewSym = *S;
  NewSym.IncludeHeaders...;
  Symbols.insert(NewSym);
}
918

FWIW, i think we should assert(It != SymbolProviders.end()) (or IncludeFiles.end() when you change the outer loop). as we're inserting into these maps simultaneously.

921

we're not really getting any caching benefits as this map is inside the loop. can you put it outside the loop body?

924

we shouldn't continue here, it just means we can directly use SpellingIt to figure out what to put into NewSym.IncludeHeaders.

maybe something like:

auto [SpellingIt, Inserted] = HeaderSpelling.try_emplace(H);
if (Inserted)
  SpellingIt-> second = getSpelling(H, *PP);
if(!SpellingIt->second.empty()) {
  Symbol NewSym = *S;
  NewSym.IncludeHeaders.push_back({SpellingIt->second, 1, Directives});
  Symbols.insert(NewSym);
}
std::string getSpelling(const include_cleaner::Header &H, const Preprocessor &PP) {
  if(H.kind() == Physical) {
   // Check if we have a mapping for the header in system includes.
   // FIXME: get rid of this once stdlib mapping covers these system headers too.
   CanonicalIncludes CI;
   CI.addSystemHeadersMapping(PP.getLangOpts());
    if (auto Canonical = CI.mapHeader(H.physical()->getLastRef()); !Canonical.empty())
        return Canonical;
    if (!tooling::isSelfContainedHeader(...))
        return "";
  }
  // Otherwise use default spelling strategies in include_cleaner.
  return include_cleaner::spellHeader(H);
}
clang-tools-extra/clangd/index/SymbolCollector.h
62

no need to have ownership here, we can have just a const include_cleaner::PragmaIncludes* PI. SymbolCollector shouldn't need to extend lifetimes

127

as mentioned above addSystemHeadersMapping is cheap after the first call in the process, we initialize the mappings only once. so you can just have it as a member in HeaderFileURICache, and drop it completely from the public interface of SymbolCollector

179

can we have a llvm::DenseMap<SymbolID, std::optional<include_cleaner::Header>>, to indicate there are no providers for a symbol, rather than missing the symbol completely from the mapping?

VitaNuo marked 19 inline comments as done.Jul 13 2023, 2:31 AM

Thank you!

clang-tools-extra/clangd/index/SymbolCollector.cpp
841

Sure, although I'm not sure where the redundant calls would be coming from. I've been under the impression that this function is supposed to be called once for each symbol.

873–874

Ok, makes sense now, thanks.

924

As agreed offline, it is not easy to split out a spelling function, since HeaderFileURIs is a private member of the symbol collector, and we seem to still need to generate URIs for physical headers going forward.

VitaNuo updated this revision to Diff 539908.Jul 13 2023, 2:31 AM
VitaNuo marked 3 inline comments as done.

Address review comments.

kadircet accepted this revision.Jul 17 2023, 11:45 PM

thanks a lot for bearing with me, let's ship it!

clang-tools-extra/clangd/IncludeCleaner.cpp
91

nit: just const include_cleaner::PragmaIncludes* PI

clang-tools-extra/clangd/index/SymbolCollector.cpp
260

SysHeaderMapping doesn't need to be part of class state, just define it here completely. (also gets rid of the need for extra explanation and any possible questions around what would happen when we add system headers multiple times)

427

nit: can you move this above the call to mapCanonical and use it there too?

832–833

nit: early exits might be more readable, as we've more nesting now. (if (!Opts.Collect.. || shouldCollect..() == Symbol::Invalid) return;)

841

yeah unfortunately addDeclaration can be called multiple times (with increasing declaration quality, e.g. once for the forward declaration, then for a complete definition of a class)

878–881

nit: auto FID = IncludeFiles.at(SID);

931

i believe this should be just H.physical()->getLastRef() ? as we're passing in a fileentryref into toURI

932

nit: braces, as we've multiple lines (despite being a single statement)

937–940
if (!SpellingIt->second.empty()) {
  auto NewSym = *S;
  NewSym.IncludeHeaders.push_back({SpellingIt->second, 1, Directives});
  Symbols.insert(NewSym);
}
This revision is now accepted and ready to land.Jul 17 2023, 11:45 PM
VitaNuo updated this revision to Diff 542000.Jul 19 2023, 6:45 AM
VitaNuo marked 9 inline comments as done.

Address the latest comment batch.

This revision was landed with ongoing or failed builds.Jul 19 2023, 6:47 AM
This revision was automatically updated to reflect the committed changes.