This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Don't collect refs from non-canonical headers.
Needs ReviewPublic

Authored by hokein on Oct 29 2018, 3:29 AM.

Details

Reviewers
sammccall
Summary

With this patch, we only collect refs for canonical headers (headers
with header guards). As non-canonical headers are usually generated, and
not interesting to users.

For LLVM, we have ~2.8 million refs (out of 6 million) for generated inc
headers.

BeforeAfter
file size78 MB55MB
memory330MB260MB

Event Timeline

hokein created this revision.Oct 29 2018, 3:29 AM

Please add a test with #pragma once, which should still be counted as a header guard.

It's not obvious whether/why this is the right thing to do.

One set of reasoning that occurs to me:

  • headers without guards produce code in some context-sensitive way, they only exist in the context of their including file
  • for the symbols they produce, we have ways to reconcile across contexts - by symbol ID
  • but for refs, there's no such context of identity
  • therefore it's not clear whether a ref in an unguarded header should be tracked once, once per include, or not at all. Not at all is defensible.

This falls down a bit though when we see that in practice we'll often have refs to the same symbol, from the same location, with the same role - this is a reasonable way to reconcile.

If this is just a way to trim down index size, we should consider codebases other than LLVM, as it seems pretty likely the use of generated files there is atypical.

clangd/index/SymbolCollector.cpp
479

High-level comment - what are you doing here? Why?

479

Use of "canonical" here and elsewhere isn't obvious to me - does this name come from somewhere?

"Modular" is how I would describe these headers - it's imprecise, but you could define it somewhere.