This is an archive of the discontinued LLVM Phabricator instance.

[clangd] IncludeCleaner: Skip non self-contained headers
ClosedPublic

Authored by kbobyrev on Oct 28 2021, 2:11 AM.

Details

Summary

Headers without include guards might have side effects or can be the files we
don't want to consider (e.g. tablegen ".inc" files). Skip them when translating
headers to the HeaderIDs that we will consider as unused.

Diff Detail

Event Timeline

kbobyrev created this revision.Oct 28 2021, 2:11 AM
kbobyrev requested review of this revision.Oct 28 2021, 2:11 AM
kbobyrev updated this revision to Diff 382955.Oct 28 2021, 2:12 AM

Update the docs.

kbobyrev planned changes to this revision.Oct 28 2021, 2:17 AM

Aww, I think this is the wrong place to do that.

kbobyrev updated this revision to Diff 382968.Oct 28 2021, 2:49 AM

Put the check into the right place.

kbobyrev updated this revision to Diff 382969.Oct 28 2021, 2:50 AM

Remove redundant changes.

kbobyrev updated this revision to Diff 382970.Oct 28 2021, 2:51 AM

Revert docs change that is no longer true.

sammccall added inline comments.Oct 28 2021, 8:49 AM
clang-tools-extra/clangd/IncludeCleaner.cpp
192–213

this is a local helper only, just pass in ParsedAST instead of all the extra params?

249–250

this is fixed now (though I think it would be better addressed ~here)

298–299

while here, pull out findReferencedFiles into its own var, and leave a FIXME to adjust the set of referenced files to account for non-self-contained headers whose symbols can be attributed to their include parents

324–325

Oops, I missed this in previous review.

Why are we first computing which includes are unused (including ineligible) and then filtering some out, rather than doing this check in the loop in computeUnused?

clang-tools-extra/clangd/unittests/DiagnosticsTests.cpp
1483

This can be now or later, but we can't/shouldn't jam every feature into one test case.
At some point we need to find a way to split this up into different cases, or at least not have many features that can only be tested at the diagnostic level.
(If we call mayConsiderUnused inside computeUnused as suggested above, this no longer has to be a diagnostic test)

sammccall added inline comments.Oct 28 2021, 8:50 AM
clang-tools-extra/clangd/IncludeCleaner.cpp
298–299

Sorry, this was meant to be an optional suggestion rather than an angry command :-)

kbobyrev updated this revision to Diff 383085.Oct 28 2021, 10:58 AM
kbobyrev marked 4 inline comments as done.

Address review comments.

clang-tools-extra/clangd/IncludeCleaner.cpp
249–250

That's what I had initially (you can see the previous version of the patch, after which I sent "oh, wrong place" and changed it). I think my idea was that this looks confusing from the API standpoint. Choosing to filter out non-guarded headers looks like a diagnostics decision and it looked awkward to me to do that here: what getUnused does is simply merge two data structures. I think it's somewhat similar to the patch where you mentioned "parse, don't validate" and it is a great argument: the <built-in> and <scratch-space> are indeed not "headers" in a sensible meaning of the term but non self-contained ones are.

I don't have a very strong opinion on this, so I put the filtering back here but I'm curious if you agree with my thoughs.

kbobyrev updated this revision to Diff 383086.Oct 28 2021, 10:59 AM

Remove stale FIXME.

sammccall added inline comments.Oct 28 2021, 11:23 AM
clang-tools-extra/clangd/IncludeCleaner.cpp
249–250

Sorry, I was unclear: I meant we should put the call to mayConsiderUnused here, not that we should split it up and put half of it here. So the "written with <" check would also be at this point.

Choosing to filter out non-guarded headers looks like a diagnostics decision

I don't think so: we don't know that a header is unused if we suspect it of being an umbrella header or used for side effects. The reason we're choosing not to diagnose it doesn't have anything to do with diagnostics per se.
You could say "unused" has some more narrow definition, but that doesn't seem very useful to me.

what getUnused does is simply merge two data structures

Sure, because the nontrivial parts haven't been implemented yet :-) But its role is to make filtering decisions of Inclusions from the AST to support a high-level concept of "unused", so I think the filtering fits well here.
If we support a non-strict policy, I'd expect getUnused to implement that (maybe with helpers) and that's going to involve some more analysis too.

On the other hand, I think the role of the loop in issueUnusedIncludesDiagnostics *is* simply converting: between an unused include list to diagnostics.

I think it's somewhat similar to the patch where you mentioned "parse, don't validate" and it is a great argument: the <built-in> and <scratch-space> are indeed not "headers" in a sensible meaning of the term but non self-contained ones are.

I'm not sure the analogy holds: my argument was to do the filtering at the place where you're doing the conversion/parsing that might fail. But here we could perfectly well convert the unused FileIDs to unused Header IDs to unused Inclusions to diagnostics, it would just be logically wrong to do so. We have free choice of where to put that logic, and diagnostic conversion seems like the wrong place to me.

kbobyrev updated this revision to Diff 383250.Oct 29 2021, 1:16 AM
kbobyrev marked 3 inline comments as done.

Address review comments.

kbobyrev updated this revision to Diff 383253.Oct 29 2021, 1:33 AM

Add a header guard in tests.

sammccall accepted this revision.Oct 29 2021, 7:51 AM
sammccall added inline comments.
clang-tools-extra/clangd/IncludeCleaner.cpp
250–251

(sorry, I missed this before)

We should not be elogging this case, we're already emitting a diagnostic and it doesn't indicate anything wrong with clangd itself.

256

This seems pretty vague text (particularly: what does valid mean, and is it valid but used or invalid but unused). Maybe "{0} was not used, but is not eligible to be diagnosed as unused"?

clang-tools-extra/clangd/IncludeCleaner.h
64

I'm slightly worried about these sorts of comments becoming stale, they're really just describing what we think "not used" means.

I'd replace it with something more general like "In unclear cases, headers are not marked as unused", but up to you

This revision is now accepted and ready to land.Oct 29 2021, 7:51 AM
kbobyrev updated this revision to Diff 383378.Oct 29 2021, 8:57 AM
kbobyrev marked 3 inline comments as done.

Resolve the review comments.

This revision was landed with ongoing or failed builds.Oct 29 2021, 8:57 AM
This revision was automatically updated to reflect the committed changes.