This is an archive of the discontinued LLVM Phabricator instance.

[clangd] IncludeCleaner as a library: Find all references to symbols in the file
ClosedPublic

Authored by kbobyrev on Jul 5 2021, 4:45 AM.

Details

Summary

This is the first patch in an ongoing attempt of Include Cleaner: unused/missing
headere diagnostics, an IWYU-like functionality implementation for clangd. The
work is split into (mostly) distinct and parallelizable pieces:

  • Finding all referenced locations (this patch).
  • Finding all referenced locations of macros.
  • Building IncludeGraph and marking headers as unused, used and directly used.
  • Making use of the introduced library and add an option to use in clangd.

  • Adding support for standard library headers (possibly through mapping genfiles).

Based on https://reviews.llvm.org/D100540.

Diff Detail

Event Timeline

kbobyrev created this revision.Jul 5 2021, 4:45 AM
kbobyrev requested review of this revision.Jul 5 2021, 4:45 AM
kbobyrev planned changes to this revision.Jul 5 2021, 4:46 AM
kbobyrev updated this revision to Diff 364413.Aug 5 2021, 4:25 AM

Significantly decrease the scope of this change. Focus on "IWYU as a library",
do not propagate into the other parts of clangd yet and keep the change
self-contained for the most part.

kbobyrev updated this revision to Diff 364526.Aug 5 2021, 10:01 AM

Trim the patch even further: only extract locations, prune header marking logic
etc.

kbobyrev updated this revision to Diff 364610.Aug 5 2021, 2:13 PM

Trim the patch even further: the scope is now simply findAllReferences
mechanism. Add more tests and improve functionality (handle using decls and
implicit types).

kbobyrev updated this revision to Diff 364615.Aug 5 2021, 2:28 PM

Add one more test.

kbobyrev updated this revision to Diff 364625.Aug 5 2021, 2:55 PM

Exclude std:: namespace handling.

kbobyrev retitled this revision from [clangd] WIP: Unused header warnings to [clangd] IWYU as a library.Aug 5 2021, 2:59 PM
kbobyrev edited the summary of this revision. (Show Details)
kbobyrev added a reviewer: sammccall.
kbobyrev retitled this revision from [clangd] IWYU as a library to [clangd] IWYU as a library: Find all references to symbols in the file.
kbobyrev updated this revision to Diff 364709.Aug 6 2021, 12:15 AM

Improve comments, refactor some pieces, add more tests.

kbobyrev edited the summary of this revision. (Show Details)Aug 6 2021, 12:39 AM
kbobyrev updated this revision to Diff 364714.Aug 6 2021, 12:41 AM

Remove some code from the next chunk of IWYU work.

kbobyrev updated this revision to Diff 364735.Aug 6 2021, 2:42 AM

Rebase on top of main.

First some thoughts on names... (hopefully to avoid some confusion)

IWYU might be used here to mean three things:

  • the IWYU coding style/policy that says you should #include headers for the symbols you make use of, and no others
  • the IWYU tool which applies IWYU style to files
  • more generally, features that encourage hygiene policies for #includes ("IWYU functionality")

This work is about the last of these: putting header hygiene features in clangd. (Specifically, likely unused-include and missing-include warnings).
I think we should avoid the name "IWYU as a library" or "IWYU functionality" as it's ambiguous and also misleading:

  • policy: we're likely to support policies other than the IWYU style
  • tool: it's not likely to match the design or feature set of the IWYU tool

IncludeHygiene would be an ok name I think but a bit of a mouthful. WDYT?

(This work is definitely inspired by the IWYU tool and I'd used the name in discussions/prototypes, so this confusion is largely my fault! We also did this with clang-include-fixer which ended up as clangd/IncludeFixer.h, though the designs are much more similar)

Thanks, this scope looks good for a first patch!

clang-tools-extra/clangd/IWYU.cpp
18 ↗(On Diff #364735)

The first sentence doesn't mean much to me, I think we could just drop it?

63 ↗(On Diff #364735)

Is this just "FIXME: should this behavior be configurable or tweaked?"

I think it's clear enough what "this behavior" is in context.
Whereas the rest of the comment adds some confusion (e.g. what's an "implicit type", what usage of auto are you talking about etc. And I think it's *way* too soon to say that it *should* be configurable - remember the application is unused-include, turning this off only grants the ability for headers to be marked unused even if they define the type of a subexpression.

I think it's more likely we'd tweak the heuristic in the future, e.g. only do this if it's a type that can be incomplete.

73 ↗(On Diff #364735)

i usually add a private using Base = ... and refer to Base::TraverseType(T) - intent is clearer and often we end up with several of these

95 ↗(On Diff #364735)

this name is a bit vague (why wouldn't we add it?), maybe isNew?

clang-tools-extra/clangd/IWYU.h
10 ↗(On Diff #364735)

Related to the naming discussion, this should first say what the functionality is (warnings about header hygiene: misuse of transitive headers and unused includes).
Then definitely credit the IWYU tool and mention the differences.

17 ↗(On Diff #364735)

I'm not sure we want to do this (or plan to do this) unless we have some idea of how discovery would work.

34 ↗(On Diff #364735)

Maybe just "Finds locations of all symbols used in the main file"?

If you want to talk about traversal/impl strategy, that's OK but not in the first paragraph.

35 ↗(On Diff #364735)

I'd be explicit that the returned locations may be macro expansions, and are not resolved to their spelling/expansion location.

38 ↗(On Diff #364735)

I'm not quite sure what this is saying/who the audience for the comment is.
Maybe it's interesting to say something like

We use this to compute unused headers, so:

  • we cover the whole file in a single traversal for efficiency
  • we don't attempt to describe where symbols were referenced from
  • in ambiguous cases (e.g. implicitly used symbols, multiple declarations) we err on the side of reporting all possible locations
41 ↗(On Diff #364735)

It's a bit more general than this (e.g. multiple declaration). See previous comment for a wording suggestion

43 ↗(On Diff #364735)

I actually don't think we'd use the same entrypoint for missing include, but rather duplicate it (and *maybe* share some impl details whose docs don't belong here).

The reason is we get to "throw away" different info in each case to make the data structures as tight as possible:

  • for unused include, we can throw away the reference location and just track sets of locations and then files. This means we traverse first, build a bundle of locations, and then process them together.
  • for missing include, we need the reference location, but don't need to associate them with specific #includes. So the tight data structure here is a set of allowed FileIDs plus a cache for macro FileIDs. Then we'll traverse last and query one symbol at a time, with early exit.

Comments about future plans can sometimes be useful I think, but this one seems pretty unlikely to pan out.

47 ↗(On Diff #364735)

It's not obvious what this actually means.
Would be good to mention expanding macros

clang-tools-extra/clangd/unittests/IWYUTests.cpp
28 ↗(On Diff #364735)

I think i'd be useful to have a comment in these for the node type or special case that's being tested

First few are

DeclRefExpr
RecordType
TypedefType
MemberExpr
Expr (type is traversed)
Redecls
CXXConstructExpr

84 ↗(On Diff #364735)

I don't think so, IMO current behavior is fine.
B is not named, nor can its absence cause the type to be incomplete.

97 ↗(On Diff #364735)

A bit more explicit about the bug?

When a type is resolved via a using declaration, the UsingShadowDecl is not referenced in the AST.
Compare to TypedefType, or DeclRefExpr::getFoundDecl().

kbobyrev updated this revision to Diff 365094.Aug 9 2021, 12:05 AM
kbobyrev marked 15 inline comments as done.

Thank you for the great suggestions! Indeed, we already have the confusion
regarding IWYU having the CLI flag and users are thinking we provide some sort
of integration/implementation. Having even more confusion (implementing
similar) functionality for our needs is not what we want (also, apologies for
confusion,"as a library" was more about the patch in this context - meaning not
a "usable" feature yet but a smaller scope where we just start the
implementation as the library that can only be used in tests for now). Hence,
renamed the feature to "Include Sanity" (close to Include Hygiene but sounds
better IMO, please let me know if you think otherwise).

I have addressed the review comments and will base my next patch on this (this
patch is probably mostly the blocker for the upcoming ones).

kbobyrev retitled this revision from [clangd] IWYU as a library: Find all references to symbols in the file to [clangd] Include Sanity as a library: Find all references to symbols in the file.Aug 9 2021, 12:07 AM
kbobyrev edited the summary of this revision. (Show Details)
kbobyrev retitled this revision from [clangd] Include Sanity as a library: Find all references to symbols in the file to [clangd] IncludeCleaner as a library: Find all references to symbols in the file.Aug 10 2021, 12:28 AM
kbobyrev edited the summary of this revision. (Show Details)
kbobyrev updated this revision to Diff 366819.Aug 17 2021, 12:32 AM

Remove findReferencedFiles (to be introduced in the next patch) to preserve
proposed granularity.

kbobyrev updated this revision to Diff 366830.Aug 17 2021, 1:34 AM

Fix tests name.

kbobyrev updated this revision to Diff 366832.Aug 17 2021, 1:45 AM

[clangd] IncludeCleaner: Mark used headers.

Follow-up on D105426.

kbobyrev updated this revision to Diff 366833.Aug 17 2021, 1:47 AM

Push back to the origin.

sammccall accepted this revision.Aug 17 2021, 5:48 AM
sammccall added inline comments.
clang-tools-extra/clangd/IncludeCleaner.cpp
66

I just realized this uses "T" if any expression has type "T*" even if never dereferenced, this is probably a gross overestimate. Nevertheless I guess fine for now.

102

this struct is not used in this patch

This revision is now accepted and ready to land.Aug 17 2021, 5:48 AM
kbobyrev updated this revision to Diff 367138.Aug 18 2021, 12:55 AM
kbobyrev marked an inline comment as done.

Remove unused code.

This revision was landed with ongoing or failed builds.Aug 18 2021, 1:10 AM
This revision was automatically updated to reflect the committed changes.