This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Suggest adding missing includes for incomplete type diagnostics.
ClosedPublic

Authored by ioeric on Jan 18 2019, 2:18 AM.

Details

Summary

This enables clangd to intercept compiler diagnostics and attach fixes (e.g. by
querying index). This patch adds missing includes for incomplete types e.g.
member access into class with only forward declaration. This would allow adding
missing includes for user-typed symbol names that are missing declarations
(e.g. typos) in the future.

Diff Detail

Repository
rL LLVM

Event Timeline

ioeric created this revision.Jan 18 2019, 2:18 AM
ioeric updated this revision to Diff 182796.Jan 21 2019, 8:10 AM
  • add comments

This looks pretty good! My main concern is latency (and variability of latency) in practice.
Particularly:

  • we should avoid paying for fuzzyfind and fetching hundreds of results when we want exact-match
  • limit the damage in degenerate cases: should we limit to e.g. 5 index queries per TU?

Actually, now that I think about it - if we can see a forward declaration, can't we do a lookup() instead of a fuzzyFind?
Is the problem that this doesn't generalize to the no-declaration case (where it looks like a typo)?

If we had an operation that looked like lookup() but worked by qname, would we design the feature the same way?
In particular, would we want to batch the requests to the index (lookup takes a set of IDs)? This would affect the code structure a bit. But it would make the feature cost basically O(1) instead of O(errs)...

clangd/ClangdUnit.cpp
295 ↗(On Diff #182796)

Probably worth a two-line comment explaining at a high level what this is doing.
Somewhat redundant with IncludeFixer.h but probably worth it anyway since this containing function is a complicated mess.

clangd/ClangdUnit.h
65 ↗(On Diff #182796)

(nit: can we keep this as a plain struct? Once the number of members get large, initializing them one-by-one probably reads better anyway...)

91 ↗(On Diff #182796)

Index is a reasonable (if surprising) param here, but I think we need to explicitly enable the include-fixing behavior. Even the very minimal hard-coded list of clang-tidy checks caused issues :-( And this is going to result in new network RPCs in some configs.

I'd suggest keeping the Index param, and wrapping the "use include fixer?" policy along with the clang-tidy options in D55256 as some sort of "diagnostic options" struct. (Though be wary of name clash with the one in Diagnostics.h, sigh).

clangd/Diagnostics.h
105 ↗(On Diff #182796)

This seems like a suspicious dependency.
Can we indirect here e.g. void contributeFixes(std::function<vector<Fix>(const clang::Diagnostic&)>)

clangd/IncludeFixer.cpp
1 ↗(On Diff #182796)

new copyright notice

39 ↗(On Diff #182796)

can you add a trace to this function so we know what the impact on latency is?

66 ↗(On Diff #182796)

limit?

74 ↗(On Diff #182796)

so issuing a bunch of fuzzy finds in sequence is clearly not ideal from a performance standpoint.
Any ideas on what we might do better, or how we can limit the worst case?
(not sure we need to do any better in this patch, but might be worth comments)

112 ↗(On Diff #182796)

should be no need for toString here

clangd/IncludeFixer.h
1 ↗(On Diff #182796)

new copyright notice

39 ↗(On Diff #182796)

nit: incomplete is one word

unittests/clangd/IncludeFixerTests.cpp
1 ↗(On Diff #182796)

new copyright notice; wrong filename

26 ↗(On Diff #182796)

because the helpers and public APIs tested are common, I wonder if we'd be better off creating DiagnosticsTests.cpp and moving the relevant tests from both ClangdUnit and here to there.

(Testing the fixes directly through the IncludeFixer class would make sense in this file, but it's awkward - I think testing through the diagnostics API is actually the right thing)

A drop-by comment.

clangd/IncludeFixer.cpp
66 ↗(On Diff #182796)

Why do we use fuzzyFind and not lookup here?
Are there cases when we can't construct SymbolID for the TagDecl?

ioeric updated this revision to Diff 182922.Jan 22 2019, 8:13 AM
ioeric marked 15 inline comments as done.
  • Address review comments

This looks pretty good! My main concern is latency (and variability of latency) in practice.
Particularly:

  • we should avoid paying for fuzzyfind and fetching hundreds of results when we want exact-match
  • limit the damage in degenerate cases: should we limit to e.g. 5 index queries per TU?

Actually, now that I think about it - if we can see a forward declaration, can't we do a lookup() instead of a fuzzyFind?
Is the problem that this doesn't generalize to the no-declaration case (where it looks like a typo)?

I switched to use lookup requests for incomplete type diagnostics. And yes, the typo errors need to use FuzzyFind or lookup by names as no USR is available.

If we had an operation that looked like lookup() but worked by qname, would we design the feature the same way?
In particular, would we want to batch the requests to the index (lookup takes a set of IDs)? This would affect the code structure a bit. But it would make the feature cost basically O(1) instead of O(errs)...

As chatted offline, we decided to leave the batching optimization as a TODO in this patch. See inline comment for more detailed response regarding performance.

clangd/ClangdUnit.h
91 ↗(On Diff #182796)

How about ParseOptions?

clangd/IncludeFixer.cpp
39 ↗(On Diff #182796)

Added a tracer in fixInCompleteType to avoid noise from unsupported diagnostics.

66 ↗(On Diff #182796)

Switched to use lookup. The typo diagnostics (i.e. undefined symbol) can't use lookup as there is no declaration, but we can definitely use lookup for incomplete types in this patch.

74 ↗(On Diff #182796)

As you suggested, batching requests from all diagnostics would definitely help. SymbolIndex already supports batch lookup by IDs, but we would need to extend SymbolIndex to support lookup by Names for the typo errors that we are handling in D57021. In order to batch index requests from all diagnostics, the existing logic around handling and storing diagnostics might also need refactoring. I added a TODO for this. We can revisit if the performance turned out to be a big problem.

Another thing we could do is adding cache for index results. For example, for the code copy-paste case, one incomplete/undefined symbol can cause multiple errors. Caching should save us some unnecessary index queries.

For this patch, I added a limit on the number of index requests in IncludeFixer (5 for now), which would prevent us from sending too many index requests when building AST.

Friendly ping.

ioeric updated this revision to Diff 183507.Jan 25 2019, 3:00 AM
  • minor touch on documentation.
ioeric updated this revision to Diff 183509.Jan 25 2019, 3:03 AM
  • Rebase correctly
sammccall accepted this revision.Jan 25 2019, 7:38 AM

Just a bunch of nits, up to you to work out how/whether to address.

One substantive issue around classes defined outside headers - will chat offline.

clangd/ClangdServer.h
118 ↗(On Diff #183509)

Can we call this option SuggestMissingIncludes or so? include-fixer is the name of a separate tool.

clangd/ClangdUnit.cpp
305 ↗(On Diff #183509)

nit: I'm not sure I agree with the IncludeFixer class taking ownership of the include inserter.
It reads it but never writes to it, so full ownership isn't *needed*. And if we had other features that wanted to compute include insertions, we'd want them to share an inserter.

295 ↗(On Diff #182796)

nit: can you hoist this above the line declaring FixIncludes?
That way (I think) it reads more naturally as "this is what the following block of code does" in a big method, doesn't fold away with the if, etc

clangd/Compiler.h
49 ↗(On Diff #183509)

isn't an optional pointer just a pointer?

clangd/IncludeFixer.cpp
39 ↗(On Diff #183509)

consider moving this 5 to a constructor param, so this file can be entirely mechanism and the policy is spelled out at the construction site.
(No need to actually make it configurable, I think)

41 ↗(On Diff #183509)

(You could also just write this as just a switch on Info.getID(), I'd find that a little direct/easier to read, up to you)

61 ↗(On Diff #183509)

TypeName?

63 ↗(On Diff #183509)

this seems an odd thing to check and way to phrase - there *was* a type in the message, but its name happens to be empty. Am I missing something? (I'd suggest just removing this)

80 ↗(On Diff #183509)

fixme is obsolete

81 ↗(On Diff #183509)

no need to check name

86 ↗(On Diff #183509)

There's a case I'm worried about:

Foo.h declares class X;, Foo.cpp defines class X{}.
Now Bar.cpp tries to do sizeof(X), so we get an incomplete type diagnostic.

  • suggesting #include Foo.cpp is bad - can't include impl file
  • suggesting #include Foo.h is also bad - doesn't help
  • providing no suggestion is correct.

I'm not sure the index really has enough structured information to detect this case, but we could check e.g. that definition header == canonical declaration header? Or that basename(IncludeHeaders) == basename(definition header)?

If possible, can you add a test for this case?

clangd/IncludeFixer.h
39 ↗(On Diff #182796)

not done - method is still "fixInCompleteType"

clangd/tool/ClangdMain.cpp
210 ↗(On Diff #183509)

similarly, I'd call this -suggest-missing-includes or just -suggest-includes

214 ↗(On Diff #183509)

why hidden?

This revision is now accepted and ready to land.Jan 25 2019, 7:38 AM
ioeric updated this revision to Diff 183838.Jan 28 2019, 5:56 AM
ioeric marked 12 inline comments as done.
  • address review comments
ioeric marked an inline comment as done.Jan 28 2019, 5:58 AM
This revision was automatically updated to reflect the committed changes.