This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Support multiple #include headers in one symbol.
ClosedPublic

Authored by ioeric on Aug 27 2018, 2:37 AM.

Details

Summary

Currently, a symbol can have only one #include header attached, which
might not work well if the symbol can be imported via different #includes depending
on where it's used. This patch stores multiple #include headers (with # references)
for each symbol, so that CodeCompletion can decide which include to insert.

In this patch, code completion simply picks the most popular include as the default inserted header. We also return all possible includes and their edits in the CodeCompletion results.

Event Timeline

ioeric created this revision.Aug 27 2018, 2:37 AM

@sammccall The code still needs cleanup but should be useful for providing high-level feedback, which I would like to get before moving further. Thanks!

ioeric updated this revision to Diff 162650.Aug 27 2018, 4:24 AM
  • minor cleanup

Nice!

We could reduce the scope of this patch somewhat by ignoring file proximity and just switching to return the most popular header. This would be a solid improvement over current behavior, and provide the infrastructure for the file-proximity approach.

clangd/CodeComplete.cpp
1443

I remain unconvinced that providing multiple completion candidates is the right thing to do here:

  • if the index hasn't seen a definition, then we're going to show one copy of the completion for each header that has a declaration
  • the user isn't going to have a useful way to distinguish between them. Note that e.g. we're going to show the #include path in the detail, but the documentation is going to be identical for each.
  • we lose the invariant that each completion item (pre-bundling) maps to a different symbol
  • C++ embedders don't have the option of displaying the multiple options once the completion is selected

The alternative approach of sorting the includes by proximity * log(refs) or so, and then using the top one for scoring, seems less of a drastic change to the current behavior. (Visible effect: more accurate includes inserted).

clangd/Quality.cpp
190 ↗(On Diff #162650)

This looks a bit sketchy. Particularly the += boost where everything else is *=.
What's this trying to do?

clangd/index/Index.h
228

via this header -> and include this header?

(otherwise, what does "via" mean?)

clangd/index/Merge.cpp
123

This describes the logic, and the logic always produces the right result, but it's not clear *why*. Maybe add something like:

This is associative because it preserves the invariant:
 - if we haven't seen a definition, Includes covers all declarations
 - if we have seen a definition, Includes covers declarations visible from any definition

in fact it seems hard to reason about this field in Symbol without understanding this, so maybe this invariant belongs on the IncludeHeaders field itself.

123

Thinking more about it - what's the intent here?
I'm not sure sorting by (seen by def, #refs) produces better ranking than just #refs.

But there are other possible reasons for dropping includes not seen by any def:

  • remove spam from the completion list (only a problem if we clone the completion items)
  • reduce size for items that are often redeclared (I can imagine this being a problem, not obvious)

Curious what your thinking is.

ioeric retitled this revision from [clangd] *Prototype* Support multiple #include headers in one symbol. to [clangd] Support multiple #include headers in one symbol..Aug 28 2018, 10:24 AM
ioeric edited the summary of this revision. (Show Details)
ioeric updated this revision to Diff 162897.Aug 28 2018, 10:24 AM
ioeric marked 3 inline comments as done.
ioeric retitled this revision from [clangd] Support multiple #include headers in one symbol. to [clangd] *Prototype* Support multiple #include headers in one symbol..
ioeric edited the summary of this revision. (Show Details)
  • Addressed review comments.
ioeric retitled this revision from [clangd] *Prototype* Support multiple #include headers in one symbol. to [clangd] Support multiple #include headers in one symbol..Aug 28 2018, 10:25 AM
ioeric edited the summary of this revision. (Show Details)

Thanks for the review! I reduced the scope of the patch. PTAL

clangd/CodeComplete.cpp
1443

These are all valid points. I agree that we should start with less drastic change in the first version.

My concern was that there can be cases where it's impossible for clangd to suggest the correct #include (e.g. all includes have the same proximity and popularity). But I guess there are also alternative solutions to these than creating multiple completion results. For example, we could simply give up inserting include during code completion and let an include-fixer-like feature handle it.

clangd/Quality.cpp
190 ↗(On Diff #162650)

(This is no longer needed in this patch.)

As we were generating multiple candidates for the same symbol (for each include), I was trying to group them together using the small score differences as the tie breaker.

clangd/index/Merge.cpp
123

in fact it seems hard to reason about this field in Symbol without understanding this, so maybe this invariant belongs on the IncludeHeaders field itself.

Make sense. Thanks!

Thinking more about it - what's the intent here?

The intention is basically to filter out headers with forward declarations where definition is not seen. I could hardly think of a case where we would favor headers where def is not seen over those where definition is seen.

ioeric updated this revision to Diff 163074.Aug 29 2018, 6:39 AM
  • minor cleanup
  • It would be useful for the C++ API (CodeCompletion struct) to provide the includes/edits in ranked order, if possible. Frontends could experiment with showing all the options.
  • Still not sure that adding more complicated behavior to Code Complete (vs just improving ranking) is the right thing for now (bigger comment below)
clangd/CodeComplete.cpp
306

I'm not sure about this:

  • we're adding new user-visible behavior which is potentially confusing (not inserting even though insertion is turned on and we're offering a completion that requires insertion)
  • it's not clear that first > 10 * second is the right heuristic or tuning
  • this probably isn't the right place for this heuristic - e.g. it causes us not to show the location of the symbol in the detail field. (IIRC, the fact that this is only shown when header insertion is active is a limitation we wanted to fix, not furither entrench).

Can we start with the simplest behavior, closest to what we do today: just choose the most popular include?

Probably the most forward-looking way to do this is to pick the winner when we construct the CompletionCandidate and stash it in a field there. (The scoring function is going to become more complex as we add proximity, and this function gets called twice. Also this function won't have access to the info needed for proximity).

319

nit: Includes[0] and Includes[1]?

ioeric updated this revision to Diff 163351.Aug 30 2018, 9:01 AM
  • Return all possible #includes in CodeComplete.
ioeric edited the summary of this revision. (Show Details)Aug 30 2018, 9:02 AM
ioeric marked 2 inline comments as done.
sammccall accepted this revision.Aug 31 2018, 6:36 AM
sammccall added inline comments.
clangd/CodeComplete.cpp
396

nit: this could be std::stable_partition. Not sure if faster, but maybe clearer.

clangd/CodeComplete.h
145

if we've bundled together overloads that have different sets of providing headers, these includes may not be accurate for all of them. Maybe comment this limitation?

This revision is now accepted and ready to land.Aug 31 2018, 6:36 AM
ioeric updated this revision to Diff 163677.Sep 3 2018, 3:03 AM
  • Rebase
ioeric updated this revision to Diff 163678.Sep 3 2018, 3:09 AM
  • Document limitation for overload bundling.
This revision was automatically updated to reflect the committed changes.