This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Suggest adding missing includes for typos (like include-fixer).
ClosedPublic

Authored by ioeric on Jan 21 2019, 8:17 AM.

Details

Summary

This adds include-fixer feature into clangd based on D56903. Clangd now captures
diagnostics caused by typos and attach include insertion fixes to potentially
fix the typo.

Diff Detail

Repository
rL LLVM

Event Timeline

ioeric created this revision.Jan 21 2019, 8:17 AM
ioeric updated this revision to Diff 182799.Jan 21 2019, 8:18 AM
  • revert unintended change
ioeric updated this revision to Diff 182800.Jan 21 2019, 8:18 AM
ioeric updated this revision to Diff 182925.Jan 22 2019, 8:22 AM
ioeric updated this revision to Diff 183845.Jan 28 2019, 6:20 AM
  • Rebase on origin/master
sammccall added inline comments.Jan 31 2019, 10:18 AM
clangd/IncludeFixer.cpp
74 ↗(On Diff #183845)

"default" looks a bit scary - could still end up with false positives.
Can we cover a few of the most common cases (even if we know they're not exhaustive) to start with?

203 ↗(On Diff #183845)

a comment that we never return a valid correction to try to recover, our suggested fixes always require a rebuild.

209 ↗(On Diff #183845)

why do we do this here rather than compute the scopes/name already in CorrectTypo()?

Passing around more of these AST objects and doing the computations later seems a bit more fragile.

227 ↗(On Diff #183845)

limit?

233 ↗(On Diff #183845)

why put these into a slab and then build fixes later, rather than build the fixes immediately?

241 ↗(On Diff #183845)

Doesn't seem worth special casing this, below code will work fine.

246 ↗(On Diff #183845)

the fixes can overlap: distinct symbols may be defined in the same header.
Common case: function overloads.

At some level we "just want to deduplicate" these.
It's slightly complicated because of the tricky way we calculate the edits.
Maybe the best thing is to pass in all the symbols to fixesForSymbol() instead of just one.

One thing to keep in mind is a future where we want to pull in results across namespaces: the fixes for "include 'a.h' for x::Foo" and "include 'a.h' for y::Foo" need to be distinct because of the different added qualifiers.
So it'd be good to end up with an explicit e.g. map<header_to_include, Fix> to deduplicate, then we can just change the key to pair<header_to_include, qualifier_to_insert>

clangd/IncludeFixer.h
35 ↗(On Diff #183845)

Having to inject the whole compiler into the include fixer doesn't seem right. Apart from the huge scope, we're also going to register parts of the include fixer with the compiler.

A few observations:

  • all we actually need here is Sema and the SM (available through Sema)
  • it's only the typo recorder that needs access to Sema
  • the typo recorder gets fed the Sema instance (ExternalSemaSource::InitializeSema)
  • the fact that fixTypo() needs to access the typo recorder doesn't seem necessary

So I'd suggest changing this function to return a *new* ExternalSemaSource that stashes the last typo in this IncludeFixer object directly.

e.g.

llvm::IntrusiveRefCntPtr<ExternalSemaSource> typoRecorder() {
  return new TypoRecorder(LastTypo);
}
private:
  Optional<TypoRecord> LastTypo;
  class TypoRecorder : public ExternalSemaSource {
    Sema *S;
    Optional<TypoRecord> &Out;
    InitializeSema(Sema& S) { this->S = &S; }
    CorrectTypo(...) {
      assert(S);
      ...
      Out = ...;
    }
  }
46 ↗(On Diff #183845)

This would be a reasonable place to document what the feature is and how it works :-)

e.g.

/// Returns an ExternalSemaSource that records failed name lookups in Sema.
/// This allows IncludeFixer to suggest inserting headers that define those names.
59 ↗(On Diff #183845)

So an annoying naming thing: I find the use of "typo" to describe this feature really confusing. There's no actual typo, and conflating Sema's typo-correction fixes (which allow the parse to recover) and our generated fixes (which requires changing the source and rebuilding) breaks my brain a bit.

I'd suggest calling this "unresolved name" throughout, except when specifically referring to the mechanism by which we observe unresolved names.
e.g. TypoRecorder -> UnresolvedNameRecorder etc.

68 ↗(On Diff #183845)

I think TypoRecorder can be merely declared here, and defined in the CC file?

ioeric updated this revision to Diff 185031.Feb 4 2019, 5:38 AM
ioeric marked 13 inline comments as done.
  • address review comments.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2019, 5:38 AM
ioeric added inline comments.Feb 4 2019, 5:39 AM
clangd/IncludeFixer.cpp
209 ↗(On Diff #183845)

Because not all typos will result in diagnostics. Computing scopes could be expensive, so we would only want to do that when necessary. For example, sema can perform typo corrections for both x and y in x::y, while only one diagnostic (for either x or y) would be generated.

233 ↗(On Diff #183845)

good point.

clangd/IncludeFixer.h
35 ↗(On Diff #183845)

Great idea. Thanks!

I've done most of the suggested refactoring.

the fact that fixTypo() needs to access the typo recorder doesn't seem necessary.

Do you mean access to *Sema*? The IncludeFixer still needs sema because we don't want to run Lookup for each typo sema encounters. We only want to do that for typos that make it to diagnostics. See my response to the other comment about this for details.

sammccall added inline comments.Feb 6 2019, 1:08 AM
clangd/IncludeFixer.cpp
82 ↗(On Diff #185031)

you never seem to clear LastUnresolvedName - I guess you at least want to clear it after using it.
It would be more robust to clear after every diagnostic - would this work?

161 ↗(On Diff #185031)

note that if you have symbols in different scopes from the same header, you're only generating one fix and it will only mention one of the symbols.

It's fine to decide this case doesn't matter, but it's worth adding a comment.
(Alternatively, using a set of pair<inserted-header, qualified-name> to track dupes would avoid this)

186 ↗(On Diff #185031)

SemaPtr?

199 ↗(On Diff #185031)

The FIXME makes sense, but it's not clear to me what the code below it does or how it relates to the comment.

maybe separate with a blank line and then "Extract the typed scope" or so?

228 ↗(On Diff #185031)

nit: recover

233 ↗(On Diff #185031)

unused

209 ↗(On Diff #183845)

If the goal is to make the scope computation lazy, CorrectTypo can store them as a std::function<std::vector<string>()> - that way we can still group the related code and dependencies together.

This function can capture the Sema instance by pointer - this deserves a comment that sema must be alive to access the scopes.

clangd/IncludeFixer.h
60 ↗(On Diff #185031)

nit: the comment on Loc and LookupKind don't say anything, remove them?

61 ↗(On Diff #185031)

please use real names for these fields (EnclosingScope, SpecifiedScope?)

Maybe it would help documenting here to have a more complete example e.g.

namespace ns { int Y = foo::x }

Name is x
Loc is points at x (or f?)
EnclosingScope is ns::
SpecifiedScope is either ns::foo:: or foo:: (depending how 'foo' was resolved)

72 ↗(On Diff #185031)

nit: why take a parameter here, when it's always *LastUnresolvedName?

35 ↗(On Diff #183845)

Yes, I meant Sema here. I think we can avoid IncludeFixer holding a reference to Sema directly.

59 ↗(On Diff #183845)

I don't think this is done: there are still 20+ references to "typo" in this patch.
I think they should all be changed, apart from the implementation of CorrectTypo() and a comment explaining the use of this mechanism.
Happy to discuss if you don't think this is a good idea, though!

ioeric updated this revision to Diff 185546.Feb 6 2019, 7:08 AM
ioeric marked 15 inline comments as done.
  • address review comments
sammccall accepted this revision.Feb 6 2019, 11:55 AM

Great stuff! Sorry for the delay here.

clangd/IncludeFixer.h
59 ↗(On Diff #185546)

nit: this isn't really a callback, I think.
Just "Lazily get the possible scopes of the unresolved name"?

64 ↗(On Diff #185546)

"i.e. typo" is confusing here, as these aren't actually typos

This revision is now accepted and ready to land.Feb 6 2019, 11:55 AM
ioeric updated this revision to Diff 185718.Feb 7 2019, 1:19 AM
ioeric marked 2 inline comments as done.
  • Merge remote-tracking branch 'origin/master' into typo
  • address review comments
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2019, 1:23 AM