This is an archive of the discontinued LLVM Phabricator instance.

[CodeComplete] Propagate preferred types through parser in more cases
ClosedPublic

Authored by ilya-biryukov on Jan 15 2019, 7:48 AM.

Details

Summary

Preferred types are used by code completion for ranking. This commit
considerably increases the number of points in code where those types
are propagated.

In order to avoid complicating signatures of Parser's methods, a
preferred type is kept as a member variable in the parser and updated
during parsing.

Diff Detail

Event Timeline

ilya-biryukov created this revision.Jan 15 2019, 7:48 AM

@kadircet for preferred-type related bits.
@rsmith for parser-related bits. More specifically, (1) should we be worried that this might affect parser performance on the hot path and (2) whether there are better alternatives to do something like this.

That really looks awesome, thanks!

One general comment: I am not really sure if the handling done in ParseCastExpression is extensive enough. But IIUC, this should not cause any regressions, rather should result in missing preferred types just as before this patch ?

lib/Parse/ParseExpr.cpp
1157

shouldn't we have an enterunknown here as well?

lib/Sema/SemaCodeComplete.cpp
352

I am not sure it carries the weight, why not just inline it?

360

Is this check necessary? According to comments getCurBlock returns null if there is no block.

373

nit: maybe move this into top and get rid of the return statements inside ifs(also by changing them to else ifs)

380

Is it really possible for D to be null ?

Thanks for a quick response, @kadircet!
Leaving some first comments, will address the rest later.

One general comment: I am not really sure if the handling done in ParseCastExpression is extensive enough. But IIUC, this should not cause any regressions, rather should result in missing preferred types just as before this patch ?

Exactly, no regressions are expected, we should start providing a preferred type in more places after this change.
This change is aiming to establish a well-defined pattern on how to add these types, more changes will follow to actually propagate the types in more cases.

lib/Sema/SemaCodeComplete.cpp
360

This was copy-pasted from the removed codeCompleteReturn. I'd keep it this way in the initial patch to make sure we don't change semantics in existing cases.

This seems to be introducing a requirement that enterUnknown is called on all paths through the parser where we recurse to parse a subexpression and don't have specific type information. That seems like an unfortunate requirement to me from a maintenance perspective; have you considered alternative approaches? For example, we could store the preferred type alongside the SourceLocation of the corresponding token (and propagate the information when we parse parentheses and any other completion-type-preserving construct), and only apply the information when the code completion token is at that location. That way, we only need to annotate cases where we know the preferred type, not when we don't.

Thanks for the suggestion, this should definitely work! I did struggle to figure out a way to do this without annotating every path with enterUnknown and failed.
I'll try playing around with your idea, my initial plan is to store preferred type alongside the current token as a member of the Parser class and update it when advancing to next token, when the parser backtracks and in the places where we care about propagating the types.

I'll try playing around with your idea, my initial plan is to store preferred type alongside the current token as a member of the Parser class and update it when advancing to next token, when the parser backtracks and in the places where we care about propagating the types.

ConsumeToken is a fairly hot function; if you can avoid changes there that'd be preferable. Tracking this automatically across tentative parse / rollback seems like a very nice idea.

  • Remove enterUnknown(), keep token location instead.
  • Inline update() and remove it
ilya-biryukov marked 3 inline comments as done.Jan 24 2019, 6:22 AM
ilya-biryukov added inline comments.
lib/Sema/SemaCodeComplete.cpp
373

FWIW I find ifs with return to be more readable than a bunch of if+else statements. The reason is that return clearly states function is done, while trailing else suggests it will go further.

But this particular function is short, so it's not a strong argument, can change it if you insist.

380

Probably shouldn't, but I can't be certain, I'm too lazy to inspect all code paths in the parser, so I'd simply rely on the types used there and assume this could happen.

I'll try adding an assertion and finding the example when it happens, though.

ConsumeToken is a fairly hot function; if you can avoid changes there that'd be preferable.

Done, there are no enterUnknown calls anymore and to avoid updating on each call to ConsumeToken we now store a location of the token we computed the expected type for.
The resulting code looks ok to my taste, let me know what you think.

Initially I thought this approach would allow to get rid of the RAII object to restore the type, but I don't see a good alternative to it even with the new approach.

  • Improve a comment
  • Use 'else if' to simplify the code
ilya-biryukov marked an inline comment as done.Jan 28 2019, 12:00 PM
ilya-biryukov added inline comments.
lib/Sema/SemaCodeComplete.cpp
373

On a second thought, I think my approach makes it uglier. The function is small, so the added "return R" everywhere only make it uglier.

Changed the code per your suggestions, thanks!

rsmith added inline comments.Jan 28 2019, 2:37 PM
clang/include/clang/Sema/Sema.h
319–346 ↗(On Diff #183937)

Now we store a location along with the expected type, I don't think we need an RAII object any more. (We would need to save/restore in tentative parsing, if we ever called any of the parsing functions that set a preferred type, but tentative parsing shouldn't be doing that.)

ilya-biryukov marked 2 inline comments as done.Jan 29 2019, 1:54 AM
ilya-biryukov added inline comments.
clang/include/clang/Sema/Sema.h
319–346 ↗(On Diff #183937)

No RAII objects anymore. By keeping them I intended to simplify the requirements on the callers that wouldn't need to worry about restoring the expected type after parsing parts of an expression.
In practice, it's not a big burden and the resulting code looks even more concise. There are surely some cases we're missing, but it's fine, we'll track them down later and missing expected type in some cases is totally ok.

ilya-biryukov marked an inline comment as done.
  • Get rid of the RAII object
kadircet accepted this revision.Jan 29 2019, 3:17 AM

LGTM from my side

This revision is now accepted and ready to land.Jan 29 2019, 3:17 AM
rsmith accepted this revision.Jan 29 2019, 3:45 PM
This revision was automatically updated to reflect the committed changes.