Page MenuHomePhabricator

Optionally add code completion results for arrow instead of dot
ClosedPublic

Authored by yvvan on Dec 22 2017, 3:00 AM.

Details

Summary

Currently getting such completions requires source correction, reparsing and calling completion again. And if it shows no results and rollback is required then it costs one more reparse.
With this change it's possible to get all results which can be later filtered to split changes which require correction.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
yvvan added inline comments.Apr 25 2018, 3:02 AM
include/clang-c/Index.h
5278 ↗(On Diff #143572)

thanks, forgot to change that one

include/clang/Sema/CodeCompleteConsumer.h
415 ↗(On Diff #143572)

My first version took source manager and language options from CXCodeCompleteResults (which is in fact AllocatedCXCodeCompleteResults) so I needed to pass it as an extra parameter which looks kind of ugly,

The single CXCodeCompleteResult does not have them. But I can leave CXCodeCompleteResults pointer as a parameter to avoid this wrapper.

And since it's a libclang design part it probably should be fixed/workarounded there.

577 ↗(On Diff #143572)

CodeCompletionString is an abstraction used by libclang to store everything. So every completion detail getter in libclang works through it.

CXCompletionResult stores only cursor kind and a pointer to that string.

704 ↗(On Diff #143572)

but if it's accepted by value - it's one copy already by default

Instead I can add one more constructor with rvalue reference.

yvvan updated this revision to Diff 144076.Apr 26 2018, 2:19 AM

Wrapper around FIxItHint is removed, other review comments are addressed .

Could you upload the patch with full context?

include/clang/Sema/CodeCompleteConsumer.h
704 ↗(On Diff #143572)

If it's accepted by value, it's copy by default for l-values, but callers have an option to std::move explicitly.
Having an r-value ref overload prevents from calling with l-values (e.g. local variables), i.e. requires either an explicitly copy to a temporary or std::move.

In general, I would still suggest passing by-value, unless it's particularly important to prohibit extra copies (e.g. for performance reasons, but I don't think it's the case here).

600 ↗(On Diff #144076)

NIT: return llvm::ArrayRef<FixItHint> instead.

1040 ↗(On Diff #144076)

Mention other corrections are possible in this comment too?

include/clang/Sema/CodeCompleteOptions.h
43 ↗(On Diff #144076)

Mention other corrections are possible in this comment too?

yvvan updated this revision to Diff 144285.Apr 26 2018, 11:51 PM
yvvan marked 4 inline comments as done.

Minor clean-up according to the last comments

include/clang/Sema/CodeCompleteConsumer.h
704 ↗(On Diff #143572)

Ok, it was always a bit unclear what compiler does in such cases but I've found other references that it's actually a move in that case.

So it's changed to pass-by-value in the next diff update.

1040 ↗(On Diff #144076)

Thanks!

Sorry for the delay. I really like the direction of this patch!
What's left is defining the semantics of corrections more thoroughly to make sure we don't have tricky corner cases that users of the API can't deal with.

PS. This patch is still lacking full context of the diff.
Please use arc or send diff with full context.

include/clang-c/Index.h
5220 ↗(On Diff #144285)

I'm a bit vary about exposing source manager and language options in the API just for the sake of corrections.
I suggest we add an extra parameter of an existing class (CXCodeCompleteResults) instead and remove the corresponding helpers to get source manager and language options:
clang_getCompletionNumCorrections(CXCompletionString completion_string, CXCodeCompleteResults* results);

include/clang/Sema/CodeCompleteConsumer.h
564 ↗(On Diff #144285)

Adding some docs here would be useful. These fix-its could be interpreted too broadly at this point.
I suggest the following semantics and the comment:

/// \brief FixIts that *must* be applied before inserting the text for the corresponding completion item.
/// Completion items with non-empty fixits will not be returned by default, they should be explicitly requested by setting CompletionOptions::IncludeCorrections.
/// For the editors to be able to compute position of the cursor for the completion item itself, the following conditions are guaranteed to hold for RemoveRange of the stored fixits:
///  - Ranges in the fixits are guaranteed to never contain the completion point (or identifier under completion point, if any) inside them, except at the start or at the end of the range.
///  - If a fixit range starts or ends with completion point (or starts or ends after the identifier under completion point), it will contain at least one character. It allows to unambiguously recompute completion point after applying the fixit.
/// The intuition is that provided fixits change code around the identifier we complete, but are not allowed to touch the identifier itself or the completion point.
/// One example of completion items with corrections are the ones replacing '.' with '->' and vice versa:
///      std::unique_ptr<std::vector<int>> vec_ptr;
///      vec_ptr.^  // completion returns an item 'push_back', replacing '.' with '->'
///      vec_ptr->^ // completion returns an item 'release', replacing '->' with '.'let's put

Do those invariants sound reasonable? Could we add asserts that they hold when constructing the completion results?

565 ↗(On Diff #144285)

I wonder if we should call them Fixits instead?
To follow the pattern for diagnostics.

tools/c-index-test/c-index-test.c
2346 ↗(On Diff #144285)

We should print all provided fixits in the test code

2350 ↗(On Diff #144285)

maybe add a source range or source of the replaced text to the printf here?
e.g. "replaces '.' with '->'"?

tools/libclang/CIndexCodeCompletion.cpp
340 ↗(On Diff #144285)

Maybe return a struct containing both the string and CXSourceRange instead of this output parameter?
It does not make sense to get a string without the corresponding source range.
How does libclang return fixits for diagnostics? We probably want to follow the same pattern here.

yvvan added inline comments.May 8 2018, 4:02 AM
include/clang-c/Index.h
5220 ↗(On Diff #144285)

Haha, I had that interface initially but thought it's too ugly and needs to be improved :)

include/clang/Sema/CodeCompleteConsumer.h
564 ↗(On Diff #144285)

Thanks! I usually feel the lack of patience when writing descriptions :)

564 ↗(On Diff #144285)

"Could we add asserts that they hold when constructing the completion results"

The current way of creating them actually assures that by default. And to check it once again it's required to change many things.
Starting with the fact that there is no SourceRange contains() methos or something similar,

565 ↗(On Diff #144285)

Ok, let's call them fixit-s

yvvan updated this revision to Diff 145664.May 8 2018, 4:03 AM

Address comments and provide diff with full context

ilya-biryukov added inline comments.May 8 2018, 8:18 AM
include/clang-c/Index.h
5237 ↗(On Diff #145664)

This seems too large for a brief comment :-)
Maybe break with a newline after the first sentence.

include/clang/Sema/CodeCompleteConsumer.h
564 ↗(On Diff #144285)

The current way of creating them actually assures that by default. And to check it once again it's required to change many things.

It's still nice to have a sanity check there, mostly for the sake of new checks that are gonna get added.
But even the current one is tricky, since it may actually contain a range that ends exactly at the cursor (when completing right after the dot/arrow foo->|).

Starting with the fact that there is no SourceRange contains() methos or something similar,

Moreover, SourceRange doesn't have clear semantics AFAIK. It can either be a half-open or closed range.
Let's add a FIXME, at least, that we should add those asserts later.

564 ↗(On Diff #145664)

I suggest moving this doc comment to getFixits method, as it is the public interface.

592 ↗(On Diff #145664)

We can't store fixits in CodeCompletionString, it should not contain source locatins, which are only valid for the lifetime of SourceManager.
Note that CCS has a lifetime independent of both the SourceManager and the AST.
Storing them in CodeCompletionResult should be good enough for all our use-cases.

814 ↗(On Diff #145664)

We shouldn't duplicate such a large comment in too many places, it would be impossible to keep it in sync.
I would suggest only keeping it in CodeCompletionResult and add a reference to it in other places.
libclang is a bit tricky, though. It is distributed without other LLVM headers, right?

yvvan marked 3 inline comments as done.May 9 2018, 7:12 AM

I have some failing tests... So I will update the diff a bit later (Friday most likely)

include/clang/Sema/CodeCompleteConsumer.h
564 ↗(On Diff #144285)

It's removed from CodeCompletionString therefore it's now only above the public declaration.

592 ↗(On Diff #145664)

To support that I had to make libclang calls even worse but whatever.

814 ↗(On Diff #145664)

I will keep the comment only here and in libclang

yvvan updated this revision to Diff 146289.May 11 2018, 2:23 AM
yvvan marked 3 inline comments as done.

Only keep small completion fixit-s in CodeCompletionResults.
Change libclang calls to overcome that and not use CXCompletionString

yvvan added a comment.EditedMay 14 2018, 3:37 AM

ping

I hope this review will be over some day :)

I hope this review will be over some day :)

Sorry for keeping it that long.
FWIW, I think we keep jumping back and forth between the clang and libclang changes here.
I should've suggested splitting the change into two earlier.

I promise to promptly review the changes, let's try getting it in this week. I am very excited to have this feature!

lib/Sema/SemaCodeComplete.cpp
4107 ↗(On Diff #146289)

This FIXME should probably go into the CompletionResult constructor, we want to check the invariants in a single place.

4118 ↗(On Diff #146289)

Why do we need to report an extra diagnostic here in completion?
Our completion items contain all the relevant information for actually doing the change, and having different diagnostics in completion vs non-completion modes does not seem like the right way to do things.

tools/libclang/CIndexCodeCompletion.cpp
309 ↗(On Diff #146289)

Storing vector<vector<>> here makes looks like a hack. Even though it might seem more tricky, I suggest storing an opaque pointer to vector<FixItHint> in each CXCompletionResult. Managing the lifetime of vectors in the AllocatedCXCodeCompleteResults seems totally fine, but there should be a way to get to the fixits in a similar way we can get to the completion string.
More concretely, I suggest the following API:

// === Index.h
typedef void* CXCompletionFixIts;
typedef struct {
   // ...
   CXCompletionFixIts FixIts;
};

// Get the number of fix-its.
unsigned clang_getCompletionNumFixIts(CXCompletionResult *Result);
// ... Similar to getDiagnosticFixIt
CXString clang_getCompletionFixIt((CXCompletionResult *Result, unsigned fixit_index, CXSourceRange *replacement_range);



// === Impl.cpp
struct AllocatedCXCodeCompleteResults : public CXCodeCompleteResults {
// .....
// Pool for allocating non-empty fixit vectors in the CXCompletionResult.
std::vector<std::vector<FixItHint>> FixItsVector
};

unsigned clang_getCompletionNumFixIts(CXCompletionResult *Result) {
  auto* FixIts = static_cast<std::vector<FixItHint>*>(Result->FixIts);
  if (!FixIts)
    return 0;
  return FixIts->size();
}
yvvan added a comment.May 14 2018, 5:49 AM

I should've suggested splitting the change into two earlier.

Next time I will do that from the beginning :)

tools/libclang/CIndexCodeCompletion.cpp
309 ↗(On Diff #146289)

unsigned clang_getCompletionNumFixIts(CXCompletionResult *Result);

Do you mean appending CXCompletionResult struct with the "CXCompletionFixIts FixIts" member? Doesn't it break binary compatibility (changing the struct size)?

ilya-biryukov added inline comments.May 14 2018, 6:46 AM
tools/libclang/CIndexCodeCompletion.cpp
309 ↗(On Diff #146289)

Ah, you're right. If libclang promises binary compatibility with binaries compiled from headers with previous versions, we're not allowed to do this change.
I'm not sure what's the best way to do that, probably the interface you propose does what we want with the least amount of friction.
I'd be fine with leaving as is, but given that libclang has a stable interface, I'd get more opinions from someone who owns the code before adding this. (Another reason to split this change into two?)

A possible alternative that won't break binary compatibility would be to change the opaque CXCompletionString to point into a struct that has both CodeCompletionString and the fixits vector, but that means finding all use-sites of CXCompletionString and that might be tricky.

yvvan updated this revision to Diff 146749.May 14 2018, 11:19 PM
yvvan marked 2 inline comments as done.

Only C++ part, libclang part has moved to https://reviews.llvm.org/D46862

yvvan added inline comments.May 14 2018, 11:22 PM
lib/Sema/SemaCodeComplete.cpp
4118 ↗(On Diff #146289)

In current version of this patch it's not needed anymore. Removed.

tools/libclang/CIndexCodeCompletion.cpp
309 ↗(On Diff #146289)

I already tried to have it in CXCompletionString. And I've even found and replaced all usages. But there's a bigger issue: CXCompletionString does not have dispose method and it exists not only in context of CXCompletionChunkKind (for example as a return value of clang_getCompletionChunkCompletionString). It's easy to add such dispose method but it will introduce leaks in already existing code bases.

By all that I mean that "the interface you propose does what we want with the least amount of friction".

Maybe move the tests back to this revision?
There are tests for code completion that don't depend on libindex or libclang and use clang -cc1 directly, i.e. tools/clang/test/CodeComplete. This would require adding a flag to clang and patching up PrintingCodeCompleConsumer to print the fix-its, but that should be easy. Moreover, it's useful to have the option to show those completions in clang anyway.

include/clang/Basic/DiagnosticSemaKinds.td
5715 ↗(On Diff #146749)

This is not used anymore, right? Maybe remove it?

include/clang/Sema/CodeCompleteConsumer.h
807 ↗(On Diff #146749)

NIT: reflow on this comment broke formatting, maybe keep it unindented instead? I.e., this should give better formatting:

///      std::unique_ptr<std::vector<int>> vec_ptr;
///      vec_ptr.^ 
///      vec_ptr->^
///
///  In 'vec_ptr.^', one of completion items is 'push_back', it requires replacing '.' with '->'.
///  In 'vec_ptr->^', one of completion items is 'release', it requires replacing '.' with '->'.
lib/Sema/SemaCodeComplete.cpp
3954 ↗(On Diff #146749)

Maybe pass a single fix-it here too with a more descriptive name?
It would make the code more readable.

4022 ↗(On Diff #146749)

Maybe pass a single fix-it with more descriptive name, e.g. Optional<FixItHint> AccessOpFixIt?

4105 ↗(On Diff #146749)

This FIXME should probably live in CodeCompletionResult constructor or ResultBuilder::AddResult. It's pretty obvious that this specific code does the right thing, but it may be non-obvious about more generic code that stores a vector<FixItHint>

test/SemaCXX/member-expr.cpp
193 ↗(On Diff #146749)

Is this still needed after we removed the diagnostics code?

tools/libclang/CIndexCodeCompletion.cpp
309 ↗(On Diff #146289)

I already tried to have it in CXCompletionString. And I've even found and replaced all usages. But there's a bigger issue: CXCompletionString does not have dispose method and it exists not only in context of CXCompletionChunkKind (for example as a return value of clang_getCompletionChunkCompletionString). It's easy to add such dispose method but it will introduce leaks in already existing code bases.

Yeah, that's probably too compilcated. And it does not make sense for completion strings inside chunks to have fix-its, the fix-its only make sense for the completion item as a whole.

yvvan added a comment.May 15 2018, 3:29 AM

I will add more tests...

test/SemaCXX/member-expr.cpp
193 ↗(On Diff #146749)

Yes, even if it does not fail it is not related to that change anymore

yvvan updated this revision to Diff 147039.May 16 2018, 3:51 AM
yvvan marked 4 inline comments as done.

Append PrintingCodeCompleteConsumer and CompilerInvocation options, add CodeCompletion tests.

Sorry, a few more things and we're good to go.

include/clang/Driver/CC1Options.td
449 ↗(On Diff #147039)

NIT: maybe simplify to "which require small fix-its"?

lib/Frontend/CompilerInvocation.cpp
1541 ↗(On Diff #147039)

Following naming convention of other options, maybe use OPT_code_completion_with_fixits? (i.e. none of them start with include)

lib/Sema/CodeCompleteConsumer.cpp
559 ↗(On Diff #147039)

Unfortunately, that might not work for some ranges, see docs of CharSourceRange:

/// In the token range case, the
/// size of the last token must be measured to determine the actual end of the
/// range.

The TextDiagnostic::emitParseableFixits handles it, I suggest we do it similarly:

FixItHint*I = //....;
SourceLocation BLoc = I->RemoveRange.getBegin();
SourceLocation ELoc = I->RemoveRange.getEnd();

std::pair<FileID, unsigned> BInfo = SM.getDecomposedLoc(BLoc);
std::pair<FileID, unsigned> EInfo = SM.getDecomposedLoc(ELoc);

// Adjust for token ranges.
if (I->RemoveRange.isTokenRange())
  EInfo.second += Lexer::MeasureTokenLength(ELoc, SM, LangOpts);

// We specifically do not do word-wrapping or tab-expansion here,
// because this is supposed to be easy to parse.
PresumedLoc PLoc = SM.getPresumedLoc(BLoc);
if (PLoc.isInvalid())
  break;

OS << "fix-it:\"";
OS.write_escaped(PLoc.getFilename());
OS << "\":{" << SM.getLineNumber(BInfo.first, BInfo.second)
  << ':' << SM.getColumnNumber(BInfo.first, BInfo.second)
  << '-' << SM.getLineNumber(EInfo.first, EInfo.second)
  << ':' << SM.getColumnNumber(EInfo.first, EInfo.second)
  << "}:\"";
OS.write_escaped(I->CodeToInsert);
OS << "\"\n";
lib/Sema/SemaCodeComplete.cpp
4109 ↗(On Diff #147039)

I'd use token ranges here to avoid assumptions about sizes of tokens, e.g. CreateReplacemen(CharSourceRange::getTokenRange(OpLoc, OpLoc), IsArrow ? '.' : '->')
There are complicated cases like \ that end of the line and macros and it's definitely better to use an abstraction that hides those cases.

yvvan added inline comments.May 22 2018, 2:42 AM
lib/Sema/CodeCompleteConsumer.cpp
559 ↗(On Diff #147039)

yeah. sorry. it's always like that with source ranges. thanks for a hint

lib/Sema/SemaCodeComplete.cpp
4109 ↗(On Diff #147039)

the problem is that I need the range, not just a location.... and I don't know how to extract it here. is there a way to get next token location here?

ilya-biryukov added inline comments.May 22 2018, 2:58 AM
lib/Sema/SemaCodeComplete.cpp
4109 ↗(On Diff #147039)

IIUC, using CharSourceRange::getTokenRange(OpLoc, OpLoc) will do what you want.
When CharSourceRange is a token range, the client code should measurements the token length and the producing code would use the starting location of the last token.
Here are the relevant bits from the comment of the CharSourceRange:

/// In the token range case, the
/// size of the last token must be measured to determine the actual end of the
/// range.
yvvan added inline comments.May 22 2018, 3:07 AM
lib/Sema/SemaCodeComplete.cpp
4109 ↗(On Diff #147039)

ok, i will recheck that it does calculate the token range

yvvan updated this revision to Diff 148164.May 23 2018, 12:06 AM
yvvan marked 4 inline comments as done.

I have only a few nits left, otherwise looks good. Thanks for making this change! Really excited for it to get in finally!

include/clang/Driver/CC1Options.td
449 ↗(On Diff #148164)

NIT: there's a redundant space before the closing .

include/clang/Sema/CodeCompleteConsumer.h
786 ↗(On Diff #148164)

NIT: remove \brief from the start of the comment. Those recently got removed from all of llvm

1059 ↗(On Diff #148164)

NIT: another redundant \brief

include/clang/Sema/CodeCompleteOptions.h
42 ↗(On Diff #148164)

NIT: Maybe s/Show also results/Include results?

lib/Sema/SemaCodeComplete.cpp
4148 ↗(On Diff #148164)

NIT: maybe swap the two cases to do the non-fixit ones first? It is the most common case, so it should probably go first.

yvvan marked 4 inline comments as done.May 24 2018, 11:31 PM
yvvan added inline comments.
lib/Sema/SemaCodeComplete.cpp
4148 ↗(On Diff #148164)

The actual completion should be the last because of the "Completion context". I've put them in this order intentionally. I will add comment about that.

yvvan marked an inline comment as done.May 25 2018, 12:27 AM
yvvan added inline comments.
lib/Sema/SemaCodeComplete.cpp
4148 ↗(On Diff #148164)

I've rechecked this part - I can actually now move it without breaking anything

yvvan updated this revision to Diff 148553.May 25 2018, 12:29 AM

NIT-s addressed

ilya-biryukov accepted this revision.May 25 2018, 2:43 AM

Thanks for addressing the NITs. LGTM!

This revision is now accepted and ready to land.May 25 2018, 2:43 AM
This revision was automatically updated to reflect the committed changes.