Currently getting such completions requires source correction, reparsing and calling completion again. And if it shows no results and rollback is required then it costs one more reparse.
With this change it's possible to get all results which can be later filtered to split changes which require correction.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Sorry for not taking a look for a long time. I think that would be a very useful feature to have.
Here are a few comments (also see the inline comments)
- Please add tests
- How would the clients know how to correct dot to arrow? It'd be cool if code completion results to provide a replacement for that (e.g. return a FixItHint that should be applied along with the completion items that have RequiresDotToArrowCorrection == true).
- We should probably have a more general option, i.e. AllowCorrections. We could reuse it for other advanced completion scenarios.
- This patch calls HandleCodeCompleteResults twice and it breaks at least one existing client (clangd) and, arguably, makes the contract of CodeCompleteConsumer more complicated. I have a patch that addresses that: D42474. The change also moves some of the code to parser, arguably making the overall patch.
include/clang/Sema/CodeCompleteConsumer.h | ||
---|---|---|
493 ↗ | (On Diff #130384) | Maybe remove the default argument? |
595 ↗ | (On Diff #130384) | Maybe remove = false, initialize in constructor to be consistent with how other members are initialized? |
612 ↗ | (On Diff #130384) | Maybe remove the default argument too? |
include/clang/Sema/CodeCompleteOptions.h | ||
43 ↗ | (On Diff #130384) | Could we make a more general option (say, IncludeCorrections)? |
lib/Sema/SemaCodeComplete.cpp | ||
4066 ↗ | (On Diff #130384) | The code here seems to replicate what the parser does, so certifying that's it's correct is hard and we may not pick up the changes when something changes in parser. |
I also propose to try correcting in both directions, i.e. here's how completion completion could work with this feature:
struct X { int foobarbaz; }; void test() { X var; X* ptr; std::unique_ptr<X> uptr; var->foobar // replace with var.foobarbaz ptr.foobar // replace with ptr->foobar uptr->get // replace with uptr.get uptr.foobar // replace with uptr->foobarbaz }
@ilya-biryukov
Thanks a lot for you comments and for the provided code replacement. I'm checking now how it works and will continue addressing other comments.
Sorry for not getting back on this earlier.
I wanted to discuss whether returning correction as a enum value is a proper interface for users of libclang.
It seems easy to replace . with -> inside clang, properly handling all the tricky cases (tokens coming from macro substitution, etc) and making sure it's a viable option (again, macro substitutions may make it non-viable).
And even though it's not hard to implement things like "replace . with ->" for any particular editor, we still need to do that for every small correction that may we want to add to the completion in every editor.
I propose to return something similar to a FixItHint from Diagnostic.h, i.e. description of a simple text edit that should be applied before inserting the completion identifier. In that case, we may hope that every editor will implement the corrections once, and we could add more features into code completions that will be immediately picked up by all the editors that support it.
WDYT?
I've already added hints in this patch and it also do not add extra completions unless the flag IncludeCorrections is set. So this will not force editors to use corrections.
Did you mean that you think it's good to have extra fixit hints even if this flag is not set?
Or is your idea is to return the char sequence instead to use this correction in some universal way?
Exactly. Editors that implement corrections would pick up any new corrections automatically, rather than implementing each specific correction separately.
Sorry for the delay.
include/clang/Sema/CodeCompleteConsumer.h | ||
---|---|---|
565 ↗ | (On Diff #138139) | Having a string replacement without an actual range to replace still moves a lot of responsibility onto the clients. We should probably model corrections after the fix-its for diagnostics:
For the fix-its in the diagnostics, see CXLoadedDiagnostic.h for an interface exported via libclang and Diagnostic.h for an interface exported in C++ API. |
include/clang/Sema/CodeCompleteConsumer.h | ||
---|---|---|
565 ↗ | (On Diff #138139) | I thought that fixits always exist separately from diagnostics. I will check this out, thanks. |
include/clang/Sema/CodeCompleteConsumer.h | ||
---|---|---|
565 ↗ | (On Diff #138139) | I've looked into diagnostics and realized that i already use them here but the are just not forwarded to completion results which is of source possible. I have a draft with Optional<FixItHint> instead of Optional<std::string> Is that fine with you? |
Sorry for the late response, was on vacation.
include/clang/Sema/CodeCompleteConsumer.h | ||
---|---|---|
565 ↗ | (On Diff #138139) | Sounds good. |
Use wrapped FixItHint to keep corrections.
Can be quite easily changed from Optional to some container if required.
Important: please upload the patch with full context diff
include/clang-c/Index.h | ||
---|---|---|
5278 ↗ | (On Diff #143572) | This implies that "dot to arrow" is the only available correction. Maybe rephrase to mention that others are possible in theory? Whether to include completion items with corrections (small fix-its), e.g. change '.' to '->' on member access, etc. |
include/clang/Sema/CodeCompleteConsumer.h | ||
415 ↗ | (On Diff #143572) | Why do we need this wrapper? |
577 ↗ | (On Diff #143572) | Storing fix-its in CodeCompletionString seems like to be against its original intention, i.e. now it depends on SourceLocations, which require SourceManager, etc. |
704 ↗ | (On Diff #143572) | Maybe accept the vector by value instead of const reference to allow the clients to std::move the argument and avoid copies? |
include/clang-c/Index.h | ||
---|---|---|
5278 ↗ | (On Diff #143572) | thanks, forgot to change that one |
include/clang/Sema/CodeCompleteConsumer.h | ||
415 ↗ | (On Diff #143572) | My first version took source manager and language options from CXCodeCompleteResults (which is in fact AllocatedCXCodeCompleteResults) so I needed to pass it as an extra parameter which looks kind of ugly, The single CXCodeCompleteResult does not have them. But I can leave CXCodeCompleteResults pointer as a parameter to avoid this wrapper. And since it's a libclang design part it probably should be fixed/workarounded there. |
577 ↗ | (On Diff #143572) | CodeCompletionString is an abstraction used by libclang to store everything. So every completion detail getter in libclang works through it. CXCompletionResult stores only cursor kind and a pointer to that string. |
704 ↗ | (On Diff #143572) | but if it's accepted by value - it's one copy already by default Instead I can add one more constructor with rvalue reference. |
Could you upload the patch with full context?
include/clang/Sema/CodeCompleteConsumer.h | ||
---|---|---|
704 ↗ | (On Diff #143572) | If it's accepted by value, it's copy by default for l-values, but callers have an option to std::move explicitly. In general, I would still suggest passing by-value, unless it's particularly important to prohibit extra copies (e.g. for performance reasons, but I don't think it's the case here). |
600 ↗ | (On Diff #144076) | NIT: return llvm::ArrayRef<FixItHint> instead. |
1040 ↗ | (On Diff #144076) | Mention other corrections are possible in this comment too? |
include/clang/Sema/CodeCompleteOptions.h | ||
43 ↗ | (On Diff #144076) | Mention other corrections are possible in this comment too? |
Minor clean-up according to the last comments
include/clang/Sema/CodeCompleteConsumer.h | ||
---|---|---|
704 ↗ | (On Diff #143572) | Ok, it was always a bit unclear what compiler does in such cases but I've found other references that it's actually a move in that case. So it's changed to pass-by-value in the next diff update. |
1040 ↗ | (On Diff #144076) | Thanks! |
Sorry for the delay. I really like the direction of this patch!
What's left is defining the semantics of corrections more thoroughly to make sure we don't have tricky corner cases that users of the API can't deal with.
PS. This patch is still lacking full context of the diff.
Please use arc or send diff with full context.
include/clang-c/Index.h | ||
---|---|---|
5220 ↗ | (On Diff #144285) | I'm a bit vary about exposing source manager and language options in the API just for the sake of corrections. |
include/clang/Sema/CodeCompleteConsumer.h | ||
564 ↗ | (On Diff #144285) | Adding some docs here would be useful. These fix-its could be interpreted too broadly at this point. /// \brief FixIts that *must* be applied before inserting the text for the corresponding completion item. /// Completion items with non-empty fixits will not be returned by default, they should be explicitly requested by setting CompletionOptions::IncludeCorrections. /// For the editors to be able to compute position of the cursor for the completion item itself, the following conditions are guaranteed to hold for RemoveRange of the stored fixits: /// - Ranges in the fixits are guaranteed to never contain the completion point (or identifier under completion point, if any) inside them, except at the start or at the end of the range. /// - If a fixit range starts or ends with completion point (or starts or ends after the identifier under completion point), it will contain at least one character. It allows to unambiguously recompute completion point after applying the fixit. /// The intuition is that provided fixits change code around the identifier we complete, but are not allowed to touch the identifier itself or the completion point. /// One example of completion items with corrections are the ones replacing '.' with '->' and vice versa: /// std::unique_ptr<std::vector<int>> vec_ptr; /// vec_ptr.^ // completion returns an item 'push_back', replacing '.' with '->' /// vec_ptr->^ // completion returns an item 'release', replacing '->' with '.'let's put Do those invariants sound reasonable? Could we add asserts that they hold when constructing the completion results? |
565 ↗ | (On Diff #144285) | I wonder if we should call them Fixits instead? |
tools/c-index-test/c-index-test.c | ||
2346 ↗ | (On Diff #144285) | We should print all provided fixits in the test code |
2350 ↗ | (On Diff #144285) | maybe add a source range or source of the replaced text to the printf here? |
tools/libclang/CIndexCodeCompletion.cpp | ||
340 ↗ | (On Diff #144285) | Maybe return a struct containing both the string and CXSourceRange instead of this output parameter? |
include/clang-c/Index.h | ||
---|---|---|
5220 ↗ | (On Diff #144285) | Haha, I had that interface initially but thought it's too ugly and needs to be improved :) |
include/clang/Sema/CodeCompleteConsumer.h | ||
564 ↗ | (On Diff #144285) | Thanks! I usually feel the lack of patience when writing descriptions :) |
564 ↗ | (On Diff #144285) | "Could we add asserts that they hold when constructing the completion results" The current way of creating them actually assures that by default. And to check it once again it's required to change many things. |
565 ↗ | (On Diff #144285) | Ok, let's call them fixit-s |
include/clang-c/Index.h | ||
---|---|---|
5237 ↗ | (On Diff #145664) | This seems too large for a brief comment :-) |
include/clang/Sema/CodeCompleteConsumer.h | ||
564 ↗ | (On Diff #145664) | I suggest moving this doc comment to getFixits method, as it is the public interface. |
592 ↗ | (On Diff #145664) | We can't store fixits in CodeCompletionString, it should not contain source locatins, which are only valid for the lifetime of SourceManager. |
814 ↗ | (On Diff #145664) | We shouldn't duplicate such a large comment in too many places, it would be impossible to keep it in sync. |
564 ↗ | (On Diff #144285) |
It's still nice to have a sanity check there, mostly for the sake of new checks that are gonna get added.
Moreover, SourceRange doesn't have clear semantics AFAIK. It can either be a half-open or closed range. |
I have some failing tests... So I will update the diff a bit later (Friday most likely)
include/clang/Sema/CodeCompleteConsumer.h | ||
---|---|---|
592 ↗ | (On Diff #145664) | To support that I had to make libclang calls even worse but whatever. |
814 ↗ | (On Diff #145664) | I will keep the comment only here and in libclang |
564 ↗ | (On Diff #144285) | It's removed from CodeCompletionString therefore it's now only above the public declaration. |
Only keep small completion fixit-s in CodeCompletionResults.
Change libclang calls to overcome that and not use CXCompletionString
Sorry for keeping it that long.
FWIW, I think we keep jumping back and forth between the clang and libclang changes here.
I should've suggested splitting the change into two earlier.
I promise to promptly review the changes, let's try getting it in this week. I am very excited to have this feature!
lib/Sema/SemaCodeComplete.cpp | ||
---|---|---|
4107 ↗ | (On Diff #146289) | This FIXME should probably go into the CompletionResult constructor, we want to check the invariants in a single place. |
4118 ↗ | (On Diff #146289) | Why do we need to report an extra diagnostic here in completion? |
tools/libclang/CIndexCodeCompletion.cpp | ||
309 ↗ | (On Diff #146289) | Storing vector<vector<>> here makes looks like a hack. Even though it might seem more tricky, I suggest storing an opaque pointer to vector<FixItHint> in each CXCompletionResult. Managing the lifetime of vectors in the AllocatedCXCodeCompleteResults seems totally fine, but there should be a way to get to the fixits in a similar way we can get to the completion string. // === Index.h typedef void* CXCompletionFixIts; typedef struct { // ... CXCompletionFixIts FixIts; }; // Get the number of fix-its. unsigned clang_getCompletionNumFixIts(CXCompletionResult *Result); // ... Similar to getDiagnosticFixIt CXString clang_getCompletionFixIt((CXCompletionResult *Result, unsigned fixit_index, CXSourceRange *replacement_range); // === Impl.cpp struct AllocatedCXCodeCompleteResults : public CXCodeCompleteResults { // ..... // Pool for allocating non-empty fixit vectors in the CXCompletionResult. std::vector<std::vector<FixItHint>> FixItsVector }; unsigned clang_getCompletionNumFixIts(CXCompletionResult *Result) { auto* FixIts = static_cast<std::vector<FixItHint>*>(Result->FixIts); if (!FixIts) return 0; return FixIts->size(); } |
Next time I will do that from the beginning :)
tools/libclang/CIndexCodeCompletion.cpp | ||
---|---|---|
309 ↗ | (On Diff #146289) | unsigned clang_getCompletionNumFixIts(CXCompletionResult *Result); Do you mean appending CXCompletionResult struct with the "CXCompletionFixIts FixIts" member? Doesn't it break binary compatibility (changing the struct size)? |
tools/libclang/CIndexCodeCompletion.cpp | ||
---|---|---|
309 ↗ | (On Diff #146289) | Ah, you're right. If libclang promises binary compatibility with binaries compiled from headers with previous versions, we're not allowed to do this change. A possible alternative that won't break binary compatibility would be to change the opaque CXCompletionString to point into a struct that has both CodeCompletionString and the fixits vector, but that means finding all use-sites of CXCompletionString and that might be tricky. |
lib/Sema/SemaCodeComplete.cpp | ||
---|---|---|
4118 ↗ | (On Diff #146289) | In current version of this patch it's not needed anymore. Removed. |
tools/libclang/CIndexCodeCompletion.cpp | ||
309 ↗ | (On Diff #146289) | I already tried to have it in CXCompletionString. And I've even found and replaced all usages. But there's a bigger issue: CXCompletionString does not have dispose method and it exists not only in context of CXCompletionChunkKind (for example as a return value of clang_getCompletionChunkCompletionString). It's easy to add such dispose method but it will introduce leaks in already existing code bases. By all that I mean that "the interface you propose does what we want with the least amount of friction". |
Maybe move the tests back to this revision?
There are tests for code completion that don't depend on libindex or libclang and use clang -cc1 directly, i.e. tools/clang/test/CodeComplete. This would require adding a flag to clang and patching up PrintingCodeCompleConsumer to print the fix-its, but that should be easy. Moreover, it's useful to have the option to show those completions in clang anyway.
include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
5715 ↗ | (On Diff #146749) | This is not used anymore, right? Maybe remove it? |
include/clang/Sema/CodeCompleteConsumer.h | ||
807 ↗ | (On Diff #146749) | NIT: reflow on this comment broke formatting, maybe keep it unindented instead? I.e., this should give better formatting: /// std::unique_ptr<std::vector<int>> vec_ptr; /// vec_ptr.^ /// vec_ptr->^ /// /// In 'vec_ptr.^', one of completion items is 'push_back', it requires replacing '.' with '->'. /// In 'vec_ptr->^', one of completion items is 'release', it requires replacing '.' with '->'. |
lib/Sema/SemaCodeComplete.cpp | ||
3954 ↗ | (On Diff #146749) | Maybe pass a single fix-it here too with a more descriptive name? |
4022 ↗ | (On Diff #146749) | Maybe pass a single fix-it with more descriptive name, e.g. Optional<FixItHint> AccessOpFixIt? |
4105 ↗ | (On Diff #146749) | This FIXME should probably live in CodeCompletionResult constructor or ResultBuilder::AddResult. It's pretty obvious that this specific code does the right thing, but it may be non-obvious about more generic code that stores a vector<FixItHint> |
test/SemaCXX/member-expr.cpp | ||
193 ↗ | (On Diff #146749) | Is this still needed after we removed the diagnostics code? |
tools/libclang/CIndexCodeCompletion.cpp | ||
309 ↗ | (On Diff #146289) |
Yeah, that's probably too compilcated. And it does not make sense for completion strings inside chunks to have fix-its, the fix-its only make sense for the completion item as a whole. |
I will add more tests...
test/SemaCXX/member-expr.cpp | ||
---|---|---|
193 ↗ | (On Diff #146749) | Yes, even if it does not fail it is not related to that change anymore |
Append PrintingCodeCompleteConsumer and CompilerInvocation options, add CodeCompletion tests.
Sorry, a few more things and we're good to go.
include/clang/Driver/CC1Options.td | ||
---|---|---|
449 ↗ | (On Diff #147039) | NIT: maybe simplify to "which require small fix-its"? |
lib/Frontend/CompilerInvocation.cpp | ||
1541 ↗ | (On Diff #147039) | Following naming convention of other options, maybe use OPT_code_completion_with_fixits? (i.e. none of them start with include) |
lib/Sema/CodeCompleteConsumer.cpp | ||
559 ↗ | (On Diff #147039) | Unfortunately, that might not work for some ranges, see docs of CharSourceRange: /// In the token range case, the /// size of the last token must be measured to determine the actual end of the /// range. The TextDiagnostic::emitParseableFixits handles it, I suggest we do it similarly: FixItHint*I = //....; SourceLocation BLoc = I->RemoveRange.getBegin(); SourceLocation ELoc = I->RemoveRange.getEnd(); std::pair<FileID, unsigned> BInfo = SM.getDecomposedLoc(BLoc); std::pair<FileID, unsigned> EInfo = SM.getDecomposedLoc(ELoc); // Adjust for token ranges. if (I->RemoveRange.isTokenRange()) EInfo.second += Lexer::MeasureTokenLength(ELoc, SM, LangOpts); // We specifically do not do word-wrapping or tab-expansion here, // because this is supposed to be easy to parse. PresumedLoc PLoc = SM.getPresumedLoc(BLoc); if (PLoc.isInvalid()) break; OS << "fix-it:\""; OS.write_escaped(PLoc.getFilename()); OS << "\":{" << SM.getLineNumber(BInfo.first, BInfo.second) << ':' << SM.getColumnNumber(BInfo.first, BInfo.second) << '-' << SM.getLineNumber(EInfo.first, EInfo.second) << ':' << SM.getColumnNumber(EInfo.first, EInfo.second) << "}:\""; OS.write_escaped(I->CodeToInsert); OS << "\"\n"; |
lib/Sema/SemaCodeComplete.cpp | ||
4109 ↗ | (On Diff #147039) | I'd use token ranges here to avoid assumptions about sizes of tokens, e.g. CreateReplacemen(CharSourceRange::getTokenRange(OpLoc, OpLoc), IsArrow ? '.' : '->') |
lib/Sema/CodeCompleteConsumer.cpp | ||
---|---|---|
559 ↗ | (On Diff #147039) | yeah. sorry. it's always like that with source ranges. thanks for a hint |
lib/Sema/SemaCodeComplete.cpp | ||
4109 ↗ | (On Diff #147039) | the problem is that I need the range, not just a location.... and I don't know how to extract it here. is there a way to get next token location here? |
lib/Sema/SemaCodeComplete.cpp | ||
---|---|---|
4109 ↗ | (On Diff #147039) | IIUC, using CharSourceRange::getTokenRange(OpLoc, OpLoc) will do what you want. /// In the token range case, the /// size of the last token must be measured to determine the actual end of the /// range. |
lib/Sema/SemaCodeComplete.cpp | ||
---|---|---|
4109 ↗ | (On Diff #147039) | ok, i will recheck that it does calculate the token range |
I have only a few nits left, otherwise looks good. Thanks for making this change! Really excited for it to get in finally!
include/clang/Driver/CC1Options.td | ||
---|---|---|
449 ↗ | (On Diff #148164) | NIT: there's a redundant space before the closing . |
include/clang/Sema/CodeCompleteConsumer.h | ||
786 ↗ | (On Diff #148164) | NIT: remove \brief from the start of the comment. Those recently got removed from all of llvm |
1059 ↗ | (On Diff #148164) | NIT: another redundant \brief |
include/clang/Sema/CodeCompleteOptions.h | ||
42 ↗ | (On Diff #148164) | NIT: Maybe s/Show also results/Include results? |
lib/Sema/SemaCodeComplete.cpp | ||
4148 ↗ | (On Diff #148164) | NIT: maybe swap the two cases to do the non-fixit ones first? It is the most common case, so it should probably go first. |
lib/Sema/SemaCodeComplete.cpp | ||
---|---|---|
4148 ↗ | (On Diff #148164) | The actual completion should be the last because of the "Completion context". I've put them in this order intentionally. I will add comment about that. |
lib/Sema/SemaCodeComplete.cpp | ||
---|---|---|
4148 ↗ | (On Diff #148164) | I've rechecked this part - I can actually now move it without breaking anything |