This is an archive of the discontinued LLVM Phabricator instance.

Clang Code Completion Filtering

Authored by CrisCristescu on Mar 2 2016, 10:10 AM.


Group Reviewers
Restricted Project

The patch is enabling filtering for code completion.

The stem upon which the filtering is to be done is known at the moment of parsing.
A get/set function for the completion stem has been added to the Token class in order to save the stem as the identifier of the code_complete token type. Once the token has been set, it has to be passed through to the SemaCodeCompleteX functions, which subsequently passes it to the HandleCodeCompletionResults and which passes it to the PrintingCodeCompleteConsumer::ProcessCodeCompleteResults where the actual filtering is happening.

Diff Detail


Event Timeline

CrisCristescu retitled this revision from to Clang Code Completion Filtering .
CrisCristescu updated this object.
CrisCristescu added reviewers: Restricted Project, benlangmuir, akyrtzi.
CrisCristescu set the repository for this revision to rL LLVM.
CrisCristescu changed the visibility from "Public (No Login Required)" to "All Users".
CrisCristescu added a subscriber: cfe-commits.

I'm not yet acquainted enough with the code at hand, but I wonder whether I'm understanding the code correctly:

Could it be that you only filter before printing to the output stream? The other consumers, e.g. the one used by the clang-c API function clang_codeCompleteAt is thus not affected? That would be a shame, imo. This functionality should also be applied there, don't you think?

akyrtzi edited edge metadata.Mar 2 2016, 11:50 AM

We should not bake-in filtering into the clang CodeComplete API. What kind of filtering to use should be on a higher level, different clients may want different filtering, e.g. prefix filtering, fuzzy filtering, etc.

@akyrtzi raises a very valid point - I did not think about that. KDevelop uses the clang-c API and does fuzzy matching on top of the results, e.g. for camel-case matching. But, we currently always request code completion at a word start boundary so nothing would change for us. That said, I see how this patch could be seen as a breaking behavior change, and thus should probably only get enabled by an explicit option - if at all.

But, we currently always request code completion at a word start boundary so nothing would change for us. That said, I see how this patch could be seen as a breaking behavior change, and thus should probably only get enabled by an explicit option - if at all.

I actually think it is fine changing it so clang code-completion automatically adjusts to word boundary, I cannot think of a case where it is useful to not do that, or being aware of a client that does not already adjusts anyway.

CrisCristescu edited edge metadata.Mar 3 2016, 4:21 AM
CrisCristescu changed the visibility from "All Users" to "Public (No Login Required)".
CrisCristescu updated this revision to Diff 51404.EditedMar 23 2016, 5:58 AM

This is a new way of filtering which is could be less intrusive and has less changes.

The code_completion token now stores the information, if there is one,
about the identifier that it is lexing.
During the parsing there is a common entry point which is ConsumerToken
where we can set the information necessary for filtering in the Sema
in the case of a code_completion token.
The information is retrieved when doing the actual filtering.

This introduces a change in functionality which means that some tests
had to be modified(6 tests for ObjC) due to the fact that if you press
tab at the end of a word which could be a language specific keyword,
you will get the possible completions of this identifier,
not what could be written after the space. This is in accordance with the
behaviour of IDEs, for instance XCode.
As an example:
const<TAB> will only give completions for variables like constVar,
and if you have const<SPACE><TAB> the you get the possible completions:
const double, const int etc.

CrisCristescu added a comment.EditedMar 23 2016, 6:06 AM

For the filtering itself the motivation for not doing it on the client side is the following in our use case:

  • we do not re-filter on every key stroke, we only filter when the <TAB> key is pressed i.e there is one completion point;
  • we have a PCH which will help with the generation time;
  • performance is acceptable with the patch included: less than 1s even when the file contains several "heavier" includes such as <string> or <vector>;
  • without the patch if the <TAB> key is pressed in the middle of the identifier name (i.e setDouble) an error is thrown:
fiter-member-access.cpp:19:17: error: no member named 'set' in 'MyClass'
        ~~~~~~~~~~~~~  ^
  • and most importantly, we plan to propose another set of patches which would do lookup's only on subsets of the DeclContext of the identifier i.e incomplete lookup-s in order to reduce the amount of PCH deserializations.
benlangmuir edited edge metadata.Apr 26 2016, 11:28 AM

Overall approach seems good to me, but this needs tests. Other feedback inline.


I would expect the default to be false, since if you don't implement filtering presumably you want every result.

Also: Please fix alignment of the results parameter.


Can Declaration ever be legitimately null? We're going to unconditionally filter it out if so, which seems odd. Same applies to the other cases below.


Is this reachable? If not, consider using llvm_unreachable() instead of return.

karies added a subscriber: karies.Apr 28 2016, 12:10 PM
CrisCristescu edited edge metadata.

Addresses the previous comments and adds testing for filtering.
All the tests already in CodeCompletion are testing both the backward compatibility and the new functionality itself.

LGTM with one comment about the doxygen comments, but you should probably wait to hear from @akyrtzi too.


Do we really need a group here for just one method? If we do need the group, there should be a matching @}. And it's missing one of the slashes (should be /// @{).

karies added a comment.May 9 2016, 2:12 AM

Regarding the concerns raised by @akyrtzi : maybe it's a question of making the completer interface useful at low cost (with filtering for generic use cases and less ASTReading) versus not breaking existing use cases (fuzzy match). Would you still object to this change if the filtering could be disabled in the CodeCompleteConsumer?

v.g.vassilev added inline comments.
270 ↗(On Diff #56115)

Can you initialize this variable in the PP's ctor?

Thanks for making the changes; I'd recommend to go ahead and commit and we can iterate post-commit if necessary.

CrisCristescu marked 3 inline comments as done.

Addressing some sugesstions.

CrisCristescu marked an inline comment as done.

PP CodeCompletionII initialisation.

CrisCristescu marked an inline comment as done.

Some more cosmetics.

v.g.vassilev requested changes to this revision.Jul 26 2016, 8:18 AM
v.g.vassilev added a reviewer: v.g.vassilev.
v.g.vassilev added inline comments.
748 ↗(On Diff #65508)

It seems the indentation is too much here.

This revision now requires changes to proceed.Jul 26 2016, 8:18 AM
CrisCristescu edited edge metadata.

Wrong indentation update.

v.g.vassilev accepted this revision.Jul 27 2016, 7:13 AM
v.g.vassilev edited edge metadata.
This revision is now accepted and ready to land.Jul 27 2016, 7:13 AM
v.g.vassilev closed this revision.Jul 27 2016, 8:05 AM

Landed in r276878.

It appears that build fails due to this change :

llvm/tools/clang/lib/Sema/CodeCompleteConsumer.cpp:448:3: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]

default: llvm_unreachable("Unknown code completion result Kind.");