This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Generate better incomplete bigrams for the Dex index
AbandonedPublic

Authored by kbobyrev on Aug 14 2018, 4:50 AM.

Details

Reviewers
None
Summary

Currently, the query trigram generator would simply yield u_p trigram for the u_p query. This is not optimal, since the user is likely to try matching two heads with this query and this patch addresses the issue.

Diff Detail

Event Timeline

kbobyrev created this revision.Aug 14 2018, 4:50 AM
kbobyrev updated this revision to Diff 160555.Aug 14 2018, 4:51 AM

Treat leading underscores as additional signals and don't extract two heads in that case.

ioeric added inline comments.Aug 16 2018, 6:26 AM
clang-tools-extra/unittests/clangd/DexIndexTests.cpp
324

I'm not sure if this is correct. If users have explicitly typed _, they are likely to want a _ there. You mentioned in the patch summary that users might want to match two heads with this. Could you provide an example?

kbobyrev added inline comments.Aug 16 2018, 6:33 AM
clang-tools-extra/unittests/clangd/DexIndexTests.cpp
324

The particular example I had on my mind was unique_ptr. Effectively, if the query is SC then StrCat would be matched (because the incomplete trigram would be sc$ and two heads from the symbol identifier would also yield sc$), but for u_p, unique_ptr is currently not matched.

On the other hand, if there's something like m_field and user types mf or m_f it will be matched in both cases, because m_field yields mf$ in the index build stage, so this change doesn't decrease code completion quality for such cases.

ioeric added inline comments.Aug 16 2018, 6:50 AM
clang-tools-extra/unittests/clangd/DexIndexTests.cpp
324

The problem is that u_p can now match upXXX with up$, which might be a bit surprising for users.

It seems to me that another way way to handle this is to generate u_p trigram for unique_ptr. Have we considered including separators like _ in trigrams?

kbobyrev abandoned this revision.Aug 17 2018, 4:00 AM

As Eric suggested, such change might reduce the quality of code completion. A more generic approach is yet to be investigated and requires significantly more time resources, we should try it after we get other parts of the index to work as expected.

clang-tools-extra/unittests/clangd/DexIndexTests.cpp
324

I agree, this might be a better solution. Anyway, this seems that it's probably too early to optimize for such a general case like you suggested. I believe, word bounds might help here, but we should probably try them after we have a fully-functional index upstream.