Currently, the query trigram generator would simply yield u_p trigram for the u_p query. This is not optimal, since the user is likely to try matching two heads with this query and this patch addresses the issue.
Details
- Reviewers
- None
Diff Detail
Event Timeline
Treat leading underscores as additional signals and don't extract two heads in that case.
clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
---|---|---|
324 | I'm not sure if this is correct. If users have explicitly typed _, they are likely to want a _ there. You mentioned in the patch summary that users might want to match two heads with this. Could you provide an example? |
clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
---|---|---|
324 | The particular example I had on my mind was unique_ptr. Effectively, if the query is SC then StrCat would be matched (because the incomplete trigram would be sc$ and two heads from the symbol identifier would also yield sc$), but for u_p, unique_ptr is currently not matched. On the other hand, if there's something like m_field and user types mf or m_f it will be matched in both cases, because m_field yields mf$ in the index build stage, so this change doesn't decrease code completion quality for such cases. |
clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
---|---|---|
324 | The problem is that u_p can now match upXXX with up$, which might be a bit surprising for users. It seems to me that another way way to handle this is to generate u_p trigram for unique_ptr. Have we considered including separators like _ in trigrams? |
As Eric suggested, such change might reduce the quality of code completion. A more generic approach is yet to be investigated and requires significantly more time resources, we should try it after we get other parts of the index to work as expected.
clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
---|---|---|
324 | I agree, this might be a better solution. Anyway, this seems that it's probably too early to optimize for such a general case like you suggested. I believe, word bounds might help here, but we should probably try them after we have a fully-functional index upstream. |
I'm not sure if this is correct. If users have explicitly typed _, they are likely to want a _ there. You mentioned in the patch summary that users might want to match two heads with this. Could you provide an example?