Currently, the query trigram generator would simply yield u_p trigram for the u_p query. This is not optimal, since the user is likely to try matching two heads with this query and this patch addresses the issue.
Details
- Reviewers
- None
Diff Detail
Event Timeline
Treat leading underscores as additional signals and don't extract two heads in that case.
| clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
|---|---|---|
| 324 | I'm not sure if this is correct. If users have explicitly typed _, they are likely to want a _ there. You mentioned in the patch summary that users might want to match two heads with this. Could you provide an example? | |
| clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
|---|---|---|
| 324 | The particular example I had on my mind was unique_ptr. Effectively, if the query is SC then StrCat would be matched (because the incomplete trigram would be sc$ and two heads from the symbol identifier would also yield sc$), but for u_p, unique_ptr is currently not matched. On the other hand, if there's something like m_field and user types mf or m_f it will be matched in both cases, because m_field yields mf$ in the index build stage, so this change doesn't decrease code completion quality for such cases. | |
| clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
|---|---|---|
| 324 | The problem is that u_p can now match upXXX with up$, which might be a bit surprising for users. It seems to me that another way way to handle this is to generate u_p trigram for unique_ptr. Have we considered including separators like _ in trigrams? | |
As Eric suggested, such change might reduce the quality of code completion. A more generic approach is yet to be investigated and requires significantly more time resources, we should try it after we get other parts of the index to work as expected.
| clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
|---|---|---|
| 324 | I agree, this might be a better solution. Anyway, this seems that it's probably too early to optimize for such a general case like you suggested. I believe, word bounds might help here, but we should probably try them after we have a fully-functional index upstream. | |
I'm not sure if this is correct. If users have explicitly typed _, they are likely to want a _ there. You mentioned in the patch summary that users might want to match two heads with this. Could you provide an example?