Current implementation of heuristic-based scoring function also contains
computation of derived signals (e.g. whether name contains a word from
context, computing file distances, scope distances.)
This is an attempt to separate out the logic for computation of derived
signals from the scoring function.
This will allow us to have a clean API for scoring functions that will
take only concrete code completion signals as input.
Details
- Reviewers
sammccall
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
- Added DerivedSignals struct containing all the derived signals.
- Added NameMatchesContext and proximtiy signals to this struct.
- We need to call computeDerivedSignals() before calling evaluate() if we set non-concrete utilites (e.g. ContextWords and Name).
- This is logically equivalent to the previous version (both when the utilites are explicitly set and when the default signals are used).
- The utilites are not marked as null in computeDerivedSignals. This is due to 2 reasons:
- Current scoring function checks whether ScopeProximityMatch is set or not to decide whether to multiply with scopeProxitiyScore. Possible solutions:
- Have scopeProxitiyScore as a derived signal itself.
- Or have a different derived signal HasScopeProximityMatch.
- Having these utilities available for debug purposes is great. We can try to compute other derived signals (e.g. ContextMatchesName) and test out it's value without even adding them concretely to clangd. Once their value is justified, we can add it to Quality/Relevance signals.
- Current scoring function checks whether ScopeProximityMatch is set or not to decide whether to multiply with scopeProxitiyScore. Possible solutions:
clang-tools-extra/clangd/FindSymbols.cpp | ||
---|---|---|
112 ↗ | (On Diff #262852) | why this change? |
clang-tools-extra/clangd/Quality.h | ||
139 | Why is it better to group the fields acconding to how they're used in the scoring function, rather than by what they mean? | |
163 | Can we make this Optional, so we can verify it gets computed? In fact, does it need to be a member at all, or can it just be transiently created while calling evaluate? | |
165 | why must this be called explicitly rather than being computed by Evaluate? | |
clang-tools-extra/clangd/Quality.h.rej | ||
1 ↗ | (On Diff #262852) | Bad merge? |
Addressed comments.
clang-tools-extra/clangd/Quality.h | ||
---|---|---|
139 | I intended to separate out the concrete signals from properties/utilities used to calculate other derived signals. I agree the previous grouping made it makes it easier to follow the meaning of these. So reverted it. | |
165 | Now evaluate() calls this. |
Why is it better to group the fields acconding to how they're used in the scoring function, rather than by what they mean?
(I find the new grouping harder to follow)