Page MenuHomePhabricator

[clangd] Refactor code completion signal's utility properties.
Needs ReviewPublic

Authored by usaxena95 on May 6 2020, 9:22 AM.



Current implementation of heuristic-based scoring function also contains
computation of derived signals (e.g. whether name contains a word from
context, computing file distances, scope distances.)
This is an attempt to separate out the logic for computation of derived
signals from the scoring function.
This will allow us to have a clean API for scoring functions that will
take only concrete code completion signals as input.

This only refactors Name and ContextWords which are essentially used to
compute whether the name contains a context word or not. If this looks
good then this will be done for other properties that are more of a
utility (used to compute other signals) than a signal themselves.

Diff Detail

Event Timeline

usaxena95 created this revision.May 6 2020, 9:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2020, 9:22 AM
usaxena95 updated this revision to Diff 262852.May 8 2020, 3:40 AM
  • Added DerivedSignals struct containing all the derived signals.
  • Added NameMatchesContext and proximtiy signals to this struct.
  • We need to call computeDerivedSignals() before calling evaluate() if we set non-concrete utilites (e.g. ContextWords and Name).
  • This is logically equivalent to the previous version (both when the utilites are explicitly set and when the default signals are used).
  • The utilites are not marked as null in computeDerivedSignals. This is due to 2 reasons:
    • Current scoring function checks whether ScopeProximityMatch is set or not to decide whether to multiply with scopeProxitiyScore. Possible solutions:
      • Have scopeProxitiyScore as a derived signal itself.
      • Or have a different derived signal HasScopeProximityMatch.
    • Having these utilities available for debug purposes is great. We can try to compute other derived signals (e.g. ContextMatchesName) and test out it's value without even adding them concretely to clangd. Once their value is justified, we can add it to Quality/Relevance signals.
sammccall added inline comments.May 11 2020, 1:09 AM

why this change?


Why is it better to group the fields acconding to how they're used in the scoring function, rather than by what they mean?
(I find the new grouping harder to follow)


Can we make this Optional, so we can verify it gets computed? In fact, does it need to be a member at all, or can it just be transiently created while calling evaluate?


why must this be called explicitly rather than being computed by Evaluate?


Bad merge?