By default, clangd scores a code completion item using a heuristics model.
Scoring can instead be done by a Decision Forest model by passing
--ranking_model=decision_forest to clangd.
Features omitted from the model:
- NameMatch is excluded because the final score must be multiplicative in NameMatch to allow rescoring by the editor.
- NeedsFixIts is excluded because generating a dataset that needs 'fixits' is non-trivial.
There are multiple ways (heuristics) to combine the above two features with the prediction of the DF:
- NeedsFixIts is used as is with a penalty of 0.5.
Various alternatives for combining NameMatch N and Decision Forest prediction P:
- N * scale(P, 0, 1): Linearly scale the output of the model to the range [0, 1].
- N * a^P:
  - More natural: the prediction of each decision tree can be considered a multiplicative boost (like NameMatch).
  - Ordering is independent of the absolute value of P. The ratio of two items' scores is proportional to a^{difference in model prediction score}. A higher a gives more weight to the model output relative to the NameMatch score.
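The two combination schemes above, together with the NeedsFixIts penalty, can be sketched roughly as follows. This is only an illustration: the function names combineScaled and combineExponential, and the MinP/MaxP bounds on the model output, are assumptions made for the example and are not part of clangd.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only; not clangd's actual API.

// Alternative 1: N * scale(P, 0, 1).
// MinP and MaxP are assumed, caller-supplied bounds on the model output.
inline float combineScaled(float NameMatch, float Prediction, float MinP,
                           float MaxP) {
  assert(MaxP > MinP && "bounds must describe a non-empty range");
  float Scaled = (Prediction - MinP) / (MaxP - MinP);
  // Clamp in case the prediction falls outside the assumed bounds.
  Scaled = std::min(1.0f, std::max(0.0f, Scaled));
  return NameMatch * Scaled;
}

// Alternative 2: N * a^P, with the NeedsFixIts penalty of 0.5 applied on top.
inline float combineExponential(float NameMatch, float Prediction, float Base,
                                bool NeedsFixIts) {
  float Score = NameMatch * std::pow(Base, Prediction);
  if (NeedsFixIts)
    Score *= 0.5f; // Penalty taken from the description above.
  return Score;
}
```

With the exponential form, two items with equal NameMatch and predictions 2 and 3 have the same score ratio (a) as two items with predictions 12 and 13, which is why the ordering does not depend on the absolute value of P.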
Baseline MRR = 0.619
MRR for various combinations:
N * P = 0.6346, advantage%=2.5768
N * 1.1^P = 0.6600, advantage%=6.6853
N * 1.2^P = 0.6669, advantage%=7.8005
N * 1.3^P = 0.6668, advantage%=7.7795
N * 1.4^P = 0.6659, advantage%=7.6270
N * 1.5^P = 0.6646, advantage%=7.4200
N * 1.6^P = 0.6636, advantage%=7.2671
N * 1.7^P = 0.6629, advantage%=7.1450
N * 2^P = 0.6612, advantage%=6.8673
N * 2.5^P = 0.6598, advantage%=6.6491
N * 3^P = 0.6590, advantage%=6.5242
N * scale(P, 0, 1) = 0.6465, advantage%=4.5054
Ideally we'd rename evaluate() here, since SymbolQualitySignals is used for both the heuristic and DecisionForest versions, but evaluate() is heuristic-specific. I think in a perfect world this would live outside the SymbolQualitySignals class (which would become just storage), but at least it should be renamed to evaluateUsingHeuristic().
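A rough sketch of the "perfect world" shape, with the signals struct reduced to storage and the heuristic moved into a free function. The two fields shown and the scoring formula are placeholders chosen for the example, not clangd's real signal set or heuristic.

```cpp
// Signals as plain storage. The fields are illustrative placeholders; the
// real struct carries more signals.
struct SymbolQualitySignals {
  bool Deprecated = false;
  unsigned References = 0;
};

// Heuristic-specific scoring lives outside the storage struct, so a
// Decision Forest scorer can consume the same signals independently.
// The formula below is a made-up placeholder, not clangd's heuristic.
inline float evaluateUsingHeuristic(const SymbolQualitySignals &S) {
  float Score = 1.0f;
  if (S.References >= 10)
    Score *= 2.0f; // Boost frequently referenced symbols.
  if (S.Deprecated)
    Score *= 0.1f; // Penalize deprecated symbols.
  return Score;
}
```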