[clangd] Use Decision Forest to score code completions.

Authored by usaxena95 on Sep 21 2020, 10:56 PM.


By default, clangd scores code completion items using a heuristics model.

Scoring can instead be done by a Decision Forest model by passing --ranking_model=decision_forest to clangd.

Features omitted from the model:

  • NameMatch is excluded because the final score must be multiplicative in NameMatch to allow rescoring by the editor.
  • NeedsFixIts is excluded because generating a training dataset of completions that need fix-its is non-trivial.

There are multiple (heuristic) ways to combine these two features with the decision forest prediction:

  • NeedsFixIts is used as-is: the score of an item that needs fix-its is multiplied by a penalty of 0.5.

Alternatives for combining the NameMatch score N with the decision forest prediction P:

  • N * scale(P, 0, 1): linearly scale the model's output to the range [0, 1].
  • N * a^P:
    • More natural: the prediction of each decision tree can be treated as a multiplicative boost (like NameMatch), since P is the sum of the tree predictions and a^(p1 + p2 + ...) = a^p1 * a^p2 * ....
    • The ordering does not depend on the absolute value of P: the ratio of two items' scores is (N1 / N2) * a^(P1 - P2), so only the difference in model predictions matters. A higher a gives the model output more weight relative to the NameMatch score.
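The following Python sketch illustrates the two combinations and the properties claimed for N * a^P. It is not clangd's implementation: the helper names (forest_prediction, score_exponential, score_scaled), the assumption that P is the sum of per-tree outputs, the default base of 1.2, and the bounds used for linear scaling are all hypothetical choices for illustration.

```python
import math


def forest_prediction(tree_outputs):
    """Raw forest prediction P, assumed here to be the sum of per-tree outputs."""
    return sum(tree_outputs)


def score_exponential(name_match, prediction, base=1.2):
    """N * a^P: the prediction acts as a multiplicative boost on NameMatch."""
    return name_match * base ** prediction


def score_scaled(name_match, prediction, p_min, p_max):
    """N * scale(P, 0, 1): P is mapped linearly from [p_min, p_max] to [0, 1].

    p_min and p_max are hypothetical bounds (e.g. taken over the candidates).
    """
    return name_match * (prediction - p_min) / (p_max - p_min)


base = 1.2
trees = [0.4, -0.1, 0.7]

# a^(p1 + p2 + ...) equals a^p1 * a^p2 * ...: each tree acts as its own boost.
as_sum = base ** forest_prediction(trees)
as_product = math.prod(base ** t for t in trees)
assert math.isclose(as_sum, as_product)

# For items a and b the score ratio is (N_a / N_b) * base^(P_a - P_b), so
# adding the same constant to every prediction leaves the ratio unchanged.
n_a, p_a = 0.9, 2.0
n_b, p_b = 0.6, 3.5
ratio = score_exponential(n_a, p_a, base) / score_exponential(n_b, p_b, base)
shifted = (score_exponential(n_a, p_a + 10, base)
           / score_exponential(n_b, p_b + 10, base))
assert math.isclose(ratio, shifted)
assert math.isclose(ratio, (n_a / n_b) * base ** (p_a - p_b))

# With linear scaling, the winner between the same two items depends on p_min.
narrow = score_scaled(n_a, p_a, 0.0, 4.0) > score_scaled(n_b, p_b, 0.0, 4.0)
wide = score_scaled(n_a, p_a, -10.0, 4.0) > score_scaled(n_b, p_b, -10.0, 4.0)
assert narrow != wide
```

The last check shows why linear scaling is more fragile: moving the hypothetical lower bound p_min from 0 to -10 changes which item ranks first, whereas the exponential form's ranking depends only on N and the difference in P.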

Baseline MRR (mean reciprocal rank) = 0.619

MRR for various combinations:

Combination         MRR      Advantage over baseline (%)
N * P               0.6346   2.5768
N * 1.1^P           0.6600   6.6853
N * 1.2^P           0.6669   7.8005
N * 1.3^P           0.6668   7.7795
N * 1.4^P           0.6659   7.6270
N * 1.5^P           0.6646   7.4200
N * 1.6^P           0.6636   7.2671
N * 1.7^P           0.6629   7.1450
N * 2^P             0.6612   6.8673
N * 2.5^P           0.6598   6.6491
N * 3^P             0.6590   6.5242
N * scale(P, 0, 1)  0.6465   4.5054
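The sketch below shows how MRR and a relative improvement percentage can be computed. Reading "advantage" as the relative MRR gain over the baseline is an assumption; under that reading, recomputing from the rounded baseline of 0.619 gives values roughly 0.06 percentage points lower than those listed (about 7.74% instead of 7.8005% for N * 1.2^P), which is consistent with the baseline having been rounded. The function names and the toy ranks are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Mean reciprocal rank over a set of completion queries.

    Each entry is the 1-based position of the expected item in the ranked
    list, or None if it did not appear (which contributes 0).
    """
    if not ranks:
        raise ValueError("at least one query is required")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def relative_improvement_percent(mrr, baseline_mrr):
    """Relative gain of `mrr` over `baseline_mrr`, in percent."""
    return (mrr / baseline_mrr - 1.0) * 100.0


# Toy example: four queries whose expected items ranked 1st, 2nd, 4th,
# and not at all.
print(mean_reciprocal_rank([1, 2, 4, None]))  # 0.4375

# Relative gain of the N * 1.2^P row over the rounded baseline.
print(round(relative_improvement_percent(0.6669, 0.619), 2))  # 7.74
```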

Differential Revision: https://reviews.llvm.org/D88281


Committed by usaxena95 on Sep 28 2020, 9:59 AM.