This is an archive of the discontinued LLVM Phabricator instance.

[clangd] New CC Ranking Model to fix bad inference due to overflow.
ClosedPublic

Authored by usaxena95 on Oct 8 2020, 4:25 AM.

Details

Summary

Unreachable file distances are represented as std::numeric_limits<unsigned>::max().
The previous dataset recorded these signals as signed int, so this default value
was captured as -1.

The dataset was regenerated and a new model was trained on it, so that the
unreachable value is interpreted as intended.
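The mismatch described above can be reproduced in a few lines. This is a minimal sketch, not code from the patch: kUnreachable, recordAsSigned and recordAsWide are hypothetical names, and it assumes a typical platform where unsigned is 32 bits and signed integers are two's complement.

```cpp
#include <cstdint>
#include <limits>

// Sentinel for an unreachable distance, as described in the summary.
// (The constant name is hypothetical, chosen for this illustration.)
constexpr unsigned kUnreachable = std::numeric_limits<unsigned>::max();

// Hypothetical helper: what a dataset that stores signals as signed int
// records. On two's complement platforms the sentinel wraps around to -1.
constexpr int recordAsSigned(unsigned Distance) {
  return static_cast<int>(Distance);
}

// Hypothetical helper: widening to 64 bits keeps the sentinel as a large
// positive value, so it still means "very far away".
constexpr std::int64_t recordAsWide(unsigned Distance) {
  return static_cast<std::int64_t>(Distance);
}
```

Under these assumptions, recordAsSigned(kUnreachable) yields -1, which sorts as closer than a distance of 0, whereas recordAsWide(kUnreachable) stays the largest possible distance. Ordinary small distances are unaffected by either conversion.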

Distribution of SymbolScopeDistance:

Value         Frequency(%)
0             46.6184
4294967295    29.5342
6             14.5666
4              6.4433
2              1.4534
8              0.5760
10             0.3581
....

Distribution of FileProximityDistance:

Value         Frequency(%)
4294967295    39.9378
12             5.1997
14             4.9828
15             4.4221
16             4.3820
13             4.2765
17             3.8957
11             3.6387
19             3.4799
18             3.4076
....

Event Timeline

usaxena95 created this revision. Oct 8 2020, 4:25 AM
Herald added a project: Restricted Project. Oct 8 2020, 4:25 AM
usaxena95 requested review of this revision. Oct 8 2020, 4:25 AM
usaxena95 updated this revision to Diff 296925. Oct 8 2020, 4:46 AM

Update model to LambdaMART instead of XE_NDCG.

usaxena95 edited the summary of this revision. Oct 8 2020, 4:58 AM
adamcz accepted this revision. Oct 8 2020, 6:01 AM
This revision is now accepted and ready to land. Oct 8 2020, 6:01 AM