This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Boost code completion results that were named in the last few lines.
Closed, Public

Authored by sammccall on May 3 2019, 2:15 PM.

Details

Summary

The hope is that this will catch a few patterns involving repetition:

SomeClass* S = ^SomeClass::Create();

int getFrobnicator() { return ^frobnicator_; }

// discard the factory, it's no longer valid.
^MyFactory.reset();

Without triggering antipatterns too often:

return Point(x.first, x.^second);

I'm going to gather some data on whether this turns out to be a win overall.

With 3 lines of context, a minimum word length of 6 and a score boost of 2, this looks very positive, particularly when only the first couple of characters of the identifier have been typed.

Further tuning yields: 3 lines of context, a minimum word length of 4 and a score boost of 1.5. This is worth a total of ~1.5 points of MRR.
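The summary doesn't spell out exactly how a result counts as "named" in the context. Here is a minimal sketch of one reading, assuming that the context is the cursor's line up to the cursor plus the 3 lines above it, that context words are split at punctuation and case boundaries and lowercased, and that a candidate matches if its lowercased name contains any context word of at least 4 characters. All function and constant names are hypothetical; this is not clangd's actual code.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Tuned values from the summary; the constant names are hypothetical.
constexpr std::size_t ContextLines = 3;   // full lines above the cursor's line
constexpr std::size_t MinWordLength = 4;  // shorter words are ignored
constexpr float ContextBoost = 1.5f;      // score multiplier for a match

// Returns the text from the start of the line ContextLines lines above the
// cursor up to the cursor (assumption: the cursor line's prefix is included).
std::string contextBeforeCursor(const std::string &Content, std::size_t Offset) {
  std::size_t Begin = Offset, Newlines = 0;
  while (Begin > 0) {
    // Stop just after the (ContextLines + 1)-th newline before the cursor.
    if (Content[Begin - 1] == '\n' && ++Newlines > ContextLines)
      break;
    --Begin;
  }
  return Content.substr(Begin, Offset - Begin);
}

// Splits Text into lowercase words at non-alphanumeric characters and at case
// boundaries ("getFrobnicator" -> "get", "frobnicator"; "HTTPServer" ->
// "http", "server"), keeping only words of at least MinWordLength characters.
std::set<std::string> collectContextWords(const std::string &Text) {
  std::set<std::string> Words;
  std::string Word;
  auto Flush = [&] {
    if (Word.size() >= MinWordLength)
      Words.insert(Word);
    Word.clear();
  };
  for (std::size_t I = 0; I < Text.size(); ++I) {
    unsigned char C = static_cast<unsigned char>(Text[I]);
    if (!std::isalnum(C)) {
      Flush();  // punctuation, whitespace and '_' end the current word
      continue;
    }
    if (std::isupper(C) && I > 0) {
      unsigned char Prev = static_cast<unsigned char>(Text[I - 1]);
      bool NextLower = I + 1 < Text.size() &&
                       std::islower(static_cast<unsigned char>(Text[I + 1]));
      // Start a new word at "oB" in "fooBar" and at "PS" in "HTTPServer".
      if (std::islower(Prev) || std::isdigit(Prev) ||
          (std::isupper(Prev) && NextLower))
        Flush();
    }
    Word.push_back(static_cast<char>(std::tolower(C)));
  }
  Flush();
  return Words;
}

// Multiplies Score by ContextBoost if the lowercased Name contains any word.
float applyContextBoost(float Score, const std::string &Name,
                        const std::set<std::string> &Words) {
  std::string Lower;
  for (char C : Name)
    Lower.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(C))));
  for (const std::string &W : Words)
    if (Lower.find(W) != std::string::npos)
      return Score * ContextBoost;
  return Score;
}
```

Under these assumptions the summary's examples behave as intended: `getFrobnicator` yields the word `frobnicator` (`get` is too short), so `frobnicator_` is boosted, and the comment `discard the factory` boosts `MyFactory`. The antipattern shows up too: in `return Point(x.first, x.^second);` the word `first` boosts the wrong member while `second` gets nothing. Matching sub-words case-insensitively is what makes the first two cases work at all.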

==================================================================================================
                                        OVERALL (excl. CROSS_NAMESPACE and INITIALISMS)
==================================================================================================
  Total measurements: 78343 (-2)
  Average latency (ms): 104.989326477 (-30)
  All measurements:
	MRR: 71.09 (+1.44)	Top-1: 61.55% (+1.49%)	Top-5: 83.13% (+1.36%)	Top-100: 96.50% (+0.26%)
  Full identifiers:
	MRR: 97.58 (-0.05)	Top-1: 96.39% (-0.08%)	Top-5: 99.01% (-0.02%)	Top-100: 99.26% (-0.01%)
  Filter length 0-5:
	MRR:      33.37 (+3.88)		64.91 (+2.36)		72.87 (+1.36)		75.15 (+1.05)		76.37 (+0.77)		80.57 (+0.36)
	Top-1:    21.01% (+3.42%)		52.35% (+2.66%)		61.65% (+1.50%)		64.60% (+1.31%)		66.27% (+0.90%)		71.63% (+0.41%)
	Top-5:    48.00% (+4.63%)		81.18% (+2.03%)		87.19% (+0.90%)		88.67% (+0.73%)		89.31% (+0.53%)		92.00% (+0.30%)
	Top-100:  86.27% (+1.14%)		96.57% (+0.14%)		98.38% (+0.15%)		98.57% (+0.18%)		98.72% (+0.12%)		98.79% (-0.01%)
==================================================================================================
                                        INITIALISMS
==================================================================================================
  Total measurements: 11590 (+5)
  Average latency (ms): 88.707244873 (-37)
  All measurements:
	MRR: 84.16 (+1.11)	Top-1: 76.61% (+1.40%)	Top-5: 93.81% (+0.82%)	Top-100: 98.88% (+0.02%)
  Initialism length 2-4:
	MRR:      82.02 (+1.12)		87.05 (+1.35)		89.60 (+0.42)
	Top-1:    73.64% (+1.45%)		80.53% (+1.55%)		84.43% (+0.70%)
	Top-5:    92.78% (+0.68%)		95.37% (+1.38%)		95.91% (+0.17%)
	Top-100:  98.81% (+0.03%)		98.94% (+0.00%)		99.13% (+0.00%)
==================================================================================================
                                        DEFAULT
==================================================================================================
  Total measurements: 40199 (-4)
  Average latency (ms): 124.613945007 (-15)
  All measurements:
	MRR: 64.83 (+0.79)	Top-1: 54.77% (+0.77%)	Top-5: 77.60% (+0.98%)	Top-100: 94.17% (+0.27%)
  Full identifiers:
	MRR: 96.92 (-0.07)	Top-1: 95.68% (-0.10%)	Top-5: 98.54% (-0.03%)	Top-100: 98.98% (-0.02%)
  Filter length 0-5:
	MRR:      20.78 (+1.51)		54.44 (+1.45)		66.64 (+0.87)		70.51 (+0.77)		72.89 (+0.42)		74.97 (+0.47)
	Top-1:    11.34% (+0.81%)		40.37% (+1.39%)		53.71% (+0.87%)		58.68% (+1.13%)		61.92% (+0.67%)		64.73% (+0.58%)
	Top-5:    30.03% (+2.86%)		73.19% (+2.20%)		83.52% (+0.58%)		86.07% (+0.49%)		87.34% (+0.11%)		88.39% (+0.37%)
	Top-100:  75.72% (+1.52%)		94.08% (+0.06%)		97.58% (+0.08%)		97.93% (+0.14%)		98.21% (+0.02%)		98.24% (-0.02%)
==================================================================================================
                                        EXPLICIT_MEMBER_ACCESS
==================================================================================================
  Total measurements: 19778 (+37)
  Average latency (ms): 40.6332778931 (-66)
  All measurements:
	MRR: 68.45 (+3.27)	Top-1: 58.13% (+3.28%)	Top-5: 81.29% (+2.87%)	Top-100: 98.26% (+0.47%)
  Full identifiers:
	MRR: 97.12 (-0.03)	Top-1: 95.11% (-0.06%)	Top-5: 99.26% (+0.00%)	Top-100: 99.39% (+0.00%)
  Filter length 0-5:
	MRR:      33.01 (+8.42)		63.27 (+5.58)		67.75 (+3.36)		69.30 (+2.44)		69.89 (+2.07)		81.01 (+0.48)
	Top-1:    21.06% (+7.55%)		50.03% (+6.36%)		55.62% (+3.60%)		57.62% (+2.54%)		58.12% (+1.93%)		71.35% (+0.41%)
	Top-5:    46.60% (+9.39%)		80.34% (+3.59%)		83.20% (+2.44%)		84.23% (+1.87%)		84.63% (+1.85%)		93.19% (+0.41%)
	Top-100:  94.24% (+1.53%)		98.68% (+0.44%)		98.80% (+0.41%)		98.83% (+0.43%)		98.90% (+0.42%)		99.20% (+0.00%)
==================================================================================================
                                        WANT_LOCAL
==================================================================================================
  Total measurements: 18366 (-35)
  Average latency (ms): 131.339324951 (-24)
  All measurements:
	MRR: 87.63 (+0.93)	Top-1: 80.08% (+1.19%)	Top-5: 97.21% (+0.60%)	Top-100: 99.70% (-0.00%)
  Full identifiers:
	MRR: 99.37 (-0.03)	Top-1: 99.07% (-0.07%)	Top-5: 99.70% (-0.00%)	Top-100: 99.70% (-0.00%)
  Filter length 0-5:
	MRR:      59.13 (+4.23)		89.08 (+0.98)		92.10 (+0.37)		92.14 (+0.18)		92.30 (+0.11)		93.82 (-0.03)
	Top-1:    40.48% (+4.65%)		80.49% (+1.59%)		85.60% (+0.70%)		85.73% (+0.37%)		86.18% (+0.28%)		88.99% (+0.03%)
	Top-5:    85.63% (+3.58%)		99.22% (+0.07%)		99.53% (-0.00%)		99.49% (+0.04%)		99.51% (-0.00%)		99.40% (-0.00%)
	Top-100:  99.70% (-0.00%)		99.72% (-0.00%)		99.67% (-0.00%)		99.72% (-0.00%)		99.69% (-0.00%)		99.65% (-0.00%)
==================================================================================================
                                        CROSS_NAMESPACE
==================================================================================================
  Total measurements: 13706 (+14)
  Average latency (ms): 124.59098053 (-33)
  All measurements:
	MRR: 31.58 (+0.69)	Top-1: 23.53% (+0.33%)	Top-5: 40.30% (+1.03%)	Top-100: 75.27% (+1.06%)
  Full identifiers:
	MRR: 75.54 (-0.16)	Top-1: 67.59% (-0.11%)	Top-5: 84.60% (-0.62%)	Top-100: 99.27% (+0.00%)
  Filter length 0-5:
	MRR:      1.58 (-0.14)		12.86 (+1.25)		27.50 (+1.27)		29.13 (+1.15)		35.31 (+1.03)		40.81 (+0.50)
	Top-1:    0.68% (-0.24%)		6.27% (+0.53%)		16.53% (+0.43%)		19.06% (+0.76%)		25.32% (+0.80%)		30.18% (+0.24%)
	Top-5:    1.94% (-0.15%)		19.24% (+1.98%)		38.07% (+2.33%)		41.01% (+1.37%)		46.47% (+1.48%)		53.55% (+0.83%)
	Top-100:  14.48% (+2.03%)		64.48% (+0.96%)		87.65% (+0.95%)		85.47% (+1.86%)		90.30% (+1.16%)		89.96% (+0.36%)
==================================================================================================
                                        WITH EXPECTED_TYPE
==================================================================================================
  Total measurements: 34423 (-6)
  Average latency (ms): 101.443885803 (-35)
  All measurements:
	MRR: 74.04 (+1.79)	Top-1: 65.15% (+2.10%)	Top-5: 85.48% (+1.26%)	Top-100: 96.50% (+0.32%)
  Full identifiers:
	MRR: 94.95 (-0.08)	Top-1: 92.52% (-0.11%)	Top-5: 97.85% (-0.06%)	Top-100: 99.25% (-0.00%)
  Filter length 0-5:
	MRR:      41.75 (+4.21)		67.92 (+2.78)		77.86 (+1.50)		77.25 (+1.71)		77.68 (+1.34)		79.67 (+1.08)
	Top-1:    29.32% (+4.45%)		57.34% (+3.32%)		68.65% (+1.82%)		67.91% (+2.17%)		68.47% (+1.74%)		70.79% (+1.32%)
	Top-5:    57.76% (+3.62%)		81.88% (+2.37%)		89.68% (+0.78%)		89.34% (+0.92%)		89.41% (+0.66%)		91.07% (+0.68%)
	Top-100:  86.31% (+1.36%)		95.34% (+0.22%)		98.34% (+0.13%)		98.33% (+0.31%)		98.60% (+0.21%)		98.68% (+0.08%)
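For reference, MRR is the mean reciprocal rank of the expected completion (the tables appear to report it scaled to 0-100), and Top-k is the percentage of measurements where the expected completion ranked within the first k results. A minimal sketch of computing these from per-measurement ranks, assuming a rank of 0 encodes "not offered at all"; `RankingMetrics` and `computeMetrics` are hypothetical names, not the evaluation tool's API.

```cpp
#include <cstddef>
#include <vector>

// Summary metrics in the same units as the tables (all scaled to 0-100).
struct RankingMetrics {
  double MRR = 0;     // mean of 1/rank, times 100
  double Top1 = 0;    // % of measurements ranked first
  double Top5 = 0;    // % ranked within the first 5
  double Top100 = 0;  // % ranked within the first 100
};

// Ranks[i] is the 1-based position of the expected completion in
// measurement i, or 0 if it was not offered (a hypothetical encoding).
RankingMetrics computeMetrics(const std::vector<std::size_t> &Ranks) {
  RankingMetrics M;
  if (Ranks.empty())
    return M;
  for (std::size_t R : Ranks) {
    if (R == 0)
      continue;  // a miss contributes 0 to every metric
    M.MRR += 1.0 / static_cast<double>(R);
    if (R <= 1) M.Top1 += 1;
    if (R <= 5) M.Top5 += 1;
    if (R <= 100) M.Top100 += 1;
  }
  const double Scale = 100.0 / static_cast<double>(Ranks.size());
  M.MRR *= Scale;
  M.Top1 *= Scale;
  M.Top5 *= Scale;
  M.Top100 *= Scale;
  return M;
}
```

Under this definition, the OVERALL change of +1.44 MRR means the average reciprocal rank rose by about 0.0144.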

Event Timeline

sammccall created this revision. May 3 2019, 2:15 PM
Herald added a project: Restricted Project. May 3 2019, 2:15 PM
sammccall updated this revision to Diff 198129. May 4 2019, 1:35 AM

Tune magic numbers

sammccall edited the summary of this revision. May 4 2019, 1:41 AM
sammccall added reviewers: ilya-biryukov, gribozavr.
sammccall edited the summary of this revision. May 5 2019, 7:25 AM
sammccall updated this revision to Diff 198178. May 5 2019, 7:26 AM

Further tune numbers based on experiments.
Remove stopwords due to new threshold. Fix tests.

Nice! Looking at the numbers, the improvements look worthwhile.

The stopwords don't seem to be in the patch anymore, but they do seem like a good idea. Specifically, we might want to remove all keywords: the intuition is that they don't provide much value, because users have to write them regardless.
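A minimal sketch of the keyword-stopword idea, assuming context words have already been collected as lowercase strings. The keyword list is a small hypothetical sample, not a complete C++ keyword table, and the function names are illustrative only.

```cpp
#include <set>
#include <string>

// Hypothetical stopword list: a sample of C++ keywords that are at least
// 4 characters long (shorter ones would already be removed by a minimum
// word length of 4). Not a complete keyword table.
const std::set<std::string> &keywordStopwords() {
  static const std::set<std::string> Keywords = {
      "auto",   "bool",   "break",    "case",     "char",     "class",
      "const",  "continue", "default", "double",  "else",     "enum",
      "false",  "float",  "inline",   "long",     "return",   "static",
      "struct", "switch", "template", "true",     "typename", "unsigned",
      "void",   "while"};
  return Keywords;
}

// Returns Words with every keyword stopword removed.
std::set<std::string> dropKeywords(const std::set<std::string> &Words) {
  std::set<std::string> Result;
  for (const std::string &W : Words)
    if (keywordStopwords().count(W) == 0)
      Result.insert(W);
  return Result;
}
```

One caveat: if context words are split at case boundaries, a name like `SomeClass` contributes `class`, and this filter would drop it even though it came from an identifier. Filtering lexer tokens rather than raw words would avoid that.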

clangd/unittests/CodeCompleteTests.cpp
Line 25:

This include was probably added accidentally. Remove?

Line 28:

This include is redundant; maybe remove it? (It was surely added by clangd.)

clangd/unittests/SourceCodeTests.cpp
Line 25:

NIT: use ::testing to be consistent with the rest of the code in clangd?

sammccall updated this revision to Diff 198242. May 6 2019, 3:19 AM

address comments

sammccall updated this revision to Diff 198243. May 6 2019, 3:21 AM
sammccall marked 3 inline comments as done.

Comment about keywords

Re dropping keywords as stopwords: the simplest versions of this didn't show any measurable benefit. There may be gains to be had by running the real lexer; I'll leave that for future work.

This revision was not accepted when it landed; it landed in state Needs Review. May 6 2019, 3:25 AM
This revision was automatically updated to reflect the committed changes.

(Doh, sorry, I thought this was accepted. Happy to revert, or to try the lexer soon if you think it's important. Case-sensitive matching was still slightly negative, though within the noise.)

Nah, it's ok. LGTM