This is an archive of the discontinued LLVM Phabricator instance.

Differential D96353

[clangd] Use ML Code completion ranking as default.
Closed · Public

Authored by usaxena95 on Feb 9 2021, 9:37 AM.


Details

Reviewers

hokein

Commits

rG438b5bb05a42: [clangd] Use ML Code completion ranking as default.

Summary

This makes code completion use a Decision Forest-based ranking algorithm to rank
completion candidates [estimated 6% accuracy boost]. This was previously hidden
behind the flag --ranking-model=decision_forest; this patch makes it the default
ranking algorithm.

Note: this is a generic model, not specialized for any particular
project. clangd does not collect or upload data to train code completion.

Also treat keywords separately, as they are not recorded by the training-set generator.
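How the final score is composed after this patch can be sketched as follows, based on the `evaluateDecisionForest` hunk in Quality.cpp: the forest prediction is exponentiated with `Base`, cases absent from the training data (`NeedsFixIts`, forbidden symbols and, with this patch, keywords) get hand-tuned multipliers, and `NameMatch` scales the result. The `Signals`, `Scores`, `Category` and `combine` names are hypothetical simplifications, not clangd's API, and `ForestOutput` merely stands in for the value returned by `Evaluate(E)`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical, simplified stand-ins for SymbolQualitySignals and
// SymbolRelevanceSignals; only the fields used below are modeled.
enum class Category { Function, Namespace, Keyword };

struct Signals {
  Category Cat = Category::Function;
  bool NeedsFixIts = false;
  bool Forbidden = false;
  float NameMatch = 1.0f;    // name-match score for the typed prefix
  float ForestOutput = 0.0f; // stands in for Evaluate(E): sum of tree scores
};

struct Scores {
  float ExcludingName = 0;
  float Total = 0;
};

// Follows the shape of evaluateDecisionForest after this patch.
Scores combine(const Signals &S, float Base) {
  assert(Base > 0 && "Base must be positive");
  Scores R;
  // pow(Base, t1 + t2 + ...) == pow(Base, t1) * pow(Base, t2) * ...,
  // so each tree contributes a multiplicative boost.
  R.ExcludingName = std::pow(Base, S.ForestOutput);
  // Cases missing from the training data get hand-tuned multipliers.
  if (S.NeedsFixIts)
    R.ExcludingName *= 0.5f;
  if (S.Forbidden)
    R.ExcludingName *= 0;
  if (S.Cat == Category::Keyword)
    R.ExcludingName *= 4;
  // NameMatch multiplies the total so results can be rescored by name.
  R.Total = S.NameMatch * R.ExcludingName;
  return R;
}
```

For example, with the same forest prediction and `Base = 1.3f` (an arbitrary example value), a keyword ends up with four times the total score of a non-keyword, a symbol needing fix-its with half, and a forbidden symbol with zero. Because the exponentiation turns a sum of per-tree scores into a product of per-tree factors, these multipliers and `NameMatch` compose on the same multiplicative scale.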

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

usaxena95 created this revision. · Feb 9 2021, 9:37 AM

Herald added subscribers: kadircet, arphaman. · Feb 9 2021, 9:37 AM

usaxena95 requested review of this revision. · Feb 9 2021, 9:37 AM

Herald added a project: Restricted Project. · Feb 9 2021, 9:37 AM

Herald added subscribers: cfe-commits, MaskRay, ilya-biryukov.

Harbormaster completed remote builds in B88482: Diff 322427. · Feb 9 2021, 10:23 AM

hokein added inline comments. · Feb 10 2021, 11:51 PM

clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp
650	sorry, I didn't infer the motivation of this change func->ns from this patch, could you explain?
650–656	looks like the Results here is not verified, as you remove the line below, is it intentional?

Addressed comments.

clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp
650	Sorry about not highlighting this. The ML model doesn't rank the function before the namespace. I removed this and made both of these as functions to independently validate the effect of references.
650–656	Oops. Thanks!

maybe add some data (improvement DecisionForest vs heuristic) in the patch description.

clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp
650	ah, I see, that makes sense, thanks!

This revision is now accepted and ready to land. · Feb 11 2021, 1:49 AM

Harbormaster completed remote builds in B88769: Diff 322922. · Feb 11 2021, 2:39 AM

Closed by commit rG438b5bb05a42: [clangd] Use ML Code completion ranking as default. (authored by usaxena95). · Mar 2 2021, 1:08 AM

This revision was automatically updated to reflect the committed changes.

usaxena95 added a commit: rG438b5bb05a42: [clangd] Use ML Code completion ranking as default.

usaxena95 edited the summary of this revision. · Mar 2 2021, 2:00 AM

usaxena95 added a reverting change: rG7f086d74c347: Revert "[clangd] Use ML Code completion ranking as default.". · Mar 2 2021, 6:05 AM

Revision Contents

Path                                                        Size
clang-tools-extra/clangd/CodeComplete.h                     2 lines
clang-tools-extra/clangd/Quality.cpp                        8 lines
clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp    14 lines

Diff 322922

clang-tools-extra/clangd/CodeComplete.h

@@ 127 unchanged lines not shown @@ struct CodeCompleteOptions {
   std::function<void(const CodeCompletion &, const SymbolQualitySignals &,
                      const SymbolRelevanceSignals &, float Score)>
       RecordCCResult;

   /// Model to use for ranking code completion candidates.
   enum CodeCompletionRankingModel {
     Heuristics,
     DecisionForest,
-  } RankingModel = Heuristics;
+  } RankingModel = DecisionForest;

   /// Callback used to score a CompletionCandidate if DecisionForest ranking
   /// model is enabled.
   /// This allows us to inject experimental models and compare them with
   /// baseline model using A/B testing.
   std::function<DecisionForestScores(
       const SymbolQualitySignals &, const SymbolRelevanceSignals &, float Base)>
       DecisionForestScorer = &evaluateDecisionForest;
@@ 180 unchanged lines not shown @@
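The `DecisionForestScorer` callback in CodeCompleteOptions exists so that experimental models can be injected and A/B-tested against the baseline. A minimal sketch of that pattern follows, with the default model set to `DecisionForest` as in this patch. The `Signals`, `ForestScores`, `Scorer`, `Options`, `baselineScorer` and `rank` names are hypothetical stand-ins for clangd's types (the real callback takes quality and relevance signals separately), and the `Base` value is only an example.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified stand-ins for clangd's signal/score types.
struct Signals {
  float Relevance = 0;
};
struct ForestScores {
  float Total = 0;
};

// Injectable scorer, analogous in shape to DecisionForestScorer.
using Scorer = std::function<ForestScores(const Signals &, float Base)>;

// Baseline model; plays the role of evaluateDecisionForest.
ForestScores baselineScorer(const Signals &S, float Base) {
  return ForestScores{S.Relevance * Base};
}

struct Options {
  enum RankingModel { Heuristics, DecisionForest } Model = DecisionForest;
  Scorer ForestScorer = &baselineScorer; // experiments may override this
  float Base = 1.3f;                     // example value only
};

float rank(const Options &Opts, const Signals &S) {
  if (Opts.Model == Options::Heuristics)
    return S.Relevance; // placeholder for the heuristic scoring path
  assert(Opts.ForestScorer && "DecisionForest ranking needs a scorer");
  return Opts.ForestScorer(S, Opts.Base).Total;
}
```

An experiment only needs to assign a different callable, e.g. `Opts.ForestScorer = [](const Signals &S, float Base) { return ForestScores{2 * S.Relevance * Base}; };`, while control runs keep the default; the ranking code itself does not change.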

clang-tools-extra/clangd/Quality.cpp

@@ 574 unchanged lines not shown @@ evaluateDecisionForest(const SymbolQualitySignals &Quality,
   E.setHadSymbolType(Relevance.HadSymbolType);
   E.setTypeMatchesPreferred(Relevance.TypeMatchesPreferred);

   DecisionForestScores Scores;
   // Exponentiating DecisionForest prediction makes the score of each tree a
   // multiplciative boost (like NameMatch). This allows us to weigh the
   // prediciton score and NameMatch appropriately.
   Scores.ExcludingName = pow(Base, Evaluate(E));
-  // NeedsFixIts is not part of the DecisionForest as generating training
-  // data that needs fixits is not-feasible.
+  // Following cases are not part of the generated training dataset:
+  //  - Symbols with `NeedsFixIts`.
+  //  - Forbidden symbols.
+  //  - Keywords: Dataset contains only macros and decls.
   if (Relevance.NeedsFixIts)
     Scores.ExcludingName *= 0.5;
   if (Relevance.Forbidden)
     Scores.ExcludingName *= 0;
+  if (Quality.Category == SymbolQualitySignals::Keyword)
+    Scores.ExcludingName *= 4;

   // NameMatch should be a multiplier on total score to support rescoring.
   Scores.Total = Relevance.NameMatch * Scores.ExcludingName;
   return Scores;
 }

 // Produces an integer that sorts in the same order as F.
 // That is: a < b <==> encodeFloat(a) < encodeFloat(b).
@@ 36 unchanged lines not shown @@
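The unchanged context in Quality.cpp mentions `encodeFloat`, which produces an integer that sorts in the same order as its float argument. A minimal sketch of one common way to do this follows; it is not necessarily clangd's exact implementation, and `encodeFloatSketch` is a hypothetical name. It relies on IEEE-754 floats comparing like sign-magnitude integers.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Maps a float to a uint32_t such that a < b implies
// encodeFloatSketch(a) < encodeFloatSketch(b) for all non-NaN inputs.
// Note: -0.0f and +0.0f compare equal as floats but get distinct codes.
uint32_t encodeFloatSketch(float F) {
  static_assert(std::numeric_limits<float>::is_iec559, "needs IEEE-754");
  static_assert(sizeof(float) == sizeof(uint32_t), "needs 32-bit floats");
  assert(!std::isnan(F) && "NaN has no meaningful order");
  uint32_t U;
  std::memcpy(&U, &F, sizeof U); // reinterpret the bits without UB
  constexpr uint32_t TopBit = uint32_t{1} << 31;
  if (U & TopBit)
    return ~U;       // negative: flip all bits, reversing magnitude order
  return U | TopBit; // non-negative: place above every negative value
}
```

Negative values land in the lower half of the integer range with their order reversed, and non-negative values land in the upper half. One possible use of such a key is building string sort keys, since fixed-width hex renderings of these integers sort lexicographically in the same order.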

clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp

@@ 641 unchanged lines not shown @@ auto Results = completions(
       R"cpp(
       void f() { ns::x^ }
   )cpp",
       {cls("ns::XYZ"), func("ns::foo")});
   EXPECT_THAT(Results.Completions, UnorderedElementsAre(Named("XYZ")));
 }

 TEST(CompletionTest, ReferencesAffectRanking) {
-  auto Results = completions("int main() { abs^ }", {ns("absl"), func("absb")});
-  EXPECT_THAT(Results.Completions,
-              HasSubsequence(Named("absb"), Named("absl")));
-  Results = completions("int main() { abs^ }",
-                        {withReferences(10000, ns("absl")), func("absb")});
-  EXPECT_THAT(Results.Completions,
-              HasSubsequence(Named("absl"), Named("absb")));
+  EXPECT_THAT(completions("int main() { abs^ }", {func("absA"), func("absB")})
+                  .Completions,
+              HasSubsequence(Named("absA"), Named("absB")));
+  EXPECT_THAT(completions("int main() { abs^ }",
+                          {func("absA"), withReferences(1000, func("absB"))})
+                  .Completions,
+              HasSubsequence(Named("absB"), Named("absA")));
 }

 TEST(CompletionTest, ContextWords) {
   auto Results = completions(R"cpp(
   enum class Color { RED, YELLOW, BLUE };

   // (blank lines so the definition above isn't "context")

@@ 2,450 unchanged lines not shown @@