This is an archive of the discontinued LLVM Phabricator instance.

Differential D79500

[clangd] Refactor code completion signal's utility properties.
ClosedPublic

Authored by usaxena95 on May 6 2020, 9:22 AM.

Details

Reviewers

sammccall

Summary

The current implementation of the heuristic-based scoring function also computes
derived signals (e.g. whether the name contains a word from the context, file
distances, and scope distances).
This is an attempt to separate the computation of derived signals from the
scoring function.
This will give us a clean API for scoring functions that take only concrete
code completion signals as input.
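The separation described in the summary can be illustrated with a minimal, self-contained sketch: one step turns raw inputs into concrete values, and a second, pure step scores them. All names here (`ConcreteSignals`, `Utilities`, `deriveSignals`, `score`, `kUnreachable`) are hypothetical and far simpler than clangd's `SymbolRelevanceSignals`. The `exp(-0.2 * distance)` curve assumes `FileDistanceOptions().UpCost` is 2, which matches the score table in the diff, and the word match is case-sensitive for brevity, unlike clangd's `contains_lower`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for FileDistance::Unreachable.
constexpr unsigned kUnreachable = std::numeric_limits<unsigned>::max();

// Concrete signals: plain values the scoring function consumes directly.
struct ConcreteSignals {
  float NameMatch = 1.0f;
  bool NameMatchesContext = false;
  unsigned FileProximityDistance = kUnreachable;
};

// Raw inputs ("utilities") that derived signals are computed from.
struct Utilities {
  std::string Name;
  std::set<std::string> ContextWords;
  std::optional<unsigned> FileDistance; // assumed to be looked up elsewhere
};

// Derivation step: all string matching and distance lookups live here.
ConcreteSignals deriveSignals(const Utilities &U, float NameMatch) {
  ConcreteSignals S;
  S.NameMatch = NameMatch;
  for (const std::string &Word : U.ContextWords)
    if (U.Name.find(Word) != std::string::npos) // case-sensitive for brevity
      S.NameMatchesContext = true;
  if (U.FileDistance)
    S.FileProximityDistance = *U.FileDistance;
  return S;
}

// Scoring step: a pure function of concrete signals only.
float score(const ConcreteSignals &S) {
  float Score = S.NameMatch;
  float FileScore = S.FileProximityDistance == kUnreachable
                        ? 0.0f
                        : std::exp(S.FileProximityDistance * -0.2f);
  Score *= 1 + 2 * FileScore; // multiplier in [1, 3]
  if (S.NameMatchesContext)
    Score *= 1.5f;
  return Score;
}
```

Because `score()` reads only plain values, it can be tested or replaced by another scoring function without touching the derivation code.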

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

usaxena95 created this revision. May 6 2020, 9:22 AM

Herald added a project: Restricted Project. May 6 2020, 9:22 AM

Herald added subscribers: cfe-commits, kadircet, arphaman and 3 others.

Harbormaster failed remote builds in B55942: Diff 262400! May 6 2020, 10:13 AM

Added a DerivedSignals struct containing all the derived signals.
Added NameMatchesContext and the proximity signals to this struct.
computeDerivedSignals() must be called before evaluate() whenever non-concrete utilities (e.g. ContextWords and Name) are set.
This is logically equivalent to the previous version (both when the utilities are explicitly set and when the default signals are used).
computeDerivedSignals() does not reset the utilities to null, for two reasons:
- The current scoring function checks whether ScopeProximityMatch is set to decide whether to multiply by scopeProximityScore. Possible solutions:
  - Make scopeProximityScore a derived signal itself.
  - Or add a separate derived signal, HasScopeProximityMatch.
- Having these utilities available is useful for debugging. We can compute other derived signals (e.g. ContextMatchesName) and test their value without adding them concretely to clangd. Once their value is justified, we can add them to the Quality/Relevance signals.
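The second option listed above, a separate `HasScopeProximityMatch` derived signal, could look roughly like the following sketch. It is hypothetical: `DerivedScopeSignals`, `deriveScope`, `scopeMultiplier`, and `kUnreachable` are illustrative names, `std::nullopt` stands in for a null `ScopeProximityMatch`, and the scope distance is assumed to be already computed. Only the `scopeProximityScore` curve is copied from the landed diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical stand-in for FileDistance::Unreachable.
constexpr unsigned kUnreachable = std::numeric_limits<unsigned>::max();

// Same curve as scopeProximityScore() in the landed diff; range [0.6, 2].
float scopeProximityScore(unsigned ScopeDistance) {
  if (ScopeDistance == kUnreachable)
    return 0.6f;
  return std::max(0.65, 2.0 * std::pow(0.6, ScopeDistance / 2.0));
}

// Hypothetical derived signals with an explicit HasScopeProximityMatch flag,
// so scoring never needs to inspect the ScopeProximityMatch utility itself.
struct DerivedScopeSignals {
  bool HasScopeProximityMatch = false;
  unsigned ScopeProximityDistance = kUnreachable;
};

// Derivation: std::nullopt plays the role of a null ScopeProximityMatch.
DerivedScopeSignals deriveScope(std::optional<unsigned> IndexScopeDistance) {
  DerivedScopeSignals D;
  if (IndexScopeDistance) {
    D.HasScopeProximityMatch = true;
    D.ScopeProximityDistance = *IndexScopeDistance;
  }
  return D;
}

// Scoring: the multiplier depends only on the derived, concrete values.
float scopeMultiplier(const DerivedScopeSignals &D, bool SemaSaysInScope) {
  if (!D.HasScopeProximityMatch)
    return 1.0f;
  return SemaSaysInScope ? 2.0f
                         : scopeProximityScore(D.ScopeProximityDistance);
}
```

With the flag carried in the derived signals, the utilities could be nulled out after derivation without changing the score, which removes the obstacle described in the first bullet.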

Harbormaster completed remote builds in B56148: Diff 262852. May 8 2020, 5:19 AM

sammccall added inline comments. May 11 2020, 1:09 AM

clang-tools-extra/clangd/FindSymbols.cpp
112 (On Diff #262852): Why this change?
clang-tools-extra/clangd/Quality.h
139: Why is it better to group the fields according to how they're used in the scoring function, rather than by what they mean? (I find the new grouping harder to follow.)
163: Can we make this Optional, so we can verify it gets computed? In fact, does it need to be a member at all, or can it just be transiently created while calling evaluate?
165: Why must this be called explicitly rather than being computed by evaluate?
clang-tools-extra/clangd/Quality.h.rej
1 (On Diff #262852): Bad merge?
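sammccall's comment on line 163 suggests two alternatives: an Optional member whose absence can be checked, or no member at all, with the derived signals created transiently inside evaluate(). The sketch below contrasts them on a hypothetical, much-reduced struct (`Signals`, `DerivedSignals`, `evaluateWithMember`, and `evaluateTransient` are illustrative names, not clangd code; `std::optional` stands in for `llvm::Optional`).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical derived signals; not clangd's real struct.
struct DerivedSignals {
  bool NameMatchesContext = false;
};

// Hypothetical, simplified signals struct showing both review suggestions.
struct Signals {
  std::string Name;
  std::string ContextWord;

  DerivedSignals calculateDerivedSignals() const {
    DerivedSignals D;
    D.NameMatchesContext =
        !ContextWord.empty() && Name.find(ContextWord) != std::string::npos;
    return D;
  }

  // Option 1: an Optional member, so a missing computation is detectable.
  std::optional<DerivedSignals> Derived;

  float evaluateWithMember() const {
    assert(Derived && "calculateDerivedSignals() result was never stored");
    return Derived->NameMatchesContext ? 1.5f : 1.0f;
  }

  // Option 2: no member; derive transiently on every evaluation.
  float evaluateTransient() const {
    DerivedSignals D = calculateDerivedSignals();
    return D.NameMatchesContext ? 1.5f : 1.0f;
  }
};
```

Option 2 is the one the landed diff adopts: evaluate() calls calculateDerivedSignals() itself. One argument for it shows up even in this toy version: a stored member goes stale if `Name` or `ContextWord` change after it is computed, while the transient version always reflects the current inputs.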

Addressed comments.

clang-tools-extra/clangd/Quality.h
139: I intended to separate the concrete signals from the properties/utilities used to calculate other derived signals. I agree the previous grouping makes their meaning easier to follow, so I reverted it.
165: Now evaluate() calls this.

sammccall accepted this revision. Sep 23 2020, 6:04 AM

This revision is now accepted and ready to land. Sep 23 2020, 6:04 AM

Harbormaster completed remote builds in B72648: Diff 293706. Sep 23 2020, 6:20 AM

usaxena95 edited the summary of this revision. Sep 23 2020, 7:10 AM

Closed with commit 158af0d3d165c0382a6a291e81ffecf0b18ffe77

Revision Contents

Path                                    Size
clang-tools-extra/clangd/Quality.h      13 lines
clang-tools-extra/clangd/Quality.cpp    62 lines

Diff 293706

clang-tools-extra/clangd/Quality.h

@@ 130 lines not shown @@ struct SymbolRelevanceSignals {
   // Whether clang provided a preferred type in the completion context.
   bool HadContextType = false;
   // Whether a source completion item or a symbol had a type information.
   bool HadSymbolType = false;
   // Whether the item matches the type expected in the completion context.
   bool TypeMatchesPreferred = false;

+  /// Set of derived signals computed by calculateDerivedSignals(). Must not be
+  /// set explicitly.
+  struct DerivedSignals {
+    /// Whether Name contains some word from context.
+    bool NameMatchesContext = false;
+    /// Min distance between SymbolURI and all the headers included by the TU.
+    unsigned FileProximityDistance = FileDistance::Unreachable;
+    /// Min distance between SymbolScope and all the available scopes.
+    unsigned ScopeProximityDistance = FileDistance::Unreachable;
+  };
+
+  DerivedSignals calculateDerivedSignals() const;
+
   void merge(const CodeCompletionResult &SemaResult);
   void merge(const Symbol &IndexResult);

   // Condense these signals down to a single number, higher is better.
   float evaluate() const;
 };
 llvm::raw_ostream &operator<<(llvm::raw_ostream &,
                               const SymbolRelevanceSignals &);

 /// Combine symbol quality and relevance into a single score.
 float evaluateSymbolAndRelevance(float SymbolQuality, float SymbolRelevance);

 /// TopN<T> is a lossy container that preserves only the "best" N elements.
 template <typename T, typename Compare = std::greater<T>> class TopN {
 public:
   using value_type = T;
   TopN(size_t N, Compare Greater = Compare())
       : N(N), Greater(std::move(Greater)) {}

   // Adds a candidate to the set.
   // Returns true if a candidate was dropped to get back under N.
   bool push(value_type &&V) {
@@ 47 lines not shown @@

clang-tools-extra/clangd/Quality.cpp

@@ 314 lines not shown @@ void SymbolRelevanceSignals::merge(const CodeCompletionResult &SemaCCResult) {
   // Declarations are scoped, others (like macros) are assumed global.
   if (SemaCCResult.Declaration)
     Scope = std::min(Scope, computeScope(SemaCCResult.Declaration));

   NeedsFixIts = !SemaCCResult.FixIts.empty();
 }

-static std::pair<float, unsigned> uriProximity(llvm::StringRef SymbolURI,
-                                               URIDistance *D) {
-  if (!D || SymbolURI.empty())
-    return {0.f, 0u};
-  unsigned Distance = D->distance(SymbolURI);
+static float fileProximityScore(unsigned FileDistance) {
+  // Range: [0, 1]
+  // FileDistance = [0, 1, 2, 3, 4, .., FileDistance::Unreachable]
+  // Score = [1, 0.82, 0.67, 0.55, 0.45, .., 0]
+  if (FileDistance == FileDistance::Unreachable)
+    return 0;
   // Assume approximately default options are used for sensible scoring.
-  return {std::exp(Distance * -0.4f / FileDistanceOptions().UpCost), Distance};
+  return std::exp(FileDistance * -0.4f / FileDistanceOptions().UpCost);
 }

-static float scopeBoost(ScopeDistance &Distance,
-                        llvm::Optional<llvm::StringRef> SymbolScope) {
-  if (!SymbolScope)
-    return 1;
-  auto D = Distance.distance(*SymbolScope);
-  if (D == FileDistance::Unreachable)
+static float scopeProximityScore(unsigned ScopeDistance) {
+  // Range: [0.6, 2].
+  // ScopeDistance = [0, 1, 2, 3, 4, 5, 6, 7, .., FileDistance::Unreachable]
+  // Score = [2.0, 1.55, 1.2, 0.93, 0.72, 0.65, 0.65, 0.65, .., 0.6]
+  if (ScopeDistance == FileDistance::Unreachable)
     return 0.6f;
-  return std::max(0.65, 2.0 * std::pow(0.6, D / 2.0));
+  return std::max(0.65, 2.0 * std::pow(0.6, ScopeDistance / 2.0));
 }

 static llvm::Optional<llvm::StringRef>
 wordMatching(llvm::StringRef Name, const llvm::StringSet<> *ContextWords) {
   if (ContextWords)
-    for (const auto& Word : ContextWords->keys())
+    for (const auto &Word : ContextWords->keys())
       if (Name.contains_lower(Word))
         return Word;
   return llvm::None;
 }

+SymbolRelevanceSignals::DerivedSignals
+SymbolRelevanceSignals::calculateDerivedSignals() const {
+  DerivedSignals Derived;
+  Derived.NameMatchesContext = wordMatching(Name, ContextWords).hasValue();
+  Derived.FileProximityDistance = !FileProximityMatch || SymbolURI.empty()
+                                      ? FileDistance::Unreachable
+                                      : FileProximityMatch->distance(SymbolURI);
+  if (ScopeProximityMatch) {
+    // For global symbol, the distance is 0.
+    Derived.ScopeProximityDistance =
+        SymbolScope ? ScopeProximityMatch->distance(*SymbolScope) : 0;
+  }
+  return Derived;
+}
+
 float SymbolRelevanceSignals::evaluate() const {
+  DerivedSignals Derived = calculateDerivedSignals();
   float Score = 1;

   if (Forbidden)
     return 0;

   Score *= NameMatch;

   // File proximity scores are [0,1] and we translate them into a multiplier in
   // the range from 1 to 3.
-  Score *= 1 + 2 * std::max(uriProximity(SymbolURI, FileProximityMatch).first,
-                            SemaFileProximityScore);
+  Score *= 1 + 2 * std::max(fileProximityScore(Derived.FileProximityDistance),
+                            SemaFileProximityScore);

   if (ScopeProximityMatch)
     // Use a constant scope boost for sema results, as scopes of sema results
     // can be tricky (e.g. class/function scope). Set to the max boost as we
     // don't load top-level symbols from the preamble and sema results are
     // always in the accessible scope.
-    Score *=
-        SemaSaysInScope ? 2.0 : scopeBoost(*ScopeProximityMatch, SymbolScope);
+    Score *= SemaSaysInScope
+                 ? 2.0
+                 : scopeProximityScore(Derived.ScopeProximityDistance);

-  if (wordMatching(Name, ContextWords))
+  if (Derived.NameMatchesContext)
     Score *= 1.5;

   // Symbols like local variables may only be referenced within their scope.
   // Conversely if we're in that scope, it's likely we'll reference them.
   if (Query == CodeComplete) {
     // The narrower the scope where a symbol is visible, the more likely it is
     // to be relevant when it is available.
     switch (Scope) {
@@ 59 lines not shown @@ llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
   OS << llvm::formatv("\tContext: {0}\n", getCompletionKindString(S.Context));
   OS << llvm::formatv("\tQuery type: {0}\n", static_cast<int>(S.Query));
   OS << llvm::formatv("\tScope: {0}\n", static_cast<int>(S.Scope));

   OS << llvm::formatv("\tSymbol URI: {0}\n", S.SymbolURI);
   OS << llvm::formatv("\tSymbol scope: {0}\n",
                       S.SymbolScope ? *S.SymbolScope : "<None>");

+  SymbolRelevanceSignals::DerivedSignals Derived = S.calculateDerivedSignals();
   if (S.FileProximityMatch) {
-    auto Score = uriProximity(S.SymbolURI, S.FileProximityMatch);
-    OS << llvm::formatv("\tIndex URI proximity: {0} (distance={1})\n",
-                        Score.first, Score.second);
+    float Score = fileProximityScore(Derived.FileProximityDistance);
+    OS << llvm::formatv("\tIndex URI proximity: {0} (distance={1})\n", Score,
+                        Derived.FileProximityDistance);
   }
   OS << llvm::formatv("\tSema file proximity: {0}\n", S.SemaFileProximityScore);

   OS << llvm::formatv("\tSema says in scope: {0}\n", S.SemaSaysInScope);
   if (S.ScopeProximityMatch)
     OS << llvm::formatv("\tIndex scope boost: {0}\n",
-                        scopeBoost(*S.ScopeProximityMatch, S.SymbolScope));
+                        scopeProximityScore(Derived.ScopeProximityDistance));

   OS << llvm::formatv(
       "\tType matched preferred: {0} (Context type: {1}, Symbol type: {2}\n",
       S.TypeMatchesPreferred, S.HadContextType, S.HadSymbolType);

   return OS;
 }

@@ 42 lines not shown @@
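The score tables in the comments of fileProximityScore() and scopeProximityScore(), and the statement in evaluate() that the file-proximity multiplier ranges from 1 to 3, can be checked numerically with standalone copies of the formulas. `kUnreachable`, `kUpCost`, and `fileMultiplier` are hypothetical stand-ins; modeling `FileDistance::Unreachable` as `UINT_MAX` and `FileDistanceOptions().UpCost` as 2 are assumptions, the latter chosen because it reproduces the table (exp(-0.2) ≈ 0.82).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Assumptions: FileDistance::Unreachable is modeled as UINT_MAX, and
// FileDistanceOptions().UpCost as 2, the value the comment tables imply.
constexpr unsigned kUnreachable = std::numeric_limits<unsigned>::max();
constexpr float kUpCost = 2.0f;

// Same formula as fileProximityScore() in the diff. Range: [0, 1].
float fileProximityScore(unsigned FileDistance) {
  if (FileDistance == kUnreachable)
    return 0;
  return std::exp(FileDistance * -0.4f / kUpCost);
}

// Same formula as scopeProximityScore() in the diff. Range: [0.6, 2].
float scopeProximityScore(unsigned ScopeDistance) {
  if (ScopeDistance == kUnreachable)
    return 0.6f;
  return std::max(0.65, 2.0 * std::pow(0.6, ScopeDistance / 2.0));
}

// Hypothetical helper: the file-proximity multiplier applied in evaluate(),
// 1 + 2 * max(index score, sema score), which lies in [1, 3].
float fileMultiplier(unsigned FileDistance, float SemaFileProximityScore) {
  return 1 + 2 * std::max(fileProximityScore(FileDistance),
                          SemaFileProximityScore);
}
```

The scope curve is clamped at 0.65 from distance 5 onward (2 × 0.6^2.5 ≈ 0.56), so only an unreachable scope receives the lower 0.6.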