This is an archive of the discontinued LLVM Phabricator instance.

Can you benchmark this? I'm nervous about the URI stuff in the hot path.
Timing CodeCompleteFlow::measureResults before/after with index enabled seems like a reasonable test.
(But you might want to make this apply to sema first too for realistic numbers?)

clangd/Quality.cpp
266	how do we know proximitypath is set at this point? Better to copy the symbol URL path I think :-(
267	proximity path needs to be set here too
271	Why U->toString() rather than ->body()?
clangd/Quality.h
80	It seems OK to have ProximityPath or ProximityScore, but we shouldn't have both: drop proximityscore and calculate it during evaluate()?

Oops, couple more comments.
But the big things I think are:

- what's the performance impact of doing all this work (including the URI stuff) inside the scoring loop?
- what's the most useful formula for the proximity score?

clangd/Quality.cpp
264	This doesn't look quite right to me. We can tune the details later, but in practice this seems like it's very hard to get zero proximity, which is our neutral score - you need to be 18 directories away? FWIW, fozzie appears to give an additive boost proportional to 5-up, where up is the number of directories you have to traverse up from the context to reach a parent of the symbol. (There's no penalty for down-traversals probably for implementation reasons, this should be smaller than the up-traversal penalty I think)
clangd/Quality.h
80	what's the plan for associated-header? should this be a smallvector<2>?

In D47935#1126283, @sammccall wrote:

Can you benchmark this? I'm nervous about the URI stuff in the hot path.
Timing CodeCompleteFlow::measureResults before/after with index enabled seems like a reasonable test.
(But you might want to make this apply to sema first too for realistic numbers?)

Sure! I had some numbers but they are on some paper that I don't have access to right now... will collect some new figures (with URI manipulations in sema).

clangd/Quality.cpp
264	The numbers are guessed... definitely happy to tune.
	> We can tune the details later, but in practice this seems like it's very hard to get zero proximity, which is our neutral score - you need to be 18 directories away?
	It's 18 directories away if one file is in an ancestor directory of the other (i.e. you only traverse up or down). If you need to traverse up and down, the penalty for each directory is 0.1, which takes 10 directories (up+down, so 5 up on average). I think it's useful to make this distinction because I think it's more likely for a file to use a header if it's in the file path. I'm not sure if we should use zero as the neutral score. For example, if a codebase has a deep directory structure, most scores are probably going to be small; conversely, in a shallow codebase most scores would be relatively big. I think relative scores are more useful.
	> (There's no penalty for down-traversals probably for implementation reasons, this should be smaller than the up-traversal penalty I think)
	Why do you think down-traversal should take less penalty?
267	Alternatively, I wonder if we could give sema result a fixed proximity score as they are symbols that are already included?
271	Because both URIs need to be parsed in order to use `body()`. Here we don't parse `SymURI`.
clangd/Quality.h
80	Just want to make sure I understand. We would copy the symbol URI to use in `merge` right?
80	I think it should be easy to change this to vector when it's actually needed?

Here are some numbers by completing "clang::^" 40 times (with result limit 1000 instead of 100).

Timing in CodeCompleteFlow::measureResults

Before: Avg: 1811 us Med: 1792 us 
After: Avg: 2714 us Med: 2689 us

As a reference, a full CodeCompleteFlow (with 1000 candidates) takes ~70 ms (using LLVM's yaml index).

So, with the current limit of 100 results, the increase for measureResults should be roughly 0.18ms -> 0.27ms, which I think is reasonable.
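
For reference, Avg/Med figures like the ones above can be collected with a small harness along these lines. This is a minimal sketch, not the code used for the numbers in this thread: `timeRepeated`, `summarize` and `TimingSummary` are hypothetical names, and the workload is whatever callable is passed in (here it would wrap the call being measured). The 0.18 ms and 0.27 ms estimates above come from scaling the 1000-candidate timings linearly down to 100 candidates (1811 us / 10 and 2714 us / 10).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical summary type: average and median, both in microseconds.
struct TimingSummary {
  double Avg;
  double Med;
};

// Computes the average and median of a non-empty set of samples.
TimingSummary summarize(std::vector<double> SamplesUs) {
  assert(!SamplesUs.empty() && "need at least one sample");
  std::sort(SamplesUs.begin(), SamplesUs.end());
  double Avg = std::accumulate(SamplesUs.begin(), SamplesUs.end(), 0.0) /
               SamplesUs.size();
  std::size_t Mid = SamplesUs.size() / 2;
  double Med = SamplesUs.size() % 2 ? SamplesUs[Mid]
                                    : (SamplesUs[Mid - 1] + SamplesUs[Mid]) / 2;
  return {Avg, Med};
}

// Runs Work() Runs times and summarizes the wall-clock time of each run.
template <typename Fn> TimingSummary timeRepeated(Fn &&Work, int Runs) {
  assert(Runs > 0);
  std::vector<double> SamplesUs;
  for (int I = 0; I < Runs; ++I) {
    auto Start = std::chrono::steady_clock::now();
    Work();
    auto End = std::chrono::steady_clock::now();
    SamplesUs.push_back(
        std::chrono::duration<double, std::micro>(End - Start).count());
  }
  return summarize(std::move(SamplesUs));
}
```

Reporting the median next to the average helps spot runs skewed by a cold cache or an unrelated stall.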

Merge branch 'uri' into proximity
Addressed review comments.

Harbormaster completed remote builds in B19222: Diff 150923. Jun 12 2018, 3:33 AM

Cleanup comment a bit.

Harbormaster completed remote builds in B19223: Diff 150924. Jun 12 2018, 3:35 AM

PTAL

clangd/Quality.cpp
267	As discussed offline, sema symbols now have a fixed proximity score (not entirely sure about the value though).
clangd/Quality.h
80	Changed to vector anyway...
80	Experimented with this a bit (removing ProximityScore). As we print the proximity score for debugging, we would still want to keep the score. Alternatively, I made the proximity paths a parameter of `merge`, as we only use them for index results anyway.

Sorry for the delay on this change. There's a bunch of complexity in this problem that I haven't seen how to slice through:

- the signals needed seem like a weird fit for the Symbol*Signals structs for some reason (maybe my misdesign)
- the inconsistency between how we do this for Sema and for Index results has... only slightly good reasons
- the URI vs filename thing is awkward
- with all this, the actual scoring still seems ad-hoc and is missing important parts (main header, transitive includes)

Not all your fault that the code reflects this, the problem is tangly. But it's hard for me to reason about APIs or performance or layering.

Looking at the last point (scoring model) because it seems the most tractable. I think this is basically an edit distance problem?
(We can call the result "proximity", start at one, and multiply by <1, or call it "distance" and start at 0 and add penalties, but it's equivalent).

- we're computing distances between files (glossing over URI-space vs file-space)
- the roots are the main file, and maybe the matching header
- edits take us from a filepath to a related filepath:
  - from a file to a header it #includes
  - from a file to its parent directory
  - from a parent directory to a child directory
  - from a parent directory to a file in it
- the distance is the smallest sum-of-penalties for any path leading from the root to the symbol

What do you think of this model?

If the model seems reasonable, then it suggests an approach of building a one-per-query data structure that computes the needed edit-distance recursively, memoizing results. SymbolRelevanceResults could store the symbol path and a pointer to the edit-distance machine, and for debugging the machine would know how to describe its configuration. URI/path mapping wouldn't be a performance concern (I think) if the memoization captured it.

Let's chat offline?
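
The edit-distance model described above can be sketched as a small per-query structure that memoizes distances. This is only an illustration under stated assumptions, not the design that landed: `PathDistance` is a hypothetical name, the up/down penalties are caller-supplied placeholders, paths are plain '/'-separated strings rather than URIs, and #include edges are left out (only parent/child moves are modeled).

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-query helper (not part of this patch). Paths are treated
// as '/'-separated strings; #include edges are not modeled.
class PathDistance {
public:
  PathDistance(const std::vector<std::string> &Roots, double UpCost,
               double DownCost)
      : Down(DownCost) {
    assert(UpCost > 0 && DownCost > 0 && "penalties must be positive");
    // Seed every ancestor of each root with the cost of walking up to it.
    for (const std::string &Root : Roots) {
      double Cost = 0;
      for (std::string P = Root;; P = parent(P), Cost += UpCost) {
        auto Inserted = Seed.emplace(P, Cost);
        Inserted.first->second = std::min(Inserted.first->second, Cost);
        if (P == "/")
          break;
      }
    }
  }

  // Smallest sum of penalties from any root to Path; results are memoized.
  double distance(const std::string &Path) {
    auto Cached = Memo.find(Path);
    if (Cached != Memo.end())
      return Cached->second;
    double Best = std::numeric_limits<double>::infinity();
    auto Seeded = Seed.find(Path);
    if (Seeded != Seed.end())
      Best = Seeded->second;
    // Alternatively, reach Path by stepping down from its parent.
    if (Path != "/")
      Best = std::min(Best, distance(parent(Path)) + Down);
    Memo[Path] = Best;
    return Best;
  }

private:
  static std::string parent(const std::string &Path) {
    std::string::size_type Slash = Path.rfind('/');
    if (Slash == std::string::npos || Slash == 0)
      return "/";
    return Path.substr(0, Slash);
  }

  double Down;
  std::unordered_map<std::string, double> Seed; // Up-only cost from a root.
  std::unordered_map<std::string, double> Memo; // Final memoized distances.
};
```

With a single root /x/a/b.cc, an up cost of 1 and a down cost of 0.5, a sibling file /x/a/c.h is at distance 1.5 (one up, one down). Because directories shared by many symbols are computed once per query, the per-symbol cost is mostly a hash lookup.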

clangd/Quality.cpp
264	> If you need to traverse up and down, the penalty for each directory is 0.1, which takes 10 directories (up+down, so 5 up on average).
	I think you've halved twice there - it still seems to be 10, which is a lot.
	> I'm not sure if we should use zero as the neutral score.
	Well, zero is currently the neutral score, and this patch doesn't change it :-) I think starting at 1 for the current file and multiplying by p<1 to apply penalties should give a reasonable 0-1 score that's relatively sane even for codebases of different sizes. Happy to have a different model, but you need to explain/implement how it combines with other signals.
	> Why do you think down-traversal should take less penalty?
	Intuitively, because subprojects are more closely related than superprojects. But this didn't occur to me until someone mentioned it, we should check with Matei and Alexander.

sammccall mentioned this in D47931: [clangd] Customizable URI schemes for dynamic index. Jun 12 2018, 12:12 PM

Introduced a one-per-query structure for relevance signals; use multiplication for proximity; simplify tests a bit; separate index and sema proximity scores.

Harbormaster completed remote builds in B19289: Diff 151169. Jun 13 2018, 8:03 AM

In D47935#1129987, @sammccall wrote:

Sorry for the delay on this change. There's a bunch of complexity in this problem that I haven't seen how to slice through:

the signals needed seem like a weird fit for the Symbol*Signals structs for some reason (maybe my misdesign)

Following our offline discussion, I added a structure SymbolRelevanceContext that captures per-query signals like proximity paths. Not sure about the name though.

the inconsistency between how we do this for Sema and for Index results has... only slightly good reasons

The proximity scores for index and sema are now explicitly separated to make it easier to understand and debug.

the URI vs filename thing is awkward

with all this, the actual scoring still seems ad-hoc and is missing important parts (main header, transitive includes)

Not all your fault that the code reflects this, the problem is tangly. But it's hard for me to reason about APIs or performance or layering.

Looking at the last point (scoring model) because it seems the most tractable. I think this is basically an edit distance problem?
(We can call the result "proximity", start at one, and multiply by <1, or call it "distance" and start at 0 and add penalties, but it's equivalent).

we're computing distances between files (glossing over URI-space vs file-space)

the roots are the main file, and maybe the matching header

edits take us from a filepath to a related filepath:

from a file to a header it #includes

from a file to its parent directory

from a parent directory to a child directory

from a parent directory to a file in it

the distance is the smallest sum-of-penalties for any path leading from the root to the symbol

What do you think of this model?

If the model seems reasonable, then it suggests an approach of building a one-per-query data structure that computes the needed edit-distance recursively, memoizing results. SymbolRelevanceResults could store the symbol path and a pointer to the edit-distance machine, and for debugging the machine would know how to describe its configuration. URI/path mapping wouldn't be a performance concern (I think) if the memoization captured it.

I like how this model addresses the proximity for src/ and include/ setup. I think we could start with something simple and iterate, although I agree that we should strive for a design that makes it easy to replace the proximity algorithm in the future.

Let's chat offline?

clangd/Quality.cpp
264	> I think you've halved twice there - it still seems to be 10, which is a lot.
	OK you are right. It would behave badly when there are many ups and only one down.
	> I think starting at 1 for the current file and multiplying by p<1 to apply penalties should give a reasonable 0-1 score that's relatively sane even for codebases of different sizes.
	Sounds good. Picked p=0.7 which seems to give reasonable scores.
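
To make the multiplicative model concrete, here is a minimal sketch of such a score. It is not the code from this patch: `proximity` and `splitPath` are hypothetical names, paths are compared as plain '/'-separated strings rather than URIs, and the half-weight for down-traversals is an assumption. That weighting happens to reproduce the 0.59 figure quoted for /x/a/b vs /x/a/c later in this review, but the thread does not state it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a '/'-separated path into its non-empty components.
static std::vector<std::string> splitPath(const std::string &Path) {
  std::vector<std::string> Parts;
  std::stringstream SS(Path);
  std::string Part;
  while (std::getline(SS, Part, '/'))
    if (!Part.empty())
      Parts.push_back(Part);
  return Parts;
}

// Hypothetical scoring function: starts at 1 for identical paths and
// multiplies by P for every up-step. DownWeight (assumed, not taken from the
// review) makes a down-step count as a fraction of an up-step.
double proximity(const std::string &From, const std::string &To,
                 double P = 0.7, double DownWeight = 0.5) {
  assert(P > 0 && P < 1 && "penalty factor must be in (0, 1)");
  std::vector<std::string> F = splitPath(From);
  std::vector<std::string> T = splitPath(To);
  std::size_t Common = 0;
  while (Common < F.size() && Common < T.size() && F[Common] == T[Common])
    ++Common;
  std::size_t Ups = F.size() - Common;   // Steps up to the common ancestor.
  std::size_t Downs = T.size() - Common; // Steps down to the target.
  return std::pow(P, Ups + DownWeight * Downs);
}
```

Under these assumptions an exact match scores 1, and /x/a/b vs /x/a/c (one up, one down) scores 0.7^1.5, roughly 0.59. The score stays in (0, 1] regardless of how deep the directory tree is.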

Rebased.

Harbormaster completed remote builds in B19290: Diff 151172. Jun 13 2018, 8:08 AM

Thanks, this looks much clearer/more modular/more extensible to me!
A couple of notes on the abstractions before digging into details again.

clangd/Quality.h
72	This is ambiguously a couple of different (and good!) things:
	- an encapsulation of the proximitypaths state and logic
	- a grouping together of the "query-dependent, symbol-invariant" inputs to the relevance calculation
	There's a place for both of these, but I'd argue for separating them (and only doing the second in this patch). Reasons:
	- the former doesn't need to be in this file if it gets complex (FuzzyMatch.h is a similar case), while the latter does
	- easier to understand/name if this hierarchy is expressed explicitly
	I suspect we may want the context to be a separate struct, passed to SymbolRelevanceSignals::evaluate(), rather than a member of SymbolRelevanceSignals. That would add more churn than needs to be in this patch though.
	If this makes sense to you then I think this class looks great but should be called something specific like `FileProximityMatcher`.
76	One of the simplifying assumptions in the model is that all signals are optional - can we make Context a pointer `= nullptr` and drop the constructor?

addressed review comments.

Rebase...

Harbormaster completed remote builds in B19330: Diff 151322. Jun 14 2018, 3:36 AM

ioeric added inline comments. Jun 14 2018, 3:39 AM

clangd/Quality.h
72	Sounds good. Thanks for the explanation!

Thanks, just details now!

clangd/Quality.cpp
208	why is this a special case?
	/x/a/b vs /x/a/c is 1 up + 1 down --> 0.59
	/a/b vs /a/c is 1 up + 1 down --> 0.59
	/b vs /c is unrelated --> 0
	I don't doubt the assertion that these are unrelated paths, but I'm not sure fixing just this case is an improvement overall. (In a perfect world, we'd define the algorithm so that this case yields 0 without a discontinuity)
239	For composability, you could consider styling more tersely e.g. as ProximityPaths{/path/to/file}, and in the RelevanceSignals operator<< including it like other fields, yielding:
	== Symbol relevance: 0.8 ==
	Name match: 0.7
	File proximity matcher: ProximityPaths{/path/to/file}
	...
clangd/Quality.h
79	Should mention the semantics of the score, maybe via the other extreme: when the SymbolURI exactly matches a proximity path, score is 1.
83	This is redundant with (IndexSymbolURI, FileProximityMatch) I think, and will only be correctly set if FileProximityMatch is set before calling merge(Symbol). Can we defer calculation until evaluate()? (If you want to dump this intermediate score, you can recompute it in operator<<, I'm not sure it's necessary).
unittests/clangd/TestFS.cpp
66 ↗	(On Diff #151322)	These helpers would be more coherent if this used the same test root as above - any reason we can't do that? Then this comment could just be "unittest: is a scheme that refers to files relative to testRoot()"
107 ↗	(On Diff #151322)	This is really surprising to me - is this the common pattern for registries? (i.e. we don't have something more declarative like bazel's `cc_library.alwayslink`)? If so, can we move the declaration to TestFS.h and give a usage example, so the consuming libraries don't have to repeat the decl?

addressed review comments

clangd/Quality.cpp
208	The intuition is that when we hit the root, it's very likely that we are switching projects. But we could leave this out of the patch and evaluate whether this is an improvement later.
clangd/Quality.h
83	Done.
	> (If you want to dump this intermediate score, you can recompute it in operator<<, I'm not sure it's necessary).
	I think the proximity score would be useful for debugging, no?
unittests/clangd/TestFS.cpp
66 ↗	(On Diff #151322)	Good idea.
107 ↗	(On Diff #151322)	yeah... this pattern is also used in `tooling::CompilationDatabase` (e.g. https://github.com/llvm-mirror/clang/blob/master/lib/Tooling/CompilationDatabase.cpp#L398), and I'm not aware of a good way to deal without `alwayslink`.
	> If so, can we move the declaration to TestFS.h and give a usage example, so the consuming libraries don't have to repeat the decl?
	Done.

Thanks! Just nits

clangd/Quality.cpp
327–333	No camel case here, just words
328	Could this just be inlined like the others? `Index proximity: 0.5 (ProximityRoots{foo/bar.h})`
clangd/Quality.h
72	nit: can we forward declare this here and move it down (e.g. above TopN) to keep the signals at the top? (I suspect it'll end up in another header eventually)
83	nit: just SymbolURI (signals should be conceptually source-independent, may be missing)
85–87	can you add a FIXME to unify with index proximity score? signals should be source-independent
unittests/clangd/TestFS.h
60 ↗	(On Diff #151358)	document the unittest: scheme here?

This revision is now accepted and ready to land. Jun 15 2018, 12:41 AM

addressed review comments and rebase.

Harbormaster completed remote builds in B19373: Diff 151469. Jun 15 2018, 2:00 AM

Thanks for the review!

Closed by commit rL334810: [clangd] Boost completion score according to file proximity. (authored by ioeric). Jun 15 2018, 2:02 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Jun 15 2018, 2:02 AM

Revision Contents

Path	Size
clangd/CodeComplete.cpp	9 lines
clangd/Quality.h	31 lines
clangd/Quality.cpp	79 lines
unittests/clangd/QualityTests.cpp	68 lines

Commit	Tree	Parents	Author	Summary	Date
57c6b479a786	2bc8d63f80c0	281b614fd6ae 15445eb79d1c	Eric Liu	Merge remote-tracking branch 'origin/master' into proximity	Jun 15 2018, 1:59 AM
281b614fd6ae	2bc8d63f80c0	8fcd9ef9df14 348ce466f555	Eric Liu	Merge branch 'uri' into proximity	Jun 15 2018, 1:59 AM
348ce466f555	4a5f319a42d9	97ce962c8184	Eric Liu	Pulled unittest URI refactoring from D47935	Jun 15 2018, 1:55 AM
8fcd9ef9df14	db4cb173ee13	617127ce7781 97ce962c8184	Eric Liu	Merge branch 'uri' into proximity	Jun 15 2018, 1:39 AM
97ce962c8184	a1c65ead62bd	fee536ded7b5 946ef1bc49f6	Eric Liu	Merge branch 'master' of http://llvm.org/git/clang-tools-extra into uri	Jun 15 2018, 1:38 AM
617127ce7781	b92a57c812dc	2b63034548a6	Eric Liu	addressed review comments.	Jun 15 2018, 1:38 AM
2b63034548a6	88b46add3b46	03596f2dacc6	Eric Liu	removed special case for no common directory.	Jun 14 2018, 8:42 AM
03596f2dacc6	68d0c6a60cd5	d094d95a9a89	Eric Liu	clang-format	Jun 14 2018, 8:27 AM
d094d95a9a89	a0b4a897c803	f48899d2c2ab fee536ded7b5	Eric Liu	Merge branch 'uri' into proximity	Jun 14 2018, 8:26 AM
fee536ded7b5	b0296c46278d	f9d1f0aa0c98 17aeea47688a	Eric Liu	Merge remote-tracking branch 'origin/master' into uri	Jun 14 2018, 8:25 AM
f48899d2c2ab	a0b4a897c803	45f6c38e1aee 17aeea47688a	Eric Liu	Merge remote-tracking branch 'origin/master' into proximity	Jun 14 2018, 8:25 AM
45f6c38e1aee	b05f77bbe93f	4c213eba68c9	Eric Liu	addressed review comments	Jun 14 2018, 7:43 AM
4c213eba68c9	6c79ed0fb60c	6296086d733d	Eric Liu	addressed review comments/	Jun 14 2018, 3:35 AM
6296086d733d	5b8cd929afb5	43786eb6f4a5 f9d1f0aa0c98	Eric Liu	Merge branch 'uri' into proximity	Jun 13 2018, 8:06 AM
f9d1f0aa0c98	07bb33d0e077	b5d4000123e2 8a6cecd26f71	Eric Liu	Merged with origin/master	Jun 13 2018, 8:06 AM
43786eb6f4a5	6fe9c0ceeb8b	90f94822ea22	Eric Liu	Introduced a one-per-query structure for relevance signals; use multiplication…	Jun 13 2018, 7:59 AM
90f94822ea22	e3875c7ca023	688db8d3c097	Eric Liu	Cleanup comment a bit.	Jun 12 2018, 3:35 AM
688db8d3c097	9e37f851b8c8	5164b5384d80	Eric Liu	Addressed review comments.	Jun 12 2018, 3:33 AM
5164b5384d80	d9778d665a02	90d62b9cdd82 b5d4000123e2	Eric Liu	Merge branch 'uri' into proximity	Jun 12 2018, 1:53 AM
b5d4000123e2	1e75ace2f8ab	e96e59702521	Eric Liu	[clangd] Customizable URI schemes for dynamic index.	Jun 7 2018, 12:41 PM
90d62b9cdd82	d9778d665a02	114e7442dfa3	Eric Liu	[clangd] Boost completion score according to file proximity.	Jun 8 2018, 3:19 AM
114e7442dfa3	1e75ace2f8ab	e96e59702521	Eric Liu	[clangd] Customizable URI schemes for dynamic index.	Jun 7 2018, 12:41 PM
e96e59702521	dc3dfc0d6f00	d4faf8b0a2a2	ioeric	[clangd] Support proximity paths in index fuzzy find.	Jun 12 2018, 1:48 AM

Diff 151469

clangd/CodeComplete.cpp

@@ (842 lines hidden) @@ class CodeCompleteFlow {
   PathRef FileName;
   const CodeCompleteOptions &Opts;
   // Sema takes ownership of Recorder. Recorder is valid until Sema cleanup.
   CompletionRecorder *Recorder = nullptr;
   int NSema = 0, NIndex = 0, NBoth = 0; // Counters for logging.
   bool Incomplete = false; // Would more be available with a higher limit?
   llvm::Optional<FuzzyMatcher> Filter; // Initialized once Sema runs.
   std::unique_ptr<IncludeInserter> Includes; // Initialized once compiler runs.
+  FileProximityMatcher FileProximityMatch;

 public:
   // A CodeCompleteFlow object is only useful for calling run() exactly once.
   CodeCompleteFlow(PathRef FileName, const CodeCompleteOptions &Opts)
-      : FileName(FileName), Opts(Opts) {}
+      : FileName(FileName), Opts(Opts),
+        // FIXME: also use path of the main header corresponding to FileName to
+        // calculate the file proximity, which would capture include/ and src/
+        // project setup where headers and implementations are not in the same
+        // directory.
+        FileProximityMatch({FileName}) {}

   CompletionList run(const SemaCompleteInput &SemaCCInput) && {
     trace::Span Tracer("CodeCompleteFlow");

     // We run Sema code completion first. It builds an AST and calculates:
     // - completion results based on the AST.
     // - partial identifier and context. We need these for the index query.
     CompletionList Output;
@@ (124 lines hidden) @@ void addCandidate(TopN<ScoredCandidate, ScoredCandidateGreater> &Candidates,
     CompletionCandidate C;
     C.SemaResult = SemaResult;
     C.IndexResult = IndexResult;
     C.Name = IndexResult ? IndexResult->Name : Recorder->getName(*SemaResult);

     SymbolQualitySignals Quality;
     SymbolRelevanceSignals Relevance;
     Relevance.Query = SymbolRelevanceSignals::CodeComplete;
+    Relevance.FileProximityMatch = &FileProximityMatch;
     if (auto FuzzyScore = fuzzyScore(C))
       Relevance.NameMatch = *FuzzyScore;
     else
       return;
     if (IndexResult) {
       Quality.merge(*IndexResult);
       Relevance.merge(*IndexResult);
     }
@@ (91 lines hidden) @@

clangd/Quality.h

Show All 20 Lines
/// consistent regardless of the source.		/// consistent regardless of the source.
/// - compute scores from scoring signals. These are suitable for sorting.		/// - compute scores from scoring signals. These are suitable for sorting.
/// - sorting utilities like the TopN container.		/// - sorting utilities like the TopN container.
/// These could be split up further to isolate dependencies if we care.		/// These could be split up further to isolate dependencies if we care.
///		///
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_QUALITY_H		#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_QUALITY_H
#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_QUALITY_H		#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_QUALITY_H
		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include <algorithm>		#include <algorithm>
#include <functional>		#include <functional>
#include <vector>		#include <vector>
namespace llvm {		namespace llvm {
class raw_ostream;		class raw_ostream;
}		}
namespace clang {		namespace clang {
Show All 26 Lines	struct SymbolQualitySignals {
void merge(const Symbol &IndexResult);		void merge(const Symbol &IndexResult);

// Condense these signals down to a single number, higher is better.		// Condense these signals down to a single number, higher is better.
float evaluate() const;		float evaluate() const;
};		};
llvm::raw_ostream &operator<<(llvm::raw_ostream &,		llvm::raw_ostream &operator<<(llvm::raw_ostream &,
const SymbolQualitySignals &);		const SymbolQualitySignals &);

		class FileProximityMatcher;
		sammccallUnsubmitted Done Reply Inline Actions This is ambiguously a couple of different (and good!) things: an encapsulation of the proximitypaths state and logic a grouping together of the "query-dependent, symbol-invariant" inputs to the relevance calculation. There's a place for both of these, but I'd argue for separating them (and only doing the second in this patch). Reasons: the former doesn't need to be in this file if it gets complex (FuzzyMatch.h is a similar case), while the latter does easier to understand/name if this hierarchy is expressed explicitly I suspect we may want the context to be a separate struct, passed to SymbolRelevanceSignals::evaluate(), rather than a member of SymbolRelevanceSignals. That would add more churn than needs to be in this patch though. If this makes sense to you then I think this class looks great but should be called something specific like `FileProximityMatcher`. sammccall: This is ambiguously a couple of different (and good!) things: - an encapsulation of the…
		ioericAuthorUnsubmitted Not Done Reply Inline Actions Sounds good. Thanks for the explanation! ioeric: Sounds good. Thanks for the explanation!
		sammccallUnsubmitted Done Reply Inline Actions nit: can we forward declare this here and move it down (e.g. above TopN) to keep the signals at the top? (I suspect it'll end up in another header eventually) sammccall: nit: can we forward declare this here and move it down (e.g. above TopN) to keep the signals at…

/// Attributes of a symbol-query pair that affect how much we like it.		/// Attributes of a symbol-query pair that affect how much we like it.
struct SymbolRelevanceSignals {		struct SymbolRelevanceSignals {
/// 0-1+ fuzzy-match score for unqualified name. Must be explicitly assigned.		/// 0-1+ fuzzy-match score for unqualified name. Must be explicitly assigned.
		sammccallUnsubmitted Done Reply Inline Actions One of the simplifying assumptions in the model is that all signals are optional - can we make Context a pointer `= nullptr` and drop the constructor? sammccall: One of the simplifying assumptions in the model is that all signals are optional - can we make…
float NameMatch = 1;		float NameMatch = 1;
bool Forbidden = false; // Unavailable (e.g const) or inaccessible (private).		bool Forbidden = false; // Unavailable (e.g const) or inaccessible (private).

		sammccallUnsubmitted Done Reply Inline Actions Should mention the semantics of the score, maybe via the other extreme: when the SymbolURI exactly matches a proximity path, score is 1. sammccall: Should mention the semantics of the score, maybe via the other extreme: when the SymbolURI…
		const FileProximityMatcher *FileProximityMatch = nullptr;
		sammccallUnsubmitted Not Done Reply Inline Actions It seems OK to have ProximityPath or ProximityScore, but we shouldn't have both: drop proximityscore and calculate it during evaluate()? sammccall: It seems OK to have ProximityPath or ProximityScore, but we shouldn't have both: drop…
		ioericAuthorUnsubmitted Not Done Reply Inline Actions Just want to make sure I understand. We would copy the symbol URI to use in `merge` right? ioeric: Just want to make sure I understand. We would copy the symbol URI to use in `merge` right?
		ioericAuthorUnsubmitted Not Done Reply Inline Actions Experimented with this a bit (removing ProximityScore). As we print the proximity score for debugging, we would still want to keep the store. Alternatively, I made the proximity paths a parameter of `merge` as we only use them for index result anyway. ioeric: Experimented with this a bit (removing ProximityScore). As we print the proximity score for…
		sammccallUnsubmitted Not Done Reply Inline Actions what's the plan for associated-header? should this be a smallvector<2>? sammccall: what's the plan for associated-header? should this be a smallvector<2>?
		ioericAuthorUnsubmitted Not Done Reply Inline Actions I think it should be easy to change this to vector when it's actually needed? ioeric: I think it should be easy to change this to vector when it's actually needed?
		ioericAuthorUnsubmitted Not Done Reply Inline Actions Changed to vector anyway... ioeric: Changed to vector anyway...
		/// This is used to calculate proximity between the index symbol and the
		/// query.
		llvm::StringRef SymbolURI;
		sammccallUnsubmitted Done Reply Inline Actions This is redundant with (IndexSymbolURI, FileProximityMatch) I think, and will only be correctly set if FileProximityMatch is set before calling merge(Symbol). Can we defer calculation until evaluate()? (If you want to dump this intermediate score, you can recompute it in operator<<, I'm not sure it's necessary). sammccall: This is redundant with (IndexSymbolURI, FileProximityMatch) I think, and will only be correctly…
		ioericAuthorUnsubmitted Not Done Reply Inline Actions Done. (If you want to dump this intermediate score, you can recompute it in operator<<, I'm not sure it's necessary). I think the proximity score would be useful for debugging, no? ioeric: Done. > (If you want to dump this intermediate score, you can recompute it in operator<<, I'm…
		sammccallUnsubmitted Done Reply Inline Actions nit: just SymbolURI (signals should be conceptually source-independent, may be missing) sammccall: nit: just SymbolURI (signals should be conceptually source-independent, may be missing)
/// Proximity between best declaration and the query. [0-1], 1 is closest.		/// Proximity between best declaration and the query. [0-1], 1 is closest.
float ProximityScore = 0;		/// FIXME: unify with index proximity score - signals should be
		/// source-independent.
		float SemaProximityScore = 0;
		sammccallUnsubmitted Done Reply Inline Actions can you add a FIXME to unify with index proximity score? signals should be source-independent sammccall: can you add a FIXME to unify with index proximity score? signals should be source-independent

// An approximate measure of where we expect the symbol to be used.		// An approximate measure of where we expect the symbol to be used.
enum AccessibleScope {		enum AccessibleScope {
FunctionScope,		FunctionScope,
ClassScope,		ClassScope,
FileScope,		FileScope,
GlobalScope,		GlobalScope,
} Scope = GlobalScope;		} Scope = GlobalScope;
Show All 10 Lines	struct SymbolRelevanceSignals {
float evaluate() const;		float evaluate() const;
};		};
llvm::raw_ostream &operator<<(llvm::raw_ostream &,		llvm::raw_ostream &operator<<(llvm::raw_ostream &,
const SymbolRelevanceSignals &);		const SymbolRelevanceSignals &);

/// Combine symbol quality and relevance into a single score.		/// Combine symbol quality and relevance into a single score.
float evaluateSymbolAndRelevance(float SymbolQuality, float SymbolRelevance);		float evaluateSymbolAndRelevance(float SymbolQuality, float SymbolRelevance);

		class FileProximityMatcher {
		public:
		/// \p ProximityPaths are used to compute proximity scores from symbol's
		/// declaring file. The best score will be used.
		explicit FileProximityMatcher(
		llvm::ArrayRef<llvm::StringRef> ProximityPaths);

		/// Calculates the best proximity score from proximity paths to the symbol's
		/// URI. Score is [0-1], 1 means \p SymbolURI exactly matches a proximity
		/// path. When a path cannot be encoded into the same scheme as \p
		/// SymbolURI, the proximity will be 0.
		float uriProximity(llvm::StringRef SymbolURI) const;

		private:
		llvm::SmallVector<std::string, 2> ProximityPaths;
		friend llvm::raw_ostream &operator<<(llvm::raw_ostream &,
		const FileProximityMatcher &);
		};

/// TopN<T> is a lossy container that preserves only the "best" N elements.		/// TopN<T> is a lossy container that preserves only the "best" N elements.
template <typename T, typename Compare = std::greater<T>> class TopN {		template <typename T, typename Compare = std::greater<T>> class TopN {
public:		public:
using value_type = T;		using value_type = T;
TopN(size_t N, Compare Greater = Compare())		TopN(size_t N, Compare Greater = Compare())
: N(N), Greater(std::move(Greater)) {}		: N(N), Greater(std::move(Greater)) {}

// Adds a candidate to the set.		// Adds a candidate to the set.
[40 lines not shown]

clangd/Quality.cpp

//===--- Quality.cpp --------------------------------------------- C++--===//		//===--- Quality.cpp --------------------------------------------- C++--===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
#include "Quality.h"		#include "Quality.h"
		#include "URI.h"
#include "index/Index.h"		#include "index/Index.h"
#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
#include "clang/Basic/CharInfo.h"		#include "clang/Basic/CharInfo.h"
#include "clang/AST/DeclVisitor.h"		#include "clang/AST/DeclVisitor.h"
#include "clang/Basic/SourceManager.h"		#include "clang/Basic/SourceManager.h"
#include "clang/Sema/CodeCompleteConsumer.h"		#include "clang/Sema/CodeCompleteConsumer.h"
#include "llvm/Support/FormatVariadic.h"		#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
[162 lines not shown]	raw_ostream &operator<<(raw_ostream &OS, const SymbolQualitySignals &S) {
OS << formatv("=== Symbol quality: {0}\n", S.evaluate());		OS << formatv("=== Symbol quality: {0}\n", S.evaluate());
OS << formatv("\tReferences: {0}\n", S.References);		OS << formatv("\tReferences: {0}\n", S.References);
OS << formatv("\tDeprecated: {0}\n", S.Deprecated);		OS << formatv("\tDeprecated: {0}\n", S.Deprecated);
OS << formatv("\tReserved name: {0}\n", S.ReservedName);		OS << formatv("\tReserved name: {0}\n", S.ReservedName);
OS << formatv("\tCategory: {0}\n", static_cast<int>(S.Category));		OS << formatv("\tCategory: {0}\n", static_cast<int>(S.Category));
return OS;		return OS;
}		}

		/// Calculates a proximity score from \p From and \p To, which are URI strings
		/// that have the same scheme. This does not parse URI. A URI (sans "<scheme>:")
		/// is split into chunks by '/' and each chunk is considered a file/directory.
		/// For example, "uri:///a/b/c" will be treated as /a/b/c
		static float uriProximity(StringRef From, StringRef To) {
		auto SchemeSplitFrom = From.split(':');
		auto SchemeSplitTo = To.split(':');
		assert((SchemeSplitFrom.first == SchemeSplitTo.first) &&
		"URIs must have the same scheme in order to compute proximity.");
		auto Split = [](StringRef URIWithoutScheme) {
		SmallVector<StringRef, 8> Split;
		URIWithoutScheme.split(Split, '/', /*MaxSplit=*/-1, /*KeepEmpty=*/false);
		return Split;
		};
		SmallVector<StringRef, 8> Fs = Split(SchemeSplitFrom.second);
		SmallVector<StringRef, 8> Ts = Split(SchemeSplitTo.second);
		auto F = Fs.begin(), T = Ts.begin(), FE = Fs.end(), TE = Ts.end();
		for (; F != FE && T != TE && *F == *T; ++F, ++T) {
		}
		// We penalize for traversing up and down from \p From to \p To but penalize
		sammccall (Done): Why is this a special case?
		- /x/a/b vs /x/a/c is 1 up + 1 down --> 0.59
		- /a/b vs /a/c is 1 up + 1 down --> 0.59
		- /b vs /c is unrelated --> 0
		I don't doubt the assertion that these are unrelated paths, but I'm not sure fixing just this case is an improvement overall. (In a perfect world, we'd define the algorithm so that this case yields 0 without a discontinuity.)
		ioeric (Author, Not Done): The intuition is that when we hit the root, it's very likely that we are switching projects. But we could leave this out of the patch and evaluate whether this is an improvement later.
		// less for traversing down because subprojects are more closely related than
		// superprojects.
		int UpDist = FE - F;
		int DownDist = TE - T;
		return std::pow(0.7, UpDist + DownDist/2);
		}
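The text-based comparison above can be tried outside of clangd. The following is a minimal sketch using only the standard library; `pathChunks` and `proximitySketch` are hypothetical names, and `std::string`/`std::vector` stand in for `StringRef`/`SmallVector`. It assumes, as the patch does, that both URIs share a scheme and that 0.7 is the per-step base.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the non-empty '/'-separated chunks of the part
// of URI that follows the first ':' (empty if there is no ':').
std::vector<std::string> pathChunks(const std::string &URI) {
  std::vector<std::string> Chunks;
  std::size_t Colon = URI.find(':');
  if (Colon == std::string::npos)
    return Chunks;
  std::string Current;
  for (std::size_t I = Colon + 1; I < URI.size(); ++I) {
    if (URI[I] != '/') {
      Current.push_back(URI[I]);
      continue;
    }
    if (!Current.empty())
      Chunks.push_back(Current);
    Current.clear();
  }
  if (!Current.empty())
    Chunks.push_back(Current);
  return Chunks;
}

// Hypothetical stand-in for the static uriProximity() in the patch.
double proximitySketch(const std::string &From, const std::string &To) {
  // Both URIs are expected to share a scheme, as the patch asserts.
  assert(From.substr(0, From.find(':')) == To.substr(0, To.find(':')));
  std::vector<std::string> Fs = pathChunks(From), Ts = pathChunks(To);
  // Skip the common leading directories.
  std::size_t Common = 0;
  while (Common < Fs.size() && Common < Ts.size() && Fs[Common] == Ts[Common])
    ++Common;
  int UpDist = static_cast<int>(Fs.size() - Common);
  int DownDist = static_cast<int>(Ts.size() - Common);
  // Integer division mirrors the patch: one step down adds no penalty.
  return std::pow(0.7, UpDist + DownDist / 2);
}
```

With this sketch, `file:///a/b/c/d/x` against itself scores 1, and against the sibling `file:///a/b/c/d/y` scores 0.7 (one step up, one step down, and the single down-step is rounded away by the integer division).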

		FileProximityMatcher::FileProximityMatcher(ArrayRef<StringRef> ProximityPaths)
		: ProximityPaths(ProximityPaths.begin(), ProximityPaths.end()) {}

		float FileProximityMatcher::uriProximity(StringRef SymbolURI) const {
		float Score = 0;
		if (!ProximityPaths.empty() && !SymbolURI.empty()) {
		for (const auto &Path : ProximityPaths)
		// Only calculate proximity score for two URIs with the same scheme so
		// that the computation can be purely text-based and thus avoid expensive
		// URI encoding/decoding.
		if (auto U = URI::create(Path, SymbolURI.split(':').first)) {
		Score = std::max(Score, clangd::uriProximity(U->toString(), SymbolURI));
		} else {
		llvm::consumeError(U.takeError());
		}
		}
		return Score;
		}

		llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
		const FileProximityMatcher &M) {
		OS << formatv("File proximity matcher: ");
		OS << formatv("ProximityPaths{{0}}", llvm::join(M.ProximityPaths.begin(),
		M.ProximityPaths.end(), ","));
		sammccall (Done): For composability, you could consider styling more tersely, e.g. as ProximityPaths{/path/to/file}, and in the RelevanceSignals operator<< including it like other fields, yielding:
		== Symbol relevance: 0.8 ==
		Name match: 0.7
		File proximity matcher: ProximityPaths{/path/to/file}
		...
		return OS;
		}

static SymbolRelevanceSignals::AccessibleScope		static SymbolRelevanceSignals::AccessibleScope
ComputeScope(const NamedDecl &D) {		ComputeScope(const NamedDecl &D) {
bool InClass = false;		bool InClass = false;
for (const DeclContext *DC = D.getDeclContext(); !DC->isFileContext();		for (const DeclContext *DC = D.getDeclContext(); !DC->isFileContext();
DC = DC->getParent()) {		DC = DC->getParent()) {
if (DC->isFunctionOrMethod())		if (DC->isFunctionOrMethod())
return SymbolRelevanceSignals::FunctionScope;		return SymbolRelevanceSignals::FunctionScope;
InClass = InClass || DC->isRecord();		InClass = InClass || DC->isRecord();
}		}
if (InClass)		if (InClass)
return SymbolRelevanceSignals::ClassScope;		return SymbolRelevanceSignals::ClassScope;
// This threshold could be tweaked, e.g. to treat module-visible as global.		// This threshold could be tweaked, e.g. to treat module-visible as global.
if (D.getLinkageInternal() < ExternalLinkage)		if (D.getLinkageInternal() < ExternalLinkage)
return SymbolRelevanceSignals::FileScope;		return SymbolRelevanceSignals::FileScope;
return SymbolRelevanceSignals::GlobalScope;		return SymbolRelevanceSignals::GlobalScope;
}		}

void SymbolRelevanceSignals::merge(const Symbol &IndexResult) {		void SymbolRelevanceSignals::merge(const Symbol &IndexResult) {
// FIXME: Index results always assumed to be at global scope. If Scope becomes		// FIXME: Index results always assumed to be at global scope. If Scope becomes
// relevant to non-completion requests, we should recognize class members etc.		// relevant to non-completion requests, we should recognize class members etc.

		SymbolURI = IndexResult.CanonicalDeclaration.FileURI;
		sammccall (Not Done): This doesn't look quite right to me. We can tune the details later, but in practice this seems like it's very hard to get zero proximity, which is our neutral score - you need to be 18 directories away?
		FWIW, fozzie appears to give an additive boost proportional to 5-up, where up is the number of directories you have to traverse up from the context to get to a parent of the symbol. (There's no penalty for down-traversals, probably for implementation reasons; this should be smaller than the up-traversal penalty, I think.)
		ioeric (Author, Not Done): The numbers are guessed... definitely happy to tune.
		> We can tune the details later, but in practice this seems like it's very hard to get zero proximity, which is our neutral score - you need to be 18 directories away?
		It's 18 directories away if one file is in an ancestor directory of the other (i.e. only traverse up or down). If you need to traverse up and down, the penalty for each directory is 0.1, which takes 10 directories (up+down, so 5 up on average). I think it's useful to make this distinction because I think it's more likely for a file to use a header if it's in the file path.
		I'm not sure if we should use zero as the neutral score. For example, if a codebase has a deep directory structure, most scores are probably going to be small; conversely, with a shallow structure, most scores would be relatively big. I think relative scores are more useful.
		> (There's no penalty for down-traversals probably for implementation reasons, this should be smaller than the up-traversal penalty I think)
		Why do you think down-traversal should take less penalty?
		sammccall (Not Done):
		> If you need to traverse up and down, the penalty for each directory is 0.1, which takes 10 directories (up+down, so 5 up in average).
		I think you've halved twice there - it still seems to be 10, which is a lot.
		> I'm not sure if we should use zero as the neutral score.
		Well, zero is currently the neutral score, and this patch doesn't change it :-) I think starting at 1 for the current file and multiplying by p<1 to apply penalties should give a reasonable 0-1 score that's relatively sane even for codebases of different sizes. Happy to have a different model, but you need to explain/implement how it combines with other signals.
		> Why do you think down-traversal should take less penalty?
		Intuitively, because subprojects are more closely related than superprojects. But this didn't occur to me until someone mentioned it; we should check with Matei and Alexander.
		ioeric (Author, Not Done):
		> I think you've halved twice there - it still seems to be 10, which is a lot.
		OK, you are right. It would behave badly when there are many ups and only one down.
		> I think starting at 1 for the current file and multiplying by p<1 to apply penalties should give a reasonable 0-1 score that's relatively sane even for codebases of different sizes.
		Sounds good. Picked p=0.7, which seems to give reasonable scores.
}		}
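As a worked reading of the formula in this revision (derived from the code and unit tests in this diff, not a statement from the reviewers): with U directories traversed up from the proximity path and D traversed down to the symbol's file, the score starts at 1 and is multiplied by p = 0.7 per penalized step, with down-steps counted at half weight using integer division.

```latex
s(U, D) = p^{\,U + \lfloor D/2 \rfloor}, \qquad p = 0.7

s(0, 0) = 1, \qquad
s(1, 1) = 0.7, \qquad
s(4, 2) = 0.7^{5} \approx 0.168, \qquad
s(5, 5) = 0.7^{7} \approx 0.082
```

The score approaches zero but never reaches it, so distant files fade toward the neutral score gradually instead of hitting a cutoff.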

		sammccall (Not Done): How do we know proximitypath is set at this point? Better to copy the symbol URL path I think :-(
void SymbolRelevanceSignals::merge(const CodeCompletionResult &SemaCCResult) {		void SymbolRelevanceSignals::merge(const CodeCompletionResult &SemaCCResult) {
		sammccall (Not Done): Proximity path needs to be set here too.
		ioeric (Author, Not Done): Alternatively, I wonder if we could give sema results a fixed proximity score, as they are symbols that are already included?
		ioeric (Author, Not Done): As discussed offline, sema symbols now have a fixed proximity score (not entirely sure about the value though).
if (SemaCCResult.Availability == CXAvailability_NotAvailable ||		if (SemaCCResult.Availability == CXAvailability_NotAvailable ||
SemaCCResult.Availability == CXAvailability_NotAccessible)		SemaCCResult.Availability == CXAvailability_NotAccessible)
Forbidden = true;		Forbidden = true;

		sammccall (Not Done): Why U->toString() rather than ->body()?
		ioeric (Author, Not Done): Because both URIs need to be parsed in order to use `body()`. Here we don't parse `SymURI`.
if (SemaCCResult.Declaration) {		if (SemaCCResult.Declaration) {
// We boost things that have decls in the main file.		// We boost things that have decls in the main file. We give a fixed score
// The real proximity scores would be more general when we have them.		// for all other declarations in sema as they are already included in the
		// translation unit.
float DeclProximity =		float DeclProximity =
hasDeclInMainFile(*SemaCCResult.Declaration) ? 1.0 : 0.0;		hasDeclInMainFile(*SemaCCResult.Declaration) ? 1.0 : 0.6;
ProximityScore = std::max(DeclProximity, ProximityScore);		SemaProximityScore = std::max(DeclProximity, SemaProximityScore);
}		}

// Declarations are scoped, others (like macros) are assumed global.		// Declarations are scoped, others (like macros) are assumed global.
if (SemaCCResult.Declaration)		if (SemaCCResult.Declaration)
Scope = std::min(Scope, ComputeScope(*SemaCCResult.Declaration));		Scope = std::min(Scope, ComputeScope(*SemaCCResult.Declaration));
}		}

float SymbolRelevanceSignals::evaluate() const {		float SymbolRelevanceSignals::evaluate() const {
float Score = 1;		float Score = 1;

if (Forbidden)		if (Forbidden)
return 0;		return 0;

Score *= NameMatch;		Score *= NameMatch;

		float IndexProximityScore =
		FileProximityMatch ? FileProximityMatch->uriProximity(SymbolURI) : 0;
// Proximity scores are [0,1] and we translate them into a multiplier in the		// Proximity scores are [0,1] and we translate them into a multiplier in the
// range from 1 to 2.		// range from 1 to 2.
Score *= 1 + ProximityScore;		Score *= 1 + std::max(IndexProximityScore, SemaProximityScore);

// Symbols like local variables may only be referenced within their scope.		// Symbols like local variables may only be referenced within their scope.
// Conversely if we're in that scope, it's likely we'll reference them.		// Conversely if we're in that scope, it's likely we'll reference them.
if (Query == CodeComplete) {		if (Query == CodeComplete) {
// The narrower the scope where a symbol is visible, the more likely it is		// The narrower the scope where a symbol is visible, the more likely it is
// to be relevant when it is available.		// to be relevant when it is available.
switch (Scope) {		switch (Scope) {
case GlobalScope:		case GlobalScope:
break;		break;
case FileScope:		case FileScope:
Score *= 1.5;		Score *= 1.5;
break;		break;
case ClassScope:		case ClassScope:
Score *= 2;		Score *= 2;
break;		break;
case FunctionScope:		case FunctionScope:
Score *= 4;		Score *= 4;
break;		break;
}		}
}		}

return Score;		return Score;
}		}

raw_ostream &operator<<(raw_ostream &OS, const SymbolRelevanceSignals &S) {		raw_ostream &operator<<(raw_ostream &OS, const SymbolRelevanceSignals &S) {
OS << formatv("=== Symbol relevance: {0}\n", S.evaluate());		OS << formatv("=== Symbol relevance: {0}\n", S.evaluate());
OS << formatv("\tName match: {0}\n", S.NameMatch);		OS << formatv("\tName match: {0}\n", S.NameMatch);
OS << formatv("\tForbidden: {0}\n", S.Forbidden);		OS << formatv("\tForbidden: {0}\n", S.Forbidden);
OS << formatv("\tProximity: {0}\n", S.ProximityScore);		OS << formatv("\tSymbol URI: {0}\n", S.SymbolURI);
		if (S.FileProximityMatch) {
		sammccall (Done): Could this just be inlined like the others? `Index proximity: 0.5 (ProximityRoots{foo/bar.h})`
		OS << formatv("\tIndex proximity: {0}\n",
		S.FileProximityMatch->uriProximity(S.SymbolURI))
		<< " (" << *S.FileProximityMatch << ")\n";
		}
		OS << formatv("\tSema proximity: {0}\n", S.SemaProximityScore);
		sammccall (Done): No camel case here, just words.
OS << formatv("\tQuery type: {0}\n", static_cast<int>(S.Query));		OS << formatv("\tQuery type: {0}\n", static_cast<int>(S.Query));
OS << formatv("\tScope: {0}\n", static_cast<int>(S.Scope));		OS << formatv("\tScope: {0}\n", static_cast<int>(S.Scope));
return OS;		return OS;
}		}

float evaluateSymbolAndRelevance(float SymbolQuality, float SymbolRelevance) {		float evaluateSymbolAndRelevance(float SymbolQuality, float SymbolRelevance) {
return SymbolQuality * SymbolRelevance;		return SymbolQuality * SymbolRelevance;
}		}
[29 lines not shown]

unittests/clangd/QualityTests.cpp

[12 lines not shown]
//		//
// Here we test the signal extraction and sanity-check that signals point in		// Here we test the signal extraction and sanity-check that signals point in
// the right direction. This should be supplemented by quality metrics which		// the right direction. This should be supplemented by quality metrics which
// we can compute from a corpus of queries and preferred rankings.		// we can compute from a corpus of queries and preferred rankings.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "Quality.h"		#include "Quality.h"
		#include "TestFS.h"
#include "TestTU.h"		#include "TestTU.h"
#include "gmock/gmock.h"		#include "gmock/gmock.h"
#include "gtest/gtest.h"		#include "gtest/gtest.h"

namespace clang {		namespace clang {
namespace clangd {		namespace clangd {

		// Force the unittest URI scheme to be linked.
		static int LLVM_ATTRIBUTE_UNUSED UnittestSchemeAnchorDest =
		UnittestSchemeAnchorSource;

namespace {		namespace {

TEST(QualityTests, SymbolQualitySignalExtraction) {		TEST(QualityTests, SymbolQualitySignalExtraction) {
auto Header = TestTU::withHeaderCode(R"cpp(		auto Header = TestTU::withHeaderCode(R"cpp(
int _X;		int _X;

[[deprecated]]		[[deprecated]]
int _f() { return _X; }		int _f() { return _X; }
[51 lines not shown]	Relevance.merge(CodeCompletionResult(&findDecl(AST, "deprecated"),
/*Priority=*/42, nullptr, false,		/*Priority=*/42, nullptr, false,
/*Accessible=*/false));		/*Accessible=*/false));
EXPECT_EQ(Relevance.NameMatch, SymbolRelevanceSignals().NameMatch);		EXPECT_EQ(Relevance.NameMatch, SymbolRelevanceSignals().NameMatch);
EXPECT_TRUE(Relevance.Forbidden);		EXPECT_TRUE(Relevance.Forbidden);
EXPECT_EQ(Relevance.Scope, SymbolRelevanceSignals::GlobalScope);		EXPECT_EQ(Relevance.Scope, SymbolRelevanceSignals::GlobalScope);

Relevance = {};		Relevance = {};
Relevance.merge(CodeCompletionResult(&findDecl(AST, "main"), 42));		Relevance.merge(CodeCompletionResult(&findDecl(AST, "main"), 42));
EXPECT_FLOAT_EQ(Relevance.ProximityScore, 1.0) << "Decl in current file";		EXPECT_FLOAT_EQ(Relevance.SemaProximityScore, 1.0) << "Decl in current file";
Relevance = {};		Relevance = {};
Relevance.merge(CodeCompletionResult(&findDecl(AST, "header"), 42));		Relevance.merge(CodeCompletionResult(&findDecl(AST, "header"), 42));
EXPECT_FLOAT_EQ(Relevance.ProximityScore, 0.0) << "Decl from header";		EXPECT_FLOAT_EQ(Relevance.SemaProximityScore, 0.6) << "Decl from header";
Relevance = {};		Relevance = {};
Relevance.merge(CodeCompletionResult(&findDecl(AST, "header_main"), 42));		Relevance.merge(CodeCompletionResult(&findDecl(AST, "header_main"), 42));
EXPECT_FLOAT_EQ(Relevance.ProximityScore, 1.0) << "Current file and header";		EXPECT_FLOAT_EQ(Relevance.SemaProximityScore, 1.0)
		<< "Current file and header";

Relevance = {};		Relevance = {};
Relevance.merge(CodeCompletionResult(&findAnyDecl(AST, "X"), 42));		Relevance.merge(CodeCompletionResult(&findAnyDecl(AST, "X"), 42));
EXPECT_EQ(Relevance.Scope, SymbolRelevanceSignals::FileScope);		EXPECT_EQ(Relevance.Scope, SymbolRelevanceSignals::FileScope);
Relevance = {};		Relevance = {};
Relevance.merge(CodeCompletionResult(&findAnyDecl(AST, "y"), 42));		Relevance.merge(CodeCompletionResult(&findAnyDecl(AST, "y"), 42));
EXPECT_EQ(Relevance.Scope, SymbolRelevanceSignals::ClassScope);		EXPECT_EQ(Relevance.Scope, SymbolRelevanceSignals::ClassScope);
Relevance = {};		Relevance = {};
[37 lines not shown]	TEST(QualityTests, SymbolRelevanceSignalsSanity) {
Forbidden.Forbidden = true;		Forbidden.Forbidden = true;
EXPECT_LT(Forbidden.evaluate(), Default.evaluate());		EXPECT_LT(Forbidden.evaluate(), Default.evaluate());

SymbolRelevanceSignals PoorNameMatch;		SymbolRelevanceSignals PoorNameMatch;
PoorNameMatch.NameMatch = 0.2f;		PoorNameMatch.NameMatch = 0.2f;
EXPECT_LT(PoorNameMatch.evaluate(), Default.evaluate());		EXPECT_LT(PoorNameMatch.evaluate(), Default.evaluate());

SymbolRelevanceSignals WithProximity;		SymbolRelevanceSignals WithProximity;
WithProximity.ProximityScore = 0.2f;		WithProximity.SemaProximityScore = 0.2f;
EXPECT_GT(WithProximity.evaluate(), Default.evaluate());		EXPECT_GT(WithProximity.evaluate(), Default.evaluate());

SymbolRelevanceSignals Scoped;		SymbolRelevanceSignals Scoped;
Scoped.Scope = SymbolRelevanceSignals::FileScope;		Scoped.Scope = SymbolRelevanceSignals::FileScope;
EXPECT_EQ(Scoped.evaluate(), Default.evaluate());		EXPECT_EQ(Scoped.evaluate(), Default.evaluate());
Scoped.Query = SymbolRelevanceSignals::CodeComplete;		Scoped.Query = SymbolRelevanceSignals::CodeComplete;
EXPECT_GT(Scoped.evaluate(), Default.evaluate());		EXPECT_GT(Scoped.evaluate(), Default.evaluate());
}		}

TEST(QualityTests, SortText) {		TEST(QualityTests, SortText) {
EXPECT_LT(sortText(std::numeric_limits<float>::infinity()), sortText(1000.2f));		EXPECT_LT(sortText(std::numeric_limits<float>::infinity()), sortText(1000.2f));
EXPECT_LT(sortText(1000.2f), sortText(1));		EXPECT_LT(sortText(1000.2f), sortText(1));
EXPECT_LT(sortText(1), sortText(0.3f));		EXPECT_LT(sortText(1), sortText(0.3f));
EXPECT_LT(sortText(0.3f), sortText(0));		EXPECT_LT(sortText(0.3f), sortText(0));
EXPECT_LT(sortText(0), sortText(-10));		EXPECT_LT(sortText(0), sortText(-10));
EXPECT_LT(sortText(-10), sortText(-std::numeric_limits<float>::infinity()));		EXPECT_LT(sortText(-10), sortText(-std::numeric_limits<float>::infinity()));

EXPECT_LT(sortText(1, "z"), sortText(0, "a"));		EXPECT_LT(sortText(1, "z"), sortText(0, "a"));
EXPECT_LT(sortText(0, "a"), sortText(0, "z"));		EXPECT_LT(sortText(0, "a"), sortText(0, "z"));
}		}

		// {a,b,c} becomes /clangd-test/a/b/c
		std::string joinPaths(llvm::ArrayRef<StringRef> Parts) {
		return testPath(
		llvm::join(Parts.begin(), Parts.end(), llvm::sys::path::get_separator()));
		}

		static constexpr float ProximityBase = 0.7;

		// Calculates a proximity score for an index symbol with declaration file
		// SymPath with the given URI scheme.
		float URIProximity(const FileProximityMatcher &Matcher, StringRef SymPath,
		StringRef Scheme = "file") {
		auto U = URI::create(SymPath, Scheme);
		EXPECT_TRUE(static_cast<bool>(U)) << llvm::toString(U.takeError());
		return Matcher.uriProximity(U->toString());
		}

		TEST(QualityTests, URIProximityScores) {
		FileProximityMatcher Matcher(
		/*ProximityPaths=*/{joinPaths({"a", "b", "c", "d", "x"})});

		EXPECT_FLOAT_EQ(URIProximity(Matcher, joinPaths({"a", "b", "c", "d", "x"})),
		1);
		EXPECT_FLOAT_EQ(URIProximity(Matcher, joinPaths({"a", "b", "c", "d", "y"})),
		ProximityBase);
		EXPECT_FLOAT_EQ(URIProximity(Matcher, joinPaths({"a", "y", "z"})),
		std::pow(ProximityBase, 5));
		EXPECT_FLOAT_EQ(
		URIProximity(Matcher, joinPaths({"a", "b", "c", "d", "e", "y"})),
		std::pow(ProximityBase, 2));
		EXPECT_FLOAT_EQ(
		URIProximity(Matcher, joinPaths({"a", "b", "m", "n", "o", "y"})),
		std::pow(ProximityBase, 5));
		EXPECT_FLOAT_EQ(
		URIProximity(Matcher, joinPaths({"a", "t", "m", "n", "o", "y"})),
		std::pow(ProximityBase, 6));
		// Note the common directory is /clangd-test/
		EXPECT_FLOAT_EQ(URIProximity(Matcher, joinPaths({"m", "n", "o", "p", "y"})),
		std::pow(ProximityBase, 7));
		}

		TEST(QualityTests, URIProximityScoresWithTestURI) {
		FileProximityMatcher Matcher(
		/*ProximityPaths=*/{joinPaths({"b", "c", "x"})});
		EXPECT_FLOAT_EQ(URIProximity(Matcher, joinPaths({"b", "c", "x"}), "unittest"),
		1);
		EXPECT_FLOAT_EQ(URIProximity(Matcher, joinPaths({"b", "y"}), "unittest"),
		std::pow(ProximityBase, 2));
		// unittest:///b/c/x vs unittest:///m/n/y. No common directory.
		EXPECT_FLOAT_EQ(URIProximity(Matcher, joinPaths({"m", "n", "y"}), "unittest"),
		std::pow(ProximityBase, 4));
		}

} // namespace		} // namespace
} // namespace clangd		} // namespace clangd
} // namespace clang		} // namespace clang

[clangd] Boost completion score according to file proximity. (Closed, Public)