This is an archive of the discontinued LLVM Phabricator instance.

Differential D52808

[cland] Dex: fix/simplify trigram generation
ClosedPublic

Authored by sammccall on Oct 2 2018, 4:16 PM.

Details

Reviewers

ioeric

Commits

rGb5bbfef6cd89: [cland] Dex: fix/simplify short-trigram generation
rL343775: [cland] Dex: fix/simplify short-trigram generation
rCTE343775: [cland] Dex: fix/simplify short-trigram generation

Summary

- Instead of a$$ for a short-query trigram, just use a
- Generate more short-query trigrams, e.g. "AbcDefGhi" now yields "d" and "ag". This is effectively required by LSP, having "ag" not match but "agh" match will lead to glitches due to client-side filtering.
- Drop leading-punctuation short-query trigrams. Nice idea, but current implementation is broken (competes with non-punctuation short query trigrams)

Diff Detail

Repository: rCTE Clang Tools Extra

Event Timeline

sammccall created this revision. Oct 2 2018, 4:16 PM

Herald added subscribers: cfe-commits, kadircet, arphaman and 3 others. Oct 2 2018, 4:16 PM

Harbormaster completed remote builds in B23382: Diff 168047. Oct 2 2018, 4:17 PM

Update comment, revert unintended change

Harbormaster completed remote builds in B23383: Diff 168049. Oct 2 2018, 4:19 PM

Generate more short-query trigrams, e.g. "AbcDefGhi" now yields "d" and "ag".

I am concerned about the impact on the size of posting lists (we can measure) and retrieval quality by adding more incomplete trigrams.

This is effectively required by LSP, having "ag" not match but "agh" match will lead to glitches due to client-side filtering.

It seems hard to make index behave the same as LSP clients, considering all the ranking signals we have; there can be other reasons that cause having "ag" not match but "agh" match. And it seems reasonable to assume that you would get better symbols as you type more characters (i.e. stronger relevance signal).

Drop leading-punctuation short-query trigrams. Nice idea, but current implementation is broken (competes with non-punctuation short query trigrams).

Could you elaborate how this is broken? We should probably fix it instead of removing it. __ not matching __some_macro sounds like a regression.

unittests/clangd/DexTests.cpp
399	nit: remove?

TL;DR: I'm no longer convinced of my conclusions for short queries; see the proposal below.

In D52808#1254899, @ioeric wrote:

Generate more short-query trigrams, e.g. "AbcDefGhi" now yields "d" and "ag".

I am concerned about the impact on the size of posting lists (we can measure) and retrieval quality by adding more incomplete trigrams.

There is indeed an impact here. Before: 22309480, After: 22531144.
About 1/3 of this is posting lists, so the increase is about 1% overall and about 3% of the posting lists.
I haven't measured quality (don't have a good way to do that for dex currently). It's true that the new results we're admitting are probably worse as they don't start with the right letter.
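
The percentages quoted above can be checked from the two figures. This is only a sketch of the arithmetic: the thread does not state the units, the 1/3 posting-list share is the approximation given in the comment, and the function names are made up for illustration.

```cpp
#include <cassert>

// Relative growth of the whole index, in percent.
double overallGrowthPercent(double Before, double After) {
  assert(Before > 0 && "need a non-zero baseline");
  return (After - Before) / Before * 100.0;
}

// Growth relative to the posting lists alone, assuming they make up
// PostingShare of the original size and absorb the entire increase.
double postingListGrowthPercent(double Before, double After,
                                double PostingShare) {
  assert(Before > 0 && PostingShare > 0);
  return (After - Before) / (Before * PostingShare) * 100.0;
}
```

With Before = 22309480 and After = 22531144, the first function gives just under 1% and the second, with a share of 1/3, gives just under 3%.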

This is effectively required by LSP, having "ag" not match but "agh" match will lead to glitches due to client-side filtering.

It seems hard to make index behave the same as LSP clients, considering all the ranking signals we have; there can be other reasons that cause having "ag" not match but "agh" match.

No, there's only one such reason: we truncated the result list (the symbol matched, but it wasn't one of the best N).
This reason is accounted for by LSP: client-side filtering only occurs when there was no truncation.

And it seems reasonable to assume that you would get better symbols as you type more characters (i.e. stronger relevance signal).

The problem is LSP clients are free to assume that the result list is complete (unless marked as incomplete) and therefore will never retrieve the better symbols.
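
To make the glitch concrete, here is a rough model of what a client may do with a result list that was not marked incomplete. The types and function names are hypothetical, and the subsequence filter is a simplification; real clients use their own matching rules. Only the idea of an incompleteness flag comes from LSP.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical shape of a completion response.
struct CompletionList {
  std::vector<std::string> Items;
  bool IsIncomplete = false;
};

// Simplified client-side filter: case-insensitive subsequence match.
bool isSubsequence(const std::string &Query, const std::string &Item) {
  size_t Q = 0;
  for (char C : Item)
    if (Q < Query.size() &&
        std::tolower(static_cast<unsigned char>(C)) ==
            std::tolower(static_cast<unsigned char>(Query[Q])))
      ++Q;
  return Q == Query.size();
}

// When the user types more characters, a client may narrow a complete cached
// list locally instead of asking the server again. Returns false when the
// cached list was incomplete and the server has to be queried.
bool refineLocally(const CompletionList &Cached, const std::string &NewQuery,
                   std::vector<std::string> &Out) {
  if (Cached.IsIncomplete)
    return false;
  Out.clear();
  for (const auto &Item : Cached.Items)
    if (isSubsequence(NewQuery, Item))
      Out.push_back(Item);
  return true;
}
```

If the server answers "ag" with a list that omits AbcDefGhi and is not marked incomplete, refining that list for "agh" can never produce AbcDefGhi, even though a fresh server query for "agh" would return it.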

Drop leading-punctuation short-query trigrams. Nice idea, but current implementation is broken (competes with non-punctuation short query trigrams).

Could you elaborate how this is broken? We should probably fix it instead of removing it. __ not matching __some_macro sounds like a regression.

fuzzyFind considers _ptr to match unique_ptr, so it needs to consider _ to match it too.

We will still match __some_macro, but the posting lists will match ~everything and then postfilter. It may make sense to do optimizations or tweak overall trigram generation for this case, but the way it's currently done seems unprincipled and a little broken.

OK, so after some thought on the tradeoffs here, what about this alternative design:

For query length < 3, we support really restrictive matches: exact prefix, or first char + next head char.
We get around the LSP restrictions by always marking such result sets as incomplete.
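
A minimal sketch of this rule follows. The head detection is a simplified stand-in for clangd's segmentation (it treats an uppercase letter after a non-uppercase character, or any alphanumeric after a separator, as a segment start), and all names here are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

static char lowerChar(char C) {
  return static_cast<char>(std::tolower(static_cast<unsigned char>(C)));
}

// Simplified segmentation (an assumption): does Name[I] start a new segment?
static bool isHead(const std::string &Name, size_t I) {
  assert(I > 0 && I < Name.size());
  unsigned char C = Name[I], Prev = Name[I - 1];
  if (!std::isalnum(C))
    return false;
  if (!std::isalnum(Prev))
    return true;
  return std::isupper(C) && !std::isupper(Prev);
}

// Restrictive matching for queries of length 1 or 2: the query is a
// case-insensitive prefix of the name, or it is the first character followed
// by the first character of the next segment.
bool matchesShortQuery(const std::string &Query, const std::string &Name) {
  if (Query.empty() || Query.size() > 2 || Name.empty())
    return false;
  if (lowerChar(Query[0]) != lowerChar(Name[0]))
    return false;
  if (Query.size() == 1)
    return true;
  if (Name.size() >= 2 && lowerChar(Query[1]) == lowerChar(Name[1]))
    return true;
  for (size_t I = 1; I < Name.size(); ++I)
    if (isHead(Name, I))
      return lowerChar(Query[1]) == lowerChar(Name[I]);
  return false;
}

// Short, non-empty queries do not return every fuzzy match, so their result
// sets are always reported as incomplete.
bool resultsAreIncomplete(const std::string &Query) {
  return !Query.empty() && Query.size() < 3;
}
```

Under these assumptions "fo" and "fb" match FooBar, while "ob" does not; because the result set is flagged incomplete, a client is expected to query again once the user types a third character.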

Use similar but better-defined rules for short trigram matches.
Modify Dex to account for the matches not being exhaustive.

Unfortunately the test needs D52796, which depends on this patch.

Harbormaster completed remote builds in B23454: Diff 168280. Oct 4 2018, 6:04 AM

The problem is LSP clients are free to assume that the result list is complete (unless marked as incomplete) and therefore will never retrieve the better symbols.

Good point. Thanks for the explanation!

clangd/index/dex/Dex.cpp
222	should this still be a `vlog`?
clangd/index/dex/Trigram.h
47	is `b` still generated?

This revision is now accepted and ready to land. Oct 4 2018, 6:54 AM

Closed by commit rCTE343775: [cland] Dex: fix/simplify short-trigram generation (authored by sammccall). Oct 4 2018, 7:03 AM

This revision was automatically updated to reflect the committed changes.

sammccall marked 2 inline comments as done.

Revision Contents

Path	Size
clangd/index/dex/Dex.cpp	4 lines
clangd/index/dex/Trigram.h	29 lines
clangd/index/dex/Trigram.cpp	98 lines
unittests/clangd/DexTests.cpp	74 lines

Diff 168284

clangd/index/dex/Dex.cpp

[150 lines not shown]
/// while applying Callback to each symbol in the order of decreasing quality		/// while applying Callback to each symbol in the order of decreasing quality
/// of the matched symbols.		/// of the matched symbols.
bool Dex::fuzzyFind(const FuzzyFindRequest &Req,		bool Dex::fuzzyFind(const FuzzyFindRequest &Req,
llvm::function_ref<void(const Symbol &)> Callback) const {		llvm::function_ref<void(const Symbol &)> Callback) const {
assert(!StringRef(Req.Query).contains("::") &&		assert(!StringRef(Req.Query).contains("::") &&
"There must be no :: in query.");		"There must be no :: in query.");
trace::Span Tracer("Dex fuzzyFind");		trace::Span Tracer("Dex fuzzyFind");
FuzzyMatcher Filter(Req.Query);		FuzzyMatcher Filter(Req.Query);
bool More = false;		// For short queries we use specialized trigrams that don't yield all results.
		// Prevent clients from postfiltering them for longer queries.
		bool More = !Req.Query.empty() && Req.Query.size() < 3;

std::vector<std::unique_ptr<Iterator>> TopLevelChildren;		std::vector<std::unique_ptr<Iterator>> TopLevelChildren;
const auto TrigramTokens = generateQueryTrigrams(Req.Query);		const auto TrigramTokens = generateQueryTrigrams(Req.Query);

// Generate query trigrams and construct AND iterator over all query		// Generate query trigrams and construct AND iterator over all query
// trigrams.		// trigrams.
std::vector<std::unique_ptr<Iterator>> TrigramIterators;		std::vector<std::unique_ptr<Iterator>> TrigramIterators;
for (const auto &Trigram : TrigramTokens) {		for (const auto &Trigram : TrigramTokens) {
[44 lines not shown]	bool Dex::fuzzyFind(const FuzzyFindRequest &Req,
// Retrieve more items than it was requested: some of the items with high		// Retrieve more items than it was requested: some of the items with high
// final score might not be retrieved otherwise.		// final score might not be retrieved otherwise.
// FIXME(kbobyrev): Pre-scoring retrieval threshold should be adjusted as		// FIXME(kbobyrev): Pre-scoring retrieval threshold should be adjusted as
// using 100x of the requested number might not be good in practice, e.g.		// using 100x of the requested number might not be good in practice, e.g.
// when the requested number of items is small.		// when the requested number of items is small.
auto Root = Req.Limit ? Corpus.limit(move(QueryIterator), *Req.Limit * 100)		auto Root = Req.Limit ? Corpus.limit(move(QueryIterator), *Req.Limit * 100)
: move(QueryIterator);		: move(QueryIterator);
SPAN_ATTACH(Tracer, "query", llvm::to_string(*Root));		SPAN_ATTACH(Tracer, "query", llvm::to_string(*Root));
vlog("Dex query tree: {0}", *Root);		vlog("Dex query tree: {0}", *Root);
		ioeric: should this still be a `vlog`?

using IDAndScore = std::pair<DocID, float>;		using IDAndScore = std::pair<DocID, float>;
std::vector<IDAndScore> IDAndScores = consume(*Root);		std::vector<IDAndScore> IDAndScores = consume(*Root);

auto Compare = [](const IDAndScore &LHS, const IDAndScore &RHS) {		auto Compare = [](const IDAndScore &LHS, const IDAndScore &RHS) {
return LHS.second > RHS.second;		return LHS.second > RHS.second;
};		};
TopN<IDAndScore, decltype(Compare)> Top(		TopN<IDAndScore, decltype(Compare)> Top(
[84 lines not shown]

clangd/index/dex/Trigram.h

	[27 lines not shown]

	#include <string>			#include <string>

	namespace clang {			namespace clang {
	namespace clangd {			namespace clangd {
	namespace dex {			namespace dex {

	/// Returns list of unique fuzzy-search trigrams from unqualified symbol.			/// Returns list of unique fuzzy-search trigrams from unqualified symbol.
				/// The trigrams give the 3-character query substrings this symbol can match.
	///			///
	/// First, given Identifier (unqualified symbol name) is segmented using			/// The symbol's name is broken into segments, e.g. "FooBar" has two segments.
	/// FuzzyMatch API and lowercased. After segmentation, the following technique
	/// is applied for generating trigrams: for each letter or digit in the input
	/// string the algorithms looks for the possible next and skip-1-next characters
	/// which can be jumped to during fuzzy matching. Each combination of such three
	/// characters is inserted into the result.
	///
	/// Trigrams can start at any character in the input. Then we can choose to move			/// Trigrams can start at any character in the input. Then we can choose to move
	/// to the next character, move to the start of the next segment, or skip over a			/// to the next character, move to the start of the next segment, or stop.
	/// segment.
	///			///
	/// This also generates incomplete trigrams for short query scenarios:			/// Short trigrams (length 1-2) are used for short queries. These are:
	/// * Empty trigram: "$$$".			/// - prefixes of the identifier, of length 1 and 2
	/// * Unigram: the first character of the identifier.			/// - the first character + next head character
	/// * Bigrams: a 2-char prefix of the identifier and a bigram of the first two			///
	/// HEAD characters (if they exist).			/// For "FooBar" we get the following trigrams:
	//			/// {f, fo, fb, foo, fob, fba, oob, oba, bar}.
				ioeric: is `b` still generated?
	/// Note: the returned list of trigrams does not have duplicates, if any trigram			///
	/// belongs to more than one class it is only inserted once.			/// Trigrams are lowercase, as trigram matching is case-insensitive.
				/// Trigrams in the returned list are deduplicated.
	std::vector<Token> generateIdentifierTrigrams(llvm::StringRef Identifier);			std::vector<Token> generateIdentifierTrigrams(llvm::StringRef Identifier);

	/// Returns list of unique fuzzy-search trigrams given a query.			/// Returns list of unique fuzzy-search trigrams given a query.
	///			///
	/// Query is segmented using FuzzyMatch API and downcasted to lowercase. Then,			/// Query is segmented using FuzzyMatch API and downcasted to lowercase. Then,
	/// the simplest trigrams - sequences of three consecutive letters and digits			/// the simplest trigrams - sequences of three consecutive letters and digits
	/// are extracted and returned after deduplication.			/// are extracted and returned after deduplication.
	///			///
	[10 lines not shown]
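
The rules described in the updated generateIdentifierTrigrams comment can be tried out with a standalone sketch like the one below. It is not the patch's code: it uses std::string and std::set instead of Token and DenseSet, and simpleRoles is a simplified, hypothetical replacement for clangd's calculateRoles, so results can differ on identifiers such as "TUDecl" where the real segmentation is subtler.

```cpp
#include <array>
#include <cctype>
#include <set>
#include <string>
#include <vector>

enum class Role { Separator, Head, Tail };

// Simplified stand-in for calculateRoles (an assumption): a segment starts
// after a separator or at an uppercase letter following a non-uppercase one.
std::vector<Role> simpleRoles(const std::string &Name) {
  std::vector<Role> Roles(Name.size(), Role::Separator);
  for (size_t I = 0; I < Name.size(); ++I) {
    unsigned char C = Name[I];
    if (!std::isalnum(C))
      continue;
    bool AfterSeparator = I == 0 || Roles[I - 1] == Role::Separator;
    bool CaseBoundary =
        I > 0 && std::isupper(C) &&
        !std::isupper(static_cast<unsigned char>(Name[I - 1]));
    Roles[I] = (AfterSeparator || CaseBoundary) ? Role::Head : Role::Tail;
  }
  return Roles;
}

std::set<std::string> identifierTrigrams(const std::string &Name) {
  const std::vector<Role> Roles = simpleRoles(Name);
  std::string Lower;
  for (unsigned char C : Name)
    Lower.push_back(static_cast<char>(std::tolower(C)));

  // Next[I] lists the indices reachable from I: the next tail in the same
  // segment, the head of the next segment, and the head of the segment after
  // that. 0 means "not available" (index 0 is never a jump target).
  std::vector<std::array<size_t, 3>> Next(Lower.size());
  size_t NextTail = 0, NextHead = 0, NextNextHead = 0;
  for (size_t I = Lower.size(); I-- > 0;) {
    Next[I] = {{NextTail, NextHead, NextNextHead}};
    NextTail = Roles[I] == Role::Tail ? I : 0;
    if (Roles[I] == Role::Head) {
      NextNextHead = NextHead;
      NextHead = I;
    }
  }

  std::set<std::string> Result;
  // Full trigrams: every chain I -> J -> K of allowed jumps.
  for (size_t I = 0; I < Lower.size(); ++I) {
    if (Roles[I] == Role::Separator)
      continue;
    for (size_t J : Next[I]) {
      if (J == 0)
        continue;
      for (size_t K : Next[J])
        if (K != 0)
          Result.insert(std::string{Lower[I], Lower[J], Lower[K]});
    }
  }
  // Short trigrams: 1- and 2-character prefixes, first char + next head.
  if (!Lower.empty())
    Result.insert(Lower.substr(0, 1));
  if (Lower.size() >= 2)
    Result.insert(Lower.substr(0, 2));
  for (size_t I = 1; I < Lower.size(); ++I)
    if (Roles[I] == Role::Head) {
      Result.insert(std::string{Lower[0], Lower[I]});
      break;
    }
  return Result;
}
```

For "FooBar" this sketch yields the set listed in the header comment: f, fo, fb, foo, fob, fba, oob, oba, bar. There is no standalone "b", since short trigrams always begin with the identifier's first character.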

clangd/index/dex/Trigram.cpp

	[17 lines not shown]
	#include <string>			#include <string>

	using namespace llvm;			using namespace llvm;

	namespace clang {			namespace clang {
	namespace clangd {			namespace clangd {
	namespace dex {			namespace dex {

	/// This is used to mark unigrams and bigrams and distinct them from complete
	/// trigrams. Since '$' is not present in valid identifier names, it is safe to
	/// use it as the special symbol.
	static const char END_MARKER = '$';

	std::vector<Token> generateIdentifierTrigrams(llvm::StringRef Identifier) {			std::vector<Token> generateIdentifierTrigrams(llvm::StringRef Identifier) {
	// Apply fuzzy matching text segmentation.			// Apply fuzzy matching text segmentation.
	std::vector<CharRole> Roles(Identifier.size());			std::vector<CharRole> Roles(Identifier.size());
	calculateRoles(Identifier,			calculateRoles(Identifier,
	llvm::makeMutableArrayRef(Roles.data(), Identifier.size()));			llvm::makeMutableArrayRef(Roles.data(), Identifier.size()));

	std::string LowercaseIdentifier = Identifier.lower();			std::string LowercaseIdentifier = Identifier.lower();

	// For each character, store indices of the characters to which fuzzy matching			// For each character, store indices of the characters to which fuzzy matching
	// algorithm can jump. There are 3 possible variants:			// algorithm can jump. There are 3 possible variants:
	//			//
	// * Next Tail - next character from the same segment			// * Next Tail - next character from the same segment
	// * Next Head - front character of the next segment			// * Next Head - front character of the next segment
	// * Skip-1-Next Head - front character of the skip-1-next segment			// * Skip-1-Next Head - front character of the skip-1-next segment
	//			//
	// Next stores tuples of three indices in the presented order, if a variant is			// Next stores tuples of three indices in the presented order, if a variant is
	// not available then 0 is stored.			// not available then 0 is stored.
	std::vector<std::array<unsigned, 3>> Next(LowercaseIdentifier.size());			std::vector<std::array<unsigned, 3>> Next(LowercaseIdentifier.size());
	unsigned NextTail = 0, NextHead = 0, NextNextHead = 0;			unsigned NextTail = 0, NextHead = 0, NextNextHead = 0;
	// Store two first HEAD characters in the identifier (if present).
	std::deque<char> TwoHeads;
	for (int I = LowercaseIdentifier.size() - 1; I >= 0; --I) {			for (int I = LowercaseIdentifier.size() - 1; I >= 0; --I) {
	Next[I] = {{NextTail, NextHead, NextNextHead}};			Next[I] = {{NextTail, NextHead, NextNextHead}};
	NextTail = Roles[I] == Tail ? I : 0;			NextTail = Roles[I] == Tail ? I : 0;
	if (Roles[I] == Head) {			if (Roles[I] == Head) {
	NextNextHead = NextHead;			NextNextHead = NextHead;
	NextHead = I;			NextHead = I;
	TwoHeads.push_front(LowercaseIdentifier[I]);
	if (TwoHeads.size() > 2)
	TwoHeads.pop_back();
	}			}
	}			}

	DenseSet<Token> UniqueTrigrams;			DenseSet<Token> UniqueTrigrams;

	auto add = [&](std::string Chars) {			auto Add = [&](std::string Chars) {
	UniqueTrigrams.insert(Token(Token::Kind::Trigram, Chars));			UniqueTrigrams.insert(Token(Token::Kind::Trigram, Chars));
	};			};

	if (TwoHeads.size() == 2)			// Iterate through valid sequneces of three characters Fuzzy Matcher can
	add({{TwoHeads.front(), TwoHeads.back(), END_MARKER}});

	if (!LowercaseIdentifier.empty())
	add({{LowercaseIdentifier.front(), END_MARKER, END_MARKER}});

	if (LowercaseIdentifier.size() >= 2)
	add({{LowercaseIdentifier[0], LowercaseIdentifier[1], END_MARKER}});

	if (LowercaseIdentifier.size() >= 3)
	add({{LowercaseIdentifier[0], LowercaseIdentifier[1],
	LowercaseIdentifier[2]}});

	// Iterate through valid seqneces of three characters Fuzzy Matcher can
	// process.			// process.
	for (size_t I = 0; I < LowercaseIdentifier.size(); ++I) {			for (size_t I = 0; I < LowercaseIdentifier.size(); ++I) {
	// Skip delimiters.			// Skip delimiters.
	if (Roles[I] != Head && Roles[I] != Tail)			if (Roles[I] != Head && Roles[I] != Tail)
	continue;			continue;
	for (const unsigned J : Next[I]) {			for (const unsigned J : Next[I]) {
	if (J == 0)			if (J == 0)
	continue;			continue;
	for (const unsigned K : Next[J]) {			for (const unsigned K : Next[J]) {
	if (K == 0)			if (K == 0)
	continue;			continue;
	add({{LowercaseIdentifier[I], LowercaseIdentifier[J],			Add({{LowercaseIdentifier[I], LowercaseIdentifier[J],
	LowercaseIdentifier[K]}});			LowercaseIdentifier[K]}});
	}			}
	}			}
	}			}
				// Emit short-query trigrams: FooBar -> f, fo, fb.
				if (!LowercaseIdentifier.empty())
				Add({LowercaseIdentifier[0]});
				if (LowercaseIdentifier.size() >= 2)
				Add({LowercaseIdentifier[0], LowercaseIdentifier[1]});
				for (size_t I = 1; I < LowercaseIdentifier.size(); ++I)
				if (Roles[I] == Head) {
				Add({LowercaseIdentifier[0], LowercaseIdentifier[I]});
				break;
				}

	std::vector<Token> Result;			return {UniqueTrigrams.begin(), UniqueTrigrams.end()};
	for (const auto &Trigram : UniqueTrigrams)
	Result.push_back(Trigram);

	return Result;
	}			}

	std::vector<Token> generateQueryTrigrams(llvm::StringRef Query) {			std::vector<Token> generateQueryTrigrams(llvm::StringRef Query) {
				std::string LowercaseQuery = Query.lower();
				if (Query.size() < 3) // short-query trigrams only
				return {Token(Token::Kind::Trigram, LowercaseQuery)};

	// Apply fuzzy matching text segmentation.			// Apply fuzzy matching text segmentation.
	std::vector<CharRole> Roles(Query.size());			std::vector<CharRole> Roles(Query.size());
	calculateRoles(Query, llvm::makeMutableArrayRef(Roles.data(), Query.size()));			calculateRoles(Query, llvm::makeMutableArrayRef(Roles.data(), Query.size()));

	// Additional pass is necessary to count valid identifier characters.
	// Depending on that, this function might return incomplete trigram.
	unsigned ValidSymbolsCount = 0;
	for (const auto Role : Roles)
	if (Role == Head || Role == Tail)
	++ValidSymbolsCount;

	std::string LowercaseQuery = Query.lower();

	DenseSet<Token> UniqueTrigrams;			DenseSet<Token> UniqueTrigrams;
				std::string Chars;
	// If the number of symbols which can form fuzzy matching trigram is not			for (unsigned I = 0; I < Query.size(); ++I) {
	// sufficient, generate a single incomplete trigram for query.
	if (ValidSymbolsCount < 3) {
	std::string Chars =
	LowercaseQuery.substr(0, std::min<size_t>(3UL, Query.size()));
	Chars.append(3 - Chars.size(), END_MARKER);
	UniqueTrigrams.insert(Token(Token::Kind::Trigram, Chars));
	} else {
	std::deque<char> Chars;
	for (size_t I = 0; I < LowercaseQuery.size(); ++I) {
	// If current symbol is delimiter, just skip it.
	if (Roles[I] != Head && Roles[I] != Tail)			if (Roles[I] != Head && Roles[I] != Tail)
	continue;			continue; // Skip delimiters.

	Chars.push_back(LowercaseQuery[I]);			Chars.push_back(LowercaseQuery[I]);

	if (Chars.size() > 3)			if (Chars.size() > 3)
	Chars.pop_front();			Chars.erase(Chars.begin());
				if (Chars.size() == 3)
	if (Chars.size() == 3) {			UniqueTrigrams.insert(Token(Token::Kind::Trigram, Chars));
	UniqueTrigrams.insert(
	Token(Token::Kind::Trigram, std::string(begin(Chars), end(Chars))));
	}
	}
	}			}

	std::vector<Token> Result;			return {UniqueTrigrams.begin(), UniqueTrigrams.end()};
	for (const auto &Trigram : UniqueTrigrams)
	Result.push_back(Trigram);

	return Result;
	}			}

	} // namespace dex			} // namespace dex
	} // namespace clangd			} // namespace clangd
	} // namespace clang			} // namespace clang
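
The new generateQueryTrigrams logic on the right-hand side can be approximated outside clangd as follows. This is a sketch under assumptions: plain strings replace Token, std::set replaces DenseSet, and separators are approximated as non-alphanumeric characters rather than taken from calculateRoles. The function name is hypothetical.

```cpp
#include <cctype>
#include <set>
#include <string>

std::set<std::string> queryTrigrams(const std::string &Query) {
  std::string Lower;
  for (unsigned char C : Query)
    Lower.push_back(static_cast<char>(std::tolower(C)));

  // Short queries produce a single token: the lowercased query itself,
  // separators included.
  if (Lower.size() < 3)
    return {Lower};

  // Otherwise slide a 3-character window over the non-separator characters.
  std::set<std::string> Result;
  std::string Window;
  for (char C : Lower) {
    if (!std::isalnum(static_cast<unsigned char>(C)))
      continue; // Skip separators.
    Window.push_back(C);
    if (Window.size() > 3)
      Window.erase(Window.begin());
    if (Window.size() == 3)
      Result.insert(Window);
  }
  return Result;
}
```

This mirrors the updated unit tests: "cl" maps to the single token "cl", "abc_def" to abc, bcd, cde, def, and "___" to an empty set because it is three characters long but contains no letters or digits.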

unittests/clangd/DexTests.cpp

[361 lines not shown]

testing::Matcher<std::vector<Token>>		testing::Matcher<std::vector<Token>>
trigramsAre(std::initializer_list<std::string> Trigrams) {		trigramsAre(std::initializer_list<std::string> Trigrams) {
return tokensAre(Trigrams, Token::Kind::Trigram);		return tokensAre(Trigrams, Token::Kind::Trigram);
}		}

TEST(DexTrigrams, IdentifierTrigrams) {		TEST(DexTrigrams, IdentifierTrigrams) {
EXPECT_THAT(generateIdentifierTrigrams("X86"),		EXPECT_THAT(generateIdentifierTrigrams("X86"),
trigramsAre({"x86", "x$$", "x8$"}));		trigramsAre({"x86", "x", "x8"}));

EXPECT_THAT(generateIdentifierTrigrams("nl"), trigramsAre({"nl$", "n$$"}));		EXPECT_THAT(generateIdentifierTrigrams("nl"), trigramsAre({"nl", "n"}));

EXPECT_THAT(generateIdentifierTrigrams("n"), trigramsAre({"n$$"}));		EXPECT_THAT(generateIdentifierTrigrams("n"), trigramsAre({"n"}));

EXPECT_THAT(generateIdentifierTrigrams("clangd"),		EXPECT_THAT(generateIdentifierTrigrams("clangd"),
trigramsAre({"c$$", "cl$", "cla", "lan", "ang", "ngd"}));		trigramsAre({"c", "cl", "cla", "lan", "ang", "ngd"}));

EXPECT_THAT(generateIdentifierTrigrams("abc_def"),		EXPECT_THAT(generateIdentifierTrigrams("abc_def"),
trigramsAre({"a$$", "abc", "abd", "ade", "bcd", "bde", "cde",		trigramsAre({"a", "ab", "ad", "abc", "abd", "ade", "bcd", "bde",
"def", "ab$", "ad$"}));		"cde", "def"}));

EXPECT_THAT(generateIdentifierTrigrams("a_b_c_d_e_"),		EXPECT_THAT(generateIdentifierTrigrams("a_b_c_d_e_"),
trigramsAre({"a$$", "a_$", "a_b", "abc", "abd", "acd", "ace",		trigramsAre({"a", "a_", "ab", "abc", "abd", "acd", "ace", "bcd",
"bcd", "bce", "bde", "cde", "ab$"}));		"bce", "bde", "cde"}));

EXPECT_THAT(generateIdentifierTrigrams("unique_ptr"),		EXPECT_THAT(generateIdentifierTrigrams("unique_ptr"),
trigramsAre({"u$$", "uni", "unp", "upt", "niq", "nip", "npt",		trigramsAre({"u", "un", "up", "uni", "unp", "upt", "niq", "nip",
"iqu", "iqp", "ipt", "que", "qup", "qpt", "uep",		"npt", "iqu", "iqp", "ipt", "que", "qup", "qpt",
"ept", "ptr", "un$", "up$"}));		"uep", "ept", "ptr"}));

EXPECT_THAT(		EXPECT_THAT(
generateIdentifierTrigrams("TUDecl"),		generateIdentifierTrigrams("TUDecl"),
trigramsAre({"t$$", "tud", "tde", "ude", "dec", "ecl", "tu$", "td$"}));		trigramsAre({"t", "tu", "td", "tud", "tde", "ude", "dec", "ecl"}));

EXPECT_THAT(generateIdentifierTrigrams("IsOK"),		EXPECT_THAT(generateIdentifierTrigrams("IsOK"),
trigramsAre({"i$$", "iso", "iok", "sok", "is$", "io$"}));		trigramsAre({"i", "is", "io", "iso", "iok", "sok"}));

		auto X = generateIdentifierTrigrams("abc_defGhij__klm");
		ioeric: nit: remove?
EXPECT_THAT(		EXPECT_THAT(
generateIdentifierTrigrams("abc_defGhij__klm"),		generateIdentifierTrigrams("abc_defGhij__klm"),
trigramsAre({"a$$", "abc", "abd", "abg", "ade", "adg", "adk", "agh",		trigramsAre({"a", "ab", "ad", "abc", "abd", "abg", "ade", "adg",
"agk", "bcd", "bcg", "bde", "bdg", "bdk", "bgh", "bgk",		"adk", "agh", "agk", "bcd", "bcg", "bde", "bdg", "bdk",
"cde", "cdg", "cdk", "cgh", "cgk", "def", "deg", "dek",		"bgh", "bgk", "cde", "cdg", "cdk", "cgh", "cgk", "def",
"dgh", "dgk", "dkl", "efg", "efk", "egh", "egk", "ekl",		"deg", "dek", "dgh", "dgk", "dkl", "efg", "efk", "egh",
"fgh", "fgk", "fkl", "ghi", "ghk", "gkl", "hij", "hik",		"egk", "ekl", "fgh", "fgk", "fkl", "ghi", "ghk", "gkl",
"hkl", "ijk", "ikl", "jkl", "klm", "ab$", "ad$"}));		"hij", "hik", "hkl", "ijk", "ikl", "jkl", "klm"}));
}		}

TEST(DexTrigrams, QueryTrigrams) {		TEST(DexTrigrams, QueryTrigrams) {
EXPECT_THAT(generateQueryTrigrams("c"), trigramsAre({"c$$"}));		EXPECT_THAT(generateQueryTrigrams("c"), trigramsAre({"c"}));
EXPECT_THAT(generateQueryTrigrams("cl"), trigramsAre({"cl$"}));		EXPECT_THAT(generateQueryTrigrams("cl"), trigramsAre({"cl"}));
EXPECT_THAT(generateQueryTrigrams("cla"), trigramsAre({"cla"}));		EXPECT_THAT(generateQueryTrigrams("cla"), trigramsAre({"cla"}));

EXPECT_THAT(generateQueryTrigrams("_"), trigramsAre({"_$$"}));		EXPECT_THAT(generateQueryTrigrams("_"), trigramsAre({"_"}));
EXPECT_THAT(generateQueryTrigrams("__"), trigramsAre({"__$"}));		EXPECT_THAT(generateQueryTrigrams("__"), trigramsAre({"__"}));
EXPECT_THAT(generateQueryTrigrams("___"), trigramsAre({"___"}));		EXPECT_THAT(generateQueryTrigrams("___"), trigramsAre({}));

EXPECT_THAT(generateQueryTrigrams("X86"), trigramsAre({"x86"}));		EXPECT_THAT(generateQueryTrigrams("X86"), trigramsAre({"x86"}));

EXPECT_THAT(generateQueryTrigrams("clangd"),		EXPECT_THAT(generateQueryTrigrams("clangd"),
trigramsAre({"cla", "lan", "ang", "ngd"}));		trigramsAre({"cla", "lan", "ang", "ngd"}));

EXPECT_THAT(generateQueryTrigrams("abc_def"),		EXPECT_THAT(generateQueryTrigrams("abc_def"),
trigramsAre({"abc", "bcd", "cde", "def"}));		trigramsAre({"abc", "bcd", "cde", "def"}));
[99 lines not shown]	auto I = Dex::build(
RefSlab(), URISchemes);		RefSlab(), URISchemes);
FuzzyFindRequest Req;		FuzzyFindRequest Req;
Req.Query = "lol";		Req.Query = "lol";
Req.Limit = 2;		Req.Limit = 2;
EXPECT_THAT(match(*I, Req),		EXPECT_THAT(match(*I, Req),
UnorderedElementsAre("LaughingOutLoud", "LittleOldLady"));		UnorderedElementsAre("LaughingOutLoud", "LittleOldLady"));
}		}

		// TODO(sammccall): enable after D52796 bugfix.
		#if 0
		TEST(DexTest, ShortQuery) {
		auto I =
		Dex::build(generateSymbols({"OneTwoThreeFour"}), RefSlab(), URISchemes);
		FuzzyFindRequest Req;
		bool Incomplete;

		EXPECT_THAT(match(*I, Req, &Incomplete), ElementsAre("OneTwoThreeFour"));
		EXPECT_FALSE(Incomplete) << "Empty string is not a short query";

		Req.Query = "t";
		EXPECT_THAT(match(*I, Req, &Incomplete), ElementsAre());
		EXPECT_TRUE(Incomplete) << "Short queries have different semantics";

		Req.Query = "tt";
		EXPECT_THAT(match(*I, Req, &Incomplete), ElementsAre());
		EXPECT_TRUE(Incomplete) << "Short queries have different semantics";

		Req.Query = "ttf";
		EXPECT_THAT(match(*I, Req, &Incomplete), ElementsAre("OneTwoThreeFour"));
		EXPECT_FALSE(Incomplete) << "3-char string is not a short query";
		}
		#endif

TEST(DexTest, MatchQualifiedNamesWithoutSpecificScope) {		TEST(DexTest, MatchQualifiedNamesWithoutSpecificScope) {
auto I = Dex::build(generateSymbols({"a::y1", "b::y2", "y3"}), RefSlab(),		auto I = Dex::build(generateSymbols({"a::y1", "b::y2", "y3"}), RefSlab(),
URISchemes);		URISchemes);
FuzzyFindRequest Req;		FuzzyFindRequest Req;
Req.Query = "y";		Req.Query = "y";
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1", "b::y2", "y3"));		EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1", "b::y2", "y3"));
}		}

[133 lines not shown]