This is an archive of the discontinued LLVM Phabricator instance.

Differential D40060

[clangd] Fuzzy match scorer
Closed, Public

Authored by sammccall on Nov 14 2017, 5:37 PM.

Details

Reviewers

ilya-biryukov

Commits

rG87496417ff42: [clangd] Fuzzy match scorer
rCTE319557: [clangd] Fuzzy match scorer
rL319557: [clangd] Fuzzy match scorer

Summary

This will be used for rescoring code completion results based on partial identifiers.

Short-term use:

We want to limit the number of code completion results returned, to improve the performance of global completion. The scorer will be used to rerank the results to return when the user has applied a filter.

Long-term use cases:

ranking of completion results from an in-memory index
merging of completion results from multiple sources (merging usually works best when done at the component-score level; rescoring the fuzzy-match quality avoids different backends needing to have comparable scores)

Diff Detail

Repository: rL LLVM

Event Timeline

sammccall created this revision. Nov 14 2017, 5:37 PM

Herald added a subscriber: mgorny. Nov 14 2017, 5:37 PM

clang-format

Harbormaster completed remote builds in B12190: Diff 122955. Nov 14 2017, 5:39 PM

Harbormaster completed remote builds in B12191: Diff 122956.

Trim memory usage and add comments.

ioeric added a subscriber: ioeric. Nov 15 2017, 4:25 AM

klimek added a subscriber: klimek. Nov 23 2017, 1:41 AM

klimek added inline comments.

clangd/FuzzyMatch.cpp
69 ↗	(On Diff #122986)	Why .5?
88 ↗	(On Diff #122986)	Why 2 * NPat?
92 ↗	(On Diff #122986)	Do you mean "part of the Head or Tail"? Also, explain that these are the CharRoles. A reader reads this first, and will search for what CharRole means in the code later. CharRole is defined in a different file, without comments, so figuring out how that all relates is super hard :)
101–103 ↗	(On Diff #122986)	I think this is the only place where the roles are hinted at. Explain what roles mean and what we need them for.
108 ↗	(On Diff #122986)	I'd spell out the numbers, as they are important (here and for CharRole).
110 ↗	(On Diff #122986)	Finding bugs in these will be hard :)
120 ↗	(On Diff #122986)	Can you expand in the comment why this works for utf-8?
137 ↗	(On Diff #122986)	The body of this needs more comments on what it does. I can slowly figure it out by doing bit math, but it should be spelled out what's expected to be in each value at each point.
207 ↗	(On Diff #122986)	Perhaps add assert(LPat[P] == LWord[W]);
212 ↗	(On Diff #122986)	Why does P == W imply that?
215 ↗	(On Diff #122986)	This is the first time the term "asserted word break" shows up, perhaps explain this when explaining the roles.
218 ↗	(On Diff #122986)	The previous what didn't match?
clangd/FuzzyMatch.h
31 ↗	(On Diff #122986)	Document that patterns larger than MaxPat will be silently cut.
39 ↗	(On Diff #122986)	I find most of the abbreviations here non-intuitive, and thus needing comments (Pat I get is for pattern :) N - what does it mean? Number of characters in the pattern? I'd use Length instead. LPat and LWord (no idea what L could stand for).
50 ↗	(On Diff #122986)	I'd use a StringRef instead, and call the storage *Storage or something.
60 ↗	(On Diff #122986)	Comment that this is not actually used inside the algorithm, just for debugging.

Addressing review comments and generally improving comments.

Harbormaster completed remote builds in B12431: Diff 124096. Nov 23 2017, 8:55 AM

Thanks for the review, and sorry for the subtlety of the code and sparse comments.
It should be a little better now, please let me know which parts aren't clear enough.

clangd/FuzzyMatch.cpp
69 ↗	(On Diff #122986)	The .5 and the 2 are the same thing. Extracted a constant with a comment.
92 ↗	(On Diff #122986)	Rewrote this section and added more comments. CharRole is defined here now. Each character in a segment that isn't the Head is the Tail. It's a bit of a stretch, but it's short and evocative and (now) explained with examples.
110 ↗	(On Diff #122986)	Ack :-(
120 ↗	(On Diff #122986)	Done. It doesn't really "work", so much as we just give up...
212 ↗	(On Diff #122986)	Every pattern character must match in order, so a match with P < W is impossible, and P == W means the match is perfect so far. (Also explained in the comment)
215 ↗	(On Diff #122986)	Rephrased the comment here to use the familiar terminology.
clangd/FuzzyMatch.h
39 ↗	(On Diff #122986)	Comments added throughout. Re: "N - what does it mean? Number of characters in the pattern? I'd use Length instead." `WordLength` is too long for something so common, I think; these each have >20 refs. Changed `NWord` -> `WordN`, which I think reads better: `Word` and `WordN` form a nice pair. (N rather than L for length because of confusion with Lower.) Changed `LWord` to `LowWord` etc.
50 ↗	(On Diff #122986)	What's the goal here? I have a couple of objections to this: (1) if you actually use StringRef[] to access the data, now you've got a gratuitous indirection everywhere; (2) for `LPat`/`LWord` too? Now we have more members than in the first place, and two ways to write each bounds check. If the intent is to clean up the places where I construct `StringRef(Word, NWord)` explicitly, adding `StringRef word()` would certainly make sense.
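
The Head/Tail/Separator roles discussed in the replies above can be sketched without the packed lookup tables. This is a readable approximation rather than the patch's implementation: `Type`, `typeOf`, and `roles` are hypothetical names, and the rules are written out as plain conditionals that are intended to agree with the role table in the diff.

```cpp
#include <string>

// Hypothetical names throughout; the patch uses packed lookup tables instead.
enum class Type { Empty, Lower, Upper, Punct };

Type typeOf(unsigned char C) {
  if (C >= 'A' && C <= 'Z')
    return Type::Upper;
  if ((C >= 'a' && C <= 'z') || (C >= '0' && C <= '9') || C >= 128)
    return Type::Lower; // Digits and non-ASCII bytes count as lowercase.
  if (C < 32 || C == 127)
    return Type::Empty; // Control characters.
  return Type::Punct;
}

// Returns one marker per character:
// '+' Head, '-' Tail, ' ' Separator, '?' Unknown.
std::string roles(const std::string &Text) {
  std::string Out;
  for (size_t I = 0; I < Text.size(); ++I) {
    Type Prev = I ? typeOf(Text[I - 1]) : Type::Empty;
    Type Curr = typeOf(Text[I]);
    Type Next = I + 1 < Text.size() ? typeOf(Text[I + 1]) : Type::Empty;
    bool AfterLetter = Prev == Type::Lower || Prev == Type::Upper;
    char R = '?';
    if (Curr == Type::Punct)
      R = ' ';
    else if (Curr == Type::Lower)
      R = AfterLetter ? '-' : '+'; // Lowercase only starts a segment at a gap.
    else if (Curr == Type::Upper)
      // Uppercase starts a segment, except inside an acronym like H(T)TP.
      R = (Prev == Type::Upper && Next != Type::Lower) ? '-' : '+';
    Out.push_back(R);
  }
  return Out;
}
```

For `XMLHttpRequest_Async` this yields `+--+---+------ +----`: the `H` of `Http` is a Head because a lowercase letter follows it, while `M` and `L` are Tails of the `XML` acronym.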

inspirer added a subscriber: inspirer. Nov 27 2017, 9:18 AM

inspirer added inline comments.

clangd/FuzzyMatch.cpp
254 ↗	(On Diff #124096)	You need a third boolean dimension in your DP table for this condition to work - "matches". Consider matching "Abde" against "AbdDe". The result should be [Ab]d[De] and not [Abd]D[e]. While evaluating Abd against AbdD, you will have to choose between two ways to represent the match and no matter what you choose, scoring in this line will not know whether your previous char matched, since you merged two branches and kept only one of them. This scoring works OK-ish since you check "if (Diag >= Left)" above and so your Matched table is full of trues, but your matches will gravitate towards the ends of the candidate string if you decide to show them in the UI.

ilya-biryukov added inline comments. Nov 29 2017, 3:08 AM

clangd/FuzzyMatch.cpp
118 ↗	(On Diff #124096)	I'm not sure if we care, but maybe we should treat `+`, `-` and other symbols that could be in operator names (e.g., `operator +`) differently for C++. Could also make sense for other languages with overloaded operators.
244 ↗	(On Diff #124096)	Maybe use `P == 0` instead? It's equivalent, but a bit easier to read if you think of `P` as an offset. Totally subjective, though, it's fine to have it either way.
245 ↗	(On Diff #124096)	Does it mean I will get no matches in the following situation? `Items = [printf, scanf]`, `Pattern = f`. It may be a bit confusing, since I do have a match, even though it is terrible and it's OK to put those items very low in the list. A more realistic example: `Items = [fprintf, fscanf]`, `Pattern = print`. Would `fprintf` match in that case? I think it should. Another important one: `Items = [istream, ostream]`, `Pattern = stream`.
clangd/FuzzyMatch.h
31 ↗	(On Diff #124096)	Maybe move the truncating logic into the clients? The users of this code are certainly better suited to report warnings/reject requests that are too large.
53 ↗	(On Diff #124096)	Maybe we could split the data we store into two sections: (1) pattern-specific data, initialized on construction and never changed later; (2) per-match data, initialized per `match()` call. Otherwise it is somewhat hard to certify whether everything is being initialized properly.

Added more VSCode tests, and made the tests assert matched characters. This uncovered algorithm problems:
The cache now includes "did previous character match" in the key (scoring depends on this, so we gave incorrect results).
Added a penalty for non-consecutive matches.
First character matching inside a segment is downgraded from a ban to a penalty. This allows [stream] to match "istream".
Don't award case bonuses if the query is all lowercase. This helps matches like [ccm] -> [c]ode[C]ompletec[m] compete with [c]odeComplete[cm].

Thanks @ilya-biryukov, @inspirer, @klimek for the helpful comments!

I've addressed what I hope are the most important ones and added more rigorous testing.
Sorry for the large delta, the most invasive change was of course adding the extra dimension to the scoring table. (Which fixed a bunch of problems)
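
To illustrate why the extra dimension matters, here is a heavily simplified sketch of a scoring table keyed by (pattern prefix, word prefix, whether the last word character was matched). It is an assumption-laden toy, not the patch's scoring: `bestScore` is a hypothetical name, and the only heuristics are +1 per match, +1 more for a consecutive match, and a cost of 2 for splitting a run of matches.

```cpp
#include <algorithm>
#include <array>
#include <optional>
#include <string>
#include <vector>

constexpr int Awful = -(1 << 13); // "Negative infinity" for unreachable states.
enum Action { Miss = 0, Match = 1 };

static char lower(char C) { return C >= 'A' && C <= 'Z' ? C + ('a' - 'A') : C; }

// Toy scorer (hypothetical): assumes short inputs so Awful never climbs
// close to the range of real scores.
std::optional<int> bestScore(const std::string &Pat, const std::string &Word) {
  const int PN = Pat.size(), WN = Word.size();
  const std::array<int, 2> Unreachable = {Awful, Awful};
  // S[P][W][Last]: best score matching Pat[..P] against Word[..W], where Last
  // records whether the final consumed word character was matched.
  std::vector<std::vector<std::array<int, 2>>> S(
      PN + 1, std::vector<std::array<int, 2>>(WN + 1, Unreachable));
  S[0][0][Miss] = 0;
  for (int W = 0; W < WN; ++W)
    S[0][W + 1][Miss] = S[0][W][Miss]; // Skipping before any match is free.
  for (int P = 0; P < PN; ++P) {
    // Skipping right after a match splits the run; trailing skips are free.
    const int SplitCost = (P + 1 < PN) ? 2 : 0;
    for (int W = P; W < WN; ++W) {
      // Move right: skip Word[W].
      S[P + 1][W + 1][Miss] =
          std::max(S[P + 1][W][Miss], S[P + 1][W][Match] - SplitCost);
      // Move diagonally: match Pat[P] with Word[W] if the characters agree.
      if (lower(Pat[P]) == lower(Word[W]))
        S[P + 1][W + 1][Match] =
            1 + std::max(S[P][W][Miss], S[P][W][Match] + 1);
    }
  }
  const int Best = std::max(S[PN][WN][Miss], S[PN][WN][Match]);
  if (Best < Awful / 2)
    return std::nullopt; // Pattern is not a subsequence of Word.
  return Best;
}
```

Matching `ab` against `aab` shows the point: after consuming `aa` with one pattern character matched, the table keeps both the state where the second `a` was matched (score 1) and the state where it was skipped (score -1). Only the first can earn the consecutive-match bonus on `b`, so a table that merged the two states could not score the next step correctly.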

clangd/FuzzyMatch.cpp
118 ↗	(On Diff #124096)	You might be right, but in the absence of concrete problems I think treating them as punctuation is actually the most conservative thing to do. E.g. matching [op=] against "operator=" gets big penalties if we treat '=' as Lower, and treating it as Upper seems likely to have other weird effects... Punctuation/separators are treated pretty neutrally.
245 ↗	(On Diff #124096)	Done. VSCode will filter these out, but I agree these are important and don't seem to cause problems.
254 ↗	(On Diff #124096)	Thank you for this! Fixed. The naming around Scores/ScoreInfo is a bit clumsy, happy to take suggestions :-( I've also made all our tests assert the exact characters matched. We don't have an API or need this feature, but it makes the tests detect a lot more misbehavior that's hard to capture otherwise.
clangd/FuzzyMatch.h
53 ↗	(On Diff #124096)	This hides the parallels between the Pattern and Word data, I'm not sure I like it better overall. I've added a comment describing this split, reordered some variables, and renamed IsSubstring to WordContainsPattern, which I think clarifies this a bit. WDYT?

LGTM.

clangd/FuzzyMatch.h
53 ↗	(On Diff #124096)	I'd prefer grouping the fields by their lifetime in that case, because it makes certifying that everything was properly initialized easier. Which is especially a big deal when changing code to avoid silly initialization-related bugs. Grouping by meaning also makes lots of sense, of course, but logical relations are only hard to grasp when reading the code and don't usually cause subtle bugs when rewriting the code. And proper comments allow reintroducing those logical parallels. But that could be put down to my personal preference, so feel free to leave the code as is. Just wanted to clarify my point a bit more.

This revision is now accepted and ready to land. Dec 1 2017, 8:07 AM

I'd broken the scoring scale with the last few tweaks:

The harsh pattern-split penalty was driving too many decent matches to 0 score
The case-insensitive change resulted in some perfect prefix matches not getting perfect scores

Added tweaks to address these. Match quality is now 0-3, with default being 1.
Happy to make followup changes, but this seems unlikely to be controversial :-)
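
For orientation, the relationship between the 0-3 per-character scale and the [0,1] result of `match()` can be sketched as below. This mirrors the clamping and scaling visible in the committed `match()` but is a standalone approximation: `normalize` is a hypothetical name, and it assumes a non-empty pattern.

```cpp
#include <algorithm>

constexpr int PerfectBonus = 3; // Best possible score per pattern character.

// Maps a raw cumulative score onto [0, 1] for a pattern of PatN characters.
// Assumes PatN > 0; the committed code handles the empty pattern separately.
float normalize(int Best, int PatN) {
  const int Max = PerfectBonus * PatN;
  return float(std::clamp(Best, 0, Max)) / float(Max);
}
```

Under this scaling a three-character pattern that earns the full 9 points maps to 1, a negative raw score is clamped to 0, and a raw score of 6 for a four-character pattern maps to 0.5.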

clangd/FuzzyMatch.h
53 ↗	(On Diff #124096)	Makes sense. I've split the fields as you suggest, it also reads well.

Closed by commit rL319557: [clangd] Fuzzy match scorer (authored by sammccall). Dec 1 2017, 9:08 AM

This revision was automatically updated to reflect the committed changes.

sammccall marked an inline comment as done.

Revision Contents

Path

Size

clang-tools-extra/

trunk/

clangd/

CMakeLists.txt

1 line

FuzzyMatch.h

84 lines

FuzzyMatch.cpp

373 lines

unittests/

clangd/

CMakeLists.txt

1 line

FuzzyMatchTests.cpp

252 lines

Diff 125159

clang-tools-extra/trunk/clangd/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
	Support			Support
	)			)

	add_clang_library(clangDaemon			add_clang_library(clangDaemon
	ClangdLSPServer.cpp			ClangdLSPServer.cpp
	ClangdServer.cpp			ClangdServer.cpp
	ClangdUnit.cpp			ClangdUnit.cpp
	ClangdUnitStore.cpp			ClangdUnitStore.cpp
	DraftStore.cpp			DraftStore.cpp
				FuzzyMatch.cpp
	GlobalCompilationDatabase.cpp			GlobalCompilationDatabase.cpp
	JSONExpr.cpp			JSONExpr.cpp
	JSONRPCDispatcher.cpp			JSONRPCDispatcher.cpp
	Logger.cpp			Logger.cpp
	Protocol.cpp			Protocol.cpp
	ProtocolHandlers.cpp			ProtocolHandlers.cpp
	Trace.cpp			Trace.cpp

	(19 more unchanged lines not shown)

clang-tools-extra/trunk/clangd/FuzzyMatch.h

				//===--- FuzzyMatch.h - Approximate identifier matching ---------*- C++-*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements fuzzy-matching of strings against identifiers.
				// It indicates both the existence and quality of a match:
				// 'eb' matches both 'emplace_back' and 'embed', the former has a better score.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_FUZZYMATCH_H
				#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_FUZZYMATCH_H

				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/SmallString.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/raw_ostream.h"

				namespace clang {
				namespace clangd {

				// A matcher capable of matching and scoring strings against a single pattern.
				// It's optimized for matching against many strings - match() does not allocate.
				class FuzzyMatcher {
				public:
				// Characters beyond MaxPat are ignored.
				FuzzyMatcher(llvm::StringRef Pattern);

				// If Word matches the pattern, return a score in [0,1] (higher is better).
				// Characters beyond MaxWord are ignored.
				llvm::Optional<float> match(llvm::StringRef Word);

				// Dump internal state from the last match() to the stream, for debugging.
				// Returns the pattern with [] around matched characters, e.g.
				// [u_p] + "unique_ptr" --> "[u]nique[_p]tr"
				llvm::SmallString<256> dumpLast(llvm::raw_ostream &) const;

				private:
				// We truncate the pattern and the word to bound the cost of matching.
				constexpr static int MaxPat = 63, MaxWord = 127;
				enum CharRole : char; // For segmentation.
				enum CharType : char; // For segmentation.
				enum Action { Miss = 0, Match = 1 };

				bool init(llvm::StringRef Word);
				void buildGraph();
				void calculateRoles(const char *Text, CharRole *Out, int N);
				int skipPenalty(int W, Action Last);
				int matchBonus(int P, int W, Action Last);

				// Pattern data is initialized by the constructor, then constant.
				char Pat[MaxPat]; // Pattern data
				int PatN; // Length
				char LowPat[MaxPat]; // Pattern in lowercase
				CharRole PatRole[MaxPat]; // Pattern segmentation info
				bool CaseSensitive; // Case-sensitive match if pattern has uppercase
				float ScoreScale; // Normalizes scores for the pattern length.

				// Word data is initialized on each call to match(), mostly by init().
				char Word[MaxWord]; // Word data
				int WordN; // Length
				char LowWord[MaxWord]; // Word in lowercase
				CharRole WordRole[MaxWord]; // Word segmentation info
				bool WordContainsPattern; // Simple substring check

				// Cumulative best-match score table.
				// Boundary conditions are filled in by the constructor.
				// The rest is repopulated for each match(), by buildGraph().
				struct ScoreInfo {
				signed int Score : 15;
				Action Prev : 1;
				};
				ScoreInfo Scores[MaxPat + 1][MaxWord + 1][/* Last Action */ 2];
				};

				} // namespace clangd
				} // namespace clang

				#endif

clang-tools-extra/trunk/clangd/FuzzyMatch.cpp

				//===--- FuzzyMatch.cpp - Approximate identifier matching -------*- C++-*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// To check for a match between a Pattern ('u_p') and a Word ('unique_ptr'),
				// we consider the possible partial match states:
				//
				//         u n i q u e _ p t r
				//       +---------------------
				//       |A . . . . . . . . . .
				//      u|
				//       |. . . . . . . . . . .
				//      _|
				//       |. . . . . . . O . . .
				//      p|
				//       |. . . . . . . . . . B
				//
				// Each dot represents some prefix of the pattern being matched against some
				// prefix of the word.
				// - A is the initial state: '' matched against ''
				// - O is an intermediate state: 'u_' matched against 'unique_'
				// - B is the target state: 'u_p' matched against 'unique_ptr'
				//
				// We aim to find the best path from A->B.
				// - Moving right (consuming a word character)
				//   Always legal: not all word characters must match.
				// - Moving diagonally (consuming both a word and pattern character)
				//   Legal if the characters match.
				// - Moving down (consuming a pattern character)
				//   Never legal: all pattern characters must match something.
				//
				// The scoring is based on heuristics:
				// - when matching a character, apply a bonus or penalty depending on the
				// match quality (does case match, do word segments align, etc)
				// - when skipping a character, apply a penalty if it hurts the match
				// (it starts a word segment, or splits the matched region, etc)
				//
				// These heuristics require the ability to "look backward" one character, to
				// see whether it was matched or not. Therefore the dynamic-programming matrix
				// has an extra dimension (last character matched).
				// Each entry also has an additional flag indicating whether the last-but-one
				// character matched, which is needed to trace back through the scoring table
				// and reconstruct the match.
				//
				// We treat strings as byte-sequences, so only ASCII has first-class support.
				//
				// This algorithm was inspired by VS code's client-side filtering, and aims
				// to be mostly-compatible.
				//
				//===----------------------------------------------------------------------===//

				#include "FuzzyMatch.h"
				#include "llvm/ADT/Optional.h"
				#include "llvm/Support/Format.h"

				using namespace llvm;
				using namespace clang::clangd;

				const int FuzzyMatcher::MaxPat;
				const int FuzzyMatcher::MaxWord;

				static char lower(char C) { return C >= 'A' && C <= 'Z' ? C + ('a' - 'A') : C; }
				// A "negative infinity" score that won't overflow.
				// We use this to mark unreachable states and forbidden solutions.
				// Score field is 15 bits wide, min value is -2^14, we use half of that.
				static constexpr int AwfulScore = -(1 << 13);
				static bool isAwful(int S) { return S < AwfulScore / 2; }
				static constexpr int PerfectBonus = 3; // Perfect per-pattern-char score.

				FuzzyMatcher::FuzzyMatcher(StringRef Pattern)
				: PatN(std::min<int>(MaxPat, Pattern.size())), CaseSensitive(false),
				ScoreScale(float{1} / (PerfectBonus * PatN)), WordN(0) {
				memcpy(Pat, Pattern.data(), PatN);
				for (int I = 0; I < PatN; ++I) {
				LowPat[I] = lower(Pat[I]);
				CaseSensitive |= LowPat[I] != Pat[I];
				}
				Scores[0][0][Miss] = {0, Miss};
				Scores[0][0][Match] = {AwfulScore, Miss};
				for (int P = 0; P <= PatN; ++P)
				for (int W = 0; W < P; ++W)
				for (Action A : {Miss, Match})
				Scores[P][W][A] = {AwfulScore, Miss};
				calculateRoles(Pat, PatRole, PatN);
				}

				Optional<float> FuzzyMatcher::match(StringRef Word) {
				if (!PatN)
				return 1;
				if (!(WordContainsPattern = init(Word)))
				return None;
				buildGraph();
				auto Best = std::max(Scores[PatN][WordN][Miss].Score,
				Scores[PatN][WordN][Match].Score);
				if (isAwful(Best))
				return None;
				return ScoreScale * std::min(PerfectBonus * PatN, std::max<int>(0, Best));
				}

				// Segmentation of words and patterns.
				// A name like "fooBar_baz" consists of several parts foo, bar, baz.
				// Aligning segmentation of word and pattern improves the fuzzy-match.
				// For example: [lol] matches "LaughingOutLoud" better than "LionPopulation"
				//
				// First we classify each character into types (uppercase, lowercase, etc).
				// Then we look at the sequence: e.g. [upper, lower] is the start of a segment.

				// We only distinguish the types of characters that affect segmentation.
				// It's not obvious how to segment digits, we treat them as lowercase letters.
				// As we don't decode UTF-8, we treat bytes over 127 as lowercase too.
				// This means we require exact (case-sensitive) match.
				enum FuzzyMatcher::CharType : char {
				Empty = 0, // Before-the-start and after-the-end (and control chars).
				Lower = 1, // Lowercase letters, digits, and non-ASCII bytes.
				Upper = 2, // Uppercase letters.
				Punctuation = 3, // ASCII punctuation (including Space)
				};

				// We get CharTypes from a lookup table. Each is 2 bits, 4 fit in each byte.
				// The top 6 bits of the char select the byte, the bottom 2 select the offset.
				// e.g. 'q' = 011100 01 = byte 28 (55), bits 3-2 (01) -> Lower.
				constexpr static uint8_t CharTypes[] = {
				0x00, 0x00, 0x00, 0x00, // Control characters
				0x00, 0x00, 0x00, 0x00, // Control characters
				0xff, 0xff, 0xff, 0xff, // Punctuation
				0x55, 0x55, 0xf5, 0xff, // Numbers->Lower, more Punctuation.
				0xab, 0xaa, 0xaa, 0xaa, // @ and A-O
				0xaa, 0xaa, 0xea, 0xff, // P-Z, more Punctuation.
				0x57, 0x55, 0x55, 0x55, // ` and a-o
				0x55, 0x55, 0xd5, 0x3f, // p-z, Punctuation, DEL.
				0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, // Bytes over 127 -> Lower.
				0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, // (probably UTF-8).
				0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55,
				0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55,
				};

				// Each character's Role is the Head or Tail of a segment, or a Separator.
				// e.g. XMLHttpRequest_Async
				//      +--+---+------ +----
				//      ^Head   ^Tail ^Separator
				enum FuzzyMatcher::CharRole : char {
				Unknown = 0, // Stray control characters or impossible states.
				Tail = 1, // Part of a word segment, but not the first character.
				Head = 2, // The first character of a word segment.
				Separator = 3, // Punctuation characters that separate word segments.
				};

				// The Role can be determined from the Type of a character and its neighbors:
				//
				//   Example  | Chars | Type | Role
				//   ---------+--------------+-----
				//   F(o)oBar | Foo   | Ull  | Tail
				//   Foo(B)ar | oBa   | lUl  | Head
				//   (f)oo    | ^fo   | Ell  | Head
				//   H(T)TP   | HTT   | UUU  | Tail
				//
				// Our lookup table maps a 6 bit key (Prev, Curr, Next) to a 2-bit Role.
				// A byte packs 4 Roles. (Prev, Curr) selects a byte, Next selects the offset.
				// e.g. Lower, Upper, Lower -> 01 10 01 -> byte 6 (aa), bits 3-2 (10) -> Head.
				constexpr static uint8_t CharRoles[] = {
				// clang-format off
				//         Curr= Empty Lower Upper Separ
				/* Prev=Empty */ 0x00, 0xaa, 0xaa, 0xff, // At start, Lower|Upper->Head
				/* Prev=Lower */ 0x00, 0x55, 0xaa, 0xff, // In word, Upper->Head;Lower->Tail
				/* Prev=Upper */ 0x00, 0x55, 0x59, 0xff, // Ditto, but U(U)U->Tail
				/* Prev=Separ */ 0x00, 0xaa, 0xaa, 0xff, // After separator, like at start
				// clang-format on
				};

				template <typename T> static T packedLookup(const uint8_t *Data, int I) {
				return static_cast<T>((Data[I >> 2] >> ((I & 3) * 2)) & 3);
				}
				void FuzzyMatcher::calculateRoles(const char *Text, CharRole *Out, int N) {
				// Types holds a sliding window of (Prev, Curr, Next) types.
				// Initial value is (Empty, Empty, type of Text[0]).
				int Types = packedLookup<CharType>(CharTypes, Text[0]);
				// Rotate slides in the type of the next character.
				auto Rotate = [&](CharType T) { Types = ((Types << 2) | T) & 0x3f; };
				for (int I = 0; I < N - 1; ++I) {
				// For each character, rotate in the next, and look up the role.
				Rotate(packedLookup<CharType>(CharTypes, Text[I + 1]));
				*Out++ = packedLookup<CharRole>(CharRoles, Types);
				}
				// For the last character, the "next character" is Empty.
				Rotate(Empty);
				*Out++ = packedLookup<CharRole>(CharRoles, Types);
				}

				// Sets up the data structures matching Word.
				// Returns false if we can cheaply determine that no match is possible.
				bool FuzzyMatcher::init(StringRef NewWord) {
				WordN = std::min<int>(MaxWord, NewWord.size());
				if (PatN > WordN)
				return false;
				memcpy(Word, NewWord.data(), WordN);
				for (int I = 0; I < WordN; ++I)
				LowWord[I] = lower(Word[I]);

				// Cheap subsequence check.
				for (int W = 0, P = 0; P != PatN; ++W) {
				if (W == WordN)
				return false;
				if (LowWord[W] == LowPat[P])
				++P;
				}

				calculateRoles(Word, WordRole, WordN);
				return true;
				}

				// The forwards pass finds the mappings of Pattern onto Word.
				// Score = best score achieved matching Word[..W] against Pat[..P].
				// Unlike other tables, indices range from 0 to N inclusive
				// Matched = whether we chose to match Word[W] with Pat[P] or not.
				//
				// Points are mostly assigned to matched characters, with 1 being a good score
				// and 3 being a great one. So we treat the score range as [0, 3 * PatN].
				// This range is not strict: we can apply larger bonuses/penalties, or penalize
				// non-matched characters.
				void FuzzyMatcher::buildGraph() {
				for (int W = 0; W < WordN; ++W) {
				Scores[0][W + 1][Miss] = {Scores[0][W][Miss].Score - skipPenalty(W, Miss),
				Miss};
				Scores[0][W + 1][Match] = {AwfulScore, Miss};
				}
				for (int P = 0; P < PatN; ++P) {
				for (int W = P; W < WordN; ++W) {
				auto &Score = Scores[P + 1][W + 1], &PreMiss = Scores[P + 1][W];

				auto MatchMissScore = PreMiss[Match].Score;
				auto MissMissScore = PreMiss[Miss].Score;
				if (P < PatN - 1) { // Skipping trailing characters is always free.
				MatchMissScore -= skipPenalty(W, Match);
				MissMissScore -= skipPenalty(W, Miss);
				}
				Score[Miss] = (MatchMissScore > MissMissScore)
				? ScoreInfo{MatchMissScore, Match}
				: ScoreInfo{MissMissScore, Miss};

				if (LowPat[P] != LowWord[W]) { // No match possible.
				Score[Match] = {AwfulScore, Miss};
				} else {
				auto &PreMatch = Scores[P][W];
				auto MatchMatchScore = PreMatch[Match].Score + matchBonus(P, W, Match);
				auto MissMatchScore = PreMatch[Miss].Score + matchBonus(P, W, Miss);
				Score[Match] = (MatchMatchScore > MissMatchScore)
				? ScoreInfo{MatchMatchScore, Match}
				: ScoreInfo{MissMatchScore, Miss};
				}
				}
				}
				}

				int FuzzyMatcher::skipPenalty(int W, Action Last) {
				int S = 0;
				if (WordRole[W] == Head) // Skipping a segment.
				S += 1;
				if (Last == Match) // Non-consecutive match.
				S += 2; // We'd rather skip a segment than split our match.
				return S;
				}

				int FuzzyMatcher::matchBonus(int P, int W, Action Last) {
				assert(LowPat[P] == LowWord[W]);
				int S = 1;
				// Bonus: pattern so far is a (case-insensitive) prefix of the word.
				if (P == W) // We can't skip pattern characters, so we must have matched all.
				++S;
				// Bonus: case matches, or a Head in the pattern aligns with one in the word.
				if ((Pat[P] == Word[W] && (CaseSensitive || P == W)) ||
				(PatRole[P] == Head && WordRole[W] == Head))
				++S;
				// Penalty: matching inside a segment (and previous char wasn't matched).
				if (WordRole[W] == Tail && P && Last == Miss)
				S -= 3;
				// Penalty: a Head in the pattern matches in the middle of a word segment.
				if (PatRole[P] == Head && WordRole[W] == Tail)
				--S;
				// Penalty: matching the first pattern character in the middle of a segment.
				if (P == 0 && WordRole[W] == Tail)
				S -= 4;
				assert(S <= PerfectBonus);
				return S;
				}

				llvm::SmallString<256> FuzzyMatcher::dumpLast(llvm::raw_ostream &OS) const {
				llvm::SmallString<256> Result;
				OS << "=== Match \"" << StringRef(Word, WordN) << "\" against ["
				<< StringRef(Pat, PatN) << "] ===\n";
				if (!WordContainsPattern) {
				OS << "Substring check failed.\n";
				return Result;
				} else if (isAwful(std::max(Scores[PatN][WordN][Match].Score,
				Scores[PatN][WordN][Miss].Score))) {
				OS << "Substring check passed, but all matches are forbidden\n";
				}
				if (!CaseSensitive)
				OS << "Lowercase query, so scoring ignores case\n";

				// Traverse Matched table backwards to reconstruct the Pattern/Word mapping.
				// The Score table has cumulative scores, subtracting along this path gives
				// us the per-letter scores.
				Action Last =
				(Scores[PatN][WordN][Match].Score > Scores[PatN][WordN][Miss].Score)
				? Match
				: Miss;
				int S[MaxWord];
				Action A[MaxWord];
				for (int W = WordN - 1, P = PatN - 1; W >= 0; --W) {
				A[W] = Last;
				const auto &Cell = Scores[P + 1][W + 1][Last];
				if (Last == Match)
				--P;
				const auto &Prev = Scores[P + 1][W][Cell.Prev];
				S[W] = Cell.Score - Prev.Score;
				Last = Cell.Prev;
				}
				for (int I = 0; I < WordN; ++I) {
				if (A[I] == Match && (I == 0 || A[I - 1] == Miss))
				Result.push_back('[');
				if (A[I] == Miss && I > 0 && A[I - 1] == Match)
				Result.push_back(']');
				Result.push_back(Word[I]);
				}
				if (A[WordN - 1] == Match)
				Result.push_back(']');

				for (char C : StringRef(Word, WordN))
				OS << " " << C << " ";
				OS << "\n";
				for (int I = 0, J = 0; I < WordN; I++)
				OS << " " << (A[I] == Match ? Pat[J++] : ' ') << " ";
				OS << "\n";
				for (int I = 0; I < WordN; I++)
				OS << format("%2d ", S[I]);
				OS << "\n";

				OS << "\nSegmentation:";
				OS << "\n'" << StringRef(Word, WordN) << "'\n ";
				for (int I = 0; I < WordN; ++I)
				OS << "?-+ "[static_cast<int>(WordRole[I])];
				OS << "\n[" << StringRef(Pat, PatN) << "]\n ";
				for (int I = 0; I < PatN; ++I)
				OS << "?-+ "[static_cast<int>(PatRole[I])];
				OS << "\n";

				OS << "\nScoring table (last-Miss, last-Match):\n";
				OS << " | ";
				for (char C : StringRef(Word, WordN))
				OS << " " << C << " ";
				OS << "\n";
				OS << "-+----" << std::string(WordN * 4, '-') << "\n";
				for (int I = 0; I <= PatN; ++I) {
				for (Action A : {Miss, Match}) {
				OS << ((I && A == Miss) ? Pat[I - 1] : ' ') << "|";
				for (int J = 0; J <= WordN; ++J) {
				if (!isAwful(Scores[I][J][A].Score))
				OS << format("%3d%c", Scores[I][J][A].Score,
				Scores[I][J][A].Prev == Match ? '*' : ' ');
				else
				OS << " ";
				}
				OS << "\n";
				}
				}

				return Result;
				}
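
The packed 2-bit lookup used for `CharTypes` above can be tried in isolation. The sketch below copies the table from the diff and wraps the lookup in a hypothetical helper, `typeOf`; taking an `unsigned char` is a choice made here so that bytes over 127 index the upper half of the table on platforms where `char` is signed.

```cpp
#include <cstdint>

enum CharType : char { Empty = 0, Lower = 1, Upper = 2, Punctuation = 3 };

// Four 2-bit CharTypes per byte, 64 bytes for 256 possible char values.
constexpr uint8_t CharTypes[] = {
    0x00, 0x00, 0x00, 0x00, // Control characters
    0x00, 0x00, 0x00, 0x00, // Control characters
    0xff, 0xff, 0xff, 0xff, // Punctuation
    0x55, 0x55, 0xf5, 0xff, // Numbers->Lower, more Punctuation.
    0xab, 0xaa, 0xaa, 0xaa, // @ and A-O
    0xaa, 0xaa, 0xea, 0xff, // P-Z, more Punctuation.
    0x57, 0x55, 0x55, 0x55, // ` and a-o
    0x55, 0x55, 0xd5, 0x3f, // p-z, Punctuation, DEL.
    0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, // Bytes over 127 -> Lower.
    0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55,
    0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55,
    0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55,
};

// Hypothetical helper: the top 6 bits of C select the byte, the bottom 2 bits
// select which 2-bit field inside that byte.
CharType typeOf(unsigned char C) {
  return static_cast<CharType>((CharTypes[C >> 2] >> ((C & 3) * 2)) & 3);
}
```

For `'q'` (0x71, binary 011100 01) the helper reads byte 28 (0x55) and extracts bits 3-2, giving `Lower`; `'A'` maps to `Upper` and `'_'` to `Punctuation`.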

clang-tools-extra/trunk/unittests/clangd/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
	support			support
	)			)

	get_filename_component(CLANGD_SOURCE_DIR			get_filename_component(CLANGD_SOURCE_DIR
	${CMAKE_CURRENT_SOURCE_DIR}/../../clangd REALPATH)			${CMAKE_CURRENT_SOURCE_DIR}/../../clangd REALPATH)
	include_directories(			include_directories(
	${CLANGD_SOURCE_DIR}			${CLANGD_SOURCE_DIR}
	)			)

	add_extra_unittest(ClangdTests			add_extra_unittest(ClangdTests
	ClangdTests.cpp			ClangdTests.cpp
				FuzzyMatchTests.cpp
	JSONExprTests.cpp			JSONExprTests.cpp
	TraceTests.cpp			TraceTests.cpp
	)			)

	target_link_libraries(ClangdTests			target_link_libraries(ClangdTests
	clangBasic			clangBasic
	clangDaemon			clangDaemon
	clangFormat			clangFormat
	clangFrontend			clangFrontend
	clangSema			clangSema
	clangTooling			clangTooling
	clangToolingCore			clangToolingCore
	LLVMSupport			LLVMSupport
	)			)

clang-tools-extra/trunk/unittests/clangd/FuzzyMatchTests.cpp

				//===-- FuzzyMatchTests.cpp - String fuzzy matcher tests --------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "FuzzyMatch.h"

				#include "llvm/ADT/StringExtras.h"
				#include "gmock/gmock.h"
				#include "gtest/gtest.h"

				namespace clang {
				namespace clangd {
				namespace {
				using namespace llvm;
				using testing::Not;

				struct ExpectedMatch {
				ExpectedMatch(StringRef Annotated) : Word(Annotated), Annotated(Annotated) {
				for (char C : "[]")
				Word.erase(std::remove(Word.begin(), Word.end(), C), Word.end());
				}
				std::string Word;
				StringRef Annotated;
				};
				raw_ostream &operator<<(raw_ostream &OS, const ExpectedMatch &M) {
				return OS << "'" << M.Word << "' as " << M.Annotated;
				}

				struct MatchesMatcher : public testing::MatcherInterface<StringRef> {
				ExpectedMatch Candidate;
				MatchesMatcher(ExpectedMatch Candidate) : Candidate(std::move(Candidate)) {}

				void DescribeTo(::std::ostream *OS) const override {
				raw_os_ostream(*OS) << "Matches " << Candidate;
				}

				bool MatchAndExplain(StringRef Pattern,
				testing::MatchResultListener *L) const override {
				std::unique_ptr<raw_ostream> OS(
				L->stream() ? (raw_ostream *)(new raw_os_ostream(*L->stream()))
				: new raw_null_ostream());
				FuzzyMatcher Matcher(Pattern);
				auto Result = Matcher.match(Candidate.Word);
				auto AnnotatedMatch = Matcher.dumpLast(*OS << "\n");
				return Result && AnnotatedMatch == Candidate.Annotated;
				}
				};

				// Accepts patterns that match a given word.
				// Dumps the debug tables on match failure.
				testing::Matcher<StringRef> matches(StringRef M) {
				return testing::MakeMatcher<StringRef>(new MatchesMatcher(M));
				}

				TEST(FuzzyMatch, Matches) {
				EXPECT_THAT("u_p", matches("[u]nique[_p]tr"));
				EXPECT_THAT("up", matches("[u]nique_[p]tr"));
				EXPECT_THAT("uq", matches("[u]ni[q]ue_ptr"));
				EXPECT_THAT("qp", Not(matches("unique_ptr")));
				EXPECT_THAT("log", Not(matches("SVGFEMorphologyElement")));

				EXPECT_THAT("tit", matches("win.[tit]"));
				EXPECT_THAT("title", matches("win.[title]"));
				EXPECT_THAT("WordCla", matches("[Word]Character[Cla]ssifier"));
				EXPECT_THAT("WordCCla", matches("[WordC]haracter[Cla]ssifier"));

				EXPECT_THAT("dete", Not(matches("editor.quickSuggestionsDelay")));

				EXPECT_THAT("highlight", matches("editorHover[Highlight]"));
				EXPECT_THAT("hhighlight", matches("editor[H]over[Highlight]"));
				EXPECT_THAT("dhhighlight", Not(matches("editorHoverHighlight")));

				EXPECT_THAT("-moz", matches("[-moz]-foo"));
				EXPECT_THAT("moz", matches("-[moz]-foo"));
				EXPECT_THAT("moza", matches("-[moz]-[a]nimation"));

				EXPECT_THAT("ab", matches("[ab]A"));
				EXPECT_THAT("ccm", matches("[c]a[cm]elCase"));
				EXPECT_THAT("bti", Not(matches("the_black_knight")));
				EXPECT_THAT("ccm", Not(matches("camelCase")));
				EXPECT_THAT("cmcm", Not(matches("camelCase")));
				EXPECT_THAT("BK", matches("the_[b]lack_[k]night"));
				EXPECT_THAT("KeyboardLayout=", Not(matches("KeyboardLayout")));
				EXPECT_THAT("LLL", matches("SVisual[L]ogger[L]ogs[L]ist"));
				EXPECT_THAT("LLLL", Not(matches("SVilLoLosLi")));
				EXPECT_THAT("LLLL", Not(matches("SVisualLoggerLogsList")));
				EXPECT_THAT("TEdit", matches("[T]ext[Edit]"));
				EXPECT_THAT("TEdit", matches("[T]ext[Edit]or"));
				EXPECT_THAT("TEdit", matches("[Te]xte[dit]"));
				EXPECT_THAT("TEdit", matches("[t]ext_[edit]"));
				EXPECT_THAT("TEditDit", matches("[T]ext[Edit]or[D]ecorat[i]on[T]ype"));
				EXPECT_THAT("TEdit", matches("[T]ext[Edit]orDecorationType"));
				EXPECT_THAT("Tedit", matches("[T]ext[Edit]"));
				EXPECT_THAT("ba", Not(matches("?AB?")));
				EXPECT_THAT("bkn", matches("the_[b]lack_[kn]ight"));
				EXPECT_THAT("bt", matches("the_[b]lack_knigh[t]"));
				EXPECT_THAT("ccm", matches("[c]amelCase[cm]"));
				EXPECT_THAT("fdm", matches("[f]in[dM]odel"));
				EXPECT_THAT("fob", matches("[fo]o[b]ar"));
				EXPECT_THAT("fobz", Not(matches("foobar")));
				EXPECT_THAT("foobar", matches("[foobar]"));
				EXPECT_THAT("form", matches("editor.[form]atOnSave"));
				EXPECT_THAT("g p", matches("[G]it:[ P]ull"));
				EXPECT_THAT("g p", matches("[G]it:[ P]ull"));
				EXPECT_THAT("gip", matches("[Gi]t: [P]ull"));
				EXPECT_THAT("gip", matches("[Gi]t: [P]ull"));
				EXPECT_THAT("gp", matches("[G]it: [P]ull"));
				EXPECT_THAT("gp", matches("[G]it_Git_[P]ull"));
				EXPECT_THAT("is", matches("[I]mport[S]tatement"));
				EXPECT_THAT("is", matches("[is]Valid"));
				EXPECT_THAT("lowrd", matches("[low]Wo[rd]"));
				EXPECT_THAT("myvable", matches("[myva]ria[ble]"));
				EXPECT_THAT("no", Not(matches("")));
				EXPECT_THAT("no", Not(matches("match")));
				EXPECT_THAT("ob", Not(matches("foobar")));
				EXPECT_THAT("sl", matches("[S]Visual[L]oggerLogsList"));
				EXPECT_THAT("sllll", matches("[S]Visua[lL]ogger[L]ogs[L]ist"));
				EXPECT_THAT("Three", matches("H[T]ML[HRE]l[e]ment"));
				EXPECT_THAT("Three", matches("[Three]"));
				EXPECT_THAT("fo", Not(matches("barfoo")));
				EXPECT_THAT("fo", matches("bar_[fo]o"));
				EXPECT_THAT("fo", matches("bar_[Fo]o"));
				EXPECT_THAT("fo", matches("bar [fo]o"));
				EXPECT_THAT("fo", matches("bar.[fo]o"));
				EXPECT_THAT("fo", matches("bar/[fo]o"));
				EXPECT_THAT("fo", matches("bar\\[fo]o"));

				EXPECT_THAT(
				"aaaaaa",
				matches("[aaaaaa]aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
				"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
				EXPECT_THAT("baba", Not(matches("ababababab")));
				EXPECT_THAT("fsfsfs", Not(matches("dsafdsafdsafdsafdsafdsafdsafasdfdsa")));
				EXPECT_THAT("fsfsfsfsfsfsfsf",
				Not(matches("dsafdsafdsafdsafdsafdsafdsafasdfdsafdsafdsafdsafdsfd"
				"safdsfdfdfasdnfdsajfndsjnafjndsajlknfdsa")));

				EXPECT_THAT(" g", matches("[ g]roup"));
				EXPECT_THAT("g", matches(" [g]roup"));
				EXPECT_THAT("g g", Not(matches(" groupGroup")));
				EXPECT_THAT("g g", matches(" [g]roup[ G]roup"));
				EXPECT_THAT(" g g", matches("[ ] [g]roup[ G]roup"));
				EXPECT_THAT("zz", matches("[zz]Group"));
				EXPECT_THAT("zzg", matches("[zzG]roup"));
				EXPECT_THAT("g", matches("zz[G]roup"));

				EXPECT_THAT("aaaa", matches("_a_[aaaa]")); // Prefer consecutive.
				EXPECT_THAT("printf", matches("s[printf]"));
				EXPECT_THAT("str", matches("o[str]eam"));
				}

				struct RankMatcher : public testing::MatcherInterface<StringRef> {
				std::vector<ExpectedMatch> RankedStrings;
				RankMatcher(std::initializer_list<ExpectedMatch> RankedStrings)
				: RankedStrings(RankedStrings) {}

				void DescribeTo(::std::ostream *OS) const override {
				raw_os_ostream O(*OS);
				O << "Ranks strings in order: [";
				for (const auto &Str : RankedStrings)
				O << "\n\t" << Str;
				O << "\n]";
				}

				bool MatchAndExplain(StringRef Pattern,
				testing::MatchResultListener *L) const override {
				std::unique_ptr<raw_ostream> OS(
				L->stream() ? (raw_ostream *)(new raw_os_ostream(*L->stream()))
				: new raw_null_ostream());
				FuzzyMatcher Matcher(Pattern);
				const ExpectedMatch *LastMatch;
				Optional<float> LastScore;
				bool Ok = true;
				for (const auto &Str : RankedStrings) {
				auto Score = Matcher.match(Str.Word);
				if (!Score) {
				*OS << "\nDoesn't match '" << Str.Word << "'";
				Matcher.dumpLast(*OS << "\n");
				Ok = false;
				} else {
				std::string Buf;
				llvm::raw_string_ostream Info(Buf);
				auto AnnotatedMatch = Matcher.dumpLast(Info);

				if (AnnotatedMatch != Str.Annotated) {
				*OS << "\nMatched " << Str.Word << " as " << AnnotatedMatch
				<< " instead of " << Str.Annotated << "\n"
				<< Info.str();
				Ok = false;
				} else if (LastScore && *LastScore < *Score) {
				*OS << "\nRanks '" << Str.Word << "'=" << *Score << " above '"
				<< LastMatch->Word << "'=" << *LastScore << "\n"
				<< Info.str();
				Matcher.match(LastMatch->Word);
				Matcher.dumpLast(*OS << "\n");
				Ok = false;
				}
				}
				LastMatch = &Str;
				LastScore = Score;
				}
				return Ok;
				}
				};

				// Accepts patterns that match all the strings and rank them in the given order.
				// Dumps the debug tables on match failure.
				template <typename... T> testing::Matcher<StringRef> ranks(T... RankedStrings) {
				return testing::MakeMatcher<StringRef>(
				new RankMatcher{ExpectedMatch(RankedStrings)...});
				}

				TEST(FuzzyMatch, Ranking) {
				EXPECT_THAT("eb", ranks("[e]mplace_[b]ack", "[e]m[b]ed"));
				EXPECT_THAT("cons",
				ranks("[cons]ole", "[Cons]ole", "ArrayBuffer[Cons]tructor"));
				EXPECT_THAT("foo", ranks("[foo]", "[Foo]"));
				EXPECT_THAT("onMess",
				ranks("[onMess]age", "[onmess]age", "[on]This[M]ega[Es]cape[s]"));
				EXPECT_THAT("CC", ranks("[C]amel[C]ase", "[c]amel[C]ase"));
				EXPECT_THAT("cC", ranks("[c]amel[C]ase", "[C]amel[C]ase"));
				EXPECT_THAT("p", ranks("[p]arse", "[p]osix", "[p]afdsa", "[p]ath", "[p]"));
				EXPECT_THAT("pa", ranks("[pa]rse", "[pa]th", "[pa]fdsa"));
				EXPECT_THAT("log", ranks("[log]", "Scroll[Log]icalPosition"));
				EXPECT_THAT("e", ranks("[e]lse", "Abstract[E]lement"));
				EXPECT_THAT("workbench.sideb",
				ranks("[workbench.sideB]ar.location",
				"[workbench.]editor.default[SideB]ySideLayout"));
				EXPECT_THAT("editor.r", ranks("[editor.r]enderControlCharacter",
				"[editor.]overview[R]ulerlanes",
				"diff[Editor.r]enderSideBySide"));
				EXPECT_THAT("-mo", ranks("[-mo]z-columns", "[-]ms-ime-[mo]de"));
				EXPECT_THAT("convertModelPosition",
				ranks("[convertModelPosition]ToViewPosition",
				"[convert]ViewTo[ModelPosition]"));
				EXPECT_THAT("is", ranks("[is]ValidViewletId", "[i]mport [s]tatement"));
				EXPECT_THAT("title", ranks("window.[title]",
				"files.[t]r[i]m[T]rai[l]ingWhit[e]space"));
				EXPECT_THAT("strcpy", ranks("[strcpy]", "[strcpy]_s", "[str]n[cpy]"));
				EXPECT_THAT("close", ranks("workbench.quickOpen.[close]OnFocusOut",
				"[c]ss.[l]int.imp[o]rt[S]tat[e]ment",
				"[c]ss.co[lo]rDecorator[s].[e]nable"));
				}

				} // namespace
				} // namespace clangd
				} // namespace clang
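
The tests above encode each expectation as an annotated string such as "[u]nique[_p]tr": with the brackets removed it is the candidate word, and the bracketed characters are the ones expected to be matched by the pattern. The following is a minimal standalone sketch of that convention, not code from the patch. The helper names stripAnnotations and matchedChars are hypothetical, and the sketch uses only the standard library rather than the LLVM types used in the test file.

```cpp
#include <string>

// Hypothetical helper (not part of the patch): drop the '[' and ']' markers
// from an annotated string to recover the plain candidate word, as the
// ExpectedMatch constructor does for its Word member.
std::string stripAnnotations(const std::string &Annotated) {
  std::string Word;
  for (char C : Annotated)
    if (C != '[' && C != ']')
      Word.push_back(C);
  return Word;
}

// Hypothetical helper: collect only the characters inside [...] spans, i.e.
// the characters the annotation marks as matched.
std::string matchedChars(const std::string &Annotated) {
  std::string Matched;
  bool Inside = false;
  for (char C : Annotated) {
    if (C == '[')
      Inside = true;
    else if (C == ']')
      Inside = false;
    else if (Inside)
      Matched.push_back(C);
  }
  return Matched;
}
```

For "[u]nique[_p]tr", stripAnnotations gives "unique_ptr" and matchedChars gives "u_p", which is the pattern used in the first Matches test. In tests such as "BK" against "the_[b]lack_[k]night" the bracketed characters agree with the pattern only up to case.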