This patch introduces the core building block of the next-generation Clangd symbol index - Dex. Search tokens are the keys in the inverted index and represent a characteristic of a specific symbol: examples of search token types (Token Namespaces) are
- Trigrams - these are essential for unqualified symbol name fuzzy search
- Scopes for filtering the symbols by the namespace
- Paths, e.g. these can be used to uprank symbols defined close to the edited file
This patch outlines the generic for such token namespaces, but only implements trigram generation.
The intuition behind trigram generation algorithm is that each extracted trigram is a valid sequence for Fuzzy Matcher jumps, proposed implementation utilize existing FuzzyMatcher API for segmentation and trigram extraction.
However, trigrams generation algorithm for the query string is different from the previous one: it simply yields sequences of 3 consecutive lowercased valid characters (letters, digits).
Dex RFC in the mailing list: http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
nit: swap these sentences around? conceptual explanation first, how they're used later?
could even give an example: