This patch introduces LIMIT iterator, which is very important for improving the quality of search query. LIMIT iterators can be applied on top of BOOST iterators to prevent populating query request with a huge number of low-quality symbols.
Details
Diff Detail
Event Timeline
Patch is in the preview mode, I will add documentation and more tests before opening a review.
Since it's bundled with the BOOST iterator and doesn't make too much sense without it, I should probable rebase on top of D50970 and add it as the parent revision.
Use better wording for internal LimitIterator documentation, perform minor code cleanup.
first few comments, sorry I'll need to come back to this.
clang-tools-extra/clangd/index/dex/Iterator.h | ||
---|---|---|
89–90 | Update documentation. Informs the iterator that the current document was consumed, and returns its boost. The rest of the doc seems a bit to far in the details, it's hard to follow. If this iterator has any child iterators that contain the document, consume() should be called on those and their boosts incorporated. consume() must *not* be called on children that don't contain the current doc. | |
90–97 | There's two possible formulations here:
We need to be explicit about which. It seems you want the second (based on offline discussion). In this case we should either assert that the DocID matches peek(), or just drop the DocID parameter and use peek() instead. (If this later means we need to optimize peek, so be it). | |
169 | This comment is accurate, but maybe a bit too concise as what's going on here is a little subtle. Returns LIMIT iterator, which yields up to N elements of its child iterator. Elements only count towards the limit if they are part of the final result set. Therefore the following iterator (AND (2) (LIMIT (1 2) 1)) yields (2), not (). |
clang-tools-extra/clangd/index/dex/Iterator.cpp | ||
---|---|---|
109 | this isn't possible, as the item being consumed is the value of peek(). | |
227 | and again here | |
341 | if condition is always true | |
348 | ISTM this should show the immutable state at the start of the query, rather than the mutated state? e.g. (LIMIT 5(3) <child>) | |
358 | I think you can remove the Limit parameter now, replacing it with an outer Limit iterator | |
362 | you can inline this now | |
clang-tools-extra/unittests/clangd/DexIndexTests.cpp | ||
267 | Is this testing the "limit" parameter to consume, or the limit iterator? (As noted above, I think you can remove the limit parameter) |
Update documentation.
The rest of the doc seems a bit to far in the details, it's hard to follow.
Maybe just