This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Implement LIMIT iterator
ClosedPublic

Authored by kbobyrev on Aug 21 2018, 1:52 AM.

Details

Summary

This patch introduces LIMIT iterator, which is very important for improving the quality of search query. LIMIT iterators can be applied on top of BOOST iterators to prevent populating query request with a huge number of low-quality symbols.

Diff Detail

Repository
rL LLVM

Event Timeline

kbobyrev created this revision.Aug 21 2018, 1:52 AM
kbobyrev planned changes to this revision.Aug 21 2018, 1:53 AM

Patch is in the preview mode, I will add documentation and more tests before opening a review.

kbobyrev updated this revision to Diff 161677.Aug 21 2018, 2:56 AM

Add comprehensive tests, improve documentation.

kbobyrev planned changes to this revision.Aug 21 2018, 2:57 AM

Since it's bundled with the BOOST iterator and doesn't make too much sense without it, I should probable rebase on top of D50970 and add it as the parent revision.

kbobyrev updated this revision to Diff 161947.Aug 22 2018, 6:59 AM
kbobyrev edited the summary of this revision. (Show Details)
kbobyrev added a reviewer: sammccall.

Rebase on top of BOOST iterator patch, update tests.

kbobyrev updated this revision to Diff 161952.Aug 22 2018, 7:22 AM

Use better wording for internal LimitIterator documentation, perform minor code cleanup.

first few comments, sorry I'll need to come back to this.

clang-tools-extra/clangd/index/dex/Iterator.h
90 ↗(On Diff #161952)

Update documentation.

Informs the iterator that the current document was consumed, and returns its boost.

The rest of the doc seems a bit to far in the details, it's hard to follow.
Maybe just

If this iterator has any child iterators that contain the document, consume()
should be called on those and their boosts incorporated.
consume() must *not* be called on children that don't contain the current doc.
96 ↗(On Diff #161952)

There's two possible formulations here:

  • the DocID parameter is used by the Iterator to determine whether to actually decrement its limit (do we actually have that doc?)
  • the DocID parameter is redundant, as the parent should only call consume() on children that have the doc.

We need to be explicit about which. It seems you want the second (based on offline discussion). In this case we should either assert that the DocID matches peek(), or just drop the DocID parameter and use peek() instead. (If this later means we need to optimize peek, so be it).

168 ↗(On Diff #161952)

This comment is accurate, but maybe a bit too concise as what's going on here is a little subtle.
Maybe expand a little like

Returns LIMIT iterator, which yields up to N elements of its child iterator.
Elements only count towards the limit if they are part of the final result set.
Therefore the following iterator (AND (2) (LIMIT (1 2) 1)) yields (2), not ().
kbobyrev updated this revision to Diff 162183.Aug 23 2018, 7:46 AM
kbobyrev marked 3 inline comments as done.

Address a round comments from Sam.

sammccall added inline comments.Aug 24 2018, 1:15 AM
clang-tools-extra/clangd/index/dex/Iterator.cpp
109 ↗(On Diff #162183)

this isn't possible, as the item being consumed is the value of peek().
You could assert !reachedEnd() or just delete this.

227 ↗(On Diff #162183)

and again here

341 ↗(On Diff #162183)

if condition is always true
you could replace the if with an assertion, but the child will assert, so better just to remove it.
replace if with assertion

348 ↗(On Diff #162183)

ISTM this should show the immutable state at the start of the query, rather than the mutated state?
Or if you really think it's useful, both.

e.g. (LIMIT 5(3) <child>)

358 ↗(On Diff #162183)

I think you can remove the Limit parameter now, replacing it with an outer Limit iterator

362 ↗(On Diff #162183)

you can inline this now

clang-tools-extra/unittests/clangd/DexIndexTests.cpp
267 ↗(On Diff #162183)

Is this testing the "limit" parameter to consume, or the limit iterator?
It shouldn't test both just because they happen to share a name :-)

(As noted above, I think you can remove the limit parameter)

kbobyrev updated this revision to Diff 162341.Aug 24 2018, 1:38 AM
kbobyrev marked 7 inline comments as done.

Address a round of comments & simplify code.

sammccall accepted this revision.Aug 24 2018, 3:12 AM
sammccall added inline comments.
clang-tools-extra/unittests/clangd/DexIndexTests.cpp
34 ↗(On Diff #162341)

remove limit param here too?
Where you're *testing* limit, the tests will be clearer if you do it explicitly.

279 ↗(On Diff #162341)

this doesn't seem to test the limit iterator (at least not more than it's tested elsewhere)

This revision is now accepted and ready to land.Aug 24 2018, 3:12 AM
This revision was automatically updated to reflect the committed changes.