This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Store OR iterator children in heap
AbandonedPublic

Authored by kbobyrev on Sep 14 2018, 3:25 AM.

Details

Reviewers
None
Summary

Use min-heap invariant for OR iterator's children. This helps to avoid iterating through all children in reachedEnd(), peek() and allows early-stopping in consume() and advance().

Before this patch: 4710391415
After this patch: 6104649571
Overhead: ~30%

Diff Detail

Event Timeline

kbobyrev created this revision.Sep 14 2018, 3:25 AM
ilya-biryukov added inline comments.Sep 14 2018, 3:49 AM
clang-tools-extra/clangd/index/dex/Iterator.cpp
128

NIT: use triple-slash comments.
NIT: LHS > RHS seems to be exactly what's defined by this function. Maybe mention peek() to explain how actual comparison works?

129

Turn lambda into a function?

129

Call Greater to avoid confusion? Compare can mean comparisons in both directions.

199

This seems to assume Children is sorted, but that's not the case.
Why is it valid to iterate only a subset of the vector?

ilya-biryukov added inline comments.Sep 14 2018, 4:04 AM
clang-tools-extra/clangd/index/dex/Iterator.cpp
128

To make the second part of the comment clearer: LHS > RHS duplicates what could be inferred from the function name+body (which is small enough to be readily readable)....

kbobyrev planned changes to this revision.Sep 14 2018, 4:39 AM
kbobyrev marked 2 inline comments as done.
kbobyrev updated this revision to Diff 165471.Sep 14 2018, 5:17 AM
kbobyrev marked 3 inline comments as done.
kbobyrev edited the summary of this revision. (Show Details)

Fixed the bug with incorrect assumption of Children being sorted (instead of being a heap). This caused a huge overhead (~30%). When I iterate through all children in consume() (like before) it's also worse (~18%).

Seems like this optimization is not worth (yet?). As soon as we get more proximity paths (and hence more OR iterator children) that might make sense.

kbobyrev edited the summary of this revision. (Show Details)Sep 14 2018, 5:17 AM

Seems like this optimization is not worth (yet?). As soon as we get more proximity paths (and hence more OR iterator children) that might make sense.

WDYT about storing all the elements with the minimal doc-id outside the heap? I.e. we can pop all elements with the minimum doc-id on 'advance' and iterator creation.
That way we could potentially regain the improvements we've seen in the first version.

kbobyrev planned changes to this revision.Nov 2 2018, 4:04 AM

Will probably come back to it at some point, the experiments didn't show any significant improvement.

kbobyrev abandoned this revision.Dec 2 2019, 2:24 AM

Probably not worth investigating at this point.