This is an archive of the discontinued LLVM Phabricator instance.

Replace the custom linked list in LeaderTableEntry with TinyPtrVector.
ClosedPublic

Authored by resistor on May 8 2022, 10:38 PM.

Details

Summary

The purpose of the custom linked list was to optimize for the case
of a single-element list. It turns out that TinyPtrVector handles
the same basic scenario even better, reducing the size of
LeaderTableEntry by 33%, and requiring only log2(N) allocations
as the size of the list grows. The only downside is that we have
to store the Value's and BasicBlock's in separate vectors, which
is slightly awkward in a few cases. Fortunately that ends up being
entirely encapsulated inside helper functions.

Diff Detail

Event Timeline

resistor created this revision.May 8 2022, 10:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 8 2022, 10:38 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
resistor requested review of this revision.May 8 2022, 10:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 8 2022, 10:38 PM
nikic added inline comments.May 10 2022, 1:41 AM
llvm/include/llvm/Transforms/Scalar/GVN.h
295

This looks incorrect in multiple ways. If we're at the first instruction, this will step before the begin iterator. And the end iterator is invalidated in either case. I think you need something like this:

VI = entry.Val.erase(VI);
BI = entry.BB.erase(BI);
VE = entry.Val.end();
continue;
llvm/lib/Transforms/Scalar/GVN.cpp
2113

Can be just return is_contained(entry.BB, BB)?

nikic added a reviewer: nikic.May 10 2022, 1:42 AM
resistor updated this revision to Diff 428404.EditedMay 10 2022, 9:20 AM
resistor marked 2 inline comments as done.

Address review feedback.

llvm/include/llvm/Transforms/Scalar/GVN.h
295

Doh!

llvm/lib/Transforms/Scalar/GVN.cpp
2113

Done

nikic added a comment.May 11 2022, 8:22 AM

This still crashes when building llvm-test-suite: http://llvm-compile-time-tracker.com/show_error.php?commit=77bc69cedc32296847573390cf393827b6196d1f There must be something else wrong...

nikic added inline comments.May 11 2022, 8:26 AM
llvm/lib/Transforms/Scalar/GVN.cpp
2109

Hm, I think this is inverted. The old code is a bit hard to understand, but what it's checking is that all BBs are BB, not that any is. So this should be:

return all_of(entry.BB, [BB](BasicBlock *EntryBB) { return EntryBB == BB; });
nikic added inline comments.May 11 2022, 8:28 AM
llvm/lib/Transforms/Scalar/GVN.cpp
2109

You had that right in your initial version any my is_contained suggestion was wrong :)

resistor updated this revision to Diff 428723.May 11 2022, 11:24 AM

Fix condition.wq

resistor updated this revision to Diff 428740.May 11 2022, 12:11 PM

Fix accidentally walking off of the end of the BB vector.

As far as I can tell, the only behavioral difference with the new version is the order of leaders in the table, which biases the candidates returned by findLeader. The hand-rolled linked listed used an unprincipled ordering: the first candidate found was always at the front of the list, but after candidates were in reverse order of discovery. The new version is currently storing them strictly in order of recovery. I can try doing a backwards search in findLeader, which would be more similar to the the old approach on average. Really, we should probably have a more principled heuristic for choosing amongst the leaders, but that seems out of scope for this change.

resistor updated this revision to Diff 429120.May 12 2022, 7:54 PM

Replicate the leader table insertion ordering of the old linked list.

asbirlea accepted this revision.May 24 2022, 1:49 PM

Changes LGTM with inline nits.
@nikic I have not verified compile time impact with the latest diff.

llvm/lib/Transforms/Scalar/GVN.cpp
2108–2109

s/entry/Entry

2131–2133

s/entry/Entry

2228–2230

s/entry/Entry

3040–3043

s/entry/Entry

This revision is now accepted and ready to land.May 24 2022, 1:49 PM
resistor updated this revision to Diff 432201.May 25 2022, 11:51 PM

Rebase and address lints

resistor marked 4 inline comments as done.May 25 2022, 11:52 PM
This revision was landed with ongoing or failed builds.May 25 2022, 11:52 PM
This revision was automatically updated to reflect the committed changes.
nikic added a comment.May 26 2022, 2:12 AM

Compile-time still looks the same: https://llvm-compile-time-tracker.com/compare.php?from=c2eccc67ce07e9cb374eb0ecdb3038fcb8be08cd&to=1e9114984490b83d4665f12a11f84c83f50ca8f0&stat=instructions That is, mildly negative.

I think that this analysis isn't really correct:

It turns out that TinyPtrVector handles the same basic scenario even better, reducing the size of LeaderTableEntry by 33%, and requiring only log2(N) allocations as the size of the list grows.

For all practical purposes, the number of allocations in the previous implementation was zero, because a bump pointer allocator was used, which is essentially free. In the new scheme, we get real (global allocator) allocations whenever we have more than one entry.

It is true that for a single entry, the size is reduced by 33%. However, if we go to two elements, then we'll end up separately allocating two SmallVectors with 4 elements each, which means we use a total of 14 pointer-sized values instead of 6 (not counting any additional overhead the global allocator adds). So for 2 entries, we actually use more than twice as much memory. This gets better for 3 and 4 entries, and then again much worse at 5 entries, where we leave behind unused inline SmallVector storage.

I don't know what the actual distribution of the number of entries is, but it's not really clear whether the new implementation uses less memory or is more performant in practice.

I assume that there was some kind of motivation for making this change and you observed better resource utilization for some workload?