This is an archive of the discontinued LLVM Phabricator instance.

[MBP] Rotate should bring more fallthrough
Needs ReviewPublic

Authored by Carrot on May 17 2019, 1:49 PM.

Details

Reviewers
davidxl
Summary

Current implementation of rotateLoop does rotate without carefully compare the number of exit fallthrough and top fallthrough, so it may create more taken branches, as demonstrated by the test case no_rotate.

This patch makes sure the new exit fallthrough is larger than top fallthrough before rotation.

Diff Detail

Event Timeline

Carrot created this revision.May 17 2019, 1:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2019, 1:49 PM

I like the direction this patch is going, but I think it can do better.

The block layout algorithm here tries to maximize total FallThrough frequency. In the context of loop rotation, this can be formulated as such -- comparing the total fall through frequency before and after. If the post rotation fall through freq is lower, do not do it. In fact, the same idea can be used to find the optimal rotation (optimal bb rotated to the bottom).

In the no_rotate.ll, assuming the Freq(header->middle) = F0, Freq(middle->latch1) = F1, Freq(latch1->latch2) = F2, Freq(latch2->header) = F3, Freq(header->end) = F4, we have F2 == F3, F0 > F1 > F2

Before rotation, the total fallthrough freq = F0 + F1 + F2
After rotation, the total fall through = F1 + F2 + F3

Before - After = F 0 - F2 > 0, clearly not good to rotate.

Precisely compute the fallthrough frequency is very helpful in layout of loop header and exit BB. It is included in my next patch.

Current implementation of findBestLoopExit finds the exit edge with most frequency, and rotateLoop exposes that edge as fallthrough. This patch fixed a problem in a very common case

  • The loop header has only one out-of-loop predecessor
  • The bottom of the loop chain doesn't exit the loop

In this case, the exit edge can't have larger frequency than the fallthrough to header, so the loop chain should not be rotated.

Loop rotation not only affects fall through placement of incoming and exit edges, but internal fall through as well -- it converts the original backedge into a new fallthrough while converts an original internal fallthrough into the backedge. In other words, the overall cost/benefit needs to consider all changes. Doing pattern matching like this can lead to wrong decisions.

I assume this case can be handled by your next complete patch, so we should try to go that direction directly.

Does this patch handle the case in D62079? If yes, you can move the tests added in that patch here.