This is an archive of the discontinued LLVM Phabricator instance.

Peel off the dominant case in switch statement
Needs ReviewPublic

Authored by xur on Sep 15 2017, 3:42 PM.

Details

Reviewers
davidxl
hans
Summary

This patch peels off the dominant case in switch statement into a branch. This avoids the extra compares when lowering the switch statement into chain of branches in binary search tree style.

For a switch statement like the following:

switch (i) {
case     8: s1; break;
case -8826: s2; break;
case 18312: s3; break;
case 18568: s4; break;
case   129: s5; break;
}

The lowering generates the following coe

  1. BB#0: # %entry cmpl $128, %edi jg .LBB0_4
  2. BB#1: # %entry movl $1, %eax cmpl $8, %edi jne .LBB0_2
  3. BB#10: # %return retq

.LBB0_4: # %entry

cmpl    $129, %edi

Here the coml to $128 is extra. This is happening even if i==8 target has only non-zero branch-weights. Like !{!"branch_weights", i32 0, i32 100000, i32 0, i32 0, i32 0, i32 0}

This patch would peel i==8 into a branch and keep the rest in the switch statement.

Diff Detail

Event Timeline

xur created this revision.Sep 15 2017, 3:42 PM
hans added inline comments.Sep 15 2017, 3:56 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5680

I don't think we should do this if SwitchToLookupTable (the call below) can turn the swich into a lookup instead.

In fact, I don't think this should be done at the IR level at all, since it's more of a switch lowering issue. SelectionDAGBuilder::visitSwitch() would be a better place.

That code already takes case weights into account, and when lowering to a binary search tree, it will balance it based on weight, favoring cases that are hot. Do you find that that's not sufficient? (I'm willing to believe that's the case, but I'd like to see it argued.)

davidxl added inline comments.Sep 15 2017, 4:25 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5680

Peeling the dominating case avoids a memory read (table lookup). Besides it trades a indirect branch for a a highly biased direct branch (which is usually highly predictable by the branch predictor).

Like looping which happens at higher level, switch peeling can also enable more possible surrounding optimizations (eg like jump threading). Peeling it out also probably also makes it easier to for better code layout.

On the other hand, this should not be done when size optimization is on.

We've already discovered a slew of problem from over-optimizing things in simplifycfg, which is also being used as a canonicalization pass.
Either it needs a set of flags controlling what it does, or it needs to not be trying to do "performance improvement" and "canonicalization" as a single pass.
See, e.g, https://bugs.llvm.org/show_bug.cgi?id=34603

hans added inline comments.Sep 18 2017, 12:14 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5680

SwitchToLookupTable creates lookup tables for switches that are used to select from a set of constant values (this is different from jump tables, sorry the names here are confusing), so there is no indirect branch. I think peeling off the common case is probably not a good idea for those kinds of switches.

I can see that peeling early might enable other high-level optimizations. But that's true in some sense for switch lowering in general. I'd suggest to at least look into how peeling in the regular switch lowering code at SelectionDAGBuilder::visitSwitch() would look like. It seems to me like the natural place to do it, and I suspect it's easier to do there.