This is an archive of the discontinued LLVM Phabricator instance.

peephole optimization in switch table lookup: reuse the guarding table comparison if possible
ClosedPublic

Authored by eeckstein on Nov 26 2014, 11:31 AM.

Details

Reviewers
hans
Summary

This optimization tries to reuse the generated compare instruction if there is a comparison against the default value after the switch.

Example:

switch (x) {
  case 0: r = 10; break;
  case 1: r = 11; break;
  ...
  default: r = 0; break; // 0 does not appear in any case value.
}
if (r == 0) {
  do_something;
}

transforms to:

cond = x < table_size;
if (cond) {
  r = table[x];
} else {
  r = 0;
}
if (cond) {
  do_something;
}

Jump threading can then eliminate the second if (cond):

if (x < table_size) {
  r = table[x];
} else {
  r = 0;
  do_something;
}

Diff Detail

Event Timeline

eeckstein updated this revision to Diff 16660.Nov 26 2014, 11:31 AM
eeckstein retitled this revision from to peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
eeckstein updated this object.
eeckstein edited the test plan for this revision. (Show Details)
eeckstein added a reviewer: hans.
eeckstein added a subscriber: Unknown Object (MLST).

This is already the third version of the change. Previous versions were discussed via email:

On 25 Nov 2014, at 20:48, Hans Wennborg <hans@chromium.org> wrote:

On Tue, Nov 25, 2014 at 6:19 AM, Erik Eckstein <eeckstein@apple.com> wrote:

Could jump threading or one of its analyses be taught to handle this?
So that we could also handle a case like:

Actually, the current jump threading can handle this, but only if the "r == 0" is a compare + branch, e.g. if do_something is a call.
It currently does not handle select instructions, so if do_something is a simple variable assignment, it will not work. I think this could be added easily.
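
To make the distinction concrete, here is a small illustrative C++ sketch (hedged: the function names are made up for this note, not taken from the discussion):

  extern void do_something();

  // Compare + branch: the "r == 0" feeds a conditional branch, which
  // jump threading can already thread.
  int with_branch(int r) {
    if (r == 0)
      do_something();
    return r;
  }

  // Simple assignment: SimplifyCFG typically lowers this to a select
  // instruction, which jump threading (as of this review) does not handle.
  int with_select(int r) {
    return (r == 0) ? 100 : r;
  }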

But we have a phase ordering problem: jump threading obviously runs after switch table generation, so it currently does not work for switches that have been converted to tables.
If we ran jump threading before, it might prevent switch table generation.

I suggest the following:

  1. Use my patch to do this kind of "jump threading" for switch tables (solves the phase ordering problem).
  2. Teach the jump threading pass to handle select instructions.

1) and 2) are unrelated.

Yeah, I guess the phase ordering makes things tricky. Fair enough.

Comments on the actual patch below:

Index: lib/Transforms/Utils/SimplifyCFG.cpp
===================================================================
--- lib/Transforms/Utils/SimplifyCFG.cpp (revision 222430)
+++ lib/Transforms/Utils/SimplifyCFG.cpp (working copy)
@@ -73,6 +73,7 @@
STATISTIC(NumLinearMaps, "Number of switch instructions turned into linear mapping");
STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
STATISTIC(NumLookupTablesHoles, "Number of switch instructions turned into lookup tables (holes checked)");
+STATISTIC(NumTableCmpReuses, "Number of reused switch table lookup compares");
STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
STATISTIC(NumSpeculations, "Number of speculative executed instructions");

@@ -3963,6 +3964,57 @@

return SI->getNumCases() * 10 >= TableSize * 4;

}

+/// Try to reuse the result of the compare for guarding the switch table lookup.
+///
+/// If the value of the resulting phi is used in a compare which yields the same
+/// result as the guarding compare, we can reuse the guarding compare.

The comment should probably say that the purpose of reusing the
compare is to facilitate jump threading.

+void reuseTableCompare(ICmpInst *CmpInst, BranchInst *BR,
+                       Value *&InvertedTableCmp,

I'm not sure caching InvertedTableCmp is worth the extra book-keeping.
I assume jump threading also works for the inverted case?

+                       const SmallVectorImpl<std::pair<ConstantInt*, Constant*> >& Values,
+                       Constant *DefaultValue) {
+
+  Constant *CmpOp1 = dyn_cast<Constant>(CmpInst->getOperand(1));
+  if (!CmpOp1)
+    return;
+
+  Constant *TrueConst = ConstantInt::getTrue(CmpInst->getType());
+  Constant *FalseConst = ConstantInt::getFalse(CmpInst->getType());
+
+  // Check if the compare with the default value is constant true or false.
+  Constant *DefaultConst = ConstantExpr::getICmp(CmpInst->getPredicate(),
+                                                 DefaultValue, CmpOp1, true);
+  if (DefaultConst != TrueConst && DefaultConst != FalseConst)
+    return;
+
+  // Check if we have a consistent compare result for all case values.
+  Constant *CommonCaseConst = nullptr;
+  for (auto ValuePair : Values) {
+    Constant *CaseConst = ConstantExpr::getICmp(CmpInst->getPredicate(),
+                                                ValuePair.second, CmpOp1, true);
+    if (CommonCaseConst && CommonCaseConst != CaseConst)
+      return;
+    CommonCaseConst = CaseConst;

I would have written this as:

  if (!CommonCaseConst)
    CommonCaseConst = CaseConst;
  if (CaseConst != CommonCaseConst)
    return;

Actually, instead of checking against CommonCaseConst, couldn't we
just check that CaseConst is always the opposite of DefaultConst? I
think that could simplify the loop a bit, and also the code below?
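
A sketch of what that simplified loop might look like (hedged: it reuses the patch's names but is not the committed code):

  // Require each case result to fold to the boolean opposite of
  // DefaultConst, instead of tracking a separate CommonCaseConst.
  for (auto ValuePair : Values) {
    Constant *CaseConst = ConstantExpr::getICmp(CmpInst->getPredicate(),
                                                ValuePair.second, CmpOp1, true);
    if (!CaseConst || CaseConst == DefaultConst ||
        (CaseConst != TrueConst && CaseConst != FalseConst))
      return;
  }
  // Here DefaultConst and every case result are opposite booleans, so the
  // true/false case analysis below only needs DefaultConst.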

+  }
+  if (CommonCaseConst != TrueConst && CommonCaseConst != FalseConst)
+    return;
+
+  Value *TableCmp = BR->getCondition();
+  if (DefaultConst == FalseConst && CommonCaseConst == TrueConst) {
+    // The compare yields the same result. We can replace it.
+    CmpInst->replaceAllUsesWith(TableCmp);
+    ++NumTableCmpReuses;
+  } else if (DefaultConst == TrueConst && CommonCaseConst == FalseConst) {
+    // The compare yields the same result, just inverted. We can replace it.
+    if (!InvertedTableCmp) {
+      // Create a boolean invert, if we don't have it yet.
+      InvertedTableCmp = BinaryOperator::CreateXor(TableCmp,
+          ConstantInt::get(TableCmp->getType(), 1), "inverted.cmp", BR);
+    }
+    CmpInst->replaceAllUsesWith(InvertedTableCmp);
+    ++NumTableCmpReuses;
+  }
+}
+
/// SwitchToLookupTable - If the switch is only used to initialize one or more
/// phi nodes in a common successor block with different constant values,
/// replace the switch with lookup tables.
@@ -4039,11 +4091,8 @@

   // If the table has holes, we need a constant result for the default case
   // or a bitmask that fits in a register.
   SmallVector<std::pair<PHINode*, Constant*>, 4> DefaultResultsList;
-  bool HasDefaultResults = false;
-  if (TableHasHoles) {
-    HasDefaultResults = GetCaseResults(SI, nullptr, SI->getDefaultDest(),
+  bool HasDefaultResults = GetCaseResults(SI, nullptr, SI->getDefaultDest(),
                                       &CommonDest, DefaultResultsList, DL);
-  }

   bool NeedMask = (TableHasHoles && !HasDefaultResults);
   if (NeedMask) {

@@ -4087,6 +4136,8 @@

// lookup table BB. Otherwise, check if the condition value is within the case
// range. If it is so, branch to the new BB. Otherwise branch to SI's default
// destination.

+ BranchInst *BranchInst = nullptr;

Since we create a branch also for "covered" lookup tables, maybe this
should be called CondBranchInst or something? Or actually, could we
keep track of the comparison instruction instead, e.g. "Value
*TableRangeCheck"? Or maybe the conditional branch could be called
TableRangeCheck.

+

const bool GeneratingCoveredLookupTable = MaxTableSize == TableSize;
if (GeneratingCoveredLookupTable) {
  Builder.CreateBr(LookupBB);

@@ -4097,7 +4148,7 @@

} else {
  Value *Cmp = Builder.CreateICmpULT(TableIndex, ConstantInt::get(
                                     MinCaseVal->getType(), TableSize));
-    Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());
+    BranchInst = Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());

}

// Populate the BB that does the lookups.

@@ -4148,11 +4199,11 @@

bool ReturnedEarly = false;
for (size_t I = 0, E = PHIs.size(); I != E; ++I) {
  PHINode *PHI = PHIs[I];

+ const ResultListTy &ResultList = ResultLists[PHI];

// If using a bitmask, use any value to fill the lookup table holes.
Constant *DV = NeedMask ? ResultLists[PHI][0].second : DefaultResults[PHI];
-    SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultLists[PHI],
-                            DV, DL);
+    SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultList, DV, DL);

Value *Result = Table.BuildLookup(TableIndex, Builder);

@@ -4164,6 +4215,22 @@

  ReturnedEarly = true;
  break;
}

+
+      // Do a small peephole optimization: re-use the switch table compare if
+      // possible.
+      // This is similar to InstCombiner::FoldOpIntoPhi. FoldOpIntoPhi can't
+      // handle switch tables so we do it explicitly here.
+      if (!TableHasHoles && HasDefaultResults && BranchInst) {

I wish the body of this if statement could be extracted to a separate
utility function to keep the code here a little simpler. I feel if it
was just something like:

  if (BranchInst && HasDefaultResults && !TableHasHoles)
    reuseTableRangeCheck(...)

It would feel less intrusive. Maybe just move the search for the Cmp
instruction into reuseTableCompare.

+        Value *InvertedTableCmp = nullptr;
+        for (auto UI = PHI->user_begin(), E = PHI->user_end(); UI != E; ++UI) {

Could probably use a range-based for loop over PHI->users() instead.
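
For reference, the range-based form might look like this (a sketch, not the committed code):

  for (User *U : PHI->users()) {
    // Check if we have an icmp in the same block.
    if (auto *Cmp = dyn_cast<ICmpInst>(U))
      if (Cmp->getParent() == PHI->getParent())
        reuseTableCompare(Cmp, BranchInst, InvertedTableCmp, ResultList, DV);
  }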

+          // Check if we have an icmp in the same block.
+          ICmpInst *CmpInst = dyn_cast<ICmpInst>(*UI);
+          if (CmpInst && CmpInst->getParent() == PHI->getParent()) {
+            reuseTableCompare(CmpInst, BranchInst, InvertedTableCmp, ResultList,
+                              DV);
+          }
+        }
+      }

  PHI->addIncoming(Result, LookupBB);
}

Index: test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
===================================================================
--- test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll (revision 222430)
+++ test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll (working copy)
@@ -1078,3 +1078,93 @@
; CHECK-NEXT: ret i8 %switch.idx.cast
}

Please add a comment for each test to make it easier for a casual
reader to see what they're doing.

+define i32 @reuse_cmp1(i32 %x) {
+entry:
+ switch i32 %x, label %sw.default [
+ i32 0, label %sw.bb
+ i32 1, label %sw.bb1
+ i32 2, label %sw.bb2
+ i32 3, label %sw.bb3
+ ]
+sw.bb: br label %sw.epilog
+sw.bb1: br label %sw.epilog
+sw.bb2: br label %sw.epilog
+sw.bb3: br label %sw.epilog
+sw.default: br label %sw.epilog
+sw.epilog:
+ %r.0 = phi i32 [ 0, %sw.default ], [ 13, %sw.bb3 ], [ 12, %sw.bb2 ], [ 11, %sw.bb1 ], [ 10, %sw.bb ]
+ %cmp = icmp eq i32 %r.0, 0
+ br i1 %cmp, label %if.then, label %if.end
+if.then: br label %return
+if.end: br label %return
+return:
+ %retval.0 = phi i32 [ 100, %if.then ], [ %r.0, %if.end ]
+ ret i32 %retval.0
+; CHECK-LABEL: @reuse_cmp1(
+; CHECK: entry:
+; CHECK-NEXT: %switch.tableidx = sub i32 %x, 0
+; CHECK-NEXT: [[C:%.+]] = icmp ult i32 %switch.tableidx, 4
+; CHECK-NEXT: %inverted.cmp = xor i1 [[C]], true
+; CHECK: [[R:%.+]] = select i1 %inverted.cmp, i32 100, i32 {{.*}}
+; CHECK-NEXT: ret i32 [[R]]
+}
+
+define i32 @reuse_cmp2(i32 %x) {
+entry:
+ switch i32 %x, label %sw.default [
+ i32 0, label %sw.bb
+ i32 1, label %sw.bb1
+ i32 2, label %sw.bb2
+ i32 3, label %sw.bb3
+ ]
+sw.bb: br label %sw.epilog
+sw.bb1: br label %sw.epilog
+sw.bb2: br label %sw.epilog
+sw.bb3: br label %sw.epilog
+sw.default: br label %sw.epilog
+sw.epilog:
+ %r.0 = phi i32 [ 4, %sw.default ], [ 3, %sw.bb3 ], [ 2, %sw.bb2 ], [ 1, %sw.bb1 ], [ 0, %sw.bb ]
+ %cmp = icmp ne i32 %r.0, 4
+ br i1 %cmp, label %if.then, label %if.end
+if.then: br label %return
+if.end: br label %return
+return:
+ %retval.0 = phi i32 [ %r.0, %if.then ], [ 100, %if.end ]
+ ret i32 %retval.0
+; CHECK-LABEL: @reuse_cmp2(
+; CHECK: entry:
+; CHECK-NEXT: %switch.tableidx = sub i32 %x, 0
+; CHECK-NEXT: [[C:%.+]] = icmp ult i32 %switch.tableidx, 4
+; CHECK: [[R:%.+]] = select i1 [[C]], i32 {{.*}}, i32 100
+; CHECK-NEXT: ret i32 [[R]]
+}
+
+define i32 @no_reuse_cmp(i32 %x) {
+entry:
+ switch i32 %x, label %sw.default [
+ i32 0, label %sw.bb
+ i32 1, label %sw.bb1
+ i32 2, label %sw.bb2
+ i32 3, label %sw.bb3
+ ]
+sw.bb: br label %sw.epilog
+sw.bb1: br label %sw.epilog
+sw.bb2: br label %sw.epilog
+sw.bb3: br label %sw.epilog
+sw.default: br label %sw.epilog
+sw.epilog:
+ %r.0 = phi i32 [ 12, %sw.default ], [ 13, %sw.bb3 ], [ 12, %sw.bb2 ], [ 11, %sw.bb1 ], [ 10, %sw.bb ]
+ %cmp = icmp ne i32 %r.0, 0
+ br i1 %cmp, label %if.then, label %if.end
+if.then: br label %return
+if.end: br label %return
+return:
+ %retval.0 = phi i32 [ %r.0, %if.then ], [ 100, %if.end ]
+ ret i32 %retval.0
+; CHECK-LABEL: @no_reuse_cmp(
+; CHECK: [[S:%.+]] = select
+; CHECK-NEXT: %cmp = icmp ne i32 [[S]], 0
+; CHECK-NEXT: [[R:%.+]] = select i1 %cmp, i32 [[S]], i32 100
+; CHECK-NEXT: ret i32 [[R]]
+}
+

hans accepted this revision.Nov 26 2014, 5:03 PM
hans edited edge metadata.

Looks good to me.

lib/Transforms/Utils/SimplifyCFG.cpp
4006

This helper function should probably be static.

4006

Instead of passing both PhiUser and PhiBlock, can't we just pass in the PhiNode and iterate over the users ourselves? That way we get to extract some more code too, which I think would be good.

4041

ultra nit: Upper case and period is nice even in asserts.

This revision is now accepted and ready to land.Nov 26 2014, 5:03 PM
eeckstein closed this revision.Dec 1 2014, 3:06 AM

Committed in r222891. Note that I added a fix which checks that the guarding branch dominates the phi block.
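
A sketch of the kind of check that fix adds (hedged reconstruction; see r222891 for the actual code). The reuse is only safe if every path into the phi block passes through the guarding branch, which a simple predecessor walk can establish:

  // Bail out unless the block containing the guarding branch dominates the
  // phi block: every predecessor must be the branch block itself or be
  // reached only through it. (RangeCheckBranch and PhiBlock are
  // illustrative names.)
  BasicBlock *BranchBlock = RangeCheckBranch->getParent();
  for (auto PI = pred_begin(PhiBlock), E = pred_end(PhiBlock); PI != E; ++PI) {
    BasicBlock *Pred = *PI;
    if (Pred != BranchBlock && Pred->getUniquePredecessor() != BranchBlock)
      return;
  }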

lib/Transforms/Utils/SimplifyCFG.cpp
4006

If I move the phi-loop to reuseTableCompare, I cannot easily return in case the optimization cannot be done for a specific phi-user.
I would need another helper function, which I don't like either.
Therefore I kept it as it is.
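
To illustrate the point (a hedged sketch, not proposed code): with the user loop inside reuseTableCompare, the early 'return's taken when a check fails for one compare would all have to become 'continue's, or move into yet another per-compare helper:

  static void reuseTableCompare(PHINode *PHI, BranchInst *BR,
                                Value *&InvertedTableCmp /* , ... */) {
    for (User *U : PHI->users()) {
      auto *Cmp = dyn_cast<ICmpInst>(U);
      if (!Cmp || Cmp->getParent() != PHI->getParent())
        continue; // was an early 'return' in the per-compare version
      // ... every subsequent failed check must also 'continue' ...
    }
  }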