This is an archive of the discontinued LLVM Phabricator instance.

Remove the final bit test during lowering switch statement if all cases in bit test cover a contiguous range.
ClosedPublic

Authored by congh on Aug 21 2015, 2:01 PM.

Details

Summary

When lowering switch statement, if bit tests are used then LLVM will always generates a jump to the default statement in the last bit test. However, this is not necessary when all cases in bit tests cover a contiguous range. This is because when generating the bit tests header MBB, there is a range check that guarantees cases in bit tests won't go outside of [low, high], where low and high are minimum and maximum case values in the bit tests. This patch checks if this is the case and then doesn't emit jump to default statement and hence saves a bit test and a branch.

For example, given the following IR:

define void @foo(i32 %x) {
entry:
  switch i32 %x, label %return [
    i32 0, label %bb0
    i32 2, label %bb0
    i32 4, label %bb0
    i32 1, label %bb1
    i32 3, label %bb1
    i32 5, label %bb1
  ]
bb0: tail call void @a() br label %return
bb1: tail call void @b() br label %return
return: ret void
}
declare void @a()
declare void @b()

Previously LLVM generates the following MC:

BB#0: derived from LLVM BB %entry
    Live Ins: %EDI
	CMP32ri8 %EDI, 5, %EFLAGS<imp-def>
	JA_1 <BB#3>, %EFLAGS<imp-use>
    Successors according to CFG: BB#3(16) BB#1(16)

BB#1: derived from LLVM BB %entry
    Live Ins: %EDI
    Predecessors according to CFG: BB#0
	%EAX<def> = MOV32ri 21
	BT32rr %EAX<kill>, %EDI, %EFLAGS<imp-def>
	JAE_1 <BB#2>, %EFLAGS<imp-use>
    Successors according to CFG: BB#4(48) BB#2(48)

BB#4: derived from LLVM BB %bb0
    Predecessors according to CFG: BB#1
	TAILJMPd64 <ga:@a>, <regmask>, %RSP<imp-use>, %RSP<imp-use>

BB#2: derived from LLVM BB %entry
    Live Ins: %EDI
    Predecessors according to CFG: BB#1
	%EAX<def> = MOV32ri 42
	BT32rr %EAX<kill>, %EDI<kill>, %EFLAGS<imp-def>
	JAE_1 <BB#3>, %EFLAGS<imp-use>
    Successors according to CFG: BB#5(48) BB#3(16)

BB#5: derived from LLVM BB %bb1
    Predecessors according to CFG: BB#2
	TAILJMPd64 <ga:@b>, <regmask>, %RSP<imp-use>, %RSP<imp-use>

BB#3: derived from LLVM BB %return
    Predecessors according to CFG: BB#0 BB#2
	RETQ

With this patch the MC generated is:

BB#0: derived from LLVM BB %entry
    Live Ins: %EDI
	CMP32ri8 %EDI, 5, %EFLAGS<imp-def>
	JBE_1 <BB#1>, %EFLAGS<imp-use>
    Successors according to CFG: BB#3(16) BB#1(16)

BB#3: derived from LLVM BB %return
    Predecessors according to CFG: BB#0
	RETQ

BB#1: derived from LLVM BB %entry
    Live Ins: %EDI
    Predecessors according to CFG: BB#0
	%EAX<def> = MOV32ri 21
	BT32rr %EAX<kill>, %EDI<kill>, %EFLAGS<imp-def>
	JAE_1 <BB#2>, %EFLAGS<imp-use>
    Successors according to CFG: BB#4(48) BB#2(48)

BB#4: derived from LLVM BB %bb0
    Predecessors according to CFG: BB#1
	TAILJMPd64 <ga:@a>, <regmask>, %RSP<imp-use>, %RSP<imp-use>

BB#2: derived from LLVM BB %bb1
    Predecessors according to CFG: BB#1
	TAILJMPd64 <ga:@b>, <regmask>, %RSP<imp-use>, %RSP<imp-use>

We can save one basic block with 3 instructions.

Diff Detail

Event Timeline

congh updated this revision to Diff 32861.Aug 21 2015, 2:01 PM
congh retitled this revision from to Remove the final bit test during lowering switch statement if all cases in bit test cover a contiguous range..
congh updated this object.
congh added reviewers: hfinkel, spatel, davidxl.
congh updated this object.
congh added a subscriber: llvm-commits.
davidxl edited edge metadata.Aug 21 2015, 3:47 PM

I have not looked at the details. Does your patch work for the case when a min value is not zero:

define void @foo(i32 %x) {
entry:

switch i32 %x, label %return [
  i32 1, label %bb0
  i32 3, label %bb0
  i32 5, label %bb0
  i32 2, label %bb1
  i32 4, label %bb1
  i32 6, label %bb1
]

bb0: tail call void @a() br label %return
bb1: tail call void @b() br label %return
return: ret void
}
declare void @a()
declare void @b()

congh added a comment.Aug 21 2015, 3:52 PM

I have not looked at the details. Does your patch work for the case when a min value is not zero:

define void @foo(i32 %x) {
entry:

switch i32 %x, label %return [
  i32 1, label %bb0
  i32 3, label %bb0
  i32 5, label %bb0
  i32 2, label %bb1
  i32 4, label %bb1
  i32 6, label %bb1
]

bb0: tail call void @a() br label %return
bb1: tail call void @b() br label %return
return: ret void
}
declare void @a()
declare void @b()

Yes. In visitBitTestHeader(), a subtraction is always generated to subtract the min value, turning the min value into zero.

The subtraction is generated with your patch? Without your patch, the subtraction is not generated unless the range is outside 64 bit.

congh added a comment.Aug 21 2015, 4:07 PM

No, the subtraction is not from this patch. Originally, if the value can fit in 64-bit, the min value will be set to zero before visitBitTestHeader() is called, so a subtraction by zero will be eliminated. In this patch, when the value can fit in 64-bit, I check if the value range is contiguous. If it is, I assume we can save more than a subtraction with the optimization done in this patch. Therefore I still do the subtraction in this case, and the branch to the default statement will be eliminated. Otherwise, the subtraction will not be done as there is always a branch to the default statement.

If subtraction is always done (when it is contiguous), it is always a win? There are cases the last test is rarely taken.

congh added a comment.Aug 21 2015, 4:42 PM

If subtraction is always done (when it is contiguous), it is always a win? There are cases the last test is rarely taken.

OK. So should I calculated the potential win by taking the probability of the last test into consideration? Or if the value can fit in 64-bit we don't to this optimization?

I think it should be an optimization that always wins -- the following conditions need to be satisfied:

  1. value range is continuous AND
  2. min == 0 OR the value range is outside 64 (or whatever largest width supported by the target)
congh added a comment.Aug 21 2015, 5:01 PM

I think it should be an optimization that always wins -- the following conditions need to be satisfied:

  1. value range is continuous AND
  2. min == 0 OR the value range is outside 64 (or whatever largest width supported by the target)

Agree. I will update the patch later.

congh updated this revision to Diff 32887.Aug 21 2015, 5:04 PM
congh edited edge metadata.

Update the patch by disabling the optimization in this patch when the case value can fit in a word and the minimum value of the value range is not zero (in which case we can save a subtraction).

looks fine to me. Needs to be examined by other reviewers. (suggest adding hans)

congh added a reviewer: hans.Aug 25 2015, 11:06 AM

looks fine to me. Needs to be examined by other reviewers. (suggest adding hans)

I have added Hans as a reviewer. Thanks for the review!

hans edited edge metadata.Aug 25 2015, 11:25 AM

Looks great! I like this :-)

Please add tests for the cases where this doesn't fire too.

test/CodeGen/X86/switch.ll
485

I think this means the optimization affects both bit-test tests in this file. It would be great to have tests where it doesn't fire too, either because the case range is not covered, or because of the "fits in machine word, so optimize away subtraction" case.

congh updated this revision to Diff 33113.Aug 25 2015, 1:31 PM
congh edited edge metadata.

Add two test cases on which the optimization in this patch won't be done.

hans accepted this revision.Aug 25 2015, 1:40 PM
hans edited edge metadata.

Thanks! lgtm

test/CodeGen/X86/switch.ll
144

nit: The newline seems unnecessary here. I guess it was just copied from line 110, so maybe remote it there too? Also in bt_is_better3.

This revision is now accepted and ready to land.Aug 25 2015, 1:40 PM
congh added inline comments.Aug 25 2015, 1:48 PM
test/CodeGen/X86/switch.ll
485

I have added two test cases covering two cases you mentioned. Thank you for the suggestion!

congh updated this revision to Diff 33117.Aug 25 2015, 1:49 PM
congh edited edge metadata.

Remove three blank lines from switch.ll.

test/CodeGen/X86/switch.ll
144

Removed. Thanks!

This revision was automatically updated to reflect the committed changes.