HomePhabricator

[X86] condition branches folding for three-way conditional codes

Authored by xur on Oct 8 2018, 11:52 AM.

Description

[X86] condition branches folding for three-way conditional codes

This patch implements a pass that optimizes condition branches on x86 by
taking advantage of the three-way conditional code generated by compare
instructions.

Currently, it tries to hoisting EQ and NE conditional branch to a dominant
conditional branch condition where the same EQ/NE conditional code is
computed. An example:
bb_0:

cmp %0, 19
jg bb_1
jmp bb_2

bb_1:

cmp %0, 40
jg bb_3
jmp bb_4

bb_4:

cmp %0, 20
je bb_5
jmp bb_6

Here we could combine the two compares in bb_0 and bb_4 and have the
following code:

bb_0:

cmp %0, 20
jg bb_1
jl bb_2
jmp bb_5

bb_1:

cmp %0, 40
jg bb_3
jmp bb_6

For the case of %0 == 20 (bb_5), we eliminate two jumps, and the control height
for bb_6 is also reduced. bb_4 is gone after the optimization.

This optimization is motivated by the branch pattern generated by the switch
lowering: we always have pivot-1 compare for the inner nodes and we do a pivot
compare again the leaf (like above pattern).

This pass currently is enabled on Intel's Sandybridge and later arches. Some
reviewers pointed out that on some arches (like AMD Jaguar), this pass may
increase branch density to the point where it hurts the performance of the
branch predictor.

Differential Revision: https://reviews.llvm.org/D46662

llvm-svn: 343993

Details