Improve scheduling by coalescing branches that depend on the same condition. This pass looks for blocks that are guarded by the same branch condition in the IR and attempts to merge them. It does so by moving code either up into the predecessor block or down into the successor block.
On power8 LE, we see an 11% improvement for lbm and a 28% improvement for mcf (SPEC2006).
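As a rough source-level picture of the intended effect, here is a minimal sketch. It is not the pass itself, which works on blocks rather than on C++ source. The function names beforeCoalescing and afterCoalescing are hypothetical and chosen only for this illustration; the constants mirror the test case in this description.

```cpp
// Hypothetical illustration only; these names do not appear in the patch.

// Before: the same condition guards three separate diamonds.
double beforeCoalescing(double a, double b, double c, int x) {
  bool test = (x == 0);
  double t1, t2, t3;
  if (test) t1 = a; else t1 = 2.0e-03;
  if (test) t2 = b; else t2 = 0.0;
  if (test) t3 = c; else t3 = 5.0e-03;
  return (t1 + t2) + t3;
}

// After: a single branch on the shared condition, with the guarded
// code from all three diamonds merged into the two arms.
double afterCoalescing(double a, double b, double c, int x) {
  bool test = (x == 0);
  double t1, t2, t3;
  if (test) {
    t1 = a;
    t2 = b;
    t3 = c;
  } else {
    t1 = 2.0e-03;
    t2 = 0.0;
    t3 = 5.0e-03;
  }
  return (t1 + t2) + t3;
}
```

Both functions compute the same result for every input; the second one evaluates the condition once instead of three times.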
I tried the following test on ARM and X86.
$ cat branchC.ll
; RUN: llc -mcpu=generic -mtriple=x86_64-unknown-linux -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -mtriple=armv6-unknown-linux-gnu < %s | FileCheck %s
; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu | FileCheck %s
; Function Attrs: nounwind
define double @testBranchCoal(double %a, double %b, double %c, i32 %x) {
entry:
  %test = icmp eq i32 %x, 0
  %tmp1 = select i1 %test, double %a, double 2.000000e-03
  %tmp2 = select i1 %test, double %b, double 0.000000e+00
  %tmp3 = select i1 %test, double %c, double 5.000000e-03
  %res1 = fadd double %tmp1, %tmp2
  %result = fadd double %res1, %tmp3
  ret double %result
}
This does not affect ARM since the LLVM IR produced does not conform to the pattern we expected.
For X86, the branches were not coalesced because the terminators produced contain implicit operands. This code only coalesces branches whose terminators contain explicit operands.
This was originally reported in: https://llvm.org/bugs/show_bug.cgi?id=25219
This line isn't correct. I have a case where instructions that "produce the same value" are different. The relevant sequence is:
The debug output produces:
While it would be safe to CSE those crmoves, what definitely cannot occur is to assume that the value of CR0GT has not changed between the two instructions.
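To make the general hazard concrete, here is a toy model. It is a sketch under stated assumptions, not the sequence from this review and not LLVM's API. ToyInst, sameOpcodeAndSources, redefinedBetween, run and toyProgram are all hypothetical names, and the values written to CR0GT are invented. It shows two copies that match on opcode and source register yet read different values, because the source is redefined between them.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction: either "dst = copy src" or "dst = set imm".
struct ToyInst {
  bool isCopy;
  std::string dst;
  std::string src;  // only meaningful when isCopy is true
  int imm;          // only meaningful when isCopy is false
};

// Structural comparison that ignores the destination register.
bool sameOpcodeAndSources(const ToyInst &a, const ToyInst &b) {
  if (a.isCopy != b.isCopy) return false;
  return a.isCopy ? a.src == b.src : a.imm == b.imm;
}

// True if any instruction strictly between indices first and second writes reg.
bool redefinedBetween(const std::vector<ToyInst> &prog, std::size_t first,
                      std::size_t second, const std::string &reg) {
  for (std::size_t i = first + 1; i < second; ++i)
    if (prog[i].dst == reg) return true;
  return false;
}

// Executes the program and records the value each instruction writes.
std::vector<int> run(const std::vector<ToyInst> &prog) {
  std::map<std::string, int> regs;
  std::vector<int> written;
  for (const ToyInst &inst : prog) {
    int value = inst.isCopy ? regs[inst.src] : inst.imm;
    regs[inst.dst] = value;
    written.push_back(value);
  }
  return written;
}

// Invented sequence: CR0GT is redefined between two otherwise matching copies.
std::vector<ToyInst> toyProgram() {
  return {
      {false, "CR0GT", "", 1},   // CR0GT = 1
      {true, "v1", "CR0GT", 0},  // v1 = copy CR0GT
      {false, "CR0GT", "", 0},   // CR0GT = 0
      {true, "v2", "CR0GT", 0},  // v2 = copy CR0GT
  };
}
```

In this model a structural match alone is not enough to treat the two copies as equal; a check along the lines of redefinedBetween on the source register is also needed.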