Improve scheduling by coalescing branches that depend on the same condition. This pass looks for blocks that are guarded by the same branch condition in the IR and attempts to merge them. It does this by moving code either up into its predecessor block or down into its successor block.
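As a rough illustration of the idea only (not the pass's actual implementation), the C++ sketch below models each guarded block as a condition plus a list of instructions and merges consecutive regions that test the same condition. `GuardedRegion`, `canCoalesce`, and `coalesce` are hypothetical names. The sketch assumes conditions are compared by name and that no unguarded code sits between the regions, which is exactly the code the real pass has to move up or down.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a block guarded by a branch: the condition it tests
// and the instructions that run under that condition.
struct GuardedRegion {
  std::string condition;          // e.g. "%test"
  std::vector<std::string> body;  // instructions guarded by `condition`
};

// Assumption: two guards are "the same condition" when they name the same value.
bool canCoalesce(const GuardedRegion &first, const GuardedRegion &second) {
  return first.condition == second.condition;
}

// Merge consecutive regions that test the same condition, so the condition is
// branched on once instead of once per region. Assumes no unguarded code sits
// between regions; the real pass has to move such code up or down instead.
std::vector<GuardedRegion> coalesce(const std::vector<GuardedRegion> &regions) {
  std::vector<GuardedRegion> merged;
  for (const GuardedRegion &region : regions) {
    assert(!region.condition.empty() && "every region needs a guard");
    if (!merged.empty() && canCoalesce(merged.back(), region)) {
      // Same guard as the previous region: append this body to it.
      std::vector<std::string> &body = merged.back().body;
      body.insert(body.end(), region.body.begin(), region.body.end());
    } else {
      merged.push_back(region);
    }
  }
  return merged;
}
```

For instance, regions guarded by `%test`, `%test`, and `%other` reduce to two regions, the first carrying both `%test` bodies.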
On POWER8 LE, we see an 11% improvement in lbm and a 28% improvement in mcf (SPEC2006).
I tried the following test on ARM and X86.
$ cat branchC.ll
; RUN: llc -mcpu=generic -mtriple=x86_64-unknown-linux -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -mtriple=armv6-unknown-linux-gnu < %s | FileCheck %s
; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu | FileCheck %s
; Function Attrs: nounwind
define double @testBranchCoal(double %a, double %b, double %c, i32 %x) {
entry:
  %test = icmp eq i32 %x, 0
  %tmp1 = select i1 %test, double %a, double 2.000000e-03
  %tmp2 = select i1 %test, double %b, double 0.000000e+00
  %tmp3 = select i1 %test, double %c, double 5.000000e-03
  %res1 = fadd double %tmp1, %tmp2
  %result = fadd double %res1, %tmp3
  ret double %result
}
This does not affect ARM, since the LLVM IR produced for it does not match the pattern the pass expects.
For X86, the branches were not coalesced because the terminators produced contain implicit operands. This pass only coalesces branches whose terminators have no implicit operands.
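A minimal sketch of that restriction, again as a toy C++ model rather than the pass's code: `Operand`, `Terminator`, `hasOnlyExplicitOperands`, and `sameBranchCondition` are hypothetical names, branch-target operands are left out, and conditions are compared by register name. An X86-style conditional jump is assumed to read the flags register as an implicit operand, which is what makes it fail the check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical operand model, not LLVM's MachineOperand.
struct Operand {
  std::string reg;  // e.g. "%vreg140" or "%EFLAGS"
  bool implicit;    // true for operands printed as <imp-use>
};

// Hypothetical conditional-branch terminator. Branch targets are omitted
// because only the condition matters for deciding whether to coalesce.
struct Terminator {
  std::string opcode;  // e.g. "BC"
  std::vector<Operand> operands;
};

// The restriction described above: reject any terminator with an implicit operand.
bool hasOnlyExplicitOperands(const Terminator &t) {
  for (const Operand &op : t.operands)
    if (op.implicit)
      return false;
  return true;
}

// Two terminators branch on the same condition if both pass the filter and
// their opcodes and operand registers match pairwise.
bool sameBranchCondition(const Terminator &a, const Terminator &b) {
  if (!hasOnlyExplicitOperands(a) || !hasOnlyExplicitOperands(b))
    return false;
  if (a.opcode != b.opcode || a.operands.size() != b.operands.size())
    return false;
  for (std::size_t i = 0; i < a.operands.size(); ++i)
    if (a.operands[i].reg != b.operands[i].reg)
      return false;
  return true;
}
```

With this filter, two `BC %vreg140` terminators compare equal, while a jump whose only operand is an implicit `%EFLAGS` use is rejected before any comparison happens.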
This was originally reported in https://llvm.org/bugs/show_bug.cgi?id=25219
This line isn't correct. I have a case where instructions that supposedly "produce the same value" actually produce different values. The relevant sequence is:
BB#16:
    ...
    BCTRL8_LDinto_toc ...
    %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
    %vreg141<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg141 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
    %vreg142<def> = XXLXORdpz; F8RC:%vreg142
    BC %vreg140, <BB#73>; CRBITRC:%vreg140

BB#72: derived from LLVM BB %114
    Predecessors according to CFG: BB#16
    Successors according to CFG: BB#73(?%)

BB#73: derived from LLVM BB %114
    Predecessors according to CFG: BB#16 BB#72
    %vreg143<def> = PHI %vreg142, <BB#72>, %vreg141, <BB#16>; F8RC:%vreg143,%vreg142,%vreg141
    ...
    BCTRL8_LDinto_toc ...
    %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
    %vreg150<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg150 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
    BC %vreg149, <BB#75>; CRBITRC:%vreg149
    Successors according to CFG: BB#74(?%) BB#75(?%)

BB#74: derived from LLVM BB %114
    Predecessors according to CFG: BB#73
    Successors according to CFG: BB#75(?%)

BB#75: derived from LLVM BB %114
    Predecessors according to CFG: BB#73 BB#74
    %vreg151<def> = PHI %vreg142, <BB#74>, %vreg150, <BB#73>; F8RC:%vreg151,%vreg142,%vreg150

The debug output produces:
While it would be safe to CSE those crmoves, the pass definitely cannot assume that the value of CR0GT is unchanged between the two instructions: a call (BCTRL8_LDinto_toc) executes between them.
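The check this comment asks for can be sketched as follows. This is a hypothetical, straight-line C++ model of the path from BB#16 into BB#73, not LLVM's MachineInstr API: `Instr`, `isPhysReg`, `clobberedBetween`, and `produceSameValue` are illustrative names, physical registers are assumed to be the names without "vreg", and the call is modeled as defining %CR0GT on the assumption that CR0 is not preserved across calls. Memory is not modeled, so the two LXSDX loads would need a separate check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction model (not LLVM's MachineInstr): opcode plus the
// registers it reads and writes, with call clobbers listed as defs.
struct Instr {
  std::string opcode;             // e.g. "COPY" or "BCTRL8_LDinto_toc"
  std::vector<std::string> uses;  // registers read, e.g. "%CR0GT"
  std::vector<std::string> defs;  // registers written or clobbered
};

// Assumption: virtual registers contain "vreg"; everything else is physical.
bool isPhysReg(const std::string &reg) {
  return reg.find("vreg") == std::string::npos;
}

// True if any instruction strictly between positions `from` and `to` writes `reg`.
bool clobberedBetween(const std::vector<Instr> &code, std::size_t from,
                      std::size_t to, const std::string &reg) {
  for (std::size_t i = from + 1; i < to; ++i)
    for (const std::string &def : code[i].defs)
      if (def == reg)
        return true;
  return false;
}

// Instructions at positions a < b produce the same value only if they have the
// same opcode and inputs AND no physical register they read is redefined in
// between. Virtual registers are in SSA form, so they cannot change.
bool produceSameValue(const std::vector<Instr> &code, std::size_t a,
                      std::size_t b) {
  assert(a < b && b < code.size() && "positions must be ordered and in range");
  const Instr &first = code[a];
  const Instr &second = code[b];
  if (first.opcode != second.opcode || first.uses != second.uses)
    return false;
  for (const std::string &reg : first.uses)
    if (isPhysReg(reg) && clobberedBetween(code, a, b, reg))
      return false;
  return true;
}
```

Under this model, the two `COPY %CR0GT` instructions are treated as equivalent only when nothing between them writes %CR0GT; with the intervening call, they are not.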