Improve scheduling with branch coalescing
ClosedPublic

Authored by lei on Jan 3 2017, 12:39 PM.

Details

Summary

Improve scheduling by coalescing branches that depend on the same condition. This pass looks for blocks that are guarded by the same branch condition in the IR and attempts to merge the blocks together. This is done by moving code either up/down to it’s predecessor/successor blocks.

On power8 LE, we see a 11% improvement for lbm and 28% improvement for mcf (SPEC2006).

I tried the following test on ARM and X86.

$ cat branchC.ll
; RUN: llc -mcpu=generic -mtriple=x86_64-unknown-linux -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -mtriple=armv6-unknown-linux-gnu < %s | FileCheck %s
; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu | FileCheck %s

; Function Attrs: nounwind
define double @testBranchCoal(double %a, double %b, double %c, i32 %x) {
entry:

%test = icmp eq i32 %x, 0
%tmp1 = select i1 %test, double %a, double 2.000000e-03
%tmp2 = select i1 %test, double %b, double 0.000000e+00
%tmp3 = select i1 %test, double %c, double 5.000000e-03

%res1 = fadd double %tmp1, %tmp2
%result = fadd double %res1, %tmp3
ret double %result

}

This does not affect ARM since the LLVM IR produced does not conform to the pattern we expected.
For X86, the branches were not coalesced since the terminator produced contain implicit operands. This code will only coalesce branches whose terminators contain explicit operands.

This is originally reported in: https://llvm.org/bugs/show_bug.cgi?id=25219

Diff Detail

Repository
rL LLVM
There are a very large number of changes, so older changes are hidden. Show Older Changes
lei added inline comments.Jan 24 2017, 12:50 PM
lib/CodeGen/BranchCoalescing.cpp
409 ↗(On Diff #82932)

will update to use range version of splice()

lei marked an inline comment as done.Jan 24 2017, 9:01 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
608 ↗(On Diff #82932)

The same kind of checks are needed for function canMerge and mergeCandidate, Put the checks into function validateCandidates() and call it to verify MBBs and do an early exit if returns false.

lei marked an inline comment as done.Jan 25 2017, 8:46 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
442 ↗(On Diff #82932)

will remove this redundant check since MI.defs() return register definitions.

lei updated this revision to Diff 85784.Jan 25 2017, 10:55 AM

Updated patch to address all the review comments from Nemanja.

lei updated this revision to Diff 86082.Jan 27 2017, 11:21 AM

Add guard to only verify MF which this pass have modified.

lei added a comment.Feb 7 2017, 10:46 AM

Just a friendly reminder that this patch is waiting to be reviewed. Thanks!

kbarton accepted this revision.Feb 8 2017, 12:10 PM

Aside from a couple minor comments, this LGTM.

It would be good if @hfinkel or @echristo could take a look also.

lib/CodeGen/BranchCoalescing.cpp
119 ↗(On Diff #86082)

Branch Coalescing not Branch Coalesce

753 ↗(On Diff #86082)

Did we conclude whether this is different from the checks done by the verifyAfter parameter in addPass?

This revision is now accepted and ready to land.Feb 8 2017, 12:10 PM

I'll look today.

In general pretty happy with this. A few comments inline.

A few top level comments:

What performance testing have you done with this?
Looks like SPEC on ppc?
Anything else?
Same for correctness checking. You mentioned that this doesn't affect ARM and then change a thumb testcase, what's up there?

Thanks!

-eric

lib/CodeGen/BranchCoalescing.cpp
31 ↗(On Diff #86082)

Let's have this default to off for the initial commit and then we can turn it on in a subsequent one.

230 ↗(On Diff #86082)

Probably should be something like "canCoalesceBranch" instead of analyzeBranch. All of the analysis is being done by analyzeBranch really and this is actually figuring out whether or not we should coalesce.

252 ↗(On Diff #86082)

Meta comment on the DEBUG statements. I like them, but it might be nice to give a summary of what you're analyzing here before all of the "can't do it" statements.

289–303 ↗(On Diff #86082)

I believe the case you're looking for here is also whether or not there's a single fall through? Might be nice to reorganize the code with that in mind.

363 ↗(On Diff #86082)

Could use an explanation of why - it confused Nemanja at first and could confuse others.

612 ↗(On Diff #86082)

llvm_unreachable

642 ↗(On Diff #86082)

Nit: All comments should be complete sentences. (A few other occurrences)

663–666 ↗(On Diff #86082)

This set of comments is very confusing.

lei added a comment.EditedFeb 13 2017, 12:55 PM

In general pretty happy with this. A few comments inline.

A few top level comments:

What performance testing have you done with this?
Looks like SPEC on ppc?
Anything else?
Same for correctness checking. You mentioned that this doesn't affect ARM and then change a thumb testcase, what's up there?

Thanks!

-eric

Yes, the only performance tests done was SPEC on PPC.

Seems I was testing with the wrong triple for ARM. The -mtriple I was testing with was "-mtriple=armv6-unknown-linux-gnu" which generated a pattern that we do not recognize. The test case below uses
-mtriple=thumb-apple-darwin and -mtriple=thumb-pc-linux-gnueabi. The select statements for those triples does generate the pattern we are looking for.

I have updated the batch to be turned off by default.

lei added inline comments.Feb 13 2017, 1:03 PM
lib/CodeGen/BranchCoalescing.cpp
31 ↗(On Diff #86082)

okay

119 ↗(On Diff #86082)

okay

230 ↗(On Diff #86082)

true

252 ↗(On Diff #86082)

okay

lei marked 5 inline comments as done.Feb 20 2017, 8:33 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
363 ↗(On Diff #86082)

Sine PHI node ordering can not be assumed, it doesn't really matter where we place the PHI instructions. Will update comment to reflect this.

lei marked 5 inline comments as done.Feb 20 2017, 10:03 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
663–666 ↗(On Diff #86082)

okay

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

lei marked 5 inline comments as done.Feb 21 2017, 10:11 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
612 ↗(On Diff #86082)

okay

lei marked 3 inline comments as done.Feb 21 2017, 10:43 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
642 ↗(On Diff #86082)

k.

753 ↗(On Diff #86082)

Yes. This is different from the checks done by the verifyAfter parameter in addPass. The verifyAfter parameter for this pass is set to true by default.

lei marked 4 inline comments as done.Feb 21 2017, 11:00 AM

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

It is not true in general that selects get converted to branches on PowerPC. For example, integer selects can be lowered into an actual "integer select" instruction under the "right" circumstances.
I'm sure there is a way to prevent the selects from being produced, but I am not sure this would have any effect on this optimization. For the specific test case that is added as part of this patch, it appears that SROA produces the selects. I haven't personally looked at the semantics of SROA and the heuristics it uses to produce a select or not, but there is presumably a way to tell it not to do so.

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

It is not true in general that selects get converted to branches on PowerPC. For example, integer selects can be lowered into an actual "integer select" instruction under the "right" circumstances.
I'm sure there is a way to prevent the selects from being produced, but I am not sure this would have any effect on this optimization. For the specific test case that is added as part of this patch, it appears that SROA produces the selects. I haven't personally looked at the semantics of SROA and the heuristics it uses to produce a select or not, but there is presumably a way to tell it not to do so.

In general, we want to produce selects rather than explicit branching due to superblock formation etc. On Power9 we'll find that the integer select instruction is also much faster than on Power8 and so will experience less of a penalty as well.

You marked things done that aren't done. What's going on? :)

-eric

lib/CodeGen/BranchCoalescing.cpp
230 ↗(On Diff #86082)

You've marked this done, but it's not done?

lei marked an inline comment as done.Feb 22 2017, 11:49 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
289–303 ↗(On Diff #86082)

okay.

lei marked an inline comment as done.Feb 22 2017, 11:51 AM

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

lei marked an inline comment as not done.Feb 22 2017, 11:53 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
289–303 ↗(On Diff #86082)

Will reorganize this.

In D28249#683815, @lei wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

Thanks :)

I very much look at it to make sure everything I asked for was done before coming back to reviews.

lei updated this revision to Diff 89397.Feb 22 2017, 12:09 PM

Address review comments.

lei marked 2 inline comments as done.Feb 22 2017, 12:10 PM
In D28249#683815, @lei wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

Thanks :)

I very much look at it to make sure everything I asked for was done before coming back to reviews.

Now it's really done :)

Bunch of inline comments.

Thanks!

-eric

include/llvm/CodeGen/Passes.h
406 ↗(On Diff #89397)

Nit: Complete sentences in comments please.

lib/CodeGen/BranchCoalescing.cpp
352 ↗(On Diff #89397)

"beginning"

367 ↗(On Diff #89397)

"contains"

392–393 ↗(On Diff #89397)

This seems like an odd comment.

404 ↗(On Diff #89397)

I think it makes just as much sense (and is more readable) if you split the function into canMoveToBeginning and canMoveToEnd and just drop the boolean parameter.

It'll also make it easier to split the docs.

453–466 ↗(On Diff #89397)

Possible you just want to replace this with a bunch of llvm_unreachables?

479 ↗(On Diff #89397)

Why?

487 ↗(On Diff #89397)

Sounds like an assert.

516 ↗(On Diff #89397)

Seems like you want to swap these conditionals?

534 ↗(On Diff #89397)

else if? Otherwise what happens?

757 ↗(On Diff #89397)

This was to be turned on in a subsequent patch. Remove this and the ppc support please :)

378 ↗(On Diff #82932)

I don't think this has ever been changed, but I'm not against a separate function. That said, doesn't transferSuccessorsAndUpdatePHIs already do this?

Also, this loop needs documentation :)

393 ↗(On Diff #82932)

Should probably assert this somewhere.

lei added inline comments.Feb 24 2017, 8:35 AM
include/llvm/CodeGen/Passes.h
406 ↗(On Diff #89397)

okay

lib/CodeGen/BranchCoalescing.cpp
392–393 ↗(On Diff #89397)

A MI instruction can only be moved to TargetMBB if there are no uses of it within the TargetMBB's PHI nodes. Will reword the comment.

404 ↗(On Diff #89397)

agreed.

453–466 ↗(On Diff #89397)

yup

479 ↗(On Diff #89397)

Maybe preference is the wrong word here... this is by design. Will update the doc

487 ↗(On Diff #89397)

we do assert on that, will update the doc here.

534 ↗(On Diff #89397)

No else if. We check conditions to move up and down separately and verify that only one direction is valid.

378 ↗(On Diff #82932)

It's because here we only want to move the PHIs from SourceMBB to TargetMBB. SourceMBB will be deleted later. We don't want to transfer the successor info here. Control flow need to be updated once we have deleted SourceMBB.

393 ↗(On Diff #82932)

We don't assert cause we first check one direction (canMoveToBeginning) and then check the other direction (canMoveToEnd). If both fails, we just don't move the instruction and move to investigate the next one.

lei updated this revision to Diff 89775.Feb 24 2017, 10:50 PM

Address review comments.

lei marked 12 inline comments as done.Feb 24 2017, 10:54 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
757 ↗(On Diff #89397)

see line 161

lei marked an inline comment as done.Feb 24 2017, 10:56 PM
echristo accepted this revision.Feb 28 2017, 1:33 PM

LGTM.

Thanks! Now let's get some performance testing and go from there.

-eric

This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Apr 10 2017, 4:34 PM

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.
  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.
  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?
  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

Hi Matthias,

I'll let Lei talk to more of it, however...

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.

Would you prefer s/IR/MIR? :)

That said, it's definitely meant to be an MI pass.

  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.

Yep. The arm and mips backends also do the same thing.

  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?

I agree up to a point, though I don't have any particular ideas here how to fix it. My thought was "let the backends expand into the easiest code as possible and then clean it up with a generic pass", but I'm definitely open to different ideas.

  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

It's not so much that as the original patch was on by default and required changes to the ARM testcases, but the motivating example was particularly poor code on Power. We could probably throw some extra tests in, but it seems like a stretch to make all of the examples quite so universal? The code itself, of course, has no backend dependencies or even TTI.

Thoughts?

-eric

To say this first: I'm fine with the current implementation.

Hi Matthias,

I'll let Lei talk to more of it, however...

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.

Would you prefer s/IR/MIR? :)

This comment was mostly about the description of this patch which was confusing as it only talked about IR. Actually looking at the code the documentation comment above the class is a lot better and explains the problem just fine.

That said, it's definitely meant to be an MI pass.

  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.

Yep. The arm and mips backends also do the same thing.

  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?

I agree up to a point, though I don't have any particular ideas here how to fix it. My thought was "let the backends expand into the easiest code as possible and then clean it up with a generic pass", but I'm definitely open to different ideas.

Again no hard rule. My intuition would be that we would get by with less complexity and less code by improving expandisel pseudos; I would expect that matching the pattern and checking for correctness should be quite a bit easier there. But this isn't a big issue and now that Lei already went through the trouble of creating a generic pass we may just as well keep it.

  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

It's not so much that as the original patch was on by default and required changes to the ARM testcases, but the motivating example was particularly poor code on Power. We could probably throw some extra tests in, but it seems like a stretch to make all of the examples quite so universal? The code itself, of course, has no backend dependencies or even TTI.

Please ignore this comment. I was mainly triggered by "After Branch Coalescing" showing up in -print-after-all dumps but the pass never changing anything and the comments above about this only working on powerpc; Turns out the pass just checks the enable-branch-coalesce isn't passed as a commandline flag. I think most passes have a cl::opt in TargetPassConfig.cpp instead and we don't even add the pass the pipeline if it is disable, you could change to that style too to avoid people seeing the dump for a disabled pass :)

Good to hear this is a generic pass, I am pretty sure the code equally bad on the other targets and it is good that we fixed it. I'll add a ticket into our bugtracker to experiment/evaluate on X86/AArch64.

  • Matthias
lei added a comment.Apr 28 2017, 7:59 AM

I think most passes have a cl::opt in TargetPassConfig.cpp instead and we don't even add the pass the pipeline if it is disable, you could change to that style too to avoid people seeing the dump for a disabled pass :)

Will address this in the next patch to turn this on by default for PowerPC.
Thanks

iteratee added inline comments.
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340

This line isn't correct. I have a case where instructions that "produce the same value" are different. The relevant sequence is:

BB#16: 
...
        BCTRL8_LDinto_toc
...
        %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
        %vreg141<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg141 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
        %vreg142<def> = XXLXORdpz; F8RC:%vreg142
        BC %vreg140, <BB#73>; CRBITRC:%vreg140

BB#72: derived from LLVM BB %114
    Predecessors according to CFG: BB#16
    Successors according to CFG: BB#73(?%)

BB#73: derived from LLVM BB %114
    Predecessors according to CFG: BB#16 BB#72
        %vreg143<def> = PHI %vreg142, <BB#72>, %vreg141, <BB#16>; F8RC:%vreg143,%vreg142,%vreg141
...
        BCTRL8_LDinto_toc
...
        %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
        %vreg150<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg150 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
        BC %vreg149, <BB#75>; CRBITRC:%vreg149
    Successors according to CFG: BB#74(?%) BB#75(?%)

BB#74: derived from LLVM BB %114
    Predecessors according to CFG: BB#73
    Successors according to CFG: BB#75(?%)

BB#75: derived from LLVM BB %114
    Predecessors according to CFG: BB#73 BB#74
        %vreg151<def> = PHI %vreg142, <BB#74>, %vreg150, <BB#73>; F8RC:%vreg151,%vreg142,%vreg150

The debug output produces:

Valid Candidate
Op1: 1024
Op2: 1024
Op1 and Op2 are identical!
Op1: %vreg140
Op2: %vreg149
Op1Def: %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
 and %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
 produce the same value!

While it would be safe to CSE those crmoves, what definitely cannot occur is to assume that the value of CR0GT has not changed between the 2 instructions.

hfinkel added inline comments.Aug 30 2017, 4:33 PM
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340

Indeed; I think that we need to filter out instructions with physical-register uses (even if identical). We might be able to be a little smarter about it in a similar way to MachineLICM (the other uses of this function), which does this:

// Don't hoist an instruction that uses or defines a physical register.
if (TargetRegisterInfo::isPhysicalRegister(Reg)) {
  if (MO.isUse()) {
    // If the physreg has no defs anywhere, it's just an ambient register
    // and we can freely move its uses. Alternatively, if it's allocatable,
    // it could get allocated to something with a def during allocation.
    // However, if the physreg is known to always be caller saved/restored
    // then this use is safe to hoist.
    if (!MRI->isConstantPhysReg(Reg) &&
        !(TRI->isCallerPreservedPhysReg(Reg, *I.getParent()->getParent())))
        return false;
    // Otherwise it's safe to move.
    continue;
  } else if (!MO.isDead()) {
    // A def that isn't dead. We can't move it.
    return false;
  ...
lei added a comment.Sep 6 2017, 7:34 PM
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340