This is an archive of the discontinued LLVM Phabricator instance.

Improve scheduling with branch coalescing
ClosedPublic

Authored by lei on Jan 3 2017, 12:39 PM.

Details

Summary

Improve scheduling by coalescing branches that depend on the same condition. This pass looks for blocks that are guarded by the same branch condition in the IR and attempts to merge the blocks together. This is done by moving code either up/down to it’s predecessor/successor blocks.

On power8 LE, we see a 11% improvement for lbm and 28% improvement for mcf (SPEC2006).

I tried the following test on ARM and X86.

$ cat branchC.ll
; RUN: llc -mcpu=generic -mtriple=x86_64-unknown-linux -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -mtriple=armv6-unknown-linux-gnu < %s | FileCheck %s
; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu | FileCheck %s

; Function Attrs: nounwind
define double @testBranchCoal(double %a, double %b, double %c, i32 %x) {
entry:

%test = icmp eq i32 %x, 0
%tmp1 = select i1 %test, double %a, double 2.000000e-03
%tmp2 = select i1 %test, double %b, double 0.000000e+00
%tmp3 = select i1 %test, double %c, double 5.000000e-03

%res1 = fadd double %tmp1, %tmp2
%result = fadd double %res1, %tmp3
ret double %result

}

This does not affect ARM since the LLVM IR produced does not conform to the pattern we expected.
For X86, the branches were not coalesced since the terminator produced contain implicit operands. This code will only coalesce branches whose terminators contain explicit operands.

This is originally reported in: https://llvm.org/bugs/show_bug.cgi?id=25219

Diff Detail

Repository
rL LLVM

Event Timeline

lei updated this revision to Diff 82932.Jan 3 2017, 12:39 PM
lei retitled this revision from to Improve scheduling with branch coalescing.
lei updated this object.
lei added reviewers: Carrot, nemanjai, kbarton, echristo.
lei added a subscriber: llvm-commits.
lei added a reviewer: hfinkel.Jan 3 2017, 12:44 PM
lei added subscribers: jtony, sfertile, syzaara, craig.topper.
nemanjai edited edge metadata.Jan 9 2017, 5:09 PM

These inline comments are based on a first-pass reading of the patch. As such, they refer to issues that are rather local in scope. This is a large patch so I'll set aside some more time to fully understand it in its entirety in the next review cycle.

lib/CodeGen/BranchCoalescing.cpp
217 ↗(On Diff #82932)

We want to reset the stats for every function? I would have assumed we're interested in collecting this for the entire module.

227 ↗(On Diff #82932)

I am personally not a fan of functions that modify an output parameter and then fail. Presumably this makes no difference now, but if we end up extending this pass in the future to have some backtracking/requeuing capabilities, it would be nice if the object wasn't modified/invalidated in some way.

336 ↗(On Diff #82932)

Use same debug format for both (i.e. space after the colon).

344 ↗(On Diff #82932)

Won't this fail (as in assert) if Op2.isReg() != true?

356 ↗(On Diff #82932)

Maybe a comment as to why we fail here if we get physical registers. Also, if I follow the control flow here correctly, we will return true here if one is a physical register and the other is a virtual register. Is that what we want? I imagine not.

378 ↗(On Diff #82932)

Not that it makes a difference computationally, but perhaps for consistency:

for (MachineBasicBlock::instr_iterator Instr : MBB.instrs())
  if (Instr->isPHI())
    return Instr;
  return MBB.instr_end();

Also, I'm not sure what the style guide says regarding the use of auto there, but perhaps instead of MachineBasicBlock::instr_iterator, it should be auto.

Finally, if every use of this function will just use begin() instead of end() when there are no existing PHI's, perhaps it would make sense to actually return begin() if there are no PHI's in the MBB.

393 ↗(On Diff #82932)

Can you please state why something like MachineBasicBlock::SkipPHIsLabelsAndDebug() cannot be used instead of this function (thereby eliminating the need for that function altogether)?

409 ↗(On Diff #82932)

You should have a range of iterators that represent the PHI's in the From block. Why not use the range-version of splice() after this loop?
If it is to avoid the empty range check on the PHI's in the From block, maybe just a comment to that end.

415 ↗(On Diff #82932)

The second sentence is redundant.

419 ↗(On Diff #82932)

'... used in this block ...' means '... used in original block ...'?

Overall, I think this comment is kind of hard to follow. I think it would suffice to say when an instruction can move to the begin/end location. Something along the lines of:

An instruction MI can move from MBB Source to MBB Target under the following conditions:

  • There are no PHI's in Target that use what MI defines
  • No instructions in Source use what MI defines (unless Target is a dominator of Source)
  • No instructions in Source define what MI uses (unless Source is a dominator of Target)
  • If any instructions in Target define what MI uses, the MI can only move to the end of Target
442 ↗(On Diff #82932)

I don't understand the purpose of the comment.

466 ↗(On Diff #82932)

I don't think this function suffices to determine whether an instruction can move to a block.
Example MBB's:

BB#0:
...
; some non-PHI def of %vreg2
; some non-PHI def of %vreg3
%vreg10 = ADD %vreg2, %vreg3 ; MI
...

BB#1: ; MBB1
%vreg5 = PHI %vreg10, <BB#0>, %vreg4, <BB#2>
...

The call:
canMoveTo(MI, MBB1, false)
Would return true, but it is not safe to do so.

I think this patch uses this query in very limited circumstances, so it isn't necessarily required for this function to determine whether it is safe to move an arbitrary instruction to an arbitrary block. But if this is the case, we should add asserts to eliminate incorrect uses of the function (in the future).

507 ↗(On Diff #82932)

Why are all of these asserts rather than early exits returning false? On builds that compile asserts out, I imagine we could end up doing the wrong thing.

510 ↗(On Diff #82932)

I don't think there's anything to be gained from printing the name of the function emitting the message in the debug output. Please remove this.

554 ↗(On Diff #82932)

The use of 'From', 'from', 'To' and 'to' makes the comment confusing. Please consider renaming the parameters or otherwise clarifying this comment. Perhaps something like:
Merge the CFG triangle \p From into the preceding CFG triangle \p To.

608 ↗(On Diff #82932)

Are these two asserts enough or does From.BranchBlock need to [immediately] dominate both To.BranchTargetBlock and To.FallThroughBlock?

620 ↗(On Diff #82932)

You've already moved all the PHI's from From.BranchBlock. Can't this just be a loop over From.BranchBlock->instrs()? Or it may need to be a loop from rbegin() to rend().

626 ↗(On Diff #82932)

This is loop-invariant. I think it'd be cleaner to move the assert out of the loop and call splice() on the result of a ternary operator. Maybe even invert the if and the loop.

655 ↗(On Diff #82932)

Where is the assert to ensure this?

lei marked 12 inline comments as done.Jan 22 2017, 9:58 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
217 ↗(On Diff #82932)

Initialization removed.

227 ↗(On Diff #82932)

agreed.

336 ↗(On Diff #82932)

fixed

344 ↗(On Diff #82932)

No, since we check that Op2 and Op1 are the same type on line 330. We do an early exit if they are not the same type. getType() returns an enum MachineOperandType, isReg() basically checks that the MachineOperand's type (MachineOperandType) is of the specific enum MO_Register.

356 ↗(On Diff #82932)

It was just put in to see if we trip on it. I have removed this.

378 ↗(On Diff #82932)

Can begin() also be a PHI? If not, then that is okay, if it can then it is better to return end() so that we know there are no PHIs in this block.

393 ↗(On Diff #82932)

MachineBasicBlock::SkipPHIsLabelsAndDebug() returns the iterator to the first NON PHI, non-label instruction. We want the iterator to the first PHI node in a MBB. I don't see any function within that class that returns the iterator to the first PHI.

415 ↗(On Diff #82932)

agreed

419 ↗(On Diff #82932)

Arguments renamed and doc updated to be more clear.

466 ↗(On Diff #82932)

You are right, this function alone can not be used to determine whether an instruction can move to a random block. This function is a helper function used by canMerge() to determine whether 2 candidates can be merged.

507 ↗(On Diff #82932)

This seem to be the case for other classes like MachineOperand. If assert is off it could end up doing the wrong thing....

510 ↗(On Diff #82932)

okay

554 ↗(On Diff #82932)

Renamed "From/To" to "SourceRegion/TargetRegion". I believe this is now less confusing ...

608 ↗(On Diff #82932)

From.BranchBlock MUST equal To.BranchTargetBlock, not dominate.

620 ↗(On Diff #82932)

agreed

626 ↗(On Diff #82932)

assert moved to top of function

655 ↗(On Diff #82932)

In analyzeBranch(), line 297, we do an early exit if the FallThroughBlock contain code.

lei marked 12 inline comments as done.Jan 22 2017, 10:29 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
620 ↗(On Diff #82932)

changing this to be a range based splice call instead.

nemanjai added inline comments.Jan 22 2017, 12:21 PM
lib/CodeGen/BranchCoalescing.cpp
344 ↗(On Diff #82932)

Oh OK. I missed the early exit check. I think the layout of this function is a little confusing so I can see why I missed it.
It seems that the conditions under which we return true are (any of):

  • The lists are pair-wise identical
  • For any pairs that aren't identical, they're both registers that produce the same value

Why not such a simple layout to the function? For example, your check for same type is repeated in the call to MachineOperand::isIdenticalTo().
It seems to me like a layout such as this for the loop encapsulates what this function should do:

for (unsigned i = 0; i < OpList1.size(); ++i) {
  const MachineOperand &Op1 = OpList1[i];
  const MachineOperand &Op2 = OpList2[i];
  DEBUG(dbgs() << "Op1: " << Op1 << "\n"
               << "Op2: " << Op2 << "\n");

  if (Op1.isIdenticalTo(Op2)) {
    DEBUG(dbgs() << "Op1 and Op2 are identical!\n");
    return true;
  }

  if (Op1.isReg() && Op2.isReg() &&
      TargetRegisterInfo::isVirtualRegister(Op1.getReg()) &&
      TargetRegisterInfo::isVirtualRegister(Op2.getReg())) {
    if (TII->produceSameValue(*Op1Def, *Op2Def, MRI)) {
      DEBUG(dbgs() << "Operands produce the same value.\n");
      return true;
    }
  }
  DEBUG(dbgs() << "The operands are not provably identical.\n");
  return false;
}
356 ↗(On Diff #82932)

I'm afraid this won't suffice. If it were to somehow happen that (at least) one is a physical register, you'll return true here - regardless of whether it's the same register (not that it really matters whether it's the same register or not).

378 ↗(On Diff #82932)

Are you referring to some potential future use of this function? Because this result isn't currently used to determine whether the block has any PHI's. But I suppose you're right it is potentially dangerous for it to return a valid iterator when there are no PHI's.

Also, the work this function does just seems like overkill. It seems like you should only be checking instructions between MBB.instr_begin() and MBB.getFirstNonPHI(). Why does that not suffice?

In any case, considering this function is only used once and in many cases it'll return a value that you're just going to discard and use the begin() pointer, I don't think this function should exist. Just find the insert location in the function where you need it.

386 ↗(On Diff #82932)

It seems below like you're inserting them before any existing PHI instructions.

393 ↗(On Diff #82932)

OK, that makes sense. I suppose you want to make sure that you skip anything that comes between begin() and the first PHI node in the ToMBB.

But I guess what I'm confused about is exactly what this function is meant to do. It appears that it will do the following:

  • Iterate over all the PHI nodes in MBB called FromMBB
  • If there are any PHI nodes that refer to the same MBB, they'll be updated so they refer to the MBB called ToMBB
  • Then each of those PHI nodes will be moved to MBB called ToMBB before any existing PHI nodes in ToMBB

However, what happens to any PHI nodes in ToMBB that refer to FromMBB? They don't need to be updated?

466 ↗(On Diff #82932)

Yes, but please add asserts to ensure this function isn't used incorrectly in the future. It is dangerous for a function called canMoveTo to return true when an instruction can't safely move to a basic block.

608 ↗(On Diff #82932)

The two are not mutually exclusive. Also, this doesn't answer whether From.BranchBlock needs to dominate To.FallThroughBlock.

655 ↗(On Diff #82932)

Sure. Now there is. I'd still prefer an assert. If someone in the future feels that analyzeBranch() doesn't need that early exit any longer, they'll need to ensure they account for this assert and why it's in this function.

729 ↗(On Diff #82932)

Setting a bool to true and then using it in an if statement without any statements that could modify it in between is redundant.
Besides, I think it's overly verbose to dump the entire function after every merge - I'd imagine that a dump after all the merging is done would suffice.

nemanjai added inline comments.Jan 23 2017, 9:54 AM
lib/CodeGen/BranchCoalescing.cpp
344 ↗(On Diff #82932)

Sorry, the two return true; lines should be continue; so that you keep checking the remainder of the lists rather than returning true if the first pair are identical.

lei marked 12 inline comments as done.Jan 23 2017, 10:19 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
344 ↗(On Diff #82932)

Minor change so that we are doing early exit when operands within the operand lists are detected to be diff.

for (unsigned i = 0; i < OpList1.size(); ++i) {
   const MachineOperand &Op1 = OpList1[i];
   const MachineOperand &Op2 = OpList2[i];

   DEBUG(dbgs() << "Op1: " << Op1 << "\n"
                << "Op2: " << Op2 << "\n");

   if (Op1.isIdenticalTo(Op2)) {
     DEBUG(dbgs() << "Op1 and Op2 are identical!\n");
     continue;
   }

   // If the operands are not identical, but are registers, check to see if the
   // definition of the register produces the same value. If they produce the
   // same value, consider them to be identical.
   if (Op1.isReg() && Op2.isReg() &&
       TargetRegisterInfo::isVirtualRegister(Op1.getReg()) &&
       TargetRegisterInfo::isVirtualRegister(Op2.getReg())) {
     MachineInstr *Op1Def = MRI->getVRegDef(Op1.getReg());
     MachineInstr *Op2Def = MRI->getVRegDef(Op2.getReg());
     if (TII->produceSameValue(*Op1Def, *Op2Def, MRI)) {
       DEBUG(dbgs() << "Op1Def: " << *Op1Def << " and " << *Op2Def
                    << " produce the same value!\n");
     } else {
       DEBUG(dbgs() << "Operands produce different values\n");
       return false;
     }
   } else {
     DEBUG(dbgs() << "The operands are not provably identical.\n");
     return false;
   }
 }
356 ↗(On Diff #82932)

changes in previous comment will address this issue as well.

466 ↗(On Diff #82932)

We always return false when an instruction isn't safe to remove. This function just does a check to see if it's valid to move an instruction to the BB at the specified location. If won't work if it asserts when it can't be moved.

lei marked 6 inline comments as done.Jan 23 2017, 10:48 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
378 ↗(On Diff #82932)

will remove function and add loop to location where it's being used.

386 ↗(On Diff #82932)

Yes. They should be inserted before existing PHIs. Will update the doc.

393 ↗(On Diff #82932)

This function moves PHI nodes in the FromMBB to the beginning of the PHI block in ToMBB. If any PHI nodes in ToMBB reference registers defined in PHI nodes in FromMBB, the PHI nodes in FromMBB can not be moved to ToMBB since we can not infer order in the PHI node execution.

lei marked 8 inline comments as done.Jan 23 2017, 10:50 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
344 ↗(On Diff #82932)

see code snippet above.

lei marked an inline comment as done.Jan 24 2017, 12:34 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
655 ↗(On Diff #82932)

okay

729 ↗(On Diff #82932)

For debug purposes, it would be nice to see the stages of merging...
Will remove the if statement.

lei marked 3 inline comments as done.Jan 24 2017, 12:50 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
378 ↗(On Diff #82932)

Actually according to the llvm lang ref, PHI instructions must be first in a basic block. Will remove this function and replace the function call below with MBB->begin()

lei added inline comments.Jan 24 2017, 12:50 PM
lib/CodeGen/BranchCoalescing.cpp
409 ↗(On Diff #82932)

will update to use range version of splice()

lei marked an inline comment as done.Jan 24 2017, 9:01 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
608 ↗(On Diff #82932)

The same kind of checks are needed for function canMerge and mergeCandidate, Put the checks into function validateCandidates() and call it to verify MBBs and do an early exit if returns false.

lei marked an inline comment as done.Jan 25 2017, 8:46 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
442 ↗(On Diff #82932)

will remove this redundant check since MI.defs() return register definitions.

lei updated this revision to Diff 85784.Jan 25 2017, 10:55 AM

Updated patch to address all the review comments from Nemanja.

lei updated this revision to Diff 86082.Jan 27 2017, 11:21 AM

Add guard to only verify MF which this pass have modified.

lei added a comment.Feb 7 2017, 10:46 AM

Just a friendly reminder that this patch is waiting to be reviewed. Thanks!

kbarton accepted this revision.Feb 8 2017, 12:10 PM

Aside from a couple minor comments, this LGTM.

It would be good if @hfinkel or @echristo could take a look also.

lib/CodeGen/BranchCoalescing.cpp
119 ↗(On Diff #86082)

Branch Coalescing not Branch Coalesce

753 ↗(On Diff #86082)

Did we conclude whether this is different from the checks done by the verifyAfter parameter in addPass?

This revision is now accepted and ready to land.Feb 8 2017, 12:10 PM
echristo edited edge metadata.Feb 8 2017, 1:06 PM

I'll look today.

In general pretty happy with this. A few comments inline.

A few top level comments:

What performance testing have you done with this?
Looks like SPEC on ppc?
Anything else?
Same for correctness checking. You mentioned that this doesn't affect ARM and then change a thumb testcase, what's up there?

Thanks!

-eric

lib/CodeGen/BranchCoalescing.cpp
31 ↗(On Diff #86082)

Let's have this default to off for the initial commit and then we can turn it on in a subsequent one.

230 ↗(On Diff #86082)

Probably should be something like "canCoalesceBranch" instead of analyzeBranch. All of the analysis is being done by analyzeBranch really and this is actually figuring out whether or not we should coalesce.

252 ↗(On Diff #86082)

Meta comment on the DEBUG statements. I like them, but it might be nice to give a summary of what you're analyzing here before all of the "can't do it" statements.

289–303 ↗(On Diff #86082)

I believe the case you're looking for here is also whether or not there's a single fall through? Might be nice to reorganize the code with that in mind.

363 ↗(On Diff #86082)

Could use an explanation of why - it confused Nemanja at first and could confuse others.

612 ↗(On Diff #86082)

llvm_unreachable

642 ↗(On Diff #86082)

Nit: All comments should be complete sentences. (A few other occurrences)

663–666 ↗(On Diff #86082)

This set of comments is very confusing.

lei added a comment.EditedFeb 13 2017, 12:55 PM

In general pretty happy with this. A few comments inline.

A few top level comments:

What performance testing have you done with this?
Looks like SPEC on ppc?
Anything else?
Same for correctness checking. You mentioned that this doesn't affect ARM and then change a thumb testcase, what's up there?

Thanks!

-eric

Yes, the only performance tests done was SPEC on PPC.

Seems I was testing with the wrong triple for ARM. The -mtriple I was testing with was "-mtriple=armv6-unknown-linux-gnu" which generated a pattern that we do not recognize. The test case below uses
-mtriple=thumb-apple-darwin and -mtriple=thumb-pc-linux-gnueabi. The select statements for those triples does generate the pattern we are looking for.

I have updated the batch to be turned off by default.

lei added inline comments.Feb 13 2017, 1:03 PM
lib/CodeGen/BranchCoalescing.cpp
31 ↗(On Diff #86082)

okay

119 ↗(On Diff #86082)

okay

230 ↗(On Diff #86082)

true

252 ↗(On Diff #86082)

okay

lei marked 5 inline comments as done.Feb 20 2017, 8:33 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
363 ↗(On Diff #86082)

Sine PHI node ordering can not be assumed, it doesn't really matter where we place the PHI instructions. Will update comment to reflect this.

lei marked 5 inline comments as done.Feb 20 2017, 10:03 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
663–666 ↗(On Diff #86082)

okay

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

lei marked 5 inline comments as done.Feb 21 2017, 10:11 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
612 ↗(On Diff #86082)

okay

lei marked 3 inline comments as done.Feb 21 2017, 10:43 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
642 ↗(On Diff #86082)

k.

753 ↗(On Diff #86082)

Yes. This is different from the checks done by the verifyAfter parameter in addPass. The verifyAfter parameter for this pass is set to true by default.

lei marked 4 inline comments as done.Feb 21 2017, 11:00 AM

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

It is not true in general that selects get converted to branches on PowerPC. For example, integer selects can be lowered into an actual "integer select" instruction under the "right" circumstances.
I'm sure there is a way to prevent the selects from being produced, but I am not sure this would have any effect on this optimization. For the specific test case that is added as part of this patch, it appears that SROA produces the selects. I haven't personally looked at the semantics of SROA and the heuristics it uses to produce a select or not, but there is presumably a way to tell it not to do so.

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

It is not true in general that selects get converted to branches on PowerPC. For example, integer selects can be lowered into an actual "integer select" instruction under the "right" circumstances.
I'm sure there is a way to prevent the selects from being produced, but I am not sure this would have any effect on this optimization. For the specific test case that is added as part of this patch, it appears that SROA produces the selects. I haven't personally looked at the semantics of SROA and the heuristics it uses to produce a select or not, but there is presumably a way to tell it not to do so.

In general, we want to produce selects rather than explicit branching due to superblock formation etc. On Power9 we'll find that the integer select instruction is also much faster than on Power8 and so will experience less of a penalty as well.

You marked things done that aren't done. What's going on? :)

-eric

lib/CodeGen/BranchCoalescing.cpp
230 ↗(On Diff #86082)

You've marked this done, but it's not done?

lei marked an inline comment as done.Feb 22 2017, 11:49 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
289–303 ↗(On Diff #86082)

okay.

lei marked an inline comment as done.Feb 22 2017, 11:51 AM

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

lei marked an inline comment as not done.Feb 22 2017, 11:53 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
289–303 ↗(On Diff #86082)

Will reorganize this.

In D28249#683815, @lei wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

Thanks :)

I very much look at it to make sure everything I asked for was done before coming back to reviews.

lei updated this revision to Diff 89397.Feb 22 2017, 12:09 PM

Address review comments.

lei marked 2 inline comments as done.Feb 22 2017, 12:10 PM
In D28249#683815, @lei wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

Thanks :)

I very much look at it to make sure everything I asked for was done before coming back to reviews.

Now it's really done :)

Bunch of inline comments.

Thanks!

-eric

include/llvm/CodeGen/Passes.h
406 ↗(On Diff #89397)

Nit: Complete sentences in comments please.

lib/CodeGen/BranchCoalescing.cpp
352 ↗(On Diff #89397)

"beginning"

367 ↗(On Diff #89397)

"contains"

392–393 ↗(On Diff #89397)

This seems like an odd comment.

404 ↗(On Diff #89397)

I think it makes just as much sense (and is more readable) if you split the function into canMoveToBeginning and canMoveToEnd and just drop the boolean parameter.

It'll also make it easier to split the docs.

453–466 ↗(On Diff #89397)

Possible you just want to replace this with a bunch of llvm_unreachables?

479 ↗(On Diff #89397)

Why?

487 ↗(On Diff #89397)

Sounds like an assert.

516 ↗(On Diff #89397)

Seems like you want to swap these conditionals?

534 ↗(On Diff #89397)

else if? Otherwise what happens?

757 ↗(On Diff #89397)

This was to be turned on in a subsequent patch. Remove this and the ppc support please :)

378 ↗(On Diff #82932)

I don't think this has ever been changed, but I'm not against a separate function. That said, doesn't transferSuccessorsAndUpdatePHIs already do this?

Also, this loop needs documentation :)

393 ↗(On Diff #82932)

Should probably assert this somewhere.

lei added inline comments.Feb 24 2017, 8:35 AM
include/llvm/CodeGen/Passes.h
406 ↗(On Diff #89397)

okay

lib/CodeGen/BranchCoalescing.cpp
392–393 ↗(On Diff #89397)

A MI instruction can only be moved to TargetMBB if there are no uses of it within the TargetMBB's PHI nodes. Will reword the comment.

404 ↗(On Diff #89397)

agreed.

453–466 ↗(On Diff #89397)

yup

479 ↗(On Diff #89397)

Maybe preference is the wrong word here... this is by design. Will update the doc

487 ↗(On Diff #89397)

we do assert on that, will update the doc here.

534 ↗(On Diff #89397)

No else if. We check conditions to move up and down separately and verify that only one direction is valid.

378 ↗(On Diff #82932)

It's because here we only want to move the PHIs from SourceMBB to TargetMBB. SourceMBB will be deleted later. We don't want to transfer the successor info here. Control flow need to be updated once we have deleted SourceMBB.

393 ↗(On Diff #82932)

We don't assert cause we first check one direction (canMoveToBeginning) and then check the other direction (canMoveToEnd). If both fails, we just don't move the instruction and move to investigate the next one.

lei updated this revision to Diff 89775.Feb 24 2017, 10:50 PM

Address review comments.

lei marked 12 inline comments as done.Feb 24 2017, 10:54 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
757 ↗(On Diff #89397)

see line 161

lei marked an inline comment as done.Feb 24 2017, 10:56 PM
echristo accepted this revision.Feb 28 2017, 1:33 PM

LGTM.

Thanks! Now let's get some performance testing and go from there.

-eric

This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Apr 10 2017, 4:34 PM

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.
  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.
  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?
  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

Hi Matthias,

I'll let Lei talk to more of it, however...

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.

Would you prefer s/IR/MIR? :)

That said, it's definitely meant to be an MI pass.

  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.

Yep. The arm and mips backends also do the same thing.

  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?

I agree up to a point, though I don't have any particular ideas here how to fix it. My thought was "let the backends expand into the easiest code as possible and then clean it up with a generic pass", but I'm definitely open to different ideas.

  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

It's not so much that as the original patch was on by default and required changes to the ARM testcases, but the motivating example was particularly poor code on Power. We could probably throw some extra tests in, but it seems like a stretch to make all of the examples quite so universal? The code itself, of course, has no backend dependencies or even TTI.

Thoughts?

-eric

To say this first: I'm fine with the current implementation.

Hi Matthias,

I'll let Lei talk to more of it, however...

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.

Would you prefer s/IR/MIR? :)

This comment was mostly about the description of this patch which was confusing as it only talked about IR. Actually looking at the code the documentation comment above the class is a lot better and explains the problem just fine.

That said, it's definitely meant to be an MI pass.

  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.

Yep. The arm and mips backends also do the same thing.

  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?

I agree up to a point, though I don't have any particular ideas here how to fix it. My thought was "let the backends expand into the easiest code as possible and then clean it up with a generic pass", but I'm definitely open to different ideas.

Again no hard rule. My intuition would be that we would get by with less complexity and less code by improving expandisel pseudos; I would expect that matching the pattern and checking for correctness should be quite a bit easier there. But this isn't a big issue and now that Lei already went through the trouble of creating a generic pass we may just as well keep it.

  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

It's not so much that as the original patch was on by default and required changes to the ARM testcases, but the motivating example was particularly poor code on Power. We could probably throw some extra tests in, but it seems like a stretch to make all of the examples quite so universal? The code itself, of course, has no backend dependencies or even TTI.

Please ignore this comment. I was mainly triggered by "After Branch Coalescing" showing up in -print-after-all dumps but the pass never changing anything and the comments above about this only working on powerpc; Turns out the pass just checks the enable-branch-coalesce isn't passed as a commandline flag. I think most passes have a cl::opt in TargetPassConfig.cpp instead and we don't even add the pass the pipeline if it is disable, you could change to that style too to avoid people seeing the dump for a disabled pass :)

Good to hear this is a generic pass, I am pretty sure the code equally bad on the other targets and it is good that we fixed it. I'll add a ticket into our bugtracker to experiment/evaluate on X86/AArch64.

  • Matthias
lei added a comment.Apr 28 2017, 7:59 AM

I think most passes have a cl::opt in TargetPassConfig.cpp instead and we don't even add the pass the pipeline if it is disable, you could change to that style too to avoid people seeing the dump for a disabled pass :)

Will address this in the next patch to turn this on by default for PowerPC.
Thanks

iteratee added inline comments.
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340

This line isn't correct. I have a case where instructions that "produce the same value" are different. The relevant sequence is:

BB#16: 
...
        BCTRL8_LDinto_toc
...
        %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
        %vreg141<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg141 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
        %vreg142<def> = XXLXORdpz; F8RC:%vreg142
        BC %vreg140, <BB#73>; CRBITRC:%vreg140

BB#72: derived from LLVM BB %114
    Predecessors according to CFG: BB#16
    Successors according to CFG: BB#73(?%)

BB#73: derived from LLVM BB %114
    Predecessors according to CFG: BB#16 BB#72
        %vreg143<def> = PHI %vreg142, <BB#72>, %vreg141, <BB#16>; F8RC:%vreg143,%vreg142,%vreg141
...
        BCTRL8_LDinto_toc
...
        %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
        %vreg150<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg150 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
        BC %vreg149, <BB#75>; CRBITRC:%vreg149
    Successors according to CFG: BB#74(?%) BB#75(?%)

BB#74: derived from LLVM BB %114
    Predecessors according to CFG: BB#73
    Successors according to CFG: BB#75(?%)

BB#75: derived from LLVM BB %114
    Predecessors according to CFG: BB#73 BB#74
        %vreg151<def> = PHI %vreg142, <BB#74>, %vreg150, <BB#73>; F8RC:%vreg151,%vreg142,%vreg150

The debug output produces:

Valid Candidate
Op1: 1024
Op2: 1024
Op1 and Op2 are identical!
Op1: %vreg140
Op2: %vreg149
Op1Def: %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
 and %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
 produce the same value!

While it would be safe to CSE those crmoves, what definitely cannot occur is to assume that the value of CR0GT has not changed between the 2 instructions.

hfinkel added inline comments.Aug 30 2017, 4:33 PM
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340

Indeed; I think that we need to filter out instructions with physical-register uses (even if identical). We might be able to be a little smarter about it in a similar way to MachineLICM (the other uses of this function), which does this:

// Don't hoist an instruction that uses or defines a physical register.
if (TargetRegisterInfo::isPhysicalRegister(Reg)) {
  if (MO.isUse()) {
    // If the physreg has no defs anywhere, it's just an ambient register
    // and we can freely move its uses. Alternatively, if it's allocatable,
    // it could get allocated to something with a def during allocation.
    // However, if the physreg is known to always be caller saved/restored
    // then this use is safe to hoist.
    if (!MRI->isConstantPhysReg(Reg) &&
        !(TRI->isCallerPreservedPhysReg(Reg, *I.getParent()->getParent())))
        return false;
    // Otherwise it's safe to move.
    continue;
  } else if (!MO.isDead()) {
    // A def that isn't dead. We can't move it.
    return false;
  ...
lei added a comment.Sep 6 2017, 7:34 PM
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340