This is an archive of the discontinued LLVM Phabricator instance.

Improve scheduling with branch coalescing
ClosedPublic

Authored by lei on Jan 3 2017, 12:39 PM.

Details

Summary

Improve scheduling by coalescing branches that depend on the same condition. This pass looks for blocks that are guarded by the same branch condition in the IR and attempts to merge the blocks together. This is done by moving code either up/down to it’s predecessor/successor blocks.

On power8 LE, we see a 11% improvement for lbm and 28% improvement for mcf (SPEC2006).

I tried the following test on ARM and X86.

$ cat branchC.ll
; RUN: llc -mcpu=generic -mtriple=x86_64-unknown-linux -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -mtriple=armv6-unknown-linux-gnu < %s | FileCheck %s
; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu | FileCheck %s

; Function Attrs: nounwind
define double @testBranchCoal(double %a, double %b, double %c, i32 %x) {
entry:

%test = icmp eq i32 %x, 0
%tmp1 = select i1 %test, double %a, double 2.000000e-03
%tmp2 = select i1 %test, double %b, double 0.000000e+00
%tmp3 = select i1 %test, double %c, double 5.000000e-03

%res1 = fadd double %tmp1, %tmp2
%result = fadd double %res1, %tmp3
ret double %result

}

This does not affect ARM since the LLVM IR produced does not conform to the pattern we expected.
For X86, the branches were not coalesced since the terminator produced contain implicit operands. This code will only coalesce branches whose terminators contain explicit operands.

This is originally reported in: https://llvm.org/bugs/show_bug.cgi?id=25219

Diff Detail

Event Timeline

lei updated this revision to Diff 82932.Jan 3 2017, 12:39 PM
lei retitled this revision from to Improve scheduling with branch coalescing.
lei updated this object.
lei added reviewers: Carrot, nemanjai, kbarton, echristo.
lei added a subscriber: llvm-commits.
lei added a reviewer: hfinkel.Jan 3 2017, 12:44 PM
lei added subscribers: jtony, sfertile, syzaara, craig.topper.
nemanjai edited edge metadata.Jan 9 2017, 5:09 PM

These inline comments are based on a first-pass reading of the patch. As such, they refer to issues that are rather local in scope. This is a large patch so I'll set aside some more time to fully understand it in its entirety in the next review cycle.

lib/CodeGen/BranchCoalescing.cpp
218

We want to reset the stats for every function? I would have assumed we're interested in collecting this for the entire module.

228

I am personally not a fan of functions that modify an output parameter and then fail. Presumably this makes no difference now, but if we end up extending this pass in the future to have some backtracking/requeuing capabilities, it would be nice if the object wasn't modified/invalidated in some way.

337

Use same debug format for both (i.e. space after the colon).

345

Won't this fail (as in assert) if Op2.isReg() != true?

357

Maybe a comment as to why we fail here if we get physical registers. Also, if I follow the control flow here correctly, we will return true here if one is a physical register and the other is a virtual register. Is that what we want? I imagine not.

379

Not that it makes a difference computationally, but perhaps for consistency:

for (MachineBasicBlock::instr_iterator Instr : MBB.instrs())
  if (Instr->isPHI())
    return Instr;
  return MBB.instr_end();

Also, I'm not sure what the style guide says regarding the use of auto there, but perhaps instead of MachineBasicBlock::instr_iterator, it should be auto.

Finally, if every use of this function will just use begin() instead of end() when there are no existing PHI's, perhaps it would make sense to actually return begin() if there are no PHI's in the MBB.

394

Can you please state why something like MachineBasicBlock::SkipPHIsLabelsAndDebug() cannot be used instead of this function (thereby eliminating the need for that function altogether)?

410

You should have a range of iterators that represent the PHI's in the From block. Why not use the range-version of splice() after this loop?
If it is to avoid the empty range check on the PHI's in the From block, maybe just a comment to that end.

416

The second sentence is redundant.

420

'... used in this block ...' means '... used in original block ...'?

Overall, I think this comment is kind of hard to follow. I think it would suffice to say when an instruction can move to the begin/end location. Something along the lines of:

An instruction MI can move from MBB Source to MBB Target under the following conditions:

  • There are no PHI's in Target that use what MI defines
  • No instructions in Source use what MI defines (unless Target is a dominator of Source)
  • No instructions in Source define what MI uses (unless Source is a dominator of Target)
  • If any instructions in Target define what MI uses, the MI can only move to the end of Target
443

I don't understand the purpose of the comment.

467

I don't think this function suffices to determine whether an instruction can move to a block.
Example MBB's:

BB#0:
...
; some non-PHI def of %vreg2
; some non-PHI def of %vreg3
%vreg10 = ADD %vreg2, %vreg3 ; MI
...

BB#1: ; MBB1
%vreg5 = PHI %vreg10, <BB#0>, %vreg4, <BB#2>
...

The call:
canMoveTo(MI, MBB1, false)
Would return true, but it is not safe to do so.

I think this patch uses this query in very limited circumstances, so it isn't necessarily required for this function to determine whether it is safe to move an arbitrary instruction to an arbitrary block. But if this is the case, we should add asserts to eliminate incorrect uses of the function (in the future).

508

Why are all of these asserts rather than early exits returning false? On builds that compile asserts out, I imagine we could end up doing the wrong thing.

511

I don't think there's anything to be gained from printing the name of the function emitting the message in the debug output. Please remove this.

555

The use of 'From', 'from', 'To' and 'to' makes the comment confusing. Please consider renaming the parameters or otherwise clarifying this comment. Perhaps something like:
Merge the CFG triangle \p From into the preceding CFG triangle \p To.

609

Are these two asserts enough or does From.BranchBlock need to [immediately] dominate both To.BranchTargetBlock and To.FallThroughBlock?

621

You've already moved all the PHI's from From.BranchBlock. Can't this just be a loop over From.BranchBlock->instrs()? Or it may need to be a loop from rbegin() to rend().

627

This is loop-invariant. I think it'd be cleaner to move the assert out of the loop and call splice() on the result of a ternary operator. Maybe even invert the if and the loop.

656

Where is the assert to ensure this?

lei marked 12 inline comments as done.Jan 22 2017, 9:58 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
218

Initialization removed.

228

agreed.

337

fixed

345

No, since we check that Op2 and Op1 are the same type on line 330. We do an early exit if they are not the same type. getType() returns an enum MachineOperandType, isReg() basically checks that the MachineOperand's type (MachineOperandType) is of the specific enum MO_Register.

357

It was just put in to see if we trip on it. I have removed this.

379

Can begin() also be a PHI? If not, then that is okay, if it can then it is better to return end() so that we know there are no PHIs in this block.

394

MachineBasicBlock::SkipPHIsLabelsAndDebug() returns the iterator to the first NON PHI, non-label instruction. We want the iterator to the first PHI node in a MBB. I don't see any function within that class that returns the iterator to the first PHI.

416

agreed

420

Arguments renamed and doc updated to be more clear.

467

You are right, this function alone can not be used to determine whether an instruction can move to a random block. This function is a helper function used by canMerge() to determine whether 2 candidates can be merged.

508

This seem to be the case for other classes like MachineOperand. If assert is off it could end up doing the wrong thing....

511

okay

555

Renamed "From/To" to "SourceRegion/TargetRegion". I believe this is now less confusing ...

609

From.BranchBlock MUST equal To.BranchTargetBlock, not dominate.

621

agreed

627

assert moved to top of function

656

In analyzeBranch(), line 297, we do an early exit if the FallThroughBlock contain code.

lei marked 12 inline comments as done.Jan 22 2017, 10:29 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
621

changing this to be a range based splice call instead.

nemanjai added inline comments.Jan 22 2017, 12:21 PM
lib/CodeGen/BranchCoalescing.cpp
345

Oh OK. I missed the early exit check. I think the layout of this function is a little confusing so I can see why I missed it.
It seems that the conditions under which we return true are (any of):

  • The lists are pair-wise identical
  • For any pairs that aren't identical, they're both registers that produce the same value

Why not such a simple layout to the function? For example, your check for same type is repeated in the call to MachineOperand::isIdenticalTo().
It seems to me like a layout such as this for the loop encapsulates what this function should do:

for (unsigned i = 0; i < OpList1.size(); ++i) {
  const MachineOperand &Op1 = OpList1[i];
  const MachineOperand &Op2 = OpList2[i];
  DEBUG(dbgs() << "Op1: " << Op1 << "\n"
               << "Op2: " << Op2 << "\n");

  if (Op1.isIdenticalTo(Op2)) {
    DEBUG(dbgs() << "Op1 and Op2 are identical!\n");
    return true;
  }

  if (Op1.isReg() && Op2.isReg() &&
      TargetRegisterInfo::isVirtualRegister(Op1.getReg()) &&
      TargetRegisterInfo::isVirtualRegister(Op2.getReg())) {
    if (TII->produceSameValue(*Op1Def, *Op2Def, MRI)) {
      DEBUG(dbgs() << "Operands produce the same value.\n");
      return true;
    }
  }
  DEBUG(dbgs() << "The operands are not provably identical.\n");
  return false;
}
357

I'm afraid this won't suffice. If it were to somehow happen that (at least) one is a physical register, you'll return true here - regardless of whether it's the same register (not that it really matters whether it's the same register or not).

379

Are you referring to some potential future use of this function? Because this result isn't currently used to determine whether the block has any PHI's. But I suppose you're right it is potentially dangerous for it to return a valid iterator when there are no PHI's.

Also, the work this function does just seems like overkill. It seems like you should only be checking instructions between MBB.instr_begin() and MBB.getFirstNonPHI(). Why does that not suffice?

In any case, considering this function is only used once and in many cases it'll return a value that you're just going to discard and use the begin() pointer, I don't think this function should exist. Just find the insert location in the function where you need it.

387

It seems below like you're inserting them before any existing PHI instructions.

394

OK, that makes sense. I suppose you want to make sure that you skip anything that comes between begin() and the first PHI node in the ToMBB.

But I guess what I'm confused about is exactly what this function is meant to do. It appears that it will do the following:

  • Iterate over all the PHI nodes in MBB called FromMBB
  • If there are any PHI nodes that refer to the same MBB, they'll be updated so they refer to the MBB called ToMBB
  • Then each of those PHI nodes will be moved to MBB called ToMBB before any existing PHI nodes in ToMBB

However, what happens to any PHI nodes in ToMBB that refer to FromMBB? They don't need to be updated?

467

Yes, but please add asserts to ensure this function isn't used incorrectly in the future. It is dangerous for a function called canMoveTo to return true when an instruction can't safely move to a basic block.

609

The two are not mutually exclusive. Also, this doesn't answer whether From.BranchBlock needs to dominate To.FallThroughBlock.

656

Sure. Now there is. I'd still prefer an assert. If someone in the future feels that analyzeBranch() doesn't need that early exit any longer, they'll need to ensure they account for this assert and why it's in this function.

730

Setting a bool to true and then using it in an if statement without any statements that could modify it in between is redundant.
Besides, I think it's overly verbose to dump the entire function after every merge - I'd imagine that a dump after all the merging is done would suffice.

nemanjai added inline comments.Jan 23 2017, 9:54 AM
lib/CodeGen/BranchCoalescing.cpp
345

Sorry, the two return true; lines should be continue; so that you keep checking the remainder of the lists rather than returning true if the first pair are identical.

lei marked 12 inline comments as done.Jan 23 2017, 10:19 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
345

Minor change so that we are doing early exit when operands within the operand lists are detected to be diff.

for (unsigned i = 0; i < OpList1.size(); ++i) {
   const MachineOperand &Op1 = OpList1[i];
   const MachineOperand &Op2 = OpList2[i];

   DEBUG(dbgs() << "Op1: " << Op1 << "\n"
                << "Op2: " << Op2 << "\n");

   if (Op1.isIdenticalTo(Op2)) {
     DEBUG(dbgs() << "Op1 and Op2 are identical!\n");
     continue;
   }

   // If the operands are not identical, but are registers, check to see if the
   // definition of the register produces the same value. If they produce the
   // same value, consider them to be identical.
   if (Op1.isReg() && Op2.isReg() &&
       TargetRegisterInfo::isVirtualRegister(Op1.getReg()) &&
       TargetRegisterInfo::isVirtualRegister(Op2.getReg())) {
     MachineInstr *Op1Def = MRI->getVRegDef(Op1.getReg());
     MachineInstr *Op2Def = MRI->getVRegDef(Op2.getReg());
     if (TII->produceSameValue(*Op1Def, *Op2Def, MRI)) {
       DEBUG(dbgs() << "Op1Def: " << *Op1Def << " and " << *Op2Def
                    << " produce the same value!\n");
     } else {
       DEBUG(dbgs() << "Operands produce different values\n");
       return false;
     }
   } else {
     DEBUG(dbgs() << "The operands are not provably identical.\n");
     return false;
   }
 }
357

changes in previous comment will address this issue as well.

467

We always return false when an instruction isn't safe to remove. This function just does a check to see if it's valid to move an instruction to the BB at the specified location. If won't work if it asserts when it can't be moved.

lei marked 6 inline comments as done.Jan 23 2017, 10:48 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
379

will remove function and add loop to location where it's being used.

387

Yes. They should be inserted before existing PHIs. Will update the doc.

394

This function moves PHI nodes in the FromMBB to the beginning of the PHI block in ToMBB. If any PHI nodes in ToMBB reference registers defined in PHI nodes in FromMBB, the PHI nodes in FromMBB can not be moved to ToMBB since we can not infer order in the PHI node execution.

lei marked 8 inline comments as done.Jan 23 2017, 10:50 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
345

see code snippet above.

lei marked an inline comment as done.Jan 24 2017, 12:34 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
656

okay

730

For debug purposes, it would be nice to see the stages of merging...
Will remove the if statement.

lei marked 3 inline comments as done.Jan 24 2017, 12:50 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
379

Actually according to the llvm lang ref, PHI instructions must be first in a basic block. Will remove this function and replace the function call below with MBB->begin()

lei added inline comments.Jan 24 2017, 12:50 PM
lib/CodeGen/BranchCoalescing.cpp
410

will update to use range version of splice()

lei marked an inline comment as done.Jan 24 2017, 9:01 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
609

The same kind of checks are needed for function canMerge and mergeCandidate, Put the checks into function validateCandidates() and call it to verify MBBs and do an early exit if returns false.

lei marked an inline comment as done.Jan 25 2017, 8:46 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
443

will remove this redundant check since MI.defs() return register definitions.

lei updated this revision to Diff 85784.Jan 25 2017, 10:55 AM

Updated patch to address all the review comments from Nemanja.

lei updated this revision to Diff 86082.Jan 27 2017, 11:21 AM

Add guard to only verify MF which this pass have modified.

lei added a comment.Feb 7 2017, 10:46 AM

Just a friendly reminder that this patch is waiting to be reviewed. Thanks!

kbarton accepted this revision.Feb 8 2017, 12:10 PM

Aside from a couple minor comments, this LGTM.

It would be good if @hfinkel or @echristo could take a look also.

lib/CodeGen/BranchCoalescing.cpp
120

Branch Coalescing not Branch Coalesce

754

Did we conclude whether this is different from the checks done by the verifyAfter parameter in addPass?

This revision is now accepted and ready to land.Feb 8 2017, 12:10 PM
echristo edited edge metadata.Feb 8 2017, 1:06 PM

I'll look today.

In general pretty happy with this. A few comments inline.

A few top level comments:

What performance testing have you done with this?
Looks like SPEC on ppc?
Anything else?
Same for correctness checking. You mentioned that this doesn't affect ARM and then change a thumb testcase, what's up there?

Thanks!

-eric

lib/CodeGen/BranchCoalescing.cpp
32

Let's have this default to off for the initial commit and then we can turn it on in a subsequent one.

231

Probably should be something like "canCoalesceBranch" instead of analyzeBranch. All of the analysis is being done by analyzeBranch really and this is actually figuring out whether or not we should coalesce.

253

Meta comment on the DEBUG statements. I like them, but it might be nice to give a summary of what you're analyzing here before all of the "can't do it" statements.

290–304

I believe the case you're looking for here is also whether or not there's a single fall through? Might be nice to reorganize the code with that in mind.

364

Could use an explanation of why - it confused Nemanja at first and could confuse others.

613

llvm_unreachable

643

Nit: All comments should be complete sentences. (A few other occurrences)

664–667

This set of comments is very confusing.

lei added a comment.EditedFeb 13 2017, 12:55 PM

In general pretty happy with this. A few comments inline.

A few top level comments:

What performance testing have you done with this?
Looks like SPEC on ppc?
Anything else?
Same for correctness checking. You mentioned that this doesn't affect ARM and then change a thumb testcase, what's up there?

Thanks!

-eric

Yes, the only performance tests done was SPEC on PPC.

Seems I was testing with the wrong triple for ARM. The -mtriple I was testing with was "-mtriple=armv6-unknown-linux-gnu" which generated a pattern that we do not recognize. The test case below uses
-mtriple=thumb-apple-darwin and -mtriple=thumb-pc-linux-gnueabi. The select statements for those triples does generate the pattern we are looking for.

I have updated the batch to be turned off by default.

lei added inline comments.Feb 13 2017, 1:03 PM
lib/CodeGen/BranchCoalescing.cpp
32

okay

120

okay

231

true

253

okay

lei marked 5 inline comments as done.Feb 20 2017, 8:33 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
364

Sine PHI node ordering can not be assumed, it doesn't really matter where we place the PHI instructions. Will update comment to reflect this.

lei marked 5 inline comments as done.Feb 20 2017, 10:03 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
664–667

okay

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

lei marked 5 inline comments as done.Feb 21 2017, 10:11 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
613

okay

lei marked 3 inline comments as done.Feb 21 2017, 10:43 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
643

k.

754

Yes. This is different from the checks done by the verifyAfter parameter in addPass. The verifyAfter parameter for this pass is set to true by default.

lei marked 4 inline comments as done.Feb 21 2017, 11:00 AM

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

It is not true in general that selects get converted to branches on PowerPC. For example, integer selects can be lowered into an actual "integer select" instruction under the "right" circumstances.
I'm sure there is a way to prevent the selects from being produced, but I am not sure this would have any effect on this optimization. For the specific test case that is added as part of this patch, it appears that SROA produces the selects. I haven't personally looked at the semantics of SROA and the heuristics it uses to produce a select or not, but there is presumably a way to tell it not to do so.

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

It is not true in general that selects get converted to branches on PowerPC. For example, integer selects can be lowered into an actual "integer select" instruction under the "right" circumstances.
I'm sure there is a way to prevent the selects from being produced, but I am not sure this would have any effect on this optimization. For the specific test case that is added as part of this patch, it appears that SROA produces the selects. I haven't personally looked at the semantics of SROA and the heuristics it uses to produce a select or not, but there is presumably a way to tell it not to do so.

In general, we want to produce selects rather than explicit branching due to superblock formation etc. On Power9 we'll find that the integer select instruction is also much faster than on Power8 and so will experience less of a penalty as well.

You marked things done that aren't done. What's going on? :)

-eric

lib/CodeGen/BranchCoalescing.cpp
231

You've marked this done, but it's not done?

lei marked an inline comment as done.Feb 22 2017, 11:49 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
290–304

okay.

lei marked an inline comment as done.Feb 22 2017, 11:51 AM

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

lei marked an inline comment as not done.Feb 22 2017, 11:53 AM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
290–304

Will reorganize this.

In D28249#683815, @lei wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

Thanks :)

I very much look at it to make sure everything I asked for was done before coming back to reviews.

lei updated this revision to Diff 89397.Feb 22 2017, 12:09 PM

Address review comments.

lei marked 2 inline comments as done.Feb 22 2017, 12:10 PM
In D28249#683815, @lei wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

Thanks :)

I very much look at it to make sure everything I asked for was done before coming back to reviews.

Now it's really done :)

Bunch of inline comments.

Thanks!

-eric

include/llvm/CodeGen/Passes.h
403

Nit: Complete sentences in comments please.

lib/CodeGen/BranchCoalescing.cpp
353

"beginning"

368

"contains"

379

I don't think this has ever been changed, but I'm not against a separate function. That said, doesn't transferSuccessorsAndUpdatePHIs already do this?

Also, this loop needs documentation :)

393–394

This seems like an odd comment.

394

Should probably assert this somewhere.

405

I think it makes just as much sense (and is more readable) if you split the function into canMoveToBeginning and canMoveToEnd and just drop the boolean parameter.

It'll also make it easier to split the docs.

454–467

Possible you just want to replace this with a bunch of llvm_unreachables?

480

Why?

488

Sounds like an assert.

517

Seems like you want to swap these conditionals?

535

else if? Otherwise what happens?

758

This was to be turned on in a subsequent patch. Remove this and the ppc support please :)

lei added inline comments.Feb 24 2017, 8:35 AM
include/llvm/CodeGen/Passes.h
403

okay

lib/CodeGen/BranchCoalescing.cpp
379

It's because here we only want to move the PHIs from SourceMBB to TargetMBB. SourceMBB will be deleted later. We don't want to transfer the successor info here. Control flow need to be updated once we have deleted SourceMBB.

393–394

A MI instruction can only be moved to TargetMBB if there are no uses of it within the TargetMBB's PHI nodes. Will reword the comment.

394

We don't assert cause we first check one direction (canMoveToBeginning) and then check the other direction (canMoveToEnd). If both fails, we just don't move the instruction and move to investigate the next one.

405

agreed.

454–467

yup

480

Maybe preference is the wrong word here... this is by design. Will update the doc

488

we do assert on that, will update the doc here.

535

No else if. We check conditions to move up and down separately and verify that only one direction is valid.

lei updated this revision to Diff 89775.Feb 24 2017, 10:50 PM

Address review comments.

lei marked 12 inline comments as done.Feb 24 2017, 10:54 PM
lei added inline comments.
lib/CodeGen/BranchCoalescing.cpp
758

see line 161

lei marked an inline comment as done.Feb 24 2017, 10:56 PM
echristo accepted this revision.Feb 28 2017, 1:33 PM

LGTM.

Thanks! Now let's get some performance testing and go from there.

-eric

This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Apr 10 2017, 4:34 PM

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.
  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.
  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?
  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

Hi Matthias,

I'll let Lei talk to more of it, however...

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.

Would you prefer s/IR/MIR? :)

That said, it's definitely meant to be an MI pass.

  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.

Yep. The arm and mips backends also do the same thing.

  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?

I agree up to a point, though I don't have any particular ideas here how to fix it. My thought was "let the backends expand into the easiest code as possible and then clean it up with a generic pass", but I'm definitely open to different ideas.

  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

It's not so much that as the original patch was on by default and required changes to the ARM testcases, but the motivating example was particularly poor code on Power. We could probably throw some extra tests in, but it seems like a stretch to make all of the examples quite so universal? The code itself, of course, has no backend dependencies or even TTI.

Thoughts?

-eric

To say this first: I'm fine with the current implementation.

Hi Matthias,

I'll let Lei talk to more of it, however...

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

  • The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.

Would you prefer s/IR/MIR? :)

This comment was mostly about the description of this patch which was confusing as it only talked about IR. Actually looking at the code the documentation comment above the class is a lot better and explains the problem just fine.

That said, it's definitely meant to be an MI pass.

  • When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.

Yep. The arm and mips backends also do the same thing.

  • It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?

I agree up to a point, though I don't have any particular ideas here how to fix it. My thought was "let the backends expand into the easiest code as possible and then clean it up with a generic pass", but I'm definitely open to different ideas.

Again no hard rule. My intuition would be that we would get by with less complexity and less code by improving expandisel pseudos; I would expect that matching the pattern and checking for correctness should be quite a bit easier there. But this isn't a big issue and now that Lei already went through the trouble of creating a generic pass we may just as well keep it.

  • This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

It's not so much that as the original patch was on by default and required changes to the ARM testcases, but the motivating example was particularly poor code on Power. We could probably throw some extra tests in, but it seems like a stretch to make all of the examples quite so universal? The code itself, of course, has no backend dependencies or even TTI.

Please ignore this comment. I was mainly triggered by "After Branch Coalescing" showing up in -print-after-all dumps but the pass never changing anything and the comments above about this only working on powerpc; Turns out the pass just checks the enable-branch-coalesce isn't passed as a commandline flag. I think most passes have a cl::opt in TargetPassConfig.cpp instead and we don't even add the pass the pipeline if it is disable, you could change to that style too to avoid people seeing the dump for a disabled pass :)

Good to hear this is a generic pass, I am pretty sure the code equally bad on the other targets and it is good that we fixed it. I'll add a ticket into our bugtracker to experiment/evaluate on X86/AArch64.

  • Matthias
lei added a comment.Apr 28 2017, 7:59 AM

I think most passes have a cl::opt in TargetPassConfig.cpp instead and we don't even add the pass the pipeline if it is disable, you could change to that style too to avoid people seeing the dump for a disabled pass :)

Will address this in the next patch to turn this on by default for PowerPC.
Thanks

iteratee added inline comments.
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340 ↗(On Diff #90219)

This line isn't correct. I have a case where instructions that "produce the same value" are different. The relevant sequence is:

BB#16: 
...
        BCTRL8_LDinto_toc
...
        %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
        %vreg141<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg141 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
        %vreg142<def> = XXLXORdpz; F8RC:%vreg142
        BC %vreg140, <BB#73>; CRBITRC:%vreg140

BB#72: derived from LLVM BB %114
    Predecessors according to CFG: BB#16
    Successors according to CFG: BB#73(?%)

BB#73: derived from LLVM BB %114
    Predecessors according to CFG: BB#16 BB#72
        %vreg143<def> = PHI %vreg142, <BB#72>, %vreg141, <BB#16>; F8RC:%vreg143,%vreg142,%vreg141
...
        BCTRL8_LDinto_toc
...
        %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
        %vreg150<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg150 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
        BC %vreg149, <BB#75>; CRBITRC:%vreg149
    Successors according to CFG: BB#74(?%) BB#75(?%)

BB#74: derived from LLVM BB %114
    Predecessors according to CFG: BB#73
    Successors according to CFG: BB#75(?%)

BB#75: derived from LLVM BB %114
    Predecessors according to CFG: BB#73 BB#74
        %vreg151<def> = PHI %vreg142, <BB#74>, %vreg150, <BB#73>; F8RC:%vreg151,%vreg142,%vreg150

The debug output produces:

Valid Candidate
Op1: 1024
Op2: 1024
Op1 and Op2 are identical!
Op1: %vreg140
Op2: %vreg149
Op1Def: %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
 and %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
 produce the same value!

While it would be safe to CSE those crmoves, what definitely cannot occur is to assume that the value of CR0GT has not changed between the 2 instructions.

hfinkel added inline comments.Aug 30 2017, 4:33 PM
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340 ↗(On Diff #90219)

Indeed; I think that we need to filter out instructions with physical-register uses (even if identical). We might be able to be a little smarter about it in a similar way to MachineLICM (the other uses of this function), which does this:

// Don't hoist an instruction that uses or defines a physical register.
if (TargetRegisterInfo::isPhysicalRegister(Reg)) {
  if (MO.isUse()) {
    // If the physreg has no defs anywhere, it's just an ambient register
    // and we can freely move its uses. Alternatively, if it's allocatable,
    // it could get allocated to something with a def during allocation.
    // However, if the physreg is known to always be caller saved/restored
    // then this use is safe to hoist.
    if (!MRI->isConstantPhysReg(Reg) &&
        !(TRI->isCallerPreservedPhysReg(Reg, *I.getParent()->getParent())))
        return false;
    // Otherwise it's safe to move.
    continue;
  } else if (!MO.isDead()) {
    // A def that isn't dead. We can't move it.
    return false;
  ...
lei added a comment.Sep 6 2017, 7:34 PM
llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340 ↗(On Diff #90219)