This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/trunk/
-
trunk/
-
include/llvm/
-
llvm/
-
CodeGen/
-
Passes.h
-
InitializePasses.h
-
lib/CodeGen/
-
CodeGen/
3
BranchCoalescing.cpp
-
CMakeLists.txt
-
CodeGen.cpp
-
TargetPassConfig.cpp
-
test/CodeGen/PowerPC/
-
CodeGen/
-
PowerPC/
-
branch_coalesce.ll

Differential D28249

Improve scheduling with branch coalescing
ClosedPublic

Authored by lei on Jan 3 2017, 12:39 PM.

Download Raw Diff

Details

Reviewers

echristo
Carrot
kbarton
hfinkel
nemanjai

Commits

rGb223cfabcc6b: Improve scheduling with branch coalescing
rL296670: Improve scheduling with branch coalescing

Summary

Improve scheduling by coalescing branches that depend on the same condition. This pass looks for blocks that are guarded by the same branch condition in the IR and attempts to merge the blocks together. This is done by moving code either up/down to it’s predecessor/successor blocks.

On power8 LE, we see a 11% improvement for lbm and 28% improvement for mcf (SPEC2006).

I tried the following test on ARM and X86.

$ cat branchC.ll
; RUN: llc -mcpu=generic -mtriple=x86_64-unknown-linux -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -mtriple=armv6-unknown-linux-gnu < %s | FileCheck %s
; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu | FileCheck %s

; Function Attrs: nounwind
define double @testBranchCoal(double %a, double %b, double %c, i32 %x) {
entry:

%test = icmp eq i32 %x, 0
%tmp1 = select i1 %test, double %a, double 2.000000e-03
%tmp2 = select i1 %test, double %b, double 0.000000e+00
%tmp3 = select i1 %test, double %c, double 5.000000e-03

%res1 = fadd double %tmp1, %tmp2
%result = fadd double %res1, %tmp3
ret double %result

}

This does not affect ARM since the LLVM IR produced does not conform to the pattern we expected.
For X86, the branches were not coalesced since the terminator produced contain implicit operands. This code will only coalesce branches whose terminators contain explicit operands.

This is originally reported in: https://llvm.org/bugs/show_bug.cgi?id=25219

Diff Detail

Repository: rL LLVM

Event Timeline

lei updated this revision to Diff 82932.Jan 3 2017, 12:39 PM

lei retitled this revision from to Improve scheduling with branch coalescing.

lei updated this object.

lei added reviewers: Carrot, nemanjai, kbarton, echristo.

lei added a subscriber: llvm-commits.

Herald added subscribers: mgorny, mehdi_amini, aemerson. · View Herald TranscriptJan 3 2017, 12:39 PM

lei added a reviewer: hfinkel.Jan 3 2017, 12:44 PM

lei added subscribers: jtony, sfertile, syzaara, craig.topper.

These inline comments are based on a first-pass reading of the patch. As such, they refer to issues that are rather local in scope. This is a large patch so I'll set aside some more time to fully understand it in its entirety in the next review cycle.

lib/CodeGen/BranchCoalescing.cpp
217 ↗	(On Diff #82932)	We want to reset the stats for every function? I would have assumed we're interested in collecting this for the entire module.
227 ↗	(On Diff #82932)	I am personally not a fan of functions that modify an output parameter and then fail. Presumably this makes no difference now, but if we end up extending this pass in the future to have some backtracking/requeuing capabilities, it would be nice if the object wasn't modified/invalidated in some way.
336 ↗	(On Diff #82932)	Use same debug format for both (i.e. space after the colon).
344 ↗	(On Diff #82932)	Won't this fail (as in assert) if `Op2.isReg() != true`?
356 ↗	(On Diff #82932)	Maybe a comment as to why we fail here if we get physical registers. Also, if I follow the control flow here correctly, we will return `true` here if one is a physical register and the other is a virtual register. Is that what we want? I imagine not.
378 ↗	(On Diff #82932)	Not that it makes a difference computationally, but perhaps for consistency: for (MachineBasicBlock::instr_iterator Instr : MBB.instrs()) if (Instr->isPHI()) return Instr; return MBB.instr_end(); Also, I'm not sure what the style guide says regarding the use of `auto` there, but perhaps instead of `MachineBasicBlock::instr_iterator`, it should be `auto`. Finally, if every use of this function will just use `begin()` instead of `end()` when there are no existing PHI's, perhaps it would make sense to actually return `begin()` if there are no PHI's in the MBB.
393 ↗	(On Diff #82932)	Can you please state why something like `MachineBasicBlock::SkipPHIsLabelsAndDebug()` cannot be used instead of this function (thereby eliminating the need for that function altogether)?
409 ↗	(On Diff #82932)	You should have a range of iterators that represent the PHI's in the `From` block. Why not use the range-version of `splice()` after this loop? If it is to avoid the empty range check on the PHI's in the `From` block, maybe just a comment to that end.
415 ↗	(On Diff #82932)	The second sentence is redundant.
419 ↗	(On Diff #82932)	'... used in this block ...' means '... used in original block ...'? Overall, I think this comment is kind of hard to follow. I think it would suffice to say when an instruction can move to the begin/end location. Something along the lines of: An instruction MI can move from MBB Source to MBB Target under the following conditions: There are no PHI's in Target that use what MI defines No instructions in Source use what MI defines (unless Target is a dominator of Source) No instructions in Source define what MI uses (unless Source is a dominator of Target) If any instructions in Target define what MI uses, the MI can only move to the end of Target
442 ↗	(On Diff #82932)	I don't understand the purpose of the comment.
466 ↗	(On Diff #82932)	I don't think this function suffices to determine whether an instruction can move to a block. Example MBB's: BB#0: ... ; some non-PHI def of %vreg2 ; some non-PHI def of %vreg3 %vreg10 = ADD %vreg2, %vreg3 ; MI ... BB#1: ; MBB1 %vreg5 = PHI %vreg10, <BB#0>, %vreg4, <BB#2> ... The call: `canMoveTo(MI, MBB1, false)` Would return true, but it is not safe to do so. I think this patch uses this query in very limited circumstances, so it isn't necessarily required for this function to determine whether it is safe to move an arbitrary instruction to an arbitrary block. But if this is the case, we should add asserts to eliminate incorrect uses of the function (in the future).
507 ↗	(On Diff #82932)	Why are all of these asserts rather than early exits returning false? On builds that compile asserts out, I imagine we could end up doing the wrong thing.
510 ↗	(On Diff #82932)	I don't think there's anything to be gained from printing the name of the function emitting the message in the debug output. Please remove this.
554 ↗	(On Diff #82932)	The use of 'From', 'from', 'To' and 'to' makes the comment confusing. Please consider renaming the parameters or otherwise clarifying this comment. Perhaps something like: `Merge the CFG triangle \p From into the preceding CFG triangle \p To.`
608 ↗	(On Diff #82932)	Are these two asserts enough or does `From.BranchBlock` need to [immediately] dominate both `To.BranchTargetBlock` and `To.FallThroughBlock`?
620 ↗	(On Diff #82932)	You've already moved all the PHI's from `From.BranchBlock`. Can't this just be a loop over `From.BranchBlock->instrs()`? Or it may need to be a loop from `rbegin()` to `rend()`.
626 ↗	(On Diff #82932)	This is loop-invariant. I think it'd be cleaner to move the assert out of the loop and call `splice()` on the result of a ternary operator. Maybe even invert the `if` and the loop.
655 ↗	(On Diff #82932)	Where is the assert to ensure this?

lei marked 12 inline comments as done.Jan 22 2017, 9:58 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
217 ↗	(On Diff #82932)	Initialization removed.
227 ↗	(On Diff #82932)	agreed.
336 ↗	(On Diff #82932)	fixed
344 ↗	(On Diff #82932)	No, since we check that Op2 and Op1 are the same type on line 330. We do an early exit if they are not the same type. getType() returns an enum MachineOperandType, isReg() basically checks that the MachineOperand's type (MachineOperandType) is of the specific enum MO_Register.
356 ↗	(On Diff #82932)	It was just put in to see if we trip on it. I have removed this.
378 ↗	(On Diff #82932)	Can begin() also be a PHI? If not, then that is okay, if it can then it is better to return end() so that we know there are no PHIs in this block.
393 ↗	(On Diff #82932)	MachineBasicBlock::SkipPHIsLabelsAndDebug() returns the iterator to the first NON PHI, non-label instruction. We want the iterator to the first PHI node in a MBB. I don't see any function within that class that returns the iterator to the first PHI.
415 ↗	(On Diff #82932)	agreed
419 ↗	(On Diff #82932)	Arguments renamed and doc updated to be more clear.
466 ↗	(On Diff #82932)	You are right, this function alone can not be used to determine whether an instruction can move to a random block. This function is a helper function used by canMerge() to determine whether 2 candidates can be merged.
507 ↗	(On Diff #82932)	This seem to be the case for other classes like MachineOperand. If assert is off it could end up doing the wrong thing....
510 ↗	(On Diff #82932)	okay
554 ↗	(On Diff #82932)	Renamed "From/To" to "SourceRegion/TargetRegion". I believe this is now less confusing ...
608 ↗	(On Diff #82932)	From.BranchBlock MUST equal To.BranchTargetBlock, not dominate.
620 ↗	(On Diff #82932)	agreed
626 ↗	(On Diff #82932)	assert moved to top of function
655 ↗	(On Diff #82932)	In analyzeBranch(), line 297, we do an early exit if the FallThroughBlock contain code.

lei marked 12 inline comments as done.Jan 22 2017, 10:29 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
620 ↗	(On Diff #82932)	changing this to be a range based splice call instead.

nemanjai added inline comments.Jan 22 2017, 12:21 PM

lib/CodeGen/BranchCoalescing.cpp
344 ↗	(On Diff #82932)	Oh OK. I missed the early exit check. I think the layout of this function is a little confusing so I can see why I missed it. It seems that the conditions under which we return true are (any of): The lists are pair-wise identical For any pairs that aren't identical, they're both registers that produce the same value Why not such a simple layout to the function? For example, your check for same type is repeated in the call to `MachineOperand::isIdenticalTo()`. It seems to me like a layout such as this for the loop encapsulates what this function should do: for (unsigned i = 0; i < OpList1.size(); ++i) { const MachineOperand &Op1 = OpList1[i]; const MachineOperand &Op2 = OpList2[i]; DEBUG(dbgs() << "Op1: " << Op1 << "\n" << "Op2: " << Op2 << "\n"); if (Op1.isIdenticalTo(Op2)) { DEBUG(dbgs() << "Op1 and Op2 are identical!\n"); return true; } if (Op1.isReg() && Op2.isReg() && TargetRegisterInfo::isVirtualRegister(Op1.getReg()) && TargetRegisterInfo::isVirtualRegister(Op2.getReg())) { if (TII->produceSameValue(Op1Def, Op2Def, MRI)) { DEBUG(dbgs() << "Operands produce the same value.\n"); return true; } } DEBUG(dbgs() << "The operands are not provably identical.\n"); return false; }
356 ↗	(On Diff #82932)	I'm afraid this won't suffice. If it were to somehow happen that (at least) one is a physical register, you'll return true here - regardless of whether it's the same register (not that it really matters whether it's the same register or not).
378 ↗	(On Diff #82932)	Are you referring to some potential future use of this function? Because this result isn't currently used to determine whether the block has any PHI's. But I suppose you're right it is potentially dangerous for it to return a valid iterator when there are no PHI's. Also, the work this function does just seems like overkill. It seems like you should only be checking instructions between `MBB.instr_begin()` and `MBB.getFirstNonPHI()`. Why does that not suffice? In any case, considering this function is only used once and in many cases it'll return a value that you're just going to discard and use the `begin()` pointer, I don't think this function should exist. Just find the insert location in the function where you need it.
386 ↗	(On Diff #82932)	It seems below like you're inserting them before any existing PHI instructions.
393 ↗	(On Diff #82932)	OK, that makes sense. I suppose you want to make sure that you skip anything that comes between `begin()` and the first PHI node in the `ToMBB`. But I guess what I'm confused about is exactly what this function is meant to do. It appears that it will do the following: Iterate over all the PHI nodes in MBB called `FromMBB` If there are any PHI nodes that refer to the same MBB, they'll be updated so they refer to the MBB called `ToMBB` Then each of those PHI nodes will be moved to MBB called `ToMBB` before any existing PHI nodes in `ToMBB` However, what happens to any PHI nodes in `ToMBB` that refer to `FromMBB`? They don't need to be updated?
466 ↗	(On Diff #82932)	Yes, but please add asserts to ensure this function isn't used incorrectly in the future. It is dangerous for a function called `canMoveTo` to return true when an instruction can't safely move to a basic block.
608 ↗	(On Diff #82932)	The two are not mutually exclusive. Also, this doesn't answer whether `From.BranchBlock` needs to dominate `To.FallThroughBlock`.
655 ↗	(On Diff #82932)	Sure. Now there is. I'd still prefer an assert. If someone in the future feels that `analyzeBranch()` doesn't need that early exit any longer, they'll need to ensure they account for this assert and why it's in this function.
729 ↗	(On Diff #82932)	Setting a bool to true and then using it in an if statement without any statements that could modify it in between is redundant. Besides, I think it's overly verbose to dump the entire function after every merge - I'd imagine that a dump after all the merging is done would suffice.

nemanjai added inline comments.Jan 23 2017, 9:54 AM

lib/CodeGen/BranchCoalescing.cpp
344 ↗	(On Diff #82932)	Sorry, the two `return true;` lines should be `continue;` so that you keep checking the remainder of the lists rather than returning true if the first pair are identical.

lei marked 12 inline comments as done.Jan 23 2017, 10:19 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp

344 ↗

(On Diff #82932)

Minor change so that we are doing early exit when operands within the operand lists are detected to be diff.

for (unsigned i = 0; i < OpList1.size(); ++i) {
   const MachineOperand &Op1 = OpList1[i];
   const MachineOperand &Op2 = OpList2[i];

   DEBUG(dbgs() << "Op1: " << Op1 << "\n"
                << "Op2: " << Op2 << "\n");

   if (Op1.isIdenticalTo(Op2)) {
     DEBUG(dbgs() << "Op1 and Op2 are identical!\n");
     continue;
   }

   // If the operands are not identical, but are registers, check to see if the
   // definition of the register produces the same value. If they produce the
   // same value, consider them to be identical.
   if (Op1.isReg() && Op2.isReg() &&
       TargetRegisterInfo::isVirtualRegister(Op1.getReg()) &&
       TargetRegisterInfo::isVirtualRegister(Op2.getReg())) {
     MachineInstr *Op1Def = MRI->getVRegDef(Op1.getReg());
     MachineInstr *Op2Def = MRI->getVRegDef(Op2.getReg());
     if (TII->produceSameValue(*Op1Def, *Op2Def, MRI)) {
       DEBUG(dbgs() << "Op1Def: " << *Op1Def << " and " << *Op2Def
                    << " produce the same value!\n");
     } else {
       DEBUG(dbgs() << "Operands produce different values\n");
       return false;
     }
   } else {
     DEBUG(dbgs() << "The operands are not provably identical.\n");
     return false;
   }
 }

356 ↗

(On Diff #82932)

changes in previous comment will address this issue as well.

466 ↗

(On Diff #82932)

We always return false when an instruction isn't safe to remove. This function just does a check to see if it's valid to move an instruction to the BB at the specified location. If won't work if it asserts when it can't be moved.

lei marked 6 inline comments as done.Jan 23 2017, 10:48 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
378 ↗	(On Diff #82932)	will remove function and add loop to location where it's being used.
386 ↗	(On Diff #82932)	Yes. They should be inserted before existing PHIs. Will update the doc.
393 ↗	(On Diff #82932)	This function moves PHI nodes in the FromMBB to the beginning of the PHI block in ToMBB. If any PHI nodes in ToMBB reference registers defined in PHI nodes in FromMBB, the PHI nodes in FromMBB can not be moved to ToMBB since we can not infer order in the PHI node execution.

lei marked 8 inline comments as done.Jan 23 2017, 10:50 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
344 ↗	(On Diff #82932)	see code snippet above.

lei marked an inline comment as done.Jan 24 2017, 12:34 PM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
655 ↗	(On Diff #82932)	okay
729 ↗	(On Diff #82932)	For debug purposes, it would be nice to see the stages of merging... Will remove the if statement.

lei marked 3 inline comments as done.Jan 24 2017, 12:50 PM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
378 ↗	(On Diff #82932)	Actually according to the llvm lang ref, PHI instructions must be first in a basic block. Will remove this function and replace the function call below with MBB->begin()

lei added inline comments.Jan 24 2017, 12:50 PM

lib/CodeGen/BranchCoalescing.cpp
409 ↗	(On Diff #82932)	will update to use range version of splice()

lei marked an inline comment as done.Jan 24 2017, 9:01 PM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
608 ↗	(On Diff #82932)	The same kind of checks are needed for function canMerge and mergeCandidate, Put the checks into function validateCandidates() and call it to verify MBBs and do an early exit if returns false.

lei marked an inline comment as done.Jan 25 2017, 8:46 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
442 ↗	(On Diff #82932)	will remove this redundant check since MI.defs() return register definitions.

Updated patch to address all the review comments from Nemanja.

Add guard to only verify MF which this pass have modified.

Just a friendly reminder that this patch is waiting to be reviewed. Thanks!

Aside from a couple minor comments, this LGTM.

It would be good if @hfinkel or @echristo could take a look also.

lib/CodeGen/BranchCoalescing.cpp
119 ↗	(On Diff #86082)	Branch Coalescing not Branch Coalesce
753 ↗	(On Diff #86082)	Did we conclude whether this is different from the checks done by the verifyAfter parameter in addPass?

This revision is now accepted and ready to land.Feb 8 2017, 12:10 PM

I'll look today.

In general pretty happy with this. A few comments inline.

A few top level comments:

What performance testing have you done with this?
Looks like SPEC on ppc?
Anything else?
Same for correctness checking. You mentioned that this doesn't affect ARM and then change a thumb testcase, what's up there?

Thanks!

-eric

lib/CodeGen/BranchCoalescing.cpp
31 ↗	(On Diff #86082)	Let's have this default to off for the initial commit and then we can turn it on in a subsequent one.
230 ↗	(On Diff #86082)	Probably should be something like "canCoalesceBranch" instead of analyzeBranch. All of the analysis is being done by analyzeBranch really and this is actually figuring out whether or not we should coalesce.
252 ↗	(On Diff #86082)	Meta comment on the DEBUG statements. I like them, but it might be nice to give a summary of what you're analyzing here before all of the "can't do it" statements.
289–303 ↗	(On Diff #86082)	I believe the case you're looking for here is also whether or not there's a single fall through? Might be nice to reorganize the code with that in mind.
363 ↗	(On Diff #86082)	Could use an explanation of why - it confused Nemanja at first and could confuse others.
612 ↗	(On Diff #86082)	llvm_unreachable
642 ↗	(On Diff #86082)	Nit: All comments should be complete sentences. (A few other occurrences)
663–666 ↗	(On Diff #86082)	This set of comments is very confusing.

In D28249#672204, @echristo wrote:

In general pretty happy with this. A few comments inline.

A few top level comments:

What performance testing have you done with this?
Looks like SPEC on ppc?
Anything else?
Same for correctness checking. You mentioned that this doesn't affect ARM and then change a thumb testcase, what's up there?

Thanks!

-eric

Yes, the only performance tests done was SPEC on PPC.

Seems I was testing with the wrong triple for ARM. The -mtriple I was testing with was "-mtriple=armv6-unknown-linux-gnu" which generated a pattern that we do not recognize. The test case below uses
-mtriple=thumb-apple-darwin and -mtriple=thumb-pc-linux-gnueabi. The select statements for those triples does generate the pattern we are looking for.

I have updated the batch to be turned off by default.

lei added inline comments.Feb 13 2017, 1:03 PM

lib/CodeGen/BranchCoalescing.cpp
31 ↗	(On Diff #86082)	okay
119 ↗	(On Diff #86082)	okay
230 ↗	(On Diff #86082)	true
252 ↗	(On Diff #86082)	okay

lei marked 5 inline comments as done.Feb 20 2017, 8:33 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
363 ↗	(On Diff #86082)	Sine PHI node ordering can not be assumed, it doesn't really matter where we place the PHI instructions. Will update comment to reflect this.

lei marked 5 inline comments as done.Feb 20 2017, 10:03 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
663–666 ↗	(On Diff #86082)	okay

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

lei marked 5 inline comments as done.Feb 21 2017, 10:11 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
612 ↗	(On Diff #86082)	okay

lei marked 3 inline comments as done.Feb 21 2017, 10:43 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
642 ↗	(On Diff #86082)	k.
753 ↗	(On Diff #86082)	Yes. This is different from the checks done by the verifyAfter parameter in addPass. The verifyAfter parameter for this pass is set to true by default.

lei marked 4 inline comments as done.Feb 21 2017, 11:00 AM

In D28249#681685, @hiraditya wrote:

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

It is not true in general that selects get converted to branches on PowerPC. For example, integer selects can be lowered into an actual "integer select" instruction under the "right" circumstances.
I'm sure there is a way to prevent the selects from being produced, but I am not sure this would have any effect on this optimization. For the specific test case that is added as part of this patch, it appears that SROA produces the selects. I haven't personally looked at the semantics of SROA and the heuristics it uses to produce a select or not, but there is presumably a way to tell it not to do so.

In D28249#682567, @nemanjai wrote:

In D28249#681685, @hiraditya wrote:

Because select gets converted to branches in PowerPC, I was wondering if there is a way to not generate selects in the first place.

It is not true in general that selects get converted to branches on PowerPC. For example, integer selects can be lowered into an actual "integer select" instruction under the "right" circumstances.
I'm sure there is a way to prevent the selects from being produced, but I am not sure this would have any effect on this optimization. For the specific test case that is added as part of this patch, it appears that SROA produces the selects. I haven't personally looked at the semantics of SROA and the heuristics it uses to produce a select or not, but there is presumably a way to tell it not to do so.

In general, we want to produce selects rather than explicit branching due to superblock formation etc. On Power9 we'll find that the integer select instruction is also much faster than on Power8 and so will experience less of a penalty as well.

You marked things done that aren't done. What's going on? :)

-eric

lib/CodeGen/BranchCoalescing.cpp
230 ↗	(On Diff #86082)	You've marked this done, but it's not done?

lei marked an inline comment as done.Feb 22 2017, 11:49 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
289–303 ↗	(On Diff #86082)	okay.

In D28249#682608, @echristo wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

lei marked an inline comment as not done.Feb 22 2017, 11:53 AM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
289–303 ↗	(On Diff #86082)	Will reorganize this.

In D28249#683815, @lei wrote:

In D28249#682608, @echristo wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

Thanks :)

I very much look at it to make sure everything I asked for was done before coming back to reviews.

Address review comments.

In D28249#683829, @echristo wrote:

In D28249#683815, @lei wrote:

In D28249#682608, @echristo wrote:

You marked things done that aren't done. What's going on? :)

-eric

Someone told me no one looked at that so I was using it to mark what I have done in my version ... which is not uploaded yet :) Will stop doing that now that I know it is actually being noted on.

Thanks :)

I very much look at it to make sure everything I asked for was done before coming back to reviews.

Now it's really done :)

Bunch of inline comments.

Thanks!

-eric

include/llvm/CodeGen/Passes.h
406 ↗	(On Diff #89397)	Nit: Complete sentences in comments please.
lib/CodeGen/BranchCoalescing.cpp
352 ↗	(On Diff #89397)	"beginning"
367 ↗	(On Diff #89397)	"contains"
392–393 ↗	(On Diff #89397)	This seems like an odd comment.
404 ↗	(On Diff #89397)	I think it makes just as much sense (and is more readable) if you split the function into canMoveToBeginning and canMoveToEnd and just drop the boolean parameter. It'll also make it easier to split the docs.
453–466 ↗	(On Diff #89397)	Possible you just want to replace this with a bunch of llvm_unreachables?
479 ↗	(On Diff #89397)	Why?
487 ↗	(On Diff #89397)	Sounds like an assert.
516 ↗	(On Diff #89397)	Seems like you want to swap these conditionals?
534 ↗	(On Diff #89397)	else if? Otherwise what happens?
757 ↗	(On Diff #89397)	This was to be turned on in a subsequent patch. Remove this and the ppc support please :)
378 ↗	(On Diff #82932)	I don't think this has ever been changed, but I'm not against a separate function. That said, doesn't transferSuccessorsAndUpdatePHIs already do this? Also, this loop needs documentation :)
393 ↗	(On Diff #82932)	Should probably assert this somewhere.

lei added inline comments.Feb 24 2017, 8:35 AM

include/llvm/CodeGen/Passes.h
406 ↗	(On Diff #89397)	okay
lib/CodeGen/BranchCoalescing.cpp
392–393 ↗	(On Diff #89397)	A MI instruction can only be moved to TargetMBB if there are no uses of it within the TargetMBB's PHI nodes. Will reword the comment.
404 ↗	(On Diff #89397)	agreed.
453–466 ↗	(On Diff #89397)	yup
479 ↗	(On Diff #89397)	Maybe preference is the wrong word here... this is by design. Will update the doc
487 ↗	(On Diff #89397)	we do assert on that, will update the doc here.
534 ↗	(On Diff #89397)	No else if. We check conditions to move up and down separately and verify that only one direction is valid.
378 ↗	(On Diff #82932)	It's because here we only want to move the PHIs from SourceMBB to TargetMBB. SourceMBB will be deleted later. We don't want to transfer the successor info here. Control flow need to be updated once we have deleted SourceMBB.
393 ↗	(On Diff #82932)	We don't assert cause we first check one direction (canMoveToBeginning) and then check the other direction (canMoveToEnd). If both fails, we just don't move the instruction and move to investigate the next one.

Address review comments.

lei marked 12 inline comments as done.Feb 24 2017, 10:54 PM

lei added inline comments.

lib/CodeGen/BranchCoalescing.cpp
757 ↗	(On Diff #89397)	see line 161

lei marked an inline comment as done.Feb 24 2017, 10:56 PM

LGTM.

Thanks! Now let's get some performance testing and go from there.

-eric

Closed by commit rL296670: Improve scheduling with branch coalescing (authored by nemanjai). · Explain WhyMar 1 2017, 12:41 PM

This revision was automatically updated to reflect the committed changes.

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.
When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.
It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?
This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

Hi Matthias,

I'll let Lei talk to more of it, however...

In D28249#723145, @MatzeB wrote:

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.

Would you prefer s/IR/MIR? :)

That said, it's definitely meant to be an MI pass.

When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.

Yep. The arm and mips backends also do the same thing.

It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?

I agree up to a point, though I don't have any particular ideas here how to fix it. My thought was "let the backends expand into the easiest code as possible and then clean it up with a generic pass", but I'm definitely open to different ideas.

This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

It's not so much that as the original patch was on by default and required changes to the ARM testcases, but the motivating example was particularly poor code on Power. We could probably throw some extra tests in, but it seems like a stretch to make all of the examples quite so universal? The code itself, of course, has no backend dependencies or even TTI.

Thoughts?

-eric

To say this first: I'm fine with the current implementation.

In D28249#739046, @echristo wrote:

Hi Matthias,

I'll let Lei talk to more of it, however...

In D28249#723145, @MatzeB wrote:

Just noticed we have a new pass in the codegen pipeline and wondered what it is about. Some comments:

The description/examples talk about the same branch condition in the IR but the IR doesn't even have branches and this is an MI pass, not an IR pass.

Would you prefer s/IR/MIR? :)

This comment was mostly about the description of this patch which was confusing as it only talked about IR. Actually looking at the code the documentation comment above the class is a lot better and explains the problem just fine.

That said, it's definitely meant to be an MI pass.

When I codegen the given example on X86 I do indeed see silly code getting generated because isel chooses a "CMOV_FR64" for each of the select instructions which "Expand ISel Pseudo-instructions" later expands into 3 if-diamonds that all have the same condition.

Yep. The arm and mips backends also do the same thing.

It looks like we created a whole new pass to fix Expand ISel Pseudo begin stupid?

I agree up to a point, though I don't have any particular ideas here how to fix it. My thought was "let the backends expand into the easiest code as possible and then clean it up with a generic pass", but I'm definitely open to different ideas.

Again no hard rule. My intuition would be that we would get by with less complexity and less code by improving expandisel pseudos; I would expect that matching the pattern and checking for correctness should be quite a bit easier there. But this isn't a big issue and now that Lei already went through the trouble of creating a generic pass we may just as well keep it.

This is a generic codegen pass but the description here indicates it only ever happens to match the patterns on PowerPC?

It's not so much that as the original patch was on by default and required changes to the ARM testcases, but the motivating example was particularly poor code on Power. We could probably throw some extra tests in, but it seems like a stretch to make all of the examples quite so universal? The code itself, of course, has no backend dependencies or even TTI.

Please ignore this comment. I was mainly triggered by "After Branch Coalescing" showing up in -print-after-all dumps but the pass never changing anything and the comments above about this only working on powerpc; Turns out the pass just checks the enable-branch-coalesce isn't passed as a commandline flag. I think most passes have a cl::opt in TargetPassConfig.cpp instead and we don't even add the pass the pipeline if it is disable, you could change to that style too to avoid people seeing the dump for a disabled pass :)

Good to hear this is a generic pass, I am pretty sure the code equally bad on the other targets and it is good that we fixed it. I'll add a ticket into our bugtracker to experiment/evaluate on X86/AArch64.

Matthias

I think most passes have a cl::opt in TargetPassConfig.cpp instead and we don't even add the pass the pipeline if it is disable, you could change to that style too to avoid people seeing the dump for a disabled pass :)

Will address this in the next patch to turn this on by default for PowerPC.
Thanks

iteratee added a subscriber: iteratee.Aug 30 2017, 3:47 PM

iteratee added inline comments.

llvm/trunk/lib/CodeGen/BranchCoalescing.cpp

340

This line isn't correct. I have a case where instructions that "produce the same value" are different. The relevant sequence is:

BB#16: 
...
        BCTRL8_LDinto_toc
...
        %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
        %vreg141<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg141 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
        %vreg142<def> = XXLXORdpz; F8RC:%vreg142
        BC %vreg140, <BB#73>; CRBITRC:%vreg140

BB#72: derived from LLVM BB %114
    Predecessors according to CFG: BB#16
    Successors according to CFG: BB#73(?%)

BB#73: derived from LLVM BB %114
    Predecessors according to CFG: BB#16 BB#72
        %vreg143<def> = PHI %vreg142, <BB#72>, %vreg141, <BB#16>; F8RC:%vreg143,%vreg142,%vreg141
...
        BCTRL8_LDinto_toc
...
        %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
        %vreg150<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg150 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129
        BC %vreg149, <BB#75>; CRBITRC:%vreg149
    Successors according to CFG: BB#74(?%) BB#75(?%)

BB#74: derived from LLVM BB %114
    Predecessors according to CFG: BB#73
    Successors according to CFG: BB#75(?%)

BB#75: derived from LLVM BB %114
    Predecessors according to CFG: BB#73 BB#74
        %vreg151<def> = PHI %vreg142, <BB#74>, %vreg150, <BB#73>; F8RC:%vreg151,%vreg142,%vreg150

The debug output produces:

Valid Candidate
Op1: 1024
Op2: 1024
Op1 and Op2 are identical!
Op1: %vreg140
Op2: %vreg149
Op1Def: %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140
 and %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149
 produce the same value!

While it would be safe to CSE those crmoves, what definitely cannot occur is to assume that the value of CR0GT has not changed between the 2 instructions.

hfinkel added inline comments.Aug 30 2017, 4:33 PM

llvm/trunk/lib/CodeGen/BranchCoalescing.cpp

340

Indeed; I think that we need to filter out instructions with physical-register uses (even if identical). We might be able to be a little smarter about it in a similar way to MachineLICM (the other uses of this function), which does this:

// Don't hoist an instruction that uses or defines a physical register.
if (TargetRegisterInfo::isPhysicalRegister(Reg)) {
  if (MO.isUse()) {
    // If the physreg has no defs anywhere, it's just an ambient register
    // and we can freely move its uses. Alternatively, if it's allocatable,
    // it could get allocated to something with a def during allocation.
    // However, if the physreg is known to always be caller saved/restored
    // then this use is safe to hoist.
    if (!MRI->isConstantPhysReg(Reg) &&
        !(TRI->isCallerPreservedPhysReg(Reg, *I.getParent()->getParent())))
        return false;
    // Otherwise it's safe to move.
    continue;
  } else if (!MO.isDead()) {
    // A def that isn't dead. We can't move it.
    return false;
  ...

llvm/trunk/lib/CodeGen/BranchCoalescing.cpp
340	Addressing this in https://reviews.llvm.org/D32776

Revision Contents

Path

Size

llvm/

trunk/

include/

llvm/

CodeGen/

Passes.h

3 lines

InitializePasses.h

1 line

lib/

CodeGen/

758 lines

1 line

1 line

4 lines

test/

CodeGen/

PowerPC/

branch_coalesce.ll

31 lines

Diff 90219

llvm/trunk/include/llvm/CodeGen/Passes.h

Show First 20 Lines • Show All 396 Lines • ▼ Show 20 Lines	/// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.
/// if available with PysicalRegisterUsageInfo pass.		/// if available with PysicalRegisterUsageInfo pass.
FunctionPass *createRegUsageInfoPropPass();		FunctionPass *createRegUsageInfoPropPass();

/// This pass performs software pipelining on machine instructions.		/// This pass performs software pipelining on machine instructions.
extern char &MachinePipelinerID;		extern char &MachinePipelinerID;

/// This pass frees the memory occupied by the MachineFunction.		/// This pass frees the memory occupied by the MachineFunction.
FunctionPass *createFreeMachineFunctionPass();		FunctionPass *createFreeMachineFunctionPass();

		/// This pass combine basic blocks guarded by the same branch.
		extern char &BranchCoalescingID;
} // End llvm namespace		} // End llvm namespace

/// Target machine pass initializer for passes with dependencies. Use with		/// Target machine pass initializer for passes with dependencies. Use with
/// INITIALIZE_TM_PASS_END.		/// INITIALIZE_TM_PASS_END.
#define INITIALIZE_TM_PASS_BEGIN INITIALIZE_PASS_BEGIN		#define INITIALIZE_TM_PASS_BEGIN INITIALIZE_PASS_BEGIN

/// Target machine pass initializer for passes with dependencies. Use with		/// Target machine pass initializer for passes with dependencies. Use with
/// INITIALIZE_TM_PASS_BEGIN.		/// INITIALIZE_TM_PASS_BEGIN.
Show All 24 Lines

llvm/trunk/include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
	void initializeAtomicExpandPass(PassRegistry&);			void initializeAtomicExpandPass(PassRegistry&);
	void initializeBBVectorizePass(PassRegistry&);			void initializeBBVectorizePass(PassRegistry&);
	void initializeBDCELegacyPassPass(PassRegistry &);			void initializeBDCELegacyPassPass(PassRegistry &);
	void initializeBarrierNoopPass(PassRegistry&);			void initializeBarrierNoopPass(PassRegistry&);
	void initializeBasicAAWrapperPassPass(PassRegistry&);			void initializeBasicAAWrapperPassPass(PassRegistry&);
	void initializeBlockExtractorPassPass(PassRegistry&);			void initializeBlockExtractorPassPass(PassRegistry&);
	void initializeBlockFrequencyInfoWrapperPassPass(PassRegistry&);			void initializeBlockFrequencyInfoWrapperPassPass(PassRegistry&);
	void initializeBoundsCheckingPass(PassRegistry&);			void initializeBoundsCheckingPass(PassRegistry&);
				void initializeBranchCoalescingPass(PassRegistry&);
	void initializeBranchFolderPassPass(PassRegistry&);			void initializeBranchFolderPassPass(PassRegistry&);
	void initializeBranchProbabilityInfoWrapperPassPass(PassRegistry&);			void initializeBranchProbabilityInfoWrapperPassPass(PassRegistry&);
	void initializeBranchRelaxationPass(PassRegistry&);			void initializeBranchRelaxationPass(PassRegistry&);
	void initializeBreakCriticalEdgesPass(PassRegistry&);			void initializeBreakCriticalEdgesPass(PassRegistry&);
	void initializeCFGOnlyViewerLegacyPassPass(PassRegistry&);			void initializeCFGOnlyViewerLegacyPassPass(PassRegistry&);
	void initializeCFGPrinterLegacyPassPass(PassRegistry&);			void initializeCFGPrinterLegacyPassPass(PassRegistry&);
	void initializeCFGOnlyPrinterLegacyPassPass(PassRegistry&);			void initializeCFGOnlyPrinterLegacyPassPass(PassRegistry&);
	void initializeCFGSimplifyPassPass(PassRegistry&);			void initializeCFGSimplifyPassPass(PassRegistry&);
	▲ Show 20 Lines • Show All 281 Lines • Show Last 20 Lines

llvm/trunk/lib/CodeGen/BranchCoalescing.cpp

				//===-- CoalesceBranches.cpp - Coalesce blocks with the same condition ---===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Coalesce basic blocks guarded by the same branch condition into a single
				/// basic block.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/BitVector.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineDominators.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachinePostDominators.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Target/TargetFrameLowering.h"
				#include "llvm/Target/TargetInstrInfo.h"
				#include "llvm/Target/TargetSubtargetInfo.h"

				using namespace llvm;

				#define DEBUG_TYPE "coal-branch"

				static cl::opt<cl::boolOrDefault>
				EnableBranchCoalescing("enable-branch-coalesce", cl::Hidden,
				cl::desc("enable coalescing of duplicate branches"));

				STATISTIC(NumBlocksCoalesced, "Number of blocks coalesced");
				STATISTIC(NumPHINotMoved, "Number of PHI Nodes that cannot be merged");
				STATISTIC(NumBlocksNotCoalesced, "Number of blocks not coalesced");

				//===----------------------------------------------------------------------===//
				// BranchCoalescing
				//===----------------------------------------------------------------------===//
				///
				/// Improve scheduling by coalescing branches that depend on the same condition.
				/// This pass looks for blocks that are guarded by the same branch condition
				/// and attempts to merge the blocks together. Such opportunities arise from
				/// the expansion of select statements in the IR.
				///
				/// For example, consider the following LLVM IR:
				///
				/// %test = icmp eq i32 %x 0
				/// %tmp1 = select i1 %test, double %a, double 2.000000e-03
				/// %tmp2 = select i1 %test, double %b, double 5.000000e-03
				///
				/// This IR expands to the following machine code on PowerPC:
				///
				/// BB#0: derived from LLVM BB %entry
				/// Live Ins: %F1 %F3 %X6
				/// <SNIP1>
				/// %vreg0<def> = COPY %F1; F8RC:%vreg0
				/// %vreg5<def> = CMPLWI %vreg4<kill>, 0; CRRC:%vreg5 GPRC:%vreg4
				/// %vreg8<def> = LXSDX %ZERO8, %vreg7<kill>, %RM<imp-use>;
				/// mem:LD8[ConstantPool] F8RC:%vreg8 G8RC:%vreg7
				/// BCC 76, %vreg5, <BB#2>; CRRC:%vreg5
				/// Successors according to CFG: BB#1(?%) BB#2(?%)
				///
				/// BB#1: derived from LLVM BB %entry
				/// Predecessors according to CFG: BB#0
				/// Successors according to CFG: BB#2(?%)
				///
				/// BB#2: derived from LLVM BB %entry
				/// Predecessors according to CFG: BB#0 BB#1
				/// %vreg9<def> = PHI %vreg8, <BB#1>, %vreg0, <BB#0>;
				/// F8RC:%vreg9,%vreg8,%vreg0
				/// <SNIP2>
				/// BCC 76, %vreg5, <BB#4>; CRRC:%vreg5
				/// Successors according to CFG: BB#3(?%) BB#4(?%)
				///
				/// BB#3: derived from LLVM BB %entry
				/// Predecessors according to CFG: BB#2
				/// Successors according to CFG: BB#4(?%)
				///
				/// BB#4: derived from LLVM BB %entry
				/// Predecessors according to CFG: BB#2 BB#3
				/// %vreg13<def> = PHI %vreg12, <BB#3>, %vreg2, <BB#2>;
				/// F8RC:%vreg13,%vreg12,%vreg2
				/// <SNIP3>
				/// BLR8 %LR8<imp-use>, %RM<imp-use>, %F1<imp-use>
				///
				/// When this pattern is detected, branch coalescing will try to collapse
				/// it by moving code in BB#2 to BB#0 and/or BB#4 and removing BB#3.
				///
				/// If all conditions are meet, IR should collapse to:
				///
				/// BB#0: derived from LLVM BB %entry
				/// Live Ins: %F1 %F3 %X6
				/// <SNIP1>
				/// %vreg0<def> = COPY %F1; F8RC:%vreg0
				/// %vreg5<def> = CMPLWI %vreg4<kill>, 0; CRRC:%vreg5 GPRC:%vreg4
				/// %vreg8<def> = LXSDX %ZERO8, %vreg7<kill>, %RM<imp-use>;
				/// mem:LD8[ConstantPool] F8RC:%vreg8 G8RC:%vreg7
				/// <SNIP2>
				/// BCC 76, %vreg5, <BB#4>; CRRC:%vreg5
				/// Successors according to CFG: BB#1(0x2aaaaaaa / 0x80000000 = 33.33%)
				/// BB#4(0x55555554 / 0x80000000 = 66.67%)
				///
				/// BB#1: derived from LLVM BB %entry
				/// Predecessors according to CFG: BB#0
				/// Successors according to CFG: BB#4(0x40000000 / 0x80000000 = 50.00%)
				///
				/// BB#4: derived from LLVM BB %entry
				/// Predecessors according to CFG: BB#0 BB#1
				/// %vreg9<def> = PHI %vreg8, <BB#1>, %vreg0, <BB#0>;
				/// F8RC:%vreg9,%vreg8,%vreg0
				/// %vreg13<def> = PHI %vreg12, <BB#1>, %vreg2, <BB#0>;
				/// F8RC:%vreg13,%vreg12,%vreg2
				/// <SNIP3>
				/// BLR8 %LR8<imp-use>, %RM<imp-use>, %F1<imp-use>
				///
				/// Branch Coalescing does not split blocks, it moves everything in the same
				/// direction ensuring it does not break use/definition semantics.
				///
				/// PHI nodes and its corresponding use instructions are moved to its successor
				/// block if there are no uses within the successor block PHI nodes. PHI
				/// node ordering cannot be assumed.
				///
				/// Non-PHI can be moved up to the predecessor basic block or down to the
				/// successor basic block following any PHI instructions. Whether it moves
				/// up or down depends on whether the register(s) defined in the instructions
				/// are used in current block or in any PHI instructions at the beginning of
				/// the successor block.

				namespace {

				class BranchCoalescing : public MachineFunctionPass {
				struct CoalescingCandidateInfo {
				MachineBasicBlock *BranchBlock; //< Block containing the branch
				MachineBasicBlock *BranchTargetBlock; //< Block branched to
				MachineBasicBlock *FallThroughBlock; //< Fall-through if branch not taken
				SmallVector<MachineOperand, 4> Cond;
				bool MustMoveDown;
				bool MustMoveUp;

				CoalescingCandidateInfo();
				void clear();
				};

				MachineDominatorTree *MDT;
				MachinePostDominatorTree *MPDT;
				const TargetInstrInfo *TII;
				MachineRegisterInfo *MRI;

				void initialize(MachineFunction &F);
				bool canCoalesceBranch(CoalescingCandidateInfo &Cand);
				bool identicalOperands(ArrayRef<MachineOperand> OperandList1,
				ArrayRef<MachineOperand> OperandList2) const;
				bool validateCandidates(CoalescingCandidateInfo &SourceRegion,
				CoalescingCandidateInfo &TargetRegion) const;

				static bool isBranchCoalescingEnabled() {
				return EnableBranchCoalescing == cl::BOU_TRUE;
				}

				public:
				static char ID;

				BranchCoalescing() : MachineFunctionPass(ID) {
				initializeBranchCoalescingPass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<MachineDominatorTree>();
				AU.addRequired<MachinePostDominatorTree>();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				StringRef getPassName() const override { return "Branch Coalescing"; }

				bool mergeCandidates(CoalescingCandidateInfo &SourceRegion,
				CoalescingCandidateInfo &TargetRegion);
				bool canMoveToBeginning(const MachineInstr &MI,
				const MachineBasicBlock &MBB) const;
				bool canMoveToEnd(const MachineInstr &MI,
				const MachineBasicBlock &MBB) const;
				bool canMerge(CoalescingCandidateInfo &SourceRegion,
				CoalescingCandidateInfo &TargetRegion) const;
				void moveAndUpdatePHIs(MachineBasicBlock *SourceRegionMBB,
				MachineBasicBlock *TargetRegionMBB);
				bool runOnMachineFunction(MachineFunction &MF) override;
				};
				} // End anonymous namespace.

				char BranchCoalescing::ID = 0;
				char &llvm::BranchCoalescingID = BranchCoalescing::ID;

				INITIALIZE_PASS_BEGIN(BranchCoalescing, "branch-coalescing",
				"Branch Coalescing", false, false)
				INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
				INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
				INITIALIZE_PASS_END(BranchCoalescing, "branch-coalescing", "Branch Coalescing",
				false, false)

				BranchCoalescing::CoalescingCandidateInfo::CoalescingCandidateInfo()
				: BranchBlock(nullptr), BranchTargetBlock(nullptr),
				FallThroughBlock(nullptr), MustMoveDown(false), MustMoveUp(false) {}

				void BranchCoalescing::CoalescingCandidateInfo::clear() {
				BranchBlock = nullptr;
				BranchTargetBlock = nullptr;
				FallThroughBlock = nullptr;
				Cond.clear();
				MustMoveDown = false;
				MustMoveUp = false;
				}

				void BranchCoalescing::initialize(MachineFunction &MF) {
				MDT = &getAnalysis<MachineDominatorTree>();
				MPDT = &getAnalysis<MachinePostDominatorTree>();
				TII = MF.getSubtarget().getInstrInfo();
				MRI = &MF.getRegInfo();
				}

				///
				/// Analyze the branch statement to determine if it can be coalesced. This
				/// method analyses the branch statement for the given candidate to determine
				/// if it can be coalesced. If the branch can be coalesced, then the
				/// BranchTargetBlock and the FallThroughBlock are recorded in the specified
				/// Candidate.
				///
				///\param[in,out] Cand The coalescing candidate to analyze
				///\return true if and only if the branch can be coalesced, false otherwise
				///
				bool BranchCoalescing::canCoalesceBranch(CoalescingCandidateInfo &Cand) {
				DEBUG(dbgs() << "Determine if branch block " << Cand.BranchBlock->getNumber()
				<< " can be coalesced:");
				MachineBasicBlock *FalseMBB = nullptr;

				if (TII->analyzeBranch(*Cand.BranchBlock, Cand.BranchTargetBlock, FalseMBB,
				Cand.Cond)) {
				DEBUG(dbgs() << "TII unable to Analyze Branch - skip\n");
				return false;
				}

				for (auto &I : Cand.BranchBlock->terminators()) {
				DEBUG(dbgs() << "Looking at terminator : " << I << "\n");
				if (!I.isBranch())
				continue;

				if (I.getNumOperands() != I.getNumExplicitOperands()) {
				DEBUG(dbgs() << "Terminator contains implicit operands - skip : " << I
				<< "\n");
				return false;
				}
				}

				if (Cand.BranchBlock->isEHPad() \|\| Cand.BranchBlock->hasEHPadSuccessor()) {
				DEBUG(dbgs() << "EH Pad - skip\n");
				return false;
				}

				// For now only consider triangles (i.e, BranchTargetBlock is set,
				// FalseMBB is null, and BranchTargetBlock is a successor to BranchBlock)
				if (!Cand.BranchTargetBlock \|\| (Cand.BranchTargetBlock && FalseMBB)
				\|\| !Cand.BranchBlock->isSuccessor(Cand.BranchTargetBlock)) {
				DEBUG(dbgs() << "Does not form a triangle - skip\n");
				return false;
				}

				// Ensure there are only two successors
				if (Cand.BranchBlock->succ_size() != 2) {
				DEBUG(dbgs() << "Does not have 2 successors - skip\n");
				return false;
				}

				// Sanity check - the block must be able to fall through
				assert(Cand.BranchBlock->canFallThrough() &&
				"Expecting the block to fall through!");

				// We have already ensured there are exactly two successors to
				// BranchBlock and that BranchTargetBlock is a successor to BranchBlock.
				// Ensure the single fall though block is empty.
				MachineBasicBlock *Succ =
				(*Cand.BranchBlock->succ_begin() == Cand.BranchTargetBlock)
				? *Cand.BranchBlock->succ_rbegin()
				: *Cand.BranchBlock->succ_begin();

				assert(Succ && "Expecting a valid fall-through block\n");

				if (!Succ->empty()) {
				DEBUG(dbgs() << "Fall-through block contains code -- skip\n");
				return false;
				}

				if (!Succ->isSuccessor(Cand.BranchTargetBlock)) {
				DEBUG(dbgs()
				<< "Successor of fall through block is not branch taken block\n");
				return false;
				}

				Cand.FallThroughBlock = Succ;
				DEBUG(dbgs() << "Valid Candidate\n");
				return true;
				}

				///
				/// Determine if the two operand lists are identical
				///
				/// \param[in] OpList1 operand list
				/// \param[in] OpList2 operand list
				/// \return true if and only if the operands lists are identical
				///
				bool BranchCoalescing::identicalOperands(
				ArrayRef<MachineOperand> OpList1, ArrayRef<MachineOperand> OpList2) const {

				if (OpList1.size() != OpList2.size()) {
				DEBUG(dbgs() << "Operand list is different size\n");
				return false;
				}

				for (unsigned i = 0; i < OpList1.size(); ++i) {
				const MachineOperand &Op1 = OpList1[i];
				const MachineOperand &Op2 = OpList2[i];

				DEBUG(dbgs() << "Op1: " << Op1 << "\n"
				<< "Op2: " << Op2 << "\n");

				if (Op1.isIdenticalTo(Op2)) {
				DEBUG(dbgs() << "Op1 and Op2 are identical!\n");
				continue;
				}

				// If the operands are not identical, but are registers, check to see if the
				// definition of the register produces the same value. If they produce the
				// same value, consider them to be identical.
				if (Op1.isReg() && Op2.isReg() &&
				TargetRegisterInfo::isVirtualRegister(Op1.getReg()) &&
				TargetRegisterInfo::isVirtualRegister(Op2.getReg())) {
				MachineInstr *Op1Def = MRI->getVRegDef(Op1.getReg());
				MachineInstr *Op2Def = MRI->getVRegDef(Op2.getReg());
				if (TII->produceSameValue(Op1Def, Op2Def, MRI)) {
				iterateeUnsubmitted Not Done Reply Inline Actions This line isn't correct. I have a case where instructions that "produce the same value" are different. The relevant sequence is: BB#16: ... BCTRL8_LDinto_toc ... %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140 %vreg141<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg141 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129 %vreg142<def> = XXLXORdpz; F8RC:%vreg142 BC %vreg140, <BB#73>; CRBITRC:%vreg140 BB#72: derived from LLVM BB %114 Predecessors according to CFG: BB#16 Successors according to CFG: BB#73(?%) BB#73: derived from LLVM BB %114 Predecessors according to CFG: BB#16 BB#72 %vreg143<def> = PHI %vreg142, <BB#72>, %vreg141, <BB#16>; F8RC:%vreg143,%vreg142,%vreg141 ... BCTRL8_LDinto_toc ... %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149 %vreg150<def> = LXSDX %vreg138, %vreg129, %RM<imp-use>; mem:LD8[%134](dereferenceable) F8RC:%vreg150 G8RC_and_G8RC_NOX0:%vreg138 G8RC:%vreg129 BC %vreg149, <BB#75>; CRBITRC:%vreg149 Successors according to CFG: BB#74(?%) BB#75(?%) BB#74: derived from LLVM BB %114 Predecessors according to CFG: BB#73 Successors according to CFG: BB#75(?%) BB#75: derived from LLVM BB %114 Predecessors according to CFG: BB#73 BB#74 %vreg151<def> = PHI %vreg142, <BB#74>, %vreg150, <BB#73>; F8RC:%vreg151,%vreg142,%vreg150 The debug output produces: Valid Candidate Op1: 1024 Op2: 1024 Op1 and Op2 are identical! Op1: %vreg140 Op2: %vreg149 Op1Def: %vreg140<def> = COPY %CR0GT; CRBITRC:%vreg140 and %vreg149<def> = COPY %CR0GT; CRBITRC:%vreg149 produce the same value! While it would be safe to CSE those crmoves, what definitely cannot occur is to assume that the value of CR0GT has not changed between the 2 instructions. iteratee: This line isn't correct. I have a case where instructions that "produce the same value" are…
				hfinkelUnsubmitted Not Done Reply Inline Actions Indeed; I think that we need to filter out instructions with physical-register uses (even if identical). We might be able to be a little smarter about it in a similar way to MachineLICM (the other uses of this function), which does this: // Don't hoist an instruction that uses or defines a physical register. if (TargetRegisterInfo::isPhysicalRegister(Reg)) { if (MO.isUse()) { // If the physreg has no defs anywhere, it's just an ambient register // and we can freely move its uses. Alternatively, if it's allocatable, // it could get allocated to something with a def during allocation. // However, if the physreg is known to always be caller saved/restored // then this use is safe to hoist. if (!MRI->isConstantPhysReg(Reg) && !(TRI->isCallerPreservedPhysReg(Reg, I.getParent()->getParent()))) return false; // Otherwise it's safe to move. continue; } else if (!MO.isDead()) { // A def that isn't dead. We can't move it. return false; ... hfinkel:* Indeed; I think that we need to filter out instructions with physical-register uses (even if…
				leiAuthorUnsubmitted Not Done Reply Inline Actions Addressing this in https://reviews.llvm.org/D32776 lei: Addressing this in https://reviews.llvm.org/D32776
				DEBUG(dbgs() << "Op1Def: " << Op1Def << " and " << Op2Def
				<< " produce the same value!\n");
				} else {
				DEBUG(dbgs() << "Operands produce different values\n");
				return false;
				}
				} else {
				DEBUG(dbgs() << "The operands are not provably identical.\n");
				return false;
				}
				}
				return true;
				}

				///
				/// Moves ALL PHI instructions in SourceMBB to beginning of TargetMBB
				/// and update them to refer to the new block. PHI node ordering
				/// cannot be assumed so it does not matter where the PHI instructions
				/// are moved to in TargetMBB.
				///
				/// \param[in] SourceMBB block to move PHI instructions from
				/// \param[in] TargetMBB block to move PHI instructions to
				///
				void BranchCoalescing::moveAndUpdatePHIs(MachineBasicBlock *SourceMBB,
				MachineBasicBlock *TargetMBB) {

				MachineBasicBlock::iterator MI = SourceMBB->begin();
				MachineBasicBlock::iterator ME = SourceMBB->getFirstNonPHI();

				if (MI == ME) {
				DEBUG(dbgs() << "SourceMBB contains no PHI instructions.\n");
				return;
				}

				// Update all PHI instructions in SourceMBB and move to top of TargetMBB
				for (MachineBasicBlock::iterator Iter = MI; Iter != ME; Iter++) {
				MachineInstr &PHIInst = *Iter;
				for (unsigned i = 2, e = PHIInst.getNumOperands() + 1; i != e; i += 2) {
				MachineOperand &MO = PHIInst.getOperand(i);
				if (MO.getMBB() == SourceMBB)
				MO.setMBB(TargetMBB);
				}
				}
				TargetMBB->splice(TargetMBB->begin(), SourceMBB, MI, ME);
				}

				///
				/// This function checks if MI can be moved to the beginning of the TargetMBB
				/// following PHI instructions. A MI instruction can be moved to beginning of
				/// the TargetMBB if there are no uses of it within the TargetMBB PHI nodes.
				///
				/// \param[in] MI the machine instruction to move.
				/// \param[in] MBB the machine basic block to move to
				/// \return true if it is safe to move MI to beginning of TargetMBB,
				/// false otherwise.
				///
				bool BranchCoalescing::canMoveToBeginning(const MachineInstr &MI,
				const MachineBasicBlock &TargetMBB
				) const {

				DEBUG(dbgs() << "Checking if " << MI << " can move to beginning of "
				<< TargetMBB.getNumber() << "\n");

				for (auto &Def : MI.defs()) { // Looking at Def
				for (auto &Use : MRI->use_instructions(Def.getReg())) {
				if (Use.isPHI() && Use.getParent() == &TargetMBB) {
				DEBUG(dbgs() << " * used in a PHI -- cannot move *\n");
				return false;
				}
				}
				}

				DEBUG(dbgs() << " Safe to move to the beginning.\n");
				return true;
				}

				///
				/// This function checks if MI can be moved to the end of the TargetMBB,
				/// immediately before the first terminator. A MI instruction can be moved
				/// to then end of the TargetMBB if no PHI node defines what MI uses within
				/// it's own MBB.
				///
				/// \param[in] MI the machine instruction to move.
				/// \param[in] MBB the machine basic block to move to
				/// \return true if it is safe to move MI to end of TargetMBB,
				/// false otherwise.
				///
				bool BranchCoalescing::canMoveToEnd(const MachineInstr &MI,
				const MachineBasicBlock &TargetMBB
				) const {

				DEBUG(dbgs() << "Checking if " << MI << " can move to end of "
				<< TargetMBB.getNumber() << "\n");

				for (auto &Use : MI.uses()) {
				if (Use.isReg() && TargetRegisterInfo::isVirtualRegister(Use.getReg())) {
				MachineInstr *DefInst = MRI->getVRegDef(Use.getReg());
				if (DefInst->isPHI() && DefInst->getParent() == MI.getParent()) {
				DEBUG(dbgs() << " * Cannot move this instruction *\n");
				return false;
				} else {
				DEBUG(dbgs() << " *** def is in another block -- safe to move!\n");
				}
				}
				}

				DEBUG(dbgs() << " Safe to move to the end.\n");
				return true;
				}

				///
				/// This method checks to ensure the two coalescing candidates follows the
				/// expected pattern required for coalescing.
				///
				/// \param[in] SourceRegion The candidate to move statements from
				/// \param[in] TargetRegion The candidate to move statements to
				/// \return true if all instructions in SourceRegion.BranchBlock can be merged
				/// into a block in TargetRegion; false otherwise.
				///
				bool BranchCoalescing::validateCandidates(
				CoalescingCandidateInfo &SourceRegion,
				CoalescingCandidateInfo &TargetRegion) const {

				if (TargetRegion.BranchTargetBlock != SourceRegion.BranchBlock)
				llvm_unreachable("Expecting SourceRegion to immediately follow TargetRegion");
				else if (!MDT->dominates(TargetRegion.BranchBlock, SourceRegion.BranchBlock))
				llvm_unreachable("Expecting TargetRegion to dominate SourceRegion");
				else if (!MPDT->dominates(SourceRegion.BranchBlock, TargetRegion.BranchBlock))
				llvm_unreachable("Expecting SourceRegion to post-dominate TargetRegion");
				else if (!TargetRegion.FallThroughBlock->empty() \|\|
				!SourceRegion.FallThroughBlock->empty())
				llvm_unreachable("Expecting fall-through blocks to be empty");

				return true;
				}

				///
				/// This method determines whether the two coalescing candidates can be merged.
				/// In order to be merged, all instructions must be able to
				/// 1. Move to the beginning of the SourceRegion.BranchTargetBlock;
				/// 2. Move to the end of the TargetRegion.BranchBlock.
				/// Merging involves moving the instructions in the
				/// TargetRegion.BranchTargetBlock (also SourceRegion.BranchBlock).
				///
				/// This function first try to move instructions from the
				/// TargetRegion.BranchTargetBlock down, to the beginning of the
				/// SourceRegion.BranchTargetBlock. This is not possible if any register defined
				/// in TargetRegion.BranchTargetBlock is used in a PHI node in the
				/// SourceRegion.BranchTargetBlock. In this case, check whether the statement
				/// can be moved up, to the end of the TargetRegion.BranchBlock (immediately
				/// before the branch statement). If it cannot move, then these blocks cannot
				/// be merged.
				///
				/// Note that there is no analysis for moving instructions past the fall-through
				/// blocks because they are confirmed to be empty. An assert is thrown if they
				/// are not.
				///
				/// \param[in] SourceRegion The candidate to move statements from
				/// \param[in] TargetRegion The candidate to move statements to
				/// \return true if all instructions in SourceRegion.BranchBlock can be merged
				/// into a block in TargetRegion, false otherwise.
				///
				bool BranchCoalescing::canMerge(CoalescingCandidateInfo &SourceRegion,
				CoalescingCandidateInfo &TargetRegion) const {
				if (!validateCandidates(SourceRegion, TargetRegion))
				return false;

				// Walk through PHI nodes first and see if they force the merge into the
				// SourceRegion.BranchTargetBlock.
				for (MachineBasicBlock::iterator
				I = SourceRegion.BranchBlock->instr_begin(),
				E = SourceRegion.BranchBlock->getFirstNonPHI();
				I != E; ++I) {
				for (auto &Def : I->defs())
				for (auto &Use : MRI->use_instructions(Def.getReg())) {
				if (Use.isPHI() && Use.getParent() == SourceRegion.BranchTargetBlock) {
				DEBUG(dbgs() << "PHI " << *I << " defines register used in another "
				"PHI within branch target block -- can't merge\n");
				NumPHINotMoved++;
				return false;
				}
				if (Use.getParent() == SourceRegion.BranchBlock) {
				DEBUG(dbgs() << "PHI " << *I
				<< " defines register used in this "
				"block -- all must move down\n");
				SourceRegion.MustMoveDown = true;
				}
				}
				}

				// Walk through the MI to see if they should be merged into
				// TargetRegion.BranchBlock (up) or SourceRegion.BranchTargetBlock (down)
				for (MachineBasicBlock::iterator
				I = SourceRegion.BranchBlock->getFirstNonPHI(),
				E = SourceRegion.BranchBlock->end();
				I != E; ++I) {
				if (!canMoveToBeginning(I, SourceRegion.BranchTargetBlock)) {
				DEBUG(dbgs() << "Instruction " << *I
				<< " cannot move down - must move up!\n");
				SourceRegion.MustMoveUp = true;
				}
				if (!canMoveToEnd(I, TargetRegion.BranchBlock)) {
				DEBUG(dbgs() << "Instruction " << *I
				<< " cannot move up - must move down!\n");
				SourceRegion.MustMoveDown = true;
				}
				}

				return (SourceRegion.MustMoveUp && SourceRegion.MustMoveDown) ? false : true;
				}

				/// Merge the instructions from SourceRegion.BranchBlock,
				/// SourceRegion.BranchTargetBlock, and SourceRegion.FallThroughBlock into
				/// TargetRegion.BranchBlock, TargetRegion.BranchTargetBlock and
				/// TargetRegion.FallThroughBlock respectively.
				///
				/// The successors for blocks in TargetRegion will be updated to use the
				/// successors from blocks in SourceRegion. Finally, the blocks in SourceRegion
				/// will be removed from the function.
				///
				/// A region consists of a BranchBlock, a FallThroughBlock, and a
				/// BranchTargetBlock. Branch coalesce works on patterns where the
				/// TargetRegion's BranchTargetBlock must also be the SourceRegions's
				/// BranchBlock.
				///
				/// Before mergeCandidates:
				///
				/// +---------------------------+
				/// \| TargetRegion.BranchBlock \|
				/// +---------------------------+
				/// / \|
				/// / +--------------------------------+
				/// \| \| TargetRegion.FallThroughBlock \|
				/// \ +--------------------------------+
				/// \ \|
				/// +----------------------------------+
				/// \| TargetRegion.BranchTargetBlock \|
				/// \| SourceRegion.BranchBlock \|
				/// +----------------------------------+
				/// / \|
				/// / +--------------------------------+
				/// \| \| SourceRegion.FallThroughBlock \|
				/// \ +--------------------------------+
				/// \ \|
				/// +----------------------------------+
				/// \| SourceRegion.BranchTargetBlock \|
				/// +----------------------------------+
				///
				/// After mergeCandidates:
				///
				/// +-----------------------------+
				/// \| TargetRegion.BranchBlock \|
				/// \| SourceRegion.BranchBlock \|
				/// +-----------------------------+
				/// / \|
				/// / +---------------------------------+
				/// \| \| TargetRegion.FallThroughBlock \|
				/// \| \| SourceRegion.FallThroughBlock \|
				/// \ +---------------------------------+
				/// \ \|
				/// +----------------------------------+
				/// \| SourceRegion.BranchTargetBlock \|
				/// +----------------------------------+
				///
				/// \param[in] SourceRegion The candidate to move blocks from
				/// \param[in] TargetRegion The candidate to move blocks to
				///
				bool BranchCoalescing::mergeCandidates(CoalescingCandidateInfo &SourceRegion,
				CoalescingCandidateInfo &TargetRegion) {

				if (SourceRegion.MustMoveUp && SourceRegion.MustMoveDown) {
				llvm_unreachable("Cannot have both MustMoveDown and MustMoveUp set!");
				return false;
				}

				if (!validateCandidates(SourceRegion, TargetRegion))
				return false;

				// Start the merging process by first handling the BranchBlock.
				// Move any PHIs in SourceRegion.BranchBlock down to the branch-taken block
				moveAndUpdatePHIs(SourceRegion.BranchBlock, SourceRegion.BranchTargetBlock);

				// Move remaining instructions in SourceRegion.BranchBlock into
				// TargetRegion.BranchBlock
				MachineBasicBlock::iterator firstInstr =
				SourceRegion.BranchBlock->getFirstNonPHI();
				MachineBasicBlock::iterator lastInstr =
				SourceRegion.BranchBlock->getFirstTerminator();

				MachineBasicBlock *Source = SourceRegion.MustMoveDown
				? SourceRegion.BranchTargetBlock
				: TargetRegion.BranchBlock;

				MachineBasicBlock::iterator Target =
				SourceRegion.MustMoveDown
				? SourceRegion.BranchTargetBlock->getFirstNonPHI()
				: TargetRegion.BranchBlock->getFirstTerminator();

				Source->splice(Target, SourceRegion.BranchBlock, firstInstr, lastInstr);

				// Once PHI and instructions have been moved we need to clean up the
				// control flow.

				// Remove SourceRegion.FallThroughBlock before transferring successors of
				// SourceRegion.BranchBlock to TargetRegion.BranchBlock.
				SourceRegion.BranchBlock->removeSuccessor(SourceRegion.FallThroughBlock);
				TargetRegion.BranchBlock->transferSuccessorsAndUpdatePHIs(
				SourceRegion.BranchBlock);
				// Update branch in TargetRegion.BranchBlock to jump to
				// SourceRegion.BranchTargetBlock
				// In this case, TargetRegion.BranchTargetBlock == SourceRegion.BranchBlock.
				TargetRegion.BranchBlock->ReplaceUsesOfBlockWith(
				SourceRegion.BranchBlock, SourceRegion.BranchTargetBlock);
				// Remove the branch statement(s) in SourceRegion.BranchBlock
				MachineBasicBlock::iterator I =
				SourceRegion.BranchBlock->terminators().begin();
				while (I != SourceRegion.BranchBlock->terminators().end()) {
				MachineInstr &CurrInst = *I;
				++I;
				if (CurrInst.isBranch())
				CurrInst.eraseFromParent();
				}

				// Fall-through block should be empty since this is part of the condition
				// to coalesce the branches.
				assert(TargetRegion.FallThroughBlock->empty() &&
				"FallThroughBlocks should be empty!");

				// Transfer successor information and move PHIs down to the
				// branch-taken block.
				TargetRegion.FallThroughBlock->transferSuccessorsAndUpdatePHIs(
				SourceRegion.FallThroughBlock);
				TargetRegion.FallThroughBlock->removeSuccessor(SourceRegion.BranchBlock);

				// Remove the blocks from the function.
				assert(SourceRegion.BranchBlock->empty() &&
				"Expecting branch block to be empty!");
				SourceRegion.BranchBlock->eraseFromParent();

				assert(SourceRegion.FallThroughBlock->empty() &&
				"Expecting fall-through block to be empty!\n");
				SourceRegion.FallThroughBlock->eraseFromParent();

				NumBlocksCoalesced++;
				return true;
				}

				bool BranchCoalescing::runOnMachineFunction(MachineFunction &MF) {

				if (skipFunction(*MF.getFunction()) \|\| MF.empty() \|\|
				!isBranchCoalescingEnabled())
				return false;

				bool didSomething = false;

				DEBUG(dbgs() << "****** Branch Coalescing ******\n");
				initialize(MF);

				DEBUG(dbgs() << "Function: "; MF.dump(); dbgs() << "\n");

				CoalescingCandidateInfo Cand1, Cand2;
				// Walk over blocks and find candidates to merge
				// Continue trying to merge with the first candidate found, as long as merging
				// is successfull.
				for (MachineBasicBlock &MBB : MF) {
				bool MergedCandidates = false;
				do {
				MergedCandidates = false;
				Cand1.clear();
				Cand2.clear();

				Cand1.BranchBlock = &MBB;

				// If unable to coalesce the branch, then continue to next block
				if (!canCoalesceBranch(Cand1))
				break;

				Cand2.BranchBlock = Cand1.BranchTargetBlock;
				if (!canCoalesceBranch(Cand2))
				break;

				// Sanity check
				// The branch-taken block of the second candidate should post-dominate the
				// first candidate
				assert(MPDT->dominates(Cand2.BranchTargetBlock, Cand1.BranchBlock) &&
				"Branch-taken block should post-dominate first candidate");

				if (!identicalOperands(Cand1.Cond, Cand2.Cond)) {
				DEBUG(dbgs() << "Blocks " << Cand1.BranchBlock->getNumber() << " and "
				<< Cand2.BranchBlock->getNumber()
				<< " have different branches\n");
				break;
				}
				if (!canMerge(Cand2, Cand1)) {
				DEBUG(dbgs() << "Cannot merge blocks " << Cand1.BranchBlock->getNumber()
				<< " and " << Cand2.BranchBlock->getNumber() << "\n");
				NumBlocksNotCoalesced++;
				continue;
				}
				DEBUG(dbgs() << "Merging blocks " << Cand1.BranchBlock->getNumber()
				<< " and " << Cand1.BranchTargetBlock->getNumber() << "\n");
				MergedCandidates = mergeCandidates(Cand2, Cand1);
				if (MergedCandidates)
				didSomething = true;

				DEBUG(dbgs() << "Function after merging: "; MF.dump(); dbgs() << "\n");
				} while (MergedCandidates);
				}

				#ifndef NDEBUG
				// Verify MF is still valid after branch coalescing
				if (didSomething)
				MF.verify(nullptr, "Error in code produced by branch coalescing");
				#endif // NDEBUG

				DEBUG(dbgs() << "Finished Branch Coalescing\n");
				return didSomething;
				}

llvm/trunk/lib/CodeGen/CMakeLists.txt

	add_llvm_library(LLVMCodeGen			add_llvm_library(LLVMCodeGen
	AggressiveAntiDepBreaker.cpp			AggressiveAntiDepBreaker.cpp
	AllocationOrder.cpp			AllocationOrder.cpp
	Analysis.cpp			Analysis.cpp
	AtomicExpandPass.cpp			AtomicExpandPass.cpp
	BasicTargetTransformInfo.cpp			BasicTargetTransformInfo.cpp
				BranchCoalescing.cpp
	BranchFolding.cpp			BranchFolding.cpp
	BranchRelaxation.cpp			BranchRelaxation.cpp
	BuiltinGCs.cpp			BuiltinGCs.cpp
	CalcSpillWeights.cpp			CalcSpillWeights.cpp
	CallingConvLower.cpp			CallingConvLower.cpp
	CodeGen.cpp			CodeGen.cpp
	CodeGenPrepare.cpp			CodeGenPrepare.cpp
	CountingFunctionInserter.cpp			CountingFunctionInserter.cpp
	▲ Show 20 Lines • Show All 149 Lines • Show Last 20 Lines

llvm/trunk/lib/CodeGen/CodeGen.cpp

	Show All 15 Lines
	#include "llvm-c/Initialization.h"			#include "llvm-c/Initialization.h"
	#include "llvm/PassRegistry.h"			#include "llvm/PassRegistry.h"

	using namespace llvm;			using namespace llvm;

	/// initializeCodeGen - Initialize all passes linked into the CodeGen library.			/// initializeCodeGen - Initialize all passes linked into the CodeGen library.
	void llvm::initializeCodeGen(PassRegistry &Registry) {			void llvm::initializeCodeGen(PassRegistry &Registry) {
	initializeAtomicExpandPass(Registry);			initializeAtomicExpandPass(Registry);
				initializeBranchCoalescingPass(Registry);
	initializeBranchFolderPassPass(Registry);			initializeBranchFolderPassPass(Registry);
	initializeBranchRelaxationPass(Registry);			initializeBranchRelaxationPass(Registry);
	initializeCodeGenPreparePass(Registry);			initializeCodeGenPreparePass(Registry);
	initializeCountingFunctionInserterPass(Registry);			initializeCountingFunctionInserterPass(Registry);
	initializeDeadMachineInstructionElimPass(Registry);			initializeDeadMachineInstructionElimPass(Registry);
	initializeDetectDeadLanesPass(Registry);			initializeDetectDeadLanesPass(Registry);
	initializeDwarfEHPreparePass(Registry);			initializeDwarfEHPreparePass(Registry);
	initializeEarlyIfConverterPass(Registry);			initializeEarlyIfConverterPass(Registry);
	▲ Show 20 Lines • Show All 70 Lines • Show Last 20 Lines

llvm/trunk/lib/CodeGen/TargetPassConfig.cpp

Show First 20 Lines • Show All 701 Lines • ▼ Show 20 Lines	void TargetPassConfig::addMachineSSAOptimization() {

// Allow targets to insert passes that improve instruction level parallelism,		// Allow targets to insert passes that improve instruction level parallelism,
// like if-conversion. Such passes will typically need dominator trees and		// like if-conversion. Such passes will typically need dominator trees and
// loop info, just like LICM and CSE below.		// loop info, just like LICM and CSE below.
addILPOpts();		addILPOpts();

addPass(&MachineLICMID, false);		addPass(&MachineLICMID, false);
addPass(&MachineCSEID, false);		addPass(&MachineCSEID, false);

		// Coalesce basic blocks with the same branch condition
		addPass(&BranchCoalescingID);

addPass(&MachineSinkingID);		addPass(&MachineSinkingID);

addPass(&PeepholeOptimizerID);		addPass(&PeepholeOptimizerID);
// Clean-up the dead code that may have been generated by peephole		// Clean-up the dead code that may have been generated by peephole
// rewriting.		// rewriting.
addPass(&DeadMachineInstructionElimID);		addPass(&DeadMachineInstructionElimID);
}		}

▲ Show 20 Lines • Show All 198 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/PowerPC/branch_coalesce.ll

				; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs -enable-branch-coalesce=true < %s \| FileCheck %s
				; RUN: llc -mcpu=pwr8 -mtriple=powerpc64-unknown-linux-gnu -verify-machineinstrs -enable-branch-coalesce=true < %s \| FileCheck %s

				; Function Attrs: nounwind
				define double @testBranchCoal(double %a, double %b, double %c, i32 %x) {
				entry:
				%test = icmp eq i32 %x, 0
				%tmp1 = select i1 %test, double %a, double 2.000000e-03
				%tmp2 = select i1 %test, double %b, double 0.000000e+00
				%tmp3 = select i1 %test, double %c, double 5.000000e-03

				%res1 = fadd double %tmp1, %tmp2
				%result = fadd double %res1, %tmp3
				ret double %result

				; CHECK-LABEL: @testBranchCoal
				; CHECK: cmplwi [[CMPR:[0-7]+]], 6, 0
				; CHECK: beq [[CMPR]], .LBB[[LAB1:[0-9_]+]]
				; CHECK-DAG: addis [[LD1REG:[0-9]+]], 2, .LCPI0_0@toc@ha
				; CHECK-DAG: addis [[LD2REG:[0-9]+]], 2, .LCPI0_1@toc@ha
				; CHECK-DAG: xxlxor 2, 2, 2
				; CHECK-NOT: beq
				; CHECK-DAG: addi [[LD1BASE:[0-9]+]], [[LD1REG]]
				; CHECK-DAG: addi [[LD2BASE:[0-9]+]], [[LD2REG]]
				; CHECK-DAG: lxsdx 1, 0, [[LD1BASE]]
				; CHECK-DAG: lxsdx 3, 0, [[LD2BASE]]
				; CHECK: .LBB[[LAB1]]
				; CHECK: xsadddp 0, 1, 2
				; CHECK: xsadddp 1, 0, 3
				; CHECK: blr
				}

This is an archive of the discontinued LLVM Phabricator instance.

Improve scheduling with branch coalescingClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 90219

llvm/trunk/include/llvm/CodeGen/Passes.h

llvm/trunk/include/llvm/InitializePasses.h

llvm/trunk/lib/CodeGen/BranchCoalescing.cpp

llvm/trunk/lib/CodeGen/CMakeLists.txt

llvm/trunk/lib/CodeGen/CodeGen.cpp

llvm/trunk/lib/CodeGen/TargetPassConfig.cpp

llvm/trunk/test/CodeGen/PowerPC/branch_coalesce.ll

Improve scheduling with branch coalescing
ClosedPublic