This is an archive of the discontinued LLVM Phabricator instance.

Differential D31821

Remove redundant copy in recurrences
ClosedPublic

Authored by twoh on Apr 7 2017, 11:19 AM.

Details

Reviewers

qcolombet
MatzeB
wmi

Commits

rG0e35ea3b7c63: Remove redundant copy in recurrences
rL306758: Remove redundant copy in recurrences

Summary

If a chain of instructions forms a recurrence, commuting operands can help remove a redundant copy. Consider the following example code:

BB#1: ; Loop Header
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: ; Loop Latch
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0
  %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>

The existing two-address generation pass generates the following code:

BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6:
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>

This is suboptimal because the generated assembly code has a redundant copy at the end of BB#6 to feed %vreg13 to BB#1:

.LBB0_6:
  addl  %esi, %edi
  addl  %ebx, %edi
  cmpl  $10, %edi
  movl  %edi, %esi
  jl  .LBB0_1

This redundant copy can be eliminated by making the instructions in the recurrence chain compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 when computing %vreg10. With that change, the code after two-address generation becomes:

BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: derived from LLVM BB %bb7
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>

and the final assembly does not have the redundant copy:

.LBB0_6:
  addl  %edi, %eax
  addl  %ebx, %eax
  cmpl  $10, %eax
  jl  .LBB0_1
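
The effect of commuting on the two-address rewrite can be sketched with a toy model. This is only an illustration under stated assumptions: `ThreeAddr`, `lowerTwoAddress`, and `commute` are hypothetical names, not LLVM APIs, and the rewrite is reduced to "copy the tied source into the destination, then add the other source".

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical three-address instruction: Dst = ADD Src0, Src1,
// where Src0 is the operand tied to Dst.
struct ThreeAddr {
  std::string Dst, Src0, Src1;
};

// Toy two-address rewrite (an assumption, not the real pass): emit
// "Dst = COPY Src0" followed by "Dst = ADD Dst, Src1".
std::vector<std::string> lowerTwoAddress(const ThreeAddr &I) {
  assert(!I.Dst.empty() && "instruction needs a destination");
  return {I.Dst + " = COPY " + I.Src0,
          I.Dst + " = ADD " + I.Dst + ", " + I.Src1};
}

// Swap the two sources; legal for commutative operations such as ADD.
ThreeAddr commute(ThreeAddr I) {
  std::swap(I.Src0, I.Src1);
  return I;
}
```

With `ThreeAddr{"vreg10", "vreg1", "vreg0"}`, the uncommuted form copies %vreg1 into %vreg10, while the commuted form copies %vreg0, the register that carries the loop feedback value, which is what lets the later copy be coalesced.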

Diff Detail

Repository: rL LLVM

Event Timeline

twoh created this revision. Apr 7 2017, 11:19 AM

twoh edited the summary of this revision. Apr 7 2017, 11:19 AM

small changes to the test.

Probably best to have Quentin or Matthias do it.

Friendly ping. Thanks!

Ping. Thanks!

This looks pretty complicated for the task at hand. Wouldn't this be a simpler transformation to perform at a place where we are still in Machine SSA, have use-def information for free, and also have phis around to indicate loops, so we can do without a MachineLoopInfo instance?

@MatzeB I considered other passes as well for the sake of simplicity, but I concluded that the two-address instruction pass is the right place to do this. The problem doesn't appear while we're in SSA, but becomes an issue when we decide which operand register should be the destination register. I think this is why the isProfitableToCommute function is in the two-address instruction pass.

Ping. If you're too busy to review this patch, could you please recommend someone else? Thanks!

Ping.

Ping. Can someone please let me know what would be the best way to have this patch reviewed? Thanks!

I'd like to hear @qcolombet's opinion about this, also +wmi who wrote the similar isRevCopyChain().

This patch

Nitpicks below
Do you have an idea of the compiletime impact of this patch?
I am slightly worried about the use of MachineLoopInfo: Is it really necessary here? Maybe the same can be done with some simpler dominance check? Can you tell when the MachineLoopInfo is/isn't available (as that will impact whether this rule is applied or not).

General Observations

I am not happy with the general approach of throwing more and more patterns at TwoAddressInstructions:

Yes, LLVM's handling of TwoAddressInstruction as a pre-RA pass is a bad idea IMO (and I don't know why it was done this way): We are unnecessarily constraining the allocation problem and don't necessarily do a good job upfront without knowing how the allocation will work out.
This patch adds another pattern at TwoAddressInstruction: It is similar to isRevCopyChain() but slightly more complicated, so we end up with 160 extra lines of analysis for yet another pattern. If this trend continues we will have an even harder time maintaining this pass.
On the other hand this improves code quality today without rearchitecting the code.

lib/CodeGen/TwoAddressInstructionPass.cpp
187 ↗	(On Diff #94549)	This makes no sense! You are starting to use the machine loop info, all you do here is mark it preserved a 2nd time; There was already addPreservedID(MachineLoopInfoID) which had the same effect.
577 ↗	(On Diff #94549)	Use references for things that cannot be nullptr. Similar for a few other places here.
579–580 ↗	(On Diff #94549)	Don't use `auto` when the type isn't immediately obvious just by looking at the line. (`auto` is unfriendly towards the readers of your code). Similar in a few more places.
585 ↗	(On Diff #94549)	we don't tend to have a space before the ';' here. Did you try clang-format on your patch?
621 ↗	(On Diff #94549)	This only gives you the declared/minimum number of operands, there may be more.
714–715 ↗	(On Diff #94549)	Isn't this check superfluous? I would expect this to be true anyway if the instruction is inside `MBB` which is checked later.
test/CodeGen/X86/twoaddr-recurrence.ll
1–2 ↗	(On Diff #94549)	Did you try to create a .mir test so the pass can be tested in isolation? If yes and it didn't work, could you tell us why so we can improve the .mir testing.

Addressing comments from @MatzeB.

Harbormaster completed remote builds in B6485: Diff 99207. May 16 2017, 2:42 PM

@MatzeB Thank you for your comments. I'm collecting compile time numbers now with spec2006. For our internal benchmark, which takes about 40 minutes to compile, the compile time difference was negligible.

I don't think this patch can be implemented without loop info, because this is formulated around the loop recurrence pattern. It might be possible but will eventually require more code to produce the information provided by MachineLoopInfo. I'm curious what is your biggest concern about using MachineLoopInfo in this pass, as it is used in other passes as well.

I agree with you that maintainability vs. supporting more patterns is a hard trade-off. IMHO, if we want to generate better-performing code, adding more code to support more patterns is unavoidable unless we re-architect the register-allocation-related passes. I think this patch adds more lines of code because it is the first to handle a pattern based on the loop structure.

I tried a .mir test, but the output is still not in two-address form. I ran

./bin/llc -mtriple=x86_64-- -run-pass=twoaddressinstruction -o - ./input.mir

(input.mir was produced with -stop-before=twoaddressinstruction), and the output still has instructions such as:

%vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0

lib/CodeGen/TwoAddressInstructionPass.cpp
714–715 ↗	(On Diff #94549)	This check is here to filter the definitions that are outside of Loop. The check below is to see if the definition inside the Loop belongs to MBB.

The problem doesn't appear while we're in SSA [...]

Why is that?
The tied-operand marking is already there; following phis shouldn't be an issue and would eliminate the whole data-flow processing.

What am I missing?

Side question: if several recurrence chains share some operand, how do you pick the most profitable one?

@qcolombet Actually you're right. With tied operands, this can be done within SSA. Do you have a suggestion for a better place to implement this? I thought what isRevCopyChain does is closest to what this patch does.

I'm afraid that I couldn't quite understand your side question. This patch commutes the operands of an instruction if the instruction is inside a recurrence that meets certain conditions. In lines 846-850, it checks whether the recurrence pattern is formulated around regC, and returns true to commute the operands. If not, it checks the same for regB, and returns false (do not commute) if it observes the pattern. If the pattern is observed for both regB and regC, it is possible that not commuting the operands generates a better result, but as the logic checks regC first, it'll commute the operands.

@MatzeB Below is the compile time difference in percent for spec2006:

400.perlbench 1.80
401.bzip2 0.66
403.gcc 0.77
429.mcf 2.61
433.milc -1.70
444.namd 0.31
445.gobmk -0.05
447.dealII -0.22
450.soplex 0.48
453.povray 1.34
456.hmmer 0.49
458.sjeng 1.30
462.libquantum 1.90
464.h264ref 0.52
470.lbm -3.01
471.omnetpp 2.50
473.astar 0.27
482.sphinx3 1.01
483.xalancbmk -0.47

Hi Taewook,

Thank you for working on it. The testcase you gave is interesting. My previous work in isRevCopyChain is quite limited and only handles recurrences within a BB, so I'd like to see a more general fix for the redundant copy problem caused by recurrences.

A problem of the patch I can see is that it uses dataflow to track the recurrence, which I think may not be enough. Think about the case below:

r3 = r4;
...
r1 = r2 + r3;
...
r5 = load(r1)
r4 = r5;

r1-->r5-->r4-->r3-->r1 is a recurrence loop according to the algorithm. However, we have no problem coalescing r3 with r4 and r4 with r5, because r5 and r1 don't have to be allocated to the same register.

So the def-src pairs constructing a recurrence chain that is interesting for us should only contain a def and a src that are in a tied operand group, or that are in the same copy. In other words, the recurrence chain can only cause the redundant copy problem when all the operands on the chain have to be allocated to the same register. That kind of recurrence loop is what we are interested in.
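
One way to picture the "must be allocated to the same register" criterion is to group registers with a union-find structure, merging only across copies and tied def/use pairs. The sketch below is an illustration, not code from the patch; `RegClasses` is a hypothetical helper and registers are modeled as small integers.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical union-find over register numbers. Two registers end up in
// the same class only if a chain of copies or tied operands links them.
class RegClasses {
  std::vector<unsigned> Parent;

public:
  explicit RegClasses(unsigned NumRegs) : Parent(NumRegs) {
    std::iota(Parent.begin(), Parent.end(), 0u);
  }

  // Find the class representative, halving the path as we go.
  unsigned find(unsigned R) {
    assert(R < Parent.size() && "register out of range");
    while (Parent[R] != R) {
      Parent[R] = Parent[Parent[R]];
      R = Parent[R];
    }
    return R;
  }

  // Merge the classes of A and B (for a copy or a tied def/use pair).
  void unite(unsigned A, unsigned B) { Parent[find(A)] = find(B); }

  bool same(unsigned A, unsigned B) { return find(A) == find(B); }
};
```

For the example above, assuming r1 is tied to r2 in the add, one would call `unite(3, 4)` for `r3 = r4`, `unite(1, 2)` for the add, and `unite(4, 5)` for `r4 = r5`; the load creates no link. r3, r4, and r5 then share a class while r1 and r5 do not, so the data-flow cycle does not force a copy.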

In addition, I don't think the recurrence chain has to be a strict cycle. Just like your motivational testcase:
vreg0 = vreg13
...
vreg2 = vreg15
vreg10 = add vreg1, vreg0
vreg3 = add vreg2, vreg10
vreg13 = vreg3

Here the recurrence chain starting from vreg0 is: vreg0-->vreg13-->vreg3-->vreg2. This is not a strict cycle; however, vreg2 and vreg0 already interfere with each other. It means it is impossible to allocate all the vregs in the recurrence chain to the same physical register, and some copy has to be kept.

I think those are the two issues that have to be addressed for a more general solution to the recurrence problem.

Thanks,
Wei.

@wmi Thank you for your reply. I agree with you that we should consider the tied operand group, and as @qcolombet mentioned in the previous comment, if tied operand information is already available before this pass, I'd like to discuss where the best place to implement this would be.

I'm afraid I couldn't understand your second point. In the example there is a recurrence cycle of vreg0-->vreg13-->vreg3-->vreg10-->vreg0 as well, and what this patch does is commuting (vreg0, vreg1) and (vreg2, vreg10) to remove the copy. Did I miss something?

Hi Taewook,

@qcolombet Actually you're right. With tied operands, this can be done within SSA. Do you have a suggestion for a better place to implement this? I thought what isRevCopyChain does is closest to what this patch does.

I would investigate this as part of the PeepholeOptimizer. It already massages copies.

Regarding my second question, you answered it with this part:

If the pattern is observed for both regB and regC, it is possible that not commuting the operands generates a better result, but as the logic checks regC first, it'll commute the operands.

Cheers,
-Quentin

Reimplement the feature in the peephole optimizer. This patch makes the values in the recurrence cycle tied to each other if possible.

Add missing reference.

FYI, below is the compile time difference in percent for spec2006:

400.perlbench 0.23
401.bzip2 -0.01
403.gcc -0.01
429.mcf 3.35
433.milc 1.11
444.namd 0.27
445.gobmk 0.20
447.dealII 0.49
450.soplex 1.02
453.povray 0.55
456.hmmer 0.19
458.sjeng 1.37
462.libquantum 1.76
464.h264ref 1.01
470.lbm -0.98
471.omnetpp 1.61
473.astar 0.95
482.sphinx3 0.50
483.xalancbmk -0.89

ping. thanks!

Sorry for the delay. The rewrite based on SSA looks much cleaner now. About the algorithm, IIUC it tries to find a loop based on the def-use of a tied operand or an operand commutable with a tied operand. However, I still have a concern that the method can sometimes increase redundant copies.

Here is a testcase:

A = phi(B, O)
M = load addr1
C = M + A        // C and M are tied operands.
store A to addr2
N = load from addr3
B = N + C        // B and N are tied operands.

Without the patch, we can allocate the above testcase without copy -- allocate A, B and N to physreg1 and allocate M, C to physreg2.

With the patch, after it changes C = M + A to C = A + M, a copy needs to be generated because C and A are tied operands but C's live range and A's live range overlap. It is impossible to allocate C and A to the same physreg without an extra copy.
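
The overlap argument can be checked with a small interval test. This is a sketch under assumptions: the six instructions of the testcase are numbered 0 to 5, and a live range is modeled as a hypothetical half-open interval `[Def, LastUse)`; real live intervals in LLVM are more detailed.

```cpp
#include <cassert>

// Hypothetical half-open live range [Def, LastUse) over instruction indices.
struct LiveRange {
  unsigned Def;
  unsigned LastUse;
};

// Two ranges interfere when each one starts before the other one ends.
bool overlaps(const LiveRange &X, const LiveRange &Y) {
  assert(X.Def <= X.LastUse && Y.Def <= Y.LastUse && "malformed live range");
  return X.Def < Y.LastUse && Y.Def < X.LastUse;
}
```

In the testcase, A is `{0, 3}` (defined by the phi, last used by the store), M is `{1, 2}`, and C is `{2, 5}`. `overlaps(M, C)` is false, so tying C to M is free, while `overlaps(A, C)` is true, so tying C to A after the commute requires a copy.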

Actually, I think we don't have to explore candidate cycles based on operands commutable with a tied operand. An algorithm I have in mind is to find cycles that are composed only of tied operands, copies, and a phi in the loop header, then check whether there is any live range overlap between any two operands inside the cycle. Only when there is a live range overlap will we consider commuting some operands.

@wmi Not at all! Thanks for your comments.

You're right. I should've considered the case where the operands involved in the recurrence cycle live beyond their use in the recurrence cycle. Your suggestion sounds great, but my concern is that computing live ranges might be expensive, because live interval analysis is performed after peephole optimization. What do you think of limiting it to the case where the operands involved in the recurrence cycle have no use outside of the cycle?

Addressing @wmi's concern by limiting the targets to recurrence cycles in which only the last instruction of the recurrence (the one that feeds the PHI instruction) can have uses outside of the recurrence. This is not an ideal solution yet, and a more fundamental solution (such as having recurrence optimization as a separate pass and/or using live range analysis for it) should follow. But I still think it is worth having it here.

Test cases are updated as well to use MIR.

Ping. Thanks!

For the example below, findTargetRecurrence starts from r2 and r3 to search for a def reg equal to r1. There are a lot of possibilities to explore. That is where the complexity of findTargetRecurrence comes from.

r1 = phi(r2, r3)
r4 = r5 + r1;
r6 = r7 + r4;
r3 = r6 + r8;

After adding the constraint to the recurrence cycle, since all the instructions other than the last one in the recurrence loop should have only one use, it will be easy to start from r1 and search forward. I guess findTargetRecurrence can be simplified a lot if the backward searching is replaced by forward searching, right?
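
The forward search suggested here can be sketched on a toy instruction model. Everything below is hypothetical (`ToyInstr`, `Step`, `UseMap`, `findRecurrence` are illustrative names, not the patch's code): each instruction has one def tied to its first use, and the walk follows the single user of each register until it reaches one of the phi's incoming registers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical instruction: Def = op(TiedUse, OtherUse), Def tied to TiedUse.
struct ToyInstr {
  unsigned Def;
  unsigned TiedUse;
  unsigned OtherUse;
  bool Commutable;
};

// One link of the recurrence chain and whether its operands must be swapped.
struct Step {
  const ToyInstr *MI;
  bool NeedsCommute;
};

// Maps a register to the instructions that use it.
using UseMap = std::map<unsigned, std::vector<const ToyInstr *>>;

// Walk forward from Reg through single-use registers until a register in
// Targets (the phi's incoming values) is reached. Chain records the path.
bool findRecurrence(unsigned Reg, const std::set<unsigned> &Targets,
                    const UseMap &Uses, std::size_t MaxLen,
                    std::vector<Step> &Chain) {
  assert(MaxLen > 0 && "chain limit must be positive");
  if (Targets.count(Reg))
    return true;
  auto It = Uses.find(Reg);
  if (It == Uses.end() || It->second.size() != 1)
    return false; // Only the last register may have several uses.
  if (Chain.size() >= MaxLen)
    return false;
  const ToyInstr *MI = It->second.front();
  if (MI->TiedUse == Reg)
    Chain.push_back({MI, false});
  else if (MI->OtherUse == Reg && MI->Commutable)
    Chain.push_back({MI, true});
  else
    return false;
  return findRecurrence(MI->Def, Targets, Uses, MaxLen, Chain);
}
```

For the example above, assuming each add is commutable and tied to its first source, starting from r1 with targets {r2, r3} visits the three adds in order and marks the first two as needing a commute; the third already has r6 in the tied position.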

Addressing comments from @wmi. Thank you for the suggestion!

minor fix.

wmi added inline comments. Jun 29 2017, 11:42 AM

lib/CodeGen/PeepholeOptimizer.cpp
1549–1552 ↗	(On Diff #104697)	For findTargetRecurrence, RCs will at most contain one RC, right? It may be better to remove RCs, change the return value of findTargetRecurrence to bool type, and use it to indicate whether a RC is found.

@wmi Good call! I fixed the code per your suggestion. Thanks!

LGTM.

This revision is now accepted and ready to land. Jun 29 2017, 2:06 PM

Closed by commit rL306758: Remove redundant copy in recurrences (authored by twoh). Jun 29 2017, 4:11 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                    Size
llvm/trunk/lib/CodeGen/PeepholeOptimizer.cpp            170 lines
llvm/trunk/lib/CodeGen/TwoAddressInstructionPass.cpp    11 lines
llvm/trunk/test/CodeGen/MIR/Generic/multiRunPass.mir    3 lines
llvm/trunk/test/CodeGen/X86/peephole-recurrence.mir     232 lines

Diff 104769

llvm/trunk/lib/CodeGen/PeepholeOptimizer.cpp

[70 lines not shown]
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineDominators.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/MC/MCInstrDesc.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/raw_ostream.h"
[27 lines not shown] static cl::opt<bool> DisableNAPhysCopyOpt(
     cl::desc("Disable non-allocatable physical register copy optimization"));

 // Limit the number of PHI instructions to process
 // in PeepholeOptimizer::getNextSource.
 static cl::opt<unsigned> RewritePHILimit(
     "rewrite-phi-limit", cl::Hidden, cl::init(10),
     cl::desc("Limit the length of PHI chains to lookup"));

+// Limit the length of recurrence chain when evaluating the benefit of
+// commuting operands.
+static cl::opt<unsigned> MaxRecurrenceChain(
+    "recurrence-chain-limit", cl::Hidden, cl::init(3),
+    cl::desc("Maximum length of recurrence chain when evaluating the benefit "
+             "of commuting operands"));
+

 STATISTIC(NumReuse, "Number of extension results reused");
 STATISTIC(NumCmps, "Number of compares eliminated");
 STATISTIC(NumImmFold, "Number of move immediate folded");
 STATISTIC(NumLoadFold, "Number of loads folded");
 STATISTIC(NumSelects, "Number of selects optimized");
 STATISTIC(NumUncoalescableCopies, "Number of uncoalescable copies optimized");
 STATISTIC(NumRewrittenCopies, "Number of copies rewritten");
 STATISTIC(NumNAPhysCopies, "Number of non-allocatable physical copies removed");

 namespace {

 class ValueTrackerResult;
+class RecurrenceInstr;

 class PeepholeOptimizer : public MachineFunctionPass {
   const TargetInstrInfo *TII;
   const TargetRegisterInfo *TRI;
   MachineRegisterInfo *MRI;
   MachineDominatorTree *DT; // Machine dominator tree
+  MachineLoopInfo *MLI;

 public:
   static char ID; // Pass identification

   PeepholeOptimizer() : MachineFunctionPass(ID) {
     initializePeepholeOptimizerPass(*PassRegistry::getPassRegistry());
   }

   bool runOnMachineFunction(MachineFunction &MF) override;

   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.setPreservesCFG();
     MachineFunctionPass::getAnalysisUsage(AU);
+    AU.addRequired<MachineLoopInfo>();
+    AU.addPreserved<MachineLoopInfo>();
     if (Aggressive) {
       AU.addRequired<MachineDominatorTree>();
       AU.addPreserved<MachineDominatorTree>();
     }
   }

   /// \brief Track Def -> Use info used for rewriting copies.
   typedef SmallDenseMap<TargetInstrInfo::RegSubRegPair, ValueTrackerResult>
       RewriteMapTy;

+  /// \brief Sequence of instructions that formulate recurrence cycle.
+  typedef SmallVector<RecurrenceInstr, 4> RecurrenceCycle;
+
 private:
   bool optimizeCmpInstr(MachineInstr *MI, MachineBasicBlock *MBB);
   bool optimizeExtInstr(MachineInstr *MI, MachineBasicBlock *MBB,
                         SmallPtrSetImpl<MachineInstr*> &LocalMIs);
   bool optimizeSelect(MachineInstr *MI,
                       SmallPtrSetImpl<MachineInstr *> &LocalMIs);
   bool optimizeCondBranch(MachineInstr *MI);
   bool optimizeCoalescableCopy(MachineInstr *MI);
   bool optimizeUncoalescableCopy(MachineInstr *MI,
                                  SmallPtrSetImpl<MachineInstr *> &LocalMIs);
+  bool optimizeRecurrence(MachineInstr &PHI);
   bool findNextSource(unsigned Reg, unsigned SubReg,
                       RewriteMapTy &RewriteMap);
   bool isMoveImmediate(MachineInstr *MI,
                        SmallSet<unsigned, 4> &ImmDefRegs,
                        DenseMap<unsigned, MachineInstr*> &ImmDefMIs);
   bool foldImmediate(MachineInstr *MI, MachineBasicBlock *MBB,
                      SmallSet<unsigned, 4> &ImmDefRegs,
                      DenseMap<unsigned, MachineInstr*> &ImmDefMIs);
+  /// \brief Finds recurrence cycles, but only ones that formulated around
+  /// a def operand and a use operand that are tied. If there is a use
+  /// operand commutable with the tied use operand, find recurrence cycle
+  /// along that operand as well.
+  bool findTargetRecurrence(unsigned Reg,
+                            const SmallSet<unsigned, 2> &TargetReg,
+                            RecurrenceCycle &RC);

   /// \brief If copy instruction \p MI is a virtual register copy, track it in
   /// the set \p CopySrcRegs and \p CopyMIs. If this virtual register was
   /// previously seen as a copy, replace the uses of this copy with the
   /// previously seen copy's destination register.
   bool foldRedundantCopy(MachineInstr *MI,
                          SmallSet<unsigned, 4> &CopySrcRegs,
                          DenseMap<unsigned, MachineInstr *> &CopyMIs);
[28 lines not shown] private:
   bool isUncoalescableCopy(const MachineInstr &MI) {
     return MI.isBitcast() ||
            (!DisableAdvCopyOpt &&
             (MI.isRegSequenceLike() || MI.isInsertSubregLike() ||
              MI.isExtractSubregLike()));
   }
 };

+/// \brief Helper class to hold instructions that are inside recurrence
+/// cycles. The recurrence cycle is formulated around 1) a def operand and its
+/// tied use operand, or 2) a def operand and a use operand that is commutable
+/// with another use operand which is tied to the def operand. In the latter
+/// case, index of the tied use operand and the commutable use operand are
+/// maintained with CommutePair.
+class RecurrenceInstr {
+public:
+  typedef std::pair<unsigned, unsigned> IndexPair;
+
+  RecurrenceInstr(MachineInstr *MI) : MI(MI) {}
+  RecurrenceInstr(MachineInstr *MI, unsigned Idx1, unsigned Idx2)
+      : MI(MI), CommutePair(std::make_pair(Idx1, Idx2)) {}
+
+  MachineInstr *getMI() const { return MI; }
+  Optional<IndexPair> getCommutePair() const { return CommutePair; }
+
+private:
+  MachineInstr *MI;
+  Optional<IndexPair> CommutePair;
+};
+
 /// \brief Helper class to hold a reply for ValueTracker queries. Contains the
 /// returned sources for a given search and the instructions where the sources
 /// were tracked from.
 class ValueTrackerResult {
 private:
   /// Track all sources found by one ValueTracker query.
   SmallVector<TargetInstrInfo::RegSubRegPair, 2> RegSrcs;

[174 lines not shown]
 } // end anonymous namespace

 char PeepholeOptimizer::ID = 0;
 char &llvm::PeepholeOptimizerID = PeepholeOptimizer::ID;

 INITIALIZE_PASS_BEGIN(PeepholeOptimizer, DEBUG_TYPE,
                       "Peephole Optimizations", false, false)
 INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
+INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
 INITIALIZE_PASS_END(PeepholeOptimizer, DEBUG_TYPE,
                     "Peephole Optimizations", false, false)

 /// If instruction is a copy-like instruction, i.e. it reads a single register
 /// and writes a single register and it does not modify the source, and if the
 /// source value is preserved as a sub-register of the result, then replace all
 /// reachable uses of the source with the subreg of the result.
 ///
[1,059 lines not shown] bool PeepholeOptimizer::foldRedundantNAPhysCopy(
   // register get a copy of the non-allocatable physical register, and we only
   // track one such copy. Avoid getting confused by this new non-allocatable
   // physical register definition, and remove it from the tracked copies.
   DEBUG(dbgs() << "NAPhysCopy: missed opportunity " << *MI << '\n');
   NAPhysToVirtMIs.erase(PrevCopy);
   return false;
 }

+/// \bried Returns true if \p MO is a virtual register operand.
+static bool isVirtualRegisterOperand(MachineOperand &MO) {
+  if (!MO.isReg())
+    return false;
+  return TargetRegisterInfo::isVirtualRegister(MO.getReg());
+}

+bool PeepholeOptimizer::findTargetRecurrence(
+    unsigned Reg, const SmallSet<unsigned, 2> &TargetRegs,
+    RecurrenceCycle &RC) {
+  // Recurrence found if Reg is in TargetRegs.
+  if (TargetRegs.count(Reg))
+    return true;
+
+  // TODO: Curerntly, we only allow the last instruction of the recurrence
+  // cycle (the instruction that feeds the PHI instruction) to have more than
+  // one uses to guarantee that commuting operands does not tie registers
+  // with overlapping live range. Once we have actual live range info of
+  // each register, this constraint can be relaxed.
+  if (!MRI->hasOneNonDBGUse(Reg))
+    return false;
+
+  // Give up if the reccurrence chain length is longer than the limit.
+  if (RC.size() >= MaxRecurrenceChain)
+    return false;
+
+  MachineInstr &MI = *(MRI->use_instr_nodbg_begin(Reg));
+  unsigned Idx = MI.findRegisterUseOperandIdx(Reg);
+
+  // Only interested in recurrences whose instructions have only one def, which
+  // is a virtual register.
+  if (MI.getDesc().getNumDefs() != 1)
+    return false;
+
+  MachineOperand &DefOp = MI.getOperand(0);
+  if (!isVirtualRegisterOperand(DefOp))
+    return false;
+
+  // Check if def operand of MI is tied to any use operand. We are only
+  // interested in the case that all the instructions in the recurrence chain
+  // have there def operand tied with one of the use operand.
+  unsigned TiedUseIdx;
+  if (!MI.isRegTiedToUseOperand(0, &TiedUseIdx))
+    return false;
+
+  if (Idx == TiedUseIdx) {
+    RC.push_back(RecurrenceInstr(&MI));
+    return findTargetRecurrence(DefOp.getReg(), TargetRegs, RC);
+  } else {
+    // If Idx is not TiedUseIdx, check if Idx is commutable with TiedUseIdx.
+    unsigned CommIdx = TargetInstrInfo::CommuteAnyOperandIndex;
+    if (TII->findCommutedOpIndices(MI, Idx, CommIdx) && CommIdx == TiedUseIdx) {
+      RC.push_back(RecurrenceInstr(&MI, Idx, CommIdx));
+      return findTargetRecurrence(DefOp.getReg(), TargetRegs, RC);
+    }
+  }
+
+  return false;
+}

		/// \brief Phi instructions will eventually be lowered to copy instructions. If
		/// a phi is in a loop header, a recurrence may be formed around the source and
		/// destination of the phi. In such a case, commuting operands of the
		/// instructions in the recurrence may enable coalescing of the copy
		/// instruction generated from the phi. For example, given the recurrence
		///
		/// LoopHeader:
		/// %vreg1 = phi(%vreg0, %vreg100)
		/// LoopLatch:
		/// %vreg0<def, tied1> = ADD %vreg2<def, tied0>, %vreg1
		///
		/// the fact that %vreg0 and %vreg2 are in the same tied operand set makes
		/// it impossible to coalesce the copy instruction generated from the phi in
		/// LoopHeader (i.e. %vreg1 = COPY %vreg0), because %vreg1 and %vreg2 have
		/// overlapping live ranges. This introduces an additional move instruction
		/// into the final assembly. However, if we commute %vreg2 and %vreg1 of the
		/// ADD instruction, the redundant move instruction can be avoided.
		bool PeepholeOptimizer::optimizeRecurrence(MachineInstr &PHI) {
		SmallSet<unsigned, 2> TargetRegs;
		for (unsigned Idx = 1; Idx < PHI.getNumOperands(); Idx += 2) {
		MachineOperand &MO = PHI.getOperand(Idx);
		assert(isVirtualRegisterOperand(MO) && "Invalid PHI instruction");
		TargetRegs.insert(MO.getReg());
		}

		bool Changed = false;
		RecurrenceCycle RC;
		if (findTargetRecurrence(PHI.getOperand(0).getReg(), TargetRegs, RC)) {
		// Commutes operands of instructions in RC if necessary so that the copy to
		// be generated from PHI can be coalesced.
		DEBUG(dbgs() << "Optimize recurrence chain from " << PHI);
		for (auto &RI : RC) {
		DEBUG(dbgs() << "\tInst: " << *(RI.getMI()));
		auto CP = RI.getCommutePair();
		if (CP) {
		Changed = true;
		TII->commuteInstruction(*(RI.getMI()), false, (*CP).first,
		(*CP).second);
		DEBUG(dbgs() << "\t\tCommuted: " << *(RI.getMI()));
		}
		}
		}

		return Changed;
		}

bool PeepholeOptimizer::runOnMachineFunction(MachineFunction &MF) {		bool PeepholeOptimizer::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(*MF.getFunction()))		if (skipFunction(*MF.getFunction()))
return false;		return false;

DEBUG(dbgs() << "******** PEEPHOLE OPTIMIZER ********\n");		DEBUG(dbgs() << "******** PEEPHOLE OPTIMIZER ********\n");
DEBUG(dbgs() << "********** Function: " << MF.getName() << '\n');		DEBUG(dbgs() << "********** Function: " << MF.getName() << '\n');

if (DisablePeephole)		if (DisablePeephole)
return false;		return false;

TII = MF.getSubtarget().getInstrInfo();		TII = MF.getSubtarget().getInstrInfo();
TRI = MF.getSubtarget().getRegisterInfo();		TRI = MF.getSubtarget().getRegisterInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
DT = Aggressive ? &getAnalysis<MachineDominatorTree>() : nullptr;		DT = Aggressive ? &getAnalysis<MachineDominatorTree>() : nullptr;
		MLI = &getAnalysis<MachineLoopInfo>();

bool Changed = false;		bool Changed = false;

for (MachineBasicBlock &MBB : MF) {		for (MachineBasicBlock &MBB : MF) {
bool SeenMoveImm = false;		bool SeenMoveImm = false;

// During this forward scan, at some point it needs to answer the question		// During this forward scan, at some point it needs to answer the question
// "given a pointer to an MI in the current BB, is it located before or		// "given a pointer to an MI in the current BB, is it located before or
[12 lines not shown]	for (MachineBasicBlock &MBB : MF) {
// %PHYSREG is the map index; MI is the last valid `%vreg = COPY %PHYSREG`		// %PHYSREG is the map index; MI is the last valid `%vreg = COPY %PHYSREG`
// without any intervening re-definition of %PHYSREG.		// without any intervening re-definition of %PHYSREG.
DenseMap<unsigned, MachineInstr *> NAPhysToVirtMIs;		DenseMap<unsigned, MachineInstr *> NAPhysToVirtMIs;

// Set of virtual registers that are copied from.		// Set of virtual registers that are copied from.
SmallSet<unsigned, 4> CopySrcRegs;		SmallSet<unsigned, 4> CopySrcRegs;
DenseMap<unsigned, MachineInstr *> CopySrcMIs;		DenseMap<unsigned, MachineInstr *> CopySrcMIs;

		bool IsLoopHeader = MLI->isLoopHeader(&MBB);

for (MachineBasicBlock::iterator MII = MBB.begin(), MIE = MBB.end();		for (MachineBasicBlock::iterator MII = MBB.begin(), MIE = MBB.end();
MII != MIE; ) {		MII != MIE; ) {
MachineInstr *MI = &*MII;		MachineInstr *MI = &*MII;
// We may be erasing MI below, increment MII now.		// We may be erasing MI below, increment MII now.
++MII;		++MII;
LocalMIs.insert(MI);		LocalMIs.insert(MI);

// Skip debug values. They should not affect this peephole optimization.		// Skip debug values. They should not affect this peephole optimization.
if (MI->isDebugValue())		if (MI->isDebugValue())
continue;		continue;

if (MI->isPosition() || MI->isPHI())		if (MI->isPosition())
		continue;

		if (IsLoopHeader && MI->isPHI()) {
		if (optimizeRecurrence(*MI)) {
		Changed = true;
continue;		continue;
		}
		}

if (!MI->isCopy()) {		if (!MI->isCopy()) {
for (const auto &Op : MI->operands()) {		for (const auto &Op : MI->operands()) {
// Visit all operands: definitions can be implicit or explicit.		// Visit all operands: definitions can be implicit or explicit.
if (Op.isReg()) {		if (Op.isReg()) {
unsigned Reg = Op.getReg();		unsigned Reg = Op.getReg();
if (Op.isDef() && isNAPhysCopy(Reg)) {		if (Op.isDef() && isNAPhysCopy(Reg)) {
const auto &Def = NAPhysToVirtMIs.find(Reg);		const auto &Def = NAPhysToVirtMIs.find(Reg);
[109 lines not shown]	for (MachineBasicBlock::iterator MII = MBB.begin(), MIE = MBB.end();
LocalMIs.erase(MI);		LocalMIs.erase(MI);
LocalMIs.erase(DefMI);		LocalMIs.erase(DefMI);
LocalMIs.insert(FoldMI);		LocalMIs.insert(FoldMI);
MI->eraseFromParent();		MI->eraseFromParent();
DefMI->eraseFromParent();		DefMI->eraseFromParent();
MRI->markUsesInDebugValueAsUndef(FoldedReg);		MRI->markUsesInDebugValueAsUndef(FoldedReg);
FoldAsLoadDefCandidates.erase(FoldedReg);		FoldAsLoadDefCandidates.erase(FoldedReg);
++NumLoadFold;		++NumLoadFold;

// MI is replaced with FoldMI so we can continue trying to fold		// MI is replaced with FoldMI so we can continue trying to fold
Changed = true;		Changed = true;
MI = FoldMI;		MI = FoldMI;
}		}
}		}
}		}
}		}

// If we run into an instruction we can't fold across, discard		// If we run into an instruction we can't fold across, discard
// the load candidates. Note: We might be able to fold into this		// the load candidates. Note: We might be able to fold into this
// instruction, so this needs to be after the folding logic.		// instruction, so this needs to be after the folding logic.
if (MI->isLoadFoldBarrier()) {		if (MI->isLoadFoldBarrier()) {
DEBUG(dbgs() << "Encountered load fold barrier on " << *MI << "\n");		DEBUG(dbgs() << "Encountered load fold barrier on " << *MI << "\n");
FoldAsLoadDefCandidates.clear();		FoldAsLoadDefCandidates.clear();
}		}

[297 lines not shown]
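
The chain search and commuting step in PeepholeOptimizer.cpp can be sketched on a toy instruction model. This is a minimal illustration, not the LLVM API: `ToyInstr`, `ChainEntry`, `findChain`, `applyCommutes`, `makeFooLatch`, `makeBarLatch`, `NoReg`, and the `MaxLen` parameter are hypothetical names, and the early return when the register is one of the PHI sources is an assumption about code that is not shown in this excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical sentinel for "instruction defines no register" (e.g. a store).
constexpr unsigned NoReg = ~0u;

// Toy stand-in for a two-address instruction; not llvm::MachineInstr.
struct ToyInstr {
  unsigned Def;               // defined register, or NoReg
  std::vector<unsigned> Uses; // used registers
  std::size_t TiedUse;        // index into Uses that is tied to Def
  bool Commutable;            // whether the use operands may be swapped
};

// One link of the recurrence chain, with the operand pair to swap (if any).
struct ChainEntry {
  std::size_t Instr;
  std::optional<std::pair<std::size_t, std::size_t>> Commute;
};

// Follow the single use of Reg until a PHI source register is reached.
bool findChain(const std::vector<ToyInstr> &Prog, unsigned Reg,
               const std::set<unsigned> &Targets,
               std::vector<ChainEntry> &Chain, std::size_t MaxLen) {
  // Assumption: the search succeeds once Reg feeds the PHI.
  if (Targets.count(Reg))
    return true;

  // Count use operands of Reg and remember the last one seen.
  std::size_t NumUses = 0, User = 0, UseIdx = 0;
  for (std::size_t I = 0; I < Prog.size(); ++I)
    for (std::size_t U = 0; U < Prog[I].Uses.size(); ++U)
      if (Prog[I].Uses[U] == Reg) {
        ++NumUses;
        User = I;
        UseIdx = U;
      }

  // Require exactly one use, and give up once the chain hits the limit.
  if (NumUses != 1 || Chain.size() >= MaxLen)
    return false;

  const ToyInstr &MI = Prog[User];
  if (MI.Def == NoReg)
    return false;

  if (UseIdx == MI.TiedUse) {
    // Already tied: no commute needed for this link.
    Chain.push_back({User, std::nullopt});
    return findChain(Prog, MI.Def, Targets, Chain, MaxLen);
  }
  if (MI.Commutable) {
    // Record the swap that would tie Reg to the def.
    Chain.push_back({User, std::make_pair(UseIdx, MI.TiedUse)});
    return findChain(Prog, MI.Def, Targets, Chain, MaxLen);
  }
  return false;
}

// Swap the recorded operand pairs.
void applyCommutes(std::vector<ToyInstr> &Prog,
                   const std::vector<ChainEntry> &Chain) {
  for (const ChainEntry &E : Chain)
    if (E.Commute)
      std::swap(Prog[E.Instr].Uses[E.Commute->first],
                Prog[E.Instr].Uses[E.Commute->second]);
}

// Loop latch of @foo: %10 = ADD %1, %0 ; %3 = ADD %2, %10
std::vector<ToyInstr> makeFooLatch() {
  return {{10, {1, 0}, 0, true}, {3, {2, 10}, 0, true}};
}

// Loop latch of @bar: same as @foo plus a store that also uses %0.
std::vector<ToyInstr> makeBarLatch() {
  return {{11, {1, 0}, 0, true}, {NoReg, {5, 0}, 0, false},
          {3, {2, 11}, 0, true}};
}
```

Under this model, starting from the PHI def `%0` with the PHI sources as targets, the `foo` latch yields a two-entry chain and both ADDs are commuted, while the `bar` latch is rejected because `%0` has a second use in the store.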

llvm/trunk/lib/CodeGen/TwoAddressInstructionPass.cpp

[62 lines not shown]
STATISTIC(NumReSchedDowns, "Number of instructions re-scheduled down");		STATISTIC(NumReSchedDowns, "Number of instructions re-scheduled down");

// Temporary flag to disable rescheduling.		// Temporary flag to disable rescheduling.
static cl::opt<bool>		static cl::opt<bool>
EnableRescheduling("twoaddr-reschedule",		EnableRescheduling("twoaddr-reschedule",
cl::desc("Coalesce copies by rescheduling (default=true)"),		cl::desc("Coalesce copies by rescheduling (default=true)"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

		// Limit the number of dataflow edges to traverse when evaluating the benefit
		// of commuting operands.
		static cl::opt<unsigned> MaxDataFlowEdge(
		"dataflow-edge-limit", cl::Hidden, cl::init(3),
		cl::desc("Maximum number of dataflow edges to traverse when evaluating "
		"the benefit of commuting operands"));

namespace {		namespace {
class TwoAddressInstructionPass : public MachineFunctionPass {		class TwoAddressInstructionPass : public MachineFunctionPass {
MachineFunction *MF;		MachineFunction *MF;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;
const InstrItineraryData *InstrItins;		const InstrItineraryData *InstrItins;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
LiveVariables *LV;		LiveVariables *LV;
[553 lines not shown]	isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,
// to eliminate an otherwise unavoidable copy.		// to eliminate an otherwise unavoidable copy.
// FIXME:		// FIXME:
// We can extend the logic further: If a pair of operands in an insn has		// We can extend the logic further: If a pair of operands in an insn has
// been merged, the insn could be regarded as a virtual copy, and the virtual		// been merged, the insn could be regarded as a virtual copy, and the virtual
// copy could also be used to construct a copy chain.		// copy could also be used to construct a copy chain.
// To more generally minimize register copies, ideally the logic of two addr		// To more generally minimize register copies, ideally the logic of two addr
// instruction pass should be integrated with register allocation pass where		// instruction pass should be integrated with register allocation pass where
// interference graph is available.		// interference graph is available.
if (isRevCopyChain(regC, regA, 3))		if (isRevCopyChain(regC, regA, MaxDataFlowEdge))
return true;		return true;

if (isRevCopyChain(regB, regA, 3))		if (isRevCopyChain(regB, regA, MaxDataFlowEdge))
return false;		return false;

// Since there are no intervening uses for both registers, then commute		// Since there are no intervening uses for both registers, then commute
// if the def of regC is closer. Its live interval is shorter.		// if the def of regC is closer. Its live interval is shorter.
return LastDefB && LastDefC && LastDefC > LastDefB;		return LastDefB && LastDefC && LastDefC > LastDefB;
}		}

/// Commute a two-address instruction and update the basic block, distance map,		/// Commute a two-address instruction and update the basic block, distance map,
[1,174 lines not shown]
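
The effect of replacing the hard-coded `3` with `MaxDataFlowEdge` can be illustrated with a small bounded walk. The body of `isRevCopyChain` is not part of this excerpt, so the sketch below only assumes that it follows single definitions backwards through copies for at most a given number of steps. `isRevCopyChainToy` and the `CopySrc` map are hypothetical stand-ins, not LLVM code.

```cpp
#include <cassert>
#include <map>

// CopySrc maps a register to the source register of the single copy that
// defines it (hypothetical representation, for illustration only).
bool isRevCopyChainToy(const std::map<unsigned, unsigned> &CopySrc,
                       unsigned FromReg, unsigned ToReg, unsigned MaxLen) {
  unsigned Tmp = FromReg;
  // Traverse at most MaxLen dataflow edges backwards from FromReg.
  for (unsigned I = 0; I < MaxLen; ++I) {
    auto It = CopySrc.find(Tmp);
    if (It == CopySrc.end())
      return false; // no single defining copy: the chain ends here
    Tmp = It->second;
    if (Tmp == ToReg)
      return true;  // reached ToReg within the edge limit
  }
  return false;     // limit exhausted before reaching ToReg
}
```

With a chain `%3 <- %2 <- %1 <- %0`, a limit of 3 reaches `%0` from `%3`, while a limit of 2 does not. This is the trade-off the `dataflow-edge-limit` option exposes: a larger limit can find longer chains at the cost of more traversal.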

llvm/trunk/test/CodeGen/MIR/Generic/multiRunPass.mir

	# RUN: llc -run-pass expand-isel-pseudos -run-pass peephole-opt -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PSEUDO_PEEPHOLE			# RUN: llc -run-pass expand-isel-pseudos -run-pass peephole-opt -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PSEUDO_PEEPHOLE
	# RUN: llc -run-pass expand-isel-pseudos,peephole-opt -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PSEUDO_PEEPHOLE			# RUN: llc -run-pass expand-isel-pseudos,peephole-opt -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PSEUDO_PEEPHOLE
	# RUN: llc -run-pass peephole-opt -run-pass expand-isel-pseudos -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PEEPHOLE_PSEUDO			# RUN: llc -run-pass peephole-opt -run-pass expand-isel-pseudos -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PEEPHOLE_PSEUDO
	# RUN: llc -run-pass peephole-opt,expand-isel-pseudos -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PEEPHOLE_PSEUDO			# RUN: llc -run-pass peephole-opt,expand-isel-pseudos -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PEEPHOLE_PSEUDO
	# REQUIRES: asserts			# REQUIRES: asserts

	# This test ensures that the command line accepts			# This test ensures that the command line accepts
	# several run passes on the same command line and			# several run passes on the same command line and
	# actually creates the proper pipeline for it.			# actually creates the proper pipeline for it.
	# PSEUDO_PEEPHOLE: -expand-isel-pseudos {{(-machineverifier )?}}-peephole-opt			# PSEUDO_PEEPHOLE: -expand-isel-pseudos
				# PSEUDO_PEEPHOLE-SAME: {{(-machineverifier )?}}-peephole-opt
	# PEEPHOLE_PSEUDO: -peephole-opt {{(-machineverifier )?}}-expand-isel-pseudos			# PEEPHOLE_PSEUDO: -peephole-opt {{(-machineverifier )?}}-expand-isel-pseudos

	# Make sure there are no other passes happening after what we asked.			# Make sure there are no other passes happening after what we asked.
	# CHECK-NEXT: --- |			# CHECK-NEXT: --- |
	---			---
	# CHECK: name: foo			# CHECK: name: foo
	name: foo			name: foo
	body: |			body: |
	bb.0:			bb.0:
	...			...

llvm/trunk/test/CodeGen/X86/peephole-recurrence.mir

				# RUN: llc -mtriple=x86_64-- -run-pass=peephole-opt -o - %s | FileCheck %s

				--- |
				define i32 @foo(i32 %a) {
				bb0:
				br label %bb1

				bb1: ; preds = %bb7, %bb0
				%vreg0 = phi i32 [ 0, %bb0 ], [ %vreg3, %bb7 ]
				%cond0 = icmp eq i32 %a, 0
				br i1 %cond0, label %bb4, label %bb3

				bb3: ; preds = %bb1
				br label %bb4

				bb4: ; preds = %bb1, %bb3
				%vreg5 = phi i32 [ 2, %bb3 ], [ 1, %bb1 ]
				%cond1 = icmp eq i32 %vreg5, 0
				br i1 %cond1, label %bb7, label %bb6

				bb6: ; preds = %bb4
				br label %bb7

				bb7: ; preds = %bb4, %bb6
				%vreg1 = phi i32 [ 2, %bb6 ], [ 1, %bb4 ]
				%vreg2 = add i32 %vreg5, %vreg0
				%vreg3 = add i32 %vreg1, %vreg2
				%cond2 = icmp slt i32 %vreg3, 10
				br i1 %cond2, label %bb1, label %bb8

				bb8: ; preds = %bb7
				ret i32 0
				}

				define i32 @bar(i32 %a, i32* %p) {
				bb0:
				br label %bb1

				bb1: ; preds = %bb7, %bb0
				%vreg0 = phi i32 [ 0, %bb0 ], [ %vreg3, %bb7 ]
				%cond0 = icmp eq i32 %a, 0
				br i1 %cond0, label %bb4, label %bb3

				bb3: ; preds = %bb1
				br label %bb4

				bb4: ; preds = %bb1, %bb3
				%vreg5 = phi i32 [ 2, %bb3 ], [ 1, %bb1 ]
				%cond1 = icmp eq i32 %vreg5, 0
				br i1 %cond1, label %bb7, label %bb6

				bb6: ; preds = %bb4
				br label %bb7

				bb7: ; preds = %bb4, %bb6
				%vreg1 = phi i32 [ 2, %bb6 ], [ 1, %bb4 ]
				%vreg2 = add i32 %vreg5, %vreg0
				store i32 %vreg0, i32* %p
				%vreg3 = add i32 %vreg1, %vreg2
				%cond2 = icmp slt i32 %vreg3, 10
				br i1 %cond2, label %bb1, label %bb8

				bb8: ; preds = %bb7
				ret i32 0
				}

				...
				---
				# There is a recurrence formulated around %0, %10, and %3. Check that operands
				# are commuted for ADD instructions in bb.5.bb7 so that the values involved in
				# the recurrence are tied. This removes the redundant copy instruction.
				name: foo
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr32, preferred-register: '' }
				- { id: 1, class: gr32, preferred-register: '' }
				- { id: 2, class: gr32, preferred-register: '' }
				- { id: 3, class: gr32, preferred-register: '' }
				- { id: 4, class: gr32, preferred-register: '' }
				- { id: 5, class: gr32, preferred-register: '' }
				- { id: 6, class: gr32, preferred-register: '' }
				- { id: 7, class: gr32, preferred-register: '' }
				- { id: 8, class: gr32, preferred-register: '' }
				- { id: 9, class: gr32, preferred-register: '' }
				- { id: 10, class: gr32, preferred-register: '' }
				- { id: 11, class: gr32, preferred-register: '' }
				- { id: 12, class: gr32, preferred-register: '' }
				liveins:
				- { reg: '%edi', virtual-reg: '%4' }
				body: |
				bb.0.bb0:
				successors: %bb.1.bb1(0x80000000)
				liveins: %edi

				%4 = COPY %edi
				%5 = MOV32r0 implicit-def dead %eflags

				bb.1.bb1:
				successors: %bb.3.bb4(0x30000000), %bb.2.bb3(0x50000000)

				; CHECK: %0 = PHI %5, %bb.0.bb0, %3, %bb.5.bb7
				%0 = PHI %5, %bb.0.bb0, %3, %bb.5.bb7
				%6 = MOV32ri 1
				TEST32rr %4, %4, implicit-def %eflags
				JE_1 %bb.3.bb4, implicit %eflags
				JMP_1 %bb.2.bb3

				bb.2.bb3:
				successors: %bb.3.bb4(0x80000000)

				%7 = MOV32ri 2

				bb.3.bb4:
				successors: %bb.5.bb7(0x30000000), %bb.4.bb6(0x50000000)

				%1 = PHI %6, %bb.1.bb1, %7, %bb.2.bb3
				TEST32rr %1, %1, implicit-def %eflags
				JE_1 %bb.5.bb7, implicit %eflags
				JMP_1 %bb.4.bb6

				bb.4.bb6:
				successors: %bb.5.bb7(0x80000000)

				%9 = MOV32ri 2

				bb.5.bb7:
				successors: %bb.1.bb1(0x7c000000), %bb.6.bb8(0x04000000)

				%2 = PHI %6, %bb.3.bb4, %9, %bb.4.bb6
				%10 = ADD32rr %1, %0, implicit-def dead %eflags
				; CHECK: %10 = ADD32rr
				; CHECK-SAME: %0,
				; CHECK-SAME: %1,
				%3 = ADD32rr %2, killed %10, implicit-def dead %eflags
				; CHECK: %3 = ADD32rr
				; CHECK-SAME: %10,
				; CHECK-SAME: %2,
				%11 = SUB32ri8 %3, 10, implicit-def %eflags
				JL_1 %bb.1.bb1, implicit %eflags
				JMP_1 %bb.6.bb8

				bb.6.bb8:
				%12 = MOV32r0 implicit-def dead %eflags
				%eax = COPY %12
				RET 0, %eax

				...
				---
				# Here a recurrence is formulated around %0, %11, and %3, but operands should
				# not be commuted because %0 has a use outside of the recurrence. This
				# prevents commuting operands from tying values with overlapping live
				# ranges.
				name: bar
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr32, preferred-register: '' }
				- { id: 1, class: gr32, preferred-register: '' }
				- { id: 2, class: gr32, preferred-register: '' }
				- { id: 3, class: gr32, preferred-register: '' }
				- { id: 4, class: gr32, preferred-register: '' }
				- { id: 5, class: gr64, preferred-register: '' }
				- { id: 6, class: gr32, preferred-register: '' }
				- { id: 7, class: gr32, preferred-register: '' }
				- { id: 8, class: gr32, preferred-register: '' }
				- { id: 9, class: gr32, preferred-register: '' }
				- { id: 10, class: gr32, preferred-register: '' }
				- { id: 11, class: gr32, preferred-register: '' }
				- { id: 12, class: gr32, preferred-register: '' }
				- { id: 13, class: gr32, preferred-register: '' }
				liveins:
				- { reg: '%edi', virtual-reg: '%4' }
				- { reg: '%rsi', virtual-reg: '%5' }
				body: |
				bb.0.bb0:
				successors: %bb.1.bb1(0x80000000)
				liveins: %edi, %rsi

				%5 = COPY %rsi
				%4 = COPY %edi
				%6 = MOV32r0 implicit-def dead %eflags

				bb.1.bb1:
				successors: %bb.3.bb4(0x30000000), %bb.2.bb3(0x50000000)

				%0 = PHI %6, %bb.0.bb0, %3, %bb.5.bb7
				; CHECK: %0 = PHI %6, %bb.0.bb0, %3, %bb.5.bb7
				%7 = MOV32ri 1
				TEST32rr %4, %4, implicit-def %eflags
				JE_1 %bb.3.bb4, implicit %eflags
				JMP_1 %bb.2.bb3

				bb.2.bb3:
				successors: %bb.3.bb4(0x80000000)

				%8 = MOV32ri 2

				bb.3.bb4:
				successors: %bb.5.bb7(0x30000000), %bb.4.bb6(0x50000000)

				%1 = PHI %7, %bb.1.bb1, %8, %bb.2.bb3
				TEST32rr %1, %1, implicit-def %eflags
				JE_1 %bb.5.bb7, implicit %eflags
				JMP_1 %bb.4.bb6

				bb.4.bb6:
				successors: %bb.5.bb7(0x80000000)

				%10 = MOV32ri 2

				bb.5.bb7:
				successors: %bb.1.bb1(0x7c000000), %bb.6.bb8(0x04000000)

				%2 = PHI %7, %bb.3.bb4, %10, %bb.4.bb6
				%11 = ADD32rr %1, %0, implicit-def dead %eflags
				; CHECK: %11 = ADD32rr
				; CHECK-SAME: %1,
				; CHECK-SAME: %0,
				MOV32mr %5, 1, _, 0, _, %0 :: (store 4 into %ir.p)
				%3 = ADD32rr %2, killed %11, implicit-def dead %eflags
				; CHECK: %3 = ADD32rr
				; CHECK-SAME: %2,
				; CHECK-SAME: %11,
				%12 = SUB32ri8 %3, 10, implicit-def %eflags
				JL_1 %bb.1.bb1, implicit %eflags
				JMP_1 %bb.6.bb8

				bb.6.bb8:
				%13 = MOV32r0 implicit-def dead %eflags
				%eax = COPY %13
				RET 0, %eax

				...