This is an archive of the discontinued LLVM Phabricator instance.

Differential D31821

Remove redundant copy in recurrences
ClosedPublic

Authored by twoh on Apr 7 2017, 11:19 AM.

Details

Reviewers

qcolombet
MatzeB
wmi

Commits

rG0e35ea3b7c63: Remove redundant copy in recurrences
rL306758: Remove redundant copy in recurrences

Summary

If a chain of instructions forms a recurrence, commuting operands can help remove a redundant copy. Consider the following example:

BB#1: ; Loop Header
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: ; Loop Latch
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0
  %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>

The existing two-address generation pass generates the following code:

BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6:
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>

This is suboptimal because the generated assembly has a redundant copy at the end of BB#6 to feed %vreg13 to BB#1:

.LBB0_6:
  addl  %esi, %edi
  addl  %ebx, %edi
  cmpl  $10, %edi
  movl  %edi, %esi
  jl  .LBB0_1

This redundant copy can be eliminated by making the instructions in the recurrence chain compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, the code after two-address generation becomes

BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: derived from LLVM BB %bb7
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>

and the final assembly has no redundant copy:

.LBB0_6:
  addl  %edi, %eax
  addl  %ebx, %eax
  cmpl  $10, %eax
  jl  .LBB0_1
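The effect of commuting on two-address lowering can be sketched with a small standalone model. This is not the pass's code: `ThreeAddrAdd`, `lowerToTwoAddress`, and `commute` are hypothetical names, and the model assumes a commutable add whose def is tied to its first source.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a commutable add: def = ADD tiedSrc, otherSrc.
struct ThreeAddrAdd {
  std::string def, tiedSrc, otherSrc;
};

// Two-address lowering: copy the tied source into the def register, then
// accumulate the other source into it.
std::vector<std::string> lowerToTwoAddress(const ThreeAddrAdd &I) {
  return {I.def + " = COPY " + I.tiedSrc,
          I.def + " = ADD " + I.def + ", " + I.otherSrc};
}

// Commuting swaps which source is tied to the def.
ThreeAddrAdd commute(ThreeAddrAdd I) {
  std::swap(I.tiedSrc, I.otherSrc);
  return I;
}
```

For `{"%vreg10", "%vreg1", "%vreg0"}`, the uncommuted form copies from %vreg1, while the commuted form copies from %vreg0, the register that already holds the feedback value, so that copy becomes a candidate for coalescing.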

Diff Detail

Build Status

Buildable 7791
Build 7791: arc lint + arc unit

Event Timeline

twoh created this revision. Apr 7 2017, 11:19 AM

twoh edited the summary of this revision. Apr 7 2017, 11:19 AM

small changes to the test.

Probably best to have Quentin or Matthias do it.

Friendly ping. Thanks!

Ping. Thanks!

This looks pretty complicated for the task at hand. Wouldn't this be a simpler transformation to perform at a place where we are still in Machine SSA, have use-def information for free, and also have phis around to indicate loops, so we can do without a MachineLoopInfo instance?

@MatzeB I considered other passes as well for the sake of simplicity, but I concluded that the two-address instruction pass is the right place to do it. The problem doesn't appear while we're in SSA, but becomes an issue when we make a decision about which operand register should be the destination register. I think this is why the isProfitableToCommute function is in the two-address instruction pass.

Ping. If you're too busy to review this patch, could you please recommend someone else? Thanks!

Ping.

Ping. Can someone please let me know what would be the best way to have this patch reviewed? Thanks!

I'd like to hear @qcolombet's opinion about this, also +wmi who wrote the similar isRevCopyChain().

This patch

Nitpicks below
Do you have an idea of the compiletime impact of this patch?
I am slightly worried about the use of MachineLoopInfo: Is it really necessary here? Maybe the same can be done with some simpler dominance check? Can you tell when the MachineLoopInfo is/isn't available (as that will impact whether this rule is applied or not).

General Observations

I am not happy with the general approach of throwing more and more patterns at TwoAddressInstructions:

Yes, LLVM's handling of TwoAddressInstruction as a pre-RA pass is a bad idea IMO (and I don't know why it was done this way): we are unnecessarily constraining the allocation problem and don't necessarily do a good job upfront without knowing how the allocation will work out.
This pass adds another pattern at TwoAddressInstruction: It is similar to isRevCopyChain() but slightly more complicated so we end up with 160 extra lines of analysis for yet another pattern. If this trend continues we will have an even harder time maintaining this pass.
On the other hand this improves code quality today without rearchitecting the code.

lib/CodeGen/TwoAddressInstructionPass.cpp
172	This makes no sense! You are starting to use the machine loop info, all you do here is mark it preserved a 2nd time; There was already addPreservedID(MachineLoopInfoID) which had the same effect.
561	Use references for things that cannot be nullptr. Similar for a few other places here.
563–564	Don't use `auto` when the type isn't immediately obvious just by looking at the line. (`auto` is unfriendly towards the readers of your code). Similar in a few more places.
569	we don't tend to have a space before the ';' here. Did you try clang-format on your patch?
605	This only gives you the declared/minimum number of operands, there may be more.
698–699	Isn't this check superfluous? I would expect this to be true anyway if the instruction is inside `MBB` which is checked later.
test/CodeGen/X86/twoaddr-recurrence.ll
1–2 ↗	(On Diff #94549)	Did you try to create a .mir test so the pass can be tested in isolation? If yes and it didn't work, could you tell us why so we can improve the .mir testing.

Addressing comments from @MatzeB.

Harbormaster completed remote builds in B6485: Diff 99207. May 16 2017, 2:42 PM

@MatzeB Thank you for your comments. I'm collecting compile time numbers now with spec2006. For our internal benchmark, which takes about 40 min to compile, the compile time difference was negligible.

I don't think this patch can be implemented without loop info, because it is formulated around the loop recurrence pattern. It might be possible, but it would eventually require more code to produce the information provided by MachineLoopInfo. I'm curious what your biggest concern is about using MachineLoopInfo in this pass, as it is used in other passes as well.

I agree with you that maintainability vs. supporting more patterns is a hard trade-off. IMHO, if we want to generate better-performing code, adding more code to support more patterns is unavoidable, unless we re-architect the register-allocation-related passes. I think this patch adds more lines of code because it handles a pattern based on the loop structure for the first time.

I tried a .mir test, but the output is still not in two-address form. I ran

./bin/llc -mtriple=x86_64-- -run-pass=twoaddressinstruction -o - ./input.mir

(input.mir is from -stop-before=twoaddressinstruction), and the output still has instructions such as

%vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0
.

lib/CodeGen/TwoAddressInstructionPass.cpp
698–699	This check is here to filter the definitions that are outside of Loop. The check below is to see if the definition inside the Loop belongs to MBB.

The problem doesn't appear while we're in SSA [...]

Why is that?
The tied-operand marking is already there, so following phis shouldn't be an issue, and doing so would eliminate the whole data flow processing.

What am I missing?

Side question: if several recurrence chains share some operand, how do you pick the most profitable one?

@qcolombet Actually you're right. Having tied operands, this can be done within SSA. Do you have a suggestion for a better place to implement this? I thought what isRevCopyChain does is closest to what this patch does.

I'm afraid that I couldn't quite understand your side question. This patch commutes the operands of an instruction if the instruction is inside a recurrence that meets certain conditions. In lines 846-850, it checks whether the recurrence pattern is formed around regC, and returns true to commute the operands. If not, it checks the same for regB, and returns false so as not to commute if it observes the pattern. If the pattern is observed for both regB and regC, it is possible that not commuting the operands generates a better result, but as the logic checks regC first, it will commute the operands.
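The check order described above can be paraphrased as a tiny decision function. This is only a sketch: `shouldCommute` and its parameters are hypothetical names, and returning "no opinion" when neither pattern is seen is an assumption, since the comment does not say what happens in that case.

```cpp
#include <optional>

// Hypothetical paraphrase of the described check order.
// Returns true to commute, false to keep the operand order, and
// std::nullopt when no recurrence pattern is observed (assumed behavior).
std::optional<bool> shouldCommute(bool recurrenceOnRegC,
                                  bool recurrenceOnRegB) {
  if (recurrenceOnRegC) // regC is checked first, so it wins ties
    return true;
  if (recurrenceOnRegB)
    return false;
  return std::nullopt;
}
```

When both flags are set the function returns true, which is the tie-breaking behavior the comment points out as possibly suboptimal.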

@MatzeB Below is the compile time difference in percent for spec2006:

400.perlbench 1.80
401.bzip2 0.66
403.gcc 0.77
429.mcf 2.61
433.milc -1.70
444.namd 0.31
445.gobmk -0.05
447.dealII -0.22
450.soplex 0.48
453.povray 1.34
456.hmmer 0.49
458.sjeng 1.30
462.libquantum 1.90
464.h264ref 0.52
470.lbm -3.01
471.omnetpp 2.50
473.astar 0.27
482.sphinx3 1.01
483.xalancbmk -0.47

Hi Taewook,

Thank you for working on it. The testcase you gave is interesting. My previous work in isRevCopyChain is quite limited and only handles recurrences within a BB, so I would like to see a more general fix for the redundant copy problem caused by recurrences.

One problem I can see with the patch is that it uses dataflow to track the recurrence, which I think may not be enough. Consider the case below:

r3 = r4;
...
r1 = r2 + r3;
...
r5 = load(r1)
r4 = r5;

r1-->r5-->r4-->r3-->r1 is a recurrence loop according to the algorithm; however, we have no problem coalescing r3 with r4 and r4 with r5, because r5 and r1 don't have to be allocated to the same register.

So the def-src pairs making up a recurrence chain that is interesting for us should only contain a def and src which are in a tied operand group, or which are in the same copy. In other words, the recurrence chain can only cause the redundant copy problem when all the operands on the chain have to be allocated to the same register. That kind of recurrence loop is what we are interested in.
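The criterion above can be illustrated with a minimal standalone sketch. The names `EdgeKind`, `Edge`, `forcesSingleRegister`, `loadCycle`, and `addCycle` are hypothetical, and classifying a commutable add operand as `Tied` is a simplifying assumption of this model.

```cpp
#include <algorithm>
#include <vector>

// How a value flows from one register to the next along the chain.
enum class EdgeKind { Copy, Tied, Other };

struct Edge {
  const char *from;
  const char *to;
  EdgeKind kind;
};

// A cycle is interesting only if every edge forces (or prefers) its two
// registers to be allocated to the same physical register.
bool forcesSingleRegister(const std::vector<Edge> &cycle) {
  return !cycle.empty() &&
         std::all_of(cycle.begin(), cycle.end(), [](const Edge &e) {
           return e.kind != EdgeKind::Other;
         });
}

// The load example: r5 = load(r1) does not tie r5 to r1.
std::vector<Edge> loadCycle() {
  return {{"r1", "r5", EdgeKind::Other},
          {"r5", "r4", EdgeKind::Copy},
          {"r4", "r3", EdgeKind::Copy},
          {"r3", "r1", EdgeKind::Tied}};
}

// The motivating add chain, with both adds assumed commuted.
std::vector<Edge> addCycle() {
  return {{"vreg0", "vreg10", EdgeKind::Tied},
          {"vreg10", "vreg3", EdgeKind::Tied},
          {"vreg3", "vreg13", EdgeKind::Copy},
          {"vreg13", "vreg0", EdgeKind::Copy}};
}
```

Under this model the load cycle is rejected because of its single `Other` edge, while the add cycle qualifies.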

In addition, I don't think the recurrence chain has to be a strict cycle. Just like your motivational testcase:
vreg0 = vreg13
...
vreg2 = vreg15
vreg10 = add vreg1, vreg0
vreg3 = add vreg2, vreg10
vreg13 = vreg3

Here the recurrence chain starting from vreg0 is: vreg0-->vreg13-->vreg3-->vreg2. This is not a strict cycle; however, vreg2 and vreg0 already interfere with each other. This means it is impossible to allocate all the vregs in the recurrence chain to the same physical register, and some copy has to be kept.

I think those are the two issues that have to be addressed for a more general solution to the recurrence problem.

Thanks,
Wei.

@wmi Thank you for your reply. I agree with you that we should consider the tied operand group, and as @qcolombet mentioned in the previous comment, if tied-operand information is already available before this pass, I'd like to discuss where the best place to implement this would be.

I'm afraid I couldn't understand your second point. In the example there is a recurrence cycle of vreg0-->vreg13-->vreg3-->vreg10-->vreg0 as well, and what this patch does is commute (vreg0, vreg1) and (vreg2, vreg10) to remove the copy. Did I miss something?

Hi Taewook,

@qcolombet Actually you're right. Having tied operands, this can be done within SSA. Do you have a suggestion for a better place to implement this? I thought what isRevCopyChain does is closest to what this patch does.

I would investigate this as part of the PeepholeOptimizer. It already massages copies.

Regarding my second question, you answered it with this part:

If the pattern is observed for both regB and regC, it is possible that not commuting the operands generates a better result, but as the logic checks regC first, it will commute the operands.

Cheers,
-Quentin

Reimplement the feature in the peephole optimizer. This patch makes the values in the recurrence cycle tied to each other if possible.

Add missing reference.

FYI, below is the compile time difference in percent for spec2006:

400.perlbench 0.23
401.bzip2 -0.01
403.gcc -0.01
429.mcf 3.35
433.milc 1.11
444.namd 0.27
445.gobmk 0.20
447.dealII 0.49
450.soplex 1.02
453.povray 0.55
456.hmmer 0.19
458.sjeng 1.37
462.libquantum 1.76
464.h264ref 1.01
470.lbm -0.98
471.omnetpp 1.61
473.astar 0.95
482.sphinx3 0.50
483.xalancbmk -0.89

ping. thanks!

Sorry for the delay. The rewrite based on SSA looks much cleaner now. About the algorithm: IIUC, it tries to find a loop based on the def-use chain of a tied operand or an operand commutable with a tied operand. However, I still have a concern that the method can sometimes increase the number of redundant copies.

Here is a testcase:

A = phi(B, O)
M = load addr1
C = M + A    // C and M are tied operands.
store A to addr2
N = load from addr3
B = N + C    // B and N are tied operands.

Without the patch, we can allocate the above testcase without copy -- allocate A, B and N to physreg1 and allocate M, C to physreg2.

With the patch, after it changes C = M + A to C = A + M, a copy needs to be generated because C and A are tied operands but the live ranges of C and A overlap. It is impossible to allocate C and A to the same physreg without an extra copy.

Actually, I think we don't have to explore candidate cycles based on operands commutable with a tied operand. The algorithm I have in mind is to find cycles that are composed only of tied operands, copies, and phis in the loop header, and then check whether there is any live range overlap between any two operands inside the cycle. Only when there is a live range overlap do we consider commuting some operands.
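The live range argument for the testcase can be checked with a small standalone sketch. It is a deliberately simplified model, not LLVM's live interval analysis: it looks at a single straight-line block, ignores the loop back-edge, and treats a live range as the span from a value's definition to its last use. All names (`Inst`, `LiveRange`, `computeLiveRanges`, `canTieWithoutCopy`, `exampleBlock`) are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Inst {
  std::string def;               // empty if nothing is defined (e.g. store)
  std::vector<std::string> uses; // values read by the instruction
};

struct LiveRange {
  std::size_t def;
  std::size_t lastUse;
};

// Record, for every value defined in the block, where it is defined and
// where it is last used. Values defined outside the block are ignored.
std::map<std::string, LiveRange>
computeLiveRanges(const std::vector<Inst> &block) {
  std::map<std::string, LiveRange> ranges;
  for (std::size_t i = 0; i < block.size(); ++i) {
    for (const std::string &u : block[i].uses) {
      auto it = ranges.find(u);
      if (it != ranges.end())
        it->second.lastUse = i;
    }
    if (!block[i].def.empty())
      ranges[block[i].def] = LiveRange{i, i};
  }
  return ranges;
}

// src may share a register with dst only if src is dead once dst is defined.
bool canTieWithoutCopy(const std::map<std::string, LiveRange> &ranges,
                       const std::string &src, const std::string &dst) {
  return ranges.at(src).lastUse <= ranges.at(dst).def;
}

// The testcase from the comment, one instruction per index.
std::vector<Inst> exampleBlock() {
  return {{"A", {"B", "O"}}, // A = phi(B, O)
          {"M", {}},         // M = load addr1
          {"C", {"M", "A"}}, // C = M + A
          {"", {"A"}},       // store A to addr2
          {"N", {}},         // N = load from addr3
          {"B", {"N", "C"}}}; // B = N + C
}
```

In this model M dies at the instruction that defines C, so tying C to M needs no copy, whereas A is still read by the later store, so tying C to A would need one.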

@wmi Not at all! Thanks for your comments.

You're right. I should've considered the case where the operands involved in the recurrence cycle live beyond their use in the recurrence cycle. Your suggestion sounds great, but my concern is that computing live ranges might be expensive because live interval analysis is performed after peephole optimization. What do you think of limiting it to the case where the operands involved in the recurrence cycle have no use outside of the cycle?

Addressing @wmi's concern by limiting the targets to recurrence cycles in which only the last instruction of the recurrence (the one that feeds the PHI instruction) can have uses outside of the recurrence. This is not an ideal solution yet, and a more fundamental solution (such as having recurrence optimization as a separate pass and/or using live range analysis for it) should follow. But I still think it is worth having here.

Test cases are updated as well to use MIR.

Ping. Thanks!

For the example below, findTargetRecurrence starts from r2 and r3 to search for a def reg equal to r1. There are a lot of possibilities to explore. That is where the complexity of findTargetRecurrence comes from.

r1 = phi(r2, r3)
r4 = r5 + r1;
r6 = r7 + r4;
r3 = r6 + r8;

After adding the constraint to the recurrence cycle, since all the instructions other than the last one in the recurrence loop should have only one use, it will be easy to start from r1 and search forward. I guess findTargetRecurrence can be simplified a lot if the backward search is replaced by a forward search, right?
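The forward search suggested here can be sketched on a toy IR. This is a simplified model under stated assumptions, not the patch's implementation: registers are plain integers, every instruction has one def tied to `uses[0]`, only `uses[0]` and `uses[1]` may be swapped, and the PHI itself is not part of `body`. The names `Instr`, `Step`, and `findRecurrence` are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

struct Instr {
  int def;               // register defined by the instruction
  std::vector<int> uses; // use operands; uses[0] is tied to def
  bool commutable;       // whether uses[0] and uses[1] may be swapped
};

struct Step {
  std::size_t instrIdx; // index into the loop body
  bool needsCommute;    // recurrence value is not in the tied slot yet
};

// Walk forward from reg (initially the PHI's def) through single uses until
// a PHI source register in targets is reached. Returns true on success and
// leaves the visited instructions in chain.
bool findRecurrence(const std::vector<Instr> &body, int reg,
                    const std::set<int> &targets, std::size_t maxLen,
                    std::vector<Step> &chain) {
  if (targets.count(reg))
    return true;
  if (chain.size() >= maxLen)
    return false;

  // Require exactly one use operand of reg in the body.
  std::size_t user = 0, opIdx = 0, numUses = 0;
  for (std::size_t i = 0; i < body.size(); ++i)
    for (std::size_t j = 0; j < body[i].uses.size(); ++j)
      if (body[i].uses[j] == reg) {
        user = i;
        opIdx = j;
        ++numUses;
      }
  if (numUses != 1)
    return false;

  const Instr &mi = body[user];
  bool tied = (opIdx == 0);
  bool canCommute = (opIdx == 1 && mi.commutable);
  if (!tied && !canCommute)
    return false;

  chain.push_back(Step{user, !tied});
  if (findRecurrence(body, mi.def, targets, maxLen, chain))
    return true;
  chain.pop_back(); // dead end: undo this step
  return false;
}
```

For the example above (r4 = r5 + r1; r6 = r7 + r4; r3 = r6 + r8, with targets {r2, r3}), starting from r1 visits the three adds in order and marks the first two as needing a commute. Because each intermediate value has a single use, there is only one path to follow at each step.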

Addressing comments from @wmi. Thank you for the suggestion!

minor fix.

wmi added inline comments. Jun 29 2017, 11:42 AM

lib/CodeGen/PeepholeOptimizer.cpp
1549–1552	For findTargetRecurrence, RCs will at most contain one RC, right? It may be better to remove RCs, change the return value of findTargetRecurrence to bool type, and use it to indicate whether a RC is found.

@wmi Good call! I fixed the code per your suggestion. Thanks!

LGTM.

This revision is now accepted and ready to land. Jun 29 2017, 2:06 PM

Closed by commit rL306758: Remove redundant copy in recurrences (authored by twoh). Jun 29 2017, 4:11 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/PeepholeOptimizer.cpp	174 lines
lib/CodeGen/TwoAddressInstructionPass.cpp	11 lines
test/CodeGen/MIR/Generic/multiRunPass.mir	3 lines
test/CodeGen/X86/peephole-recurrence.mir	232 lines

Diff 104697

lib/CodeGen/PeepholeOptimizer.cpp

[70 lines not shown]
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
[27 lines not shown]	static cl::opt<bool> DisableNAPhysCopyOpt(
cl::desc("Disable non-allocatable physical register copy optimization"));		cl::desc("Disable non-allocatable physical register copy optimization"));

// Limit the number of PHI instructions to process		// Limit the number of PHI instructions to process
// in PeepholeOptimizer::getNextSource.		// in PeepholeOptimizer::getNextSource.
static cl::opt<unsigned> RewritePHILimit(		static cl::opt<unsigned> RewritePHILimit(
"rewrite-phi-limit", cl::Hidden, cl::init(10),		"rewrite-phi-limit", cl::Hidden, cl::init(10),
cl::desc("Limit the length of PHI chains to lookup"));		cl::desc("Limit the length of PHI chains to lookup"));

		// Limit the length of recurrence chain when evaluating the benefit of
		// commuting operands.
		static cl::opt<unsigned> MaxRecurrenceChain(
		"recurrence-chain-limit", cl::Hidden, cl::init(3),
		cl::desc("Maximum length of recurrence chain when evaluating the benefit "
		"of commuting operands"));


STATISTIC(NumReuse, "Number of extension results reused");		STATISTIC(NumReuse, "Number of extension results reused");
STATISTIC(NumCmps, "Number of compares eliminated");		STATISTIC(NumCmps, "Number of compares eliminated");
STATISTIC(NumImmFold, "Number of move immediate folded");		STATISTIC(NumImmFold, "Number of move immediate folded");
STATISTIC(NumLoadFold, "Number of loads folded");		STATISTIC(NumLoadFold, "Number of loads folded");
STATISTIC(NumSelects, "Number of selects optimized");		STATISTIC(NumSelects, "Number of selects optimized");
STATISTIC(NumUncoalescableCopies, "Number of uncoalescable copies optimized");		STATISTIC(NumUncoalescableCopies, "Number of uncoalescable copies optimized");
STATISTIC(NumRewrittenCopies, "Number of copies rewritten");		STATISTIC(NumRewrittenCopies, "Number of copies rewritten");
STATISTIC(NumNAPhysCopies, "Number of non-allocatable physical copies removed");		STATISTIC(NumNAPhysCopies, "Number of non-allocatable physical copies removed");

namespace {		namespace {

class ValueTrackerResult;		class ValueTrackerResult;
		class RecurrenceInstr;

class PeepholeOptimizer : public MachineFunctionPass {		class PeepholeOptimizer : public MachineFunctionPass {
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
MachineDominatorTree *DT; // Machine dominator tree		MachineDominatorTree *DT; // Machine dominator tree
		MachineLoopInfo *MLI;

public:		public:
static char ID; // Pass identification		static char ID; // Pass identification

PeepholeOptimizer() : MachineFunctionPass(ID) {		PeepholeOptimizer() : MachineFunctionPass(ID) {
initializePeepholeOptimizerPass(*PassRegistry::getPassRegistry());		initializePeepholeOptimizerPass(*PassRegistry::getPassRegistry());
}		}

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
		AU.addRequired<MachineLoopInfo>();
		AU.addPreserved<MachineLoopInfo>();
if (Aggressive) {		if (Aggressive) {
AU.addRequired<MachineDominatorTree>();		AU.addRequired<MachineDominatorTree>();
AU.addPreserved<MachineDominatorTree>();		AU.addPreserved<MachineDominatorTree>();
}		}
}		}

/// \brief Track Def -> Use info used for rewriting copies.		/// \brief Track Def -> Use info used for rewriting copies.
typedef SmallDenseMap<TargetInstrInfo::RegSubRegPair, ValueTrackerResult>		typedef SmallDenseMap<TargetInstrInfo::RegSubRegPair, ValueTrackerResult>
RewriteMapTy;		RewriteMapTy;

		/// \brief Sequence of instructions that formulate recurrence cycle.
		typedef SmallVector<RecurrenceInstr, 4> RecurrenceCycle;

private:		private:
bool optimizeCmpInstr(MachineInstr *MI, MachineBasicBlock *MBB);		bool optimizeCmpInstr(MachineInstr *MI, MachineBasicBlock *MBB);
bool optimizeExtInstr(MachineInstr *MI, MachineBasicBlock *MBB,		bool optimizeExtInstr(MachineInstr *MI, MachineBasicBlock *MBB,
SmallPtrSetImpl<MachineInstr*> &LocalMIs);		SmallPtrSetImpl<MachineInstr*> &LocalMIs);
bool optimizeSelect(MachineInstr *MI,		bool optimizeSelect(MachineInstr *MI,
SmallPtrSetImpl<MachineInstr *> &LocalMIs);		SmallPtrSetImpl<MachineInstr *> &LocalMIs);
bool optimizeCondBranch(MachineInstr *MI);		bool optimizeCondBranch(MachineInstr *MI);
bool optimizeCoalescableCopy(MachineInstr *MI);		bool optimizeCoalescableCopy(MachineInstr *MI);
bool optimizeUncoalescableCopy(MachineInstr *MI,		bool optimizeUncoalescableCopy(MachineInstr *MI,
SmallPtrSetImpl<MachineInstr *> &LocalMIs);		SmallPtrSetImpl<MachineInstr *> &LocalMIs);
		bool optimizeRecurrence(MachineInstr &PHI);
bool findNextSource(unsigned Reg, unsigned SubReg,		bool findNextSource(unsigned Reg, unsigned SubReg,
RewriteMapTy &RewriteMap);		RewriteMapTy &RewriteMap);
bool isMoveImmediate(MachineInstr *MI,		bool isMoveImmediate(MachineInstr *MI,
SmallSet<unsigned, 4> &ImmDefRegs,		SmallSet<unsigned, 4> &ImmDefRegs,
DenseMap<unsigned, MachineInstr*> &ImmDefMIs);		DenseMap<unsigned, MachineInstr*> &ImmDefMIs);
bool foldImmediate(MachineInstr *MI, MachineBasicBlock *MBB,		bool foldImmediate(MachineInstr *MI, MachineBasicBlock *MBB,
SmallSet<unsigned, 4> &ImmDefRegs,		SmallSet<unsigned, 4> &ImmDefRegs,
DenseMap<unsigned, MachineInstr*> &ImmDefMIs);		DenseMap<unsigned, MachineInstr*> &ImmDefMIs);
		/// \brief Finds recurrence cycles, but only ones that are formulated around
		/// a def operand and a use operand that are tied. If there is a use
		/// operand commutable with the tied use operand, find recurrence cycle
		/// along that operand as well.
		void findTargetRecurrence(unsigned Reg,
		const SmallSet<unsigned, 2> &TargetReg,
		RecurrenceCycle &RC,
		SmallVectorImpl<RecurrenceCycle> &RCs);

/// \brief If copy instruction \p MI is a virtual register copy, track it in		/// \brief If copy instruction \p MI is a virtual register copy, track it in
/// the set \p CopySrcRegs and \p CopyMIs. If this virtual register was		/// the set \p CopySrcRegs and \p CopyMIs. If this virtual register was
/// previously seen as a copy, replace the uses of this copy with the		/// previously seen as a copy, replace the uses of this copy with the
/// previously seen copy's destination register.		/// previously seen copy's destination register.
bool foldRedundantCopy(MachineInstr *MI,		bool foldRedundantCopy(MachineInstr *MI,
SmallSet<unsigned, 4> &CopySrcRegs,		SmallSet<unsigned, 4> &CopySrcRegs,
DenseMap<unsigned, MachineInstr *> &CopyMIs);		DenseMap<unsigned, MachineInstr *> &CopyMIs);
[28 lines not shown]	private:
bool isUncoalescableCopy(const MachineInstr &MI) {		bool isUncoalescableCopy(const MachineInstr &MI) {
return MI.isBitcast() ||		return MI.isBitcast() ||
(!DisableAdvCopyOpt &&		(!DisableAdvCopyOpt &&
(MI.isRegSequenceLike() || MI.isInsertSubregLike() ||		(MI.isRegSequenceLike() || MI.isInsertSubregLike() ||
MI.isExtractSubregLike()));		MI.isExtractSubregLike()));
}		}
};		};

		/// \brief Helper class to hold instructions that are inside recurrence
		/// cycles. The recurrence cycle is formulated around 1) a def operand and its
		/// tied use operand, or 2) a def operand and a use operand that is commutable
		/// with another use operand which is tied to the def operand. In the latter
		/// case, index of the tied use operand and the commutable use operand are
		/// maintained with CommutePair.
		class RecurrenceInstr {
		public:
		typedef std::pair<unsigned, unsigned> IndexPair;

		RecurrenceInstr(MachineInstr *MI) : MI(MI) {}
		RecurrenceInstr(MachineInstr *MI, unsigned Idx1, unsigned Idx2)
		: MI(MI), CommutePair(std::make_pair(Idx1, Idx2)) {}

		MachineInstr *getMI() const { return MI; }
		Optional<IndexPair> getCommutePair() const { return CommutePair; }

		private:
		MachineInstr *MI;
		Optional<IndexPair> CommutePair;
		};

/// \brief Helper class to hold a reply for ValueTracker queries. Contains the		/// \brief Helper class to hold a reply for ValueTracker queries. Contains the
/// returned sources for a given search and the instructions where the sources		/// returned sources for a given search and the instructions where the sources
/// were tracked from.		/// were tracked from.
class ValueTrackerResult {		class ValueTrackerResult {
private:		private:
/// Track all sources found by one ValueTracker query.		/// Track all sources found by one ValueTracker query.
SmallVector<TargetInstrInfo::RegSubRegPair, 2> RegSrcs;		SmallVector<TargetInstrInfo::RegSubRegPair, 2> RegSrcs;

[174 lines not shown]
} // end anonymous namespace		} // end anonymous namespace

char PeepholeOptimizer::ID = 0;		char PeepholeOptimizer::ID = 0;
char &llvm::PeepholeOptimizerID = PeepholeOptimizer::ID;		char &llvm::PeepholeOptimizerID = PeepholeOptimizer::ID;

INITIALIZE_PASS_BEGIN(PeepholeOptimizer, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(PeepholeOptimizer, DEBUG_TYPE,
"Peephole Optimizations", false, false)		"Peephole Optimizations", false, false)
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
		INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
INITIALIZE_PASS_END(PeepholeOptimizer, DEBUG_TYPE,		INITIALIZE_PASS_END(PeepholeOptimizer, DEBUG_TYPE,
"Peephole Optimizations", false, false)		"Peephole Optimizations", false, false)

/// If instruction is a copy-like instruction, i.e. it reads a single register		/// If instruction is a copy-like instruction, i.e. it reads a single register
/// and writes a single register and it does not modify the source, and if the		/// and writes a single register and it does not modify the source, and if the
/// source value is preserved as a sub-register of the result, then replace all		/// source value is preserved as a sub-register of the result, then replace all
/// reachable uses of the source with the subreg of the result.		/// reachable uses of the source with the subreg of the result.
///		///
[1,059 lines not shown]	bool PeepholeOptimizer::foldRedundantNAPhysCopy(
// register get a copy of the non-allocatable physical register, and we only		// register get a copy of the non-allocatable physical register, and we only
// track one such copy. Avoid getting confused by this new non-allocatable		// track one such copy. Avoid getting confused by this new non-allocatable
// physical register definition, and remove it from the tracked copies.		// physical register definition, and remove it from the tracked copies.
DEBUG(dbgs() << "NAPhysCopy: missed opportunity " << *MI << '\n');		DEBUG(dbgs() << "NAPhysCopy: missed opportunity " << *MI << '\n');
NAPhysToVirtMIs.erase(PrevCopy);		NAPhysToVirtMIs.erase(PrevCopy);
return false;		return false;
}		}

		/// \brief Returns true if \p MO is a virtual register operand.
		static bool isVirtualRegisterOperand(MachineOperand &MO) {
		if (!MO.isReg())
		return false;
		return TargetRegisterInfo::isVirtualRegister(MO.getReg());
		}

		void PeepholeOptimizer::findTargetRecurrence(
		unsigned Reg, const SmallSet<unsigned, 2> &TargetRegs, RecurrenceCycle &RC,
		SmallVectorImpl<RecurrenceCycle> &RCs) {
		// Recurrence found if Reg is in TargetRegs.
		if (TargetRegs.count(Reg)) {
		RCs.push_back(RC);
		return;
		}

		// TODO: Currently, we only allow the last instruction of the recurrence
		// cycle (the instruction that feeds the PHI instruction) to have more than
		// one use to guarantee that commuting operands does not tie registers
		// with overlapping live range. Once we have actual live range info of
		// each register, this constraint can be relaxed.
		if (!MRI->hasOneNonDBGUse(Reg))
		return;

		// Give up if the recurrence chain length is longer than the limit.
		if (RC.size() >= MaxRecurrenceChain)
		return;

		MachineInstr &MI = *(MRI->use_instr_nodbg_begin(Reg));
		unsigned Idx = MI.findRegisterUseOperandIdx(Reg);

		// Only interested in recurrences whose instructions have only one def, which
		// is a virtual register.
		if (MI.getDesc().getNumDefs() != 1)
		return;

		MachineOperand &DefOp = MI.getOperand(0);
		if (!isVirtualRegisterOperand(DefOp))
		return;

		// Check if def operand of MI is tied to any use operand. We are only
		// interested in the case that all the instructions in the recurrence chain
		// have their def operand tied with one of the use operands.
		unsigned TiedUseIdx;
		if (!MI.isRegTiedToUseOperand(0, &TiedUseIdx))
		return;

		if (Idx == TiedUseIdx) {
		RC.push_back(RecurrenceInstr(&MI));
		findTargetRecurrence(DefOp.getReg(), TargetRegs, RC, RCs);
		} else {
		// If Idx is not TiedUseIdx, check if Idx is commutable with TiedUseIdx.
		unsigned CommIdx = TargetInstrInfo::CommuteAnyOperandIndex;
		if (TII->findCommutedOpIndices(MI, Idx, CommIdx) && CommIdx == TiedUseIdx) {
		RC.push_back(RecurrenceInstr(&MI, Idx, CommIdx));
		findTargetRecurrence(DefOp.getReg(), TargetRegs, RC, RCs);
		}
		}
		}

		/// \brief Phi instructions will eventually be lowered to copy instructions. If
		/// a phi is in a loop header, a recurrence may be formulated around the source
		/// and destination of the phi. In such a case, commuting operands of the
		/// instructions in the recurrence may enable coalescing of the copy instruction
		/// generated from the phi. For example, if there is a recurrence of
		///
		/// LoopHeader:
		/// %vreg1 = phi(%vreg0, %vreg100)
		/// LoopLatch:
		/// %vreg0<def, tied1> = ADD %vreg2<def, tied0>, %vreg1
		///
		/// , the fact that vreg0 and vreg2 are in the same tied operands set makes
		/// the coalescing of the copy instruction generated from the phi in
		/// LoopHeader (i.e. %vreg1 = COPY %vreg0) impossible, because %vreg1 and
		/// %vreg2 have overlapping live ranges. This introduces an additional move
		/// instruction into the final assembly. However, if we commute %vreg2 and
		/// %vreg1 of the ADD instruction, the redundant move instruction can be
		/// avoided.
		bool PeepholeOptimizer::optimizeRecurrence(MachineInstr &PHI) {
		bool Changed = false;
		SmallSet<unsigned, 2> TargetRegs;
		for (unsigned Idx = 1; Idx < PHI.getNumOperands(); Idx += 2) {
		MachineOperand &MO = PHI.getOperand(Idx);
		assert(isVirtualRegisterOperand(MO) && "Invalid PHI instruction");
		TargetRegs.insert(MO.getReg());
		}

		RecurrenceCycle RC;
		SmallVector<RecurrenceCycle, 4> RCs;
		findTargetRecurrence(PHI.getOperand(0).getReg(), TargetRegs, RC, RCs);

		// Commutes operands of instructions in each RC if necessary so that the
		// copy to be generated from PHI can be coalesced.
		for (auto &RC : RCs) {
		DEBUG(dbgs() << "Optimize recurrence chain from " << PHI);
		for (auto &RI : RC) {
		DEBUG(dbgs() << "\tInst: " << *(RI.getMI()));
		auto CP = RI.getCommutePair();
		if (CP) {
		Changed = true;
		TII->commuteInstruction(*(RI.getMI()), false, (*CP).first,
		(*CP).second);
		DEBUG(dbgs() << "\t\tCommuted: " << *(RI.getMI()));
		}
		}
		}

		return Changed;
		}
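
The chain walk and the commute step above can be modeled outside LLVM. The following is a minimal sketch under stated assumptions, not the patch's implementation: `ToyInstr`, `ToyStep`, `findToyRecurrence`, `optimizeToyRecurrence`, `makeFooLatch`, `makeBarLatch`, `NoReg`, and `ToyMaxChain` are hypothetical names; every instruction is assumed to have its def tied to its first use; the chain limit is an arbitrary value; and the check that ends the walk when a PHI source register is reached is assumed, since it is not part of the excerpt shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical sentinel meaning "this instruction defines no register".
static const unsigned NoReg = ~0u;
// Arbitrary chain-length limit for this sketch (not the patch's default).
static const std::size_t ToyMaxChain = 4;

// Hypothetical stand-in for a two-address machine instruction.
// Uses[0] is the use operand tied to Def.
struct ToyInstr {
  unsigned Def;
  std::vector<unsigned> Uses;
  bool Commutable;
};

// One link of the chain: the instruction, plus the use slot that has to be
// swapped into slot 0 (0 means no commute is needed).
struct ToyStep {
  ToyInstr *MI;
  std::size_t SwapWith;
};

static unsigned countUses(const std::vector<ToyInstr> &Prog, unsigned Reg) {
  unsigned N = 0;
  for (const ToyInstr &MI : Prog)
    for (unsigned U : MI.Uses)
      if (U == Reg)
        ++N;
  return N;
}

// Follow single-use edges from Reg until a PHI source register is reached.
// On failure the contents of RC are meaningless and must be ignored.
static bool findToyRecurrence(std::vector<ToyInstr> &Prog, unsigned Reg,
                              const std::set<unsigned> &Targets,
                              std::vector<ToyStep> &RC) {
  if (Targets.count(Reg))
    return true; // Assumed termination check: the value feeds the PHI.
  if (countUses(Prog, Reg) != 1)
    return false; // The value has another use, so commuting is not safe.
  if (RC.size() >= ToyMaxChain)
    return false;
  for (ToyInstr &MI : Prog) {
    for (std::size_t I = 0; I < MI.Uses.size(); ++I) {
      if (MI.Uses[I] != Reg)
        continue;
      if (MI.Def == NoReg || (I != 0 && !MI.Commutable))
        return false;
      RC.push_back(ToyStep{&MI, I});
      return findToyRecurrence(Prog, MI.Def, Targets, RC);
    }
  }
  return false;
}

// Commute every link whose recurrence value is not already in the tied slot.
static bool optimizeToyRecurrence(std::vector<ToyInstr> &Prog, unsigned PhiDef,
                                  const std::set<unsigned> &PhiSources) {
  std::vector<ToyStep> RC;
  if (!findToyRecurrence(Prog, PhiDef, PhiSources, RC))
    return false;
  bool Changed = false;
  for (const ToyStep &Step : RC) {
    if (Step.SwapWith == 0)
      continue;
    std::swap(Step.MI->Uses[0], Step.MI->Uses[Step.SwapWith]);
    Changed = true;
  }
  return Changed;
}

// Latch of @foo in peephole-recurrence.mir: %10 = ADD %1, %0; %3 = ADD %2, %10.
static std::vector<ToyInstr> makeFooLatch() {
  return {{10, {1, 0}, true}, {3, {2, 10}, true}};
}

// Latch of @bar: the same ADDs, with a store of %0 between them.
static std::vector<ToyInstr> makeBarLatch() {
  return {{11, {1, 0}, true}, {NoReg, {5, 0}, false}, {3, {2, 11}, true}};
}
```

Calling `optimizeToyRecurrence(Prog, 0, {3})` on the `@foo` latch swaps the operands of both ADDs, which mirrors the CHECK lines in the new test. On the `@bar` latch the walk stops at `%0`, because the store gives it a second use.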

bool PeepholeOptimizer::runOnMachineFunction(MachineFunction &MF) {		bool PeepholeOptimizer::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(*MF.getFunction()))		if (skipFunction(*MF.getFunction()))
return false;		return false;

DEBUG(dbgs() << "******** PEEPHOLE OPTIMIZER ********\n");		DEBUG(dbgs() << "******** PEEPHOLE OPTIMIZER ********\n");
DEBUG(dbgs() << "********** Function: " << MF.getName() << '\n');		DEBUG(dbgs() << "********** Function: " << MF.getName() << '\n');

if (DisablePeephole)		if (DisablePeephole)
return false;		return false;

TII = MF.getSubtarget().getInstrInfo();		TII = MF.getSubtarget().getInstrInfo();
TRI = MF.getSubtarget().getRegisterInfo();		TRI = MF.getSubtarget().getRegisterInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
DT = Aggressive ? &getAnalysis<MachineDominatorTree>() : nullptr;		DT = Aggressive ? &getAnalysis<MachineDominatorTree>() : nullptr;
		MLI = &getAnalysis<MachineLoopInfo>();

bool Changed = false;		bool Changed = false;

for (MachineBasicBlock &MBB : MF) {		for (MachineBasicBlock &MBB : MF) {
bool SeenMoveImm = false;		bool SeenMoveImm = false;

// During this forward scan, at some point it needs to answer the question		// During this forward scan, at some point it needs to answer the question
// "given a pointer to an MI in the current BB, is it located before or		// "given a pointer to an MI in the current BB, is it located before or
[... 12 lines not shown ...]	for (MachineBasicBlock &MBB : MF) {
// %PHYSREG is the map index; MI is the last valid `%vreg = COPY %PHYSREG`		// %PHYSREG is the map index; MI is the last valid `%vreg = COPY %PHYSREG`
// without any intervening re-definition of %PHYSREG.		// without any intervening re-definition of %PHYSREG.
DenseMap<unsigned, MachineInstr *> NAPhysToVirtMIs;		DenseMap<unsigned, MachineInstr *> NAPhysToVirtMIs;

// Set of virtual registers that are copied from.		// Set of virtual registers that are copied from.
SmallSet<unsigned, 4> CopySrcRegs;		SmallSet<unsigned, 4> CopySrcRegs;
DenseMap<unsigned, MachineInstr *> CopySrcMIs;		DenseMap<unsigned, MachineInstr *> CopySrcMIs;

		bool IsLoopHeader = MLI->isLoopHeader(&MBB);

for (MachineBasicBlock::iterator MII = MBB.begin(), MIE = MBB.end();		for (MachineBasicBlock::iterator MII = MBB.begin(), MIE = MBB.end();
MII != MIE; ) {		MII != MIE; ) {
MachineInstr *MI = &*MII;		MachineInstr *MI = &*MII;
// We may be erasing MI below, increment MII now.		// We may be erasing MI below, increment MII now.
++MII;		++MII;
LocalMIs.insert(MI);		LocalMIs.insert(MI);

// Skip debug values. They should not affect this peephole optimization.		// Skip debug values. They should not affect this peephole optimization.
if (MI->isDebugValue())		if (MI->isDebugValue())
continue;		continue;

if (MI->isPosition() || MI->isPHI())		if (MI->isPosition())
		continue;

		if (IsLoopHeader && MI->isPHI()) {
		if (optimizeRecurrence(*MI)) {
		Changed = true;
continue;		continue;
		}
		}

if (!MI->isCopy()) {		if (!MI->isCopy()) {
for (const auto &Op : MI->operands()) {		for (const auto &Op : MI->operands()) {
// Visit all operands: definitions can be implicit or explicit.		// Visit all operands: definitions can be implicit or explicit.
if (Op.isReg()) {		if (Op.isReg()) {
unsigned Reg = Op.getReg();		unsigned Reg = Op.getReg();
if (Op.isDef() && isNAPhysCopy(Reg)) {		if (Op.isDef() && isNAPhysCopy(Reg)) {
const auto &Def = NAPhysToVirtMIs.find(Reg);		const auto &Def = NAPhysToVirtMIs.find(Reg);
[... 109 lines not shown ...]	for (MachineBasicBlock::iterator MII = MBB.begin(), MIE = MBB.end();
LocalMIs.erase(MI);		LocalMIs.erase(MI);
LocalMIs.erase(DefMI);		LocalMIs.erase(DefMI);
LocalMIs.insert(FoldMI);		LocalMIs.insert(FoldMI);
MI->eraseFromParent();		MI->eraseFromParent();
DefMI->eraseFromParent();		DefMI->eraseFromParent();
MRI->markUsesInDebugValueAsUndef(FoldedReg);		MRI->markUsesInDebugValueAsUndef(FoldedReg);
FoldAsLoadDefCandidates.erase(FoldedReg);		FoldAsLoadDefCandidates.erase(FoldedReg);
++NumLoadFold;		++NumLoadFold;

// MI is replaced with FoldMI so we can continue trying to fold		// MI is replaced with FoldMI so we can continue trying to fold
Changed = true;		Changed = true;
MI = FoldMI;		MI = FoldMI;
}		}
}		}
}		}
}		}

// If we run into an instruction we can't fold across, discard		// If we run into an instruction we can't fold across, discard
// the load candidates. Note: We might be able to fold into this		// the load candidates. Note: We might be able to fold into this
// instruction, so this needs to be after the folding logic.		// instruction, so this needs to be after the folding logic.
if (MI->isLoadFoldBarrier()) {		if (MI->isLoadFoldBarrier()) {
DEBUG(dbgs() << "Encountered load fold barrier on " << *MI << "\n");		DEBUG(dbgs() << "Encountered load fold barrier on " << *MI << "\n");
FoldAsLoadDefCandidates.clear();		FoldAsLoadDefCandidates.clear();
}		}

[... 297 lines not shown ...]

lib/CodeGen/TwoAddressInstructionPass.cpp

[... 62 lines not shown ...]
STATISTIC(NumReSchedDowns, "Number of instructions re-scheduled down");		STATISTIC(NumReSchedDowns, "Number of instructions re-scheduled down");

// Temporary flag to disable rescheduling.		// Temporary flag to disable rescheduling.
static cl::opt<bool>		static cl::opt<bool>
EnableRescheduling("twoaddr-reschedule",		EnableRescheduling("twoaddr-reschedule",
cl::desc("Coalesce copies by rescheduling (default=true)"),		cl::desc("Coalesce copies by rescheduling (default=true)"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

		// Limit the number of dataflow edges to traverse when evaluating the benefit
		// of commuting operands.
		static cl::opt<unsigned> MaxDataFlowEdge(
		"dataflow-edge-limit", cl::Hidden, cl::init(3),
		cl::desc("Maximum number of dataflow edges to traverse when evaluating "
		"the benefit of commuting operands"));
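
To illustrate what a bound such as `dataflow-edge-limit` controls, here is a minimal sketch of a copy-chain walk that gives up after a fixed number of edges. It is not the pass's `isRevCopyChain`: `reachesThroughCopies` and the `CopySrc` map (destination register to source register of a COPY) are hypothetical, and the model assumes each register has at most one defining copy.

```cpp
#include <cassert>
#include <map>

// Walk backwards from FromReg through COPY definitions and report whether
// ToReg is reached within MaxEdges steps. CopySrc is a hypothetical map from
// a copy's destination register to its source register.
static bool reachesThroughCopies(const std::map<unsigned, unsigned> &CopySrc,
                                 unsigned FromReg, unsigned ToReg,
                                 unsigned MaxEdges) {
  unsigned Cur = FromReg;
  for (unsigned I = 0; I < MaxEdges; ++I) {
    std::map<unsigned, unsigned>::const_iterator It = CopySrc.find(Cur);
    if (It == CopySrc.end())
      return false; // Cur is not defined by a copy; the chain ends here.
    Cur = It->second;
    if (Cur == ToReg)
      return true;
  }
  return false; // Limit reached before finding ToReg.
}
```

With copies `%4 <- %3 <- %2 <- %1`, a limit of 3 finds the path from `%4` to `%1`, while a limit of 2 gives up one edge short. A larger limit can expose more commuting opportunities at the cost of more work per instruction.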

namespace {		namespace {
class TwoAddressInstructionPass : public MachineFunctionPass {		class TwoAddressInstructionPass : public MachineFunctionPass {
MachineFunction *MF;		MachineFunction *MF;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;
const InstrItineraryData *InstrItins;		const InstrItineraryData *InstrItins;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
LiveVariables *LV;		LiveVariables *LV;
[... 78 lines not shown ...]	void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addUsedIfAvailable<AAResultsWrapperPass>();		AU.addUsedIfAvailable<AAResultsWrapperPass>();
AU.addUsedIfAvailable<LiveVariables>();		AU.addUsedIfAvailable<LiveVariables>();
AU.addPreserved<LiveVariables>();		AU.addPreserved<LiveVariables>();
AU.addPreserved<SlotIndexes>();		AU.addPreserved<SlotIndexes>();
AU.addPreserved<LiveIntervals>();		AU.addPreserved<LiveIntervals>();
AU.addPreservedID(MachineLoopInfoID);		AU.addPreservedID(MachineLoopInfoID);
AU.addPreservedID(MachineDominatorsID);		AU.addPreservedID(MachineDominatorsID);
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
		MatzeB: This makes no sense! You are starting to use the machine loop info, but all you do here is mark it preserved a second time; there was already addPreservedID(MachineLoopInfoID), which had the same effect.
}		}

/// Pass entry point.		/// Pass entry point.
bool runOnMachineFunction(MachineFunction&) override;		bool runOnMachineFunction(MachineFunction&) override;
};		};
} // end anonymous namespace		} // end anonymous namespace

char TwoAddressInstructionPass::ID = 0;		char TwoAddressInstructionPass::ID = 0;
[... 372 lines not shown ...]	for (unsigned R : Set)
if (TRI->regsOverlap(R, Reg))		if (TRI->regsOverlap(R, Reg))
return true;		return true;

return false;		return false;
}		}

/// Return true if it's potentially profitable to commute the two-address		/// Return true if it's potentially profitable to commute the two-address
/// instruction that's being processed.		/// instruction that's being processed.
bool		bool
		MatzeB: Use references for things that cannot be nullptr. Similar for a few other places here.
TwoAddressInstructionPass::		TwoAddressInstructionPass::
isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,		isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,
MachineInstr *MI, unsigned Dist) {		MachineInstr *MI, unsigned Dist) {
		MatzeB: Don't use `auto` when the type isn't immediately obvious just by looking at the line. (`auto` is unfriendly towards the readers of your code). Similar in a few more places.
if (OptLevel == CodeGenOpt::None)		if (OptLevel == CodeGenOpt::None)
return false;		return false;

// Determine if it's profitable to commute this two address instruction. In		// Determine if it's profitable to commute this two address instruction. In
// general, we want no uses between this instruction and the definition of		// general, we want no uses between this instruction and the definition of
		MatzeB: We don't tend to have a space before the ';' here. Did you try clang-format on your patch?
// the two-address register.		// the two-address register.
// e.g.		// e.g.
// %reg1028<def> = EXTRACT_SUBREG %reg1027<kill>, 1		// %reg1028<def> = EXTRACT_SUBREG %reg1027<kill>, 1
// %reg1029<def> = MOV8rr %reg1028		// %reg1029<def> = MOV8rr %reg1028
// %reg1029<def> = SHR8ri %reg1029, 7, %EFLAGS<imp-def,dead>		// %reg1029<def> = SHR8ri %reg1029, 7, %EFLAGS<imp-def,dead>
// insert => %reg1030<def> = MOV8rr %reg1028		// insert => %reg1030<def> = MOV8rr %reg1028
// %reg1030<def> = ADD8rr %reg1028<kill>, %reg1029<kill>, %EFLAGS<imp-def,dead>		// %reg1030<def> = ADD8rr %reg1028<kill>, %reg1029<kill>, %EFLAGS<imp-def,dead>
// In this case, it might not be possible to coalesce the second MOV8rr		// In this case, it might not be possible to coalesce the second MOV8rr
[... 19 lines not shown ...]	isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,
// r0 = MOV %reg1026		// r0 = MOV %reg1026
// Commute the ADD to hopefully eliminate an otherwise unavoidable copy.		// Commute the ADD to hopefully eliminate an otherwise unavoidable copy.
unsigned ToRegA = getMappedReg(regA, DstRegMap);		unsigned ToRegA = getMappedReg(regA, DstRegMap);
if (ToRegA) {		if (ToRegA) {
unsigned FromRegB = getMappedReg(regB, SrcRegMap);		unsigned FromRegB = getMappedReg(regB, SrcRegMap);
unsigned FromRegC = getMappedReg(regC, SrcRegMap);		unsigned FromRegC = getMappedReg(regC, SrcRegMap);
bool CompB = FromRegB && regsAreCompatible(FromRegB, ToRegA, TRI);		bool CompB = FromRegB && regsAreCompatible(FromRegB, ToRegA, TRI);
bool CompC = FromRegC && regsAreCompatible(FromRegC, ToRegA, TRI);		bool CompC = FromRegC && regsAreCompatible(FromRegC, ToRegA, TRI);

		MatzeB: This only gives you the declared/minimum number of operands, there may be more.
// Compute if any of the following are true:		// Compute if any of the following are true:
// -RegB is not tied to a register and RegC is compatible with RegA.		// -RegB is not tied to a register and RegC is compatible with RegA.
// -RegB is tied to the wrong physical register, but RegC is.		// -RegB is tied to the wrong physical register, but RegC is.
// -RegB is tied to the wrong physical register, and RegC isn't tied.		// -RegB is tied to the wrong physical register, and RegC isn't tied.
if ((!FromRegB && CompC) || (FromRegB && !CompB && (!FromRegC || CompC)))		if ((!FromRegB && CompC) || (FromRegB && !CompB && (!FromRegC || CompC)))
return true;		return true;
// Don't compute if any of the following are true:		// Don't compute if any of the following are true:
// -RegC is not tied to a register and RegB is compatible with RegA.		// -RegC is not tied to a register and RegB is compatible with RegA.
[... 25 lines not shown ...]	isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,
// to eliminate an otherwise unavoidable copy.		// to eliminate an otherwise unavoidable copy.
// FIXME:		// FIXME:
// We can extend the logic further: If a pair of operands in an insn has		// We can extend the logic further: If a pair of operands in an insn has
// been merged, the insn could be regarded as a virtual copy, and the virtual		// been merged, the insn could be regarded as a virtual copy, and the virtual
// copy could also be used to construct a copy chain.		// copy could also be used to construct a copy chain.
// To more generally minimize register copies, ideally the logic of two addr		// To more generally minimize register copies, ideally the logic of two addr
// instruction pass should be integrated with register allocation pass where		// instruction pass should be integrated with register allocation pass where
// interference graph is available.		// interference graph is available.
if (isRevCopyChain(regC, regA, 3))		if (isRevCopyChain(regC, regA, MaxDataFlowEdge))
return true;		return true;

if (isRevCopyChain(regB, regA, 3))		if (isRevCopyChain(regB, regA, MaxDataFlowEdge))
return false;		return false;

// Since there are no intervening uses for both registers, then commute		// Since there are no intervening uses for both registers, then commute
// if the def of regC is closer. Its live interval is shorter.		// if the def of regC is closer. Its live interval is shorter.
return LastDefB && LastDefC && LastDefC > LastDefB;		return LastDefB && LastDefC && LastDefC > LastDefB;
}		}

/// Commute a two-address instruction and update the basic block, distance map,		/// Commute a two-address instruction and update the basic block, distance map,
[... 31 lines not shown ...]
/// to a 3-address one.		/// to a 3-address one.
bool		bool
TwoAddressInstructionPass::isProfitableToConv3Addr(unsigned RegA,unsigned RegB){		TwoAddressInstructionPass::isProfitableToConv3Addr(unsigned RegA,unsigned RegB){
// Look for situations like this:		// Look for situations like this:
// %reg1024<def> = MOV r1		// %reg1024<def> = MOV r1
// %reg1025<def> = MOV r0		// %reg1025<def> = MOV r0
// %reg1026<def> = ADD %reg1024, %reg1025		// %reg1026<def> = ADD %reg1024, %reg1025
// r2 = MOV %reg1026		// r2 = MOV %reg1026
// Turn ADD into a 3-address instruction to avoid a copy.		// Turn ADD into a 3-address instruction to avoid a copy.
unsigned FromRegB = getMappedReg(RegB, SrcRegMap);		unsigned FromRegB = getMappedReg(RegB, SrcRegMap);
		MatzeB: Isn't this check superfluous? I would expect this to be true anyway if the instruction is inside `MBB`, which is checked later.
		twoh (author): This check is here to filter out the definitions that are outside of the loop. The check below is to see if the definition inside the loop belongs to MBB.
if (!FromRegB)		if (!FromRegB)
return false;		return false;
unsigned ToRegA = getMappedReg(RegA, DstRegMap);		unsigned ToRegA = getMappedReg(RegA, DstRegMap);
return (ToRegA && !regsAreCompatible(FromRegB, ToRegA, TRI));		return (ToRegA && !regsAreCompatible(FromRegB, ToRegA, TRI));
}		}

/// Convert the specified two-address instruction into a three address one.		/// Convert the specified two-address instruction into a three address one.
/// Return true if this transformation was successful.		/// Return true if this transformation was successful.
[... 1,125 lines not shown ...]

test/CodeGen/MIR/Generic/multiRunPass.mir

	# RUN: llc -run-pass expand-isel-pseudos -run-pass peephole-opt -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PSEUDO_PEEPHOLE			# RUN: llc -run-pass expand-isel-pseudos -run-pass peephole-opt -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PSEUDO_PEEPHOLE
	# RUN: llc -run-pass expand-isel-pseudos,peephole-opt -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PSEUDO_PEEPHOLE			# RUN: llc -run-pass expand-isel-pseudos,peephole-opt -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PSEUDO_PEEPHOLE
	# RUN: llc -run-pass peephole-opt -run-pass expand-isel-pseudos -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PEEPHOLE_PSEUDO			# RUN: llc -run-pass peephole-opt -run-pass expand-isel-pseudos -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PEEPHOLE_PSEUDO
	# RUN: llc -run-pass peephole-opt,expand-isel-pseudos -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PEEPHOLE_PSEUDO			# RUN: llc -run-pass peephole-opt,expand-isel-pseudos -debug-pass=Arguments -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=PEEPHOLE_PSEUDO
	# REQUIRES: asserts			# REQUIRES: asserts

	# This test ensures that the command line accepts			# This test ensures that the command line accepts
	# several run passes on the same command line and			# several run passes on the same command line and
	# actually create the proper pipeline for it.			# actually create the proper pipeline for it.
	# PSEUDO_PEEPHOLE: -expand-isel-pseudos {{(-machineverifier )?}}-peephole-opt			# PSEUDO_PEEPHOLE: -expand-isel-pseudos
				# PSEUDO_PEEPHOLE-SAME: {{(-machineverifier )?}}-peephole-opt
	# PEEPHOLE_PSEUDO: -peephole-opt {{(-machineverifier )?}}-expand-isel-pseudos			# PEEPHOLE_PSEUDO: -peephole-opt {{(-machineverifier )?}}-expand-isel-pseudos

	# Make sure there are no other passes happening after what we asked.			# Make sure there are no other passes happening after what we asked.
	# CHECK-NEXT: --- |			# CHECK-NEXT: --- |
	---			---
	# CHECK: name: foo			# CHECK: name: foo
	name: foo			name: foo
	body: |			body: |
	bb.0:			bb.0:
	...			...

test/CodeGen/X86/peephole-recurrence.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=peephole-opt -o - %s | FileCheck %s

				--- |
				define i32 @foo(i32 %a) {
				bb0:
				br label %bb1

				bb1: ; preds = %bb7, %bb0
				%vreg0 = phi i32 [ 0, %bb0 ], [ %vreg3, %bb7 ]
				%cond0 = icmp eq i32 %a, 0
				br i1 %cond0, label %bb4, label %bb3

				bb3: ; preds = %bb1
				br label %bb4

				bb4: ; preds = %bb1, %bb3
				%vreg5 = phi i32 [ 2, %bb3 ], [ 1, %bb1 ]
				%cond1 = icmp eq i32 %vreg5, 0
				br i1 %cond1, label %bb7, label %bb6

				bb6: ; preds = %bb4
				br label %bb7

				bb7: ; preds = %bb4, %bb6
				%vreg1 = phi i32 [ 2, %bb6 ], [ 1, %bb4 ]
				%vreg2 = add i32 %vreg5, %vreg0
				%vreg3 = add i32 %vreg1, %vreg2
				%cond2 = icmp slt i32 %vreg3, 10
				br i1 %cond2, label %bb1, label %bb8

				bb8: ; preds = %bb7
				ret i32 0
				}

				define i32 @bar(i32 %a, i32* %p) {
				bb0:
				br label %bb1

				bb1: ; preds = %bb7, %bb0
				%vreg0 = phi i32 [ 0, %bb0 ], [ %vreg3, %bb7 ]
				%cond0 = icmp eq i32 %a, 0
				br i1 %cond0, label %bb4, label %bb3

				bb3: ; preds = %bb1
				br label %bb4

				bb4: ; preds = %bb1, %bb3
				%vreg5 = phi i32 [ 2, %bb3 ], [ 1, %bb1 ]
				%cond1 = icmp eq i32 %vreg5, 0
				br i1 %cond1, label %bb7, label %bb6

				bb6: ; preds = %bb4
				br label %bb7

				bb7: ; preds = %bb4, %bb6
				%vreg1 = phi i32 [ 2, %bb6 ], [ 1, %bb4 ]
				%vreg2 = add i32 %vreg5, %vreg0
				store i32 %vreg0, i32* %p
				%vreg3 = add i32 %vreg1, %vreg2
				%cond2 = icmp slt i32 %vreg3, 10
				br i1 %cond2, label %bb1, label %bb8

				bb8: ; preds = %bb7
				ret i32 0
				}

				...
				---
				# There is a recurrence formulated around %0, %10, and %3. Check that operands
				# are commuted for ADD instructions in bb.5.bb7 so that the values involved in
				# the recurrence are tied. This removes the redundant copy instruction.
				name: foo
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr32, preferred-register: '' }
				- { id: 1, class: gr32, preferred-register: '' }
				- { id: 2, class: gr32, preferred-register: '' }
				- { id: 3, class: gr32, preferred-register: '' }
				- { id: 4, class: gr32, preferred-register: '' }
				- { id: 5, class: gr32, preferred-register: '' }
				- { id: 6, class: gr32, preferred-register: '' }
				- { id: 7, class: gr32, preferred-register: '' }
				- { id: 8, class: gr32, preferred-register: '' }
				- { id: 9, class: gr32, preferred-register: '' }
				- { id: 10, class: gr32, preferred-register: '' }
				- { id: 11, class: gr32, preferred-register: '' }
				- { id: 12, class: gr32, preferred-register: '' }
				liveins:
				- { reg: '%edi', virtual-reg: '%4' }
				body: |
				bb.0.bb0:
				successors: %bb.1.bb1(0x80000000)
				liveins: %edi

				%4 = COPY %edi
				%5 = MOV32r0 implicit-def dead %eflags

				bb.1.bb1:
				successors: %bb.3.bb4(0x30000000), %bb.2.bb3(0x50000000)

				; CHECK: %0 = PHI %5, %bb.0.bb0, %3, %bb.5.bb7
				%0 = PHI %5, %bb.0.bb0, %3, %bb.5.bb7
				%6 = MOV32ri 1
				TEST32rr %4, %4, implicit-def %eflags
				JE_1 %bb.3.bb4, implicit %eflags
				JMP_1 %bb.2.bb3

				bb.2.bb3:
				successors: %bb.3.bb4(0x80000000)

				%7 = MOV32ri 2

				bb.3.bb4:
				successors: %bb.5.bb7(0x30000000), %bb.4.bb6(0x50000000)

				%1 = PHI %6, %bb.1.bb1, %7, %bb.2.bb3
				TEST32rr %1, %1, implicit-def %eflags
				JE_1 %bb.5.bb7, implicit %eflags
				JMP_1 %bb.4.bb6

				bb.4.bb6:
				successors: %bb.5.bb7(0x80000000)

				%9 = MOV32ri 2

				bb.5.bb7:
				successors: %bb.1.bb1(0x7c000000), %bb.6.bb8(0x04000000)

				%2 = PHI %6, %bb.3.bb4, %9, %bb.4.bb6
				%10 = ADD32rr %1, %0, implicit-def dead %eflags
				; CHECK: %10 = ADD32rr
				; CHECK-SAME: %0,
				; CHECK-SAME: %1,
				%3 = ADD32rr %2, killed %10, implicit-def dead %eflags
				; CHECK: %3 = ADD32rr
				; CHECK-SAME: %10,
				; CHECK-SAME: %2,
				%11 = SUB32ri8 %3, 10, implicit-def %eflags
				JL_1 %bb.1.bb1, implicit %eflags
				JMP_1 %bb.6.bb8

				bb.6.bb8:
				%12 = MOV32r0 implicit-def dead %eflags
				%eax = COPY %12
				RET 0, %eax

				...
				---
				# Here a recurrence is formulated around %0, %11, and %3, but operands should
				# not be commuted because %0 has a use outside of the recurrence. This
				# prevents commuting operands from tying values with overlapping live
				# ranges.
				name: bar
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr32, preferred-register: '' }
				- { id: 1, class: gr32, preferred-register: '' }
				- { id: 2, class: gr32, preferred-register: '' }
				- { id: 3, class: gr32, preferred-register: '' }
				- { id: 4, class: gr32, preferred-register: '' }
				- { id: 5, class: gr64, preferred-register: '' }
				- { id: 6, class: gr32, preferred-register: '' }
				- { id: 7, class: gr32, preferred-register: '' }
				- { id: 8, class: gr32, preferred-register: '' }
				- { id: 9, class: gr32, preferred-register: '' }
				- { id: 10, class: gr32, preferred-register: '' }
				- { id: 11, class: gr32, preferred-register: '' }
				- { id: 12, class: gr32, preferred-register: '' }
				- { id: 13, class: gr32, preferred-register: '' }
				liveins:
				- { reg: '%edi', virtual-reg: '%4' }
				- { reg: '%rsi', virtual-reg: '%5' }
				body: |
				bb.0.bb0:
				successors: %bb.1.bb1(0x80000000)
				liveins: %edi, %rsi

				%5 = COPY %rsi
				%4 = COPY %edi
				%6 = MOV32r0 implicit-def dead %eflags

				bb.1.bb1:
				successors: %bb.3.bb4(0x30000000), %bb.2.bb3(0x50000000)

				%0 = PHI %6, %bb.0.bb0, %3, %bb.5.bb7
				; CHECK: %0 = PHI %6, %bb.0.bb0, %3, %bb.5.bb7
				%7 = MOV32ri 1
				TEST32rr %4, %4, implicit-def %eflags
				JE_1 %bb.3.bb4, implicit %eflags
				JMP_1 %bb.2.bb3

				bb.2.bb3:
				successors: %bb.3.bb4(0x80000000)

				%8 = MOV32ri 2

				bb.3.bb4:
				successors: %bb.5.bb7(0x30000000), %bb.4.bb6(0x50000000)

				%1 = PHI %7, %bb.1.bb1, %8, %bb.2.bb3
				TEST32rr %1, %1, implicit-def %eflags
				JE_1 %bb.5.bb7, implicit %eflags
				JMP_1 %bb.4.bb6

				bb.4.bb6:
				successors: %bb.5.bb7(0x80000000)

				%10 = MOV32ri 2

				bb.5.bb7:
				successors: %bb.1.bb1(0x7c000000), %bb.6.bb8(0x04000000)

				%2 = PHI %7, %bb.3.bb4, %10, %bb.4.bb6
				%11 = ADD32rr %1, %0, implicit-def dead %eflags
				; CHECK: %11 = ADD32rr
				; CHECK-SAME: %1,
				; CHECK-SAME: %0,
				MOV32mr %5, 1, _, 0, _, %0 :: (store 4 into %ir.p)
				%3 = ADD32rr %2, killed %11, implicit-def dead %eflags
				; CHECK: %3 = ADD32rr
				; CHECK-SAME: %2,
				; CHECK-SAME: %11,
				%12 = SUB32ri8 %3, 10, implicit-def %eflags
				JL_1 %bb.1.bb1, implicit %eflags
				JMP_1 %bb.6.bb8

				bb.6.bb8:
				%13 = MOV32r0 implicit-def dead %eflags
				%eax = COPY %13
				RET 0, %eax

				...