This is an archive of the discontinued LLVM Phabricator instance.


Differential D31821

Remove redundant copy in recurrences
ClosedPublic

Authored by twoh on Apr 7 2017, 11:19 AM.


Details

Reviewers

qcolombet
MatzeB
wmi

Commits

rG0e35ea3b7c63: Remove redundant copy in recurrences
rL306758: Remove redundant copy in recurrences

Summary

If a chain of instructions forms a recurrence, commuting operands can help remove a redundant copy. Consider the following example code:

BB#1: ; Loop Header
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: ; Loop Latch
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0
  %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>

The existing two-address instruction pass generates the following code:

BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6:
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>

This is suboptimal because the generated assembly has a redundant copy at the end of BB#6 to feed %vreg13 to BB#1:

.LBB0_6:
  addl  %esi, %edi
  addl  %ebx, %edi
  cmpl  $10, %edi
  movl  %edi, %esi
  jl  .LBB0_1

This redundant copy can be eliminated by making the instructions in the recurrence chain compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 when computing %vreg10. With that change, the code after two-address generation becomes

BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: derived from LLVM BB %bb7
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>

and the final assembly has no redundant copy:

.LBB0_6:
  addl  %edi, %eax
  addl  %ebx, %eax
  cmpl  $10, %eax
  jl  .LBB0_1
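
To make the effect of commuting concrete, here is a minimal, self-contained C++ sketch. It is not LLVM code: `TwoAddrInst`, `followTiedChain`, and `feedbackCopyIsFree` are hypothetical names, and the model only assumes that the destination of a two-address instruction shares a register with its tied operand. Under that assumption, the latch copy is redundant when the loop-carried value reaches the latch value through tied operands alone.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of a two-address instruction:
//   Dst = op Tied, Other
// where Dst and Tied must end up in the same register.
struct TwoAddrInst {
  std::string Dst;
  std::string Tied;
  std::string Other;
};

// Walk the chain in program order and follow the tied operand starting
// from Start. Returns the last value that shares Start's register.
std::string followTiedChain(const std::vector<TwoAddrInst> &Chain,
                            const std::string &Start) {
  std::string Cur = Start;
  for (const TwoAddrInst &I : Chain)
    if (I.Tied == Cur)
      Cur = I.Dst;
  return Cur;
}

// In this model the copy that feeds the latch value back to the loop
// header is free when the header value reaches the latch value through
// tied operands only.
bool feedbackCopyIsFree(const std::vector<TwoAddrInst> &Chain,
                        const std::string &HeaderValue,
                        const std::string &LatchValue) {
  return followTiedChain(Chain, HeaderValue) == LatchValue;
}
```

With the operand order the existing pass produces, `feedbackCopyIsFree({{"vreg10", "vreg1", "vreg0"}, {"vreg3", "vreg10", "vreg2"}}, "vreg0", "vreg3")` is false. After commuting %vreg0 and %vreg1 in the first ADD, the same call on `{{"vreg10", "vreg0", "vreg1"}, {"vreg3", "vreg10", "vreg2"}}` is true.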

Diff Detail

Build Status

Buildable 5389
Build 5389: arc lint + arc unit

Event Timeline

twoh created this revision. Apr 7 2017, 11:19 AM

twoh edited the summary of this revision. Apr 7 2017, 11:19 AM

small changes to the test.

Probably best to have Quentin or Matthias do it.

Friendly ping. Thanks!

Ping. Thanks!

This looks pretty complicated for the task at hand. Wouldn't this be a simpler transformation to perform at a place where we are still in Machine SSA, have use-def information for free, and also have phis around to indicate loops, so that we can do without a MachineLoopInfo instance?

@MatzeB I considered other passes as well for the sake of simplicity, but I concluded that the two-address instruction pass is the right place to do this. The problem doesn't appear while we're in SSA; it becomes an issue when we decide which operand register should be the destination register. I think this is why the isProfitableToCommute function is in the two-address instruction pass.

Ping. If you're too busy to review this patch, could you please recommend someone else? Thanks!

Ping.

Ping. Can someone please let me know what would be the best way to have this patch reviewed? Thanks!

I'd like to hear @qcolombet's opinion about this, also +wmi who wrote the similar isRevCopyChain().

This patch

Nitpicks below
Do you have an idea of the compile-time impact of this patch?
I am slightly worried about the use of MachineLoopInfo: Is it really necessary here? Maybe the same can be done with some simpler dominance check? Can you tell when the MachineLoopInfo is/isn't available (as that will impact whether this rule is applied or not).

General Observations

I am not happy with the general approach of throwing more and more patterns at TwoAddressInstructions:

Yes, LLVM's handling of TwoAddressInstruction as a pre-RA pass is a bad idea IMO (and I don't know why it was done this way): we are unnecessarily constraining the allocation problem and don't necessarily do a good job up front without knowing how the allocation will work out.
This pass adds another pattern at TwoAddressInstruction: It is similar to isRevCopyChain() but slightly more complicated so we end up with 160 extra lines of analysis for yet another pattern. If this trend continues we will have an even harder time maintaining this pass.
On the other hand this improves code quality today without rearchitecting the code.

lib/CodeGen/TwoAddressInstructionPass.cpp
187	This makes no sense! You are starting to use the machine loop info, yet all you do here is mark it preserved a second time; there was already an addPreservedID(MachineLoopInfoID), which had the same effect.
577	Use references for things that cannot be nullptr. Similar for a few other places here.
579–580	Don't use `auto` when the type isn't immediately obvious just by looking at the line. (`auto` is unfriendly towards the readers of your code). Similar in a few more places.
585	we don't tend to have a space before the ';' here. Did you try clang-format on your patch?
621	This only gives you the declared/minimum number of operands, there may be more.
714–715	Isn't this check superfluous? I would expect this to be true anyway if the instruction is inside `MBB` which is checked later.
test/CodeGen/X86/twoaddr-recurrence.ll
1–2	Did you try to create a .mir test so the pass can be tested in isolation? If yes and it didn't work, could you tell us why so we can improve the .mir testing.

Addressing comments from @MatzeB.

Harbormaster completed remote builds in B6485: Diff 99207. May 16 2017, 2:42 PM

@MatzeB Thank you for your comments. I'm collecting compile-time numbers now with SPEC2006. For our internal benchmark, which takes about 40 minutes to compile, the compile-time difference was negligible.

I don't think this patch can be implemented without loop info, because it is formulated around the loop recurrence pattern. It might be possible, but it would eventually require more code to produce the information that MachineLoopInfo provides. I'm curious what your biggest concern about using MachineLoopInfo in this pass is, as it is used in other passes as well.

I agree with you that maintainability versus supporting more patterns is a hard trade-off. IMHO, if we want to generate better-performing code, adding more code to support more patterns is unavoidable unless we re-architect the register-allocation-related passes. I think this patch adds more lines of code because it is the first to handle a pattern based on the loop structure.

I tried a .mir test, but the output is still not in two-address form. I tried

./bin/llc -mtriple=x86_64-- -run-pass=twoaddressinstruction -o - ./input.mir

(input.mir is from -stop-before=twoaddressinstruction), and the output still has instructions such as

%vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0

lib/CodeGen/TwoAddressInstructionPass.cpp
714–715	This check is here to filter the definitions that are outside of Loop. The check below is to see if the definition inside the Loop belongs to MBB.

The problem doesn't appear while we're in SSA [...]

Why is that?
The tied-operand marking is already there; following phis shouldn't be an issue and would eliminate the whole dataflow processing.

What am I missing?

Side question: if several recurrence chains share some operand, how do you pick the most profitable one?

@qcolombet Actually you're right. Having tied operands, this can be done within SSA. Do you have a suggestion for a better place to implement this? I thought what isRevCopyChain does is closest to what this patch does.

I'm afraid I couldn't quite understand your side question. This patch commutes the operands of an instruction if the instruction is inside a recurrence that meets certain conditions. In lines 846-850, it checks whether the recurrence pattern is formed around regC, and if so returns true to commute the operands. If not, it checks the same for regB, and returns false (do not commute) if it observes the pattern. If the pattern is observed for both regB and regC, it is possible that not commuting the operands generates a better result, but as the logic checks regC first, it'll commute the operands.

@MatzeB Below are the compile-time differences, in percent, for SPEC2006:

400.perlbench 1.80
401.bzip2 0.66
403.gcc 0.77
429.mcf 2.61
433.milc -1.70
444.namd 0.31
445.gobmk -0.05
447.dealII -0.22
450.soplex 0.48
453.povray 1.34
456.hmmer 0.49
458.sjeng 1.30
462.libquantum 1.90
464.h264ref 0.52
470.lbm -3.01
471.omnetpp 2.50
473.astar 0.27
482.sphinx3 1.01
483.xalancbmk -0.47

Hi Taewook,

Thank you for working on this. The testcase you gave is interesting. My previous work in isRevCopyChain is quite limited and only handles recurrences within a BB, so I'd like to see a more general fix for the redundant copy problem caused by recurrences.

A problem of the patch I can see is that it uses dataflow to track the recurrence, which I think may not be enough. Think about the case below:

r3 = r4;
...
r1 = r2 + r3;
...
r5 = load(r1)
r4 = r5;

r1-->r5-->r4-->r3-->r1 is a recurrence loop according to the algorithm; however, we have no problem coalescing r3 with r4 and r4 with r5, because r5 and r1 don't have to be allocated to the same register.

So the def-src pairs that construct a recurrence chain of interest to us should only contain a def and src that are in a tied operand group, or that are in the same copy. In other words, the recurrence chain can only cause the redundant copy problem when all the operands on the chain have to be allocated to the same register. That kind of recurrence loop is what we are interested in.
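
The distinction drawn here can be sketched in a few lines of standalone C++. This is only an illustration, not code from the patch: `EdgeKind`, `Edge`, and `forcesSameRegister` are hypothetical names, and the sketch assumes each def-to-use edge has already been classified as a copy, a tied operand, or a plain use. A cycle matters for copy elimination only if every edge on it is a copy or a tied operand.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// How a value flows from one virtual register to the next.
enum class EdgeKind { Copy, Tied, PlainUse };

struct Edge {
  std::string From;
  std::string To;
  EdgeKind Kind;
};

// Returns true if every step of Cycle (including the step from the last
// element back to the first) is a copy or a tied-operand edge, i.e. all
// registers on the cycle would have to share one physical register.
bool forcesSameRegister(const std::vector<Edge> &Edges,
                        const std::vector<std::string> &Cycle) {
  if (Cycle.empty())
    return false;
  for (std::size_t I = 0; I < Cycle.size(); ++I) {
    const std::string &From = Cycle[I];
    const std::string &To = Cycle[(I + 1) % Cycle.size()];
    bool Constrained = false;
    for (const Edge &E : Edges)
      if (E.From == From && E.To == To && E.Kind != EdgeKind::PlainUse)
        Constrained = true;
    if (!Constrained)
      return false;
  }
  return true;
}
```

For the example above, the edge from r1 to r5 goes through a load and is a plain use, so `forcesSameRegister` returns false for the cycle r1, r5, r4, r3 even though it is a dataflow cycle. (The sketch also treats r3 in `r1 = r2 + r3` as a plain use, which is an assumption; the comment does not say which operand is tied.)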

In addition, I don't think the recurrence chain has to be a strict cycle. Just like your motivational testcase:
vreg0 = vreg13
...
vreg2 = vreg15
vreg10 = add vreg1, vreg0
vreg3 = add vreg2, vreg10
vreg13 = vreg3

Here the recurrence chain starting from vreg0 is: vreg0-->vreg13-->vreg3-->vreg2. This is not a strict cycle; however, vreg2 and vreg0 already interfere with each other. This means it is impossible to allocate all the vregs in the recurrence chain to the same physical register, and some copy has to be kept.

I think those are the two issues that have to be addressed for a more general solution to the recurrence problem.

Thanks,
Wei.

@wmi Thank you for your reply. I agree with you that we should consider the tied operand group, and as @qcolombet mentioned in the previous comment, if tied-operand information is already available before this pass, I'd like to discuss where the best place to implement this would be.

I'm afraid I couldn't understand your second point. In the example there is a recurrence cycle of vreg0-->vreg13-->vreg3-->vreg10-->vreg0 as well, and what this patch does is commuting (vreg0, vreg1) and (vreg2, vreg10) to remove the copy. Did I miss something?

Hi Taewook,

@qcolombet Actually you're right. Having tied operands, this can be done within SSA. Do you have a suggestion for a better place to implement this? I thought what isRevCopyChain does is closest to what this patch does.

I would investigate this as part of the PeepholeOptimizer. It already massages copies.

Regarding my second question, you answered it with this part:

If the pattern is observed for both regB and regC, it is possible that not commuting the operands generates a better result, but as the logic checks regC first, it'll commute the operands.

Cheers,
-Quentin

Reimplement the feature in the peephole optimizer. This patch makes the values in the recurrence cycle tied to each other where possible.

Add missing reference.

FYI, below are the compile-time differences, in percent, for SPEC2006:

400.perlbench 0.23
401.bzip2 -0.01
403.gcc -0.01
429.mcf 3.35
433.milc 1.11
444.namd 0.27
445.gobmk 0.20
447.dealII 0.49
450.soplex 1.02
453.povray 0.55
456.hmmer 0.19
458.sjeng 1.37
462.libquantum 1.76
464.h264ref 1.01
470.lbm -0.98
471.omnetpp 1.61
473.astar 0.95
482.sphinx3 0.50
483.xalancbmk -0.89

ping. thanks!

Sorry for the delay. The rewrite based on SSA looks much cleaner now. About the algorithm: IIUC, it tries to find a loop based on the def-use chain of a tied operand or an operand commutable with a tied operand. However, I am still concerned that the method can sometimes increase the number of redundant copies.

Here is a testcase:

A = phi(B, O)
M = load addr1
C = M + A           // C and M are tied operands.
store A to addr2
N = load from addr3
B = N + C           // B and N are tied operands.

Without the patch, we can allocate the above testcase without a copy: allocate A, B and N to physreg1, and allocate M and C to physreg2.

With the patch, after it changes C = M + A to C = A + M, a copy needs to be generated because C and A are tied operands but the live ranges of C and A overlap. It is impossible to allocate C and A to the same physreg without an extra copy.

Actually, I think we don't have to explore candidate cycles based on operands commutable with a tied operand. The algorithm I have in mind is to find cycles that are composed only of tied operands, copies, and phis in the loop header, and then check whether there is any live range overlap between any two operands inside the cycle. Only when there is a live range overlap do we consider commuting some operands.
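
The live range argument for the testcase above can be checked with a small standalone C++ sketch. It is a simplification under stated assumptions: `LiveRange` and `canShareRegister` are hypothetical names, instructions are numbered 0 to 5 in the order listed, and a value is treated as live from its defining instruction to its last use within the block.

```cpp
// Hypothetical block-local live range: positions are instruction indices.
struct LiveRange {
  int Def;     // index of the defining instruction
  int LastUse; // index of the last instruction that reads the value
};

// Two values can share a register when one is dead by the time the other
// is defined. A value whose last use is the instruction that defines the
// other one does not conflict (that is the tied-operand case).
bool canShareRegister(const LiveRange &X, const LiveRange &Y) {
  return X.LastUse <= Y.Def || Y.LastUse <= X.Def;
}
```

With A = {0, 3}, M = {1, 2} and C = {2, 5}, `canShareRegister(M, C)` is true, so tying C to M needs no copy, while `canShareRegister(A, C)` is false because A is still read by the store after C is defined. That is the overlap that forces the extra copy once the operands are commuted.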

@wmi Not at all! Thanks for your comments.

You're right. I should have considered the case where the operands involved in the recurrence cycle live beyond their use in the recurrence cycle. Your suggestion sounds great, but my concern is that computing live ranges might be expensive, because live interval analysis is performed after peephole optimization. What do you think of limiting it to the case where the operands involved in the recurrence cycle have no use outside of the cycle?

Addressing @wmi's concern by limiting the targets to recurrence cycles in which only the last instruction of the recurrence (the one that feeds the PHI instruction) can have uses outside of the recurrence. This is not an ideal solution yet, and a more fundamental solution (such as having recurrence optimization as a separate pass and/or using live range analysis for it) should follow. But I still think it is worth having here.

Test cases are updated as well to use MIR.

Ping. Thanks!

For the example below, findTargetRecurrence starts from r2 and r3 and searches for a def reg equal to r1. There are a lot of possibilities to explore. That is where the complexity of findTargetRecurrence comes from.

r1 = phi(r2, r3)
r4 = r5 + r1;
r6 = r7 + r4;
r3 = r6 + r8;

After adding the constraint to the recurrence cycle, since all the instructions other than the last one in the recurrence loop should have only one use, it will be easy to start from r1 and search forward. I guess findTargetRecurrence can be simplified a lot if the backward searching is replaced by forward searching, right?
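
The forward search suggested here could look roughly like the following standalone C++ sketch. It does not use LLVM types and is not the committed findTargetRecurrence: `Inst` and `findRecurrenceForward` are hypothetical names, and the sketch assumes the single-use constraint described above, so each step has exactly one user to follow.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified instruction: one def and a list of used registers.
struct Inst {
  std::string Def;
  std::vector<std::string> Uses;
};

// Starting from the phi result PhiDef, repeatedly step to the unique user of
// the current value. Succeeds if Target (the phi's loop-carried input) is
// reached; Chain then holds the defs visited, in order.
bool findRecurrenceForward(const std::vector<Inst> &Body,
                           const std::string &PhiDef,
                           const std::string &Target,
                           std::vector<std::string> &Chain) {
  Chain.clear();
  std::string Cur = PhiDef;
  // A recurrence cannot be longer than the loop body, so bound the walk.
  for (std::size_t Step = 0; Step < Body.size(); ++Step) {
    const Inst *User = nullptr;
    unsigned NumUsers = 0;
    for (const Inst &I : Body) {
      if (std::find(I.Uses.begin(), I.Uses.end(), Cur) != I.Uses.end()) {
        ++NumUsers;
        User = &I;
      }
    }
    // Values inside the recurrence must have exactly one user.
    if (NumUsers != 1)
      return false;
    Cur = User->Def;
    Chain.push_back(Cur);
    if (Cur == Target)
      return true;
  }
  return false;
}
```

For the example above, with the body `r4 = r5 + r1; r6 = r7 + r4; r3 = r6 + r8`, a call with PhiDef r1 and Target r3 succeeds and fills Chain with r4, r6, r3. If r4 had a second user, the walk would stop at r4 and return false.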

Addressing comments from @wmi. Thank you for the suggestion!

minor fix.

wmi added inline comments. Jun 29 2017, 11:42 AM

lib/CodeGen/PeepholeOptimizer.cpp
1549–1552	(On Diff #104697)	For findTargetRecurrence, RCs will contain at most one RC, right? It may be better to remove RCs, change the return type of findTargetRecurrence to bool, and use it to indicate whether an RC is found.

@wmi Good call! I fixed the code per your suggestion. Thanks!

LGTM.

This revision is now accepted and ready to land. Jun 29 2017, 2:06 PM

Closed by commit rL306758: Remove redundant copy in recurrences (authored by twoh). Jun 29 2017, 4:11 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/TwoAddressInstructionPass.cpp	212 lines
test/CodeGen/X86/twoaddr-recurrence.ll	45 lines

Diff 94549

lib/CodeGen/TwoAddressInstructionPass.cpp

Show All 31 Lines
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/LiveIntervalAnalysis.h"		#include "llvm/CodeGen/LiveIntervalAnalysis.h"
#include "llvm/CodeGen/LiveVariables.h"		#include "llvm/CodeGen/LiveVariables.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/MC/MCInstrItineraries.h"		#include "llvm/MC/MCInstrItineraries.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
Show All 15 Lines
STATISTIC(NumReSchedDowns, "Number of instructions re-scheduled down");		STATISTIC(NumReSchedDowns, "Number of instructions re-scheduled down");

// Temporary flag to disable rescheduling.		// Temporary flag to disable rescheduling.
static cl::opt<bool>		static cl::opt<bool>
EnableRescheduling("twoaddr-reschedule",		EnableRescheduling("twoaddr-reschedule",
cl::desc("Coalesce copies by rescheduling (default=true)"),		cl::desc("Coalesce copies by rescheduling (default=true)"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

		// Limit the number of dataflow edges to traverse when evaluating the benefit
		// of commuting operands.
		static cl::opt<unsigned> MaxDataFlowEdge(
		"dataflow-edge-limit", cl::Hidden, cl::init(3),
		cl::desc("Maximum number of dataflow edges to traverse when evaluating "
		"the benefit of commuting operands"));

namespace {		namespace {
class TwoAddressInstructionPass : public MachineFunctionPass {		class TwoAddressInstructionPass : public MachineFunctionPass {
MachineFunction *MF;		MachineFunction *MF;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;
const InstrItineraryData *InstrItins;		const InstrItineraryData *InstrItins;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
LiveVariables *LV;		LiveVariables *LV;
LiveIntervals *LIS;		LiveIntervals *LIS;
		MachineLoopInfo *MLI;
AliasAnalysis *AA;		AliasAnalysis *AA;
CodeGenOpt::Level OptLevel;		CodeGenOpt::Level OptLevel;

// The current basic block being processed.		// The current basic block being processed.
MachineBasicBlock *MBB;		MachineBasicBlock *MBB;

// Keep track the distance of a MI from the start of the current basic block.		// Keep track the distance of a MI from the start of the current basic block.
DenseMap<MachineInstr*, unsigned> DistanceMap;		DenseMap<MachineInstr*, unsigned> DistanceMap;

// Set of already processed instructions in the current block.		// Set of already processed instructions in the current block.
SmallPtrSet<MachineInstr*, 8> Processed;		using MachineInstrSet = SmallPtrSet<MachineInstr*, 8>;
		MachineInstrSet Processed;

// A map from virtual registers to physical registers which are likely targets		// A map from virtual registers to physical registers which are likely targets
// to be coalesced to due to copies from physical registers to virtual		// to be coalesced to due to copies from physical registers to virtual
// registers. e.g. v1024 = move r0.		// registers. e.g. v1024 = move r0.
DenseMap<unsigned, unsigned> SrcRegMap;		DenseMap<unsigned, unsigned> SrcRegMap;

// A map from virtual registers to physical registers which are likely targets		// A map from virtual registers to physical registers which are likely targets
// to be coalesced to due to copies to physical registers from virtual		// to be coalesced to due to copies to physical registers from virtual
// registers. e.g. r1 = move v1024.		// registers. e.g. r1 = move v1024.
DenseMap<unsigned, unsigned> DstRegMap;		DenseMap<unsigned, unsigned> DstRegMap;

bool sink3AddrInstruction(MachineInstr *MI, unsigned Reg,		bool sink3AddrInstruction(MachineInstr *MI, unsigned Reg,
MachineBasicBlock::iterator OldPos);		MachineBasicBlock::iterator OldPos);

bool isRevCopyChain(unsigned FromReg, unsigned ToReg, int Maxlen);		bool isRevCopyChain(unsigned FromReg, unsigned ToReg, int Maxlen);

bool noUseAfterLastDef(unsigned Reg, unsigned Dist, unsigned &LastDef);		bool noUseAfterLastDef(unsigned Reg, unsigned Dist, unsigned &LastDef);

		MachineInstr* findReachingDefInMBB(MachineOperand *UseMO);
		void collectReachableLiveIns(MachineInstr *MI, unsigned UseReg,
		SmallDenseSet<unsigned> &LiveIns,
		unsigned MaxLen);

		void collectUsesInMBB(MachineOperand *DefMO, MachineInstrSet &Uses);
		bool isReachableInMBB(unsigned Reg, MachineInstr *From, MachineInstr *To,
		unsigned MaxLen);

		bool isRecurrenceChain(MachineInstr *MI, unsigned SrcReg, unsigned DstReg,
		int MaxLen);

bool isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,		bool isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,
MachineInstr *MI, unsigned Dist);		MachineInstr *MI, unsigned Dist);

bool commuteInstruction(MachineInstr *MI, unsigned DstIdx,		bool commuteInstruction(MachineInstr *MI, unsigned DstIdx,
unsigned RegBIdx, unsigned RegCIdx, unsigned Dist);		unsigned RegBIdx, unsigned RegCIdx, unsigned Dist);

bool isProfitableToConv3Addr(unsigned RegA, unsigned RegB);		bool isProfitableToConv3Addr(unsigned RegA, unsigned RegB);

Show All 40 Lines	void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addUsedIfAvailable<LiveVariables>();		AU.addUsedIfAvailable<LiveVariables>();
AU.addPreserved<LiveVariables>();		AU.addPreserved<LiveVariables>();
AU.addPreserved<SlotIndexes>();		AU.addPreserved<SlotIndexes>();
AU.addPreserved<LiveIntervals>();		AU.addPreserved<LiveIntervals>();
AU.addPreservedID(MachineLoopInfoID);		AU.addPreservedID(MachineLoopInfoID);
AU.addPreservedID(MachineDominatorsID);		AU.addPreservedID(MachineDominatorsID);
		AU.addPreserved<MachineLoopInfo>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}

/// Pass entry point.		/// Pass entry point.
bool runOnMachineFunction(MachineFunction&) override;		bool runOnMachineFunction(MachineFunction&) override;
};		};
} // end anonymous namespace		} // end anonymous namespace

Show All 361 Lines
regsAreCompatible(unsigned RegA, unsigned RegB, const TargetRegisterInfo *TRI) {		regsAreCompatible(unsigned RegA, unsigned RegB, const TargetRegisterInfo *TRI) {
if (RegA == RegB)		if (RegA == RegB)
return true;		return true;
if (!RegA || !RegB)		if (!RegA || !RegB)
return false;		return false;
return TRI->regsOverlap(RegA, RegB);		return TRI->regsOverlap(RegA, RegB);
}		}

// Returns true if Reg is equal or aliased to at least one register in Set.		/// Returns true if Reg is equal or aliased to at least one register in Set.
static bool regOverlapsSet(const SmallVectorImpl<unsigned> &Set, unsigned Reg,		static bool regOverlapsSet(const SmallVectorImpl<unsigned> &Set, unsigned Reg,
const TargetRegisterInfo *TRI) {		const TargetRegisterInfo *TRI) {
for (unsigned R : Set)		for (unsigned R : Set)
if (TRI->regsOverlap(R, Reg))		if (TRI->regsOverlap(R, Reg))
return true;		return true;

return false;		return false;
}		}

		/// Find a reaching definition for UseMO if there's one in MBB
		MachineInstr* TwoAddressInstructionPass::findReachingDefInMBB(
		MachineOperand *UseMO) {
		assert(UseMO->isUse() && UseMO->isReg());
		auto Inst = UseMO->getParent();
		auto Reg = UseMO->getReg();

		// Find the last definition of Reg before Inst.
		MachineBasicBlock::reverse_iterator MI(Inst);
		MI = std::next(MI);
		for ( ; MI != MBB->rend() ; ++MI)
		if ((*MI).findRegisterDefOperandIdx(Reg) != -1)
		return &*MI;
		return nullptr;
		}

		/// Collect live-ins which are backward-reachable from UseReg. MaxLen limits
		/// the maximum number of dataflow edges.
		void TwoAddressInstructionPass::collectReachableLiveIns(
		MachineInstr *MI, unsigned UseReg, SmallDenseSet<unsigned> &LiveIns,
		unsigned MaxLen) {
		MachineOperand *UseMO = MI->findRegisterUseOperand(UseReg);
		if (!UseMO)
		return;

		SmallVector<std::pair<MachineOperand*, unsigned>, 4> WorkList;
		WorkList.push_back(std::make_pair(UseMO, 0));

		while (!WorkList.empty()) {
		MachineOperand *MO;
		unsigned Dist;
		std::tie(MO, Dist) = WorkList.back();
		WorkList.pop_back();

		MachineInstr *DefMI = findReachingDefInMBB(MO);
		if (!DefMI) {
		// There's no local reaching definition for MO, which means that MO is a
		// live-in.
		LiveIns.insert(MO->getReg());
		continue;
		}

		// Give up if Dist is greater than MaxLen.
		if (Dist > MaxLen)
		continue;

		auto OpsNum = DefMI->getDesc().getNumOperands();
		auto SrcOpIdx = DefMI->getDesc().getNumDefs();
		for ( ; SrcOpIdx < OpsNum ; ++SrcOpIdx) {
		auto &SrcMO = DefMI->getOperand(SrcOpIdx);
		if (SrcMO.isReg())
		WorkList.push_back(std::make_pair(&SrcMO, Dist+1));
		}
		}
		}

		/// Collect instructions that uses DefMO inside MBB.
		void TwoAddressInstructionPass::collectUsesInMBB(MachineOperand *DefMO,
		MachineInstrSet &Uses) {
		assert(DefMO->isDef() && DefMO->isReg());
		auto Inst = DefMO->getParent();
		auto Reg = DefMO->getReg();

		for (auto &Use : MRI->use_instructions(Reg)) {
		if (Use.isDebugValue() || Use.getParent() != MBB)
		continue;
		auto *UseMO = Use.findRegisterUseOperand(Reg);
		if (Inst == findReachingDefInMBB(UseMO))
		Uses.insert(&Use);
		}
		}

		/// Check if there is a dataflow from Reg, which is defined by From, to To
		/// inside MBB. MaxLen limits the maximum number of dataflow edges.
		bool TwoAddressInstructionPass::isReachableInMBB(
		unsigned Reg, MachineInstr *From, MachineInstr *To, unsigned MaxLen) {
		MachineOperand *MO = From->findRegisterDefOperand(Reg);
		assert(MO && "Reg is not defined by From");

		SmallVector<std::pair<MachineOperand*, unsigned>, 4> WorkList;
		WorkList.push_back(std::make_pair(MO, 0));

		while (!WorkList.empty()) {
		unsigned Dist;
		std::tie(MO, Dist) = WorkList.back();
		WorkList.pop_back();

		MachineInstrSet UseMIs;
		collectUsesInMBB(MO, UseMIs);

		for (auto *UseMI : UseMIs) {
		if (UseMI == To)
		return true;

		// Give up if Dist is greater than MaxLen.
		if (Dist > MaxLen)
		continue;

		for (auto &Def : UseMI->defs())
		if (Def.isReg())
		WorkList.push_back(std::make_pair(&Def, Dist+1));
		}
		}

		return false;
		}

		/// Return true if DstReg and SrcReg are the part of a recurrence chain, which
		/// ends with a copy instruction.
		bool TwoAddressInstructionPass::isRecurrenceChain(
		MachineInstr *MI, unsigned SrcReg, unsigned DstReg, int MaxLen) {
		// Collect backward-reachable live-ins from SrcReg.
		SmallDenseSet<unsigned> ReachableLiveIns;
		collectReachableLiveIns(MI, SrcReg, ReachableLiveIns, MaxLen);
		if (ReachableLiveIns.empty())
		return false;

		// For each reachable live-ins, check following:
		// - The live-in has a unique def UD.
		// - UD is a copy.
		// - UD is in the header of the loop that MBB belongs to.
		// - The source operand of UD comes from another copy, which is in MBB.
		// - There is a dataflow from DstReg to the source operand of UD.
		for (auto LiveIn : ReachableLiveIns) {
		// Check if there's a unique def of LiveIn, which is a copy.
		MachineInstr *UD = MRI->getUniqueVRegDef(LiveIn);
		if (!UD || !UD->isCopy())
		continue;

		// Check if the definition is in the loop header of the loop for MBB.
		MachineBasicBlock *UDBB = UD->getParent();
		MachineLoop *Loop = MLI->getLoopFor(MBB);
		if (!Loop || Loop != MLI->getLoopFor(UDBB) ||!MLI->isLoopHeader(UDBB))
		continue;

		// Check if the only definition of the source operand of Def instruction
		// inside Loop is in MBB.
		MachineInstr *UDSrcDef = nullptr;
		for (auto &Def: MRI->def_instructions(UD->getOperand(1).getReg())) {
		if (MLI->getLoopFor(Def.getParent()) != Loop)
		continue;

		// If UDSrcDef is not nullptr, there are multiple definitions for the
		// source operand of UD inside Loop.
		if (UDSrcDef) {
		UDSrcDef = nullptr;
		break;
		}
		UDSrcDef = &Def;
		}

		// Check if UDSrcDef is a copy, and is in MBB.
		if (!UDSrcDef || !UDSrcDef->isCopy() || UDSrcDef->getParent() != MBB)
		continue;

		// Check if there is a dataflow from DstReg to UDSrcDef
		if (isReachableInMBB(DstReg, MI, UDSrcDef, MaxLen))
		return true;
		}

		return false;
		}

/// Return true if it's potentially profitable to commute the two-address		/// Return true if it's potentially profitable to commute the two-address
/// instruction that's being processed.		/// instruction that's being processed.
bool		bool
TwoAddressInstructionPass::		TwoAddressInstructionPass::
isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,		isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,
MachineInstr *MI, unsigned Dist) {		MachineInstr *MI, unsigned Dist) {
if (OptLevel == CodeGenOpt::None)		if (OptLevel == CodeGenOpt::None)
return false;		return false;
[72 lines not shown; context: isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,]
// to eliminate an otherwise unavoidable copy.		// to eliminate an otherwise unavoidable copy.
// FIXME:		// FIXME:
// We can extend the logic further: If a pair of operands in an insn has		// We can extend the logic further: If a pair of operands in an insn has
// been merged, the insn could be regarded as a virtual copy, and the virtual		// been merged, the insn could be regarded as a virtual copy, and the virtual
// copy could also be used to construct a copy chain.		// copy could also be used to construct a copy chain.
// To more generally minimize register copies, ideally the logic of two addr		// To more generally minimize register copies, ideally the logic of two addr
// instruction pass should be integrated with register allocation pass where		// instruction pass should be integrated with register allocation pass where
// interference graph is available.		// interference graph is available.
if (isRevCopyChain(regC, regA, 3))		if (isRevCopyChain(regC, regA, MaxDataFlowEdge))
return true;		return true;

if (isRevCopyChain(regB, regA, 3))		if (isRevCopyChain(regB, regA, MaxDataFlowEdge))
return false;		return false;

		// Check if there is a recurrence that a redundant copy can be avoided by
		// commuting operands. For example, if we have a recurrence of
		// Header:
		// %vreg1 = MOV %vreg0
		// Latch:
		// %vreg3 = ADD %vreg2, %vreg1
		// %vreg0 = MOV %vreg3
		// switching %vreg2 and %vreg1 avoids a redundant copy at the end by feeding
		// the accumulation result directly back to the header.
		if (MLI) {
		if (isRecurrenceChain(MI, regC, regA, MaxDataFlowEdge))
		return true;

		if (isRecurrenceChain(MI, regB, regA, MaxDataFlowEdge))
		return false;
		}

// Since there are no intervening uses for both registers, then commute		// Since there are no intervening uses for both registers, then commute
// if the def of regC is closer. Its live interval is shorter.		// if the def of regC is closer. Its live interval is shorter.
return LastDefB && LastDefC && LastDefC > LastDefB;		return LastDefB && LastDefC && LastDefC > LastDefB;
}		}
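
The order of the recurrence checks above (first regC, then regB) can be illustrated with a simplified model. This is only a sketch under stated assumptions: `CopyMap`, `reachesThroughCopies`, `Choice`, and `chooseTiedOperand` are hypothetical names, each register is assumed to be copied to at most one other register, and the loop and live-in analysis that the real `isRecurrenceChain` performs is left out entirely.

```cpp
#include <cassert>
#include <map>

// Hypothetical model: CopyTo[src] = dst for each `dst = COPY src`.
using CopyMap = std::map<unsigned, unsigned>;

// Return true if the value in `From` reaches `To` through at most `MaxLen`
// copies. The bound also keeps the walk finite when the copies form a cycle.
bool reachesThroughCopies(const CopyMap &CopyTo, unsigned From, unsigned To,
                          unsigned MaxLen) {
  unsigned Reg = From;
  for (unsigned Len = 0; Len < MaxLen; ++Len) {
    auto It = CopyTo.find(Reg);
    if (It == CopyTo.end())
      return false;  // the chain ends before reaching `To`
    Reg = It->second;
    if (Reg == To)
      return true;
  }
  return false;
}

enum class Choice { Commute, Keep, NoPreference };

// RegA is the result of `RegA = OP RegB, RegC`, with RegB currently tied.
// Prefer tying whichever source operand the result flows back into.
Choice chooseTiedOperand(const CopyMap &CopyTo, unsigned RegA, unsigned RegB,
                         unsigned RegC, unsigned MaxLen) {
  if (reachesThroughCopies(CopyTo, RegA, RegC, MaxLen))
    return Choice::Commute;  // the result feeds RegC: swap the operands
  if (reachesThroughCopies(CopyTo, RegA, RegB, MaxLen))
    return Choice::Keep;     // the result already feeds the tied operand
  return Choice::NoPreference;
}
```

For the recurrence in the comment (`%vreg1 = MOV %vreg0`, `%vreg3 = ADD %vreg2, %vreg1`, `%vreg0 = MOV %vreg3`), the copy map is `{3 -> 0, 0 -> 1}`, and `chooseTiedOperand(Copies, 3, 2, 1, 3)` selects `Choice::Commute`, because the result %vreg3 flows back into %vreg1 within two copies.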

/// Commute a two-address instruction and update the basic block, distance map,		/// Commute a two-address instruction and update the basic block, distance map,
/// and live variables if needed. Return true if it is successful.		/// and live variables if needed. Return true if it is successful.
bool TwoAddressInstructionPass::commuteInstruction(MachineInstr *MI,		bool TwoAddressInstructionPass::commuteInstruction(MachineInstr *MI,
[968 lines not shown; context: bool TwoAddressInstructionPass::runOnMachineFunction(MachineFunction &Func) {]
MF = &Func;		MF = &Func;
const TargetMachine &TM = MF->getTarget();		const TargetMachine &TM = MF->getTarget();
MRI = &MF->getRegInfo();		MRI = &MF->getRegInfo();
TII = MF->getSubtarget().getInstrInfo();		TII = MF->getSubtarget().getInstrInfo();
TRI = MF->getSubtarget().getRegisterInfo();		TRI = MF->getSubtarget().getRegisterInfo();
InstrItins = MF->getSubtarget().getInstrItineraryData();		InstrItins = MF->getSubtarget().getInstrItineraryData();
LV = getAnalysisIfAvailable<LiveVariables>();		LV = getAnalysisIfAvailable<LiveVariables>();
LIS = getAnalysisIfAvailable<LiveIntervals>();		LIS = getAnalysisIfAvailable<LiveIntervals>();
		MLI = getAnalysisIfAvailable<MachineLoopInfo>();
AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
OptLevel = TM.getOptLevel();		OptLevel = TM.getOptLevel();

bool MadeChange = false;		bool MadeChange = false;

DEBUG(dbgs() << "******** REWRITING TWO-ADDR INSTRS ********\n");		DEBUG(dbgs() << "******** REWRITING TWO-ADDR INSTRS ********\n");
DEBUG(dbgs() << "********** Function: "		DEBUG(dbgs() << "********** Function: "
<< MF->getName() << '\n');		<< MF->getName() << '\n');
[185 lines not shown]

test/CodeGen/X86/twoaddr-recurrence.ll

This file was added.

				; RUN: llc < %s -march=x86-64 | FileCheck %s
				;
				MatzeB: Did you try to create a .mir test so the pass can be tested in isolation? If yes and it didn't work, could you tell us why so we can improve the .mir testing.
				; CHECK: bb7
				; CHECK: cmpl{{ +}}$10
				; CHECK-NOT: movl
				; CHECK: jl
				;
				; Check that there's no redundant copy at the end of bb7 to feed %vreg3 to bb1.

				define i32 @foo(i32 %a) {
				bb0:
				br label %bb1

				bb1: ; preds = %bb0, %bb7
				%vreg0 = phi i32 [ 0 , %bb0 ], [ %vreg3, %bb7 ]
				%cond0 = icmp eq i32 %a, 0
				br i1 %cond0, label %bb2, label %bb3

				bb2: ; preds = %bb1
				br label %bb4

				bb3: ; preds = %bb1
				br label %bb4

				bb4: ; preds = %bb2, %bb3
				%vreg5 = phi i32 [ 1, %bb2 ], [ 2, %bb3 ]
				%cond1 = icmp eq i32 %vreg5, 0
				br i1 %cond1, label %bb5, label %bb6

				bb5: ; preds = %bb4
				br label %bb7

				bb6: ; preds = %bb4
				br label %bb7

				bb7: ; preds = %bb5, %bb6
				%vreg1 = phi i32 [ 1, %bb5 ], [ 2, %bb6 ]
				%vreg2 = add i32 %vreg5, %vreg0
				%vreg3 = add i32 %vreg1, %vreg2
				%cond2 = icmp slt i32 %vreg3, 10
				br i1 %cond2, label %bb1, label %bb8

				bb8: ; preds = %bb7
				ret i32 0
				}