This is an archive of the discontinued LLVM Phabricator instance.

CodeGen peephole: fold redundant phys reg copies
ClosedPublic

Authored by jfb on Dec 2 2015, 9:20 AM.

Details

Summary

Code generation often exposes redundant physical register copies through
virtual registers such as:

%vreg = COPY %PHYSREG
...
%PHYSREG = COPY %vreg

There are cases where no intervening clobber of %PHYSREG occurs, and the
later copy could therefore be removed. In some cases this further allows
us to remove the initial copy.
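
For illustration, here is a minimal sketch of the fold. This is not the
patch's actual code: the function name and the NAPhysToVirtMIs map are
illustrative, current LLVM APIs are assumed for brevity, and subregister
indices are ignored.

#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Forward scan over one basic block. NAPhysToVirtMIs maps a physical
// register to the COPY that last moved it into a virtual register; if we
// later see the inverse COPY and nothing clobbered the physical register
// in between, the second COPY is redundant.
static void foldRedundantPhysCopies(MachineBasicBlock &MBB) {
  DenseMap<Register, MachineInstr *> NAPhysToVirtMIs;
  for (MachineInstr &MI : make_early_inc_range(MBB)) {
    if (MI.isCopy()) {
      Register Dst = MI.getOperand(0).getReg();
      Register Src = MI.getOperand(1).getReg();
      if (Src.isPhysical() && Dst.isVirtual()) {
        NAPhysToVirtMIs[Src] = &MI; // remember %vreg = COPY %PHYSREG
        continue;
      }
      if (Dst.isPhysical() && Src.isVirtual()) {
        auto It = NAPhysToVirtMIs.find(Dst);
        if (It != NAPhysToVirtMIs.end() &&
            It->second->getOperand(0).getReg() == Src) {
          // %PHYSREG = COPY %vreg with no intervening clobber: erase it.
          MI.eraseFromParent();
          continue;
        }
      }
    }
    // Any other def of a tracked physical register invalidates its entry
    // (calls' regmask operands need the same treatment; see below).
    for (const MachineOperand &MO : MI.operands())
      if (MO.isReg() && MO.isDef() && MO.getReg().isPhysical())
        NAPhysToVirtMIs.erase(MO.getReg());
  }
}

The real pass additionally has to give up on inline asm, instructions with
unmodeled side effects, and regmask operands; the invalidation loop above is
only the conservative core.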

This patch contains a motivating example which comes from the x86 build
of Chrome, specifically cc::ResourceProvider::UnlockForRead uses
libstdc++'s implementation of hash_map. That example has two flag tests live
at the same time, and after machine sinking LLVM has confused itself enough
that it thinks spilling EFLAGS is a great idea, even though the flags are
never restored and both comparison results are live.

Before this patch we have:

DEC32m %RIP, 1, %noreg, <ga:@L>, %noreg, %EFLAGS<imp-def>
%vreg1<def> = COPY %EFLAGS; GR64:%vreg1
%EFLAGS<def> = COPY %vreg1; GR64:%vreg1
JNE_1 <BB#1>, %EFLAGS<imp-use>

Both copies are useless. This patch tries to eliminate the later copy in
a generic manner.

dec is especially confusing to LLVM when compared with sub.

I wrote this patch to treat all physical registers generically, but it only
removes redundant copies of non-allocatable physical registers, because the
allocatable ones caused issues (e.g. when calling conventions weren't
properly modeled) and should be handled later by the register allocator
anyway.
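
For concreteness, the restriction can be written as a small predicate; a
sketch under the same assumptions as above (the helper name is illustrative,
not necessarily the patch's own):

#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

// Only copies of non-allocatable physical registers (EFLAGS being the
// motivating case) are fold candidates; allocatable registers are left to
// the register allocator, which is better placed to coalesce them.
static bool isNAPhysCopy(Register Reg, const MachineRegisterInfo &MRI) {
  return Reg.isPhysical() && !MRI.isAllocatable(Reg.asMCReg());
}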

The patch also behaves correctly when calls intervene between flag generation
and flag use (such calls clobber the flags):

test/CodeGen/X86/cmpxchg-clobber-flags.ll

The following tests used to fail when the patch also replaced allocatable
registers:

CodeGen/X86/StackColoring.ll
CodeGen/X86/avx512-calling-conv.ll
CodeGen/X86/copy-propagation.ll
CodeGen/X86/inline-asm-fpstack.ll
CodeGen/X86/musttail-varargs.ll
CodeGen/X86/pop-stack-cleanup.ll
CodeGen/X86/preserve_mostcc64.ll
CodeGen/X86/tailcallstack64.ll
CodeGen/X86/this-return-64.ll

Note that all other backends' tests pass.

Diff Detail

Repository
rL LLVM

Event Timeline

jfb updated this revision to Diff 41642.Dec 2 2015, 9:20 AM
jfb retitled this revision from to CodeGen peephole: fold redundant phys reg copies.
jfb updated this object.
jfb added a subscriber: llvm-commits.
jfb added a comment.Dec 2 2015, 9:27 AM

There's further complication around calling conventions, where it seems that the x86 backend doesn't fully model xmm registers as being caller-saved. This example:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@I32 = external global i32

declare i32* @foo(i32*, i64)

declare i32* @bar(i32, i32)

declare double @D()

declare i32* @P32()

declare i32* @baz(double)

define hidden i32* @ExpressionFromLiteral(i32 %token) {
entry:
  switch i32 %token, label %return [
    i32 83, label %bb0
    i32 82, label %bb1
  ]

bb0:
  %I32.loaded = load i32, i32* @I32
  %call.foo = tail call i32* @foo(i32* nonnull @I32, i64 48)
  %call.bar = tail call i32* @bar(i32 %I32.loaded, i32 %I32.loaded)
  %call.foo.gep.40 = getelementptr inbounds i32, i32* %call.foo, i64 40
  %ptr.foo = bitcast i32* %call.foo.gep.40 to i32**
  store i32* %call.bar, i32** %ptr.foo
  br label %return

bb1:
  %call.D = tail call double @D()
  %call.P32 = tail call i32* @P32()
  %call.baz = tail call i32* @baz(double %call.D)
  %call.P32.gep.40 = getelementptr inbounds i32, i32* %call.P32, i64 40
  %ptr.P32 = bitcast i32* %call.P32.gep.40 to i32**
  store i32* %call.baz, i32** %ptr.P32
  br label %return

return:
  %retval.0 = phi i32* [ %call.P32, %bb1 ], [ %call.foo, %bb0 ], [ null, %entry ]
  ret i32* %retval.0
}

It confuses LLVM around the @D call: the double returned in xmm0 is live across the @P32 call and is then passed to @baz. The code starts off as:

BB#2: derived from LLVM BB %bb1
    Predecessors according to CFG: BB#0
	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
	CALL64pcrel32 <ga:@D>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
	%vreg8<def> = COPY %XMM0; FR64:%vreg8
	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
	CALL64pcrel32 <ga:@P32>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %RSP<imp-def>, %RAX<imp-def>
	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
	%vreg9<def> = COPY %RAX; GR64:%vreg9
	%vreg1<def> = COPY %vreg9; GR64:%vreg1,%vreg9
	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
	%XMM0<def> = COPY %vreg8; FR64:%vreg8
	CALL64pcrel32 <ga:@baz>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %XMM0<imp-use>, %RSP<imp-def>, %RAX<imp-def>
	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
	%vreg10<def> = COPY %RAX; GR64:%vreg10
	MOV64mr %vreg9, 1, %noreg, 160, %noreg, %vreg10; mem:ST8[%ptr.P32] GR64:%vreg9,%vreg10
    Successors according to CFG: BB#3(?%)

Note how xmm0 is implicitly defined, then (given the information available) needlessly copied to vreg8 and back, and then implicitly used. My patch deletes %XMM0<def> = COPY %vreg8, which in turn allows %vreg8<def> = COPY %XMM0 to be deleted.
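
The cascade can be made concrete: once the second copy is erased, the first
one is trivially dead as soon as its virtual register has no remaining uses.
A hedged fragment, with FirstCopy and Vreg as hypothetical names for the
first COPY and its defined virtual register (in practice later dead-code
cleanup can also catch this):

// If erasing the second copy left %vreg8 without uses, the first copy
// (%vreg8<def> = COPY %XMM0) is dead as well and can be erased too.
if (MRI.use_nodbg_empty(Vreg))
  FirstCopy->eraseFromParent();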

This could be fixed by either:

  • Being smarter about calling conventions in my code.
  • Marking all calls as having unmodeled side effects (which would lose some potential optimizations).
  • Marking xmm registers as clobbered by calls such as @P32 above.

That problem doesn't really need to be tackled until I figure out how to fix the other one, though. It's just extra :-)

qcolombet requested changes to this revision.Dec 2 2015, 9:40 AM
qcolombet added a reviewer: qcolombet.
qcolombet added a subscriber: qcolombet.

Hi,

We purposely don't remove such copies because it over-constrains the coloring later on.
The idea is that if the coalescing is possible, then the register allocator should prefer it. However, I reckon that in your motivating example that's not going to happen, since this is a cross-register-bank copy.

Anyway, the peephole optimizer happens too early IMO for this to be a reasonable solution. I would instead encourage you to look at fixing this in the CopyPropagation pass.

Cheers,
-Quentin

This revision now requires changes to proceed.Dec 2 2015, 9:40 AM

Note: The transformation would make sense in the peephole optimizer if we do it only for non-allocatable registers.

jfb added a comment.Dec 2 2015, 1:11 PM

Note: The transformation would make sense in the peephole optimizer if we do it only for non-allocatable registers.

Interesting proposal, that does seem to make sense. Let me look into it, thanks!

jfb updated this revision to Diff 41664.Dec 2 2015, 1:28 PM
jfb edited edge metadata.

NFC: rename PhysCopy to ConstPhysCopy

First step in applying qcolombet's suggestion of only applying this optimization
to constant physical registers (he suggested restricting it to non-allocatable
registers only, but that would include pinned ones too, whereas constant
physical registers seem more conservative).

jfb updated this revision to Diff 41672.Dec 2 2015, 1:50 PM
jfb edited edge metadata.

Restrict to non-allocatable physical registers.

I tried using MachineRegisterInfo::isConstantPhysReg instead but that's too
conservative and doesn't include the main motivator for this patch: EFLAGS.

This implements the change qcolombet suggested and passes all x86 tests,
including the new test I added, incdec-and-branch.ll.

jfb added a comment.Dec 2 2015, 1:52 PM

This now passes all tests with ninja check on a build with all backends, and seems to do what I want! I'll clean up the code a bit and test it on Chrome some more (including the issue I posted above).

jfb updated this revision to Diff 41680.Dec 2 2015, 2:32 PM

Clean up code.

jfb added a comment.Dec 2 2015, 2:34 PM

I cleaned up the code; it passes all the tests and achieves what I wanted for the one Chrome use case. @qcolombet, would you mind taking another look?

jfb updated this object.Dec 2 2015, 2:37 PM

Hi,

Looks almost good.
Two main remarks:

  1. Check for regmask in the operands as well
  2. Try to add more test cases for coverage

Thanks,
-Quentin

lib/CodeGen/PeepholeOptimizer.cpp
1427 ↗(On Diff #41680)

Put a message in the assert, even an obvious one :).

1537 ↗(On Diff #41680)

I'd make a function out of it, and you have to check the regmask operand as well.
For instance, if we have a pure (side-effect-free) function call, it may clobber the register and we will miss it.
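
Concretely, the fix is to also treat a call's regmask operand as a clobber of
the tracked registers. A sketch of a fragment that would slot into the
invalidation step of the earlier sketch, with MI and NAPhysToVirtMIs as
before (MachineOperand::clobbersPhysReg is the relevant query):

// A call's <regmask> operand clobbers every register not preserved by the
// callee, even when the call has no other modeled side effects. Drop every
// tracked copy whose physical register the mask clobbers.
SmallVector<Register, 4> Clobbered;
for (const MachineOperand &MO : MI.operands())
  if (MO.isRegMask())
    for (const auto &Entry : NAPhysToVirtMIs)
      if (MO.clobbersPhysReg(Entry.first.asMCReg()))
        Clobbered.push_back(Entry.first);
for (Register Reg : Clobbered)
  NAPhysToVirtMIs.erase(Reg);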

test/CodeGen/X86/incdec-and-branch.ll
16 ↗(On Diff #41680)

Could we have some “positive” testing as well, i.e. CHECK lines not just CHECK-NOT?

Also, I would rename the file to peephole-na-copy-folding or something and add more test cases where you check the inline asm case, the definition in the middle, etc.
Anyhow, you get it, something that improves the coverage.
I understand it can be difficult to come up with additional test cases; just give it a try, and if you can't find any, then you can't :).

jfb updated this revision to Diff 41812.Dec 3 2015, 3:27 PM
jfb marked an inline comment as done.

Address qcolombet's comments:

  • Rename test file.
  • Add more tests.
  • Make the tests positive.
  • Document asserts.
  • Also test for reg mask clobbering.
jfb marked an inline comment as done.Dec 3 2015, 3:28 PM

Comments addressed.

lib/CodeGen/PeepholeOptimizer.cpp
1537 ↗(On Diff #41680)

I added regmask.

I'm not sure I understand what you mean by "pure". Do you mean readonly? That still has to observe the calling convention and declare its clobbers, no?

qcolombet accepted this revision.Dec 3 2015, 3:34 PM
qcolombet edited edge metadata.

LGTM.

Thanks,
Q.

lib/CodeGen/PeepholeOptimizer.cpp
1537 ↗(On Diff #41812)

Yes, I mean readonly :).
Yes, it has to declare its clobbers.

What I meant was that if we do not check for the regmask operand, the check "MI->hasUnmodeledSideEffects()" was not enough to ensure we won't clobber the registers.

test/CodeGen/X86/peephole-na-phys-copy-folding.ll
6 ↗(On Diff #41812)

s/whne/when

This revision is now accepted and ready to land.Dec 3 2015, 3:34 PM
jfb updated this revision to Diff 41815.Dec 3 2015, 3:42 PM
jfb edited edge metadata.
  • Typo.
This revision was automatically updated to reflect the committed changes.