This is an archive of the discontinued LLVM Phabricator instance.

Differential D26521

[X86] Allow folding of reloads from stack slots when loading a subreg of the spilled reg
ClosedPublic

Authored by mkuper on Nov 10 2016, 1:18 PM.

Details

Reviewers

RKSimon
zvi
wmi
MatzeB
craig.topper

Commits

rG47eb85a0033f: [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
rL287792: [X86] Allow folding of stack reloads when loading a subreg of the spilled reg

Summary

Currently we don't support subregs in InlineSpiller::foldMemoryOperand() because targets may not deal with them correctly.

It seems that X86, at least, already does the right thing when folding a load from a stack slot if the use only needs a low subreg (i.e. not a high subreg such as "ah").

This fixes PR30832.

Diff Detail

Repository: rL LLVM

Event Timeline

mkuper updated this revision to Diff 77542. Nov 10 2016, 1:18 PM

mkuper retitled this revision to [X86] Allow folding of reloads from stack slots when loading a subreg of the spilled reg.

mkuper updated this object.

mkuper added reviewers: RKSimon, craig.topper, zvi, MatzeB, wmi.

mkuper added a subscriber: llvm-commits.

Herald added a subscriber: qcolombet. Nov 10 2016, 1:18 PM

Add accidentally missing negative test.

MatzeB added inline comments. Nov 10 2016, 2:48 PM

include/llvm/Target/TargetInstrInfo.h
820–828 ↗	(On Diff #77545)	Wouldn't it be better to fix the existing foldMemoryOperand() implementations in the targets instead of adding more API? A "fix" in the sense of actually checking the SubReg and returning nullptr if it is set would be enough.

mkuper added inline comments. Nov 10 2016, 3:10 PM

include/llvm/Target/TargetInstrInfo.h
820–828 ↗	(On Diff #77545)	I'm somewhat wary of changing the default behavior to be more permissive. It would be very easy to "fix" all the in-tree targets by sinking the SubReg check into their foldMemoryOperandImpl(), but it has the potential of introducing really nasty bugs for downstream out-of-tree targets.

craig.topper added inline comments. Nov 10 2016, 11:13 PM

lib/Target/X86/X86InstrInfo.h
387 ↗	(On Diff #77545)	Fix the formatting here to line up the arguments.

zvi added inline comments. Nov 10 2016, 11:42 PM

lib/Target/X86/X86InstrInfo.h
383 ↗	(On Diff #77545)	Can you please clarify the last sentence in the comment?

mkuper added inline comments. Nov 11 2016, 12:35 AM

lib/Target/X86/X86InstrInfo.h
383 ↗	(On Diff #77545)	Basically, this is the same as the LoadMI parameter to InlineSpiller::foldMemoryOperand(). There it's described as "Load instruction to use instead of stack slot when non-null." but I'm not sure that's any clearer.
There are 3 cases InlineSpiller::foldMemoryOperand() supports:
1. Fold a store to a stack slot into a def.
2. Fold a load from a stack slot into a use.
3. Fold an existing load instruction into a use.
Cases (2) and (3) are distinguished by actually passing the load instruction we want to fold through the API. I can try to write that down more concisely (but hopefully in a clearer way than it is right now).
Note that this doesn't actually use LoadMI, just verifies it's non-null, but, conceivably, it could be useful. Do you think it would be better if I just pass a bool here for now, and change the API if we eventually need the instruction?
387 ↗	(On Diff #77545)	Whoops, will do, thanks.
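
For readers skimming the thread, the three cases mkuper lists can be sketched as a small classifier. This is only an illustrative model: FoldKind and classifyFold are hypothetical names that do not exist in LLVM, and hasLoadMI stands in for "LoadMI is non-null". The NotFoldable branch reflects the "We cannot fold a load instruction into a def" check visible in the InlineSpiller.cpp hunk on this page.

```cpp
#include <cassert>

// Hypothetical model of the three folding cases described in the comment.
enum class FoldKind {
  StoreIntoDef,        // (1) fold a store to a stack slot into a def
  StackLoadIntoUse,    // (2) fold a load from a stack slot into a use
  ExistingLoadIntoUse, // (3) fold an existing load instruction into a use
  NotFoldable          // a load instruction cannot be folded into a def
};

// operandIsDef: the operand being folded is a definition.
// hasLoadMI: a load instruction was passed through the API (non-null LoadMI).
inline FoldKind classifyFold(bool operandIsDef, bool hasLoadMI) {
  if (hasLoadMI)
    return operandIsDef ? FoldKind::NotFoldable
                        : FoldKind::ExistingLoadIntoUse;
  return operandIsDef ? FoldKind::StoreIntoDef : FoldKind::StackLoadIntoUse;
}
```

Cases (2) and (3) differ only in the hasLoadMI flag, which is the distinction the comment describes.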

Nice! I have an outstanding poor codegen issue with the inability to split+fold scalarization cases, do you think we would be able to expand this patch in the future to handle those cases?

__m128i popcnt1(__m128i *in) {
  return (__m128i) { __builtin_popcountll(in[0][0]), __builtin_popcountll(in[0][1]) };
}

popcnt1(long long __vector(2)*):
        vmovdqu (%rdi), %xmm0
        vmovq   %xmm0, %rax
        vpextrq $1, %xmm0, %rcx
        popcntq %rax, %rax
        popcntq %rcx, %rcx
        vmovq   %rcx, %xmm0
        vmovq   %rax, %xmm1
        vpunpcklqdq     %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm0[0]
        retq

Which would be better as:

popcnt1(long long __vector(2)*):
        popcntq (%rdi), %rax
        popcntq 8(%rdi), %rcx
        vmovq   %rcx, %xmm0
        vmovq   %rax, %xmm1
        vpunpcklqdq     %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm0[0]
        retq

test/CodeGen/X86/partial-fold.ll
1 ↗	(On Diff #77545)	Add i686 tests as well and regenerate with utils\update_llc_test_checks.py

andreadb added a subscriber: andreadb. Nov 11 2016, 4:06 AM

In D26521#592717, @RKSimon wrote:

Nice! I have an outstanding poor codegen issue with the inability to split+fold scalarization cases, do you think we would be able to expand this patch in the future to handle those cases?

I don't know the spill/reload code well enough to even tell if it's happening for similar reasons or not.
But CC me on the PR, I'll take a look.

test/CodeGen/X86/partial-fold.ll
1 ↗	(On Diff #77545)	Sure, I'll add i686. I'd prefer not to use update_llc_test_checks here - there's a lot of push/pop noise because of forcing all the regs to be used.

In D26521#592842, @mkuper wrote:

In D26521#592717, @RKSimon wrote:

Nice! I have an outstanding poor codegen issue with the inability to split+fold scalarization cases, do you think we would be able to expand this patch in the future to handle those cases?

I don't know the spill/reload code well enough to even tell if it's happening for similar reasons or not.
But CC me on the PR, I'll take a look.

PR30986

test/CodeGen/X86/partial-fold.ll
1 ↗	(On Diff #77545)	That's fine - it might mean that you have to split this into 32/64 versions of the file as the existing inline asm will fail on i686

wmi added inline comments. Nov 11 2016, 9:47 AM

lib/CodeGen/TargetInstrInfo.cpp
547 ↗	(On Diff #77545)	In which case we will have (SubRegSize % 8) != 0?
lib/Target/X86/X86InstrInfo.cpp
6281 ↗	(On Diff #77545)	Why only folding from stack-slot is supported but folding from load is not?

mkuper added inline comments. Nov 11 2016, 9:51 AM

lib/CodeGen/TargetInstrInfo.cpp
547 ↗	(On Diff #77545)	It will never happen for x86, but it's possible in theory, and this is target-independent code.
lib/Target/X86/X86InstrInfo.cpp
6281 ↗	(On Diff #77545)	Because it's the simple case. :-) I need to make sure the other case makes sense (and find a test-case), and assuming it does, I'll send a follow-up patch. I'll add a TODO.
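
The size question discussed above can be made concrete with a standalone sketch modelled on the MemSize computation in the TargetInstrInfo.cpp hunk of this revision. The function foldedMemSize and its parameters are hypothetical stand-ins (plain integers instead of MachineFrameInfo and TargetRegisterInfo queries), so treat it as an approximation of the logic rather than the LLVM code itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// slotSizeBytes: size of the spill slot.
// isStore: true when a def is being folded (a store to the slot).
// subRegBits: one entry per folded operand; the subregister width in bits,
//             or 0 if the operand has no subregister index. Must be
//             non-empty when isStore is false.
inline int64_t foldedMemSize(int64_t slotSizeBytes, bool isStore,
                             const std::vector<unsigned> &subRegBits) {
  int64_t memSize = 0;
  if (isStore) {
    // Stores always cover the whole slot.
    memSize = slotSizeBytes;
  } else {
    for (unsigned bits : subRegBits) {
      int64_t opSize = slotSizeBytes;
      // Only narrow the access when the subreg is a whole number of bytes;
      // otherwise fall back to the full slot size.
      if (bits > 0 && bits % 8 == 0)
        opSize = bits / 8;
      memSize = std::max(memSize, opSize);
    }
  }
  assert(memSize && "Did not expect a zero-sized stack slot");
  return memSize;
}
```

With a 4-byte slot and an 8-bit subreg use this yields 1, which matches the "1-byte Folded Reload" annotation in the partial-fold32.ll test; a hypothetical 12-bit subreg would keep the full slot size.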

(Hopefully) addressed comments.

Thanks for improving the documentation, Michael. LGTM.

Is it worth adding the partial-fold test files to trunk with current codegen?

lib/CodeGen/TargetInstrInfo.cpp
553 ↗	(On Diff #77632)	Do we need an assert(MemSize && "Empty stack slot") here?

In D26521#594426, @RKSimon wrote:

Is it worth adding the partial-fold test files to trunk with current codegen?

I think that's rather less useful when not using update_llc_test_checks, since you don't see the full diff anyway. And, as I wrote above, I think update_llc_test_checks really does more harm than good in this case.
But I can add it anyway with just the relevant checks if you want the diff to be more explicit. What do you think?

lib/CodeGen/TargetInstrInfo.cpp
553 ↗	(On Diff #77632)	Sure, I'll add one.

qcolombet added inline comments. Nov 14 2016, 1:10 PM

include/llvm/Target/TargetInstrInfo.h
821 ↗	(On Diff #77632)	I am not sure I understand the intent. I believe this is just a problem of wording. Basically, are you referring to this case:
v1 = ld
op v1.sub
or this case:
v1.sub = ld
op v1.sub
I believe this is the former and adding the example would help :).

mkuper added inline comments. Nov 14 2016, 1:32 PM

include/llvm/Target/TargetInstrInfo.h
821 ↗	(On Diff #77632)	Yes, it's the former, I'll add an example, thanks!

Updated per comments, and rebased to show test diffs.

Also moved the code that computes MemSize to be before calling into the target's foldMemoryOperandImpl() because, apparently, the X86 implementation feels free to change MI, so accessing MI.getOperand(Idx) after the Impl call may produce unexpected results.
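
The ordering hazard mentioned here (reading MI's operands after the target hook may have rewritten MI) can be shown with a toy model. Everything below is hypothetical: ToyInstr, ToyFoldOperand, toyFoldImpl and subRegBitsBeforeFold are illustrative names, and toyFoldImpl merely simulates a hook that mutates the instruction in place.

```cpp
#include <cassert>
#include <vector>

struct ToyFoldOperand {
  unsigned subRegBits; // 0 means "no subregister"
};

struct ToyInstr {
  std::vector<ToyFoldOperand> operands;
};

// Stand-in for a target hook that is free to rewrite the instruction.
inline void toyFoldImpl(ToyInstr &mi) { mi.operands.clear(); }

// Reads the operand's subreg width first, then calls the mutating hook.
// Swapping the two statements would index into an emptied operand list.
inline unsigned subRegBitsBeforeFold(ToyInstr &mi, unsigned idx) {
  assert(idx < mi.operands.size() && "operand index out of range");
  unsigned bits = mi.operands[idx].subRegBits; // read before the hook runs
  toyFoldImpl(mi);                             // may invalidate operands
  return bits;
}
```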

ping

gberry added a subscriber: gberry. Nov 22 2016, 11:03 AM

gberry added a child revision: D27002: [AArch64] Handle more zero reg cases in foldMemoryOperandImpl. Nov 22 2016, 2:28 PM

What about changing isSubregFoldable() to not take any parameters and just return true/false, so all the decision logic can stay in foldMemoryOperand()? At least that keeps the newly added API as simple as possible.
It also seems to match the usage pattern in https://reviews.llvm.org/D27002.

lib/CodeGen/TargetInstrInfo.cpp
574 ↗	(On Diff #77632)	(can't remove this comment because of phabricator bug http://llvm.org/PR30572)

In D26521#603360, @MatzeB wrote:

What about changing isSubregFoldable() to not take any parameters and just return true/false, so all the decision logic can stay in foldMemoryOperand()?

Do you mean in the target's foldMemoryOperandImpl()? (The decision needs to be target-dependent)
If you do, I think that can work. Not sure it's going to be a net improvement, but I'll post a patch.

mkuper mentioned this in D27002: [AArch64] Handle more zero reg cases in foldMemoryOperandImpl. Nov 22 2016, 4:15 PM

In D26521#603450, @mkuper wrote:

In D26521#603360, @MatzeB wrote:

What about changing isSubregFoldable() to not take any parameters and just return true/false, so all the decision logic can stay in foldMemoryOperand()?

Do you mean in the target's foldMemoryOperandImpl()? (The decision needs to be target-dependent)
If you do, I think that can work. Not sure it's going to be a net improvement, but I'll post a patch.

Yep, I would expect that to make the code (slightly) easier to follow as the logic is not split across two callbacks anymore.

Updated with suggested API.
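
A minimal sketch of the resulting API shape, assuming a parameterless hook that defaults to false and is overridden by X86, as in the final diff on this page. The Toy* types and mayAttemptFold are hypothetical and stand in for TargetInstrInfo, X86InstrInfo and the early-out in InlineSpiller::foldMemoryOperand().

```cpp
#include <cassert>

enum class ToyOpcode { Add, Stackmap, Patchpoint, Statepoint };

struct ToyInstrInfo {
  virtual ~ToyInstrInfo() = default;
  // Conservative default so targets that never opted in keep the old behavior.
  virtual bool isSubregFoldable() const { return false; }
};

struct ToyX86InstrInfo : ToyInstrInfo {
  bool isSubregFoldable() const override { return true; }
};

// Subreg operands are allowed if the target opts in, or for the
// stackmap/patchpoint/statepoint pseudos.
inline bool spillSubRegs(const ToyInstrInfo &tii, ToyOpcode opc) {
  return tii.isSubregFoldable() || opc == ToyOpcode::Statepoint ||
         opc == ToyOpcode::Patchpoint || opc == ToyOpcode::Stackmap;
}

// Models the spiller's early-out: give up on an operand with a subreg
// index unless subreg folding is permitted.
inline bool mayAttemptFold(const ToyInstrInfo &tii, ToyOpcode opc,
                           bool operandHasSubReg) {
  return spillSubRegs(tii, opc) || !operandHasSubReg;
}
```

Keeping the hook parameterless leaves the per-operand decision (for example, rejecting high subregs) inside the target's own folding routine.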

Herald added a subscriber: sanjoy. Nov 22 2016, 4:50 PM

Thanks.

I wanted to propose to only perform the check in the `foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI, unsigned OpNum, ArrayRef<MachineOperand> MOs, MachineBasicBlock::iterator InsertPt, unsigned Size, unsigned Align, bool AllowCommute)` variant, as the other two finally call into that anyway. However I see a MI.setDesc(get(NewOpc)); and I currently fail to reason about whether that common function may actually still abort after we changed the instruction type (soo much code I'm not that familiar with...).

So this LGTM as is.

This revision is now accepted and ready to land. Nov 22 2016, 5:46 PM

Closed by commit rL287792: [X86] Allow folding of stack reloads when loading a subreg of the spilled reg (authored by mkuper). Nov 23 2016, 10:43 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                      Size
llvm/trunk/include/llvm/Target/TargetInstrInfo.h          14 lines
llvm/trunk/lib/CodeGen/InlineSpiller.cpp                  11 lines
llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp                29 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.h                  4 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp                16 lines
llvm/trunk/test/CodeGen/X86/partial-fold32.ll             3 lines
llvm/trunk/test/CodeGen/X86/partial-fold64.ll             6 lines
llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll    20 lines

Diff 79126

llvm/trunk/include/llvm/Target/TargetInstrInfo.h

[811 lines not shown] public:
   /// This function is called for all pseudo instructions
   /// that remain after register allocation. Many pseudo instructions are
   /// created to help register allocation. This is the place to convert them
   /// into real instructions. The target can edit MI in place, or it can insert
   /// new instructions and erase MI. The function should return true if
   /// anything was changed.
   virtual bool expandPostRAPseudo(MachineInstr &MI) const { return false; }

+  /// Check whether the target can fold a load that feeds a subreg operand
+  /// (or a subreg operand that feeds a store).
+  /// For example, X86 may want to return true if it can fold
+  /// movl (%esp), %eax
+  /// subb, %al, ...
+  /// Into:
+  /// subb (%esp), ...
+  ///
+  /// Ideally, we'd like the target implementation of foldMemoryOperand() to
+  /// reject subregs - but since this behavior used to be enforced in the
+  /// target-independent code, moving this responsibility to the targets
+  /// has the potential of causing nasty silent breakage in out-of-tree targets.
+  virtual bool isSubregFoldable() const { return false; }
+
   /// Attempt to fold a load or store of the specified stack
   /// slot into the specified machine instruction for the specified operand(s).
   /// If this is possible, a new instruction is returned with the specified
   /// operand folded, otherwise NULL is returned.
   /// The new instruction is inserted before MI, and the client is responsible
   /// for removing the old instruction.
   MachineInstr *foldMemoryOperand(MachineInstr &MI, ArrayRef<unsigned> Ops,
                                   int FrameIndex,
[703 lines not shown]

llvm/trunk/lib/CodeGen/InlineSpiller.cpp

[733 lines not shown] foldMemoryOperand(ArrayRef<std::pair<MachineInstr*, unsigned> > Ops,
   // Don't attempt folding in bundles.
   MachineInstr *MI = Ops.front().first;
   if (Ops.back().first != MI || MI->isBundled())
     return false;

   bool WasCopy = MI->isCopy();
   unsigned ImpReg = 0;

-  bool SpillSubRegs = (MI->getOpcode() == TargetOpcode::STATEPOINT ||
-                       MI->getOpcode() == TargetOpcode::PATCHPOINT ||
-                       MI->getOpcode() == TargetOpcode::STACKMAP);
+  // Spill subregs if the target allows it.
+  // We always want to spill subregs for stackmap/patchpoint pseudos.
+  bool SpillSubRegs = TII.isSubregFoldable() ||
+                      MI->getOpcode() == TargetOpcode::STATEPOINT ||
+                      MI->getOpcode() == TargetOpcode::PATCHPOINT ||
+                      MI->getOpcode() == TargetOpcode::STACKMAP;

   // TargetInstrInfo::foldMemoryOperand only expects explicit, non-tied
   // operands.
   SmallVector<unsigned, 8> FoldOps;
   for (const auto &OpPair : Ops) {
     unsigned Idx = OpPair.second;
     assert(MI == OpPair.first && "Instruction conflict during operand folding");
     MachineOperand &MO = MI->getOperand(Idx);
     if (MO.isImplicit()) {
       ImpReg = MO.getReg();
       continue;
     }
-    // FIXME: Teach targets to deal with subregs.
     if (!SpillSubRegs && MO.getSubReg())
       return false;
     // We cannot fold a load instruction into a def.
     if (LoadMI && MO.isDef())
       return false;
     // Tied use operands should not be passed to foldMemoryOperand.
     if (!MI->isRegTiedToDefOperand(Idx))
       FoldOps.push_back(Idx);
[694 lines not shown]

llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp

[509 lines not shown] if (MI.getOperand(Ops[i]).isDef())
       Flags |= MachineMemOperand::MOStore;
     else
       Flags |= MachineMemOperand::MOLoad;

   MachineBasicBlock *MBB = MI.getParent();
   assert(MBB && "foldMemoryOperand needs an inserted instruction");
   MachineFunction &MF = *MBB->getParent();

+  // If we're not folding a load into a subreg, the size of the load is the
+  // size of the spill slot. But if we are, we need to figure out what the
+  // actual load size is.
+  int64_t MemSize = 0;
+  const MachineFrameInfo &MFI = MF.getFrameInfo();
+  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+
+  if (Flags & MachineMemOperand::MOStore) {
+    MemSize = MFI.getObjectSize(FI);
+  } else {
+    for (unsigned Idx : Ops) {
+      int64_t OpSize = MFI.getObjectSize(FI);
+
+      if (auto SubReg = MI.getOperand(Idx).getSubReg()) {
+        unsigned SubRegSize = TRI->getSubRegIdxSize(SubReg);
+        if (SubRegSize > 0 && !(SubRegSize % 8))
+          OpSize = SubRegSize / 8;
+      }
+
+      MemSize = std::max(MemSize, OpSize);
+    }
+  }
+
+  assert(MemSize && "Did not expect a zero-sized stack slot");
+
   MachineInstr *NewMI = nullptr;

   if (MI.getOpcode() == TargetOpcode::STACKMAP ||
       MI.getOpcode() == TargetOpcode::PATCHPOINT ||
       MI.getOpcode() == TargetOpcode::STATEPOINT) {
     // Fold stackmap/patchpoint.
     NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);
     if (NewMI)
       MBB->insert(MI, NewMI);
   } else {
     // Ask the target to do the actual folding.
     NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, FI, LIS);
   }

   if (NewMI) {
     NewMI->setMemRefs(MI.memoperands_begin(), MI.memoperands_end());
     // Add a memory operand, foldMemoryOperandImpl doesn't do that.
     assert((!(Flags & MachineMemOperand::MOStore) ||
             NewMI->mayStore()) &&
            "Folded a def to a non-store!");
     assert((!(Flags & MachineMemOperand::MOLoad) ||
             NewMI->mayLoad()) &&
            "Folded a use to a non-load!");
-    const MachineFrameInfo &MFI = MF.getFrameInfo();
     assert(MFI.getObjectOffset(FI) != -1);
     MachineMemOperand *MMO = MF.getMachineMemOperand(
-        MachinePointerInfo::getFixedStack(MF, FI), Flags, MFI.getObjectSize(FI),
+        MachinePointerInfo::getFixedStack(MF, FI), Flags, MemSize,
         MFI.getObjectAlignment(FI));
     NewMI->addMemOperand(MF, MMO);

     return NewMI;
   }

   // Straight COPY may fold as load/store.
   if (!MI.isCopy() || Ops.size() != 1)
     return nullptr;

   const TargetRegisterClass *RC = canFoldCopy(MI, Ops[0]);
   if (!RC)
     return nullptr;

   const MachineOperand &MO = MI.getOperand(1 - Ops[0]);
   MachineBasicBlock::iterator Pos = MI;
-  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();

   if (Flags == MachineMemOperand::MOStore)
     storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI);
   else
     loadRegFromStackSlot(*MBB, Pos, MO.getReg(), FI, RC, TRI);
   return &*--Pos;
 }

[606 lines not shown]

llvm/trunk/lib/Target/X86/X86InstrInfo.h

[372 lines not shown] void loadRegFromAddr(MachineFunction &MF, unsigned DestReg,
                        SmallVectorImpl<MachineOperand> &Addr,
                        const TargetRegisterClass *RC,
                        MachineInstr::mmo_iterator MMOBegin,
                        MachineInstr::mmo_iterator MMOEnd,
                        SmallVectorImpl<MachineInstr*> &NewMIs) const;

   bool expandPostRAPseudo(MachineInstr &MI) const override;

+  /// Check whether the target can fold a load that feeds a subreg operand
+  /// (or a subreg operand that feeds a store).
+  bool isSubregFoldable() const override { return true; }
+
   /// foldMemoryOperand - If this target supports it, fold a load or store of
   /// the specified stack slot into the specified machine instruction for the
   /// specified operand(s). If this is possible, the target should perform the
   /// folding and return true, otherwise it should return false. If it folds
   /// the instruction, it is likely that the MachineInstruction the iterator
   /// references has been changed.
   MachineInstr *
   foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
[216 lines not shown]

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

[6,837 lines not shown] X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
   if (NoFusing)
     return nullptr;

   // Unless optimizing for size, don't fold to avoid partial
   // register update stalls
   if (!MF.getFunction()->optForSize() && hasPartialRegUpdate(MI.getOpcode()))
     return nullptr;

+  // Don't fold subreg spills, or reloads that use a high subreg.
+  for (auto Op : Ops) {
+    MachineOperand &MO = MI.getOperand(Op);
+    auto SubReg = MO.getSubReg();
+    if (SubReg && (MO.isDef() || SubReg == X86::sub_8bit_hi))
+      return nullptr;
+  }
+
   const MachineFrameInfo &MFI = MF.getFrameInfo();
   unsigned Size = MFI.getObjectSize(FrameIndex);
   unsigned Alignment = MFI.getObjectAlignment(FrameIndex);
   // If the function stack isn't realigned we don't want to fold instructions
   // that need increased alignment.
   if (!RI.needsStackRealignment(MF))
     Alignment =
         std::min(Alignment, Subtarget.getFrameLowering()->getStackAlignment());
[108 lines not shown] static bool isNonFoldablePartialRegisterLoad(const MachineInstr &LoadMI,

   return false;
 }

 MachineInstr *X86InstrInfo::foldMemoryOperandImpl(
     MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
     MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,
     LiveIntervals *LIS) const {

+  // TODO: Support the case where LoadMI loads a wide register, but MI
+  // only uses a subreg.
+  for (auto Op : Ops) {
+    if (MI.getOperand(Op).getSubReg())
+      return nullptr;
+  }
+
   // If loading from a FrameIndex, fold directly from the FrameIndex.
   unsigned NumOps = LoadMI.getDesc().getNumOperands();
   int FrameIndex;
   if (isLoadFromStackSlot(LoadMI, FrameIndex)) {
     if (isNonFoldablePartialRegisterLoad(LoadMI, MI, MF))
       return nullptr;
     return foldMemoryOperandImpl(MF, MI, Ops, InsertPt, FrameIndex, LIS);
   }
[2,061 lines not shown]
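
The subreg filter added to the stack-slot variant of X86InstrInfo::foldMemoryOperandImpl() can be modelled in isolation. The sketch below is an assumption-laden toy: ToySubReg, FoldCandidate and passesX86StyleSubregCheck are hypothetical names, with ToySubReg::High8 standing in for X86::sub_8bit_hi. It only mirrors the rule stated in the hunk's comment: do not fold subreg spills (defs), or reloads that use a high subreg.

```cpp
#include <cassert>
#include <vector>

// Toy subregister indices; High8 models a high byte register such as "ah".
enum class ToySubReg { None, Low8, High8, Low16, Low32 };

struct FoldCandidate {
  ToySubReg subReg; // subregister index on the operand, if any
  bool isDef;       // true if the operand is a definition (a spill)
};

// Returns false if any operand would make the fold unsafe under the rule
// "no subreg spills, and no reloads through a high subreg".
inline bool passesX86StyleSubregCheck(const std::vector<FoldCandidate> &ops) {
  for (const FoldCandidate &mo : ops) {
    if (mo.subReg != ToySubReg::None &&
        (mo.isDef || mo.subReg == ToySubReg::High8))
      return false;
  }
  return true;
}
```

A low subreg read starts at the same address as the full spilled value on a little-endian target such as X86, which is why only the high-byte case and defs are rejected in this model.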

llvm/trunk/test/CodeGen/X86/partial-fold32.ll

 ; RUN: llc -mtriple=i686-unknown-linux-gnu -enable-misched=false < %s | FileCheck %s

 define fastcc i8 @fold32to8(i32 %add, i8 %spill) {
 ; CHECK-LABEL: fold32to8:
 ; CHECK: movl %ecx, (%esp) # 4-byte Spill
-; CHECK: movl (%esp), %eax # 4-byte Reload
-; CHECK: subb %al, %dl
+; CHECK: subb (%esp), %dl # 1-byte Folded Reload
 entry:
   tail call void asm sideeffect "", "~{eax},~{ebx},~{ecx},~{edi},~{esi},~{ebp},~{dirflag},~{fpsr},~{flags}"()
   %trunc = trunc i32 %add to i8
   %sub = sub i8 %spill, %trunc
   ret i8 %sub
 }

 ; Do not fold a 1-byte store into a 4-byte spill slot
[11 lines not shown]

llvm/trunk/test/CodeGen/X86/partial-fold64.ll

 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -enable-misched=false < %s | FileCheck %s

 define i32 @fold64to32(i64 %add, i32 %spill) {
 ; CHECK-LABEL: fold64to32:
 ; CHECK: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
-; CHECK: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
-; CHECK: subl %eax, %esi
+; CHECK: subl -{{[0-9]+}}(%rsp), %esi # 4-byte Folded Reload
 entry:
   tail call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"()
   %trunc = trunc i64 %add to i32
   %sub = sub i32 %spill, %trunc
   ret i32 %sub
 }

 define i8 @fold64to8(i64 %add, i8 %spill) {
 ; CHECK-LABEL: fold64to8:
 ; CHECK: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
-; CHECK: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
-; CHECK: subb %al, %sil
+; CHECK: subb -{{[0-9]+}}(%rsp), %sil # 1-byte Folded Reload
 entry:
   tail call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"()
   %trunc = trunc i64 %add to i8
   %sub = sub i8 %spill, %trunc
   ret i8 %sub
 }

 ; Do not fold a 4-byte store into a 8-byte spill slot
[16 lines not shown]

llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll

[4,782 lines not shown]
 ; AVX1-NEXT: movw %ax, %bx
 ; AVX1-NEXT: shll $16, %ebx
 ; AVX1-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm0 # 16-byte Reload
 ; AVX1-NEXT: callq __truncdfhf2
 ; AVX1-NEXT: movzwl %ax, %r14d
 ; AVX1-NEXT: orl %ebx, %r14d
 ; AVX1-NEXT: shlq $32, %r14
 ; AVX1-NEXT: orq %r15, %r14
-; AVX1-NEXT: vmovupd (%rsp), %ymm0 # 32-byte Reload
-; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
-; AVX1-NEXT: vzeroupper
+; AVX1-NEXT: vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
+; AVX1-NEXT: # xmm0 = mem[1,0]
 ; AVX1-NEXT: callq __truncdfhf2
 ; AVX1-NEXT: movw %ax, %bx
 ; AVX1-NEXT: shll $16, %ebx
 ; AVX1-NEXT: vmovups (%rsp), %ymm0 # 32-byte Reload
 ; AVX1-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<kill>
 ; AVX1-NEXT: vzeroupper
 ; AVX1-NEXT: callq __truncdfhf2
 ; AVX1-NEXT: movzwl %ax, %r15d
[49 lines not shown]
 ; AVX2-NEXT: movw %ax, %bx
 ; AVX2-NEXT: shll $16, %ebx
 ; AVX2-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm0 # 16-byte Reload
 ; AVX2-NEXT: callq __truncdfhf2
 ; AVX2-NEXT: movzwl %ax, %r14d
 ; AVX2-NEXT: orl %ebx, %r14d
 ; AVX2-NEXT: shlq $32, %r14
 ; AVX2-NEXT: orq %r15, %r14
-; AVX2-NEXT: vmovupd (%rsp), %ymm0 # 32-byte Reload
-; AVX2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
-; AVX2-NEXT: vzeroupper
+; AVX2-NEXT: vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
+; AVX2-NEXT: # xmm0 = mem[1,0]
 ; AVX2-NEXT: callq __truncdfhf2
 ; AVX2-NEXT: movw %ax, %bx
 ; AVX2-NEXT: shll $16, %ebx
 ; AVX2-NEXT: vmovups (%rsp), %ymm0 # 32-byte Reload
 ; AVX2-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<kill>
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: callq __truncdfhf2
 ; AVX2-NEXT: movzwl %ax, %r15d
[710 lines not shown]
 ; AVX1-NEXT: movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
 ; AVX1-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
 ; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
 ; AVX1-NEXT: vmovapd %xmm0, {{[0-9]+}}(%rsp) # 16-byte Spill
 ; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
 ; AVX1-NEXT: vzeroupper
 ; AVX1-NEXT: callq __truncdfhf2
 ; AVX1-NEXT: movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
-; AVX1-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
-; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
-; AVX1-NEXT: vzeroupper
+; AVX1-NEXT: vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 # 16-byte Folded Reload
+; AVX1-NEXT: # xmm0 = mem[1,0]
 ; AVX1-NEXT: callq __truncdfhf2
 ; AVX1-NEXT: movl %eax, %r12d
 ; AVX1-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
 ; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
 ; AVX1-NEXT: vmovapd %xmm0, {{[0-9]+}}(%rsp) # 16-byte Spill
 ; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
 ; AVX1-NEXT: vzeroupper
 ; AVX1-NEXT: callq __truncdfhf2
[50 lines not shown]
 ; AVX2-NEXT: movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
 ; AVX2-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
 ; AVX2-NEXT: vextractf128 $1, %ymm0, %xmm0
 ; AVX2-NEXT: vmovapd %xmm0, {{[0-9]+}}(%rsp) # 16-byte Spill
 ; AVX2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: callq __truncdfhf2
 ; AVX2-NEXT: movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
-; AVX2-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
-; AVX2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
-; AVX2-NEXT: vzeroupper
+; AVX2-NEXT: vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 # 16-byte Folded Reload
+; AVX2-NEXT: # xmm0 = mem[1,0]
 ; AVX2-NEXT: callq __truncdfhf2
 ; AVX2-NEXT: movl %eax, %r12d
 ; AVX2-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
 ; AVX2-NEXT: vextractf128 $1, %ymm0, %xmm0
 ; AVX2-NEXT: vmovapd %xmm0, {{[0-9]+}}(%rsp) # 16-byte Spill
 ; AVX2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: callq __truncdfhf2
[167 lines not shown]