This is an archive of the discontinued LLVM Phabricator instance.

[X86] Memory folding for commutative instructions.
ClosedPublic

Authored by RKSimon on Oct 9 2014, 6:04 AM.


Details

Reviewers

spatel
qcolombet
nadav
andreadb

Commits

rG77ac26d27989: [X86] Memory folding for commutative instructions.
rL219584: [X86] Memory folding for commutative instructions.

Summary

This patch improves support for commutative instructions in the x86 memory folding implementation: if folding the original instruction fails, it attempts to fold a commuted version of the instruction. If that folding fails as well, the instruction is 're-commuted' back to its original operand order before returning.

This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.). Because the commute is performed in the lowest-level foldMemoryOperandImpl implementation, it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too.

Unlike the X86InstrInfo::optimizeLoadInstr implementation, it uses findCommutedOpIndices instead of hard-coded commute operand indices.
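The control flow described in the summary can be sketched outside LLVM with a toy model. Everything below (ToyInstr, findCommutedOps, tryFold, foldWithCommute) is a hypothetical stand-in invented for illustration, not the LLVM API; the sketch assumes that only one operand position can take a memory operand and that the last two operands are the commutable pair.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a machine instruction: Ops[0] is the def,
// the remaining entries are source register numbers.
struct ToyInstr {
  std::vector<int> Ops;
  bool Commutable;
  unsigned FoldableIdx; // assumption: only this position accepts a memory operand
};

// Toy analogue of findCommutedOpIndices: reports which two operand positions
// may be swapped, so the caller does not hard-code them.
bool findCommutedOps(const ToyInstr &I, unsigned &Idx1, unsigned &Idx2) {
  if (!I.Commutable || I.Ops.size() < 3)
    return false;
  Idx1 = static_cast<unsigned>(I.Ops.size()) - 2;
  Idx2 = static_cast<unsigned>(I.Ops.size()) - 1;
  return true;
}

// Toy fold: succeeds only when the requested operand is the foldable one.
bool tryFold(const ToyInstr &I, unsigned OpIdx) { return OpIdx == I.FoldableIdx; }

// Returns the operand index that was folded, or -1 on failure.
// On failure the instruction is left in its original operand order.
int foldWithCommute(ToyInstr &I, unsigned OpIdx, bool AllowCommute) {
  if (tryFold(I, OpIdx))
    return static_cast<int>(OpIdx);

  unsigned Idx1 = 0, Idx2 = 0;
  if (!AllowCommute || !findCommutedOps(I, Idx1, Idx2))
    return -1;
  // The operand we want to fold must be one of the commutable pair.
  if (OpIdx != Idx1 && OpIdx != Idx2)
    return -1;

  // Commute, then retry once on the operand's new position.
  std::swap(I.Ops[Idx1], I.Ops[Idx2]);
  unsigned Other = (OpIdx == Idx1) ? Idx2 : Idx1;
  if (tryFold(I, Other))
    return static_cast<int>(Other);

  // Folding failed again: undo the commute before returning.
  std::swap(I.Ops[Idx1], I.Ops[Idx2]);
  return -1;
}
```

With a three-operand instruction whose last operand is the foldable one, asking to fold operand 1 fails at first, succeeds after the swap, and reports index 2. When neither position is foldable, the operands come back in their original order.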

Diff Detail

Event Timeline

RKSimon updated this revision to Diff 14650. Oct 9 2014, 6:04 AM

RKSimon retitled this revision to [X86] Memory folding for commutative instructions.

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: nadav, andreadb, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added subscribers: alexr, Unknown Object (MLST).

Macro testcase:

Tidied up check for commutative instructions - we can avoid isCommutable and use findCommutedOpIndices directly and then fold anything that is commutable under certain cases (e.g. FMA instructions).

Added the missing test case.

Hi Simon

lib/Target/X86/X86InstrInfo.cpp
4218	I am not sure this is what we want generally speaking. Indeed, if you look at the code you modified earlier (BTW having the full context would be easier for the review), we may not want to keep the commuted instruction unless the commute is in place:

    MachineInstr *NewMI = commuteInstruction(MI, false);
    // Unable to commute.
    if (!NewMI) return 0;
    if (NewMI != MI) { // <--- here if the instruction is different we do not commute.
      // New instruction. It doesn't need to be kept.
      NewMI->eraseFromParent();
      return 0;
    }

Should we provide more arguments to have a finer control?
4219	This assert is not valid. It is legal for commuteInstruction to return nullptr. Make an early exit here. See r208371 for more details.
4228	Ditto.

Hi Quentin,

I've added error handling code for commuteInstruction returning nullptr or a new MachineInstr*, both before and after the commuted folding attempt. This appears to be enough and we don't need to alter either the commute or folding call arguments to support it - we can keep to in-place instruction commutes only.

Simon.
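The in-place-only handling discussed above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the code from the updated patch: ToyInstr, CommuteKind, toyCommute and commuteInPlaceOnly are hypothetical names, and the toy commute hook merely imitates the three outcomes mentioned in the review (nullptr, the same instruction, or a newly created instruction).

```cpp
#include <cassert>
#include <utility>

// Hypothetical: how the toy commute hook behaves for a given instruction.
enum class CommuteKind { InPlace, NewInstr, Refuses };

// Hypothetical stand-in for a two-source machine instruction.
struct ToyInstr {
  int Src1, Src2;
  CommuteKind Kind;
};

// Toy commute hook with three possible results:
//   nullptr     -> the instruction could not be commuted,
//   &I          -> the instruction was commuted in place,
//   new pointer -> a separate commuted instruction was created.
ToyInstr *toyCommute(ToyInstr &I) {
  switch (I.Kind) {
  case CommuteKind::Refuses:
    return nullptr;
  case CommuteKind::NewInstr:
    return new ToyInstr{I.Src2, I.Src1, I.Kind};
  case CommuteKind::InPlace:
    std::swap(I.Src1, I.Src2);
    return &I;
  }
  return nullptr;
}

// Returns true only when I itself was commuted in place. A null result is an
// early exit rather than an assertion failure, and a newly created
// instruction is discarded so that I is left untouched.
bool commuteInPlaceOnly(ToyInstr &I) {
  ToyInstr *NewI = toyCommute(I);
  if (!NewI)
    return false; // Unable to commute.
  if (NewI != &I) {
    delete NewI;  // New instruction. It doesn't need to be kept.
    return false;
  }
  return true;
}
```

In this model the same check would be applied both before the commuted folding attempt and when undoing the commute afterwards, so no assertion is needed at either point.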

Hi Simon,

LGTM.

As a side question, do you see any performance difference with that patch?

Thanks,
-Quentin

This revision is now accepted and ready to land. Oct 10 2014, 9:35 AM

As a side question, do you see any performance difference with that patch?

Thanks Quentin. On Jaguar I'm seeing a definite gain on (VEX encoded) SSE heavy code loops (physics, animation and vectormath), completely due to stack reload folding. On my Sandy Bridge Xeon the difference is a lot smaller / negligible. Although minor, there are also improvements to code size / instruction packing.

Simon.

Closed by commit rL219584 (authored by @RKSimon).

Excuse me, I have reverted this in r219595.
It broke i686 builders.

I'll send a reproducible testcase.ll later.

Revision Contents

Path	Size
lib/Target/X86/X86FastISel.cpp (revision 219397)	2 lines
lib/Target/X86/X86InstrInfo.h (revision 219397)	2 lines
lib/Target/X86/X86InstrInfo.cpp (revision 219397)	100 lines

Diff 14650

lib/Target/X86/X86FastISel.cpp

Context not available.
	AM.getFullAddress(AddrOps);	AM.getFullAddress(AddrOps);

	MachineInstr *Result =	MachineInstr *Result =
	XII.foldMemoryOperandImpl(*FuncInfo.MF, MI, OpNo, AddrOps, Size, Alignment);	XII.foldMemoryOperandImpl(*FuncInfo.MF, MI, OpNo, AddrOps, Size, Alignment, /*AllowCommute=*/ true);
	if (!Result)	if (!Result)
	return false;	return false;

Context not available.

lib/Target/X86/X86InstrInfo.h

Context not available.
	MachineInstr* MI,	MachineInstr* MI,
	unsigned OpNum,	unsigned OpNum,
	const SmallVectorImpl<MachineOperand> &MOs,	const SmallVectorImpl<MachineOperand> &MOs,
	unsigned Size, unsigned Alignment) const;	unsigned Size, unsigned Alignment, bool AllowCommute) const;

	void	void
	getUnconditionalBranch(MCInst &Branch,	getUnconditionalBranch(MCInst &Branch,
Context not available.

lib/Target/X86/X86InstrInfo.cpp

Context not available.
	if (!DefMI->isSafeToMove(this, nullptr, SawStore))	if (!DefMI->isSafeToMove(this, nullptr, SawStore))
	return nullptr;	return nullptr;

	// We try to commute MI if possible.	// Collect information about virtual register operands of MI.
	unsigned IdxEnd = (MI->isCommutable()) ? 2 : 1;	unsigned SrcOperandId = 0;
	for (unsigned Idx = 0; Idx < IdxEnd; Idx++) {	bool FoundSrcOperand = false;
	// Collect information about virtual register operands of MI.	for (unsigned i = 0, e = MI->getDesc().getNumOperands(); i != e; ++i) {
	unsigned SrcOperandId = 0;	MachineOperand &MO = MI->getOperand(i);
	bool FoundSrcOperand = false;	if (!MO.isReg())
	for (unsigned i = 0, e = MI->getDesc().getNumOperands(); i != e; ++i) {	continue;
	MachineOperand &MO = MI->getOperand(i);	unsigned Reg = MO.getReg();
	if (!MO.isReg())	if (Reg != FoldAsLoadDefReg)
	continue;	continue;
	unsigned Reg = MO.getReg();	// Do not fold if we have a subreg use or a def or multiple uses.
	if (Reg != FoldAsLoadDefReg)	if (MO.getSubReg() || MO.isDef() || FoundSrcOperand)
	continue;	return nullptr;
	// Do not fold if we have a subreg use or a def or multiple uses.
	if (MO.getSubReg() || MO.isDef() || FoundSrcOperand)
	return nullptr;

	SrcOperandId = i;	SrcOperandId = i;
	FoundSrcOperand = true;	FoundSrcOperand = true;
	}	}
	if (!FoundSrcOperand) return nullptr;	if (!FoundSrcOperand) return nullptr;

	// Check whether we can fold the def into SrcOperandId.	// Check whether we can fold the def into SrcOperandId.
	SmallVector<unsigned, 8> Ops;	SmallVector<unsigned, 8> Ops;
	Ops.push_back(SrcOperandId);	Ops.push_back(SrcOperandId);
	MachineInstr *FoldMI = foldMemoryOperand(MI, Ops, DefMI);	MachineInstr *FoldMI = foldMemoryOperand(MI, Ops, DefMI);
	if (FoldMI) {	if (FoldMI) {
	FoldAsLoadDefReg = 0;	FoldAsLoadDefReg = 0;
	return FoldMI;	return FoldMI;
	}	}

	if (Idx == 1) {
	// MI was changed but it didn't help, commute it back!
	commuteInstruction(MI, false);
	return nullptr;
	}

	// Check whether we can commute MI and enable folding.
	if (MI->isCommutable()) {
	MachineInstr *NewMI = commuteInstruction(MI, false);
	// Unable to commute.
	if (!NewMI) return nullptr;
	if (NewMI != MI) {
	// New instruction. It doesn't need to be kept.
	NewMI->eraseFromParent();
	return nullptr;
	}
	}
	}
	return nullptr;	return nullptr;
	}	}

Context not available.
	X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,	X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
	MachineInstr *MI, unsigned i,	MachineInstr *MI, unsigned i,
	const SmallVectorImpl<MachineOperand> &MOs,	const SmallVectorImpl<MachineOperand> &MOs,
	unsigned Size, unsigned Align) const {	unsigned Size, unsigned Align, bool AllowCommute) const {
	const DenseMap<unsigned,	const DenseMap<unsigned,
	std::pair<unsigned,unsigned> > *OpcodeTablePtr = nullptr;	std::pair<unsigned,unsigned> > *OpcodeTablePtr = nullptr;
	bool isCallRegIndirect = Subtarget.callRegIndirect();	bool isCallRegIndirect = Subtarget.callRegIndirect();
Context not available.
	}	}
	}	}

		// If the instruction and target operand are commutable, commute the instruction and try again.
		if (AllowCommute && MI->isCommutable()) {
		unsigned OriginalOpIdx = i, CommuteOpIdx1, CommuteOpIdx2;
		if (findCommutedOpIndices(MI, CommuteOpIdx1, CommuteOpIdx2)) {
		if ((CommuteOpIdx1 == OriginalOpIdx) \|\| (CommuteOpIdx2 == OriginalOpIdx)) {
		MI = commuteInstruction(MI, false);
		assert (MI && "commutable instruction failed to commute!");

		unsigned CommuteOpIdx = (CommuteOpIdx1 == OriginalOpIdx ? CommuteOpIdx2 : CommuteOpIdx1);
		NewMI = foldMemoryOperandImpl(MF, MI, CommuteOpIdx, MOs, Size, Align, /*AllowCommute=*/ false);
		if (NewMI)
		return NewMI;

		// Folding failed again - undo the commute before returning.
		MI = commuteInstruction(MI, false);
		assert (MI && "commutable instruction failed to commute!");

		// Return here to prevent duplicate fuse failure report.
		return nullptr;
		}
		}
		}

	// No fusion	// No fusion
	if (PrintFailedFusing && !MI->isCopy())	if (PrintFailedFusing && !MI->isCopy())
	dbgs() << "We failed to fuse operand " << i << " in " << *MI;	dbgs() << "We failed to fuse operand " << i << " in " << *MI;
Context not available.

	SmallVector<MachineOperand,4> MOs;	SmallVector<MachineOperand,4> MOs;
	MOs.push_back(MachineOperand::CreateFI(FrameIndex));	MOs.push_back(MachineOperand::CreateFI(FrameIndex));
	return foldMemoryOperandImpl(MF, MI, Ops[0], MOs, Size, Alignment);	return foldMemoryOperandImpl(MF, MI, Ops[0], MOs, Size, Alignment, /*AllowCommute=*/ true);
	}	}

	static bool isPartialRegisterLoad(const MachineInstr &LoadMI,	static bool isPartialRegisterLoad(const MachineInstr &LoadMI,
Context not available.
	break;	break;
	}	}
	}	}
	return foldMemoryOperandImpl(MF, MI, Ops[0], MOs, 0, Alignment);	return foldMemoryOperandImpl(MF, MI, Ops[0], MOs, 0, Alignment, /*AllowCommute=*/ true);
	}	}


Context not available.