This is an archive of the discontinued LLVM Phabricator instance.

[x86] add a reassociation optimization to increase ILP via the MachineCombiner pass
ClosedPublic

Authored by spatel on Jun 8 2015, 2:49 PM.

Details

Reviewers

qcolombet
jmolloy
Gerolf
escha
mehdi_amini
resistor

Commits

rG08829bac8160: [x86] Add a reassociation optimization to increase ILP via the MachineCombiner…
rL239486: [x86] Add a reassociation optimization to increase ILP via the MachineCombiner…

Summary

This is a reimplementation of D9780 at the machine instruction level rather than the DAG.

I'm using the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a starting point; see the TODO comment) to increase ILP when it's safe to do so.

The code is closely based on the existing MachineCombiner optimization that is implemented for AArch64.
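
The "when it's safe to do so" qualifier matters because floating-point addition is not associative, which is presumably why the transform is gated on unsafe FP math. The following standalone sketch (plain C++, not LLVM code; the function names are made up for illustration) shows two groupings of the same three floats producing different results:

```cpp
#include <cassert>

// Illustrative helpers only; they are not part of the patch.

// Evaluates (A + B) + C, rounding the intermediate to float.
float sumLeftFirst(float A, float B, float C) {
  float T = A + B;
  return T + C;
}

// Evaluates A + (B + C), rounding the intermediate to float.
float sumRightFirst(float A, float B, float C) {
  float T = B + C;
  return A + T;
}
```

When built without fast-math flags, calling both helpers with (1e20f, -1e20f, 1.0f) gives 1 for the first grouping and 0 for the second, because the 1.0f is absorbed when it is added to -1e20f first.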

I tried the test cases that Mehdi provided in the follow-up thread for r236031, and I don't see any instruction count increase. In the massive test case (~8000 machine instructions) that blew up previously, this optimization will fire ~2000 times. This causes a horrible compile-time increase:

$ time llc -enable-unsafe-fp-math -x86-machine-combiner=0 -mattr=avx spill.ll
1.8 sec
$ time llc -enable-unsafe-fp-math -x86-machine-combiner=1 -mattr=avx spill.ll
35.8 sec

For now, I'm assuming that this is a degenerate test case (for x86 at least) that we don't need to artificially limit, but we could also clip the optimization to only fire N times.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 27331. Jun 8 2015, 2:49 PM

spatel retitled this revision to [x86] add a reassociation optimization to increase ILP via the MachineCombiner pass.

spatel updated this object.

spatel edited the test plan for this revision.

spatel added reviewers: Gerolf, mehdi_amini, resistor, escha, qcolombet, jmolloy.

spatel added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Jun 8 2015, 2:49 PM

I'm also curious about performance numbers.

Cheers
Gerolf

lib/Target/X86/X86InstrInfo.cpp
6228 (On Diff #27331): This function doesn't fit on my screen :-) You could integrate it into the combiner or split it into a) checking for a candidate, b) checking for a pattern, and c) generating a pattern. You also make a design decision and filter (one) pattern (vsadd/vsadd/vsadd) while searching. Instead you could separate search and filter. I can see arguments either way, but please document cost model functionality clearly.
6400 (On Diff #27331): You could implement this with a two-dimensional array (pattern rows, operand columns) and avoid the code bloat.
lib/Target/X86/X86InstrInfo.h
32 (On Diff #27331): Can you think of more descriptive names than 1-4? And/or add a comment about the targeted pattern?

In D10321#185353, @Gerolf wrote:

I'm also curious about performance numbers.

Hi Gerolf -

Thanks for the review! I'll work on the inline comments soon.

For perf numbers, I see no difference running the benchmarking subset of test-suite on x86 with AVX and -ffast-math; this is as expected because this pass isn't firing more than a handful of times on any test.

Do you have suggestions for other benchmarks? My motivating examples for this patch are:
https://llvm.org/bugs/show_bug.cgi?id=17305
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116
http://reviews.llvm.org/D8941

I can create a toy test case and measure that the perf matches the IACA output in:
https://llvm.org/bugs/show_bug.cgi?id=17305#c0

Hi Sanjay

Any HPC benchmark is a good candidate. But I was just curious if you had spotted some noticeable gains.

Cheers
Gerolf

Patch updated based on Gerolf's feedback:

Renamed patterns to be slightly more meaningful
Split functions that got much too big
Replaced lengthy and repetitive 'switch' with selectable array for operand indexes

Also, while stepping through in the debugger, I discovered that my attempted recycling of a virtual register was causing the critical path calculation to be wrong, so I changed that to use a new VR (see comment in the code).

Finally, I added a comment regarding the machine combiner's internal logic: it replaces instructions even if there is no outright win in critical path or resources (just a tie). I'm not sure if that should be changed in a separate patch, but I think it would make it easier to generalize this reassociation pattern further while limiting the compile-time hit. I.e., we don't need to have a sequence of 3 identical ops for this reassociation optimization, just 2.
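
The tie case can be illustrated with a toy depth model. This is only a sketch under stated assumptions: every op has unit latency, a value's depth is the cycle at which it becomes available, and the helper names are hypothetical rather than MachineCombiner APIs.

```cpp
#include <algorithm>
#include <cassert>

// Toy model, not LLVM code: an op's result is ready one cycle after its
// later operand.
int opDepth(int LhsDepth, int RhsDepth) {
  return std::max(LhsDepth, RhsDepth) + 1;
}

// Depth of C in the original sequence:  B = A op X;  C = B op Y
int depthBefore(int A, int X, int Y) {
  return opDepth(opDepth(A, X), Y);
}

// Depth of C after reassociation:  T = X op Y;  C = A op T
int depthAfter(int A, int X, int Y) {
  return opDepth(A, opDepth(X, Y));
}
```

In this model, if A is produced by an earlier op (depth 1) while X and Y are inputs (depth 0), the depth of C drops from 3 to 2. If A is also an input, both forms have depth 2, which is the "just a tie" situation described above.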

Hi Sanjay,

is this on phabricator?

My understanding of the code is that you reassociate patterns like A = ?? op ??; B = A op X; C = B op Y, but your comments suggest A’s statement can use any operand. Please check in isReassocCandidate():

+ // We must match a simple chain of dependent ops.
+ if (checkPrevOpcode && MI1->getOpcode() != AssocOpcode)
+ return nullptr;

and note that for all incarnations checkPrevOpcode is ‘true’. Or do I miss something?

Cheers
Gerolf

Please see below.

In D10321#185889, @Gerolf wrote:

Hi Sanjay,

is this on phabricator?

Yes - although there may be something odd going on here because I got a duplicate email for each of your comments. Let me know if I need to send directly to the mailing list.

My understanding of the code is that you reassociate patterns like A = ?? op ??; B = A op X; C = B op Y, but your comments suggest A’s statement can use any operand. Please check in isReassocCandidate():

+ // We must match a simple chain of dependent ops.
+ if (checkPrevOpcode && MI1->getOpcode() != AssocOpcode)
+ return nullptr;

and note that for all incarnations checkPrevOpcode is ‘true’. Or do I miss something?

Sorry - my comment here and in the code is not clear enough. I would like to set checkPrevOpcode to 'false' in a follow-on patch. This would allow, for example, this sequence to be optimized:

A = ? mul ?   <--- 'mul' could actually be anything here
B = A add X
C = B add Y

For now, I limited the pattern to only match when there are 3 consecutive 'add' ops:

A = ? add ?
B = A add X
C = B add Y

I did this because if we allow the more liberal matching of the first sequence, I'm seeing that the transform will trigger more often, but not necessarily for any benefit because of the logic in the machine combiner that replaces instructions as long as they are no worse than the original sequence. Please let me know if that makes things clearer.
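
The difference between the two matching rules can be sketched on a toy instruction list. This is a simplified model with hypothetical types: it ignores commuted operands and the one-use check that the real isReassocCandidate() performs, and the RequireSameFirstOp flag plays the role of checkPrevOpcode.

```cpp
#include <cassert>
#include <vector>

enum class Op { Add, Mul };

// Toy instruction: Lhs/Rhs hold the index of the defining instruction in
// the block, or -1 when the operand is a block input.
struct ToyInst {
  Op Opcode;
  int Lhs;
  int Rhs;
};

// Builds  A = ? First ?;  B = A add X;  C = B add Y  (indices 0, 1, 2).
std::vector<ToyInst> makeChain(Op First) {
  return {{First, -1, -1}, {Op::Add, 0, -1}, {Op::Add, 1, -1}};
}

// Returns true if Insts[Root] ends a chain of two dependent adds whose
// first operand A is defined inside the block. With RequireSameFirstOp,
// the instruction defining A must be an add as well.
bool matchesChain(const std::vector<ToyInst> &Insts, int Root,
                  bool RequireSameFirstOp) {
  const ToyInst &C = Insts[Root];
  if (C.Opcode != Op::Add || C.Lhs < 0)
    return false;
  const ToyInst &B = Insts[C.Lhs];
  if (B.Opcode != Op::Add || B.Lhs < 0)
    return false;
  const ToyInst &A = Insts[B.Lhs];
  return !RequireSameFirstOp || A.Opcode == Op::Add;
}
```

With the strict rule, only makeChain(Op::Add) matches at index 2; with the relaxed rule, makeChain(Op::Mul) matches too, which is why the relaxed form would trigger more often.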

Gerolf added inline comments. Jun 9 2015, 5:39 PM

lib/Target/X86/X86InstrInfo.cpp
6234 (On Diff #27400): I'm not happy about the check* parameters that control the pattern match, but I guess it is better to revisit that at a later point when you support more reassociation patterns. Although I still suggest removing checkPrevOpcode even for this commit.
6286 (On Diff #27400): If you like, I think this could be implemented by a 2x2 matrix to avoid the conditionals.
6293 (On Diff #27400): else?
6313 (On Diff #27400): For this commit that parameter could be eliminated. I understand your intention, but it is confusing for anyone who comes across the code for the first time. So I would add it in a later commit. That will do away with the TODO also.
6334 (On Diff #27400): Right now the code supports A = ?? op ??. I would make that clear in the comment and change it later in the follow-up patch that generalizes this one. Same for the equivalent comments above and below.
6429 (On Diff #27400): How about something like: "Encodes for each pattern where to find each operand in Root and Prev. The operand order in the columns is A, B, X, Y."
6436 (On Diff #27400): I think you could make the array static and only pass Root and Prev to reassociateOps(). That would result in a smaller signature.

spatel added inline comments. Jun 10 2015, 9:25 AM

lib/Target/X86/X86InstrInfo.cpp
6234 (On Diff #27400): Sounds good - I removed the checkPrevOpcode param.
6286 (On Diff #27400): I know we're already relying on the enum values as indexes below, but I'd prefer to keep this logic clearer for now instead of encoding.
6293 (On Diff #27400): I was originally going with this: http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return ...but I prefer the symmetry of an 'else' here. Fixed.
6313 (On Diff #27400): Param removed, however, I think that just pushes the TODO into isReassocCandidate() until we support the general case. I do intend to continue working on this, so hopefully, the TODOs are quite temporary. :)
6334 (On Diff #27400): I made the comments line up with the more general logic that is actually implemented - so we don't even need the 'A' instruction line. The current limitation on the 3rd instruction is noted in the TODO comment in isReassocCandidate().
6436 (On Diff #27400): I may not have implemented this as you were envisioning: I moved the array and op selection into reassociateOps(). Now we have the smaller signature, but we need to pass the Pattern value as the selector. Let me know if you see a better way. Thanks!
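
The table-driven operand selection discussed in these comments can be sketched in isolation. The table values below are copied from the OpIdx array in the patch; everything else (ToyInst, Selected, makeInst, selectOperands, single-character operand names) is a hypothetical stand-in for MachineInstr and MachineOperand.

```cpp
#include <array>
#include <cassert>

// Same numbering as the patch's MC_REASSOC_* enum values.
enum Pattern { AX_BY = 0, AX_YB = 1, XA_BY = 2, XA_YB = 3 };

// Operand 0 is the destination; operands 1 and 2 are the sources.
struct ToyInst {
  std::array<char, 3> Ops;
};

struct Selected {
  char A, B, X, Y;
};

ToyInst makeInst(char Dst, char Src1, char Src2) {
  ToyInst I;
  I.Ops = {{Dst, Src1, Src2}};
  return I;
}

Selected selectOperands(const ToyInst &Prev, const ToyInst &Root, Pattern P) {
  // Rows: pattern value. Columns: operand index of A (in Prev),
  // B (in Root), X (in Prev), Y (in Root).
  static const unsigned OpIdx[4][4] = {
      {1, 1, 2, 2}, {1, 2, 2, 1}, {2, 1, 1, 2}, {2, 2, 1, 1}};
  return {Prev.Ops[OpIdx[P][0]], Root.Ops[OpIdx[P][1]],
          Prev.Ops[OpIdx[P][2]], Root.Ops[OpIdx[P][3]]};
}
```

For example, with Prev written as B = X op A and Root written as C = Y op B, the XA_YB row picks A and X out of Prev and B and Y out of Root, with no per-pattern branching.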

Patch updated; see previous inline comments and replies.

Thanks for iterating on this! LGTM!

-Gerolf

Closed by commit rL239486: [x86] Add a reassociation optimization to increase ILP via the MachineCombiner… (authored by spatel). Jun 10 2015, 1:36 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D9780: expose ILP for associative operations in the DAG. Jun 10 2015, 4:35 PM

spatel mentioned this in D10460: [x86] generalize reassociation optimization in machine combiner to 2 instructions. Jun 15 2015, 2:05 PM

spatel mentioned this in rL240361: [x86] generalize reassociation optimization in machine combiner to 2…. Jun 22 2015, 5:44 PM

jingyue mentioned this in rL241099: [NVPTX] Fix issue introduced in D10321. Jun 30 2015, 11:59 AM

spatel mentioned this in D26855: New unsafe-fp-math implementation for X86 target. Jan 26 2017, 7:55 AM

Revision Contents

Path (size)

llvm/trunk/lib/Target/X86/X86InstrInfo.h (33 lines)
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp (204 lines)
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp (6 lines)
llvm/trunk/test/CodeGen/X86/fp-fast.ll (78 lines)

Diff 27463

llvm/trunk/lib/Target/X86/X86InstrInfo.h

[20 lines not shown]

 #define GET_INSTRINFO_HEADER
 #include "X86GenInstrInfo.inc"

 namespace llvm {
   class X86RegisterInfo;
   class X86Subtarget;

+namespace MachineCombinerPattern {
+enum MC_PATTERN : int {
+  // These are commutative variants for reassociating a computation chain
+  // of the form:
+  //   B = A op X (Prev)
+  //   C = B op Y (Root)
+  MC_REASSOC_AX_BY = 0,
+  MC_REASSOC_AX_YB = 1,
+  MC_REASSOC_XA_BY = 2,
+  MC_REASSOC_XA_YB = 3,
+};
+} // end namespace MachineCombinerPattern
+
 namespace X86 {
   // X86 specific condition code. These correspond to X86_*_COND in
   // X86InstrInfo.td. They must be kept in synch.
   enum CondCode {
     COND_A = 0,
     COND_AE = 1,
     COND_B = 2,
     COND_BE = 3,

[387 lines not shown; context: public:]

   bool isHighLatencyDef(int opc) const override;

   bool hasHighOperandLatency(const InstrItineraryData *ItinData,
                              const MachineRegisterInfo *MRI,
                              const MachineInstr *DefMI, unsigned DefIdx,
                              const MachineInstr *UseMI,
                              unsigned UseIdx) const override;

+
+  bool useMachineCombiner() const override {
+    return true;
+  }
+
+  /// Return true when there is potentially a faster code sequence
+  /// for an instruction chain ending in <Root>. All potential patterns are
+  /// output in the <Pattern> array.
+  bool hasPattern(
+      MachineInstr &Root,
+      SmallVectorImpl<MachineCombinerPattern::MC_PATTERN> &P) const override;
+
+  /// When hasPattern() finds a pattern, this function generates the
+  /// instructions that could replace the original code sequence.
+  void genAlternativeCodeSequence(
+      MachineInstr &Root, MachineCombinerPattern::MC_PATTERN P,
+      SmallVectorImpl<MachineInstr *> &InsInstrs,
+      SmallVectorImpl<MachineInstr *> &DelInstrs,
+      DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const override;
+
   /// analyzeCompare - For a comparison instruction, return the source registers
   /// in SrcReg and SrcReg2 if having two register operands, and the value it
   /// compares against in CmpValue. Return true if the comparison instruction
   /// can be analyzed.
   bool analyzeCompare(const MachineInstr *MI, unsigned &SrcReg,
                       unsigned &SrcReg2, int &CmpMask,
                       int &CmpValue) const override;

[34 lines not shown]

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

[6,220 lines not shown]

 bool X86InstrInfo::
 hasHighOperandLatency(const InstrItineraryData *ItinData,
                       const MachineRegisterInfo *MRI,
                       const MachineInstr *DefMI, unsigned DefIdx,
                       const MachineInstr *UseMI, unsigned UseIdx) const {
   return isHighLatencyDef(DefMI->getOpcode());
 }

+/// If the input instruction is part of a chain of dependent ops that are
+/// suitable for reassociation, return the earlier instruction in the sequence
+/// that defines its first operand, otherwise return a nullptr.
+/// If the instruction's operands must be commuted to be considered a
+/// reassociation candidate, Commuted will be set to true.
+static MachineInstr *isReassocCandidate(const MachineInstr &Inst,
+                                        unsigned AssocOpcode,
+                                        bool checkPrevOneUse,
+                                        bool &Commuted) {
+  if (Inst.getOpcode() != AssocOpcode)
+    return nullptr;
+
+  MachineOperand Op1 = Inst.getOperand(1);
+  MachineOperand Op2 = Inst.getOperand(2);
+
+  const MachineBasicBlock *MBB = Inst.getParent();
+  const MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();
+
+  // We need virtual register definitions.
+  MachineInstr *MI1 = nullptr;
+  MachineInstr *MI2 = nullptr;
+  if (Op1.isReg() && TargetRegisterInfo::isVirtualRegister(Op1.getReg()))
+    MI1 = MRI.getUniqueVRegDef(Op1.getReg());
+  if (Op2.isReg() && TargetRegisterInfo::isVirtualRegister(Op2.getReg()))
+    MI2 = MRI.getUniqueVRegDef(Op2.getReg());
+
+  // And they need to be in the trace (otherwise, they won't have a depth).
+  if (!MI1 || !MI2 || MI1->getParent() != MBB || MI2->getParent() != MBB)
+    return nullptr;
+
+  Commuted = false;
+  if (MI1->getOpcode() != AssocOpcode && MI2->getOpcode() == AssocOpcode) {
+    std::swap(MI1, MI2);
+    Commuted = true;
+  }
+
+  // Avoid reassociating operands when it won't provide any benefit. If both
+  // operands are produced by instructions of this type, we may already
+  // have the optimal sequence.
+  if (MI2->getOpcode() == AssocOpcode)
+    return nullptr;
+
+  // The instruction must only be used by the other instruction that we
+  // reassociate with.
+  if (checkPrevOneUse && !MRI.hasOneNonDBGUse(MI1->getOperand(0).getReg()))
+    return nullptr;
+
+  // We must match a simple chain of dependent ops.
+  // TODO: This check is not necessary for the earliest instruction in the
+  // sequence. Instead of a sequence of 3 dependent instructions with the same
+  // opcode, we only need to find a sequence of 2 dependent instructions with
+  // the same opcode plus 1 other instruction that adds to the height of the
+  // trace.
+  if (MI1->getOpcode() != AssocOpcode)
+    return nullptr;
+
+  return MI1;
+}
+
+/// Select a pattern based on how the operands of each associative operation
+/// need to be commuted.
+static MachineCombinerPattern::MC_PATTERN getPattern(bool CommutePrev,
+                                                     bool CommuteRoot) {
+  if (CommutePrev) {
+    if (CommuteRoot)
+      return MachineCombinerPattern::MC_REASSOC_XA_YB;
+    return MachineCombinerPattern::MC_REASSOC_XA_BY;
+  } else {
+    if (CommuteRoot)
+      return MachineCombinerPattern::MC_REASSOC_AX_YB;
+    return MachineCombinerPattern::MC_REASSOC_AX_BY;
+  }
+}
+
+bool X86InstrInfo::hasPattern(MachineInstr &Root,
+        SmallVectorImpl<MachineCombinerPattern::MC_PATTERN> &Pattern) const {
+  if (!Root.getParent()->getParent()->getTarget().Options.UnsafeFPMath)
+    return false;
+
+  // TODO: There are many more associative instruction types to match:
+  //       1. Other forms of scalar FP add (non-AVX)
+  //       2. Other data types (double, integer, vectors)
+  //       3. Other math / logic operations (mul, and, or)
+  unsigned AssocOpcode = X86::VADDSSrr;
+
+  // TODO: There is nothing x86-specific here except the instruction type.
+  // This logic could be hoisted into the machine combiner pass itself.
+  bool CommuteRoot;
+  if (MachineInstr *Prev = isReassocCandidate(Root, AssocOpcode, true,
+                                              CommuteRoot)) {
+    bool CommutePrev;
+    if (isReassocCandidate(*Prev, AssocOpcode, false, CommutePrev)) {
+      // We found a sequence of instructions that may be suitable for a
+      // reassociation of operands to increase ILP.
+      Pattern.push_back(getPattern(CommutePrev, CommuteRoot));
+      return true;
+    }
+  }
+
+  return false;
+}
+
+/// Attempt the following reassociation to reduce critical path length:
+///   B = A op X (Prev)
+///   C = B op Y (Root)
+///   ===>
+///   B = X op Y
+///   C = A op B
+static void reassociateOps(MachineInstr &Root, MachineInstr &Prev,
+                           MachineCombinerPattern::MC_PATTERN Pattern,
+                           SmallVectorImpl<MachineInstr *> &InsInstrs,
+                           SmallVectorImpl<MachineInstr *> &DelInstrs,
+                           DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) {
+  MachineFunction *MF = Root.getParent()->getParent();
+  MachineRegisterInfo &MRI = MF->getRegInfo();
+  const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
+  const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
+  const TargetRegisterClass *RC = Root.getRegClassConstraint(0, TII, TRI);
+
+  // This array encodes the operand index for each parameter because the
+  // operands may be commuted. Each row corresponds to a pattern value,
+  // and each column specifies the index of A, B, X, Y.
+  unsigned OpIdx[4][4] = {
+    { 1, 1, 2, 2 },
+    { 1, 2, 2, 1 },
+    { 2, 1, 1, 2 },
+    { 2, 2, 1, 1 }
+  };
+
+  MachineOperand &OpA = Prev.getOperand(OpIdx[Pattern][0]);
+  MachineOperand &OpB = Root.getOperand(OpIdx[Pattern][1]);
+  MachineOperand &OpX = Prev.getOperand(OpIdx[Pattern][2]);
+  MachineOperand &OpY = Root.getOperand(OpIdx[Pattern][3]);
+  MachineOperand &OpC = Root.getOperand(0);
+
+  unsigned RegA = OpA.getReg();
+  unsigned RegB = OpB.getReg();
+  unsigned RegX = OpX.getReg();
+  unsigned RegY = OpY.getReg();
+  unsigned RegC = OpC.getReg();
+
+  if (TargetRegisterInfo::isVirtualRegister(RegA))
+    MRI.constrainRegClass(RegA, RC);
+  if (TargetRegisterInfo::isVirtualRegister(RegB))
+    MRI.constrainRegClass(RegB, RC);
+  if (TargetRegisterInfo::isVirtualRegister(RegX))
+    MRI.constrainRegClass(RegX, RC);
+  if (TargetRegisterInfo::isVirtualRegister(RegY))
+    MRI.constrainRegClass(RegY, RC);
+  if (TargetRegisterInfo::isVirtualRegister(RegC))
+    MRI.constrainRegClass(RegC, RC);
+
+  // Create a new virtual register for the result of (X op Y) instead of
+  // recycling RegB because the MachineCombiner's computation of the critical
+  // path requires a new register definition rather than an existing one.
+  unsigned NewVR = MRI.createVirtualRegister(RC);
+  InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
+
+  unsigned Opcode = Root.getOpcode();
+  bool KillA = OpA.isKill();
+  bool KillX = OpX.isKill();
+  bool KillY = OpY.isKill();
+
+  // Create new instructions for insertion.
+  MachineInstrBuilder MIB1 =
+    BuildMI(*MF, Prev.getDebugLoc(), TII->get(Opcode), NewVR)
+      .addReg(RegX, getKillRegState(KillX))
+      .addReg(RegY, getKillRegState(KillY));
+  InsInstrs.push_back(MIB1);
+
+  MachineInstrBuilder MIB2 =
+    BuildMI(*MF, Root.getDebugLoc(), TII->get(Opcode), RegC)
+      .addReg(RegA, getKillRegState(KillA))
+      .addReg(NewVR, getKillRegState(true));
+  InsInstrs.push_back(MIB2);
+
+  // Record old instructions for deletion.
+  DelInstrs.push_back(&Prev);
+  DelInstrs.push_back(&Root);
+}
+
+void X86InstrInfo::genAlternativeCodeSequence(
+    MachineInstr &Root,
+    MachineCombinerPattern::MC_PATTERN Pattern,
+    SmallVectorImpl<MachineInstr *> &InsInstrs,
+    SmallVectorImpl<MachineInstr *> &DelInstrs,
+    DenseMap<unsigned, unsigned> &InstIdxForVirtReg) const {
+  MachineRegisterInfo &MRI = Root.getParent()->getParent()->getRegInfo();
+
+  // Select the previous instruction in the sequence based on the input pattern.
+  MachineInstr *Prev = nullptr;
+  if (Pattern == MachineCombinerPattern::MC_REASSOC_AX_BY ||
+      Pattern == MachineCombinerPattern::MC_REASSOC_XA_BY)
+    Prev = MRI.getUniqueVRegDef(Root.getOperand(1).getReg());
+  else if (Pattern == MachineCombinerPattern::MC_REASSOC_AX_YB ||
+           Pattern == MachineCombinerPattern::MC_REASSOC_XA_YB)
+    Prev = MRI.getUniqueVRegDef(Root.getOperand(2).getReg());
+  else
+    assert("Unknown pattern for machine combiner");
+
+  reassociateOps(Root, *Prev, Pattern, InsInstrs, DelInstrs, InstIdxForVirtReg);
+  return;
+}
+
 namespace {
   /// Create Global Base Reg pass. This initializes the PIC
   /// global base register for x86-32.
   struct CGBR : public MachineFunctionPass {
     static char ID;
     CGBR() : MachineFunctionPass(ID) {}

     bool runOnMachineFunction(MachineFunction &MF) override {

[175 lines not shown]

llvm/trunk/lib/Target/X86/X86TargetMachine.cpp

[18 lines not shown]

 #include "llvm/IR/Function.h"
 #include "llvm/IR/LegacyPassManager.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FormattedStream.h"
 #include "llvm/Support/TargetRegistry.h"
 #include "llvm/Target/TargetOptions.h"
 using namespace llvm;

+static cl::opt<bool> EnableMachineCombinerPass("x86-machine-combiner",
+                               cl::desc("Enable the machine combiner pass"),
+                               cl::init(true), cl::Hidden);
+
 extern "C" void LLVMInitializeX86Target() {
   // Register the target.
   RegisterTargetMachine<X86TargetMachine> X(TheX86_32Target);
   RegisterTargetMachine<X86TargetMachine> Y(TheX86_64Target);
 }

 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
   if (TT.isOSBinFormatMachO()) {

[184 lines not shown; context: bool X86PassConfig::addInstSelector() {]

   addPass(createX86GlobalBaseRegPass());

   return false;
 }

 bool X86PassConfig::addILPOpts() {
   addPass(&EarlyIfConverterID);
+  if (EnableMachineCombinerPass)
+    addPass(&MachineCombinerID);
   return true;
 }

 bool X86PassConfig::addPreISel() {
   // Only add this pass for 32-bit x86 Windows.
   Triple TT(TM->getTargetTriple());
   if (TT.isOSWindows() && TT.getArch() == Triple::x86)
     addPass(createX86WinEHStatePass());

[25 lines not shown]

llvm/trunk/test/CodeGen/X86/fp-fast.ll

[108 lines not shown]

 ; CHECK: # BB#0:
 ; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT: retq
   %t1 = fsub float -0.0, %a
   %t2 = fadd float %a, %t1
   ret float %t2
 }

+; Verify that the first two adds are independent regardless of how the inputs are
+; commuted. The destination registers are used as source registers for the third add.
+
+define float @reassociate_adds1(float %x0, float %x1, float %x2, float %x3) {
+; CHECK-LABEL: reassociate_adds1:
+; CHECK: # BB#0:
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: vaddss %xmm3, %xmm2, %xmm1
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: retq
+  %t0 = fadd float %x0, %x1
+  %t1 = fadd float %t0, %x2
+  %t2 = fadd float %t1, %x3
+  ret float %t2
+}
+
+define float @reassociate_adds2(float %x0, float %x1, float %x2, float %x3) {
+; CHECK-LABEL: reassociate_adds2:
+; CHECK: # BB#0:
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: vaddss %xmm3, %xmm2, %xmm1
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: retq
+  %t0 = fadd float %x0, %x1
+  %t1 = fadd float %x2, %t0
+  %t2 = fadd float %t1, %x3
+  ret float %t2
+}
+
+define float @reassociate_adds3(float %x0, float %x1, float %x2, float %x3) {
+; CHECK-LABEL: reassociate_adds3:
+; CHECK: # BB#0:
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: vaddss %xmm3, %xmm2, %xmm1
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: retq
+  %t0 = fadd float %x0, %x1
+  %t1 = fadd float %t0, %x2
+  %t2 = fadd float %x3, %t1
+  ret float %t2
+}
+
+define float @reassociate_adds4(float %x0, float %x1, float %x2, float %x3) {
+; CHECK-LABEL: reassociate_adds4:
+; CHECK: # BB#0:
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: vaddss %xmm3, %xmm2, %xmm1
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: retq
+  %t0 = fadd float %x0, %x1
+  %t1 = fadd float %x2, %t0
+  %t2 = fadd float %x3, %t1
+  ret float %t2
+}
+
+; Verify that we reassociate some of these ops. The optimal balanced tree of adds is not
+; produced because that would cost more compile time.
+
+define float @reassociate_adds5(float %x0, float %x1, float %x2, float %x3, float %x4, float %x5, float %x6, float %x7) {
+; CHECK-LABEL: reassociate_adds5:
+; CHECK: # BB#0:
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: vaddss %xmm3, %xmm2, %xmm1
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: vaddss %xmm5, %xmm4, %xmm1
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: vaddss %xmm7, %xmm6, %xmm1
+; CHECK-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: retq
+  %t0 = fadd float %x0, %x1
+  %t1 = fadd float %t0, %x2
+  %t2 = fadd float %t1, %x3
+  %t3 = fadd float %t2, %x4
+  %t4 = fadd float %t3, %x5
+  %t5 = fadd float %t4, %x6
+  %t6 = fadd float %t5, %x7
+  ret float %t6
+}
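
The shape checked by reassociate_adds5 can be compared against the serial chain and a fully balanced tree with a small depth calculation. This is a toy model, not LLVM code: it assumes unit-latency adds and inputs available at depth 0, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Toy model: an add completes one cycle after its later operand.
int addDepth(int LhsDepth, int RhsDepth) {
  return std::max(LhsDepth, RhsDepth) + 1;
}

// Depth of ((x0 + x1) + x2) + ... as written in the IR.
int serialDepth(int NumInputs) {
  assert(NumInputs >= 2 && "need at least one add");
  return NumInputs - 1;
}

// Depth of the shape in the CHECK lines: x0 + x1 first, then each further
// pair of inputs is summed on its own and added to the running total.
int pairwiseDepth(int NumInputs) {
  assert(NumInputs >= 2 && "need at least one add");
  int Acc = addDepth(0, 0);
  int Next = 2;
  while (Next < NumInputs) {
    if (Next + 1 < NumInputs) {
      Acc = addDepth(Acc, addDepth(0, 0));
      Next += 2;
    } else {
      Acc = addDepth(Acc, 0); // odd input left over
      Next += 1;
    }
  }
  return Acc;
}

// Depth of a fully balanced tree of adds, for comparison.
int balancedDepth(int NumInputs) {
  assert(NumInputs >= 2 && "need at least one add");
  int Depth = 0;
  for (int Width = NumInputs; Width > 1; Width = (Width + 1) / 2)
    ++Depth;
  return Depth;
}
```

For the eight inputs of reassociate_adds5, this model gives a depth of 7 for the serial chain, 4 for the pairwise shape, and 3 for the balanced tree, consistent with the test comment that the optimal tree is not produced.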
