This is an archive of the discontinued LLVM Phabricator instance.

Differential D19439

Optimization bisect support in X86-specific passes
ClosedPublic

Authored by andrew.w.kaylor on Apr 22 2016, 3:06 PM.

Details

Reviewers

Commits

rG2bee5ef462d1: Optimization bisect support in X86-specific passes
rL267608: Optimization bisect support in X86-specific passes

Summary

This patch adds opt-in calls to the x86-specific passes which can be skipped while still producing correct code.

The following passes are not opting in to bisection because they cannot be skipped:

FPS ("X86 FP Stackifier")
X86ExpandPseudo
WinEHStatePass
CGBR ("X86 PIC Global Base Reg Initialization")

Diff Detail

Repository: rL LLVM

Event Timeline

andrew.w.kaylor updated this revision to Diff 54735. Apr 22 2016, 3:06 PM

andrew.w.kaylor retitled this revision from to Optimization bisect support in X86-specific passes.

andrew.w.kaylor updated this object.

andrew.w.kaylor added a reviewer: DavidKreitzer.

andrew.w.kaylor set the repository for this revision to rL LLVM.

andrew.w.kaylor added a parent revision: D19172: New optimization bisect implementation (now modeled on optnone handling).

andrew.w.kaylor added a subscriber: llvm-commits.

My comments about vzeroupper make me wonder whether we want skipFunction to be able to make the distinction between skipping a pass for OptBisect and skipping a pass for -O0. I can certainly imagine wanting to run some functionally optional "optimization" passes at -O0, e.g. a pass that can improve the size/performance of the generated code w/o negatively affecting debuggability.

lib/Target/X86/X86FixupBWInsts.cpp
Line 138 (on Diff #54735): I see that you chose to put the skipFunction calls first before other pass-skipping checks. Did you give any thought to that placement and/or do this intentionally? The effect of putting the call earlier is that the bisection counts will be higher. For these skipFunction calls, it probably doesn't matter too much, but when we get around to adding shouldRunCase calls, we tend to want to delay the calls as long as possible to avoid wasting bisect numbers on optimization cases that get filtered out by subsequent safety/performance checks.

Along those lines, you might want to make LastBisectNum in OptBisect.h a 64-bit integer rather than a 32-bit integer. It would not surprise me if a 32-bit counter overflows for a large LTO compilation, especially if we are not careful with our calls to shouldRunCase.
lib/Target/X86/X86VZeroUpper.cpp
Line 258 (on Diff #54735): skipFunction will return true at -O0, right? That is not the behavior we want for the vzeroupper insertion pass. Leaving the machine in a VEX-256 "dirty" state at function call/return boundaries is a very bad idea, because it will result in AVX<=>SSE transition penalties for any subsequent transitions until the next vzeroupper.

The vzeroupper strategy that we defined at the initial implementation of AVX was to assume and preserve a zeroupper state at function entry/call/return boundaries so that AVX128<=>SSE transitions would incur no penalties. Think of it as a kind of "performance ABI". Having just one call to one routine that violates these rules can crater performance for the lifetime of a program.

So perhaps the best approach is to just make VZeroUpperInserter a required pass and document the rationale. I would have no objection to hooking this pass up to the optimization bisector for debugging purposes since the pass is optional from a functional perspective. But that might cause more confusion than it's worth.

In D19439#412264, @DavidKreitzer wrote:

My comments about vzeroupper make me wonder whether we want skipFunction to be able to make the distinction between skipping a pass for OptBisect and skipping a pass for -O0. I can certainly imagine wanting to run some functionally optional "optimization" passes at -O0, e.g. a pass that can improve the size/performance of the generated code w/o negatively affecting debuggability.

I think you're right about having "optimization" passes that we want to run at -O0. I would expect that to be uncommon enough and limited enough in scope that it's probably OK not to involve those passes in the bisect, the benefit being that -O0, "optnone" and the bisect all follow the same rules. We could revisit that later if it turns out to limit the usefulness of the bisect.

lib/Target/X86/X86FixupBWInsts.cpp
Line 138 (on Diff #54735): I started out always making the skip check the first thing that happens so that it doesn't creep into the logic of the run functions, but as I've been making more of these changes I've started putting it behind more trivial checks like this to trim the extra call in cases where bisection isn't being done. Given the logarithmic order of the bisection I wouldn't think it will make much difference at this level, but I suppose once we get into having all the cases instrumented those will add up quickly. It's definitely best to be consistent and I don't have any reason not to put the skip check behind trivial checks like this, so I'll standardize on that.
lib/Target/X86/X86VZeroUpper.cpp
Line 258 (on Diff #54735): It turns out that there is a lit test which verifies that nothing is skipped by the "optnone" attribute check that would otherwise be run at -O0, and I think it makes sense for opt bisect to follow the same rule. So given that we think VZeroUpper should be run at -O0, I'll remove this check.

Removed skip calls from passes that are run at -O0
Moved skip checks behind other single variable conditions

Thanks for the fixes, Andy.

Hmmm, it was probably unintentional for X86CallFrameOptimization to be run at -O0. (It was a consequence of this change: http://reviews.llvm.org/D18573.) I doubt we want to do that, because it potentially has an adverse effect on debugging. Based on a quick experiment, it looks like the optimization fails to detect the pattern in the -O0 machine IR anyway, but that is probably also coincidental. At any rate, don't worry about it for this patch. I'm doing some work in X86CallFrameOptimization and can add the code to make sure it gets disabled at -O0.

So this patch LGTM.

This revision is now accepted and ready to land. Apr 26 2016, 1:55 PM

Closed by commit rL267608: Optimization bisect support in X86-specific passes (authored by akaylor). Apr 26 2016, 2:50 PM

This revision was automatically updated to reflect the committed changes.

andrew.w.kaylor mentioned this in D23683: Include X86CallFrameOptimization in the opt-bisect process. Aug 18 2016, 11:40 AM

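Revision Contents

Path                                                  Size
llvm/trunk/lib/Target/X86/X86FixupBWInsts.cpp         2 lines
llvm/trunk/lib/Target/X86/X86FixupLEAs.cpp            3 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp            5 lines
llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp         3 lines
llvm/trunk/lib/Target/X86/X86PadShortFunction.cpp     3 lines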

Diff 55098

llvm/trunk/lib/Target/X86/X86FixupBWInsts.cpp

(132 lines not shown; context: private:)

   LivePhysRegs LiveRegs;
 };
 char FixupBWInstPass::ID = 0;
 }

 FunctionPass *llvm::createX86FixupBWInsts() { return new FixupBWInstPass(); }

 bool FixupBWInstPass::runOnMachineFunction(MachineFunction &MF) {
-  if (!FixupBWInsts)
+  if (!FixupBWInsts || skipFunction(*MF.getFunction()))
     return false;

   this->MF = &MF;
   TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();
   OptForSize = MF.getFunction()->optForSize();
   MLI = &getAnalysis<MachineLoopInfo>();
   LiveRegs.init(&TII->getRegisterInfo());

(144 more lines not shown)

llvm/trunk/lib/Target/X86/X86FixupLEAs.cpp

(156 lines not shown; context: case X86::ADD16rr_DB:)

     }
   }
   return TII->convertToThreeAddress(MFI, MBBI, nullptr);
 }

 FunctionPass *llvm::createX86FixupLEAs() { return new FixupLEAPass(); }

 bool FixupLEAPass::runOnMachineFunction(MachineFunction &Func) {
+  if (skipFunction(*Func.getFunction()))
+    return false;
+
   MF = &Func;
   const X86Subtarget &ST = Func.getSubtarget<X86Subtarget>();
   OptIncDec = !ST.slowIncDec() || Func.getFunction()->optForMinSize();
   OptLEA = ST.LEAusesAG() || ST.slowLEA();

   if (!OptLEA && !OptIncDec)
     return false;

(244 more lines not shown)

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

(7,371 lines not shown)

 llvm::createX86GlobalBaseRegPass() { return new CGBR(); }

 namespace {
   struct LDTLSCleanup : public MachineFunctionPass {
     static char ID;
     LDTLSCleanup() : MachineFunctionPass(ID) {}

     bool runOnMachineFunction(MachineFunction &MF) override {
+      if (skipFunction(*MF.getFunction()))
+        return false;
+
-      X86MachineFunctionInfo* MFI = MF.getInfo<X86MachineFunctionInfo>();
+      X86MachineFunctionInfo *MFI = MF.getInfo<X86MachineFunctionInfo>();
       if (MFI->getNumLocalDynamicTLSAccesses() < 2) {
         // No point folding accesses if there isn't at least two.
         return false;
       }

       MachineDominatorTree *DT = &getAnalysis<MachineDominatorTree>();
       return VisitNode(DT->getRootNode(), 0);
     }

(96 more lines not shown)

llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp

(610 lines not shown; context: bool OptimizeLEAPass::removeRedundantLEAs(MemOpMap &LEAs) {)

   return Changed;
 }

 bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {
   bool Changed = false;

   // Perform this optimization only if we care about code size.
-  if (DisableX86LEAOpt || !MF.getFunction()->optForSize())
+  if (DisableX86LEAOpt || skipFunction(*MF.getFunction()) ||
+      !MF.getFunction()->optForSize())
     return false;

   MRI = &MF.getRegInfo();
   TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();
   TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();

   // Process all basic blocks.
   for (auto &MBB : MF) {

(21 more lines not shown)

llvm/trunk/lib/Target/X86/X86PadShortFunction.cpp

(92 lines not shown)

 FunctionPass *llvm::createX86PadShortFunctions() {
   return new PadShortFunc();
 }

 /// runOnMachineFunction - Loop over all of the basic blocks, inserting
 /// NOOP instructions before early exits.
 bool PadShortFunc::runOnMachineFunction(MachineFunction &MF) {
+  if (skipFunction(*MF.getFunction()))
+    return false;
+
   if (MF.getFunction()->optForSize()) {
     return false;
   }

   STI = &MF.getSubtarget<X86Subtarget>();
   if (!STI->padShortFunctions())
     return false;

(110 more lines not shown)
