This is an archive of the discontinued LLVM Phabricator instance.


Differential D9360

Add a speculative execution pass
ClosedPublic

Authored by broune on Apr 29 2015, 5:43 PM.


Details

Reviewers

eliben
dberlin
meheff
jingyue
hfinkel

Commits

rG154eb5aa1d9c: Add a speculative execution pass
rL237459: Add a speculative execution pass

Summary

This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.
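
To make the if-then (triangle) shape concrete, here is a minimal source-level sketch of the kind of hoisting described above. The function names and the arithmetic are made up for illustration; the pass itself works on LLVM IR rather than C++ source, and it hoists only when its cost limits allow.

```cpp
#include <cassert>

// Before: the arithmetic sits inside the "then" block of the triangle.
unsigned beforeHoisting(bool Cond, unsigned X, unsigned Y) {
  unsigned Result = 0;
  if (Cond) {
    unsigned T = X * Y + 1; // evaluated only when Cond is true
    Result = T;
  }
  return Result;
}

// After: the side-effect-free arithmetic has been hoisted above the branch,
// so it is evaluated speculatively whether or not Cond is true.
unsigned afterHoisting(bool Cond, unsigned X, unsigned Y) {
  unsigned T = X * Y + 1; // hoisted; unsigned arithmetic cannot trap
  unsigned Result = 0;
  if (Cond)
    Result = T;
  return Result;
}

// Both versions must agree on every input.
void checkTriangleExample() {
  assert(beforeHoisting(true, 3, 4) == 13);
  assert(afterHoisting(true, 3, 4) == 13);
  assert(beforeHoisting(false, 3, 4) == 0);
  assert(afterHoisting(false, 3, 4) == 0);
}
```

Only the assignment stays under the branch in the second version, which is the sort of shape later passes can turn into a select or simplify further.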

Credit goes to Jingyue Wu for writing an earlier version of this pass.

Patched by Bjarke Roune.


Event Timeline

broune updated this revision to Diff 24670. Apr 29 2015, 5:43 PM

broune retitled this revision to "Add a speculative execution pass".

broune updated this object.

broune edited the test plan for this revision.

broune added reviewers: meheff, jingyue, eliben, hfinkel, dberlin.

broune added subscribers: Unknown Object (MLST), jholewinski.

It seems this pass has a good deal in common with SimplifyCFG. How is this
pass different from bumping up the cost threshold in that pass?

In D9360#163642, @majnemer wrote:

It seems this pass has a good deal in common with SimplifyCFG. How is this
pass different from bumping up the cost threshold in that pass?

I assume you are referring to the functionality in the function SpeculativelyExecuteBB in SimplifyCFG. As I read what it does, SimplifyCFG will not speculate if no selects are introduced and it will speculate at most one instruction. It also will not speculate if there is a value defined in the if-block that is only used in the then-block. These restrictions make sense since the speculation in SimplifyCFG seems aimed at introducing cheap selects, while this pass is intended to do more aggressive speculation while counting on later passes to either capitalize on that or clean it up.

What this pass does could be merged into SimplifyCFG, though at first look it doesn't appear to me to be an easy fit. We'd also need two different configurations of SimplifyCFG since aggressive speculation at every point where SimplifyCFG runs would probably be too much; I've experimented with adding this pass in multiple places simultaneously and that hasn't yielded good results so far.

jingyue added inline comments. May 5 2015, 10:52 AM

lib/Transforms/Scalar/SpeculativeExecution.cpp
61	Comment on why this threshold is necessary. My understanding is the NVIDIA driver only speculates a limited number of instructions. If too many instructions are left behind, the conditional basic block won't be executed in the SASS level.
70	Add a constructor that takes `spec-exec-max-speculation-cost` and `spec-exec-max-not-hoisted` as parameters. This allows embedded uses of LLVM (e.g., users can programmatically create these passes with their desired thresholds).
114	Can we make a new interface `getSingleSuccessor()`?
142	Should we consider `ConstantExpr` free?
181	TotalSpeculationCost
182	for (auto &I : FromBlock)

jingyue added inline comments. May 5 2015, 1:25 PM

lib/Transforms/Scalar/SpeculativeExecution.cpp
61	I meant "won't be speculatively executed".

meheff added inline comments. May 5 2015, 2:07 PM

lib/Transforms/Scalar/SpeculativeExecution.cpp
65	The double negative at the end of this description confused me as written. Maybe: "Speculative execution is not applied to basic blocks where the number of ... exceeds this limit." Could change the other option description to match.
147	Stale comment?
160	Instruction::And? Can you use TTI instruction costs here?

hfinkel added inline comments. May 5 2015, 2:28 PM

include/llvm/Transforms/Scalar.h
426	This pass does not speculatively execute anything. Please reword.
lib/Transforms/IPO/PassManagerBuilder.cpp
232	This is pretty early in the pipeline, why? Do you want LICM to run first?
lib/Transforms/Scalar/SpeculativeExecution.cpp
148	And should this depend on the number of non-constant indices?

Added and used BasicBlock::getSingleSuccessor().
Improved flag descriptions.

Use getSingleSuccessor() more.
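
The difference between the new `getSingleSuccessor()` and the existing `getUniqueSuccessor()` is easy to miss: "single" means exactly one outgoing edge, while "unique" allows several edges as long as they all reach the same block (for example a switch whose cases share a destination). The sketch below models that distinction with a hypothetical `Block` struct; it is not the LLVM implementation, only an illustration of the two definitions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::BasicBlock: only the outgoing CFG edges.
struct Block {
  std::vector<Block *> Succs; // one entry per edge; duplicates are allowed
};

// "Single" successor: the block has exactly one outgoing edge.
Block *getSingleSuccessor(const Block &B) {
  return B.Succs.size() == 1 ? B.Succs.front() : nullptr;
}

// "Unique" successor: at least one edge, and every edge has the same target.
Block *getUniqueSuccessor(const Block &B) {
  if (B.Succs.empty())
    return nullptr;
  Block *First = B.Succs.front();
  for (Block *S : B.Succs)
    if (S != First)
      return nullptr;
  return First;
}

void checkSuccessorExample() {
  Block Target, Other;
  Block Branch;
  Branch.Succs = {&Target};          // unconditional branch
  Block Switch;
  Switch.Succs = {&Target, &Target}; // two edges, same destination
  Block CondBr;
  CondBr.Succs = {&Target, &Other};  // two different destinations
  Block Ret;                         // no successors at all

  assert(getSingleSuccessor(Branch) == &Target);
  assert(getUniqueSuccessor(Branch) == &Target);
  assert(getSingleSuccessor(Switch) == nullptr);
  assert(getUniqueSuccessor(Switch) == &Target);
  assert(getSingleSuccessor(CondBr) == nullptr);
  assert(getUniqueSuccessor(CondBr) == nullptr);
  assert(getSingleSuccessor(Ret) == nullptr);
  assert(getUniqueSuccessor(Ret) == nullptr);
}
```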

include/llvm/Transforms/Scalar.h
426	I'm very happy to correct any mistaken terminology. I'm unclear on whether you object to "speculation" or "execution" or both and what you would prefer instead? I note that SimplifyCFG has a function SpeculativelyExecuteBB that also moves instructions out of branches and I did see some sources with a similar terminology, e.g. "Software Speculative Execution [...] When the compiler generates speculative code, it moves instructions in front of a branch that previously had protected them from causing exceptions." http://techpubs.sgi.com/library/dynaweb_docs/0650/SGI_Developer/books/OrOn2_PfTune/sgi_html/ch05.html
lib/Transforms/IPO/PassManagerBuilder.cpp
232	I did start with a late placement, right before the nvptx backend runs. I also tried right before LICM and a bit after LICM and other places. The current placement is what worked best out of those things that I tried. My intuition of why this early placement works well is that we speculate, optimize and then soon after those instructions that were speculated but did not enable further optimization get put back by InstCombine.
lib/Transforms/Scalar/SpeculativeExecution.cpp
61	Done. The base reason for this limit is that speculating just a few instructions from a larger block tends not to be profitable. That could in part be due to an interaction with the NVIDIA driver, even though this pass is run quite early in the compiler, putting it later makes our benchmarks worse, and speculated instructions that are not further optimized tend to be sunk back by InstCombine to where they were before speculation. In the best case that I've seen for this pass, speculation allowed further follow-on optimizations, collapsing a lot of control flow and bit-fiddling instructions into fewer instructions; that sort of thing is less likely to happen when speculating just a few instructions from a larger block, so that's another reason for the limit. The comment I added fits both reasons.
65	Done.
70	This doesn't seem to be common for the other passes that I looked at in Scalar/ that have flags and it seems like it would be hard to maintain when options are added or removed, so I'd prefer to postpone adding that until someone uses it. I could add it now if you'll be using it right away?
114	Nice idea. Done.
142	Do you mean should we return 0 if all operands to an instruction are ConstantExpr? I'd think most such cases would be folded already, though maybe not for GEP, since I believe that it's required for changing types, so that does seem reasonable to catch.
147	Done.
148	I'm especially keen to speculate GEPs, as that was part of the original motivation for the pass, but you're right that this might not be good. I'll try having the cost be the number of non-constant indices + 1 and get back to you.
160	Since the speculated instructions can be sunk back or optimized later, the optimal score isn't necessarily the time it would take to execute the instructions, it's more propensity to cause good things and not cause bad things when speculated. Though TTI could still be a better approximation to that than what I've got here, so I'll try it both ways and get back to you.
181	Done.
182	Done.
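
As an illustration of the GEP cost idea floated in the reply to line 148 (the number of non-constant indices plus one), here is a small standalone sketch. `GepIndex` and `gepSpeculationCost` are hypothetical names; the sketch only shows the arithmetic being proposed, not code from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one GEP index: only its constness matters here.
struct GepIndex {
  bool IsConstant;
};

// Proposed cost: 1 for the GEP itself, plus 1 per non-constant index.
unsigned gepSpeculationCost(const std::vector<GepIndex> &Indices) {
  unsigned Cost = 1;
  for (const GepIndex &Idx : Indices)
    if (!Idx.IsConstant)
      ++Cost;
  return Cost;
}

void checkGepCostExample() {
  assert(gepSpeculationCost({}) == 1);
  assert(gepSpeculationCost({{true}, {true}}) == 1);
  assert(gepSpeculationCost({{true}, {false}}) == 2);
  assert(gepSpeculationCost({{false}, {false}}) == 3);
}
```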

jingyue added inline comments. May 6 2015, 10:15 AM

lib/Transforms/Scalar/SpeculativeExecution.cpp
70	Fine with me. I've seen CFGSimplifyPass, JumpThreading and LoopUnrolling have pass parameters.
160	If TTI's cost model works for you, I'd prefer using that. No need to invent another cost model if an existing one already works.

Use TTI for speculation cost of whitelisted instructions.

I promised to get back to you on a few things. I tried these versions overnight:

1. no changes,
2. as 1, but with the cost of a GEP as the number of non-constant indices + 1,
3. as 1, but with the cost function entirely replaced with TTI->getUserCost,
4. as 3, but still returning UINT_MAX when 1 does.

1, 2 and 4 were about the same while 3 was worse than the others. So I propose to go with 4 and I have updated the patch to that.
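
A rough sketch of how a per-instruction cost and the two limits could interact, assuming the defaults shown in the diff (7 for `spec-exec-max-speculation-cost`, 5 for `spec-exec-max-not-hoisted`). `Inst` and `withinSpeculationLimits` are hypothetical names, the costs are plain numbers rather than TTI results, and the real pass has further conditions that this sketch ignores.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Defaults taken from the cl::opt declarations in the diff.
constexpr unsigned MaxSpeculationCost = 7; // spec-exec-max-speculation-cost
constexpr unsigned MaxNotHoisted = 5;      // spec-exec-max-not-hoisted

// Hypothetical instruction model: UINT_MAX marks an instruction that is not
// eligible for speculation and therefore stays behind in its block.
struct Inst {
  unsigned Cost;
};

// Returns true if the block stays within both limits.
bool withinSpeculationLimits(const std::vector<Inst> &Block) {
  unsigned TotalCost = 0;
  unsigned NotHoisted = 0;
  for (const Inst &I : Block) {
    if (I.Cost == UINT_MAX) {
      if (++NotHoisted > MaxNotHoisted)
        return false; // too many instructions would be left behind
      continue;
    }
    // Compare before adding so the running total can never wrap around.
    if (I.Cost > MaxSpeculationCost - TotalCost)
      return false; // speculating this much is considered too risky
    TotalCost += I.Cost;
  }
  return true;
}

void checkLimitsExample() {
  std::vector<Inst> Cheap(3, Inst{2});  // total cost 6
  assert(withinSpeculationLimits(Cheap));
  std::vector<Inst> Costly(4, Inst{2}); // total cost 8
  assert(!withinSpeculationLimits(Costly));
  std::vector<Inst> Mixed(5, Inst{UINT_MAX});
  Mixed.push_back(Inst{1});             // 5 left behind, cost 1
  assert(withinSpeculationLimits(Mixed));
  Mixed.push_back(Inst{UINT_MAX});      // 6 left behind
  assert(!withinSpeculationLimits(Mixed));
}
```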

LGTM with a minor

lib/Transforms/Scalar/SpeculativeExecution.cpp
142	I was thinking of the case where `I` is a `ConstantExpr`, because `I` is a `User`, which can be an `Instruction` or a `ConstantExpr`. However, I later noticed you only call `ComputeSpeculationCost` with instructions, so we should restrict this function to take only an `Instruction`.

This revision is now accepted and ready to land. May 7 2015, 10:37 AM

LGTM

Change first parameter type of ComputeSpeculationCost to Instruction *.

lib/Transforms/Scalar/SpeculativeExecution.cpp
142	Done.

hfinkel added inline comments. May 14 2015, 11:41 AM

include/llvm/Transforms/Scalar.h
426	Please reword like this: // SpeculativeExecution - Aggressively hoist instructions to enable speculative execution on targets where branches are expensive. (The important part here is that the verb is 'hoist', not 'execute'.)
lib/Transforms/IPO/PassManagerBuilder.cpp
232	How exactly do you intend this to work? There are two or three options:
	1. Don't put it in the standard pipeline (assuming that GPU compilers have their own pipelines anyway and will use it in those custom pipelines only).
	2. Add some TTI hook to control whether or not the pass does anything.
	3. Add a variable to the pass manager controlling this (so that the frontend must decide).
	I prefer (2), but just leaving it here, as something intended to be dead (except for use of some command-line flag), is not a reasonable plan. The command-line flags are for debugging and testing, not for production pass-manager control.
lib/Transforms/Scalar/SpeculativeExecution.cpp
11	Same problem here: the pass does not execute anything, it hoists instructions to cause speculative execution. Please reword.

Remove speculation flag and call from PassManagerBuilder.
Update comments on speculation.

LGTM.

Fix typo.

broune added inline comments. May 14 2015, 5:21 PM

include/llvm/Transforms/Scalar.h
426	Done, thank you for the clarification.
lib/Transforms/IPO/PassManagerBuilder.cpp
232	I removed the flag and the code it controls, going for option 1 for now. Let me know if anyone interested in this pass needs another option.

jingyue updated this object. May 15 2015, 10:57 AM

jingyue closed this revision. May 15 2015, 10:58 AM

Revision Contents

Path                                             Size
include/llvm/IR/BasicBlock.h                     16 lines
include/llvm/InitializePasses.h                  1 line
include/llvm/LinkAllPasses.h                     1 line
include/llvm/Transforms/Scalar.h                 7 lines
lib/IR/BasicBlock.cpp                            8 lines
lib/Transforms/IPO/PassManagerBuilder.cpp        1 line
lib/Transforms/Scalar/CMakeLists.txt             1 line
lib/Transforms/Scalar/Scalar.cpp                 1 line
lib/Transforms/Scalar/SpeculativeExecution.cpp   232 lines
test/Transforms/SpeculativeExecution/spec.ll     195 lines

Diff 25829

include/llvm/IR/BasicBlock.h

@@ ... @@ public:
   /// Note that unique predecessor doesn't mean single edge, there can be
   /// multiple edges from the unique predecessor to this block (for example a
   /// switch statement with multiple cases having the same destination).
   BasicBlock *getUniquePredecessor();
   const BasicBlock *getUniquePredecessor() const {
     return const_cast<BasicBlock*>(this)->getUniquePredecessor();
   }

-  /// Return the successor of this block if it has a unique successor.
-  /// Otherwise return a null pointer. This method is analogous to
-  /// getUniquePredeccessor above.
+  /// \brief Return the successor of this block if it has a single successor.
+  /// Otherwise return a null pointer.
+  ///
+  /// This method is analogous to getSinglePredecessor above.
+  BasicBlock *getSingleSuccessor();
+  const BasicBlock *getSingleSuccessor() const {
+    return const_cast<BasicBlock*>(this)->getSingleSuccessor();
+  }
+
+  /// \brief Return the successor of this block if it has a unique successor.
+  /// Otherwise return a null pointer.
+  ///
+  /// This method is analogous to getUniquePredecessor above.
   BasicBlock *getUniqueSuccessor();
   const BasicBlock *getUniqueSuccessor() const {
     return const_cast<BasicBlock*>(this)->getUniqueSuccessor();
   }

   //===--------------------------------------------------------------------===//
   /// Instruction iterator methods
   ///

include/llvm/InitializePasses.h

@@ ... @@
 void initializeSimpleInlinerPass(PassRegistry&);
 void initializeShadowStackGCLoweringPass(PassRegistry&);
 void initializeRegisterCoalescerPass(PassRegistry&);
 void initializeSingleLoopExtractorPass(PassRegistry&);
 void initializeSinkingPass(PassRegistry&);
 void initializeSeparateConstOffsetFromGEPPass(PassRegistry &);
 void initializeSlotIndexesPass(PassRegistry&);
 void initializeSpillPlacementPass(PassRegistry&);
+void initializeSpeculativeExecutionPass(PassRegistry&);
 void initializeStackProtectorPass(PassRegistry&);
 void initializeStackColoringPass(PassRegistry&);
 void initializeStackSlotColoringPass(PassRegistry&);
 void initializeStraightLineStrengthReducePass(PassRegistry &);
 void initializeStripDeadDebugInfoPass(PassRegistry&);
 void initializeStripDeadPrototypesPassPass(PassRegistry&);
 void initializeStripDebugDeclarePass(PassRegistry&);
 void initializeStripNonDebugSymbolsPass(PassRegistry&);

include/llvm/LinkAllPasses.h

@@ ... @@ ForcePassLinking() {
   (void) llvm::createMemDepPrinter();
   (void) llvm::createInstructionSimplifierPass();
   (void) llvm::createLoopVectorizePass();
   (void) llvm::createSLPVectorizerPass();
   (void) llvm::createBBVectorizePass();
   (void) llvm::createPartiallyInlineLibCallsPass();
   (void) llvm::createScalarizerPass();
   (void) llvm::createSeparateConstOffsetFromGEPPass();
+  (void) llvm::createSpeculativeExecutionPass();
   (void) llvm::createRewriteSymbolsPass();
   (void) llvm::createStraightLineStrengthReducePass();
   (void) llvm::createMemDerefPrinter();
   (void) llvm::createFloat2IntPass();

   (void)new llvm::IntervalPartition();
   (void)new llvm::ScalarEvolution();
   ((llvm::Function*)nullptr)->viewCFGOnly();

include/llvm/Transforms/Scalar.h

@@ ... @@
 // SeparateConstOffsetFromGEP - Split GEPs for better CSE
 //
 FunctionPass *
 createSeparateConstOffsetFromGEPPass(const TargetMachine *TM = nullptr,
                                      bool LowerGEP = false);

 //===----------------------------------------------------------------------===//
 //
+// SpeculativeExecution - Aggressively hoist instructions to enable
+// speculative execution on targets where branches are expensive.
+//
+FunctionPass *createSpeculativeExecutionPass();
+
+//===----------------------------------------------------------------------===//
+//
 // LoadCombine - Combine loads into bigger loads.
 //
 BasicBlockPass *createLoadCombinePass();

 //===----------------------------------------------------------------------===//
 //
 // StraightLineStrengthReduce - This pass strength-reduces some certain
 // instruction patterns in straight-line code.

lib/IR/BasicBlock.cpp

@@ ... @@ for (;PI != E; ++PI) {
     if (*PI != PredBB)
       return nullptr;
     // The same predecessor appears multiple times in the predecessor list.
     // This is OK.
   }
   return PredBB;
 }

+BasicBlock *BasicBlock::getSingleSuccessor() {
+  succ_iterator SI = succ_begin(this), E = succ_end(this);
+  if (SI == E) return nullptr; // no successors
+  BasicBlock *TheSucc = *SI;
+  ++SI;
+  return (SI == E) ? TheSucc : nullptr /* multiple successors */;
+}
+
 BasicBlock *BasicBlock::getUniqueSuccessor() {
   succ_iterator SI = succ_begin(this), E = succ_end(this);
   if (SI == E) return NULL; // No successors
   BasicBlock *SuccBB = *SI;
   ++SI;
   for (;SI != E; ++SI) {
     if (*SI != SuccBB)
       return NULL;

lib/Transforms/IPO/PassManagerBuilder.cpp

@@ ... @@ if (OptLevel > 2)
     MPM.add(createArgumentPromotionPass()); // Scalarize uninlined fn args

   // Start of function pass.
   // Break up aggregate allocas, using SSAUpdater.
   if (UseNewSROA)
     MPM.add(createSROAPass(/*RequiresDomTree*/ false));
   else
     MPM.add(createScalarReplAggregatesPass(-1, false));

   MPM.add(createEarlyCSEPass()); // Catch trivial redundancies
   MPM.add(createJumpThreadingPass()); // Thread jumps.
   MPM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals
   MPM.add(createCFGSimplificationPass()); // Merge & remove BBs
   MPM.add(createInstructionCombiningPass()); // Combine silly seq's
   addExtensionsToPM(EP_Peephole, MPM);

   if (!DisableTailCalls)
     MPM.add(createTailCallEliminationPass()); // Eliminate tail calls
   MPM.add(createCFGSimplificationPass()); // Merge & remove BBs

lib/Transforms/Scalar/CMakeLists.txt

@@ ... @@ add_llvm_library(LLVMScalarOpts
   SROA.cpp
   SampleProfile.cpp
   Scalar.cpp
   ScalarReplAggregates.cpp
   Scalarizer.cpp
   SeparateConstOffsetFromGEP.cpp
   SimplifyCFGPass.cpp
   Sink.cpp
+  SpeculativeExecution.cpp
   StraightLineStrengthReduce.cpp
   StructurizeCFG.cpp
   TailRecursionElimination.cpp

   ADDITIONAL_HEADER_DIRS
   ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
   ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Scalar
   )

 add_dependencies(LLVMScalarOpts intrinsics_gen)

lib/Transforms/Scalar/Scalar.cpp

@@ ... @@ void llvm::initializeScalarOpts(PassRegistry &Registry) {
   initializeSROAPass(Registry);
   initializeSROA_DTPass(Registry);
   initializeSROA_SSAUpPass(Registry);
   initializeCFGSimplifyPassPass(Registry);
   initializeStructurizeCFGPass(Registry);
   initializeSinkingPass(Registry);
   initializeTailCallElimPass(Registry);
   initializeSeparateConstOffsetFromGEPPass(Registry);
+  initializeSpeculativeExecutionPass(Registry);
   initializeStraightLineStrengthReducePass(Registry);
   initializeLoadCombinePass(Registry);
   initializePlaceBackedgeSafepointsImplPass(Registry);
   initializePlaceSafepointsPass(Registry);
   initializeFloat2IntPass(Registry);
 }

 void LLVMInitializeScalarOpts(LLVMPassRegistryRef R) {

lib/Transforms/Scalar/SpeculativeExecution.cpp

This file was added.

//===- SpeculativeExecution.cpp ---------------------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This pass hoists instructions to enable speculative execution on
// targets where branches are expensive. This is aimed at GPUs. It
// currently works on simple if-then and if-then-else
// patterns.
//
// Removing branches is not the only motivation for this
// pass. E.g. consider this code and assume that there is no
// addressing mode for multiplying by sizeof(*a):
//
//   if (b > 0)
//     c = a[i + 1]
//   if (d > 0)
//     e = a[i + 2]
//
// turns into
//
//   p = &a[i + 1];
//   if (b > 0)
//     c = *p;
//   q = &a[i + 2];
//   if (d > 0)
//     e = *q;
//
// which could later be optimized to
//
//   r = &a[i];
//   if (b > 0)
//     c = r[1];
//   if (d > 0)
//     e = r[2];
//
// Later passes sink back much of the speculated code that did not enable
// further optimization.
//
//===----------------------------------------------------------------------===//

#include "llvm/ADT/SmallSet.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"

using namespace llvm;

#define DEBUG_TYPE "speculative-execution"

// The risk that speculation will not pay off increases with the
// number of instructions speculated, so we put a limit on that.
static cl::opt<unsigned> SpecExecMaxSpeculationCost(
    "spec-exec-max-speculation-cost", cl::init(7), cl::Hidden,
    cl::desc("Speculative execution is not applied to basic blocks where "
             "the cost of the instructions to speculatively execute "
             "exceeds this limit."));

// Speculating just a few instructions from a larger block tends not
// to be profitable and this limit prevents that. A reason for that is
// that small basic blocks are more likely to be candidates for
// further optimization.
static cl::opt<unsigned> SpecExecMaxNotHoisted(
    "spec-exec-max-not-hoisted", cl::init(5), cl::Hidden,
    cl::desc("Speculative execution is not applied to basic blocks where the "
             "number of instructions that would not be speculatively executed "
             "exceeds this limit."));

				class SpeculativeExecution : public FunctionPass {
				public:
				static char ID;
				SpeculativeExecution(): FunctionPass(ID) {}

				void getAnalysisUsage(AnalysisUsage &AU) const override;
				bool runOnFunction(Function &F) override;

				private:
				bool runOnBasicBlock(BasicBlock &B);
				bool considerHoistingFromTo(BasicBlock &FromBlock, BasicBlock &ToBlock);

				const TargetTransformInfo *TTI = nullptr;
				};

				char SpeculativeExecution::ID = 0;
				INITIALIZE_PASS_BEGIN(SpeculativeExecution, "speculative-execution",
				"Speculatively execute instructions", false, false)
				INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
				INITIALIZE_PASS_END(SpeculativeExecution, "speculative-execution",
				"Speculatively execute instructions", false, false)

				void SpeculativeExecution::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.addRequired<TargetTransformInfoWrapperPass>();
				}

				bool SpeculativeExecution::runOnFunction(Function &F) {
				if (skipOptnoneFunction(F))
				return false;

				TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

				bool Changed = false;
				for (auto& B : F) {
				Changed |= runOnBasicBlock(B);
				}
				return Changed;
				}
				jingyue: Can we make a new interface `getSingleSuccessor()`?
				broune (author): Nice idea. Done.

				bool SpeculativeExecution::runOnBasicBlock(BasicBlock &B) {
				BranchInst *BI = dyn_cast<BranchInst>(B.getTerminator());
				if (BI == nullptr)
				return false;

				if (BI->getNumSuccessors() != 2)
				return false;
				BasicBlock &Succ0 = *BI->getSuccessor(0);
				BasicBlock &Succ1 = *BI->getSuccessor(1);

				if (&B == &Succ0 || &B == &Succ1 || &Succ0 == &Succ1) {
				return false;
				}

				// Hoist from if-then (triangle).
				if (Succ0.getSinglePredecessor() != nullptr &&
				Succ0.getSingleSuccessor() == &Succ1) {
				return considerHoistingFromTo(Succ0, B);
				}

				// Hoist from if-else (triangle).
				if (Succ1.getSinglePredecessor() != nullptr &&
				Succ1.getSingleSuccessor() == &Succ0) {
				return considerHoistingFromTo(Succ1, B);
				}

				// Hoist from if-then-else (diamond), but only if it is equivalent to
				jingyue: Should we consider `ConstantExpr` free?
				broune (author): Do you mean should we return 0 if all operands to an instruction are ConstantExpr? I'd think most such cases would be folded already, though maybe not for GEP, since I believe that it's required for changing types, so that does seem reasonable to catch.
				jingyue: I was thinking of the case where `I` is a `ConstantExpr`, because `I` is a `User`, which can be an `Instruction` or a `ConstantExpr`. However, I later noticed you only call `ComputeSpeculationCost` with instructions. So we should restrict this function to take only `Instruction`.
				broune (author): Done.
				// an if-else or if-then due to one of the branches doing nothing.
				if (Succ0.getSinglePredecessor() != nullptr &&
				Succ1.getSinglePredecessor() != nullptr &&
				Succ1.getSingleSuccessor() != nullptr &&
				Succ1.getSingleSuccessor() != &B &&
				meheff: Stale comment?
				broune (author): Done.
				Succ1.getSingleSuccessor() == Succ0.getSingleSuccessor()) {
				hfinkel: And should this depend on the number of non-constant indices?
				broune (author): I'm especially keen to speculate GEPs, as that was part of the original motivation for the pass, but you're right that this might not be good. I'll try having the cost be the number of non-constant indices + 1 and get back to you.
				// If a block has only one instruction, then that is a terminator
				// instruction so that the block does nothing. This does happen.
				if (Succ1.size() == 1) // equivalent to if-then
				return considerHoistingFromTo(Succ0, B);
				if (Succ0.size() == 1) // equivalent to if-else
				return considerHoistingFromTo(Succ1, B);
				}

				return false;
				}

				static unsigned ComputeSpeculationCost(const Instruction *I,
				meheff: Instruction::And? Can you use TTI instruction costs here?
				broune (author): Since the speculated instructions can be sunk back or optimized later, the optimal score isn't necessarily the time it would take to execute the instructions; it's more the propensity to cause good things and not cause bad things when speculated. Though TTI could still be a better approximation to that than what I've got here, so I'll try it both ways and get back to you.
				jingyue: If TTI's cost model works for you, I'd prefer using that. No need to invent another cost model if an existing one already works.
				const TargetTransformInfo &TTI) {
				switch (Operator::getOpcode(I)) {
				case Instruction::GetElementPtr:
				case Instruction::Add:
				case Instruction::Mul:
				case Instruction::And:
				case Instruction::Or:
				case Instruction::Select:
				case Instruction::Shl:
				case Instruction::Sub:
				case Instruction::LShr:
				case Instruction::AShr:
				case Instruction::Xor:
				case Instruction::ZExt:
				case Instruction::SExt:
				return TTI.getUserCost(I);

				default:
				return UINT_MAX; // Disallow anything not whitelisted.
				}
				}
				jingyue: TotalSpeculationCost
				broune (author): Done.

				jingyue: `for (auto &I : FromBlock)`
				broune (author): Done.
				bool SpeculativeExecution::considerHoistingFromTo(BasicBlock &FromBlock,
				BasicBlock &ToBlock) {
				SmallSet<const Instruction *, 8> NotHoisted;
				const auto AllPrecedingUsesFromBlockHoisted = [&NotHoisted](User *U) {
				for (Value* V : U->operand_values()) {
				if (Instruction *I = dyn_cast<Instruction>(V)) {
				if (NotHoisted.count(I) > 0)
				return false;
				}
				}
				return true;
				};

				unsigned TotalSpeculationCost = 0;
				for (auto& I : FromBlock) {
				const unsigned Cost = ComputeSpeculationCost(&I, *TTI);
				if (Cost != UINT_MAX && isSafeToSpeculativelyExecute(&I) &&
				AllPrecedingUsesFromBlockHoisted(&I)) {
				TotalSpeculationCost += Cost;
				if (TotalSpeculationCost > SpecExecMaxSpeculationCost)
				return false; // too much to hoist
				} else {
				NotHoisted.insert(&I);
				if (NotHoisted.size() > SpecExecMaxNotHoisted)
				return false; // too much left behind
				}
				}

				if (TotalSpeculationCost == 0)
				return false; // nothing to hoist

				for (auto I = FromBlock.begin(); I != FromBlock.end();) {
				// We have to increment I before moving Current as moving Current
				// changes the list that I is iterating through.
				auto Current = I;
				++I;
				if (!NotHoisted.count(Current)) {
				Current->moveBefore(ToBlock.getTerminator());
				}
				}
				return true;
				}

				namespace llvm {

				FunctionPass *createSpeculativeExecutionPass() {
				return new SpeculativeExecution();
				}

				} // namespace llvm
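
The two limits in `considerHoistingFromTo` interact with the operand check, which can be easier to follow outside of LLVM. The following is a minimal standalone sketch of that decision, not the pass itself. `ToyInst`, `HoistPlan` and `planHoist` are hypothetical names introduced here; the cost field stands in for the value returned by `ComputeSpeculationCost`, and the safety flag stands in for `isSafeToSpeculativelyExecute`.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an instruction; this is not an LLVM type.
struct ToyInst {
  unsigned Cost;                      // UINT_MAX models "not whitelisted"
  bool SafeToSpeculate;               // models isSafeToSpeculativelyExecute
  std::vector<std::size_t> Operands;  // indices of earlier instructions in the same block
};

// Result of the decision: whether the block is accepted and what would move.
struct HoistPlan {
  bool Accepted;
  std::vector<std::size_t> Hoisted;
};

// Walks the block once, in order, mirroring the first loop of
// considerHoistingFromTo: an instruction is hoisted only if it is
// whitelisted, safe, and none of its in-block operands were left behind.
HoistPlan planHoist(const std::vector<ToyInst> &Block, unsigned MaxCost,
                    unsigned MaxNotHoisted) {
  std::set<std::size_t> NotHoisted;
  std::vector<std::size_t> Hoisted;
  unsigned TotalCost = 0;

  for (std::size_t Idx = 0; Idx < Block.size(); ++Idx) {
    const ToyInst &I = Block[Idx];

    bool OperandsHoisted = true;
    for (std::size_t Op : I.Operands)
      if (NotHoisted.count(Op) > 0)
        OperandsHoisted = false;

    if (I.Cost != UINT_MAX && I.SafeToSpeculate && OperandsHoisted) {
      TotalCost += I.Cost;
      if (TotalCost > MaxCost)
        return HoistPlan{false, {}};  // too much to hoist
      Hoisted.push_back(Idx);
    } else {
      NotHoisted.insert(Idx);
      if (NotHoisted.size() > MaxNotHoisted)
        return HoistPlan{false, {}};  // too much left behind
    }
  }

  if (TotalCost == 0)
    return HoistPlan{false, {}};  // nothing to hoist
  return HoistPlan{true, Hoisted};
}
```

In this model the terminator is simply another instruction with cost `UINT_MAX`, so it counts toward the not-hoisted limit, as it does in the loop above. Assuming a cost of 1 per `add` (the real value comes from `TTI.getUserCost` and is target-dependent), a block of five independent adds is rejected with a cost limit of 4, which is the situation the `costTooHigh` test sets up.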

test/Transforms/SpeculativeExecution/spec.ll

This file was added.

				; RUN: opt < %s -S -speculative-execution \
				; RUN: -spec-exec-max-speculation-cost 4 -spec-exec-max-not-hoisted 3 \
				; RUN: | FileCheck %s

				target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"

				; Hoist in if-then pattern.
				define void @ifThen() {
				; CHECK-LABEL: @ifThen(
				; CHECK: %x = add i32 2, 3
				; CHECK: br i1 true
				br i1 true, label %a, label %b
				; CHECK: a:
				a:
				%x = add i32 2, 3
				; CHECK: br label
				br label %b
				; CHECK: b:
				b:
				; CHECK: ret void
				ret void
				}

				; Hoist in if-else pattern.
				define void @ifElse() {
				; CHECK-LABEL: @ifElse(
				; CHECK: %x = add i32 2, 3
				; CHECK: br i1 true
				br i1 true, label %b, label %a
				; CHECK: a:
				a:
				%x = add i32 2, 3
				; CHECK: br label
				br label %b
				; CHECK: b:
				b:
				; CHECK: ret void
				ret void
				}

				; Hoist in if-then-else pattern if it is equivalent to if-then.
				define void @ifElseThenAsIfThen() {
				; CHECK-LABEL: @ifElseThenAsIfThen(
				; CHECK: %x = add i32 2, 3
				; CHECK: br
				br i1 true, label %a, label %b
				; CHECK: a:
				a:
				%x = add i32 2, 3
				; CHECK: br label
				br label %c
				; CHECK: b:
				b:
				br label %c
				; CHECK: c
				c:
				ret void
				}

				; Hoist in if-then-else pattern if it is equivalent to if-else.
				define void @ifElseThenAsIfElse() {
				; CHECK-LABEL: @ifElseThenAsIfElse(
				; CHECK: %x = add i32 2, 3
				; CHECK: br
				br i1 true, label %b, label %a
				; CHECK: a:
				a:
				%x = add i32 2, 3
				; CHECK: br label
				br label %c
				; CHECK: b:
				b:
				br label %c
				; CHECK: c
				c:
				ret void
				}

				; Do not hoist if-then-else pattern if it is not equivalent to if-then
				; or if-else.
				define void @ifElseThen() {
				; CHECK-LABEL: @ifElseThen(
				; CHECK: br
				br i1 true, label %a, label %b
				; CHECK: a:
				a:
				; CHECK: %x = add
				%x = add i32 2, 3
				; CHECK: br label
				br label %c
				; CHECK: b:
				b:
				; CHECK: %y = add
				%y = add i32 2, 3
				br label %c
				; CHECK: c
				c:
				ret void
				}

				; Do not hoist loads and do not hoist an instruction past a definition of
				; an operand.
				define void @doNotHoistPastDef() {
				; CHECK-LABEL: @doNotHoistPastDef(
				br i1 true, label %b, label %a
				; CHECK-NOT: load
				; CHECK-NOT: add
				; CHECK: a:
				a:
				; CHECK: %def = load
				%def = load i32, i32* null
				; CHECK: %use = add
				%use = add i32 %def, 0
				br label %b
				; CHECK: b:
				b:
				ret void
				}

				; Case with nothing to speculate.
				define void @nothingToSpeculate() {
				; CHECK-LABEL: @nothingToSpeculate(
				br i1 true, label %b, label %a
				; CHECK: a:
				a:
				; CHECK: %def = load
				%def = load i32, i32* null
				br label %b
				; CHECK: b:
				b:
				ret void
				}

				; Still hoist if an operand is defined before the block or is itself hoisted.
				define void @hoistIfNotPastDef() {
				; CHECK-LABEL: @hoistIfNotPastDef(
				; CHECK: %x = load
				%x = load i32, i32* null
				; CHECK: %y = add i32 %x, 1
				; CHECK: %z = add i32 %y, 1
				; CHECK: br
				br i1 true, label %b, label %a
				; CHECK: a:
				a:
				%y = add i32 %x, 1
				%z = add i32 %y, 1
				br label %b
				; CHECK: b:
				b:
				ret void
				}

				; Do not hoist if the speculation cost is too high.
				define void @costTooHigh() {
				; CHECK-LABEL: @costTooHigh(
				; CHECK: br
				br i1 true, label %b, label %a
				; CHECK: a:
				a:
				; CHECK: %r1 = add
				%r1 = add i32 1, 1
				; CHECK: %r2 = add
				%r2 = add i32 1, 1
				; CHECK: %r3 = add
				%r3 = add i32 1, 1
				; CHECK: %r4 = add
				%r4 = add i32 1, 1
				; CHECK: %r5 = add
				%r5 = add i32 1, 1
				br label %b
				; CHECK: b:
				b:
				ret void
				}

				; Do not hoist if too many instructions are left behind.
				define void @tooMuchLeftBehind() {
				; CHECK-LABEL: @tooMuchLeftBehind(
				; CHECK: br
				br i1 true, label %b, label %a
				; CHECK: a:
				a:
				; CHECK: %x = load
				%x = load i32, i32* null
				; CHECK: %r1 = add
				%r1 = add i32 %x, 1
				; CHECK: %r2 = add
				%r2 = add i32 %x, 1
				; CHECK: %r3 = add
				%r3 = add i32 %x, 1
				br label %b
				; CHECK: b:
				b:
				ret void
				}
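
The first five tests above exercise the shape matching in `runOnBasicBlock`: two triangles, two diamonds with an empty arm, and one diamond with work in both arms. As a rough illustration of which block the pass considers hoisting from in each case, here is a standalone sketch over a toy CFG. `ToyBlock`, `ToyCFG`, `countPredEdges`, `singleSucc` and `hoistSource` are hypothetical names, not LLVM APIs; `countPredEdges(...) == 1` and `singleSucc` only approximate `getSinglePredecessor` and `getSingleSuccessor`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy basic block: successor names plus instruction count
// (the count includes the terminator, so 1 means "does nothing").
struct ToyBlock {
  std::vector<std::string> Succs;
  unsigned NumInsts;
};
using ToyCFG = std::map<std::string, ToyBlock>;

// Number of CFG edges that enter block B.
static unsigned countPredEdges(const ToyCFG &G, const std::string &B) {
  unsigned N = 0;
  for (const auto &KV : G)
    for (const auto &S : KV.second.Succs)
      if (S == B)
        ++N;
  return N;
}

// The only successor of B, or "" if B does not have exactly one.
static std::string singleSucc(const ToyCFG &G, const std::string &B) {
  const std::vector<std::string> &S = G.at(B).Succs;
  return S.size() == 1 ? S[0] : std::string();
}

// Returns the name of the block to consider hoisting from into B,
// or "" if B does not head a supported triangle or diamond.
std::string hoistSource(const ToyCFG &G, const std::string &B) {
  const std::vector<std::string> &Succs = G.at(B).Succs;
  if (Succs.size() != 2)
    return "";
  const std::string &S0 = Succs[0];
  const std::string &S1 = Succs[1];
  if (B == S0 || B == S1 || S0 == S1)
    return "";

  const bool S0OnePred = countPredEdges(G, S0) == 1;
  const bool S1OnePred = countPredEdges(G, S1) == 1;

  // Triangles: one successor falls through to the other.
  if (S0OnePred && singleSucc(G, S0) == S1)
    return S0;
  if (S1OnePred && singleSucc(G, S1) == S0)
    return S1;

  // Diamond, accepted only when one arm holds nothing but its terminator.
  const std::string Join = singleSucc(G, S1);
  if (S0OnePred && S1OnePred && !Join.empty() && Join != B &&
      Join == singleSucc(G, S0)) {
    if (G.at(S1).NumInsts == 1)
      return S0;
    if (G.at(S0).NumInsts == 1)
      return S1;
  }
  return "";
}
```

Encoding `@ifThen` as `entry -> {a, b}`, `a -> {b}` gives `a` as the hoist source, while the `@ifElseThen` diamond, where both arms contain an `add`, gives no source at all. Whether anything is actually moved is then decided separately by the cost and left-behind limits.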

Revision Contents

Diff 25829

include/llvm/IR/BasicBlock.h

include/llvm/InitializePasses.h

include/llvm/LinkAllPasses.h

include/llvm/Transforms/Scalar.h

lib/IR/BasicBlock.cpp

lib/Transforms/IPO/PassManagerBuilder.cpp

lib/Transforms/Scalar/CMakeLists.txt

lib/Transforms/Scalar/Scalar.cpp

lib/Transforms/Scalar/SpeculativeExecution.cpp

test/Transforms/SpeculativeExecution/spec.ll
