This is an archive of the discontinued LLVM Phabricator instance.

New code hoisting pass based on GVN
AbandonedPublic

Authored by sebpop on Apr 5 2016, 10:55 AM.

Details

Reviewers

chandlerc
• dberlin

Summary

This pass hoists common computations across branches that share a common immediate
dominator. Like early-cse, the primary goal of early-gvn is to reduce the size
of functions before the inlining heuristics run, to reduce the total cost of function
inlining. In some cases this pass also shortens the critical path by exposing
more ILP.

The pass addresses the comments from Daniel Berlin from the previous review at http://reviews.llvm.org/D18710:
in particular, the complexity of the pass is now O(N) with N the number of instructions in the blocks where code hoisting is applied.
The pass works on the existing GVN from trunk, and can be ported to Danny's NewGVN by updating the calls to the VN look-ups.

Passes llvm regression test and test-suite.

Pass written by:
Sebastian Pop
Aditya Kumar
Xiaoyu Hu
Brian Rzycki

Diff Detail

Event Timeline

sebpop updated this revision to Diff 52713.Apr 5 2016, 10:55 AM

sebpop retitled this revision from to New code hoisting pass based on GVN.

sebpop updated this object.

sebpop added reviewers: • dberlin, chandlerc, mcrosier.

sebpop set the repository for this revision to rL LLVM.

sebpop added subscribers: llvm-commits, flyingforyou, hiraditya and 2 others.

hxy9243 added a subscriber: hxy9243.Apr 5 2016, 11:03 AM

Some high level comments:

The code formatting seems bad. Please run clang-format at the least on the new code.
Thanks for addressing Danny's comments about the algorithmic complexity. Keeping this away from O(n^2) algorithms is really important.
You didn't address Danny's comment about providing motivating benchmark data showing the improvements this provides.
We also would likely need to see a reasonable analysis of the effect this has on compile time so we understand the tradeoff this is making. My suspicion is that it won't be the correct tradeoff for O2 if it is correct at any level. (See below.)

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
212–214	It makes very little sense IMO to do early-cse and then early-gvn... The only point of early-cse was to be a substantially lighter weight CSE than GVN. If you're going to run GVN anyways to do hoisting, just run GVN. Either way, I strongly suspect this should be conditioned on O3...

This revision now requires changes to proceed.Apr 7 2016, 1:26 AM

I just want to echo Chandler's comments about complexity:
Thank you for taking the time to do the work to not add another N^2 pass to
the compiler, even if it took you a bit more time/effort :)
(I'll take another gander at the patch for code review, but you should
address chandler's comments)

In D18798#394436, @dberlin wrote:

I just want to echo Chandler's comments about complexity:
Thank you for taking the time to do the work to not add another N^2 pass to
the compiler, even if it took you a bit more time/effort :)
(I'll take another gander at the patch for code review, but you should
address chandler's comments)

Sure, we will get back with useful numbers and fix formatting issues.

jevinskie added a subscriber: jevinskie.Apr 7 2016, 11:07 AM

In D18798#394160, @chandlerc wrote:

We also would likely need to see a reasonable analysis of the effect this
has on compile time so we understand the tradeoff this is making. My suspicion
is that it won't be the correct tradeoff for O2 if it is correct at any
level. (See below.) [...] It makes very little sense IMO to do early-cse and
then early-gvn... The only point of early-cse was to be a substantially lighter
weight CSE than GVN. If you're going to run GVN anyways to do hoisting, just
run GVN.

I think there is a misconception of "GVN == PRE": this is because GVN.cpp was
around for a while and the "GVN pass" is a "redundancy elimination" pass.

To clarify matters, GVN is an analysis pass, like SCEV or data dependence are
analysis passes. What GVN does is similar to what md5 does to a text file:
given an instruction, GVN returns a unique integer identifying what the compiler
safely estimates as identical computations.
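
To make the "GVN as an analysis" idea concrete, here is a minimal sketch of a value table. It is not LLVM's GVN::ValueTable: the names ToyValueTable, Expr, lookupOrAdd and fresh are hypothetical, and expressions are reduced to an opcode plus the value numbers of two operands.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, simplified expression: an opcode applied to two operands,
// each operand being identified by its own value number.
struct Expr {
  std::string Opcode;
  uint32_t LHS;
  uint32_t RHS;
  bool operator<(const Expr &O) const {
    return std::tie(Opcode, LHS, RHS) < std::tie(O.Opcode, O.LHS, O.RHS);
  }
};

// Toy value table (an assumption for illustration, not LLVM's class):
// structurally identical expressions receive the same number.
class ToyValueTable {
  std::map<Expr, uint32_t> Numbers;
  uint32_t Next = 1;

public:
  // Return the number already assigned to E, or assign a new one.
  uint32_t lookupOrAdd(const Expr &E) {
    auto It = Numbers.find(E);
    if (It != Numbers.end())
      return It->second;
    Numbers.emplace(E, Next);
    return Next++;
  }

  // Leaf values (arguments, constants) simply get a fresh number.
  uint32_t fresh() { return Next++; }
};
```

Two "add" expressions over the same operand numbers map to one value number, while a "mul" over the same operands gets a different one. A transform such as hoisting only needs to compare these integers.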

Our pass uses GVN as an analysis pass to perform code hoisting. CSE removes
redundant computations (the same expression computed twice on the same execution
path). Code hoisting does not remove redundancies: it replaces several identical
expressions executed exactly once on all execution paths with a single
expression placed in a common dominator block. Code hoisting, like CSE, is good
for code size reduction: it improves inlining heuristics by reducing the size of
function bodies.
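
As a source-level illustration of the difference (the pass itself works on LLVM IR, and the functions before and after below are hypothetical), the product a * b is computed exactly once on every execution path in both versions, so nothing is redundant in the CSE sense. Hoisting only reduces the number of static copies from two to one.

```cpp
#include <cassert>

// Before hoisting: each branch contains its own copy of a * b.
int before(bool c, int a, int b) {
  int r;
  if (c) {
    int t = a * b;
    r = t + 1;
  } else {
    int t = a * b;
    r = t - 1;
  }
  return r;
}

// After hoisting: a single copy of a * b sits in the common dominator
// of the two branches; the dynamic number of multiplications is unchanged.
int after(bool c, int a, int b) {
  int t = a * b;
  return c ? t + 1 : t - 1;
}
```

Both functions return the same value for every input; only the size of the function body changes, which is what the inlining heuristics observe.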

Over the C/C++ SPEC 2006 benchmarks, the GVN-hoisting pass removes 16161
instructions: 4622 loads and 11539 scalar instructions. Overall SPEC scores
improve with the patch, except for bzip2 and gcc:

CINT2006:
400.perlbench 2.4%
401.bzip2 -0.9%
403.gcc -1%
429.mcf 6%
445.gobmk 1%
456.hmmer 0%
458.sjeng 1%
462.libquantum 4.6%
464.h264ref 0.6%
471.omnetpp 1.5%
473.astar 1.9%
483.xalancbmk llvm trunk coredumps

CFP2006:
433.milc 0%
444.namd 0.8%
447.dealII 0%
450.soplex 2.8%
453.povray 1%
470.lbm 2.6%
482.sphinx3 3.9%

Compile time statistics for the c/c++ benchmarks of SPEC2006 show that overall
there are more functions inlined:

Without the patch:

Number of call sites deleted, not inlined: 20394
Number of functions deleted because all callers found: 70497
Number of functions inlined: 182119
Number of allocas merged together: 225
Number of caller-callers analyzed: 200361
Number of call sites analyzed: 445806

With the patch:

Number of call sites deleted, not inlined: 20419
Number of functions deleted because all callers found: 70502
Number of functions inlined: 182449
Number of allocas merged together: 227
Number of caller-callers analyzed: 200929
Number of call sites analyzed: 446858
Number of hoisted loads: 4622
Number of hoisted scalar instructions: 11539

Compilation time is not impacted when compiling the nightly test-suite with a clang
configured in release mode:

llvm-trunk $ /usr/bin/time ninja -j1
355.80user 16.30system 6:24.47elapsed 96%CPU (0avgtext+0avgdata 285796maxresident)k
64inputs+291368outputs (7major+10867361minor)pagefaults 0swaps

patch $ /usr/bin/time ninja -j1
355.34user 15.55system 6:24.26elapsed 96%CPU (0avgtext+0avgdata 286008maxresident)k
94568inputs+290792outputs (453major+10921164minor)pagefaults 0swaps

I have renamed the new pass GVN-hoist to avoid further confusion. I think it
would be good in the NewGVN that Danny is pushing to trunk to have a clear
distinction of the GVN analysis with a clear interface that can be used by
transforms other than GVN-PRE. IMO forcing the GVN.cpp file to only contain the
analysis and pushing the other transforms out in different files would achieve
more clarity in the use of GVN analysis.

mcrosier added inline comments.Apr 11 2016, 11:17 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
265	Should this be 'return hoistExpressions()'?

• dberlin added inline comments.Apr 11 2016, 11:38 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
113	You may just want to use df_iterator over the dom tree instead of recursion. It's easier to follow, and won't blow out the stack :) If you use recursion, you have to or together all these values.
173	This still value numbers bbs unnecessarily, and can try to do so multiple times depending on the branch structure. The way i suggested only value numbers a given BB once :) Let me suggest a simpler way than rewriting the algorithm entirely (as i suggested): value number everything up front, store a sorted set bbtovalues, keeping a list of value numbers that exist in each bb. Instead of walking instructions in this algorithm, intersect bbtovalues. For each VN in the intersection, see if the expressions for that VN that occur in each BB can be hoisted. (this will only ever walk the expressions you might hoist, instead of try to value number every expression again)
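
The "intersect bbtovalues" suggestion for line 173 can be sketched roughly as follows. This is only an illustration under assumptions: blocks are identified by name instead of BasicBlock pointers, value numbers are assumed to have been computed up front, and commonValueNumbers is a hypothetical helper, not code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

using BlockName = std::string;    // stand-in for a BasicBlock pointer
using VNSet = std::set<uint32_t>; // sorted value numbers seen in one block

// Return the value numbers that occur in both blocks. Only the expressions
// carrying these numbers need to be inspected as hoisting candidates.
std::vector<uint32_t>
commonValueNumbers(const std::map<BlockName, VNSet> &BBToValues,
                   const BlockName &BB1, const BlockName &BB2) {
  std::vector<uint32_t> Common;
  auto I1 = BBToValues.find(BB1);
  auto I2 = BBToValues.find(BB2);
  if (I1 == BBToValues.end() || I2 == BBToValues.end())
    return Common;
  // std::set iterates in sorted order, as std::set_intersection requires.
  std::set_intersection(I1->second.begin(), I1->second.end(),
                        I2->second.begin(), I2->second.end(),
                        std::back_inserter(Common));
  return Common;
}
```

With this shape, each block is value numbered once, and the hoisting step walks only the numbers in the intersection rather than every instruction of both branches.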

• dberlin added inline comments.Apr 11 2016, 11:40 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
173	Note: I realize you have tried to limit the branch structure to cases where you can guarantee you will only value number a given BB once. I'm saying "if anyone ever, in the future, wanted to extend this at all, they'd have to rewrite your entire algorithm right now" :)

hiraditya added inline comments.Apr 12 2016, 10:23 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
173	Thanks for the feedback. Yes, in the current implementation we value number each instruction only once. When an expression is hoisted still the value numbers are not invalidated because one of the expressions in the child branch gets hoisted. So, an extension of this algorithm could be to start hoisting equivalent (scalar) expressions across non-sibling branches. In that case, I think, we would still value number only once, if we hoist one of the expressions and delete the rest. For loads, or other expressions with side-effects, hoisting would require more complicated analysis across each edge dominated by the common dominator of the equivalent expressions, so maybe we could skip that to save compile time, I'm not very sure about this though. Could you give an example of a case when multiple value numbering of same expression will be required, that will be helpful.

hiraditya added inline comments.Apr 12 2016, 10:30 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
265	Yes, thanks for pointing out.

> Thanks for the feedback. Yes, in the current implementation we value
> number each instruction only once. When an expression is hoisted still the
> value numbers are not invalidated because one of the expressions in the
> child branch gets hoisted.

Right. This is a very trivial form of hoisting using the dominator tree.

> So, an extension of this algorithm could be to start hoisting equivalent
> (scalar) expressions across non-sibling branches.

This will not work if the branches are ever back edges or blocks with
multiple predecessors in general unless all predecessors are part of the
same branch of the dominator tree.

> In that case, I think, we would still value number only once,

Only if the conditions above are met (that's off the top of my head, i'm
pretty sure i could think of more).

> if we hoist one of the expressions and delete the rest. For loads, or
> other expressions with side-effects, hoisting would require more
> complicated analysis across each edge dominated by the common dominator of
> the equivalent expressions, so maybe we could skip that to save compile
> time, I'm not very sure about this though.

It's actually a pretty simple analysis, FWIW:

With memory-ssa, all loads have a "def" of the nearest thing above them
that kills them. So if the loads reach the same def, you can hoist them to
that def. If they do not, you cannot.
You can do extra work to see if you can somehow go back past that def if
you like.

> Could you give an example of a case when multiple value numbering of same
> expression will be required, that will be helpful.

Anything whose value depends on a phi node, directly or indirectly. For
multiple predecessor blocks, if you are trying to determine constant values
of things and not just sameness, it will also take multiple iterations to
get correct answers.

See, in general, the description of the RPO algorithm in
https://www.cs.rice.edu/~keith/Promo/CRPC-TR95636.pdf.gz
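
The memory-ssa rule in the reply above (loads that reach the same def can be hoisted to that def) might be sketched as below. This does not use the real MemorySSA API: ToyLoad and canHoistToCommonDef are hypothetical, the reaching def is modeled as a plain integer id, and the extra check that the loads read the same address (same pointer value number) is an assumption added so that the loads are actually equivalent.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one load: the value number of its pointer operand
// and the id of the nearest clobbering def above it (what a MemorySSA-like
// analysis would report).
struct ToyLoad {
  uint32_t PtrVN;
  uint32_t ReachingDef;
};

// True when every load reads the same address and reaches the same def.
// In that case the loads may be replaced by one load placed at that def.
bool canHoistToCommonDef(const std::vector<ToyLoad> &Loads) {
  if (Loads.empty())
    return false;
  const ToyLoad &First = Loads.front();
  for (const ToyLoad &L : Loads)
    if (L.PtrVN != First.PtrVN || L.ReachingDef != First.ReachingDef)
      return false;
  return true;
}
```

Going back past the common def, as mentioned in the reply, would need additional work that this sketch does not attempt.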

Currently

http://reviews.llvm.org/D18798

The optimistic approach Danny asked for is implemented in http://reviews.llvm.org/D19338

llvm/lib/Transforms/Scalar/GVNHoist.cpp
173	"[...] rewrite your entire algorithm right now" :) Danny knows how to persuade people on doing what he wants ;-)

mcrosier resigned from this revision.Jun 10 2016, 8:40 AM

mcrosier removed a reviewer: mcrosier.

Should this revision be abandoned?

mehdi_amini removed a subscriber: mehdi_amini.Jun 10 2016, 8:51 AM

Abandoned: the newer version of the patch is in http://reviews.llvm.org/D19338

Revision Contents

Path                                              Size

llvm/include/llvm/InitializePasses.h              1 line
llvm/include/llvm/LinkAllPasses.h                 1 line
llvm/include/llvm/Transforms/Scalar.h             7 lines
llvm/include/llvm/Transforms/Scalar/GVN.h         15 lines
llvm/lib/Passes/PassRegistry.def                  1 line
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp    1 line
llvm/lib/Transforms/Scalar/CMakeLists.txt         1 line
llvm/lib/Transforms/Scalar/GVNHoist.cpp           322 lines
llvm/lib/Transforms/Scalar/Scalar.cpp             5 lines
llvm/lib/Transforms/Utils/SimplifyCFG.cpp         176 lines
llvm/test/Transforms/GVN/hoist.ll                 212 lines

Diff 53095

llvm/include/llvm/InitializePasses.h

[...]
 void initializeAddressSanitizerPass(PassRegistry&);
 void initializeAddressSanitizerModulePass(PassRegistry&);
 void initializeMemorySanitizerPass(PassRegistry&);
 void initializeThreadSanitizerPass(PassRegistry&);
 void initializeSanitizerCoverageModulePass(PassRegistry&);
 void initializeDataFlowSanitizerPass(PassRegistry&);
 void initializeScalarizerPass(PassRegistry&);
 void initializeEarlyCSELegacyPassPass(PassRegistry &);
+void initializeGVNHoistLegacyPassPass(PassRegistry &);
 void initializeEliminateAvailableExternallyPass(PassRegistry&);
 void initializeExpandISelPseudosPass(PassRegistry&);
 void initializeForceFunctionAttrsLegacyPassPass(PassRegistry&);
 void initializeGCMachineCodeAnalysisPass(PassRegistry&);
 void initializeGCModuleInfoPass(PassRegistry&);
 void initializeGVNLegacyPassPass(PassRegistry&);
 void initializeGlobalDCEPass(PassRegistry&);
 void initializeGlobalOptPass(PassRegistry&);
[...]

llvm/include/llvm/LinkAllPasses.h

[...] ForcePassLinking() {
       (void) llvm::createStripDeadPrototypesPass();
       (void) llvm::createTailCallEliminationPass();
       (void) llvm::createJumpThreadingPass();
       (void) llvm::createUnifyFunctionExitNodesPass();
       (void) llvm::createInstCountPass();
       (void) llvm::createConstantHoistingPass();
       (void) llvm::createCodeGenPreparePass();
       (void) llvm::createEarlyCSEPass();
+      (void) llvm::createGVNHoistPass();
       (void) llvm::createMergedLoadStoreMotionPass();
       (void) llvm::createGVNPass();
       (void) llvm::createMemCpyOptPass();
       (void) llvm::createLoopDeletionPass();
       (void) llvm::createPostDomTree();
       (void) llvm::createInstructionNamerPass();
       (void) llvm::createMetaRenamerPass();
       (void) llvm::createPostOrderFunctionAttrsLegacyPass();
[...]

llvm/include/llvm/Transforms/Scalar.h

[...]
 //
 // EarlyCSE - This pass performs a simple and fast CSE pass over the dominator
 // tree.
 //
 FunctionPass *createEarlyCSEPass();

 //===----------------------------------------------------------------------===//
 //
+// GVNHoist - This pass performs a simple and fast GVN pass over the dominator
+// tree to hoist common expressions from sibling branches.
+//
+FunctionPass *createGVNHoistPass();
+
+//===----------------------------------------------------------------------===//
+//
 // MergedLoadStoreMotion - This pass merges loads and stores in diamonds. Loads
 // are hoisted into the header, while stores sink into the footer.
 //
 FunctionPass *createMergedLoadStoreMotionPass();

 //===----------------------------------------------------------------------===//
 //
 // MemCpyOpt - This pass performs optimizations related to eliminating memcpy
[...]

llvm/include/llvm/Transforms/Scalar/GVN.h

[...] void markInstructionForDeletion(Instruction *I) {
     VN.erase(I);
     InstrsToErase.push_back(I);
   }

   DominatorTree &getDominatorTree() const { return *DT; }
   AliasAnalysis *getAliasAnalysis() const { return VN.getAliasAnalysis(); }
   MemoryDependenceResults &getMemDep() const { return *MD; }

-private:
-  friend class gvn::GVNLegacyPass;
-
   struct Expression;
-  friend struct DenseMapInfo<Expression>;

   /// This class holds the mapping between values and value numbers. It is used
   /// as an efficient mechanism to determine the expression-wise equivalence of
   /// two values.
   class ValueTable {
     DenseMap<Value *, uint32_t> valueNumbering;
     DenseMap<Expression, uint32_t> expressionNumbering;
     AliasAnalysis *AA;
[...] public:
     void setAliasAnalysis(AliasAnalysis *A) { AA = A; }
     AliasAnalysis *getAliasAnalysis() const { return AA; }
     void setMemDep(MemoryDependenceResults *M) { MD = M; }
     void setDomTree(DominatorTree *D) { DT = D; }
     uint32_t getNextUnusedValueNumber() { return nextValueNumber; }
     void verifyRemoved(const Value *) const;
   };

+private:
+  friend class gvn::GVNLegacyPass;
+  friend struct DenseMapInfo<Expression>;
+
   MemoryDependenceResults *MD;
   DominatorTree *DT;
   const TargetLibraryInfo *TLI;
   AssumptionCache *AC;
   SetVector<BasicBlock *> DeadBlocks;

   ValueTable VN;

[...] private:
   void addDeadBlock(BasicBlock *BB);
   void assignValNumForDeadCode();
 };

 /// Create a legacy GVN pass. This also allows parameterizing whether or not
 /// loads are eliminated by the pass.
 FunctionPass *createGVNPass(bool NoLoads = false);

+/// \brief A simple and fast domtree-based GVN pass to hoist common expressions
+/// from sibling branches.
+struct GVNHoistPass : PassInfoMixin<GVNHoistPass> {
+  /// \brief Run the pass over the function.
+  PreservedAnalyses run(Function &F, AnalysisManager<Function> &AM);
+};
+
 }

 #endif

llvm/lib/Passes/PassRegistry.def

[...]
 #undef FUNCTION_ANALYSIS

 #ifndef FUNCTION_PASS
 #define FUNCTION_PASS(NAME, CREATE_PASS)
 #endif
 FUNCTION_PASS("aa-eval", AAEvaluator())
 FUNCTION_PASS("adce", ADCEPass())
 FUNCTION_PASS("early-cse", EarlyCSEPass())
+FUNCTION_PASS("gvn-hoist", GVNHoistPass())
 FUNCTION_PASS("instcombine", InstCombinePass())
 FUNCTION_PASS("invalidate<all>", InvalidateAllAnalysesPass())
 FUNCTION_PASS("no-op-function", NoOpFunctionPass())
 FUNCTION_PASS("lower-expect", LowerExpectIntrinsicPass())
 FUNCTION_PASS("gvn", GVN())
 FUNCTION_PASS("print", PrintFunctionPass(dbgs()))
 FUNCTION_PASS("print<assumptions>", AssumptionPrinterPass(dbgs()))
 FUNCTION_PASS("print<domtree>", DominatorTreePrinterPass(dbgs()))
[...]

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

[...] void PassManagerBuilder::populateFunctionPassManager(

   addInitialAliasAnalysisPasses(FPM);

   FPM.add(createCFGSimplificationPass());
   if (UseNewSROA)
     FPM.add(createSROAPass());
   else
     FPM.add(createScalarReplAggregatesPass());
   FPM.add(createEarlyCSEPass());
+  FPM.add(createGVNHoistPass());
   FPM.add(createLowerExpectIntrinsicPass());
chandlerc: It makes very little sense IMO to do early-cse and then early-gvn... The only point of early-cse was to be a substantially lighter weight CSE than GVN. If you're going to run GVN anyways to do hoisting, just run GVN. Either way, I strongly suspect this should be conditioned on O3...
 }

 // Do PGO instrumentation generation or use pass as the option specified.
 void PassManagerBuilder::addPGOInstrPasses(legacy::PassManagerBase &MPM) {
   if (!PGOInstrGen.empty()) {
     MPM.add(createPGOInstrumentationGenPass());
     // Add the profile lowering pass.
     InstrProfOptions Options;
[...]

llvm/lib/Transforms/Scalar/CMakeLists.txt

 add_llvm_library(LLVMScalarOpts
   ADCE.cpp
   AlignmentFromAssumptions.cpp
   BDCE.cpp
   ConstantHoisting.cpp
   ConstantProp.cpp
   CorrelatedValuePropagation.cpp
   DCE.cpp
   DeadStoreElimination.cpp
   EarlyCSE.cpp
   FlattenCFGPass.cpp
   Float2Int.cpp
   GVN.cpp
+  GVNHoist.cpp
   InductiveRangeCheckElimination.cpp
   IndVarSimplify.cpp
   JumpThreading.cpp
   LICM.cpp
   LoadCombine.cpp
   LoopDeletion.cpp
   LoopDataPrefetch.cpp
   LoopDistribute.cpp
[...]

llvm/lib/Transforms/Scalar/GVNHoist.cpp

This file was added.

				//===- GVNHoist.cpp - Hoist scalar and load expressions -------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass hoists expressions from branches to a common dominator. This pass
				// uses GVN (global value numbering) to discover expressions computing the same
				// values.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Scalar/GVN.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/DepthFirstIterator.h"
				#include "llvm/ADT/Hashing.h"
				#include "llvm/ADT/MapVector.h"
				#include "llvm/ADT/PostOrderIterator.h"
				#include "llvm/ADT/SetVector.h"
				#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/Analysis/AssumptionCache.h"
				#include "llvm/Analysis/CFG.h"
				#include "llvm/Analysis/ConstantFolding.h"
				#include "llvm/Analysis/GlobalsModRef.h"
				#include "llvm/Analysis/InstructionSimplify.h"
				#include "llvm/Analysis/Loads.h"
				#include "llvm/Analysis/MemoryBuiltins.h"
				#include "llvm/Analysis/MemoryDependenceAnalysis.h"
				#include "llvm/Analysis/PHITransAddr.h"
				#include "llvm/Analysis/TargetLibraryInfo.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/IR/DataLayout.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/GlobalVariable.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/Metadata.h"
				#include "llvm/IR/PatternMatch.h"
				#include "llvm/Support/Allocator.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"
				#include "llvm/Transforms/Utils/Local.h"
				#include "llvm/Transforms/Utils/SSAUpdater.h"
				#include <vector>
				#include <unordered_map>
				using namespace llvm;

				#define DEBUG_TYPE "gvn-hoist"

				STATISTIC(NumHoistedScalars, "Number of hoisted scalar instructions");
				STATISTIC(NumHoistedLoads, "Number of hoisted loads");

				static cl::opt<int> HoistedScalarsThreshold(
				"hoisted-scalars-threshold", cl::Hidden, cl::init(-1),
				cl::desc("Max number of scalar instructions to hoist "
				"(default unlimited = -1)"));
				static cl::opt<int>
				HoistedLoadsThreshold("hoisted-loads-threshold", cl::Hidden, cl::init(-1),
				cl::desc("Max number of loads to hoist "
				"(default unlimited = -1)"));
				static int HoistedScalars = 0;
				static int HoistedLoads = 0;

				namespace {
				// This pass hoists common computations across branches sharing
				// common immediate dominator. The primary goal is to reduce the code size,
				// and in some cases reduce critical path (by exposing more ILP).
				class GVNHoistLegacyPassImpl {
				public:
				GVN::ValueTable VN;
				DominatorTree *DT;
				AliasAnalysis *AA;
				MemoryDependenceResults *MD;
				static char ID;

				GVNHoistLegacyPassImpl(DominatorTree *Dt, AliasAnalysis *Aa,
				MemoryDependenceResults *Md)
				: DT(Dt), AA(Aa), MD(Md) {}

				// Return true when all operands of Instr are available at insertion point
				// InsertPt. When limiting the number of hoisted expressions, one could hoist
				// a load without hoisting its access function. So before hoisting any
				// expression, make sure that all its operands are available at insert point.
				bool allOperandsAvailable(Instruction *I, Instruction *InsertPt) {
				for (unsigned i = 0, e = I->getNumOperands(); i != e; ++i) {
				Value *Op = I->getOperand(i);
				Instruction *Inst = dyn_cast<Instruction>(Op);
				if (!Inst)
				continue;

				if (!DT->dominates(Inst->getParent(), InsertPt->getParent()))
				return false;
				}

				return true;
				}

				// Hoist all expressions. Return true when code has been hoisted under Dom.
				bool hoistExpressions(DomTreeNodeBase<BasicBlock> *Dom) {
				// Depth first search for the leaves of the dominator tree. We start
				// hoisting expressions from the bottom up because that would allow some
				// expressions to be hoisted several times.
				for (auto BB : *Dom)
				hoistExpressions(BB);
				dberlin: You may just want to use df_iterator over the dom tree instead of recursion. It's easier to follow, and won't blow out the stack :) If you use recursion, you have to or together all these values.

				BasicBlock *BB = Dom->getBlock();
				// Only handle two branches for now: it is possible to extend the hoisting
				// to switch statements.
				BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator());
				if (!BI || BI->getNumSuccessors() != 2)
				return false;

				BasicBlock *BB1 = BI->getSuccessor(0);
				BasicBlock *BB2 = BI->getSuccessor(1);
				assert(BB1 != BB2 && "invalid CFG");

				if (!DT->properlyDominates(BB, BB1) || !DT->properlyDominates(BB, BB2) ||
				BB1->isEHPad() || BB1->hasAddressTaken() || BB2->isEHPad() ||
				BB2->hasAddressTaken())
				return false;

				bool Changed = false;
				bool IsTriangle = false;

				// The First BB to be traversed should be the one with single predecessor.
				if (!BB1->getSinglePredecessor()) {
				if (!BB2->getSinglePredecessor())
				return false;
				if (BB2->getSingleSuccessor() != BB1)
				return false;
				std::swap(BB1, BB2);
				IsTriangle = true;
				} else if (!BB2->getSinglePredecessor()) {
				if (BB1->getSingleSuccessor() != BB2)
				return false;
				IsTriangle = true;
				}

				assert(BB1->getSinglePredecessor() == BB && "invalid insertion point");

				// Record from BB1 all load and scalar instructions and their VN.
				std::unordered_map<unsigned, Instruction *> VNtoScalar;
				std::unordered_map<unsigned, LoadInst *> VNtoLoad;
				bool ProcessLoads = true;
				for (Instruction &I1 : *BB1) {
				if (I1.mayHaveSideEffects()) {
				if (IsTriangle)
				VNtoLoad.clear();
				ProcessLoads = false;
				}
				LoadInst *Load = dyn_cast<LoadInst>(&I1);
				if (!Load) {
				VNtoScalar.insert(std::make_pair(VN.lookup_or_add(&I1), &I1));
				continue;
				}
				if (!ProcessLoads || !Load->isSimple()) {
				ProcessLoads = false;
				continue;
				}
				Value *Ptr = Load->getPointerOperand();
				VNtoLoad.insert(std::make_pair(VN.lookup_or_add(Ptr), Load));
				}

				// Scan BB2 for instructions appearing in BB1 with identical VN and hoist
				// them at InsertPt.
				dberlin: This still value numbers bbs unnecessarily, and can try to do so multiple times depending on the branch structure. The way i suggested only value numbers a given BB once :)
				Let me suggest a simpler way than rewriting the algorithm entirely (as i suggested): value number everything up front, store a sorted set bbtovalues, keeping a list of value numbers that exist in each bb. Instead of walking instructions in this algorithm, intersect bbtovalues. For each VN in the intersection, see if the expressions for that VN that occur in each BB can be hoisted. (this will only ever walk the expressions you might hoist, instead of try to value number every expression again)
				dberlin: Note: I realize you have tried to limit the branch structure to cases where you can guarantee you will only value number a given BB once. I'm saying "if anyone ever, in the future, wanted to extend this at all, they'd have to rewrite your entire algorithm right now" :)
				hiraditya: Thanks for the feedback. Yes, in the current implementation we value number each instruction only once. When an expression is hoisted still the value numbers are not invalidated because one of the expressions in the child branch gets hoisted. So, an extension of this algorithm could be to start hoisting equivalent (scalar) expressions across non-sibling branches. In that case, I think, we would still value number only once, if we hoist one of the expressions and delete the rest. For loads, or other expressions with side-effects, hoisting would require more complicated analysis across each edge dominated by the common dominator of the equivalent expressions, so maybe we could skip that to save compile time, I'm not very sure about this though. Could you give an example of a case when multiple value numbering of same expression will be required, that will be helpful.
				sebpop (Author): > "[...] rewrite your entire algorithm right now" :)
				Danny knows how to persuade people on doing what he wants ;-)
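A minimal sketch of the up-front numbering that dberlin describes in the inline comment, written in standalone C++ rather than against LLVM. It assumes every block has already been value numbered once and that the numbers present in each block are kept in a sorted set. `ValueNumber`, `BBToValues`, and `commonValueNumbers` are hypothetical names for illustration, not part of the patch or of LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: a block is identified by its label and a value
// number is a plain unsigned, as returned by a GVN-style lookup.
using ValueNumber = unsigned;
using BBToValues = std::map<std::string, std::set<ValueNumber>>;

// Return the value numbers that occur in both blocks. std::set iterates in
// sorted order, so std::set_intersection is linear in the two set sizes.
std::vector<ValueNumber> commonValueNumbers(const BBToValues &Table,
                                            const std::string &BB1,
                                            const std::string &BB2) {
  auto It1 = Table.find(BB1);
  auto It2 = Table.find(BB2);
  assert(It1 != Table.end() && It2 != Table.end() && "block not numbered");
  std::vector<ValueNumber> Common;
  std::set_intersection(It1->second.begin(), It1->second.end(),
                        It2->second.begin(), It2->second.end(),
                        std::back_inserter(Common));
  return Common;
}
```

Only the value numbers returned by `commonValueNumbers` would then need a per-expression legality check, so no block is numbered a second time.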
				Instruction *InsertPt = BB->getTerminator();
				ProcessLoads = true;
				for (BasicBlock::iterator II = BB2->begin(); II != BB2->end();) {
				Instruction *I2 = &*II;
				++II;

				if (I2->mayHaveSideEffects() || I2->mayWriteToMemory()) {
				ProcessLoads = false;
				continue;
				}

				LoadInst *Load = dyn_cast<LoadInst>(I2);
				if (!Load) {
				// Bound the number of hoisted scalar expressions.
				if (HoistedScalarsThreshold != -1 &&
				HoistedScalars >= HoistedScalarsThreshold)
				continue;

				// Hoist scalars.
				unsigned V = VN.lookup_or_add(I2);

				// Check whether BB1 contains a similar scalar instruction.
				auto It = VNtoScalar.find(V);
				if (It == VNtoScalar.end())
				continue;

				// Make sure all operands are available at insertion point.
				if (!allOperandsAvailable(I2, InsertPt))
				continue;

				// Hoist identical instructions I2 and I1.
				Instruction *I1 = It->second;
				DEBUG(dbgs() << "GVN hoisting scalar: " << *I1 << '\n');
				I1->moveBefore(InsertPt);
				I2->replaceAllUsesWith(I1);
				I2->eraseFromParent();

				Changed = true;
				NumHoistedScalars++;
				HoistedScalars++;
				continue;
				}

				if (!ProcessLoads || !Load->isSimple()) {
				ProcessLoads = false;
				continue;
				}
				// Bound the number of hoisted load expressions.
				if (HoistedLoadsThreshold != -1 && HoistedLoads >= HoistedLoadsThreshold)
				continue;

				// Hoist loads.
				Value *Ptr2 = Load->getPointerOperand();
				unsigned V = VN.lookup_or_add(Ptr2);

				// Check whether BB1 contains a similar load.
				auto It = VNtoLoad.find(V);
				if (It == VNtoLoad.end())
				continue;

				// Check whether the load elements are of the same type.
				LoadInst *I1 = It->second;
				Value *Ptr1 = I1->getPointerOperand();
				if (cast<PointerType>(Ptr1->getType())->getElementType() !=
				cast<PointerType>(Ptr2->getType())->getElementType())
				continue;

				// Make sure all operands are available at insertion point.
				if (!allOperandsAvailable(I2, InsertPt))
				continue;

				// Hoist identical load instructions I2 and I1.
				DEBUG(dbgs() << "GVN hoisting load: " << *I1 << '\n');
				I1->moveBefore(InsertPt);
				I2->replaceAllUsesWith(I1);
				I2->eraseFromParent();

				Changed = true;
				NumHoistedLoads++;
				HoistedLoads++;
				}

				return Changed;
				}

				bool run() {
				VN.setDomTree(DT);
				VN.setAliasAnalysis(AA);
				VN.setMemDep(MD);
				hoistExpressions(DT->getNode(DT->getRoot()));
				return false;
				mcrosier: Should this be 'return hoistExpressions()'?
				hiraditya: Yes, thanks for pointing out.
				}
				};

				class GVNHoistLegacyPass : public FunctionPass {
				public:
				static char ID;

				GVNHoistLegacyPass() : FunctionPass(ID) {
				initializeGVNHoistLegacyPassPass(*PassRegistry::getPassRegistry());
				}

				bool runOnFunction(Function &F) override {
				if (skipOptnoneFunction(F))
				return false;

				auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
				auto &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
				auto &MD = getAnalysis<MemoryDependenceWrapperPass>().getMemDep();

				GVNHoistLegacyPassImpl G(&DT, &AA, &MD);
				return G.run();
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<MemoryDependenceWrapperPass>();
				AU.addRequired<AAResultsWrapperPass>();
				AU.addPreserved<DominatorTreeWrapperPass>();
				}
				};
				} // namespace

				PreservedAnalyses GVNHoistPass::run(Function &F,
				AnalysisManager<Function> &AM) {
				DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);
				AliasAnalysis &AA = AM.getResult<AAManager>(F);
				MemoryDependenceResults &MD = AM.getResult<MemoryDependenceAnalysis>(F);

				GVNHoistLegacyPassImpl G(&DT, &AA, &MD);
				if (!G.run())
				return PreservedAnalyses::all();

				PreservedAnalyses PA;
				PA.preserve<DominatorTreeAnalysis>();
				return PA;
				}

				char GVNHoistLegacyPass::ID = 0;
				INITIALIZE_PASS_BEGIN(GVNHoistLegacyPass, "gvn-hoist",
				"Early GVN Hoisting of Expressions", false, false)
				INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
				INITIALIZE_PASS_END(GVNHoistLegacyPass, "gvn-hoist",
				"Early GVN Hoisting of Expressions", false, false)

				FunctionPass *llvm::createGVNHoistPass() { return new GVNHoistLegacyPass(); }
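
The two-phase scan used by the hoisting routine in this file (record the value number of each instruction in BB1, then look up each instruction of BB2) can be modeled outside LLVM. The following is a toy sketch under stated assumptions: `Inst`, `ValueTable`, and `hoistCommon` are hypothetical, instructions are plain binary expressions, and loads, side effects, the operand-availability check, and the hoisting thresholds are all left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy binary instruction (hypothetical model, not LLVM IR).
struct Inst {
  std::string Result, Opcode, LHS, RHS;
};

// Minimal value table: equal (opcode, VN(lhs), VN(rhs)) get the same number.
class ValueTable {
  std::map<std::string, unsigned> NameToVN;
  std::map<std::tuple<std::string, unsigned, unsigned>, unsigned> ExprToVN;
  unsigned Next = 1;

public:
  unsigned lookupOrAddName(const std::string &Name) {
    auto It = NameToVN.find(Name);
    if (It != NameToVN.end())
      return It->second;
    unsigned VN = Next++;
    NameToVN[Name] = VN;
    return VN;
  }

  unsigned lookupOrAdd(const Inst &I) {
    assert(!I.Result.empty() && "toy instructions must define a result");
    unsigned L = lookupOrAddName(I.LHS);
    unsigned R = lookupOrAddName(I.RHS);
    auto Key = std::make_tuple(I.Opcode, L, R);
    auto It = ExprToVN.find(Key);
    unsigned VN;
    if (It != ExprToVN.end()) {
      VN = It->second;
    } else {
      VN = Next++;
      ExprToVN[Key] = VN;
    }
    NameToVN[I.Result] = VN; // the result name now denotes this value
    return VN;
  }
};

// Returns the BB1 instructions to place in the common dominator and removes
// them, together with their BB2 duplicates, from the two blocks.
std::vector<Inst> hoistCommon(std::vector<Inst> &BB1, std::vector<Inst> &BB2) {
  ValueTable VN;
  // Phase 1: record the value number of every instruction in BB1.
  std::map<unsigned, std::size_t> VNToIndex;
  for (std::size_t I = 0; I < BB1.size(); ++I)
    VNToIndex.insert({VN.lookupOrAdd(BB1[I]), I});

  // Phase 2: scan BB2 and replace each match with the BB1 instruction.
  std::vector<Inst> Hoisted, KeptBB2;
  std::vector<bool> Moved(BB1.size(), false);
  std::map<std::string, std::string> Renamed; // BB2 result -> BB1 result
  auto Rewrite = [&Renamed](std::string &Op) {
    auto R = Renamed.find(Op);
    if (R != Renamed.end())
      Op = R->second;
  };
  for (Inst I2 : BB2) {
    Rewrite(I2.LHS); // models replaceAllUsesWith on earlier matches
    Rewrite(I2.RHS);
    auto It = VNToIndex.find(VN.lookupOrAdd(I2));
    if (It == VNToIndex.end()) {
      KeptBB2.push_back(I2);
      continue;
    }
    if (!Moved[It->second]) {
      Moved[It->second] = true;
      Hoisted.push_back(BB1[It->second]);
    }
    Renamed[I2.Result] = BB1[It->second].Result;
  }

  std::vector<Inst> KeptBB1;
  for (std::size_t I = 0; I < BB1.size(); ++I)
    if (!Moved[I])
      KeptBB1.push_back(BB1[I]);
  BB1.swap(KeptBB1);
  BB2.swap(KeptBB2);
  return Hoisted;
}
```

Fed the two branches of the `scalarsHoisting` test, where the same four expressions appear in a different order in each block, this model hoists all four and leaves both blocks empty.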

llvm/lib/Transforms/Scalar/Scalar.cpp

[...]
void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeConstantPropagationPass(Registry);		initializeConstantPropagationPass(Registry);
initializeCorrelatedValuePropagationPass(Registry);		initializeCorrelatedValuePropagationPass(Registry);
initializeDCEPass(Registry);		initializeDCEPass(Registry);
initializeDeadInstEliminationPass(Registry);		initializeDeadInstEliminationPass(Registry);
initializeScalarizerPass(Registry);		initializeScalarizerPass(Registry);
initializeDSEPass(Registry);		initializeDSEPass(Registry);
initializeGVNLegacyPassPass(Registry);		initializeGVNLegacyPassPass(Registry);
initializeEarlyCSELegacyPassPass(Registry);		initializeEarlyCSELegacyPassPass(Registry);
		initializeGVNHoistLegacyPassPass(Registry);
initializeFlattenCFGPassPass(Registry);		initializeFlattenCFGPassPass(Registry);
initializeInductiveRangeCheckEliminationPass(Registry);		initializeInductiveRangeCheckEliminationPass(Registry);
initializeIndVarSimplifyPass(Registry);		initializeIndVarSimplifyPass(Registry);
initializeJumpThreadingPass(Registry);		initializeJumpThreadingPass(Registry);
initializeLICMPass(Registry);		initializeLICMPass(Registry);
initializeLoopDataPrefetchPass(Registry);		initializeLoopDataPrefetchPass(Registry);
initializeLoopDeletionPass(Registry);		initializeLoopDeletionPass(Registry);
initializeLoopAccessAnalysisPass(Registry);		initializeLoopAccessAnalysisPass(Registry);
[...]
void LLVMAddCorrelatedValuePropagationPass(LLVMPassManagerRef PM) {		void LLVMAddCorrelatedValuePropagationPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createCorrelatedValuePropagationPass());		unwrap(PM)->add(createCorrelatedValuePropagationPass());
}		}

void LLVMAddEarlyCSEPass(LLVMPassManagerRef PM) {		void LLVMAddEarlyCSEPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createEarlyCSEPass());		unwrap(PM)->add(createEarlyCSEPass());
}		}

		void LLVMAddGVNHoistLegacyPass(LLVMPassManagerRef PM) {
		unwrap(PM)->add(createGVNHoistPass());
		}

void LLVMAddTypeBasedAliasAnalysisPass(LLVMPassManagerRef PM) {		void LLVMAddTypeBasedAliasAnalysisPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createTypeBasedAAWrapperPass());		unwrap(PM)->add(createTypeBasedAAWrapperPass());
}		}

void LLVMAddScopedNoAliasAAPass(LLVMPassManagerRef PM) {		void LLVMAddScopedNoAliasAAPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createScopedNoAliasAAWrapperPass());		unwrap(PM)->add(createScopedNoAliasAAWrapperPass());
}		}

void LLVMAddBasicAliasAnalysisPass(LLVMPassManagerRef PM) {		void LLVMAddBasicAliasAnalysisPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createBasicAAWrapperPass());		unwrap(PM)->add(createBasicAAWrapperPass());
}		}

void LLVMAddLowerExpectIntrinsicPass(LLVMPassManagerRef PM) {		void LLVMAddLowerExpectIntrinsicPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLowerExpectIntrinsicPass());		unwrap(PM)->add(createLowerExpectIntrinsicPass());
}		}

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

[...]
if (PCV == CV && SafeToMergeTerminators(TI, PTI)) {
}		}

Changed = true;		Changed = true;
}		}
}		}
return Changed;		return Changed;
}		}

// If we would need to insert a select that uses the value of this invoke
// (comments in HoistThenElseCodeToIf explain why we would need to do this), we
// can't hoist the invoke, as there is nowhere to put the select in this case.
static bool isSafeToHoistInvoke(BasicBlock *BB1, BasicBlock *BB2,
Instruction *I1, Instruction *I2) {
for (BasicBlock *Succ : successors(BB1)) {
PHINode *PN;
for (BasicBlock::iterator BBI = Succ->begin();
(PN = dyn_cast<PHINode>(BBI)); ++BBI) {
Value *BB1V = PN->getIncomingValueForBlock(BB1);
Value *BB2V = PN->getIncomingValueForBlock(BB2);
if (BB1V != BB2V && (BB1V==I1 || BB2V==I2)) {
return false;
}
}
}
return true;
}

static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I);		static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I);

/// Given a conditional branch that goes to BB1 and BB2, hoist any common code
/// in the two blocks up into the branch block. The caller of this function
/// guarantees that BI's block dominates BB1 and BB2.
static bool HoistThenElseCodeToIf(BranchInst *BI,
const TargetTransformInfo &TTI) {
// This does very trivial matching, with limited scanning, to find identical
// instructions in the two blocks. In particular, we don't want to get into
// O(M*N) situations here where M and N are the sizes of BB1 and BB2. As
// such, we currently just scan for obviously identical instructions in an
// identical order.
BasicBlock *BB1 = BI->getSuccessor(0); // The true destination.
BasicBlock *BB2 = BI->getSuccessor(1); // The false destination

BasicBlock::iterator BB1_Itr = BB1->begin();
BasicBlock::iterator BB2_Itr = BB2->begin();

Instruction *I1 = &*BB1_Itr++, *I2 = &*BB2_Itr++;
// Skip debug info if it is not identical.
DbgInfoIntrinsic *DBI1 = dyn_cast<DbgInfoIntrinsic>(I1);
DbgInfoIntrinsic *DBI2 = dyn_cast<DbgInfoIntrinsic>(I2);
if (!DBI1 || !DBI2 || !DBI1->isIdenticalToWhenDefined(DBI2)) {
while (isa<DbgInfoIntrinsic>(I1))
I1 = &*BB1_Itr++;
while (isa<DbgInfoIntrinsic>(I2))
I2 = &*BB2_Itr++;
}
if (isa<PHINode>(I1) || !I1->isIdenticalToWhenDefined(I2) ||
(isa<InvokeInst>(I1) && !isSafeToHoistInvoke(BB1, BB2, I1, I2)))
return false;

BasicBlock *BIParent = BI->getParent();

bool Changed = false;
do {
// If we are hoisting the terminator instruction, don't move one (making a
// broken BB), instead clone it, and remove BI.
if (isa<TerminatorInst>(I1))
goto HoistTerminator;

if (!TTI.isProfitableToHoist(I1) || !TTI.isProfitableToHoist(I2))
return Changed;

// For a normal instruction, we just move one to right before the branch,
// then replace all uses of the other with the first. Finally, we remove
// the now redundant second instruction.
BIParent->getInstList().splice(BI->getIterator(), BB1->getInstList(), I1);
if (!I2->use_empty())
I2->replaceAllUsesWith(I1);
I1->intersectOptionalDataWith(I2);
unsigned KnownIDs[] = {
LLVMContext::MD_tbaa, LLVMContext::MD_range,
LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,
LLVMContext::MD_nonnull, LLVMContext::MD_invariant_group,
LLVMContext::MD_align, LLVMContext::MD_dereferenceable,
LLVMContext::MD_dereferenceable_or_null};
combineMetadata(I1, I2, KnownIDs);
I2->eraseFromParent();
Changed = true;

I1 = &*BB1_Itr++;
I2 = &*BB2_Itr++;
// Skip debug info if it is not identical.
DbgInfoIntrinsic *DBI1 = dyn_cast<DbgInfoIntrinsic>(I1);
DbgInfoIntrinsic *DBI2 = dyn_cast<DbgInfoIntrinsic>(I2);
if (!DBI1 || !DBI2 || !DBI1->isIdenticalToWhenDefined(DBI2)) {
while (isa<DbgInfoIntrinsic>(I1))
I1 = &*BB1_Itr++;
while (isa<DbgInfoIntrinsic>(I2))
I2 = &*BB2_Itr++;
}
} while (I1->isIdenticalToWhenDefined(I2));

return true;

HoistTerminator:
// It may not be possible to hoist an invoke.
if (isa<InvokeInst>(I1) && !isSafeToHoistInvoke(BB1, BB2, I1, I2))
return Changed;

for (BasicBlock *Succ : successors(BB1)) {
PHINode *PN;
for (BasicBlock::iterator BBI = Succ->begin();
(PN = dyn_cast<PHINode>(BBI)); ++BBI) {
Value *BB1V = PN->getIncomingValueForBlock(BB1);
Value *BB2V = PN->getIncomingValueForBlock(BB2);
if (BB1V == BB2V)
continue;

// Check for passingValueIsAlwaysUndefined here because we would rather
// eliminate undefined control flow than converting it to a select.
if (passingValueIsAlwaysUndefined(BB1V, PN) ||
passingValueIsAlwaysUndefined(BB2V, PN))
return Changed;

if (isa<ConstantExpr>(BB1V) && !isSafeToSpeculativelyExecute(BB1V))
return Changed;
if (isa<ConstantExpr>(BB2V) && !isSafeToSpeculativelyExecute(BB2V))
return Changed;
}
}

// Okay, it is safe to hoist the terminator.
Instruction *NT = I1->clone();
BIParent->getInstList().insert(BI->getIterator(), NT);
if (!NT->getType()->isVoidTy()) {
I1->replaceAllUsesWith(NT);
I2->replaceAllUsesWith(NT);
NT->takeName(I1);
}

IRBuilder<NoFolder> Builder(NT);
// Hoisting one of the terminators from our successor is a great thing.
// Unfortunately, the successors of the if/else blocks may have PHI nodes in
// them. If they do, all PHI entries for BB1/BB2 must agree for all PHI
// nodes, so we insert select instruction to compute the final result.
std::map<std::pair<Value*,Value*>, SelectInst*> InsertedSelects;
for (BasicBlock *Succ : successors(BB1)) {
PHINode *PN;
for (BasicBlock::iterator BBI = Succ->begin();
(PN = dyn_cast<PHINode>(BBI)); ++BBI) {
Value *BB1V = PN->getIncomingValueForBlock(BB1);
Value *BB2V = PN->getIncomingValueForBlock(BB2);
if (BB1V == BB2V) continue;

// These values do not agree. Insert a select instruction before NT
// that determines the right value.
SelectInst *&SI = InsertedSelects[std::make_pair(BB1V, BB2V)];
if (!SI)
SI = cast<SelectInst>
(Builder.CreateSelect(BI->getCondition(), BB1V, BB2V,
BB1V->getName() + "." + BB2V->getName()));

// Make the PHI node use the select for all incoming values for BB1/BB2
for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
if (PN->getIncomingBlock(i) == BB1 || PN->getIncomingBlock(i) == BB2)
PN->setIncomingValue(i, SI);
}
}

// Update any PHI nodes in our new successors.
for (BasicBlock *Succ : successors(BB1))
AddPredecessorToBlock(Succ, BIParent, BB1);

EraseTerminatorInstAndDCECond(BI);
return true;
}
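
For contrast with value-number-based hoisting, the matching strategy described in the comments of HoistThenElseCodeToIf (identical instructions in an identical order) amounts to hoisting a common prefix of the two blocks. Below is a small standalone sketch of that idea; it assumes instructions are compared as plain strings, and `identicalPrefixLength` and `hoistIdenticalPrefix` are hypothetical helpers, not LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Count the leading instructions that are identical in both blocks.
std::size_t identicalPrefixLength(const std::vector<std::string> &BB1,
                                  const std::vector<std::string> &BB2) {
  std::size_t N = 0;
  while (N < BB1.size() && N < BB2.size() && BB1[N] == BB2[N])
    ++N;
  return N;
}

// Move the common prefix to the end of Parent and drop it from both blocks.
// Returns the number of hoisted instructions.
std::size_t hoistIdenticalPrefix(std::vector<std::string> &Parent,
                                 std::vector<std::string> &BB1,
                                 std::vector<std::string> &BB2) {
  std::size_t N = identicalPrefixLength(BB1, BB2);
  assert(N <= BB1.size() && N <= BB2.size());
  Parent.insert(Parent.end(), BB1.begin(), BB1.begin() + N);
  BB1.erase(BB1.begin(), BB1.begin() + N);
  BB2.erase(BB2.begin(), BB2.begin() + N);
  return N;
}
```

The scan stops at the first mismatch, so two blocks that compute the same expressions in a different order yield nothing to hoist under this scheme.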

/// Given an unconditional branch that goes to BBEnd,		/// Given an unconditional branch that goes to BBEnd,
/// check whether BBEnd has only two predecessors and the other predecessor		/// check whether BBEnd has only two predecessors and the other predecessor
/// ends with an unconditional branch. If it is true, sink any common code		/// ends with an unconditional branch. If it is true, sink any common code
/// in the two predecessors to BBEnd.		/// in the two predecessors to BBEnd.
static bool SinkThenElseCodeToEnd(BranchInst *BI1) {		static bool SinkThenElseCodeToEnd(BranchInst *BI1) {
assert(BI1->isUnconditional());		assert(BI1->isUnconditional());
BasicBlock *BB1 = BI1->getParent();		BasicBlock *BB1 = BI1->getParent();
BasicBlock *BBEnd = BI1->getSuccessor(0);		BasicBlock *BBEnd = BI1->getSuccessor(0);
[...]
static bool SpeculativelyExecuteBB(BranchInst *BI, BasicBlock *ThenBB,

// Check that the PHI nodes can be converted to selects.		// Check that the PHI nodes can be converted to selects.
bool HaveRewritablePHIs = false;		bool HaveRewritablePHIs = false;
for (BasicBlock::iterator I = EndBB->begin();		for (BasicBlock::iterator I = EndBB->begin();
PHINode *PN = dyn_cast<PHINode>(I); ++I) {		PHINode *PN = dyn_cast<PHINode>(I); ++I) {
Value *OrigV = PN->getIncomingValueForBlock(BB);		Value *OrigV = PN->getIncomingValueForBlock(BB);
Value *ThenV = PN->getIncomingValueForBlock(ThenBB);		Value *ThenV = PN->getIncomingValueForBlock(ThenBB);

// FIXME: Try to remove some of the duplication with HoistThenElseCodeToIf.
// Skip PHIs which are trivial.		// Skip PHIs which are trivial.
if (ThenV == OrigV)		if (ThenV == OrigV)
continue;		continue;

// Don't convert to selects if we could remove undefined behavior instead.		// Don't convert to selects if we could remove undefined behavior instead.
if (passingValueIsAlwaysUndefined(OrigV, PN) ||		if (passingValueIsAlwaysUndefined(OrigV, PN) ||
passingValueIsAlwaysUndefined(ThenV, PN))		passingValueIsAlwaysUndefined(ThenV, PN))
return false;		return false;
[...]
if (SimplifyBranchOnICmpChain(BI, Builder, DL))
return true;		return true;

// If this basic block is ONLY a compare and a branch, and if a predecessor		// If this basic block is ONLY a compare and a branch, and if a predecessor
// branches to us and one of our successors, fold the comparison into the		// branches to us and one of our successors, fold the comparison into the
// predecessor and use logical operations to pick the right destination.		// predecessor and use logical operations to pick the right destination.
if (FoldBranchToCommonDest(BI, BonusInstThreshold))		if (FoldBranchToCommonDest(BI, BonusInstThreshold))
return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;		return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;

// We have a conditional branch to two blocks that are only reachable
// from BI. We know that the condbr dominates the two blocks, so see if
// there is any identical code in the "then" and "else" blocks. If so, we
// can hoist it up to the branching block.
if (BI->getSuccessor(0)->getSinglePredecessor()) {		if (BI->getSuccessor(0)->getSinglePredecessor()) {
if (BI->getSuccessor(1)->getSinglePredecessor()) {		if (!BI->getSuccessor(1)->getSinglePredecessor()) {
if (HoistThenElseCodeToIf(BI, TTI))
return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;
} else {
// If Successor #1 has multiple preds, we may be able to conditionally		// If Successor #1 has multiple preds, we may be able to conditionally
// execute Successor #0 if it branches to Successor #1.		// execute Successor #0 if it branches to Successor #1.
TerminatorInst *Succ0TI = BI->getSuccessor(0)->getTerminator();		TerminatorInst *Succ0TI = BI->getSuccessor(0)->getTerminator();
if (Succ0TI->getNumSuccessors() == 1 &&		if (Succ0TI->getNumSuccessors() == 1 &&
Succ0TI->getSuccessor(0) == BI->getSuccessor(1))		Succ0TI->getSuccessor(0) == BI->getSuccessor(1))
if (SpeculativelyExecuteBB(BI, BI->getSuccessor(0), TTI))		if (SpeculativelyExecuteBB(BI, BI->getSuccessor(0), TTI))
return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;		return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;
}		}
[...]

llvm/test/Transforms/GVN/hoist.ll

This file was added.

				; RUN: opt -gvn-hoist -S < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Check that all scalar expressions are hoisted.
				;
				; CHECK-LABEL: @scalarsHoisting
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @scalarsHoisting(float %d, float %min, float %max, float %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				%sub = fsub float %min, %a
				%mul = fmul float %sub, %div
				%sub1 = fsub float %max, %a
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				%sub3 = fsub float %max, %a
				%mul4 = fmul float %sub3, %div
				%sub5 = fsub float %min, %a
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]
				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that all loads and scalars depending on the loads are hoisted.
				;
				; CHECK-LABEL: @readsAndScalarsHoisting
				; CHECK: load
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: load
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @readsAndScalarsHoisting(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]
				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that we do not hoist loads after a store: the first two loads will be
				; hoisted, and then the third load will not be hoisted.
				;
				; CHECK-LABEL: @readsAndWrites
				; CHECK: load
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: store
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: load
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub

				@G = internal global float 1.000000e+00

				define float @readsAndWrites(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				store float %0, float* @G
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]
				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that dependent expressions are hoisted.
				; CHECK-LABEL: @dependentScalarsHoisting
				; CHECK: fsub
				; CHECK: fadd
				; CHECK: fdiv
				; CHECK: fmul
				; CHECK-NOT: fsub
				; CHECK-NOT: fadd
				; CHECK-NOT: fdiv
				; CHECK-NOT: fmul
				define float @dependentScalarsHoisting(float %a, float %b, i1 %c) {
				entry:
				br i1 %c, label %if.then, label %if.else

				if.then:
				%d = fsub float %b, %a
				%e = fadd float %d, %a
				%f = fdiv float %e, %a
				%g = fmul float %f, %a
				br label %if.end

				if.else:
				%h = fsub float %b, %a
				%i = fadd float %h, %a
				%j = fdiv float %i, %a
				%k = fmul float %j, %a
				br label %if.end

				if.end:
				%r = phi float [ %g, %if.then ], [ %k, %if.else ]
				ret float %r
				}

				; Check that all independent expressions are hoisted.
				; CHECK-LABEL: @independentScalarsHoisting
				; CHECK: fadd
				; CHECK: fsub
				; CHECK: fdiv
				; CHECK: fmul
				; CHECK-NOT: fsub
				; CHECK-NOT: fdiv
				; CHECK-NOT: fmul
				define float @independentScalarsHoisting(float %a, float %b, i1 %c) {
				entry:
				br i1 %c, label %if.then, label %if.else

				if.then:
				%d = fadd float %b, %a
				%e = fsub float %b, %a
				%f = fdiv float %b, %a
				%g = fmul float %b, %a
				br label %if.end

				if.else:
				%i = fadd float %b, %a
				%h = fsub float %b, %a
				%j = fdiv float %b, %a
				%k = fmul float %b, %a
				br label %if.end

				if.end:
				%p = phi float [ %d, %if.then ], [ %i, %if.else ]
				%q = phi float [ %e, %if.then ], [ %h, %if.else ]
				%r = phi float [ %f, %if.then ], [ %j, %if.else ]
				%s = phi float [ %g, %if.then ], [ %k, %if.else ]
				%t = fadd float %p, %q
				%u = fadd float %r, %s
				%v = fadd float %t, %u
				ret float %v
				}

Revision Contents

Diff 53095

llvm/include/llvm/InitializePasses.h

llvm/include/llvm/LinkAllPasses.h

llvm/include/llvm/Transforms/Scalar.h

llvm/include/llvm/Transforms/Scalar/GVN.h

llvm/lib/Passes/PassRegistry.def

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

llvm/lib/Transforms/Scalar/CMakeLists.txt

llvm/lib/Transforms/Scalar/GVNHoist.cpp

llvm/lib/Transforms/Scalar/Scalar.cpp

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

llvm/test/Transforms/GVN/hoist.ll
