This is an archive of the discontinued LLVM Phabricator instance.

[JumpThreading] Put a limit on the PHI nodes when duplicating a BB.
Closed, Public

Authored by mnadeem on Oct 25 2022, 3:43 PM.

Details

Reviewers

efriedma
nikic
vzakhari

Commits

rG32755786e020: [JumpThreading] Put a limit on the PHI nodes when duplicating a BB.

Summary

Do not duplicate a BB if it has a lot of PHI nodes.
If a threadable chain is too long then the number of duplicated PHI nodes
can add up, leading to a substantial increase in compile time when rewriting
the SSA.

Fixes https://github.com/llvm/llvm-project/issues/58203

The threshold of 76 in this patch is reasonably high and reduces the compile
time of cldwat2m_macro.f90 in SPEC2017/cam4 from 80+min to <2min.
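
The limit described in this summary can be modeled outside LLVM as a single scan over the leading PHI nodes of a block. The sketch below is a simplified stand-in, not the patch itself: `Kind`, `Block`, and `firstNonPhiWithinLimit` are hypothetical names, and a basic block is reduced to a list of instruction kinds.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for an instruction: the check only cares whether
// an instruction is a PHI node or not.
enum class Kind { Phi, Other };
using Block = std::vector<Kind>;

// Scan the leading PHI nodes of a block. Returns the index of the first
// non-PHI instruction, or std::nullopt if more than PhiLimit PHI nodes are
// seen first (the block is then treated as too expensive to duplicate).
// If the block holds only PHIs and stays within the limit, returns BB.size().
std::optional<std::size_t> firstNonPhiWithinLimit(const Block &BB,
                                                  unsigned PhiLimit) {
  unsigned PhiCount = 0;
  for (std::size_t Idx = 0; Idx < BB.size(); ++Idx) {
    if (BB[Idx] != Kind::Phi)
      return Idx; // First non-PHI found; the scan stops here.
    if (++PhiCount > PhiLimit)
      return std::nullopt; // Too many PHIs; give up early.
  }
  return BB.size();
}
```

Because the scan stops at the first non-PHI instruction or as soon as the limit is exceeded, its cost is bounded by the limit rather than by the size of the block.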

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mnadeem created this revision. Oct 25 2022, 3:43 PM

Herald added a project: Restricted Project. Oct 25 2022, 3:43 PM

Herald added a subscriber: hiraditya.

mnadeem requested review of this revision. Oct 25 2022, 3:43 PM

Herald added a subscriber: llvm-commits.

mnadeem mentioned this in D135125: [JumpThreading] Reverse the order of basic block iteration. Oct 25 2022, 3:45 PM

Harbormaster completed remote builds in B194280: Diff 470633. Oct 25 2022, 4:25 PM

tblah added a subscriber: tblah. Oct 27 2022, 2:58 AM

This approach seems fine. Is there any particular reason behind the threshold 76?

llvm/lib/Transforms/Scalar/JumpThreading.cpp, line 542: Since we're iterating over the beginning of the block anyway, can we use the computed non-PHI instruction?

Looks good to me.

This revision is now accepted and ready to land. Oct 31 2022, 9:15 AM

In D136716#3896687, @efriedma wrote:

This approach seems fine. Is there any particular reason behind the threshold 76?

I didn't want to miss any threading opportunities as long as the compile time stayed reasonable. After some experimentation, 76 seemed high enough to cover a lot of cases in SPEC2017/cam4 while keeping the compile time reasonable.

Reuse the computed non-PHI instruction.

mnadeem marked an inline comment as done. Oct 31 2022, 11:42 AM

Harbormaster completed remote builds in B195322: Diff 472087. Oct 31 2022, 1:05 PM

Closed by commit rG32755786e020: [JumpThreading] Put a limit on the PHI nodes when duplicating a BB. (authored by mnadeem). Oct 31 2022, 3:52 PM

This revision was automatically updated to reflect the committed changes.

mnadeem added a commit: rG32755786e020: [JumpThreading] Put a limit on the PHI nodes when duplicating a BB.

Revision Contents

Path: llvm/lib/Transforms/Scalar/JumpThreading.cpp
Size: 22 lines

Diff 472167

llvm/lib/Transforms/Scalar/JumpThreading.cpp

[94 lines not shown]

 static cl::opt<unsigned>
 ImplicationSearchThreshold(
   "jump-threading-implication-search-threshold",
   cl::desc("The number of predecessors to search for a stronger "
            "condition to use to thread over a weaker condition"),
   cl::init(3), cl::Hidden);

+static cl::opt<unsigned> PhiDuplicateThreshold(
+    "jump-threading-phi-threshold",
+    cl::desc("Max PHIs in BB to duplicate for jump threading"), cl::init(76),
+    cl::Hidden);
+
 static cl::opt<bool> PrintLVIAfterJumpThreading(
   "print-lvi-after-jump-threading",
   cl::desc("Print the LazyValueInfo cache after JumpThreading"), cl::init(false),
   cl::Hidden);

 static cl::opt<bool> ThreadAcrossLoopHeaders(
   "jump-threading-across-loop-headers",
   cl::desc("Allow JumpThreading to thread across loop headers, for testing"),

[402 lines not shown]

 /// Return the cost of duplicating a piece of this block from first non-phi
 /// and before StopAt instruction to thread across it. Stop scanning the block
 /// when exceeding the threshold. If duplication is impossible, returns ~0U.
 static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
                                              BasicBlock *BB,
                                              Instruction *StopAt,
                                              unsigned Threshold) {
   assert(StopAt->getParent() == BB && "Not an instruction from proper BB?");

+  // Do not duplicate the BB if it has a lot of PHI nodes.
+  // If a threadable chain is too long then the number of PHI nodes can add up,
+  // leading to a substantial increase in compile time when rewriting the SSA.
+  unsigned PhiCount = 0;
+  Instruction *FirstNonPHI = nullptr;
+  for (Instruction &I : *BB) {
+    if (!isa<PHINode>(&I)) {
+      FirstNonPHI = &I;
+      break;
+    }
+    if (++PhiCount > PhiDuplicateThreshold)
+      return ~0U;
+  }
+
   /// Ignore PHI nodes, these will be flattened when duplication happens.
-  BasicBlock::const_iterator I(BB->getFirstNonPHI());
+  BasicBlock::const_iterator I(FirstNonPHI);

[Inline comment by efriedma, marked done: Since we're iterating over the beginning of the block anyway, can we use the computed non-PHI instruction?]

   // FIXME: THREADING will delete values that are just used to compute the
   // branch, so they shouldn't count against the duplication cost.

   unsigned Bonus = 0;
   if (BB->getTerminator() == StopAt) {
     // Threading through a switch statement is particularly profitable. If this
     // block ends in a switch, decrease its cost to make it more likely to

[2,531 lines not shown]
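
The doc comment in the diff says the cost function returns ~0U when duplication is impossible, and the new PHI check reuses that value. As a minimal sketch of why this works as a rejection signal, assuming the caller accepts a block only when its cost does not exceed a threshold (`ImpossibleCost` and `shouldDuplicate` are hypothetical names, not taken from the patch):

```cpp
#include <climits>

// Sentinel returned when duplication is impossible: every bit set, which is
// the largest value an unsigned int can hold.
constexpr unsigned ImpossibleCost = ~0U;

// Hypothetical caller-side check: accept a block only if its duplication
// cost does not exceed the threshold.
constexpr bool shouldDuplicate(unsigned Cost, unsigned Threshold) {
  return Cost <= Threshold;
}

static_assert(ImpossibleCost == UINT_MAX, "~0U is the maximum unsigned value");
static_assert(!shouldDuplicate(ImpossibleCost, 6), "sentinel is rejected");
static_assert(shouldDuplicate(4, 6), "cheap block is accepted");
```

Under that assumption, the sentinel is rejected by any threshold below UINT_MAX, so the early return needs no separate error path.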