
Details

Reviewers

spatel
xbolva00
efriedma
lattner
RKSimon
goldstein.w.n
aeubanks

Commits

rGad7f02010f32: [InstCombine] Process blocks in RPO

Summary

InstCombine currently processes blocks in some ill-defined depth-first order. This can break the usual invariant that the operands of an instruction should be simplified before the instruction itself, if uses across basic blocks (particularly inside phi nodes) are involved.

This patch switches the initial worklist population to use RPO instead, which will ensure that predecessors are visited before successors (back-edges notwithstanding).
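The ordering guarantee can be made concrete with a minimal sketch that does not use LLVM at all. It makes the following assumptions:

- A hypothetical toy CFG is stored as an adjacency list (`CFG`), and block 0 is the entry.
- RPO is computed with an iterative DFS.
- The check is whether every edge points forward in that order.

`reversePostOrder` and `predsBeforeSuccs` are illustrative names, not LLVM APIs. LLVM's own facility is `ReversePostOrderTraversal` from `llvm/ADT/PostOrderIterator.h`, as used in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy CFG: Succs[b] lists the successors of block b; block 0 is
// the entry block.
using CFG = std::vector<std::vector<int>>;

// Reverse post-order of the blocks reachable from the entry, computed with an
// iterative depth-first search (an explicit stack instead of recursion).
std::vector<int> reversePostOrder(const CFG &Succs) {
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<int> PostOrder;
  // Each stack entry is (block, index of the next successor to explore).
  std::vector<std::pair<int, std::size_t>> Stack;
  Visited[0] = true;
  Stack.push_back({0, 0});
  while (!Stack.empty()) {
    int BB = Stack.back().first;
    std::size_t &NextSucc = Stack.back().second;
    if (NextSucc < Succs[BB].size()) {
      int S = Succs[BB][NextSucc++];
      if (!Visited[S]) {
        Visited[S] = true;
        Stack.push_back({S, 0});  // NextSucc is not used after this push
      }
    } else {
      // All successors of BB are finished, so BB is finished: emit it.
      PostOrder.push_back(BB);
      Stack.pop_back();
    }
  }
  return std::vector<int>(PostOrder.rbegin(), PostOrder.rend());
}

// True if every edge P -> S goes forward in Order, i.e. every predecessor is
// visited before its successor. Any back-edge makes this false.
bool predsBeforeSuccs(const CFG &Succs, const std::vector<int> &Order) {
  std::vector<int> Pos(Succs.size(), -1);
  for (std::size_t I = 0; I < Order.size(); ++I)
    Pos[Order[I]] = static_cast<int>(I);
  for (int BB : Order)
    for (int S : Succs[BB])
      if (Pos[S] <= Pos[BB])
        return false;
  return true;
}
```

Example results:

- **Diamond `{{1, 2}, {3}, {3}, {}}`:** the sketch yields 0, 2, 1, 3, and every edge goes forward.
- **Loop `{{1}, {2}, {1, 3}, {}}`:** the back-edge 2 -> 1 is the only edge pointing backwards. This is the "back-edges notwithstanding" caveat.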

This allows us to fold more cases within a single InstCombine iteration. The broader context here is that I want to limit InstCombine to a single iteration in the future (removing the current fix-point iteration), which will give a large (5% end-to-end) compile-time improvement without substantial optimization impact. However, this requires eliminating as many cases where we fail to reach the fix point in one iteration as possible, and the current worklist population order is one of the main issues.

In the meantime, this does cause a small compile-time regression of about 0.1% (http://llvm-compile-time-tracker.com/compare.php?from=725fcf40c3e55b2c03a1ed2326375984c0a8560f&to=8d0280338cd9409ec6fa4dbf86f4c2a9dfa57c60&stat=instructions:u), because calculating RPO is more expensive than one would think.

Diff Detail

Unit Tests: Failed

	Time	Test
	60,520 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded::vloxseg.c
	60,610 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded::vluxseg.c
	60,580 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded::vloxseg.c
	60,560 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded::vluxseg.c
	60,050 ms	x64 debian > MLIR.Examples/standalone::test.toy
	(6 tests failed in total; 5 are listed.)

Event Timeline

nikic created this revision. (Feb 28 2020, 9:04 AM)

Herald added a project: Restricted Project. (Feb 28 2020, 9:04 AM)

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B47602: Diff 247284. (Feb 28 2020, 9:19 AM)

Thanks, useful!

Did you observe any (positive) compile time changes?

test/Transforms/InstCombine/store.ll, line 2 (on Diff #247284): Since -infinite-loop-threshold now serves as a maximum threshold value, should we rename it, e.g. to -instcombine-iterations-threshold?

In D75362#1898366, @xbolva00 wrote:

Did you observe any (positive) compile time changes?

I don't expect visible improvement from (just) this change, because it affects few cases (less than 1% of our InstCombine tests). My hope is that we can remove fixpoint iteration from InstCombine entirely in the future, once all the known issues are resolved. I'm still not sure if that's realistic, but I'm getting pretty close now...

test/Transforms/InstCombine/store.ll, line 2 (on Diff #247284): Yes, the "infinite loop" terminology doesn't make a lot of sense for this usage. We also have a `-instcombine-max-iterations` setting, so it's a bit tricky to distinguish these two limits (one just stops, the other reports a fatal error).
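A hedged sketch of the distinction between the two limits: one cap ends iteration early and keeps the result, while the other treats non-convergence as a bug.

Assumptions in this sketch:

- `runToFixpoint`, `MaxIterations` and `InfiniteLoopThreshold` are hypothetical stand-ins for `-instcombine-max-iterations` and the infinite-loop threshold.
- It throws `std::runtime_error` where LLVM would report a fatal error.
- The order of the two checks is assumed.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical result type for the sketch below.
struct FixpointResult {
  unsigned Iterations;   // rounds in which RunOnce was actually called
  bool ReachedFixpoint;  // true if the last round made no change
};

// Runs RunOnce (which returns true if it changed something) until a round
// makes no change. Two independent caps, both hypothetical stand-ins:
//  - MaxIterations: give up quietly and keep what we have.
//  - InfiniteLoopThreshold: treat non-convergence as a bug; this sketch throws
//    where LLVM would report a fatal error.
FixpointResult runToFixpoint(const std::function<bool()> &RunOnce,
                             unsigned MaxIterations,
                             unsigned InfiniteLoopThreshold) {
  unsigned Iteration = 0;
  while (true) {
    ++Iteration;
    if (Iteration > InfiniteLoopThreshold)
      throw std::runtime_error("did not converge: probable infinite loop");
    if (Iteration > MaxIterations)
      return {Iteration - 1, false};
    if (!RunOnce())
      return {Iteration, true};
  }
}
```

How the caps behave:

- **`MaxIterations = 2`:** a transform that would need three rounds stops after two, with `ReachedFixpoint == false`.
- **`InfiniteLoopThreshold = 5`:** a transform that never settles makes the check before the sixth round throw.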

Reduce cost by computing RPOT only once.

In D75362#1898366, @xbolva00 wrote:

Did you observe any (positive) compile time changes?

It turned out that computing the RPO order is more expensive than I expected... After optimizing a bit, this change still clocks in as a 0.15% regression.

This looks good to me.
@spatel?

In D75362#1961474, @lebedev.ri wrote:

This looks good to me.
@spatel?

I don't have a sense of the value/cost trade-off other than what is noted in this review, so I don't have anything useful to say here.
Let's add some other reviewers and see if anyone else has experience/ideas.

A depth-first search is enough to ensure some predecessor of every block is visited before that block. So the benefit of RPO is to change that to all (non-loop) predecessors, which I guess helps optimizations involving PHI nodes?

I'm surprised this shows up in the compile-time statistics that way. Given the cost of everything else instcombine does, an RPO traversal shouldn't rank very high. Maybe worth adding timers to check whether the time is actually in this function, or we end up doing more work due to the order of instructions in the worklist. (Maybe we should be using SmallVector in ReversePostOrderTraversal?)
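The first point above can be checked on a toy diamond CFG. This sketch makes the following assumptions:

- It uses the same kind of hypothetical adjacency-list `CFG`, with block 0 as the entry.
- It mimics the old stack-based worklist population from the diff: pop a block, skip it if visited, push all successors.

`oldWorklistOrder`, `somePredFirst` and `allPredsFirst` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: Succs[b] lists the successors of block b; block 0 is
// the entry.
using CFG = std::vector<std::vector<int>>;

// Mimics the old worklist population: pop a block off a stack, skip it if it
// was already visited, otherwise record it and push all of its successors.
std::vector<int> oldWorklistOrder(const CFG &Succs) {
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<int> Worklist{0};
  std::vector<int> Order;
  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    if (Visited[BB])
      continue;
    Visited[BB] = true;
    Order.push_back(BB);
    for (int S : Succs[BB])
      Worklist.push_back(S);
  }
  return Order;
}

// Position of each block in Order (-1 if absent).
static std::vector<int> positions(const CFG &Succs,
                                  const std::vector<int> &Order) {
  std::vector<int> Pos(Succs.size(), -1);
  for (std::size_t I = 0; I < Order.size(); ++I)
    Pos[Order[I]] = static_cast<int>(I);
  return Pos;
}

// True if every non-entry block in Order has at least one predecessor that
// appears earlier in Order.
bool somePredFirst(const CFG &Succs, const std::vector<int> &Order) {
  std::vector<int> Pos = positions(Succs, Order);
  std::vector<bool> HasEarlierPred(Succs.size(), false);
  for (int BB : Order)
    for (int S : Succs[BB])
      if (Pos[BB] < Pos[S])
        HasEarlierPred[S] = true;
  for (int BB : Order)
    if (BB != 0 && !HasEarlierPred[BB])
      return false;
  return true;
}

// True if, for every edge P -> S, P appears before S in Order (which can
// only hold for an acyclic CFG).
bool allPredsFirst(const CFG &Succs, const std::vector<int> &Order) {
  std::vector<int> Pos = positions(Succs, Order);
  for (int BB : Order)
    for (int S : Succs[BB])
      if (Pos[S] <= Pos[BB])
        return false;
  return true;
}
```

For `{{1, 2}, {3}, {3}, {}}`:

- **Old scheme:** visits 0, 2, 3, 1. Block 3 has one predecessor (2) visited before it, but the other (1) comes later, so `allPredsFirst` fails.
- **RPO 0, 2, 1, 3:** satisfies both checks.

In IR terms, under the old order a phi in block 3 would be visited before an incoming value defined in block 1 has been simplified.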

nikic mentioned this in D82005: [InstCombine] Replace selects with Phis. (Jun 22 2020, 12:28 AM)

lebedev.ri requested changes to this revision. (Jul 1 2020, 7:03 AM)

This revision now requires changes to proceed. (Jul 1 2020, 7:03 AM)

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed. (Jan 12 2023, 4:46 PM)

Herald added a project: Restricted Project. (Jan 12 2023, 4:46 PM)

Herald added a subscriber: StephenFan.

Rebase. Compile-time impact is still about the same: http://llvm-compile-time-tracker.com/compare.php?from=d734edfe7c13d1b8e32d75a5df897ef0d9b69302&to=9419472abcffeed5401e27c340dfc249dbef5d18&stat=instructions:u

Herald added a subscriber: hiraditya. (Jan 13 2023, 2:37 AM)

This still seems like the right thing to do, but not pursuing it right now.

Harbormaster completed remote builds in B207578: Diff 488922. (Jan 13 2023, 3:33 AM)

nikic mentioned this in D150900: [InstCombine] Insert a bitcast to enable merging similar store insts. (May 22 2023, 5:48 AM)

Rebase and put up for review again.

Harbormaster completed remote builds in B234529: Diff 525628. (May 25 2023, 9:30 AM)

Rebase

Harbormaster completed remote builds in B235290: Diff 526613. (May 30 2023, 8:35 AM)

Rebase due to test changes

Harbormaster completed remote builds in B240509: Diff 533610. (Jun 22 2023, 9:09 AM)

nikic added a child revision: D154579: [InstCombine] Only perform one iteration. (Jul 6 2023, 1:40 AM)

nikic added a reviewer: aeubanks. (Jul 28 2023, 10:44 AM)

lgtm, assuming it's still necessary for D154579

This revision is now accepted and ready to land. (Jul 28 2023, 10:44 AM)

Closed by commit rGad7f02010f32: [InstCombine] Process blocks in RPO (authored by nikic). (Jul 30 2023, 9:39 AM)

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rGad7f02010f32: [InstCombine] Process blocks in RPO.

nikic mentioned this in rG72ec2c007e4c: [InstCombine] Fix handling of irreducible loops (PR64259). (Jul 31 2023, 7:20 AM)

Diff 488922

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

@@ 32 unchanged lines not shown @@
 //===----------------------------------------------------------------------===//

 #include "InstCombineInternal.h"
 #include "llvm-c/Initialization.h"
 #include "llvm-c/Transforms/InstCombine.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
 #include "llvm/Analysis/BlockFrequencyInfo.h"
 #include "llvm/Analysis/CFG.h"
@@ 4,392 unchanged lines not shown @@ if (auto *MD = dyn_cast<MDNode>(MDOperand))
       return !UsedAliasScopesAndLists.contains(MD) ||
              !UsedNoAliasScopesAndLists.contains(MD);

     // Not an MDNode ? throw away.
     return true;
   }
 };

-/// Populate the IC worklist from a function, by walking it in depth-first
-/// order and adding all reachable code to the worklist.
+/// Populate the IC worklist from a function, by walking it in reverse
+/// post-order and adding all reachable code to the worklist.
 ///
 /// This has a couple of tricks to make the code faster and more powerful. In
 /// particular, we constant fold and DCE instructions as we go, to avoid adding
 /// them to the worklist (this significantly speeds up instcombine on code where
 /// many instructions are dead or constant). Additionally, if we find a branch
 /// whose condition is a known constant, we only visit the reachable successors.
-static bool prepareICWorklistFromFunction(Function &F, const DataLayout &DL,
-                                          const TargetLibraryInfo *TLI,
-                                          InstructionWorklist &ICWorklist) {
+static bool
+prepareICWorklistFromFunction(Function &F, const DataLayout &DL,
+                              const TargetLibraryInfo *TLI,
+                              InstructionWorklist &ICWorklist,
+                              ReversePostOrderTraversal<BasicBlock *> &RPOT) {
   bool MadeIRChange = false;
-  SmallPtrSet<BasicBlock *, 32> Visited;
-  SmallVector<BasicBlock*, 256> Worklist;
-  Worklist.push_back(&F.front());
+  SmallPtrSet<BasicBlock *, 32> LiveBlocks;
+  LiveBlocks.insert(&F.front());

   SmallVector<Instruction *, 128> InstrsForInstructionWorklist;
   DenseMap<Constant *, Constant *> FoldedConstants;
   AliasScopeTracker SeenAliasScopes;

-  do {
-    BasicBlock *BB = Worklist.pop_back_val();
-
-    // We have now visited this block! If we've already been here, ignore it.
-    if (!Visited.insert(BB).second)
+  for (BasicBlock *BB : RPOT) {
+    if (!LiveBlocks.count(BB))
       continue;

     for (Instruction &Inst : llvm::make_early_inc_range(*BB)) {
       // ConstantProp instruction if trivially constant.
       if (!Inst.use_empty() &&
           (Inst.getNumOperands() == 0 || isa<Constant>(Inst.getOperand(0))))
         if (Constant *C = ConstantFoldInstruction(&Inst, DL, TLI)) {
           LLVM_DEBUG(dbgs() << "IC: ConstFold to: " << *C << " from: " << Inst
@@ 29 unchanged lines not shown @@ for (Instruction &Inst : llvm::make_early_inc_range(*BB)) {
       // these call instructions consumes non-trivial amount of time and
       // provides no value for the optimization.
       if (!Inst.isDebugOrPseudoInst()) {
         InstrsForInstructionWorklist.push_back(&Inst);
         SeenAliasScopes.analyse(&Inst);
       }
     }

-    // Recursively visit successors. If this is a branch or switch on a
-    // constant, only visit the reachable successor.
+    // If this is a branch or switch on a constant, mark only the single
+    // live successor. Otherwise assume all successors are live.
     Instruction *TI = BB->getTerminator();
     if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
       if (BI->isConditional() && isa<ConstantInt>(BI->getCondition())) {
         bool CondVal = cast<ConstantInt>(BI->getCondition())->getZExtValue();
         BasicBlock *ReachableBB = BI->getSuccessor(!CondVal);
-        Worklist.push_back(ReachableBB);
+        LiveBlocks.insert(ReachableBB);
         continue;
       }
     } else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
       if (ConstantInt *Cond = dyn_cast<ConstantInt>(SI->getCondition())) {
-        Worklist.push_back(SI->findCaseValue(Cond)->getCaseSuccessor());
+        LiveBlocks.insert(SI->findCaseValue(Cond)->getCaseSuccessor());
         continue;
       }
     }

-    append_range(Worklist, successors(TI));
-  } while (!Worklist.empty());
+    for (BasicBlock *SuccBB : successors(TI))
+      LiveBlocks.insert(SuccBB);
+  }

   // Remove instructions inside unreachable blocks. This prevents the
   // instcombine code from having to deal with some bad special cases, and
   // reduces use counts of instructions.
   for (BasicBlock &BB : F) {
-    if (Visited.count(&BB))
+    if (LiveBlocks.count(&BB))
       continue;

     unsigned NumDeadInstInBB;
     unsigned NumDeadDbgInstInBB;
     std::tie(NumDeadInstInBB, NumDeadDbgInstInBB) =
         removeAllNonTerminatorAndEHPadInstructions(&BB);

     MadeIRChange |= NumDeadInstInBB + NumDeadDbgInstInBB > 0;
@@ 38 unchanged lines not shown @@ static bool combineInstructionsOverFunction(
   IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(
       F.getContext(), TargetFolder(DL),
       IRBuilderCallbackInserter([&Worklist, &AC](Instruction *I) {
         Worklist.add(I);
         if (auto *Assume = dyn_cast<AssumeInst>(I))
           AC.registerAssumption(Assume);
       }));

+  ReversePostOrderTraversal<BasicBlock *> RPOT(&F.front());
+
   // Lower dbg.declare intrinsics otherwise their value may be clobbered
   // by instcombiner.
   bool MadeIRChange = false;
   if (ShouldLowerDbgDeclare)
     MadeIRChange = LowerDbgDeclare(F);
   // LowerDbgDeclare calls RemoveRedundantDbgInstrs, but LowerDbgDeclare will
   // almost never return true when running an assignment tracking build. Take
   // this opportunity to do some clean up for assignment tracking builds too.
@@ 19 unchanged lines not shown @@ if (Iteration > MaxIterations) {
                         << " on " << F.getName()
                         << " reached; stopping before reaching a fixpoint\n");
       break;
     }

     LLVM_DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "
                       << F.getName() << "\n");

-    MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist);
+    MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist, RPOT);

     InstCombinerImpl IC(Worklist, Builder, F.hasMinSize(), AA, AC, TLI, TTI, DT,
                         ORE, BFI, PSI, DL, LI);
     IC.MaxArraySizeForCombine = MaxArraySize;

     if (!IC.run())
       break;

@@ 127 unchanged lines not shown @@
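The new loop in `prepareICWorklistFromFunction` combines the RPO walk with liveness marking. Below is a minimal sketch of just that control flow, with these assumptions:

- A hypothetical `Block` type whose `ConstantTarget` stands in for a branch or switch on a constant condition.
- A precomputed RPO passed in as block indices.

The diff itself uses `ReversePostOrderTraversal<BasicBlock *>` and `SmallPtrSet<BasicBlock *, 32> LiveBlocks` instead. `markLiveBlocksInRPO` is an illustrative name only.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical toy basic block: its successor indices, plus the successor a
// branch/switch on a known-constant condition would take (if it folds).
struct Block {
  std::vector<int> Succs;
  std::optional<int> ConstantTarget;
};

// Walks the blocks in the given reverse post-order (RPO[0] is the entry) and
// returns which blocks were marked live. Mirrors the shape of the diff's loop:
// a block is only processed if an earlier block marked it live, a constant
// terminator marks just its taken successor, and any other terminator marks
// all successors.
std::vector<bool> markLiveBlocksInRPO(const std::vector<Block> &Blocks,
                                      const std::vector<int> &RPO) {
  std::vector<bool> Live(Blocks.size(), false);
  Live[RPO.front()] = true;
  for (int BB : RPO) {
    if (!Live[BB])
      continue;  // not (yet) known to be reachable: skip, as the diff does
    // (In InstCombine, the block's instructions would be queued here.)
    if (Blocks[BB].ConstantTarget) {
      Live[*Blocks[BB].ConstantTarget] = true;
      continue;
    }
    for (int S : Blocks[BB].Succs)
      Live[S] = true;
  }
  return Live;
}
```

Consider an entry block that branches on a constant to block 1, walked in RPO 0, 2, 1, 3. Block 2 is skipped, and block 3 is still marked live through block 1.

A block is only marked live by blocks earlier in the walk. So a block whose only live incoming edge comes from later in the RPO is skipped even though it is reachable. This can happen with irreducible loops. The timeline mentions a follow-up commit, rG72ec2c007e4c, that fixed the handling of irreducible loops.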

llvm/test/Transforms/InstCombine/pr44245.ll

@@ 152 unchanged lines not shown @@
 ; CHECK-NEXT: br label [[WHILE_COND:%.*]]
 ; CHECK: while.cond:
 ; CHECK-NEXT: br label [[FOR_COND:%.*]]
 ; CHECK: for.cond:
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[COND_TRUE133:%.*]], label [[COND_FALSE138:%.*]]
 ; CHECK: cond.true133:
 ; CHECK-NEXT: br label [[COND_END144:%.*]]
 ; CHECK: cond.false138:
-; CHECK-NEXT: store ptr poison, ptr null, align 4294967296
 ; CHECK-NEXT: br label [[COND_END144]]
 ; CHECK: cond.end144:
+; CHECK-NEXT: store ptr poison, ptr null, align 4294967296
 ; CHECK-NEXT: br label [[WHILE_COND]]
 ;
 entry:
   br label %while.cond

 while.cond: ; preds = %cond.end144, %entry
   %link.0 = phi ptr [ undef, %entry ], [ %cond145, %cond.end144 ]
   br label %for.cond
@@ 16 unchanged lines not shown @@

llvm/test/Transforms/InstCombine/select.ll

@@ 954 unchanged lines not shown @@
 ; PR14131
 define void @test64(i32 %p, i16 %b, i1 %c1) noreturn {
 ; CHECK-LABEL: @test64(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 [[C1:%.*]], label [[LOR_RHS:%.*]], label [[LOR_END:%.*]]
 ; CHECK: lor.rhs:
 ; CHECK-NEXT: br label [[LOR_END]]
 ; CHECK: lor.end:
-; CHECK-NEXT: br i1 true, label [[COND_END17:%.*]], label [[COND_FALSE16:%.*]]
+; CHECK-NEXT: br i1 poison, label [[COND_END17:%.*]], label [[COND_FALSE16:%.*]]
 ; CHECK: cond.false16:
 ; CHECK-NEXT: br label [[COND_END17]]
 ; CHECK: cond.end17:
 ; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
 ; CHECK: while.body:
 ; CHECK-NEXT: br label [[WHILE_BODY]]
 ;
 entry:
@@ 2,427 unchanged lines not shown @@

llvm/test/Transforms/InstCombine/store.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+; RUN: opt < %s -passes=instcombine -instcombine-infinite-loop-threshold=2 -S | FileCheck %s

 ; FIXME: This is technically incorrect because it might overwrite a poison
 ; value. Stop folding it once #52930 is resolved.
 define void @store_of_undef(ptr %P) {
 ; CHECK-LABEL: @store_of_undef(
 ; CHECK-NEXT: ret void
 ;
   store i32 undef, ptr %P
@@ 341 unchanged lines not shown @@

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Process blocks in RPO
Closed, Public
