Details

Reviewers

spatel
xbolva00
efriedma
lattner
RKSimon
goldstein.w.n
aeubanks

Commits

rGad7f02010f32: [InstCombine] Process blocks in RPO

Summary

InstCombine currently processes blocks in some ill-defined depth-first order. This can break the usual invariant that the operands of an instruction should be simplified before the instruction itself when uses across basic blocks (particularly in phi nodes) are involved.

This patch switches the initial worklist population to use RPO instead, which will ensure that predecessors are visited before successors (back-edges notwithstanding).
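As a rough illustration of the ordering property described above (this is not the LLVM implementation, which uses `ReversePostOrderTraversal`), the following self-contained C++ sketch computes a reverse post-order over a toy CFG. The `CFG` alias and the `postOrder` and `reversePostOrder` helpers are hypothetical names introduced for this example, and blocks are plain indices rather than `BasicBlock *`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: Succs[i] lists the successors of block i.
using CFG = std::vector<std::vector<std::size_t>>;

// Recursive DFS that emits a block only after all of its successors
// reachable through unvisited edges have been emitted (post-order).
static void postOrder(const CFG &Succs, std::size_t BB,
                      std::vector<bool> &Seen,
                      std::vector<std::size_t> &Out) {
  Seen[BB] = true;
  for (std::size_t S : Succs[BB])
    if (!Seen[S])
      postOrder(Succs, S, Seen, Out);
  Out.push_back(BB);
}

// Reverse post-order starting from Entry. Ignoring back-edges, every
// block appears after all of its predecessors.
std::vector<std::size_t> reversePostOrder(const CFG &Succs,
                                          std::size_t Entry) {
  assert(Entry < Succs.size() && "entry block must exist");
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<std::size_t> Order;
  postOrder(Succs, Entry, Seen, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For a CFG shaped like entry -> {then, exit}, then -> exit (indices 0, 1, 2), this returns 0, 1, 2, so the join block comes after both of its predecessors.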

This allows us to fold more cases within a single InstCombine iteration. The broader context here is that I want to limit InstCombine to a single iteration in the future (removing the current fix-point iteration), which will give a large (5% end-to-end) compile-time improvement without substantial optimization impact. However, this requires eliminating as many cases where we fail to reach the fix point in one iteration as possible, and the current worklist population order is one of the main issues.

In the meantime, this does cause a small compile-time regression of about 0.1% (http://llvm-compile-time-tracker.com/compare.php?from=725fcf40c3e55b2c03a1ed2326375984c0a8560f&to=8d0280338cd9409ec6fa4dbf86f4c2a9dfa57c60&stat=instructions:u), because calculating RPO is more expensive than one would think.

Diff Detail

Event Timeline

nikic created this revision. Feb 28 2020, 9:04 AM

Herald added a project: Restricted Project. Feb 28 2020, 9:04 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B47602: Diff 247284. Feb 28 2020, 9:19 AM

Thanks, useful!

Did you observe any (positive) compile time changes?

test/Transforms/InstCombine/store.ll
2	since -infinite-loop-threshold now serves as a max threshold value, should we rename it? -instcombine-iterations-threshold?

In D75362#1898366, @xbolva00 wrote:

Did you observe any (positive) compile time changes?

I don't expect visible improvement from (just) this change, because it affects few cases (less than 1% of our InstCombine tests). My hope is that we can remove fixpoint iteration from InstCombine entirely in the future, once all the known issues are resolved. I'm still not sure if that's realistic, but I'm getting pretty close now...

test/Transforms/InstCombine/store.ll
2	Yes, the "infinite loop" terminology doesn't make a lot of sense for this usage. We also have a `-instcombine-max-iterations` setting, so it's a bit tricky to distinguish these two limits (one just stops, the other reports a fatal error).
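The difference between the two limits mentioned above ("one just stops, the other reports a fatal error") can be sketched as follows. This is a simplified model under stated assumptions, not InstCombine's actual driver loop: `Outcome`, `iterate`, and `RunOnce` are hypothetical names, the order of the two checks is an assumption, and the fatal error is represented by a return value so the example stays self-contained.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result type for this sketch only.
enum class Outcome { ReachedFixpoint, StoppedAtMax, FatalStuck };

// RunOnce performs one combine iteration and returns true if it changed
// anything. MaxIterations and InfiniteLoopThreshold stand in for the two
// command-line limits discussed in the comment above.
Outcome iterate(const std::function<bool()> &RunOnce, unsigned MaxIterations,
                unsigned InfiniteLoopThreshold) {
  assert(RunOnce && "need a callable iteration step");
  for (unsigned Iteration = 1;; ++Iteration) {
    if (Iteration > InfiniteLoopThreshold)
      return Outcome::FatalStuck;   // Stand-in for reporting a fatal error.
    if (Iteration > MaxIterations)
      return Outcome::StoppedAtMax; // Quietly stop iterating.
    if (!RunOnce())
      return Outcome::ReachedFixpoint; // Nothing changed: done.
  }
}
```

Under this model, a threshold of 2 (as in the updated `store.ll` RUN line) only passes if the second iteration makes no further changes.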

Reduce cost by computing RPOT only once.

In D75362#1898366, @xbolva00 wrote:

Did you observe any (positive) compile time changes?

It turned out that computing the RPO order is more expensive than I expected... After optimizing a bit, this change still clocks in as a 0.15% regression.

This looks good to me.
@spatel?

In D75362#1961474, @lebedev.ri wrote:

This looks good to me.
@spatel?

I don't have a sense of the value/cost trade-off other than what is noted in this review, so I don't have anything useful to say here.
Let's add some other reviewers and see if anyone else has experience/ideas.

A depth-first search is enough to ensure some predecessor of every block is visited before that block. So the benefit of RPO is to change that to all (non-loop) predecessors, which I guess helps optimizations involving PHI nodes?

I'm surprised this shows up in the compile-time statistics that way. Given the cost of everything else instcombine does, an RPO traversal shouldn't rank very high. Maybe worth adding timers to check whether the time is actually spent in this function, or whether we end up doing more work due to the order of instructions in the worklist. (Maybe we should be using SmallVector in ReversePostOrderTraversal?)
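To make the "some predecessor" versus "all predecessors" distinction concrete, here is a small sketch of a stack-based worklist walk modeled loosely on the pre-patch loop (pop a block, skip it if already visited, push all successors). The `CFG` alias and `worklistOrder` are hypothetical names for this example only, and blocks are plain indices rather than `BasicBlock *`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: Succs[i] lists the successors of block i.
using CFG = std::vector<std::vector<std::size_t>>;

// Visit order of a stack-based worklist walk: pop a block, skip it if it
// was already visited, otherwise record it and push all of its successors.
std::vector<std::size_t> worklistOrder(const CFG &Succs, std::size_t Entry) {
  assert(Entry < Succs.size() && "entry block must exist");
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<std::size_t> Worklist{Entry};
  std::vector<std::size_t> Order;
  while (!Worklist.empty()) {
    std::size_t BB = Worklist.back();
    Worklist.pop_back();
    if (Visited[BB])
      continue;
    Visited[BB] = true;
    Order.push_back(BB);
    for (std::size_t S : Succs[BB])
      Worklist.push_back(S);
  }
  return Order;
}
```

On entry -> {then, exit}, then -> exit (indices 0, 1, 2) the walk yields 0, 2, 1: the join block `exit` is processed before its predecessor `then`, which is the situation a reverse post-order avoids.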

nikic mentioned this in D82005: [InstCombine] Replace selects with Phis. Jun 22 2020, 12:28 AM

lebedev.ri requested changes to this revision. Jul 1 2020, 7:03 AM

This revision now requires changes to proceed. Jul 1 2020, 7:03 AM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed. Jan 12 2023, 4:46 PM

Herald added a project: Restricted Project. Jan 12 2023, 4:46 PM

Herald added a subscriber: StephenFan.

Rebase. Compile-time impact is still about the same: http://llvm-compile-time-tracker.com/compare.php?from=d734edfe7c13d1b8e32d75a5df897ef0d9b69302&to=9419472abcffeed5401e27c340dfc249dbef5d18&stat=instructions:u

Herald added a subscriber: hiraditya. Jan 13 2023, 2:37 AM

This still seems like the right thing to do, but I'm not pursuing it right now.

Harbormaster completed remote builds in B207578: Diff 488922. Jan 13 2023, 3:33 AM

nikic mentioned this in D150900: [InstCombine] Insert a bitcast to enable merging similar store insts. May 22 2023, 5:48 AM

Rebase and put up for review again.

Harbormaster completed remote builds in B234529: Diff 525628. May 25 2023, 9:30 AM

Rebase

Harbormaster completed remote builds in B235290: Diff 526613. May 30 2023, 8:35 AM

Rebase due to test changes

Harbormaster completed remote builds in B240509: Diff 533610. Jun 22 2023, 9:09 AM

nikic added a child revision: D154579: [InstCombine] Only perform one iteration. Jul 6 2023, 1:40 AM

nikic added a reviewer: aeubanks. Jul 28 2023, 10:44 AM

lgtm, assuming it's still necessary for D154579

This revision is now accepted and ready to land. Jul 28 2023, 10:44 AM

Closed by commit rGad7f02010f32: [InstCombine] Process blocks in RPO (authored by nikic). Jul 30 2023, 9:39 AM

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rGad7f02010f32: [InstCombine] Process blocks in RPO.

nikic mentioned this in rG72ec2c007e4c: [InstCombine] Fix handling of irreducible loops (PR64259). Jul 31 2023, 7:20 AM

Diff 247284

lib/Transforms/InstCombine/InstructionCombining.cpp

[3,568 lines not shown]
if (Instruction *Result = visit(*I)) {		if (Instruction *Result = visit(*I)) {
MadeIRChange = true;		MadeIRChange = true;
}		}
}		}

Worklist.zap();		Worklist.zap();
return MadeIRChange;		return MadeIRChange;
}		}

/// Walk the function in depth-first order, adding all reachable code to the		/// Walk the function in reverse post-order, adding all reachable code to the
/// worklist.		/// worklist.
///		///
/// This has a couple of tricks to make the code faster and more powerful. In		/// This has a couple of tricks to make the code faster and more powerful. In
/// particular, we constant fold and DCE instructions as we go, to avoid adding		/// particular, we constant fold and DCE instructions as we go, to avoid adding
/// them to the worklist (this significantly speeds up instcombine on code where		/// them to the worklist (this significantly speeds up instcombine on code where
/// many instructions are dead or constant). Additionally, if we find a branch		/// many instructions are dead or constant). Additionally, if we find a branch
/// whose condition is a known constant, we only visit the reachable successors.		/// whose condition is a known constant, we only visit the reachable successors.
static bool AddReachableCodeToWorklist(BasicBlock *BB, const DataLayout &DL,		static bool AddReachableCodeToWorklist(
SmallPtrSetImpl<BasicBlock *> &Visited,		BasicBlock *BB, const DataLayout &DL,
InstCombineWorklist &ICWorklist,		SmallPtrSetImpl<BasicBlock *> &LiveBlocks, InstCombineWorklist &ICWorklist,
const TargetLibraryInfo *TLI) {		const TargetLibraryInfo *TLI) {
bool MadeIRChange = false;		bool MadeIRChange = false;
SmallVector<BasicBlock*, 256> Worklist;
Worklist.push_back(BB);

SmallVector<Instruction*, 128> InstrsForInstCombineWorklist;		SmallVector<Instruction*, 128> InstrsForInstCombineWorklist;
DenseMap<Constant *, Constant *> FoldedConstants;		DenseMap<Constant *, Constant *> FoldedConstants;

do {		ReversePostOrderTraversal<BasicBlock *> RPOT(BB);
BB = Worklist.pop_back_val();		LiveBlocks.insert(BB);

// We have now visited this block! If we've already been here, ignore it.		for (BasicBlock *BB : RPOT) {
if (!Visited.insert(BB).second)		if (!LiveBlocks.count(BB))
continue;		continue;

for (BasicBlock::iterator BBI = BB->begin(), E = BB->end(); BBI != E; ) {		for (BasicBlock::iterator BBI = BB->begin(), E = BB->end(); BBI != E; ) {
Instruction *Inst = &*BBI++;		Instruction *Inst = &*BBI++;

// ConstantProp instruction if trivially constant.		// ConstantProp instruction if trivially constant.
if (!Inst->use_empty() &&		if (!Inst->use_empty() &&
(Inst->getNumOperands() == 0 || isa<Constant>(Inst->getOperand(0))))		(Inst->getNumOperands() == 0 || isa<Constant>(Inst->getOperand(0))))
[30 lines not shown]
}		}

// Skip processing debug intrinsics in InstCombine. Processing these call instructions		// Skip processing debug intrinsics in InstCombine. Processing these call instructions
// consumes non-trivial amount of time and provides no value for the optimization.		// consumes non-trivial amount of time and provides no value for the optimization.
if (!isa<DbgInfoIntrinsic>(Inst))		if (!isa<DbgInfoIntrinsic>(Inst))
InstrsForInstCombineWorklist.push_back(Inst);		InstrsForInstCombineWorklist.push_back(Inst);
}		}

// Recursively visit successors. If this is a branch or switch on a		// If this is a branch or switch on a constant, mark only the single
// constant, only visit the reachable successor.		// live successor. Otherwise assume all successors are live.
Instruction *TI = BB->getTerminator();		Instruction *TI = BB->getTerminator();
if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {		if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
if (BI->isConditional() && isa<ConstantInt>(BI->getCondition())) {		if (BI->isConditional() && isa<ConstantInt>(BI->getCondition())) {
bool CondVal = cast<ConstantInt>(BI->getCondition())->getZExtValue();		bool CondVal = cast<ConstantInt>(BI->getCondition())->getZExtValue();
BasicBlock *ReachableBB = BI->getSuccessor(!CondVal);		BasicBlock *ReachableBB = BI->getSuccessor(!CondVal);
Worklist.push_back(ReachableBB);		LiveBlocks.insert(ReachableBB);
continue;		continue;
}		}
} else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {		} else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
if (ConstantInt *Cond = dyn_cast<ConstantInt>(SI->getCondition())) {		if (ConstantInt *Cond = dyn_cast<ConstantInt>(SI->getCondition())) {
Worklist.push_back(SI->findCaseValue(Cond)->getCaseSuccessor());		LiveBlocks.insert(SI->findCaseValue(Cond)->getCaseSuccessor());
continue;		continue;
}		}
}		}

for (BasicBlock *SuccBB : successors(TI))		for (BasicBlock *SuccBB : successors(TI))
Worklist.push_back(SuccBB);		LiveBlocks.insert(SuccBB);
} while (!Worklist.empty());		}

// Once we've found all of the instructions to add to instcombine's worklist,		// Once we've found all of the instructions to add to instcombine's worklist,
// add them in reverse order. This way instcombine will visit from the top		// add them in reverse order. This way instcombine will visit from the top
// of the function down. This jives well with the way that it adds all uses		// of the function down. This jives well with the way that it adds all uses
// of instructions to the worklist after doing a transformation, thus avoiding		// of instructions to the worklist after doing a transformation, thus avoiding
// some N^2 behavior in pathological cases.		// some N^2 behavior in pathological cases.
ICWorklist.reserve(InstrsForInstCombineWorklist.size());		ICWorklist.reserve(InstrsForInstCombineWorklist.size());
for (Instruction *Inst : reverse(InstrsForInstCombineWorklist)) {		for (Instruction *Inst : reverse(InstrsForInstCombineWorklist)) {
[22 lines not shown]
static bool prepareICWorklistFromFunction(Function &F, const DataLayout &DL,		static bool prepareICWorklistFromFunction(Function &F, const DataLayout &DL,
TargetLibraryInfo *TLI,		TargetLibraryInfo *TLI,
InstCombineWorklist &ICWorklist) {		InstCombineWorklist &ICWorklist) {
bool MadeIRChange = false;		bool MadeIRChange = false;

// Do a depth-first traversal of the function, populate the worklist with		// Do a depth-first traversal of the function, populate the worklist with
// the reachable instructions. Ignore blocks that are not reachable. Keep		// the reachable instructions. Ignore blocks that are not reachable. Keep
// track of which blocks we visit.		// track of which blocks we visit.
SmallPtrSet<BasicBlock *, 32> Visited;		SmallPtrSet<BasicBlock *, 32> LiveBlocks;
MadeIRChange |=		MadeIRChange |=
AddReachableCodeToWorklist(&F.front(), DL, Visited, ICWorklist, TLI);		AddReachableCodeToWorklist(&F.front(), DL, LiveBlocks, ICWorklist, TLI);

// Do a quick scan over the function. If we find any blocks that are		// Do a quick scan over the function. If we find any blocks that are
// unreachable, remove any instructions inside of them. This prevents		// unreachable, remove any instructions inside of them. This prevents
// the instcombine code from having to deal with some bad special cases.		// the instcombine code from having to deal with some bad special cases.
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
if (Visited.count(&BB))		if (LiveBlocks.count(&BB))
continue;		continue;

unsigned NumDeadInstInBB = removeAllNonTerminatorAndEHPadInstructions(&BB);		unsigned NumDeadInstInBB = removeAllNonTerminatorAndEHPadInstructions(&BB);
MadeIRChange |= NumDeadInstInBB > 0;		MadeIRChange |= NumDeadInstInBB > 0;
NumDeadInst += NumDeadInstInBB;		NumDeadInst += NumDeadInstInBB;
}		}

return MadeIRChange;		return MadeIRChange;
[193 lines not shown]
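The liveness handling on the new side of the diff above (walk blocks in reverse post-order, skip blocks not yet marked live, and mark only the taken successor of a branch on a constant) can be sketched on a toy model. This is a simplified illustration, not the patch itself: `Block`, `KnownTaken`, and `liveBlocksInRPO` are hypothetical names, blocks are indices, and the reverse post-order is supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block model: Succs lists successor indices. If KnownTaken
// is non-negative, the terminator branches on a constant and only
// Succs[KnownTaken] is considered live.
struct Block {
  std::vector<std::size_t> Succs;
  int KnownTaken = -1;
};

// Given blocks already listed in reverse post-order (RPO[0] is the entry),
// return the blocks that get processed, in processing order.
std::vector<std::size_t> liveBlocksInRPO(const std::vector<Block> &Blocks,
                                         const std::vector<std::size_t> &RPO) {
  assert(!RPO.empty() && "need at least the entry block");
  std::vector<bool> Live(Blocks.size(), false);
  Live[RPO.front()] = true;
  std::vector<std::size_t> Processed;
  for (std::size_t BB : RPO) {
    if (!Live[BB])
      continue; // No live predecessor has been seen so far.
    Processed.push_back(BB);
    const Block &B = Blocks[BB];
    if (B.KnownTaken >= 0) {
      // Branch on a constant: only the taken successor becomes live.
      Live[B.Succs[static_cast<std::size_t>(B.KnownTaken)]] = true;
      continue;
    }
    // Otherwise assume all successors are live.
    for (std::size_t S : B.Succs)
      Live[S] = true;
  }
  return Processed;
}
```

Note that a block whose only live predecessor appears later in the order is skipped by this sketch; the timeline references a follow-up commit for irreducible loops, which this example does not attempt to model.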

test/Transforms/InstCombine/icmp-div-constant.ll

	[143 lines not shown]
	define i32 @icmp_div2(i16 %a, i16 %c) {			define i32 @icmp_div2(i16 %a, i16 %c) {
	; CHECK-LABEL: @icmp_div2(			; CHECK-LABEL: @icmp_div2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i16 [[A:%.*]], 0			; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i16 [[A:%.*]], 0
	; CHECK-NEXT: br i1 [[TOBOOL]], label [[THEN:%.*]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[TOBOOL]], label [[THEN:%.*]], label [[EXIT:%.*]]
	; CHECK: then:			; CHECK: then:
	; CHECK-NEXT: br label [[EXIT]]			; CHECK-NEXT: br label [[EXIT]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ -1, [[ENTRY:%.*]] ], [ 0, [[THEN]] ]			; CHECK-NEXT: ret i32 -1
	; CHECK-NEXT: ret i32 [[PHI]]
	;			;
	entry:			entry:
	%tobool = icmp eq i16 %a, 0			%tobool = icmp eq i16 %a, 0
	br i1 %tobool, label %then, label %exit			br i1 %tobool, label %then, label %exit

	then:			then:
	%div = sdiv i16 %c, 0			%div = sdiv i16 %c, 0
	%cmp = icmp ne i16 %div, 0			%cmp = icmp ne i16 %div, 0
	[38 lines not shown]

test/Transforms/InstCombine/pr44245.ll

	[151 lines not shown]
	; CHECK-LABEL: @test_2(			; CHECK-LABEL: @test_2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[WHILE_COND:%.*]]			; CHECK-NEXT: br label [[WHILE_COND:%.*]]
	; CHECK: while.cond:			; CHECK: while.cond:
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: br i1 [[C:%.*]], label [[COND_TRUE133:%.*]], label [[COND_FALSE138:%.*]]			; CHECK-NEXT: br i1 [[C:%.*]], label [[COND_TRUE133:%.*]], label [[COND_FALSE138:%.*]]
	; CHECK: cond.true133:			; CHECK: cond.true133:
				; CHECK-NEXT: store %type_2* undef, %type_2** null, align 536870912
	; CHECK-NEXT: br label [[COND_END144:%.*]]			; CHECK-NEXT: br label [[COND_END144:%.*]]
	; CHECK: cond.false138:			; CHECK: cond.false138:
	; CHECK-NEXT: store %type_2* undef, %type_2** null, align 536870912			; CHECK-NEXT: store %type_2* undef, %type_2** null, align 536870912
	; CHECK-NEXT: br label [[COND_END144]]			; CHECK-NEXT: br label [[COND_END144]]
	; CHECK: cond.end144:			; CHECK: cond.end144:
	; CHECK-NEXT: br label [[WHILE_COND]]			; CHECK-NEXT: br label [[WHILE_COND]]
	;			;
	entry:			entry:
	[24 lines not shown]

test/Transforms/InstCombine/store.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -instcombine-infinite-loop-threshold=2 -S | FileCheck %s

	define void @test1(i32* %P) {			define void @test1(i32* %P) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: store i32 123, i32* undef, align 4			; CHECK-NEXT: store i32 123, i32* undef, align 4
	; CHECK-NEXT: store i32 undef, i32* null, align 536870912			; CHECK-NEXT: store i32 undef, i32* null, align 536870912
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i32 undef, i32* %P			store i32 undef, i32* %P
	[298 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Process blocks in RPO
Closed, Public
