This is an archive of the discontinued LLVM Phabricator instance.


Differential D151807

[InstCombine] Revisit user of newly one-use instructions
Closed, Public

Authored by nikic on May 31 2023, 7:50 AM.


Details

Reviewers

goldstein.w.n
RKSimon
fhahn

Commits

rGf7a977c7b3b4: [InstCombine] Revisit user of newly one-use instructions

Summary

Many folds in InstCombine are limited to one-use instructions. For that reason, if the use-count of an instruction drops to one, it makes sense to revisit that one user. This is one of the most common reasons why InstCombine fails to finish in a single iteration.

Doing this revisit actually slightly improves compile-time (http://llvm-compile-time-tracker.com/compare.php?from=97f0e7b06e6b76fd85fb81b8c12eba2255ff1742&to=fc740dee13f42a948b378d731e666d0a80d16061&stat=instructions:u), because we save an extra InstCombine iteration in enough cases to make a visible difference.

This is conceptually NFC, but not NFC in practice, because differences in worklist order can result in slightly different folding behavior.
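The revisit described above can be sketched in self-contained C++. This is only a toy model under stated assumptions, not the LLVM implementation: `ToyInst`, `ToyWorklist`, `eraseDeadInst`, and `addUse` are hypothetical names invented for illustration, and the use list is modelled as a plain vector with one entry per use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>
#include <vector>

// Toy stand-in for an IR instruction: it only tracks operands and users.
struct ToyInst {
  int Id;
  std::vector<ToyInst *> Operands;
  std::vector<ToyInst *> Users; // one entry per use
};

// Toy worklist that ignores duplicate pushes (hypothetical, for illustration).
class ToyWorklist {
  std::deque<ToyInst *> Queue;
  std::unordered_set<ToyInst *> Present;

public:
  void add(ToyInst *I) {
    if (Present.insert(I).second)
      Queue.push_back(I);
  }
  bool contains(ToyInst *I) const { return Present.count(I) != 0; }
  std::size_t size() const { return Queue.size(); }

  // Models the idea from the summary: requeue the value whose use count
  // dropped and, if exactly one use is left, requeue that remaining user too.
  void handleUseCountDecrement(ToyInst *V) {
    add(V);
    if (V->Users.size() == 1)
      add(V->Users.front());
  }
};

// Hypothetical helper: record that User has Op as an operand.
void addUse(ToyInst &User, ToyInst &Op) {
  User.Operands.push_back(&Op);
  Op.Users.push_back(&User);
}

// Erase a dead instruction: copy its operands, unlink it from their use
// lists, then notify the worklist about every operand that lost a use.
void eraseDeadInst(ToyInst &Dead, ToyWorklist &WL) {
  assert(Dead.Users.empty() && "cannot erase an instruction that is used");
  std::vector<ToyInst *> Ops = Dead.Operands; // copy before unlinking
  for (ToyInst *Op : Ops) {
    auto &U = Op->Users;
    U.erase(std::find(U.begin(), U.end(), &Dead));
  }
  Dead.Operands.clear();
  for (ToyInst *Op : Ops)
    WL.handleUseCountDecrement(Op);
}
```

For example, if `A` is used by both `B` and `C` and `C` is erased, the toy worklist ends up holding `A` and its single remaining user `B`, so a one-use fold rooted at `B` would get a chance to fire without waiting for another full iteration.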

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision. May 31 2023, 7:50 AM

Herald added a project: Restricted Project. May 31 2023, 7:50 AM

Herald added subscribers: StephenFan, hiraditya.

nikic requested review of this revision. May 31 2023, 7:50 AM

Herald added a project: Restricted Project. May 31 2023, 7:50 AM

Herald added a subscriber: llvm-commits.

nikic added inline comments. May 31 2023, 8:08 AM

llvm/test/Transforms/InstCombine/or-shifted-masks.ll
Line 55: This is a bit unfortunate, but I believe it's ultimately not a problem. The two lshrs here are the same instruction and will be CSEd, at which point this reduces to the same IR. Similar for the test below. I double checked the worklist order, and the new one is "more correct" (i.e. follows instruction order more closely), but happens to produce worse results in this instance.

Harbormaster completed remote builds in B235572: Diff 527040. May 31 2023, 9:02 AM

ping

goldstein.w.n added inline comments. Jun 7 2023, 2:01 AM

llvm/test/Transforms/InstCombine/or-shifted-masks.ll

This one seems to keep the worse codegen (full -O3 pipeline):
Before:

define i32 @multiuse2(i32 %x) {
  %i = shl i32 %x, 1
  %i6 = shl i32 %x, 8
  %i10 = and i32 %i6, 32256
  %i12 = and i32 %i, 252
  %i13 = or i32 %i10, %i12
  ret i32 %i13
}

After:

define i32 @multiuse2(i32 %x) {
  %i = shl i32 %x, 1
  %i6 = and i32 %x, 96
  %i7 = shl nuw nsw i32 %i6, 8
  %i8 = shl nuw nsw i32 %i6, 1
  %i14 = shl i32 %x, 8
  %i9 = and i32 %i14, 7680
  %i10 = or i32 %i7, %i9
  %1 = and i32 %i, 60
  %i12 = or i32 %i8, %1
  %i13 = or i32 %i10, %i12
  ret i32 %i13
}

It's not a dealbreaker imo, but maybe leave a TODO on the fold that implements it, indicating the combine is missing a case.
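The two listings quoted above compute the same value, which a small C++ transcription can check. This is a sanity-check sketch only: the function names are hypothetical, `uint32_t` arithmetic stands in for `i32`, and the `nuw`/`nsw` flags are ignored because the masked values never overflow. Both forms reduce to `((x & 126) << 8) | ((x & 126) << 1)`, since 32256 = 126 << 8, 252 = 126 << 1, and 96 | 30 = 126.

```cpp
#include <cassert> // used by the spot checks mentioned in the note below
#include <cstdint>

// "Before" listing: two shifts, two masks, one or.
std::uint32_t multiuse2Before(std::uint32_t X) {
  std::uint32_t I = X << 1;
  std::uint32_t I6 = X << 8;
  std::uint32_t I10 = I6 & 32256;
  std::uint32_t I12 = I & 252;
  return I10 | I12;
}

// "After" listing, transcribed instruction by instruction.
std::uint32_t multiuse2After(std::uint32_t X) {
  std::uint32_t I = X << 1;
  std::uint32_t I6 = X & 96;
  std::uint32_t I7 = I6 << 8;
  std::uint32_t I8 = I6 << 1;
  std::uint32_t I14 = X << 8;
  std::uint32_t I9 = I14 & 7680;
  std::uint32_t I10 = I7 | I9;
  std::uint32_t T1 = I & 60; // %1 in the listing
  std::uint32_t I12 = I8 | T1;
  return I10 | I12;
}

// Returns true if both forms agree for every X in [0, Limit).
bool formsAgreeBelow(std::uint32_t Limit) {
  for (std::uint32_t X = 0; X < Limit; ++X)
    if (multiuse2Before(X) != multiuse2After(X))
      return false;
  return true;
}
```

Because only bits 1 through 6 of `x` survive the masks, checking a range such as `formsAgreeBelow(1u << 16)` already covers every distinct case. The difference between the listings is instruction count, not the result.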

RKSimon added inline comments. Jun 7 2023, 6:27 AM

llvm/test/Transforms/InstCombine/or-shifted-masks.ll
Line 80: @nikic Are you intending to look at this regression in a followup?

nikic added inline comments. Jun 9 2023, 6:45 AM

llvm/test/Transforms/InstCombine/or-shifted-masks.ll
Line 80: We're missing this fold: https://alive2.llvm.org/ce/z/ZXvVh-

goldstein.w.n mentioned this in D152568: [InstCombine] Transform `(binop1 (binop2 (lshift X,Amt),Mask),(lshift Y,Amt))`. Jun 9 2023, 11:02 AM

goldstein.w.n added inline comments. Jun 9 2023, 11:03 AM

llvm/test/Transforms/InstCombine/or-shifted-masks.ll
Line 80: See D152568 to fix that.

LGTM. @goldstein.w.n, any other comments?

This revision is now accepted and ready to land. Jun 12 2023, 6:04 AM

LGTM.

goldstein.w.n mentioned this in rG91cdffcb2f9e: [InstCombine] Transform `(binop1 (binop2 (lshift X,Amt),Mask),(lshift Y,Amt))`. Jun 13 2023, 6:08 PM

goldstein.w.n mentioned this in D152876: [InstCombine] Expand `foldBinOpShiftWithShift` to handle multiple binops. Jun 13 2023, 6:27 PM

Closed by commit rGf7a977c7b3b4: [InstCombine] Revisit user of newly one-use instructions (authored by nikic). Jun 14 2023, 12:12 AM

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rGf7a977c7b3b4: [InstCombine] Revisit user of newly one-use instructions.

@nikic this is causing an infinite loop in opt -passes=instcombine, test case:

define void @main(<4 x float> %arg) {
bb:
  %i = shufflevector <4 x float> %arg, <4 x float> zeroinitializer, <2 x i32> <i32 2, i32 3>
  %i1 = bitcast <2 x float> %i to <2 x i32>
  %i2 = shufflevector <2 x i32> %i1, <2 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
  %i3 = extractelement <4 x i32> %i2, i32 0
  %i4 = insertelement <4 x i32> zeroinitializer, i32 %i3, i32 0
  %i5 = shufflevector <4 x i32> %i4, <4 x i32> zeroinitializer, <2 x i32> <i32 1, i32 0>
  %i6 = bitcast <2 x i32> %i5 to <2 x float>
  %i7 = call <2 x float> (...) null(<2 x float> %i6, <2 x float> zeroinitializer, <2 x float> zeroinitializer)
  ret void
}

nikic mentioned this in rG57a8ea855385: [InstCombine] Avoid infinite loop in insert/extract combine. Jun 14 2023, 5:59 AM

@foad Thanks for the report! This should be fixed with https://github.com/llvm/llvm-project/commit/57a8ea85538503a35d1e04fd8c8ba32aa2ba3f2a. Let me know if you encounter any further issues.

In D151807#4420875, @nikic wrote:

@foad Thanks for the report! This should be fixed with https://github.com/llvm/llvm-project/commit/57a8ea85538503a35d1e04fd8c8ba32aa2ba3f2a. Let me know if you encounter any further issues.

Thanks! I've also verified that it fixes my original (unreduced) test case.

Revision Contents

Path                                                        Size
llvm/include/llvm/Transforms/Utils/InstructionWorklist.h    11 lines
llvm/lib/Transforms/InstCombine/InstCombineInternal.h       7 lines
llvm/test/Other/print-debug-counter.ll                      2 lines
llvm/test/Transforms/InstCombine/or-shifted-masks.ll        30 lines

Diff 531215

llvm/include/llvm/Transforms/Utils/InstructionWorklist.h

[... 102 lines not shown ...]

   /// When an instruction is simplified, add all users of the instruction
   /// to the work lists because they might get more simplified now.
   void pushUsersToWorkList(Instruction &I) {
     for (User *U : I.users())
       push(cast<Instruction>(U));
   }

+  /// Should be called after decrementing the use-count on V.
+  void handleUseCountDecrement(Value *V) {
+    if (auto *I = dyn_cast<Instruction>(V)) {
+      add(I);
+      // Many folds have one-use limitations. If there's only one use left,
+      // revisit that use.
+      if (I->hasOneUse())
+        add(cast<Instruction>(*I->user_begin()));
+    }
+  }
+
   /// Check that the worklist is empty and nuke the backing store for the map.
   void zap() {
     assert(WorklistMap.empty() && "Worklist empty, but map not?");
     assert(Deferred.empty() && "Deferred instructions left over");

     // Do an explicit clear, this shrinks the map if needed.
     WorklistMap.clear();
   }
 };

 } // end namespace llvm.

 #endif

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

[... 405 lines not shown ...]

   /// methods should return the value returned by this function.
   Instruction *eraseInstFromFunction(Instruction &I) override {
     LLVM_DEBUG(dbgs() << "IC: ERASE " << I << '\n');
     assert(I.use_empty() && "Cannot erase instruction that is used!");
     salvageDebugInfo(I);

     // Make sure that we reprocess all operands now that we reduced their
     // use counts.
-    for (Use &Operand : I.operands())
-      if (auto *Inst = dyn_cast<Instruction>(Operand))
-        Worklist.add(Inst);
-
+    SmallVector<Value *> Ops(I.operands());
     Worklist.remove(&I);
     I.eraseFromParent();
+    for (Value *Op : Ops)
+      Worklist.handleUseCountDecrement(Op);
     MadeIRChange = true;
     return nullptr; // Don't do anything with FI
   }

   OverflowResult computeOverflow(
       Instruction::BinaryOps BinaryOp, bool IsSigned,
       Value *LHS, Value *RHS, Instruction *CxtI) const;

[... 291 lines not shown ...]

llvm/test/Other/print-debug-counter.ll

 ; REQUIRES: asserts

 ; RUN: opt -S -debug-counter=early-cse-skip=1,early-cse-count=1 -passes=early-cse,newgvn,instcombine -earlycse-debug-hash \
 ; RUN: -debug-counter=newgvn-vn-skip=1,newgvn-vn-count=2 \
 ; RUN: -print-debug-counter < %s 2>&1 | FileCheck %s
 ;; Test debug counter prints correct info in right order.
 ; CHECK-LABEL: Counters and values:
 ; CHECK: early-cse
 ; CHECK-SAME: {4,1,1}
 ; CHECK: instcombine-visit
-; CHECK-SAME: {12,0,-1}
+; CHECK-SAME: {13,0,-1}
 ; CHECK: newgvn-vn
 ; CHECK-SAME: {9,1,2}
 define i32 @f1(i32 %a, i32 %b) {
 bb:
   %add1 = add i32 %a, %b
   %add2 = add i32 %a, %b
   %add3 = add i32 %a, %b
   %add4 = add i32 %a, %b

[... 13 lines not shown ...]

llvm/test/Transforms/InstCombine/or-shifted-masks.ll

[... 46 lines not shown ...]

   %i = and i32 %x, 7
   %i1 = shl i32 %i, 3
   %i2 = shl i32 %x, 2
   %i3 = and i32 %i2, 28
   %i4 = or i32 %i1, %i3
   ret i32 %i4
 }

 define i32 @multiuse1(i32 %x) {
 ; CHECK-LABEL: @multiuse1(
-; CHECK-NEXT: [[I21:%.*]] = shl i32 [[X:%.*]], 6
+; CHECK-NEXT: [[I:%.*]] = lshr i32 [[X:%.*]], 1
+; CHECK-NEXT: [[I3:%.*]] = and i32 [[I]], 1
+; CHECK-NEXT: [[I1:%.*]] = lshr i32 [[X]], 1
+; CHECK-NEXT: [[I5:%.*]] = and i32 [[I1]], 2
+; CHECK-NEXT: [[I21:%.*]] = shl i32 [[X]], 6
 ; CHECK-NEXT: [[I6:%.*]] = and i32 [[I21]], 384
-; CHECK-NEXT: [[I32:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT: [[I7:%.*]] = and i32 [[I32]], 3
+; CHECK-NEXT: [[I7:%.*]] = or i32 [[I3]], [[I5]]
 ; CHECK-NEXT: [[I8:%.*]] = or i32 [[I7]], [[I6]]
 ; CHECK-NEXT: ret i32 [[I8]]
 ;
   %i = and i32 %x, 2
   %i1 = and i32 %x, 4
   %i2 = shl nuw nsw i32 %i, 6
   %i3 = lshr exact i32 %i, 1
   %i4 = shl nuw nsw i32 %i1, 6
   %i5 = lshr exact i32 %i1, 1
   %i6 = or i32 %i2, %i4
   %i7 = or i32 %i3, %i5
   %i8 = or i32 %i7, %i6
   ret i32 %i8
 }

 define i32 @multiuse2(i32 %x) {
 ; CHECK-LABEL: @multiuse2(
-; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[X:%.*]], 8
+; CHECK-NEXT: [[I:%.*]] = shl i32 [[X:%.*]], 1
+; CHECK-NEXT: [[I2:%.*]] = and i32 [[I]], 12
+; CHECK-NEXT: [[I3:%.*]] = shl i32 [[X]], 1
+; CHECK-NEXT: [[I5:%.*]] = and i32 [[I3]], 48
+; CHECK-NEXT: [[I6:%.*]] = shl i32 [[X]], 1
+; CHECK-NEXT: [[I8:%.*]] = and i32 [[I6]], 192
+; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[X]], 8
 ; CHECK-NEXT: [[I10:%.*]] = and i32 [[TMP1]], 32256
-; CHECK-NEXT: [[TMP2:%.*]] = shl i32 [[X]], 1
-; CHECK-NEXT: [[I12:%.*]] = and i32 [[TMP2]], 252
+; CHECK-NEXT: [[I11:%.*]] = or i32 [[I8]], [[I5]]
+; CHECK-NEXT: [[I12:%.*]] = or i32 [[I2]], [[I11]]
 ; CHECK-NEXT: [[I13:%.*]] = or i32 [[I10]], [[I12]]
 ; CHECK-NEXT: ret i32 [[I13]]
 ;
   %i = and i32 %x, 6
   %i1 = shl nuw nsw i32 %i, 8
   %i2 = shl nuw nsw i32 %i, 1
   %i3 = and i32 %x, 24
   %i4 = shl nuw nsw i32 %i3, 8
   %i5 = shl nuw nsw i32 %i3, 1
   %i6 = and i32 %x, 96
   %i7 = shl nuw nsw i32 %i6, 8
   %i8 = shl nuw nsw i32 %i6, 1
   %i9 = or i32 %i1, %i4
   %i10 = or i32 %i7, %i9
   %i11 = or i32 %i8, %i5
   %i12 = or i32 %i2, %i11
   %i13 = or i32 %i10, %i12
   ret i32 %i13
 }

 define i32 @multiuse3(i32 %x) {
 ; CHECK-LABEL: @multiuse3(
-; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[X:%.*]], 6
+; CHECK-NEXT: [[I:%.*]] = lshr i32 [[X:%.*]], 1
+; CHECK-NEXT: [[I2:%.*]] = and i32 [[I]], 48
+; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[X]], 6
 ; CHECK-NEXT: [[I5:%.*]] = and i32 [[TMP1]], 8064
-; CHECK-NEXT: [[TMP2:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT: [[I8:%.*]] = and i32 [[TMP2]], 63
+; CHECK-NEXT: [[I6:%.*]] = lshr i32 [[X]], 1
+; CHECK-NEXT: [[I7:%.*]] = and i32 [[I6]], 15
+; CHECK-NEXT: [[I8:%.*]] = or i32 [[I2]], [[I7]]
 ; CHECK-NEXT: [[I9:%.*]] = or i32 [[I8]], [[I5]]
 ; CHECK-NEXT: ret i32 [[I9]]
 ;
   %i = and i32 %x, 96
   %i1 = shl nuw nsw i32 %i, 6
   %i2 = lshr exact i32 %i, 1
   %i3 = shl i32 %x, 6
   %i4 = and i32 %i3, 1920

[... 179 lines not shown ...]
