This is an archive of the discontinued LLVM Phabricator instance.

Differential D112060

[NARY-REASSOCIATE] Fix infinite recursion optimizing min\max
Closed · Public

Authored by ebrevnov on Oct 19 2021, 3:15 AM.

Details

Reviewers

mkazantsev
spatel
lebedev.ri
nikic
alex-t

Commits

rG269f563a2bcd: [NARY-REASSOCIATE] Fix infinite recursion optimizing min\max

Summary

To guarantee that the algorithm converges, each optimization step should decrease the number of instructions whenever the IR is modified. This property does not hold in this test case. The problem is that SCEV Expander may perform "unexpected" reassociation, which creates new min/max chains and introduces extra instructions. As a result, each step undoes the previous one and we optimize back and forth indefinitely.

The solution is to prevent SCEV Expander from performing uncontrolled reassociations by means of "Unknown" expressions.
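
The effect of wrapping operands as "Unknown" can be illustrated with a toy model. The sketch below is not LLVM code: `Expr`, `makeLeaf`, `makeUnknown`, and `makeMax` are hypothetical names, and the flattening rule is a simplified stand-in for the n-ary canonicalization that ScalarEvolution applies when it builds min/max expressions. Under those assumptions, a max built from a transparent nested max is flattened into one n-ary node, which leaves whoever expands it free to regroup the operands, while a max built from an opaque wrapper stays binary.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical; not an LLVM type).
struct Expr {
  enum Kind { Leaf, Max, Unknown } kind;
  std::string name;                       // used by Leaf only
  std::vector<std::shared_ptr<Expr>> ops; // Max operands; Unknown holds one
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeLeaf(const std::string &n) {
  return std::make_shared<Expr>(Expr{Expr::Leaf, n, {}});
}

// Wrap an expression so that makeMax treats it as an opaque leaf.
ExprPtr makeUnknown(ExprPtr e) {
  return std::make_shared<Expr>(Expr{Expr::Unknown, "", {std::move(e)}});
}

// Build an n-ary max, flattening operands that are themselves max nodes.
// Opaque (Unknown) operands are never looked into.
ExprPtr makeMax(const std::vector<ExprPtr> &operands) {
  std::vector<ExprPtr> flat;
  for (const ExprPtr &op : operands) {
    if (op->kind == Expr::Max)
      flat.insert(flat.end(), op->ops.begin(), op->ops.end());
    else
      flat.push_back(op);
  }
  return std::make_shared<Expr>(Expr{Expr::Max, "", std::move(flat)});
}
```

In this model, `makeMax({makeMax({a, b}), c})` produces a single node with three operands, whereas `makeMax({makeUnknown(makeMax({a, b})), c})` keeps exactly two, so the existing inner max is reused as is.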

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ebrevnov created this revision. · Oct 19 2021, 3:15 AM

Herald added subscribers: javed.absar, hiraditya. · Oct 19 2021, 3:15 AM

ebrevnov requested review of this revision. · Oct 19 2021, 3:15 AM

Herald added a project: Restricted Project. · Oct 19 2021, 3:15 AM

Herald added a subscriber: llvm-commits.

ebrevnov added reviewers: mkazantsev, spatel. · Oct 19 2021, 3:17 AM

ebrevnov mentioned this in D88287: [NARY-REASSOCIATE] Support reassociation of min/max. · Oct 19 2021, 3:19 AM

Harbormaster completed remote builds in B129506: Diff 380630. · Oct 19 2021, 3:47 AM

I've never looked at this pass closely before, so another reviewer should comment too.
It may be independent of this patch, but I find the use of PatternMatch internals ("bind_ty", "smax_pred_ty", etc) confusing - why is that necessary?
Is this code going to stop being useful when we canonicalize to min/max intrinsics ( D98152 ) - it is specifically matching cmp/sel idioms only?

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
615	This code comment doesn't match the code?

alex-t added a reviewer: alex-t. · Oct 19 2021, 9:17 AM

The fix itself (using SCEVUnknown) looks good to me, though I agree that the whole implementation is pretty odd and could be cleaned up in a followup change.

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
615	Something that also confuses me about this code is why it performs a loop with two iterations over `j`, and then has completely separate behavior for them. Wouldn't this whole loop be equivalent to the following?
	    if (BExpr != CExpr) {
	      std::swap(BExpr, CExpr);
	      std::swap(B, C);
	    }
	    if (AExpr != CExpr) {
	      std::swap(AExpr, CExpr);
	      std::swap(A, C);
	    }
	Or if we allow some redundant swaps just:
	    std::swap(BExpr, CExpr);
	    std::swap(B, C);
	    std::swap(AExpr, CExpr);
	    std::swap(A, C);
	Which reorders `A B C` into `B C A`. Though as all of these variables are under your control I think you could just directly match them into the right variables rather than swapping them after the fact. Am I missing something here?
642	You can pass these directly, no need to create SmallVector for an ArrayRef.

This revision is now accepted and ready to land. · Oct 19 2021, 9:36 AM

ebrevnov added inline comments. · Oct 19 2021, 10:53 PM

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
615	> This code comment doesn't match the code?
	It does match the code, but it is not easy to follow. The trick is that the pointers named A, B, and C refer to different expressions on each iteration. For example, after the first swap, B points to what C used to point to. I agree all that looks confusing; I will try to restructure the code to make it simpler to understand.
615	> Something that also confuses me about this code is why it performs a loop with two iterations over `j`, and then has completely separate behavior for them. Wouldn't this whole loop be equivalent to the following?
	>     if (BExpr != CExpr) {
	>       std::swap(BExpr, CExpr);
	>       std::swap(B, C);
	>     }
	>     if (AExpr != CExpr) {
	>       std::swap(AExpr, CExpr);
	>       std::swap(A, C);
	>     }
	> Or if we allow some redundant swaps just:
	>     std::swap(BExpr, CExpr);
	>     std::swap(B, C);
	>     std::swap(AExpr, CExpr);
	>     std::swap(A, C);
	> Which reorders `A B C` into `B C A`. Though as all of these variables are under your control I think you could just directly match them into the right variables rather than swapping them after the fact. Am I missing something here?
	No, this is not semantically equivalent code. The idea is to try two combinations: (AopC)opB and then (BopC)opA. That's why the loop of two iterations. I agree all that looks confusing; I will try to restructure the code to make it simpler to understand.
642	> You can pass these directly, no need to create SmallVector for an ArrayRef.
	Thanks. Will fix.
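
The way the names A, B, and C rotate across the two iterations can be traced with a small standalone model. This is only a sketch: `traceCombinations` is a hypothetical function, strings stand in for the SCEV expressions, and the rest of the loop body (the use check, the dominator lookup, the expansion) is reduced to recording which grouping would be attempted.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Record which grouping each iteration of the j-loop would try.
// The parameter names mirror the variables in the patch; the string
// values stand in for the underlying expressions.
std::vector<std::string> traceCombinations(std::string A, std::string B,
                                           std::string C) {
  std::vector<std::string> tried;
  for (unsigned j = 0; j < 2; ++j) {
    if (j == 0) {
      if (B == C)
        continue;
      std::swap(B, C); // starting from (a, b, c), names now hold (a, c, b)
    } else {
      if (A == C)
        continue;
      std::swap(A, C); // names now hold (b, c, a) if the first swap ran
    }
    // In the patch, the code after the swaps looks for an existing
    // min/max of A and B and then combines it with C.
    tried.push_back("(" + A + " op " + B + ") op " + C);
  }
  return tried;
}
```

Called with `"a"`, `"b"`, `"c"`, the model records `(a op c) op b` on the first iteration and `(b op c) op a` on the second, which matches the two combinations described in the reply above. Because the work happens after each swap, the loop is not equivalent to performing both swaps up front.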

Closed by commit rG269f563a2bcd: [NARY-REASSOCIATE] Fix infinite recursion optimizing min\max (authored by Evgeniy Brevnov <ybrevnov@azul.com>). · Oct 20 2021, 12:23 AM

This revision was automatically updated to reflect the committed changes.

Evgeniy Brevnov <ybrevnov@azul.com> added a commit: rG269f563a2bcd: [NARY-REASSOCIATE] Fix infinite recursion optimizing min\max.

nikic added inline comments. · Oct 20 2021, 12:37 AM

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
615	> No, this is not semantically equivalent code. The idea is to try two combinations: (AopC)opB and then (BopC)opA. That's why the loop of two iterations.
	Uh yeah, I misread the code. For some reason I thought the `j` loop ends after the swaps, but there's more code after it :) Maybe a clearer way to do this is to have a helper function with the actual logic, and then call it twice with arguments `A, C, B` and `B, C, A`?

ebrevnov added inline comments. · Oct 20 2021, 3:07 AM

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
615	> Maybe a clearer way to do this is to have a helper function with the actual logic, and then call it twice with arguments `A, C, B` and `B, C, A`?
	That's exactly how I was going to restructure the code. Here it is: https://reviews.llvm.org/D112128
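
The helper-based shape suggested here might look roughly like the following. This is a sketch of the idea only, not the code from D112128, which is not shown on this page. `tryGrouping` and `reassociate` are hypothetical names, strings stand in for expressions, and a `std::set` of already-computed pairs stands in for the dominator lookup. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical helper: succeed only if a min/max of X and Y is already
// available, in which case the rewritten expression reuses it with Z.
std::optional<std::string>
tryGrouping(const std::string &X, const std::string &Y, const std::string &Z,
            const std::set<std::string> &available) {
  const std::string xy = X + " op " + Y;
  const std::string yx = Y + " op " + X; // op is commutative
  if (available.count(xy))
    return "(" + xy + ") op " + Z;
  if (available.count(yx))
    return "(" + yx + ") op " + Z;
  return std::nullopt;
}

// Try the two groupings in order, without swapping any variables.
std::optional<std::string>
reassociate(const std::string &A, const std::string &B, const std::string &C,
            const std::set<std::string> &available) {
  if (auto R = tryGrouping(A, C, B, available))
    return R;
  return tryGrouping(B, C, A, available);
}
```

With only `"b op c"` available, `reassociate("a", "b", "c", ...)` fails on the first grouping and returns `(b op c) op a` from the second. Passing the operands in the desired order keeps each name bound to one expression for the whole call.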

Revision Contents

Path                                                Size
llvm/lib/Transforms/Scalar/NaryReassociate.cpp      21 lines
llvm/test/Transforms/NaryReassociate/nary-req.ll    42 lines

Diff 380864

llvm/lib/Transforms/Scalar/NaryReassociate.cpp

[592 unchanged lines not shown]
 template <typename MaxMinT>
 Value *NaryReassociatePass::tryReassociateMinOrMax(Instruction *I,
                                                    MaxMinT MaxMinMatch,
                                                    Value *LHS, Value *RHS) {
   Value *A = nullptr, *B = nullptr;
   MaxMinT m_MaxMin(m_Value(A), m_Value(B));
   for (unsigned int i = 0; i < 2; ++i) {
     if (!LHS->hasNUsesOrMore(3) && match(LHS, m_MaxMin)) {
+      Value *C = RHS;
       const SCEV *AExpr = SE->getSCEV(A), *BExpr = SE->getSCEV(B);
-      const SCEV *RHSExpr = SE->getSCEV(RHS);
+      const SCEV *CExpr = SE->getSCEV(C);
       for (unsigned int j = 0; j < 2; ++j) {
         if (j == 0) {
-          if (BExpr == RHSExpr)
+          if (BExpr == CExpr)
             continue;
-          // Transform 'I = (A op B) op RHS' to 'I = (A op RHS) op B' on the
+          // Transform 'I = (A op B) op C' to 'I = (A op C) op B' on the
           // first iteration.
-          std::swap(BExpr, RHSExpr);
+          std::swap(BExpr, CExpr);
+          std::swap(B, C);
         } else {
-          if (AExpr == RHSExpr)
+          if (AExpr == CExpr)
             continue;
-          // Transform 'I = (A op RHS) op B' 'I = (B op RHS) op A' on the second
+          // Transform 'I = (A op C) op B' to 'I = (B op C) op A' on the second
           // iteration.
-          std::swap(AExpr, RHSExpr);
+          std::swap(AExpr, CExpr);
+          std::swap(A, C);
         }

         // The optimization is profitable only if LHS can be removed in the end.
         // In other words LHS should be used (directly or indirectly) by I only.
         if (llvm::any_of(LHS->users(), [&](auto *U) {
               return U != I && !(U->hasOneUser() && *U->users().begin() == I);
             }))
           continue;

         SCEVExpander Expander(*SE, *DL, "nary-reassociate");
         SmallVector<const SCEV *, 2> Ops1{ BExpr, AExpr };
         const SCEVTypes SCEVType = convertToSCEVype(m_MaxMin);
         const SCEV *R1Expr = SE->getMinMaxExpr(SCEVType, Ops1);

         Instruction *R1MinMax = findClosestMatchingDominator(R1Expr, I);

         if (!R1MinMax)
           continue;

         LLVM_DEBUG(dbgs() << "NARY: Found common sub-expr: " << *R1MinMax
                           << "\n");

-        R1Expr = SE->getUnknown(R1MinMax);
-        SmallVector<const SCEV *, 2> Ops2{ RHSExpr, R1Expr };
+        SmallVector<const SCEV *, 2> Ops2{SE->getUnknown(C),
+                                          SE->getUnknown(R1MinMax)};
         const SCEV *R2Expr = SE->getMinMaxExpr(SCEVType, Ops2);

         Value *NewMinMax = Expander.expandCodeFor(R2Expr, I->getType(), I);
         NewMinMax->setName(Twine(I->getName()).concat(".nary"));

         LLVM_DEBUG(dbgs() << "NARY: Deleting: " << *I << "\n"
                           << "NARY: Inserting: " << *NewMinMax << "\n");
         return NewMinMax;
       }
     }
     std::swap(LHS, RHS);
   }
   return nullptr;
 }

llvm/test/Transforms/NaryReassociate/nary-req.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -nary-reassociate -S | FileCheck %s
 ; RUN: opt < %s -passes='nary-reassociate' -S | FileCheck %s

 declare i32 @llvm.smax.i32(i32 %a, i32 %b)
 declare i64 @llvm.umin.i64(i64, i64)

 ; This is a negative test. We should not optimize if intermediate result
 ; has a use outside of optimizable pattern. In other words %smax2 has one
 ; use from %smax3 and side use from %res2.
 define i32 @smax_test1(i32 %a, i32 %b, i32 %c) {
 ; CHECK-LABEL: @smax_test1(
 ; CHECK-NEXT: [[C1:%.*]] = icmp sgt i32 [[A:%.*]], [[B:%.*]]
 ; CHECK-NEXT: [[SMAX1:%.*]] = select i1 [[C1]], i32 [[A]], i32 [[B]]
[26 unchanged lines not shown]
 ; CHECK-NEXT: bb:
 ; CHECK-NEXT: [[E:%.*]] = sub i64 undef, 0
 ; CHECK-NEXT: [[E1:%.*]] = sub i64 [[ARG:%.*]], 0
 ; CHECK-NEXT: [[E2:%.*]] = call i64 @llvm.umin.i64(i64 [[E]], i64 [[E1]])
 ; CHECK-NEXT: [[E3:%.*]] = call i64 @llvm.umin.i64(i64 [[E2]], i64 16384)
 ; CHECK-NEXT: [[E4:%.*]] = sub i64 [[ARG]], 0
 ; CHECK-NEXT: [[E5:%.*]] = call i64 @llvm.umin.i64(i64 [[E4]], i64 16384)
 ; CHECK-NEXT: [[E6:%.*]] = icmp ugt i64 [[E5]], 0
-; CHECK-NEXT: [[E10_NARY:%.*]] = call i64 @llvm.umin.i64(i64 [[E5]], i64 [[E]])
+; CHECK-NEXT: [[E7:%.*]] = sub i64 undef, 0
+; CHECK-NEXT: [[E10_NARY:%.*]] = call i64 @llvm.umin.i64(i64 [[E5]], i64 [[E7]])
 ; CHECK-NEXT: unreachable
 ;
 bb:
   %e = sub i64 undef, 0
   %e1 = sub i64 %arg, 0
   %e2 = call i64 @llvm.umin.i64(i64 %e, i64 %e1)
   %e3 = call i64 @llvm.umin.i64(i64 %e2, i64 16384)
   %e4 = sub i64 %arg, 0
   %e5 = call i64 @llvm.umin.i64(i64 %e4, i64 16384)
   %e6 = icmp ugt i64 %e5, 0
   %e7 = sub i64 undef, 0
   %e8 = sub i64 %arg, 0
   %e9 = call i64 @llvm.umin.i64(i64 %e7, i64 %e8)
   %e10 = call i64 @llvm.umin.i64(i64 %e9, i64 16384)
   unreachable
 }
+
+; Make sure we don't fall into infinte loop optimizing %sel5.
+; The subtle thing is that %sel3 is min/max as well and
+; there is "unexpected" reassociation coming from SCEV Expander
+; during %sel5 rewrite. That results in a new chain of min/max
+; which is matched on the next iteration.
+define i32 @nary_infinite_loop_minmax(i32 %d0, i32 %d1, i32 %d2, i32 %d3) {
+; CHECK-LABEL: @nary_infinite_loop_minmax(
+; CHECK-NEXT: [[CMP0:%.*]] = icmp slt i32 [[D2:%.*]], [[D1:%.*]]
+; CHECK-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i32 [[D1]], i32 [[D2]]
+; CHECK-NEXT: [[CMP1:%.*]] = icmp slt i32 [[D3:%.*]], [[D0:%.*]]
+; CHECK-NEXT: [[SEL1:%.*]] = select i1 [[CMP1]], i32 [[D0]], i32 [[D3]]
+; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[SEL1]], [[SEL0]]
+; CHECK-NEXT: [[SEL2:%.*]] = select i1 [[CMP2]], i32 [[SEL1]], i32 [[SEL0]]
+; CHECK-NEXT: [[CMP3:%.*]] = icmp slt i32 [[D3]], [[D0]]
+; CHECK-NEXT: [[SEL3:%.*]] = select i1 [[CMP3]], i32 [[D0]], i32 [[D3]]
+; CHECK-NEXT: [[SEL5_NARY:%.*]] = call i32 @llvm.smax.i32(i32 [[SEL0]], i32 [[SEL3]])
+; CHECK-NEXT: ret i32 [[SEL5_NARY]]
+;
+  %cmp0 = icmp slt i32 %d2, %d1
+  %sel0 = select i1 %cmp0, i32 %d1, i32 %d2
+
+  %cmp1 = icmp slt i32 %d3, %d0
+  %sel1 = select i1 %cmp1, i32 %d0, i32 %d3
+
+  %cmp2 = icmp slt i32 %sel1, %sel0
+  %sel2 = select i1 %cmp2, i32 %sel1, i32 %sel0
+
+  %cmp3 = icmp slt i32 %d3, %d0
+  %sel3 = select i1 %cmp3, i32 %d0, i32 %d3
+
+  %cmp4 = icmp slt i32 %sel3, %d2
+  %sel4 = select i1 %cmp4, i32 %d2, i32 %sel3
+
+  %cmp5 = icmp slt i32 %sel4, %d1
+  %sel5 = select i1 %cmp5, i32 %d1, i32 %sel4
+  ret i32 %sel5
+}