This is an archive of the discontinued LLVM Phabricator instance.

Differential D151394

[LSR] Treat URem as uninteresting
Abandoned · Public

Authored by peixin on May 24 2023, 6:40 PM.

Details

Reviewers

nikic
reames

Summary

The URem instruction should be uninteresting and treated as a user of the
IV. There is no URem SCEV expression; the URem operation is analyzed as an
Add SCEV expression. So we need to check for the URem opcode in the
function that decides whether an instruction is interesting (isInteresting).
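
As a rough illustration of why an add expression shows up here: the sketch below assumes SCEV models `a urem b` as `a - (a udiv b) * b`, so the top-level node is an add. It only checks that identity on plain 32-bit unsigned integers. `urem_as_add` is a hypothetical helper name, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): computes a urem b in the
// add-shaped form a + (-1 * ((a udiv b) * b)).
// Unsigned arithmetic wraps modulo 2^32, so the negation is well defined.
std::uint32_t urem_as_add(std::uint32_t a, std::uint32_t b) {
  assert(b != 0 && "urem by zero is undefined");
  std::uint32_t quotient_times_b = (a / b) * b;  // (a udiv b) * b
  return a + (0u - quotient_times_b);            // a + (-1 * ...)
}
```

For any nonzero `b`, `urem_as_add(a, b)` agrees with `a % b`, which is why an analysis that looks only at the SCEV node kind can no longer tell that the original instruction was a urem.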

Fix https://github.com/llvm/llvm-project/issues/62852.

Event Timeline

peixin created this revision. May 24 2023, 6:40 PM

Herald added subscribers: StephenFan, javed.absar, hiraditya. May 24 2023, 6:40 PM

peixin requested review of this revision. May 24 2023, 6:40 PM

Herald added a subscriber: llvm-commits. May 24 2023, 6:40 PM

Harbormaster completed remote builds in B234368: Diff 525397. May 24 2023, 7:23 PM

Can you please explain in more detail what causes the miscompile in the first place?

In D151394#4372286, @nikic wrote:

Can you please explain in more detail what causes the miscompile in the first place?

Sure.

Actually, this bug has probably existed for a long time. It was exposed when we enabled opaque pointers and the generated IR changed. It seems the old IR with typed pointers cannot trigger this scenario.

In this scenario, there are two phi nodes, one of i64 type and one of i32 type, and the user of the i32 phi node is a URem instruction. In LSR, the two phi nodes are reduced into a single i64 phi node, and the urem can be replaced with udiv/add instructions in Scalar Evolution. In the test case above (urem-use-type-conversion.ll), the phi node that matters is %dec.137. When it is converted to i64, a negative value becomes a problem. Consider the following:

; The wrong transformation (consider what happens if %dec.137 is -1):
%dec.137 = ... ; i64 type
%rem = urem i64 %dec.137, 53
%newrem = trunc i64 %rem to i32
%call = ... %newrem
; The correct transformation:
%dec.137 = ... ; i64 type
%dec.137.2 = trunc i64 %dec.137 to i32
%rem = urem i32 %dec.137.2, 53
%call = ... %rem
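
The difference between the two orders can be reproduced with ordinary fixed-width integers. The following is a minimal sketch, not code from the patch. The function names are made up for illustration, and the value -1 with divisor 53 follows the example above.

```cpp
#include <cassert>
#include <cstdint>

// Wrong order: take the remainder on the widened 64-bit IV, then truncate.
std::uint32_t rem_after_widening(std::int64_t iv, std::uint32_t divisor) {
  assert(divisor != 0);
  std::uint64_t wide = static_cast<std::uint64_t>(iv);    // i64 -1 is 2^64 - 1
  return static_cast<std::uint32_t>(wide % divisor);      // urem i64, then trunc
}

// Correct order: truncate to 32 bits first, then take the remainder.
std::uint32_t rem_after_trunc(std::int64_t iv, std::uint32_t divisor) {
  assert(divisor != 0);
  std::uint32_t narrow = static_cast<std::uint32_t>(iv);  // trunc i64 to i32
  return narrow % divisor;                                // urem i32
}
```

With `iv = -1` and divisor 53, `rem_after_widening` returns 14 while `rem_after_trunc` returns 41, because 2^64 - 1 and 2^32 - 1 leave different remainders modulo 53. For small non-negative values the two functions agree, which is why the problem only shows up once the IV goes negative.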

Previously, when collecting the IV uses, isInteresting did not consider the URem instruction a user of the phi node (%dec.137), because the SCEV expression for URem is an Add expression. As a result, the user of the phi node (%dec.137) was analyzed as %call. Then, in LSRInstance::GenerateAllReuseFormulae in LSR, the SCEV expression for the wrong transformation was generated.

@nikic Please let me know if anything is unclear.
For example, there is a UDiv SCEV expression, so the bug does not exist when udiv is the user of the phi node (%dec.137).

I think what I still don't really understand is why LSR thinks it is safe to replace the IV here when it isn't. It seems like marking the urem as interesting here may be non-profitable, but also shouldn't be incorrect. I think there has to be some bug on the LSR side to enable this replacement.

Note that urem is not the only non-add instruction that can end up producing an add SCEV node, so I'm not sure why we need to treat urem in particular specially.

In D151394#4377695, @nikic wrote:

I think what I still don't really understand is why LSR thinks it is safe to replace the IV here when it isn't. It seems like marking the urem as interesting here may be non-profitable, but also shouldn't be incorrect. I think there has to be some bug on the LSR side to enable this replacement.

Note that urem is not the only non-add instruction that can end up producing an add SCEV node, so I'm not sure why we need to treat urem in particular specially.

For now, this does seem to be a workaround rather than a fix for the underlying problem. I tested some internal benchmarks and there is no performance regression. I tried to analyze this case in LSR, but everything there looks reasonable. I tested the test case with typed pointers, and the bug exists in LLVM 12. I may need more time to dig into when, and through which design decision, LSR missed this scenario.

urem is special because the i32-to-i64 conversion can change the value when the variable is negative, since a signed value is then interpreted as unsigned. udiv is another such case, but there is a SCEV UDiv expression for it.

signed: (i32 -1) == (i64 -1)
unsigned: (i32 -1) != (i64 -1)
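
The two lines above can be checked directly. This is a small sketch under the assumption that "signed" corresponds to sign extension (sext) and "unsigned" to zero extension (zext) of the same 32-bit value; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// sext: widen a 32-bit value while preserving its signed meaning.
std::int64_t widen_signed(std::int32_t x) {
  return static_cast<std::int64_t>(x);
}

// zext: widen the 32-bit bit pattern while preserving its unsigned meaning.
std::uint64_t widen_unsigned(std::int32_t x) {
  std::uint64_t wide = static_cast<std::uint32_t>(x);
  assert(wide <= 0xFFFFFFFFull);  // zext never sets the upper 32 bits
  return wide;
}
```

`widen_signed(-1)` is -1 as a 64-bit value, whereas `widen_unsigned(-1)` is 0xFFFFFFFF, which differs from the 64-bit pattern of -1 (0xFFFFFFFFFFFFFFFF). An unsigned operation such as urem sees that difference.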

I think the underlying issue is that normalized expressions are extended and then used in a post-inc context outside the loop. I put up D153004.
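
As a generic arithmetic illustration of the hazard described here (not taken from D153004, and not a model of how LSR normalizes expressions): extending a value and then applying the IV step is not the same as applying the step and then extending. The function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Step in 32 bits, then zero-extend: zext(x + step).
std::uint64_t step_then_extend(std::uint32_t x, std::int32_t step) {
  std::uint32_t stepped = x + static_cast<std::uint32_t>(step);  // wraps mod 2^32
  std::uint64_t wide = stepped;
  assert(wide <= 0xFFFFFFFFull);  // result always fits in 32 bits
  return wide;
}

// Zero-extend, then step in 64 bits: zext(x) + sext(step).
std::uint64_t extend_then_step(std::uint32_t x, std::int32_t step) {
  std::uint64_t wide = x;
  return wide + static_cast<std::uint64_t>(static_cast<std::int64_t>(step));  // wraps mod 2^64
}
```

For `x = 5` and `step = -1` both functions return 4. For `x = 0` and `step = -1` the first returns 0xFFFFFFFF and the second returns 0xFFFFFFFFFFFFFFFF, so the two orders only agree as long as the 32-bit step does not wrap.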

peixin abandoned this revision. Jul 6 2023, 6:17 PM

Revision Contents

Path                                                                      Size
llvm/lib/Analysis/IVUsers.cpp                                             4 lines
llvm/test/Transforms/LoopStrengthReduce/scaling-factor-incompat-type.ll   8 lines
llvm/test/Transforms/LoopStrengthReduce/urem-use-type-conversion.ll       57 lines

Diff 525397

llvm/lib/Analysis/IVUsers.cpp

... (49 lines not shown) ...

 Pass *llvm::createIVUsersPass() { return new IVUsersWrapperPass(); }

 /// isInteresting - Test whether the given expression is "interesting" when
 /// used by the given expression, within the context of analyzing the
 /// given loop.
 static bool isInteresting(const SCEV *S, const Instruction *I, const Loop *L,
                           ScalarEvolution *SE, LoopInfo *LI) {
+  // The SCEV for URem instruction is SCEVAddExpr, but URem is uninteresting.
+  if (I->getOpcode() == Instruction::URem)
+    return false;
+
   // An addrec is interesting if it's affine or if it has an interesting start.
   if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(S)) {
     // Keep things simple. Don't touch loop-variant strides unless they're
     // only used outside the loop and we can simplify them.
     if (AR->getLoop() == L)
       return AR->isAffine() ||
              (!L->contains(I) &&
               SE->getSCEVAtScope(AR, LI->getLoopFor(I->getParent())) != AR);

... (308 lines not shown) ...

llvm/test/Transforms/LoopStrengthReduce/scaling-factor-incompat-type.ll

... (11 lines not shown) ...
 ; CHECK-NEXT: bb:
 ; CHECK-NEXT: br label [[BB4:%.*]]
 ; CHECK: bb1:
 ; CHECK-NEXT: [[T3:%.*]] = ashr i64 [[LSR_IV_NEXT:%.*]], 32
 ; CHECK-NEXT: ret void
 ; CHECK: bb4:
 ; CHECK-NEXT: [[LSR_IV1:%.*]] = phi i16 [ [[LSR_IV_NEXT2:%.*]], [[BB13:%.*]] ], [ 6, [[BB:%.*]] ]
 ; CHECK-NEXT: [[LSR_IV:%.*]] = phi i64 [ [[LSR_IV_NEXT]], [[BB13]] ], [ 8589934593, [[BB]] ]
-; CHECK-NEXT: [[LSR_IV_NEXT]] = add nuw nsw i64 [[LSR_IV]], 25769803776
-; CHECK-NEXT: [[LSR_IV_NEXT2]] = add nuw nsw i16 [[LSR_IV1]], 6
-; CHECK-NEXT: [[T10:%.*]] = icmp eq i16 1, 0
+; CHECK-NEXT: [[T8:%.*]] = urem i16 [[LSR_IV1]], 3
+; CHECK-NEXT: [[T9:%.*]] = mul i16 [[T8]], 2
+; CHECK-NEXT: [[T10:%.*]] = icmp eq i16 [[T9]], 1
 ; CHECK-NEXT: br i1 [[T10]], label [[BB11:%.*]], label [[BB13]]
 ; CHECK: bb11:
 ; CHECK-NEXT: [[T12:%.*]] = udiv i16 1, [[LSR_IV1]]
 ; CHECK-NEXT: unreachable
 ; CHECK: bb13:
+; CHECK-NEXT: [[LSR_IV_NEXT]] = add nuw nsw i64 [[LSR_IV]], 25769803776
+; CHECK-NEXT: [[LSR_IV_NEXT2]] = add nuw nsw i16 [[LSR_IV1]], 6
 ; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB4]]
 ;
 bb:
   br label %bb4
 bb1: ; preds = %bb13
   %t = shl i64 %t14, 32
   %t2 = add i64 %t, 1
   %t3 = ashr i64 %t2, 32
... (18 lines not shown) ...

llvm/test/Transforms/LoopStrengthReduce/urem-use-type-conversion.ll

This file was added.

+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -loop-reduce -S | FileCheck %s
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
+
+@l = global i64 0
+@i = global i64 0
+@.str = private constant [4 x i8] c"%d\0A\00"
+
+define i32 @main() {
+; CHECK-LABEL: define i32 @main() {
+; CHECK-NEXT: for.body8.1.preheader:
+; CHECK-NEXT: br label [[FOR_BODY8_1:%.*]]
+; CHECK: for.body8.1:
+; CHECK-NEXT: [[LSR_IV:%.*]] = phi i64 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY8_1]] ], [ 1, [[FOR_BODY8_1_PREHEADER:%.*]] ]
+; CHECK-NEXT: [[LSR_IV_NEXT]] = add nsw i64 [[LSR_IV]], -1
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[LSR_IV_NEXT]], 1
+; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[TMP0]] to i32
+; CHECK-NEXT: [[CMP6_1:%.*]] = icmp sgt i32 [[TMP]], 0
+; CHECK-NEXT: br i1 [[CMP6_1]], label [[FOR_BODY8_1]], label [[FOR_INC10_1:%.*]]
+; CHECK: for.inc10.1:
+; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[LSR_IV_NEXT]], -1
+; CHECK-NEXT: br label [[FOR_END15:%.*]]
+; CHECK: for.end15:
+; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[LSR_IV_NEXT]], 1
+; CHECK-NEXT: store i64 [[TMP1]], ptr @l, align 8
+; CHECK-NEXT: store i64 [[TMP2]], ptr @i, align 8
+; CHECK-NEXT: [[TMP3:%.*]] = trunc i64 [[LSR_IV_NEXT]] to i32
+; CHECK-NEXT: [[REM:%.*]] = urem i32 [[TMP3]], 53
+; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (ptr, ...) @printf(ptr @.str, i32 [[REM]])
+; CHECK-NEXT: ret i32 0
+;
+for.body8.1.preheader:
+  br label %for.body8.1
+
+for.body8.1: ; preds = %for.body8.1, %for.body8.1.preheader
+  %dec.137 = phi i32 [ %dec.1, %for.body8.1 ], [ 1, %for.body8.1.preheader ]
+  %inc1821.1 = phi i64 [ %inc.1, %for.body8.1 ], [ 0, %for.body8.1.preheader ]
+  %inc.1 = add nsw i64 %inc1821.1, 1
+  %dec.1 = add nsw i32 %dec.137, -1
+  %cmp6.1 = icmp sgt i32 %dec.137, 0
+  br i1 %cmp6.1, label %for.body8.1, label %for.inc10.1
+
+for.inc10.1: ; preds = %for.body8.1
+  br label %for.end15
+
+for.end15: ; preds = %for.inc10.1
+  %conv9.le.le.le = zext i32 %dec.137 to i64
+  store i64 %inc1821.1, ptr @l, align 8
+  store i64 %conv9.le.le.le, ptr @i, align 8
+  %rem = urem i32 %dec.1, 53
+  %call = tail call i32 (ptr, ...) @printf(ptr @.str, i32 %rem)
+  ret i32 0
+}
+
+declare i32 @printf(ptr, ...)