This is an archive of the discontinued LLVM Phabricator instance.

[IndVarSimplify] Do not use SCEV expander for IVCount in LFTR when possible.
Needs Revision · Public

Authored by aus_intel on Aug 28 2019, 9:25 AM.

Details

Reviewers

reames
sanjoy

Summary

SCEV analysis cannot properly cache instructions that carry poison-generating flags
(for example, an add nsw defined outside of the loop will not be reused by the expander).
As a result, the SCEV expander can generate additional instructions.

Example IR:

...
%maxval = add nuw nsw i32 %a1, %a2
...

for.body:

...
%cmp22 = icmp ult i32 %ivadd, %maxval
br i1 %cmp22, label %for.body, label %for.end
...

The SCEV expander will generate a copy of %maxval in the preheader, but without
the nuw/nsw flags. This can be avoided by explicitly checking whether the existing
IV count value has the same SCEV expression as the limit calculated by LFTR.
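
A minimal sketch of the effect described in the summary, written as a standalone C++ toy model rather than LLVM code. `Inst`, `ToyExpander`, `registerExisting` and `expandAdd` are hypothetical names, and the rule that flagged instructions are never registered for reuse is an assumption that stands in for the caching restriction mentioned above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model, not the LLVM API.
struct Inst {
  std::string Name;
  std::string LHS, RHS;
  bool NUW = false, NSW = false;
};

class ToyExpander {
  // Maps the (LHS, RHS) operands of an add to a reusable instruction name.
  std::map<std::pair<std::string, std::string>, std::string> Cache;
  std::vector<Inst> Emitted;
  int NextId = 0;

public:
  // Assumption: an instruction carrying nuw/nsw is not registered for reuse,
  // mirroring the caching restriction discussed in the summary.
  void registerExisting(const Inst &I) {
    if (!I.NUW && !I.NSW)
      Cache[{I.LHS, I.RHS}] = I.Name;
  }

  // Returns the name of a value computing LHS + RHS, emitting a new
  // flag-free add when nothing reusable is known.
  std::string expandAdd(const std::string &LHS, const std::string &RHS) {
    auto It = Cache.find({LHS, RHS});
    if (It != Cache.end())
      return It->second;
    Inst New{"%tmp" + std::to_string(NextId++), LHS, RHS};
    Emitted.push_back(New);
    Cache[{LHS, RHS}] = New.Name;
    return New.Name;
  }

  std::size_t numEmitted() const { return Emitted.size(); }
};
```

With `%maxval = add nuw nsw i32 %a1, %a2` registered, `expandAdd("%a1", "%a2")` in this model emits a fresh flag-free add instead of returning `%maxval`, which is the duplication the patch tries to avoid.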

Event Timeline

aus_intel created this revision. Aug 28 2019, 9:25 AM

Herald added subscribers: javed.absar, hiraditya. Aug 28 2019, 9:25 AM

Can you run ./update_test_checks.py on llvm/test/Transforms/IndVarSimplify/add_nsw.ll and llvm/test/Transforms/IndVarSimplify/udiv.ll?

cc @reames since he is more familiar with SCEV and can comment on this patch.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2399	nits: auto BI auto Cmp

I don't have a stance here, but I think the cache/flags issue is a bug regardless (is there a bug number?).
Also, does (or should) some other pass clean up the IR without this patch?

There is https://reviews.llvm.org/D41578, which disabled caching of a SCEV if it lost flags; in my understanding, that causes this problem. There is also an analysis of whether a SCEV expression can generate poison, but it doesn't handle loop invariants (and I assume that changing it could increase compilation time). Given all of this, I decided that it is better to just avoid using the expander where possible.

I noticed that early CSE can handle simple cases like this (simple add: C++ - https://godbolt.org/z/yBcf8L, opt with indvars and CSE - https://godbolt.org/z/_s-Z9P). However, I have an example where neither CSE nor any other pass removes the extra instructions (C++ - https://godbolt.org/z/vc2Wf5, opt - https://godbolt.org/z/so9zTE). So it seems that there is no cleanup for this code when -O2 is passed. Moreover, when the cleanup does happen, it drops the flags from the instructions too (is this OK?).

Moreover, I have now discovered that there is a problem in a different pass: loop strength reduction. It uses the SCEV expander to rewrite values, and it can again generate extra instructions even when nothing is changed inside the loop (https://godbolt.org/z/gYx2Ke). Initially I didn't notice this because some backends do not use the codegen passes that are added with TargetPassConfig, so this pass was not added by default (opt does not run LSR either). It seems that I should file a bug on this.

Updated the patch: used auto instead of explicit pointer types and updated the tests with the script.

See inline review comment

If you decide to revise, please do the following:

Land your tests in a precommit, then rebase.
Land the auto-generation of the one affected test, then rebase.
Provide a clear comment (a pointer to a particular test is fine) on where SCEVExpander goes wrong. I might be able to make better suggestions on approach.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2403	I don't believe this is sound. The problem is that SCEV can map multiple expressions to the same SCEV, but that does not imply that all of them are equivalent. Given:
%v1 = add nsw %a, %b -> SCEV A
%v2 = add nuw %a, %b -> SCEV B
(i.e. pretend SCEV dropped all flags in its construction)
This would replace all uses of %v2 with %v1, which is incorrect.
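
To make the soundness concern concrete, here is a small standalone C++ sketch that models 8-bit `add nsw` and `add nuw`, with `std::nullopt` standing in for poison. The helpers `addNSW` and `addNUW` are hypothetical illustrations, not LLVM APIs, and the 8-bit width is chosen only to keep the numbers small.

```cpp
#include <cstdint>
#include <optional>

// Toy model: std::nullopt represents a poison result.
using Val = std::optional<std::uint8_t>;

// 8-bit add with the nsw flag: poison on signed overflow.
Val addNSW(std::uint8_t A, std::uint8_t B) {
  int Wide = static_cast<std::int8_t>(A) + static_cast<std::int8_t>(B);
  if (Wide < INT8_MIN || Wide > INT8_MAX)
    return std::nullopt;
  return static_cast<std::uint8_t>(A + B);
}

// 8-bit add with the nuw flag: poison on unsigned overflow.
Val addNUW(std::uint8_t A, std::uint8_t B) {
  unsigned Wide = static_cast<unsigned>(A) + B;
  if (Wide > UINT8_MAX)
    return std::nullopt;
  return static_cast<std::uint8_t>(Wide);
}
```

In this model `addNUW(127, 1)` yields 128 while `addNSW(127, 1)` is poison, so substituting the nsw value for the nuw one would introduce poison on an input where the original was well defined. The two instructions compute the same flag-free sum, which is why comparing SCEVs that ignore flags cannot tell them apart.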

This revision now requires changes to proceed. Aug 30 2019, 9:39 AM

Sorry for the delayed answer.

How can I land the tests in a precommit? I tested my patch locally (lit tests and LNT on x86) and everything seems to be OK. I would be glad to know if there is a possibility for more testing.
About auto-generation of the affected test: is this about the .ll tests that I attached to the patch? If so, I have already used the script to regenerate them.
I probably left a misleading comment pointing at the SCEV expander. It is actually a SCEV analysis issue: the analysis loses the flags from the add instruction, and then the SCEV expander cannot find a cached value to reuse, which results in an extra instruction without flags. This new instruction performs logically the same operation (computing the loop limit), so the old value could be reused. There is a function, getNoWrapFlagsFromUB, that tries to set flags for the computed SCEV. It calls isSCEVExprNeverPoison, which does not handle loop invariants, and this leads to the flags being lost. However, even if I fix this function (I tried this locally), the problem is still present: some transformations of the SCEV obtained from the loop limit can lose the flags again.

How can I land tests in precommit?

You need to request write access to LLVM.

sanjoy resigned from this revision. Jan 29 2022, 5:40 PM

Revision Contents

Path                                                 Size
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp        12 lines
llvm/test/Transforms/IndVarSimplify/add_nsw.ll       23 lines
llvm/test/Transforms/IndVarSimplify/lftr-reuse.ll    9 lines
llvm/test/Transforms/IndVarSimplify/udiv.ll          1 line

Diff 217664

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

@@ 2,387 preceding lines not shown @@ if (SE->getTypeSizeInBits(IVInit->getType())
       IVInit = SE->getTruncateExpr(IVInit, ExitCount->getType());
     }

     const SCEV *IVLimit = SE->getAddExpr(IVInit, ExitCount);

     if (UsePostInc)
       IVLimit = SE->getAddExpr(IVLimit, SE->getOne(IVLimit->getType()));

+    // If computed limit is equal to old limit then do not use SCEV expander
+    // because it can lost NUW/NSW flags and create extra instructions.
+    BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());
+    if (ICmpInst *Cmp = dyn_cast<ICmpInst>(BI->getOperand(0))) {
+      Value *Limit = Cmp->getOperand(0);
+      if (!L->isLoopInvariant(Limit))
+        Limit = Cmp->getOperand(1);
+      if (SE->getSCEV(Limit) == IVLimit)
+        return Limit;
+    }
+
     // Expand the code for the iteration count.
     assert(SE->isLoopInvariant(IVLimit, L) &&
            "Computed iteration count is not loop invariant!");
     // Ensure that we generate the same type as IndVar, or a smaller integer
     // type. In the presence of null pointer values, we have an integer type
     // SCEV expression (IVInit) for a pointer type IV value (IndVar).
     Type *LimitTy = ExitCount->getType()->isPointerTy() ?
       IndVar->getType() : ExitCount->getType();
-    BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());
     return Rewriter.expandCodeFor(IVLimit, LimitTy, BI);
   }
 }

 /// This method rewrites the exit condition of the loop to be a canonical !=
 /// comparison against the incremented loop induction variable. This pass is
 /// able to rewrite the exit tests of any loop where the SCEV analysis can
 /// determine a loop-invariant trip count of the loop, which is actually a much
@@ 562 following lines not shown @@

llvm/test/Transforms/IndVarSimplify/add_nsw.ll

This file was added.

; RUN: opt -indvars -S %s | FileCheck %s

target datalayout = "e-p:32:32-i64:64-n8:16:32"

; CHECK: for.body.preheader:
; CHECK-NOT: add
; CHECK: for.body:

define void @foo(i32 %a1, i32 %a2) {
entry:
  %maxval = add nuw nsw i32 %a1, %a2
  %cmp = icmp slt i32 %maxval, 1
  br i1 %cmp, label %for.end, label %for.body

for.body: ; preds = %entry, %for.body
  %j.02 = phi i32 [ 0, %entry ], [ %add31, %for.body ]
  %add31 = add nuw nsw i32 %j.02, 1
  %cmp22 = icmp slt i32 %add31, %maxval
  br i1 %cmp22, label %for.body, label %for.end

for.end: ; preds = %for.body
  ret void
}
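
For orientation, the loop in add_nsw.ll has the shape that LFTR canonicalizes: a post-incremented counter compared against a loop-invariant limit. The sketch below is a hypothetical C++ rendering of @foo (the function names are made up for illustration, and it is not part of the patch). It suggests that, under the entry guard `%maxval >= 1`, the `slt` exit test and the canonical `!=` test run the same number of iterations.

```cpp
// Hypothetical C++ rendering of the loop in @foo; returns the number of
// iterations executed. The early return mirrors `icmp slt i32 %maxval, 1`.
int tripCountSlt(int MaxVal) {
  if (MaxVal < 1)
    return 0;
  int Iters = 0;
  int J = 0;
  do {
    ++Iters;
    ++J;                  // %add31 = add nuw nsw i32 %j.02, 1
  } while (J < MaxVal);   // %cmp22 = icmp slt i32 %add31, %maxval
  return Iters;
}

// Same loop with the exit test rewritten to the canonical != form.
int tripCountNe(int MaxVal) {
  if (MaxVal < 1)
    return 0;
  int Iters = 0;
  int J = 0;
  do {
    ++Iters;
    ++J;
  } while (J != MaxVal);  // icmp ne i32 %add31, <limit>
  return Iters;
}
```

The rewrite only needs a value equal to the limit at the exit test, which is why reusing the existing `%maxval` instead of expanding a new add looks attractive in this test.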

llvm/test/Transforms/IndVarSimplify/lftr-reuse.ll

@@ 61 preceding lines not shown @@
 ; simplified, and as a result the inner loop's exit test will be rewritten.
 define void @expandOuterRecurrence(i32 %arg) nounwind {
 ; CHECK-LABEL: @expandOuterRecurrence(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[SUB1:%.*]] = sub nsw i32 [[ARG:%.*]], 1
 ; CHECK-NEXT: [[CMP1:%.*]] = icmp slt i32 0, [[SUB1]]
 ; CHECK-NEXT: br i1 [[CMP1]], label [[OUTER_PREHEADER:%.*]], label [[EXIT:%.*]]
 ; CHECK: outer.preheader:
-; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[ARG]], -1
 ; CHECK-NEXT: br label [[OUTER:%.*]]
 ; CHECK: outer:
-; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i32 [ [[TMP0]], [[OUTER_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[OUTER_INC:%.*]] ]
-; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_INC:%.*]], [[OUTER_INC]] ], [ 0, [[OUTER_PREHEADER]] ]
+; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_INC:%.*]], [[OUTER_INC:%.*]] ], [ 0, [[OUTER_PREHEADER]] ]
 ; CHECK-NEXT: [[SUB2:%.*]] = sub nsw i32 [[ARG]], [[I]]
 ; CHECK-NEXT: [[SUB3:%.*]] = sub nsw i32 [[SUB2]], 1
 ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 0, [[SUB3]]
 ; CHECK-NEXT: br i1 [[CMP2]], label [[INNER_PH:%.*]], label [[OUTER_INC]]
 ; CHECK: inner.ph:
 ; CHECK-NEXT: br label [[INNER:%.*]]
 ; CHECK: inner:
 ; CHECK-NEXT: [[J:%.*]] = phi i32 [ 0, [[INNER_PH]] ], [ [[J_INC:%.*]], [[INNER]] ]
 ; CHECK-NEXT: [[J_INC]] = add nuw nsw i32 [[J]], 1
-; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[J_INC]], [[INDVARS_IV]]
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[J_INC]], [[SUB3]]
 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[INNER]], label [[OUTER_INC_LOOPEXIT:%.*]]
 ; CHECK: outer.inc.loopexit:
 ; CHECK-NEXT: br label [[OUTER_INC]]
 ; CHECK: outer.inc:
 ; CHECK-NEXT: [[I_INC]] = add nuw nsw i32 [[I]], 1
-; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i32 [[INDVARS_IV]], -1
-; CHECK-NEXT: [[EXITCOND1:%.*]] = icmp ne i32 [[I_INC]], [[TMP0]]
+; CHECK-NEXT: [[EXITCOND1:%.*]] = icmp ne i32 [[I_INC]], [[SUB1]]
 ; CHECK-NEXT: br i1 [[EXITCOND1]], label [[OUTER]], label [[EXIT_LOOPEXIT:%.*]]
 ; CHECK: exit.loopexit:
 ; CHECK-NEXT: br label [[EXIT]]
 ; CHECK: exit:
 ; CHECK-NEXT: ret void
 ;
 entry:
   %sub1 = sub nsw i32 %arg, 1
@@ 234 following lines not shown @@

llvm/test/Transforms/IndVarSimplify/udiv.ll

@@ 127 preceding lines not shown @@
 declare i32 @printf(i8* nocapture, ...) nounwind

 ; IndVars doesn't emit a udiv in for.body.preheader since SCEVExpander::expand will
 ; find out there's already a udiv in the original code.

 ; CHECK-LABEL: @foo(
 ; CHECK: for.body.preheader:
 ; CHECK-NOT: udiv
+; CHECK: for.body:

 define void @foo(double* %p, i64 %n) nounwind {
 entry:
   %div0 = udiv i64 %n, 7 ; <i64> [#uses=1]
   %div1 = add i64 %div0, 1
   %cmp2 = icmp ult i64 0, %div1 ; <i1> [#uses=1]
   br i1 %cmp2, label %for.body.preheader, label %for.end

@@ 19 following lines not shown @@