This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Analysis/ScalarEvolution.cpp
1157	This check should be lower in the function. Certainly doesn't make sense to do this before the folding set lookup or the constant folding. I think the right position is directly before a new SCEVTruncateExpr is allocated, i.e. after trunc has already been pushed through operations where possible.

lebedev.ri added a subscriber: lebedev.ri. Jan 2 2021, 3:11 AM

lebedev.ri added inline comments.

llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-assumed-divisible-TC.ll
1	Please precommit the test

gilr marked 2 inline comments as done. Jan 2 2021, 7:42 AM

Addressed comments.

fhahn added inline comments. Jan 2 2021, 7:58 AM

llvm/lib/Analysis/ScalarEvolution.cpp
1181	Are you worried about the cost of calling `GetMinTrailingZeros` here? Could it happen that we have an arbitrary SCEV expression here for which we can get more accurate results thanks to `GetMinTrailingZeros`?

nikic added inline comments. Jan 2 2021, 8:10 AM

llvm/lib/Analysis/ScalarEvolution.cpp
1181	I believe this optimization can only be useful in the first place for SCEVUnknown (and SCEVPtrToInt, which is basically the same thing), as well as min/max expressions. The latter only because we don't push truncates through min/max like we do for everything else. Possibly we should be doing that, but that's a different issue...

That said, from a code structure perspective, I'm not sure why this check is present in the "Depth > MaxCastDepth" branch at all. This is a recursion cut-off that is supposed to produce sub-optimal SCEV expressions, and there does not seem to be any strong reason why this particular optimization needs to be applied unconditionally.

No visible impact on compile-time: https://llvm-compile-time-tracker.com/compare.php?from=d8af31006351c9f441d73d4b6c5ea6d109f3d4f1&to=bc45471b6685606b02f55bc76d6161d7c0e32d62&stat=instructions

gilr added inline comments. Jan 2 2021, 11:27 PM

llvm/lib/Analysis/ScalarEvolution.cpp
1181	> I believe this optimization can only be useful in the first place for SCEVUnknown

This patch was indeed motivated by assumptions, but the analysis first needs to get to the SCEVUnknowns. Calling GetMinTrailingZeros on any SCEV that got through getTruncateExpr complements whatever the latter didn't simplify, but admittedly GetMinTrailingZeros is unboundedly recursive itself (even if cached and unrelated to getTruncateExpr's recursion).

> This is a recursion cut-off that is supposed to produce sub-optimal SCEV expressions

Cutting off potentially exponential recursion at some point makes a lot of sense. I'm not sure it implies not doing any work at the leaves, though. To be on the safe side, let's start by calling GetMinTrailingZeros only for SCEVUnknowns at the end of the function and extend as needed.

Limit trailing-zeros check to SCEVUnknowns and only while depth limit is not reached.

LGTM

This revision is now accepted and ready to land. Jan 3 2021, 1:49 AM

fhahn added inline comments. Jan 3 2021, 3:16 AM

llvm/lib/Analysis/ScalarEvolution.cpp
1181	>> I believe this optimization can only be useful in the first place for SCEVUnknown

> This patch was indeed motivated by assumptions, but the analysis first needs to get to the SCEVUnknowns. Calling GetMinTrailingZeros on any SCEV that got through getTruncateExpr complements whatever the latter didn't simplify, but admittedly GetMinTrailingZeros is unboundedly recursive itself (even if cached and unrelated to getTruncateExpr's recursion).

I am not sure: is there anything conceptually making this only useful for `SCEVUnknown`? `GetMinTrailingZeros` can provide useful bounds for a range of expressions, which may be helpful for this optimization. One example involving a `UMax` expression is below. This example probably highlights a missing fold for truncates, but the main point is that the reasoning in this patch is complementary and may catch additional cases.

There is some overlap with the folds in the function, but I am not sure this means we should restrict this only to `SCEVUnknown`. If `GetMinTrailingZeros` gets improved, it would be good to not miss out on the benefits in this function.

    define i8 @trunc_to_assumed_zeros0(i32* %p, i32* %p.2, i1 %c) {
      %a = load i32, i32* %p
      %b = load i32, i32* %p.2
      %and.1 = and i32 %a, 255
      %cmp.1 = icmp eq i32 %and.1, 0
      tail call void @llvm.assume(i1 %cmp.1)
      %and.2 = and i32 %b, 255
      %cmp.2 = icmp eq i32 %and.2, 0
      tail call void @llvm.assume(i1 %cmp.2)
      %lt = icmp ugt i32 %a, %b
      %sel = select i1 %lt, i32 %a, i32 %b
      %t1 = trunc i32 %sel to i8
      %t2 = trunc i32 %a to i8
      %t3 = trunc i32 %b to i8
      ret i8 %t1
    }
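The arithmetic behind the `UMax` example above can be checked outside LLVM. This is a minimal sketch in standalone C++, not code from the patch: the helper names `trailingZeros`, `umax`, and `truncToI8` and the sample values are assumptions chosen for illustration. It shows that when both inputs have their low 8 bits clear, their unsigned maximum does too, so truncating it to 8 bits yields zero.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zero bits of a 32-bit value; returns 32 for zero.
// (Hypothetical helper; it inspects a concrete value, whereas SCEV's
// GetMinTrailingZeros computes a proven lower bound for an expression.)
unsigned trailingZeros(uint32_t V) {
  if (V == 0)
    return 32;
  unsigned N = 0;
  while ((V & 1u) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Mirrors the `icmp ugt` / `select` pair in the IR: an unsigned maximum.
uint32_t umax(uint32_t A, uint32_t B) { return A > B ? A : B; }

// Mirrors `trunc i32 ... to i8`: keep only the low 8 bits.
uint8_t truncToI8(uint32_t V) { return static_cast<uint8_t>(V); }
```

For instance, `0x1200` and `0x3400` both satisfy `(x & 255) == 0`, and `truncToI8(umax(0x1200, 0x3400))` evaluates to zero. The maximum is always one of its two operands, so its trailing-zero count is at least the smaller of the two operand counts.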

nikic added inline comments. Jan 3 2021, 3:33 AM

llvm/lib/Analysis/ScalarEvolution.cpp
1181	Just to be clear, I have no problem with the fold being applied to all SCEV expressions, not just SCEVUnknown. My only concern was with where in this function the fold happens, not what it is applied to.

gilr added inline comments. Jan 3 2021, 3:54 AM

llvm/lib/Analysis/ScalarEvolution.cpp
1181	Right @fhahn. And the use of GetMinTrailingZeros should automatically be reduced if further simplifications are added to getTruncateExpr. The only reason to restrict this to (the non-recursive) SCEVUnknown was being extra careful regarding compile-time. Since @nikic is also OK with folding any SCEV at the end of the function, I'll remove this restriction. Thanks guys!
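The placement settled on in this thread can be sketched as a toy model. This is only an illustration under stated assumptions: `TruncResult`, `buildTrunc`, and its parameters are hypothetical names, not LLVM's API. In the committed change the check compares `GetMinTrailingZeros(Op)` against `getTypeSizeInBits(Ty)` just before the explicit truncate node is created, and the "Depth > MaxCastDepth" branch returns an explicit node without applying the fold.

```cpp
#include <cassert>

// Hypothetical outcome of building a truncate expression in this toy model.
enum class TruncResult { Zero, ExplicitTrunc };

// Toy stand-in for the tail of a truncate builder (not LLVM's API).
//   MinTrailingZeros: proven lower bound on the operand's trailing zero bits.
//   DestBits:         bit width of the destination type.
//   Depth, MaxDepth:  current recursion depth and its cut-off.
TruncResult buildTrunc(unsigned MinTrailingZeros, unsigned DestBits,
                       unsigned Depth, unsigned MaxDepth) {
  // Past the cut-off, accept a sub-optimal explicit node and skip the fold.
  if (Depth > MaxDepth)
    return TruncResult::ExplicitTrunc;
  // Every bit that the truncation keeps is known to be zero.
  if (MinTrailingZeros >= DestBits)
    return TruncResult::Zero;
  // Nothing folded: fall back to an explicit truncate node.
  return TruncResult::ExplicitTrunc;
}
```

With 8 known trailing zeros, a truncation to 8 bits or to 1 bit folds to zero, while a truncation to 16 bits stays explicit, which matches the expectations in the `trunc_to_assumed_zeros` test.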

Closed by commit rGd9c0b128e354: [SCEV] Simplify trunc to zero based on known bits (authored by gilr). Jan 3 2021, 4:08 AM

This revision was automatically updated to reflect the committed changes.

gilr added a commit: rGd9c0b128e354: [SCEV] Simplify trunc to zero based on known bits.

fhahn added inline comments. Jan 3 2021, 4:10 AM

llvm/lib/Analysis/ScalarEvolution.cpp
1181	SGTM, thanks!

Revision Contents

Path	Size
llvm/lib/Analysis/ScalarEvolution.cpp	5 lines
llvm/test/Analysis/ScalarEvolution/trunc-simplify.ll	22 lines
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-assumed-divisible-TC.ll	65 lines

Diff 314275

llvm/lib/Analysis/ScalarEvolution.cpp

[1,148 preceding lines not shown]
   assert(getTypeSizeInBits(Op->getType()) > getTypeSizeInBits(Ty) &&
          "This is not a truncating conversion!");
   assert(isSCEVable(Ty) &&
          "This is not a conversion to a SCEVable type!");
   Ty = getEffectiveSCEVType(Ty);

   FoldingSetNodeID ID;
   ID.AddInteger(scTruncate);
   ID.AddPointer(Op);
   ID.AddPointer(Ty);
   void *IP = nullptr;
   if (const SCEV *S = UniqueSCEVs.FindNodeOrInsertPos(ID, IP)) return S;

   // Fold if the operand is constant.
   if (const SCEVConstant *SC = dyn_cast<SCEVConstant>(Op))
     return getConstant(
       cast<ConstantInt>(ConstantExpr::getTrunc(SC->getValue(), Ty)));

   // trunc(trunc(x)) --> trunc(x)
   if (const SCEVTruncateExpr *ST = dyn_cast<SCEVTruncateExpr>(Op))
     return getTruncateExpr(ST->getOperand(), Ty, Depth + 1);

   // trunc(sext(x)) --> sext(x) if widening or trunc(x) if narrowing
   if (const SCEVSignExtendExpr *SS = dyn_cast<SCEVSignExtendExpr>(Op))
     return getTruncateOrSignExtend(SS->getOperand(), Ty, Depth + 1);

   // trunc(zext(x)) --> zext(x) if widening or trunc(x) if narrowing
   if (const SCEVZeroExtendExpr *SZ = dyn_cast<SCEVZeroExtendExpr>(Op))
     return getTruncateOrZeroExtend(SZ->getOperand(), Ty, Depth + 1);

   if (Depth > MaxCastDepth) {
     SCEV *S =
         new (SCEVAllocator) SCEVTruncateExpr(ID.Intern(SCEVAllocator), Op, Ty);
     UniqueSCEVs.InsertNode(S, IP);
     addToLoopUseLists(S);
     return S;
   }

   // trunc(x1 + ... + xN) --> trunc(x1) + ... + trunc(xN) and
   // trunc(x1 * ... * xN) --> trunc(x1) * ... * trunc(xN),
   // if after transforming we have at most one truncate, not counting truncates
   // that replace other casts.
[27 lines not shown] const SCEV *ScalarEvolution::getTruncateExpr(const SCEV *Op, Type *Ty,
   // If the input value is a chrec scev, truncate the chrec's operands.
   if (const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(Op)) {
     SmallVector<const SCEV *, 4> Operands;
     for (const SCEV *Op : AddRec->operands())
       Operands.push_back(getTruncateExpr(Op, Ty, Depth + 1));
     return getAddRecExpr(Operands, AddRec->getLoop(), SCEV::FlagAnyWrap);
   }

+  // Return zero if truncating to known zeros.
+  uint32_t MinTrailingZeros = GetMinTrailingZeros(Op);
+  if (MinTrailingZeros >= getTypeSizeInBits(Ty))
+    return getZero(Ty);
+
   // The cast wasn't folded; create an explicit cast node. We can reuse
   // the existing insert position since if we get here, we won't have
   // made any changes which would invalidate it.
   SCEV *S = new (SCEVAllocator) SCEVTruncateExpr(ID.Intern(SCEVAllocator),
                                                  Op, Ty);
   UniqueSCEVs.InsertNode(S, IP);
   addToLoopUseLists(S);
   return S;
[12,103 following lines not shown]

llvm/test/Analysis/ScalarEvolution/trunc-simplify.ll

[18 preceding lines not shown]
 ; CHECK-LABEL: @trunc_of_add
 define i8 @trunc_of_add(i32 %a) {
   %b = add i32 %a, 100
 ; CHECK: %c
 ; CHECK-NEXT: --> (100 + (trunc i32 %a to i8))
   %c = trunc i32 %b to i8
   ret i8 %c
 }
+
+; Check that we truncate to zero values assumed to have at least as many
+; trailing zeros as the target type.
+; CHECK-LABEL: @trunc_to_assumed_zeros
+define i8 @trunc_to_assumed_zeros(i32* %p) {
+  %a = load i32, i32* %p
+  %and = and i32 %a, 255
+  %cmp = icmp eq i32 %and, 0
+  tail call void @llvm.assume(i1 %cmp)
+; CHECK: %c
+; CHECK-NEXT: --> 0
+  %c = trunc i32 %a to i8
+; CHECK: %d
+; CHECK-NEXT: --> false
+  %d = trunc i32 %a to i1
+; CHECK: %e
+; CHECK-NEXT: --> (trunc i32 %a to i16)
+  %e = trunc i32 %a to i16
+  ret i8 %c
+}
+
+declare void @llvm.assume(i1 noundef) nofree nosync nounwind willreturn

llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-assumed-divisible-TC.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S | FileCheck %s

 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

-; TODO: Make sure the loop is vectorized under -Os without folding its tail based on
+; Make sure the loop is vectorized under -Os without folding its tail based on
 ; its trip-count's lower bits assumed to be zero.

 define dso_local void @assumeAlignedTC(i32* noalias nocapture %A, i32* %p) optsize {
 ; CHECK-LABEL: @assumeAlignedTC(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[N:%.*]] = load i32, i32* [[P:%.*]], align 4
 ; CHECK-NEXT: [[AND:%.*]] = and i32 [[N]], 3
 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[AND]], 0
 ; CHECK-NEXT: tail call void @llvm.assume(i1 [[CMP]])
-; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
-; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
-; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i32 [[N]], 1
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TRIP_COUNT_MINUS_1]], i32 0
-; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 4
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
-; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE6:%.*]] ]
-; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE6]] ]
-; CHECK-NEXT: [[TMP0:%.*]] = icmp ule <4 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
-; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i1> [[TMP0]], i32 0
-; CHECK-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
-; CHECK: pred.store.if:
-; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 0
-; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP2]]
-; CHECK-NEXT: store i32 13, i32* [[TMP3]], align 1
-; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
-; CHECK: pred.store.continue:
-; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i1> [[TMP0]], i32 1
-; CHECK-NEXT: br i1 [[TMP4]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]]
-; CHECK: pred.store.if1:
-; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[INDEX]], 1
-; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP5]]
-; CHECK-NEXT: store i32 13, i32* [[TMP6]], align 1
-; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]]
-; CHECK: pred.store.continue2:
-; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP0]], i32 2
-; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
-; CHECK: pred.store.if3:
-; CHECK-NEXT: [[TMP8:%.*]] = add i32 [[INDEX]], 2
-; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP8]]
-; CHECK-NEXT: store i32 13, i32* [[TMP9]], align 1
-; CHECK-NEXT: br label [[PRED_STORE_CONTINUE4]]
-; CHECK: pred.store.continue4:
-; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i1> [[TMP0]], i32 3
-; CHECK-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6]]
-; CHECK: pred.store.if5:
-; CHECK-NEXT: [[TMP11:%.*]] = add i32 [[INDEX]], 3
-; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP11]]
-; CHECK-NEXT: store i32 13, i32* [[TMP12]], align 1
-; CHECK-NEXT: br label [[PRED_STORE_CONTINUE6]]
-; CHECK: pred.store.continue6:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP0]]
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
+; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
+; CHECK-NEXT: store <4 x i32> <i32 13, i32 13, i32 13, i32 13>, <4 x i32>* [[TMP3]], align 1
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
-; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
-; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
+; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
 ; CHECK: middle.block:
-; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[RIV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[RIVPLUS1:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[RIV]]
 ; CHECK-NEXT: store i32 13, i32* [[ARRAYIDX]], align 1
 ; CHECK-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1
[25 following lines not shown]
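The updated CHECK lines compute `N_MOD_VF` as `urem N, 4` and `N_VEC` as `N - N_MOD_VF`. The arithmetic behind the test comment can be sketched in standalone C++. The function names below are hypothetical and this does not model how the vectorizer makes its decision; it only illustrates that when the low 2 bits of the trip count are zero, the vector loop covers every iteration and no tail remains to fold.

```cpp
#include <cassert>
#include <cstdint>

// Iterations covered by the vector loop: N rounded down to a multiple of VF.
// Corresponds to N_VEC = N - (N urem VF) in the CHECK lines.
uint32_t vectorTripCount(uint32_t N, uint32_t VF) { return N - N % VF; }

// Iterations left over for a scalar remainder loop (or a folded tail).
uint32_t remainderIterations(uint32_t N, uint32_t VF) {
  return N - vectorTripCount(N, VF);
}

// For a power-of-two VF, N % VF is exactly the low log2(VF) bits of N,
// so "low bits are zero" and "N is divisible by VF" are the same claim.
bool lowBitsZero(uint32_t N, uint32_t VF) { return (N & (VF - 1)) == 0; }
```

For example, `N = 12` with `VF = 4` satisfies `(N & 3) == 0`, and `remainderIterations(12, 4)` is 0, whereas `N = 13` leaves one iteration over.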

[SCEV] Simplify trunc to zero based on known bits (Closed, Public)
