This is an archive of the discontinued LLVM Phabricator instance.

[IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
ClosedPublic

Authored by reames on Mar 31 2023, 9:25 AM.

Details

Summary

(JFYI - This has been heavily reframed since the original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bail out on such descriptors by default. This preserves the vectorizer's default behavior.
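For illustration, here is a minimal C sketch (a hypothetical example, not taken from the patch or its tests) of the loop shape in question - a pointer induction variable whose step is a runtime value rather than a constant:

```c
/* Hypothetical example: `p` is a pointer IV whose step is the runtime
 * value `stride`. InductionDescriptor previously only matched pointer
 * IVs with constant steps; with this change the recurrence is
 * recognized, and the LoopVectorizer bails out on it unless the new
 * flag is set. */
void store_strided(char *p, long stride, long n) {
  for (long i = 0; i < n; i++) {
    *p = 0;
    p += stride; /* non-constant step */
  }
}
```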

In review, it was pointed out that there are multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant-stride pointer recurrences for other consumers. I've audited that code and don't see any obvious issues.

Diff Detail

Event Timeline

reames created this revision.Mar 31 2023, 9:25 AM
Herald added a project: Restricted Project.Mar 31 2023, 9:25 AM
reames requested review of this revision.Mar 31 2023, 9:25 AM
reames added a comment.EditedMar 31 2023, 9:32 AM

@dmgreen I'd really appreciate any input you can share on the performance swing you observed.

Thinking through this, I've got a few potential theories, but it would help to know which (if any) is correct.

Option 1 - The codegen for the scalar and vector expansions is fairly poor. With solely constants, the expressions would get constant-folded down so much it doesn't really matter, but for non-constants we're leaving a lot for the backend to clean up. This is an easy fix, but probably the least likely issue.

Option 2 - There's something missing in the cost model for arbitrary gathers, or something in target code is matching strided accesses too broadly. I haven't looked into this at all yet. This is probably the most likely.

Option 3 - This is simply exposing more cases where LAA/LV can speculate on the stride of a non-constant-step IV. (We already do this for integer forms; see the sketch after this list.) From what I've seen in the cases I'm looking at, speculating and versioning on stride == 1 at runtime is commonly unprofitable. I'd planned on tackling that, but given it already kicks in so widely on scalar IVs, I hadn't thought enabling pointer IVs would make it critical.

Option 4 - Because of the lack of aliasing support for non-constant strides, we're generating more runtime checks, resulting in overhead for shorter loops. This falls into the same category as the previous option: if it were an issue, I'd be surprised we don't already see it with integer IVs.
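To make Option 3 concrete, here is a sketch (a hypothetical example, names mine) of the existing integer-IV behavior being referred to - LAA/LV can emit a runtime stride == 1 check and version the loop, vectorizing only the unit-stride copy:

```c
/* Hypothetical example: because `src` is indexed by i * stride, LAA/LV
 * may version the loop on a runtime `stride == 1` check and vectorize
 * only the unit-stride copy, leaving the general-stride path scalar.
 * When the stride is rarely 1 at runtime, the check and the unused
 * vector body are pure overhead. */
void copy_strided(int *dst, const int *src, long stride, long n) {
  for (long i = 0; i < n; i++)
    dst[i] = src[i * stride];
}
```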

reames edited the summary of this revision. (Show Details)Mar 31 2023, 9:32 AM

dmgreen added a comment.

Hello. About the performance changes - I have been looking into them a little. A lot of it looks good, especially where gather/scatter are available, which this can make use of. There are some cases where it wasn't doing quite as well in some places under MVE, but https://reviews.llvm.org/D147331 should hopefully sort out the largest of them. Sam/Nick should be back on Monday to review it. We have some other cases where gather/scatter are not available for whatever reason, and those don't look as good, but the changes are much smaller. Again, it likely comes down to adjusting the cost model; perhaps the cost of the scalarized loads in getMemInstScalarizationCost needs to be a little higher under MVE. I wouldn't consider any of that a blocker considering all the improvements.

The other part under AArch64 is a bit more silly, I'm afraid, and falls into your Option 3. It is extracted from the code in x264, using the 16x16 version: https://github.com/mirror/x264/blob/master/common/pixel.c#L53, which I believe is also part of SPEC. I was thinking of adding a phase-ordering test but will just put it here: https://godbolt.org/z/qo54f7Pbv. It now decides to version the loop based on a stride of 1, like you mentioned. The loop-vectorized code is not as good as the SLP version, though, and is never executed. I wouldn't expect that to cause a lot of slowdown, as it only really adds an extra compare/branch and some dead code. The differences seemed higher than I expected in places, though - around 25% for that function in isolation on some CPUs. That might be quite noisy, though, as some were a lot closer to 0.
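For reference, the kernel has roughly this shape (a simplified sketch of x264's PIXEL_SAD_C macro at the link above, not the verbatim source) - both pixel pointers are pointer IVs stepped by runtime strides, which is exactly the pattern this patch starts recognizing:

```c
#include <stdlib.h>

/* Simplified sketch of the x264 16x16 SAD kernel: `pix1` and `pix2`
 * step by runtime strides each row, so LV now emits a stride == 1
 * versioned copy of the loop, even though the SLP-vectorized body is
 * what actually executes. */
static int sad_16x16(const unsigned char *pix1, long i_stride_pix1,
                     const unsigned char *pix2, long i_stride_pix2) {
  int i_sum = 0;
  for (int y = 0; y < 16; y++) {
    for (int x = 0; x < 16; x++)
      i_sum += abs(pix1[x] - pix2[x]);
    pix1 += i_stride_pix1;
    pix2 += i_stride_pix2;
  }
  return i_sum;
}
```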

I might also be able to provide another example of an assert this is still hitting, with Assertion `Instance.Lane.isFirstLane() && "cannot get lane > 0 for scalar"' failed. It is reducing now.

This is the code producing the assert: https://godbolt.org/z/3a4drTheG. It has been run through creduce and may be a bit over-reduced. It doesn't reproduce in godbolt, but does locally with bin/clang -O3 -mcpu=neoverse-v1 -c reduce.c -target aarch64.

clang: llvm/lib/Transforms/Vectorize/VPlan.cpp:226: llvm::Value* llvm::VPTransformState::get(llvm::VPValue*, const llvm::VPIteration&): Assertion `Instance.Lane.isFirstLane() && "cannot get lane > 0 for scalar"' failed.

Let me know if it doesn't work and I can try to get something better that uses opt.

reames added a comment.

> Hello. About the performance changes - I have been looking into them a little. A lot of it looks good, especially where gather/scatter are available, which this can make use of. There are some cases where it wasn't doing quite as well in some places under MVE, but https://reviews.llvm.org/D147331 should hopefully sort out the largest of them. Sam/Nick should be back on Monday to review it. We have some other cases where gather/scatter are not available for whatever reason, and those don't look as good, but the changes are much smaller. Again, it likely comes down to adjusting the cost model; perhaps the cost of the scalarized loads in getMemInstScalarizationCost needs to be a little higher under MVE. I wouldn't consider any of that a blocker considering all the improvements.

This is all good news, thanks!

> The other part under AArch64 is a bit more silly, I'm afraid, and falls into your Option 3. It is extracted from the code in x264, using the 16x16 version: https://github.com/mirror/x264/blob/master/common/pixel.c#L53, which I believe is also part of SPEC. I was thinking of adding a phase-ordering test but will just put it here: https://godbolt.org/z/qo54f7Pbv. It now decides to version the loop based on a stride of 1, like you mentioned. The loop-vectorized code is not as good as the SLP version, though, and is never executed. I wouldn't expect that to cause a lot of slowdown, as it only really adds an extra compare/branch and some dead code. The differences seemed higher than I expected in places, though - around 25% for that function in isolation on some CPUs. That might be quite noisy, though, as some were a lot closer to 0.

Hm, this is less good news.

I took a bit of a look here, and noticed something interesting. The LLVM IR after optimization for the stride != 1 path is nearly identical to the prior version. However, the runtime check in the assembly appears to have tripped some kind of hoisting optimization in the backend, and as a result, the assembly for the stride != 1 path is a bit different.

I can see a couple ways of tackling this:

Option 1 - Be more restrictive on stride == 1 speculation. I'd meant to do this anyway, but was expecting that to be the piece that had interesting perf swings.

Option 2 - Investigate the hoisting bit. (I don't really have the context to do this.)

> I might also be able to provide another example of an assert this is still hitting, with Assertion `Instance.Lane.isFirstLane() && "cannot get lane > 0 for scalar"' failed. It is reducing now.

Oops, yeah, definitely looking forward to that. I'd assumed all the reports were the same issue, since the assert seemed common.

dmgreen added a comment.

> I took a bit of a look here, and noticed something interesting. The LLVM IR after optimization for the stride != 1 path is nearly identical to the prior version. However, the runtime check in the assembly appears to have tripped some kind of hoisting optimization in the backend, and as a result, the assembly for the stride != 1 path is a bit different.
>
> I can see a couple ways of tackling this:
>
> Option 1 - Be more restrictive on stride == 1 speculation. I'd meant to do this anyway, but was expecting that to be the piece that had interesting perf swings.
>
> Option 2 - Investigate the hoisting bit. (I don't really have the context to do this.)

I was looking at another profile that ran with different architecture features and with LTO, where the differences were more pronounced. The main body was vectorized with scalable vectors and the remainder was no longer unrolled. That might have been why the differences in performance looked higher than I expected, but that function is relatively hot and called many times, so even just the extra compares can slow things down a bit.

Do you think it would be possible to come up with a heuristic to prevent the stride-1 speculation in this case, at least? The overlapping load in https://godbolt.org/z/ToT4W74Yf, with a stride of 1 in the same loop at the point of vectorization, looks like something that is unlikely to be helpful in many cases.

(I have looked into single-stride speculation before in https://reviews.llvm.org/D71919, where I was running into places where it is not profitable compared to gather/scatter. From what I remember, there were a few cases in the llvm-test-suite where it was helping, and I didn't push that patch forward as I had no strong motivating example. Base AArch64 doesn't have gather/scatter, so this case is a little different.)

reames added a comment.

Ok, it's pretty clear the scope of this is much, much broader than I'd realized. I'm going to change tack here and come at this from a different angle. I'm going to re-frame this patch as generalizing IVDescriptors, and add an off-by-default option in LV to allow vectorization of such IVs. This keeps the status quo in LV while getting the IVDescriptor code landed.

I do not intend for this flag to be supported long term. I need it to write a test corpus that covers the various optimization-quality issues reported above. Doing it this way also allows me to separate the second functional issue into its own review - I've dug into that one a bit, and it's looking a bit annoying.

My thinking now is that we need to work through result quality for strided IVs (i.e. using gathers or strided accesses where available) before returning to this. I'm hoping to be able to stage these as individual commits.

reames updated this revision to Diff 510555.Apr 3 2023, 10:54 AM
reames edited the summary of this revision. (Show Details)
dmgreen accepted this revision.Apr 4 2023, 7:33 AM

Sounds good. LGTM

Like I said, many of the changes were improvements. There is likely some decent gain we can get from enabling it fully, if we can work through the problem cases.

This revision is now accepted and ready to land.Apr 4 2023, 7:33 AM
fhahn accepted this revision.Apr 4 2023, 9:37 AM

LGTM, having this behind an option to start with seems like a good idea.

This revision was landed with ongoing or failed builds.Apr 5 2023, 9:32 AM
This revision was automatically updated to reflect the committed changes.