This is an archive of the discontinued LLVM Phabricator instance.

[LV] Generate both scalar and vector integer induction variables
ClosedPublic

Authored by mssimpso on Jul 27 2016, 8:30 AM.


Details

Reviewers

anemet
mkuper

Commits

rG18d88983179a: [LV] Generate both scalar and vector integer induction variables
rL277474: [LV] Generate both scalar and vector integer induction variables

Summary

This patch enables the vectorizer to generate both scalar and vector versions of an integer induction variable for a given loop. Previously, we only generated a scalar induction variable if we knew all its users were going to be scalar. Otherwise, we generated a vector induction variable. In the case of a loop with both scalar and vector users of the induction variable, we would generate the vector induction variable and extract scalar values from it for the scalar users. With this patch, we now generate both versions of the induction variable when there are both scalar and vector users and select which version to use based on whether the user is scalar or vector.
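
The selection rule in the summary can be sketched with a small self-contained C++ model. This is only an illustration under stated assumptions, not the vectorizer's code: `IVUser`, `IVPlan`, `planBefore`, and `planAfter` are hypothetical names, and the two boolean fields stand in for the loop-membership and scalar-after-vectorization queries discussed in the review.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for one user of the induction variable (IV).
struct IVUser {
  bool InLoop;        // the user is an instruction inside the loop
  bool RemainsScalar; // the user stays scalar after vectorization
};

// Which versions of the IV get generated.
struct IVPlan {
  bool Scalar;
  bool Vector;
  bool operator==(const IVPlan &O) const {
    return Scalar == O.Scalar && Vector == O.Vector;
  }
};

// True if at least one user inside the loop stays scalar.
static bool hasScalarUserInLoop(const std::vector<IVUser> &Users) {
  return std::any_of(Users.begin(), Users.end(), [](const IVUser &U) {
    return U.InLoop && U.RemainsScalar;
  });
}

// True if at least one user inside the loop is widened to a vector.
static bool hasVectorUserInLoop(const std::vector<IVUser> &Users) {
  return std::any_of(Users.begin(), Users.end(), [](const IVUser &U) {
    return U.InLoop && !U.RemainsScalar;
  });
}

// Before the patch (simplified): a scalar IV only if no vector IV is needed.
// Scalar users of a vector IV read it through extracts.
IVPlan planBefore(const std::vector<IVUser> &Users) {
  bool Vector = hasVectorUserInLoop(Users);
  return {!Vector, Vector};
}

// After the patch (simplified): a scalar IV is also generated when the loop
// contains a scalar user, even if a vector IV exists.
IVPlan planAfter(const std::vector<IVUser> &Users) {
  bool Vector = hasVectorUserInLoop(Users);
  bool Scalar = hasScalarUserInLoop(Users) || !Vector;
  return {Scalar, Vector};
}
```

In this model the two plans differ only for a loop with both kinds of users: `planBefore` yields a vector IV alone, while `planAfter` yields both.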


Event Timeline

mssimpso updated this revision to Diff 65746. Jul 27 2016, 8:30 AM

mssimpso retitled this revision from to [LV] Generate both scalar and vector integer induction variables.

mssimpso updated this object.

mssimpso added reviewers: anemet, mkuper.

mssimpso added subscribers: llvm-commits, mcrosier.

Herald added a subscriber: mzolotukhin. Jul 27 2016, 8:30 AM

mssimpso added a parent revision: D22867: [LV] Untangle the concepts of uniform and scalar. Jul 27 2016, 8:31 AM

Rebased.

Removed unnecessary cast.

Out of curiosity - what does the final generated code end up looking like with and without this patch, for cases where we have both a scalar and a vector use?
It seems like it should be better, I'm wondering how much.

lib/Transforms/Vectorize/LoopVectorize.cpp
1976	Why "Int"? I mean, it may be called only from widenIntInduction now, but I don't really see a reason to bake it into the name, the logic is independent of the IV type. Regardless, I find the use of "scalarize" sort of confusing. It evokes the notion of constructing a vector, and then extracting the scalars, which is the opposite of what we're doing. Maybe something like needsScalarInduction, and then a matching name for the variable below? (If you find the current terminology clearer, I won't insist on this, this could be just my own bias.)
1980	Why do we need !isLoopInvariant(U)? If the use is loop invariant, shouldn't it necessarily be a scalar? (Assuming we do need it, why two separate ifs, and not a && ?)

In D22869#499700, @mkuper wrote:

Out of curiosity - what does the final generated code end up looking like with and without this patch, for cases where we have both a scalar and a vector use?
It seems like it should be better, I'm wondering how much.

If you don't mind looking at the full code, I've pasted below what we generate for the new test case I've added to induction.ll (VF=2, UF=2, after instcombine).

Without this patch:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind = phi <2 x i64> [ <i64 0, i64 1>, %vector.ph ], [ %vec.ind.next, %vector.body ]
  %vec.ind2 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next5, %vector.body ]
  %step.add = add <2 x i64> %vec.ind, <i64 2, i64 2>
  %step.add3 = add <2 x i32> %vec.ind2, <i32 2, i32 2>
  %3 = add <2 x i32> %broadcast.splat, %vec.ind2
  %4 = add <2 x i32> %broadcast.splat, %step.add3
  %5 = trunc <2 x i32> %3 to <2 x i16>
  %6 = trunc <2 x i32> %4 to <2 x i16>
  %7 = extractelement <2 x i64> %vec.ind, i32 0
  %8 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %7, i32 1
  %9 = extractelement <2 x i64> %vec.ind, i32 1
  %10 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %9, i32 1
  %11 = extractelement <2 x i64> %step.add, i32 0
  %12 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %11, i32 1
  %13 = extractelement <2 x i64> %step.add, i32 1
  %14 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %13, i32 1
  %15 = extractelement <2 x i16> %5, i32 0
  store i16 %15, i16* %8, align 2
  %16 = extractelement <2 x i16> %5, i32 1
  store i16 %16, i16* %10, align 2
  %17 = extractelement <2 x i16> %6, i32 0
  store i16 %17, i16* %12, align 2
  %18 = extractelement <2 x i16> %6, i32 1
  store i16 %18, i16* %14, align 2
  %index.next = add i64 %index, 4
  %vec.ind.next = add <2 x i64> %vec.ind, <i64 4, i64 4>
  %vec.ind.next5 = add <2 x i32> %vec.ind2, <i32 4, i32 4>
  %19 = icmp eq i64 %index.next, %n.vec
  br i1 %19, label %middle.block, label %vector.body, !llvm.loop !0

With this patch:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind2 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next5, %vector.body ]
  %3 = or i64 %index, 1
  %4 = or i64 %index, 2
  %5 = or i64 %index, 3
  %step.add3 = add <2 x i32> %vec.ind2, <i32 2, i32 2>
  %6 = add <2 x i32> %broadcast.splat, %vec.ind2
  %7 = add <2 x i32> %broadcast.splat, %step.add3
  %8 = trunc <2 x i32> %6 to <2 x i16>
  %9 = trunc <2 x i32> %7 to <2 x i16>
  %10 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %index, i32 1
  %11 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %3, i32 1
  %12 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %4, i32 1
  %13 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %5, i32 1
  %14 = extractelement <2 x i16> %8, i32 0
  store i16 %14, i16* %10, align 2
  %15 = extractelement <2 x i16> %8, i32 1
  store i16 %15, i16* %11, align 2
  %16 = extractelement <2 x i16> %9, i32 0
  store i16 %16, i16* %12, align 2
  %17 = extractelement <2 x i16> %9, i32 1
  store i16 %17, i16* %13, align 2
  %index.next = add i64 %index, 4
  %vec.ind.next5 = add <2 x i32> %vec.ind2, <i32 4, i32 4>
  %18 = icmp eq i64 %index.next, %n.vec
  br i1 %18, label %middle.block, label %vector.body, !llvm.loop !0
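
In the listing with the patch, the scalar lane indices appear as `or i64 %index, 1` through `or i64 %index, 3` rather than as additions. A plausible reading, assuming `%index` starts at 0 and advances by 4 as the listing shows, is that its two low bits are always zero, so OR-ing in a constant below 4 gives the same value as adding it. The sketch below checks that identity; `laneIndexOr`, `laneIndexAdd`, and `orMatchesAdd` are hypothetical helper names.

```cpp
#include <cstdint>

// Lane index as it appears in the listing: index | lane.
constexpr std::uint64_t laneIndexOr(std::uint64_t Index, std::uint64_t Lane) {
  return Index | Lane;
}

// Lane index as plain arithmetic: index + lane.
constexpr std::uint64_t laneIndexAdd(std::uint64_t Index, std::uint64_t Lane) {
  return Index + Lane;
}

// Checks that both forms agree for every lane of the first `Iterations`
// vector iterations, with the index advancing by VF * UF = 4 each time.
bool orMatchesAdd(std::uint64_t Iterations) {
  for (std::uint64_t I = 0; I < Iterations; ++I) {
    std::uint64_t Index = 4 * I; // multiple of 4: the low two bits are zero
    for (std::uint64_t Lane = 0; Lane < 4; ++Lane)
      if (laneIndexOr(Index, Lane) != laneIndexAdd(Index, Lane))
        return false;
  }
  return true;
}
```

The identity depends on the index being a multiple of 4. For an index such as 5 the two forms disagree, which is why the equivalence is tied to the step of the canonical induction variable.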

lib/Transforms/Vectorize/LoopVectorize.cpp
1976	I don't feel that strongly about the name, so I'm fine with this.
1980	isScalarAfterVectorization returns true for values or instructions not in the loop. Using !isLoopInvariant ensures we're looking at an instruction in the loop that will be scalar. With the &&, we would get a clang-format line break, so I kept it as two ifs because I thought it might be easier to read. Let me know if you prefer one over the other.

Addressed Michael's comments.

Rebased.
Renamed "shouldScalarizeIntInduction" to "needsScalarInduction".

In D22869#499751, @mssimpso wrote:

In D22869#499700, @mkuper wrote:

Out of curiosity - what does the final generated code end up looking like with and without this patch, for cases where we have both a scalar and a vector use?
It seems like it should be better, I'm wondering how much.

If you don't mind looking at the full code, I've pasted below what we generate for the new test case I've added to induction.ll (VF=2, UF=2, after instcombine).

Yes, that does look much better, thanks!
(I actually meant post-CG, but the IR looks so much better that it doesn't really matter.)

lib/Transforms/Vectorize/LoopVectorize.cpp
1975–1986	Oh, so if your only scalar users are outside the loop, then you don't want to generate a scalar IV. Makes sense. So you still end up with an extract from the vector IV, right? Anyway, if all you want to check is whether the user is in the loop, why are you checking isLoopInvariant and not Contains (like isScalarAfterVectorization does)?

mssimpso added inline comments. Jul 28 2016, 2:36 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
1975–1986	> So you still end up with an extract from the vector IV, right?

I don't think so. If there is a user of the induction variable outside the loop (but no scalar users in the loop), yes, the induction variable would remain vector. The external induction variable users are handled in fixupIVUsers (this was recently added I think). And I think the fixup's are based on the canonical induction variable.

> Anyway, if all you want to check is whether the user is in the loop, why are you checking isLoopInvariant and not Contains (like isScalarAfterVectorization does)?

isLoopInvariant does the additional check that the value is an instruction. But I'm thinking we should just cast the user to an instruction here and use Contains. That should make this clearer. I'll update the patch.

LGTM after the Invariant -> Contains change.

lib/Transforms/Vectorize/LoopVectorize.cpp
1975–1986	> The external induction variable users are handled in fixupIVUsers (this was recently added I think). And I think the fixup's are based on the canonical induction variable.

Yes, I've added that, I just wanted to make sure this doesn't interfere, and was too lazy to look up what I actually did. :-) Anyway, you're right, we'll always have the canonical IV as scalar, so it's fine.

This revision is now accepted and ready to land. Jul 28 2016, 2:39 PM

Use contains instead of isLoopInvariant.

Thanks for the review, Michael! I'll wait to commit until after D22867 is approved.

mssimpso mentioned this in D22867: [LV] Untangle the concepts of uniform and scalar. Jul 29 2016, 10:08 AM

Similarly to the other patch, it would be good to describe the changed functionality in this patch -- you only describe the end state. (Sorry to keep banging on this, but given the complexity I think it's crucial to keep a good record. In general, it's also a good idea to have the description match the proposed commit log. If you use arcanist, passing --verbatim to arc diff supports this flow.)

Essentially the part that is missing is the before picture which I think is this. We used to use a scalar IV *if and only if* there was no need for a vector IV. Otherwise we would still use the vector IV and extract the scalar value from it. With this new patch we can now generate both and use either variant of the IV depending on the user. Correct?

lib/Transforms/Vectorize/LoopVectorize.cpp
1979–1982	Can we do this with std::any_of?
2007	Why are you calling this 'EntryVal'?

Adam,

That's exactly right. I'll be sure to use that language in the commit log. Echoing what you said for the record: Previously, we only used the scalar IV if we knew all users were going to be scalar. In the case of a loop with both scalar and vector users, we would generate the vector IV and extract scalar values from it for the scalar users. The functional change here is that now when there are both scalar and vector users, we generate both versions of the IV and select which version to use based on whether the user is scalar or vector.
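
To make the before and after concrete, the sketch below models the two ways a scalar user can obtain its per-lane index: reading a lane out of a vector IV (the extract-based path used before the patch) and computing it directly from a scalar IV. It is a toy model under stated assumptions, not the vectorizer's implementation: `vectorIVPart`, `extractLane`, and `scalarStep` are hypothetical names, and the lane formula ScalarIV + (VF * Part + Lane) * Step is an assumption chosen to match the VF=2, UF=2 listings earlier in the thread.

```cpp
#include <cstdint>
#include <vector>

// Lanes of the vector IV for one unrolled part: lane L holds
// ScalarIV + (VF * Part + L) * Step.
std::vector<std::int64_t> vectorIVPart(std::int64_t ScalarIV, unsigned VF,
                                       unsigned Part, std::int64_t Step) {
  std::vector<std::int64_t> Lanes(VF);
  for (unsigned L = 0; L < VF; ++L)
    Lanes[L] = ScalarIV + static_cast<std::int64_t>(VF * Part + L) * Step;
  return Lanes;
}

// Extract-based path: a scalar user reads its value out of the vector IV.
std::int64_t extractLane(const std::vector<std::int64_t> &Vec, unsigned Lane) {
  return Vec.at(Lane);
}

// Scalar path: a scalar user computes its value from the scalar IV directly.
std::int64_t scalarStep(std::int64_t ScalarIV, unsigned VF, unsigned Part,
                        unsigned Lane, std::int64_t Step) {
  return ScalarIV + static_cast<std::int64_t>(VF * Part + Lane) * Step;
}
```

Both paths produce the same value for every part and lane. The difference is that the scalar path never materializes a vector just to take it apart again, which is the saving visible in the IR comparison.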

lib/Transforms/Vectorize/LoopVectorize.cpp
1975–1986	Yes, I think so! I'll update the patch.
2007	It's the entry value in WidenMap. For the truncation case, the truncate instruction maps to the new narrow IV. I'm happy to rename this though. I probably went with EntryVal because I couldn't think of anything better at the time.

Addressed Adam's comments.

Rebased.
Used any_of in needsScalarInduction. I also simplified this a bit by removing the explicit check for the IV itself. At the very least, if the IV is scalar, its update instruction will also be scalar.

Made needsScalarInduction more readable.
Updated summary to reflect functional changes.

anemet added inline comments. Aug 1 2016, 11:03 AM

test/Transforms/LoopVectorize/induction.ll
604–678	I could be wrong, but now this test does not seem to test what it was meant for. I thought the point was to ensure that most of the work to get the vector IV set up is pushed into the preheader. But now it seems that we no longer generate that?

mssimpso added inline comments. Aug 1 2016, 11:20 AM

test/Transforms/LoopVectorize/induction.ll
604–678	Yeah, I think you're right. With the current patch, the vector IV is completely removed after instcombine. We generate both a scalar one and a vector one (because of the store) during vectorization. But because the store is scalarized, instcombine is able to remove the vector IV. If we add a pre-instcombine check for this test, we could check the original functionality as well. What do you think?

anemet added inline comments. Aug 1 2016, 11:34 AM

test/Transforms/LoopVectorize/induction.ll
604–678	Ah, I didn't see that this was a non-consecutive store. What if you make it consecutive to keep the store from getting scalarized (the non-zero based IV would still require us to create a new IV hopefully)?

mssimpso added inline comments. Aug 1 2016, 11:40 AM

test/Transforms/LoopVectorize/induction.ll
604–678	That will work -- we'll be left with both a scalar and a vector IV after instcombine. I'll update the patch. Thanks!

Addressed Adam's comments.

Made store consecutive in test case and added a pre-instcombine check. We now generate both scalar and vector IVs for this test.

LGTM!

Closed by commit rL277474: [LV] Generate both scalar and vector integer induction variables (authored by mssimpso). Aug 2 2016, 8:33 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	100 lines
test/Transforms/LoopVectorize/X86/scatter_crash.ll	66 lines
test/Transforms/LoopVectorize/induction.ll	197 lines

Diff 66352

lib/Transforms/Vectorize/LoopVectorize.cpp


@@ (429 lines hidden) @@ void createVectorIntInductionPHI(const InductionDescriptor &II,
                                    VectorParts &Entry, IntegerType *TruncType);

   /// Widen an integer induction variable \p IV. If \p Trunc is provided, the
   /// induction variable will first be truncated to the corresponding type. The
   /// widened values are placed in \p Entry.
   void widenIntInduction(PHINode *IV, VectorParts &Entry,
                          TruncInst *Trunc = nullptr);

+  /// Returns true if we should generate a scalar version of \p IV.
+  bool needsScalarInduction(Instruction *IV) const;
+
   /// When we go over instructions in the basic block we rely on previous
   /// values within the current basic block or on loop invariant values.
   /// When we widen (vectorize) values we place them in the map. If the values
   /// are not within the map, they have to be loop invariant, so we simply
   /// broadcast them into a vector.
   VectorParts &getVectorValue(Value *V);

   /// Try to vectorize the interleaved access group that \p Instr belongs to.
@@ (1,518 lines hidden) @@ void InnerLoopVectorizer::createVectorIntInductionPHI(
   auto *Br = cast<BranchInst>(LoopVectorLatch->getTerminator());
   auto *ICmp = cast<Instruction>(Br->getCondition());
   LastInduction->moveBefore(ICmp);
   LastInduction->setName("vec.ind.next");

   VecInd->addIncoming(SteppedStart, LoopVectorPreHeader);
   VecInd->addIncoming(LastInduction, LoopVectorLatch);
 }

+bool InnerLoopVectorizer::needsScalarInduction(Instruction *IV) const {
+  if (Legal->isScalarAfterVectorization(IV))
+    return true;
+  auto isScalarInst = [&](User *U) -> bool {
+    auto *I = cast<Instruction>(U);
+    return (OrigLoop->contains(I) && Legal->isScalarAfterVectorization(I));
+  };
+  return any_of(IV->users(), isScalarInst);
+}
+
 void InnerLoopVectorizer::widenIntInduction(PHINode *IV, VectorParts &Entry,
                                             TruncInst *Trunc) {

   auto II = Legal->getInductionVars()->find(IV);
   assert(II != Legal->getInductionVars()->end() && "IV is not an induction");

   auto ID = II->second;
   assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");

   // If a truncate instruction was provided, get the smaller type.
   auto *TruncType = Trunc ? cast<IntegerType>(Trunc->getType()) : nullptr;

+  // The scalar value to broadcast. This will be derived from the canonical
+  // induction variable.
+  Value *ScalarIV = nullptr;
+
   // The step of the induction.
   Value *Step = nullptr;

+  // The value from the original loop to which we are mapping the new induction
+  // variable.
+  Instruction *EntryVal = Trunc ? cast<Instruction>(Trunc) : IV;
+
+  // True if we have vectorized the induction variable.
+  auto VectorizedIV = false;
+
+  // Determine if we want a scalar version of the induction variable. This is
+  // true if the induction variable itself is not widened, or if it has at
+  // least one user in the loop that is not widened.
+  auto NeedsScalarIV = VF > 1 && needsScalarInduction(EntryVal);
+
   // If the induction variable has a constant integer step value, go ahead and
   // get it now.
   if (ID.getConstIntStepValue())
     Step = ID.getConstIntStepValue();

   // Try to create a new independent vector induction variable. If we can't
   // create the phi node, we will splat the scalar induction variable in each
   // loop iteration.
   if (VF > 1 && IV->getType() == Induction->getType() && Step &&
-      !Legal->isScalarAfterVectorization(IV))
-    return createVectorIntInductionPHI(ID, Entry, TruncType);
-
-  // The scalar value to broadcast. This will be derived from the canonical
-  // induction variable.
-  Value *ScalarIV = nullptr;
+      !Legal->isScalarAfterVectorization(EntryVal)) {
+    createVectorIntInductionPHI(ID, Entry, TruncType);
+    VectorizedIV = true;
+  }

-  // Define the scalar induction variable and step values. If we were given a
-  // truncation type, truncate the canonical induction variable and constant
-  // step. Otherwise, derive these values from the induction descriptor.
-  if (TruncType) {
-    assert(Step && "Truncation requires constant integer step");
-    auto StepInt = cast<ConstantInt>(Step)->getSExtValue();
-    ScalarIV = Builder.CreateCast(Instruction::Trunc, Induction, TruncType);
-    Step = ConstantInt::getSigned(TruncType, StepInt);
-  } else {
-    ScalarIV = Induction;
-    auto &DL = OrigLoop->getHeader()->getModule()->getDataLayout();
-    if (IV != OldInduction) {
-      ScalarIV = Builder.CreateSExtOrTrunc(ScalarIV, IV->getType());
-      ScalarIV = ID.transform(Builder, ScalarIV, PSE.getSE(), DL);
-      ScalarIV->setName("offset.idx");
-    }
-    if (!Step) {
-      SCEVExpander Exp(*PSE.getSE(), DL, "induction");
-      Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(),
-                               &*Builder.GetInsertPoint());
-    }
-  }
+  // If we haven't yet vectorized the induction variable, or if we will create
+  // a scalar one, we need to define the scalar induction variable and step
+  // values. If we were given a truncation type, truncate the canonical
+  // induction variable and constant step. Otherwise, derive these values from
+  // the induction descriptor.
+  if (!VectorizedIV || NeedsScalarIV) {
+    if (TruncType) {
+      assert(Step && "Truncation requires constant integer step");
+      auto StepInt = cast<ConstantInt>(Step)->getSExtValue();
+      ScalarIV = Builder.CreateCast(Instruction::Trunc, Induction, TruncType);
+      Step = ConstantInt::getSigned(TruncType, StepInt);
+    } else {
+      ScalarIV = Induction;
+      auto &DL = OrigLoop->getHeader()->getModule()->getDataLayout();
+      if (IV != OldInduction) {
+        ScalarIV = Builder.CreateSExtOrTrunc(ScalarIV, IV->getType());
+        ScalarIV = ID.transform(Builder, ScalarIV, PSE.getSE(), DL);
+        ScalarIV->setName("offset.idx");
+      }
+      if (!Step) {
+        SCEVExpander Exp(*PSE.getSE(), DL, "induction");
+        Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(),
+                                 &*Builder.GetInsertPoint());
+      }
+    }
+  }

-  // Splat the scalar induction variable, and build the necessary step vectors.
-  Value *Broadcasted = getBroadcastInstrs(ScalarIV);
-  for (unsigned Part = 0; Part < UF; ++Part)
-    Entry[Part] = getStepVector(Broadcasted, VF * Part, Step);
+  // If we haven't yet vectorized the induction variable, splat the scalar
+  // induction variable, and build the necessary step vectors.
+  if (!VectorizedIV) {
+    Value *Broadcasted = getBroadcastInstrs(ScalarIV);
+    for (unsigned Part = 0; Part < UF; ++Part)
+      Entry[Part] = getStepVector(Broadcasted, VF * Part, Step);
+  }

   // If an induction variable is only used for counting loop iterations or
   // calculating addresses, it doesn't need to be widened. Create scalar steps
   // that can be used by instructions we will later scalarize. Note that the
   // addition of the scalar steps will not increase the number of instructions
   // in the loop in the common case prior to InstCombine. We will be trading
   // one vector extract for each scalar step.
-  if (VF > 1 && Legal->isScalarAfterVectorization(IV)) {
-    auto *EntryVal = Trunc ? cast<Value>(Trunc) : IV;
+  if (NeedsScalarIV)
     buildScalarSteps(ScalarIV, Step, EntryVal);
-  }
 }

 Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step,
                                           Instruction::BinaryOps BinOp) {
   // Create and check the types.
   assert(Val->getType()->isVectorTy() && "Must be a vector");
   int VLen = Val->getType()->getVectorNumElements();

   Type *STy = Val->getType()->getScalarType();
(4,758 lines hidden)

test/Transforms/LoopVectorize/X86/scatter_crash.ll

(13 lines hidden)

 ; Function Attrs: norecurse nounwind ssp uwtable
 define void @_Z3fn1v() #0 {
 ; CHECK-LABEL: @_Z3fn1v(
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX:%.*]].next, %vector.body ]
 ; CHECK-NEXT: [[VEC_IND:%.*]] = phi <16 x i64> [
 ; CHECK-NEXT: [[VEC_IND3:%.*]] = phi <16 x i64> [
+; CHECK-NEXT: [[SHL:%.*]] = shl i64 %index, 1
+; CHECK-NEXT: %offset.idx = add i64 [[SHL]], 8
+; CHECK-NEXT: [[IND00:%.*]] = add i64 %offset.idx, 0
+; CHECK-NEXT: [[IND02:%.*]] = add i64 %offset.idx, 2
+; CHECK-NEXT: [[IND04:%.*]] = add i64 %offset.idx, 4
+; CHECK-NEXT: [[IND06:%.*]] = add i64 %offset.idx, 6
+; CHECK-NEXT: [[IND08:%.*]] = add i64 %offset.idx, 8
+; CHECK-NEXT: [[IND10:%.*]] = add i64 %offset.idx, 10
+; CHECK-NEXT: [[IND12:%.*]] = add i64 %offset.idx, 12
+; CHECK-NEXT: [[IND14:%.*]] = add i64 %offset.idx, 14
+; CHECK-NEXT: [[IND16:%.*]] = add i64 %offset.idx, 16
+; CHECK-NEXT: [[IND18:%.*]] = add i64 %offset.idx, 18
+; CHECK-NEXT: [[IND20:%.*]] = add i64 %offset.idx, 20
+; CHECK-NEXT: [[IND22:%.*]] = add i64 %offset.idx, 22
+; CHECK-NEXT: [[IND24:%.*]] = add i64 %offset.idx, 24
+; CHECK-NEXT: [[IND26:%.*]] = add i64 %offset.idx, 26
+; CHECK-NEXT: [[IND28:%.*]] = add i64 %offset.idx, 28
+; CHECK-NEXT: [[IND30:%.*]] = add i64 %offset.idx, 30
 ; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <16 x i64> <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>, [[VEC_IND]]
-; CHECK-NEXT: [[TMP11:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 0
-; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[TMP11]]
+; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND00]]
 ; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x [10 x i32]*> undef, [10 x i32]* [[TMP12]], i32 0
-; CHECK-NEXT: [[TMP14:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 1
-; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[TMP14]]
+; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND02]]
 ; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x [10 x i32]*> [[TMP13]], [10 x i32]* [[TMP15]], i32 1
-; CHECK-NEXT: [[TMP17:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 2
-; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[TMP17]]
+; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND04]]
 ; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x [10 x i32]*> [[TMP16]], [10 x i32]* [[TMP18]], i32 2
-; CHECK-NEXT: [[TMP20:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 3
-; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[TMP20]]
+; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND06]]
 ; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x [10 x i32]*> [[TMP19]], [10 x i32]* [[TMP21]], i32 3
-; CHECK-NEXT: [[TMP23:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 4
-; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[TMP23]]
+; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND08]]
 ; CHECK-NEXT: [[TMP25:%.*]] = insertelement <16 x [10 x i32]*> [[TMP22]], [10 x i32]* [[TMP24]], i32 4
-; CHECK-NEXT: [[TMP26:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 5
-; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[TMP26]]
+; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND10]]
 ; CHECK-NEXT: [[TMP28:%.*]] = insertelement <16 x [10 x i32]*> [[TMP25]], [10 x i32]* [[TMP27]], i32 5
-; CHECK-NEXT: [[TMP29:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 6
-; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[TMP29]]
+; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND12]]
 ; CHECK-NEXT: [[TMP31:%.*]] = insertelement <16 x [10 x i32]*> [[TMP28]], [10 x i32]* [[TMP30]], i32 6
-; CHECK-NEXT: [[TMP32:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 7
-; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[TMP32]]
+; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND14]]
 ; CHECK-NEXT: [[TMP34:%.*]] = insertelement <16 x [10 x i32]*> [[TMP31]], [10 x i32]* [[TMP33]], i32 7
	; CHECK-NEXT: [[TMP35:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 8			; CHECK-NEXT: [[TMP36:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[IND16]]
	; CHECK-NEXT: [[TMP36:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[TMP35]]
	; CHECK-NEXT: [[TMP37:%.]] = insertelement <16 x [10 x i32]> [[TMP34]], [10 x i32]* [[TMP36]], i32 8			; CHECK-NEXT: [[TMP37:%.]] = insertelement <16 x [10 x i32]> [[TMP34]], [10 x i32]* [[TMP36]], i32 8
	; CHECK-NEXT: [[TMP38:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 9			; CHECK-NEXT: [[TMP39:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[IND18]]
	; CHECK-NEXT: [[TMP39:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[TMP38]]
	; CHECK-NEXT: [[TMP40:%.]] = insertelement <16 x [10 x i32]> [[TMP37]], [10 x i32]* [[TMP39]], i32 9			; CHECK-NEXT: [[TMP40:%.]] = insertelement <16 x [10 x i32]> [[TMP37]], [10 x i32]* [[TMP39]], i32 9
	; CHECK-NEXT: [[TMP41:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 10			; CHECK-NEXT: [[TMP42:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[IND20]]
	; CHECK-NEXT: [[TMP42:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[TMP41]]
	; CHECK-NEXT: [[TMP43:%.]] = insertelement <16 x [10 x i32]> [[TMP40]], [10 x i32]* [[TMP42]], i32 10			; CHECK-NEXT: [[TMP43:%.]] = insertelement <16 x [10 x i32]> [[TMP40]], [10 x i32]* [[TMP42]], i32 10
	; CHECK-NEXT: [[TMP44:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 11			; CHECK-NEXT: [[TMP45:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[IND22]]
	; CHECK-NEXT: [[TMP45:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[TMP44]]
	; CHECK-NEXT: [[TMP46:%.]] = insertelement <16 x [10 x i32]> [[TMP43]], [10 x i32]* [[TMP45]], i32 11			; CHECK-NEXT: [[TMP46:%.]] = insertelement <16 x [10 x i32]> [[TMP43]], [10 x i32]* [[TMP45]], i32 11
	; CHECK-NEXT: [[TMP47:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 12			; CHECK-NEXT: [[TMP48:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[IND24]]
	; CHECK-NEXT: [[TMP48:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[TMP47]]
	; CHECK-NEXT: [[TMP49:%.]] = insertelement <16 x [10 x i32]> [[TMP46]], [10 x i32]* [[TMP48]], i32 12			; CHECK-NEXT: [[TMP49:%.]] = insertelement <16 x [10 x i32]> [[TMP46]], [10 x i32]* [[TMP48]], i32 12
	; CHECK-NEXT: [[TMP50:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 13			; CHECK-NEXT: [[TMP51:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[IND26]]
	; CHECK-NEXT: [[TMP51:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[TMP50]]
	; CHECK-NEXT: [[TMP52:%.]] = insertelement <16 x [10 x i32]> [[TMP49]], [10 x i32]* [[TMP51]], i32 13			; CHECK-NEXT: [[TMP52:%.]] = insertelement <16 x [10 x i32]> [[TMP49]], [10 x i32]* [[TMP51]], i32 13
	; CHECK-NEXT: [[TMP53:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 14			; CHECK-NEXT: [[TMP54:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[IND28]]
	; CHECK-NEXT: [[TMP54:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[TMP53]]
	; CHECK-NEXT: [[TMP55:%.]] = insertelement <16 x [10 x i32]> [[TMP52]], [10 x i32]* [[TMP54]], i32 14			; CHECK-NEXT: [[TMP55:%.]] = insertelement <16 x [10 x i32]> [[TMP52]], [10 x i32]* [[TMP54]], i32 14
	; CHECK-NEXT: [[TMP56:%.*]] = extractelement <16 x i64> [[VEC_IND]], i32 15			; CHECK-NEXT: [[TMP57:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[IND30]]
	; CHECK-NEXT: [[TMP57:%.]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]] @d, i64 0, i64 [[TMP56]]
	; CHECK-NEXT: [[TMP58:%.]] = insertelement <16 x [10 x i32]> [[TMP55]], [10 x i32]* [[TMP57]], i32 15			; CHECK-NEXT: [[TMP58:%.]] = insertelement <16 x [10 x i32]> [[TMP55]], [10 x i32]* [[TMP57]], i32 15
	; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i64> [[TMP10]], [[VEC_IND3]]			; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i64> [[TMP10]], [[VEC_IND3]]
	; CHECK-NEXT: [[TMP60:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 0			; CHECK-NEXT: [[TMP60:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 0
	; CHECK-NEXT: [[TMP61:%.*]] = extractelement <16 x i64> [[TMP59]], i32 0			; CHECK-NEXT: [[TMP61:%.*]] = extractelement <16 x i64> [[TMP59]], i32 0
	; CHECK-NEXT: [[TMP62:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP60]], i64 [[TMP61]], i64 0			; CHECK-NEXT: [[TMP62:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP60]], i64 [[TMP61]], i64 0
	; CHECK-NEXT: [[TMP63:%.*]] = insertelement <16 x i32*> undef, i32* [[TMP62]], i32 0			; CHECK-NEXT: [[TMP63:%.*]] = insertelement <16 x i32*> undef, i32* [[TMP62]], i32 0
	; CHECK-NEXT: [[TMP64:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 1			; CHECK-NEXT: [[TMP64:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 1
	; CHECK-NEXT: [[TMP65:%.*]] = extractelement <16 x i64> [[TMP59]], i32 1			; CHECK-NEXT: [[TMP65:%.*]] = extractelement <16 x i64> [[TMP59]], i32 1
[141 lines not shown]

test/Transforms/LoopVectorize/induction.ll

[214 lines not shown]
; INTERLEAVE: %[[i0:.+]] = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]		; INTERLEAVE: %[[i0:.+]] = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; INTERLEAVE: %[[i1:.+]] = or i64 %[[i0]], 1		; INTERLEAVE: %[[i1:.+]] = or i64 %[[i0]], 1
; INTERLEAVE: %[[i2:.+]] = or i64 %[[i0]], 2		; INTERLEAVE: %[[i2:.+]] = or i64 %[[i0]], 2
; INTERLEAVE: %[[i3:.+]] = or i64 %[[i0]], 3		; INTERLEAVE: %[[i3:.+]] = or i64 %[[i0]], 3
; INTERLEAVE: %[[i4:.+]] = or i64 %[[i0]], 4		; INTERLEAVE: %[[i4:.+]] = or i64 %[[i0]], 4
; INTERLEAVE: %[[i5:.+]] = or i64 %[[i0]], 5		; INTERLEAVE: %[[i5:.+]] = or i64 %[[i0]], 5
; INTERLEAVE: %[[i6:.+]] = or i64 %[[i0]], 6		; INTERLEAVE: %[[i6:.+]] = or i64 %[[i0]], 6
; INTERLEAVE: %[[i7:.+]] = or i64 %[[i0]], 7		; INTERLEAVE: %[[i7:.+]] = or i64 %[[i0]], 7
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i0]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i0]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i1]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i1]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i2]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i2]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i3]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i3]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i4]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i4]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i5]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i5]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i6]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i6]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i7]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i7]], i32 1

%pair = type { i32, i32 }		%pair.i32 = type { i32, i32 }
define void @scalarize_induction_variable_03(%pair *%p, i32 %y, i64 %n) {		define void @scalarize_induction_variable_03(%pair.i32 *%p, i32 %y, i64 %n) {
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
%f = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1		%f = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 1
%0 = load i32, i32* %f, align 8		%0 = load i32, i32* %f, align 8
%1 = xor i32 %0, %y		%1 = xor i32 %0, %y
store i32 %1, i32* %f, align 8		store i32 %1, i32* %f, align 8
%i.next = add nuw nsw i64 %i, 1		%i.next = add nuw nsw i64 %i, 1
%cond = icmp slt i64 %i.next, %n		%cond = icmp slt i64 %i.next, %n
br i1 %cond, label %for.body, label %for.end		br i1 %cond, label %for.body, label %for.end

for.end:		for.end:
[11 lines not shown]
; INTERLEAVE: %[[i0:.+]] = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]		; INTERLEAVE: %[[i0:.+]] = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; INTERLEAVE: %[[i1:.+]] = or i64 %[[i0]], 1		; INTERLEAVE: %[[i1:.+]] = or i64 %[[i0]], 1
; INTERLEAVE: %[[i2:.+]] = or i64 %[[i0]], 2		; INTERLEAVE: %[[i2:.+]] = or i64 %[[i0]], 2
; INTERLEAVE: %[[i3:.+]] = or i64 %[[i0]], 3		; INTERLEAVE: %[[i3:.+]] = or i64 %[[i0]], 3
; INTERLEAVE: %[[i4:.+]] = or i64 %[[i0]], 4		; INTERLEAVE: %[[i4:.+]] = or i64 %[[i0]], 4
; INTERLEAVE: %[[i5:.+]] = or i64 %[[i0]], 5		; INTERLEAVE: %[[i5:.+]] = or i64 %[[i0]], 5
; INTERLEAVE: %[[i6:.+]] = or i64 %[[i0]], 6		; INTERLEAVE: %[[i6:.+]] = or i64 %[[i0]], 6
; INTERLEAVE: %[[i7:.+]] = or i64 %[[i0]], 7		; INTERLEAVE: %[[i7:.+]] = or i64 %[[i0]], 7
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i0]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i0]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i1]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i1]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i2]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i2]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i3]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i3]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i4]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i4]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i5]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i5]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i6]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i6]], i32 1
; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i7]], i32 1		; INTERLEAVE: getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %[[i7]], i32 1

define void @scalarize_induction_variable_04(i32* %a, %pair* %p, i32 %n) {		define void @scalarize_induction_variable_04(i32* %a, %pair.i32* %p, i32 %n) {
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%i = phi i64 [ %i.next, %for.body ], [ 0, %entry]		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry]
%0 = shl nsw i64 %i, 2		%0 = shl nsw i64 %i, 2
%1 = getelementptr inbounds i32, i32* %a, i64 %0		%1 = getelementptr inbounds i32, i32* %a, i64 %0
%2 = load i32, i32* %1, align 1		%2 = load i32, i32* %1, align 1
%3 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1		%3 = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 1
store i32 %2, i32* %3, align 1		store i32 %2, i32* %3, align 1
%i.next = add nuw nsw i64 %i, 1		%i.next = add nuw nsw i64 %i, 1
%4 = trunc i64 %i.next to i32		%4 = trunc i64 %i.next to i32
%cond = icmp eq i32 %4, %n		%cond = icmp eq i32 %4, %n
br i1 %cond, label %for.end, label %for.body		br i1 %cond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

		; Ensure we generate both a vector and a scalar induction variable. In this
		; test, the induction variable is used by an instruction that will be
		; vectorized (trunc) as well as an instruction that will remain in scalar form
		; (getelementptr).
		;
		; CHECK-LABEL: @iv_vector_and_scalar_users(
		; CHECK: vector.body:
		; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; CHECK: %vec.ind = phi <2 x i64> [ <i64 0, i64 1>, %vector.ph ], [ %vec.ind.next, %vector.body ]
		; CHECK: %vec.ind1 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next2, %vector.body ]
		; CHECK: %[[i0:.+]] = add i64 %index, 0
		; CHECK: %[[i1:.+]] = add i64 %index, 1
		; CHECK: getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %[[i0]], i32 1
		; CHECK: getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %[[i1]], i32 1
		; CHECK: %index.next = add i64 %index, 2
		; CHECK: %vec.ind.next = add <2 x i64> %vec.ind, <i64 2, i64 2>
		; CHECK: %vec.ind.next2 = add <2 x i32> %vec.ind1, <i32 2, i32 2>
		;
		; IND-LABEL: @iv_vector_and_scalar_users(
		; IND: vector.body:
		; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; IND: %vec.ind1 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next2, %vector.body ]
		; IND: %[[i1:.+]] = or i64 %index, 1
		; IND: getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %index, i32 1
		; IND: getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %[[i1]], i32 1
		; IND: %index.next = add i64 %index, 2
		; IND: %vec.ind.next2 = add <2 x i32> %vec.ind1, <i32 2, i32 2>
		;
		; UNROLL-LABEL: @iv_vector_and_scalar_users(
		; UNROLL: vector.body:
		; UNROLL: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; UNROLL: %vec.ind2 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next5, %vector.body ]
		; UNROLL: %[[i1:.+]] = or i64 %index, 1
		; UNROLL: %[[i2:.+]] = or i64 %index, 2
		; UNROLL: %[[i3:.+]] = or i64 %index, 3
		; UNROLL: %step.add3 = add <2 x i32> %vec.ind2, <i32 2, i32 2>
		; UNROLL: getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %index, i32 1
		; UNROLL: getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %[[i1]], i32 1
		; UNROLL: getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %[[i2]], i32 1
		; UNROLL: getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %[[i3]], i32 1
		; UNROLL: %index.next = add i64 %index, 4
		; UNROLL: %vec.ind.next5 = add <2 x i32> %vec.ind2, <i32 4, i32 4>

		%pair.i16 = type { i16, i16 }
		define void @iv_vector_and_scalar_users(%pair.i16* %p, i32 %a, i32 %n) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
		%0 = trunc i64 %i to i32
		%1 = add i32 %a, %0
		%2 = trunc i32 %1 to i16
		%3 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %i, i32 1
		store i16 %2, i16* %3, align 2
		%i.next = add nuw nsw i64 %i, 1
		%4 = trunc i64 %i.next to i32
		%cond = icmp eq i32 %4, %n
		br i1 %cond, label %for.end, label %for.body

		for.end:
		ret void
		}
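
The mixed-user case exercised by @iv_vector_and_scalar_users can be modeled at the source level. The following is only a rough C sketch, not vectorizer output: the function names, the unsigned field types, the fixed factor of 2, and the requirement that the trip count be a positive multiple of that factor are all assumptions made for illustration. It shows the address computation driven by a scalar induction variable while the stored value comes from a separate vector induction variable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VF 2 /* vectorization factor, assumed from the <2 x ...> types */

/* Mirrors %pair.i16 = type { i16, i16 }; unsigned fields keep wraparound defined. */
typedef struct { uint16_t first, second; } pair_i16;

/* Scalar reference loop: p[i].second = trunc(a + trunc(i)). */
void mixed_users_scalar(pair_i16 *p, uint32_t a, int64_t n) {
    for (int64_t i = 0; i < n; ++i)
        p[i].second = (uint16_t)(a + (uint32_t)i);
}

/* Hypothetical model of the vector body with both forms of the IV. */
void mixed_users_vector(pair_i16 *p, uint32_t a, int64_t n) {
    assert(n > 0 && n % VF == 0);  /* no remainder loop in this sketch */
    uint32_t vec_ind[VF] = {0, 1}; /* vector IV, like %vec.ind1 */
    for (int64_t index = 0; index < n; index += VF) { /* scalar IV, like %index */
        for (int lane = 0; lane < VF; ++lane) {
            int64_t i_lane = index + lane; /* scalar steps, like %[[i0]] and %[[i1]] */
            /* The address uses the scalar IV; the value uses the vector IV. */
            p[i_lane].second = (uint16_t)(a + vec_ind[lane]);
        }
        for (int lane = 0; lane < VF; ++lane)
            vec_ind[lane] += VF; /* like %vec.ind.next2 */
    }
}

/* Returns 1 when both forms write identical memory for a small trip count. */
int mixed_users_agree(void) {
    pair_i16 s[8], v[8];
    memset(s, 0, sizeof s);
    memset(v, 0, sizeof v);
    mixed_users_scalar(s, 100u, 8);
    mixed_users_vector(v, 100u, 8);
    return memcmp(s, v, sizeof s) == 0 && v[3].second == 103 && v[3].first == 0;
}
```

Calling `mixed_users_agree()` compares the two forms on an eight-element array. No lane value is extracted from the vector induction variable to form an address, which is the behavior the CHECK lines above look for.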

; Make sure that the loop exit count computation does not overflow for i8 and		; Make sure that the loop exit count computation does not overflow for i8 and
; i16. The exit count of these loops is i8/i16 max + 1. If we don't cast the		; i16. The exit count of these loops is i8/i16 max + 1. If we don't cast the
; induction variable to a bigger type the exit count computation will overflow		; induction variable to a bigger type the exit count computation will overflow
; to 0.		; to 0.
; PR17532		; PR17532

; CHECK-LABEL: i8_loop		; CHECK-LABEL: i8_loop
; CHECK: icmp eq i32 {{.*}}, 256		; CHECK: icmp eq i32 {{.*}}, 256
[228 lines not shown]	for.body:
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, %k		%exitcond = icmp eq i64 %indvars.iv.next, %k
br i1 %exitcond, label %exit, label %for.body		br i1 %exitcond, label %exit, label %for.body

exit:		exit:
ret void		ret void
}		}

; IND-LABEL: nonprimary		; CHECK-LABEL: @nonprimary(
; IND-LABEL: vector.ph		; CHECK: vector.ph:
		; CHECK: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0
		; CHECK: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer
		; CHECK: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 1>
		; CHECK: vector.body:
		; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; CHECK: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]
		; CHECK: %offset.idx = add i32 %i, %index
		; CHECK: %[[A1:.*]] = add i32 %offset.idx, 0
		; CHECK: %[[A2:.*]] = add i32 %offset.idx, 1
		; CHECK: %[[G1:.*]] = getelementptr inbounds i32, i32* %a, i32 %[[A1]]
		; CHECK: %[[G2:.*]] = getelementptr inbounds i32, i32* %a, i32 %[[A2]]
		; CHECK: %[[G3:.*]] = getelementptr i32, i32* %[[G1]], i32 0
		; CHECK: %[[B1:.*]] = bitcast i32* %[[G3]] to <2 x i32>*
		; CHECK: store <2 x i32> %vec.ind, <2 x i32>* %[[B1]]
		; CHECK: %index.next = add i32 %index, 2
		; CHECK: %vec.ind.next = add <2 x i32> %vec.ind, <i32 2, i32 2>
		; CHECK: %[[CMP:.*]] = icmp eq i32 %index.next, %n.vec
		; CHECK: br i1 %[[CMP]]
		;
		; IND-LABEL: @nonprimary(
		; IND: vector.ph:
; IND: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0		; IND: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0
; IND: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer		; IND: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer
; IND: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 42>		; IND: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 1>
; IND-LABEL: vector.body:		; IND: vector.body:
; IND: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]		; IND: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; IND: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]		; IND: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]
		; IND: %[[A1:.*]] = add i32 %index, %i
		; IND: %[[S1:.*]] = sext i32 %[[A1]] to i64
		; IND: %[[G1:.*]] = getelementptr inbounds i32, i32* %a, i64 %[[S1]]
		; IND: %[[B1:.*]] = bitcast i32* %[[G1]] to <2 x i32>*
		; IND: store <2 x i32> %vec.ind, <2 x i32>* %[[B1]]
; IND: %index.next = add i32 %index, 2		; IND: %index.next = add i32 %index, 2
; IND: %vec.ind.next = add <2 x i32> %vec.ind, <i32 84, i32 84>		; IND: %vec.ind.next = add <2 x i32> %vec.ind, <i32 2, i32 2>
; IND: %[[CMP:.*]] = icmp eq i32 %index.next		; IND: %[[CMP:.*]] = icmp eq i32 %index.next, %n.vec
; IND: br i1 %[[CMP]]		; IND: br i1 %[[CMP]]
; UNROLL-LABEL: nonprimary		;
; UNROLL-LABEL: vector.ph		; UNROLL-LABEL: @nonprimary(
		; UNROLL: vector.ph:
; UNROLL: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0		; UNROLL: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0
; UNROLL: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer		; UNROLL: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer
; UNROLL: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 42>		; UNROLL: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 1>
; UNROLL-LABEL: vector.body:		; UNROLL: vector.body:
; UNROLL: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]		; UNROLL: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; UNROLL: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]		; UNROLL: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]
; UNROLL: %step.add = add <2 x i32> %vec.ind, <i32 84, i32 84>		; UNROLL: %step.add = add <2 x i32> %vec.ind, <i32 2, i32 2>
		; UNROLL: %[[A1:.*]] = add i32 %index, %i
		; UNROLL: %[[S1:.*]] = sext i32 %[[A1]] to i64
		; UNROLL: %[[G1:.*]] = getelementptr inbounds i32, i32* %a, i64 %[[S1]]
		; UNROLL: %[[B1:.*]] = bitcast i32* %[[G1]] to <2 x i32>*
		; UNROLL: store <2 x i32> %vec.ind, <2 x i32>* %[[B1]]
		; UNROLL: %[[G2:.*]] = getelementptr i32, i32* %[[G1]], i64 2
		; UNROLL: %[[B2:.*]] = bitcast i32* %[[G2]] to <2 x i32>*
		; UNROLL: store <2 x i32> %step.add, <2 x i32>* %[[B2]]
; UNROLL: %index.next = add i32 %index, 4		; UNROLL: %index.next = add i32 %index, 4
; UNROLL: %vec.ind.next = add <2 x i32> %vec.ind, <i32 168, i32 168>		; UNROLL: %vec.ind.next = add <2 x i32> %vec.ind, <i32 4, i32 4>
; UNROLL: %[[CMP:.*]] = icmp eq i32 %index.next		; UNROLL: %[[CMP:.*]] = icmp eq i32 %index.next, %n.vec
; UNROLL: br i1 %[[CMP]]		; UNROLL: br i1 %[[CMP]]
define void @nonprimary(i32* nocapture %a, i32 %start, i32 %i, i32 %k) {		define void @nonprimary(i32* nocapture %a, i32 %start, i32 %i, i32 %k) {
for.body.preheader:		for.body.preheader:
br label %for.body		br label %for.body

for.body:		for.body:
%indvars.iv = phi i32 [ %indvars.iv.next, %for.body ], [ %i, %for.body.preheader ]		%indvars.iv = phi i32 [ %indvars.iv.next, %for.body ], [ %i, %for.body.preheader ]
%arrayidx = getelementptr inbounds i32, i32* %a, i32 %indvars.iv		%arrayidx = getelementptr inbounds i32, i32* %a, i32 %indvars.iv
store i32 %indvars.iv, i32* %arrayidx, align 4		store i32 %indvars.iv, i32* %arrayidx, align 4
%indvars.iv.next = add nuw nsw i32 %indvars.iv, 42		%indvars.iv.next = add nuw nsw i32 %indvars.iv, 1
%exitcond = icmp eq i32 %indvars.iv.next, %k		%exitcond = icmp eq i32 %indvars.iv.next, %k
br i1 %exitcond, label %exit, label %for.body		br i1 %exitcond, label %exit, label %for.body

exit:		exit:
ret void		ret void
}		}
		anemet: I could be wrong, but now this test does not seem to test what it was meant for. I thought the point was to ensure that most of the work to get the vector IV set up is pushed into the preheader. But now it seems that we no longer generate that?
		mssimpso: Yeah, I think you're right. With the current patch, the vector IV is completely removed after instcombine. We generate both a scalar one and a vector one (because of the store) during vectorization. But because the store is scalarized, instcombine is able to remove the vector IV. If we add a pre-instcombine check for this test, we could check the original functionality as well. What do you think?
		anemet: Ah, I didn't see that this was a non-consecutive store. What if you make it consecutive to keep the store from being scalarized (the non-zero-based IV would hopefully still require us to create a new IV)?
		mssimpso: That will work. We'll be left with both a scalar and a vector IV after instcombine. I'll update the patch. Thanks!
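
To make the consecutive-store variant agreed on in this thread concrete, here is a loose C model of the updated @nonprimary loop with a step of 1. It is an illustrative sketch under stated assumptions, not the vectorizer's actual output: the function names, the factor of 2, and the requirement that `k - i` be a positive multiple of that factor are hypothetical. With a step of 42, the two lanes would write to addresses 42 elements apart, so a single wide store like the one below would not apply.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VF 2 /* vectorization factor, assumed from the <2 x i32> types */

/* Scalar form: a[iv] = iv, with iv running from i up to k in steps of 1. */
void nonprimary_scalar(int32_t *a, int32_t i, int32_t k) {
    for (int32_t iv = i; iv != k; ++iv)
        a[iv] = iv;
}

/* Hypothetical model of the vector body: scalar IV for the address,
   vector IV for the stored value. */
void nonprimary_vector(int32_t *a, int32_t i, int32_t k) {
    int32_t n_vec = k - i;
    assert(n_vec > 0 && n_vec % VF == 0);  /* no remainder loop in this sketch */
    int32_t vec_ind[VF] = {i, i + 1};      /* like %[[START]], built before the loop */
    for (int32_t index = 0; index != n_vec; index += VF) { /* like %index */
        int32_t offset_idx = i + index;    /* like %offset.idx */
        /* One wide store of the vector IV to consecutive elements. */
        memcpy(&a[offset_idx], vec_ind, sizeof vec_ind);
        for (int lane = 0; lane < VF; ++lane)
            vec_ind[lane] += VF;           /* like %vec.ind.next */
    }
}

/* Returns 1 when both forms write identical memory on a small range. */
int nonprimary_agree(void) {
    int32_t s[16] = {0}, v[16] = {0};
    nonprimary_scalar(s, 3, 11);
    nonprimary_vector(v, 3, 11);
    return memcmp(s, v, sizeof s) == 0 && v[3] == 3 && v[10] == 10 && v[11] == 0;
}
```

Both induction variables stay live in this model: the scalar one feeds the address, and the vector one is the stored value, so neither can be dropped.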