This is an archive of the discontinued LLVM Phabricator instance.

[LV] Don't widen trivial induction variables
ClosedPublic

Authored by mssimpso on Jun 22 2016, 1:32 PM.

Details

Reviewers

anemet
nadav
jmolloy
mkuper
sbaranga

Commits

rG433cb1dfe31a: [LV] Don't widen trivial induction variables
rL274627: [LV] Don't widen trivial induction variables

Summary

We currently always try to vectorize induction variables. However, if an induction variable is only used for counting loop iterations or computing addresses with getelementptr instructions, this doesn't really make sense. We only see a benefit from vectorizing induction variables if they are used in non-trivial ways. For example, here the induction variable is stored to memory:

for (int i = 0; i < n; ++i)
  a[i] = i;

Needlessly vectorizing causes us to generate unnecessary phi nodes, extracts, and other computation inside the loop. InstCombine can sometimes clean up the code we generate, but in general it cannot do so completely, especially when the unroll factor is greater than one (we create dependent step vectors that are difficult to simplify). It would be better if we didn't generate poor code to begin with.

This patch checks that induction variables are used in non-trivial ways before deciding to widen them. If an induction variable is only used for counting iterations or computing addresses, we scalarize it instead.
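To make the condition concrete, here is a minimal standalone sketch of the "trivial use" rule described above. It is not the patch itself: `UserKind` and `isTrivialInduction` are hypothetical names invented for illustration, and the real check operates on the users of LLVM IR values rather than on an enum.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for the kinds of instructions that can use an
// induction variable. These are not LLVM types.
enum class UserKind { Increment, GetElementPtr, Compare, HeaderPhi, Store, Other };

// Returns true if the induction phi is used only by its own increment or by
// address computations, and the increment is used only by the exit compare,
// the loop header phi, or address computations.
bool isTrivialInduction(const std::vector<UserKind> &phiUsers,
                        const std::vector<UserKind> &incrementUsers) {
  bool phiOk = std::all_of(phiUsers.begin(), phiUsers.end(), [](UserKind k) {
    return k == UserKind::Increment || k == UserKind::GetElementPtr;
  });
  bool incrementOk =
      std::all_of(incrementUsers.begin(), incrementUsers.end(), [](UserKind k) {
        return k == UserKind::Compare || k == UserKind::HeaderPhi ||
               k == UserKind::GetElementPtr;
      });
  return phiOk && incrementOk;
}
```

Under this model, the `a[i] = i` loop above is non-trivial because the phi also feeds a store, while a loop that only indexes with `i` would be classified as trivial and kept scalar.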

This change results in a static reduction in the number of instructions in nearly every benchmark in spec2000 and spec2006. In addition, we've observed significant performance improvements (> 1%) in several of them, with no regressions beyond noise. Experiments were conducted on Kryo.

Please take a look.

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 61599. Jun 22 2016, 1:32 PM

mssimpso retitled this revision from to [LV] Don't widen trivial induction variables.

mssimpso updated this object.

mssimpso added reviewers: mkuper, sbaranga, jmolloy, nadav.

mssimpso added subscribers: llvm-commits, mcrosier.

Herald added a subscriber: mzolotukhin. · Jun 22 2016, 1:32 PM

Thanks a lot for cleaning up my mess, the original patch was probably rather overzealous. :-\
The cases I've looked at were nicely cleaned up by InstCombine, but I guess having it work in the general case was too much to expect.

lib/Transforms/Vectorize/LoopVectorize.cpp
2189 ↗	(On Diff #61599)	This looks a bit weird to me. Do you have a test-case where this works, but instcombine fails to simplify the regular step vector? Perhaps it would be better to fix instcombine; I can look into it. In any case, that shouldn't block the rest of the patch - (a) not widening the phi when the uses are trivial, and (b) using the scalarized step vector instead of the regular one should be independently good, right?
4145 ↗	(On Diff #61599)	We are already doing something very similar in collectValuesToIgnore(), for basically the same reason - to estimate whether we'll end up needing a vector IV or not (and when InstCombine can't do the cleanup, my patch breaks that logic as well). Can you use collectValuesToIgnore() instead? Either by checking whether all the phi users are in ValuesToIgnore, or by marking the PHI itself as "should/should not be widened" during collectValuesToIgnore(), if that makes sense?

Michael,

Thanks for the quick feedback. But I definitely wasn't trying to clean up after you! Your work creating vector phi's is certainly a step in the right direction. I just came across some examples where we were currently generating bad code (even before your changes). Please see my responses inline. Thanks!

Matt.

lib/Transforms/Vectorize/LoopVectorize.cpp
2189 ↗	(On Diff #61599)	> This looks a bit weird to me. Do you have a test-case where this works, but instcombine fails to simplify the regular step vector?

Sure. Here's a simple example with interleaved access vectorization enabled. In the test case below, if we use the regular step vector, we end up creating unneeded splat inserts and extracts inside the loop. A scalar induction variable is definitely preferable here since we don't do anything exciting with it.

; opt -S < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=2 -enable-interleaved-mem-accesses -instcombine
target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"

%pair = type { i64, i64 }

define void @interleaved(%pair* %p, i64 %y, i64 %n) {
entry:
  br label %for.body

for.body:
  %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
  %f1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
  %r0 = load i64, i64* %f1, align 8
  %r3 = xor i64 %r0, %y
  store i64 %r3, i64* %f1, align 8
  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp slt i64 %i.next, %n
  br i1 %cond, label %for.body, label %for.end

for.end:
  ret void
}

I've seen other cases with what I've been working on that lead me to believe fixing this kind of thing in InstCombine is not going to be that fruitful. Things get very ugly when interleaved access vectorization is enabled with conditional stores vectorization, for example. I think it's better to do the right thing at the outset.
4145 ↗	(On Diff #61599)	Ah, right. I forgot we already checked for this in collectValuesToIgnore. Thanks for pointing this out! I will update the patch.

mkuper added inline comments. Jun 22 2016, 3:06 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
2189 ↗	(On Diff #61599)	Sorry, I wasn't very clear. I didn't mean an example of when a scalar IV is preferable. You're completely right about that. I meant a case where, in addition to the scalar IV, we want the pre-scalarized step vector, because the vectorized one doesn't simplify. Or is this the same example?

mssimpso added inline comments. Jun 22 2016, 6:17 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
2189 ↗	(On Diff #61599)	No, you were clear. This is the same example -- it doesn't simplify. Without the pre-scalarized step vector, we're left with the splats, extracts, etc. I will add it as a test case to the patch.

Hi Matt,

This seems to be related to https://llvm.org/bugs/show_bug.cgi?id=27881#c6

I am also wondering if this is general goodness that should be in instcombine rather than adding more special-casing to the already complex vectorizer.

Adam

mkuper added inline comments. Jun 22 2016, 11:59 PM

lib/Transforms/Vectorize/LoopVectorize.cpp

2189 ↗	(On Diff #61599)

Ah, ok, I was confused because you wrote that "a scalar induction variable is definitely preferable".
Anyway, this does look like an InstCombine issue.
We get:

%broadcast.splatinsert = insertelement <2 x i64> undef, i64 %index, i32 0
%broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
%induction = add <2 x i64> %broadcast.splat, <i64 0, i64 1>
%induction1 = add <2 x i64> %broadcast.splat, <i64 2, i64 3>
%3 = extractelement <2 x i64> %induction, i32 0
%4 = getelementptr inbounds %pair, %pair* %p, i64 %3, i32 1
%5 = insertelement <2 x i64*> undef, i64* %4, i32 0
%6 = extractelement <2 x i64> %induction, i32 1
%7 = getelementptr inbounds %pair, %pair* %p, i64 %6, i32 1
%8 = insertelement <2 x i64*> %5, i64* %7, i32 1
%9 = extractelement <2 x i64> %induction1, i32 0
%10 = getelementptr inbounds %pair, %pair* %p, i64 %9, i32 1
%11 = insertelement <2 x i64*> undef, i64* %10, i32 0
%12 = extractelement <2 x i64> %induction1, i32 1
%13 = getelementptr inbounds %pair, %pair* %p, i64 %12, i32 1
%14 = insertelement <2 x i64*> %11, i64* %13, i32 1

InstCombine cleans this up to:

%broadcast.splatinsert = insertelement <2 x i64> undef, i64 %index, i32 0
%broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
%induction1 = add <2 x i64> %broadcast.splat, <i64 2, i64 3>
%3 = getelementptr inbounds %pair, %pair* %p, i64 %index, i32 1
%4 = or i64 %index, 1
%5 = getelementptr inbounds %pair, %pair* %p, i64 %4, i32 1
%6 = extractelement <2 x i64> %induction1, i32 0
%7 = getelementptr inbounds %pair, %pair* %p, i64 %6, i32 1
%8 = extractelement <2 x i64> %induction1, i32 1
%9 = getelementptr inbounds %pair, %pair* %p, i64 %8, i32 1
%10 = bitcast i64* %3 to <4 x i64>*
%11 = bitcast i64* %7 to <4 x i64>*

I don't immediately see why it should be able to handle %induction, but not %induction1.
Am I missing something obvious?

In D21620#465284, @anemet wrote:

I am also wondering if this is general goodness that should be in instcombine rather than adding more special-casing to the already complex vectorizer.

Adam,

I'm not sure I agree with this. InstCombine/InstructionSimplify are also very complex, and if you look at my reply to Michael, simplifying these cases is not straightforward. I don't think we want, or can, have InstCombine be an "un-vectorizer". It just doesn't make sense to vectorize these trivial induction variables in the first place.

lib/Transforms/Vectorize/LoopVectorize.cpp
2189 ↗	(On Diff #61599)	InstCombine/InstructionSimplify is only able to simplify %induction because it knows element zero of the add is a noop:

%induction = add <2 x i64> %broadcast.splat, <i64 0, i64 1>

It can then replace the extract from element 0 with %index, and then the rest falls out. We can't do the same thing for %induction1, though. The add is not a noop. InstCombine would have to decide it would be beneficial to unvectorize actual computation, replacing a single vector add with 2 scalar adds. It doesn't seem like InstCombine is equipped to do this. And I think you would need a cost model for this, anyway.

mssimpso added inline comments. Jun 23 2016, 6:13 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2189 ↗	(On Diff #61599)	I also should have mentioned that if you increase the vectorization factor to 4, %induction is no longer removed either, for the reasons I mentioned above. The noop trick is only useful when there are two elements in the vector.
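For readers following the step-vector discussion, the lane values involved can be sketched outside of LLVM. This is only an illustrative model using plain integers: `scalarizedSteps` and `allParts` are hypothetical helper names, and the arithmetic (`Val + (StartIdx + I) * Step` per lane, with part `P` starting at `VF * P`) is taken from the `getScalarizedStepVector` code in this patch's diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lane values for one unrolled part: val + (startIdx + i) * step for each
// lane i in [0, vf). Hypothetical model, not an LLVM API.
std::vector<int64_t> scalarizedSteps(int64_t val, int startIdx, int64_t step,
                                     unsigned vf) {
  assert(vf > 1 && "VF should be greater than one");
  std::vector<int64_t> lanes;
  for (unsigned i = 0; i < vf; ++i)
    lanes.push_back(val + (startIdx + static_cast<int64_t>(i)) * step);
  return lanes;
}

// One vector of lane values per unrolled part; part p starts at vf * p.
std::vector<std::vector<int64_t>> allParts(int64_t val, int64_t step,
                                           unsigned vf, unsigned uf) {
  std::vector<std::vector<int64_t>> parts;
  for (unsigned part = 0; part < uf; ++part)
    parts.push_back(scalarizedSteps(val, static_cast<int>(vf * part), step, vf));
  return parts;
}
```

With `vf = 2`, `uf = 2`, and a unit step, the two parts carry offsets `<0, 1>` and `<2, 3>`, which correspond to `%induction` and `%induction1` in the IR quoted earlier. Only lane 0 of the first part has a zero offset, which is the one add the discussion above calls a noop.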

Addressed Michael's initial comments.

Reused collectValuesToIgnore for determining if an induction variable has any non-trivial users. We were already doing this in the cost model.
Added another test case for the interleaved example we've been discussing.

Thanks!

Basically this LGTM (modulo some nits, inline), as long as you also file a bug for the InstCombine issue.
I'm also ok with landing only the scalar IV part of this for now, and trying to make a decision about getScalarizedStepVector vs. IC separately. Adam, what do you think?

lib/Transforms/Vectorize/LoopVectorize.cpp
4223 ↗	(On Diff #61696)	Nit - can you perhaps rename ValuesToIgnore? Because we're not using it here to "ignore" anything, and it looks odd.
6352 ↗	(On Diff #61696)	Note that there's a conflicting modification to this code planned by Wei: http://reviews.llvm.org/D20474
test/Transforms/LoopVectorize/induction.ll
110 ↗	(On Diff #61696)	simply -> simplify

Adding Wei due to the possible conflict with http://reviews.llvm.org/D20474

Addressed Michael's comments.

I'll wait for additional feedback from Adam. Thanks, Michael!

mssimpso added a reviewer: anemet. Jun 24 2016, 1:46 PM

anemet added inline comments. Jun 27 2016, 9:51 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2191 ↗	(On Diff #61818)	Matt, your argument about instcombine not being adequate for this is a convincing one. On the other hand, I am still wondering if the solution is general enough. There should be cases where we'd be better off having both the vector and the scalar version of the same induction variable. I.e. depending on the use, you may want to use the vector version (e.g. store) or the scalar version (address calc, compares). I am not saying that we need to necessarily implement this but the current solution should be taking us incrementally closer to the full solution.

mssimpso added inline comments. Jun 28 2016, 8:36 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2191 ↗	(On Diff #61818)	Adam,

One way to handle the cases you mention would be to allow the vectorizer to generate both a scalar version and a vector version of the same induction variable. Then it could select whether to generate a use of the scalar or vector version based on the instruction (address calculation, store, etc.). DCE and InstCombine would presumably clean up afterwards.

The current patch does take us incrementally to that kind of solution. It handles the case where we know we will always prefer to use the scalar version. If that's the case, we generate the scalar steps; otherwise, we continue to generate the vector steps. It doesn't, however, generate the scalar version of the induction variable alongside the vector version.

anemet added inline comments. Jun 28 2016, 10:09 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2191 ↗	(On Diff #61818)	> One way to handle the cases you mention would be to allow the vectorizer to generate both a scalar version and a vector version of the same induction variable. Then it could select whether to generate a use of the scalar or vector version based on the instruction (address calculation, store, etc.). DCE and InstCombine would presumably clean up afterwards.

Yes, that is the direction I was thinking of going in the long term. (If the later passes are ineffective at cleaning up unused versions, we can also generate them on demand.)

> The current patch does take us incrementally to that kind of solution. It handles the case where we know we will always prefer to use the scalar version. If that's the case, we generate the scalar steps; otherwise, we continue to generate the vector steps. It doesn't, however, generate the scalar version of the induction variable alongside the vector version.

Fair enough. I have one last high-level question. Does this fix the pointer arithmetic code in https://llvm.org/bugs/show_bug.cgi?id=27881#c6 ?

I will look at the specifics of the patch later today.

mssimpso added inline comments. Jun 28 2016, 11:23 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2191 ↗	(On Diff #61818)	> Does this fix the pointer arithmetic code in https://llvm.org/bugs/show_bug.cgi?id=27881#c6 ?

I think so. With this patch post-instcombine, we generate code like the following for the pointer arithmetic:

vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %offset.idx = shl i64 %index, 3
  %3 = or i64 %offset.idx, 8
  %4 = or i64 %offset.idx, 16
  %5 = or i64 %offset.idx, 24
  %6 = getelementptr inbounds float, float* %a, i64 %offset.idx
  %7 = getelementptr inbounds float, float* %a, i64 %3
  %8 = getelementptr inbounds float, float* %a, i64 %4
  %9 = getelementptr inbounds float, float* %a, i64 %5

The bigger issue there seems to be the cost model though. The loop still builds up vectors with the loaded values that feed the vector fadd's.

Regarding the IR explosion mentioned in that bug pre-instcombine, this patch will not help. This is because I scalarize the arithmetic for the step vectors, but still insert the results into a vector, since the rest of the code expects the IV's to be vectors. (All the scalarization is handled in this way I believe). There's perhaps room for a compile-time optimization here where the vectorizer would avoid all the scalar-to-vector-to-scalar conversions.

> I will look at the specifics of the patch later today.

Thanks, as always!
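A side note on the `or` instructions in the IR above: `x | c` equals `x + c` exactly when `x` and `c` share no set bits, because no carries can occur. The sketch below models the offsets with plain integers. `orEqualsAdd`, `Offsets`, and `offsetsFor` are hypothetical names, and the assumption that `%index` advances in multiples of 4 is an inference from the constants 8, 16, and 24, not something stated in the thread.

```cpp
#include <cstdint>

// True when base | offset and base + offset agree, which holds exactly when
// the operands have no set bits in common.
bool orEqualsAdd(uint64_t base, uint64_t offset) {
  return (base & offset) == 0;
}

// Hypothetical model of the IR above: offset.idx = index << 3, followed by
// the three `or` results used as GEP indices.
struct Offsets {
  uint64_t o0, o1, o2, o3;
};

Offsets offsetsFor(uint64_t index) {
  uint64_t base = index << 3;
  return {base, base | 8, base | 16, base | 24};
}
```

For an index that is a multiple of 4, `index << 3` has its low five bits clear, so each `or` produces the same value an `add` would. For other indices the identity does not hold, which is why this reading depends on the stated assumption.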

anemet added inline comments. Jun 28 2016, 11:33 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2191 ↗	(On Diff #61818)	> I think so. With this patch post-instcombine, we generate code like the following for the pointer arithmetic:

Great, thanks!

anemet added inline comments. Jun 29 2016, 1:55 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
308–314 ↗	(On Diff #61818)	Please comment the new argument
316 ↗	(On Diff #61818)	Since you're holding on to this in a member, it's less surprising if the reference is live throughout the lifetime of the InnerLoopVectorizer. I think that we should pass this in the ctor. I see no particular reason not to do it this way.
420 ↗	(On Diff #61818)	\p Step, please explain at least \p Index as well.
426 ↗	(On Diff #61818)	same
4219–4233 ↗	(On Diff #61818)	Looks like this and the code below under trunc are quite similar. Can we factor this out into a helper in a prequel to this patch and then add the special case to the helper?
test/Transforms/LoopVectorize/induction.ll
70–80 ↗	(On Diff #61818)	I think that the testcase from the PR I mentioned would better highlight the problem of vectorizing the induction variable. I.e. in this case we don't actually end up with a vector induction variable but a secondary unnecessary IV.

mssimpso added inline comments. Jun 30 2016, 7:01 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
316 ↗	(On Diff #61818)	Would it make sense to just pass the Cost Model in the ctor? MinimumBitWidths comes from there as well. Actually, I'm not sure why we don't pass the Legality in the ctor either.
4219–4233 ↗	(On Diff #61818)	I think so - I'll give it a shot and post for review.
test/Transforms/LoopVectorize/induction.ll
70–80 ↗	(On Diff #61818)	I'll add that test case as well.

anemet added inline comments. Jun 30 2016, 11:14 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
316 ↗	(On Diff #61818)	Good point. If you prefer you can leave VecValuesToIgnore as it is for now and clean all this up in follow-on patches.

mssimpso mentioned this in D21903: [LV] Refactor integer induction widening (NFC). Jun 30 2016, 12:03 PM

mssimpso added inline comments. Jun 30 2016, 12:23 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4219–4233 ↗	(On Diff #61818)	Refactoring done in D21903.

Addressed Adam's comments.

Refactored the integer induction widening (D21903) and rebased.
Updated comments.
Added the test case from the PR mentioned in discussions.

lib/Transforms/Vectorize/LoopVectorize.cpp
316 ↗	(On Diff #61818)	I think a follow-on is the way to go. I briefly looked at this, and there may be quite a bit to untangle. The cost model and legality have a lot of overlap, and we will want to be sure we don't unnecessarily duplicate anything.

LGTM.

This revision is now accepted and ready to land. Jul 1 2016, 10:52 AM

I just started to look at D21903 but I am not sure I understand the order of the two patches. Ideally D21903 should have been a prequel to this, but after reading the latest version of this patch I was expecting it to be the other way around. But looking at D21903, the changes here are not present there. Can you please explain?

Thanks for the reviews Adam. D21903 is indeed the prequel refactoring patch, and I've rebased the current patch on top of that one. Lines 2250-2257 in the current patch are the primary functional change now. This check is now only in one place (in the new widenIntInduction function I added), unlike before the refactoring, where it was in two places (with the int induction case and the trunc case). Hope that helps!

Ah, great! Let me review in the right order then.

This still LGTM after reading the patches in the right order.

Please don't forget to mention the PR in the commit log and update the PR as well. Thanks.

Will do. Thanks, Adam.

Closed by commit rL274627: [LV] Don't widen trivial induction variables (authored by mssimpso). · Jul 6 2016, 7:34 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp	66 lines
llvm/trunk/test/Transforms/LoopVectorize/	10 lines, 138 lines, 4 lines, 62 lines

Diff 62863

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

[... 302 lines not shown ...]	InnerLoopVectorizer(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
AC(AC), VF(VecWidth), UF(UnrollFactor),		AC(AC), VF(VecWidth), UF(UnrollFactor),
Builder(PSE.getSE()->getContext()), Induction(nullptr),		Builder(PSE.getSE()->getContext()), Induction(nullptr),
OldInduction(nullptr), WidenMap(UnrollFactor), TripCount(nullptr),		OldInduction(nullptr), WidenMap(UnrollFactor), TripCount(nullptr),
VectorTripCount(nullptr), Legal(nullptr), AddedSafetyChecks(false) {}		VectorTripCount(nullptr), Legal(nullptr), AddedSafetyChecks(false) {}

// Perform the actual loop widening (vectorization).		// Perform the actual loop widening (vectorization).
// MinimumBitWidths maps scalar integer values to the smallest bitwidth they		// MinimumBitWidths maps scalar integer values to the smallest bitwidth they
// can be validly truncated to. The cost model has assumed this truncation		// can be validly truncated to. The cost model has assumed this truncation
// will happen when vectorizing.		// will happen when vectorizing. VecValuesToIgnore contains scalar values
		// that the cost model has chosen to ignore because they will not be
		// vectorized.
void vectorize(LoopVectorizationLegality *L,		void vectorize(LoopVectorizationLegality *L,
const MapVector<Instruction *, uint64_t> &MinimumBitWidths) {		const MapVector<Instruction *, uint64_t> &MinimumBitWidths,
		SmallPtrSetImpl<const Value *> &VecValuesToIgnore) {
MinBWs = &MinimumBitWidths;		MinBWs = &MinimumBitWidths;
		ValuesNotWidened = &VecValuesToIgnore;
Legal = L;		Legal = L;
// Create a new empty loop. Unlink the old loop and connect the new one.		// Create a new empty loop. Unlink the old loop and connect the new one.
createEmptyLoop();		createEmptyLoop();
// Widen each instruction in the old loop to a new one in the new loop.		// Widen each instruction in the old loop to a new one in the new loop.
// Use the Legality module to find the induction and reduction variables.		// Use the Legality module to find the induction and reduction variables.
vectorizeLoop();		vectorizeLoop();
}		}

[... 79 lines not shown ...]	protected:
/// this is needed because each iteration in the loop corresponds to a SIMD		/// this is needed because each iteration in the loop corresponds to a SIMD
/// element.		/// element.
virtual Value *getBroadcastInstrs(Value *V);		virtual Value *getBroadcastInstrs(Value *V);

/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)		/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)
/// to each vector element of Val. The sequence starts at StartIndex.		/// to each vector element of Val. The sequence starts at StartIndex.
virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step);		virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step);

		/// Compute a step vector like the above function, but scalarize the
		/// arithmetic instead. The results of the computation are inserted into a
		/// new vector with VF elements. \p Val is the initial value, \p Step is the
		/// size of the step, and \p StartIdx indicates the index of the increment
		/// from which to start computing the steps.
		Value *getScalarizedStepVector(Value *Val, int StartIdx, Value *Step);

/// Create a vector induction phi node based on an existing scalar one. This		/// Create a vector induction phi node based on an existing scalar one. This
/// currently only works for integer induction variables with a constant		/// currently only works for integer induction variables with a constant
/// step. If \p TruncType is non-null, instead of widening the original IV,		/// step. If \p TruncType is non-null, instead of widening the original IV,
/// we widen a version of the IV truncated to \p TruncType.		/// we widen a version of the IV truncated to \p TruncType.
void createVectorIntInductionPHI(const InductionDescriptor &II,		void createVectorIntInductionPHI(const InductionDescriptor &II,
VectorParts &Entry, IntegerType *TruncType);		VectorParts &Entry, IntegerType *TruncType);

/// Widen an integer induction variable \p IV. If \p TruncType is provided,		/// Widen an integer induction variable \p IV. If \p TruncType is provided,
[... 159 lines not shown ...]	protected:
Value *TripCount;		Value *TripCount;
/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))		/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))
Value *VectorTripCount;		Value *VectorTripCount;

/// Map of scalar integer values to the smallest bitwidth they can be legally		/// Map of scalar integer values to the smallest bitwidth they can be legally
/// represented as. The vector equivalents of these values should be truncated		/// represented as. The vector equivalents of these values should be truncated
/// to this type.		/// to this type.
const MapVector<Instruction *, uint64_t> *MinBWs;		const MapVector<Instruction *, uint64_t> *MinBWs;

		/// A set of values that should not be widened. This is taken from
		/// VecValuesToIgnore in the cost model.
		SmallPtrSetImpl<const Value *> *ValuesNotWidened;

LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;

// Record whether runtime checks are added.		// Record whether runtime checks are added.
bool AddedSafetyChecks;		bool AddedSafetyChecks;
};		};

class InnerLoopUnroller : public InnerLoopVectorizer {		class InnerLoopUnroller : public InnerLoopVectorizer {
public:		public:
[... 1,475 lines not shown ...]	if (!VectorizeLoop && !InterleaveLoop) {
DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');		DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');
}		}

if (!VectorizeLoop) {		if (!VectorizeLoop) {
assert(IC > 1 && "interleave count should not be 1 or 0");		assert(IC > 1 && "interleave count should not be 1 or 0");
// If we decided that it is not legal to vectorize the loop, then		// If we decided that it is not legal to vectorize the loop, then
// interleave it.		// interleave it.
InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, IC);		InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, IC);
Unroller.vectorize(&LVL, CM.MinBWs);		Unroller.vectorize(&LVL, CM.MinBWs, CM.VecValuesToIgnore);

emitOptimizationRemark(F->getContext(), LV_NAME, *F, L->getStartLoc(),		emitOptimizationRemark(F->getContext(), LV_NAME, *F, L->getStartLoc(),
Twine("interleaved loop (interleaved count: ") +		Twine("interleaved loop (interleaved count: ") +
Twine(IC) + ")");		Twine(IC) + ")");
} else {		} else {
// If we decided that it is legal to vectorize the loop, then do it.		// If we decided that it is legal to vectorize the loop, then do it.
InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, VF.Width, IC);		InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, VF.Width, IC);
LB.vectorize(&LVL, CM.MinBWs);		LB.vectorize(&LVL, CM.MinBWs, CM.VecValuesToIgnore);
++LoopsVectorized;		++LoopsVectorized;

// Add metadata to disable runtime unrolling a scalar loop when there are		// Add metadata to disable runtime unrolling a scalar loop when there are
// no runtime checks about strides and memory. A scalar loop that is		// no runtime checks about strides and memory. A scalar loop that is
// rarely used is not worth unrolling.		// rarely used is not worth unrolling.
if (!LB.areSafetyChecksAdded())		if (!LB.areSafetyChecksAdded())
AddRuntimeUnrollDisableMetaData(L);		AddRuntimeUnrollDisableMetaData(L);

[... 103 lines not shown ...]	void InnerLoopVectorizer::widenIntInduction(PHINode *IV, VectorParts &Entry,
// If the induction variable has a constant integer step value, go ahead and		// If the induction variable has a constant integer step value, go ahead and
// get it now.		// get it now.
if (ID.getConstIntStepValue())		if (ID.getConstIntStepValue())
Step = ID.getConstIntStepValue();		Step = ID.getConstIntStepValue();

// Try to create a new independent vector induction variable. If we can't		// Try to create a new independent vector induction variable. If we can't
// create the phi node, we will splat the scalar induction variable in each		// create the phi node, we will splat the scalar induction variable in each
// loop iteration.		// loop iteration.
if (VF > 1 && IV->getType() == Induction->getType() && Step)		if (VF > 1 && IV->getType() == Induction->getType() && Step &&
		!ValuesNotWidened->count(IV))
return createVectorIntInductionPHI(ID, Entry, TruncType);		return createVectorIntInductionPHI(ID, Entry, TruncType);

// The scalar value to broadcast. This will be derived from the canonical		// The scalar value to broadcast. This will be derived from the canonical
// induction variable.		// induction variable.
Value *ScalarIV = nullptr;		Value *ScalarIV = nullptr;

// Define the scalar induction variable and step values. If we were given a		// Define the scalar induction variable and step values. If we were given a
// truncation type, truncate the canonical induction variable and constant		// truncation type, truncate the canonical induction variable and constant
[... 13 lines not shown ...]	if (TruncType) {
}		}
if (!Step) {		if (!Step) {
SCEVExpander Exp(*PSE.getSE(), DL, "induction");		SCEVExpander Exp(*PSE.getSE(), DL, "induction");
Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(),		Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(),
&*Builder.GetInsertPoint());		&*Builder.GetInsertPoint());
}		}
}		}

		// If an induction variable is only used for counting loop iterations or
		// calculating addresses, it shouldn't be widened. Scalarize the step vector
		// to give InstCombine a better chance of simplifying it.
		if (VF > 1 && ValuesNotWidened->count(IV)) {
		for (unsigned Part = 0; Part < UF; ++Part)
		Entry[Part] = getScalarizedStepVector(ScalarIV, VF * Part, Step);
		return;
		}

// Finally, splat the scalar induction variable, and build the necessary step		// Finally, splat the scalar induction variable, and build the necessary step
// vectors.		// vectors.
Value *Broadcasted = getBroadcastInstrs(ScalarIV);		Value *Broadcasted = getBroadcastInstrs(ScalarIV);
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part] = getStepVector(Broadcasted, VF * Part, Step);		Entry[Part] = getStepVector(Broadcasted, VF * Part, Step);
}		}

Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx,		Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx,
[... 19 lines not shown ...]	Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx,
Step = Builder.CreateVectorSplat(VLen, Step);		Step = Builder.CreateVectorSplat(VLen, Step);
assert(Step->getType() == Val->getType() && "Invalid step vec");		assert(Step->getType() == Val->getType() && "Invalid step vec");
// FIXME: The newly created binary instructions should contain nsw/nuw flags,		// FIXME: The newly created binary instructions should contain nsw/nuw flags,
// which can be found from the original scalar operations.		// which can be found from the original scalar operations.
Step = Builder.CreateMul(Cv, Step);		Step = Builder.CreateMul(Cv, Step);
return Builder.CreateAdd(Val, Step, "induction");		return Builder.CreateAdd(Val, Step, "induction");
}		}

		Value *InnerLoopVectorizer::getScalarizedStepVector(Value *Val, int StartIdx,
		Value *Step) {

		// We can't create a vector with less than two elements.
		assert(VF > 1 && "VF should be greater than one");

		// Get the value type and ensure it and the step have the same integer type.
		Type *ValTy = Val->getType()->getScalarType();
		assert(ValTy->isIntegerTy() && ValTy == Step->getType() &&
		"Val and Step should have the same integer type");

		// Compute the scalarized step vector. We perform scalar arithmetic and then
		// insert the results into the step vector.
		Value *StepVector = UndefValue::get(ToVectorTy(ValTy, VF));
		for (unsigned I = 0; I < VF; ++I) {
		auto *Mul = Builder.CreateMul(ConstantInt::get(ValTy, StartIdx + I), Step);
		auto *Add = Builder.CreateAdd(Val, Mul);
		StepVector = Builder.CreateInsertElement(StepVector, Add, I);
		}

		return StepVector;
		}

int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {		int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {
assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");		assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");
auto *SE = PSE.getSE();		auto *SE = PSE.getSE();
// Make sure that the pointer does not point to structs.		// Make sure that the pointer does not point to structs.
if (Ptr->getType()->getPointerElementType()->isAggregateType())		if (Ptr->getType()->getPointerElementType()->isAggregateType())
return 0;		return 0;

// If this value is a pointer induction variable, we know it is consecutive.		// If this value is a pointer induction variable, we know it is consecutive.
▲ Show 20 Lines • Show All 4,163 Lines • ▼ Show 20 Lines	void LoopVectorizationCostModel::collectValuesToIgnore() {
// instruction to exit loop. Induction variables usually have large types and		// instruction to exit loop. Induction variables usually have large types and
// can have big impact when estimating register usage.		// can have big impact when estimating register usage.
// This is for when VF > 1.		// This is for when VF > 1.
for (auto &Induction : *Legal->getInductionVars()) {		for (auto &Induction : *Legal->getInductionVars()) {
auto *PN = Induction.first;		auto *PN = Induction.first;
auto *UpdateV = PN->getIncomingValueForBlock(TheLoop->getLoopLatch());		auto *UpdateV = PN->getIncomingValueForBlock(TheLoop->getLoopLatch());

// Check that the PHI is only used by the induction increment (UpdateV) or		// Check that the PHI is only used by the induction increment (UpdateV) or
// by GEPs. Then check that UpdateV is only used by a compare instruction or		// by GEPs. Then check that UpdateV is only used by a compare instruction,
// the loop header PHI.		// the loop header PHI, or by GEPs.
// FIXME: Need precise def-use analysis to determine if this instruction		// FIXME: Need precise def-use analysis to determine if this instruction
// variable will be vectorized.		// variable will be vectorized.
if (std::all_of(PN->user_begin(), PN->user_end(),		if (std::all_of(PN->user_begin(), PN->user_end(),
[&](const User *U) -> bool {		[&](const User *U) -> bool {
return U == UpdateV || isa<GetElementPtrInst>(U);		return U == UpdateV || isa<GetElementPtrInst>(U);
}) &&		}) &&
std::all_of(UpdateV->user_begin(), UpdateV->user_end(),		std::all_of(UpdateV->user_begin(), UpdateV->user_end(),
[&](const User *U) -> bool {		[&](const User *U) -> bool {
return U == PN || isa<ICmpInst>(U);		return U == PN || isa<ICmpInst>(U) ||
		isa<GetElementPtrInst>(U);
})) {		})) {
VecValuesToIgnore.insert(PN);		VecValuesToIgnore.insert(PN);
VecValuesToIgnore.insert(UpdateV);		VecValuesToIgnore.insert(UpdateV);
}		}
}		}

// Ignore instructions that will not be vectorized.		// Ignore instructions that will not be vectorized.
// This is for when VF > 1.		// This is for when VF > 1.
▲ Show 20 Lines • Show All 146 Lines • Show Last 20 Lines
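
For readers following the arithmetic in getScalarizedStepVector, here is a minimal stand-alone sketch of the per-lane values it builds. It is not part of the patch and does not use the LLVM API: the function name scalarizedStepLanes and the use of plain int64_t values in place of IR values are assumptions made only for illustration. Each lane I receives Val + (StartIdx + I) * Step, which mirrors the CreateMul/CreateAdd/CreateInsertElement loop in the new function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the lanes produced by getScalarizedStepVector.
// Plain integers stand in for the IR values; this is not LLVM code.
std::vector<int64_t> scalarizedStepLanes(int64_t Val, int StartIdx,
                                         int64_t Step, unsigned VF) {
  // Same precondition as the patch: a vector needs at least two elements.
  assert(VF > 1 && "VF should be greater than one");

  std::vector<int64_t> Lanes;
  Lanes.reserve(VF);
  for (unsigned I = 0; I < VF; ++I) {
    // Scalar multiply and add per lane, then "insert" into the result.
    int64_t Mul = (static_cast<int64_t>(StartIdx) + I) * Step;
    Lanes.push_back(Val + Mul);
  }
  return Lanes;
}
```

With Step = -1 and VF = 4, a StartIdx of 0 gives offsets 0, -1, -2, -3 from Val, and a StartIdx of 4 gives -4 through -7. Assuming the second unrolled part is generated with StartIdx = 4, this is consistent with the add/insertelement sequences checked in reverse_induction.ll.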

llvm/trunk/test/Transforms/LoopVectorize/gep_with_bitcast.ll

	; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 < %s | FileCheck %s			; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

	; Vectorization of loop with bitcast between GEP and load			; Vectorization of loop with bitcast between GEP and load
	; Simplified source code:			; Simplified source code:
	;void foo (double** __restrict__ in, bool * __restrict__ res) {			;void foo (double** __restrict__ in, bool * __restrict__ res) {
	;			;
	; for (int i = 0; i < 4096; ++i)			; for (int i = 0; i < 4096; ++i)
	; res[i] = ((unsigned long long)in[i] == 0);			; res[i] = ((unsigned long long)in[i] == 0);
	;}			;}

	; CHECK-LABEL: @foo			; CHECK-LABEL: @foo
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %0 = phi			; CHECK: %[[IV:.+]] = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %2 = getelementptr inbounds double*, double** %in, i64 %0			; CHECK: %[[v0:.+]] = getelementptr inbounds double*, double** %in, i64 %[[IV]]
	; CHECK: %3 = bitcast double** %2 to <4 x i64>*			; CHECK: %[[v1:.+]] = bitcast double** %[[v0]] to <4 x i64>*
	; CHECK: %wide.load = load <4 x i64>, <4 x i64>* %3, align 8			; CHECK: %wide.load = load <4 x i64>, <4 x i64>* %[[v1]], align 8
	; CHECK: %4 = icmp eq <4 x i64> %wide.load, zeroinitializer			; CHECK: icmp eq <4 x i64> %wide.load, zeroinitializer
	; CHECK: br i1			; CHECK: br i1

	define void @foo(double** noalias nocapture readonly %in, double** noalias nocapture readnone %out, i8* noalias nocapture %res) #0 {			define void @foo(double** noalias nocapture readonly %in, double** noalias nocapture readnone %out, i8* noalias nocapture %res) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	Show All 14 Lines

llvm/trunk/test/Transforms/LoopVectorize/induction.ll

; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s		; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s
; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=IND		; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=IND
; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=UNROLL		; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=UNROLL
		; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -enable-interleaved-mem-accesses -instcombine -S | FileCheck %s --check-prefix=INTERLEAVE

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"		target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

; Make sure that we can handle multiple integer induction variables.		; Make sure that we can handle multiple integer induction variables.
; CHECK-LABEL: @multi_int_induction(		; CHECK-LABEL: @multi_int_induction(
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]		; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; CHECK: %[[VAR:.*]] = trunc i64 %index to i32		; CHECK: %[[VAR:.*]] = trunc i64 %index to i32
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	for.body:
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, %n		%exitcond = icmp eq i64 %iv.next, %n
br i1 %exitcond, label %loopexit, label %for.body		br i1 %exitcond, label %loopexit, label %for.body

loopexit:		loopexit:
ret void		ret void
}		}

		; Make sure we don't create a vector induction phi node that is unused.
		; Scalarize the step vectors instead.
		;
		; for (int i = 0; i < n; ++i)
		; sum += a[i];
		;
		; IND-LABEL: @scalarize_induction_variable_01(
		; IND: vector.body:
		; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; IND-NOT: add i64 {{.*}}, 2
		; IND: getelementptr inbounds i64, i64* %a, i64 %index
		;
		; UNROLL-LABEL: @scalarize_induction_variable_01(
		; UNROLL: vector.body:
		; UNROLL: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; UNROLL-NOT: add i64 {{.*}}, 4
		; UNROLL: %[[g1:.+]] = getelementptr inbounds i64, i64* %a, i64 %index
		; UNROLL: getelementptr i64, i64* %[[g1]], i64 2

		define i64 @scalarize_induction_variable_01(i64 *%a, i64 %n) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
		%sum = phi i64 [ %2, %for.body ], [ 0, %entry ]
		%0 = getelementptr inbounds i64, i64* %a, i64 %i
		%1 = load i64, i64* %0, align 8
		%2 = add i64 %1, %sum
		%i.next = add nuw nsw i64 %i, 1
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		%3 = phi i64 [ %2, %for.body ]
		ret i64 %3
		}

		; Make sure we scalarize the step vectors used for the pointer arithmetic. We
		; can't easily simplify vectorized step vectors.
		;
		; float s = 0;
		; for (int i = 0; i < n; i += 8)
		; s += (a[i] + b[i] + 1.0f);
		;
		; IND-LABEL: @scalarize_induction_variable_02(
		; IND: vector.body:
		; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; IND: %[[i0:.+]] = shl i64 %index, 3
		; IND: %[[i1:.+]] = or i64 %[[i0]], 8
		; IND: getelementptr inbounds float, float* %a, i64 %[[i0]]
		; IND: getelementptr inbounds float, float* %a, i64 %[[i1]]
		;
		; UNROLL-LABEL: @scalarize_induction_variable_02(
		; UNROLL: vector.body:
		; UNROLL: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; UNROLL: %[[i0:.+]] = shl i64 %index, 3
		; UNROLL: %[[i1:.+]] = or i64 %[[i0]], 8
		; UNROLL: %[[i2:.+]] = or i64 %[[i0]], 16
		; UNROLL: %[[i3:.+]] = or i64 %[[i0]], 24
		; UNROLL: getelementptr inbounds float, float* %a, i64 %[[i0]]
		; UNROLL: getelementptr inbounds float, float* %a, i64 %[[i1]]
		; UNROLL: getelementptr inbounds float, float* %a, i64 %[[i2]]
		; UNROLL: getelementptr inbounds float, float* %a, i64 %[[i3]]

		define float @scalarize_induction_variable_02(float* %a, float* %b, i64 %n) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
		%s = phi float [ 0.0, %entry ], [ %6, %for.body ]
		%0 = getelementptr inbounds float, float* %a, i64 %i
		%1 = load float, float* %0, align 4
		%2 = getelementptr inbounds float, float* %b, i64 %i
		%3 = load float, float* %2, align 4
		%4 = fadd fast float %s, 1.0
		%5 = fadd fast float %4, %1
		%6 = fadd fast float %5, %3
		%i.next = add nuw nsw i64 %i, 8
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		%s.lcssa = phi float [ %6, %for.body ]
		ret float %s.lcssa
		}

		; Make sure we scalarize the step vectors used for the pointer arithmetic. We
		; can't easily simplify vectorized step vectors. (Interleaved accesses.)
		;
		; for (int i = 0; i < n; ++i)
		; a[i].f ^= y;
		;
		; INTERLEAVE-LABEL: @scalarize_induction_variable_03(
		; INTERLEAVE: vector.body:
		; INTERLEAVE: %[[i0:.+]] = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
		; INTERLEAVE: %[[i1:.+]] = or i64 %[[i0]], 1
		; INTERLEAVE: %[[i2:.+]] = or i64 %[[i0]], 2
		; INTERLEAVE: %[[i3:.+]] = or i64 %[[i0]], 3
		; INTERLEAVE: %[[i4:.+]] = or i64 %[[i0]], 4
		; INTERLEAVE: %[[i5:.+]] = or i64 %[[i0]], 5
		; INTERLEAVE: %[[i6:.+]] = or i64 %[[i0]], 6
		; INTERLEAVE: %[[i7:.+]] = or i64 %[[i0]], 7
		; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i0]], i32 1
		; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i1]], i32 1
		; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i2]], i32 1
		; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i3]], i32 1
		; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i4]], i32 1
		; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i5]], i32 1
		; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i6]], i32 1
		; INTERLEAVE: getelementptr inbounds %pair, %pair* %p, i64 %[[i7]], i32 1

		%pair = type { i32, i32 }
		define void @scalarize_induction_variable_03(%pair *%p, i32 %y, i64 %n) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
		%f = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
		%0 = load i32, i32* %f, align 8
		%1 = xor i32 %0, %y
		store i32 %1, i32* %f, align 8
		%i.next = add nuw nsw i64 %i, 1
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		ret void
		}

; Make sure that the loop exit count computation does not overflow for i8 and		; Make sure that the loop exit count computation does not overflow for i8 and
; i16. The exit count of these loops is i8/i16 max + 1. If we don't cast the		; i16. The exit count of these loops is i8/i16 max + 1. If we don't cast the
; induction variable to a bigger type the exit count computation will overflow		; induction variable to a bigger type the exit count computation will overflow
; to 0.		; to 0.
; PR17532		; PR17532

; CHECK-LABEL: i8_loop		; CHECK-LABEL: i8_loop
Show All 32 Lines
}		}

; This loop has a backedge taken count of i32_max. We need to check for this		; This loop has a backedge taken count of i32_max. We need to check for this
; condition and branch directly to the scalar loop.		; condition and branch directly to the scalar loop.

; CHECK-LABEL: max_i32_backedgetaken		; CHECK-LABEL: max_i32_backedgetaken
; CHECK: br i1 true, label %scalar.ph, label %min.iters.checked		; CHECK: br i1 true, label %scalar.ph, label %min.iters.checked

		; CHECK: middle.block:
		; CHECK: %[[v9:.+]] = extractelement <2 x i32> %bin.rdx, i32 0
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK: %bc.resume.val = phi i32 [ 0, %middle.block ], [ 0, %0 ]		; CHECK: %bc.resume.val = phi i32 [ 0, %middle.block ], [ 0, %[[v0:.+]] ]
; CHECK: %bc.merge.rdx = phi i32 [ 1, %0 ], [ 1, %min.iters.checked ], [ %5, %middle.block ]		; CHECK: %bc.merge.rdx = phi i32 [ 1, %[[v0:.+]] ], [ 1, %min.iters.checked ], [ %[[v9]], %middle.block ]

define i32 @max_i32_backedgetaken() nounwind readnone ssp uwtable {		define i32 @max_i32_backedgetaken() nounwind readnone ssp uwtable {

br label %1		br label %1

; <label>:1 ; preds = %1, %0		; <label>:1 ; preds = %1, %0
%a.0 = phi i32 [ 1, %0 ], [ %2, %1 ]		%a.0 = phi i32 [ 1, %0 ], [ %2, %1 ]
%b.0 = phi i32 [ 0, %0 ], [ %3, %1 ]		%b.0 = phi i32 [ 0, %0 ], [ %3, %1 ]
▲ Show 20 Lines • Show All 224 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/LoopVectorize/iv_outside_user.ll

Show All 16 Lines	for.body:
br i1 %cmp, label %for.end, label %for.body		br i1 %cmp, label %for.end, label %for.body

for.end:		for.end:
ret i32 %inc		ret i32 %inc
}		}

; CHECK-LABEL: @preinc		; CHECK-LABEL: @preinc
; CHECK-LABEL: middle.block:		; CHECK-LABEL: middle.block:
; CHECK: %3 = sub i32 %n.vec, 1		; CHECK: %[[v3:.+]] = sub i32 %n.vec, 1
; CHECK: %ind.escape = add i32 0, %3		; CHECK: %ind.escape = add i32 0, %[[v3]]
; CHECK-LABEL: scalar.ph:		; CHECK-LABEL: scalar.ph:
; CHECK: %bc.resume.val = phi i32 [ %n.vec, %middle.block ], [ 0, %entry ]		; CHECK: %bc.resume.val = phi i32 [ %n.vec, %middle.block ], [ 0, %entry ]
; CHECK-LABEL: for.end:		; CHECK-LABEL: for.end:
; CHECK: %[[RET:.*]] = phi i32 [ {{.*}}, %for.body ], [ %ind.escape, %middle.block ]		; CHECK: %[[RET:.*]] = phi i32 [ {{.*}}, %for.body ], [ %ind.escape, %middle.block ]
; CHECK: ret i32 %[[RET]]		; CHECK: ret i32 %[[RET]]
define i32 @preinc(i32 %k) {		define i32 @preinc(i32 %k) {
entry:		entry:
br label %for.body		br label %for.body
▲ Show 20 Lines • Show All 76 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/LoopVectorize/reverse_induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Make sure consecutive vector generates correct negative indices.			; Make sure consecutive vector generates correct negative indices.
	; PR15882			; PR15882

	; CHECK-LABEL: @reverse_induction_i64(			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %step.add = add <4 x i64> %vec.ind, <i64 -4, i64 -4, i64 -4, i64 -4>			; CHECK: %offset.idx = sub i64 %startval, %index
	; CHECK: %step.add2 = add <4 x i64> %step.add, <i64 -4, i64 -4, i64 -4, i64 -4>			; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0
				; CHECK: %[[v0:.+]] = insertelement <4 x i64> undef, i64 %[[a0]], i64 0
				; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1
				; CHECK: %[[v1:.+]] = insertelement <4 x i64> %[[v0]], i64 %[[a1]], i64 1
				; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2
				; CHECK: %[[v2:.+]] = insertelement <4 x i64> %[[v1]], i64 %[[a2]], i64 2
				; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3
				; CHECK: %[[v3:.+]] = insertelement <4 x i64> %[[v2]], i64 %[[a3]], i64 3
				; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4
				; CHECK: %[[v4:.+]] = insertelement <4 x i64> undef, i64 %[[a4]], i64 0
				; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5
				; CHECK: %[[v5:.+]] = insertelement <4 x i64> %[[v4]], i64 %[[a5]], i64 1
				; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6
				; CHECK: %[[v6:.+]] = insertelement <4 x i64> %[[v5]], i64 %[[a6]], i64 2
				; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7
				; CHECK: %[[v7:.+]] = insertelement <4 x i64> %[[v6]], i64 %[[a7]], i64 3

	define i32 @reverse_induction_i64(i64 %startval, i32 * %ptr) {			define i32 @reverse_induction_i64(i64 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i64 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i64 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	%redux5 = phi i32 [ 0, %entry ], [ %inc.redux, %for.body ]			%redux5 = phi i32 [ 0, %entry ], [ %inc.redux, %for.body ]
	%add.i = add i64 %add.i7, -1			%add.i = add i64 %add.i7, -1
	%kind_.i = getelementptr inbounds i32, i32* %ptr, i64 %add.i			%kind_.i = getelementptr inbounds i32, i32* %ptr, i64 %add.i
	%tmp.i1 = load i32, i32* %kind_.i, align 4			%tmp.i1 = load i32, i32* %kind_.i, align 4
	%inc.redux = add i32 %tmp.i1, %redux5			%inc.redux = add i32 %tmp.i1, %redux5
	%inc4 = add i32 %i.06, 1			%inc4 = add i32 %i.06, 1
	%exitcond = icmp ne i32 %inc4, 1024			%exitcond = icmp ne i32 %inc4, 1024
	br i1 %exitcond, label %for.body, label %loopend			br i1 %exitcond, label %for.body, label %loopend

	loopend:			loopend:
	ret i32 %inc.redux			ret i32 %inc.redux
	}			}

	; CHECK-LABEL: @reverse_induction_i128(			; CHECK-LABEL: @reverse_induction_i128(
	; CHECK: %step.add = add <4 x i128> %vec.ind, <i128 -4, i128 -4, i128 -4, i128 -4>			; CHECK: %index = phi i128 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %step.add2 = add <4 x i128> %step.add, <i128 -4, i128 -4, i128 -4, i128 -4>			; CHECK: %offset.idx = sub i128 %startval, %index
				; CHECK: %[[a0:.+]] = add i128 %offset.idx, 0
				; CHECK: %[[v0:.+]] = insertelement <4 x i128> undef, i128 %[[a0]], i64 0
				; CHECK: %[[a1:.+]] = add i128 %offset.idx, -1
				; CHECK: %[[v1:.+]] = insertelement <4 x i128> %[[v0]], i128 %[[a1]], i64 1
				; CHECK: %[[a2:.+]] = add i128 %offset.idx, -2
				; CHECK: %[[v2:.+]] = insertelement <4 x i128> %[[v1]], i128 %[[a2]], i64 2
				; CHECK: %[[a3:.+]] = add i128 %offset.idx, -3
				; CHECK: %[[v3:.+]] = insertelement <4 x i128> %[[v2]], i128 %[[a3]], i64 3
				; CHECK: %[[a4:.+]] = add i128 %offset.idx, -4
				; CHECK: %[[v4:.+]] = insertelement <4 x i128> undef, i128 %[[a4]], i64 0
				; CHECK: %[[a5:.+]] = add i128 %offset.idx, -5
				; CHECK: %[[v5:.+]] = insertelement <4 x i128> %[[v4]], i128 %[[a5]], i64 1
				; CHECK: %[[a6:.+]] = add i128 %offset.idx, -6
				; CHECK: %[[v6:.+]] = insertelement <4 x i128> %[[v5]], i128 %[[a6]], i64 2
				; CHECK: %[[a7:.+]] = add i128 %offset.idx, -7
				; CHECK: %[[v7:.+]] = insertelement <4 x i128> %[[v6]], i128 %[[a7]], i64 3

	define i32 @reverse_induction_i128(i128 %startval, i32 * %ptr) {			define i32 @reverse_induction_i128(i128 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i128 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i128 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	%redux5 = phi i32 [ 0, %entry ], [ %inc.redux, %for.body ]			%redux5 = phi i32 [ 0, %entry ], [ %inc.redux, %for.body ]
	%add.i = add i128 %add.i7, -1			%add.i = add i128 %add.i7, -1
	%kind_.i = getelementptr inbounds i32, i32* %ptr, i128 %add.i			%kind_.i = getelementptr inbounds i32, i32* %ptr, i128 %add.i
	%tmp.i1 = load i32, i32* %kind_.i, align 4			%tmp.i1 = load i32, i32* %kind_.i, align 4
	%inc.redux = add i32 %tmp.i1, %redux5			%inc.redux = add i32 %tmp.i1, %redux5
	%inc4 = add i32 %i.06, 1			%inc4 = add i32 %i.06, 1
	%exitcond = icmp ne i32 %inc4, 1024			%exitcond = icmp ne i32 %inc4, 1024
	br i1 %exitcond, label %for.body, label %loopend			br i1 %exitcond, label %for.body, label %loopend

	loopend:			loopend:
	ret i32 %inc.redux			ret i32 %inc.redux
	}			}

	; CHECK-LABEL: @reverse_induction_i16(			; CHECK-LABEL: @reverse_induction_i16(
	; CHECK: add <4 x i16> %[[SPLAT:.*]], <i16 0, i16 -1, i16 -2, i16 -3>			; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: add <4 x i16> %[[SPLAT]], <i16 -4, i16 -5, i16 -6, i16 -7>			; CHECK: %offset.idx = sub i16 %startval, {{.*}}
				; CHECK: %[[a0:.+]] = add i16 %offset.idx, 0
				; CHECK: %[[v0:.+]] = insertelement <4 x i16> undef, i16 %[[a0]], i64 0
				; CHECK: %[[a1:.+]] = add i16 %offset.idx, -1
				; CHECK: %[[v1:.+]] = insertelement <4 x i16> %[[v0]], i16 %[[a1]], i64 1
				; CHECK: %[[a2:.+]] = add i16 %offset.idx, -2
				; CHECK: %[[v2:.+]] = insertelement <4 x i16> %[[v1]], i16 %[[a2]], i64 2
				; CHECK: %[[a3:.+]] = add i16 %offset.idx, -3
				; CHECK: %[[v3:.+]] = insertelement <4 x i16> %[[v2]], i16 %[[a3]], i64 3
				; CHECK: %[[a4:.+]] = add i16 %offset.idx, -4
				; CHECK: %[[v4:.+]] = insertelement <4 x i16> undef, i16 %[[a4]], i64 0
				; CHECK: %[[a5:.+]] = add i16 %offset.idx, -5
				; CHECK: %[[v5:.+]] = insertelement <4 x i16> %[[v4]], i16 %[[a5]], i64 1
				; CHECK: %[[a6:.+]] = add i16 %offset.idx, -6
				; CHECK: %[[v6:.+]] = insertelement <4 x i16> %[[v5]], i16 %[[a6]], i64 2
				; CHECK: %[[a7:.+]] = add i16 %offset.idx, -7
				; CHECK: %[[v7:.+]] = insertelement <4 x i16> %[[v6]], i16 %[[a7]], i64 3

	define i32 @reverse_induction_i16(i16 %startval, i32 * %ptr) {			define i32 @reverse_induction_i16(i16 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i16 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i16 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines