This is an archive of the discontinued LLVM Phabricator instance.

[LV] Unify vector and scalar maps
ClosedPublic

Authored by mssimpso on Aug 4 2016, 9:52 AM.

Details

Reviewers

anemet
nadav
mkuper

Commits

rGabd2be1e2e52: [LV] Unify vector and scalar maps
rL279649: [LV] Unify vector and scalar maps

Summary

This patch unifies the data structures we use for mapping instructions from the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap maintained the vector values each instruction from the old loop was represented with, and ScalarIVMap maintained the scalar values each scalarized induction variable was represented with. With this patch, all values created for the new loop (vector and scalar) are maintained in WidenMap (renamed to VectorLoopValueMap).

The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. This caused unnecessary scalar-to-vector-to-scalar conversions, resulting in some cases in significant code bloat pre-InstCombine.

This patch avoids these unnecessary conversions by maintaining the scalar values directly. This can improve compile-time since it reduces the number of instructions in each block that InstCombine needs to simplify. If a scalarized value is used by a scalar instruction, the scalar value can be used directly. However, if the scalarized value is needed by a vector instruction, we will now generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization code) is to first get the vector values an instruction from the original loop maps to, and then extract a particular scalar value. This patch adds getScalarValue for this purpose alongside getVectorValue as an interface into WidenMap. getScalarValue reuses a scalar value if available, or creates an extractelement instruction if a scalar value is not yet available. Similarly, getVectorValue has been modified to generate insertelement instructions on-demand if a vector version of a value is not yet available.

There's no real functional change with this patch (post-InstCombine); only compile-time savings and refactoring. However, in some cases we will generate different code. Because we now might generate an insertelement sequence on-demand, the IR post-InstCombine might be reordered somewhat. For example, instead of the insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes.
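The on-demand behavior described in the summary can be sketched with a toy model. This is a minimal illustration, not the code from the patch: values are plain strings, "instructions" are recorded in a log, and the names ToyValueMap and Emitted are hypothetical. It assumes that a vector built from scalars is cached, that an extracted scalar is not, and it omits the loop-invariant broadcast case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, not LLVM types: a "value" is a name, and each instruction
// created on demand is recorded as a string.
using Value = std::string;
using VectorParts = std::vector<Value>;              // UF vector values
using ScalarParts = std::vector<std::vector<Value>>; // UF x VF scalar values

struct ToyValueMap {
  unsigned UF, VF;
  std::map<Value, VectorParts> VectorMap;
  std::map<Value, ScalarParts> ScalarMap;
  std::vector<std::string> Emitted; // log of instructions created on demand

  ToyValueMap(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  // Reuse an existing scalar, or extract the lane from the vector entry.
  Value getScalarValue(const Value &V, unsigned Part, unsigned Lane) {
    auto SI = ScalarMap.find(V);
    if (SI != ScalarMap.end())
      return SI->second[Part][Lane];
    auto VI = VectorMap.find(V);
    assert(VI != VectorMap.end() && "value is neither vectorized nor scalarized");
    Emitted.push_back("extractelement " + VI->second[Part] + ", " +
                      std::to_string(Lane));
    return V + ".extract." + std::to_string(Part) + "." + std::to_string(Lane);
  }

  // Reuse an existing vector entry, or pack the scalars lane by lane.
  const VectorParts &getVectorValue(const Value &V) {
    auto VI = VectorMap.find(V);
    if (VI != VectorMap.end())
      return VI->second;
    auto SI = ScalarMap.find(V);
    assert(SI != ScalarMap.end() && "value is neither vectorized nor scalarized");
    VectorParts Parts(UF);
    for (unsigned Part = 0; Part < UF; ++Part) {
      for (unsigned Lane = 0; Lane < VF; ++Lane)
        Emitted.push_back("insertelement " + SI->second[Part][Lane] + ", " +
                          std::to_string(Lane));
      Parts[Part] = V + ".vec." + std::to_string(Part);
    }
    // Cache the packed vectors so later vector users do not rebuild them.
    return VectorMap.emplace(V, std::move(Parts)).first->second;
  }
};
```

With UF = 2 and VF = 4, a scalarized value costs nothing when its user is scalar, and eight logged insertelement entries the first time a vector user asks for it.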

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 66820. Aug 4 2016, 9:52 AM

mssimpso retitled this revision from to [LV] Unify vector and scalar maps.

mssimpso updated this object.

mssimpso added reviewers: nadav, anemet, mkuper.

mssimpso added subscribers: gilr, mcrosier, llvm-commits.

Herald added a subscriber: mzolotukhin. Aug 4 2016, 9:52 AM

wmi added a subscriber: wmi. Aug 4 2016, 11:04 AM

Ensure the new patch handles scalarized values with void types as we previously did. We make a vector entry for the value in WidenMap mapped to nullptr's.

spatel added a subscriber: spatel. Aug 4 2016, 3:35 PM

+1000 on the design and the cleanup, some nits about the implementation.

lib/Transforms/Vectorize/LoopVectorize.cpp
549 ↗	(On Diff #66850)	Perhaps use a 2D vector for this ("ScalarParts"), instead of the 1D VectorParts? I don't think we actually gain anything from using VectorParts for both - even in the one place where we do move a VectorParts from one map to the other (line 2300), you don't actually break the abstraction.
630 ↗	(On Diff #66850)	Bike-shedding - maybe change the name of the variable? "WidenMap" made sense when it mapped scalar values to the corresponding "wide" vector values.
2175 ↗	(On Diff #66850)	I -> Lane ?
2310 ↗	(On Diff #66850)	For clarity, I'd prefer a temp value here, and an assignment to Parts[Part] in the end.
2343 ↗	(On Diff #66850)	Is this for VF==1, for uniform values, or both?

As we're piling more complexity on top of WidenMap, I feel that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or, in some cases of int induction variables, both. And then as we encounter uses, we derive the missing scalar or vector variants from each other unless we already have a value for it. Please correct me if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

Anyhow, sorry about the somewhat random comments, I was just wondering what you guys thought about this.

In D23169#506561, @anemet wrote:

As we're piling more complexity on top of WidenMap, I feel that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

Right, that is confusing. We use "get" (now getScalar/getVector) to return an existing entry in the map or initialize an empty entry if it's not already there. We use "getVectorValue" to return an existing entry or construct a non-empty entry (by broadcasting or, now, by inserting scalar elements into a vector). getVector/getScalar should probably just be initializers. For example, I think we should be able to replace all but one instance of getVector with getVectorValue. Then, getVector/getScalar could be renamed to initVector/initScalar, which would assert or noop if we try to initialize a key more than once. What do you think?

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or, in some cases of int induction variables, both. And then as we encounter uses, we derive the missing scalar or vector variants from each other unless we already have a value for it. Please correct me if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

That's right. Sure I'll add some high-level comments.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

lib/Transforms/Vectorize/LoopVectorize.cpp
549 ↗	(On Diff #66850)	Sounds good to me.
630 ↗	(On Diff #66850)	I actually thought about doing this, but couldn't really come up with anything better. Do you have any ideas? What about VectorLoopMap?
2175 ↗	(On Diff #66850)	Makes sense.
2310 ↗	(On Diff #66850)	Sure.
2343 ↗	(On Diff #66850)	Nice question! This is for the VF == 1 case, and it prevents us from trying to extract an element from a non-vector type. I moved this check from the existing scalarization code (line 2827). But we could replace the if condition here with VF == 1 if that makes more sense. To be clear, if a value is not an instruction in the loop, it's handled by the first check. For the instructions in Uniforms, I believe we will vectorize/scalarize them, and then the unused pieces will be later deleted.

In D23169#506915, @mssimpso wrote:

In D23169#506561, @anemet wrote:

As we're piling more complexity on top of WidenMap, I feel that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

Right, that is confusing. We use "get" (now getScalar/getVector) to return an existing entry in the map or initialize an empty entry if it's not already there. We use "getVectorValue" to return an existing entry or construct a non-empty entry (by broadcasting or, now, by inserting scalar elements into a vector). getVector/getScalar should probably just be initializers. For example, I think we should be able to replace all but one instance of getVector with getVectorValue. Then, getVector/getScalar could be renamed to initVector/initScalar, which would assert or noop if we try to initialize a key more than once. What do you think?

That would be very nice.

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or, in some cases of int induction variables, both. And then as we encounter uses, we derive the missing scalar or vector variants from each other unless we already have a value for it. Please correct me if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

That's right. Sure I'll add some high-level comments.

Thanks.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs on the other hand probably does belong to WM because it's part of the whole picture how values are mapped. So probably making it stand-alone utility is the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.

Addressed comments from Michael and Adam.

Added "ScalarParts" type as a 2D vector.
Removed from WidenMap the "VectorParts &get" interface and replaced it with "VectorParts &initVector" and "ScalarParts &initScalar", as previously discussed. These initializers assert if a key has already been mapped, which should prevent data from being overwritten unintentionally. Because of this change, I had to slightly modify interleaved access vectorization since we vectorize all instructions in a group at once.
I also removed the "VectorParts &splat" interface since this is now just a special case of initVector.
The data in WidenMap can now only be accessed via getVectorValue and getScalarValue. Until we can move these functions inside WidenMap, we have to declare them as friends to get access to the data.
Renamed WidenMap to VectorLoopValueMap (but I'm still open to name suggestions).
Added high-level comments about the interaction between getVectorValue and getScalarValue.
Addressed other minor comments.
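The init-once contract described in this update can be sketched as follows. This is a toy under stated assumptions, not the patch's code: InitOnceMap and hasVector are hypothetical names, values are strings, and the sketch throws std::logic_error where the review describes an assert, purely so the rejection can be observed by a caller.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using VectorParts = std::vector<std::string>;

// Hypothetical map whose entries may be initialized exactly once.
class InitOnceMap {
  std::map<std::string, VectorParts> Map;

public:
  // Map Key to Parts and return the stored entry. A second initialization of
  // the same key is rejected instead of silently overwriting the first.
  const VectorParts &initVector(const std::string &Key,
                                const VectorParts &Parts) {
    auto Result = Map.emplace(Key, Parts);
    if (!Result.second)
      throw std::logic_error("key already mapped: " + Key);
    return Result.first->second;
  }

  bool hasVector(const std::string &Key) const { return Map.count(Key) != 0; }
};

// Returns true if a second initVector on the same key is rejected.
inline bool rejectsDoubleInit() {
  InitOnceMap M;
  M.initVector("x", {"x.0", "x.1"});
  assert(M.hasVector("x"));
  try {
    M.initVector("x", {"y.0", "y.1"});
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}
```

The point of the guard is the one made in the update note: once a key is mapped, later code cannot overwrite it by accident.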

Replaced unneeded interleaved access check in vectorizeMemoryInstruction with an assert.

Michael/Adam,

Do you have any more comments on this change? Thanks!

Matt, sorry about the delay.

lib/Transforms/Vectorize/LoopVectorize.cpp
545–553 ↗	(On Diff #67219)	If we can't change all users to use init, it would be better to keep both functionalities separate rather than hiding it behind a default parameter. Just keep the ugly get interface as well, so that the users are easy to find. Hopefully we can migrate all users to init in the future.
2587–2589 ↗	(On Diff #67219)	This is weird in terms of its interface use. getVectorValue computes values and then this one overrides it?! I think we should just stick with the &get interface here and clean it up in the future.
2679 ↗	(On Diff #67219)	Is this call necessary?
2859–2862 ↗	(On Diff #67219)	It would be good to explain the scenario under which we have already created a vector value for this value.

Matt, I also had some comments from earlier that I've never sent out. This was about the future direction of get{Scalar,Vector}Value:

In D23169#506915, @mssimpso wrote:

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs on the other hand probably does belong to WM because it's part of the whole picture how values are mapped. So probably making it stand-alone utility is the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.

Sorry for the delay, thanks for nudging me!

lib/Transforms/Vectorize/LoopVectorize.cpp
558 ↗	(On Diff #67219)	Do we need the Val parameter? It doesn't look like we call this with a non-default Val anywhere.
2322 ↗	(On Diff #67219)	This looks a bit odd. When does this happen? I don't think we call getVectorValue() for stores - and if we do, I'm not sure what the expected result is. If it doesn't happen, perhaps this ought to be an assert?
2368 ↗	(On Diff #67219)	I'd still appreciate a comment (and maybe an assert) that this is for VF==1.
2371 ↗	(On Diff #67219)	The comment is a bit odd, because the "Otherwise" looks like it refers to the previous comment ("If the value has not been scalarized...") while in fact it refers to the one before ("If the value from the original loop has not been vectorized").
2861 ↗	(On Diff #67219)	Just to make sure I'm not missing anything - the reason this is needed is because we eagerly created the vector entry in the beginning of vectorizeBlockInLoop(), right? We don't actually have a vector entry at this point, it's just an empty placeholder - but we need to delete it, so that if we need a real vector entry, it will get constructed from scratch from the scalars?

Michael/Adam,

Thanks very much for the comments! No problem about the delay - I'm happy to have the feedback. I've replied to your comments inline, and in one case asked a question about what we want the ValueMap interface change here to be.

In D23169#515712, @anemet wrote:

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs on the other hand probably does belong to WM because it's part of the whole picture how values are mapped. So probably making it stand-alone utility is the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.

Sorry to miss replying to these comments earlier. Yes, I think it makes sense to move the mapping related tasks out of ILV and into ValueMap. I think getBroadcastInstrs, since it's currently used by both ValueMap and ILV, could actually live in VectorUtils. I think it just needs to know where to insert the code.

lib/Transforms/Vectorize/LoopVectorize.cpp
545–553 ↗	(On Diff #67219)	I actually did change all users to either init(Vector|Scalar) or get(Vector|Scalar)Value. The default parameter here was added as a replacement for the "splat" interface, which I removed. I can add this back though. But looking over your other comments, I think you're wondering why getVectorValue returns a reference to the map entries that are then able to be overridden and set by the caller? This is basically the same as the original WidenMap.get. I agree, that's strange. So to be clear, you're suggesting that for now we keep the original "get" and "splat" interfaces alongside the new init* interfaces? Or are you suggesting that we abandon the init* interfaces for now as well?
558 ↗	(On Diff #67219)	No, I'll remove it. Thanks!
2322 ↗	(On Diff #67219)	I agree, I don't think this should happen. This was taken from the scalarization code, and I kept it this way to make the patch as NFC as possible. The only possible, but unlikely, issue I can think of is if we use presence in WidenMap as an indicator for "already vectorized". But again, I don't think we do this. I think it's perfectly sensible to add an assert. I'll update the patch.
2368 ↗	(On Diff #67219)	Sure, sounds good!
2371 ↗	(On Diff #67219)	The comments are confusing; I'll fix that. The intent was the following. Case 2: "value has been scalarized." Case 3: "value hasn't been scalarized, but also hasn't been vectorized (VF=1)". Case 4: "value has been vectorized".
2587–2589 ↗	(On Diff #67219)	Sure. To be clear, getVectorValue returns a reference to an entry (which can be overridden) if it exists in the map, and only computes values if it's not already there.
2679 ↗	(On Diff #67219)	Yes, the Entry values are set later on in the function. (The diff here is a bit misleading. Here, I had replaced WidenMap.get with getVectorValue).
2859–2862 ↗	(On Diff #67219)	I think Michael covered this in his comment. I'll update the comment here to explain why the erase is needed.
2861 ↗	(On Diff #67219)	That's right! I'll update the comment to better clarify this.

anemet added inline comments. Aug 15 2016, 7:59 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
545–553 ↗	(On Diff #67219)	Yes, the former. The init stuff is great; my hope is that it will slowly become the main interface along with get{Vector,Scalar}Value. I think we want to use 'get' when all we do is allocate the entries, 'init' when we actually set the value, and get*Value when we let the variants be derived from each other. I think I still don't understand initVector with Val=nullptr. Isn't that just &get? If yes, then I think that's what we should use for it since we're not setting up the entries. How much of this you want to put in this patch or leave for later improvements (by you or others) I leave to you. My goal is to use the interfaces that return mutable references as little as possible and restrain the current ones to return const & as much as possible.

Addressed comments from Adam and Michael.

This revision changes the ValueMap interface to more closely align with the spirit of Adam's comments. In particular, the init{Vector, Scalar} interface has been changed to accept a reference to {Vector, Scalar}Parts, which are then moved over to the actual map entries. Also, getVectorValue in InnerLoopVectorizer has been modified to return a constant reference.

Taking VectorizeBlockInLoop as an example, instead of getting an empty map entry with WidenMap.get() at the top of the function, we now just allocate a temporary VectorParts. Once we assign all the VectorParts entries, we then move the VectorParts to the map with initVector. So whenever we actually map a value, we use init{Vector, Scalar}. Also, because we no longer eagerly create empty mappings, eraseVector is no longer needed.

Most of the changes required to use initVector and the constant reference version of getVectorValue are fairly mechanical. One exception was interleaved accesses, where I had to reorder the code such that we didn't require mutable entries. In a few other cases, we still actually do need to change the mapped entries, and I've left getVector in ValueMap for this, which returns a non-constant reference. These cases include the "fix-up" operations that happen after the first phase of widening (i.e., type truncation and the second widening phase for recurrences).
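The access pattern described in this revision (fill a temporary, commit it once, read through a constant reference, and keep one mutable accessor for fix-ups) can be sketched like this. It is a toy under stated assumptions: ToyMap, widen, and truncateAll are hypothetical names, values are strings, and in the real patch getVectorValue lives in InnerLoopVectorizer rather than on the map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using VectorParts = std::vector<std::string>;

struct ToyMap {
  std::map<std::string, VectorParts> Map;

  // Commit a fully built entry; the key must not be mapped yet.
  void initVector(const std::string &Key, const VectorParts &Entry) {
    assert(Map.count(Key) == 0 && "key already mapped");
    Map.emplace(Key, Entry);
  }

  // Read-only access for ordinary users of a mapped value.
  const VectorParts &getVectorValue(const std::string &Key) const {
    return Map.at(Key);
  }

  // Mutable access, intended only for post-widening fix-ups.
  VectorParts &getVector(const std::string &Key) { return Map.at(Key); }
};

// Widen one toy instruction: fill a temporary, then commit it in one step,
// so the map never holds an empty placeholder entry.
inline void widen(ToyMap &M, const std::string &Name, unsigned UF) {
  VectorParts Entry(UF);
  for (unsigned Part = 0; Part < UF; ++Part)
    Entry[Part] = Name + ".wide." + std::to_string(Part);
  M.initVector(Name, Entry);
}

// A fix-up pass that rewrites mapped entries in place, standing in for
// operations such as type truncation.
inline void truncateAll(ToyMap &M, const std::string &Name) {
  for (std::string &Part : M.getVector(Name))
    Part += ".trunc";
}
```

Because nothing is mapped until the temporary is complete, there is no placeholder to erase, which matches the reason given for dropping eraseVector.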

All the other minor comments have been addressed as well. Thanks!

In D23169#519875, @mssimpso wrote:

Also, because we no longer eagerly create empty mappings, eraseVector is no longer needed.

Hurray!

This is all looking pretty nice now!

lib/Transforms/Vectorize/LoopVectorize.cpp
655 ↗	(On Diff #68578)	I don't think this is what you want. This will still do a copy because Entry is a const &. I think that this function should take Entry by value (or by r-value but that would make it harder to use).
659–661 ↗	(On Diff #68578)	Please add a comment discouraging the use of this function in favor of initVector.
2309 ↗	(On Diff #68578)	This should std::move Entry.
3830 ↗	(On Diff #68578)	I am not a big fan of auto in cases where the type is not obvious. At least we should have const. Same later.
4389 ↗	(On Diff #68578)	Would it be safer to do this inside the blocks? I want to make sure you didn't leave an Entry[] = ... without a corresponding initVector.
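The point made at line 655, that std::move applied to a const reference still copies, can be checked with a small counter type. This is a standalone sketch: Tracked, storeFromConstRef, and storeByValue are hypothetical names, and nothing here is taken from LoopVectorize.cpp.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper type that counts how it is constructed.
struct Tracked {
  static int Copies;
  static int Moves;
  Tracked() = default;
  Tracked(const Tracked &) { ++Copies; }
  Tracked(Tracked &&) noexcept { ++Moves; }
};
int Tracked::Copies = 0;
int Tracked::Moves = 0;

// std::move(Entry) has type const Tracked&&. That cannot bind to the move
// constructor's Tracked&& parameter, so overload resolution picks the copy
// constructor instead.
inline void storeFromConstRef(const Tracked &Entry) {
  Tracked Stored(std::move(Entry));
  (void)Stored;
}

// Taking the parameter by value gives the callee its own object, which it
// can genuinely move from.
inline void storeByValue(Tracked Entry) {
  Tracked Stored(std::move(Entry));
  (void)Stored;
}

// Resets the counters and exercises both variants.
inline void demo() {
  Tracked::Copies = 0;
  Tracked::Moves = 0;
  Tracked T;
  storeFromConstRef(T); // one copy, no move
  assert(Tracked::Copies == 1 && Tracked::Moves == 0);
  storeByValue(T); // one copy into the parameter, then one move
  assert(Tracked::Copies == 2 && Tracked::Moves == 1);
}
```

Whether a move is actually cheaper than a copy depends on the type; for a small vector holding its elements inline, the difference may be small.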

mssimpso mentioned this in D23509: [LoopVectorize] Query TTI when deciding to splat IV. Aug 22 2016, 9:42 AM

mssimpso added inline comments. Aug 22 2016, 10:25 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
655 ↗	(On Diff #68578)	Ah, you're right. Thanks for pointing this out. I'll update this to be by value.
659–661 ↗	(On Diff #68578)	Sure.
2309 ↗	(On Diff #68578)	Right.
3830 ↗	(On Diff #68578)	Sure, instead of auto, I'll just make these "const VectorParts &" to make things clear.
4389 ↗	(On Diff #68578)	I agree. There are a few cases that don't actually use Entry. I'll update the patch.

Addressed Adam's comments.

Regarding the moves, I think this may be a case of unneeded complexity. If the SmallVectors are sized appropriately, a move shouldn't be much better (if better at all) than a copy. So I've reversed my previous comment and left init{Vector, Scalar} taking a constant reference to {Vector, Scalar}Parts and just removed the std::move's, which were wrong as Adam pointed out. The next best thing, I think, would be if they accepted rvalues, but again, that would probably be overly complex for very little gain.

All remaining comments are addressed. Thanks!

Looks great to me! Thanks for your work.

This revision is now accepted and ready to land. Aug 23 2016, 9:39 AM

Closed by commit rL279649: [LV] Unify vector and scalar maps (authored by mssimpso). Aug 24 2016, 11:31 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

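Path	Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp	497 lines
llvm/trunk/test/Transforms/LoopVectorize/AArch64/arbitrary-induction-step.ll	4 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/scatter_crash.ll	96 lines
llvm/trunk/test/Transforms/LoopVectorize/if-pred-non-void.ll	20 lines
llvm/trunk/test/Transforms/LoopVectorize/if-pred-stores.ll	10 lines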

Diff 69147

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ 318 lines hidden @@ InnerLoopVectorizer(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
  LoopInfo *LI, DominatorTree *DT,
  const TargetLibraryInfo *TLI,
  const TargetTransformInfo *TTI, AssumptionCache *AC,
  OptimizationRemarkEmitter *ORE, unsigned VecWidth,
  unsigned UnrollFactor)
  : OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),
  AC(AC), ORE(ORE), VF(VecWidth), UF(UnrollFactor),
  Builder(PSE.getSE()->getContext()), Induction(nullptr),
- OldInduction(nullptr), WidenMap(UnrollFactor), TripCount(nullptr),
- VectorTripCount(nullptr), Legal(nullptr), AddedSafetyChecks(false) {}
+ OldInduction(nullptr), VectorLoopValueMap(UnrollFactor, VecWidth),
+ TripCount(nullptr), VectorTripCount(nullptr), Legal(nullptr),
+ AddedSafetyChecks(false) {}

  // Perform the actual loop widening (vectorization).
  // MinimumBitWidths maps scalar integer values to the smallest bitwidth they
  // can be validly truncated to. The cost model has assumed this truncation
  // will happen when vectorizing. VecValuesToIgnore contains scalar values
  // that the cost model has chosen to ignore because they will not be
  // vectorized.
  void vectorize(LoopVectorizationLegality *L,
@@ 10 lines hidden @@ public:
  // Return true if any runtime check is added.
  bool areSafetyChecksAdded() { return AddedSafetyChecks; }

  virtual ~InnerLoopVectorizer() {}

  protected:
  /// A small list of PHINodes.
  typedef SmallVector<PHINode *, 4> PhiVector;
- /// When we unroll loops we have multiple vector values for each scalar.
- /// This data structure holds the unrolled and vectorized values that
- /// originated from one scalar instruction.
+ /// A type for vectorized values in the new loop. Each value from the
+ /// original loop, when vectorized, is represented by UF vector values in the
+ /// new unrolled loop, where UF is the unroll factor.
  typedef SmallVector<Value *, 2> VectorParts;
+
+ /// A type for scalarized values in the new loop. Each value from the
+ /// original loop, when scalarized, is represented by UF x VF scalar values
+ /// in the new unrolled loop, where UF is the unroll factor and VF is the
+ /// vectorization factor.
+ typedef SmallVector<SmallVector<Value *, 4>, 2> ScalarParts;

  // When we if-convert we need to create edge masks. We have to cache values
  // so that we don't end up with exponential recursion/IR.
  typedef DenseMap<std::pair<BasicBlock *, BasicBlock *>, VectorParts>
  EdgeMaskCache;

  /// Create an empty loop, based on the loop ranges of the old loop.
  void createEmptyLoop();

@@ 88 lines hidden @@ protected:
  /// induction variable will first be truncated to the corresponding type. The
  /// widened values are placed in \p Entry.
  void widenIntInduction(PHINode *IV, VectorParts &Entry,
  TruncInst *Trunc = nullptr);

  /// Returns true if we should generate a scalar version of \p IV.
  bool needsScalarInduction(Instruction *IV) const;

- /// When we go over instructions in the basic block we rely on previous
- /// values within the current basic block or on loop invariant values.
- /// When we widen (vectorize) values we place them in the map. If the values
- /// are not within the map, they have to be loop invariant, so we simply
- /// broadcast them into a vector.
- VectorParts &getVectorValue(Value *V);
+ /// Return a constant reference to the VectorParts corresponding to \p V from
+ /// the original loop. If the value has already been vectorized, the
+ /// corresponding vector entry in VectorLoopValueMap is returned. If,
+ /// however, the value has a scalar entry in VectorLoopValueMap, we construct
+ /// new vector values on-demand by inserting the scalar values into vectors
+ /// with an insertelement sequence. If the value has been neither vectorized
+ /// nor scalarized, it must be loop invariant, so we simply broadcast the
+ /// value into vectors.
+ const VectorParts &getVectorValue(Value *V);
+
+ /// Return a value in the new loop corresponding to \p V from the original
+ /// loop at unroll index \p Part and vector index \p Lane. If the value has
+ /// been vectorized but not scalarized, the necessary extractelement
+ /// instruction will be generated.
+ Value *getScalarValue(Value *V, unsigned Part, unsigned Lane);

  /// Try to vectorize the interleaved access group that \p Instr belongs to.
  void vectorizeInterleaveGroup(Instruction *Instr);

  /// Generate a shuffle sequence that will reverse the vector Vec.
  virtual Value *reverseVector(Value *Vec);

  /// Returns (and creates if needed) the original loop trip count.
@@ 26 lines hidden @@ protected:
  /// addNewMetadata). Use this for newly created instructions in the vector
  /// loop.
  void addMetadata(Instruction *To, Instruction *From);

  /// \brief Similar to the previous function but it adds the metadata to a
  /// vector of instructions.
  void addMetadata(ArrayRef<Value *> To, Instruction *From);

- /// This is a helper class that holds the vectorizer state. It maps scalar
- /// instructions to vector instructions. When the code is 'unrolled' then
- /// then a single scalar value is mapped to multiple vector parts. The parts
- /// are stored in the VectorPart type.
+ /// This is a helper class for maintaining vectorization state. It's used for
+ /// mapping values from the original loop to their corresponding values in
+ /// the new loop. Two mappings are maintained: one for vectorized values and
+ /// one for scalarized values. Vectorized values are represented with UF
+ /// vector values in the new loop, and scalarized values are represented with
+ /// UF x VF scalar values in the new loop. UF and VF are the unroll and
+ /// vectorization factors, respectively.
+ ///
+ /// Entries can be added to either map with initVector and initScalar, which
+ /// initialize and return a constant reference to the new entry. If a
+ /// non-constant reference to a vector entry is required, getVector can be
+ /// used to retrieve a mutable entry. We currently directly modify the mapped
+ /// values during "fix-up" operations that occur once the first phase of
+ /// widening is complete. These operations include type truncation and the
+ /// second phase of recurrence widening.
+ ///
+ /// Otherwise, entries from either map should be accessed using the
+ /// getVectorValue or getScalarValue functions from InnerLoopVectorizer.
+ /// getVectorValue and getScalarValue coordinate to generate a vector or
+ /// scalar value on-demand if one is not yet available. When vectorizing a
+ /// loop, we visit the definition of an instruction before its uses. When
+ /// visiting the definition, we either vectorize or scalarize the
+ /// instruction, creating an entry for it in the corresponding map. (In some
+ /// cases, such as induction variables, we will create both vector and scalar
+ /// entries.) Then, as we encounter uses of the definition, we derive values
+ /// for each scalar or vector use unless such a value is already available.
+ /// For example, if we scalarize a definition and one of its uses is vector,
+ /// we build the required vector on-demand with an insertelement sequence
+ /// when visiting the use. Otherwise, if the use is scalar, we can use the
+ /// existing scalar definition.
struct ValueMap {		struct ValueMap {
/// C'tor. UnrollFactor controls the number of vectors ('parts') that
/// are mapped.		/// Construct an empty map with the given unroll and vectorization factors.
ValueMap(unsigned UnrollFactor) : UF(UnrollFactor) {}		ValueMap(unsigned UnrollFactor, unsigned VecWidth)
		: UF(UnrollFactor), VF(VecWidth) {
/// \return True if 'Key' is saved in the Value Map.		// The unroll and vectorization factors are only used in asserts builds
bool has(Value *Key) const { return MapStorage.count(Key); }		// to verify map entries are sized appropriately.
		(void)UF;
/// Initializes a new entry in the map. Sets all of the vector parts to the		(void)VF;
/// save value in 'Val'.		}
/// \return A reference to a vector with splat values.
VectorParts &splat(Value *Key, Value *Val) {		/// \return True if the map has a vector entry for \p Key.
VectorParts &Entry = MapStorage[Key];		bool hasVector(Value *Key) const { return VectorMapStorage.count(Key); }
Entry.assign(UF, Val);
return Entry;		/// \return True if the map has a scalar entry for \p Key.
}		bool hasScalar(Value *Key) const { return ScalarMapStorage.count(Key); }

///\return A reference to the value that is stored at 'Key'.		/// \brief Map \p Key to the given VectorParts \p Entry, and return a
VectorParts &get(Value *Key) {		/// constant reference to the new vector map entry. The given key should
VectorParts &Entry = MapStorage[Key];		/// not already be in the map, and the given VectorParts should be
if (Entry.empty())		/// correctly sized for the current unroll factor.
Entry.resize(UF);		const VectorParts &initVector(Value *Key, const VectorParts &Entry) {
assert(Entry.size() == UF);		assert(!hasVector(Key) && "Vector entry already initialized");
return Entry;		assert(Entry.size() == UF && "VectorParts has wrong dimensions");
}		VectorMapStorage[Key] = Entry;
		return VectorMapStorage[Key];
		}

		/// \brief Map \p Key to the given ScalarParts \p Entry, and return a
		/// constant reference to the new scalar map entry. The given key should
		/// not already be in the map, and the given ScalarParts should be
		/// correctly sized for the current unroll and vectorization factors.
		const ScalarParts &initScalar(Value *Key, const ScalarParts &Entry) {
		assert(!hasScalar(Key) && "Scalar entry already initialized");
		assert(Entry.size() == UF &&
		all_of(make_range(Entry.begin(), Entry.end()),
		[&](const SmallVectorImpl<Value *> &Values) -> bool {
		return Values.size() == VF;
		}) &&
		"ScalarParts has wrong dimensions");
		ScalarMapStorage[Key] = Entry;
		return ScalarMapStorage[Key];
		}

		/// \return A reference to the vector map entry corresponding to \p Key.
		/// The key should already be in the map. This function should only be used
		/// when it's necessary to update values that have already been vectorized.
		/// This is the case for "fix-up" operations including type truncation and
		/// the second phase of recurrence vectorization. If a non-const reference
		/// isn't required, getVectorValue should be used instead.
		VectorParts &getVector(Value *Key) {
		assert(hasVector(Key) && "Vector entry not initialized");
		return VectorMapStorage.find(Key)->second;
		}

		/// Retrieve an entry from the vector or scalar maps. The preferred way to
		/// access an existing mapped entry is with getVectorValue or
		/// getScalarValue from InnerLoopVectorizer. Until those functions can be
		/// moved inside ValueMap, we have to declare them as friends.
		friend const VectorParts &InnerLoopVectorizer::getVectorValue(Value *V);
		friend Value *InnerLoopVectorizer::getScalarValue(Value *V, unsigned Part,
		unsigned Lane);

private:		private:
/// The unroll factor. Each entry in the map stores this number of vector		/// The unroll factor. Each entry in the vector map contains UF vector
/// elements.		/// values.
unsigned UF;		unsigned UF;

/// Map storage. We use std::map and not DenseMap because insertions to a		/// The vectorization factor. Each entry in the scalar map contains UF x VF
/// dense map invalidates its iterators.		/// scalar values.
std::map<Value *, VectorParts> MapStorage;		unsigned VF;

		/// The vector and scalar map storage. We use std::map and not DenseMap
		/// because insertions to DenseMap invalidate its iterators.
		std::map<Value *, VectorParts> VectorMapStorage;
		std::map<Value *, ScalarParts> ScalarMapStorage;
};		};

/// The original loop.		/// The original loop.
Loop *OrigLoop;		Loop *OrigLoop;
/// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies		/// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies
/// dynamic knowledge to simplify SCEV expressions and converts them to a		/// dynamic knowledge to simplify SCEV expressions and converts them to a
/// more usable form.		/// more usable form.
PredicatedScalarEvolution &PSE;		PredicatedScalarEvolution &PSE;
[47 lines not shown]	protected:
BasicBlock *LoopScalarBody;		BasicBlock *LoopScalarBody;
/// A list of all bypass blocks. The first block is the entry of the loop.		/// A list of all bypass blocks. The first block is the entry of the loop.
SmallVector<BasicBlock *, 4> LoopBypassBlocks;		SmallVector<BasicBlock *, 4> LoopBypassBlocks;

/// The new Induction variable which was added to the new block.		/// The new Induction variable which was added to the new block.
PHINode *Induction;		PHINode *Induction;
/// The induction variable of the old basic block.		/// The induction variable of the old basic block.
PHINode *OldInduction;		PHINode *OldInduction;
/// Maps scalars to widened vectors.
ValueMap WidenMap;

/// A map of induction variables from the original loop to their		/// Maps values from the original loop to their corresponding values in the
/// corresponding VF * UF scalarized values in the vectorized loop. The		/// vectorized loop. A key value can map to either vector values, scalar
/// purpose of ScalarIVMap is similar to that of WidenMap. Whereas WidenMap		/// values or both kinds of values, depending on whether the key was
/// maps original loop values to their vector versions in the new loop,		/// vectorized and scalarized.
/// ScalarIVMap maps induction variables from the original loop that are not		ValueMap VectorLoopValueMap;
/// vectorized to their scalar equivalents in the vector loop. Maintaining a
/// separate map for scalarized induction variables allows us to avoid
/// unnecessary scalar-to-vector-to-scalar conversions.
DenseMap<Value *, SmallVector<Value *, 8>> ScalarIVMap;

/// Store instructions that should be predicated, as a pair		/// Store instructions that should be predicated, as a pair
/// <StoreInst, Predicate>		/// <StoreInst, Predicate>
SmallVector<std::pair<Instruction *, Value *>, 4> PredicatedInstructions;		SmallVector<std::pair<Instruction *, Value *>, 4> PredicatedInstructions;
EdgeMaskCache MaskCache;		EdgeMaskCache MaskCache;
/// Trip count of the original loop.		/// Trip count of the original loop.
Value *TripCount;		Value *TripCount;
/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))		/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))
[1,527 lines not shown]	void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
// We shouldn't have to build scalar steps if we aren't vectorizing.		// We shouldn't have to build scalar steps if we aren't vectorizing.
assert(VF > 1 && "VF should be greater than one");		assert(VF > 1 && "VF should be greater than one");

// Get the value type and ensure it and the step have the same integer type.		// Get the value type and ensure it and the step have the same integer type.
Type *ScalarIVTy = ScalarIV->getType()->getScalarType();		Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&		assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&
"Val and Step should have the same integer type");		"Val and Step should have the same integer type");

// Compute the scalar steps and save the results in ScalarIVMap.		// Compute the scalar steps and save the results in VectorLoopValueMap.
for (unsigned Part = 0; Part < UF; ++Part)		ScalarParts Entry(UF);
for (unsigned I = 0; I < VF; ++I) {		for (unsigned Part = 0; Part < UF; ++Part) {
auto *StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + I);		Entry[Part].resize(VF);
		for (unsigned Lane = 0; Lane < VF; ++Lane) {
		auto *StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + Lane);
auto *Mul = Builder.CreateMul(StartIdx, Step);		auto *Mul = Builder.CreateMul(StartIdx, Step);
auto *Add = Builder.CreateAdd(ScalarIV, Mul);		auto *Add = Builder.CreateAdd(ScalarIV, Mul);
ScalarIVMap[EntryVal].push_back(Add);		Entry[Part][Lane] = Add;
		}
}		}
		VectorLoopValueMap.initScalar(EntryVal, Entry);
}		}
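
The indexing in the new buildScalarSteps loop can be checked with plain integers. The sketch below is only a model under stated assumptions: `ToySteps` and `toyScalarSteps` are hypothetical names, 64-bit integers stand in for the IR values, and no LLVM API is used. It fills a UF x VF table in which lane `Lane` of part `Part` holds `ScalarIV + (VF * Part + Lane) * Step`, the same shape that initScalar asserts on.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ScalarParts: UF parts, each holding VF scalars.
using ToySteps = std::vector<std::vector<std::int64_t>>;

// Plain-integer model of the loop in buildScalarSteps. Each lane of each
// unroll part receives ScalarIV + (VF * Part + Lane) * Step.
ToySteps toyScalarSteps(std::int64_t ScalarIV, std::int64_t Step, unsigned UF,
                        unsigned VF) {
  assert(VF > 1 && "VF should be greater than one");
  ToySteps Entry(UF);
  for (unsigned Part = 0; Part < UF; ++Part) {
    Entry[Part].resize(VF);
    for (unsigned Lane = 0; Lane < VF; ++Lane) {
      std::int64_t StartIdx = static_cast<std::int64_t>(VF) * Part + Lane;
      Entry[Part][Lane] = ScalarIV + StartIdx * Step;
    }
  }
  return Entry;
}
```

With `ScalarIV = 10`, `Step = 3`, `UF = 2` and `VF = 4`, the entry for part 1, lane 2 is `10 + (4 * 1 + 2) * 3`. The old code stored the same values in one flat list indexed by `VF * Part + Width`, which is the index scalarizeInstruction used when reading ScalarIVMap.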

int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {		int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {
assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");		assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");
auto *SE = PSE.getSE();		auto *SE = PSE.getSE();
// Make sure that the pointer does not point to structs.		// Make sure that the pointer does not point to structs.
if (Ptr->getType()->getPointerElementType()->isAggregateType())		if (Ptr->getType()->getPointerElementType()->isAggregateType())
return 0;		return 0;
[75 lines not shown]	int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {

return 0;		return 0;
}		}

bool LoopVectorizationLegality::isUniform(Value *V) {		bool LoopVectorizationLegality::isUniform(Value *V) {
return LAI->isUniform(V);		return LAI->isUniform(V);
}		}

InnerLoopVectorizer::VectorParts &		const InnerLoopVectorizer::VectorParts &
InnerLoopVectorizer::getVectorValue(Value *V) {		InnerLoopVectorizer::getVectorValue(Value *V) {
assert(V != Induction && "The new induction variable should not be used.");		assert(V != Induction && "The new induction variable should not be used.");
assert(!V->getType()->isVectorTy() && "Can't widen a vector");		assert(!V->getType()->isVectorTy() && "Can't widen a vector");
		assert(!V->getType()->isVoidTy() && "Type does not produce a value");

// If we have a stride that is replaced by one, do it here.		// If we have a stride that is replaced by one, do it here.
if (Legal->hasStride(V))		if (Legal->hasStride(V))
V = ConstantInt::get(V->getType(), 1);		V = ConstantInt::get(V->getType(), 1);

// If we have this scalar in the map, return it.		// If we have this scalar in the map, return it.
if (WidenMap.has(V))		if (VectorLoopValueMap.hasVector(V))
return WidenMap.get(V);		return VectorLoopValueMap.VectorMapStorage[V];

		// If the value has not been vectorized, check if it has been scalarized
		// instead. If it has been scalarized, and we actually need the value in
		// vector form, we will construct the vector values on demand.
		if (VectorLoopValueMap.hasScalar(V)) {

		// Initialize a new vector map entry.
		VectorParts Entry(UF);

		// If we aren't vectorizing, we can just copy the scalar map values over to
		// the vector map.
		if (VF == 1) {
		for (unsigned Part = 0; Part < UF; ++Part)
		Entry[Part] = getScalarValue(V, Part, 0);
		return VectorLoopValueMap.initVector(V, Entry);
		}

		// However, if we are vectorizing, we need to construct the vector values
		// using insertelement instructions. Since the resulting vectors are stored
		// in VectorLoopValueMap, we will only generate the insertelements once.
		for (unsigned Part = 0; Part < UF; ++Part) {
		Value *Insert = UndefValue::get(VectorType::get(V->getType(), VF));
		for (unsigned Width = 0; Width < VF; ++Width)
		Insert = Builder.CreateInsertElement(
		Insert, getScalarValue(V, Part, Width), Builder.getInt32(Width));
		Entry[Part] = Insert;
		}
		return VectorLoopValueMap.initVector(V, Entry);
		}

// If this scalar is unknown, assume that it is a constant or that it is		// If this scalar is unknown, assume that it is a constant or that it is
// loop invariant. Broadcast V and save the value for future uses.		// loop invariant. Broadcast V and save the value for future uses.
Value *B = getBroadcastInstrs(V);		Value *B = getBroadcastInstrs(V);
return WidenMap.splat(V, B);		return VectorLoopValueMap.initVector(V, VectorParts(UF, B));
		}

		Value *InnerLoopVectorizer::getScalarValue(Value *V, unsigned Part,
		unsigned Lane) {

		// If the value is not an instruction contained in the loop, it should
		// already be scalar.
		if (OrigLoop->isLoopInvariant(V))
		return V;

		// If the value from the original loop has not been vectorized, it is
		// represented by UF x VF scalar values in the new loop. Return the requested
		// scalar value.
		if (VectorLoopValueMap.hasScalar(V))
		return VectorLoopValueMap.ScalarMapStorage[V][Part][Lane];

		// If the value has not been scalarized, get its entry in VectorLoopValueMap
		// for the given unroll part. If this entry is not a vector type (i.e., the
		// vectorization factor is one), there is no need to generate an
		// extractelement instruction.
		auto *U = getVectorValue(V)[Part];
		if (!U->getType()->isVectorTy()) {
		assert(VF == 1 && "Value not scalarized has non-vector type");
		return U;
		}

		// Otherwise, the value from the original loop has been vectorized and is
		// represented by UF vector values. Extract and return the requested scalar
		// value from the appropriate vector lane.
		return Builder.CreateExtractElement(U, Builder.getInt32(Lane));
}		}
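
How getVectorValue and getScalarValue coordinate can be modeled without LLVM. The following is a minimal sketch, not the patch's implementation: `ToyValueMap`, `ToyParts`, `NumInserts` and `NumExtracts` are hypothetical names, string keys and `int` lanes stand in for `Value *`, and the two counters stand in for emitted insertelement and extractelement instructions. The stride replacement and the broadcast fallback for loop-invariant values are deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for both VectorParts and ScalarParts: UF parts of VF
// lanes each. A "vector value" is one inner std::vector<int>.
using ToyParts = std::vector<std::vector<int>>;

struct ToyValueMap {
  unsigned UF, VF;
  std::map<std::string, ToyParts> VectorMap; // models VectorMapStorage
  std::map<std::string, ToyParts> ScalarMap; // models ScalarMapStorage
  unsigned NumInserts = 0;  // models insertelement instructions emitted
  unsigned NumExtracts = 0; // models extractelement instructions emitted

  ToyValueMap(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  // Return the vector form of Key. If only a scalar entry exists, build the
  // vectors lane by lane and cache them, so the inserts are emitted once.
  const ToyParts &getVectorValue(const std::string &Key) {
    auto It = VectorMap.find(Key);
    if (It != VectorMap.end())
      return It->second;
    const ToyParts &Scalars = ScalarMap.at(Key); // throws if Key is unknown
    ToyParts Entry(UF, std::vector<int>(VF));
    for (unsigned Part = 0; Part < UF; ++Part)
      for (unsigned Lane = 0; Lane < VF; ++Lane) {
        Entry[Part][Lane] = Scalars[Part][Lane];
        ++NumInserts;
      }
    return VectorMap.emplace(Key, std::move(Entry)).first->second;
  }

  // Return one scalar of Key. A scalar entry is used directly; otherwise the
  // lane is extracted from the vector entry. As in the patch, the extracted
  // scalar is not cached.
  int getScalarValue(const std::string &Key, unsigned Part, unsigned Lane) {
    auto It = ScalarMap.find(Key);
    if (It != ScalarMap.end())
      return It->second[Part][Lane];
    int Result = getVectorValue(Key)[Part][Lane];
    ++NumExtracts;
    return Result;
  }
};
```

In this model, a scalarized value with only scalar users never touches either counter, which corresponds to the scalar-to-vector-to-scalar round trip the patch removes. A vector user costs UF x VF inserts on first use and nothing afterwards.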

Value *InnerLoopVectorizer::reverseVector(Value *Vec) {		Value *InnerLoopVectorizer::reverseVector(Value *Vec) {
assert(Vec->getType()->isVectorTy() && "Invalid type");		assert(Vec->getType()->isVectorTy() && "Invalid type");
SmallVector<Constant *, 8> ShuffleMask;		SmallVector<Constant *, 8> ShuffleMask;
for (unsigned i = 0; i < VF; ++i)		for (unsigned i = 0; i < VF; ++i)
ShuffleMask.push_back(Builder.getInt32(VF - i - 1));		ShuffleMask.push_back(Builder.getInt32(VF - i - 1));

[141 lines not shown]	void InnerLoopVectorizer::vectorizeInterleaveGroup(Instruction *Instr) {
// Prepare for the vector type of the interleaved load/store.		// Prepare for the vector type of the interleaved load/store.
Type *ScalarTy = LI ? LI->getType() : SI->getValueOperand()->getType();		Type *ScalarTy = LI ? LI->getType() : SI->getValueOperand()->getType();
unsigned InterleaveFactor = Group->getFactor();		unsigned InterleaveFactor = Group->getFactor();
Type *VecTy = VectorType::get(ScalarTy, InterleaveFactor * VF);		Type *VecTy = VectorType::get(ScalarTy, InterleaveFactor * VF);
Type *PtrTy = VecTy->getPointerTo(Ptr->getType()->getPointerAddressSpace());		Type *PtrTy = VecTy->getPointerTo(Ptr->getType()->getPointerAddressSpace());

// Prepare for the new pointers.		// Prepare for the new pointers.
setDebugLocFromInst(Builder, Ptr);		setDebugLocFromInst(Builder, Ptr);
VectorParts &PtrParts = getVectorValue(Ptr);
SmallVector<Value *, 2> NewPtrs;		SmallVector<Value *, 2> NewPtrs;
unsigned Index = Group->getIndex(Instr);		unsigned Index = Group->getIndex(Instr);
for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
// Extract the pointer for current instruction from the pointer vector. A		Value *NewPtr = getScalarValue(Ptr, Part, Group->isReverse() ? VF - 1 : 0);
// reverse access uses the pointer in the last lane.
Value *NewPtr = Builder.CreateExtractElement(
PtrParts[Part],
Group->isReverse() ? Builder.getInt32(VF - 1) : Builder.getInt32(0));

// Notice current instruction could be any index. Need to adjust the address		// Notice current instruction could be any index. Need to adjust the address
// to the member of index 0.		// to the member of index 0.
//		//
// E.g. a = A[i+1]; // Member of index 1 (Current instruction)		// E.g. a = A[i+1]; // Member of index 1 (Current instruction)
// b = A[i]; // Member of index 0		// b = A[i]; // Member of index 0
// Current pointer is pointed to A[i+1], adjust it to A[i].		// Current pointer is pointed to A[i+1], adjust it to A[i].
//		//
// E.g. A[i+1] = a; // Member of index 1		// E.g. A[i+1] = a; // Member of index 1
// A[i] = b; // Member of index 0		// A[i] = b; // Member of index 0
// A[i+2] = c; // Member of index 2 (Current instruction)		// A[i+2] = c; // Member of index 2 (Current instruction)
// Current pointer is pointed to A[i+2], adjust it to A[i].		// Current pointer is pointed to A[i+2], adjust it to A[i].
NewPtr = Builder.CreateGEP(NewPtr, Builder.getInt32(-Index));		NewPtr = Builder.CreateGEP(NewPtr, Builder.getInt32(-Index));

// Cast to the vector pointer type.		// Cast to the vector pointer type.
NewPtrs.push_back(Builder.CreateBitCast(NewPtr, PtrTy));		NewPtrs.push_back(Builder.CreateBitCast(NewPtr, PtrTy));
}		}

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);
Value *UndefVec = UndefValue::get(VecTy);		Value *UndefVec = UndefValue::get(VecTy);

// Vectorize the interleaved load group.		// Vectorize the interleaved load group.
if (LI) {		if (LI) {

		// For each unroll part, create a wide load for the group.
		SmallVector<Value *, 2> NewLoads;
for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
Instruction *NewLoadInstr = Builder.CreateAlignedLoad(		auto *NewLoad = Builder.CreateAlignedLoad(
NewPtrs[Part], Group->getAlignment(), "wide.vec");		NewPtrs[Part], Group->getAlignment(), "wide.vec");
		addMetadata(NewLoad, Instr);
		NewLoads.push_back(NewLoad);
		}

for (unsigned i = 0; i < InterleaveFactor; i++) {		// For each member in the group, shuffle out the appropriate data from the
Instruction *Member = Group->getMember(i);		// wide loads.
		for (unsigned I = 0; I < InterleaveFactor; ++I) {
		Instruction *Member = Group->getMember(I);

// Skip the gaps in the group.		// Skip the gaps in the group.
if (!Member)		if (!Member)
continue;		continue;

Constant *StrideMask = getStridedMask(Builder, i, InterleaveFactor, VF);		VectorParts Entry(UF);
		Constant *StrideMask = getStridedMask(Builder, I, InterleaveFactor, VF);
		for (unsigned Part = 0; Part < UF; Part++) {
Value *StridedVec = Builder.CreateShuffleVector(		Value *StridedVec = Builder.CreateShuffleVector(
NewLoadInstr, UndefVec, StrideMask, "strided.vec");		NewLoads[Part], UndefVec, StrideMask, "strided.vec");

// If this member has different type, cast the result type.		// If this member has different type, cast the result type.
if (Member->getType() != ScalarTy) {		if (Member->getType() != ScalarTy) {
VectorType *OtherVTy = VectorType::get(Member->getType(), VF);		VectorType *OtherVTy = VectorType::get(Member->getType(), VF);
StridedVec = Builder.CreateBitOrPointerCast(StridedVec, OtherVTy);		StridedVec = Builder.CreateBitOrPointerCast(StridedVec, OtherVTy);
}		}

VectorParts &Entry = WidenMap.get(Member);
Entry[Part] =		Entry[Part] =
Group->isReverse() ? reverseVector(StridedVec) : StridedVec;		Group->isReverse() ? reverseVector(StridedVec) : StridedVec;
}		}
		VectorLoopValueMap.initVector(Member, Entry);
addMetadata(NewLoadInstr, Instr);
}		}
return;		return;
}		}

// The sub vector type for current instruction.		// The sub vector type for current instruction.
VectorType *SubVT = VectorType::get(ScalarTy, VF);		VectorType *SubVT = VectorType::get(ScalarTy, VF);

// Vectorize the interleaved store group.		// Vectorize the interleaved store group.
[72 lines not shown]	void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
bool Reverse = ConsecutiveStride < 0;		bool Reverse = ConsecutiveStride < 0;
bool CreateGatherScatter =		bool CreateGatherScatter =
!ConsecutiveStride && ((LI && Legal->isLegalMaskedGather(ScalarDataTy)) ||		!ConsecutiveStride && ((LI && Legal->isLegalMaskedGather(ScalarDataTy)) ||
(SI && Legal->isLegalMaskedScatter(ScalarDataTy)));		(SI && Legal->isLegalMaskedScatter(ScalarDataTy)));

if (!ConsecutiveStride && !CreateGatherScatter)		if (!ConsecutiveStride && !CreateGatherScatter)
return scalarizeInstruction(Instr);		return scalarizeInstruction(Instr);

Constant *Zero = Builder.getInt32(0);
VectorParts &Entry = WidenMap.get(Instr);
VectorParts VectorGep;		VectorParts VectorGep;

// Handle consecutive loads/stores.		// Handle consecutive loads/stores.
GetElementPtrInst *Gep = getGEPInstruction(Ptr);		GetElementPtrInst *Gep = getGEPInstruction(Ptr);
if (ConsecutiveStride) {		if (ConsecutiveStride) {
if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {		if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {
setDebugLocFromInst(Builder, Gep);		setDebugLocFromInst(Builder, Gep);
Value *PtrOperand = Gep->getPointerOperand();		auto *FirstBasePtr = getScalarValue(Gep->getPointerOperand(), 0, 0);
Value *FirstBasePtr = getVectorValue(PtrOperand)[0];
FirstBasePtr = Builder.CreateExtractElement(FirstBasePtr, Zero);

// Create the new GEP with the new induction variable.		// Create the new GEP with the new induction variable.
GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());		GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());
Gep2->setOperand(0, FirstBasePtr);		Gep2->setOperand(0, FirstBasePtr);
Gep2->setName("gep.indvar.base");		Gep2->setName("gep.indvar.base");
Ptr = Builder.Insert(Gep2);		Ptr = Builder.Insert(Gep2);
} else if (Gep) {		} else if (Gep) {
setDebugLocFromInst(Builder, Gep);		setDebugLocFromInst(Builder, Gep);
[14 lines not shown]	if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {
// Update last index or loop invariant instruction anchored in loop.		// Update last index or loop invariant instruction anchored in loop.
if (i == InductionOperand ||		if (i == InductionOperand ||
(GepOperandInst && OrigLoop->contains(GepOperandInst))) {		(GepOperandInst && OrigLoop->contains(GepOperandInst))) {
assert((i == InductionOperand ||		assert((i == InductionOperand ||
PSE.getSE()->isLoopInvariant(PSE.getSCEV(GepOperandInst),		PSE.getSE()->isLoopInvariant(PSE.getSCEV(GepOperandInst),
OrigLoop)) &&		OrigLoop)) &&
"Must be last index or loop invariant");		"Must be last index or loop invariant");

VectorParts &GEPParts = getVectorValue(GepOperand);		Gep2->setOperand(i, getScalarValue(GepOperand, 0, 0));

// If GepOperand is an induction variable, and there's a scalarized
// version of it available, use it. Otherwise, we will need to create
// an extractelement instruction.
Value *Index = ScalarIVMap.count(GepOperand)
? ScalarIVMap[GepOperand][0]
: Builder.CreateExtractElement(GEPParts[0], Zero);

Gep2->setOperand(i, Index);
Gep2->setName("gep.indvar.idx");		Gep2->setName("gep.indvar.idx");
}		}
}		}
Ptr = Builder.Insert(Gep2);		Ptr = Builder.Insert(Gep2);
} else { // No GEP		} else { // No GEP
// Use the induction element ptr.		// Use the induction element ptr.
assert(isa<PHINode>(Ptr) && "Invalid induction ptr");		assert(isa<PHINode>(Ptr) && "Invalid induction ptr");
setDebugLocFromInst(Builder, Ptr);		setDebugLocFromInst(Builder, Ptr);
VectorParts &PtrVal = getVectorValue(Ptr);		Ptr = getScalarValue(Ptr, 0, 0);
Ptr = Builder.CreateExtractElement(PtrVal[0], Zero);
}		}
} else {		} else {
// At this point we should vector version of GEP for Gather or Scatter		// At this point we should vector version of GEP for Gather or Scatter
assert(CreateGatherScatter && "The instruction should be scalarized");		assert(CreateGatherScatter && "The instruction should be scalarized");
if (Gep) {		if (Gep) {
// Vectorizing GEP, across UF parts. We want to get a vector value for base		// Vectorizing GEP, across UF parts. We want to get a vector value for base
// and each index that's defined inside the loop, even if it is		// and each index that's defined inside the loop, even if it is
// loop-invariant but wasn't hoisted out. Otherwise we want to keep them		// loop-invariant but wasn't hoisted out. Otherwise we want to keep them
[70 lines not shown]	for (unsigned Part = 0; Part < UF; ++Part) {
addMetadata(NewSI, SI);		addMetadata(NewSI, SI);
}		}
return;		return;
}		}

// Handle loads.		// Handle loads.
assert(LI && "Must have a load instruction");		assert(LI && "Must have a load instruction");
setDebugLocFromInst(Builder, LI);		setDebugLocFromInst(Builder, LI);
		VectorParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Instruction *NewLI;		Instruction *NewLI;
if (CreateGatherScatter) {		if (CreateGatherScatter) {
Value *MaskPart = Legal->isMaskRequired(LI) ? Mask[Part] : nullptr;		Value *MaskPart = Legal->isMaskRequired(LI) ? Mask[Part] : nullptr;
NewLI = Builder.CreateMaskedGather(VectorGep[Part], Alignment, MaskPart,		NewLI = Builder.CreateMaskedGather(VectorGep[Part], Alignment, MaskPart,
0, "wide.masked.gather");		0, "wide.masked.gather");
Entry[Part] = NewLI;		Entry[Part] = NewLI;
} else {		} else {
[16 lines not shown]	if (CreateGatherScatter) {
UndefValue::get(DataTy),		UndefValue::get(DataTy),
"wide.masked.load");		"wide.masked.load");
else		else
NewLI = Builder.CreateAlignedLoad(VecPtr, Alignment, "wide.load");		NewLI = Builder.CreateAlignedLoad(VecPtr, Alignment, "wide.load");
Entry[Part] = Reverse ? reverseVector(NewLI) : NewLI;		Entry[Part] = Reverse ? reverseVector(NewLI) : NewLI;
}		}
addMetadata(NewLI, LI);		addMetadata(NewLI, LI);
}		}
		VectorLoopValueMap.initVector(Instr, Entry);
}		}

void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,		void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,
bool IfPredicateInstr) {		bool IfPredicateInstr) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
DEBUG(dbgs() << "LV: Scalarizing"		DEBUG(dbgs() << "LV: Scalarizing"
<< (IfPredicateInstr ? " and predicating:" : ":") << *Instr		<< (IfPredicateInstr ? " and predicating:" : ":") << *Instr
<< '\n');		<< '\n');
// Holds vector parameters or scalars, in case of uniform vals.		// Holds vector parameters or scalars, in case of uniform vals.
SmallVector<VectorParts, 4> Params;		SmallVector<VectorParts, 4> Params;

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);

// Find all of the vectorized parameters.
for (Value *SrcOp : Instr->operands()) {
// If we are accessing the old induction variable, use the new one.
if (SrcOp == OldInduction) {
Params.push_back(getVectorValue(SrcOp));
continue;
}

// Try using previously calculated values.
auto *SrcInst = dyn_cast<Instruction>(SrcOp);

// If the src is an instruction that appeared earlier in the basic block,
// then it should already be vectorized.
if (SrcInst && OrigLoop->contains(SrcInst)) {
assert(WidenMap.has(SrcInst) && "Source operand is unavailable");
// The parameter is a vector value from earlier.
Params.push_back(WidenMap.get(SrcInst));
} else {
// The parameter is a scalar from outside the loop. Maybe even a constant.
VectorParts Scalars;
Scalars.append(UF, SrcOp);
Params.push_back(Scalars);
}
}

assert(Params.size() == Instr->getNumOperands() &&
"Invalid number of operands");

// Does this instruction return a value ?		// Does this instruction return a value ?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

Value *UndefVec =		// Initialize a new scalar map entry.
IsVoidRetTy ? nullptr		ScalarParts Entry(UF);
: UndefValue::get(VectorType::get(Instr->getType(), VF));
// Create a new entry in the WidenMap and initialize it to Undef or Null.
VectorParts &VecResults = WidenMap.splat(Instr, UndefVec);

VectorParts Cond;		VectorParts Cond;
if (IfPredicateInstr) {		if (IfPredicateInstr) {
assert(Instr->getParent()->getSinglePredecessor() &&		assert(Instr->getParent()->getSinglePredecessor() &&
"Only support single predecessor blocks");		"Only support single predecessor blocks");
Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),		Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),
Instr->getParent());		Instr->getParent());
}		}

// For each vector unroll 'part':		// For each vector unroll 'part':
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
		Entry[Part].resize(VF);
// For each scalar that we create:		// For each scalar that we create:
for (unsigned Width = 0; Width < VF; ++Width) {		for (unsigned Width = 0; Width < VF; ++Width) {

// Start if-block.		// Start if-block.
Value *Cmp = nullptr;		Value *Cmp = nullptr;
if (IfPredicateInstr) {		if (IfPredicateInstr) {
Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Width));		Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Width));
Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,		Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,
ConstantInt::get(Cmp->getType(), 1));		ConstantInt::get(Cmp->getType(), 1));
}		}

Instruction *Cloned = Instr->clone();		Instruction *Cloned = Instr->clone();
if (!IsVoidRetTy)		if (!IsVoidRetTy)
Cloned->setName(Instr->getName() + ".cloned");		Cloned->setName(Instr->getName() + ".cloned");
// Replace the operands of the cloned instructions with extracted scalars.
for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {

// If the operand is an induction variable, and there's a scalarized		// Replace the operands of the cloned instructions with their scalar
// version of it available, use it. Otherwise, we will need to create		// equivalents in the new loop.
// an extractelement instruction if vectorizing.		for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
auto *NewOp = Params[op][Part];		auto *NewOp = getScalarValue(Instr->getOperand(op), Part, Width);
auto *ScalarOp = Instr->getOperand(op);
if (ScalarIVMap.count(ScalarOp))
NewOp = ScalarIVMap[ScalarOp][VF * Part + Width];
else if (NewOp->getType()->isVectorTy())
NewOp = Builder.CreateExtractElement(NewOp, Builder.getInt32(Width));
Cloned->setOperand(op, NewOp);		Cloned->setOperand(op, NewOp);
}		}
addNewMetadata(Cloned, Instr);		addNewMetadata(Cloned, Instr);

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

		// Add the cloned scalar to the scalar map entry.
		Entry[Part][Width] = Cloned;

// If we just cloned a new assumption, add it the assumption cache.		// If we just cloned a new assumption, add it the assumption cache.
if (auto *II = dyn_cast<IntrinsicInst>(Cloned))		if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
if (II->getIntrinsicID() == Intrinsic::assume)		if (II->getIntrinsicID() == Intrinsic::assume)
AC->registerAssumption(II);		AC->registerAssumption(II);

// If the original scalar returns a value we need to place it in a vector
// so that future users will be able to use it.
if (!IsVoidRetTy)
VecResults[Part] = Builder.CreateInsertElement(VecResults[Part], Cloned,
Builder.getInt32(Width));
// End if-block.		// End if-block.
if (IfPredicateInstr)		if (IfPredicateInstr)
PredicatedInstructions.push_back(std::make_pair(Cloned, Cmp));		PredicatedInstructions.push_back(std::make_pair(Cloned, Cmp));
}		}
}		}
		VectorLoopValueMap.initScalar(Instr, Entry);
}		}

PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,		PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
Value *End, Value *Step,		Value *End, Value *Step,
Instruction *DL) {		Instruction *DL) {
BasicBlock *Header = L->getHeader();		BasicBlock *Header = L->getHeader();
BasicBlock *Latch = L->getLoopLatch();		BasicBlock *Latch = L->getLoopLatch();
// As we're just creating this loop, it's possible no latch exists		// As we're just creating this loop, it's possible no latch exists
[... 667 lines not shown ...]

void InnerLoopVectorizer::truncateToMinimalBitwidths() {		void InnerLoopVectorizer::truncateToMinimalBitwidths() {
// For every instruction `I` in MinBWs, truncate the operands, create a		// For every instruction `I` in MinBWs, truncate the operands, create a
// truncated version of `I` and reextend its result. InstCombine runs		// truncated version of `I` and reextend its result. InstCombine runs
// later and will remove any ext/trunc pairs.		// later and will remove any ext/trunc pairs.
//		//
SmallPtrSet<Value *, 4> Erased;		SmallPtrSet<Value *, 4> Erased;
for (const auto &KV : *MinBWs) {		for (const auto &KV : *MinBWs) {
VectorParts &Parts = WidenMap.get(KV.first);		VectorParts &Parts = VectorLoopValueMap.getVector(KV.first);
for (Value *&I : Parts) {		for (Value *&I : Parts) {
if (Erased.count(I) || I->use_empty() || !isa<Instruction>(I))		if (Erased.count(I) || I->use_empty() || !isa<Instruction>(I))
continue;		continue;
Type *OriginalTy = I->getType();		Type *OriginalTy = I->getType();
Type *ScalarTruncatedTy =		Type *ScalarTruncatedTy =
IntegerType::get(OriginalTy->getContext(), KV.second);		IntegerType::get(OriginalTy->getContext(), KV.second);
Type *TruncatedTy = VectorType::get(ScalarTruncatedTy,		Type *TruncatedTy = VectorType::get(ScalarTruncatedTy,
OriginalTy->getVectorNumElements());		OriginalTy->getVectorNumElements());
[... 75 lines not shown ...]	for (Value *&I : Parts) {
cast<Instruction>(I)->eraseFromParent();		cast<Instruction>(I)->eraseFromParent();
Erased.insert(I);		Erased.insert(I);
I = Res;		I = Res;
}		}
}		}

// We'll have created a bunch of ZExts that are now parentless. Clean up.		// We'll have created a bunch of ZExts that are now parentless. Clean up.
for (const auto &KV : *MinBWs) {		for (const auto &KV : *MinBWs) {
VectorParts &Parts = WidenMap.get(KV.first);		VectorParts &Parts = VectorLoopValueMap.getVector(KV.first);
for (Value *&I : Parts) {		for (Value *&I : Parts) {
ZExtInst *Inst = dyn_cast<ZExtInst>(I);		ZExtInst *Inst = dyn_cast<ZExtInst>(I);
if (Inst && Inst->use_empty()) {		if (Inst && Inst->use_empty()) {
Value *NewI = Inst->getOperand(0);		Value *NewI = Inst->getOperand(0);
Inst->eraseFromParent();		Inst->eraseFromParent();
I = NewI;		I = NewI;
}		}
}		}
[... 61 lines not shown ...]	for (PHINode *Phi : PHIsToFix) {

// We need to generate a reduction vector from the incoming scalar.		// We need to generate a reduction vector from the incoming scalar.
// To do so, we need to generate the 'identity' vector and override		// To do so, we need to generate the 'identity' vector and override
// one of the elements with the incoming scalar reduction. We need		// one of the elements with the incoming scalar reduction. We need
// to do it in the vector-loop preheader.		// to do it in the vector-loop preheader.
Builder.SetInsertPoint(LoopBypassBlocks[1]->getTerminator());		Builder.SetInsertPoint(LoopBypassBlocks[1]->getTerminator());

// This is the vector-clone of the value that leaves the loop.		// This is the vector-clone of the value that leaves the loop.
VectorParts &VectorExit = getVectorValue(LoopExitInst);		const VectorParts &VectorExit = getVectorValue(LoopExitInst);
Type *VecTy = VectorExit[0]->getType();		Type *VecTy = VectorExit[0]->getType();

// Find the reduction identity variable. Zero for addition, or, xor,		// Find the reduction identity variable. Zero for addition, or, xor,
// one for multiplication, -1 for And.		// one for multiplication, -1 for And.
Value *Identity;		Value *Identity;
Value *VectorStart;		Value *VectorStart;
if (RK == RecurrenceDescriptor::RK_IntegerMinMax ||		if (RK == RecurrenceDescriptor::RK_IntegerMinMax ||
RK == RecurrenceDescriptor::RK_FloatMinMax) {		RK == RecurrenceDescriptor::RK_FloatMinMax) {
[... 22 lines not shown ...]	if (RK == RecurrenceDescriptor::RK_IntegerMinMax ||
Builder.CreateInsertElement(Identity, ReductionStartValue, Zero);		Builder.CreateInsertElement(Identity, ReductionStartValue, Zero);
}		}
}		}

// Fix the vector-loop phi.		// Fix the vector-loop phi.

// Reductions do not have to start at zero. They can start with		// Reductions do not have to start at zero. They can start with
// any loop invariant values.		// any loop invariant values.
VectorParts &VecRdxPhi = WidenMap.get(Phi);		const VectorParts &VecRdxPhi = getVectorValue(Phi);
BasicBlock *Latch = OrigLoop->getLoopLatch();		BasicBlock *Latch = OrigLoop->getLoopLatch();
Value *LoopVal = Phi->getIncomingValueForBlock(Latch);		Value *LoopVal = Phi->getIncomingValueForBlock(Latch);
VectorParts &Val = getVectorValue(LoopVal);		const VectorParts &Val = getVectorValue(LoopVal);
for (unsigned part = 0; part < UF; ++part) {		for (unsigned part = 0; part < UF; ++part) {
// Make sure to add the reduction stat value only to the		// Make sure to add the reduction stat value only to the
// first unroll part.		// first unroll part.
Value *StartVal = (part == 0) ? VectorStart : Identity;		Value *StartVal = (part == 0) ? VectorStart : Identity;
cast<PHINode>(VecRdxPhi[part])		cast<PHINode>(VecRdxPhi[part])
->addIncoming(StartVal, LoopVectorPreHeader);		->addIncoming(StartVal, LoopVectorPreHeader);
cast<PHINode>(VecRdxPhi[part])		cast<PHINode>(VecRdxPhi[part])
->addIncoming(Val[part], LoopVectorBody);		->addIncoming(Val[part], LoopVectorBody);
}		}

// Before each round, move the insertion point right between		// Before each round, move the insertion point right between
// the PHIs and the values we are going to write.		// the PHIs and the values we are going to write.
// This allows us to write both PHINodes and the extractelement		// This allows us to write both PHINodes and the extractelement
// instructions.		// instructions.
Builder.SetInsertPoint(&*LoopMiddleBlock->getFirstInsertionPt());		Builder.SetInsertPoint(&*LoopMiddleBlock->getFirstInsertionPt());

VectorParts RdxParts = getVectorValue(LoopExitInst);		VectorParts &RdxParts = VectorLoopValueMap.getVector(LoopExitInst);
setDebugLocFromInst(Builder, LoopExitInst);		setDebugLocFromInst(Builder, LoopExitInst);

// If the vector reduction can be performed in a smaller type, we truncate		// If the vector reduction can be performed in a smaller type, we truncate
// then extend the loop exit value to enable InstCombine to evaluate the		// then extend the loop exit value to enable InstCombine to evaluate the
// entire expression in the smaller type.		// entire expression in the smaller type.
if (VF > 1 && Phi->getType() != RdxDesc.getRecurrenceType()) {		if (VF > 1 && Phi->getType() != RdxDesc.getRecurrenceType()) {
Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);		Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);
Builder.SetInsertPoint(LoopVectorBody->getTerminator());		Builder.SetInsertPoint(LoopVectorBody->getTerminator());
[... 192 lines not shown ...]	if (VF > 1) {
Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());		Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());
VectorInit = Builder.CreateInsertElement(		VectorInit = Builder.CreateInsertElement(
UndefValue::get(VectorType::get(VectorInit->getType(), VF)), VectorInit,		UndefValue::get(VectorType::get(VectorInit->getType(), VF)), VectorInit,
Builder.getInt32(VF - 1), "vector.recur.init");		Builder.getInt32(VF - 1), "vector.recur.init");
}		}

// We constructed a temporary phi node in the first phase of vectorization.		// We constructed a temporary phi node in the first phase of vectorization.
// This phi node will eventually be deleted.		// This phi node will eventually be deleted.
auto &PhiParts = getVectorValue(Phi);		VectorParts &PhiParts = VectorLoopValueMap.getVector(Phi);
Builder.SetInsertPoint(cast<Instruction>(PhiParts[0]));		Builder.SetInsertPoint(cast<Instruction>(PhiParts[0]));

// Create a phi node for the new recurrence. The current value will either be		// Create a phi node for the new recurrence. The current value will either be
// the initial value inserted into a vector or loop-varying vector value.		// the initial value inserted into a vector or loop-varying vector value.
auto *VecPhi = Builder.CreatePHI(VectorInit->getType(), 2, "vector.recur");		auto *VecPhi = Builder.CreatePHI(VectorInit->getType(), 2, "vector.recur");
VecPhi->addIncoming(VectorInit, LoopVectorPreHeader);		VecPhi->addIncoming(VectorInit, LoopVectorPreHeader);

// Get the vectorized previous value. We ensured the previous values was an		// Get the vectorized previous value. We ensured the previous values was an
[... 293 lines not shown ...]	if (P->getParent() != OrigLoop->getHeader()) {

// Generate a sequence of selects of the form:		// Generate a sequence of selects of the form:
// SELECT(Mask3, In3,		// SELECT(Mask3, In3,
// SELECT(Mask2, In2,		// SELECT(Mask2, In2,
// ( ...)))		// ( ...)))
for (unsigned In = 0; In < NumIncoming; In++) {		for (unsigned In = 0; In < NumIncoming; In++) {
VectorParts Cond =		VectorParts Cond =
createEdgeMask(P->getIncomingBlock(In), P->getParent());		createEdgeMask(P->getIncomingBlock(In), P->getParent());
VectorParts &In0 = getVectorValue(P->getIncomingValue(In));		const VectorParts &In0 = getVectorValue(P->getIncomingValue(In));

for (unsigned part = 0; part < UF; ++part) {		for (unsigned part = 0; part < UF; ++part) {
// We might have single edge PHIs (blocks) - use an identity		// We might have single edge PHIs (blocks) - use an identity
// 'select' for the first PHI operand.		// 'select' for the first PHI operand.
if (In == 0)		if (In == 0)
Entry[part] = Builder.CreateSelect(Cond[part], In0[part], In0[part]);		Entry[part] = Builder.CreateSelect(Cond[part], In0[part], In0[part]);
else		else
// Select between the current value and the previous incoming edge		// Select between the current value and the previous incoming edge
[... 93 lines not shown ...]	static bool mayDivideByZero(Instruction &I) {
Value *Divisor = I.getOperand(1);		Value *Divisor = I.getOperand(1);
auto *CInt = dyn_cast<ConstantInt>(Divisor);		auto *CInt = dyn_cast<ConstantInt>(Divisor);
return !CInt || CInt->isZero();		return !CInt || CInt->isZero();
}		}

void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {		void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {
// For each instruction in the old loop.		// For each instruction in the old loop.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
VectorParts &Entry = WidenMap.get(&I);

switch (I.getOpcode()) {		switch (I.getOpcode()) {
case Instruction::Br:		case Instruction::Br:
// Nothing to do for PHIs and BR, since we already took care of the		// Nothing to do for PHIs and BR, since we already took care of the
// loop control flow instructions.		// loop control flow instructions.
continue;		continue;
case Instruction::PHI: {		case Instruction::PHI: {
// Vectorize PHINodes.		// Vectorize PHINodes.
		VectorParts Entry(UF);
widenPHIInstruction(&I, Entry, UF, VF, PV);		widenPHIInstruction(&I, Entry, UF, VF, PV);
		VectorLoopValueMap.initVector(&I, Entry);
continue;		continue;
} // End of PHI.		} // End of PHI.

case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::SRem:		case Instruction::SRem:
case Instruction::URem:		case Instruction::URem:
// Scalarize with predication if this instruction may divide by zero and		// Scalarize with predication if this instruction may divide by zero and
[... 14 lines not shown ...]	for (Instruction &I : *BB) {
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
// Just widen binops.		// Just widen binops.
auto *BinOp = cast<BinaryOperator>(&I);		auto *BinOp = cast<BinaryOperator>(&I);
setDebugLocFromInst(Builder, BinOp);		setDebugLocFromInst(Builder, BinOp);
VectorParts &A = getVectorValue(BinOp->getOperand(0));		const VectorParts &A = getVectorValue(BinOp->getOperand(0));
VectorParts &B = getVectorValue(BinOp->getOperand(1));		const VectorParts &B = getVectorValue(BinOp->getOperand(1));

// Use this vector value for all users of the original instruction.		// Use this vector value for all users of the original instruction.
		VectorParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *V = Builder.CreateBinOp(BinOp->getOpcode(), A[Part], B[Part]);		Value *V = Builder.CreateBinOp(BinOp->getOpcode(), A[Part], B[Part]);

if (BinaryOperator *VecOp = dyn_cast<BinaryOperator>(V))		if (BinaryOperator *VecOp = dyn_cast<BinaryOperator>(V))
VecOp->copyIRFlags(BinOp);		VecOp->copyIRFlags(BinOp);

Entry[Part] = V;		Entry[Part] = V;
}		}

		VectorLoopValueMap.initVector(&I, Entry);
addMetadata(Entry, BinOp);		addMetadata(Entry, BinOp);
break;		break;
}		}
case Instruction::Select: {		case Instruction::Select: {
// Widen selects.		// Widen selects.
// If the selector is loop invariant we can create a select		// If the selector is loop invariant we can create a select
// instruction with a scalar condition. Otherwise, use vector-select.		// instruction with a scalar condition. Otherwise, use vector-select.
auto *SE = PSE.getSE();		auto *SE = PSE.getSE();
bool InvariantCond =		bool InvariantCond =
SE->isLoopInvariant(PSE.getSCEV(I.getOperand(0)), OrigLoop);		SE->isLoopInvariant(PSE.getSCEV(I.getOperand(0)), OrigLoop);
setDebugLocFromInst(Builder, &I);		setDebugLocFromInst(Builder, &I);

// The condition can be loop invariant but still defined inside the		// The condition can be loop invariant but still defined inside the
// loop. This means that we can't just use the original 'cond' value.		// loop. This means that we can't just use the original 'cond' value.
// We have to take the 'vectorized' value and pick the first lane.		// We have to take the 'vectorized' value and pick the first lane.
// Instcombine will make this a no-op.		// Instcombine will make this a no-op.
VectorParts &Cond = getVectorValue(I.getOperand(0));		const VectorParts &Cond = getVectorValue(I.getOperand(0));
VectorParts &Op0 = getVectorValue(I.getOperand(1));		const VectorParts &Op0 = getVectorValue(I.getOperand(1));
VectorParts &Op1 = getVectorValue(I.getOperand(2));		const VectorParts &Op1 = getVectorValue(I.getOperand(2));

Value *ScalarCond =		auto *ScalarCond = getScalarValue(I.getOperand(0), 0, 0);
(VF == 1)
? Cond[0]
: Builder.CreateExtractElement(Cond[0], Builder.getInt32(0));

		VectorParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part] = Builder.CreateSelect(		Entry[Part] = Builder.CreateSelect(
InvariantCond ? ScalarCond : Cond[Part], Op0[Part], Op1[Part]);		InvariantCond ? ScalarCond : Cond[Part], Op0[Part], Op1[Part]);
}		}

		VectorLoopValueMap.initVector(&I, Entry);
addMetadata(Entry, &I);		addMetadata(Entry, &I);
break;		break;
}		}

case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::FCmp: {		case Instruction::FCmp: {
// Widen compares. Generate vector compares.		// Widen compares. Generate vector compares.
bool FCmp = (I.getOpcode() == Instruction::FCmp);		bool FCmp = (I.getOpcode() == Instruction::FCmp);
auto *Cmp = dyn_cast<CmpInst>(&I);		auto *Cmp = dyn_cast<CmpInst>(&I);
setDebugLocFromInst(Builder, Cmp);		setDebugLocFromInst(Builder, Cmp);
VectorParts &A = getVectorValue(Cmp->getOperand(0));		const VectorParts &A = getVectorValue(Cmp->getOperand(0));
VectorParts &B = getVectorValue(Cmp->getOperand(1));		const VectorParts &B = getVectorValue(Cmp->getOperand(1));
		VectorParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *C = nullptr;		Value *C = nullptr;
if (FCmp) {		if (FCmp) {
C = Builder.CreateFCmp(Cmp->getPredicate(), A[Part], B[Part]);		C = Builder.CreateFCmp(Cmp->getPredicate(), A[Part], B[Part]);
cast<FCmpInst>(C)->copyFastMathFlags(Cmp);		cast<FCmpInst>(C)->copyFastMathFlags(Cmp);
} else {		} else {
C = Builder.CreateICmp(Cmp->getPredicate(), A[Part], B[Part]);		C = Builder.CreateICmp(Cmp->getPredicate(), A[Part], B[Part]);
}		}
Entry[Part] = C;		Entry[Part] = C;
}		}

		VectorLoopValueMap.initVector(&I, Entry);
addMetadata(Entry, &I);		addMetadata(Entry, &I);
break;		break;
}		}

case Instruction::Store:		case Instruction::Store:
case Instruction::Load:		case Instruction::Load:
vectorizeMemoryInstruction(&I);		vectorizeMemoryInstruction(&I);
break;		break;
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
auto *CI = dyn_cast<CastInst>(&I);		auto *CI = dyn_cast<CastInst>(&I);
setDebugLocFromInst(Builder, CI);		setDebugLocFromInst(Builder, CI);
		VectorParts Entry(UF);

// Optimize the special case where the source is a constant integer		// Optimize the special case where the source is a constant integer
// induction variable. Notice that we can only optimize the 'trunc' case		// induction variable. Notice that we can only optimize the 'trunc' case
// because (a) FP conversions lose precision, (b) sext/zext may wrap, and		// because (a) FP conversions lose precision, (b) sext/zext may wrap, and
// (c) other casts depend on pointer size.		// (c) other casts depend on pointer size.
auto ID = Legal->getInductionVars()->lookup(OldInduction);		auto ID = Legal->getInductionVars()->lookup(OldInduction);
if (isa<TruncInst>(CI) && CI->getOperand(0) == OldInduction &&		if (isa<TruncInst>(CI) && CI->getOperand(0) == OldInduction &&
ID.getConstIntStepValue()) {		ID.getConstIntStepValue()) {
widenIntInduction(OldInduction, Entry, cast<TruncInst>(CI));		widenIntInduction(OldInduction, Entry, cast<TruncInst>(CI));
		VectorLoopValueMap.initVector(&I, Entry);
addMetadata(Entry, &I);		addMetadata(Entry, &I);
break;		break;
}		}

/// Vectorize casts.		/// Vectorize casts.
Type *DestTy =		Type *DestTy =
(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);		(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);

VectorParts &A = getVectorValue(CI->getOperand(0));		const VectorParts &A = getVectorValue(CI->getOperand(0));
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part] = Builder.CreateCast(CI->getOpcode(), A[Part], DestTy);		Entry[Part] = Builder.CreateCast(CI->getOpcode(), A[Part], DestTy);
		VectorLoopValueMap.initVector(&I, Entry);
addMetadata(Entry, &I);		addMetadata(Entry, &I);
break;		break;
}		}

case Instruction::Call: {		case Instruction::Call: {
// Ignore dbg intrinsics.		// Ignore dbg intrinsics.
if (isa<DbgInfoIntrinsic>(I))		if (isa<DbgInfoIntrinsic>(I))
break;		break;
[... 22 lines not shown ...]	case Instruction::Call: {
unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);		unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);
bool UseVectorIntrinsic =		bool UseVectorIntrinsic =
ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;		ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;
if (!UseVectorIntrinsic && NeedToScalarize) {		if (!UseVectorIntrinsic && NeedToScalarize) {
scalarizeInstruction(&I);		scalarizeInstruction(&I);
break;		break;
}		}

		VectorParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
SmallVector<Value *, 4> Args;		SmallVector<Value *, 4> Args;
for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {		for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {
Value *Arg = CI->getArgOperand(i);		Value *Arg = CI->getArgOperand(i);
// Some intrinsics have a scalar argument - don't replace it with a		// Some intrinsics have a scalar argument - don't replace it with a
// vector.		// vector.
if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i)) {		if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i)) {
VectorParts &VectorArg = getVectorValue(CI->getArgOperand(i));		const VectorParts &VectorArg = getVectorValue(CI->getArgOperand(i));
Arg = VectorArg[Part];		Arg = VectorArg[Part];
}		}
Args.push_back(Arg);		Args.push_back(Arg);
}		}

Function *VectorF;		Function *VectorF;
if (UseVectorIntrinsic) {		if (UseVectorIntrinsic) {
// Use vector version of the intrinsic.		// Use vector version of the intrinsic.
[... 21 lines not shown ...]	case Instruction::Call: {
CallInst *V = Builder.CreateCall(VectorF, Args, OpBundles);		CallInst *V = Builder.CreateCall(VectorF, Args, OpBundles);

if (isa<FPMathOperator>(V))		if (isa<FPMathOperator>(V))
V->copyFastMathFlags(CI);		V->copyFastMathFlags(CI);

Entry[Part] = V;		Entry[Part] = V;
}		}

		VectorLoopValueMap.initVector(&I, Entry);
addMetadata(Entry, &I);		addMetadata(Entry, &I);
break;		break;
}		}

default:		default:
// All other instructions are unsupported. Scalarize them.		// All other instructions are unsupported. Scalarize them.
scalarizeInstruction(&I);		scalarizeInstruction(&I);
break;		break;
[... 1,914 lines not shown ...]
void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr,		void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr,
bool IfPredicateInstr) {		bool IfPredicateInstr) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
// Holds vector parameters or scalars, in case of uniform vals.		// Holds vector parameters or scalars, in case of uniform vals.
SmallVector<VectorParts, 4> Params;		SmallVector<VectorParts, 4> Params;

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);

// Find all of the vectorized parameters.
for (Value *SrcOp : Instr->operands()) {
// If we are accessing the old induction variable, use the new one.
if (SrcOp == OldInduction) {
Params.push_back(getVectorValue(SrcOp));
continue;
}

// Try using previously calculated values.
Instruction *SrcInst = dyn_cast<Instruction>(SrcOp);

// If the src is an instruction that appeared earlier in the basic block
// then it should already be vectorized.
if (SrcInst && OrigLoop->contains(SrcInst)) {
assert(WidenMap.has(SrcInst) && "Source operand is unavailable");
// The parameter is a vector value from earlier.
Params.push_back(WidenMap.get(SrcInst));
} else {
// The parameter is a scalar from outside the loop. Maybe even a constant.
VectorParts Scalars;
Scalars.append(UF, SrcOp);
Params.push_back(Scalars);
}
}

assert(Params.size() == Instr->getNumOperands() &&
"Invalid number of operands");

// Does this instruction return a value ?		// Does this instruction return a value ?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

Value *UndefVec = IsVoidRetTy ? nullptr : UndefValue::get(Instr->getType());		// Initialize a new scalar map entry.
// Create a new entry in the WidenMap and initialize it to Undef or Null.		ScalarParts Entry(UF);
VectorParts &VecResults = WidenMap.splat(Instr, UndefVec);

VectorParts Cond;		VectorParts Cond;
if (IfPredicateInstr) {		if (IfPredicateInstr) {
assert(Instr->getParent()->getSinglePredecessor() &&		assert(Instr->getParent()->getSinglePredecessor() &&
"Only support single predecessor blocks");		"Only support single predecessor blocks");
Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),		Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),
Instr->getParent());		Instr->getParent());
}		}

// For each vector unroll 'part':		// For each vector unroll 'part':
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
		Entry[Part].resize(1);
// For each scalar that we create:		// For each scalar that we create:

// Start an "if (pred) a[i] = ..." block.		// Start an "if (pred) a[i] = ..." block.
Value *Cmp = nullptr;		Value *Cmp = nullptr;
if (IfPredicateInstr) {		if (IfPredicateInstr) {
if (Cond[Part]->getType()->isVectorTy())		if (Cond[Part]->getType()->isVectorTy())
Cond[Part] =		Cond[Part] =
Builder.CreateExtractElement(Cond[Part], Builder.getInt32(0));		Builder.CreateExtractElement(Cond[Part], Builder.getInt32(0));
Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cond[Part],		Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cond[Part],
ConstantInt::get(Cond[Part]->getType(), 1));		ConstantInt::get(Cond[Part]->getType(), 1));
}		}

Instruction *Cloned = Instr->clone();		Instruction *Cloned = Instr->clone();
if (!IsVoidRetTy)		if (!IsVoidRetTy)
Cloned->setName(Instr->getName() + ".cloned");		Cloned->setName(Instr->getName() + ".cloned");
// Replace the operands of the cloned instructions with extracted scalars.
		// Replace the operands of the cloned instructions with their scalar
		// equivalents in the new loop.
for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {		for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
Value *Op = Params[op][Part];		auto *NewOp = getScalarValue(Instr->getOperand(op), Part, 0);
Cloned->setOperand(op, Op);		Cloned->setOperand(op, NewOp);
}		}

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

		// Add the cloned scalar to the scalar map entry.
		Entry[Part][0] = Cloned;

// If we just cloned a new assumption, add it the assumption cache.		// If we just cloned a new assumption, add it the assumption cache.
if (auto *II = dyn_cast<IntrinsicInst>(Cloned))		if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
if (II->getIntrinsicID() == Intrinsic::assume)		if (II->getIntrinsicID() == Intrinsic::assume)
AC->registerAssumption(II);		AC->registerAssumption(II);

// If the original scalar returns a value we need to place it in a vector
// so that future users will be able to use it.
if (!IsVoidRetTy)
VecResults[Part] = Cloned;

// End if-block.		// End if-block.
if (IfPredicateInstr)		if (IfPredicateInstr)
PredicatedInstructions.push_back(std::make_pair(Cloned, Cmp));		PredicatedInstructions.push_back(std::make_pair(Cloned, Cmp));
}		}
		VectorLoopValueMap.initScalar(Instr, Entry);
}		}

void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {		void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {
auto *SI = dyn_cast<StoreInst>(Instr);		auto *SI = dyn_cast<StoreInst>(Instr);
bool IfPredicateInstr = (SI && Legal->blockNeedsPredication(SI->getParent()));		bool IfPredicateInstr = (SI && Legal->blockNeedsPredication(SI->getParent()));

return scalarizeInstruction(Instr, IfPredicateInstr);		return scalarizeInstruction(Instr, IfPredicateInstr);
}		}
[... 363 lines not shown ...]
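
The lookup behavior the summary describes for getScalarValue and getVectorValue can be illustrated with a toy model. This is only a sketch under stated assumptions, not code from the patch: ToyValueMap, Val, and the Emitted log are hypothetical names, values are modeled as plain strings rather than llvm::Value pointers, and the "instructions" are just text. It is meant to show the general idea: a scalarized value is stored per part and per lane, a scalar user reads it directly, and insert/extract operations appear only when a user asks for the form that is not already present.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical): a "value" is a printable name, not an llvm::Value.
using Val = std::string;
using VectorParts = std::vector<Val>;              // one vector value per unroll part
using ScalarParts = std::vector<std::vector<Val>>; // indexed as [Part][Lane]

// Simplified model of a single map holding both vector and scalar values.
class ToyValueMap {
  unsigned UF, VF;
  std::map<Val, VectorParts> Vectors;
  std::map<Val, ScalarParts> Scalars;

public:
  std::vector<Val> Emitted; // log of the "instructions" created on demand

  ToyValueMap(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  void initVector(const Val &Key, const VectorParts &Entry) {
    assert(Entry.size() == UF && "expected one vector per unroll part");
    Vectors[Key] = Entry;
  }

  void initScalar(const Val &Key, const ScalarParts &Entry) {
    assert(Entry.size() == UF && "expected one lane list per unroll part");
    Scalars[Key] = Entry;
  }

  // Reuse a scalar if one exists; otherwise extract the lane from the vector.
  // (Simplification: the extract is not cached in this toy.)
  Val getScalarValue(const Val &Key, unsigned Part, unsigned Lane) {
    auto S = Scalars.find(Key);
    if (S != Scalars.end())
      return S->second[Part][Lane];
    Val E = "extract(" + Vectors.at(Key)[Part] + ", " + std::to_string(Lane) + ")";
    Emitted.push_back(E);
    return E;
  }

  // Reuse a vector if one exists; otherwise build it from the scalars with a
  // chain of inserts and remember the result for later users.
  const VectorParts &getVectorValue(const Val &Key) {
    auto V = Vectors.find(Key);
    if (V != Vectors.end())
      return V->second;
    const ScalarParts &S = Scalars.at(Key);
    VectorParts Entry(UF);
    for (unsigned Part = 0; Part < UF; ++Part) {
      Val Vec = "undef";
      for (unsigned Lane = 0; Lane < VF; ++Lane) {
        Vec = "insert(" + Vec + ", " + S[Part][Lane] + ", " + std::to_string(Lane) + ")";
        Emitted.push_back(Vec);
      }
      Entry[Part] = Vec;
    }
    VectorParts &Slot = Vectors[Key];
    Slot = Entry;
    return Slot;
  }
};
```

In this model, a scalarized value registered with initScalar costs nothing when a scalar user asks for it, and the insert chain is built once, the first time a vector user calls getVectorValue. That mirrors the motivation given in the summary for dropping the eager scalar-to-vector-to-scalar round trip.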

llvm/trunk/test/Transforms/LoopVectorize/AArch64/arbitrary-induction-step.ll

	[... 97 lines not shown ...]
	; for (int i = 0; i < 1024; i++) {			; for (int i = 0; i < 1024; i++) {
	; int tmp0 = *A++;			; int tmp0 = *A++;
	; int tmp1 = *A++;			; int tmp1 = *A++;
	; sum += tmp0 * tmp1;			; sum += tmp0 * tmp1;
	; }			; }

	; CHECK-LABEL: @ptr_ind_plus2(			; CHECK-LABEL: @ptr_ind_plus2(
	; CHECK: %[[V0:.*]] = load <8 x i32>			; CHECK: %[[V0:.*]] = load <8 x i32>
	; CHECK: shufflevector <8 x i32> %[[V0]], <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	; CHECK: shufflevector <8 x i32> %[[V0]], <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	; CHECK: %[[V1:.*]] = load <8 x i32>			; CHECK: %[[V1:.*]] = load <8 x i32>
				; CHECK: shufflevector <8 x i32> %[[V0]], <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	; CHECK: shufflevector <8 x i32> %[[V1]], <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>			; CHECK: shufflevector <8 x i32> %[[V1]], <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
				; CHECK: shufflevector <8 x i32> %[[V0]], <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	; CHECK: shufflevector <8 x i32> %[[V1]], <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>			; CHECK: shufflevector <8 x i32> %[[V1]], <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	; CHECK: mul nsw <4 x i32>			; CHECK: mul nsw <4 x i32>
	; CHECK: mul nsw <4 x i32>			; CHECK: mul nsw <4 x i32>
	; CHECK: add nsw <4 x i32>			; CHECK: add nsw <4 x i32>
	; CHECK: add nsw <4 x i32>			; CHECK: add nsw <4 x i32>
	; CHECK: %index.next = add i64 %index, 8			; CHECK: %index.next = add i64 %index, 8
	; CHECK: icmp eq i64 %index.next, 1024			; CHECK: icmp eq i64 %index.next, 1024

	[... 30 lines not shown ...]

llvm/trunk/test/Transforms/LoopVectorize/X86/scatter_crash.ll

	[... 33 lines not shown ...]
	; CHECK-NEXT: [[IND20:%.*]] = add i64 %offset.idx, 20			; CHECK-NEXT: [[IND20:%.*]] = add i64 %offset.idx, 20
	; CHECK-NEXT: [[IND22:%.*]] = add i64 %offset.idx, 22			; CHECK-NEXT: [[IND22:%.*]] = add i64 %offset.idx, 22
	; CHECK-NEXT: [[IND24:%.*]] = add i64 %offset.idx, 24			; CHECK-NEXT: [[IND24:%.*]] = add i64 %offset.idx, 24
	; CHECK-NEXT: [[IND26:%.*]] = add i64 %offset.idx, 26			; CHECK-NEXT: [[IND26:%.*]] = add i64 %offset.idx, 26
	; CHECK-NEXT: [[IND28:%.*]] = add i64 %offset.idx, 28			; CHECK-NEXT: [[IND28:%.*]] = add i64 %offset.idx, 28
	; CHECK-NEXT: [[IND30:%.*]] = add i64 %offset.idx, 30			; CHECK-NEXT: [[IND30:%.*]] = add i64 %offset.idx, 30
	; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <16 x i64> <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>, [[VEC_IND]]			; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <16 x i64> <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>, [[VEC_IND]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND00]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND00]]
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x [10 x i32]*> undef, [10 x i32]* [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND02]]			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND02]]
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x [10 x i32]*> [[TMP13]], [10 x i32]* [[TMP15]], i32 1
	; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND04]]			; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND04]]
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x [10 x i32]*> [[TMP16]], [10 x i32]* [[TMP18]], i32 2
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND06]]			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND06]]
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x [10 x i32]*> [[TMP19]], [10 x i32]* [[TMP21]], i32 3
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND08]]			; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND08]]
	; CHECK-NEXT: [[TMP25:%.*]] = insertelement <16 x [10 x i32]*> [[TMP22]], [10 x i32]* [[TMP24]], i32 4
	; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND10]]			; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND10]]
	; CHECK-NEXT: [[TMP28:%.*]] = insertelement <16 x [10 x i32]*> [[TMP25]], [10 x i32]* [[TMP27]], i32 5
	; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND12]]			; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND12]]
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <16 x [10 x i32]*> [[TMP28]], [10 x i32]* [[TMP30]], i32 6
	; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND14]]			; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND14]]
	; CHECK-NEXT: [[TMP34:%.*]] = insertelement <16 x [10 x i32]*> [[TMP31]], [10 x i32]* [[TMP33]], i32 7
	; CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND16]]			; CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND16]]
	; CHECK-NEXT: [[TMP37:%.*]] = insertelement <16 x [10 x i32]*> [[TMP34]], [10 x i32]* [[TMP36]], i32 8
	; CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND18]]			; CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND18]]
	; CHECK-NEXT: [[TMP40:%.*]] = insertelement <16 x [10 x i32]*> [[TMP37]], [10 x i32]* [[TMP39]], i32 9
	; CHECK-NEXT: [[TMP42:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND20]]			; CHECK-NEXT: [[TMP42:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND20]]
	; CHECK-NEXT: [[TMP43:%.*]] = insertelement <16 x [10 x i32]*> [[TMP40]], [10 x i32]* [[TMP42]], i32 10
	; CHECK-NEXT: [[TMP45:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND22]]			; CHECK-NEXT: [[TMP45:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND22]]
	; CHECK-NEXT: [[TMP46:%.*]] = insertelement <16 x [10 x i32]*> [[TMP43]], [10 x i32]* [[TMP45]], i32 11
	; CHECK-NEXT: [[TMP48:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND24]]			; CHECK-NEXT: [[TMP48:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND24]]
	; CHECK-NEXT: [[TMP49:%.*]] = insertelement <16 x [10 x i32]*> [[TMP46]], [10 x i32]* [[TMP48]], i32 12
	; CHECK-NEXT: [[TMP51:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND26]]			; CHECK-NEXT: [[TMP51:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND26]]
	; CHECK-NEXT: [[TMP52:%.*]] = insertelement <16 x [10 x i32]*> [[TMP49]], [10 x i32]* [[TMP51]], i32 13
	; CHECK-NEXT: [[TMP54:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND28]]			; CHECK-NEXT: [[TMP54:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND28]]
	; CHECK-NEXT: [[TMP55:%.*]] = insertelement <16 x [10 x i32]*> [[TMP52]], [10 x i32]* [[TMP54]], i32 14
	; CHECK-NEXT: [[TMP57:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND30]]			; CHECK-NEXT: [[TMP57:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND30]]
	; CHECK-NEXT: [[TMP58:%.*]] = insertelement <16 x [10 x i32]*> [[TMP55]], [10 x i32]* [[TMP57]], i32 15
	; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i64> [[TMP10]], [[VEC_IND3]]			; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i64> [[TMP10]], [[VEC_IND3]]
	; CHECK-NEXT: [[TMP60:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 0
	; CHECK-NEXT: [[TMP61:%.*]] = extractelement <16 x i64> [[TMP59]], i32 0			; CHECK-NEXT: [[TMP61:%.*]] = extractelement <16 x i64> [[TMP59]], i32 0
	; CHECK-NEXT: [[TMP62:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP60]], i64 [[TMP61]], i64 0			; CHECK-NEXT: [[TMP62:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP12]], i64 [[TMP61]], i64 0
	; CHECK-NEXT: [[TMP63:%.*]] = insertelement <16 x i32*> undef, i32* [[TMP62]], i32 0
	; CHECK-NEXT: [[TMP64:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 1
	; CHECK-NEXT: [[TMP65:%.*]] = extractelement <16 x i64> [[TMP59]], i32 1			; CHECK-NEXT: [[TMP65:%.*]] = extractelement <16 x i64> [[TMP59]], i32 1
	; CHECK-NEXT: [[TMP66:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP64]], i64 [[TMP65]], i64 0			; CHECK-NEXT: [[TMP66:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP15]], i64 [[TMP65]], i64 0
	; CHECK-NEXT: [[TMP67:%.*]] = insertelement <16 x i32*> [[TMP63]], i32* [[TMP66]], i32 1
	; CHECK-NEXT: [[TMP68:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 2
	; CHECK-NEXT: [[TMP69:%.*]] = extractelement <16 x i64> [[TMP59]], i32 2			; CHECK-NEXT: [[TMP69:%.*]] = extractelement <16 x i64> [[TMP59]], i32 2
	; CHECK-NEXT: [[TMP70:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP68]], i64 [[TMP69]], i64 0			; CHECK-NEXT: [[TMP70:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP18]], i64 [[TMP69]], i64 0
	; CHECK-NEXT: [[TMP71:%.*]] = insertelement <16 x i32*> [[TMP67]], i32* [[TMP70]], i32 2
	; CHECK-NEXT: [[TMP72:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 3
	; CHECK-NEXT: [[TMP73:%.*]] = extractelement <16 x i64> [[TMP59]], i32 3			; CHECK-NEXT: [[TMP73:%.*]] = extractelement <16 x i64> [[TMP59]], i32 3
	; CHECK-NEXT: [[TMP74:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP72]], i64 [[TMP73]], i64 0			; CHECK-NEXT: [[TMP74:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP21]], i64 [[TMP73]], i64 0
	; CHECK-NEXT: [[TMP75:%.*]] = insertelement <16 x i32*> [[TMP71]], i32* [[TMP74]], i32 3
	; CHECK-NEXT: [[TMP76:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 4
	; CHECK-NEXT: [[TMP77:%.*]] = extractelement <16 x i64> [[TMP59]], i32 4			; CHECK-NEXT: [[TMP77:%.*]] = extractelement <16 x i64> [[TMP59]], i32 4
	; CHECK-NEXT: [[TMP78:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP76]], i64 [[TMP77]], i64 0			; CHECK-NEXT: [[TMP78:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP24]], i64 [[TMP77]], i64 0
	; CHECK-NEXT: [[TMP79:%.*]] = insertelement <16 x i32*> [[TMP75]], i32* [[TMP78]], i32 4
	; CHECK-NEXT: [[TMP80:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 5
	; CHECK-NEXT: [[TMP81:%.*]] = extractelement <16 x i64> [[TMP59]], i32 5			; CHECK-NEXT: [[TMP81:%.*]] = extractelement <16 x i64> [[TMP59]], i32 5
	; CHECK-NEXT: [[TMP82:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP80]], i64 [[TMP81]], i64 0			; CHECK-NEXT: [[TMP82:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP27]], i64 [[TMP81]], i64 0
	; CHECK-NEXT: [[TMP83:%.*]] = insertelement <16 x i32*> [[TMP79]], i32* [[TMP82]], i32 5
	; CHECK-NEXT: [[TMP84:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 6
	; CHECK-NEXT: [[TMP85:%.*]] = extractelement <16 x i64> [[TMP59]], i32 6			; CHECK-NEXT: [[TMP85:%.*]] = extractelement <16 x i64> [[TMP59]], i32 6
	; CHECK-NEXT: [[TMP86:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP84]], i64 [[TMP85]], i64 0			; CHECK-NEXT: [[TMP86:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP30]], i64 [[TMP85]], i64 0
	; CHECK-NEXT: [[TMP87:%.*]] = insertelement <16 x i32*> [[TMP83]], i32* [[TMP86]], i32 6
	; CHECK-NEXT: [[TMP88:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 7
	; CHECK-NEXT: [[TMP89:%.*]] = extractelement <16 x i64> [[TMP59]], i32 7			; CHECK-NEXT: [[TMP89:%.*]] = extractelement <16 x i64> [[TMP59]], i32 7
	; CHECK-NEXT: [[TMP90:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP88]], i64 [[TMP89]], i64 0			; CHECK-NEXT: [[TMP90:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP33]], i64 [[TMP89]], i64 0
	; CHECK-NEXT: [[TMP91:%.*]] = insertelement <16 x i32*> [[TMP87]], i32* [[TMP90]], i32 7
	; CHECK-NEXT: [[TMP92:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 8
	; CHECK-NEXT: [[TMP93:%.*]] = extractelement <16 x i64> [[TMP59]], i32 8			; CHECK-NEXT: [[TMP93:%.*]] = extractelement <16 x i64> [[TMP59]], i32 8
	; CHECK-NEXT: [[TMP94:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP92]], i64 [[TMP93]], i64 0			; CHECK-NEXT: [[TMP94:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP36]], i64 [[TMP93]], i64 0
	; CHECK-NEXT: [[TMP95:%.*]] = insertelement <16 x i32*> [[TMP91]], i32* [[TMP94]], i32 8
	; CHECK-NEXT: [[TMP96:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 9
	; CHECK-NEXT: [[TMP97:%.*]] = extractelement <16 x i64> [[TMP59]], i32 9			; CHECK-NEXT: [[TMP97:%.*]] = extractelement <16 x i64> [[TMP59]], i32 9
	; CHECK-NEXT: [[TMP98:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP96]], i64 [[TMP97]], i64 0			; CHECK-NEXT: [[TMP98:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP39]], i64 [[TMP97]], i64 0
	; CHECK-NEXT: [[TMP99:%.*]] = insertelement <16 x i32*> [[TMP95]], i32* [[TMP98]], i32 9
	; CHECK-NEXT: [[TMP100:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 10
	; CHECK-NEXT: [[TMP101:%.*]] = extractelement <16 x i64> [[TMP59]], i32 10			; CHECK-NEXT: [[TMP101:%.*]] = extractelement <16 x i64> [[TMP59]], i32 10
	; CHECK-NEXT: [[TMP102:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP100]], i64 [[TMP101]], i64 0			; CHECK-NEXT: [[TMP102:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP42]], i64 [[TMP101]], i64 0
	; CHECK-NEXT: [[TMP103:%.*]] = insertelement <16 x i32*> [[TMP99]], i32* [[TMP102]], i32 10
	; CHECK-NEXT: [[TMP104:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 11
	; CHECK-NEXT: [[TMP105:%.*]] = extractelement <16 x i64> [[TMP59]], i32 11			; CHECK-NEXT: [[TMP105:%.*]] = extractelement <16 x i64> [[TMP59]], i32 11
	; CHECK-NEXT: [[TMP106:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP104]], i64 [[TMP105]], i64 0			; CHECK-NEXT: [[TMP106:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP45]], i64 [[TMP105]], i64 0
	; CHECK-NEXT: [[TMP107:%.*]] = insertelement <16 x i32*> [[TMP103]], i32* [[TMP106]], i32 11
	; CHECK-NEXT: [[TMP108:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 12
	; CHECK-NEXT: [[TMP109:%.*]] = extractelement <16 x i64> [[TMP59]], i32 12			; CHECK-NEXT: [[TMP109:%.*]] = extractelement <16 x i64> [[TMP59]], i32 12
	; CHECK-NEXT: [[TMP110:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP108]], i64 [[TMP109]], i64 0			; CHECK-NEXT: [[TMP110:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP48]], i64 [[TMP109]], i64 0
	; CHECK-NEXT: [[TMP111:%.*]] = insertelement <16 x i32*> [[TMP107]], i32* [[TMP110]], i32 12
	; CHECK-NEXT: [[TMP112:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 13
	; CHECK-NEXT: [[TMP113:%.*]] = extractelement <16 x i64> [[TMP59]], i32 13			; CHECK-NEXT: [[TMP113:%.*]] = extractelement <16 x i64> [[TMP59]], i32 13
	; CHECK-NEXT: [[TMP114:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP112]], i64 [[TMP113]], i64 0			; CHECK-NEXT: [[TMP114:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP51]], i64 [[TMP113]], i64 0
	; CHECK-NEXT: [[TMP115:%.*]] = insertelement <16 x i32*> [[TMP111]], i32* [[TMP114]], i32 13
	; CHECK-NEXT: [[TMP116:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 14
	; CHECK-NEXT: [[TMP117:%.*]] = extractelement <16 x i64> [[TMP59]], i32 14			; CHECK-NEXT: [[TMP117:%.*]] = extractelement <16 x i64> [[TMP59]], i32 14
	; CHECK-NEXT: [[TMP118:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP116]], i64 [[TMP117]], i64 0			; CHECK-NEXT: [[TMP118:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP54]], i64 [[TMP117]], i64 0
	; CHECK-NEXT: [[TMP119:%.*]] = insertelement <16 x i32*> [[TMP115]], i32* [[TMP118]], i32 14
	; CHECK-NEXT: [[TMP120:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 15
	; CHECK-NEXT: [[TMP121:%.*]] = extractelement <16 x i64> [[TMP59]], i32 15			; CHECK-NEXT: [[TMP121:%.*]] = extractelement <16 x i64> [[TMP59]], i32 15
	; CHECK-NEXT: [[TMP122:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP120]], i64 [[TMP121]], i64 0			; CHECK-NEXT: [[TMP122:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP57]], i64 [[TMP121]], i64 0
	; CHECK-NEXT: [[TMP123:%.*]] = insertelement <16 x i32*> [[TMP119]], i32* [[TMP122]], i32 15			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x [10 x i32]*> undef, [10 x i32]* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x [10 x i32]*> [[TMP13]], [10 x i32]* [[TMP15]], i32 1
				; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x [10 x i32]*> [[TMP16]], [10 x i32]* [[TMP18]], i32 2
				; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x [10 x i32]*> [[TMP19]], [10 x i32]* [[TMP21]], i32 3
				; CHECK-NEXT: [[TMP25:%.*]] = insertelement <16 x [10 x i32]*> [[TMP22]], [10 x i32]* [[TMP24]], i32 4
				; CHECK-NEXT: [[TMP28:%.*]] = insertelement <16 x [10 x i32]*> [[TMP25]], [10 x i32]* [[TMP27]], i32 5
				; CHECK-NEXT: [[TMP31:%.*]] = insertelement <16 x [10 x i32]*> [[TMP28]], [10 x i32]* [[TMP30]], i32 6
				; CHECK-NEXT: [[TMP34:%.*]] = insertelement <16 x [10 x i32]*> [[TMP31]], [10 x i32]* [[TMP33]], i32 7
				; CHECK-NEXT: [[TMP37:%.*]] = insertelement <16 x [10 x i32]*> [[TMP34]], [10 x i32]* [[TMP36]], i32 8
				; CHECK-NEXT: [[TMP40:%.*]] = insertelement <16 x [10 x i32]*> [[TMP37]], [10 x i32]* [[TMP39]], i32 9
				; CHECK-NEXT: [[TMP43:%.*]] = insertelement <16 x [10 x i32]*> [[TMP40]], [10 x i32]* [[TMP42]], i32 10
				; CHECK-NEXT: [[TMP46:%.*]] = insertelement <16 x [10 x i32]*> [[TMP43]], [10 x i32]* [[TMP45]], i32 11
				; CHECK-NEXT: [[TMP49:%.*]] = insertelement <16 x [10 x i32]*> [[TMP46]], [10 x i32]* [[TMP48]], i32 12
				; CHECK-NEXT: [[TMP52:%.*]] = insertelement <16 x [10 x i32]*> [[TMP49]], [10 x i32]* [[TMP51]], i32 13
				; CHECK-NEXT: [[TMP55:%.*]] = insertelement <16 x [10 x i32]*> [[TMP52]], [10 x i32]* [[TMP54]], i32 14
				; CHECK-NEXT: [[TMP58:%.*]] = insertelement <16 x [10 x i32]*> [[TMP55]], [10 x i32]* [[TMP57]], i32 15
	; CHECK-NEXT: [[VECTORGEP:%.*]] = getelementptr inbounds [10 x i32], <16 x [10 x i32]*> [[TMP58]], <16 x i64> [[TMP59]], i64 0			; CHECK-NEXT: [[VECTORGEP:%.*]] = getelementptr inbounds [10 x i32], <16 x [10 x i32]*> [[TMP58]], <16 x i64> [[TMP59]], i64 0
	; CHECK-NEXT: call void @llvm.masked.scatter.v16i32(<16 x i32> <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>, <16 x i32*> [[VECTORGEP]], i32 16, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)			; CHECK-NEXT: call void @llvm.masked.scatter.v16i32(<16 x i32> <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>, <16 x i32*> [[VECTORGEP]], i32 16, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
	; CHECK: [[STEP_ADD:%.*]] = add <16 x i64> [[VEC_IND]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>			; CHECK: [[STEP_ADD:%.*]] = add <16 x i64> [[VEC_IND]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
	; CHECK: [[STEP_ADD4:%.*]] = add <16 x i64> [[VEC_IND3]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>			; CHECK: [[STEP_ADD4:%.*]] = add <16 x i64> [[VEC_IND3]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
	entry:			entry:
	%0 = load i32, i32* @c, align 4			%0 = load i32, i32* @c, align 4
	%cmp34 = icmp sgt i32 %0, 8			%cmp34 = icmp sgt i32 %0, 8
	br i1 %cmp34, label %for.body.lr.ph, label %for.cond.cleanup			br i1 %cmp34, label %for.body.lr.ph, label %for.cond.cleanup

llvm/trunk/test/Transforms/LoopVectorize/if-pred-non-void.ll

if.end: ; preds = %if.then, %for.body
store i32 %yud.0, i32* %iud, align 4		store i32 %yud.0, i32* %iud, align 4
store i32 %ysr.0, i32* %isr, align 4		store i32 %ysr.0, i32* %isr, align 4
store i32 %yur.0, i32* %iur, align 4		store i32 %yur.0, i32* %iur, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 128		%exitcond = icmp eq i64 %indvars.iv.next, 128
br i1 %exitcond, label %for.cond.cleanup, label %for.body		br i1 %exitcond, label %for.cond.cleanup, label %for.body
}		}

; Future-use test for predication under smarter scalar-scalar: this test will
; fail when the vectorizer starts feeding scalarized values directly to their
; scalar users, i.e. w/o generating redundant insertelement/extractelement
; instructions. This case is already supported by the predication code (which
; should generate a phi for the scalar predicated value rather than for the
; insertelement), but cannot be tested yet.
; If you got this test to fail, kindly fix the test by using the alternative
; FFU sequence. This will make the test check how we handle this case from
; now on.
define void @test_scalar2scalar(i32* nocapture %asd, i32* nocapture %bsd) {		define void @test_scalar2scalar(i32* nocapture %asd, i32* nocapture %bsd) {
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %if.end		for.cond.cleanup: ; preds = %if.end
ret void		ret void

; CHECK-LABEL: test_scalar2scalar		; CHECK-LABEL: test_scalar2scalar
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: br i1 %{{.*}}, label %[[THEN:[a-zA-Z0-9.]+]], label %[[FI:[a-zA-Z0-9.]+]]		; CHECK: br i1 %{{.*}}, label %[[THEN:[a-zA-Z0-9.]+]], label %[[FI:[a-zA-Z0-9.]+]]
; CHECK: [[THEN]]:		; CHECK: [[THEN]]:
; CHECK: %[[PD:[a-zA-Z0-9]+]] = sdiv i32 %{{.*}}, %{{.*}}		; CHECK: %[[PD:[a-zA-Z0-9]+]] = sdiv i32 %{{.*}}, %{{.*}}
; CHECK: %[[PDV:[a-zA-Z0-9]+]] = insertelement <2 x i32> undef, i32 %[[PD]], i32 0
; CHECK: br label %[[FI]]		; CHECK: br label %[[FI]]
; CHECK: [[FI]]:		; CHECK: [[FI]]:
; CHECK: %[[PH:[a-zA-Z0-9]+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[PDV]], %[[THEN]] ]		; CHECK: %{{.*}} = phi i32 [ undef, %vector.body ], [ %[[PD]], %[[THEN]] ]
; FFU-LABEL: test_scalar2scalar
; FFU: vector.body:
; FFU: br i1 %{{.*}}, label %[[THEN:[a-zA-Z0-9.]+]], label %[[FI:[a-zA-Z0-9.]+]]
; FFU: [[THEN]]:
; FFU: %[[PD:[a-zA-Z0-9]+]] = sdiv i32 %{{.*}}, %{{.*}}
; FFU: br label %[[FI]]
; FFU: [[FI]]:
; FFU: %{{.*}} = phi i32 [ undef, %vector.body ], [ %[[PD]], %[[THEN]] ]

for.body: ; preds = %if.end, %entry		for.body: ; preds = %if.end, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
%isd = getelementptr inbounds i32, i32* %asd, i64 %indvars.iv		%isd = getelementptr inbounds i32, i32* %asd, i64 %indvars.iv
%lsd = load i32, i32* %isd, align 4		%lsd = load i32, i32* %isd, align 4
%isd.b = getelementptr inbounds i32, i32* %bsd, i64 %indvars.iv		%isd.b = getelementptr inbounds i32, i32* %bsd, i64 %indvars.iv
%lsd.b = load i32, i32* %isd.b, align 4		%lsd.b = load i32, i32* %isd.b, align 4
%psd = add nsw i32 %lsd, 23		%psd = add nsw i32 %lsd, 23

llvm/trunk/test/Transforms/LoopVectorize/if-pred-stores.ll

	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=UNROLL			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=UNROLL
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s | FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s | FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=VEC			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=VEC

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.9.0"			target triple = "x86_64-apple-macosx10.9.0"

	; Test predication of stores.			; Test predication of stores.
	define i32 @test(i32* nocapture %f) #0 {			define i32 @test(i32* nocapture %f) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	; VEC-LABEL: test			; VEC-LABEL: test
				; VEC: %[[v0:.+]] = add i64 %index, 0
				; VEC: %[[v1:.+]] = add i64 %index, 1
				; VEC: %[[v2:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v0]]
				; VEC: %[[v4:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v1]]
	; VEC: %[[v8:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>			; VEC: %[[v8:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>
	; VEC: %[[v9:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>			; VEC: %[[v9:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>
	; VEC: %[[v10:.+]] = and <2 x i1> %[[v8]], <i1 true, i1 true>			; VEC: %[[v10:.+]] = and <2 x i1> %[[v8]], <i1 true, i1 true>
	; VEC: %[[v11:.+]] = extractelement <2 x i1> %[[v10]], i32 0			; VEC: %[[v11:.+]] = extractelement <2 x i1> %[[v10]], i32 0
	; VEC: %[[v12:.+]] = icmp eq i1 %[[v11]], true			; VEC: %[[v12:.+]] = icmp eq i1 %[[v11]], true
	; VEC: br i1 %[[v12]], label %[[cond:.+]], label %[[else:.+]]			; VEC: br i1 %[[v12]], label %[[cond:.+]], label %[[else:.+]]
	;			;
	; VEC: [[cond]]:			; VEC: [[cond]]:
	; VEC: %[[v13:.+]] = extractelement <2 x i32> %[[v9]], i32 0			; VEC: %[[v13:.+]] = extractelement <2 x i32> %[[v9]], i32 0
	; VEC: %[[v14:.+]] = extractelement <2 x i32*> %{{.*}}, i32 0			; VEC: store i32 %[[v13]], i32* %[[v2]], align 4
	; VEC: store i32 %[[v13]], i32* %[[v14]], align 4
	; VEC: br label %[[else:.+]]			; VEC: br label %[[else:.+]]
	;			;
	; VEC: [[else]]:			; VEC: [[else]]:
	; VEC: %[[v15:.+]] = extractelement <2 x i1> %[[v10]], i32 1			; VEC: %[[v15:.+]] = extractelement <2 x i1> %[[v10]], i32 1
	; VEC: %[[v16:.+]] = icmp eq i1 %[[v15]], true			; VEC: %[[v16:.+]] = icmp eq i1 %[[v15]], true
	; VEC: br i1 %[[v16]], label %[[cond2:.+]], label %[[else2:.+]]			; VEC: br i1 %[[v16]], label %[[cond2:.+]], label %[[else2:.+]]
	;			;
	; VEC: [[cond2]]:			; VEC: [[cond2]]:
	; VEC: %[[v17:.+]] = extractelement <2 x i32> %[[v9]], i32 1			; VEC: %[[v17:.+]] = extractelement <2 x i32> %[[v9]], i32 1
	; VEC: %[[v18:.+]] = extractelement <2 x i32*> %{{.+}} i32 1			; VEC: store i32 %[[v17]], i32* %[[v4]], align 4
	; VEC: store i32 %[[v17]], i32* %[[v18]], align 4
	; VEC: br label %[[else2:.+]]			; VEC: br label %[[else2:.+]]
	;			;
	; VEC: [[else2]]:			; VEC: [[else2]]:

	; UNROLL-LABEL: test			; UNROLL-LABEL: test
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL: %[[IND:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 0			; UNROLL: %[[IND:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 0
	; UNROLL: %[[IND1:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 1			; UNROLL: %[[IND1:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 1