This is an archive of the discontinued LLVM Phabricator instance.

[LV] Unify vector and scalar maps
Closed (Public)

Authored by mssimpso on Aug 4 2016, 9:52 AM.


Details

Reviewers

anemet
nadav
mkuper

Commits

rGabd2be1e2e52: [LV] Unify vector and scalar maps
rL279649: [LV] Unify vector and scalar maps

Summary

This patch unifies the data structures we use to map instructions in the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap held the vector values representing each instruction from the old loop, and ScalarIVMap held the scalar values representing each scalarized induction variable. With this patch, all values created for the new loop (vector and scalar) are maintained in WidenMap (renamed to VectorLoopValueMap).

The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. This caused unnecessary scalar-to-vector-to-scalar conversions, which in some cases resulted in significant code bloat before InstCombine ran.

This patch avoids these unnecessary conversions by maintaining the scalar values directly. This can improve compile time, since it reduces the number of instructions in each block that InstCombine needs to simplify. If a scalarized value is used by a scalar instruction, the scalar value can be used directly. However, if the scalarized value is needed by a vector instruction, we now generate the needed insertelement instructions on demand.
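The cost of the old round trip can be seen in a toy model. The sketch below is hypothetical (ToyBlock, oldScheme, and newScheme are illustrative names, not LLVM APIs) and simply counts emitted instructions: packing VF scalarized lane values into a vector and then extracting them again for a scalar user costs 2 * VF instructions, while reading the scalars straight from the map costs none.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block (hypothetical, not LLVM's IRBuilder): every emitted
// "instruction" is recorded so the two schemes can be compared by count.
struct ToyBlock {
  std::vector<std::string> Insts;
  std::string emit(const std::string &Text) {
    Insts.push_back(Text);
    return "%" + std::to_string(Insts.size() - 1);
  }
};

// Old scheme: pack the lane values into a vector so the map can hold them,
// then extract every lane again for a scalar user.
std::vector<std::string> oldScheme(ToyBlock &B,
                                   const std::vector<std::string> &Lanes) {
  std::string Vec = "undef";
  for (std::size_t L = 0; L < Lanes.size(); ++L)
    Vec = B.emit("insertelement " + Vec + ", " + Lanes[L] + ", " +
                 std::to_string(L));
  std::vector<std::string> Uses;
  for (std::size_t L = 0; L < Lanes.size(); ++L)
    Uses.push_back(B.emit("extractelement " + Vec + ", " + std::to_string(L)));
  return Uses;
}

// New scheme: the map keeps the lane values, so a scalar user reads them
// directly and nothing extra is emitted.
std::vector<std::string> newScheme(ToyBlock &,
                                   const std::vector<std::string> &Lanes) {
  return Lanes;
}
```

With VF = 4 the old path emits eight instructions for a single scalarized value, all of which InstCombine would then need to clean up.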

A common idiom in several places in the code (including the scalarization code) is to first get the vector values that an instruction from the original loop maps to, and then extract a particular scalar value. For this purpose, this patch adds getScalarValue alongside getVectorValue as an interface into WidenMap. getScalarValue reuses a scalar value if one is available, or creates an extractelement instruction otherwise. Similarly, getVectorValue has been modified to generate insertelement instructions on demand if a vector version of a value is not yet available.
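A minimal sketch of the two lookups, assuming a toy model in which values are strings; ToyVectorizer, emit, VectorMap, and ScalarMap are illustrative names, not the patch's code. getScalarValue prefers an existing scalar and otherwise emits an extractelement. getVectorValue prefers an existing vector; otherwise it packs the scalars with insertelement once and caches the result, or broadcasts a value found in neither map (treated here as loop invariant).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model (hypothetical): values are strings and every emitted
// instruction is logged, so we can see when conversions happen.
struct ToyVectorizer {
  unsigned UF, VF;
  std::vector<std::string> Emitted;
  std::map<std::string, std::vector<std::string>> VectorMap;              // [UF]
  std::map<std::string, std::vector<std::vector<std::string>>> ScalarMap; // [UF][VF]

  ToyVectorizer(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  std::string emit(const std::string &Text) {
    Emitted.push_back(Text);
    return "%t" + std::to_string(Emitted.size() - 1);
  }

  // Reuse an existing scalar; otherwise extract the lane from the vector.
  std::string getScalarValue(const std::string &V, unsigned Part,
                             unsigned Lane) {
    auto S = ScalarMap.find(V);
    if (S != ScalarMap.end())
      return S->second[Part][Lane];
    auto Vec = VectorMap.find(V);
    if (Vec == VectorMap.end())
      return V; // in neither map: treated as loop invariant, used as-is
    return emit("extractelement " + Vec->second[Part] + ", " +
                std::to_string(Lane));
  }

  // Reuse an existing vector; otherwise build one on demand and cache it.
  const std::vector<std::string> &getVectorValue(const std::string &V) {
    auto Vec = VectorMap.find(V);
    if (Vec != VectorMap.end())
      return Vec->second;
    std::vector<std::string> Parts(UF);
    auto S = ScalarMap.find(V);
    if (S == ScalarMap.end()) {
      // Loop invariant: broadcast once and splat it across all parts.
      Parts.assign(UF, emit("broadcast " + V));
    } else {
      // Scalarized: pack the VF lane values of each part with insertelement.
      for (unsigned P = 0; P < UF; ++P) {
        std::string Acc = "undef";
        for (unsigned L = 0; L < VF; ++L)
          Acc = emit("insertelement " + Acc + ", " + S->second[P][L] + ", " +
                     std::to_string(L));
        Parts[P] = Acc;
      }
    }
    return VectorMap.emplace(V, std::move(Parts)).first->second;
  }
};
```

Returning a const reference into a std::map is safe here because std::map insertions do not invalidate references to existing elements, the same reason the map storage in the diff uses std::map rather than DenseMap.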

There's no real functional change with this patch (post-InstCombine), only compile-time savings and refactoring. However, in some cases we will generate different code. Because we now might generate an insertelement sequence on demand, the IR post-InstCombine might be reordered somewhat. For example, instead of the insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes.

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 66820. Aug 4 2016, 9:52 AM

mssimpso retitled this revision from to [LV] Unify vector and scalar maps.

mssimpso updated this object.

mssimpso added reviewers: nadav, anemet, mkuper.

mssimpso added subscribers: gilr, mcrosier, llvm-commits.

Herald added a subscriber: mzolotukhin. Aug 4 2016, 9:52 AM

wmi added a subscriber: wmi. Aug 4 2016, 11:04 AM

Ensure the new patch handles scalarized values with void types as we did previously: we create a vector entry for the value in WidenMap whose parts are all mapped to nullptr.

spatel added a subscriber: spatel. Aug 4 2016, 3:35 PM

+1000 on the design and the cleanup, some nits about the implementation.

lib/Transforms/Vectorize/LoopVectorize.cpp
549	Perhaps use a 2D vector for this ("ScalarParts"), instead of the 1D VectorParts? I don't think we actually gain anything from using VectorParts for both - even in the one place where we do move a VectorParts from one map to the other (line 2300), you don't actually break the abstraction.
630	Bike-shedding - maybe change the name of the variable? "WidenMap" made sense when it mapped scalar values to the corresponding "wide" vector values.
2175	I -> Lane ?
2310	For clarity, I'd prefer a temp value here, and an assignment to Parts[Part] in the end.
2343	Is this for VF==1, for uniform values, or both?

As we're piling more complexity on top of WidenMap, I feel that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or in some cases of int induction variables both. And then as we encounter uses, we derive the missing scalar or vector variants from each unless we already have a value for it. Please correct me if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

Anyhow, sorry about the somewhat random comments, I was just wondering what you guys thought about this.

In D23169#506561, @anemet wrote:

As we're piling more complexity on top of WidenMap, I feel that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

Right, that is confusing. We use "get" (now getScalar/getVector) to return an existing entry in the map or initialize an empty entry if it's not already there. We use "getVectorValue" to return an existing entry or construct a non-empty entry (by broadcasting or, now, by inserting scalar elements into a vector). getVector/getScalar should probably just be initializers. For example, I think we should be able to replace all but one instance of getVector with getVectorValue. Then, getVector/getScalar could be renamed to initVector/initScalar, which would assert or no-op if we try to initialize a key more than once. What do you think?
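The proposed init-once semantics can be sketched generically. InitOnceMap, init, and lookup are hypothetical names, and Key/Part stand in for llvm::Value * so the example is self-contained. The point is that initialization asserts on a second write to the same key, and a read-only lookup never materializes an empty placeholder the way an operator[]-based get() does.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch of an init-style interface (not the committed code).
template <typename Key, typename Part> class InitOnceMap {
  std::map<Key, std::vector<Part>> Storage;

public:
  bool has(const Key &K) const { return Storage.count(K) != 0; }

  // Map K exactly once; initializing the same key twice is a bug.
  const std::vector<Part> &init(const Key &K, std::vector<Part> Parts) {
    assert(!has(K) && "key already initialized");
    return Storage.emplace(K, std::move(Parts)).first->second;
  }

  // Read-only lookup; unlike an operator[]-style get(), it never creates
  // an empty entry as a side effect.
  const std::vector<Part> *lookup(const Key &K) const {
    auto It = Storage.find(K);
    return It == Storage.end() ? nullptr : &It->second;
  }
};
```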

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or in some cases of int induction variables both. And then as we encounter uses, we derive the missing scalar or vector variants from each unless we already have a value for it. Please correct me if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

That's right. Sure I'll add some high-level comments.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

lib/Transforms/Vectorize/LoopVectorize.cpp
549	Sounds good to me.
630	I actually thought about doing this, but couldn't really come up with anything better. Do you have any ideas? What about VectorLoopMap?
2175	Makes sense.
2310	Sure.
2343	Nice question! This is for the VF == 1 case, and it prevents us from trying to extract an element from a non-vector type. I moved this check from the existing scalarization code (line 2827). But we could replace the if condition here with VF == 1 if that makes more sense. To be clear, if a value is not an instruction in the loop, it's handled by the first check. For the instructions in Uniforms, I believe we will vectorize/scalarize them, and the unused pieces will later be deleted.

In D23169#506915, @mssimpso wrote:

In D23169#506561, @anemet wrote:

As we're piling more complexity on top of WidenMap, I feel that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

Right, that is confusing. We use "get" (now getScalar/getVector) to return an existing entry in the map or initialize an empty entry if it's not already there. We use "getVectorValue" to return an existing entry or construct a non-empty entry (by broadcasting or, now, by inserting scalar elements into a vector). getVector/getScalar should probably just be initializers. For example, I think we should be able to replace all but one instance of getVector with getVectorValue. Then, getVector/getScalar could be renamed to initVector/initScalar, which would assert or no-op if we try to initialize a key more than once. What do you think?

That would be very nice.

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or in some cases of int induction variables both. And then as we encounter uses, we derive the missing scalar or vector variants from each unless we already have a value for it. Please correct me if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

That's right. Sure I'll add some high-level comments.

Thanks.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs, on the other hand, probably does belong to WM because it's part of the whole picture of how values are mapped. So making it a stand-alone utility is probably the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.
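A minimal sketch of the split Adam suggests, under stated assumptions: the value map owns broadcasting, while an ILV-side wrapper consults legality first. All types here are toy stand-ins (ToyValueMap, ToyLegality, ToyILV are hypothetical), and substituting the constant "1" assumes the symbolic stride is speculated to be one, which is presumably what the stride check in getVectorValue does.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy types; values are names and parts are strings.
using VectorParts = std::vector<std::string>;

// Stand-in for the value map, which would own broadcasting per the proposal.
struct ToyValueMap {
  unsigned UF;
  std::map<std::string, VectorParts> Storage;
  explicit ToyValueMap(unsigned UF) : UF(UF) {}

  const VectorParts &getVectorValue(const std::string &V) {
    auto It = Storage.find(V);
    if (It != Storage.end())
      return It->second;
    // Not mapped: treat as loop invariant and splat one broadcast.
    return Storage.emplace(V, VectorParts(UF, "broadcast(" + V + ")"))
        .first->second;
  }
};

// Stand-in for the legality query that knows about symbolic strides.
struct ToyLegality {
  std::set<std::string> SymbolicStrides;
  bool hasStride(const std::string &V) const {
    return SymbolicStrides.count(V) != 0;
  }
};

// ILV-side wrapper: swap in the speculated stride value (assumed to be 1
// in this toy), then delegate to the map.
struct ToyILV {
  ToyValueMap &WM;
  const ToyLegality &Legal;
  const VectorParts &getVectorValue(std::string V) {
    if (Legal.hasStride(V))
      V = "1";
    return WM.getVectorValue(V);
  }
};
```

The design keeps legality knowledge out of the map: the map only ever sees the already-substituted value.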

Addressed comments from Michael and Adam.

Added "ScalarParts" type as a 2D vector.
Removed from WidenMap the "VectorParts &get" interface and replaced it with "VectorParts &initVector" and "ScalarParts &initScalar", as previously discussed. These initializers assert if a key has already been mapped, which should prevent data from being overwritten unintentionally. Because of this change, I had to slightly modify interleaved access vectorization since we vectorize all instructions in a group at once.
I also removed the "VectorParts &splat" interface since this is now just a special case of initVector.
The data in WidenMap can now only be accessed via getVectorValue and getScalarValue. Until we can move these functions inside WidenMap, we have to declare them as friends to get access to the data.
Renamed WidenMap to VectorLoopValueMap (but I'm still open to name suggestions).
Added high-level comments about the interaction between getVectorValue and getScalarValue.
Addressed other minor comments.
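The ScalarParts layout described above can be sketched as a 2D vector indexed first by unroll part and then by lane. ToyValueMap and the complete toy Value struct are hypothetical simplifications, not the committed class; the sketch shows initScalar rejecting a second mapping and checking the UF x VF shape.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct Value { int Id; }; // toy stand-in for llvm::Value

// One vector value per unroll part, and VF scalar values per part.
using VectorParts = std::vector<Value *>;              // [UF]
using ScalarParts = std::vector<std::vector<Value *>>; // [UF][VF]

// Hypothetical sketch of the two-map layout (not the committed code).
struct ToyValueMap {
  unsigned UF, VF;
  std::map<Value *, VectorParts> VectorMapStorage;
  std::map<Value *, ScalarParts> ScalarMapStorage;

  ToyValueMap(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  const ScalarParts &initScalar(Value *Key, const ScalarParts &Entry) {
    assert(!ScalarMapStorage.count(Key) && "scalar entry already mapped");
    assert(Entry.size() == UF && "expected UF parts");
    for (const auto &Part : Entry)
      assert(Part.size() == VF && "expected VF lanes per part");
    return ScalarMapStorage[Key] = Entry;
  }
};
```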

Replaced unneeded interleaved access check in vectorizeMemoryInstruction with an assert.

Michael/Adam,

Do you have any more comments on this change? Thanks!

Matt, sorry about the delay.

lib/Transforms/Vectorize/LoopVectorize.cpp
530–538	If we can't change all users to use init*, it would be better to keep both functionalities separate rather than hiding it behind a default parameter. Just keep the ugly get interface as well, so that the users are easy to find. Hopefully we can migrate all users to init in the future.
2554–2556	This is weird in terms of its interface use. getVectorValue computes values and then this one overrides it?! I think we should just stick with the &get interface here and clean it up in the future.
2648	Is this call necessary?
2828–2832	It would be good to explain the scenario under which we have already created a vector value for this value.

Matt, I also had some comments from earlier that I've never sent out. This was about the future direction of get{Scalar,Vector}Value:

In D23169#506915, @mssimpso wrote:

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs, on the other hand, probably does belong to WM because it's part of the whole picture of how values are mapped. So making it a stand-alone utility is probably the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.

Sorry for the delay, thanks for nudging me!

lib/Transforms/Vectorize/LoopVectorize.cpp
526	Do we need the Val parameter? It doesn't look like we call this with a non-default Val anywhere.
2294	This looks a bit odd. When does this happen? I don't think we call getVectorValue() for stores - and if we do, I'm not sure what the expected result is. If it doesn't happen, perhaps this ought to be an assert?
2342	I'd still appreciate a comment (and maybe an assert) that this is for VF==1.
2345	The comment is a bit odd, because the "Otherwise" looks like it refers to the previous comment ("If the value has not been scalarized...") while in fact it refers to the one before ("If the value from the original loop has not been vectorized").
2830	Just to make sure I'm not missing anything - the reason this is needed is because we eagerly created the vector entry in the beginning of vectorizeBlockInLoop(), right? We don't actually have a vector entry at this point, it's just an empty placeholder - but we need to delete it, so that if we need a real vector entry, it will get constructed from scratch from the scalars?

Michael/Adam,

Thanks very much for the comments! No problem about the delay - I'm happy to have the feedback. I've replied to your comments inline, and in one case asked a question about what we want the ValueMap interface change here to be.

In D23169#515712, @anemet wrote:

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs, on the other hand, probably does belong to WM because it's part of the whole picture of how values are mapped. So making it a stand-alone utility is probably the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.

Sorry for not replying to these comments earlier. Yes, I think it makes sense to move the mapping-related tasks out of ILV and into ValueMap. Since getBroadcastInstrs is currently used by both ValueMap and ILV, I think it could actually live in VectorUtils. I think it just needs to know where to insert the code.

lib/Transforms/Vectorize/LoopVectorize.cpp
526	No, I'll remove it. Thanks!
530–538	I actually did change all users to either init(Vector|Scalar) or get(Vector|Scalar)Value. The default parameter here was added as a replacement for the "splat" interface, which I removed. I can add it back, though. But looking over your other comments, I think you're wondering why getVectorValue returns a reference to the map entries that can then be overridden and set by the caller? This is basically the same as the original WidenMap.get, and I agree that's strange. So to be clear, are you suggesting that for now we keep the original "get" and "splat" interfaces alongside the new init* interfaces? Or are you suggesting that we abandon the init* interfaces for now as well?
2294	I agree, I don't think this should happen. This was taken from the scalarization code, and I kept it this way to make the patch as NFC as possible. The only possible, but unlikely, issue I can think of is if we use presence in WidenMap as an indicator for "already vectorized". But again, I don't think we do this. I think it's perfectly sensible to add an assert. I'll update the patch.
2342	Sure, sounds good!
2345	The comments are confusing; I'll fix that. The intent was the following. Case 2: "value has been scalarized." Case 3: "value hasn't been scalarized, but also hasn't been vectorized (VF=1)". Case 4: "value has been vectorized".
2554–2556	Sure. To be clear, getVectorValue returns a reference to an entry (which can be overridden) if it exists in the map, and only computes values if it's not already there.
2648	Yes, the Entry values are set later on in the function. (The diff here is a bit misleading. Here, I had replaced WidenMap.get with getVectorValue).
2828–2832	I think Michael covered this in his comment. I'll update the comment here to explain why the erase is needed.
2830	That's right! I'll update the comment to better clarify this.
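The case order discussed in the 2343 and 2345 replies can be sketched as a single lookup in order of precedence. This is a toy model (ToyState and its fields are hypothetical): values are strings, and the only instruction it can emit is the extractelement of case 4. Case 1 follows the reply to 2343 ("if a value is not an instruction in the loop, it's handled by the first check").

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy state (hypothetical): values are names; emitted instructions are logged.
struct ToyState {
  unsigned VF = 1;
  std::set<std::string> LoopInsts; // instructions defined inside the loop
  std::map<std::string, std::vector<std::string>> VectorMap;              // [UF]
  std::map<std::string, std::vector<std::vector<std::string>>> ScalarMap; // [UF][VF]
  std::vector<std::string> Emitted;
};

// Sketch of the case order described in the review, not the committed code.
std::string getScalarValue(ToyState &S, const std::string &V, unsigned Part,
                           unsigned Lane) {
  // Case 1: not an instruction in the loop, so it can be used as-is.
  if (!S.LoopInsts.count(V))
    return V;
  // Case 2: the value was scalarized; reuse the scalar directly.
  auto SIt = S.ScalarMap.find(V);
  if (SIt != S.ScalarMap.end())
    return SIt->second[Part][Lane];
  // Case 3: not scalarized, and VF == 1, so the "vector" part is a scalar.
  const std::string &VecPart = S.VectorMap.at(V)[Part];
  if (S.VF == 1) {
    assert(Lane == 0 && "only lane 0 exists when VF == 1");
    return VecPart;
  }
  // Case 4: the value was vectorized; extract the requested lane.
  S.Emitted.push_back("extractelement " + VecPart + ", " + std::to_string(Lane));
  return "%e" + std::to_string(S.Emitted.size() - 1);
}
```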

anemet added inline comments. Aug 15 2016, 7:59 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
530–538	Yes, the former. The init stuff is great; my hope is that it will slowly become the main interface along with get{Vector,Scalar}Value. I think we want to use 'get' when all we do is allocate the entries, 'init' when we actually set the value, and get*Value when we let the variants be derived from each other. I think I still don't understand initVector with Val=nullptr. Isn't that just &get? If so, then I think that's what we should use for it, since we're not setting up the entries. How much of this you want to put in this patch or leave for later improvements (by you or others) I leave to you. My goal is to use the interfaces that return mutable references as little as possible and to restrict the current ones to return const & as much as possible.

Addressed comments from Adam and Michael.

This revision changes the ValueMap interface to more closely align with the spirit of Adam's comments. In particular, the init{Vector, Scalar} interface has been changed to accept a reference to {Vector, Scalar}Parts, which are then moved over to the actual map entries. Also, getVectorValue in InnerLoopVectorizer has been modified to return a constant reference.

Taking vectorizeBlockInLoop as an example, instead of getting an empty map entry with WidenMap.get() at the top of the function, we now just allocate a temporary VectorParts. Once we assign all the VectorParts entries, we then move the VectorParts to the map with initVector. So whenever we actually map a value, we use init{Vector, Scalar}. Also, because we no longer eagerly create empty mappings, eraseVector is no longer needed.
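The build-then-publish pattern can be sketched with a toy map (ToyMap and widenInstruction are hypothetical names, and values are plain strings). Parts are filled in a local VectorParts and handed to initVector only when complete, so an instruction that ends up scalarized never leaves an empty vector entry behind, which is why an erase step is no longer needed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using VectorParts = std::vector<std::string>; // toy: values are names

struct ToyMap {
  std::map<std::string, VectorParts> Vector;
  void initVector(const std::string &Key, VectorParts Entry) {
    assert(!Vector.count(Key) && "already mapped");
    Vector.emplace(Key, std::move(Entry));
  }
};

// Hypothetical widening step: fill a local VectorParts and publish it only
// once every part is set. A scalarized instruction never touches the
// vector map, so no placeholder has to be erased afterwards.
bool widenInstruction(ToyMap &M, const std::string &Key, unsigned UF,
                      bool Scalarize) {
  if (Scalarize)
    return false; // the scalar path would call initScalar instead
  VectorParts Entry(UF);
  for (unsigned Part = 0; Part < UF; ++Part)
    Entry[Part] = Key + ".vec" + std::to_string(Part);
  M.initVector(Key, std::move(Entry));
  return true;
}
```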

Most of the changes required to use initVector and the constant reference version of getVectorValue are fairly mechanical. One exception was interleaved accesses, where I had to reorder the code such that we didn't require mutable entries. In a few other cases, we still actually do need to change the mapped entries, and I've left getVector in ValueMap for this, which returns a non-constant reference. These cases include the "fix-up" operations that happen after the first phase of widening (i.e., type truncation and the second widening phase for recurrences).

All the other minor comments have been addressed as well. Thanks!

In D23169#519875, @mssimpso wrote:

Also, because we no longer eagerly create empty mappings, eraseVector is no longer needed.

Hurray!

This is all looking pretty nice now!

lib/Transforms/Vectorize/LoopVectorize.cpp
529–531	Please add a comment discouraging the use of this function in favor of initVector.
536	I don't think this is what you want. This will still do a copy because Entry is a const &. I think that this function should take Entry by value (or by r-value but that would make it harder to use).
2182	This should std::move Entry.
3692	I am not a big fan of auto in cases where the type is not obvious. At least we should have const. Same later.
4252	Would it be safer to do this inside the blocks? I want to make sure you didn't leave an Entry[] = ... without a corresponding initVector.

mssimpso mentioned this in D23509: [LoopVectorize] Query TTI when deciding to splat IV. Aug 22 2016, 9:42 AM

mssimpso added inline comments. Aug 22 2016, 10:25 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
529–531	Sure.
536	Ah, you're right. Thanks for pointing this out. I'll update this to be by value.
2182	Right.
3692	Sure, instead of auto, I'll just make these "const VectorParts &" to make things clear.
4252	I agree. There are a few cases that don't actually use Entry. I'll update the patch.

Addressed Adam's comments.

Regarding the moves, I think this may be a case of unneeded complexity. If the SmallVectors are sized appropriately, a move shouldn't be much better (if better at all) than a copy. So I've reversed my previous comment, left init{Vector, Scalar} taking a constant reference to {Vector, Scalar}Parts, and just removed the std::move calls, which were wrong, as Adam pointed out. The next best thing, I think, would be for them to accept rvalues, but again, that would probably be overly complex for very little gain.
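Adam's point about std::move on a const reference can be checked with a small copy/move-counting type. Counted, storeFromConstRef, and storeByValue are purely illustrative names. std::move applied to a const reference yields a const rvalue, which cannot bind to the move constructor, so overload resolution silently picks the copy constructor.

```cpp
#include <cassert>
#include <utility>

// Toy element type that counts copies and moves (illustration only).
struct Counted {
  static int Copies, Moves;
  Counted() = default;
  Counted(const Counted &) { ++Copies; }
  Counted(Counted &&) noexcept { ++Moves; }
  Counted &operator=(const Counted &) { ++Copies; return *this; }
  Counted &operator=(Counted &&) noexcept { ++Moves; return *this; }
};
int Counted::Copies = 0;
int Counted::Moves = 0;

// std::move on a const reference yields `const Counted &&`, which binds to
// the copy constructor, not the move constructor: the "move" is a copy.
Counted storeFromConstRef(const Counted &Entry) {
  return Counted(std::move(Entry));
}

// Taking the parameter by value lets the caller decide: passing an rvalue
// makes the whole path moves, with no copies.
Counted storeByValue(Counted Entry) { return Counted(std::move(Entry)); }
```

By-value parameters would let callers opt into a move; the patch instead keeps const references and drops the std::move calls, reasoning that for appropriately sized SmallVectors a move gains little over a copy.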

All remaining comments are addressed. Thanks!

Looks great to me! Thanks for your work.

This revision is now accepted and ready to land. Aug 23 2016, 9:39 AM

Closed by commit rL279649: [LV] Unify vector and scalar maps (authored by mssimpso). Aug 24 2016, 11:31 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

lib/Transforms/Vectorize/LoopVectorize.cpp: 307 lines

test/Transforms/LoopVectorize/X86/scatter_crash.ll: 96 lines

test/Transforms/LoopVectorize/if-pred-stores.ll: 10 lines

Diff 66850

lib/Transforms/Vectorize/LoopVectorize.cpp


InnerLoopVectorizer(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
LoopInfo *LI, DominatorTree *DT,		LoopInfo *LI, DominatorTree *DT,
const TargetLibraryInfo *TLI,		const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI, AssumptionCache *AC,		const TargetTransformInfo *TTI, AssumptionCache *AC,
OptimizationRemarkEmitter *ORE, unsigned VecWidth,		OptimizationRemarkEmitter *ORE, unsigned VecWidth,
unsigned UnrollFactor)		unsigned UnrollFactor)
: OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),		: OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),
AC(AC), ORE(ORE), VF(VecWidth), UF(UnrollFactor),		AC(AC), ORE(ORE), VF(VecWidth), UF(UnrollFactor),
Builder(PSE.getSE()->getContext()), Induction(nullptr),		Builder(PSE.getSE()->getContext()), Induction(nullptr),
OldInduction(nullptr), WidenMap(UnrollFactor), TripCount(nullptr),		OldInduction(nullptr), WidenMap(UnrollFactor, VecWidth),
VectorTripCount(nullptr), Legal(nullptr), AddedSafetyChecks(false) {}		TripCount(nullptr), VectorTripCount(nullptr), Legal(nullptr),
		AddedSafetyChecks(false) {}

// Perform the actual loop widening (vectorization).		// Perform the actual loop widening (vectorization).
// MinimumBitWidths maps scalar integer values to the smallest bitwidth they		// MinimumBitWidths maps scalar integer values to the smallest bitwidth they
// can be validly truncated to. The cost model has assumed this truncation		// can be validly truncated to. The cost model has assumed this truncation
// will happen when vectorizing. VecValuesToIgnore contains scalar values		// will happen when vectorizing. VecValuesToIgnore contains scalar values
// that the cost model has chosen to ignore because they will not be		// that the cost model has chosen to ignore because they will not be
// vectorized.		// vectorized.
void vectorize(LoopVectorizationLegality *L,		void vectorize(LoopVectorizationLegality *L,
protected:

/// When we go over instructions in the basic block we rely on previous		/// When we go over instructions in the basic block we rely on previous
/// values within the current basic block or on loop invariant values.		/// values within the current basic block or on loop invariant values.
/// When we widen (vectorize) values we place them in the map. If the values		/// When we widen (vectorize) values we place them in the map. If the values
/// are not within the map, they have to be loop invariant, so we simply		/// are not within the map, they have to be loop invariant, so we simply
/// broadcast them into a vector.		/// broadcast them into a vector.
VectorParts &getVectorValue(Value *V);		VectorParts &getVectorValue(Value *V);

		/// Return a value in the new loop corresponding to \p V from the original
		/// loop at unroll index \p Part and vector index \p Lane. If the value has
		/// been vectorized but not scalarized, the necessary extractelement
		/// instruction will be generated.
		Value *getScalarValue(Value *V, unsigned Part, unsigned Lane);

/// Try to vectorize the interleaved access group that \p Instr belongs to.		/// Try to vectorize the interleaved access group that \p Instr belongs to.
void vectorizeInterleaveGroup(Instruction *Instr);		void vectorizeInterleaveGroup(Instruction *Instr);

/// Generate a shuffle sequence that will reverse the vector Vec.		/// Generate a shuffle sequence that will reverse the vector Vec.
virtual Value *reverseVector(Value *Vec);		virtual Value *reverseVector(Value *Vec);

/// Returns (and creates if needed) the original loop trip count.		/// Returns (and creates if needed) the original loop trip count.
Value *getOrCreateTripCount(Loop *NewLoop);		Value *getOrCreateTripCount(Loop *NewLoop);
protected:

/// This is a helper class that holds the vectorizer state. It maps scalar		/// This is a helper class that holds the vectorizer state. It maps scalar
/// instructions to vector instructions. When the code is 'unrolled' then		/// instructions to vector instructions. When the code is 'unrolled' then
/// then a single scalar value is mapped to multiple vector parts. The parts		/// then a single scalar value is mapped to multiple vector parts. The parts
/// are stored in the VectorPart type.		/// are stored in the VectorPart type.
struct ValueMap {		struct ValueMap {
/// C'tor. UnrollFactor controls the number of vectors ('parts') that		/// C'tor. UnrollFactor controls the number of vectors ('parts') that
/// are mapped.		/// are mapped.
ValueMap(unsigned UnrollFactor) : UF(UnrollFactor) {}		ValueMap(unsigned UnrollFactor, unsigned VecWidth)
		: UF(UnrollFactor), VF(VecWidth) {}

		/// \return True if the map has a vector entry for \p Key.
		bool hasVector(Value *Key) const { return VectorMapStorage.count(Key); }

/// \return True if 'Key' is saved in the Value Map.		/// \return True if the map has a scalar entry for \p Key.
bool has(Value *Key) const { return MapStorage.count(Key); }		bool hasScalar(Value *Key) const { return ScalarMapStorage.count(Key); }

/// Initializes a new entry in the map. Sets all of the vector parts to the		/// Initializes a new entry in the map. Sets all of the vector parts to the
/// save value in 'Val'.		/// save value in 'Val'.
/// \return A reference to a vector with splat values.		/// \return A reference to a vector with splat values.
VectorParts &splat(Value *Key, Value *Val) {		VectorParts &splat(Value *Key, Value *Val) {
VectorParts &Entry = MapStorage[Key];		VectorParts &Entry = VectorMapStorage[Key];
Entry.assign(UF, Val);		Entry.assign(UF, Val);
return Entry;		return Entry;
}		}

///\return A reference to the value that is stored at 'Key'.		/// \return A reference to the vector map entry corresponding to \p Key.
VectorParts &get(Value *Key) {		VectorParts &getVector(Value *Key) {
VectorParts &Entry = MapStorage[Key];		return get(Key, VectorMapStorage, UF);
if (Entry.empty())
Entry.resize(UF);
assert(Entry.size() == UF);
return Entry;
}		}

		/// \return A reference to the scalar map entry corresponding to \p Key.
		VectorParts &getScalar(Value *Key) {
		return get(Key, ScalarMapStorage, UF * VF);
		}

		/// Remove the entry corresponding to \p Key from the vector map.
		bool eraseVector(Value *Key) { return VectorMapStorage.erase(Key); }

private:		private:
/// The unroll factor. Each entry in the map stores this number of vector		/// The unroll factor. Each entry in the vector map contains UF vector
		anemetUnsubmitted Not Done Reply Inline Actions If we can't change all users to use init, it would be better to keep both functionalities separate rather than hiding it behind a default parameter. Just keep the ugly get interface as well, so that the users are easy to find. Hopefully we can migrate all users to init in the future. anemet: If we can't change all users to use init*, it would be better to keep both functionalities…
		mssimpsoAuthorUnsubmitted Not Done Reply Inline Actions I actually did change all users to either init(Vector\|Scalar) or get(Vector\|Scalar)Value. The default parameter here was added as a replacement for the "splat" interface, which I removed. I can add this back though. But looking over your other comments, I think you're wondering why getVectorValue returns a reference to the map entries that are then able to be overridden and set by the caller? This is basically the same as the original WidenMap.get I agree that's, strange. So to be clear, you're suggesting that for now we keep the original "get" and "splat" interfaces along side the new init* interfaces? Or are you suggesting that we abandon the init* interfaces for now as well? mssimpso: I actually did change all users to either init(Vector\|Scalar) or get(Vector\|Scalar)Value. The…
		anemet: Yes, the former. The init stuff is great; my hope is that it will slowly become the main interface along with get{Vector,Scalar}Value. I think we want to use 'get' when all we do is allocate the entries, 'init' when we actually set the value, and getValue when we let the variants be derived from each other. I think I still don't understand initVector with Val=nullptr. Isn't that just &get? If yes, then I think that's what we should use for it, since we're not setting up the entries. How much of this you want to put in this patch or leave for later improvements (by you or others), I leave to you. My goal is to use the interfaces that return mutable references as little as possible and to restrict the current ones to return const & as much as possible.
/// elements.		/// values.
unsigned UF;		unsigned UF;
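A toy sketch of the get/init/getValue split discussed above. It is not the patch's ValueMap: ToyValueMap, its int keys and payloads (standing in for Value * and VectorParts), and all method bodies are assumptions. get() only allocates and returns a mutable entry, initVector() sets every unroll part (the old "splat"), and getVectorValue() returns a const reference. That matches the stated goal of keeping mutable-reference accessors rare and easy to find. It does not model deriving one form from the other.

```cpp
#include <cassert>
#include <map>
#include <vector>

class ToyValueMap {
public:
  using Parts = std::vector<int>;
  explicit ToyValueMap(unsigned UF) : UF(UF) {}

  // Allocate (if needed) and return a mutable entry. Keeping this separate
  // makes the remaining mutable users easy to find and migrate later.
  Parts &get(int Key) {
    Parts &Entry = Storage[Key];
    if (Entry.empty())
      Entry.resize(UF);
    return Entry;
  }

  // Set every unroll part to Val, then expose the entry read-only.
  const Parts &initVector(int Key, int Val) {
    Parts &Entry = get(Key);
    for (int &P : Entry)
      P = Val;
    return Entry;
  }

  bool has(int Key) const { return Storage.count(Key) != 0; }

  // Read-only access to an existing entry.
  const Parts &getVectorValue(int Key) const { return Storage.at(Key); }

private:
  unsigned UF;
  std::map<int, Parts> Storage;
};
```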

/// Map storage. We use std::map and not DenseMap because insertions to a		/// The vectorization factor. Each entry in the scalar map contains UF * VF
/// dense map invalidate its iterators.		/// scalar values.
std::map<Value *, VectorParts> MapStorage;		unsigned VF;

		/// Vector and scalar map storage. We use std::map and not DenseMap because
		/// insertions to a dense map invalidate its iterators.
		std::map<Value *, VectorParts> VectorMapStorage;
		std::map<Value *, VectorParts> ScalarMapStorage;
		mkuper: Perhaps use a 2D vector for this ("ScalarParts"), instead of the 1D VectorParts? I don't think we actually gain anything from using VectorParts for both - even in the one place where we do move a VectorParts from one map to the other (line 2300), you don't actually break the abstraction.
		mssimpso (author): Sounds good to me.
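In the flat layout, a scalar entry holds UF * VF values indexed by VF * Part + Lane. The suggested "ScalarParts" would instead have one row per unroll part and one column per lane. A small sketch of the correspondence, with int standing in for Value * and a hypothetical conversion helper:

```cpp
#include <cassert>
#include <vector>

// Flat layout used in the patch: UF * VF scalars, slot VF * Part + Lane.
using FlatParts = std::vector<int>;
// 2D layout suggested in review: ScalarParts[Part][Lane].
using ScalarParts = std::vector<std::vector<int>>;

// Hypothetical helper: re-shape a flat entry into the 2D form.
ScalarParts toScalarParts(const FlatParts &Flat, unsigned UF, unsigned VF) {
  assert(Flat.size() == UF * VF && "flat entry must hold UF * VF scalars");
  ScalarParts Result(UF, std::vector<int>(VF));
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      Result[Part][Lane] = Flat[VF * Part + Lane];
  return Result;
}
```

The 2D form moves the VF * Part + Lane arithmetic out of every call site and makes a 2D scalar entry impossible to confuse with a 1D VectorParts entry.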

		/// \return A reference to the entry in the given \p Storage map
		/// corresponding to \p Key. The entry is guaranteed to contain \p Size
		/// elements.
		VectorParts &get(Value *Key, std::map<Value *, VectorParts> &Storage,
		unsigned Size) {
		VectorParts &Entry = Storage[Key];
		if (Entry.empty())
		Entry.resize(Size);
		assert(Entry.size() == Size && "Entry has incorrect size");
		return Entry;
		}
};		};
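The std::map choice matters because get() hands out a reference into the map, and callers hold onto that reference while other entries are created. std::map never invalidates references to existing elements on insertion, whereas llvm::DenseMap may grow and move its buckets. The sketch below exercises only the std::map side. It uses std::vector<int> in place of VectorParts and a hypothetical getEntry() that mirrors get():

```cpp
#include <cassert>
#include <map>
#include <vector>

using Parts = std::vector<int>;

// Mirrors get(): fetch-or-create an entry guaranteed to hold Size slots.
Parts &getEntry(std::map<int, Parts> &Storage, int Key, unsigned Size) {
  Parts &Entry = Storage[Key];
  if (Entry.empty())
    Entry.resize(Size);
  assert(Entry.size() == Size && "Entry has incorrect size");
  return Entry;
}
```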

/// The original loop.		/// The original loop.
Loop *OrigLoop;		Loop *OrigLoop;
/// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies		/// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies
/// dynamic knowledge to simplify SCEV expressions and converts them to a		/// dynamic knowledge to simplify SCEV expressions and converts them to a
/// more usable form.		/// more usable form.
PredicatedScalarEvolution &PSE;		PredicatedScalarEvolution &PSE;
[47 unchanged lines omitted]	protected:
BasicBlock *LoopScalarBody;		BasicBlock *LoopScalarBody;
/// A list of all bypass blocks. The first block is the entry of the loop.		/// A list of all bypass blocks. The first block is the entry of the loop.
SmallVector<BasicBlock *, 4> LoopBypassBlocks;		SmallVector<BasicBlock *, 4> LoopBypassBlocks;

/// The new Induction variable which was added to the new block.		/// The new Induction variable which was added to the new block.
PHINode *Induction;		PHINode *Induction;
/// The induction variable of the old basic block.		/// The induction variable of the old basic block.
PHINode *OldInduction;		PHINode *OldInduction;
/// Maps scalars to widened vectors.
ValueMap WidenMap;

/// A map of induction variables from the original loop to their		/// Maps values from the original loop to their corresponding values in the
/// corresponding VF * UF scalarized values in the vectorized loop. The		/// vectorized loop. A key value can map to vector values, scalar
/// purpose of ScalarIVMap is similar to that of WidenMap. Whereas WidenMap		/// values, or both, depending on whether the key was
/// maps original loop values to their vector versions in the new loop,		/// vectorized and/or scalarized.
/// ScalarIVMap maps induction variables from the original loop that are not		ValueMap WidenMap;
		mkuper: Bike-shedding - maybe change the name of the variable? "WidenMap" made sense when it mapped scalar values to the corresponding "wide" vector values.
		mssimpso (author): I actually thought about doing this, but couldn't really come up with anything better. Do you have any ideas? What about VectorLoopMap?
/// vectorized to their scalar equivalents in the vector loop. Maintaining a
/// separate map for scalarized induction variables allows us to avoid
/// unnecessary scalar-to-vector-to-scalar conversions.
DenseMap<Value *, SmallVector<Value *, 8>> ScalarIVMap;

/// Store instructions that should be predicated, as a pair		/// Store instructions that should be predicated, as a pair
/// <StoreInst, Predicate>		/// <StoreInst, Predicate>
SmallVector<std::pair<StoreInst *, Value *>, 4> PredicatedStores;		SmallVector<std::pair<StoreInst *, Value *>, 4> PredicatedStores;
EdgeMaskCache MaskCache;		EdgeMaskCache MaskCache;
/// Trip count of the original loop.		/// Trip count of the original loop.
Value *TripCount;		Value *TripCount;
/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))		/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))
[1,525 unchanged lines omitted]	void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
// We shouldn't have to build scalar steps if we aren't vectorizing.		// We shouldn't have to build scalar steps if we aren't vectorizing.
assert(VF > 1 && "VF should be greater than one");		assert(VF > 1 && "VF should be greater than one");

// Get the value type and ensure it and the step have the same integer type.		// Get the value type and ensure it and the step have the same integer type.
Type *ScalarIVTy = ScalarIV->getType()->getScalarType();		Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&		assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&
"Val and Step should have the same integer type");		"Val and Step should have the same integer type");

// Compute the scalar steps and save the results in ScalarIVMap.		// Compute the scalar steps and save the results in WidenMap.
		auto &Entry = WidenMap.getScalar(EntryVal);
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
for (unsigned I = 0; I < VF; ++I) {		for (unsigned I = 0; I < VF; ++I) {
		mkuper: I -> Lane ?
		mssimpso (author): Makes sense.
auto StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + I);		auto StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + I);
auto *Mul = Builder.CreateMul(StartIdx, Step);		auto *Mul = Builder.CreateMul(StartIdx, Step);
auto *Add = Builder.CreateAdd(ScalarIV, Mul);		auto *Add = Builder.CreateAdd(ScalarIV, Mul);
ScalarIVMap[EntryVal].push_back(Add);		Entry[VF * Part + I] = Add;
}		}
}		}
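Numerically, buildScalarSteps fills slot VF * Part + Lane with ScalarIV + (VF * Part + Lane) * Step. The integer-only sketch below mirrors that loop with int64_t in place of IR values and a hypothetical free function. It makes the indexing concrete and does not reproduce the IRBuilder calls:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Slot VF * Part + Lane receives ScalarIV + (VF * Part + Lane) * Step.
std::vector<int64_t> buildScalarSteps(int64_t ScalarIV, int64_t Step,
                                      unsigned UF, unsigned VF) {
  assert(VF > 1 && "VF should be greater than one");
  std::vector<int64_t> Entry(UF * VF);
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < VF; ++Lane) {
      int64_t StartIdx = VF * Part + Lane;
      Entry[VF * Part + Lane] = ScalarIV + StartIdx * Step;
    }
  return Entry;
}
```

With ScalarIV = 10, Step = 3, UF = 2 and VF = 4, the entry holds 10, 13, 16, ..., 31. These are the values a scalar user reads directly, without any insertelement/extractelement round trip.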

		anemet: This should std::move Entry.
		mssimpso (author): Right.
int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {		int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {
assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");		assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");
auto *SE = PSE.getSE();		auto *SE = PSE.getSE();
// Make sure that the pointer does not point to structs.		// Make sure that the pointer does not point to structs.
if (Ptr->getType()->getPointerElementType()->isAggregateType())		if (Ptr->getType()->getPointerElementType()->isAggregateType())
return 0;		return 0;

// If this value is a pointer induction variable, we know it is consecutive.		// If this value is a pointer induction variable, we know it is consecutive.
[83 unchanged lines omitted]	InnerLoopVectorizer::getVectorValue(Value *V) {
assert(V != Induction && "The new induction variable should not be used.");		assert(V != Induction && "The new induction variable should not be used.");
assert(!V->getType()->isVectorTy() && "Can't widen a vector");		assert(!V->getType()->isVectorTy() && "Can't widen a vector");

// If we have a stride that is replaced by one, do it here.		// If we have a stride that is replaced by one, do it here.
if (Legal->hasStride(V))		if (Legal->hasStride(V))
V = ConstantInt::get(V->getType(), 1);		V = ConstantInt::get(V->getType(), 1);

// If we have this scalar in the map, return it.		// If we have this scalar in the map, return it.
if (WidenMap.has(V))		if (WidenMap.hasVector(V))
return WidenMap.get(V);		return WidenMap.getVector(V);

		// If the value has not been vectorized, check if it has been scalarized
		// instead. If it has been scalarized, and we actually need the value in
		// vector form, we will construct the vector values on demand.
		if (WidenMap.hasScalar(V)) {

		// If V doesn't produce a value, just create an empty vector entry for it
		// in WidenMap.
		if (V->getType()->isVoidTy())
		return WidenMap.splat(V, nullptr);

		mkuper: This looks a bit odd. When does this happen? I don't think we call getVectorValue() for stores - and if we do, I'm not sure what the expected result is. If it doesn't happen, perhaps this ought to be an assert?
		mssimpso (author): I agree, I don't think this should happen. This was taken from the scalarization code, and I kept it this way to make the patch as NFC as possible. The only possible, but unlikely, issue I can think of is if we use presence in WidenMap as an indicator for "already vectorized". But again, I don't think we do this. I think it's perfectly sensible to add an assert. I'll update the patch.
		// Get the vector map entry.
		auto &Parts = WidenMap.getVector(V);

		// If we aren't vectorizing, we can just copy the scalar map values over to
		// the vector map.
		if (VF == 1) {
		for (unsigned Part = 0; Part < UF; ++Part)
		Parts[Part] = getScalarValue(V, Part, 0);
		return Parts;
		}

		// However, if we are vectorizing, we need to construct the vector values
		// using insertelement instructions. Since the resulting vectors are stored
		// in WidenMap, we will only generate the insertelements once.
		for (unsigned Part = 0; Part < UF; ++Part) {
		Parts[Part] = UndefValue::get(VectorType::get(V->getType(), VF));
		mkuper: For clarity, I'd prefer a temp value here, and an assignment to Parts[Part] in the end.
		mssimpso (author): Sure.
		for (unsigned Width = 0; Width < VF; ++Width)
		Parts[Part] = Builder.CreateInsertElement(
		Parts[Part], getScalarValue(V, Part, Width),
		Builder.getInt32(Width));
		}
		return Parts;
		}

// If this scalar is unknown, assume that it is a constant or that it is		// If this scalar is unknown, assume that it is a constant or that it is
// loop invariant. Broadcast V and save the value for future uses.		// loop invariant. Broadcast V and save the value for future uses.
Value *B = getBroadcastInstrs(V);		Value *B = getBroadcastInstrs(V);
return WidenMap.splat(V, B);		return WidenMap.splat(V, B);
}		}
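The on-demand path in getVectorValue works like a cache. If a value has only a scalar entry, UF vectors are built lane by lane from its UF * VF scalars, stored, and reused on later queries. In the toy model below, all names are hypothetical: int stands in for Value *, a std::vector<int> stands in for a vector register, and InsertCount counts the insertelement instructions the real code would emit. It covers only the scalarized case, not the VF == 1 copy or the broadcast fallback, and it builds each vector in a temporary before storing it, as requested in review.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct ToyVectorizer {
  unsigned UF, VF;
  std::map<int, std::vector<std::vector<int>>> VectorMap; // UF vectors
  std::map<int, std::vector<int>> ScalarMap;              // UF * VF scalars
  unsigned InsertCount = 0;

  ToyVectorizer(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  const std::vector<std::vector<int>> &getVectorValue(int V) {
    // Already vectorized (or already rebuilt): reuse the cached entry.
    auto It = VectorMap.find(V);
    if (It != VectorMap.end())
      return It->second;
    assert(ScalarMap.count(V) && "toy model only handles scalarized values");
    const std::vector<int> &Scalars = ScalarMap.at(V);
    std::vector<std::vector<int>> Parts(UF);
    for (unsigned Part = 0; Part < UF; ++Part) {
      std::vector<int> Vec(VF); // temp value, playing the undef vector
      for (unsigned Lane = 0; Lane < VF; ++Lane) {
        Vec[Lane] = Scalars[VF * Part + Lane];
        ++InsertCount; // one insertelement per lane
      }
      Parts[Part] = Vec;
    }
    // Cache the result so the insertelements are generated only once.
    return VectorMap[V] = std::move(Parts);
  }
};
```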

		Value *InnerLoopVectorizer::getScalarValue(Value *V, unsigned Part,
		unsigned Lane) {

		// If the value is not an instruction contained in the loop, it should
		// already be scalar.
		if (OrigLoop->isLoopInvariant(V))
		return V;

		// If the value from the original loop has not been vectorized, it is
		// represented by UF * VF scalar values in the new loop. Return the requested
		// scalar value.
		if (WidenMap.hasScalar(V))
		return WidenMap.getScalar(V)[VF * Part + Lane];

		// If the value has not been scalarized, it may have been vectorized. Get the
		// value corresponding to the requested unroll index.
		auto *U = getVectorValue(V)[Part];
		if (!U->getType()->isVectorTy())
		mkuper: I'd still appreciate a comment (and maybe an assert) that this is for VF==1.
		mssimpso (author): Sure, sounds good!
		return U;
		mkuper: Is this for VF==1, for uniform values, or both?
		mssimpso (author): Nice question! This is for the VF == 1 case, and it prevents us from trying to extract an element from a non-vector type. I moved this check from the existing scalarization code (line 2827). But we could replace the if condition here with VF == 1 if that makes more sense. To be clear, if a value is not an instruction in the loop, it's handled by the first check. For the instructions in Uniforms, I believe we will vectorize/scalarize them, and then the unused pieces will be later deleted.

		// Otherwise, the value from the original loop has been vectorized and is
		mkuper: The comment is a bit odd, because the "Otherwise" looks like it refers to the previous comment ("If the value has not been scalarized...") while in fact it refers to the one before ("If the value from the original loop has not been vectorized").
		mssimpso (author): The comments are confusing; I'll fix that. The intent was the following. Case 2: "value has been scalarized." Case 3: "value hasn't been scalarized, but also hasn't been vectorized (VF=1)". Case 4: "value has been vectorized".
		// represented by UF vector values. Extract and return the requested scalar
		// value from the appropriate vector lane.
		return Builder.CreateExtractElement(U, Builder.getInt32(Lane));
		}
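getScalarValue checks its cases in a fixed order. Loop-invariant values are returned as-is. Scalarized values are indexed directly at VF * Part + Lane. Otherwise the vectorized part is used: directly when VF == 1, following the review discussion, or through an extractelement. A toy model of that order, with int values, hypothetical names, and a counter in place of emitted extracts (a VF == 1 "vector" is modeled as a one-lane vector):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

struct ToyScalarLookup {
  unsigned UF, VF;
  std::set<int> LoopInvariant;
  std::map<int, std::vector<int>> ScalarMap;              // UF * VF scalars
  std::map<int, std::vector<std::vector<int>>> VectorMap; // UF vectors
  unsigned ExtractCount = 0;

  ToyScalarLookup(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  int getScalarValue(int V, unsigned Part, unsigned Lane) {
    // Case 1: not defined in the loop, so it is already scalar.
    if (LoopInvariant.count(V))
      return V;
    // Case 2: scalarized, so read the flat UF * VF entry directly.
    auto S = ScalarMap.find(V);
    if (S != ScalarMap.end())
      return S->second[VF * Part + Lane];
    const std::vector<int> &U = VectorMap.at(V)[Part];
    // Case 3: VF == 1, so the "vector" part is itself the scalar.
    if (VF == 1)
      return U[0];
    // Case 4: vectorized, so extract the requested lane.
    ++ExtractCount;
    return U[Lane];
  }
};
```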

Value *InnerLoopVectorizer::reverseVector(Value *Vec) {		Value *InnerLoopVectorizer::reverseVector(Value *Vec) {
assert(Vec->getType()->isVectorTy() && "Invalid type");		assert(Vec->getType()->isVectorTy() && "Invalid type");
SmallVector<Constant *, 8> ShuffleMask;		SmallVector<Constant *, 8> ShuffleMask;
for (unsigned i = 0; i < VF; ++i)		for (unsigned i = 0; i < VF; ++i)
ShuffleMask.push_back(Builder.getInt32(VF - i - 1));		ShuffleMask.push_back(Builder.getInt32(VF - i - 1));

return Builder.CreateShuffleVector(Vec, UndefValue::get(Vec->getType()),		return Builder.CreateShuffleVector(Vec, UndefValue::get(Vec->getType()),
ConstantVector::get(ShuffleMask),		ConstantVector::get(ShuffleMask),
[139 unchanged lines omitted]	void InnerLoopVectorizer::vectorizeInterleaveGroup(Instruction *Instr) {
// Prepare for the vector type of the interleaved load/store.		// Prepare for the vector type of the interleaved load/store.
Type *ScalarTy = LI ? LI->getType() : SI->getValueOperand()->getType();		Type *ScalarTy = LI ? LI->getType() : SI->getValueOperand()->getType();
unsigned InterleaveFactor = Group->getFactor();		unsigned InterleaveFactor = Group->getFactor();
Type *VecTy = VectorType::get(ScalarTy, InterleaveFactor * VF);		Type *VecTy = VectorType::get(ScalarTy, InterleaveFactor * VF);
Type *PtrTy = VecTy->getPointerTo(Ptr->getType()->getPointerAddressSpace());		Type *PtrTy = VecTy->getPointerTo(Ptr->getType()->getPointerAddressSpace());

// Prepare for the new pointers.		// Prepare for the new pointers.
setDebugLocFromInst(Builder, Ptr);		setDebugLocFromInst(Builder, Ptr);
VectorParts &PtrParts = getVectorValue(Ptr);
SmallVector<Value *, 2> NewPtrs;		SmallVector<Value *, 2> NewPtrs;
unsigned Index = Group->getIndex(Instr);		unsigned Index = Group->getIndex(Instr);
for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
// Extract the pointer for current instruction from the pointer vector. A		Value *NewPtr = getScalarValue(Ptr, Part, Group->isReverse() ? VF - 1 : 0);
// reverse access uses the pointer in the last lane.
Value *NewPtr = Builder.CreateExtractElement(
PtrParts[Part],
Group->isReverse() ? Builder.getInt32(VF - 1) : Builder.getInt32(0));

// Notice current instruction could be any index. Need to adjust the address		// Notice current instruction could be any index. Need to adjust the address
// to the member of index 0.		// to the member of index 0.
//		//
// E.g. a = A[i+1]; // Member of index 1 (Current instruction)		// E.g. a = A[i+1]; // Member of index 1 (Current instruction)
// b = A[i]; // Member of index 0		// b = A[i]; // Member of index 0
// Current pointer is pointed to A[i+1], adjust it to A[i].		// Current pointer is pointed to A[i+1], adjust it to A[i].
//		//
[28 unchanged lines omitted]	for (unsigned Part = 0; Part < UF; Part++) {
NewLoadInstr, UndefVec, StrideMask, "strided.vec");		NewLoadInstr, UndefVec, StrideMask, "strided.vec");

// If this member has different type, cast the result type.		// If this member has different type, cast the result type.
if (Member->getType() != ScalarTy) {		if (Member->getType() != ScalarTy) {
VectorType *OtherVTy = VectorType::get(Member->getType(), VF);		VectorType *OtherVTy = VectorType::get(Member->getType(), VF);
StridedVec = Builder.CreateBitOrPointerCast(StridedVec, OtherVTy);		StridedVec = Builder.CreateBitOrPointerCast(StridedVec, OtherVTy);
}		}

VectorParts &Entry = WidenMap.get(Member);		VectorParts &Entry = WidenMap.getVector(Member);
Entry[Part] =		Entry[Part] =
Group->isReverse() ? reverseVector(StridedVec) : StridedVec;		Group->isReverse() ? reverseVector(StridedVec) : StridedVec;
		anemet: This is weird in terms of its interface use. getVectorValue computes values and then this one overrides it?! I think we should just stick with the &get interface here and clean it up in the future.
		mssimpso (author): Sure. To be clear, getVectorValue returns a reference to an entry (which can be overridden) if it exists in the map, and only computes values if it's not already there.
}		}

addMetadata(NewLoadInstr, Instr);		addMetadata(NewLoadInstr, Instr);
}		}
return;		return;
}		}

// The sub vector type for current instruction.		// The sub vector type for current instruction.
[75 unchanged lines omitted]	void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
bool Reverse = ConsecutiveStride < 0;		bool Reverse = ConsecutiveStride < 0;
bool CreateGatherScatter =		bool CreateGatherScatter =
!ConsecutiveStride && ((LI && Legal->isLegalMaskedGather(ScalarDataTy)) ||		!ConsecutiveStride && ((LI && Legal->isLegalMaskedGather(ScalarDataTy)) ||
(SI && Legal->isLegalMaskedScatter(ScalarDataTy)));		(SI && Legal->isLegalMaskedScatter(ScalarDataTy)));

if (!ConsecutiveStride && !CreateGatherScatter)		if (!ConsecutiveStride && !CreateGatherScatter)
return scalarizeInstruction(Instr);		return scalarizeInstruction(Instr);

Constant *Zero = Builder.getInt32(0);		VectorParts &Entry = WidenMap.getVector(Instr);
		anemet: Is this call necessary?
		mssimpso (author): Yes, the Entry values are set later on in the function. (The diff here is a bit misleading. Here, I had replaced WidenMap.get with getVectorValue.)
VectorParts &Entry = WidenMap.get(Instr);
VectorParts VectorGep;		VectorParts VectorGep;

// Handle consecutive loads/stores.		// Handle consecutive loads/stores.
GetElementPtrInst *Gep = getGEPInstruction(Ptr);		GetElementPtrInst *Gep = getGEPInstruction(Ptr);
if (ConsecutiveStride) {		if (ConsecutiveStride) {
if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {		if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {
setDebugLocFromInst(Builder, Gep);		setDebugLocFromInst(Builder, Gep);
Value *PtrOperand = Gep->getPointerOperand();		auto *FirstBasePtr = getScalarValue(Gep->getPointerOperand(), 0, 0);
Value *FirstBasePtr = getVectorValue(PtrOperand)[0];
FirstBasePtr = Builder.CreateExtractElement(FirstBasePtr, Zero);

// Create the new GEP with the new induction variable.		// Create the new GEP with the new induction variable.
GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());		GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());
Gep2->setOperand(0, FirstBasePtr);		Gep2->setOperand(0, FirstBasePtr);
Gep2->setName("gep.indvar.base");		Gep2->setName("gep.indvar.base");
Ptr = Builder.Insert(Gep2);		Ptr = Builder.Insert(Gep2);
} else if (Gep) {		} else if (Gep) {
setDebugLocFromInst(Builder, Gep);		setDebugLocFromInst(Builder, Gep);
[14 unchanged lines omitted]	if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {
// Update last index or loop invariant instruction anchored in loop.		// Update last index or loop invariant instruction anchored in loop.
if (i == InductionOperand ||		if (i == InductionOperand ||
(GepOperandInst && OrigLoop->contains(GepOperandInst))) {		(GepOperandInst && OrigLoop->contains(GepOperandInst))) {
assert((i == InductionOperand ||		assert((i == InductionOperand ||
PSE.getSE()->isLoopInvariant(PSE.getSCEV(GepOperandInst),		PSE.getSE()->isLoopInvariant(PSE.getSCEV(GepOperandInst),
OrigLoop)) &&		OrigLoop)) &&
"Must be last index or loop invariant");		"Must be last index or loop invariant");

VectorParts &GEPParts = getVectorValue(GepOperand);		Gep2->setOperand(i, getScalarValue(GepOperand, 0, 0));

// If GepOperand is an induction variable, and there's a scalarized
// version of it available, use it. Otherwise, we will need to create
// an extractelement instruction.
Value *Index = ScalarIVMap.count(GepOperand)
? ScalarIVMap[GepOperand][0]
: Builder.CreateExtractElement(GEPParts[0], Zero);

Gep2->setOperand(i, Index);
Gep2->setName("gep.indvar.idx");		Gep2->setName("gep.indvar.idx");
}		}
}		}
Ptr = Builder.Insert(Gep2);		Ptr = Builder.Insert(Gep2);
} else { // No GEP		} else { // No GEP
// Use the induction element ptr.		// Use the induction element ptr.
assert(isa<PHINode>(Ptr) && "Invalid induction ptr");		assert(isa<PHINode>(Ptr) && "Invalid induction ptr");
setDebugLocFromInst(Builder, Ptr);		setDebugLocFromInst(Builder, Ptr);
VectorParts &PtrVal = getVectorValue(Ptr);		Ptr = getScalarValue(Ptr, 0, 0);
Ptr = Builder.CreateExtractElement(PtrVal[0], Zero);
}		}
} else {		} else {
// At this point we should have a vector version of GEP for Gather or Scatter		// At this point we should have a vector version of GEP for Gather or Scatter
assert(CreateGatherScatter && "The instruction should be scalarized");		assert(CreateGatherScatter && "The instruction should be scalarized");
if (Gep) {		if (Gep) {
// Vectorizing GEP, across UF parts. We want to get a vector value for base		// Vectorizing GEP, across UF parts. We want to get a vector value for base
// and each index that's defined inside the loop, even if it is		// and each index that's defined inside the loop, even if it is
// loop-invariant but wasn't hoisted out. Otherwise we want to keep them		// loop-invariant but wasn't hoisted out. Otherwise we want to keep them
[112 unchanged lines omitted]
void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,		void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,
bool IfPredicateStore) {		bool IfPredicateStore) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
// Holds vector parameters or scalars, in case of uniform vals.		// Holds vector parameters or scalars, in case of uniform vals.
SmallVector<VectorParts, 4> Params;		SmallVector<VectorParts, 4> Params;

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);

// Find all of the vectorized parameters.
for (Value *SrcOp : Instr->operands()) {
// If we are accessing the old induction variable, use the new one.
if (SrcOp == OldInduction) {
Params.push_back(getVectorValue(SrcOp));
continue;
}

// Try using previously calculated values.
auto *SrcInst = dyn_cast<Instruction>(SrcOp);

// If the src is an instruction that appeared earlier in the basic block,
// then it should already be vectorized.
if (SrcInst && OrigLoop->contains(SrcInst)) {
assert(WidenMap.has(SrcInst) && "Source operand is unavailable");
// The parameter is a vector value from earlier.
Params.push_back(WidenMap.get(SrcInst));
} else {
// The parameter is a scalar from outside the loop. Maybe even a constant.
VectorParts Scalars;
Scalars.append(UF, SrcOp);
Params.push_back(Scalars);
}
}

assert(Params.size() == Instr->getNumOperands() &&
"Invalid number of operands");

// Does this instruction return a value?		// Does this instruction return a value?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

Value *UndefVec =		// The instruction will not be vectorized. Erase its vector entry from
IsVoidRetTy ? nullptr		// WidenMap and get a new scalar entry instead.
: UndefValue::get(VectorType::get(Instr->getType(), VF));		WidenMap.eraseVector(Instr);
		mkuper: Just to make sure I'm not missing anything - the reason this is needed is because we eagerly created the vector entry in the beginning of vectorizeBlockInLoop(), right? We don't actually have a vector entry at this point, it's just an empty placeholder - but we need to delete it, so that if we need a real vector entry, it will get constructed from scratch from the scalars?
		mssimpso (author): That's right! I'll update the comment to better clarify this.
// Create a new entry in the WidenMap and initialize it to Undef or Null.		auto &Entry = WidenMap.getScalar(Instr);
VectorParts &VecResults = WidenMap.splat(Instr, UndefVec);

		anemet: It would be good to explain the scenario under which we have already created a vector value for this value.
		mssimpso (author): I think Michael covered this in his comment. I'll update the comment here to explain why the erase is needed.
VectorParts Cond;		VectorParts Cond;
if (IfPredicateStore) {		if (IfPredicateStore) {
assert(Instr->getParent()->getSinglePredecessor() &&		assert(Instr->getParent()->getSinglePredecessor() &&
"Only support single predecessor blocks");		"Only support single predecessor blocks");
Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),		Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),
Instr->getParent());		Instr->getParent());
}		}

// For each vector unroll 'part':		// For each vector unroll 'part':
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
// For each scalar that we create:		// For each scalar that we create:
for (unsigned Width = 0; Width < VF; ++Width) {		for (unsigned Width = 0; Width < VF; ++Width) {

// Start if-block.		// Start if-block.
Value *Cmp = nullptr;		Value *Cmp = nullptr;
if (IfPredicateStore) {		if (IfPredicateStore) {
Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Width));		Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Width));
Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,		Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,
ConstantInt::get(Cmp->getType(), 1));		ConstantInt::get(Cmp->getType(), 1));
}		}

Instruction *Cloned = Instr->clone();		Instruction *Cloned = Instr->clone();
if (!IsVoidRetTy)		if (!IsVoidRetTy)
Cloned->setName(Instr->getName() + ".cloned");		Cloned->setName(Instr->getName() + ".cloned");
// Replace the operands of the cloned instructions with extracted scalars.
for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {

// If the operand is an induction variable, and there's a scalarized		// Replace the operands of the cloned instructions with their scalar
// version of it available, use it. Otherwise, we will need to create		// equivalents in the new loop.
// an extractelement instruction if vectorizing.		for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
auto *NewOp = Params[op][Part];		auto *NewOp = getScalarValue(Instr->getOperand(op), Part, Width);
auto *ScalarOp = Instr->getOperand(op);
if (ScalarIVMap.count(ScalarOp))
NewOp = ScalarIVMap[ScalarOp][VF * Part + Width];
else if (NewOp->getType()->isVectorTy())
NewOp = Builder.CreateExtractElement(NewOp, Builder.getInt32(Width));
Cloned->setOperand(op, NewOp);		Cloned->setOperand(op, NewOp);
}		}
addNewMetadata(Cloned, Instr);		addNewMetadata(Cloned, Instr);

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

		// Add the cloned scalar to WidenMap.
		Entry[VF * Part + Width] = Cloned;

// If we just cloned a new assumption, add it to the assumption cache.		// If we just cloned a new assumption, add it to the assumption cache.
if (auto *II = dyn_cast<IntrinsicInst>(Cloned))		if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
if (II->getIntrinsicID() == Intrinsic::assume)		if (II->getIntrinsicID() == Intrinsic::assume)
AC->registerAssumption(II);		AC->registerAssumption(II);

// If the original scalar returns a value we need to place it in a vector
// so that future users will be able to use it.
if (!IsVoidRetTy)
VecResults[Part] = Builder.CreateInsertElement(VecResults[Part], Cloned,
Builder.getInt32(Width));
// End if-block.		// End if-block.
if (IfPredicateStore)		if (IfPredicateStore)
PredicatedStores.push_back(		PredicatedStores.push_back(
std::make_pair(cast<StoreInst>(Cloned), Cmp));		std::make_pair(cast<StoreInst>(Cloned), Cmp));
}		}
}		}
}		}
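The eraseVector call in scalarizeInstruction exists because an empty vector entry was created eagerly, before the instruction was known to be scalarized (per the review thread above). If that placeholder were kept, a later request for the vector form would find it and return it instead of rebuilding the vectors from the scalars. A toy sketch of both outcomes. All names are hypothetical, and -1 marks an unset slot in place of nullptr.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct ToyMaps {
  unsigned UF, VF;
  std::map<int, std::vector<int>> VectorMap; // UF slots, one per part
  std::map<int, std::vector<int>> ScalarMap; // UF * VF slots
  ToyMaps(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  // Eager placeholder, created before we know how I will be lowered.
  void createPlaceholder(int I) { VectorMap[I].assign(UF, -1); }

  // Scalarization: optionally drop the placeholder, then record UF * VF
  // clones (encoded as I * 100 + slot index).
  void scalarize(int I, bool EraseVector) {
    if (EraseVector)
      VectorMap.erase(I);
    std::vector<int> &Entry = ScalarMap[I];
    Entry.resize(UF * VF);
    for (unsigned Part = 0; Part < UF; ++Part)
      for (unsigned Lane = 0; Lane < VF; ++Lane)
        Entry[VF * Part + Lane] = I * 100 + VF * Part + Lane;
  }

  // Would a consumer asking for the vector form get real values?
  bool vectorFormIsReal(int I) const {
    auto It = VectorMap.find(I);
    if (It == VectorMap.end())
      return ScalarMap.count(I) != 0; // rebuilt on demand from scalars
    return It->second[0] != -1;        // stale placeholder returned as-is
  }
};
```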

[629 unchanged lines omitted]

void InnerLoopVectorizer::truncateToMinimalBitwidths() {		void InnerLoopVectorizer::truncateToMinimalBitwidths() {
// For every instruction `I` in MinBWs, truncate the operands, create a		// For every instruction `I` in MinBWs, truncate the operands, create a
// truncated version of `I` and reextend its result. InstCombine runs		// truncated version of `I` and reextend its result. InstCombine runs
// later and will remove any ext/trunc pairs.		// later and will remove any ext/trunc pairs.
//		//
SmallPtrSet<Value *, 4> Erased;		SmallPtrSet<Value *, 4> Erased;
for (const auto &KV : *MinBWs) {		for (const auto &KV : *MinBWs) {
VectorParts &Parts = WidenMap.get(KV.first);		VectorParts &Parts = WidenMap.getVector(KV.first);
for (Value *&I : Parts) {		for (Value *&I : Parts) {
if (Erased.count(I) || I->use_empty() || !isa<Instruction>(I))		if (Erased.count(I) || I->use_empty() || !isa<Instruction>(I))
continue;		continue;
Type *OriginalTy = I->getType();		Type *OriginalTy = I->getType();
Type *ScalarTruncatedTy =		Type *ScalarTruncatedTy =
IntegerType::get(OriginalTy->getContext(), KV.second);		IntegerType::get(OriginalTy->getContext(), KV.second);
Type *TruncatedTy = VectorType::get(ScalarTruncatedTy,		Type *TruncatedTy = VectorType::get(ScalarTruncatedTy,
OriginalTy->getVectorNumElements());		OriginalTy->getVectorNumElements());
[75 lines not shown]	for (Value *&I : Parts) {
cast<Instruction>(I)->eraseFromParent();		cast<Instruction>(I)->eraseFromParent();
Erased.insert(I);		Erased.insert(I);
I = Res;		I = Res;
}		}
}		}

// We'll have created a bunch of ZExts that are now parentless. Clean up.		// We'll have created a bunch of ZExts that are now parentless. Clean up.
for (const auto &KV : *MinBWs) {		for (const auto &KV : *MinBWs) {
VectorParts &Parts = WidenMap.get(KV.first);		VectorParts &Parts = WidenMap.getVector(KV.first);
for (Value *&I : Parts) {		for (Value *&I : Parts) {
ZExtInst *Inst = dyn_cast<ZExtInst>(I);		ZExtInst *Inst = dyn_cast<ZExtInst>(I);
if (Inst && Inst->use_empty()) {		if (Inst && Inst->use_empty()) {
Value *NewI = Inst->getOperand(0);		Value *NewI = Inst->getOperand(0);
Inst->eraseFromParent();		Inst->eraseFromParent();
I = NewI;		I = NewI;
}		}
}		}
[61 lines not shown]	for (PHINode *Phi : PHIsToFix) {

// We need to generate a reduction vector from the incoming scalar.		// We need to generate a reduction vector from the incoming scalar.
// To do so, we need to generate the 'identity' vector and override		// To do so, we need to generate the 'identity' vector and override
// one of the elements with the incoming scalar reduction. We need		// one of the elements with the incoming scalar reduction. We need
// to do it in the vector-loop preheader.		// to do it in the vector-loop preheader.
Builder.SetInsertPoint(LoopBypassBlocks[1]->getTerminator());		Builder.SetInsertPoint(LoopBypassBlocks[1]->getTerminator());

// This is the vector-clone of the value that leaves the loop.		// This is the vector-clone of the value that leaves the loop.
VectorParts &VectorExit = getVectorValue(LoopExitInst);		VectorParts &VectorExit = getVectorValue(LoopExitInst);
		anemet: I am not a big fan of auto in cases where the type is not obvious. At least we should have const. Same later.
		mssimpso (Author): Sure, instead of auto, I'll just make these "const VectorParts &" to make things clear.
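The thread above settles on spelling the type as const VectorParts &. One concrete reason is that getVectorValue returns a reference. Plain auto would silently deduce a copy of the parts, while const VectorParts & binds to the map's storage and states the read-only intent. The sketch below only demonstrates the C++ deduction rules involved. It assumes std::vector<int> as a stand-in for VectorParts (a SmallVector of Value * in the vectorizer) and uses a hypothetical PartsMap accessor.

```cpp
#include <cassert>
#include <vector>

// Stand-in for VectorParts; std::vector<int> keeps the sketch self-contained.
using VectorParts = std::vector<int>;

// Hypothetical map whose accessor returns a reference into its own storage.
struct PartsMap {
  VectorParts Storage{1, 2};
  VectorParts &getVector() { return Storage; }
};

// `auto` drops the reference and deduces VectorParts, so this is a copy.
bool autoCopies(PartsMap &M) {
  auto Copy = M.getVector();
  M.Storage[0] = 42;
  return Copy[0] == 1; // the copy still holds the old value
}

// `const VectorParts &` binds to the map's storage and documents that the
// caller only reads it.
bool constRefAliases(PartsMap &M) {
  const VectorParts &Ref = M.getVector();
  M.Storage[0] = 7;
  return Ref[0] == 7; // the reference observes the update
}
```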
Type *VecTy = VectorExit[0]->getType();		Type *VecTy = VectorExit[0]->getType();

// Find the reduction identity variable. Zero for addition, or, xor,		// Find the reduction identity variable. Zero for addition, or, xor,
// one for multiplication, -1 for And.		// one for multiplication, -1 for And.
Value *Identity;		Value *Identity;
Value *VectorStart;		Value *VectorStart;
if (RK == RecurrenceDescriptor::RK_IntegerMinMax ||		if (RK == RecurrenceDescriptor::RK_IntegerMinMax ||
RK == RecurrenceDescriptor::RK_FloatMinMax) {		RK == RecurrenceDescriptor::RK_FloatMinMax) {
[22 lines not shown]	if (RK == RecurrenceDescriptor::RK_IntegerMinMax ||
Builder.CreateInsertElement(Identity, ReductionStartValue, Zero);		Builder.CreateInsertElement(Identity, ReductionStartValue, Zero);
}		}
}		}

// Fix the vector-loop phi.		// Fix the vector-loop phi.

// Reductions do not have to start at zero. They can start with		// Reductions do not have to start at zero. They can start with
// any loop invariant values.		// any loop invariant values.
VectorParts &VecRdxPhi = WidenMap.get(Phi);		VectorParts &VecRdxPhi = WidenMap.getVector(Phi);
BasicBlock *Latch = OrigLoop->getLoopLatch();		BasicBlock *Latch = OrigLoop->getLoopLatch();
Value *LoopVal = Phi->getIncomingValueForBlock(Latch);		Value *LoopVal = Phi->getIncomingValueForBlock(Latch);
VectorParts &Val = getVectorValue(LoopVal);		VectorParts &Val = getVectorValue(LoopVal);
for (unsigned part = 0; part < UF; ++part) {		for (unsigned part = 0; part < UF; ++part) {
// Make sure to add the reduction start value only to the		// Make sure to add the reduction start value only to the
// first unroll part.		// first unroll part.
Value *StartVal = (part == 0) ? VectorStart : Identity;		Value *StartVal = (part == 0) ? VectorStart : Identity;
cast<PHINode>(VecRdxPhi[part])		cast<PHINode>(VecRdxPhi[part])
[504 lines not shown]	case InductionDescriptor::IK_FpInduction: {
return;		return;
}		}
}		}
}		}

void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {		void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {
// For each instruction in the old loop.		// For each instruction in the old loop.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
VectorParts &Entry = WidenMap.get(&I);		VectorParts &Entry = WidenMap.getVector(&I);
		anemet: Would it be safer to do this inside the blocks? I want to make sure you didn't leave an Entry[] = ... without a corresponding initVector.
		mssimpso (Author): I agree. There are a few cases that don't actually use Entry. I'll update the patch.
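The concern in this thread is that fetching Entry up front for every instruction can hide a write that was never paired with an initialization. Below is a minimal sketch of the safer pattern, assuming a string-keyed map. Each case that produces values initializes its own entry, and lookup asserts instead of default-constructing the way std::map::operator[] does. LoopValueMap and widenAdd are hypothetical. The names initVector (from the review) and getVector (from the diff) are borrowed, but their behavior here is assumed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using VectorParts = std::vector<std::string>;

// Hypothetical value map keyed by the original instruction's name.
class LoopValueMap {
  std::map<std::string, VectorParts> VectorMap;
  unsigned UF;

public:
  explicit LoopValueMap(unsigned UF) : UF(UF) {}

  // Explicitly creates the UF-part entry for Key.
  VectorParts &initVector(const std::string &Key) {
    assert(!VectorMap.count(Key) && "entry already initialized");
    return VectorMap[Key] = VectorParts(UF);
  }

  bool hasVector(const std::string &Key) const {
    return VectorMap.count(Key) != 0;
  }

  // Lookup only: never creates an entry, so a missing initVector is caught
  // by the assertion instead of yielding an empty entry.
  VectorParts &getVector(const std::string &Key) {
    auto It = VectorMap.find(Key);
    assert(It != VectorMap.end() && "value was never initialized");
    return It->second;
  }
};

// A case that widens an instruction initializes its own entry; cases that
// never write Entry (such as branches) leave nothing behind.
void widenAdd(LoopValueMap &M, const std::string &Key) {
  VectorParts &Entry = M.initVector(Key);
  for (unsigned Part = 0; Part < Entry.size(); ++Part)
    Entry[Part] = Key + ".vec." + std::to_string(Part);
}
```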

switch (I.getOpcode()) {		switch (I.getOpcode()) {
case Instruction::Br:		case Instruction::Br:
// Nothing to do for PHIs and BR, since we already took care of the		// Nothing to do for PHIs and BR, since we already took care of the
// loop control flow instructions.		// loop control flow instructions.
continue;		continue;
case Instruction::PHI: {		case Instruction::PHI: {
// Vectorize PHINodes.		// Vectorize PHINodes.
[50 lines not shown]	case Instruction::Select: {
// The condition can be loop invariant but still defined inside the		// The condition can be loop invariant but still defined inside the
// loop. This means that we can't just use the original 'cond' value.		// loop. This means that we can't just use the original 'cond' value.
// We have to take the 'vectorized' value and pick the first lane.		// We have to take the 'vectorized' value and pick the first lane.
// Instcombine will make this a no-op.		// Instcombine will make this a no-op.
VectorParts &Cond = getVectorValue(I.getOperand(0));		VectorParts &Cond = getVectorValue(I.getOperand(0));
VectorParts &Op0 = getVectorValue(I.getOperand(1));		VectorParts &Op0 = getVectorValue(I.getOperand(1));
VectorParts &Op1 = getVectorValue(I.getOperand(2));		VectorParts &Op1 = getVectorValue(I.getOperand(2));

Value *ScalarCond =		auto *ScalarCond = getScalarValue(I.getOperand(0), 0, 0);
(VF == 1)
? Cond[0]
: Builder.CreateExtractElement(Cond[0], Builder.getInt32(0));

for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part] = Builder.CreateSelect(		Entry[Part] = Builder.CreateSelect(
InvariantCond ? ScalarCond : Cond[Part], Op0[Part], Op1[Part]);		InvariantCond ? ScalarCond : Cond[Part], Op0[Part], Op1[Part]);
}		}

addMetadata(Entry, &I);		addMetadata(Entry, &I);
break;		break;
[2,084 lines not shown]
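In the select case, the new column replaces the explicit `(VF == 1) ? Cond[0] : Builder.CreateExtractElement(...)` choice with `getScalarValue(I.getOperand(0), 0, 0)`. The toy model below sketches the lookup order the summary describes: reuse an existing scalar, otherwise extract the requested lane. It also includes the VF == 1 shortcut that the old ternary handled. Strings stand in for IR values, and created extracts are recorded in a log. ToyLoopValueMap is hypothetical, and whether LLVM's getScalarValue orders its checks exactly this way is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of a value map holding scalar and/or vector forms of each value.
struct ToyLoopValueMap {
  unsigned VF, UF;
  std::map<std::string, std::vector<std::string>> Scalars; // UF * VF slots
  std::map<std::string, std::vector<std::string>> Vectors; // UF slots
  std::vector<std::string> Extracts; // extractelements created on demand

  ToyLoopValueMap(unsigned VF, unsigned UF) : VF(VF), UF(UF) {}

  std::string getScalarValue(const std::string &V, unsigned Part,
                             unsigned Lane) {
    assert(Part < UF && Lane < VF && "lane out of range");
    // 1. A scalar for this lane already exists: reuse it directly.
    auto S = Scalars.find(V);
    if (S != Scalars.end())
      return S->second[VF * Part + Lane];
    // 2. Only a vector form exists. With VF == 1 each part is already a
    //    scalar, so no extract is needed.
    const std::string &Vec = Vectors.at(V)[Part];
    if (VF == 1)
      return Vec;
    // 3. Otherwise, extract the requested lane.
    std::string Ext =
        "extractelement(" + Vec + ", " + std::to_string(Lane) + ")";
    Extracts.push_back(Ext);
    return Ext;
  }
};
```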
void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr,		void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr,
bool IfPredicateStore) {		bool IfPredicateStore) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
// Holds vector parameters or scalars, in case of uniform vals.		// Holds vector parameters or scalars, in case of uniform vals.
SmallVector<VectorParts, 4> Params;		SmallVector<VectorParts, 4> Params;

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);

// Find all of the vectorized parameters.
for (Value *SrcOp : Instr->operands()) {
// If we are accessing the old induction variable, use the new one.
if (SrcOp == OldInduction) {
Params.push_back(getVectorValue(SrcOp));
continue;
}

// Try using previously calculated values.
Instruction *SrcInst = dyn_cast<Instruction>(SrcOp);

// If the src is an instruction that appeared earlier in the basic block
// then it should already be vectorized.
if (SrcInst && OrigLoop->contains(SrcInst)) {
assert(WidenMap.has(SrcInst) && "Source operand is unavailable");
// The parameter is a vector value from earlier.
Params.push_back(WidenMap.get(SrcInst));
} else {
// The parameter is a scalar from outside the loop. Maybe even a constant.
VectorParts Scalars;
Scalars.append(UF, SrcOp);
Params.push_back(Scalars);
}
}

assert(Params.size() == Instr->getNumOperands() &&
"Invalid number of operands");

// Does this instruction return a value?		// Does this instruction return a value?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

Value *UndefVec = IsVoidRetTy ? nullptr : UndefValue::get(Instr->getType());		// The instruction will not be vectorized. Erase its vector entry from
// Create a new entry in the WidenMap and initialize it to Undef or Null.		// WidenMap and get a new scalar entry instead.
VectorParts &VecResults = WidenMap.splat(Instr, UndefVec);		WidenMap.eraseVector(Instr);
		auto &Entry = WidenMap.getScalar(Instr);

VectorParts Cond;		VectorParts Cond;
if (IfPredicateStore) {		if (IfPredicateStore) {
assert(Instr->getParent()->getSinglePredecessor() &&		assert(Instr->getParent()->getSinglePredecessor() &&
"Only support single predecessor blocks");		"Only support single predecessor blocks");
Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),		Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),
Instr->getParent());		Instr->getParent());
}		}
[10 lines not shown]	if (IfPredicateStore) {
Builder.CreateExtractElement(Cond[Part], Builder.getInt32(0));		Builder.CreateExtractElement(Cond[Part], Builder.getInt32(0));
Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cond[Part],		Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cond[Part],
ConstantInt::get(Cond[Part]->getType(), 1));		ConstantInt::get(Cond[Part]->getType(), 1));
}		}

Instruction *Cloned = Instr->clone();		Instruction *Cloned = Instr->clone();
if (!IsVoidRetTy)		if (!IsVoidRetTy)
Cloned->setName(Instr->getName() + ".cloned");		Cloned->setName(Instr->getName() + ".cloned");
// Replace the operands of the cloned instructions with extracted scalars.
		// Replace the operands of the cloned instructions with their scalar
		// equivalents in the new loop.
for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {		for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
Value *Op = Params[op][Part];		auto *NewOp = getScalarValue(Instr->getOperand(op), Part, 0);
Cloned->setOperand(op, Op);		Cloned->setOperand(op, NewOp);
}		}

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

		// Add the cloned scalar to WidenMap.
		Entry[Part] = Cloned;

// If we just cloned a new assumption, add it to the assumption cache.		// If we just cloned a new assumption, add it to the assumption cache.
if (auto *II = dyn_cast<IntrinsicInst>(Cloned))		if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
if (II->getIntrinsicID() == Intrinsic::assume)		if (II->getIntrinsicID() == Intrinsic::assume)
AC->registerAssumption(II);		AC->registerAssumption(II);

// If the original scalar returns a value we need to place it in a vector
// so that future users will be able to use it.
if (!IsVoidRetTy)
VecResults[Part] = Cloned;

// End if-block.		// End if-block.
if (IfPredicateStore)		if (IfPredicateStore)
PredicatedStores.push_back(std::make_pair(cast<StoreInst>(Cloned), Cmp));		PredicatedStores.push_back(std::make_pair(cast<StoreInst>(Cloned), Cmp));
}		}
}		}
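After this patch, the unroller erases the instruction's vector entry (`WidenMap.eraseVector(Instr)`) and writes each clone into `WidenMap.getScalar(Instr)`. Vector users then request a packed form on demand. Below is a minimal sketch of that on-demand packing, assuming strings for IR values and a counter for generated insertelement instructions. The first vector request packs VF lanes per part and caches the result. With VF == 1, which is the unroller's case, each part already is the scalar, so nothing is inserted. OnDemandMap and recordScalars are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct OnDemandMap {
  unsigned VF, UF;
  std::map<std::string, std::vector<std::string>> Scalars; // UF * VF slots
  std::map<std::string, std::vector<std::string>> Vectors; // UF slots
  unsigned InsertsEmitted = 0;

  OnDemandMap(unsigned VF, unsigned UF) : VF(VF), UF(UF) {}

  // Drops any vector entry and records the clones as scalars.
  void recordScalars(const std::string &V, std::vector<std::string> Clones) {
    assert(Clones.size() == VF * UF && "one clone per (part, lane)");
    Vectors.erase(V);
    Scalars[V] = std::move(Clones);
  }

  // Returns the vector parts of V, packing scalars only on the first request.
  const std::vector<std::string> &getVectorValue(const std::string &V) {
    auto It = Vectors.find(V);
    if (It != Vectors.end())
      return It->second; // already packed (or natively vector)
    const std::vector<std::string> &S = Scalars.at(V);
    std::vector<std::string> Parts(UF);
    for (unsigned Part = 0; Part < UF; ++Part) {
      if (VF == 1) { // a one-lane "vector" is just the scalar
        Parts[Part] = S[Part];
        continue;
      }
      std::string Acc = "undef";
      for (unsigned Lane = 0; Lane < VF; ++Lane) {
        Acc = "insertelement(" + Acc + ", " + S[VF * Part + Lane] + ", " +
              std::to_string(Lane) + ")";
        ++InsertsEmitted;
      }
      Parts[Part] = Acc;
    }
    return Vectors[V] = std::move(Parts);
  }
};
```

Because the packed parts are cached, several vector users of one scalarized value share a single insertelement chain. Purely scalar users never trigger one.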

void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {		void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {
auto *SI = dyn_cast<StoreInst>(Instr);		auto *SI = dyn_cast<StoreInst>(Instr);
[367 lines not shown]

test/Transforms/LoopVectorize/X86/scatter_crash.ll

	[33 lines not shown]
	; CHECK-NEXT: [[IND20:%.*]] = add i64 %offset.idx, 20			; CHECK-NEXT: [[IND20:%.*]] = add i64 %offset.idx, 20
	; CHECK-NEXT: [[IND22:%.*]] = add i64 %offset.idx, 22			; CHECK-NEXT: [[IND22:%.*]] = add i64 %offset.idx, 22
	; CHECK-NEXT: [[IND24:%.*]] = add i64 %offset.idx, 24			; CHECK-NEXT: [[IND24:%.*]] = add i64 %offset.idx, 24
	; CHECK-NEXT: [[IND26:%.*]] = add i64 %offset.idx, 26			; CHECK-NEXT: [[IND26:%.*]] = add i64 %offset.idx, 26
	; CHECK-NEXT: [[IND28:%.*]] = add i64 %offset.idx, 28			; CHECK-NEXT: [[IND28:%.*]] = add i64 %offset.idx, 28
	; CHECK-NEXT: [[IND30:%.*]] = add i64 %offset.idx, 30			; CHECK-NEXT: [[IND30:%.*]] = add i64 %offset.idx, 30
	; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <16 x i64> <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>, [[VEC_IND]]			; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <16 x i64> <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>, [[VEC_IND]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND00]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND00]]
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x [10 x i32]*> undef, [10 x i32]* [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND02]]			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND02]]
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x [10 x i32]*> [[TMP13]], [10 x i32]* [[TMP15]], i32 1
	; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND04]]			; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND04]]
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x [10 x i32]*> [[TMP16]], [10 x i32]* [[TMP18]], i32 2
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND06]]			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND06]]
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x [10 x i32]*> [[TMP19]], [10 x i32]* [[TMP21]], i32 3
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND08]]			; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND08]]
	; CHECK-NEXT: [[TMP25:%.*]] = insertelement <16 x [10 x i32]*> [[TMP22]], [10 x i32]* [[TMP24]], i32 4
	; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND10]]			; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND10]]
	; CHECK-NEXT: [[TMP28:%.*]] = insertelement <16 x [10 x i32]*> [[TMP25]], [10 x i32]* [[TMP27]], i32 5
	; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND12]]			; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND12]]
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <16 x [10 x i32]*> [[TMP28]], [10 x i32]* [[TMP30]], i32 6
	; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND14]]			; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND14]]
	; CHECK-NEXT: [[TMP34:%.*]] = insertelement <16 x [10 x i32]*> [[TMP31]], [10 x i32]* [[TMP33]], i32 7
	; CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND16]]			; CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND16]]
	; CHECK-NEXT: [[TMP37:%.*]] = insertelement <16 x [10 x i32]*> [[TMP34]], [10 x i32]* [[TMP36]], i32 8
	; CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND18]]			; CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND18]]
	; CHECK-NEXT: [[TMP40:%.*]] = insertelement <16 x [10 x i32]*> [[TMP37]], [10 x i32]* [[TMP39]], i32 9
	; CHECK-NEXT: [[TMP42:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND20]]			; CHECK-NEXT: [[TMP42:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND20]]
	; CHECK-NEXT: [[TMP43:%.*]] = insertelement <16 x [10 x i32]*> [[TMP40]], [10 x i32]* [[TMP42]], i32 10
	; CHECK-NEXT: [[TMP45:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND22]]			; CHECK-NEXT: [[TMP45:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND22]]
	; CHECK-NEXT: [[TMP46:%.*]] = insertelement <16 x [10 x i32]*> [[TMP43]], [10 x i32]* [[TMP45]], i32 11
	; CHECK-NEXT: [[TMP48:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND24]]			; CHECK-NEXT: [[TMP48:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND24]]
	; CHECK-NEXT: [[TMP49:%.*]] = insertelement <16 x [10 x i32]*> [[TMP46]], [10 x i32]* [[TMP48]], i32 12
	; CHECK-NEXT: [[TMP51:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND26]]			; CHECK-NEXT: [[TMP51:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND26]]
	; CHECK-NEXT: [[TMP52:%.*]] = insertelement <16 x [10 x i32]*> [[TMP49]], [10 x i32]* [[TMP51]], i32 13
	; CHECK-NEXT: [[TMP54:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND28]]			; CHECK-NEXT: [[TMP54:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND28]]
	; CHECK-NEXT: [[TMP55:%.*]] = insertelement <16 x [10 x i32]*> [[TMP52]], [10 x i32]* [[TMP54]], i32 14
	; CHECK-NEXT: [[TMP57:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND30]]			; CHECK-NEXT: [[TMP57:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND30]]
	; CHECK-NEXT: [[TMP58:%.*]] = insertelement <16 x [10 x i32]*> [[TMP55]], [10 x i32]* [[TMP57]], i32 15
	; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i64> [[TMP10]], [[VEC_IND3]]			; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i64> [[TMP10]], [[VEC_IND3]]
	; CHECK-NEXT: [[TMP60:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 0
	; CHECK-NEXT: [[TMP61:%.*]] = extractelement <16 x i64> [[TMP59]], i32 0			; CHECK-NEXT: [[TMP61:%.*]] = extractelement <16 x i64> [[TMP59]], i32 0
	; CHECK-NEXT: [[TMP62:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP60]], i64 [[TMP61]], i64 0			; CHECK-NEXT: [[TMP62:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP12]], i64 [[TMP61]], i64 0
	; CHECK-NEXT: [[TMP63:%.*]] = insertelement <16 x i32*> undef, i32* [[TMP62]], i32 0
	; CHECK-NEXT: [[TMP64:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 1
	; CHECK-NEXT: [[TMP65:%.*]] = extractelement <16 x i64> [[TMP59]], i32 1			; CHECK-NEXT: [[TMP65:%.*]] = extractelement <16 x i64> [[TMP59]], i32 1
	; CHECK-NEXT: [[TMP66:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP64]], i64 [[TMP65]], i64 0			; CHECK-NEXT: [[TMP66:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP15]], i64 [[TMP65]], i64 0
	; CHECK-NEXT: [[TMP67:%.*]] = insertelement <16 x i32*> [[TMP63]], i32* [[TMP66]], i32 1
	; CHECK-NEXT: [[TMP68:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 2
	; CHECK-NEXT: [[TMP69:%.*]] = extractelement <16 x i64> [[TMP59]], i32 2			; CHECK-NEXT: [[TMP69:%.*]] = extractelement <16 x i64> [[TMP59]], i32 2
	; CHECK-NEXT: [[TMP70:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP68]], i64 [[TMP69]], i64 0			; CHECK-NEXT: [[TMP70:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP18]], i64 [[TMP69]], i64 0
	; CHECK-NEXT: [[TMP71:%.*]] = insertelement <16 x i32*> [[TMP67]], i32* [[TMP70]], i32 2
	; CHECK-NEXT: [[TMP72:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 3
	; CHECK-NEXT: [[TMP73:%.*]] = extractelement <16 x i64> [[TMP59]], i32 3			; CHECK-NEXT: [[TMP73:%.*]] = extractelement <16 x i64> [[TMP59]], i32 3
	; CHECK-NEXT: [[TMP74:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP72]], i64 [[TMP73]], i64 0			; CHECK-NEXT: [[TMP74:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP21]], i64 [[TMP73]], i64 0
	; CHECK-NEXT: [[TMP75:%.*]] = insertelement <16 x i32*> [[TMP71]], i32* [[TMP74]], i32 3
	; CHECK-NEXT: [[TMP76:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 4
	; CHECK-NEXT: [[TMP77:%.*]] = extractelement <16 x i64> [[TMP59]], i32 4			; CHECK-NEXT: [[TMP77:%.*]] = extractelement <16 x i64> [[TMP59]], i32 4
	; CHECK-NEXT: [[TMP78:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP76]], i64 [[TMP77]], i64 0			; CHECK-NEXT: [[TMP78:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP24]], i64 [[TMP77]], i64 0
	; CHECK-NEXT: [[TMP79:%.*]] = insertelement <16 x i32*> [[TMP75]], i32* [[TMP78]], i32 4
	; CHECK-NEXT: [[TMP80:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 5
	; CHECK-NEXT: [[TMP81:%.*]] = extractelement <16 x i64> [[TMP59]], i32 5			; CHECK-NEXT: [[TMP81:%.*]] = extractelement <16 x i64> [[TMP59]], i32 5
	; CHECK-NEXT: [[TMP82:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP80]], i64 [[TMP81]], i64 0			; CHECK-NEXT: [[TMP82:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP27]], i64 [[TMP81]], i64 0
	; CHECK-NEXT: [[TMP83:%.*]] = insertelement <16 x i32*> [[TMP79]], i32* [[TMP82]], i32 5
	; CHECK-NEXT: [[TMP84:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 6
	; CHECK-NEXT: [[TMP85:%.*]] = extractelement <16 x i64> [[TMP59]], i32 6			; CHECK-NEXT: [[TMP85:%.*]] = extractelement <16 x i64> [[TMP59]], i32 6
	; CHECK-NEXT: [[TMP86:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP84]], i64 [[TMP85]], i64 0			; CHECK-NEXT: [[TMP86:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP30]], i64 [[TMP85]], i64 0
	; CHECK-NEXT: [[TMP87:%.*]] = insertelement <16 x i32*> [[TMP83]], i32* [[TMP86]], i32 6
	; CHECK-NEXT: [[TMP88:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 7
	; CHECK-NEXT: [[TMP89:%.*]] = extractelement <16 x i64> [[TMP59]], i32 7			; CHECK-NEXT: [[TMP89:%.*]] = extractelement <16 x i64> [[TMP59]], i32 7
	; CHECK-NEXT: [[TMP90:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP88]], i64 [[TMP89]], i64 0			; CHECK-NEXT: [[TMP90:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP33]], i64 [[TMP89]], i64 0
	; CHECK-NEXT: [[TMP91:%.*]] = insertelement <16 x i32*> [[TMP87]], i32* [[TMP90]], i32 7
	; CHECK-NEXT: [[TMP92:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 8
	; CHECK-NEXT: [[TMP93:%.*]] = extractelement <16 x i64> [[TMP59]], i32 8			; CHECK-NEXT: [[TMP93:%.*]] = extractelement <16 x i64> [[TMP59]], i32 8
	; CHECK-NEXT: [[TMP94:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP92]], i64 [[TMP93]], i64 0			; CHECK-NEXT: [[TMP94:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP36]], i64 [[TMP93]], i64 0
	; CHECK-NEXT: [[TMP95:%.*]] = insertelement <16 x i32*> [[TMP91]], i32* [[TMP94]], i32 8
	; CHECK-NEXT: [[TMP96:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 9
	; CHECK-NEXT: [[TMP97:%.*]] = extractelement <16 x i64> [[TMP59]], i32 9			; CHECK-NEXT: [[TMP97:%.*]] = extractelement <16 x i64> [[TMP59]], i32 9
	; CHECK-NEXT: [[TMP98:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP96]], i64 [[TMP97]], i64 0			; CHECK-NEXT: [[TMP98:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP39]], i64 [[TMP97]], i64 0
	; CHECK-NEXT: [[TMP99:%.*]] = insertelement <16 x i32*> [[TMP95]], i32* [[TMP98]], i32 9
	; CHECK-NEXT: [[TMP100:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 10
	; CHECK-NEXT: [[TMP101:%.*]] = extractelement <16 x i64> [[TMP59]], i32 10			; CHECK-NEXT: [[TMP101:%.*]] = extractelement <16 x i64> [[TMP59]], i32 10
	; CHECK-NEXT: [[TMP102:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP100]], i64 [[TMP101]], i64 0			; CHECK-NEXT: [[TMP102:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP42]], i64 [[TMP101]], i64 0
	; CHECK-NEXT: [[TMP103:%.*]] = insertelement <16 x i32*> [[TMP99]], i32* [[TMP102]], i32 10
	; CHECK-NEXT: [[TMP104:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 11
	; CHECK-NEXT: [[TMP105:%.*]] = extractelement <16 x i64> [[TMP59]], i32 11			; CHECK-NEXT: [[TMP105:%.*]] = extractelement <16 x i64> [[TMP59]], i32 11
	; CHECK-NEXT: [[TMP106:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP104]], i64 [[TMP105]], i64 0			; CHECK-NEXT: [[TMP106:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP45]], i64 [[TMP105]], i64 0
	; CHECK-NEXT: [[TMP107:%.*]] = insertelement <16 x i32*> [[TMP103]], i32* [[TMP106]], i32 11
	; CHECK-NEXT: [[TMP108:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 12
	; CHECK-NEXT: [[TMP109:%.*]] = extractelement <16 x i64> [[TMP59]], i32 12			; CHECK-NEXT: [[TMP109:%.*]] = extractelement <16 x i64> [[TMP59]], i32 12
	; CHECK-NEXT: [[TMP110:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP108]], i64 [[TMP109]], i64 0			; CHECK-NEXT: [[TMP110:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP48]], i64 [[TMP109]], i64 0
	; CHECK-NEXT: [[TMP111:%.*]] = insertelement <16 x i32*> [[TMP107]], i32* [[TMP110]], i32 12
	; CHECK-NEXT: [[TMP112:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 13
	; CHECK-NEXT: [[TMP113:%.*]] = extractelement <16 x i64> [[TMP59]], i32 13			; CHECK-NEXT: [[TMP113:%.*]] = extractelement <16 x i64> [[TMP59]], i32 13
	; CHECK-NEXT: [[TMP114:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP112]], i64 [[TMP113]], i64 0			; CHECK-NEXT: [[TMP114:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP51]], i64 [[TMP113]], i64 0
	; CHECK-NEXT: [[TMP115:%.*]] = insertelement <16 x i32*> [[TMP111]], i32* [[TMP114]], i32 13
	; CHECK-NEXT: [[TMP116:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 14
	; CHECK-NEXT: [[TMP117:%.*]] = extractelement <16 x i64> [[TMP59]], i32 14			; CHECK-NEXT: [[TMP117:%.*]] = extractelement <16 x i64> [[TMP59]], i32 14
	; CHECK-NEXT: [[TMP118:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP116]], i64 [[TMP117]], i64 0			; CHECK-NEXT: [[TMP118:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP54]], i64 [[TMP117]], i64 0
	; CHECK-NEXT: [[TMP119:%.*]] = insertelement <16 x i32*> [[TMP115]], i32* [[TMP118]], i32 14
	; CHECK-NEXT: [[TMP120:%.*]] = extractelement <16 x [10 x i32]*> [[TMP58]], i32 15
	; CHECK-NEXT: [[TMP121:%.*]] = extractelement <16 x i64> [[TMP59]], i32 15			; CHECK-NEXT: [[TMP121:%.*]] = extractelement <16 x i64> [[TMP59]], i32 15
	; CHECK-NEXT: [[TMP122:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP120]], i64 [[TMP121]], i64 0			; CHECK-NEXT: [[TMP122:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP57]], i64 [[TMP121]], i64 0
	; CHECK-NEXT: [[TMP123:%.*]] = insertelement <16 x i32*> [[TMP119]], i32* [[TMP122]], i32 15			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x [10 x i32]*> undef, [10 x i32]* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x [10 x i32]*> [[TMP13]], [10 x i32]* [[TMP15]], i32 1
				; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x [10 x i32]*> [[TMP16]], [10 x i32]* [[TMP18]], i32 2
				; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x [10 x i32]*> [[TMP19]], [10 x i32]* [[TMP21]], i32 3
				; CHECK-NEXT: [[TMP25:%.*]] = insertelement <16 x [10 x i32]*> [[TMP22]], [10 x i32]* [[TMP24]], i32 4
				; CHECK-NEXT: [[TMP28:%.*]] = insertelement <16 x [10 x i32]*> [[TMP25]], [10 x i32]* [[TMP27]], i32 5
				; CHECK-NEXT: [[TMP31:%.*]] = insertelement <16 x [10 x i32]*> [[TMP28]], [10 x i32]* [[TMP30]], i32 6
				; CHECK-NEXT: [[TMP34:%.*]] = insertelement <16 x [10 x i32]*> [[TMP31]], [10 x i32]* [[TMP33]], i32 7
				; CHECK-NEXT: [[TMP37:%.*]] = insertelement <16 x [10 x i32]*> [[TMP34]], [10 x i32]* [[TMP36]], i32 8
				; CHECK-NEXT: [[TMP40:%.*]] = insertelement <16 x [10 x i32]*> [[TMP37]], [10 x i32]* [[TMP39]], i32 9
				; CHECK-NEXT: [[TMP43:%.*]] = insertelement <16 x [10 x i32]*> [[TMP40]], [10 x i32]* [[TMP42]], i32 10
				; CHECK-NEXT: [[TMP46:%.*]] = insertelement <16 x [10 x i32]*> [[TMP43]], [10 x i32]* [[TMP45]], i32 11
				; CHECK-NEXT: [[TMP49:%.*]] = insertelement <16 x [10 x i32]*> [[TMP46]], [10 x i32]* [[TMP48]], i32 12
				; CHECK-NEXT: [[TMP52:%.*]] = insertelement <16 x [10 x i32]*> [[TMP49]], [10 x i32]* [[TMP51]], i32 13
				; CHECK-NEXT: [[TMP55:%.*]] = insertelement <16 x [10 x i32]*> [[TMP52]], [10 x i32]* [[TMP54]], i32 14
				; CHECK-NEXT: [[TMP58:%.*]] = insertelement <16 x [10 x i32]*> [[TMP55]], [10 x i32]* [[TMP57]], i32 15
	; CHECK-NEXT: [[VECTORGEP:%.*]] = getelementptr inbounds [10 x i32], <16 x [10 x i32]*> [[TMP58]], <16 x i64> [[TMP59]], i64 0			; CHECK-NEXT: [[VECTORGEP:%.*]] = getelementptr inbounds [10 x i32], <16 x [10 x i32]*> [[TMP58]], <16 x i64> [[TMP59]], i64 0
	; CHECK-NEXT: call void @llvm.masked.scatter.v16i32(<16 x i32> <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>, <16 x i32*> [[VECTORGEP]], i32 16, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)			; CHECK-NEXT: call void @llvm.masked.scatter.v16i32(<16 x i32> <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>, <16 x i32*> [[VECTORGEP]], i32 16, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
	; CHECK: [[STEP_ADD:%.*]] = add <16 x i64> [[VEC_IND]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>			; CHECK: [[STEP_ADD:%.*]] = add <16 x i64> [[VEC_IND]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
	; CHECK: [[STEP_ADD4:%.*]] = add <16 x i64> [[VEC_IND3]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>			; CHECK: [[STEP_ADD4:%.*]] = add <16 x i64> [[VEC_IND3]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
	entry:			entry:
	%0 = load i32, i32* @c, align 4			%0 = load i32, i32* @c, align 4
	%cmp34 = icmp sgt i32 %0, 8			%cmp34 = icmp sgt i32 %0, 8
	br i1 %cmp34, label %for.body.lr.ph, label %for.cond.cleanup			br i1 %cmp34, label %for.body.lr.ph, label %for.cond.cleanup
	[75 lines not shown]
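With VF = 16, tallying the lane-wise conversions in the CHECK lines above shows where this test's reduction comes from. The tally counts only insertelement and extractelement instructions on the pointer vector TMP58 and on the per-lane GEP results. The 16 extracts from the index vector TMP59 appear in both columns, so they are left out.

```latex
\begin{aligned}
\text{before} &= \underbrace{16}_{\text{insert into TMP58}}
              + \underbrace{16}_{\text{extract from TMP58}}
              + \underbrace{16}_{\text{insert into TMP123 chain}} = 48 \\
\text{after}  &= \underbrace{16}_{\text{insert into TMP58}} = 16
\end{aligned}
```

The 16 remaining inserts are the ones the vector GEP actually consumes. They are now emitted after the scalar GEPs rather than up front, and each scalar GEP reads its base pointer (TMP12, TMP15, ...) directly.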

test/Transforms/LoopVectorize/if-pred-stores.ll

	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=UNROLL			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=UNROLL
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s | FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s | FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=VEC			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=VEC
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg -instcombine < %s | FileCheck %s --check-prefix=VEC-IC			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg -instcombine < %s | FileCheck %s --check-prefix=VEC-IC

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.9.0"			target triple = "x86_64-apple-macosx10.9.0"

	; Test predication of stores.			; Test predication of stores.
	define i32 @test(i32* nocapture %f) #0 {			define i32 @test(i32* nocapture %f) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	; VEC-LABEL: test			; VEC-LABEL: test
				; VEC: %[[v0:.+]] = add i64 %index, 0
				; VEC: %[[v1:.+]] = add i64 %index, 1
				; VEC: %[[v2:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v0]]
				; VEC: %[[v4:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v1]]
	; VEC: %[[v8:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>			; VEC: %[[v8:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>
	; VEC: %[[v9:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>			; VEC: %[[v9:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>
	; VEC: %[[v10:.+]] = and <2 x i1> %[[v8]], <i1 true, i1 true>			; VEC: %[[v10:.+]] = and <2 x i1> %[[v8]], <i1 true, i1 true>
	; VEC: %[[v11:.+]] = extractelement <2 x i1> %[[v10]], i32 0			; VEC: %[[v11:.+]] = extractelement <2 x i1> %[[v10]], i32 0
	; VEC: %[[v12:.+]] = icmp eq i1 %[[v11]], true			; VEC: %[[v12:.+]] = icmp eq i1 %[[v11]], true
	; VEC: %[[v13:.+]] = extractelement <2 x i32> %[[v9]], i32 0			; VEC: %[[v13:.+]] = extractelement <2 x i32> %[[v9]], i32 0
	; VEC: %[[v14:.+]] = extractelement <2 x i32> %{{.}}, i32 0
	; VEC: br i1 %[[v12]], label %[[cond:.+]], label %[[else:.+]]			; VEC: br i1 %[[v12]], label %[[cond:.+]], label %[[else:.+]]
	;			;
	; VEC: [[cond]]:			; VEC: [[cond]]:
	; VEC: store i32 %[[v13]], i32* %[[v14]], align 4			; VEC: store i32 %[[v13]], i32* %[[v2]], align 4
	; VEC: br label %[[else:.+]]			; VEC: br label %[[else:.+]]
	;			;
	; VEC: [[else]]:			; VEC: [[else]]:
	; VEC: %[[v15:.+]] = extractelement <2 x i1> %[[v10]], i32 1			; VEC: %[[v15:.+]] = extractelement <2 x i1> %[[v10]], i32 1
	; VEC: %[[v16:.+]] = icmp eq i1 %[[v15]], true			; VEC: %[[v16:.+]] = icmp eq i1 %[[v15]], true
	; VEC: %[[v17:.+]] = extractelement <2 x i32> %[[v9]], i32 1			; VEC: %[[v17:.+]] = extractelement <2 x i32> %[[v9]], i32 1
	; VEC: %[[v18:.+]] = extractelement <2 x i32*> %{{.+}} i32 1
	; VEC: br i1 %[[v16]], label %[[cond2:.+]], label %[[else2:.+]]			; VEC: br i1 %[[v16]], label %[[cond2:.+]], label %[[else2:.+]]
	;			;
	; VEC: [[cond2]]:			; VEC: [[cond2]]:
	; VEC: store i32 %[[v17]], i32* %[[v18]], align 4			; VEC: store i32 %[[v17]], i32* %[[v4]], align 4
	; VEC: br label %[[else2:.+]]			; VEC: br label %[[else2:.+]]
	;			;
	; VEC: [[else2]]:			; VEC: [[else2]]:

	; VEC-IC-LABEL: test			; VEC-IC-LABEL: test
	; VEC-IC: %[[v1:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>			; VEC-IC: %[[v1:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>
	; VEC-IC: %[[v2:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>			; VEC-IC: %[[v2:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>
	; VEC-IC: %[[v3:.+]] = extractelement <2 x i1> %[[v1]], i32 0			; VEC-IC: %[[v3:.+]] = extractelement <2 x i1> %[[v1]], i32 0
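The updated `VEC` checks in `if-pred-stores.ll` show the effect of keeping scalar values in the same map as vector values. The predicated stores now take their addresses directly from the scalarized GEPs `%[[v2]]` and `%[[v4]]`. The old expectations instead extracted each address from a `<2 x i32*>` vector. The stored value `%[[v9]]` is still widened, so each lane of it is still extracted. The sketch below models that lookup logic under stated assumptions. `ToyBuilder`, `ToyValueMap`, and the string-named values are hypothetical and invented for illustration; they are not LLVM's `IRBuilder` or the patch's `VectorLoopValueMap`. The sketch also assumes a single unroll part and represents instructions as strings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR builder: "instructions" are strings, values are names.
// Illustration only; this is not LLVM's IRBuilder.
struct ToyBuilder {
  std::vector<std::string> Emitted; // instructions created so far, in order
  unsigned Counter = 0;

  std::string fresh() { return "%t" + std::to_string(Counter++); }

  // Models an extractelement that pulls one lane out of a vector.
  std::string extract(const std::string &Vec, unsigned Lane) {
    std::string Name = fresh();
    Emitted.push_back(Name + " = extractelement " + Vec + ", " +
                      std::to_string(Lane));
    return Name;
  }

  // Models an insertelement that writes one lane into a vector.
  std::string insert(const std::string &Vec, const std::string &Elt,
                     unsigned Lane) {
    std::string Name = fresh();
    Emitted.push_back(Name + " = insertelement " + Vec + ", " + Elt + ", " +
                      std::to_string(Lane));
    return Name;
  }
};

// Hypothetical unified map: one object records both the widened (vector)
// value and the per-lane scalar values of an original-loop instruction.
class ToyValueMap {
public:
  ToyValueMap(unsigned VF, ToyBuilder &B) : VF(VF), B(B) {}

  void setVector(const std::string &Orig, std::string V) {
    Vectors[Orig] = std::move(V);
  }
  void setScalars(const std::string &Orig, std::vector<std::string> S) {
    assert(S.size() == VF && "need one scalar per lane");
    Scalars[Orig] = std::move(S);
  }

  // Reuse the scalar for Lane if Orig was scalarized; otherwise emit an
  // extractelement from its vector value (not cached in this sketch).
  std::string getScalarValue(const std::string &Orig, unsigned Lane) {
    assert(Lane < VF && "lane out of range");
    auto It = Scalars.find(Orig);
    if (It != Scalars.end())
      return It->second[Lane];
    return B.extract(Vectors.at(Orig), Lane);
  }

  // Reuse the vector value if one exists; otherwise pack the scalars with an
  // insertelement chain on demand and remember the result.
  std::string getVectorValue(const std::string &Orig) {
    auto It = Vectors.find(Orig);
    if (It != Vectors.end())
      return It->second;
    const std::vector<std::string> &S = Scalars.at(Orig);
    std::string Acc = "undef";
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      Acc = B.insert(Acc, S[Lane], Lane);
    Vectors[Orig] = Acc;
    return Acc;
  }

private:
  unsigned VF;
  ToyBuilder &B;
  std::map<std::string, std::string> Vectors;
  std::map<std::string, std::vector<std::string>> Scalars;
};
```

Consider the lane-1 store from the test. After `setScalars("gep", {"%v2", "%v4"})` and `setVector("add", "%v9")`, the address lookup `getScalarValue("gep", 1)` returns `%v4` and emits nothing. The value lookup `getScalarValue("add", 1)` emits one extractelement. In this sketch, packed vectors are cached so that repeated vector uses do not rebuild the insertelement chain, while extracted scalars are not cached. The sketch makes no claim to mirror the patch's exact caching policy.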