This is an archive of the discontinued LLVM Phabricator instance.

[LV] Scalarize operands of predicated instructions
Closed · Public

Authored by mssimpso on Oct 28 2016, 9:00 AM.

Details

Reviewers

anemet
mkuper
gilr

Commits

rG364da7e52704: [LV] Scalarize operands of predicated instructions
rL288909: [LV] Scalarize operands of predicated instructions

Summary

This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands.

The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions.
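
As a rough illustration of the comparison described in the summary, the sketch below sums a discount (vector cost minus scalar cost) over an expression tree. It is not the patch's code: the `NodeCost` struct, the `computeDiscountSketch` name, the cost numbers, and the way the extract overhead is charged are all assumptions made for illustration. The default `ReciprocalBlockProb` of 2 models the 0.5 predicated-block probability mentioned later in the review.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-instruction costs. In the real pass these numbers would
// come from the target's cost model; here they are plain inputs.
struct NodeCost {
  int VectorCost;        // cost of the widened, if-converted instruction
  int ScalarCostPerLane; // cost of one scalar copy of the instruction
  int ExtractOverhead;   // extractelement cost paid only if scalarized
};

// Returns (vector cost - scalar cost) summed over an expression tree.
// ReciprocalBlockProb == 2 models an assumed 0.5 execution probability.
int computeDiscountSketch(const std::vector<NodeCost> &Tree, int VF,
                          int ReciprocalBlockProb = 2) {
  assert(VF >= 2 && "scalarization is only considered when vectorizing");
  int Discount = 0;
  for (const NodeCost &N : Tree) {
    // One scalar copy per lane, discounted because each copy sits in a
    // predicated block that may not execute.
    int ScalarCost = (VF * N.ScalarCostPerLane) / ReciprocalBlockProb;
    ScalarCost += N.ExtractOverhead;
    Discount += N.VectorCost - ScalarCost;
  }
  return Discount;
}
```

A non-negative result corresponds to the `computePredInstDiscount(...) >= 0` test in the diff. In this sketch, holding the other inputs fixed, a larger `VF` gives a smaller discount.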

Diff Detail

Build Status

Buildable 952
Build 952: arc lint + arc unit

Event Timeline

mssimpso updated this revision to Diff 76204. Oct 28 2016, 9:00 AM

mssimpso retitled this revision from to [LV] Scalarize operands of predicated instructions.

mssimpso updated this object.

mssimpso added reviewers: anemet, mkuper, gilr.

mssimpso added subscribers: llvm-commits, mcrosier.

Herald added a subscriber: mzolotukhin. Oct 28 2016, 9:00 AM

dorit added a subscriber: dorit. Oct 30 2016, 3:01 PM

gilr added inline comments. Oct 31 2016, 2:40 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6602	It seems the only instructions pushed into Worklist are PredInst and instructions from its basic block. Is the invariance check necessary?

mkuper added inline comments. Oct 31 2016, 5:03 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
1916	Can this actually happen, or should there be an assert here? This seems weird, because if we sometimes call this before we collect the instructions for the VF, and sometimes after, then we'll get different results?
4679–4681	We have 5 places that call isScalarAfterVectorization(). Is this the only call site that cares about this? (If all of them should care, perhaps wrap in a helper function?)

mssimpso added inline comments. Nov 1 2016, 7:33 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
1916	You're right, this can be an assert. I'll update the patch. Thanks!
4679–4681	You're asking if all places where we call isScalarAfterVectorization should also be concerned with isProfitableToScalarize? I think a helper makes sense in general, but this is the only case (at least currently) where I think it will make a difference. In needsScalarInduction and widenIntInduction, the IVs shouldn't be predicated so it shouldn't make a difference. And in collectValuesToIgnore, we haven't yet computed the instruction costs.
6602	Ah, that's right. Thanks! I think the invariance check is leftover from a previous implementation I had. I'll remove it.

Addressed comments from Gil and Michael. Thanks!

Removed unnecessary invariance check from computePredInstDiscount
Added assert to isProfitableToScalarize
Added call to collectInstsToScalarize for user-selected VFs (I realized we needed this with the new assert)
Made some auto types explicit

In D26083#585225, @mssimpso wrote:

Added call to collectInstsToScalarize for user-selected VFs (I realized we needed this with the new assert)

Yay, finally being pedantic and asking for an assert pays off! :-)

lib/Transforms/Vectorize/LoopVectorize.cpp
1962	Maybe make more explicit that the cost is the cost of the instruction if scalarized? ("instruction-cost pairs for each choice of vectorization factor" seems to imply the vectorized cost.)
1968	This can also be more explicit. "A negative returned value implies..."
4679–4681	"And in collectValuesToIgnore, we haven't yet computed the instruction costs." So, we get some imprecision here, right? If we end up scalarizing things that aren't in VecValuesToIgnore then we'll overestimate the register pressure? Or am I confused? Anyway, assuming I got it right - I'm not saying we need to fix this in this patch. The patch, as it is, is probably still a strict improvement. But a FIXME would be good.
6154	Could you add a test for this?
6570	"to to" -> "to"
6583	Just trying to understand if I got this right. Let's say we have:
    %a = add i32, ...
    %r = udiv i32 %x, %a
    %s = udiv i32 %y, %a
    %t = udiv i32 %z, %a
In the same block. Will we do the right thing evaluating the cost of scalarizing the add in a separate context for each udiv?
6624	Can we end up with a non-scalar type during the traversal? I'm thinking about something like a div that depends on an extractvalue from a struct.

gilr added inline comments. Nov 7 2016, 8:14 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4679–4681	I think Michael's comment raises a more general question: should it matter why we scalarize? Put differently, once an instruction is marked for scalarization by legal or cost, shouldn't we treat it uniformly in the code? [if so, then the question "will the instruction be scalarized" would only be well-defined in the context of some VF]
6583	Actually in such a case it seems the addition gets scalarized but cannot be sunk down because of the multiple uses. I also wonder if we calculate the cost correctly when one predicated instruction feeds another.
6605	The following loop causes the patch to assert
    void foo(int* restrict a, int b, int* restrict c) {
      for (int i = 0; i < 10000; ++i) {
        if (a[i] > 777) {
          int t2 = c[i];
          int t3 = t2 / b;
          a[i] += t3;
        }
      }
    }
while trying to scalarize a uniform value (the predicated load). Got this example to work by skipping I if any of its operands isUniformAfterVectorization().
6624	Wasn't the cost of insert-element accounted for by VectorCost?

Michael/Gil,

Thanks for all the comments! I'll update a new version of the patch shortly.

lib/Transforms/Vectorize/LoopVectorize.cpp
1962	Sounds good.
1968	Sounds good.
4679–4681	That's right. There is some conservatism in collectValuesToIgnore. Currently, we place a value in VecValuesToIgnore only if we're sure (for all VFs) that the value will be scalar. But this doesn't mean that all other values will be vectorized. With this patch, we'll have a more precise determination of the known scalars after VF selection. For the question about scalarization by legality vs. the cost model, for code generation, it's not going to matter why we decide to scalarize. In legality, we know that for any (legal) VF that we select, a value will be scalarized. In the cost model, we decide for particular VFs if it's more profitable to scalarize. When we come to code generation, we know what that VF is. If the comment is that InnerLoopVectorizer shouldn't need to make a distinction between Legal and Cost scalarization, then we can create a helper function like Michael suggested for InnerLoopVectorizer to use. What do you think?
6154	Sure.
6570	Thanks!
6583	Yeah, Gil is right here in that the add in this example will not be sunk. Thanks, Gil! This is because our current predication method places each udiv in its own separate block. Long term I think we will want to keep the predicated instructions in the same block after vectorization if they were in the same block before vectorization. I can add some comments about this in the meantime. So we will need to check for the multi-context case, then. But this doesn't mean that we can't sink instructions with multi-uses. We should still be able to handle the following, for example.
    if:
      %a = add i32, ...
      %r = add i32 %a, %a
      %s = udiv i32 %x, %r
      br
I'll add some tests.
6605	Thanks, Gil! I'll fix this in the updated patch and add an appropriate test.
6624	For the cost of the insertelement for predicated instructions: that's right. We computed it in VectorCost. The insert overhead should be the same for both versions of the code: predication as-is vs. predication with scalarized operands. So we either need to subtract this overhead from VectorCost or add it to ScalarCost to make the calculations comparable. For the non-scalar type question: this may be possible (I'll need to investigate). I'll add a test either way. Thanks!
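
For the helper discussed in the 4679–4681 thread above, a minimal sketch might look like the following. The `Legality` and `CostModel` structs are toy stand-ins for the vectorizer's legality and cost-model classes, and the helper name `shouldScalarizeSketch` is hypothetical. The point is only that code generation asks one question for a given VF, regardless of whether legality or the cost model made the decision.

```cpp
#include <cassert>
#include <map>
#include <set>

struct Instruction {}; // toy stand-in for llvm::Instruction

// Stand-in for legality: these instructions are scalar for every legal VF.
struct Legality {
  std::set<const Instruction *> Scalars;
  bool isScalarAfterVectorization(const Instruction *I) const {
    return Scalars.count(I) != 0;
  }
};

// Stand-in for the cost model: scalarization is decided per VF.
struct CostModel {
  std::map<unsigned, std::set<const Instruction *>> InstsToScalarize;
  bool isProfitableToScalarize(const Instruction *I, unsigned VF) const {
    auto It = InstsToScalarize.find(VF);
    assert(It != InstsToScalarize.end() &&
           "VF not yet analyzed for scalarization profitability");
    return It->second.count(I) != 0;
  }
};

// Hypothetical helper: true if I will be scalar for this VF, whatever the
// reason for scalarizing it.
bool shouldScalarizeSketch(const Legality &Legal, const CostModel &Cost,
                           const Instruction *I, unsigned VF) {
  return Legal.isScalarAfterVectorization(I) ||
         Cost.isProfitableToScalarize(I, VF);
}
```

Because the cost-model lookup asserts that the VF has been analyzed, a caller would need to have collected the instructions to scalarize for that VF first, which matches the assert added earlier in this review.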

mkuper added inline comments. Nov 8 2016, 11:10 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4679–4681	That sounds good to me. I think eventually we'll probably need to keep a full "this will get scalarized" list per-VF instead of the current VecValuesToIgnore, but that's an issue for another patch. (By the way, do we really expect to see cases where scalarization is better for some VFs, but worse for others?)

mssimpso added inline comments. Nov 8 2016, 11:40 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4679–4681	At least for this patch, scalarization tends to become less profitable as the VF increases. This is primarily because, without PGO, we always assume a 0.5 predicated block probability.

mkuper added inline comments. Nov 8 2016, 1:34 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4679–4681	Ah, ok, makes sense.

mssimpso added inline comments. Nov 11 2016, 10:38 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
6583	Michael/Gil, After thinking about the multi-context case a bit more (and not wanting to revisit this again later), I decided to take a stab at modifying code generation for the predicated instructions so that we don't place them in separate basic blocks if we don't have to. I'll upload this as a separate patch shortly. Matt.

mssimpso mentioned this in D26555: [LV] Keep predicated instructions in the same block. Nov 16 2016, 5:34 AM

Michael/Gil,

I'm wrapping up addressing your comments, but I have a question about testing. Since this change is cost-model driven, all tests should really live under a target-specific directory (i.e., test/Transforms/LoopVectorize/AArch64). That's easy enough to do, but what should we do about all the target-independent tests that currently exist in the top-level directory? With this change, we would be introducing target-specific behavior that may change how those tests are vectorized, depending on the default target.

I can think of two ways to avoid this: (1) disable scalarization when a user-specified vectorization factor is provided, that is, only perform the optimization when the cost model is actually run; or (2) introduce a new flag (e.g., --enable-scalarization-with-predication), defaulting to true, that we could then disable in the run line of all the current top-level tests.

What do you think?

I'm not a fan of (1) - I don't think specifying the vectorization factor manually should mean that you're uninterested in all other cost-model considerations.

Regarding (2), I'm not sure what's worse - knob proliferation, or not having target-independent tests. That is, one knob is fine, but I wouldn't want to add a knob for every cost-model-dependent decision.
Maybe we can have a catch-all knob that basically says "use the default TTI instead of the current target's"? Does that even make sense?

In D26083#598865, @mkuper wrote:

I'm not a fan of (1) - I don't think specifying the vectorization factor manually should mean that you're uninterested in all other cost-model considerations.

Regarding (2), I'm not sure what's worse - knob proliferation, or not having target-independent tests. That is, one knob is fine, but I wouldn't want to add a knob for every cost-model-dependent decision.
Maybe we can have a catch-all knob that basically says "use the default TTI instead of the current target's"? Does that even make sense?

That should be doable. I'll give it a shot in a separate patch.

Addressed comments from Michael and Gil.

I think I've taken care of everything mentioned until now. Thanks again for the reviews! With my testing concerns addressed, I think we're fine with the code generation tests remaining target-independent.

Added a helper function in InnerLoopVectorizer combining Legal->isScalarAfterVectorization with Cost->isProfitableToScalarize. I replaced all uses of isScalarAfterVectorization in the vectorizer with the new helper function. I added a FIXME for the use of isScalarAfterVectorization in collectValuesToIgnore, like Michael suggested.
Added an assert for Michael's non-scalar type question. Non-scalar types aren't allowable (we already check for this in canVectorizeInstrs), but I don't think an additional assert will hurt.
Added a test requested by Michael for user-specified vectorization factors with no interleaving (AArch64/aarch64-predication.ll). The test is target-specific to ensure we perform the same optimization with and without manually specifying the vectorization factor.
Reorganized scalarization conditions in canBeScalarized, and added a check for operands that are uniform-after-vectorization. This fixes the crash Gil discovered (X86/x86-predication.ll). The test is target-specific because the crash depends on the generation of masked loads.
Added a check in canBeScalarized for the multi-context case. We now only consider instructions forming a single-use chain from the original predicated block that would otherwise be vectorized. I added a test case for the multi-context costs (AArch64/predication_costs.ll).
Updated comments.
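
The single-use-chain restriction described in the list above can be sketched with a small worklist traversal. This is a toy model, not the patch's `canBeScalarized`: the `Inst` struct, the integer block ids, and the `collectSingleUseChain` name are assumptions, and the real check has further conditions (for example, skipping values that are already scalar or uniform after vectorization).

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy instruction: just enough structure to walk an operand tree.
struct Inst {
  int Block;                    // id of the basic block holding the instruction
  unsigned NumUses;             // number of uses of this instruction's result
  std::vector<Inst *> Operands; // instruction operands only
};

// Collects PredInst plus the operands reachable through a chain of
// single-use instructions that live in PredInst's block.
std::set<Inst *> collectSingleUseChain(Inst *PredInst) {
  assert(PredInst && "expected a predicated instruction");
  std::set<Inst *> Chain;
  std::vector<Inst *> Worklist{PredInst};
  while (!Worklist.empty()) {
    Inst *I = Worklist.back();
    Worklist.pop_back();
    if (!Chain.insert(I).second)
      continue; // already visited
    for (Inst *Op : I->Operands)
      // An operand with several uses would be needed in more than one
      // context, so it is left out of the chain.
      if (Op->Block == PredInst->Block && Op->NumUses == 1)
        Worklist.push_back(Op);
  }
  return Chain;
}
```

With Michael's earlier example, an `add` feeding three `udiv` instructions has three uses and therefore stays out of every chain in this sketch.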

This LGTM, except some vague thoughts about slightly relaxing canBeScalarized().
If you think that's not a real concern, I'm ok with this going on as is.

lib/Transforms/Vectorize/LoopVectorize.cpp
6611	Please add an explanation for why we bail on Legal->isScalarAfterVectorization(I)
6625	Are you sure this is the right check? If I understand correctly, we fail not because the pointer is uniform, but because it's only uniform in the "we use a single (LLVM) value to represent it" sense, not the "the (abstract) value is the same for all lanes" sense. Otherwise it'd be safe to scalarize. I'm having a hard time thinking of a good example, though, because this is limited to instructions (so GV and param operands won't be affected), and uniform instructions tend to either be consecutive, loop-invariant (in which case they should be hoisted out by LICM before we hit the vectorizer), or uses of the scalar IV (in which case they won't be operands of vectorized instructions).

mssimpso added inline comments. Dec 1 2016, 9:25 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
6611	Sure. I don't think it's strictly necessary to bail here, but I couldn't think of an example where continuing to traverse the chain would be useful.
6625	That's exactly right. We can't scalarize because we use a single value to represent the uniform-after-vec instructions. It's the difference between uniform meaning "only lane zero will be used so the others aren't needed" vs. "all lanes can be used but they will all be equal". I think we could change this if we wanted to. In getScalarValue, we currently assert if we're trying to access a uniform-after-vec instruction for a Lane > 0. We could instead clone the Lane == 0 value and return that. What do you think?
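
To make the two meanings of "uniform" concrete, here is a toy sketch of the relaxation floated above: a lookup that clones the lane-zero copy on demand instead of asserting for `Lane > 0`. The `ScalarParts` struct, the string-named values, and `getScalarValueSketch` are all hypothetical and only loosely modelled on the `getScalarValue` behaviour described in the comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the vectorizer's per-value scalar map. The scalar copies
// are named by strings purely for illustration.
struct ScalarParts {
  bool UniformAfterVec;           // only lane zero was generated
  std::vector<std::string> Lanes; // scalar copies that exist so far
};

// Hypothetical relaxed lookup: for a uniform-after-vectorization value,
// materialize missing lanes as clones of lane zero instead of asserting.
std::string getScalarValueSketch(ScalarParts &V, unsigned Lane) {
  assert(!V.Lanes.empty() && "lane zero must already exist");
  if (V.UniformAfterVec) {
    while (V.Lanes.size() <= Lane)
      V.Lanes.push_back(V.Lanes.front() + ".clone");
  }
  assert(Lane < V.Lanes.size() && "lane was never generated");
  return V.Lanes[Lane];
}
```

The cloning is only sound under the "all lanes would be equal" reading of uniform, which is the distinction the comment draws.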

mkuper accepted this revision. Dec 1 2016, 3:25 PM

mkuper edited edge metadata.

mkuper added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
6625	I'm really ok with this going in as is, as long as it's clearly documented. We can fix this if we ever run into a case where it matters, and I don't want to introduce even more complexity here if it's unwarranted.

This revision is now accepted and ready to land. Dec 1 2016, 3:25 PM

mssimpso added inline comments. Dec 2 2016, 4:49 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
6625	Sounds good, thanks Michael. I'll add some comments to make this very clear.

gilr added inline comments. Dec 2 2016, 7:56 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
6672	Is there scalarization overhead if canBeScalarized(J) is false due to isScalarAfterVectorization(J)? More generally, shouldn't this line be under a !isScalarAfterVectorization(J) condition? We may have bailed out e.g. due to J being on a different basic block but J itself may be scalar (meaning we've "smoothed" a scalar-vector-scalar sequence). Is that correct?

mssimpso added inline comments. Dec 2 2016, 8:35 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
6672	That's right Gil. We also should check that J is actually contained in the loop before adding in the extract overhead. I had already added this to my working copy after uploading the last revision. But I'll upload a new revision with this change and Michael's comment suggestions for clarity. Thanks!

Addressed comments from Michael and Gil.

Added documentation requested by Michael.
Added a needsExtract condition for determining if we need to compute a scalarization overhead, as mentioned by Gil. I updated some costs in the AArch64/predication_costs.ll test to reflect this. For predicated stores, we know their GEP pointer operands will be scalar, so we no longer will calculate an extract cost for them.
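
The needsExtract condition described above can be pictured as follows. This is a sketch under stated assumptions, not the committed code: `OperandInfo`, `needsExtractSketch`, `extractOverheadSketch`, and the flat per-operand `ExtractCost` are invented for illustration. An operand is charged an extract only if it is defined by an instruction inside the loop, is not already scalar, and will not be scalarized along with its user.

```cpp
#include <cassert>
#include <vector>

// Toy description of one operand of an instruction being scalarized.
struct OperandInfo {
  bool IsLoopInstruction; // defined by an instruction inside the loop
  bool ScalarAfterVec;    // already known to remain scalar
  bool CanBeScalarized;   // will be scalarized together with its user
};

// An extract is only needed for a vector value produced inside the loop.
bool needsExtractSketch(const OperandInfo &J) {
  return J.IsLoopInstruction && !J.ScalarAfterVec;
}

// Sums the extract overhead over the operands of one scalarized instruction.
int extractOverheadSketch(const std::vector<OperandInfo> &Operands,
                          int ExtractCost) {
  assert(ExtractCost >= 0 && "costs are non-negative in this sketch");
  int Overhead = 0;
  for (const OperandInfo &J : Operands) {
    if (J.CanBeScalarized)
      continue; // the operand becomes scalar too, so nothing to extract
    if (needsExtractSketch(J))
      Overhead += ExtractCost;
  }
  return Overhead;
}
```

Under this model, a predicated store whose GEP pointer operand is already scalar contributes no extract cost for that operand, in line with the updated test costs mentioned above.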

(Still LGTM :-) )

Did I address all of your comments, Gil? Thanks!

Yes you did, Matt - sorry for the delay. LGTM too :)

In D26083#614898, @gilr wrote:

Yes you did, Matt - sorry for the delay. LGTM too :)

No problem - just wanted to make sure. Thanks for the review!

Closed by commit rL288909: [LV] Scalarize operands of predicated instructions (authored by mssimpso). Dec 7 2016, 7:13 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	136 lines
test/Transforms/LoopVectorize/AArch64/predication_costs.ll	86 lines
test/Transforms/LoopVectorize/if-pred-non-void.ll	54 lines
test/Transforms/LoopVectorize/if-pred-stores.ll	11 lines

Diff 76631

lib/Transforms/Vectorize/LoopVectorize.cpp

[1,902 lines not shown]	public:
   /// \returns The smallest bitwidth each instruction can be represented with.
   /// The vector equivalents of these instructions should be truncated to this
   /// type.
   const MapVector<Instruction *, uint64_t> &getMinimalBitwidths() const {
     return MinBWs;
   }

+  /// \returns True if it is more profitable to scalarize instruction \p I for
+  /// vectorization factor \p VF.
+  bool isProfitableToScalarize(Instruction *I, unsigned VF) const {
+    auto Scalars = InstsToScalarize.find(VF);
+    assert(Scalars != InstsToScalarize.end() &&
+           "VF not yet analyzed for scalarization profitability");
+    return Scalars->second.count(I);
+  }
+
 private:
   /// The vectorization cost is a combination of the cost itself and a boolean
   /// indicating whether any of the contributing operations will actually
   /// operate on
   /// vector values after type legalization in the backend. If this latter value
   /// is
   /// false, then all operations will be scalarized (i.e. no vectorization has
   /// actually taken place).
[26 lines not shown]	return ::createMissedAnalysis(Hints->vectorizeAnalysisPassName(),
                                   RemarkName, TheLoop);
   }

   /// Map of scalar integer values to the smallest bitwidth they can be legally
   /// represented as. The vector equivalents of these values should be truncated
   /// to this type.
   MapVector<Instruction *, uint64_t> MinBWs;

+  /// A map holding instruction-cost pairs for each choice of vectorization
+  /// factor. The presence of an instruction in the mapping indicates that it
+  /// will be scalarized when vectorizing with the associated vectorization
+  /// factor.
+  DenseMap<unsigned, DenseMap<Instruction *, unsigned>> InstsToScalarize;
+
+  // Returns the expected difference in cost from scalarizing the expression
+  // feeding a predicated instruction \p PredInst. The instructions to
+  // scalarize and their scalar costs are collected in \p ScalarCosts.
+  int computePredInstDiscount(Instruction *PredInst,
+                              DenseMap<Instruction *, unsigned> &ScalarCosts,
+                              unsigned VF);
+
+  /// Collects the instructions to scalarize for each predicated instruction in
+  /// the loop.
+  void collectInstsToScalarize(unsigned VF);
+
 public:
   /// The loop that we evaluate.
   Loop *TheLoop;
   /// Predicated scalar evolution analysis.
   PredicatedScalarEvolution &PSE;
   /// Loop Info analysis.
   LoopInfo *LI;
   /// Vectorization legality.
[2,681 lines not shown]	void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {
   for (Instruction &I : *BB) {

     // If the instruction will become trivially dead when vectorized, we don't
     // need to generate it.
     if (DeadInstructions.count(&I))
       continue;

     // Scalarize instructions that should remain scalar after vectorization.
-    if (!(isa<BranchInst>(&I) || isa<PHINode>(&I) ||
-          isa<DbgInfoIntrinsic>(&I)) &&
-        Legal->isScalarAfterVectorization(&I)) {
-      scalarizeInstruction(&I);
+    if (VF > 1 &&
+        !(isa<BranchInst>(&I) || isa<PHINode>(&I) ||
+          isa<DbgInfoIntrinsic>(&I)) &&
+        (Legal->isScalarAfterVectorization(&I) ||
+         Cost->isProfitableToScalarize(&I, VF))) {
+      scalarizeInstruction(&I, Legal->isScalarWithPredication(&I));
       continue;
     }

     switch (I.getOpcode()) {
     case Instruction::Br:
       // Nothing to do for PHIs and BR, since we already took care of the
       // loop control flow instructions.
       continue;
[1,456 lines not shown]	LoopVectorizationCostModel::selectVectorizationFactor(bool OptForSize) {
   }

   int UserVF = Hints->getWidth();
   if (UserVF != 0) {
     assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
     DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");

     Factor.Width = UserVF;
+    collectInstsToScalarize(UserVF);
     return Factor;
   }

   float Cost = expectedCost(1).first;
 #ifndef NDEBUG
   const float ScalarCost = Cost;
 #endif /* NDEBUG */
   unsigned Width = 1;
▲ Show 20 Lines • Show All 389 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = VFs.size(); i < e; ++i) {
RU.LoopInvariantRegs = Invariant;		RU.LoopInvariantRegs = Invariant;
RU.MaxLocalUsers = MaxUsages[i];		RU.MaxLocalUsers = MaxUsages[i];
RUs[i] = RU;		RUs[i] = RU;
}		}

return RUs;		return RUs;
}		}

		void LoopVectorizationCostModel::collectInstsToScalarize(unsigned VF) {

		// If we aren't vectorizing the loop, or if we've already collected the
		// instructions to scalarize, there's nothing to do. Collection may already
		// have occurred if we have a user-selected VF and are now computing the
		// expected cost for interleaving.
		if (VF < 2 \|\| InstsToScalarize.count(VF))
		return;

		// Initialize a mapping for VF in InstsToScalalarize. If we find that it's
		// not profitable to to scalarize any instructions, the presence of VF in the
		mkuperUnsubmitted Done Reply Inline Actions "to to" -> "to" mkuper: "to to" -> "to"
		mssimpsoAuthorUnsubmitted Not Done Reply Inline Actions Thanks! mssimpso: Thanks!
		// map will indicate that we've analyzed it already.
		DenseMap<Instruction *, unsigned> &InstsToScalarizeVF = InstsToScalarize[VF];

		// Find all the instructions that are scalar with predication in the loop and
		// determine if it would better to not if-convert the blocks they are in. If
		// so, we also record the instructions to scalarize.
		for (BasicBlock *BB : TheLoop->blocks()) {
		if (!Legal->blockNeedsPredication(BB))
		continue;
		for (Instruction &I : *BB)
		if (Legal->isScalarWithPredication(&I)) {
		DenseMap<Instruction *, unsigned> ScalarCosts;
		if (computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
		mkuper: Just trying to understand if I got this right. Let's say we have:
		    %a = add i32, ...
		    %r = udiv i32 %x, %a
		    %s = udiv i32 %y, %a
		    %t = udiv i32 %z, %a
		in the same block. Will we do the right thing evaluating the cost of scalarizing the add in a separate context for each udiv?
		gilr: Actually in such a case it seems the addition gets scalarized but cannot be sunk down because of the multiple uses. I also wonder if we calculate the cost correctly when one predicated instruction feeds another.
		mssimpso: Yeah, Gil is right here in that the add in this example will not be sunk. Thanks, Gil! This is because our current predication method places each udiv in its own separate block. Long term I think we will want to keep the predicated instructions in the same block after vectorization if they were in the same block before vectorization. I can add some comments about this in the meantime. So we will need to check for the multi-context case, then. But this doesn't mean that we can't sink instructions with multiple uses. We should still be able to handle the following, for example.
		    if:
		      %a = add i32, ...
		      %r = add i32 %a, %a
		      %s = udiv i32 %x, %r
		      br
		I'll add some tests.
		mssimpso: Michael/Gil, After thinking about the multi-context case a bit more (and not wanting to revisit this again later), I decided to take a stab at modifying code generation for the predicated instructions so that we don't place them in separate basic blocks if we don't have to. I'll upload this as a separate patch shortly. Matt.
		InstsToScalarizeVF.insert(ScalarCosts.begin(), ScalarCosts.end());
		}
		}
		}

		int LoopVectorizationCostModel::computePredInstDiscount(
		Instruction *PredInst, DenseMap<Instruction *, unsigned> &ScalarCosts,
		unsigned VF) {

		// Initialize the discount to zero, meaning that the scalar version and the
		// vector version cost the same.
		int Discount = 0;

		// Holds instructions to analyze. The instructions we visit are mapped in
		// ScalarCosts. Those instructions are the ones that would be scalarized if
		// we find that the scalar version costs less.
		SmallVector<Instruction *, 8> Worklist;

		// Compute the expected cost discount from scalarizing the entire expression
		gilr: It seems the only instructions pushed into Worklist are PredInst and instructions from its basic block. Is the invariance check necessary?
		mssimpso: Ah, that's right. Thanks! I think the invariance check is leftover from a previous implementation I had. I'll remove it.
		// feeding the predicated instruction.
		Worklist.push_back(PredInst);
		while (!Worklist.empty()) {
		gilr: The following loop causes the patch to assert
		    void foo(int* restrict a, int b, int* restrict c) {
		      for (int i = 0; i < 10000; ++i) {
		        if (a[i] > 777) {
		          int t2 = c[i];
		          int t3 = t2 / b;
		          a[i] += t3;
		        }
		      }
		    }
		while trying to scalarize a uniform value (the predicated load). Got this example to work by skipping I if any of its operands isUniformAfterVectorization().
		mssimpso: Thanks, Gil! I'll fix this in the updated patch and add an appropriate test.
		Instruction *I = Worklist.pop_back_val();

		// If we've already analyzed the instruction, there's nothing to do.
		if (ScalarCosts.count(I))
		continue;

		mkuper: Please add an explanation for why we bail on Legal->isScalarAfterVectorization(I)
		mssimpso: Sure. I don't think it's strictly necessary to bail here, but I couldn't think of an example where continuing to traverse the chain would be useful.
		// Compute the cost of the vector instruction. Note that this cost already
		// includes the scalarization overhead of the predicated instruction.
		unsigned VectorCost = getInstructionCost(I, VF).first;

		// Compute the cost of the scalarized instruction. This cost is the cost of
		// the instruction as if it wasn't if-converted and instead remained in the
		// predicated block. We will scale this cost by block probability after
		// computing the scalarization overhead.
		unsigned ScalarCost = VF * getInstructionCost(I, 1).first;

		// Compute the scalarization overhead of needed insertelement instructions.
		if (Legal->isScalarWithPredication(I) && !I->getType()->isVoidTy())
		ScalarCost += getScalarizationOverhead(ToVectorTy(I->getType(), VF), true,
		gilr: Wasn't the cost of insert-element accounted for by VectorCost?
		mssimpso: For the cost of the insertelement for predicated instructions: that's right. We computed it in VectorCost. The insert overhead should be the same for both versions of the code: predication as-is vs. predication with scalarized operands. So we either need to subtract this overhead from VectorCost or add it to ScalarCost to make the calculations comparable. For the non-scalar type question: this may be possible (I'll need to investigate). I'll add a test either way. Thanks!
		mkuper: Can we end up with a non-scalar type during the traversal? I'm thinking about something like a div that depends on an extractvalue from a struct.
		false, TTI);
		mkuper: Are you sure this is the right check? If I understand correctly, we fail not because the pointer is uniform, but because it's only uniform in the "we use a single (LLVM) value to represent it" sense, not the "the (abstract) value is the same for all lanes" sense. Otherwise it'd be safe to scalarize. I'm having a hard time thinking of a good example, though, because this is limited to instructions (so GV and param operands won't be affected), and uniform instructions tend to either be consecutive, loop-invariant (in which case they should be hoisted out by LICM before we hit the vectorizer), or uses of the scalar IV (in which case they won't be operands of vectorized instructions).
		mssimpso: That's exactly right. We can't scalarize because we use a single value to represent the uniform-after-vec instructions. It's the difference between uniform meaning "only lane zero will be used so the others aren't needed" vs. "all lanes can be used but they will all be equal". I think we could change this if we wanted to. In getScalarValue, we currently assert if we're trying to access a uniform-after-vec instruction for a Lane > 0. We could instead clone the Lane == 0 value and return that. What do you think?
		mkuper: I'm really ok with this going in as is, as long as it's clearly documented. We can fix this if we ever run into a case where it matters, and I don't want to introduce even more complexity here if it's unwarranted.
		mssimpso: Sounds good, thanks Michael. I'll add some comments to make this very clear.

		// Compute the scalarization overhead of needed extractelement
		// instructions. We will attempt to scalarize all instructions in the
		// predicated block. Thus, if the instruction uses an instruction defined
		// outside the predicated block, we will have to include the cost of
		// extracting it. If the instruction uses an instruction defined inside the
		// predicated block, we add it to the worklist instead.
		for (Use &U : I->operands())
		if (auto *J = dyn_cast<Instruction>(U.get())) {
		if (PredInst->getParent() != J->getParent())
		ScalarCost += getScalarizationOverhead(ToVectorTy(J->getType(), VF),
		false, true, TTI);
		else
		Worklist.push_back(J);
		}

		// Scale the total scalar cost by block probability.
		ScalarCost /= getReciprocalPredBlockProb();

		// Compute the discount. A non-negative discount means the vector version
		// of the instruction costs more, and scalarizing would be beneficial.
		Discount += VectorCost - ScalarCost;
		ScalarCosts[I] = ScalarCost;
		}

		return Discount;
		}
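
The worklist traversal above can be mimicked outside LLVM. The following is only a rough sketch under stated assumptions: `ToyInst` and `predInstDiscount` are hypothetical names, not LLVM APIs, and every cost is supplied by hand instead of being queried from TTI. The insert and extract overheads are precomputed per instruction rather than derived from operand types.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for an instruction; not an LLVM type.
struct ToyInst {
  int VectorCost;      // assumed cost of the vectorized (if-converted) form
  int ScalarCost;      // assumed cost of one scalar copy
  int InsertOverhead;  // insertelement overhead (predicated, non-void results)
  int ExtractOverhead; // extractelement overhead for operands defined
                       // outside the predicated block
  std::vector<const ToyInst *> SameBlockOperands; // operands to visit next
};

// Accumulates (vector cost - scaled scalar cost) over the expression tree
// rooted at PredInst and records each visited instruction's scalar cost.
int predInstDiscount(const ToyInst *PredInst, int VF, int ReciprocalBlockProb,
                     std::map<const ToyInst *, int> &ScalarCosts) {
  assert(VF > 1 && ReciprocalBlockProb > 0);
  int Discount = 0;
  std::vector<const ToyInst *> Worklist{PredInst};
  while (!Worklist.empty()) {
    const ToyInst *I = Worklist.back();
    Worklist.pop_back();

    // Already analyzed: nothing to do.
    if (ScalarCosts.count(I))
      continue;

    // One scalar copy per lane, plus insert/extract overhead.
    int ScalarCost =
        VF * I->ScalarCost + I->InsertOverhead + I->ExtractOverhead;

    // Operands defined in the same predicated block are analyzed too.
    for (const ToyInst *Op : I->SameBlockOperands)
      Worklist.push_back(Op);

    // Scale by block probability (integer division, as in the patch).
    ScalarCost /= ReciprocalBlockProb;

    Discount += I->VectorCost - ScalarCost;
    ScalarCosts[I] = ScalarCost;
  }
  return Discount;
}
```

With made-up vector costs of 1 for an add and 10 for a udiv that uses it, VF = 2, and a reciprocal block probability of 2, the sketch records scalar costs of 2 and 4 and returns a discount of 5, so scalarizing the whole tree would be considered profitable in this toy setting.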

		LoopVectorizationCostModel::VectorizationCostTy
		LoopVectorizationCostModel::expectedCost(unsigned VF) {
		VectorizationCostTy Cost;

		// Collect the instructions (and their associated costs) that will be more
		// profitable to scalarize.
		collectInstsToScalarize(VF);

		// For each block.
		for (BasicBlock *BB : TheLoop->blocks()) {
		VectorizationCostTy BlockCost;

		// For each instruction in the old loop.
		for (Instruction &I : *BB) {
		// Skip dbg intrinsics.
		if (isa<DbgInfoIntrinsic>(I))
		continue;

		// Skip ignored values.
		gilr: Is there scalarization overhead if canBeScalarized(J) is false due to isScalarAfterVectorization(J)? More generally, shouldn't this line be under a !isScalarAfterVectorization(J) condition? We may have bailed out e.g. due to J being on a different basic block but J itself may be scalar (meaning we've "smoothed" a scalar-vector-scalar sequence). Is that correct?
		mssimpso: That's right Gil. We also should check that J is actually contained in the loop before adding in the extract overhead. I had already added this to my working copy after uploading the last revision. But I'll upload a new revision with this change and Michael's comment suggestions for clarity. Thanks!
		if (ValuesToIgnore.count(&I))
		continue;

		VectorizationCostTy C = getInstructionCost(&I, VF);

		// Check if we should override the cost.
		if (ForceTargetInstructionCost.getNumOccurrences() > 0)
		C.first = ForceTargetInstructionCost;
		[... 80 lines not shown ...]

		LoopVectorizationCostModel::VectorizationCostTy
		LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) {
		// If we know that this instruction will remain uniform, check the cost of
		// the scalar version.
		if (Legal->isUniformAfterVectorization(I))
		VF = 1;

		if (VF > 1 && isProfitableToScalarize(I, VF))
		return VectorizationCostTy(InstsToScalarize[VF][I], false);

		Type *VectorTy;
		unsigned C = getInstructionCost(I, VF, VectorTy);

		bool TypeNotScalarized =
		VF > 1 && !VectorTy->isVoidTy() && TTI.getNumberOfParts(VectorTy) < VF;
		return VectorizationCostTy(C, TypeNotScalarized);
		}

		[... 815 lines not shown ...]

test/Transforms/LoopVectorize/AArch64/predication_costs.ll

				[... 79 lines not shown ...]
				for.inc:
				%i.next = add nuw nsw i64 %i, 1
				%cond = icmp slt i64 %i.next, %n
				br i1 %cond, label %for.body, label %for.end

				for.end:
				ret void
				}

				; CHECK-LABEL: predicated_udiv_scalarized_operand
				;
				; This test checks that we correctly compute the cost of the predicated udiv
				; instruction and the add instruction it uses. The add is scalarized and sunk
				; inside the predicated block. If we assume the block probability is 50%, we
				; compute the cost as:
				;
				; Cost of add:
				; (add(2) + extractelement(3)) / 2 = 2
				; Cost of udiv:
				; (udiv(2) + extractelement(3) + insertelement(3)) / 2 = 4
				;
				; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp3 = add nsw i32 %tmp2, %x
				; CHECK: Found an estimated cost of 4 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
				; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x
				; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
				;
				define i32 @predicated_udiv_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
				%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
				%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
				%tmp2 = load i32, i32* %tmp0, align 4
				br i1 %c, label %if.then, label %for.inc

				if.then:
				%tmp3 = add nsw i32 %tmp2, %x
				%tmp4 = udiv i32 %tmp2, %tmp3
				br label %for.inc

				for.inc:
				%tmp5 = phi i32 [ %tmp2, %for.body ], [ %tmp4, %if.then]
				%tmp6 = add i32 %r, %tmp5
				%i.next = add nuw nsw i64 %i, 1
				%cond = icmp slt i64 %i.next, %n
				br i1 %cond, label %for.body, label %for.end

				for.end:
				%tmp7 = phi i32 [ %tmp6, %for.inc ]
				ret i32 %tmp7
				}

				; CHECK-LABEL: predicated_store_scalarized_operand
				;
				; This test checks that we correctly compute the cost of the predicated store
				; instruction and the add instruction it uses. The add is scalarized and sunk
				; inside the predicated block. If we assume the block probability is 50%, we
				; compute the cost as:
				;
				; Cost of add:
				; (add(2) + extractelement(3)) / 2 = 2
				; Cost of store:
				; (store(4) + extractelement(3)) / 2 = 3
				;
				; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp2 = add nsw i32 %tmp1, %x
				; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4
				; CHECK: Scalarizing: %tmp2 = add nsw i32 %tmp1, %x
				; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4
				;
				define void @predicated_store_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
				%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
				%tmp1 = load i32, i32* %tmp0, align 4
				br i1 %c, label %if.then, label %for.inc

				if.then:
				%tmp2 = add nsw i32 %tmp1, %x
				store i32 %tmp2, i32* %tmp0, align 4
				br label %for.inc

				for.inc:
				%i.next = add nuw nsw i64 %i, 1
				%cond = icmp slt i64 %i.next, %n
				br i1 %cond, label %for.body, label %for.end

				for.end:
				ret void
				}
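
The costs quoted in the two test comments above rely on integer division, which is easy to miss when reading `(add(2) + extractelement(3)) / 2 = 2`. As a small sanity check, here is a sketch that reproduces the three figures. `scaledCost` is a hypothetical helper written for this illustration, not a function from the patch, and the input costs are simply the numbers stated in the test comments for VF 2.

```cpp
// Hypothetical helper: total scalarized cost across all lanes plus
// extract/insert overhead, scaled by the reciprocal block probability
// (2 corresponds to the assumed 50% block probability).
constexpr unsigned scaledCost(unsigned InstCostAllLanes, unsigned ExtractCost,
                              unsigned InsertCost, unsigned ReciprocalProb) {
  return (InstCostAllLanes + ExtractCost + InsertCost) / ReciprocalProb;
}

// add:   (2 + 3) / 2     -> 5 / 2 truncates to 2
static_assert(scaledCost(2, 3, 0, 2) == 2, "add cost");
// udiv:  (2 + 3 + 3) / 2 -> 8 / 2 = 4
static_assert(scaledCost(2, 3, 3, 2) == 4, "udiv cost");
// store: (4 + 3) / 2     -> 7 / 2 truncates to 3
static_assert(scaledCost(4, 3, 0, 2) == 3, "store cost");
```

The truncation matches the unsigned division `ScalarCost /= getReciprocalPredBlockProb()` in the patch, which is why the add and the store round down.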

test/Transforms/LoopVectorize/if-pred-non-void.ll

				[... 201 lines not shown ...]

				if.end: ; preds = %if.then, %check
				%ysd.0 = phi i32 [ %rsd, %if.then ], [ %psd, %check ]
				store i32 %ysd.0, i32* %isd, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 128
				br i1 %exitcond, label %for.cond.cleanup, label %for.body
				}


				define i32 @predicated_udiv_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {
				entry:
				br label %for.body

				; CHECK-LABEL: predicated_udiv_scalarized_operand
				; CHECK: vector.body:
				; CHECK: %wide.load = load <2 x i32>, <2 x i32>* {{.*}}, align 4
				; CHECK: br i1 {{.*}}, label %[[IF0:.+]], label %[[CONT0:.+]]
				; CHECK: [[IF0]]:
				; CHECK: %[[T00:.+]] = extractelement <2 x i32> %wide.load, i32 0
				; CHECK: %[[T01:.+]] = extractelement <2 x i32> %wide.load, i32 0
				; CHECK: %[[T02:.+]] = add nsw i32 %[[T01]], %x
				; CHECK: %[[T03:.+]] = udiv i32 %[[T00]], %[[T02]]
				; CHECK: %[[T04:.+]] = insertelement <2 x i32> undef, i32 %[[T03]], i32 0
				; CHECK: br label %[[CONT0]]
				; CHECK: [[CONT0]]:
				; CHECK: %[[T05:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[T04]], %[[IF0]] ]
				; CHECK: br i1 {{.*}}, label %[[IF1:.+]], label %[[CONT1:.+]]
				; CHECK: [[IF1]]:
				; CHECK: %[[T06:.+]] = extractelement <2 x i32> %wide.load, i32 1
				; CHECK: %[[T07:.+]] = extractelement <2 x i32> %wide.load, i32 1
				; CHECK: %[[T08:.+]] = add nsw i32 %[[T07]], %x
				; CHECK: %[[T09:.+]] = udiv i32 %[[T06]], %[[T08]]
				; CHECK: %[[T10:.+]] = insertelement <2 x i32> %[[T05]], i32 %[[T09]], i32 1
				; CHECK: br label %[[CONT1]]
				; CHECK: [[CONT1]]:
				; CHECK: phi <2 x i32> [ %[[T05]], %[[CONT0]] ], [ %[[T10]], %[[IF1]] ]
				; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body

				for.body:
				%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
				%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
				%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
				%tmp2 = load i32, i32* %tmp0, align 4
				br i1 %c, label %if.then, label %for.inc

				if.then:
				%tmp3 = add nsw i32 %tmp2, %x
				%tmp4 = udiv i32 %tmp2, %tmp3
				br label %for.inc

				for.inc:
				%tmp5 = phi i32 [ %tmp2, %for.body ], [ %tmp4, %if.then]
				%tmp6 = add i32 %r, %tmp5
				%i.next = add nuw nsw i64 %i, 1
				%cond = icmp slt i64 %i.next, %n
				br i1 %cond, label %for.body, label %for.end

				for.end:
				%tmp7 = phi i32 [ %tmp6, %for.inc ]
				ret i32 %tmp7
				}

test/Transforms/LoopVectorize/if-pred-stores.ll

	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg < %s \| FileCheck %s --check-prefix=UNROLL			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg < %s \| FileCheck %s --check-prefix=UNROLL
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s \| FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s \| FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg < %s \| FileCheck %s --check-prefix=VEC			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg < %s \| FileCheck %s --check-prefix=VEC

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	; Test predication of stores.			; Test predication of stores.
	define i32 @test(i32* nocapture %f) #0 {			define i32 @test(i32* nocapture %f) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	; VEC-LABEL: test			; VEC-LABEL: test
	; VEC: %[[v0:.+]] = add i64 %index, 0			; VEC: %[[v0:.+]] = add i64 %index, 0
	; VEC: %[[v8:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>			; VEC: %[[v8:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>
	; VEC: %[[v9:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>
	; VEC: %[[v10:.+]] = and <2 x i1> %[[v8]], <i1 true, i1 true>			; VEC: %[[v10:.+]] = and <2 x i1> %[[v8]], <i1 true, i1 true>
	; VEC: %[[o1:.+]] = or <2 x i1> zeroinitializer, %[[v10]]			; VEC: %[[o1:.+]] = or <2 x i1> zeroinitializer, %[[v10]]
	; VEC: %[[v11:.+]] = extractelement <2 x i1> %[[o1]], i32 0			; VEC: %[[v11:.+]] = extractelement <2 x i1> %[[o1]], i32 0
	; VEC: %[[v12:.+]] = icmp eq i1 %[[v11]], true			; VEC: %[[v12:.+]] = icmp eq i1 %[[v11]], true
	; VEC: br i1 %[[v12]], label %[[cond:.+]], label %[[else:.+]]			; VEC: br i1 %[[v12]], label %[[cond:.+]], label %[[else:.+]]
	;			;
	; VEC: [[cond]]:			; VEC: [[cond]]:
	; VEC: %[[v13:.+]] = extractelement <2 x i32> %[[v9]], i32 0			; VEC: %[[v13:.+]] = extractelement <2 x i32> %wide.load, i32 0
				; VEC: %[[v9a:.+]] = add nsw i32 %[[v13]], 20
	; VEC: %[[v2:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v0]]			; VEC: %[[v2:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v0]]
	; VEC: store i32 %[[v13]], i32* %[[v2]], align 4			; VEC: store i32 %[[v9a]], i32* %[[v2]], align 4
	; VEC: br label %[[else:.+]]			; VEC: br label %[[else:.+]]
	;			;
	; VEC: [[else]]:			; VEC: [[else]]:
	; VEC: %[[v15:.+]] = extractelement <2 x i1> %[[o1]], i32 1			; VEC: %[[v15:.+]] = extractelement <2 x i1> %[[o1]], i32 1
	; VEC: %[[v16:.+]] = icmp eq i1 %[[v15]], true			; VEC: %[[v16:.+]] = icmp eq i1 %[[v15]], true
	; VEC: br i1 %[[v16]], label %[[cond2:.+]], label %[[else2:.+]]			; VEC: br i1 %[[v16]], label %[[cond2:.+]], label %[[else2:.+]]
	;			;
	; VEC: [[cond2]]:			; VEC: [[cond2]]:
	; VEC: %[[v17:.+]] = extractelement <2 x i32> %[[v9]], i32 1			; VEC: %[[v17:.+]] = extractelement <2 x i32> %wide.load, i32 1
				; VEC: %[[v9b:.+]] = add nsw i32 %[[v17]], 20
	; VEC: %[[v1:.+]] = add i64 %index, 1			; VEC: %[[v1:.+]] = add i64 %index, 1
	; VEC: %[[v4:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v1]]			; VEC: %[[v4:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v1]]
	; VEC: store i32 %[[v17]], i32* %[[v4]], align 4			; VEC: store i32 %[[v9b]], i32* %[[v4]], align 4
	; VEC: br label %[[else2:.+]]			; VEC: br label %[[else2:.+]]
	;			;
	; VEC: [[else2]]:			; VEC: [[else2]]:

	; UNROLL-LABEL: test			; UNROLL-LABEL: test
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL: %[[IND:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 0			; UNROLL: %[[IND:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 0
	; UNROLL: %[[IND1:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 1			; UNROLL: %[[IND1:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 1
				[... 87 lines not shown ...]