
[LV] NFC: Decouple foldTailByMasking from isScalarWithPredication.
Needs Review · Public

Authored by sdesmalen on May 13 2021, 1:49 PM.

Details

Summary

isScalarWithPredication assumes that the decision for 'foldTailByMasking'
is already calculated, which may not be true at the point of asking the
question. By passing 'VFIterationIsPredicated' as a separate operand, we can
make this dependency explicit, and use this function earlier on before
it has been finally decided whether tail predication will be used.

Additionally, this patch:

  • Replaces LoopVectorizationCostModel::isPredicatedInst by LoopVectorizationLegality::requiresPredicatedWidening.
  • Replaces LoopVectorizationCostModel::blockNeedsPredication by LoopVectorizationLegality::blockNeedsPredication.
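The core idea of the decoupling can be sketched as follows. This is a hypothetical model, not the actual LLVM signatures; `Instr`, its fields, and the exact logic are illustrative assumptions only, standing in for the real query that previously consulted the internal foldTailByMasking decision:

```cpp
#include <cassert>

// Hypothetical model of an instruction's relevant properties.
struct Instr {
  bool MayDivideByZero;    // e.g. an sdiv/udiv whose inactive lanes may trap
  bool InConditionalBlock; // the instruction sits under an if-then condition
};

// After the patch, "is this VF iteration predicated?" is an explicit operand,
// so the question can be asked before tail folding has been decided.
bool isScalarWithPredication(const Instr &I, bool VFIterationIsPredicated) {
  bool NeedsPredication = I.InConditionalBlock || VFIterationIsPredicated;
  // A division under predication cannot be speculated across inactive lanes,
  // so (in this sketch) it would have to be scalarized.
  return NeedsPredication && I.MayDivideByZero;
}
```

With the flag passed in, the same instruction can be queried under both assumptions (tail folded or not) without the cost-model decision having been made first.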

Diff Detail

Unit Tests: Failed

Time    Test
90 ms   x64 debian > LLVM.Analysis/CostModel/AArch64::bitreverse.ll
        Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt < /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Analysis/CostModel/AArch64/bitreverse.ll -mtriple=aarch64-unknown-linux-gnu -cost-model -analyze | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Analysis/CostModel/AArch64/bitreverse.ll
110 ms  x64 windows > LLVM.Analysis/CostModel/AArch64::bitreverse.ll
        Script: -- : 'RUN: at line 2'; c:\ws\w6\llvm-project\premerge-checks\build\bin\opt.exe < C:\ws\w6\llvm-project\premerge-checks\llvm\test\Analysis\CostModel\AArch64\bitreverse.ll -mtriple=aarch64-unknown-linux-gnu -cost-model -analyze | c:\ws\w6\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w6\llvm-project\premerge-checks\llvm\test\Analysis\CostModel\AArch64\bitreverse.ll

Event Timeline

sdesmalen created this revision.May 13 2021, 1:49 PM
sdesmalen requested review of this revision.May 13 2021, 1:49 PM
Herald added a project: Restricted Project. May 13 2021, 1:49 PM

I like this change, but some of the widening decisions used to feed back to the result of isScalarWithPredication which may alter the result in unexpected ways. I'd suggest you try to remove the VF from that function in a separate patch first. If it doesn't break anything anywhere, then we can move ahead with this patch.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
286

could you try to make this function const? You'd probably need to propagate it to blockNeedsPredication and a few other places, but it'd be worth it.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5350

CM's isLegalMaskedLoad, isLegalMaskedStore, isLegalMaskedGather, isLegalMaskedScatter can also be removed.

Hey Sander,
This patch is a breath of fresh air.
Thank you!
I think you need to run git-clang-format and I have one more comment.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1513–1514

This ElementCount VF can go away because it is not being used.

fhahn added a subscriber: fhahn.May 14 2021, 3:16 AM

I'd like to understand the main motivation and the reasons why it cannot be done via cost-modeling a bit better. IIUC the main problem is the case where scalable vectorization is forced by the user via pragmas or options? In the auto-vec case, the cost-model would already just skip scalable VFs in those cases as not profitable?

You mention ... we can't yet fall back on scalarization. From that I gather you plan to add support for that in the future? If we could scalarize arbitrary instructions for scalable vectors, would we need to do the check during the legality checks? If it would be possible to scalarize, even if expensive or hard to estimate, it seems to me that the current code should work without change, right?

sdesmalen updated this revision to Diff 345394.May 14 2021, 4:19 AM
sdesmalen marked 3 inline comments as done.
  • Removed LoopVectorizationCostModel::isLegalMasked*
  • Marked canVectorizeWithPredication as const.
  • Rebased on rG459c48e04f25 (which also removes VF from isPredicatedInst)

I'd like to understand the main motivation and the reasons why it cannot be done via cost-modeling a bit better. IIUC the main problem is the case where scalable vectorization is forced by the user via pragmas or options? In the auto-vec case, the cost-model would already just skip scalable VFs in those cases as not profitable?

You mention ... we can't yet fall back on scalarization. From that I gather you plan to add support for that in the future? If we could scalarize arbitrary instructions for scalable vectors, would we need to do the check during the legality checks? If it would be possible to scalarize, even if expensive or hard to estimate, it seems to me that the current code should work without change, right?

That's not really the case. The legalizer will ask if the LV can vectorize the loop. Some scalable operations cannot be legalized, because they'd require scalarization, which doesn't exist and there is no plan to add support for such scalarization, so instead we need to make sure that any IR that's generated can be code-generated. Subsequently, this means that the CostModel should always return a valid cost for scalable operations. This is the conceptual split between legalization and the cost-model. i.e. If something is legal, we must be able to cost it. It is also an extra guard to make sure the cost-model is complete.

This is not the only argument for this patch though. As I tried to explain in the commit message, this refactoring also helps untangle the circular dependence between the [choosing of the] VF from the cost-model, and the decision on whether tail predication is needed. e.g. For a trip-count of 100 and a MaxVF of 8, it would not decide the tail is unnecessary, whereas it could decide per VF whether it needs a tail (e.g. for a VF=4, no tail is needed). So there is a possible improvement here. Having isScalarWithPredication and blockNeedsPredication combine both the "needs tail predication" and "block/instruction needs predication" isn't helpful, hence the reason for splitting the two concerns in this patch.

Note that this same distinction is needed when choosing a scalable VF (we can't decide the tail is unnecessary if we don't know the value of vscale, whereas for a fixed-width VF it would be able to remove the tail, but we can't decide that before calling the cost-model if we don't know whether we choose a fixed-width or scalable VF).
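The per-VF reasoning in the example above (trip-count 100, MaxVF 8) boils down to a divisibility check. The sketch below is an illustration of that point only, not an LLVM interface:

```cpp
#include <cassert>

// Whether a scalar tail (or tail folding) is needed depends on the VF
// actually chosen, not just on the MaxVF: a tail is required exactly when
// the trip count is not a multiple of the VF.
bool needsTail(unsigned TripCount, unsigned VF) {
  return TripCount % VF != 0;
}
```

So for a trip count of 100, VF=8 leaves a remainder of 4 and needs a tail, while VF=4 divides evenly and does not; a per-VF query can exploit this where a single MaxVF-based decision cannot.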

Matt added a subscriber: Matt.May 14 2021, 12:17 PM

Gentle ping.

@bmahjour are you happy with the patch now that I've split off the possibly functional changes? (which should mean this patch is now NFC)

@fhahn were you satisfied by my reasoning?
If so, then I can create a follow-up patch that addresses the FIXME I left here, and also improves the cost-model for the case where it currently can't eliminate the tail when the trip-count isn't divisible by the MaxVF (it might then pick a VF for which the tail can be eliminated without requiring predication).

fhahn added a comment.Mon, May 24, 1:16 AM

That's not really the case. The legalizer will ask if the LV can vectorize the loop. Some scalable operations cannot be legalized, because they'd require scalarization, which doesn't exist and there is no plan to add support for such scalarization, so instead we need to make sure that any IR that's generated can be code-generated.

I am not sure if it was very clear from my earlier comment, but I was referring to scalarization + predicated block generation done directly by LV.

For regular vectors, LV does not rely on the backend for scalarization; instead it creates a loop that iterates over each index of the VF and in the loop conditionally executes the scalar version of the requested instruction and packs the result into a vector. For fixed-width VFs the loop is not explicit in the generated IR, because it is just unrolled for the VF. Is there anything fundamental preventing LV from creating a similar explicit loop over the scalable VF at runtime? Figuring out an accurate cost would be expensive of course, but I am curious if such scalarization would be possible.

Not sure if I am missing something, but after a quick glance it seems to me that we should have all required pieces in LLVM IR to write such a loop and lower it https://llvm.godbolt.org/z/rhqh9fz8b

Subsequently, this means that the CostModel should always return a valid cost for scalable operations. This is the conceptual split between legalization and the cost-model. i.e. If something is legal, we must be able to cost it. It is also an extra guard to make sure the cost-model is complete.

Agreed. The direction of my comments tries to figure out if there's something LV can do to scalarize arbitrary operations on scalable vectors or not.

This is not the only argument for this patch though. As I tried to explain in the commit message, this refactoring also helps untangle the circular dependence between the [choosing of the] VF from the cost-model, and the decision on whether tail predication is needed. e.g. For a trip-count of 100 and a MaxVF of 8, it would not decide the tail is unnecessary, whereas it could decide per VF whether it needs a tail (e.g. for a VF=4, no tail is needed). So there is a possible improvement here. Having isScalarWithPredication and blockNeedsPredication combine both the "needs tail predication" and "block/instruction needs predication" isn't helpful, hence the reason for splitting the two concerns in this patch.

My comment is only referring to the 'moving to LVL' part of the patch. Not sure if any bit of the untangling relies on moving the code?

fhahn added a comment.Mon, May 24, 1:19 AM

Not sure if I am missing something, but after a quick glance it seems to me that we should have all required pieces in LLVM IR to write such a loop and lower it https://llvm.godbolt.org/z/rhqh9fz8b

Slightly updated link https://llvm.godbolt.org/z/nW8n6sMxr

I am not sure if it was very clear from my earlier comment, but I was referring to scalarization + predicated block generation done directly by LV.

For regular vectors, LV does not rely on the backend for scalarization; instead it creates a loop that iterates over each index of the VF and in the loop conditionally executes the scalar version of the requested instruction and packs the result into a vector. For fixed-width VFs the loop is not explicit in the generated IR, because it is just unrolled for the VF. Is there anything fundamental preventing LV from creating a similar explicit loop over the scalable VF at runtime? Figuring out an accurate cost would be expensive of course, but I am curious if such scalarization would be possible.

Not sure if I am missing something, but after a quick glance it seems to me that we should have all required pieces in LLVM IR to write such a loop and lower it https://llvm.godbolt.org/z/rhqh9fz8b

Thanks for clarifying. We have prototyped that approach before and our experiments showed it was never beneficial to use scalarization within a vectorized loop with scalable vectors, so we decided to keep the vectorizer simple and disallow scalable auto-vec when scalarization is required. In the case of operations like SDIV, our plan is to emit a llvm.vp intrinsic which can then be ISel'ed natively (using a native instruction w/ predication). It seemed better to disallow it first to avoid any vectorization failures, and then bring up the implementation to support this case.

Subsequently, this means that the CostModel should always return a valid cost for scalable operations. This is the conceptual split between legalization and the cost-model. i.e. If something is legal, we must be able to cost it. It is also an extra guard to make sure the cost-model is complete.

Agreed. The direction of my comments tries to figure out if there's something LV can do to scalarize arbitrary operations on scalable vectors or not.

Alright, makes sense.

This is not the only argument for this patch though. As I tried to explain in the commit message, this refactoring also helps untangle the circular dependence between the [choosing of the] VF from the cost-model, and the decision on whether tail predication is needed. e.g. For a trip-count of 100 and a MaxVF of 8, it would not decide the tail is unnecessary, whereas it could decide per VF whether it needs a tail (e.g. for a VF=4, no tail is needed). So there is a possible improvement here. Having isScalarWithPredication and blockNeedsPredication combine both the "needs tail predication" and "block/instruction needs predication" isn't helpful, hence the reason for splitting the two concerns in this patch.

My comment is only referring to the 'moving to LVL' part of the patch. Not sure if any bit of the untangling relies on moving the code?

I guess it could equally live in CM, but my reasoning to move it was more that it has nothing to do with the cost-model, so moving it to LVL avoids it getting polluted with other CM-specific code in the future.

fhahn added a comment.Tue, Jun 1, 1:50 AM

I am not sure if it was very clear from my earlier comment, but I was referring to scalarization + predicated block generation done directly by LV.

For regular vectors, LV does not rely on the backend for scalarization; instead it creates a loop that iterates over each index of the VF and in the loop conditionally executes the scalar version of the requested instruction and packs the result into a vector. For fixed-width VFs the loop is not explicit in the generated IR, because it is just unrolled for the VF. Is there anything fundamental preventing LV from creating a similar explicit loop over the scalable VF at runtime? Figuring out an accurate cost would be expensive of course, but I am curious if such scalarization would be possible.

Not sure if I am missing something, but after a quick glance it seems to me that we should have all required pieces in LLVM IR to write such a loop and lower it https://llvm.godbolt.org/z/rhqh9fz8b

Thanks for clarifying. We have prototyped that approach before and our experiments showed it was never beneficial to use scalarization within a vectorized loop with scalable vectors, so we decided to keep the vectorizer simple and disallow scalable auto-vec when scalarization is required. In the case of operations like SDIV, our plan is to emit a llvm.vp intrinsic which can then be ISel'ed natively (using a native instruction w/ predication). It seemed better to disallow it first to avoid any vectorization failures, and then bring up the implementation to support this case.

I see, thanks for elaborating! I am not sure how much extra work generating such loops would be, but it seems that disallowing scalarization requires quite a bit of extra complexity, as a number of recent patches that introduce additional restrictions show.

The extra checks also seem to introduce various TTI hooks that mirror the interfaces that compute cost. Do you know how this will impact other targets with scalable vectors? Do you know if scalarization is similarly expensive on other scalable targets? The naive predication/scalarization is also very expensive on most fixed-vector targets and almost never profitable, but it simplifies the rest of the code, because LV can generate vector IR for almost all instructions mechanically.

My comment is only referring to the 'moving to LVL' part of the patch. Not sure if any bit of the untangling relies on moving the code?

I guess it could equally live in CM, but my reasoning to move it was more that it has nothing to do with the cost-model, so moving it to LVL avoids it getting polluted with other CM-specific code in the future.

From the discussion above, it seems to me that it is still more of a cost question rather than a legality question, as IIUC from the discussion, they could be vectorized by scalarizing via a loop? Also, it looks like all uses of the functions moved to LVL are in the cost-model. The replacement of isLegalMaskedScatter seems also unrelated and could be submitted separately.

sdesmalen updated this revision to Diff 349514.Thu, Jun 3, 4:02 AM
  • Rebased patch after splitting out some changes.
  • Refactored patch to better split concerns between legalization vs cost-model.
sdesmalen retitled this revision from [LV] NFCI: Move implementation of isScalarWithPredication to LoopVectorizationLegality to [LV] NFC: Decouple foldTailByMasking from isScalarWithPredication..Thu, Jun 3, 4:03 AM
sdesmalen edited the summary of this revision. (Show Details)

I see, thanks for elaborating! I am not sure how much extra work generating such loops would be, but it seems that disallowing scalarization requires quite a bit of extra complexity, as a number of recent patches that introduce additional restrictions show.

The extra checks also seem to introduce various TTI hooks that mirror the interfaces that compute cost. Do you know how this will impact other targets with scalable vectors? Do you know if scalarization is similarly expensive on other scalable targets? The naive predication/scalarization is also very expensive on most fixed-vector targets and almost never profitable, but it simplifies the rest of the code, because LV can generate vector IR for almost all instructions mechanically.

I'll reply to this separately since it's not the only motivation for this patch.

I guess it could equally live in CM, but my reasoning to move it was more that it has nothing to do with the cost-model, so moving it to LVL avoids it getting polluted with other CM-specific code in the future.

From the discussion above, it seems to me that it is still more of a cost question rather than a legality question, as IIUC from the discussion, they could be vectorized by scalarizing via a loop? Also, it looks like all uses of the functions moved to LVL are in the cost-model. The replacement of isLegalMaskedScatter seems also unrelated and could be submitted separately.

Fair point. I've changed the patch a little bit, which in my eyes improves the interfaces and splits out the concerns a bit better by providing interfaces for two distinct questions:

  • requiresPredicatedWidening: This answers the question "Does this operation require predication when being widened?". I've moved this interface to LVL because it describes a property of the operation and adds a requirement on what capability is needed in order to vectorize it. It does not implement a cost-model choice of how this is eventually widened by the LV. This function was initially named isPredicatedInst, but I find this name a bit more descriptive (also, the original implementation was 'backward', since it used isScalarWithPredication to determine whether the operation was predicated). If you feel strongly about the original name, or about having this in the CM, I'm happy to change that.
  • isScalarWithPredication: This answers the question "If this operation needs predication, do we have to fall back on scalarizing it?". This is a LV/cost-model decision, hence why it lives in the CM.

Does this make more sense?

Note that I've committed the changes for isLegalMaskedScatter|Gather separately in rG034503e9d2d66ab75679ab5d2ee0848f4de3cac7.

SjoerdMeijer added inline comments.Wed, Jun 9, 2:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1354

Just wanted to check what we mean by speculation here (or why it is relevant here)?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1517

nit: I -> \p I

Hmm, strange that some comments went missing when I pushed the button previously.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1343

I think this is related to the legal vs. cost-model discussion. I agree that it is probably more a cost-model thing. Does that mean this needs to live in CostModel, where it used to live?

SjoerdMeijer added inline comments.Wed, Jun 9, 2:47 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5307

I am confused here about names or some negations in logic, but probably it is names.

Here, if we "do not require predicated widening", we return false for isScalarWithPredication, whereas I probably was expecting true. Ignoring some edge cases, requiresPredicatedWidening returns true for most cases because all operations can be widened safely without predication. So probably I am confused that we are talking about widening and isScalar at the same time, or I am missing something else.

sdesmalen added inline comments.Wed, Jun 9, 3:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1343

Okay, if you feel the same way, I'll just move it to the CostModel!

1354

The reason I initially added the comment was because I would have expected return true; (given that the block needs predication or the VF Iteration is predicated), but it seems LVL uses a more detailed analysis to check whether the operation can be speculatively executed, reflected in isMaskRequired(I).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5307

requiresPredicatedWidening returns true for most cases because all operations can be widened safely without predication

requiresPredicatedWidening returns false for most cases, because most operations can be widened safely without requiring predication.

I've tried to split up two questions:
requiresPredicatedWidening: Given the use of predication in the loop (either from tail folding, or from the operation being under an actual if-then condition), does widening the operation require predication, or conversely: can the operation speculatively execute all lanes, followed by a select on the output? If the answer to the latter question is 'no' (as is the case for DIV), then the instruction requires a predicated vector operation (<=> predicated widening).
isScalarWithPredication: Given that the operation needs predicated widening, can we actually widen it, or do we need to scalarize it? If it doesn't require predicated widening, we know it also doesn't need to be scalarized (hence the condition to return false here)
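The two-question split above can be sketched as a small decision chain. Again this is a hypothetical model, not the LLVM code; `Op` and its fields are illustrative assumptions:

```cpp
#include <cassert>

// Hypothetical properties of a to-be-widened operation.
struct Op {
  bool UnderPredication; // tail folding, or under an if-then condition
  bool SafeToSpeculate;  // all lanes may execute, followed by a select
  bool HasMaskedVariant; // a masked/predicated vector form exists
};

// Q1 (property of the operation): does widening it need predication, i.e.
// can it NOT simply run all lanes speculatively plus a select?
bool requiresPredicatedWidening(const Op &O) {
  return O.UnderPredication && !O.SafeToSpeculate;
}

// Q2 (cost-model choice): given that it needs predicated widening, must we
// fall back on scalarization because no masked vector form exists?
bool isScalarWithPredication(const Op &O) {
  if (!requiresPredicatedWidening(O))
    return false; // no predication needed, so no scalarization needed
  return !O.HasMaskedVariant;
}
```

In this sketch a DIV under predication with no masked form answers yes to both questions, a speculatable ADD answers no to both, and a masked load answers yes to the first but no to the second.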

Does that help?

sdesmalen added inline comments.Wed, Jun 9, 3:46 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1343

Just to clarify: the original reason to move it to Legality was similar to why LoopVectorizationLegality::blockNeedsPredication exists, which describes a property of the original loop when vectorizing. The same property could be queried for a specific instruction, as if the interface were named instructionNeedsPredication (instead I named it requiresPredicatedWidening).

Also, by moving it out of the CostModel, we avoid private cost-model information such as foldTailByMasking() polluting the code (which is what this patch tries to split out).

SjoerdMeijer added inline comments.Wed, Jun 9, 7:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5307

Thanks, that definitely helped, and I now see where I went wrong.

These explanations are better (in my opinion) than the current comments for these functions. If you can update those comments, that would be good.

I am now going to read the patch again.