
[LV] NFC: Decouple foldTailByMasking from isScalarWithPredication.
Needs Review · Public

Authored by sdesmalen on May 13 2021, 1:49 PM.

Details

Summary

isScalarWithPredication assumes that the decision for 'foldTailByMasking'
is already calculated, which may not be true at the point of asking the
question. By passing 'VFIterationIsPredicated' as a separate operand, we can
make this dependency explicit, and use this function earlier on before
it has been finally decided whether tail predication will be used.

Additionally, this patch:

  • Replaces LoopVectorizationCostModel::isPredicatedInst by LoopVectorizationLegality::requiresPredicatedWidening.
  • Replaces LoopVectorizationCostModel::blockNeedsPredication by LoopVectorizationLegality::blockNeedsPredication.
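The core idea of the decoupling can be sketched as follows. This is a hypothetical model, not the actual LLVM signatures; `Instr`, its fields, and the exact logic are illustrative assumptions only, standing in for the real query that previously consulted the internal foldTailByMasking decision:

```cpp
#include <cassert>

// Hypothetical model of an instruction's relevant properties.
struct Instr {
  bool MayDivideByZero;    // e.g. an sdiv/udiv whose inactive lanes may trap
  bool InConditionalBlock; // the instruction sits under an if-then condition
};

// After the patch, "is this VF iteration predicated?" is an explicit operand,
// so the question can be asked before tail folding has been decided.
bool isScalarWithPredication(const Instr &I, bool VFIterationIsPredicated) {
  bool NeedsPredication = I.InConditionalBlock || VFIterationIsPredicated;
  // A division under predication cannot be speculated across inactive lanes,
  // so (in this sketch) it would have to be scalarized.
  return NeedsPredication && I.MayDivideByZero;
}
```

With the flag passed in, the same instruction can be queried under both assumptions (tail folded or not) without the cost-model decision having been made first.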

Diff Detail

Unit Tests: Failed

Time    Test
90 ms   x64 debian > LLVM.Analysis/CostModel/AArch64::bitreverse.ll
        Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt < /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Analysis/CostModel/AArch64/bitreverse.ll -mtriple=aarch64-unknown-linux-gnu -cost-model -analyze | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Analysis/CostModel/AArch64/bitreverse.ll
110 ms  x64 windows > LLVM.Analysis/CostModel/AArch64::bitreverse.ll
        Script: -- : 'RUN: at line 2'; c:\ws\w6\llvm-project\premerge-checks\build\bin\opt.exe < C:\ws\w6\llvm-project\premerge-checks\llvm\test\Analysis\CostModel\AArch64\bitreverse.ll -mtriple=aarch64-unknown-linux-gnu -cost-model -analyze | c:\ws\w6\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w6\llvm-project\premerge-checks\llvm\test\Analysis\CostModel\AArch64\bitreverse.ll

Event Timeline

sdesmalen created this revision.May 13 2021, 1:49 PM
sdesmalen requested review of this revision.May 13 2021, 1:49 PM
Herald added a project: Restricted Project. May 13 2021, 1:49 PM

I like this change, but some of the widening decisions used to feed back to the result of isScalarWithPredication which may alter the result in unexpected ways. I'd suggest you try to remove the VF from that function in a separate patch first. If it doesn't break anything anywhere, then we can move ahead with this patch.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
286

could you try to make this function const? You'd probably need to propagate it to blockNeedsPredication and a few other places, but it'd be worth it.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5350

CM's isLegalMaskedLoad, isLegalMaskedStore, isLegalMaskedGather, isLegalMaskedScatter can also be removed.

Hey Sander,
This patch is a breath of fresh air.
Thank you!
I think you need to run git-clang-format and I have one more comment.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1513–1514

This ElementCount VF can go away because it is not being used.

fhahn added a subscriber: fhahn.May 14 2021, 3:16 AM

I'd like to understand the main motivation and the reasons why it cannot be done via cost-modeling a bit better. IIUC the main problem is the case where scalable vectorization is forced by the user via pragmas or options? In the auto-vec case, the cost-model would already just skip scalable VFs in those cases as not profitable?

You mention ... we can't yet fall back on scalarization. From that I gather you plan to add support for that in the future? If we could scalarize arbitrary instructions for scalable vectors, would we need to do the check during the legality checks? If it would be possible to scalarize, even if expensive or hard to estimate, it seems to me that the current code should work without change, right?

sdesmalen updated this revision to Diff 345394.May 14 2021, 4:19 AM
sdesmalen marked 3 inline comments as done.
  • Removed LoopVectorizationCostModel::isLegalMasked*
  • Marked canVectorizeWithPredication as const.
  • Rebased on rG459c48e04f25 (which also removes VF from isPredicatedInst)

I'd like to understand the main motivation and the reasons why it cannot be done via cost-modeling a bit better. IIUC the main problem is the case where scalable vectorization is forced by the user via pragmas or options? In the auto-vec case, the cost-model would already just skip scalable VFs in those cases as not profitable?

You mention ... we can't yet fall back on scalarization. From that I gather you plan to add support for that in the future? If we could scalarize arbitrary instructions for scalable vectors, would we need to do the check during the legality checks? If it would be possible to scalarize, even if expensive or hard to estimate, it seems to me that the current code should work without change, right?

That's not really the case. The legalizer will ask if the LV can vectorize the loop. Some scalable operations cannot be legalized, because they'd require scalarization, which doesn't exist and there is no plan to add support for such scalarization, so instead we need to make sure that any IR that's generated can be code-generated. Subsequently, this means that the CostModel should always return a valid cost for scalable operations. This is the conceptual split between legalization and the cost-model. i.e. If something is legal, we must be able to cost it. It is also an extra guard to make sure the cost-model is complete.

This is not the only argument for this patch though. As I tried to explain in the commit message, this refactoring also helps untangle the circular dependence between the [choosing of the] VF from the cost-model, and the decision on whether tail predication is needed. e.g. For a trip-count of 100 and a MaxVF of 8, it would not decide the tail is unnecessary, whereas it could decide per VF whether it needs a tail (e.g. for a VF=4, no tail is needed). So there is a possible improvement here. Having isScalarWithPredication and blockNeedsPredication combine both the "needs tail predication" and "block/instruction needs predication" isn't helpful, hence the reason for splitting the two concerns in this patch.

Note that this same distinction is needed when choosing a scalable VF (we can't decide the tail is unnecessary if we don't know the value of vscale, whereas for a fixed-width VF it would be able to remove the tail, but we can't decide that before calling the cost-model if we don't know whether we choose a fixed-width or scalable VF).
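The per-VF reasoning in the example above (trip-count 100, MaxVF 8) boils down to a divisibility check. The sketch below is an illustration of that point only, not an LLVM interface:

```cpp
#include <cassert>

// Whether a scalar tail (or tail folding) is needed depends on the VF
// actually chosen, not just on the MaxVF: a tail is required exactly when
// the trip count is not a multiple of the VF.
bool needsTail(unsigned TripCount, unsigned VF) {
  return TripCount % VF != 0;
}
```

So for a trip count of 100, VF=8 leaves a remainder of 4 and needs a tail, while VF=4 divides evenly and does not; a per-VF query can exploit this where a single MaxVF-based decision cannot.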

Matt added a subscriber: Matt.May 14 2021, 12:17 PM

Gentle ping.

@bmahjour are you happy with the patch now that I've split off the possibly functional changes? (which should mean this patch is now NFC)

@fhahn were you satisfied by my reasoning?
If so, then I can create a follow-up patch that addresses the FIXME I left here, and also improves the cost-model for the case where it currently can't eliminate the tail when the trip-count isn't divisible by the MaxVF (it might then pick a VF for which the tail can be eliminated without requiring predication).

fhahn added a comment.Mon, May 24, 1:16 AM

That's not really the case. The legalizer will ask if the LV can vectorize the loop. Some scalable operations cannot be legalized, because they'd require scalarization, which doesn't exist and there is no plan to add support for such scalarization, so instead we need to make sure that any IR that's generated can be code-generated.

I am not sure if it was very clear from my earlier comment, but I was referring to scalarization + predicated block generation done directly by LV.

For regular vectors, LV does not rely on the backend for scalarization; instead it creates a loop that iterates over each index of the VF and in the loop conditionally executes the scalar version of the requested instruction and packs the result into a vector. For fixed-width VFs the loop is not explicit in the generated IR, because it is just unrolled for the VF. Is there anything fundamental preventing LV from creating a similar explicit loop over the scalable VF at runtime? Figuring out an accurate cost would be expensive of course, but I am curious if such scalarization would be possible.

Not sure if I am missing something, but after a quick glance it seems to me that we should have all required pieces in LLVM IR to write such a loop and lower it https://llvm.godbolt.org/z/rhqh9fz8b

Subsequently, this means that the CostModel should always return a valid cost for scalable operations. This is the conceptual split between legalization and the cost-model. i.e. If something is legal, we must be able to cost it. It is also an extra guard to make sure the cost-model is complete.

Agreed. The direction of my comments tries to figure out if there's something LV can do to scalarize arbitrary operations on scalable vectors or not.

This is not the only argument for this patch though. As I tried to explain in the commit message, this refactoring also helps untangle the circular dependence between the [choosing of the] VF from the cost-model, and the decision on whether tail predication is needed. e.g. For a trip-count of 100 and a MaxVF of 8, it would not decide the tail is unnecessary, whereas it could decide per VF whether it needs a tail (e.g. for a VF=4, no tail is needed). So there is a possible improvement here. Having isScalarWithPredication and blockNeedsPredication combine both the "needs tail predication" and "block/instruction needs predication" isn't helpful, hence the reason for splitting the two concerns in this patch.

My comment is only referring to the 'moving to LVL' part of the patch. Not sure if any bit of the untangling relies on moving the code?

fhahn added a comment.Mon, May 24, 1:19 AM

Not sure if I am missing something, but after a quick glance it seems to me that we should have all required pieces in LLVM IR to write such a loop and lower it https://llvm.godbolt.org/z/rhqh9fz8b

Slightly updated link https://llvm.godbolt.org/z/nW8n6sMxr

I am not sure if it was very clear from my earlier comment, but I was referring to scalarization + predicated block generation done directly by LV.

For regular vectors, LV does not rely on the backend for scalarization; instead it creates a loop that iterates over each index of the VF and in the loop conditionally executes the scalar version of the requested instruction and packs the result into a vector. For fixed-width VFs the loop is not explicit in the generated IR, because it is just unrolled for the VF. Is there anything fundamental preventing LV from creating a similar explicit loop over the scalable VF at runtime? Figuring out an accurate cost would be expensive of course, but I am curious if such scalarization would be possible.

Not sure if I am missing something, but after a quick glance it seems to me that we should have all required pieces in LLVM IR to write such a loop and lower it https://llvm.godbolt.org/z/rhqh9fz8b

Thanks for clarifying. We have prototyped that approach before and our experiments showed it was never beneficial to use scalarization within a vectorized loop with scalable vectors, so we decided to keep the vectorizer simple and disallow scalable auto-vec when scalarization is required. In the case of operations like SDIV, our plan is to emit a llvm.vp intrinsic which can then be ISel'ed natively (using a native instruction w/ predication). It seemed better to disallow it first to avoid any vectorization failures, and then bring up the implementation to support this case.

Subsequently, this means that the CostModel should always return a valid cost for scalable operations. This is the conceptual split between legalization and the cost-model. i.e. If something is legal, we must be able to cost it. It is also an extra guard to make sure the cost-model is complete.

Agreed. The direction of my comments tries to figure out if there's something LV can do to scalarize arbitrary operations on scalable vectors or not.

Alright, makes sense.

This is not the only argument for this patch though. As I tried to explain in the commit message, this refactoring also helps untangle the circular dependence between the [choosing of the] VF from the cost-model, and the decision on whether tail predication is needed. e.g. For a trip-count of 100 and a MaxVF of 8, it would not decide the tail is unnecessary, whereas it could decide per VF whether it needs a tail (e.g. for a VF=4, no tail is needed). So there is a possible improvement here. Having isScalarWithPredication and blockNeedsPredication combine both the "needs tail predication" and "block/instruction needs predication" isn't helpful, hence the reason for splitting the two concerns in this patch.

My comment is only referring to the 'moving to LVL' part of the patch. Not sure if any bit of the untangling relies on moving the code?

I guess it could equally live in CM, but my reasoning to move it was more that it has nothing to do with the cost-model, so moving it to LVL avoids it getting polluted with other CM-specific code in the future.

fhahn added a comment.Tue, Jun 1, 1:50 AM

I am not sure if it was very clear from my earlier comment, but I was referring to scalarization + predicated block generation done directly by LV.

For regular vectors, LV does not rely on the backend for scalarization; instead it creates a loop that iterates over each index of the VF and in the loop conditionally executes the scalar version of the requested instruction and packs the result into a vector. For fixed-width VFs the loop is not explicit in the generated IR, because it is just unrolled for the VF. Is there anything fundamental preventing LV from creating a similar explicit loop over the scalable VF at runtime? Figuring out an accurate cost would be expensive of course, but I am curious if such scalarization would be possible.

Not sure if I am missing something, but after a quick glance it seems to me that we should have all required pieces in LLVM IR to write such a loop and lower it https://llvm.godbolt.org/z/rhqh9fz8b

Thanks for clarifying. We have prototyped that approach before and our experiments showed it was never beneficial to use scalarization within a vectorized loop with scalable vectors, so we decided to keep the vectorizer simple and disallow scalable auto-vec when scalarization is required. In the case of operations like SDIV, our plan is to emit a llvm.vp intrinsic which can then be ISel'ed natively (using a native instruction w/ predication). It seemed better to disallow it first to avoid any vectorization failures, and then bring up the implementation to support this case.

I see, thanks for elaborating! I am not sure how much extra work generating such loops would be, but it seems that disallowing scalarization requires quite a bit of extra complexity, as a number of recent patches that introduce additional restrictions show.

The extra checks also seem to introduce various TTI hooks that mirror the interfaces that compute cost. Do you know how this will impact other targets with scalable vectors? Do you know if scalarization is similarly expensive on other scalable targets? The naive predication/scalarization is also very expensive on most fixed-vector targets and almost never profitable, but it simplifies the rest of the code, because LV can generate vector IR for almost all instructions mechanically.

My comment is only referring to the 'moving to LVL' part of the patch. Not sure if any bit of the untangling relies on moving the code?

I guess it could equally live in CM, but my reasoning to move it was more that it has nothing to do with the cost-model, so moving it to LVL avoids it getting polluted with other CM-specific code in the future.

From the discussion above, it seems to me that it is still more of a cost question rather than a legality question, as IIUC from the discussion, they could be vectorized by scalarizing via a loop? Also, it looks like all uses of the functions moved to LVL are in the cost-model. The replacement of isLegalMaskedScatter seems also unrelated and could be submitted separately.

sdesmalen updated this revision to Diff 349514.Thu, Jun 3, 4:02 AM
  • Rebased patch after splitting out some changes.
  • Refactored patch to better split concerns between legalization vs cost-model.
sdesmalen retitled this revision from [LV] NFCI: Move implementation of isScalarWithPredication to LoopVectorizationLegality to [LV] NFC: Decouple foldTailByMasking from isScalarWithPredication..Thu, Jun 3, 4:03 AM
sdesmalen edited the summary of this revision. (Show Details)

I see, thanks for elaborating! I am not sure how much extra work generating such loops would be, but it seems that disallowing scalarization requires quite a bit of extra complexity, as a number of recent patches that introduce additional restrictions show.

The extra checks also seem to introduce various TTI hooks that mirror the interfaces that compute cost. Do you know how this will impact other targets with scalable vectors? Do you know if scalarization is similarly expensive on other scalable targets? The naive predication/scalarization is also very expensive on most fixed-vector targets and almost never profitable, but it simplifies the rest of the code, because LV can generate vector IR for almost all instructions mechanically.

I'll reply to this separately since it's not the only motivation for this patch.

I guess it could equally live in CM, but my reasoning to move it was more that it has nothing to do with the cost-model, so moving it to LVL avoids it getting polluted with other CM-specific code in the future.

From the discussion above, it seems to me that it is still more of a cost question rather than a legality question, as IIUC from the discussion, they could be vectorized by scalarizing via a loop? Also, it looks like all uses of the functions moved to LVL are in the cost-model. The replacement of isLegalMaskedScatter seems also unrelated and could be submitted separately.

Fair point. I've changed the patch a little bit, which in my eyes improves the interfaces and splits out the concerns a bit better by providing interfaces for two distinct questions:

  • requiresPredicatedWidening: This answers the question "Does this operation require predication when being widened?". I've moved this interface to LVL because it describes a property of the operation and adds a requirement on what capability is needed in order to vectorize it. It does not implement a cost-model choice of how this is eventually widened by the LV. This function was initially named isPredicatedInst, but I find this name a bit more descriptive (also, the original implementation was 'backward', since it used isScalarWithPredication to determine whether the operation was predicated). If you feel strongly about the original name, or about having this in the CM, I'm happy to change that.
  • isScalarWithPredication: This answers the question "If this operation needs predication, do we have to fall back on scalarizing it?". This is a LV/cost-model decision, hence why it lives in the CM.

Does this make more sense?

Note that I've committed the changes for isLegalMaskedScatter|Gather separately in rG034503e9d2d66ab75679ab5d2ee0848f4de3cac7.

SjoerdMeijer added inline comments.Wed, Jun 9, 2:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1354

Just wanted to check what we mean by speculation here (or why it is relevant here)?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1517

nit: I -> \p I

Hmm, strange that some comments went missing when I pushed the button previously.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1343

I think this is related to the legal vs. cost-model discussion. I agree that it is probably more a cost-model thing. Does that mean this needs to live in CostModel, where it used to live?

SjoerdMeijer added inline comments.Wed, Jun 9, 2:47 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5307

I am confused here about names or some negations in logic, but probably it is names.

Here, if we "do not require predicated widening", we return false for isScalarWithPredication, whereas I probably was expecting true. Ignoring some edge cases, requiresPredicatedWidening returns true for most cases because all operations can be widened safely without predication. So probably I am confused that we are talking about widening and isScalar at the same time, or I am missing something else.

sdesmalen added inline comments.Wed, Jun 9, 3:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1343

Okay, if you feel the same way, I'll just move it to the CostModel!

1354

The reason I initially added the comment was because I would have expected return true; (given that the block needs predication or the VF Iteration is predicated), but it seems LVL uses a more detailed analysis to check whether the operation can be speculatively executed, reflected in isMaskRequired(I).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5307

requiresPredicatedWidening returns true for most cases because all operations can be widened safely without predication

requiresPredicatedWidening returns false for most cases, because most operations can be widened safely without requiring predication.

I've tried to split up two questions:
requiresPredicatedWidening: Given the use of predication in the loop (either from tail folding, or from the operation being under an actual if-then condition), does widening the operation require predication, or conversely: can the operation speculatively execute all lanes, followed by a select on the output? If the answer to the latter question is 'no' (as is the case for DIV), then the instruction requires a predicated vector operation (<=> predicated widening).
isScalarWithPredication: Given that the operation needs predicated widening, can we actually widen it, or do we need to scalarize it? If it doesn't require predicated widening, we know it also doesn't need to be scalarized (hence the condition to return false here)
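The two-question split above can be sketched as a small decision chain. Again this is a hypothetical model, not the LLVM code; `Op` and its fields are illustrative assumptions:

```cpp
#include <cassert>

// Hypothetical properties of a to-be-widened operation.
struct Op {
  bool UnderPredication; // tail folding, or under an if-then condition
  bool SafeToSpeculate;  // all lanes may execute, followed by a select
  bool HasMaskedVariant; // a masked/predicated vector form exists
};

// Q1 (property of the operation): does widening it need predication, i.e.
// can it NOT simply run all lanes speculatively plus a select?
bool requiresPredicatedWidening(const Op &O) {
  return O.UnderPredication && !O.SafeToSpeculate;
}

// Q2 (cost-model choice): given that it needs predicated widening, must we
// fall back on scalarization because no masked vector form exists?
bool isScalarWithPredication(const Op &O) {
  if (!requiresPredicatedWidening(O))
    return false; // no predication needed, so no scalarization needed
  return !O.HasMaskedVariant;
}
```

In this sketch a DIV under predication with no masked form answers yes to both questions, a speculatable ADD answers no to both, and a masked load answers yes to the first but no to the second.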

Does that help?

sdesmalen added inline comments.Wed, Jun 9, 3:46 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1343

Just to clarify: the original reason to move it to Legality was similar to why LoopVectorizationLegality::blockNeedsPredication exists, which describes a property of the original loop when vectorizing. The same property could be queried for a specific instruction, as if the interface were named instructionNeedsPredication (instead I named it requiresPredicatedWidening).

Also, by moving it out of the CostModel, we avoid private cost-model information such as foldTailByMasking() polluting the code (which is what this patch tries to split out).

SjoerdMeijer added inline comments.Wed, Jun 9, 7:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5307

Thanks, that definitely helped, and I now see where I went wrong.

These explanations are better (in my opinion) than the current comments for these functions. If you can update those comments, that would be good.

I am now going to read the patch again.