This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
ClosedPublic

Authored by RosieSumpter on Dec 8 2021, 4:15 AM.

Details

Summary

This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Diff Detail

Event Timeline

RosieSumpter created this revision.Dec 8 2021, 4:15 AM
RosieSumpter requested review of this revision.Dec 8 2021, 4:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2021, 4:15 AM
david-arm added inline comments.Dec 8 2021, 4:51 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1590

According to the default VF = ElementCount::getFixed(1) it looks like we can still have an inconsistency during vectorisation where we get different answers? I think it's also called from collectElementTypesForWidening so we could potentially say a gather is legal when the type is scalar, but then say it's illegal in setCostBasedWideningDecision. I wonder if it's worth adding a comment there saying something like:

// We're simply querying at this point if the target even supports any vector
// gathers and scatters for the given element type? Certain vector forms may
// still be illegal, but setCostBasedWideningDecision will distinguish between
// the legal behaviour for different VFs at that point and generate costs accordingly.

Also, does it matter that LoopVectorizationCostModel::isScalarWithPredication is still passing in an element type rather than a vector? For example, collectLoopUniforms calls isScalarWithPredication and at that point knows the VF. It may be fine, but I'm just a bit worried we might be swapping one inconsistency for another that's all.

sdesmalen added inline comments.Dec 8 2021, 6:12 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1590

The code in collectElementTypesForWidening seems like a hack to get the LV to choose a wider VF and it seems specific to X86. If I replace that whole expression with if (!T->isSized()) continue, I only get a single test that fails because it explicitly tests for it to use a wider VF. I doubt this is the right mechanism to use for that kind of purpose going forward.

In any case, if collectElementTypesForWidening would consider using a wider VF, it's still the cost-model that will then narrow that down (after this patch), because passing in the VF will result in a more restricted answer for isLegalMaskedGather.

Also, does it matter that LoopVectorizationCostModel::isScalarWithPredication is still passing in an element type rather than a vector? For example, collectLoopUniforms calls isScalarWithPredication and at that point knows the VF. It may be fine, but I'm just a bit worried we might be swapping one inconsistency for another that's all.

I don't know if there's an actual problem atm, but I agree it makes sense to pass it to isScalarWithPredication as well. It should be possible since VF is available in all functions that call it.

MVE part sounds OK to me. Is the X86 part expected to be the same or better?

lebedev.ri added inline comments.Dec 9 2021, 1:33 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
5122–5139

Can you move this code into forceScalarizeMaskedGather()?
This should not have been here, LV should rely on cost model.

RosieSumpter edited the summary of this revision. (Show Details)
  • Moved code which checks if masked gathers/scatters are legal for the given vector width to forceScalarizeMaskedGather/Scatter for X86
  • Pass VF to isScalarWithPredication so that we are consistently passing a vector type to isLegalMaskedGather/Scatter
  • Add minimum vscale attribute to the necessary AArch64 tests
  • Remove changes to X86 tests (X86 behaviour is now unchanged by this patch)
RosieSumpter marked an inline comment as done.Dec 10 2021, 3:27 AM

@lebedev.ri/@fhahn given that there exists a pass that scalarizes the masked load/store/gather/scatter intrinsics, is there a reason why we wouldn't always want to choose the intrinsic representation over scalarizing it in the LV? It seems a bit odd to want to trick the LV to think that a masked gather is legal, but then still scalarize it in the end. This means it is not using the right cost-model, unless it reimplements the scalarization costs, like is done by X86TargetTransformInfo::getGSScalarCost.

llvm/lib/Target/ARM/ARMTargetTransformInfo.h
194

nit: s/can to/can lower to/

195–196

nit: s/, so if we are here, we know we want to expand//

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
5125

This is casting to FixedVectorType, so I think we should we make the interface take VectorType instead of Type.

@lebedev.ri/@fhahn given that there exists a pass that scalarizes the masked load/store/gather/scatter intrinsics,
is there a reason why we wouldn't always want to choose the intrinsic representation over scalarizing it in the LV?
It seems a bit odd to want to trick the LV to think that a masked gather is legal, but then still scalarize it in the end.
This means it is not using the right cost-model, unless it reimplements the scalarization costs,
like is done by X86TargetTransformInfo::getGSScalarCost.

The fact that something is/isn't legal does not mean that LV will pick *that* particular path.
It merely allows LV to consider this path, among the others,
and out of all the legal paths, pick the most profitable one.

We don't "trick LV to think it's legal and then scalarize anyway", it *is* legal.
But for those particular VF's, it's more costly than the native scalarized path.
So LV won't actually produce gather intrinsics, and naturally, they won't be scalarized.

Furthermore, no, while we could just always prefer the intrinsics (from correctness viewpoint),
that is detrimental to the performance. See e.g. my recent patches, namely D111220 / D111363 / D111546.

So no, what happens is correct.

The fact that something is/isn't legal does not mean that LV will pick *that* particular path.
It merely allows LV to consider this path, among the others,
and out of all the legal paths, pick the most profitable one.

The LV makes the decision to use gathers/scatters because the cost-model said the gather/scatter cost was lower than the scalarization cost, but that only works if the gather/scatter cost is correct. If it will be scalarized, then I'd expect this costs to be no different than the scalarization cost.

We don't "trick LV to think it's legal and then scalarize anyway", it *is* legal.

I'm not sure what the meaning of "legal" is in that case. My understanding of the term is that it means that the target can handle the operation directly without legalization, e.g. using a special instruction or a sequence of instructions. From that perspective, saying that <operation> is legal, means it will not need to be scalarized. (conversely, the ScalarizeMaskedOperations pass only scalarizes the intrinsic if it is *not* marked as legal). If the LV asks the question "is a masked gather operation legal?" and it returns true, it seems wrong to me to query the cost as if it were a gather rather than asking for the scalarization cost, but then later still decide to scalarize the operation.

But for those particular VF's, it's more costly than the native scalarized path.
So LV won't actually produce gather intrinsics, and naturally, they won't be scalarized.

Furthermore, no, while we could just always prefer the intrinsics (from correctness viewpoint),
that is detrimental to the performance. See e.g. my recent patches, namely D111220 / D111363 / D111546.

This is the answer I was after, thanks for pointing me to those patches!

  • Change data type parameter for forceScalarizeMaskedGather/Scatter to be a VectorType instead of a Type
  • Address nits

The fact that something is/isn't legal does not mean that LV will pick *that* particular path.
It merely allows LV to consider this path, among the others,
and out of all the legal paths, pick the most profitable one.

The LV makes the decision to use gathers/scatters because the cost-model said the gather/scatter cost was lower than the scalarization cost, but that only works if the gather/scatter cost is correct. If it will be scalarized, then I'd expect this costs to be no different than the scalarization cost.

If the cost model is wrong then lots of weird shit will go wrong, yes.

We don't "trick LV to think it's legal and then scalarize anyway", it *is* legal.

I'm not sure what the meaning of "legal" is in that case.

That the gather with VF=2 can be done by the same instruction as for the VF=4, but after padding the mask with zeros.

My understanding of the term is that it means that the target can handle the operation directly without legalization, e.g. using a special instruction or a sequence of instructions.

From that perspective, saying that <operation> is legal, means it will not need to be scalarized.

Define "need".
It is not much different from what VectorCombine does ("would scalar cost be lower than vector cost?
if so, scalarize" or the other way around), just implemented in an another pass.

(conversely, the ScalarizeMaskedOperations pass only scalarizes the intrinsic if it is *not* marked as legal). If the LV asks the question "is a masked gather operation legal?" and it returns true, it seems wrong to me to query the cost as if it were a gather rather than asking for the scalarization cost, but then later still decide to scalarize the operation.

But for those particular VF's, it's more costly than the native scalarized path.
So LV won't actually produce gather intrinsics, and naturally, they won't be scalarized.

Furthermore, no, while we could just always prefer the intrinsics (from correctness viewpoint),
that is detrimental to the performance. See e.g. my recent patches, namely D111220 / D111363 / D111546.

This is the answer I was after, thanks for pointing me to those patches!

Cheers.

RosieSumpter marked 3 inline comments as done.Dec 13 2021, 8:34 AM

From that perspective, saying that <operation> is legal, means it will not need to be scalarized.

Define "need".
It is not much different from what VectorCombine does ("would scalar cost be lower than vector cost?
if so, scalarize" or the other way around), just implemented in an another pass.

I see your point, it just seemed odd to me that for X86 this required reimplementing a different scalarisation cost rather than relying on the mechanisms already present in LoopVectorize, which tries to cost as accurately as possible whether to scalarize or use the intrinsics.
To me that suggests that the scalarization cost in LoopVectorize is inaccurate or there's improvements to be made to the code for scalarizing in the LoopVectorizer (or both) to make sure there is no disparity in performance.

As you explained, it's currently doing the right thing in the sense that it leads to the desired result.

sdesmalen added inline comments.Dec 14 2021, 6:05 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3377 ↗(On Diff #393907)

Hi @RosieSumpter this check should probably not be here anymore. Only isLegalMaskedGather should be used to determine whether or not to use an intrinsic for the gather operation.

  • Removed forceScalarizeMaskedGather check from SLPVectorizer.cpp (since we don't check it in LoopVectorize.cpp)
  • Updated necessary SLPVectorizer/X86 tests
RosieSumpter marked an inline comment as done.Dec 15 2021, 3:32 AM

Hi @lebedev.ri, the SLPVectorizer/X86/pr47629 tests have changed because isLegalMaskedGather now returns true for certain cases where it didn't before (due to the check on the number of vector elements now being in forceScalarizeMaskedGather as requested). It then calculates the gather/scatter cost, and because forceScalarizeMaskedGather returns true, it calculates the cost using getGSScalarCost. This cost is higher than before and so it chooses not to vectorize. An alternative approach would be to check forceScalarizeMaskedGather in isLegalMaskedGather instead of when calculating the cost, but this will then mean LoopVectorize will assume these operations need to be scalarized so will cause test failures there. What do you think the preferred option is here?

Hi @RosieSumpter, just wanted to let you know I'm happy with the patch. This looks like a good improvement to me. It conceptually makes more sense and it also allows AArch64TTI to better distinguish between Neon/SVE when using fixed-width vectors.
I want to accept the patch, but I'll defer this to @lebedev.ri because there's still an open question about the X86 cost-model.

Friendly ping @lebedev.ri, do you have any further comments for this patch?

No concerns from my side, i guess this is one of the reasonable solutions.

sdesmalen accepted this revision.Jan 4 2022, 2:50 AM
This revision is now accepted and ready to land.Jan 4 2022, 2:50 AM
fhahn added a comment.Jan 4 2022, 2:58 AM

Are the X86 behavior changes intentional?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1608

documentation needs to be updated.

1613

VF should be documented.

RosieSumpter marked 2 inline comments as done.
  • Updated descriptions of isScalarWithPredication and isPredicatedInst
  • rebased

Are the X86 behavior changes intentional?

Hi @fhahn, yes these changes are intentional - the cost model for X86 is now more accurate, and in some cases it ends up with a higher cost through getGSScalarCost because it knows the cost of the gather/scatter will be expanded later.