Page MenuHomePhabricator

[LoopVectorize] When tail-folding, don't always predicate uniform loads
ClosedPublic

Authored by david-arm on Oct 26 2021, 8:52 AM.

Details

Summary

In VPRecipeBuilder::handleReplication if we believe the instruction
is predicated we then proceed to create new VP region blocks even
when the load is uniform and only predicated due to tail-folding.

I have updated isPredicatedInst to avoid treating a uniform load as
predicated when tail-folding, which means we can do a single scalar
load and a vector splat of the value.

Tests added here:

Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll

Diff Detail

Event Timeline

david-arm created this revision.Oct 26 2021, 8:52 AM
david-arm requested review of this revision.Oct 26 2021, 8:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2021, 8:52 AM
fhahn added a comment.Oct 26 2021, 2:46 PM

Does the test need to be aarch64 specific or could it be target independent?

Does the test need to be aarch64 specific or could it be target independent?

Hi @fhahn, I think that's a really good idea. The main problem I had when trying this is that tail-folding requires masked load/store support, so when you make this target-independent we end up scalarising the loads/stores and never expose the path I changed in this patch. If there is a way to force masked loads to be legal then I could try that?

Does the test need to be aarch64 specific or could it be target independent?

Hi @fhahn, I think that's a really good idea. The main problem I had when trying this is that tail-folding requires masked load/store support, so when you make this target-independent we end up scalarising the loads/stores and never expose the path I changed in this patch. If there is a way to force masked loads to be legal then I could try that?

From what I can see, there doesn't seem to be a way to force masked loads to be legal, and personally I don't think it is a big-enough deal to create a new option for it.
Maybe you can add a comment to the test describing that the test isn't really target-specific (other than the need for TTI.isLegalMaskedLoad to return true) ?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9256

I see you've followed the message above, but I'm confused by the terminology here. From the test it does the opposite of scalarising (e.g. load scalar value + splat), which I personally associate more with 'widening'. So perhaps "Replicating uniform load in predicated loop:" would be more appropriate? (assuming that replicating can be done in multiple ways, not necessarily by scalarizing)

llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
48

The output of this test is unchanged by your code. Is it worth adding this test as a NFC change ("adding tests for uniform memops") and then rebasing your patch? That way it's clear what your patch has changed from the test.

david-arm updated this revision to Diff 382645.Oct 27 2021, 6:33 AM
  • Changed debug wording.
  • Rebased off NFC patch.
sdesmalen accepted this revision.Oct 29 2021, 7:33 AM

This looks good to me. We know the load will be executed to populate at least one of the lanes so the scalar uniform load can be performed unconditionally within the vector body.

llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
10

nit: I'd suggest adding this test in D11261 as well, so that you can see the difference in this patch.

This revision is now accepted and ready to land.Oct 29 2021, 7:33 AM
fhahn requested changes to this revision.Oct 31 2021, 1:10 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9253

The problem with the workaround is that IsPredicated is now inaccurate for the recipe I think.

It looks like the cost-model already accurately estimates the cost as not requiring predication. So perhaps it would be better to not claim the instruction requires predication in the first place in LoopVectorizationCostModel::isPredicated?

This revision now requires changes to proceed.Oct 31 2021, 1:10 PM
david-arm added inline comments.Nov 1 2021, 2:05 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9253

Hi @fhahn, are you suggesting that IsPredicated is being set incorrectly above? If I understand correctly, I believe you're suggesting the answer would be to add more logic to isPredicatedInst? i.e. something like:

bool isPredicatedInst(Instruction *I, bool IsUniform) {
  if (IsUniform && isa<LoadInst>(I) &&
    !Legal->blockNeedsPredication(I->getParent()))
    return false;

  if (!blockNeedsPredication(I->getParent()))
  ...
}

That feels a bit messy. Would it be better to simply adjust the IsPredicated boolean value instead in this function?

david-arm updated this revision to Diff 383778.Nov 1 2021, 4:51 AM
  • Made sure we don't set IsPredicated in the VPReplicateRecipe.
david-arm marked an inline comment as done.Nov 1 2021, 4:53 AM

Hi @fhahn, I've fixed the immediate problem with IsPredicated and now it is set correctly. If you still think it's better to change the isPredicatedInst interface I'm happy to give it a try?

sdesmalen added inline comments.Nov 5 2021, 4:36 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9251

nit: unrelated whitespace change?

9253

Hi @david-arm, I think that could just be:

// Returns true if \p I is an instruction that will be predicated either
// through scalar predication or masked load/store or masked gather/scatter.
// Superset of instructions that return true for isScalarWithPredication.
bool isPredicatedInst(Instruction *I, bool IsUniform = false) {
  if (!blockNeedsPredication(I->getParent()))
    return false;
  // Uniform loads don't require predication.
  if (isa<LoadInst>(I) && IsUniform)
    return false;
  // Loads and stores that need some form of masked operation are predicated
  // instructions.
  if (isa<LoadInst>(I) || isa<StoreInst>(I))
    return Legal->isMaskRequired(I);
  return isScalarWithPredication(I);
}

Which seems like a sensible change to me and isn't really that messy?

Looking at the uses of that function, should the then-block of if (isPredicatedInst(I)) in getMemInstScalarizationCost have an assert that the the value is not uniform?

fhahn added inline comments.Wed, Nov 10, 2:02 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9253

That feels a bit messy. Would it be better to simply adjust the IsPredicated boolean value instead in this function?

Why do you think this would be messy? If we treat those loads as unpredicted during codegen, claiming they will be predicated earlier seems inconsistent and may lead to surprises down the road for cases they are out-of-sync.

david-arm added inline comments.Wed, Nov 10, 2:54 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9253

It's because of the confusion between CM.blockNeedsPredication and Legal->blockNeedsPredication. Vectorised blocks created when tail-folding are something artificial we have introduced during vectorisation and we know that there is always at least one active lane. I can fix this as you suggest *if* I distinguish between the two, i.e.

bool isPredicatedInst(Instruction *I, bool IsUniform) {
  if (IsUniform && isa<LoadInst>(I) &&
    !Legal->blockNeedsPredication(I->getParent()))
    return false;

  if (!blockNeedsPredication(I->getParent()))
  ...
}

It just looks very confusing that's all due to having to call two different forms of blockNeedsPredication. I will go with the code above unless you can think of a cleaner solution?

fhahn added inline comments.Mon, Nov 15, 12:49 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9253

Sounds good, especially with the renaming to Legal's blockNeedsPredication.

david-arm updated this revision to Diff 387955.Wed, Nov 17, 8:31 AM
  • Changed the patch to add the uniform load check in isPredicatedInst instead.
fhahn accepted this revision.Fri, Nov 26, 12:51 AM

LGTM, thanks! Please make sure to update the commit message; the current one is very much out-of-date.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1635

I think it would be good to add the explanation from your earlier comment here to help the reader.

This revision is now accepted and ready to land.Fri, Nov 26, 12:51 AM
sdesmalen accepted this revision.Fri, Nov 26, 1:12 AM

I agree with @fhahn that an extra comment would be useful.

david-arm updated this revision to Diff 389945.Fri, Nov 26, 1:52 AM
david-arm edited the summary of this revision. (Show Details)
  • Added comment.