Page MenuHomePhabricator

[LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF
ClosedPublic

Authored by david-arm on Jun 21 2022, 6:38 AM.

Details

Summary

When calculating the cost of Instruction::Br in getInstructionCost
we query PredicatedBBsAfterVectorization to see if there is a
scalar predicated block. However, this meant that the decisions
being made for a given fixed-width VF were affecting the cost for a
scalable VF. As a result we were returning InstructionCost::Invalid
pointlessly for a scalable VF that should have a low cost. I
encountered this for some loops when enabling tail-folding for
scalable VFs.

Test added here:

Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll

Diff Detail

Event Timeline

david-arm created this revision.Jun 21 2022, 6:38 AM
david-arm requested review of this revision.Jun 21 2022, 6:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2022, 6:38 AM
sdesmalen added inline comments.Jun 21 2022, 7:57 AM
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
9

nit: can you clean up this test a bit? dso_local nocapture, etc aren't really required for the LV test.
Also, can you use some constant FP value instead of loading from @globval ?

17

What in this loop is causing PredicatedBBsAfterVectorization to be true for fixed-width vectors?

david-arm added inline comments.Jun 22 2022, 7:00 AM
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
17

It's the uniform load of @globval, which causes LoopVectorizationCostModel::collectInstsToScalarize to insert the BB into PredicatedBBsAfterVectorization.

david-arm added inline comments.Jun 22 2022, 7:06 AM
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
9

I can try using other types of uniform load, so long as the load is treated as scalar with predication for a fixed-width VF.

david-arm updated this revision to Diff 439032.Jun 22 2022, 8:31 AM
  • Tidied up test and removed the globvar.
david-arm marked 2 inline comments as done.Jun 22 2022, 8:31 AM
sdesmalen added inline comments.Jun 23 2022, 2:46 AM
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
7

nit: Can this test be reduced a little bit more, I'm not sure if %boff, %coff and all of the pointers are really needed for this test.

17

Okay thanks, can you just add a comment to the test describing this? I didn't realise that the load from @globval had this effect.

david-arm updated this revision to Diff 440986.Jun 29 2022, 6:27 AM
  • Simplified test case.
david-arm marked 2 inline comments as done.Jun 29 2022, 6:27 AM
Matt added a subscriber: Matt.Jun 30 2022, 5:20 PM

Thanks for the changes @david-arm. I just have a few more comments on the test.

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
14–16

If it selects VF=vscale x 4 then it seems redundant to check there is no invalid cost for that chosen VF.

If you want to check that it uses a scalable VF, you don't need the debug-output to tell you it chose that. You could remove the REQUIRES: asserts, the -debug option from the RUN line and instead check for <vscale x 4 x i32> in the resulting IR. What do you think?

Just a note that this test may become redundant in the future if SVE is used for fixed-width vectors and it can lower fixed-width masked load/store operations to use SVE. Maybe you can add a note here that this assumption is important for the test and may change in the future?

david-arm updated this revision to Diff 442207.Jul 5 2022, 1:58 AM
  • Removed checks for debug output.
david-arm marked an inline comment as done.Jul 5 2022, 1:58 AM
david-arm updated this revision to Diff 443902.Jul 12 2022, 3:26 AM
  • Updated comment in the test to be more readable!
sdesmalen accepted this revision.Jul 12 2022, 6:04 AM

Thanks @david-arm, LGTM.

This revision is now accepted and ready to land.Jul 12 2022, 6:04 AM
This revision was landed with ongoing or failed builds.Jul 12 2022, 7:24 AM
This revision was automatically updated to reflect the committed changes.