This is an archive of the discontinued LLVM Phabricator instance.

[LV] Replace fixed-order cost model with a SK_Splice shuffle
ClosedPublic

Authored by dmgreen on Aug 20 2022, 11:38 AM.

Details

Summary

The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, taking two vectors are extracting a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice, and passes a correct Mask through to the getShuffleCost call.

I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them.
https://godbolt.org/z/r8j69Kz9z

Diff Detail

Event Timeline

dmgreen created this revision.Aug 20 2022, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 20 2022, 11:38 AM
dmgreen requested review of this revision.Aug 20 2022, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 20 2022, 11:38 AM

existing cost model can drastically under-estimate the correct cost for the shuffle …. and some targets may require additions to the cost-model to correctly account for them.

Hm, sounds like your fix improves some targets and at the same time possibly regresses others?

existing cost model can drastically under-estimate the correct cost for the shuffle …. and some targets may require additions to the cost-model to correctly account for them.

Hm, sounds like your fix improves some targets and at the same time possibly regresses others?

For now, please can you add a SK_Splice -> SK_PermuteTwoSrc remapping to targets with getShuffleCost that don't recognise SK_Splice?

Hm, sounds like your fix improves some targets and at the same time possibly regresses others?

From looking through the backends that have any tests for "shufflevector", with the codegen from https://godbolt.org/z/r8j69Kz9z - it is Power, SystemZ and X86 that can produce a single instruction but do not have any costmodel for SK_Splice (AArch64 and RISCV have some costs). I've tried to add reviews from each of those architectures. Unfortunately the regressions under MVE were really large, like 250% in some cases.

For now, please can you add a SK_Splice -> SK_PermuteTwoSrc remapping to targets with getShuffleCost that don't recognise SK_Splice?

I believe that a SK_Splice will currently be costed as a SK_PermuteTwoSrc via https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/BasicTTIImpl.h#L946. This will be high for targets with an Splice/Ext/vpalignr/etc instruction, but will be more accurate on any architecture without it.

Looking again at the cost models, both Power and SystemZ seems to treat all shuffles as cheap as they support cheap arbitrary permutations. That just leaves X86, and whilst I would offer to try and adjust the costmodel I don't think I understand the many variations of the architecture enough to do it well. @RKSimon do you want me to add an if (Kind == TTI::SK_Splice) Kind = TTI::SK_PermuteTwoSrc; to X86TTIImpl::getShuffleCost, or are you happier to do it properly yourself?

I can add it to X86 now if you're happy that we don't have test coverage yet. Otherwise it'll have to wait until I can add isSpliceMask() detection for getInstructionCost

The AArch64 backend is tested with the llvm.experimental.vector.splice intrinsics, but yes we should add isSpliceMask detection eventually too.

Using splice seems more accurate here, thanks! It would be good to update at least X86 before landing this.

Also cc'ing @power-llvm-team and @jonpa for the PPC & SystemZ costs respectively.

llvm/test/Transforms/LoopVectorize/ARM/mve-recurrence.ll
10

It would probably be good to have the same test for X86 and AArch64 at least. THere's seems no cost-model test for recurrences on X86 at all, and on AArch64 we only have first-order recurrences.

I can add it to X86 now if you're happy that we don't have test coverage yet. Otherwise it'll have to wait until I can add isSpliceMask() detection for getInstructionCost

I have a candidate patch for isSpliceMask detection almost ready for this

I can add it to X86 now if you're happy that we don't have test coverage yet. Otherwise it'll have to wait until I can add isSpliceMask() detection for getInstructionCost

I have a candidate patch for isSpliceMask detection almost ready for this

D132374

dmgreen updated this revision to Diff 454711.Aug 23 2022, 12:00 AM

Added AArch64 and X86 tests, neither of which change with the updated costs from D132299 and D132374.

Also cc'ing @power-llvm-team and @jonpa for the PPC & SystemZ costs respectively.

I believe that power and systemz both treat all shuffles as cheap, so should not see a difference.

RKSimon added inline comments.Aug 23 2022, 1:54 AM
llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll
7 ↗(On Diff #454711)

Is there any change with/without this patch (and/or D132374)?

dmgreen added inline comments.Aug 23 2022, 1:58 AM
llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll
7 ↗(On Diff #454711)

For firstorderrec there is no difference.
For thirdorderrec below:

  • With D132374 there is no change.
  • Without D132374 the interleaving factor is lower, as the cost of the loop is higher. The VF is still the same.

This patch currently assumes that D132374 goes in first.

please can you commit the new tests and rebase to show the codegen changes?

dmgreen updated this revision to Diff 455090.Aug 24 2022, 12:30 AM

Rebase over the tests added in rGe29f9f7572d7 (and the updates to the cost models). As a result - no changes in the new fixed-order-recurrence.ll tests.

fhahn accepted this revision.Aug 24 2022, 12:49 AM

LGTM, thanks!

This revision is now accepted and ready to land.Aug 24 2022, 12:49 AM
RKSimon accepted this revision.Aug 24 2022, 1:35 AM

LGTM - cheers!

This revision was landed with ongoing or failed builds.Aug 24 2022, 5:00 AM
This revision was automatically updated to reflect the committed changes.