This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Teach cost model that masked loads/stores are cheap
ClosedPublic

Authored by MattDevereau on Aug 6 2021, 8:19 AM.

Details

Summary

Reduce the cost of VLS loads/stores to make the vectorizor emit them more frequently.

Diff Detail

Event Timeline

MattDevereau created this revision.Aug 6 2021, 8:19 AM
MattDevereau requested review of this revision.Aug 6 2021, 8:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2021, 8:19 AM
MattDevereau updated this revision to Diff 364808.EditedAug 6 2021, 8:35 AM

updated the diff to show more context

bsmith added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1519

What's the rationale behind this cost for fixed types?

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
2 ↗(On Diff #364808)

This test is for scalable types, hence the changes in here are testing the cost of scalable masked loads/stores rather than fixed ones. You'll need to add functions (probably in another test file) that use the vscale_range attribute to specify various different fixed vector lengths)

Matt added a subscriber: Matt.Aug 6 2021, 8:59 AM
david-arm added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1505

I don't think we should be removing this because it's needed when we don't know the SVE vector length, in which case we will have to scalarise the masked ops. I think you can add an extra check here to see if we're using SVE for fixed width vectors.

1519

Is this due to the extra predicate we have to create? If so, that's also true for SVE since we'll need a ptrue and it's highly likely to get reused anyway or hoisted out of a loop.

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
6 ↗(On Diff #364808)

Again, I think these changes look wrong - they should be high because we are going to scalarise the operations, since we haven't specified the SVE vector length so we can't use SVE or NEON.

MattDevereau added inline comments.Aug 6 2021, 9:08 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1519

A cost of 3*vscale was suggested as a suitable cost for VLS compared to the current VLA cost of 2*vscale. Rather than edit TLI->getTypeLegalizationCost(DL, Src); directly, this seemed like the simplest way of achieving it.

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
2 ↗(On Diff #364808)

What is the difference between this test called "fixed" and the test called "scalable" further down then?

paulwalker-arm added inline comments.Aug 6 2021, 9:11 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1505

Perhaps just add something like && !ST->useSVEForFixedLengthVectors()

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
2 ↗(On Diff #364808)

@bsmith Can this not be done using multiple RUN lines instead? I'm hoping that a single set of CHECK lines can be used by utilising FileChecks math functions.

bsmith added inline comments.Aug 6 2021, 9:41 AM
llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
2 ↗(On Diff #364808)

That's probably a better approach yes, to avoid duplicating things.

2 ↗(On Diff #364808)

The 'fixed' test is testing the fixed LLVM IR types when using scalable codegen (i.e. fixed types are treated as Neon types, not SVE), the 'scalable' test is testing scalable types using scalable codegen. You need a test that checks fixed types using fixed codegen (of various sizes).

That is to say, as per Paul's approach, you need some additional run lines that specify -aarch64-sve-vector-bits-min=<value>, which result in different costs for the the masked load/stores depending on the value of <value>

Changed the cost model by keeping the scalarised NEON costs for 128bit width vectors, but use the SVE costs for larger VLS sizes. Added a new regression test to assert the cost-model estimates depending on VLS width

paulwalker-arm added inline comments.Aug 16 2021, 3:36 AM
llvm/test/Analysis/CostModel/AArch64/masked_ldst_vls.ll
2

This should be removed as I imagine it's not true and you just inherited it from a file you copied.

3

Rather than having -mtriple=aarch64-linux-gnu -mattr=+sve on ever RUN line can you use LLVM IR directly. For example:

target triple = "aarch64-unknown-linux-gnu"

define void @fixed-sve-vls() #0 {
....
}

attributes #0 = { "target-features"="+sve" }

It just means you need to know less of the details when running the test manually.

30

Storing vectors of i1 is a thorny issue so I think it best to not bother testing them just yet.

Removed i1 vector from regression test, added some LLVM IR syntax cleanup to regression test

Removed -mattr=sve from RUN lines in regression test

llvm/test/Analysis/CostModel/AArch64/masked_ldst_vls.ll
20

Minor nit: functions with dashes in the name are quite rare. I'm a little surprised it's allowed! Out of ~240k function definitions in the LLVM test suite, only 28 of them have dashes, and ~187k have underscores, so I'd go with the majority here.

renamed fixed-sve-vl to fixed_sve_vl, use useNeonVector() function instead of verbose if statement

paulwalker-arm accepted this revision.Aug 18 2021, 4:51 AM
This revision is now accepted and ready to land.Aug 18 2021, 4:51 AM