This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add cost model for llvm.get.active.lane.mask intrinsic
ClosedPublic

Authored by david-arm on Mar 7 2022, 5:06 AM.

Details

Summary

The vectoriser sometimes generates predicated vector loops using
the llvm.get.active.lane.mask intrinsic so it's important that we
are able to calculate a valid cost for the call instruction. When
SVE is enabled we are able to use a single whilelo instruction
for some vector types - in such cases I've marked the cost as 1.
For all other cases I've set the cost to some reasonably high
value.

Tests added here:

Analysis/CostModel/AArch64/sve-intrinsics.ll
Analysis/CostModel/ARM/active_lane_mask.ll
Analysis/CostModel/RISCV/active_lane_mask.ll

Diff Detail

Event Timeline

david-arm created this revision.Mar 7 2022, 5:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 5:06 AM
david-arm requested review of this revision.Mar 7 2022, 5:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 5:06 AM
sdesmalen added inline comments.Mar 7 2022, 5:46 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
234 ↗(On Diff #413435)

Can the default case (i.e. when shouldExpandGetActiveLaneMask = true) be added to BasicTTIImpl?
Then in the target's TTI, you only need to handle the case where the intrinsic is not expanded.

david-arm updated this revision to Diff 413442.Mar 7 2022, 6:21 AM
  • Moved the cost for the intrinsic expansion case into BasicTTIImpl.h
david-arm marked an inline comment as done.Mar 7 2022, 6:21 AM
dmgreen added inline comments.Mar 7 2022, 10:16 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
1519

The active_lane_mask has a default expansion. Can this add the costs expected from that expansion? Like the above fshr does.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
234 ↗(On Diff #413435)

Does this need to be here? Or could it be part of the target independent code - assuming the cost is 1 if it is not expanded?

david-arm added inline comments.Mar 8 2022, 1:09 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
234 ↗(On Diff #413435)

That's a good idea. One thing that makes me uncertain of that is that I can't call the usual TLI->getTypeLegalizationCost function to get the value type because for NEON the legalised type is transformed from v4i1 -> v4i32, etc. This then changes the behaviour of shouldExpandGetActiveLaneMask. That's why I only call TLI->getValueType here. If there is precedence for doing this in BasicTTIImpl then I'm happy to move it there.

dmgreen added inline comments.Mar 8 2022, 5:00 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
234 ↗(On Diff #413435)

I think that sounds fine, if it makes more sense to do that. Some of the existing places already do so - like the cost of memory ops.

david-arm updated this revision to Diff 413783.Mar 8 2022, 6:09 AM
  • Moved all of the cost calculation into BasicTTIImpl.h and tried to estimate the cost more formally for the case where we expand the intrinsic. The expansion costs look way over the top to be honest, but I guess we can refine those later if necessary.
david-arm marked 3 inline comments as done.Mar 8 2022, 6:09 AM
sdesmalen added inline comments.Mar 8 2022, 6:34 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
1522

Should this code concern itself with legalization costs too?

david-arm updated this revision to Diff 414321.Mar 10 2022, 2:52 AM
david-arm edited the summary of this revision. (Show Details)
  • Added cost model tests for ARM and RISCV too. Some of the costs for RISCV with scalable vector types are showing as Invalid, but I think this is probably due to something attempting to scalarise the saturating add, etc.
sdesmalen accepted this revision.Mar 11 2022, 1:29 AM
This revision is now accepted and ready to land.Mar 11 2022, 1:29 AM
This revision was landed with ongoing or failed builds.Mar 14 2022, 2:35 AM
This revision was automatically updated to reflect the committed changes.