This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve codegen for get.active.lane.mask when SVE is available
ClosedPublic

Authored by david-arm on Jan 5 2022, 7:52 AM.

Details

Summary

When lowering the get.active.lane.mask intrinsic with a fixed-width
predicate vector result, we can actually make use of the SVE whilelo
instruction when SVE is enabled. We do this by carefully choosing
a sensible VT for the whilelo instruction, then promoting it to an
integer vector, i.e. nxv16i1 -> nx16i8. We can then extract a v16i8
subvector and truncate back to the original return type, i.e. v16i1.
This leads to a significant improvement in code quality.

Diff Detail

Event Timeline

david-arm created this revision.Jan 5 2022, 7:52 AM
david-arm requested review of this revision.Jan 5 2022, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2022, 7:52 AM

I'm not sure the testcases actually illustrate the cases we care about. Generally, I would expect the result of llvm.get.active.lane.mask() to be used in a select instruction, or a masked load, or something like that. And in that case, I'm not sure the way you're choosing the VT is appropriate; the instruction using the mask is probably not going to expect a 64-bit vector.

Matt added a subscriber: Matt.Jan 7 2022, 7:30 AM

I'm not sure the testcases actually illustrate the cases we care about. Generally, I would expect the result of llvm.get.active.lane.mask() to be used in a select instruction, or a masked load, or something like that. And in that case, I'm not sure the way you're choosing the VT is appropriate; the instruction using the mask is probably not going to expect a 64-bit vector.

Hi @efriedma, I think these testcases are still useful by themselves because they are succint and make it easy to see how one IR instruction maps to assembly. At the moment if I add more complex test cases involving a select, for example, the code quality ends up being awful regardless of what promoted VT I choose. I think there is a still a codegen issue somewhere because I see loads of pointless lane moves whenever I add something like a select. So for now, I'd like to leave the tests as they are.

However, I do take your point about trying to second guess how the masks are going to be used, and perhaps I can make the choice of promoted VT simpler for now, and leave the xtn instructions in.

Okay, sounds good.

david-arm updated this revision to Diff 399284.Jan 12 2022, 3:55 AM
david-arm edited the summary of this revision. (Show Details)
  • Removed code to optimise VT for NEON, since we don't know how the result is actually going to be used.
sdesmalen accepted this revision.Jan 12 2022, 9:26 AM

Nice improvement!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15095

nit: avoid indentation by doing if (!VT.isFixedLengthVector()) return SDValue(); ?

15121

nit: add comment 'truncate v4i32 -> v4i1`

This revision is now accepted and ready to land.Jan 12 2022, 9:26 AM
dmgreen added inline comments.Jan 13 2022, 12:33 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15095

It looks like it is worth adding an assert that SVE is available too.

This revision was landed with ongoing or failed builds.Feb 10 2022, 8:02 AM
This revision was automatically updated to reflect the committed changes.