This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Fix legalisation of floating-point masked gathers
ClosedPublic

Authored by kmclaughlin on Jan 6 2021, 5:32 AM.

Details

Summary

Changes in this patch:

  • When lowering floating-point masked gathers, cast the result of the gather back to the original type with reinterpret_cast before returning.
  • Added patterns for reinterpret_casts from integer to floating point, and concat_vector patterns for bfloat16.
  • Tests for various legalisation scenarios with floating point types.

Diff Detail

Event Timeline

kmclaughlin created this revision.Jan 6 2021, 5:32 AM
kmclaughlin requested review of this revision.Jan 6 2021, 5:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2021, 5:32 AM
david-arm added inline comments.Jan 6 2021, 6:13 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3981

Is it worth having a helper function here, something like "isVectorUnpack(bool Signed)"? The reason I mention this is that there are two other places in the codebase where we also check if an opcode is "AArch64ISD::UUNPKLO || IdxOp == AArch64ISD::UUNPKHI".

llvm/test/CodeGen/AArch64/sve-masked-gather-legalize.ll
74

Is it worth having tests that load <vscale x 4 x half> as well for both the ptrs and base+offset case?

kmclaughlin marked 2 inline comments as done.
kmclaughlin added a subscriber: paulwalker-arm.
  • Added a new helper function, isVectorUnpack
  • Added tests which load <vscale x 4 x half> & <vscale x 2 x float>
david-arm accepted this revision.Jan 7 2021, 5:11 AM

LGTM! Thanks for making the changes.

This revision is now accepted and ready to land.Jan 7 2021, 5:11 AM

Removed the isVectorUnpack helper added in the previous revision. If the index values are already extended to i64 by an unpkhi/lo, then the gather does not also need to extend the index.
This affects the masked_gather_nxv4f64 test, which has been updated as follows:

sunpklo z1.d, z0.s
sunpkhi z2.d, z0.s
ld1d { z0.d }, p1/z, [x0, z1.d, sxtw #3]
ld1d { z1.d }, p0/z, [x0, z2.d, sxtw #3]

->

sunpklo z1.d, z0.s
sunpkhi z2.d, z0.s
ld1d { z0.d }, p1/z, [x0, z1.d, lsl #3]
ld1d { z1.d }, p0/z, [x0, z2.d, lsl #3]
david-arm accepted this revision.Jan 8 2021, 7:19 AM

LGTM! I'd blame the reviewer for the bug in the previous patch. :)

sdesmalen accepted this revision.Jan 8 2021, 8:13 AM