This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][AArch64][SVE] Use ld1r[bhsd] for vector splat from memory
ClosedPublic

Authored by peterwaller-arm on May 26 2021, 8:20 AM.

Details

Summary

This avoids the use of the vector unit for copying from scalar to
vector. There is an extra ptrue instruction, but a predicate register
with the ptrue pattern populated is likely to be free in the context of
real code.

Tests were generated from a template to cover the axes mentioned at the
top of the test file.

Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com>

Diff Detail

Event Timeline

peterwaller-arm requested review of this revision.May 26 2021, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2021, 8:20 AM
  • Fix nxv2f64: uimm6s{4 => 8}
  • Protect against values living on the stack. Intend to introduce this later.

Please note I'll be on leave for the next week, returning June 7th.

I'll take a more detailed look later but structurally it looks fine to me. I've added a comment that hopefully means we don't need the frame index limitation.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
135

Rather than this I think you just need to add entries for the LD1R instructions to functions like getLoadStoreImmIdx and getMemOpInfo.

Maybe add a testcase for splat of a constant? Not sure what the code generation should look like, but it would be good to have coverage.

llvm/test/CodeGen/AArch64/sve-ld1r.ll
301

Is ldr+mov really better than add+ld1rw? (I think it's coming out this way because SelectAddrModeIndexedUImm bails instead of forcing the offset to zero.)

peterwaller-arm edited the summary of this revision. (Show Details)

Apologies for delayed response. Should be much faster on further updates.

  • Address most review comments.
    • Fix the issue of stack objects.
    • Force offset to zero in SelectAddrModeIndexedUImm.
  • Add a mir test for frame lowering, which crashes on main and passes here.
  • Drop addition of azextload and use more specific patterns instead. I couldn't find a sense in which they were necessary.

Maybe add a testcase for splat of a constant? Not sure what the code generation should look like, but it would be good to have coverage.

@efriedma I wasn't sure why you might be after for splat-of-constant, since this is about reading from memory. Are you thinking of splat-of-constant where the constant is out of range for an immediate? Something like https://github.com/llvm/llvm-project/blob/0596f7d828436e7db85154f2815eb3ff32d505af/llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll ? Or something else? Thanks for your review!

peterwaller-arm marked 2 inline comments as done.Jul 1 2021, 5:04 AM

Maybe add a testcase for splat of a constant? Not sure what the code generation should look like, but it would be good to have coverage.

@efriedma I wasn't sure why you might be after for splat-of-constant, since this is about reading from memory. Are you thinking of splat-of-constant where the constant is out of range for an immediate? Something like https://github.com/llvm/llvm-project/blob/0596f7d828436e7db85154f2815eb3ff32d505af/llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll ? Or something else? Thanks for your review!

Yes, an splat with an out-of-range immediate. We generate a load for some floating-point constants. For example:

define <vscale x 2 x double> @z() {
  %1 = insertelement <vscale x 2 x double> undef, double 3.33, i32 0
  %2 = shufflevector <vscale x 2 x double> %1, <vscale x 2 x double> undef, <vscale x 2 x i32> zeroinitializer
  ret <vscale x 2 x double> %2
}

Currently lowers to:

adrp    x8, .LCPI0_0
ldr     d0, [x8, :lo12:.LCPI0_0]
mov     z0.d, d0
ret
  • Add test for splat-of-float-constant which is not representable as an immediate.
  • Run update_llc_test_checks on sve-vector-splat.ll
  • Address code formater complaints.

Thanks again Eli. The code change is apparent in splat_nxv2f64_imm_out_of_range. I added float tests for good measure.

efriedma accepted this revision.Jul 5 2021, 5:45 PM

LGTM

This revision is now accepted and ready to land.Jul 5 2021, 5:45 PM
  • Rebase, update tests.
This revision was landed with ongoing or failed builds.Jul 6 2021, 5:05 AM
This revision was automatically updated to reflect the committed changes.