This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] More patterns to generate LD1R vector splats
Closed · Public

Authored by SjoerdMeijer on Feb 28 2023, 1:52 PM.

Details

Summary

We are missing patterns to generate vector splats using LD1R.
A shufflevector whose mask is all zeros is a vector splat of lane 0:

%lv2i32 = load <2 x i32>, ptr %P
%B = shufflevector <2 x i32> %lv2i32, <2 x i32> undef, <2 x i32> zeroinitializer

for which we can generate an LD1R when the shuffle's operands are a load and undef, since only lane 0 of the loaded vector is used. This was inspired by the tests in:

llvm-project/llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll

for which we don't generate LD1Rs.
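
To illustrate why the fold is valid, here is a minimal Python model of the semantics. It is only a sketch, not the patch itself: `shufflevector`, `load_vector`, and `ld1r` are hypothetical helper names that mimic the IR instruction, a plain vector load, and the load-one-element-and-replicate behaviour of LD1R. Memory is modelled as a flat list of elements.

```python
def shufflevector(v1, v2, mask):
    """Model of IR shufflevector: each mask index selects from v1 ++ v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


def load_vector(memory, addr, lanes):
    """Model of a full vector load: read `lanes` consecutive elements."""
    return memory[addr:addr + lanes]


def ld1r(memory, addr, lanes):
    """Model of LD1R: read ONE element and replicate it to every lane."""
    return [memory[addr]] * lanes


# Hypothetical memory contents, for illustration only.
memory = [7, 11, 13, 17]
lanes = 2

loaded = load_vector(memory, 0, lanes)
undef = [None] * lanes  # second operand; never selected by an all-zero mask

# An all-zero mask (zeroinitializer) picks lane 0 of the first operand
# for every result lane, i.e. a splat of the first loaded element.
splat = shufflevector(loaded, undef, [0] * lanes)

# The splat of the loaded vector equals a single-element replicating load.
assert splat == ld1r(memory, 0, lanes)
```

Because the all-zero mask only ever reads lane 0, the remaining lanes of the load are dead in this model, which is what lets a single-element replicating load stand in for the load plus shuffle.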

Event Timeline

SjoerdMeijer created this revision. Feb 28 2023, 1:52 PM
SjoerdMeijer requested review of this revision. Feb 28 2023, 1:52 PM
Herald added a project: Restricted Project. Feb 28 2023, 1:52 PM
dmgreen accepted this revision. Mar 1 2023, 1:53 AM

Sounds good. I'm a little surprised that when we have a vector load where only one lane is demanded, we don't change it into a scalar load.

Can you add this as a multi-use test case? Otherwise LGTM:

define <4 x i32> @shuffle2_multiuse(ptr %P) {
; CHECK-LABEL: shuffle2_multiuse:
; CHECK:       // %bb.0:
; CHECK-NEXT:    ldr q0, [x0]
; CHECK-NEXT:    dup v1.4s, v0.s[0]
; CHECK-NEXT:    dup v0.4s, v0.s[1]
; CHECK-NEXT:    add v0.4s, v1.4s, v0.4s
; CHECK-NEXT:    ret
  %lv2i32 = load <4 x i32>, ptr %P
  %B = shufflevector <4 x i32> %lv2i32, <4 x i32> undef, <4 x i32> zeroinitializer
  %C = shufflevector <4 x i32> %lv2i32, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  %D = add <4 x i32> %B, %C
  ret <4 x i32> %D
}
This revision is now accepted and ready to land. Mar 1 2023, 1:53 AM

Thanks, also for that case, which I will add; that's interesting indeed.

This revision was landed with ongoing or failed builds. Mar 1 2023, 2:48 AM
This revision was automatically updated to reflect the committed changes.