We are missing patterns to generate vector splats using LD1R.
A shufflevector with an all-zeros mask is a splat of lane 0:
  %lv2i32 = load <2 x i32>, ptr %P
  %B = shufflevector <2 x i32> %lv2i32, <2 x i32> undef, <2 x i32> zeroinitializer
for which we can generate an LD1R when the first operand is a load and the second is undef. This was inspired by the tests in:
llvm-project/llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll
for which we don't generate LD1Rs.
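
A reduced test case for the snippet above might look like the following sketch. The function name and the llc invocation are assumptions made for illustration, and the assembly in the trailing comment is the codegen being asked for, not output that was verified:

```llvm
; Hypothetical reduced test case; the function name is made up for illustration.
; Assumed invocation: llc -mtriple=aarch64 -o - splat.ll
define <2 x i32> @splat_lane0_of_load(ptr %P) {
  ; Load the whole vector, then broadcast lane 0 to every lane.
  %lv2i32 = load <2 x i32>, ptr %P
  %B = shufflevector <2 x i32> %lv2i32, <2 x i32> undef, <2 x i32> zeroinitializer
  ret <2 x i32> %B
}
; Desired codegen (an assumption about what the new pattern should produce):
;   ld1r { v0.2s }, [x0]
;   ret
```

Only lane 0 of the loaded vector is used, so a single-element load with replicate covers the whole sequence, provided the load has no other users.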