The existing LD1 patterns do not cover cases where result type does
not match the memory type. This happens when illegal vector types are
extended and scalarized, for example:
load <2 x i16>* %v2i16
is lowered into:
// first element (v4i32 (insert_subvector (v2i32 (scalar_to_vector (load anyext from i16))))) // other elements (v4i32 (insert_vector_elt (i32 (load anyext from i16)) idx))
Before this patch these patterns were compiled into LDR + INS.
Now they are compiled into LD1.
There is a separate patch to combine these sequences with an ADD
into an LD1_POST: https://reviews.llvm.org/D102939
The problem was reported in
PR24820: LLVM Generates abysmal code in simple situation.