On AArch64, we can store 128-bit-wide values using the q registers.
GlobalISel was instead clamping the number of elements in vector stores so that each store was at most 64 bits wide.
This results in poor codegen in cases like the following:
  ; SDAG uses stp with q registers in both cases here.
  define void @float(<16 x float> %val, <16 x float>* %ptr) {
    store <16 x float> %val, <16 x float>* %ptr
    ret void
  }

  define void @double(<8 x double> %val, <8 x double>* %ptr) {
    store <8 x double> %val, <8 x double>* %ptr
    ret void
  }
This also adds similar legalization (clamping to 128 bits) for vector stores with s8 and s16 elements.
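With a 128-bit limit, a single store can hold at most 16 s8 elements or 8 s16 elements. The functions below are hypothetical test inputs in the same style as the examples above, not taken from the patch; they exercise the s8 and s16 paths with 256-bit vectors, which, assuming the rule clamps to 128 bits, would each be split into two 128-bit stores rather than four 64-bit ones.

```llvm
; Hypothetical inputs (not from the patch): 256-bit vector stores
; with s8 and s16 elements.
define void @store_v32i8(<32 x i8> %val, <32 x i8>* %ptr) {
  store <32 x i8> %val, <32 x i8>* %ptr
  ret void
}

define void @store_v16i16(<16 x i16> %val, <16 x i16>* %ptr) {
  store <16 x i16> %val, <16 x i16>* %ptr
  ret void
}
```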