For example, an interleaved load (Factor = 4):
%wide.vec = load <8 x i16>, <8 x i16>* %ptr
%strided.vec = shufflevector <8 x i16> %wide.vec, <8 x i16> undef, <2 x i32> <i32 0, i32 4>
%v1 = uitofp <2 x i16> %strided.vec to <2 x double>
In the AArch64 backend, this can be transformed into a tbl1 intrinsic, which avoids the high-cost extract/insert sequences.
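
As a rough illustration of why one table lookup can stand in for the extract/insert sequence, here is a scalar C++ model of the idea. It is a sketch, not the backend's code: tbl1_model, strided_uitofp and the index mask are hypothetical names and values chosen for this example, and a little-endian byte layout is assumed. The only behaviour taken from the instruction is that a single-register TBL returns 0 for any index outside the 16-byte table.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of a single-register TBL: each index byte selects
// one byte of the 16-byte table; an out-of-range index (>= 16) yields 0.
std::array<uint8_t, 16> tbl1_model(const std::array<uint8_t, 16>& table,
                                   const std::array<uint8_t, 16>& indices) {
    std::array<uint8_t, 16> out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = indices[i] < 16 ? table[indices[i]] : 0;
    return out;
}

// Models the IR above: take elements 0 and 4 of an <8 x i16> vector and
// convert them to <2 x double>. Assumes a little-endian host.
std::array<double, 2> strided_uitofp(const std::array<uint16_t, 8>& wide) {
    std::array<uint8_t, 16> bytes;
    std::memcpy(bytes.data(), wide.data(), sizeof(bytes));

    // Illustrative mask (an assumption, not necessarily what the backend
    // emits): bytes 0-1 are element 0, bytes 8-9 are element 4, and 255 is
    // out of range, so it zero-fills the upper bytes of each 64-bit lane.
    const std::array<uint8_t, 16> mask = {0, 1, 255, 255, 255, 255, 255, 255,
                                          8, 9, 255, 255, 255, 255, 255, 255};

    const std::array<uint8_t, 16> shuffled = tbl1_model(bytes, mask);

    // Each 64-bit lane now holds one zero-extended i16 value.
    std::array<uint64_t, 2> lanes;
    std::memcpy(lanes.data(), shuffled.data(), sizeof(lanes));
    return {static_cast<double>(lanes[0]), static_cast<double>(lanes[1])};
}
```

In this model the shuffle and the zero-extension happen in one lookup, so only the integer-to-double conversion remains afterwards.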
The change is also reflected in the InterleavedMemoryOpCost calculation, which the loop vectorizer uses when deciding
whether to vectorize a loop.
This change gives SPEC2017 538.imagick_r an 11.5% performance boost.
Tested using:
%llvm/build/bin/llvm-lit ../../test/*
There are no regressions in these tests.
We also ran the whole SPEC2017 suite: all benchmarks pass, with no performance regression.
Since you're changing the function signature anyway, would it make sense to change this to a VectorType *VecTy? There's been chatter on other patches about making this jump in general, as many of the TTI calls expect a vector anyhow.