https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That raises cost-model questions,
since doing so requires creating wide shifts.
Indeed, our legalization of them is not optimal:
we either split the shift into parts, or lower it to a libcall.
But if the shift amount is a multiple of CHAR_BIT,
we can also legalize it through the stack.
The basic idea is very simple:
- Get a stack slot 2x the width of the shift type
- store the value we are shifting into one half of the slot
- pad the other half of the slot: with zeros for logical shifts, with the sign bit for arithmetic shifts
- index into the slot (starting from the base half into which we spilled, either upwards or downwards)
- load
- split the loaded integer
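The steps above can be sketched in plain C++ on a 64-bit value. This is only an illustration, not the code from the patch: `shiftByBytesViaStack` and `ShiftKind` are hypothetical names, the sketch assumes a little-endian host with CHAR_BIT == 8, and it omits the final "split" step because the result here fits in one register.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical names, for illustration only.
enum class ShiftKind { Shl, LShr, AShr };

// Shift Val by ByteAmt whole bytes by going through a stack slot.
// Assumes a little-endian host and CHAR_BIT == 8.
uint64_t shiftByBytesViaStack(uint64_t Val, unsigned ByteAmt, ShiftKind Kind) {
  constexpr unsigned Width = sizeof(uint64_t);
  assert(ByteAmt <= Width && "offset must stay inside the slot");

  // Slot twice as wide as the shifted type.
  unsigned char Slot[2 * Width];

  // Padding: sign bit for arithmetic shifts, zero otherwise.
  bool Negative = (Val >> 63) != 0;
  unsigned char Pad = (Kind == ShiftKind::AShr && Negative) ? 0xFF : 0x00;
  std::memset(Slot, Pad, sizeof(Slot));

  unsigned Offset;
  if (Kind == ShiftKind::Shl) {
    // Value goes into the upper half; index downwards into the zero padding.
    std::memcpy(Slot + Width, &Val, Width);
    Offset = Width - ByteAmt;
  } else {
    // Value goes into the lower half; index upwards into the padding.
    std::memcpy(Slot, &Val, Width);
    Offset = ByteAmt;
  }

  // Load a value of the original width from the computed offset.
  uint64_t Result;
  std::memcpy(&Result, Slot + Offset, Width);
  return Result;
}
```

On a big-endian host the two halves swap roles, which is why the real lowering has to pick the base half and the indexing direction per target.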
This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5
And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K
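A minimal sketch of that two-step form for a logical right shift, again on a 64-bit value: the byte part of the amount becomes the offset into the slot, and the sub-byte remainder is applied as an ordinary shift afterwards. `lshrViaStack` is a hypothetical name, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>

// Hypothetical helper, for illustration only; assumes a little-endian host.
uint64_t lshrViaStack(uint64_t Val, unsigned BitAmt) {
  constexpr unsigned Width = sizeof(uint64_t);
  assert(BitAmt < Width * CHAR_BIT && "shift amount out of range");

  unsigned ByteOff = BitAmt / CHAR_BIT; // handled by indexing into the slot
  unsigned BitRem = BitAmt % CHAR_BIT;  // handled by a small shift afterwards

  // Value in the lower half, zero padding in the upper half.
  unsigned char Slot[2 * Width] = {};
  std::memcpy(Slot, &Val, Width);

  // Load from the byte offset, then shift by the remaining 0..CHAR_BIT-1 bits.
  uint64_t Loaded;
  std::memcpy(&Loaded, Slot + ByteOff, Width);
  return Loaded >> BitRem;
}
```

The remaining shift is by less than CHAR_BIT bits, so for a type wider than a register it only needs a single shift-by-parts step.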
I think that if we are going to perform the shift->shift-by-parts expansion more than once,
we should instead go through the stack, which is what this patch does.
Is this ByteVecVT only used to make clampDynamicVectorIndex work? It won't cause vector instructions to be generated from scalar code, will it?