On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector.
This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast. Currently the lowering only works when we are splatting the zeroth element; this patch generalises it to any element.
Fix for PR23022
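To make the intended equivalence concrete, here is a minimal, portable C++ sketch of the two forms. It only models the semantics with std::array and is not the actual lowering code; the names Vec8f, kLanes, splat_via_shuffle and splat_via_broadcast are hypothetical and chosen purely for illustration, as is the assumption of eight float lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption for illustration: eight float lanes, as in a 256-bit vector.
constexpr std::size_t kLanes = 8;
using Vec8f = std::array<float, kLanes>;

// Unfolded form: load the full vector, then splat lane `idx` into every lane.
Vec8f splat_via_shuffle(const float* base, std::size_t idx) {
    assert(idx < kLanes);
    Vec8f loaded{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        loaded[i] = base[i];  // full-width load touches all eight elements
    }
    Vec8f out{};
    out.fill(loaded[idx]);    // shuffle mask <idx, idx, ..., idx>
    return out;
}

// Folded form: a single scalar load from base + idx, broadcast to every lane.
Vec8f splat_via_broadcast(const float* base, std::size_t idx) {
    assert(idx < kLanes);
    const float scalar = base[idx];  // narrower load: only one element is read
    Vec8f out{};
    out.fill(scalar);
    return out;
}
```

Both functions return the same vector for any in-range idx, but the folded form reads only one element of memory. That narrowing of the memory access is what the generalised lowering relies on.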
I don't know if this is even possible in practice, but do we need to guard against volatile loads, since the transform shrinks the size of the load?
MayFoldLoad() calls isNormalLoad(), but neither of those checks volatility, so if this fires on a volatile load, it looks like we get both the original load and a splat load: