Instead of using the pointer element type, look at how the pointer is actually being used in store instructions, while looking through bitcasts. This makes the transform compatible with opaque pointers and a bit more general.
It's worth noting that I have dropped the 3-vector to 4-vector shufflevector special case, because this is now handled in a different way: If the value is actually used as a 4-vector, then we're directly going to use that type, instead of shuffling to a 3-vector in between.