This is D64142 moved from SLP to VectorCombine. That patch stalled because it didn't fit neatly into the SLP mold, but this matches the template we've used for other VectorCombine transforms.
The general idea is that if we have a legal vector pointer type but bitcast that pointer to load only a subset of the vector, we instead load the whole vector (if that is safe) and extract the subset from it.
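As a rough illustration of why the rewrite preserves semantics, here is a minimal Python model of the basic case. It treats memory as a byte buffer holding a `<4 x float>`. The bitcast-and-load pattern reads the first float, and the widened form reads all four lanes and extracts lane 0. The `dereferenceable_bytes` parameter is a hypothetical stand-in for whatever safety proof the pass uses; it is not part of any LLVM API.

```python
import struct

VEC_LANES = 4   # <4 x float>
ELT_BYTES = 4   # sizeof(float)
VEC_BYTES = VEC_LANES * ELT_BYTES

def load_scalar_f32(mem: bytes) -> float:
    """Original pattern: bitcast the <4 x float>* to float* and load one float."""
    return struct.unpack_from("<f", mem, 0)[0]

def load_vector_f32x4(mem: bytes) -> tuple:
    """Widened load: read all four lanes of the vector at once."""
    return struct.unpack_from("<4f", mem, 0)

def widened_load(mem: bytes, dereferenceable_bytes: int):
    """Load the whole vector and extract lane 0, or return None if widening is unsafe.

    `dereferenceable_bytes` is a hypothetical stand-in for the pass's proof
    that the full vector can be read without faulting.
    """
    if dereferenceable_bytes < VEC_BYTES:
        return None  # not provably safe: keep the original scalar load
    return load_vector_f32x4(mem)[0]  # models extractelement <4 x float>, 0
```

For a buffer built with `struct.pack("<4f", 1.5, 2.0, 3.25, -4.0)`, `widened_load(mem, 16)` yields the same value as `load_scalar_f32(mem)`, while `widened_load(mem, 8)` declines to widen because only half of the vector is known to be readable.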
That should make it easier for this pass or others to fold subsequent scalar ops together, because they will see extractelement ops from a single vector rather than separate loads of pieces of that vector.
Currently, this transform makes no net difference for the most basic patterns because the backend (DAGCombiner) narrows the loads back down to scalars via narrowExtractedVectorLoad().
For now, we do not see any changes for scalar integer loads because those extracts are not free. To handle them, we will need to match larger patterns and/or adjust the cost equation.
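The cost argument can be sketched as a simple comparison: widen only if a vector load plus an extract costs no more than the original scalar load. The numbers below are hypothetical placeholders, not real target cost-model values. They assume a target where lane 0 of an FP vector already lives in a scalar FP register (so the extract is free), while moving an integer lane to a general-purpose register costs an instruction. Ties are assumed to favor the transform.

```python
# Hypothetical per-operation costs, for illustration only (not real target numbers).
COSTS = {
    "scalar_load": 1,
    "vector_load": 1,
    "extract_lane0_fp": 0,   # assumed free: lane 0 is already usable as a scalar FP value
    "extract_lane0_int": 1,  # assumed to need a vector-to-GPR move
}

def should_widen(elt_kind: str) -> bool:
    """Return True if loading the full vector and extracting lane 0 is no more expensive.

    `elt_kind` is "fp" or "int" (hypothetical classification for this sketch).
    """
    old_cost = COSTS["scalar_load"]
    new_cost = COSTS["vector_load"] + COSTS[f"extract_lane0_{elt_kind}"]
    return new_cost <= old_cost  # ties favor the transform in this sketch
```

Under these assumed costs, `should_widen("fp")` is true and `should_widen("int")` is false. This mirrors the observation that integer loads are unchanged until larger patterns or a revised cost equation can amortize the extract.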