This should allow this pass or others to fold subsequent scalar ops together more easily, because they will see extractelement ops from a single vector rather than separate scalar loads of pieces of that vector.
Currently, this transform will make no overall difference to these most basic patterns because the backend (DAGCombiner) will narrow the loads back down to scalars via narrowExtractedVectorLoad().
For now, this produces no change for scalar integer loads because extracting an integer element is not free, so the widened form does not pay off. We will need to match larger patterns and/or adjust the cost equation to handle that case.
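The cost argument above can be sketched as a simple comparison. This is a minimal illustration under assumed, simplified costs: `HypotheticalCosts` and `isWideningProfitable()` are illustrative names, not VectorCombine code, and the costs are plain integers rather than queries to the target's cost model.

```cpp
#include <cassert>

// Hypothetical per-operation costs. In the real pass these would come from
// the target's cost model; the values used with this sketch are made up.
struct HypotheticalCosts {
  int ScalarLoad;
  int VectorLoad;
  int Extract; // cost of one extractelement; 0 means "free"
};

// Widening replaces one scalar load with a vector load plus an extract.
// Allow it only if the new sequence costs no more than the original load.
bool isWideningProfitable(const HypotheticalCosts &C) {
  assert(C.ScalarLoad >= 0 && C.VectorLoad >= 0 && C.Extract >= 0 &&
         "costs are expected to be non-negative");
  int OldCost = C.ScalarLoad;
  int NewCost = C.VectorLoad + C.Extract;
  return NewCost <= OldCost;
}
```

Under these made-up numbers, equal load costs with a free extract (`{1, 1, 0}`) break even and are accepted, while a unit-cost integer extract (`{1, 1, 1}`) makes the widened form strictly more expensive and is rejected.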
I'm not sure why we care whether the load is from a bitcast pointer.
Why are we using the bitcast's source type as the source of truth?
Are we trying to avoid introducing some cache issues?
I'd think we should instead brute-force each possible wider load type, checking its cost first and then isSafeToLoadUnconditionally().
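The suggested search can be sketched as a loop over candidate widths. This is a minimal model under stated assumptions: candidates are plain widths and integer costs rather than LLVM types and cost-model queries, and `isSafeToLoadModel()` is a hypothetical stand-in for `isSafeToLoadUnconditionally()` that only compares the width with a known count of dereferenceable bytes, ignoring alignment and everything else the real analysis considers.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical candidate wider load: a width in bytes plus the (made-up)
// cost of the vector load and extract at that width.
struct Candidate {
  unsigned WidthBytes;
  int Cost;
};

// Hypothetical stand-in for isSafeToLoadUnconditionally(): treat the wider
// load as safe only if that many bytes are known to be dereferenceable.
bool isSafeToLoadModel(unsigned WidthBytes, unsigned DerefBytes) {
  return WidthBytes <= DerefBytes;
}

// Brute-force search: reject each candidate on cost first (cheap), and only
// then on safety (potentially expensive in the real analysis). Returns the
// cheapest safe candidate; on ties, the earliest one in the list wins.
std::optional<Candidate> pickWiderLoad(const std::vector<Candidate> &Candidates,
                                       int ScalarCost, unsigned DerefBytes) {
  std::optional<Candidate> Best;
  for (const Candidate &C : Candidates) {
    assert(C.WidthBytes > 0 && "candidate width must be non-zero");
    if (C.Cost > ScalarCost)
      continue; // not profitable compared with the scalar load
    if (Best && C.Cost >= Best->Cost)
      continue; // no cheaper than the best safe candidate so far
    if (!isSafeToLoadModel(C.WidthBytes, DerefBytes))
      continue; // the wider load might fault
    Best = C;
  }
  return Best;
}
```

For example, with candidates of 16, 32 and 64 bytes costing 1, 1 and 3, a scalar cost of 1, and 16 known-dereferenceable bytes, this picks the 16-byte load; with only 8 known-dereferenceable bytes it returns no candidate. Ordering the checks this way means the safety query only runs for candidates that would actually be worth using.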