When building the vectorization graph, the compiler may end up with the
consecutive loads in the different branches, which end up to be
gathered. We can scan these loads and try to load them as final
vectorized load and then reshuffle between the branches to avoid extra
scalar loads in the code.
Part of D57059
clang-format: please reformat the code