AMDGPUCodeGenPrepare widens some loads, which then prevents
vectorization of an otherwise vectorizable load pair.
Run the vectorizer pass before AMDGPUCodeGenPrepare to catch
this opportunity. The second run stays in place because the passes
in between also create many vectorization opportunities.
This pass is already problematic for compile time and hits quadratic behavior, so I'm not thrilled about running it twice. Why can't we just run it once?