In complex cases, the address optimization for gathers and scatters introduced by would push instructions not only to the vector loop preheader, but to other locations as well. This could scramble their order and cause the compilation as a whole to fail.
This patch fixes this by ensuring that said instructions are always pushed to the end of the vector loop preheader, and adds a test that would fail without the patch.
