The LoopVectorizer hoists some values out of the loop body to optimize
it. When the loop contains a gather/scatter, it hoists out parts of the
index, such as the step_vector.
However, when parts of a gather/scatter index (the step_vector and the index)
are hoisted out of the loop, the compiler cannot check whether the index can be
folded into 32 bits (in findMoreOptimalIndexType).
This patch sinks the step_vector and the index back into the loop body, so that
the compiler can check whether the gather/scatter index can be folded.
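
To illustrate why the check needs to see both parts of the index, here is a minimal standalone sketch. It assumes the check amounts to verifying that every lane offset `base + lane * stride` fits in a signed 32-bit integer; `indexFitsIn32Bits` and its parameters are hypothetical names, not LLVM's API, and the real logic in findMoreOptimalIndexType may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an LLVM function): returns true when every lane
// offset base + lane * stride, for lane in [0, maxLanes), fits in int32_t.
// Both base and stride must be known; if either were hidden behind a value
// defined outside the loop, this question could not be answered.
bool indexFitsIn32Bits(int64_t base, int64_t stride, int64_t maxLanes) {
  assert(maxLanes > 0 && "need at least one lane");
  const int64_t lo = std::numeric_limits<int32_t>::min();
  const int64_t hi = std::numeric_limits<int32_t>::max();

  // Lane 0 has offset base.
  if (base < lo || base > hi)
    return false;

  // Offsets are monotonic in the lane number, so only the last lane remains.
  const int64_t last = maxLanes - 1;
  if (last == 0)
    return true;

  // Compare via division to avoid overflowing stride * last in 64 bits.
  if (stride >= 0)
    return stride <= (hi - base) / last;
  return stride >= (lo - base) / last;
}
```

For example, `indexFitsIn32Bits(0, 4, 16)` holds, while a base of `INT32_MAX` with a stride of 1 over two lanes does not.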
You may want to limit this recursion to some threshold, to keep it from becoming too expensive.
Maybe you can rewrite this as a loop, like this:
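
One possible shape for such a rewrite is sketched below. This is not the snippet from the review: `Node`, `reachesStepVector`, and `maxVisited` are hypothetical stand-ins, and the sketch only assumes that the recursion walks operands looking for a step_vector. An explicit worklist replaces the recursion, and a visit budget acts as the threshold.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR value; not an LLVM class.
struct Node {
  bool isStepVector = false;
  std::vector<const Node *> operands;
};

// Iterative replacement for a recursive operand walk. Searches the operands
// reachable from root for a step-vector node, visiting at most maxVisited
// nodes. Returns false when the budget runs out (the conservative answer).
bool reachesStepVector(const Node *root, unsigned maxVisited) {
  assert(root && "root must not be null");
  std::vector<const Node *> worklist{root};
  unsigned visited = 0;

  while (!worklist.empty()) {
    if (visited++ >= maxVisited)
      return false; // Threshold hit: give up instead of walking further.

    const Node *n = worklist.back();
    worklist.pop_back();

    if (n->isStepVector)
      return true;

    // Queue the operands instead of recursing into them.
    for (const Node *op : n->operands)
      worklist.push_back(op);
  }
  return false;
}
```

Because the budget counts visits rather than distinct nodes, shared operands may be visited more than once, but the total work stays bounded by `maxVisited`.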