When we predicate an instruction (div, rem, store), we place the instruction in its own basic block within the vectorized loop. If a predicated instruction has scalar operands, we can recursively sink those scalar expressions into the predicated block so that they execute only when the predicate holds. This patch sinks as much scalar computation as possible into predicated blocks. Previously, we could sink such operands only if they were extractelement instructions.
For example, if we have a predicated store "a[i] = x", instead of generating:
  vector.body:
    ...
    %i = add i64 %index, 1
    %p = getelementptr inbounds i32, i32* %a, i64 %i
    %x = extractelement <2 x i32> %vec, i32 0
    ...
  pred.store:
    store i32 %x, i32* %p
    ...
We will now generate:
  vector.body:
    ...
  pred.store:
    %i = add i64 %index, 1
    %p = getelementptr inbounds i32, i32* %a, i64 %i
    %x = extractelement <2 x i32> %vec, i32 0
    store i32 %x, i32* %p
    ...
I think the standard way to do this is to run "while (!WorkList.empty())" and keep a side map of the nodes you're done with, so that you never add them to the worklist again. This avoids both the iterator-invalidation problem and the need to detect when a fixed point has been reached.
It may make things more complicated in this case, though.
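For reference, here is a minimal sketch of the worklist pattern described above. It uses standard containers and a toy operand graph rather than the patch's actual data structures: the names Graph, collectOperands and Done are hypothetical, and it only illustrates the traversal, not the check that decides whether an operand may be sunk.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical toy IR: Operands[N] lists the operand nodes of node N.
using Graph = std::vector<std::vector<int>>;

// Visit Root and everything reachable through operand edges, each node
// exactly once. Done is the side set: a node is recorded the first time it
// is queued, so it can never be added to the worklist a second time.
std::vector<int> collectOperands(const Graph &Operands, int Root) {
  std::vector<int> WorkList{Root};
  std::unordered_set<int> Done{Root};
  std::vector<int> Order;

  while (!WorkList.empty()) {
    int N = WorkList.back(); // copy before popping; no iterator is held
    WorkList.pop_back();
    Order.push_back(N);

    for (int Op : Operands[N])
      if (Done.insert(Op).second) // true only on first insertion
        WorkList.push_back(Op);
  }
  return Order;
}
```

The loop ends when the worklist drains, so no separate "did anything change" flag is needed, and the worklist is only ever modified at its back, never while it is being iterated. Shared operands and cycles are handled by the Done set.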