When we predicate an instruction (div, rem, store), we place it in its own basic block within the vectorized loop. If a predicated instruction has scalar operands, we can recursively sink the scalar expressions that compute those operands into the predicated block, so that they execute only when the predicate holds. This patch sinks as much scalar computation as possible into predicated blocks. Previously, we could sink such operands only if they were extractelement instructions.
For example, if we have a predicated store "a[i] = x", instead of generating:
  vector.body:
    ...
    %i = add i64 %index, 1
    %p = getelementptr inbounds i32, i32* %a, i64 %i
    %x = extractelement <2 x i32> %vec, i32 0
    ...

  pred.store:
    store i32 %x, i32* %p
    ...
We will now generate:
  vector.body:
    ...

  pred.store:
    %i = add i64 %index, 1
    %p = getelementptr inbounds i32, i32* %a, i64 %i
    %x = extractelement <2 x i32> %vec, i32 0
    store i32 %x, i32* %p
    ...
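
The recursive sinking described above can be modelled as a worklist over the operands of the predicated instruction. The following is only a toy sketch in Python, not the code in this patch: the `Instr` class and the `sink_scalar_operands` function are hypothetical stand-ins, and the only legality checks are "no side effects" and "every user already lives in the predicated block". A real pass would need further checks.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Toy stand-in for an IR instruction (hypothetical, not an LLVM class)."""
    name: str
    block: str
    operands: list = field(default_factory=list)  # names of the values used
    side_effects: bool = False


def sink_scalar_operands(instrs, pred_name):
    """Sink operands of `pred_name` into its block when all their users are there.

    `instrs` maps a value name to its Instr. Names missing from the map are
    treated as non-instructions (arguments, constants) and are never moved.
    Returns the names of the sunk instructions in the order they were moved.
    """
    pred_block = instrs[pred_name].block
    worklist = list(instrs[pred_name].operands)
    to_reanalyze = []
    sunk = []
    changed = True
    while changed:
        changed = False
        # Retry candidates that were blocked by a user outside the block;
        # that user may itself have been sunk in the meantime.
        worklist.extend(to_reanalyze)
        to_reanalyze = []
        while worklist:
            cand = instrs.get(worklist.pop())
            if cand is None or cand.block == pred_block or cand.side_effects:
                continue
            users = [u for u in instrs.values() if cand.name in u.operands]
            if any(u.block != pred_block for u in users):
                to_reanalyze.append(cand.name)
                continue
            # Every user is predicated, so the candidate can be predicated too.
            cand.block = pred_block
            sunk.append(cand.name)
            # Its own operands may now be sinkable as well.
            worklist.extend(cand.operands)
            changed = True
    return sunk


# The "a[i] = x" example: i, p and x start in vector.body.
instrs = {
    "i": Instr("i", "vector.body", ["index"]),
    "p": Instr("p", "vector.body", ["a", "i"]),
    "x": Instr("x", "vector.body", ["vec"]),
    "store": Instr("store", "pred.store", ["x", "p"], side_effects=True),
}
sunk = sink_scalar_operands(instrs, "store")
```

In this model all three scalar instructions end up in `pred.store`, matching the IR shown above. An operand that still has a user in `vector.body` stays where it is, because sinking it would leave that user without a dominating definition.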