LoopVectorize: Support conditional stores by scalarizing

arnolds, Jan 27 2014, 5:01 PM
rL200269: Revert r199871 and replace it with a simple check in the debug info

The vectorizer takes a loop like the first one below and widens all instructions
except for the store. The stores are scalarized (unrolled per lane), each hidden
behind its own "if" block, as in the second loop.

for (i = 0; i < 128; ++i) {
  if (a[i] < 10)
    a[i] += val;
}

for (i = 0; i < 128; i+=2) {
  v = a[i:i+1];
  c = v < 10;
  w = v + val;
  if (extract c, 0)
    a[i] = (extract w, 0);
  if (extract c, 1)
    a[i+1] = (extract w, 1);
}
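
As a rough, runnable model of the transformation above, the sketch below writes
both loops in plain C for a vector width of 2. The function names are made up
for illustration, and 2-element arrays stand in for vector registers; this is
not the code the vectorizer emits.

```c
#include <assert.h>
#include <stddef.h>

/* Original scalar loop: conditionally update a[i]. */
void cond_store_scalar(int *a, size_t n, int val) {
  for (size_t i = 0; i < n; ++i)
    if (a[i] < 10)
      a[i] += val;
}

/* Hand-written model of the transformed loop (vector width 2, assumed).
 * Load, compare and add are "widened"; only the stores are scalarized
 * and guarded per lane. */
void cond_store_vf2(int *a, size_t n, int val) {
  assert(n % 2 == 0); /* this sketch has no remainder loop */
  for (size_t i = 0; i < n; i += 2) {
    int v[2] = {a[i], a[i + 1]};         /* wide load    */
    int c[2] = {v[0] < 10, v[1] < 10};   /* wide compare */
    int w[2] = {v[0] + val, v[1] + val}; /* wide add     */
    if (c[0])
      a[i] = w[0];     /* guarded scalar store, lane 0 */
    if (c[1])
      a[i + 1] = w[1]; /* guarded scalar store, lane 1 */
  }
}
```

Running both functions on copies of the same small array should leave the
copies identical, since a lane is written only when its own compare was true.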

The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.
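
To picture what such sinking might leave behind, here is a small C sketch in
which the add has moved into each guarded block, so it only runs for lanes that
actually store. Which instructions get sunk in practice depends on the later
passes; the function name and the choice of the add as the sunk instruction are
assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the loop after sinking (vector width 2, assumed): the per-lane
 * add now lives inside the conditional block that uses its result. */
void cond_store_vf2_sunk(int *a, size_t n, int val) {
  assert(n % 2 == 0); /* this sketch has no remainder loop */
  for (size_t i = 0; i < n; i += 2) {
    int v0 = a[i];
    int v1 = a[i + 1];
    if (v0 < 10)
      a[i] = v0 + val;     /* add sunk next to the store, lane 0 */
    if (v1 < 10)
      a[i + 1] = v1 + val; /* add sunk next to the store, lane 1 */
  }
}
```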

The flag "vectorize-num-stores-pred" controls whether, and how many, stores are
handled this way. Vectorization of conditional stores is disabled by default.

This patch also changes the unroll heuristic used when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default): small loops are
unrolled until the load/store ports are saturated. The heuristic uses TTI's
getMaxUnrollFactor as an estimate of the number of load/store ports.
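
One plausible reading of this heuristic, sketched in C: divide the port estimate
by the number of loads and by the number of stores in the loop body, and unroll
by the larger quotient. The function name, its parameters, and the exact formula
are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical helper: max_unroll plays the role of the value reported by
 * TTI's getMaxUnrollFactor, used here as a stand-in for the port count. */
unsigned ports_unroll_factor(unsigned max_unroll, unsigned num_loads,
                             unsigned num_stores) {
  assert(max_unroll > 0);
  /* Treat "no loads" or "no stores" as a divisor of 1 to avoid dividing
   * by zero. */
  unsigned loads_uf = max_unroll / (num_loads ? num_loads : 1);
  unsigned stores_uf = max_unroll / (num_stores ? num_stores : 1);
  unsigned uf = loads_uf > stores_uf ? loads_uf : stores_uf;
  return uf ? uf : 1; /* never return an unroll factor of 0 */
}
```

Under these assumptions a loop with one load and one store would be unrolled up
to the full port estimate, while a loop with many memory operations would not be
unrolled further by this rule.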

I also added a second flag, -enable-cond-stores-vec, which enables vectorization
of conditional stores. There is no cost model for vectorizing conditional stores
in place yet, so enabling it is unlikely to pay off at the moment.


Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):

Performance Regressions:

Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
Applications/siod/siod         2.18%

Performance Improvements:

mesa                          -4.42%
libquantum                    -4.15%

With a patch that slightly changes the register heuristics (by subtracting the
induction variable on both sides of the register pressure equation, as the
induction variable is probably not really unrolled):
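
The register-heuristic change can be sketched as follows, assuming the register
pressure equation has the form "available registers divided by registers live
per iteration". That form, the function name, and the parameter names are
assumptions for illustration; the commit message only says that the induction
variable is subtracted on both sides.

```c
#include <assert.h>

/* Hypothetical model of a register-pressure-based unroll factor.
 * discount_indvar != 0 removes one register (the induction variable) from
 * both the available registers and the per-iteration users. */
unsigned reg_unroll_factor(unsigned target_regs, unsigned invariant_regs,
                           unsigned max_local_users, int discount_indvar) {
  assert(target_regs > invariant_regs);
  assert(max_local_users > 0);
  unsigned avail = target_regs - invariant_regs;
  unsigned users = max_local_users;
  if (discount_indvar) {
    avail -= 1;                        /* induction variable, counted once */
    users = users > 1 ? users - 1 : 1; /* keep the divisor at least 1      */
  }
  return avail / users;
}
```

With 16 registers, 2 loop-invariant values and 4 values live per iteration, this
model gives 14 / 4 = 3 without the discount and 13 / 3 = 4 with it, which shows
how the adjustment can permit more unrolling.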

Performance Regressions:

Benchmarks/Ptrdist/yacr2/yacr2  7.73%
Applications/siod/siod          1.97%

Performance Improvements:

libquantum                    -13.05% (we now also unroll quantum_toffoli)
mesa                           -4.27%





