This patch adds support for vectorizing loops with 'iter_args'
implementing known reductions along the vector dimension. Compared to
the non-vector-dimension case, two additional things are done during
vectorization of such loops:
- The resulting vector returned from the loop is reduced to a scalar using vector.reduction.
- In some cases a mask is applied to the vector yielded at the end of the loop to prevent garbage values from being written to the accumulator.
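The two steps above can be illustrated with a plain C++ model of a vectorized add-reduction. This is only a sketch, not the code the patch generates: the names kWidth, kGarbage, and reduceAddVectorized are hypothetical, the vector width of 4 is arbitrary, and it assumes that the mask is needed when the trip count is not a multiple of the vector width and that the vector accumulator starts at the neutral element, with the original initial value combined after the final reduction.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical vector width; chosen only for illustration.
constexpr std::size_t kWidth = 4;
// Stand-in for whatever an out-of-bounds lane would contain.
constexpr float kGarbage = -12345.0f;

// Models the vectorized form of:
//   acc = init; for (i = 0; i < n; ++i) acc += data[i];
float reduceAddVectorized(const std::vector<float>& data, float init) {
  // Vector accumulator (the vector 'iter_arg'), started at the neutral
  // element of addition. This is an assumption of the sketch.
  std::array<float, kWidth> acc{};

  for (std::size_t base = 0; base < data.size(); base += kWidth) {
    std::array<float, kWidth> updated{};
    std::array<bool, kWidth> mask{};
    for (std::size_t lane = 0; lane < kWidth; ++lane) {
      const std::size_t i = base + lane;
      mask[lane] = i < data.size();
      // Lanes past the end of the data carry garbage.
      const float loaded = mask[lane] ? data[i] : kGarbage;
      updated[lane] = acc[lane] + loaded;
    }
    // Masked yield: inactive lanes keep the previous accumulator value,
    // so garbage never reaches the accumulator.
    for (std::size_t lane = 0; lane < kWidth; ++lane) {
      acc[lane] = mask[lane] ? updated[lane] : acc[lane];
    }
  }

  // Final horizontal reduction of the vector to a scalar, then combine
  // with the original initial value.
  float result = init;
  for (float lane : acc) {
    result += lane;
  }
  return result;
}
```

Without the masked select, the last iteration over a 10-element input would add kGarbage into two lanes of the accumulator and corrupt the result.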
Vectorization of reduction loops is disabled by default. To enable it, a
map from loops to an array of reduction descriptors should be explicitly passed to
vectorizeAffineLoops, or vectorize-reductions=true should be passed
to the SuperVectorize pass.
Current limitations:
- Loops with a non-unit step size are not supported.
- n-D vectorization with n > 1 is not supported.
Making the method return true with ignoreIterArgs set to true looks hacky. It's an incorrect result, since the loop isn't actually parallel. I'd recommend making the change suggested in your comment below, since it's in the right direction to start with.