This matches loops with an affine.min upper bound that limits the trip
count to a constant, and rewrites each of them into two loops: one with a
constant upper bound and one with a variable upper bound. The assumption is
that the constant-upper-bound loop will be unrolled and vectorized, which is
preferable if this is the hot path.
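
As a rough scalar analogy of the rewrite described above (plain C++, not MLIR and not the pass itself), the sketch below splits one bounded loop into a constant-trip-count version and a variable-trip-count version. The names `kTile`, `sumTileGeneric`, and `sumTileSpecialized` are hypothetical and chosen only for illustration; `kTile` stands in for the constant that appears in the affine.min map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile size; plays the role of the constant in the affine.min map.
constexpr std::size_t kTile = 4;

// Original form: the trip count is min(kTile, remaining elements), so the
// compiler only knows it at run time.
long sumTileGeneric(const std::vector<int> &data, std::size_t start) {
  assert(start <= data.size() && "start must not exceed the data size");
  std::size_t trip = std::min(kTile, data.size() - start);
  long sum = 0;
  for (std::size_t i = 0; i < trip; ++i)
    sum += data[start + i];
  return sum;
}

// Specialized form: check once whether the min resolved to the constant and
// pick one of two otherwise identical loops.
long sumTileSpecialized(const std::vector<int> &data, std::size_t start) {
  assert(start <= data.size() && "start must not exceed the data size");
  std::size_t trip = std::min(kTile, data.size() - start);
  long sum = 0;
  if (trip == kTile) {
    // Assumed hot path: the bound is a compile-time constant, so this loop
    // is a candidate for full unrolling and vectorization.
    for (std::size_t i = 0; i < kTile; ++i)
      sum += data[start + i];
  } else {
    // Remainder path: the bound stays variable.
    for (std::size_t i = 0; i < trip; ++i)
      sum += data[start + i];
  }
  return sum;
}
```

Both functions compute the same result for any valid `start`; the only difference is that the first branch of the specialized version exposes a constant bound to later optimizations.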
Repository: rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopPreparationForVectorization.cpp | ||
---|---|---|
1 ↗ | (On Diff #246912) | (bikeshed) could we find a shorter name? ParallelLoopSpecialization? |
36 ↗ | (On Diff #246912) | This assumes that the first operand of the map appears unmodified in the results of the map. While the "if" introduced below makes sure nothing bad happens when it does not, there may be cases where the generated code is dead. For example, with `%0 = affine.min affine_map<(d0)->(-d0,-1024)>(%c10)` and `loop.parallel (%i) = (%c0) to (%0)`, the loop will never have the upper bound of 10. I'd at least add a check that the operand indeed appears unmodified. Another thing is canonicalization: if you canonicalize before this pass, constants should be folded into the affine map, so you'll never have a constant index operand to affine apply. You could look into normalizing affine maps with operands and then just check whether some _results_ are constant affine expressions. This is fine for a follow-up. |
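
The dead-code concern in the comment above can be checked by hand with a small C++ model. This is only a sketch under an assumption: that the guard emitted by the pass compares the affine.min result against the constant operand. The helper names `evalMin` and `takesSpecializedPath` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Models affine.min applied to affine_map<(d0) -> (-d0, -1024)>.
std::int64_t evalMin(std::int64_t d0) {
  return std::min<std::int64_t>(-d0, -1024);
}

// Assumed shape of the guard: the specialized loop runs only when the
// min result equals the constant operand itself.
bool takesSpecializedPath(std::int64_t d0) {
  return evalMin(d0) == d0;
}
```

For the operand 10 from the example, `evalMin(10)` is `min(-10, -1024) = -1024`, so the guard is false and a loop specialized for an upper bound of 10 would never execute.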
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopPreparationForVectorization.cpp | ||
---|---|---|
28 ↗ | (On Diff #246912) | Please only use anonymous namespace for classes. Functions should be in the global scope and marked as static. |
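
A minimal illustration of the convention requested above, using hypothetical names (`LoopCounter`, `visitTimes`) that are not taken from the patch: the class lives in an anonymous namespace, while the file-local helper function sits at file scope and is marked `static`.

```cpp
namespace {
// Classes and structs local to this file go in an anonymous namespace.
struct LoopCounter {
  int count = 0;
  void visit() { ++count; }
};
} // namespace

// File-local functions stay outside the anonymous namespace and use
// `static` for internal linkage instead.
static int visitTimes(int n) {
  LoopCounter counter;
  for (int i = 0; i < n; ++i)
    counter.visit();
  return counter.count;
}
```

Keeping functions out of the anonymous namespace makes it obvious at the definition site that a function is file-local, without having to scroll up to find the enclosing namespace.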
- Renamed the pass to ParallelLoopSpecialization
- Match the canonicalized AffineMap form rather than the one produced by ParallelLoopTiling