This is an archive of the discontinued LLVM Phabricator instance.

Add a pass that specializes parallel loops for easier unrolling and vectorization
ClosedPublic

Authored by bkramer on Feb 27 2020, 4:33 AM.

Details

Summary

This matches loops whose upper bound is an affine.min that limits the trip
count to a constant, and rewrites them into two loops: one with a constant
upper bound and one with a variable upper bound. The assumption is that
the constant upper bound loop will be unrolled and vectorized, which is
preferable if it is the hot path.
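
As a rough illustration of the rewrite described in the summary, the Python sketch below models only the loop semantics, not the pass itself. The names `original_loop`, `specialized_loop`, `trace`, and the constant `TILE = 1024` are assumptions chosen for the example.

```python
TILE = 1024  # assumed constant result of the affine.min map


def original_loop(remaining, body):
    # The upper bound is min(constant, dynamic value), like an affine.min bound.
    upper = min(TILE, remaining)
    for i in range(0, upper):
        body(i)


def specialized_loop(remaining, body):
    upper = min(TILE, remaining)
    if upper == TILE:
        # Constant upper bound: the trip count is known statically,
        # so this copy is the candidate for unrolling and vectorization.
        for i in range(0, TILE):
            body(i)
    else:
        # Variable upper bound: handles the remainder (or empty) case.
        for i in range(0, upper):
            body(i)


def trace(loop, remaining):
    # Record the induction variable values a loop visits.
    seen = []
    loop(remaining, seen.append)
    return seen
```

Both versions visit the same iterations for any input; the specialized one merely exposes a branch in which the bound is a literal constant.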

Event Timeline

bkramer created this revision. Feb 27 2020, 4:33 AM
ftynse added a subscriber: ftynse. Feb 27 2020, 6:25 AM
ftynse added inline comments.
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopPreparationForVectorization.cpp
1 ↗(On Diff #246912)

(bikeshed) could we find a shorter name? ParallelLoopSpecialization?

36 ↗(On Diff #246912)

This assumes the first operand of the map appears in the results of the map unmodified. While the "if" introduced below will make sure nothing bad happens if it does not, there may be cases where the generated code is dead. For example:

%0 = affine.min affine_map<(d0)->(-d0,-1024)>(%c10)
loop.parallel (%i) = (%c0) to (%0)

will never have the upper bound of 10. I'd at least add a check that the operand indeed appears unmodified.
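
The dead-code concern can be checked with a small Python model of that map. This is a sketch under the assumption that the specialization guard compares the computed bound against the first operand (10 here); the names `affine_min`, `upper`, `guard`, and `trip_count` are made up for the example.

```python
def affine_min(d0):
    # Models affine.min affine_map<(d0) -> (-d0, -1024)>(d0).
    return min(-d0, -1024)


upper = affine_min(10)             # min(-10, -1024)
guard = (upper == 10)              # assumed guard: bound equals the first operand
trip_count = len(range(0, upper))  # iterations of a loop from 0 to upper
```

Since the result is at most -1024 for every input, the guard can never hold, so a loop specialized for an upper bound of 10 would be unreachable.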

Another thing is canonicalization. If you canonicalize before this pass, constants should be folded into the affine map so you'll never have a constant index operand to affine apply. You could look into normalizing affine maps with operands and then just check if some _results_ are constant affine expressions. This is fine for a follow-up.
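
A minimal sketch of the "check whether some results are constant" idea, using a toy representation of a canonicalized map's results rather than MLIR's own classes (MLIR models constant expressions with `AffineConstantExpr`). The types `Const` and `Dynamic` and the function `constant_bound` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class Const:
    value: int  # a result that is a constant affine expression


@dataclass(frozen=True)
class Dynamic:
    text: str  # any non-constant result, kept as text for illustration


Expr = Union[Const, Dynamic]


def constant_bound(results: Sequence[Expr]) -> Optional[int]:
    """Return the smallest constant result, or None if no result is constant."""
    constants = [r.value for r in results if isinstance(r, Const)]
    # For a min, only the smallest constant can ever be the final value.
    return min(constants) if constants else None
```

For instance, a map with results `(1024, s0 - s1)` would be modeled as `[Const(1024), Dynamic("s0 - s1")]`, and `constant_bound` would report 1024 as the candidate constant upper bound.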

rriddle added inline comments. Feb 27 2020, 8:38 AM
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopPreparationForVectorization.cpp
28 ↗(On Diff #246912)

Please only use anonymous namespace for classes. Functions should be in the global scope and marked as static.

bkramer updated this revision to Diff 247224. Feb 28 2020, 4:20 AM
  • Renamed the thing ParallelLoopSpecialization
  • Match the canonicalized AffineMap form, not the one produced by ParallelLoopTiling
ftynse accepted this revision. Feb 28 2020, 8:22 AM
This revision is now accepted and ready to land. Feb 28 2020, 8:22 AM
This revision was automatically updated to reflect the committed changes.