This is an archive of the discontinued LLVM Phabricator instance.

Add a basic tiling pass for parallel loops
ClosedPublic

Authored by bkramer on Feb 21 2020, 4:31 AM.

Details

Summary

This exploits the fact that the iterations of parallel loops are
independent so tiling becomes just an index transformation. This pass
only tiles the innermost loop of a loop nest.

The ultimate goal is to allow vectorization of the tiled loops, but I
don't think we're there yet with the current rewriting, as the tiled
loops don't have a constant trip count.

Diff Detail

Event Timeline

bkramer created this revision.Feb 21 2020, 4:31 AM
bkramer updated this revision to Diff 245822.Feb 21 2020, 4:53 AM
  • use pass option instead of global option
  • fix the recursive walker to walk recursively
herhut requested changes to this revision.Feb 21 2020, 5:01 AM
herhut added inline comments.
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
25

Please use pass options.

30

nit: Doc comments need to start with ///

Use parallel loop instead of ploop.

42

tileParallelLoop

44

Maybe newSteps.reserve?

68

I do not understand this part. We want the min of either the step size or dim - upperbound_of_outer.

77

you can do something like innerLoop.region().takeBody(op.region()) which simply transfers one region to the other. As the number of block arguments does not change, no need to fix uses or anything.

95

getBlock().getOps<loop::ParallelOp>() gives an (potentially empty) iterator over all parallel loops in the block.

This revision now requires changes to proceed.Feb 21 2020, 5:01 AM

We already several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one? I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

I agree we may want to peel off the last iterations to avoid complex conditions, but IMO it's better as an option to the transformation. @poechsel had a similar requested with loops produced by Linalg tiling, so it would make sense to try and reuse the infra here.

ftynse requested changes to this revision.Feb 21 2020, 5:07 AM
ftynse added inline comments.
mlir/test/Dialect/Loops/parallel-loop-tiling.mlir
20

Please don't pattern-match SSA names, and generally prefer avoiding non-essential parts of the IR from the test (e.g. repeated types in the pattern).

https://mlir.llvm.org/getting_started/TestingGuide/

bkramer updated this revision to Diff 245832.Feb 21 2020, 6:00 AM
bkramer marked 9 inline comments as done.
  • One round of addressed review comments

We already several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one? I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

The complexity of the tiling passes is in the dependency analysis and not the rewriting itself. This one is purely structural. We would need to split the actual tiling rewrite form the analysis with interfaces for generating the tiled operation. I am not sure that is worth the code we would actually get to reuse.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
52

The replication part could be a separate pattern/pass that rewrites a loop with dynamic upper bounds specified with a constant upper bound (so an affine.min with a constant) into something like

bound = affine.min(constant, other thing)
if (bound ==constant) {
  // loop with constant upper bound
} else {
  // loop with dynamic upper bound
}

Maybe this is good enough to make LLVM recognize the potential for vectorization. That rewrite could be applied independently.

96

Can we have an ArrayRef<unsigned> for tile sizes? So we can have different ones at different levels? I would suspect that we, for vectorization, always want to tile 1x...x1xN.

herhut added inline comments.Feb 21 2020, 6:19 AM
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
68

I always get confused by this myself. I think it needs to be upper_bound - current_index_of_outer_loop to get the number of remaining iterations. And then the min with the tiling size.

bkramer updated this revision to Diff 245847.Feb 21 2020, 7:08 AM
bkramer marked an inline comment as done.
  • Make tileSizes a list
  • Fix min again

Just adding some minor nit comments!

We already several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one? I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

The complexity of the tiling passes is in the dependency analysis and not the rewriting itself. This one is purely structural. We would need to split the actual tiling rewrite form the analysis with interfaces for generating the tiled operation. I am not sure that is worth the code we would actually get to reuse.

I think it might be worth it since affine parallel for would be another candidate for tiling.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
41

size_t i = 0; i != op.upperBound().size(); -> size_t i = 0, end = op.upperBound().size(); i != end; ?

116

please drop {} for single line for and if-else statements

mlir/test/Dialect/Loops/parallel-loop-tiling.mlir
38

Please add // ----- between tests for split-input-file to work

Just adding some minor nit comments!

We already several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one? I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

The complexity of the tiling passes is in the dependency analysis and not the rewriting itself. This one is purely structural. We would need to split the actual tiling rewrite form the analysis with interfaces for generating the tiled operation. I am not sure that is worth the code we would actually get to reuse.

I think it might be worth it since affine parallel for would be another candidate for tiling.

I guess we would need to compute the new step sizes as affine expressions, as well, in such case. My point was how much code reuse we would get.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
34

This comment does not state the actual rewriting, does it?

herhut accepted this revision.Feb 24 2020, 1:33 AM

Thanks!

LGTM with comments addressed.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
40

mega-nit: tileSizeConstants.reserve

bkramer updated this revision to Diff 246166.Feb 24 2020, 2:42 AM
bkramer marked 4 inline comments as done.
  • Address moar comments
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: rriddle. · View Herald Transcript
Herald added a reviewer: uenoku. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
bkramer updated this revision to Diff 246167.Feb 24 2020, 2:43 AM
  • WTF phab?
This revision was not accepted when it landed; it landed in state Needs Review.Feb 24 2020, 2:48 AM
This revision was automatically updated to reflect the committed changes.
teemperor removed projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project.
rriddle added inline comments.Feb 24 2020, 9:09 AM
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
113

nit: Please drop the trivial braces. Here and below.