This is an archive of the discontinued LLVM Phabricator instance.

[mlir] parallel loop tiling optimization for loops with static bounds
ClosedPublic

Authored by gysit on Jun 17 2020, 4:17 AM.

Details

Summary

The patch optimizes the tiling of parallel loops with static bounds if the number of loop iterations is an integer multiple of the tile size.

Diff Detail

Event Timeline

gysit created this revision.Jun 17 2020, 4:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2020, 4:17 AM
ftynse accepted this revision.Jun 17 2020, 8:42 AM
ftynse added inline comments.
mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp
85–88

I could not parse this comment

112

Maybe we should consider a separate "canonicalization" pass (or an actual canonicalization) that removes single-iteration loops completely. There is one on affine loops.

This revision is now accepted and ready to land.Jun 17 2020, 8:42 AM
bondhugula requested changes to this revision.Jun 17 2020, 11:06 AM
bondhugula added a subscriber: bondhugula.
bondhugula added inline comments.
mlir/test/Dialect/SCF/parallel-loop-tiling.mlir
58–65

Please avoid using VAL_<running_number> - instead C0, C3, C0_1, etc. for readability. A substitution at this point.

This revision now requires changes to proceed.Jun 17 2020, 11:06 AM
herhut added inline comments.Jun 17 2020, 2:29 PM
mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp
64

Why do you dislike the llvm::zip?

85

This optimization makes sense here, as it is hard to recover this out of the generated affine min expression. But the comment could explain this better. Something like:

If we statically know the size of the outer loops iteration and it is divisible by the tiling factor, we can use a static bound for the inner loop. Otherwise, we have to dynamically compute the bound for each iteration of the outer loop.
114

I would rather have this as a separate canonicalization pattern that removes parallel loops with trip-count 1.

gysit updated this revision to Diff 272002.Jun 19 2020, 4:24 AM
gysit retitled this revision from [mlir] fix induction variable mapping of the parallel loop tiling to [mlir] optimize parallel loop tiling if bounds are known.
gysit edited the summary of this revision. (Show Details)
  • I started from the fixed tiling pass and tried to apply minimal changes (drop the minimum operation if the loop sizes is a multiple of the tile size)
  • I added a canonicalization to drop single iteration loops
  • I use CX / VX instead of VAL_X
gysit marked 6 inline comments as done.Jun 19 2020, 4:25 AM
gysit added a comment.Jun 19 2020, 4:30 AM

I may missunderstand the thing but in mlir/test/Transforms/parallel-loop-collapsing.mlir the computation of I3 directly uses NEW_I0 and multiplies it by 10 and adds 9. Since NEW_I0 takes the values 0 and 1 this means we get the value 19 for I3 but we have the loop range 9 to 11 in the original loop. I guess NEW_I0 should be divided by two before computing the index I3?

I may missunderstand the thing but in mlir/test/Transforms/parallel-loop-collapsing.mlir the computation of I3 directly uses NEW_I0 and multiplies it by 10 and adds 9. Since NEW_I0 takes the values 0 and 1 this means we get the value 19 for I3 but we have the loop range 9 to 11 in the original loop. I guess NEW_I0 should be divided by two before computing the index I3?

Yes, you are right, there is a div missing there. Do you want to fix it or file a bug?

Also, it would be super awesome if you could split this into two, one adding the canonicalization and one changing the tiling.

mlir/lib/Dialect/SCF/SCF.cpp
738 ↗(On Diff #272002)

You also need to check whether upperBound - lowerBound > 0. In the other case, the loop has no iterations while the rewritten loop has one.

mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp
93

Why not zip over op.step and tileSizeConstants to get these two? That avoids the pattern matching. You can then assign all of them at the beginning of the loop to give them local names. Something like

Value lowerBound, upperBound, originalStep, newStep, index;
std::tie...
109

You have assigned names to these std::get previously by assigning to locals. Why not use them here?

gysit added a comment.Jun 19 2020, 7:26 AM

Ok I try to split the stuff.

Regarding the collapse I will have a look at it next week.

gysit updated this revision to Diff 272086.Jun 19 2020, 8:31 AM
gysit retitled this revision from [mlir] optimize parallel loop tiling if bounds are known to [mlir] parallel loop tiling optimization for loops with static bounds.
gysit edited the summary of this revision. (Show Details)
gysit set the repository for this revision to rG LLVM Github Monorepo.
  • separated the loop tiling patch from the canonicalization
  • addressed Stephan's comments
gysit marked 2 inline comments as done.Jun 19 2020, 8:32 AM
rriddle added inline comments.Jun 19 2020, 11:07 AM
mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp
103

This is an extremely complex condition, please add braces and/or preferably split the conditions.

107

Add braces here, this if is not trivial.

bondhugula added inline comments.Jun 20 2020, 4:45 AM
mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp
97

Please rephrase this: "size of the outer loops iteration" isn't meaningful. Also, you are only considering the case of lb, ub, step being constant, which isn't the same of trip counts being statically known/constant (for eg., %i = %N to %N + 16).

gysit updated this revision to Diff 272255.Jun 20 2020, 7:39 AM
  • addressed the review comments
  • use ceil division by the step to compute the number of loop iterations (to support the cases where the number of loop iterations is known but not a multiple of the step)
  • adapted one of the test cases to test this scenario
  • removed some unnecessary headers
gysit marked 3 inline comments as done.Jun 20 2020, 7:39 AM
herhut accepted this revision.Jun 22 2020, 1:30 AM

Thank you! This is much easier to read now that it focuses on one thing.

bondhugula resigned from this revision.Jun 24 2020, 2:21 PM
This revision is now accepted and ready to land.Jun 24 2020, 2:21 PM
This revision was automatically updated to reflect the committed changes.