This allows conversion of a ParallelLoop from N induction variables to some nuber of induction variables less than N. The first intended use of this is for the GPUDialect to convert ParallelLoops to iterate over 3 dimensions so they can be launched as GPU Kernels. To implement this: - Normalize each iteration space of the ParallelLoop - Use the same induction variable in a new ParallelLoop for multiple original iterations. - Split the new induction variable back into the original set of values inside the body of the ParallelLoop.
Details
- Reviewers
herhut bondhugula - Commits
- rG27c201aa1d97: [MLIR] Add parallel loop collapsing.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
(Just some drive-by nits)
mlir/include/mlir/InitAllPasses.h | ||
---|---|---|
115 | Let's keep these sorted. | |
mlir/include/mlir/Transforms/LoopUtils.h | ||
229 | Can you please add a comment here? | |
mlir/lib/Transforms/ParallelLoopCoalescing.cpp | ||
23 ↗ | (On Diff #251099) | Can you use pass options instead? https://mlir.llvm.org/docs/WritingAPass/#instance-specific-pass-options |
mlir/lib/Transforms/Utils/LoopUtils.cpp | ||
976–983 | nit: Use /// for top-level comments | |
1000–1001 | nit: Please drop all trivial braces. | |
1032 | nit: Please use /// for top-level comments. | |
1119 | nit: Cache the end iterator of the loop, and prefer pre-increment. | |
1152 | Same here and below. |
Could you please add a summary to the commit message - even if it's a couple of lines? On a side note, do you want to use the term 'collapse' instead of 'coalesce'? OpenMP uses collapse for such linearization, and coalesce could also imply fusion of loops, which this isn't. I do see that colaesceLoops already existed prior to this patch
mlir/lib/Transforms/ParallelLoopCoalescing.cpp | ||
---|---|---|
52 ↗ | (On Diff #251099) | List initialize? |
Where is this coming from?
It'd be nice to clearly understand the goal and where is it coming from. In particular I don't understand why this isn't the role of separate tooling and it has to be encoded directly in commit messages. Are we going to prefix down to the last component manually? Is something like [LLVM][Backend][X86][Register allocator] to be expected?
+1, we document it here: https://mlir.llvm.org/getting_started/Contributing/#commit-messages
I'll add to the summary. Is there a reason arc diff didn't pick up my git commit message? Does it only create the summary based on the first arc diff and not any subsequent calls?
Thanks. Looks great with some nits.
mlir/lib/Transforms/ParallelLoopCoalescing.cpp | ||
---|---|---|
44 ↗ | (On Diff #251099) | Can you make this an OperationPass instead? |
52 ↗ | (On Diff #251099) | Use llvm::SmallVector instead? |
mlir/lib/Transforms/Utils/LoopUtils.cpp | ||
979 | Maybe have a little struct here instead of a tuple? Or use std::tie at use sites to improve readability. | |
983 | Move these closer to their first use. | |
1006 | Maybe Value newLowerBound = isZeroBased ? lowerBound : boundsBuilder.create<ConstantIndexOp>(loc, 0)? | |
1012 | Here, too? | |
1050 | Mega-nit: The order lower, step, upper is strange... | |
1138 | Why not newUpperBound = cst1 here? | |
1154 | A comment what this computes would help readability. | |
1158 | Should this be the normalized upper bound? | |
1160 | It would read easier for me if updating previous was also done here except for the last case. Would that make sense? | |
1163 | Normalized here, too? |
Handle herhut's comments and fix broken collapsing
logic that used the wrong upper bound value.
mlir/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
1160 | I think this trades one mess for a different one because then it's just a different bounds check and not all indexing is happening at ivar_idx anymore. |
mlir/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
1160 | I tried to restructure it to be more readable. |
I've noticed this too. It doesn't pick up subsequent commit message updates. I just manually update the commit message here when needed.
Mostly minor suggestions on readability
mlir/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
1045 | OpBuilder innerBuilder(inner.getBody()) will be sufficient. | |
1109 | But to spell this out? PLoops -> ParallelLoops? | |
1130 | Nit: period at the end. | |
mlir/test/Transforms/parallel-loop-collapsing.mlir | ||
5–14 ↗ | (On Diff #252767) | Can you drop the extra indent? Also VAL_0 -> C6, VAL_1 -> C7, ...? |
31 ↗ | (On Diff #252767) | Can you use more descriptive names? VAL_10 -> IV0, VAL_11 -> IV1, ... |
34 ↗ | (On Diff #252767) | Drop the additional indent between the CHECK and the string? |
49 ↗ | (On Diff #252767) | CHECK-NEXT |
50–51 ↗ | (On Diff #252767) | CHECK-NEXT |
mlir/test/Transforms/single-parallel-loop-collapsing.mlir | ||
26 ↗ | (On Diff #252767) | VAL_14 isn't used; no need to capture it. |
32–33 ↗ | (On Diff #252767) | CHECK-NEXT |
34 ↗ | (On Diff #252767) | Not needed. |
35 ↗ | (On Diff #252767) | Drop trailing blank lines. |
Let's keep these sorted.