This allows conversion of a ParallelLoop from N induction variables to some nuber of induction variables less than N. The first intended use of this is for the GPUDialect to convert ParallelLoops to iterate over 3 dimensions so they can be launched as GPU Kernels. To implement this: - Normalize each iteration space of the ParallelLoop - Use the same induction variable in a new ParallelLoop for multiple original iterations. - Split the new induction variable back into the original set of values inside the body of the ParallelLoop.
- rG27c201aa1d97: [MLIR] Add parallel loop collapsing.
(Just some drive-by nits)
Let's keep these sorted.
Can you please add a comment here?
|23 ↗||(On Diff #251099)|
Can you use pass options instead? https://mlir.llvm.org/docs/WritingAPass/#instance-specific-pass-options
nit: Use /// for top-level comments
nit: Please drop all trivial braces.
nit: Please use /// for top-level comments.
nit: Cache the end iterator of the loop, and prefer pre-increment.
Same here and below.
Could you please add a summary to the commit message - even if it's a couple of lines? On a side note, do you want to use the term 'collapse' instead of 'coalesce'? OpenMP uses collapse for such linearization, and coalesce could also imply fusion of loops, which this isn't. I do see that colaesceLoops already existed prior to this patch
|52 ↗||(On Diff #251099)|
Where is this coming from?
It'd be nice to clearly understand the goal and where is it coming from. In particular I don't understand why this isn't the role of separate tooling and it has to be encoded directly in commit messages. Are we going to prefix down to the last component manually? Is something like [LLVM][Backend][X86][Register allocator] to be expected?
Thanks. Looks great with some nits.
|44 ↗||(On Diff #251099)|
Can you make this an OperationPass instead?
|52 ↗||(On Diff #251099)|
Use llvm::SmallVector instead?
Maybe have a little struct here instead of a tuple? Or use std::tie at use sites to improve readability.
Move these closer to their first use.
Maybe Value newLowerBound = isZeroBased ? lowerBound : boundsBuilder.create<ConstantIndexOp>(loc, 0)?
Mega-nit: The order lower, step, upper is strange...
Why not newUpperBound = cst1 here?
A comment what this computes would help readability.
Should this be the normalized upper bound?
It would read easier for me if updating previous was also done here except for the last case. Would that make sense?
Normalized here, too?
Mostly minor suggestions on readability
OpBuilder innerBuilder(inner.getBody()) will be sufficient.
But to spell this out? PLoops -> ParallelLoops?
Nit: period at the end.
Can you drop the extra indent? Also VAL_0 -> C6, VAL_1 -> C7, ...?
Can you use more descriptive names? VAL_10 -> IV0, VAL_11 -> IV1, ...
Drop the additional indent between the CHECK and the string?
VAL_14 isn't used; no need to capture it.
Drop trailing blank lines.