Page MenuHomePhabricator

[mlir] support max/min lower/upper bounds in affine.parallel
ClosedPublic

Authored by ftynse on Apr 23 2021, 8:55 AM.

Details

Summary

This enables to express more complex parallel loops in the affine framework,
for example, in cases of tiling by sizes not dividing loop trip counts perfectly
or inner wavefront parallelism, among others. One can't use affine.max/min
and supply values to the nested loop bounds since the results of such
affine.max/min operations aren't valid symbols. Making them valid symbols
isn't an option since they would introduce selection trees into memref
subscript arithmetic as an unintended and undesired consequence. Also
add support for converting such loops to SCF. Drop some API that isn't used in
the core repo from AffineParallelOp since its semantics becomes ambiguous in
presence of max/min bounds. Loop normalization is currently unavailable for
such loops.

Depends On D101171

Diff Detail

Event Timeline

ftynse created this revision.Apr 23 2021, 8:55 AM
ftynse requested review of this revision.Apr 23 2021, 8:55 AM

This is fantastic! Thanks @ftynse The fact that affine.parallel didn't allow min/max for lb/ub meant that dim/symbol requirements would be broken whenever an affine.max/min couldn't be at the top-level (since the result of an affine.max and affine.min isn't a valid symbol (unless at top level) for good reasons). Having this support also means the max/min is unified/composed in the op itself now consistent with other designs. In retrospect, I think this should have been in the design on day 0 itself because otherwise the design just goes in an undesired direction and is invasive to fix later. I'll be happy to review this revision.

bondhugula added inline comments.Apr 23 2021, 10:26 PM
mlir/include/mlir/Dialect/Affine/IR/AffineOps.td
669

Missing updates to the documentation.

733

Some of these should probably have doc comments - likewise above.

mlir/include/mlir/IR/AffineMap.h
265–266

Nit: length -> num ?

mlir/include/mlir/IR/OpImplementation.h
690

Missing doc comment - you can just refer to the one above here / make it relative.

mlir/lib/Conversion/AffineToStandard/AffineToStandard.cpp
434

I wonder when this fails. Is it the floodiv's/ceildiv/mod w.r.t negative values and symbols (semi-affine). This can be addressed separately but floordiv/ceildiv/mod RHS are always expected to be positive - it's UB otherwise. And so you can freely use the same operation to divide as in the case of positive constants.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2622–2644

Assert message please although trivial.

2655

same repeated.

2930

Nit: '('.

mlir/lib/IR/AsmPrinter.cpp
2607

Likewise.

bondhugula added a comment.EditedApr 23 2021, 10:31 PM

This is fantastic! Thanks @ftynse The fact that affine.parallel didn't allow min/max for lb/ub meant that dim/symbol requirements would be broken whenever an affine.max/min couldn't be at the top-level (since the result of an affine.max and affine.min isn't a valid symbol (unless at top level) for good reasons). Having this support also means the max/min is unified/composed in the op itself now consistent with other designs. In retrospect, I think this should have been in the design on day 0 itself because otherwise the design just goes in an undesired direction and is invasive to fix later. I'll be happy to review this revision.

@ftynse You could actually augment your commit summary. It's not just imperfect tiling or wavefront parallelism, but also plain loop tiling when the trip counts aren't multiple of tile sizes. One can't use affine.max/min and supply values to the intra-tile bounds since the results of such affine.max/min operations aren't valid symbols. Making them valid symbols isn't an option since they'd introduce selection trees into memref subscript arithmetic as an unintended and undesired consequence.

bondhugula requested changes to this revision.Apr 23 2021, 10:43 PM
bondhugula added inline comments.
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2992–3000

I think you can do:

pos = llvm::find(...) - uniqueOperands.begin();
if (pos == uniqueOperands.size())
   uniqueOperands.push_back(operand);
2993

Nit: Consider switching to * it - although we use auto it in the codebase, the clang-tidy warnings mean those using in-editor syntax errors (via clangd and clang-tidy checks) would see a highlighted issue that can't be distinguished immediately from other "must fix" warnings.

This revision now requires changes to proceed.Apr 23 2021, 10:43 PM
bondhugula added inline comments.Apr 23 2021, 11:02 PM
mlir/include/mlir/IR/AffineMap.h
265–266

I think length is fine (as in slice length).

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2655

same -> space.

2659

assert(!maps.empty() && ..)

2677–2680

lbGroups and ubGroups can't be empty here. An assertion missing somewhere.

3024

Always good to also have an example op here.

3054

Nit: Consider hoisting this out and having an mapOperands.clear() here to avoid repeated allocation.

3079

Good to leave a blank line right below here.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
160

{lowerBoundMap} won't work?

ftynse updated this revision to Diff 340466.Apr 26 2021, 2:59 AM
ftynse marked 17 inline comments as done.

Address review.

Thanks for the detailed review!

mlir/lib/Conversion/AffineToStandard/AffineToStandard.cpp
434

Yes, it fails in the case of negative or symbolic RHS for div/mod. This has been around for a while, I remember having written prototype code with selects that supports Euclidean division by a value that can be either positive or negative, with @albertcohen. Not sure if there was a use case for that.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2677–2680

They won't be empty if lbMaps, ubMaps are not given how they are constructed in concatMapsSameInput. And we have an assertion for lbMaps, ubMaps.

2992–3000

Nice, thanks!

2993

Yeah, I find this specific clang-tidy suggestion borderline incorrect. auto it intends to say that we don't care about the specific iterator type, auto *it effective forces the iterator to be implemented as typedef elementTy *iterator, which is almost the exact opposite of the original intent.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
160

Nope, template type deduction doesn't work with initializer lists forwarded to implicit constructors.

Candidate template ignored: substitution failure [with OpTy = mlir::AffineParallelOp]: deduced incomplete pack <mlir::ValueTypeRange<mlir::ValueRange>, llvm::SmallVector<mlir::AtomicRMWKind, 6> &, (no value), mlir::ValueRange &, llvm::ArrayRef<mlir::AffineMap>, mlir::ValueRange &, llvm::ArrayRef<long>> for template parameter 'Args'

this is one of the reasons why I kept proposing arguably better interfaces for IR construction, but abandoned given the unreasonable amount of pushback. Building ops feels like one of the worst developer experience solutions I have ever seen.

ftynse edited the summary of this revision. (Show Details)Apr 26 2021, 3:02 AM

This is fantastic! Thanks @ftynse The fact that affine.parallel didn't allow min/max for lb/ub meant that dim/symbol requirements would be broken whenever an affine.max/min couldn't be at the top-level (since the result of an affine.max and affine.min isn't a valid symbol (unless at top level) for good reasons). Having this support also means the max/min is unified/composed in the op itself now consistent with other designs. In retrospect, I think this should have been in the design on day 0 itself because otherwise the design just goes in an undesired direction and is invasive to fix later. I'll be happy to review this revision.

@ftynse You could actually augment your commit summary. It's not just imperfect tiling or wavefront parallelism, but also plain loop tiling when the trip counts aren't multiple of tile sizes. One can't use affine.max/min and supply values to the intra-tile bounds since the results of such affine.max/min operations aren't valid symbols. Making them valid symbols isn't an option since they'd introduce selection trees into memref subscript arithmetic as an unintended and undesired consequence.

I borrowed parts of your text if you don't mind.

bondhugula accepted this revision.Apr 26 2021, 12:26 PM

Looks great.

This revision is now accepted and ready to land.Apr 26 2021, 12:26 PM

@ftynse This can be landed.

No it cannot because it depends on the other commit blocked in review.

No it cannot because it depends on the other commit blocked in review.

It wasn't clear how this revision which adds additional IR support logically depends on the one that was detecting and parallelizing reductions. I realize now this was done after and so affine.parallel builder changes. But if it's not a major change, this can go in first. Anyway, the other one is getting close to landing as well.

This is a non-trivial rebase which would require splitting out the part of this commit that actually depends on the changes of the previous commit. I have a very strong preference of not wasting my time on such things. If you want, feel free to do the rebase yourself and commit it on my behalf.

bondhugula accepted this revision.EditedApr 29 2021, 3:54 AM

This is a non-trivial rebase which would require splitting out the part of this commit that actually depends on the changes of the previous commit. I have a very strong preference of not wasting my time on such things. If you want, feel free to do the rebase yourself and commit it on my behalf.

There is really no need to - nor was I recommending it. I was just checking whether the rebase was a trivial one.

ftynse updated this revision to Diff 341463.Apr 29 2021, 4:15 AM

More rebase.

This revision was landed with ongoing or failed builds.Apr 29 2021, 4:16 AM
This revision was automatically updated to reflect the committed changes.
flaub added inline comments.May 3 2021, 2:55 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2699–2705

Perhaps we can keep this helper, but fail if hasMinMaxBounds is true? This helper is used in some downstream code:
https://github.com/plaidml/plaidml/blob/plaidml-v1/pmlc/dialect/pxa/analysis/strides.cc#L80