Page MenuHomePhabricator

[MLIR] Add affine.parallel folder and AffineParallelNormalizePass
ClosedPublic

Authored by flaub on Jul 30 2020, 7:38 PM.

Details

Summary

Add a folder to the affine.parallel op so that loop bounds expressions are canonicalized.

Additionally, a new AffineParallelNormalizePass is added to adjust affine.parallel ops so that the lower bound is always 0 and the upper bound always represents a range with a step size of 1.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thanks Frank! It looks good to me. Just a few minor comments

mlir/include/mlir/Dialect/Affine/Passes.h
40

Could we use Normalize instead of Simplify? https://en.wikipedia.org/wiki/Normalized_loop
We can rename it later on if we add other simplifications in the future.

mlir/include/mlir/Dialect/Affine/Passes.td
122

Probably good to add the previous comment to this description:

/ Simplify affine.parallel ops so that they have a step size of 1 and a lower
/ bound of 0.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606

I don't understand this condition. Why don't remove a 0-rank loop if it has a custom attribute?

2688

Doc still missing?

mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
22 ↗(On Diff #283394)

Same, Simplify -> Normalize

38 ↗(On Diff #283394)

We shouldn't run full blown canonicalization on all the ops as part of this pass. If canonicalization is needed, we can always invoke the canonicalizer after this pass. In that way, we would leave the decision of running canonicalization or not after this pass to the user.

51 ↗(On Diff #283394)

nit, readability: lbExpr.getValue() -> lbExpr.getValue() != 0

mlir/test/Dialect/Affine/affine-fold.mlir
23 ↗(On Diff #283394)

no iteration -> only one iteration?

This revision now requires changes to proceed.Aug 5 2020, 4:12 PM
flaub added inline comments.Aug 6 2020, 11:04 AM
mlir/include/mlir/Dialect/Affine/Passes.h
40

Gotcha, will do.

mlir/include/mlir/Dialect/Affine/Passes.td
122

Was trying to keep this short since this is just for the command line. Looking at the other summaries that seems consistent. I can try to make it a little bit more descriptive. How about if we go with the normalize nomenclature, can we just say:

"Normalize affine.parallel ops so that lower bounds are 0 and step size is 1."

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606

We have cases where we'd like to avoid removing affine.parallel ops even when they have no IVs where the op represents something structural, like it might represent the outermost loop which we might want to do kernel outlining on. This was an attempt to prevent this particular canonicalization from running in this case. We've been 'tagging' ops in our pipeline and this was an easy way to control canonicalization. This isn't the most elegant or general solution so we're open to something better here.

mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗(On Diff #283394)

Fair enough I'll remove it and just add it to our pipeline as a separate pass.

mlir/test/Dialect/Affine/affine-fold.mlir
23 ↗(On Diff #283394)

Actually I meant to say when no IVs exist. But yes, that's right.

dcaballe added inline comments.
mlir/include/mlir/Dialect/Affine/Passes.td
122

Sounds good!

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606

One option could be to convert your special rank-0 affine.parallel to some other region op that properly models what you want (kernel outlining, in this case) before the canonicalization (or probably use that region op from the beginning? I'm missing how you get to this rank-0 affine.parallel scenario). Another option could be to have a dedicated attribute to prevent the canonicalization. I think I would lean towards the first option since we wouldn't be overloading rank-0 affine.parallel constructs with special semantics. Probably @rriddle, @ftynse, @bondhugula could also help here.

flaub added inline comments.Aug 6 2020, 12:55 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606

IIUC, currently kernel outlining is based on affine.for and scf.parallel ops. Our plan was to lower affine.parallel to scf.parallel and then use the SCFToGPU pass and then eventually perform kernel outlining. If we were to canonicalize away this empty affine.parallel, then any inner affine.parallel would be elevated up one level, which would mean kernel outlining would be working on the wrong level. Our model assumes that the outermost affine.parallel represents iteration of the workgroup items. It's very possible that we'd have a single kernel launch for a single workgroup item, which would be represented as affine.parallel () = () to ().

Regarding adding a new op, I suppose that would work, although it seems redundant. What should we call this op? It would act as a shield for this canonicalization but then we'd need a special lowering for this op into affine.for or scf.parallel.

And it's not really rank-0 that have these special semantics, it's any outermost loop that has special semantics that is imposed by kernel outlining.

bondhugula requested changes to this revision.Aug 6 2020, 1:15 PM
bondhugula added inline comments.
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606

This looks like a hack / substitute due to a missing abstraction/region op. In fact, std.execute_region exactly model scenarios like this among several others. (There is the affine.execute_region as well but you won't need it here since it starts a new affine scope at that point.) Perhaps you may want to leave it the way that's simplest for this revision and put a TODO for the need of a more suitable abstraction?

2616

Nit: Terminate with full stop please.

2650

Likewise and anywhere else.

mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
81 ↗(On Diff #283394)

Avoid auto here please.

87 ↗(On Diff #283394)

Likewise.

bondhugula added inline comments.Aug 6 2020, 1:19 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2671

Nit: prefix Map to name.

mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗(On Diff #283394)

Note that, where needed, there is also a local version of the canonicalizer to apply on specific ops if you know which ops you want to canonicalize.

bondhugula added inline comments.Aug 6 2020, 1:22 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2607

The comment needs another line to explain the rationale.

mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
64 ↗(On Diff #283394)

Nit: steps should always be int64_t.

flaub added inline comments.Aug 6 2020, 1:27 PM
mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
112 ↗(On Diff #283394)

@bondhugula I'm confused, this file already uses this style for auto. How do I know which style to use?

flaub added inline comments.Aug 6 2020, 1:30 PM
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗(On Diff #283394)

Is there an example of this someplace? That seems like something that would be useful in this case.

flaub added inline comments.Aug 6 2020, 1:39 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606

Agreed this is a hack, looking for better alternatives, but it does practically work for now.

I've been looking for the std.execute_region and affine.execute_region but can't find it. Was it a proposal or perhaps renamed?

flaub updated this revision to Diff 283734.Aug 6 2020, 2:08 PM
flaub marked 3 inline comments as done.
flaub edited the summary of this revision. (Show Details)

Address comments.

flaub edited the summary of this revision. (Show Details)Aug 6 2020, 2:13 PM
flaub added a comment.Aug 9 2020, 5:23 PM

@dcaballe & @bondhugula Is there anything else you'd like for me to address?

@rriddle I added replaceUsesOfBlockArgument and replaceUsesOfWith to the PatternRewriter, left the default impl as unsupported, and then implemented a simple version for GreedyPatternRewriteDriver, WDYT?

dcaballe added inline comments.Aug 10 2020, 2:07 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606

To keep this moving while working on a better abstraction and since you are already overusing affine.parallel, would it be possible to remove the check below so that any affine.parallel is optimized and then add some fake bounds to your special affine.parallel so that it's not optimized away?

Another option would be to define a specific attribute that we can use to prevent the optimization. We could check for that attribute only.

WDYT?

@rriddle I added replaceUsesOfBlockArgument and replaceUsesOfWith to the PatternRewriter, left the default impl as unsupported, and then implemented a simple version for GreedyPatternRewriteDriver, WDYT?

Sorry for the delay, was waiting for some of the other discussion on this revision to resolve first.

mlir/include/mlir/IR/PatternMatch.h
318 ↗(On Diff #283734)

Please use the inner loop here as the default implementation of this hook (https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/lib/Transforms/DialectConversion.cpp#L889). The operation rooted at to could be using from, using the implementation linked above allows for the use cases to "just work".

322 ↗(On Diff #283734)

I don't think this is necessary, see the comment at the use of it below.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2620

This seems like it should just be rewriter.replaceOp(op, yield.operands());.

2626

Drop the mlir::

2628

Why is this hook necessary? Seems like you should be using rewriter.replaceOp(op, yield.operands())instead of this loop and eraseOp.

2629

This would also need to go through the rewriter.

2674

This would also need to go through the rewriter.

mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
33 ↗(On Diff #283394)

Please do not do this. This is placing a full run of the canonicalizer pass inside of your pass, just let the user schedule the canonicalizer in their pipeline.

mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
80 ↗(On Diff #283734)

When you update the default implementation, you can call it from here.

bondhugula added inline comments.Aug 13 2020, 12:52 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606

The execute_region ops are still pending - the corresponding revisions are dormant on differential. Although there are no unresolved issues w.r.t std.execute_region, I didn't finish it up for the lack of immediate use cases (an std.yield terminator needs to be added for it).

bondhugula added inline comments.Aug 13 2020, 12:59 PM
mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
112 ↗(On Diff #283394)

A lot of the old code was written with an overuse of auto. That's fine - it's not a big deal here. In general, avoid auto if it doesn't improve readability. The issue is that the type is often obvious to the author of the revision at this point, but it often isn't when reading it later "locally".

flaub updated this revision to Diff 285729.Aug 14 2020, 12:29 PM
flaub retitled this revision from [MLIR] Add affine.parallel canonicalizations and folding to [MLIR] Add affine.parallel folder and AffineParallelNormalizePass.
flaub edited the summary of this revision. (Show Details)

After some more discussions and thought, I've removed the complex canonicalizations that:

  • break the PatternRewriter rules
  • have hacky special exemptions

What's left is the folder for affine.parallel (very similar to the one for affine.for) and the separate AffineParallelNormalizePass. The other patterns will be refactored as separate passes that we might submit here at a later time.

Hey Frank, didn't mean for this to force you to scale back the patch. If it helps I can implement the pattern rewriter hooks that you would need. Just let me know.

  • River
bondhugula added inline comments.Aug 14 2020, 10:13 PM
mlir/include/mlir/Dialect/Affine/IR/AffineValueMap.h
77

This doesn't describe what this method does! (but only its return status).

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
23

Nit: bounds -> bound or bounds'

bondhugula added inline comments.Aug 14 2020, 10:19 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2513

Nit: avoid auto.

mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗(On Diff #283394)

The method is called mlir::applyOpPatternsAndFold and is used in these five files:

lib/Dialect/Affine/Utils/Utils.cpp
lib/Dialect/Affine/Transforms/AffineDataCopyGeneration.cpp
lib/Dialect/Affine/Transforms/SimplifyAffineStructures.cpp
lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
lib/Transforms/Utils/LoopUtils.cpp

Thanks for addressing the comments, Frank! LGTM. I'll leave the final approval to Uday and River since I'm OOO next week.

What's left is the folder for affine.parallel (very similar to the one for affine.for) and the separate AffineParallelNormalizePass. The other patterns will be refactored as separate passes that we might submit here at a later time.

Actually, having them in a separate pass could be a good idea since removing a loop (or iteration space dim), even if it has 1 iteration, is somehow dropping high level information that can be useful or even expected by some passes. I would actually have problems if we removed 1-iteration loops or dimension too early in the pipeline. Maybe a LoopSimplifyPass could gather some of these loop optimization so that we have more control over them.

dcaballe resigned from this revision.Aug 16 2020, 6:19 PM

Unblocking the review from my side. OOO next week.

bondhugula added inline comments.Aug 18 2020, 4:49 AM
mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
2

Are you sure you want to run -canonicalize here? This would make it an integration test and in many cases *may* make it hard/non-trivial to maintain test cases due to changes to -canonicalize. I understand it makes it much easier/intuitive for you to write the CHECK lines - more compact IR but in general combining multiple passes for for something that is meant to test just a specific pass is discouraged. Actually, if you really want to run the canonicalizer always after this, you can run it from inside the pass and just drop this -canonicalize. But that still gives one a combined effect. What do you think?

bondhugula added inline comments.Aug 18 2020, 4:50 AM
mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
24

Could you add a line on whether it's always guaranteed to succeed or when it might fail.

bondhugula requested changes to this revision.Aug 18 2020, 4:57 AM
bondhugula added inline comments.
mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
60–62

This looks like a bug. Shouldn't this be ceilDiv?

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
10

Shouldn't there be 4 iterations in the outer loop here?

This revision now requires changes to proceed.Aug 18 2020, 4:57 AM
flaub marked 2 inline comments as done.Aug 18 2020, 8:40 PM

Hey Frank, didn't mean for this to force you to scale back the patch. If it helps I can implement the pattern rewriter hooks that you would need. Just let me know.

  • River

No worries River, thanks for the offer. I think we're still exploring the design space here with regards to being able to configure canonicalizations. Perhaps it's just a bad idea and I need to rethink it. I'm happy to back away and take more time to think thru a better solution so that we don't paint ourselves into a corner.

All that said, it would be helpful to see your vision for updating the default implementation of PatternRewriter, it wasn't clear to me how big of a refactor you were looking for, so I was trying to do the most minimal thing. I think there's a separate issue which is how to prevent authors of pattern rewriters from breaking the rules; it seems very easy to do so right now and at present we see no errors or other indications that this has happened.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2513

Seems like this one improves readability, because otherwise it's the lengthy: AffineParallelOp::operand_range

mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
60–62

Good catch!

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
2

I've dropped the test that relies on canonicalization because it's true that as a unit test, we don't need to also test the combined capability. For context, this pass was originally written as a canonicalization pattern. I think a further refinement would be to provide a way to call the transformation directly without going thru a pass to allow callers to decide whether they want to combine canonicalization with normalization. The combination is what I require in my current use cases. But as a general purpose library I see the value in not restricting other users to that particular use case.

10

Yes! Thanks for catching this!

flaub updated this revision to Diff 286470.Aug 18 2020, 8:52 PM
bondhugula added inline comments.Aug 18 2020, 11:29 PM
mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
38

steps should all be stored in int64_t for consistency.

40

A comment here and/or one for the isWorkPending |= ... line in the body.

41

auto -> int64_t

bondhugula accepted this revision.Aug 18 2020, 11:33 PM

Thanks for addressing something. Looks good to me. Please take care of the remaining minor comments for doc.

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
4

Nit: Set -> Normalize

This revision is now accepted and ready to land.Aug 18 2020, 11:33 PM
flaub updated this revision to Diff 286904.Aug 20 2020, 3:15 PM
flaub marked 4 inline comments as done.
This revision was automatically updated to reflect the committed changes.
flaub added a comment.Aug 20 2020, 3:26 PM

Thanks for the review everyone, I've made tiny tweaks after further feedback. This includes extracting the core transformation of normalization into a separate utility function (in case other users want to integrate this transformation in a different pass), and also adding a helper method AffineParallelOp::getSteps().