This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Add pass to make GPU ops within a region execute asynchronously.
ClosedPublic

Authored by csigg on Oct 22 2020, 1:05 AM.

Diff Detail

Event Timeline

csigg created this revision.Oct 22 2020, 1:05 AM
csigg requested review of this revision.Oct 22 2020, 1:05 AM
herhut requested changes to this revision.Oct 22 2020, 6:44 AM
herhut added inline comments.
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
112

I find it somewhat hard to read that this is a carried state. A better name or comment would already help a lot.

134

Is it useful to formulate this as a pattern? It could equally well be just a walk over all operations, as there is no matching going on, at all.

This revision now requires changes to proceed.Oct 22 2020, 6:44 AM
csigg marked an inline comment as done.Oct 22 2020, 10:50 AM
csigg added inline comments.
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
134

I'm using notifyMatchFailure, and maybe/theoretically there could be different patterns in the same pass. But to be honest I mainly used RewritePattern because I vaguely remember that River didn't like the raw walk in the all-reduce lowering. If you think a raw walk is better here, I'm happy to change it.

rriddle added inline comments.Oct 22 2020, 11:03 AM
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
134

I don't exactly remember the details there. A normal walk is fine, but I tend to err more towards patterns because they compose better/have better logging/remove the need to write transformation application logic when things are more complicated than a simple walk. Not to say that you shouldn't just use a simple walk here, but I'd say it depends on how this pass is intended to evolve.

mehdi_amini added inline comments.Oct 22 2020, 4:14 PM
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
134

Because there is state in the pattern keeping a value, this is to me a strong indication that we should not use a pattern here. This is just hiding the traversal logic, while it seems core to the algorithm.

csigg updated this revision to Diff 300170.Oct 22 2020, 11:28 PM

Use raw walk instead of a RewritePattern.

This does seem a little cleaner, at least at the moment:

  • It's easier to make the pass fail when some preconditions aren't met.
  • The rewriter.replaceOp() assumes that we replace the root, which is not what we did.
sanjoy.google added inline comments.Oct 23 2020, 12:02 AM
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
38

The more common term for this is to "thread the token through", WDYT?

79

Is this correct if the region has cycles or diamond control flow?

csigg added inline comments.Oct 23 2020, 12:36 PM
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
38

Thanks for the suggestion. Yes, that sound better. I will include that change in my next update (it's always a bit of a dance, so I'm waiting if there are more requested changes).

79

I thought control flow requires separate blocks, each with their own terminator (which triggers a host synchronization and clears the currentToken). Is that not correct?

134

I've removed the pattern, but it did not contain state. The Callback containing the state was instantiated inside the matchAndRewrite().

csigg updated this revision to Diff 300609.Oct 26 2020, 2:53 AM

Error out on encountering pre-existing gpu.wait.

sanjoy.google added inline comments.Oct 28 2020, 10:32 PM
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
79

You're right, I didn't notice that bit.

herhut accepted this revision.Oct 29 2020, 9:06 AM

Thanks!

This revision is now accepted and ready to land.Oct 29 2020, 9:06 AM
csigg updated this revision to Diff 301739.Oct 29 2020, 1:53 PM

Rebase.

This revision was landed with ongoing or failed builds.Oct 29 2020, 2:18 PM
This revision was automatically updated to reflect the committed changes.
mehdi_amini added inline comments.Apr 13 2023, 9:41 AM
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
83

Is there a reason why this is using Operation::create to clone the operation instead of one of the clone() routine or could we just switch it?