This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Move gpu.wait ops from async.execute regions to its dependencies.
ClosedPublic

Authored by csigg on Oct 28 2020, 2:59 PM.

Details

Summary

This can prevent unnecessary host synchronization.

Diff Detail

Event Timeline

csigg created this revision.Oct 28 2020, 2:59 PM
csigg requested review of this revision.Oct 28 2020, 2:59 PM
csigg updated this revision to Diff 301759.Oct 29 2020, 2:22 PM

Rebase.

herhut added inline comments.Nov 26 2020, 7:02 AM
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
37

Use op.isKnownTerminator()? But you have to be careful because unknown operations, even if they are terminators, are reported as not being a terminator. So sometimes !op.isKnownNonTerminator is the better choice.

51

Yeah, I think using the !isKnownNonTerminator is the right approach here.

133

How could it be a terminator? Or is this just for reuse?

181

Why drop?

194

You can also do isa<async::ExecuteOp, async::AwaitOp>(user)

206

I am not sure why this is done. If the user of the token is a wait

210

Use .result()?

224

Should this not be the first operation after op, async or not?

csigg updated this revision to Diff 308022.Nov 27 2020, 4:49 AM
csigg marked 2 inline comments as done.

Apply reviewer comments and adjust to latest async dialect changes.

csigg marked 2 inline comments as done.Nov 27 2020, 5:05 AM
csigg added inline comments.
mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
133

I added a comment and split it in two separate functions now.

181

This should have been drop_back(count) to exclude the newly added gpu.async.tokens.

206

Say you have something like this:

%token, %async_gpu_token = async.execute() ...
async.await %token

This adds %gpu_token = async.await %async_gpu_token, and then further down we add gpu.await %gpu_token.

I added a comment.

224

I think the confusion came from that I reused op. I introduced it now.

This function adds gpu.wait between it and the first async/terminator following it.
Before find_if, it either points to the beginning of an async.execute body or just after the block of async.await ops (the first one synchronizing on the token, the others waiting for the !gpu.async.token).

csigg updated this revision to Diff 308025.Nov 27 2020, 5:05 AM

Rest of reviewer comments.

herhut added a comment.Dec 1 2020, 9:55 AM

Thanks, much clearer now. I am confused a bit by the types that get used but otherwise this looks great.

mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
127

This explicitly walks the single block without the terminator. So this would already break if it gets remodeled. That is why I was confused below.

170

Is the getOperation needed here?

171

Does the execute op return an async.token or an async.value<async.token>? I assumed the latter, because then the body of the execute can unwrap it into an async.token or the async.wait can do the unwrapping.

The token is an async value because it is only created during the execution of the parent async region. It could be a stream that gets created in there, no?

178

If you do not map anything here, why the mapper?

211

This is nit picking, but should they not be inserted before? Because originally they also happened before.

csigg updated this revision to Diff 308900.Dec 2 2020, 12:50 AM
csigg marked 2 inline comments as done.

Add some additional comments, drop getOperation().

mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
171

The create semantics of ExecuteOp changed recently and now automatically wraps the result types in async.value<>s.
So yes, the execute op returns async.value<gpu.async.token>s.

178

The interface needs a mapper. Do I need to map anything?

211

If we insert them before, we won't advance below (because the async.await on the token has side-effects) and will never find a gpu async op to pair it up with.

herhut accepted this revision.Dec 2 2020, 1:15 AM

Thanks!

mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
178

Ah, I see. I assumed it had a default of {} for the mapper, which some other variants of this functionality have.

This revision is now accepted and ready to land.Dec 2 2020, 1:15 AM