Page MenuHomePhabricator

[mlir][gpu] Move gpu.wait ops from async.execute regions to its dependencies.

Authored by csigg on Oct 28 2020, 2:59 PM.



This can prevent unnecessary host synchronization.

Diff Detail

Event Timeline

csigg created this revision.Oct 28 2020, 2:59 PM
csigg requested review of this revision.Oct 28 2020, 2:59 PM
csigg updated this revision to Diff 301759.Oct 29 2020, 2:22 PM


herhut added inline comments.Nov 26 2020, 7:02 AM

Use op.isKnownTerminator()? But you have to be careful because unknown operations, even if they are terminators, are reported as not being a terminator. So sometimes !op.isKnownNonTerminator is the better choice.


Yeah, I think using the !isKnownNonTerminator is the right approach here.


How could it be a terminator? Or is this just for reuse?


Why drop?


You can also do isa<async::ExecuteOp, async::AwaitOp>(user)


I am not sure why this is done. If the user of the token is a wait


Use .result()?


Should this not be the first operation after op, async or not?

csigg updated this revision to Diff 308022.Nov 27 2020, 4:49 AM
csigg marked 2 inline comments as done.

Apply reviewer comments and adjust to latest async dialect changes.

csigg marked 2 inline comments as done.Nov 27 2020, 5:05 AM
csigg added inline comments.

I added a comment and split it in two separate functions now.


This should have been drop_back(count) to exclude the newly added gpu.async.tokens.


Say you have something like this:

%token, %async_gpu_token = async.execute() ...
async.await %token

This adds %gpu_token = async.await %async_gpu_token, and then further down we add gpu.await %gpu_token.

I added a comment.


I think the confusion came from that I reused op. I introduced it now.

This function adds gpu.wait between it and the first async/terminator following it.
Before find_if, it either points to the beginning of an async.execute body or just after the block of async.await ops (the first one synchronizing on the token, the others waiting for the !gpu.async.token).

csigg updated this revision to Diff 308025.Nov 27 2020, 5:05 AM

Rest of reviewer comments.

herhut added a comment.Dec 1 2020, 9:55 AM

Thanks, much clearer now. I am confused a bit by the types that get used but otherwise this looks great.


This explicitly walks the single block without the terminator. So this would already break if it gets remodeled. That is why I was confused below.


Is the getOperation needed here?


Does the execute op return an async.token or an async.value<async.token>? I assumed the latter, because then the body of the execute can unwrap it into an async.token or the async.wait can do the unwrapping.

The token is an async value because it is only created during the execution of the parent async region. It could be a stream that gets created in there, no?


If you do not map anything here, why the mapper?


This is nit picking, but should they not be inserted before? Because originally they also happened before.

csigg updated this revision to Diff 308900.Dec 2 2020, 12:50 AM
csigg marked 2 inline comments as done.

Add some additional comments, drop getOperation().


The create semantics of ExecuteOp changed recently and now automatically wraps the result types in async.value<>s.
So yes, the execute op returns async.value<gpu.async.token>s.


The interface needs a mapper. Do I need to map anything?


If we insert them before, we won't advance below (because the async.await on the token has side-effects) and will never find a gpu async op to pair it up with.

herhut accepted this revision.Dec 2 2020, 1:15 AM



Ah, I see. I assumed it had a default of {} for the mapper, which some other variants of this functionality have.

This revision is now accepted and ready to land.Dec 2 2020, 1:15 AM