This can prevent unnecessary host synchronization.
Use op.isKnownTerminator()? Be careful, though: unknown operations are reported as not being terminators even if they are. So !op.isKnownNonTerminator() is sometimes the better choice.
Yeah, I think using !isKnownNonTerminator() is the right approach here.
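To make the difference concrete, here is a simplified standalone C++ model, not MLIR's API. Each op's terminator status is known-yes, known-no, or unknown (as for an unknown operation). The `Op` struct, the `TerminatorKnowledge` enum and `mayBeTerminator` are hypothetical stand-ins that only mimic the intent of `isKnownTerminator()` and `isKnownNonTerminator()`.

```cpp
#include <cassert>

// Hypothetical model: known ops say whether they are terminators;
// unknown ops cannot say either way.
enum class TerminatorKnowledge { KnownTerminator, KnownNonTerminator, Unknown };

struct Op {
  TerminatorKnowledge knowledge;

  // Mimics the intent of isKnownTerminator(): true only when we are sure.
  bool isKnownTerminator() const {
    return knowledge == TerminatorKnowledge::KnownTerminator;
  }

  // Mimics the intent of isKnownNonTerminator(): true only when we are sure.
  bool isKnownNonTerminator() const {
    return knowledge == TerminatorKnowledge::KnownNonTerminator;
  }
};

// Conservative check: treat anything that might be a terminator as one.
bool mayBeTerminator(const Op &op) { return !op.isKnownNonTerminator(); }
```

For an unknown op both predicates return false, so only the negated non-terminator check errs on the safe side.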
How could it be a terminator? Or is this just for reuse?
You can also write isa<async::ExecuteOp, async::AwaitOp>(user).
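LLVM's `isa<>` accepts several type arguments and returns true if the value is an instance of any of them, which is what makes that one-liner work. As a rough standalone illustration of the behavior (not LLVM's implementation, which dispatches through `classof` rather than RTTI), a fold over `dynamic_cast` gives the same answer. `Operation`, `ExecuteOp`, `AwaitOp`, `LaunchOp` and `isAnyOf` below are hypothetical.

```cpp
#include <cassert>

// Hypothetical class hierarchy standing in for operations.
struct Operation { virtual ~Operation() = default; };
struct ExecuteOp : Operation {};
struct AwaitOp : Operation {};
struct LaunchOp : Operation {};

// True if `op` is an instance of any of the listed types.
// The C++17 fold expands to (cast<T1> != nullptr) || (cast<T2> != nullptr) || ...
template <typename... Ts>
bool isAnyOf(const Operation *op) {
  return (... || (dynamic_cast<const Ts *>(op) != nullptr));
}
```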
I am not sure why this is done. If the user of the token is a wait
Should this not be the first operation after op, async or not?
I added a comment and split it into two separate functions.
This should have been drop_back(count) to exclude the newly added gpu.async.token results.
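The drop_back(count) fix relies on the new token results being appended after the original ones. A minimal standalone sketch of that slicing, assuming the results are held in a plain `std::vector` (a stand-in for the op's result range; `llvm::ArrayRef::drop_back` plays the same role there) and using the hypothetical helper `originalResults`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every result except the last `count`, i.e. the
// results that existed before `count` new tokens were appended.
std::vector<std::string> originalResults(const std::vector<std::string> &results,
                                         std::size_t count) {
  assert(count <= results.size() && "cannot drop more results than exist");
  return std::vector<std::string>(
      results.begin(), results.end() - static_cast<std::ptrdiff_t>(count));
}
```

Dropping from the front (or forgetting to drop at all) would instead hand the newly added tokens back to the code that only expects the original results.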
Say you have something like this:
    %token, %async_gpu_token = async.execute() ...
    async.await %token
This adds %gpu_token = async.await %async_gpu_token, and then further down we add gpu.wait %gpu_token.
I added a comment.
I think the confusion came from my reusing op. I introduced a separate variable now.
This function adds a gpu.wait between op and the first async operation or terminator that follows it.
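The placement rule can be modeled outside MLIR as a forward scan over a block: starting right after `op`, stop at the first op that is either async or possibly a terminator, and place the gpu.wait immediately before it. The `SimpleOp` fields and `findWaitPosition` are hypothetical; the terminator test uses the conservative !isKnownNonTerminator reading discussed earlier in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of an operation in a block.
struct SimpleOp {
  bool isAsync;               // e.g. a gpu op that produces or consumes async tokens
  bool isKnownNonTerminator;  // false for terminators and for unknown ops
};

// Returns the index before which the gpu.wait should be inserted: the first
// op after `opIndex` that is async or might be a terminator. Returns
// block.size() if no such op exists (i.e. insert at the end of the block).
std::size_t findWaitPosition(const std::vector<SimpleOp> &block,
                             std::size_t opIndex) {
  for (std::size_t i = opIndex + 1; i < block.size(); ++i) {
    const SimpleOp &candidate = block[i];
    if (candidate.isAsync || !candidate.isKnownNonTerminator)
      return i;
  }
  return block.size();
}
```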
Thanks, much clearer now. I am a bit confused by the types being used, but otherwise this looks great.
This explicitly walks the single block, excluding the terminator, so it would already break if it gets remodeled. That is why I was confused below.
Is the getOperation needed here?
Does the execute op return an async.token or an async.value<async.token>? I assumed the latter, because then the body of the execute can unwrap it into an async.token, or the async.await can do the unwrapping.
The token is an async value because it is only created during the execution of the parent async region. It could be a stream that gets created in there, no?
If you do not map anything here, why the mapper?
This is nitpicking, but shouldn't they be inserted before? Originally they also happened before.
Added more comments and dropped getOperation().
The create semantics of ExecuteOp changed recently: it now automatically wraps the result types in async.value<> types.
The interface needs a mapper. Do I need to map anything?
If we insert them before, we won't advance past them below (because the async.await on the token has side effects), so we would never find a gpu async op to pair it with.
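This reasoning can be illustrated with a hypothetical model of the forward search: starting at a given position, skip over side-effect-free ops and return the first gpu async op, but give up as soon as an op with side effects is reached. If the side-effecting async.await sits where the search begins, nothing is ever found. `ModelOp` and `findPairableGpuOp` are illustrative names, not the pass's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified op: only the two properties the search looks at.
struct ModelOp {
  bool hasSideEffects;
  bool isGpuAsync;
};

// Walks forward from `start` and returns the index of the first gpu async op,
// or std::nullopt if an op with side effects is reached first.
std::optional<std::size_t> findPairableGpuOp(const std::vector<ModelOp> &ops,
                                             std::size_t start) {
  for (std::size_t i = start; i < ops.size(); ++i) {
    if (ops[i].isGpuAsync)
      return i;
    if (ops[i].hasSideEffects)
      return std::nullopt;  // cannot advance past a side-effecting op
  }
  return std::nullopt;
}
```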