Page MenuHomePhabricator

[mlir][gpu] Handle async in gpu.launch_func lowering.
ClosedPublic

Authored by csigg on Oct 21 2020, 11:40 PM.

Details

Summary

For the synchronous case, destroy the stream after synchronization.

Sneak in a unrelated change to report why the gpu.wait conversion pattern didn't match.

Diff Detail

Event Timeline

csigg created this revision.Oct 21 2020, 11:40 PM
csigg requested review of this revision.Oct 21 2020, 11:40 PM
herhut requested changes to this revision.Oct 22 2020, 6:26 AM

The lifetime management of streams is not correct, I think. Even with the assumption that "forks" are explicit, we might leak streams. For instance if we have

%t3 = gpu.wait aysnc [%t1, %t2] // last use of %t1 and %t2
...

the streams would never be freed. I wonder whether it would be easier to not deallocate streams at all when lowering and then have a fix-up pass that inserts the stream destruction where needed.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
498

If there is more than one dependency, this would need to insert synchronization, rigth?

515

This assumes the property that all "forks" (i.e., tokens that have multiple uses) are made explicit via gpu.wait, right? This idea is not captured in the comments anywhere and also we do not enforce it in the rewritings. If this property is not true, the lowering will be incorrect.

516

nit: left-over tab.

This revision now requires changes to proceed.Oct 22 2020, 6:26 AM
csigg marked an inline comment as done.Oct 22 2020, 10:05 AM

The lifetime management of streams is not correct, I think. Even with the assumption that "forks" are explicit, we might leak streams. For instance if we have

%t3 = gpu.wait aysnc [%t1, %t2] // last use of %t1 and %t2
...

the streams would never be freed. I wonder whether it would be easier to not deallocate streams at all when lowering and then have a fix-up pass that inserts the stream destruction where needed.

Yes, you are right. The gpu.wait async lowering doesn't currently destroy the streams it depends on. The gpu.wait async ops we create in D89937 have no dependencies.

My plan is that gpu.wait async with dependencies will be inserted for tokens that have multiple uses (the explicit "forks"). There, the streams we synchronize on should not be destroyed.

We should verify that we don't leak, but hopefully that will be a little easier than a fix-up pass.

The change here primarily fixes the existing leaks of streams that we encountered benchmark when lowering non-async gpu.launch_func.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
498

Correct. The rewriter checks that there is at most one dependency on line 449.

515

Yes, that is correct. I added a comment explaining this assumption. I will address the TODO in a follow up change.

csigg updated this revision to Diff 300032.Oct 22 2020, 10:10 AM

Add a comment that token are expected to not be reused.

csigg added inline comments.Oct 22 2020, 10:13 AM
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
516

Isn't this a marker that the line didn't change? I can't find any tab in the code.

csigg updated this revision to Diff 300632.Oct 26 2020, 4:29 AM

Fail when synchronous version has async dependencies.

mehdi_amini added inline comments.Oct 26 2020, 6:29 PM
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
320

drive by comment: it's really nice that you add messages!

herhut accepted this revision.Oct 29 2020, 8:57 AM
herhut added inline comments.
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
516

Oh, interesting. My brain is primed to other review tools :)

This revision is now accepted and ready to land.Oct 29 2020, 8:57 AM
mehdi_amini added inline comments.Oct 29 2020, 9:36 AM
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
516

Yeah phab indicates like this that only the indentation changed.

csigg updated this revision to Diff 301738.Oct 29 2020, 1:50 PM

Rebase.

This revision was landed with ongoing or failed builds.Oct 29 2020, 2:16 PM
This revision was automatically updated to reflect the committed changes.