Page MenuHomePhabricator

[mlir][gpu] Handle async in gpu.launch_func lowering.
ClosedPublic

Authored by csigg on Oct 21 2020, 11:40 PM.

Details

Summary

For the synchronous case, destroy the stream after synchronization.

Sneak in a unrelated change to report why the gpu.wait conversion pattern didn't match.

Diff Detail

Unit TestsFailed

TimeTest
270 mslinux > MLIR.Conversion/SPIRVToLLVM::lower-host-to-llvm-calls.mlir
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/bin/mlir-opt --lower-host-to-llvm /mnt/disks/ssd0/agent/llvm-project/mlir/test/Conversion/SPIRVToLLVM/lower-host-to-llvm-calls.mlir | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/mlir/test/Conversion/SPIRVToLLVM/lower-host-to-llvm-calls.mlir
130 mswindows > MLIR.Conversion/SPIRVToLLVM::lower-host-to-llvm-calls.mlir
Script: -- : 'RUN: at line 1'; c:\ws\w32-1\llvm-project\premerge-checks\build\bin\mlir-opt.exe --lower-host-to-llvm C:\ws\w32-1\llvm-project\premerge-checks\mlir\test\Conversion\SPIRVToLLVM\lower-host-to-llvm-calls.mlir | c:\ws\w32-1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w32-1\llvm-project\premerge-checks\mlir\test\Conversion\SPIRVToLLVM\lower-host-to-llvm-calls.mlir

Event Timeline

csigg created this revision.Oct 21 2020, 11:40 PM
csigg requested review of this revision.Oct 21 2020, 11:40 PM
herhut requested changes to this revision.Oct 22 2020, 6:26 AM

The lifetime management of streams is not correct, I think. Even with the assumption that "forks" are explicit, we might leak streams. For instance if we have

%t3 = gpu.wait aysnc [%t1, %t2] // last use of %t1 and %t2
...

the streams would never be freed. I wonder whether it would be easier to not deallocate streams at all when lowering and then have a fix-up pass that inserts the stream destruction where needed.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
502

If there is more than one dependency, this would need to insert synchronization, rigth?

518

This assumes the property that all "forks" (i.e., tokens that have multiple uses) are made explicit via gpu.wait, right? This idea is not captured in the comments anywhere and also we do not enforce it in the rewritings. If this property is not true, the lowering will be incorrect.

519

nit: left-over tab.

This revision now requires changes to proceed.Oct 22 2020, 6:26 AM
csigg marked an inline comment as done.Oct 22 2020, 10:05 AM

The lifetime management of streams is not correct, I think. Even with the assumption that "forks" are explicit, we might leak streams. For instance if we have

%t3 = gpu.wait aysnc [%t1, %t2] // last use of %t1 and %t2
...

the streams would never be freed. I wonder whether it would be easier to not deallocate streams at all when lowering and then have a fix-up pass that inserts the stream destruction where needed.

Yes, you are right. The gpu.wait async lowering doesn't currently destroy the streams it depends on. The gpu.wait async ops we create in D89937 have no dependencies.

My plan is that gpu.wait async with dependencies will be inserted for tokens that have multiple uses (the explicit "forks"). There, the streams we synchronize on should not be destroyed.

We should verify that we don't leak, but hopefully that will be a little easier than a fix-up pass.

The change here primarily fixes the existing leaks of streams that we encountered benchmark when lowering non-async gpu.launch_func.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
502

Correct. The rewriter checks that there is at most one dependency on line 449.

518

Yes, that is correct. I added a comment explaining this assumption. I will address the TODO in a follow up change.

csigg updated this revision to Diff 300032.Oct 22 2020, 10:10 AM

Add a comment that token are expected to not be reused.

csigg added inline comments.Oct 22 2020, 10:13 AM
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
519

Isn't this a marker that the line didn't change? I can't find any tab in the code.

csigg updated this revision to Diff 300632.Oct 26 2020, 4:29 AM

Fail when synchronous version has async dependencies.

mehdi_amini added inline comments.Oct 26 2020, 6:29 PM
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
323

drive by comment: it's really nice that you add messages!

herhut accepted this revision.Oct 29 2020, 8:57 AM
herhut added inline comments.
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
519

Oh, interesting. My brain is primed to other review tools :)

This revision is now accepted and ready to land.Oct 29 2020, 8:57 AM
mehdi_amini added inline comments.Oct 29 2020, 9:36 AM
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
519

Yeah phab indicates like this that only the indentation changed.

csigg updated this revision to Diff 301738.Oct 29 2020, 1:50 PM

Rebase.

This revision was landed with ongoing or failed builds.Oct 29 2020, 2:16 PM
This revision was automatically updated to reflect the committed changes.