This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Add lowering to LLVM for `gpu.wait` and `gpu.wait async`.
ClosedPublic

Authored by csigg on Oct 19 2020, 4:23 AM.

Diff Detail

Event Timeline

csigg created this revision.Oct 19 2020, 4:23 AM
csigg requested review of this revision.Oct 19 2020, 4:23 AM
csigg updated this revision to Diff 299009.Oct 19 2020, 4:55 AM

Remove checking operands to be LLVMType because it's guaranteed by the type conversion.

herhut added inline comments.Oct 20 2020, 4:15 AM
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
327

defOp could be null, e.g. a block argument, in which case you have to insert these late (maybe at block start?)

330

This is just asyncDependency, right?

330

Does this actually work with a gpu.call? That would create a stream and then enqueue the kernel on the stream, which is a use of the stream. So the record event would be inserted before that use, right?

csigg updated this revision to Diff 299562.Oct 20 2020, 11:25 PM

Apply thoughtful comments from herhut.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
327

Oh, thanks for catching this. I've changed it to insert the mgpuEventRecord at block start if defOp null.

330

Yes, much cleaner.

330

Good point. I changed the insertion point to after the definition of the original operand.

If I understand correctly,

%t0 = gpu.launch_func async ...
%t1 = gpu.wait async [%t0]

will now convert to

%stream0 = llvm.call @mgpuStreamCreate()
llvm.call @mgpuLaunchKernel(%stream0, ...)       // assumption: inserted before gpu.launch_func
%t0 = gpu.launch_func async ...                  // marked for deletion
%event = llvm.call @mgpuEventCreate()            // after definition of %t0, not %stream0
llvm.call @mgpuEventRecord(%event, %stream0)
%stream1 = llvm.call @mgpuStreamCreate()
llvm.call @mgpuStreamWaitEvent(%stream1, %event)
llvm.call @mgpuEventDestroy(%event)
%t1 = gpu.wait async [%t0]                       // marked for deletion

Side note: my intention is that streams will only be created with explicit gpu.wait async ops, but your point still holds.

herhut accepted this revision.Oct 21 2020, 7:19 AM

Thanks.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp
326

Can you add a comment here why getOperands is used. This is somewhat special and not obvious.

330

Can you just copy this to the comment in the code. I think it would serve well as documentation,

mlir/test/Conversion/GPUCommon/lower-wait-to-gpu-runtime-calls.mlir
5

Please remember to add a test with a gpu.launch once that supports async launches.

This revision is now accepted and ready to land.Oct 21 2020, 7:19 AM
This revision was landed with ongoing or failed builds.Oct 21 2020, 9:20 AM
This revision was automatically updated to reflect the committed changes.