This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Lower enqueued blocks and generate runtime metadata
ClosedPublic

Authored by yaxunl on Oct 5 2017, 7:14 PM.

Details

Summary

This patch adds a post-linking pass which replaces the function pointer of enqueued
block kernel with a global variable (runtime handle) and adds
"runtime-handle" attribute to the enqueued block kernel.

In LLVM CodeGen the runtime-handle metadata will be translated to
RuntimeHandle metadata in code object. Runtime allocates a global buffer
for each kernel with RuntimeHandel metadata and saves the kernel address
required for the AQL packet into the buffer. __enqueue_kernel function
in device library knows that the invoke function pointer in the block
literal is actually runtime handle and loads the kernel address from it
and puts it into AQL packet for dispatching.

This cannot be done in FE since FE cannot create a unique global variable
with external linkage across LLVM modules. The global variable with internal
linkage does not work since optimization passes will try to replace loads
of the global variable with its initialization value.

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl created this revision.Oct 5 2017, 7:14 PM
rampitec added inline comments.Oct 6 2017, 10:55 AM
lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
44 ↗(On Diff #117949)

The comment looks wrong.

74 ↗(On Diff #117949)

Can you change I with F? "I" associates with an instruction.

78 ↗(On Diff #117949)

Users of user_begin() can be empty.

arsenm added inline comments.Oct 6 2017, 11:05 AM
lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
63 ↗(On Diff #117949)

Re-use DEBUG_TYPE

76–78 ↗(On Diff #117949)

Looking at the users like this confuse me. Don't all uses need to be handled always? Why isn't this just a range loop over all the users?

83 ↗(On Diff #117949)

Twine

86 ↗(On Diff #117949)

Why external linkage?

90 ↗(On Diff #117949)

llvm:: unnecessary

test/CodeGen/AMDGPU/enqueue-kernel.ll
1 ↗(On Diff #117949)

-verify-machineinstrs doesn't make sense with an opt test

79 ↗(On Diff #117949)

Remove unnecessary attributes

83–84 ↗(On Diff #117949)

These should be removable

yaxunl updated this revision to Diff 118053.Oct 6 2017, 12:14 PM
yaxunl marked 11 inline comments as done.

Revised by Matt's and Stas' comments.

lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
76–78 ↗(On Diff #117949)

We only translate IRs emitted by Clang when translating enqueue_kernel functions. These IRs satisfy the conditions given here. Therefore we only need to handle the first use.

If IR does not satisfy these conditions, it is not generated by Clang and could be anything. We just do not process them.

78 ↗(On Diff #117949)

we have check I.user_begin()->hasOneUse().

86 ↗(On Diff #117949)

This needs to be an external symbol in code object. Runtime will allocate a global buffer for it and save the kernel address to it.

This revision is now accepted and ready to land.Oct 6 2017, 12:19 PM
kzhuravl edited edge metadata.Oct 6 2017, 1:03 PM

Can you also update metadata docs at AMDGPUUSAGE

yaxunl updated this revision to Diff 118069.Oct 6 2017, 1:47 PM

Revised by Konstantin's comments. Add documentation for metadata.

kzhuravl accepted this revision.Oct 6 2017, 2:09 PM

LGTM.

arsenm added inline comments.Oct 6 2017, 2:12 PM
lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
82 ↗(On Diff #118069)

This is still producing a std::string. This should be an explicit Twine, not auto

yaxunl marked an inline comment as done.Oct 6 2017, 2:17 PM
yaxunl added inline comments.
lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
82 ↗(On Diff #118069)

I tried that but it does not work. I got corrupted string when use it. The Twine documentation says it does not live beyond the statement where it is defined. Since I need to use RuntimeHandle in two places, it cannot be Twine.

This revision was automatically updated to reflect the committed changes.
yaxunl marked an inline comment as done.