This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Move enqueued block handling into clang
AcceptedPublic

Authored by arsenm on Jan 13 2023, 8:37 AM.

Details

Summary

The previous implementation wasn't maintaining a faithful IR
representation of how this really works. The value returned by
createEnqueuedBlockKernel wasn't actually used as a function, and
hacked up later to be a pointer to the runtime handle global
variable. In reality, the enqueued block is a struct where the first
field is a pointer to the kernel descriptor, not the kernel itself. We
were also relying on passing around a reference to a global using a
string attribute containing its name. It's better to base this on a
proper IR symbol reference during final emission.

This now avoids using a function attribute on kernels and avoids using
the additional "runtime-handle" attribute to populate the final
metadata. Instead, associate the runtime handle reference to the
kernel with the !associated global metadata. We can then get a final,
correctly mangled name at the end.

I couldn't figure out how to get rename-with-external-symbol behavior
using a combination of comdats and aliases, so leaves an IR pass to
externalize the runtime handles for codegen. If anything breaks, it's
most likely this, so leave avoiding this for a later step. Use a
special section name to enable this behavior. This also means it's
possible to declare enqueuable kernels in source without going through
the dedicated block syntax or other dedicated compiler support.

We could move towards initializing the runtime handle in the
compiler/linker. I have a working patch where the linker sets up the
first field of the handle, avoiding the need to export the block
kernel symbol for the runtime. We would need new relocations to get
the private and group sizes, but that would avoid the runtime's
special case handling that requires the device_enqueue_symbol metadata
field.

Handle autoupgrade from the old kernel attribute. Not sure where I
could put the code shared with clang (maybe could rename
AMDGPUEmitPrintf to AMDGPUUtils).

Diff Detail

Event Timeline

arsenm created this revision.Jan 13 2023, 8:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2023, 8:37 AM
arsenm requested review of this revision.Jan 13 2023, 8:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2023, 8:37 AM
Herald added a subscriber: wdng. · View Herald Transcript
Anastasia added inline comments.Jan 15 2023, 12:31 PM
clang/lib/CodeGen/TargetInfo.cpp
12440

Is this AMDGPU target specific? If so perhaps it's better to reflect this in the name.

jmmartinez added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
12580

Just a cosmetical remark: Is there any reason to keep the /*isConstant=*/, /*Initializer=*/, ... comments? I think it would be better to avoid them.

LGTM, to the extent that I can see that the change does what is advertised, and the ultimately emitted HSA metadata preserves the current contract with the runtime.

A couple of tests can use a little more explanatory comments as noted.

clang/lib/CodeGen/TargetInfo.cpp
12580

FWIW, I find these comments very helpful when spelunking through code. I could sympathise with not needing Initializer= because the value name makes it clear. But an undecorated constant literal like "true" or "10" or "nullptr" works best when accompanied by a comment.

llvm/test/Bitcode/amdgpu-autoupgrade-enqueued-block.ll
70

I did not understand what is being tested here.

llvm/test/CodeGen/AMDGPU/amdgpu-export-kernel-runtime-handles.ll
3

Is there any visible effect of the pass being tested? Or the intention is simply to check that the output is the same as input, and there is no error?

sameerds accepted this revision.Feb 5 2023, 9:01 PM
This revision is now accepted and ready to land.Feb 5 2023, 9:01 PM

Overall looks good.

llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
299

Do we really need/want to update code object v2?

arsenm added inline comments.Apr 4 2023, 8:32 AM
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
299

as long as the code is here yes. Not updating it would mean maintaining two paths in the implementation. This is just changing the internal representation

barannikov88 added inline comments.
clang/lib/CodeGen/Targets/AMDGPU.cpp
520 ↗(On Diff #538088)

Minor suggestion: you can get these types from CGF / CGM (Int8Ty etc.)

arsenm updated this revision to Diff 554821.Aug 30 2023, 1:42 PM
arsenm marked an inline comment as done.
arsenm added inline comments.Nov 14 2023, 2:26 AM
llvm/lib/IR/CMakeLists.txt
80

This introduces a circular dependency between LLVMCore and TransformUtils. Options are:

  1. Move appendToUsed into Module
  2. Don't bother with bitcode compatibility for this
  3. Avoid depending on llvm.used. I know I tried to do this but it was so long ago I don't remember how I ended up on this solution
arsenm updated this revision to Diff 558095.Nov 14 2023, 3:10 AM

Drop bitcode auto upgrade handling