This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration.
ClosedPublic

Authored by kpyzhov on Sep 23 2021, 8:30 AM.

Details

Summary

The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module seems to be not reliable enough. The LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", and MetadataStreamer pass to fail to detect hostcalls, therefore it does not set the "hidden_hostcall_buffer" kernel argument.
This change adds a new module flag: "hostcall" that can be used to detect whether GPU functions use host calls for printf.

Diff Detail

Event Timeline

kpyzhov created this revision.Sep 23 2021, 8:30 AM
kpyzhov requested review of this revision.Sep 23 2021, 8:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2021, 8:30 AM
kpyzhov edited the summary of this revision. (Show Details)Sep 23 2021, 8:31 AM
kpyzhov edited the summary of this revision. (Show Details)
kpyzhov retitled this revision from [AMDGPU] Use "amdgpu-hostcalls" attribute to detect if a kernel function uses any host calls. to [AMDGPU] Use "amdgpu-hostcalls" attribute to detect if a kernel function hostcall for GPU printf..Sep 23 2021, 8:34 AM
kpyzhov edited the summary of this revision. (Show Details)

I do not want to add any more attributes to indicate a feature is used. If anything, we would use an attribute to indicate that there is no host call. I'm not sure an attribute is the right solution here. I think we should just rely on a known externally visible global symbol name

Aomp uses the presence of a symbol to determine if the hostcall machinery needs to be initialised. Using the function symbol messes up inlining so it defines a weak (strictly, common to work around an assembler bug) object within the hostcall entry point.

// The global variable needs_hostcall_buffer is used to detect that
// host services are required. If this function is not inlined, the symbol
// will not be present and the runtime can avoid initialising said support.
__asm__("; hostcall_invoke: record need for hostcall support\n\t"
        ".type needs_hostcall_buffer,@object\n\t"
        ".global needs_hostcall_buffer\n\t"
        ".comm needs_hostcall_buffer,4":::);

I'd suggest hip do the same thing, with a different symbol name to that. It can be detected by looking for that symbol in the code object, same as you do today.

That should be __namespaced, and it would be an abs symbol if the HSA loader understood them, but those limitations notwithstanding it's been working great for ages. Probably over a year since I wrote that.

If the function is inlined, even repeatedly, that asm is still valid. If every call to the function is eliminated, the asm is eliminated and thus the symbol is removed.

JonChesterfield requested changes to this revision.Sep 23 2021, 9:06 AM

Tagging req changes because this is easily implemented without compiler support

This revision now requires changes to proceed.Sep 23 2021, 9:06 AM

Tagging req changes because this is easily implemented without compiler support

What do you mean? We already have compiler support for hostcalls and we are using it.

Aomp uses the presence of a symbol to determine if the hostcall machinery needs to be initialised. Using the function symbol messes up inlining so it defines a weak (strictly, common to work around an assembler bug) object within the hostcall entry point.

// The global variable needs_hostcall_buffer is used to detect that
// host services are required. If this function is not inlined, the symbol
// will not be present and the runtime can avoid initialising said support.
__asm__("; hostcall_invoke: record need for hostcall support\n\t"
        ".type needs_hostcall_buffer,@object\n\t"
        ".global needs_hostcall_buffer\n\t"
        ".comm needs_hostcall_buffer,4":::);

I'd suggest hip do the same thing, with a different symbol name to that. It can be detected by looking for that symbol in the code object, same as you do today.

That should be __namespaced, and it would be an abs symbol if the HSA loader understood them, but those limitations notwithstanding it's been working great for ages. Probably over a year since I wrote that.

If the function is inlined, even repeatedly, that asm is still valid. If every call to the function is eliminated, the asm is eliminated and thus the symbol is removed.

We need to check the IR to know whether host call is needed since backend depends on that to determine whether to emit implicit kernel arg runtime metadata. How do we check IR for your approach? Are you suggesting to convey the host call information by symbol instead of runtime metadata? That would be an ABI change.

We need to check the IR to know whether host call is needed since backend depends on that to determine whether to emit implicit kernel arg runtime metadata. How do we check IR for your approach? Are you suggesting to convey the host call information by symbol instead of runtime metadata? That would be an ABI change.

Yeah, the real requirement here is to have something that remains stable until the metadata streamer. It should not be visible to anything outside the compiler. A function attribute does look like overkill. Can we tweak the linkage type of __ockl_hostcall_internal() to protect it from getting renamed by LTO? Alternatively, this could be one of the rare cases that qualify for"@llvm.compiler.used"?

We need to check the IR to know whether host call is needed since backend depends on that to determine whether to emit implicit kernel arg runtime metadata. How do we check IR for your approach? Are you suggesting to convey the host call information by symbol instead of runtime metadata? That would be an ABI change.

I suppose you'd look for the asm block instead of the function or tag the whole IR module with some metadata. It hadn't occurred to me that hostcall would need any special support in the back end. What runtime metadata stuff does it require?

We need to check the IR to know whether host call is needed since backend depends on that to determine whether to emit implicit kernel arg runtime metadata. How do we check IR for your approach? Are you suggesting to convey the host call information by symbol instead of runtime metadata? That would be an ABI change.

Yeah, the real requirement here is to have something that remains stable until the metadata streamer. It should not be visible to anything outside the compiler. A function attribute does look like overkill. Can we tweak the linkage type of __ockl_hostcall_internal() to protect it from getting renamed by LTO? Alternatively, this could be one of the rare cases that qualify for"@llvm.compiler.used"?

when clang links device library with -mlink-builtin-bitcode, all functions are internalized, so tweak linkage of __ockl_hostcall_internal does not work.

We need to check the IR to know whether host call is needed since backend depends on that to determine whether to emit implicit kernel arg runtime metadata. How do we check IR for your approach? Are you suggesting to convey the host call information by symbol instead of runtime metadata? That would be an ABI change.

I suppose you'd look for the asm block instead of the function or tag the whole IR module with some metadata. It hadn't occurred to me that hostcall would need any special support in the back end. What runtime metadata stuff does it require?

Then you have to check the text of asm block. It does not seem to be a stable way. A module flag seems better way to convey that info in IR.

For the reason why backend needs to know whether a kernel needs hostcall buffer, search "hidden_hostcall_buffer" at https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-metadata . It is a per kernel metadata which runtime uses to decide whether to allocate hostcall buffer for a kernel. Changing that would change ABI. I don't think we want to do that.

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
72

Instead of emit a function attr, we can emit a module flag, then we can avoid checking func attr for every function later.

ronlieb added a subscriber: ronlieb.Oct 1 2021, 7:06 AM
kpyzhov retitled this revision from [AMDGPU] Use "amdgpu-hostcalls" attribute to detect if a kernel function hostcall for GPU printf. to [AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration..Oct 4 2021, 12:11 PM
kpyzhov edited the summary of this revision. (Show Details)
kpyzhov updated this revision to Diff 376988.Oct 4 2021, 12:13 PM
yaxunl added inline comments.Oct 4 2021, 12:32 PM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
66

Should be "Override" since a module needing hostcall overrides a module not needing hostcall.

Also the name of the flag better be "amdgpu_hostcall" to avoid conflict with other targets.

kpyzhov updated this revision to Diff 377037.Oct 4 2021, 2:24 PM
JonChesterfield accepted this revision.Oct 4 2021, 4:38 PM

Module metadata is clearer than checking for a particular function, plus robust to the function being renamed / inlined.

I'm not convinced by passing the state as an implicit kernel argument but that's the status quo.

This revision is now accepted and ready to land.Oct 4 2021, 4:38 PM
This revision was landed with ongoing or failed builds.Oct 5 2021, 6:56 AM
This revision was automatically updated to reflect the committed changes.
kpyzhov updated this revision to Diff 377413.Oct 5 2021, 6:42 PM

Corrected "duplicate flag" error.

This revision was landed with ongoing or failed builds.Oct 5 2021, 6:47 PM