The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module seems to be not reliable enough. The LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", and MetadataStreamer pass to fail to detect hostcalls, therefore it does not set the "hidden_hostcall_buffer" kernel argument.
This change adds a new module flag: "hostcall" that can be used to detect whether GPU functions use host calls for printf.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I do not want to add any more attributes to indicate a feature is used. If anything, we would use an attribute to indicate that there is no host call. I'm not sure an attribute is the right solution here. I think we should just rely on a known externally visible global symbol name
Aomp uses the presence of a symbol to determine if the hostcall machinery needs to be initialised. Using the function symbol messes up inlining so it defines a weak (strictly, common to work around an assembler bug) object within the hostcall entry point.
// The global variable needs_hostcall_buffer is used to detect that // host services are required. If this function is not inlined, the symbol // will not be present and the runtime can avoid initialising said support. __asm__("; hostcall_invoke: record need for hostcall support\n\t" ".type needs_hostcall_buffer,@object\n\t" ".global needs_hostcall_buffer\n\t" ".comm needs_hostcall_buffer,4":::);
I'd suggest hip do the same thing, with a different symbol name to that. It can be detected by looking for that symbol in the code object, same as you do today.
That should be __namespaced, and it would be an abs symbol if the HSA loader understood them, but those limitations notwithstanding it's been working great for ages. Probably over a year since I wrote that.
If the function is inlined, even repeatedly, that asm is still valid. If every call to the function is eliminated, the asm is eliminated and thus the symbol is removed.
What do you mean? We already have compiler support for hostcalls and we are using it.
We need to check the IR to know whether host call is needed since backend depends on that to determine whether to emit implicit kernel arg runtime metadata. How do we check IR for your approach? Are you suggesting to convey the host call information by symbol instead of runtime metadata? That would be an ABI change.
Yeah, the real requirement here is to have something that remains stable until the metadata streamer. It should not be visible to anything outside the compiler. A function attribute does look like overkill. Can we tweak the linkage type of __ockl_hostcall_internal() to protect it from getting renamed by LTO? Alternatively, this could be one of the rare cases that qualify for"@llvm.compiler.used"?
I suppose you'd look for the asm block instead of the function or tag the whole IR module with some metadata. It hadn't occurred to me that hostcall would need any special support in the back end. What runtime metadata stuff does it require?
when clang links device library with -mlink-builtin-bitcode, all functions are internalized, so tweak linkage of __ockl_hostcall_internal does not work.
Then you have to check the text of asm block. It does not seem to be a stable way. A module flag seems better way to convey that info in IR.
For the reason why backend needs to know whether a kernel needs hostcall buffer, search "hidden_hostcall_buffer" at https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-metadata . It is a per kernel metadata which runtime uses to decide whether to allocate hostcall buffer for a kernel. Changing that would change ABI. I don't think we want to do that.
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp | ||
---|---|---|
70 | Instead of emit a function attr, we can emit a module flag, then we can avoid checking func attr for every function later. |
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp | ||
---|---|---|
66 | Should be "Override" since a module needing hostcall overrides a module not needing hostcall. Also the name of the flag better be "amdgpu_hostcall" to avoid conflict with other targets. |
Module metadata is clearer than checking for a particular function, plus robust to the function being renamed / inlined.
I'm not convinced by passing the state as an implicit kernel argument but that's the status quo.
Should be "Override" since a module needing hostcall overrides a module not needing hostcall.
Also the name of the flag better be "amdgpu_hostcall" to avoid conflict with other targets.