Identifies kernels which performs device side kernel enqueues and emit
metadata for the associated hidden kernel arguments. Such kernels are
marked with "calls-enqueue-kernel" function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.
Details
Diff Detail
Event Timeline
If the kernel does not use the hidden arguments should they be listed in the metadata since the runtime is not required to pass them in? If not needed, the we still want to have a kernarg location at the known location, it will just not be initialized by the runtime.
In that case we just need runtime to fill in 0 values for the missing argument so that each hidden argument is in the fixed position. e.g. if runtime does not find printf metadata, it will still insert a nullptr as the PrintfBuffer argument. I think only library code uses these hidden arguments. Then library code can assume fixed position for each hidden argument.
So will the metadata only be generated for the hidden arguments that actually need to be initialized by the runtime? I believe the metadata format contains the position of the argument so it is possible to set up non-contiguous arguments (namely, can skip the ones that will not be used).
I have updated the patch so that the metadata are generated only when they are needed. The metadata format does not contain the position of the argument. The runtime needs to always emit the Printf hidden kernel argument so that library code does not need to know if it is emitted or not.
I think the metadata has a HIDDENNONE kind which needs to be used for kernel arguments that need no set up.
lib/Target/AMDGPU/MCTargetDesc/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
265–267 | Should these only be requested if there are uses of the corresponding get intrinsics along the lines of only generating the following ones if they are actually used? If not generated then the HIDDENNONE needs to be generated if necessary. |
lib/Target/AMDGPU/MCTargetDesc/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
265–267 | Can we do that in a separate patch? This patch is intended for metadata for enqueue_kernel. |
Can we do that in a separate patch? This patch is intended for metadata for enqueue_kernel.
I think separate patch for this is better.
lib/Target/AMDGPU/MCTargetDesc/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
265–267 | I have this and other minor cleanups partially completed. |
Should these only be requested if there are uses of the corresponding get intrinsics along the lines of only generating the following ones if they are actually used? If not generated then the HIDDENNONE needs to be generated if necessary.