On AMD target, printf is replaced and call to printf_alloc are inserted
in order to get the buffer offset to write to
Details
Diff Detail
Event Timeline
I don't think there are many benefits in merging this ahead of time without the rest of printf implementation so that it can be tested.
clang already emits an expansion for printf using the hostcall mechanism. This seems to be doing something different, so I'm not sure where it's going
An implementation can be tested here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6040
It's based on what it's expected by the the LLVM AMD gpu target when compiling cl file.
AMD Roc is using a similar implementation.
Does the printf expansion work for all amdgpu triples, including amdgcn--mesa3d?
will it work for non-ROC capable GPUs?
since it uses gcn specific builtin it should be in amdgcn/ rather than amdgpu/
Really I would prefer if this just cloned rocm-device-libs. The backend doesn't need to maintain two ABIs for dnot
Nothing in the compiler actually depends on roc capable GPUs. The limitations are all runtime/driver related
The exception to this would be places in the libraries that assume flat instructions are available. Theoretically we could handle these in the backend for SI with tagged pointers or something
so there are no hidden assumptions; like the printf buffer being cache coherent with host CPU, or the implementation using s_sendmsg to trigger CPU draining the buffer before the kernel finishes execution?
the main difference is atomic usage (I use those already in liclc), implict arg pos and asking to set the offset to 8 at buffer init instead of adding 8 on every call.
I can easily change the last 2
I don't believe so, but I know next to nothing about the runtime so I'm not 100% confident
The implementation use a regular device buffer that is read at the end of the execution. Mostly like a clEnqueueReadBuffer
The implementation use a regular device buffer that is read at the end of the execution. Mostly like a clEnqueueReadBuffer
Is that on ROCm side or clover? will it stay that way on the ROCm side in the future?
The idea of using different ABI (and -mesa3d triple) was that clover is not tied to changes AMD makes for ROCm.
it can make independent decisions, it's not exposed to potential breakage when ROCm does incompatible changes and does not force ROCm ABI on other users of clover (like r600 or nouveau).
It might be OK for printf (let's see what curro says on the mesa side), but I'd expect that attempting to closely follow ROCm will speed up efforts to switch mesa opencl to SPIRV->NIR path.
Add printf declaration to generic include
Move implementation part to amdgcn subdir
Add CL 1.2 flag to support variadic arguments
libclc/generic/include/clc/printf/printf.h | ||
---|---|---|
3 | I think we should probably have an #if guard around this declaration to make sure the CLC language version is 1.2 or higher. With this patch applied, mesa/clover on a CL1.1 device can no longer build CL kernels. |
Have you checked that returning NULL is handled properly?
Should the buffer wrap around instead?
libclc/amdgcn/lib/SOURCES | ||
---|---|---|
7 | Calling the file printf_alloc or similar would make it clearer that it doesn't actually implement printf | |
libclc/amdgcn/lib/printf/printf.cl | ||
3 ↗ | (On Diff #282443) | 'replace' -> 'replaces' |
10 ↗ | (On Diff #282443) | since the return value is a pointer, NULL would be more appropriate |
13 ↗ | (On Diff #282443) | 'store' -> 'stores' pls add a short table of the expected buffer layout |
15 ↗ | (On Diff #282443) | This would be more readable with a temp var buffer_offset_ptr since you're using it below as well. |
28 ↗ | (On Diff #282443) | return type is a pointer, NULL would be more appropriate |
Uh... It seems this is needed to add printf support to the AMD OpenCL drivers in Gallium Clover. Which would enable that architecture to have OpenCL 1.2 driver support (finally).
Why wasn't this committed to LLVM yet? I am saying this as an outside observer who was passing by looking at https://mesamatrix.net/ wondering why the AMD cards did not support printf for OpenCL 1.2. Then went down a rabbit hole and got here.
It seems this is a major stopper for that and yet this patch, which was approved by the reviewers some 6 months or more ago, has had no one get around to commit it yet.
Anyone with commit rights please help.
Unless I am missing something here.
Calling the file printf_alloc or similar would make it clearer that it doesn't actually implement printf