This wasn't setting any of the attributes the target would expect to
emit for kernels.
Details
Diff Detail
Event Timeline
That seems important. What was the symptom of failing to set these? We may now be redundantly setting some, e.g.
I think convergent is set somewhere else before this patch.
Added a few people who may be able to run the patch against nvptx to check for regressions. If there isn't already a nvptx attribute test we should commit one before this patch so we can see the change. If there is already one, then we haven't changed it here and all is good.
Edit: (Looked for an nvptx equivalent and can't see one, but that doesn't mean it's not already there somewhere)
A bunch of missing attributes on the kernel. The one I noticed was not setting amdgpu-implicitarg-num-bytes (although D112488 avoids needing to do that), but we have a few other attributes that simply wouldn't be set. I'm fighting with some divergence between upstream and the internal branches with these attributes. In particular, the internal branch is hacking on the generic attributes for openmp, and also redundantly (and incorrectly) setting amdgpu-flat-work-group-size to the invalid range 257,257.
convergent isn't a target attribute and isn't the target's responsibility to add.
Is the behaviour change in the above comments intentional? Pointed out by @estewart08
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9208 | Here, we skip the amdgpu-implicitarg-num-bytes and uniform-work-group-size assignments if FD is nullptr | |
9288 | Here, we do the amdgpu-implicitarg-num-bytes and uniform-work-group-size assignments regardless of whether FD is true or not |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9288 | Cancel that, there's a IsOpenMP = ...&& !FD here. Failed to follow the control flow. |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9288 | Are we looking for if (AMDGPU::isKernel(function.getCallingConv()) instead of looking at the function attributes? I don't think we want to annotate non-kernels with these things |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9288 | !FD seems to always be true for openmp kernels because there's no associated function. This isn't looking at the IR function calling convention but I thought about switching to check that instead, but that's a separate change. |
Committed as 6c27d389c8a00040aad998fe959f38ba709a8750, recommitted as 2f0a5714184cca9325004506a22a2a3193c825aa
Here, we skip the amdgpu-implicitarg-num-bytes and uniform-work-group-size assignments if FD is nullptr