A value of 'workitemid / wavefrontize' or 'workitemid & (wavefrontize - 1)'
is wave uniform.
Details
Diff Detail
Event Timeline
A bit of explanation why change to the isUniformLoad is needed. The DA is stateless analysis and we can attach amdgpu.uniform metadata to the GEP, but when it comes to SDag DA cannot help and SDag sees the node as divergent. Even if divergent bit is unset on the node during dag creation it hardly helps after dag combines losing it when killing the original shift node. Moreover, dag combines create a pattern which is much more difficult to recognize. So checking the MMO metadata (much like gisel does, it relies on the MMO metadata completely for that).
This looks good to me. The patch enables the compiler to generate s_load when the user writes code that divides threadIdx.x by the wavefront size. The only suggestion I have is to add some test cases showing explicitly that the amdgpu.uniform metadata is added via the divergence analysis. (by the AnnotateUniformValues pass). The test cases provided rely upon that working correctly, though they show the end result rather than the steps needed to get the result.
I will update the same testcase to run only annotate uniform values pass and check the metadata.
what if the operation happens in divergent control flow, e.g
int x = 0; x = threadIdx.x > 32 : threadIdx/64 : 0;
will this patch still work?
Regardless of the CFG the value of TID / 64 is always uniform. The value of 'x' here is another value, derived from that uniform value. It is no different from x = cc ? sgpr0 : sgpr1; LHS and RHS are uniform, but 'x' is not.
What if the blockDim.x is not 64, like 65, blockDim.y is not 1
for example:
the workgroup shape is <65, 2, 1>
warp 0 should be [0,0] to [63, 0]
warp 1 should be [64,0] to [62, 1]
warp 2 should be [63, 1] to [64, 1]
warp1&2's threadIdx.x / 64 should be still divergent
Yes, you are right, thanks! Looks like I need to limit it to the case when there is amdgpu-no-workitem-id-y attribute on the function.
Yes, although practically divisor is a power of 2 and in fact a wavefront size (that is how people use it), so there is no SDiv or UDiv, there is a shift like in this patch.